PVT: Point-Voxel Transformer for Point Cloud Learning

Zhang, Cheng; Wan, Haocheng; Shen, Xinyi; Wu, Zizhao

arXiv:2108.06076 (cs)

[Submitted on 13 Aug 2021 (v1), last revised 25 May 2022 (this version, v4)]

Abstract: Recently developed pure Transformer architectures have attained promising accuracy on point cloud learning benchmarks compared to convolutional neural networks. However, existing point cloud Transformers are computationally expensive because they spend a significant amount of time structuring the irregular data. To address this shortcoming, we present the Sparse Window Attention (SWA) module, which gathers coarse-grained local features from non-empty voxels. SWA not only bypasses the expensive irregular data structuring and the wasted computation on empty voxels, but also achieves linear computational complexity with respect to voxel resolution. Meanwhile, to gather fine-grained features about the global shape, we introduce the relative attention (RA) module, a self-attention variant that is more robust to rigid transformations of objects. Equipped with SWA and RA, we construct a neural architecture called PVT that integrates both modules into a joint framework for point cloud learning. Compared with previous Transformer-based and attention-based models, our method attains a top accuracy of 94.0% on the classification benchmark and a 10x inference speedup on average. Extensive experiments also validate the effectiveness of PVT on part and semantic segmentation benchmarks (86.6% and 69.2% mIoU, respectively).
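
The abstract's point about working only on non-empty voxels can be illustrated with a small sketch. The Python below is not the authors' SWA module and contains no attention at all; it only shows, under our own assumptions, the sparse bookkeeping such a module could build on. The function name `voxelize_sparse`, the bounding-box normalization, and the centroid feature are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def voxelize_sparse(points, resolution):
    """Group 3D points into the occupied cells of a resolution^3 grid.

    Hypothetical helper for illustration only. Returns a dict mapping an
    integer voxel index (i, j, k) to the centroid of the points inside it.
    Empty voxels never appear in the dict, so any later per-voxel work
    scales with the number of occupied voxels, not with resolution ** 3.
    """
    if not points:
        return {}

    # Normalize coordinates using the bounding box of the cloud.
    mins = [min(p[d] for p in points) for d in range(3)]
    maxs = [max(p[d] for p in points) for d in range(3)]
    # One shared scale for all axes, so the shape is not distorted.
    extent = max(maxs[d] - mins[d] for d in range(3)) or 1.0

    buckets = defaultdict(list)
    for p in points:
        # Clamp so points on the upper boundary land in the last cell.
        index = tuple(
            min(int((p[d] - mins[d]) / extent * resolution), resolution - 1)
            for d in range(3)
        )
        buckets[index].append(p)

    # Average the points in each occupied voxel (assumed feature: centroid).
    return {
        index: tuple(sum(q[d] for q in members) / len(members) for d in range(3))
        for index, members in buckets.items()
    }


points = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (1.0, 1.0, 1.0)]
voxels = voxelize_sparse(points, resolution=4)
print(len(voxels), "occupied voxels out of", 4 ** 3)
```

With these three sample points and a resolution of 4, only 2 of the 64 grid cells are occupied, so a module that iterates over the dictionary touches 2 entries rather than 64.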

Comments:	29 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Cite as:	arXiv:2108.06076 [cs.CV]
	(or arXiv:2108.06076v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2108.06076

Submission history

From: Cheng Zhang
[v1] Fri, 13 Aug 2021 06:07:57 UTC (836 KB)
[v2] Wed, 22 Sep 2021 05:17:40 UTC (2,203 KB)
[v3] Mon, 10 Jan 2022 13:59:37 UTC (2,650 KB)
[v4] Wed, 25 May 2022 06:34:21 UTC (2,649 KB)
