FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis

Gang, Arpita; Bajwa, Waheed U.

doi:10.1109/TSP.2022.3229635


arXiv:2108.12373 (cs)

[Submitted on 27 Aug 2021 (v1), last revised 15 Feb 2022 (this version, v2)]

Title: FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis

Authors: Arpita Gang, Waheed U. Bajwa


Abstract: Principal Component Analysis (PCA) is a fundamental data preprocessing tool in machine learning. While PCA is often thought of as a dimensionality reduction method, its purpose is actually two-fold: dimension reduction and uncorrelated feature learning. Furthermore, the enormity of the dimensions and sample sizes of modern datasets has rendered centralized PCA solutions unusable. In that vein, this paper reconsiders the problem of PCA when data samples are distributed across nodes in an arbitrarily connected network. While a few solutions for distributed PCA exist, they either overlook the uncorrelated feature learning aspect of PCA, tend to have high communication overhead that makes them inefficient, and/or lack 'exact' or 'global' convergence guarantees. To overcome these issues, this paper proposes a distributed PCA algorithm termed FAST-PCA (Fast and exAct diSTributed PCA). The proposed algorithm is efficient in terms of communication and is proven to converge linearly and exactly to the principal components, leading to dimension reduction as well as uncorrelated features. The claims are further supported by experimental results.
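
The distinction the abstract draws between dimension reduction and uncorrelated feature learning can be illustrated with a small numerical sketch. This is not the FAST-PCA algorithm; it is a centralized reference computation under assumed conditions: four hypothetical nodes with equal numbers of zero-mean synthetic samples, so that the global covariance is the average of the local covariances. All names and sizes below are illustrative. The sketch compares the exact top-k eigenvectors with a rotated orthonormal basis of the same subspace. Both retain the same total variance, but only the eigenvector basis yields uncorrelated features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 nodes, each holding 500 zero-mean samples in R^5.
d, k, n_nodes, n_local = 5, 2, 4, 500
scales = np.array([5.0, 3.0, 1.0, 0.5, 0.2])        # distinct std devs per latent direction
mix, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal mixing matrix
local_data = [(rng.standard_normal((n_local, d)) * scales) @ mix.T
              for _ in range(n_nodes)]

# With equal local sample counts, the global covariance is the mean of local ones.
# (Samples are assumed zero-mean, so no explicit centering is done here.)
local_covs = [X.T @ X / n_local for X in local_data]
C = sum(local_covs) / n_nodes

# Centralized reference: exact top-k principal components of C.
eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
V = eigvecs[:, ::-1][:, :k]           # columns = top-k eigenvectors

# A different orthonormal basis of the SAME subspace (rotation by 45 degrees).
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Q = V @ R

def feature_cov(basis):
    """Covariance of the k projected features basis.T @ x."""
    return basis.T @ C @ basis

cov_pc, cov_rot = feature_cov(V), feature_cov(Q)
off_pc = abs(cov_pc[0, 1])    # correlation between features: eigenvector basis
off_rot = abs(cov_rot[0, 1])  # correlation between features: rotated basis

print("retained variance:", np.trace(cov_pc), np.trace(cov_rot))
print("off-diagonal (eigenvectors):", off_pc)
print("off-diagonal (rotated basis):", off_rot)
```

In this sketch the retained variance is identical for both bases, while the off-diagonal covariance is numerically zero only for the eigenvector basis. A method that converges only to the principal subspace gives the first property, whereas the exact convergence to the principal components described in the abstract is what provides the second.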

Comments:	16 pages (two-column version); substantially revised version, including expanded comparisons with other works
Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP); Optimization and Control (math.OC)
Cite as:	arXiv:2108.12373 [cs.LG]
	(or arXiv:2108.12373v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2108.12373
Related DOI:	https://doi.org/10.1109/TSP.2022.3229635

Submission history

From: Waheed Bajwa [view email]
[v1] Fri, 27 Aug 2021 16:10:59 UTC (1,878 KB)
[v2] Tue, 15 Feb 2022 17:43:18 UTC (3,437 KB)
