TY - JOUR
T1 - Parallel spectral clustering in distributed systems
AU - Chen, Wen Yen
AU - Song, Yangqiu
AU - Bai, Hongjie
AU - Lin, Chih Jen
AU - Chang, Edward Y.
PY - 2011
Y1 - 2011
N2 - Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863, we show that our parallel algorithm can effectively handle large problems.
AB - Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863, we show that our parallel algorithm can effectively handle large problems.
KW - Nyström approximation
KW - Parallel spectral clustering
KW - distributed computing
KW - nearest neighbors
KW - normalized cuts
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:000286204700010
UR - https://openalex.org/W2126337883
UR - https://www.scopus.com/pages/publications/79551528802
U2 - 10.1109/TPAMI.2010.88
DO - 10.1109/TPAMI.2010.88
M3 - Journal Article
C2 - 20421667
SN - 0162-8828
VL - 33
SP - 568
EP - 586
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 3
M1 - 5444877
ER -