Efficient SimRank-based similarity join over large graphs

Weiguo Zheng*, Lei Zou, Yansong Feng, Lei Chen, Dongyan Zhao

*Corresponding author for this work

Research output: Contribution to journalConference article published in journalpeer-review

Abstract

Graphs have been widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs is meaningful and interesting, which can benefit friend recommendation in social networks and link prediction, etc. In this paper, we adopt "SimRank" to evaluate the similarity of two vertices in a large graph because of its generality. Note that "Sim- Rank" is purely structure dependent and it does not rely on the domain knowledge. Specifically, we define a SimRank-based join (SRJ) query to find all the vertex pairs satisfying the threshold in a data graph G. In order to reduce the search space, we propose an estimated shortest-path distance based upper bound for SimRank scores to prune unpromising vertex pairs. In the verification, we propose a novel index, called h-go cover, to efficiently compute the SimRank score of a single vertex pair. Given a graph G, we only materialize the SimRank scores of a small proportion of vertex pairs (called h-go covers), based on which, the SimRank score of any vertex pair can be computed easily. In order to handle large graphs, we extend our technique to the partition-based framework. Thorough theoretical analysis and extensive experiments over both real and synthetic datasets confirm the efficiency and effectiveness of our solution.

Original languageEnglish
Pages (from-to)493-504
Number of pages12
JournalProceedings of the VLDB Endowment
Volume6
Issue number7
DOIs
Publication statusPublished - 2013
Event39th International Conference on Very Large Data Bases, VLDB 2012 - Trento, Italy
Duration: 26 Aug 201330 Aug 2013

Fingerprint

Dive into the research topics of 'Efficient SimRank-based similarity join over large graphs'. Together they form a unique fingerprint.

Cite this