Connected substructure similarity search

Haichuan Shang*, Xuemin Lin, Ying Zhang, Jeffrey Xu Yu, Wei Wang

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/ReportConference Paper published in a bookpeer-review

Abstract

Substructure similarity search is to retrieve graphs that approximately contain a given query graph. It has many applications, e.g., detecting similar functions among chemical compounds. The problem is challenging as even testing subgraph containment between two graphs is NP-complete. Hence, existing techniques adopt the filtering-and-verification framework with the focus on developing effective and efficient techniques to remove non-promising graphs. Nevertheless, existing filtering techniques may be still unable to effectively remove many "low" quality candidates. To resolve this, in this paper we propose a novel indexing technique, GrafD-Index, to index graphs according to their "distances" to features. We characterize a tight condition under which the distance-based triangular inequality holds. We then develop lower and upper bounding techniques that exploit the GrafD-Index to (1) prune non-promising graphs and (2) include graphs whose similarities are guaranteed to exceed the given similarity threshold. Considering that the verification phase is not well studied and plays the dominant role in the whole process, we devise efficient algorithms to verify candidates. A comprehensive experiment using real datasets demonstrates that our proposed methods significantly outperform existing methods.

Original languageEnglish
Title of host publicationProceedings of the 2010 International Conference on Management of Data, SIGMOD '10
Pages903-914
Number of pages12
DOIs
Publication statusPublished - 2010
Externally publishedYes
Event2010 International Conference on Management of Data, SIGMOD '10 - Indianapolis, IN, United States
Duration: 6 Jun 201011 Jun 2010

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print)0730-8078

Conference

Conference2010 International Conference on Management of Data, SIGMOD '10
Country/TerritoryUnited States
CityIndianapolis, IN
Period6/06/1011/06/10

Keywords

  • graph database
  • similarity search

Fingerprint

Dive into the research topics of 'Connected substructure similarity search'. Together they form a unique fingerprint.

Cite this