TY - GEN
T1 - Connected substructure similarity search
AU - Shang, Haichuan
AU - Lin, Xuemin
AU - Zhang, Ying
AU - Yu, Jeffrey Xu
AU - Wang, Wei
PY - 2010
Y1 - 2010
N2 - Substructure similarity search is to retrieve graphs that approximately contain a given query graph. It has many applications, e.g., detecting similar functions among chemical compounds. The problem is challenging as even testing subgraph containment between two graphs is NP-complete. Hence, existing techniques adopt the filtering-and-verification framework with the focus on developing effective and efficient techniques to remove non-promising graphs. Nevertheless, existing filtering techniques may be still unable to effectively remove many "low" quality candidates. To resolve this, in this paper we propose a novel indexing technique, GrafD-Index, to index graphs according to their "distances" to features. We characterize a tight condition under which the distance-based triangular inequality holds. We then develop lower and upper bounding techniques that exploit the GrafD-Index to (1) prune non-promising graphs and (2) include graphs whose similarities are guaranteed to exceed the given similarity threshold. Considering that the verification phase is not well studied and plays the dominant role in the whole process, we devise efficient algorithms to verify candidates. A comprehensive experiment using real datasets demonstrates that our proposed methods significantly outperform existing methods.
AB - Substructure similarity search is to retrieve graphs that approximately contain a given query graph. It has many applications, e.g., detecting similar functions among chemical compounds. The problem is challenging as even testing subgraph containment between two graphs is NP-complete. Hence, existing techniques adopt the filtering-and-verification framework with the focus on developing effective and efficient techniques to remove non-promising graphs. Nevertheless, existing filtering techniques may be still unable to effectively remove many "low" quality candidates. To resolve this, in this paper we propose a novel indexing technique, GrafD-Index, to index graphs according to their "distances" to features. We characterize a tight condition under which the distance-based triangular inequality holds. We then develop lower and upper bounding techniques that exploit the GrafD-Index to (1) prune non-promising graphs and (2) include graphs whose similarities are guaranteed to exceed the given similarity threshold. Considering that the verification phase is not well studied and plays the dominant role in the whole process, we devise efficient algorithms to verify candidates. A comprehensive experiment using real datasets demonstrates that our proposed methods significantly outperform existing methods.
KW - graph database
KW - similarity search
UR - https://www.scopus.com/pages/publications/77954710172
U2 - 10.1145/1807167.1807264
DO - 10.1145/1807167.1807264
M3 - Conference Paper published in a book
AN - SCOPUS:77954710172
SN - 9781450300322
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 903
EP - 914
BT - Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10
T2 - 2010 International Conference on Management of Data, SIGMOD '10
Y2 - 6 June 2010 through 11 June 2010
ER -