Efficient exact edit similarity query processing with the asymmetric signature scheme

Jianbin Qin*, Wei Wang, Yifei Lu, Chuan Xiao, Xuemin Lin

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/ReportConference Paper published in a bookpeer-review

101 Citations (Scopus)

Abstract

Given a query string Q, an edit similarity search finds all strings in a database whose edit distance with Q is no more than a given threshold τ. Most existing method answering edit similarity queries rely on a signature scheme to generate candidates given the query string. We observe that the number of signatures generated by existing methods is far greater than the lower bound, and this results in high query time and index space complexities. In this paper, we show that the minimum signature size lower bound is τ +1. We then propose asymmetric signature schemes that achieve this lower bound. We develop efficient query processing algorithms based on the new scheme. Several dynamic programming-based candidate pruning methods are also developed to further speed up the performance. We have conducted a comprehensive experimental study involving nine state-of-the-art algorithms. The experiment results clearly demonstrate the efficiency of our methods.

Original languageEnglish
Title of host publicationProceedings of SIGMOD 2011 and PODS 2011
PublisherAssociation for Computing Machinery
Pages1033-1044
Number of pages12
ISBN (Print)9781450306614
DOIs
Publication statusPublished - 2011
Externally publishedYes
Event2011 ACM SIGMOD and 30th PODS 2011 Conference - Athens, Greece
Duration: 12 Jun 201116 Jun 2011

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print)0730-8078

Conference

Conference2011 ACM SIGMOD and 30th PODS 2011 Conference
Country/TerritoryGreece
CityAthens
Period12/06/1116/06/11

Keywords

  • Q-gram
  • approximate pattern matching
  • edit distance
  • similarity join
  • similarity search

Fingerprint

Dive into the research topics of 'Efficient exact edit similarity query processing with the asymmetric signature scheme'. Together they form a unique fingerprint.

Cite this