AML: Efficient approximate membership localization within a web-based join framework

Zhixu Li*, Laurianne Sitbon, Liwei Wang, Xiaofang Zhou, Xiaoyong Du

*Corresponding author for this work

Research output: Contribution to journal, Journal Article, peer-review

11 Citations (Scopus)

Abstract

In this paper, we propose a new type of dictionary-based entity recognition problem, named Approximate Membership Localization (AML). The popular Approximate Membership Extraction (AME) provides full coverage of the true matched substrings in a given document, but its many redundant matches make the AME process inefficient and degrade the performance of real-world applications that use the extracted substrings. The AML problem aims to locate nonoverlapped substrings, which are a better approximation of the true matched substrings, without generating overlapped redundancies. To perform AML efficiently, we propose the optimized algorithm P-Prune, which prunes a large part of the overlapped redundant matched substrings before generating them. Our study using several real-world data sets demonstrates the efficiency of P-Prune over a baseline method. We also study AML as applied to a proposed web-based join framework, a search-based approach that joins two tables using dictionary-based entity recognition from web documents. The results not only show the advantage of AML over AME, but also demonstrate the effectiveness of our search-based approach.
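
The contrast between AME and AML described in the abstract can be illustrated with a small sketch. This is not the paper's P-Prune algorithm, and it is not taken from the article. It is a naive baseline written under several assumptions: documents are split on whitespace, similarity is token-level Jaccard, and nonoverlapped matches are chosen greedily by highest similarity. The names `jaccard`, `ame`, and `aml_naive`, the threshold, and the window length are all hypothetical.

```python
from typing import List, Tuple

# (start token index, end token index (exclusive), dictionary entry, similarity)
Match = Tuple[int, int, str, float]


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set Jaccard similarity (an assumed similarity function)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def ame(tokens: List[str], dictionary: List[str],
        threshold: float, max_len: int) -> List[Match]:
    """AME-style extraction: every token window that approximately
    matches some dictionary entry, including overlapping windows."""
    matches: List[Match] = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            window = tokens[start:end]
            for entry in dictionary:
                sim = jaccard(window, entry.split())
                if sim >= threshold:
                    matches.append((start, end, entry, sim))
    return matches


def aml_naive(matches: List[Match]) -> List[Match]:
    """AML-style localization, done naively: generate all AME matches
    first, then greedily keep the best-scoring nonoverlapping ones.
    The greedy rule is an assumption made for illustration."""
    chosen: List[Match] = []
    for m in sorted(matches, key=lambda m: (-m[3], m[0], m[1])):
        if all(m[1] <= c[0] or m[0] >= c[1] for c in chosen):
            chosen.append(m)
    return sorted(chosen)


tokens = "the apple ipad 2 tablet was released".split()
dictionary = ["apple ipad 2"]

all_matches = ame(tokens, dictionary, threshold=0.6, max_len=4)
print(len(all_matches))        # 5 overlapping candidate substrings
print(aml_naive(all_matches))  # [(1, 4, 'apple ipad 2', 1.0)]
```

In this toy input, AME-style extraction reports five overlapping windows around the same mention, while the localization step keeps only one. The sketch still enumerates every redundant window before discarding it, which is the cost the abstract says P-Prune avoids by pruning before generation.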

Original language: English
Article number: 5989807
Pages (from-to): 298-310
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 25
Issue number: 2
Publication status: Published - 2013
Externally published: Yes

Keywords

  • AML
  • Web-based join
  • approximate membership location
