TY - JOUR
T1 - AML
T2 - Efficient approximate membership localization within a web-based join framework
AU - Li, Zhixu
AU - Sitbon, Laurianne
AU - Wang, Liwei
AU - Zhou, Xiaofang
AU - Du, Xiaoyong
PY - 2013
Y1 - 2013
N2 - In this paper, we propose a new type of dictionary-based entity recognition problem, named Approximate Membership Localization (AML). The popular Approximate Membership Extraction (AME) provides full coverage of the true matched substrings in a given document, but its many redundancies lower the efficiency of the AME process and degrade the performance of real-world applications that use the extracted substrings. The AML problem aims to locate nonoverlapping substrings, which better approximate the true matched substrings without generating overlapping redundancies. To perform AML efficiently, we propose the optimized algorithm P-Prune, which prunes a large portion of the overlapping redundant matched substrings before generating them. Our study on several real-world data sets demonstrates the efficiency of P-Prune over a baseline method. We also study AML as applied to a proposed web-based join framework, a search-based approach that joins two tables using dictionary-based entity recognition from web documents. The results not only show the advantage of AML over AME, but also demonstrate the effectiveness of our search-based approach.
AB - In this paper, we propose a new type of dictionary-based entity recognition problem, named Approximate Membership Localization (AML). The popular Approximate Membership Extraction (AME) provides full coverage of the true matched substrings in a given document, but its many redundancies lower the efficiency of the AME process and degrade the performance of real-world applications that use the extracted substrings. The AML problem aims to locate nonoverlapping substrings, which better approximate the true matched substrings without generating overlapping redundancies. To perform AML efficiently, we propose the optimized algorithm P-Prune, which prunes a large portion of the overlapping redundant matched substrings before generating them. Our study on several real-world data sets demonstrates the efficiency of P-Prune over a baseline method. We also study AML as applied to a proposed web-based join framework, a search-based approach that joins two tables using dictionary-based entity recognition from web documents. The results not only show the advantage of AML over AME, but also demonstrate the effectiveness of our search-based approach.
KW - AML
KW - Web-based join
KW - approximate membership location
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:000314188900007
UR - https://openalex.org/W2102153716
UR - https://www.scopus.com/pages/publications/84871730909
U2 - 10.1109/TKDE.2011.178
DO - 10.1109/TKDE.2011.178
M3 - Journal Article
SN - 1041-4347
VL - 25
SP - 298
EP - 310
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 2
M1 - 5989807
ER -