TY - GEN
T1 - Approximate Membership Localization (AML) for web-based join
AU - Li, Zhixu
AU - Sitbon, Laurianne
AU - Wang, Liwei
AU - Zhou, Xiaofang
AU - Du, Xiaoyong
PY - 2010
Y1 - 2010
N2 - In this paper, we propose a search-based approach to join two tables in the absence of clean join attributes. Non-structured documents from the web are used to express the correlations between a given query and a reference list. To implement this approach, a major challenge we meet is how to efficiently determine the number of times and the locations of each clean reference from the reference list that is approximately mentioned in the retrieved documents. We formalize the Approximate Membership Localization (AML) problem and propose an efficient partial pruning algorithm to solve it. A study using real-world data sets demonstrates the effectiveness of our search-based approach, and the efficiency of our AML algorithm.
AB - In this paper, we propose a search-based approach to join two tables in the absence of clean join attributes. Non-structured documents from the web are used to express the correlations between a given query and a reference list. To implement this approach, a major challenge we meet is how to efficiently determine the number of times and the locations of each clean reference from the reference list that is approximately mentioned in the retrieved documents. We formalize the Approximate Membership Localization (AML) problem and propose an efficient partial pruning algorithm to solve it. A study using real-world data sets demonstrates the effectiveness of our search-based approach, and the efficiency of our AML algorithm.
KW - AML
KW - Approximate join
KW - Approximate membership location
KW - Web-based join
UR - https://openalex.org/W2084868841
UR - https://www.scopus.com/pages/publications/78651275614
U2 - 10.1145/1871437.1871611
DO - 10.1145/1871437.1871611
M3 - Conference Paper published in a book
SN - 9781450300995
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1321
EP - 1324
BT - CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
T2 - 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
Y2 - 26 October 2010 through 30 October 2010
ER -