Top-k keyword search over probabilistic XML data

Jianxin Li*, Chengfei Liu, Rui Zhou, Wei Wang

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/ReportConference Paper published in a bookpeer-review

60 Citations (Scopus)

Abstract

Despite the proliferation of work on XML keyword query, it remains open to support keyword query over probabilistic XML data. Compared with traditional keyword search, it is far more expensive to answer a keyword query over probabilistic XML data due to the consideration of possible world semantics. In this paper, we firstly define the new problem of studying top-k keyword search over probabilistic XML data, which is to retrieve k SLCA results with the k highest probabilities of existence. And then we propose two efficient algorithms. The first algorithm PrStack can find k SLCA results with the k highest probabilities by scanning the relevant keyword nodes only once. To further improve the efficiency, we propose a second algorithm EagerTopK based on a set of pruning properties which can quickly prune unsatisfied SLCA candidates. Finally, we implement the two algorithms and compare their performance with analysis of extensive experimental results.

Original languageEnglish
Title of host publication2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Pages673-684
Number of pages12
DOIs
Publication statusPublished - 2011
Externally publishedYes
Event2011 IEEE 27th International Conference on Data Engineering, ICDE 2011 - Hannover, Germany
Duration: 11 Apr 201116 Apr 2011

Publication series

NameProceedings - International Conference on Data Engineering
ISSN (Print)1084-4627

Conference

Conference2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Country/TerritoryGermany
CityHannover
Period11/04/1116/04/11

Fingerprint

Dive into the research topics of 'Top-k keyword search over probabilistic XML data'. Together they form a unique fingerprint.

Cite this