TY - GEN
T1 - Ranking distributed probabilistic data
AU - Li, Feifei
AU - Yi, Ke
AU - Jestes, Jeffrey
PY - 2009
Y1 - 2009
N2 - Ranking queries are essential tools to process large amounts of probabilistic data that encode exponentially many possible deterministic instances. In many applications where uncertainty and fuzzy information arise, data are collected from multiple sources in distributed, networked locations, e.g., distributed sensor fields with imprecise measurements, multiple scientific institutes with inconsistency in their scientific data. Due to the network delay and the economic cost associated with communicating large amounts of data over a network, a fundamental problem in these scenarios is to retrieve the global top-k tuples from all distributed sites with minimum communication cost. Using the wellfounded notion of the expected rank of each tuple across all possible worlds as the basis of ranking, this work designs both communication- and computation-efficient algorithms for retrieving the top-k tuples with the smallest ranks from distributed sites. Extensive experiments using both synthetic and real data sets confirm the efficiency and superiority of our algorithms over the straightforward approach of forwarding all data to the server.
AB - Ranking queries are essential tools to process large amounts of probabilistic data that encode exponentially many possible deterministic instances. In many applications where uncertainty and fuzzy information arise, data are collected from multiple sources in distributed, networked locations, e.g., distributed sensor fields with imprecise measurements, multiple scientific institutes with inconsistency in their scientific data. Due to the network delay and the economic cost associated with communicating large amounts of data over a network, a fundamental problem in these scenarios is to retrieve the global top-k tuples from all distributed sites with minimum communication cost. Using the wellfounded notion of the expected rank of each tuple across all possible worlds as the basis of ranking, this work designs both communication- and computation-efficient algorithms for retrieving the top-k tuples with the smallest ranks from distributed sites. Extensive experiments using both synthetic and real data sets confirm the efficiency and superiority of our algorithms over the straightforward approach of forwarding all data to the server.
KW - Distributed query processing
KW - Probabilistic data
KW - Ranking queries
KW - Top-k
KW - Uncertain databases
UR - https://openalex.org/W2093584660
UR - https://www.scopus.com/pages/publications/70849110773
U2 - 10.1145/1559845.1559885
DO - 10.1145/1559845.1559885
M3 - Conference Paper published in a book
SN - 9781605585543
T3 - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
SP - 361
EP - 373
BT - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
T2 - International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
Y2 - 29 June 2009 through 2 July 2009
ER -