Ranking distributed probabilistic data

Feifei Li*, Ke Yi, Jeffrey Jestes

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/ReportConference Paper published in a bookpeer-review

53 Citations (Scopus)

Abstract

Ranking queries are essential tools to process large amounts of probabilistic data that encode exponentially many possible deterministic instances. In many applications where uncertainty and fuzzy information arise, data are collected from multiple sources in distributed, networked locations, e.g., distributed sensor fields with imprecise measurements, multiple scientific institutes with inconsistency in their scientific data. Due to the network delay and the economic cost associated with communicating large amounts of data over a network, a fundamental problem in these scenarios is to retrieve the global top-k tuples from all distributed sites with minimum communication cost. Using the wellfounded notion of the expected rank of each tuple across all possible worlds as the basis of ranking, this work designs both communication- and computation-efficient algorithms for retrieving the top-k tuples with the smallest ranks from distributed sites. Extensive experiments using both synthetic and real data sets confirm the efficiency and superiority of our algorithms over the straightforward approach of forwarding all data to the server.

Original languageEnglish
Title of host publicationSIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
Pages361-373
Number of pages13
DOIs
Publication statusPublished - 2009
EventInternational Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09 - Providence, RI, United States
Duration: 29 Jun 20092 Jul 2009

Publication series

NameSIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems

Conference

ConferenceInternational Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
Country/TerritoryUnited States
CityProvidence, RI
Period29/06/092/07/09

Keywords

  • Distributed query processing
  • Probabilistic data
  • Ranking queries
  • Top-k
  • Uncertain databases

Fingerprint

Dive into the research topics of 'Ranking distributed probabilistic data'. Together they form a unique fingerprint.

Cite this