Cleaning uncertain data with a noisy crowd

Chen Jason Zhang, Lei Chen, Yongxin Tong, Zheng Liu

Research output: Chapter in Book/Conference Proceeding/Report; Conference Paper published in a book; peer-review

Abstract

Uncertain data has emerged as an important problem in database systems due to the imprecise nature of many applications. To handle the uncertainty, probabilistic databases can be used to store uncertain data, and querying facilities are provided to yield answers with confidence. However, the uncertainty may propagate, so the results returned from a query or mining process may not be useful. In this paper, we leverage the power of crowdsourcing for cleaning uncertain data. Specifically, we design a set of Human Intelligence Tasks (HITs) that ask a crowd to improve the quality of uncertain data. Each HIT is associated with a cost, so we need solutions that maximize data quality with a minimal number of HITs. There are two obstacles to this non-trivial optimization: first, the crowd may return incorrect answers with some probability; second, the HITs decomposed from uncertain data are often correlated. These two obstacles make selecting the optimal set of HITs computationally very expensive. We address these challenges by designing an effective approximation algorithm and an efficient heuristic solution. To further improve efficiency, we derive tight lower and upper bounds, which are used for effective filtering and estimation. We verify the solutions with extensive experiments on both a simulated crowd and a real crowdsourcing platform.
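The selection problem described in the abstract can be illustrated with a small sketch. The code below is not the paper's approximation algorithm or heuristic. It is a simplified illustration under loud assumptions: each HIT is a yes/no question about one uncertain tuple, every worker answers correctly with the same known probability `accuracy`, and HITs are treated as independent (the abstract notes that real HITs are often correlated, which this sketch ignores). The function names `posterior`, `expected_entropy_reduction`, and `pick_hits` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a Bernoulli variable with P(true) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def posterior(prior, answer_yes, accuracy):
    """Bayes update of P(tuple is correct) after one noisy yes/no answer.

    Assumption: the worker is right with probability `accuracy`,
    regardless of the true value.
    """
    like_true = accuracy if answer_yes else 1 - accuracy
    like_false = 1 - accuracy if answer_yes else accuracy
    evidence = prior * like_true + (1 - prior) * like_false
    if evidence == 0.0:
        # This answer cannot occur under the model; keep the prior.
        return prior
    return prior * like_true / evidence


def expected_entropy_reduction(prior, accuracy):
    """Expected drop in uncertainty from asking one HIT about a tuple."""
    p_yes = prior * accuracy + (1 - prior) * (1 - accuracy)
    after = (p_yes * entropy(posterior(prior, True, accuracy))
             + (1 - p_yes) * entropy(posterior(prior, False, accuracy)))
    return entropy(prior) - after


def pick_hits(priors, accuracy, budget):
    """Greedy choice: ask about the tuples with the largest expected gain.

    Assumption: HITs are independent, so gains can be ranked one by one.
    """
    ranked = sorted(
        priors,
        key=lambda k: expected_entropy_reduction(priors[k], accuracy),
        reverse=True,
    )
    return ranked[:budget]
```

With hypothetical priors such as `{"t1": 0.5, "t2": 0.95, "t3": 0.6}`, a worker accuracy of 0.8, and a budget of two HITs, `pick_hits` prefers the tuples whose correctness is least certain. A worker accuracy of 0.5 yields zero expected gain for every tuple, which reflects why a noisy crowd makes the optimization harder.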

Original language: English
Title of host publication: 2015 IEEE 31st International Conference on Data Engineering, ICDE 2015
Publisher: IEEE Computer Society
Pages: 6-17
Number of pages: 12
ISBN (Electronic): 9781479979639
Publication status: Published - 26 May 2015
Event: 2015 31st IEEE International Conference on Data Engineering, ICDE 2015 - Seoul, Korea, Republic of
Duration: 13 Apr 2015 to 17 Apr 2015

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2015-May
ISSN (Print): 1084-4627

Conference

Conference: 2015 31st IEEE International Conference on Data Engineering, ICDE 2015
Country/Territory: Korea, Republic of
City: Seoul
Period: 13/04/15 to 17/04/15

Bibliographical note

Publisher Copyright:
© 2015 IEEE.
