On gapped set intersection size estimation

Chen Chen, Jianbin Qin, Wei Wang

Research output: Chapter in Book/Conference Proceeding/ReportConference Paper published in a bookpeer-review

1 Citation (Scopus)

Abstract

There exists considerable literature on estimating the cardinality of set intersection result. In this paper, we consider a generalized problem for integer sets where, given a gap parameter δ, two elements are deemed as matches if their numeric difference equals δ or is within δ. We call this problem the gapped set intersection size estimation (GSISE), and it can be used to model applications in database systems, data mining, and information retrieval. We first distinguish two subtypes of the estimation problem: the point gap estimation and range gap estimation. We propose optimized sketches to tackle the two problems efficiently and effectively with theoretical guarantees. We demonstrate the usage of our proposed techniques in mining top-K related keywords efficiently, by integrating with an inverted index. Finally, substantial experiments based on a large subset of the ClueWed09 dataset demonstrate the efficiency and effectiveness of the proposed methods.

Original languageEnglish
Title of host publicationCIKM 2015 - Proceedings of the 24th ACM International Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Pages1351-1360
Number of pages10
ISBN (Electronic)9781450337946
DOIs
Publication statusPublished - 17 Oct 2015
Externally publishedYes
Event24th ACM International Conference on Information and Knowledge Management, CIKM 2015 - Melbourne, Australia
Duration: 19 Oct 201523 Oct 2015

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings
Volume19-23-Oct-2015

Conference

Conference24th ACM International Conference on Information and Knowledge Management, CIKM 2015
Country/TerritoryAustralia
CityMelbourne
Period19/10/1523/10/15

Bibliographical note

Publisher Copyright:
© 2015 ACM.

Keywords

  • Indexing
  • Set intersection size estimation
  • Top-k

Fingerprint

Dive into the research topics of 'On gapped set intersection size estimation'. Together they form a unique fingerprint.

Cite this