Federated topic modeling

Di Jiang, Xueyang Wu, Yuanfeng Song*, Weiwei Zhao, Qiang Yang, Yongxin Tong, Qian Xu

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/Report; Conference Paper published in a book; peer-reviewed

Abstract

Topic modeling has been widely applied in a variety of industrial applications. Training a high-quality model usually requires a massive amount of in-domain data to provide comprehensive co-occurrence information for the model to learn. However, industrial data such as medical or financial records are often proprietary or sensitive, which precludes uploading them to data centers. Hence, training topic models in industrial scenarios using conventional approaches faces a dilemma: a party (i.e., a company or institute) has to either tolerate data scarcity or sacrifice data privacy. In this paper, we propose a novel framework named Federated Topic Modeling (FTM), in which multiple parties collaboratively train a high-quality topic model while simultaneously alleviating data scarcity and remaining immune to privacy adversaries. FTM is inspired by federated learning and consists of novel techniques such as private Metropolis-Hastings, topic-wise normalization, and heterogeneous model integration. We conduct a series of quantitative evaluations to verify the effectiveness of FTM and deploy FTM in an Automatic Speech Recognition (ASR) system to demonstrate its utility in real-life applications. Experimental results verify FTM's superiority over conventional topic modeling.

Original language: English
Title of host publication: CIKM 2019 - Proceedings of the 28th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 1071-1080
Number of pages: 10
ISBN (Electronic): 9781450369763
Publication status: Published - 3 Nov 2019
Externally published: Yes
Event: 28th ACM International Conference on Information and Knowledge Management, CIKM 2019 - Beijing, China
Duration: 3 Nov 2019 to 7 Nov 2019

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 28th ACM International Conference on Information and Knowledge Management, CIKM 2019
Country/Territory: China
City: Beijing
Period: 3/11/19 to 7/11/19

Bibliographical note

Publisher Copyright:
© 2019 Association for Computing Machinery.

Keywords

  • Bayesian Networks
  • Text Semantics
  • Topic Model
