The stability and usability of statistical topic models

Yi Yang, Shimei Pan, Jie Lu, Mercan Topkara, Yangqiu Song

Research output: Contribution to journal, Journal Article, peer-reviewed

4 Citations (Scopus)

Abstract

Statistical topic models have become a useful and ubiquitous tool for analyzing large text corpora. One common application of statistical topic models is to support topic-centric navigation and exploration of document collections. Existing work on topic modeling focuses on the inference of model parameters so the resulting model fits the input data. Since the exact inference is intractable, statistical inference methods, such as Gibbs Sampling, are commonly used to solve the problem. However, most of the existing work ignores an important aspect that is closely related to the end user experience: topic model stability. When the model is either re-trained with the same input data or updated with new documents, the topic previously assigned to a document may change under the new model, which may result in a disruption of end users' mental maps about the relations between documents and topics, thus undermining the usability of the applications. In this article, we propose a novel user-directed non-disruptive topic model update method that balances the tradeoff between finding the model that fits the data and maintaining the stability of the model from end users' perspective. It employs a novel constrained LDA algorithm to incorporate pairwise document constraints, which are converted from user feedback about topics, to achieve topic model stability. Evaluation results demonstrate the advantages of our approach over previous methods.
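The instability described in the abstract can be made concrete with a small sketch. It is not the evaluation metric or the constrained LDA algorithm from the article. It is only an illustration, assuming each document is reduced to a single dominant topic label. The function name `pairwise_stability` and the sample labels are hypothetical. The score is the fraction of document pairs whose "same topic" or "different topic" relation survives a re-training, which is essentially the Rand index between the two assignments. Working on pairs mirrors the abstract's idea of pairwise document constraints and avoids having to match topic ids across models.

```python
from itertools import combinations


def pairwise_stability(old_topics, new_topics):
    """Fraction of document pairs whose same-topic / different-topic
    relation is unchanged between two topic assignments."""
    if len(old_topics) != len(new_topics):
        raise ValueError("assignments must cover the same documents")
    pairs = list(combinations(range(len(old_topics)), 2))
    if not pairs:
        # Zero or one document: nothing can be disrupted.
        return 1.0
    preserved = sum(
        (old_topics[i] == old_topics[j]) == (new_topics[i] == new_topics[j])
        for i, j in pairs
    )
    return preserved / len(pairs)


# Hypothetical dominant-topic labels for five documents.
old = [0, 0, 1, 1, 2]
relabeled = [2, 2, 0, 0, 1]  # same grouping, topic ids permuted
drifted = [0, 1, 1, 1, 2]    # document 1 moved to another topic

print(pairwise_stability(old, relabeled))
print(pairwise_stability(old, drifted))
```

A pure relabeling of topics leaves the score at its maximum, while moving a document between topics lowers it. An update method of the kind the abstract describes would aim to keep such a score high while still fitting the new data.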

Original language: English
Article number: 14
Journal: ACM Transactions on Interactive Intelligent Systems
Volume: 6
Issue number: 2
Publication status: Published - Jul 2016
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2016 ACM.

Keywords

  • Constrained topic model
  • LDA
  • Non-disruptive topic model update
  • Stability
  • Statistical topic model
  • Text analytics
  • Usability
