Diverse topic phrase extraction through latent semantic analysis

Jilin Chen*, Jun Yan, Benyu Zhang, Qiang Yang, Zheng Chen

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/Report; Conference Paper published in a book; peer-review

10 Citations (Scopus)

Abstract

We propose a novel algorithm for extracting diverse topic phrases in order to summarize large corpora. Previous work often ignores the importance of diversity and thus extracts phrases that crowd around a few hot topics while failing to cover other less obvious but important topics. We solve this problem through document re-weighting and phrase diversification using latent semantic analysis (LSA). Experiments on various datasets show that our new algorithm improves both relevance and diversity across topics for topic phrase extraction problems.
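The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of LSA-based diversification, not the authors' method. It assumes a small phrase-by-document count matrix, projects phrases into a latent space with a truncated SVD, and then greedily picks the phrase with the largest latent norm while projecting the chosen direction out of the remaining phrases. The function name `diverse_phrases`, the toy matrix, the phrase list, and the deflation step are all hypothetical choices made for illustration.

```python
import numpy as np


def diverse_phrases(matrix, phrases, k=2, n_select=2):
    """Greedy LSA-based phrase selection (illustrative assumption, not the paper's algorithm)."""
    # Truncated SVD: each row of u[:, :k] * s[:k] is a phrase vector in the latent space.
    u, s, _ = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    vecs = u[:, :k] * s[:k]
    chosen = []
    for _ in range(n_select):
        norms = np.linalg.norm(vecs, axis=1)
        best = int(np.argmax(norms))
        if norms[best] <= 1e-9:
            break  # nothing left that is not already covered
        chosen.append(best)
        direction = vecs[best] / norms[best]
        # Deflate: remove the chosen latent direction from every phrase vector,
        # so phrases on the same topic lose most of their weight.
        vecs = vecs - np.outer(vecs @ direction, direction)
    return [phrases[i] for i in chosen]


# Hypothetical toy corpus: rows are phrases, columns are documents.
# Documents 0-2 cover a "hot" topic, documents 3-4 a less prominent one.
phrases = ["data mining", "machine learning", "pattern mining", "stock market"]
matrix = np.array([
    [3, 2, 3, 0, 0],
    [2, 3, 2, 0, 0],
    [2, 2, 2, 0, 0],
    [0, 0, 0, 1, 2],
])

result = diverse_phrases(matrix, phrases, k=2, n_select=2)

# Frequency-only baseline for comparison: the two phrases with the highest total count.
baseline = [phrases[i] for i in np.argsort(-matrix.sum(axis=1), kind="stable")[:2]]
```

On this toy matrix the frequency-only baseline returns two phrases from the hot topic, whereas the deflation step lets the sketch pick one phrase per topic. That contrast is the kind of crowding the abstract describes, though the paper's document re-weighting step is not modelled here.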

Original language: English
Title of host publication: Proceedings - Sixth International Conference on Data Mining, ICDM 2006
Pages: 834-838
Number of pages: 5
Publication status: Published - 2006
Event: 6th International Conference on Data Mining, ICDM 2006 - Hong Kong, China
Duration: 18 Dec 2006 to 22 Dec 2006

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 6th International Conference on Data Mining, ICDM 2006
Country/Territory: China
City: Hong Kong
Period: 18/12/06 to 22/12/06
