Document transformation for multi-label feature selection in text categorization

Weizhu Chen*, Jun Yan, Benyu Zhang, Zheng Chen, Qiang Yang

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/Report; Conference Paper published in a book; peer-review

138 Citations (Scopus)

Abstract

Feature selection on multi-label documents for automatic text categorization is an under-explored research area. This paper presents a systematic document transformation framework, whereby multi-label documents are transformed into single-label documents before standard feature selection algorithms are applied, to solve the multi-label feature selection problem. Under this framework, we undertake a comparative study of four intuitive document transformation approaches and propose a novel approach called Entropy-based Label Assignment (ELA), which assigns label weights to a multi-label document based on label entropy. Three standard feature selection algorithms are used to evaluate the document transformation approaches in order to verify their impact on multi-class text categorization problems. Using an SVM classifier and two multi-label benchmark text collections, we show that the choice of document transformation approach can significantly influence the performance of multi-class categorization and that our proposed document transformation approach, ELA, can achieve better performance than all other approaches.
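The general shape of the framework described in the abstract (transform multi-label documents into single-label ones, then run a standard feature selection score) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the paper's method: the abstract does not name the four intuitive transformations, the three feature selection algorithms, or the ELA weighting formula. The "copy" transformation, the uniform `1 / len(labels)` weight, the chi-square score, and the names `copy_transform` and `chi_square` are all hypothetical choices made here for illustration.

```python
def copy_transform(docs):
    """Turn each multi-label document into one single-label row per label.

    docs: list of (tokens, labels) pairs.
    Returns a list of (token_set, label, weight) rows.
    ASSUMPTION: the uniform weight 1/len(labels) is an illustrative choice,
    not the entropy-based weighting (ELA) proposed in the paper.
    """
    rows = []
    for tokens, labels in docs:
        for label in labels:
            rows.append((set(tokens), label, 1.0 / len(labels)))
    return rows


def chi_square(rows, term, label):
    """Weighted chi-square score of `term` for `label` over single-label rows.

    ASSUMPTION: chi-square is used as an example of a standard feature
    selection score; the abstract does not say which algorithms were used.
    """
    a = b = c = d = 0.0  # cells of the weighted 2x2 contingency table
    for tokens, row_label, weight in rows:
        has_term = term in tokens
        is_label = row_label == label
        if has_term and is_label:
            a += weight
        elif has_term:
            b += weight
        elif is_label:
            c += weight
        else:
            d += weight
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


# Toy corpus: the second document carries two labels.
docs = [
    (["ball", "goal"], ["sports"]),
    (["vote", "goal"], ["politics", "sports"]),
    (["vote", "tax"], ["politics"]),
]
rows = copy_transform(docs)
scores = {t: chi_square(rows, t, "politics") for t in ("ball", "goal", "vote", "tax")}
```

Swapping in a different `copy_transform` (for example, one that keeps only a single label per document, or one that assigns non-uniform weights) changes the contingency counts and therefore the feature ranking, which is the effect the abstract says the paper measures.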

Original language: English
Title of host publication: Proceedings of the 7th IEEE International Conference on Data Mining, ICDM 2007
Pages: 451-456
Number of pages: 6
Publication status: Published - 2007
Event: 7th IEEE International Conference on Data Mining, ICDM 2007 - Omaha, NE, United States
Duration: 28 Oct 2007 to 31 Oct 2007

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 7th IEEE International Conference on Data Mining, ICDM 2007
Country/Territory: United States
City: Omaha, NE
Period: 28/10/07 to 31/10/07
