TY - GEN
T1 - Document transformation for multi-label feature selection in text categorization
AU - Chen, Weizhu
AU - Yan, Jun
AU - Zhang, Benyu
AU - Chen, Zheng
AU - Yang, Qiang
PY - 2007
Y1 - 2007
AB - Feature selection on multi-label documents for automatic text categorization is an under-explored research area. This paper presents a systematic document transformation framework, whereby multi-label documents are transformed into single-label documents before standard feature selection algorithms are applied, to solve the multi-label feature selection problem. Under this framework, we undertake a comparative study of four intuitive document transformation approaches and propose a novel approach called Entropy-based Label Assignment (ELA), which assigns label weights to a multi-label document based on label entropy. Three standard feature selection algorithms are used to evaluate the document transformation approaches and to verify their impact on multi-class text categorization problems. Using an SVM classifier and two multi-label benchmark text collections, we show that the choice of document transformation approach can significantly influence the performance of multi-class categorization and that our proposed document transformation approach, ELA, can achieve better performance than all other approaches.
UR - https://www.scopus.com/pages/publications/49749095082
U2 - 10.1109/ICDM.2007.18
DO - 10.1109/ICDM.2007.18
M3 - Conference Paper published in a book
AN - SCOPUS:49749095082
SN - 0769530184
SN - 9780769530185
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 451
EP - 456
BT - Proceedings of the 7th IEEE International Conference on Data Mining, ICDM 2007
T2 - 7th IEEE International Conference on Data Mining, ICDM 2007
Y2 - 28 October 2007 through 31 October 2007
ER -