Constrained Co-clustering for Textual Documents

Shixia Liu, Shimei Pan, Weihong Qian, Yangqiu Song, Furu Wei, Michelle X. Zhou

Research output: Contribution to conference › Conference Paper

Abstract

In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrained co-clustering model. We have conducted two sets of experiments on a benchmark data set: (1) using human-provided category labels to derive document and word constraints for semi-supervised document clustering, and (2) using automatically extracted named entities to derive document constraints for unsupervised document clustering. Compared to several representative constrained clustering and co-clustering approaches, our approach is shown to be more effective for high-dimensional, sparse text data.

Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
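The abstract combines two ingredients: an information-theoretic co-clustering objective and pairwise constraints on documents and words. The sketch below computes an objective of that general shape. It is not the paper's model, because the exact HMRF formulation, constraint weights, and EM updates are not given here. The co-clustering term is the standard loss in mutual information between the document-word distribution and its clustered version. The constraint term is a hypothetical HMRF-style penalty that charges each violated must-link or cannot-link pair. In HMRF-based constrained clustering, such a penalty typically plays the role of the negative log label prior, up to a constant. All function names, the `weight` parameter, and the toy data are illustrative assumptions.

```python
import math
from typing import Iterable, Sequence

Matrix = list[list[float]]
Pairs = Iterable[tuple[int, int]]


def mutual_information(joint: Matrix) -> float:
    """I(X;Y) in nats for a joint distribution p(x, y) whose entries sum to 1."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # 0 * log(0) is taken as 0
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


def aggregate(joint: Matrix, doc_labels: Sequence[int],
              word_labels: Sequence[int], k: int, l: int) -> Matrix:
    """Joint distribution over (document cluster, word cluster) pairs."""
    q = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            q[doc_labels[i]][word_labels[j]] += p
    return q


def itcc_loss(joint, doc_labels, word_labels, k, l) -> float:
    """Loss in mutual information I(X;Y) - I(X_hat;Y_hat); non-negative in exact arithmetic."""
    clustered = aggregate(joint, doc_labels, word_labels, k, l)
    return mutual_information(joint) - mutual_information(clustered)


def constraint_penalty(labels: Sequence[int], must_link: Pairs = (),
                       cannot_link: Pairs = (), w_must: float = 1.0,
                       w_cannot: float = 1.0) -> float:
    """Hypothetical HMRF-style penalty: charge each violated pairwise constraint."""
    penalty = 0.0
    for a, b in must_link:  # must-link pair split across clusters
        if labels[a] != labels[b]:
            penalty += w_must
    for a, b in cannot_link:  # cannot-link pair placed in the same cluster
        if labels[a] == labels[b]:
            penalty += w_cannot
    return penalty


def constrained_objective(joint, doc_labels, word_labels, k, l,
                          doc_ml=(), doc_cl=(), word_ml=(), word_cl=(),
                          weight: float = 1.0) -> float:
    """Hypothetical two-sided objective: ITCC loss plus document and word penalties."""
    return itcc_loss(joint, doc_labels, word_labels, k, l) + weight * (
        constraint_penalty(doc_labels, doc_ml, doc_cl)
        + constraint_penalty(word_labels, word_ml, word_cl))


# Toy example: 4 documents x 4 words with two obvious blocks.
counts = [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
total = sum(map(sum, counts))
P = [[c / total for c in row] for row in counts]

good = itcc_loss(P, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)  # close to 0.0
bad = itcc_loss(P, [0, 1, 0, 1], [0, 0, 1, 1], 2, 2)   # close to log(2), about 0.693
```

The block-aligned co-clustering loses almost no mutual information, while the mismatched document labels lose all of it. An alternating scheme, in the spirit of the alternating EM the abstract mentions, would update document labels with word labels fixed and then the reverse, each time lowering an objective like `constrained_objective`. That loop is not shown here.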
Original language: English
Publication status: Published, 2010
Event: Proceedings of the National Conference on Artificial Intelligence
Duration: 1 Jan 2010 to 1 Jan 2010

ISBN: 9781577354642
