Abstract
In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrained co-clustering model. We have conducted two sets of experiments on a benchmark data set: (1) using human-provided category labels to derive document and word constraints for semi-supervised document clustering, and (2) using automatically extracted named entities to derive document constraints for unsupervised document clustering. Compared to several representative constrained clustering and co-clustering approaches, our approach is shown to be more effective for high-dimensional, sparse text data. Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
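The abstract describes the model only at a high level. Below is a minimal sketch, assuming NumPy, of one way document and word constraints can enter an information-theoretic co-clustering loop. It is not the authors' algorithm. Instead of the two-sided HMRF optimised by alternating EM, it uses hard assignments and a greedy, ICM-style pass: each document (or word) moves to the cluster that minimises the KL divergence to that cluster's prototype plus a penalty for every violated must-link or cannot-link constraint. The names `constrained_cocluster`, `_assign` and `_adjacency`, the unit penalty scaled by a single `weight`, the round-robin initialisation, and the toy counts are all hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def _adjacency(pairs):
    """Symmetric neighbour lists built from (i, j) constraint pairs."""
    adj = {}
    for a, b in pairs or ():
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def _assign(P_cond, Q, labels, must, cannot, weight):
    """Greedy (ICM-style) pass: move each item to the cluster minimising
    KL(p(.|item) || prototype) + weight * (number of violated constraints)."""
    logQ = np.log(Q + EPS)
    for i in range(len(labels)):
        p = P_cond[i] + EPS
        kl = np.array([(p * (np.log(p) - logQ[k])).sum() for k in range(len(Q))])
        pen = np.array([
            sum(labels[j] != k for j in must.get(i, ()))      # must-links split
            + sum(labels[j] == k for j in cannot.get(i, ()))  # cannot-links joined
            for k in range(len(Q))
        ])
        labels[i] = int(np.argmin(kl + weight * pen))
    return labels


def constrained_cocluster(counts, k_rows, k_cols, doc_must=(), doc_cannot=(),
                          word_must=(), word_cannot=(), weight=1.0, n_iter=20):
    """Hypothetical constrained co-clustering sketch (hard assignments)."""
    P = counts / counts.sum()                      # joint p(x, y)
    px, py = P.sum(axis=1), P.sum(axis=0)          # marginals p(x), p(y)
    p_y_given_x = P / (px[:, None] + EPS)          # row i: p(Y | x_i)
    p_x_given_y = (P / (py[None, :] + EPS)).T      # row j: p(X | y_j)
    dm, dc = _adjacency(doc_must), _adjacency(doc_cannot)
    wm, wc = _adjacency(word_must), _adjacency(word_cannot)
    # Round-robin initialisation (an assumption, chosen for reproducibility).
    rows = np.arange(P.shape[0]) % k_rows
    cols = np.arange(P.shape[1]) % k_cols
    for _ in range(n_iter):
        old_rows, old_cols = rows.copy(), cols.copy()
        # Document pass: prototypes q(y | x_hat) = p(y | y_hat) p(y_hat | x_hat).
        Rr, Rc = np.eye(k_rows)[rows], np.eye(k_cols)[cols]
        Pc = Rr.T @ P @ Rc                         # cluster joint p(x_hat, y_hat)
        p_y_given_yhat = py[:, None] * Rc / (Rc.T @ py + EPS)
        Q_rows = (Pc / (Pc.sum(axis=1, keepdims=True) + EPS)) @ p_y_given_yhat.T
        rows = _assign(p_y_given_x, Q_rows, rows, dm, dc, weight)
        # Word pass, using the updated document clusters:
        # prototypes q(x | y_hat) = p(x | x_hat) p(x_hat | y_hat).
        Rr = np.eye(k_rows)[rows]
        Pc = Rr.T @ P @ Rc
        p_x_given_xhat = px[:, None] * Rr / (Rr.T @ px + EPS)
        Q_cols = (Pc.T / (Pc.sum(axis=0)[:, None] + EPS)) @ p_x_given_xhat.T
        cols = _assign(p_x_given_y, Q_cols, cols, wm, wc, weight)
        if np.array_equal(rows, old_rows) and np.array_equal(cols, old_cols):
            break                                  # no assignment changed
    return rows, cols


# Toy usage: 4 documents x 4 words with two obvious blocks.
counts = np.array([[5, 5, 0, 0],
                   [5, 5, 0, 0],
                   [0, 0, 5, 5],
                   [0, 0, 5, 5]])
cannot = [(0, 2), (0, 3), (1, 2), (1, 3)]
rows, cols = constrained_cocluster(counts, 2, 2,
                                   doc_must=[(0, 1), (2, 3)], doc_cannot=cannot,
                                   word_must=[(0, 1), (2, 3)], word_cannot=cannot)
print(rows, cols)  # [1 1 0 0] [1 1 0 0]
```

On this toy matrix the round-robin start mixes the two blocks, so the first KL scores are tied and the constraint penalties pull the documents and words into the right groups; from then on the prototypes alone keep them there. Alternating the two passes lets each side's clustering refine the prototypes used by the other, while the violation count stands in, very loosely, for the HMRF neighbourhood potentials described in the abstract.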
| Field | Value |
|---|---|
| Original language | English |
| Publication status | Published - 2010 |
| Event | Proceedings of the National Conference on Artificial Intelligence |
| Duration | 1 Jan 2010 → 1 Jan 2010 |