A visual attention-based keyword extraction for document classification

Xing Wu*, Zhikang Du, Yike Guo

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

Abstract

Document classification plays an important role in natural language processing. Within this task, keyword extraction algorithms show great potential for summarizing an entire document. Attention is the process of selectively concentrating on a discrete aspect of information while ignoring other perceivable information. A new probabilistic keyword extraction algorithm, inspired by the visual attention mechanism, is proposed. An unsupervised, neural-network-based pre-training method is proposed for training the semantic attention-based keyword extraction algorithm, which helps extract keywords with rich contextual information from the document. A bidirectional long short-term memory (LSTM) network combined with the proposed semantic keyword extraction algorithm is designed for both topic and sentiment classification tasks. Experiments on four large-scale datasets show that the proposed visual attention-based keyword extraction algorithm performs better than the baseline methods. The semantic attention-based keyword extraction method is effective at summarizing the content of a document, which is very useful for large-scale document classification.

Original language: English
Pages (from-to): 25355-25367
Number of pages: 13
Journal: Multimedia Tools and Applications
Volume: 77
Issue number: 19
Publication status: Published - 1 Oct 2018
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2018, Springer Science+Business Media, LLC, part of Springer Nature.

Keywords

  • Document classification
  • Keyword extraction
  • Long short-term memory
  • Semantic context
  • Visual attention
