
Automated document indexing using topic hierarchies from HLTA

  • Chun Fai LEUNG

Student thesis: Master's thesis

Abstract

Hierarchical Latent Tree Analysis (HLTA) is a recently proposed algorithm for hierarchical topic detection. It takes a collection of unlabeled, unstructured text documents as input and outputs a hierarchy of topics, where each topic is a subset of the documents. In this thesis, we present a document indexing system that automatically builds an index structure for a corpus of documents using the topic hierarchy obtained by HLTA. The system also provides tools for visualizing various facts and relationships that can be extracted from the topic hierarchy. We demonstrate the usefulness of the system on three datasets: (1) a collection of research papers published at major AI conferences and journals from 2000 to 2018, (2) two collections of research outputs from researchers at the Hong Kong University of Science and Technology, and (3) a collection of Chinese web posts related to migration, posted on social media and e-commerce platforms and collected by the International Organization for Migration.
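
As a rough illustration of the idea described in the abstract, the sketch below builds a document index from a topic hierarchy in which each topic is a subset of documents. It is not the thesis's implementation and does not use HLTA's actual output format: the `Topic` class, the `build_index` function, and the toy labels and document IDs are all hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical topic node: a label, its document IDs, and its subtopics."""
    label: str
    doc_ids: set[str]
    children: list["Topic"] = field(default_factory=list)


def build_index(roots: list[Topic]) -> dict[str, list[tuple[str, ...]]]:
    """Map each document ID to every topic path (root to node) that contains it."""
    index: dict[str, list[tuple[str, ...]]] = {}

    def visit(topic: Topic, path: tuple[str, ...]) -> None:
        here = path + (topic.label,)
        # Record this topic path for every document assigned to the topic.
        for doc_id in sorted(topic.doc_ids):
            index.setdefault(doc_id, []).append(here)
        # Descend into subtopics, extending the path.
        for child in topic.children:
            visit(child, here)

    for root in roots:
        visit(root, ())
    return index


# Toy hierarchy with made-up labels and document IDs (assumed for illustration).
hierarchy = [
    Topic("learning", {"d1", "d2", "d3"}, [
        Topic("neural networks", {"d1", "d2"}),
        Topic("reinforcement learning", {"d3"}),
    ]),
]
index = build_index(hierarchy)
```

Under these assumptions, looking up a document ID in `index` returns the topic paths under which that document would be filed, from the most general topic down to the most specific one.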
Date of Award: 2018
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
