
Incorporating domain knowledge into big data : with application in smart manufacturing and transportation

  • Ziyue LI

Student thesis: Doctoral thesis

Abstract

The mission of data mining is to discover the knowledge behind the data. Three typical kinds of knowledge are trend, cluster, and change, which give rise to three typical data mining tasks: regression, clustering, and detection. Numerous studies, ranging from mathematical models to deep learning frameworks, have been proposed. However, a pure data mining model without domain or human knowledge might produce results that deviate from reality. This thesis proposes that the combination of “Data + Domain/Human Knowledge” could potentially offer a better solution. Two major frameworks are proposed: (1) a Data-to-Data knowledge collaborating framework, and (2) a Human-to-Data knowledge incorporating framework, with three projects conducted. The first project learns the “change” in smart manufacturing, specifically detecting anomalies in a cold-start process; a decomposition-based hybrid transfer learning framework is proposed to transfer knowledge from experienced domains to the cold-start domain. The knowledge transfer increases the anomaly detection accuracy on cold-start data by 20%. The second project learns the “trend” in smart transportation, specifically predicting the passenger flow in a metro system. Human knowledge about the distances and the functional similarities between stations is formulated as graphs and incorporated into the proposed low-rank tensor completion model. The incorporated graphs improve the prediction results by more than 30%. The third project learns the “cluster” in smart transportation, specifically learning the multiple clusters of origin, destination, time, and passengers from individual trajectory data. A tensor Latent Dirichlet Allocation (LDA) model is proposed that incorporates external knowledge graphs about the locations and functions of stations. The graph structure enhances the interpretability of the learned clusters by more than 20%.
These essays provide a comprehensive solution for analytical data models coupled with domain and human knowledge, with detailed implementations in real case studies that demonstrate the increased model accuracy, efficiency, and interpretability.
Date of Award: 2021
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
Supervisor: Fugee TSUNG
