
Joint embeddings of Chinese words, characters, and fine-grained subcharacter components

  • Jinxing YU

Student thesis: Master's thesis

Abstract

Word embeddings have attracted much attention recently because they represent words simply and generalize well to many downstream tasks. Unlike English and other languages with alphabetic writing systems, Chinese uses characters that are often composed of subcharacter components, which are themselves semantically informative. In this thesis, we propose an approach to jointly embed Chinese words together with their characters and fine-grained subcharacter components. We use three likelihoods to evaluate whether the context words, the context characters, and the context components, respectively, can predict the current target word. We also collected 13,253 subcharacter components to demonstrate that existing approaches to decomposing Chinese characters are insufficient. Evaluations on intrinsic word similarity and word analogy tasks, as well as on extrinsic downstream classification tasks, demonstrate the superior performance of our model.
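The idea of scoring one target word with three separate likelihoods can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: it assumes a CBOW-style setup in which each context (words, their characters, and those characters' components) is averaged into a hidden vector h, and each h predicts the target word w through a full softmax, giving the joint objective log p(w | h_word) + log p(w | h_char) + log p(w | h_comp). The toy vocabulary, the one-level character decompositions, the function names (`joint_log_likelihood`, `log_softmax_prob`), and the random vectors are all hypothetical illustrations. How the thesis builds each context and trains the model (for example, whether it uses negative sampling) may differ.

```python
import math
import random

# Hypothetical toy data (NOT the thesis's 13,253-component inventory):
# each word maps to its characters, each character to first-level components.
WORD_CHARS = {
    "江河": ["江", "河"],
    "湖泊": ["湖", "泊"],
    "妈妈": ["妈", "妈"],
}
CHAR_COMPONENTS = {
    "江": ["氵", "工"],
    "河": ["氵", "可"],
    "湖": ["氵", "胡"],
    "泊": ["氵", "白"],
    "妈": ["女", "马"],
}
DIM = 8
rng = random.Random(0)  # fixed seed so the toy vectors are reproducible


def random_vectors(keys):
    """Assign a small random vector to every key (stand-in for learned embeddings)."""
    return {k: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for k in keys}


components = sorted({c for comps in CHAR_COMPONENTS.values() for c in comps})
word_in = random_vectors(WORD_CHARS)       # input embeddings for words
char_in = random_vectors(CHAR_COMPONENTS)  # input embeddings for characters
comp_in = random_vectors(components)       # input embeddings for components
word_out = random_vectors(WORD_CHARS)      # output embeddings used by the softmax


def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]


def log_softmax_prob(h, target):
    """log p(target | h) using a full softmax over the word vocabulary."""
    scores = {w: sum(a * b for a, b in zip(h, u)) for w, u in word_out.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[target] - log_z


def joint_log_likelihood(context, target):
    """Sum of the three log-likelihoods: word, character, and component contexts."""
    chars = [c for w in context for c in WORD_CHARS[w]]
    comps = [p for c in chars for p in CHAR_COMPONENTS[c]]
    h_word = average([word_in[w] for w in context])
    h_char = average([char_in[c] for c in chars])
    h_comp = average([comp_in[p] for p in comps])
    return sum(log_softmax_prob(h, target) for h in (h_word, h_char, h_comp))


print(joint_log_likelihood(["江河", "妈妈"], "湖泊"))  # a negative float
```

Training would adjust all four embedding tables to maximize this sum over every (context, target) pair in a corpus. Keeping three separate likelihoods, rather than averaging words, characters, and components into a single vector, lets each granularity receive its own prediction signal.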
Date of Award: 2017
Original language: English
Awarding Institution:
  • The Hong Kong University of Science and Technology
