In triphone-based acoustic modeling, it is difficult to model infrequent triphones robustly because they have too few training samples. Naive maximum-likelihood (ML) estimation of infrequent triphones yields poor models and ultimately degrades the overall performance of an automatic speech recognition (ASR) system. Among the techniques proposed to solve the infrequent triphone problem, state tying is the most widely used in current ASR systems because it reduces model size effectively and achieves good recognition results. However, state tying inevitably introduces quantization errors, since triphones tied to the same state cannot be distinguished in that state. This thesis addresses the problem through distinct acoustic modeling, in which every modeling unit has a unique model and a distinct acoustic score. The main contribution of this thesis is the formulation of triphone model estimation as an adaptation problem through our proposed distinct acoustic modeling framework, named eigentriphone modeling. The rationale behind eigentriphone modeling is that a basis is derived from the frequent triphones, and each triphone is then modeled as a point in the space spanned by that basis. The eigenvectors in the basis represent the most important context-dependent characteristics among the triphones, so the infrequent triphones can be robustly modeled with few training samples. Furthermore, the proposed framework is flexible and can be applied to other modeling units. Since grapheme-based modeling is useful in automatic speech recognition of under-resourced languages, we further apply our distinct acoustic modeling framework to estimate context-dependent grapheme models, and we call the new method eigentrigrapheme modeling. Experimental evaluation of eigentriphone modeling was carried out on the Wall Street Journal word recognition task and the TIMIT phoneme recognition task. Experimental evaluation of eigentrigrapheme modeling was carried out on four official South African under-resourced languages. It is shown that distinct acoustic modeling using the proposed eigentriphone framework consistently performs better than conventional tied-state HMMs.
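The rationale stated in the abstract (derive a basis from the frequent triphones, then model each triphone as a point in the space it spans) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that do not come from the abstract: each triphone is summarized by a fixed-length parameter vector, the basis is obtained with plain PCA via SVD, and a triphone's coordinates are found by orthogonal projection. The thesis's actual estimation procedure is not described here and may well differ; the function names `derive_basis` and `model_as_point` are hypothetical.

```python
import numpy as np


def derive_basis(frequent, k):
    """Return (mean, basis) from the parameter vectors of frequent triphones.

    frequent: array of shape (n_triphones, dim), one row per frequent triphone.
    Assumption: plain PCA via SVD stands in for the basis derivation.
    """
    mean = frequent.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(frequent - mean, full_matrices=False)
    return mean, vt[:k]


def model_as_point(estimate, mean, basis):
    """Represent one triphone by k coordinates in the space spanned by basis.

    Assumption: orthogonal projection of a (possibly noisy) estimate stands in
    for whatever coefficient estimation the thesis uses.
    """
    coords = basis @ (estimate - mean)
    return mean + basis.T @ coords, coords


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 50 "frequent triphones" with 12-dim parameters.
    frequent = rng.normal(size=(50, 12))
    mean, basis = derive_basis(frequent, k=3)
    # A noisy estimate for an "infrequent triphone" is reduced to 3 numbers.
    noisy = mean + rng.normal(scale=0.5, size=12)
    model, coords = model_as_point(noisy, mean, basis)
    print("coordinates:", coords)
```

The point of the sketch is that only `k` coordinates have to be estimated for an infrequent triphone rather than a full parameter vector, which is why few training samples can suffice, while every triphone still receives its own distinct model.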
| Date of Award | 2014 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
Distinct acoustic modeling for automatic speech recognition
Ko, Y. T. (Author). 2014
Student thesis: Doctoral thesis