TY - GEN
T1 - Training of subspace distribution clustering hidden Markov model
AU - Mak, Brian
AU - Bocchieri, Enrico
PY - 1998
Y1 - 1998
N2 - Mak, Bocchieri, and Barnard (see Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 1997) presented novel subspace distribution clustering hidden Markov models (SDCHMMs), which can be converted from continuous density hidden Markov models (CDHMMs) by clustering subspace Gaussians in each stream over all models. Though such model conversion is simple and fast, it has two drawbacks: (1) it does not take advantage of the fewer model parameters in SDCHMMs; theoretically, SDCHMMs may be trained with a smaller amount of data; and (2) it involves two separate optimization steps (first training CDHMMs, then clustering subspace Gaussians), and the resulting SDCHMMs are not guaranteed to be optimal. We show how SDCHMMs may be trained directly from less speech data if we have a priori knowledge of their architecture. On the ATIS task, a speaker-independent, context-independent (CI) 20-stream SDCHMM system trained using our novel SDCHMM reestimation algorithm with only 8 minutes of speech performs as well as a CDHMM system trained using the conventional CDHMM reestimation algorithm with 105 minutes of speech.
UR - https://openalex.org/W2479499361
UR - https://www.scopus.com/pages/publications/84892183237
U2 - 10.1109/ICASSP.1998.675354
DO - 10.1109/ICASSP.1998.675354
M3 - Conference Paper published in a book
SN - 0780344286
SN - 9780780344280
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 673
EP - 676
BT - Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
T2 - 1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Y2 - 12 May 1998 through 15 May 1998
ER -