Abstract
Multi-task learning (MTL) can be an effective way to improve the generalization performance of individual learning tasks when the tasks are related, especially when the amount of training data is small. Our previous work applied MTL to the joint training of triphone and trigrapheme acoustic models using deep neural networks (DNNs) for low-resource speech recognition, and obtained significant recognition improvements over the corresponding DNNs trained by single-task learning (STL). In that work, both STL-DNNs and MTL-DNNs were trained by minimizing the total frame-wise cross entropies. Since phoneme and grapheme recognition are inherently sequence classification tasks, here we study the effect of sequence-discriminative training on their joint estimation using MTL-DNNs. Experimental evaluation on TIMIT phoneme recognition shows that joint sequence training significantly outperforms frame-wise training of phone and grapheme MTL-DNNs.
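The frame-wise MTL objective mentioned above can be illustrated with a minimal sketch. It assumes the two tasks are weighted equally and that each task has its own softmax output layer on top of shared hidden layers; the function names and the toy inputs are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one frame's output activations."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def frame_cross_entropy(logits, target):
    """Negative log-probability of the target class for a single frame."""
    return -math.log(softmax(logits)[target])

def mtl_frame_loss(phone_logits, graph_logits, phone_targets, graph_targets):
    """Total frame-wise cross entropy summed over both tasks.

    Assumption: the phone and grapheme losses are added with equal weight.
    Each *_logits argument is a list of per-frame activation lists from that
    task's output layer; each *_targets argument is a list of class indices.
    """
    total = 0.0
    for p_log, g_log, p_t, g_t in zip(
        phone_logits, graph_logits, phone_targets, graph_targets
    ):
        total += frame_cross_entropy(p_log, p_t)
        total += frame_cross_entropy(g_log, g_t)
    return total

# Toy usage: two frames, two classes per task, uninformative (all-zero) logits.
loss = mtl_frame_loss(
    phone_logits=[[0.0, 0.0], [0.0, 0.0]],
    graph_logits=[[0.0, 0.0], [0.0, 0.0]],
    phone_targets=[0, 1],
    graph_targets=[1, 0],
)
```

Sequence-discriminative training replaces this per-frame criterion with one defined over whole utterances; that criterion is not sketched here because the abstract does not specify it.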
| Original language | English |
|---|---|
| Pages (from-to) | 1083-1087 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Publication status | Published - 2014 |
| Event | 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore. Duration: 14 Sept 2014 → 18 Sept 2014 |
Bibliographical note
Publisher Copyright: Copyright © 2014 ISCA.
Keywords
- Deep neural networks
- Grapheme modeling
- Multi-task learning
- Phone modeling
- Sequence training