Joint sequence training of phone and grapheme acoustic model based on multi-task learning deep neural networks

Dongpeng Chen*, Brian Mak, Sunil Sivadas

*Corresponding author for this work

Research output: Contribution to journal > Conference article published in journal > peer-review

6 Citations (Scopus)

Abstract

Multi-task learning (MTL) can be an effective way to improve the generalization performance of individually learned tasks if the tasks are related, especially when the amount of training data is small. Our previous work applied MTL to the joint training of triphone and trigrapheme acoustic models using deep neural networks (DNNs) for low-resource speech recognition. Significant recognition improvement over the performance of the corresponding DNNs trained by single-task learning (STL) was obtained. In that work, both STL-DNNs and MTL-DNNs were trained by minimizing the total frame-wise cross entropies. Since phoneme and grapheme recognition are inherently sequence classification tasks, here we study the effect of sequence-discriminative training on their joint estimation using MTL-DNNs. Experimental evaluation on TIMIT phoneme recognition shows that joint sequence training significantly outperforms frame-wise training of phone and grapheme MTL-DNNs.
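The frame-wise criterion mentioned in the abstract (minimizing the total frame-wise cross entropies of the two tasks) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the equal weighting of the two tasks is an assumption, and the network that would produce the per-frame logits for the phone and grapheme output layers is not shown. Sequence-discriminative training, the subject of the paper, is also not covered here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def frame_cross_entropy(logits, target):
    """Cross entropy of one frame against an integer class label."""
    return -log_softmax(logits)[target]


def mtl_frame_loss(phone_logits, phone_targets, graph_logits, graph_targets):
    """Total frame-wise cross entropy, summed over both tasks and all frames.

    Assumption: the two task losses are added with equal weight.
    """
    total = 0.0
    for p_log, p_t, g_log, g_t in zip(
        phone_logits, phone_targets, graph_logits, graph_targets
    ):
        total += frame_cross_entropy(p_log, p_t)  # phone task
        total += frame_cross_entropy(g_log, g_t)  # grapheme task
    return total


if __name__ == "__main__":
    # Two frames, 2 phone classes and 4 grapheme classes, uniform logits.
    loss = mtl_frame_loss(
        [[0.0, 0.0], [0.0, 0.0]], [0, 1],
        [[0.0] * 4, [0.0] * 4], [2, 3],
    )
    print(loss)
```

With uniform logits, each frame contributes log 2 for the phone task and log 4 for the grapheme task, which makes the toy loss easy to check by hand.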

Original language: English
Pages (from-to): 1083-1087
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2014
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: 14 Sept 2014 to 18 Sept 2014

Bibliographical note

Publisher Copyright:
Copyright © 2014 ISCA.

Keywords

  • Deep neural networks
  • Grapheme modeling
  • Multi-task learning
  • Phone modeling
  • Sequence training
