TY - GEN
T1 - Development of a Chinese telephony conversational corpus for speech processing
AU - Liu, Yi
AU - Fung, Pascale
AU - Huang, Shudong
AU - Cieri, Christopher
AU - Zhai, Lufeng
AU - Chen, Benfeng
PY - 2004
Y1 - 2004
N2 - This paper describes the development of the EARS (Effective, Affordable, Reusable Speech-to-text) Chinese corpus, a telephony conversational speech database for speech processing. The EARS database is the first of its kind collected for Mandarin Chinese telephony spontaneous speech. The purpose of developing this EARS Chinese corpus is to collect Mandarin conversations between either strangers or friends, which cover a wide range of topics, over landline and cellular channels. All the speech data are annotated with standard Chinese character transcription as well as specific mark-ups for spontaneous speech. This corpus will be used for conversational and spontaneous Mandarin speech recognition tasks, under the DARPA EARS framework. This paper introduces the design, development, structure, and initial phonetic analysis of the first 50-hour collection of this corpus. An additional 300 to 500 hours of data will be collected and transcribed between 2004 and 2005.
AB - This paper describes the development of the EARS (Effective, Affordable, Reusable Speech-to-text) Chinese corpus, a telephony conversational speech database for speech processing. The EARS database is the first of its kind collected for Mandarin Chinese telephony spontaneous speech. The purpose of developing this EARS Chinese corpus is to collect Mandarin conversations between either strangers or friends, which cover a wide range of topics, over landline and cellular channels. All the speech data are annotated with standard Chinese character transcription as well as specific mark-ups for spontaneous speech. This corpus will be used for conversational and spontaneous Mandarin speech recognition tasks, under the DARPA EARS framework. This paper introduces the design, development, structure, and initial phonetic analysis of the first 50-hour collection of this corpus. An additional 300 to 500 hours of data will be collected and transcribed between 2004 and 2005.
UR - https://www.scopus.com/pages/publications/21444441531
M3 - Conference Paper published in a book
AN - SCOPUS:21444441531
SN - 0780386787
T3 - 2004 International Symposium on Chinese Spoken Language Processing - Proceedings
SP - 197
EP - 200
BT - 2004 International Symposium on Chinese Spoken Language Processing - Proceedings
T2 - 2004 International Symposium on Chinese Spoken Language Processing
Y2 - 15 December 2004 through 18 December 2004
ER -