Development of a Chinese telephony conversational corpus for speech processing

Yi Liu*, Pascale Fung, Shudong Huang, Christopher Cieri, Lufeng Zhai, Benfeng Chen

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/Report › Conference Paper published in a book › peer-review

Abstract

This paper describes the development of the EARS (Effective, Affordable, Reusable Speech-to-text) Chinese corpus, a telephony conversational speech database for speech processing. The EARS database is the first of its kind collected for spontaneous Mandarin Chinese telephone speech. The purpose of developing the EARS Chinese corpus is to collect Mandarin conversations between either strangers or friends, covering a wide range of topics, over landline and cellular channels. All the speech data are annotated with standard Chinese character transcriptions as well as specific mark-ups for spontaneous speech. The corpus will be used for conversational and spontaneous Mandarin speech recognition tasks under the DARPA EARS framework. This paper introduces the design, development, structure, and initial phonetic analysis of the first 50-hour collection of the corpus. An additional 300 to 500 hours of data will be collected and transcribed between 2004 and 2005.

Original language: English
Title of host publication: 2004 International Symposium on Chinese Spoken Language Processing - Proceedings
Pages: 197-200
Number of pages: 4
Publication status: Published - 2004
Event: 2004 International Symposium on Chinese Spoken Language Processing - Hong Kong, China, Hong Kong
Duration: 15 Dec 2004 to 18 Dec 2004

Publication series

Name: 2004 International Symposium on Chinese Spoken Language Processing - Proceedings

Conference

Conference: 2004 International Symposium on Chinese Spoken Language Processing
Country/Territory: Hong Kong
City: Hong Kong, China
Period: 15/12/04 to 18/12/04
