Abstract
Punctuation prediction is an important step in the post-processing of ASR systems: text lacking punctuation is usually difficult to read and understand. In this paper, we propose a method for Chinese punctuation prediction that combines a Bidirectional Long Short-Term Memory (BLSTM) network with Bidirectional Encoder Representations from Transformers (BERT), using BERT as a text-encoding layer that learns contextualized word representations to improve the performance of the BLSTM network. Compared with previous punctuation prediction methods based on Recurrent Neural Networks (RNNs), our method improves punctuation prediction through its stronger ability to capture semantics and long-distance dependencies in unsegmented Chinese text. Our experimental results on Chinese news datasets show that our BERT-BLSTM-based method outperforms the baseline by up to 31.07% absolute in overall micro-F1.
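The architecture described above can be sketched as a token-classification model: a contextual encoder produces per-token representations, a BLSTM layer re-encodes them, and a linear head predicts a punctuation label for each token. This is a minimal, hedged sketch, not the authors' implementation: the hidden sizes, label set, and vocabulary size are assumptions, and a plain embedding layer stands in for the pretrained BERT encoder so the example runs without model downloads.

```python
import torch
import torch.nn as nn


class BertBLSTMPunctuator(nn.Module):
    """Sketch of a BERT-BLSTM punctuation predictor.

    An encoder yields per-token representations, a bidirectional LSTM
    re-encodes them with left and right context, and a linear classifier
    assigns each token a punctuation label (e.g. none / comma / period /
    question mark). In the paper the encoder is pretrained BERT; here a
    plain nn.Embedding is a stand-in (assumption for self-containment).
    """

    def __init__(self, vocab_size=21128, encoder_dim=768,
                 hidden=256, num_labels=4):
        super().__init__()
        # Placeholder for BERT: 768 matches BERT-base's hidden size.
        self.encoder = nn.Embedding(vocab_size, encoder_dim)
        self.blstm = nn.LSTM(encoder_dim, hidden,
                             batch_first=True, bidirectional=True)
        # Bidirectional LSTM concatenates both directions -> 2 * hidden.
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor
        reps = self.encoder(token_ids)   # (batch, seq_len, encoder_dim)
        out, _ = self.blstm(reps)        # (batch, seq_len, 2 * hidden)
        return self.classifier(out)      # (batch, seq_len, num_labels)


model = BertBLSTMPunctuator()
logits = model(torch.randint(0, 21128, (2, 16)))
print(logits.shape)
```

In practice the placeholder embedding would be replaced by a pretrained Chinese BERT (e.g. via the Hugging Face `transformers` library), and the logits would be trained with a per-token cross-entropy loss against punctuation labels.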
| Original language | English |
|---|---|
| Title of host publication | ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781728123455 |
| DOIs | |
| Publication status | Published - Dec 2019 |
| Externally published | Yes |
| Event | 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019 - Chongqing, China Duration: 11 Dec 2019 → 13 Dec 2019 |
Publication series
| Name | ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019 |
|---|
Conference
| Conference | 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019 |
|---|---|
| Country/Territory | China |
| City | Chongqing |
| Period | 11/12/19 → 13/12/19 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
Keywords
- neural network
- pre-trained language model
- punctuation prediction