Abstract
Polyphone disambiguation is the core of the grapheme-to-phoneme (G2P) module in Chinese speech synthesis systems. However, datasets for polyphone disambiguation are scarce, and only one is publicly available. Moreover, because polyphones follow a double long-tail distribution, the pronunciation data for most polyphones is extremely unbalanced after sampling. To address these problems, we propose a new dataset of 57,000 sentences from various domains, built with a new sampling strategy. In addition, we propose G2PL, which integrates word features into the bottom of BERT to help predict the correct pronunciation of polyphones. In experiments, the trained G2PL model outperforms other methods on both our dataset and the public dataset. Our dataset, code, and user-friendly package are freely available.
| Original language | English |
|---|---|
| Title of host publication | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781728163277 |
| Publication status | Published - 2023 |
| Externally published | Yes |
| Event | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece. Duration: 4 Jun 2023 → 10 Jun 2023 |
Publication series
| Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
|---|---|
| Volume | 2023-June |
| ISSN (Print) | 1520-6149 |
Conference
| Conference | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 |
|---|---|
| Country/Territory | Greece |
| City | Rhodes Island |
| Period | 4/06/23 → 10/06/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- BERT
- Grapheme to phonemes
- Polyphone disambiguation
- Speech synthesis
- Word features