Abstract
Machine reading comprehension (MRC) is an important research area for conversational agents and has drawn considerable attention. However, current MRC benchmarks have a notable limitation: the labeled answers are mostly either spans extracted from the target corpus or choices among given candidates, ignoring the natural aspect of high-quality responses. As a result, MRC models trained on these datasets cannot generate human-like responses in real QA scenarios. To this end, we construct a new dataset called Penguin to promote MRC research, providing a training and test bed for natural response generation in real scenarios. Concretely, Penguin consists of 200k training examples with high-quality, fluent, and well-informed responses. Penguin is the first benchmark for natural response generation in Chinese MRC on a relatively large scale. To address the challenges in Penguin, we develop two strong baselines: end-to-end and two-stage frameworks. Following that, we further design Prompt-BART, which fine-tunes pre-trained generative language models with a mixture of prefix prompts on Penguin. Extensive experiments validate the effectiveness of this design. Our benchmark and code are available at https://github.com/nuochenpku/Penguin.
| Original language | English |
|---|---|
| Title of host publication | Findings of the Association for Computational Linguistics |
| Subtitle of host publication | EMNLP 2023 |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 11068-11081 |
| Number of pages | 14 |
| ISBN (Electronic) | 9798891760615 |
| DOIs | |
| Publication status | Published - 2023 |
| Externally published | Yes |
| Event | 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Hybrid, Singapore. Duration: 6 Dec 2023 → 10 Dec 2023 |
Publication series
| Name | Findings of the Association for Computational Linguistics: EMNLP 2023 |
|---|---|
Conference
| Conference | 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 |
|---|---|
| Country/Territory | Singapore |
| City | Hybrid |
| Period | 6/12/23 → 10/12/23 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.