Abstract
Multimodal machine translation (MMT) aims to improve translation quality by equipping the source sentence with its corresponding image. Despite promising performance, MMT models still suffer from the problem of input degradation: models focus mostly on textual information, while visual information is largely overlooked. In this paper, we endeavor to improve MMT performance by increasing visual awareness from an information-theoretic perspective. Specifically, we decompose the informative visual signals into two parts: source-specific information and target-specific information. We quantify both with mutual information and propose two optimization methods to better leverage the visual signals. Experiments on two datasets demonstrate that our approach effectively enhances the visual awareness of the MMT model and achieves superior results over strong baselines.
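The abstract does not spell out which mutual-information estimators or objectives the paper uses, so the following is only a rough illustration of the general idea: one standard way to maximize a mutual-information term between visual and textual representations is an InfoNCE-style lower bound (van den Oord et al., 2018). Everything here, including the names `infonce_lower_bound`, `z_vis`, `z_src`, `z_tgt`, `lam_src`, and `lam_tgt`, is a hypothetical sketch and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def infonce_lower_bound(z_vis: torch.Tensor, z_txt: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style lower bound on I(vision; text) for a batch of
    paired representations. z_vis and z_txt are (batch, dim); row i
    of each tensor comes from the same image / sentence pair.
    """
    z_vis = F.normalize(z_vis, dim=-1)
    z_txt = F.normalize(z_txt, dim=-1)
    # Similarity of every image to every sentence in the batch.
    logits = z_vis @ z_txt.t() / temperature  # (batch, batch)
    targets = torch.arange(z_vis.size(0), device=z_vis.device)
    # Cross-entropy treats the matching pair on the diagonal as the
    # positive; its negation is, up to an additive log(batch)
    # constant, the InfoNCE lower bound on mutual information.
    return -F.cross_entropy(logits, targets)

# Hypothetical combined objective: translation cross-entropy minus
# weighted MI bounds for the source- and target-specific signals.
# loss = ce_loss - lam_src * infonce_lower_bound(z_vis, z_src) \
#                - lam_tgt * infonce_lower_bound(z_vis, z_tgt)
```

Maximizing such a bound (i.e., subtracting it from the training loss) encourages the model to retain text-relevant content in its visual representations; how the paper actually instantiates the two MI terms may differ.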
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 |
| Editors | Yoav Goldberg, Zornitsa Kozareva, Yue Zhang |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 6755-6764 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781959429401 |
| DOIs | |
| Publication status | Published - 2022 |
| Externally published | Yes |
| Event | 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Hybrid, Abu Dhabi, United Arab Emirates; Duration: 7 Dec 2022 → 11 Dec 2022 |
Publication series
| Name | Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 |
|---|
Conference
| Conference | 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 |
|---|---|
| Country/Territory | United Arab Emirates |
| City | Hybrid, Abu Dhabi |
| Period | 7/12/22 → 11/12/22 |
Bibliographical note
Publisher Copyright: © 2022 Association for Computational Linguistics.