Abstract
Question-Driven Sign Language Translation (QSLT) addresses the challenge of translating sign language with the aid of pertinent questions in question-answering contexts. However, the pronounced modality complexity gap between question text and sign video creates a predicament: the model tends to depend excessively on the question to generate translations, neglecting the visual cues in the video. To tackle this issue, this paper presents a Gloss-Bridged Translator (GBT), which introduces sign gloss as an intermediary to establish semantic connections between questions and videos. By leveraging gloss, visual features are transformed into textual counterparts, mitigating the modality imbalance between the two representations. Moreover, a cross-modal contrastive learning strategy is employed to strengthen both the global contextual relevance and the local semantic alignment between questions and sign language. The proposed method is validated through extensive experiments on the proposed QSL dataset and other public sign language datasets. The results demonstrate the efficacy of integrating questions into sign language translation: GBT yields marked improvements over prevailing SLT methods, confirming its effectiveness and rationale. Our code and dataset are available at https://github.com/glq-1992/QSL.
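The abstract mentions a cross-modal contrastive learning strategy that aligns question and sign-language representations. The paper's exact formulation is not given here; the following is a minimal sketch assuming a standard symmetric InfoNCE-style objective, where matched question/sign pairs in a batch are positives and all other pairings are negatives. The function name, embedding shapes, and temperature value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cross_modal_contrastive_loss(q_emb, s_emb, temperature=0.07):
    """Symmetric InfoNCE loss between question embeddings q_emb and
    sign (gloss-bridged) embeddings s_emb, both of shape (batch, dim).
    Row i of q_emb and row i of s_emb are assumed to be a matched pair."""
    # L2-normalize so the dot product is cosine similarity
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    s = s_emb / np.linalg.norm(s_emb, axis=1, keepdims=True)
    logits = q @ s.T / temperature  # (batch, batch) similarity matrix
    n = logits.shape[0]

    def cross_entropy_diag(lg):
        # log-softmax over each row; the diagonal entries are the positives
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_prob[np.arange(n), np.arange(n)].mean()

    # average of question->sign and sign->question directions
    return (cross_entropy_diag(logits) + cross_entropy_diag(logits.T)) / 2
```

In this sketch the loss is near zero when matched pairs have identical embeddings and grows as positives drift apart, which is the mechanism the abstract invokes to keep the question and visual streams semantically aligned rather than letting the model lean on the question alone.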
| Original language | English |
|---|---|
| Pages (from-to) | 11724-11738 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Volume | 34 |
| Issue number | 11 |
| DOIs | |
| Publication status | Published - 2024 |
Bibliographical note
Publisher Copyright: © 1991-2012 IEEE.
Keywords
- Modality bias
- gloss-bridged translator
- modality complexity alignment
- question-driven sign language dataset
- sign language translation
Fingerprint
Dive into the research topics of 'Overcoming Modality Bias in Question-Driven Sign Language Video Translation'.