Abstract
Image-to-markup generation aims to translate an image into markup (a structured language) that represents both the contents and the structural semantics of the image. Recent encoder-decoder approaches typically employ string decoders to model the string representation of the target markup, which cannot effectively capture the rich structural information embedded in it. In this paper, we propose TSDNet, a novel Tree-based Structure-Aware Transformer Decoder NETwork that directly generates the tree representation of the target markup in a structure-aware manner. Specifically, our model learns to sequentially predict the node attributes, edge attributes, and node connectivities through multi-task learning. Meanwhile, we introduce a novel tree-structured attention mechanism into our decoder so that it can operate directly on the partial tree generated at each step and fully exploit the structural information. TSDNet does not rely on any prior assumptions about the target tree structure and can be jointly optimized with encoders in an end-to-end fashion. We evaluate our model on public image-to-markup generation datasets and demonstrate its ability to learn complicated correlations from the structural information in the target markup, with significant improvements over state-of-the-art methods of up to 5.6% in mathematical expression recognition and up to 35.34% in chemical formula recognition.
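To make the idea of a tree-shaped target concrete, the following is a minimal illustrative sketch and not the TSDNet implementation. It assumes a toy encoding in which each decoding step emits a triple of node attribute (a symbol), edge attribute (a relation such as `sup` or `right`), and node connectivity (the index of the parent node). The names `Node`, `build_tree`, `to_latex`, and the edge labels are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node; children are stored as (edge_label, Node) pairs."""
    symbol: str
    children: list = field(default_factory=list)


def build_tree(steps):
    """Build a tree from sequential (symbol, edge_label, parent_index) steps.

    The first step is the root and uses parent_index -1 (an assumption of
    this sketch). Every later step attaches to an already generated node.
    """
    nodes = []
    for symbol, edge, parent in steps:
        node = Node(symbol)
        if parent >= 0:
            nodes[parent].children.append((edge, node))
        nodes.append(node)
    return nodes[0]


def to_latex(node):
    """Serialize the tree to a LaTeX-like string (toy rules only)."""
    out = node.symbol
    # Scripts attach to the symbol itself, so emit them first.
    for edge, child in node.children:
        if edge == "sup":
            out += "^{" + to_latex(child) + "}"
        elif edge == "sub":
            out += "_{" + to_latex(child) + "}"
    # Then continue with the symbol that follows on the baseline.
    for edge, child in node.children:
        if edge == "right":
            out += to_latex(child)
    return out


# Hypothetical step sequence for the expression x^{2}+1.
steps = [
    ("x", "root", -1),
    ("2", "sup", 0),
    ("+", "right", 0),
    ("1", "right", 2),
]
print(to_latex(build_tree(steps)))
```

In this toy encoding the superscript relation is an explicit edge label rather than something implied by brace nesting in a flat string, which is the kind of structural information the abstract says string decoders capture poorly.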
| Original language | English |
|---|---|
| Title of host publication | MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 5751-5760 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781450392037 |
| Publication status | Published - 10 Oct 2022 |
| Event | 30th ACM International Conference on Multimedia, MM 2022 - Lisboa, Portugal; Duration: 10 Oct 2022 → 14 Oct 2022 |
Publication series
| Name | MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia |
|---|---|
Conference
| Conference | 30th ACM International Conference on Multimedia, MM 2022 |
|---|---|
| Country/Territory | Portugal |
| City | Lisboa |
| Period | 10/10/22 → 14/10/22 |
Bibliographical note
Publisher Copyright: © 2022 ACM.
Keywords
- image-to-markup generation
- tree decoder
- tree generation
- tree-structured attention