PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes

He Cao, Yanjun Shao, Zhiyuan Liu, Zijing Liu, Xiangru Tang, Yuan Yao, Yu Li*

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/Report, Conference Paper published in a book, peer-review

3 Citations (Scopus)

Abstract

Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines. These advancements encourage the investigation of molecule-text modeling within synthetic chemistry, a field dedicated to designing and conducting chemical reactions to synthesize new compounds with desired properties and applications. Current approaches, however, often neglect the critical role of multiple molecule graph interaction in understanding chemical reactions, leading to suboptimal performance in synthetic chemistry tasks. This study introduces PRESTO (Progressive Pretraining Enhances Synthetic Chemistry Outcomes), a new framework that bridges the molecule-text modality gap by integrating a comprehensive benchmark of pretraining strategies and dataset configurations. It progressively improves multimodal LLMs through cross-modal alignment and multi-graph understanding. Our extensive experiments demonstrate that PRESTO offers competitive results in downstream synthetic chemistry tasks. The code can be found at https://github.com/IDEA-XL/PRESTO.

Original language: English
Title of host publication: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics (ACL)
Pages: 10197-10224
Number of pages: 28
ISBN (Electronic): 9798891761681
Publication status: Published - 2024
Externally published: Yes
Event: 2024 Findings of the Association for Computational Linguistics, EMNLP 2024 - Hybrid, Miami, United States
Duration: 12 Nov 2024 to 16 Nov 2024

Publication series

Name: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024

Conference

Conference: 2024 Findings of the Association for Computational Linguistics, EMNLP 2024
Country/Territory: United States
City: Hybrid, Miami
Period: 12/11/24 to 16/11/24

Bibliographical note

Publisher Copyright:
© 2024 Association for Computational Linguistics.
