Abstract
The recent advancements in language models have significantly catalyzed progress in computational biology. A growing body of research strives to construct unified foundation models for single-cell biology, with language models serving as the cornerstone. In this paper, we systematically review the developments in foundation language models designed specifically for single-cell biology. Our survey offers a thorough analysis of various incarnations of single-cell foundation language models, viewed through the lens of both pre-trained language models (PLMs) and large language models (LLMs). This includes an exploration of data tokenization strategies, pre-training/tuning paradigms, and downstream single-cell data analysis tasks. Additionally, we discuss the current challenges faced by these pioneering works and speculate on future research directions. Overall, this survey provides a comprehensive overview of the existing single-cell foundation language models, paving the way for future research endeavors.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
| Editors | Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 528-549 |
| Number of pages | 22 |
| Volume | 1 |
| ISBN (Electronic) | 9798891762510 |
| Publication status | Published - 2025 |
| Event | 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria. Duration: 27 Jul 2025 → 1 Aug 2025 |
Publication series
| Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
|---|---|
| Volume | 1 |
| ISSN (Print) | 0736-587X |
Conference
| Conference | 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 |
|---|---|
| Country/Territory | Austria |
| City | Vienna |
| Period | 27/07/25 → 1/08/25 |
Bibliographical note
Publisher Copyright: © 2025 Association for Computational Linguistics.