Abstract
2D biomedical foundation models (FMs) have demonstrated remarkable capabilities in 2D medical image segmentation across various modalities, with text-prompted approaches offering scalable analysis that facilitates integration with LLMs and clinical application. Adapting these models for 3D medical image segmentation can leverage their rich visual features while enabling text-prompted volumetric image segmentation. However, efficient adaptation poses significant challenges due to the substantial disparity between 2D and 3D medical images and the necessity to establish text-volume alignment. To address these challenges, we propose Bio2Vol, a novel adaptation framework that enables text-prompted 2D biomedical FMs to effectively handle volumetric data. Specifically, (1) to bridge the dimensional disparity, we propose a Dual-Rate Sampling (DRS) strategy that processes slices within a volume at both sparse and dense intervals, capturing global context and local details; (2) to enhance volumetric feature representation, a Cross-slice Dual-head Attention (CSDHA) module is built upon the intra-slice features by repurposing existing pre-trained attention modules for parameter-efficient inter-slice information fusion; and (3) to establish text-volume understanding, a Semantic Text-Visual Alignment loss (SAT) is used to extend the existing 2D text-visual alignment to the volumetric domain. Using BiomedParse as a demonstration case, extensive evaluation on 11 medical datasets spanning diverse anatomical regions and modalities shows that Bio2Vol significantly improves 3D medical image segmentation performance, enhancing DSC by 4.72% on the Amos22 dataset with substantial improvements across MSD tasks. Code will be available at https://github.com/JiaxinZhuang/Bio2Vol.
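The abstract does not state the sampling rates, window sizes, or boundary handling used by DRS, so the following is only a minimal sketch of one plausible reading of "sparse and dense intervals". The function name `dual_rate_indices` and the parameters `num_samples`, `sparse_stride`, and `dense_stride` are hypothetical and do not come from the paper or its code release.

```python
def dual_rate_indices(num_slices, center, num_samples=4,
                      sparse_stride=4, dense_stride=1):
    """Return (sparse, dense) slice indices around `center`.

    Illustrative assumption only: slices are taken symmetrically around a
    centre slice at two strides. The real DRS design may differ.
    """
    if not 0 <= center < num_slices:
        raise ValueError("center must be a valid slice index")

    def sample(stride):
        # Symmetric offsets around the centre slice: ..., -stride, 0, +stride, ...
        offsets = range(-num_samples * stride, num_samples * stride + 1, stride)
        # Clamp to the volume so that out-of-range positions repeat the edge slice.
        return [min(max(center + o, 0), num_slices - 1) for o in offsets]

    # Sparse indices span a wide range (global context);
    # dense indices cover the immediate neighbourhood (local detail).
    return sample(sparse_stride), sample(dense_stride)


sparse, dense = dual_rate_indices(64, 32, num_samples=2,
                                  sparse_stride=8, dense_stride=1)
print(sparse)  # [16, 24, 32, 40, 48]
print(dense)   # [30, 31, 32, 33, 34]
```

Under these assumptions, the sparse list would feed a branch that sees the whole organ extent, while the dense list keeps adjacent slices for fine boundaries; how the two streams are actually fused is described in the paper, not here.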
| Original language | English |
|---|---|
| Title of host publication | Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - 28th International Conference, Proceedings |
| Editors | James C. Gee, Jaesung Hong, Carole H. Sudre, Polina Golland, Daniel C. Alexander, Juan Eugenio Iglesias, Archana Venkataraman, Jong Hyo Kim |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 24-34 |
| Number of pages | 11 |
| ISBN (Print) | 9783032049773 |
| DOIs | |
| Publication status | Published - 2026 |
| Event | 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - Daejeon, Korea, Republic of. Duration: 23 Sept 2025 → 27 Sept 2025 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 15965 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Daejeon |
| Period | 23/09/25 → 27/09/25 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
Keywords
- 3D Medical Images
- Adaptation
- Foundation Model
Fingerprint
Dive into the research topics of 'Bio2Vol: Adapting 2D Biomedical Foundation Models for Volumetric Medical Image Segmentation'. Together they form a unique fingerprint.