Large-scale 3D medical image pre-training with geometric context priors

Linshan Wu, Jiaxin Zhuang, Hao Chen*

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

Abstract

The scarcity of annotations poses a significant challenge in medical image analysis, as annotation demands extensive effort from radiologists, especially for high-dimensional 3D medical images. Large-scale pre-training has emerged as a promising label-efficient solution, owing to the utilization of large-scale data, large models, and advanced pre-training techniques. However, its development for medical images remains underexplored. The primary challenge lies in harnessing large-scale unlabeled data and learning high-level semantics without annotations. We observe that 3D medical images exhibit consistent geometric context, i.e., consistent geometric relations between different organs, which offers a promising way to learn consistent representations. Motivated by this, we introduce a simple-yet-effective Volume Contrast (VoCo) framework that leverages geometric context priors for self-supervision. Given an input volume, we extract base crops from different regions to construct positive and negative pairs for contrastive learning. We then predict the contextual position of a random crop by contrasting its similarity to the base crops. In this way, VoCo implicitly encodes the inherent geometric context into model representations, facilitating high-level semantic learning without annotations. To assess effectiveness, we (1) introduce PreCT-160K, the largest medical image pre-training dataset to date, comprising 160K Computed Tomography (CT) volumes covering diverse anatomic structures; (2) investigate scaling laws and propose guidelines for tailoring different model sizes to various medical tasks; (3) build a comprehensive benchmark encompassing 51 medical tasks, including segmentation, classification, registration, and vision-language tasks. Extensive experiments highlight the superiority of VoCo, showcasing promising transferability to unseen modalities and datasets. VoCo notably enhances performance on datasets with limited labeled cases and significantly accelerates fine-tuning convergence.
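The position-prediction objective described in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: a toy random-projection encoder stands in for the 3D network, a 2D slice stands in for a volume, and the area overlap between a random crop and a 2×2 grid of base crops serves as the position label that the crop-to-base similarities would be trained to match. All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

CROP, GRID = 16, 2                        # base crops tile a 32x32 toy "volume" slice
proj = rng.normal(size=(8, CROP * CROP))  # toy stand-in for a learned encoder

def encode(crop):
    """Embed a crop and L2-normalize, so dot products are cosine similarities."""
    emb = proj @ crop.reshape(-1)
    return emb / (np.linalg.norm(emb) + 1e-8)

def position_label(y, x):
    """Fraction of the random crop's area falling inside each base region."""
    label = np.zeros(GRID * GRID)
    for i in range(GRID):
        for j in range(GRID):
            y0, x0 = i * CROP, j * CROP
            oy = max(0, min(y + CROP, y0 + CROP) - max(y, y0))
            ox = max(0, min(x + CROP, x0 + CROP) - max(x, x0))
            label[i * GRID + j] = oy * ox / (CROP * CROP)
    return label

volume = rng.normal(size=(GRID * CROP, GRID * CROP))

# Base crops from fixed, non-overlapping regions (the contrastive anchors).
bases = [encode(volume[i*CROP:(i+1)*CROP, j*CROP:(j+1)*CROP])
         for i in range(GRID) for j in range(GRID)]

# A random crop whose contextual position is to be predicted.
y, x = 5, 9
query = encode(volume[y:y+CROP, x:x+CROP])

sims = np.array([b @ query for b in bases])        # predicted position scores
label = position_label(y, x)                       # ground-truth overlap ratios
loss = np.abs(np.clip(sims, 0, 1) - label).mean()  # simple distance objective
```

In the actual framework the encoder is trained so that `sims` matches `label`; with the untrained projection above, the snippet only shows how the pieces fit together: `label` sums to 1 and concentrates mass on the base regions the random crop overlaps, which is the geometric context signal used for self-supervision.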

Original language: English
Pages (from-to): 1-18
Number of pages: 18
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
DOIs
Publication status: Published - 3 Dec 2025

Bibliographical note

Publisher Copyright:
© 1979-2012 IEEE.

Keywords

  • Foundation models
  • geometric context priors
  • medical image analysis
  • scalable learners
  • vision pre-training
