
Matrix3D: Large Photogrammetry Model All-in-One

  • Yuanxun LU
  • Jingyang ZHANG
  • Tian FANG
  • Jean-Daniel NAHMIAS
  • Yanghai TSIN
  • Long QUAN
  • Xun CAO
  • Yao YAO*
  • Shiwei LI
  • *Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/Report, Conference Paper published in a book, peer-review

Abstract

We present Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis, using a single model. Matrix3D uses a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. The key to Matrix3D's large-scale multi-modal training is a mask learning strategy. It enables full-modality model training even with partially complete data, such as bi-modality image-pose and image-depth pairs, which significantly increases the pool of available training data. Matrix3D demonstrates state-of-the-art performance in pose estimation and novel view synthesis tasks. It also offers fine-grained control through multi-round interactions, making it an innovative tool for 3D content creation. Project page: https://nju-3dv.github.io/projects/matrix3d.
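The abstract states that mask learning lets samples containing only two modalities (for example, image-pose or image-depth pairs) still train the full-modality model. The Python sketch below illustrates one plausible reading of that idea. Its assumptions are hypothetical and not taken from the Matrix3D implementation: the modality names, the `ModalityMask`, `sample_mask`, and `masked_loss` helpers, and the random condition/target split. The sketch treats each present modality as either a clean condition or a target to be denoised, and it gives absent modalities no role at all.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of masked multi-modal training; not the Matrix3D codebase.
MODALITIES = ("image", "pose", "depth")


@dataclass
class ModalityMask:
    condition: dict[str, bool]  # clean inputs the model is conditioned on
    target: dict[str, bool]     # modalities to be denoised and supervised


def sample_mask(available: dict[str, bool], rng: random.Random) -> ModalityMask:
    """Randomly split the modalities present in one sample into conditions and targets.

    Absent modalities are neither conditions nor targets, so they add no
    input tokens and no loss terms.
    """
    present = [m for m in MODALITIES if available.get(m, False)]
    if len(present) < 2:
        raise ValueError("need at least two modalities to form a condition/target split")
    # Choose a non-empty proper subset of the present modalities as targets,
    # so at least one present modality always remains as a condition.
    n_targets = rng.randint(1, len(present) - 1)
    targets = set(rng.sample(present, n_targets))
    return ModalityMask(
        condition={m: m in present and m not in targets for m in MODALITIES},
        target={m: m in targets for m in MODALITIES},
    )


def masked_loss(per_modality_loss: dict[str, float], mask: ModalityMask) -> float:
    """Average the per-modality losses over target modalities only."""
    terms = [per_modality_loss[m] for m in MODALITIES if mask.target[m]]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    rng = random.Random(0)
    # An image-pose pair with no depth map (a bi-modality sample).
    mask = sample_mask({"image": True, "pose": True, "depth": False}, rng)
    print(mask)
    print(masked_loss({"image": 0.5, "pose": 0.2, "depth": 9.9}, mask))
```

In this sketch, targets are drawn only from modalities that are present. An image-pose pair therefore never produces a depth loss, and an image-depth pair never produces a pose loss. This is what allows incomplete datasets to share one full-modality model here.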
Original language: English
Title of host publication: CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 14
ISBN (Electronic): 9798331543648
ISBN (Print): 9798331543655
Publication status: Published, 13 Aug 2025
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, United States
Duration: 10 Jun 2025 to 17 Jun 2025

Publication series

Name: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
ISSN (Print): 1063-6919
ISSN (Electronic): 2575-7075

Conference

Conference: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025
Country/Territory: United States
City: Nashville
Period: 10/06/25 to 17/06/25

Keywords

  • 3d generation
  • multi-view diffusion
  • multi-modal generation
  • 3d reconstruction
  • diffusion transformer
  • masked learning

