TY - GEN
T1 - Matrix3D: Large Photogrammetry Model All-in-One
AU - LU, Yuanxun
AU - ZHANG, Jingyang
AU - FANG, Tian
AU - NAHMIAS, Jean-Daniel
AU - TSIN, Yanghai
AU - QUAN, Long
AU - CAO, Xun
AU - YAO, Yao
AU - LI, Shiwei
PY - 2025/8/13
Y1 - 2025/8/13
N2 - We present Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis, using a single model. Matrix3D utilizes a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. The key to Matrix3D’s large-scale multi-modal training lies in the incorporation of a mask learning strategy. This enables full-modality model training even with partially complete data, such as bi-modality data of image-pose and image-depth pairs, which significantly increases the pool of available training data. Matrix3D demonstrates state-of-the-art performance in pose estimation and novel view synthesis tasks. Additionally, it offers fine-grained control through multi-round interactions, making it an innovative tool for 3D content creation. Project page: https://nju-3dv.github.io/projects/matrix3d.
AB - We present Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis, using a single model. Matrix3D utilizes a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. The key to Matrix3D’s large-scale multi-modal training lies in the incorporation of a mask learning strategy. This enables full-modality model training even with partially complete data, such as bi-modality data of image-pose and image-depth pairs, which significantly increases the pool of available training data. Matrix3D demonstrates state-of-the-art performance in pose estimation and novel view synthesis tasks. Additionally, it offers fine-grained control through multi-round interactions, making it an innovative tool for 3D content creation. Project page: https://nju-3dv.github.io/projects/matrix3d.
KW - 3d generation
KW - multi-view diffusion
KW - multi-modal generation
KW - 3d reconstruction
KW - diffusion transformer
KW - masked learning
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:001601106700467
UR - https://openalex.org/works/w4413157969
U2 - 10.1109/CVPR52734.2025.01051
DO - 10.1109/CVPR52734.2025.01051
M3 - Conference Paper published in a book
SN - 9798331543655
T3 - 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
BT - 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025
Y2 - 10 June 2025 through 17 June 2025
ER -