Abstract
Neural representations have revolutionized 3D reconstruction by enabling continuous modeling of geometry and appearance from images. Despite this remarkable progress, however, most existing approaches remain far from practical deployment due to three persistent limitations: lack of generalization across scenes, dependence on dense multi-view inputs, and reliance on explicit camera poses or calibrated supervision. This dissertation, titled Toward Practical Neural 3D Reconstruction: Generalization, Sparse Views, and Pose-Free Methods, systematically addresses these challenges through a sequence of unified contributions.

We first present CP-NeRF, a conditionally parameterized neural radiance field that leverages cross-scene contextual priors to dynamically generate model parameters through a HyperNetwork, enabling generalization across diverse scenes without per-scene retraining. Building upon this foundation, ReTR introduces a physically grounded, transformer-based rendering framework for sparse-view reconstruction, reformulating volume rendering to accurately model light transport and achieve high-fidelity geometry under minimal view supervision. To overcome the fragmentation of real-world 3D data, MantraNet proposes a multi-modal alignment framework that unifies heterogeneous datasets through language-driven supervision, aligning visual and semantic spaces via pretrained language models and prompt learning to achieve cross-domain adaptability. Finally, LucidFusion advances the frontier of pose-free, feedforward 3D reconstruction by introducing the Relative Coordinate Gaussian (RCG). This framework reconstructs 3D geometry directly from unposed 2D images, bridging the gap between optimization-heavy pipelines and real-time, calibration-free 3D generation.
Together, these contributions form a coherent progression from scene-specific optimization to efficient, feedforward inference. The resulting systems not only enhance generalization and data efficiency but also pave the way for practical 3D reconstruction that operates flexibly across unposed, sparse, and heterogeneous input settings. This work thus represents a step toward universal, deployable neural 3D reconstruction, bringing robust, scalable 3D understanding closer to real-world applications in robotics, vision, and extended reality.
| Date of Award | 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Yingcong CHEN (Supervisor) & Hui XIONG (Supervisor) |