Learning to perceive 3D surface

  • Mengqi JI

Student thesis: Doctoral thesis

Abstract

Interactive understanding of the real 3D world has long been a hot topic and remains a frontier for both academia and industry, owing to its inherently high demands on effective and efficient sensing and perception. Despite the emergence of multimodal sensors, sensing remains limited in its spatial, temporal, angular, spectral, multimodal, degraded, and dynamic information, which makes 3D scene perception challenging. This Ph.D. thesis focuses on a fundamental problem of 3D scene understanding, namely 3D surface perception, by learning from underdetermined sensing through sight and touch. It covers geometric reconstruction from sparse views, scene recovery behind scattering, and material identification through multimodal fusion. First, exploiting observations from sparse views, SurfaceNet, the first end-to-end learning framework for multiview stereopsis (MVS), directly learns photo-consistency and precisely extracts the geometric structure. This work inspired subsequent learning-based MVS algorithms that have rekindled the MVS community. Among them is our follow-up, SurfaceNet+, which exploits the sparsity of the 3D surface to markedly improve model completeness while reducing the complexity of training and inference, achieving a more than 7x speedup. Moreover, sensory degradation caused by scattering media, such as fog, frosted glass, biological tissue, and opaque obstacles, is widespread in real-world scenarios. Seeing through scattering with limited temporal resolution is therefore in high demand for 3D surface perception systems. For example, precisely extracting the vascular structure is valuable for clinical diagnosis. Because labeled data are scarce, a generic unsupervised domain adversarial network is proposed to extract vasculature for subsequent in vivo disease diagnosis. Lastly, 3D surface perception calls for the fusion of multimodal sensing to achieve a comprehensive understanding of the 3D scene.
For example, as a modality complementary to contactless visual sensing, contact-based haptic sensing is crucial for analyzing surface material from different aspects. Whereas haptic signals encode material-invariant sub-surface statistics, color images capture mostly material-irrelevant texture patterns. To fuse multimodal data adaptively, a learning framework is presented, and it shows great potential.
Date of Award: 2019
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
