
Interactive 3-D navigation based on image plus depth representation

  • Rui MA

Student thesis: Doctoral thesis

Abstract

3-D imaging has developed significantly in recent years, thanks to fast-growing technologies in video capturing, data compression and view synthesis. One of the core tasks of 3-D imaging is interactive 3-D navigation, which enables users to navigate the 3-D scene interactively instead of watching the content in a fixed FOV (field of view) determined by the media producer. Building such an interactive navigation system requires considering a complete processing chain including 3-D scene representation, data compression and transmission, and (virtual) view synthesis. A proper 3-D scene representation is important to the entire system because it influences the subsequent processing modules. Image plus depth representation is currently the most popular and widely used photo-realistic representation of the 3-D scene. The depth map captures a 2-D projection of the 3-D geometry of the scene. With the help of the depth information, it is much easier to reconstruct a virtual view using DIBR (depth-image-based rendering) techniques. In this thesis, we study practical solutions for interactive 3-D navigation based on the image plus depth representation. First, we study the acquisition and compression of depth maps, because depth maps have different characteristics from natural images. On the acquisition side, we study depth estimation from a stereo image pair, which is the classical stereo matching problem in computer vision. We propose a convex approach to the discrete multi-labeling problem of stereo matching by reformulating it as a quadratic programming problem. On the compression side, we propose a novel distortion metric for depth maps to replace the conventional SSE (sum of squared errors) metric, because depth distortion affects the quality of synthesized views differently from image distortion.
Next, we move on to the problem of interactive 3-D navigation. To provide users with a sufficient navigation range to explore the 3-D scene, we use multiview images plus depth maps to capture a wider FOV. The growing amount of image and depth data captured by multiview cameras brings challenges in data storage and compression. State-of-the-art 3-D video compression technology can compress the image and depth data efficiently, but at the cost of reduced navigation flexibility. We propose to organize the multiview data as navigation segments that can be decoded and reconstructed independently of the rest of the data. Navigation flexibility can be tuned by adjusting the number and size of the navigation segments. Building on the proposed navigation segments, we further study practical solutions to the interactive navigation problem for 1-D and 2-D navigation segments, respectively. In both cases, we consider an end-to-end navigation system including data representation, compression, transmission and view synthesis, and propose an optimization framework based on our novel rate and distortion models. We further investigate practical solution methods for the 1-D and 2-D cases in order to derive the optimal navigation segments that achieve the best trade-offs among navigation criteria such as resource consumption, viewing quality and decoding complexity.
Date of Award: 2017
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
