Abstract
Because per-pixel ground-truth scene depth is difficult to acquire in the real world, self-supervised depth estimation frameworks are of considerable practical importance. In recent years, self-supervised monocular depth estimation has shown impressive results: networks are trained to predict a depth map from a single image frame, using adjacent frames as the supervision signal during training. Meanwhile, in many applications, video sequences are also available at test time. Many researchers have found that multi-view stereo (MVS) depth estimation based on a cost volume usually outperforms monocular schemes, except on moving objects and low-textured surfaces. Based on these observations, we aim to combine the advantages of monocular and multi-view schemes and design a new integrated depth estimation framework with better performance.

In this paper, we first introduce several representative self-supervised depth estimation frameworks from recent years, covering both monocular and multi-view cases. Next, to reduce the influence of observation noise (e.g., occlusion and moving objects), we introduce the concept of Bayesian uncertainty and explain how uncertainty estimation improves depth accuracy. We then propose a multi-frame depth estimation framework in which the monocular depth map is refined continuously by multi-frame sequential constraints, using a Bayesian fusion layer over several iterations. Both the monocular and multi-view networks can be trained without depth supervision. Our method also improves interpretability when combining monocular estimation with a multi-view cost volume.
| Date of Award | 2023 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Ming LIU (Supervisor) & Long QUAN (Supervisor) |