HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions

Hao Ai, Zidong Cao, Yan Pei Cao, Ying Shan, Lin Wang*

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/ReportConference Paper published in a bookpeer-review

33 Citations (Scopus)

Abstract

Depth estimation from a monocular 360° image is a burgeoning problem owing to its holistic sensing of a scene. Recently, some methods, e.g., OmniFusion, have applied the tangent projection (TP) to represent a 360° image and predicted depth values via patch-wise regressions, which are merged to get a depth map with equirectangular projection (ERP) format. However, these methods suffer from 1) non-trivial process of merging plenty of patches; 2) capturing less holistic-with-regional contextual information by directly regressing the depth value of each pixel. In this paper, we propose a novel framework, HRDFuse, that subtly combines the potential of convolutional neural networks (CNNs) and transformers by collaboratively learning the holistic contextual information from the ERP and the regional structural information from the TP. Firstly, we propose a spatial feature alignment (SFA) module that learns feature similarities between the TP and ERP to aggregate the TP features into a complete ERP feature map in a pixelwise manner. Secondly, we propose a collaborative depth distribution classification (CDDC) module that learns the holistic-with-regional histograms capturing the ERP and TP depth distributions. As such, the final depth values can be predicted as a linear combination of histogram bin centers. Lastly, we adaptively combine the depth predictions from ERP and TP to obtain the final depth map. Extensive experiments show that our method predicts more smooth and accurate depth results while achieving favorably better results than the SOTA methods.

Original languageEnglish
Title of host publicationProceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
PublisherIEEE Computer Society
Pages13273-13282
Number of pages10
ISBN (Electronic)9798350301298
DOIs
Publication statusPublished - 2023
Event2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 - Vancouver, Canada
Duration: 18 Jun 202322 Jun 2023

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume2023-June
ISSN (Print)1063-6919

Conference

Conference2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Country/TerritoryCanada
CityVancouver
Period18/06/2322/06/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • Scene analysis and understanding

Fingerprint

Dive into the research topics of 'HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions'. Together they form a unique fingerprint.

Cite this