3D object detection and tracking play a significant role in autonomous driving, where single-frame detection provides the fundamental perception, and continuous object tracking further enables temporal motion prediction and planning. In this thesis, we aim to push the limits of image-based 3D object estimation step by step by fully exploiting different levels of visual information.

We start with the ego-motion tracking problem by proposing a tightly-coupled visual-inertial state estimator with loop-closure ability, which can be used for autonomous robot navigation and augmented reality. We then extend it naturally to object estimation in autonomous driving scenarios, combining object-level semantic priors with a dynamic object bundle adjustment (BA) over sparse feature correspondences, and obtain 3D object pose, velocity, and anchored dynamic point-cloud estimates with instance accuracy and temporal consistency.

To overcome the limitations of sparse feature representations in handling small or heavily occluded objects, we design a Stereo R-CNN network that detects and associates objects in stereo images and predicts the corresponding object properties (keypoints, dimensions, etc.); coarse 3D object bounding boxes are then computed from this object-level information. We then recover the accurate 3D bounding box by refining the object disparity through dense photometric alignment between the left and right RoIs. This sub-pixel object disparity estimation enables our method to outperform all existing fully supervised image-based methods while requiring neither depth input nor 3D position supervision.
Building on the proposed temporal object geometric modeling and dense photometric alignment, we further integrate them into an elegant 3D object tracking framework that handles simultaneous detection and association via learned correspondences, and solves continuous estimation by fully exploiting dense spatial-temporal constraints in sequential stereo images. Extensive experiments on the KITTI dataset show that our approach outperforms previous image-based methods by significant margins and achieves a new state of the art.
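To give a rough flavor of the dense photometric alignment step described above, the following minimal sketch scans candidate disparities around a coarse estimate and keeps the one minimizing the summed absolute photometric error, sampling the right image at sub-pixel positions via linear interpolation. All function and variable names here are illustrative assumptions; the thesis aggregates the error over full object RoIs rather than this simplified column range.

```python
import numpy as np

def refine_disparity(left, right, x0, x1, d_coarse, search=2.0, step=0.05):
    """Sub-pixel disparity refinement by dense photometric alignment (sketch).

    Matches the left-image columns [x0, x1) against the right image shifted
    by a candidate disparity d, sampling the right image at sub-pixel
    positions x - d with linear interpolation, and returns the disparity
    that minimizes the summed absolute photometric error.
    """
    cols = np.arange(x0, x1)
    best_d, best_err = d_coarse, np.inf
    for d in np.arange(d_coarse - search, d_coarse + search + 1e-9, step):
        xs = cols - d                          # sub-pixel sample positions
        xf = np.floor(xs).astype(int)          # integer part
        w = xs - xf                            # fractional part
        sampled = (1.0 - w) * right[:, xf] + w * right[:, xf + 1]
        err = np.abs(left[:, x0:x1] - sampled).sum()
        if err < best_err:
            best_d, best_err = d, err
    return best_d
```

In practice such a brute-force scan would be replaced by a Gauss-Newton style minimization over the RoI, but the sketch shows why photometric alignment can recover disparity below pixel resolution: the cost is evaluated at interpolated, continuous-valued shifts rather than integer offsets.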
| Date of Award | 2020 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
3D object detection and tracking for autonomous driving
LI, P. (Author). 2020
Student thesis: Doctoral thesis