This thesis advances intelligent robotic manipulation by addressing key challenges in perceiving and interacting with objects in unstructured, dynamic environments. The work spans three interconnected areas: robust grasp pose detection, open-vocabulary object understanding, and physical property identification for non-rigid objects.

First, we present a real-time, collision-free grasp detection pipeline for 6-DoF and 7-DoF grasping. By fusing multi-view depth data into a high-resolution Truncated Signed Distance Function (TSDF) volume, our system maintains accurate scene geometry. A novel volume-point network enables efficient grasp candidate evaluation, while a refinement module optimizes poses based on local geometry. This volumetric approach extends to 7-DoF grasping by incorporating reasoning about antipodal contact points, improving precision for complex object interactions.

Second, we overcome the limitations of traditional pose estimation methods, which are constrained to known objects or categories. We introduce an open-vocabulary, category-level object pose and size estimation framework that leverages large-scale vision-language foundation models. This enables generalization to unseen object categories at test time using only free-form textual descriptions, enhancing flexibility for language-driven manipulation tasks.

Third, we address the challenge of interacting with deformable objects by introducing the Gaussian-Informed Continuum (GIC) framework for physical property identification. This approach integrates dynamic 3D Gaussian reconstruction of deforming objects with Material Point Method (MPM) continuum simulation. By aligning simulation results with observations, the framework estimates the physical parameters of objects, enabling more realistic simulation and informed manipulation strategies for non-rigid materials.

Extensive experiments on simulated and real benchmarks show that our grasping pipeline operates in real time and outperforms prior art in success rate, that the open-vocabulary model achieves state-of-the-art accuracy on unseen categories, and that GIC faithfully estimates physical properties, supporting digital-twin simulations. Collectively, these contributions equip robots with richer perceptual and physical reasoning capabilities, paving the way for more versatile, autonomous manipulation in everyday settings.
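To make the volumetric fusion step of the grasping pipeline concrete, the minimal sketch below integrates multi-view depth frames into a TSDF volume with Open3D's `ScalableTSDFVolume`. The synthetic depth images, voxel size, truncation band, intrinsics, and camera poses are illustrative assumptions, not the parameters or data used in the thesis.

```python
import numpy as np
import open3d as o3d

# Hypothetical stand-ins for the sensor data: a flat synthetic depth image
# seen from two slightly shifted viewpoints. A real system would use
# calibrated multi-view depth frames instead.
depth = np.full((480, 640), 0.6, dtype=np.float32)    # 0.6 m everywhere
pose_a = np.eye(4)
pose_b = np.eye(4)
pose_b[0, 3] = 0.05                                    # 5 cm baseline (assumed)
depth_frames, camera_poses = [depth, depth], [pose_a, pose_b]

# TSDF volume; voxel size and truncation distance are assumed values.
volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.004,
    sdf_trunc=0.02,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.NoColor,
)
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault
)

for d, T_cam_to_world in zip(depth_frames, camera_poses):
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        o3d.geometry.Image(np.zeros((*d.shape, 3), dtype=np.uint8)),
        o3d.geometry.Image(d),
        depth_scale=1.0,        # depth already in meters
        depth_trunc=1.5,
        convert_rgb_to_intensity=False,
    )
    # integrate() expects a world-to-camera extrinsic.
    volume.integrate(rgbd, intrinsic, np.linalg.inv(T_cam_to_world))

# The fused geometry can then be queried when evaluating grasp candidates.
scene_cloud = volume.extract_point_cloud()
print(scene_cloud)
```

The fused volume keeps a consistent scene model across views, which is what allows grasp candidates to be scored against complete geometry rather than a single partial depth frame.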
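The parameter identification in GIC follows an analysis-by-synthesis loop: simulate the deformation under candidate material parameters, compare with the observed reconstruction, and update the parameters to reduce the discrepancy. The sketch below illustrates that loop with a toy 1-D damped spring standing in for the MPM simulator and a noisy synthetic trajectory standing in for the Gaussian-reconstructed observations; every function and value here is an illustrative assumption, not the thesis's method.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the continuum simulator: a damped mass-spring whose
# trajectory depends on stiffness k and damping c.
def simulate(params, steps=200, dt=0.01):
    k, c = params
    x, v = 1.0, 0.0                      # initial displacement / velocity
    traj = []
    for _ in range(steps):
        a = -k * x - c * v               # Hooke's law + linear damping
        v += a * dt
        x += v * dt
        traj.append(x)
    return np.array(traj)

# "Observed" trajectory generated with ground-truth parameters plus noise,
# playing the role of the per-frame reconstructed object states.
rng = np.random.default_rng(0)
observed = simulate([4.0, 0.3]) + 0.005 * rng.standard_normal(200)

# Discrepancy between simulation and observation; here a simple squared error
# over the toy trajectory rather than an alignment of simulated continua
# with 3D Gaussian reconstructions.
def loss(params):
    return float(np.mean((simulate(params) - observed) ** 2))

result = minimize(loss, x0=[1.0, 1.0], method="Nelder-Mead")
print("estimated stiffness/damping:", result.x)
```

Minimizing the simulation-to-observation discrepancy recovers parameters close to the ground truth, which is the same principle that lets the identified properties support digital-twin simulation and manipulation planning.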
| Date of Award | 2025 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
| Supervisor | Qifeng CHEN (Supervisor) & Michael Yu WANG (Supervisor) |
Towards Intelligent Object Manipulation: Vision-Based Grasping, Pose Estimation, and Physical Property Identification
CAI, J. (Author). 2025
Student thesis: Doctoral thesis