
Eliminating Semantic Ambiguity in Human Pose Estimation via Stable Feature Upsampling

  • Shu Jiang
  • Dong Zhang*
  • Rui Yan
  • Xiangbo Shu
  • Pingcheng DONG
  • Long Chen
  • Xiaoyu Du

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

Abstract

Human pose estimation is a challenging research task in the computer vision community due to the semantic ambiguity problem caused by inevitable occlusions, varying body shapes, and complex articulations. Although deep learning-based methods have significantly improved performance on this task, the feature upsampling operations used in current convolutional neural network and Transformer frameworks, e.g., bilinear interpolation and transposed convolution, suffer from several limitations, including the inability to adapt to specific tasks and the loss of fine-grained semantic details. In this work, we propose a simple yet effective two-step stable feature upsampling (SIU) strategy that addresses these limitations with a learnable and efficient upsampling operation. First, we apply periodic shuffling to increase the resolution of the feature maps. Second, we use convolution layers to adjust the number of feature channels to match that of the input feature maps. The proposed SIU enables the entire network to adapt to the specific feature requirements of the human pose estimation task, making it more effective at preserving spatial information. Quantitatively, extensive experimental results on the challenging COCO-WholeBody dataset show that our approach outperforms state-of-the-art methods in both accuracy and efficiency, and that it transfers well, making it applicable to a wide range of baselines. Moreover, the qualitative results show that SIU can effectively eliminate the semantic ambiguity problem in challenging pose scenarios, such as occlusion and overlap. The code and weights have been released at: SIU.
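
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' released implementation: it assumes the common sub-pixel channel ordering for periodic shuffling and a single 1x1 convolution for the channel adjustment, and the function names (periodic_shuffle, pointwise_conv, upsample_sketch) and the toy weights are hypothetical. Plain nested lists are used so the example runs without any dependencies.

```python
def periodic_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Assumption: channel index c*r*r + (i % r)*r + (j % r) feeds output
    pixel (i, j) of channel c (the usual sub-pixel layout).
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    return [
        [
            [x[c * r * r + (i % r) * r + (j % r)][i // r][j // r]
             for j in range(w * r)]
            for i in range(h * r)
        ]
        for c in range(c_out)
    ]


def pointwise_conv(x, weight):
    """1x1 convolution (assumed here): weight is [C_out][C_in], x is [C_in][H][W]."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w_row[k] * x[k][i][j] for k in range(len(x)))
             for j in range(w)]
            for i in range(h)
        ]
        for w_row in weight
    ]


def upsample_sketch(x, weight, r=2):
    """Step 1: periodic shuffling. Step 2: restore the channel count."""
    return pointwise_conv(periodic_shuffle(x, r), weight)


# Toy input: 4 channels, each a 1x1 map holding its own channel index.
features = [[[float(c)]] for c in range(4)]

# Step 1 alone: 4 channels of 1x1 become 1 channel of 2x2.
shuffled = periodic_shuffle(features, 2)

# Step 2: hypothetical weights mapping 1 channel back to 4 channels.
weight = [[1.0], [1.0], [1.0], [1.0]]
restored = upsample_sketch(features, weight, 2)

print(shuffled)                             # [[[0.0, 1.0], [2.0, 3.0]]]
print(len(restored), len(restored[0]))      # 4 2
```

With an upscaling factor of 2, the shuffle trades a factor of 4 in channels for a factor of 2 in each spatial dimension, and the convolution then brings the channel count back to that of the input. In a trained network the convolution weights would be learned rather than fixed as they are here.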

Original language: English
Article number: 11071896
Pages (from-to): 11863-11876
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 12
Early online date: 4 Jul 2025
DOIs
Publication status: Published - Dec 2025

Bibliographical note

Publisher Copyright:
© 1991-2012 IEEE.

Keywords

  • Human pose estimation
  • semantic ambiguity
  • feature upsampling
  • COCO-WholeBody
