AUTHOR=Tian Wei, Gao Zhong, Tan Dayi TITLE=Single-view multi-human pose estimation by attentive cross-dimension matching JOURNAL=Frontiers in Neuroscience VOLUME=17 YEAR=2023 URL=https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2023.1201088 DOI=10.3389/fnins.2023.1201088 ISSN=1662-453X ABSTRACT=
Vision-based human pose estimation has been widely applied in tasks such as augmented reality, action recognition, and human-machine interaction. Current approaches favor the keypoint detection-based paradigm, which eases learning by circumventing the highly non-linear problem of directly regressing keypoint coordinates. However, in this paradigm each keypoint is predicted from only a small surrounding region of a Gaussian-like heatmap, which wastes the information in the remaining regions and can even limit model optimization. In this paper, we design a new k-block multi-person pose estimation architecture with a voting mechanism over the entire heatmap that simultaneously infers the keypoints and their uncertainties. To further improve keypoint estimation, the architecture leverages the SMPL 3D human body model and iteratively mines information about human body structure to correct the pose estimated from a single image. In experiments on the 3DPW dataset, it improves on the state of the art by about 8 mm in MPJPE and 5 mm in PA-MPJPE. Furthermore, its ability to run in real time opens up potential applications of multi-person pose estimation in complex scenarios.
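
For readers unfamiliar with the reported metrics: MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joint positions, and PA-MPJPE is the same quantity computed after rigidly aligning the prediction to the ground truth by Procrustes analysis. The sketch below illustrates only plain MPJPE. The function name `mpjpe` and the three-joint skeleton are hypothetical and chosen for illustration; they are not taken from the paper or its evaluation code.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance
    between corresponding predicted and ground-truth joints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    # math.dist returns the Euclidean distance between two points.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Hypothetical 3-joint skeleton in millimetres (illustrative values only,
# not data from the paper or from 3DPW).
gt = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 10.0)]

error_mm = mpjpe(pred, gt)
print(f"MPJPE: {error_mm:.1f} mm")
```

Here the per-joint errors are 5 mm, 0 mm, and 10 mm, so the mean is 5 mm. An improvement of "about 8 mm in MPJPE" therefore means the average joint lands about 8 mm closer to its ground-truth position.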