
ORIGINAL RESEARCH article

Front. Phys.
Sec. Social Physics
Volume 12 - 2024 | doi: 10.3389/fphy.2024.1515842

Towards Accurate Hand Mesh Estimation via Masked Image Modeling

Provisionally accepted
  • 1 Fuzhou Medical College of Nanchang University, Fuzhou, China
  • 2 Industrial Technology Research Center, Guangdong Institute of Scientific & Technical Information, Guangzhou, China

The final, formatted version of the article will be published soon.

    With an enormous number of hand images generated over time, leveraging unlabeled images for pose estimation is an emerging yet challenging topic. While some semi-supervised and self-supervised methods have emerged, they are constrained by their reliance on high-quality keypoint detection models or complicated network architectures. We propose a novel self-supervised pre-training strategy for 3D hand mesh regression. Our approach integrates a multi-granularity strategy with pseudo-keypoint alignment in a teacher-student framework, employing self-distillation and masked image modeling for comprehensive representation learning. We pair this with a robust pose estimation baseline, combining a standard Vision Transformer backbone with a Pyramidal Mesh Alignment Feedback head. Extensive experiments demonstrate HandMIM's competitive performance across diverse datasets, notably achieving an 8.00 mm Procrustes Alignment Vertex-Point-Error on the challenging HO3Dv2 test set, which features severe hand occlusions, surpassing many specially optimized architectures.
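    The pre-training scheme described above combines two ingredients: masking image patches so the student network sees a corrupted view, and aligning the student's features with those of an exponential-moving-average teacher that sees the clean view. A minimal NumPy sketch of that pattern is shown below; it is purely illustrative (linear encoders in place of the Vision Transformer, hypothetical names such as `patchify` and `W_student`) and is not the authors' HandMIM implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, p=16):
    # Split an HxW image into (H/p * W/p) flat patches of p*p pixels each.
    H, W = img.shape
    return img.reshape(H // p, p, W // p, p).swapaxes(1, 2).reshape(-1, p * p)

# Toy 64x64 single-channel image -> 16 patches of dimension 256.
img = rng.standard_normal((64, 64))
patches = patchify(img)                      # shape (16, 256)

# Stand-ins for the student and teacher encoders (a ViT in the paper).
d_model = 32
W_student = rng.standard_normal((256, d_model)) * 0.02
W_teacher = W_student.copy()                 # teacher starts as a copy

# Randomly mask a high fraction of patches (MAE-style corruption).
n = patches.shape[0]
mask = rng.random(n) < 0.75
masked = patches.copy()
masked[mask] = 0.0                           # zero out the masked patches

# Student encodes the corrupted view; teacher encodes the clean view.
z_student = masked @ W_student               # (16, d_model)
z_teacher = patches @ W_teacher              # (16, d_model)

# Self-distillation objective: match teacher features on masked positions.
loss = np.mean((z_student[mask] - z_teacher[mask]) ** 2)

# The teacher tracks the student via an exponential moving average;
# it receives no gradient of its own.
momentum = 0.996
W_teacher = momentum * W_teacher + (1.0 - momentum) * W_student
```

    In a full pipeline this loss would be minimized over many images, after which the pre-trained encoder is fine-tuned with a mesh-regression head for the downstream hand pose task.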

    Keywords: 3D hand mesh estimation, multi-granularity representation, self-supervised learning, masked image modeling, Vision Transformer

    Received: 23 Oct 2024; Accepted: 18 Dec 2024.

    Copyright: © 2024 Li, Wang and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Huan Wang, Industrial Technology Research Center, Guangdong Institute of Scientific & Technical Information, Guangzhou, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.