AUTHOR=Zandigohar Mehrshad , Han Mo , Sharif Mohammadreza , Günay Sezen Yağmur , Furmanek Mariusz P. , Yarossi Mathew , Bonato Paolo , Onal Cagdas , Padır Taşkın , Erdoğmuş Deniz , Schirner Gunar TITLE=Multimodal fusion of EMG and vision for human grasp intent inference in prosthetic hand control JOURNAL=Frontiers in Robotics and AI VOLUME=11 YEAR=2024 URL=https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2024.1312554 DOI=10.3389/frobt.2024.1312554 ISSN=2296-9144 ABSTRACT=

Objective: For transradial amputees, robotic prosthetic hands promise to restore the capability to perform activities of daily living. Current control methods based on physiological signals such as electromyography (EMG) are prone to poor inference outcomes due to motion artifacts, muscle fatigue, and other factors. Vision sensors are a major source of information about the environment state and can play a vital role in inferring feasible and intended gestures. However, visual evidence is also susceptible to its own artifacts, most often due to object occlusion and lighting changes. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities.

Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye-gaze, and EMG from the forearm processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train neural network components.
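
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of Bayesian evidence fusion, not the authors' implementation. It assumes that each modality's neural network outputs a posterior over grasp types, that the modalities are conditionally independent given the grasp type, and that the prior is uniform unless supplied. The function name `fuse_posteriors`, its parameters, and the example probabilities are hypothetical.

```python
import numpy as np


def fuse_posteriors(posteriors, prior=None, eps=1e-12):
    """Fuse per-modality grasp-type posteriors (illustrative sketch).

    Assumes conditional independence of the modalities given the grasp type:
        p(c | e_1..e_M)  is proportional to  p(c) * prod_m [ p(c | e_m) / p(c) ]

    posteriors: list of 1-D arrays, each a distribution over the same classes.
    prior:      1-D array p(c); uniform if None (an assumption of this sketch).
    eps:        small constant to avoid log(0).
    """
    posteriors = [np.asarray(p, dtype=float) for p in posteriors]
    n_classes = posteriors[0].shape[0]
    if prior is None:
        prior = np.full(n_classes, 1.0 / n_classes)
    prior = np.asarray(prior, dtype=float)

    # Work in log space for numerical stability.
    log_prior = np.log(prior + eps)
    log_post = log_prior.copy()
    for p in posteriors:
        log_post += np.log(p + eps) - log_prior

    # Normalize back to a probability distribution.
    log_post -= log_post.max()
    fused = np.exp(log_post)
    return fused / fused.sum()


# Hypothetical posteriors over three grasp types at one time instant.
emg_posterior = np.array([0.5, 0.4, 0.1])     # from an EMG classifier
vision_posterior = np.array([0.3, 0.6, 0.1])  # from a visual classifier

fused = fuse_posteriors([emg_posterior, vision_posterior])
predicted_grasp = int(np.argmax(fused))
```

In this made-up example the EMG classifier alone would pick the first grasp type, while the fused posterior favors the second, which illustrates how evidence from one modality can override a weak decision from the other. Evaluating the fused posterior at each time step of the reach would give the kind of accuracy-over-time analysis described above.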

Results: Our results indicate that, on average, fusion improves the instantaneous classification accuracy of the upcoming grasp type during the reaching phase by 13.66 and 14.8 percentage points relative to EMG alone (81.64%) and visual evidence alone (80.5%), respectively, resulting in an overall fusion accuracy of 95.3%.

Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.