
ORIGINAL RESEARCH article

Front. Comput. Sci.
Sec. Computer Vision
Volume 6 - 2024 | doi: 10.3389/fcomp.2024.1235239
This article is part of the Research Topic Computer Vision and AI in Real-world Applications: Robustness, Generalization, and Engineering

Applying Learning-from-observation to household service robots: three task common-sense formulations

Provisionally accepted
Katsushi Ikeuchi*, Jun Takamatsu, Kazuhiro Sasabuchi, Naoki Wake, Atsushi Kanehira
  • Microsoft (United States), Redmond, United States

The final, formatted version of the article will be published soon.

    Deploying a robot in a new application typically requires programming it anew each time. To reduce this programming effort, we have been developing "Learning-from-observation (LfO)," which automatically generates robot programs by observing human demonstrations. Our previous research has focused on the industrial domain; we now aim to extend the approach to the household-service domain. One of the main obstacles to introducing the LfO system into this domain is the cluttered environment, which makes it difficult to discern, when observing demonstrations, which movements of the human body parts, and which of their relationships with objects in the environment, are crucial for task execution. To overcome this issue, the system needs task common-sense shared with the human demonstrator, so that it can focus on the demonstrator's essential movements. Here, task common-sense is defined as the movements humans perform almost unconsciously to streamline or optimize the execution of a series of tasks. In this paper, we extract and define three types of task common-sense (semi-conscious movements) to be attended to when observing demonstrations of household tasks, and we propose representations to describe them. Specifically, the paper proposes Labanotation to describe whole-body movements with respect to the environment, contact-webs to describe hand-finger movements with respect to the tool being grasped, and physical and semantic constraints to describe the movements of the hand holding the tool with respect to the environment. Based on these representations, the paper formulates task models, machine-independent robot programs that indicate what-to-do and where-to-do. In this design process, the necessary and sufficient set of task models to be prepared in the task-model library is determined by the following criteria: for grasping tasks, according to the classification of contact-webs by the purpose of the grasp, and for manipulation tasks, according to the possible transitions between states defined by physical and semantic constraints. A skill-agent library is also prepared, collecting skill-agents corresponding to the tasks. The paper explains the task encoder and task decoder used to execute the task models on robot hardware and demonstrates how the system works through several example scenes.
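
    The pipeline summarized above (demonstrations encoded into machine-independent task models, which a decoder dispatches to hardware-specific skill agents) can be pictured with a minimal sketch. The class, enum, and field names below are hypothetical illustrations of the abstract's terminology, not the authors' implementation.

```python
# Illustrative sketch only: names are hypothetical, not the authors' code.
# It mirrors the flow in the abstract: task models carry what-to-do and
# where-to-do, and a skill-agent library executes them on the hardware side.

from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List


class TaskType(Enum):
    """Hypothetical split following the abstract: grasp vs. manipulation tasks."""
    GRASP = auto()          # chosen by contact-web classification (purpose of the grasp)
    MANIPULATION = auto()   # chosen by transitions between constraint-defined states


@dataclass
class TaskModel:
    """Machine-independent description: what-to-do and where-to-do."""
    task_type: TaskType
    what_to_do: str                                             # e.g. "power-grasp"
    where_to_do: Dict[str, str] = field(default_factory=dict)   # object/environment parameters


class SkillAgentLibrary:
    """Collects skill agents and plays the role of a task decoder."""

    def __init__(self) -> None:
        self._agents: Dict[str, Callable[[TaskModel], None]] = {}

    def register(self, what_to_do: str, agent: Callable[[TaskModel], None]) -> None:
        self._agents[what_to_do] = agent

    def decode(self, plan: List[TaskModel]) -> None:
        """Dispatch each task model to the skill agent registered for it."""
        for task in plan:
            self._agents[task.what_to_do](task)


if __name__ == "__main__":
    library = SkillAgentLibrary()
    library.register("power-grasp", lambda t: print(f"grasping {t.where_to_do['object']}"))
    library.register("place-on-surface", lambda t: print(f"placing on {t.where_to_do['target']}"))

    # A toy "encoded" demonstration: pick up a cup, then place it on a shelf.
    plan = [
        TaskModel(TaskType.GRASP, "power-grasp", {"object": "cup"}),
        TaskModel(TaskType.MANIPULATION, "place-on-surface", {"target": "shelf"}),
    ]
    library.decode(plan)
```

    In this sketch the task models stay free of robot-specific details; only the registered skill agents would change when moving to different hardware, which is the separation the abstract attributes to the task encoder/decoder design.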

    Keywords: learning-from-observation, task model, skill-agent library, grasp taxonomy, Labanotation, face contact equations, reinforcement learning

    Received: 06 Jun 2023; Accepted: 15 Jul 2024.

    Copyright: © 2024 Ikeuchi, Takamatsu, Sasabuchi, Wake and Kanehira. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Katsushi Ikeuchi, Microsoft (United States), Redmond, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.