Event Abstract

Imitating Operations On Internal Cognitive Structures for Language Acquisition

Thomas Cederborg 1 and Pierre-Yves Oudeyer 1

1 Institut National de Recherche en Informatique et Automatique, France

I. INTRODUCTION
A robot that is to operate in human-populated environments, such as a home or an office, must be able to take the social context into account and follow simple directions. In addition to all the other difficulties that must be overcome before a robot can function around humans, it must be able to learn how to act in response to social and linguistic cues. To learn what response is appropriate, a robot must observe a human. In the case of this paper, the imitator observes the actions of a human who is assumed to act appropriately. Specifically, the paper examines the problem of learning socio-linguistic skills through imitation when those skills involve both observable motor patterns and internal, unobservable cognitive operations (in the form of an object focus). This approach is framed within a research program investigating novel links between context-dependent motor learning by imitation and language acquisition. More precisely, the paper presents an algorithm that allows a robot to learn how to respond to the communicative/linguistic actions of one human, called the interactant, by observing how another human, called the demonstrator, responds. In response to two continuous communicative signs from the interactant, the demonstrator focuses on one out of three objects and then performs a movement in relation to the focused-on object. The response of the demonstrator, which depends on the context (including the signs produced by the interactant), is assumed to be appropriate, and the robotic imitator uses these observations to build a general policy for how to respond to interactant actions. In this paper the communicative actions of the interactant are based on hand signs. The robot has to learn several things at the same time: 1) whether it is the first sign or the second sign that requests an internal cognitive operation, and likewise for the request of a movement type; 2) how many hand signs there are and how to recognize them; 3) how many movement types there are and how to reproduce them in different contexts; and 4) how to assign specific signs to specific internal operations and specific movements. An algorithm is proposed and an experiment is presented in which the unseen “focus on object” operation and the hand movements are successfully imitated, including in situations not observed during the demonstrations.
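To make the learning problem concrete, the following is a minimal sketch of what the imitator observes in a single demonstration. The record layout and all names are illustrative assumptions, not taken from the paper's implementation; the paper specifies only that each hand sign arrives as a three-dimensional vector and that object positions and the demonstrator's hand trajectory are observed, while the focus operation is not.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A hand sign is observed only as a point in a three-dimensional sign space.
Sign = Tuple[float, float, float]
# A hand trajectory is a sequence of 2-D hand positions (captured with a mouse).
Trajectory = List[Tuple[float, float]]

@dataclass
class Demonstration:
    """One observed demonstration (field names are illustrative assumptions)."""
    sign_1: Sign                        # first sign of the interactant
    sign_2: Sign                        # second sign of the interactant
    objects: List[Tuple[float, float]]  # positions of the three objects
    hand_path: Trajectory               # demonstrator's hand trajectory
    # The demonstrator's internal "focus on object" operation is NOT a field:
    # it is unobservable and must be inferred by the imitator (Section B).
```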
A. Related work
There are two related lines of work: imitation learning and linguistics. These fields are traditionally studied separately, but the present paper argues that there are fruitful ways to combine them (by proposing a single, combined action-language system). Imitation learning examines the problem of learning sensorimotor tasks from demonstrations. Most imitation learning research considers a single-task setup without any linguistic component. However, both [1] and [2] consider multiple tasks, and [2] also deals with an interactant making an unknown number of communicative hand signs. It is nevertheless very unusual for these issues to be dealt with in the field of imitation learning, and how to solve the related problems is largely an open research question. Previous work [5] examined Incremental Local Online GMR (ILO-GMR), which can learn an open-ended number of tasks from unlabeled demonstrations (the position of an object was used to determine which task should be performed). The task solved in the presented experiment is close to the tasks solved in [3] and [4]. The main difference is that the present system does not receive symbolic input and indeed does not even know how many different types of hand signs it has encountered. The architecture used in [4] is based on two separate systems for speech and action, while the architecture presented in [3] is based on a single combined system (like the system proposed in this paper). Unlike the presented approach, both use artificial neural networks.
Linguistics research has resulted in models of the evolution of language using the setup of language games [6], and within developmental robotics it has been suggested that language and action should be studied together [7]. Language is, however, still seen as a separate system, and the research problem is framed as finding the link between a language and a sensorimotor system, or as finding out how the two systems co-develop. The proposed algorithm does not include a separate language system but is instead an imitation learning system whose context has been extended to include the communicative hand signs of an interactant.
B. Algorithm
There are always three objects in the context; their positions are randomly set at the start of each demonstration/reproduction and remain static for the rest of the demonstration/reproduction. The hand trajectories of the demonstrator and the hand signs of the interactant are captured using a mouse (this was done for simplicity; a Kinect device could have been used and would not have led to lower-quality trajectories). Each of the interactant's hand signs is transformed into a point in a three-dimensional space (the imitator only has access to this vector). In each demonstration the demonstrator performs the requested object focus and then performs an instance of the requested movement (for example, moving its hand in a circle around whichever object is focused on).
A similarity measure comparing the demonstrator's hand trajectories is defined, and an iterative grouping algorithm then uses this similarity to find out how many types of movements have been demonstrated and which trajectories are instances of the same movement. Groups of trajectories are formed such that the members of each group have high similarity to the other members of that group, as sketched below.
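The following is a minimal sketch of this grouping step. The paper does not specify the similarity measure or the grouping threshold; this sketch assumes a simple shape-based similarity (trajectories are resampled to a common length and centered, making the comparison translation-invariant) and a greedy grouping rule. The threshold value is purely illustrative.

```python
import numpy as np

def resample(path, n=50):
    """Resample a trajectory to n points by linear interpolation."""
    path = np.asarray(path, dtype=float)
    t_old = np.linspace(0.0, 1.0, len(path))
    t_new = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(t_new, t_old, path[:, d])
                     for d in range(path.shape[1])], axis=1)

def similarity(a, b, n=50):
    """Higher is more similar: negative mean pointwise distance between the
    resampled, centered trajectories (an assumed measure, not the paper's)."""
    ra, rb = resample(a, n), resample(b, n)
    ra -= ra.mean(axis=0)
    rb -= rb.mean(axis=0)
    return -float(np.mean(np.linalg.norm(ra - rb, axis=1)))

def iterative_grouping(trajectories, threshold=-0.2):
    """Greedy grouping: seed a group with an ungrouped trajectory, then absorb
    every remaining trajectory that is similar to all current members. The
    number of groups (movement types) falls out of the data automatically."""
    ungrouped = list(range(len(trajectories)))
    groups = []
    while ungrouped:
        group = [ungrouped.pop(0)]
        for i in list(ungrouped):
            if all(similarity(trajectories[i], trajectories[j]) > threshold
                   for j in group):
                group.append(i)
                ungrouped.remove(i)
        groups.append(group)
    return groups
```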
The imitator then infers which internal cognitive “focus on object” operation was performed in each of the demonstrations. Knowing which internal operation was performed and which trajectories are instances of the same movement, the imitator infers the number of words and the syntax of the sign language (in this experiment the first hand sign requests a specific internal cognitive operation of the type “focus on object”, and the second requests a specific type of hand movement). A sketch of these two inference steps follows.
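The sketch below illustrates these inference steps under simplifying assumptions. The paper does not spell out the inference rules, so the focus heuristic (the focused object is taken to be the one closest to the centroid of the hand trajectory, since the movement is performed in relation to it) and the consistency test for syntax are hypothetical stand-ins.

```python
import numpy as np

def infer_focus(demo):
    """Infer the unobserved "focus on object" operation for one demonstration.
    Heuristic assumption: the focused object is the one closest to the centroid
    of the demonstrator's hand trajectory."""
    centroid = np.mean(np.asarray(demo.hand_path, dtype=float), axis=0)
    return int(np.argmin([np.linalg.norm(centroid - np.asarray(obj))
                          for obj in demo.objects]))

def group_signs(signs, threshold=0.5):
    """Group the observed 3-D sign vectors into 'words' with the same greedy
    scheme used for trajectories (the threshold is illustrative)."""
    ungrouped = list(range(len(signs)))
    words = []
    while ungrouped:
        word = [ungrouped.pop(0)]
        for i in list(ungrouped):
            if all(np.linalg.norm(np.asarray(signs[i]) - np.asarray(signs[j]))
                   < threshold for j in word):
                word.append(i)
                ungrouped.remove(i)
        words.append(word)
    return words

def consistency(word_per_demo, meaning_per_demo):
    """Fraction of demonstrations in which a word co-occurs with the same
    meaning it was first seen with. Comparing this score for (first sign,
    inferred focus) against (first sign, movement group), and likewise for the
    second sign, reveals the syntax: which sign position requests the focus
    operation and which requests the movement type."""
    first_meaning, hits = {}, 0
    for w, m in zip(word_per_demo, meaning_per_demo):
        first_meaning.setdefault(w, m)
        hits += int(first_meaning[w] == m)
    return hits / len(word_per_demo)
```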
During reproduction the interactant performs two hand signs, and the imitator infers which internal cognitive operation and which movement type are requested. The imitator then performs the requested object focus and retrieves the trajectories of the requested movement. Since the imitator has access to only the relevant data in the correct low-dimensional space (the hand trajectory relative to the focused-on object), a simple algorithm for reproduction can be used. For simplicity a simulated robot was used that was able to move its hand in any direction (assuming an inverse kinematics model).
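Finally, a sketch of the reproduction step, again under the assumptions above. The `model` dictionary is hypothetical: it is assumed to hold the learned word centroids, the learned word-to-meaning mappings, one stored object-relative trajectory per movement type, and the learned syntax saying that the first sign requests the focus operation (as in this experiment).

```python
import numpy as np

def nearest_word(sign, word_centroids):
    """Classify a new sign vector as the nearest learned word."""
    return int(np.argmin([np.linalg.norm(np.asarray(sign) - np.asarray(c))
                          for c in word_centroids]))

def reproduce(sign_1, sign_2, objects, model):
    """Reproduction sketch: infer the requested focus operation and movement
    type from the two signs, perform the focus, and replay the retrieved
    object-relative trajectory around the newly focused object."""
    focus_word = nearest_word(sign_1, model["focus_word_centroids"])
    move_word = nearest_word(sign_2, model["movement_word_centroids"])
    # Internal cognitive operation: focus on the requested object.
    focused = np.asarray(objects[model["word_to_object"][focus_word]], float)
    # Retrieve the object-relative trajectory of the requested movement type.
    rel_path = model["movement_templates"][model["word_to_movement"][move_word]]
    # The simulated robot can move its hand in any direction, so the relative
    # trajectory can simply be replayed around the focused object.
    return np.asarray(rel_path, float) + focused
```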
C. Results
Figure 1 shows 36 reproduction attempts in response to 9 different combinations of communicative hand signs. The three combinations that were not observed during the demonstrations were performed as well as the six combinations that were observed, showing the ability to generalize to new situations.

II. CONCLUSION


We have shown that it is possible to simultaneously learn never-before-encountered communicative signs and never-before-encountered movements, without using labeled data, while at the same time learning new compositional associations between movements and signs (and without introducing separate language and action systems). We have also shown that the learnt actions can include unseen internal operations (focus on object) of a demonstrator, under a set of conditions. One condition was that the unseen operation is performed as a predictable response to a part of the context visible to the imitator. Another condition was that the operation results in a state that has a consistent influence on a policy of the demonstrator, which in turn determines actions observable by the imitator. We have further shown that imitating these internal operations results in a policy that generalizes correctly and produces successful reproductions in situations for which there are no demonstrations.

Acknowledgements

This research was partially funded by ERC Grant EXPLORERS 240007.

References

[1] Calinon, S., D'halluin, F., Sauser, E. L., Caldwell, D. G., and Billard, A. G., Learning and reproduction of gestures by imitation: An approach based on Hidden Markov Model and Gaussian Mixture Regression, IEEE Robotics and Automation Magazine, vol. 17, 2010, pp. 44-54.
[2] Mohammad, Y. and Nishida, T., Learning Interaction Protocols using Augmented Bayesian Networks Applied to Guided Navigation, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010.
[3] Tuci, E., Ferrauto, T., Zeschel, A., Massera, G., and Nolfi, S., An experiment on behaviour generalisation and the emergence of linguistic compositionality in evolving robots, IEEE Transactions on Autonomous Mental Development, 2011 (in press).
[4] Ito, M., Noda, K., Hoshino, Y., and Tani, J., Dynamic and interactive generation of object handling behaviors by a small humanoid robot using a dynamic neural network model, Neural Networks, 19(3), 2006, pp. 323-337.
[5] Cederborg, T., Li, M., Baranes, A., and Oudeyer, P.-Y., Incremental Local Online Gaussian Mixture Regression for Imitation Learning of Multiple Tasks, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010.
[6] Steels, L., Experiments on the emergence of human communication, Trends in Cognitive Sciences, 10(8), 2006, pp. 347-349.

Keywords: human-robot interaction, imitation learning, language learning

Conference: IEEE ICDL-EPIROB 2011, Frankfurt, Germany, 24 Aug - 27 Aug, 2011.

Presentation Type: Poster Presentation

Topic: Social development

Citation: Cederborg T and Oudeyer P (2011). Imitating Operations On Internal Cognitive Structures for Language Acquisition. Front. Comput. Neurosci. Conference Abstract: IEEE ICDL-EPIROB 2011. doi: 10.3389/conf.fncom.2011.52.00013

Received: 20 Jun 2011; Published Online: 12 Jul 2011.

* Correspondence: Mr. Thomas Cederborg, Institut National de Recherche en Informatique et Automatique, Bordeaux, France, thomas.cederborg@inria.fr