Machine Learning Methods for High-Level Cognitive Capabilities in Robotics

Editorial

22 October 2019

Editorial: Machine Learning Methods for High-Level Cognitive Capabilities in Robotics

Tadahiro Taniguchi

Emre Ugur

Tetsuya Ogata

Takayuki Nagai

and

Yiannis Demiris

3,114 views

2 citations

Original Research

26 June 2018

SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model

Tomoaki Nakamura

, 1 more and

Tadahiro Taniguchi

6,436 views

56 citations

Example of equivalence between two objects for cleaning a whiteboard: a wiper and an eraser. The robot can clean the whiteboard by wiping it with either a wiper or an eraser.

Original Research

08 June 2018

Affordance Equivalences in Robotics: A Formalism

Mihai Andries

, 3 more and

Luca Maria Gambardella

7,784 views

9 citations

(Top) Samples of KL divergence between the final recognition state and the posterior probability estimated after obtaining only visual information; (Middle) samples of estimated IGm for each object based on visual information (v); and (Bottom) samples of KL divergence between the final recognition state and the posterior probability estimated after obtaining only visual information and each selected action, where as, ah, and h represent auditory information obtained by shaking an object, auditory information obtained by hitting an object, and haptic information, respectively. Our theory of multimodal active perception suggests that the action with the highest information gain (shown in the middle) tends to lead the initial recognition state (whose KL divergence from the final recognition state is shown at the top) to a recognition state whose KL divergence from the final recognition state (shown at the bottom) is the smallest. These figures suggest that the probabilistic relationships were satisfied as a whole.
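The two quantities named in this caption, KL divergence and information gain, can be illustrated with a minimal sketch for discrete distributions. This is not the model from the article: the two-category prior, the outcome likelihoods, and the function names below are hypothetical, and the information gain is computed in its generic form as the expected KL divergence between the posterior after an action and the current belief.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_information_gain(prior, likelihoods):
    """Expected KL(posterior || prior) over the outcomes of one action.

    prior[c] is P(category c); likelihoods[c][o] is P(outcome o | category c).
    """
    n_categories = len(prior)
    n_outcomes = len(likelihoods[0])
    gain = 0.0
    for o in range(n_outcomes):
        # Marginal probability of observing outcome o
        p_o = sum(prior[c] * likelihoods[c][o] for c in range(n_categories))
        if p_o == 0:
            continue
        # Bayes update of the belief given outcome o
        posterior = [prior[c] * likelihoods[c][o] / p_o for c in range(n_categories)]
        gain += p_o * kl_divergence(posterior, prior)
    return gain

# Hypothetical numbers: shaking separates the two categories, hitting does not.
prior = [0.5, 0.5]
actions = {
    "shake": [[0.9, 0.1], [0.1, 0.9]],
    "hit": [[0.5, 0.5], [0.5, 0.5]],
}
best = max(actions, key=lambda a: expected_information_gain(prior, actions[a]))
print(best)
```

Under these assumed numbers the informative action is selected, which mirrors the caption's claim that the action with the highest information gain tends to move the belief closest to the final recognition state.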

Original Research

22 May 2018

Multimodal Hierarchical Dirichlet Process-Based Active Perception by a Robot

Tadahiro Taniguchi

, 1 more and

Toshiaki Takano

6,020 views

14 citations

Original Research

08 November 2018

Neural-Dynamic Based Synchronous-Optimization Scheme of Dual Redundant Robot Manipulators

Zhijun Zhang

, 1 more and

Weisen Fan

4,052 views

7 citations

Original Research

24 July 2018

Acquisition of Viewpoint Transformation and Action Mappings via Sequence to Sequence Imitative Learning by Deep Neural Networks

Ryoichi Nakajo

, 2 more and

Tetsuya Ogata

3,943 views

1 citation

Original Research

13 March 2018

Hierarchical Spatial Concept Formation Based on Multimodal Information for Human Support Robots

Yoshinobu Hagiwara

, 2 more and

Tadahiro Taniguchi

In this paper, we propose a hierarchical spatial concept formation method based on a Bayesian generative model with multimodal information, e.g., vision, position, and word information. Since humans have the ability to select an appropriate level of abstraction according to the situation and describe their position linguistically, e.g., “I am in my home” and “I am in front of the table,” a hierarchical structure of spatial concepts is necessary in order for human support robots to communicate smoothly with users. The proposed method enables a robot to form hierarchical spatial concepts by categorizing multimodal information using hierarchical multimodal latent Dirichlet allocation (hMLDA). Object recognition results from a convolutional neural network (CNN), hierarchical k-means clustering results of self-positions estimated by Monte Carlo localization (MCL), and a set of location names are used as features for vision, position, and word information, respectively. Experiments in forming hierarchical spatial concepts and evaluating how well the proposed method can predict unobserved location names and position categories are performed using a robot in the real world. Results verify that, relative to comparable baseline methods, the proposed method enables a robot to predict location names and position categories closer to predictions made by humans. As an application example of the proposed method in a home environment, we demonstrate a human support robot moving to an instructed place in response to human speech instructions, using the formed hierarchical spatial concepts.

4,850 views

23 citations

Original Research

21 December 2017

Segmenting Continuous Motions with Hidden Semi-Markov Models and Gaussian Processes

Tomoaki Nakamura

, 4 more and

Masahide Kaneko

Humans divide perceived continuous information into segments to facilitate recognition. For example, humans can segment speech waves into recognizable morphemes. Analogously, continuous motions are segmented into recognizable unit actions. People can divide continuous information into segments without using explicit segment points. This capacity for unsupervised segmentation is also useful for robots, because it enables them to flexibly learn languages, gestures, and actions. In this paper, we propose a Gaussian process-hidden semi-Markov model (GP-HSMM) that can divide continuous time series data into segments in an unsupervised manner. Our proposed method consists of a generative model based on the hidden semi-Markov model (HSMM), the emission distributions of which are Gaussian processes (GPs). Continuous time series data is generated by connecting segments generated by the GP. Segmentation can be achieved by using forward filtering-backward sampling to estimate the model's parameters, including the lengths and classes of the segments. In an experiment using the CMU motion capture dataset, we tested GP-HSMM with motion capture data containing simple exercise motions; the results of this experiment showed that the proposed GP-HSMM was comparable with other methods. We also conducted an experiment using karate motion capture data, which is more complex than exercise motion capture data; in this experiment, the segmentation accuracy of GP-HSMM was 0.92, which outperformed other methods.
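The idea of jointly estimating segment lengths and segment classes can be sketched in a heavily simplified form. The code below is not GP-HSMM: it swaps the Gaussian-process emissions for fixed-mean Gaussians, and it replaces forward filtering-backward sampling with a deterministic dynamic program that returns the single most likely segmentation. The class means, the per-segment penalty, and the function names are all assumptions made for illustration.

```python
import math

def log_gauss(x, mean, var=1.0):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def segment(series, class_means, max_len, seg_penalty=1.0):
    """Return the best segmentation as a list of (start, end_exclusive, class).

    seg_penalty is a hypothetical stand-in for a duration/transition prior:
    each new segment costs a fixed amount of log-probability.
    """
    n = len(series)
    best = [-math.inf] * (n + 1)   # best[t]: best score for series[:t]
    back = [None] * (n + 1)        # back[t]: (start, class) of the last segment
    best[0] = 0.0
    for t in range(1, n + 1):
        for length in range(1, min(max_len, t) + 1):
            s = t - length
            for c, mean in enumerate(class_means):
                loglik = sum(log_gauss(x, mean) for x in series[s:t])
                score = best[s] + loglik - seg_penalty
                if score > best[t]:
                    best[t] = score
                    back[t] = (s, c)
    # Trace the stored choices back from the end of the series
    segments = []
    t = n
    while t > 0:
        s, c = back[t]
        segments.append((s, t, c))
        t = s
    return segments[::-1]

# Toy 1-D series: near 0, then near 5, then near 0 again.
series = [0.1, -0.2, 0.0, 5.1, 4.9, 5.0, 5.2, 0.1, 0.0]
print(segment(series, class_means=[0.0, 5.0], max_len=5))
```

In the method described in the abstract, the segment boundaries and classes are sampled rather than maximized, and the emission model for each class is learned from data instead of being fixed in advance.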

8,539 views

51 citations

An example sequence that represents the flag task. Each vertical broken line indicates the end of an episode. (Top) An instruction is given as a succession of words, each of which is represented as a one-hot vector. In the waiting and action-generation phases, zero-filled vectors are given. (Middle) Visual information is continuously given as a sequence of three-element (R, G, B) vectors. The flag colors can be changed randomly just after action generation. Because this task was numerically simulated on a computer, changes in flags were represented as instantaneous changes in values. Note that flags are sometimes not changed, as in the case from the first episode to the second episode in this figure. (Bottom) Each action immediately follows an instruction.
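The instruction channel described in this caption can be sketched as follows: words become one-hot vectors, and the remaining steps of the episode are zero-filled. The vocabulary, the phase lengths, and the ordering of phases within an episode are hypothetical; the caption does not specify them.

```python
def one_hot(index, size):
    """Return a list of zeros with a single 1 at the given index."""
    vec = [0] * size
    vec[index] = 1
    return vec

def encode_episode(words, vocab, n_wait, n_action):
    """Encode one episode of the instruction channel.

    Instruction words are one-hot vectors; the waiting and action-generation
    steps are zero-filled. Placing them after the instruction is an assumption.
    """
    size = len(vocab)
    instruction = [one_hot(vocab.index(w), size) for w in words]
    silence = [[0] * size for _ in range(n_wait + n_action)]
    return instruction + silence

# Hypothetical vocabulary for illustration only.
vocab = ["raise", "lower", "red", "blue", "and", "or"]
episode = encode_episode(["raise", "red"], vocab, n_wait=2, n_action=3)
for step in episode:
    print(step)
```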

Original Research

22 December 2017

Representation Learning of Logic Words by an RNN: From Word Sequences to Robot Actions

Tatsuro Yamada

, 2 more and

Tetsuya Ogata

6,459 views

9 citations

Original Research

19 December 2017

Cross-Situational Learning with Bayesian Generative Models for Multimodal Category and Word Learning in Robots

Akira Taniguchi

, 1 more and

Angelo Cangelosi

In this paper, we propose a Bayesian generative model that can form multiple categories based on each sensory-channel and can associate words with any of the four sensory-channels (action, position, object, and color). This paper focuses on cross-situational learning using the co-occurrence between words and information of sensory-channels in complex situations rather than conventional situations of cross-situational learning. We conducted a learning scenario using a simulator and a real humanoid iCub robot. In the scenario, a human tutor provided a sentence that describes an object of visual attention and an accompanying action to the robot. The scenario was set as follows: the number of words per sensory-channel was three or four, and the number of trials for learning was 20 and 40 for the simulator and 25 and 40 for the real robot. The experimental results showed that the proposed method was able to estimate the multiple categorizations and to learn the relationships between multiple sensory-channels and words accurately. In addition, we conducted an action generation task and an action description task based on word meanings learned in the cross-situational learning scenario. The experimental results showed that the robot could successfully use the word meanings learned by using the proposed method.
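The co-occurrence signal that cross-situational learning relies on can be shown with a deliberately simple counting sketch. This is not the Bayesian generative model proposed in the article: it only counts how often each word appears together with each category of each sensory channel and picks the most frequent pairing. The trials, the words, and the function name are hypothetical, and only three of the four channels are used to keep the example short.

```python
from collections import Counter, defaultdict

def learn_word_meanings(trials):
    """Map each word to the (channel, category) pair it co-occurs with most.

    Each trial is (words, observed), where observed maps a sensory channel
    to the category perceived in that trial.
    """
    counts = defaultdict(Counter)
    for words, observed in trials:
        for word in words:
            for channel, category in observed.items():
                counts[word][(channel, category)] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

# Hypothetical trials: each sentence is ambiguous on its own, but the
# ambiguity resolves across situations.
trials = [
    (["grasp", "red", "box"], {"action": "grasp", "color": "red", "object": "box"}),
    (["grasp", "blue", "cup"], {"action": "grasp", "color": "blue", "object": "cup"}),
    (["push", "red", "cup"], {"action": "push", "color": "red", "object": "cup"}),
    (["push", "blue", "box"], {"action": "push", "color": "blue", "object": "box"}),
]
meanings = learn_word_meanings(trials)
for word, meaning in sorted(meanings.items()):
    print(word, meaning)
```

A single trial cannot tell the learner whether "red" refers to the color, the object, or the action; only the statistics across trials do. The generative model in the abstract additionally forms the categories themselves from sensory data, which this sketch takes as given.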

7,062 views

24 citations