Endowing machines with the ability to represent and understand the physical world they live in is a longstanding challenge in the AI research community. Recent years have seen significant advances in the fields of Natural Language Processing (NLP) and Computer Vision (CV), as well as in the development of robotic hardware and accompanying algorithms. Even though these fields are among the most actively developing areas of AI research, until recently they were treated separately, with few ways to benefit from each other. Yet integrating verbal and non-verbal communication is fundamental to accounting for the multimodal nature of communication. Only now, with the expansion of deep learning approaches, have researchers started exploring the possibilities of jointly applying NLP and CV to improve robotic capabilities.
This Research Topic is intended to provide an overview of the research being carried out in both NLP and CV to allow robots to learn and improve their capabilities for exploring, modeling, and learning about the physical world. As this integration requires an interdisciplinary approach, the Research Topic aims to gather researchers with broad expertise in various fields (machine learning, computer vision, natural language processing, neuroscience, and psychology) to discuss their cutting-edge work as well as perspectives on future directions in this exciting space of language, vision, and interaction in robots.
The interests of this topic focus on (but are not limited to) the following problems:
i) how to jointly represent verbal and visual information in a robotic system;
ii) how to learn and progressively improve communicative and multimodal skills, interactively or autonomously;
iii) how to answer questions while also integrating visual stimuli;
iv) how to detect sentiments and emotions using language, gestures, poses, movements, and facial expressions;
v) how to efficiently perform NLP and CV on-robot without sacrificing the quality of models run on servers;
vi) how to perform AI system hardware/software co-design in robots;
vii) how to enable cooperation mechanisms among robots to integrate complementary multimodal skills;
viii) how to evaluate the quality of human-robot interactions.
Original contributions addressing these issues are sought, covering the full range of theoretical and practical aspects, technologies, and systems.