After a successful but text-centered period, AI, computational linguistics, and natural language engineering need to face the ecological niche of natural language use: face-to-face interaction. A particular challenge of human processing in face-to-face interaction is that it is fed by information from various sensory modalities: it is multimodal. When talking to each other, we constantly and smoothly observe and produce information on several channels, such as speech, facial expressions, hand-and-arm gestures, and head movements. Furthermore, at least some of the concepts associated with the words used in communication are themselves grounded in perceptual information. As a consequence, multimodal communication is, as a rule, characterized by different kinds of representations that are not fully equivalent and need to be integrated (in perception) or distributed (in production). This, however, characterizes multimodal computing in general. When driving, for instance, information from the visual scene, radio reports about traffic conditions, and spatio-temporal map knowledge are integrated to maneuver the car. Hence, AI, computational linguistics, and natural language engineering that address multimodal communication in face-to-face interaction have to involve multimodal computing -- giving rise to the next grand research challenge of these and related fields.
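To make the integration step concrete, the following is a minimal late-fusion sketch in Python: toy fixed-size feature vectors stand in for per-modality encoders, and fusion is a simple concatenation plus projection into a shared space. The encoder function, dimensionalities, and weight matrices are illustrative assumptions, not a proposed architecture.

```python
# Minimal late-fusion sketch (illustrative only): toy fixed-size feature
# vectors stand in for per-modality encoders; fusion concatenates the
# modality embeddings and projects them into a shared representation space.
import numpy as np

rng = np.random.default_rng(0)

def encode(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Stand-in for a modality-specific encoder (e.g., speech or gesture)."""
    return np.tanh(weights @ features)

# Hypothetical dimensionalities for three input channels.
speech  = rng.normal(size=40)   # e.g., acoustic features
gesture = rng.normal(size=20)   # e.g., hand-and-arm motion features
face    = rng.normal(size=30)   # e.g., facial expression features

W_speech, W_gesture, W_face = (rng.normal(size=(16, d)) for d in (40, 20, 30))
W_fuse = rng.normal(size=(16, 48))  # three 16-dim embeddings, concatenated

embeddings = [encode(x, W) for x, W in
              [(speech, W_speech), (gesture, W_gesture), (face, W_face)]]
fused = np.tanh(W_fuse @ np.concatenate(embeddings))  # integrated percept
print(fused.shape)  # (16,)
```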
This Research Topic contributes to the germinating renaissance of understanding languages as multimodal interaction systems which, beyond grammatical structure, make rich use of interfaces between symbolic, deictic, and depicting structures and representations. To further this understanding, the Research Topic will collect contributions from the fields of linguistics, computational linguistics, and computer science that shed light on how linguistic and non-linguistic structures form multimodal ensembles as the basis of language production and understanding. We therefore welcome contributions from all areas of linguistics, computational linguistics, and related disciplines that advance this paradigm shift.
The Frontiers Research Topic invites contributions on all aspects of the interface of Multimodal Communication and Multimodal Computing. We seek research spanning from computationally oriented studies on the form and function of multimodal signaling and behavior to novel algorithmic approaches in hardware or software, and from theoretical essays to industrial applications. The challenges that the topic of Multimodal Communication and Multimodal Computing has to face include:
- multimodal representations for representation learning
- multimodal active learning
- memory and multimodal integration
- 4E-cognition
- neural-to-symbolic AI
- spatial representation and reasoning
- the creation and exploitation of multimodal datasets
- the design and computational use of experimental studies on verbal and non-verbal communication
- representation formats and tools for annotating multimodal data (see the annotation sketch after this list)
- visual reasoning
- diagrammatic reasoning
- applications: question answering, multimodal distributional semantics, and others
- cognitive architectures for multimodal processing
- neural models and multimodal input
- multimodal generation and fission, or binding and disentanglement
- text2scene
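As an illustration of what a representation format for annotating multimodal data might minimally involve, here is a sketch of a time-aligned tier structure in Python. The tier names, fields, and query method are hypothetical assumptions for exposition; established tools (e.g., ELAN) define far richer annotation schemes.

```python
# Minimal sketch of a time-aligned multimodal annotation record
# (illustrative only; tier names and fields are hypothetical, not an
# existing standard or annotation scheme).
from dataclasses import dataclass, field

@dataclass
class Span:
    start_ms: int   # onset relative to recording start
    end_ms: int     # offset
    tier: str       # e.g., "speech", "gesture", "gaze"
    label: str      # annotation value on that tier

@dataclass
class MultimodalRecord:
    recording_id: str
    spans: list[Span] = field(default_factory=list)

    def overlapping(self, t_ms: int) -> list[Span]:
        """All annotations active at time t, the raw material for
        studying cross-modal alignment (e.g., gesture-speech synchrony)."""
        return [s for s in self.spans if s.start_ms <= t_ms < s.end_ms]

rec = MultimodalRecord("dialogue_01")
rec.spans += [
    Span(1200, 1650, "speech", "that one"),
    Span(1100, 1700, "gesture", "deictic:point-right"),
]
print([s.label for s in rec.overlapping(1300)])  # both modalities co-active
```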