AUTHOR=Moore Roger K., Nicolao Mauro
TITLE=Toward a Needs-Based Architecture for ‘Intelligent’ Communicative Agents: Speaking with Intention
JOURNAL=Frontiers in Robotics and AI
VOLUME=4
YEAR=2017
URL=https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2017.00066
DOI=10.3389/frobt.2017.00066
ISSN=2296-9144
ABSTRACT=

The past few years have seen considerable progress in the deployment of voice-enabled personal assistants, first on smartphones (such as Apple’s Siri) and most recently as standalone devices in people’s homes (such as Amazon’s Alexa). Such ‘intelligent’ communicative agents are distinguished from the previous generation of speech-based systems in that they claim to offer access to services and information via conversational interaction (rather than simple voice commands). In reality, conversations with such agents have limited depth and, after initial enthusiasm, users typically revert to more traditional ways of getting things done. It is argued here that one source of the problem is that the standard architecture for a contemporary spoken language interface fails to capture the fundamental teleological properties of human spoken language. As a consequence, users have difficulty engaging with such systems, primarily due to a gross mismatch in intentional priors. This paper presents an alternative needs-driven cognitive architecture which models speech-based interaction as an emergent property of coupled hierarchical feedback-control processes in which a speaker has in mind the needs of a listener and a listener has in mind the intentions of a speaker. The implications of this architecture for future spoken language systems are illustrated using results from a new type of ‘intentional speech synthesiser’ that is capable of optimising its pronunciation in unpredictable acoustic environments as a function of its perceived communicative success. It is concluded that such purposeful behaviour is essential to the facilitation of meaningful and productive spoken language interaction between human beings and autonomous social agents (such as robots). However, it is also noted that persistent mismatched priors may ultimately impose a fundamental limit on the effectiveness of speech-based human–robot interaction.