- 1 Human-Technology-Systems Group, University of Würzburg, Würzburg, Germany
- 2 Human-Computer Interaction Group, University of Würzburg, Würzburg, Germany
Artificial Intelligence (AI) covers a broad spectrum of computational problems and use cases. Many of these raise profound and sometimes intricate questions of how humans interact or should interact with AIs. Moreover, many current and future users have only vague ideas of what AI is, and their perceptions depend significantly on the specific embodiment of AI applications. Human-centered design approaches would suggest evaluating the impact of different embodiments on human perception of and interaction with AI, an approach that is difficult to realize due to the sheer complexity of application fields and embodiments in reality. However, here XR opens new possibilities for researching human-AI interactions. The article’s contribution is twofold: First, it provides a theoretical treatment and model of human-AI interaction based on an XR-AI continuum as a framework for, and a perspective on, different approaches to XR-AI combinations. It motivates XR-AI combinations as a method to learn about the effects of prospective human-AI interfaces and shows why the combination of XR and AI fruitfully contributes to a valid and systematic investigation of human-AI interactions and interfaces. Second, the article provides two exemplary experiments investigating the aforementioned approach for two distinct AI systems. The first experiment reveals an interesting gender effect in human-robot interaction, while the second experiment reveals an Eliza effect of a recommender system. Here the article introduces two paradigmatic implementations of the proposed XR testbed for human-AI interactions and interfaces and shows how a valid and systematic investigation can be conducted. In sum, the article opens new perspectives on how XR benefits human-centered AI design and development.
Introduction
Artificial Intelligence (AI) today covers a broad spectrum of application use cases and the associated computational problems. Many of these raise profound and sometimes intricate questions of how humans interact or should interact with AIs. The continuous proliferation of AIs and AI-based solutions into more and more areas of our work and private lives also significantly extends the potential range of users in direct contact with these AIs.
There is an open and ongoing debate on the media competencies or, even more, the computer science competencies required of users of computer systems. Digital literacy (the competencies needed to use computational devices (Bawden and others, 2008)) and computational literacy (the ability to use code to express, explore, and communicate ideas (DiSessa, 2001)) have lately been extended to also include AI literacy, denoting competencies that enable individuals to critically evaluate AI technologies, communicate and collaborate effectively with AI, and use AI as a tool online, at home, and in the workplace (Long and Magerko, 2020). This debate is rooted deep in the progress of the digital revolution over the past decades. AI brings an exciting flavor to this debate since it risks significantly amplifying the digital divide for certain (groups of) individuals just by the implicit connotation of the term. For one, AI’s implicit claim to replicate human intelligence can be attributed to the term “Artificial Intelligence” itself, which John McCarthy proposed for the famous Dartmouth conference in 1956. Some researchers still consider the term ill-posed to begin with, and the history of AI records alternatives with less loaded associations; see contemporary textbooks on AI, e.g., by Russell and Norvig (Russell and Norvig, 2020).
Nevertheless, AI applications are now omnipresent, the term AI is commonly used for the field, and the term certainly carries far-reaching connotations for many users who are not experts in the specific field of AI or in computer science in general. Additionally, the reception and presentation of AI by mainstream media, e.g., in movies and other works of fiction, has undoubtedly contributed to shaping a very characteristic AI profile (Kelley et al., 2019; Zhang and Dafoe, 2019). This public understanding often sketches an at least skewed image of the principles, potentials, and risks of AI (see Figure 1). As Clarke pointed out in his often quoted third law, “Any sufficiently advanced technology is indistinguishable from magic” (Clarke, 1962). If the latter could already be observed for computing applications as simple as a spreadsheet, or for more complex examples of Information Technology (IT) like “the Internet”, it seems even more likely to be true for assistive devices that listen to our voices and speak to us in our native tongue, for robots that operate in the same physical realm as us and which, for a naïve observer, seem to be alive, for self-driving cars, and for many more incarnations of modern AI systems. From a Human-Computer Interaction (HCI) perspective, it is of utmost importance to understand and investigate if, and if so, how the human user perceives the AI she is interacting with. Moreover, most AI systems will incorporate a human-computer interface. By interface we here denote the space where the interaction between human and machine takes place, including all hardware and software components and the underlying interaction concepts and styles.
FIGURE 1. Examples of movie presentations of AI embodiments. From upper left to lower right: Dallas (Tom Skerritt) talking to Mother, the ship’s AI (all around him in the background), via a terminal in Alien (Scott, 1979); the famous red eye of HAL 9000, the AI in 2001: A Space Odyssey (Kubrick, 1968), which later follows its own agenda; the philosophical debate between Doolittle (Brian Narelle) and the reeled-out Bomb #20, a star-killing device, about why it should not detonate on potentially false evidence in Dark Star (Carpenter, 1974); the threatening T-800 stepping out of the fire to hunt down its human prey in the dystopian fiction The Terminator (Cameron, 1984); Frank talking to his domestic robot, who later becomes his partner in crime, in Robot & Frank (Schreier, 2012); two replicants, bio-engineered synthetic humans, in Blade Runner (Scott, 1982): Rachael (Sean Young), who does not know about her real form of existence and operating/lifetime expectation, and Roy Batty (Rutger Hauer), philosophizing about the essence of life before he dies at the end of his operating/lifetime expectation. Screenshots made by the authors.
The appearance of an AI at the interface can range from simple, generally pervasive effects like the execution of a requested operation, such as switching on a lamp in a smart home appliance, to simple text displays, to humanoid and human-looking robots or virtual agents trying to mimic real persons with communicative behaviors typically associated with real humans. Often, AIs will appear to the user at the interface with some sort of embodiment, such as a specific device like a smart speaker, as a self-driving car, or as a humanoid robot or a virtual agent appearing in Virtual, Augmented, or Mixed Reality (VR, AR, MR: XR, for short). The latter is specifically interesting since the embodied aspect of the interface is mapped to the virtual world in contrast to the real physical environment. Embodiment itself has interesting effects on the user, such as the well-known uncanny valley effect (Mori et al., 2012) or the Proteus effect (Yee and Bailenson, 2007). The latter describes a change of behavior caused by a modified and perceived self-representation, such as the appearance of a self-avatar. First results indicate that this effect also exists regarding the perception of others (Latoschik et al., 2017), which can be hypothesized to also apply to the human-AI interface.
The media equation postulates that the sheer interaction itself already contributes significantly to the perception and, more specifically, an anthropomorphization of technical systems by users (Reeves and Nass, 1996a). In combination with an already potentially shallow or even skewed understanding of AI by some users, i.e., a sketchy AI literacy, or an unawareness of interacting with an AI by others, and the huge design space of potential human-AI interfaces, it seems obvious that we need a firm understanding of the potential effects the developers’ choice of the appearance of human-AI interfaces has on users. Human-centered design approaches would suggest evaluating the impact of different embodiments on human perception of and interaction with AIs and identifying the effects these manipulations would have on users. This approach, which is central to the principles of HCI, is however difficult to realize due to the large design space of embodiments and the sheer complexity of many application fields in the real physical world. Another rising technology cluster, XR, allows for control and systematic manipulation of complex interactions (Blascovich et al., 2002; Wienrich and Gramlich, 2020; Wienrich et al., 2021a). Hence, XR provides much potential to increase the investigability of human-AI interactions and interfaces.
In sum, a real-world embodiment of an AI might need considerable resources, e.g., when we think of humanoid robots or self-driving cars. In turn, systematic investigations are essential for an evidence-based human-centered AI design, the design and evaluation of explainable AIs and tangible training modules, and basic research of human-AI interfaces and interactions. The present paper suggests and discusses XR as a new perspective on the XR-AI combination space and as a new testbed for human-AI interactions and interfaces by raising the question:
How can we establish valid and systematic investigation procedures for human-AI interfaces and interactions?
Four sub-questions structure the first part of the article. Theoretical examinations of these questions contribute to a new perspective on the XR-AI combination space on the one hand and a new testbed for human-AI interactions and interfaces on the other (Table 1).
TABLE 1. Summarizes the research questions and corresponding contributions of the article’s first part.
The second part of the present article introduces two paradigmatic implementations of the proposed XR testbed for human-AI interactions and interfaces. An XR environment simulated interactive and embodied AIs (Experiment 1: a conversational robot, Experiment 2: a recommender system) to evaluate the perception of the AI and of the interaction as a function of various AI embodiments (Table 2).
TABLE 2. Summarizes the experimental approach and corresponding contributions of the article’s second part.
A New Perspective on the XR-Artificial Intelligence Combination Space
How Is Human-Artificial Intelligence Interaction Defined?
Human-AI interaction (HAII) has its roots in the more general concept of HCI, since we here assume that some sort of computing machinery realizes an artificial intelligence. Hence human-AI interaction is a special form of HCI where the AI is a special incarnation of a computer system. Note that this definition does not distinguish between a hardware and a software layer, but uses the term computer system in the general sense, combining both aspects of hardware and software together to constitute a system that interacts with the user.
The second aspect to clarify is what precisely we mean by the term Artificial Intelligence or AI. As we have briefly noted in the introduction, the term has a long history going back at least to the Dartmouth conference in 1956. Typical definitions of AI usually incorporate the aspect of artificiality, referring to a machine or computer system as the executor of some sort of simulation or process. Some type of intelligence can then be attributed to this process, either in the relationship between its input and output or in its internal functioning or model of operation, which might try to mimic the internal functioning of a biological entity one considers intelligent. Note that in these types of definitions, the term intelligence is not defined but often implicitly refers to human cognitive behaviors.
Russell and Norvig demarcate AI as a separate field of research and application from other fields like mathematics, control theory, or operations research by two descriptions (Russell and Norvig, 2020): First, they state that from the beginning, AI included the concept of replicating human capabilities like creativity, self-dependent learning, or utilization of speech. Second, they point to the employed methods, which first and centrally are rooted in computer science: AI uses computing machines as the executors of some creative or intelligent processes, i.e., processes that allow the machine to autonomously operate in complex, continuously changing environments. Both descriptions cover a good number of AI use cases. However, they are not comprehensive enough to cover all use cases, unless one interprets complex, continuously changing environments very broadly. We therefore extend this circumscription of AI with processes that adopt computational models of intelligent behavior to solve complex problems that humans are not able to solve due to the sheer quantity and/or complexity of input data, as is typical in big data and data mining.
To further clarify what we mean by human-AI interaction, we use the term AI in human-AI to denote the incarnation of a computing system incorporating AI capabilities as described above. For the remainder of this paper, the context should clarify whether we talk about AI as the field of research or AI as such an incarnation of an intelligent computing system, and we will precisely specify what we mean where it might be ambiguous. Note that this ambiguity between AI as a field and AI as a system already indicates the tendency to attribute particular capabilities and human-like attributes to such a system and potentially see it as an independent entity that we can interact with. The latter is not specific to an AI but can already be noticed for non-AI computer systems, where users tend to attribute independent behavior to the system, specifically if something is going wrong. Typical examples of user reactions when calling a support line, like “He doesn’t print” or “He is not letting me do this”, indicate this tendency and are usually interpreted as examples of the media equation (Reeves and Nass, 1996b; Nass and Gong, 2000; Wienrich et al., 2021b). However, we want to stress the point that such effects might be amplified by AI systems due to their, in principle, rather complex behaviors and, as we will discuss in the next sections, due to the general character and appearance of an AI’s embodiment.
How Can Combinations of XR and Artificial Intelligence Be Classified? The XR-Artificial Intelligence Continuum
As we pointed out, there are various use cases for, and also incarnations of, AIs, and in this context, we do not want to restrict the kinds of AIs in human-AI interaction any further. As we have seen, AIs are specialized computer systems, either by their internal workings or by their use case and capabilities. As such, if they do not operate 100% independently of humans but serve a role, task, or function, there will be the need to interact with human users. For some AI applications, it seems more straightforward to think about forms of embodiment for an AI, e.g., robots, conversational virtual agents, or smart speakers (examples in Figure 2). However, if there is a benefit of AI embodiment for the user, e.g., if it helps to increase the usability or user experience (UX), or if it helps the user to gain a deeper understanding of an AI’s function and capabilities, then we should consider extending the idea of AI embodiment to more use cases of AIs, from expert systems to data science to self-driving cars. However, appearance matters, i.e., the kind of embodiment has a significant impact on human perception and acceptance. From an HCI perspective, it is of utmost importance to understand and investigate how an AI’s appearance influences the human counterpart. Only when we understand this influence can we scientifically contribute to a user-centered AI design process that responsively considers different user groups. Note that the term embodiment is not restricted to self-avatars representing humans in a virtual environment. We follow the common understanding of embodiment, which also addresses “others” in social encounters (also in conjunction with the term other-avatar). The complexity of human-AI interactions and interfaces challenges such investigations (see below). XR offers promising potential to meet these challenges.
FIGURE 2. Examples of XR-AI integrations. From upper left to lower right: A user interacting with an intelligent virtual agent to solve a construction task (Latoschik, 2005) and an interaction with an agent actor in Madame Bovary, an interactive intelligent storytelling piece (Cavazza et al., 2007), both in a CAVE (Cruz-Neira et al., 1992); virtual agents in Augmented Reality (AR) (Obaid et al., 2012) and in Mixed Reality (MR) (Kistler et al., 2012); speech and gesture interaction in a virtual construction scenario in front of a power wall (Latoschik and Wachsmuth, 1998) and in a CAVE (Latoschik, 2005); multimodal interactions in game-like scenarios, fully immersed using a Head-Mounted Display (HMD) (Zimmerer et al., 2018b) and placed at an MR tabletop (Zimmerer et al., 2018a).
Intelligent Graphics is about visually representing the world and visually representing our ideas. Artificial intelligence is about symbolically representing the world, and symbolically representing our ideas. And between the visual and the symbolic, between the concrete and the abstract, there should be no boundary. (Lieberman, 1996)
Lieberman’s quote describes a central paradigm that combines AI with computer graphics (CG). Its focus on symbolic AI methods seems limited today. However, in the late nineties of the last century, the surge and success of machine learning and deep learning approaches were yet to come. Hence the quote should be seen as a general statement about the combination of AI and CG. Today, intelligent graphics, or synonymously smart graphics, refers to a wide variety of application scenarios. These range from the intelligent and context-sensitive arrangement of graphical elements in 2D desktop systems to speech-gesture interfaces or intelligent agents in virtual environments as assistants to users. All these approaches have in common that a graphical human-computer interface is adapted to the user’s cognitive characteristics with the help of AI processes to improve its operation (Latoschik, 2014).
The combination of AI and XR, more specifically of Artificial Intelligence and artificial life techniques with those of virtual environments, has been denoted by Aylett and Luck (2000) as Intelligent Virtual Environments. They specifically concentrated on autonomous, physical, and cognitive agents and argued that “embodiment may be as significant for virtual agents as they are for real agents.” They proposed a spectrum between physical and cognitive and identified autonomy as an important quality for such virtual entities. We here argue that the combination of XR and AI is significantly broader and propose the XR-AI continuum (see Figure 3).
FIGURE 3. The XR-AI Continuum classifies potential XR-AI combination approaches with respect to the main epistemological perspective motivating the combination. The scale’s poles denote approaches that purely target AI as the object of investigation (XR 4 AI) or that target AI as an enabling technology (AI 4 XR). Well-known research areas which utilize XR-AI approaches are depicted with their potential mapping onto the scale. Notably, approaches will often serve both perspectives to various degrees, mapping them to the respective positions between the poles. The established fields and terms of intelligent virtual environments and smart graphics/intelligent graphics will mostly cover the left spectrum of the continuum. In contrast, the right spectrum, which concentrates on AI as the object of investigation, is covered by approaches we denote as eXtended Artificial Intelligence.
The XR-AI continuum is spanned between two endpoints defined by the overall epistemological perspective and goal of a given XR-AI combination. The continuum ranks XR-AI combinations with respect to the general question of what we want to achieve by a given XR-AI combination, i.e., the epistemological motive. Are we using AI as an enabling technology to improve an XR system, e.g., to realize certain AI-supported functionalities, user interfaces, and/or to improve the overall usability? Or do we use XR as a tool to investigate AI? The first perspective is typical for the majority of early approaches to XR-AI combinations, e.g., as described by intelligent virtual environments, intelligent real-time interactive systems, or, with less focus on immersive and highly interactive displays, as described by smart or intelligent graphics (Latoschik, 2014). While Aylett and Luck mainly focused on AI as an enabling technology to improve the virtual environment (Aylett and Luck, 2000), the XR-AI continuum also highlights how XR technologies provide a new investigability of HAII.
The proposed XR-AI continuum, with its explicit perspective of the investigability of HAII circumscribed by the eXtended AI approaches, combines current developments and results from XR and AI. Recent work has motivated investigations of how humans react to AIs during interaction. These approaches include investigations of verbal interaction (Fraser et al., 2018; Rieser, 2019) and its influence on the tendency to anthropomorphize (Strathmann et al., 2020), and of behavior design (Azmandian et al., 2019) including emotional intelligence (Fan et al., 2017). Kulms and Kopp (2016) explicitly target embodiment effects. There are some early examples motivating the methodological aspects of investigations of HAII, though they concentrate on emulating investigability with desktop applications (Zhang et al., 2010; Bickmore et al., 2013; Mattar et al., 2015). However, none of these approaches utilize the increased design and effect space of XR, e.g., to provide situated and/or embodied interactions in a simulated real-like context. There is strong evidence that media-related characteristics like interaction, immersion (inclusive, extensive, surrounding, and vivid) (Slater and Wilbur, 1997), or plausibility (Slater, 2009; Skarbez et al., 2017; Latoschik and Wienrich, 2021) have a huge impact on investigated and expected target effects, for example in terms of emotional response and/or embodiment (Yee and Bailenson, 2007; Gall and Latoschik, 2018; Waltemate et al., 2018).
More recently, Antakli et al. (2018) presented a virtual simulation for testing human-robot interaction. However, the authors only briefly touched on the entire design space and the advantages of an XR testbed as proposed in the present paper. Very recently, Sterna et al. (2021) pointed to the lack of pretesting in the VR community. They presented preliminary work on a web-based tool for pretesting virtual agents. Again, the application does not allow situated or embodied interaction in a real-like context.
Two recently published contributions point to a lack of research on the right side of our continuum. Ospina-Bohórquez et al. (2021) analyzed the interplay between VR and multi-agent systems in a structured literature review. It answered two research questions: “What applications have been developed with Multi-agent systems in the field of Virtual Reality?” and “How does Virtual Reality benefit from the use of Multi-agent systems?”. The analysis revealed fruitful combination and application areas. Remarkably, the search also demonstrated that most combinations are intended to make the virtual environment more intelligent, referring to the left side of our continuum. The authors conclude that there is a lack of research investigating more complex simulations to examine the user-avatar relationship on the one hand. On the other hand, they state that “[…] it would be necessary to include the mental state of agents (emotions, personality, etc.) as part of the agent’s perception process, since these factors influence human attention processes in the real world” (p. 17). Both desiderata hint at the significance of the right side of our continuum. Further, Fitrianie et al. (2020) pointed to a methodological crisis in the evaluation of artificial social agents. They discuss that most studies use different approaches to investigate human-agent interactions, resulting in a lack of comparability and replicability. They contributed to a solution by identifying constructs and questionnaire items to make the research on virtual agents more comparable and replicable.
In sum, much research can be assigned to the left side of the XR-AI Continuum, while using XR as an investigation method for AI needs further research.
The following sections discuss why the extension is necessary and how it contributes to a better understanding of human-AI interactions and interfaces.
XR as a New Testbed for Human-AI Interactions and Interfaces
What Can We Learn From the Challenges and XR Solutions Concerning the Investigability of Human-Human Interaction?
In psychology, interaction is defined as a dynamic sequence of social actions between individuals (or groups) who modify their actions and reactions due to actions by their interaction partner(s) (Jonas et al., 2014).
Researchers studying individual differences in human-human social interactions face the challenge of keeping the behavior and appearance of the interaction partner constant, or varying them systematically, across participants (Hatfield et al., 1992). Even slightly different behaviors and appearances influence participants’ behavior (Congdon and Schober, 2002; Topál et al., 2008; Kuhlen and Brennan, 2013). For investigating social interactions between humans, the potentials of XR are already recognized (Blascovich, 2002; Blascovich et al., 2002). Using virtual humans provides high ecological validity and high standardization (Bombari et al., 2015; Pan and Hamilton, 2018). In addition, using a virtual simulation of interaction enables researchers to easily replicate studies, which is essential for social psychology, in which replication is lacking (Blascovich et al., 2002; Bombari et al., 2015; Pan and Hamilton, 2018). Another advantage of using XR to study human-human interactions is that situations and manipulations that would be impossible in real life can be created (Bombari et al., 2015; Pan and Hamilton, 2018). Many studies substantiated XR’s applicability and versatility to simulate and investigate social interaction between (virtual) humans (Blascovich, 2002; Bombari et al., 2015; Wienrich et al., 2018a, 2018b). Many of them showed the significant impact of different self-embodiments on self-perception, known as the Proteus effect (Yee and Bailenson, 2007; Latoschik et al., 2017; Ratan et al., 2019). Recent results show that the Proteus effect caused by self-avatars also applies to the digital counterparts (the avatars) of others. Others demonstrated how XR potentials are linked to psychological variables (Wienrich and Gramlich, 2020; Wienrich et al., 2021a).
Which Challenges and Solutions Arise for the Systematic Investigation of Human-AI Interaction and Interfaces?
In HCI, and hence in HAII, at least one interacting partner is a human, and at least one partner is constituted by a computing system or an AI, respectively. However, the focus has traditionally been less on the social aspects than on the task level or the pragmatic quality of the interaction. Notably, social aspects of the interaction have recently attracted more and more interest in HCI (Carolus and Wienrich, 2019). This trend is becoming even more relevant for HAII due to the close resemblance of certain HAII properties to human intelligence (Carolus et al., 2019).
Consequently, AI applications becoming increasingly interactive and embodied leads to essential changes from an HCI perspective:
1) Interactive embodied AI changes the interface conceptualization from an artificial tool/device into an artificial (social) counterpart.
2) Interactive embodied AI changes usage by skilled experts into usage by diverse user groups.
3) Interactive embodied AI changes applications in specific domains into diverse (everyday) domain applications.
4) Additionally, the penetration of AI into almost every domain of life also changes the consequences of lacking acceptance and misperceptions. While a lack of acceptance and misperceptions resulted in usage avoidance in the past, in the future avoidance passes into incorrect usage with considerable consequences.
Thus, AI applications are becoming increasingly interactive and embodied, leading to the question of how researchers can study individual differences in human-AI interactions and the impact of different AI embodiments on human perception of and interaction with AI. Similar to human-human interaction, we face the challenge of keeping the behavior and appearance of the artificial interaction partner constant, or varying them systematically, across participants. Such an approach is challenging to realize due to the sheer complexity of application fields and embodiments in reality. Similar to human-human interaction, interactive and embodied AIs lead to considerable challenges for the systematic investigability of human-AI interactions. However, XR opens up four essential potentials, shown in Table 3.
TABLE 3. Describes the four potentials of XR as a new testbed for human-AI interactions and interfaces. Each potential can be realized by more or less complex and realistic prototypes of HAII.
From an HCI perspective, eXtended AI (the right side of the XR-AI continuum) constitutes a variant of rapid prototyping for HAII (Table 3). Rapid prototyping includes methods to quickly fabricate a scale model of a physical part or to show the function of a software product (Yan and Gu, 1996; Pham and Gault, 1998). The key point is that users interact with a pre-stage or a simulation instead of the fully developed product. Such methods are essential for iterative user-centered design processes because they supply user insights in the early stages of development processes (Razzouk and Shute, 2012). Computer-aided design (CAD), Wizard-of-Oz, mock-ups, dark horse prototyping, or the Eliza principle are established rapid prototyping methods. XR as a testbed for HAII allows for rapid prototyping of interactive and embodied AIs, for complex interactions, and in different development stages, to understand users’ mental models about AI and to predict interaction paths and reactions; a minimal sketch of such a lightweight mock-up is given below. Besides, multimodal interactions and analyses can be quickly realized and yield interesting results for design, accessibility, versatility, and training effects. Of course, XR technology faces some challenges regarding the accessibility of the technology itself (Peck et al., 2020). However, XR technology is becoming increasingly mobile and cheap (e.g., Oculus Quest, Pico Neo) and easier to develop for (e.g., with Unity). Accessibility and versatility refer instead to the advantage of XR reaching regions or user groups that are often cut off from industrial centers or innovation hubs. Moreover, situated interaction in different contexts can be realized, facilitating participative design approaches and contextual learning.
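To make the Eliza principle concrete, the following minimal Python sketch shows how a lightweight, keyword-based mock responder can stand in for a not-yet-implemented conversational AI in an early XR study. The rule patterns, replies, and function name are purely illustrative assumptions, not the stimulus material of the experiments reported below.

```python
# Minimal sketch of an "Eliza-principle" mock: canned, keyword-matched
# replies stand in for a conversational AI during early prototyping.
# All rules and replies are illustrative assumptions.
import re
import random

RULES = [
    (re.compile(r"\b(hello|hi)\b", re.I),
     ["Hi, nice to meet you!", "Hello, shall we start the task?"]),
    (re.compile(r"\bmovie\b", re.I),
     ["I enjoy science-fiction movies. What about you?"]),
    (re.compile(r"\bwhy\b", re.I),
     ["Why do you think that is?"]),
]
FALLBACKS = ["Tell me more.", "Interesting, go on."]

def mock_reply(utterance: str) -> str:
    """Return a canned reply for the first matching keyword rule."""
    for pattern, replies in RULES:
        if pattern.search(utterance):
            return random.choice(replies)
    return random.choice(FALLBACKS)

if __name__ == "__main__":
    print(mock_reply("Hi there!"))         # greeting rule fires
    print(mock_reply("I like this task"))  # falls back to a generic prompt
```

In an XR study, such a responder would be wired to the speech input and output of the embodied agent, letting researchers test perception and interaction effects long before any full dialog system exists.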
In sum, the first part presented theoretical examinations of four sub-questions, revealing a new perspective on the XR-AI combination space on the one hand and a new testbed for human-AI interactions and interfaces on the other. Hence, the first part shows why the combination of XR and AI fruitfully contributes to a valid and systematic investigation of human-AI interactions and interfaces.
The following second part of the present article introduces two paradigmatic implementations of the proposed XR testbed for human-AI interactions and interfaces and shows how a valid and systematic investigation can be conducted.
Paradigmatic Implementations of an XR Testbed for Human-Artificial Intelligence Interactions and Interfaces
In the following, we outline two experiments simulating a human-AI interaction in XR. The first experiment simulated a human-robot interaction in an industrial context. The second resembled an interaction with an embodied recommender system in a quiz game context. The experiments serve as illustrations to elucidate the four potentials of XR mentioned above. Thus, only the information relevant to the scope of the article is presented, which leads to a deviation from a typical method and results presentation. Please refer to the authors for more detailed information regarding the experiments. In the discussion section, the possibilities and limits of such an XR testbed for human-AI interactions in general are illustrated on the paradigmatic implementations.
Paradigmatic Experiment 1: Simulated Human-Robot Interaction
Background
Robots constitute an embodied form of AI. In industrial contexts, robots and humans already work side by side. Mostly, robots operate within a safety zone to ensure the safety of human co-workers. However, collaboration and cooperation, including contact and interaction with robots, will gain importance. As mentioned above, investigating collaborative or cooperative human-robot interactions is complex due to the myriad of gestalt variants, tasks, and safety constraints. Besides, many different user groups with different needs and motives will work with robots. One crucial aspect of interaction is the sense of social intelligence in the artificial co-worker (Biocca, 1999). The scientific literature describes different cues implying the social intelligence of an artificial counterpart, such as starting a conversation, adaptive answering, or sharing personal experiences (Aragon, 2003; Terry and Garrison, 2007).
The present experiment manipulated the conversational ability to vary the sense of social intelligence of a simulated robot. The experiment asks: How does the sense of the robot’s social intelligence influence the perception of the robot and the evaluation of the interaction?
Method
Thirty-five participants (age in years: M = 22.00, SD = 1.91; 24 female) interacted with a simulated robot in an industrial XR environment (see Figure 4). All participants were students and received course credit for participation. The environment was created in Unity Engine version 2019.2.13f1. The player interactions were pre-made and imported through the SteamVR plugin. All assets (tools and objects for use in Unity, like 3D models) were available in the Unity Asset Store. An HTC Vive Pro headset was used.
FIGURE 4. Shows the virtual environment and the embodiment of the robot (stereoscopic view of a user wearing a Head-Mounted Display, HMD).
Participant and robot collaboratively sorted packages with different colors and letters as fast and accurately as possible. First, the robot sorted the packages by color (yellow, cyan, pink, dark blue). Second, the robot delivered the packages with the right color to the participant. Third, the participant threw the packages into one of two shafts, either into the shaft for the letters A to M or into the one for the letters N to Z. A gauge showed the number of correctly sorted packages. The performance of the robot was the same in all interactions.
Participants completed two conditions in a within-subject design; the order of the conditions was balanced. In the conversational condition (short: CR), the robot showed cues of social intelligence by starting a conversation, answering adaptively, and sharing personal preferences about small-talk topics (e.g., “Hi, I am Roni, we are working together today!”; “How long do you live here?”; “What is your favorite movie?”). In a Wizard-of-Oz scenario, the robot answered adaptively to the reactions of the participants; a sketch of such a Wizard-of-Oz control loop is given below. The conversation ran alongside the tasks. In the control condition, the robot did not talk to the participant (short: nCR). The experimenter was present during the whole experiment.
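As an illustration of how such a Wizard-of-Oz condition can be driven technically, the following Python sketch lets an experimenter send prescripted robot utterances to the XR client. The UDP transport, port, and utterance keys are illustrative assumptions; the article does not specify the actual Unity integration.

```python
# Hedged sketch of a Wizard-of-Oz console: the experimenter selects
# prescripted robot utterances, which the XR application then voices.
# Transport (plain UDP, UTF-8 payloads) and port are assumptions.
import socket

UTTERANCES = {
    "1": "Hi, I am Roni, we are working together today!",
    "2": "How long do you live here?",
    "3": "What is your favorite movie?",
    "4": "Oh, I like that one too!",
}

def run_wizard(host: str = "127.0.0.1", port: int = 9000) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        for key, text in UTTERANCES.items():
            print(f"[{key}] {text}")
        choice = input("utterance key (q to quit): ").strip()
        if choice == "q":
            break
        if choice in UTTERANCES:
            # The XR client would listen on this port and trigger
            # text-to-speech or a pre-recorded clip for the robot.
            sock.sendto(UTTERANCES[choice].encode("utf-8"), (host, port))

if __name__ == "__main__":
    run_wizard()
```

The design choice here is deliberate: keeping the “intelligence” on the experimenter’s side allows the embodied robot to appear adaptive without any functional dialog AI, which is exactly the rapid-prototyping advantage argued for above.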
To assess the perception of the robot, we measured uncanniness on a five-point Likert scale (Ho and MacDorman, 2010) with the subscales humanness, eeriness, and attractiveness. Furthermore, the sense of social presence was measured with the social subscale of the Bailenson social presence scale (Bailenson et al., 2004). Finally, the valence of the robot evaluation was assessed with the negative attitudes towards robots scale (short: NARS), including the subscales S1, negative attitudes towards situations of interaction with robots; S2, negative attitudes towards the social influence of robots; and S3, negative attitudes towards emotions in interaction with robots (Nomura et al., 2006). Participants gave their answers on a five-point Likert scale. For data analyses, an overall score was computed as the average of the subscales.
To assess the evaluation of the interaction, the user experience of the participants was measured. Four items each assessed the pragmatic (e.g., “The interaction fulfilled my seeking for simplicity.”) and hedonic (e.g., “The interaction fulfilled my seeking for pleasure.”) quality based on the short version of the AttrakDiff mini (Hassenzahl and Monk, 2010). The eudaimonic quality was measured by four items (e.g., “The interaction fulfilled my seeking to do what you believe in.”) adapted from Huta (2016). Additionally, the social quality was assessed by four items (e.g., “The interaction fulfilled my seeking for social contact.”) based on Hassenzahl et al. (2015). Finally, the NASA-TLX with the subscales mental demand, physical demand, temporal demand, performance, effort, and frustration was measured on a scale ranging from 0 to 100 (Hart and Staveland, 1988).
Since the experiment serves as an example, we followed an explorative data analysis by comparing the two conditions with a two-sided paired t-test, as sketched below. Further, some explorative moderator analyses, including participants’ gender as a moderator, were conducted. For each analysis, the alpha level was set to 0.05 to indicate significance and to 0.20 to indicate significance by trend (Field, 2009).
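For readers who want to reproduce this kind of explorative analysis, a minimal Python sketch of the paired comparisons is given below. The CSV file and column names are assumptions; only the test logic (two-sided paired t-tests with the 0.05/0.20 thresholds) follows the procedure described above.

```python
# Sketch of the within-subject comparison: one two-sided paired t-test
# per measure, flagging p < .05 as significant and p < .20 as a trend.
# File and column names are illustrative assumptions.
import pandas as pd
from scipy import stats

df = pd.read_csv("experiment1_scores.csv")  # one row per participant

for measure in ["humanness", "eeriness", "attractiveness", "social_presence"]:
    cr, ncr = df[f"{measure}_CR"], df[f"{measure}_nCR"]
    t, p = stats.ttest_rel(cr, ncr)  # two-sided by default
    label = "sig." if p < 0.05 else ("trend" if p < 0.20 else "n.s.")
    print(f"{measure:16s} t({len(df) - 1}) = {t:5.2f}, p = {p:.3f} ({label})")
```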
Results
Table 4 summarizes the results. The robot with conversational abilities was evaluated as more human-like, attractive, socially present, and positive. The evaluation of the interaction yielded mixed results. The pragmatic quality and the mental effort indicated a more negative evaluation of the conversational robot than of the non-conversational robot. In contrast, the hedonic and social quality and the perceived performance were evaluated more positively after interacting with the conversational robot than with the non-conversational robot.
TABLE 4. Shows the descriptive and t-test results for the robot conditions. M refers to the mean and SD to the standard deviation in the corresponding condition. The t-value represents the test statistic resulting from the t-test for dependent samples and includes the degrees of freedom. The p-value indicates the significance of the test. nCR refers to the interaction with the non-conversational robot, i.e., the control condition. CR refers to the robot talking to the participants during the interaction, i.e., the conversational robot.
Furthermore, the results showed that gender matters in human-robot interaction (see Figures 5, 6). Women rated the conversational robot as more positive (significant) and attractive (by trend) than the non-conversational robot, while men did not show any differences. Regarding the interaction evaluation, men showed lower values of pragmatic quality after interacting with the conversational robot, while women showed no difference between the conditions. Moreover, women rated the interaction with the conversational robot as more hedonic, while men did not show any differences. Finally, the gradient regarding the social quality for the conversational robot was stronger for women than for men. In general, men showed higher ratings for the robot. Women only showed similarly high ratings in the conversational robot condition.
FIGURE 5. Shows the interaction effect between robot condition and gender of participants regarding the UX ratings. nCR refers to the non-conversational robot. CR refers to the conversational robot. The results demonstrate that men decreased their pragmatic rating for the conversational robot compared to the non-conversational robot. In contrast, women increased their hedonic, social, and eudaimonic ratings for the conversational robot (by trend).
FIGURE 6. Shows the interaction effect between robot condition and gender of participants regarding some of the robot ratings. nCR refers to the non-conversational robot. CR refers to the conversational robot. The results demonstrate that women rated the conversational robot more positively than the non-conversational robot while men did not. Similarly, but not significantly, women rated the conversational robot as more attractive than the non-conversational robot while men did not.
Paradigmatic Experiment 2: Simulated Recommender System
Background
Besides robots, recommender systems are another important application of human-AI interaction. Two effects might influence the perception of such a system. The Eliza effect occurs when a system uses simple technical operations but produces effects that appear complex. Then, humans attribute more intelligence and competence to the system than it provides (Wardrip-Fruin, 2001; Long and Magerko, 2020). The Tale-Spin effect occurs when a system uses complex internal operations but produces effects that appear less complex (Wardrip-Fruin, 2001; Long and Magerko, 2020). However, systematic investigations of these effects are rare.
The second experiment manipulated the appearance of an embodied recommender system to vary the sense of complexity. The experiment asks: How does the sense of complexity influence the perception of the recommender system?
Method
Thirty participants (age in years: M = 23.07, SD = 2.03; 13 female) interacted with a simulated recommender system in a virtual quiz game environment (see Figure 7). All participants were students and received course credit for participation. The environment was created in Unity Engine version 2019.2.13f1. The player interactions were pre-made and imported through the SteamVR plugin. All assets (tools and objects for use in Unity, like 3D models) were available in the Unity Asset Store. An HTC Vive Pro headset was used.
Participants answered 40 difficult quiz questions (Figure 7). They could choose from four answer options. Participants decided under uncertainty. First, they had 10 s to guess and lock in an answer. Then, an embodied AI recommended via voice output which answer might be correct. The participants could decide whether they wanted to revise their initial choice. After the final selection, feedback about the correctness was given. Participants passed through a tutorial to learn the procedure, including ten test questions and interactions with the AI recommender system (conditions: correct, complex, and simple; see below). The experimenter was present during the whole experiment.
FIGURE 7. Shows the virtual quiz game environment. Participants lock in their answer by pointing to one of the four answer options. A sphere embodied the recommender system, either with a simple or a complex surface. During the experiment, only one of the recommender systems was present. The recommender system interacted with the participant via voice output.
The recommender system was prescripted and varied along two factors: correctness (between-subject) and appearance (within-subject). The recommender system answered either 100% correctly (high correctness, i.e., AI correct) or 75% correctly (low correctness, i.e., AI incorrect); a sketch of such a prescripted correctness schedule is given below. A sphere embodied the recommender system (see Figure 7). The sphere’s surface was either plain white, referring to a simple appearance, or metallic patterned, referring to a complex appearance. A pretest ascertained the distinctiveness of the two appearances. The combination of the factors resulted in four conditions (see Figure 8).
FIGURE 8. Shows the embodied recommender system. The left illustrates the simple appearance with the plain white surface. The right illustrates the complex appearance with the metallic patterned surface.
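The prescripted correctness manipulation can be reproduced with a few lines of code. The following Python sketch generates a fixed schedule of correct and incorrect recommendations for the 40 quiz questions; the function names and the fixed seed are illustrative assumptions, not the study’s actual implementation.

```python
# Sketch of a prescripted recommender with a fixed accuracy for the
# between-subject correctness factor (100% vs. 75% correct).
# Function names and the seed are illustrative assumptions.
import random

def make_recommendation_schedule(n_questions: int = 40,
                                 accuracy: float = 0.75,
                                 seed: int = 42) -> list[bool]:
    """Return a shuffled list marking which trials get a correct hint."""
    n_correct = round(n_questions * accuracy)
    schedule = [True] * n_correct + [False] * (n_questions - n_correct)
    rng = random.Random(seed)  # fixed seed keeps the schedule identical
    rng.shuffle(schedule)      # for all participants of a condition
    return schedule

def recommend(correct_answer: str, distractors: list[str],
              give_correct: bool) -> str:
    """Recommend the true answer or a random distractor."""
    return correct_answer if give_correct else random.choice(distractors)
```

Fixing the schedule per condition, rather than rolling the dice on every trial, keeps the stimulus identical across participants, which is precisely the standardization advantage of the XR testbed argued for above.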
To assess the perception of the recommender systems, participants rated perceived competence (e.g., “The AI was competent.”) and friendliness (e.g., “The AI was friendly.”). The 7-point Likert scales were based on Fiske et al. (2002). Further, participants rated perceived trustworthiness based on Bär et al. (2011), including the subscales perceived risk (e.g., “It is a risk to interact with the AI.”), benevolence (e.g., “I believe the AI acts for my good.”), and trust (e.g., “I can trust the information given by the AI.”). To evaluate the interaction, the user experience of the participants was measured. Four items each assessed the pragmatic (e.g., “The interaction fulfilled my seeking for simplicity.”) and hedonic (e.g., “The interaction fulfilled my seeking for pleasure.”) quality based on the short version of the AttrakDiff mini (Hassenzahl and Monk, 2010).
Since the experiment serves as an example, we followed an explorative data analysis by testing the main and interaction effects with a 2 × 2 MANOVA, as sketched below. For each analysis, the alpha level was set to 0.05 to indicate significance and to 0.20 to indicate significance by trend (Field, 2009).
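A minimal sketch of such a 2 × 2 multivariate analysis using statsmodels is given below. Treating the within-subject appearance factor as a plain categorical predictor is a simplification of a full repeated-measures model, and the data file and column names are assumptions.

```python
# Hedged sketch of the 2 x 2 analysis: a MANOVA over the rating scales
# with correctness (between) and appearance (within) as factors.
# Simplification: the within factor is modeled as a plain categorical
# predictor. File and column names are illustrative assumptions.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("experiment2_ratings.csv")

mv = MANOVA.from_formula(
    "competence + friendliness + trust + risk + benevolence"
    " ~ correctness * appearance",
    data=df,
)
print(mv.mv_test())  # Wilks' lambda etc. for main and interaction effects
```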
Results
Table 5 summarizes the results. Participants interacting with the 100% correct AI perceived more competence and trust and less risk, independently of its appearance. The complex appearance led to less perceived risk (a sign of higher trustworthiness). However, the simple appearance was rated as friendlier and more benevolent (in the correct condition). Finally, the interaction with the simple appearance was rated as more hedonic (when interacting in the correct condition). Figures 9, 10 illustrate the results.
TABLE 5. Shows the MANOVA test results. Green indicates significant results (p < 0.05) and orange significance by trend (p < 0.20). F refers to the test statistic, p indicates the significance, and η² represents the effect size.
Discussion
AI applications are omnipresent, and AI as a term certainly carries far-reaching connotations for many users who are not experts. HCI approaches would suggest evaluating the impact of different AI applications, appearances, and operations on human perception of and interaction with AIs and identifying the effects these manipulations would have on users (Wienrich et al., 2021b). However, this approach, which is central to HCI principles, is challenging to realize due to the sheer complexity of many application fields of the real physical world. Another rising technology cluster, XR, allows for control and systematic manipulation of complex interactions (Blascovich et al., 2002; Wienrich and Gramlich, 2020; Wienrich et al., 2021a). Hence, XR provides much potential to increase the investigability of human-AI interactions and interfaces.
The present paper suggests and discusses a new perspective on the XR-AI combination space and XR as a new testbed for human-AI interactions and interfaces, in order to establish valid and systematic investigation procedures for human-AI interfaces and interactions. Four sub-questions structured the article’s first, theoretical part (see Table 6). The second part presented two paradigmatic implementations of an XR testbed for human-AI interactions (see Table 7).
Contribution 1: A New Perspective on the XR-Artificial Intelligence Combination Space
Lieberman motivated an integrated combination of AI and computer graphics (Lieberman, 1996). Aylett and Luck (2000) denoted, more specifically, combinations of AI and XR technologies as Intelligent Virtual Environments. In both views, AI is conceptualized as an enabling technology to improve the computer graphics or virtual environments. The XR-AI continuum extends the XR-AI combination space by introducing XR as a new means of investigating HAII. It includes two perspectives on XR-AI combinations: Does the combination target AI as an object of investigation, e.g., to learn about the effects of a prospective AI embodiment on users? Or does the combination use AI as an enabling technology for XR, e.g., to provide techniques for multimodal interfaces? Often, XR-AI combinations will serve both tasks to some specific degree, which allows mapping them onto the XR-AI continuum.
The XR-AI continuum provides a frame to classify XR-AI combinations. Such a classification serves multiple scientific purposes: First, it allows a systematic evaluation and rating of different approaches to XR-AI combinations and hence helps to structure the overall field. Second, it hopefully leads to an identification of best practices for certain approaches depending on the general perspective taken, and hence, third, these best practices provide useful guidance, e.g., for replicating or reevaluating work and results. XR-AI combinations can drastically vary in terms of the required development efforts and complexities (Latoschik and Blach, 2008; Fischbach et al., 2017). However, many approaches targeting the eXtended AI range of the XR-AI continuum seem, in principle, to be realizable either by favorably loose couplings of the AI components or with a rather limited completeness or development effort of the AI part. Regarding the latter, XR technology allows adopting lightweight AI mock-ups based on the Eliza principle or on Wizard-of-Oz scenarios to various degrees, and thus extensively investigating the relationship between humans and AIs as well as important usability and user experience aspects before an AI is fully functional. Hence, these investigations can be performed without the necessity of first developing a full-blown AI system; in addition, an AI development process can be integrated with a user-centered design process to continuously optimize the human-AI interface. These rapid prototyping prospects lay the cornerstone for rendering XR an ideal testbed to research human-AI interactions and interfaces.
Contribution 2: A New Testbed for Human-Artificial Intelligence Interactions and Interfaces
Results stemming from human-human interactions have shown that XR provides high ecological validity and high standardization for investigating social interactions between humans (Blascovich et al., 2002; Bombari et al., 2015; Pan and Hamilton, 2018; Wienrich and Gramlich, 2020; Wienrich et al., 2021a). AI applications becoming increasingly interactive and embodied leads to essential changes from an HCI perspective and challenges valid and systematic investigability. XR as a new testbed for human-AI interactions and interfaces provides four potentials (see also Table 3), discussed along with the results of the paradigmatic implementations (see Table 7):
1) Design Space: XR provides a powerful design space. Relatively simple and easy-to-develop XR environments brought insights into the impact of different social intelligence cues and appearances on robot and recommender evaluations. Instead of test situations asking participants to imagine an AI interaction with different robots (or recommender systems), or showing pictures or vignettes, participants interacted with the embodied AI, increasing the results’ validity. Furthermore, such simple XR applications can serve as a starting point for systematic variations such as other cues of social intelligence, other appearances, other embodiments of interactive AIs, and combinations of these factors. The cost-benefit trade-off is more favorable than producing new systems for each variation. Finally, features becoming possible or essential in the future, or which are unrealizable in the present, can be simulated. In turn, creative design solutions can be developed, keeping pace with the times and fast-changing requirements.
2) Accessibility: XR can capture direct insights into diverse user groups. The paradigmatic experiments investigated student participants. However, the setup can easily be transferred to a testing room in a real industrial or leisure environment and be replicated with users who will collaborate with robots or recommender systems in the future. In addition, distributed VR systems are becoming more and more available. Participants from different regions with different profiles, various degrees of expertise, or other human-centered factors can be tested with the same standardized scenario. Thus, an XR testbed can contribute to comparable results revealing individual preferences and needs during a human-AI interaction. As seen in Experiment 1, women rated the robot and the interaction differently from men, particularly concerning need fulfillment. Moreover, interactions between variations of the AI or the human-AI interface and various users can be investigated systematically.
3) Versatility: XR environments can simulate different domains or tasks, such as industry or games. The current experiments simulated simple tasks in two settings. The tasks and the environments can easily be varied in XR. From the perspective of a human-centered design process, the question arises which AI features are domain-specific and which general design guidelines can be deduced. By testing the same AI in different environments, this question can be answered. Further sensible fields of application can be identified; the same AI may be accepted by users in an industrial setting but not at home, for example. The task difficulty can also impact the evaluation of the AI. The presented results of Experiment 1, for example, are valid for easy tasks, where the conversational abilities of the robot were rated positively. However, during more difficult tasks, conversation might be inappropriate due to distraction and consequences for task performance. In reality, the investigation of such impacts bears many risks, while XR provides a safe testbed.
4) Tangible Training: XR can serve as a tangible training environment. The presented experiments did not include training. However, the operational principles of such embodied AIs can be illustrated directly during the interaction. Users can learn step by step how to interact with AIs, what they can expect from AI applications, what abilities AIs have, and what limits AIs possess. In other words, users can get a realistic picture of the potentials and limits of AI applications. Again, operations becoming possible and essential in the future can already be simulated in XR. That possibility enables user-centered design and training approaches that keep pace with the times and fast-changing requirements. Acceptance, expertise, and experience with AI systems can be developed proactively instead of retrospectively.
Limitations
Although we have many indicators stemming from investigations of human-human interaction in XR (Blascovich, 2002; Bombari et al., 2015; Pan and Hamilton, 2018) and from product testing in XR, simulations only resemble real human-AI interactions. Hence, we cannot be sure that results stemming from XR interactions precisely predict human-AI interactions in real settings. To establish XR as a valid testbed, comparative studies testing real interaction settings against XR interactions would be necessary. Additionally, the ease of producing different variants of features might mislead researchers into engendering myriad results stemming from negligible variations. The significance of the outcomes would be blown out of proportion. Since XR simulations can be set up in any laboratory, results might increasingly be collected with student samples instead of real users of the corresponding AI application. Thus, it seems important to make XR testbeds attractive and available for practical stakeholders and diverse user groups.
Conclusion
AI covers a broad spectrum of computational problems and use cases. At the same time, AI applications are becoming increasingly interactive. Consequently, AIs will exhibit a certain form of embodiment to users at the human-AI interface. This embodiment is manifold and can, e.g., range from simple feedback systems, to text or graphical terminals, to humanoid agents or robots of various appearances. Simultaneously, human-AI interactions gain massively in importance for diverse user groups and in various fields of application. However, the outer appearance and behavior of an AI result in profound differences in how an AI is perceived by users, e.g., what they think about the AI’s potential capabilities, competencies, or even dangers, i.e., how users’ attitudes towards an AI are shaped. Moreover, many current and future users have only vague ideas of what AI is, which significantly increases the embodiment effects and hence the importance of the specific embodiment of AI applications. These circumstances raise intricate questions of why and how humans interact or should interact with AIs.
From a human-computer interaction and human-centered design perspective, it is essential to investigate the acceptance of AI applications and the consequences of interacting with an AI. However, the sheer complexity of application fields and AI embodiments in reality confronts researchers with enormous challenges regarding valid and systematic investigations. This article proposed eXtended AI as a method for such systematic investigations. We provided a theoretical treatment and model of human-AI interaction based on the introduced XR-AI continuum as a framework for and a perspective on different approaches of XR-AI combinations. We motivated eXtended AI as a specific XR-AI combination capable of helping us learn about the effects of prospective human-AI interfaces, and we showed why the combination of XR and AI fruitfully contributes to a valid and systematic investigation of human-AI interactions and interfaces.
The article also provided two exemplary experiments investigating the aforementioned approach for two distinct AI systems. The first experiment revealed an interesting gender effect in human-robot interaction, while the second experiment revealed an Eliza effect of a recommender system. These two paradigmatic implementations of the proposed XR testbed for human-AI interactions and interfaces show how a valid and systematic investigation can be conducted, and their findings on embodied human-AI interfaces support the proposed idea of using XR as a testbed to investigate human-AI interaction. From an HCI perspective, eXtended AI (the left side of the XR-AI continuum) constitutes a variant of rapid prototyping for HAII: users interact with a preliminary stage or a simulation instead of the fully developed product. In sum, the article opens new perspectives on how XR benefits systematic investigations that are essential for evidence-based human-centered AI design, for the design and evaluation of explainable AIs and tangible training modules, and for basic research on human-AI interfaces and interactions.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.
Ethics Statement
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study.
Author Contributions
All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.
Funding
This publication was supported by the Open-Access Publication Fund of the University of Wuerzburg.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Antakli, A., Hermann, E., Zinnikus, I., Du, H., and Fischer, K. (2018). “Intelligent Distributed Human Motion Simulation in Human-Robot Collaboration Environments,” in Proceedings of the 18th International Conference on Intelligent Virtual Agents, IVA 2018 (New York: ACM), 319–324. doi:10.1145/3267851.3267867
Aragon, S. R. (2003). Creating Social Presence in Online Environments. New Dir. Adult Cont. Edu. 2003, 57–68. doi:10.1002/ace.119
Aylett, R., and Luck, M. (2000). Applying Artificial Intelligence to Virtual Reality: Intelligent Virtual Environments. Appl. Artif. Intell. 14, 3–32. doi:10.1080/088395100117142
Azmandian, M., Arroyo-Palacios, J., and Osman, S. (2019). “Guiding the Behavior Design of Virtual Assistants,” in Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents, 16–18.
Bailenson, J., Aharoni, E., Beall, A., Guadagno, R., Dimov, A., and Blascovich, J. (2004). “Comparing Behavioral and Self-Report Measures of Agents’ Social Presence in Immersive Virtual Environments,” in Proceedings of the 7th Annual International Workshop on PRESENCE, 1864–1105.
Bär, N., Hoffmann, A., and Krems, J. (2011). “Entwicklung von Testmaterial zur experimentellen Untersuchung des Einflusses von Usability auf Online-Trust,” in Reflexionen und Visionen der Mensch-Maschine-Interaktion – Aus der Vergangenheit lernen, Zukunft gestalten. Editors S. Schmid, M. Elepfandt, J. Adenauer, and A. Lichtenstein, 627–631.
Bawden, D. (2008). “Origins and Concepts of Digital Literacy,” in Digital Literacies: Concepts, Policies and Practices. Editors C. Lankshear and M. Knobel (New York: Peter Lang), 17–32.
Bickmore, T., Vardoulakis, L., Jack, B., and Paasche-Orlow, M. (2013). “Automated Promotion of Technology Acceptance by Clinicians Using Relational Agents,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 68–78. doi:10.1007/978-3-642-40415-3_6
Biocca, F. (1999). The Cyborg's Dilemma. Hum. Factors Inf. Technol. 113, 113–144. doi:10.1016/s0923-8433(99)80011-2
Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C. L., and Bailenson, J. N. (2002). TARGET ARTICLE: Immersive Virtual Environment Technology as a Methodological Tool for Social Psychology. Psychol. Inq. 13, 103–124. doi:10.1207/s15327965pli1302_01
Blascovich, J. (2002). “Social Influence within Immersive Virtual Environments,” in The social life of avatars (Springer), 127–145. doi:10.1007/978-1-4471-0277-9_8
Bombari, D., Schmid Mast, M., Canadas, E., and Bachmann, M. (2015). Studying Social Interactions through Immersive Virtual Environment Technology: Virtues, Pitfalls, and Future Challenges. Front. Psychol. 6, 869. doi:10.3389/fpsyg.2015.00869
Cameron, J. (1984). The Terminator. Hemdale Film Corporation, Pacific Western Productions, Euro Film Funding, Cinema '84.
Carolus, A., Binder, J. F., Muench, R., Schmidt, C., Schneider, F., and Buglass, S. L. (2019). Smartphones as Digital Companions: Characterizing the Relationship between Users and Their Phones. New Media Soc. 21, 914–938. doi:10.1177/1461444818817074
Carolus, A., and Wienrich, C. (2019). How Close Do You Feel to Your Devices? Visual Assessment of Emotional Relationships with Digital Devices. dl.gi.de. doi:10.18420/muc2019-ws-652
Carpenter, J. (1974). Dark Star. Jack H. Harris Enterprises. Los Angeles, California: University of Southern California.
Cavazza, M., Lugrin, J.-L., Pizzi, D., and Charles, F. (2007). “Madame Bovary on the Holodeck: Immersive Interactive Storytelling,” in Proceedings of the 15th international conference on Multimedia MULTIMEDIA ’07 (New York, NY, USA: ACM), 651–660.
Congdon, S. P., and Schober, M. F. (2002). “How Examiners’ Discourse Cues Affect Scores on Intelligence Tests,” in 43rd Annual Meeting of the Psychonomic Society (Kansas City, MO).
Cruz-Neira, C., Sandin, D. J., DeFanti, T. A., Kenyon, R. V., and Hart, J. C. (1992). The CAVE: Audio Visual Experience Automatic Virtual Environment. Commun. ACM 35, 64–72. doi:10.1145/129888.129892
Fan, L., Scheutz, M., Lohani, M., McCoy, M., and Stokes, C. (2017). “Do we Need Emotionally Intelligent Artificial Agents? First Results of Human Perceptions of Emotional Intelligence in Humans Compared to Robots,” in International Conference on Intelligent Virtual Agents, 129–141. doi:10.1007/978-3-319-67401-8_15
Field, A. (2009). Discovering Statistics Using SPSS. Third Edition. Thousand Oaks, California: SAGE Publications.
Fischbach, M., Wiebusch, D., and Latoschik, M. E. (2017). Semantic Entity-Component State Management Techniques to Enhance Software Quality for Multimodal VR-Systems. IEEE Trans. Vis. Comput. Graphics 23, 1342–1351. doi:10.1109/TVCG.2017.2657098
Fiske, S. T., Cuddy, A. J. C., Glick, P., and Xu, J. (2002). A Model of (Often Mixed) Stereotype Content: Competence and Warmth Respectively Follow from Perceived Status and Competition. J. Personal. Soc. Psychol. 82, 878–902. doi:10.1037/0022-3514.82.6.878
Fitrianie, S., Bruijnes, M., Richards, D., Bönsch, A., and Brinkman, W.-P. (2020). “The 19 Unifying Questionnaire Constructs of Artificial Social Agents,” in Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, IVA 2020 (New York: Association for Computing Machinery, Inc), 1–8. doi:10.1145/3383652.3423873
Fraser, J., Papaioannou, I., and Lemon, O. (2018). “Spoken Conversational Ai in Video Games: Emotional Dialogue Management Increases User Engagement,” in Proceedings of the 18th International Conference on Intelligent Virtual Agents, 179–184.
Gall, D., and Latoschik, M. E. (2018). “Immersion and Emotion: On the Effect of Visual Angle on Affective and Cross-Modal Stimulus Processing,” in Proceedings of the 25th IEEE Virtual Reality Conference (IEEE VR). doi:10.1109/vr.2018.8446153
Hart, S. G., and Staveland, L. E. (1988). “Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research,” in Human Mental Workload. Editors P. A. Hancock, and N. Meshkati (Amsterdam: Elsevier), 139–183. doi:10.1016/s0166-4115(08)62386-9
Hassenzahl, M., and Monk, A. (2010). The Inference of Perceived Usability from beauty. Human-comp. Interaction 25, 235–260. doi:10.1080/07370024.2010.500139
Hassenzahl, M., Wiklund-Engblom, A., Bengs, A., Hägglund, S., and Diefenbach, S. (2015). Experience-Oriented and Product-Oriented Evaluation: Psychological Need Fulfillment, Positive Affect, and Product Perception. Int. J. Human-Computer Interaction 31, 530–544. doi:10.1080/10447318.2015.1064664
Hatfield, E., Cacioppo, J. T., and Rapson, R. L. (1992). Primitive Emotional Contagion. Emot. Soc. Behav. 14, 151–177.
Ho, C.-C., and MacDorman, K. F. (2010). Revisiting the Uncanny valley Theory: Developing and Validating an Alternative to the Godspeed Indices. Comput. Hum. Behav. 26, 1508–1518. doi:10.1016/j.chb.2010.05.015
Huta, V. (2016). “Eudaimonic and Hedonic Orientations: Theoretical Considerations and Research Findings,” in Handbook of eudaimonic well-being (Springer), 215–231. doi:10.1007/978-3-319-42445-3_15
Jonas, K., Stroebe, W., and Hewstone, M. (2014). Sozialpsychologie. 6th ed. Berlin, Germany: Springer-Verlag.
Kelley, P. G., Yang, Y., Heldreth, C., Moessner, C., Sedley, A., Kramm, A., et al. (2019). “Happy and Assured that Life Will Be Easy 10 Years from Now”: Perceptions of Artificial Intelligence in 8 Countries. arXiv preprint arXiv:2001.00081. doi:10.1145/3461702.3462605
Kistler, F., Endrass, B., Damian, I., Dang, C. T., and André, E. (2012). Natural Interaction with Culturally Adaptive Virtual Characters. J. Multimodal User Inter. 6, 39–47. doi:10.1007/s12193-011-0087-z
Kuhlen, A. K., and Brennan, S. E. (2013). Language in Dialogue: When Confederates Might Be Hazardous to Your Data. Psychon. Bull. Rev. 20, 54–72. doi:10.3758/s13423-012-0341-8
Kulms, P., and Kopp, S. (2016). “The Effect of Embodiment and Competence on Trust and Cooperation in Human-Agent Interaction,” in International Conference on Intelligent Virtual Agents, 75–84. doi:10.1007/978-3-319-47665-0_7
Latoschik, M. E. (2005). “A User Interface Framework for Multimodal VR Interactions,” in Proceedings of the IEEE Seventh International Conference on Multimodal Interfaces (Trento, Italy: ICMI 2005). doi:10.1145/1088463.1088479. Available at: http://trinity.inf.uni-bayreuth.de/download/pp217-latoschik.pdf
Latoschik, M. E., and Blach, R. (2008). “Semantic Modelling for Virtual Worlds – A Novel Paradigm for Realtime Interactive Systems?,” in Proceedings of the ACM VRST 2008, 17–20. Available at: http://trinity.inf.uni-bayreuth.de/download/semantic-modeling-for-VW-VRST08.pdf.
Latoschik, M. E., Roth, D., Gall, D., Achenbach, J., Waltemate, T., and Botsch, M. (2017). “The Effect of Avatar Realism in Immersive Social Virtual Realities,” in 23rd ACM Symposium on Virtual Reality Software and Technology (VRST), 39:1–39:10.
Latoschik, M. E. (2014). Smart Graphics/Intelligent Graphics. Informatik Spektrum 37, 36–41. doi:10.1007/s00287-013-0759-z
Latoschik, M. E., and Wachsmuth, I. (1998). “Exploiting Distant Pointing Gestures for Object Selection in a Virtual Environment,” in Gesture and Sign Language in Human-Computer Interaction, September 17-19, 1997 (Bielefeld, Germany: International Gesture Workshop), 185–196. doi:10.1007/BFb0052999
Latoschik, M. E., and Wienrich, C. (2021). Coherence and Plausibility, Not Presence?! Pivotal Conditions for XR Experiences and Effects, a Novel Model. arXiv. Available at: https://arxiv.org/abs/2104.04846.
Long, D., and Magerko, B. (2020). “What Is AI Literacy? Competencies and Design Considerations,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–16.
Mattar, N., Van Welbergen, H., Kulms, P., and Kopp, S. (2015). “Prototyping User Interfaces for Investigating the Role of Virtual Agents in Human-Machine Interaction,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Springer-Verlag), 356–360. doi:10.1007/978-3-319-21996-7_39
Mori, M., MacDorman, K., and Kageki, N. (2012). The Uncanny valley [from the Field]. IEEE Robot. Automat. Mag. 19, 98–100. doi:10.1109/mra.2012.2192811
Nass, C., and Gong, L. (2000). Speech Interfaces from an Evolutionary Perspective. Commun. ACM 43, 36–43. doi:10.1145/348941.348976
Nomura, T., Suzuki, T., Kanda, T., and Kato, K. (2006). Measurement of Negative Attitudes toward Robots. Interact. Stud. 7, 437–454. doi:10.1075/is.7.3.14nom
Obaid, M., Damian, I., Kistler, F., Endrass, B., Wagner, J., and André, E. (2012). “Cultural Behaviors of Virtual Agents in an Augmented Reality Environment,” in 12th International Conference on Intelligent Virtual Agents (IVA 2012) LNCS. Editors Y. I. Nakano, M. Neff, A. Paiva, and M. A. Walker (Berlin, Germany: Springer), 412–418. doi:10.1007/978-3-642-33197-8_42
Ospina-Bohórquez, A., Rodríguez-González, S., and Vergara-Rodríguez, D. (2021). On the Synergy between Virtual Reality and Multi-Agent Systems. Sustainability 13, 4326. doi:10.3390/su13084326
Pan, X., and Hamilton, A. F. d. C. (2018). Why and How to Use Virtual Reality to Study Human Social Interaction: The Challenges of Exploring a New Research Landscape. Br. J. Psychol. 109, 395–417. doi:10.1111/bjop.12290
Peck, T. C., Sockol, L. E., and Hancock, S. M. (2020). Mind the Gap: The Underrepresentation of Female Participants and Authors in Virtual Reality Research. IEEE Trans. Vis. Comput. Graphics 26, 1945–1954. doi:10.1109/TVCG.2020.2973498
Pham, D. T., and Gault, R. S. (1998). A Comparison of Rapid Prototyping Technologies. Int. J. Machine Tools Manufacture 38, 1257–1287. doi:10.1016/S0890-6955(97)00137-5
Ratan, R., Beyea, D., Li, B. J., and Graciano, L. (2019). Avatar Characteristics Induce Users' Behavioral Conformity with Small-To-Medium Effect Sizes: a Meta-Analysis of the proteus Effect. Media Psychol. 23, 1–25. doi:10.1080/15213269.2019.1623698
Razzouk, R., and Shute, V. (2012). What Is Design Thinking and Why Is it Important? Rev. Educ. Res. 82, 330–348. doi:10.3102/0034654312457429
Reeves, B., and Nass, C. (1996a). The Media Equation: How People Treat Computers, Television, and New Media like Real People. Cambridge, UK: Cambridge University Press.
Reeves, B., and Nass, C. (1996b). The Media Equation: How People Treat Computers, Television, and New Media like Real People and Places. Cambridge, UK: Cambridge University Press.
Rieser, V. (2019). “Let’s Chat! Can Virtual Agents Learn How to Have a Conversation?,” in Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents, 5–6.
Russell, S. J., and Norvig, P. (2020). Artificial Intelligence: A Modern Approach. 4th ed. Hoboken, New Jersey: Prentice-Hall.
Schreier, J. (2012). Robot & Frank. New York: Park Pictures, Stage 6 Films, White Hat Entertainment, Dog Run Pictures.
Scott, R. (1979). Alien. Los Angeles: Brandywine Productions.
Scott, R. (1982). Blade Runner. The Ladd Company, Shaw Brothers, Blade Runner Partnership. Los Angeles: The Ladd Company.
Skarbez, R., Brooks, F. P., and Whitton, M. C. (2018). A Survey of Presence and Related Concepts. ACM Comput. Surv. 50, 96:1–39. doi:10.1145/3134301
Slater, M. (2009). Place Illusion and Plausibility Can lead to Realistic Behaviour in Immersive Virtual Environments. Phil. Trans. R. Soc. B 364, 3549–3557. doi:10.1098/rstb.2009.0138
Slater, M., and Wilbur, S. (1997). A Framework for Immersive Virtual Environments (FIVE): Speculations on the Role of Presence in Virtual Environments. Presence: Teleoperators & Virtual Environments 6, 603–616. doi:10.1162/pres.1997.6.6.603
Sterna, R., Cybulski, A., Igras-Cybulska, M., Pilarczyk, J., and Kuniecki, M. (2021). “Pretest or Not to Pretest? A Preliminary Version of a Tool for the Virtual Character Standardization,” in 2021 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (Danvers: The Institute of Electrical and Electronics Engineers), 123–126. doi:10.1109/vrw52623.2021.00030
Strathmann, C., Szczuka, J., and Krämer, N. (2020). “She Talks to Me as if She Were Alive: Assessing the Social Reactions and Perceptions of Children toward Voice Assistants and Their Appraisal of the Appropriateness of These Reactions,” in Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, 1–8.
Terry, L. R., and Garrison, A. D. R. (2007). Assessing Social Presence in Asynchronous Text-Based Computer Conferencing. J. Distance Educ. 14, 1–18.
Topál, J., Gergely, G., Miklósi, A., Erdohegyi, A., and Csibra, G. (2008). Infants' Perseverative Search Errors Are Induced by Pragmatic Misinterpretation. Science 321, 1831–1834. doi:10.1126/science.1161437
Waltemate, T., Gall, D., Roth, D., Botsch, M., and Latoschik, M. E. (2018). The Impact of Avatar Personalization and Immersion on Virtual Body Ownership, Presence, and Emotional Response. IEEE Trans. Vis. Comput. Graphics 24, 1643–1652. doi:10.1109/tvcg.2018.2794629
Wienrich, C., Döllinger, N., and Hein, R. (2021a). Behavioral Framework of Immersive Technologies (BehaveFIT): How and Why Virtual Reality Can Support Behavioral Change Processes. Front. Virtual Real. 2, 84. doi:10.3389/frvir.2021.627194
Wienrich, C., and Gramlich, J. (2020). appRaiseVR - an Evaluation Framework for Immersive Experiences. i-com 19, 103–121. doi:10.1515/icom-2020-0008
Wienrich, C., Gross, R., Kretschmer, F., and Müller-Plath, G. (2018b). “Developing and Proving a Framework for Reaction Time Experiments in VR to Objectively Measure Social Interaction with Virtual Agents,” in 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (Danvers: The Institute of Electrical and Electronics Engineers), 191–198.
Wienrich, C., Reitelbach, C., and Carolus, A. (2021b). The Trustworthiness of Voice Assistants in the Context of Healthcare Investigating the Effect of Perceived Expertise on the Trustworthiness of Voice Assistants, Providers, Data Receivers, and Automatic Speech Recognition. Front. Comput. Sci. 3, 53. doi:10.3389/fcomp.2021.685250
Wienrich, C., Schindler, K., Döllinger, N., and Kock, S. (2018a). “Social Presence and Cooperation in Large-Scale Multi-User Virtual Reality-The Relevance of Social Interdependence for Location-Based Environments,” in IEEE Conference on Virtual Reality and 3D User Interfaces (Danvers: The Institute of Electrical and Electronics Engineers), 207–214.
Yan, X., and Gu, P. (1996). A Review of Rapid Prototyping Technologies and Systems. Computer-Aided Des. 28, 307–318. doi:10.1016/0010-4485(95)00035-6
Yee, N., and Bailenson, J. (2007). The Proteus Effect: The Effect of Transformed Self-Representation on Behavior. Hum. Comm. Res. 33, 271–290. doi:10.1111/j.1468-2958.2007.00299.x
Zhang, B., and Dafoe, A. (2019). Artificial Intelligence: American Attitudes and Trends. SSRN J. doi:10.2139/ssrn.3312874
Zhang, H., Fricker, D., and Yu, C. (2010). “A Multimodal Real-Time Platform for Studying Human-Avatar Interactions,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 49–56. doi:10.1007/978-3-642-15892-6_6
Zimmerer, C., Fischbach, M., and Latoschik, M. E. (2018b). “Space Tentacles - Integrating Multimodal Input into a VR Adventure Game,” in Proceedings of the 25th IEEE Virtual Reality (VR) conference (IEEE), 745–746. doi:10.1109/vr.2018.8446151. Available at: https://downloads.hci.informatik.uni-wuerzburg.de/2018-ieeevr-space-tentacle-preprint.pdf
Keywords: human-artificial intelligence interface, human-artificial intelligence interaction, XR-artificial intelligence continuum, XR-artificial intelligence combination, research methods, human-centered, human-robot, recommender system
Citation: Wienrich C and Latoschik ME (2021) eXtended Artificial Intelligence: New Prospects of Human-AI Interaction Research. Front. Virtual Real. 2:686783. doi: 10.3389/frvir.2021.686783
Received: 27 March 2021; Accepted: 23 June 2021;
Published: 06 September 2021.
Edited by:
Athanasios Vourvopoulos, Instituto Superior Técnico (ISR), Portugal
Reviewed by:
Tomas Trescak, Western Sydney University, Australia; Richard Skarbez, La Trobe University, Australia
Copyright © 2021 Wienrich and Latoschik. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Carolin Wienrich, carolin-wienrich@uni.wuerzburg.de
†These authors have contributed equally to this work and share first authorship