A Technical and Conceptual Framework for Serious Role-Playing Games in the Area of Social Skill Training

Othlinghaus-Wulhorst, Julia; Hoppe, H. Ulrich

doi:10.3389/fcomp.2020.00028

ORIGINAL RESEARCH article

Front. Comput. Sci., 31 July 2020

Sec. Human-Media Interaction

Volume 2 - 2020 | https://doi.org/10.3389/fcomp.2020.00028

This article is part of the Research Topic Serious Games

Julia Othlinghaus-Wulhorst*

H. Ulrich Hoppe

COLLIDE Group, Department of Computer Science and Applied Cognitive Science, Faculty of Engineering, University of Duisburg-Essen, Duisburg, Germany

Virtual role-playing games can provide an authentic experience of situated learning and allow for trying out different problem-solving and communication strategies without consequences in the real world. This is of particular interest and benefit for the training of social skills. This article presents a conceptual and technical framework for serious role-playing games for the training of specific social skills in virtual 2D learning environments involving chatbots in dialog-centric settings. It summarizes different use cases and evaluation results from prior studies. From the design perspective, several distinctive conceptual features characterize our framework: (1) chat-like interaction with an AI-controlled chatbot, (2) separate phases of immersion and reflection to facilitate a change of perspective that is considered conducive to learning, and (3) adaptive feedback based on individual analyses to support the learning process. We propose a system architecture that is based on three components: (1) AI-controlled chatbots that adapt to the player's behavior, (2) a multi-agent blackboard system as the backbone in order to keep components independent and optimize performance through parallel processing, and (3) intelligent support for an automated evaluation of the player's performance and feedback generation. The training scenarios presented and discussed in this article include workplace-oriented conflict management, patient-centered medical interviews, and customer complaint management. First evaluation studies indicate that the scenarios may be well-suited for real training situations. Due to its flexible architecture, our framework and approach can easily be tailored to different settings and use cases and thus serve as a basis for future research focusing on the adaptation to other contexts and systems. On the basis of these developments, we elaborate important design dimensions, reflect on and discuss general issues and major challenges, summarize and contrast different approaches and strategies, and identify opportunities for serious role-playing games in the area of social skills training.

Introduction

In recent years, serious games have been established as an efficient medium in education and professional training (Michael and Chen, 2006; Marr, 2010). The serious gaming approach attempts to use the appeal of digital games not only for entertainment purposes but also to convey “serious” content and to train practice-oriented skills (Ritterfeld et al., 2009). The combination of the serious gaming approach with role play scenarios is particularly promising. Role play enables learners to explore new situations and train how to act and react in these situations (Martens et al., 2008). Virtual role-playing games provide mobile, safe, and continuable environments, whereas traditional role plays can be time-consuming, costly, and difficult to administer (Totty, 2005). In addition, traditional role plays lack repeatability. One general problem in the evaluation of role play experiences for educational purposes is the effort involved in analyzing and reflecting on the actual role play following the enactment. Traditional scenarios typically rely on video recording and, if applicable, note-taking. Virtual learning environments, in contrast, enable structured recordings with integrated indexing, navigation instruments, search functions, and cross-references between different media and data sources. In addition, computer-supported analyses can help to evaluate and track the learners' performance. This is an important aspect, since without feedback and post-role-play reflection, the transfer to real-world situations cannot be ensured (Lim et al., 2009). An additional important advantage of serious role-playing games in contrast to other virtual learning activities and environments is the motivational component, which may lead to intense and passionate involvement of learners (Susi et al., 2007).

Based on a series of different instances of role-playing games for the training of specific social skills, this article presents the underlying conceptual and technical framework that facilitated the implementation of the different applications. This framework is characterized by using scripted chatbots as training cases in a dialogic setting. A multi-agent architecture supports both the actual dialogic processing and the evaluation of the dialogs and the generation of adaptive feedback. Conceptual and technical aspects of this framework are described in chapters Framework: Conceptual Approach and Framework: Technical Approach, following up on a discussion of related work in this area (chapter Related Work). Chapter Case Studies assembles several case studies conducted with different instances of virtual role-playing environments based on the framework, reporting on experience and evaluation results. Chapter Dimensions of the Design of Serious Role-Playing Games for the Training of Social Skills combines this specific experience with general issues in the design of serious role-playing games to devise a set of design dimensions, that is, important aspects to be considered in the design, description, and comparison of serious role-playing games.

Related Work

Serious Role-Playing Games for the Training of Social Skills

Serious games can be defined as “any form of interactive computer-based game software of one or multiple players to be used on any platform and that has been developed with the intention to be more than entertainment” (Ritterfeld et al., 2009) and with an explicit focus on education. Games of this category are supposed to convey specific knowledge or train certain skills by using the attractiveness of entertainment games (Susi et al., 2007). Serious games can generally cover many different subject areas, but their application is mainly found in healthcare, education, and training, including military or employee training in companies (Marr, 2010). Serious games are widely accepted as an important and efficient medium with respect to education, training, and behavioral change (Michael and Chen, 2006). They are recognized to have several benefits: Serious games facilitate learning experiences while not having negative or harmful impacts (Ritterfeld et al., 2009). Games in general not only have a positive effect on the development of the player but can also be conducive to many different skills. Among others, Mitchell and Savill-Smith suggest that such target competences can be related to cognitive, social, analytical, and strategic aspects (Mitchell and Savill-Smith, 2005). Squire and Jenkins made a comparable assessment (Squire and Jenkins, 2003). Further advantages include the reduction of costs and time associated with the use of serious games. They make it possible to recreate situations or working conditions that would otherwise not be possible in the real world (Corti, 2006; Susi et al., 2007). Serious games intend to facilitate deep and sustained learning (Gee, 2007) and prove to be more effective than traditional pedagogy and other educational technologies (Prensky, 2000; Ritterfeld et al., 2009).

Michael and Chen differentiate between games that educate and games that train (Michael and Chen, 2006). Games that educate should convey knowledge, facts and processes in a playful way, thereby contributing to education, while games that train are intended to improve the learners' skills in virtual environments or simulations. Our work is focused on the second category, more specifically on serious games for the training of social skills based on role play. Social skills can be seen as a sub-category of soft skills. The term soft skills refers to a broad concept that describes a set of personal attributes or traits expressing how persons know and manage both themselves and their relationship with other people (Dell'Aquila et al., 2017). While no universal definition of the term “soft skills” is available, Dell'Aquila et al. combine several different approaches into the following definition (Dell'Aquila et al., 2017): “Soft skills are not domain or practice specific; experientially based; both self and people orientated; goal-related behaviors; inextricably complementary to hard technical knowledge and skills enabling completion of activities and accomplishment of results; and crucial for effective leadership performance.” Social skills are soft skills related to interaction with other people. The term describes “the ability to interact with others in a given social context in specific ways that are societally acceptable or valued and at the same time personally beneficial, mutually beneficial, or beneficial primarily to others” (Combs and Slaby, 1977) and includes, e.g., communication, cooperation, assertion, responsibility, empathy, engagement, and self-control (Gresham and Elliott, 2008). Role play is a well-suited instrument to train interaction with other people. Assuming roles provides the opportunity to train how to act and react in new situations. It facilitates the creation of knowledge and meaning through concrete experiences (Lim et al., 2009). Also, the observation of role play can lead to conclusions about one's own behavior (Martens et al., 2008). The integration of role play in a serious gaming context seems to be particularly promising, as this combination (a) incorporates a highly motivational character and (b) creates opportunities for exploration and experimentation in a protective environment without any consequences in the real world. In addition, virtual role plays may be much more effective than conventional approaches in settings where the social component is a crucial factor (Lim et al., 2009).

Several serious role-playing games for the training of social skills are available. They can be assigned to three main categories of relevant social skills: (1) leadership skills, (2) communication skills, and (3) conflict management. Examples of serious role-playing games for training leadership skills are Virtual Leader (Knode and Knode, 2011), TeamUp (Bezuijen, 2012), and Learn to Lead (Di Ferdinando et al., 2011). Virtual Leader is a simulation game in which students practice leadership styles and approaches within a 3D environment using avatars and intelligent agents in order to create an environment that is as realistic as possible (Knode and Knode, 2011). Players participate in virtual business meetings with animated characters and are required to make a series of decisions in five scenarios with increasing complexity. TeamUp is a collaborative game for the training of teamwork and leadership skills, developed at the TU Delft (Bezuijen, 2012). In this game, four players need to work together to overcome several challenges, each designed to cover a specific element of effective teamwork. In Learn to Lead, the players have to lead a simulated team of employees (e.g., workers in a bank, a post-office, or a local government office) that is competing against other teams (Di Ferdinando et al., 2011). In this game, the players have two main objectives: First, they need to ensure that the company is running efficiently and productively. Second, they need to ensure that their teams develop in the desired manner. The Productive Leadership Game is a simulation game that is supposed to foster leadership competencies to improve team-based and organizational productivity (Kesti et al., 2017). A summary of serious role-playing games for training leadership skills is given in Table 1.

TABLE 1

Table 1. Serious games for the training of leadership skills (overview).

There are various examples of serious role-playing games aiming at the training of communication skills: ENACT (Marocco et al., 2015) is an online game for the standardized psychometric assessment and training of negotiation skills based on Rahim's model of conflict handling styles (Rahim and Bonoma, 1979). In this game, players assume different characters to negotiate with computer-controlled virtual 3D agents in various scenarios representing everyday life situations. They can always choose one of four possible pre-defined sentences to communicate with the agents. In DREAD-ED, players become part of a crisis management team that is dealing with an emergency situation (Haferkamp et al., 2011). The game is organized into a series of timed rounds, separated by phases in which a tutor can provide feedback to the players. Bosse et al. developed a game targeted at police academy students that focuses on decision-making aspects in critical situations like the so-called “door scene” in which a police officer has been informed about an incoming emergency call and is supposed to find out if it is indeed a case of domestic violence or not (Bosse and Gerritsen, 2016). The players interact with virtual characters in a realistic 3D environment by using a relatively simple interaction paradigm based on multiple choice and dialog trees. In the game deLearyous, players assume the role of a manager who just announced that the parking facilities of the company are no longer free and needs to deal with the reaction of an employee (Vaassen and Wauters, 2012) by using unconstrained written natural language input. The design of the virtual character representing the employee is based on a framework for interpersonal communication called Leary's Rose (Leary, 1957). JUST-TALK is a serious game to train law enforcement personnel for encounters with persons showing symptoms of serious mental illness (Hubal et al., 2003). The players interact with these computer-controlled characters using spoken natural language. They are supposed to look for indications of particular forms of mental illness so that they can adapt their approach in an appropriate way and thus defuse the situation. In POINTER, a game developed for interview training targeted at police officers, the players assume the role of a police officer interacting with a subject in the context of a police interview (Linssen et al., 2014). The subject here is a virtual agent who is not cooperating during the interview. The players' task is to interact with the subject in a way that makes it cooperate in order to gather information from them. ELECT BiLAT is a simulation game in which soldiers practice bilateral engagements within a cultural context (Lane and Hays, 2008). The recruits are supposed to conduct meetings and negotiations with local leaders. Maritime City is a game targeted at social workers. It aims at training the ability to read emotional states of persons and improving communication skills in verbal and non-verbal forms (Flynn et al., 2011). In this game, players are asked to investigate a disturbance at a house where a woman is living with her two children and need to investigate a range of approaches for each part of the scenario. TARDIS is a scenario-based serious game simulation platform that supports social training and coaching in the context of job interviews (Gebhard et al., 2018). It is specifically intended to be used by young people and job-inclusion associations to explore, practice, and improve their skills in a diverse range of possible interview situations by interacting with virtual agents acting as recruiters. Communicate! is a serious role-playing game designed to support practicing interpersonal communication between health care professionals such as doctors, pharmacists, or psychologists and a patient or client (Jeuring et al., 2015). In the scenarios included in the game, the players find themselves in a consultation with a virtual character during which they can choose between various options. They receive immediate feedback through the utterances and emotions of the conversational partner. The game SALVE (Augello et al., 2016) uses AI-controlled chatbots participating in medical consultations and is based on Social Practice Theory (Schatzki, 1996). In contrast, Even et al. developed a serious game primarily targeting schizophrenia patients to support rehabilitation programs for social skills (Even et al., 2016). This approach combines role play with problem-solving exercises on which remediation therapies rely. A summary of serious role-playing games for the training of communication skills is given in Table 2.

TABLE 2

Table 2. Serious games for the training of communication skills (overview).

Conflict management is an important social skill that has been the subject of serious role-playing games in the past. Choices and Voices, for example, is an interactive simulation game for preventing violent extremism. In it, players explore and discuss issues and influences leading to tension and disruption in communities (Memarzia and Star, 2011). In this game, players face several moral dilemmas in which their decisions determine the outcome of the game (for themselves, their family, and their friends). This is supposed to show the significant consequences real life decisions can have. The storytelling game Façade asks players to resolve a conflict between a married couple. Through communication with the conflicted parties, they are to investigate the causes of their issues and provide counseling (Mateas and Stern, 2003). The emphasis here is on believable characters, natural language conversation, and a dynamic storyline. In Office Brawl the player assumes the role of a mediator, who is moderating a conflict between two parties in a workplace-oriented setting, using AI-controlled virtual characters (Glock et al., 2011). As a project manager in the game, the player needs to handle an argument between two members of a team. FearNOT! is a virtual drama for anti-bullying education targeted at children (Aylett et al., 2005). In this game, the bullying behavior of one of the characters leads to dramatic episodes. The victim seeks advice from the player, who can interact with this character by using free text input. The game is supposed to allow children to explore what happens in bullying situations in which they take responsibility for what happens to a victim without feeling victimized themselves. The game LOITER lets prospective police officers enact street interventions with loitering juveniles (Linssen et al., 2014) and aims to improve their social awareness. Here, players can experiment with different ways of interacting with the juveniles. Self City is a serious game developed for emotionally impaired adolescents, which is supposed to help them develop skills such as process-oriented thinking and conflict resolution (Van Dijk et al., 2008). In this game, players can walk around online in a virtual city. On their way to the cinema, they experience challenging social situations and learn how to deal with them. Players are accompanied by a daemon that provides advice in conflict situations and suggests alternative actions. The Junior Detective Computer Game has been developed as part of a multi-component social skills intervention for children with Asperger syndrome (Beaumont and Sofronoff, 2008). Here, players take the role of a trainee at the Detective Academy and are taught how to recognize complex emotions in computer-animated and human characters. They need to complete several missions, such as dealing with bullying, playing with others, and trying out new things. A summary of serious role-playing games for training conflict management is given in Table 3.

TABLE 3

Table 3. Serious games for the training of conflict management (overview).

Frameworks for the Design of Serious Games

There are a number of existing models and frameworks for the general design of serious games, which describe fundamental components of such systems and support formal approaches to game design. A very general approach is the so-called MDA (Mechanics, Dynamics, Aesthetics) framework (Hunicke et al., 2004). It proposes three different perspectives for understanding and designing games: Mechanics refer to the actual implementation of the game. They describe its particular components (actions, behaviors and control mechanisms) at the level of data representation and algorithms. Dynamics relate to the overarching design goals and run-time behavior of the mechanics acting on player inputs and each other's output over time. Aesthetics refer to the resulting game experience. They describe the desirable emotional responses evoked in players when interacting with the game system. Although the MDA framework is widely accepted and practically employed, it has weaknesses and limitations (Walk et al., 2017): It focuses too much on game mechanics, neglecting many design aspects of games, including an over-arching narrative. Therefore, it is not suitable for all types of games, in particular gamified content or any type of experience-oriented design.

Another approach toward serious game design is the Four-Dimensional Framework suggested by De Freitas and Oliver (2006). It postulates four main dimensions of learning processes to be considered in the design process of serious games: the context in which learning takes place (e.g., classroom-based or outdoors, access to equipment, technical support), the learner specification (e.g., learner profile, pathways, learning background), the mode of representation (e.g., level of fidelity, interactivity, and immersion used in the game), and pedagogic considerations (e.g., learning models, approaches for learning support). Like the MDA framework, this framework is a high-level model, meaning that it specifies a limited number of generic concepts that can or should be taken into consideration when designing or evaluating serious games, but only on a very general level with no concrete design or evaluation guidelines (Mayer et al., 2014).

This also applies to the RETAIN (Relevance, Embedding, Transfer, Adaptation, Immersion, and Naturalization) model by Gunter et al. (2006). This model was developed to support game development and to assess whether a serious game is appropriate for educational purposes, how well the academic or pedagogical content is immersed and embedded in the game's narrative and how knowledge transfer is promoted. Relevance means that the information students learn in the game should be relevant to the game world as well as to the players' targeted objectives. Embedding should be done in a way that learning objectives and fantasy are tightly coupled. Transfer refers to how well players can recognize and apply newly learned information outside the game environment. Adaptation means that players apply their learned knowledge to create new scenarios that apply literacy skills in a new domain. Immersion should be facilitated by the game environment and the ability to create customizable social presence. Naturalization means that players should be encouraged to gradually use their own skills to gain the knowledge necessary for success in other problems and subject areas (Kenny and Gunter, 2011).

The Triadic Game Evaluation (TGE) approach (Harteveld, 2011) stresses three different perspectives for the design and evaluation of serious games: reality, meaning and play. The reality component determines the game subject, variables and definitions. It could be represented by players from the real world or a representation of the real world inside the game. Evaluation criteria with regard to this component include fidelity, realism, and validity. The meaning component of the framework considers how a meaningful effect beyond the game experience can be achieved and incorporates aspects such as communication, learning, rhetoric, and opinions. Evaluation criteria include reflection, transfer, and relevance. The play component refers to the fact that games are primarily highly interactive and engaging tools that immerse players into a fictitious situation, and is related to game elements like actors, rules, resources, challenges, and competition. Evaluation criteria for this component are engagement, fun, and immersion. The TGE framework claims that games need to be designed equally along these three components (Kortmann and Harteveld, 2009). In contrast to the aforementioned models, this framework comes with a concrete agile development model that describes different software engineering phases and decision moments in the creation process. However, specific design and implementation guidelines are not included.

In summary, the various promising approaches to training social skills by means of role-playing games are still defined on a very general level. Our aim is to provide a comprehensive conceptual and technical framework for the concrete design and implementation of serious role-playing games for the training of social skills in dialog-centric settings with virtual characters through which we would support more efficient and effective design and implementation of such game environments.

Framework: Conceptual Approach

From the design perspective, several distinctive conceptual features characterize our framework: (1) chat-like interaction with an AI-controlled chatbot, (2) separate phases of immersion (role-playing) and reflection to facilitate a change of perspective that is considered conducive to learning, and (3) adaptive feedback based on individual analyses to support the learning process.

Chatbots in Virtual Role-Playing Environments

Chatbots are computer programs (conversational agents) that communicate with users in natural language. Their purpose is to simulate a human conversation via text or voice interactions. Originally, chatbots were developed for entertainment purposes. However, as the possibilities of computer use become more and more diverse, the use of chatbots can be extended to many other areas. Chatbots are now found in daily life, for example as personal assistants (such as Google Assistant, Amazon Alexa, or Apple's Siri), in search engines, in customer service and support, and in healthcare coaching (Winkler and Söllner, 2018). They can be used in a variety of domains including business, e-commerce, entertainment, medicine, and others (Kerly et al., 2006; Shawar and Atwell, 2007).

Chatbots can also be used successfully for learning. Past studies even show that chatbots are a feasible means to improve learners' results (Kerly et al., 2006). They have been used for a variety of purposes including medical education and therapy, language learning, as well as receiving feedback and strengthening motivation and self-efficacy (Winkler and Söllner, 2018). Chatbots have also been used in serious role-playing games, as shown in the examples in chapter Related Work. The use of chatbots in serious role-playing games has several advantages. First, having a chatbot interact with the player instead of a human ensures a certain level of standardization that could never be achieved in a setting with human actors. Second, scenarios including a chatbot are repeatable, independent of time and place, and no additional resources are needed. An important part of building chatbots is the creation of dialogs. A chatbot can only be as good as the knowledge base it uses for answer generation (Abdul-Kader and Woods, 2015). The problem with “classic” chatbots is that they cannot store the course of the conversation and have no real understanding of the answers. However, realistic and responsive chatbot behavior is important to increase the players' engagement and contribute to the immersive nature of role plays. To achieve this, our approach proposes several technical workarounds that will be explained in detail in chapter Multi-Agent Architecture.

Immersion and Reflection

The educational impact of serious role-playing games draws heavily on the “willing suspension of disbelief” by the players who commit to the role they are supposed to play (Lim et al., 2009). Thus, this kind of system intends to create a certain degree of immersion. Janet H. Murray defines the term immersion as follows: “A stirring narrative in any medium can be experienced as a virtual reality because our brains are programmed to tune into stories with an intensity that can obliterate the world around us… The experience of being transported to an elaborately simulated place is pleasurable in itself, regardless of the fantasy content. Immersion is a metaphorical term derived from the physical experience of being submerged in water. We seek the same feeling from a psychologically immersive experience that we do from a plunge in the ocean or swimming pool: the sensation of being surrounded by a completely other reality, as different as water is from air, that takes over all of our attention, our whole perceptual apparatus” (Murray, 2017). When players identify themselves with the character they are assuming in the game and are immersed, their motivation to proceed and succeed in the game increases (Annetta, 2010). This intrinsic way of motivating learners is something conventional instruction modes do not have (Yee, 2006). Players become immersed in a game because they find it satisfying, and through this intrinsic motivation, they get more engaged in the learning task (Annetta, 2010).

In terms of experience-based, authentic learning, it seems reasonable to carry out the enactment in an immersive situation. However, there is reason to believe that the immersion tends to impede the critical self-reflection that is important for the learning process (Malzahn et al., 2010). Reflection is a successful tool to improve the learning process (Jonassen et al., 1993), and it is needed to ensure the transfer to real-life situations (Lim et al., 2009). During the reflection process, people recapture, rethink, and evaluate their experiences to develop new understandings and appreciations (Boud et al., 1985). It is to be expected that the amount of reactive attention required for immersion impedes the players' ability to distance themselves from the role, which in turn interferes with self-reflection. Thus, the requirement of role distance in phases of reflection suggests that the mode should be changed to help learners step out of their role and adopt a different perspective. Based on this assumption, we decided to separate the actual role-playing game from the reflection session in our framework.

Adaptive Feedback

As stated above, an important challenge for serious role-playing games is shaping the narrative experience and the pedagogical outcomes that generally depend on post-role-play reflection and feedback (Lim et al., 2009). Feedback on the performance of the player(s) during the role-playing session is necessary to ensure the transfer to real-life settings. It is supposed to help learners to improve their performance by providing information about the correctness of their actions (Shute, 2008). Johnson et al. identified four feedback characteristics: (1) the type of feedback (e.g., outcome-based or process-based feedback), (2) the timing of feedback after an action (i.e., immediate or delayed feedback), (3) the modality in which the feedback is presented (e.g., spoken or text-based feedback), and (4) adaptation to learner characteristics (e.g., with regard to prior knowledge or spatial ability) (Johnson et al., 2017).

Our framework relies on adaptive feedback based on an automated, individual performance analysis. We differentiate between three types of feedback: The first one is implicit feedback during the role-playing session through the reactions of the chatbot (ingame feedback). These reactions can be non-verbal (e.g., facial expressions) or verbal. Real-life situations are simulated through both types of reactions to the players' actions. The second one is a general summary of the analysis results (aftergame feedback). Players should receive overall feedback on their performance during the role play that summarizes the most important aspects (positive and negative). The third type is direct and specific feedback on single incidents during the role play that can be provided through prompts in a replay of the conversation. A replay offers several advantages: The whole conversation can be shown again step by step and augmented with individual feedback at certain points, commenting on specific actions of the player. Also, it provides the possibility to navigate between the different phases of a conversation, pause the replay, or jump to the next feedback marker. As a result, it is much more flexible and searchable than, e.g., a video of a conventional role play.
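The replay described above can be pictured as a list of conversation turns, some of which carry a feedback marker. The following Python sketch is only an illustration under our own assumptions: the class names `Turn` and `Replay`, the phase labels, and the example utterances are hypothetical and are not taken from the implemented systems.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    speaker: str                    # "player" or "chatbot"
    text: str
    phase: str                      # hypothetical conversation phase label
    feedback: Optional[str] = None  # feedback marker attached to this turn, if any


@dataclass
class Replay:
    turns: List[Turn]
    position: int = 0               # index of the next turn to be shown

    def step(self) -> Optional[Turn]:
        """Show the next turn, or return None at the end of the conversation."""
        if self.position >= len(self.turns):
            return None
        turn = self.turns[self.position]
        self.position += 1
        return turn

    def jump_to_next_feedback(self) -> Optional[Turn]:
        """Skip ahead to the next turn that carries a feedback marker."""
        for index in range(self.position, len(self.turns)):
            if self.turns[index].feedback is not None:
                self.position = index + 1
                return self.turns[index]
        return None  # no further markers; the position is left unchanged

    def jump_to_phase(self, phase: str) -> bool:
        """Position the replay at the first turn of the given phase."""
        for index, turn in enumerate(self.turns):
            if turn.phase == phase:
                self.position = index
                return True
        return False


# Hypothetical example conversation
turns = [
    Turn("chatbot", "Hello, I have a problem with my order.", "opening"),
    Turn("player", "That is not my responsibility.", "opening",
         feedback="Dismissive reply; acknowledge the concern first."),
    Turn("chatbot", "But I paid for it!", "clarification"),
    Turn("player", "I am sorry. Could you tell me what happened?", "clarification",
         feedback="Good: apology followed by an open question."),
]
replay = Replay(turns)
```

Because each turn is a structured record rather than a video frame, navigation by phase or by feedback marker reduces to a simple index search, which is the flexibility that a conventional recording lacks.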

Framework: Technical Approach

In our approach, the technical implementation of such systems entails three main challenges: (1) dialog modeling of the chatbot, (2) implementing a multi-agent system as the backbone in order to keep components independent and optimize performance, and (3) performance analysis and feedback generation. The following section will present our approach toward each of these aspects in detail.

Dialog Modeling

In our framework, the Artificial Intelligence Markup Language (AIML) is used for the implementation of the chatbots' conversational logic. It is a common XML-based solution for passive AI-controlled chatbots, which comes with an easy syntax and a small number of control structures (Wallace, 2004). AIML relies on simple pattern matching. It consists of categories, each containing a pattern and a template. If the user input matches a pattern, the template defines the answer or action to be given. Recursion and wildcards allow many different inputs to match a single pattern, while the ability to store a context and the use of variables and conditions allow for a complex and sophisticated chatbot design.
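To make the category mechanism more concrete, the following Python sketch imitates the pattern/template idea in a strongly simplified form. It is not an AIML interpreter: the categories are hypothetical, the list is simply checked in order (more specific patterns first) instead of using AIML's own matching priorities, and recursion, context, and variables are left out.

```python
import re

# Hypothetical categories as (pattern, template) pairs. "*" stands for one or
# more words; "{star}" in a template is replaced by the text the wildcard matched.
CATEGORIES = [
    ("HELLO", "Hello, how can I help you?"),
    ("MY NAME IS *", "Nice to meet you, {star}."),
    ("*", "Could you rephrase that?"),  # catch-all default answer
]


def normalize(text: str) -> str:
    """Upper-case the input, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", "", text.upper())
    return " ".join(cleaned.split())


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a category pattern into an anchored regular expression."""
    parts = [r"(.+)" if token == "*" else re.escape(token) for token in pattern.split()]
    return re.compile("^" + " ".join(parts) + "$")


def respond(user_input: str) -> str:
    """Return the template of the first category whose pattern matches."""
    text = normalize(user_input)
    for pattern, template in CATEGORIES:
        match = pattern_to_regex(pattern).match(text)
        if match:
            star = match.group(1).title() if match.groups() else ""
            return template.format(star=star)
    return ""  # only reached for empty input
```

In this sketch, `respond("My name is Anna!")` fills the wildcard text into the template, while any input without a specific pattern falls through to the catch-all category.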

Although AIML has a long history and is a common solution for chatbots used in educational contexts, it has certain limitations. One problem is the passive nature of AIML. An AIML chatbot only reacts to the input it receives; it cannot take the initiative. This limitation can be worked around by using external triggers that make the bot become active when required in certain situations. Another problem is that an AIML chatbot (as is true for all artificial natural language processing) cannot truly grasp the sense of what has been said. The AIML chatbot only checks the user input against predefined patterns; if there is no match, it can at most output some default statements (which need to be predefined as well). To solve this problem, our framework proposes the use of sentence openers in dialog-centric role play scenarios. This means that players always have to select a sentence opener from a predefined set and supplement it with free text input to compose a message.

This approach has several advantages: First, a sentence opener already defines the general gist of a message (e.g., affirmation, rejection, proposal, inquiry). As a result, it is at least possible to provide a default answer that is tailored to the selected sentence opener even if the free text input following the opener does not match a predefined pattern. Furthermore, if each phase of the chat conversation has unique sentence openers, the chatbot always has some kind of context information. Second, the use of sentence openers reduces the complexity of the dialog scripts dramatically because the possible starting points of all input sentences are already known. Third, sentence openers provide support to the players and help them phrase their messages. In addition, sentence openers improve the overall atmosphere of the simulated conversation and make it seem more realistic and natural. Last, sentence openers (in contrast to fully predefined text messages) still allow for free text input that can be analyzed in detail and influence the course of the game.
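A minimal Python sketch of the first advantage, the opener-specific default answer, is given below. The opener texts, intents, replies, and the keyword table are invented for illustration and do not come from the implemented scenarios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opener:
    text: str           # fixed beginning of the message, chosen by the player
    intent: str         # general gist, e.g., affirmation, rejection, proposal, inquiry
    default_reply: str  # fallback answer tailored to this opener


# Hypothetical opener set for one conversation phase.
OPENERS = {
    "agree": Opener("I agree that", "affirmation", "I am glad you see it that way."),
    "propose": Opener("I suggest that", "proposal", "Could you elaborate on your suggestion?"),
}

# Hypothetical predefined patterns for the free-text part, keyed by intent and keyword.
KNOWN_PATTERNS = {
    ("proposal", "weekly meeting"): "A weekly meeting sounds like a concrete step.",
}


def compose(opener_id: str, free_text: str) -> str:
    """Build the full chat message from the selected opener and the free text."""
    return f"{OPENERS[opener_id].text} {free_text.strip()}"


def reply(opener_id: str, free_text: str) -> str:
    """Use a specific answer if the free text matches, else the opener's default."""
    opener = OPENERS[opener_id]
    lowered = free_text.lower()
    for (intent, keyword), answer in KNOWN_PATTERNS.items():
        if intent == opener.intent and keyword in lowered:
            return answer
    return opener.default_reply
```

Even when the free text is not recognized, `reply` still returns an answer that fits the gist of the opener, which is the fallback behavior the paragraph describes.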

Multi-Agent Architecture

Our technical framework is based on a uniform multi-agent system architecture with a blackboard as the communication and integration mechanism. The blackboard is realized through a so-called tuple space. The components (agents) in this system are loosely coupled, i.e., they do not communicate with each other directly but only via entries on a central tuple space server (Gelernter, 1985). These entries have a simple tuple structure that contains primitive data types (integers, characters, booleans) and strings. According to the original concept of Gelernter, there are only a few generic operations (read, write, take, wait-to-take, etc.) to interact with such a blackboard. In contrast to a pure database solution, however, there are active trigger mechanisms such as notifications. SQLSpaces, developed in the COLLIDE group itself, serves as the specific implementation basis in our framework (Weinbrenner, 2012). While the server itself is implemented in Java, the system framework of SQLSpaces provides clients for agent programming in various programming languages. SQLSpaces also facilitates the logging of relevant data of each gaming session, which can later be used for analysis and comparison.
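The blackboard idea can be illustrated with a small in-memory tuple space. The Python sketch below is deliberately not the SQLSpaces API; the class `TupleSpace` and its methods are assumptions made for illustration, with `None` acting as a wildcard field in templates and `register` standing in for the notification mechanism. Blocking operations such as wait-to-take are omitted.

```python
from typing import Callable, List, Optional, Tuple


class TupleSpace:
    """Toy single-process tuple space with template matching and notifications."""

    def __init__(self) -> None:
        self._tuples: List[tuple] = []
        self._callbacks: List[Tuple[tuple, Callable[[tuple], None]]] = []

    @staticmethod
    def _matches(template: tuple, candidate: tuple) -> bool:
        # A template matches if it has the same length and every non-None field is equal.
        return len(template) == len(candidate) and all(
            field is None or field == value for field, value in zip(template, candidate)
        )

    def write(self, entry: tuple) -> None:
        """Store a tuple and notify every agent registered for a matching template."""
        self._tuples.append(entry)
        for template, callback in list(self._callbacks):
            if self._matches(template, entry):
                callback(entry)

    def read(self, template: tuple) -> Optional[tuple]:
        """Return the first matching tuple without removing it."""
        for entry in self._tuples:
            if self._matches(template, entry):
                return entry
        return None

    def take(self, template: tuple) -> Optional[tuple]:
        """Return the first matching tuple and remove it from the space."""
        entry = self.read(template)
        if entry is not None:
            self._tuples.remove(entry)
        return entry

    def register(self, template: tuple, callback: Callable[[tuple], None]) -> None:
        """Ask to be called whenever a tuple matching the template is written."""
        self._callbacks.append((template, callback))
```

An agent never addresses another agent here; it only writes tuples or registers interest in a template, which is what keeps the components loosely coupled.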

The overall system consists of a user interface and various agents, each of which is responsible for one task in either dialog analysis, feedback creation, or game control. The user interface in the three implemented training scenarios described in this article has been implemented as a web application using HTML, CSS, and JavaScript (2D frontend). Previous implementations were based on OpenSimulator (3D frontend), but since there were no specific advantages of 3D environments over 2D environments, we decided to go ahead with a 2D approach (Malzahn et al., 2010). As described above, the client (user interface) and all agents write and read tuples from the tuple space server without communicating with each other directly, which results in a loosely coupled and adaptive system. This means that agents can easily be adapted, added, or removed depending on the actual application scenario.

The agents can be divided into three groups, depending on their functionality. Pervasive agents are overarching agents, which are crucial in connecting the individual game components. The register agent, for example, manages the log-in of the player (or players in a collaborative scenario). When a new client logs in, the register agent receives a request via the tuple space (callback) and starts a new gaming session. The silence agent reacts if a player has been inactive for a certain amount of time, in which case the agent is triggered and sends an internal message to which the chatbot responds. After the fourth internal message from the silence agent, the conversation ends. Pre-processing agents are used to pre-process the player's input before the answer to it is generated in order to provide the best possible answer. This pre-processing is mainly used to overcome the limited capabilities of AIML: Analyzing certain aspects separately helps to prioritize specific behaviors, i.e., to make sure that the chatbot reacts adequately to rude or aggressive behavior. In addition, this procedure simplifies the structure of the AIML scripts and supports the feedback creation. Each of the implemented scenarios uses different pre-processing agents depending on the context. All pre-processing agents analyze the player's input regarding one specific aspect. Figure 1 shows the basic architecture.
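The behavior of the silence agent can be sketched as a small timer-driven state machine. In the following Python sketch, the timeout value, the message format `SILENCE n`, and the decision not to reset the reminder counter when the player becomes active again are assumptions; only the limit of four internal messages is taken from the description above. The current time is passed in explicitly so that the logic can be followed without real waiting.

```python
from typing import Optional


class SilenceAgent:
    """Emits internal messages when the player stays silent for too long."""

    def __init__(self, timeout_seconds: float = 60.0, max_reminders: int = 4) -> None:
        self.timeout = timeout_seconds      # assumed value, not given in the text
        self.max_reminders = max_reminders  # the conversation ends after the fourth message
        self.last_activity = 0.0
        self.reminders_sent = 0
        self.conversation_ended = False

    def on_player_message(self, now: float) -> None:
        """Record player activity, which restarts the inactivity timer."""
        self.last_activity = now

    def tick(self, now: float) -> Optional[str]:
        """Return an internal message for the chatbot if the timeout has elapsed."""
        if self.conversation_ended or now - self.last_activity < self.timeout:
            return None
        self.reminders_sent += 1
        self.last_activity = now  # wait a full timeout before the next reminder
        if self.reminders_sent >= self.max_reminders:
            self.conversation_ended = True
        return f"SILENCE {self.reminders_sent}"
```

In a tuple-space setting, the returned string would be written as a tuple for the chatbot to pick up; that wiring is left out here.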

FIGURE 1

Figure 1. Basic system architecture.

Performance Analysis and Feedback Generation

Both performance analysis and feedback generation always depend on the context and the learning objectives of the serious role-playing game. As described above, our architecture uses analysis agents, each of which is responsible for the evaluation of one specific aspect of the player's communication behavior. They are divided into pre-processing agents and regular agents. Pre-processing is necessary for generating a suitable chatbot response. For example, if a player acts aggressively or rudely, the chatbot should react to this behavior regardless of any other information the player's message contains. The results of the pre-processing are collected, and if an immediate reaction to a specific behavior is required, the text input is modified. If, for instance, a swearword has been detected in a player's message, the complete input string is replaced by a specific trigger (“swearword”), causing the bot to react appropriately. The same applies to other behaviors. In case the pre-processing agents do not find anything that needs an immediate reaction of the chatbot, the bot receives the original text input. Simultaneously, all other analysis agents evaluate the message and add their feedback to it in the form of feedback tags (e.g., #praise#, #interruption#, or #criticize#). These feedback tags mark any situations in which the player is supposed to receive feedback during the replay that takes place in the reflection phase following the role play session. The tags are filtered out during the chat session; the players do not get to see them during the game, but they play an important role in the feedback generation.
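The two mechanisms described in this paragraph, replacing the input by a trigger and attaching hidden feedback tags, can be sketched in a few lines of Python. The word list and the function names are hypothetical; only the trigger string "swearword" and the `#tag#` notation follow the description above.

```python
import re
from typing import List

SWEARWORDS = {"idiot", "stupid"}  # hypothetical placeholder list
TAG_PATTERN = re.compile(r"#(\w+)#")


def preprocess(message: str) -> str:
    """Return what the chatbot receives: a trigger or the original input."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    if words & SWEARWORDS:
        return "swearword"  # the whole input string is replaced by the trigger
    return message


def annotate(message: str, tags: List[str]) -> str:
    """Append feedback tags produced by the analysis agents to a message."""
    return message + "".join(f" #{tag}#" for tag in tags)


def extract_tags(annotated: str) -> List[str]:
    """List the feedback tags, e.g., for building the replay later on."""
    return TAG_PATTERN.findall(annotated)


def strip_tags(annotated: str) -> str:
    """Remove the tags before the message is displayed in the chat."""
    return TAG_PATTERN.sub("", annotated).strip()
```

The stored message keeps its tags for the reflection phase, while `strip_tags` yields the version that players see during the game.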

Case Studies

Based on the framework described above, the research group COLLIDE at the University of Duisburg-Essen has conducted various case studies with different instances of virtual role-play environments. The training scenarios include workplace-oriented conflict management, patient-centered medical interviews, and customer complaint management.

Case Study: Conflict Management

ColCoMa (Collaborative Conflict Management) is a collaborative serious game for the training of conflict management strategies in an organizational context within a role-playing scenario, developed at the COLLIDE group in 2012. It involves two players in a conversation with an AI-controlled chatbot acting as a mediator in a 2D virtual environment. The following description of the approach and game design is based on the work of Emmerich et al. (2012).

Approach

In ColCoMa, two players have a conversation about a fictitious conflict, moderated by an AIML chatbot in the role of a mediator. The main goal of the players is to resolve the conflict by showing constructive and appropriate behavior during the conversation. Each player is assigned a predefined role in this fictitious scenario: As a member of the computer support hotline of a big software company, Mr. Meier is conscientiously taking much time for his customers. Mrs. Schmidt is his supervisor. She is dissatisfied with Mr. Meier's way of working. She notices that he takes too much time for the customers and therefore does not work efficiently in her eyes. Mr. Meier does not agree with her, and the situation escalates after a negative appraisal of Mr. Meier's performance. In order to support immediate understanding of the situation and empathy with the assigned role, the scenario is kept as simple and comprehensible as possible and focuses on the main conflict as well as the person's feelings.

Game Design

The players are introduced to the game and the scenario through a cartoon-like picture story that is told from their respective role's perspective and is supposed to result in conflicting points of view. The conversation itself takes place in a chat window where graphical representations of the mediator and the other player's character are shown to create the association of sitting opposite each other. The dialog partners can communicate via simple text messages. Facial animations can be evoked via common emoticons. The interface also includes a notepad with hints as well as a help section that offers additional information on the game controls and the fictitious scenario if needed. Figure 2 shows the basic user interface.

FIGURE 2

Figure 2. ColCoMa chat interface.

The conversation is divided into five phases according to Proksch (2010): (1) framing phase, (2) topic collection, (3) working on the conflict, (4) looking for a solution, and (5) contract. The framing phase represents the starting point of the mediation talk and is important for establishing certain rules for the conflicted parties and their behavior toward each other. The actual conflict is not yet the focus. Instead, the participants state their personal hopes and mediation goals and reflect on their own point of view as well as the opponent's position. In the second phase, both parties are supposed to name relevant topics they would like to put on the agenda during the mediation talk, like performance review, working conditions, the participants' perspective in the company as well as their behavior toward each other. The mediator chatbot recognizes the topics based on a list of keywords and phrases. In order to be able to advance in the game, the two players need to name three topics; otherwise the mediator terminates the conversation due to a lack of contribution. If only two topics are volunteered, the mediator will suggest a third one. The mediation talk itself takes place in the third phase. The main task during this phase is to discuss the selected topics in detail. Both players are given the opportunity to explain why a topic is important to them, what changes they would like to see in regard to the specific topic, and what they themselves can contribute to realize these changes. They are also given the opportunity to comment on whether the other party's perception is correct and to rectify their position if this is not the case. The aim of the fourth phase is to find solutions for the different topics that are acceptable for both parties. Finally, in phase five, they are supposed to agree to adhere to the solutions they came up with and enter into a contract.
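The rule for the topic collection phase can be written down as a short decision function. In the Python sketch below, the keyword lists are invented, and the reading that fewer than two named topics leads to termination is an interpretation of the description above rather than a documented rule.

```python
from typing import List, Tuple

# Hypothetical keyword lists; the actual lists used by the mediator are not given.
TOPIC_KEYWORDS = {
    "performance review": ["appraisal", "review", "evaluation"],
    "working conditions": ["workload", "working conditions", "time pressure"],
    "perspective in the company": ["career", "promotion"],
    "behavior toward each other": ["respect", "tone of voice"],
}
REQUIRED_TOPICS = 3


def detect_topics(message: str) -> List[str]:
    """Return all topics whose keywords occur in the message."""
    text = message.lower()
    return [
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def conclude_topic_collection(named: List[str]) -> Tuple[str, List[str]]:
    """Decide whether the game advances and with which topic agenda."""
    topics = list(dict.fromkeys(named))  # drop duplicates, keep order
    if len(topics) >= REQUIRED_TOPICS:
        return "advance", topics
    if len(topics) == REQUIRED_TOPICS - 1:
        # The mediator suggests a third topic that has not been named yet.
        suggestion = next(topic for topic in TOPIC_KEYWORDS if topic not in topics)
        return "advance", topics + [suggestion]
    return "terminate", topics  # lack of contribution
```

Simple substring matching as used here would also fire on partial words, so a real keyword list would need more careful tokenization.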

The mediation talk is followed by a reflection phase in which both players receive feedback on their performance in order to help them reflect on their behavior. At the start of this phase, players get the opportunity to directly exchange feedback with each other in a free chat without the mediator. After this free chat, each of them receives overall feedback on their own performance during the mediation talk. Finally, the players take part in a replay session of the whole chat conversation, but this time augmented with individual feedback commenting on especially positive and negative contributions of the players. A change of the graphical interface during the replay is supposed to reinforce role distance, which is assumed to be conducive for learning (see section Immersion and Reflection).

The performance analysis and assessment is based on general rules that conflicting parties have to adhere to during a mediation talk, such as not being aggressive or rude, not being reproachful, and not impairing the opponent's autonomy (Stauss and Seidel, 2010). Instead, the participants are supposed to have an open and constructive attitude, name topics and issues in a concrete way, and help the other party understand their perspective. The evaluation of the players' performance during the mediation talk is done by several analysis agents, each responsible for one specific behavioral aspect, e.g., rudeness (by comparing the players' input to a list of swearwords and defamations), aggression (e.g., by checking for multiple exclamation marks or use of all-caps spelling), emotion-showing (e.g., use of emoticons), or the use of I- and you-statements (by counting the number of words referring to the speaker and those referring to the dialog partner). Some of the analysis results are used just for the overall feedback that is provided to players after the conversation.
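Two of these checks, aggression cues and I-/you-statements, lend themselves to a short illustration. The Python sketch below uses simple heuristics with assumed thresholds and word lists (for example, treating any word of three or more capital letters as all-caps spelling); the actual rules of the analysis agents are not specified at this level of detail.

```python
import re
from typing import List, Tuple

# Hypothetical English word lists for speaker- and partner-related words.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
SECOND_PERSON = {"you", "your", "yours", "yourself"}


def aggression_cues(message: str) -> List[str]:
    """Return the surface cues for aggression found in a message."""
    cues = []
    if re.search(r"!{2,}", message):
        cues.append("multiple_exclamation_marks")
    words = re.findall(r"[A-Za-z]{3,}", message)  # ignore short words such as "I"
    if any(word.isupper() for word in words):
        cues.append("all_caps")
    return cues


def pronoun_counts(message: str) -> Tuple[int, int]:
    """Count words referring to the speaker and to the dialog partner."""
    tokens = re.findall(r"[a-z']+", message.lower())
    i_count = sum(token in FIRST_PERSON for token in tokens)
    you_count = sum(token in SECOND_PERSON for token in tokens)
    return i_count, you_count
```

Each function covers exactly one behavioral aspect, mirroring the one-agent-per-aspect design; abbreviations written in capitals would be false positives for the all-caps heuristic.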

Evaluation Results

In 2018, an eye-tracking study was conducted in collaboration with the Dortmund University of Applied Sciences and Arts (Othlinghaus-Wulhorst et al., 2018). The results of this study are summarized and discussed in this section.

Apart from getting feedback on the prototype, the main goal of the study was to investigate whether there is a correspondence between gaze synchronicity of the two players and the quality of collaboration. Twenty subjects (average 22.8, SD = 2.84, 5 females, 15 males) participated in the study and were tested in dyads, using two desktop-based eye-trackers to track the players' gaze during the experiment. To investigate the research question, three main hypotheses were examined: The first hypothesis postulated “a positive relation between the convergence of visual foci of attention (gaze synchronicity) and the successful completion of the game (achievement score)” (Othlinghaus-Wulhorst et al., 2018). In this study, gaze synchronicity was defined as the extent to which the two players were looking at the same areas of interest in the same time interval during the course of a gaming session. The so-called achievement score was used to measure the success in the game and reflects the players' performance during the mediation talk based on three criteria: (1) automated feedback generated by the system, which summarizes the players' behavior during the game, (2) the successful completion of the topic collection phase (which was considered a major milestone in the game), and (3) the successful completion of the game, which is achieved when both players sign a contract that includes the agreements and rules they worked out together with the mediator. With regard to this hypothesis, a highly significant correlation between the gaze synchronicity and the achievement score was found on the aggregate level (taking overall eye-tracking convergence as a global parameter).
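One simple way to operationalize the definition of gaze synchronicity given above is the share of time intervals in which both players look at the same area of interest. The Python sketch below follows this reading; it is an assumption made for illustration and not necessarily the measure computed in the study, and the area names in the usage note are invented.

```python
from typing import Optional, Sequence


def gaze_synchronicity(
    aois_a: Sequence[Optional[str]], aois_b: Sequence[Optional[str]]
) -> float:
    """Share of intervals in which both players fixate the same area of interest.

    Each sequence holds one area label per time interval; None marks an
    interval without a fixation on any defined area and never counts as a match.
    """
    if len(aois_a) != len(aois_b):
        raise ValueError("both sequences must cover the same time intervals")
    if not aois_a:
        return 0.0
    matches = sum(a is not None and a == b for a, b in zip(aois_a, aois_b))
    return matches / len(aois_a)
```

For example, with the hypothetical sequences `["chat", "chat", "notepad", None]` and `["chat", "mediator", "notepad", None]`, two of four intervals match, which gives a value of 0.5 for the session.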

In the second hypothesis, it is assumed that “there is a positive relation between the convergence of visual foci of attention (gaze synchronicity) and the quality of collaboration in the chat.” (Othlinghaus-Wulhorst et al., 2018). In order to define the quality of collaboration, a rating scheme was developed, which includes five dimensions: (1) argumentation (players discuss or bring forward justifying arguments), (2) agreement/disagreement (players endorse or dissent from one another), (3) collaborative orientation (players refer to each other, ask questions, provide feedback or refer to topics brought up by the other player), (4) solution orientation (players try to find or propose a solution), and (5) shared awareness/reinforcing shared history (players share common knowledge or explain their situation). Based on this scheme, all chat messages were analyzed, checked against the five dimensions, and assigned a total quality score. Finally, all matches of a gaming session were added up to a percentage indicating the overall quality of the collaboration for a pair of players. With regard to this hypothesis, a high correlation between the gaze synchronicity and the collaboration quality was found, especially for the dimensions agreement/disagreement, solution orientation, and shared awareness.

The third hypothesis proposes “a dynamic (time-related) congruence between similar eye movements (synchronicity) and the quality of collaboration in the chat” (Othlinghaus-Wulhorst et al., 2018), meaning that there is not only a gaze synchronicity on the aggregate level, but also synchronicity between convergent eye-tracking and chat interaction during the course of the game. This hypothesis could not be verified. It is assumed that the specific nature of the chat might be a reason for this, as three persons are involved (the two players and the mediator chatbot) and thus the two human actors do not really communicate directly, but only with the mediator. They answer his questions and do not really have the chance to communicate with each other directly, which results in a predefined structure of the chat conversation and rather long time intervals between the utterances of the two players.

Case Study: Patient-Centered Medical Interview

In 2013, a training scenario for medical interviews was developed at the COLLIDE group. It is supposed to give medical students the opportunity to train doctor-patient conversations autonomously and systematically in the form of role plays with simulated patients. The following description of the approach and game design is based on the work of Behler et al. (2013):