- 1. Psychological Process Research Team, Guardian Robot Project, RIKEN, Kyoto, Japan
- 2. Field Science Education and Research Center, Kyoto University, Kyoto, Japan
- 3. Graduate School of Informatics, Kyoto University, Kyoto, Japan
- 4. NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, Japan
- 5. Interactive Robot Research Team, Guardian Robot Project, RIKEN, Kyoto, Japan
Android robots capable of emotional interactions with humans have considerable potential for application to research. While several studies developed androids that can exhibit human-like emotional facial expressions, few have empirically validated androids’ facial expressions. To investigate this issue, we developed an android head called Nikola based on human psychology and conducted three studies to test the validity of its facial expressions. In Study 1, Nikola produced single facial actions, which were evaluated in accordance with the Facial Action Coding System. The results showed that 17 action units were appropriately produced. In Study 2, Nikola produced the prototypical facial expressions for six basic emotions (anger, disgust, fear, happiness, sadness, and surprise), and naïve participants labeled photographs of the expressions. The recognition accuracy of all emotions was higher than chance level. In Study 3, Nikola produced dynamic facial expressions for six basic emotions at four different speeds, and naïve participants evaluated the naturalness of the speed of each expression. The effect of speed differed across emotions, as in previous studies of human expressions. These data validate the spatial and temporal patterns of Nikola’s emotional facial expressions, and suggest that it may be useful for future psychological studies and real-life applications.
Introduction
Emotional interactions with other people are important for wellbeing (Keltner and Kring, 1998) but difficult to investigate in controlled laboratory experiments. While numerous psychological studies have presented pre-recorded photographs or videos of emotional expressions to participants and reported interesting findings regarding the psychological processes underlying emotional interactions (e.g., Dimberg, 1982), this method may lack the liveliness of real interactions, thus reducing ecological validity (Shamay-Tsoory and Mendelsohn, 2019; Hsu et al., 2020). Other studies used confederates as interaction partners and tested live emotional interactions (e.g., Vaughan and Lanzetta, 1980), but this strategy can lack rigorous control of confederates’ behaviors (Bavelas and Healing, 2013; Kuhlen and Brennan, 2013). Androids—that is, humanoid robots that exhibit appearances and behaviors that closely resemble those of humans (Ishiguro and Nishio, 2007)—could become an important tool for testing live face-to-face emotional interactions with rigorous control.
To implement emotional interaction in androids, the androids’ facial expressions must be carefully developed. Psychological studies have verified that facial expressions play a key role in transmitting information about emotional states in humans (Mehrabian, 1971). Studies of facial expressions developed methods for objectively evaluating facial actions (for a review, see Ekman, 1982), and the Facial Action Coding System (FACS; Ekman and Friesen, 1978; Ekman et al., 2002) is among the most refined of these methods. Based on observations of thousands of facial expressions in natural settings, together with a series of controlled psychological experiments, researchers identified the sets of facial action units (AUs) in the FACS corresponding to prototypical expressions of six basic emotions (Ekman and Friesen, 1975; Friesen and Ekman, 1983). For example, happy expressions involve an AU set consisting of the cheek raiser (AU 6) and lip corner puller (AU 12); surprised expressions involve the inner and outer brow raisers (AUs 1 and 2, respectively), the upper lid raiser (AU 5), and the jaw drop (AU 25). Numerous studies testing the recognition of photographs of facial expressions created based on this system verified that the expressions were recognized as the target emotional expressions above chance level across various cultures (e.g., Ekman and Friesen, 1971; for a review, see Ekman, 1993). Furthermore, the researchers described how the temporal aspects of dynamic emotional facial expressions are informative (Ekman and Friesen, 1975), which was supported by several subsequent experimental studies (for reviews, see Krumhuber et al., 2016; Dobs et al., 2018; Sato et al., 2019a). For example, Sato and Yoshikawa (2004) tested the naturalness ratings of dynamic changes in facial expressions and found that expressions that changed too slowly were generally rated as unnatural. Additionally, the effects of changing speeds differed across emotions, where fast and slow changes were regarded as relatively natural for surprised and sad expressions, respectively. Collectively, these psychological findings specify the spatial and temporal patterns of facial actions associated with facial expressions of emotions. Based on such findings, researchers have developed and validated novel research tools, including emotional facial expressions of virtual agents (Roesch et al., 2011; Krumhuber et al., 2012; Ochs et al., 2015). Virtual agents are promising tools to investigate emotional interactions with high ecological validity and control (Parsons, 2015; Pan and Hamilton, 2018). Androids may be comparably useful in this respect, and also have the unique advantage of being physically present (Li, 2015). If androids’ facial expressions can be developed and validated based on psychological evidence, they will constitute an important research tool for investigating emotional interactions.
However, although numerous studies have developed androids for emotional interactions (Kobayashi and Hara, 1993; Kobayashi et al., 2000; Minato et al., 2004, 2006, 2007; Weiguo et al., 2004; Ishihara et al., 2005; Matsui et al., 2005; Berns and Hirth, 2006; Blow et al., 2006; Hashimoto et al., 2006, 2008; Oh et al., 2006; Sakamoto et al., 2007; Lee et al., 2008; Takeno et al., 2008; Allison et al., 2009; Lin et al., 2009, 2016; Kaneko et al., 2010; Becker-Asano and Ishiguro, 2011; Ahn et al., 2012; Mazzei et al., 2012; Tadesse and Priya, 2012; Cheng et al., 2013; Habib et al., 2014; Yu et al., 2014; Asheber et al., 2016; Glas et al., 2016; Marcos et al., 2016; Faraj et al., 2021; Nakata et al., 2021; Table 1), few have empirically validated the androids that were developed. First, no study has validated androids’ AUs using FACS coding (Ekman and Friesen, 1978; Ekman et al., 2002). Second, no study has sufficiently demonstrated recognition of the six basic emotions conveyed by androids’ facial expressions. The facial expressions of many androids were reportedly not developed sufficiently to exhibit all six basic emotions (e.g., Minato et al., 2004). Although several studies developed androids capable of exhibiting the six basic emotions and recruited naïve participants to label the facial expressions, most did not statistically evaluate recognition accuracy (e.g., Kobayashi and Hara, 1993). One study that did conduct a statistical analysis failed to find significantly better-than-chance recognition of disgust and fear (Berns and Hirth, 2006), and another study testing five basic emotions failed to observe better-than-chance recognition of fear (Becker-Asano and Ishiguro, 2011). Finally, no study has systematically validated whether androids can show human-like dynamic changes in facial expressions. Only a few studies reported that incorporating the dynamic patterns of human facial expressions into an android’s facial expressions led to high naturalness ratings during laughter (Ishi et al., 2019) and vocalized surprise (Ishi et al., 2017).
To resolve the issues described above, we developed an android head, called Nikola, and validated its facial actions and emotional expressions. Nikola has 35 actuators, designed to implement AUs relevant to prototypical facial expressions based on psychological evidence (Ekman and Friesen, 1975, 1978; Friesen and Ekman, 1983; Ekman et al., 2002). The temporal patterns of the actions can be programmed at a resolution of milliseconds. We conducted a series of psychological studies to validate Nikola’s emotional facial expressions. In Study 1, we applied FACS coding to Nikola’s single AUs, which underlie appropriate emotional facial expressions. In Study 2, we evaluated emotional recognition accuracy based on the spatial patterns of Nikola’s emotional facial expressions through an emotion labeling task. In Study 3, we evaluated the temporal patterns of Nikola’s dynamic facial expressions through a naturalness rating task.
Study 1
Here, we applied FACS coding to Nikola’s single facial actions. We expected that the AUs specifically associated with the facial expressions of the six basic emotions would be produced.
Materials and Methods
Development of the Android
Nikola was developed for the purpose of studying emotional interaction with humans. Currently, only the head and neck are complete; the body parts are under construction. Its human-like appearance resembles that of a male human child; a childlike appearance was chosen to promote natural interactions with both adults and children. It is about 28.5 cm high and weighs about 4.6 kg. It has 35 actuators: 29 for facial muscle actions, 3 for head movement (roll, pitch, and yaw rotation), and 3 for eyeball control (pan movements of the individual eyeballs and tilt movements of both eyeballs). The facial and head movements are driven by pneumatic (air) actuators, which create safe, silent, and human-like motions (Ishiguro and Nishio, 2007; Minato et al., 2007). The pneumatic actuators are controlled by air pressure control valves. The entire surface, except for the back of the head, is covered in soft silicone skin. Video cameras are mounted inside the left and right eyeballs. Nikola is not a stand-alone system; the control valves, air compressor, and the computer that controls the actuators and processes sensor information are external.
The locations of the facial muscle actuators were selected to produce as many of the AUs associated with emotional facial expressions as possible (Ekman and Friesen, 1975, 1978; Friesen and Ekman, 1983; Ekman et al., 2002), drawing also on information from previously constructed androids (Minato et al., 2004, 2006, 2007; Matsui et al., 2005; Glas et al., 2016). Specifically, we designed Nikola to produce the following AUs corresponding to the prototypical expressions of the six basic emotions: 1 (inner brow raiser), 2 (outer brow raiser), 4 (brow lowerer), 5 (upper lid raiser), 6 (cheek raiser), 7 (lid tightener), 10 (upper lip raiser), 12 (lip corner puller), 15 (lip corner depressor), 20 (lip stretcher), 25 (lips part), and 26 (jaw drop). Although AUs 9 (nose wrinkler), 17 (chin raiser), and 23 (lip tightener) are reportedly relevant to prototypical facial expressions (Ekman and Friesen, 1975; Friesen and Ekman, 1983), these AUs were not implemented owing to the technical limitations of the silicone skin. AUs 14 (dimpler), 16 (lower lip depressor), 18 (lip pucker), 22 (lip funneler), and 43 (eyes closed) were also implemented to support other communication-related facial actions (e.g., speech and blinking).
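Nikola’s control software is not described in detail in this article. As a rough illustration of how a set of designed AUs might be mapped onto pneumatic actuator commands, the sketch below assumes a hypothetical linear coupling matrix; the channel count matches the 29 facial actuators reported above, but all matrix entries and channel indices are placeholders rather than Nikola’s actual wiring or gains.

```python
import numpy as np

N_FACIAL_ACTUATORS = 29  # facial actuators reported for Nikola
DESIGNED_AUS = [1, 2, 4, 5, 6, 7, 10, 12, 14, 15, 16, 18, 20, 22, 25, 26, 43]

# Hypothetical coupling matrix: each column couples one AU to the actuators
# that would realize it (placeholder entries for illustration only).
au_to_actuator = np.zeros((N_FACIAL_ACTUATORS, len(DESIGNED_AUS)))
au_to_actuator[0, DESIGNED_AUS.index(1)] = 1.0    # e.g., inner brow raiser on channel 0
au_to_actuator[1, DESIGNED_AUS.index(2)] = 1.0    # e.g., outer brow raiser on channel 1
au_to_actuator[5, DESIGNED_AUS.index(6)] = 1.0    # e.g., cheek raiser on channel 5
au_to_actuator[12, DESIGNED_AUS.index(12)] = 1.0  # e.g., lip corner puller on channel 12

def valve_commands(au_intensities):
    """Map a vector of AU intensities (0-1) to normalized valve pressure commands."""
    return np.clip(au_to_actuator @ au_intensities, 0.0, 1.0)

# Happiness prototype (AUs 6 + 12) at full intensity.
target = np.zeros(len(DESIGNED_AUS))
target[[DESIGNED_AUS.index(6), DESIGNED_AUS.index(12)]] = 1.0
print(valve_commands(target))
```

In practice the mapping from AUs to actuators need not be one-to-one, and each valve has its own admissible pressure range; the normalized 0-1 commands above are a simplification.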
Procedure
We programmed Nikola to exhibit AUs on an individual basis. A certified FACS coder scored the AUs from the neutral status to the action apex using FACS (Ekman et al., 2002). When the AU was detected, the coder evaluated it according to five discrete levels of intensity (A: trace, B: slight, C: marked/pronounced, D: severe, and E: extreme/maximum) according to FACS guidelines (Ekman et al., 2002). The coder could view the sequence repeatedly by adjusting the program settings. The Supplementary Material provides video clips of these AUs.
Results
The AUs produced by Nikola are illustrated in Figure 1, and the results of the FACS coding are presented in Table 2. Figure 1 demonstrates that Nikola is capable of performing each AU. It was difficult to distinguish between AUs 6 (cheek raiser) and 7 (lid tightener), but the eyes’ outer corners were slightly lowered in AU 6. The maximum intensity of the AUs ranged from A (e.g., AU 12) to E (e.g., AU 26).
Figure 1. Illustrations of the facial action units (AUs) produced by the android Nikola. For AU 25, AU 25 + 26 is shown.
Discussion
Our results demonstrated that Nikola was capable of producing each AU, based on manual FACS coding performed by a certified FACS coder. The results are consistent with several earlier studies’ findings that androids could exhibit AUs designed on the basis of FACS (e.g., Kobayashi and Hara, 1993), but none of those studies involved evaluation by certified FACS coders. The coder found it difficult to differentiate AUs 6 (cheek raiser) and 7 (lid tightener). This is in line with earlier findings that androids struggle to replicate z-vector movements, including wrinkles and tension, compared with human expressions (Ishihara et al., 2021), owing to the physical constraints of artificial skin materials. The results of our intensity evaluation revealed that some AUs’ maximum intensities were not realized; this resulted from technical limitations, such as the limited number of actuators and the properties of the silicone skin. Collectively, the data suggest that Nikola can produce AUs associated with prototypical facial expressions, albeit with limited intensity.
Study 2
Next, we devised prototypical facial expressions for Nikola reflecting the six basic emotions and asked naïve participants to label photographs of these expressions, as in earlier psychological studies using photographs of human facial expressions as stimuli (Sato et al., 2002, 2009; Kubota et al., 2003; Uono et al., 2011; Okada et al., 2015). Because earlier studies of human expression stimuli consistently demonstrated emotion recognition above chance level, as well as differences across emotions (such as lower recognition rates for angry, disgusted, and fearful expressions than for happy, sad, and surprised expressions), we expected similar patterns in the recognition of emotions from Nikola’s facial expressions.
Materials and Methods
Participants
Thirty Japanese adults participated in this study (18 females; mean ± SD age, 36.0 ± 7.2 years). The sample size was determined based on an a priori power analysis using G*Power software ver. 3.1.9.2 (Faul et al., 2007). Assuming an α level of 0.008 (i.e., 0.05 Bonferroni-corrected for six tests), a power of 0.80, and a large effect size (d = 0.8) based on an earlier study (Sato et al., 2002), the results indicated that 23 participants were required for a one-sample t-test. Participants were recruited through web advertisements distributed via CrowdWorks (Tokyo, Japan). After the procedures had been explained, all participants provided written informed consent to participate in the study, which was approved by the Ethics Committee of RIKEN. The experiment was performed in accordance with the Declaration of Helsinki.
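For readers without access to G*Power, the same inputs can be passed to an open-source power routine. The sketch below uses statsmodels’ TTestPower as a stand-in; because rounding conventions differ slightly between tools, the result should be close to, but may not exactly match, the 23 participants reported above.

```python
# Re-derive the Study 2 sample size from the reported power-analysis inputs.
import math
from statsmodels.stats.power import TTestPower

n_required = TTestPower().solve_power(
    effect_size=0.8,      # Cohen's d (large effect)
    alpha=0.008,          # 0.05 Bonferroni-corrected for six tests
    power=0.80,
    alternative="two-sided",
)
print(math.ceil(n_required))  # required N for a one-sample t-test
```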
Stimuli
Six photographs of facial expressions depicting the six basic emotions (anger, disgust, fear, happiness, sadness, and surprise) produced by Nikola were used as stimuli (Figure 2). The facial expressions were produced by activating the AUs according to the Emotional Facial Action Coding System (EMFACS; Friesen and Ekman, 1983). The activated AUs included 4, 5, 7, and 23 for anger; 15 for disgust; 1, 2, 4, 5, 7, 20, and 26 for fear; 6 and 12 for happiness; 1, 4, and 15 for sadness; and 1, 2, 5, and 26 for surprise. The facial expressions of the six basic emotions were photographed using a digital web camera (HD1080P; Logicool, Tokyo, Japan). The photographs were cropped to 630 horizontal × 720 vertical pixels.
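For anyone wishing to reproduce or adapt these expression prototypes, the AU sets listed above can be kept in a simple lookup table. The sketch below merely transcribes the mapping stated in this paragraph; AU intensities and actuator-level details are omitted.

```python
# EMFACS-based AU sets used to pose Nikola's six prototypical expressions,
# transcribed from the stimulus description above.
EMOTION_AUS = {
    "anger":     [4, 5, 7, 23],
    "disgust":   [15],
    "fear":      [1, 2, 4, 5, 7, 20, 26],
    "happiness": [6, 12],
    "sadness":   [1, 4, 15],
    "surprise":  [1, 2, 5, 26],
}

def describe(emotion):
    """Return a human-readable summary of the AU set for one emotion."""
    return f"{emotion}: AUs {', '.join(str(au) for au in EMOTION_AUS[emotion])}"

print(describe("surprise"))  # surprise: AUs 1, 2, 5, 26
```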
Figure 2. Illustrations of the facial expressions of six basic emotions produced by the android Nikola.
Procedure
The experiment was conducted via the Qualtrics online platform (Seattle, WA, United States). A label-matching paradigm was used, as in an earlier study (Sato et al., 2002). The photographs of Nikola’s facial expressions of the six basic emotions were presented on the monitor individually, and verbal labels for the six basic emotions were presented below each photograph. Participants were asked to select the label that best described the emotion shown in each photograph. No time limits were set, and no feedback on performance was provided. An image of each emotional expression was presented twice, pseudo-randomly, resulting in a total of 12 trials for each participant. Prior to the experiment, the participants performed two practice trials.
Data Analysis
The data were analyzed using JASP 0.14.1 software (JASP Team, 2020). Accuracy percentages for emotion recognition were tested for the difference from chance (i.e., 16.7%) using one-sample t-tests (two-tailed) with the Bonferroni correction; the alpha level was divided by the number of tests performed (i.e., 6). The emotion recognition accuracy data were also subjected to repeated-measures analysis of variance (ANOVA) with emotion as a factor to test for differences among emotions. The assumption of sphericity was confirmed using Mauchly’s sphericity test (p > 0.10). Multiple comparisons were performed using Ryan’s method. All results were considered statistically significant at p < 0.05.
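A minimal re-implementation of this analysis pipeline is sketched below, assuming the accuracy data are arranged as a participants × emotions matrix. The example uses SciPy and statsmodels in place of JASP, with randomly generated placeholder data, and applies the Bonferroni correction by multiplying uncorrected p values by six; it does not include Mauchly’s test or Ryan’s multiple-comparison procedure, which were run in JASP.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
emotions = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
n_participants = 30
chance = 100 / 6  # 16.7% with six response options

# Placeholder accuracy matrix (participants x emotions), in percent.
acc = rng.uniform(20, 100, size=(n_participants, len(emotions)))

# One-sample t-tests against chance, Bonferroni-corrected for six tests.
for i, emo in enumerate(emotions):
    t, p = stats.ttest_1samp(acc[:, i], popmean=chance)
    p_bonf = min(p * len(emotions), 1.0)
    d = (acc[:, i].mean() - chance) / acc[:, i].std(ddof=1)  # Cohen's d
    print(f"{emo}: t={t:.2f}, Bonferroni p={p_bonf:.3f}, d={d:.2f}")

# Repeated-measures ANOVA with emotion as a within-subjects factor.
long = pd.DataFrame({
    "subject": np.repeat(np.arange(n_participants), len(emotions)),
    "emotion": np.tile(emotions, n_participants),
    "accuracy": acc.ravel(),
})
print(AnovaRM(long, depvar="accuracy", subject="subject", within=["emotion"]).fit())
```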
Results
One-sample t-tests revealed that the accuracy percentage of emotional expression recognition for all emotion categories (Figure 3) was greater than chance, t(29) = 2.88, 4.74, 3.74, 14.58, 10.64, and 28.11; Bonferroni-corrected p = 0.042, 0.000, 0.007, 0.000, 0.000, and 0.000; Cohen’s d = 0.90, 1.24, 1.04, 3.27, 2.44, and 6.23 for anger, disgust, fear, happiness, sadness, and surprise, respectively.
Figure 3. Mean (±SE) accuracy percentages for the recognition of six emotions in facial expressions in Study 2.
The ANOVA with emotion as a factor for recognition accuracy revealed a significant main effect of emotion, F(5, 145) = 15.94, p = 0.000, ηp² = 0.36. Multiple comparisons indicated that surprised, sad, and happy expressions were recognized with greater accuracy than disgusted, fearful, and angry expressions, t(245) > 3.21, p < 0.005.
Discussion
Our findings indicated that the emotion recognition accuracy of Nikola’s facial expressions for all six basic emotions was above chance level. These results are consistent with earlier studies reporting that participants could recognize emotions from the facial expressions of androids, although those studies either did not determine whether recognition accuracy was better than chance (e.g., Kobayashi and Hara, 1993) or failed to find significantly higher recognition than chance for some emotions (Berns and Hirth, 2006; Becker-Asano and Ishiguro, 2011). Additionally, the results revealed differences in recognition accuracy across emotion categories, with better recognition for happy, sad, and surprised expressions than for angry, disgusted, and fearful expressions. These results are consistent with earlier studies on emotion recognition using human facial expression stimuli (e.g., Uono et al., 2011). Compared with earlier studies using human stimuli, however, the overall emotion recognition percentage for photographs of Nikola was lower [e.g., 98.2 vs. 90.0% recognition accuracy for happy expressions of humans (Uono et al., 2011) and Nikola, respectively]. We speculate that this discrepancy was mainly attributable to the low intensity of Nikola’s facial expressions. Overall, the results indicate that Nikola can accurately exhibit emotional facial expressions of six basic emotions using combinations of AUs (Friesen and Ekman, 1983), although expression intensity is weak relative to human expressions.
Study 3
In Study 3, we systematically changed the speed of Nikola’s dynamic facial expressions and asked naïve participants to evaluate the naturalness of the expressions’ speed, as in earlier psychological studies that used the dynamic stimuli of human facial expressions (Sato and Yoshikawa, 2004; Sato et al., 2013). Earlier studies that used human stimuli consistently reported that facial expressions that changed too slowly were generally rated as unnatural. Additionally, the effects of changing speeds differed across emotions, such that fast changes could be perceived as relatively natural for surprised expressions while slow changes were perceived as natural for sad expressions. We expected similar emotion-general and emotion-specific patterns for Nikola’s dynamic facial expressions.
Materials and Methods
Participants
Thirty adult Japanese participants took part in this study (19 females; mean ± SD age, 37.0 ± 7.4 years). As in Study 2, the sample size was determined based on an a priori power analysis using G*Power software ver. 3.1.9.2 (Faul et al., 2007). Assuming an α level of 0.05, a power of 0.80, and a medium effect size (f = 0.25), the results indicated that 24 participants were required for the planned trend analyses (four levels). Participants were recruited through web advertisements distributed via CrowdWorks (Tokyo, Japan). After the procedures had been explained, all participants provided written informed consent to participate in the study, which was approved by the Ethics Committee of RIKEN. The experiment was performed in accordance with the Declaration of Helsinki.
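Open-source tools do not expose G*Power’s within-factors repeated-measures routine directly, but the calculation can be approximated from the noncentral F distribution. The sketch below is a rough re-derivation under assumed G*Power defaults (correlation among repeated measures of 0.5, no nonsphericity correction); these defaults are assumptions, and the required N it returns should be close to, though not necessarily identical to, the 24 participants reported above.

```python
# Approximate power calculation for a within-subjects ANOVA across four speed
# levels, with effect size f = 0.25, alpha = .05, and target power = .80.
from scipy.stats import f as f_dist, ncf

def rm_power(n, m=4, f_eff=0.25, alpha=0.05, rho=0.5):
    """Power of the within-subjects omnibus F test for n participants."""
    lam = n * m * f_eff**2 / (1.0 - rho)    # noncentrality parameter
    df1, df2 = m - 1, (n - 1) * (m - 1)
    f_crit = f_dist.isf(alpha, df1, df2)    # critical F under the null
    return ncf.sf(f_crit, df1, df2, lam)    # P(F > f_crit) under the alternative

n = 3
while rm_power(n) < 0.80:
    n += 1
print(n, round(rm_power(n), 3))  # expected to land in the low twenties
```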
Stimuli
A total of 24 videotapes of dynamic facial expressions produced by Nikola, depicting six basic emotions (anger, disgust, fear, happiness, sadness, and surprise), from onset (neutral face) to action apex (full emotional expression) at four speeds (total durations of 250, 500, 1,000, and 2,000 ms) were used as stimuli (Figure 4). The four speed conditions used in previous studies (Sato and Yoshikawa, 2004; Sato et al., 2013) were also employed herein to allow comparison of the findings between humans and androids. The utility of these speeds was also supported by our preliminary encoding study (some data were reported in Sato et al., 2019b), in which we videotaped emotional facial expressions produced in response to various scenarios and found that most expressions were produced within 250–2,000 ms. Similar data (production durations of 220–1,540 ms) were reported by a different group (Fiorentini et al., 2012). A decoding study reported that the presentation of dynamic facial expressions for 180, 780, and 3,030 ms produced divergent free-response recognition of facial expressions (Kamachi et al., 2001). As in Study 2, the AUs of emotional facial expressions were determined according to EMFACS (Friesen and Ekman, 1983). All AUs were controlled simultaneously. The facial expressions were video-recorded using a digital web camera (HD1080P; Logitech, Tokyo, Japan). The Supplementary Material provides video clips of these dynamic facial expression stimuli.
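The exact actuator trajectories used to drive Nikola from neutral to apex are not specified in this article. As a simple illustration of how the four speed conditions could be generated at millisecond resolution, the sketch below assumes a linear ramp between a neutral pose and an apex pose; whether Nikola’s controller uses a linear ramp is an assumption, and the actuator indices are placeholders.

```python
import numpy as np

def ramp_to_apex(neutral, apex, duration_ms, step_ms=1):
    """Linearly interpolate actuator commands from a neutral pose to the
    expression apex over duration_ms, one command vector per time step."""
    n_steps = int(duration_ms / step_ms)
    t = np.linspace(0.0, 1.0, n_steps + 1)[1:]        # exclude the start pose
    return neutral + t[:, None] * (apex - neutral)     # shape: (n_steps, n_actuators)

# Hypothetical apex pose for surprise (AUs 1, 2, 5, 26) over 29 facial actuators.
neutral_pose = np.zeros(29)
surprise_apex = np.zeros(29)
surprise_apex[[0, 1, 4, 25]] = 1.0  # placeholder channel indices, not Nikola's real map

for duration in (250, 500, 1000, 2000):                # the four speed conditions (ms)
    trajectory = ramp_to_apex(neutral_pose, surprise_apex, duration)
    print(duration, trajectory.shape)
```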
Figure 4. Illustration of the dynamic facial expression stimuli used in Study 3. (Left) Nikola’s face changed from a neutral expression to one of six emotional expressions. (Right) Schematic illustration of the four speed conditions.
Procedure
As in Study 2, the experiment was conducted via the online Qualtrics platform (Seattle, WA, United States). The naturalness of dynamic changes in emotional facial expressions was rated, as in an earlier study (Sato and Yoshikawa, 2004). In each trial, four video clips of Nikola’s facial expressions of one of six basic emotions, at different speeds, were presented on the monitor one by one. The speed conditions were presented in randomized order, and the interval between each clip was 1,500 ms. Participants were provided with the target emotion label and instructed to evaluate each clip in terms of the naturalness of the speed with which the particular emotion changed, using a 7-point scale ranging from 1 (not at all natural) to 7 (very natural). No time limits were set, and participants were allowed to view the sequence repeatedly (by clicking a button) until they were satisfied with their ratings. Each emotion condition was presented twice in pseudo-randomized order, resulting in a total of 12 trials for each participant. Prior to the experiment, participants performed two practice trials.
Data Analysis
As in Study 2, the data were analyzed using JASP 0.14.1 software (JASP Team, 2020). The naturalness ratings were analyzed by repeated-measures ANOVA, with emotion (anger, disgust, fear, happiness, sadness, and surprise) and speed (total duration of 250, 500, 1,000, and 2,000 ms) as within-subjects factors. Because the assumption of sphericity was not met (Mauchly’s sphericity test, p < 0.05), the Huynh–Feldt correction was applied. Follow-up trend analyses were conducted on the effect of speed, to derive profiles of the changes in ratings across speed conditions. All results were considered statistically significant at p < 0.05.
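The trend analyses can be illustrated with orthogonal polynomial contrasts over the four speed levels. The sketch below computes per-participant linear and quadratic contrast scores on placeholder ratings and tests them against zero with one-sample t-tests; this is a simplified stand-in for the pooled-error trend tests reported below and does not include the Huynh–Feldt correction applied in JASP.

```python
import numpy as np
from scipy import stats

# Placeholder naturalness ratings: participants x 4 speed levels (250, 500, 1000, 2000 ms).
rng = np.random.default_rng(1)
ratings = rng.uniform(1, 7, size=(30, 4))

# Orthogonal polynomial contrast weights for four equally spaced levels.
contrasts = {
    "linear": np.array([-3.0, -1.0, 1.0, 3.0]),
    "quadratic": np.array([1.0, -1.0, -1.0, 1.0]),
}

for name, w in contrasts.items():
    scores = ratings @ w                    # one contrast score per participant
    t, p = stats.ttest_1samp(scores, 0.0)   # test the trend against zero
    print(f"{name} trend: t({len(scores) - 1})={t:.2f}, p={p:.3f}")
```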
Results
The ANOVA for the naturalness ratings (Figure 5), with emotion and speed as within-subjects factors, revealed a significant main effect of speed, F(1.52, 44.14) = 12.62, p = 0.000, ηp² = 0.30. The interaction between emotion and speed was also significant, F(7.42, 215.30) = 9.45, p = 0.000, ηp² = 0.25. The main effect of emotion was not significant, F(3.05, 88.40) = 0.84, p = 0.476, ηp² = 0.03. Follow-up trend analyses of the main effect of speed indicated significant negative linear (i.e., faster changes were more natural) and quadratic (i.e., intermediate changes were the most natural) trends as a function of speed, t(87) = 3.98 and 4.68, respectively, ps = 0.000.
Figure 5. Mean (±SE) naturalness ratings for facial expressions of six emotions under the four speed conditions in Study 3.
For the significant interaction, simple trend analyses of the speed effect were conducted for each emotion. For anger, disgust, and fear, the linear and quadratic negative trends as a function of speed were significant, t(87) = 4.21, 5.09, 5.29, 2.01, 2.78, and 3.52; p = 0.000, 0.000, 0.000, 0.048, 0.006, and 0.000, for anger-linear, anger-quadratic, disgust-linear, disgust-quadratic, fear-linear, and fear-quadratic, respectively. Only the negative quadratic trend was significant for happiness, t(87) = 4.94, p = 0.000. For sadness, the positive linear (i.e., slower changes were more natural) and negative quadratic trends were significant, t(87) = 3.72 and 2.94, p = 0.000 and 0.004, respectively. For surprise, only the negative linear trend reached significance, t(87) = 6.67, p = 0.000.
Discussion
The results indicated that the naturalness ratings for dynamic changes in Nikola’s emotional facial expressions generally decreased with reduced speed of change. The results also revealed differences across emotions; for example, the ratings linearly decreased and increased depending on speed for surprised and sad expressions, respectively. These results are consistent with those of earlier studies that used dynamic human facial expressions (Sato and Yoshikawa, 2004; Sato et al., 2013). The results are also in line with studies showing that an android exhibiting dynamic facial expressions with the same temporal patterns as human facial expressions was rated as more natural than an android that did not exhibit such expressions (Ishi et al., 2017, 2019). Our results demonstrate that the temporal aspects of Nikola’s facial expressions can transmit emotional messages, similar to those of humans.
General Discussion
In summary, the results of Study 1 confirmed that Nikola can produce AUs associated with prototypical facial expressions. Study 2 verified that Nikola can exhibit facial expressions of six basic emotions that can be accurately recognized by naïve participants. The results of Study 3 revealed that Nikola can exhibit dynamic facial expressions with temporal patterns that transmit emotional messages, as in human facial expressions. Collectively, these results support the validity of the spatial and temporal characteristics of the emotional facial expressions of our new android.
These results have practical implications. First, in terms of basic research, androids like Nikola represent important tools for psychological experiments examining face-to-face emotional interactions with high ecological validity and control. Several methods have been employed to conduct such experiments, each of which has specific advantages and disadvantages. Most studies in the literature have presented pre-recorded photographs or videos of others’ emotional expressions (e.g., Dimberg, 1982). Although this method provides a high level of control, its ecological validity is not particularly high (for a review, see Shamay-Tsoory and Mendelsohn, 2019); a recent study indicated that subjective and physiological responses to pre-recorded videos of facial expressions differed from those to live facial expressions (Hsu et al., 2020). Live emotional interactions between two participants are ecologically valid (e.g., Bruder et al., 2012; Riehle et al., 2017; Golland et al., 2019); however, such interactions are difficult to control, and the correlational nature of this approach makes it difficult to establish causality in terms of psychological mechanisms. Confederates are commonly used in social psychology (e.g., Vaughan and Lanzetta, 1980); although this approach has high ecological validity, serious disadvantages include difficulty in controlling confederates’ non-verbal behaviors (for reviews, see Bavelas and Healing, 2013; Kuhlen and Brennan, 2013). Interactions with virtual agents may promote both ecological validity and control (Parsons, 2015; Pan and Hamilton, 2018); however, virtual agents are obviously not physically present, which may limit ecological validity to some degree. Several studies have reported that physically present robots elicited greater emotional responses than virtual agents (e.g., Bartneck, 2003; Fasola and Mataric, 2013; Li et al., 2019; for a review, see Li, 2015). Taken together, our data suggest that androids like Nikola, which are human-like in appearance and facial expressions, and can physically coexist with humans, are valuable research tools for ecologically valid and controlled research on facial emotional interaction. Moreover, like several other advanced androids (e.g., Glas et al., 2016; Ishi et al., 2017, 2019), Nikola has the ability to talk with prosody, which can facilitate multimodal emotional interactions (Paulmann and Pell, 2011). Androids can also utilize advanced artificial intelligence (for reviews, see Krumhuber et al., 2021; Namba et al., 2021) to sense and analyze human facial expressions. We expect that androids will be a valuable tool in future psychological research on human emotional interaction.
Second, regarding future applications to real-life situations, our results suggest that androids like Nikola have the potential to transmit emotional messages to humans, and in turn promote human wellbeing. Android interactions may be useful in a wide range of situations, including elder care, behavioral interventions, counseling, nursing, education, information desks, customer service, and entertainment. For example, an earlier study has reported that a humanoid robot, which was controlled by manipulators and exhibited facial expressions of various emotions, was effective in comforting lonely older people (Hoorn et al., 2016). The researchers found that the robot satisfied users’ needs for emotional bonding as a social entity, while retaining a sense of privacy as a machine (Hoorn et al., 2016). With regard to behavioral interventions, several studies showed that children with autism spectrum disorder preferred robots and androids to human therapists (e.g., Adams and Robinson, 2011; for a review, see Scassellati, 2007). We expect that increasing their ability for emotional interactions would enhance androids’ value in future real-life applications.
Our results also have theoretical implications. Our findings could be regarded as constructive support for psychological theories that certain configurations of AUs can indicate emotional facial expressions (Ekman and Friesen, 1975) and that temporal patterns of facial expressions might transmit emotional information (Sato and Yoshikawa, 2004). Other ideas regarding human emotional interactions may also be verifiable through android experiments. The construction of effective android software and hardware requires that the mechanisms of psychological theories be elucidated. We expect that this constructivist approach to developing and testing androids (Ishiguro and Nishio, 2007; Minato et al., 2007) will be a useful methodology for understanding the psychological mechanisms underlying human emotional interaction.
Some limitations of this study should be acknowledged. First, as described above, the number and intensity of Nikola’s AUs are not comparable with those of humans owing to technical limitations related to the number of actuators and skin materials. Specifically, because silicone skin does not possess elastic qualities comparable with those of human skin (Cabibihan et al., 2009), creating natural wrinkles in Nikola’s face is difficult. Previous psychological studies have shown that nose wrinkling (i.e., AU 9) was associated with the recognition of disgust (Galati et al., 1997), while eye corner wrinkles (i.e., AU 6) improved the recognition of happy and sad expressions (Malek et al., 2019), suggesting the importance of wrinkles in emotional expressions. Future technical improvements will be required to realize richer and stronger emotional facial expressions.
Second, we used only controlled and explicit measures of the recognition of emotional facial expressions, including emotion labeling and naturalness ratings of speed changes; we did not measure automatic and/or reactive responses to facial expressions. Several previous studies have shown that emotional facial expressions induced stronger subjective (e.g., emotional arousal; Sato and Yoshikawa, 2007a) and physiological (e.g., activation of the sympathetic nervous system; Merckelbach et al., 1989) emotional reactions compared with non-facial stimuli. Other studies reported that observing emotional facial expressions automatically induced facial mimicry (e.g., Dimberg, 1982). Because Nikola’s eyeballs contain video cameras, it may be possible to videorecord participants’ faces to reveal externally observable facial mimicry, which cannot be accomplished in human confederates without specialized devices (Sato and Yoshikawa, 2007b). Investigation of these automatic and reactive measures represents a key avenue for future research.
Third, we only tested the temporal patterns of Nikola’s facial expressions in Study 3, by manipulating speed at four levels; thus, the optimal temporal characteristics of Nikola’s dynamic facial expressions remain to be identified. A previous psychophysical study has investigated this issue using generative approaches (Jack et al., 2014). The researchers presented participants with a large number of dynamic facial expressions of virtual agents with randomly selected AU sets and temporal parameters (e.g., acceleration) and asked them to identify the emotions being displayed. Mathematical modeling revealed the optimal spatial and temporal characteristics of facial expressions of various emotions. Research using similar data-driven approaches could reveal more fine-grained temporal, as well as spatial, characteristics of the dynamic facial expressions of Nikola.
Finally, although we constructed Nikola’s facial expressions according to basic emotion theory (Ekman and Friesen, 1975), the relationships between facial expressions and psychological states can be investigated from various perspectives. For example, Russell (1995, 1997) has proposed that facial expressions are associated not with basic emotions, but rather with core affective dimensions of valence and arousal. Fridlund and his colleagues proposed that facial expressions indicate not emotional states, but rather social messages (Fridlund, 1991; Crivelli and Fridlund, 2018). Investigation of these perspectives on facial expressions using androids is a key topic for future research.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.
Ethics Statement
The studies involving human participants were reviewed and approved by the Ethics Committee of RIKEN. The participants provided their written informed consent to participate in this study.
Author Contributions
WS and TM designed the research. WS, SNa, DY, SNi, and TM obtained the data. WS and SNa analyzed the data. WS, SNa, DY, SNi, CI, and TM wrote the manuscript. All authors contributed to the article and approved the submitted version.
Conflict of Interest
SNi was employed by the company Nippon Telegraph and Telephone Corporation.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments
The authors thank Kazusa Minemoto and Saori Namba for their technical support.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2021.800657/full#supplementary-material
Supplementary Figure 1 | Video clips of facial action units (AUs).
Supplementary Figure 2 | Video clips of the dynamic facial expression stimuli used in Study 3.
Supplementary Data 1 | Datasheet for Studies 2 and 3.
References
Adams, A., and Robinson, P. (2011). “An android head for social-emotional intervention for children with autism spectrum conditions,” in Proceedings of the 4th International Conference on Affective Computing and Intelligent Interaction, ACII 2011, Memphis, TN, doi: 10.1007/978-3-642-24571-8_19
Ahn, H. S., Lee, D. W., Choi, D., Lee, D. Y., Hur, M., and Lee, H. (2012). “Appropriate emotions for facial expressions of 33-DOFs android head EveR-4 H33,” in Proceedings of the 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication, Paris, doi: 10.1109/ROMAN.2012.6343898
Allison, B., Nejat, G., and Kao, E. (2009). The design of an expressive humanlike socially assistive robot. J. Mech. Robot. 1:011001. doi: 10.1115/1.2959097
Asheber, W. T., Lin, C.-Y., and Yen, S. H. (2016). Humanoid head face mechanism with expandable facial expressions. Intern. J. Adv. Robot. Syst. 13:29. doi: 10.5772/62181
Bartneck, C. (2003). “Interacting with an embodied emotional character,” in Proceedings of the 2003 International Conference on Designing Pleasurable Products and Interfaces (DPPI2003), Pittsburgh, PA, doi: 10.1145/782896.782911
Bavelas, J., and Healing, S. (2013). Reconciling the effects of mutual visibility on gesturing: a review. Gesture 13, 63–92. doi: 10.1075/gest.13.1.03bav
Becker-Asano, C., and Ishiguro, H. (2011). Intercultural differences in decoding facial expressions of the android robot Geminoid F. J. Artific. Intellig. Soft Comput. Res. 1, 215–231.
Berns, K., and Hirth, J. (2006). “Control of facial expressions of the humanoid robot head ROMAN,” in Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, doi: 10.1109/IROS.2006.282331
Blow, M., Dautenhahn, K., Appleby, A., Nehaniv, C. L., and Lee, D. (2006). “The art of designing robot faces: dimensions for human-robot interaction,” in Proceedings of the HRI ’06: 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, Salt Lake City, UT, doi: 10.1145/1121241.1121301
Bruder, M., Dosmukhambetova, D., Nerb, J., and Manstead, A. S. (2012). Emotional signals in nonverbal interaction: dyadic facilitation and convergence in expressions, appraisals, and feelings. Cogn. Emot. 26, 480–502. doi: 10.1080/02699931.2011.645280
Cabibihan, J. J., Pattofatto, S., Jomâa, M., Benallal, A., and Carrozza, M. C. (2009). Towards humanlike social touch for sociable robotics and prosthetics: comparisons on the compliance, conformance and hysteresis of synthetic and human fingertip skins. Intern. J. Soc. Robot. 1, 29–40. doi: 10.1007/s12369-008-0008-9
Cheng, L. C., Lin, C. Y., and Huang, C. C. (2013). Visualization of facial expression deformation applied to the mechanism improvement of face robot. Intern. J. Soc. Robot. 5, 423–439. doi: 10.1007/s12369-012-0168-5
Crivelli, C., and Fridlund, A. J. (2018). Facial displays are tools for social influence. Trends Cogn. Sci. 22, 388–399. doi: 10.1016/j.tics.2018.02.006
Dimberg, U. (1982). Facial reactions to facial expressions. Psychophysiology 19, 643–647. doi: 10.1111/j.1469-8986.1982.tb02516.x
Dobs, K., Bülthoff, I., and Schultz, J. (2018). Use and usefulness of dynamic face stimuli for face perception studies-a review of behavioral findings and methodology. Front. Psychol. 9:1355. doi: 10.3389/fpsyg.2018.01355
Ekman, P. (1982). “Methods for measuring facial action,” in Handbook of Methods in Nonverbal Behavior Research, eds K. R. Scherer and P. Ekman (Cambridge: Cambridge University Press), 45–90.
Ekman, P. (1993). Facial expression and emotion. Am. Psychol. 48, 384–392. doi: 10.1037//0003-066x.48.4.384
Ekman, P., and Friesen, W. V. (1971). Constants across cultures in the face and emotion. J. Pers. Soc. Psychol. 17, 124–129. doi: 10.1037/h0030377
Ekman, P., and Friesen, W. V. (1975). Unmasking the Face: A Guide to Recognizing Emotions from Facial Clues. Englewood Cliffs, NJ: Prentice-Hall.
Ekman, P., and Friesen, W. V. (1978). Facial Action Coding System: Consulting Psychologist. Palo Alto, CA: Consulting Psychologists Press.
Ekman, P., Friesen, W. V., and Hager, J. C. (2002). Facial Action Coding System. Salt Lake City, UT: Research Nexus, Network Research Information.
Faraj, Z., Selamet, M., Morales, C., Torres, P., Hossain, M., Chen, B., et al. (2021). Facially expressive humanoid robotic face. HardwareX 9:e00117. doi: 10.1016/j.ohx.2020.e00117
Fasola, S., and Mataric, M. J. (2013). A socially assistive robot exercise coach for the elderly. J. Hum. Robot Interact. 2, 3–32. doi: 10.5898/JHRI.2.2.Fasola
Faul, F., Erdfelder, E., Lang, A. G., and Buchner, A. (2007). G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191. doi: 10.3758/bf03193146
Fiorentini, C., Schmidt, S., and Viviani, P. (2012). The identification of unfolding facial expressions. Perception 41, 532–555. doi: 10.1068/p7052
Fridlund, A. J. (1991). Evolution and facial action in reflex, social motive, and paralanguage. Biol. Psychol. 32, 3–100. doi: 10.1016/0301-0511(91)90003-y
Friesen, W., and Ekman, P. (1983). EMFACS-7: Emotional Facial Action Coding System. California: University of California.
Galati, D., Scherer, K. R., and Ricci-Bitti, P. E. (1997). Voluntary facial expression of emotion: comparing congenitally blind with normally sighted encoders. J. Pers. Soc. Psychol. 73, 1363–1379. doi: 10.1037/0022-3514.73.6.1363
Glas, D. F., Minato, C., Ishi, T., Kawahara, T., and Ishiguro, H. (2016). “ERICA: the ERATO intelligent conversational android,” in Proceedings of the 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), New York, NY, doi: 10.1109/ROMAN.2016.7745086
Golland, Y., Mevorach, D., and Levit-Binnun, N. (2019). Affiliative zygomatic synchrony in co-present strangers. Sci. Rep. 9:3120. doi: 10.1038/s41598-019-40060-4
Habib, A., Das, S. K., Bogdan, I., Hanson, D., and Popa, D. O. (2014). “Learning human-like facial expressions for android Phillip K. Dick,” in Proceedings of the 2014 IEEE International Conference on Automation Science and Engineering (CASE), New Taipei, doi: 10.1109/CoASE.2014.6899473
Hashimoto, T., Hiramatsu, S., and Kobayashi, H. (2008). “Dynamic display of facial expressions on the face robot made by using a life mask,” in Proceedings of the Humanoids 2008 - 8th IEEE-RAS International Conference on Humanoid Robots, Daejeon, doi: 10.1109/ICHR.2008.4756017
Hashimoto, T., Hitramatsu, S., Tsuji, T., and Kobayashi, H. (2006). “Development of the face robot SAYA for rich facial expressions,” in Proceedings of the 2006 SICE-ICASE, International Joint Conference, Busan, doi: 10.1109/SICE.2006.315537
Hoorn, J. F., Konijn, E. A., Germans, D. M., Burger, S., and Munneke, A. (2016). “The in-between machine: the unique value proposition of a robot or why we are modelling the wrong things,” in Proceedings of the 7th International Conference on Agents and Artificial Intelligence (ICAART), Lisbon, doi: 10.5220/0005251304640469
Hsu, C. T., Sato, W., and Yoshikawa, S. (2020). Enhanced emotional and motor responses to live versus videotaped dynamic facial expressions. Sci. Rep. 10:16825. doi: 10.1038/s41598-020-73826-2
Ishi, C. T., Minato, T., and Ishiguro, H. (2017). Motion analysis in vocalized surprise expressions and motion generation in android robots. IEEE Robot. Autom. Lett. 2, 1748–1754. doi: 10.1109/LRA.2017.2700941
Ishi, C. T., Minato, T., and Ishiguro, H. (2019). Analysis and generation of laughter motions, and evaluation in an android robot. APSIPA Trans. Signal Inform. Process. 8:e6. doi: 10.1017/ATSIP.2018.32
Ishiguro, H., and Nishio, S. (2007). Building artificial humans to understand humans. J. Artific. Organs 10, 133–142. doi: 10.1007/s10047-007-0381-4
Ishihara, H., Iwanaga, S., and Asada, M. (2021). Comparison between the facial flow lines of androids and humans. Front. Robot. AI 8:540193. doi: 10.3389/frobt.2021.540193
Ishihara, H., Yoshikawa, Y., and Asada, M. (2005). “Realistic child robot “Affetto” for understanding the caregiver-child attachment relationship that guides the child development,” in Proceedings of the 2011 IEEE International Conference on Development and Learning (ICDL), Frankfurt am Main, doi: 10.1109/DEVLRN.2011.6037346
Jack, R. E., Garrod, O. G., and Schyns, P. G. (2014). Dynamic facial expressions of emotion transmit an evolving hierarchy of signals over time. Curr. Biol. 24, 187–192. doi: 10.1016/j.cub.2013.11.064
Kamachi, M., Bruce, V., Mukaida, S., Gyoba, J., Yoshikawa, S., and Akamatsu, S. (2001). Dynamic properties influence the perception of facial expressions. Perception 30, 875–887. doi: 10.1068/p3131
Kaneko, K., Kanehiro, F., Morisawa, M., Miura, K., Nakaoka, S., Harada, K., et al. (2010). Development of cybernetic human “HRP-4C”-Project overview and design of mechanical and electrical systems. J. Robot. Soc. Jpn. 28, 853–864. doi: 10.7210/jrsj.28.853
Keltner, D., and Kring, A. M. (1998). Emotion, social function, and psychopathology. Rev. Gen. Psychol. 2, 320–342. doi: 10.1037//0021-843x.104.4.644
Kobayashi, H., and Hara, F. (1993). “Study on face robot for active human interface-mechanisms of face robot and expression of 6 basic facial expressions,” in Proceedings of the 1993 2nd IEEE International Workshop on Robot and Human Communication, Tokyo, doi: 10.1109/ROMAN.1993.367708
Kobayashi, H., Tsuji, T., and Kikuchi, K. (2000). “Study of a face robot platform as a kansei medium,” in Proceedings of the 2000 26th Annual Conference of the IEEE Industrial Electronics Society, Nagoya, doi: 10.1109/IECON.2000.973197
Krumhuber, E. G., Küster, D., Namba, S., and Skora, L. (2021). Human and machine validation of 14 databases of dynamic facial expressions. Behav. Res. Methods 53, 686–701. doi: 10.3758/s13428-020-01443-y
Krumhuber, E. G., Skora, L., Küster, D., and Fou, L. (2016). A review of dynamic datasets for facial expression research. Emot. Rev. 9, 280–292. doi: 10.1177/1754073916670022
Krumhuber, E. G., Tamarit, L., Roesch, E. B., and Scherer, K. R. (2012). FACSGen 2.0 animation software: generating three-dimensional FACS-valid facial expressions for emotion research. Emotion 12, 351–363. doi: 10.1037/a0026632
Kubota, Y., Quérel, C., Pelion, F., Laborit, J., Laborit, M. F., Gorog, F., et al. (2003). Facial affect recognition in pre-lingually deaf people with schizophrenia. Schizophr. Res. 61, 265–270. doi: 10.1016/s0920-9964(02)00298-0
Kuhlen, A. K., and Brennan, S. E. (2013). Language in dialogue: when confederates might be hazardous to your data. Psychon. Bull. Rev. 20, 54–72. doi: 10.3758/s13423-012-0341-8
Lee, D. W., Lee, T. G., So, B., Choi, M., Shin, E. C., Yang, K. W., et al. (2008). “Development of an android for emotional expression and human interaction,” in Proceedings of the 17th World Congress The International Federation of Automatic Control, Seoul, doi: 10.3182/20080706-5-KR-1001.2566
Li, J. (2015). The benefit of being physically present: a survey of experimental works comparing copresent robots, telepresent robots and virtual agents. Intern. J. Hum. Comput. Stud. 77, 23–37. doi: 10.1016/j.ijhcs.2015.01.001
Li, R., van Almkerk, M., van Waveren, S., Carter, E., and Leite, I. (2019). “Comparing human-robot proxemics between virtual reality and the real world,” in Proceedings of the 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Daegu, doi: 10.1109/HRI.2019.8673116
Lin, C., Huang, C., and Cheng, L. (2016). An expressional simplified mechanism in anthropomorphic face robot design. Robotica 34, 652–670. doi: 10.1017/S0263574714001787
Lin, C., Tseng, C., Teng, W., Lee, W., Kuo, C., Gu, H., et al. (2009). “The realization of robot theater: humanoid robots and theatric performance,” in Proceedings of the 2009 International Conference on Advanced Robotics, Munich.
Malek, N., Messinger, D., Gao, A. Y. L., Krumhuber, E., Mattson, W., Joober, R., et al. (2019). Generalizing Duchenne to sad expressions with binocular rivalry and perception ratings. Emotion 19, 234–241. doi: 10.1037/emo0000410
Marcos, S., Pinillos, R., García-Bermejo, J. G., and Zalama, E. (2016). Design of a realistic robotic head based on action coding system. Adv. Intellig. Syst. Comput. 418, 423–434. doi: 10.1007/978-3-319-27149-1_33
Matsui, D., Minato, T., MacDorman, K. F., and Ishiguro, H. (2005). “Generating natural motion in an android by mapping human motion,” in Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, doi: 10.1109/IROS.2005.1545125
Mazzei, D., Lazzeri, N., Hanson, D., and De Rossi, D. (2012). “HEFES: an hybrid engine for facial expressions synthesis to control human-like androids and avatars,” in Proceedings of the 2012 4th IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob), Rome, doi: 10.1109/BioRob.2012.6290687
Mehrabian, A. (1971). “Nonverbal communication,” in Proceedings of the Nebraska Symposium on Motivation, 1971, ed. J. K. Cole (Lincoln, NE: University of Nebraska Press), 107–161.
Merckelbach, H., van Hout, W., van den Hout, M. A., and Mersch, P. P. (1989). Psychophysiological and subjective reactions of social phobics and normals to facial stimuli. Behav. Res. Therapy 27, 289–294. doi: 10.1016/0005-7967(89)90048-x
Minato, T., Shimada, M., Ishiguro, H., and Itakura, S. (2004). Development of an android robot for studying human-robot interaction. Innov. Appl. Artific. Intellig. 3029, 424–434. doi: 10.1007/978-3-540-24677-0_44
Minato, T., Shimada, M., Itakura, S., Lee, K., and Ishiguro, H. (2006). Evaluating the human likeness of an android by comparing gaze behaviors elicited by the android and a person. Adv. Robot. 20, 1147–1163. doi: 10.1163/156855306778522505
Minato, T., Yoshikawa, T., Noda, T., Ikemoto, S., and Ishiguro, H. (2007). “CB2: a child robot with biomimetic body for cognitive developmental robotics,” in Proceedings of the 2007 7th IEEE-RAS International Conference on Humanoid Robots, Pittsburgh, PA, doi: 10.1109/ICHR.2007.4813926
Nakata, Y., Yagi, S., Yu, S., Wang, Y., Ise, N., Nakamura, Y., et al. (2021). Development of ‘ibuki’ an electrically actuated childlike android with mobility and its potential in the future society. Robotica 2021, 1–18. doi: 10.1017/S0263574721000898
Namba, S., Sato, W., Osumi, M., and Shimokawa, K. (2021). Assessing automated facial action unit detection systems for analyzing cross-domain facial expression databases. Sensors 21:4222. doi: 10.3390/s21124222
Ochs, M., Niewiadomski, R., and Pelachaud, C. (2015). “Facial expressions of emotions for virtual characters,” in The Oxford Handbook of Affective Computing, eds R. A. Calvo, S. K. D’Mello, J. Gratch, and A. Kappas (New York, NY: Oxford University Press), 261–272.
Oh, J. H., Hanson, D., Kim, W. S., Han, I. Y., Kim, J. Y., and Park, I. W. (2006). “Design of android type humanoid robot: Albert HUBO,” in Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, doi: 10.1109/IROS.2006.281935
Okada, T., Kubota, Y., Sato, W., Murai, T., Pellion, F., and Gorog, F. (2015). Common impairments of emotional facial expression recognition in schizophrenia across French and Japanese cultures. Front. Psychol. 6:1018. doi: 10.3389/fpsyg.2015.01018
Pan, X., and Hamilton, A. F. D. C. (2018). Why and how to use virtual reality to study human social interaction: the challenges of exploring a new research landscape. Br. J. Psychol. 109, 395–417. doi: 10.1111/bjop.12290
Parsons, T. D. (2015). Virtual reality for enhanced ecological validity and experimental control in the clinical, affective and social neurosciences. Front. Hum. Neurosci. 9:660. doi: 10.3389/fnhum.2015.00660
Paulmann, S., and Pell, M. D. (2011). Is there an advantage for recognizing multi-modal emotional stimuli. Motiv. Emot. 35, 192–201. doi: 10.1007/s11031-011-9206-0
Riehle, M., Kempkensteffen, J., and Lincoln, T. M. (2017). Quantifying facial expression synchrony in face-to-face dyadic interactions: temporal dynamics of simultaneously recorded facial EMG signals. J. Nonverb. Behav. 41, 85–102. doi: 10.1007/s10919-016-0246-8
Roesch, E., Tamarit, L., Reveret, L., Grandjean, D. M., Sander, D., and Scherer, K. R. (2011). FACSGen: a tool to synthesize emotional facial expressions through systematic manipulation of facial action units. J. Nonverb. Behav. 35, 1–16. doi: 10.1007/s10919-010-0095-9
Russell, J. A. (1995). Facial expressions of emotion: what lies beyond minimal universality? Psychol. Bull. 118, 379–391. doi: 10.1037/0033-2909.118.3.379
Russell, J. A. (1997). Core affect and the psychological construction of emotion. Psychol. Rev. 110, 145–172. doi: 10.1037/0033-295x.110.1.145
Sakamoto, D., Kanda, T., Ono, T., Ishiguro, H., and Hagita, N. (2007). “Android as a telecommunication medium with a human-like presence,” in Proceedings of the 2007 2nd ACM/IEEE International Conference on Human-Robot Interaction (HRI), Arlington, VA, doi: 10.1145/1228716.1228743
Sato, W., Hyniewska, S., Minemoto, K., and Yoshikawa, S. (2019a). Facial expressions of basic emotions in Japanese laypeople. Front. Psychol. 10:259. doi: 10.3389/fpsyg.2019.00259
Sato, W., Krumhuber, E. G., Jellema, T., and Williams, J. (2019b). Editorial: dynamic emotional communication. Front. Psychol. 10:2836. doi: 10.3389/fpsyg.2019.02836
Sato, W., Kubota, Y., Okada, T., Murai, T., Yoshikawa, S., and Sengoku, A. (2002). Seeing happy emotion in fearful and angry faces: qualitative analysis of the facial expression recognition in a bilateral amygdala damaged patient. Cortex 38, 727–742. doi: 10.1016/s0010-9452(08)70040-6
Sato, W., Uono, S., Matsuura, N., and Toichi, M. (2009). Misrecognition of facial expressions in delinquents. Child Adolesc. Psychiatry Ment. Health 3:27. doi: 10.1186/1753-2000-3-27
Sato, W., Uono, S., and Toichi, M. (2013). Atypical recognition of dynamic changes in facial expressions in autism spectrum disorders. Res. Autism Spectr. Disord. 7, 906–912. doi: 10.1016/j.bpsc.2020.09.006
Sato, W., and Yoshikawa, S. (2004). The dynamic aspects of emotional facial expressions. Cogn. Emot. 18, 701–710. doi: 10.1080/02699930341000176
Sato, W., and Yoshikawa, S. (2007a). Enhanced experience of emotional arousal in response to dynamic facial expressions. J. Nonverb. Behav. 31, 119–135. doi: 10.1007/s10919-007-0025-7
Sato, W., and Yoshikawa, S. (2007b). Spontaneous facial mimicry in response to dynamic facial expressions. Cognition 104, 1–18. doi: 10.1016/j.cognition.2006.05.001
Scassellati, B. (2007). “How social robots will help us to diagnose, treat, and understand autism,” in Robotics Research. Springer Tracts in Advanced Robotics, eds S. Thrun, H. Durrant-Whyte, and R. Brooks (Berlin: Springer), 552–563. doi: 10.1007/978-3-540-48113-3_47
Shamay-Tsoory, S. G., and Mendelsohn, A. (2019). Real-life neuroscience: an ecological approach to brain and behavior research. Perspect. Psychol. Sci. 14, 841–859. doi: 10.1177/1745691619856350
Tadesse, Y., and Priya, S. (2012). Graphical facial expression analysis and design method: an approach to determine humanoid skin deformation. J. Mech. Robot. 4:021010. doi: 10.1115/1.4006519
Takeno, J., Mori, K., and Naito, Y. (2008). “Robot consciousness and representation of facial expressions,” in Proceedings of the 2008 3rd International Conference on Sensing Technology, Taipei, doi: 10.1109/ICSENST.2008.4757170
Uono, S., Sato, W., and Toichi, M. (2011). The specific impairment of fearful expression recognition and its atypical development in pervasive developmental disorder. Soc. Neurosci. 6, 452–463. doi: 10.1080/17470919.2011.605593
Vaughan, K. B., and Lanzetta, J. T. (1980). Vicarious instigation and conditioning of facial expressive and autonomic responses to a model’s expressive display of pain. J. Pers. Soc. Psychol. 38, 909–923. doi: 10.1037//0022-3514.38.6.909
Weiguo, W., Qingmei, M., and Yu, W. (2004). “Development of the humanoid head portrait robot system with flexible face and expression,” in Proceedings of the 2004 IEEE International Conference on Robotics and Biomimetics, Shenyang, doi: 10.1109/ROBIO.2004.1521877
Keywords: android, emotional facial expression, dynamic facial expression, Facial Action Coding System, robot
Citation: Sato W, Namba S, Yang D, Nishida S, Ishi C and Minato T (2022) An Android for Emotional Interaction: Spatiotemporal Validation of Its Facial Expressions. Front. Psychol. 12:800657. doi: 10.3389/fpsyg.2021.800657
Received: 23 October 2021; Accepted: 21 December 2021;
Published: 04 February 2022.
Edited by:
Andrea Poli, Università degli Studi di Pisa, Italy
Reviewed by:
Tanja S. H. Wingenbach, University Hospital Zurich, Switzerland
Krystyna Rymarczyk, University of Social Sciences and Humanities, Poland
Copyright © 2022 Sato, Namba, Yang, Nishida, Ishi and Minato. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Wataru Sato, wataru.sato.ya@riken.jp