ORIGINAL RESEARCH article

Front. Psychol., 10 December 2019
Sec. Cognition

Taking Others' Perspectives Enhances Situation Awareness in the Smart Home Interface

  • 1Laboratory of Cognitive Engineering, Graduate Program in Cognitive Science, Yonsei University, Seoul, South Korea
  • 2Laboratory of Cognitive Engineering, Department of Psychology, Yonsei University, Seoul, South Korea

In the smart home environment, all devices are connected to one another and are shared by co-users living together. This may make people's interactions with the devices more complicated, owing not only to the difficulty of meeting each co-user's tastes with respect to how the devices operate, but also to variations in the frequency of device use among family members. If so, the use of multiple devices by multiple users can make it difficult to maintain situation awareness. Therefore, to relieve such interaction problems caused by the presence of co-users, we examined the effect of spontaneous visuospatial perspective taking on situation awareness with respect to the smart home interface. To this end, we measured whether the mere affordance of other users can elicit spontaneous visuospatial perspective taking, replicating previous research. We also examined whether the affordances of other users can help enhance situation awareness in the mock-up smart home interface design we created. When participants adopted the affordance of other users' perspectives, they could easily perceive the information about the devices. However, when they viewed the devices from the other's perspective, their understanding of devices mainly used by the self remained relatively low. Potential reasons for these findings are discussed along with proposals for future research.

Introduction

The emerging Internet of Things (IoT) requires us to think about human-computer interaction (HCI) from a new perspective. Traditional HCI research has long investigated how we design interactions between a single device (e.g., a personal computer or mobile phone) and a single person. The IoT, by contrast, involves a new type of interaction: multiple users interacting with multiple devices that are connected to each other (Ashton, 2009). Thus, for design purposes, we need to consider the relationships among many users and the context shared by the connected devices (Cila et al., 2017; Cervantes-Solis, 2019). For example, in the smart home—a house in which the IoT is installed—we can control multiple devices simultaneously through a controller. This controller is usually a mobile application, which helps users understand the situation and carry out household or routine work (Jakobi et al., 2017). For instance, Amazon Alexa—an AI speaker mobile application that also serves as a smart home controller—has a routine menu including features such as sleep mode, a situational mode that turns off the TV or dims the lights at a specific time. Similarly, Google Assistant—another AI speaker mobile application that also serves as a smart home controller—lets users check each device's status and control the devices room by room. As such, the emergence of the IoT may cause an interaction paradigm shift in which traditional HCI disappears (Console et al., 2013; Rapp et al., 2019). Therefore, we need to cope with the interaction problems that may arise from the differences between traditional HCI and the IoT.

Above all, the sharing of devices by co-users living together may complicate interaction within the IoT. Even if they share the same devices, different users may have different needs with respect to when and how the devices work. For example, some users may want the lamp currently beside them to glow lime-colored, but that lamp may previously have been programmed by other users to turn blue at bedtime. Likewise, the configuration of the devices and how they operate across the daily routine—situational modes such as Home mode, Away mode, Wake-up mode, and Sleep mode—can differ across family members. Unfortunately, current smart home controller interfaces do not support such variation in co-users' tastes or preferences.

Furthermore, the fact that the frequency of device use varies across family members may increase the complexity of the interaction. Each device's primary user is not always the same person; the primary user can be any family member. For example, even if I rarely use the vacuum cleaner, other family members might use it almost every day. Still, I sometimes use it, because family members commonly ask one another to take over housework when they are busy or absent. When using devices I seldom use, it can be difficult to understand what the status of the device is and which functions are reserved. In other words, it may be hard to maintain situation awareness (Endsley, 1995) of the smart home controller interface when using infrequently used devices. According to Endsley (1995), situation awareness is the understanding of dynamic system interfaces, and it follows three levels: Level 1 is the perception of the interface elements, Level 2 is the comprehension of the interface elements, and Level 3 is projection based on that comprehension. Returning to the fictitious case of the vacuum cleaner I rarely use, it can be hard to know where the vacuum cleaner is located (i.e., Level 1), what its status is (i.e., Level 2), and how it will operate in the future (i.e., Level 3).

Such interaction problems caused by co-users could be relieved by well-designed affordances in the smart home interface. Affordances refer not only to the interpretation of an object or environment (Norman, 1999) but also to the induction of a user's behavior in pursuit of a shared intention and goal through understanding the interaction (Baber, 2018). An advantage of affordances is that they can lead users to perform certain behaviors automatically and tacitly, with little mental effort (Grgic et al., 2016). If we design affordances optimized for the smart home environment, they should help users better encode and retrieve information about smart home interactions. For example, even for a device they rarely use, users could maintain high situation awareness through intuitive affordances of co-users—such as icons—in the smart home interface. The affordances of co-users can be effective for helping users think about interactions from the perspective of those users.

We sometimes look at or think of an object from others' perspectives owing only to the mere presence of those others. According to Tversky and Hard (2009), it is natural to take an egocentric reference frame, that is, to view the world from the perspective of the self. However, numerous studies have revealed that we adopt the visuospatial perspective of others when sharing physical space with them (Tversky and Hard, 2009; Kockler et al., 2010; Freundlieb et al., 2016; Furlanetto et al., 2016; Cavallo et al., 2017; Quesque et al., 2018). For example, we may describe the location of an object from the perspective of a person sitting across from us, referring to it as “the apple on your left” instead of “the apple on my right” (Cavallo et al., 2017). This is called spontaneous visuospatial perspective taking (i.e., VSP taking; Tversky and Hard, 2009; Freundlieb et al., 2016, 2018; Cavallo et al., 2017). VSP taking is divided into two levels: whereas Level 1 VSP taking refers to whether another person can see an object or not (Flavell et al., 1981; Samson et al., 2010; Furlanetto et al., 2016), Level 2 VSP taking concerns how an object appears from the other's point of view, as in the example above. Recent findings show that Level 2 VSP taking may not be limited to the physical realm but may also extend to mental activity such as word reading (Freundlieb et al., 2018). According to Freundlieb et al. (2018), this propensity to adopt other people's VSPs can help create shared meaning and facilitate information processing. Thus, if we adopt another person's perspective, we may perceive and understand the smart home interface—especially devices we rarely use—as if walking in that person's shoes.

Therefore, we examined in two experiments whether the affordance of other users' perspectives can enhance situation awareness of the smart home interface. In Experiment 1, we measured whether spontaneous VSP taking can occur solely through the affordance of other users. In Experiment 2, we examined whether the affordance of other users' perspectives not only causes spontaneous VSP taking but also enhances situation awareness in a mock-up design of a smart home interface.

To this end, we adopted a previously used paradigm showing VSP taking in mental space (Freundlieb et al., 2018). In that experiment, participants categorized words from two categories (i.e., animals and vegetables/fruits); each word was always displayed vertically—rotated 90 degrees—from the participant's own perspective (i.e., the self-perspective), as shown in Figure 1. From the other's perspective, the word was rotated 0 degrees (i.e., the congruent condition) or 180 degrees (i.e., the incongruent condition). According to Aretz and Wickens (1992), angular disparity increases the time it takes to read a word: because of the required mental rotation, reading should be fastest at 0 degrees, slower at 90 degrees, and slowest at 180 degrees. Accordingly, if participants take only their own perspective, reading times should not differ between the congruent and incongruent conditions; if they adopt the other's perspective, however, they should categorize words faster and more accurately in the congruent condition than in the incongruent condition. Such a congruency effect—better performance when the other's perspective and the word direction match—would indicate that participants spontaneously adopt the other's perspective.

Figure 1. Stimulus in Experiment 1. The stimulus word “TV” rotated 0 degrees from the other's perspective in the congruent condition (Left) and rotated 180 degrees from the other's perspective in the incongruent condition (Right).

In Experiment 1, the previous experiment (Freundlieb et al., 2018) was modified to make it more suitable for a smart home environment. While the original experiment used stimulus words from two categories (animals and vegetables/fruits), we used a word list referring to smart home devices, divided into two categories (self and other user) depending on the primary user of each device. In addition, unlike the previous experiment, in which an actual person (i.e., a confederate) shared physical space with the participant, the current study created a virtual shared physical context (a stimulus image and a mock-up interface) by displaying the affordance of the other user, the affordance of the self, and the words (i.e., smart devices) on the computer screen (as shown in Figure 1). If this experiment replicated the previous results, participants would categorize words more quickly and accurately in the congruent condition, where the affordance of the other user's perspective and the word direction match.

In Experiment 2, we designed a mock-up smart home application to make the experimental environment more similar to an actual smart home context. The top of the mock-up interface showed the same stimulus images as in Experiment 1, including the affordance of the other user, the affordance of the self, and the device words. The bottom of the interface showed the status of the devices and further information about the situational modes. We assumed that if VSP taking occurred in the smart home interface, the affordance of the other user would help participants maintain high situation awareness.

Experiment 1

Materials and Methods

Participants

Sixty-four undergraduates from Yonsei University participated in this experiment (aged 21–26 years, Mage = 22.24 years, SDage = 1.78 years; 32 women). All participants received course credit as compensation. Participants provided written consent, and all procedures were conducted in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). Experimental procedures were approved by the Institutional Review Board of Yonsei University.

An a priori power analysis (Faul et al., 2007) using an effect size of f = 0.2, α = 0.05, and 1 − β = 0.95 indicated that data from 55 participants should be collected. We recruited nine additional participants to allow for the exclusion of outliers exceeding 2 SD from the mean reaction time.
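
For readers who wish to verify or adapt the sample-size estimate, the sketch below re-derives a repeated-measures power calculation from the noncentral F distribution. It is a minimal illustration, not the authors' actual G*Power session: the assumed correlation among repeated measures (rho = 0.5) and the number of factor levels (m = 2) are our assumptions, and since the exact settings behind the reported n = 55 are not given, the printed result need not match it.

```python
# A minimal power-analysis sketch using a G*Power-style noncentrality
# parameter for a within-subjects factor. rho and m are illustrative
# assumptions; the paper reports only f = 0.2, alpha = .05, power = .95.
from scipy.stats import f as f_dist, ncf

def rm_anova_power(n, f_eff=0.2, alpha=0.05, m=2, rho=0.5):
    """Approximate power for a within-subjects factor with m levels."""
    df1, df2 = m - 1, (n - 1) * (m - 1)
    lam = f_eff ** 2 * n * m / (1 - rho)   # noncentrality parameter
    f_crit = f_dist.ppf(1 - alpha, df1, df2)
    return 1 - ncf.cdf(f_crit, df1, df2, lam)

n = 2
while rm_anova_power(n) < 0.95:
    n += 1
print(n)  # required n under these particular assumptions
```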

Design

A repeated-measures analysis of variance (ANOVA) was selected as the analysis technique. The independent variables were the congruency between the affordance of the other user's perspective and the word (2: congruent, incongruent) and the primary user-specific devices (2: self, other). The dependent variable was the time it took to categorize the smart home device words accurately depending on the primary user of the device.
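
As a concrete illustration of this analysis plan, the following sketch runs a two-way repeated-measures ANOVA with the pingouin library; the file name and column names ('participant', 'congruency', 'device_user', 'rt') are hypothetical stand-ins for the authors' data.

```python
# Sketch of the 2 x 2 repeated-measures ANOVA described above (pingouin).
# Input: long-format data, one mean RT per participant x condition cell.
import pandas as pd
import pingouin as pg

df = pd.read_csv("exp1_cell_means.csv")  # hypothetical file name
aov = pg.rm_anova(data=df, dv="rt",
                  within=["congruency", "device_user"],
                  subject="participant", detailed=True)
print(aov)  # F, p, and partial eta squared per effect
```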

Task

The current study was based on a previous experiment revealing that VSP taking occurs in mental space (Freundlieb et al., 2018). In Phase 1, participants were instructed to memorize 16 smart home device words (see Table 1), which were divided into two categories (i.e., self, other) depending on which primary user frequently uses each device. Eight words referred to smart home devices frequently used by the self and eight to devices mainly used by the other. Participants were asked to memorize the 16 devices, focusing on who the primary user is.


Table 1. List of primary user-specific smart home devices in Experiment 1.

In Phase 2, participants were evaluated on whether they had memorized the smart home devices and their primary users. After a word referring to a smart home device (e.g., TV, lamp, or robot vacuum cleaner) appeared for 1,500 ms on the computer screen, the following screen displayed a word indicating a primary user (i.e., self or other). If the user displayed on the screen was the primary user of that device, participants were instructed to press the Yes button; if not, they were to press the No button. The screen disappeared once participants answered.

Categorizing the smart home devices as self or other may be more difficult than categorizing the words used in the previous experiment, because all of the stimulus words in the current study refer to smart home devices and could plausibly be put in the same category. Therefore, we gave participants time to memorize the category assignments in Phases 1 and 2: participants memorized the written list of smart home device words on paper in Phase 1, and in Phase 2 they could check on the computer screen whether their response about each device's primary user was correct. Data from participants with a memory accuracy of <90% were excluded from the analysis.

Last, in Phase 3, participants responded with respect to who the primary user of a smart home device was immediately after reading a word related to that device on the computer screen. The stimulus image included the affordance of the self, the affordance of the other, and a device word. The stimulus word was always displayed rotated 90 degrees from the perspective of the self's affordance in all conditions; from the perspective of the other's affordance, however, device words were rotated 0 degrees in the congruent condition and 180 degrees in the incongruent condition.

Procedure

Participants were requested to memorize the list of primary user-specific smart home devices for 10 min in Phase 1 (see Table 2). In Phase 2, participants viewed smart home device words on the computer screen for 1,500 ms. After this screen, participants read a primary user word (self or other) located in the center of the slide until a response was made.


Table 2. Procedures of Experiment 1.

All participants were asked to respond to whether the user (i.e., self or other) appearing on the screen was the primary user of the device shown on the previous screen. If the user was the primary user of the device, participants pressed the “1” key, which had a “Y” sticker on it; if not, they pressed the “0” key, which had an “N” sticker on it. There were 48 trials in total, consisting of three repetitions of the 16 smart home device words.

In Phase 3, the stimulus image (see Figure 1) was a smart home device word between two affordances, and participants were requested to specify the primary user of that smart home device. Participants were told that the human agent was the human figure facing the device word. If participants thought that they themselves were the primary user of that smart home device, they pressed “g,” which had a “ME” sticker; if they thought the primary user was someone else, they pressed “j,” which had a “FAMILY” sticker. To match the meaning of the image and the stickers, we used “ME” and “FAMILY” stickers instead of “self” and “other.” There were 128 trials in total: 64 in the congruent condition and 64 in the incongruent condition.
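
For concreteness, a schematic PsychoPy sketch of one Phase 3 trial follows, with “g” mapped to ME (self) and “j” to FAMILY (other) as described above. The image files, positions, and placement rule are reconstructions from the text, not the authors' actual script or materials.

```python
# Schematic sketch of one Phase 3 trial in PsychoPy. Asset names and the
# geometry are assumptions; only the logic described in the text is kept.
from psychopy import visual, core, event
import random

win = visual.Window(size=(1024, 768), color="white", units="pix")

def run_trial(word, congruent):
    # Word always rotated 90 deg from the self's viewpoint (fixed on screen);
    # the other-user figure is placed so the word reads as 0 deg (congruent)
    # or 180 deg (incongruent) from its viewpoint.
    other_pos = (250, 0) if congruent else (-250, 0)
    other_fig = visual.ImageStim(win, image="other_figure.png", pos=other_pos)  # hypothetical asset
    self_fig = visual.ImageStim(win, image="self_figure.png", pos=(0, -250))    # hypothetical asset
    stim = visual.TextStim(win, text=word, ori=90, color="black", height=40)
    for s in (other_fig, self_fig, stim):
        s.draw()
    win.flip()
    clock = core.Clock()
    key, rt = event.waitKeys(keyList=["g", "j"], timeStamped=clock)[0]
    return {"word": word, "congruent": congruent,
            "response": "self" if key == "g" else "other", "rt": rt}

print(run_trial("TV", congruent=random.choice([True, False])))
win.close()
```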

Results and Discussion

All participants showed >90% accuracy in the memory task in Phase 2. Therefore, all participants' data were used in the analysis, excluding error trials (2.89%) and reaction times (RTs) more than 2 SDs (4.69%) from each participant's condition mean.
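
This exclusion rule can be expressed compactly; the sketch below assumes a hypothetical trial-level file and illustrative column names.

```python
# Sketch of the trial-exclusion procedure: drop error trials, then drop RTs
# more than 2 SD from each participant's condition mean. Names are assumed.
import pandas as pd

trials = pd.read_csv("exp1_trials.csv")   # hypothetical trial-level data
trials = trials[trials["correct"] == 1]   # error trials removed (2.89%)

grp = trials.groupby(["participant", "congruency", "device_user"])["rt"]
within_2sd = (trials["rt"] - grp.transform("mean")).abs() <= 2 * grp.transform("std")
trials = trials[within_2sd]               # RT outliers removed (4.69%)

# Cell means per participant feed the repeated-measures ANOVA.
cell_means = (trials.groupby(["participant", "congruency", "device_user"],
                             as_index=False)["rt"].mean())
```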

We conducted a 2 (congruency with the other's perspective: congruent, incongruent) × 2 (primary user-specific device: self, other) repeated-measures ANOVA on the Phase 3 results (see Table 3). There was no significant difference [F(1, 63) = 0.011, p = 0.918, η2p = 0.000] between the congruent (M = 901.234, SD = 146.936) and incongruent conditions (M = 902.675, SD = 137.767). However, there was a main effect of primary user-specific device [F(1, 63) = 10.458, p = 0.002, η2p = 0.142]: when the primary user of the stimulus device word was the other user (M = 921.449, SD = 141.091), it took longer to judge who the primary user was than in the self-user condition (M = 883.788, SD = 155.160).


Table 3. A 2 (congruency with others' perspective: congruent, incongruent) × 2 (primary user-specific device: self, other) repeated-measures ANOVA in Experiment 1.

In addition, there was no interaction between congruency and primary user-specific device [F(1, 63) = 1.045, p = 0.311, η2p = 0.016; see Figure 2]. Mean reaction times, from fastest to slowest, were: the self-user specific device, congruent condition (M = 879.684, SD = 163.971); the self-user specific device, incongruent condition (M = 887.598, SD = 154.805); the other-user specific device, incongruent condition (M = 917.527, SD = 142.989); and the other-user specific device, congruent condition (M = 924.306, SD = 145.455).


Figure 2. Mean reaction time (±SE) to categorize the smart home device words accurately, by congruency between the affordance of the other user's perspective and the word (congruent, incongruent) and primary user-specific device (self, other) in Experiment 1.

In conclusion, the results of Experiment 1 did not replicate the previous experiment (Freundlieb et al., 2018). We found only a main effect of primary user-specific device (i.e., a category effect). In the previous experiment, there was no category effect—that is, no difference in mean reaction time between the two categories. In Experiment 1, however, reaction times differed according to who the primary user of the smart home device was (self or other).

The category effect may have emerged here because the stimulus words (i.e., smart home devices) differed from those used previously (i.e., animals and vegetables/fruits). Words referring to smart home devices are more closely related to the self and other people than are animals and vegetables/fruits. Animals and vegetables/fruits are independent of people (i.e., self and others), whereas in Experiment 1 both the stimulus words and the affordances were divided into self and other and thus related to people. Rather than exhibiting spontaneous VSP taking, participants might therefore have engaged in more elaborative cognitive processing to recognize and classify the self and other upon reading the stimulus words.

In addition, the way the human agent was presented may also have contributed to the category effect. While the previous experiment used an actual human (i.e., a confederate), the current experiment displayed a human figure (i.e., the affordances of self and other) on screen. We initially expected that this difference would not affect VSP taking, because the phenomenon has also been observed with non-human agentic features such as arrows and triangles. However, given that the congruency effect—reading stimulus words from the affordance of the other's perspective—did not occur in Experiment 1, a human figure that looks like a human but is not a real person might be less suitable for inducing VSP taking than an arrow or a triangle.

Alternatively, these results might be due to a memory strategy in which participants memorized the self-user specific devices more intensively. Participants categorized self-user specific devices faster than other-user specific devices regardless of whether they were in the congruent or incongruent condition. If participants adopted a strategy of memorizing only one of the two categories (i.e., self or other), the majority may have chosen their own category rather than the other's.

Above all, however, we thought the congruency effect might appear if the experimental environment better approximated an actual smart home interface. The simplicity of the experimental environment in Experiment 1 could have made participants focus more on the stimulus words: the words may have been too salient for participants to adopt the other's perspective, and this salience may have encouraged the memory strategy rather than VSP taking. The salience of the stimulus words should be reduced by the other important information that needs to be processed in an actual smart home interface. Therefore, we reasoned that participants might be less influenced by the stimulus words (i.e., the primary user-specific devices) in a more complex experimental environment resembling an actual smart home interface.

Experiment 2

We examined whether participants adopted the affordance of others' perspectives in an experimental environment similar to a real smart home. To this end, we designed a mock smart home mobile application interface containing information about the smart home devices, such as their location, status, and situational reservation functions (i.e., situational modes).

All participants completed the Situation Awareness Global Assessment Technique (SAGAT) (Endsley, 1995, 2017; Scholtz et al., 2005) after watching the mock-up smart home mobile application interface. We assumed that if participants adopted the affordance of the other user's perspective, their situation awareness would be enhanced in the congruent condition (i.e., a match between the affordance of the other user's perspective and the direction of the stimulus words).

Materials and Methods

Participants

Sixty-four undergraduates from Yonsei University participated in this experiment (aged 19–26 years, Mage = 21.75 years, SDage = 2.94 years; 40 women). As in Experiment 1, all participants received course credit as compensation. They provided informed written consent, and all procedures were approved by the Institutional Review Board of Yonsei University and conducted in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki).

The a priori power analysis was the same as in Experiment 1.

Design

The independent variables were the same as in Experiment 1. The dependent variables were response accuracy on the situation awareness (SA) Level 1, 2, and 3 evaluation questionnaires.

Task

All phases were the same as in Experiment 1 except Phase 3, in which participants answered the SAGAT (Endsley, 1995, 2017; Scholtz et al., 2005) after watching videos of gesture interactions, such as swiping and tapping, on the mock smart home interface. To this end, we designed a mock smart home interface (see Figure 3) based on current smart home applications such as Amazon Echo, Google Home, Samsung SmartThings, and LG IoT at Home. Key features of current market products, including situational modes (Home mode, Away mode, Awake mode, and Sleep mode), were included in the mock interface. Thus, the main page of the mock interface listed the four situational modes, and the sub-list page presented information about the location, status, and context of the smart home devices.


Figure 3. Screenshots of the mock smart home mobile application interface in Experiment 2. There were four different situation lists on the main page (Left), and information about the location, status, and context of the smart home devices on the sub-list page (Right).
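
To make the interface structure concrete, here is an illustrative sketch of the information the mock interface exposes: situational modes on the main page, and per-device location, status, and reserved actions on the sub-list page. Every name and value is invented for illustration; the authors' mock-up is a visual design, not published code.

```python
# An illustrative data model for the mock interface; all names/values are
# invented. Rooms group devices (Figure 3); modes trigger reserved actions.
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str            # e.g., "lamp"
    room: str            # e.g., "bedroom" (room-based grouping)
    status: str          # e.g., "on", "off", "dimmed"
    primary_user: str    # "self" or "other"
    reservations: dict = field(default_factory=dict)  # mode -> scheduled action

home = [
    Device("lamp", "bedroom", "on", "other",
           reservations={"Sleep": "turn blue at 23:00"}),
    Device("TV", "living room", "off", "self",
           reservations={"Home": "resume last channel"}),
]

MODES = ["Home", "Away", "Awake", "Sleep"]  # the four situational modes

def scheduled_actions(devices, mode):
    """Actions a mode would trigger -- the kind of SA Level 3 material probed."""
    return {d.name: d.reservations[mode] for d in devices if mode in d.reservations}

print(scheduled_actions(home, "Sleep"))  # {'lamp': 'turn blue at 23:00'}
```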

Participants watched videos of someone using the mock-up mobile interface on the computer screen, rather than operating the interface themselves. This was intentional, to avoid anticipated problems with participants using the mock smart home interface: it may have been difficult for participants to learn and become familiar with the interface during the brief experiment. Considering the gulf of execution—the difference between a user's expectation and the performance of the system (Norman, 1988, 1991)—the gulf may be especially wide for a novice to the interface.

Thus, we judged that watching an interaction video would be the best way to prevent participants from being negatively affected by the low usability of the smart home interface. In the recorded videos, a floating dot (analogous to a cursor on a personal computer) showed someone using the smart home interface and guided participants' attention. Watching such simulation videos (see Supplementary Materials) is a legitimate type of experimental stimulus (alongside a real interface or a static interface image) for the SAGAT (Endsley, 1995). Furthermore, the purpose of the SAGAT is not to measure the performance of the system but to evaluate one's understanding of the system interface and the surrounding environment (Endsley, 1995). We therefore considered it appropriate to measure situation awareness of the smart home interface through recorded video. We made 16 videos according to a 4 (situational mode: home, away, awake, sleep) × 2 (congruency with the other's perspective: congruent, incongruent) × 2 (primary user-specific device: self, other) design. Each participant watched eight of the 16 videos, randomly assigned across two situational modes, and the videos were counterbalanced to prevent order effects.

All participants responded to the nine SAGAT questions after each video ended. For SA Level 1, the stage of perceiving the interface, there were four questions about the location or existence of the devices (e.g., “What devices are in the living room?”). To answer these questions accurately, participants had to remember the top part of the mock smart home interface, where the affordance of the co-user and the location information of the devices were displayed. For SA Level 2, the stage of understanding the interface, there were also four questions, about the status and specific functions of a device (e.g., “What is the status of the lamp?”); all the content related to SA Level 2 was located on the bottom of the interface. Finally, SA Level 3, the stage of projection based on understanding the interface, was assessed with a projection-style question (e.g., “Will the lamp turn on in the morning?”). Each participant thus answered 72 questions evaluating situation awareness. All SAGAT questions were multiple-choice, following the method of a previous experiment (Scholtz et al., 2005).
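
Scoring such data reduces to per-cell accuracy. A sketch with assumed file and column names:

```python
# Sketch of SAGAT scoring: mean accuracy per participant, condition cell,
# and SA level. Each of the 8 videos is followed by 9 multiple-choice
# questions; 'sa_level' tags each question with its level (1, 2, or 3).
import pandas as pd

answers = pd.read_csv("exp2_sagat.csv")  # hypothetical: one row per question
# columns: participant, video, congruency, device_user, sa_level, correct (0/1)

acc = (answers
       .groupby(["participant", "congruency", "device_user", "sa_level"],
                as_index=False)["correct"]
       .mean()
       .rename(columns={"correct": "accuracy"}))
# 'accuracy' per cell is the DV for the Level 1/2/3 repeated-measures ANOVAs
```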

Procedure

The overall experiment lasted ~30 min. The procedures of Phases 1 and 2 were the same as in Experiment 1. In Phase 3, participants responded to the nine SAGAT questions after watching a video of someone using the mock smart home interface. Prior to the beginning of Phase 3, participants were asked to watch the contents of each video closely. Participants answered 72 SAGAT questions in total.

Results and Discussion

Six participants with below 90% accuracy in the Phase 2 memory task were excluded from the data analysis; the data of the remaining 58 participants were analyzed.

We conducted a 2 (congruency with the other's perspective: congruent, incongruent) × 2 (primary user-specific device: self, other) repeated-measures ANOVA on the accuracy of SA Levels 1, 2, and 3 in Phase 3 (see Table 4). The Level 1 results (see Figure 4) revealed a significant difference [F(1, 57) = 4.871, p = 0.031, η2p = 0.079] between the congruent (M = 0.73, SD = 0.18) and incongruent conditions (M = 0.69, SD = 0.18) for the congruency between the affordance of the other user's perspective and the direction of the device word. However, there was neither a significant main effect of primary user-specific device [F(1, 57) = 0.225, p = 0.637, η2p = 0.004] nor a significant interaction [F(1, 57) = 2.694, p = 0.106, η2p = 0.045] between congruency and primary user-specific device.


Table 4. A 2 (congruency with others' perspective: congruent, incongruent) × 2 (primary user-specific device: self-user, other-user) repeated-measures ANOVA in Experiment 2.


Figure 4. Accuracy (±SE) for the situation awareness (SA) Level 1 questions, by congruency between the affordance of the other user's perspective and the word (congruent, incongruent) and primary user-specific device (self, other) in Experiment 2.

The Level 2 results (see Figure 5) revealed no significant main effect of congruency [F(1, 57) = 4.871, p = 0.949, η2p = 0.000] or of primary user-specific device [F(1, 57) = 0.002, p = 0.962, η2p = 0.000], but there was a significant interaction [F(1, 57) = 4.767, p = 0.033, η2p = 0.077] between congruency and primary user-specific device. Bonferroni-corrected pairwise comparisons showed a marginally significant difference between the congruent (M = 0.79, SD = 0.19) and incongruent conditions (M = 0.85, SD = 0.20) for the self-user specific devices (p = 0.097). For the other-user specific devices, the difference between the congruent (M = 0.85, SD = 0.19) and incongruent conditions (M = 0.79, SD = 0.19) was not significant (p = 0.120).


Figure 5. Accuracy (±SE) for the situation awareness (SA) Level 2 questions, by congruency between the affordance of the other user's perspective and the word (congruent, incongruent) and primary user-specific device (self, other) in Experiment 2.
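
The interaction decomposition reported above corresponds to Bonferroni-corrected pairwise tests; a sketch using pingouin, with hypothetical file and column names:

```python
# Sketch of Bonferroni-corrected pairwise comparisons for the Level 2
# congruency x device interaction. With within=["device_user", "congruency"],
# pingouin compares congruent vs. incongruent within each device level.
import pandas as pd
import pingouin as pg

acc2 = pd.read_csv("exp2_level2_accuracy.csv")  # hypothetical cell means
posthoc = pg.pairwise_tests(data=acc2, dv="accuracy",
                            within=["device_user", "congruency"],
                            subject="participant", padjust="bonf")
print(posthoc)  # corrected p-values appear in the 'p-corr' column
```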

The Level 3 results showed no significant effect of congruency [F(1, 57) = 1.425, p = 0.237, η2p = 0.024] or primary user-specific device [F(1, 57) = 0.054, p = 0.817, η2p = 0.001], and no interaction [F(1, 57) = 2.254, p = 0.139, η2p = 0.038]. Thus, adopting the affordance of other users in the smart home interface may not help when predicting how a smart home device will operate.

Counter to the results of Experiment 1, we found a congruency effect: participants perceived the smart home interface (i.e., SA Level 1) more accurately in the congruent condition than in the incongruent condition, suggesting that they adopted the affordance of the other user's perspective. For Level 1 questions, participants had to retrieve the affordance information of co-users and the device names displayed on the top part of the smart home interface. The information on the top part of the interface was the same as the stimulus image in Experiment 1 (see Figures 1, 3 right panel). Even though the same stimulus was used, there was no congruency effect in Experiment 1 but there was one in SA Level 1 of Experiment 2. We attribute this to the difference in experimental task: participants had to answer who the primary user of the smart home device was (i.e., a semantic categorization task) in Experiment 1, whereas they were asked where the device was located in Experiment 2. In Experiment 2, the SAGAT might have inhibited the category effect that emerged in Experiment 1. In addition, the stimulus in Experiment 2 was more complex than in Experiment 1, which may have facilitated recognizing affordances that are usually processed implicitly (Baber, 2018).

We found an interaction effect between congruency and primary user-specific device in SA Level 2. However, the pairwise difference between the congruent and incongruent conditions was only marginally significant, and only for the self-user specific devices. The result showed an inverted congruency effect, with higher accuracy in the incongruent than in the congruent condition. Even though the effect was only marginal, this result may suggest that including a human-figure affordance on the interface can impair situation awareness, at least for self-specific devices in the congruent condition.

In addition, given that SA Level 2 measures how well participants understand information about the specific configuration or status of the devices shown on the interface, participants may tend to take their own viewpoint when reading information about their own devices (i.e., self-user specific devices). If so, the self-user affordance currently in sight may remind them of the specific information about the smart home interaction associated with that affordance. Thus, whether a device is frequently used by the user should be considered an important factor at the understanding stage of SA (i.e., SA Level 2).

In Level 3, there was no significant difference in any condition at the projection stage of SA. Given that higher SA levels are associated with greater mental effort (Endsley, 1995), these results imply that VSP taking may occur only at lower SA levels, such as the perception stage, but not at higher levels such as projection. In conclusion, the results of Experiment 2 revealed that VSP taking helped SA Level 1 in the smart home interface. In addition, we confirmed that a natural phenomenon in social interaction—adopting another person's perspective—can not only occur in a virtual environment but may also help enhance situation awareness of the smart home interface.

General Discussion

The current study examined whether spontaneous VSP taking can occur in a virtual environment similar to an actual smart home interface. Moreover, we assessed whether spontaneous VSP taking enhances situation awareness of the smart home interface. We found that if the affordance of another person is displayed on the smart home interface, participants show a propensity to adopt the other's perspective.

The novelty of the current study is the confirmation that even in the smart home interface—a virtual environment—we still have the propensity to adopt the perspective of others. Considerable research has shown that we adopt others' perspectives as a result of the mere presence of others in a shared physical realm (Tversky and Hard, 2009; Kockler et al., 2010; Freundlieb et al., 2016; Furlanetto et al., 2016; Cavallo et al., 2017; Quesque et al., 2018). This propensity may extend to the realm of mental activity, such as thinking and reading (Freundlieb et al., 2018), and may even be triggered by agentic features of inanimate objects such as arrows (Zwickel, 2009; Heyes, 2014; Santiesteban et al., 2014). The phenomenon may be essential to successful social interaction and communication (Quesque et al., 2018). It may then be no surprise that VSP taking, which usually occurs in real social interaction, also appeared in the smart home interface.

In addition, we found that VSP taking can affect a mental activity—situation awareness—that is necessary for understanding a system interface. To this end, we constructed a mock-up smart home interface based on a previous experiment (Freundlieb et al., 2018). In SA Level 1 of Experiment 2, we replicated the previous finding that participants adopt others' perspectives during mental activity (Freundlieb et al., 2018), and we extended it: VSP taking also occurs in another mental activity context, situation awareness, and enhances situation awareness of the smart home interface.

The current findings have useful implications for designing smart home interfaces. First, the affordances of co-users, which induce the VSP taking phenomenon, can be effective in designing IoT interactions. To improve validity, the current study reused the stimulus configuration of the previous experiment: the affordance of the self, the affordance of the other, and the device word located between them. Given that VSP taking did not occur in a spatial memory-based task (Kelly et al., 2018), it is important for effective interface design that the affordances of co-users always be displayed in the smart home controller. In addition, the human-figure affordance (Tarampi et al., 2016) is suitable for the mental simulation needed to deal with information at SA Level 1.

We also added location information indicating where each smart home device is: for example, the top of the stimulus image was labeled living room, kitchen, bedroom, and so on. This suggests that a room-based design can be effective in the smart home interface—not only because rooms may be one of the most intuitive large-scale categories for classifying many devices, but also because rooms might contribute to enhancing situation awareness by indicating the shared physical context. Of course, further development is necessary, because the room layout used in the current mock interface was relatively artificial and superficial.

There were a few limitations, as this study was the first attempt to apply both VSP taking and situation awareness to the smart home interface. We restricted the number of other users to one person, an intentional choice made to examine whether VSP taking would occur at all. This does not reflect an actual smart home context, in which more than two users might share a device. Future studies thus need to examine not only whether VSP taking occurs in a smart home interface where more than two people co-occur, but also whether the phenomenon still helps enhance situation awareness. Given that affordances still require cognitive resources (Grgic et al., 2016), increasing the number of affordances may impose an overly heavy load on users perceiving and understanding complex information in the smart home interface.

In the SA Level 2 results of Experiment 2, we found the possibility that a participant's situation awareness is impaired when a human-figure affordance is displayed for a device mainly used by the self, given the marginally significant inverted congruency effect for self-user specific devices. If future research shows that the SA Level 2 results are robust, we should consider why such an inverted congruency effect appears only for self-user specific devices. One experimental design for probing the reason would be to exclude the self-user affordance from the interface. From the participant's point of view, there was one human (i.e., a confederate) in experiments on VSP taking in physical space, but there were two human figures (i.e., the self-user and other-user affordances) in our experiment on the smart home interface. If the self-user affordance were removed from the interface, the inverted congruency effect for self-user specific devices in SA Level 2 might disappear. It may also be necessary to add a control condition without any human-agent affordance, both to compare with conventional devices and to rule out the possibility that the mere presence of a human figure negatively affects overall situation awareness.

Furthermore, future research is needed to generalize the results. Most of the participants in this study were psychology undergraduates living in South Korea; more diverse samples are needed to see whether the results replicate beyond this age group and region. In addition, it may be necessary to increase the difficulty of the device information and to develop a non-memory-based experimental task (i.e., replacing Phases 1 and 2) that better reflects real situations. The current results imply that the affordance of other users helps the retrieval of device information. If affordances can help users retrieve device information or solve problems even when unexpected errors occur, the affordance of the user might be one of the most effective design strategies in IoT interaction.

Data Availability Statement

The datasets generated for this study are available on request to the corresponding author.

Ethics Statement

The studies involving human participants were reviewed and approved by Institutional Review Board of Yonsei University. The patients/participants provided their written informed consent to participate in this study.

Author Contributions

SY developed the study concept, design, and performed data collection and analysis under the supervision of KH.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02761/full#supplementary-material

Supplementary Video 1. This is a video that was presented as a stimulus to the participants in Experiment 2.

References

Aretz, A. J., and Wickens, C. D. (1992). The mental rotation of map displays. Hum. Perform. 5, 303–328.

Ashton, K. (2009). That ‘internet of things’ thing. RFID J. 22, 97–114.

Baber, C. (2018). Designing smart objects to support affording situations: Exploiting affordance through an understanding of forms of engagement. Front. Psychol. 9:292. doi: 10.3389/fpsyg.2018.00292

Cavallo, A., Ansuini, C., Capozzi, F., Tversky, B., and Becchio, C. (2017). When far becomes near: perspective taking induces social remapping of spatial relations. Psychol. Sci. 28, 69–79. doi: 10.1177/0956797616672464

Cervantes-Solis, J. W. (2019). A Human Centric Approach to the Internet of Things. Birmingham: University of Birmingham.

Cila, N., Smit, I., Giaccardi, E., and Kröse, B. (2017). “Products as agents: metaphors for designing the products of the IoT age,” in Paper presented at the Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, CO).

Console, L., Antonelli, F., Biamino, G., Carmagnola, F., Cena, F., Chiabrando, E., et al. (2013). Interacting with social networks of intelligent things and people in the world of gastronomy. ACM Trans. Interact. Intell. Syst. 3:4. doi: 10.1145/2448116.2448120

Endsley, M. R. (1995). Measurement of situation awareness in dynamic systems. Hum. Factors 37, 65–84. doi: 10.1518/001872095779049499

Endsley, M. R. (2017). “Direct measurement of situation awareness: validity and use of SAGAT,” in Situational Awareness, ed E. Salas (London: Routledge), 129–156. doi: 10.4324/9781315087924-9

Faul, F., Erdfelder, E., Lang, A.-G., and Buchner, A. (2007). G* Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191. doi: 10.3758/BF03193146

Flavell, J. H., Everett, B. A., Croft, K., and Flavell, E. R. (1981). Young children's knowledge about visual perception: further evidence for the Level 1–Level 2 distinction. Dev. Psychol. 17:99. doi: 10.1037/0012-1649.17.1.99

Freundlieb, M., Kovács, Á. M., and Sebanz, N. (2016). When do humans spontaneously adopt another's visuospatial perspective? J. Exp. Psychol. Hum. Percept. Perform. 42, 401–412. doi: 10.1037/xhp0000153

Freundlieb, M., Kovács, Á. M., and Sebanz, N. (2018). Reading your mind while you are reading—evidence for spontaneous visuospatial perspective taking during a semantic categorization task. Psychol. Sci. 29, 614–622. doi: 10.1177/0956797617740973

Furlanetto, T., Becchio, C., Samson, D., and Apperly, I. (2016). Altercentric interference in level 1 visual perspective taking reflects the ascription of mental states, not submentalizing. J. Exp. Psychol. Hum. Percept. Perform. 42, 158–163. doi: 10.1037/xhp0000138

Grgic, J. E., Still, M. L., and Still, J. D. (2016). Effects of cognitive load on affordance-based interactions. Appl. Cogn. Psychol. 30, 1042–1051. doi: 10.1002/acp.3298

Heyes, C. (2014). Submentalizing: I am not really reading your mind. Perspect. Psychol. Sci. 9, 131–143. doi: 10.1177/1745691613518076

Jakobi, T., Ogonowski, C., Castelli, N., Stevens, G., and Wulf, V. (2017). “The catch (es) with smart home: experiences of a living lab field study,” Paper Presented at the Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, CO).

Kelly, J. W., Costabile, K. A., and Cherep, L. A. (2018). Social effects on reference frame selection. Psychon. Bull. Rev. 25, 2339–2345. doi: 10.3758/s13423-018-1429-6

Kockler, H., Scheef, L., Tepest, R., David, N., Bewernick, B., Newen, A., et al. (2010). Visuospatial perspective taking in a dynamic environment: Perceiving moving objects from a first-person-perspective induces a disposition to act. Conscious. Cogn. 19, 690–701. doi: 10.1016/j.concog.2010.03.003

Norman, D. A. (1988). The Psychology of Everyday Things. New York, NY: Basic Book.

Norman, D. A. (1991). “Cognitive artifacts,” in Designing Interaction: Psychology at the Human-Computer Interface, ed J. M. Carroll (Cambridge: Cambridge University Press), 17–38.

Norman, D. A. (1999). Affordance, conventions, and design. Interactions 6, 38–43. doi: 10.1145/301153.301168

Quesque, F., Chabanat, E., and Rossetti, Y. (2018). Taking the point of view of the blind: spontaneous level-2 perspective-taking in irrelevant conditions. J. Exp. Soc. Psychol. 79, 356–364. doi: 10.1016/j.jesp.2018.08.015

Rapp, A., Tirassa, M., and Ziemke, T. (2019). Cognitive aspects of interactive technology use: from computers to smart objects and autonomous agents. Front. Psychol. 10:1078. doi: 10.3389/fpsyg.2019.01078

Samson, D., Apperly, I. A., Braithwaite, J. J., Andrews, B. J., and Bodley Scott, S. E. (2010). Seeing it their way: evidence for rapid and involuntary computation of what other people see. J. Exp. Psychol. Hum. Percept. Perform. 36, 1255–1266. doi: 10.1037/a0018729

Santiesteban, I., Catmur, C., Hopkins, S. C., Bird, G., and Heyes, C. (2014). Avatars and arrows: implicit mentalizing or domain-general processing? J. Exp. Psychol. Hum. Percept. Perform. 40, 929–937. doi: 10.1037/a0035175

Scholtz, J. C., Antonishek, B., and Young, J. D. (2005). Implementation of a situation awareness assessment tool for evaluation of human-robot interfaces. IEEE Trans. Syst. Man Cybernet. Part A Syst. Hum. 35, 450–459. doi: 10.1109/TSMCA.2005.850589

Tarampi, M. R., Heydari, N., and Hegarty, M. (2016). A tale of two types of perspective taking: sex differences in spatial ability. Psychol. Sci. 27, 1507–1516. doi: 10.1177/0956797616667459

Tversky, B., and Hard, B. M. (2009). Embodied and disembodied cognition: spatial perspective-taking. Cognition 110, 124–129. doi: 10.1016/j.cognition.2008.10.008

Zwickel, J. (2009). Agency attribution and visuospatial perspective taking. Psychon. Bull. Rev. 16, 1089–1093. doi: 10.3758/PBR.16.6.1089

Keywords: visuospatial perspective taking, situation awareness, internet of things, smart home, affordance

Citation: Yu S and Han K (2019) Taking Others' Perspectives Enhances Situation Awareness in the Smart Home Interface. Front. Psychol. 10:2761. doi: 10.3389/fpsyg.2019.02761

Received: 25 August 2019; Accepted: 25 November 2019;
Published: 10 December 2019.

Edited by:

Jan B. F. Van Erp, University of Twente, Netherlands

Reviewed by:

Yang Jiang, University of Kentucky, United States
Hiroyuki Muto, Ritsumeikan University, Japan

Copyright © 2019 Yu and Han. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kwanghee Han, khan@yonsei.ac.kr

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.