Judging the emotional states of customer service staff in the workplace: A multimodal dataset analysis

Liu, Ping; Zhang, Yi; Xiong, Ziyue; Wang, Yijie; Qing, Linbo

doi:10.3389/fpsyg.2022.1001885

ORIGINAL RESEARCH article

Front. Psychol., 11 November 2022

Sec. Emotion Science

Volume 13 - 2022 | https://doi.org/10.3389/fpsyg.2022.1001885

Ping Liu¹

Yi Zhang¹*

Ziyue Xiong¹

Yijie Wang²

Linbo Qing³

¹School of Business, Sichuan University, Chengdu, China
²School of Business and Tourism Management, Yunnan University, Kunming, China
³School of Electronic and Information Engineering, Sichuan University, Chengdu, China

Background: Emotions play a decisive and central role in the workplace, especially in service-oriented enterprises. Because the service process is highly participatory and interactive, employees’ emotions are usually highly volatile during service delivery, which can have a negative impact on business performance. Therefore, it is important to effectively judge the emotional states of customer service staff.

Methods: We collected data on real-life work situations of call center employees in a large company. Three consecutive studies were conducted after the emotional states of 29 customer service staff had been videotaped by wide-angle cameras. In Study 1, we constructed scoring criteria and auxiliary tools of picture-type scales through a free association test. In Study 2, two groups of experts were invited to evaluate the emotional states of customer service staff. In Study 3, based on the results of Study 2 and a multimodal emotion recognition method, a multimodal dataset was constructed to explore how each modality conveys the emotions of customer service staff in the workplace.

Results: Through scoring by two groups of experts and one group of volunteers, we first developed a set of scoring criteria and picture-type scales, in combination with the SAM scale, for judging the emotional state of customer service staff. Then we constructed 99 (out of 297) sets of stable multimodal emotion datasets. Based on the comparison among the datasets, we found that voice conveys emotional valence in the workplace more significantly, and that facial expressions have a more prominent connection with emotional arousal.

Conclusion: Theoretically, this study enriches the way in which emotion data is collected and can provide a basis for the subsequent development of multimodal emotional datasets. Practically, it can provide guidance for the effective judgment of employee emotions in the workplace.

Introduction

The attributes of emotion, such as intangibility, high intensity, and contagiousness, highlight the importance of recognizing and managing employees’ emotions in the workplace (Camacho et al., 1991; Liu et al., 2019). Workplace emotion refers to a subjective experience that comes from an individual’s physiological arousal evoked by workplace stimuli (Jordi et al., 2015; Rueff-Lopes et al., 2015; Liu et al., 2019). A recent IMF World Trade Statistical Review (July 2021) projected that the worldwide service trade had dropped by 16% in 2020 while online services had risen by 9% in 2021, with even greater growth expected in some countries. Given the high-interaction and high-participation nature of service industries (Zemke and Bell, 1990; Rueff-Lopes et al., 2015; Liu et al., 2019), service staff are usually required to manage their emotions and to demonstrate only those emotions allowed by organizational policies. This phenomenon is often interpreted as emotional labor (Hochschild, 1979; Hatfield et al., 1993; Grandey and Melloy, 2017). Providing services that evoke emotional labor can increase employees’ workload (Destephe et al., 2015), aggravate work pressure, and cause job burnout (Baeriswy et al., 2021; Schabram and Tseheng, 2022), ultimately leading to emotional disorders (Farchione et al., 2012) and poor enterprise performance (Rueff-Lopes et al., 2015; Liu et al., 2019). Therefore, it is of vital importance for enterprises, especially those in the service industry, to acknowledge and effectively manage emotions in the workplace.

In the modern service industry, the ways of delivering service to customers have changed (Jeremy et al., 2005; Liu et al., 2019). Voice-to-voice communication has gradually become a prevailing method for service enterprises to attend to customers’ needs (Rueff-Lopes et al., 2015; Sparks et al., 2015). Meanwhile, the application of 5G and other digital technology is enriching the ways of online services, including online healthcare, online shopping, and telecommuting (Khan and Zhang, 2007; He et al., 2013). Under these circumstances, voice-to-voice interaction between service employees and customers has become a key element of service delivery (Barry and Crant, 2000; Goldberg and Grandey, 2007; Rueff-Lopes et al., 2015). Some scholars have found that negative externalities, such as business bankruptcies and prolonged isolation (Bherwani et al., 2020), lead to rising stress and workloads (Wheaton et al., 2021). Thus, in the process of providing service, employees are more susceptible to negative emotions, which threatens the results of voice-to-voice service delivery. Therefore, judging employees’ emotional states in the workplace is of great practical significance for managers.

To date, although there has been an increase in studies of employees’ emotions, there are shortcomings in the literature. First, in terms of research methods, to measure individual emotions, most projects adopt the paradigms of case studies applying stimulus materials (Michel, 2001; Du and Fan, 2007) or self-report scales, such as PANAS, PAD, SDS, etc. (Bradley et al., 2001; Li et al., 2005; Kuesten et al., 2014). While these approaches are reasonable, they overlook the influence of situational and contextual elements on individual emotions to some extent. Further, it is hard to avoid common method bias with case stimuli and questionnaires (Jordan and Troth, 2019). Second, most of the research results are presented in the form of discrete emotions and thereby lack the characteristics of real situations (Ekman, 1994; Gao et al., 2019). Because individual emotions at a specific point in time are a complex combination of discrete emotions, the practical guidance offered by existing studies is relatively weak. Third, some research groups have come to realize that single-modal emotion measurement cannot accurately identify the individual emotional state, and that emotion recognition needs to be treated as a multimodal problem in the research fields of organizational behavior (OB) and psychology (Balconi and Fronda, 2021; Guedes et al., 2022; Zhao et al., 2022). This is because human emotions are relatively rich and complex in terms of expression (Lackovi, 2018; Liu et al., 2019). For example, a sentence may contain multiple, even conflicting emotions; a positively worded sentence, for instance, may express sarcasm (Gao et al., 2019). Therefore, it is necessary to analyze the emotional states of employees in workplaces from a multimodal perspective, so as to enrich the methods of organizational behavior and psychological research.

This study aims to find out which emotional modality most accurately conveys the emotions of customer service staff in the workplace, and how it does so. Based on the multimodal emotion recognition method (Lahat et al., 2015; Baltrušaitis et al., 2018; Ethriaj and Isaac, 2019), emotions can be divided into three fundamental modalities: body language, voice, and facial expression. This multimodal classification features both visual and auditory channels of an individual’s physiology (Wang et al., 2019). Although data of each modality can convey the emotional state of customer service staff in a workplace, in this article we mainly focus on comparing the differences between modalities that have a high level of practical relevance. The innovations of this study are as follows. First, from the perspective of situational embedding, the research team observed the work emotions of customer service staff in a real service-oriented enterprise. Second, in terms of the experimental research paradigm, this paper constructed a multimodal data set, compared the heterogeneity of different emotional modalities, and extracted the key elements of the emotional states of customer service employees. Third, this paper summarized the theoretical and practical significance of managing service employees’ emotional states and suggested future directions for this research field.

Theoretical background

Workplace emotions

Emotions usually reflect people’s attitudes toward objective things or situations (Stanger et al., 2017). Emotions are short-lived, high-intensity responses that develop automatically when an organism is stimulated by an external irritant (Venkatraman et al., 2017). In academia, there are several different views on the understanding of emotion. The biological view holds that emotions arise from the nervous system and are a product of the evolution of living creatures (Sun et al., 2016). The functionalist perspective holds that emotions evaluate a particular environment and are specific mental activities produced by the individual in response to stimuli from personally meaningful events (Boiten et al., 1994; Barret, 1998; Stanger et al., 2016). Campos et al. (1989) defined emotion as “not just feelings, but rather the process of establishing, maintaining or disrupting the relationship between an organism and environment, when such relationship has implication to the person.” The organizational perspective holds that emotions are a kind of “imitation-response” mechanism, which is generated by an individual’s interaction with their environment (Sroufe, 1996; Lewis, 1998). The socio-cultural perspective holds that emotions are a profound psychological and physical experience, mediated by a variety of social and cultural factors (Zhang and Lu, 2013).

At present, the interpretation of emotion continues to be debated in the academic community, but there are several common features. First, emotion is a physiological and psychological state, including subjective experience, behavioral expressions, and peripheral physiological responses (Nitsche et al., 2012). Second, emotion is typically triggered by a specific reason or situation with short duration and high intensity (Camacho et al., 1991; Ekman, 1994; Gao et al., 2019). Third, emotion is responsive to the external environment and manifests as a form of experience (Perveen et al., 2020). Fourth, emotion has two attributes: biological and cultural. Individuals who grow up in different cultures may have different ways of expressing the complexity of emotions.

In this article, workplace emotion refers to an emotional experience that is felt by customer service staff during the service process. It is a physiological response to internal and external stimuli, with the characteristics of short duration and high intensity.

Multimodal emotion recognition method

Emotion recognition is a dynamic process that aims to identify the emotional states of individuals (Jess et al., 2020). The classification of emotions falls into two schools of thought: categorical and dimensional. The categorical approach holds that emotions can be summarized in terms of basic emotions such as joy, anger, sadness, and fear (Ekman, 1994). However, the dimensional approach argues that people often have difficulty evaluating, distinguishing, and describing their emotions. That is to say, emotions are more like a blurred set of concepts that blend with each other than a discrete system. Therefore, the theory of dimensional emotion has been favored in academic circles. After an in-depth study of pleasure-arousal-dominance (PAD) (Mehrabian and Russell, 1974), Russell (1980) proposed a structural model of affective experience, arguing that various emotions are not separate categories but have certain values on the two dimensions of valence (pleasure) and arousal (alertness). Valence is an unpleasant-pleasant experience, a continuum from one extreme through the origin of coordinates to the other extreme (e.g., from distress to ecstasy), and arousal is the feeling of vitality or energy, such as the progression from drowsiness through relaxation and alertness to excitement (Russell, 1980; Watson and Tellegen, 1985). Therefore, according to the level of emotional experience and the degree of energy and vitality, respectively, we divided emotional valence and arousal into three levels each: valence (negative, neutral, positive) and arousal (low, medium, high).
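The three-level division of valence and arousal can be represented as a small data structure. The following is a minimal sketch only: the class names `Valence`, `Arousal`, and `EmotionLabel` and the list `ALL_LABELS` are hypothetical illustrations, not part of the study's materials, and no mapping from scale scores to levels is implied.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Valence(Enum):
    """Three valence levels named in the text (hypothetical class name)."""
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"


class Arousal(Enum):
    """Three arousal levels named in the text (hypothetical class name)."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class EmotionLabel:
    """One point in the valence-arousal grid (hypothetical record type)."""
    valence: Valence
    arousal: Arousal


# Every valence-arousal combination that a rater could assign: 3 x 3 cells.
ALL_LABELS = [EmotionLabel(v, a) for v, a in product(Valence, Arousal)]
```

Keeping the two dimensions as separate fields, rather than as one combined category, mirrors the dimensional view described above, in which an emotion takes a value on each dimension independently.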

Modality is a representable, objective social symbol system that is an important vehicle for emotional signaling (Ethriaj and Isaac, 2019; Gao et al., 2019). There are two major views on the interpretation of modality. One sees modality as the form of data representation, in which text, video, image, and sound are separate modalities (Hong and Tam, 2006; Baltrušaitis et al., 2018; Balconi and Fronda, 2021). The other views modality as the mechanism of data collection, whether through self-report scales or electrophysiological equipment (Wang et al., 2022). Multimodal emotion recognition is more accurate than traditional single-modal emotion recognition because information is integrated from different modalities (Lahat et al., 2015; Yoon et al., 2015; Wang et al., 2019). Scholars studying emotions have found that single-modal emotion data are prone to greater recognition errors than multimodal data. For instance, Battaglia (2010) and Lambercht et al. (2012) found that there were significant errors in recognizing the facial expressions of anger and fear in subjects of different ages, which suggests that single-modal emotion recognition results are susceptible to the influence of the subject’s age. Aviezer et al. (2012) and Wang et al. (2021) researched online images and constructed mixed stimulus material (including limbs, expressions, and gestures). Their results consistently show that limbs more accurately convey tennis players’ emotional states. Therefore, we propose that, compared with multimodal emotion recognition, single-modal recognition is susceptible to subjective and objective factors, resulting in distorted judgment.

In summary, multimodal data sets have a diversity of data representations (such as visual and auditory) and are collected in at least two different channels (such as self-report scale or electrophysiological equipment). This study constructs a multimodal data set to explore the differences of each modality (body language, voice, and facial expression) in conveying the emotions of customer service staff. Further, this study provides recommendations for effectively identifying and managing customer service sentiment in the workplace.

Overview of the study

Our research follows an experimental research approach. We collected the emotional data of 29 customer service staff from March 15 to March 30, 2021. Three distinct sample groups were recruited to evaluate the emotions of the staff (Expert 1: doctors in organizational behavior and psychology; Expert 2: doctors in computer image emotion recognition; Volunteers: a social group recruited online). The research framework of this study is shown in Figure 1.

Figure 1. Research framework.

This study was filed with the Ethics Committee of Sichuan University, China, No. KS2022984 (for detailed information on the submission to the Ethics Committee, view)¹. The customer service staff signed an agreement to participate in this study, allowing the research team to record both their physical and psychological emotional data and authorizing the research team to use their personal information in academic papers after proper concealment, but not allowing any type of commercial use of their information.

Emotional data collection and pre-processing

We collected the customer service staff’s emotional data in the workplace. We chose the customer service staff from the call center of a large decoration company headquartered in Chengdu, China. The company’s call center has two service modes, pre-sales and after-sales, which cover most types of emotional situations that arise during the service process between customer service staff and customers.

Sample

We recruited 29 full-time participants for our study. On average, the customer service staff were 27.556 (SD = 4.853) years old and had worked at this company for 2.211 years (SD = 2.266). Among the participants, 79.31% identified as female and 20.69% as male. More than four-fifths (86.21%) of the employees held nonsupervisory positions, and 93.11% had at least a college degree.

Procedure

Data collection

Our data acquisition method was to set up wide-angle cameras (device type: Aigo DSJ-T5) on the customer service staff’s workstations (see Figure 2A). The customer service staff were informed by their supervisor that they needed to cooperate with the collection of their emotional states for two weeks (from 9:00 to 18:00 every day from March 15 to March 31, 2021). The schematic of data collection is shown in Figure 2.

Figure 2. Data acquisition schematic. (A) Data collection diagram. (B) Results presentation (partial).

During the data acquisition process, in order to eliminate the resistance of employees as much as possible, the research team’s members promised each customer service staff member that their original emotional data would not be directly submitted to the company, and never be used to evaluate their job performance. More importantly, this research would not have any negative influence on their vocational development in the company. After data collection, the research team gave generous remuneration to the employees who successfully finished the experiment.

Data pre-processing

In order to explore the genuine emotional state of customer service staff, each phone call was set as a research unit in this research (Rueff-Lopes et al., 2015), and a pre-processing of the original video data was conducted. The standards of pre-processing were as follows:

(a) Keep only phone call records with a duration in [0.5, 5] minutes;

(b) Clip each record manually from 15 seconds before the call was answered to 15 seconds after the call was hung up.

The reasons were as follows. If the call was too short (0–30 s), it usually contained limited information; such short calls were normally hung up by the customers. If the call was too long (>5 min), it suggested that the customer was interested in the company’s business service, and the emotion of the customer service staff was usually positive. At the same time, with regard to employees’ emotional labor, Rueff-Lopes et al. (2015) pointed out that customer service staff would unconsciously show their real emotions about the customer before and after the phone calls. Therefore, the 15 s before and after each call were retained to reflect the genuine emotions expressed by the customer service staff.
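Criteria (a) and (b) can be sketched in code as follows. This is an illustration under stated assumptions, not the research team's procedure (the clipping was done manually): the `Call` record, the function names, and the clamping of the clip window to the bounds of the video file are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_CALL_S = 30.0   # criterion (a): lower bound of 0.5 minutes
MAX_CALL_S = 300.0  # criterion (a): upper bound of 5 minutes
PAD_S = 15.0        # criterion (b): margin kept before and after the call


@dataclass
class Call:
    """Hypothetical record: call boundaries in seconds from the start of a video file."""
    answered_at: float
    hung_up_at: float


def clip_window(call: Call, video_length_s: float) -> Optional[Tuple[float, float]]:
    """Return the (start, end) clip for a call, or None if criterion (a) excludes it."""
    duration = call.hung_up_at - call.answered_at
    if not (MIN_CALL_S <= duration <= MAX_CALL_S):
        return None
    # Assumption: the window is clamped so it never leaves the recording.
    start = max(0.0, call.answered_at - PAD_S)
    end = min(video_length_s, call.hung_up_at + PAD_S)
    return (start, end)


def preprocess(calls: List[Call], video_length_s: float) -> List[Tuple[float, float]]:
    """Apply both criteria to a list of calls and keep only the retained windows."""
    windows = (clip_window(c, video_length_s) for c in calls)
    return [w for w in windows if w is not None]
```

For example, a two-minute call answered at 100 s would yield the window from 85 s to 235 s, while a 20 s call would be dropped.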

Results

We collected close to 10,240 GB of raw data. During the data collection process, two employees showed subjectively confrontational behavior, and another two employees resigned from this company for personal reasons. Thus, in order to ensure continuity of data, we excluded the relevant data of these four employees in the data analysis process. In all, the total amount of original data was about 9,215 GB (approximately 89.99% of the raw data). Of the eligible 25 participants, 19 were female (76%). The participants’ average age was 27.556 (SD = 4.853) years old and average organizational tenure was about 2.214 years (SD = 5.566).

To avoid the influence of disturbing elements, the acquired data were transcoded and unified with Adobe Premiere Pro 2021. The resolution was adjusted to 960 × 540, the frame rate to 30 frames per second, and the audio sampling frequency to 48.0 kHz; all the data were saved as MP4 files.
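For readers who wish to apply the same target parameters without Adobe Premiere Pro, a comparable transcode could be scripted with the FFmpeg command-line tool. This is a substitute technique, not the one used in the study; the function name `build_ffmpeg_command` and the file names are hypothetical, and the sketch only assembles the argument list.

```python
from typing import List


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    """Assemble an ffmpeg call matching the target parameters reported in the text."""
    return [
        "ffmpeg",
        "-i", src,               # input video file
        "-vf", "scale=960:540",  # resize frames to 960 x 540
        "-r", "30",              # output frame rate: 30 frames per second
        "-ar", "48000",          # audio sampling frequency: 48.0 kHz
        dst,                     # an .mp4 extension selects the MP4 container
    ]


# Hypothetical file names, for illustration only.
command = build_ffmpeg_command("raw_recording.avi", "unified_recording.mp4")
```

Passing `command` to `subprocess.run(command, check=True)` would execute the transcode, provided FFmpeg is installed; codec choices are left at FFmpeg's defaults here and may differ from the Premiere Pro export settings.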

The total duration of calls within 15 days was calculated as 154 h. Rueff-Lopes et al. (2015) had two trained researchers listen randomly to 967 live phone calls to narrow down the research material from 8,747 calls. Following the method of Rueff-Lopes et al. (2015), research team members selected 561 call records from the original data as experimental samples. The data pre-processing results are shown in Table 1.

Table 1. Emotional dimensions for 561 phone call records.

Discussion

This section provides solid data for our further research. Traditional studies are mainly based on external stimulation and arousal of the subjects’ emotion, rather than unconscious emotional expression. Through the collection of customer service emotional data in the workplace, our research can circumvent the inherent shortcomings of previous research methods (Du and Fan, 2007; Liu and Li, 2017), improving the authenticity and reliability of the results.

As can be seen from the emotional valence in Table 1, negative emotions are far more frequent than neutral and positive emotions. This indicates that employees are more susceptible to customers’ negative emotions during service delivery and thus show emotional convergence. In terms of emotional arousal, service staff are less likely to show overexcited or excessively negative emotions because of the objective conditions in the workplace (e.g., workstation environment, company rules and regulations). The results in Table 1 are consistent with the actual situation.

Study 1: Free association test of customer service staff’s emotional representations

In Study 1, two groups of doctoral experts were recruited to engage in two consecutive tasks. Through a free association test (Lei et al., 2019), we built a relatively comprehensive word pool of emotions for image selection (Task 1), which then helped us construct scientific scoring criteria and tools for subsequent research (Task 2).

Participants

We recruited 20 Ph.D. students from the business school and the school of electronics and information engineering at Sichuan University for our scoring criteria research. Participants’ average age was 26.2 years (SD = 0.618). They all had normal or corrected-to-normal vision and were right-handed. Half of the participants worked in computer image emotion recognition, and the other half in organizational behavior and psychology. The two groups differed in research mindset, which allowed them to offer reasonable scientific suggestions on the construction of scoring standards from different areas of study. All participants had a minimum of two years’ research experience in emotion recognition. These two groups are abbreviated as Expert 1: OB&Psy and Expert 2: CE&Rec.