Case Report: Women, Be Aware that Your Vocal Charisma can Dwindle in Remote Meetings

Siegert, Ingo; Niebuhr, Oliver

doi:10.3389/fcomm.2020.611555

BRIEF RESEARCH REPORT article

Front. Commun., 25 January 2021

Sec. Media Governance and the Public Sphere

Volume 5 - 2020 | https://doi.org/10.3389/fcomm.2020.611555

This article is part of the Research Topic "The Age of Mass Deception: Manipulation and Control in Digital Communications".

Ingo Siegert¹*

Oliver Niebuhr²

¹Mobile Dialog Systems, Institute for Information Technology and Communication, Otto von Guericke University Magdeburg, Magdeburg, Germany
²Centre for Industrial Electronics, University of Southern Denmark, Sonderborg, Denmark

Remote meetings via Zoom, Skype, or Teams limit the range and richness of nonverbal communication signals, not just because of the typically sub-optimal light, posture, and gaze conditions, but also because of the reduced speaker visibility. Consequently, the speaker’s voice becomes immensely important, especially when it comes to being persuasive and conveying charismatic attributes. However, to offer a reliable service and limit the transmission bandwidth, remote meeting tools heavily rely on signal compression. It has never been analyzed how this compression affects a speaker’s persuasive and overall charismatic impact. Our study addresses this gap for the audio signal. A perception experiment was carried out in which listeners rated short stimulus utterances with systematically varied compression rates and techniques. The scalar ratings concerned a set of charismatic speaker attributes. Results show that the applied audio compression significantly influences the assessment of a speaker’s charismatic impact and that female speakers in particular seem to be systematically disadvantaged by audio compression rates and techniques. Their charismatic impact decreases over a larger range of different codecs, and this decrease is also more pronounced than for male speakers. We discuss these findings with respect to two possible explanations. The first explanation is signal-based: audio compression codecs could be generally optimized for male speech and, thus, degrade female speech more (particularly in terms of charisma-associated features). Alternatively, the explanation lies in the ears of the listeners, who are less forgiving of signal degradation when rating female speakers’ charisma.

1 Introduction

According to the definition of charisma by Michalsky (2019), the power of charismatic speakers originates from three key abilities: the ability to evoke trust by conveying competence, the ability to evoke inspiration by conveying passion, and the ability to evoke motivation by conveying self-confidence. Thus, in listener ratings of speakers, the concept of charisma is, as in the present study, highly correlated with attributes like trustworthy, persuasive, and decided, as well as likeable, enthusiastic, stimulating, and visionary; see Rosenberg and Hirschberg (2009), Niebuhr and Wrzeczsz (2019), and Brem and Niebuhr (2020).

Perceived charisma is not only a fascinating research subject. It also gives speakers many practical advantages in their everyday life. Research shows that charismatic speech makes listeners more attentive and, in addition, increases the willingness of, for example, employees or students to learn, act, and work in an effective and committed manner (Antonakis et al., 2011; Lee et al., 2014; Towler et al., 2014). Creativity workshops that are led by more charismatic moderators end with a significantly better and qualitatively higher idea output (Bachsleitner, 2018) and, thus, contribute to the competitiveness and innovative strength of companies or societies.

The overall impression of a charismatic speaker is the result of a complex interplay of many parameters. They range from age and gender (Jokisch et al., 2018) through clothing (Brem and Niebuhr, 2020) and choice of words (Antonakis et al., 2011) to body language and voice (Scherer et al., 2012; Wörtwein et al., 2015; Caspi et al., 2019). The voice seems to play a particularly important role in this interplay. Experimental studies show that the voice not only allows significant prediction (with 70–80 percent correctness) of which idea presentations in startup contests will receive an investment and which will not; voice analyses can also indicate how high the investment will be (Schweißfurth et al., 2020). Implemented in robots, voices with more charismatic parameter settings make listeners fill out longer questionnaires, book certain travel destinations, and even take detours by car (Fischer et al., 2019; Niebuhr and Michalsky, 2019). Vocal charisma signals include virtually all aspects of a speaker’s voice, but are particularly associated with pitch (i.e., fundamental frequency, f0) level and variability, speaking rate, and aspects of voice quality in terms of both loudness and spectral-energy distribution measures. Experimental-phonetic research shows that higher parameter levels (e.g., higher levels of pitch and speaking rate) make speakers sound more charismatic in the ears of listeners. The only exceptions are voice-quality measures of spectral-energy distribution. Here, it is often lower values, i.e., a more balanced or flat spectral-energy distribution, that increase a speaker’s charismatic impact on listeners (Rosenberg and Hirschberg, 2009; Niebuhr et al., 2018; Fischer et al., 2019; Schweißfurth et al., 2020). Regarding speaker gender, previous findings suggest that, all else equal, women’s voices have a less charismatic impact than men’s voices.
This applies in particular to persuasion-oriented settings like startup funding contests or similar investment decision-making and business tasks (Brooks et al., 2014; Niebuhr et al., 2018; Niebuhr and Skarnitzl, 2019; Niebuhr and Wrzeczsz, 2019; Brem and Niebuhr, 2020). Since the all-else-equal principle does not usually apply in everyday life, women do not necessarily have a disadvantage. Especially since women can train their vocal parameters faster and use them more flexibly in many situations (Reichel and Beňuš, 2018; Niebuhr et al., 2019), the acoustic cues to perceived speaker charisma are sometimes even more pronounced for women than for men.

Our key point in the present study is that, in remote meetings, voice-related charisma is virtually the only means that remains for charismatic speakers to win over their audience and make their listeners buy into their ideas, actions, or offers. However, at the same time, the speaker’s voice is subject to a more or less strong signal compression, which, in turn, generates more or less audible artifacts. Whether and to what extent these artifacts of signal compression influence the charismatic impact of a speaker’s voice has so far hardly been researched, even though remote meetings have been on the rise not only since the Corona pandemic (Maximize Market Research, 2020).

Gallardo (2018) investigated the effects of communication channel bandwidth (narrow-band speech transmission 300–3,400 Hz vs. wide-band speech transmission 50–7,000 Hz) on the perceived warmth and attractiveness of speaker voices. Her results show clear effects of transmission bandwidth on perceived speaker attributes; moreover, she found male and female speakers to be differently affected by these bandwidth effects. Male speakers were overall affected more strongly and consistently. But these effects were restricted to behavioral or interactional attributes that are less relevant for perceived speaker charisma. Specifically, male speakers sounded more childish and less sympathetic under low-quality narrow-band than under high-quality wide-band speech transmission conditions. Female speakers, by contrast, sounded more ugly¹ and submissive and less competent in the low-quality narrow-band speech transmission condition. That is, for them the transmission-bandwidth effects concerned attributes that are more relevant for perceived speaker charisma. This is noteworthy not least because experimental charisma research suggests that women are, all else equal, generally perceived to be less charismatic than men (Brooks et al., 2014; Niebuhr et al., 2018; Niebuhr et al., 2019).

In a follow-up study, Gallardo and Sanchez-Iborra (2019) varied and compared different speech compression codecs, but instead of investigating the effects of these codecs on perceived speaker attributes, they looked at how severely the codecs deteriorated the performance of automatic speaker-classification algorithms. Also, the study by Siegert et al. (2016b) did not directly deal with speaker attributes but investigated the effects of codec-degraded speech on the ability of listeners to identify and distinguish the perceived speakers’ emotions. Siegert et al. found that any signal degradation made the speakers’ emotions less intelligible for listeners.

Although the variables and findings of Gallardo and colleagues and Siegert and colleagues are only loosely associated with our aim to shed light on the connections between speech signal compression and perceived speaker charisma, they provide a solid empirical basis for putting forward two pairs of hypotheses:

• H (1a): Speech signal compression codecs affect perceived speaker charisma;

• H (1b): This effect is overall unfavorable, i.e., stronger compression means lower charisma;

• H (2a): Male and female speakers’ charisma is differently affected by speech signal compression;

• H (2b): Female speakers’ perceived charisma is more strongly affected than that of male speakers.

We test these 2 × 2 Hypotheses with German stimuli and listeners in a perception experiment, using scalar ratings. The complex impression of perceived charisma is queried both directly and indirectly via closely correlated attributes selected on an empirical basis.

2 Study Design

2.1 Stimuli

To have high-quality speech utterances, independent of the speech content, we used the Berlin Database of Emotional Speech (EMO-DB) (Burkhardt et al., 2005). This database comprises German utterances that have a neutral semantic content, but are realized with different emotional prosodies as well as in a neutral matter-of-fact version by 10 professional actors (five female), pseudonymized via a speaker-id². It comprises high-quality recordings both in technical (sampling frequency of 16 kHz, stored as uncompressed WAV at 16-bit depth, bit-rate: 256 kBit/s) and acoustic terms (clear sonorous voices, no influencing content).

For our study, we selected a subset of two male (11 and 15) and two female (13 and 14) speakers from whom a constant utterance set was available for all emotions and the neutral version. The selected subset included 26 uncompressed utterances that constituted the first part of our experimental stimuli.

2.2 Audio Codecs

For data transmission, compression is heavily used within modern (mobile/remote) systems. Compression allows reducing the transmission bandwidth while retaining the speech intelligibility (ITU-T, 1996; ITU-T, 2014; Maruschke et al., 2016). Several codecs have been developed to meet various applications with different quality requirements (Siegert et al., 2016a). More details about the degradation of acoustic characteristics under compressed speech can be found in (Byrne and Foulkes, 2004; Lee et al., 2006; Siegert et al., 2016b).

For this study, we selected four wideband/fullband codecs aiming at specific application scenarios. Adaptive Multi-Rate Wideband (AMR-WB) is a high-quality audio compression format mainly used in mobile communications (ITU-T, 2003), also known as “HD Voice” and Voice over LTE (VoLTE). We chose a bitrate of 12.65 kBit/s, which is intended for pure speech signals (ITU-T, 2003).

MPEG-1/MPEG-2 Audio Layer III (MP3) is a popular lossy fullband audio codec (Brandenburg, 1999). It uses perceptual coding for audio compression: certain parts of the original signal, considered to be beyond the auditory resolution ability, are discarded. Besides its usage for music storage, lower bitrates (16 kBit/s) are used to encode audio dramas (Ahern, 2020).

OPUS is an open-source lossy audio codec usable for both speech and music (Valin et al., 2012). It further offers a hybrid mode to improve the speech intelligibility at low bit rates, by enriching the synthesized signal with characteristics represented by a psychoacoustic model (Valin et al., 2013). The application of the hybrid mode can be controlled by the bitrate, which has to be 34 kBit/s in our case.

SPEEX is an open-source fullband speech codec for internet applications requiring particularly low bit rates (Xiph.Org Foundation, 2014). It is also used as a speech codec in common voice assistant platforms (Caviglione, 2015). The encoding is controlled by a quality parameter that ranges from 0 (worst) to 10 (best). In our study, we used 0 (i.e., 3.95 kBit/s).

All 26 uncompressed stimuli were compressed with each of the four presented codecs at the specified bit rate (AMR-WB: 12.65 kBit/s, MP3: 16 kBit/s, OPUS: 34 kBit/s, SPEEX: 3.95 kBit/s). This resulted in 104 compressed stimuli. The total number of stimuli in our experiment was hence 26 + 104 = 130 stimuli.
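The bit-rate figures quoted in Sections 2.1 and 2.2 can be put side by side with a few lines of arithmetic. The following is a minimal sketch, not part of the original study: it recomputes the uncompressed PCM bit rate from the stated sampling frequency and bit depth, derives how many times smaller each codec's bit rate is, and re-derives the stimulus count. The variable names are illustrative, and a single (mono) channel is assumed because it reproduces the stated 256 kBit/s.

```python
# Recompute the bit rates and stimulus counts stated in the text.
SAMPLE_RATE_HZ = 16_000   # sampling frequency of the EMO-DB recordings
BIT_DEPTH = 16            # bits per sample
CHANNELS = 1              # assumption: mono, consistent with 256 kBit/s

# Uncompressed PCM bit rate in kBit/s (1 kBit = 1000 bit)
pcm_kbit_s = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS / 1000

# Codec bit rates used in the study, in kBit/s
codec_kbit_s = {"AMR-WB": 12.65, "MP3": 16.0, "OPUS": 34.0, "SPEEX": 3.95}

# How many times smaller each compressed stream is than the WAV baseline
reduction_factor = {name: pcm_kbit_s / rate for name, rate in codec_kbit_s.items()}

# Stimulus bookkeeping: every uncompressed utterance is encoded once per codec
n_uncompressed = 26
n_compressed = n_uncompressed * len(codec_kbit_s)
n_total = n_uncompressed + n_compressed

for name, factor in reduction_factor.items():
    print(f"{name}: {codec_kbit_s[name]} kBit/s, about {factor:.1f}x below WAV")
print(f"{n_compressed} compressed + {n_uncompressed} uncompressed = {n_total} stimuli")
```

Under these figures, the SPEEX setting is by far the strongest reduction in bit rate and OPUS the mildest, which is worth keeping in mind when comparing the codec conditions in the results.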

2.3 Procedure

The perception test was conducted via an online survey tool [SoSci Survey Version 3.2.03-i, Leiner (2019)]. The samples were presented in a pseudo-randomized order to prevent similar samples (same speaker and/or encoding quality) from directly following each other.
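One way to obtain such a constrained order is a greedy random construction with restarts. The sketch below is only an illustration of the idea and makes no claim about how the survey tool or the authors actually produced the order. The speaker ids and condition names come from the text; the utterance labels, the function names, and the restart limit are hypothetical.

```python
import random
from itertools import product

SPEAKERS = ["11", "15", "13", "14"]                    # speaker ids from Section 2.1
CONDITIONS = ["WAV", "AMR-WB", "MP3", "OPUS", "SPEEX"]  # baseline plus four codecs
UTTERANCES = ["u1", "u2"]                               # hypothetical utterance labels


def too_similar(a, b):
    """True if two stimuli share the speaker or the encoding condition."""
    return a[0] == b[0] or a[1] == b[1]


def pseudo_randomize(stimuli, rng, max_restarts=1000):
    """Build a random order in which no two neighbours are too similar."""
    for _ in range(max_restarts):
        remaining = list(stimuli)
        order = []
        while remaining:
            # Only stimuli that differ from the previous one are eligible.
            candidates = [s for s in remaining
                          if not order or not too_similar(order[-1], s)]
            if not candidates:
                break  # dead end, discard this attempt and start over
            pick = rng.choice(candidates)
            remaining.remove(pick)
            order.append(pick)
        if not remaining:
            return order
    raise RuntimeError("no valid order found; consider relaxing the constraint")


# Toy stimulus set: (speaker, condition, utterance) triples
stimuli = list(product(SPEAKERS, CONDITIONS, UTTERANCES))
order = pseudo_randomize(stimuli, random.Random(0))
print(order[:5])
```

Restarting on a dead end, rather than backtracking, keeps the code short; for a stimulus set of this size a valid order is typically found within a few attempts.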

After a short audio test and an introduction of the variables, the participants (aka labellers) were asked to listen to all stimuli and rate the respective speaker’s performance on five-point Likert scales (ranging from one “not at all” to five “very strong”), see Figure 2. As previous studies showed that speaker charisma is a fairly complex concept and not always easy to apply by listeners in rating tasks, we additionally included nine other speaker attributes that are closely related to speaker charisma and derived from the studies of Rosenberg and Hirschberg (2009); Weninger et al. (2011) as well as from the charisma model of Michalsky (2019). Many of these additional attributes have already been successfully applied in charisma rating tasks of previous studies [e.g., Niebuhr and Wrzeczsz (2019); Niebuhr (2020)].

At the end of the rating task, socio-demographic information (age, sex, mother tongue, BFI-S16) was collected from the participants. Including this additional information, the entire perception experiment took about 1 h.

2.4 Participants

Overall, 21 participants took part in the perception experiment (12 female and nine male, all between 19 and 43 years old, mean 24.76 years). All of them were fluent speakers of German with no reported speech-production or -perception disorders. Furthermore, the expression of the personality dimensions was comparable among the participants; thus, an influence of personality on the ratings could be excluded.

3 Evaluation and Results

In this study, we only analyzed the assessment of the neutral stimuli in various codec qualities. The different emotional versions were excluded. We were interested in how women’s and men’s power of persuasion would be affected by perceived acoustic qualities resulting from different compression codecs. Therefore, we focus here mainly on comparing the results between the male and female speakers.

For the inferential statistics, we used a two-way repeated-measures ANOVA with the fixed factors Compression (four codecs plus one uncompressed baseline) and Speaker Gender (m/f). Speaker Rating (i.e., the scale values 1–5) served as the dependent variable. As expected, all nine additional speaker attributes were significantly correlated (in terms of the Pearson product-moment r) with each other and with the key attribute of perceived speaker charisma at $r(82) > 0.6$ ($p < 0.001$); cf. Rosenberg and Hirschberg (2009) and Niebuhr and Wrzeczsz (2019) for similar correlation results. We therefore pooled the ratings across all 10 scales and treated them as one coherent rating of perceived speaker charisma. Participant was included in the RM-ANOVA as a random factor.
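The pooling decision rests on a simple check: every scale must correlate with the charisma scale above the stated threshold of 0.6. The sketch below illustrates that check with a hand-written Pearson correlation. The rating values are invented toy numbers for demonstration only and are not the study's data; the function and variable names are likewise illustrative. Because the degrees of freedom of a correlation are n − 2, the reported r(82) corresponds to 84 paired values per correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Toy ratings on a 1-5 scale (invented for illustration, NOT the study's data)
toy_ratings = {
    "charismatic": [1, 2, 3, 4, 5],
    "persuasive":  [1, 2, 3, 5, 4],
    "trustworthy": [2, 2, 3, 4, 5],
}

THRESHOLD = 0.6  # correlation level reported in the text

# Correlate every other attribute with the key attribute "charismatic"
correlations = {
    name: pearson_r(toy_ratings["charismatic"], values)
    for name, values in toy_ratings.items()
    if name != "charismatic"
}

# Pool only if all attributes clear the threshold
poolable = all(r > THRESHOLD for r in correlations.values())
print(correlations, poolable)
```

A full check would also cover the correlations among the additional attributes themselves and their significance levels, as described in the text.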

The RM-ANOVA returned a significant main effect of Compression [$F(4, 3356) = 135.7$, $p < 0.0001$, $\eta_{p}^{2} = 0.87$], a significant main effect of Speaker Gender [$F(1, 3356) = 86.3$, $p < 0.0001$, $\eta_{p}^{2} = 0.65$], as well as a significant interaction between the two fixed factors, Compression × Speaker Gender [$F(4, 3356) = 58.9$, $p < 0.0001$, $\eta_{p}^{2} = 0.44$].

As is summarized in Table 1, almost all codecs caused a decrease in perceived speaker charisma as compared to the uncompressed baseline condition (WAV), thus resulting in the significant main effect of Compression. Moreover, we can see in the WAV column of Table 1 that the overall magnitude of perceived speaker charisma differed between men and women, with the women’s stimuli receiving higher charisma ratings than the men’s stimuli. This result underlies the main effect of Speaker Gender. This finding is probably caused by the fact that our stimuli were realized by professional actors. Previous studies show that women benefit more quickly and overall more from voice training than men (Niebuhr et al., 2019) and that listeners pay more attention to features other than voice when rating men’s speeches than when rating women’s speeches (Sellnow and Treinen, 2004; Niebuhr and Wrzeczsz, 2019).

TABLE 1. Average evaluation of men’s and women’s charisma under different compressions, given as percentage loss compared to high-quality uncompressed speech (WAV).

Finally, Table 1 shows that the factor Compression in the form of its four different codec conditions affects the charisma ratings of men and women to different degrees. The two women suffered more from speech compression than the two men. More specifically, speech compression decreased the men’s perceived charisma on average by only 6.5% across all compression conditions, lowering it from 100 to 93.5%. Women’s perceived charisma, in contrast, was lowered about three times as much across all four compression conditions, i.e., by on average 20%, from 100 to 80%. The RM-ANOVA reflects this fact in the significant interaction between Compression and Speaker Gender. Note that, for OPUS (a frequently used web codec), women’s charisma is even absolutely “worse” than for men, as men are rated more charismatic under OPUS compression than in the WAV baseline condition. We suppose that this outcome was caused by the underlying codec. It has already been shown for the hybrid operation mode of OPUS that certain emotions can be recognized better (Siegert et al., 2017), which is in some sense comparable to gender differences in the pronunciation of charisma. Furthermore, for men, only the differences between SPEEX and all other codecs are significant (according to Wilcoxon-Wilcox post-hoc tests with Bonferroni correction of alpha-error levels), whereas for women, the comparisons of WAV vs. MP3 and WAV vs. OPUS also came out significant.
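The averages quoted above can be related to each other with a short calculation. This is a minimal sketch using only the rounded averages stated in the text (100 to 93.5 for men, 100 to 80 for women); the helper name `percent_loss` is illustrative, and the per-codec values of Table 1 are not reproduced here.

```python
def percent_loss(baseline, compressed):
    """Loss relative to the uncompressed baseline, in percent."""
    return 100 * (baseline - compressed) / baseline


# Averages across the four codec conditions, as stated in the text
men_loss = percent_loss(100, 93.5)
women_loss = percent_loss(100, 80)

# Two ways of comparing the groups
ratio = women_loss / men_loss        # how many times larger the women's loss is
gap_points = women_loss - men_loss   # difference in percentage points

print(f"men: {men_loss:.1f}%, women: {women_loss:.1f}%")
print(f"ratio: {ratio:.2f}, gap: {gap_points:.1f} percentage points")
```

The ratio expresses the "three times as much" comparison, while the gap in percentage points is the quantity that matters when the two losses are compared as absolute differences.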

In order to study the interaction Compression × Speaker Gender in greater detail, we broke up our integrated charisma rating again and investigated, again based on Wilcoxon-Wilcox pairwise comparison tests, how many statistically significant rating decreases we find per attribute for each male and female speaker. The theoretical maximum number of significant rating decreases was 100 (5 × 2 = 10 codec versions × 10 attributes). We found that the total number of statistically significant codec-induced decreases of charisma or charisma-related ratings differed remarkably between men and women. While for the men only 20% of all comparisons showed a significant decrease (at p < 0.01), this value rose to almost 30% in the case of the women (at p < 0.01). To illustrate this gender-specific difference, Figure 1 depicts the decreased ratings caused by SPEEX or OPUS for the charisma-related attributes trustworthy, persuasive, and likeable of men and women. On average, the rating decrease caused by the codecs amounted to 1.01 scale points for men, as against 1.49 scale points for women. In addition, Figure 1 exemplifies our finding that the greater decrease of women’s charisma under speech compression was not tied to specific attributes related, e.g., to competence, power, or attractiveness. Rather, it occurred across the board and affected all speaker attributes to similar degrees.

FIGURE 1. Changes in selected perceived charisma-related attributes for men and women.

FIGURE 2. Screenshot of utilized annotation questionnaire.

4 Discussion and Conclusion

Two pairs of Hypotheses were addressed here, based on previous findings of Siegert and colleagues and Gallardo and colleagues. The results of our perception experiment provide supporting evidence for all four Hypotheses. On this basis, we can draw the following conclusions:

• H (1a): Yes, speech signal compression codecs affect perceived speaker charisma;

• H (1b): Yes, this effect is overall unfavorable, i.e., stronger compression means lower charisma;

• H (2a): Yes, male and female speakers’ charisma is differently affected by speech signal compression;

• H (2b): Yes, female speakers’ perceived charisma is more strongly affected than that of male speakers.

Note with respect to the latter two conclusions that Gallardo (2018) suggested that the stronger negative influence of signal compression on women’s voices could be restricted to a few charisma-related attributes like competence and self-confidence (or its counterpart submissiveness). However, this idea is not consistent with our results. They show that the harmful effect of speech compression on women’s vocal charisma occurred across the board and concerned all tested charisma-related attributes to a similar degree. Thus, in terms of their relevance for everyday (professional) life, using communication tools with speech compression seems to pose an even greater risk for women than previously assumed.

While it is worth pursuing this assumption further in experiments with behavioral or decision-making tasks, the way our data converge with previous evidence leaves hardly any doubt that men and women are not similarly affected by signal compression. This can be either because the compression algorithms treat female speech acoustics worse than male speech acoustics, or because listeners are less forgiving of compression artifacts when they concern female speakers. What argues in favor of the latter is the study of Niebuhr and Wrzeczsz (2019). Its results suggest that the charismatic impact of women depends more strongly on voice-related features than that of men. Given that, it is plausible that a degradation of these voice-related features through signal compression particularly lowers the women’s charismatic impact. We are currently designing follow-up experiments that take up these more specific questions.

However, irrespective of what the underlying mechanism(s) of the effect may be, the practical implication of our findings is that for women to reach their full charismatic potential it seems crucial to avoid speech-interaction situations with a high risk of signal compression. Even under codecs like OPUS or MP3 that are very widespread and not too severe in terms of compression rate, the women’s charismatic impact already decreases by about 20% compared to an uncompressed speech-interaction situation (which is at least 10 percentage points more than for men). So, for example, women might want to take special care to find a strong network connection before they start their remote meeting and/or switch off the video streaming if it would otherwise reduce the bit rate of the audio stream. Note that we make these recommendations based on the neutral matter-of-fact stimuli only, but preliminary analyses of the rating data for the emotional stimuli suggest that the gender-specific compression effect also exists here, probably in an even more pronounced form.

Finally, we have to address the women’s higher absolute charisma-related ratings compared to the men (Table 1). This difference does not contradict previous research showing that women’s speeches are, all else equal, less charismatic than men’s speeches. The ceteris-paribus principle is just not applicable here, as we compared natural stimuli of men and women. That is, the women’s higher absolute charisma-related ratings only mean that we by chance selected two women for our 2 + 2 speaker sample who spoke at a higher level of vocal charisma than the two men we selected, especially as the speakers were trained actors. What matters here are the relative differences and changes in charisma-related ratings, and we have started to understand what they look like and how we can potentially overcome them.

We are aware that this study has some limitations, especially the small number of samples as well as the use of acted speech samples. However, our aim was to draw first conclusions and generate valid hypotheses for a clearly defined larger-scale study. Using a small pilot stimulus set ensured a constant, high focus on the task among the raters; and using acted stimuli ensured that we could start from a high charisma level in the uncompressed baseline condition (WAV) so that effects of compression can clearly manifest themselves. In future studies, we will include non-acted speech samples (of real video/phone calls) as well as all emotional stimuli that, in combination, are meant to cover the whole spectrum of human feelings in real situations of remote (digital) communication. Based on that, we will get a more realistic and differentiated idea about how and to what gender-specific degrees compression in connection with other factors diminishes perceived speaker charisma.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics Statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study.

Author Contributions

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

¹The term ugly is a scale-item taken from the utilized semantic differential questionnaire of speaker characteristics [cf. Fernández Gallardo and Weiss (2018)].

²More details about this database can be found at: http://emodb.bilderbar.info/index-1024.html.

References

Ahern, S. (2020). Acoustical design of concert halls and theatres: a personal account. 3rd Edn. Abingdon, United Kingdom: Routledge.