Decoupling the role of verbal and non-verbal audience behavior on public speaking anxiety in virtual reality using behavioral and psychological measures

Girondini, Matteo; Frigione, Ivana; Marra, Mariapia; Stefanova, Milena; Pillan, Margherita; Maravita, Angelo; Gallace, Alberto

doi:10.3389/frvir.2024.1347102

ORIGINAL RESEARCH article

Front. Virtual Real., 19 March 2024

Sec. Virtual Reality and Human Behaviour

Volume 5 - 2024 | https://doi.org/10.3389/frvir.2024.1347102

This article is part of the Research Topic: Understanding and Improving the ‘Self’ Using Immersive Virtual Reality

Matteo Girondini¹,²,³*†

Ivana Frigione¹,²†

Mariapia Marra¹,²†

Milena Stefanova¹,⁴

Margherita Pillan⁴

Angelo Maravita¹,²

Alberto Gallace¹,²

¹Mind and Behavior Technological Center, Department of Psychology, Università Milano-Bicocca, Milan, Italy
²Department of Psychology, Università Milano-Bicocca, Milan, Italy
³MySpace lab, University Hospital of Lausanne, Lausanne, Switzerland
⁴Department of Design, Politecnico di Milano, Milan, Italy

Public speaking is a communication ability that is expressed in social contexts. Public speaking anxiety consists of the fear of giving a speech or a presentation and the perception of being badly judged by others. Such feelings can impair the performance and alter the physiological activation of the presenter. In this study, eighty participants, most of them naive to virtual reality, underwent one of four virtual reality public speaking scenarios. Four different conditions were tested in a between-group design, in which the audience could express positive or negative non-verbal behavior (in terms of body gestures and facial expressions), together with positive or adverse questions raised during a question-and-answer session (Q&A). The primary outcomes concerned the effect of the virtual audience’s behavior on perceived anxiety and physiological arousal. In general, perceived anxiety seemed to be affected by neither the verbal nor the non-verbal behavior of the audience. Nevertheless, the experimental manipulation revealed a higher susceptibility to public speaking anxiety in participants who scored higher on the Social Interaction Anxiety Scale (SIAS) than in those with lower SIAS scores. Specifically, when the verbal attitude was negative, participants with high SIAS scores reported a higher level of anxiety. Participants’ physiological arousal was also affected by the proposed scenarios. Participants dealing with an approving audience and an encouraging Q&A showed an increased skin conductance response. The lack of correlation between reported anxiety and skin conductance response might suggest a physiological engagement in an interactive exchange with the virtual audience rather than a form of discomfort during the task.

1 Introduction

Public speaking anxiety (PSA) is a specific subtype of social anxiety (Hofmann et al., 2004) that considerably impacts people’s behavior and feelings during public speaking interactions. It consists of a sense of unease and distress experienced when presenting a discussion or speaking in front of an audience. McCroskey (1970, 1977) referred to the fear or anxiety felt during real or anticipated communicative scenarios as communication apprehension (CA). Individuals commonly experience different levels of CA in both public and interpersonal communication situations. This apprehension often fluctuates depending on the situation (state communication apprehension) but may also manifest as a consistent trait within some individuals (trait communication apprehension). People with high trait CA exhibit heightened apprehension across a broad spectrum of oral communication interactions, leading to avoidance behaviors and negative self-evaluation (McCroskey, 1977). This condition can extend beyond mere apprehension and evolve into a pathological state, such as PSA, characterized by elevated stress levels and bodily discomfort that occur during social interactions (Stein and Chavira, 1998; Hofmann et al., 2004). PSA can itself be an obstacle to professional growth (Harris et al., 2002; Raja, 2017), as it hampers the effective communication of ideas and expertise in public environments (Ferreira Marinho et al., 2017). Moreover, the negative consequences of PSA extend to interpersonal relations, leading to social isolation and hindering the development of meaningful personal relationships due to the fear of negative judgment coupled with the importance attributed to positive evaluations from others (Clark and Wells, 1995; Rapee and Heimberg, 1997; Raja, 2017). According to Clark and Wells (1995), people with social phobia exhibit cognitive distortions regarding their social performance and how others perceive them.
Specifically, they tend to overestimate the extent of negative evaluations from others and the probability of committing social errors. This cognitive distortion fuels persistent rumination over perceived flaws or mistakes, perpetuating an inaccurate self-perception and a distorted view of their own social performances and interactions. Similarly, Rapee and Heimberg (1997) emphasize that negative self-perception plays a pivotal role in sustaining social anxiety. This negative self-perception is exacerbated by the individual’s heightened sensitivity to every negative social cue. Furthermore, those suffering from social phobia often exaggerate the probability of others detecting and negatively evaluating their perceived flaws, even when such judgments are unlikely or non-existent. All of this amplifies anxious symptoms and behaviors, such as avoidance, reinforcing the negative cognitive framework. Together, these models highlight the critical role of cognitive distortion in perpetuating social phobic conditions, such as PSA. Such distortions may give rise to a detrimental cycle of fear, inhibition, and avoidance of anxiety-producing social situations. This contributes to the persistence of the anxiety, increasing both its frequency and intensity and reinforcing the individual’s dysfunctional beliefs of being inadequate and under judgment.

PSA affects physical and emotional wellbeing and increases physiological arousal before, during, and after the task (Goodman et al., 2017). Indeed, according to the three-system model (Lang, 1968), socially triggering situations are linked with a multifaceted range of responses. Not only cognitive and overt behavioral reactions come into play; physiological responses are also involved. When grappling with actual or anticipated public speaking situations, individuals experience physiological reactions such as heightened heart rate, facial or skin flushing, and electrodermal activation (Bodie, 2010).

Virtual Reality (VR) is a valuable tool for experiencing public speaking in an immersive environment and studying anxiety in a realistic yet controlled way (North et al., 1998). Indeed, VR has the potential to evoke physiological reactions and levels of presence comparable to those of natural public talk environments within a PSA situation (Riva, 2009; Gallace et al., 2011; Higuera-Trujillo et al., 2017; Lanier, 2017), even in front of an audience composed of a small group of virtual characters (Mostajeran et al., 2020). Furthermore, people tend to treat a virtual human as a social actor rather than a mere image (Reeves and Nass, 1996). Thus, a VR conversation task may elicit levels of subjective distress comparable to a corresponding in vivo task (Powers et al., 2013). Moreover, VR enables manipulation of the virtual audience and of the interaction dynamics between participants and the audience, making it possible to explore the influence of listeners’ features during a public speech. According to Blascovich (2002), humans induce a greater sense of social presence than agents during interactions, which results in greater social influence. Essentially, real humans consistently wield social influence, while virtual humans’ influence hinges on their behavioral authenticity, which depends on the realism and the agency of the virtual audience. Research in this direction (El-Yamri et al., 2019) pointed out how complex it is to create realistic audience feedback for a presentation. Such feedback is influenced not only by words but also by emotions, conveyed as non-verbal behavior by the presenter and the audience. One of the hardest tasks for researchers is to create agents or avatars in virtual reality environments that express emotions in a way that is convincing on a subconscious level, too (Norman, 2005).
That is why the creation of virtual audiences (VAs), made of humanlike virtual spectators, takes inspiration from the seven universal emotional facial expressions (Ekman, 1999) and from how the entire body, through postures or gestures, is involved in sending information about our emotional state to others around us (Nummenmaa et al., 2014). Static nonverbal behavior of the virtual audience (VA) can stand for an emotionally neutral audience, and friendly and appreciative behavior simulates positive reactions towards the speaker; in contrast, adverse and bored expressions throughout the presenter’s speech can lead to perceiving the audience negatively (Pertaub et al., 2001). Also, supportive or non-supportive VA feedback, comparable to that of real audiences, can influence the levels of anxiety in speakers (Kelly et al., 2007). Considering that nonverbal behaviors of the audience can be perceived along the dimensions of valence (opinion) and arousal (engagement) (Kang et al., 2016), cognitive models, such as the valence-arousal model (Chollet et al., 2014; Chollet and Scherer, 2017), can also guide the configuration of nonverbal behaviors in order to express clearly recognizable and replicable information in virtual multimodal public speaking performance (Glemarec et al., 2021). The frequency of gazing away or looking at the presenter is a fundamental indicator of the audience’s engagement, signaling a bored or interested attitude. Head movements, like nodding in agreement or disagreement, convey the audience’s opinion to the speaker (Chollet and Scherer, 2017; Etienne et al., 2022).

A recent study (Girondini et al., 2023a) shows that the audience’s feedback during a simulated public speaking task can heighten or relieve public speaking anxiety (PSA). Positive audience feedback can enhance the speaker’s self-esteem, reduce anxiety, and increase overall satisfaction with the speaking experience. Conversely, negative audience feedback can have detrimental effects on the speaker. However, in real-life scenarios, the audience’s feedback might be expressed through both verbal and non-verbal communication during a speech. Nonverbal interactions, such as body language, can complement verbal messages in a public speaking context and play a crucial role in connecting the speaker and the audience. Non-verbal behavior cues, such as eye gaze, facial expression, hand gestures, and voice pitch, impact the perceived persuasiveness and credibility of the speaker (Burgoon et al., 1990) and can indicate the perceived anxiety and the emotional states experienced by the speaker during the speech (Laukka et al., 2008). The audience’s body gestures or postures can add further negative or positive feedback for the speakers and thus affect their performance and feelings.

Recognizing the influence of the audience’s non-verbal behavior and more explicit feedback might help understand and define the mechanisms contributing to public speaking anxiety. Nevertheless, in virtual as in real scenarios, public speaking is not a one-way but a bidirectional exchange of communication between the speaker and the audience; the orator can overcome the positive or negative feedback of the listeners (Slater et al., 1999). Notably, the specific effects of audience feedback can vary depending on individual differences and situational factors. Some speakers may be more sensitive to feedback, while others may be more resilient. In this context, psychological traits related to anxiety may be essential to consider when investigating public speaking anxiety. Indeed, previous studies have shown that individual anxiety traits could be a predictive factor influencing the outcomes of experimental manipulations on public speaking performance, with a positive correlation between social anxiety traits and experienced anxiety before and during the speech (Cornwell et al., 2006; Witt et al., 2006).

The present work aims to explore the interplay between different audience behaviors (verbal and non-verbal) and the individual anxiety traits of the speakers in inducing public speaking anxiety. We assigned each participant to one of four experimental groups characterized by different combinations of the audience’s non-verbal behavior (interested vs. uninterested) and explicit audience feedback following the talk (positive questions vs. negative questions). The measurements in this study included self-report anxiety questionnaires, the perceived anxiety after the performance, sense of presence, and audience perception. Moreover, physiological activation (heart rate and skin conductance response) was recorded before and during the task. We expected a difference in perceived anxiety and physiological arousal between the supportive and the hostile audience, with a more pronounced effect for the verbal than for the non-verbal manipulation. This hypothesis is supported by the fact that the nonverbal component may be less clearly inferred than the verbal one; some people might not be able to use facial expressions as evidence of the audience’s judgment (Kang et al., 2016). Also, nonverbally neutral, static audience scenarios are less effective than positive or negative, more interactive audiences (Pertaub et al., 2001). Specifically, based on previous studies, we hypothesized that negative verbal behavior would induce a higher level of perceived anxiety and increased arousal. In contrast, positive feedback (in both verbal and non-verbal features) should ease perceived anxiety during and after the performance. Moreover, hostile non-verbal feedback might amplify the effect of negative verbal behavior compared to hybrid conditions, in which verbal and non-verbal behaviors do not align.
Given the lack of previous studies investigating the relationship between the audience’s verbal and non-verbal components in inducing anxiety during public speaking, this experiment is, in part, exploratory in this specific respect.

2 Methods

2.1 Participants

The participants were recruited by self-enrollment through the University website. University students were given course credit for participating. The sample size for the study was comparable with previous research with similar design and measurements (Kroczek and Mühlberger, 2023). Eighty healthy participants (mean age = 25.16 years, SD = 6.48, age range = 18–58, 48 female) participated in the experiment. A sensitivity power analysis with eighty participants, power (1 − β) = 0.80, and alpha = 0.05 indicated that the design allows detection of a small-to-medium effect size (f = 0.3). Thirty-two participants had previous experience with Virtual Reality, while forty-eight had not. They gave written informed consent to participate in the study, which was approved by the Ethical Committee of the University of Milano-Bicocca and conducted following the standards of the Helsinki Declaration.
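To make the reported effect size concrete, the following minimal Python sketch converts Cohen’s f into the proportion of explained variance (eta squared) and into the noncentrality parameter of the F test on which power computations are based. The two formulas are the standard conversions; the function names are illustrative, and the sketch is not the authors’ analysis code (the analyses reported here were run in R).

```python
def cohen_f_to_eta_squared(f: float) -> float:
    """Convert Cohen's f to eta squared: eta^2 = f^2 / (1 + f^2)."""
    return f ** 2 / (1.0 + f ** 2)


def noncentrality(f: float, n_total: int) -> float:
    """Noncentrality parameter of the ANOVA F test: lambda = f^2 * N."""
    return f ** 2 * n_total


# Values taken from the text: f = 0.3 and N = 80 participants.
f = 0.3
n_total = 80

eta_sq = cohen_f_to_eta_squared(f)
lam = noncentrality(f, n_total)

print(f"eta squared = {eta_sq:.4f}, lambda = {lam:.2f}")
```

Under these conversions, f = 0.3 corresponds to roughly 8% of explained variance.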

2.2 Experimental design

The experimental design involved participants performing a virtual reality public speaking task. Participants were randomly assigned to four experimental groups (twenty participants in each group), which differed in the Verbal behavior (positive vs. negative questions) and Nonverbal behavior (positive vs. negative attitudes) expressed by the virtual audience during the performance. This resulted in a 2 × 2 between-subjects design, with Verbal and Nonverbal behavior as the main factors. The nonverbal behavior refers to the first part of the talk, where the audience could express positive (e.g., nodding in agreement, looking the presenter in the eyes, facial expressions of enjoyment and surprise) or negative (e.g., nodding in disagreement, talking to others in the audience, looking away from the presenter) nonverbal behaviors while the participants presented a cooking recipe (rice or cake recipe). The second part of the performance comprised a Question-and-Answer session (Q&A), where the virtual audience asked participants about the recipe. Again, the virtual audience could ask positive questions (e.g., encouraging questions: “It was a very innovative recipe! Will it be difficult to find the ingredients during summer?”) or negative questions (e.g., annoying questions: “I’m not convinced about your recipe, it seems so obvious. Why did you choose to present this recipe?”).

Crossing the Verbal and Nonverbal behaviors expressed by the audience yielded four experimental conditions:

- Positive Attitude–Positive Questions (PA_PQ): The scenario was composed of a virtual audience who expressed positive Nonverbal behavior and raised positive questions.

- Positive Attitude–Negative Questions (PA_NQ): The scenario was composed of a virtual audience who expressed positive Nonverbal behavior but raised negative questions.

- Negative Attitude–Positive Questions (NA_PQ): The scenario was composed of a virtual audience who expressed negative Nonverbal behavior but raised positive questions.

- Negative Attitude–Negative Questions (NA_NQ): The scenario was composed of a virtual audience who expressed negative Nonverbal behavior and raised negative questions.
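The allocation of eighty participants to these four conditions, twenty per group, can be illustrated with a short Python sketch of balanced random assignment. The paper does not describe the randomization procedure that was actually used, so the shuffling approach, the participant identifiers, and the seed below are assumptions made for illustration only.

```python
import random
from collections import Counter
from itertools import product

NONVERBAL = ("PA", "NA")  # positive / negative attitude
VERBAL = ("PQ", "NQ")     # positive / negative questions


def assign_balanced(participant_ids, seed=0):
    """Randomly assign participants to the 2 x 2 conditions with equal group sizes."""
    conditions = [f"{nv}_{v}" for nv, v in product(NONVERBAL, VERBAL)]
    per_group, remainder = divmod(len(participant_ids), len(conditions))
    if remainder:
        raise ValueError("number of participants must be a multiple of 4")
    # One slot per participant, then shuffle the slots (hypothetical procedure).
    slots = conditions * per_group
    rng = random.Random(seed)
    rng.shuffle(slots)
    return dict(zip(participant_ids, slots))


# Hypothetical participant identifiers P01 ... P80.
allocation = assign_balanced([f"P{i:02d}" for i in range(1, 81)])
group_sizes = Counter(allocation.values())
print(sorted(group_sizes.items()))
```

Shuffling a pre-built list of condition slots guarantees equal group sizes, which independent coin flips per participant would not.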

Figure 1 shows a graphical representation of the experimental design used in this study. The primary outcome measure was the participants’ perceived anxiety after the talk, depending on the verbal and non-verbal behaviors of the audience. Secondary analyses on perceived anxiety were performed considering the anxiety questionnaire scores. Additionally, physiological activity during the performance (skin conductance activity and heart rate) was collected and analyzed. Finally, correlation analyses were conducted between subjective reports, anxiety, and physiological activity. Overall, this design allowed us to investigate how the Verbal and Nonverbal behaviors expressed by the virtual audience during the task impact anxiety and physiological responses during public speaking performances in virtual reality.

Figure 1. Graphical representation of experimental design.

2.3 VR equipment and scenarios

The VR equipment used for the experiment included an Oculus Quest 2 head-mounted display (HMD) with a 1280 × 1440-pixel resolution per eye (refresh rate 80 Hz). The HMD was connected to a computer featuring an Intel Core i7-7800X CPU, 16 GB of RAM, and a GeForce GTX GPU. The four virtual reality environments (VREs) (Figure 2) were developed for this experiment with the Unity graphics engine (https://unity.com/). Human voices were added to the audience characters. Each scenario showed a room where eleven chairs were arranged. Eight avatars, four males and four females, all dressed informally, were sitting in some of these chairs. A virtual projection board hanging on the wall just behind the audience and in front of the speaker showed the sequence of pictures illustrating the preparation of the recipe being presented by the participant. Adobe Fuse was used to create the humanlike avatars. Body movements were selected from the Adobe Mixamo library (https://www.mixamo.com), combined, and blended in Unity using layers and masks. All members of the audience were in a sitting position. Different sitting positions were used to give the audience a natural look. For example, sitting with legs crossed was applied to female avatars, while a more relaxed pose was applied to male avatars. In total, 5 sitting poses were used, 3 for female and 2 for male avatars. To create a natural setting, avatars in the same pose were not seated close to each other. The neutral movements were turning the head left or right, scratching a hand, moving on the chair, breathing, and blinking. The positive and negative gestures were represented by nodding in agreement or disagreement, respectively. Facial expressions were also part of the movements. The positive ones were represented by enjoyment and surprise, while the negative ones by contempt and disgust. The animations on the faces were based on Paul Ekman’s characteristics of the facial expressions of emotions (Ekman, 1999).
Negative non-verbal behavior was conveyed through facial expressions of contempt or disgust, whereas positive non-verbal behavior was represented by expressions of enjoyment or surprise. Unity blend shapes were used to create these animations by altering the faces of the avatars. Once created, the animations were reused by applying them to different avatars. The facial expressions were applied to the avatars in the front rows to increase their visibility to the speaker. All animated body movements were created with the Unity graphics engine. The timeline and the number of gestures were kept the same between the two visual settings. The movements were spread throughout the length of the experiment, with time windows of different lengths between them. The same number of facial expressions and gestures, on the same timeline, was used in both experimental environments, with the negative gestures and facial expressions replaced by positive ones between the versions. In total, 17 facial expressions and 17 body movements were implemented for all avatars throughout the experiment’s duration (e.g., positive non-verbal behavior: nodding in agreement, looking the presenter in the eyes; negative non-verbal behavior: nodding in disagreement, talking to others in the audience, looking away from the presenter). The spread of movements, both facial and body, was random. For the question session, mouth movement was implemented in Unity and added to one of the audience avatars. The animation was done using blend shapes in Unity, the same technique used for the facial expression animations. The mouth movement was synchronized with the duration of the avatar’s speech. Two audio recordings, one negative and one positive version, were embedded into the timeline so that they were reproduced in the same way across experimental sessions.
All animations (body movements, facial expressions, blinking, breathing, speaking) were blended together with Unity’s layers and masks. This allowed an avatar to perform several movements simultaneously and smoothly, e.g., nodding in agreement and blinking. Different speed indices were applied to the breathing and blinking animations of the different avatars to avoid the movements happening at the same time, thereby creating a natural-looking audience.
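The idea of keeping the timeline and the number of gestures identical across the two visual settings, while swapping only their valence, can be sketched as follows. This is an illustrative Python sketch and not the Unity implementation described above: the gesture labels, the 180 s duration, the random spacing, and the seed are all assumptions.

```python
import random

# Hypothetical labels: each abstract slot has a positive and a negative variant.
POSITIVE = {"nod": "nod_agree", "gaze": "look_at_presenter", "face": "enjoyment"}
NEGATIVE = {"nod": "nod_disagree", "gaze": "look_away", "face": "contempt"}


def build_timeline(n_events, duration_s, seed):
    """Draw event times and abstract gesture slots once, independent of valence."""
    rng = random.Random(seed)
    times = sorted(rng.uniform(0.0, duration_s) for _ in range(n_events))
    slots = [rng.choice(sorted(POSITIVE)) for _ in range(n_events)]
    return list(zip(times, slots))


def render(timeline, valence_map):
    """Turn the shared timeline into a concrete schedule for one condition."""
    return [(t, valence_map[slot]) for t, slot in timeline]


timeline = build_timeline(n_events=17, duration_s=180.0, seed=42)
positive_version = render(timeline, POSITIVE)
negative_version = render(timeline, NEGATIVE)

for (t, pos), (_, neg) in zip(positive_version[:3], negative_version[:3]):
    print(f"{t:6.1f} s  {pos:<18} | {neg}")
```

Because both schedules are derived from one shared timeline, the two conditions differ only in the valence of each gesture, not in its timing or frequency.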

Figure 2. Audience during the recipe presentation.

2.4 Measurements

2.4.1 Anxiety questionnaires

STAI (State-Trait Anxiety Inventory): STAI is a well-known anxiety self-assessment scale comprising two subscales measuring transient and enduring anxiety levels (Spielberger et al., 1971). Each scale has 20 items, and both scales include items that describe symptoms of anxiety and items that define the absence of anxiety. We used the Italian version of STAI.

SIAS (Social Interaction Anxiety Scale): SIAS is a self-report questionnaire that measures the presence of fear during general social interactions. The validated Italian version of the questionnaire contains 19 items rated on a 5-point Likert scale, from 0 (not at all) to 4 (extremely) (Heimberg et al., 1992).

SSQ (Simulator Sickness Questionnaire): SSQ is a motion sickness questionnaire that assesses sickness after virtual reality experiences. The version used in this experiment contains 16 items, divided into three categories: nausea, oculomotor problems and disorientation (Kennedy et al., 1993).

2.4.2 Experience evaluation (self-report)

Three self-report statements were used to assess participants’ subjective experiences during the task. Each statement was evaluated using a 10-point Likert scale, ranging from 1 (not at all) to 10 (extremely). The measurements were collected after each experimental session. The focus of Likert assessments concerned:

- Perceived immersion in the virtual environment (I felt like I was inside the environment shown)

- Perceived audience attention (I had the feeling that the audience (e.g., characters) was listening to me)

- Level of anxiety evoked by the task (What level of anxiety did you experience on a scale from 1 to 10)

Furthermore, the participants’ arousal and emotional states following the performance were assessed using the Self-Assessment Manikin (SAM) (Bradley and Lang, 1994). They were instructed to express their arousal and emotional state by choosing one of nine manikins that depicted varying levels of emotional (facial expression) and arousal (body) states.

2.4.3 Physiological measurements

Electrodermal activity (EDA): The phasic level of skin conductance is a highly suitable marker of sympathetic nervous system activation (Turpin et al., 2009). EDA is a well-known marker of anxiety during public speaking (Giesen and McGlynn, 1997; Arsalan and Majid, 2021). For example, Croft and colleagues used EDA values to predict state-dependent speech anxiety in a student sample during a public speaking task (Croft et al., 2004). Our EDA measurement during public speaking focused on the phasic (fast) changes of electrodermal activity. In particular, for each exposure, we used the non-specific skin conductance response (NS-SCR) as an index of electrodermal activity: the NS-SCR reflects the frequency of phasic electrodermal responses that occur spontaneously (in a fixed time window), unrelated to external stimuli (Nikula, 1991; Gertler et al., 2020). This measure has previously been used to measure fear-induced arousal during public speaking situations (Niles et al., 2015). In the present study, a Biopac BioNomadix MP 150 device recorded the electrodermal signal through two AgCl electrodes attached to the participant’s index and ring fingers. The data were then processed using the MATLAB-based software Ledalab (version 3.4.8), adopting a continuous decomposition approach (Benedek and Kaernbach, 2010). The signal was recorded at 100 Hz and downsampled to 10 Hz for the analysis. We extracted one measure of interest for each exposure, which lasted approximately 6 min: the mean amplitude of the non-specific skin conductance responses (NS-SCRs) that exceeded 0.03 µS, which was used as the dependent variable in the analyses.
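The two numerical steps named above, downsampling from 100 Hz to 10 Hz and averaging the response amplitudes above the 0.03 µS threshold, can be illustrated with a simplified Python sketch. It does not reproduce Ledalab’s continuous decomposition: the response amplitudes are assumed to be already available, block averaging is only one possible downsampling method, and all function names and example values are hypothetical.

```python
from statistics import mean


def downsample_mean(signal, factor):
    """Downsample by averaging consecutive blocks of `factor` samples."""
    n_blocks = len(signal) // factor
    return [mean(signal[i * factor:(i + 1) * factor]) for i in range(n_blocks)]


def mean_nsscr_amplitude(amplitudes_us, threshold_us=0.03):
    """Mean amplitude (in microsiemens) of responses exceeding the threshold.

    Returns None when no response exceeds the threshold.
    """
    kept = [a for a in amplitudes_us if a > threshold_us]
    return mean(kept) if kept else None


# Toy signal: 20 samples "recorded" at 100 Hz, reduced to 10 Hz (factor 10).
raw_100hz = [float(i) for i in range(20)]
signal_10hz = downsample_mean(raw_100hz, factor=10)

# Made-up response amplitudes, as a decomposition step might return them.
amplitudes = [0.01, 0.05, 0.10, 0.02]
nsscr = mean_nsscr_amplitude(amplitudes)

print(signal_10hz, nsscr)
```

In this toy example only the 0.05 µS and 0.10 µS responses pass the threshold, so the two sub-threshold deflections do not dilute the mean amplitude.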

Heart rate (HR): Heart rate is a standard measurement for evaluating physiological stress in public speaking situations (Slater et al., 2006; Owens and Beidel, 2015; Takac et al., 2019). We measured beats per minute (bpm) using a ProComp Infiniti 5 device through a Blood Volume Pulse (BVP) sensor attached to the middle finger. Since HR indicates participants’ stress-related sympathetic activity, elevated HR values are markers of heightened physiological arousal.
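As an illustration of how beats per minute relate to a pulse signal, the following Python sketch derives heart rate from the times of successive pulse peaks. In the study this computation is performed by the recording equipment; the sketch assumes that peak times have already been detected, and the function name and example values are hypothetical.

```python
def bpm_from_peak_times(peak_times_s):
    """Heart rate in beats per minute from pulse peak times given in seconds."""
    if len(peak_times_s) < 2:
        raise ValueError("at least two peaks are needed")
    # Inter-beat intervals: time between consecutive peaks.
    intervals = [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return 60.0 / mean_interval


# Made-up example: one pulse peak every 0.8 s.
peaks = [0.0, 0.8, 1.6, 2.4, 3.2]
heart_rate = bpm_from_peak_times(peaks)
print(round(heart_rate, 1))
```

A mean inter-beat interval of 0.8 s corresponds to 60 / 0.8 = 75 bpm.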

2.5 Procedures

The participants were asked to sign the informed consent form and complete online self-report questionnaires assessing anxiety, interoception, and previous VR experience. Then, participants were comfortably seated in a silent room and asked to wear the respiration sensor around their chest, and the EDA and BVP sensors were attached to their left hand. A 3-min preliminary recording of psychophysiological data was made (Baseline phase). The experimenter explained to the participants that they would have to present a recipe to the VR audience and that images of the recipe, displayed behind the audience, would help them during the task. Participants then read and memorized the steps of a recipe (rice or pie) as best they could in about 5 min. When they were ready, participants put on the head-mounted display (HMD) and started the public speaking task (PST), which consisted of explaining the recipe they had read to the avatars while psychophysiological data were recorded (Figure 3).

Figure 3. Participants’ posture and position during the public speaking task.

In the first part of the task, the audience remained silent, listening to the participant’s speech and showing a positive or negative non-verbal attitude. Specifically, in the positive scenario, the audience was quiet, nodding, and attentive, while in the negative scenario, the audience was inattentive, shaking their heads, rolling their eyes, or snorting. At the end of the 3-min presentation, the avatars applauded the speaker. Then, the Questions & Answers (Q&A) session started, in which the audience automatically asked questions about the recipe, and participants had 45 s to answer each one. The four rice recipe questions were positive (e.g., “Thanks for your presentation; I like your recipe so much … I do not like mushrooms. Can I substitute them with something else?”; “I always found it very difficult to cut the pumpkin, but your methods seem very efficient. Could you repeat this?”), while the pie recipe ones were negative (e.g., “Cinnamon is disgusting, why would I ever mess this recipe with it?”, “Honestly, I too have my doubts, are you sure lemon is an ingredient … my grandmother never prepared the recipe with lemon”).

The speaking task lasted 6 min; after that, the participants removed the HMD and underwent a 3-min resting recording of psychophysiological data (Rest phase).

At the end of the experiment, participants completed 10-point Likert scales to report their perceived sense of presence, the audience’s interest, anxiety, and the pleasantness of the experience; the SAM to report their perceived level of arousal and emotional state; and the SSQ to check for sickness after the virtual reality experience (Figure 4).

Figure 4. Sketch of the experimental procedure. Participants filled in preliminary questionnaires. In the following baseline phase, psychophysiological data were recorded (3 min). Then, participants read the recipe steps. During the Public Speaking Task, participants explained the recipe and answered the audience’s questions while psychophysiological data were recorded (6 min). At the end of the task, psychophysiological data were recorded in the Rest phase (3 min). Then, SAMs, Likert scales, and SSQ were filled in.

2.6 Data analysis

Statistical analyses were performed with R (www.r-project.org). First, we performed a correlation analysis (Spearman’s rank correlation) to identify possible correlations between the anxiety questionnaire scores and perceived anxiety after the public speaking performance. The same approach was used to correlate the physiological measurements with perceived anxiety after the performance and with the anxiety questionnaire scores.
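For readers unfamiliar with the statistic, Spearman’s rank correlation is the Pearson correlation computed on the ranks of the two variables, with tied values receiving their average rank. The following Python sketch illustrates the computation on made-up scores; it is not the authors’ R code, and the variable names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up questionnaire scores and anxiety ratings for five participants.
sias_scores = [12, 20, 27, 33, 41]
perceived_anxiety = [2, 4, 5, 4, 5]
rho = spearman(sias_scores, perceived_anxiety)
print(round(rho, 3))
```

Working on ranks makes the coefficient sensitive to monotonic rather than strictly linear relationships, which suits ordinal Likert ratings.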

Then, self-report measurements (Likert scales) were analyzed using a 2 × 2 between-subjects ANOVA with Verbal behavior (positive vs. negative questions) and Non-verbal behavior (positive vs. negative attitude) as main factors. ANOVA is robust to the violations of normality that arise from using Likert scales, as indicated by Norman (2010) and supported by Higgins et al. (2022) and Girondini et al. (2023b) within public speaking investigations. A secondary analysis explored any plausible relationship between the experimental manipulation, self-report anxiety scores, and perceived anxiety. For skin conductance (NS-SCR), we extracted each participant’s mean value as the dependent variable; raw data were normalized using a log-transformation. For heart rate (HR), the average values were first corrected by baseline subtraction. Both skin conductance and HR measurements were then analyzed using a mixed ANOVA with two main factors: verbal behavior (positive vs. negative questions) and non-verbal behavior (positive vs. negative attitude).
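The logic of a 2 × 2 between-subjects ANOVA with equal group sizes can be sketched in a few lines of Python: the total variability is split into the two main effects, their interaction, and the within-group error, and each effect is tested through an F ratio. The sketch below is illustrative only; it uses made-up scores, returns F ratios without p-values (the Python standard library has no F distribution), and is not the authors’ R analysis.

```python
from statistics import mean


def two_way_between_anova(cells):
    """F ratios for a balanced two-way between-subjects design.

    `cells` maps (level_a, level_b) to the list of scores of that group.
    """
    sizes = {len(scores) for scores in cells.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch only handles equal group sizes")
    n = sizes.pop()
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})

    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    grand = mean(list(cell_mean.values()))  # valid because groups are balanced
    a_mean = {a: mean([cell_mean[(a, b)] for b in b_levels]) for a in a_levels}
    b_mean = {b: mean([cell_mean[(a, b)] for a in a_levels]) for b in b_levels}

    # Sums of squares for the main effects, the interaction, and the error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_within = sum(
        (x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores
    )

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_within = len(cells) * (n - 1)
    ms_within = ss_within / df_within
    return {
        "F_nonverbal": (ss_a / df_a) / ms_within,
        "F_verbal": (ss_b / df_b) / ms_within,
        "F_interaction": (ss_ab / (df_a * df_b)) / ms_within,
        "df_within": df_within,
    }


# Made-up scores, three per group, keyed by (non-verbal, verbal) condition.
toy_scores = {
    ("PA", "PQ"): [1.0, 2.0, 3.0],
    ("PA", "NQ"): [2.0, 3.0, 4.0],
    ("NA", "PQ"): [3.0, 4.0, 5.0],
    ("NA", "NQ"): [6.0, 7.0, 8.0],
}
result = two_way_between_anova(toy_scores)
print(result)
```

With twenty participants per group, as in the present design, the error term of such an analysis has 4 × (20 − 1) = 76 degrees of freedom.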

3 Results

3.1 Descriptive results: anxiety scale and VR sickness

The average anxiety scores revealed normative values, confirming that our participants belonged to a nonpathological sample: STAI-S (M = 38.04, SD = 8.67), STAI-T (M = 45.90, SD = 9.56), SIAS (M = 26.70, SD = 11.10). The internal reliability for the anxiety self-report questionnaires was 0.74 (ω). The values of each experimental group are presented in Table 1. SSQ questionnaires were administered after the VR exposure to detect the possible presence of sickness due to the device used. The SSQ scores revealed a low level of sickness (M = 3.13, SD = 2.68, raw values), suggesting that the participants tolerated the VR public speaking experience well.

Table 1. Values of each experimental group on the SIAS, STAI-S, and STAI-T questionnaires.

3.2 Correlation analysis between self-report anxiety questionnaires and perceived anxiety

The scatterplot with Spearman correlations is presented in Figure 5. The p-value of each correlation is presented in Table 2. Positive correlations among the anxiety questionnaires (SIAS, STAI-T, and STAI-S) were found. Only the SIAS score positively correlated with perceived anxiety (r = 0.38, p < 0.001). Perceived immersion in the virtual environment negatively correlated with STAI-T (r = −0.27, p = 0.018) and SIAS (r = −0.29, p = 0.012), but positively correlated with perceived audience attention (r = 0.61, p < 0.005). SAM arousal positively correlated with SIAS (r = 0.23, p = 0.049) and perceived anxiety (r = 0.51, p < 0.001). SAM emotion negatively correlated with STAI-S (r = −0.38, p < 0.001) and SIAS (r = −0.44, p < 0.001) and positively correlated with perceived audience attention (r = 0.26, p = 0.023) and perceived immersion in the virtual environment (r = 0.36, p = 0.001). Notably, no correlation was found between self-report measurements and skin conductance activity.

Figure 5. Scatterplot with Spearman Correlation for self-report measurements and skin conductance activity.

Table 2. Table of correlation matrix p-values.

3.3 Self-report measurements

Perceived immersion in virtual reality: No main effect of Verbal behavior [F _(1,76) = 1.28, p = 0.261] or Nonverbal behavior [F _(1,76) = 0.49, p = 0.485] was found with respect to the perceived immersion in virtual reality. The interaction effect Verbal behavior * Nonverbal behavior was non-significant [F _(1,76) = 1.53, p = 0.219].

Perceived audience attention: No main effect of Verbal behavior [F _(1,76) = 1.12, p = 0.292] or Nonverbal behavior [F _(1,76) < 0.01, p = 0.995] was found for the perceived audience attention. The interaction effect Verbal behavior * Nonverbal behavior showed a non-significant trend [F _(1,76) = 2.98, p = 0.087]. The interaction plot (Figure 6) shows the source of this trend. Specifically, for negative Nonverbal behavior (negative attitude during the speech), the audience’s evaluation depended on the audience’s verbal behavior during Q&A. That is, participants who experienced negative nonverbal behavior coupled with positive questions judged the audience to be more attentive to the speech (M = 7.15, SD = 2.08) than participants for whom the negative attitude was followed by negative questions (M = 5.9, SD = 1.71). Regarding positive Nonverbal behavior, the perceived audience attention was comparable after positive questions (M = 6.65, SD = 2.03) and negative questions (M = 6.35, SD = 2.16).

Figure 6. Interaction Verbal behavior * Nonverbal behavior on perceived audience attention.

Perceived anxiety after the performance: No main effect of Verbal behavior [F _(1,76) = 0.05, p = 0.817] or Nonverbal behavior [F _(1,76) = 0.66, p = 0.419] was found in perceived anxiety after the performance. The interaction effect Verbal behavior * Nonverbal behavior was non-significant [F _(1,76) = 1.34, p = 0.250]. Because no significant effects were observed when considering only the two factors, a second analysis was conducted that included anxiety questionnaire scores as a covariate. This secondary analysis considered three models, each including one of the anxiety questionnaire scores (SIAS, STAI-T, or STAI-S) as a covariate. We employed the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to select the most appropriate model. The model with SIAS as the covariate showed the best fit, with lower AIC and BIC values than the models including the other scales (DV ∼ verbal behavior * nonverbal behavior * SIAS: AIC = 327.744, BIC = 346.182; DV ∼ verbal behavior * nonverbal behavior * STAI-S: AIC = 337.679, BIC = 359.117; DV ∼ verbal behavior * nonverbal behavior * STAI-T: AIC = 335.233, BIC = 356.671).
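The selection rule behind this comparison is simply "lower is better": both criteria penalize the model's lack of fit plus its number of parameters (AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, with k parameters, n observations, and maximized likelihood L). The sketch below applies that rule to the AIC and BIC values reported above; the helper names `best_covariate` and `delta_to_best` are hypothetical and are not part of the authors' analysis scripts.

```python
# AIC and BIC values as reported in the text for the three covariate models.
reported_fit = {
    "SIAS": {"AIC": 327.744, "BIC": 346.182},
    "STAI-S": {"AIC": 337.679, "BIC": 359.117},
    "STAI-T": {"AIC": 335.233, "BIC": 356.671},
}


def best_covariate(fit, criterion):
    """Return the covariate whose model has the lowest value of `criterion`."""
    return min(fit, key=lambda name: fit[name][criterion])


def delta_to_best(fit, criterion):
    """Difference between each model's criterion and the best (lowest) one."""
    best = min(values[criterion] for values in fit.values())
    return {name: round(values[criterion] - best, 3)
            for name, values in fit.items()}


for criterion in ("AIC", "BIC"):
    print(criterion, best_covariate(reported_fit, criterion),
          delta_to_best(reported_fit, criterion))
```

Reporting the differences to the best model, rather than the raw values alone, makes it easier to see how far the STAI-based models fall behind the SIAS model on each criterion.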

Mixed model ANOVA with SIAS as a covariate revealed no main effect of Verbal behavior [F _(1,76) = 0.06, p = 0.797] and no main effect of Nonverbal behavior [F _(1,76) = 0.81, p = 0.369] with respect to the perceived anxiety. The SIAS score had a significant effect on perceived anxiety [F = 12.82, p < 0.001]. The interaction Verbal behavior * Nonverbal behavior was not significant [F _(1,76) = 2.01, p = 0.160]. However, a significant Verbal behavior * SIAS interaction was found [F _(1,76) = 4.45, p = 0.038].

To investigate the interaction effect, we performed a post hoc analysis using independent t-tests with Bonferroni correction, where the adjusted significance threshold was set to 0.0125 (0.05 divided by 4) and participants were divided into two groups (high vs. low Social Interaction Anxiety Scale (SIAS) scores) using a median split. The post hoc t-test showed no significant difference in perceived anxiety between high and low anxiety traits under positive verbal behavior (t = 0.26, df = 37.56, p = 0.791). The contrast between positive and negative verbal behavior was non-significant both in high SIAS participants (t = 1.95, df = 35.37, p = 0.058) and in low SIAS participants (t = −1.16, df = 39.91, p = 0.251). However, the post hoc comparison of perceived anxiety between high and low SIAS traits was significant under the negative verbal behavior condition (t = −3.28, df = 37.81, p = 0.002). The interaction effect is presented in Figure 7. That is, participants with high SIAS scores exposed to negative questions reported more anxiety (M = 5.83, SD = 1.72) compared to participants exposed to the same scenario but with low SIAS scores (M = 3.91, SD = 1.97). In comparison, the SIAS score did not impact perceived anxiety in the case of positive verbal behavior (high SIAS: M = 4.75, SD = 1.68; low SIAS: M = 4.6, SD = 1.88). No Nonverbal behavior * SIAS interaction was found [F _(1,76) < 0.01, p = 0.994].
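The post hoc logic can be illustrated from the summary statistics alone. The fractional degrees of freedom reported above are consistent with an unequal-variance (Welch) t-test, although the text does not name the variant, so that choice is an assumption here. The group sizes of 20 per group are also an assumption made only for illustration, since the median-split group sizes are not reported; `welch_t` is a hypothetical helper name.

```python
from math import sqrt


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Bonferroni: divide the nominal alpha by the number of planned comparisons.
ALPHA = 0.05
N_COMPARISONS = 4
bonferroni_alpha = ALPHA / N_COMPARISONS

# Low vs. high SIAS under negative verbal behavior (means and SDs from the
# text; n = 20 per group is an ASSUMED value, not reported in the source).
t_stat, df = welch_t(3.91, 1.97, 20, 5.83, 1.72, 20)
print(f"t = {t_stat:.2f}, df = {df:.2f}, threshold = {bonferroni_alpha}")
```

Under these assumed group sizes the statistic comes out close to the reported t = −3.28, while the degrees of freedom differ slightly from the reported 37.81, as expected if the actual groups were not exactly equal in size. A comparison counts as significant only if its p-value falls below the corrected threshold of 0.0125.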

Figure 7. Interaction Verbal behavior * SIAS traits on perceived anxiety.

Notably, the three-way interaction Verbal behavior * Nonverbal behavior * SIAS was marginally significant [F _(1,76) = 4.17, p = 0.044]. Again, the source of the three-way interaction was the difference in perceived anxiety depending on the SIAS score within each combination of Verbal and Nonverbal behavior (Figure 8). Specifically, participants with higher SIAS scores exposed to both negative verbal and non-verbal behavior reported more anxiety (M = 6.12, SD = 1.96) compared to low SIAS participants (M = 4.58, SD = 2.19) exposed to the same scenario. Similarly, participants higher in SIAS exposed to positive non-verbal behavior but negative questions reported more anxiety (M = 5.6, SD = 1.58) compared to low SIAS participants exposed to the same scenario (M = 3.1, SD = 1.37). When the audience expressed positive verbal behavior, the perceived anxiety was comparable across positive and negative nonverbal behavior and across high and low SIAS scores (negative nonverbal behavior and high SIAS score, M = 4.5, SD = 1.78; negative nonverbal behavior and low SIAS score, M = 4.7, SD = 2.31; positive nonverbal behavior and high SIAS score, M = 5, SD = 1.63; positive nonverbal behavior and low SIAS score, M = 4.5, SD = 1.43). However, none of the comparisons were significant after post hoc correction (all p n.s.).

Figure 8. Three-way interaction Verbal behavior * Nonverbal behavior * SIAS traits on perceived anxiety.

3.4 Physiological results

Heart rate (HR): No main effects of Verbal behavior [F _(1,76) = 2.24, p = 0.138] and Nonverbal behavior [F _(1,76) = 0.23, p = 0.630] were found concerning the HR (baseline corrected). The interaction effect Verbal behavior * Nonverbal behavior resulted non-significant [F _(1,76) = 0.91, p = 0.647]. The interaction effect Verbal behavior * Nonverbal behavior resulted non-significant [F _(1,76) = 0.21, p = 0.342].

Non-specific skin conductance response (NS-SCR): Six participants (PA-PQ = 3, PA-NQ = 1, NA-PQ = 2) were excluded from the analysis because they showed no skin conductance response during the public speaking task. The analysis was therefore performed on the remaining 74 participants. A main effect of Verbal behavior [F _(1,70) = 5.52, p = 0.021] was found (Figure 9). Participants exposed to an audience with positive verbal behavior presented higher amplitude (M = 0.541, SD = 0.343) compared to the negative condition (M = 0.377, SD = 0.255). The main effect of Nonverbal behavior [F _(1,70) = 0.29, p = 0.682] and the interaction effect Verbal behavior * Nonverbal behavior [F _(1,70) = 3.51, p = 0.065] were non-significant.

Figure 9. The effect of Verbal behavior on NS-SCR in the negative and positive exposure conditions.

4 Discussion

The present study investigated the interplay of verbal and non-verbal behavior of virtual audiences in inducing public speaking anxiety using virtual reality. Participants were distributed into four experimental groups in which the virtual audience exhibited positive or negative non-verbal attitudes while the participants presented the topic of the speech. During the second part of the task, the same virtual audience engaged in a Q&A session with the participants, asking supportive or adverse questions. The study included self-report measurements of the virtual experience, perceived anxiety, and physiological activation (skin conductance) during the performance.

First, no difference in the perceived immersion inside the virtual environment was found across different conditions. This result is not surprising, given that the graphic and acoustic elements used for the favorable and hostile audiences were similar among the scenarios. Moreover, participants did not report a clear difference in audience perception between positive and negative non-verbal behavior. Participants’ self-report measurement revealed only a trend toward an impact of non-verbal behavior on the evaluation of the audience’s interest during the speech. This might suggest that our participants did not entirely capture the features used to manipulate the audience’s attitude, perhaps due to the difficulty of mimicking implicit and subtle audience features in virtual reality (Kroczek and Mühlberger, 2023). This lack of distinction might reflect the difficulty of reproducing a more implicit component of social interaction, such as non-verbal behavior. Another explanation is that including pictures behind the agents might have distracted the participants from the audience, letting the speaker focus more on the speech. Including further measurements, such as eye-tracking, might help to clarify whether participants paid attention to the audience during the exposure.

The analysis of the perceived anxiety did not reveal main effects of verbal and non-verbal behavior, nor an interaction effect. This could be because our participants may have failed to fully capture the manipulated features of the audience’s attitude, which would have prevented us from detecting an effect of the assigned scenario on perceived anxiety. However, the absence of any effect of the experimental manipulation could also be ascribed to variations in susceptibility to public speaking anxiety between individuals with high versus low anxiety traits. Indeed, a supplementary analysis was conducted using anxiety scale scores as covariates. Notably, the Social Interaction Anxiety Scale (SIAS), which focuses on anxiety in social interactions, explained more variance in the data than more general anxiety scales such as the State-Trait Anxiety Inventory (STAI). This is further substantiated by the observed positive correlation between SIAS scores and reported anxiety post-performance, contrasting with the absence of such a correlation for the STAI scales. As one might expect, participants higher on the SIAS scale reported more anxiety compared to the low SIAS score participants. Notably, the SIAS score also influenced the impact of the negative verbal behavior of the audience during the speech on perceived anxiety. Indeed, participants with high SIAS scores reported much more anxiety compared to participants exposed to the same scenario but characterized by low SIAS scores. Similarly, the three-way interaction showed a higher level of anxiety for participants with high SIAS scores in the scenario with both negative verbal and nonverbal behavior. This suggests that a hostile and adverse audience can significantly impact anxious speakers, particularly those with higher SIAS scores, who may also experience more pronounced anxiety, possibly due to a specific negative belief system (Beidel et al., 1985). According to Rokeach (1960), a belief system constitutes the array of convictions, values, and perspectives that influence how individuals interpret the world and their experiences. So, during our task, participants with high SIAS levels might have appraised the experience as more anxiety-provoking by perceiving greater threats or risks than were actually present. Indeed, our findings align with previous research on public speaking and anxiety traits. For instance, Perowne and Mansell (2002) found that, compared to participants with low trait anxiety scores, participants with higher anxiety scores were more likely to perceive their performance as worse.

Considering the physiological arousal during the speech, participants engaged in a positive and encouraging Q&A session showed increased skin conductance activity. Our finding might appear to contradict the classical idea of physiological arousal as a marker of stress and anxiety (Jacobs et al., 1994). However, it is essential to note that skin conductance, and physiological arousal in general, reflect the activity of the sympathetic nervous system without any emotional valence (i.e., excitement or stress or fear). Indeed, physiological arousal indicates how exciting an emotional experience is (Kensinger, 2004) but does not explain the valence of the experience: the aroused state feels no different from one type of emotion to another (i.e., joy, anger, passion, anxiety). Moreover, skin conductance might also reflect attention and engagement (Frith and Allen, 1983). The physiological activity should also be interpreted considering the self-report experience and the context of the experimental manipulation. In a previous study, our research group (Frigione et al., 2022) demonstrated that skin conductance activity can take on different meanings depending on the context to which the participant was exposed. This is particularly true for VR experiments involving participants in immersive and realistic experiences. Indeed, in the present study, it is more plausible that the increased skin conductance activity reflected the “engagement” of the participants in a pleasant and interactive exchange with the virtual audience. This interpretation seems supported by the lack of correlation between the skin conductance activity and the perceived anxiety reported by the participants. Future investigations are needed to clarify the meaning of physiological measurement in a public speaking context and integrate it with what the participants report from the experience (implicit-explicit measurement comparison). In this context, the arousal experienced during the public speaking task may increase goal engagement in the speaker, progress toward goals (i.e., completing the speech), and self-efficacy (Carver and Scheier, 1998; Pavett, 2016).

Limitations of the study

One limitation of this study is the graphic similarity among the scenarios, which could have weakened the impact of the avatars’ nonverbal gestures; these should have been more clearly distinguishable as positive or negative. Moreover, the interaction between audience and speakers was based on pre-recorded questions. In the future, the use of artificial intelligence may overcome these limits and allow online interaction. A second main limitation is the lack of deeper investigation of the sense of presence: we limited the investigation to perceived immersion in the virtual environment, measured with a Likert scale. Indeed, the lack of Italian-validated questionnaires prevented us from using standardized presence scores. However, it is important to note that sense of presence was not a primary outcome in our study, since graphical features of the environment were comparable across experimental conditions. In general, using an HMD allowed participants to live a virtual experience. Nevertheless, the absence of any sensory feedback limited the sense of immersion and emotional engagement (Montana et al., 2020). This circumstance could have affected the participants’ responses. Another important consideration concerns the sample, which was mainly composed of university students. Future investigations could confirm these findings in a larger and more heterogeneous population.

Conclusions

To summarize, this study explored the impact of the audience’s verbal and non-verbal behavior in inducing public speaking anxiety during immersive exposure (VR). The main finding is that negative attitudes expressed by a virtual audience affect perceived anxiety differently in individuals with high vs. low anxiety traits. Moreover, participants involved in a pleasant Q&A session (verbal behavior) showed increased physiological activity, which might reflect engagement during the performance.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://osf.io/hnwbf/.

Ethics statement

The studies involving humans were approved by the University of Milano Bicocca, Department of Psychology. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any identifiable images or data included in this article.

Author contributions

MG: Conceptualization, Data curation, Formal Analysis, Methodology, Supervision, Writing–original draft, Investigation. IF: Conceptualization, Data curation, Investigation, Writing–original draft. MM: Data curation, Investigation, Writing–original draft. MS: Methodology, Software, Writing–review and editing. MP: Project administration, Writing–review and editing. AM: Resources, Writing–review and editing. AG: Conceptualization, Project administration, Resources, Writing–review and editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Arsalan, A., and Majid, M. (2021). Human stress classification during public speaking using physiological signals. Comput. Biol. Med. 133, 104377. doi:10.1016/j.compbiomed.2021.104377

Beidel, D. C., Turner, S. M., and Dancu, C. V. (1985). Physiological, cognitive, and behavioral aspects of social anxiety. Behav. Res. Ther. 23, 109–117. doi:10.1016/0005-7967(85)90019-1

Benedek, M., and Kaernbach, C. (2010). A continuous measure of phasic electrodermal activity. J. Neurosci. Methods 190 (1), 80–91.

Blascovich, J. (2002). “A theoretical model of social influence for increasing the utility of collaborative virtual environments,” in Proceedings of the 4th international conference on collaborative virtual environments (CVE '02) (New York, NY, USA: Association for Computing Machinery), 25–30. doi:10.1145/571878.571883