Gender Stereotypes in Student Evaluations of Teaching

Renström, Emma A.; Gustafsson Sendén, Marie; Lindqvist, Anna

doi:10.3389/feduc.2020.571287

ORIGINAL RESEARCH article

Front. Educ., 11 January 2021

Sec. Teacher Education

Volume 5 - 2020 | https://doi.org/10.3389/feduc.2020.571287

This article is part of the Research Topic Gender Equality and Women's Empowerment in Education


Emma A. Renström¹*

Marie Gustafsson Sendén²

Anna Lindqvist³

¹Department of Psychology, University of Gothenburg, Gothenburg, Sweden
²Department of Psychology, Stockholm University, Stockholm, Sweden
³Department of Psychology, Lund University, Lund, Sweden

This paper tests how gender stereotypes may result in biased student evaluations of teaching (SET). We thereby contribute to an ongoing discussion about the validity and use of SET in academia. According to social psychological theory, gender biases in SET may occur because of a lack of fit between gender stereotypes and the professional roles individuals engage in. A lack of fit often leads to more negative evaluations. Given that the role of lecturer is associated with masculinity, women might suffer from biased SET because gender stereotypes indicate that they do not fit this role. In two 2 × 2 between-groups online experiments (N's = 400 and 452), participants read about a fictitious woman or man lecturer, described in terms of stereotypically feminine or masculine behavior, and evaluated the lecturer on different SET outcomes. Results showed that women lecturers were not disfavored in general, but that the described feminine or masculine behaviors led to gendered evaluations of the lecturer. The results were especially pronounced in Experiment 2, where a lecturer described as displaying feminine behaviors was expected to be more approachable, was better liked, and was a lecturer whose course participants would rather attend. However, a lecturer displaying masculine behaviors was instead perceived as more competent and as a better pedagogue and leader. Gender-incongruent behavior was therefore not sanctioned by lower SET. The results nevertheless support that SET should not be used as the sole indicator of a lecturer's pedagogic ability in promotion and hiring decisions, because they may be gender-biased.

Introduction

The purpose of this article was to test the impact of gender stereotypes on student evaluations of teaching (SET) in two online social psychological experiments. Previous research in this field indicates a gender bias in SET in which women generally receive lower SET than men (e.g., MacNell et al., 2015; Boring, 2016; Mengel et al., 2018; Mitchell and Martin, 2018; Fan et al., 2019). With this article, we contribute to an ongoing discussion about the use of SET, both as formative and as summative evaluations of teaching and teachers. We provide new insights into the mechanisms behind SET and how they relate to a lecturer's gender identity and gendered behavior.

Taking a social psychological perspective, gender biases may occur because gender stereotypes prescribe and proscribe certain behaviors for individuals of different genders. Specifically, when gender stereotypes and professional roles do not fit, the individual can be sanctioned with negative evaluations (Heilman, 2001; Heilman and Chen, 2005; Heilman and Haynes, 2005). In this article, we test to what extent women lecturers in higher education are sanctioned with low SET due to a tradeoff between the behaviors expected in the supposedly masculine-coded role of university lecturer and the stereotypes about how women should and should not be.

Student Evaluations of Teaching

Originally, SET were introduced for formative purposes; that is, the evaluations were meant to improve and shape the quality of teaching (Hornstein, 2016). Since then, SET have become a primary basis for summative evaluations of a lecturer's performance. That is, SET are used as an overall summary of pedagogical competence, often as its sole indicator (Berk, 2005; Galbraith et al., 2012; Spooren et al., 2013). SET are now often used in promotion and hiring decisions (Cashin, 1999; Seldin, 1999; Clayson, 2009; Davis, 2009; Seldin et al., 2010), which makes it important to understand systematic variations in SET.

SET were first criticized by Adams (1997), who pointed out several flaws concerning validity, reliability, and gender bias, along with a number of related issues (Yunker and Yunker, 2003; Wright, 2006; Beecham, 2009; Hoefer et al., 2012; Spooren et al., 2013; Braga et al., 2014; Stark and Freishtat, 2014; Boring et al., 2016). It has been suggested that SET mainly reflect students' satisfaction with teaching after they have finished a course. As such, it has been argued that SET should be seen as a measure of popularity rather than of teaching capability (Beecham, 2009; Spooren et al., 2013; Braga et al., 2014; Stark and Freishtat, 2014). This leaves room for both individual and contextual factors to influence whether evaluations are high or low, and it leads to the aim of the present article: to test whether gender stereotypes influence SET.

Several studies have shown a gender bias in SET, although the results are inconclusive. Many studies have shown that women receive lower evaluations than men (MacNell et al., 2015; Boring et al., 2016; Mengel et al., 2018; Mitchell and Martin, 2018). For instance, Boring et al. (2016) showed a systematic gender bias in SET in which women lecturers received lower evaluations on seemingly objective aspects, such as how promptly assignments were graded. Likewise, Mitchell and Martin (2018) showed that a woman lecturer was rated lower on other comparable aspects, such as the course itself, the workload, and the technology. However, some studies show that women receive higher ratings than men (Rowden and Carlson, 1996; Bachen et al., 1999), and others have found no difference between evaluations of women and men (Feldman, 1993; Centra and Gaubatz, 2000). These results imply that a lecturer's gender alone is not sufficient to explain variations in SET between women and men lecturers. One possible cause of the inconsistencies in earlier results may be that both individual and contextual factors interact with a lecturer's gender (Boring et al., 2016). For instance, Boring et al. (2016) found that the gender bias in SET varied by discipline. These results are supported by Mengel et al. (2018), who showed that the gender bias is magnified in mathematical courses and is particularly pronounced for younger women lecturers. One explanation might be that STEM fields (Science, Technology, Engineering, and Math) are heavily dominated by men (Makarova et al., 2019). There, (younger) women violate gender norms, creating a lack of fit between the expectations of their gender role and the expectations of the role of university lecturer, which could explain the bias (Heilman, 1983, 2012).
Such a lack of fit, described in more detail below, implies that a woman lecturer behaving in a “masculine” way may receive different SET than a woman lecturer acting in a “feminine” way, which reduces the lack of fit. To better understand how gender, stereotypes, and the fit between a lecturer's gender and behavior jointly influence biases in SET, we now turn to social psychological theory.

Gender Stereotypes

Gender stereotypes are collective mental representations of what is typical of women and men in terms of personality, behavior, and/or expression (Ellemers, 2018). This means that gender stereotypes are shared generalizations about women and men, and consensus about these generalizations in the population is high (Hentschel et al., 2019). The content of gender stereotypes pertains to two core dimensions of social judgment, referred to as agency and communion (Abele and Wojciszke, 2014). Agency refers to goal achievement, whereas communion refers to the maintenance of social relationships (Bakan, 1966). Women are more often perceived as communal (e.g., caring, sensitive, loyal, and understanding; Eagly and Wood, 2012), while men are more often perceived as agentic (e.g., independent, assertive, dominant, self-reliant, and determined). Hence, agentic traits are traditionally associated with masculinity, while communal traits are traditionally associated with femininity. Importantly, gender stereotypes function both prescriptively (what women and men should engage in, and how they should be) and proscriptively (what they should not engage in or be) (Gustafsson Sendén et al., 2019; Hentschel et al., 2019).

When gender stereotypes are fulfilled, that is, when women perform communal tasks and men perform agentic tasks, individuals are evaluated positively. Thus, lecturers who adhere to gendered expectations can be evaluated more favorably (Andersen and Miller, 1997). For example, Boring (2016) found that women lecturers received the highest ratings on availability and quality of contact, two characteristics typical of stereotypes about women (Abele and Wojciszke, 2014). In the social perception and evaluation of others, the problem with stereotypes becomes evident when they are challenged, that is, when gender does not match a role or behavior. When stereotypes about a role or behavior and about gender are incongruent (i.e., lack of fit), individuals are likely to be sanctioned and evaluated negatively (Heilman, 1983, 2012; Eagly and Karau, 2002; Heilman and Okimoto, 2007; Brescoll et al., 2010). Rudman et al. (2012) discuss a gender backlash effect in which women can reach higher positions through agentic behaviors but are at the same time disliked and hence not viewed as hirable. This forces women to choose between being liked and being respected, which undermines their ability to achieve positions of power (Rudman et al., 2012). For instance, when women engage in behaviors typically considered masculine, they are less liked and their behavior is seen as less socially acceptable than when men engage in the same behavior (Bartol and Butterfield, 1976; Jago and Vroom, 1982; Carli, 1990; Carli et al., 1995; Heilman and Okimoto, 2007). This also appears to hold for students' perceptions of lecturers: when lecturers violate gender roles, students become critical (Chamberlin and Hickey, 2001; Sprague and Massoni, 2005). This suggests that if gender stereotypes are responsible for the variation in SET between women and men lecturers observed in previous research, the role of lecturer must be coded as masculine.
Historically, higher education was reserved for men, which could still affect how the role of university lecturer is perceived in terms of gender. Moreover, being a lecturer at a higher education institution is a leadership role, and because leadership and authority are traditionally associated with masculinity (see Heilman and Okimoto, 2007), women lecturers violate gender stereotypes and may face biases and criticism (Eagly and Karau, 2002). Women lecturers must therefore balance the demands of their gender role against the demands of being an authority figure, which inevitably creates some tension between the two. Taken together, theory and empirical studies highlight the difficulty women lecturers face in balancing the agentic demands of the leadership role against the communal demands of the gender role (Zhen et al., 2018).

Overview of the Present Research

The present research zooms in on the discrepancy between gender stereotypes and the role of university lecturer as a source of gender bias in SET. Specifically, we test whether women lecturers are sanctioned when they do not engage in traditionally feminine behaviors or lack traditionally feminine characteristics (Rudman, 1998; Rudman and Glick, 2001). We formulated the following hypotheses:

H1: Women lecturers receive lower SET on average, compared to men lecturers.

H2: A woman lecturer described as having traditionally masculine behavior and characteristics receives the lowest SET.

In two experiments, students were presented with a description of a fictitious lecturer. The descriptions varied with respect to the lecturer's gender (the lecturer was referred to as either “she” or “he” in the text). Moreover, the lecturer's behavior and characteristics were described as either stereotypically feminine or stereotypically masculine. In Experiment 1, the description of the lecturer contained both positive and negative feminine/masculine behaviors and traits. In Experiment 2, the valence (positive or negative) of the feminine/masculine behaviors and traits was balanced even more closely. Participants' task was to rate the lecturer on common SET items. Experiment 1 used a wide range of SET items, mainly from previous literature. In Experiment 2, the number of items was reduced because of semantic overlap.

The studies were carried out in accordance with the national guidelines on ethical research established by the Swedish Research Council retrievable at: https://publikationer.vr.se/en/product/good-research-practice/.

Experiment 1

Because our hypotheses test the potential mismatch between the role of university lecturer and the female gender role, we first established that the role of university lecturer was indeed coded as masculine. In a pilot study, 82 students read a description of a lecturer. The description varied with respect to the lecturer's gender-stereotypical (feminine or masculine) characteristics and behaviors, but no actual gender information was provided (i.e., we replaced the pronoun with X). After reading the description, participants indicated in a free-text response which gender they thought the lecturer had. Across the feminine (n = 33) and masculine (n = 49) conditions, 74 participants (90%) indicated that the lecturer was a man and only 8 (10%) indicated a woman (masculine condition: man = 44, woman = 5; feminine condition: man = 30, woman = 3). No other genders were suggested. Hence, the role of university lecturer is clearly associated with masculinity.
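The pilot percentages can be recomputed from the reported counts. The sketch below uses only the per-condition counts given in the text (the function and variable names are illustrative); it also shows that the share of “man” responses was similar in both conditions, not only in the pooled sample.

```python
# Pilot-study counts of free-text gender guesses, by condition (from the text).
pilot_counts = {
    "masculine": {"man": 44, "woman": 5},
    "feminine": {"man": 30, "woman": 3},
}


def shares(counts):
    """Return the total number of responses and each response's share of it."""
    total = sum(counts.values())
    return total, {label: n / total for label, n in counts.items()}


# Pool the two conditions into a single set of counts.
pooled = {}
for counts in pilot_counts.values():
    for label, n in counts.items():
        pooled[label] = pooled.get(label, 0) + n

for name, counts in [*pilot_counts.items(), ("pooled", pooled)]:
    total, props = shares(counts)
    summary = ", ".join(f"{label} {p:.1%}" for label, p in props.items())
    print(f"{name} (n = {total}): {summary}")
```

Running it prints 89.8% “man” for the masculine condition (n = 49), 90.9% for the feminine condition (n = 33), and 90.2% pooled (n = 82), which rounds to the 90% reported above.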

Method

Participants, Design, and Procedure

Four hundred US students who were currently enrolled in higher education were recruited from the platform Prolific Academic. Participant gender was assessed with a free-text response (Lindqvist et al., 2020); the sample consisted of 196 men (49%), 185 women (46%), and 21 participants (5%) who gave a response other than woman or man.¹ Mean age was 27 years (range: 18–63, SD = 8.26).

To assess the impact of a lack of fit between the lecturer role and the gender role, we designed an experiment in which the lecturer's gender and behavior varied between conditions. The design was a 2 (gender: she/he) × 2 (behavior: feminine/masculine) between-groups factorial design. For example, in the feminine version the lecturer was described as supportive and caring, available for students, responsive, and empathic, while in the masculine version the lecturer was described as more focused on research, assertive and demanding, expecting hard work, and unavailable. The descriptions were balanced in that the feminine version also contained some negative feminine traits, such as being uncertain, whereas the masculine version contained some positive masculine traits, such as being certain. The descriptions are provided in the Supplementary Material. Participants were randomly assigned to one of the four conditions (n's: she/masculine = 119, she/feminine = 89, he/masculine = 99, he/feminine = 94).
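The text does not say how random assignment was implemented. As a minimal sketch, assuming simple (unrestricted) randomization, each participant independently receives one of the four cells of the 2 × 2 design with probability 1/4. Unlike blocked randomization, this does not force equal cell sizes, so unequal groups can arise. The function name and the seed are hypothetical.

```python
import random
from collections import Counter
from itertools import product

# The four cells of the 2 (lecturer gender) x 2 (described behavior) design.
CONDITIONS = list(product(("she", "he"), ("feminine", "masculine")))


def assign_conditions(n_participants, seed=None):
    """Simple randomization: each participant independently gets one of the
    four cells with probability 1/4, so cell sizes are not forced to be equal."""
    rng = random.Random(seed)
    return [rng.choice(CONDITIONS) for _ in range(n_participants)]


# Hypothetical run for 400 participants; the seed only makes it reproducible.
assignments = assign_conditions(400, seed=2020)
cell_sizes = Counter(assignments)
for (pronoun, behavior), n in sorted(cell_sizes.items()):
    print(f"{pronoun}/{behavior}: n = {n}")
```

A blocked scheme (shuffling a list with equal numbers of each condition) would instead guarantee equal cells; which scheme the study used is not stated.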

Measures

To measure SET, we included a range of measures from previous research: the Professor Effectiveness scale (Goebel and Cashen, 1979; Wilson et al., 2014) and the Brief Professor-Student Rapport Scale (Ryan and Wilson, 2014), with its two subscales (Perceptions of the teacher and Student Engagement). Personal characteristics of the lecturer were assessed with items suggested by MacNell et al. (2015) and Boring (2016). To assess perceptions of the lecturer's competence, we included items referring to more general perceptions of the course and the pedagogy, since these may reflect competence better than evaluations of individual characteristics. These items were averaged into a mean index. Two items measured the difficulty level of the course, and two items measured the general impression of the course. Finally, participants rated the lecturer's warmth and competence (Fiske et al., 2007). Wherever items formed a scale, they were likewise averaged into a mean index. Cronbach's αs for these scales are shown in Table 1, which also details which items were analyzed separately (i.e., not included in a scale) and summarizes all questions.
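Both the reliability estimate and the mean index described above are simple to compute. A minimal sketch using the standard formula α = k / (k − 1) · (1 − Σ item variances / variance of total scores), with sample variances throughout; the article does not state which software was used, and the ratings below are made up for illustration.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    `items` is a list of k columns; each column holds one item's scores for
    the same respondents, in the same order. Uses sample variances throughout:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(column) != n for column in items):
        raise ValueError("every item needs a score from every respondent")
    item_variances = sum(variance(column) for column in items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_variances / variance(totals))


def mean_index(items):
    """One index score per respondent: the mean of that respondent's item scores."""
    return [mean(scores) for scores in zip(*items)]


# Hypothetical 1-5 ratings from five respondents on a three-item scale.
ratings = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 1, 4],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")  # alpha = 0.92 for these made-up ratings
print(mean_index(ratings))
```

Averaging (rather than summing) keeps each index on the original response scale, which makes the means in a results table directly comparable to single items.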

TABLE 1

Table 1. Scales and items used in the experiments.

Results

For each of the outcome measures detailed in Table 1, we computed a 2 × 2 ANOVA with gender of the lecturer (she/he) and gendered behavior (feminine/masculine) as between-participants factors, and with participant gender as a covariate. Means, standard deviations, and F-values for the main effects are shown in Table 2. Only the main effects are presented, because none of the interaction effects were significant.
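To illustrate what the main effects and the interaction in such a 2 × 2 ANOVA capture, the sketch below computes F-values for a simplified case. It assumes equal cell sizes and no covariate, which the actual analysis did not satisfy (cell sizes ranged from 89 to 119, and participant gender was a covariate); those cases call for a general linear model fitted with a statistics package. Partial eta squared is added as an illustrative effect-size measure, and all names and data are hypothetical.

```python
from statistics import mean


def two_by_two_anova(cells):
    """F and partial eta squared for a balanced 2 x 2 between-groups design.

    `cells` maps (lecturer_gender, behavior) to a list of scores, with the
    same number n of scores in every cell. Each effect has 1 numerator df;
    the error term has 4 * (n - 1) df. No covariates are handled.
    """
    genders = sorted({g for g, _ in cells})
    behaviors = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != 4 or len(genders) != 2 or len(behaviors) != 2:
        raise ValueError("expected exactly four cells in a 2 x 2 layout")
    if n < 2 or any(len(scores) != n for scores in cells.values()):
        raise ValueError("expected at least two scores per cell and equal cell sizes")

    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    grand = mean(cell_mean.values())  # equals the overall mean when cells are balanced
    gender_mean = {g: mean(cell_mean[g, b] for b in behaviors) for g in genders}
    behavior_mean = {b: mean(cell_mean[g, b] for g in genders) for b in behaviors}

    ss = {
        "lecturer gender": 2 * n * sum((m - grand) ** 2 for m in gender_mean.values()),
        "behavior": 2 * n * sum((m - grand) ** 2 for m in behavior_mean.values()),
        # Interaction: what remains of each cell mean after both main effects.
        "interaction": n * sum(
            (cell_mean[g, b] - gender_mean[g] - behavior_mean[b] + grand) ** 2
            for g in genders
            for b in behaviors
        ),
    }
    ss_error = sum(
        (y - cell_mean[key]) ** 2 for key, scores in cells.items() for y in scores
    )
    df_error = 4 * (n - 1)
    ms_error = ss_error / df_error
    # With 1 df per effect, each effect's mean square equals its sum of squares.
    return {
        effect: {"F": s / ms_error, "partial_eta_sq": s / (s + ss_error), "df_error": df_error}
        for effect, s in ss.items()
    }


# Hypothetical scores in which only the described behavior matters.
example = {
    ("she", "feminine"): [4, 5, 6],
    ("he", "feminine"): [4, 5, 6],
    ("she", "masculine"): [2, 3, 4],
    ("he", "masculine"): [2, 3, 4],
}
for effect, stats in two_by_two_anova(example).items():
    print(f"{effect}: F(1, {stats['df_error']}) = {stats['F']:.2f}, "
          f"partial eta^2 = {stats['partial_eta_sq']:.2f}")
```

In this made-up example only the behavior effect is nonzero (F(1, 8) = 12.00), which mirrors the pattern in Experiment 1. The pattern predicted by H2, a uniquely low she/masculine cell, would instead show up as a nonzero interaction term.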

TABLE 2

Table 2. Means, standard deviations (in parentheses) and F-values from univariate ANOVAs for main effects of conditions (she/he; feminine/masculine), in Experiment 1, N = 400.

The first hypothesis stated that women lecturers overall should receive lower SET than men. The results showed no main effects of the lecturer's gender on any of the outcome variables (Table 2). The second hypothesis stated that women lecturers described as having masculine characteristics and behavior should receive the lowest SET. This hypothesis implies an interaction between the lecturer's gender and the described behavior. However, none of the interactions were significant. Thus, the effect of feminine versus masculine behavior on ratings did not differ between the woman and the man lecturer, and neither hypothesis was supported. Interestingly, there were significant main effects of whether the lecturer was described as having feminine or masculine characteristics on all outcome variables (means in Table 2). For ease of overview, significant differences in favor of the feminine description are marked in bold, while differences in favor of the masculine description are marked in gray.

In sum, participants rated the lecturer with feminine behavior more positively than the lecturer with masculine behavior on almost all outcome measures. The differences on many items are unsurprising, since the text in the feminine condition described a lecturer who was more involved with the students and with teaching, and students can be expected to prefer a lecturer with these characteristics. For instance, on the Professor Effectiveness scale, the items on encouraging questions, being organized, explaining concepts, behaving in a friendly manner, and being generally a good teacher would be expected to receive higher ratings given the text in the feminine condition. An interesting finding was that participants expected the masculine lecturer, to a higher degree than the feminine lecturer, to expect good work and to assign too much work. Other results that are not easily explained by the descriptions of the lecturer concern the items related to difficulty: participants thought that the course had higher requirements and that students in the course studied more when the lecturer's behavior was masculine.

Combined, the results indicate that participants rated a lecturer described in feminine terms more positively, and would rather attend that lecturer's course, than a lecturer described in masculine terms. However, participants thought that the masculine behavior implied higher demands and a more difficult course, in which students were thought to put in more hours. These are not unambiguously negative features from a learning perspective.

Finally, the lecturer with masculine behavior was rated as less competent than the lecturer with feminine behavior. Although this effect was smaller than the other effects in this study, it was significant. This was surprising, since competence has been strongly associated with masculinity (Fiske et al., 2007). However, recent research shows that competence is one of the aspects of gender stereotypes that has changed the most over the years, and that women are now sometimes perceived as more competent than men (Gustafsson Sendén et al., 2019; Eagly et al., 2020). Hence, the results do not contradict recent research. Also, in the masculine condition the lecturer was described as more competent as a researcher than as a teacher, while in the feminine condition the lecturer was described as more competent in pedagogy. It is possible that this asymmetry between competence in different areas influenced participants' overall competence ratings. From a student perspective, pedagogical competence should matter more in SET than research competence.

One reason for the lack of a main effect of the lecturer's gender, or of interactions between gender and the described behavior and characteristics, may be that the feminine version was seen as more positive overall from a student's perspective. Hence, in a second experiment, the descriptions of the lecturer were made more ambiguous, so that the feminine condition also entailed more negative feminine traits and the masculine condition entailed more positive masculine traits. We also reduced the number of outcome variables and focused on assessments of the course that were not directly related to the individual described.

Experiment 2