Professor Age and Gender Affect Student Perceptions and Grades

Student evaluations provide rich information about teaching performance, but a number of factors beyond teacher effectiveness influence student evaluations. In this study we examined the effects of professor gender and perceived age on ratings of effectiveness and rapport as well as academic performance. We also asked students to rate professor attractiveness as a potential explanation for group differences. Participants (N = 308) saw a picture of either a young or old male or female professor while listening to an audio lecture. Students reported greater perceived rapport and attractiveness with the female relative to the male professors and for younger versus older professors. However, students reported the male professors as more effective than the female professors. An interaction revealed that among female professors only, younger women were rated as more attractive than comparison conditions. Thus, age and gender bias likely impact student evaluations of teaching. Our study also revealed higher quiz grades in the older-female condition.

extended research outside of the classroom by showing students photographs of younger and older professors. Students rated younger professors as more attractive than older professors, revealing a bias toward youth. This societal perception is pervasive, with youth typically seen as more attractive (Bargh, Chen, & Burrows, 1996; Zebrowitz, Olson, & Hoffman, 1993). Because attractiveness relates to higher teaching evaluations, older professors may be at a disadvantage.
Of course cultural expectations of attractiveness across the lifespan differ for men and women. A youthful appearance among women is particularly valued, with younger women seen as more attractive than older women (Sarwer, Grossbart, & Didie, 2003). Although Wilson and colleagues (2014) found higher student ratings of attractiveness for younger professors, the effect was driven by ratings of female pictures. In an interaction, younger female professors were rated as more attractive than older female professors; this effect was not seen for male professors. Based on these data, older female professors may experience the most bias in teaching evaluations if they are seen as unattractive by a society valuing youth.
Interestingly, even when beauty cannot be assessed, instructor age and gender influence student evaluations. Arbuckle and Williams (2003) showed students a computer-generated, gender-neutral stick figure presenting a 35-minute lecture in a gender-neutral voice. They asked students to identify the stick figure as either male or female and old or young as well as to evaluate the lecture. Of the four possible professor age/gender groups, students who perceived the stick figure as a young male professor rated the figure as speaking more enthusiastically and using a more meaningful tone of voice.
The available literature suggests that instructor gender and age influence evaluations even when students have only a picture or a stick figure on which to base judgments. Researchers might argue that such studies have limited external validity. Conversely, classroom studies may be influenced by the ongoing, dynamic nature of professor-student interactions, reducing internal validity. In the current study, we provided photographs of professors as well as additional classroom-related information in the form of a lecture. Students viewed either a male or female professor who was either young or old, listened to and completed a quiz on a brief lecture, and then rated the professors on attractiveness, effectiveness, and rapport. Our hypotheses were as follows:
1. We expected professor gender to affect student expectations of instructor effectiveness, with higher effectiveness ratings for the male professors, regardless of age, than female professors.
2. We expected professor gender and age to affect ratings of the instructor, with higher ratings of attractiveness and rapport expected for the younger female professor.
3. Finally, grades were expected to be higher for the young female instructor based on the supposition (Hamermesh & Parker, 2005) that students' focus is enhanced by attractiveness.

Participants
In this study, 340 students (127 men and 213 women) from a southeastern university participated. Our sample consisted of 242 white participants, 75 black participants, and 23 participants who identified as Asian, Native American, Hispanic, or of mixed ethnicity or who did not respond to this demographic item. The average age of our sample was 19.88 years (SD = 2.79). The majority of participants were enrolled in an introductory psychology course, although students in additional psychology courses were invited to participate if their instructor allowed credit. All students signed up to participate using an online participant-management system, after which they received a link to an online study hosted by Qualtrics. We received IRB approval prior to running the study, and all students were treated ethically. Of the original 340 participants, 308 completed the entire survey and remained in the data set.

Materials
Two pictures of "instructors" were chosen from a web search of publicly available pictures. Both images were then digitally altered to make each of them appear older (i.e., wrinkles and lighter hair color), for a total of four images. We used black-and-white, high-quality pictures of the head and neck only.
The lecture was a three-minute audio file about the history of Bedlam Hospital in London. A 16-year-old boy prepared the audio file, and digital alteration increased the frequency to create a gender-ambiguous audio file that could feasibly represent both ages and genders of "instructors." At the beginning of the Qualtrics survey, students were told that they would hear a file that was digitally altered. Thus, the artificial sound of the file should have been explained, allowing students to "buy in" to the gender implied by the instructor's picture they viewed. Before the lecture began, students read that they would be quizzed on lecture material. A 10-item, true-false quiz assessed learning based on lecture content.
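As a minimal sketch of how a 10-item, true-false quiz can be converted to percent correct, consider the following. The answer key, the sample responses, and the function name are invented for illustration and are not taken from the study materials.

```python
def percent_correct(answers, key):
    """Return the percentage of true-false answers that match the key."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    hits = sum(a == k for a, k in zip(answers, key))
    return 100 * hits / len(key)

# Hypothetical answer key and one hypothetical participant's responses.
ANSWER_KEY = [True, False, True, True, False, False, True, False, True, True]
student = [True, False, False, True, False, True, True, False, True, False]

print(percent_correct(student, ANSWER_KEY))  # 70.0
```

Because every item has two response options, guessing alone would be expected to yield about 50 percent on a quiz of this kind.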
Perceptions of professor effectiveness were assessed using measures from both Goebel and Cashen (1979) and Wilson and colleagues (2014). These included seven items of teacher effectiveness: the teacher encouraging questions, expecting good work, assigning too much work, being organized, explaining concepts, behaving in a friendly manner toward students, and overall being a good teacher. For each item, ratings ranged from 1 to 5, with 1 representing "Strongly Disagree," 2 indicating "Disagree," 3 indicating "Neither Agree nor Disagree," 4 representing "Agree," and 5 indicating "Strongly Agree." Based on prior research, these items were not consolidated but instead served as separate measures of teacher effectiveness.
Students also completed the Brief Professor-Student Rapport Scale. The scale contains six items that assess student perceptions of rapport with an instructor. Five-point ratings range from Strongly Disagree to Strongly Agree; two of the items are reverse scored. We altered the wording slightly by asking students to rate how they thought the professor in the picture would behave rather than indicate how an existing professor did, in fact, behave. For example, "My professor makes class enjoyable" was changed to "This instructor would make class enjoyable." The brief version of the scale shows good convergent and discriminant validity with other measures of rapport (Ryan & Wilson, 2014) and correlates with numerous positive student outcomes, including student motivation, number of classes missed, attitudes toward the professor and course, and learning, including end-of-term grades (Wilson & Ryan, 2013). We used average ratings in data analyses.
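The scoring described above (six 5-point items, two of them reverse scored, then averaged) might be sketched as follows. Which two items are reverse scored is not stated here, so the positions chosen below are purely hypothetical, as are the function name and the sample responses.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5   # 1 = Strongly Disagree, 5 = Strongly Agree
REVERSED_ITEMS = {2, 4}       # zero-based positions; a hypothetical choice

def rapport_score(responses):
    """Average six 1-5 ratings after flipping the reverse-scored items."""
    if len(responses) != 6:
        raise ValueError("expected six item responses")
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in responses):
        raise ValueError("ratings must fall between 1 and 5")
    adjusted = [
        (SCALE_MIN + SCALE_MAX - r) if i in REVERSED_ITEMS else r
        for i, r in enumerate(responses)
    ]
    return mean(adjusted)

# Invented responses from one participant.
example = [4, 5, 2, 4, 1, 3]
print(rapport_score(example))
```

On a 1 to 5 scale, reverse scoring maps a rating r to 6 - r, so a rating of 2 on a negatively worded item counts as a 4 toward the average.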
Attractiveness of the pictured person was rated using one item: "How attractive do you think this instructor is?" rated on a 7-point Likert scale from Very Unattractive to Very Attractive. Students also rated their perception of the teacher's age in years by answering the item: "How old do you think this instructor is?"

Procedure
Participants clicked on a link to the survey and indicated their agreement to continue by clicking at the bottom of an informed-consent form. Next, they saw one of four instructor pictures, randomly assigned by the computer program: adult man, older adult man, adult woman, or older adult woman. Near the picture was a standard bar for an audio file with an iconic Play button. In bold below the box, students learned that the voice they would hear had been digitally altered. They were instructed to listen to the entire lecture because they would be quizzed on the material immediately following the lecture.
After listening to the lecture and completing the quiz, students completed the seven professor assessments used by Goebel and Cashen (1979) and Wilson and colleagues (2014). Next, participants completed the Brief Professor-Student Rapport Scale. The picture remained above the scale when completing ratings. On the next page, the picture again appeared at the top, and students were asked to indicate perceived age of the instructor and attractiveness. Lastly, students provided basic demographic information on themselves.

Collapsing Data
Although our manipulation check revealed that images altered to communicate an older professor did cause students to estimate a higher age relative to the younger images, the perceived ages differed by only 2.84 years based on variability within conditions. Further, the male instructor was seen as older than the female instructor, particularly in pictures intended to communicate youth, even though the pictures were intended to create perceptions of similar age. Because student perceptions of instructor age were our primary concern, and perceived age relates to student ratings of teaching effectiveness (Arbuckle & Williams, 2003), we divided participants into two groups based on participants' perceptions of the instructor as up to 40 years old or above 40 (median-split procedure using perceived age across the entire data set). Of the group in which the man was intended to look young, 49 participants assumed he was, in fact, at or below 40 years old; however, 33 students perceived him as over 40. In the male picture intended to look older, 66 students assessed him as over 40, and only 10 assumed he was 40 years old or less. Of female pictures, the younger image revealed a good student match with expected and perceived age, with 67 assuming she was 40 or younger; only 8 thought she was older than 40. Finally, the picture of the older female was perceived as above 40 by 59 participants and up to 40 years old by 16 participants.
The final analysis contained 77 participants perceiving a younger male professor, 81 perceiving an older male professor, 94 perceiving a younger female professor, and 56 perceiving an older female professor.
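The regrouping rule can be sketched in a few lines. Only the cutoff of 40 years comes from the text; the records, labels, and function name below are invented for illustration.

```python
from collections import Counter

AGE_CUTOFF = 40  # a perceived age of 40 or below counts as "younger"

def perceived_group(pictured_gender, perceived_age):
    """Return the analysis cell (age label, gender) for one participant."""
    age_label = "younger" if perceived_age <= AGE_CUTOFF else "older"
    return (age_label, pictured_gender)

# Invented records: (gender of pictured instructor, perceived age in years).
records = [("male", 38), ("male", 45), ("female", 40), ("female", 52), ("male", 41)]

cell_counts = Counter(perceived_group(g, a) for g, a in records)
print(cell_counts)
```

Under this rule, a participant is assigned to a cell by the age he or she reported perceiving, regardless of which version of the picture was shown.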

Primary Analysis
We analyzed these data using a 2 (gender of professor) X 2 (perceived age of professor), between-groups MANOVA. Dependent variables of interest included seven items related to teacher effectiveness as well as the Brief Professor-Student Rapport Scale (Wilson & Ryan, 2013), the attractiveness rating, and quiz grades in percent correct.
Attractiveness and quiz-grade outcomes were further explained by an interaction between instructor gender and perceived age, F(1, 304) = 11.83, p = .001, partial η² = .037, and F(1, 304) = 4.93, p = .027, partial η² = .016, respectively. As seen in Figure 1, across pictures of the female professor, the younger picture was rated as more attractive (M = 5.26, SEM = .12) than the older picture (M = 4.21, SEM = .15). Ratings of male attractiveness did not vary with age (p > .05). On the quiz outcome, students earned higher scores when they thought they were hearing from an older female (M = 63.39, SEM = 2.00) than a younger female (M = 54.68, SEM = 1.55) as well as an older male (see Figure 2).
Figure 2. Participants earned better quiz grades on a lecture they perceived to be given by an older female instructor than either a younger female instructor or an older male instructor. Error bars represent SEM.
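The reported effect sizes can be checked against the F ratios using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the two interaction tests reported above; the function name is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Interaction tests reported in the text, both with df = (1, 304).
attractiveness = partial_eta_squared(11.83, 1, 304)
quiz_grade = partial_eta_squared(4.93, 1, 304)

print(round(attractiveness, 3), round(quiz_grade, 3))  # 0.037 0.016
```

Both values agree with the reported effect sizes after rounding to three decimals.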

Discussion
Our first hypothesis was that students would rate the male professor in the current study as more effective than the female professor, regardless of age. This hypothesis was supported. Of the seven separate items measuring effectiveness, explaining concepts arguably serves as the clearest indication of effective teaching, and this item reflected student perceptions in favor of male instructors. Our second hypothesis was that whereas the younger female professor would earn higher ratings of attractiveness and rapport than the older female professor, this effect would not be seen for male professors. This hypothesis was partially supported. Results were as expected for attractiveness, but for rapport, there were overall effects for gender and age (with younger and female professors being seen as engendering higher rapport), but these variables did not interact. Our third hypothesis that the younger female professor would inspire higher grades was not supported. In fact, participants who perceived an older female professor scored higher on the quiz.
Students expect male professors to be effective in their work but expect female professors to spend time building supportive relationships with students. According to Kierstead, D'Agostino, and Dill (1988), male professors earned better student evaluations if they demonstrated competence, but female professors had to demonstrate both competence and warmth to obtain the same high ratings. Unfortunately, failure to behave as gender roles dictate (according to students) results in greater hostility toward female professors in particular (Sprague & Massoni, 2005) and may also result in poorer evaluations. Professors who behave in ways that support expectations are rewarded in student evaluations. In our study, perhaps higher rapport ratings for female professors can be explained by higher ratings of attractiveness. Goebel and Cashen (1979) found that students perceived more attractive professors as friendlier, more encouraging, more organized, less likely to give too much work, and better professors overall than unattractive professors. Perhaps it is not women who are expected to be warm but attractive people who are expected to be warm. To test this potential explanation, we examined the correlations between attractiveness and rapport based on the female pictures, r(148) = .24, p = .003, and male pictures, r(156) = .23, p = .003, both of which yielded significant relationships. However, the large sample size contributed to our ability to find significance, and the correlational values were weak to moderate. Certainly, attractiveness explains some variability in rapport, but we cannot say with confidence that gender of the professor did not further explain student ratings of rapport.
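To make the size of these correlations concrete, the sketch below converts each reported r into the proportion of shared variance (r²) and into the corresponding t statistic, using the standard relation t = r × sqrt(df / (1 - r²)). The function names are ours, and the calculation works from the rounded r values, so it only approximates the original analysis.

```python
import math

def shared_variance(r):
    """Proportion of variance two measures share, given their correlation."""
    return r ** 2

def t_from_r(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(df / (1 - r ** 2))

# Attractiveness-rapport correlations reported in the text.
reported = [("female pictures", 0.24, 148), ("male pictures", 0.23, 156)]

for label, r, df in reported:
    print(label, round(shared_variance(r), 3), round(t_from_r(r, df), 2))
```

In both cases attractiveness accounts for roughly 5 to 6 percent of the variance in rapport, which fits the description of these relationships as weak to moderate.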
Likewise, students rated younger professors as more attractive and warmer (higher rapport) than older professors. In fact, perceived age correlated negatively with rapport, r(306) = -.27, p < .001, and ratings of attractiveness, r(306) = -.31, p < .001. These results are not surprising given societal norms. Lucăcel and Băban (2014) asked participants in their twenties about perceptions of aging and found that the majority of people in their sample held a negative perception of old age and the aging process. Similarly, individuals under 35 years are less likely to believe that older people can be as effective as younger workers (Abramson & Silverstein, 2004). Our study suggests that ageist attitudes are not discarded at the classroom door. Students expect older professors to be less effective teachers.
In addition to student perceptions, we directly measured learning with a lecture quiz. We expected attractiveness, rapport, and grades to correlate positively with each other based on Hamermesh and Parker's (2005) argument that students focus more on attractive teachers. However, we found significantly better recall when students thought the lecturer was an older female. This result is particularly surprising because students rated the older female as less attractive than the younger female. Students chose to focus most on a lecture provided by a female they perceived to be over 40 years old.
An older female could activate a schema for "mother," a female likely to expect a strong work ethic. Students' desire to please a mother figure could increase focus during a brief lecture. Indeed, based on a significantly higher quiz grade, we must assume more focus. Higher quiz averages for those perceiving an older woman may indicate that students work harder for older women than younger women or men. This idea is supported by the fact that although perceived age and quiz grades correlated significantly in the overall sample, r(306) = .16, p = .006, this correlation was driven by female professors, r(148) = .27, p = .001. For male professors, the correlation between perceived age and quiz grades failed to reach significance, r(156) = .07, p > .05. Based on the positive relationship between quiz grades and perceived age of the female professor, activating a schema for "mother" may explain higher grades.
Taken together, results of the current study reveal the impact of professor gender and age on student evaluations of teaching and grades. As instructors, we would benefit from minimizing potential negative effects reported here and maximizing benefits associated with professor gender and age. For example, Legg and Wilson (2009) found that when students received an emailed welcome message one week prior to the first day of class, motivation, attitudes toward the instructor, and retention were enhanced. Although students may be able to guess a professor's gender based on a name, an email can attenuate the impact of age.
Professors can also improve students' early impressions on the first day of class. For example, professors might have what students consider an "ideal" first day, which includes covering the syllabus in a welcoming manner, avoiding homework, and ending class early. Although some may argue that ending class early on the first day sets a "lazy" tone for the remainder of the course, Wilson and Wilson (2008) found that following students' wishes for the first day of class improved both student motivation and end-of-term grades. As another first-day activity, welcoming students by shaking hands increased ratings of instructor skill and ability to motivate students (Wilson, Stadler, Schwartz, & Goff, 2009). We should caution that this effect occurred for female professors only; for male professors, the opposite effect was seen. Certainly, many approaches can enhance students' perceived effectiveness related to female instructors and rapport related to male instructors as well as older professors.
How does the current study inform the use of teaching evaluations for professor tenure, promotion, raises, and awards? Traditional measures focus on student perceptions, not student performance. As a result, older female professors may be at a disadvantage. The practice of rewarding or punishing faculty based on student evaluations may be unfair if biases exist. In the case of the older female professor, she may be highly effective at helping students learn even if her teaching evaluations are relatively low.

Potential Limitations and Suggestions for Future Research
A potential limitation of this study is that most participants were enrolled in an introductory psychology course. It is quite possible that more advanced students work through their expectations of others and learn to avoid bias. Unfortunately, a wealth of research suggests that bias, whether explicit or implicit, exists within people regardless of their age (e.g., Koch, D'Mello, & Sackett, 2015). To assess the external validity of our study, additional groups of students beyond those in introductory psychology should be examined.
Students who reviewed professors in this study merely viewed pictures and heard a brief lecture. We recognize that the dynamic nature of a classroom is much more complex, limiting our external validity. For example, with limited teacher information, participants may have relied heavily on perceptions of physical attributes to make inferences about rapport and effectiveness. In the richness of classroom environments, students have additional information based on social interactions, perhaps creating a different pattern of results. However, a manipulated empirical study allows us to identify experimentally an ongoing bias in student evaluations. Such information illustrates the need for caution when depending on student evaluations for faculty tenure, promotion, raises, and awards. As long as gender and age bias exists in the minds of students, discrimination can occur.