Small Group Learning is Associated with Reduced Salivary Cortisol and Testosterone in Undergraduate Students

Small group learning activities have been shown to improve student academic performance and educational outcomes. Yet, we have an imperfect understanding of the mechanisms by which this occurs. Group learning may mediate student stress by placing learning in a context where students have both social support and greater control over their learning. We hypothesize that one of the methods by which small group activities improve learning is by mitigating student stress. To test this, we collected physiological measures of stress and self-reported perceived stress from 26 students in two undergraduate classes. Salivary cortisol and testosterone were measured within students across five contexts: a) preinstructional baseline, b) following a traditional lecture, c) after participating in a structured small group learning activity, d) following completion of multiple choice, and e) essay sections of an exam. Results indicate students have lower salivary cortisol after small group learning activities, as compared to traditional lectures. Further, there is no evidence of a relationship between physiological measures of stress and self-reported perceived stress levels. We discuss how structured small group activities may be beneficial for reducing stress and improving student-learning outcomes. Snopkowski, Demps, Scaggs, Griffiths, Fulk, May, Neagle, Downs, Eugster, and Amend Journal of the Scholarship of Teaching and Learning, Vol. 19, No. 5, December 2019. josotl.indiana.edu

Previous research has shown that active student learning activities, such as those that occur in small groups, improve student performance and engagement as compared with traditional instructional lecture (Bradford, Mowder, & Bohte, 2016;Byun, 2014;Coakley & Sousa, 2013;Freeman et al., 2014;Simonson & Shadle, 2013;Swap & Walter, 2015;J. D. Walker, Cotner, Baepler, & Decker, 2008;L. Walker & Warfa, 2017). However, little is known about the mechanism by which this occurs. Identifying these mechanisms may lead to improved learning outcomes both within the classroom and beyond.
We hypothesize that one of the methods by which small group activities improve learning is by mitigating student stress. To compare the stressfulness of group learning to traditional lecture, we use measures of salivary cortisol as a physiological indicator of stress in undergraduate courses. We also examined students' physiological stress responses during multiple-choice and short-answer essay sections of an exam. Finally, we compared salivary cortisol to self-reported levels of stress. Since most research on student stress and learning relies on student self-reports, it is important to document how closely these correlate with physiological measures of stress.

Stress, Cortisol, and Learning
Although cortisol has often been called the 'stress hormone', a more accurate statement would describe it as the 'arousal hormone' (Hoyt, Zeiders, Ehrlich, & Adam, 2016). Cortisol is a glucocorticoid hormone that is released when the hypothalamic-pituitary-adrenal (HPA) axis is activated. Circulating cortisol levels increase in response to physical and psychological activation; thus, it is used as a biomarker of a stress response (Kirschbaum, Pirke, & Hellhammer, 1993;Lighthall, Gorlick, Schoeke, Frank, & Mather, 2013;McEwen, 1998;Stephens, Mahon, McCaul, & Wand, 2016). While other measures of stress exist, including serum cortisol, galvanic skin response, heart rate, and blood pressure (Campbell & Ehlert, 2012;Villanueva, Valladares, & Goodridge, 2016), these other measures typically are utilized within laboratory settings due to the need for aseptic techniques, machines, or continuous sensor leads connected to individual participants throughout the evaluation process. In contrast, the use of salivary cortisol readily lends itself to the formal classroom setting where less-invasive and least restrictive environmental conditions are desired to enable the simultaneous evaluation of many students.
Stress has been demonstrated to have positive and negative effects on learning and memory (Sapolsky, 2004). While too little stress can elicit boredom (Merrifield & Danckert, 2014), prior research indicates that a sufficient level of stress enhances learning, but only for positive learning outcomes (Lighthall, Gorlick, Schoeke, Frank, & Mather, 2013), such as those associated with instructional reinforcement of correct responses. However, excessive and repeated stress can reduce cognitive function, increase the risk of cardiovascular disease, and decrease immune function (Lee et al., 2007;McEwen, 1998;Robinson, Sünram-Lea, Leach, & Owen-Lynch, 2008). Evidence has shown that inducing stress results in increased cortisol levels and impaired memory (Kirschbaum, Wolf, May, Wippich, & Hellhammer, 1996). In overly stressful learning contexts, learners may be activating the sympathetic nervous system, often called the "fight-or-flight" response, increasing circulating cortisol, and impairing their ability to retain course material.
Studies using salivary cortisol as a biomarker of stress have found that exams can be used as a naturalistic experiment to examine the effects of academic context on stress. For example, students had increased levels of salivary cortisol immediately prior to oral exams (Lacey et al., 2000;Schoofs, Hartmann, & Wolf, 2008;Singh et al., 2012). Similar results have been found for undergraduates facing oral presentations (Merz & Wolf, 2015). In fact, asking people to do math problems in front of an audience is part of the Trier Social Stress Test (TSST), a reliable and validated method to induce stress in laboratory settings (Allen et al., 2017;Kirschbaum et al., 1993). While oral exams consistently increase salivary cortisol, written exams have more variable effects. For instance, Austrian high school students had varied reactions to written exams (some increased, some decreased, and some showed no change in salivary cortisol levels) (Martinek, Oberascher-Holzinger, Weishuhn, Klimesch, & Kerschbaum, 2003), while British university students had a significant reduction in salivary cortisol levels during exam weeks as compared to non-exam times (Vedhara, Hyde, Gilchrist, Tytherleigh, & Plummer, 2000), and German undergraduates saw elevated cortisol concentrations at the start of a written examination compared to a control day, if the control day was after the exam but not if it was before (Preuß, Schoofs, Schlotz, & Wolf, 2010).
Many studies have focused on the physiological stress induced by examinations, but less is known about how student stress, both physiological and self-reported, varies across learning contexts. One study examined medical students in a problem-based learning curriculum and found that students reported via a questionnaire that the group-learning environment caused little stress (contrary to many other aspects of their program) (Moffat, McConnachie, Ross, & Morrison, 2004). No prior research (that we are aware of) has examined the effect of classroom learning context on the physiological stress response as measured by salivary cortisol. Finally, the relationship between self-reported perceived stress and physiological markers of stress is unclear. In some studies, salivary cortisol is unrelated to perceived stress. For instance, studies comparing undergraduates before and during exam weeks found no association between salivary cortisol and self-reported stress (Murphy, Denis, Ward, & Tartar, 2010;Weekes et al., 2006). In others, researchers found a positive relationship between self-reported stress and biomarkers of stress (Ng, Koh, & Chia, 2003). Finally, salivary cortisol can also be negatively correlated with self-reported stress. Acute stress can lead to the release of endorphins, which reduces the perception of pain (for example, the runner's high) (Sapolsky, 2004). Under these conditions, stress actually results in the perception of positive feelings. Given the complexity of the HPA axis and its reaction to stressors and other neurobiological events and the complexity of assessing self-reported stress including types of measures, previous exposure to the stressor, and an individual's coping mechanism, it may be unsurprising that salivary cortisol is not correlated with self-reported perceived stress in many studies (Hellhammer, Wüst, & Kudielka, 2009). 
If self-reports of perceived stress are not associated with physiological stress (or are associated in a convoluted way), this may alter how we both interpret self-reports of "stress" and understand how stress relates to learning. Differing pedagogical approaches to higher education create different environmental contexts that may impact the stress experienced by students, thereby impacting memory and learning. Importantly, whether these modalities directly affect student hormonal stress levels requires investigation beyond subjective student perception.

Salivary Testosterone
We also examined salivary testosterone across these learning and exam-taking contexts. Ellison and Gray (2009) determined that cortisol and testosterone are implicated in the physiological stress response and noted that these hormones are tied to both individual and group learning. Previous research has found that acute stress is associated with increased testosterone (within individuals) and positive associations occur between salivary cortisol and testosterone, known as 'coupling' (Harden et al., 2016). Testosterone is also known to be associated with competition, where individuals who win competitions exhibit higher testosterone levels than those who lose (Booth, Shelley, Mazur, Tharp, & Kittok, 1989). In some circumstances, rises in testosterone occur in anticipation of competition (Mazur, Booth, & Dabbs, 1992), although patterns appear to differ for males and females (Kivlighan, Granger, & Booth, 2005;Mazur, Susman, & Edelbrock, 1997;Taylor et al., 2000). If inter-student competition for grades or academic recognition is contributing to classroom stress, we would expect to see a signature in salivary testosterone profiles of students across contexts. Because the connection between stress, learning, and testosterone is unclear, we refrain from making predictions and only report exploratory results.
Despite the potential significance of the interaction between stress, hormones, and learning (Flegr & Priplatova, 2010;Lacey et al., 2000;Lighthall et al., 2013;Martinek, Oberascher-Holzinger, Weishuhn, Klimesch, & Kerschbaum, 2003) as evidenced by a growing body of current literature exploring the relationships between hormones and student academic performance on exams (Bardi, Koone, Mewaldt, & O'Connor, 2011;Kenwright et al., 2011;Takatsuji et al., 2008;Vedhara, Hyde, Gilchrist, Tytherleigh, & Plummer, 2000), there is a need to broaden this investigation to include the hormonal implications that different teaching modalities have on students within the formal classroom setting. In doing so, additional insight may be gained into the endocrinology and context of human learning.

Methods
We collected saliva samples to measure (salivary) cortisol and testosterone. Most studies report a relatively high correlation between serum (blood) cortisol levels and salivary cortisol, particularly among individuals with normal endocrine functioning (Hellhammer, Wüst, & Kudielka, 2009). Drawing blood may increase stress (influencing the hormones we wish to measure), so saliva collection is our preferred method to measure cortisol and testosterone levels.
Saliva samples were collected from students after five different conditions. In each case, saliva samples were collected twenty minutes after the onset of the condition (except for baseline, which was collected at the beginning of class before instruction, immediately after acquiring participant consent). These five conditions included: 1) baseline sample; 2) following ~30 minutes of traditional lecture; 3) following ~30 minutes of a small group activity; 4) following at least 20 minutes of a multiple choice examination; and 5) following at least 20 minutes of an essay examination. We modeled our small group activity after the POGIL (Process Oriented Guided Inquiry Learning) instructional technique (Moog, Creegan, Hanson, Spencer, & Straumanis, 2006), but it has not earned POGIL Project endorsement. The collection of saliva after the traditional lecture and the small group activity occurred on the same day (referred to as instructional day). Similarly, the collection of saliva after the multiple choice questions and the essay questions in an exam also occurred on the same day (referred to as exam day). The order of the instructional techniques (lecture then small group) and exam questions (multiple choice then essay) remained the same across both semesters to maintain a consistent experimental protocol. The instructional day samples were collected between two and seven days after the baseline samples were collected, and the exam day samples were collected seven to ten days following the instructional day. Each class engaged in at least two previous small group activities (with the same group members) and several traditional lectures to familiarize students with these instructional methods.
Hormones can be influenced by countless factors, including time of day (Dowd et al., 2011), yearly season, time since last food intake, amount of previous night's sleep (Leproult, Copinschi, Buxton, & Van Cauter, 1997), exercise, alcohol consumption, education (Dowd et al., 2011), academic performance measures (Preuß, Schoofs, Schlotz, & Wolf, 2010), biological sex (Stephens et al., 2016), prior trauma (Suzuki, Poon, Papadopoulos, Kumari, & Cleare, 2014), and medications (Hellhammer, Wüst, & Kudielka, 2009), among others. Statistically controlling for all of these factors would require a substantial sample size. Given the small sample size of our study, we opted to use the same individuals over time to control for within-subject variation, with saliva samples taken at approximately the same time each day (since saliva samples were always taken during a class period that met the same time each week). Additionally, we recorded information on each subject regarding their wake time (all three collection days) and the time of last food intake (on instructional and exam days). All subjects had been awake for over two hours at the time of collecting their saliva sample, with the exception of one individual who woke approximately 1 hour and 40 minutes before samples were taken. This individual woke at the same time on all class days where saliva was collected, so elevated cortisol levels (due to the strong diurnal pattern of cortisol peaking 30-45 minutes after waking followed by declines for the rest of the day (Dowd et al., 2011)) would be consistent across this individual's samples. Similarly, all subjects reported not eating in the hour prior to the saliva collection (on the instructional and exam days).
Other factors, such as biological sex or prior trauma, would be controlled for within an individual. Our statistical analyses examine within-individual changes, as opposed to mean changes of the entire sample, since we cannot control for the variety of factors that may influence salivary cortisol levels.

Sample
Our sample included two undergraduate classes taught by the same professor that met during two consecutive semesters in 2016. The classes were an introductory biological anthropology class (Spring 2016) and an upper division anthropology class (Fall 2016). Students were asked to volunteer after the goals and motivations of the study were presented (which took fifteen minutes or less at the beginning of class). All participants provided their written consent to participate. Student identification remained confidential through the use of unique codes generated by each student. The instructor of the courses was unaware of which students chose to participate. This study was approved by the Boise State Institutional Review Board (#028-SB16-036). In total, 26 Boise State students completed the on-line questionnaire (see below) and provided saliva samples (although some of these participants were absent on either the instructional or exam day). There were three students who reported taking medication for stress, anxiety, or depression who were excluded from our analyses, resulting in a sample size of 23 students (11 females and 12 males) with a mean age of 23 years.
Perceived stress was self-reported by the participant at baseline (for both courses), during the exam day (for both courses), and during the instructional day (for the fall course only). This produced a total of 23 people who self-reported their perceived stress between one and three times, for a total of 51 observations.

Saliva Collection and Questionnaires
The passive drool method was used to collect the saliva samples (Salimetrics, 2016). This process involved having the participant hold a plastic tube up to their mouth, allowing spit to slide downward into the collection vial. After collection, samples were kept frozen until sample extraction and analysis began.
On the day the baseline sample was collected, participants completed an on-line questionnaire. The survey included questions about the students' basic demographic information. On subsequent saliva collection days, participants completed a short survey on perceived stress level, time of last meal, and wake-up time (this data was not collected on the instructional day in the spring course). Self-reported perceived stress was assessed using the question "How stressed out do you feel today?" and the response was measured on a 7-point ordinal scale ranging from "completely relaxed" to "completely stressed".

Saliva Analyses
Salivary cortisol samples were analyzed according to the Salimetrics expanded range high sensitivity salivary cortisol enzyme immunoassay kit, while the salivary testosterone samples were analyzed according to the Salimetrics expanded range salivary testosterone enzyme immunoassay kit (Salimetrics LLC, State College, PA). These included: centrifuging samples at 3000 rpm, pipetting the samples and controls into wells and adding the enzyme conjugate. After mixing the samples on a rotator plate and incubating the samples for an hour, the plate was washed four times with the wash buffer. The substrate tetramethylbenzidine (TMB) solution was added to each well and incubated for 25 minutes. Finally, a stop solution was added and the plate was read at 450 nm. For each plate, the concentrations of the controls and saliva samples were calculated by interpolation using a 4-parameter non-linear regression curve fit (Salimetrics, 2016).
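As a rough illustration of the interpolation step, the sketch below writes out one common form of a 4-parameter logistic (4PL) curve and its algebraic inverse, which converts an optical density reading back into a concentration. The parameter values and the function names `four_pl` and `interpolate_concentration` are hypothetical and chosen only for illustration; they are not taken from the study or from the kit documentation, and the curve fitting itself (estimating the four parameters from the standards) is not shown.

```python
def four_pl(conc, a, b, c, d):
    """Predicted optical density at concentration `conc` under a 4PL curve.

    a: response at zero concentration
    d: response at infinite concentration
    c: inflection point (same units as conc)
    b: slope factor
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def interpolate_concentration(od, a, b, c, d):
    """Invert the 4PL curve to recover a concentration from an optical density."""
    lo, hi = min(a, d), max(a, d)
    if not lo < od < hi:
        raise ValueError("optical density lies outside the range of the curve")
    return c * ((a - d) / (od - d) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters (NOT from the study or the kit insert).
params = dict(a=1.8, b=1.1, c=0.35, d=0.05)

# Round trip: simulate a reading at 0.14 (arbitrary units), then recover it.
od_reading = four_pl(0.14, **params)
recovered = interpolate_concentration(od_reading, **params)
print(round(recovered, 4))
```

Readings outside the range spanned by the two asymptotes cannot be interpolated, which is why the sketch raises an error in that case rather than returning a value.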

Data Analysis
Since we sampled the same individuals repeatedly, within-subject comparisons of salivary cortisol and testosterone formed the basis of our statistical analyses. We conducted a one-way repeated measures ANOVA to compare the effects of instructional methods and exam type on salivary cortisol and testosterone (Field, 2013). Analyses were performed in SPSS (v. 22). For pair-wise comparisons, a Šidák correction was used to maintain the familywise error rate. For completeness, both the Šidák correction and Tukey LSD (which is equivalent to no adjustment) are reported. We also examined the relationship between self-reported perceived stress and salivary cortisol. This was done by fitting a random effects regression model in which self-reported perceived stress predicts salivary cortisol and a random effect is included to control for the repeated measures design. We also provide figures of the mean salivary cortisol and testosterone values for each condition, but these do not control for the many factors that influence salivary cortisol and testosterone. We do not use these figures as the basis of our statistical tests; they simply depict the descriptive statistics visually.
Figure 1 displays the boxplots of salivary cortisol levels after each of the five conditions. Boxplots represent the upper quartile, median, and lower quartile as the top, middle, and bottom of the box. The whiskers represent the top 25% (upper whisker) and the bottom 25% (lower whisker) of data, except that scores greater than the upper quartile plus 1.5 times the inter-quartile range are deemed outliers and represented by an open circle (o). Values greater than the upper quartile plus three times the inter-quartile range are deemed extreme cases and are represented by an asterisk (*). The two outliers shown for the exam day represent the same individual.
Salivary cortisol at baseline had the largest mean and standard deviation (M = 0.19 μg/dL; SD = 0.12), while salivary cortisol following a small group activity had the lowest mean value and the smallest standard deviation (M = 0.10 μg/dL; SD = 0.04). Mean salivary cortisol following traditional lecture (M = 0.14 μg/dL; SD = 0.08) was higher than mean salivary cortisol following a small group activity, but lower than mean salivary cortisol following both exam conditions: the essay portion (M = 0.15 μg/dL; SD = 0.09) and the multiple choice portion (M = 0.16 μg/dL; SD = 0.09) of an exam. This figure is suggestive of overall effects, but it is possible that they mask within-individual differences, as it is not possible to detect individual changes in summary plots.
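The outlier convention used in the boxplots (values beyond the upper quartile plus 1.5 or 3 times the inter-quartile range) can be sketched as follows. This is only an illustration: the function name `classify_high_values` and the sample values are hypothetical, and the quartile method used here (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption that may not match how SPSS computes the hinges of its boxplots.

```python
import statistics


def classify_high_values(values):
    """Label each value 'typical', 'outlier' (o), or 'extreme' (*).

    Only the upper fences described in the figure caption are applied.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    outlier_fence = q3 + 1.5 * iqr   # beyond this: open circle (o)
    extreme_fence = q3 + 3.0 * iqr   # beyond this: asterisk (*)
    labels = []
    for v in values:
        if v > extreme_fence:
            labels.append("extreme")
        elif v > outlier_fence:
            labels.append("outlier")
        else:
            labels.append("typical")
    return labels


# Hypothetical cortisol values in ug/dL (NOT data from the study).
sample = [0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.22, 0.40]
for value, label in zip(sample, classify_high_values(sample)):
    print(value, label)
```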

Salivary Cortisol
To examine within-individual effects, we conducted one-way repeated measures ANOVA to compare the effects of small group activity, traditional lecture, multiple choice exam, and essay exam on students' cortisol levels. Mauchly's test indicated that the assumption of sphericity (that the variances of the differences between conditions are equal) was violated (p < 0.01); therefore the Greenhouse-Geisser corrected tests are reported (Field, 2013). These results showed that there are significant differences in cortisol levels across conditions (p < 0.05). Table 1 displays the pair-wise comparisons with and without Šidák correction of p-values. The Šidák correction was used to counteract the problem of multiple comparisons. Examining our pair-wise comparisons (to determine which groups were significantly different from each other), we found only one statistically significant pair-wise difference with a Šidák correction: salivary cortisol was significantly higher at baseline than after the small group activity (p < 0.05). The difference between small group learning activities and traditional lecture was significant only when using the Tukey LSD post-hoc test, which is equivalent to having no adjustment (see Table 1); in that test, cortisol was lower after small group activities than after traditional lecture (p < 0.05). Comparing the average salivary cortisol values under exam conditions (calculated as the average of multiple choice and essay cortisol values; M = 0.166 μg/dL, SD = 0.101) with those after instructional methods (average of traditional lecture and small group activity; M = 0.122 μg/dL, SD = 0.061) suggests that cortisol values are higher under exam conditions than after instructional methods, although the difference fell just short of conventional significance; t(20) = -2.065, p = 0.052. Such a pattern may be expected given the psychological stress that commonly accompanies testing.
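For readers unfamiliar with the test, the following is a minimal sketch of how the F statistic of a one-way repeated measures ANOVA is assembled from sums of squares. It is not the SPSS procedure used in this study: it assumes complete data for every subject, does not apply the Greenhouse-Geisser correction, and does not compute a p-value. The function name `rm_anova_f` and the example scores are hypothetical.

```python
def rm_anova_f(scores):
    """One-way repeated measures ANOVA (uncorrected for sphericity).

    scores: list of per-subject lists, one value per condition,
            with every subject measured in every condition.
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Removing between-subject variance is what makes the design "within".
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Hypothetical scores for 4 subjects x 3 conditions (NOT data from the study).
example = [[3, 2, 1], [4, 2, 2], [5, 4, 3], [4, 4, 2]]
f_stat, df_cond, df_error = rm_anova_f(example)
print(f_stat, df_cond, df_error)
```

Because each subject's overall level is subtracted out before the error term is formed, stable individual differences in hormone levels do not inflate the error variance.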

Does salivary cortisol correlate with self-reported perceived stress?
Figure 2 displays a bar chart of mean salivary cortisol for each value of self-reported perceived stress. If self-reported perceived stress is associated with cortisol responses, we would expect a strong positive correlation within each day, but we find no significant relationship between self-reported perceived stress and salivary cortisol levels (at baseline, there is a positive association; on the instructional day, the effect is slightly positive; and on exam day, there is a negative association). Given the many confounds influencing salivary cortisol levels, a random effects regression model to control for repeated measures is a more appropriate analysis. This analysis reveals a non-significant association between salivary cortisol and self-reported perceived stress (β = -0.01; p > 0.1). This result indicates that even within individuals, there is no significant association between self-reported perceived stress and salivary cortisol.
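To make the idea of a within-individual association concrete, the sketch below computes a pooled within-subject slope by centring each subject's stress ratings and cortisol values on that subject's own means. Note that this is a simpler fixed-effects (within) estimator offered as an illustration, not the random effects regression model reported above, and it produces no standard error or p-value. The function name `within_subject_slope` and the records are hypothetical.

```python
from collections import defaultdict


def within_subject_slope(records):
    """Pooled within-subject slope of cortisol on perceived stress.

    records: iterable of (subject_id, stress_rating, cortisol) tuples.
    Each variable is centred on the subject's own mean before pooling,
    so stable between-subject differences do not contribute.
    """
    by_subject = defaultdict(list)
    for subject, stress, cortisol in records:
        by_subject[subject].append((stress, cortisol))

    sxy = sxx = 0.0
    for obs in by_subject.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    if sxx == 0:
        raise ValueError("no within-subject variation in stress ratings")
    return sxy / sxx


# Hypothetical observations (NOT data from the study).
records = [
    ("A", 2, 0.10), ("A", 4, 0.20),
    ("B", 5, 0.30), ("B", 7, 0.40),
]
slope = within_subject_slope(records)
print(round(slope, 3))
```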

[Figure: boxplots of salivary testosterone by condition; image not reproduced.] The whiskers represent the top 25% (upper whisker) and the bottom 25% (lower whisker) of data, unless one of the scores is greater than the upper quartile plus 1.5 times the inter-quartile range (but less than three times the inter-quartile range), which are deemed outliers and represented by an open circle (o). One male subject had extreme outliers on exam day, after both multiple choice and essay (over 250 pg/mL), and is not displayed in this figure.
We conducted one-way repeated measures ANOVA to compare the effects of small group activity, traditional lecture, multiple choice exam, and essay exam on students' testosterone levels. Mauchly's test indicated that the assumption of sphericity was violated (p < 0.01); therefore the Greenhouse-Geisser corrected tests are reported (Field, 2013). These results showed that testosterone levels are not significantly different across the five conditions (p > 0.1). Table 2 displays the pair-wise comparisons with Šidák correction and Tukey LSD post-hoc tests. Examining our pair-wise comparisons (to determine if groups were significantly different from each other), we found only one difference of note: salivary testosterone was higher following traditional lecture than after the small group activity, an effect that was significant without adjustment (Tukey LSD, p < 0.01) but only marginal with the Šidák correction (p < 0.10). There were no significant differences across any other groups.
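The gap between unadjusted and Šidák-adjusted p-values follows directly from the Šidák formula, which converts an unadjusted p-value into the probability of at least one such result across the whole family of comparisons. The sketch below assumes that the family consists of all pair-wise comparisons among five conditions (10 pairs); that family size, the function name `sidak_adjust`, and the input p-value are illustrative assumptions, not values taken from the tables.

```python
from math import comb


def sidak_adjust(p_unadjusted, n_comparisons):
    """Sidak-adjusted p-value for one test within a family of comparisons."""
    return 1.0 - (1.0 - p_unadjusted) ** n_comparisons


n_conditions = 5
n_pairs = comb(n_conditions, 2)   # all pair-wise comparisons among conditions

# An illustrative unadjusted p-value (NOT a value reported in the study).
adjusted = sidak_adjust(0.01, n_pairs)
print(n_pairs, round(adjusted, 4))
```

With this many comparisons, an unadjusted p-value near 0.01 moves close to 0.10 after adjustment, which is the kind of shift that separates the adjusted and unadjusted columns of a pair-wise comparison table.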

Discussion
Our results demonstrate that salivary cortisol and testosterone responses vary between students and across different learning and testing environments. Students participating in small group activities had salivary cortisol levels that were: a) significantly lower than at baseline and b) lower than after traditional lecture (but the effect was only significant when not adjusting for multiple comparisons). These results suggest that small group activities reduce students' physiological stress compared to baseline and lecture conditions and may be the mechanism by which small group learning activities improve student engagement and academic performance. It was unexpected that baseline salivary cortisol measures were, on average, as high or higher than other conditions (see Limitations section for possible explanations). Salivary testosterone, on the other hand, was lower after the small group learning activity than after traditional lecture. This may suggest that small group activities lead to cooperativeness between group members, as individuals tend to exhibit higher testosterone when having to compete with others, although other interpretations are possible (see Limitations section). Not only did the group-learning context have the lowest average cortisol levels, but it also showed the least amount of variation between individuals. Our interpretation of this result is that undergraduates are less physiologically stressed by group-learning contexts. In animal studies, predictability, social support, and control over one's environment all contribute to mitigate stress (Sapolsky, 2004). We hypothesize that group learning might be making use of these tactics based on our qualitative participation in and knowledge of the group-learning environment. Students may feel more in control of the pace of learning and the predictability of small group activities. 
An alternative interpretation may be that more anxious students are able to reduce participation, allowing other group members to take control over the direction of the learning process. Group learning draws on providing social support to reduce stress in the learning experience. Previous research has shown that nursing students who accessed social support perceived it as beneficial in coping with the stress of their academic program (Reeve, Shumaker, Yearwood, Crowell, & Riley, 2013). Future research should explore how small group activities affect students' feeling of control, predictability, and social support, and whether these influence stress and learning. In our study, participants remained anonymous to the instructor; therefore, the test results of each participant cannot be determined and outcome-based measures were not explored. Evidence from prior research, however, has shown that students with higher salivary cortisol before an exam tend to also have significantly lower examination scores (Ng, Koh, & Chia, 2003). Other research has shown that an individual's perceived ability in a subject area is negatively associated with cortisol response to an examination (Minkley, Westerholt, & Kirchner, 2014). While we were unable to explore the association with actual exam scores of our participants, we did collect information on students' anticipated exam score and level of exam preparation (both collected at the end of the exam), and general test-taking anxiety (collected at baseline). Results show that students who reported higher test-taking anxiety believed they would earn fewer points on the exam. Students who reported they were more prepared for the exam reported a higher expected exam score.
However, there was no significant correlation between these measures (perceived exam performance, exam preparation, or test-taking anxiety) and salivary cortisol after the multiple choice or essay portions of the exam (examined as either cortisol level or change from baseline). This links to our other results examining self-reported stress and cortisol levels, where we found no correlation across any of the learning and exam contexts for self-reported stress and salivary cortisol. This replicates some previous research that has found no correlation between salivary cortisol and self-reported stress (e.g. Murphy et al., 2010;Weekes et al., 2006), but other explanations are possible (see Limitations section). We encourage researchers to keep this in mind when using self-reported measures of stress as their only measure of this socio-biological phenomenon, as it may not be a good proxy of physiological stress.
Variation in hormonal responses across learning and testing contexts most likely responds, in part, to different preferences for learning and evaluation techniques. As always, professors may benefit from using a variety of learning activities in the classroom to reach multiple learning styles (Ambrose, Bridges, DiPietro, Lovett, & Norman, 2010). One of the mechanisms by which small group learning may increase student engagement and improve student outcomes is by reducing stress, increasing control over learning, and allowing classmates to provide each other with social support. Physiological measures of stress vary according to individual physiology and learning type and may play a role in educational outcomes in undergraduate classrooms.

Limitations
While this study is a promising first step to demonstrating the potential positive physiological benefits of small group activities, there are several limitations to our study. First, our subjects were undergraduate students at Boise State University, a largely homogenous group across age, ethnicity, and cultural background. Future research could benefit from investigating patterns of physiological stress in a broader range of learners and learning experiences.
Second, our baseline measures were higher than those in other conditions and exhibited high variation across subjects. One possible reason is that spitting into a tube in front of peers or an instructor for the first time caused participants to feel stressed, particularly shy or socially anxious subjects (Hofmann, Moscovitch, & Kim, 2006). Baseline samples collected within 15 minutes of the beginning of class may also reflect the physically and psychologically stressful experience of getting to class (e.g., commuting, pressure to arrive on time; Stutzer & Frey, 2008) or students' consumption of food or beverages within an hour of saliva collection (this information was not collected at baseline). We recommend multiple baseline collections so that students become comfortable with the collection procedure. Providing students with a private location to provide their saliva sample may also reduce the stress of spitting in front of peers and faculty.
Third, we found no correlation between self-reported stress level and salivary cortisol. While this replicates some previous research (Hellhammer, Wüst, & Kudielka, 2009; Murphy et al., 2010; Weekes et al., 2006), it is also possible that our salivary cortisol measure captured some other aspect of physiological stress (e.g., increased cortisol after exercising) or that our self-reported measure of perceived stress (using an ordinal scale) was an imperfect way to capture a person's feeling of stress.
While we did not hypothesize a particular relationship between salivary testosterone and learning or exam contexts, we found that salivary testosterone after small group learning was lower than following lecture. Our research design examined within-individual differences in testosterone, but previous research has shown that many factors (some of which vary within individuals) are associated with testosterone levels, including stress, competition, relationship status, parenting status, time of day, exposure to attractive potential partners, and gender composition of groups, among others (Booth et al., 1989; Gettler, McDade, Agustin, Feranil, & Kuzawa, 2015; Gray, Kahlenberg, Barrett, Lipson, & Ellison, 2002; Kivlighan et al., 2005; Ronay & Hippel, 2010). Similarly, salivary cortisol can be influenced by many factors, including season, age, exercise, alcohol consumption, education, academic performance measures, biological sex, hormonal birth control, prior trauma, smoking, reproductive state (pre- or post-menopausal), medications, and chronotype, among others (Badrick, Kirschbaum, & Kumari, 2007; Dowd et al., 2011; Follenius, Brandenberger, Hietter, Simeoni, & Reinhardt, 1982; Hansen et al., 2008; Hellhammer et al., 2009; Lighthall et al., 2013; Persson et al., 2008; Preuß et al., 2010; Stephens et al., 2016; Suzuki et al., 2014; Vgontzas et al., 2003). While our study design compared samples from the same individuals over time to control for stable differences between subjects, a larger sample would provide more confidence in the results and may allow for added controls for those factors that vary within an individual across time. Additionally, while we tried to control for some of these factors (by excluding participants taking anxiety medication and collecting information on timing of last meal and time since waking), it is possible that students did not accurately report this information or consumed beverages or food that they did not consider a meal. Either scenario may have inadvertently influenced their salivary hormone levels, thereby confounding our results.

Conclusion
In conclusion, this study suggests that small group learning activities may reduce salivary cortisol levels, a change linked to reduced physiological stress. Group learning activities may lead to improved learning outcomes by mitigating students' physiological distress. Although we found no evidence connecting self-reported stress to our physiological measures, small groups may be effective at mitigating stress by increasing control and predictability of the learning environment while adding social support to the learning process, which in turn may improve student engagement and academic performance. This research suggests that small group learning may not just improve academic performance; it may also contribute to reduced physiological stress and associated health benefits.