Exploring Organic Chemistry I Students’ Responses to an Exam Wrappers Intervention

Research has demonstrated that academically successful students are effective, self-regulated learners. Moreover, exam wrapper interventions have been shown to foster the development of self-regulated learning behaviors on the part of college students. In this naturalistic, qualitative, and exploratory study, an exam wrapper intervention was implemented in a key, gatekeeping STEM course at a diverse, public university. Student responses to a series of four exam wrappers were collected and analyzed. Results indicated that while many students were able to look critically at their study behaviors and course performance, these behaviors did not necessarily pay off, especially for weaker students. Notably, transfer and/or non-matriculated students were at greatest risk of withdrawal and failure. However, all students, both weak and strong, showed a lack of attention towards checking their answers and learning from their mistakes. Overall, the exam wrappers provided useful information regarding the self-regulated learning processes of these STEM students.


Introduction
The loss and attrition of qualified undergraduates from STEM majors is no longer an unfamiliar phenomenon; it has become a well-known area of study, research, and exploration (Chang, Sharkness, Hurtado, & Newman, 2014; Hunter, 2016; Malcom & Feder, 2016; Seymour & Hewitt, 1997). Research indicates that among the most recent generation of college students, more than half of all students who enter college intending to major in STEM leave STEM or fail to graduate altogether (Eagan, Hurtado, Figueroa, & Hughes, 2014). For certain ethnic minority groups, attrition rates are even higher. Eagan and colleagues (2014) report that 76% of underrepresented minority (UREM) students who enter planning to major in STEM do not complete their degrees within six years.
We reasoned that if chemistry students at our institution, which enrolls large numbers of first-generation college students (2014 UNIVERSITY NAME STUDENT EXPERIENCE SURVEY), could be encouraged to be more self-reflective about their study habits, they might improve their self-regulated learning abilities, modify their study behaviors, and improve their overall course performance. Thus, we chose to introduce exam wrappers into the Organic Chemistry I course because of their potential to impact the reflection and self-regulated learning behaviors of our students. We also chose exam wrappers as our pedagogical tool because they are a reform method that is fairly conventional, non-controversial, and easy to implement.

Participants and Course Context
This IRB-approved study was conducted at a large, urban, public university located in the Northeastern United States. The undergraduate population at this institution is highly diverse. Approximately 40% of students are from ethnic groups underrepresented in STEM, about 50% are low income, about 30% are first-generation college students (neither parent has any college education) and approximately 40% speak English as a second or third language (2014 COLLEGE NAME STUDENT PROFILE; 2014 UNIVERSITY NAME STUDENT EXPERIENCE SURVEY).
This study was conducted in an Organic Chemistry I classroom in the spring of 2017. Including students who withdrew, the course enrolled a total of 176 students. Students met in a single classroom for lecture for a 75-minute class period twice a week (lectures were taught by AUTHOR). Additionally, students met in one of six smaller groups (of approximately 30 students each) for one 50-minute recitation period per week.
Student learning in the course was assessed via a combination of quizzes and exams. Five quizzes were administered over the course of the semester in recitation. Quizzes took approximately 20 minutes to complete and were standardized across all sections. Students also completed two 75-minute midterm exams and one 120-minute final exam (exams were administered in two large classrooms). Five percent of students' course grades were allotted towards completion of four assignments that were termed self-assessments. These self-assessments were actually two pre-exam and two post-exam exam wrappers. Students' course grades were determined using a mastery-based scheme, rather than a norm-referenced scheme (Popham, 1971).

Exam Wrappers
Exam wrapper 1 (see Appendix 1) was completed in recitation immediately before students took their first quiz (at approximately week three of the semester). Exam wrapper 2 (identical in content to exam wrapper 1) was administered similarly in that it was handed out and completed in recitation immediately before students took their second quiz (at approximately week four of the semester). One hundred sixty-six students (94%) completed exam wrapper 1 and one hundred sixty-three students (93%) completed exam wrapper 2.
Exam wrapper 3 (see Appendix 2) was completed online using the course management system available through the university. Exam wrapper 3 was only made available to students during week five of the semester (before the first midterm). Exam wrapper 4 (identical to exam wrapper 3) was similarly completed online and only available during week seven of the semester (between the first and second midterm examinations). One hundred sixty-six students had completed the earlier wrappers; one hundred eleven students (63%) completed exam wrapper 3 and one hundred thirty-seven students (78%) completed exam wrapper 4. (Figure 1 shows the timing and sequence of the exam wrappers.)

Journal of the Scholarship of Teaching and Learning, Vol. 20, No. 1, April 2020. josotl.indiana.edu

Figure 1. Timing of quizzes, midterm exams and exam wrappers
Exam wrappers 1 & 2 asked students to indicate how they felt about their competence in two areas: conceptual understanding and problem-solving ability. These questions were intended to trigger self-reflection on the part of students (Zimmerman, 2002) and make them consider their degree of preparedness (Yuen-Reed & Reed, 2015). Students selected from among three choices (strong, ok, and weak) in two Likert-style questions. Additionally, students were asked to choose the extent to which they felt they could easily access help with the course when they needed it. This question was intended to lead students to consider whether or not they were seeking out and utilizing the resources available to them. Students selected from among three choices (agree, unsure, and disagree) in a Likert-style question. Exam wrappers 1 and 2 were administered immediately before students took their first two quizzes (respectively) so that students who were underprepared but unaware of it might experience some dissonance after putting their expectations down on paper and then either struggling during the quiz or discovering from their quiz score that they were not in good shape.
Exam wrappers 3 and 4 asked students to report the extent to which they felt satisfied with their performance on their most recent assessment (quiz or midterm). Students selected from among five choices (strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree) in a Likert-style question. Students were also asked to report how many hours per week they devoted to studying and how many hours they spent problem-solving (with boxes provided for numerical responses). Lastly, students were asked what they would do differently in the future to improve their course performance. (For this question, students wrote narrative responses of any length they wished and were able to include as many study strategies as they wished.) The first three questions asked in exam wrappers 3 and 4 were designed to encourage students to directly confront their feelings about how they were doing in the course, to determine how much time they were putting in overall, and to consider whether or not they were devoting sufficient time to problem-solving. The final question was intended to encourage students to self-reflect regarding their study behaviors, to consider what they ought to do differently, to set goals of changing their behaviors, and to hopefully commit to those changes by putting in writing what they intended to do differently from then on (Zimmerman, 2002).

In the course syllabus, students were assigned readings (focused on conceptual understanding) and homework assignments (focused on problem solving) for each individual class period. The distinction between these two separate ideas or skills was also discussed in class. Thus, it was expected that students would understand and be able to differentiate between the terms conceptual understanding and problem-solving ability.

Typically within 24 hours of taking an exam, students' exam scores, along with a copy of the answer key, were posted online. However, students typically did not receive their graded exam papers back until their next recitation period. Quizzes took up to two weeks to be graded and were also returned to students in recitation. Quiz grades were also posted online; however, answer keys for quizzes were not provided.
In this study, the exam wrappers were utilized both as an intervention, intending to trigger students to reflect and possibly change their behavior, and also as a source of data, as a means for the researchers to examine and learn from the reflections and reported behaviors of students. Thus, with IRB permission, after the course was over the responses that students gave to the exam wrapper questions were collected, stripped of identifying information and utilized as data.

Course Syllabus with SRL Supplement
One significant modification was made to the course syllabus to reinforce the idea that reflecting upon one's own learning is an iterative process and that making decisions, being strategic, utilizing resources, and seeking help are all important aspects of being successful in the course. This modification took the form of an extra handout entitled "How to Study for this Course". It contained a graphic display of a recommended iterative process for a student to follow and two subsections of guidance entitled "Ways to Assess Yourself" and "Help Seeking Guide". (See Appendix 3.) This handout was in addition to the usual guidance and information provided in the syllabus regarding how to be successful (e.g., spend 75% of your study time doing problem solving, come to class, get help right away if you get stuck). (See Appendix 4.) During the first day of lecture, AUTHOR reviewed the syllabus with students and explained this handout.

Website with SRL Emphasis
One factor relevant to this study is that AUTHOR maintains an extensive webpage (URL) devoted to providing students with additional resources for the Organic I course. The majority of these resources are supplementary problem sets. This is not unique, as other Organic Chemistry instructors at her institution and elsewhere (Cortes, 2017; Reusch, 2017) also provide supplementary problem sets to students through webpages and/or course management software. However, the particular presentation of AUTHOR'S Organic I homepage gives students a strong visual impression that problem solving is an extremely important aspect of the course. Furthermore, the homepage is organized such that students can quickly find either introductory or advanced practice problems (with answer keys) corresponding to each topic they learn about in lecture. Thus, students who utilize the website are encouraged to think about which topics they need to work on and are guided towards practicing in a strategic way, ideally trying to master simpler problems before moving to more challenging ones.
Another resource provided through the Organic I homepage is links to old exams (with answer keys) written by AUTHOR. The old exams are provided as additional practice problems for students. But they also illustrate what the format of exams will look like, what the level of difficulty of exams will be, and in what ways students will be expected to answer questions that require them to think across multiple chapters of their textbook while taking the exam.
One aspect of the old exams found on the website is that next to each exam question is listed the number of minutes a student should allocate toward completing it. This is a feature of AUTHOR'S exams that she includes to help students learn how to effectively pace themselves during exams, a skill that she finds is particularly challenging for many students at her institution.
Overall, there are a number of features in AUTHOR'S website that attempt to encourage students to strengthen and improve their self-regulation. However, these features have been in place on her website in a consistent fashion for a number of years. Therefore, for the purposes of this study, no modifications were made to the website.

Course Performance
Student scores on all assessments (quizzes, midterm examinations, and the final) were collected, stripped of identifying information, and utilized as data.

Prior Performance in Chemistry
Previously, AUTHOR and her colleagues (2013) had established that for students at her institution, the letter grades they received in General Chemistry II (the immediate prerequisite for Organic Chemistry I) were a good predictor of performance in Organic Chemistry I. Specifically, if General Chemistry II was taken at our institution, 49% of the variability in performance in Organic Chemistry I was explained by performance in General Chemistry II. Therefore, for the purposes of the current study, we obtained transcripts from all students and collected General Chemistry II letter grades for students who completed General Chemistry II at our institution. (Letter grades for students who had taken General Chemistry II outside of our institution were not collected.) These letter grades were combined with the exam wrapper and course performance data described above, and then all identifying information was removed.

Data Analysis
Exam wrappers 1 & 2. On exam wrappers 1 and 2, students were prompted to report the degree to which they felt competent (strong, ok, weak) in two areas: conceptual understanding and problem-solving ability. Students' self-perceptions were then compared to their actual quiz scores and classified as accurate, overestimates, or underestimates (see table 1).
A self-rating of weak on exam wrapper 1 was determined to be an underestimation if the student's quiz grade was above 45% and accurate if the student's grade was less than or equal to 45%. (An exam average of 45 was the approximate cutoff for passing the course.) A self-rating of ok was judged to be an overestimation if the student's grade was less than 46% and an underestimation if their grade was 80% or higher (and accurate if the grade was 46-79%). A self-rating of strong was gauged as an overestimation if the student's grade was less than 80% and deemed accurate if their grade was equal to or greater than 80%.
Because the subject matter of quiz 2 was more difficult than that of quiz 1 (and the student average on quiz 2 was 10 points lower than on quiz 1), more lenient criteria were utilized to define what was considered an overestimate, an underestimate or an accurate self-assessment for quiz 2. A self-rating of weak on exam wrapper 2 was gauged as an underestimation if the student's grade was above 35% and accurate if the student's grade was less than or equal to 35%. A self-rating of ok was categorized as an overestimation if the student's grade was less than 36% and an underestimation if their grade was 70% or higher (and accurate if the grade was 36-69%).
A self-rating of strong was gauged as an overestimation if the student's grade was less than 70% and deemed accurate if their grade was equal to or greater than 70%.
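The threshold scheme above can be summarized in a short sketch. This is a minimal illustration rather than the authors' actual analysis code; the function name and data layout are assumptions, and the thresholds are those reported for quizzes 1 and 2.

```python
# Per-quiz cutoffs: (approximate passing cutoff, "strong" cutoff),
# as described in the text for quizzes 1 and 2.
CUTOFFS = {1: (45, 80), 2: (35, 70)}

def classify_rating(quiz: int, rating: str, score: float) -> str:
    """Label a self-rating as 'accurate', 'overestimate', or 'underestimate'."""
    low, high = CUTOFFS[quiz]
    if rating == "weak":
        # Weak is accurate at or below the passing cutoff.
        return "accurate" if score <= low else "underestimate"
    if rating == "ok":
        # Ok is an overestimate below the passing cutoff,
        # an underestimate at or above the "strong" cutoff.
        if score <= low:
            return "overestimate"
        return "underestimate" if score >= high else "accurate"
    if rating == "strong":
        # Strong is accurate only at or above the "strong" cutoff.
        return "accurate" if score >= high else "overestimate"
    raise ValueError(f"unknown rating: {rating}")
```

For example, a student who rated themselves ok and scored 50 on quiz 1 would be classified as accurate, while a self-rated strong student scoring 65 on quiz 2 would be classified as an overestimate.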

Exam wrappers 3 & 4: Study time and problem-solving time.
Questions on exam wrappers 3 and 4 prompted students to reflect on the number of hours they spent studying as well as the amount of time they spent practicing problems. The sample size, mean, standard deviation, and range of these reported hours were calculated. Outliers, participants whose reported hours were more than one standard deviation from the mean, were highlighted for possible further analysis. Differences between students' reported hours at the time of exam wrapper 3 and exam wrapper 4 were calculated.
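The outlier rule just described (reported hours more than one standard deviation from the mean) can be sketched as follows. The function name and sample data are illustrative assumptions, and a population standard deviation is used.

```python
from statistics import mean, pstdev

def flag_outliers(hours):
    """Return reported-hour values lying more than one SD from the mean."""
    m, sd = mean(hours), pstdev(hours)
    return [h for h in hours if abs(h - m) > sd]
```

With illustrative data such as `[2, 3, 4, 5, 20]`, only the 20-hour report lies more than one standard deviation from the mean and would be highlighted for further analysis.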
Exam wrappers 3 & 4: Future plans. The final question of exam wrappers 3 and 4 asked students to describe what they would change or do differently in their future studying. Student responses were text-based and averaged approximately 30 words in length. Outliers, participants with word counts more than one standard deviation from the mean, were highlighted for possible further analysis.
Student written responses (to the question regarding what they planned to change in their study habits) were coded according to the following procedure. Four coders independently reviewed four different subsets of the student responses. (In total, one third of the student entries were reviewed.) In keeping with the exploratory nature of the study, the coders did not approach coding with preconceived or a priori ideas of what the codes should be. Rather, we allowed themes to emerge from the data. Preliminary lists of themes and categories observed in the data were generated independently by each of the four coders. The coders then met and compared their preliminary lists. The preliminary lists were organized and collapsed into four codes, each of which contained a number of sub-codes (see table 14).

The code Study Behaviors referred to the many types of behaviors that students stated they planned to engage in, for example reading the textbook, working on practice problems, or reviewing their lecture notes. Each specific Study Behavior described by a student was given a separate sub-code. For example, reading the textbook and working on practice problems were assigned the sub-codes Textbook and Problem Solve.

The code Strategic Behaviors & Decisions was created to capture behaviors that students described which could best be characterized as being strategic about their studying. For example, choosing to do a little bit of problem solving every day, rather than saving it all for the weekend, or choosing to read the textbook before lecture rather than after, were both coded under Strategic Behaviors & Decisions with the sub-code Timing Specific, whereas choosing to work on advanced problems rather than simple problems was also coded under Strategic Behaviors & Decisions, but with the sub-code Which Problems.
The code Help Seeking referred to the different ways in which students described how they would try to get help in the course. Sub-codes were created for the seeking of human help (HHelp), as in help from an instructor, TA, or tutor; electronic help (EHelp), as in help from an online resource like a tutorial or video; and help from a physical resource (PHelp), like a review book or molecular model set.
A code or category was created called What's Going on With Me to capture descriptions that students gave that did not fit under categories of study strategies or behaviors, but rather described emotional or psychological states. For example, a few of the sub-codes in this category were Anxious, Unsure, Confident, Careless and Overwhelmed.
A code book listing all the codes, sub-codes, and their definitions was created. After this, the data entries were divided into three equal portions. Each third of the data was coded independently by two coders. (Six coders in total were utilized.) Afterwards, all six coders met as a group and went through each data entry one by one, comparing the two sets of codes from the pair of coders against one another and against each data entry. Together, the group of six coders came to agreement on what the most complete and accurate codes should be for each data entry. Often the consensus codes matched the original codes assigned by the two coders, but occasionally errors or oversights were caught through this process. This method of meeting as a group of six and going through each data entry one by one proved useful, as it allowed for a thorough, complete, and detailed analysis without the coders reaching consensus prematurely. Occasionally, as a result of this process, a few new sub-codes had to be defined and created, and a few clarifications or refinements of existing codes had to be made. The code book was updated accordingly and modifications to the already coded data were made. After a complete listing of all the sub-codes for each data entry was compiled, tallies were taken to determine the number of times each sub-code was cited.
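The final tally step can be sketched with a simple counter. The entries below are invented examples, though the sub-code names come from the code book described above.

```python
from collections import Counter

# Each entry's agreed-upon list of sub-codes (illustrative data only).
coded_entries = [
    ["Problem Solve", "Textbook"],
    ["Problem Solve", "HHelp"],
    ["Timing Specific", "Problem Solve"],
]

# Count how many times each sub-code was cited across all entries.
tally = Counter(code for entry in coded_entries for code in entry)
```

With these example entries, Problem Solve would be tallied three times and HHelp once.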
Exam 2 − exam 1. Student scores on exam 1 were subtracted from their scores on exam 2. Participants were then sorted into four categories based on that difference: improved, worsened, no change, or NA (did not take exam 2).
Prior performance in chemistry. Students were coded as at risk for not succeeding in Organic Chemistry I if they had scored a grade of C plus or lower in General Chemistry II at our institution and as not at risk if they had scored a B minus or higher. Students who had not taken General Chemistry II at our institution (non-matriculated and transfer students) were coded as unknown risk.
Course performance. Students were grouped into categories based on their performance in the Organic course. For students entering the Organic course not at risk, satisfactory performance was defined as completing Organic with a grade of B minus or above. For students entering the course at risk or with unknown risk, satisfactory performance was defined as completing the course with a C minus or above. Students who did not meet these criteria were categorized as having unsatisfactory performance. (Students who withdrew from the course were categorized separately.)

Student categories. Nine categories of students were differentiated based on a) the level of risk of the students entering the course and b) their actual performance in the course (see table 12). These categories were analyzed and compared across areas of interest, such as confidence in conceptual understanding and problem-solving ability, accuracy of self-ratings compared to subsequent quiz performance, hours reported studying and practicing problems, and planned changes to study behaviors.
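Taken together, the risk and performance coding rules can be sketched as follows. This is a minimal illustration with hypothetical function names; the letter-grade ordering is an assumption and lists only the grades the rules reference.

```python
# Assumed letter-grade ordering, lowest to highest (illustrative only).
GRADE_ORDER = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]

def at_least(grade: str, floor: str) -> bool:
    """True if `grade` is at or above `floor` in the assumed ordering."""
    return GRADE_ORDER.index(grade) >= GRADE_ORDER.index(floor)

def risk_category(gen_chem_grade):
    """Code risk from the General Chemistry II grade (None = not taken here)."""
    if gen_chem_grade is None:
        return "unknown risk"
    return "not at risk" if at_least(gen_chem_grade, "B-") else "at risk"

def performance_category(risk: str, organic_grade: str, withdrew: bool = False) -> str:
    """Code Organic course performance; the satisfactory floor depends on risk."""
    if withdrew:
        return "withdrew"
    floor = "B-" if risk == "not at risk" else "C-"
    return "satisfactory" if at_least(organic_grade, floor) else "unsatisfactory"
```

Under these rules, for example, a student with a C+ in General Chemistry II is coded at risk, and an at-risk student finishing Organic with a C is coded as performing satisfactorily.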

Exam Wrappers 1 & 2
Students were asked to describe their confidence in their conceptual understanding and problem-solving ability. In both exam wrappers 1 and 2 (see table 2), approximately 70% of students reported feeling ok about their understanding and ability. Additionally, the percentage of students who characterized their conceptual understanding as weak or strong changed only minimally from exam wrapper 1 to 2. However, with regard to confidence in problem-solving ability, there was a notable increase from exam wrapper 1 to 2 in the percentage of students who felt they were weak, as well as a sizeable decrease in students who felt they were strong.

Table 2. Students' confidence in their conceptual understanding & problem-solving ability
When comparing students' self-assessments to their actual quiz performances (see table 3), student accuracy was low, ranging from approximately thirty to forty percent. Student accuracy also decreased somewhat from exam wrapper 1 to 2. Furthermore, weak students (who scored 45 or below on quiz 1, or 35 or below on quiz 2) were highly likely to overestimate their abilities, while strong students (who scored 80 or above on quiz 1, or 70 or above on quiz 2) were highly likely to underestimate their abilities. (We defined satisfactory performance as B minus or better for not at risk students because we assume that a C+ or worse will hinder future progress for these students; Hrabowski, 2016.)

The final question on exam wrappers 1 and 2 surveyed students' feelings about how easily they could obtain help with the course material when needed (see table 4). Students were prompted to select a response of either agree, unsure, or disagree from a 3-point Likert-type scale. The most frequently selected response for both exam wrappers 1 (80%, n=166) and 2 (79%, n=163) was agree. Only 3% of participants chose disagree as their response to this question.

Exam Wrappers 3 & 4
In exam wrapper 3, student reports of satisfaction with their course performance (see table 5) spread in a bell-shaped distribution, with the majority of students reporting a neutral, mildly positive, or mildly negative attitude. (At this point in the semester, students had only received grades back on two quizzes, which had a combined average of 72 and counted only as 5-10% of their final course average.) However, by the time of exam wrapper 4, there was a large shift in student satisfaction, with nearly 70% of students reporting dissatisfaction with their course performance. (Students filled out exam wrapper 4 shortly after receiving back their scores on exam 1, which had an average of 57% and counted as 20% of their final course average.)

Table 3. Students' accuracy regarding their conceptual understanding & problem-solving ability

Grandoit, Bergdoll, Rosales, Turbeville, Mayer, and Horowitz

Table 5. Student agreement that they are satisfied with their course performance

Table 6 shows student responses on exam wrappers 3 and 4, indicating the number of hours spent studying and practicing problems. The changes in reported study and problem-solving time from wrapper 3 to 4 showed that, on average, students increased their study time by only 0.2 hours and their problem-solving time by 0.5 hours (see table 7).

Table 7. Increase in reported study hours from exam wrapper 3 to 4?
The length of participant responses to the question "What are you going to do differently from now on?" is reported in table 8. The average response was about 30 words in length. However, 21% and 23% of respondents (in wrappers 3 and 4, respectively) wrote responses of fewer than 10 words.

Because exam wrappers 3 and 4 asked students to report their overall study times and problem-solving times, students who reported an intention (on wrapper 3) to change their behavior by increasing their study time or problem-solving time were checked to see whether they followed through on their intentions. Only 22% of students indicated that they would increase their overall study time, and only about half of these students fulfilled their intention. Fifty percent of students indicated that they would increase their problem-solving time; similarly, only about half of those students followed through on their intention (see table 9).
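The follow-through check can be expressed as a one-line rule. The field names are hypothetical, and "increase" is taken here to mean any rise in reported hours from wrapper 3 to wrapper 4.

```python
def followed_through(intended_increase: bool, hours_w3: float, hours_w4: float) -> bool:
    """A student follows through only if they stated an intention to increase
    their hours on wrapper 3 AND their reported hours rose by wrapper 4."""
    return intended_increase and hours_w4 > hours_w3
```

A student who planned to study more but reported the same or fewer hours on wrapper 4 would thus be counted as not following through.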

Exam 2-Exam 1
Student performance on exams 1 and 2 was compared and differences were calculated (see table 10). Approximately half of students improved their scores and one-third worsened. An additional 15% did not take exam 2. (All students who did not take exam 2 also did not take the final exam.)

Prior Performance in Chemistry
Based on their prior performance in General Chemistry II, students were grouped into three categories indicating whether or not they were at risk of not succeeding in the Organic course. Each category contained approximately one-third of all students (see table 11).

Course Performance
Overall, 63% of students performed satisfactorily in the Organic course, 29% performed unsatisfactorily, and 8% withdrew. Students of unknown risk were least likely to perform satisfactorily with only 55% of them successful, while 74% of not at risk and 71% of at-risk students were successful. (See table 12.)

Student Categories
Students were grouped into nine categories based on their risk when entering the course and their satisfactory performance in the course or lack thereof (see table 12). Seven of these nine categories were subjected to further analysis. Two categories were excluded (at risk & withdrawal, not at risk & withdrawal) because they each comprised only 1% of the student population.

Exam Performance
Despite the fact that the second exam covered more challenging material than the first exam, the majority of satisfactorily performing students improved their performance from exam 1 to exam 2.

Estimation of Abilities
While results consistent with a Dunning-Kruger effect (Kruger & Dunning, 1999) were observed for the overall population (see table 3), unsatisfactorily performing students, overall, did not tend to overestimate their abilities (see table 13). Satisfactorily performing students, however, were found to underestimate their abilities (χ2(1) = 26.0867, p < 0.001), while unknown risk students who withdrew were somewhat likely to overestimate their abilities (χ2(1) = 16.264, p < 0.000055; see table 13).

Reported Study Times
Satisfactorily performing students did not necessarily put in more study time or problem-solving time than unsatisfactorily performing ones. Furthermore, when unsatisfactorily performing students did increase their problem-solving time, this increase did not result in success. However, not at-risk students were somewhat more likely to indicate an intention (at wrapper 3) to increase the amount of time they were going to devote to problem solving. (See

Exam Wrappers 3 & 4 and Unknown Risk Students
Unknown risk students differed from the other categories of students in a number of ways. Unknown risk, unsatisfactorily performing students were the only category whose rate of completion of the exam wrappers decreased from wrapper 3 to wrapper 4. At the time of wrapper 3, they were the most dissatisfied with their course performance as compared to the other categories. They reported the lowest average hours spent studying and doing practice problems. They also had the highest percentage of students who took neither exam 2 nor the final exam.
The unknown risk, withdrawal students had the lowest completion rate of exam wrappers 3 and 4. Only three students (27%) completed wrappers 3 and 4. These students reported the greatest decrease in time spent studying and doing practice problems from wrapper 3 to 4, yet the greatest number of hours spent studying at the time of wrapper 3.

Planned Study Behaviors
Problem solving plans. At the time of wrapper 3, large numbers of students in all categories indicated that they intended to devote more time to problem solving. However, by wrapper 4, almost none of the not at risk, satisfactorily performing students indicated that they needed to devote additional time to problem solving. Yet over 70% of the not at risk, unsatisfactorily performing respondents reported that they still intended (and needed) to devote additional time to problem solving (see table 14).
Behaviors not reported. Plans to adopt behaviors such as joining a study group, attending office hours, reviewing lecture notes, checking one's answers against a key, or learning from one's mistakes were rarely (or never) reported by students of any category (see Appendix 1).