Contrasting Traditional In-Class Exams with Frequent Online Testing

Although there are clear practical benefits to using online exams compared to in-class exams (e.g., reduced cost, increased scalability, flexible scheduling), the results of previous studies provide mixed evidence for the effectiveness of online testing. This uncertainty may discourage instructors from using online testing. To further investigate the effectiveness of online exams in a naturalistic situation, we compared student learning outcomes associated with traditional in-class exams and with frequent online exams. Online exams were administered more frequently in an attempt to mitigate potential negative effects associated with open-book testing. All students completed in-class and online exams, with the order of testing condition (in-class first or online first) counterbalanced between students. We found no difference in long-term retention for material that had originally been tested using frequent online or traditional in-class exams and no difference in self-reported study time. Overall, our results suggest that frequent online assessments do not harm student learning in comparison to traditional in-class exams and may impart positive subjective outcomes for students.

A challenging reality in higher education is that fewer funds are available to accomplish the same, or greater, educational goals. Student enrollments at public institutions have increased steadily over the last few years, while the average appropriation per student has decreased (State Higher Education Executive Officers, 2012). As a result, many institutions have increased course enrollments (often accompanied by fewer sections of each course) and added large online sections of courses in an attempt to offset education costs. Instructors are, thereby, tasked with providing educational opportunities for more students without sacrificing quality. Technology can be used to offload some of the cost associated with larger classes. For instance, a relatively simple course change, such as administering exams online instead of in class, can free class time for additional instruction and can foster the use of educational testing techniques (e.g., repeated testing). Nevertheless, instructors might be hesitant to adopt online exams in place of in-class exams because they are uncertain about the relative effectiveness of online exams. We investigated how exam performance and content retention are affected by changing the testing format from traditional in-class exams to online exams. Specifically, we examined both immediate (unit exams) and long-term outcomes (comprehensive exam) associated with in-class and online test delivery methods. Our results provide unique insight into the impact of online testing on student retention of course content.
Whenever a major pedagogical change is made, instructors ought to consider potential advantages and disadvantages associated with their design decision. Transitioning from in-class exams to online exams is no exception. There are several logistical benefits to online administration of exams. Web-based content delivery systems (e.g., Blackboard, Moodle, Canvas) can be used to administer and automatically grade student exams. This saves time and money as the exams do not have to be printed or manually graded. Online testing also saves class time. The time that would have been devoted to test taking can be used for other activities. Although setting up online exams can also be time consuming (Brewster, 1996), the time cost is recouped in large classes and in cases where exams or exam questions can be reused (e.g., for those teaching multiple sections or teaching the same course in the future). In addition to cost savings, online testing can provide students with more scheduling flexibility, allowing them to have more control over when and where they test. Therefore, online testing appears to provide economic benefits and convenience. Clearly, these practical advantages for faculty and administrators have facilitated the transition from paper to digital exams.
Another benefit of using online examinations is that they can promote the use of testing as a learning mechanism. Although traditionally used to assess student learning, exams also serve as learning opportunities. The testing effect is the finding that simply being tested over information can increase the likelihood of later recalling that information (McDaniel, Anderson, Derbish, & Morrisette, 2007). The benefits of retesting are greater than those obtained when individuals restudy information and can occur when no feedback is provided (e.g., Roediger & Karpicke, 2006). Similarly, testing can lead to increased retention of information that is not even directly retested (retrieval-induced facilitation; Chan, McDermott, & Roediger, 2006). For example, Chan (2010) had participants read a passage and then immediately tested them over the information. After a delay, participants were given a final comprehension test in which the same questions were tested again (retest condition) and new questions on the same passage were tested (related condition). Although the largest memory benefits came from direct retesting, performance on the related questions was significantly higher than in the control condition (passage with no previous testing or related testing). Even though the benefits are clear, repeated testing is costly. In fact, when instructors are polled on the topic, they report that the primary disadvantage of using frequent exams is the time required to administer the exams (Bacdayan, 2004). Online examinations provide a mechanism to increase exam frequency without losing instructional time.
Even with the logistical and educational advantages offered by the use of online examinations, it remains important to remember the student and, ultimately, his or her learning outcomes. Would students have different outcomes if they were tested online instead of in the classroom? Alexander, Bartlett, Truell, and Ouwenga (2001) compared the exam scores of students who were tested in a traditional classroom setting (paper and pencil) to students tested on computers in a proctored lab. No significant difference in test scores was found between the two groups. Results like this are promising, but they do not fully address our question as students are rarely proctored during online exams.
There are many uncontrolled factors when exam periods are unsupervised. One prominent issue is that students might use notes or textbooks during an online exam when they are not permitted to use those materials during an in-class exam. Apart from the issue of academic honesty, it is possible that these behaviors affect learning. For example, Brothen and Wambach (2001) found that quizzing as a study technique is ineffective compared to traditional study methods if students look up the answers while taking the quiz. However, Agarwal and Roediger (2011) obtained contradictory results. They assigned participants to take open- or closed-book tests and then tested their comprehension again after two days. At initial test, participants in the open-book condition scored higher on the test. After two days, though, there was no significant difference between the groups in comprehension. In a follow-up study (Experiment 2), participants were told to expect an open- or closed-book test in the future, but were given a surprise test in the interim to see how the groups' preparations for the test differed. They found that participants expecting an open-book test had studied less and scored lower on the test than those who were expecting a closed-book test in the future.
Still and Still. Journal of Teaching and Learning with Technology, Vol. 4, No. 2, December 2015. jotlt.indiana.edu
When examining these findings, it is difficult to predict how student comprehension might be affected by the use of online testing in place of traditional in-class exams. Online and in-class exams can produce equivalent comprehension effects under proctored situations (Alexander et al., 2001), but our online exams would not be proctored. Because online exams would be unsupervised, we assumed that student behaviors would change: students might use their notes, which may or may not have an effect (Agarwal & Roediger, 2011; Brothen & Wambach, 2001), and students might study less for the online test because they plan to use their notes (Agarwal & Roediger, 2011). Although we could not know how the unsupervised nature of the online test would affect student behavior and comprehension, we did want to reduce the likelihood of students adopting clearly maladaptive study habits. We were concerned that when students knew an upcoming exam was to be taken online (unsupervised, open-book), they would not study as much as when they had an upcoming exam to be taken in class (closed-book). In an attempt to counteract this behavior, we administered online exams twice as often as in-class exams (the online exams were half the length of in-class exams). Not only do students often prefer more frequent exams (Bangert-Drowns, Kulik, & Kulik, 1991), there is evidence that frequent testing can enhance learning (Landrum, 2007; McDaniel, Agarwal, Huelser, McDermott, & Roediger, 2011). Landrum (2007) found greater comprehension for students who took weekly in-class quizzes compared to those who took traditional unit exams; further, the benefits were greatest for the bottom third of students. It is possible that the comprehension benefits associated with frequent examination are related to changes in study habits. Although it is well known that students should space their study episodes over time (e.g., study a little every day) to maximize learning outcomes (e.g., Rohrer & Pashler, 2007), many students believe cramming is an effective means for achieving high exam scores (Taraban, Maki, & Rynearson, 1999). Therefore, we hoped that by having more frequent online examinations, students would feel compelled to adopt a study strategy that was more similar to one they would use if they were only taking traditional in-class examinations; that is, we hoped they would study more often.
In addition to taking online and in-class exams, students completed a comprehensive exam and two reflective surveys. From an educational standpoint, the inclusion of a comprehensive exam at the end of the semester was important. If one of the goals of instruction is to facilitate long-term retention, it seems comprehension should be tested after a substantial delay (e.g., more than a week). Using the data from the exams and surveys, we were able to 1) compare long-term retention of material tested online to material tested in class, 2) compare performance on in-class and online exams, 3) compare the amount of time students study in preparation for frequent online exams and for less frequent in-class exams, and 4) consider additional benefits that may be associated with online exams (e.g., student subjective experience).

Participants
The university institutional review board approved all experimental procedures. Students (N = 139) from two sections of Introductory Psychology taught by the same instructor participated in the study. Participants received course credit in exchange for agreeing to participate.

Materials and Procedure
David Myers' (2009) introductory psychology textbook, Psychology in Everyday Life, was the content foundation for the course. We wrote the exam questions (multiple choice with four alternatives) to reflect content that appeared in the textbook and was addressed in class. We used the same exam questions and time limits (i.e., an average of one minute per question) for both online and in-class exams to better equate the testing conditions.
The course was originally designed with four multiple-choice exams ("unit exams" with approximately 50 questions each) being the primary method of assessment. In this traditional format, the exams would be administered in class every 3-4 weeks. For this study we modified the traditional format by replacing unit exams with shorter, and more frequent, online exams (approximately 25 questions each). Thus, in preparing the course for this study we constructed a total of eight online exams from the existing four traditional unit exams. Because each online exam assessed half of the content of a traditional unit exam, individual online exams were worth half the points (11% of final grade) of a traditional in-class unit exam (22% of final grade).
Testing condition was manipulated within subjects, with each student taking two traditional in-class exams and four, more frequent, online exams. Students were assigned to one of two testing orders according to their class section: in one section students took two in-class exams during the first half of the semester and four online exams during the second half of the semester; in the other section the order was reversed. This counterbalancing was intended to decrease the influence of confounding factors like fatigue (e.g., lower motivation at the end of the semester) and practice effects (e.g., familiarity with the instructor's questioning style, more effective study or organizational strategies).
Online exams were administered through Blackboard (http://www.blackboard.com/). Students were given a three-day window to take the exam. They were encouraged to take online exams in a campus computer lab to reduce the chances of technical problems, and they were encouraged to organize their notes to facilitate quick searches; however, neither suggestion was enforced. By contrast, notes and other materials were not allowed during in-class exams because we wanted to maintain a typical in-class testing environment. To minimize other potential confounds, we controlled several aspects of the testing condition: students could not retake exams, no immediate feedback was provided, and the same time limits were enforced in both testing conditions.
In addition to taking exams for normal course assessment, students took an in-class, 26-question comprehensive final (questions came from each of the previous exams) at the end of the semester. When analyzing the results of the final, we only used questions that provided some discrimination between students (i.e., those with point-biserial values above +0.3). The purpose of including a comprehensive final exam was to assess retention of information that had been tested via traditional in-class exams and information that had been tested via more frequent online exams. This assessment provides a way to see if student learning was adversely affected by the use of online exams. Because the comprehensive exam score was not included in students' final grades, they were given an incentive: they received extra credit if they scored 80% or higher.
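The item-discrimination screen described above can be illustrated with a minimal sketch. It assumes each student's responses are stored as a list of 0/1 values and that the criterion score is the uncorrected total (the item itself is included in the total); the source does not say how the point-biserial values were computed, and the function names and data layout here are hypothetical.

```python
from statistics import mean, pstdev


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between a 0/1 item and total scores."""
    n = len(item_correct)
    if n == 0 or n != len(total_scores):
        raise ValueError("inputs must be non-empty and of equal length")
    passed = [t for c, t in zip(item_correct, total_scores) if c == 1]
    failed = [t for c, t in zip(item_correct, total_scores) if c == 0]
    sd = pstdev(total_scores)  # population SD of the criterion scores
    if not passed or not failed or sd == 0:
        return 0.0  # no variation, so the item cannot discriminate
    p = len(passed) / n  # proportion answering the item correctly
    return (mean(passed) - mean(failed)) / sd * (p * (1 - p)) ** 0.5


def discriminating_items(responses, threshold=0.3):
    """Return indices of items whose point-biserial exceeds the threshold.

    responses: one list of 0/1 item scores per student (assumed layout).
    """
    totals = [sum(row) for row in responses]
    keep = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        if point_biserial(item, totals) > threshold:
            keep.append(j)
    return keep
```

With this layout, calling `discriminating_items(responses)` returns the question indices that would be retained for the analysis of the final.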
Finally, students were asked to complete two short reflective surveys in class. At midsemester students were asked to estimate the number of minutes spent studying for each exam. At the end of the semester students were asked to indicate their testing preferences and practices (e.g., preferences for online or in-class testing, subjective test difficulty, self-reported test anxiety, study habits).

Results
All statistical tests used an alpha level of .05. Several dependent measures were used to determine the effectiveness of frequent online testing. First, performance on the cumulative final for questions initially tested online, compared to questions initially tested in class, was used to gauge retention differences related to testing conditions. Those data were then categorized by the students' final course grade (only includes exam scores, no assignments or extra credit) to examine differential effects on subsets of students. Letter grades in the course were assigned such that students earning 90-100% of the points earned an A, 80-89% earned a B, 70-79% earned a C, 60-69% earned a D, and anything below 60% earned an F. We examined these differential effects because low-performing students have been shown to benefit more from frequent quizzing than other students (Landrum, 2007). Second, a comparison of in-class and online exam scores was used to determine whether or not subsets of students (groups based on final grade) were immediately impacted by the testing manipulation. Finally, student responses on the reflective survey were used to examine testing condition effects on study habits and testing preferences.
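As an illustration of the grouping step, the following sketch assigns letter grades from the percentage of exam points earned and collects students by grade. The cutoffs come from the text; how fractional percentages between bands (e.g., 89.5%) were handled is not stated, so treating each cutoff as a lower bound is an assumption, as are the function names and the dictionary input.

```python
from collections import defaultdict

# Lower bounds for each letter grade, taken from the grading scale in the text.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def letter_grade(percent):
    """Map a percentage of exam points to a letter grade."""
    for cutoff, letter in GRADE_CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


def group_by_grade(exam_percents):
    """Group student identifiers by letter grade.

    exam_percents: dict of student id -> percent of exam points (assumed layout).
    """
    groups = defaultdict(list)
    for student, percent in exam_percents.items():
        groups[letter_grade(percent)].append(student)
    return dict(groups)
```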

Impact of the Testing Manipulation on Comprehension
At the end of the course, students completed a comprehensive exam; half of the questions had appeared before on online exams, half on in-class exams. A paired samples t-test examining student performance on comprehensive exam questions failed to reveal a significant difference between performance on content assessed through online exams (M = .54, SD = .18) and in-class exams (M = .57, SD = .17), t(101) = -1.61, p = .11.
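For readers who want to see the mechanics of the paired comparison, here is a minimal sketch of a paired samples t statistic. It assumes two equal-length lists holding each student's proportion correct on online-tested and in-class-tested questions; the function name is hypothetical and the study's raw data are not reproduced. Because the degrees of freedom equal the number of pairs minus one, a reported t(101) corresponds to 102 students with scores in both conditions.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired samples t-test on matched scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample SD of the within-student differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1
```

A p value would then be looked up from the t distribution with the returned degrees of freedom.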
Despite failing to reject the null, it is possible that subgroups of students did benefit from frequent online testing. To investigate this possibility, a second set of analyses was conducted after dividing students into groups according to their final grade (see Figure 1). A repeated measures ANOVA examining online and in-class comprehension as a function of final course grade (between subjects factor) revealed no significant main effect of testing condition, F(1, 96) = 2.67, p = .11, ηp² = .03, no significant main effect of final grade, F(4, 96) = .64, p = .64, ηp² = .03, and no interaction, F(4, 96) = 1.16, p = .34, ηp² = .05. The most noteworthy, although not statistically significant, result was performance for students earning an A for a final grade, t(13) = -1.68, p = .12. They scored numerically lower on comprehension questions over material that had been tested online (M = .51, SD = .20) than on material tested in class (M = .63, SD = .22). When these results are taken together, frequent online testing did not provide a significant benefit to long-term comprehension of course content.

Impact of the Testing Manipulation on Exam Performance
Overall, students earned higher scores on online exams (M = .75, SD = .20) than on in-class exams (M = .70, SD = .14), t(124) = 3.35, p = .001. This general finding, however, does not provide an adequate description of the results. As before, we followed up with an ANOVA in order to examine possible differences between subgroups of students. There was a main effect of testing condition, F(1, 120) = 9.81, p = .002, ηp² = .08, and a main effect of final grade, F(4, 120) = 338.38, p < .001, ηp² = .92, and both were qualified by an interaction, F(4, 120) = 9.99, p < .001, ηp² = .25. Paired samples t-tests were used to examine the interaction (see Table 1 for descriptive and inferential statistics). Students earning an A, B, or C scored significantly higher on the online exams than the in-class exams. No effect was found for students earning a D in the course. Students earning an F, by contrast, scored lower on online than in-class exams. We believe this occurred because some students simply forgot to take the online exams, thereby earning zero points. Based on these data, it appears online testing may inflate the grades of some students. However, there are many potential reasons for this inflation, some of which could be controlled. For example, students who score higher on online exams might have better organizational strategies than other students (something we cannot control), or they may have collaborated with classmates (something that can be minimized through the use of randomly selected questions from a large database; Daniel & Broida, 2004).
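The partial eta squared values reported with each F test can be checked from the F statistic and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies that identity; the function name is ours, and the inputs are the values reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported for the exam performance ANOVA.
condition = partial_eta_squared(9.81, 1, 120)
final_grade = partial_eta_squared(338.38, 4, 120)
interaction = partial_eta_squared(9.99, 4, 120)
```

Rounded to two decimals, these reproduce the reported effect sizes of .08, .92, and .25, and the same identity reproduces the values reported for the comprehension ANOVA.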

Impact of the Testing Manipulation on Study Time
At midsemester, students were asked to estimate the amount of time they had spent studying for each exam. Half of the students had only taken online exams and half had only taken in-class exams. Because online exams assessed half as much content as in-class exams, we multiplied students' average reported study times for online exams by two in order to have a fair comparison with study times for in-class exams. An independent samples t-test failed to reveal a significant difference between reported study time for online exams (M = 129 minutes, SD = 98) and in-class exams (M = 108 minutes, SD = 93), t(91) = 1.078, p = .284.
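The scaling and comparison described above can be sketched as follows. The doubling of online study times comes from the text; the use of a pooled-variance (Student's) t-test is an assumption suggested by the integer degrees of freedom, since the source does not name the variant, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def scale_online_minutes(minutes_per_online_exam, factor=2):
    """Double online study times so they cover as much content as one in-class exam."""
    return [m * factor for m in minutes_per_online_exam]


def independent_t(a, b):
    """Return (t, df) for a pooled-variance independent samples t-test."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least 2 observations")
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        raise ValueError("both groups have zero variance")
    se = sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2
```

Under this assumption, the reported t(91) implies 93 students contributed study time estimates across the two groups.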

Student Preferences
Descriptive data regarding students' subjective experience of online and in-class exams revealed that 74% of the students preferred online exams. In addition, 83% of students with self-reported test anxiety preferred online exams. Finally, even though the same questions and time constraints were used for online and in-class exams, 75% of students reported that in-class exams were more difficult.

Discussion
The purpose of this study was to examine the ramifications of implementing frequent online exams compared to traditional in-class unit exams. Based on previous research, it was not clear what those ramifications would be. While the actual format of the exam (online vs. paper and pencil) has little or no impact on performance (Alexander et al., 2001), student strategies may differ based on the format, causing a difference in performance. For example, students may adopt different study strategies when they know they have an open-book exam compared to a closed-book exam (Agarwal & Roediger, 2011). Similarly, they may adopt different testing strategies for unsupervised online exams compared to traditional in-class exams; specifically, they might use their notes or textbooks for online exams. The evidence is mixed as to whether or not there is a detrimental effect on comprehension when students look up the answers to questions (cf. Agarwal & Roediger, 2011; Brothen & Wambach, 2001). Our intent was to, at minimum, maintain the academic outcomes associated with in-class testing in our online testing condition. Therefore, we attempted to counteract the effects of suboptimal strategies by using frequent online exams; frequent testing has been shown to enhance learning (Landrum, 2007; McDaniel et al., 2011). In addition to providing a practical comparison of frequent online testing to traditional testing, we used a within subjects design to increase internal validity and examined both short-term (individual exam performance) and long-term comprehension (comprehensive exam performance) effects.
Although one must exercise caution when interpreting null results, it appears possible to obtain similar long-term retention outcomes using frequent online exams compared to in-class exams. In addition, if we assume that students used their textbooks and notes for online exams, these results parallel Agarwal and Roediger's (2011) finding that open-book exams do not necessarily harm comprehension. In further interpreting these results, there are two issues to consider. First, although the predominant finding was no effect of testing condition on comprehension, students who earned an A in the course demonstrated a non-statistically significant trend toward lower comprehension in the online testing condition. This potential limitation to online testing merits further examination. Second, in our study the testing manipulation (online vs. in-class) was conflated with exam frequency. This design was deliberate; it reflects the applied nature of the study. We wanted to investigate the effect a practical, but informed, change in exam administration would have on student comprehension. While we could have simply contrasted online and in-class unit exams, we were concerned that the unconstrained nature of online exams would have a negative impact on student study habits. Students do not prepare as much for an unsupervised online exam as they would for an in-class exam (cf. Agarwal & Roediger, 2011, Experiment 2); we hoped that more frequent, but otherwise equivalent, testing would counteract this tendency. Student self-reported study times indicate that we were successful in this regard, as there was no significant difference in self-reported study times for online and in-class exams.
Another concern with online testing is that the assessment might be less valid than that provided by in-class testing; for instance, online exam scores might not be an accurate reflection of student knowledge. One result of this could be grade inflation. When we compare online to in-class exam scores, on average, "A students" scored 7% higher, "B students" scored 11% higher, and "C students" scored 6% higher on online exams. Although these increases would not affect the letter grade for A students, they could impact the letter grades for B and C students. We interpret this as grade inflation because the higher online scores were not associated with any increase in long-term comprehension of the same material. This type of grade inflation could be controlled by reducing the point values of online exams and including an in-class comprehensive final exam that counts toward the final course grade like a typical exam.
Although there are many reasons why online and in-class exam scores might differ, one of the most troubling explanations would be that these students were more likely to cheat (e.g., collaborate with other students to gain an unfair advantage). According to Daniel and Broida (2004), typical cheating practices reported by students include sharing quizzes and looking up answers in the textbook. Fortunately, there are ways to minimize cheating beyond the method we employed in this experiment (i.e., time limitations). Cheating behaviors can be reduced by drawing questions from large test banks, limiting the time students can spend on each question (Daniel & Broida, 2004), and blocking access to other internet resources during the exam. These practices are easily employed within most web-based content management systems. Upon implementation of these measures, Daniel and Broida found no difference between online and in-class quiz performance. In addition, it has been found that student online scores on mastery quizzes correlate with in-class exam scores (Maki & Maki, 2001); this provides additional evidence that online quizzes and exams can provide valid assessments of student learning.
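Drawing each student's questions from a large test bank can be sketched in a few lines. This is a generic illustration, not the mechanism used by Blackboard or by Daniel and Broida (2004); the function name, the per-student seed, and the bank layout are all hypothetical.

```python
import random


def draw_exam(question_bank, n_questions, seed):
    """Draw a reproducible random subset of questions for one student.

    question_bank: list of question identifiers (assumed layout).
    seed: any stable per-student value, so a student's draw can be regenerated.
    """
    if n_questions > len(question_bank):
        raise ValueError("bank is smaller than the requested exam length")
    rng = random.Random(seed)  # independent generator; global state is untouched
    return rng.sample(question_bank, n_questions)
```

For example, `draw_exam(bank, 25, seed="student-17")` would give that student 25 distinct questions, and the larger the bank relative to the exam length, the less two students' exams are expected to overlap.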
There are several potential benefits to online administration of exams. From a financial standpoint, they reduce costs associated with printing and administration of exams, a savings that only increases with the size of the course. In addition, online testing changes the normal time constraints associated with the classroom, providing the opportunity for repeated testing without sacrificing other instructional activities. From a convenience standpoint, they can allow students more flexibility in scheduling their own exam times; they also allow instructors the ability to conveniently administer make-up exams. Finally, from a subjective standpoint, students simply prefer frequent online exams. Some students in our study even claimed that taking the exam online reduced their test anxiety (cf. Stowell & Bennett, 2010). Although we do not know why students had this preference, it may come from the increased sense of control, or agency, they have over the testing conditions (as hypothesized by Stowell & Bennett, 2010), or from a less stressful testing situation, or from their perception of the in-class exams being more difficult. Of course, it is always possible that they simply prefer online exams because many of them performed better on those exams, perhaps with the aid of notes or textbooks.
Our goal was to provide a practical means for achieving equivalent, or better, educational outcomes under the pressures of increasing course enrollments. While we did not see better educational outcomes with frequent online testing, we did not see a detriment to educational outcomes. Even so, we acknowledge that technological aids are not without cost. Online course management systems can be time consuming to use (Brewster, 1996). Not only do instructors have to configure the system, they often have to manage technological issues encountered by students. For example, in running this study, students would occasionally ask to schedule a make-up exam because of computer-related failures. These issues are compounded if there is a system-wide failure (e.g., downed server) during an examination period. In our study the number of students requesting an online testing accommodation was minimal (approximately four requests were received for each exam, a low number considering the enrollment of 139 students). Thus, in this situation, the time saved via the content management system outweighed the time cost associated with responding to students' technical issues.
In conclusion, frequent online exams can serve as a viable alternative to traditional in-class exams. Not only is this testing technique practical, frequent online testing in this study was shown to impose little, if any, cost to long-term comprehension.

Figure 1. Comprehension of material originally tested online or in class. Gray bars represent average student performance on the comprehensive final for questions that had originally been tested online, while the black bars represent performance on questions that had originally been tested in class. Error bars represent the standard error of the mean.