Getting an Honest Answer: Clickers in the Classroom

Asking students to raise their hands is a time-honored feedback mechanism in education. Hand raising allows the teacher to assess to what extent a concept has been understood, or to see where the class stands on a particular issue, and then to proceed with the lesson accordingly. For many types of questions, as the evidence here demonstrates, the tally from a public show of hands misrepresents the true knowledge or preferences of the class. The biases are predictable and systematic. Specifically, students raising their hands tend to herd and vote with the majority answer. Beyond impeding the teacher’s ability to assess her class, such herding threatens to diminish learning by limiting the level to which a student engages with the questions posed by the teacher.


Introduction
A teacher asks her class to work out a math problem. She then asks the students to raise their hands if they think the answer is greater than zero. A few confident hands shoot up while a greater number of less confident eyes dart around the room, collecting the relevant data necessary to "solve" the math problem. Slowly the rest of the hands go up. The teacher, satisfied that her class clearly understands the concept, moves on.
This story has played out generation after generation, at every level of learning. It played out much the same way in a one-room schoolhouse in rural New Hampshire in 1750 as it did in the classrooms of hundreds of colleges and universities, and thousands of primary and secondary schools last year. The raising of hands as a means of responding to a teacher's questions is low-tech and low-cost; its simplicity and minimal resource requirements have made it the preferred approach in many settings, including the classroom. But is it reliable? And if not, is there a better way?
For students' responses to be useful in informing an instructor's teaching, the instructor must be confident that the responses she sees accurately reflect the students' knowledge, skills, beliefs, or preferences. This study shows that such confidence is misplaced when students respond with a show of hands. Relying on that technique, instructors are likely to overestimate students' grasp of a concept, making it difficult to target instruction to concepts or ideas that need additional elaboration. To make matters worse, herding behavior by students raising their hands may reflect not merely a decision to vote with the majority contrary to the students' own thinking, but also decisions to vote with the majority instead of thinking about the question on their own. If so, herding may reduce students' cognitive activity in response to teachers' questions. This keeps questions posed in the classroom, however well developed, from promoting meaningful student engagement and learning.
This paper examines the reliability of hand raising as a pedagogic tool by comparing it with another technique that aggregates student answers, one that does not allow students to see their classmates' responses. If hand-raising students do not draw on other students' answers when selecting their own, these two aggregation techniques should yield similar results (up to the degree of sampling error). Yet the study we conducted suggests that the two techniques produce considerably different results. Moreover, the pattern of the differences suggests that students herd towards the most popular responses, an easy and comfortable approach when there is a show of hands.

Background
One relatively new technology that may be able to improve upon the ancient technique of hand raising is the audience-response system. Audience-response systems (ARSs) employ electronic handheld response keypads ("clickers") that allow students in a classroom to respond individually and anonymously to multiple-choice questions posed by the instructor. The responses are collected and aggregated, and the distribution of responses can be viewed by the instructor, and presented to the class in real time if she wishes. In addition to clickers, a number of other products and technologies can be used to solicit feedback from a classroom, including products designed for students' smartphones. These products serve many of the same purposes as our clickers; they allow students to vote without seeing others' responses and allow teachers to tally the responses quickly and accurately. We expect many of the results discussed in this paper will be applicable to these other technologies.
ARSs have been around since 1966, when Stanford University introduced an expensive, difficult-to-use version. ARSs did not become commercially available until the 1990s, but even then, the cost was prohibitive for most schools. Prices fell, and by the 2000s, ARSs began to be commonly used in secondary schools, colleges, and universities (Abrahamson, 2006). In the relatively brief period (brief, certainly, relative to hand raising's existence) that ARSs have been used in classrooms, a number of studies have begun to establish the promise and the challenges of this technology for improving classroom outcomes (Bruff, 2009). We will briefly review the key findings of these studies before describing how our present study contributes to this literature.

Benefits of Clickers
In their 2009 review of the literature on ARSs, Kay and LeSage examined 67 papers and chapters on the use of this technology in classrooms, primarily in undergraduate classrooms. These studies found overwhelmingly that clickers in the classroom were popular. Of the 38 studies that measured student and/or teacher attitudes towards clickers, 36 reported respondents had positive views of the technology on average (Kay and LeSage, 2009). Since student satisfaction often correlates imperfectly with student learning, it is important to explore the ways in which clickers may impact learning specifically, in addition to the positive reviews they are getting from users.
A number of studies report that student attendance (Shapiro, 2009), attention (Siau et al., 2006; Latessa and Mouw, 2005), and engagement (Draper and Brown, 2004; Simpson and Oliver, 2007) are greater when clickers are used in the classroom. (Kay and LeSage provide a complete list of relevant studies.) Unfortunately, most of these studies fail to distinguish between two possible explanations for this finding. The first is that some aspect of responding by using clickers, rather than by raising hands, promotes attendance, attention, and engagement, for example by forcing students to think rather than merely mimic the choices of others. The second is that encouraging students to respond to more questions during a class, regardless of how they respond, may be the primary driver of these effects. Since, in these studies, the introduction of clickers in classrooms was often accompanied by an increase in the number of questions asked during lectures, it is difficult to separate the two explanations for these improvements in attention and engagement. The many studies that find associations between clicker use and learning outcomes, such as test scores or course grades (Kennedy and Cutts, 2005; Preszler et al., 2007; Powell et al., 2011), suffer from the same difficulty. A notable exception is Mayer et al. (2008), whose research design included a control group with no questions and no clickers as well as a control group with questions but no clickers; Mayer et al. found that clicker use produced significant gains in test scores over both control groups.
Two benefits of clickers are of particular interest to us here. First, unlike hand raising, clicker use can be anonymous. Student surveys have shown this anonymity to be important to some groups of students (Draper and Brown, 2004; Crouch and Mazur, 2001), and there is experimental evidence to suggest anonymity can positively affect classroom discussion and debate (Ainsworth et al., 2011). Second, assuming that students participate and give honest answers, clicked responses can be an important source of feedback for teachers. Such feedback allows for contingent teaching, whereby instructors adjust their teaching plans in real time in response to what the clickers reveal (Draper and Brown, 2004). Students can also get beneficial feedback, learning where their responses fall in the distribution of the responses by the class (Abrahamson, 2006).

Clickers and Response Reliability
Our study provides evidence of the effect of the choice of response technique, hand raising or clicking, on student responses. In evaluating the benefits clickers bring to a classroom in terms of providing real-time feedback and allowing for contingent teaching, previous research has mostly focused on their ability to quickly collect, aggregate, and display student responses (Kay and LeSage, 2009). Little work has been done to determine whether clickers may in fact change these responses, and if so how. We show evidence that clicked responses differ from raised-hand responses in predictable ways. Specifically, vote shares for responses given by hand raising are likely to be more extreme (closer to 100 or 0 percent, a result consistent with herding behavior) than vote shares for responses given by clicking.
The process of responding by clicking differs from that of responding by hand raising in two important ways. First, raised-hand responses are immediately observable to the rest of the group, potentially influencing individuals who respond more slowly. Clicked responses are not observable until after the entire group has responded. Second, an individual can respond by clicking, knowing that his response is anonymous and cannot be revealed publicly. (For expository ease in this essay, respondents are male and questioners female.) In theory, anonymity is achievable with hand raising. It requires that all the students close their eyes, and that all the students trust that all other students close their eyes. In practice, such an outcome is difficult to achieve in many settings.
We suggest that there are two primary reasons why respondents might be influenced by the answers of others when choosing their own responses. Both could explain why herding behavior might emerge in situations in which individual responses are elicited in a group setting.
First, respondents can learn from other students' answers. Such learning sometimes occurs in a flash, though anecdotal observation indicates that there is often a delay of a second or two as individuals glance around the room to see the prior indications of others. When there is a right or a wrong answer to a question, observing before responding can clearly be a sensible strategy. A respondent who is poorly informed on a topic wisely defers to the "wisdom of the crowd" and sides with the majority. Or if he can identify an expert (or relative expert) in the group, he may side with that person. If many respondents identify the same expert, this follow-the-expert strategy will produce aggregate response results that look very similar to those of a strategy of following the majority, even if only one person is being followed.
Second, respondents are concerned about preserving their reputations and avoiding embarrassment (Stowell et al., 2010). Such concerns could apply to questions for which there is a single definite right answer. For instance, a respondent may think the majority is probably wrong on a given question, but voting with it prevents him from any worries about being among a small dissenting minority that gets the answer wrong. Those providing erroneous answers love company. Clustering with others may also be helpful when there is no single right answer. Many people do not wish to be out of step with others in announced beliefs about political situations or ethical dilemmas, or even about something as simple as whether they liked a particular movie. Witness how groups coming out of a movie tend to concur in their ratings, though other similar groups produce quite different ratings. Observing others' responses allows individuals a chance to adjust their responses accordingly, based on the stated views of those who answer early.

Classroom Responses and Learning
In considering the relevance of this potential herding behavior to learning, it is important to consider why teachers do or should ask questions during class in the first place. Wittrock's generative theory of learning emphasizes that it is not a learner's behavioral activity (raising hands, clicking clickers) that causes learning, but rather the learner's cognitive activity that leads to their response. Students learn better when they are engaged in appropriate cognitive activity (Mayer and Wittrock, 2006; Wittrock, 1990). Specifically, Mayer outlines three cognitive processes that aid learning: selecting the relevant material, organizing that material into a coherent representation in working memory, and integrating that representation into long-term memory (Mayer, 2001, 2008). Well-designed questions in the classroom can spur students to engage in these cognitive processes as they identify the material being asked about, organize this material as they work towards an answer, and then incorporate this experience into their prior knowledge. Herding behavior, perhaps enabled by hand raising, may short-circuit this process and impede learning. The following section describes how our study was designed to measure the degree to which hand raising may facilitate this herding.

Research Design
The two key research questions we examine in this paper are the following:
(1) Do students give the same responses when using raised hands as when using clickers?
(2) If the answer to (1) is no, are the differences consistent with a herding hypothesis?
This section describes the research design we used to answer these two questions. It first describes the study protocol and indicates the types of questions we asked individuals in our sample. We then specify the key outcome variables of interest.

Study Protocol
To compare response outcomes using these two different techniques, hand raising and clicking, we conducted trials with 22 different groups consisting of over 1100 participants in total. Each group was divided into two, usually according to where the participants were seated. When logistically feasible, the facilitators randomized assignment to the clicking and hand-raising groups, rather than simply splitting the groups based on the seating arrangement at the time. Random assignment was implemented in nine of the 22 trials. Quasi-random assignment (seating was assigned alphabetically by name) was implemented in an additional six trials. The remaining seven groups were formed according to the subjects' chosen seating arrangements. For these groups, random assignment should not be assumed. The tables reporting key results of these trials are replicated in Appendices B, C, and D, restricting the sample to trials implemented with random and quasi-random assignment.
Once assigned to groups, participants were asked one to four questions that each had only two possible responses. After the first question had been posed to the whole audience, the first half of the group was asked to respond anonymously using clickers. When the clicker half of the group had finished responding, the other half was asked to respond with hand raising. Note that neither half knew the answers the other half had given before submitting its own responses. If the group was asked a second question, the two halves of the room switched roles. The original hand raisers responded first with clickers; then the original clickers responded with hand raising. Because random or quasi-random assignment made the two halves likely to be similar to each other, the aggregate response results should not have been very different, or at least not consistently so. Systematic differences would suggest that the response technique influences the responses themselves.

Question Types
The choice of questions used in this study is likely to be an important determinant both of whether any differences are observed in the aggregate responses from hand raising and clicking, and if so of what magnitude. We might expect greater differences for questions that are especially sensitive or for those requiring specific knowledge that might be gained from the answers by others. Thus, we might expect some individuals asked, "Would you support legislation to require the deportation of immigrants convicted of more than one misdemeanor?" would respond differently if their answer were public or anonymous. A student's response to a question requiring the solution of a probability problem might take guidance from the answer of a classmate known for his strong mathematical abilities. By contrast, we might expect little difference between clicked and hand-raised responses on the question, "Do you prefer vanilla or chocolate ice cream?" Identifying and understanding the variation in the questions used in the study will be important for interpreting the results presented in the next section.
A total of 61 questions were asked in the 22 trials. Each question fell into one of the four categories described below, which reflect the different incentives one might have for considering the responses of others before submitting a response.
No Single Right Answer, Sensitive Topic (27 of the 61 questions). These questions explore a respondent's preference or belief about a potentially sensitive topic. Some of these questions are political in nature ("Would you support legislation to allow gay marriage in the state of New York?"). Others are ethical ("Would you reveal information about a mechanical problem of a car you are trying to sell?"). If made public, the responses to many of these questions might have reputational consequences for individuals, depending on the composition of the group. For instance, one question asks whether the individual plans to use the psychological principles and tools taught during the experiment to get her colleagues to agree to policies that differed significantly from what they would otherwise support. It is not clear that there is an ethically right or wrong answer to this, but one can imagine a roomful of colleagues scanning the crowd to note whose hands go up for the 'yes' vote.
No Clear Right Answer, Not a Sensitive Topic (14 of the 61 questions). These questions also may pertain to respondent preferences, but about topics that are much less sensitive than those in the previous category. Some are less sensitive because the topics they cover are trivial. An example would be "Do you enjoy watching any reality TV shows?" In some instances, even apparently trivial preference questions might prove embarrassing. Indeed, some participants might be hesitant to admit that they like reality TV. Some questions that are not sensitive ask for answers that will eventually prove to be either right or wrong, outcomes that will not be known immediately ("Who do you think will win the Monday Night Football game?" and "Will the Dow Jones Average be up more than 6% per year over the next 3 years?"). We expect to observe fewer differences in the hand raising and clicking responses for such "innocuous" questions than for the other three categories. Unlike questions in the other three categories, these questions have no single right (or socially acceptable) answer that respondents may be tempted to glean from observing others.
Single Right Answer, Factual (10 of the 61 questions). This category consists of questions about specific facts. Responding to these questions relies on factual knowledge and perhaps on some degree of general experience rather than on critical thinking. Examples include "Is the population of Turkey greater than 100 million?" and "Are there more than 45 countries in Africa?" Hand-raising respondents unsure of the correct answer may be tempted to follow the "wisdom of the crowd" or to identify one or more knowledgeable individuals to help inform their response. For simplicity, our factual questions were yes/no questions. However, the same forces would be at play if we asked factual questions and offered multiple possible answers.
Single Right Answer, Conceptual (10 of the 61 questions). The questions in this category were used mostly in graduate-level statistics classes and concerned conceptual material related to the courses. For most students, these questions could not be answered simply by recalling memorized facts. Instead, working out the answers required drawing on knowledge as well as thinking critically about the problems. Some resembled trick questions. For instance, "You ask a woman you just met if she has any children. She says she has two. You ask if she has at least one girl. She says yes. Given this information, is the probability that both are girls equal to 50%?" Other questions are less like riddles, but still require some degree of critical thinking ("Do all bivariate regressions suffer from omitted variable bias?"). Some respondents will be tempted to identify experts in the class whose responses they might mimic, or to go with the crowd if they do not know who is expert. Conceptual or factual questions will tempt subjects to seek less guidance from others in areas where they are knowledgeable, and more guidance from others where embarrassment might attend error.
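To make concrete why the two-children question behaves like a trick question, the following minimal sketch enumerates the possible families. It assumes (an assumption not stated in the question itself) that each child is equally likely to be a boy or a girl, independently of the other; the function name is illustrative only.

```python
from fractions import Fraction
from itertools import product


def prob_both_girls_given_at_least_one_girl() -> Fraction:
    """Enumerate the four equally likely (older, younger) combinations."""
    # Assumption: B and G are equally likely and the two children are independent.
    families = list(product("BG", repeat=2))  # BB, BG, GB, GG
    # Keep only the families compatible with "at least one girl".
    at_least_one_girl = [f for f in families if "G" in f]
    both_girls = [f for f in at_least_one_girl if f == ("G", "G")]
    return Fraction(len(both_girls), len(at_least_one_girl))


print(prob_both_girls_given_at_least_one_girl())  # prints 1/3
```

Under these assumptions three families remain once "at least one girl" is known, and only one of them has two girls, so the intuitive answer of 50% does not hold.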

Outcome Measures
Two key outcome measures provide answers to the two research questions of interest. First, we computed the absolute differences in vote share to assess whether hand raising and clicking lead to different aggregate response outcomes for various types of questions.
Next, we constructed a variable called "distance from extreme" (DFE) to determine whether any of these differences provide evidence of herding behavior among the hand-raising group. Let vote_share be the proportion of participants responding with the first option. Then DFE is defined as $\mathrm{DFE} = \min(\mathit{vote\_share},\ 1 - \mathit{vote\_share})$.
Since there are only two options for a given question, the DFE is the same no matter which response is arbitrarily chosen as the "first option." On questions for which the majority response for both hand raisers and clickers is the same, an outcome is consistent with herding to the majority when the DFE for hand raising is less than the DFE for clickers, that is, when hand-raising vote shares are closer to the extremes (0 or 1) than the clicker vote shares. For example, if 90% of respondents using raised hands (DFE = 0.10) answered that they believed Africa has more than 45 countries, as opposed to 70% of respondents with clickers (DFE = 0.30), this would be consistent with an explanation that some hand raisers who would have otherwise guessed fewer than 45 countries chose instead to herd and vote with the majority.
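The two outcome measures can be sketched in a few lines of Python. This is an illustration of the definitions above rather than the code used in the study; the function names are hypothetical, and vote shares are expressed as proportions between 0 and 1.

```python
def abs_vote_share_difference(hand_share: float, clicker_share: float) -> float:
    """First outcome measure: absolute difference in vote share."""
    return abs(hand_share - clicker_share)


def dfe(vote_share: float) -> float:
    """Second outcome measure: distance from extreme, min(v, 1 - v)."""
    if not 0.0 <= vote_share <= 1.0:
        raise ValueError("vote_share must be a proportion in [0, 1]")
    return min(vote_share, 1.0 - vote_share)


def same_majority(hand_share: float, clicker_share: float) -> bool:
    """True when both techniques put the same option above 50%."""
    return (hand_share - 0.5) * (clicker_share - 0.5) > 0


def consistent_with_majority_herding(hand_share: float, clicker_share: float) -> bool:
    """Hand-raising shares are closer to 0 or 1 than clicker shares."""
    if not same_majority(hand_share, clicker_share):
        # The herding comparison is only defined when the majorities agree.
        raise ValueError("majority responses differ between techniques")
    return dfe(hand_share) < dfe(clicker_share)


# The Africa example from the text: 90% of hand raisers vs. 70% of clickers.
print(round(dfe(0.90), 2), round(dfe(0.70), 2))      # 0.1 0.3
print(consistent_with_majority_herding(0.90, 0.70))  # True
```

Raising an error when the majorities differ is a design choice made for this sketch; it mirrors the fact that such questions need separate treatment.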
On questions for which the majority response of hand raisers differed from that of clickers, defining what it means for results to be consistent with herd behavior is less obvious. We do note that if we take the clicker response to be an unbiased measure of herd-free behavior, then getting a result on the opposite side of 50% could reflect herding. In one trial, for example, a majority of clicker responders (56%) indicated support for the US intervention in Libya taking place at the time, compared to only 25% of hand raisers, a difference significant at the 5% level. Hand raisers asked if they support this controversial intervention might be hesitant to admit that they do if they only see 25% of their classmates' hands go up in support.
An individual learning from others will have a personal tipping point that depends on his information. Herding that is triggered by a tipping point that differs from 50% is certainly a realistic possibility, indeed one to be expected in many situations, perhaps particularly on sensitive questions. Some students may only need to know that at least 40%, rather than a majority, of their classmates agree with their support of the intervention in Libya before they reveal their own support. We plan to explore more complex tipping behavior in future work. Since this paper is concerned primarily with herding to the majority, our herding analysis in the next section drops the few questions (7 out of the 61) where the majority responses of hand raisers and of clickers differed. The main results of the study are robust to including these seven questions.

Hands and Clickers Lead to Different Responses
While we would expect hand-raising and clicking results to differ simply due to sampling fluctuations, the differences in these trials prove to be far stronger than chance would produce. Thus, they provide strong evidence that the technique for eliciting choices (hand raising vs. clicking) will affect individual responses in group settings. On average, the absolute difference between the vote shares for a given response with hand raising and with clicking was 14.37 percentage points. Fifty-seven percent (35 of 61) of the questions resulted in vote shares that differed by more than 10 percentage points between the two response techniques; over 40% of those (15 of 35) differed by more than 20 percentage points. Figure 1 plots the vote shares of the first responses for hand raising and for clicking. If vote shares were consistently the same across techniques, the points would fall along the solid 45-degree line in Figure 1. Points outside the region bounded by the dotted diagonal lines in Figure 1 are questions for which the difference in vote shares was greater than 10 percentage points. Taken individually, 14 of the 61 differences (23%) in vote share were significant at the 5% level, assuming a normal sampling distribution for the differences in these proportions, as would be expected to emerge in a large sample. Using Fisher's method (note 8) to aggregate the results and test the null hypothesis that there are no differences in vote share across voting techniques yields a p-value virtually indistinguishable from zero ($1.6 \times 10^{-13}$). However, given the small sample sizes of many of the individual trials (note 9), we chose not to assume a normal sampling distribution and instead used the more conservative Fisher's Exact Test (note 10). Using this method, only 8 of the 61 differences (13%) proved significant at the 5% level. Even so, taking the trials as a whole, the p-value associated with the null hypothesis that the two techniques produce the same vote shares is 0.0012.
In short, it would be exceedingly unlikely for the trials to yield results this extreme if raised hands and clickers did not produce different results.
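For readers who want to see how a per-question Fisher's Exact Test can be computed, here is a minimal standard-library sketch. The counts in the usage lines are hypothetical and are not taken from the trials, and the two-sided rule used (summing the probabilities of all tables no more likely than the observed one) is one common convention that is assumed here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's Exact Test for the 2x2 table [[a, b], [c, d]].

    Rows are the two response techniques; columns are the two options.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is at most as likely as the observed one.
    tail = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, tail)


# Hypothetical counts (not from the study): 14 of 25 clicker users and
# 6 of 24 hand raisers choose the first option.
p_value = fisher_exact_two_sided(14, 11, 6, 18)
print(f"two-sided p-value: {p_value:.4f}")
```

Because the test conditions on the observed margins rather than relying on a normal approximation, it remains valid for the small group sizes typical of a single classroom.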
These differences are driven mainly by sensitive questions that had no single right answers, and by factual and conceptual questions with single right answers, as Table 1 shows. Disaggregating by question type reduces statistical power, but even with the reduced power (and using the conservative Fisher's Exact Test to compute p-values for individual trials) we see evidence of significant differences in aggregate responses between techniques for questions in these three categories. Consistent with our priors outlined earlier, the evidence of differences between hand and clicker votes is much weaker for questions that have no single right answer and are not sensitive, the only category of question for which this difference is not statistically significant at any conventional level.

Table 1 (final row): 17.18%; 6 out of 10 (60.0%); 0.011**
* significant at 10% level, ** 5%, *** 1%

Note 8. Fisher's method aggregates p-values from k independent tests using the formula $X^2_{2k} = -2 \sum_{i=1}^{k} \ln(p_i)$, which has a chi-squared distribution with 2k degrees of freedom.
Note 9. Note that the average number of participants across all 22 trials is about 50.
Note 10. Same Fisher as before, different method.
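The aggregation in note 8 can be sketched without any statistics library, because a chi-squared variable with an even number of degrees of freedom (2k) has a closed-form upper tail. The function name and the example p-values below are illustrative assumptions, not values from the study.

```python
from math import exp, log


def fisher_combined_p_value(p_values: list[float]) -> float:
    """Combine k independent p-values with Fisher's method.

    The statistic -2 * sum(ln p_i) is chi-squared with 2k degrees of
    freedom under the null, and for even degrees of freedom the upper
    tail is exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value, each in (0, 1]")
    statistic = -2.0 * sum(log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0   # the j = 0 term of the series
    series = 1.0
    for j in range(1, k):
        term *= half / j
        series += term
    return exp(-half) * series


# Illustrative p-values only: several individually unremarkable results
# can still be jointly unlikely under the null hypothesis.
print(fisher_combined_p_value([0.08, 0.11, 0.06, 0.20]))
```

With a single p-value the method returns that p-value unchanged, which is a convenient sanity check on the implementation.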

Differences in Hand and Clicker Responses are Consistent with Herding
Thirty-six of the 54 questions (67%) for which the majority responses of hands and of clickers were the same had more hand-raising respondents choose the more common response than did the clicking respondents (see Figure 2), indicating the presence of majority herding. On average, the majority vote share of hand raisers was 6.46 percentage points greater than that of clickers. Since each question had only two responses, a 6.46 percentage point increase in the majority response also meant a 6.46 percentage point decrease in the minority response, implying a spread of nearly 13 percentage points between responses to a given question. These figures are robust to the inclusion of the seven questions for which the majority responses of hands and clickers were not the same. Including these seven questions, 40 of the 61 questions (66%) showed signs of herding among hand raisers, with an average increase in the majority response vote share of 6.46 percentage points, the same figure as without the seven questions. Figure 3a builds on Figure 1 by shading the areas that represent data points consistent with majority herding behavior. Figure 3b plots the DFEs for these clicker and hand response data. Together, Figures 2, 3a, and 3b reveal the extent to which the questions that have the largest differences between hand-raising and clicking vote shares also tend to be those in which the signs of those differences are consistent with herding.
Note from Figure 3b that in six of the 54 questions (11%), hand raising led to a unanimous response, indicated by a DFE of 0. No clicker vote was ever unanimous. Hand raising had 18 votes at 90% or above or at 10% or below. Clickers had only three such votes. Of the 36 questions that showed signs of herding, in only six (or 17%) of those cases was this herding statistically significant at the 5% level; another three were significant at the 10% level. This is not surprising, given the small sample sizes for each question. However, aggregating the test statistics for the herding tests of each of the 54 questions (and allowing those questions in which herding did not take place to count against a conclusion of herding) yields an aggregate Z-score of 4.90, which is statistically significant well beyond any conventional level. Inclusion of the seven dropped questions yields an aggregate Z-score of 4.95, also significant at any conventional level.
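The text here does not spell out how the per-question test statistics were aggregated. One standard option, assumed purely for illustration, is to compute a signed two-proportion z-statistic for each question and combine them with Stouffer's method (the sum of the z-scores divided by the square root of their number). The sketch below uses hypothetical names and counts; questions where hand raisers favor the majority less than clickers contribute negative z-scores and so count against herding.

```python
from math import sqrt


def herding_z(hand_majority: int, n_hand: int,
              clicker_majority: int, n_clicker: int) -> float:
    """Signed two-proportion z-statistic for one question.

    Positive when hand raisers choose the majority response at a higher
    rate than clicker users (the direction predicted by herding).
    """
    p_hand = hand_majority / n_hand
    p_clicker = clicker_majority / n_clicker
    pooled = (hand_majority + clicker_majority) / (n_hand + n_clicker)
    se = sqrt(pooled * (1 - pooled) * (1 / n_hand + 1 / n_clicker))
    if se == 0:
        raise ValueError("both groups unanimous; z-statistic undefined")
    return (p_hand - p_clicker) / se


def stouffer_z(z_scores: list[float]) -> float:
    """Assumed aggregation rule: unweighted Stouffer combination."""
    return sum(z_scores) / sqrt(len(z_scores))


# Hypothetical question: 18 of 20 hand raisers vs. 14 of 20 clicker users
# choose the majority response.
z = herding_z(18, 20, 14, 20)
print(round(z, 3))                      # 1.581
print(round(stouffer_z([z, z, z]), 3))  # three such questions combined
```

The example shows why individually insignificant questions can add up: a z-score of about 1.58 does not reach the 5% level on its own, but several questions pointing the same way produce a much larger combined statistic.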
This evidence that hand raisers herd, on average, is strong and robust, persisting across a number of test specifications. Table 2 reports the results of eight models regressing the distance from extreme variable, or DFE, on a dummy variable for responses given by clicker. The coefficient estimated for this dummy, Clicker, can be thought of as an estimate of the degree to which herding is taking place among hand raisers. Specifically, it is the average difference between the vote share received by the majority response of the hand raisers and that of the clickers. According to the results in the first specification, responding with clickers is associated with a 6.2 percentage point increase in DFE. Equivalently, hand raising is associated with majority response vote shares that are 6.2 percentage points greater than those associated with clicking. Note that this coefficient is significant at the 1% level in every specification. The significance of these coefficients is robust to the inclusion of the seven dropped questions, though including these seven questions results in slightly smaller point estimates (ranging from 0.055 to 0.060).
After the data are disaggregated by question type, the evidence remains strongly consistent with herding, especially for the sensitive and factual questions that we saw were driving the significant differences between hand-raising and clicker results shown earlier. Table 3 summarizes the herding evidence by question type. Running this analysis while including the seven dropped questions yields similar results, with the exception of the p-values for factual questions with single right answers. For these questions, with no clustering of the standard error, the p-value is 0.066; with clustering of the standard error at the question level, it is 0.203.
In the 17 questions for which there was a single correct answer (the last two rows of Table 3), evidence of herding emerged in 12 of the questions. In a nod towards the "wisdom of crowds," in all but one of these 12 cases, the herding was towards the correct answer (hand raisers did better than clicker responders). In each of the five cases where there was no evidence of herding, the clickers did better than the hand raisers. But while hand raising may increase the number of students responding with the correct answer, that is hardly the goal if many students are merely following others. It seems likely that clicker responses give the teacher a better picture of where the students actually are in understanding a given concept or question.
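The regression of DFE on the Clicker dummy can be sketched as follows. This is a minimal illustration, not the specification behind Table 2: it assumes two-option questions with DFE computed as min(p, 1 - p) for a vote share p (consistent with a unanimous vote having a DFE of 0), uses made-up vote shares, and omits the controls, weights, and clustered standard errors discussed in the text. The function names are hypothetical.

```python
def dfe(vote_share):
    """Distance from extreme for a two-option question (assumed definition).

    Returns 0.0 for a unanimous vote and 0.5 for an even split.
    """
    if not 0.0 <= vote_share <= 1.0:
        raise ValueError("vote share must lie in [0, 1]")
    return min(vote_share, 1.0 - vote_share)


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up "yes" shares per question: (hand raising, clicker).
shares = [(0.95, 0.80), (1.00, 0.85), (0.30, 0.40), (0.10, 0.25), (0.60, 0.55)]

clicker = []  # dummy: 0 = hand raising, 1 = clicker
y = []        # DFE of each observed vote
for hand_share, clicker_share in shares:
    clicker.append(0)
    y.append(dfe(hand_share))
    clicker.append(1)
    y.append(dfe(clicker_share))

slope, intercept = ols_slope_intercept(clicker, y)
print(f"Clicker coefficient = {slope:.3f}, intercept = {intercept:.3f}")
```

With a single dummy regressor, the slope equals the mean DFE of clicker votes minus the mean DFE of hand-raising votes, and the intercept is the mean DFE under hand raising. That is why the coefficient can be read directly as the average gap in majority vote share between the two response methods.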

Conclusion
In response to the first key research question examined in this study (Do students give the same responses when using raised hands as when using clickers?), the evidence presented here provides a definitive no. Over half (57.3%) of questions show a difference in hand and clicker responses of over 10 percentage points, and nearly a quarter (24.6%) show a difference of over 20 percentage points. Taking all 61 questions together, we can reject at the 1% significance level the null hypothesis that the two techniques do not generate differences in vote shares. Statistically significant differences in vote shares mostly persist even when the data are split out by question type. The one exception is questions that have no single correct answer and that do not involve sensitive issues. We hypothesize that these low-stakes questions may not be as likely to lead to students' reliance on the responses of other students.
The answer to the second research question (Are the differences consistent with a herding hypothesis?) is less definitive. One explanation that is consistent with our data is that when answers "matter" (that is, when the question has only one correct answer or is uncomfortably sensitive), students raising their hands have a greater tendency to herd and vote with the majority. Two-thirds of questions in our trials show evidence of this type of herding, including the 11% of questions that elicited unanimous responses under hand raising. (No clicker response led to a unanimous outcome.) Majority-vote shares for hand raisers average 6.46 percentage points higher than for clickers. These results are robust to a number of alternative regression specifications and weighting schemes. While majority herding may explain only part of the differences between hand-raising and clicker responses, the results of this study suggest it is an important part.
Herding in response to questions in the classroom can short-circuit Wittrock's generative learning process and thereby rob questions of their ability to spur the type of thinking important for learning. In fact, though our study was not designed to test this, it seems reasonable to believe that academically weaker students are more likely to herd. Students who probably have the greatest need for in-class generative learning processes are those most likely to simply vote with the majority. Since such a voting strategy requires only behavioral activity and little to no cognitive activity, the learning process is stunted.
Herding presents an additional, different problem for the teacher, especially since our results suggest that herding pushes more of the class towards the correct answer, on average. Teachers engaged in contingent teaching need a reliable means to ascertain what fraction of a class has understood a given topic or question. Because hand raising is accompanied by herding, it is likely to exaggerate a class's grasp of the material to some unknown extent. This undermines the teacher's ability to engage in effective contingent teaching.
The implications for instructors are clear. When interpreting students' shows of hands, teachers who persist with hand raising should consider the large and persistent differences found in this study between the responses communicated by raised hands and those communicated by clickers. If a high proportion of hand raisers respond correctly to a question, it may mean that the class has understood the topic and the lesson can move forward. Unfortunately, it may also mean that although some students did indeed understand, others simply voted with the herd.
Questions in the classroom have two major purposes: first, to engage students in meaningful cognitive activity; and second, to enable the teacher to assess student capabilities, knowledge, or preferences. Either purpose is defeated if students simply vote with the herd, as they frequently do with hand raising. By contrast, clickers and other audience response systems allow students to respond only on an individual basis. The use of clickers in the classroom enables questions to better fulfill their intended pedagogic roles.