Intending to Teach Critical Thinking: A Study of the Learning Impacts over One Semester of Embedded Critical Thinking Learning Objects

Abstract: Numerous studies, from recent years and going back decades, suggest that post-secondary students fail to sufficiently improve their critical thinking (CT) skills during their undergraduate years (Abrami et al., 2015; Arum & Roksa, 2011; Huber & Kuncel, 2016). Meanwhile, institutions have increasingly embraced CT as a core competency and educational outcome. Several studies have demonstrated measurable within-semester increases in CT, but most often without a meaningful control group for comparison (Cargas et al., 2017; Grant & Smith, 2018; Styers et al., 2018). This study asks whether embedding content-driven critical thinking exercises within courses produces a measurable impact on critical thinking outcomes within one semester. In every participating course, an instructor taught a control section alongside an experimental section. All sections completed pre- and post-assessments using the Critical Thinking Assessment Test (CAT). Pre-post results indicated statistically significant gains for experimental groups compared with control groups.


Study Questions and Study Design
The present study pursued the following research question: Would the classroom deployment of intentionally designed critical thinking learning activities result in observable gains in the quality of students' CT skills over the course of one semester?
For its CT intervention, the study team developed a set of learning experiences that the team calls Critical Thinking Learning Objects (CTLOs): classroom activities intentionally designed to synthesize critical thinking concepts with course-specific content and then integrated into the normal semesterly learning plan of a course. The study model compared different sections of the same course taught by the same instructor in the same semester: one section integrated CTLOs into the routine learning of the course, the other did not. The study team hypothesized that students experiencing these activities in one semester would show significant improvement in their CT competence compared with students in the other section of the same course who did not experience these activities.
Four foundational cross-disciplinary considerations informed the study design: adherence to a standard definition of CT; administration of a uniform, reliable CT instrument; consistent implementation of the comparative study model to reduce variability between non-CTLO (control) and CTLO (experimental) groups; and use of a uniform template for development of course-specific CTLOs. Regarding the first consideration, a standard operational definition of CT, the present study defined CT consistently with that of the Center for Assessment and Improvement of Learning (CAIL) at Tennessee Technological University (TTU), the designers of the CAT: CT is the set of analytical, interpretive, and creative thinking competencies identified across academic disciplines as crucial for skills development in evaluating information, problem solving, and communication (Al-Mazroa, 2017; Center for Assessment and Improvement of Learning, 2017).
To address the second consideration, reliable CT assessment, the study used the CAT. The CAT is a validated instrument for assessing critical thinking across disciplines (Center for Assessment and Improvement of Learning, 2016). Unlike most other CT instruments, which rely on Likert-scale surveys, the CAT consists entirely of free-response questions, an advantage noted in the literature (Ku, 2009; Haynes et al., 2015). Scores on the 15-question CAT instrument range from 0 to 38; in one multi-institution study the actual student score average was 21.02 (SD = 6.19) with a range of 6.0 to 36.3 (Stein et al., 2007). Its scoring methodology involves extensive college faculty training and participation, minimizing faculty scorer recognition of individual students, randomizing faculty scorer encounters with each test question, and norming faculty scores question by question. Each individual answer is independently scored by two separate faculty members; if those scores match, they are recorded, and if they do not, a third faculty member independently scores the item and the average of the three scores is recorded. Furthermore, all scored test sheets from institutions are authenticated by CAIL through an audit in which the CAT designers score a sample of tests already administered and scored at an institution and compare the auditors' scoring accuracy with that of the institution. Alongside the reliability and accuracy of CAT results, this faculty-centered approach to administering and scoring the CAT is associated with better faculty engagement with the process and with promoting faculty-led CT interventions (Center for Assessment & Improvement of Learning, 2017; Haynes et al., 2016; Lisic, 2015).
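The two-scorer reconciliation rule and the unanswered-item filter described in this section can be sketched in code. This is an illustrative sketch only, not CAIL's actual software; the function names are hypothetical.

```python
# Illustrative sketch of the CAT answer-scoring reconciliation rule:
# two independent faculty scores are recorded if they match; otherwise
# a third independent score is obtained and the average of all three
# is recorded. (Hypothetical helper names, not CAIL software.)
def reconcile_scores(score_a, score_b, score_c=None):
    """Return the recorded score for one CAT answer."""
    if score_a == score_b:
        return score_a
    if score_c is None:
        raise ValueError("Scores disagree; a third independent score is required.")
    return (score_a + score_b + score_c) / 3


# Per CAIL protocol, a test with more than two unanswered items is
# excluded before grading begins (None marks an unanswered item).
def is_gradable(answers, max_blank=2):
    return sum(1 for a in answers if a is None) <= max_blank
```

For example, `reconcile_scores(1, 2, 3)` records the three-scorer average of 2.0, while a test sheet with three blank answers fails `is_gradable` and is dropped.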
Another reason the study team used the CAT is GGC's longstanding assessment relationship with CAIL at TTU. GGC has administered the CAT for collegewide core curriculum assessment since August 2015. Collegewide assessment results for first-time freshmen (FTF), sophomores and juniors completing their General Education curriculum (GEN ED), and seniors about to graduate (SENIOR) are presented in Table 1. The third consideration, study design, turned upon the question of minimizing variability of classroom experiences between students in control and experimental groups. To do so, participating faculty taught two sections of the same course in the same semester, designating one as the experimental section and the other as the control section. CAT pretests and posttests were scheduled in both experimental and control sections at the beginning and end of the semester. The experimental section differed in that it used CTLOs, the first following the CAT pretest by 1-2 weeks and the second preceding the CAT posttest by 2-4 weeks. The team then aggregated data and compared pre-post performance of the experimental and control groups to ascertain any significant differences. The disciplinary range of the current study is represented in Table 2, which also indicates the number of paired sections that participated by semester. Faculty participants were selected either because of their previous experience in assessing critical thinking at GGC or by recommendation of other members of the research team.
The baseline critical thinking potential of the courses in this study was difficult to ascertain. To clarify at least faculty participants' perceptions of their own courses' intentional learning relationship to critical thinking outcomes, faculty participants were asked to rate their courses' relationship to the three CT learning outcomes targeted by the study and to list regular classroom activities aligned with each (see Table 3).

The final consideration was to embed some uniformity in the design and development of CTLOs, which would necessarily differ because elements of any CTLO are course-specific. One step the study team took was to apply a uniform intention to the design and development of each CTLO. CTLOs were required to align with course content and outcomes while intentionally challenging students to develop three critical thinking competencies:
• evaluate how strongly information supports an idea or interpretation,
• provide alternative interpretations for information or observations that have several interpretations, and
• identify additional information to evaluate alternative interpretations.
These outcomes correspond to the design of the CAT and align with the first of two CT "Skillsets" mapped to CAT questions by CAIL (CAIL, "About the CAT"). A second step was to use a uniform set of design tools. For that, the team relied on training and pedagogical templates from CAIL for embedding specific course content and knowledge within an activity aligned with those three learning outcomes. CAIL's training materials guided the team in creating content-specific CT activities and activity-specific CT evaluation rubrics. With further guidance from CAIL trainers and other members of the research team, participating faculty developed the scope, focus, and content of the CTLOs, as well as targeted evaluation rubrics for each CT outcome, by crosswalking their course outcomes, content areas, and semesterly schedule. The study team implemented one additional design standard outside the CAIL pedagogical toolkit: a reflective review of the activity in which students blind-scored another student's CTLO using the rubric the instructor developed to evaluate student CTLO performance. For quality assurance, prior to classroom use of any CTLO, other members of the research team reviewed it for clarity, adherence to CAIL's design templates, and alignment with the three CT outcomes listed above. Complete, reviewed CTLOs were then permitted for classroom use and stored within a library for future reference and use. See Appendix 2 for a sample CTLO; the rubric has been omitted because it adheres to CAIL's proprietary formatting and content.

Methodology
Once CTLOs were approved for classroom use by the study team, participating faculty deployed them in one of two sections of the same course they were teaching during the same semester. Two CTLOs were used in each experimental section, a number chosen for its predicted impact on critical thinking ability and in anticipation that absences would create variability in how many CTLOs individual students experienced. The study team also recognized that this likely variability would allow for an additional comparison of the impact of one CTLO versus two CTLOs on end-of-semester CAT performance. Faculty participants determined the order of CTLO deployment, which depended on when in the semesterly schedule the course content of each CTLO was taught.
The classroom administration of each CTLO took place over two consecutive class periods: the first involved completion of the activity, the second a review of student responses and reflection on the activity, including student peer evaluation using the rubric developed for that CTLO. On the first day, the CTLO was completed over approximately 30 minutes of the class period, with the exercise introduced briefly during the first five minutes. Students did not receive intentional coaching on what CTLOs are or what constitutes critical thinking, so that the CTLO experience alone could be the focus. Response sheets, identified only by students' college ID numbers, were collected by the instructor and graded using the pre-established rubrics. In the subsequent class period, students received the response sheet of an anonymous classmate and were guided through use of the scoring rubrics so they could score the classmate's work, both assigning a numerical score in pencil and writing any specific comments in the margins. Students were instructed to be as precise as possible and to avoid personal comments. This peer grading process was applied to all questions featured in the CTLO, with the instructor first introducing and explaining the relevant rubric before asking students to grade; the instructor answered any questions students had while grading took place. The peer grading spanned the entire 50-75 minute class period so that students could perform the task without being unduly rushed. Scores from peer-graded response sheets were recorded in the same spreadsheet used to record instructor scoring.
In the parallel control section, students performed independent coursework during the corresponding class periods, so that the control section received no additional lecture or specific instruction beyond what the experimental section received.
Consent was solicited from all students in every section. In all, 345 students consented to participate, 163 in the control sections and 182 in the experimental sections. An incentive of one extra credit point was offered for each class period in which an activity tied to the study took place. The study was approved by GGC's Institutional Review Board and funded for two years by internal seed grants. As indicated in Table 4, implementation in each experimental section required six class periods throughout the semester (about 7.5 hours in total). Faculty scored CAT tests and analyzed data from peer grading of CTLO experiences in all courses.
For pre-post testing, the paper version of the CAT was administered over the duration of one entire class period during the intervals noted in Table 4. Students were introduced to the test using the same script, which instructors read aloud in class before administering the tests. All students began the test at the same time. Instructors recorded the test completion time for each student so that any relationship between completion time and quality of performance could be explored. Pretests were graded within 3-4 weeks of completion, and posttests were graded immediately after finals week each semester. Consistent with CAIL's CAT scoring protocols, CAT scoring was blind and involved faculty from across the college not affiliated with the study. Scorers did not know which tests were from control or experimental groups and could not identify any student whose test they scored. Per test scoring protocols designed and required by CAIL, any tests with more than two unanswered items were removed from consideration before grading commenced.

Results
Results are presented in Table 5. Overall, 345 students participated in the study, 182 in the experimental sections and 163 in the control sections. For analysis, the study considered only students who took both pre- and posttests, because completing both tests provided the most complete impression of critical thinking development through the semester. The study team was aware that the study design could have permitted further comparisons for experimental subgroups experiencing zero or one CTLO. However, the small number of participants in the experimental group who experienced no CTLOs (n = 2) or one CTLO (n = 15) made analysis based on the number of CTLOs moot. Subsequent analysis was therefore conducted only on students in the control group who completed both pre- and posttests (n = 98) and students in the experimental group who completed both pre- and posttests and experienced two CTLOs (n = 110); descriptive statistics for these two groups are presented in Tables 6 and 7, respectively. Of the overall study population, 21 students did not self-report class rank. Of students who completed both pre- and posttests, two students in the experimental group and two in the control group did not self-report class rank. The proportion and scope of pre-to-post gains are illustrated in Figures 1 and 2. Of the 110 students in the experimental condition, 33 showed a decrease in CAT score from pre to post (M = -3.10, SD = 2.03, min = -8.0, max = -1.0), 8 showed no change, and 69 showed an improvement (M = 5.28, SD = 2.95, min = 1.0, max = 13.0); thus roughly 63% of students experienced a gain. In contrast, of the 98 students in the control condition, 40 showed a decrease in CAT score from pre to post (M = -3.56, SD = 2.59, min = -12.0, max = -1.0), 3 showed no change, and 55 showed an improvement (M = 4.33, SD = 2.76, min = 0.33, max = 12.0); thus approximately 56% of students experienced a gain.

Do CTLOs significantly impact CAT scores?
A two-way mixed-model ANOVA revealed a main effect for Time (pre vs. post) (F(1, 206) = 26.99, p < 0.05, η² = 0.12), no main effect for Condition (p > 0.05, η² = 0.004), and a significant Time by Condition interaction (F(1, 206) = 4.72, p < 0.05, η² = 0.012). Thus, there was a statistically significant difference comparing posttest results (M = 14.88, SD = 5.56) to pretest results (M = 13.17, SD = 5.46) when collapsing across condition. There was no overall difference comparing the control (M = 13.70, SD = 5.35) and experimental (M = 14.32, SD = 5.73) groups when collapsing across time. While the Time by Condition interaction was a weak effect, on average exposure to the critical thinking exercises in the experimental group yielded more than double the gain in CAT performance (M = 2.38, SD = 4.63) observed in the control condition (M = 0.98, SD = 4.67) (see Figure 3; error bars represent 95% confidence intervals). Post-hoc independent t-tests demonstrated that the difference between the control and experimental conditions was not significant for the pretest (p > 0.05) but was for the posttest (t(206) = 1.73, p = 0.04, d = 0.24). Paired-sample tests revealed that the pre vs. post CAT score gain was statistically significant for both the control condition (t(97) = 2.07, p = 0.04, d = 0.21) and the experimental condition (t(109) = 5.39, p < 0.01, d = 0.51). The 2.38-point average gain in the experimental condition represents an 18.1% improvement over the baseline pretest score in that condition, while the 0.98-point average gain in the control condition represents a 7.3% improvement over baseline in that condition. We next examined the CTLO scores themselves and their relationship to CAT scores. Because the maximum score possible varied across CTLOs from 5 to 10, all CTLO scores were converted to percentages before further analysis was conducted. There was no statistically significant difference between CTLO 1 scores (M = 34.90, SD = 2.42, min = 0.0, max = 90.0) and CTLO 2 scores (M = 32.03, SD = 2.21, min = 0.0, max = 100.0), p > 0.05. CTLO 1 was significantly correlated with the pre CAT (r = +.33) and post CAT (r = +.24) but not with CTLO 2 (r = +.12) or the pre-post CAT difference score (r = -.11). CTLO 2 was not significantly correlated with the pre CAT (r = +.15), post CAT (r = +.08), or pre-post CAT difference score (r = -.09). Thus, students who engaged in both CTLOs showed a significant improvement in CAT scores relative to control (see Figure 3), but actual scores on the CTLOs did not predict improvement on the CAT.
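The paired pre/post comparisons reported above can be reproduced in outline with standard tools. The sketch below uses synthetic scores (the study's raw data are not reproduced here) and hypothetical variable names; it shows the paired t-test, a paired Cohen's d, and the percent-gain-over-baseline calculation used in this section.

```python
# Sketch of the paired pre/post analysis on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre = rng.normal(13.2, 5.5, size=110)        # synthetic pretest scores
post = pre + rng.normal(2.4, 4.6, size=110)  # synthetic gains, mean ~ 2.4

# Paired t-test: post vs. pre for the same students.
t, p = stats.ttest_rel(post, pre)

diff = post - pre
d = diff.mean() / diff.std(ddof=1)       # paired Cohen's d = mean(diff)/SD(diff)
pct_gain = 100 * diff.mean() / pre.mean()  # % improvement over baseline pretest
```

The same `pct_gain` arithmetic underlies the reported figures: a 2.38-point mean gain over a pretest baseline of about 13.2 is roughly an 18% improvement.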
There was no significant difference in the duration of time (in minutes) to complete the pre CAT (M = 43.66, SD = 9.99, min = 23, max = 83) compared to the post CAT (M = 39.34, SD = 10.04, min = 18, max = 64) for the experimental group, p > 0.05. The pre CAT time was significantly correlated with the pre CAT score (r = +.27) but not with the pre-post CAT difference score (r = -.08). The post CAT time was significantly correlated with the post CAT score (r = +.23) but likewise not with the pre-post CAT difference score (r = +.06). It is also worth highlighting that the control condition's pre CAT time (M = 43.93, SD = 9.12, min = 23, max = 66) and post CAT time (M = 38.55, SD = 11.00, min = 20, max = 59) were not statistically significantly different from one another or from the pre and post CAT times for the experimental condition (p > 0.05).
We next examined the impact of student academic rank on CAT scores. As would be expected from the collegewide assessment data, a one-way ANOVA comparing pretest CAT scores by class rank revealed a significant difference (F(3, 200) = 2.90, p < 0.05). Although pre-post change scores varied numerically by class rank, for example among freshmen (SD = 4.91, n = 24) and sophomores (M = 1.44, SD = 3.96, n = 42), these differences were not statistically significant (p > 0.05). A one-way ANOVA on pre-post change scores by class rank for the control condition was also not statistically significant (F(3, 88) = 1.93, p > 0.05).
Because seniors demonstrated higher baseline CAT scores than freshmen both in our collegewide assessment data (see Table 1) and in the current study, it was important to evaluate the proportions of academic ranks in the control and experimental conditions to ensure equivalency. The percentages of students at each academic rank by condition were: freshman (18.4% control vs. 20.6% experimental), sophomore (41.9% vs. 47.1%), junior (23.1% vs. 25.9%), and senior (12.7% vs. 14.3%). A chi-square analysis demonstrated no significant relationship between study condition (experimental vs. control) and academic rank (χ²(3, N = 204) = 2.77, p = .43).
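The equivalency check above can be sketched as a chi-square test of independence on a condition-by-rank contingency table. The cell counts below are illustrative approximations reconstructed from the reported percentages, not the study's exact counts.

```python
# Sketch of the condition-by-rank chi-square test. Counts are
# illustrative approximations derived from the reported percentages.
from scipy.stats import chi2_contingency

#               Fr  So  Jr  Sr
control      = [18, 40, 22, 12]  # ~18.4%, 41.9%, 23.1%, 12.7% of n = 96
experimental = [22, 51, 28, 15]  # ~20.6%, 47.1%, 25.9%, 14.3% of n = 108

chi2, p, dof, expected = chi2_contingency([control, experimental])
# A non-significant result (p > 0.05) supports treating the two
# conditions as equivalent in class-rank composition.
```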
Taking a different approach to examine whether some individuals benefitted more from the CTLOs than others, a correlational analysis was conducted on pretest CAT scores versus pre-post CAT change scores for participants in the experimental group. While the result (r(108) = -.42, p < 0.05) suggests that those who scored lower on the pretest improved the most, an essentially identical result was obtained in the control condition (r(96) = -.40, p < 0.05), suggesting this effect is due to regression to the mean.
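Regression to the mean of the kind invoked here is easy to demonstrate by simulation: two noisy measurements of an unchanging underlying ability produce a negative correlation between the first score and the change score even when no learning occurs. A minimal sketch, with invented parameters:

```python
# Minimal regression-to-the-mean simulation: no true change occurs,
# yet pretest score and change score are negatively correlated.
import numpy as np

rng = np.random.default_rng(42)
ability = rng.normal(14, 4, size=200)       # latent, unchanging CT ability
pre = ability + rng.normal(0, 4, size=200)  # pretest = ability + noise
post = ability + rng.normal(0, 4, size=200) # posttest = ability + fresh noise

r = np.corrcoef(pre, post - pre)[0, 1]
# With equal noise variance in both measurements, r tends toward -0.5,
# in the same direction as the ~ -0.4 observed in both study conditions.
```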

Discussion
The intent of this study aligns with a growing body of research (Evens et al., 2013; Cargas et al., 2017; Grant & Smith, 2018; Grussendorf & Rogol, 2018; Styers et al., 2018) focused on the possibility of observing CT gains over short spans such as a single semester or year. Our results demonstrated that, on average, both control and experimental groups showed statistically significant improvements in CT scores, but that the single-semester CT change for those who experienced the CTLOs and the peer-review processes (~18% gain) was essentially double that observed in students who did not (~7% gain). The CT improvement in the control condition could be due in part to organic improvement in critical thinking over a semester of college coursework; it is worth noting that while no intentional CT training was offered in the control sections, the students in those sections were also enrolled in courses unrelated to this study. Practice effects may also have played a role, since the CAT test given at the end of the semester was identical to the one students took at the semester's start. It is logical to consider that the actual CT gain seen in the experimental condition is superimposed upon the smaller gains seen in the control condition, leaving a true gain of approximately 11%. While the skills students in the experimental group practiced in the CTLOs and peer-review processes overlap with some of those required by the CAT itself, we argue that this is not a limitation but a feature of intentional CT training, and the results indicate that even the limited training provided here, at two time points in the semester, provides statistically significant benefits.
The statistically significant results that answer our central research question (would the classroom deployment of CTLOs result in observable gains in the quality of students' critical thinking skills?) likewise align with the results of Cargas et al. (2017) and Grant and Smith (2018), and as such contribute to the growing optimism that the teaching of critical thinking may not be as elusive as Willingham (2008) and Arum and Roksa (2011) claim. Another similarity between this study and the work of Cargas et al., Grant and Smith, and Styers et al. (2018) is the use of the CAT to measure overall CT. The validity and interdisciplinary/universal design of the CAT recommend it for future studies. The design of this study notably departs from models of previous studies in two ways. As Styers et al. (2018) commented, "future studies comparing sections of the same course with comparable student demographics, taught by the same faculty using more traditional teaching pedagogies would allow for a more objective analysis" (p. 10). This study is novel in that it establishes the baseline conditions for such a comparison by having the same instructor teach two identical classes at the same time, differing only in the integration of CTLOs in one of the classrooms. This allowed the study to draw objective, comparative observations of the impact of a single type of pedagogical intervention, the CTLO and peer-review learning process, given at two time points. This is the first study the authors are aware of that observes a significant impact on CT learning due to a specific learning activity in comparison with matched control sections across diverse disciplines.
Before considering implications, it is worth remarking on a limitation of this study: sample size. The size of the study sample permitted analysis of aggregate results but not of disaggregated results. The study therefore could not consider demographic or other notable variables in student participants' experience that may have impacted learning of CT competencies. For example, the results rule out the possibility that disproportionate class rank numbers influenced comparisons of experimental and control sections, but they also show no more than marginally significant differences between students of different class ranks; more participants are needed to draw those comparisons. Similarly, the nature and content of the classes could correlate with differential impacts, but grouping by broader knowledge domain could minimize the significance of the course-specific design and integration of a CTLO. Furthermore, any broader claims derived from the current data set regarding the interdisciplinarity of the CT results would need to note that no Humanities courses provided data. Course-level analysis is similarly affected: Ns for any one course were too small or varied to permit generalizable conclusions regarding course-specific performance. Considering those questions would involve greater costs, and more faculty and student participants, than the current study could manage. The execution of this study was time and resource intensive. The research team devoted extensive hours to receiving training on the CAT, developing CTLOs, planning the integration of those activities into course content over multiple classroom days in a semester, reviewing materials from other members of the team, evaluating student results on CTLOs, and scoring CAT tests. In particular, the scoring of CAT tests is extremely time intensive because of the CAT's written-answer design and scoring procedure. Multiple-choice CT instruments would have taken less time to score, but the CAT is uniquely reliable and accurate, as well as applicable to diverse learning experiences and learner populations (CAIL, "About the CAT"). Scaling up this study to explore more fully the demographics of CT learning would require an even greater investment of time, resources, and people, as well as a restructuring of the faculty training model to accommodate a greater diversity of content knowledge expertise and CT awareness.
A second limitation of this study concerns its understanding of the experience of the CTLO itself. The study was designed to gauge students' CT development primarily in one way, through instructor scoring of each CTLO. Pretest and posttest CAT scores provided reliable information on CT learning after both CTLOs. It was assumed that scheduling CTLOs in sequence was sufficient to improve critical thinking; however, we observed that individual CTLO performance did not necessarily improve sequentially. Predictably, high instructor-derived scores on CTLO 1 were significantly positively correlated with CAT posttest scores. However, students did no better on CTLO 2 than on CTLO 1: there was no significant difference between CTLO 1 scores (M = 34.90, SD = 2.42, min = 0.0, max = 90%) and CTLO 2 scores (M = 32.03, SD = 2.21, min = 0.0, max = 100%), p > .05, and no significant correlation between CTLO 1-to-CTLO 2 differences and pre-post CAT differences. Several attitudinal or experiential factors that affect student persistence could also have affected students' performance on the CTLOs and the CAT. The study did not seek student attitudes about the experience. Knowing more about how students perceived these experiences could shed light on why mean CTLO performance declined as the semester progressed and suggest ways of making CTLO experiences more engaging or effective. Also worth asking is whether faculty CTLO scoring was itself consistent. Faculty scored their own students' CTLOs without participation from other members of the study team; a multiple-instructor scoring process could address this question. Questions about CTLO scoring consistency, moreover, treat an individual CTLO as a standalone problem set. These activities are more than the problems themselves: they are constructed as learning events that take place over multiple class periods, involving case study analysis, data interpretation, situational role-playing and decision-making, student peer review, and reflection on the experience.
Despite these limitations, the study team is confident that refining its processes and understanding the nature of these activities will only further benefit students. The prospect of overcoming those limitations in fact gestures towards interesting opportunities to expand the directions and implications of this study model. One opportunity pertains to conducting comparative analysis by demography, and one important variable is learning by discipline area or knowledge domain. Broadly considered, the study was conducted in courses in the Social Sciences (Psychology, Management, Geography) and Natural Sciences (Physical Science, Biology). A comparison of impact by knowledge domain could shed light on whether content learning in some areas affects critical thinking development comparatively more than content learning in others. That comparison would also benefit from development and deployment in Humanities courses; the current study was prevented from gathering information about Humanities courses because the coronavirus pandemic closed GGC the semester the study was testing CTLO interventions in its first Humanities course, History 1122, Survey of Western Civilization 2. Such analysis could also qualify the understanding of the CAT as an instrument designed to observe interdisciplinary or universal CT; the results presented in Table 7 are suggestive of this. A broader study with greater student participation could clarify questions in that direction as well as pose new questions about CT learning specific to disciplines and knowledge domains.
Given that such comparative analyses would hinge upon scaling up the entire study, a scaled-up study would offer further lines of inquiry besides comparing across populations. Because critical thinking grew not necessarily in sequence but through regular exposure to intentionally designed CT learning activities integrated into routine coursework, providing more students with more frequent CTLO-based experiences is likely to yield greater effects. The current study found that two CTLOs correlated with CT growth over one semester, even though CTLO performance did not incrementally improve over that semester. This suggests that repeated exposure to the experience could have greater impacts, perhaps even retained over a greater length of time. An expanded study could incorporate this model into learning communities or pathways of courses, where multiple CTLOs could grow and reinforce critical thinking for more students over longer spans of their overall college experience.
A third opportunity for further study pertains to a factor intrinsic, indeed central, to the design of the entire study: a close examination of whether components of the CTLO impact critical thinking differently, in what ways, and to what effect. The totality of the CTLO experience was measured, but not the various steps or phases of each experience. This warrants consideration, as it could be argued that precisely because students in the experimental group were provided both with exercises modeled on CAT skills and with a reflection period including blind scoring of those exercises, they were expressly prepared to succeed on the CAT. A closer examination of the design of the CTLOs in relation to CAT skills and questions suggests that students in the experimental group were not unduly prepared to excel at specific CAT questions merely because they practiced and reviewed experiences resembling them. CTLOs were based on CAIL templates aligned to Skillset 1 of the CAT-aligned CT competencies. CAT questions aligned with Skillset 1 account for 20 points of the overall CAT score of 38. Results on Skillset 1 questions are provided in Table 8. We observed no such targeted improvement, suggesting that students retained critical thinking abilities not only to answer specific questions built around specific CT skills but also to consider how to answer questions involving CT skills different from those they practiced.
Better understanding the relationships among the features of these learning activities could illuminate further innovative practices with more targeted and impactful CT teaching, able to be integrated in diverse ways within a course or among different courses. The opportunities to expand and innovate are, ultimately, what inspire our continued commitment and optimism. Now that we can observe an impact of our intention to teach critical thinking, we can understand those intentions more clearly to refine our practice for the benefit of our students. We, as teachers, can also learn to intentionally diversify our practices to help more students learn critical thinking in different ways toward, perhaps, even more long-lasting and impactful outcomes.
• The late Dr. Tom Gluick, Assistant Professor of Chemistry, Georgia Gwinnett College. A member of the research team whose insights gave shape to the scope of the study's design, Tom passed away before the drafting of this manuscript. We miss you, Tom.

Figure 3. Pre-post CAT gains, control and experimental groups.

Table 1. College-wide CAT assessment results, Fall 2015–Fall 2020.
Results by self-reported class rank (FRESHMAN, SOPHOMORE, JUNIOR, SENIOR) are represented in Table 1. See Appendix 1 for a description of the home institution's methodology for administering the CAT for college-wide assessment. Assessment results for GGC's core curriculum are analyzed by interdisciplinary faculty committees at GGC. Because critical thinking is one of the college's core curriculum outcomes, numerous GGC faculty have received training from CAIL on using the CAT for assessment, scoring the CAT, and enhancing CT learning via CT activities in their classes. The work of the faculty committee responsible for oversight of college-wide assessment of critical thinking led to this study.

Table 3. Faculty Participant Impressions of Critical Thinking Competency Conditions in Control Sections.
Outcome 1: Evaluate how strongly information supports an idea or interpretation.
Outcome 2: Provide alternative interpretations for information or observations that have several interpretations.
Outcome 3: Identify additional information to evaluate alternative interpretations.
Columns: degree each outcome was taught in the control section; classroom activities aligned with each outcome in the control section.
First row: BIOL 1107K, Moderate.
Journal of the Scholarship of Teaching and Learning, Vol. 22, No. 3, September 2022. josotl.indiana.edu

…them. The result provided a coherent, qualitative impression of the critical thinking conditions and support in the course without CTLO intervention; in other words, the control condition of the course. Table 3 conveys faculty participants' impressions of critical thinking intentionality in the control sections of the courses of this study.

Table 5 . Overall Study Results.
Results by course knowledge domain (Natural Sciences, Social Sciences) and by self-reported class rank (freshman, sophomore, junior, and senior) are presented in Tables

Table 8. Study Results, Skillset 1.
Gains of the experimental cohort on questions aligned with Skillset 1 differed negligibly from gains of the experimental cohort on the entire CAT. As indicated in Table 5, proportional improvement of the experimental group on the entire CAT was 6.26% (2.38/38). The proportional improvement of the experimental group on questions aligned with Skillset 1 and with CAIL's pedagogical templates was slightly less, 6.25% (1.25/20). The proportional improvement of the control cohort was slightly larger on the entire CAT, 2.58% (0.98/38), compared with 2.05% (0.41/20) on questions aligned with Skillset 1. If the classroom reflection/scoring sessions after the CTLOs were merely preparing students to excel at Skillset 1 questions, results would indicate disproportionate improvement on those questions.
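The proportional improvements above are simple ratios of raw score gain to available points, expressed as percentages. A minimal sketch of that arithmetic (point totals and raw gains taken from Tables 5 and 8; the function name is illustrative, not from the study):

```python
# Proportional CAT gains: raw score gain divided by available points.
CAT_TOTAL = 38        # total points on the full CAT
SKILLSET1_TOTAL = 20  # points from questions aligned with Skillset 1

def proportional_gain(raw_gain: float, total_points: float) -> float:
    """Express a raw score gain as a percentage of available points."""
    return round(raw_gain / total_points * 100, 2)

print(proportional_gain(2.38, CAT_TOTAL))        # experimental, full CAT: 6.26
print(proportional_gain(1.25, SKILLSET1_TOTAL))  # experimental, Skillset 1: 6.25
print(proportional_gain(0.98, CAT_TOTAL))        # control, full CAT: 2.58
print(proportional_gain(0.41, SKILLSET1_TOTAL))  # control, Skillset 1: 2.05
```

Because the experimental group's Skillset 1 percentage (6.25%) tracks its full-CAT percentage (6.26%) almost exactly, the gains are proportionate rather than concentrated on the practiced question type.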
• The Office of the Senior Vice President of Academic and Student Affairs and Provost, Georgia Gwinnett College. Seed grants from the Office of the Provost funded CAT testing for this study for two years.
• The Office of Academic Assessment, Georgia Gwinnett College. The Office of Academic Assessment subsidized some CAT testing costs, led CAT scorer training and college faculty scoring sessions, and organized college-wide CAT/CT trainings offered by CAIL.
• The Center for Assessment and Improvement of Learning, Tennessee Technological University. The developers of the CAT, CAIL provided valuable advice and guidance throughout this study.