Institutional Selectivity and Good Practices in Undergraduate Education: How Strong is the Link?

Academic selectivity plays a dominant role in the public's understanding of what constitutes institutional excellence or quality in undergraduate education. In this study, we analyzed two independent data sets to estimate the net effect of three measures of college selectivity on dimensions of documented good practices in undergraduate education. With statistical controls in place for important confounding influences, an institution's median student SAT/ACT score, a nearly identical proxy for that score, and the Barron's Selectivity Score explained from less than 0.1% to 20% of the between-institution variance and from less than 0.1% to 2.7% of the total variance in good practices. The implications of these findings for what constitutes quality in undergraduate education, college choice decisions, and the validity of national college rankings are discussed.

The logic underlying the use of average or median student test scores as a proxy for undergraduate educational quality is not unreasonable. Students are not simply the passive recipients of undergraduate education delivered by a college or university's faculty. Rather, interactions with other students constitute a major dimension of the educational impact of an institution on any one student (e.g., Astin, 1993; Kuh, Schuh, Whitt, & Associates, 1991; Pascarella & Terenzini, 1991; Whitt, Edison, Pascarella, Nora, & Terenzini, 1999). Consequently, the more academically adroit one's peers are, the greater the likelihood of one's being intellectually stimulated and challenged in classroom and nonclassroom interactions with them, or so the argument goes. Similarly, a well-prepared student body may provide faculty with greater latitude to increase academic expectations and demands of students in the classroom and, thereby, even further enhance the impact of an institution's academic program.
Measures of an institution's academic selectivity, such as average student SAT/ACT scores, are not just a convenient and easily obtainable proxy for college quality. They also play a dominant, if perhaps unintended, role in more elaborate and public attempts to identify the nation's "best" colleges and universities. Probably the most nationally visible and credible of these attempts to identify and rank postsecondary institutions based on the quality of their undergraduate education is the annual report by U.S. News & World Report (USNWR) (Ehrenberg, 2003). USNWR "bases its college and university rankings on a set of up to 16 measures of academic quality that fall into seven broad categories: academic reputation, student selectivity, faculty resources, student retention, financial resources, alumni giving, and ... graduation rate performance" (Webster, 2001, p. 236). In a creative approach employing principal component analysis, Webster (2001) found that, although the average SAT (or ACT equivalent) score of enrolled students constituted only 6% of an institution's overall USNWR quality score, it was by far the criterion that most clearly determined an institution's rank. Consistent with Webster's more extensive analyses, we conducted a preliminary analysis for this study that estimated the simple correlation between average SAT/ACT score and the 2002 USNWR ranking for the "top 50" national universities. Even in this substantially attenuated distribution of schools, the correlation between the USNWR ranking (1 = highest, 50 = lowest) and the average SAT/ACT scores of enrolled students was -.89. (Average SAT/ACT equivalent score was obtained from the 2002 report America's Top Research Universities, Lombardi, Craig, Capaldi, & Carter, 2002.) For all practical purposes, then, the USNWR ranking of "best" colleges can be largely reproduced simply by knowing the average SAT/ACT scores of the enrolled students.
Beyond this index, the other so-called "quality" indices make little incremental contribution to the USNWR rankings.
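The rank-to-score correlation reported above is a simple Pearson product-moment correlation. The sketch below shows the computation; the five (rank, score) pairs are hypothetical illustrations, not the actual USNWR or SAT data analyzed in the study.

```python
# Pearson correlation between institutional rank and average SAT score.
# The data below are hypothetical stand-ins for illustration only.
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

ranks = [1, 2, 3, 4, 5]                    # 1 = highest-ranked institution
avg_sat = [1480, 1460, 1410, 1400, 1350]   # hypothetical average SAT scores

r = pearson(ranks, avg_sat)  # strongly negative: better rank, higher average SAT
```

A strongly negative coefficient of this kind is what allows the ranking to be "largely reproduced" from test scores alone.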
While it is clear that an aggregate institution-level index of student body selectivity is a widely used and frequently accepted proxy for college "quality," the net impact of such an index on the outcomes of college is less certain. Over the past 30 years, three large-scale syntheses of the college impact literature have estimated the influence of institutional selectivity on a wide range of student and alumni outcomes (Bowen, 1977; Pascarella & Terenzini, 1991, in press). These three syntheses are consistent in arriving at two general conclusions about the role of college selectivity. First, although there is considerable uncertainty over causality (e.g., Arcidiacono, 1998; Dale & Krueger, 1999; Kane, 1998; Knox, Lindsay, & Kolb, 1993), the academic selectivity of one's undergraduate institution is, nevertheless, positively linked to career success and, in particular, to earnings. The second conclusion, however, is that when student cognitive, developmental, and psychosocial outcomes are considered, the net impact of college selectivity tends to be small and inconsistent.
It is not entirely clear why institutional selectivity demonstrates such equivocal impacts on student cognitive, developmental, and psychosocial growth during college. However, a number of scholars and social scientists who study the impact of college on students have suggested a reasonable explanation. They argue that there is substantially more variation within than between institutions. Put another way, the vast majority of colleges and universities in the American postsecondary system have multiple subenvironments with more immediate and powerful influences on individual students than any aggregate institutional characteristic (Baird, 1988, 1991; Hartnett & Centra, 1977; Smart & Feldman, 1998). Consequently, institutional selectivity (as measured by average student SAT/ACT score or a similar aggregate index) may simply be too global and remote an index to tell us much about the quality and impact of a student's classroom and nonclassroom experiences (Chickering & Gamson, 1987; Kuh, 2001, 2003, 2004; Pascarella, 2001a; Pike, 2003). In an attempt to shed further light on this issue, the present study estimated the relationships between college selectivity and students' experiences in what existing evidence suggests are dimensions of good practice in undergraduate education. To determine the robustness of our findings, parallel analyses were conducted on two independent data sets. The first was the National Study of Student Learning (NSSL), a federally funded longitudinal investigation conducted in the mid-1990s, and the second was the National Survey of Student Engagement (NSSE), a cross-sectional investigation carried out in 2002. In the NSSL data, a proxy selectivity measure consisting of incoming student test scores on the Collegiate Assessment of Academic Proficiency (ACT, 1990) was used. In the aggregate, this measure correlated .95 with average institutional SAT/ACT score.
In analyses of the NSSE data, two measures of selectivity were employed: median student SAT/ACT equivalent score and, for comparative purposes, the Barron's Selectivity Score, which combines median SAT/ACT score with other measures of the stringency of an institution's admissions requirements.
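The within- versus between-institution distinction invoked above can be made concrete with a simple variance decomposition: total variance in a student-level measure splits into variance of institution means (between) and variance around those means (within). The scores below are hypothetical, constructed only to illustrate the arithmetic.

```python
# Decompose total variance in a student-level measure into a
# between-institution component and its share of the total.
# Scores are hypothetical; institutions differ little in their means,
# while students within each institution vary widely.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

institutions = {
    "A": [10, 14, 18, 22],
    "B": [11, 15, 19, 23],
    "C": [12, 16, 20, 24],
}

all_scores = [s for scores in institutions.values() for s in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Between-institution variance: squared deviations of institution means
# from the grand mean, weighted by institution size.
between = sum(
    len(scores) * ((sum(scores) / len(scores) - grand_mean) ** 2)
    for scores in institutions.values()
) / len(all_scores)

total = variance(all_scores)
between_share = between / total  # small when institutions barely differ
```

In this toy example the between-institution share is only about 3% of the total, mirroring the pattern the scholars cited above describe: an aggregate institutional index can explain at most the between-institution slice of the variance.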

Good Practices in Undergraduate Education
In a project sponsored by the American Association for Higher Education, the Education Commission of the States, and The Johnson Foundation, Chickering and Gamson (1987, 1991) synthesized the existing evidence on the impact of college on students and distilled it into seven broad categories or principles for good practice in undergraduate education. These seven principles or categories are: (1) student-faculty contact; (2) cooperation among students; (3) active learning; (4) prompt feedback to students; (5) time on task; (6) high expectations; and (7) respect for diverse students and diverse ways of knowing (Chickering & Gamson, 1991). The influence of Chickering and Gamson's seven principles has been extensive. For example, the NSSE, one of the most broad-based annual surveys of undergraduates in the country, is based on questionnaire items that attempt to operationalize the seven good practices (Kuh, 2001).
Although a detailed review of the extensive research on good practices is far beyond the scope of this paper, several examples are nevertheless useful for illustrative purposes. Analyzing longitudinal data from the Cooperative Institutional Research Program, Avalos (1996) found that a scale measuring the quantity and frequency of student-faculty interaction (e.g., informal conversations outside of class, working on a research project, being a guest in a professor's house) had a significant, positive link with postcollege occupational status. This association persisted even when controls were made for such factors as precollege occupational status, family background, and grades.
Meta-analyses of experimental and quasi-experimental research have clearly indicated that cooperative learning experiences provide a distinct advantage over individual learning experiences in fostering growth in both knowledge acquisition and problem-solving skills (Johnson, Johnson, & Smith, 1998a, 1998b; Qin, Johnson, & Johnson, 1995). There is also evidence to suggest that involvement in cooperative group class projects has a positive net effect on growth in leadership abilities and job-related skills (Astin, 1993).
A growing body of evidence has suggested that involvement in diversity activities (i.e., racial, cultural, intellectual, and political) has a positive net influence on critical thinking skills and other measures of cognitive growth during college (e.g., Gurin, 1999; Pascarella, Palmer, Moye, & Pierson, 2001). Moreover, the benefits of involvement in diversity experiences during college appear to extend to one's postcollegiate life. Gurin (1999) found that involvement in diversity experiences during college (e.g., discussing racial/ethnic issues, socializing with students from a different ethnic group) had a positive net influence not only on alumni reports of community involvement but also on alumni reports of the extent to which their undergraduate experience prepared them for their current jobs.
There is ample experimental and correlational evidence that effective teaching (e.g., teacher clarity and teacher organization) has positive effects on both knowledge acquisition and more general cognitive competencies such as critical thinking (Hines, Cruickshank, & Kennedy, 1985; Pascarella et al., 1996; Wood & Murray, 1999). However, it also appears that receipt of effective teaching at the undergraduate level may positively influence one's plans to obtain a graduate degree; this influence is independent of background characteristics, tested ability, precollege plans for a graduate degree, college grades and social involvement, and the average graduate degree plans of students at the college attended.
It is clear from the existing evidence that we can identify empirically validated dimensions of good practice in undergraduate education that not only enhance cognitive and personal development during college, but also are linked to a range of postcollege benefits. The present investigation sought to estimate the extent to which these good practices are influenced by the academic selectivity of the institution one attends. In operationalizing good practices, we were guided by the research on the predictive validity of different dimensions of good practice reviewed above. Indeed, many of the operational definitions of good practices employed in this investigation were either adapted or taken directly from the studies on predictive validity previously cited (e.g., Cabrera et al., 2002; Feldman, 1997; Hagedorn et al., 1997; Pascarella et al., 1996; Terenzini et al., 1994; Whitt et al., 1999).

Method
The results of this investigation are based on analyses of data from two multi-institutional samples: the longitudinal National Study of Student Learning (NSSL) and the cross-sectional National Survey of Student Engagement (NSSE). The sample descriptions, data collection procedures, and variables for each database are described below.

NSSL Sample
The NSSL institutional sample consisted of 18 four-year colleges and universities located in 15 states throughout the country. Institutions were chosen from the National Center for Education Statistics IPEDS data to represent differences in colleges and universities nationwide through a variety of characteristics, including institutional type and control (e.g., private and public research universities, private liberal arts colleges, public and private comprehensive universities, historically Black colleges), size, location, commuter versus residential character, and ethnic distribution of the undergraduate student body. Our sampling technique produced a sample of institutions with a wide range of selectivity. For example, we included some of the most selective institutions in the country (average SAT = 1400) as well as some that were essentially open-admission. The result of our sampling technique was a student population from 18 schools that approximated the national population of undergraduates in four-year institutions by ethnicity, gender, and selectivity level. However, the small number of institutions makes statistical generalizations to the population of American four-year colleges and universities problematic.
The individuals in the sample were students who had participated in the first and second follow-ups of the NSSL. We selected the initial sample randomly from the incoming first-year class at each participating institution, informed them that they would be participating in a national longitudinal study of student learning, and assured them of a cash stipend for their participation in each data collection. We also gave them assurances that the information they provided would be kept confidential and never become part of their institutional records.

NSSL Data Collection
The initial data collection for NSSL was conducted in the fall of 1992 with 3,331 students from the 18 participating institutions. We asked the participants to fill out an NSSL precollege survey that sought information on student background (e.g., sex, ethnicity, age, family socioeconomic status, secondary school achievement) as well as on aspirations, expectations of college, and orientations toward learning (e.g., educational degree plans, intended major, academic motivation). Participants also completed Form 88A of the Collegiate Assessment of Academic Proficiency (CAAP), developed by the American College Testing Program (ACT) to assess general skills typically acquired by students during college (ACT, 1990). The total CAAP consists of five 40-minute, multiple-choice test modules: reading comprehension, mathematics, critical thinking, writing skills, and science reasoning. We administered only the reading comprehension, mathematics, and critical thinking modules during this stage of data collection.
The first and second NSSL follow-up data collections were conducted at the end of the first year of college (spring 1993) and the end of the second year of college (spring 1994), respectively. In both data collections, each participant completed different CAAP tests as well as the College Student Experiences Questionnaire (CSEQ) (Pace, 1990) and a detailed NSSL follow-up questionnaire. The CSEQ and the NSSL questionnaires gathered extensive information about each student's classroom and nonclassroom experiences during the preceding school year. At the end of the second follow-up, complete data was available for 1,485 students, or 44.6% of the original sample tested 3 years previously at the 18 participating institutions. Because of attrition from the sample, a weighting algorithm was developed to adjust for potential response bias by gender, ethnicity, and institution. Within each of the 18 institutions, participants in the second follow-up were weighted up to that institution's end-of-second year population by sex (male or female) and race/ethnicity (White, Black, Hispanic, Other). For example, if an institution had 100 Black men in its second-year class and 25 Black men in the sample, each Black man in the sample was assigned a weight of 4.00.
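The weighting step described above is a standard post-stratification: within each institution, each respondent in a sex-by-ethnicity cell receives a weight equal to that cell's population count divided by its sample count. A minimal sketch, using the paper's own example of 100 Black men in the second-year class and 25 in the sample (the other cell's counts are hypothetical):

```python
# Post-stratification weights for one institution.
# Each cell is (sex, race/ethnicity); weight = population / sample count.
# The (male, Black) counts come from the paper's example; the
# (female, White) counts are hypothetical.

population = {("male", "Black"): 100, ("female", "White"): 240}
sample = {("male", "Black"): 25, ("female", "White"): 60}

weights = {cell: population[cell] / sample[cell] for cell in sample}
# Each Black man in the sample is then counted 4 times in weighted analyses.
```

Summing the weights over sampled respondents in a cell recovers that cell's population count, which is what makes the weighted sample match the institution's end-of-second-year population on sex, ethnicity, and institution.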
While applying sample weights in this way corrects for bias in the sample we analyzed by sex, ethnicity, and institution, it cannot adjust for nonresponse bias. However, we conducted several additional analyses to examine differences in the characteristics of students who participated in all years of the NSSL and those who dropped out of the study. The dropouts consisted of two groups: (a) those who dropped out of the institution during the study, and (b) those who persisted at the institution but dropped out of the study. Initial participants who left their respective institutions had somewhat lower levels of precollege cognitive test scores (as measured by fall 1992 scores on the CAAP reading comprehension, mathematics, and critical thinking modules), socioeconomic background, and academic motivation than their counterparts who persisted in the study. Yet students who remained in the study and those who dropped out of the study but persisted at the institution differed in only small, chance ways with respect to precollege cognitive test scores, age, race, and socioeconomic background (Pascarella, Edison, Nora, Hagedorn, & Terenzini, 1998).

NSSE Sample
The NSSE is an annual survey of first-year and senior students designed to assess the extent to which students engage in empirically-derived good practices in undergraduate education and what they gain from their experience (Kuh, 2001). The NSSE sample for this study consisted of 76,123 undergraduates (38,458 first-year students and 37,665 seniors) from 271 different four-year colleges and universities who completed the NSSE in the spring of 2002. The sample of institutions is described in Table 1 and closely resembled the national profile of four-year colleges and universities. More than 40% were Master's institutions. Approximately one fourth of the institutions were Doctoral (Extensive or Intensive) Universities, and approximately one fifth (19%) were Liberal Arts Colleges. Less than 10% of the institutions were Baccalaureate General Colleges.
The largest representation of students came from Master's Universities, with more than one third of all first-year students and seniors in the sample. Approximately one fourth of all students were from Doctoral Extensive Universities, and just 10% were from Doctoral Intensive Universities. Slightly more than 18% of the sample were enrolled in Baccalaureate Liberal Arts Colleges, and only 7% were students at Baccalaureate General Colleges.

NSSL Variables
We initially attempted to use the average SAT/ACT equivalent score for entering students at each of the 18 participating institutions as the independent variable in our analyses of the NSSL data. However, two of the institutions in the NSSL sample were essentially open-admissions schools, and it was not possible to obtain reliable average SAT/ACT data for them. Consequently, we used a composite of the average precollege CAAP reading comprehension, mathematics, and critical thinking scores as a proxy for the average SAT/ACT score of incoming students at each institution in the study. For the 16 participating schools that had reliable average SAT/ACT scores, the correlation between this index and the average precollege CAAP composite score was .95. Thus, our measure of institutional selectivity based on the precollege CAAP appeared to be a very strong proxy for average SAT/ACT score, and it served as the independent variable in the NSSL analyses. In selecting and creating dependent measures in the NSSL analyses, we were guided by the principles of good practice in undergraduate education outlined by Chickering and Gamson (1987, 1991) and by additional research on effective teaching and influential peer interactions in college (e.g., Astin, 1993; Feldman, 1997; Pascarella & Terenzini, 1991; Whitt et al., 1999). In the NSSL analyses, there were 20 measures or scales of "good practices" grouped in the following eight general categories: (1) Student-Faculty Contact: quality of nonclassroom interactions with faculty, faculty interest in teaching and student development; (2) Cooperation Among Students: instructional emphasis on cooperative learning, course-related interaction with peers; (3) Active Learning/Time on Task: academic effort/involvement, essay exams in courses, instructor use of high-order questioning techniques, emphasis on high-order examination questions, computer use; (4) Prompt Feedback: instructor feedback to students; (5) High Expectations: course challenge/effort, scholarly/intellectual emphasis, number of textbooks or assigned readings, number of term papers or other written reports; (6) Quality of Teaching: instructional clarity, instructional organization/preparation; (7) Influential Interactions with Other Students: quality of interactions with students, non-course-related interactions with peers, cultural and interpersonal involvement; (8) Supportive Campus Environment: emphasis on supportive interactions with others.
All dependent measures were formed by summing student responses on the CSEQ and NSSL follow-up questionnaire during the first and second follow-up data collections. Table 2 provides detailed operational definitions and, where appropriate, psychometric properties of all independent and dependent variables employed in the NSSL analyses.

NSSE Variables
The NSSE data permitted us to employ two different measures of college selectivity. The first was the median composite verbal and mathematics SAT (or ACT equivalent) score of first-year undergraduate students at each institution in the sample. The second was the Barron's Selectivity Score, an index with nine categories ranging from "noncompetitive" to "most competitive" that combines the median composite SAT/ACT score with other measures of the stringency of an institution's admissions requirements.

College selectivity: Operationally defined as the average tested precollege academic preparation (composite of Collegiate Assessment of Academic Proficiency reading comprehension, mathematics, and critical thinking tests) of students at the institution attended. Where institutional data were available, this score correlated .95 with the average ACT score (or SAT score converted to the ACT).

Student-Faculty Contact
Quality of nonclassroom interactions with faculty: An individual's responses on a five-item scale that assessed the quality and impact of one's nonclassroom interactions with faculty. Examples of constituent items were "Since coming to this institution I have developed a close personal relationship with at least one faculty member," "My nonclassroom interactions with faculty have had a positive influence on my personal growth, values and attitudes," and "My nonclassroom interactions with faculty have had a positive influence on my intellectual growth and interest in ideas." Response options were 5 = strongly agree, 4 = agree, 3 = not sure, 2 = disagree, and 1 = strongly disagree. Alpha reliability = .83. The scale was summed through the first and second year of college.
Faculty interest in teaching and student development: An individual's responses on a five-item scale assessing students' perceptions of faculty interest in teaching and students. Examples of constituent items were "Few of the faculty members I have had contact with are genuinely interested in students" (coded in reverse), "Most of the faculty members I have had contact with are genuinely interested in teaching," and "Most of the faculty members I have had contact with are interested in helping students grow in more than just academic areas." Response options were 5 = strongly agree, 4 = agree, 3 = not sure, 2 = disagree, and 1 = strongly disagree. Alpha reliability = .71. The scale was summed through the first and second year.

Cooperation Among Students
Instructional emphasis on cooperative learning: An individual's responses on a four-item scale that assessed the extent to which the overall instruction received emphasized cooperative learning. Examples of constituent items were "I am required to work cooperatively with other students on course assignments," "In my classes, students teach each other in groups instead of only having instructors teach," and "Instructors encourage learning in student groups." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .81. The scale was summed through the first and second year.
Course-related interaction with peers: An individual's responses on a 10-item scale that assessed the nature of one's interactions with peers focusing on academic coursework. Examples of constituent items were "Studying with students from my classes," "Tried to explain the material to another student or friend," and "Attempted to explain an experimental procedure to a classmate." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .79. The scale was summed through the first and second year.

Active Learning/Time on Task
Academic effort/involvement: An individual's responses on a 37-item, factorially derived, but modified scale that assessed one's academic effort or involvement in library experiences, experiences with faculty, course learning, and experiences in writing. The scale combined four 10-item involvement dimensions from the CSEQ, minus three items that were incorporated into the Course-Related Interaction with Peers scale described above. Examples of constituent items were "Ran down leads, looked for further references that were cited in things you read," "Did additional readings on topics that were discussed in class," and "Revised a paper or composition two or more times before you were satisfied with it." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .92. The scale was summed through the first and second year.
Number of essay exams in courses: An individual's response to a single item from the CSEQ. Response options were 1 = none to 5 = more than 20. The item was summed through the first and second year.
Instructor use of high-order questioning techniques: An individual's responses on a four-item scale that assessed the extent to which instructors asked questions in class that required high-order cognitive processing. Examples of constituent items were "Instructors' questions in class ask me to show how a particular course concept could be applied to an actual problem or situation," "Instructors' questions in class ask me to point out any fallacies in basic ideas, principles or points of view presented in the course," and "Instructors' questions in class ask me to argue for or against a particular point of view." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .80. The scale was summed through the first and second year.
Emphasis on high-order examination questions: An individual's responses on a five-item scale that assessed the extent to which examination questions required high-order cognitive processing. Examples of constituent items were "Exams require me to point out the strengths and weaknesses of a particular argument or point of view," "Exams require me to use course content to address a problem not presented in the course," and "Exams require me to compare or contrast dimensions of course content." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .77. The scale was summed through the first and second year.
Using computers: An individual's responses on a three-item scale indicating extent of computer use: "Using computers for class assignments," "Using computers for library searches," and "Using computers for word processing." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .65. The scale was summed through the first and second year.

Prompt Feedback
Instructor feedback to students: An individual's responses on a two-item scale that assessed the extent to which the overall instruction received provided feedback on student progress. The items were "Instructors keep me informed of my level of performance" and "Instructors check to see if I have learned well before going on to new material." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .70. The scale was summed through the first and second year.

High Expectations
Course challenge/effort: An individual's responses on a six-item scale that assessed the extent to which the courses and instruction received were characterized as challenging and requiring a high level of effort. Examples of constituent items were "Courses are challenging and require my best intellectual effort," "Courses require more than I can get done," and "Courses require a lot of papers or laboratory reports." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .64. The scale was summed through the first and second year.
Number of textbooks or assigned readings: An individual's response on a single item from the CSEQ. Response options were 1 = none to 5 = more than 20. The item was summed through the first and second year.
Number of term papers or other written reports: An individual's response on a single item from the CSEQ. Response options were 1 = none to 5 = more than 20. The item was summed across the first and second year.
Scholarly/intellectual emphasis: An individual's responses on a three-item scale that assessed perceptions of the extent to which the climate of one's college emphasized: 1) the development of academic, scholarly, and intellectual qualities; 2) the development of esthetic, expressive, and creative qualities; or 3) being critical, evaluative, and analytical. Response options were on a semantic differential-type scale where 7 = strong emphasis and 1 = weak emphasis. Alpha reliability = .79. The scale was summed through the first and second year.

Quality of Teaching
Instructional skill/clarity: An individual's responses on a five-item scale that assessed the extent to which the overall instruction received was characterized by pedagogical skill and clarity. Examples of constituent items were "Instructors give clear explanations," "Instructors make good use of examples to get across difficult points," and "Instructors interpret abstract ideas and theories clearly." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .86. The scale was summed through the first and second year.
Instructional organization and preparation: An individual's responses on a five-item scale that assessed the extent to which the overall instruction received was characterized by good organization and preparation. Examples of constituent items were "Presentation of material is well organized," "Instructors are well prepared for class," and "Class time is used effectively." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .87. The scale was summed through the first and second year.

Influential Interactions With Other Students
Quality of interactions with students: An individual's responses on a seven-item scale that assessed the quality and impact of one's interactions with other students. Examples of constituent items were "Since coming to this institution I have developed close personal relationships with other students," "My interpersonal relationships with other students have had positive influence on my personal growth, attitudes and values," and "My interpersonal relationships with other students have had a positive influence on my intellectual growth and interest in ideas." Response options were 5 = strongly agree, 4 =agree, 3 =not sure, 2 =disagree, and I =strongly disagree. Alpha reliability = .82. The scale was summed through the first and second year.
Non-course-related interactions with peers: An individual's responses on a ten-item scale that assessed the nature of one's interactions with peers focusing on non-class, or non-academic, issues. Examples of constituent items were "Talked about art (painting, sculpture, architecture, artists, etc.) with other students at the college," "Had serious discussions with students whose philosophy of life or personal values were very different from your own," and "Had serious discussions with students whose political opinions were very different from your own." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .84. The scale was summed through the first and second year.
Cultural and interpersonal involvement: An individual's responses on a 38-item, factorially derived, but modified scale that assessed one's effort or involvement in art, music, and theater; personal experiences; student acquaintances; and conversations with other students. The scale combined items from five involvement dimensions of the CSEQ, minus eight items that were incorporated into the Non-Course-Related Interactions With Peers scale described above. Examples of constituent items were "Seen a play, ballet, or other theater performance at the college," "Been in a group where each person, including yourself, talked about his/her personal problems," "Made friends with students whose interests were different from yours," "Had conversations with other students about major social problems such as peace, human rights, equality, and justice," and "In conversations with other students explored different ways of thinking about the topic." Response options were 4 = very often, 3 = often, 2 = occasionally, and 1 = never. Alpha reliability = .92. The scale was summed through the first and second year.

Supportive Campus Environment
Emphasis on supportive interactions with others: An individual's responses on a three-item scale that assessed the extent to which one's relationships with faculty, administrators/staff, and other students could be described as friendly, supportive, helpful, or flexible (coded 7) to competitive, remote, impersonal, or rigid (coded 1). Alpha reliability = .70. The scale was summed through the first and second year.
The NSSE measures several dimensions of good practice. They include the following: student-faculty interaction, active and collaborative learning, academic challenge, diversity-related experiences, and supportive campus environment. Incorporated within several of these general scales are a number of subscales. The student-faculty interaction scale included two subscales, course-related interactions and out-of-class interactions; the academic challenge scale included a subscale tapping higher-order thinking activities; and the supportive campus environment scale contained subscales focusing on interpersonal support and support for learning. Table 3 provides detailed operational definitions of the independent and dependent variables employed in our analysis of the NSSE data.

NSSL Analyses
Table 3: Operational Definitions of Variables in the NSSE Analyses

College selectivity: Operationally defined as two variables. The first was the median composite verbal and mathematics SAT/ACT equivalent score of first-year students at each institution in the sample. The second was the Barron's Selectivity Score. This index has nine categories ranging from "Noncompetitive" to "Most competitive" and uses five criteria to determine an institution's selectivity index or score: 1) median composite verbal and mathematics SAT/ACT equivalent score; 2) percentage of first-year students scoring 500 and above and 600 and above on the SAT, and percentage of first-year students scoring 21 and above and 27 and above on the ACT; 3) percentage of first-year students who ranked in the upper fifth and upper two-fifths of their secondary school graduating class; 4) minimum class rank and grade point average required for admission; and 5) percentage of applicants who were admitted.

Student-faculty interaction: A six-item scale with alpha reliabilities of .70 for first-year students and .71 for seniors. Constituent items were:
• Discussed grades or assignments with an instructor
• Received prompt feedback from faculty on your academic performance (written or oral)
• Discussed ideas from your readings or classes with faculty members outside of class
• Talked about career plans with a faculty member or advisor
• Worked with faculty members on activities other than coursework (committees, orientation, student life activities, etc.)
• Worked on a research project with a faculty member outside of course or program requirements
A three-item subscale, with alpha reliabilities of .62 for first-year students and .61 for seniors, was composed of items from this scale.

Active and collaborative learning: A seven-item scale with alpha reliabilities of .61 for first-year students and .63 for seniors. Constituent items were:
• Asked questions in class or contributed to class discussions
• Made a class presentation
• Worked with other students on projects during class
• Worked with classmates outside of class to prepare class assignments
• Tutored or taught other students (paid or voluntary)
• Participated in a community-based project as part of a regular course
• Discussed ideas from your readings or classes with others outside of class (students, family members, coworkers, etc.)

Academic challenge: An eleven-item scale with alpha reliabilities of .73 for first-year students and .76 for seniors. Constituent items included:
• Preparing for class (studying, reading, writing, rehearsing, and other activities related to your academic program)
• Worked harder than you thought you could to meet an instructor's standards or expectations
• Number of assigned textbooks, books, or book-length packs of course readings
• Number of written papers or reports of 20 pages or more
• Number of written papers or reports between 5 and 19 pages
• Number of written papers or reports of fewer than 5 pages
• Analyzing the basic elements of an idea, experience, or theory
• Synthesizing and organizing ideas, information, or experiences into new, more complex interpretations and relationships
• Making judgments about the value of information, arguments, or methods

Supportive campus environment: A three-item subscale with an alpha reliability of .77 for both first-year students and seniors. Constituent items were:
• Campus environment emphasizes: Providing the support you need to help you succeed academically
• Campus environment emphasizes: Helping you cope with your non-academic responsibilities (work, family, etc.)
• Campus environment emphasizes: Providing the support you need to thrive socially

Analyses of the NSSL data proceeded in a series of steps that used both institutions and individuals as the units of analysis. In all analyses, we used the percent of variance in good practice dimensions associated with or explained by selectivity as an estimate of effect size (Hays, 1994). In the first step of the analyses, individual student responses on each of the 20 good practice variables were regressed on a series of dummy variables (i.e., coded 1 and 0) representing the 18 four-year institutions in the sample. This estimate yielded the percent of total variance (or differences) in each good practice variable between institutions.
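The first analytic step, estimating the share of total variance in a good-practice score that lies between institutions, can be sketched as follows. This is illustrative Python, not the authors' code, and the schools and scores are invented. Regressing scores on institution dummies yields an R-squared equal to the one-way ANOVA ratio SS_between / SS_total, which is computed directly here.

```python
# Sketch: percent of total variance in a good-practice score lying between
# institutions, i.e., SS_between / SS_total from a one-way ANOVA. This equals
# the R^2 from regressing individual scores on institution dummy variables.
from statistics import mean

def between_institution_variance(scores_by_school):
    """Return SS_between / SS_total for a {school: [scores]} mapping."""
    all_scores = [s for scores in scores_by_school.values() for s in scores]
    grand = mean(all_scores)
    ss_total = sum((s - grand) ** 2 for s in all_scores)
    ss_between = sum(len(scores) * (mean(scores) - grand) ** 2
                     for scores in scores_by_school.values())
    return ss_between / ss_total

# Three hypothetical schools with invented good-practice scores
data = {"A": [10, 12, 11], "B": [14, 15, 13], "C": [9, 8, 10]}
print(round(between_institution_variance(data), 3))  # -> 0.864
```

In the study itself, this ratio would be computed once for each of the 20 good-practice variables across the 18 institutions; the reported values ranged from roughly 6% to 20%.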
In the next step in the analyses, we sought to determine the percentage of between-institution variance in good practices that was uniquely explained by college selectivity. Because our measures of good practices were based on student reports, we could not simply compute correlations between institutional selectivity and each good practice dimension. The reason for this is that such estimates do not account for the potential confounding influence of differences between institutions in the characteristics of the students who are reporting on good practices (Astin, 2003; Pascarella, 2001b). As Astin and Lee (2003) have demonstrated, a substantial portion of the differences in student reports of academic and nonacademic experiences during college are explained by differences in the background characteristics of the students themselves. Thus, failure to control for such student precollege characteristics could lead one to conclude that differences in reported good practices are institutional effects when, in fact, they may simply be the result of differences between institutions in the characteristics of the students enrolled. To address this methodological issue, we estimated a model that regressed each average good practice variable on institutional selectivity and three composite measures of student precollege characteristics: a student background composite (age, sex, race, parents' education, and parents' income); a precollege academic composite (secondary school grades, precollege plans to obtain a graduate degree, a measure of precollege academic motivation, and whether the college attended was one's first choice); and a high school involvement composite consisting of time spent in high school in eight separate activities (studying, socializing with friends, talking with teachers outside of class, working, exercising or sports, studying with friends, volunteer work, and extracurricular activities).
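The unique-variance logic of this step can be sketched as well: selectivity's net contribution is the increment in R-squared when it enters the regression after the controls. The sketch below is a minimal, hypothetical Python illustration, assuming a single control composite and a single selectivity measure; with the full control blocks the same increment is obtained as the difference in R-squared between the full and controls-only regressions. All data are invented.

```python
# Sketch: variance uniquely explained by predictor x over control c, via the
# closed-form two-predictor R^2 in the pairwise correlations. Not the
# authors' code; illustrative only.
import math

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) *
                           sum((y - mb) ** 2 for y in b))

def unique_r2(y, x, c):
    """R^2 increment for predictor x entered after control c."""
    r_yx, r_yc, r_xc = corr(y, x), corr(y, c), corr(x, c)
    r2_full = (r_yx**2 + r_yc**2 - 2 * r_yx * r_yc * r_xc) / (1 - r_xc**2)
    return r2_full - r_yc**2  # increment over the control-only model

# Invented data: outcome y, selectivity proxy x, control composite c
print(round(unique_r2([1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 4, 3]), 2))  # -> 0.36
```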
Our final step in the analyses was to estimate the impact of college selectivity on the total variance in good practices. In these analyses, we estimated a model with individuals as the unit of analysis. This model closely paralleled the model employed with institutions as the unit of analysis. Each individual-level measure of good practices was regressed on college selectivity and the following individual-level student precollege characteristics: tested precollege academic preparation (composite of CAAP reading comprehension, mathematics, and critical thinking, alpha reliability = .83); precollege plans to obtain a graduate degree; a measure of precollege academic motivation (alpha reliability = .65); whether or not the college attended was one's first choice; age; sex; race; parents' education; parents' income; secondary school grades; and time spent in high school in eight separate activities (studying, socializing with friends, talking with teachers outside of class, working for pay, exercising or sports, studying with friends, volunteer work, and extracurricular activities).
All NSSL institution-level and individual-level estimates were based on the weighted sample. Because of the very small sample size (n = 18) in the analyses of institution-level data, the critical alpha was set at .10. An alpha of .05 was used for all individual-level analyses. However, because we were conducting multiple analyses on 20 separate good practice dimensions, we applied the Bonferroni correction to all tests of significance.
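The Bonferroni safeguard described above simply divides the critical alpha by the number of tests before judging significance. A minimal sketch with hypothetical p-values:

```python
# Sketch of the Bonferroni correction: with k tests, each test is judged
# against alpha / k rather than alpha. The p-values are invented.
def bonferroni_significant(p_values, alpha=0.05):
    """Return a True/False flag per test under the Bonferroni cutoff."""
    cutoff = alpha / len(p_values)
    return [p < cutoff for p in p_values]

pvals = [0.001, 0.004, 0.03, 0.2]  # cutoff here is 0.05 / 4 = 0.0125
print(bonferroni_significant(pvals))  # -> [True, True, False, False]
```

With the study's 20 good-practice tests at alpha = .05, the per-test cutoff becomes .0025.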
It is important to point out that, even though the institutions differed widely in selectivity, the small number of institutions in the NSSL sample is a clear limitation of this part of the study. We report the results for purposes of consistency in our analyses but caution against overgeneralizing from results based on institutions as the unit of analysis.

NSSE Analyses
For the reasons specified above, the analyses of the NSSE data paralleled the NSSL analyses. Thus, two models were estimated. The first employed institutions as the unit of analysis in an attempt to explain the percentage of between-institution variance in good practice dimensions uniquely associated with institutional selectivity. The second model used individuals as the unit of analysis and estimated the percentage of total variance in good practices uniquely associated with selectivity. Both models introduced controls for student precollege characteristics, either at the institutional aggregate or individual level. These characteristics included age, race, sex, whether or not one was a first-generation college student, and whether or not one was a transfer student. Separate analyses were conducted for first-year students and for seniors. Because of the somewhat more restricted range of institutional selectivity and the more modest scale reliabilities in the NSSE sample, the estimates of total variance explained by selectivity were based on a correction for attenuation (Pedhazur & Schmelkin, 1991). Because multiple dependent variables were analyzed, the Bonferroni correction was applied to all tests of significance.
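The classical correction for attenuation divides an observed correlation by the square root of the product of the two measures' reliabilities. A brief sketch with illustrative numbers (the reliability and correlation values below are hypothetical, not those of the NSSE scales):

```python
# Sketch of the classical correction for attenuation: the observed r is
# divided by sqrt(rxx * ryy), the reliabilities of the two measures.
# Values are invented for illustration.
import math

def disattenuated_r(r_observed, rel_x, rel_y):
    """Correct an observed correlation for measurement unreliability."""
    return r_observed / math.sqrt(rel_x * rel_y)

# e.g., observed r = .10 between a perfectly reliable selectivity measure
# and a scale with a hypothetical alpha of .61
print(round(disattenuated_r(0.10, 1.0, 0.61), 3))  # -> 0.128
```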

Results
The estimated associations between college selectivity and dimensions of good practice in undergraduate education based on the longitudinal NSSL data are summarized in Table 4. Column 1 in Table 4 indicates the percent of total variance between institutions in each good practice variable. These percentages ranged from 5.6% to 19.7%, with a median of 9.7%. Thus, on average, institutional differences accounted for about 10% of the total variance in good practices. Column 2 shows the percentages of between-institution variance in good practices explained by college selectivity when differences in average student precollege characteristics among institutions were taken into account. As Column 2 indicates, these estimates of college selectivity's unique or net influence on between-institution variance in good practices tend to be modest. They ranged from less than 0.1% to 20.4%, with a median across the 20 good practice variables of 3.5%. Thus, on average, something less than 5% of the between-institution differences in good practices was uniquely associated with institutional selectivity. When the Bonferroni correction was applied, none of the unique variance estimates associated with selectivity in Column 2 was significant at even the .10 level.
Column 3 in Table 4 shows the percentages of total variance in good practices accounted for by college selectivity after differences in student precollege characteristics were taken into account. The unique variance percentages associated with selectivity ranged in magnitude from less than 0.1% to 2.7%, with a median across all 20 good practice variables of 0.5%. After applying the Bonferroni correction, college selectivity explained a statistically significant percentage of the variance in 10 of the 20 good practice dimensions. Net of student precollege characteristics, college selectivity explained small but statistically significant percentages of total variance in all four measures of high expectations: course challenge/effort (2.1%); number of textbooks/assigned readings (2.7%); number of term papers/written reports (1.2%); and scholarly/intellectual emphasis (1.7%). Selectivity also accounted for small but statistically significant variance percentages in all three measures of influential interaction with other students: quality of interactions with students (0.9%); non-course-related interactions with peers (2.6%); and cultural and interpersonal involvement (1.5%). The significant net relationships between college selectivity and two dimensions of active learning/time on task were somewhat contradictory: the relationship with emphasis on higher-order examination questions (0.7%) was positive, but that with number of essay exams in courses (1.1%) was negative. Finally, college selectivity accounted for a statistically significant part of the variance in instructor feedback to students (0.7%), but the relationship was negative.

Notes to Table 4:
a. Selectivity was operationally defined as the average CAAP score (composite of the reading comprehension, mathematics, and critical thinking tests) of students at the institution attended. Where institutional data were available, this score correlated .95 with the average ACT score (or SAT score converted to the ACT).
b. Calculated by random effects ANOVA using student-level data, unweighted N = 1,485.
c. Calculated by regression analysis using aggregate-level data, N = 18.
d. Statistical adjustments made for the following influences: (a) student background composite, comprising age, sex, race, parents' education, and parents' income; (b) precollege academic composite, comprising secondary school grades, precollege plans to obtain a graduate degree, precollege academic motivation, and whether the college attended was one's first choice; and (c) a high school involvement composite, comprising items that measured time spent during high school in eight separate activities (studying, socializing with friends, talking with teachers outside of class, working for pay, exercising or sports, studying with friends, volunteer work, and extracurricular activities).
e. Calculated by regression analysis using student-level data, unweighted N = 1,485.
f. Statistical adjustments made for the following influences: (a) student background variables, including age, sex, race, parents' education, and parents' income; (b) precollege academic variables, including tested precollege academic preparation (composite of CAAP reading comprehension, mathematics, and critical thinking), secondary school grades, precollege plans to obtain a graduate degree, precollege academic motivation, and whether the college attended was one's first choice; and (c) high school involvement variables, including measures of time spent during high school in eight separate activities (studying, socializing with friends, talking with teachers outside of class, working for pay, exercising or sports, studying with friends, volunteer work, and extracurricular activities).

The estimated associations between college selectivity and dimensions of good practice in undergraduate education based on the cross-sectional NSSE data are summarized in Table 5. Part A of Table 5 summarizes the results for first-year students, while Part B summarizes the results for seniors.
Columns 2 and 3 in Table 5 summarize the results when median SAT/ACT score was the measure of institutional selectivity. For both first-year students and seniors, college selectivity (as estimated by median institutional SAT/ACT score) had very small, and perhaps trivial, relationships with the various measures of good practices operationalized by the NSSE. Net of student precollege characteristics, median SAT/ACT score explained from less than 0.1% to 0.3% of the between-institution variance in good practices for first-year students, and from less than 0.1% to 0.6% of the between-institution variance in good practices for seniors. Even after a correction for attenuation, the corresponding percentages of total variance in good practices explained by median SAT/ACT score ranged from 0.0% to 0.1% for first-year students and from 0.0% to less than 0.1% for seniors.
278 The Journal of Higher Education

Columns 4 and 5 in Table 5 summarize the unique relationships between selectivity and good practices when college selectivity was operationally defined as the Barron's Selectivity Score. Consistent with our other findings, the associations were quite small, although larger than those found when selectivity was defined as median institutional SAT/ACT equivalent score. Net of student precollege characteristics, the Barron's Score explained from less than 0.1% to 3.8% of the between-institution variance in good practices for first-year students, and from less than 0.1% to 1.2% of the between-institution variance in good practices for seniors. After a correction for attenuation, the corresponding percentages of total variance in good practices explained by the Barron's Score ranged from 0.1% to 2.3% for first-year students and from less than 0.1% to 1.0% for seniors. As with the other measures of selectivity (average CAAP test scores and median SAT/ACT scores), the Barron's Selectivity Score had its strongest positive associations with measures of high expectations. However, even on this good practice dimension, the Barron's Score explained only 3.8% of the between-institution variance and 2.3% of the total variance for first-year students, and 1.2% of the between-institution variance and 1.0% of the total variance for seniors.¹

Notes to Table 5:
a. Median SAT/ACT score was operationally defined as the median composite verbal and mathematics SAT/ACT equivalent score of first-year students at each institution in the sample. The Barron's Selectivity Score has nine categories ranging from "Noncompetitive" to "Most competitive" and uses five criteria to determine an institution's selectivity index or score: 1) median composite verbal and mathematics SAT/ACT equivalent score; 2) percentage of first-year students scoring 500 and above and 600 and above on the SAT, and percentage of first-year students scoring 21 and above and 27 and above on the ACT; 3) percentage of first-year students who ranked in the upper fifth and upper two-fifths of their secondary school graduating class; 4) minimum class rank and grade point average required for admission; and 5) percentage of applicants who were admitted.
b. Calculated by random effects ANOVA using student-level data, unweighted N = 38,456 first-year students and unweighted N = 37,665 seniors.
c. Calculated by regression analysis using aggregate-level data, N = 271.
d. Statistical adjustments made for the following institution-level aggregates: age, race, sex, transfer status, and first-generation student status.
e. Calculated by regression analysis using student-level data, unweighted N = 38,456 first-year students and N = 37,665 seniors.
f. Statistical adjustments made for the following student characteristics: age, race, sex, transfer status, and first-generation student status.
(-) Indicates a negative relationship between the dependent variable and selectivity. *p < .05. **p < .01. Significant at the respective level after a Bonferroni correction.

Conclusions and Implications
College academic selectivity plays a dominant role in the public's understanding of what constitutes "institutional excellence" in undergraduate education. In this study, we analyzed two independent data sets to estimate the net effect of three measures of college selectivity on dimensions of documented good practices in undergraduate education. Consistent with the methodological arguments and evidence of Astin (2003), Astin and Lee (2003), and Pascarella (2001b), we estimated the effects of institutional selectivity while statistically controlling for the characteristics of the students who reported on good practices. The results were consistent across both samples, suggesting two conclusions.

Conclusions
First, with statistical controls in place for important confounding influences, three measures of institutional selectivity accounted for statistically significant percentages of the variance in a number of established good practices in undergraduate education. Some of these significant net relationships between selectivity and good practices were negative (e.g., selectivity and number of essay exams in courses; selectivity and instructor feedback to students; selectivity and supportive campus environment). However, the clear majority of them were positive. Thus, from the standpoint of statistically reliable associations, one could conclude from our findings that institutional selectivity does indeed count in terms of fostering good practices in undergraduate education.
A second conclusion, however, is that while institutional selectivity may count in terms of fostering good practices, the magnitude of the net relationships we uncovered suggests that it may not count very much. Net of student precollege characteristics, the between-institution variance in good practices linked to an institution's median student SAT/ACT score, a nearly identical proxy for that score, and the Barron's Selectivity Score ranged from less than 0.1% to approximately 20%.

Similarly, the net total variance in good practices explained by college selectivity ranged from less than 0.1% to 2.7%. Not surprisingly, perhaps, college selectivity had its most consistent and strongest net impact on high academic expectations. Yet, even here the major proportion of differences in measures of high academic expectations reported by students (about 80-100% of the between-institution variance, and about 97-99% of the total variance) was unexplained by the selectivity of the college one attended. On average, across all good practice variables we considered, more than 95% of the between-institution differences and almost 99% of the total differences were unexplained by the academic selectivity of a college or university. Put another way, attending a selective institution in no way guarantees that one will encounter educationally purposeful academic and out-of-class experiences that are linked to a developmentally influential undergraduate experience. If a selective institution makes some of those experiences more likely, it does so in rather minimal ways. The student body selectivity of an institution may bear too great a burden as a signal for the impact or quality of the undergraduate education actually received.

Implications
The absence of a substantial relationship between selectivity and good practices in undergraduate education may explain in large part why the weight of evidence accumulated over time indicates a similar inconsistent and trivial relationship between institutional selectivity and measures of learning and cognitive development during college (e.g., Anaya, 1996, 1999; Astin, 1968; Flowers, Osterlind, Pascarella, & Pierson, 2001; Hagedorn, Pascarella, Edison, Braxton, Nora, & Terenzini, 1999; Knox, Lindsay, & Kolb, 1992; Opp, 1991; Toutkoushian & Smart, 2001). If more selective colleges and universities are not particularly effective in fostering good practices in undergraduate education, it should not be particularly surprising that institutional selectivity makes only an inconsistent and typically small or trivial value-added contribution to student learning and cognitive growth during college.
Because our findings are robust across two distinct samples of students and come from two different decades, they raise questions about the validity of national rankings of college "quality" that are essentially de facto rankings of institutional selectivity. Indeed, two analyses of cross-sectional NSSE data (National Survey of Student Engagement, 2001; Pike, 2003) found little relationship between USNWR rankings of colleges and good practices in undergraduate education. If these rankings are poor indicators of documented good practices in undergraduate education, do they merely reflect rigorous entrance requirements and other so-called "quality" dimensions that tend to be highly collinear with institutional selectivity (e.g., reputation, wealth, high graduation rates, and the like)? Such "quality" dimensions are certainly of some consequence in terms of an institution's capacity to allocate status or advantage in one's postcollege career. For some, this in itself may be sufficient justification to take the annual college rankings tournament seriously, even if institutional "quality" serves only as a signal for graduates' intelligence and ambition and not the actual impact or quality of the education one receives. If, however, one is concerned about exposure to undergraduate academic and nonacademic experiences that tend to foster personal and intellectual growth, national magazine rankings based essentially on selectivity may offer little guidance for selecting a college.
The academic selectivity of an undergraduate student body is an institutional characteristic that is quite difficult, if not impossible, to change in any meaningful way. For some public institutions, admissions requirements are mandated by state legislatures, while many private colleges and universities may be largely limited to a specific demographic segment of prospective students. Thus, if institutional selectivity were in fact a major determinant of influential good practices, purposeful attempts to enhance or reshape the impact of the undergraduate experience might prove largely futile. The evidence from this study, however, suggests that good practices in undergraduate education are essentially independent of the academic preparation of the students enrolled. At the same time, such practices are amenable to the influence of institutional policies and practices (Kuh, 2001, 2003). To illustrate, consider two measures of effective teaching used in this study: instructional clarity and instructional organization/preparation. Not only have they been validated by experimental research (Hines, Cruickshank, & Kennedy, 1985; Wood & Murray, 1999), but their constituent skills (e.g., use of examples, identifying key points, providing class outlines, and using course objectives) may themselves be learnable by faculty (Weimer & Lenze, 1997). Similarly, recent evidence has suggested the potential benefits of learning communities and living-learning centers that attempt to create more effective subenvironments within large universities (Inkelas & Wiseman, 2003; Zhao & Kuh, 2004).
Finally, it is important to be clear about what our findings do and do not indicate. Essentially, we found replicated evidence to strongly suggest that institutional selectivity (as estimated by average test scores of incoming or enrolled students) has little, if any, net impact on established good practices in undergraduate education. What our findings do not indicate is the absence of between-college effects on good practices.

Some institutions may be particularly effective in fostering those academic and non-academic experiences that lead to an influential undergraduate education. For example, recent analyses of a subsample of the NSSL data suggest that some small liberal arts colleges may maximize good practices in undergraduate education irrespective of their academic selectivity, residential character, or the full-time nature of their student bodies (Pascarella, Cruce, Wolniak, & Blaich, 2003). Similarly, the NSSE Institute for Effective Educational Practices is studying 20 colleges and universities that have higher-than-predicted graduation rates and higher-than-predicted engagement scores (Kuh, Kinzie, & Umbach, 2003). Within this set of institutions are some that are highly selective. Thus, selectivity and effective educational practice are not mutually exclusive. Rather, the findings presented in this paper suggest that one cannot readily identify those institutions providing a developmentally powerful undergraduate experience simply by considering the academic selectivity of their student bodies. Because national magazine rankings of the nation's "best" colleges essentially reflect institutional selectivity, the information they provide about the quality of undergraduate education is likely limited in the same way.
Endnote

¹Because our interest in this study was primarily the magnitude of effect, and because our intra-class correlations (between-institution variances) for the dependent variables were small, we expected our results based on ordinary least squares to be quite close to those of more complex mixed-level approaches such as hierarchical linear modeling (HLM) (Ethington, 1997). To be safe, we conducted parallel analyses using HLM and found essentially the same results as in our multi-level ordinary least squares estimates.