Would Smaller Classes Increase Academic Achievement?

Maybe it is cooking (see Oliver post below)—but maybe it is smaller classes that make a difference in academic achievement in our schools. Opinions abound, of course, so what can we learn from the research?

I came across a study by Krueger and Whitmore (2001) that looked at the results of Tennessee’s Project STAR.[1] This was, at least at that time, the only large-scale randomized experiment ever to measure the effects of class size. In this experiment, 11,600 elementary students and their teachers were randomly assigned to small classes (13-17 students), regular-size classes (22-25 students), or regular-size classes with a teacher’s aide. The experiment began in 1985 when students entered kindergarten and lasted through the 3rd grade. After 3rd grade, the students returned to regular-size classes. The researchers looked to see whether there were discernible impacts on 8th-grade test results as well as on ACT or SAT scores in high school.

Their bottom-line conclusion was that small classes did increase test scores, that they increased test scores more for black children than for white children, and that those effects were still evident in later grades, although not nearly as great as they were in the earlier grades. Given their results, the authors conclude: “If all students were assigned to a small class, the black-white gap in taking a college entrance exam would fall by an estimated 60 percent.” (see http://www.irs.princeton.edu/pubs/pdfs/451.pdf)

Experimental designs are the preferred research strategy when trying to answer cause-and-effect questions. Random assignment is their hallmark feature. By randomly assigning students to one of three types of classrooms, it is assumed that any particular quality (intelligence, socio-economic status, educational levels of parents, etc.) would be distributed roughly evenly across the groups. Random assignment thereby controls for these other possible explanations for academic performance. Similarly, randomly assigning teachers means that differences in teaching skills and styles would also be controlled for, leaving only class size as the explanatory variable.
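To make the logic of random assignment concrete, here is a minimal simulation sketch. It is not the STAR procedure (the study's actual randomization details are not described here); the background trait, its mean of 100 and spread of 15, and the equal-chance assignment to three groups are all assumptions made up for illustration. Only the 11,600 student count comes from the study description above.

```python
import random
import statistics


def simulate_random_assignment(n_students=11600, seed=0):
    """Randomly assign simulated students to three class types.

    Each student gets a hypothetical background score (a stand-in for
    anything like family income or parents' education). The assignment
    never looks at that score, so the groups should end up similar.
    """
    rng = random.Random(seed)
    groups = {"small": [], "regular": [], "regular_with_aide": []}
    group_names = list(groups)
    for _ in range(n_students):
        background = rng.gauss(100, 15)  # assumed distribution, illustration only
        groups[rng.choice(group_names)].append(background)
    # Average background score in each group
    return {name: statistics.mean(values) for name, values in groups.items()}


group_means = simulate_random_assignment()
for name, mean in group_means.items():
    print(f"{name}: mean background score = {mean:.2f}")
```

With thousands of students per group, the three averages should land very close to one another, which is why a later difference in test scores can be credited to class size rather than to who happened to be in each room.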

However, it is difficult to maintain all the necessary control once the researchers are out of the laboratory. And that was the case here. Small challenges arose. Not all those who attended small classes did so for all four years, although the authors do not explain why. It is also possible that the students and teachers in the small classes knew this was something special, and it may have been this “specialness” that provided increased motivation to perform well. Researchers usually call this the Hawthorne effect.

A closer look at the description of the experiment revealed that there was some amount of shifting between the smaller and regular-sized classes. While it is unlikely to have changed the overall pattern of the findings, it is possible that the exact estimates would be slightly different.

Lastly, the data analysis looked at ACT scores in later years and did not find statistically significant differences based on class size. However, the researchers suggest that there might be some threats to validity with this analysis, which they call selection and treatment effects: “since a larger percentage of students assigned to small classes took the exam, a larger share of weaker students in small classes likely took the test. As a result, it is difficult to interpret the score results because scores are only reported conditional on taking the exam, and the treatment appears to have affected the likelihood of taking the exam” (p. 22).
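The selection problem the researchers describe can be illustrated with a small simulation. This is only a sketch under invented assumptions, not a reconstruction of their analysis: the ability scale, the true benefit of 0.2, and the exam-taking cutoffs (0.5 for regular classes, 0.1 for small classes) are all hypothetical numbers chosen to show the mechanism.

```python
import random
import statistics


def simulate_selection(n_per_group=20000, true_effect=0.2, seed=1):
    """Show how scores reported only for test takers can hide a real benefit.

    Assumptions (all hypothetical): ability is standard normal, small
    classes raise every student's score by `true_effect`, and small
    classes also lower the ability cutoff at which a student decides
    to take the exam.
    """
    rng = random.Random(seed)
    summary = {}
    for group, is_small in (("regular", 0), ("small", 1)):
        cutoff = 0.1 if is_small else 0.5  # assumed exam-taking cutoffs
        taker_scores = []
        for _ in range(n_per_group):
            ability = rng.gauss(0, 1)
            if ability > cutoff:  # only these students have a reported score
                taker_scores.append(ability + true_effect * is_small)
        summary[group] = {
            "share_taking": len(taker_scores) / n_per_group,
            "mean_score_of_takers": statistics.mean(taker_scores),
        }
    return summary


results = simulate_selection()
observed_gap = (results["small"]["mean_score_of_takers"]
                - results["regular"]["mean_score_of_takers"])
print(results)
print(f"observed gap among takers: {observed_gap:.3f} (true effect: 0.2)")
```

In this made-up world every small-class student really does score higher, yet the gap among test takers comes out smaller than the true effect, because the small-class group of takers includes more lower-ability students. That is the reason the authors caution against reading too much into the ACT score comparison.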

This is a high-quality research study, in my opinion. All research is flawed, and measuring complex phenomena is very difficult. Even in the best designs, there are things that the researchers cannot control.

Perfection is not the standard to use in assessing the validity and quality of a research study (although imperfections will become the target for those who disagree with the results). The real challenge is to work in that gray area of imperfection: how much imperfection and what kind of imperfection are needed to toss research results out?

In this case, the researchers used a very strong design, did extensive analysis, reported the results, and were honest about the warts. So yes, this is a high-quality design in my assessment, and I would use these results as part of making decisions about strategies to improve academic achievement. But I still might invite Jamie Oliver to pitch in. :)

[1] Adler, Moshe (2010). Economics for the Rest of Us: Debunking the Science That Makes Life Dismal. The New Press, www.thenewpress.com

Gail Johnson's Research Demystified
