The Most Dangerous Equation, or Why Small is Not Beautiful!


Over one billion dollars has been spent in the US to split up big schools into smaller ones because small schools regularly show up in rankings as top performers.

In this post, I will show you why that money was wasted because of a widespread (but not so well-known) statistical artifact, so read on!

Why do small schools perform better? Many are quick to point out that an intimate, family-like atmosphere helps facilitate learning and that smaller units can cater better to the special needs of their pupils. But is this really the case?

Let us conduct a little experiment where we distribute randomly performing pupils across one thousand schools, half of them big and the other half small. To be clear: we distribute those pupils by chance alone, so every school has the same shot at getting top, average, and low performers!

As a proxy for performance, let us just take the intelligence quotient (IQ), which is defined to be normally distributed with a mean of 100 and a standard deviation of 15.

Let us now have a look at the fifty best performing schools:

n <- 1000  # total number of schools
sm <- 50   # pupils per small school
bg <- 1000 # pupils per big school

set.seed(12345)
# one row per school, one column per pupil
small <- matrix(rnorm(n/2 * sm, mean = 100, sd = 15), nrow = n/2, ncol = sm)
big <- matrix(rnorm(n/2 * bg, mean = 100, sd = 15), nrow = n/2, ncol = bg)

df_small <- data.frame(size = "small", performance = rowMeans(small))
df_big <- data.frame(size = "big", performance = rowMeans(big))

df <- rbind(df_small, df_big)

df_order <- df[order(df$performance, decreasing = TRUE), ]
df_order |> head(50)
##      size performance
## 296 small    106.3838
## 464 small    105.9089
## 128 small    105.5734
## 146 small    105.2287
## 36  small    104.9394
## 479 small    104.7963
## 406 small    104.7407
## 126 small    104.7316
## 386 small    104.6500
## 15  small    104.6092
## 183 small    104.5761
## 492 small    104.5029
## 106 small    104.5003
## 330 small    104.3905
## 456 small    104.2457
## 84  small    104.2373
## 474 small    104.1203
## 89  small    103.9073
## 315 small    103.7264
## 108 small    103.5928
## 19  small    103.5878
## 268 small    103.5433
## 435 small    103.4099
## 481 small    103.2921
## 70  small    103.2447
## 110 small    103.2089
## 96  small    103.1939
## 497 small    103.1700
## 103 small    103.1609
## 262 small    103.1533
## 33  small    103.1376
## 293 small    103.1008
## 252 small    103.0873
## 240 small    103.0657
## 170 small    103.0553
## 220 small    103.0485
## 185 small    103.0196
## 195 small    103.0042
## 98  small    102.9625
## 294 small    102.9349
## 51  small    102.9339
## 317 small    102.9308
## 403 small    102.9258
## 202 small    102.9255
## 463 small    102.9072
## 321 small    102.8631
## 124 small    102.8468
## 380 small    102.8341
## 273 small    102.8147
## 217 small    102.8005

In fact, all of them are small! How is that possible?

You are in for another surprise when you also look at the fifty worst performers:

df_order |> tail(50)
##      size performance
## 221 small    97.43354
## 271 small    97.43122
## 420 small    97.42636
## 77  small    97.40313
## 141 small    97.38554
## 192 small    97.38214
## 45  small    97.36123
## 331 small    97.35636
## 133 small    97.15977
## 400 small    97.12790
## 350 small    97.09748
## 161 small    96.97677
## 395 small    96.97156
## 17  small    96.95732
## 50  small    96.94013
## 353 small    96.77378
## 376 small    96.63066
## 140 small    96.60068
## 80  small    96.59475
## 115 small    96.56996
## 362 small    96.52821
## 16  small    96.35869
## 344 small    96.29919
## 290 small    96.23309
## 41  small    96.21083
## 246 small    96.13112
## 345 small    96.06559
## 104 small    96.03273
## 425 small    96.02648
## 32  small    96.01366
## 7   small    95.99434
## 364 small    95.99399
## 397 small    95.95282
## 374 small    95.92229
## 18  small    95.80126
## 184 small    95.73809
## 127 small    95.65754
## 270 small    95.60595
## 356 small    95.39492
## 433 small    95.33532
## 475 small    95.30683
## 66  small    95.30076
## 445 small    94.94566
## 61  small    94.88395
## 500 small    94.85970
## 442 small    94.85128
## 6   small    94.56465
## 347 small    94.43273
## 261 small    94.23249
## 171 small    93.82268

Again, all of them are small! The following plot reveals the overall pattern:

(df_order$performance - 100) |>
  barplot(main = "Sorted relative performance of schools",
          col = ifelse(df_order$size == "small", "blue", "red"),
          border = NA)
abline(h = 0)
legend("bottomleft", legend = c("Small schools", "Big schools"),
       fill = c("blue", "red"), bty = "n")

The statistical effect at work here is what my colleague Professor Howard Wainer from the University of Pennsylvania dubbed “The Most Dangerous Equation” in an article in the renowned American Scientist.

To understand this let us first look at some statistical measures:

summary(df_small$performance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   93.82   98.59  100.06  100.01  101.42  106.38

summary(df_big$performance)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   98.58   99.71  100.02   99.99  100.32  101.19

We can see that, while both have the same mean performance, the performance varies much more for small schools than for big ones, where the extreme cases are averaged out. De Moivre’s equation gives the exact relation between this variation and the sample size, i.e. the number of pupils per school in our case:

    \[\sigma_{\bar x}=\sigma/\sqrt{n}\]

where \(\sigma_{\bar x}\) is the standard error of the mean, \(\sigma\) is the standard deviation of the sample, and \(n\) is the size of the sample.

Let us calculate those numbers for our little example and compare them to the actual standard deviations of the school performance:

sd(df_small$performance)  # empirical sd of small-school means
## [1] 2.144287

15 / sqrt(sm)             # theoretical standard error (de Moivre)
## [1] 2.12132

sd(df_big$performance)    # empirical sd of big-school means
## [1] 0.4677121

15 / sqrt(bg)             # theoretical standard error (de Moivre)
## [1] 0.4743416

We can see that the simulated and the theoretical numbers match very well.

Basically, de Moivre’s equation tells us: the smaller the sample size (here: the number of pupils per school), the bigger the variation (i.e. the standard error of the mean). As a result, small schools populate both extremes of the performance rankings. If you only look at the top performers, you will falsely conclude that you have to split up big schools!
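We can make this concrete with a quick back-of-the-envelope calculation (a sketch; the cutoff of 103 IQ points is an arbitrary choice for illustration). Using the theoretical standard errors from above, the probability that a school’s average exceeds 103 is small but realistic for a small school and practically zero for a big one:

pnorm(103, mean = 100, sd = 15 / sqrt(sm), lower.tail = FALSE) # small schools: ~ 0.079
pnorm(103, mean = 100, sd = 15 / sqrt(bg), lower.tail = FALSE) # big schools:   ~ 1.3e-10

So out of 500 small schools we expect about 40 to average above 103 points, while among the 500 big schools we expect none, which is exactly what the rankings above show.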

In fact, later studies (which took this effect into account) even found that larger schools perform better after all, because they are able to offer a wider range of classes with teachers who can focus on fewer subjects. The “de Moivre” effect was hiding this and turned it on its head!

Other examples of this widespread effect include:

  • Disease rates in small and big cities, e.g. for cancer or COVID-19 infections: small cities simultaneously lead and lag the statistics compared to large cities. Sometimes even a single case can make all the difference (see the sketch after this list).
  • Traffic safety rates: same here, small cities sit at both ends of the spectrum.
  • Quality of hospitals: small hospitals regularly end up both at the top and at the bottom of the rankings.
  • Mutual fund performance: small funds often show large performance swings, which makes them appear at both extremes of the rankings.
  • …and the list goes on and on.
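
To illustrate the first point, here is a minimal sketch (the 1% base rate and the city sizes of 1,000 and 100,000 inhabitants are assumptions chosen purely for illustration): every inhabitant has the same chance of falling ill, yet the observed rates in small towns swing much more widely:

set.seed(12345)
p <- 0.01 # the same true disease rate everywhere (assumed for illustration)
rate_small <- rbinom(500, size = 1000, prob = p) / 1000     # 500 towns of 1,000 inhabitants
rate_big <- rbinom(500, size = 100000, prob = p) / 100000   # 500 cities of 100,000 inhabitants
range(rate_small) # wide spread: small towns lead and lag the statistics
range(rate_big)   # tightly clustered around 1%

A single additional case shifts a small town’s rate by a full 0.1 percentage points but a big city’s rate by almost nothing, so even one case can make all the difference, just as de Moivre’s equation predicts.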

Of course, commentators are always quick to find reasons for each extreme. In the case of cancer rates, for example, the explanation is either the “clean living of the rural lifestyle — no air pollution, no water pollution, access to fresh food without additives” or the “poverty of the rural lifestyle — no access to good medical care, a high-fat diet, and too much alcohol and tobacco”, when in fact both results are mainly due to the statistical artifact explained by de Moivre’s equation!

This equation is so dangerous precisely because hardly anybody knows it, yet its effects are so far-reaching.

Another example of what can be called the “only looking at the winners” fallacy can be found here: How to be Successful! The Role of Risk-taking: A Simulation Study. For another interesting fallacy due to sampling see: Collider Bias, or Are Hot Babes Dim and Eggheads Ugly?.

Do you have other examples where the “de Moivre” effect crops up? Please share your thoughts in the comments.


UPDATE January 5, 2023
I created a video for this post (in German).

6 thoughts on “The Most Dangerous Equation, or Why Small is Not Beautiful!”

  1. Thanks for illustrating this effect so nicely and effectively!

    Your simulation might be a good exercise for students of statistics. I recently came across a paper by Yanai and Lercher (2020) where they hid a gorilla in a data set that they asked students to analyse. Had they made a simple scatter plot, they would have found it, but many didn’t. The researchers found that students who were not given a specific hypothesis to test would explore the data more, and thus find the gorilla. Similarly, I imagine that people might stop exploring the IQ x school size data when they found that the best performing schools were small schools, and not notice what you show here: that they are also the worst schools.

    Yanai, I., & Lercher, M. (2020). A hypothesis is a liability. Genome Biology, 21(1), Article 1. https://doi.org/10.1186/s13059-020-02133-w

  2. Wait — is the fallacy here the statistics or that the reviewing agencies are just looking at the TOP performing schools and comparing their size (I don’t know the answer). Because if the reviewing agencies simply looked at averages across small/large school groups, they would seem to balance out, because small schools are at both ends of the spectrum, right? If so, this would not really be a statistical issue but one of interpretation.

  3. School classes typically consist of 25 pupils, so in a random class you have an even bigger spread than in the small-school example. But if you compared classes in big and small schools, they should be identical except for comfort factors such as the ones mentioned (“atmosphere helps facilitate learning and smaller units can cater more to the special needs of their pupils”). I guess what I am trying to say is that comparing means over larger and smaller samples is simply comparing apples and oranges.

  4. Once again a very interesting analysis through simulation, thanks for highlighting that. With this in mind, what do you or other readers think one could do to analyze the effect of class size on school performance?
    Ideal would be something like an experiment: randomly assign pupils to classes of different sizes, check (through a suitable test) that school performance is balanced across classes while controlling for “all” other things that should have an effect on it (how and what to teach, teacher, school, IQ endowments of pupils, effort of parents, etc.), and then check after a year how performance evolved in classes of different sizes.
