the filing drawer
Are Refs Biased? (Or, The Conscious Decision to Not Keep Searching)

A Data-Driven Analysis. Mostly.

Zev Burton
May 22, 2024

Howdy y’all —

Watching the second game of the Pacers-Knicks series felt painful if you were a Pacers fan -- no matter what we did, there was always -- always -- a foul. During the third game, where it seemed like the calls were more equal in their terribleness, someone in my vicinity remarked, "the refs have to be biased for the home team every time -- unconsciously -- right?"

Just a passing comment, one that barely sparked any discussion, but it stuck with me through Game Three (great game)...

Through Game Four (another great game)...

(The author has chosen to forget Game Five existed.)

Through Game Six (another incredible game)...

And through Game Seven (the greatest game of all time).

Alas, the data nerd in me won out, thanks to the wonderful R package nbastatR. In just a few lines of code, I had the box scores of every game from the 2023-2024 NBA season. With a little bit more wrangling, I had the aggregate number of fouls for every game, for every team.
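For anyone who wants to follow along without R, here is a minimal Python sketch of that wrangling step. The original analysis used nbastatR, so everything here is an assumption for illustration: the row layout, the team and player labels, and the toy foul counts are all hypothetical, not data from the season.

```python
from collections import defaultdict

# Hypothetical player-level box score rows: (game_id, team, player, personal_fouls).
# Toy values for illustration only, not real data from the season.
player_rows = [
    ("G1", "TeamA", "Player 1", 3),
    ("G1", "TeamA", "Player 2", 4),
    ("G1", "TeamB", "Player 3", 2),
    ("G1", "TeamB", "Player 4", 5),
    ("G2", "TeamA", "Player 1", 1),
    ("G2", "TeamC", "Player 5", 6),
]


def team_fouls_per_game(rows):
    """Sum personal fouls for every (game, team) pair."""
    totals = defaultdict(int)
    for game_id, team, _player, fouls in rows:
        totals[(game_id, team)] += fouls
    return dict(totals)


totals = team_fouls_per_game(player_rows)
for (game_id, team), fouls in sorted(totals.items()):
    print(game_id, team, fouls)
```

Keying on the (game, team) pair keeps one row per team per game, which is the shape the rest of the analysis needs.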

Immediately, an issue arose: overtime, which gives teams extra minutes to rack up fouls. With a healthy dash of Occam's Razor, I made a quick adjustment and changed the metric from 'Fouls per Game' to 'Fouls per 48 Minutes.' With that in mind, I present my world-changing, earth-shattering, sports-revolutionizing--

Oh.

There isn't that big of a difference. Even accounting for the lack of normality in the data, it's still... nothing.
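A rough Python sketch of the overtime adjustment and the home/away comparison is below. The test is my stand-in, not necessarily what the original R code did: a sign-flip permutation test, chosen because it makes no normality assumption. The function names, the toy games, and the resample count are all hypothetical.

```python
import random
from statistics import mean

REGULATION_MINUTES = 48
OVERTIME_MINUTES = 5  # each NBA overtime period lasts five minutes


def fouls_per_48(fouls, overtime_periods=0):
    """Scale a team's foul count to a 48-minute game."""
    game_minutes = REGULATION_MINUTES + OVERTIME_MINUTES * overtime_periods
    return fouls * REGULATION_MINUTES / game_minutes


def paired_permutation_test(home, away, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired home/away values.

    Under the null hypothesis of no home/away effect, the sign of each
    game's (home - away) difference is equally likely to be + or -.
    """
    rng = random.Random(seed)
    diffs = [h - a for h, a in zip(home, away)]
    observed = abs(mean(diffs))
    extreme = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Toy games: (home fouls, away fouls, overtime periods). Not real data.
games = [(20, 21, 0), (23, 19, 1), (18, 22, 0), (25, 24, 2), (19, 19, 0), (21, 23, 0)]
home = [fouls_per_48(h, ot) for h, _a, ot in games]
away = [fouls_per_48(a, ot) for _h, a, ot in games]

print("mean home fouls per 48:", round(mean(home), 2))
print("mean away fouls per 48:", round(mean(away), 2))
print("permutation p-value:", round(paired_permutation_test(home, away), 3))
```

A large p-value here means the observed home/away gap is the kind of gap you would see by chance if the labels were meaningless.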

But of course, I'm missing some confounding variables, right? After all, the reason for 'home court advantage' is the crowd! The crowd cheering on the home team, yelling at the away team (especially during free throws!), screaming at the refs -- surely there must be something there!

After scraping Basketball Reference, I had the recorded attendance for every game of the season and could begin some analysis. My hypothesis was that as attendance increases, the number of fouls called against the home team decreases and the number called against the away team increases. After all, more people = louder = more unconscious bias?

That's, uh, not much. There's a minimal relationship there, and really nowhere near enough to make a conclusion about the impact (especially given the low R² value).

Maybe I should be focused on the away team. After all, it's usually the crowd yelling at the ref to call a foul, so we would expect more fouls on the away team the larger the audience! This is clearly, truly, absolutely going to--

Oh. Well, no worries at all! There are plenty of other ways to look at this — what if we look at the difference between home and away fouls? Surely this will have to be some sort of—

Okay, there's something! We can confidently say that we have definitive proof of a relationship with attendance! I mean, as long as we avoid those pesky numbers in the top left that tell us there's nothing more than an extremely weak positive correlation (if we can even say that).

At this point, I have at least a dozen more ideas we could explore:

Breaking it down by team
The arena they play in
The individual refs
The whole officiating crew
The game situation (e.g. end-of-game intentional fouls)
Technical Fouls
Rule changes
Pace of play
Individual players
Player reputation
Types of fouls (shooting, offensive, loose-ball, etc.)
Difference between the NBA and international leagues (or the Olympics!)

It was at this point that I needed to take a step back and ask myself a key question: am I searching for a positive answer? At what point am I searching for an answer that simply does not exist, relying upon subtle sways in ρ to make something out of nothing?
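A bit of back-of-the-envelope arithmetic shows why a dozen extra analyses is risky. This sketch assumes a conventional 0.05 significance threshold and fully independent tests; neither is stated in the analysis above, and slices of the same season are certainly not independent, so treat it as a rough upper-level intuition rather than a precise figure.

```python
ALPHA = 0.05  # conventional significance threshold (an assumption here)


def chance_of_false_positive(n_tests, alpha=ALPHA):
    """Probability that at least one of n independent null tests looks significant."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 4, 12):
    print(n_tests, "tests ->", round(chance_of_false_positive(n_tests), 3))
```

Under those assumptions, twelve looks at pure noise give roughly a 46% chance of at least one "significant" result, which is close to a coin flip.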

Turns out there's a term for this: publication bias. In medical literature, statistically significant results are three times more likely to be published than null results. That is shocking -- but keep in mind that that statistic only looks at published results; it doesn't include the results that ended the moment researchers realized they didn't support their initial hypothesis.

Robert Rosenthal dubbed this phenomenon the "file drawer problem," because these results don't make it out of the back of the file drawer they've been stuffed into (or I guess today, just the file hidden away on the computer, waiting to be overwritten by positive results). Rosenthal writes:

For any given research area, one cannot tell how many studies have been conducted but never reported. The extreme view of the "file drawer problem" is that journals are filled with the 5% of the studies that show Type I errors, while the file drawers are filled with the 95% of the studies that show nonsignificant results.

Robert Rosenthal, The "File Drawer Problem" and Tolerance for Null Results, 1979
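Rosenthal's extreme scenario is easy to simulate. The Python sketch below runs a pile of imaginary studies where the true effect is exactly zero, then "publishes" only the ones that clear p < 0.05. The study size, the number of studies, the z-test, and the function names are all my assumptions for illustration.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05


def null_study_p_value(rng, n=30):
    """Two-sided z-test p-value for one study where the true effect is zero."""
    sample = [rng.gauss(0, 1) for _ in range(n)]
    z = mean(sample) * n ** 0.5  # standard deviation is known to be 1
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_file_drawer(n_studies=2000, seed=0):
    """Split simulated null studies into 'published' (significant) and 'filed away'."""
    rng = random.Random(seed)
    p_values = [null_study_p_value(rng) for _ in range(n_studies)]
    published = sum(p < ALPHA for p in p_values)
    return published, n_studies - published


published, filed = simulate_file_drawer()
print("published (all Type I errors):", published)
print("left in the file drawer:", filed)
```

Every "published" study in this toy world is a false positive, and a reader who only sees the journal has no way to know how many quiet nulls sit behind it.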

I'm noticing that trend as well in data science portfolios and publications -- everyone wants to have a positive, earth-shattering result. I sure wanted to -- hey, if this analysis got me a job working for the Pacers, I'd be thrilled! But I'll sadly just have to deal with the fact that my Pacers just foul...

...a lot. Yikes.

You may have noticed that the name of this publication has changed to "The File Drawer." I chose this name because I love the metaphor it represents: exploring the topics and results that often get pushed to the back of a file cabinet. It won't be only null results, but I'll share those too when I find them. Of course, I hope to uncover some significant findings along the way.

This may not become a full-fledged data science portfolio, but I have many questions about the world, particularly in the realm of technology. I plan to use data to explore these questions. I hope you enjoy the journey.

Cheers,

Zev

Note: Here is the code for this article.