NCAA Tournament ‘Cinderella’ Model: The Formula for an Upset and 2019’s Matches
- Looking for NCAA Tournament Cinderellas for your bracket or betting card?
- Ryan Collinsworth has built a statistical model based on almost 20 years of data to determine which teams have the best chance to pull first-round upsets.
- There are seven key metrics that all past Cinderellas have in common, and others that mean nothing but are often talked up.
In the past month, I have been meticulously poring over historical data to build a mid-major Cinderella statistical model. I analyzed every NCAA Tournament team since the 2001-02 season using every single KenPom metric available.
Through statistical testing, I determined which metrics matter and which ones don’t. And you might be surprised to learn that one supposedly important factor (cough team experience cough) doesn’t matter at all.
I then built a model that predicts the types of mid-majors that win in the first round — and which ones tend to lose. Finally, I used that model to rank this season’s mid-major squads based on each team’s probability of scoring a first-round upset.
For you TL;DR folks out there who want to get to my Cinderella Rankings as fast as possible, feel free to skip to the bottom of this article. Or use the highlighted text throughout as a summary.
Defining a Cinderella Team
Is being a Cinderella about the colossal upset, or the deep tournament run?
Maybe it’s both. But those deep Final Four runs aren’t exactly predictable — and I want to provide you with something that has meaningful predictive value as you fill out your brackets this season. So, when I say “Cinderella Teams,” I’m focusing on squads that can pull a first-round upset this year.
By focusing on obscure mid- and low-major schools with a chance to pull a big upset, I’m also thereby highlighting high-profile, higher seeds with a real chance of losing on Day 1. These are the kinds of teams you want to avoid taking deep into the tournament, lest your bracket be busted in the first weekend of play.
I am not trying to find every single possible upset in the first round. I am not trying to identify every team that could make a Sweet 16 run. Instead, I’m trying to identify the teams that no one is thinking about that have a strong chance of pulling an upset in the first round, thereby busting everyone else’s brackets.
Rules & Requirements for Cinderella Status
Let’s define what constitutes a Cinderella team as specifically and operationally as possible:
- 10-seed or higher (that is, seed numbers 10 and up).
- 16-seeds are excluded (sorry UMBC, but that’s not happening again). Since 2001-02, 16-seeds are 1-68 in the NCAA tournament. If I included them in my statistical analysis, their poor metrics would throw off our sample.
- Cannot come from a Power-6 conference (ACC, Big 12, Big East, Big Ten, Pac-12 or SEC).
- The team cannot be ranked entering the NCAA Tournament. This stipulation gets rid of past Gonzaga and Wichita State teams that were criminally under-seeded despite their season-long excellence.
- The team cannot be ranked in the AP top-15 in January, February or March of the given season. This stipulation ensures that the team is largely unknown to the public.
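The rules above can be written down as a simple filter. What follows is a minimal sketch, not the actual spreadsheet workflow behind this article: the `TournamentTeam` class, its field names and the placeholder teams are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

POWER_SIX = {"ACC", "Big 12", "Big East", "Big Ten", "Pac-12", "SEC"}


@dataclass
class TournamentTeam:
    # All field names are hypothetical, chosen only for this sketch.
    name: str
    seed: int
    conference: str
    ranked_entering_tournament: bool
    best_ap_rank_jan_mar: Optional[int]  # None if never ranked in Jan-Mar


def is_cinderella_candidate(team: TournamentTeam) -> bool:
    """Apply the five rules listed above to one tournament team."""
    if not 10 <= team.seed <= 15:  # 10-seed or higher, 16-seeds excluded
        return False
    if team.conference in POWER_SIX:  # no Power-6 teams
        return False
    if team.ranked_entering_tournament:  # unranked entering the tournament
        return False
    best = team.best_ap_rank_jan_mar
    if best is not None and best <= 15:  # never AP top-15 in Jan, Feb or March
        return False
    return True


# Placeholder teams, not real data.
field = [
    TournamentTeam("Team A", 12, "Mid-Major Conf", False, None),
    TournamentTeam("Team B", 11, "ACC", False, None),
    TournamentTeam("Team C", 16, "Low-Major Conf", False, None),
    TournamentTeam("Team D", 10, "Mid-Major Conf", False, 9),
]
candidates = [t.name for t in field if is_cinderella_candidate(t)]
print(candidates)
```

Only "Team A" survives here: Team B plays in a Power-6 conference, Team C is a 16-seed, and Team D cracked the AP top 15 at some point between January and March.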
After filtering all tournament teams since 2001-02, there are 314 teams that fit the parameters listed above. Of those 314 teams, 67 won their first-round game. That equates to a win percentage of 21.3%. Let’s break that down by seed:
A Short Lesson: Don’t Be Like Me
After filtering all these teams until I was satisfied that I was capturing the right kind of team, I then recorded every team’s pre-tournament KenPom metrics and ranks. I manually recorded all 43 of KenPom’s metrics — and team rankings for each of those metrics — for all 314 teams in our sample. That’s 27,004 data points. By hand. And yes, Excel crashed multiple times on me.
But I did it. I pressed on for you folks, because I care. Maybe I’ve watched too many Jon Bois videos on YouTube. Maybe I really need to get a dog. Either way, you’re welcome.
Why Did I Do This to Myself?
So, why did I individually log 27,004 data points? What was the purpose of that suffering?
By way of answering, let’s return to base here for a second and remember our goal. We’re trying to identify the statistical profile of low- and mid-major teams that win their first-round games. So, we need to weed out all the noise that doesn’t differentiate these schools and instead focus on only the core metrics that matter the most in discriminating winners from losers.
I’ll spare you the details on how I did that — it involves something called an ANOVA test and other complicated methods that I won’t bore you with. Let’s speed along to the results.
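For readers who do want a peek under the hood, a one-way ANOVA screen might look roughly like the sketch below. It is an illustration only and not the exact analysis behind this model: the function name `anova_f`, the metric labels and every number in it are made-up placeholders. The idea is to compute an F statistic per metric, comparing first-round winners against losers. A large F means the two groups differ far more between each other than they vary internally.

```python
from statistics import mean


def anova_f(*groups):
    """One-way ANOVA F statistic for two or more groups of numbers."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = mean([x for g in groups for x in g])
    # Variation explained by group membership (winner vs. loser).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Placeholder values for two hypothetical metrics, not KenPom data.
metrics = {
    "metric_that_separates": ([10.0, 12.0, 14.0], [2.0, 4.0, 6.0]),
    "metric_that_is_noise": ([5.0, 7.0, 9.0], [6.0, 7.0, 8.0]),
}
f_scores = {
    name: anova_f(winners, losers)
    for name, (winners, losers) in metrics.items()
}
for name, f in sorted(f_scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: F = {f:.1f}")
```

In a real screen, each F statistic would be compared against the F distribution to get a p-value, which this sketch does not compute. With only two groups, the test is equivalent to a two-sample t-test.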
Metrics That Matter
After analyzing each and every KenPom metric, my tests revealed just seven that meaningfully discriminate winners from losers. Just seven … out of 86 (the 43 metrics plus a team ranking for each). They are as follows (definitions taken from KenPom.com):