NCAA Tournament ‘Cinderella’ Model: The Formula for an Upset and 2019’s Matches
- Looking for NCAA Tournament Cinderellas for your bracket or betting card?
- Ryan Collinsworth has built a statistical model based on almost 20 years of data to determine which teams have the best chance to pull first-round upsets.
- There are seven key metrics that all past Cinderellas have in common, and others that mean nothing but are often talked up.
Over the past month, I have been meticulously poring over historical data to try to build a mid-major Cinderella statistical model. I analyzed every NCAA Tournament team since the 2001-02 season based on every single KenPom metric available.
Through statistical treatment, I determined which metrics matter and which ones don’t. And, you might be surprised to learn that one important factor (cough team experience cough) doesn’t matter at all.
I then built a model that predicts the types of mid-majors that win in the first round — and which ones tend to lose. Finally, I used that model to rank this season’s mid-major squads based on each team’s probability of scoring a first-round upset.
For you TL;DR folks out there who want to get to my Cinderella Rankings as fast as possible, feel free to skip to the bottom of this article. Or use the highlighted text throughout as a summary.
Defining a Cinderella Team
Is being a Cinderella about the colossal upset, or the deep tournament run?
Maybe it’s both. But those deep Final Four runs aren’t exactly predictable — and I want to provide you with something that has meaningful predictive value as you fill out your brackets this season. So, when I say “Cinderella Teams,” I’m focusing on squads that can pull a first-round upset this year.
By focusing on obscure mid- and low-major schools with a chance to pull a big upset, I’m also thereby highlighting high-profile, higher seeds with a real chance of losing on Day 1. These are the kinds of teams you want to avoid taking deep into the tournament, lest your bracket be busted in the first weekend of play.
I am not trying to find every single possible upset in the first round. I am not trying to identify every team that could make a Sweet 16 run. Instead, I’m trying to identify the teams that no one is thinking about that have a strong chance of pulling an upset in the first round, thereby busting everyone else’s brackets.
Rules & Requirements for Cinderella Status
Let’s define what constitutes a Cinderella team as specifically and operationally as possible:
- 10-seed or higher (that is, a higher seed number: 10 through 15).
- 16-seeds are excluded (sorry UMBC, but that’s not happening again). Since 2001-02, 16-seeds are 1-68 in the NCAA tournament. If I included them in my statistical analysis, their poor metrics would throw off our sample.
- Cannot come from a Power-6 conference (ACC, Big 12, Big East, Big Ten, Pac-12 or SEC).
- The team cannot be ranked entering the NCAA Tournament. This stipulation gets rid of past Gonzaga and Wichita State teams that were criminally under-seeded despite their season-long excellence.
- The team cannot be ranked in the AP top-15 in January, February or March of the given season. This stipulation ensures that the team is largely unknown to the public.
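The rules above boil down to a simple yes/no filter. Here is a minimal Python sketch of that filter. The `TourneyTeam` record and its field names are hypothetical, made up for illustration; they are not the spreadsheet layout used for this article.

```python
from dataclasses import dataclass

# Power-6 conferences as listed in the rules above.
POWER_SIX = {"ACC", "Big 12", "Big East", "Big Ten", "Pac-12", "SEC"}


@dataclass
class TourneyTeam:
    # Hypothetical record layout; field names are assumptions.
    name: str
    seed: int
    conference: str
    ranked_entering_tourney: bool  # ranked when the tournament starts
    top15_jan_to_mar: bool         # AP top-15 at any point in Jan, Feb or Mar


def is_cinderella_candidate(team: TourneyTeam) -> bool:
    """Return True if the team passes every rule in the list above."""
    return (
        10 <= team.seed <= 15                   # 10-seed or higher, no 16-seeds
        and team.conference not in POWER_SIX    # mid- and low-majors only
        and not team.ranked_entering_tourney    # unranked entering the tourney
        and not team.top15_jan_to_mar           # never a top-15 team late in the year
    )
```

Running every tournament team through a check like this is one way to arrive at a filtered sample of candidates.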
After filtering all tournament teams since 2001-02, there are 314 schools that fit the parameters listed above. Of those 314 teams, 67 of them won their first-round game. That equates to a win percentage of 21.3%. Let’s break that down by seed:
A Short Lesson: Don’t Be Like Me
After filtering all these teams until I was satisfied that I was capturing the right kind of team, I then recorded every team’s pre-tournament KenPom metrics and ranks. I manually recorded all 43 of KenPom’s metrics — and team rankings for each of those metrics — for all 314 teams in our sample. That’s 27,004 data points. By hand. And yes, Excel crashed multiple times on me.
But I did it. I pressed on for you folks, because I care. Maybe I’ve watched too many Jon Bois videos on YouTube. Maybe I really need to get a dog. Either way, you’re welcome.
Why Did I Do This to Myself?
So, why did I individually log 27,004 data points? What was the purpose of that suffering?
By way of answering, let’s return to base here for a second and remember our goal. We’re trying to identify the statistical profile of low- and mid-major teams that win their first-round games. So, we need to weed out all the noise that doesn’t differentiate these schools and instead focus on only the core metrics that matter the most in discriminating winners from losers.
I’ll spare you the details on how I did that — it involves something called an ANOVA test and other complicated methods that I won’t bore you with. Let’s speed along to the results.
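For the curious, here is a rough Python sketch of what a one-way ANOVA does with a single metric: it compares how far apart the group averages are (winners vs. losers) with how spread out the teams are inside each group. This is a generic textbook version written for illustration, not the exact analysis behind this article, and the function name `anova_f` is my own.

```python
from statistics import mean


def anova_f(*groups):
    """One-way ANOVA F-statistic for two or more groups of numbers.

    A large F means the group means differ a lot relative to the
    variation inside each group.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups (here: winners and losers)
    n = len(all_values)   # total number of teams

    # Variation explained by group membership.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy numbers only, NOT real KenPom values.
winners_metric = [3.0, 4.0, 5.0]
losers_metric = [1.0, 2.0, 3.0]
f_stat = anova_f(winners_metric, losers_metric)
```

Turning the F-statistic into a p-value requires the F distribution; a library routine such as SciPy's `scipy.stats.f_oneway` returns both numbers in one call.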
Metrics That Matter
After analyzing each and every KenPom metric, my tests revealed just seven that meaningfully discriminate winners from losers. Just seven … out of 86 (the 43 metrics plus the team ranking for each). They are as follows (definitions taken from KenPom.com):
AdjO: Adjusted offensive efficiency — an estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average D-1 defense.
AdjD: Adjusted defensive efficiency — an estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average D-1 offense.
AdjEM: Adjusted efficiency margin, the difference between a team’s adjusted offensive and defensive efficiency (AdjO minus AdjD).
Defensive eFG%: Effective Field Goal Percentage (eFG%) allowed to the opposing offense.
Offensive Turnover %: Offensive turnovers per possession.
Defensive Turnover %: Opponent turnovers forced per possession.
3P% Defense: Three-point percentage allowed to opposing teams.
These seven metrics combine to paint a logical and intuitive portrait of a potential Cinderella team. Generally, teams that upset higher seeds in the first round boast well-rounded offensive and defensive efficiency, do not turn the ball over often on offense, force turnovers on defense, and defend well on the perimeter.
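If the definitions above feel abstract, the raw box-score versions are simple arithmetic. The Python sketch below uses the standard unadjusted formulas; KenPom's "adjusted" numbers also account for opponent strength, and that adjustment is not reproduced here. The function names are my own.

```python
def efficiency(points: float, possessions: float) -> float:
    """Raw points per 100 possessions (scored or allowed)."""
    return 100.0 * points / possessions


def efg_pct(fg_made: float, three_made: float, fg_attempts: float) -> float:
    """Effective FG%: a made three counts as 1.5 made field goals."""
    return 100.0 * (fg_made + 0.5 * three_made) / fg_attempts


def turnover_pct(turnovers: float, possessions: float) -> float:
    """Turnovers per possession, expressed as a percentage."""
    return 100.0 * turnovers / possessions


def efficiency_margin(off_eff: float, def_eff: float) -> float:
    """Offensive efficiency minus defensive efficiency."""
    return off_eff - def_eff
```

For example, a team that scores 70 points on 70 possessions has a raw offensive efficiency of 100, and one that commits 14 turnovers in those 70 possessions has a turnover rate of 20%.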
One Metric That Doesn’t Matter
There are literally 79 statistics that don’t matter (based on our results), but I’m not going to report all 79 of them here.
I will, however, highlight one particular metric in order to dispel a false narrative about Cinderella teams: Team Experience does not matter.
Sports media loves to talk up experienced, senior-laden teams that pull first-round upsets, but the data suggests that team experience has zero statistical effect on a team’s chance to pull off an improbable win. The experience narrative is just that. A narrative. It pulls at the heartstrings, but it isn’t grounded in historical fact. It should be dismissed as a predictive tool.
This Year’s Potential Cinderella Teams
To find 2019’s Cinderellas, I built a model using scary math things like multivariate logistic regressions and coefficients … I’ve whittled that down to 18 teams that fit our requirements for a mid-major Cinderella below.
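For readers who want to see what a logistic regression does under the hood, here is a minimal Python sketch: it takes a weighted sum of a team's metrics and squashes it into a probability between 0 and 1. The coefficients and intercept below are placeholders I made up for illustration, not the fitted values from my model, and the function name is hypothetical.

```python
import math


def upset_probability(features: dict, coefficients: dict, intercept: float) -> float:
    """Logistic model: probability = 1 / (1 + e^-(intercept + sum(coef * metric)))."""
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder weights, NOT the fitted values from the article's model.
example_coefficients = {"AdjEM": 0.1, "DefTOPct": 0.05}
example_intercept = -3.0

# Toy team profile, not a real team's numbers.
example_team = {"AdjEM": 10.0, "DefTOPct": 20.0}
p = upset_probability(example_team, example_coefficients, example_intercept)
```

A positive coefficient means a higher value of that metric pushes the probability up; a negative one pushes it down.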
Here’s what you need to know about the results in the table below:
- The higher the probability coefficient, the better chance the model gives a team of pulling an upset.
- That data point informs the historical W-L column, which shows the tourney results for past teams with equal or better probability coefficients.
- The win % column is simply the percentage attached to the historical W-L.
- The implied odds are a direct reflection of the win %. This hypothetical moneyline is not an attempt to handicap each team. Instead, use it as a tool to identify value in the betting market.
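The chain described above (probability coefficient, then historical W-L, then win %, then implied odds) can be sketched in a few lines of Python. This assumes the implied odds are plain no-vig American moneylines derived from the win %, which is my reading of the description; the function names and the `(score, won)` tuple layout are hypothetical.

```python
def historical_record(past_teams, threshold):
    """Count wins and losses among past teams whose probability
    coefficient is equal to or better than `threshold`.

    past_teams: list of (probability_coefficient, won_first_round) tuples.
    """
    wins = sum(1 for score, won in past_teams if score >= threshold and won)
    losses = sum(1 for score, won in past_teams if score >= threshold and not won)
    return wins, losses


def win_pct(wins: int, losses: int) -> float:
    """Share of games won, as a fraction between 0 and 1."""
    return wins / (wins + losses)


def implied_moneyline(p: float) -> int:
    """No-vig American moneyline for a win probability p (0 < p < 1)."""
    if p >= 0.5:
        return -round(100 * p / (1 - p))   # favorite: negative line
    return round(100 * (1 - p) / p)        # underdog: positive line
```

As a sanity check, the full sample's 67-247 record (21.3%) works out to roughly +369 with this formula. If a sportsbook offers a longer price than a team's implied line, that is where the value may lie.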
Below is a ranking of each Cinderella team’s chances to win from best to worst, their statistical profile, and the higher seeds most at risk of getting upset: