2022 NCAA Tournament Cinderella Model: The Formula for a First-Round Upset & This Year’s Matches
During the 2018-19 NCAA basketball season, I spent the month of February developing a mid-major Cinderella Model from scratch.
Using an exploratory factor analysis (EFA) approach, I analyzed every NCAA Tournament team since the 2001-02 season across every KenPom metric available. Statistical testing then showed which metrics matter and which ones don’t.
Then, I built a model that predicts the types of mid-majors that win in the first round — and which ones tend to lose. Finally, I used that model to rank this season’s mid-major squads based on each team’s probability of scoring a first-round upset.
Defining a Cinderella Team
Is being a Cinderella about the colossal first-round upset … or the improbably deep tournament run to the Final Four?
Maybe it’s both. But those deep Final Four runs aren’t exactly predictable — and I want to provide you with something that has meaningful predictive value as you fill out your brackets this season. So, when I say “Cinderella Teams,” I’m focusing on squads that can pull a first-round upset this year.
By focusing on obscure mid- and low-major schools with a chance to pull a big upset, I’m also implicitly highlighting the high-profile favorites with a real chance of losing on Day 1.
These are the kinds of teams you want to avoid taking deep into the tournament, lest your bracket be busted in the first weekend of play.
I am not trying to find every single possible upset in the first round. I am not trying to identify every team that could make a Sweet 16 run.
Instead, I’m trying to identify the teams that no one is thinking about but that have a strong chance of pulling a first-round upset, thereby busting everyone else’s brackets … except yours (if you take my advice).
Rules & Requirements for “Cinderella” Status
Let’s define what constitutes a Cinderella team as specifically and operationally as possible:
- A 10-seed or worse (seed number 10 or greater).
- 16-seeds are excluded (sorry UMBC, but that’s not happening again). Since 2001-02, 16-seeds are 1-76 in the NCAA Tournament. If I included them in my statistical analysis, their poor metrics would throw off our sample.
- The team cannot come from a Power-6 conference (ACC, Big 12, Big East, Big Ten, Pac-12 or SEC).
- The team cannot be ranked entering the NCAA Tournament. This stipulation gets rid of past Gonzaga and Wichita State teams that were criminally underseeded despite their season-long excellence.
- The team cannot be ranked in the AP top-15 in January, February or March of the given season. This stipulation ensures that the team is largely unknown to the public.
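The rules above amount to a simple filter. Here is a minimal sketch of that filter, assuming a hypothetical `Team` record; the field names and the sample teams are invented for illustration and are not from my dataset.

```python
from dataclasses import dataclass

# Conference list copied from the rules above.
POWER_SIX = {"ACC", "Big 12", "Big East", "Big Ten", "Pac-12", "SEC"}


@dataclass
class Team:
    # All field names are hypothetical, chosen only for this sketch.
    name: str
    seed: int
    conference: str
    ranked_at_selection: bool  # ranked entering the NCAA Tournament
    ap_top15_jan_to_mar: bool  # in the AP top 15 at any point in Jan-Mar


def is_cinderella_candidate(team: Team) -> bool:
    """Return True only if the team passes every rule in the list above."""
    return (
        10 <= team.seed <= 15                 # 10-seed or worse, 16-seeds excluded
        and team.conference not in POWER_SIX  # no Power-6 programs
        and not team.ranked_at_selection      # unranked entering the tournament
        and not team.ap_top15_jan_to_mar      # never in the AP top 15 in Jan-Mar
    )


# Made-up teams, for illustration only.
teams = [
    Team("Mid-Major A", 12, "Conference USA", False, False),
    Team("Power-6 B", 11, "Big Ten", False, False),
    Team("Low-Major C", 16, "SWAC", False, False),
]
candidates = [t.name for t in teams if is_cinderella_candidate(t)]
print(candidates)  # ['Mid-Major A']
```

Only the first sample team survives: the second plays in a Power-6 conference and the third is a 16-seed.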
Metrics That Matter
After analyzing each and every KenPom metric, my tests revealed just seven that meaningfully discriminate winners from losers. Just seven … out of 86. They are as follows (definitions taken from KenPom.com):
AdjO: Adjusted offensive efficiency — an estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average D-I defense.
AdjD: Adjusted defensive efficiency — an estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average D-I offense.
AdjEM: Adjusted efficiency margin, the difference between a team’s adjusted offensive and defensive efficiency (AdjO minus AdjD).
Defensive eFG%: Effective Field Goal Percentage (eFG%) allowed to the opposing offense.
Offensive Turnover %: Offensive turnovers per possession.
Defensive Turnover %: Opponent turnovers forced per possession.
3P% Defense: Three-point percentage allowed to opposing teams.
These seven metrics combine to paint a logical and intuitive portrait of a potential Cinderella team.
Generally, teams that upset top seeds in the first round boast well-rounded offensive and defensive efficiency, do not turn the ball over often on offense, force turnovers on defense and defend well on the perimeter.
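To make two of the definitions above concrete, here is a small worked example. The input numbers are hypothetical and do not describe any real team; the turnover rate is shown as a percentage of possessions.

```python
def adj_em(adj_o: float, adj_d: float) -> float:
    """Efficiency margin: adjusted offense minus adjusted defense."""
    return adj_o - adj_d


def turnover_pct(turnovers: int, possessions: int) -> float:
    """Turnovers per possession, expressed as a percentage."""
    return 100.0 * turnovers / possessions


# Hypothetical inputs: 110.0 points scored and 98.5 allowed per 100 possessions,
# plus 12 turnovers committed over 70 possessions.
margin = adj_em(110.0, 98.5)
off_to = turnover_pct(12, 70)

print(f"AdjEM: {margin:+.1f}")         # AdjEM: +11.5
print(f"Offensive TO%: {off_to:.1f}")  # Offensive TO%: 17.1
```

A positive margin means the team outscores an average opponent on a per-possession basis. For offensive turnover percentage, lower is better.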
This Year’s Potential Cinderella Teams
To find 2022’s Cinderellas, I used my EFA research to build a multivariate model that specializes in forecasting binary (win-loss) outcomes, separating teams that tend to win their first-round games from those that tend to lose.
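For readers who want a feel for how a binary-outcome model turns several metrics into one number, here is a rough sketch using a logistic function. This is an assumption for illustration only: the article does not specify the model family or publish any coefficients, so every weight, the intercept and both sample teams are invented, and the output should not be read as an actual p-Coefficient.

```python
import math

# Hypothetical weights on standardized (z-scored) metrics. All values are
# invented for this sketch and cover only a subset of the seven metrics.
WEIGHTS = {
    "adj_em": 0.8,
    "def_efg_pct": -0.4,  # lower eFG% allowed is better
    "off_to_pct": -0.3,   # fewer turnovers committed is better
    "def_to_pct": 0.3,    # more turnovers forced is better
    "def_3p_pct": -0.2,   # lower 3P% allowed is better
}
INTERCEPT = -1.0  # hypothetical baseline


def upset_probability(z_scores: dict) -> float:
    """Logistic model: squash a weighted sum of metrics into a 0-1 value."""
    linear = INTERCEPT + sum(w * z_scores.get(k, 0.0) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Two made-up profiles: an average team and one that is strong across the board.
average_team = {k: 0.0 for k in WEIGHTS}
strong_team = {
    "adj_em": 1.5,
    "def_efg_pct": -1.0,
    "off_to_pct": -1.0,
    "def_to_pct": 1.0,
    "def_3p_pct": -0.5,
}

p_average = upset_probability(average_team)
p_strong = upset_probability(strong_team)
print(f"average: {p_average:.3f}, strong: {p_strong:.3f}")
```

The point of the sketch is the shape of the calculation: each metric pushes the score up or down, and the logistic function keeps the result between 0 and 1.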
Here’s what you need to know about the results in the table below:
- The higher the probability coefficient (labeled in the table below as “p-Coefficient”), the better the chance the model gives a team of pulling an upset.
- That data point informs the historical W-L column, which shows the tourney results for past teams with equal or better probability coefficients.
- The win % column is simply the percentage attached to the historical W-L.
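The historical W-L and win % columns described above can be sketched as follows. The past results here are invented placeholders, and `historical_record` is a hypothetical helper name, not part of my actual workflow.

```python
def historical_record(past_teams, p_coefficient):
    """W-L and win % for past teams whose p-Coefficient is >= the given value."""
    peers = [won for p, won in past_teams if p >= p_coefficient]
    wins = sum(peers)
    losses = len(peers) - wins
    win_pct = 100.0 * wins / len(peers) if peers else None
    return wins, losses, win_pct


# Invented (p-Coefficient, won first-round game) pairs, for illustration only.
past_teams = [
    (0.62, True),
    (0.55, False),
    (0.41, True),
    (0.33, True),
    (0.18, False),
    (0.09, False),
    (0.05, True),
]

wins, losses, win_pct = historical_record(past_teams, 0.40)
print(f"{wins}-{losses} ({win_pct:.1f}%)")  # 2-1 (66.7%)
```

A team with a p-Coefficient of 0.40 is compared only against the three sample teams at 0.40 or above, which went 2-1.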
Below is a ranking of each Cinderella team’s chances to win from best to worst — and the higher seeds most at risk of getting upset.
2022 NCAA Tournament Cinderella Rankings
My model includes a p-Coefficient “cutoff value” of 0.3. There have been plenty of teams with lower p-Coefficients that have still won their first-round matchups — Oral Roberts (p-Coefficient = 0.024) is an obvious example from last season.
However, the 0.3 threshold captures the majority of first-round upsets while minimizing the number of false positives. During the last two seasons, teams with p-Coefficients of 0.3 or above have significantly outperformed others in the first round:
| 0.3 or higher | 4-5 | 7-2 |
| Lower than 0.3 | 3-21 | 11-13 |
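Converting the records in the table above to percentages makes the gap easier to see. The records are copied from the table as shown; its two record columns are unlabeled in this version, so the sketch simply calls them the first and second column.

```python
def win_pct(record: str) -> float:
    """Convert a 'W-L' string such as '4-5' into a win percentage."""
    wins, losses = (int(part) for part in record.split("-"))
    return 100.0 * wins / (wins + losses)


# Records copied from the table above: (first column, second column).
table = {
    "0.3 or higher": ("4-5", "7-2"),
    "Lower than 0.3": ("3-21", "11-13"),
}

for label, records in table.items():
    pcts = ", ".join(f"{r} = {win_pct(r):.1f}%" for r in records)
    print(f"{label}: {pcts}")

# 0.3 or higher: 4-5 = 44.4%, 7-2 = 77.8%
# Lower than 0.3: 3-21 = 12.5%, 11-13 = 45.8%
```

In both columns, teams at or above the cutoff win at a clearly higher rate than teams below it.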