NCAA Tournament Cinderella Model: The Formula for a First-Round Upset & This Year’s Matches
Photo by John Byrum/Icon Sportswire via Getty Images. Pictured: Malachi Smith (Chattanooga)
During the 2018-19 NCAA basketball season, I spent the month of February developing a mid-major Cinderella Model from scratch.
Using an Exploratory Factor Analysis (EFA) approach, I analyzed every NCAA Tournament team since the 2001-02 season across every KenPom metric available. From that analysis, I determined which metrics matter and which ones don’t.
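The EFA procedure itself isn’t spelled out here, so the sketch below does not reproduce it. As a much simpler stand-in, it shows a univariate screen: correlate each metric with the first-round result (1 = win, 0 = loss) and keep the metrics whose correlation clears a cutoff. The dict-per-team layout, the 0.2 cutoff and the function names are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no signal
    return cov / (sx * sy)


def screen_metrics(teams, outcomes, threshold=0.2):
    """Keep metrics whose |correlation| with the 0/1 outcome meets the threshold.

    teams: list of dicts mapping metric name -> value (hypothetical layout)
    outcomes: list of 1 (won first-round game) or 0 (lost), same order as teams
    """
    kept = {}
    for metric in teams[0]:
        values = [t[metric] for t in teams]
        r = pearson(values, outcomes)
        if abs(r) >= threshold:
            kept[metric] = r
    # Strongest separators first
    return dict(sorted(kept.items(), key=lambda kv: -abs(kv[1])))
```

Unlike EFA, which looks at how metrics group into shared underlying factors, this screen judges each metric on its own, so closely related metrics (AdjO and AdjEM, say) can both survive it.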
Then, I built a model that predicts the types of mid-majors that win in the first round — and which ones tend to lose. Finally, I used that model to rank this season’s mid-major squads based on each team’s probability of scoring a first-round upset.
Metrics That Matter
After analyzing each and every KenPom metric, my tests revealed just seven that meaningfully discriminate winners from losers. Just seven … out of 86. They are as follows (definitions taken from KenPom.com):
- AdjO: Adjusted offensive efficiency — an estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average D-I defense.
- AdjD: Adjusted defensive efficiency — an estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average D-I offense.
- AdjEM: Adjusted efficiency margin, the difference between a team’s offensive and defensive efficiency (AdjO minus AdjD).
- Defensive eFG%: Effective Field Goal Percentage (eFG%) allowed to the opposing offense.
- Offensive Turnover %: Offensive turnovers per possession.
- Defensive Turnover %: Opponent turnovers forced per possession.
- 3P% Defense: Three-point percentage allowed to opposing teams.
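KenPom’s adjusted figures correct for opponent strength, and that adjustment is not reproduced here. As a rough sketch of what the seven metrics measure, the code below computes their raw, unadjusted counterparts from season box-score totals. The possession estimate (FGA − ORB + TO + 0.475 × FTA) is an assumption; other sources weight free-throw attempts at 0.44. The `BoxTotals` layout and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BoxTotals:
    """Season box-score totals for one side of the ball (hypothetical layout)."""
    points: int
    fgm: int   # field goals made
    fga: int   # field goals attempted
    fg3m: int  # three-pointers made
    fg3a: int  # three-pointers attempted
    fta: int   # free throws attempted
    orb: int   # offensive rebounds
    tov: int   # turnovers


def possessions(b: BoxTotals, ft_weight: float = 0.475) -> float:
    # Common possession estimate; the free-throw weight is an assumption
    return b.fga - b.orb + b.tov + ft_weight * b.fta


def efficiency(b: BoxTotals) -> float:
    # Points per 100 possessions
    return 100 * b.points / possessions(b)


def efg_pct(b: BoxTotals) -> float:
    # A made three counts as 1.5 field goals
    return (b.fgm + 0.5 * b.fg3m) / b.fga


def tov_pct(b: BoxTotals) -> float:
    # Turnovers per possession
    return b.tov / possessions(b)


def fg3_pct(b: BoxTotals) -> float:
    return b.fg3m / b.fg3a


def raw_profile(team: BoxTotals, opp: BoxTotals) -> dict:
    """Unadjusted versions of the seven metrics (no opponent-strength correction)."""
    off_eff = efficiency(team)
    def_eff = efficiency(opp)  # points the opponents scored per 100 possessions
    return {
        "O": off_eff,
        "D": def_eff,
        "EM": off_eff - def_eff,
        "Def eFG%": efg_pct(opp),
        "Off TO%": tov_pct(team),
        "Def TO%": tov_pct(opp),
        "3P% Def": fg3_pct(opp),
    }
```

The margin is just offensive minus defensive efficiency, which is why AdjEM moves in lockstep with AdjO and AdjD.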
These seven metrics combine to paint a logical and intuitive portrait of a potential Cinderella team.
Generally, teams that upset top seeds in the first round boast well-rounded offensive and defensive efficiency, do not turn the ball over often on offense, force turnovers on defense and defend well on the perimeter.
This Year’s Potential Cinderella Teams
To find 2022’s Cinderellas, I used my EFA research to build a multivariate model that specializes in forecasting binary (win-loss) outcomes, separating teams that tend to win their first-round games from those that tend to lose.
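The model type isn’t named here. Logistic regression is one common multivariate model for binary win-loss outcomes, so the sketch below assumes it: standardize each metric, fit weights by batch gradient descent on log-loss, and read the fitted output as a win probability. The learning rate, epoch count, function names and data layout are all assumptions; this is not the author’s actual model, nor necessarily how the p-Coefficient is computed.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(rows):
    """Scale each column to mean 0, sd 1; return scaled rows, means and sds."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    scaled = [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]
    return scaled, means, sds


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss; returns (weights, bias)."""
    n, k = len(X), len(X[0])
    w = [0.0] * k
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * k
        gb = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this team (predicted prob minus actual 0/1)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x, means, sds):
    """Win probability for a raw (unscaled) metric vector x."""
    z = sum(wj * (xj - m) / s for wj, xj, m, s in zip(w, x, means, sds)) + b
    return sigmoid(z)
```

Each row would hold one past team’s seven metrics and each label its first-round result; standardizing first keeps metrics on very different scales (efficiency near 100, turnover rates near 0.2) from dominating the gradient.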
Here’s what you need to know about the results in the table below:
- The higher the probability coefficient (labeled in the table below as “p-Coefficient”), the better chance the model gives a team of pulling an upset.
- That coefficient sets the cutoff for the historical W-L column, which shows the tourney results of past teams with equal or better probability coefficients.
- The win % column is simply the percentage attached to the historical W-L.
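The historical W-L and win % columns amount to a cumulative count: for a given coefficient, tally every past team whose coefficient was equal or higher. A minimal sketch, assuming a hypothetical history of (coefficient, won) pairs; the function names and team names are illustrative only.

```python
def historical_record(p_coef, history):
    """W-L and win % for past teams whose coefficient is >= p_coef.

    history: list of (p_coefficient, won) pairs for past teams (hypothetical format)
    """
    wins = sum(1 for p, won in history if p >= p_coef and won)
    losses = sum(1 for p, won in history if p >= p_coef and not won)
    games = wins + losses
    win_pct = wins / games if games else None  # no comparable past teams
    return wins, losses, win_pct


def rank_candidates(candidates, history):
    """Sort this season's teams by coefficient, best first, with their historical record."""
    rows = []
    for team, p in sorted(candidates.items(), key=lambda kv: kv[1], reverse=True):
        w, l, pct = historical_record(p, history)
        rows.append((team, p, f"{w}-{l}", pct))
    return rows


history = [(0.62, True), (0.55, False), (0.48, True), (0.41, False), (0.35, False)]
print(rank_candidates({"Team A": 0.45, "Team B": 0.60}, history))
```

Because the comparison pool shrinks as the coefficient rises, the top-ranked teams are judged against fewer past games, so their win % is noisier.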
Below is a ranking of each Cinderella team’s chances to win from best to worst — and the higher seeds most at risk of getting upset.