# NCAA Tournament Cinderella Model: The Formula for a First-Round Upset & This Year’s Matches

Credit: Photo by John Byrum/Icon Sportswire via Getty Images. Pictured: Malachi Smith (Chattanooga)

During the 2018-19 NCAA basketball season, I spent the month of February developing a mid-major Cinderella Model from scratch.

Using an exploratory factor analysis (EFA) approach, I analyzed every NCAA Tournament team since the 2001-02 season across every KenPom metric available. Statistical testing then showed which metrics matter and which ones don’t.

Then, I built a model that predicts the types of mid-majors that win in the first round — and which ones tend to lose. Finally, I used that model to rank this season’s mid-major squads based on each team’s probability of scoring a first-round upset.

## Metrics That Matter

After analyzing each and every KenPom metric, my tests revealed just seven that meaningfully discriminate winners from losers. Just seven … out of 86. They are as follows (definitions taken from KenPom.com):

• AdjO: Adjusted offensive efficiency — an estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average D-I defense.
• AdjD: Adjusted defensive efficiency — an estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average D-I offense.
• AdjEM: Adjusted efficiency margin, the difference between a team’s adjusted offensive and defensive efficiency (AdjO minus AdjD).
• Defensive eFG%: Effective Field Goal Percentage (eFG%) allowed to the opposing offense.
• Offensive Turnover %: Offensive turnovers per possession.
• Defensive Turnover %: Opponent turnovers forced per possession.
• 3P% Defense: Three-point percentage allowed to opposing teams.
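
For readers who want to see the arithmetic behind these definitions, here is a minimal Python sketch of the raw, unadjusted versions of the stats. The function names and sample numbers are my own illustrations. KenPom's opponent-strength adjustment is not reproduced here, so these are not the adjusted AdjO, AdjD or AdjEM figures. The eFG% formula is the standard one, in which a made three counts 1.5 times a made two.

```python
def efficiency(points, possessions):
    """Raw efficiency: points per 100 possessions (no opponent adjustment)."""
    return 100.0 * points / possessions


def efficiency_margin(off_eff, def_eff):
    """Offensive efficiency minus defensive efficiency."""
    return off_eff - def_eff


def efg_pct(fgm, fg3m, fga):
    """Effective field goal %: made threes get 1.5x the weight of made twos."""
    return 100.0 * (fgm + 0.5 * fg3m) / fga


def turnover_pct(turnovers, possessions):
    """Turnovers per possession, expressed as a percentage."""
    return 100.0 * turnovers / possessions


# Hypothetical single-game box score, for illustration only.
off_eff = efficiency(points=75, possessions=68)
def_eff = efficiency(points=64, possessions=68)
margin = efficiency_margin(off_eff, def_eff)
```

The same `efg_pct` and `turnover_pct` helpers cover both sides of the ball: feed them the opponent's box score to get the defensive versions.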

These seven metrics combine to paint a logical and intuitive portrait of a potential Cinderella team.

Generally, teams that upset top seeds in the first round boast well-rounded offensive and defensive efficiency, do not turn the ball over often on offense, force turnovers on defense and defend well on the perimeter.

## This Year’s Potential Cinderella Teams

To find 2022’s Cinderellas, I used my EFA research to build a multivariate model that specializes in forecasting binary (win-loss) outcomes: teams that tend to win their first-round games versus those that tend to lose.
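
To make the idea of a binary-outcome model concrete, here is a rough sketch using logistic regression, one common way to turn several metrics into a win probability. This is an assumption made for illustration; the article does not specify the exact model type. The two standardized features and the toy training rows are made up, and a real version would use all seven metrics and the full tournament history.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Estimated win probability for one team's feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit weights by plain batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical z-scored features: [efficiency margin, offensive turnover %].
rows = [[1.2, -0.8], [0.9, -0.5], [0.4, 0.1], [-0.3, 0.6],
        [-0.9, 1.0], [-1.1, 0.7], [0.6, 0.9], [-0.2, -1.0]]
labels = [1, 1, 1, 0, 0, 0, 0, 1]  # 1 = won first-round game, 0 = lost

weights, bias = fit_logistic(rows, labels)
p_strong = predict_proba([1.0, -1.0], weights, bias)
p_weak = predict_proba([-1.0, 1.0], weights, bias)
```

In this toy setup, a team with a strong efficiency margin and a low turnover rate (`p_strong`) comes out with a higher estimated win probability than its opposite (`p_weak`), which is the kind of ranking signal a binary-outcome model provides.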

Here’s what you need to know about the results in the table below:

• The higher the probability coefficient (labeled “p-Coefficient” in the table below), the better chance the model gives a team of pulling an upset.
• That data point informs the historical W-L column, which shows the tourney results for past teams with equal or better probability coefficients.
• The win % column is simply the percentage attached to the historical W-L.
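
The historical W-L and win % columns can be sketched in a few lines of Python. The function name, data layout and sample numbers are hypothetical, chosen only to show the "equal or better probability coefficient" filter described above.

```python
def historical_record(past_teams, p_coefficient):
    """Return (wins, losses, win_pct) for past teams whose probability
    coefficient was equal to or better than the given one.

    past_teams is a list of (coefficient, won_first_round) pairs.
    win_pct is None when no past team meets the threshold.
    """
    results = [won for coef, won in past_teams if coef >= p_coefficient]
    wins = sum(1 for won in results if won)
    losses = len(results) - wins
    win_pct = 100.0 * wins / len(results) if results else None
    return wins, losses, win_pct


# Made-up history, for illustration only.
past = [(0.62, True), (0.55, True), (0.51, False),
        (0.48, True), (0.40, False), (0.33, False)]

wins, losses, win_pct = historical_record(past, 0.50)
```

With this sample, three past teams meet the 0.50 threshold and two of them won, so the row would read 2-1 with a win % of roughly 66.7.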

Below is a ranking of each Cinderella team’s chances to win from best to worst — and the higher seeds most at risk of getting upset.