2023 NCAA Tournament Cinderella Model | The Formula for a First-Round Upset

Credit: Photo by Dylan Buell/Getty Images

During the 2018-19 NCAA basketball season, I spent the month of February developing a mid-major Cinderella Model from scratch.

Utilizing an Exploratory Factor-Analysis (EFA) approach, I analyzed every NCAA tournament team since the 2001-02 season based on every single KenPom metric available. Through statistical treatment, I determined which metrics matter and which ones don't.

Then, I built a model that predicts the types of mid-majors that win in the first round — and which ones tend to lose. Finally, I used that model to rank this season's mid-major squads based on each team's probability of scoring a first-round upset.

Defining a Cinderella Team

Is being a Cinderella about the colossal first-round upset … or the improbably deep tournament run to the Final Four?

Maybe it's both. But those deep Final Four runs aren't exactly predictable — and I want to provide you with something that has meaningful predictive value as you fill out your brackets this season.

So, when I say "Cinderella Teams," I'm focusing on squads that can pull off a first-round upset.

By focusing on obscure mid- and low-major schools with a chance to pull a big upset, I'm also implicitly highlighting the high-profile favorites (teams with low seed numbers) that have a real chance of losing on Day 1.

I am not trying to find every single possible upset in the first round. I am not trying to identify every team that could make a Sweet 16 run.

Instead, I'm trying to identify the teams no one is thinking about that have a strong chance of pulling an upset in the first round, thereby busting everyone else's brackets, but not yours (if you take my advice).

Rules & Requirements for "Cinderella" Status

Let's define what constitutes a Cinderella team as specifically and operationally as possible:

1. 10-seed or higher (that is, a seed number of 10 or greater).
2. 16-seeds are excluded (sorry UMBC, but that's not happening again). Since 2001-02, 16-seeds are 1-80 in the NCAA tournament. If I included them in my statistical analysis, their poor metrics would severely skew our statistical sample.
3. The team cannot come from a Power-6 conference (ACC, Big 12, Big East, Big Ten, Pac-12 or SEC).
4. The team cannot be ranked entering the NCAA tournament. This stipulation gets rid of past Gonzaga and Wichita State teams that were criminally underseeded despite their season-long excellence.
5. The team cannot be ranked in the AP top-15 in January, February or March of the given season. This stipulation ensures that the team is largely unknown to the public.
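
The five rules above can be sketched as a simple eligibility filter. This is only an illustration under stated assumptions: the `Team` record, its field names and the example teams are hypothetical, and the two ranking flags are assumed to have been looked up elsewhere.

```python
from dataclasses import dataclass

# Rule 3: conferences excluded from Cinderella status.
POWER_SIX = {"ACC", "Big 12", "Big East", "Big Ten", "Pac-12", "SEC"}


@dataclass
class Team:
    # Hypothetical record layout, used only for this sketch.
    name: str
    seed: int
    conference: str
    ranked_entering_tournament: bool  # rule 4
    ap_top15_jan_to_mar: bool         # rule 5


def is_cinderella_candidate(team: Team) -> bool:
    """Return True if the team passes all five rules."""
    return (
        10 <= team.seed <= 15                        # rules 1 and 2
        and team.conference not in POWER_SIX         # rule 3
        and not team.ranked_entering_tournament      # rule 4
        and not team.ap_top15_jan_to_mar             # rule 5
    )


# Made-up teams for illustration only.
mid_major_13 = Team("Example State", 13, "Big South", False, False)
power_six_11 = Team("Example Tech", 11, "Pac-12", False, False)
sixteen_seed = Team("Example A&M", 16, "SWAC", False, False)

print(is_cinderella_candidate(mid_major_13))  # passes every rule
print(is_cinderella_candidate(power_six_11))  # fails rule 3
print(is_cinderella_candidate(sixteen_seed))  # fails rule 2
```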

Metrics That Matter

After analyzing each and every KenPom metric, my tests revealed just seven that meaningfully discriminate winners from losers. Just seven … out of 86. They are as follows (definitions taken from KenPom.com):

AdjO: Adjusted Offensive Efficiency — an estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average D-I defense.

AdjD: Adjusted Defensive Efficiency — an estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average D-I offense.

AdjEM: The difference between a team's offensive and defensive efficiency.

Defensive eFG%: Effective Field Goal Percentage (eFG%) allowed to the opposing offense.

Offensive Turnover %: Offensive turnovers per possession.

Defensive Turnover %: Opponent turnovers forced per possession.

3P% Defense: Three-point percentage allowed to opposing teams.
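
For readers who want to see the arithmetic behind these definitions, here is a minimal sketch of the raw versions of three of them. It uses the standard box-score formula for eFG% and does not reproduce KenPom's opponent adjustments; the function names and the sample numbers are hypothetical.

```python
def adj_em(adj_o: float, adj_d: float) -> float:
    """Efficiency margin: offensive minus defensive efficiency (per 100 possessions)."""
    return adj_o - adj_d


def efg_pct(fgm: int, fg3m: int, fga: int) -> float:
    """Effective FG%: made threes count 1.5 times a made two."""
    return 100.0 * (fgm + 0.5 * fg3m) / fga


def turnover_pct(turnovers: int, possessions: int) -> float:
    """Turnovers per possession, expressed as a percentage."""
    return 100.0 * turnovers / possessions


# Made-up inputs for illustration only.
margin = adj_em(110.0, 98.5)        # offense 110.0, defense 98.5
opp_efg = efg_pct(25, 8, 58)        # opponent shooting: 25 makes, 8 threes, 58 attempts
own_to = turnover_pct(12, 68)       # 12 turnovers in 68 possessions

print(margin, opp_efg, round(own_to, 1))
```

For the defensive metrics, the same functions are simply applied to the opponent's shooting and turnover totals.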

These seven metrics combine to paint a logical and intuitive portrait of a potential Cinderella team.

Generally, teams that upset top seeds in the first round boast well-rounded offensive and defensive efficiency, do not turn the ball over often on offense, force turnovers on defense and defend well on the perimeter.

2023 NCAA Tournament Cinderella Rankings

To find 2023's Cinderellas, I used my EFA research to build a multivariate model that specializes in forecasting binary (win-loss) outcomes: teams that tend to win their first-round games versus those that tend to lose.
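
The article does not say which kind of model was fitted. As a rough illustration only, here is how a logistic-style model could turn the seven metrics into a win probability. Everything here is an assumption: the logistic form, the use of standardized (z-score) inputs, the metric keys, and every weight and intercept value are hypothetical, not the fitted values behind the rankings.

```python
import math

# Hypothetical intercept and weights on z-scored metrics. NOT the article's values.
HYPOTHETICAL_INTERCEPT = -1.5
HYPOTHETICAL_WEIGHTS = {
    "AdjO": 0.3,      # better offense helps
    "AdjD": -0.3,     # lower points allowed is better, so the weight is negative
    "AdjEM": 0.5,
    "Def_eFG": -0.3,  # lower opponent eFG% is better
    "Off_TO": -0.3,   # fewer turnovers committed is better
    "Def_TO": 0.3,    # more turnovers forced is better
    "Def_3P": -0.2,   # lower opponent 3P% is better
}


def upset_probability(z_scores: dict) -> float:
    """Map z-scored metrics to a probability with the logistic function."""
    linear = HYPOTHETICAL_INTERCEPT + sum(
        weight * z_scores[name] for name, weight in HYPOTHETICAL_WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# An exactly average team (all z-scores zero) versus a team that is strong
# in efficiency margin and in both turnover metrics.
average_team = {name: 0.0 for name in HYPOTHETICAL_WEIGHTS}
strong_team = dict(average_team, AdjEM=1.5, Off_TO=-1.0, Def_TO=1.0)

p_average = upset_probability(average_team)
p_strong = upset_probability(strong_team)
print(round(p_average, 2), round(p_strong, 2))
```

The point of the sketch is the shape of the calculation: a weighted sum of metrics squeezed into the 0-to-1 range, so that a team with a better profile gets a higher p.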

Here's what you need to know about the forthcoming results:

The higher the probability coefficient (p), the better chance the model gives a team of pulling an upset.

My model includes a probability coefficient "cutoff value" of 0.25. There have been plenty of teams with lower coefficients that have still won their first-round matchups. For instance, Oral Roberts (p = 0.02) is a glaring example from the 2020-21 season, and Saint Peter's (p = 0.19) was last year's darling.

Nevertheless, the threshold of p = 0.25 distinguishes the majority of first-round upsets while minimizing the number of false positives. During the last three seasons, "Cinderella" teams with coefficients of 0.25 or higher have significantly outperformed others in the first round, both straight up (SU) and against the spread (ATS):

"p" value          SU      ATS
0.25 or higher     6-10    11-5
Lower than 0.25    4-31    15-20

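To make the gap between the two groups concrete, the records above can be converted into win percentages. This is a small worked example using only the win-loss records from the table; the variable and function names are my own.

```python
# (wins, losses) taken from the table: straight up (SU) and against the spread (ATS).
records = {
    "p >= 0.25": {"SU": (6, 10), "ATS": (11, 5)},
    "p < 0.25": {"SU": (4, 31), "ATS": (15, 20)},
}


def win_pct(wins: int, losses: int) -> float:
    """Winning percentage from a win-loss record."""
    return 100.0 * wins / (wins + losses)


rates = {
    group: {kind: win_pct(*record) for kind, record in by_kind.items()}
    for group, by_kind in records.items()
}

for group, by_kind in rates.items():
    print(group, {kind: round(value, 1) for kind, value in by_kind.items()})
```

Teams at or above the cutoff won 37.5% of their games straight up versus about 11.4% for the rest, and covered the spread 68.75% of the time versus about 42.9%.
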
This Year's Potential Cinderella Teams

Below is a ranking of each Cinderella team's chances to win its Round of 64 game from best to worst:

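Team                 p       Region
(missing)            0.57    West
(missing)            0.50    Midwest
(missing)            0.45    South
(missing)            0.44    West
(missing)            0.37    West
(missing)            0.32    Midwest
(missing)            0.28    East
(missing)            0.28    West
(missing)            0.21    South
(missing)            0.17    South
(missing)            0.12    South
(missing)            0.11    East
(missing)            0.10    East
(missing)            0.07    Midwest
(missing)            0.07    South
(missing)            0.07    West
(missing)            0.06    East
(missing)            0.05    Midwest
15 UNC Asheville     0.05    West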

** 11-seed Nevada must first defeat 11-seed Arizona State in the First Four round in order to advance to the Round of 64.
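
As a quick check on how the 0.25 cutoff divides this year's field, the sketch below applies it to the p values and regions listed above. Team names are left out; only the p and region columns are used, and the variable names are my own.

```python
from collections import Counter

CUTOFF = 0.25  # the model's cutoff value

# (p, region) pairs in the order listed above.
rankings = [
    (0.57, "West"), (0.50, "Midwest"), (0.45, "South"), (0.44, "West"),
    (0.37, "West"), (0.32, "Midwest"), (0.28, "East"), (0.28, "West"),
    (0.21, "South"), (0.17, "South"), (0.12, "South"), (0.11, "East"),
    (0.10, "East"), (0.07, "Midwest"), (0.07, "South"), (0.07, "West"),
    (0.06, "East"), (0.05, "Midwest"), (0.05, "West"),
]

# Teams that clear the cutoff, and how they are spread across regions.
above_cutoff = [(p, region) for p, region in rankings if p >= CUTOFF]
by_region = Counter(region for _, region in above_cutoff)

print(len(above_cutoff), "of", len(rankings), "teams clear the cutoff")
print(dict(by_region))
```

Eight of the 19 listed teams clear the cutoff, and four of those eight sit in the West region.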