How We Built Our Win Probability Model for WNBA Totals in The Action Network App

Credit: Stephen Gosling/NBAE via Getty Images. Pictured: Kia Vaughn #1 and Alanna Smith #11 of the Phoenix Mercury

Aug 25, 2020, 11:56 AM EDT

By William Doyle and Carlon Brown

The last two decades have seen a major shift in how people approach data. This began to seriously pervade the sports world as early as 2002, with the success of that year’s Oakland Athletics.

The next data invasion has already begun in our sports betting world, with live win probability in The Action Network app, PRO Systems, and more. It will continue to grow as sports betting becomes legal in more states.

The demand for sports betting-related content, products and analysis continues to grow. This includes offerings for leagues that are new to many bettors, such as the WNBA. The Action Network has seen an almost three-fold year-over-year increase in the number of users who track WNBA picks in the app.

The app already has win probability visualizations for point spread, moneyline and total bets for most other sports.

Because of the increased demand for WNBA betting content and analysis over the past year, we decided to build a win probability model for WNBA totals. It’s not in the app yet, but we’re hoping to get it live soon.

The current model is a Random Forest regressor that predicts the end-of-game total (the “win total” below) of a given WNBA game. Evaluated on the held-out test set, the model achieved an R^2 of 0.94.

Here’s how we got there.

Methods

The Action Network has historical play-by-play data for multiple professional and amateur sports leagues, including the WNBA. In total, 694 complete WNBA games were gathered with an average of 387 plays per game.

Statistics not already included in the play-by-play data, such as a running points-per-minute rate, were calculated from it (all statistics, including the calculated ones, are listed in Table I). All play-by-play data and associated statistics were preprocessed to handle missing/null values before model training and testing.
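As an illustration of a calculated statistic, a running points-per-minute value could be derived from each play as sketched below. This is a minimal sketch, not our production code: the `Play` record and the `pts_per_min` helper are hypothetical, the clock is assumed to be stored as seconds remaining in regulation, and overtime is ignored.

```python
from dataclasses import dataclass

GAME_SECONDS = 40 * 60  # a WNBA regulation game is four 10-minute quarters


@dataclass
class Play:
    # Hypothetical play-by-play record; field names mirror Table I.
    clock_remain: float  # seconds left in regulation (assumed unit)
    home_points: int
    away_points: int


def pts_per_min(play: Play) -> float:
    """Combined points scored so far divided by minutes elapsed."""
    elapsed_min = (GAME_SECONDS - play.clock_remain) / 60
    if elapsed_min <= 0:
        # No time has run off yet, so the rate is undefined; fall back to 0.
        return 0.0
    return (play.home_points + play.away_points) / elapsed_min


halftime = Play(clock_remain=1200.0, home_points=40, away_points=38)
print(pts_per_min(halftime))
```

The zero fallback at tip-off is one way to satisfy the missing/null handling described above; other imputation choices are equally plausible.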

In addition, each game’s closing total line from a single sportsbook was gathered. Each statistic except for the end of game total in Table I was a feature input in the model. The target value was the end of game total.

The model’s input was a one-dimensional array of 14 features, and its output was a single value.

Figure I. Schematic Representation

Several candidate models were trained on the training set with no hyperparameter tuning. The three best performers were kept, and a random hyperparameter search was run on each.

The Random Forest Regressor performed best. An exhaustive grid search was performed with the Random Forest model, and the best hyperparameters were selected.
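The two-stage tuning described above can be sketched with scikit-learn. This is an illustrative sketch only: the data is synthetic, the hyperparameter ranges are assumptions rather than the values we searched, and only the Random Forest branch is shown (the other candidate models are omitted).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    train_test_split,
)

# Synthetic stand-in for the real play-by-play features (14 inputs, 1 target).
X, y = make_regression(n_samples=300, n_features=14, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Stage 1: coarse random search over a wide (assumed) hyperparameter space.
random_search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [20, 40, 80],
        "max_depth": [4, 8, None],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=5,
    cv=3,
    scoring="r2",
    random_state=0,
)
random_search.fit(X_train, y_train)

# Stage 2: exhaustive grid search in a narrow band around the stage-1 winner.
best = random_search.best_params_
grid_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={
        "n_estimators": [best["n_estimators"], best["n_estimators"] * 2],
        "max_depth": [best["max_depth"]],
        "min_samples_leaf": [best["min_samples_leaf"]],
    },
    cv=3,
    scoring="r2",
)
grid_search.fit(X_train, y_train)

# R^2 on the held-out test set, the metric quoted in this article.
test_r2 = grid_search.best_estimator_.score(X_test, y_test)
print(grid_search.best_params_, round(test_r2, 3))
```

The random search is cheap and narrows the space; the grid search then checks every combination in the smaller region, which is why the exhaustive step is reserved for the single best model.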

The model performed well (R^2 = 0.94), but it predicts the total of the game and does not directly yield the probability of a bet winning. A probability of 0.5 was assumed for the first few plays of the game (i.e., opening tip, jump ball) before switching to the model’s prediction.

To convert from a predicted win total to a probability, we needed a line from a sportsbook. All closing lines were gathered from one book for this analysis, but the conversion can be done with any book’s line for any game.

The difference between the predicted win total l_t and the book line b, divided by the clock remaining in the game r_t, provided a simple and intuitive measure of how likely a win total bet was to cash: the same gap counts for more as time runs out. After some scaling of the denominator, the resulting value was passed through a sigmoid function, which forces it to lie between 0 and 1 so it can be read as an estimated probability.
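The conversion described above can be sketched as p_t = sigmoid((l_t - b) / (c * r_t)), here written for the over side of the bet. This is a minimal sketch under stated assumptions: the function name `over_probability`, the scaling constant `scale` (c) and its default value, and the use of minutes for r_t are all illustrative choices, not the values used in the app.

```python
import math


def over_probability(pred_total: float, book_line: float,
                     clock_remain_min: float, scale: float = 0.1) -> float:
    """Estimated probability that the over cashes.

    pred_total       -- model's predicted final total (l_t)
    book_line        -- sportsbook total line (b)
    clock_remain_min -- minutes left in the game (r_t); unit is an assumption
    scale            -- denominator scaling constant (c); value is an assumption
    """
    diff = pred_total - book_line
    if clock_remain_min <= 0:
        # Game over: the outcome is decided (0.5 stands in for a push).
        return 1.0 if diff > 0 else 0.0 if diff < 0 else 0.5
    z = diff / (scale * clock_remain_min)
    # Numerically stable sigmoid, so large |z| cannot overflow math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Same 5-point edge over the line, early versus late in the game.
print(over_probability(165.0, 160.0, 30.0))
print(over_probability(165.0, 160.0, 5.0))
```

The probability of the under is simply one minus this value. Because r_t sits in the denominator, a fixed gap between prediction and line pushes the estimate toward 0 or 1 as the clock winds down.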

Results and Discussion

As stated above, the final random forest model achieved an R^2 of 0.94. Feature importances were investigated (Table I below) to identify which inputs were most useful in predicting the win total. Unsurprisingly, the points per minute statistic was the single most important predictor of the win total for the game.

It also makes sense that two-point percentage and clock remaining were important predictors. The points per minute gradient, however, had very little importance. This is surprising because the statistic it is derived from, points per minute, was the most powerful predictor, yet the gradient itself was almost negligible.

We also originally thought that indicators of a close game would help identify more competitive games, where the win total may be higher than usual, but this was not the case.

Table I. All Statistics & Feature Importances

Input Importance
pts_per_min 0.294
twopt_perc 0.101
clock_remain 0.083
away_points 0.076
threept_perc 0.073
num_ties 0.067
home_points 0.063
num_turnovers 0.055
num_totalfouls 0.053
freethrow_perc 0.052
num_rebounds 0.044
leading_by 0.03
gradient_pts 0.008
gradient_total 0.002

There is still room for improvement with the model, notably by expanding the feature input space. For example, instead of simply counting the total number of fouls, turnovers, and other statistics, one can break them out by team. In doing so, the model may be able to identify when a team is close to entering the bonus/double bonus for fouls, along with other relationships that only appear once the statistics are split up.

In addition, more WNBA play-by-play data may allow more flexible models to outperform the random forest.
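Splitting a counting statistic by team could look like the sketch below. The play format (a dict with `team` and `event` keys) and the `per_team_counts` helper are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter


def per_team_counts(plays, event_type):
    """Running count of one event type for each team after every play."""
    counts = Counter()
    running = []
    for play in plays:
        if play["event"] == event_type:
            counts[play["team"]] += 1
        # Snapshot both teams' totals so each play gets its own feature row.
        running.append({"home": counts["home"], "away": counts["away"]})
    return running


plays = [
    {"team": "home", "event": "foul"},
    {"team": "away", "event": "turnover"},
    {"team": "home", "event": "foul"},
]
print(per_team_counts(plays, "foul"))
```

Each snapshot would replace a single combined input such as num_totalfouls with two inputs, one per team, at the cost of a larger feature vector.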

Figure II. Probability vs. Time Graphs (6 WNBA Games)

Conclusion

Play-by-play data can be used to model the final total of any given WNBA game. For many games (Figure II), the model’s probability converged very rapidly to 5% or 95%. For these games, the model was beating the live lines offered by sportsbooks by a significant margin.

Additional analysis is still needed on how often this model’s rapid convergence creates profitable betting opportunities. In addition, a similar model can be constructed for point spreads and moneylines. It may even be possible to create the point spread model and obtain the moneyline for free.

Future models could determine games with potentially elevated variance between the actual total and the line total. If such a model could find such games at rates slightly higher than chance, it would further improve the likelihood of the proposed model’s profitability, due to its rapid convergence.

Such a model could be created from previous team information and individual player stats, as well as recent performances of players/teams. The next data invasion is here and the world of sports betting will continue to grow as sports betting becomes legal across the country.

Acknowledgements

Data & Analytics: Kyle Western

Engineering: Akshay Patel, Daniel Hood, Justyn Laufenberg, & Sam Huffman

Executive: Caroline Smith, Melissa Betts, TAN, & TCG
