Published by De Gruyter June 15, 2019

A point-based Bayesian hierarchical model to predict the outcome of tennis matches

Martin Ingram

Abstract

A well-established assumption in tennis is that point outcomes on each player’s serve in a match are independent and identically distributed (iid). Under this assumption, specifying the serve probabilities for both players is enough to derive a wide variety of event distributions, such as the expected match winner, the number of sets, and the number of games. However, models using this assumption, which we will refer to as “point-based”, have typically performed worse at predicting the match winner than other models in the literature. This paper presents a point-based Bayesian hierarchical model for predicting the outcome of tennis matches. The model predicts the probability of winning a point on serve given surface, tournament and match date. Each player is given a serve skill and a return skill, each of which is assumed to follow a Gaussian random walk over time. In addition, each player’s skill varies by surface, and each tournament is given its own intercept. When evaluated on the ATP’s 2014 season, the model outperforms other point-based models, predicting match outcomes with greater accuracy (68.8% vs. 66.3%) and lower log loss (0.592 vs. 0.641). The results are competitive with approaches modelling the match outcome directly, demonstrating the forecasting potential of the point-based modelling approach.
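To illustrate how the iid assumption turns a serve probability into an event probability, the sketch below computes the chance that the server wins a single game. It uses the standard closed-form result for a game with deuce and is not taken from the paper itself; the function name `game_win_prob` and the example value of 0.6 are assumptions chosen for illustration.

```python
def game_win_prob(p: float) -> float:
    """Probability that the server wins a game, assuming each point on
    serve is won independently with the same probability p (iid points).

    Illustrative helper only; the name and interface are hypothetical.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    q = 1.0 - p

    # Server wins four points while the returner wins 0, 1 or 2 points.
    # The last point must go to the server, giving 1, 4 and 10 orderings.
    win_before_deuce = p**4 * (1 + 4 * q + 10 * q**2)

    # Deuce (3-3) is reached in C(6, 3) = 20 orderings. From deuce the
    # server must win two points in a row before the returner does.
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (p**2 + q**2)

    return win_before_deuce + reach_deuce * win_from_deuce


if __name__ == "__main__":
    # A server who wins 60% of points on serve (assumed example value).
    print(round(game_win_prob(0.6), 4))
```

The same reasoning can be extended from games to sets and matches, which is how a point-based model can produce a match-win probability from the two players' serve probabilities alone.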

Published Online: 2019-06-15
Published in Print: 2019-10-25

©2019 Walter de Gruyter GmbH, Berlin/Boston