Abstract
Bill James’ discovery of a Pythagorean formula for win expectation in baseball has been a useful resource for analysts and coaches for over 30 years. Extensions of the Pythagorean model have been developed for all of the major professional team sports but for none of the individual sports. The present paper aims to address this gap by deriving a Pythagorean model for win production in tennis. Using performance data for the top 100 male singles players between 2004 and 2014, this study shows that, among the most commonly reported performance statistics, a model of break points won provides the closest approximation to the Pythagorean formula, explaining 85% of the variation in season wins and having the lowest cross-validation prediction error among the models considered. The mid-season projections of the break point model performed comparably to those of an expanded model that included eight other serve and return statistics as well as player ranking. A simple match prediction algorithm based on a break point model using the previous 9 months of match history had a prediction accuracy of 67% when applied to 2015 match outcomes, whether using the least-squares or the Pythagorean power coefficient. By demonstrating the striking similarity between the Pythagorean formula for baseball wins and the break point model for match wins in tennis, this paper identifies a simple yet potentially powerful analytic tool with a wide range of uses in player performance evaluation and match forecasting.
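As a rough illustration of the kind of formula discussed above, the sketch below computes a Pythagorean win expectation from two counts. The exponent of 2 is the value in James' original baseball formula; applying it to break points won and break points conceded is an assumption made here for illustration, and the function name and sample numbers are hypothetical rather than taken from the paper.

```python
def pythagorean_expectation(won: float, conceded: float, power: float = 2.0) -> float:
    """Expected win fraction from a 'for' count and an 'against' count.

    In baseball the inputs are runs scored and runs allowed. Using break
    points won and break points conceded for tennis is an assumption of
    this sketch, not a specification from the paper.
    """
    if won < 0 or conceded < 0:
        raise ValueError("counts must be non-negative")
    if won == 0 and conceded == 0:
        raise ValueError("expectation is undefined when both counts are zero")
    for_term = won ** power
    against_term = conceded ** power
    return for_term / (for_term + against_term)


# Hypothetical season totals: 150 break points won, 100 conceded.
expected_win_fraction = pythagorean_expectation(150, 100)
print(f"{expected_win_fraction:.3f}")  # 0.692
```

The `power` argument is left adjustable because the abstract distinguishes between a least-squares coefficient and the Pythagorean power coefficient; the fitted value is not given here, so the default of 2 should be read only as the classical baseball choice.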
Acknowledgments
I am grateful to the staff at the ATP and flashscore.com for making a vast amount of tennis data available to the public, which made the research presented in this paper possible.
©2016 by De Gruyter