Bill James’ discovery of a Pythagorean formula for win expectation in baseball has been a useful resource for analysts and coaches for over 30 years. Extensions of the Pythagorean model have been developed for all of the major professional team sports but for none of the individual sports. The present paper addresses this gap by deriving a Pythagorean model for win production in tennis. Using performance data for the top 100 male singles players between 2004 and 2014, this study shows that, among the most commonly reported performance statistics, a model of break points won provides the closest approximation to the Pythagorean formula, explaining 85% of the variation in season wins and having the lowest cross-validation prediction error among the models considered. The mid-season projections of the break point model performed comparably to those of an expanded model that included eight other serve and return statistics as well as player ranking. A simple match prediction algorithm based on a break point model with the previous 9 months of match history had a prediction accuracy of 67% when applied to 2015 match outcomes, whether using the least-squares or the Pythagorean power coefficient. By demonstrating the striking similarity between the Pythagorean formula for baseball wins and the break point model for match wins in tennis, this paper identifies a simple yet powerful analytic tool with a wide range of potential uses in player performance evaluation and match forecasting.
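For readers unfamiliar with the Pythagorean form, the following is a minimal sketch rather than the paper's own implementation. James' baseball formula estimates the win fraction as RS^2 / (RS^2 + RA^2), where RS and RA are runs scored and runs allowed. The sketch assumes the tennis analogue replaces runs with break points won and break points conceded, and that a least-squares power coefficient can be estimated by regressing log-odds of winning on the log ratio of the two counts. The function names, the input layout, and the through-the-origin fit are illustrative assumptions, not details taken from the paper.

```python
import math


def pythagorean_expectation(won, lost, power=2.0):
    """Expected win fraction from counts won and conceded.

    `won` and `lost` stand in for break points won and conceded
    (an assumed analogue of runs scored and allowed); power=2.0
    is the exponent in James' original baseball formula.
    """
    if won < 0 or lost < 0:
        raise ValueError("counts must be non-negative")
    if won == 0 and lost == 0:
        return 0.5  # no information: treat as an even chance
    return won ** power / (won ** power + lost ** power)


def fit_power(records):
    """Least-squares estimate of the power coefficient.

    Because W / (1 - W) = (won / lost) ** power, the log-odds of
    winning are linear in log(won / lost) with slope `power`, so a
    regression through the origin recovers it. `records` is an
    iterable of (won, lost, win_fraction) with positive counts and
    0 < win_fraction < 1 (a hypothetical input layout).
    """
    num = 0.0
    den = 0.0
    for won, lost, frac in records:
        x = math.log(won / lost)
        y = math.log(frac / (1.0 - frac))
        num += x * y
        den += x * x
    if den == 0.0:
        raise ValueError("need at least one record with won != lost")
    return num / den
```

Under these assumptions, a player who wins three break points for every one conceded has `pythagorean_expectation(3, 1)` equal to 9 / (9 + 1) = 0.9 at the default power of 2, and `fit_power` applied to season-level records gives the exponent that best matches the observed win fractions.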
JQAS, an official journal of the American Statistical Association, publishes research on the quantitative aspects of professional and collegiate sports. Articles deal with subjects such as measurements of player performance, tournament structure, and the frequency and occurrence of records. Additionally, the journal serves as an outlet for professionals in the sports world to raise issues and ask questions that relate to quantitative sports analysis.