Abstract
In this paper, we empirically evaluate the predictive power of eight sports ranking methods. For each ranking method, we implement two versions, one using only win-loss data and one utilizing score-differential data. The methods are compared on four datasets: 32 National Basketball Association (NBA) seasons, 112 Major League Baseball (MLB) seasons, 22 NCAA Division 1-A Basketball (NCAAB) seasons, and 56 NCAA Division 1-A Football (NCAAF) seasons. For each season of each dataset, we apply 20-fold cross-validation to determine the predictive accuracy of the ranking methods. The non-parametric Friedman hypothesis test is used to assess whether the predictive errors of the ranking methods differ significantly over the seasons. The post-hoc Nemenyi test is then employed to determine which pairs of ranking methods differ significantly in predictive power. For all datasets, the null hypothesis – that all ranking methods are equivalent – is rejected at the 99% confidence level. For the NCAAF and NCAAB datasets, the Nemenyi test concludes that the implementations utilizing score-differential data are usually more predictive than those using only win-loss data. For the NCAAF dataset, the least squares and random walker methods have significantly better predictive accuracy at the 95% confidence level than the other methods considered.
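As a rough illustration of the statistical comparison described above, the following sketch computes average ranks, the Friedman chi-square statistic, and the Nemenyi critical difference in the form given by Demšar (2006). The function names, the layout of `errors` (one row per season, one column per method, lower error is better), and the omission of a tie correction in the Friedman statistic are assumptions made for this example, not details taken from the paper. The critical value `q_alpha` must be supplied by the caller from a Studentized range table.

```python
from math import sqrt

def average_ranks(errors):
    """errors[i][j] is the predictive error of method j on season i.
    Returns each method's mean rank (1 = lowest error); ties share the average rank."""
    n, k = len(errors), len(errors[0])
    totals = [0.0] * k
    for row in errors:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            # Extend the block while the next method has the same error.
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / n for t in totals]

def friedman_statistic(errors):
    """Friedman chi-square statistic (no tie correction, an assumption of this sketch)."""
    n, k = len(errors), len(errors[0])
    r = average_ranks(errors)
    return 12 * n / (k * (k + 1)) * (sum(x * x for x in r) - k * (k + 1) ** 2 / 4)

def nemenyi_critical_difference(q_alpha, k, n):
    """Two methods differ significantly if their average ranks differ by more than this."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n))
```

Under the null hypothesis, the statistic is compared against a chi-square distribution with k - 1 degrees of freedom; the Nemenyi comparison is made only after that null hypothesis has been rejected.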
- 1
- 2
In equation (2), we take the fraction to be […] if the game results in a 0–0 tie.
- 3
A digraph is weakly connected if replacing its arcs with undirected edges yields a connected graph.
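The definition above can be checked directly. The sketch below is an assumed illustration (the function name and the arc-list representation are not from the paper): it discards arc directions and runs a breadth-first search from node 0.

```python
from collections import deque

def is_weakly_connected(num_nodes, arcs):
    """arcs is a list of (tail, head) pairs over nodes 0 .. num_nodes - 1.
    Assumes num_nodes >= 1."""
    # Replace every arc with an undirected edge.
    neighbors = [set() for _ in range(num_nodes)]
    for u, v in arcs:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Breadth-first search on the undirected graph.
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == num_nodes
```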
- 4
Recall that for an irreducible matrix with non-negative entries, there exists a positive, real eigenvalue (called the Perron-Frobenius eigenvalue) such that no other eigenvalue is larger in magnitude. The Perron-Frobenius eigenvalue is simple and the corresponding eigenvector (called the Perron-Frobenius eigenvector) can be chosen to have positive entries. See, for example, Horn and Johnson (1991).
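One common way to approximate the Perron-Frobenius eigenvector numerically is power iteration. The sketch below is an assumption of this example rather than the computation used in the paper; it converges when the dominant eigenvalue is strictly larger in magnitude than all others (for instance, when every entry of the matrix is positive). The function name and parameters are illustrative.

```python
def perron_vector(matrix, iterations=1000, tol=1e-12):
    """Approximate the dominant eigenvalue and eigenvector of a square
    non-negative matrix (list of lists). The eigenvector is scaled to sum to 1."""
    n = len(matrix)
    x = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply: y = A x.
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        if total == 0:
            raise ValueError("iteration collapsed to the zero vector")
        # Because sum(x) == 1, sum(A x) estimates the eigenvalue.
        eigenvalue = total
        y = [v / total for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return eigenvalue, x
```

For the matrix [[1, 2], [3, 2]], whose eigenvalues are 4 and -1, the iteration approaches the eigenvalue 4 and the eigenvector (0.4, 0.6).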
- 5
The matrices W and S are irreducible if the corresponding directed graph is strongly connected. Equivalently, the matrix W is irreducible if there is no partition of the teams V = V1 ∪ V2 into two nonempty sets such that no team in V1 has beaten a team in V2.
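Strong connectivity, and hence irreducibility in the sense above, can be tested with two graph searches: every team must be reachable from team 0 along win arcs, and team 0 must be reachable from every team. The sketch below assumes a hypothetical list of (winner, loser) pairs as input; the names are illustrative and not from the paper.

```python
def reachable(start, adjacency):
    """Set of nodes reachable from start by following arcs in adjacency."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_strongly_connected(num_teams, wins):
    """wins is a list of (winner, loser) pairs over teams 0 .. num_teams - 1.
    Assumes num_teams >= 1."""
    forward = [[] for _ in range(num_teams)]
    backward = [[] for _ in range(num_teams)]
    for winner, loser in wins:
        forward[winner].append(loser)
        backward[loser].append(winner)
    # Team 0 must reach every team, and every team must reach team 0.
    return (len(reachable(0, forward)) == num_teams
            and len(reachable(0, backward)) == num_teams)
```

A cycle of wins such as 0 beats 1, 1 beats 2, 2 beats 0 is strongly connected, whereas an undefeated team makes the graph reducible, since no arc points back to it.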
References
Berry, S. M. 2003. “A Statistician Reads the Sports Pages: College Football Rankings: The BCS and the CLT.” Chance 16:46–49. doi:10.1080/09332480.2003.10554849
Bradley, R. A. and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39:324–345. doi:10.1093/biomet/39.3-4.324
Burer, S. 2012. “Robust Rankings for College Football.” Journal of Quantitative Analysis in Sports 8(2). doi:10.1515/1559-0410.1405
Callaghan, T., P. J. Mucha, and M. A. Porter. 2007. “Random Walker Ranking for NCAA Division I-A Football.” American Mathematical Monthly 114:761–777. doi:10.1080/00029890.2007.11920469
CBB. 2012. “http://www.sports-reference.com/cbb/.” Webpage accessed November 1, 2012.
CFB. 2012. “http://www.sports-reference.com/cfb/.” Webpage accessed October 29, 2012.
Chan, V. 2011. “Prediction Accuracy of Linear Models for Paired Comparisons in Sports.” Journal of Quantitative Analysis in Sports 7(3), Article 18.
Chartier, T. P., E. Kreutzer, A. N. Langville, and K. E. Pedings. 2011a. “Sensitivity and Stability of Ranking Vectors.” SIAM Journal on Scientific Computing 33:1077–1102. doi:10.1137/090772745
Chartier, T. P., E. Kreutzer, A. N. Langville, and K. E. Pedings. 2011b. “Sports Ranking with Nonuniform Weighting.” Journal of Quantitative Analysis in Sports 7(3), Article 6.
Colley, W. N. 2002. “Colley’s Bias-Free College Football Ranking Method: The Colley Matrix Explained.” Technical report, Princeton University.
David, H. A. 1963. The Method of Paired Comparisons. Charles Griffin & Co.
Demšar, J. 2006. “Statistical Comparisons of Classifiers Over Multiple Data Sets.” JMLR 7:1–30.
Dwork, C., R. Kumar, M. Naor, and D. Sivakumar. 2001a. “Rank Aggregation Methods for the Web.” pp. 613–622, in: Proceedings of the 10th International Conference on World Wide Web. ACM. doi:10.1145/371920.372165
Dwork, C., R. Kumar, M. Naor, and D. Sivakumar. 2001b. “Rank Aggregation Revisited.” pp. 613–622, in: Proceedings International Conference World Wide Web (WWW10).
Elo, A. E. 1961. “The New U.S.C.F. Rating System.” Chess Life 16:160–161.
Foulds, L. R. 1992. Graph Theory Applications. Springer. doi:10.1007/978-1-4612-0933-1
Gill, R. 2009. “Assessing Methods for College Football Rankings.” Journal of Quantitative Analysis in Sports 5(2), Article 3.
Glickman, M. E. 1995. “A Comprehensive Guide to Chess Ratings.” American Chess Journal 3:59–102.
Harville, D. 1977. “The Use of Linear-Model Methodology to Rate High School or College Football Teams.” Journal of the American Statistical Association 72:278–289. doi:10.1080/01621459.1977.10480991
Herbrich, R., T. Minka, and T. Graepel. 2007. “Trueskill: A Bayesian Skill Rating System.” Advances in Neural Information Processing Systems 19:569.
Hirani, A. N., K. Kalyanaraman, and S. Watts. 2011. “Least Squares Ranking on Graphs.” arXiv:1011.1716v4.
Hochbaum, D. S. 2010. “The Separation and Separation-Deviation Methodology for Group Decision Making and Aggregate Ranking.” TutORials in Operations Research 7:116–141. doi:10.1287/educ.1100.0073
Horn, R. A. and C. R. Johnson. 1991. Matrix Analysis. Cambridge University Press. doi:10.1017/CBO9780511840371
Jiang, X., L.-H. Lim, Y. Yao, and Y. Ye. 2010. “Statistical Ranking and Combinatorial Hodge Theory.” Mathematical Programming Ser. B 127:203–244. doi:10.1007/s10107-010-0419-x
Keener, J. P. 1993. “The Perron-Frobenius Theorem and the Ranking of Football Teams.” SIAM Review 35:80–93. doi:10.1137/1035004
Langville, A. N. and C. D. Meyer. 2012. Who’s #1?: The Science of Rating and Ranking. Princeton University Press. doi:10.1515/9781400841677
Leake, R. 1976. “A Method for Ranking Teams: With an Application to College Football.” Management Science in Sports 4:27–46.
Massey, K. 1997. Statistical Models Applied to the Rating of Sports Teams, Master’s thesis, Bluefield College.
Miwa, T. 2012. “http://cse.niaes.affrc.go.jp/miwa/probcalc/s-range/.” Webpage accessed November 26, 2012.
MLB. 2012. “http://www.baseball-reference.com/.” Webpage accessed October 29, 2012.
NBA. 2012. “http://www.basketball-reference.com/.” Webpage accessed October 29, 2012.
Osting, B., C. Brune, and S. Osher. 2013a. “Enhanced Statistical Rankings via Targeted Data Collection.” JMLR, W&CP 28(1):489–497.
Osting, B., J. Darbon, and S. Osher. 2013b. “Statistical Ranking Using the ℓ1-Norm on Graphs.” Accepted to AIMS J. Inverse Problems and Imaging. doi:10.3934/ipi.2013.7.907
Page, L., S. Brin, R. Motwani, and T. Winograd. 1999. “The PageRank Citation Ranking: Bringing Order to the Web.” Technical report, Stanford InfoLab Technical Report 1999–66.
Pickle, D. and B. Howard. 1981. “Computer to Aid in Basketball Championship Selection.” NCAA News 4.
Shaffer, J. P. 1995. “Multiple Hypothesis Testing.” Annual Review of Psychology 46:561–584. doi:10.1146/annurev.ps.46.020195.003021
Stefani, R. T. 1977. “Football and Basketball Predictions Using Least Squares.” IEEE Transactions on Systems, Man, and Cybernetics 7:117–121. doi:10.1109/TSMC.1977.4309667
Stefani, R. T. 1980. “Improved Least Squares Football, Basketball, and Soccer Predictions.” IEEE Transactions on Systems, Man, and Cybernetics 10:116–123. doi:10.1109/TSMC.1980.4308442
Stefani, R. 2011. “The Methodology of Officially Recognized International Sports Rating Systems.” Journal of Quantitative Analysis in Sports 7(4), Article 10.
Tran, N. M. 2011. “Pairwise Ranking: Choice of Method Can Produce Arbitrarily Different Rank Order.” arXiv:1103.1110v1.
Trono, J. A. 2010. “Rating/Ranking Systems, Post-Season Bowl Games, and ‘The Spread’.” Journal of Quantitative Analysis in Sports 6(3), Article 6.
Xu, Q., Y. Yao, T. Jiang, Q. Huang, B. Yan, and W. Lin. 2011. “Random Partial Paired Comparison for Subjective Video Quality Assessment via HodgeRank.” pp. 393–402, in: Proceedings of the 19th ACM International Conference on Multimedia. doi:10.1145/2072298.2072350
©2013 by Walter de Gruyter Berlin Boston