Published by De Gruyter May 27, 2013

Ranking rankings: an empirical comparison of the predictive power of sports ranking methods

Daniel Barrow, Ian Drayer, Peter Elliott, Garren Gaut and Braxton Osting


In this paper, we empirically evaluate the predictive power of eight sports ranking methods. For each ranking method, we implement two versions, one using only win-loss data and one utilizing score-differential data. The methods are compared on 4 datasets: 32 National Basketball Association (NBA) seasons, 112 Major League Baseball (MLB) seasons, 22 NCAA Division 1-A Basketball (NCAAB) seasons, and 56 NCAA Division 1-A Football (NCAAF) seasons. For each season of each dataset, we apply 20-fold cross validation to determine the predictive accuracy of the ranking methods. The non-parametric Friedman hypothesis test is used to assess whether the predictive errors for the considered rankings over the seasons are statistically dissimilar. The post-hoc Nemenyi test is then employed to determine which ranking methods have significant differences in predictive power. For all datasets, the null hypothesis – that all ranking methods are equivalent – is rejected at the 99% confidence level. For NCAAF and NCAAB datasets, the Nemenyi test concludes that the implementations utilizing score-differential data are usually more predictive than those using only win-loss data. For the NCAAF dataset, the least squares and random walker methods have significantly better predictive accuracy at the 95% confidence level than the other methods considered.
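The Friedman test mentioned above ranks the methods within each season and asks whether their mean ranks differ by more than chance would allow. The following is a minimal sketch of that statistic in plain Python, using the standard formula as given in Demšar (2006). It is an illustration, not the authors' code: the function names and the error values are hypothetical, and no tie correction is applied. The Nemenyi critical difference is included with `q_alpha` left as a caller-supplied table value.

```python
import math


def average_ranks(row):
    """Rank one season's errors (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(errors):
    """errors[s][m] = predictive error of method m in season s.

    Returns (chi-square statistic, mean rank of each method).
    """
    n = len(errors)      # number of seasons
    k = len(errors[0])   # number of ranking methods
    rank_rows = [average_ranks(row) for row in errors]
    mean_rank = [sum(r[m] for r in rank_rows) / n for m in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_rank) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_rank


def nemenyi_critical_difference(q_alpha, k, n):
    """Two methods differ significantly if their mean ranks differ by more than this.

    q_alpha must be looked up in a published table; it is not computed here.
    """
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))


# Made-up errors for illustration: 4 seasons (rows) x 3 methods (columns).
errors = [
    [0.30, 0.28, 0.35],
    [0.31, 0.27, 0.33],
    [0.29, 0.26, 0.36],
    [0.32, 0.30, 0.34],
]
chi2, mean_ranks = friedman_statistic(errors)
print(chi2, mean_ranks)
```

Under the null hypothesis the statistic is approximately chi-square distributed with k − 1 degrees of freedom, so it would be compared against the corresponding critical value for the chosen significance level.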

Corresponding author: Braxton Osting, UCLA, Department of Mathematics, 405 Hilgard Avenue, Los Angeles, CA 90095, USA, Tel.: +3108252601

  1.
  2.

    In equation (2), we take the fraction to be

    if the game results in a 0–0 tie.

  3.

    A digraph is weakly connected if replacing its arcs with undirected edges yields a connected graph.

  4.

    Recall that for an irreducible matrix with non-negative entries, there exists a positive, real eigenvalue (called the Perron-Frobenius eigenvalue) such that no other eigenvalue is larger in magnitude. The Perron-Frobenius eigenvalue is simple and the corresponding eigenvector (called the Perron-Frobenius eigenvector) can be scaled to have non-negative entries. See, for example, Horn and Johnson (1991).

  5.

    The matrices W and S are irreducible if the corresponding directed graph is strongly connected. In terms of game outcomes, the matrix W is irreducible if there is no partition of the teams into two non-empty sets, V = V1 ∪ V2, such that no team in V1 has beaten a team in V2.
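The connectivity and eigenvector notions in the notes above can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in the source: `W[i][j] > 0` is taken to mean that team i has beaten team j, the function names are hypothetical, and the eigenvector is computed by power iteration on W + I. The shift is a design choice made here: W + I has the same eigenvectors as W but avoids the oscillation that plain power iteration can show on periodic matrices.

```python
from collections import deque


def _reachable(adj, start):
    """Set of vertices reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_weakly_connected(W):
    """True if the digraph of W is connected once arc directions are ignored."""
    n = len(W)
    und = [[j for j in range(n) if j != i and (W[i][j] > 0 or W[j][i] > 0)]
           for i in range(n)]
    return len(_reachable(und, 0)) == n


def is_irreducible(W):
    """True if the digraph of W is strongly connected.

    Vertex 0 must reach every vertex along arcs and against arcs.
    """
    n = len(W)
    fwd = [[j for j in range(n) if j != i and W[i][j] > 0] for i in range(n)]
    rev = [[j for j in range(n) if j != i and W[j][i] > 0] for i in range(n)]
    return len(_reachable(fwd, 0)) == n and len(_reachable(rev, 0)) == n


def perron_vector(W, iters=10_000, tol=1e-12):
    """Power iteration on W + I; returns (eigenvalue of W, eigenvector summing to 1).

    Intended for irreducible non-negative W only.
    """
    n = len(W)
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        # y = (W + I) x
        y = [x[i] + sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        s = sum(y)  # since sum(x) == 1, s estimates (eigenvalue of W) + 1
        y = [v / s for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x, lam = y, s - 1.0
        if converged:
            break
    return lam, x


# Two teams that have each beaten the other: strongly connected.
W_ok = [[0, 1],
        [4, 0]]
# Team 2 has never won: weakly connected, but not irreducible.
W_bad = [[0, 1, 1],
         [1, 0, 1],
         [0, 0, 0]]

print(is_irreducible(W_ok), is_irreducible(W_bad), is_weakly_connected(W_bad))
print(perron_vector(W_ok))
```

In `W_bad`, taking V1 = {team 2} and V2 = {teams 0, 1} gives exactly the kind of partition described in the last note: no team in V1 has beaten a team in V2, so the matrix is reducible.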


Berry, S. M. 2003. “A Statistician Reads the Sports Pages: College Football Rankings: The BCS and the CLT.” Chance 16:46–49.

Bradley, R. A. and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39:324–345.

Burer, S. 2012. “Robust Rankings for College Football.” Journal of Quantitative Analysis in Sports 8(2).

Callaghan, T., P. J. Mucha, and M. A. Porter. 2007. “Random Walker Ranking for NCAA Division I-A Football.” American Mathematical Monthly 114:761–777.

CBB. 2012. “.” webpage accessed November 1, 2012.

CFB. 2012. “.” webpage accessed October 29, 2012.

Chan, V. 2011. “Prediction Accuracy of Linear Models for Paired Comparisons in Sports.” Journal of Quantitative Analysis in Sports 7(3), Article 18.

Chartier, T. P., E. Kreutzer, A. N. Langville, and K. E. Pedings. 2011a. “Sensitivity and Stability of Ranking Vectors.” SIAM Journal on Scientific Computing 33:1077–1102.

Chartier, T. P., E. Kreutzer, A. N. Langville, and K. E. Pedings. 2011b. “Sports Ranking with Nonuniform Weighting.” Journal of Quantitative Analysis in Sports 7(3), Article 6.

Colley, W. N. 2002. “Colley’s Bias-Free College Football Ranking Method: The Colley Matrix Explained.” Technical report, Princeton University.

David, H. A. 1963. The Method of Paired Comparisons. Charles Griffin & Co.

Demšar, J. 2006. “Statistical Comparisons of Classifiers Over Multiple Data Sets.” JMLR 7:1–30.

Dwork, C., R. Kumar, M. Naor, and D. Sivakumar. 2001a. “Rank Aggregation Methods for the Web.” pp. 613–622, in: Proceedings of the 10th International Conference on World Wide Web. ACM.

Dwork, C., R. Kumar, M. Naor, and D. Sivakumar. 2001b. “Rank Aggregation Revisited.” pp. 613–622, in: Proceedings International Conference World Wide Web (WWW10).

Elo, A. E. 1961. “The New U.S.C.F. Rating System.” Chess Life 16:160–161.

Foulds, L. R. 1992. Graph Theory Applications. Springer.

Gill, R. 2009. “Assessing Methods for College Football Rankings.” Journal of Quantitative Analysis in Sports 5(2), Article 3.

Glickman, M. E. 1995. “A Comprehensive Guide to Chess Ratings.” American Chess Journal 3:59–102.

Harville, D. 1977. “The Use of Linear-Model Methodology to Rate High School or College Football Teams.” Journal of the American Statistical Association 72:278–289.

Herbrich, R., T. Minka, and T. Graepel. 2007. “Trueskill: A Bayesian Skill Rating System.” Advances in Neural Information Processing Systems 19:569.

Hirani, A. N., K. Kalyanaraman, and S. Watts. 2011. “Least Squares Ranking on Graphs.” arXiv:1011.1716v4.

Hochbaum, D. S. 2010. “The Separation and Separation-Deviation Methodology for Group Decision Making and Aggregate Ranking.” TutORials in Operations Research 7:116–141.

Horn, R. A. and C. R. Johnson. 1991. Matrix Analysis. Cambridge University Press.

Jiang, X., L.-H. Lim, Y. Yao, and Y. Ye. 2010. “Statistical Ranking and Combinatorial Hodge Theory.” Mathematical Programming Ser. B 127:203–244.

Keener, J. P. 1993. “The Perron-Frobenius Theorem and the Ranking of Football Teams.” SIAM Review 35:80–93.

Langville, A. N. and C. D. Meyer. 2012. Who’s #1?: The Science of Rating and Ranking. Princeton University Press.

Leake, R. 1976. “A Method for Ranking Teams: With an Application to College Football.” Management Science in Sports 4:27–46.

Massey, K. 1997. Statistical Models Applied to the Rating of Sports Teams, Master’s thesis, Bluefield College.

Miwa, T. 2012. “ .” webpage accessed November 26, 2012.

MLB. 2012. “.” webpage accessed October 29, 2012.

NBA. 2012. “.” webpage accessed October 29, 2012.

Osting, B., C. Brune, and S. Osher. 2013a. “Enhanced statistical rankings via targeted data collection.” JMLR, W&CP 28(1):489–497.

Osting, B., J. Darbon, and S. Osher. 2013b. “Statistical Ranking Using the ℓ1-Norm on Graphs.” accepted to AIMS J. Inverse Problems and Imaging.

Page, L., S. Brin, R. Motwani, and T. Winograd. 1999. “The PageRank Citation Ranking: Bringing Order to the Web.” Technical report, Stanford InfoLab Technical Report 1999–66.

Pickle, D. and B. Howard. 1981. “Computer to Aid in Basketball Championship Selection.” NCAA News 4.

Shaffer, J. P. 1995. “Multiple Hypothesis Testing.” Annual Review of Psychology 46:561–584.

Stefani, R. T. 1977. “Football and Basketball Predictions Using Least Squares.” IEEE Transactions on Systems, Man, and Cybernetics 7:117–121.

Stefani, R. T. 1980. “Improved Least Squares Football, Basketball, and Soccer Predictions.” IEEE Transactions on Systems, Man, and Cybernetics 10:116–123.

Stefani, R. 2011. “The Methodology of Officially Recognized International Sports Rating Systems.” Journal of Quantitative Analysis in Sports 7(4), Article 10.

Tran, N. M. 2011. “Pairwise Ranking: Choice of Method Can Produce Arbitrarily Different Rank Order.” arXiv:1103.1110v1.

Trono, J. A. 2010. “Rating/Ranking Systems, Post-Season Bowl Games, and ‘The Spread’.” Journal of Quantitative Analysis in Sports 6(3), Article 6.

Xu, Q., Y. Yao, T. Jiang, Q. Huang, B. Yan, and W. Lin. 2011. “Random Partial Paired Comparison for Subjective Video Quality Assessment via HodgeRank.” pp. 393–402, in Proceedings of the 19th ACM International Conference on Multimedia.

Published Online: 2013-05-27
Published in Print: 2013-06-01

©2013 by Walter de Gruyter Berlin Boston