Abstract
We discuss multiple models for ranking teams in a league and introduce a new model called the Oracle method. This is a Markovian method that can be customized to incorporate multiple team traits into its ranking. Using a foresight prediction of NFL game outcomes for the 2002–2013 seasons, we show that the Oracle method correctly picked 64.1% of the games under consideration, a higher percentage than any of the methods compared, including the ESPN Power Rankings, Massey, Colley, and PageRank.
Acknowledgments
All game data used in the compilation of the rankings in this article were retrieved from Sports Reference (2014). The authors thank the anonymous referees for their thorough and meticulous review of our initial manuscript and subsequent revisions. Their suggestions helped us improve the accuracy of the data provided in this paper, add more models for comparison, and, ultimately, greatly improve the presentation of the paper.
Appendix
One of the advantages of the Oracle method is the customization it allows. We now show in more detail how these customizations are carried out and how the ranks are computed. We also include an expanded list of the methods we considered in the NFL predictions from Section 4.
A.1 Oracle implementation
Let us take a second look at the 6-team round-robin tournament with outcomes summarized in Table 1. As stated before, one can consider different up and down vectors for the Oracle method. Since we have only information about the outcomes of the games, we use only e, w, and w+ as the possible up and down vectors in the ranking; after five rounds we have w = [4 3 3 2 2 1]^T and w+ = [5 4 4 3 3 2]^T. The corresponding Oracle matrices and their associated column-stochastic matrices P_m = (p_ij) follow directly from these choices.
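As a minimal sketch of the bookkeeping (assuming, for illustration only, that A(i,j) counts the wins of team i over team j, that the down vector fills the oracle row, and that the up vector fills the oracle column), such a matrix and its column-stochastic normalization can be assembled in Matlab as follows:

```matlab
function P = oracle_matrix(A, u, d)
    % Minimal sketch of an Oracle-type matrix and its column-stochastic form.
    % Assumptions for illustration only: A(i,j) counts the wins of team i over
    % team j, the down vector d fills the oracle row (the weight each team
    % sends to the oracle node), and the up vector u fills the oracle column
    % (the weight the oracle node sends to each team).
    n = size(A, 1);
    O = zeros(n + 1);
    O(1:n, 1:n) = A;                  % team-vs-team results block
    O(n + 1, 1:n) = d(:)';            % edges from each team to the oracle node
    O(1:n, n + 1) = u(:);             % edges from the oracle node to each team
    P = O * diag(1 ./ sum(O, 1));     % normalize columns (assumes positive column sums)
end
```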
For the ranking methods PageRank and PageRank(w+), we simply used the standard PageRank (Google matrix) construction with α = 0.85, again obtaining two column-stochastic matrices.
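A minimal sketch of such a Google-type matrix, assuming a column-stochastic results matrix S and a teleportation vector v (uniform for PageRank, proportional to w+ for PageRank(w+)), is given below; under these assumptions the two variants differ only in the vector v.

```matlab
function G = google_matrix(S, v, alpha)
    % Minimal sketch of a Google-type matrix G = alpha*S + (1 - alpha)*v*e'.
    % Assumptions: S is an n-by-n column-stochastic matrix built from the game
    % results, and v is the teleportation vector (uniform for PageRank, or
    % proportional to w+ for PageRank(w+)).
    n = size(S, 1);
    e = ones(n, 1);
    v = v(:) / sum(v);                % normalize the teleportation vector
    G = alpha * S + (1 - alpha) * (v * e');
end
```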
Finally, the rankings in Table 2 were obtained by finding the Perron vector of each of the column-stochastic matrices above.
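The Perron vector of a column-stochastic matrix can be computed, for instance, with the power method; in the sketch below the tolerance and iteration cap are arbitrary illustration values rather than the settings used for the tables.

```matlab
function r = perron_vector(P)
    % Power method for the Perron vector of a column-stochastic matrix P.
    % The tolerance and iteration cap are arbitrary illustration values.
    n = size(P, 1);
    r = ones(n, 1) / n;               % start from the uniform distribution
    for k = 1:1000
        rnew = P * r;
        rnew = rnew / sum(rnew);      % guard against rounding drift
        if norm(rnew - r, 1) < 1e-12
            break
        end
        r = rnew;
    end
    r = rnew;
end
```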
A.2 Expanded results for NFL predictions
In the predictions, the winner is clearly determined by the score in each game, but the actual score may also be used as an additional statistic for each match. In addition to the usual incidence matrix A_m that captures the number of wins of each team, we also consider a weighted score matrix built from the points scored in each game.
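As a minimal sketch of this bookkeeping (assuming, for illustration only, that each game is recorded as a row [winner, loser, winner points, loser points] and that both matrices are oriented with winners indexed by row), the two matrices can be built as follows:

```matlab
function [A, S] = season_matrices(games, n)
    % Minimal sketch: build the win matrix A and a weighted score matrix S for
    % n teams from a k-by-4 list of games [winner, loser, winner pts, loser pts].
    % The orientation (winners indexed by row) is an assumption for illustration.
    A = zeros(n);
    S = zeros(n);
    for g = 1:size(games, 1)
        w = games(g, 1);
        l = games(g, 2);
        A(w, l) = A(w, l) + 1;            % one more win of team w over team l
        S(w, l) = S(w, l) + games(g, 3);  % points the winner scored in this game
        S(l, w) = S(l, w) + games(g, 4);  % points the loser scored in this game
    end
end
```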
In Table 4, we give an expanded list of models, again with the percentage of games each method picked correctly when starting in week 4 and in week 11. In this table, numbers given in bold are those that outperform the WH method. We note that Oracle variants that do not incorporate the score into either the up or the down vector do slightly worse than those that use this statistic, which suggests that the score difference between teams in NFL games does give some insight into the quality of the teams in that league.
In Tables A1 and A2, we give, for each prediction method, the overall prediction percentage of all NFL games in the seasons ranging from 1966 to 2013 and from 2002 to 2013, respectively, considering all possible starting weeks between week 4 and week 11. More importantly, comparing the predictive power of the Or(w+, s+) model with that of the other Markov methods, the results show that the Oracle model is a viable Markovian alternative for ranking sports teams.
Table A1: Average correct foresight prediction percentage of all NFL games, by starting week (week 4 through week 11), NFL seasons 1966–2013.

| Starting week | Week 4 | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | Week 10 | Week 11 |
|---|---|---|---|---|---|---|---|---|
| WH | 62.89% | 62.72% | 62.89% | 63.00% | 63.00% | 63.42% | 63.61% | 63.94% |
| Massey | 62.77% | 63.19% | 63.37% | 63.71% | 63.96% | 64.04% | 64.52% | 64.99% |
| Colley | 61.79% | 61.83% | 62.07% | 62.07% | 62.13% | 62.31% | 62.69% | 63.15% |
| PageRank | 59.33% | 59.46% | 59.51% | 59.88% | 59.98% | 60.36% | 60.70% | 61.44% |
| Keener | 60.08% | 60.49% | 60.87% | 61.36% | 61.82% | 61.94% | 62.28% | 62.73% |
| Biased voter | 60.80% | 61.13% | 61.56% | 61.91% | 61.88% | 62.01% | 62.36% | 62.86% |
| Bradley-Terry model | 62.49% | 62.82% | 63.40% | 63.49% | 63.50% | 63.78% | 64.01% | 64.23% |
| Oracle(w+, s+) | 63.41% | 63.26% | 63.63% | 63.57% | 63.78% | 64.22% | 64.70% | 65.09% |
Bold indicates methods whose predictions, on average, are better than or equal to the WH method.
Table A2: Average correct foresight prediction percentage of all NFL games, by starting week (week 4 through week 11), NFL seasons 2002–2013.

| Starting week | Week 4 | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | Week 10 | Week 11 |
|---|---|---|---|---|---|---|---|---|
| WH | 62.25% | 62.41% | 62.67% | 63.06% | 63.61% | 63.98% | 64.26% | 65.76% |
| ESPN PR | 63.02% | 63.15% | 63.37% | 63.66% | 64.09% | 63.97% | 64.02% | 65.48% |
| Massey | 63.80% | 64.31% | 64.74% | 65.20% | 65.96% | 65.93% | 66.07% | 67.41% |
| Colley | 61.99% | 62.31% | 62.46% | 62.73% | 63.60% | 63.77% | 63.96% | 65.32% |
| PageRank | 58.67% | 58.87% | 58.58% | 58.76% | 59.06% | 59.68% | 59.64% | 60.61% |
| Keener | 61.04% | 61.58% | 61.52% | 62.19% | 63.01% | 63.45% | 63.50% | 65.07% |
| Biased voter | 61.17% | 61.71% | 61.96% | 62.29% | 62.94% | 63.04% | 63.20% | 64.19% |
| Bradley-Terry model | 63.58% | 64.04% | 64.54% | 64.99% | 65.72% | 65.53% | 65.55% | 66.98% |
| Oracle(w+, s+) | 64.10% | 64.22% | 64.34% | 64.65% | 65.42% | 65.26% | 65.47% | 66.63% |
Bold indicates methods whose predictions, on average, are better than or equal to the WH method.
In Table A3, we give, for each prediction method, the overall prediction percentage of all NFL games in the seasons ranging from 1966 to 2013 and from 2002 to 2013, respectively, using the 10-week fixed prediction model described in Section 4. Comparison with Table 3 shows that the results are similar to those of the usual foresight prediction.
Table A3: Average correct 10-week fixed foresight prediction percentage of all NFL games (starting week 11).

| Starting week | 1966–2013 (Week 11) | 2002–2013 (Week 11) |
|---|---|---|
| WH | 63.77% | 65.31% |
| ESPN | n/a | 64.86% |
| Massey | 64.30% | 65.93% |
| Colley | 63.08% | 63.57% |
| PageRank | 61.13% | 61.48% |
| Keener | 61.69% | 64.30% |
| Biased voter | 62.51% | 62.51% |
| Bradley-Terry model | 63.97% | 65.14% |
| Oracle(w+, s+) | 64.72% | 65.57% |
Bold indicates methods whose predictions, on average, are better than or equal to the WH method.
Table A4 is the same as Table 3, except that predictions are made for games played from week 4 or week 11 through the final week of each season (rather than the penultimate week). Comparing the results in these two tables, one sees that the predictive power of the Or(w+, s+) model improves relative to the WH method, which is not surprising since the WH method is not expected to do well at predicting games in the final week of the season.
Table A4: Average correct foresight prediction percentage of all NFL games, starting either in week 4 or in week 11 and continuing up to, and including, the final week of the season.

| Starting week | 1966–2013: Week 4 | 1966–2013: Week 11 | 2002–2013: Week 4 | 2002–2013: Week 11 |
|---|---|---|---|---|
| WH | 62.79% | 63.61% | 61.98% | 64.75% |
| ESPN PR | n/a | n/a | 62.57% | 64.29% |
| Massey | 62.95% | 65.13% | 63.53% | 66.39% |
| Colley | 61.77% | 63.00% | 61.61% | 64.15% |
| PageRank | 59.35% | 61.08% | 58.87% | 60.72% |
| Keener | 60.10% | 62.34% | 61.06% | 64.53% |
| Biased voter | 60.73% | 62.45% | 60.94% | 63.33% |
| Bradley-Terry model | 62.63% | 64.32% | 63.41% | 66.17% |
| Oracle(w+, s+) | 63.45% | 65.05% | 63.93% | 65.95% |
Bold indicates methods whose predictions, on average, are better than or equal to the WH method.
1. The method of Massey used for the BCS Rankings in college football is proprietary, and thus not publicly available. The method we discuss is the original idea of Massey (1997), which he developed for an honors thesis as an undergraduate at Bluefield College.
2. For this data, we compute the rating vector based on the algorithm and Matlab routine given in Hunter (2004).
3. In Langville and Meyer (2003), it is noted that Google originally used α=0.85.
4. In Callaghan et al. (2007), the authors named their method Random Walker Ranking. As that description may also fit other Markov methods, we refer to it as the biased voter method.
5. All the variations of PageRank we tested promote T6 to the second-highest ranking.
6. All computations were performed using Matlab R2013a.
7. One could certainly argue that the addition of the Oracle node opens the possibility of distorting the rankings in some way, especially by artificially forcing connectedness early in the season. However, the data support that, at least for standard choices of the statistics incorporated into the up and down vectors (score differential, wins, etc.), this does not happen.
8. Teams with identical ranks for all methods other than WH meet, on average, less than once per year in the weeks considered. In the WH method, teams with the same record meet, on average, 24 times per season in the weeks considered.
References
Agresti, A. 2002. Categorical Data Analysis. Wiley Series in Probability and Statistics. Hoboken, New Jersey: Wiley-Interscience, 2nd edition. doi:10.1002/0471249688.

Agresti, A. and D. B. Hitchcock. 2005. "Bayesian Inference for Categorical Data Analysis." Statistical Methods and Applications 14:297–330. doi:10.1007/s10260-005-0121-y.

Bradley, R. A. and M. E. Terry. 1952. "Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons." Biometrika 39:324–345. doi:10.1093/biomet/39.3-4.324.

Brin, S. and L. Page. 1998. "The Anatomy of a Large-Scale Hypertextual Web Search Engine." Computer Networks and ISDN Systems 30:107–117. doi:10.1016/S0169-7552(98)00110-X.

Callaghan, T., P. J. Mucha, and M. A. Porter. 2007. "Random Walker Ranking for NCAA Division I-A Football." American Mathematical Monthly 114:761–777. doi:10.1080/00029890.2007.11920469.

Colley, W. 2002. "Colley's Bias Free College Football Ranking Method: The Colley Matrix Explained." Retrieved January 10, 2014 from http://www.colleyrankings.com/matrate.pdf.

Constantine, P. G. and D. F. Gleich. 2010. "Random Alpha PageRank." Internet Mathematics 6:189–236. doi:10.1080/15427951.2009.10129185.

David, H. A. 1963. The Method of Paired Comparisons. New York: Hafner Publishing Company.

Easterbrook, G. 2008. "Time to Look Back on Some Horrible Predictions." Retrieved January 10, 2014 from sports.espn.go.com/espn/page2/story?page=easterbrook/090210.

ESPN. 2014. "NFL Power Rankings." Retrieved January 10, 2014 from http://espn.go.com/nfl/powerrankings.

Ford, L. R., Jr. 1957. "Solution of a Ranking Problem from Binary Comparisons." American Mathematical Monthly 64:28–33. doi:10.1080/00029890.1957.11989117.

Gleich, D. F. 2011. "Review of: Numerical Algorithms for Personalized Search in Self-Organizing Information Networks by Sep Kamvar, Princeton University Press, 2010." Linear Algebra and its Applications 435:908–909. doi:10.1016/j.laa.2011.01.013.

Horn, R. A. and C. R. Johnson. 1990. Matrix Analysis. New York, NY: Cambridge University Press.

Hunter, D. R. 2004. "MM Algorithms for Generalized Bradley-Terry Models." The Annals of Statistics 32:384–406. doi:10.1214/aos/1079120141.

Keener, J. P. 1993. "The Perron-Frobenius Theorem and the Ranking of Football Teams." SIAM Review 35:80–93. doi:10.1137/1035004.

Langville, A. N. and C. D. Meyer. 2003. "Deeper Inside PageRank." Internet Mathematics 1:257–380.

Langville, A. N. and C. D. Meyer. 2006. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton, NJ: Princeton University Press. doi:10.1515/9781400830329.

Langville, A. N. and C. D. Meyer. 2012. Who's #1?: The Science of Rating and Ranking. Princeton, NJ: Princeton University Press. doi:10.1515/9781400841677.

Massey, K. 1997. "Statistical Models Applied to the Rating of Sports Teams." Bachelor's honors thesis, Bluefield College.

Page, L., S. Brin, R. Motwani, and T. Winograd. 1999. "The PageRank Citation Ranking: Bringing Order to the Web." Technical Report 1999-66, Stanford InfoLab.

Sports Reference, LLC. 2014. "Pro-Football-Reference." Retrieved January 10, 2014 from http://www.pro-football-reference.com/.

Thurstone, L. L. 1927. "A Law of Comparative Judgment." Psychological Review 34:273–286. doi:10.1037/h0070288.

Zermelo, E. 1929. "Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung." Mathematische Zeitschrift 29:436–460. doi:10.1007/BF01180541.