# An Oracle method to predict NFL games

• Eduardo Cabral Balreira, Brian K. Miceli and Thomas Tegtmeyer

## Abstract

We discuss multiple models for ranking teams in a league and introduce a new model called the Oracle method. This is a Markovian method that can be customized to incorporate multiple team traits into its ranking. Using a foresight prediction of NFL game outcomes for the 2002–2013 seasons, we show that the Oracle method correctly picked 64.1% of the games under consideration, which is higher than any of the methods compared, including ESPN Power Rankings, Massey, Colley, and PageRank.

Corresponding author: Eduardo Cabral Balreira, Mathematics, Trinity University, One Trinity Place, San Antonio, TX 78212, USA, Tel.: +2109998243, e-mail:

## Acknowledgments

All game data used in the compilation of the rankings in this article were retrieved from Sports Reference (2014). The authors thank the anonymous referees for their meticulous review of our initial manuscript and subsequent revisions. Their suggestions helped us make the data provided in this paper more accurate, add more models for comparison, and ultimately improve the presentation of this paper considerably.

## Appendix

One of the advantages of using the Oracle method is the customization it allows. We now show in more detail how these customizations are done and how the ranks are computed. We also include an expanded list of the methods we considered in the NFL predictions from Section 4.

### A.1 Oracle implementation

Let us take a second look at the 6-team round-robin tournament with outcomes summarized in Table 1. As stated before, one can consider different up and down vectors for the Oracle method. Since we only have information about the outcomes of the games, we use only $e$, $w$, and $w^+$ as the possible up and down vectors in the ranking, and after five rounds we have $w=[4\ 3\ 3\ 2\ 2\ 1]^T$ and $w^+=[5\ 4\ 4\ 3\ 3\ 2]^T$. The corresponding Oracle matrices and their column-stochastic matrices $P_m=(p_{ij})$ are as follows:

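$$
O_5(e,w^+)=\begin{pmatrix}
0&1&1&1&1&0&5\\
0&0&1&0&1&1&4\\
0&0&0&1&1&1&4\\
0&1&0&0&0&1&3\\
0&0&0&1&0&1&3\\
1&0&0&0&0&0&2\\
1&1&1&1&1&1&0
\end{pmatrix}
\quad\text{and}\quad
P_5(e,w^+)=\begin{pmatrix}
0&\frac{1}{3}&\frac{1}{3}&\frac{1}{4}&\frac{1}{4}&0&\frac{5}{21}\\
0&0&\frac{1}{3}&0&\frac{1}{4}&\frac{1}{5}&\frac{4}{21}\\
0&0&0&\frac{1}{4}&\frac{1}{4}&\frac{1}{5}&\frac{4}{21}\\
0&\frac{1}{3}&0&0&0&\frac{1}{5}&\frac{3}{21}\\
0&0&0&\frac{1}{4}&0&\frac{1}{5}&\frac{3}{21}\\
\frac{1}{2}&0&0&0&0&0&\frac{2}{21}\\
\frac{1}{2}&\frac{1}{3}&\frac{1}{3}&\frac{1}{4}&\frac{1}{4}&\frac{1}{5}&0
\end{pmatrix}.
$$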

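$$
O_5(e,w)=\begin{pmatrix}
0&1&1&1&1&0&4\\
0&0&1&0&1&1&3\\
0&0&0&1&1&1&3\\
0&1&0&0&0&1&2\\
0&0&0&1&0&1&2\\
1&0&0&0&0&0&1\\
1&1&1&1&1&1&0
\end{pmatrix}
\quad\text{and}\quad
P_5(e,w)=\begin{pmatrix}
0&\frac{1}{3}&\frac{1}{3}&\frac{1}{4}&\frac{1}{4}&0&\frac{4}{15}\\
0&0&\frac{1}{3}&0&\frac{1}{4}&\frac{1}{5}&\frac{3}{15}\\
0&0&0&\frac{1}{4}&\frac{1}{4}&\frac{1}{5}&\frac{3}{15}\\
0&\frac{1}{3}&0&0&0&\frac{1}{5}&\frac{2}{15}\\
0&0&0&\frac{1}{4}&0&\frac{1}{5}&\frac{2}{15}\\
\frac{1}{2}&0&0&0&0&0&\frac{1}{15}\\
\frac{1}{2}&\frac{1}{3}&\frac{1}{3}&\frac{1}{4}&\frac{1}{4}&\frac{1}{5}&0
\end{pmatrix}.
$$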

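$$
O_5(w,w)=\begin{pmatrix}
0&1&1&1&1&0&4\\
0&0&1&0&1&1&3\\
0&0&0&1&1&1&3\\
0&1&0&0&0&1&2\\
0&0&0&1&0&1&2\\
1&0&0&0&0&0&1\\
4&3&3&2&2&1&0
\end{pmatrix}
\quad\text{and}\quad
P_5(w,w)=\begin{pmatrix}
0&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&0&\frac{4}{15}\\
0&0&\frac{1}{5}&0&\frac{1}{5}&\frac{1}{5}&\frac{3}{15}\\
0&0&0&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{3}{15}\\
0&\frac{1}{5}&0&0&0&\frac{1}{5}&\frac{2}{15}\\
0&0&0&\frac{1}{5}&0&\frac{1}{5}&\frac{2}{15}\\
\frac{1}{5}&0&0&0&0&0&\frac{1}{15}\\
\frac{4}{5}&\frac{3}{5}&\frac{3}{5}&\frac{2}{5}&\frac{2}{5}&\frac{1}{5}&0
\end{pmatrix}.
$$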
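The construction of these matrices can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Matlab code: the function names and the `last_row` and `last_col` parameters are hypothetical, chosen to mirror the first and second arguments of $O_5(\cdot,\cdot)$ as they appear in the matrices above, and the win matrix is read off the upper-left 6×6 block of those matrices. Exact fractions are used so the entries can be compared directly with the printed ones.

```python
from fractions import Fraction


def oracle_matrix(wins, last_row, last_col):
    """Border the win matrix with an extra Oracle node.

    last_row holds the entries from each team to the Oracle node,
    last_col holds the entries from the Oracle node to each team.
    (Parameter names are illustrative, not taken from the paper.)
    """
    n = len(wins)
    rows = [list(wins[i]) + [last_col[i]] for i in range(n)]
    rows.append(list(last_row) + [0])
    return rows


def column_stochastic(matrix):
    """Divide every column by its sum so that each column adds up to 1."""
    size = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    return [[Fraction(matrix[i][j], col_sums[j]) for j in range(size)]
            for i in range(size)]


# WINS[i][j] = 1 when team i+1 beat team j+1 in the round-robin example.
WINS = [
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0],
]
e = [1] * 6
w = [sum(row) for row in WINS]   # number of wins per team
w_plus = [x + 1 for x in w]      # wins plus one

O = oracle_matrix(WINS, last_row=e, last_col=w_plus)
P = column_stochastic(O)
```

With these inputs, `w` is `[4, 3, 3, 2, 2, 1]`, `P[0][6]` is `Fraction(5, 21)`, and every column of `P` sums to exactly 1. Swapping `last_row` and `last_col` for other combinations of `e`, `w`, and `w_plus` gives the remaining variants.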

For the ranking methods PageRank and PageRank(w+), we simply used

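$$
H=\begin{pmatrix}
0&\frac{1}{2}&\frac{1}{2}&\frac{1}{3}&\frac{1}{3}&0\\
0&0&\frac{1}{2}&0&\frac{1}{3}&\frac{1}{4}\\
0&0&0&\frac{1}{3}&\frac{1}{3}&\frac{1}{4}\\
0&\frac{1}{2}&0&0&0&\frac{1}{4}\\
0&0&0&\frac{1}{3}&0&\frac{1}{4}\\
1&0&0&0&0&0
\end{pmatrix}
$$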

and α=0.85. Hence we obtained the two column-stochastic matrices

$$
G_\alpha=\alpha H+\tfrac{1}{6}(1-\alpha)\,ee^T
\quad\text{and}\quad
G_\alpha(w^+)=\alpha H+\tfrac{1}{6}(1-\alpha)\,w^+e^T.
$$

Finally, the rankings in Table 2 were obtained by finding the Perron vector of each of the column-stochastic matrices above.
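As a rough illustration of this last step, the sketch below builds a PageRank-style matrix from $H$ and approximates its Perron vector by power iteration. It is not the authors' Matlab routine: the function names are hypothetical, power iteration is one of several ways to obtain the Perron vector, and the personalization vector is rescaled to sum to one so that every column sums to one. That rescaling is an assumption of this sketch and may differ from the exact scaling used in the paper for the $w^+$ variant.

```python
def google_matrix(H, v, alpha=0.85):
    """Return alpha*H + (1 - alpha) * v e^T with v rescaled to sum to 1.

    The rescaling of v is an assumption made here to keep the result
    column-stochastic; with v = e it reproduces the 1/6 factor.
    """
    n = len(H)
    total = sum(v)
    return [[alpha * H[i][j] + (1 - alpha) * v[i] / total for j in range(n)]
            for i in range(n)]


def perron_vector(matrix, tol=1e-12, max_iter=10_000):
    """Power iteration: repeat r <- M r until r stops changing.

    The result is normalized so that its entries sum to 1.
    """
    n = len(matrix)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(matrix[i][j] * r[j] for j in range(n)) for i in range(n)]
        scale = sum(nxt)
        nxt = [x / scale for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r


# H from the round-robin example: each column is a team's losses, normalized.
H = [
    [0, 1/2, 1/2, 1/3, 1/3, 0],
    [0, 0, 1/2, 0, 1/3, 1/4],
    [0, 0, 0, 1/3, 1/3, 1/4],
    [0, 1/2, 0, 0, 0, 1/4],
    [0, 0, 0, 1/3, 0, 1/4],
    [1, 0, 0, 0, 0, 0],
]
G = google_matrix(H, [1] * 6)
G_wplus = google_matrix(H, [5, 4, 4, 3, 3, 2])

rating = perron_vector(G)
rating_wplus = perron_vector(G_wplus)
```

Sorting the team indices by `rating` in decreasing order gives a ranking. The same `perron_vector` function can be applied to any of the column-stochastic Oracle matrices, after which the entry for the Oracle node is dropped.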

### A.2 Expanded results for NFL predictions

In the predictions, the winner is clearly determined by the score in each game, but the actual score may also be used as another statistic for each match. In addition to the usual incidence matrix $A_m$ that captures the number of wins of each team, we also consider the weighted score matrix $A^s_m=(a^s_{ij})$, where if $T_i$ beats $T_j$, then $a^s_{ij}$ is the score difference and $a^s_{ji}=0$. In the case of NFL games, the margin of victory can also be measured in the number of possessions (mostly touchdowns) by which the winning team is ahead. Hence, we consider the weighted margin matrix $A^m_m=(a^m_{ij})$, where $a^m_{ij}=a^s_{ij}/7$. Ranking methods using the score and margin matrices are denoted using a prefix of s- and m-, respectively. Finally, we consider $w^+$ and $s^+$ as possible customizations for the PageRank and Oracle variants.
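The three matrices can be assembled from a list of game results as in the following sketch. The function name, the tuple layout of `games`, and the sample scores are hypothetical. Summing the entries when two teams meet more than once, and skipping tied games, are assumptions of this sketch rather than details stated in the text.

```python
def score_matrices(games, n_teams):
    """Build the win, score, and margin matrices from game results.

    games: iterable of (winner, loser, winner_points, loser_points)
    tuples with 0-based team indices. Repeated meetings are summed,
    which is an assumption of this sketch.
    """
    wins = [[0] * n_teams for _ in range(n_teams)]
    score = [[0] * n_teams for _ in range(n_teams)]
    for winner, loser, winner_pts, loser_pts in games:
        if winner_pts == loser_pts:
            continue  # ties carry no winner; ignored here by assumption
        wins[winner][loser] += 1
        score[winner][loser] += winner_pts - loser_pts
    # Margin in possessions: score difference divided by 7.
    margin = [[x / 7 for x in row] for row in score]
    return wins, score, margin


# Hypothetical results: team 0 beats team 1 by 14, team 2 beats team 0 by 1.
sample_games = [(0, 1, 24, 10), (2, 0, 21, 20)]
A, A_s, A_m = score_matrices(sample_games, n_teams=3)
```

Here `A_s[0][1]` is 14 and `A_m[0][1]` is 2.0, while the losing side's entry `A_s[1][0]` stays 0, matching the definition of $a^s_{ij}$ and $a^s_{ji}$ above.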

In Table 4, we give an expanded list of models, again with the percentage of games each method picked correctly starting in week 4 and week 11. In this table, numbers given in bold are those that outperform the WH method. We note that Oracle variants which do not incorporate the score into either the up or the down vector do slightly worse than those that do use this statistic, which suggests that the score difference between teams in NFL games does give some insight into the quality of the teams in that league.

In Table A1 and Table A2, for each of the eight prediction methods we give the overall prediction percentage of all NFL games in the seasons ranging from 1966 to 2013 and 2002 to 2013, respectively, by considering all possible starting weeks between week 4 and week 11. More importantly, when comparing the predictive power of the Or($w^+$, $s^+$) model with that of the other Markov methods, the results show that the Oracle model provides a viable alternative Markovian method for ranking sports teams.

Table A1

Average of correct foresight prediction percentage of all NFL games starting in week 4 up to week 11.

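NFL seasons between 1966 and 2013

| Starting week | Week 4 | Week 5 | Week 6 | Week 7 |
|---|---|---|---|---|
| WH | 62.89% | 62.72% | 62.89% | 63.00% |
| Massey | 62.77% | 63.19% | 63.37% | 63.71% |
| Colley | 61.79% | 61.83% | 62.07% | 62.07% |
| PageRank | 59.33% | 59.46% | 59.51% | 59.88% |
| Keener | 60.08% | 60.49% | 60.87% | 61.36% |
| Biased voter | 60.80% | 61.13% | 61.56% | 61.91% |
| Oracle(w+, s+) | 63.41% | 63.26% | 63.63% | 63.57% |

| Starting week | Week 8 | Week 9 | Week 10 | Week 11 |
|---|---|---|---|---|
| WH | 63.00% | 63.42% | 63.61% | 63.94% |
| Massey | 63.96% | 64.04% | 64.52% | 64.99% |
| Colley | 62.13% | 62.31% | 62.69% | 63.15% |
| PageRank | 59.98% | 60.36% | 60.70% | 61.44% |
| Keener | 61.82% | 61.94% | 62.28% | 62.73% |
| Biased voter | 61.88% | 62.01% | 62.36% | 62.86% |
| Oracle(w+, s+) | 63.78% | 64.22% | 64.70% | 65.09% |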

Bold indicates methods whose predictions, on average, are better than or equal to the WH method.

Table A2

Average of correct foresight prediction percentage of all NFL games starting in week 4 up to week 11.

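NFL seasons between 2002 and 2013

| Starting week | Week 4 | Week 5 | Week 6 | Week 7 |
|---|---|---|---|---|
| WH | 62.25% | 62.41% | 62.67% | 63.06% |
| ESPN PR | 63.02% | 63.15% | 63.37% | 63.66% |
| Massey | 63.80% | 64.31% | 64.74% | 65.20% |
| Colley | 61.99% | 62.31% | 62.46% | 62.73% |
| PageRank | 58.67% | 58.87% | 58.58% | 58.76% |
| Keener | 61.04% | 61.58% | 61.52% | 62.19% |
| Biased voter | 61.17% | 61.71% | 61.96% | 62.29% |
| Oracle(w+, s+) | 64.10% | 64.22% | 64.34% | 64.65% |

| Starting week | Week 8 | Week 9 | Week 10 | Week 11 |
|---|---|---|---|---|
| WH | 63.61% | 63.98% | 64.26% | 65.76% |
| ESPN PR | 64.09% | 63.97% | 64.02% | 65.48% |
| Massey | 65.96% | 65.93% | 66.07% | 67.41% |
| Colley | 63.60% | 63.77% | 63.96% | 65.32% |
| PageRank | 59.06% | 59.68% | 59.64% | 60.61% |
| Keener | 63.01% | 63.45% | 63.50% | 65.07% |
| Biased voter | 62.94% | 63.04% | 63.20% | 64.19% |
| Oracle(w+, s+) | 65.42% | 65.26% | 65.47% | 66.63% |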

Bold indicates methods whose predictions, on average, are better than or equal to the WH method.

In Table A3, for each of the eight prediction methods we give the overall prediction percentage of all NFL games in the seasons ranging from 1966 to 2013 and 2002 to 2013, respectively, using the 10-week fixed prediction model described in Section 4. Comparison with Table 3 shows that the results are similar to the usual foresight prediction.

Table A3

Average of correct 10-week fixed foresight prediction percentage of all NFL games.

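| Seasons | 1966–2013 | 2002–2013 |
|---|---|---|
| Starting week | Week 11 | Week 11 |
| WH | 63.77% | 65.31% |
| ESPN PR | n/a | 64.86% |
| Massey | 64.30% | 65.93% |
| Colley | 63.08% | 63.57% |
| PageRank | 61.13% | 61.48% |
| Keener | 61.69% | 64.30% |
| Biased voter | 62.51% | 62.51% |
| Oracle(w+, s+) | 64.72% | 65.57% |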

Bold indicates methods whose predictions, on average, are better than or equal to the WH method.

Table A4 is the same as Table 3, except that predictions are made for games played from week 4 or week 11 through the final week of each season (rather than the penultimate week). Comparing the results in these two tables, one sees that the predictive power of the Or($w^+$, $s^+$) model improves relative to the WH method, which is not surprising since the WH method is expected to do less well at predicting games in the final week of the season.

Table A4

Average of correct foresight prediction percentage of all NFL games starting either in week 4 or in week 11 up to, and including, the final week of the season.

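| Seasons | 1966–2013 | 1966–2013 | 2002–2013 | 2002–2013 |
|---|---|---|---|---|
| Starting week | Week 4 | Week 11 | Week 4 | Week 11 |
| WH | 62.79% | 63.61% | 61.98% | 64.75% |
| ESPN PR | n/a | n/a | 62.57% | 64.29% |
| Massey | 62.95% | 65.13% | 63.53% | 66.39% |
| Colley | 61.77% | 63.00% | 61.61% | 64.15% |
| PageRank | 59.35% | 61.08% | 58.87% | 60.72% |
| Keener | 60.10% | 62.34% | 61.06% | 64.53% |
| Biased voter | 60.73% | 62.45% | 60.94% | 63.33% |
| Oracle(w+, s+) | 63.45% | 65.05% | 63.93% | 65.95% |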

Bold indicates methods whose predictions, on average, are better than or equal to the WH method.

1. The method of Massey used for the BCS Rankings in college football is proprietary, and thus not publicly available. The method we discuss is the original idea of Massey (1997), which he developed for an honors thesis as an undergraduate at Bluefield College.

2. For this data, we compute the rating vector based on the algorithm and Matlab routine given in Hunter (2004).

3. In Langville and Meyer (2003), it is stated that Google originally used α=0.85.

4. In Callaghan et al. (2007), the authors named their method Random Walker Ranking. As that description may also fit other Markov methods, we refer to it as Biased Random Walker.

5. All the variations of PageRank we have tested always promote $T_6$ to the 2nd highest ranking.

6. All computations were performed using Matlab R2013a.

7. One could certainly argue that this addition of the Oracle node opens the possibility for a distortion of the rankings in some way, especially by artificially forcing connectedness early in the season. However, the data supports that, at least for standard choices of the statistics (score differential, wins, etc.) incorporated into the up and down vectors, this does not happen.

8. Teams with identical ranks for all methods other than WH meet, on average, less than once per year in the weeks considered. In the WH method, teams with the same record meet, on average, 24 times per season in the weeks considered.

## References

Agresti, A. 2002. Categorical Data Analysis, Wiley Series in Probability and Statistics. Hoboken, New Jersey: Wiley-Interscience, 2nd edition. doi:10.1002/0471249688

Agresti, A. and D. B. Hitchcock. 2005. “Bayesian Inference for Categorical Data Analysis.” Statistical Methods and Applications 14:297–330. doi:10.1007/s10260-005-0121-y

Bradley, R. A. and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39:324–345. doi:10.1093/biomet/39.3-4.324

Brin, S. and L. Page. 1998. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” Computer Networks ISDN Systems 30:107–117. doi:10.1016/S0169-7552(98)00110-X

Callaghan, T., P. J. Mucha, and M. A. Porter. 2007. “Random Walker Ranking for NCAA Division I-A Football.” American Mathematical Monthly 114:761–777. doi:10.1080/00029890.2007.11920469

Colley, W. 2002. “Colley’s Bias Free College Football Ranking Method: The Colley Matrix Explained.” Retrieved January 10, 2014 from http://www.colleyrankings.com/matrate.pdf.

Constantine, P. G. and D. F. Gleich. 2010. “Random Alpha PageRank.” Internet Mathematics 6:189–236. doi:10.1080/15427951.2009.10129185

David, H. A. 1963. The Method of Paired Comparisons. New York: Hafner Publishing Company.

Easterbrook, G. 2008. “Time to Look Back on Some Horrible Predictions.” Retrieved January 10, 2014 from sports.espn.go.com/espn/page2/story?page=easterbrook/090210.

ESPN 2014. “NFL Power Rankings.” Retrieved January 10, 2014 from http://espn.go.com/nfl/powerrankings.

Ford, J. L. R. 1957. “Solution of a Ranking Problem from Binary Comparisons.” American Mathematical Monthly 64:28–33. doi:10.1080/00029890.1957.11989117

Gleich, D. F. 2011. “Review of: Numerical algorithms for personalized search in self-organizing information networks by Sep Kamvar, Princeton University Press, 2010.” Linear Algebra and its Applications 435:908–909. doi:10.1016/j.laa.2011.01.013

Horn, R. A. and C. R. Johnson. 1990. Matrix Analysis. New York, NY: Cambridge University Press.

Hunter, D. R. 2004. “MM Algorithms for Generalized Bradley-Terry Models.” The Annals of Statistics 32:384–406. doi:10.1214/aos/1079120141

Keener, J. P. 1993. “The Perron-Frobenius Theorem and the Ranking of Football Teams.” SIAM Review 35:80–93. doi:10.1137/1035004

Langville, A. N. and C. D. Meyer. 2003. “Deeper Inside PageRank.” Internet Mathematics 1:257–380.

Langville, A. N. and C. D. Meyer. 2006. Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton, NJ, USA: Princeton University Press. doi:10.1515/9781400830329

Langville, A. N. and C. D. Meyer. 2012. Who’s #1?: The Science of Rating and Ranking. Princeton, NJ, USA: Princeton University Press. doi:10.1515/9781400841677

Massey, K. 1997. “Statistical Models Applied to the Rating of Sports Teams.” Bachelor’s honors thesis, Bluefield College.

Page, L., S. Brin, R. Motwani, and T. Winograd. 1999. “The Pagerank Citation Ranking: Bringing Order to the Web.” Technical Report 1999–66, Stanford InfoLab.

Sports Reference, LLC. 2014. “Pro-Football-Reference.” Retrieved January 10, 2014 from http://www.pro-football-reference.com/.

Thurstone, L. L. 1927. “A Law of Comparative Judgment.” Psychological Review 34:273–286. doi:10.1037/h0070288

Zermelo, E. 1929. “Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung.” Mathematische Zeitschrift 29:436–460. doi:10.1007/BF01180541

Published Online: 2014-3-27
Published in Print: 2014-6-1

©2014 by Walter de Gruyter Berlin/Boston