Abstract
Before any play of a National Football League (NFL) game, the probability that a given team will win depends on many situational variables (such as time remaining, yards to go for a first down, field position and current score) as well as the relative quality of the two teams as quantified by the Las Vegas point spread. We use a random forest method to combine pre-play variables to estimate Win Probability (WP) before any play of an NFL game. When a subset of NFL play-by-play data for the 12 seasons from 2001 to 2012 is used as a training dataset, our method provides WP estimates that resemble true win probability and accurately predict game outcomes, especially in the later stages of games. In addition to being intrinsically interesting in real time to observers of an NFL football game, our WP estimates can provide useful evaluations of plays and, in some cases, coaching decisions.
©2014 by Walter de Gruyter Berlin/Boston