Published by De Gruyter, March 27, 2014

Using random forests to estimate win probability before each play of an NFL game

  • Dennis Lock and Dan Nettleton


Before any play of a National Football League (NFL) game, the probability that a given team will win depends on many situational variables (such as time remaining, yards to go for a first down, field position and current score) as well as the relative quality of the two teams as quantified by the Las Vegas point spread. We use a random forest method to combine pre-play variables to estimate Win Probability (WP) before any play of an NFL game. When a subset of NFL play-by-play data for the 12 seasons from 2001 to 2012 is used as a training dataset, our method provides WP estimates that resemble true win probability and accurately predict game outcomes, especially in the later stages of games. In addition to being intrinsically interesting in real time to observers of an NFL football game, our WP estimates can provide useful evaluations of plays and, in some cases, coaching decisions.
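The idea of combining pre-play variables with a random forest can be illustrated with a small sketch. The abstract does not specify the forest type, tuning parameters, or full variable list, so everything below is an assumption made for illustration: a regression-style forest fit to a 0/1 win indicator, three hypothetical predictors (`score_diff`, `seconds_left`, `spread`), and synthetic plays drawn from an invented toy model rather than real NFL play-by-play data. It is written in plain Python and is not the authors' implementation.

```python
import math
import random


def make_toy_plays(n, rng):
    """Generate synthetic (NOT real NFL) plays from an assumed toy model."""
    X, y = [], []
    for _ in range(n):
        score_diff = rng.randint(-21, 21)    # team score minus opponent score
        seconds_left = rng.randint(0, 3600)  # game seconds remaining
        spread = rng.uniform(-10.0, 10.0)    # points by which the team is favored
        frac = seconds_left / 3600.0
        # Assumed truth: a lead counts for more as the clock runs down.
        z = (score_diff + spread * frac) / (2.0 + 8.0 * frac)
        p_win = 1.0 / (1.0 + math.exp(-z))
        X.append((score_diff, seconds_left, spread))
        y.append(1 if rng.random() < p_win else 0)
    return X, y


def best_split(X, y, features, min_leaf):
    """Return (score, feature, threshold) minimizing squared error, or None."""
    n, total, best = len(y), sum(y), None
    for j in features:
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += y[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi or k < min_leaf or n - k < min_leaf:
                continue
            # Maximizing this quantity is equivalent to minimizing the SSE.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, j, lo)
    return best


def build_tree(X, y, rng, n_try, min_leaf, max_depth):
    """Grow one regression tree; leaves are floats, inner nodes are tuples."""
    if max_depth == 0 or len(y) < 2 * min_leaf or min(y) == max(y):
        return sum(y) / len(y)
    features = rng.sample(range(len(X[0])), n_try)  # random feature subset
    split = best_split(X, y, features, min_leaf)
    if split is None:
        return sum(y) / len(y)
    _, j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       rng, n_try, min_leaf, max_depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       rng, n_try, min_leaf, max_depth - 1))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean outcome."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, n_trees=25, n_try=2, min_leaf=25, max_depth=8, seed=0):
    """Fit each tree to a bootstrap resample of the training plays."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng, n_try, min_leaf, max_depth))
    return forest


def win_probability(forest, x):
    """Average the tree predictions to obtain a WP estimate in [0, 1]."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


X_train, y_train = make_toy_plays(1500, random.Random(1))
forest = fit_forest(X_train, y_train)
wp_leading = win_probability(forest, (14, 120, 0.0))    # up 14, 2:00 left
wp_trailing = win_probability(forest, (-14, 120, 0.0))  # down 14, 2:00 left
print(f"WP leading: {wp_leading:.2f}, WP trailing: {wp_trailing:.2f}")
```

Because every leaf stores the mean of 0/1 outcomes, the averaged prediction can be read directly as an estimated win probability. A practical analysis would more likely rely on an established random forest library and real play-by-play data.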

Corresponding author: Dennis Lock, Department of Statistics, Iowa State University, Ames, IA 50011, USA


Published Online: 2014-3-27
Published in Print: 2014-6-1

©2014 by Walter de Gruyter Berlin/Boston
