Using high-resolution player tracking data made available by the National Football League (NFL) for its 2019 Big Data Bowl competition, we introduce the Expected Hypothetical Completion Probability (EHCP), an objective framework for evaluating plays. At the heart of EHCP is the question “on a given passing play, did the quarterback throw the pass to the receiver who was most likely to catch it?” To answer this question, we first built a Bayesian non-parametric catch probability model that automatically accounts for complex interactions between inputs such as the receiver’s speed and his distances to the ball and to the nearest defender. While building such a model is, in principle, straightforward, using it to reason about a hypothetical pass is challenging because many of the model inputs corresponding to a hypothetical pass are necessarily unobserved. For instance, it is impossible to observe how close an untargeted receiver would have been to his nearest defender had the pass been thrown to him instead of to the receiver who was actually targeted. To overcome this fundamental difficulty, we propose imputing the unobservable inputs and averaging our model predictions across these imputations to derive EHCP. In this way, EHCP can track how the completion probability evolves for each receiver over the course of a play while accounting for uncertainty about the missing inputs.
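The averaging step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function `catch_probability` is a hypothetical logistic stand-in, not the Bayesian non-parametric model from the paper, and the input names, coefficients, and the normal imputation distribution are invented. The only point it makes is the structure of the calculation: draw plausible values for the unobserved input, predict under each draw, and average.

```python
import math
import random


def catch_probability(separation_yd, receiver_speed, dist_to_ball_yd):
    """Hypothetical stand-in for a fitted catch probability model.

    The coefficients are made up; the paper's model is non-parametric.
    """
    score = 0.4 * separation_yd + 0.02 * receiver_speed - 0.05 * dist_to_ball_yd
    return 1.0 / (1.0 + math.exp(-score))


def ehcp(observed, imputed_separations):
    """Average the model's prediction over imputed values of the
    unobserved input (here, separation from the nearest defender)."""
    preds = [
        catch_probability(sep, observed["speed"], observed["dist_to_ball"])
        for sep in imputed_separations
    ]
    return sum(preds) / len(preds)


# Inputs observed for an untargeted receiver at one moment of the play.
observed = {"speed": 6.0, "dist_to_ball": 15.0}

# Assumed imputation distribution for the unobserved separation (yards),
# truncated at zero because a distance cannot be negative.
rng = random.Random(0)
imputations = [max(0.0, rng.gauss(2.5, 1.0)) for _ in range(1000)]

estimate = ehcp(observed, imputations)
print(f"Illustrative EHCP: {estimate:.3f}")
```

Repeating this calculation for each receiver at each time step of a play would give the kind of per-receiver probability trajectory the abstract describes, again under the illustrative assumptions stated here.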
JQAS, an official journal of the American Statistical Association, publishes research on the quantitative aspects of professional and collegiate sports. Articles deal with such subjects as measurements of player performance, tournament structure, and the frequency and occurrence of records. Additionally, the journal serves as an outlet for professionals in the sports world to raise issues and ask questions that relate to quantitative sports analysis.