Published by De Gruyter, April 28, 2020

Extracting NFL tracking data from images to evaluate quarterbacks and pass defenses

Sarah Mallepalle, Ronald Yurko, Konstantinos Pelechrinis and Samuel L. Ventura


The NFL collects detailed tracking data capturing the location of all players and the ball during each play. Although the raw form of this data is not publicly available, the NFL releases a set of aggregated statistics via their Next Gen Stats (NGS) platform. They also provide charts showing the locations of pass attempts and outcomes for individual quarterbacks. Our work aims to partially close the gap between what data is available privately (to NFL teams) and publicly, and our contribution is two-fold. First, we introduce an image processing tool designed specifically for extracting the raw data from the NGS pass charts. We extract the pass outcome, coordinates, and other metadata. Second, we analyze the resulting dataset, examining the spatial tendencies and performances of individual quarterbacks and defenses. We use a generalized additive model for completion percentages by field location. We introduce a naive Bayes approach for estimating the 2-D completion percentage surfaces of individual teams and quarterbacks, and we provide a one-number summary, completion percentage above expectation (CPAE), for evaluating quarterbacks and team defenses. We find that our pass location data closely matches the NFL’s tracking data, and that our CPAE metric closely matches the NFL’s proprietary CPAE metric.
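As a rough illustration of the one-number summary mentioned above, the following sketch computes a CPAE-style value from per-pass outcomes and model-expected completion probabilities. It assumes CPAE is the observed completion percentage minus the average expected completion percentage, in percentage points. The function name `cpae` and the example numbers are hypothetical; in practice the expected probabilities would come from a fitted model such as the generalized additive model, not from hand-entered values.

```python
from typing import Sequence


def cpae(completed: Sequence[int], expected_prob: Sequence[float]) -> float:
    """Completion percentage above expectation, in percentage points.

    Assumed definition (an illustration, not quoted from the paper): observed
    completion percentage minus the mean model-expected completion percentage.
    """
    if len(completed) != len(expected_prob):
        raise ValueError("inputs must have the same length")
    if not completed:
        raise ValueError("at least one pass is required")
    n = len(completed)
    observed_pct = 100.0 * sum(completed) / n
    expected_pct = 100.0 * sum(expected_prob) / n
    return observed_pct - expected_pct


# Hypothetical passes: 1 = complete, 0 = incomplete, with made-up probabilities.
outcomes = [1, 1, 0, 1]
probs = [0.70, 0.60, 0.50, 0.40]
example_cpae = cpae(outcomes, probs)  # observed 75% against expected 55%
```

A positive value under this definition means more passes were completed than the expected probabilities would predict; for a defense, the same calculation over passes faced would favor lower values.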

A Data scraped from Next Gen Stats

completions: number of completions thrown
touchdowns: number of touchdowns thrown
attempts: number of passes thrown
interceptions: number of interceptions thrown
extraLargeImg: URL of extra-large-sized image (1200 × 1200)
week: week of game
gameId: 10-digit game identification number
season: NFL season
firstName: first name of player
lastName: last name of player
team: team name of player
position: position of player
seasonType: regular (“reg”) or postseason (“post”)
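The listing above can be read as a record schema. A minimal sketch of such a record as a Python dataclass follows; the class name `PassChartMetadata`, the field types, and the validation checks are assumptions added for illustration, and only the field names come from the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassChartMetadata:
    """One scraped pass-chart record; field names follow the listing above."""

    completions: int
    touchdowns: int
    attempts: int
    interceptions: int
    extraLargeImg: str  # URL of the 1200 x 1200 chart image
    week: str           # assumed to be a string so labels like "super-bowl" fit
    gameId: str         # kept as a string to preserve all 10 digits
    season: int
    firstName: str
    lastName: str
    team: str
    position: str
    seasonType: str     # "reg" or "post"

    def __post_init__(self) -> None:
        # Consistency checks implied by the field descriptions (assumed, not sourced).
        if self.seasonType not in ("reg", "post"):
            raise ValueError(f"unexpected seasonType: {self.seasonType!r}")
        if len(self.gameId) != 10 or not self.gameId.isdigit():
            raise ValueError(f"gameId must be 10 digits: {self.gameId!r}")
        if self.completions > self.attempts:
            raise ValueError("completions cannot exceed attempts")


# Placeholder values for illustration only; the counts and URL are not real data.
record = PassChartMetadata(
    completions=20, touchdowns=2, attempts=30, interceptions=1,
    extraLargeImg="https://example.com/pass-chart.jpeg",
    week="super-bowl", gameId="2018020400", season=2017,
    firstName="Nick", lastName="Foles", team="PHI",
    position="QB", seasonType="post",
)
```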

B Example subset of data

2018020400  PHI  super-bowl  Nick Foles  COMPLETE       −3.6  16.9  post  NE  PHI  2017
2018020400  PHI  super-bowl  Nick Foles  COMPLETE       16.2  −3.0  post  NE  PHI  2017
2018020400  PHI  super-bowl  Nick Foles  COMPLETE       11.5  −6.4  post  NE  PHI  2017
2018020400  PHI  super-bowl  Nick Foles  TOUCHDOWN      −8.5   5.7  post  NE  PHI  2017
2018020400  PHI  super-bowl  Nick Foles  TOUCHDOWN     −18.8  30.1  post  NE  PHI  2017
2018020400  PHI  super-bowl  Nick Foles  TOUCHDOWN     −19.3  41.2  post  NE  PHI  2017
2018020400  PHI  super-bowl  Nick Foles  INTERCEPTION   21.8  37.9  post  NE  PHI  2017
2018020400  PHI  super-bowl  Nick Foles  INCOMPLETE      5.1   7.9  post  NE  PHI  2017
2018020400  PHI  super-bowl  Nick Foles  INCOMPLETE    −12.9  39.6  post  NE  PHI  2017
2018020400  PHI  super-bowl  Nick Foles  INCOMPLETE     26.1   8.0  post  NE  PHI  2017
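To show how rows like these might be used, the sketch below tallies pass outcomes for the ten example rows and computes a completion percentage. It assumes each coordinate pair has one decimal place per value and that a TOUCHDOWN counts as a completed pass; the tuple layout and variable names are hypothetical.

```python
from collections import Counter

# (outcome, x, y) for the ten example rows; the x/y split assumes one decimal each.
rows = [
    ("COMPLETE", -3.6, 16.9),
    ("COMPLETE", 16.2, -3.0),
    ("COMPLETE", 11.5, -6.4),
    ("TOUCHDOWN", -8.5, 5.7),
    ("TOUCHDOWN", -18.8, 30.1),
    ("TOUCHDOWN", -19.3, 41.2),
    ("INTERCEPTION", 21.8, 37.9),
    ("INCOMPLETE", 5.1, 7.9),
    ("INCOMPLETE", -12.9, 39.6),
    ("INCOMPLETE", 26.1, 8.0),
]

# Outcomes treated as caught passes (assumption: touchdowns are completions).
CAUGHT = {"COMPLETE", "TOUCHDOWN"}

# Count how many passes fall under each outcome label.
counts = Counter(outcome for outcome, _, _ in rows)

# Share of passes in this subset that were caught, in percent.
completion_pct = 100.0 * sum(counts[o] for o in CAUGHT) / len(rows)
```

Because this is only a ten-row subset, the resulting percentage describes the sample and not the full game.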


Table 2: CPAE for 2017 and 2018 seasons for QBs with at least 100 passes in a season.

Quarterback          2017 CPAE  2017 passes  2018 CPAE  2018 passes
Drew Brees                4.21          439       6.14          473
Ryan Fitzpatrick          0.47          112       3.42          157
Nick Foles               −3.64          152       3.42          229
Russell Wilson            5.77          309       3.39          295
Matthew Ryan              2.77          524       3.22          552
Carson Wentz              0.07          333       3.08          313
Derek Carr                0.27          300       2.96          429
Kirk Cousins             −0.15          394       2.53          467
Derrick Watson            2.53          110       2.43          492
Cameron Newton           −1.18          352       2.14          392
Marcus Mariota            0.95          495       1.75          275
Jared Goff               −0.41          428        1.7          553
Ben Roethlisberger        2.46          394       1.29          518
Patrick Mahomes            n/a          n/a       1.27          445
Philip Rivers             0.34          416       1.15          560
Rayne Prescott           −0.14          408       1.11          434
Jameis Winston            2.55          268       0.44          295
Andrew Luck                n/a          n/a       0.33          559
Mitchell Trubisky        −1.36          262       0.27          323
Ryan Tannehill             n/a          n/a       0.08          191
Brock Osweiler             n/a          n/a       0.06          163
John Stafford             3.14          384      −0.04          480
Aaron Rodgers              n/a          n/a      −0.15          573
Baker Mayfield             n/a          n/a      −0.38          269
Alexander Smith           4.31          418      −0.88          254
Tom Brady                 3.23          524      −0.89          519
Elisha Manning           −2.22          369         −1          536
Sam Darnold                n/a          n/a      −1.05          289
Casey Keenum              0.33          382      −1.38          509
Joseph Flacco            −0.23          438      −1.67          367
Nicholas Mullens           n/a          n/a      −1.87          118
Andrew Dalton            −1.25          307      −1.89          195
Lamar Jackson              n/a          n/a      −2.07          112
Joshua Allen               n/a          n/a      −3.44          237
Casey Beathard           −4.94          185      −4.37          168
Joshua Rosen               n/a          n/a      −4.54          260
Jeffrey Driskel            n/a          n/a      −4.83          110
Robby Bortles             −1.9          399      −5.04          336

D Defense CPAE

Table 3: Defensive CPAE for 2017 and 2018 seasons. A lower number represents a better defense.


Published Online: 2020-04-28
Published in Print: 2020-06-25

© 2020 Walter de Gruyter GmbH, Berlin/Boston
