Route identification in the National Football League

An application of model-based curve clustering using the EM algorithm

Dani Chu¹, Matthew Reyers¹, James Thomson¹ and Lucas Yifan Wu¹
  • ¹ Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
Corresponding author: Dani Chu


Tracking data in the National Football League (NFL) is a sequence of spatiotemporal measurements whose length varies with the duration of the play. In this paper, we demonstrate how model-based curve clustering of observed player trajectories can be used to identify the routes run by eligible receivers on offensive passing plays. We use a Bernstein polynomial basis to represent cluster centers, and the Expectation-Maximization (EM) algorithm to learn the route labels for each of the 33,967 routes run on the 6,963 passing plays in the data set. With few assumptions and no pre-existing labels, our algorithm closely recreates the standard route tree. We also suggest new receiver metrics that account for receiver deployment and for movement patterns common throughout the league. The resulting route labels can be paired with game film to enable streamlined film queries.
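The abstract names the two ingredients of the method: cluster centers expressed in a Bernstein polynomial basis, and EM to learn the labels. The sketch below shows one way these pieces can fit together; it is not the authors' implementation. It assumes Gaussian noise with a single variance per cluster, a degree-3 basis, time rescaled to [0, 1] for every trajectory, a random soft initialization, and synthetic straight-line "routes". The function names `bernstein_basis`, `em_curve_clustering` and `simulate_routes` are hypothetical.

```python
from math import comb

import numpy as np


def bernstein_basis(t, degree):
    """Design matrix with column j equal to C(degree, j) * t^j * (1 - t)^(degree - j)."""
    t = np.asarray(t, dtype=float)[:, None]
    j = np.arange(degree + 1)[None, :]
    coef = np.array([comb(degree, k) for k in range(degree + 1)], dtype=float)
    return coef * t**j * (1.0 - t) ** (degree - j)


def em_curve_clustering(curves, n_clusters, degree=3, n_iter=50, seed=0):
    """Cluster (n_i, 2) trajectories of varying length; returns labels and control points."""
    rng = np.random.default_rng(seed)
    # Rescale time to [0, 1] so plays of different duration share one basis (assumption).
    designs = [bernstein_basis(np.linspace(0.0, 1.0, len(c)), degree) for c in curves]
    n = len(curves)
    resp = rng.dirichlet(np.ones(n_clusters), size=n)  # random soft start (assumption)
    for _ in range(n_iter):
        # M-step: mixing weights, weighted least-squares control points, noise variance.
        weights = np.clip(resp.mean(axis=0), 1e-12, None)
        betas, sigma2 = [], []
        for k in range(n_clusters):
            r_k = resp[:, k]
            gram = sum(r * B.T @ B for r, B in zip(r_k, designs))
            rhs = sum(r * B.T @ c for r, B, c in zip(r_k, designs, curves))
            beta = np.linalg.solve(gram + 1e-8 * np.eye(degree + 1), rhs)
            sse = sum(r * np.sum((c - B @ beta) ** 2) for r, B, c in zip(r_k, designs, curves))
            n_obs = sum(r * c.size for r, c in zip(r_k, curves))
            betas.append(beta)
            sigma2.append(max(sse / max(n_obs, 1e-12), 1e-6))
        # E-step: posterior probability of each cluster for each whole trajectory.
        log_r = np.empty((n, n_clusters))
        for i, (B, c) in enumerate(zip(designs, curves)):
            for k in range(n_clusters):
                resid = c - B @ betas[k]
                log_r[i, k] = (
                    np.log(weights[k])
                    - 0.5 * c.size * np.log(2.0 * np.pi * sigma2[k])
                    - 0.5 * np.sum(resid**2) / sigma2[k]
                )
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1), betas


def simulate_routes(n_per_type=20, seed=1):
    """Synthetic stand-in for tracking data: two noisy straight-line shapes."""
    rng = np.random.default_rng(seed)
    curves = []
    for end_x, end_y in [(0.0, 20.0), (15.0, 5.0)]:  # made-up end points, in yards
        for _ in range(n_per_type):
            n_points = int(rng.integers(20, 41))  # trajectories differ in length
            t = np.linspace(0.0, 1.0, n_points)
            path = np.column_stack([end_x * t, end_y * t])
            curves.append(path + rng.normal(scale=0.3, size=path.shape))
    return curves


curves = simulate_routes()
labels, control_points = em_curve_clustering(curves, n_clusters=2)
print(labels)
```

Each cluster center is a set of control points, so the fitted curve at the end of the play equals the last control point. Because the responsibilities are computed per trajectory rather than per tracking frame, a route receives one label regardless of how many frames it spans.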

Journal + Issues

The Journal of Quantitative Analysis in Sports (JQAS), an official journal of the American Statistical Association, publishes research on the quantitative aspects of professional and collegiate sports. Articles deal with such subjects as the measurement of player performance, tournament structure, and the frequency and occurrence of records. The journal also serves as an outlet for professionals in the sports world to raise issues and ask questions that relate to quantitative sports analysis.