Skip to content
Licensed Unlicensed Requires Authentication Published by De Gruyter March 30, 2013

Estimating player contribution in hockey with regularized logistic regression

  • Robert B. Gramacy EMAIL logo , Shane T. Jensen and Matt Taddy


We present a regularized logistic regression model for evaluating player contributions in hockey. The traditional metric for this purpose is the plus-minus statistic, which allocates a single unit of credit (for or against) to each player on the ice for a goal. However, plus-minus scores measure only the marginal effect of players, do not account for sample size, and provide a very noisy estimate of performance. We investigate a related regression problem: what does each player on the ice contribute, beyond aggregate team performance and other factors, to the odds that a given goal was scored by their team? Due to the large-p (number of players) and imbalanced design setting of hockey analysis, a major part of our contribution is a careful treatment of prior shrinkage in model estimation. We showcase two recently developed techniques – for posterior maximization or simulation – that make such analysis feasible. Each approach is accompanied with publicly available software and we include the simple commands used in our analysis. Our results show that most players do not stand out as measurably strong (positive or negative) contributors. This allows the stars to really shine, reveals diamonds in the rough overlooked by earlier analyses, and argues that some of the highest paid players in the league are not making contributions worth their expense.

Corresponding author: Robert B. Gramacy, Booth School of Business, The University of Chicago, 5807 S Woodlawn Ave, Chicago, IL 60637, USA, Tel.: +773-702-0739


A Estimation details and software

Although L1 penalty methods and their Bayesian interpretation have been known for some time, it is only recently that joint penalty–coefficient posterior computation and simulation methods have become available for datasets of the size encountered here.

Probably the most well-known publicly available library for L1 penalty inference in logistic regression is the glmnet package (Friedman, Hastie, and Tibshirani, 2010) for R. Conditional on a single shared value of λ, this implementation estimates a sparse set of coefficients. A convenient wrapper routine called cv.glmnet allows one to chose λ by cross-validation (CV). Unfortunately, CV works poorly in our setting of large, sparse, and imbalanced XP, where each model fit is relatively expensive and there is often little overlap between nonzero covariates in the training and validation sets. Moreover, when maximizing (rather than sampling from) the posterior, a single shared λ penalty leads to over-shrinkage of significant βj as penalty choice is dominated by a large number of spurious predictors. But use of CV to choose unique λj for each covariate would imply an impossibly large search.

Instead, we propose two approaches, both accompanied by publicly available software in packages for R: joint MAP inference with textir, and posterior simulation with reglogit.

A.1 Fast variable selection and MAP inference with textir

Taddy (2012a) proposes a gamma-lasso framework for MAP estimation in logistic regression, wherein coefficients and their independent L1 penalties are inferred under a conjugate gamma hyperprior. An efficient coordinate descent algorithm is derived, including conditions for global convergence, and the resulting estimation is shown in Taddy (2012a) as superior, in both predictive performance and computation time, to the more common strategy of CV lasso estimation under a single shared λ (as in glmnet). Results in this paper were all obtained using the publicly available textir package for R (Taddy 2012b), which uses the slam (Hornik, Meyer, and Buchta, 2011) package’s simple-triplet matrices to take advantage of design sparsity.

Prior specification in the gamma-lasso attaches independent gamma G9(λj;9s, r) hyperpriors on each L1 penalty, with E[λj] = s/r and var[λ] = s/r2, such that, for j =1…p,

Laplace priors are often motivated through estimation utility—the prior spike at zero corresponds to a preference for eliminating regressors from the model in absence of significant evidence. Our hyperprior is motivated by complementary considerations: for strong signals and large ∣βj∣, expected λj shrinks in the joint distribution to reduce estimation bias.

This leads to the joint negative log posterior minimization objective

where s, r>0. We have set σ=1 and r=1/2 throughout Section 3. In choosing s=E[λj]/2, we focus on the conditional prior standard deviation,

, for the coefficients. Hence our value of s=7.5, for E[λj=15, implies expected SD(βj)≈0.095. To put this in context, exp[3×0.095]≈1.33, implying that a single player increasing his team’s for-vs-against odds by 1/3 is three deviations away from the prior mean.

As an illustration of the implementation, the following snippets show commands to run our main team-player model (see ?mnlm for details). With X=cbind(XP,XT) as defined in Section 2.1, the list of 30 ridge and np gamma-lasso penalties are specified

and the model is then fit

where X is not normalized since this would up-weight players with little ice-time.

A.2 Full posterior inference via reglogit

Extending a well-known result by Holmes and Held (2006), Gramacy and Polson (2012) showed that three sets of latent variables could be employed to obtain sample from the full posterior distribution using a standard Gibbs strategy (Geman and Geman, 1984). The full conditionals required for the Gibbs sampler are given below for the L1 and λj=λ case (note that β includes α here for notational convenience).

Note that N+ indicates the normal distribution truncated to the positive real line,

and Ω=diag(ω1,…ωn). Holmes and Held (2006) give a rejection sampling algorithm for ωizi, however Gramacy and Polson (2012) argue that it is actually more efficient to draw ωiyi,λ (i.e., marginalizing over zi) as follows. A proposal
and ɛk∼Exp(1) may be accepted with probability min{1, Ai} where
. Larger K improves the approximation, although K=100 usually suffices. Everything extends to the L2 case upon fixing
. Extending to separate λj is future work.

We use the implementation of this Gibbs sampler provided in the reglogit package (Gramacy, 2012a) for R. For the λ prior we use the package defaults of a=2 and b=0.1 which are shared amongst several other fully Bayesian samplers for the ordinary linear regression context [e.g., blasso in the monomvn package (Gramacy, 2012b)]. Usage is similar to that described for mnlm. To obtain T samples from the posterior, simply call:

Estimating the full posterior distribution for β allows posterior means to be calculated, as well as component-wise variances and correlations between βj’s. However, unlike MAP estimates

, none of the samples or the overall mean is sparse. Gramacy and Polson (2012) show how their algorithm can be extended to calculate the MAP via simulation, but this only provides sparse estimators in the limit.

Another option is to use the posterior mean for λ obtained by Gibbs sampling on the entire covariate set, and use it as a guide in MAP estimation. Indeed, this is what is done in Section 3, where mean of our shared λ from Gibbs sampling was considered when setting priors for E9[λj].

B Extension to player-player interactions

We explore the possibility of on-ice synergies or mismatches between players by adding interaction terms into our model. The extension is easy to write down: simply add columns to the design matrix that are row-wise products of unique pairs of columns in the original XP. However, this implies substantial computational cost: see Gramacy and Polson (2012) for examples of regularized logistic regression estimators that work well for estimating main effects but break down in the presence of interaction terms.

There is a further representational problem in the context of our hockey application. There are about 27K unique player pairs observed in the data. Combining these interaction terms with the original np players, ng goals and thirty teams, we have a double-precision data storage requirement of nearly a gigabyte. While this is not a storage issue for modern desktops, it does lead to a computational bottleneck since temporary space required for the linear algebra routines cannot fit in fast memory. Fortunately, our original design matrix has a high degree of sparsity, which means that the interaction-expanded design matrix is even more sparse, and the sparse capabilities of textir makes computation feasible. Inference on the fully interaction-expanded design takes about 20 s on an Apple Mac Desktop. All other methods we tried, including reglogit, failed in initialization stages.

While the computational feat is impressive, our results indicate little evidence of significant player interaction effects. Figure 9 shows results for estimation with E9[λ=15]: only four non-zero interactions are found when augmenting the team-player model to include player-player interaction terms.6 Importantly, there is a negligible effect of including these interactions on the individual player effect estimates (which are our primary interest). The only player with a visually detectable connecting line is Joe Thornton, and to see it you may need a magnifying glass. We guess from the neighboring interaction term that his ability is enhanced when Patrick Marleau is on the ice. The most interesting result from this analysis involves the pairing of David Krejci and Dennis Wideman, which has a large negative interaction. One could caution Bruins coach Claude Julien against having this pair of players on the ice at the same time.

Figure 9 Comparing (non-zero) main effects for team-player model to their values in the interaction-expanded team-player model (dots) The lines point to the unexpanded estimates, and the x-axis is ordered by the dots.
Figure 9

Comparing (non-zero) main effects for team-player model to their values in the interaction-expanded team-player model (dots) The lines point to the unexpanded estimates, and the x-axis is ordered by the dots.

Using reglogit to obtain samples from the full posterior distribution of the interaction-expanded model is not feasible, but we can still glean some insight by sampling from the posterior distribution for the simpler model with the original team-player design augmented to include the four significant interactions found from our initial MAP analysis. Figure 10 compares pairwise abilities for the same three players used as examples in Figure 6. We observe minimal changes to the estimates obtained under the original team-player design, echoing the results from the MAP analysis. In total, we regard the ability to entertain interactions as an attractive feature of our methodology, but it does not change how we view the relative abilities of players.

Figure 10 Comparing the ability of Datsyuk, Roloson, and Marchant to the 60-odd other players with non-zero coefficients in the team-player model, showing coefficients under the interaction-expanded model as well.
Figure 10

Comparing the ability of Datsyuk, Roloson, and Marchant to the 60-odd other players with non-zero coefficients in the team-player model, showing coefficients under the interaction-expanded model as well.

Many thanks to Samuel Ventura and Andrew C. Thomas for supplying a cleaned version of the data. We are grateful to the editor, an associate editor, and one referee whose comments on an early version of the manuscript led to many improvements.

  1. 1Note that we include goalies in our analysis.

  2. 2Fitted in R using the command fit>-glm(goals~XP, family=“binomial”).

  3. 3We used forward step-wise regression with the Bayes information criterion (BIC).

  4. 4This is the lowest possible budget from which lines can be formed satisfying (4).

  5. 5Sweater sales is another matter.

  6. 6We omitted goalie-skater and goalie-goalie interaction terms.


Awad, T. 2009. “Numbers On Ice: Fixing Plus/Minus.” Hockey Prospectus. See www.puckprospectus.comSearch in Google Scholar

Friedman, J. H., T. Hastie, and R. Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33(1): 1–22.10.18637/jss.v033.i01Search in Google Scholar

Geman, S. and D. Geman. 1984. “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images.” IEEE Transaction on Pattern Analysis and Machine Intelligence 6: 721–741.10.1109/TPAMI.1984.4767596Search in Google Scholar

Gramacy, R. 2012a. Reglogit: Simulation-based Regularized Logistic Regression. R package version in Google Scholar

Gramacy, R. B. 2012b. Monomvn: Estimation for multivariate normal and Student-t data with monotone missingness. R package version 1.8-9.Search in Google Scholar

Gramacy, R. and N. Polson. 2012. “Simulation-Based Regularized Logistic Regression.” Bayesian Analysis 7: 1–24.10.1214/12-BA719Search in Google Scholar

Hoerl, A. E. and R. W. Kennard. (1970). “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics 12: 55–67.Search in Google Scholar

Holmes, C. and K. Held. 2006. “Bayesian Auxilliary Variable Models for Binary and Multinomial Regression.” Bayesian Analysis 1(1): 145–168.Search in Google Scholar

Hornik, K., D. Meyer, and C. Buchta. (2011). slam: Sparse Lightweight Arrays and Matrices. R package version 0.1-23.Search in Google Scholar

Ilardi, S. and A. Barzilai. 2004. “Adjusted Plus-Minus Ratings: New and Improved for 2007–2008.” in Google Scholar

Macdonald, B. 2010. “A Regression-based Adjusted Plus-Minus Statistic for NHL Players.” Tech. rep., arXiv: 1006.4310.Search in Google Scholar

Rosenbaum, D. T. 2004. “Measuring How NBA Players Help Their Teams Win.” in Google Scholar

Schuckers, M. E., D. F. Lock, C. Wells, C. J. Knickerbocker, and R. H. Lock. 2010. “National Hockey League Skater Ratings Based upon All On-Ice Events: An Adjusted Minus/Plus Probability (AMPP) Approach.” Tech. rep., St. Lawrence University.Search in Google Scholar

R Development Core Team 2010. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.Search in Google Scholar

Taddy, M. 2012a. textir: Inverse Regression for Text. R package version 1.8-6.Search in Google Scholar

Taddy, M. 2012b. “Multinomial Inverse Regression for Text Analysis.” Journal of the American Statistical Association, accepted for publication.Search in Google Scholar

Thomas, A. C., S. L. Ventura, S. Jensen, and S. Ma. 2012. “Competing Process Hazard Function Models for Player Ratings in Ice Hockey.” Tech. rep., ArXiv:1208.0799.Search in Google Scholar

Tibshirani, R. 1996. “Regression shrinkage and selection via the lasso.” J. R. Statist. Soc. B, 58: 267–288.10.1111/j.2517-6161.1996.tb02080.xSearch in Google Scholar

Vollman, R. 2010. “Howe and Why: Ten Ways to Measure Defensive Contributions.” Hockey Prospectus.Search in Google Scholar

Published Online: 2013-03-30

©2013 by Walter de Gruyter Berlin Boston

Downloaded on 7.12.2023 from
Scroll to top button