
Journal of Quantitative Analysis in Sports

An official journal of the American Statistical Association

Editor-in-Chief: Steve Rigdon, PhD

CiteScore 2017: 0.67

SCImago Journal Rank (SJR) 2017: 0.290
Source Normalized Impact per Paper (SNIP) 2017: 0.853

Volume 9, Issue 1



Spain retains its title and sets a new record – generalized linear mixed models on European football championships

Andreas Groll / Jasmin Abedieh
Published Online: 2013-03-30 | DOI: https://doi.org/10.1515/jqas-2012-0046


Nowadays, many approaches to analyzing and predicting the results of football matches are based on bookmakers’ ratings. It is commonly accepted that the bookmakers’ models contain a lot of expertise, since the bookmakers’ profits and losses depend on the performance of these models. One objective of this article is to analyze the role of bookmakers’ odds, together with many additional, potentially influential covariates, with respect to a national team’s success at European football championships, and in particular to detect covariates that can explain parts of the information contained in the odds. For this purpose, a pairwise Poisson model for the number of goals scored by national teams in European football championship matches is used. Moreover, the generalized linear mixed model (GLMM) approach, a widely used tool for modeling clustered data, allows the incorporation of team-specific random effects. Two different approaches to fitting GLMMs with variable selection are used: subset selection and a Lasso-type technique that includes an L1-penalty term enforcing variable selection and shrinkage simultaneously. Based on the two preceding European football championships, a sparse model is obtained and used to predict all matches of the current tournament, resulting in a possible course of the European football championship (EURO) 2012.

Keywords: EURO 2012; football; generalized linear mixed model; Lasso; sports tournaments; variable selection
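The Lasso-type idea can be illustrated with a deliberately simplified sketch: an L1-penalized Poisson regression for goal counts, fitted by proximal gradient descent with backtracking. It has no team-specific random effects and is not the glmmLasso algorithm; the loss scaling (average negative log-likelihood), the unpenalized intercept, and all function names are our assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrinks entries toward zero and sets
    entries with |z| <= t exactly to zero (selection + shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def smooth_loss(b0, beta, X, y):
    """Average Poisson negative log-likelihood (up to constants) for log(mu) = b0 + X @ beta."""
    eta = b0 + X @ beta
    return np.mean(np.exp(eta) - y * eta)


def poisson_lasso(X, y, lam, n_iter=500):
    """Minimize smooth_loss + lam * ||beta||_1 (intercept unpenalized) by
    proximal gradient descent with backtracking line search."""
    n, p = X.shape
    b0, beta = np.log(np.mean(y)), np.zeros(p)  # start from the intercept-only fit
    step = 1.0
    for _ in range(n_iter):
        eta = b0 + X @ beta
        resid = np.exp(eta) - y                 # d loss / d eta, per observation
        g0, g = resid.mean(), X.T @ resid / n   # gradients for intercept and slopes
        f = np.mean(np.exp(eta) - y * eta)
        while True:
            b0_new = b0 - step * g0
            beta_new = soft_threshold(beta - step * g, step * lam)
            d0, d = b0_new - b0, beta_new - beta
            # accept the step once the quadratic upper bound holds;
            # the small tolerance guards against floating-point round-off
            bound = f + g0 * d0 + g @ d + (d0 ** 2 + d @ d) / (2.0 * step)
            if smooth_loss(b0_new, beta_new, X, y) <= bound + 1e-12:
                break
            step *= 0.5
        b0, beta = b0_new, beta_new
    return b0, beta
```

The soft-thresholding step is what produces selection (coefficients set exactly to zero) and shrinkage (the remaining ones pulled toward zero) at the same time. With a large enough `lam`, every slope is zero and only the intercept-only fit remains; lowering `lam` lets covariates enter.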


  • Akaike, H. 1973. “Information Theory and an Extension of the Maximum Likelihood Principle.” Second International Symposium on Information Theory 267–281.

  • Bates, D. and M. Maechler. 2010. lme4: Linear Mixed-Effects Models Using S4 Classes. R package version 0.999375-34.

  • Bernard, A. B. and M. R. Busse. 2004. “Who Wins the Olympic Games: Economic Resources and Medal Totals.” The Review of Economics and Statistics 86(1):413–417.

  • Breslow, N. E. and D. G. Clayton. 1993. “Approximate Inference in Generalized Linear Mixed Models.” Journal of the American Statistical Association 88:9–25.

  • Breslow, N. E. and X. Lin. 1995. “Bias Correction in Generalized Linear Mixed Models With a Single Component of Dispersion.” Biometrika 82:81–91.

  • Broström, G. 2009. glmmML: Generalized Linear Models With Clustering. R package version 0.81-6.

  • Brown, T. D., J. L. Van Raalte, B. W. Brewer, C. R. Winter, A. E. Cornelius, and M. B. Andersen. 2002. “World Cup Soccer Home Advantage.” Journal of Sport Behavior 25:134–144.

  • Carlin, J. B., L. C. Gurrin, J. A. C. Sterne, R. Morley, and T. Dwyer. 2005. “Regression Models for Twin Studies: A Critical Review.” International Journal of Epidemiology 34:1089–1099.

  • Clarke, S. R. and J. M. Norman. 1995. “Home Ground Advantage of Individual Clubs in English Soccer.” The Statistician 44:509–521.

  • Dawson, P. and S. Dobson. 2010. “The Influence of Social Pressure and Nationality on Individual Decisions: Evidence From the Behaviour of Referees.” Journal of Economic Psychology 31:181–191.

  • Dyte, D. and S. R. Clarke. 2000. “A Ratings Based Poisson Model for World Cup Soccer Simulation.” Journal of the Operational Research Society 51(8):993–998.

  • Eugster, M. J. A., J. Gertheiss, and S. Kaiser. 2011. “Having the Second Leg at Home – Advantage in the UEFA Champions League Knockout Phase?” Journal of Quantitative Analysis in Sports 7(1).

  • Fahrmeir, L. and G. Tutz. 2001. Multivariate Statistical Modelling Based on Generalized Linear Models (2nd ed.). New York: Springer-Verlag.

  • Frohwein, T. 2010. “Die falschen Pferde” [The Wrong Horses]. e-politik.de, June 8, 2010. http://www.e-politik.de/lesen/artikel/2010/die-falschen-pferde/ (accessed June 12, 2012).

  • Gerhards, J., M. Mutz, and G. G. Wagner. 2012. “Keiner kommt an Spanien vorbei – außer dem Zufall” [No One Gets Past Spain, Except Chance]. DIW-Wochenbericht 24:14–20.

  • Gerhards, J. and G. G. Wagner. 2008. “Market Value Versus Accident – Who Becomes European Soccer Champion?” DIW-Wochenbericht 24:236–328.

  • Gerhards, J. and G. G. Wagner. 2010. “Money and a Little Bit of Chance: Spain Was Odds-On Favourite of the Football Worldcup.” DIW-Wochenbericht 29:12–15.

  • Goeman, J. J. 2010. “L1 Penalized Estimation in the Cox Proportional Hazards Model.” Biometrical Journal 52:70–84.

  • Groll, A. 2011a. glmmLasso: Variable Selection for Generalized Linear Mixed Models by L1-Penalized Estimation. R package version 1.1.0.

  • Groll, A. 2011b. Variable Selection by Regularization Methods for Generalized Mixed Models. Ph.D. thesis, University of Munich. Göttingen: Cuvillier Verlag.

  • Groll, A. and G. Tutz. 2012. “Variable Selection for Generalized Linear Mixed Models by L1-Penalized Estimation.” Statistics and Computing. DOI: 10.1007/s11222-012-9359-z.

  • Leitner, C., A. Zeileis, and K. Hornik. 2008. “Who Is Going to Win the EURO 2008? (A Statistical Investigation of Bookmakers’ Odds).” Research Report Series, Department of Statistics and Mathematics, University of Vienna.

  • Leitner, C., A. Zeileis, and K. Hornik. 2010a. “Forecasting Sports Tournaments by Ratings of (Prob)abilities: A Comparison for the EURO 2008.” International Journal of Forecasting 26(3):471–481.

  • Leitner, C., A. Zeileis, and K. Hornik. 2010b. “Forecasting the Winner of the FIFA World Cup 2010.” Research Report Series, Department of Statistics and Mathematics, University of Vienna.

  • Leitner, C., A. Zeileis, and K. Hornik. 2011. “Bookmaker Consensus and Agreement for the UEFA Champions League 2008/09.” IMA Journal of Management Mathematics 22(2):183–194.

  • Lin, X. and N. E. Breslow. 1996. “Bias Correction in Generalized Linear Mixed Models With Multiple Components of Dispersion.” Journal of the American Statistical Association 91:1007–1016.

  • Nevill, A., N. Balmer, and M. Williams. 1999. “Crowd Influence on Decisions in Association Football.” The Lancet 353(9162):1416.

  • Pinheiro, J. C. and D. M. Bates. 2000. Mixed-Effects Models in S and S-Plus. New York: Springer.

  • Pollard, R. 2008. “Home Advantage in Football: A Current Review of an Unsolved Puzzle.” The Open Sports Sciences Journal 1:12–14.

  • Pollard, R. and G. Pollard. 2005. “Home Advantage in Soccer: A Review of its Existence and Causes.” International Journal of Soccer and Science Journal 3(1):25–33.

  • Schelldorfer, J. and P. Bühlmann. 2011. “GLMMLasso: An Algorithm for High-Dimensional Generalized Linear Mixed Models Using L1-Penalization.” Preprint, ETH Zurich. http://stat.ethz.ch/people/schell.

  • Schwarz, G. 1978. “Estimating the Dimension of a Model.” Annals of Statistics 6:461–464.

  • Stoy, V., R. Frankenberger, D. Buhr, L. Haug, B. Springer, and J. Schmid. 2010. “Das Ganze ist mehr als die Summe seiner Lichtgestalten. Eine ganzheitliche Analyse der Erfolgschancen bei der Fußballweltmeisterschaft 2010” [The Whole Is More Than the Sum of Its Stars: A Holistic Analysis of the Chances of Success at the 2010 FIFA World Cup]. Working Paper 46, Eberhard Karls University, Tübingen, Germany.

  • Tibshirani, R. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society B 58:267–288.

  • Venables, W. N. and B. D. Ripley. 2002. Modern Applied Statistics with S (4th ed.). New York: Springer.

  • Yang, H. 2007. Variable Selection Procedures for Generalized Linear Mixed Models in Longitudinal Data Analysis. Ph.D. thesis, North Carolina State University.

  • Zeileis, A., C. Leitner, and K. Hornik. 2012. “History Repeating: Spain Beats Germany in the EURO 2012 Final.” Working Paper, Faculty of Economics and Statistics, University of Innsbruck.

About the article

Corresponding author: Andreas Groll, Department of Mathematics, LMU Munich


The German state betting agency ODDSET ranked Spain third among the favorites for EURO 2008, with odds of 6.50, behind Germany (4.50) and Italy (5.50). (In statistics, odds usually denote the ratio of the probability that an event will happen to the probability that it will not; European bookmakers, however, quote gross odds, i.e., the ratio of the amount paid out to the stake. Putting €1 on Spain as EURO 2008 champion would thus have returned €6.50. European odds can therefore be transformed directly into probabilities by taking the inverse and adjusting for the bookmakers’ margins.) Before the 2010 FIFA World Cup, Spain was ranked first among the favorites, tied with Brazil at odds of 5.00.
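The conversion described in this footnote can be sketched in a few lines of Python. Taking the inverse is stated in the text; dividing the raw probabilities by their total to remove the margin is one simple convention and an assumption here, and it requires odds for a complete set of outcomes (every team in the tournament), which the footnote does not list. The function names are illustrative.

```python
def implied_probability(gross_odds):
    """Raw probability implied by European (gross/decimal) odds.

    A stake of 1 returns `gross_odds` in total if the bet wins, so the inverse
    is the raw probability; it still contains the bookmaker's margin.
    """
    if gross_odds <= 1.0:
        raise ValueError("European gross odds must exceed 1")
    return 1.0 / gross_odds


def remove_margin(odds_by_outcome):
    """Normalize raw probabilities over a complete set of mutually exclusive
    outcomes (e.g. every team in the tournament) so they sum to one.

    Returns the adjusted probabilities and the bookmaker's margin (overround).
    Dividing by the total is one simple convention, assumed here.
    """
    raw = {k: implied_probability(o) for k, o in odds_by_outcome.items()}
    total = sum(raw.values())  # exceeds 1 by the margin
    return {k: r / total for k, r in raw.items()}, total - 1.0


if __name__ == "__main__":
    # ODDSET odds for EURO 2008 quoted in the text; only three teams are given,
    # so only the raw (margin-inclusive) probabilities can be computed here.
    for team, odds in {"Germany": 4.50, "Italy": 5.50, "Spain": 6.50}.items():
        print(f"{team}: {implied_probability(odds):.3f}")
```

Run as a script, it prints the raw, margin-inclusive probabilities 0.222 (Germany), 0.182 (Italy) and 0.154 (Spain); these three values cannot be margin-adjusted on their own because the rest of the field is missing.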

The German state betting agency ODDSET ranked Greece 12th among the favorites for EURO 2004, with odds of 45.00.

Although this is a rather small data basis, we refrain from using earlier European championships, as one of our main objectives is to analyze the explanatory power of bookmakers’ odds together with many additional, potentially influential covariates. Unfortunately, the possibility of betting on the overall tournament winner before the start of the tournament is quite recent; ODDSET, for example, first offered this bet at EURO 2004.

There are countless examples of such events in history, across all competitions. We mention only some of the most famous: Germany’s first World Cup success in Switzerland in 1954, known as the “miracle of Bern”; Greece’s victory at EURO 2004 (compare footnote 1); and FC Porto’s triumph in the 2003/2004 UEFA Champions League season.

GDP per capita is the gross domestic product divided by the midyear population. GDP is the sum of the gross value added by all resident producers in the economy, plus any product taxes and minus any subsidies not included in the value of the products.
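Written compactly (the symbols are our own notation: GVA_r is the gross value added by resident producer r, T the product taxes, and S the subsidies not included in the value of the products):

```latex
\mathrm{GDP} = \sum_{r} \mathrm{GVA}_r + T - S,
\qquad
\text{GDP per capita} = \frac{\mathrm{GDP}}{\text{midyear population}}
```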

We had to draw on several sources to collect data for all countries participating in EURO 2004, 2008, and 2012. Among the most useful were http://www.wko.at, http://www.statista.com/, and http://epp.eurostat.ec.europa.eu. For some years, the populations of Russia and Ukraine had to be looked up individually.

Unfortunately, the webpage’s archive was not established until 4 October 2004, so the average market values of the national teams that we used for EURO 2004 can only be regarded as a rough approximation, as market values certainly changed after EURO 2004.

Note that European national teams also earn UEFA team points. Points are awarded for each game played in the most recently completed full cycle of both the latest FIFA World Cup and the latest European championship (a full cycle comprises all qualifying games and final tournament games, whereas a half cycle comprises only the games of the latest qualifying round), with additional points for each game played in the latest completed half cycle. Similar to the FIFA points, a time-dependent weighting is used that gives both the latest full cycle and the latest half cycle double the weight of the older full cycle. The UEFA team points would therefore reflect a lot of information about a national team’s current strength in a Europe-wide comparison; however, since UEFA changed its coefficient ranking system in 2008, we focused on the UEFA club ranking.

Note that this variable is not available from any soccer data provider and therefore had to be counted “by hand.”

The two variables “maximum number of teammates” and “second maximum number of teammates” are highly (negatively) correlated with the number of different clubs at which the players are under contract, and hence also contain information about the structure of the teams’ squads. We therefore did not consider the number of different clubs as a separate variable.

This variable is available from several soccer data providers; see, for example, http://www.kicker.de/.

Because we are in a matched-pair design, we exclude single matches, rather than single observations, from the training data.
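Excluding whole matches rather than single observations amounts to grouped cross-validation with the match as the group. A minimal sketch of leave-one-match-out splitting, assuming each observation is a dictionary with a hypothetical `match_id` key (one observation per team and match):

```python
from collections import defaultdict


def leave_one_match_out(observations):
    """Yield (match_id, train, test) splits in which both observations of a
    match (one per team, in the matched-pair design) are held out together."""
    by_match = defaultdict(list)
    for obs in observations:
        by_match[obs["match_id"]].append(obs)
    for match_id, held_out in by_match.items():
        # training data: every observation that belongs to a different match
        train = [o for o in observations if o["match_id"] != match_id]
        yield match_id, train, held_out
```

Each split holds out both observations of one match, so the model is never trained on one team’s goals from a match while being evaluated on the opponent’s goals from the same match.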

A closer look at the coefficient paths of this model shows that for tuning parameter values slightly smaller than the selected one, the variables ODDSET odds and fairness would have been included. Moreover, both ODDSET odds and fairness were included at the optimal tuning parameter in most of the training data sets.
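Why a slightly smaller tuning parameter can admit extra variables is easiest to see in the simplest case, a linear Lasso (objective ½‖y − Xβ‖² + λ‖β‖₁) with an orthonormal design, where the coefficient path has the closed form β_j(λ) = sign(z_j) max(|z_j| − λ, 0) with least-squares estimates z_j. This is a simplification of the Poisson GLMM setting, and the estimates below are hypothetical.

```python
import numpy as np


def lasso_path_orthonormal(z, lambdas):
    """Lasso coefficient path for an orthonormal design matrix.

    With orthonormal columns, the Lasso solution is the soft-thresholded
    least-squares estimate: beta_j(lam) = sign(z_j) * max(|z_j| - lam, 0).
    Returns an array with one row per tuning parameter value.
    """
    z = np.asarray(z, dtype=float)
    return np.array([np.sign(z) * np.maximum(np.abs(z) - lam, 0.0) for lam in lambdas])


if __name__ == "__main__":
    z = [0.9, -0.4, 0.25, 0.05]          # hypothetical least-squares estimates
    lambdas = [1.0, 0.5, 0.3, 0.2, 0.0]  # decreasing tuning parameter
    path = lasso_path_orthonormal(z, lambdas)
    for lam, beta in zip(lambdas, path):
        print(f"lambda={lam:.2f}  active={np.count_nonzero(beta)}  beta={np.round(beta, 2)}")
```

With these hypothetical estimates, the number of active variables grows from 0 at λ = 1.0 to 1, 2, 3 and 4 at λ = 0.5, 0.3, 0.2 and 0.0: a variable whose |z_j| lies just below the selected λ enters as soon as λ drops slightly.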

Compared with Model 2, glmmLasso based on LOOCV no longer selects several variables when the variable fairness (V2) is excluded. This may be due to the considerable correlations between fairness and these variables, e.g., cor(V2, V10) = −0.29, cor(V2, V11) = −0.16, and cor(V2, V12) = −0.16 (see Table 6 in Appendix A).

Three-way odds consider only the tendency of a match, i.e., the three possible results win, draw, or defeat for Team 1, and are usually fixed a few days before the corresponding match takes place.

The transformed probabilities serve only as an approximation, based on the assumption that the bookmakers’ margins follow a discrete uniform distribution over the three possible match tendencies.
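The three-way transformation can be sketched as follows. How exactly the margin is spread is our interpretation: the `additive` option subtracts one third of the margin from each inverse odd, which is one reading of a margin that is uniformly distributed over the three tendencies, while the `proportional` option divides by the total, a common alternative. The function name and the `method` parameter are hypothetical.

```python
def three_way_probabilities(odds_win, odds_draw, odds_loss, method="additive"):
    """Turn three-way European odds (Team 1 wins / draw / Team 1 loses) into
    probabilities that sum to one.

    method="additive": the margin (sum of inverse odds minus one) is split
        equally, one third subtracted from each inverse odd. Can turn negative
        for very long odds.
    method="proportional": inverse odds are divided by their sum.
    """
    raw = [1.0 / o for o in (odds_win, odds_draw, odds_loss)]
    margin = sum(raw) - 1.0  # bookmaker's overround
    if method == "additive":
        return [r - margin / 3.0 for r in raw]
    if method == "proportional":
        return [r / (1.0 + margin) for r in raw]
    raise ValueError(f"unknown method: {method!r}")
```

For hypothetical odds of 2.0, 3.0 and 4.0 the margin is 1/12, and the two conventions give 17/36 ≈ 0.472 and 6/13 ≈ 0.462 for a Team 1 win. The additive split can yield negative values for very long odds, which the proportional split cannot.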

For convenience, we suppress here the index t (a team’s game number) for both teams, as well as the indices j and

corresponding to the match-specific random effects. As the match under consideration may carry a different number in each team’s individual match numbering, one should strictly write
if Team k and Team l face each other in a certain match j, where the superscript indicates that the estimate depends on the opponent’s covariates.

As in footnote 3, in the following we suppress the indices j and

corresponding to the match-specific random effects, the index for the match numbering, and the superscripts for both teams, in order to keep the notation simple. Note that no random-effect estimates exist for Ireland and Ukraine, the two teams that qualified for neither EURO 2004 nor EURO 2008, so their random effects are set to zero. Moreover, the match-specific random-effect estimates cannot be used to predict new matches.
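The prediction step described here can be sketched under explicit assumptions: a log-linear Poisson predictor with a hypothetical intercept, coefficient vector and covariate vector; team random effects stored in a dictionary, with missing teams (such as Ireland and Ukraine) defaulting to zero; and, since match-specific random effects cannot be used for new matches, the two goal counts treated as independent Poisson variables. This is not the authors’ code, and the model’s actual covariate structure is not reproduced.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_goals(intercept, beta, x, team, random_effects):
    """Hypothetical log-linear predictor: log(lambda) = intercept + x'beta + b_team.

    Teams without a random-effect estimate (such as teams absent from the
    training tournaments) fall back to b_team = 0.
    """
    eta = intercept + sum(b * xi for b, xi in zip(beta, x)) + random_effects.get(team, 0.0)
    return math.exp(eta)


def outcome_probabilities(lam1, lam2, max_goals=15):
    """P(Team 1 wins), P(draw), P(Team 1 loses) for independent Poisson goal
    counts with means lam1 and lam2, truncated at max_goals goals per team."""
    p1 = [poisson_pmf(k, lam1) for k in range(max_goals + 1)]
    p2 = [poisson_pmf(k, lam2) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for i, pi in enumerate(p1):
        for j, pj in enumerate(p2):
            if i > j:
                win += pi * pj
            elif i == j:
                draw += pi * pj
            else:
                loss += pi * pj
    return win, draw, loss
```

Feeding the two teams’ `expected_goals` into `outcome_probabilities` gives one set of tendency probabilities per fixture; drawing results from these probabilities is one way to simulate a possible course of the tournament.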

Citation Information: Journal of Quantitative Analysis in Sports, Volume 9, Issue 1, Pages 51–66, ISSN (Online) 1559-0410, ISSN (Print) 2194-6388, DOI: https://doi.org/10.1515/jqas-2012-0046.


©2013 by Walter de Gruyter Berlin Boston.

Citing Articles


Gunther Schauberger and Andreas Groll
Statistical Modelling, 2018, Page 1471082X1879993
Leonardo Egidi, Francesco Pauli, and Nicola Torelli
Statistical Modelling, 2018, Page 1471082X1879841
Jan Lasek and Marek Gagolewski
Statistical Modelling, 2018, Page 1471082X1879842
Andreas Groll, Thomas Kneib, Andreas Mayr, and Gunther Schauberger
Journal of Quantitative Analysis in Sports, 2018, Volume 14, Number 2, Page 65
