Published by De Gruyter, March 30, 2013

Spain retains its title and sets a new record – generalized linear mixed models on European football championships

  • Andreas Groll and Jasmin Abedieh


Nowadays many approaches that analyze and predict the results of football matches are based on bookmakers’ ratings. It is commonly accepted that the models used by the bookmakers contain a lot of expertise, as the bookmakers’ profits and losses depend on the performance of their models. One objective of this article is to analyze the role of bookmakers’ odds together with many additional, potentially influential covariates with respect to a national team’s success at European football championships, and especially to detect covariates that are able to explain parts of the information covered by the odds. To this end, a pairwise Poisson model for the number of goals scored by national teams competing in European football championship matches is used. Moreover, the generalized linear mixed model (GLMM) approach, which is a widely used tool for modeling clustered data, allows team-specific random effects to be incorporated. Two different approaches to fitting GLMMs with variable selection are used: subset selection and a Lasso-type technique, which includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. Based on the two preceding European football championships, a sparse model is obtained that is used to predict all matches of the current tournament, resulting in a possible course of the European football championship (EURO) 2012.
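
As a rough illustration of how a pairwise Poisson model turns into match predictions, the following minimal Python sketch computes the win, draw and defeat probabilities for Team 1 from two expected goal counts. The function names, the rates 1.6 and 1.1, and the truncation at 15 goals are illustrative assumptions, not values from the article. In the article the rates would come from the fitted GLMM; the sketch additionally assumes that the two goal counts are independent given their rates.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(rate_1, rate_2, max_goals=15):
    """Return (win, draw, defeat) probabilities for Team 1.

    rate_1 and rate_2 are the expected numbers of goals of Team 1 and Team 2.
    Scores above max_goals are ignored; for typical football rates the
    neglected probability mass is negligible.
    """
    win = draw = defeat = 0.0
    for goals_1 in range(max_goals + 1):
        p_1 = poisson_pmf(goals_1, rate_1)
        for goals_2 in range(max_goals + 1):
            p = p_1 * poisson_pmf(goals_2, rate_2)
            if goals_1 > goals_2:
                win += p
            elif goals_1 == goals_2:
                draw += p
            else:
                defeat += p
    return win, draw, defeat


if __name__ == "__main__":
    # Hypothetical expected goals for the two teams of one match.
    win, draw, defeat = outcome_probabilities(1.6, 1.1)
    print(f"win={win:.3f} draw={draw:.3f} defeat={defeat:.3f}")
```

Summing over the score grid keeps the sketch free of external dependencies; a simulation-based variant would draw the two goal counts repeatedly instead.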

Corresponding author: Andreas Groll, Department of Mathematics, LMU Munich


A Correlation structure of the EURO 2004 and 2008 data

Table 6

Correlation matrix of the considered metric variables for the EURO 2004 and 2008.


B Alternative predictions of the EURO 2012

Figure 4

Estimated results of the knockout stage for the EURO 2012 using prediction method (b).

Table 7

Estimated group stage results together with final group standings for the EURO 2012 using prediction method (b).

  1.

    The German state betting agency ODDSET ranked Spain in third place among the favorites for the EURO 2008 with odds of 6.50, behind Germany (4.50) and Italy (5.50). In statistics, odds usually represent the ratio of the probability that an event will happen to the probability that it will not happen; European bookmakers, however, specify the gross ratio, i.e. the ratio of the amount paid out to the stake. So putting €1 on Spain as the EURO 2008 champion would have returned €6.50. Thus, European odds can be directly transformed into probabilities by taking the inverse and adjusting for the bookmakers’ margins. Before the FIFA World Cup 2010, Spain was ranked in first place among the favorites, together with Brazil, with odds of 5.00.

  2.

    The German state betting agency ODDSET ranked Greece in 12th place among the favorites for the EURO 2004 with odds of 45.00.

  3.

    Although this represents a quite small basis of data, we abstain from using earlier European championships, as one of our main objectives is to analyze the explanatory power of bookmakers’ odds together with many additional, potentially influential covariates. Unfortunately, the possibility of betting on the overall cup winner before the start of the tournament is quite novel. The German state betting agency ODDSET, for example, offered this bet for the first time at the EURO 2004.

  4.

    There are countless examples of such events in history, across all competitions. We want to mention only some of the most famous ones: Germany’s first World Cup success in Switzerland 1954, known as the “miracle from Bern”; Greece’s victory at the EURO 2004 (compare footnote 2); FC Porto’s triumph in the UEFA Champions League season 2003/2004.

  5.

    The GDP per capita is the gross domestic product divided by midyear population. The GDP is the sum of gross values added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products.

  6.

    We had to resort to different sources in order to collect data for all participating countries at the EURO 2004, 2008 and 2012. Amongst the most useful ones are, and For some years the populations of Russia and Ukraine had to be searched individually.

  7.

    Unfortunately, the archive of the webpage was not established until 4 October 2004, so the average market values of the national teams that we used for the EURO 2004 can only be seen as a rough approximation, as market values certainly changed after the EURO 2004.

  8.

    Note that European national teams also gain UEFA team points for each game played in the most recently completed full cycle of both the latest FIFA World Cup and the European championship, with additional points for each game played in the latest completed half cycle. Here a full cycle is defined as all qualifying games and final tournament games, whereas a half cycle is defined as all games played in the latest qualifying round only. Similar to the FIFA points, a time-dependent weight adjustment is used, giving both the latest full cycle and the half cycle twice the weight of the older full cycle. Thus, the UEFA team points would reflect a lot of information about the current strength of a national team in a European-wide comparison, but as the UEFA changed the coefficient ranking system in 2008, we focused on the UEFA club ranking.

  9.

    Note that this variable is not available from any soccer data provider and thus had to be counted “by hand.”

  10.

    The two variables “maximum number of teammates” and “second maximum number of teammates” are highly (negatively) correlated with the number of different clubs at which the players are under contract, and hence also include information about the structure of the teams’ squads. Therefore, we did not consider the number of different clubs as a separate variable.

  11.

    This variable is available on several soccer data providers, see for example

  12.

    As we are in a matched-pair design, we do not exclude single observations from the training data, but single matches.

  13.

    A closer look at the coefficient paths of this model shows that for slightly smaller values of the tuning parameter than the selected one, the variables ODDSET odds and fairness would have been included. Besides, in most of the training data sets both ODDSET odds and fairness were included at the optimal tuning parameter.

  14.

    In comparison to Model 2, several variables are no longer selected by glmmLasso based on LOOCV when the variable fairness (V2) is excluded. This may be due to the considerable correlations between fairness and these variables, e.g. $\mathrm{cor}_{V2,V10} = -0.29$, $\mathrm{cor}_{V2,V11} = -0.16$ and $\mathrm{cor}_{V2,V12} = -0.16$ (see Table 6 in Appendix A).

  15.

    Three-way odds consider only the tendency of a match, with the possible outcomes win for Team 1, draw, or defeat of Team 1, and are usually fixed some days before the corresponding match takes place.

  16.

    The transformed probabilities only serve as an approximation, based on the assumption that the bookmakers’ margins follow a discrete uniform distribution on the three possible match tendencies.

  17.

    For convenience we suppress the index t for both teams here, which indicates the number of the game for a team, as well as the indices j and

    corresponding to the match-specific random effects. As the match under consideration could have a different number in the individual match numbering of each team, one should correctly write
    if Team k and Team l are facing each other in a certain match j, where the superscript indicates that the estimate depends on the opponent’s covariates.

  18.

    Similar to footnote 17, in the following we suppress both the indices j and

    corresponding to the match-specific random effects and the index for the match numbering as well as the superscripts for both teams, in order to keep the notation simple. Note that Ireland and Ukraine did not qualify for either EURO 2004 or 2008, so no random effects estimates exist for these two teams and their random effects are set to zero. Moreover, the match-specific random effects estimates cannot be used for the prediction of new matches.
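
The conversion from European (decimal) odds to probabilities mentioned in footnotes 1 and 16 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name and the three-way odds 2.10, 3.30 and 3.60 are hypothetical, and the bookmaker's margin is removed by rescaling the inverse odds so that they sum to one, which is one common reading of the margin adjustment and not necessarily the exact procedure used in the article.

```python
def implied_probabilities(decimal_odds):
    """Convert European (decimal) odds of mutually exclusive outcomes to probabilities.

    The inverse of each odd is a raw probability. The raw values sum to more
    than one because of the bookmaker's margin, so they are rescaled to sum
    to exactly one (an assumption of this sketch).
    """
    if any(odd <= 1.0 for odd in decimal_odds):
        raise ValueError("decimal odds must be greater than 1")
    raw = [1.0 / odd for odd in decimal_odds]
    overround = sum(raw)  # 1 plus the bookmaker's margin
    return [value / overround for value in raw]


if __name__ == "__main__":
    # Hypothetical three-way odds: win for Team 1, draw, defeat of Team 1.
    three_way_odds = [2.10, 3.30, 3.60]
    for odd, prob in zip(three_way_odds, implied_probabilities(three_way_odds)):
        print(f"odds {odd:.2f} -> probability {prob:.3f}")
```

The same function applies to tournament-winner odds, provided the odds of all competing teams are passed in, since the rescaling only makes sense over a complete set of outcomes.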



Published Online: 2013-03-30

©2013 by Walter de Gruyter Berlin Boston
