Akaike, H. 1973. “Information Theory and the Extension of the Maximum Likelihood Principle.” Second International Symposium on Information Theory 267–281.
Bates, D. and M. Maechler. 2010. lme4: Linear Mixed-Effects Models Using S4 classes. R package version 0.999375-34.
Broström, G. 2009. glmmML: Generalized Linear Models With Clustering. R package version 0.81-6.
Brown, T. D., J. L. V. Raalte, B. W. Brewer, C. R. Winter, A. E. Cornelius, and M. B. Andersen. 2002. “World Cup Soccer Home Advantage.” Journal of Sport Behavior 25:134–144.
Carlin, J. B., L. C. Gurrin, J. A. C. Sterne, R. Morley, and T. Dwyer. 2005. “Regression Models for Twin Studies: A Critical Review.” International Journal of Epidemiology 34:1089–1099.
Dawson, P. and S. Dobson. 2010. “The Influence of Social Pressure and Nationality on Individual Decisions. Evidence From the Behaviour of Referees.” Journal of Economic Psychology 31:181–191.
Dyte, D. and S. R. Clarke. 2000. “A Ratings Based Poisson Model for World Cup Soccer Simulation.” Journal of the Operational Research Society 51(8):993–998.
Eugster, M. J. A., J. Gertheiss, and S. Kaiser. 2011. “Having the Second Leg at Home-Advantage in the UEFA Champions League Knockout Phase?” Journal of Quantitative Analysis in Sports 7(1).
Fahrmeir, L. and G. Tutz. 2001. Multivariate Statistical Modelling Based on Generalized Linear Models (2nd ed.). New York: Springer-Verlag.
Frohwein, T. 2010, June. Die falschen Pferde. In: e-politik.de (08.06.2010), available at: http://www.e-politik.de/lesen/artikel/2010/die-falschen-pferde/ (12.06.2012).
Gerhards, J., M. Mutz, and G. G. Wagner. 2012. “Keiner Kommt an Spanien Vorbei-außer dem Zufall.” DIW-Wochenbericht 24:14–20.
Gerhards, J. and G. G. Wagner. 2008. “Market Value Versus Accident-Who Becomes European Soccer Champion?” DIW-Wochenbericht 24:236–328.
Gerhards, J. and G. G. Wagner. 2010. “Money and a Little Bit of Chance: Spain Was Odds-On Favourite of the Football Worldcup.” DIW-Wochenbericht 29:12–15.
Groll, A. 2011a. glmmLasso: Variable Selection for Generalized Linear Mixed Models by L1-Penalized Estimation. R package version 1.1.0.
Groll, A. 2011b. Variable Selection by Regularization Methods for Generalized Mixed Models. Ph.D. thesis, University of Munich. Göttingen: Cuvillier Verlag.
Leitner, C., A. Zeileis, and K. Hornik. 2008. “Who is Going to Win the EURO 2008? (A statistical investigation of bookmakers odds).” Research report series, Department of Statistics and Mathematics, University of Vienna.
Leitner, C., A. Zeileis, and K. Hornik. 2010a. “Forecasting Sports Tournaments by Ratings of (Prob)abilities: A Comparison for the EURO 2008.” International Journal of Forecasting 26(3):471–481.
Leitner, C., A. Zeileis, and K. Hornik. 2010b. “Forecasting the Winner of the FIFA World Cup 2010.” Research report series, Department of Statistics and Mathematics, University of Vienna.
Leitner, C., A. Zeileis, and K. Hornik. 2011. “Bookmaker Consensus and Agreement for the UEFA Champions League 2008/09.” IMA Journal of Management Mathematics 22(2):183–194.
Lin, X. and N. E. Breslow. 1996. “Bias Correction in Generalized Linear Mixed Models with Multiple Components of Dispersion.” Journal of the American Statistical Association 91:1007–1016.
Nevill, A., N. Balmer, and M. Williams. 1999. “Crowd Influence on Decisions in Association Football.” The Lancet 353(9162):1416.
Pinheiro, J. C. and D. M. Bates. 2000. Mixed-Effects Models in S and S-Plus. New York: Springer.
Pollard, R. 2008. “Home Advantage in Football: A Current Review of an Unsolved Puzzle.” The Open Sports Sciences Journal 1:12–14.
Pollard, R. and G. Pollard. 2005. “Home Advantage in Soccer: A Review of its Existence and Causes.” International Journal of Soccer and Science Journal 3(1):25–33.
Schelldorfer, J. and P. Bühlmann. 2011. “GLMMLasso: An Algorithm for High-Dimensional Generalized Linear Mixed Models Using L1-Penalization.” Preprint, ETH Zurich. http://stat.ethz.ch/people/schell.
Stoy, V., R. Frankenberger, D. Buhr, L. Haug, B. Springer, and J. Schmid. 2010. “Das Ganze ist Mehr als die Summe seiner Lichtgestalten. Eine ganzheitliche Analyse der Erfolgschancen bei der Fußballweltmeisterschaft 2010.” Working Paper 46, Eberhard Karls University, Tübingen, Germany.
Venables, W. N. and B. D. Ripley. 2002. Modern Applied Statistics with S (4th ed.). New York: Springer.
Yang, H. 2007. Variable Selection Procedures for Generalized Linear Mixed Models in Longitudinal Data Analysis. Ph.D. thesis, North Carolina State University.
Zeileis, A., C. Leitner, and K. Hornik. 2012. History repeating: Spain beats Germany in the EURO 2012 final. Working paper, Faculty of Economics and Statistics, University of Innsbruck.
About the article
Published Online: 2013-03-30
The German state betting agency ODDSET ranked Spain in third place among the favorites for the EURO 2008 with odds of 6.50, behind Germany (4.50) and Italy (5.50). In statistics, odds usually represent the ratio of the probability that an event will happen to the probability that it will not happen; European bookmakers, however, quote the gross ratio of the amount paid out to the stake. So putting €1 on Spain as the EURO 2008 champion would have returned €6.50. Thus, European odds can be directly transformed into probabilities by taking the inverse and adjusting for the bookmakers’ margins. Before the FIFA World Cup 2010, Spain was ranked in first place among the favorites, together with Brazil, with odds of 5.00.
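The relation between European quotes and statistical odds described in this footnote can be illustrated with a minimal Python sketch. The function names are hypothetical, and the margin adjustment is deliberately left out, because it would require the quotes of all participating teams and the footnote reports only three of them.

```python
def implied_probability(decimal_odds: float) -> float:
    """Raw implied probability of a European (decimal) quote.

    The bookmaker's margin is NOT removed here, so the values over all
    teams of a tournament would sum to more than 1.
    """
    if decimal_odds <= 1.0:
        raise ValueError("European odds must be greater than 1")
    return 1.0 / decimal_odds


def statistical_odds(probability: float) -> float:
    """Odds in the statistical sense: P(event) / P(no event)."""
    return probability / (1.0 - probability)


# Quotes taken from the footnote (ODDSET, EURO 2008).
quotes = {"Germany": 4.50, "Italy": 5.50, "Spain": 6.50}
raw = {team: implied_probability(q) for team, q in quotes.items()}

for team, p in raw.items():
    print(f"{team}: raw probability {p:.3f}, statistical odds {statistical_odds(p):.3f}")
```

For a quote o, the raw probability is 1/o and the corresponding statistical odds are 1/(o − 1), which shows how the two notions of odds differ.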
The German state betting agency ODDSET ranked Greece in 12th place among the favorites for the EURO 2004 with odds of 45.00.
Although this represents a rather small data basis, we abstain from using earlier European championships, as one of our main objectives is to analyze the explanatory power of bookmakers’ odds together with many additional, potentially influential covariates. Unfortunately, the possibility of betting on the overall cup winner before the start of the tournament is relatively recent. The German state betting agency ODDSET, for example, offered this bet for the first time at the EURO 2004.
There are countless examples of such events in history, across all competitions. We mention only some of the most famous ones: Germany’s first World Cup success in Switzerland in 1954, known as the “miracle of Bern”; Greece’s victory at the EURO 2004 (compare footnote 1); and FC Porto’s triumph in the UEFA CL season 2003/2004.
The GDP per capita is the gross domestic product divided by midyear population. The GDP is the sum of gross values added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products.
We had to resort to different sources in order to collect data for all participating countries at the EURO 2004, 2008 and 2012. Amongst the most useful ones are http://www.wko.at, http://www.statista.com/ and http://epp.eurostat.ec.europa.eu. For some years the populations of Russia and Ukraine had to be searched individually.
Unfortunately, the archive of the webpage was not established until 4 October 2004, so the average market values of the national teams that we used for the EURO 2004 can only be seen as a rough approximation, as market values certainly changed after the EURO 2004.
Note that European national teams also gain UEFA team points. Points are awarded for each game played in the most recently completed full cycle of both the latest FIFA World Cup and European championship, plus points for each game played in the latest completed half cycle (a full cycle is defined as all qualifying games and final tournament games, whereas a half cycle is defined as all games played in the latest qualifying round only). Similar to the FIFA points, a time-dependent weighting is used, which gives both the latest full cycle and the latest half cycle double the weight of the older full cycle. Thus, the UEFA team points would reflect a lot of information about the current strength of a national team in a European-wide comparison, but as the UEFA changed the coefficient ranking system in 2008, we focused on the UEFA club ranking.
Note that this variable is not available from any soccer data provider and thus had to be counted “by hand.”
The two variables “maximum number of teammates” and “second maximum number of teammates” are highly (negatively) correlated with the number of different clubs at which the players are under contract, and hence also include information about the structure of the teams’ squads. Therefore, we did not consider the number of different clubs as a separate variable.
This variable is available from several soccer data providers; see, for example, http://www.kicker.de/.
As we are in a matched-pair design, we do not exclude single observations from the training data, but single matches.
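A minimal Python sketch of such a match-wise leave-one-out split follows. It assumes, for illustration only, that every row of the data carries a match identifier and that each match contributes one row per team; the function name and the toy identifiers are hypothetical.

```python
from collections import defaultdict


def leave_one_match_out(match_ids):
    """Yield (train_idx, test_idx) pairs, holding out all rows of one match."""
    rows_by_match = defaultdict(list)
    for idx, match in enumerate(match_ids):
        rows_by_match[match].append(idx)

    for test_idx in rows_by_match.values():
        held_out = set(test_idx)
        # Both rows of the held-out match leave the training data together.
        train_idx = [i for i in range(len(match_ids)) if i not in held_out]
        yield train_idx, test_idx


# Hypothetical toy data: three matches with two rows (one per team) each.
match_ids = ["m1", "m1", "m2", "m2", "m3", "m3"]
splits = list(leave_one_match_out(match_ids))

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Removing a single row instead would leave the opponent's observation of the same match in the training data, which is what the match-wise split avoids.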
A closer look at the coefficient paths of this model shows that for slightly smaller values of the tuning parameter than the selected one, the variables ODDSET odds and fairness would have been included. Besides, in most of the training data sets both ODDSET odds and fairness have been included at the optimal tuning parameter.
Compared with Model 2, glmmLasso based on LOOCV no longer selects several variables when the variable fairness (V2) is excluded. This may be due to the considerable correlations between fairness and these variables, e.g. cor(V2, V10) = −0.29, cor(V2, V11) = −0.16 and cor(V2, V12) = −0.16 (see Table 6 in Appendix A).
Three-way odds consider only the tendency of a match, with the three possible outcomes being a win for Team 1, a draw, or a defeat of Team 1, and are usually fixed some days before the corresponding match takes place.
The transformed probabilities only serve as an approximation, based on the assumption that the bookmakers’ margins follow a discrete uniform distribution on the three possible match tendencies.
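One common way to carry out such a transformation is to invert the three quotes and rescale them so that they sum to one. The Python sketch below follows that reading of the assumption (the margin is spread evenly in relative terms over the three tendencies); it is not necessarily the exact computation used here, and the function name and the example quotes are hypothetical.

```python
def three_way_probabilities(odds_win, odds_draw, odds_loss):
    """Approximate outcome probabilities from European three-way odds.

    Assumption: the bookmaker's margin affects all three tendencies by
    the same relative amount, so rescaling the inverse odds removes it.
    """
    inverse = [1.0 / o for o in (odds_win, odds_draw, odds_loss)]
    overround = sum(inverse)  # exceeds 1 by the bookmaker's margin
    return [value / overround for value in inverse]


# Hypothetical quotes for a single match (not taken from the data).
probs = three_way_probabilities(2.10, 3.20, 3.40)
print([round(p, 3) for p in probs])
```

The rescaled values sum to one by construction, which is why they can be used as approximate probabilities even though the true allocation of the margin is unknown.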
For convenience we suppress the index t for both teams here, which indicates the number of the game for a team, as well as the indices j and
Similar to footnote 3, in the following we suppress both the indices j and