# Prediction of major international soccer tournaments based on team-specific regularized Poisson regression: An application to the FIFA World Cup 2014

Andreas Groll, Gunther Schauberger and Gerhard Tutz

## Abstract

This article proposes an approach for the analysis and prediction of international soccer match results. It is based on a regularized Poisson regression model that includes various potentially influential covariates describing the national teams’ success in previous FIFA World Cups. In addition, differences of team-specific effects are incorporated within the generalized linear model (GLM) framework. To achieve variable selection and shrinkage, we use tailored Lasso approaches. Two models for the prediction of the FIFA World Cup 2014 are fitted to data from preceding FIFA World Cups and investigated. Using the model estimates, the FIFA World Cup 2014 is simulated repeatedly, and winning probabilities are obtained for all teams. Both models favor the actual FIFA World Champion, Germany.
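The simulation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the team effects, the intercept, the four-team bracket, and the coin flip used to settle draws are all assumptions made for illustration, and the two teams' goals are drawn independently. Each team's expected number of goals follows a log-linear predictor in the difference of team effects, and a team's winning probability is estimated as its share of simulated titles.

```python
import math
import random
from collections import Counter


def poisson_draw(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def expected_goals(intercept, effect_own, effect_opp):
    """Log-linear predictor: the rate depends on the difference of team effects."""
    return math.exp(intercept + effect_own - effect_opp)


def play_knockout(team_a, team_b, effects, rng, intercept=0.2):
    """Simulate one knockout match; a draw is settled by a fair coin (an assumed stand-in for penalties)."""
    goals_a = poisson_draw(expected_goals(intercept, effects[team_a], effects[team_b]), rng)
    goals_b = poisson_draw(expected_goals(intercept, effects[team_b], effects[team_a]), rng)
    if goals_a == goals_b:
        return team_a if rng.random() < 0.5 else team_b
    return team_a if goals_a > goals_b else team_b


def title_probabilities(effects, bracket, runs, seed=0):
    """Estimate each team's title probability as its share of simulated tournament wins."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(runs):
        (a, b), (c, d) = bracket
        finalist_1 = play_knockout(a, b, effects, rng)
        finalist_2 = play_knockout(c, d, effects, rng)
        wins[play_knockout(finalist_1, finalist_2, effects, rng)] += 1
    return {team: wins[team] / runs for team in effects}


# Hypothetical team effects, not estimates from the article.
effects = {"A": 0.6, "B": 0.0, "C": 0.2, "D": -0.4}
probs = title_probabilities(effects, bracket=(("A", "B"), ("C", "D")), runs=20000)
```

In this toy setting the team with the largest effect collects the largest share of titles. A full tournament simulation would additionally need the group stage and the official bracket.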

Corresponding author: Andreas Groll, Department of Mathematics, Ludwig-Maximilians-University, Theresienstr. 39, 80333 Munich, e-mail:

## Acknowledgments

We are grateful to Falk Barth and Johann Summerer from the ODDSET-Team for providing us with all necessary odds data and to Sven Grothues from the Transfermarkt.de-Team for the pleasant collaboration. From a methodological and statistical perspective, the article has benefited strongly from suggestions by Helmut Küchenhoff and Christian Groll. The insightful discussions with the hobby football expert Tim Frohwein also helped considerably to improve the article.

## Appendix

### Prediction results and most probable tournament outcome for the WC1994 data

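Table 9

Estimated probabilities (in %) for reaching the different stages in the FIFA World Cup 2014 for all 32 teams, based on 100,000 simulation runs of the FIFA World Cup 2014 and on the estimates of the WC1994 data, together with winning probabilities based on the ODDSET odds.

| | Team | Round of 16 | Quarter finals | Semi finals | Final | World Champion | ODDSET |
|---|---|---|---|---|---|---|---|
| 1. | GER | 86.1 | 68.1 | 52.3 | 32.8 | 20.5 | 14.2 |
| 2. | ESP | 91.3 | 64.1 | 47.5 | 31.7 | 19.5 | 10.9 |
| 3. | BRA | 93.0 | 64.9 | 48.2 | 30.8 | 19.1 | 20.3 |
| 4. | POR | 73.3 | 51.1 | 35.1 | 18.7 | 9.3 | 2.4 |
| 5. | URU | 71.3 | 50.7 | 22.5 | 11.5 | 5.1 | 2.8 |
| 6. | BEL | 82.8 | 36.9 | 22.4 | 10.2 | 4.3 | 5.9 |
| 7. | ITA | 67.2 | 46.3 | 19.5 | 9.4 | 4.0 | 3.5 |
| 8. | SUI | 72.3 | 45.6 | 19.7 | 8.5 | 3.5 | 0.7 |
| 9. | ARG | 77.6 | 44.5 | 18.9 | 7.8 | 3.1 | 14.2 |
| 10. | CRO | 64.9 | 26.2 | 13.8 | 6.0 | 2.1 | 0.7 |
| 11. | FRA | 62.2 | 35.4 | 13.7 | 5.3 | 1.9 | 3.5 |
| 12. | COL | 76.3 | 33.4 | 10.9 | 4.1 | 1.3 | 3.9 |
| 13. | ENG | 47.3 | 28.1 | 9.5 | 3.7 | 1.3 | 3.5 |
| 14. | CHI | 50.1 | 18.0 | 8.6 | 3.3 | 1.0 | 2.0 |
| 15. | NED | 44.9 | 15.1 | 6.9 | 2.5 | 0.7 | 3.5 |
| 16. | BIH | 56.6 | 25.2 | 7.9 | 2.4 | 0.7 | 0.5 |
| 17. | ALG | 49.3 | 13.2 | 5.6 | 1.6 | 0.4 | 0.1 |
| 18. | CIV | 61.3 | 21.4 | 5.5 | 1.7 | 0.4 | 0.7 |
| 19. | USA | 23.2 | 10.7 | 4.8 | 1.5 | 0.4 | 0.7 |
| 20. | ECU | 38.8 | 17.3 | 4.8 | 1.3 | 0.3 | 0.7 |
| 21. | NGA | 39.3 | 14.2 | 3.4 | 0.8 | 0.2 | 0.4 |
| 22. | RUS | 42.7 | 9.0 | 3.5 | 0.8 | 0.2 | 1.2 |
| 23. | GHA | 17.4 | 7.2 | 2.9 | 0.7 | 0.2 | 0.7 |
| 24. | MEX | 28.0 | 6.9 | 2.4 | 0.7 | 0.2 | 0.7 |
| 25. | JPN | 43.0 | 11.5 | 2.2 | 0.5 | 0.1 | 0.5 |
| 26. | HON | 26.6 | 9.8 | 2.2 | 0.5 | 0.1 | 0.1 |
| 27. | IRN | 26.4 | 7.9 | 1.6 | 0.3 | 0.1 | 0.1 |
| 28. | KOR | 25.2 | 3.8 | 1.1 | 0.2 | 0.0 | 0.2 |
| 29. | CRC | 14.2 | 5.6 | 1.0 | 0.2 | 0.0 | 0.1 |
| 30. | CMR | 14.0 | 2.3 | 0.6 | 0.1 | 0.0 | 0.2 |
| 31. | AUS | 13.7 | 2.4 | 0.6 | 0.1 | 0.0 | 0.2 |
| 32. | GRE | 19.4 | 3.1 | 0.3 | 0.1 | 0.0 | 0.7 |
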
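
The ODDSET column reports winning probabilities derived from bookmaker odds. One common way to obtain such probabilities is sketched below. It is an assumption that the conversion works this way here: the function name and the quotes are hypothetical, and the bookmaker's margin is assumed to be spread proportionally over all teams. Each decimal quote is inverted and the results are rescaled to sum to one.

```python
def odds_to_probabilities(decimal_odds):
    """Convert decimal odds to probabilities that sum to one.

    The inverse of each quote is a raw probability inflated by the
    bookmaker's margin; dividing by the total removes that margin
    under the assumption that it is spread proportionally.
    """
    raw = {team: 1.0 / quote for team, quote in decimal_odds.items()}
    total = sum(raw.values())
    return {team: value / total for team, value in raw.items()}


# Hypothetical quotes for a three-team market, not the ODDSET data.
quotes = {"X": 2.0, "Y": 4.0, "Z": 5.0}
implied = odds_to_probabilities(quotes)
```

For these quotes the raw inverse odds sum to 0.95, so each raw value is divided by 0.95 to obtain the normalized probability.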
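
Table 10

Estimated (adapted) probabilities (in %) for reaching the next stages in the FIFA World Cup 2014 for all 32 teams based on 100,000 simulation runs of the FIFA World Cup 2014.

| | Team | Round of 16 | Quarter finals | Semi finals | Final | World Champion |
|---|---|---|---|---|---|---|
| 1. | GER | 86.1 | 81.4 | 68.4 | 53.2 | 73.9 |
| 2. | ARG | 77.6 | 48.4 | 47.5 | 54.8 | 26.1 |
| 3. | BRA | 93.0 | 76.6 | 73.3 | 46.8 | 0.0 |
| 4. | NED | 44.9 | 66.0 | 67.0 | 45.2 | 0.0 |
| 5. | BEL | 82.8 | 65.7 | 52.5 | 0.0 | 0.0 |
| 6. | CRC | 14.2 | 68.0 | 33.0 | 0.0 | 0.0 |
| 7. | FRA | 62.2 | 68.8 | 31.6 | 0.0 | 0.0 |
| 8. | COL | 76.3 | 41.6 | 26.7 | 0.0 | 0.0 |
| 9. | URU | 71.3 | 58.4 | 0.0 | 0.0 | 0.0 |
| 10. | SUI | 72.3 | 51.6 | 0.0 | 0.0 | 0.0 |
| 11. | USA | 23.2 | 34.3 | 0.0 | 0.0 | 0.0 |
| 12. | MEX | 28.0 | 34.0 | 0.0 | 0.0 | 0.0 |
| 13. | GRE | 19.4 | 32.0 | 0.0 | 0.0 | 0.0 |
| 14. | NGA | 39.3 | 31.2 | 0.0 | 0.0 | 0.0 |
| 15. | CHI | 50.1 | 23.4 | 0.0 | 0.0 | 0.0 |
| 16. | ALG | 49.3 | 18.6 | 0.0 | 0.0 | 0.0 |
| 17. | ESP | 91.3 | 0.0 | 0.0 | 0.0 | 0.0 |
| 18. | POR | 73.3 | 0.0 | 0.0 | 0.0 | 0.0 |
| 19. | ITA | 67.2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 20. | CRO | 64.9 | 0.0 | 0.0 | 0.0 | 0.0 |
| 21. | CIV | 61.3 | 0.0 | 0.0 | 0.0 | 0.0 |
| 22. | BIH | 56.6 | 0.0 | 0.0 | 0.0 | 0.0 |
| 23. | ENG | 47.3 | 0.0 | 0.0 | 0.0 | 0.0 |
| 24. | JPN | 43.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25. | RUS | 42.7 | 0.0 | 0.0 | 0.0 | 0.0 |
| 26. | ECU | 38.8 | 0.0 | 0.0 | 0.0 | 0.0 |
| 27. | HON | 26.6 | 0.0 | 0.0 | 0.0 | 0.0 |
| 28. | IRN | 26.4 | 0.0 | 0.0 | 0.0 | 0.0 |
| 29. | KOR | 25.2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 30. | GHA | 17.4 | 0.0 | 0.0 | 0.0 | 0.0 |
| 31. | CMR | 14.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 32. | AUS | 13.7 | 0.0 | 0.0 | 0.0 | 0.0 |

After each round, the data set (WC1994) is extended by the matches already played and the model is refitted. Only actual matches from the World Cup are simulated.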

Table 11

Most probable final group standings together with the corresponding probabilities for the FIFA World Cup 2014 based on 100,000 simulation runs and on the estimates of the WC1994 data.

| | Group A (43%) | Group B (33%) | Group C (24%) | Group D (22%) |
|---|---|---|---|---|
| 1. | BRA | ESP | COL | URU |
| 2. | CRO | CHI | CIV | ITA |
| 3. | MEX | NED | JPN | ENG |
| 4. | CMR | AUS | GRE | CRC |

| | Group E (22%) | Group F (24%) | Group G (36%) | Group H (24%) |
|---|---|---|---|---|
| 1. | SUI | ARG | GER | BEL |
| 2. | FRA | BIH | POR | ALG |
| 3. | ECU | NGA | GHA | RUS |
| 4. | HON | IRN | USA | KOR |

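
A most probable final standing of this kind can be read off the simulation output by counting how often each complete ordering of a group occurs. The sketch below illustrates that counting step only. The function name and the five example runs are hypothetical, and it is an assumption that the reported percentages are relative frequencies of complete orderings.

```python
from collections import Counter


def most_probable_standing(simulated_standings):
    """Return the most frequent final ordering and its relative frequency."""
    counts = Counter(tuple(order) for order in simulated_standings)
    order, hits = counts.most_common(1)[0]
    return order, hits / len(simulated_standings)


# Hypothetical outcomes of five simulation runs for one group.
runs = [
    ["T1", "T2", "T3", "T4"],
    ["T1", "T2", "T3", "T4"],
    ["T2", "T1", "T3", "T4"],
    ["T1", "T2", "T3", "T4"],
    ["T1", "T3", "T2", "T4"],
]
order, share = most_probable_standing(runs)
```

With four teams there are 24 possible orderings, so even the most frequent one can have a modest probability, which is consistent with values between 22% and 43% in the table.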
Figure 7: Most probable course of the knockout stage together with corresponding probabilities for the FIFA World Cup 2014 based on 100,000 simulation runs and on the estimates of the WC1994 data.


Published Online: 2015-5-16
Published in Print: 2015-6-1