Published by De Gruyter June 27, 2019

Bayesian statistics meets sports: a comprehensive review

Edgar Santos-Fernandez, Paul Wu and Kerrie L. Mengersen

Abstract

Bayesian methods are becoming increasingly popular in sports analytics. Identified advantages of the Bayesian approach include the ability to model complex problems, obtain probabilistic estimates and predictions that account for uncertainty, combine information sources and update learning as new data become available. The volume and variety of data produced in sports activities over recent years and the availability of software packages for Bayesian computation have contributed significantly to this growth. This comprehensive survey reviews and characterizes the latest advances in Bayesian statistics in sports, including methods and applications. We found that a large proportion of these articles focus on modeling or predicting the outcome of sports games and on the development of statistics that provide a better picture of athletes’ performance. We provide a description of some of the advances in basketball, football and baseball. We also summarise the sources of data used for the analysis and the most commonly used software for Bayesian computation. We found that the number of publications between 2013 and 2018 was similar to the number published in the three previous decades, which is an indication of the growing adoption rate of Bayesian methods in sports.

1 Introduction

Statistical techniques generally fall within the “Bayesian” category when they rely on Bayes' theorem, treat unknown parameters probabilistically and give a subjective treatment to probabilities (Bernardo and Smith 2009). Bayesian statistics has been rapidly gaining traction in sports science in recent years. The large number of recent Bayesian articles in the sports literature motivated us to review some of the most commonly used techniques and methods. The main questions we addressed are: (1) what are the main developments, (2) what are the most popular techniques, (3) in which sports are they applied and (4) what are the main challenges? The purpose of the article is not, however, to establish a direct comparison between frequentist and Bayesian methods.

A wide range of Bayesian techniques can be found in the sports literature. Examples include Bayesian hierarchical models (e.g. Reich et al. 2006; Albert 2008; Baio and Blangiardo 2010; Miller et al. 2014), Bayesian regression (BR) (Jensen, Shirley, and Wyner 2009b; Albert 2016; Deshpande and Jensen 2016; Silva and Swartz 2016; Boys and Philipson 2018), spatial and spatio-temporal analysis (Jensen et al. 2009b; Yousefi and Swartz 2013; Miller et al. 2014) and hidden Markov models (HMM) (Franks et al. 2015).

Modern sports science is both characterized and challenged by the volume and variety of available data. Good examples of this are the basketball STATS SportVU tracking technology, the MLB baseball PITCHf/x and the golf ShotLink system. While traditional statistical analyses focused on points scored, averages and number of goals, recent advances in sports analytics consider more complex issues such as the interaction of the players in offensive and defensive actions. See, for example, Gudmundsson and Horton (2017).

There are several theoretical and computational advantages to choosing Bayesian techniques for modelling (Bernardo and Smith 2009; Berger 2013). More specifically, in the context of sports, a growing number of scientists are adopting Bayesian methods because they make it possible to:

  1. incorporate expert information or prior beliefs,
  2. use Bayesian learning, where the current posterior distribution becomes the prior for future data,
  3. provide probabilistic rather than point estimates,
  4. obtain posterior distributions for the parameters of interest,
  5. include latent variables,
  6. model complex problems,
  7. efficiently integrate and combine data that come from different sources,
  8. regularly update the model when new data become available,
  9. treat missing data effectively,
  10. deal more effectively with small datasets, using prior information to improve the parameter estimates,
  11. use non-standard distributions,
  12. obtain probabilistic rankings of players or teams using the MCMC chains,
  13. make predictions that take uncertainty into consideration,
  14. capture spatial neighbouring information using prior distributions and incorporate spatial dependency.
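Item 12 can be made concrete with a short Python sketch. It assumes that posterior draws of team strengths are already available; here they are simulated from hypothetical normal distributions rather than taken from a real MCMC run. The probability that a team occupies a given rank is then estimated as the fraction of draws in which it does so. The team labels, the strengths and the function name `rank_probabilities` are illustrative assumptions.

```python
import random

def rank_probabilities(draws):
    """draws: list of posterior draws, each a dict {team: strength}.
    Returns {team: [P(rank 1), P(rank 2), ...]} estimated from the draws."""
    teams = list(draws[0])
    counts = {t: [0] * len(teams) for t in teams}
    for draw in draws:
        # Rank teams within this draw, strongest first
        ordered = sorted(teams, key=lambda t: draw[t], reverse=True)
        for position, team in enumerate(ordered):
            counts[team][position] += 1
    n = len(draws)
    return {t: [c / n for c in counts[t]] for t in teams}

# Hypothetical draws standing in for real MCMC output
rng = random.Random(1)
posterior_means = {"A": 0.6, "B": 0.5, "C": 0.1}
draws = [{t: rng.gauss(m, 0.3) for t, m in posterior_means.items()}
         for _ in range(4000)]

rank_probs = rank_probabilities(draws)
for team, probs in rank_probs.items():
    print(team, [round(p, 3) for p in probs])
```

Each row of the output is a full distribution over ranks, so the uncertainty in a ranking is reported rather than hidden behind a single ordering.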

The rest of the article has been organised as follows. The next section discusses the method we adopted to carry out the review. Next, in the results section, we examine several of these Bayesian techniques and then we discuss the main developments undergone by relevant sports.

2 Materials and methods for the comprehensive review process

The literature review was conducted according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (Liberati et al. 2009), with the aim of reducing publication bias as much as possible. We searched the Google Scholar, Scopus and PubMed databases using the keyword “sport*” along with: “Bayesian regression,” “Bayesian statistics,” “Gibbs sampler,” “Bayesian Hierarchical model,” “Empirical Bayes methods,” “Hidden Markov model,” “Markov chain Monte Carlo” or “MCMC,” “posterior” and “prior distribution,” “spatial analysis” and “spatio-temporal modeling.” Other related techniques were not included because they are not fully Bayesian, i.e. they do not give a subjective treatment to probabilities. For example, naïve Bayes is mentioned across sports papers, but although it uses Bayes' rule, no subjective interpretation of probability is given (Hand and Yu 2001), and it is therefore not considered fully Bayesian. Similarly, Empirical Bayes (EBA) does not qualify as fully Bayesian because the prior distribution is generally obtained from observed data. However, EBA methods were included in this review since they have enjoyed great popularity among practitioners in sports statistics for decades.

The search started on 05-Jan-2018 and ended on 31-Aug-2018. We focused on relevant peer-reviewed English-language journal articles and books published from 1985 onwards that used Bayesian statistical methods for the modelling and analysis of team and individual sports, including basketball, baseball, football, marathon, swimming, triathlon, etc. We also reviewed papers about relevant issues or technologies concerning these sports, including wearable technologies and doping. However, gambling, computer vision and video analytics are beyond the scope of this review.

We focused on (1) the statistical method, (2) the most relevant findings and the conclusions, (3) the area of application and type of sport, (4) the data sources including the season or competition and country and (5) the software they used for the analysis (if mentioned). Articles’ metadata (e.g. authors’ affiliation) was extracted from the PDF documents using R (R Core Team 2017).

3 Results

In total, n = 96 articles were initially identified from the database search, while 31 were found through the review process or identified by the authors in previous research. A total of n = 42 articles were excluded because they were off topic or non-Bayesian. Figure 1 depicts the review process.

Figure 1: Flow chart of the comprehensive review process based on the PRISMA (Liberati et al. 2009) methodology.

The authors of these publications were from the United States (36%), Australia (9%), the United Kingdom (8%), Canada (8%), Sweden (8%), Switzerland (7%), Brazil (4%), Germany (3%), the Netherlands (3%), Japan (3%) and Hong Kong (3%).

3.1 Bayesian statistical methods

Bayesian statistical models are based on the Bayes theorem. The posterior distribution for the parameter of interest θ is obtained using:

$$f(\theta \mid z) = \frac{f(z \mid \theta)\, f(\theta)}{\int f(z \mid \theta)\, f(\theta)\, d\theta} \tag{1}$$

where f(θ) and f(z|θ) are the prior distribution and likelihood, respectively.
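As a minimal numerical illustration of Eq. (1), the Python sketch below approximates a posterior on a grid: the prior is multiplied by the likelihood and divided by a Riemann sum that stands in for the integral in the denominator. The free-throw data (7 makes in 10 attempts), the uniform prior and the function name `grid_posterior` are hypothetical choices made only for this example.

```python
def grid_posterior(prior, likelihood, grid):
    """Posterior density on an evenly spaced grid: prior x likelihood,
    normalized by a Riemann sum approximating the integral in Eq. (1)."""
    step = grid[1] - grid[0]
    unnormalized = [prior(t) * likelihood(t) for t in grid]
    evidence = sum(unnormalized) * step
    return [u / evidence for u in unnormalized]

# Hypothetical data: a player makes 7 of 10 free throws
made, attempts = 7, 10
grid = [(i + 0.5) / 1000 for i in range(1000)]   # midpoints on (0, 1)
flat_prior = lambda theta: 1.0                   # uniform prior f(theta)
binomial_lik = lambda theta: theta**made * (1 - theta)**(attempts - made)

posterior = grid_posterior(flat_prior, binomial_lik, grid)
step = grid[1] - grid[0]
post_mean = sum(t * p for t, p in zip(grid, posterior)) * step
print(round(post_mean, 4))
```

With a uniform prior this posterior is a Beta(8, 4) distribution, so the grid-based mean can be checked against the exact value 8/12.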

The following subsections describe papers grouped by technique which includes Bayesian regression models and methods accounting for space and time.

3.1.1 Bayesian regression (BR)

Bayesian linear regression is the most common model of choice when assessing the association between a response variable $y$ and $k$ predictors $x = (x_1, x_2, \ldots, x_k)$. In the simplest formulation,

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i \tag{2}$$

where $i = 1, 2, \ldots, n$ is the observation number and $\varepsilon_i$ is the residual, which is assumed to be normally distributed with zero mean and constant variance $\sigma^2$. Priors are then placed on the parameter vector $\beta$ and on $\sigma^2$. See Gelman et al. (2014) for more details. Some examples of the use of this modeling paradigm are given below.
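To show how a prior on the regression coefficients combines with the data, the following Python sketch treats the simplest special case of Eq. (2): a single predictor, a known residual variance and a normal prior on the slope, for which the posterior is available in closed form. These simplifications, the toy data and the function name `posterior_slope` are assumptions made for illustration; the reviewed papers generally place priors on the variance as well and rely on MCMC.

```python
def posterior_slope(x, y, sigma2, prior_mean=0.0, prior_var=100.0):
    """Posterior mean and variance of beta in y_i = beta * x_i + eps_i,
    eps_i ~ N(0, sigma2) with sigma2 known, and prior beta ~ N(prior_mean, prior_var)."""
    prior_precision = 1.0 / prior_var
    data_precision = sum(xi * xi for xi in x) / sigma2
    post_var = 1.0 / (prior_precision + data_precision)
    weighted_sum = sum(xi * yi for xi, yi in zip(x, y)) / sigma2
    post_mean = post_var * (prior_precision * prior_mean + weighted_sum)
    return post_mean, post_var

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
slope_mean, slope_var = posterior_slope(x, y, sigma2=1.0)
print(round(slope_mean, 4), round(slope_var, 4))
```

The posterior mean is a precision-weighted compromise between the prior mean and the least-squares estimate, which is why a tighter prior pulls the estimate more strongly toward the prior mean.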

A Bayesian regression approach for small treatment effects, which are commonly encountered in sports performance studies, was proposed by Mengersen et al. (2016) as an alternative to the traditional magnitude-based inference suggested by Batterham and Hopkins (2006). These authors addressed the effect of altitude training regimens on running performance and blood parameters (hemoglobin mass, maximum blood lactate concentration) in triathlon. They considered G = 3 treatments, namely live high-train low (LHTL), intermittent hypoxic exposure (IHE) and placebo, with 8 participants per group. Another predictor in the model (X) was the change (in %) in training load before and after for each participant.

Letting $I_1$ and $I_2$ be indicator variables for treatments 1 and 2, respectively, this model can be described in a similar manner to Eq. (2) or, equivalently, in a hierarchical manner as:

$$y_{ij} \sim \mathcal{N}(\mu_{ij}, \sigma_j^2), \qquad \mu_{ij} = \beta_0 + \beta_1 X + \sum_{j=1}^{2} \beta_{j+1} I_j \tag{3}$$

where the $i$-th observation resides within group $j$, and each group has a potentially different variance.

In another example, Deshpande and Jensen (2016) estimated basketball players’ contributions to their team winning probabilities using a high dimensional regression. Let $y_i$ be the win probability of the home team in the $i$th shift, where the shifts are the periods between substitutions. They used the following regression equation:

$$y_i = \mu + \theta_{h_{i1}} + \cdots + \theta_{h_{i5}} - \theta_{a_{i1}} - \cdots - \theta_{a_{i5}} + \tau_{H_i} - \tau_{A_i} + \sigma \varepsilon_i \tag{4}$$

where $\mu$ is the home court advantage. The subscripts $h$ and $a$ represent the home and away teams, respectively. The $\theta$’s are the players’ effects; hence $\theta_{h_{i1}}$ and $\theta_{a_{i1}}$ are the effects of player 1 on the home team and on the away team, respectively. They obtained $\theta$ estimates for each of the 488 players in the league. The parameters $\tau_{H_i}$ and $\tau_{A_i}$ are the partial effects associated with the home and away teams, and $\sigma$ is a measure of the variability. The marginal posterior densities for $\theta$ provide a good picture of each player’s contribution to winning.

Often, the response $y_i$ is a binary variable following a Bernoulli distribution, for instance whether the player scored or not. This case is frequently approached using a logistic regression model, which can be defined as follows:

$$y_i \sim \mathrm{Bern}(p_i), \tag{5}$$
$$\mathrm{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i. \tag{6}$$

Here, $\varepsilon_i$ represents extra-Bernoulli variation; see Besag et al. (1995) for details.
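A rough Python sketch of how such a model might be fitted is given below. It uses a random-walk Metropolis sampler for a logistic regression with an intercept and one slope, which corresponds to Eqs. (5) and (6) with $x_{i1} = 1$ and without the extra-Bernoulli term. The normal priors with standard deviation 2.5, the proposal step size, the toy shot data and the function names are all assumptions made for illustration, not the method of any reviewed paper.

```python
import math
import random

def log_posterior(beta, xs, ys, prior_sd=2.5):
    """Log posterior (up to a constant) for logit(p_i) = beta0 + beta1 * x_i
    with independent N(0, prior_sd^2) priors on both coefficients."""
    lp = -sum(b * b for b in beta) / (2 * prior_sd**2)
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x
        # Bernoulli log likelihood, y*eta - log(1 + exp(eta)), in a stable form
        lp += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp

def metropolis(xs, ys, n_iter=5000, step=0.3, seed=0):
    """Random-walk Metropolis sampler; returns the list of sampled [beta0, beta1]."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    current = log_posterior(beta, xs, ys)
    samples = []
    for _ in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        samples.append(beta)
    return samples

# Hypothetical data: standardized shot distance and whether the shot was made
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1, 1, 1, 1, 0, 1, 0, 0, 0]

samples = metropolis(xs, ys)
kept = samples[1000:]                      # discard burn-in
slope_mean = sum(s[1] for s in kept) / len(kept)
prob_negative = sum(s[1] < 0 for s in kept) / len(kept)
print(round(slope_mean, 2), round(prob_negative, 3))
```

The second printed quantity is the posterior probability that scoring becomes less likely with distance, the kind of direct probabilistic statement that motivates the Bayesian treatment.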

Deshpande and Wyner (2017) approached the issue of baseball pitch “framing” from a Bayesian hierarchical perspective using logistic regression. The term “framing” refers to an action often carried out by catchers making a pitch look more like a strike. In order to assess the impact of the catcher in the decision, they estimated the probability that a pitch is called a strike by a given umpire and other covariates such as the pitch location (x, z), the pitcher, etc.

They employed the following logistic model:

$$\mathrm{logit}(p_i) = \Theta^{u,B}_{b} + \Theta^{u,CA}_{ca} + \Theta^{u,P}_{p} + \Theta^{u,CO}_{co} + f^{u}(x, z) \tag{7}$$

where the $\Theta$ terms are the partial effects of the batter ($b$), catcher ($ca$), pitcher ($p$) and count ($co$) for a given umpire ($u$). The effect of the pitch location is given by $f^{u}(x, z)$. Other factors such as the type of pitch (fastball, curveball, etc.) and the speed could also have been included in the model in a straightforward manner. In other examples, Miskin, Fellingham, and Florence (2010) used logistic regression to assess the importance of several skills in volleyball, and Cafarelli, Rigdon, and Rigdon (2012) used it to obtain the probability of converting a third down in the National Football League (NFL) based on the number of yards to go.

Another useful model for binary response variables in the literature is the probit regression, which is based on the probit link function:

$$p_i = \Phi(\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon) \tag{8}$$

where Φ is the standard normal cumulative distribution function.

Probit regression was used by Jensen et al. (2009b) to construct a baseball defensive model and predict the catching probability given the location of the defender in the field, the velocity of the ball and the direction. They defined their model as:

$$p_{ij} = \Phi(\beta_{i0} + \beta_{i1} D_{ij} + \beta_{i2} D_{ij} F_{ij} + \beta_{i3} D_{ij} V_{ij} + \beta_{i4} D_{ij} V_{ij} F_{ij} + \varepsilon) \tag{9}$$

This model gives the probability of ball $j$ being caught by player $i$. Here, $D_{ij}$ represents the distance travelled by the player and $V_{ij}$ is the velocity. The variable $F_{ij}$ is 1 when moving forward and 0 otherwise. Note that this model considers the interaction between predictors and includes a categorical variable. It also allows for the computation of the player’s contribution to the defence in terms of runs saved, compared to the rest of the players in the same position.
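The structure of Eq. (9) can be illustrated with a few lines of Python that evaluate the catch probability for given coefficients. The coefficient values and inputs below are hypothetical and chosen only to produce plausible numbers, the error term is omitted, and the function names are not taken from the original paper.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def catch_probability(beta, distance, velocity, forward):
    """Probit catch probability following the structure of Eq. (9), without the error term."""
    b0, b1, b2, b3, b4 = beta
    f = 1.0 if forward else 0.0
    linear = (b0 + b1 * distance + b2 * distance * f
              + b3 * distance * velocity + b4 * distance * velocity * f)
    return std_normal_cdf(linear)

# Hypothetical coefficients for one fielder
beta = (2.0, -0.05, 0.01, -0.0005, 0.0001)
near = catch_probability(beta, distance=10.0, velocity=40.0, forward=False)
far = catch_probability(beta, distance=60.0, velocity=40.0, forward=False)
near_forward = catch_probability(beta, distance=10.0, velocity=40.0, forward=True)
print(round(near, 3), round(far, 3), round(near_forward, 3))
```

Because every term is multiplied by the distance, the intercept alone determines the probability for a ball hit directly at the fielder, and the interaction terms let the distance penalty depend on velocity and direction.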

The multinomial logistic regression of McFadden (1973) is a generalization for cases where a categorical response variable takes more than two values, for example, the possible outcomes of a shot in basketball, $y \in \{0, 1, 2, 3\}$. Reich et al. (2006) used this model to assess the relationship between predictors, such as the defensive strength of the opposing team, playing home or away and first or second half, and the response variables: (1) location when shooting, (2) the shooting frequency and (3) the efficiency. They let $y_i$ be the region of the court for a given shot $i$, which follows a multinomial distribution with parameter $\theta(\eta)$, and they defined a predictor

$$\eta_i = \log(A) + x_i \beta \tag{10}$$

where $A$ is the vector of the areas $A_j$ of the sections $j$, which is included because the sections have different areas. The model is then defined as:

$$\theta_j(\eta_i) = \frac{\exp(\log(A_j) + x_i \beta_j)}{\sum_{l=1}^{p} \exp(\log(A_l) + x_i \beta_l)}. \tag{11}$$
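The role of the area offset in Eq. (11) is easy to see numerically. The Python sketch below computes the region probabilities for hypothetical areas and coefficients; when all coefficients are zero, the probabilities are simply proportional to the areas. The function name `region_probabilities` and all numerical values are illustrative assumptions.

```python
import math

def region_probabilities(areas, x, betas):
    """theta_j = exp(log A_j + x . beta_j) / sum_l exp(log A_l + x . beta_l)."""
    etas = [math.log(a) + sum(xi * bi for xi, bi in zip(x, beta))
            for a, beta in zip(areas, betas)]
    shift = max(etas)                      # subtract the maximum for numerical stability
    weights = [math.exp(e - shift) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

areas = [1.0, 2.0, 3.0, 4.0]               # hypothetical section areas
x = [1.0, 0.0]                             # hypothetical covariates for one shot

# With zero coefficients the shot location depends on the areas only
baseline = region_probabilities(areas, x, [[0.0, 0.0]] * 4)

# Hypothetical coefficients that favour the first (smallest) section
betas = [[1.5, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
shifted = region_probabilities(areas, x, betas)
print([round(p, 3) for p in baseline], [round(p, 3) for p in shifted])
```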

See also Glickman and Hennessy (2015) for another example of the use of a multinomial logit model for competitors’ rank ordering in Alpine skiing competitions.

Bayesian linear mixed models are also on the rise. Revie et al. (2017) considered mixed models to show the viability of players’ perceptions (via surveys) to predict fitness levels for cases when a direct fitness measurement is inconvenient or not possible.

Bayesian non-parametric and semi-parametric regression models have also been employed, albeit less commonly, in sports analysis. For example, a semi-parametric latent variable approach was used by Wimmer et al. (2011) for modeling points performance in decathlon events assuming four latent abilities (sprint, jumping, throwing, and endurance). Another Bayesian nonparametric model, which is based on a Dirichlet process mixture model, was suggested by Pradier, Ruiz, and Perez-Cruz (2016) for modeling the effect of covariates such as age, gender and environment on marathon runners’ performance.

Other popular regression techniques are log-linear models. See for instance, Boys and Philipson (2018), who employed an additive log-linear model for ranking cricketers, accounting for factors such as the year and player’s age. Several other log-linear models will be discussed in the football subsection 3.2.2.

3.1.2 Accounting for time

Time is a critical factor in the modelling and analysis of many sports (Kovalchik and Albert 2017). The performance of athletes and teams is known to change during the season and even during the course of a game due to factors such as fatigue. Often, it is of interest to analyse factors such as fatigue or momentum that are difficult or impractical to measure. A common approach for such analysis over time is to employ a state space model (SSM) or hidden Markov model (HMM). Both of these models assume that there is an underlying latent variable, z say, that governs the value of the observed variable, y. The form of z determines the type of model employed: if z is categorical or ordinal then an HMM is appropriate, whereas a continuous z leads to an SSM.

HMMs have been used by several authors (e.g. Albert 1993; Jensen, McShane, and Wyner 2009a; Dadashi et al. 2013; Koulis, Muthukumarana, and Briercliffe 2014). Dadashi et al. (2013) proposed an HMM to estimate timing coordination between hands and feet via estimation of the temporal phases of breaststroke swimming. The model uses three-axis information from wearable inertial measurement units (IMU) worn on the arms and legs, and predicts three hidden states, $Q = (q_1, q_2, q_3)$, corresponding to glide, propulsion and recovery in leg and arm movements.

Typically, an HMM is defined as λ = (A, B, π), where A is the state transition probability matrix, B is the emission probability matrix that relates hidden states to observations from the wearable sensor and π is the initial state probability. The HMM was trained using supervised learning from expert-annotated video. The authors reported that the model correctly detected the phases 93.5% of the time in arm strokes and 94.4% in leg strokes. See Dadashi et al. (2013) for a detailed description.
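To make the roles of A, B and π concrete, the Python sketch below decodes the most probable sequence of hidden phases with the Viterbi algorithm. It is a generic textbook decoder and not the procedure of Dadashi et al. (2013): the observations are assumed to be a discretized sensor signal ("low", "medium", "high"), whereas real IMU data are continuous, and all probabilities are hypothetical.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an HMM lambda = (A, B, pi),
    with A = trans_p, B = emit_p and pi = start_p (all probabilities nonzero)."""
    # Log-probability of the best path ending in each state at the first step
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []
    for o in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # Best predecessor of state s at this step
            cand = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            pointers[s] = cand
            best[s] = prev[cand] + math.log(trans_p[cand][s]) + math.log(emit_p[s][o])
        back.append(pointers)
    # Trace the best path backwards from the most probable final state
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

states = ("glide", "propulsion", "recovery")
start_p = {"glide": 0.8, "propulsion": 0.1, "recovery": 0.1}
trans_p = {
    "glide": {"glide": 0.6, "propulsion": 0.3, "recovery": 0.1},
    "propulsion": {"glide": 0.1, "propulsion": 0.6, "recovery": 0.3},
    "recovery": {"glide": 0.3, "propulsion": 0.1, "recovery": 0.6},
}
emit_p = {
    "glide": {"low": 0.8, "medium": 0.1, "high": 0.1},
    "propulsion": {"low": 0.1, "medium": 0.1, "high": 0.8},
    "recovery": {"low": 0.1, "medium": 0.8, "high": 0.1},
}

obs = ["low", "low", "high", "high", "medium", "low"]
path = viterbi(obs, states, start_p, trans_p, emit_p)
print(path)
```

Working with log-probabilities avoids numerical underflow for long sequences, which matters for sensor streams sampled many times per second.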

A Poisson HMM model was used by Koulis et al. (2014) for modelling batting performance in cricket. They used a Bayesian approach with multiple states related to the batsman’s performance, where the observed variable is the number of runs per game produced. A Bayesian HMM was also used by Franks et al. (2015) to model basketball defensive placements, where the hidden states are the offensive player being guarded by each defender.

Glickman and Stern (1998) suggested a Bayesian state-space approach to predict American football teams’ strengths using a first-order auto-regressive process. They found that this model was able to predict the outcomes of games slightly better than the Las Vegas betting line. A nonlinear version of this model was suggested by Glickman (2001) to evaluate paired comparisons in NFL football and chess.

Some other approaches accounting for time were suggested by Stephenson and Tawn (2013) and Kovalchik and Albert (2017). Stephenson and Tawn (2013) applied concepts of extreme value theory to model annual best racing times in athletics considering an exponentially decreasing trend. This facilitates the comparison of athletes who performed in different decades. In tennis, Kovalchik and Albert (2017) fitted temporal data (time-to-serve) using a Bayesian hierarchical model and the covariates point importance and the length of the previous rally.

3.1.3 Accounting for space and time

As discussed above, modern tracking technology provides the location of the players and the ball at regular small intervals of time. This is opening the door to spatio-temporal analysis, in which the court (basketball), the course (golf) or the field (baseball) is often discretized using a grid. The grid cells are often squares in a Cartesian system, e.g. one-square-foot quadrats (Miller et al. 2014), or slices in a polar coordinate system, e.g. Reich et al. (2006); Yousefi and Swartz (2013).

In the case of basketball, a common task is to compute the probability of scoring as a function of the location. This generally yields a heat map of scoring probabilities. For instance, Reich et al. (2006) used logit multinomial Bayesian regression to assess the relationship between the shot location in the court and some covariates such as the presence of key players from the same team in the court, defensive strength, playing home or away, etc. They also assessed the significance of these predictors on shooting frequency and efficiency in different regions measured using polar coordinates (the distance to the basket and the angle). They used conditionally autoregressive (CAR) and two neighbor relation CAR priors to achieve a smoother surface borrowing information from neighbors.

Shortridge, Goldsberry, and Adams (2014), for example, extended this idea by computing the spatial variability in scoring within an empirical Bayesian framework. They used a shrinkage approach to obtain a smoother scoring probability surface. Spatial shooting patterns have also been modelled using a log-Gaussian Cox process (LGCP) (Miller et al. 2014; Franks et al. 2015). Also in basketball, Cervone et al. (2016) suggested the use of a conditional autoregressive model to compute the expected score as a function of factors such as the player in possession of the ball, defensive stance, etc. Here the CAR prior accounts for spatial autocorrelation by adding a random effect for the player. See Cervone et al. (2016) for more details.

In baseball, Jensen et al. (2009b) used a hierarchical model to estimate the probability of a defensive player catching a ball considering, among other parameters, the location of the player. The horizontal and vertical coordinates of the pitch around the strike zone were considered for estimating the probability of a strike (Deshpande and Wyner 2017). Yousefi and Swartz (2013) introduced a metric for golf putting performance considering the distance to the pin and the angle. This approach splits the “green” area into eight slices centred at the pin, where the within-slice probability of scoring is dependent on the distance.

3.1.4 Other methods

Variations and extensions of the above general classes of models have also been used in sports analytics. For instance, Swartz, Gill, and Muthukumarana (2009) developed a simulator for predicting the outcomes of one-day cricket games using a latent variable approach. Ofoghi et al. (2013) considered the selection of racing athletes in the multi-event cycling omnium using Bayesian networks.

See also, for example, Stenling et al. (2015) for a Bayesian structural equation model (SEM) with application to sport psychology settings. The authors estimated several latent factors associated with the athlete’s behavioral regulation measured with the Sport Motivation Scale. They found a better fit to the data using a Bayesian approach compared to the traditional method based on maximum likelihood.

Empirical Bayes is a very popular statistical technique in which the prior distribution in Eq. (1) is obtained from observed data, e.g. from previous games or from players with similar characteristics or the same position. For this reason, EBA is considered pseudo-Bayesian or not fully Bayesian.

EBA models are generally hierarchical, with the parameter of interest assumed to come from a common pooled distribution. EBA is particularly useful in part because it leads to fast computation. For decades EBA has enjoyed consistent use in baseball for modeling batting averages (Efron and Morris 1973; Brown 2008; Neal et al. 2010; Jiang et al. 2010). For instance, estimates of the batting averages of players with few at-bats can be obtained using the league average as a prior distribution. See an extended discussion in Robinson (2017). In basketball, spatial models of shooting effectiveness are commonly built using EBA because a smooth scoring intensity surface can be obtained (e.g. Shortridge et al. 2014). Another example can be found in Baker and McHale (2017), who recently suggested an approach for estimating player strengths based on empirical Bayes.
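The shrinkage of batting averages toward the league average can be sketched in Python as follows. A Beta prior is fitted to observed averages by the method of moments and then combined with each player's hits and at-bats. This is a deliberately crude version of the idea: the moment fit ignores binomial sampling noise, the league averages and the player record are hypothetical, and the function names are illustrative.

```python
def fit_beta_prior(averages):
    """Method-of-moments fit of a Beta(a, b) prior to observed batting averages."""
    n = len(averages)
    mean = sum(averages) / n
    var = sum((p - mean) ** 2 for p in averages) / (n - 1)
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

def shrunken_average(hits, at_bats, a, b):
    """Posterior mean of the batting probability under a Beta(a, b) prior."""
    return (hits + a) / (at_bats + a + b)

# Hypothetical season averages of established players
league = [0.250, 0.270, 0.260, 0.280, 0.240, 0.300, 0.255, 0.265]
a, b = fit_beta_prior(league)
league_mean = sum(league) / len(league)

# Hypothetical player with only 10 at-bats and 4 hits (raw average 0.400)
estimate = shrunken_average(4, 10, a, b)
print(round(a, 1), round(b, 1), round(estimate, 4))
```

With so few at-bats, the estimate stays close to the league mean; as at-bats accumulate, the data dominate and the estimate moves toward the player's raw average.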

Finally, another area that is of interest in sports analytics is experimental design. See, for example, Glickman (2008) who developed a Bayesian locally optimal design approach for knockout-based competitions.

3.2 Sports

Whereas the previous section focused on the methods and gave examples of sports in which those methods had been employed, it is also of interest to focus on the sports and review the methods that have been employed. This section presents such a discussion for the three most common sports in the literature review, namely basketball, football and baseball, and also covers relevant issues such as streakiness and doping.

3.2.1 Basketball

Basketball is one of the most popular, dynamic and competitive sports worldwide. In this game, two teams of players interact in different locations of the court according to a given set of rules with the aim of scoring in the opponent team’s basket.

An early paper by Reich et al. (2006) used logit multinomial Bayesian regression to assess the relationship between the shot location in the court and some covariates such as the presence of key players from the same team in the court, defensive strength, playing home or away, etc. The inference in this work was limited to a single NBA player (Sam Cassell) during the 2003–2004 season.

The adoption of the SportVU player tracking technology in the NBA after 2010 marks a milestone in basketball analytics. It enhanced the level of individual statistics by capturing (at 25 frames per second) the coordinates of each player (x, y) and the ball (x, y, z). As a result, detailed statistics such as the distance covered by players in a game and the speed developed when approaching the basket became available.

The success of the paper by Goldsberry (2012) on spatial modeling of shooting effectiveness motivated a large number of publications in this area. See e.g. Shortridge et al. (2014) who suggested metrics like the expected number of points per shot for a given location within the offensive court and points above league average for each player.

Other spatial models that explicitly incorporate spatial information have also been proposed. For example, Miller et al. (2014) employed a log-Gaussian Cox process as a spatial prior and combined this with dimension reduction to obtain the players’ shooting intensities and the identification of shooting habits.

Cervone et al. (2016) introduced a statistic called expected possession value (EPV), the expected number of points that a team might score in a given offensive play, which is plotted against time (from 0 to 25 seconds). This metric depends on the player in possession of the ball, the location of the ball, the defensive placement, etc. The contribution of players to the team’s win probability was assessed by Deshpande and Jensen (2016) using a Bayesian linear regression model. See also Lam (2018), who used a Bayesian regression to predict the outcome of NBA basketball games.

3.2.2 Football

Numerous studies have attempted to model the outcome of football matches (Rue and Salvesen 2000; Karlis and Ntzoufras 2008; Baio and Blangiardo 2010). For instance, Karlis and Ntzoufras (2008) used a Poisson difference distribution to model the difference of goals in football games using data from the English Premier League. Let $X_i$ and $Y_i$ be the scores of the home and away teams in the $i$th game. They defined a statistic $Z_i$ as follows:

$$Z_i = X_i - Y_i \sim \mathrm{PD}(\lambda_{1i}, \lambda_{2i}) \tag{12}$$

where PD is the Poisson difference distribution with rates $\lambda_{1i}$ and $\lambda_{2i}$, which are obtained using the following log-linear link functions:

$$\log(\lambda_{1i}) = \mu + H + A_{HT_i} + D_{AT_i} \tag{13}$$
$$\log(\lambda_{2i}) = \mu + A_{AT_i} + D_{HT_i} \tag{14}$$

where $\mu$ is a constant parameter, $H$ is the home team coefficient, and $A$ and $D$ are the attack and defense parameters of the home team ($HT_i$) and the away team ($AT_i$) in game $i$.
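For fixed parameter values, the distribution in Eq. (12) can be evaluated directly. The Python sketch below computes the rates from Eqs. (13) and (14) and then the Poisson difference probabilities by summing over the away score, from which the probabilities of a home win, a draw and an away win follow. The parameter values and team labels are hypothetical, and the computation assumes independent Poisson scores; in a Bayesian analysis these quantities would be averaged over posterior draws.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_difference_pmf(z, lam1, lam2, max_goals=60):
    """P(X - Y = z) for independent X ~ Pois(lam1) and Y ~ Pois(lam2),
    summing over the away score y (truncated at max_goals)."""
    return sum(poisson_pmf(y + z, lam1) * poisson_pmf(y, lam2)
               for y in range(max(0, -z), max_goals))

def rates(mu, home, attack, defense, home_team, away_team):
    """Scoring rates following the log-linear links of Eqs. (13) and (14)."""
    lam1 = math.exp(mu + home + attack[home_team] + defense[away_team])
    lam2 = math.exp(mu + attack[away_team] + defense[home_team])
    return lam1, lam2

# Hypothetical parameter values for two teams
attack = {"HomeFC": 0.20, "AwayFC": -0.10}
defense = {"HomeFC": -0.05, "AwayFC": 0.10}
lam1, lam2 = rates(mu=0.1, home=0.3, attack=attack, defense=defense,
                   home_team="HomeFC", away_team="AwayFC")

pmf = {z: poisson_difference_pmf(z, lam1, lam2) for z in range(-20, 21)}
p_home = sum(p for z, p in pmf.items() if z > 0)
p_draw = pmf[0]
p_away = sum(p for z, p in pmf.items() if z < 0)
mean_diff = sum(z * p for z, p in pmf.items())
print(round(p_home, 3), round(p_draw, 3), round(p_away, 3))
```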

Baio and Blangiardo (2010) suggested some improvements on the model of Karlis and Ntzoufras (2008) to predict football results in the Italian Serie A championship. They modeled the number of goals scored by each team using a Poisson distribution rather than modeling the difference as suggested by Karlis and Ntzoufras (2008). The model produces estimates of the posterior distributions of the attack and defense parameters.

Suzuki et al. (2010) suggested a Bayesian model for forecasting the results of the 2006 World Cup taking into consideration the FIFA World Ranking. In this approach, the number of goals scored by each team is modeled using Poisson distributions:

$$X_{AB} \mid \lambda_A \sim \mathrm{Pois}\left(\lambda_A \frac{R_A}{R_B}\right) \tag{15}$$
$$X_{BA} \mid \lambda_B \sim \mathrm{Pois}\left(\lambda_B \frac{R_B}{R_A}\right) \tag{16}$$

The values $R_A$ and $R_B$ are the ratings of teams A and B, respectively. The prior distributions for $\lambda_A$ and $\lambda_B$ were set as Gamma distributions, and expert knowledge was incorporated via elicitation. This approach does not consider other relevant variables such as offensive and defensive skills.
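The combination of a Gamma prior with a Poisson likelihood that has a known multiplier is conjugate, so the update can be written in two lines. The Python sketch below illustrates this generic result for the form of Eq. (15); it is not the elicitation procedure of Suzuki et al. (2010). The Gamma prior is assumed to be parameterized by shape and rate, the matches are assumed to be independent, and the prior values, goals and rating ratios are hypothetical.

```python
def update_gamma(shape, rate, goals, rating_ratios):
    """Posterior Gamma(shape, rate) for lambda_A when each match has
    goals ~ Pois(lambda_A * ratio), with ratio = R_A / R_B for that opponent."""
    return shape + sum(goals), rate + sum(rating_ratios)

# Hypothetical prior and three past matches of team A
prior_shape, prior_rate = 2.0, 1.0
goals = [2, 1, 3]
rating_ratios = [1.2, 0.9, 1.5]            # R_A / R_B for each opponent

post_shape, post_rate = update_gamma(prior_shape, prior_rate, goals, rating_ratios)
post_mean = post_shape / post_rate

# Expected goals against a new, hypothetical opponent with R_A / R_B = 0.8
expected_goals = post_mean * 0.8
print(post_shape, round(post_rate, 2), round(expected_goals, 3))
```

The rating ratio acts like an exposure term: goals scored against a much weaker opponent move the posterior for the scoring rate less than the same number of goals against a stronger one.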

Other papers used Bayesian methods for assessing the importance of player skills (Thomas, Fellingham, and Vehrs 2009), functional performance (Carvalho et al. 2017) and optimum substitution times (Silva and Swartz 2016).

3.2.3 Baseball

A wide range of Bayesian techniques has been used in baseball. Predicting batting averages has fascinated researchers and statisticians for a long time and “several authors recently have taken a swing at the subject” (Neal et al. 2010). For instance, Efron and Morris (1973), Brown (2008), Neal et al. (2010) and Jiang et al. (2010) predicted baseball averages within an empirical Bayes approach. In other examples, Albert (1993, 2008) used hidden Markov models for assessing streakiness among batsmen.

Jensen et al. (2009a) used a log-linear hierarchical model to predict a player’s number of home runs per season. Age, player position and home ballpark are predictors included in the model along with performance in previous seasons. A mixture model on the intercept term is used to create groups of home run hitters (elite and non-elite). The probability that a hitter is a member of the elite group is determined using a hidden Markov model. This model was shown to have better predictive accuracy than other competing methods. McShane et al. (2011) also used a hierarchical Bayesian model for the selection of performance variables that better describe offensive abilities.

A good defense is critical for winning games. However, this aspect is difficult to quantify since the traditional assessment is rather subjective, making it hard to compare the contribution of the players. Jensen et al. (2009b) addressed this issue using a Bayesian probit regression model for assessing the fielder’s effectiveness. They computed the player’s contribution to the defense in terms of runs saved, compared to the rest of the players playing the same position. The model predicts the catching probability given the location of the defender in the field, the velocity of the ball and the direction in which the fielder has to move (forward or backward). It would be interesting to see an extension of this analysis that considers ballpark constraints.

Healey (2017) suggested new statistics for player performance based on batted-ball parameters (speeds, vertical and horizontal angles) within the Bayesian philosophy. Probability density estimates are obtained using a non-parametric approach. In another example, Bendtsen (2017) suggested Bayesian networks for modeling career regimes.

3.2.4 Other team sports

In ice hockey, Thomas (2006) modeled the scoring probability as a continuous-time Markov process, and Gramacy, Jensen, and Taddy (2013) introduced a Bayesian logistic regression model for evaluating the impact of hockey players on their team’s scoring. This last model is an alternative to the plus-minus approach and is based on a Laplace prior for the regression coefficients to facilitate variable selection, which is required in these high dimensional regression problems characterized by a large number of players. See also Thomas et al. (2013), who introduced a method for determining players’ abilities by modelling the team’s scoring rate as a semi-Markov process using hazard functions.

Giles et al. (2017) used a Bayesian regression model to assess the association between mental toughness and behavioral perseverance, accounting for physical fitness, in Australian rules footballers. They found an association between these variables except in the presence of fatigue. Multiple examples can also be found in cricket. See e.g. Damodaran (2006); Brewer (2008); Swartz et al. (2009); Boys and Philipson (2018).

3.2.5 Other related issues

Streakiness: Another cluster of publications addressed the issue of streakiness (also known as the hot hand phenomenon). An example is a baseball player who shows a pattern indicating a substantially larger (than average) proportion of hits (successes) in a period of time. This apocryphal phenomenon is supposed to be experienced by athletes during the season, and it has been widely studied by, among others, Gilovich, Vallone, and Tversky (1985); Albright (1993); Bar-Eli, Avugos, and Raab (2006).

Albert (1993), for example, inspired by the thesis of Albright (1993), used two-state hidden Markov chains, while Albert (2008) employed the Bayes factor for detecting non-random changes in batting performance. A similar approach was followed by Wetzels et al. (2016), who analyzed the streakiness rates in basketball. Yang (2004) suggested a Bayesian binary segmentation method for analyzing consecutive successes or failures. This method relies on the Bayes factor to assess the change in the success rates. The author analyzed several popular events considered to be the result of a streaky performance in basketball, baseball and golf. Whether the player missed the previous shot has been considered by Reich et al. (2006) as a predictor of the shooting frequency and location on the court in basketball. However, they found no relationship between them.
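To make the idea of a Bayes factor for a change in the success rate concrete, the following minimal sketch compares two models for a sequence of Bernoulli trials: one constant success rate versus two different rates before and after a candidate split point. It is written in the spirit of the segmentation approach described above but is not a reproduction of Yang (2004). The Beta(1, 1) prior, the function names and the toy sequences are assumptions made here for illustration.

```python
from math import lgamma

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(successes, failures, a=1.0, b=1.0):
    """Log marginal likelihood of a Bernoulli sequence under a Beta(a, b) prior."""
    return log_beta(a + successes, b + failures) - log_beta(a, b)

def log_bayes_factor_split(seq, k, a=1.0, b=1.0):
    """Log Bayes factor of 'the rate changes after position k'
    against 'one constant rate' for a 0/1 sequence."""
    s1, n1 = sum(seq[:k]), k
    s2, n2 = sum(seq[k:]), len(seq) - k
    log_m_change = (log_marginal(s1, n1 - s1, a, b)
                    + log_marginal(s2, n2 - s2, a, b))
    log_m_const = log_marginal(s1 + s2, (n1 - s1) + (n2 - s2), a, b)
    return log_m_change - log_m_const

# Hypothetical sequences of 30 attempts each (1 = success, 0 = failure).
streaky = [0] * 15 + [1] * 15   # cold first half, hot second half
steady = [1, 0] * 15            # same overall rate, no change

lbf_streaky = log_bayes_factor_split(streaky, k=15)
lbf_steady = log_bayes_factor_split(steady, k=15)
print(lbf_streaky, lbf_steady)
```

A positive log Bayes factor favors a change in the success rate at the split point, and a negative one favors a constant rate. A segmentation procedure would scan over candidate split points and recurse on the resulting segments.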

Doping: Antidoping studies on biological markers are generally addressed using a longitudinal approach within a Bayesian framework to account for within and between athlete variation. Elements of Bayesian inference are particularly useful, ranging from the established population-based reference antidoping approach to an individual passport system.

Sottas et al. (2006), for example, suggested a method for the detection of abnormal T/E (testosterone glucuronide/epitestosterone glucuronide) ratio values. This approach compares the test results against a cutoff threshold obtained using Bayesian inference and the estimated population and intraindividual mean and coefficient of variation. Robinson et al. (2007) extended this approach for the detection of another illegal drug (recombinant human erythropoietin) and Schulze et al. (2009) added genotype (UGT2B17) as a predictor into a Bayesian framework suggested by Sottas et al. (2006) to achieve an increased test sensitivity. Bayesian inference is also used by Van Renterghem et al. (2011) for the detection of testosterone based on new biomarkers.
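The move from a population-based reference to an individual threshold can be sketched with a conjugate normal model. This is a simplified illustration and not the model of Sottas et al. (2006): it assumes that the marker (for instance the logarithm of the T/E ratio) is normally distributed within an athlete with a known standard deviation `sigma`, and that athlete-specific means vary across the population as a normal distribution with mean `mu0` and standard deviation `tau`. All numerical values and names are hypothetical.

```python
from statistics import NormalDist

def individual_upper_limit(history, mu0, tau, sigma, specificity=0.99):
    """Upper limit for the athlete's next test result, from the posterior
    predictive distribution of a normal-normal model.

    history : previous marker values of this athlete (may be empty)
    mu0, tau : population mean and between-athlete standard deviation
    sigma : within-athlete standard deviation
    """
    n = len(history)
    precision = 1.0 / tau**2 + n / sigma**2
    post_mean = (mu0 / tau**2 + sum(history) / sigma**2) / precision
    post_var = 1.0 / precision
    # Predictive spread = uncertainty about the mean + within-athlete noise.
    pred_sd = (post_var + sigma**2) ** 0.5
    return post_mean + NormalDist().inv_cdf(specificity) * pred_sd

# Hypothetical values on the log scale.
mu0, tau, sigma = 0.0, 0.8, 0.3
population_limit = individual_upper_limit([], mu0, tau, sigma)
athlete_limit = individual_upper_limit([-0.5, -0.4, -0.6], mu0, tau, sigma)
print(population_limit, athlete_limit)
```

With no test history the limit equals the population-based threshold. As results accumulate, the limit moves towards the athlete's own baseline and tightens, which is the mechanism that makes an individual passport more sensitive than a single population cutoff.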

Relative age effect: The Relative Age Effect (RAE) establishes that children/athletes who were born in the first months after the school year cutoff have more chances of success. Ishigami (2016) used a Poisson Bayesian regression model to investigate the impact of the RAE and birthplace on the chances of becoming a professional athlete in Japan. The author reported that those who were born in the first month after the cutoff were three times more likely to become a professional athlete.

3.3 Software for Bayesian computation

Bayesian computational techniques are included in most statistical software packages. We present a summary of the most popular software for conducting Bayesian analysis in sports data science, according to the papers we reviewed (Figure 2). For inclusion, the author(s) had to clearly state the name of the software package employed.

Figure 2: Most popular software used for Bayesian analyses in sports science.

In the papers we reviewed, 𝖱 was by far the most popular software, accounting for approximately half of the total mentions. MATLAB (2017), 𝖶𝗂𝗇𝖡𝖴𝖦𝖲 (Lunn et al. 2000) and 𝖲𝗍𝖺𝗇 (Stan Development Team 2017) completed the top four. Curiously the Python language (Python Software Foundation 2017) so far does not seem to be popular among sports scientists. The most commonly mentioned packages within the 𝖱 environment were 𝖬𝖢𝖬𝖢𝗉𝖺𝖼𝗄 (Martin, Quinn, and Park 2011) and 𝗋𝗃𝖺𝗀𝗌 (Plummer 2016), followed by 𝖽𝖾𝗉𝗆𝗂𝗑𝖲𝟦 (Visser and Speekenbrink 2010), 𝖱𝟤𝖶𝗂𝗇𝖡𝖴𝖦𝖲 (Sturtz, Ligges, and Gelman 2005), 𝗋𝗌𝗍𝖺𝗇 (Stan Development Team 2018), 𝖧𝗂𝖽𝖽𝖾𝗇𝖬𝖺𝗋𝗄𝗈𝗏 (Harte 2017) and 𝖻𝗋𝗆𝗌 (Bürkner 2017).

3.4 Summary of methods and applications

The variety and complexity of Bayesian statistical techniques applied to sports science problems have increased substantially over the last 15 years. To give a better insight, we present a summary of the research articles in the Appendix (Table 3). We grouped them by sport (basketball, football, etc.) or category (doping, streakiness, etc.). We present team sports first, followed by individual ones.

The Method column refers to the classification in Table 1. We also include the statistical software/package used for the computations. In the case of R packages, some authors did not mention the version; we therefore cite the most recent one. The last column gives the source of the data, specifying the competition, the season(s) and the sample size, if mentioned.

Table 1: Symbols and definitions.

| Symbol | Technique |
| --- | --- |
| BHM | Bayesian hierarchical modeling |
| BR | Bayesian regression (e.g. logistic, multiple, etc.) |
| LS | Longitudinal studies |
| EBA | Empirical Bayesian approach |
| SASTA | Spatial and/or spatio-temporal analysis |
| TS | Time series |
| BN | Bayesian networks |
| HMM | Hidden Markov model |
| MC | Markov chain |
| BNP | Bayesian nonparametric, Bayesian survival models, etc. |
| Other | Other techniques, including Bayesian structural equation modeling |

Table 2 cross-tabulates the statistical techniques against the publication period. The column Accounting for time (AT) comprises time series, longitudinal studies and HMM. Accounting for space and time (AST) comprises the articles considering spatial and temporal association. The evolution per year is shown in Figure 3.

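Table 2: Cross-tabulation of the Bayesian technique vs. the publication period.

| Period | AT | AST | BHM | BR | EBA | Other | Sum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| (1985, 2005] | 2 | 0 | 1 | 1 | 0 | 4 | 8 |
| (2005, 2009] | 4 | 5 | 8 | 5 | 2 | 1 | 25 |
| (2009, 2013] | 1 | 3 | 5 | 5 | 2 | 4 | 20 |
| (2013, 2018] | 4 | 8 | 12 | 13 | 3 | 9 | 49 |
| Sum | 11 | 16 | 26 | 24 | 7 | 18 | 102 |

Note: AT, accounting for time; AST, accounting for space and time; BHM, Bayesian hierarchical models; BR, Bayesian regression; EBA, empirical Bayesian approach.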

Figure 3: Evolution of the Bayesian technique per year.

Bayesian hierarchical models (BHM) and Bayesian regression (BR) are the most popular techniques, followed by methods accounting for space and time. The frequencies during the period 2013–2018 (49 in total) are close to those from 1985 to 2013 (53 in total), which indicates a marked rise in the use of Bayesian methods. Note, however, that we do not know the growth rate of scientific articles in sports statistics overall (frequentist and Bayesian).
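The comparison between periods can be checked directly from the counts in Table 2. The short sketch below recomputes the row and column totals and the split between 2013–2018 and the earlier periods. The variable names are chosen here for illustration, and the counts are transcribed from Table 2.

```python
# Counts from Table 2, in the column order AT, AST, BHM, BR, EBA, Other.
columns = ["AT", "AST", "BHM", "BR", "EBA", "Other"]
counts = {
    "(1985, 2005]": [2, 0, 1, 1, 0, 4],
    "(2005, 2009]": [4, 5, 8, 5, 2, 1],
    "(2009, 2013]": [1, 3, 5, 5, 2, 4],
    "(2013, 2018]": [4, 8, 12, 13, 3, 9],
}

row_sums = {period: sum(values) for period, values in counts.items()}
col_sums = dict(zip(columns, (sum(col) for col in zip(*counts.values()))))

recent = row_sums["(2013, 2018]"]
earlier = sum(row_sums.values()) - recent

print("row totals:", row_sums)
print("column totals:", col_sums)
print("2013-2018:", recent, "| 1985-2013:", earlier)
```

The totals reproduce the margins of Table 2: 49 entries for 2013–2018 against 53 for 1985–2013, out of 102.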

Figure 4 shows the number of publications for each sport. Note that approximately 50% of the publications were on three team sports (basketball, baseball and football).

Figure 4: Number of papers on each sport including the doping category. The category others includes one mention of the following sports: American football, athletics, Australian rules football, ball-games, decathlon, free-weight, frisbee, marathon, multiple sports, Paralympic sports, rowing, rugby union, running, skiing, triathlon and wrestling.

4 Discussion

In recent years, a growing number of scientific publications have shown the benefits, the potential and the limitations of the Bayesian philosophy in sports statistics (Ivarsson et al. 2015; Gucciardi and Zyphur 2016; Gucciardi et al. 2016; Mengersen et al. 2016). These benefits include the capacity to model complex sports problems and to make predictions that take uncertainty into consideration. In conducting this review, we found a substantial number of publications in multiple areas and applications, covering sports such as golf, rugby, basketball and cricket.

This study was designed to provide an integral characterization of the state of the art of Bayesian sports statistics as a rapidly maturing discipline. To the best of our knowledge, this is the most comprehensive review undertaken on Bayesian methods in sports statistics. We can group the majority of the reviewed articles according to the problem they try to solve as follows. They focus on the:

  1. identification of factors or covariates that contribute to scoring, winning or to a better performance,

  2. forecasting and prediction,

  3. between and within-season spatial and temporal effectiveness,

  4. player interaction and dynamics,

  5. unusual streaky outcomes (streakiness),

  6. development of new metrics,

  7. home court advantage, ball possession, bias assessment of referees’ judgment, and tournament design,

  8. players’ abilities, rankings, paired comparisons of players and contributions to their teams in attacking and defensive settings,

  9. optimization of resources such as substitution, roster, athlete selection, batting order, and player placement on the field,

  10. training regimes effectiveness, endurance, mental toughness,

  11. visualizations and model comparison,

  12. wearable technology, activities identification and pattern recognition,

  13. doping.

A large proportion of the papers deal with data from major professional sports leagues in the United States (MLB and NBA), where development has been tremendous, in part because these leagues have been generating high-resolution data for many years.

Similarly, far more research was identified on team sports than on individual ones, possibly because team sports are more complex and statistically richer. Although the articles considered represent almost every continent, they were mostly concentrated in the United States, Australia, Canada, the United Kingdom, and Sweden. A question in our minds before conducting this research was whether these contributions were made by sports scientists or by statisticians. We found that most of them have been contributed by statisticians and data scientists. As pointed out by Bernards et al. (2017), “most current sports scientists are not trained in Bayesian methods” (yet). A large number of these publications meet the second criterion of Swartz (2018) for a good sports paper, “they address a real sporting problem”, and are therefore considered applied research.

We identified some well-established niches where specific Bayesian models are intensively used. These include Bayesian longitudinal models in anti-doping studies, log-linear models for modelling football game outcomes and EBA for estimating baseball batting averages. We found great interest in the search for the greatest athletes in multiple sports, e.g. athletics (Stephenson and Tawn 2013), golf (Baker and McHale 2015), tennis (Baker and McHale 2017) and chess (Glickman 1999).
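As an illustration of the empirical Bayes approach to batting averages, the sketch below shrinks early-season averages towards the group mean using a beta-binomial model. It is a deliberately crude version: the Beta prior is fitted by the method of moments to the observed averages, which ignores binomial sampling noise, and the hits are invented for illustration (only the choice of 45 at-bats echoes the classical Efron and Morris example). The function names are assumptions made here.

```python
from statistics import fmean, variance

def fit_beta_by_moments(rates):
    """Fit a Beta(a, b) prior to observed rates by matching mean and variance."""
    m = fmean(rates)
    v = variance(rates)
    strength = m * (1.0 - m) / v - 1.0  # a + b, the prior 'sample size'
    if strength <= 0:
        raise ValueError("observed variance too large for a Beta prior")
    return m * strength, (1.0 - m) * strength

def shrink(hits, at_bats, a, b):
    """Posterior mean batting average for each player under the Beta(a, b) prior."""
    return [(a + h) / (a + b + n) for h, n in zip(hits, at_bats)]

# Hypothetical data: hits in the first 45 at-bats for eight players.
hits = [18, 12, 15, 10, 14, 9, 16, 11]
at_bats = [45] * len(hits)
raw = [h / n for h, n in zip(hits, at_bats)]

a, b = fit_beta_by_moments(raw)
prior_mean = a / (a + b)
shrunk = shrink(hits, at_bats, a, b)

for r, s in zip(raw, shrunk):
    print(f"raw {r:.3f} -> shrunk {s:.3f}")
```

Each shrunken estimate lies between the player's raw average and the group mean. Players with few at-bats would be pulled more strongly towards the group mean, which is why such estimators tend to predict end-of-season averages better than the raw early-season figures.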

Bayesian methods are not a panacea for every data analysis problem. For instance, dealing with poor data or poor models will have limited success in the Bayesian context, although some compensation can be made by taking a Bayesian approach. Many of the methods based on MCMC can be computationally intensive. However, recent approaches like variational Bayes provide a substantial computational speed-up (Ruiz and Perez-Cruz 2015; Blei, Kucukelbir, and McAuliffe 2017). Another limitation is the scalability of the models to big data problems, although the latest statistical advances are making it possible to take advantage of modern parallel computing (Angelino, Johnson, and Adams 2016; Minsker et al. 2017).

Some challenges for future research are (1) dealing with increasingly complex datasets while presenting the methods/applications to an audience without a deep statistical background, (2) the creation of ready-to-use tools, e.g. shiny apps, allowing practitioners and sports enthusiasts easy implementation and analysis, and (3) possibly embracing principles of open science, e.g. open source code, data and methodology. Good examples of the third point are Cervone et al. (2016) and Mengersen et al. (2016).

As suggested by Figure 4, a large number of sports remain largely unexplored to date, and the advent of high-resolution data in the near future will undoubtedly attract further research and collaborations. These high-dimensional and big datasets will both motivate and benefit from the development of more efficient Bayesian MCMC methods in this area.

5 Conclusion

The Bayesian revolution has arrived in sports analytics. Since 2005 there has been a substantial increase in Bayesian modeling in sports. We found that the number of papers between 2013 and 2018 was similar to the number published in the previous three decades (1985–2013). Based on the review, Bayesian regression and Bayesian hierarchical models emerged as the most popular techniques, but other methods such as HMM and Bayesian spatial analysis are on the rise. More and more sports scientists are incorporating prior beliefs in the model and using posterior distributions to make inferences about parameters within a Bayesian paradigm. Recent new data sources have motivated the exploration of new methodologies and insights. Similarly, recent research advances have enhanced the way we summarize and make inferences on sports by introducing new metrics and methods. These advances will continue to be complemented by the growing confidence of sports scientists to look beyond the traditional analytics boundaries and explore methods used in other fields.

Acknowledgments

This research was supported by the Australian Research Council (ARC) Laureate Fellowship Program and the Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS) and by the project “Bayesian Learning for Decision Making in the Big Data Era” (ID: FL150100150). First Investigator: D. Prof. Kerrie Mengersen. Thanks to Jacinta Holloway who helped in the selection of the papers. We also thank Dr. Richard Boys for his insightful comments during the early stages of the article. The authors declare no competing interests.

Appendix

Summary of methods and applications

Table 3:

Summary of publications included in the review.

Author(s)MethodDescriptionSoftware/packageSportData
Reich et al. (2006)BR, BHM, SASTALogit multinomial regression to assess the relationship between the shot location and frequency (response variables) and predictors such as defensive blocks, home advantage, etc. Conditionally autoregressive (CAR) and two neighbor relation CAR (2NRCAR) priors are used to achieve a smoother surface𝖱Basketballshot chart data of the NBA Season 2003–2004 of the player Sam Cassell (Minnesota Timberwolves)
Shortridge et al. (2014)EBA, SASTAIt computes the spatial effectiveness of shooting using empirical Bayesian smoothing rate estimates. It provides estimates of the expected number of points per shots per area in the court𝖱, 𝖢𝗅𝖺𝗌𝗌𝖨𝗇𝗍 (Bivand 2017), 𝗌𝗉 (Bivand, Pebesma, and Gomez-Rubio 2013), 𝖶𝖾𝗂𝗀𝗁𝗍𝗌 (Pasek et al. 2016)BasketballESPN data from NBA (2011–2012) from made and missed field shots. They used the locations in Cartesian coordinates of the field goals
Miller et al. (2014)BHM, SASTAShooting intensity modeling and players shooting habits identification using a log Gaussian Cox process (LGCP). They suggested a dimensionality reduction approach via non-negative matrix factorization𝖱, 𝖭𝖬𝖥 (Gaujoux and Seoighe 2018)BasketballMade and failed shoots obtained from the optical player tracking data. NBA season 2012–2013 regular season
Franks et al. (2015)SASTA, BR, HMMDefensive effectiveness assessment using spatial and spatio-temporal analysis and HMM𝖲𝗍𝖺𝗇BasketballOptical player tracking data from the NBA season 2013–2014. The locations of the players are obtained from cameras and recorded at 25 frames per second
Lamas et al. (2015)BHMBayesian inference to compute the outcome probabilities based on offensive-defensive actions𝖱, 𝖬𝖢𝖬𝖢𝗉𝖺𝖼𝗄 (Martin et al. 2011)BasketballData obtained using video from six play-off games of Liga ACB in Spain (2010–2011). n = 1548 space creation dynamics (SCD) and space protection dynamics (SPD)
Deshpande and Jensen (2016)BRBayesian linear regression model for assessing the contribution of players to the team’s win probability𝖱, 𝗌𝗈𝗇𝗈𝗌𝗏𝗇 (Gramacy 2017a)BasketballPlay-by-play ESPN data from the NBA (2006–2014)
Cervone et al. (2016)SASTA, MCComputation of the expected possession value (EPV) using a Markov model𝖱, 𝖱-𝖨𝖭𝖫𝖠BasketballNBA season 2013–2014 Optical player tracking data from the NBA season 2013–2014 from STATS LLC. They provide a sample game dataset (Miami Heat vs. Brooklyn Nets)
Lam (2018)BRBayesian regression to predict the teams winning probabilities based on past games and the players’ performance𝖯𝗒𝗍𝗁𝗈𝗇BasketballNBA seasons 2013–2015. Data from Basketball-Reference consisting of 17 metrics e.g. field goals per minute, 3-point field goals per minute, etc.
Bar-Eli and Tenenbaum (1988)BFPsychological crisis assessment using the Bayesian likelihood ratioBasketballQuestionnaire from 28 basketball experts
Wetzels et al. (2016)HMMHidden Markov model for analyzing the streakiness rate𝖱 and 𝖧𝗂𝖽𝖽𝖾𝗇𝖬𝖺𝗋𝗄𝗈𝗏 (Harte 2017)Basketball and psychology(1) basketball free-throw shooting from the NBA seasons 2005–2010 and (2) a visual discrimination task (4 participants)
Efron and Morris (1973)EBABaseball batting averages prediction to illustrate the use of the James-Stein estimatorBaseballProportion of hits in the first 45 at bat from fourteen baseball players in the MLB season 1970
Albert (1993)BHM, MCHitting streaks probability estimation using a two states Markov model and a Bayesian hierarchical modelBaseballMLB 1988–1989 season. n = 200 players, 100 from each season (1988 and 1989), being 50 from each league per year
Albert (2008)BHMAssessment of the hitting streakiness by means of Bayesian inferenceBaseballHits and outs from 287 players from the MLB season 2005
Brown (2008)BHM, EBAPrediction of baseball batting averages using empirical and hierarchical Bayes approachBaseballFirst and second half of season 2005 in the MLB ( number of hits and outs)
Jensen et al. (2009b)BR, SASTAPlayer’s defensive performance assessment using empirical Bayes approach and probit regressionBaseballHigh-resolution data of locations of batted balls (MLB seasons 2002–2005) from Baseball Info Solutions. n120,000
Jiang et al. (2010)EBABaseball batting averages using empirical Bayes approach and linear modelsBaseballnumber of hits and at-bats from the MLB season 2005. Players with more than 11 at bats
Neal et al. (2010)EBA BREmpirical Bayes approach to predict baseball averages in the second half of the season based on the first halfBaseballMLB season 2004–2005. Hits and at- bats data obtained from https://www.retrosheet.org/
Jensen et al. (2009a)BHM HMMHome run hitting prediction using a log-linear hierarchical model. They used a hidden Markov model for separating hitters into two categories (elite and non-elite)BaseballMLB seasons 1990–2005 from Lahman Baseball Database. n = 10,280 player-years
McShane et al. (2011)BHM HMMThey implemented a hierarchical Bayesian variable selection model for a better assessment of players’ abilitiesBaseballAppelman database MLB 1974–2008 seasons. A 50 offensive stats (singles, doubles, home runs, etc) from n = 8596 player-seasons and 1575 players
Albert (2016)BRRandom effects model for estimating batting performance𝖱BaseballLahman MLB Baseball Database from the 2011 season. Strikeouts, home runs, hit-in-plays and out-in-plays
Ishigami (2016)BR; BHMPoisson Bayesian regression to estimate the effect of Relative Age Effect (RAE) and place where athletes were born𝖱, 𝗋𝗃𝖺𝗀𝗌 ; 𝖲𝗍𝖺𝗇Baseball and footballSeason 2012. 12 teams Nippon Professional Baseball Organization (NPB); and 198 players. Japan Professional Football League (J. League); 277 players
Bendtsen (2017)BNBayesian networks for modeling career regimes𝖱, 𝖽𝖾𝗉𝗆𝗂𝗑𝖲𝟦 (Visser and Speekenbrink 2010)BaseballA random sample of 30 players that debuted during 2005 or after obtained from www.retrosheet.org
Deshpande and Wyner (2017)BHM, BRBayesian hierarchical model and Bayesian logistic regression for pitch framing𝖱, 𝖲𝗍𝖺𝗇 (Stan Development Team 2017), 𝗋𝗌𝗍𝖺𝗇 (Stan Development Team 2018)BaseballHorizontal and vertical coordinates obtained from the high-resolution pitch tracking dataset MLB PITCHf/x (seasons 2011–2015)
Healey (2017)BRSuggested new statistics for players performance based on batted-ball parameters (speeds, vertical and horizontal angles) within the Bayesian philosophy. Probability density estimates are obtained using a nonparametric approach𝖱BaseballMLB Sportvision’s HIT f/x (Season 2014) comprising measurements from more than 100,000 batted-balls
Rue and Salvesen (2000)BR TSDynamic log-linear Poisson model to predict games outcomes. This time dependent approach considers the teams attack and defense strengths and a psychological effect𝖫𝖠𝖯𝖠𝖢𝖪 𝗅𝗂𝖻𝗋𝖺𝗋𝗒 (Anderson et al. 1999)FootballPremier League and division 1 during 1993–1995 and 1997–1998 seasons
Karlis and Ntzoufras (2008)BHMBayesian modelling of the match differences using the Poisson difference distribution𝖱, 𝖶𝗂𝗇𝖡𝖴𝖦𝖲 (Lunn et al. 2000)Footballgoals/game scored in the English Premiership by the 20 teams in the season 2006–2007
Baio and Blangiardo (2010)BHMBayesian log-linear random effect model to predict football results𝖱, 𝖶𝗂𝗇𝖡𝖴𝖦𝖲Footballgoals/game scored in the Italian Serie A championship. 1991–1992 and 2007–2008 seasons. 20 teams
Suzuki et al. (2010)BRBayesian log-linear Poisson model for predicting match outcomes based on expert’s opinions and the team’s rankingsFootballgoals scored by each of the 32 teams competing in the 2006 Soccer World Cup
Shahtahmassebi and Moyeed (2016)BHMGeneralized Poisson difference distribution (GPDD) for modeling goal differences𝖱FootballGoals scored in Italian Serie A (2012–2013) obtained from ESPN. 20 teams and 380 matches
Koopman and Lit (2015)TSTime series analysis using bivariate Poisson distribution for modeling teams goal differences𝖮𝗑𝗆𝖾𝗍𝗋𝗂𝖼𝗌FootballEnglish football Premier League (2003–2012 seasons). Goals scored in the 3420 matches
Thomas et al. (2009)BRBayesian linear regression for assessing the importance of skills𝖬𝖠𝖳𝖫𝖠𝖡 (MATLAB 2017)FootballVideo annotation data from Women National Collegiate Athletic Association Division I. n = 10 games
Carvalho et al. (2017)LS, BHMBayesian multilevel model for fitting functional performance and growth curves for body mass and stature𝖱, 𝖻𝗋𝗆𝗌 (Bürkner 2017), 𝖲𝗍𝖺𝗇 (Stan Development Team 2017)FootballGrowth in body size and functional capacities in n = 33 under-11 youth soccer players from a Spanish first division club
Silva and Swartz (2016)BRBayesian logistic regression to determine optimum substitution times𝖶𝗂𝗇𝖡𝖴𝖦𝖲FootballEnglish Premier League (2009–2010), the German Bundesliga (2009–2010), the Spanish La Liga (2009–2010), the Italian Serie A (2009–2010), North America’s Major League Soccer (2010) and the 2010 World Cup
Razali et al. (2017)BNBayesian networks for predicting the matches’ results𝖶𝖾𝗄𝖺 (Hall et al. 2009)FootballEnglish Premier League (EPL) (2010–2011, 2011–2012 and 2012–2013). The data from the 20 teams was obtained from http://www.football-data.co.uk
Swartz et al. (2009)BHMDeveloped a simulator of the game outcome based on a Bayesian latent variable model𝖶𝗂𝗇𝖡𝖴𝖦𝖲Cricket472 games comprising 257,922 bowled balls of the ICC from Jan 2001 to Jul 2006
Koulis et al. (2014)HMMPoisson HMM to model batting performance in cricket. They used a Bayesian approach with multiple states related to the batsman’s performance, where the observed variable is the number of runs produced per gameCricketHistorical data from the top 20 ODI batsmen ranked at July 7, 2013, obtained from www.espncricinfo.com)
Stevenson and Brewer (2017)Other, BHMUsing a Bayesian survival approach they assessed the hypothesis that batting is a more difficult task at the beginning of the game. The constructed model allows the estimation of batting abilities during the batting stagesNested sampling implemented in Julia (Bezanson et al. 2012)CricketTest Match data (batsmen from New Zealand during 1990s and 2000s) from Statsguru and Cricinfo website
Boys and Philipson (2018)BRAdditive log-linear model for ranking cricketers, accounting for factors such as the year, player’s age, etc.𝖱, 𝖼𝗈𝖽𝖺 (Plummer et al. 2006)Cricketn = 2855 test match cricketers from 1877-August 2017
Thomas (2006)MCModeled the scoring probability as a continuous time Markov processIce hockeyManual annotation of 18 games from the Harvard Men’s Varsity Hockey team (2004–2005 season)
Gramacy et al. (2013)BR, BHMAn approach for evaluating the impact of players performance on scoring using a logistic regression model𝖱; 𝗋𝖾𝗀𝗅𝗈𝗀𝗂𝗍 Gramacy (2017b) 𝗍𝖾𝗑𝗍𝗂𝗋 (Taddy 2013)Ice hockeyPlayers on ice of the games from 2007–2011 seasons obtained from www.nhl.com. A total of 1467 players and 18,154 goals recorded
Thomas et al. (2013)BHMTeam’s scoring rate as a semi-Markov process using hazard functions𝖱 and 𝖢++Ice hockey (NHL)Shifts from season 2007–2008 until 2011–2012. 30 teams
Glickman and Stern (1998)TSA Bayesian state-space approach for predicting games scores differences based on first-order auto-regressive processAmerican footballOutcomes of the 28 teams in the National Football League (NFL) seasons 1988–1993
Cafarelli et al. (2012)BHM, BRBayesian logistic models for modeling the probability of converting a third down play𝖶𝗂𝗇𝖡𝖴𝖦𝖲American footballyards to go, outcome of each first down by team from National Football League (NFL) season 2007
Revie et al. (2017)BR, BHMBayesian linear mixed model and support vector machine (SVM) to model players’ fitness as a function of players’ perceptions when direct fitness measurements are not frequently possible𝖱Rugby unionQuestionnaire of 38 professional players from Jan-Apr 2012 and data from counter movement jump (CMJ) tests
Miskin et al. (2010)BR, MCVolleyball’s skill importance assessment using Markov chains and Bayesian logistic regression. This allow obtaining importance scores. In the Markov process the transition probability matrix were obtained using a Dirichlet priorVolleyballServes, passes, digs, and attacks during the 2006 competitive season of a women’s division I
Mendes et al. (2018)BHM, LS, BRLongitudinal hierarchical approach for modeling accumulated hours of structured volleyball and other sports practice𝖱, 𝖻𝗋𝗆𝗌, 𝖲𝗍𝖺𝗇VolleyballQuestionnaire of n = 78 elite male players from Brazilian volleyball clubs
Bar-Eli et al. (1995)BFBayesian likelihood ratio for assessing the referee’s behavior in competitionsBall-gamesQuestionnaire by eighty professional male athletes from Israel
Yang (2004)BFBayesian binary segmentation method for analyzing streakiness (consecutive successes or failures). Bayes factor tests are used to assess the change in the success rate𝖱𝖭𝖡𝖨𝖭 from the International Mathematics and Statistics LibraryTeam and individuals sports: basketball, baseball and golfSequence of Bernoulli trials (win/loss) from Golden State Warriors in the NBA (2000–2001). Tiger Woods’ sequence of wins or loss major PGA golf championships (1996–2001). Barry Bonds home run hitting pattern in the MLB season 2001
Murray (2017)BHMTeam’s score-augmented win-loss Bayesian model𝖱Ultimate (frisbee)2016 USA Ultimate Club Division results
Mengersen et al. (2016)BRBayesian inference approach for small effects as alternative of the traditional magnitude-based inference suggested by Batterham and Hopkins (2006)𝖱, 𝖡𝖱𝗎𝗀𝗌 (Thomas et al. 2006), 𝖱𝟤𝖶𝗂𝗇𝖡𝖴𝖦𝖲 (Sturtz et al. 2005), 𝖬𝖢𝖬𝖢𝗉𝖺𝖼𝗄 (Martin et al. 2011)TriathlonThree variables were measured (hemoglobin mass, submaximal running economy and maximum blood lactate concentration) in 24 participants in 3 groups (live high-train low, and intermittent hypoxic exposure and placebo
Wimmer et al. (2011)BRIt uses semi-parametric Latent Variable Models to fit decathlon performance outcomes using age and month of the competition as covariates𝖱 𝖬𝖢𝖬𝖢𝗉𝖺𝖼𝗄Decathlon3103 competitions from the world’s best performance records (1998–2009)
Pradier et al. (2016)BNPBayesian Nonparametric Models (BNP) approach for modeling the performance of marathon runners𝖬𝖠𝖳𝖫𝖠𝖡MarathonNew York City (2006–2011, 249,899 runners), Boston and London (2010–2011, 117,255 runners) marathons
Stephenson and Tawn (2013)OtherBayesian inference based on extreme value methods to identify best athlete performance assuming a exponential decreasing trendAthleticsMale/female annual best times in Olympic distance track events (100 m, 200 m, etc) from 1908–2010
Dadashi et al. (2013)HMMSwimming temporal phases modeled using HMMSwimming (breaststroke)7 well-trained swimmers (4 males and 3 females) equipped with wearable inertial measurement units
Dadashi, Millet, and Aminian (2015)BRBayesian approach for estimating cycles swimming velocity using data from wearable technologySwimming (breaststroke)Eight professional and seven recreational swimmers wearing IMU
Kovalchik and Albert (2017)BHMBayesian hierarchical model of serve routine (time-to-serve) considering the covariates point importance and the length of the previous rally. Point𝖱, 𝗋𝗃𝖺𝗀𝗌 (Plummer 2016)Tennis175 matches from the 2016 Australian Open using Hawk-Eye multi-camera
Baker and McHale (2017)EBAEmpirical Bayes model extension for estimating players’ strengthsTennisGrand Slams (1968–2016), 21,921 matches and 1123 players
Glickman and Hennessy (2015)BHMMultinomial logit model for rank ordering of competitors based on the extreme value distribution𝖱, 𝗋𝗃𝖺𝗀𝗌SkiingWomen’s Alpine downhill competitions (2002–2013)
Usami (2017)LSBayesian longitudinal method for paired comparisons based on the Bradley-Terry modelSumo wrestling10 wrestlers from the Japan Sumo Association (2005–2009)
Ofoghi et al. (2013)BNRacing athletes selection using machine learning techniques and Bayesian networks in the multi-event cycling omnium𝖶𝖾𝗄𝖺Cycling (omnium)Australian Championships 2009, World Championships 2007–2010, the UCI World Cups (2010–2011), and the Oceania Championships 2010
Yousefi and Swartz (2013)SASTA, BHMBayesian spatial model for estimating the expected number of putts within the green area according to the distance to the hole and the angle. This approach, based on a truncated-Poisson distribution, allows assessing putting performance accounting for the difficulty of puttsGolfShotLink data from the PGA Tour 2012
Vetter, Yu, and Foose (2017) | BR, BHM | Bayesian regression to assess the impact of several predictors (age, ability, training and intensity) on the training outcomes in four exercise types (muscular strength, speed, power and cardiorespiratory) | R | Various | Combined data from 34 studies between 1984 and 2015
Percy (2013) | BHM | Bayesian shrinkage method for class handicapping (which allows athletes to compete on equal terms). This method would reduce the currently large number of handicapping classes (often with just a few competitors) by grouping the competitors into a smaller number of classes | Excel | Paralympic sports | Women’s 100m running finals and men’s 100m freestyle swimming. Paralympic Games Beijing 2008
Glickman (2008) | BR | Bayesian model of paired comparisons in knockout-based competitions using the Thurstone-Mosteller approach. This method matches the competitors with the aim of maximizing the probability that the best player advances as far as possible in the competition. Given the strength of each competitor, the model computes the probability of one defeating the other, placing a multivariate normal prior on the strengths | C | | Simulated data
Sottas et al. (2006) | LS | Bayesian longitudinal analysis of blood samples to detect abnormal values of a biomarker | MATLAB (MATLAB 2017) | Doping | Two longitudinal studies: (1) double-blind study, 17 athletes and 332 observations; (2) 188 samples from 11 male athletes
Robinson et al. (2007) | LS | Bayesian inference on longitudinal blood samples for doping detection | | Doping | 135 blood profiles from 1039 samples from the three studies (elite athletes, amateur athletes and volunteers)
Schulze et al. (2009) | LS | Longitudinal Bayesian model considering genotype information for derivation of doping cut-off points | | Doping | Urinary samples from 55 male volunteers having one, two or no allele of the UGT2B17 gene
Sottas, Saugy, and Saudan (2010) | LS, BHM | Longitudinal study of biomarkers based on Bayesian inference | | Doping | 432 urine samples from 28 participants
Van Renterghem et al. (2011) | LS | Adaptive model based on Bayesian inference for finding new biomarkers to be used in doping detection | MATLAB | Doping | 42 urine samples from six healthy male volunteers
Stenling et al. (2015) | Other | Discusses the use of Bayesian structural equation modeling in sport psychology settings, illustrated using data from the Sport Motivation Scale II. They reported a better fit to the data using this Bayesian approach compared to traditional maximum likelihood | Mplus (Muthén and Muthén 1998-2012) | Multiple team and individual sports | 380 subjects from high school and sport teams in Sweden
Tamminen et al. (2016) | Other | Emotion regulation assessment using a multilevel Bayesian structural equation modeling approach. Emotion regulation at the personal and team levels was found to be associated with athletes’ enjoyment and commitment | Mplus | Multiple team sports | n = 451 adolescent athletes from 45 teams in Ontario and British Columbia, Canada
Gucciardi et al. (2016) | Other | Self-reported mental toughness assessment accounting for cultural differences using Bayesian structural equation modeling and approximate measurement invariance | Mplus | Multiple individual and team sports | Male and female athletes from Australia (n = 353), China (n = 254) and Malaysia (n = 341)
Josefsson et al. (2017) | Other | Bayesian cross-sectional and longitudinal design for modeling the effect of mindfulness on rumination and emotion regulation | Mplus | Multiple sports | 172 male and 69 female elite athletes from Sweden
Giles et al. (2017) | BR | Bayesian regression to assess the association between mental toughness and behavioral perseverance accounting for physical fitness. Although they found an association between these variables, mental toughness was not a good predictor of behavioral perseverance in the presence of fatigue | Mplus | Australian rules football | 38 male footballers from the West Australian Football League and Western Australian Amateur Football League
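
As a rough illustration of the paired-comparison idea in the Glickman (2008) entry above, the following Python sketch computes a Thurstone-Mosteller win probability and averages it over a normal prior on the strengths. It is a minimal sketch, not the author's implementation: the probit form Φ(θ_i − θ_j), the independent (diagonal-covariance) priors, and all function names and numeric values are assumptions made for illustration.

```python
import random
from statistics import NormalDist

# Standard normal distribution, used as the probit link (assumed form).
STD_NORMAL = NormalDist()


def win_probability(theta_i, theta_j):
    """Thurstone-Mosteller probability that competitor i beats competitor j,
    given known strengths theta_i and theta_j."""
    return STD_NORMAL.cdf(theta_i - theta_j)


def prior_predictive_win_probability(mu_i, sd_i, mu_j, sd_j,
                                     draws=20000, seed=1):
    """Monte Carlo average of the win probability when each strength has an
    independent normal prior (a simplification of a multivariate normal
    prior with diagonal covariance)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        theta_i = rng.gauss(mu_i, sd_i)  # draw strength of competitor i
        theta_j = rng.gauss(mu_j, sd_j)  # draw strength of competitor j
        total += win_probability(theta_i, theta_j)
    return total / draws


if __name__ == "__main__":
    # Hypothetical values: competitor i is slightly stronger a priori.
    print(win_probability(0.5, 0.0))
    print(prior_predictive_win_probability(0.5, 1.0, 0.0, 1.0))
```

Under these assumptions the Monte Carlo average can be checked against the closed form Φ((μ_i − μ_j)/√(1 + σ_i² + σ_j²)), which shows how prior uncertainty about the strengths pulls the predicted win probability toward 0.5.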

References

Albert, J. 1993. “A Statistical Analysis of Hitting Streaks in Baseball: Comment.” Journal of the American Statistical Association 88:1184–1188.

Albert, J. 2008. “Streaky Hitting in Baseball.” Journal of Quantitative Analysis in Sports 4:1184–1188.

Albert, J. 2016. “Improved Component Predictions of Batting and Pitching Measures.” Journal of Quantitative Analysis in Sports 12:73–85.

Albright, S. C. 1993. “A Statistical Analysis of Hitting Streaks in Baseball.” Journal of the American Statistical Association 88:1175–1183.

Anderson, E., Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, S. Hammarling, A. Greenbaum, A. McKenney, et al. 1999. LAPACK Users’ Guide (Third Ed.). Philadelphia, PA, USA: Society for Industrial and Applied Mathematics.

Angelino, E., M. J. Johnson, and R. P. Adams. 2016. “Patterns of Scalable Bayesian Inference.” Foundations and Trends® in Machine Learning 9:119–247.

Baio, G. and M. Blangiardo. 2010. “Bayesian Hierarchical Model for the Prediction of Football Results.” Journal of Applied Statistics 37:253–264.

Baker, R. D. and I. G. McHale. 2015. “Deterministic Evolution of Strength in Multiple Comparisons Models: Who is the Greatest Golfer?” Scandinavian Journal of Statistics 42:180–196.

Baker, R. D. and I. G. McHale. 2017. “An Empirical Bayes Model for Time-Varying Paired Comparisons Ratings: Who is the Greatest Women’s Tennis Player?” European Journal of Operational Research 258:328–333.

Bar-Eli, M. and G. Tenenbaum. 1988. “Time Phases and the Individual Psychological Crisis in Sports Competition: Theory and Research Findings.” Journal of Sports Sciences 6:141–149.

Bar-Eli, M., N. Levy-Kolker, J. S. Pie, and G. Tenenbaum. 1995. “A Crisis-Related Analysis of Perceived Referees’ Behavior in Competition.” Journal of Applied Sport Psychology 7:63–80.

Bar-Eli, M., S. Avugos, and M. Raab. 2006. “Twenty Years of ‘Hot Hand’ Research: Review and Critique.” Psychology of Sport and Exercise 7:525–553.

Batterham, A. M. and W. G. Hopkins. 2006. “Making Meaningful Inferences about Magnitudes.” International Journal of Sports Physiology and Performance 1:50–57.

Bendtsen, M. 2017. “Regimes in Baseball Players’ Career Data.” Data Mining and Knowledge Discovery 31:1580–1621.

Berger, J. O. 2013. Statistical Decision Theory and Bayesian Analysis. New York: Springer Science & Business Media.

Bernardo, J. M. and A. F. Smith. 2009. Bayesian Theory. Volume 405, England: John Wiley & Sons.

Bernards, J. R., K. Sato, G. G. Haff, and C. D. Bazyler. 2017. “Current Research and Statistical Practices in Sport Science and a Need for Change.” Sports (Basel) 5(4):87.

Besag, J., P. Green, D. Higdon, and K. Mengersen. 1995. “Bayesian Computation and Stochastic Systems.” Statistical Science 10:3–41.

Bezanson, J., S. Karpinski, V. B. Shah, and A. Edelman. 2012. “Julia: A Fast Dynamic Language for Technical Computing.” arXiv preprint arXiv:1209.5145.

Bivand, R. 2017. classInt: Choose Univariate Class Intervals. R package version 0.1-24.

Bivand, R. S., E. Pebesma, and V. Gomez-Rubio. 2013. Applied Spatial Data Analysis with R. Second edition. New York, NY: Springer.

Blei, D. M., A. Kucukelbir, and J. D. McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association 112(518):859–877.

Boys, R. J. and P. M. Philipson. 2018. “On the Ranking of Test Match Batsmen.” arXiv preprint arXiv:1806.05496.

Brewer, B. J. 2008. “Getting Your Eye in: A Bayesian Analysis of Early Dismissals in Cricket.” arXiv preprint arXiv:0801.4408.

Brown, L. D. 2008. “In-Season Prediction of Batting Averages: A Field Test of Empirical Bayes and Bayes Methodologies.” The Annals of Applied Statistics 2:113–152.

Bürkner, P.-C. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80:1–28.

Cafarelli, R., C. J. Rigdon, and S. E. Rigdon. 2012. “Models for Third Down Conversion in the National Football League.” Journal of Quantitative Analysis in Sports 8.

Carvalho, H. M., J. A. Lekue, S. M. Gil, and I. Bidaurrazaga-Letona. 2017. “Pubertal Development of Body Size and Soccer-Specific Functional Capacities in Adolescent Players.” Research in Sports Medicine 25:421–436.

Cervone, D., A. D’Amour, L. Bornn, and K. Goldsberry. 2016. “A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes.” Journal of the American Statistical Association 111:585–599.

Dadashi, F., A. Arami, F. Crettenand, G. P. Millet, J. Komar, L. Seifert, and K. Aminian. 2013. “A Hidden Markov Model of the Breaststroke Swimming Temporal Phases Using Wearable Inertial Measurement Units.” in Body Sensor Networks (BSN), 2013 IEEE International Conference on, IEEE, 1–6.

Dadashi, F., G. P. Millet, and K. Aminian. 2015. “A Bayesian Approach for Pervasive Estimation of Breaststroke Velocity Using a Wearable IMU.” Pervasive and Mobile Computing 19:37–46.

Damodaran, U. 2006. “Stochastic Dominance and Analysis of ODI Batting Performance: The Indian Cricket Team, 1989–2005.” Journal of Sports Science & Medicine 5:503.

Deshpande, S. K. and S. T. Jensen. 2016. “Estimating an NBA Player’s Impact on his Team’s Chances of Winning.” Journal of Quantitative Analysis in Sports 12:51–72.

Deshpande, S. K. and A. Wyner. 2017. “A Hierarchical Bayesian Model of Pitch Framing.” Journal of Quantitative Analysis in Sports 13:95–112.

Efron, B. and C. Morris. 1973. “Combining Possibly Related Estimation Problems.” Journal of the Royal Statistical Society. Series B (Methodological) 35:379–421.

Franks, A., A. Miller, L. Bornn, and K. Goldsberry. 2015. “Characterizing the Spatial Structure of Defensive Skill in Professional Basketball.” The Annals of Applied Statistics 9(1):94–121.

Gaujoux, R. and C. Seoighe. 2018. The Package NMF: Manual Pages. R package version 0.21.0.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2014. Bayesian Data Analysis. Volume 2, Boca Raton, FL: CRC Press.

Giles, B., P. S. Goods, D. R. Warner, D. Quain, P. Peeling, K. J. Ducker, B. Dawson, and D. F. Gucciardi. 2017. “Mental Toughness and Behavioural Perseverance: A Conceptual Replication and Extension.” Journal of Science and Medicine in Sport 21:640–645.

Gilovich, T., R. Vallone, and A. Tversky. 1985. “The Hot Hand in Basketball: On the Misperception of Random Sequences.” Cognitive Psychology 17:295–314.

Glickman, M. E. 1999. “Parameter Estimation in Large Dynamic Paired Comparison Experiments.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 48:377–394.

Glickman, M. E. 2001. “Dynamic Paired Comparison Models with Stochastic Variances.” Journal of Applied Statistics 28:673–689.

Glickman, M. E. 2008. “Bayesian Locally Optimal Design of Knockout Tournaments.” Journal of Statistical Planning and Inference 138:2117–2127.

Glickman, M. E. and H. S. Stern. 1998. “A State-Space Model for National Football League Scores.” Journal of the American Statistical Association 93:25–35.

Glickman, M. E. and J. Hennessy. 2015. “A Stochastic Rank Ordered Logit Model for Rating Multi-Competitor Games and Sports.” Journal of Quantitative Analysis in Sports 11.

Goldsberry, K. 2012. “Courtvision: New Visual and Spatial Analytics for the NBA.” in 2012 MIT Sloan Sports Analytics Conference.

Gramacy, R. B. 2017a. monomvn: Estimation for Multivariate Normal and Student-t Data with Monotone Missingness. R package version 1.9-7.

Gramacy, R. B. 2017b. reglogit: Simulation-Based Regularized Logistic Regression. R package version 1.2-5.

Gramacy, R. B., S. T. Jensen, and M. Taddy. 2013. “Estimating Player Contribution in Hockey with Regularized Logistic Regression.” Journal of Quantitative Analysis in Sports 9:97–111.

Gucciardi, D. and M. Zyphur. 2016. “Exploratory Structural Equation Modelling and Bayesian Estimation.” in An Introduction to Intermediate and Advanced Analyses for Sport and Exercise Scientists. United Kingdom: John Wiley & Sons, pp. 172–194.

Gucciardi, D. F., C.-Q. Zhang, V. Ponnusamy, G. Si, and A. Stenling. 2016. “Cross-Cultural Invariance of the Mental Toughness Inventory Among Australian, Chinese, and Malaysian Athletes: A Bayesian Estimation Approach.” Journal of Sport and Exercise Psychology 38:187–202.

Gudmundsson, J. and M. Horton. 2017. “Spatio-Temporal Analysis of Team Sports.” ACM Computing Surveys (CSUR) 50:22.

Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. 2009. “The WEKA Data Mining Software: An Update.” SIGKDD Explorations 11:10–18.

Hand, D. J. and K. Yu. 2001. “Idiot’s Bayes–not so Stupid After All?” International Statistical Review 69:385–398.

Harte, D. 2017. HiddenMarkov: Hidden Markov Models. Wellington: Statistics Research Associates. R package version 1.8-11.

Healey, G. 2017. “Learning, Visualizing, and Assessing a Model for the Intrinsic Value of a Batted Ball.” IEEE Access 5:13811–13822.

Ishigami, H. 2016. “Relative Age and Birthplace Effect in Japanese Professional Sports: A Quantitative Evaluation Using a Bayesian Hierarchical Poisson Model.” Journal of Sports Sciences 34:143–154.

Ivarsson, A., M. B. Andersen, A. Stenling, U. Johnson, and M. Lindwall. 2015. “Things We Still Haven’t Learned (So Far).” Journal of Sport and Exercise Psychology 37:449–461.

Jensen, S. T., B. B. McShane, and A. J. Wyner. 2009a. “Hierarchical Bayesian Modeling of Hitting Performance in Baseball.” Bayesian Analysis 4(4):631–652.

Jensen, S. T., K. E. Shirley, and A. J. Wyner. 2009b. “Bayesball: A Bayesian Hierarchical Model for Evaluating Fielding in Major League Baseball.” The Annals of Applied Statistics 3(2):491–520.

Jiang, W., C.-H. Zhang, et al. 2010. “Empirical Bayes In-Season Prediction of Baseball Batting Averages.” in Borrowing Strength: Theory Powering Applications–A Festschrift for Lawrence D. Brown. Beachwood, Ohio, USA: Institute of Mathematical Statistics, pp. 263–273.

Josefsson, T., A. Ivarsson, M. Lindwall, H. Gustafsson, A. Stenling, J. Böröy, E. Mattsson, J. Carnebratt, S. Sevholt, and E. Falkevik. 2017. “Mindfulness Mechanisms in Sports: Mediating Effects of Rumination and Emotion Regulation on Sport-Specific Coping.” Mindfulness 8:1354–1363.

Karlis, D. and I. Ntzoufras. 2008. “Bayesian Modelling of Football Outcomes: Using the Skellam’s Distribution for the Goal Difference.” IMA Journal of Management Mathematics 20:133–145.

Koopman, S. J. and R. Lit. 2015. “A Dynamic Bivariate Poisson Model for Analysing and Forecasting Match Results in the English Premier League.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 178:167–186.

Koulis, T., S. Muthukumarana, and C. D. Briercliffe. 2014. “A Bayesian Stochastic Model for Batting Performance Evaluation in One-Day Cricket.” Journal of Quantitative Analysis in Sports 10:1–13.

Kovalchik, S. A. and J. Albert. 2017. “A Multilevel Bayesian Approach for Modeling the Time-to-Serve in Professional Tennis.” Journal of Quantitative Analysis in Sports 13:49–62.

Lam, M. W. 2018. “One-Match-Ahead Forecasting in Two-Team Sports with Stacked Bayesian Regressions.” Journal of Artificial Intelligence and Soft Computing Research 8:159–171.

Lamas, L., F. Santana, M. Heiner, C. Ugrinowitsch, and G. Fellingham. 2015. “Modeling the Offensive-Defensive Interaction and Resulting Outcomes in Basketball.” PLoS One 10:e0144435.

Liberati, A., D. G. Altman, J. Tetzlaff, C. Mulrow, P. C. Gøtzsche, J. P. Ioannidis, M. Clarke, P. J. Devereaux, J. Kleijnen, and D. Moher. 2009. “The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies that Evaluate Health Care Interventions: Explanation and Elaboration.” PLoS Medicine 6:e1000100.

Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter. 2000. “WinBUGS – A Bayesian Modelling Framework: Concepts, Structure, and Extensibility.” Statistics and Computing 10:325–337.

Martin, A. D., K. M. Quinn, and J. H. Park. 2011. “MCMCpack: Markov Chain Monte Carlo in R.” Journal of Statistical Software 42:22.

MATLAB. 2017. “MATLAB and Statistics Toolbox Release.” The MathWorks, Natick, MA, USA.

McFadden, D. 1973. Conditional Logit Analysis of Qualitative Choice Behavior. Frontiers of Econometrics, New York: Academic Press.

McShane, B. B., A. Braunstein, J. Piette, and S. T. Jensen. 2011. “A Hierarchical Bayesian Variable Selection Approach to Major League Baseball Hitting Metrics.” Journal of Quantitative Analysis in Sports 7:1–26.

Mendes, F. G., J. V. Nascimento, E. R. Souza, C. Collet, M. Milistetd, J. Côté, and H. M. Carvalho. 2018. “Retrospective Analysis of Accumulated Structured Practice: A Bayesian Multilevel Analysis of Elite Brazilian Volleyball Players.” High Ability Studies 29(2):1–15.

Mengersen, K. L., C. C. Drovandi, C. P. Robert, D. B. Pyne, and C. J. Gore. 2016. “Bayesian Estimation of Small Effects in Exercise and Sports Science.” PLoS One 11:e0147311.

Miller, A., L. Bornn, R. Adams, and K. Goldsberry. 2014. “Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball.” in International Conference on Machine Learning, pp. 235–243.

Minsker, S., S. Srivastava, L. Lin, and D. B. Dunson. 2017. “Robust and Scalable Bayes via a Median of Subset Posterior Measures.” The Journal of Machine Learning Research 18:4488–4527.

Miskin, M. A., G. W. Fellingham, and L. W. Florence. 2010. “Skill Importance in Women’s Volleyball.” Journal of Quantitative Analysis in Sports 6.

Murray, T. A. 2017. “Ranking Ultimate Teams Using a Bayesian Score-Augmented Win-Loss Model.” Journal of Quantitative Analysis in Sports 13:63–78.

Muthén, L. and B. Muthén. 1998-2012. Mplus User’s Guide (7th ed.). Los Angeles, CA: Muthén & Muthén.

Neal, D., J. Tan, F. Hao, and S. S. Wu. 2010. “Simply Better: Using Regression Models to Estimate Major League Batting Averages.” Journal of Quantitative Analysis in Sports 6:1–14.

Ofoghi, B., J. Zeleznikow, C. MacMahon, and D. Dwyer. 2013. “Supporting Athlete Selection and Strategic Planning in Track Cycling Omnium: A Statistical and Machine Learning Approach.” Information Sciences 233:200–213.

Pasek, J., with some assistance from Alex Tahk, some code modified from R-core, Additional contributions by Gene Culter, and M. Schwemmle. 2016. Weights: Weighting and Weighted Statistics. R package version 0.85.

Percy, D. F. 2013. “Generic Handicapping for Paralympic Sports.” IMA Journal of Management Mathematics 24:349–361.

Plummer, M. 2016. rjags: Bayesian Graphical Models Using MCMC. R package version 4-6.

Plummer, M., N. Best, K. Cowles, and K. Vines. 2006. “CODA: Convergence Diagnosis and Output Analysis for MCMC.” R News 6:7–11.

Pradier, M. F., F. J. Ruiz, and F. Perez-Cruz. 2016. “Prior Design for Dependent Dirichlet Processes: An Application to Marathon Modeling.” PLoS One 11:e0147402.

Python Software Foundation. 2017. Python Language Reference.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Razali, N., A. Mustapha, F. A. Yatim, and R. Ab Aziz. 2017. “Predicting Football Matches Results using Bayesian Networks for English Premier League (EPL).” IOP Conference Series: Materials Science and Engineering 226:012099.

Reich, B. J., J. S. Hodges, B. P. Carlin, and A. M. Reich. 2006. “A Spatial Analysis of Basketball Shot Chart Data.” The American Statistician 60:3–12.

Revie, M., K. J. Wilson, R. Holdsworth, and S. Yule. 2017. “On Modeling Player Fitness in Training for Team Sports with Application to Professional Rugby.” International Journal of Sports Science & Coaching 12:183–193.

Robinson, D. 2017. Introduction to Empirical Bayes: Examples from Baseball Statistics. Gumroad.

Robinson, N., P.-E. Sottas, P. Mangin, and M. Saugy. 2007. “Bayesian Detection of Abnormal Hematological Values to Introduce a No-Start Rule for Heterogeneous Populations of Athletes.” Haematologica 92:1143–1144.

Rue, H. and O. Salvesen. 2000. “Prediction and Retrospective Analysis of Soccer Matches in a League.” Journal of the Royal Statistical Society: Series D (The Statistician) 49:399–418.

Ruiz, F. J. and F. Perez-Cruz. 2015. “A Generative Model for Predicting Outcomes in College Basketball.” Journal of Quantitative Analysis in Sports 11:39–52.

Schulze, J. J., J. Lundmark, M. Garle, L. Ekström, P.-E. Sottas, and A. Rane. 2009. “Substantial Advantage of a Combined Bayesian and Genotyping Approach in Testosterone Doping Tests.” Steroids 74:365–368.

Shahtahmassebi, G. and R. Moyeed. 2016. “An Application of the Generalized Poisson Difference Distribution to the Bayesian Modelling of Football Scores.” Statistica Neerlandica 70:260–273.

Shortridge, A., K. Goldsberry, and M. Adams. 2014. “Creating Space to Shoot: Quantifying Spatial Relative Field Goal Efficiency in Basketball.” Journal of Quantitative Analysis in Sports 10:303–313.

Silva, R. M. and T. B. Swartz. 2016. “Analysis of Substitution Times in Soccer.” Journal of Quantitative Analysis in Sports 12:113–122.

Sottas, P.-E., N. Baume, C. Saudan, C. Schweizer, M. Kamber, and M. Saugy. 2006. “Bayesian Detection of Abnormal Values in Longitudinal Biomarkers with an Application to T/E Ratio.” Biostatistics 8:285–296.

Sottas, P.-E., M. Saugy, and C. Saudan. 2010. “Endogenous Steroid Profiling in the Athlete Biological Passport.” Endocrinology and Metabolism Clinics 39:59–73.

Stan Development Team. 2017. The Stan Core Library.

Stan Development Team. 2018. RStan: The R Interface to Stan. R package version 2.17.3.

Stenling, A., A. Ivarsson, U. Johnson, and M. Lindwall. 2015. “Bayesian Structural Equation Modeling in Sport and Exercise Psychology.” Journal of Sport and Exercise Psychology 37:410–420.

Stephenson, A. G. and J. A. Tawn. 2013. “Determining the Best Track Performances of All Time Using a Conceptual Population Model for Athletics Records.” Journal of Quantitative Analysis in Sports 9:67–76.

Stevenson, O. G. and B. J. Brewer. 2017. “Bayesian Survival Analysis of Batsmen in Test Cricket.” Journal of Quantitative Analysis in Sports 13:25–36.

Sturtz, S., U. Ligges, and A. Gelman. 2005. “R2WinBUGS: A Package for Running WinBUGS from R.” Journal of Statistical Software 12:1–16.

Suzuki, A. K., L. E. B. Salasar, J. G. Leite, and F. Louzada-Neto. 2010. “A Bayesian Approach for Predicting Match Outcomes: The 2006 (Association) Football World Cup.” Journal of the Operational Research Society 61:1530–1539.

Swartz, T. B. 2018. “Where Should I Publish my Sports Paper?” The American Statistician 1–6.

Swartz, T. B., P. S. Gill, and S. Muthukumarana. 2009. “Modelling and Simulation for One-Day Cricket.” Canadian Journal of Statistics 37:143–160.

Taddy, M. 2013. “Multinomial Inverse Regression for Text Analysis.” Journal of the American Statistical Association 108(503):755–770.

Tamminen, K. A., P. Gaudreau, C. E. McEwen, and P. R. Crocker. 2016. “Interpersonal Emotion Regulation Among Adolescent Athletes: A Bayesian Multilevel Model Predicting Sport Enjoyment and Commitment.” Journal of Sport and Exercise Psychology 38:541–555.

Thomas, A. C. 2006. “The Impact of Puck Possession and Location on Ice Hockey Strategy.” Journal of Quantitative Analysis in Sports 2.

Thomas, A., B. O’Hara, U. Ligges, and S. Sturtz. 2006. “Making BUGS Open.” R News 6:12–17.

Thomas, C., G. Fellingham, and P. Vehrs. 2009. “Development of a Notational Analysis System for Selected Soccer Skills of a Women’s College Team.” Measurement in Physical Education and Exercise Science 13:108–121.

Thomas, A. C., S. L. Ventura, S. T. Jensen, and S. Ma. 2013. “Competing Process Hazard Function Models for Player Ratings in Ice Hockey.” The Annals of Applied Statistics 7:1497–1524.

Usami, S. 2017. “Bayesian Longitudinal Paired Comparison Model and its Application to Sports Data Using Weighted Likelihood Bootstrap.” Communications in Statistics – Simulation and Computation 46:1974–1990.

Van Renterghem, P., P. Van Eenoo, P.-E. Sottas, M. Saugy, and F. Delbeke. 2011. “A Pilot Study on Subject-Based Comprehensive Steroid Profiling: Novel Biomarkers to Detect Testosterone Misuse in Sports.” Clinical Endocrinology 75:134–140.

Vetter, R. E., H. Yu, and A. K. Foose. 2017. “Effects of Moderators on Physical Training Programs: A Bayesian Approach.” The Journal of Strength & Conditioning Research 31:1868–1878.

Visser, I. and M. Speekenbrink. 2010. “depmixS4: An R Package for Hidden Markov Models.” Journal of Statistical Software 36:1–21.

Wetzels, R., D. Tutschkow, C. Dolan, S. van der Sluis, G. Dutilh, and E.-J. Wagenmakers. 2016. “A Bayesian Test for the Hot Hand Phenomenon.” Journal of Mathematical Psychology 72:200–209.

Wimmer, V., N. Fenske, P. Pyrka, and L. Fahrmeir. 2011. “Exploring Competition Performance in Decathlon Using Semi-Parametric Latent Variable Models.” Journal of Quantitative Analysis in Sports 7:1–21.

Yang, T. Y. 2004. “Bayesian Binary Segmentation Procedure for Detecting Streakiness in Sports.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 167:627–637.

Yousefi, K. and T. B. Swartz. 2013. “Advanced Putting Metrics in Golf.” Journal of Quantitative Analysis in Sports 9:239–248.

Published Online: 2019-06-27
Published in Print: 2019-10-25

©2019 Walter de Gruyter GmbH, Berlin/Boston