Quantitative evaluation of fielding ability in baseball has been an ongoing challenge for statisticians. Detailed recording of ball-in-play data in recent years has spurred the development of sophisticated fielding models. Foremost among these approaches, Jensen et al. (2009) used a hierarchical Bayesian model to estimate spatial fielding curves for individual players. These previous efforts have not addressed evolution in a player’s fielding ability over time. We expand the work of Jensen et al. (2009) to model the fielding ability of individual players over multiple seasons. Several different models are implemented and compared via posterior predictive validation on hold-out data. Among our choices, we find that a model which imposes shrinkage towards an age-specific average gives the best performance. Our temporal models allow us to delineate the performance of a fielder on a season-to-season basis versus their entire career.
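As a rough illustration of what shrinkage towards an age-specific average means, the sketch below pulls a player's season estimate towards the average for his age group, with less pull as the number of observed plays grows. It is a simple weighted average, not the hierarchical Bayesian model of the abstract; the function name, the `prior_strength` parameter, and all numbers are hypothetical.

```python
def shrink_toward_age_mean(player_rate, n_plays, age_mean, prior_strength=50.0):
    """Weighted average of a player's observed rate and his age group's mean.

    prior_strength is a hypothetical tuning constant: the number of plays at
    which the player's own data and the age-group mean get equal weight.
    """
    weight = n_plays / (n_plays + prior_strength)
    return weight * player_rate + (1.0 - weight) * age_mean


# Hypothetical example: same observed success rate, different sample sizes.
few_plays = shrink_toward_age_mean(0.90, n_plays=10, age_mean=0.70)
many_plays = shrink_toward_age_mean(0.90, n_plays=1000, age_mean=0.70)
print(round(few_plays, 3), round(many_plays, 3))
```

With few plays the estimate stays close to the age-group mean; with many plays it stays close to the player's own rate.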
A baseball team's offensive prowess is a function of two types of abilities: batting and baserunning. While each has been studied extensively in isolation, the effect of their interaction is not well understood. We model offensive output as a scalar function f of an individual player's batting and baserunning profile z. Each of these profiles is in turn estimated from Retrosheet data using hierarchical Bayesian models. We then use the SimulOutCome simulation engine to generate values of f(z) over a fine grid of points. Finally, for each of several methods of taking the extra base, we graphically depict the surface f(z) over changes in the probability of advancing via that method. This framework allows us to draw conclusions both about optimal baserunning strategies in general and about how particular offensive profiles affect a player's optimal baserunning strategy. We present many informative visualizations and analyze specific aspects of several well-known Major League players.
We propose two new measures for evaluating offensive ability of NBA players, using one-dimensional shooting data from three seasons beginning with the 2004-05 season. These measures improve upon currently employed shooting statistics by accounting for the varying shooting patterns of players over different distances from the basket. This variance also provides us with an intuitive metric for clustering players, wherein each player's performance is calculated and compared to his cluster center as a baseline. To further improve the accuracy of our measures, we develop our own variation of smoothing and shrinkage, reducing small-sample biases and abnormalities. The first measure, SCAB, or Scoring Ability Above Baseline, measures a player's ability to score as a function of time on court. The second measure, SHTAB, or Shooting Ability, calculates a player's propensity to score on a per-shot basis. Our results show that a combination of SCAB and SHTAB can be used to separate out players based on their offensive game. We observe that players who are highly ranked according to our measures are regularly considered top performers on offense by experts, with the notable exception of LeBron James; the same claim holds for the offensive dregs. We suggest possible explanations for our findings and explore possibilities of future work with regard to player defense.
Numerous statistics have been proposed to measure offensive ability in Major League Baseball. While some of these measures may offer moderate predictive power in certain situations, it is unclear which simple offensive metrics are the most reliable or consistent. We address this issue by using a hierarchical Bayesian variable selection model to determine which offensive metrics are most predictive within players across time. Our sophisticated methodology allows for full estimation of the posterior distributions for our parameters and automatically adjusts for multiple testing, providing a distinct advantage over alternative approaches. We implement our model on a set of fifty different offensive metrics and discuss our results in comparison to other variable selection techniques. We find that a large number of metrics demonstrate signal. However, these metrics (i) are highly correlated with one another, (ii) can be reduced to about five without much loss of information, and (iii) once reduced, relate to traditional notions of performance (e.g., plate discipline, power, and ability to make contact).
A plethora of statistics have been proposed to measure the effectiveness of pitchers in Major League Baseball. While many of these are quite traditional (e.g., ERA, wins), some have gained currency only recently (e.g., WHIP, K/BB). Some of these metrics may have predictive power, but it is unclear which are the most reliable or consistent. We address this question by constructing a Bayesian random effects model that incorporates a point mass mixture and fitting it to data on twenty metrics spanning approximately 2,500 players and 35 years. Our model identifies FIP, HR/9, ERA, and BB/9 as the highest-signal metrics for starters and GB%, FB%, and K/9 as the highest-signal metrics for relievers. In general, the metrics identified by our model are independent of team defense. Our procedure also provides separate relative rankings of metrics for starters and relievers and shows that these rankings differ quite substantially. Our methodology is compared to a Lasso-based procedure and is internally validated by detailed case studies.
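To make the idea of a point mass mixture concrete, the sketch below computes the posterior probability that a single observed effect carries signal under a simple two-component prior: a point mass at zero (no signal) and a zero-mean normal "slab" (signal). This is a one-observation toy, not the random effects model described in the abstract; the function names, default prior values, and example inputs are all assumptions made for illustration.

```python
import math


def normal_pdf(x, variance):
    """Density of a zero-mean normal with the given variance, evaluated at x."""
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2.0 * math.pi * variance)


def signal_probability(effect, noise_var, slab_var=1.0, prior_signal=0.5):
    """Posterior probability that the true effect is nonzero.

    Assumed toy model: effect ~ N(theta, noise_var), with theta = 0 with
    probability 1 - prior_signal and theta ~ N(0, slab_var) otherwise.
    """
    # Marginal likelihood under the slab: the variances add.
    slab = prior_signal * normal_pdf(effect, noise_var + slab_var)
    # Marginal likelihood under the point mass at zero: noise only.
    spike = (1.0 - prior_signal) * normal_pdf(effect, noise_var)
    return slab / (slab + spike)


# Hypothetical inputs: an effect at zero versus one far from zero.
weak = signal_probability(0.0, noise_var=0.04)
strong = signal_probability(1.0, noise_var=0.04)
print(round(weak, 3), round(strong, 5))
```

An observed effect near zero lowers the probability of signal below its prior value, while an effect that is large relative to the noise pushes it towards one.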