Computing and machine learning advancements have led to the creation of many cutting-edge predictive algorithms, some of which have been demonstrated to provide more accurate forecasts than traditional statistical tools. In this manuscript, we provide evidence that the combination of modest statistical methods with informative data can meet or exceed the accuracy of more complex models when it comes to predicting the NCAA men’s basketball tournament. First, we describe a prediction model that uses logistic regression to merge the point spreads set by Las Vegas sportsbooks with possession-based team efficiency metrics. The set of probabilities generated from this model most accurately predicted the 2014 tournament, relative to approximately 400 competing submissions, as judged by the log loss function. Next, we attempt to quantify the degree to which luck played a role in the success of this model by simulating tournament outcomes under different sets of true underlying game probabilities. We estimate that under the most optimistic of game probability scenarios, our entry had roughly a 12% chance of outscoring all competing submissions and slightly less than a 50% chance of finishing with one of the ten best scores.
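For readers unfamiliar with the scoring rule named above, the following is a minimal sketch of the standard binary log loss, assuming each submission supplies a win probability for every game. The function name `log_loss`, the clipping constant `eps`, and the example probabilities are illustrative assumptions, not details taken from the manuscript.

```python
import math


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes under predicted probabilities.

    probs    -- predicted probability that the first-listed team wins each game
    outcomes -- 1 if that team won, 0 otherwise
    eps      -- clipping constant (an assumption) so log(0) is never evaluated
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(probs)


# Hypothetical three-game slate; these numbers are made up for illustration.
predicted = [0.9, 0.6, 0.5]
results = [1, 0, 1]
score = log_loss(predicted, results)
```

Lower scores are better. A forecast of 0.5 for every game scores ln 2 (about 0.693) regardless of the results, which gives a natural baseline for judging an entry.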
Within sports analytics, there is substantial interest in comprehensive statistics intended to capture overall player performance. In baseball, one such measure is wins above replacement (WAR), which aggregates the contributions of a player in each facet of the game: hitting, pitching, baserunning, and fielding. However, current versions of WAR depend upon proprietary data, ad hoc methodology, and opaque calculations. We propose a competitive aggregate measure, openWAR, that is based on public data, a methodology with greater rigor and transparency, and a principled standard for the nebulous concept of a “replacement” player. Finally, we use simulation-based techniques to provide interval estimates for our openWAR measure that are easily portable to other domains.
Each year the members of the Baseball Writers’ Association of America (BBWAA) vote for eligible former players to be inducted into the Baseball Hall of Fame. The BBWAA tabulates and releases vote totals, but individual ballots remain private. However, many voters forgo their ballot privacy and publish their ballots through various media channels. These publicly available ballots can be aggregated to create a subset of the true ballots. Using these released ballots and the totals released by the BBWAA, this research assesses what can be learned about the group of voters who chose not to disclose their ballots. Attributes of the known and unknown ballot groups are studied by looking at differences in voting preferences for individual players as well as voting differences between classes of voters that are defined using latent class analysis (LCA).
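The basic arithmetic linking the released totals to the undisclosed ballots can be sketched as follows. This is an illustration under the assumption that a player's support among private voters is obtained by subtracting the public counts from the official totals; the function name `private_support` and all numbers are hypothetical and do not come from the study.

```python
def private_support(total_votes, total_ballots, public_votes, public_ballots):
    """Share of undisclosed ballots that included a given player.

    total_votes / total_ballots   -- official counts released by the BBWAA
    public_votes / public_ballots -- counts among voluntarily published ballots
    """
    private_ballots = total_ballots - public_ballots
    private_votes = total_votes - public_votes
    if private_ballots <= 0:
        raise ValueError("no undisclosed ballots remain")
    if not 0 <= private_votes <= private_ballots:
        raise ValueError("public counts are inconsistent with the official totals")
    return private_votes / private_ballots


# Hypothetical player: named on 400 of 550 ballots overall,
# and on 180 of the 220 ballots that were made public.
public_share = 180 / 220
private_share = private_support(400, 550, 180, 220)
gap = public_share - private_share  # positive: stronger support among public voters
```

A gap far from zero for a given player would suggest that the voters who publish their ballots differ systematically from those who do not, which is the kind of difference the abstract describes studying.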