
Journal of Quantitative Analysis in Sports

An official journal of the American Statistical Association

Editor-in-Chief: Mark Glickman PhD

4 Issues per year


SCImago Journal Rank (SJR) 2015: 0.288
Source Normalized Impact per Paper (SNIP) 2015: 0.358
Impact per Publication (IPP) 2015: 0.250

Online ISSN: 1559-0410

A mixture-of-modelers approach to forecasting NCAA tournament outcomes

  • Lo-Hua Yuan, Harvard University – Statistics, Cambridge, Massachusetts, USA
  • Anthony Liu, Harvard University – Statistics, Cambridge, Massachusetts, USA
  • Alec Yeh, Harvard University – Statistics, Cambridge, Massachusetts, USA
  • Aaron Kaufman, Harvard University – Government, Cambridge, Massachusetts, USA
  • Andrew Reece, Harvard University – Psychology, Cambridge, Massachusetts, USA
  • Peter Bull, Harvard University – Institute for Applied Computational Science, Cambridge, Massachusetts, USA
  • Alex Franks, Harvard University – Statistics, Cambridge, Massachusetts, USA
  • Sherrie Wang, Harvard University – Statistics, Cambridge, Massachusetts, USA
  • Dmitri Illushin, Harvard University – Statistics, Cambridge, Massachusetts, USA
  • Luke Bornn (corresponding author), Harvard University – Statistics, Cambridge, Massachusetts, USA
Published Online: 2015-02-24 | DOI: https://doi.org/10.1515/jqas-2014-0056

Abstract

Predicting the outcome of a single sporting event is difficult; predicting all of the outcomes for an entire tournament is a monumental challenge. Despite the difficulties, millions of people compete each year to forecast the outcome of the NCAA men’s basketball tournament, which spans 63 games over 3 weeks. Statistical prediction of game outcomes involves a multitude of possible covariates and information sources, large performance variations from game to game, and a scarcity of detailed historical data. In this paper, we present the results of a team of modelers working together to forecast the 2014 NCAA men’s basketball tournament. We present not only the methods and data used, but also several novel ideas for post-processing statistical forecasts and decontaminating data sources. In particular, we highlight the difficulties in using publicly available data and suggest techniques for improving their relevance.
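The scale of the "monumental challenge" can be made concrete with a little arithmetic. The sketch below is illustrative and not taken from the paper; it assumes a 64-team single-elimination bracket, consistent with the 63 games cited above. Each game eliminates exactly one team, so 64 - 1 = 63 games are needed, and because every game has two possible winners there are 2^63 distinct brackets. The helper names and the 75% per-game accuracy are hypothetical, and treating games as independent with equal accuracy is a simplifying assumption.

```python
def single_elimination_games(num_teams: int) -> int:
    """Each game eliminates exactly one team, so crowning a single champion
    from num_teams entrants takes num_teams - 1 games."""
    return num_teams - 1


def perfect_bracket_probability(per_game_accuracy: float, num_games: int) -> float:
    """Chance of calling every game correctly, ASSUMING each game is predicted
    independently with the same accuracy (a deliberate simplification)."""
    return per_game_accuracy ** num_games


games = single_elimination_games(64)  # 64-team bracket assumed
brackets = 2 ** games                 # two possible winners per game
print(games, brackets)                # 63 9223372036854775808

# Hypothetical per-game accuracy, chosen only for illustration.
p = perfect_bracket_probability(0.75, games)
print(f"{p:.3e}")
```

Even at a hypothetical 75% accuracy on every game, the chance of a perfect bracket under this independence assumption is roughly 1.3 × 10^-8.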

Keywords: basketball; data decontamination; forecasting; model ensembles

About the article

Corresponding author: Luke Bornn, Harvard University – Statistics, Cambridge, Massachusetts, USA



Published in Print: 2015-03-01


Citation Information: Journal of Quantitative Analysis in Sports, ISSN (Online) 1559-0410, ISSN (Print) 2194-6388, DOI: https://doi.org/10.1515/jqas-2014-0056.
