
Journal of Quantitative Analysis in Sports

An official journal of the American Statistical Association

Editor-in-Chief: Steve Rigdon, PhD

4 Issues per year

CiteScore 2017: 0.67

SCImago Journal Rank (SJR) 2017: 0.290
Source Normalized Impact per Paper (SNIP) 2017: 0.853

Volume 11, Issue 1



A mixture-of-modelers approach to forecasting NCAA tournament outcomes

Lo-Hua Yuan / Anthony Liu / Alec Yeh / Aaron Kaufman / Andrew Reece / Peter Bull
  • Harvard University – Institute for Applied Computational Science, Cambridge, Massachusetts, USA
/ Alex Franks / Sherrie Wang / Dmitri Illushin / Luke Bornn
Published Online: 2015-02-24 | DOI: https://doi.org/10.1515/jqas-2014-0056


Predicting the outcome of a single sporting event is difficult; predicting all of the outcomes for an entire tournament is a monumental challenge. Despite the difficulties, millions of people compete each year to forecast the outcome of the NCAA men’s basketball tournament, which spans 63 games over 3 weeks. Statistical prediction of game outcomes involves a multitude of possible covariates and information sources, large performance variations from game to game, and a scarcity of detailed historical data. In this paper, we present the results of a team of modelers working together to forecast the 2014 NCAA men’s basketball tournament. We present not only the methods and data used, but also several novel ideas for post-processing statistical forecasts and decontaminating data sources. In particular, we highlight the difficulties in using publicly available data and suggest techniques for improving their relevance.

Keywords: basketball; data decontamination; forecasting; model ensembles



About the article

Corresponding author: Luke Bornn, Harvard University – Statistics, Cambridge, Massachusetts, USA, e-mail:

Published Online: 2015-02-24

Published in Print: 2015-03-01

Citation Information: Journal of Quantitative Analysis in Sports, Volume 11, Issue 1, Pages 13–27, ISSN (Online) 1559-0410, ISSN (Print) 2194-6388, DOI: https://doi.org/10.1515/jqas-2014-0056.


©2015 by De Gruyter.

Citing Articles


Andrew Hoegh, Dipayan Maiti, and Scotland Leman
Journal of Computational and Graphical Statistics, 2017, Page 0
C. Soto Valero
International Journal of Computer Science in Sport, 2016, Volume 15, Number 2
