Published by De Gruyter, February 24, 2015

Building an NCAA men’s basketball predictive model and quantifying its success

Michael J. Lopez and Gregory J. Matthews


Advances in computing and machine learning have led to the creation of many cutting-edge predictive algorithms, some of which have been shown to provide more accurate forecasts than traditional statistical tools. In this manuscript, we provide evidence that the combination of modest statistical methods with informative data can meet or exceed the accuracy of more complex models when it comes to predicting the NCAA men’s basketball tournament. First, we describe a prediction model that uses logistic regressions to merge the point spreads set by Las Vegas sportsbooks with possession-based team efficiency metrics. As judged by the log loss function, the set of probabilities generated from this model predicted the 2014 tournament more accurately than any of approximately 400 competing submissions. Next, we attempt to quantify the degree to which luck played a role in the success of this model by simulating tournament outcomes under different sets of true underlying game probabilities. We estimate that under the most optimistic of game probability scenarios, our entry had roughly a 12% chance of outscoring all competing submissions and slightly less than a 50% chance of finishing with one of the ten best scores.
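For readers unfamiliar with the scoring rule named above, the following is a minimal sketch of the standard binary log loss, written in Python for illustration. The function name `log_loss`, the clipping constant `eps`, and the example probabilities are assumptions made here and are not taken from the manuscript or from the competition's scoring code.

```python
import math


def log_loss(probs, outcomes, eps=1e-15):
    """Average negative log-likelihood of binary outcomes.

    probs    -- predicted probabilities that the first-listed team wins
    outcomes -- 1 if that team won, 0 otherwise
    eps      -- clipping bound (an assumption of this sketch) that keeps
                log() finite when a prediction is exactly 0 or 1
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clip into (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(probs)


# Hypothetical two-game example: a coin-flip entry versus a sharper entry.
coin_flip = log_loss([0.5, 0.5], [1, 0])
sharper = log_loss([0.9, 0.1], [1, 0])
print(coin_flip, sharper)
```

Lower scores are better. An entry that assigns 0.5 to every game scores ln 2 (about 0.693) regardless of the results, so any useful set of probabilities should score below that, and confident predictions that turn out wrong are penalized heavily.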

Corresponding author: Michael J. Lopez, Skidmore College – Mathematics and Computer Science, 815 N. Broadway Harder Hall, Saratoga Springs, New York 12866, USA, Tel.: +9784072221, e-mail:


Published Online: 2015-2-24
Published in Print: 2015-3-1

©2015 by De Gruyter