
Journal of Quantitative Analysis in Sports

An official journal of the American Statistical Association

Editor-in-Chief: Mark Glickman, PhD



Nearest-neighbor matchup effects: accounting for team matchups for predicting March Madness

Andrew Hoegh
  • Corresponding author
  • Virginia Tech – Department of Statistics, Blacksburg, VA, USA
/ Marcos Carzolio
  • Virginia Tech – Department of Statistics, Blacksburg, VA, USA
/ Ian Crandell
  • Virginia Tech – Department of Statistics, Blacksburg, VA, USA
/ Xinran Hu
  • Virginia Tech – Department of Statistics, Blacksburg, VA, USA
/ Lucas Roberts
  • Virginia Tech – Department of Statistics, Blacksburg, VA, USA
/ Yuhyun Song
  • Virginia Tech – Department of Statistics, Blacksburg, VA, USA
/ Scotland C. Leman
  • Virginia Tech – Department of Statistics, Blacksburg, VA, USA
Published Online: 2015-02-23 | DOI: https://doi.org/10.1515/jqas-2014-0054


The recent surge of predictive analytics competitions has improved sports predictions by fostering data-driven inference and steering clear of human bias. This article details methods developed for Kaggle’s March Machine Learning Mania competition for the 2014 NCAA tournament. A submission to the competition consists of outcome probabilities for each potential matchup. Most predictive models are based entirely on measures of overall team strength, which imposes an unintended “transitive property”: if team A is favored over team B and team B is favored over team C, then team A must be favored over team C. Such models are therefore unable to capture specific matchup tendencies. We introduce a novel nearest-neighbor matchup effects framework, a flexible way to account for team characteristics beyond overall team strength that may influence game outcomes. In particular, we develop a general framework that couples a model predicting a point spread with a clustering procedure that borrows strength from games similar to the current matchup. The result is a model whose predictions control for team strength while also capturing specific matchup characteristics.

Keywords: K nearest neighbors; matchup effects; relative strength; transitivity
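
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the strength ratings, the two matchup features, the residuals, and the spread standard deviation `sigma` are all hypothetical values chosen for illustration, and the function names are invented here. The sketch shows (1) why a strength-only point spread is transitive, and (2) how averaging the residuals of the k most similar past games could adjust that spread for a specific matchup.

```python
import math


def strength_spread(strength_a, strength_b):
    """Strength-only point spread: depends only on the rating difference."""
    return strength_a - strength_b


def knn_matchup_effect(query, past_games, k):
    """Mean residual of the k past games closest to `query` in feature space.

    Each past game holds hypothetical matchup features and a residual
    (actual margin minus the strength-only spread).
    """
    ranked = sorted(past_games, key=lambda g: math.dist(query, g["features"]))
    nearest = ranked[:k]
    return sum(g["residual"] for g in nearest) / len(nearest)


def win_probability(spread, sigma=11.0):
    """Map a point spread to a win probability with a normal CDF.

    sigma is an assumed standard deviation of game margins, not a value
    taken from the article.
    """
    return 0.5 * (1.0 + math.erf(spread / (sigma * math.sqrt(2.0))))


# Toy data (entirely made up): features could be, e.g., tempo and
# three-point-rate differences between the two teams.
past_games = [
    {"features": (0.0, 0.0), "residual": 2.0},
    {"features": (1.0, 0.0), "residual": 4.0},
    {"features": (5.0, 5.0), "residual": -10.0},
]

# Transitivity: strength-only spreads for A-B and B-C fix the A-C spread.
spread_ab = strength_spread(10.0, 6.0)
spread_bc = strength_spread(6.0, 3.0)
spread_ac = strength_spread(10.0, 3.0)

# Matchup adjustment: shift the A-C spread by the average residual of
# the two most similar past games.
adjusted_ac = spread_ac + knn_matchup_effect((0.2, 0.0), past_games, k=2)
prob_a_beats_c = win_probability(adjusted_ac)
```

Because the adjustment depends on the features of the particular pairing rather than on either team's rating alone, two matchups with the same rating difference can receive different predicted probabilities, which is what a strength-only model cannot do.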


About the article

Corresponding author: Andrew Hoegh, Virginia Tech – Department of Statistics, Hutcheson Hall, RM 406A, 250 Drillfield Drive, Blacksburg, VA 24061, USA

Published in Print: 2015-03-01

Citation Information: Journal of Quantitative Analysis in Sports, ISSN (Online) 1559-0410, ISSN (Print) 2194-6388, DOI: https://doi.org/10.1515/jqas-2014-0054.
