Journal of Quantitative Analysis in Sports

An official journal of the American Statistical Association

Editor-in-Chief: Steve Rigdon, PhD

4 Issues per year

CiteScore 2017: 0.67

SCImago Journal Rank (SJR) 2017: 0.290
Source Normalized Impact per Paper (SNIP) 2017: 0.853

Volume 11, Issue 1


Nearest-neighbor matchup effects: accounting for team matchups for predicting March Madness

Andrew Hoegh / Marcos Carzolio / Ian Crandell / Xinran Hu / Lucas Roberts / Yuhyun Song / Scotland C. Leman
Published Online: 2015-02-23 | DOI: https://doi.org/10.1515/jqas-2014-0054


Recently, the surge of predictive analytics competitions has improved sports predictions by fostering data-driven inference and steering clear of human bias. This article details methods developed for Kaggle’s March Machine Learning Mania competition for the 2014 NCAA tournament. A submission to the competition consists of outcome probabilities for each potential matchup. Most predictive models are based entirely on measures of overall team strength, resulting in the unintended “transitive property.” These models are therefore unable to capture specific matchup tendencies. We introduce our novel nearest-neighbor matchup effects framework, which presents a flexible way to account for team characteristics above and beyond team strength that may influence game outcomes. In particular, we develop a general framework that couples a model predicting a point spread with a clustering procedure that borrows strength from games similar to a current matchup. This results in a model capable of issuing predictions that control for team strength and capture specific matchup characteristics.

Keywords: K nearest neighbors; matchup effects; relative strength; transitivity
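
The coupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it only assumes that a strength-based point spread is adjusted by the average residual of the k most similar past games, and that the adjusted spread is mapped to a win probability through a normal CDF. The names `PastGame`, `matchup_effect`, and `win_probability`, the feature vectors, the Euclidean distance, and the value `sigma=11.0` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PastGame:
    # Hypothetical matchup characteristics, oriented from team A's perspective.
    features: Sequence[float]
    # Observed margin minus the strength-based predicted spread.
    residual: float


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def baseline_spread(strength_a: float, strength_b: float) -> float:
    """Strength-only point spread; on its own this model is transitive."""
    return strength_a - strength_b


def matchup_effect(query: Sequence[float], past_games: List[PastGame], k: int) -> float:
    """Average residual of the k past games closest to the query matchup."""
    if not past_games or k <= 0:
        return 0.0
    nearest = sorted(past_games, key=lambda g: math.dist(query, g.features))[:k]
    return sum(g.residual for g in nearest) / len(nearest)


def win_probability(
    strength_a: float,
    strength_b: float,
    query: Sequence[float],
    past_games: List[PastGame],
    k: int = 3,
    sigma: float = 11.0,  # assumed standard deviation of game margins
) -> float:
    """Probability that team A beats team B under the assumptions above."""
    spread = baseline_spread(strength_a, strength_b)
    spread += matchup_effect(query, past_games, k)
    return normal_cdf(spread / sigma)


if __name__ == "__main__":
    history = [
        PastGame(features=(0.0, 0.0), residual=4.0),
        PastGame(features=(1.0, 0.0), residual=2.0),
        PastGame(features=(10.0, 10.0), residual=-20.0),
    ]
    print(win_probability(5.0, 5.0, (0.0, 0.0), history, k=2))
```

With an empty game history the adjustment is zero and the sketch reduces to a strength-only model, which is exactly the transitive case the abstract criticizes. The neighbor-based residual is what lets two equally rated teams receive a probability other than 0.5.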


About the article

Corresponding author: Andrew Hoegh, Virginia Tech – Department of Statistics, Hutcheson Hall – RM 406A, 250 Drillfield Drive, Blacksburg, VA 24061, USA

Published Online: 2015-02-23

Published in Print: 2015-03-01

Citation Information: Journal of Quantitative Analysis in Sports, Volume 11, Issue 1, Pages 29–37, ISSN (Online) 1559-0410, ISSN (Print) 2194-6388, DOI: https://doi.org/10.1515/jqas-2014-0054.

©2015 by De Gruyter.

Citing Articles

N. David Pifer, Timothy D. DeSchriver, Thomas A. Baker, and James J. Zhang
Journal of Sports Analytics, 2018, Page 1
Brian H. Yim and Kevin K. Byon
Journal of Global Sport Management, 2018, Page 1
J. T. Fry, Andrew Hoegh, Scotland Leman, and Matthew Montesano
Journal of Applied Statistics, 2018, Volume 45, Number 2, Page 298
Andrew Hoegh, Dipayan Maiti, and Scotland Leman
Journal of Computational and Graphical Statistics, 2017, Page 0
