The technique presented in Section 3 was used to select two potential upsets per year for the years 2003 through 2015. The games selected as upsets are listed in Table 1, the number of upsets that occurred each year is shown in Table 2, and Table 3 lists the selection frequency and accuracy for each seed separately.

In total, the presented technique selected 10 upsets correctly out of 26 picks (38.5%) over 13 years.

Table 2 Number of upsets per year.

Table 3 Selection accuracy by seed.

Analysis of the results leads to several observations about the tendencies of the technique. We selected two upsets correctly one time, one upset correctly eight times, and zero upsets correctly four times. However, of the four years in which we got zero correct, two had no upsets actually occur. Therefore, we selected at least one upset correctly in nine of the 11 years in which at least one upset occurred. Moreover, because we choose exactly two potential upsets per year, a given year can contribute at most min(2, number of upsets that year) correct selections; summing this quantity over the historical record shows that the maximum number of upsets we could have chosen correctly is 18, of which we selected 10. The games chosen also tend to favor the stronger of the candidate seeds: we picked a 13-seed fourteen times, a 14-seed eight times, and a 15-seed four times. Selecting 13-seeds with higher frequency is reasonable because a 13-seed is more likely to win its game than a 15-seed.

Another interesting observation is that the tolerance value determined by Algorithm 1 remained constant for all years from 2009 onward. A constant tolerance could be evidence of some level of stability: the fact that it stayed constant for seven consecutive years suggests that it is likely to be the correct value to use in future years. However, given the limited number of years of data available, the value for future years should still be determined using the algorithm until this theory can be tested further.

To assess the variability of our technique, let *X*_{i} denote the number of correctly predicted upsets in the *i*th year of analysis (i.e., year 1 is 2003, year 2 is 2004, and so on). Then we have

$$\overline{X}=\frac{1}{13}\sum _{i=1}^{13}{X}_{i}=\frac{10}{13}\approx 0.769$$(5)

and

$${S}^{2}=\frac{1}{13-1}\left(\sum _{i=1}^{13}{X}_{i}^{2}-13\cdot {\overline{X}}^{2}\right)\approx 0.359,$$(6)

so the sample standard deviation of the number of correct selections per year is approximately 0.599. Assuming that the *X*_{i} are independent, the variance of the total number of correct selections across all 13 years is 13*S*^{2} ≈ 4.667, with a standard deviation of approximately 2.16.
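These summary statistics can be reproduced directly from the year-by-year record stated above (one year with two correct picks, eight years with one, four years with none):

```python
import math

# Correct selections per year, 2003-2015: one year with 2, eight with 1, four with 0
X = [2] + [1] * 8 + [0] * 4
n = len(X)                                               # 13 years

mean = sum(X) / n                                        # X-bar = 10/13 ~ 0.769
S2 = (sum(x * x for x in X) - n * mean ** 2) / (n - 1)   # sample variance ~ 0.359
S = math.sqrt(S2)                                        # sample std. dev. ~ 0.599

# Under the independence assumption, the 13-year total has variance 13 * S2
total_var = n * S2                                       # ~ 4.667
total_sd = math.sqrt(total_var)                          # ~ 2.16
```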

To compare the performance of our technique with predictions made by randomly choosing games, we determined the expected number of correct selections when two teams are randomly selected as predicted upsets. We can either select two teams each year with equal probability or, because we know the historical frequency with which each seed wins a round-of-64 game, select two possible upsets each year using the historical frequency of an upset for each seed as weights. The weights used for weighted random selection in a given year were calculated from upsets that occurred prior to that year; for example, when randomly selecting teams as predictions for 2010, the frequency of upsets from 1985 to 2009 was used. Each selected game was then determined to be an upset by modeling it as a Bernoulli variable whose success probability is the fraction of historical games of that seed match-up that resulted in upsets. The following proposition gives the expected number and variance of correctly predicted upsets under (1) random selection in which each team has the same probability of being selected and (2) random selection in which the probability of each team being selected is weighted by the historical frequency of that team's seed producing an upset.
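For concreteness, the following sketch estimates the single-year weighted baseline by simulation and checks it against the per-team inclusion probabilities. The counts `x` and games-per-seed `G` are hypothetical placeholders, not the actual 1985-onward figures used in the paper:

```python
import random

# HYPOTHETICAL pre-year upset counts for seeds 13, 14, 15 and games per seed
x = {13: 21, 14: 13, 15: 4}
G = 72                                    # round-of-64 games per seed before target year
n = sum(x.values())                       # total prior upsets

seeds = [s for s in (13, 14, 15) for _ in range(4)]   # 12 teams, four per seed
w = [x[s] / (4 * n) for s in seeds]       # normalized per-team weights (sum to 1)
q = {s: x[s] / G for s in x}              # P(a seed-s game is an actual upset)

def p_selected(t):
    # P(team t among the two picks): chosen first, or chosen second after some u
    return w[t] + sum(w[u] * w[t] / (1 - w[u]) for u in range(12) if u != t)

# Analytic expected number of correct picks in one year
analytic = sum(p_selected(t) * q[seeds[t]] for t in range(12))

# Monte Carlo: draw two distinct teams with the same weights, then flip a
# Bernoulli coin per selected game
rng = random.Random(0)
trials, correct = 100_000, 0
for _ in range(trials):
    i = rng.choices(range(12), weights=w)[0]
    rest = [j for j in range(12) if j != i]
    j = rng.choices(rest, weights=[w[k] for k in rest])[0]
    for t in (i, j):
        if rng.random() < q[seeds[t]]:
            correct += 1
estimate = correct / trials
```

The simulated mean agrees with the analytic value, which is the single-year building block behind the proposition below; note also that the per-team inclusion probabilities sum to exactly 2, the number of picks.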

In order to compute the expected value and variance, let the following terms be defined:

*U*_{y}: Random variable representing the number of upsets selected correctly in year *y*

*x*_{sy}: Number of games that seed *s* has won before year *y*

*G*_{y}: Number of round-of-64 games played by each seed before year *y* (this count is the same for every seed, because each of the 13-, 14-, and 15-seeds plays four round-of-64 games per year)

*n*_{y}: Total number of upsets that occurred prior to year *y*.

These quantities depend on the year *y* because the weighted probabilities used in a given year are computed from the frequency of upsets before that year; they therefore change as new upsets occur in each tournament.

#### Proposition 1

*When two of the 13-, 14-, or 15-seeded match-ups are chosen randomly as upsets for each year between y*_{1} *and y*_{2}*, where the probability of choosing each team is proportional to the historical frequency with which upsets occur for that seed, the expected number of upsets selected correctly is*

$$E\left[{U}_{[{y}_{1},{y}_{2}]}\right]=\sum _{y\in [{y}_{1},{y}_{2}]}\sum _{s\in \{13,14,15\}}4\left(\frac{{x}_{sy}}{4{n}_{y}}+\frac{3{x}_{sy}}{4{n}_{y}}\frac{{x}_{sy}}{4{n}_{y}-{x}_{sy}}+\sum _{i\in \{13,14,15\}:i\ne s}\frac{4{x}_{iy}}{4{n}_{y}}\frac{{x}_{sy}}{4{n}_{y}-{x}_{iy}}\right)\frac{{x}_{sy}}{{G}_{y}},$$(7)

*and, with* E[*U*_{y}] *denoting the year-*y *term of the sum in (7), the variance is*

$$\mathrm{Var}\left[{U}_{[{y}_{1},{y}_{2}]}\right]=\sum _{y\in [{y}_{1},{y}_{2}]}\left(E\left[{U}_{y}\right]+8\left(\sum _{{s}_{i}={s}_{j}}\frac{{x}_{{s}_{i}y}}{4{n}_{y}}\frac{3{x}_{{s}_{i}y}}{4{n}_{y}-{x}_{{s}_{i}y}}{\left(\frac{{x}_{{s}_{i}y}}{{G}_{y}}\right)}^{2}+\sum _{{s}_{i}\ne {s}_{j}}\frac{{x}_{{s}_{i}y}}{4{n}_{y}}\frac{4{x}_{{s}_{j}y}}{4{n}_{y}-{x}_{{s}_{i}y}}\frac{{x}_{{s}_{i}y}{x}_{{s}_{j}y}}{{G}_{y}^{2}}\right)-E{\left[{U}_{y}\right]}^{2}\right),$$(8)

*where the pairs* (*s*_{i}, *s*_{j}) *range over ordered pairs of seeds drawn from* {13, 14, 15}.

#### Proof

The following equations compute the expected value and variance under weighted random selection; the modification for uniform random selection is given at the end of the proof.

The probability of randomly selecting a specific team with seed *s* is

$$P\left(\text{selecting team with seed } s\right)=P\left(\text{team chosen first}\right)+P\left(\text{team not chosen first and chosen second}\right).$$(9)

Because there are four teams with each seed, the per-team probability is multiplied by four; similarly, when the first selection is a team of another seed *i*, any of the four *i*-seeded teams may have been the first pick. The probability also depends on the year *y*. Therefore,

$$P\left(\text{selecting team with seed } s \text{ in year } y\right)=4\left(\frac{{x}_{sy}}{4{n}_{y}}+\frac{3{x}_{sy}}{4{n}_{y}}\frac{{x}_{sy}}{4{n}_{y}-{x}_{sy}}+\sum _{i\in \{13,14,15\}:i\ne s}\frac{4{x}_{iy}}{4{n}_{y}}\frac{{x}_{sy}}{4{n}_{y}-{x}_{iy}}\right).$$(10)

Strictly, because both selections could be seed-*s* teams, this quantity is the expected number of seed-*s* teams selected rather than a probability; by linearity of expectation, it is exactly what is needed in (12).

The probability of the selected team being an actual upset is

$$P\left(\text{seed } s \text{ correct in year } y\right)=\frac{{x}_{sy}}{{G}_{y}}.$$(11)

Therefore, for each year, the expected number of correctly predicted upsets is

$$E\left[{U}_{y}\right]=\sum _{s\in \{13,14,15\}}P\left(\text{selecting team with seed } s \text{ in year } y\right)\cdot P\left(\text{seed } s \text{ correct in year } y\right).$$(12)

Because the results of each year are independent (upsets in one year do not depend on upsets in previous years), we can sum the expected number of upsets over a range of years. When *x*_{sy}, *n*_{y}, and *G*_{y} are determined from historical data prior to each year, the expected number of upsets from *y*_{1} to *y*_{2} (inclusive) is

$$E\left[{U}_{[{y}_{1},{y}_{2}]}\right]=\sum _{y\in [{y}_{1},{y}_{2}]}E\left[{U}_{y}\right].$$(13)

To compute the variance, we can rewrite the expected number of upsets as

$$E\left[{U}_{y}\right]=P({U}_{y}=1)+2P({U}_{y}=2).$$(14)

The variance also requires the second moment *E*[*U*_{y}^{2}]:

$$E\left[{U}_{y}^{2}\right]={1}^{2}\cdot P({U}_{y}=1)+{2}^{2}\cdot P({U}_{y}=2)=E\left[{U}_{y}\right]+2P({U}_{y}=2).$$(15)

To find *P*(*U*_{y} = 2), we let (*s*_{i}, *s*_{j}) range over ordered pairs of seeds, with *i* and *j* each drawn from {13, 14, 15} with replacement (so, e.g., both (13, 14) and (14, 13) appear). Then

$$P({U}_{y}=2)=4\left(\sum _{{s}_{i}={s}_{j}}\frac{{x}_{{s}_{i}y}}{4{n}_{y}}\frac{3{x}_{{s}_{i}y}}{4{n}_{y}-{x}_{{s}_{i}y}}{\left(\frac{{x}_{{s}_{i}y}}{{G}_{y}}\right)}^{2}+\sum _{{s}_{i}\ne {s}_{j}}\frac{{x}_{{s}_{i}y}}{4{n}_{y}}\frac{4{x}_{{s}_{j}y}}{4{n}_{y}-{x}_{{s}_{i}y}}\frac{{x}_{{s}_{i}y}{x}_{{s}_{j}y}}{{G}_{y}^{2}}\right).$$(16)

Since Var[*U*_{y}] = *E*[*U*_{y}^{2}] − (*E*[*U*_{y}])^{2}, combining (14)–(16) expresses the variance as

$$\mathrm{Var}\left[{U}_{y}\right]=E\left[{U}_{y}\right]+8\left(\sum _{{s}_{i}={s}_{j}}\frac{{x}_{{s}_{i}y}}{4{n}_{y}}\frac{3{x}_{{s}_{i}y}}{4{n}_{y}-{x}_{{s}_{i}y}}{\left(\frac{{x}_{{s}_{i}y}}{{G}_{y}}\right)}^{2}+\sum _{{s}_{i}\ne {s}_{j}}\frac{{x}_{{s}_{i}y}}{4{n}_{y}}\frac{4{x}_{{s}_{j}y}}{4{n}_{y}-{x}_{{s}_{i}y}}\frac{{x}_{{s}_{i}y}{x}_{{s}_{j}y}}{{G}_{y}^{2}}\right)-E{\left[{U}_{y}\right]}^{2}.$$(17)

Because the result of each year is independent, we can say that

$$Var\left[{U}_{[{y}_{1},{y}_{2}]}\right]=\sum _{y\in [{y}_{1},{y}_{2}]}Var\left[{U}_{y}\right].$$(18)

The above equations give the expected number and variance of correct selections under weighted random selection. To compute the same quantities when every team is equally likely to be chosen, modify (10) to be

$$P\left(\text{select team with seed } s \text{ in year } y\right)=4\left(\frac{1}{12}+\frac{11}{12}\cdot \frac{1}{11}\right)=\frac{8}{12}=\frac{2}{3},$$(19)

and (16) to be

$$P({U}_{y}=2)=\sum _{{s}_{i}={s}_{j}}\frac{4}{12}\cdot \frac{3}{11}{\left(\frac{{x}_{{s}_{i}y}}{{G}_{y}}\right)}^{2}+\sum _{{s}_{i}\ne {s}_{j}}\frac{4}{12}\cdot \frac{4}{11}\cdot \frac{{x}_{{s}_{i}y}{x}_{{s}_{j}y}}{{G}_{y}^{2}},$$(20)

with the other equations modified accordingly, because each team is now equally likely to be chosen (each seed is drawn with probability 1/3 on a single draw) rather than weighted by the historical frequency of upsets for its seed. □

Applying the proposition to the available historical data, the expected number of upsets chosen correctly when two upsets are selected per year between 2003 and 2015 is 3.26 with a variance of 2.93 under uniform random selection; under weighted random selection, these values change to 4.42 and 3.66, respectively. The year-by-year expected value and variance for weighted random selection can be found in Table 4.

Table 4 Number of randomly selected upsets correctly chosen.

Our technique selected 10 upsets correctly over the 13-year period between 2003 and 2015, which is 2.92 standard deviations above the expected number of correct predictions under weighted random selection, and 3.94 standard deviations above the expected value of 3.26 under uniform random selection. This comparison is key to establishing the performance of our technique: it performs substantially better than either form of random selection, so using it to identify potential upsets is more reliable than choosing match-ups at random. However, the variance of our technique, 4.667, is considerably higher than that of random selection.
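The standard-deviation comparisons follow directly from the reported means and variances:

```python
import math

def z_score(observed, mean, variance):
    """Number of standard deviations by which `observed` exceeds `mean`."""
    return (observed - mean) / math.sqrt(variance)

# 10 correct picks vs. the two random baselines (2003-2015 totals)
z_weighted = z_score(10, 4.42, 3.66)   # ~ 2.92
z_uniform = z_score(10, 3.26, 2.93)    # ~ 3.94
```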

We also applied our technique to the 11- and 12-seeded games instead of the 13-, 14-, and 15-seeded games, modifying Algorithm 1 to select three teams instead of two. Table 5 lists the games selected. When run on the 11 and 12 seeds, there were cases with ties: the last step of our technique found multiple games selected with the same frequency. To resolve this, we count the number of correct selections by weighting each correct selection by the frequency with which it would be chosen if ties were broken by random selection. For example, if two teams were tied for third-most-selected and one of them was a correct upset selection, that tie would contribute a score of 0.5. In the event of a three-way tie for second with one correct selection, the resulting score would be 1/3. However, the accuracy of the model on the 11- and 12-seeded games was no better than that expected from weighted random selection: weighted random selection would be expected to make 12.71 correct selections with a variance of 8.56, while our technique scored 10.67. One hypothesis is that the covariates of past upsets carry enough information to predict future upsets of 13- to 15-seeded teams better than random selection, while the 11- and 12-seeded games do not contain enough distinguishing information in the statistics available to us. Because the seed gap in the 11- and 12-seeded games is smaller, the inherent randomness of these games may overwhelm whatever signal the covariates provide about what causes an upset for those seeds.
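One plausible implementation of this fractional tie credit (the function and names are illustrative, not the paper's code) treats a tie group that straddles the selection cutoff as contributing each of its correct teams with probability slots-remaining / group-size:

```python
def expected_correct(ranked_groups, actual_upsets, picks):
    """Expected number of correct selections when ties are broken at random.

    ranked_groups: team ids grouped by selection frequency, most frequent first;
                   teams within a group are tied.
    actual_upsets: set of team ids whose games really were upsets.
    picks: total number of teams to select.
    """
    expected = 0.0
    remaining = picks
    for group in ranked_groups:
        hits = sum(1 for team in group if team in actual_upsets)
        if remaining >= len(group):              # whole tie group is selected
            expected += hits
            remaining -= len(group)
        else:                                    # tie straddles the cutoff
            expected += hits * remaining / len(group)
            break
        if remaining == 0:
            break
    return expected

# Two teams tied for the third (final) slot, one of them a real upset -> 0.5
score = expected_correct([["a"], ["b"], ["c", "d"]], {"c"}, picks=3)
```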

As another way to assess performance, we estimated the drop-off of our technique by selecting four teams each year instead of two, providing a "next-best" scenario. Selecting four teams per year resulted in 13 of 52 correct selections (compared with 10 of 26 when selecting two). This is a significant drop in accuracy: the overall success rate falls from 38.5% to 25%, and the 26 additional picks yielded only three more correct selections, a marginal success rate of roughly 11.5%.

To determine the computational cost of our technique, each step was run 10 times on a computer with an Intel Xeon E3-1246 quad-core processor at 3.5 GHz and 16 GB of memory. The runtimes for each step are listed in Table 6.

Table 6 Runtimes for each step of technique.

As noted above, we made a compromise in the training procedure by training the extra-trees classifier on data from all years rather than only the years prior to the target year. This could cause data leakage, but was done because of the limited amount of available data. To address this, we also performed experiments in which the classifier was trained on a subset of the available data. First, we trained the classifier on all games except those from the target year; for example, when selecting games for 2013, the classifier was trained on games from 1998–2012 and 2014–2015. With this approach we selected 8 of 26 games correctly (compared to 10 when using all games, as described above). We also tried training only on games before the target year, which for a target year of 2013 means training on games from 1998 to 2012. However, because of the small number of upsets, we tested this method only for 2011–2015, as years prior to 2011 had very few upsets in the training set. In those five years, we selected 2 of 10 games correctly, compared to 4 using the full dataset for training.
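The two alternative training regimes can be expressed as index-generating splits; this is a sketch with our own function names, not the paper's code, and the index lists can feed any classifier (the paper uses an extra-trees classifier):

```python
def leave_year_out_splits(years):
    """For each target year, train on every other year's games
    (e.g. target 2013 -> train on 1998-2012 and 2014-2015)."""
    for target in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != target]
        test = [i for i, y in enumerate(years) if y == target]
        yield target, train, test

def prior_years_splits(years, first_target):
    """Strictly causal variant: train only on games played before the target
    year, starting from `first_target` so the training set is never too small."""
    for target in (y for y in sorted(set(years)) if y >= first_target):
        train = [i for i, y in enumerate(years) if y < target]
        test = [i for i, y in enumerate(years) if y == target]
        yield target, train, test
```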

To further evaluate the efficacy of our technique, we compare it to other methods in the literature. We first compared our results to the technique of Lopez and Matthews (2015), which predicts the probability of each team beating each other team. To make a fair comparison, we used their model to generate the probability of each low seed winning its first-round game and selected the two games in which the lower-seeded team had the highest probability of winning. The model was trained separately for each year using all games that occurred prior to that year's tournament. Note that their model takes home-team and away-team statistics as inputs to its logistic regression; for a neutral-site game, we randomly assigned one team as home and one as away, in addition to marking the game as neutral using the model's neutral indicator. Because of this randomization, we ran their model 100 times and counted the number of correctly selected upsets in each run. Their model correctly selects an average of 7.46 upsets out of 26, with a maximum of nine and a minimum of six; Figure 2 shows a histogram of the number of upsets correctly predicted by the Lopez and Matthews (2015) technique. Because our technique selected 10 of 26 upsets correctly, it performed better over the given time period. However, their approach has a variance of 0.669 in the number of correctly predicted upsets, which is considerably lower than ours.

Bryan, Steinke, and Wilkins (2006) analyzed round-of-64 upset prediction using a regression model, defining an *upset* as a game in which the lower-seeded team wins and a *nonupset* as a game in which the higher-seeded team wins. Between 1994 and 2005, 41.8% of the games they selected as upsets were actual upsets and 80.99% of the games they selected as nonupsets were actual nonupsets; between 2000 and 2005, those figures were 36.36% and 80.26%, respectively. However, they declared their model successful at predicting an upset if "it predicts a probability of upset greater than the historic proportion of games at the given seed difference that resulted in an upset". They also counted a 10-, 11-, 12-, or 13-seed winning as an upset, whereas we consider only a 13-, 14-, or 15-seed winning. Because we choose games specifically as upsets, rather than games where the weaker team is more likely to win than the historical average, and because our definition of an upset differs, the results are not directly comparable.
