
# Journal of Quantitative Analysis in Sports

### An official journal of the American Statistical Association

Editor-in-Chief: Steve Rigdon, PhD

CiteScore 2018: 1.67

SCImago Journal Rank (SJR) 2018: 0.587
Source Normalized Impact per Paper (SNIP) 2018: 1.970

Online ISSN: 1559-0410
Volume 15, Issue 1

# The advantage of lefties in one-on-one sports

Francois Fagan, Department of Industrial Engineering and Operations Research, Columbia University, 500 W. 120th Street, Mudd 315, New York, NY, USA

Martin Haugh (corresponding author), Department of Industrial Engineering and Operations Research, Columbia University, 500 W. 120th Street, Mudd 315, New York, NY, USA

Hal Cooper
Published Online: 2018-09-15 | DOI: https://doi.org/10.1515/jqas-2017-0076

## Abstract

Left-handers comprise approximately 15% of professional tennis players, but only 11% of the general population. In boxing, baseball, fencing, table-tennis and specialist batting positions in cricket the contrast is even starker, with 30% or more of top players often being left-handed. In this paper we propose a model for identifying the advantage of being left-handed in one-on-one interactive sports (as well as the inherent skill of each player). We construct a Bayesian latent ability model in the spirit of the classic Glicko model but with the additional complication of having a latent factor, i.e. the advantage of left-handedness, that we need to estimate. Inference is further complicated by the truncated nature of data-sets that arise from only having data on the top players. We show how to infer the advantage of left-handedness when only the proportion of top left-handed players is available. We use this result to develop a simple dynamic model for inferring how the advantage of left-handedness varies through time. We also extend the model to cases where we have ranking or match-play data. We test these models on 2014 match-play data from top male professional tennis players, and the dynamic model on data from 1985 to 2016.

Keywords: Bayesian; latent ability models; left-handedness

## 1 Introduction and related literature

In this paper we investigate the extent to which being left-handed impacts elite performance and rankings in one-on-one interactive sports such as tennis, fencing, badminton etc. Our goal is to provide a coherent framework for measuring the benefit of being left-handed in these sports and tracking how this benefit evolves over time. We also aim to provide a framework for considering such questions as “who are the most talented players?” Of course for this latter question to be reasonable it must be the case that the lefty advantage (to the extent that it exists) can be decoupled from the notion of “talent.” Indeed it’s not at all clear that such a decoupling exists.

## 1.1 Causes and extent of the lefty advantage

In fact much of the early research into the performance of left-handers in sports relied on the so-called “innate superiority hypothesis” (ISH), where left-handers were said to have an edge in sporting competitions due to inherent neurological advantages associated with being left-handed (Geschwind and Galaburda 1985; Nass and Gazzaniga 1987). The presence of larger right hemispheric brain regions associated with visual and spatial functions, a lack of lateralization, and a larger corpus callosum (Witelson 1985) (the brain structure involved in communication across hemispheres) were all suggested as neurological mechanisms for this edge. Applications of the ISH to sport occurred primarily in fencing (Bisiacchi et al. 1985; Taddei, Viggiano, and Mecacci 1991; Akpinar et al. 2015), where left-handers appeared to have advantages in attentional tasks (in terms of response to visual stimuli), though there were also proponents for this view in other sports such as tennis (Holtzen 2000).

The idea that an innate advantage was responsible for the significant over-representation of left-handers in professional sports gradually lost momentum following the works of (Wood and Aggleton 1989; Aggleton and Wood 1990; Grouios et al. 2000a). These papers analyzed interactive sports such as tennis and non-interactive sports such as darts and pool. They found there was a surplus of left-handers in the interactive sports, but not generally in the non-interactive sports. (One exception is golf where Loffing and Hagemann (2016, Box 12.1) noted that the proportion of top left-handed¹ golfers is higher than in the general population.) It was reasoned that any innate superiority should also bring left-handers into prominence in non-interactive sports and so alternative explanations were sought. The primary argument of (Wood and Aggleton 1989; Aggleton and Wood 1990) was that the prominence of left-handers in a given sport was due to the strategic advantages of being left-handed in that sport.

Indeed the prevailing² explanation today for the over-representation of left-handers in professional interactive sports is the negative frequency-dependent selection (NFDS) effect. This effect is also assumed to underlie the so-called “fighting hypothesis” (Raymond et al. 1996) which explains why there is long-lasting handedness polymorphism in humans despite the fitness costs that appear to be associated with left-handedness. The NFDS effect is best summarized as stating that right-handed players have less familiarity competing against left-handed players (because of the much smaller percentage of lefties in the population) and therefore perform relatively poorly against them as a result. Key evidence supporting this hypothesis was the demonstration of mechanisms for how NFDS effects might arise (Daems and Verfaillie 1999; Stone 1999; Grossman et al. 2000; Grouios et al. 2000b). The difficulty of playing elite left-handed players in one-on-one interactive sports has long been recognized. For example, Breznik (2013) quotes Monica Seleš, the former women’s world number one tennis player:

“It’s strange to play a lefty (most players are right-handed) because everything is opposite and it takes a while to get used to the switch. By the time I feel comfortable, the match is usually over.”

A more general overview and discussion of NFDS effects can be found in the recent book chapter by Loffing and Hagemann (2016), who also provide extensive statistics regarding the percentage of top lefties across various sports. It is also perhaps worth mentioning that the debate between the ISH and the NFDS mechanism is not quite settled and some research, e.g. (Gursoy 2009) in boxing and (Breznik 2013) in tennis, still argues that the ISH has a role to play.

Recent analyses of combat sports (such as judo (Sterkowicz, Lech, and Blecharz 2010), mixed martial arts (Dochtermann, Gienger, and Zappettini 2014), and boxing (Loffing and Hagemann 2015)) also support the existence of NFDS effects on performance, although they suggest that alternative explanations must still be considered and that the resulting advantage is small. This agrees with (Loffing, Hagemann, and Strauss 2012a) which suggests that although left-handedness provides an advantage, modern professionalism and training are acting to counter the advantage. Deliberate training was shown in (Schorer et al. 2012) to improve the performance of handball goalies against players of specific handedness while (Ullén, Hambrick, and Mosing 2016) explores the issue of deliberate training vs innate talent in depth. A recent article (Liew 2015) in the Telegraph newspaper in the UK, for example, noted how seven of the first seventeen Wimbledon champions in the open era were left-handed men while there were only two left-handers among the top 32 seeds in the 2015 tournament. Some of this variation is undoubtedly noise (see Section 6) but there do appear to be trends in the value of left-handedness. For example, the same Telegraph article noted that a reverse effect might be taking place in women’s tennis. Specifically, the article noted that 2015 was the first time in the history of the WTA tour that there were four left-handed women among the top 10 in tennis. In most sports the lefty advantage appears to be weaker in women than in men (Loffing and Hagemann 2016). The issue of gender effects of handedness in professional tennis is discussed in (Breznik 2013) where it is shown through descriptive statistics and a PageRank-style analysis that women do indeed have a smaller lefty advantage than men, although it’s worth noting their data only extends to 2011.
It is also suggested in (Breznik 2013) that the lefty advantage in tennis is weaker in Grand Slams than on the ATP and Challenger tours. They conjecture that possible explanations for this are that the very best players are more able to adjust to playing lefties and they may also be in a better position to tailor their training in anticipation of playing lefties.

Many other researchers have studied the extent of the lefty advantage and how it might arise. For example, (Goldstein and Young 1996) determines a game theoretic evolutionary stable strategy from payoff matrices of summarized performance, whereas (Billiard, Faurie, and Raymond 2005) explicitly uses frequency dependent interactions between left- and right-handed competitors. The work of Abrams and Panaggio (2012) has some similarity to ours as they also model professionals as being the top performers from a general population skill distribution. They use differential equations to define an equilibrium of transitions between left- and right-handed populations. These papers therefore rely on the NFDS mechanism to generate the lefty advantage. We note that equilibrium-style models suggest the strength of the lefty advantage might be inversely proportional to the proportion of top lefties. Such behavior is not a feature of our modeling framework, but nor is it inconsistent with it, as we do not model the NFDS mechanism (and resulting equilibrium). Instead our main goal is to measure the size of the lefty advantage rather than to build a model that leads to this advantage.

Several researchers have considered how the lefty advantage has evolved with time. In addition to their aforementioned contributions, (Breznik 2013) also plots the mean rank of top lefties and righties over time in tennis and obtains broadly similar results to those obtained by our Kalman filtering approach. Other researchers have also analyzed the proportion of top lefties in tennis over time. For example, (Loffing, Hagemann, and Strauss 2012b) fit linear and quadratic functions to the data and then extrapolate to draw conclusions on future trends. Their quadratic fit for the proportion of lefties in men’s tennis uses data from 1970 to 2010 and predicts a downwards trend from 1990 onwards. This is contradicted by our data from 2010 to 2015 in Section 6 which suggests that the number of top lefties may have been increasing in recent years. They also perform a separate analysis for amateur players, showing that the lefty advantage increases as the quality of players improves. It is also worth noting that Ghirlanda, Frasnelli, and Vallortigara (2009) introduce a model suggesting the possibility of the lefty advantage remaining stable over time.

## 1.2 Latent ability and competition models

Our work in this paper builds on the extensive³ latent ability and competition models literature. The original two competition models are the Bradley-Terry-Luce (BTL) model (Bradley and Terry 1952; Luce 1959) and the Thurstone-Mosteller (TM) model (Thurstone 1927; Mosteller 1951). BTL assumes each player i has skill $S_i$, so that the probability of player i beating player j is a logistic function of the difference in skills. Specifically BTL assumes

$$p(i \triangleright j \mid S_i, S_j) = \frac{1}{1 + e^{-(S_i - S_j)}} \tag{1}$$

where $i \triangleright j$ denotes the event of i beating j. TM is defined similarly, but with the probability of i beating j being a probit function of the difference in their skills. Given match-play data, the skill of each player may be inferred using maximum likelihood estimation (MLE) where the probability of the match-play results is assumed to satisfy

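$$p(M \mid S_1, \dots, S_N) = \prod_{i<j} \binom{T_{ij}}{M_{ij}}\, p(i \triangleright j \mid S_i, S_j)^{M_{ij}} \left(1 - p(i \triangleright j \mid S_i, S_j)\right)^{M_{ji}} \tag{2}$$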

where $M_{ij}$ is the number of matches where player i beats player j, and $T_{ij} := M_{ij} + M_{ji}$ is the total number of matches between players i and j. The inferred skills can then be used to predict the outcome of future matches.
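To make (1) and (2) concrete, the following Python sketch fits BTL skills by maximizing the log of (2) with plain gradient ascent. It is only an illustration under stated assumptions, not the estimator used in any of the cited works: the helper names and the toy win matrix are hypothetical, the binomial coefficients are dropped because they do not depend on the skills, and the skills are pinned down by starting at zero (they are only identified up to an additive constant). The gradient for player k is simply his observed number of wins minus his expected number of wins. Note that the MLE does not exist if some player wins, or loses, every one of his matches.

```python
import math

def btl_win_prob(s_i: float, s_j: float) -> float:
    """Equation (1): probability that a player of skill s_i beats one of skill s_j."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

def btl_log_likelihood(skills, wins):
    """Log of (2) without the binomial coefficients (they do not depend on the skills).

    wins[i][j] holds M_ij. Because 1 - p(i beats j) = p(j beats i), summing over
    ordered pairs i != j reproduces the product over unordered pairs i < j in (2).
    """
    n = len(skills)
    return sum(wins[i][j] * math.log(btl_win_prob(skills[i], skills[j]))
               for i in range(n) for j in range(n) if i != j and wins[i][j] > 0)

def btl_mle(wins, lr=0.05, iters=2000):
    """Illustrative MLE by gradient ascent: gradient_k = observed wins - expected wins.

    The gradient components always sum to zero, so starting from zero keeps the
    skills centred, which fixes the arbitrary additive constant.
    """
    n = len(wins)
    s = [0.0] * n
    for _ in range(iters):
        grad = []
        for k in range(n):
            observed = sum(wins[k][j] for j in range(n) if j != k)
            expected = sum((wins[k][j] + wins[j][k]) * btl_win_prob(s[k], s[j])
                           for j in range(n) if j != k)
            grad.append(observed - expected)
        s = [s_k + lr * g_k for s_k, g_k in zip(s, grad)]
    return s

# Hypothetical toy data: 10 matches between each pair of three players.
wins = [[0, 7, 8],
        [3, 0, 6],
        [2, 4, 0]]
skills = btl_mle(wins)
print([round(x, 3) for x in skills])
```

Plain gradient ascent is chosen for transparency; since the log-likelihood is concave in the skills, any standard optimizer would reach the same maximizer.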

There are a few notable extensions to the BTL and TM models including ELO (Elo 1978), Glicko (Glickman 1999) and TrueSkill™ (Herbrich, Minka, and Graepel 2007). ELO models the performance of each player in a match as having a Gaussian distribution centered around their respective skill. Glicko and TrueSkill™ extend the ELO model by putting a Gaussian prior on the skill of each player. These models have been widely applied to various competition settings. For example, ELO was developed as a chess ranking system, and TrueSkill™ has been used for online match making for video games on Xbox Live. These models allow one to infer the skill level of each player and thereby construct player rankings.

## 1.3 Contributions of this work

In this paper we propose a Bayesian latent ability model for identifying the advantage of being left-handed in one-on-one interactive sports but with the additional complication of having a latent factor, i.e. the advantage of left-handedness, that we need to estimate. Inference is further complicated by the truncated nature of data-sets that arise from only observing data related to the top players. The resulting pattern of data “missingness” therefore depends on the latent factor and so it is important that we model it explicitly. We show how to infer the advantage of left-handedness when only the proportion of top left-handed players is available. In this case we show that the distribution of the number of left-handed players among the top n (out of N) converges as N → ∞ to a binomial distribution with a success probability that depends on the tail-length of the innate skill distribution. Since this result would not be possible if we used short- or long-tailed skill distributions, we also argue for the use of a medium-tailed distribution such as the Laplace distribution when modeling the “innate”⁴ skills of players. We also use this result to develop a simple Kalman filtering model for inferring how the lefty advantage has varied through time in a given sport. Our Kalman filter/smoother enables us to smooth any spurious signals over time and should lead to a more robust inference regarding the dynamics of the lefty advantage.

We also consider various extensions of our model. For example, in order to estimate the innate skills of top players we consider the case when match-play data among the top n players is available. This model is a direct generalization of the Glicko model described earlier. Unlike other models, this extension learns simultaneously from (i) the over-representation of lefties among top players and (ii) match-play results between top lefties and righties. Previously these phenomena were studied separately. We observe that including match-play data in our model makes little difference to the inference of the lefty advantage and therefore helps justify our focus on the simplified model that only considers the proportion of lefties in the top n players. This extension does help us to identify the innate skills of players. However, we acknowledge that these so-called innate skills may only be of interest to the extent that the NFDS mechanism is responsible for the lefty advantage. (To the extent that the innate superiority hypothesis holds, it’s hard to disentangle the notion of innate skill or talent from the lefty advantage and using the phrase “innate skills” would be quite misleading in this case.)

The remainder of this paper is organized as follows. In Section 2 we describe our skill and handedness model and also develop our main theoretical results here. In Section 3 we introduce match-play results among top players into the model while in Section 4 we consider a variation where we only know the handedness and external rankings of the top players. We present numerical results in Section 5 using data from men’s professional tennis in 2014. In Section 6 we propose a simple Kalman filtering model for inferring how the lefty advantage in a given sport varies through time and we conclude in Section 7 where possible directions for future research are also outlined. Various proofs and other technical details are deferred to the appendix.

## 2 The latent skill and handedness model

We assume there is a universe of N players and for $i = 1, \dots, N$ we model the skill, $S_i$, of the $i$th player as the sum of his innate skill $G_i$ and the lefty advantage L if he is in fact left-handed. That is, we assume

$$S_i = G_i + H_i L$$

where $H_i$ is the handedness indicator with $H_i = 1$ if the player is left-handed and $H_i = 0$ otherwise. The generative framework of our model is:

• Left-handed advantage: $L \sim \mathcal{N}(0, \sigma_L^2)$ where $\sigma_L$ is assumed to be large and $\mathcal{N}$ denotes the normal distribution.

• For players $i = 1, 2, \dots, N$:

• Handedness: $H_i \sim \text{Bernoulli}(q)$ where q is the proportion of left-handers in the overall population.

• Innate skill: $G_i \sim G$ for some given distribution G.

• Skill: $S_i = G_i + H_i L$.

The joint probability distribution corresponding to the generative model then satisfies

$$p(S, L, H) = p(L) \prod_{i=1}^{N} p(H_i)\, p(S_i \mid H_i, L) \tag{3}$$

where we note again that N is the number⁵ of players in our population universe. We assume we know (from public results of professional competitions etc.) the identity of the top $n$ players as well as the handedness, i.e. left or right, of each of them. Without loss of generality, we let these top n players have indices in $\{1, \dots, n\}$ and define the corresponding event

$$\text{Top}_{n,N} := \left\{ \min_{i=1,\dots,n} \{S_i\} \ge \max_{i=n+1,\dots,N} \{S_i\} \right\}. \tag{4}$$

Note, however, that even when we condition on $\text{Top}_{n,N}$, the indices in $\{1, \dots, n\}$ are not ordered according to player ranking so, for example, it is possible that $S_1 < S_2$ or $S_n > S_1$.
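A single draw from this generative model can be sketched in a few lines of Python. The sketch below is illustrative only: the function names, $\sigma_L = 1$ and the population sizes are our assumptions, and the Laplace innate-skill sampler anticipates the choice argued for in Section 2.1 (a difference of two independent exponentials with mean b is Laplace distributed with scale b). The top-n selection returns exactly the set of players singled out by the event $\text{Top}_{n,N}$; as noted above, the order within that set carries no meaning.

```python
import random

def laplace_sampler(b):
    """Innate-skill sampler for Laplace(0, b), via a difference of two exponentials with mean b."""
    return lambda rng: rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

def sample_population(N, q, sigma_L, innate_sampler, rng):
    """One draw from the generative model: L, then H_i and S_i = G_i + H_i L for each player."""
    L = rng.gauss(0.0, sigma_L)                              # left-handed advantage
    H = [1 if rng.random() < q else 0 for _ in range(N)]     # handedness indicators
    S = [innate_sampler(rng) + h * L for h in H]             # skills
    return L, H, S

def top_n_indices(S, n):
    """Indices of the n most skilled players, i.e. the set identified by Top_{n,N}."""
    return sorted(range(len(S)), key=S.__getitem__, reverse=True)[:n]

rng = random.Random(1)
L, H, S = sample_population(N=10_000, q=0.11, sigma_L=1.0,
                            innate_sampler=laplace_sampler(0.5), rng=rng)
top = top_n_indices(S, n=100)
n_l = sum(H[i] for i in top)
print(f"L = {L:.3f}, left-handers among the top 100: {n_l}")
```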

## 2.1 Medium-tailed priors for the innate skill distribution

Thus far we have not specified the distribution G from which the innate skill levels are drawn in the generative model. Here we provide support for the use of medium-tailed distributions such as the Laplace distribution for modeling these skill levels. We do this by investigating the probability of top players being left-handed as the population size N becomes infinitely large. Consider then

$$\lim_{N \to \infty} p(n_l \mid L; n, N) \tag{5}$$

where $n_l$ denotes the number of left-handers among the top n players. For the skill distribution to be plausible, the probability that top players are left-handed should be increasing in L and be consistent with what we observe in practice for a given sport. Letting $\text{Binomial}(x; n, p)$ denote the probability of x successes in a $\text{Binomial}(n, p)$ distribution, we have the following result.

#### Proposition 1

Assume that the density g of the innate skill distribution G has support $\mathbb{R}$. Then

$$\lim_{N \to \infty} p(n_l \mid L; n, N) = \text{Binomial}\left(n_l; n, \frac{q}{q + (1-q)\,c(L)}\right) \tag{6}$$

where

$$c(L) := \lim_{s \to \infty} \frac{g(s)}{g(s - L)}$$

if the limit $c(L)$ exists.

#### Proof

See Appendix A.1. □

The function $c(L)$ characterizes the tail length of the skill distribution. If $c(L) = 0$ for all $L > 0$, as is the case for example with the normal distribution, then g is said to be short-tailed. In contrast, long-tailed distributions such as the t distribution have $c(L) = 1$ for all $L \in \mathbb{R}$. If a distribution is neither short- nor long-tailed, we say it is medium-tailed.

If we use a short-tailed innate skill distribution and L > 0, then $c(L) = 0$ and $\lim_{N \to \infty} p(n_l \mid L; n, N) = \text{Binomial}(n_l; n, 1)$. That is, if there is any advantage to being left-handed and the population is sufficiently large, then the top players will all be left-handed almost surely. This property is clearly unrealistic even for sports with a clear left-handed advantage such as fencing, since it is not uncommon to have top ranked players that are right-handed. This raises questions over the heavy use of the short-tailed normal distribution in competition models (Elo 1978; Glickman 1999; Herbrich et al. 2007) and suggests⁶ that other skill distributions may be more appropriate. As an alternative, consider a long-tailed distribution. In this case we have $c(L) = 1$ for all $L \in \mathbb{R}$ and $p(n_l \mid L; n, N) \to \text{Binomial}(n_l; n, q)$ as N → ∞. This too is undesirable, since the probability of a top player being left-handed does not depend on L in the limit and agrees with the probability of being left-handed in the general population. As a consequence, such a distribution would be unsatisfactory for modeling in those sports where we typically see left-handers over-represented among the top players.
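The three tail regimes can be checked numerically. The Python sketch below evaluates the ratio $g(s)/g(s-L)$ at one large value of s, a finite-s stand-in for the limit $c(L)$, for a standard normal density, a Laplace density with b = 1/2 and a Student t density with 3 degrees of freedom (the t and its degrees of freedom are our illustrative choice), and plugs each value into the success probability of (6). Working with log-densities is a design choice: the normal ratio then underflows cleanly to zero instead of producing 0/0. The function names are hypothetical.

```python
import math

def log_normal_pdf(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def log_laplace_pdf(x, b=0.5):
    return -math.log(2.0 * b) - abs(x) / b

def log_student_t_pdf(x, nu=3.0):
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - (nu + 1.0) / 2.0 * math.log1p(x * x / nu))

def tail_ratio(log_pdf, L, s):
    """g(s) / g(s - L) at a single large s: a finite-s stand-in for the limit c(L)."""
    return math.exp(log_pdf(s) - log_pdf(s - L))

def limiting_success_prob(q, c_L):
    """Success probability of the limiting binomial in (6)."""
    return q / (q + (1.0 - q) * c_L)

q, L, s = 0.11, 1.0, 1e4
for name, log_pdf in [("normal", log_normal_pdf), ("Laplace, b = 1/2", log_laplace_pdf),
                      ("Student t, 3 d.o.f.", log_student_t_pdf)]:
    c = tail_ratio(log_pdf, L, s)
    print(f"{name:>20}: c(L) ~ {c:.4f}, limiting P(top player is a lefty) = "
          f"{limiting_success_prob(q, c):.4f}")
```

The three printed success probabilities should reproduce the three cases discussed above: roughly 1 for the short-tailed normal, roughly q for the long-tailed t, and an intermediate value for the Laplace.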

We therefore argue that the ideal distribution for modeling the innate skill distribution is a medium-tailed distribution such as the standard Laplace distribution which has PDF

$$g(x) = \frac{1}{2b} \exp(-|x|/b) \tag{7}$$

where $b = 1/2$ is the standard scale parameter of the distribution. It is easy to see in this case that $c(L) = \exp(-L/b)$ and substituting this into (6) yields

$$\lim_{N \to \infty} p(n_l \mid L; n, N) = \text{Binomial}\left(n_l; n, \frac{q}{q + (1-q)\exp(-L/b)}\right) \tag{8}$$

which is much more plausible. For very small values of L, the probability of top players being left-handed is approximately q, which is what we would expect given the small advantage of being left-handed. For large positive values of L the probability approaches 1, and for intermediate positive values of L the probability of top players being left-handed lies in the interval $(q, 1)$. This of course is what we observe in many sports such as fencing, table-tennis, tennis etc. For this reason, we will restrict ourselves to medium-tailed distributions and specifically the Laplace⁷ distribution for modeling the innate skill levels in the remainder of this paper.
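Equation (8) can be sanity-checked by simulation. The Python sketch below holds L fixed (i.e. conditions on it), draws Laplace innate skills, adds L for left-handers and records the fraction of left-handers among the top n, averaged over a few independent populations. The function names, the seed and the values N = 50,000, n = 100, L = 1 and q = 0.11 are illustrative assumptions; with these values the top-n cut-off lies well inside the exponential tail, so the simulated fraction should already be close to the limit.

```python
import heapq
import math
import random

def laplace_limit_prob(L, q, b):
    """Success probability in (8): q / (q + (1 - q) exp(-L / b))."""
    return q / (q + (1.0 - q) * math.exp(-L / b))

def simulated_lefty_fraction(L, q, b, N, n, trials, seed=0):
    """Average fraction of left-handers among the top n of N simulated players.

    Innate skills are Laplace(0, b) (difference of two exponentials with mean b)
    and left-handers receive the additive advantage L, which is held fixed.
    """
    rng = random.Random(seed)
    lefties = 0
    for _ in range(trials):
        players = []
        for _ in range(N):
            h = 1 if rng.random() < q else 0
            skill = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) + h * L
            players.append((skill, h))
        lefties += sum(h for _, h in heapq.nlargest(n, players))
    return lefties / (trials * n)

est = simulated_lefty_fraction(L=1.0, q=0.11, b=0.5, N=50_000, n=100, trials=10)
print(f"simulated: {est:.3f}, limit (8): {laplace_limit_prob(1.0, 0.11, 0.5):.3f}")
```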

## 2.2 Large N inference using only aggregate handedness data

Following the results of Proposition 1, we assume here that we only know the number $n_l$ of the top n players who are left-handed. We shall see later that only knowing $n_l$ results in little loss of information regarding L compared to the full information case of Section 3 where we have knowledge of the handedness and all match-play results among the top n players. We shall make use of this observation in Section 6 when we build a model for inferring the dynamics of L through time series observations of $n_l$.

## 2.2.1 Posterior of L in an infinitely large population

Applying Bayes’ rule to (6) yields

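$$\begin{aligned}
\lim_{N \to \infty} p(L \mid n_l; n, N) &\propto \lim_{N \to \infty} p(L)\, p(n_l \mid L; n, N) \\
&= p(L)\, \text{Binomial}\left(n_l; n, \frac{q}{q + (1-q)\exp(-L/b)}\right) \\
&= p(L) \binom{n}{n_l} \left(\frac{q}{q + (1-q)\exp(-L/b)}\right)^{n_l} \left(1 - \frac{q}{q + (1-q)\exp(-L/b)}\right)^{n_r} \\
&\propto p(L) \left(\frac{1}{q + (1-q)\exp(-L/b)}\right)^{n_l} \left(\frac{\exp(-L/b)}{q + (1-q)\exp(-L/b)}\right)^{n_r} \\
&= p(L) \left(\frac{\exp(n_l/n \cdot L/b)}{q\exp(L/b) + 1 - q}\right)^{n}
\end{aligned} \tag{9}$$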

where $n_r := n - n_l$ is the number of top right-handed players, all factors independent of L were absorbed into the constant of proportionality, and the binomial distribution term on the second line is written out explicitly as a function of $n_l$ on the third line. We shall verify empirically in Section 5 that (9) is a good approximation of $p(L \mid n_l; n, N)$ if the population size N is large. As the number of top players n increases while keeping $n_l/n$ fixed, the $n$th power in (9) causes the distribution to become more peaked around its mode. This effect can be seen in Figure 1 where we have plotted the r.h.s. of (9) for different values of n. If n is sufficiently large then the data will begin to overwhelm the prior on L and the posterior will become dominated by the likelihood factor, i.e. the second term on the r.h.s. of (9). This likelihood term achieves its maximum at

$$L^* := b \log\left(\frac{n_l/n}{1 - n_l/n} \cdot \frac{1-q}{q}\right) \tag{10}$$

Figure 1:

The value of $\lim_{N \to \infty} p(L \mid n_l; n, N)$ as given by the r.h.s. of (9) for different values of n. We assume an $\mathcal{N}(0, 1)$ prior for $p(L)$, a value of $q = 11\%$ and we fixed $n_l/n = 25\%$. The dashed vertical line corresponds to $L^*$ from (10) and the Laplace approximations are from (11).

which we plot as the dashed vertical line in Figure 1. We can clearly see from the figure that the density becomes more peaked around $L^*$ as n increases while keeping $n_l/n$ fixed. The value⁸ of $L^*$ provides an easy-to-calculate point estimate of L for large values of n.
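The sharpening of (9) around $L^*$ can be reproduced with a short grid computation. The Python sketch below uses settings echoing Figure 1 (q = 11%, $n_l/n = 25\%$, an $\mathcal{N}(0, 1)$ prior) together with b = 1/2 as stated after (7); the grid range, resolution and function names are our assumptions. It evaluates the unnormalized log of the r.h.s. of (9) and compares its mode with the closed form (10).

```python
import math

def log_lik_aggregate(L, n_l, n, q, b):
    """Log of the likelihood factor in (9): n_l L / b - n log(q exp(L/b) + 1 - q)."""
    return n_l * L / b - n * math.log(q * math.exp(L / b) + 1.0 - q)

def log_post_aggregate(L, n_l, n, q, b, sigma_L=1.0):
    """Unnormalized log of the r.h.s. of (9) under an N(0, sigma_L^2) prior on L."""
    return -0.5 * (L / sigma_L) ** 2 + log_lik_aggregate(L, n_l, n, q, b)

def L_star(n_l, n, q, b):
    """Point estimate (10), the maximizer of the likelihood factor."""
    frac = n_l / n
    return b * math.log(frac / (1.0 - frac) * (1.0 - q) / q)

q, b = 0.11, 0.5
grid = [i / 10_000 for i in range(-20_000, 20_001)]   # L from -2 to 2 in steps of 1e-4
for n in (20, 100, 1000):
    n_l = n // 4                                       # keeps n_l / n = 25%
    mode = max(grid, key=lambda L: log_post_aggregate(L, n_l, n, q, b))
    print(f"n = {n:4d}: posterior mode {mode:.4f}, L* = {L_star(n_l, n, q, b):.4f}")
```

For small n the prior pulls the posterior mode towards zero; as n grows the mode should approach $L^*$, in line with the discussion above.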

The bell-shaped posteriors in Figure 1 suggest that we might be able to approximate the posterior of L as a Gaussian distribution. This can be achieved by first approximating the second term on the r.h.s. of (9) as a Gaussian function of L via a Laplace approximation (Barber 2012, Sec. 28.2). Specifically, we set the mean of the Laplace approximation equal to the mode, $L^*$, of this likelihood term and then set the precision to the negative of the second derivative of its logarithm evaluated at the mode. This yields:

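$$\left(\frac{\exp(n_l/n \cdot L/b)}{q\exp(L/b) + 1 - q}\right)^{n} \mathrel{\overset{\propto}{\sim}} \mathcal{N}\left(L; L^*, \frac{b^2 n}{n_r n_l}\right) \tag{11}$$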

Note that this use of the Laplace approximation is non-standard as the left side of (11) is not a distribution over L, but merely a function of L. However, if $0 < n_l < n$ then the left side of (11) is a unimodal function of L and, up to a constant of proportionality, is well approximated by a Gaussian. We can then multiply the normal approximation in (11) by the other term, $p(L)$, which is also Gaussian, to construct a final Gaussian approximation for the r.h.s. of (9). This is demonstrated in Figure 1 where (9) is plotted for different values of n and where the likelihood factor was set⁹ to the exact value of (6) or the Laplace approximation of (11). It is evident from the figure that the Laplace approximation is extremely accurate. This gives us the confidence to use the Laplace approximation in Section 6 when we build a dynamic model for L.
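The variance in (11) follows from the curvature of the log-likelihood at $L^*$, which equals $-n_l n_r/(n b^2)$. The Python sketch below checks this against a central finite difference and then combines (11) with a Gaussian prior; since the product of two Gaussian functions of L is Gaussian, the precisions add and the mean is precision-weighted. The zero-mean $\mathcal{N}(0, \sigma_L^2)$ prior follows Figure 1, while the step size h, the example counts and the function names are illustrative assumptions.

```python
import math

def log_lik_aggregate(L, n_l, n, q, b):
    """Log of the likelihood factor on the r.h.s. of (9)."""
    return n_l * L / b - n * math.log(q * math.exp(L / b) + 1.0 - q)

def laplace_approx(n_l, n, q, b):
    """Mean L* from (10) and variance b^2 n / (n_r n_l) from (11); needs 0 < n_l < n."""
    frac = n_l / n
    mean = b * math.log(frac / (1.0 - frac) * (1.0 - q) / q)
    var = b * b * n / ((n - n_l) * n_l)
    return mean, var

def gaussian_posterior(n_l, n, q, b, sigma_L):
    """Multiply (11) by an N(0, sigma_L^2) prior: precisions add, the mean is precision-weighted."""
    m, v = laplace_approx(n_l, n, q, b)
    precision = 1.0 / sigma_L ** 2 + 1.0 / v
    return (m / v) / precision, 1.0 / precision

# Compare the analytic variance in (11) with a central finite-difference curvature at L*.
n_l, n, q, b = 25, 100, 0.11, 0.5
m, v = laplace_approx(n_l, n, q, b)
h = 1e-3
d2 = (log_lik_aggregate(m + h, n_l, n, q, b) - 2.0 * log_lik_aggregate(m, n_l, n, q, b)
      + log_lik_aggregate(m - h, n_l, n, q, b)) / h ** 2
print("variance from curvature:", -1.0 / d2, "variance from (11):", v)
post_mean, post_var = gaussian_posterior(n_l, n, q, b, sigma_L=1.0)
print("Gaussian approximation to the posterior:", post_mean, post_var)
```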

## 2.3 Interpreting the posterior of L

In the aggregate data regime we do not know the posterior distributions of the skills and thus cannot directly infer the effect of left-handedness on match-play outcomes. However, we can still infer this effect in aggregate. As we shall see, the relative ranking of left-handers is governed by the value of L. In particular, if we continue to assume the Laplace distribution for innate skills, then we show in Appendix A.3 that for any fixed and finite value of L, the difference in skills between players at quantiles $\lambda_j$ and $\lambda_i$ satisfies

$$\lim_{N \to \infty} \left(S_{[N\lambda_j]} - S_{[N\lambda_i]}\right) = b \log(\lambda_i / \lambda_j) \tag{12}$$

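where the convergence is in probability and we use $S_{[N\lambda_i]}$ to denote the $[N\lambda_i]$th order statistic of $S_1, \dots, S_N$, counting down from the largest (so that it is the skill of the player ranked $[N\lambda_i]$). Suppose now¹⁰ that $N\lambda_i = x$ and $N\lambda_j = \exp(-k/b)\,x$. Assuming (12) continues to hold approximately for large (but finite) values of N, it immediately follows that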

$$S_{[N\lambda_j]} - S_{[N\lambda_i]} \approx k. \tag{13}$$

Consider now a left-handed player of rank x with innate skill G. All other things being equal, if this player were instead right-handed then his skill would change from $S = G + L$ to $S = G$. This corresponds to a value of $k = -L$, and so that player’s ranking would change from x to $\exp(L/b)\,x$, which of course corresponds to an inferior ranking for positive values of L. We can therefore interpret the advantage of being left-handed as improving, i.e. lowering, one’s rank by a multiplicative factor of $\exp(-L/b)$.
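Relation (13) can be illustrated by simulation: in a large Laplace population, the skill gap between the players ranked $\exp(-k/b)\,x$ and x should be close to k, whatever the (fixed) value of L. The Python sketch below is an illustration under our own assumptions (N = 200,000, q = 0.11, L = 1, b = 1/2, x = 1000, k = 1/2, a fixed seed); ranks are counted from the top, with rank 1 the best player.

```python
import math
import random

def simulate_skills(N, q, L, b, seed=0):
    """S_i = G_i + H_i L with G_i ~ Laplace(0, b) and H_i ~ Bernoulli(q); L held fixed."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) + (L if rng.random() < q else 0.0)
            for _ in range(N)]

N, q, L, b = 200_000, 0.11, 1.0, 0.5
S_desc = sorted(simulate_skills(N, q, L, b), reverse=True)   # S_desc[r - 1] is the skill at rank r
x, k = 1000, 0.5                                             # a rank x and a target skill gap k
better_rank = round(math.exp(-k / b) * x)                    # the rank exp(-k/b) x in (13)
gap = S_desc[better_rank - 1] - S_desc[x - 1]
print(f"rank {better_rank} vs rank {x}: skill gap {gap:.3f} (prediction k = {k})")
```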

We can use this result to infer the improvement in rank for lefties due to their left-handedness in various sports (Flatt 2008). We do so by substituting the fraction of top players that are left-handed, $n_l/n$, into (10) to obtain $L^*$, our point estimate of L. Following the preceding discussion, the (multiplicative) change in rank due to going from left-handed to right-handed can then be approximated by

$$\exp(L^*/b) = \frac{n_l}{n - n_l} \cdot \frac{1-q}{q} \tag{14}$$

which again follows from (10). It is important to interpret (14) correctly. In particular, it represents the multiplicative drop in rank if a particular left-handed player were somehow to give up the advantage of being left-handed. It does not represent his drop in rank if he and all other left-handed players were to simultaneously give up the advantage of being left-handed. In this latter case, the drop in rank would not be as steep as that given in (14) since that player would still remain higher-ranked than the other left-handed players who were below him in the original ranking. In fact, we can argue that the approximate absolute drop in ranking for a left-handed player of rank x when all left-handers give up the benefit of being left-handed is given by the number of right-handed players between ranks x and $\exp(L^*/b)\,x$:

$$(1 - n_l/n) \times \left(x\exp(L^*/b) - x\right). \tag{15}$$

We can argue for (15) by noting that on average a fraction $n_l/n$ of the players between ranks x and $x\exp(L^*/b)$ will be left-handed, and they will also fall and remain ranked below the original player when the lefty advantage is stripped away. Therefore the new rank of the left-handed player originally ranked x will be

$$x + (1 - n_l/n) \times \left(x\exp(L^*/b) - x\right) \tag{16}$$

when all left-handed players simultaneously give up the advantage of being left-handed. Simplifying (16) using (14) yields a new ranking of

$$\frac{n_l}{nq}\, x.$$

We therefore refer to the r.h.s. of (14) and $n_l/(nq)$ as $\text{Drop}_{\text{alone}}$ and $\text{Drop}_{\text{all}}$, respectively. Results for various sports are displayed in Table 1. For example, the table suggests that left-handed table-tennis players would see their ranking drop by a factor of approximately 2.91 if the advantage of left-handedness could somehow be stripped away from all of them.
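Both rank factors depend only on the observed fraction $n_l/n$ and on q, not on b. The Python sketch below computes them and confirms numerically that (16) simplifies to $\text{Drop}_{\text{all}} \cdot x$; the function names are hypothetical, and the inputs are the table-tennis figures quoted in the text (about 32% of top players left-handed, q = 11%).

```python
def drop_alone(frac_left, q):
    """Equation (14) written via frac_left = n_l / n: (n_l / (n - n_l)) * ((1 - q) / q)."""
    return frac_left / (1.0 - frac_left) * (1.0 - q) / q

def drop_all(frac_left, q):
    """n_l / (n q): the rank factor when every left-hander loses the advantage at once."""
    return frac_left / q

def new_rank_all(x, frac_left, q):
    """Equation (16): x + (1 - n_l/n) * (x exp(L*/b) - x)."""
    return x + (1.0 - frac_left) * (x * drop_alone(frac_left, q) - x)

frac, q = 0.32, 0.11
print(f"Drop_alone = {drop_alone(frac, q):.2f}, Drop_all = {drop_all(frac, q):.2f}")
print(f"new rank of the player ranked 10 under (16): {new_rank_all(10, frac, q):.2f}")
```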

Table 1:

Proportion of left-handers in several interactive one-on-one sports (Flatt 2008; Loffing and Hagemann 2016) with the relative changes in rank under the Laplace distribution for innate skills with $q = 11\%$.

These results, while pleasing, are not very surprising. After all, a back-of-the-envelope calculation could come to a similar conclusion as follows. The proportion of top table-tennis players who are left-handed is ≈ 32% but the proportion¹¹ of left-handers in the general population is ≈ 11%. Assuming the top left-handers are uniformly spaced among the top right-handers, we would therefore need to reduce the ranking of all left-handed table-tennis players by a factor of $\approx 32/11 \approx 2.91 = \text{Drop}_{\text{all}}$ to ensure that the proportion of top-ranked left-handed players matches the proportion of left-handers in the general population.

While these results and specifically the interpretation of $exp(L∗/b)$ described above are therefore not too surprising, we can and do interpret them as a form of model validation. In particular, they validate the choice of a medium-tailed distribution to model the innate skill level of each player. We note from Proposition 1 and the following discussion that it would not be possible to obtain these results (in the limit as N → ∞) using either short- or long-tailed distributions to model the innate skills.

## 3 Including match-play and handedness data

Thus far we have not considered the possibility of using match-play data among the top n players to infer the value of L. In this section we extend our model in this direction so that L and the innate skills of the top n players can be inferred simultaneously. We suspect (and indeed this is confirmed in the numerical results of Section 5) that inclusion of the match-play data adds little information regarding the value of L over and beyond what we can already infer from the basic model of Section 2. However, it does in principle allow us to try and answer hypothetical questions regarding the innate skills of players and win-probabilities for players with and without the benefit of left-handedness. We therefore extend our basic model as follows:

• For each combination of players $i < j \le n$:

• Match-play results: $M_{ij} \sim \text{Binomial}\left(T_{ij}, p(i \triangleright j \mid S_i, S_j; \sigma_M)\right)$

where the probability that player i defeats player j is defined according to

$$p(i \triangleright j \mid S_i, S_j; \sigma_M) := \frac{1}{1 + e^{-\sigma_M (S_i - S_j)}}. \tag{17}$$

In contrast with the win probability in (1), our win probability in (17) has a hyperparameter $\sigma_M$ that we use to adjust for the predictability of each sport. In less predictable sports, for example, weaker players often beat stronger players; a small value of $\sigma_M$ captures this, since then $p(i \triangleright j \mid S_i, S_j; \sigma_M) \approx 1/2$ even when $S_i \gg S_j$. On the other hand, more predictable sports would have a larger value of $\sigma_M$, which accentuates the effects of skill disparity. Having an appropriate $\sigma_M$ allows the model to fit the data much more accurately than if we had simply set $\sigma_M = 1$ as is the case in BTL. It’s worth noting that instead of scaling by $\sigma_M$ in (17), we could have scaled the skills themselves so that $S_i = \sigma_M (G_i + H_i L)$ and then used the win probability in (1). We decided against this approach in order to keep the scale of L consistent across sports.
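The role of $\sigma_M$ in (17) is easy to see numerically. The Python sketch below evaluates (17) for the same one-unit skill gap under three illustrative values of $\sigma_M$ (our choice, not values from the paper) and draws $M_{ij} \sim \text{Binomial}(T_{ij}, p)$ as a sum of Bernoulli trials; the function names are hypothetical.

```python
import math
import random

def win_prob(s_i, s_j, sigma_M):
    """Equation (17); sigma_M = 1 recovers the BTL probability in (1)."""
    return 1.0 / (1.0 + math.exp(-sigma_M * (s_i - s_j)))

def sample_matches(s_i, s_j, T_ij, sigma_M, rng):
    """Draw M_ij ~ Binomial(T_ij, p(i beats j; sigma_M)) as a sum of Bernoulli trials."""
    p = win_prob(s_i, s_j, sigma_M)
    return sum(1 for _ in range(T_ij) if rng.random() < p)

# The same one-unit skill gap under three illustrative values of sigma_M.
for sigma_M in (0.1, 1.0, 5.0):
    print(f"sigma_M = {sigma_M}: p(i beats j) = {win_prob(1.0, 0.0, sigma_M):.3f}")

rng = random.Random(0)
print("wins in 20 matches with sigma_M = 5:", sample_matches(1.0, 0.0, 20, 5.0, rng))
```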

## 3.1 The posterior distribution

The joint probability distribution corresponding to the extended model now satisfies

$$p(S, L, M, H) = p(L) \prod_{i=1}^{N} p(H_i)\, p(S_i \mid H_i, L) \prod_{i<j \le n} p(M_{ij} \mid S_i, S_j). \tag{18}$$

As before we condition on $\text{Top}_{n,N}$, so the posterior distribution of interest is then given by

$$p(S_{1:n}, L \mid M_{1:n}, H_{1:n}, \text{Top}_{n,N}) \tag{19}$$

where $S_{a:b} := (S_a, S_{a+1}, \ldots, S_b)$, $H_{1:n}$ are their respective handedness indicators and $M_{1:n}$ denotes the match-play results among the top n ranked players. Bayes’ rule therefore implies

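$$\begin{aligned}
p(S_{1:n}, L \mid M_{1:n}, H_{1:n}, \text{Top}_{n,N}) &\propto p(S_{1:n}, L, M_{1:n}, H_{1:n}, \text{Top}_{n,N}) \\
&= p(L)\, p(H_{1:n} \mid L)\, p(S_{1:n} \mid L, H_{1:n})\, p(\text{Top}_{n,N} \mid S_{1:n}, L, H_{1:n}) \\
&\quad \times p(M_{1:n} \mid S_{1:n}, L, H_{1:n}, \text{Top}_{n,N}) \\
&\propto p(L) \prod_{i \le n} p(S_i \mid L, H_i)\, p(\text{Top}_{n,N} \mid S_{1:n}, L) \prod_{i,j \le n} p(M_{ij} \mid S_i, S_j)
\end{aligned} \tag{20}$$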

where in the final line we have simplified the conditional probabilities and dropped the $p(H_{1:n} \mid L) = p(H_{1:n})$ factor since it is independent of $S_{1:n}$ and L. This last statement follows because, as emphasized above, the indices of the first n players are not rankings but merely identify players from the universe of N players. We know the form of $p(L)$, $p(S_i \mid L, H_i)$ and $p(M_{ij} \mid S_i, S_j)$ from the generative model but need to determine $p(\text{Top}_{n,N} \mid S_{1:n}, L)$. The prior density on the skill of a player in the general population (using the generative framework with Bernoulli(q) handedness) satisfies

$$f(S \mid L) := p(S \mid L) = \mathbb{E}_H\left[p(S \mid H, L)\right] = q\, g(S - L) + (1 - q)\, g(S) \tag{21}$$

where we recall that g denotes the PDF of the innate skill distribution, G. Letting $F(S∣L)$ denote the CDF of the density in (21), it follows that

$$p(\text{Top}_{n,N} \mid S_{1:n}, L) = \prod_{i=n+1}^{N} p\left(S_i \le \min\{S_{1:n}\} \mid S_{1:n}, L\right) = F\left(\min\{S_{1:n}\} \mid L\right)^{N-n}.$$

We can now simplify (20) to obtain

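$$p(S_{1:n}, L \mid M_{1:n}, H_{1:n}, \text{Top}_{n,N}) \propto p(L) \prod_{i=1}^{n} g(S_i - L H_i)\, F\left(\min\{S_{1:n}\} \mid L\right)^{N-n} \prod_{i,j \le n} p(M_{ij} \mid S_i, S_j). \tag{22}$$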
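For concreteness, the unnormalized log of (22) might be coded as in the following Python sketch. It assumes, for illustration only, that g is the Laplace(0, b) density, that p(L) is a $\mathcal{N}(0, \sigma_L^2)$ density, and that the match-play data are stored as a dictionary mapping each pair (i, j), listed once, to (number of wins of i over j, $T_{ij}$); all function names are hypothetical.

```python
import math


def laplace_logpdf(x, b):
    # log g(x) for the assumed innate-skill density Laplace(0, b).
    return -math.log(2.0 * b) - abs(x) / b


def laplace_cdf(x, b):
    return 0.5 * math.exp(x / b) if x < 0 else 1.0 - 0.5 * math.exp(-x / b)


def mixture_cdf(s, L, q, b):
    # F(s | L) = q G(s - L) + (1 - q) G(s): the CDF of the density in (21).
    return q * laplace_cdf(s - L, b) + (1.0 - q) * laplace_cdf(s, b)


def log_sigmoid(z):
    # log(1 / (1 + exp(-z))), computed without overflow.
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def log_posterior(S, L, H, matches, N, q, b, sigma_m, sigma_l=10.0):
    """Unnormalised log of (22).

    S, H: skills and 0/1 handedness of the n players; matches maps (i, j) to
    (wins of i over j, T_ij), each pair listed once.
    """
    n = len(S)
    lp = -0.5 * (L / sigma_l) ** 2  # log p(L) for an assumed N(0, sigma_l^2) prior
    lp += sum(laplace_logpdf(S[i] - L * H[i], b) for i in range(n))
    lp += (N - n) * math.log(mixture_cdf(min(S), L, q, b))  # the Top_{n,N} factor
    for (i, j), (wins, t) in matches.items():
        z = sigma_m * (S[i] - S[j])
        # Binomial log-likelihood under (17); the binomial coefficient is a constant.
        lp += wins * log_sigmoid(z) + (t - wins) * log_sigmoid(-z)
    return lp
```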

## 3.2 Inference via MCMC

We use a Metropolis-Hastings (MH) algorithm to sample from the posterior in (22). Because we work with the random variables $S_{1:n}$ conditional on the event $\text{Top}_{n,N}$, the algorithm will require skill samples that lie far into the tails of the skill prior, G. In order to facilitate fast sampling from the correct region, we develop a tailored approach for both the initialization and the proposal distribution of the algorithm. The key to our approach is to center and de-correlate the posterior distribution, a process known as “whitening” (Murray and Adams 2010; Barber 2012). This leads to good proposals in the MH sampler and is also useful in setting the skill scaling hyperparameter, $\sigma_M$, as we discuss below. More specifically, we will use a Gaussian proposal distribution with mean vector equal to the current state of the Markov chain and covariance matrix $\lambda^2 \Sigma / n$, where λ = 2.38 and Σ is an approximation to the covariance of the posterior distribution. In the numerical results of Section 5, we will run multiple chains in order to properly diagnose convergence to stationarity using the well-known Gelman-Rubin $\hat{R}$ diagnostic. Towards this end, the starting points of the chains will be generated from an $\mathcal{N}(\mu, \gamma I)$ distribution where μ is an approximation to the mean of the posterior distribution and where γ is set sufficiently large to ensure the starting points are over-dispersed.
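A minimal Python sketch of a random-walk MH sampler with the whitened proposal $\mathcal{N}(x, \lambda^2 \Sigma / n)$ and over-dispersed $\mathcal{N}(\mu, \gamma I)$ starting points follows. It uses only the standard library, factors Σ with a hand-written Cholesky decomposition, and treats `log_target` as any function returning the unnormalized log posterior (for instance a hypothetical implementation of (22)); all function names are illustrative assumptions rather than the authors' code.

```python
import math
import random


def cholesky(A):
    """Lower-triangular C with C C^T = A, for a symmetric positive-definite A."""
    d = len(A)
    C = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = A[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = math.sqrt(s) if i == j else s / C[j][j]
    return C


def overdispersed_starts(mu, gamma, n_chains, rng=random):
    """Chain starting points drawn from N(mu, gamma * I), with gamma chosen large."""
    sd = math.sqrt(gamma)
    return [[m + sd * rng.gauss(0.0, 1.0) for m in mu] for _ in range(n_chains)]


def rw_metropolis(log_target, x0, Sigma, n_steps, n, lam=2.38, rng=random):
    """Random-walk MH with Gaussian proposal N(x, lam^2 * Sigma / n).

    Returns the chain (including x0) and the acceptance rate.
    """
    C = cholesky(Sigma)
    scale = lam / math.sqrt(n)
    d = len(x0)
    x, lp = list(x0), log_target(x0)
    chain, accepted = [list(x)], 0
    for _ in range(n_steps):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # scale * C z has covariance lam^2 * Sigma / n, i.e. a "whitened" step.
        y = [x[i] + scale * sum(C[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        lp_y = log_target(y)
        # The proposal is symmetric, so the MH ratio is just the target ratio.
        if math.log(1.0 - rng.random()) < lp_y - lp:
            x, lp = y, lp_y
            accepted += 1
        chain.append(list(x))
    return chain, accepted / n_steps
```

In the setting of (22), `x` would stack $(S_{1:n}, L)$, `Sigma` would be the approximate posterior covariance Σ, and `n` the number of players.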

## 3.2.1 Approximating the mean and covariance of the posterior

The posterior of the skills $S_{1:n}$ can be approximated by considering the posterior distribution of its order statistics, $S_{[1:n]} := (S_{[1]}, S_{[2]}, \ldots, S_{[n]})$, where $S_{[i]}$ is the ith largest of $S_{1:n}$. If we were to reorder the player indices so that the index of each player equaled his rank, then we would have $S_{1:n} = S_{[1:n]}$. Unfortunately we don’t know the ranks of the players a priori. Indeed their ranks are uncertain and will follow a distribution over permutations of $1, \ldots, n$ that is dependent on the data. However, it is possible to construct a ranking of players that should have a high posterior probability by running BTL on their match-play results. If we order the player indices according to their BTL ranks then we would expect

$$p(S_{1:n} = s_{1:n}, L \mid M_{1:n}, H_{1:n}, \text{Top}_{n,N}) \approx p(S_{[1:n]} = s_{1:n}, L \mid M_{[1:n]}, H_{[1:n]}) \tag{23}$$

where $M_{[1:n]}, H_{[1:n]}$ are the match-play results and handedness of the now ordered top n players.
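The BTL preprocessing step could be sketched as follows. This is a minimal illustration using the classical minorization-maximization (MM) fixed-point iteration for Bradley-Terry strengths, in which $\pi_i$ is repeatedly replaced by $W_i / \sum_{j \ne i} n_{ij} / (\pi_i + \pi_j)$, with $W_i$ player i's total wins and $n_{ij}$ the number of matches between i and j; the input format and function names are assumptions. The maximum likelihood strengths are finite only when the directed graph of wins is strongly connected, which the sketch assumes rather than checks.

```python
def btl_strengths(wins, n_iter=1000, tol=1e-10):
    """Bradley-Terry-Luce strengths pi with P(i beats j) = pi_i / (pi_i + pi_j).

    wins[i][j] = number of times player i beat player j. Assumes every player has at
    least one win and the directed graph of wins is strongly connected.
    """
    n = len(wins)
    total_wins = [sum(row) for row in wins]
    pi = [1.0] * n
    for _ in range(n_iter):
        new = []
        for i in range(n):
            # MM update: pi_i <- W_i / sum_j n_ij / (pi_i + pi_j)
            denom = sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                        for j in range(n) if j != i)
            new.append(total_wins[i] / denom)
        scale = n / sum(new)  # BTL is scale invariant; fix sum(pi) = n
        new = [p * scale for p in new]
        converged = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    return pi


def btl_ranking(wins):
    """Player indices ordered from strongest to weakest fitted BTL strength."""
    pi = btl_strengths(wins)
    return sorted(range(len(wins)), key=lambda i: -pi[i])
```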

We can estimate the mean and covariance of the distribution given by (23) via Monte Carlo. First let us apply the chain rule of probability to separate L, $S_{[n]}$ and $S_{[1:n-1]}$ and obtain

$$p(S_{[1:n]}, L \mid M_{[1:n]}, H_{[1:n]}) = p(S_{[1:n-1]} \mid S_{[n]}, L, M_{[1:n]}, H_{[1:n]})\, p(S_{[n]} \mid L, M_{[1:n]}, H_{[1:n]})\, p(L \mid M_{[1:n]}, H_{[1:n]}). \tag{24}$$

We would like to jointly sample from this distribution by first sampling L, then sampling $S_{[n]}$ conditional on L, and finally sampling $S_{[1:n-1]}$ conditional on both L and $S_{[n]}$. The empirical mean and covariance of such samples would approximate the true mean and covariance of the posterior of L and $S_{[1:n]}$, which in turn would approximate the mean and covariance of L and $S_{1:n}$, our ultimate object of interest. Unfortunately sampling from (24) is intractable, but we can approximately sample from it as follows:

1. As discussed in Section 2.2, for large populations the posterior of L can be approximated as

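$$p(L \mid M_{[1:n]}, H_{[1:n]}) \;\tilde{\propto}\; p(L) \left( \frac{\exp\left(\frac{n_l}{n} \cdot \frac{L}{b}\right)}{q \exp(L/b) + 1 - q} \right)^{n}. \tag{25}$$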

We can easily simulate from the distribution on the r.h.s. of (25) by computing its CDF numerically and then using the inverse transform approach. It can be seen from Figure 2 in Section 5 that this approximation is very accurate for large N.

2. It is intractable to simulate $S_{[n]}$ directly according to the conditional distribution on the r.h.s. of (24). It seems reasonable to assume, however, that

$$p(S_{[n]} \mid L, M_{[1:n]}, H_{[1:n]}) \approx p(S_{[n]} \mid L), \tag{26}$$

where we ignore the conditioning on $M_{[1:n]}$ and $H_{[1:n]}$. As with L, we can use the inverse transform approach to generate $S_{[n]}$ according to the distribution on the r.h.s. of (26) by noting that its density is proportional to (David and Nagaraja 2003, p. 12)

$$F(S_{[n]} \mid L)^{N-n} \left(1 - F(S_{[n]} \mid L)\right)^{n-1} f(S_{[n]} \mid L)$$

where f and F are as defined in (21) and the following discussion.

3. Finally, we can handle the conditional distribution of $S_{[1:n-1]}$ on the r.h.s. of (24) by assuming

$$p(S_{[1:n-1]} \mid S_{[n]}, L, M_{[1:n]}, H_{[1:n]}) \approx p(S_{[1:n-1]} \mid S_{[n]}, L) \tag{27}$$

where we again ignore the conditioning on $M_{[1:n]}$ and $H_{[1:n]}$. It is easy to simulate $S_{[1:n-1]}$ from the distribution on the r.h.s. of (27). We do this by simply generating n − 1 samples from the distribution $p(S \mid S > S_{[n]}, L) = p(G + HL \mid G + HL > S_{[n]}, L)$ (a simple truncated distribution) and then ordering the samples.

We can run steps 1 to 3 repeatedly to generate many samples of $(S_{[1:n]}, L)$ and then use these samples to estimate the mean, μ, and covariance matrix, Σ, of the true posterior distribution of $(S_{1:n}, L)$ (where we recall that the top n players have now been ordered according to their BTL ranking). As described above, the resulting Σ is used in the proposal distribution for the MH algorithm while we use μ as the mean of the over-dispersed starting points for each chain. The accuracy of this approximation is investigated empirically in Section 5, where μ and Σ are found to be very close to the true mean and covariance of L and $S_{1:n}$. We note that the approximate mean, μ, and covariance, Σ, could also be used in other MCMC algorithms such as Hamiltonian Monte Carlo (Neal 2011, Sec 4.1) or elliptical slice sampling (Murray, Adams, and Mackay 2010).
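Steps 1 to 3 and the subsequent moment estimates can be combined into the Python sketch below. It is a minimal illustration under stated assumptions rather than the authors' code: G is taken to be Laplace(0, b), p(L) is taken to be $\mathcal{N}(0, \sigma_L^2)$, step 1 tabulates the CDF of the r.h.s. of (25) on a fixed grid, and step 2 draws $F(S_{[n]} \mid L)$ directly from its Beta(N − n + 1, n) distribution (a standard order-statistics fact) before inverting F by bisection, rather than tabulating the order-statistic density numerically. Grid limits, grid size and all function names are hypothetical.

```python
import bisect
import math
import random


def laplace_cdf(x, b):
    return 0.5 * math.exp(x / b) if x < 0 else 1.0 - 0.5 * math.exp(-x / b)


def mixture_cdf(s, L, q, b):
    # F(s | L) = q G(s - L) + (1 - q) G(s), with G assumed Laplace(0, b).
    return q * laplace_cdf(s - L, b) + (1.0 - q) * laplace_cdf(s, b)


def mixture_ppf(u, L, q, b, tol=1e-12):
    # Bisection inverse of F(. | L), which is continuous and strictly increasing.
    lo, hi = -1.0, 1.0
    while mixture_cdf(lo, L, q, b) > u:
        lo *= 2.0
    while mixture_cdf(hi, L, q, b) < u:
        hi *= 2.0
    while hi - lo > tol * (1.0 + abs(lo)):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mixture_cdf(mid, L, q, b) < u else (lo, mid)
    return 0.5 * (lo + hi)


def make_L_sampler(n_l, n, q, b, sigma_l=10.0, lo=-20.0, hi=20.0, n_grid=20001):
    """Step 1: inverse-transform sampler for the r.h.s. of (25), via a tabulated CDF."""
    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    logs = [-0.5 * (x / sigma_l) ** 2  # assumed N(0, sigma_l^2) prior on L
            + n * ((n_l / n) * x / b - math.log(q * math.exp(x / b) + 1.0 - q))
            for x in grid]
    peak = max(logs)
    dens = [math.exp(v - peak) for v in logs]
    cdf = [0.0]
    for k in range(1, n_grid):  # trapezoidal rule
        cdf.append(cdf[-1] + 0.5 * (dens[k - 1] + dens[k]) * (grid[k] - grid[k - 1]))
    total = cdf[-1]
    cdf = [c / total for c in cdf]

    def sample(rng=random):
        u = rng.random()
        k = max(1, bisect.bisect_left(cdf, u))
        span = cdf[k] - cdf[k - 1]
        w = (u - cdf[k - 1]) / span if span > 0 else 0.0
        return grid[k - 1] + w * (grid[k] - grid[k - 1])  # linear interpolation

    return sample


def draw_state(sample_L, N, n, q, b, rng=random):
    """One approximate draw of (S_[1], ..., S_[n], L) following steps 1 to 3."""
    L = sample_L(rng)  # step 1
    # Step 2: F(S_[n] | L) ~ Beta(N - n + 1, n) for the n-th largest of N draws.
    s_n = mixture_ppf(rng.betavariate(N - n + 1, n), L, q, b)
    # Step 3: n - 1 draws from p(S | S > S_[n], L) by truncated inverse transform.
    f_n = mixture_cdf(s_n, L, q, b)
    rest = [mixture_ppf(f_n + (1.0 - f_n) * rng.random(), L, q, b) for _ in range(n - 1)]
    return sorted(rest, reverse=True) + [s_n, L]


def mean_and_cov(rows):
    """Empirical mean vector and unbiased covariance matrix of equal-length rows."""
    m, d = len(rows), len(rows[0])
    mu = [sum(r[k] for r in rows) / m for k in range(d)]
    cov = [[sum((r[a] - mu[a]) * (r[c] - mu[c]) for r in rows) / (m - 1)
            for c in range(d)] for a in range(d)]
    return mu, cov
```

Each row returned by `draw_state` has the form $(S_{[1]}, \ldots, S_{[n]}, L)$, so `mean_and_cov` applied to many rows yields candidate values of μ and Σ for initializing and tuning the MH sampler.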

## 3.2.2 Setting σM via an empirical Bayesian approach

The hyperparameter $\sigma_M$ was introduced in (17) to adjust for the predictability of the sport and we need to determine an appropriate $\sigma_M$ in order to fully specify our model. A simple way to do this is to set $\sigma_M$ to be the maximum likelihood estimator over the match-play data where the skills are set to be $S_{1:n} = \mu_S$, the approximate posterior mean of the skills derived above. That is,

$$\sigma_M = \arg\max_{\sigma > 0} \prod_{i,j \le n} p(M_{ij} \mid S_i = \mu_i, S_j = \mu_j; \sigma). \tag{28}$$

We are thus adopting an empirical Bayes approach where a point estimate of the random variables is used to set the hyperparameter, σM; see (Murphy 2012, p. 172).
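Because (28) is a one-parameter logistic-regression likelihood in σ with fixed covariates $\mu_i - \mu_j$, its logarithm is concave in σ, so a simple golden-section search over a bracketing interval suffices. The following Python sketch assumes the match data are stored as a dictionary mapping each pair (i, j) to (wins of i over j, $T_{ij}$); the interval endpoints and function names are hypothetical, and the search returns a value near an endpoint if the maximizer lies outside the interval (for instance, if the higher-rated player always wins).

```python
import math


def log_sigmoid(z):
    # log(1 / (1 + exp(-z))), computed without overflow.
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def match_loglik(sigma, mu, matches):
    """Log of the product in (28), up to binomial coefficients."""
    ll = 0.0
    for (i, j), (wins, t) in matches.items():
        z = sigma * (mu[i] - mu[j])
        ll += wins * log_sigmoid(z) + (t - wins) * log_sigmoid(-z)
    return ll


def fit_sigma_m(mu, matches, lo=1e-6, hi=50.0, tol=1e-8):
    """Golden-section search for the maximiser of the concave log-likelihood on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    f1, f2 = match_loglik(x1, mu, matches), match_loglik(x2, mu, matches)
    while hi - lo > tol:
        if f1 > f2:  # the maximiser lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = match_loglik(x1, mu, matches)
        else:  # the maximiser lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = match_loglik(x2, mu, matches)
    return 0.5 * (lo + hi)
```

For a single pair with a skill gap of 1 and a 3-1 head-to-head record, the estimate satisfies $1/(1+e^{-\sigma}) = 3/4$, i.e. $\sigma = \log 3$.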

An alternative to the empirical Bayes approach would be to allow $\sigma_M$ to be a random variable in the generative model and to infer its value via MCMC. Unfortunately this approach leads to complications. Recall from the discussion following (17) that $\sigma_M$ can be interpreted as scaling the left-handed advantage and innate skill distributions. If $\sigma_M$ is allowed to be random then this effectively changes the skill distribution. For example, if the skills were normally distributed conditional on $\sigma_M$, and $\sigma_M^2$ were inverse gamma distributed, then the skills would effectively have a t distribution (the inverse gamma being the conjugate prior for the variance of a normal distribution). Since we wish to keep our skills Laplace distributed, or more generally medium-tailed, it is simpler to fix $\sigma_M$ as a hyperparameter.

## 4 Using external rankings and handedness data

An alternative variation on our model is one where we know the individual player handedness of each of the top n players and also have an external ranking scheme of their total skills. For example, such a ranking may be available for the professional athletes in a given sport, e.g. the official world rankings maintained by the World ATP Tour for men’s tennis (Stefani 1997). We will assume without loss of generality that the player indices are ordered as per the given rankings so that the ith ranked player has index i in our model and $S_i \ge S_{i+1}$ for all $i < n$. We are therefore assuming that the external rankings are “correct” and so we can condition our generative model on these rankings. Specifically, we tighten the assumption made in (4) that players indexed 1 to n are the top n players to assume that the ith indexed player is also the ith ranked player for $i = 1, \ldots, n$. Then an argument similar to the one that led to (22) implies that the posterior of interest satisfies

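$$p(S_{1:n}, L \mid H_{1:n}, S_i \ge S_{i+1}\ \forall i < n, \text{Top}_{n,N}) \propto p(L) \prod_{i=1}^{n} g(S_i - L H_i)\, F(S_n \mid L)^{N-n} \prod_{i=1}^{n-1} \mathbb{I}[S_i \ge S_{i+1}] \tag{29}$$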

where $\mathbb{I}$ denotes the indicator function. We can simulate from the posterior distribution in (29) using a Gibbs sampler. The full conditional distribution (required for the Gibbs sampler) of each player’s skill is then a simple truncated distribution so that

$$p(S_i \mid S_{-i}, H_{1:n}, L) \propto g(S_i - L H_i)\, \mathbb{I}[S_{i+1} \le S_i \le S_{i-1}] \tag{30}$$

for $1 < i < n$ and where $S_{-i} := \{S_j : j \ne i, 1 \le j \le n\}$. Similarly the conditional distributions of $S_1$ and $S_n$ satisfy

$$p(S_1 \mid S_{-1}, H_{1:n}, L) \propto g(S_1 - L H_1)\, \mathbb{I}[S_1 \ge S_2]$$

and

$$p(S_n \mid S_{-n}, H_{1:n}, L) \propto g(S_n - L H_n)\, F(S_n \mid L)^{N-n}\, \mathbb{I}[S_n \le S_{n-1}].$$

Conveniently, the skills of the odd-ranked players can be updated simultaneously since they are all independent of each other conditional on L and the skills of the even-ranked players. Similarly, the even-ranked players can also be updated simultaneously conditional on the skills of the odd-ranked players. This makes the sampling parallelizable and efficient to implement when using Metropolis-within-Gibbs. Our algorithm therefore updates the variables in three blocks:

1. Update all even skills simultaneously.

2. Update all odd skills simultaneously.

3. Update L via a Metropolis-Hastings (MH) step with

$$p(L \mid S_{1:n}, H_{1:n}) \propto p(L) \prod_{i=1}^{n} g(S_i - L H_i)\, F(S_n \mid L)^{N-n}.$$
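The first two blocks of this sweep can be sketched in Python as follows, assuming (hypothetically) that g is the Laplace(0, b) density. Then the conditional (30) and that of $S_1$ are truncated Laplace distributions that can be sampled exactly; when the truncation interval lies entirely on one side of the mode, the draw reduces to a truncated exponential, which stays accurate far into the tails. The last-ranked player, whose conditional carries the extra $F(S_n \mid L)^{N-n}$ factor, receives a random-walk Metropolis step with a hypothetical step size, and the update of L (block 3) is omitted. All names are illustrative.

```python
import math
import random


def laplace_cdf(x, b):
    return 0.5 * math.exp(x / b) if x < 0 else 1.0 - 0.5 * math.exp(-x / b)


def laplace_ppf(u, b):
    return b * math.log(2.0 * u) if u < 0.5 else -b * math.log(2.0 * (1.0 - u))


def sample_truncated_laplace(loc, b, lower, upper, rng=random):
    """Exact draw from Laplace(loc, b) restricted to [lower, upper] (bounds may be infinite)."""
    if lower >= loc:  # right tail only: truncated exponential, numerically stable
        span = 1.0 - math.exp(-(upper - lower) / b)
        return lower - b * math.log1p(-rng.random() * span)
    if upper <= loc:  # left tail only: mirror image of the case above
        span = 1.0 - math.exp(-(upper - lower) / b)
        return upper + b * math.log1p(-rng.random() * span)
    a, c = laplace_cdf(lower - loc, b), laplace_cdf(upper - loc, b)
    u = a + (c - a) * rng.random()
    u = min(max(u, 1e-300), 1.0 - 1e-16)  # keep the logarithms finite at the extremes
    return loc + laplace_ppf(u, b)


def log_cond_last(x, loc, L, b, N, n, q):
    # log of g(x - L H_n) F(x | L)^(N - n), up to a constant.
    F = q * laplace_cdf(x - L, b) + (1.0 - q) * laplace_cdf(x, b)
    return -math.inf if F <= 0.0 else -abs(x - loc) / b + (N - n) * math.log(F)


def gibbs_sweep(S, H, L, b, N, q, step=0.1, rng=random):
    """One blocked sweep over the skills in (29); S[0] is the top-ranked player.

    0-based even indices hold ranks 1, 3, 5, ...; given L and the other block they are
    conditionally independent, so each inner loop could run in parallel.
    """
    n = len(S)
    for parity in (0, 1):  # odd ranks first, then even ranks
        for i in range(parity, n, 2):
            loc = L * H[i]
            upper = S[i - 1] if i > 0 else math.inf
            if i < n - 1:  # conditional (30), or that of S_1: exact truncated Laplace draw
                S[i] = sample_truncated_laplace(loc, b, S[i + 1], upper, rng)
            else:  # S_n carries the extra F(S_n | L)^(N - n) factor: Metropolis step
                prop = S[i] + step * rng.gauss(0.0, 1.0)
                if prop <= upper:
                    log_ratio = (log_cond_last(prop, loc, L, b, N, n, q)
                                 - log_cond_last(S[i], loc, L, b, N, n, q))
                    if math.log(1.0 - rng.random()) < log_ratio:
                        S[i] = prop
    return S
```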

In the absence of match-play data, we believe this model should yield slightly more accurate inference regarding L than the base model of Section 2 when the left-handers are not evenly spaced among the top n players. For example, it may be the case that all left-handers in the top n are ranked below all the right-handers in the top n. While such a scenario is of course unlikely, it would suggest that the value of L is not as large as that inferred by the base model, which only considers $n_l$ and not the relative ranking of the $n_l$ left-handed players among the top n. The model here accounts for the relative ranking and as such should yield a more accurate inference of L to the extent that the lefties are not evenly spaced among the top n players.

## 5 Numerical results

We now apply our models and results to men’s ATP tennis. Specifically, we use handedness data as well as match-play results from ATP Tennis Navigator (Tennis Navigator 2004), a database that includes more than seven thousand players from 1980 until the present and hundreds of thousands of match results at various levels of professional and semi-professional tennis. We restrict ourselves to players for whom handedness data is available and who have played a minimum number of games (here, set at thirty). This last restriction is required because we run BTL as a preprocessing step in order to extract the top n = 150 players before applying our methods, and BTL can be susceptible to large errors if the graph of matches (with players as nodes, wins as directed edges) is not strongly connected. Using data on numbers of recreational tennis players (Tennis Europe 2015; The Physical Activity Council 2016), we roughly estimate a universe of $N = 100{,}000$ advanced players, but we note that our results were robust to the specific value of N that we chose. Specifically, we also considered N = 1 million and N = 50 million and obtained very similar results regarding L. We used data from 2014 for all of our experiments and our results were based on the model of Section 2 with a Laplace prior on the skills and an uninformative prior on L with σL = 10. The MCMC chains for the extended model of Section 3 were initialized by a random perturbation from the approximate mean as outlined in Section 3.2, and convergence was checked using the Gelman-Rubin diagnostic (Gelman and Rubin 1992).
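For reference, the original (non-split) form of the Gelman-Rubin potential scale reduction factor for one scalar quantity is sketched below in Python; modern software often uses split or rank-normalized variants instead, and the function name is ours. It compares the between-chain variance B with the mean within-chain variance W; values close to 1 (a common rule of thumb is below 1.1) suggest the chains have mixed.

```python
import math


def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one scalar quantity.

    chains: m equal-length sequences of draws, with burn-in already discarded.
    """
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance (scaled by n) and mean within-chain variance.
    B = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
    return math.sqrt(var_hat / W)
```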

Figure 2:

Posteriors of the left-handed advantage, L, using the inference methods developed in Sections 2, 3 and 4. “Match-play” refers to the results of MCMC inference using the full match-play data, “Handedness and Rankings” refers to MCMC inference using only individual handedness data and external skill rankings, “Aggregate handedness” refers to (9) and “Aggregate handedness with Laplace Approximation” refers to (9) but where the Laplace approximation in (11) substitutes for the likelihood term. “Match-play without conditioning on $\text{Top}_{n,N}$” is discussed in Section 5.2.

## 5.1 Posterior distribution of L

In Figure 2, we display the posterior of L obtained using each of the different models of Sections 2 to 4 using data from 2014 only. We observe that these inferred posteriors are essentially identical. This is an interesting result and it suggests that for large populations there is essentially no additional information conveyed to the posterior of L by match-play data or ranked handedness if we are already given the proportion of left-handers among the top n players. The posterior of L can be interpreted in terms of a change in rank as discussed in Section 2.3. Since the posterior of the aggregate handedness with the Laplace approximation agrees with the posterior obtained from the full match-play data, we would argue the results of Table 1 are valid even in the light of the match-play data. These results suggest that being left-handed in tennis improves a player’s rank by a factor of approximately 1.36 on average. Of course, these results were based on 2014 data only and as we shall see in Section 6, there is substantial evidence to suggest that L has varied through time.

While match-play data and ranked handedness therefore provide little new information on L over and beyond knowing the proportion of left-handers in the top n players, we can use the match-play model to answer hypothetical questions regarding win probabilities when the lefty advantage is stripped away from the players’ skills. We discuss such hypothetical questions in Section 5.3.

## 5.2 On the importance of conditioning on $Topn,N$

We now consider whether it is important to condition on $\text{Top}_{n,N}$ as we do in (22) when evaluating the lefty advantage. This question is of interest because previous papers in the literature have tried to infer the lefty advantage without conditioning on the players in their data-set being the top ranked players. For example, Del Corral and Prieto-Rodríguez (2010) built a probit model for predicting match-play outcomes and their factors included a player’s rank, height, age and handedness. Using data from 2005 to 2008 they did not find a consistent statistically significant advantage to being left-handed in this context. That said, their focus was not on inferring the lefty advantage, but rather on seeing how useful an indicator it is for predicting match-play outcomes. Moreover, it seems reasonable to assume that any lefty advantage would already be accounted for by a player’s rank. In contrast, Breznik (2013) attempted to infer the lefty advantage by comparing the mean ranking of top left- and right-handed players as well as the frequency of matches won by top left- vs top right-handed players. Using data from 1968 to 2011 they found a statistically significant advantage for lefties. The analyses in these studies do not account for the fact that the players in their data-sets were the top ranked players. In our model, this is equivalent to removing the $\text{Top}_{n,N}$ condition from the left side of (22) and $F(\min\{S_{1:n}\} \mid L)^{N-n}$ from its right side.

It makes sense then to assess whether there is value in conditioning on $\text{Top}_{n,N}$ when we use match-play data (and handedness) among top players to estimate the lefty advantage. We therefore re-estimated the lefty advantage by considering the same match-play model of Section 3 but where we did not condition on $\text{Top}_{n,N}$. Inference was again performed using MCMC on the posterior of (22) but with the $F(\min\{S_{1:n}\} \mid L)^{N-n}$ factor ignored. Based on this model we arrive at a very different posterior for L, as may also be seen in Figure 2. Indeed when we fail to condition on $\text{Top}_{n,N}$ we obtain a posterior for L that places more probability on a lefty disadvantage than a lefty advantage and whose mode is negative. The reason for this is that in 2014 there were few very highly ranked left-handers, with Nadal being the only left-hander among the top 20. In fact when we apply BTL to the match-play data among the top 150 players that year we find that the mean rank of top right-handers is 74.07 while the mean rank of top lefties is 83.39. A model that only considers results among the top 150 players that year therefore concludes there appears to be a disadvantage to being left-handed. In contrast, when we also condition upon $\text{Top}_{n,N}$ (as indeed we should since this provides additional information) we find this conclusion reversed so that there appears to be a lefty advantage. Moreover this reversal makes sense: with 23 of the top 150 players being lefties, the percentage of top lefties is 15.33%, which is greater than the 11% assumed for the general population. We therefore see that conditioning on $\text{Top}_{n,N}$ can result in a significant improvement in the inference of L over and beyond just considering the match-play results among the top n players.

## 5.3 Posterior of skills with and without the advantage of left-handedness

We now consider the posterior distribution of the innate skills and how they differ (in the case of left-handers) from the posterior of the total skills. We also consider the effect of L on match-play probabilities and rankings of individual players. These results are based on the extended model of Section 3. In Figures 3 and 4 we show the posterior of the skills of Rafael Nadal, who plays left-handed, and the right-handed Roger Federer. During their careers these two players have forged perhaps the greatest rivalry in modern sport. The figures display the posterior distributions of the innate skill, G, and total skill, $S = G + HL$, of Nadal and compare them to the skill, $S = G$, of Federer, who is right-handed. Clearly the posterior skill distribution of Federer is to the right of Nadal’s, and the discrepancy between the two is even larger when the advantage of left-handedness is removed. This is not surprising since Federer’s ranking, according to BTL (as well as official year-end rankings), was higher than Nadal’s at the end of 2014. It may be tempting to infer from these posteriors that Federer has a high probability of beating Nadal in any given match (based on 2014 form). The match-play data, however, shows that Nadal and Federer played only once in 2014 with Nadal winning in the semi-final of the Australian Open.

Figure 3:

Posterior distribution of Federer’s skill minus Nadal’s skill with and without the advantage of left-handedness.

Figure 4:

Marginal posterior distribution of Federer’s and Nadal’s skill, with and without the advantage of left-handedness.

Table 2:

Probability of match-play results with and without the advantage of left-handedness.

While Nadal and Federer is probably the most interesting match-up for tennis fans, this match-up also points to one of the weaknesses of our model. Specifically, we do not allow for player interaction effects in determining win probabilities whereas it is well known that some players match up especially well against other players. Nadal, for example, is famous for matching up particularly well against Federer and has a 23-15 career head-to-head win/loss record against Federer despite Federer often having a superior record (to Nadal’s) against other players. For this reason (and others outlined in the introduction) we do acknowledge that inference regarding individual players should be conducted with care.

Table 2 extends the analysis of Federer and Nadal to other top ranked players. Each cell in the table provides the probability of the lower ranked player beating the higher ranked player according to their posterior skill distributions. Above the diagonal the advantage of left-handedness is included in the calculations whereas below the diagonal it is not. The effect on the winning probability due to left-handedness can be observed by comparing the values above and below the diagonal. For example Nadal has a 39.5% chance of beating Federer with the advantage of left-handedness included, but this drops to 31.8% when the advantage is excluded. The left-handed players are identified in bold font together with the match-play probabilities that change when the left-handed advantage is excluded, i.e. when left- and right-handed players meet. In all cases removing the advantage of left-handedness decreases the winning probability of left-handed players, although the magnitude of this effect varies on account of the non-linearity of the sigmoidal match-play probabilities in (17).
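Assuming each table entry is the posterior mean of the win probability (17), it could be estimated from MCMC output as in the Python sketch below; the function names and arguments are hypothetical. Removing the lefty advantage amounts to dropping the $HL$ term from each total skill before applying (17).

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def posterior_win_prob(G_i, G_j, L, H_i, H_j, sigma_m, with_lefty=True):
    """Monte Carlo estimate of P(i beats j), averaging (17) over joint posterior draws.

    G_i, G_j, L: equal-length lists of posterior draws of the two innate skills and of L,
    taken from the same MCMC iterations; H_i, H_j: 0/1 handedness indicators.
    with_lefty=False drops the H * L term, i.e. removes the left-handed advantage.
    """
    total = 0.0
    for g_i, g_j, l in zip(G_i, G_j, L):
        s_i = g_i + (H_i * l if with_lefty else 0.0)
        s_j = g_j + (H_j * l if with_lefty else 0.0)
        total += sigmoid(sigma_m * (s_i - s_j))
    return total / len(L)
```

Because (17) is nonlinear, averaging it over posterior draws is generally not the same as evaluating it at the posterior mean skills.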

If the advantage of left-handedness was removed then the decrease in left-handers’ skills would lead to a change in their rankings. In Table 3, for example, we see how the ranking (as determined by posterior skill means) of the top four left-handed players changes when we remove the left-handed advantage. We also display how these rankings would change when we only use handedness data and external rankings as in Section 4, and when we only have aggregate handedness data of the top n players as in Section 2.2. We see that the change in rankings suggested by each of the methods largely agree, although there are some minor variations. Notably Nadal’s rank does not change when we use the full match-play data-set but he does drop from 4 to 5 when we use the other inference approaches. Klizan’s change in rank using the handedness and external rankings data is much smaller than for the other two methods. Overall, however, we see substantial agreement between the three approaches. This argues strongly for use of the simplest approach, i.e. the aggregate handedness approach of Section 2, when the only quantity of interest is the posterior distribution of L.

Table 3:

Changes in rank of prominent left-handed players when the left-handed advantage is excluded.

## 6 A dynamic model for L

Thus far we have only considered inference based on data collected over a single time period, but it is also interesting to investigate how the advantage of left-handedness has changed over time in a given sport. Towards this end, we assume that the advantage of left-handedness, $L_t$, in period t follows a Gaussian random walk for $t = 1, \ldots, T$. We also assume that $L_t$ is latent and therefore unobserved. Instead, we observe the number of top players, $n_t$, as well as the number, $n_t^l \le n_t$, of those top players that are left-handed. The generative model is as follows:

• Initial left-handed advantage: $L_0 \sim \mathcal{N}(0, \theta^2)$

• For time periods $t = 1, \ldots, T$

• Left-handed advantage: $L_t \sim \mathcal{N}(L_{t-1}, \sigma_K^2)$

• Number of top left-handers: $n_t^l \sim p(n_t^l \mid L_t; n_t)$

where we assume $\theta^2$ is large to reflect initial uncertainty on $L_0$ and $\sigma_K$ controls how smoothly L varies over time. The posterior distribution of L given the data is then given by

$$p(L_{0:T} \mid n^l_{1:T}; n_{1:T}) \propto p(L_0) \prod_{t=1}^{T} p(L_t \mid L_{t-1})\, p(n_t^l \mid L_t; n_t). \tag{31}$$

The main complexity in (31) stems from the distribution $p(n_t^l \mid L_t; n_t)$. This quantity is infeasible to compute exactly since it requires marginalizing out the player skills but, as we have previously seen, it can be accurately approximated by the Laplace approximation in (11). Conveniently, using this approximation results in all of the factors in (31) being Gaussian and since Gaussians are closed under multiplication, the posterior of $L_{0:T}$ becomes a multivariate Gaussian. Specifically, we have

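$$p(L_{0:T} \mid n^l_{1:T}; n_{1:T}) \;\tilde{\propto}\; \mathcal{N}(L_0; 0, \theta^2) \prod_{t=1}^{T} \mathcal{N}(L_t; L_{t-1}, \sigma_K^2)\, \mathcal{N}\left(L_t; L_t^*, \frac{b^2 n_t}{n_t^r n_t^l}\right) \propto \mathcal{N}(L_{0:T}; \mu_{0:T}, \Sigma_{0:T}) \tag{32}$$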

where $L_t^*$ is as defined in (10), $n_t^r := n_t - n_t^l$ is the number of top right-handers, and $\mu_{0:T}$ and $\Sigma_{0:T}$ are the posterior mean and covariance matrix, respectively, of $L_{0:T}$. Both $\mu_{0:T}$ and $\Sigma_{0:T}$ are functions of $n^l_{1:T}, n_{1:T}, \theta, \sigma_K^2$, and can be evaluated analytically using standard Kalman filtering methods (Barber 2012, Sec. 24). A major advantage of the Kalman filter/smoother is that finding an appropriate smoothing parameter $\sigma_K^2$ via maximum likelihood estimation (MLE) is computationally tractable. The likelihood of observing the handedness data under the generative model is:

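$$\begin{aligned}
p(n^l_{1:T}; n_{1:T}, \sigma_K^2) &= \int_{\mathbb{R}^{T+1}} p(n^l_{1:T}, L_{0:T}; n_{1:T}, \sigma_K^2)\, dL_{0:T} \\
&= \int_{\mathbb{R}^{T+1}} p(L_0) \prod_{t=1}^{T} p(L_t \mid L_{t-1}; \sigma_K^2)\, p(n_t^l \mid L_t; n_t)\, dL_{0:T} \\
&\;\tilde{\propto}\; \int_{\mathbb{R}^{T+1}} \mathcal{N}(L_0; 0, \theta^2) \prod_{t=1}^{T} \mathcal{N}(L_t; L_{t-1}, \sigma_K^2)\, \mathcal{N}\left(L_t; L_t^*, \frac{b^2 n_t}{n_t^r n_t^l}\right) dL_{0:T}
\end{aligned} \tag{33}$$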

where the constant of proportionality coming from the Laplace approximation does not depend on $\sigma_K^2$. Since all of the factors in (33) involving L are Gaussian, it is possible to analytically integrate out L, leaving a closed-form expression involving $\sigma_K^2$. Maximizing this expression w.r.t. $\sigma_K^2$ leads to an MLE for $\sigma_K^2$; the calculation details are provided in Appendix A.4. This value of $\sigma_K^2$ can then be substituted into (32) to find the posterior of L.
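Because every factor in (32) and (33) is Gaussian, these quantities can be computed with a scalar Kalman filter, a Rauch-Tung-Striebel smoother and the prediction-error decomposition of the marginal likelihood. The Python sketch below is a minimal illustration under stated assumptions: it treats $L_t^*$ as a noisy observation of $L_t$ with variance $b^2 n_t / (n_t^r n_t^l)$, takes $L_t^* = b \log\left(n_t^l (1-q) / (q\, n_t^r)\right)$ (the maximizer of the binomial likelihood underlying (25), which we assume coincides with (10)), uses θ² = 100 as a hypothetical "large" value, and picks $\sigma_K^2$ by a grid search over the log marginal likelihood rather than by the closed-form calculation of Appendix A.4. It returns smoothed marginal means and variances only, not the full covariance $\Sigma_{0:T}$; all names are illustrative.

```python
import math


def lefty_pseudo_obs(n_l, n, q, b):
    """(L_t*, R_t) for one period: the Gaussian factor N(L_t; L_t*, R_t) in (32).

    Assumes L_t* = b log(n_l (1 - q) / (q n_r)); requires 0 < n_l < n.
    """
    n_r = n - n_l
    return b * math.log(n_l * (1.0 - q) / (q * n_r)), b * b * n / (n_r * n_l)


def kalman(ys, Rs, sigma_k2, theta2=100.0):
    """Filter for L_0 ~ N(0, theta2), L_t = L_{t-1} + N(0, sigma_k2), y_t = L_t + N(0, R_t).

    Returns filtered means/variances for t = 0..T, one-step predicted variances and
    the log marginal likelihood of y_1..T (prediction-error decomposition).
    """
    m_f, P_f, P_pred, loglik = [0.0], [theta2], [None], 0.0
    for y, R in zip(ys, Rs):
        Pp = P_f[-1] + sigma_k2  # predict; the predicted mean is m_f[-1]
        S = Pp + R  # predictive variance of y_t
        loglik -= 0.5 * (math.log(2.0 * math.pi * S) + (y - m_f[-1]) ** 2 / S)
        K = Pp / S  # Kalman gain
        m_f.append(m_f[-1] + K * (y - m_f[-1]))
        P_f.append((1.0 - K) * Pp)
        P_pred.append(Pp)
    return m_f, P_f, P_pred, loglik


def rts_smoother(m_f, P_f, P_pred):
    """Backward pass: posterior marginal means and variances of L_0..L_T."""
    m_s, P_s = m_f[:], P_f[:]
    for t in range(len(m_f) - 2, -1, -1):
        J = P_f[t] / P_pred[t + 1]
        m_s[t] = m_f[t] + J * (m_s[t + 1] - m_f[t])
        P_s[t] = P_f[t] + J * J * (P_s[t + 1] - P_pred[t + 1])
    return m_s, P_s


def fit_and_smooth(n_l_seq, n_seq, q, b, theta2=100.0, grid=None):
    """Choose sigma_K^2 by maximising the marginal likelihood on a log grid, then smooth."""
    obs = [lefty_pseudo_obs(nl, n, q, b) for nl, n in zip(n_l_seq, n_seq)]
    ys, Rs = [y for y, _ in obs], [R for _, R in obs]
    if grid is None:
        grid = [10.0 ** (k / 20.0) for k in range(-120, 21)]  # 1e-6 up to 10
    best = max(grid, key=lambda s2: kalman(ys, Rs, s2, theta2)[3])
    m_f, P_f, P_pred, _ = kalman(ys, Rs, best, theta2)
    m_s, P_s = rts_smoother(m_f, P_f, P_pred)
    return best, m_s, P_s
```

A grid search is used here only because it is simple and robust; any one-dimensional optimizer over $\log \sigma_K^2$ would serve the same purpose.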

In the top panel of Figure 5 we plot the fraction of left-handers among the top 100 men’s tennis players as a function of year from 1985 to 2016, using data from (Bačić and Gazala 2016). In the bottom panel we plot the inferred value of L over this time period. We note that in 2006 and 2007, the fraction of top left-handers dropped below 11%, the estimated fraction of left-handers in the general population. A naive analysis would conclude that for those years the advantage of left-handedness was negative. However, this would ignore the randomness in the fraction of top left-handers from year to year. The Kalman filter smooths over the anomalous 2006 and 2007 years and has a posterior on L with positive mean throughout 1985 to 2016. We also recall our observation from the introduction where we noted that only 2 of the top 32 seeds in Wimbledon in 2015 were left-handed. There is no inconsistency between that observation and the data from 2015 in Figure 5, however. While there were indeed only 2 left-handed men among the top 32 in the official year-end world rankings, there was a total of 13 left-handers among the top 100.

Figure 5:

Posteriors of the left-handed advantage, Lt, computed using the Kalman filter/smoother. The dashed red horizontal lines (at 11% in the upper figure and 0 in the lower figure) correspond to the level at which there is no advantage to being left-handed. The error bars in the marginal posteriors of Lt in the lower figure correspond to 1 standard deviation errors.

Finally we note that we could also have included individual player skills as latent states in our model but this would have resulted in a much larger state space and made inference significantly more difficult. As we observed in Section 5.1, including match-play results does not change the posterior distribution of L significantly and so we are losing very little information regarding $L_t$ when our model and inference is based only on the observed number of top left-handed players.

## 7 Conclusions and further research

In this paper we have proposed a model for identifying the advantage, L, of being left-handed in one-on-one interactive sports. We use a Bayesian latent ability framework but with the additional complication of having a latent factor, i.e. the advantage of left-handedness, that we needed to estimate. Our results argued for the use of a medium-tailed distribution such as the Laplace distribution when modeling the innate skills of players. We showed how to infer the value of L when only the proportion of top left-handed players is available. In this case we showed that the distribution of the number of left-handed players among the top n (out of N) converges as N → ∞ to a binomial distribution with a success probability that depends on the tail-length of the innate skill distribution. We also use this result to develop a simple dynamic model for inferring how the value of L has varied through time in a given sport. In order to estimate the innate skills of top players we also considered the case when match-play data among the top n players was available. We observed that including match-play data in our model makes little or no difference to the inference of the left-handedness advantage but it did allow us to address hypothetical questions regarding match-play win probabilities with and without the benefit of left-handedness.

It is worth noting that our framework is somewhat coarse by necessity. In tennis for example, there are important factors such as player ability varying across different surfaces (clay, hard court or grass) that we don’t model. We also attach equal weight to all matches in our model estimation despite the fact that some matches and tournaments are clearly (much) more important than others. Moreover, we assume (for a given sport) that there is a single latent variable, L, which measures the advantage of being left-handed in that sport. We therefore assume that the total skill of each left-handed player benefits to the same extent according to the value of L. This of course would not be true in practice as it seems likely that some lefties take better advantage of being left-handed than others. Similarly, our model assumes that all righties are disadvantaged to the same extent by being right-handed. Again, it seems far more likely that some right-handed players are more adversely affected playing lefties than other right-handed players. Finally, we don’t allow for interaction effects between two players in determining the probability of one player beating the other. Again, this seems unlikely to be true in practice where some players are known to “match up well” against other players. Nonetheless, we do believe our model captures the value of being left-handed in an aggregate sense and can be reasonably interpreted in that manner. While it is tempting to use the model to answer questions such as “What is the probability that Federer would beat Nadal if Nadal was right-handed?” (and we do ask and answer such questions in Section 5!), we do acknowledge that the answers to such specific questions should be taken lightly for the reasons outlined above.

There are several directions of interest for future research. First, it would be interesting to apply our model to data-sets from other one-on-one sports and to estimate how $L_t$ varies with time in these sports. The Kalman filtering/smoothing approach developed in Section 6 is straightforward to implement and the data requirements are very limited as we only need the aggregate data $(n_t^l, n_t)$ for each time period. While trends in $L_t$ across different sports would be interesting in their own right, the cross-sport dynamics of $L_t$ could be used to shed light on the potential explanations behind the benefit of left-handedness. For example, there is some evidence in Figure 5 suggesting that the benefit of left-handedness in men’s professional tennis has decreased with time. If such a trend could be linked appropriately with other developments in men’s tennis such as the superior strength and speed of the players, superior racket and string technology, time pressure etc. then it may be possible to attach more or less weight to the various hypotheses explaining the benefit of left-handedness. The recent work of Loffing (2017), for example, studies the link between the lefty advantage and time pressure in elite interactive sports. While these ideas are clearly in the literature already, the Kalman filtering approach provides a systematic, straightforward and consistent approach for measuring $L_t$. This can only aid with identifying the explanation(s) for the benefits of left-handedness and how it varies across sports and time.

It would also be of interest to consider more complex models that account for interactions in the skill levels between players and/or different match-play circumstances. As discussed above, examples of the latter would include distinguishing between surfaces (clay, grass etc.) and between grand-slam and regular tour matches in tennis. Given the flexibility afforded by a Bayesian approach, it should be straightforward to account for such features in our models. Given the limited match-play data in many sports, however, it is not clear that we would be able to learn much about such features. In tennis, for example, even the very best players may only play each other a couple of times a year or less; as mentioned earlier, Nadal and Federer faced each other only once in 2014. It would therefore be necessary to consider data-sets spanning multiple years, in which case it would presumably also be necessary to include form as well as the general trajectory of career arcs in our models. We note that such modeling might be of more general interest, and identifying the value(s) of L might not be the main goal of such a study.

Continuing on from the previous point, there has been considerable interest in recent years in the so-called “interacting performances theory” (O’Donoghue 2009). This theory recognizes that a performance (and its outcome) is determined both by the skill level or quality of an opponent and by the specific type of opponent. Indeed, different players are influenced by the same opponent types in different ways. Under this theory, it is important²⁰ to be able to identify different types of players. Once these types have been identified, each player can be labeled as being of a specific type. It may then be possible to accommodate interaction effects between specific players (as outlined in the previous paragraph) by allowing instead for player-type interactions. Such a model would require considerably fewer parameters to be estimated than a model allowing for interactions between specific players.

Returning to the issue of the left-handedness advantage, we would like to adapt these models and apply them to other sports such as cricket and baseball, where one-on-one situations still arise and indeed are the main aspect of the sport. It is well known, for example, that left-handed pitchers in Major League Baseball (approx. 25%) and left-handed batsmen in elite cricket (approx. 20%) are over-represented. While the model of Section 2, which only uses aggregate handedness data, could be applied directly to these sports, the match-play model of Section 3 would need to be adapted to handle them. This is because the one-on-one situations that occur in these sports do not have binary win/lose outcomes but instead have multiple possible outcomes whose probabilities would need to be linked to the skill levels of the two participants.

We hope to consider some of these alternative directions in future research.

## A.1 Proof of Proposition 1

#### Proof

We begin by observing that the exchangeability of players implies

$$p(n_l \mid L; n, N) = p(n_l \mid \mathrm{Top}_{n,N}, L; n). \tag{34}$$

Integrating (34) over the handedness of the top n players yields

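$$
\begin{aligned}
\lim_{N\to\infty} p(n_l \mid L; n, N) &= \lim_{N\to\infty} \sum_{h_{1:n}\in\{0,1\}^n} p(n_l, H_{1:n}=h_{1:n} \mid \mathrm{Top}_{n,N}, L; n)\\
&= \sum_{\substack{h_{1:n}\in\{0,1\}^n:\\ \sum_{i=1}^n h_i = n_l}} \lim_{N\to\infty} p(H_{1:n}=h_{1:n} \mid \mathrm{Top}_{n,N}, L; n).
\end{aligned}\tag{35}
$$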

We can expand each term in the summation by conditioning on the top players’ skills,

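$$
\begin{aligned}
\lim_{N\to\infty} p(H_{1:n}=h_{1:n}\mid \mathrm{Top}_{n,N},L;n) &= \lim_{N\to\infty}\int_{\mathbb{R}^n} p(H_{1:n}=h_{1:n}\mid S_{1:n},\mathrm{Top}_{n,N},L)\,p(S_{1:n}\mid \mathrm{Top}_{n,N},L)\,dS_{1:n}\\
&= \lim_{N\to\infty}\int_{\mathbb{R}^n} p(H_{1:n}=h_{1:n}\mid S_{1:n},L)\,p(S_{1:n}\mid \mathrm{Top}_{n,N},L)\,dS_{1:n}.
\end{aligned}\tag{36}
$$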

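The second equality in (36) holds because, conditional on their skills, the handedness of the top $n$ players is independent of the other players’ skills and hence of the event $\mathrm{Top}_{n,N}$. Most of the proof will focus on showing that the r.h.s. of (36) equals $\alpha(L)^{n_l}(1-\alpha(L))^{n_r}$, where $\alpha(L) := q/(q+(1-q)c(L))$, $n_r := n-n_l$ is the number of top right-handers and $c(L)$ is the tail-length of the innate skill distribution as defined in the statement of the proposition. We will write $S_{[i]}$ for the $i$th order statistic (the $i$th largest) of the skills $S_{1:N}$ and $H_{[i]}$ for the corresponding induced order statistic. Given that the innate skills have support $\mathbb{R}$, we have $F(k\mid L)<1$ for all $k\in\mathbb{R}$, where $F$ denotes the conditional CDF of a player’s total skill given $L$. Since $F(k\mid L)<1$, the quantity $N^nF(k\mid L)^{N-n}$ tends to 0 as $N\to\infty$, so for any values of $k$, $L$ and $\epsilon>0$ we can find $N_{k,L,\epsilon}\in\mathbb{N}$ such that $N^nF(k\mid L)^{N-n}<\epsilon$ for all $N\ge N_{k,L,\epsilon}$. It therefore follows that for such $k$, $L$, $\epsilon$ and $N$ we have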

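$$
\begin{aligned}
\int_{S_{[n]}\le k} p(S_{1:n}\mid \mathrm{Top}_{n,N},L)\,dS_{1:n} &= \int_{S_{[n]}\le k}\binom{N}{n}F(S_{[n]}\mid L)^{N-n}\prod_{i=1}^n f(S_i\mid L)\,dS_{1:n}\\
&\le \binom{N}{n}F(k\mid L)^{N-n}\int_{\mathbb{R}^n}\prod_{i=1}^n f(S_i\mid L)\,dS_{1:n}\\
&\le N^nF(k\mid L)^{N-n} < \epsilon
\end{aligned}\tag{37}
$$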

where f denotes the PDF of F. The conditional distribution of player handedness in (36) factorizes as

$$p(H_{1:n}=h_{1:n}\mid S_{1:n},L)=\prod_{i=1}^n p(H_i=h_i\mid S_i,L).\tag{38}$$

Consider now the $i$th term in this product. We have

$$p(H_i=1\mid S_i=s,L)=\frac{p(S_i=s\mid H_i=1,L)\,p(H_i=1\mid L)}{p(S_i=s\mid L)}=\frac{g(s-L)\,q}{q\,g(s-L)+(1-q)\,g(s)}=\frac{q}{q+(1-q)\dfrac{g(s)}{g(s-L)}}\tag{39}$$

Assuming $c(L):=\lim_{s\to\infty} g(s)/g(s-L)$ exists, as stated in the proposition, we can take limits in (39) to obtain

$$\lim_{s\to\infty}p(H_i=1\mid S_i=s,L)=\frac{q}{q+(1-q)\,c(L)},$$

which we recognize as $\alpha(L)$ defined above. A similar argument for the case $H_i=0$ yields $\lim_{s\to\infty}p(H_i=0\mid S_i=s,L)=1-\alpha(L)$. Since the limit of a finite product of functions, each having a finite limit, equals the product of the limits, it follows that for all $\epsilon>0$ there exists a $k_\epsilon\in\mathbb{R}$ such that if $s_i>k_\epsilon$ for $i=1,\dots,n$ then
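The limit in (39) can be checked numerically. The plain-Python sketch below (function names are ours) evaluates $p(H_i=1\mid S_i=s,L)$ from (39) on the log scale for two illustrative innate-skill densities $g$. For a Laplace density with scale $b$, $g(s)/g(s-L)=e^{-L/b}$ once $s\ge\max\{L,0\}$, so the probability settles at $\alpha(L)=q/(q+(1-q)e^{-L/b})$; for a standard normal density, $g(s)/g(s-L)=e^{-sL+L^2/2}\to0$, so $c(L)=0$ and the probability tends to $\alpha(L)=1$ whenever $L>0$. The values $q=0.11$ and $L=1$ are for illustration only.

```python
import math

def lefty_posterior(s, L, q, log_g):
    """p(H=1 | S=s, L) from (39): q / (q + (1 - q) g(s) / g(s - L)).

    Written as a logistic function of z = log((1-q)/q) + log g(s) - log g(s-L)
    so that extreme values of s do not overflow.
    """
    z = math.log((1.0 - q) / q) + log_g(s) - log_g(s - L)
    if z >= 0:
        return math.exp(-z) / (1.0 + math.exp(-z))
    return 1.0 / (1.0 + math.exp(z))

def laplace_log_pdf(x, b=1.0):
    """Log density of a Laplace(0, b) innate skill."""
    return -abs(x) / b - math.log(2.0 * b)

def normal_log_pdf(x):
    """Log density of a standard normal innate skill."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def alpha(L, q, c):
    """alpha(L) = q / (q + (1 - q) c(L))."""
    return q / (q + (1.0 - q) * c)

q, L = 0.11, 1.0
for s in (0.5, 2.0, 10.0, 50.0):
    print(s, lefty_posterior(s, L, q, laplace_log_pdf), lefty_posterior(s, L, q, normal_log_pdf))
print("alpha(L) for Laplace innate skills with b = 1:", alpha(L, q, math.exp(-L / 1.0)))
```

The Laplace column stops changing once $s\ge L$, which is exactly the finite, non-trivial tail-length behaviour that Proposition 1 relies on; the normal column keeps creeping towards 1.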

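$$
\begin{aligned}
\epsilon &> \left|p(H_{1:n}=h_{1:n}\mid S_{1:n}=s_{1:n},L)-\prod_{i=1}^n\alpha(L)^{h_i}(1-\alpha(L))^{1-h_i}\right|\\
&=\left|p(H_{1:n}=h_{1:n}\mid S_{1:n}=s_{1:n},L)-\alpha(L)^{n_l}(1-\alpha(L))^{n_r}\right|.
\end{aligned}\tag{40}
$$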

We are now in a position to prove that

$$\lim_{N\to\infty}\int_{\mathbb{R}^n}p(H_{1:n}=h_{1:n}\mid S_{1:n},L)\,p(S_{1:n}\mid \mathrm{Top}_{n,N},L)\,dS_{1:n}=\alpha(L)^{n_l}(1-\alpha(L))^{n_r}.\tag{41}$$

For any $\epsilon>0$ and all $N>N_{k_{\epsilon/3},L,\epsilon/3}$ we have

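$$
\begin{aligned}
&\left|\alpha(L)^{n_l}(1-\alpha(L))^{n_r}-\int_{\mathbb{R}^n}p(H_{1:n}=h_{1:n}\mid S_{1:n},L)\,p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}\right|\\
&\quad\le\left|\alpha(L)^{n_l}(1-\alpha(L))^{n_r}-\int_{S_{[n]}\ge k_{\epsilon/3}}p(H_{1:n}=h_{1:n}\mid S_{1:n},L)\,p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}\right|\\
&\qquad+\int_{S_{[n]}\le k_{\epsilon/3}}p(H_{1:n}=h_{1:n}\mid S_{1:n},L)\,p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}.
\end{aligned}\tag{42}
$$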

Observe that

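$$
\begin{aligned}
\int_{S_{[n]}\ge k_{\epsilon/3}}p(H_{1:n}=h_{1:n}\mid S_{1:n},L)\,p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}
&\le\int_{S_{[n]}\ge k_{\epsilon/3}}\left(\alpha(L)^{n_l}(1-\alpha(L))^{n_r}+\frac{\epsilon}{3}\right)p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}\\
&\le\alpha(L)^{n_l}(1-\alpha(L))^{n_r}+\frac{\epsilon}{3}
\end{aligned}\tag{43}
$$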

where the first inequality follows from (40). Similarly, the same integral is bounded below by

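$$
\begin{aligned}
\int_{S_{[n]}\ge k_{\epsilon/3}}p(H_{1:n}=h_{1:n}\mid S_{1:n},L)\,p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}
&\ge\int_{S_{[n]}\ge k_{\epsilon/3}}\left(\alpha(L)^{n_l}(1-\alpha(L))^{n_r}-\frac{\epsilon}{3}\right)p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}\\
&=\left(\alpha(L)^{n_l}(1-\alpha(L))^{n_r}-\frac{\epsilon}{3}\right)\left(1-\int_{S_{[n]}\le k_{\epsilon/3}}p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}\right)\\
&\ge\left(\alpha(L)^{n_l}(1-\alpha(L))^{n_r}-\frac{\epsilon}{3}\right)\left(1-\frac{\epsilon}{3}\right)\\
&\ge\left(\alpha(L)^{n_l}(1-\alpha(L))^{n_r}-\frac{\epsilon}{3}\right)-\frac{\epsilon}{3}\left(1-\frac{\epsilon}{3}\right)\\
&\ge\alpha(L)^{n_l}(1-\alpha(L))^{n_r}-\frac{2}{3}\epsilon
\end{aligned}\tag{44}
$$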

where the first inequality follows from (40) and the second from (37). Combining the upper and lower bounds in (43) and (44) implies that the first term on the r.h.s. of (42) is bounded above by $2\epsilon/3$. The second term on the r.h.s. of (42) satisfies

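$$\int_{S_{[n]}\le k_{\epsilon/3}}p(H_{1:n}=h_{1:n}\mid S_{1:n},L)\,p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}\le\int_{S_{[n]}\le k_{\epsilon/3}}p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}\le\frac{\epsilon}{3}.$$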

We can therefore rewrite the bound in (42) as

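$$\left|\alpha(L)^{n_l}(1-\alpha(L))^{n_r}-\int_{\mathbb{R}^n}p(H_{1:n}=h_{1:n}\mid S_{1:n},L)\,p(S_{1:n}\mid\mathrm{Top}_{n,N},L)\,dS_{1:n}\right|\le\frac{2}{3}\epsilon+\frac{1}{3}\epsilon=\epsilon,$$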

completing the proof of (41). Substituting (41) into (36) and (36) into (35) yields

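$$\lim_{N\to\infty}p(n_l\mid L;n,N)=\sum_{\substack{h_{1:n}\in\{0,1\}^n:\\ \sum_{i=1}^n h_i=n_l}}\alpha(L)^{n_l}(1-\alpha(L))^{n_r}=\binom{n}{n_l}\alpha(L)^{n_l}(1-\alpha(L))^{n_r},$$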

so the limiting distribution is $\mathrm{Binomial}(n_l;\alpha(L),n)$ as desired. □
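Proposition 1 can be illustrated by simulation. The NumPy sketch below draws $N$ players, makes each one a lefty with probability $q$, adds $L$ to the lefties’ Laplace$(0,b)$ innate skills and records the lefty share among the top $n$. For Laplace innate skills $c(L)=e^{-L/b}$, so for large $N$ the average share should be close to $\alpha(L)=q/(q+(1-q)e^{-L/b})$. All parameter values and function names are illustrative assumptions, not estimates from the paper.

```python
import numpy as np

def alpha_laplace(L, q, b):
    """alpha(L) = q / (q + (1 - q) c(L)) with c(L) = exp(-L / b) for Laplace(0, b) innate skills."""
    return q / (q + (1.0 - q) * np.exp(-L / b))

def top_lefty_share(L, q, b, N, n, reps=50, seed=0):
    """Average lefty share among the top n of N simulated players, over `reps` replications."""
    rng = np.random.default_rng(seed)
    shares = np.empty(reps)
    for r in range(reps):
        lefty = rng.random(N) < q                          # H_i = 1 with probability q
        total = rng.laplace(0.0, b, size=N) + L * lefty    # total skill = innate skill + L * H_i
        top = np.argpartition(total, -n)[-n:]              # indices of the n highest total skills
        shares[r] = lefty[top].mean()
    return float(shares.mean())

L, q, b = 1.0, 0.11, 1.0
print(top_lefty_share(L, q, b, N=200_000, n=100), alpha_laplace(L, q, b))
```

With these settings every top-100 skill lies far above $L$, where $p(H_i=1\mid S_i,L)$ equals $\alpha(L)$ exactly for Laplace skills, so the simulated share differs from $\alpha(L)$ only by binomial noise.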

## A.2 Rate of convergence for probability of left-handed top player given normal innate skills

Throughout this subsection we use $\phi(\cdot)$ and $\Phi(\cdot)$ to denote the PDF and CDF, respectively, of a standard normal random variable. We also assume that the advantage of left-handedness, $L$, is a strictly positive and known constant. The notation $S_{[1]}$ indicates the maximum of $N$ IID variables $S_1,\dots,S_N$, with $H_{[1]}$ the induced order statistic corresponding to $S_{[1]}$ (see David and Nagaraja 2003, Ch. 6.8, for more background on induced order statistics).

#### Proposition 2

If the innate skills are IID standard normal and the advantage of left-handedness, $L$, is strictly positive, then $p(H_{[1]}=0\mid L)=\Omega\big(\exp(-L\sqrt{2\log N})\big)$.

#### Proof

Since $L>0$ is assumed known, we will typically not condition on it explicitly in the arguments below; in particular, the expectations below are never taken over $L$. We begin by observing that

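$$
\begin{aligned}
p(H_{[1]}=0\mid L)&=\mathbb{E}_{S_{[1]}}\left[p(H_{[1]}=0\mid S_{[1]})\right]=\mathbb{E}\left[\frac{(1-q)\,\phi(S_{[1]})}{q\,\phi(S_{[1]}-L)+(1-q)\,\phi(S_{[1]})}\right]\\
&=\mathbb{E}\left[\frac{1}{\exp(S_{[1]}L)\exp(-L^2/2)\frac{q}{1-q}+1}\right]
\end{aligned}\tag{45}
$$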

where the second equality follows from precisely the same argument that we used to derive (39). Lemma 1 below implies that we can replace $S_{[1]}$ in (45) by $X_{[1]}+L$, where $X_{[1]}$ is the maximum of $N$ IID standard normal random variables, provided the equality is replaced by a greater-than-or-equal-to inequality. We therefore obtain

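$$p(H_{[1]}=0\mid L)\ge\mathbb{E}\left[\frac{1}{\exp\big((X_{[1]}+L)L\big)\exp(-L^2/2)\frac{q}{1-q}+1}\right]=\mathbb{E}\left[\frac{1}{\exp(LX_{[1]})/(2c_1)+1}\right]\tag{46}$$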

where $c_1:=\exp(-L^2/2)\,\frac{1-q}{2q}>0$. A little algebra shows that the denominator in (46) is less than or equal to $\exp(LX_{[1]})/c_1$ if $X_{[1]}\ge c_2:=\ln(2c_1)/L$, and is less than 2 otherwise. Thus we have

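$$p(H_{[1]}=0\mid L)\ge\mathbb{E}\left[\frac{1}{\exp(LX_{[1]})/c_1}\,\mathbb{I}\{X_{[1]}\ge c_2\}+\frac{1}{2}\,\mathbb{I}\{X_{[1]}<c_2\}\right].\tag{47}$$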

We can now complete the proof by applying Lemmas 2 and 3 below, beginning with (47). Specifically, we have

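$$
\begin{aligned}
p(H_{[1]}=0\mid L)&\ge c_1\,\mathbb{E}\left[\exp(-LX_{[1]})\,\mathbb{I}\{X_{[1]}\ge c_2\}\right]\\
&\ge c_1\,\mathbb{E}\left[\exp(-LX_{[1]})\right]-c_1c_3N\Phi(c_2)^{N-1}\\
&\ge c_1\exp\big(-L\,\mathbb{E}[X_{[1]}]\big)-c_1c_3N\Phi(c_2)^{N-1}\\
&\ge c_1\exp\big(-L\sqrt{2\log N}\big)-c_1c_3N\Phi(c_2)^{N-1}\\
&=\Omega\big(\exp(-L\sqrt{2\log N})\big)
\end{aligned}
$$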

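where the first inequality drops the non-negative second term in (47), the second follows from Lemma 2 (with $c=c_2$), the third from Jensen’s inequality, the fourth from Lemma 3 (a standard result that bounds the expected maximum of IID standard normal random variables), and $c_3:=\mathbb{E}[\exp(-LX)\,\mathbb{I}\{X\le c_2\}]$. The final equality holds because $\Phi(c_2)<1$, so $N\Phi(c_2)^{N-1}$ decays exponentially in $N$ and is eventually negligible relative to the first term. □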
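The slow decay in Proposition 2 can be seen numerically. The sketch below computes $p(H_{[1]}=0\mid L)=\int N(1-q)\,\phi(s)\,F(s)^{N-1}\,ds$, with $F(s)=q\Phi(s-L)+(1-q)\Phi(s)$ (the probability that the top player is one of the right-handers), by the trapezoid rule on a grid, and compares it with the explicit bound $c_1e^{-L\sqrt{2\log N}}-c_1c_3N\Phi(c_2)^{N-1}$ from the proof. It uses $c_3=e^{L^2/2}\Phi(c_2+L)$, which follows by completing the square in $\mathbb{E}[e^{-LX}\mathbb{I}\{X\le c_2\}]$. The integral representation, the grid limits and the parameter values $L=0.5$, $q=0.11$ are our illustrative choices.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)

def norm_cdf(x):
    """Standard normal CDF, Phi(x) = erfc(-x / sqrt(2)) / 2 (accurate in both tails)."""
    return 0.5 * _erfc(-np.asarray(x, dtype=float) / math.sqrt(2.0))

def norm_pdf(x):
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)

def prob_top_is_righty(N, L, q, lo=-10.0, hi=15.0, points=50_001):
    """p(H_[1] = 0 | L) = int N (1-q) phi(s) F(s)^(N-1) ds; requires N >= 2."""
    s = np.linspace(lo, hi, points)
    upper_tail = q * norm_cdf(L - s) + (1.0 - q) * norm_cdf(-s)     # 1 - F(s)
    upper_tail = np.clip(upper_tail, 0.0, 1.0)                       # guard against rounding above 1
    with np.errstate(divide="ignore"):
        log_F = np.log1p(-upper_tail)                                # -inf where F(s) rounds to 0
    integrand = N * (1.0 - q) * norm_pdf(s) * np.exp((N - 1) * log_F)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s)))  # trapezoid rule

def prop2_lower_bound(N, L, q):
    """c1 exp(-L sqrt(2 log N)) - c1 c3 N Phi(c2)^(N-1), with c1, c2, c3 as in the proof."""
    c1 = math.exp(-L**2 / 2.0) * (1.0 - q) / (2.0 * q)
    c2 = math.log(2.0 * c1) / L
    c3 = math.exp(L**2 / 2.0) * float(norm_cdf(c2 + L))            # E[exp(-LX) 1{X <= c2}]
    return c1 * math.exp(-L * math.sqrt(2.0 * math.log(N))) - c1 * c3 * N * float(norm_cdf(c2)) ** (N - 1)

for N in (100, 10_000, 1_000_000):
    print(N, prob_top_is_righty(N, 0.5, 0.11), prop2_lower_bound(N, 0.5, 0.11))
```

For small $N$ the bound is negative (and hence uninformative) because of the second term, but for large $N$ it becomes a positive floor, while the exact probability that the best player is right-handed shrinks only gradually as $N$ grows.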

As stated earlier, in each of the following lemmas it is assumed that $X_1,\dots,X_N$ are IID standard normal and $X_{[1]}$ is their maximum.

#### Lemma 1

Define the function

$$d(s):=\left(\exp(sL)\exp(-L^2/2)\frac{q}{1-q}+1\right)^{-1}.$$

If the advantage of left-handedness, $L$, is strictly positive, then $\mathbb{E}[d(S_{[1]})]\ge\mathbb{E}[d(X_{[1]}+L)]$.

#### Proof

We first recall that the CDF of the total skill, $S$, is given by $F(s)=q\Phi(s-L)+(1-q)\Phi(s)$, and that $F(s)\ge\Phi(s-L)$ for all $s$ if $L>0$. We then obtain

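$$
\begin{aligned}
\mathbb{E}[d(S_{[1]})]-\mathbb{E}[d(X_{[1]}+L)]&=\int_{-\infty}^{\infty}d(s)\,NF(s)^{N-1}f(s)\,ds-\int_{-\infty}^{\infty}d(x+L)\,N\Phi(x)^{N-1}\phi(x)\,dx\\
&=\int_{-\infty}^{\infty}d(s)\,NF(s)^{N-1}f(s)\,ds-\int_{-\infty}^{\infty}d(s)\,N\Phi(s-L)^{N-1}\phi(s-L)\,ds\\
&=\int_{-\infty}^{\infty}d(s)\left[NF(s)^{N-1}f(s)-N\Phi(s-L)^{N-1}\phi(s-L)\right]ds\\
&=\underbrace{d(s)\left[F(s)^N-\Phi(s-L)^N\right]\Big|_{-\infty}^{\infty}}_{=0}-\int_{-\infty}^{\infty}\underbrace{\frac{\partial d(s)}{\partial s}}_{<0}\underbrace{\left[F(s)^N-\Phi(s-L)^N\right]}_{>0}ds>0
\end{aligned}
$$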

where the final equality follows from integration by parts (the boundary term vanishes because $d$ is bounded and $F(s)^N-\Phi(s-L)^N\to0$ as $s\to\pm\infty$), and the final inequality follows because for any $L>0$ we have (i) $d(s)$ is a strictly decreasing function of $s$ and (ii) $F(s)^N>\Phi(s-L)^N$. □

#### Lemma 2

For any constants $L$ and $c$, we have

$$\mathbb{E}\left[\exp(-LX_{[1]})\,\mathbb{I}\{X_{[1]}\ge c\}\right]\ge\mathbb{E}\left[\exp(-LX_{[1]})\right]-aN\Phi(c)^{N-1}$$

where $a:=\mathbb{E}[\exp(-LX)\,\mathbb{I}\{X\le c\}]$ and $X$ is a standard normal random variable.

#### Proof

We first note that $\Phi(x)\le\Phi(c)$ for all $x<c$, from which it immediately follows that

$$\Phi(x)^{N-1}N\phi(x)\exp(-Lx)\le\Phi(c)^{N-1}N\phi(x)\exp(-Lx).\tag{48}$$

Integrating both sides of (48) w.r.t. $x$ from $-\infty$ to $c$ yields

$$\int_{-\infty}^{c}\Phi(x)^{N-1}N\phi(x)\exp(-Lx)\,dx\le\int_{-\infty}^{c}\Phi(c)^{N-1}N\phi(x)\exp(-Lx)\,dx$$

from which we obtain

$$\mathbb{E}\left[\exp(-LX_{[1]})\,\mathbb{I}\{X_{[1]}\le c\}\right]\le N\Phi(c)^{N-1}\,\mathbb{E}\left[\exp(-LX)\,\mathbb{I}\{X\le c\}\right]\tag{49}$$

where $X$ is a standard normal random variable. The statement of the lemma now follows by noting that

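$$\mathbb{E}\left[\exp(-LX_{[1]})\,\mathbb{I}\{X_{[1]}\ge c\}\right]=\mathbb{E}\left[\exp(-LX_{[1]})\right]-\mathbb{E}\left[\exp(-LX_{[1]})\,\mathbb{I}\{X_{[1]}<c\}\right]\ge\mathbb{E}\left[\exp(-LX_{[1]})\right]-aN\Phi(c)^{N-1}$$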

where the inequality follows from (49). □

The following Lemma is well known but we include it here for the sake of completeness.

#### Lemma 3

$\mathbb{E}[X_{[1]}]\le\sqrt{2\log N}$.

#### Proof

The proof follows from a simple application of Jensen’s inequality and the fact that the sum of non-negative variables is at least as large as their maximum. For any constant $\beta>0$ we have

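$$
\begin{aligned}
\mathbb{E}[X_{[1]}]&=\frac{1}{\beta}\log\Big(\exp\big(\mathbb{E}[\beta X_{[1]}]\big)\Big)\le\frac{1}{\beta}\log\Big(\mathbb{E}\big[\exp(\beta X_{[1]})\big]\Big)\le\frac{1}{\beta}\log\Big(\mathbb{E}\Big[\sum_{i=1}^N\exp(\beta X_i)\Big]\Big)\\
&=\frac{1}{\beta}\log\big(N\,\mathbb{E}[\exp(\beta X)]\big)=\frac{1}{\beta}\log\big(N\exp(\beta^2/2)\big)\le\sqrt{2\log N},
\end{aligned}
$$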

where we chose $\beta=\sqrt{2\log N}$ to obtain the final inequality. □
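A quick numerical check of Lemma 3 and of the choice of $\beta$: the bound $(\log N+\beta^2/2)/\beta$ from the proof is minimized at $\beta=\sqrt{2\log N}$, where it equals $\sqrt{2\log N}$, and a Monte Carlo estimate of $\mathbb{E}[X_{[1]}]$ should fall below it. The sample size, number of replications and seed are arbitrary illustrative choices.

```python
import math
import numpy as np

def beta_bound(N, beta):
    """(1/beta) log(N exp(beta^2 / 2)): the bound on E[X_[1]] from the proof, for beta > 0."""
    return (math.log(N) + beta**2 / 2.0) / beta

def mc_mean_max(N, reps=2000, seed=0):
    """Monte Carlo estimate of E[X_[1]] for the maximum of N IID standard normals."""
    rng = np.random.default_rng(seed)
    return float(rng.standard_normal((reps, N)).max(axis=1).mean())

N = 1000
beta_star = math.sqrt(2.0 * math.log(N))
print(mc_mean_max(N), beta_bound(N, beta_star), beta_bound(N, 0.5 * beta_star))
```

The gap between the Monte Carlo estimate and $\sqrt{2\log N}$ is not small for moderate $N$, which is one reason the rate in Proposition 2 is only a lower bound.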

## A.3 Difference in skills at given quantiles with Laplace distributed innate skills

#### Proposition 3

If the innate skills are IID Laplace distributed with mean 0 and scale parameter $b>0$, then for any fixed finite $L$ and sufficiently small quantiles $\lambda_i,\lambda_j\in(0,1)$

$$\lim_{N\to\infty}\left(S_{[N\lambda_j]}-S_{[N\lambda_i]}\right)=b\log(\lambda_i/\lambda_j)$$

where the convergence is in probability and we use $S_{[N\lambda_i]}$ to denote the $[N\lambda_i]$th order statistic²¹ (counting from the largest) of the skills $S_1,\dots,S_N$.

#### Proof

Since the innate skills are Laplace distributed with mean 0 and scale $b>0$, they have the CDF

$$G(S)=\begin{cases}\frac{1}{2}\exp(S/b) & \text{if } S<0,\\[2pt] 1-\frac{1}{2}\exp(-S/b) & \text{if } S\ge0,\end{cases}\tag{50}$$

while the total skill of a player drawn from the mixed population of left- and right-handers has CDF $F(S\mid L)=qG(S-L)+(1-q)G(S)$. For skills $S\ge\max\{L,0\}$ it therefore follows from (50) that

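$$
\begin{aligned}
F(S\mid L)&=q\left(1-\frac{1}{2}\exp\big(-(S-L)/b\big)\right)+(1-q)\left(1-\frac{1}{2}\exp(-S/b)\right)\\
&=1-\frac{1}{2}\exp(-S/b)\left[q\exp(L/b)+1-q\right].
\end{aligned}\tag{51}
$$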

Since the Laplace distribution has support $\mathbb{R}$, for any fixed finite $L$ we have $\lim_{\lambda\downarrow0}F^{-1}(1-\lambda\mid L)=\infty$. Thus for sufficiently small quantiles $\lambda\in(0,1)$ we have $F^{-1}(1-\lambda\mid L)>\max\{L,0\}$. Consider such a value of $\lambda$. Since $F(F^{-1}(1-\lambda\mid L)\mid L)=1-\lambda$, it follows that

$$\lambda=1-F\big(F^{-1}(1-\lambda\mid L)\mid L\big)=\frac{1}{2}\exp\big(-F^{-1}(1-\lambda\mid L)/b\big)\left[q\exp(L/b)+1-q\right]\tag{52}$$

where the second equality follows from (51) with $S=F^{-1}(1-\lambda\mid L)$. Solving (52) for $F^{-1}(1-\lambda\mid L)$ now yields

$$F^{-1}(1-\lambda\mid L)=-b\log(\lambda)+b\log\big([q\exp(L/b)+1-q]/2\big).\tag{53}$$

From David and Nagaraja (2003, p. 288) we know that

$$\lim_{N\to\infty}\left(S_{[N\lambda_j]}-S_{[N\lambda_i]}\right)=F^{-1}(1-\lambda_j\mid L)-F^{-1}(1-\lambda_i\mid L)\tag{54}$$

where convergence is understood to be in probability. Substituting (53) into (54) then yields (for $\lambda_i$ and $\lambda_j$ sufficiently small)

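$$
\begin{aligned}
\lim_{N\to\infty}\left(S_{[N\lambda_j]}-S_{[N\lambda_i]}\right)&=-b\log(\lambda_j)+b\log\big([q\exp(L/b)+1-q]/2\big)-\Big(-b\log(\lambda_i)+b\log\big([q\exp(L/b)+1-q]/2\big)\Big)\\
&=b\log(\lambda_i/\lambda_j)
\end{aligned}
$$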

as claimed. □
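Proposition 3 can be checked in two ways with the NumPy sketch below: the closed-form upper quantile (53) should invert the mixture CDF built from (50), and the simulated gap between the $[N\lambda_j]$th and $[N\lambda_i]$th largest skills should be close to $b\log(\lambda_i/\lambda_j)$ whatever the value of $L$. Function names, the seed and the parameter values ($L=1$, $q=0.11$, $b=1$, $\lambda_i=10^{-3}$, $\lambda_j=10^{-4}$) are illustrative assumptions.

```python
import numpy as np

def laplace_cdf(x, b):
    """Laplace(0, b) CDF G from (50); min/max guards keep exp from overflowing in the unused branch."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.5 * np.exp(np.minimum(x, 0.0) / b), 1.0 - 0.5 * np.exp(-np.maximum(x, 0.0) / b))

def mixture_cdf(s, L, q, b):
    """F(s | L) = q G(s - L) + (1 - q) G(s)."""
    return q * laplace_cdf(s - L, b) + (1.0 - q) * laplace_cdf(s, b)

def upper_quantile(lam, L, q, b):
    """F^{-1}(1 - lam | L) from (53); valid only when the result exceeds max(L, 0)."""
    return -b * np.log(lam) + b * np.log((q * np.exp(L / b) + 1.0 - q) / 2.0)

def simulated_gap(N, lam_i, lam_j, L, q, b, seed=0):
    """S_[N lam_j] - S_[N lam_i], where S_[k] is the k-th largest simulated total skill."""
    rng = np.random.default_rng(seed)
    lefty = rng.random(N) < q
    skills = rng.laplace(0.0, b, size=N) + L * lefty
    desc = np.sort(skills)[::-1]                       # desc[k - 1] is the k-th largest skill
    k_i, k_j = round(N * lam_i), round(N * lam_j)
    return float(desc[k_j - 1] - desc[k_i - 1])

s = upper_quantile(1e-3, 1.0, 0.11, 1.0)
print(s, mixture_cdf(s, 1.0, 0.11, 1.0))
print(simulated_gap(1_000_000, 1e-3, 1e-4, 1.0, 0.11, 1.0), np.log(10.0))
```

The quantity $L$ shifts both quantiles by the same amount in (53), which is why it drops out of the gap; this is what makes quantile gaps uninformative about $L$ under Laplace innate skills.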

## A.4 Estimating the Kalman filtering smoothing parameter

Here we explain how to compute the MLE for $\sigma_K^2$, as discussed in Section 6, where we developed a Kalman filter/smoother for estimating the lefty advantage $L_t$ through time. The likelihood of the observed handedness data over the time interval $t=1,\dots,T$ is given in (33) and satisfies

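$$p(n^l_{1:T};n_{1:T},\sigma_K^2)\;\tilde{\propto}\;\int_{\mathbb{R}^{T+1}}\mathcal{N}(L_0;0,\theta^2)\prod_{t=1}^T\mathcal{N}(L_t;L_{t-1},\sigma_K^2)\,\mathcal{N}(L_t;L_t^*,\sigma_t^2)\,dL_{0:T}$$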

where $\sigma_t^2:=b^2\,\frac{n_t}{n_t^r n_t^l}$. To perform MLE over $\sigma_K$ we first simplify the likelihood, keeping only factors involving $\sigma_K$.

We therefore obtain

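$$
\begin{aligned}
p(n^l_{1:T};n_{1:T},\sigma_K^2)&\;\tilde{\propto}\;\int_{\mathbb{R}^{T+1}}\frac{1}{\sqrt{2\pi}\,\theta}\exp\left(-\frac{L_0^2}{2\theta^2}\right)\prod_{t=1}^T\frac{1}{\sqrt{2\pi}\,\sigma_K}\exp\left(-\frac{(L_t-L_{t-1})^2}{2\sigma_K^2}\right)\frac{1}{\sqrt{2\pi}\,\sigma_t}\exp\left(-\frac{(L_t-L_t^*)^2}{2\sigma_t^2}\right)dL_{0:T}\\
&\propto\sigma_K^{-T}\int_{\mathbb{R}^{T+1}}\exp\left(-\frac{L_0^2}{2\theta^2}-\sum_{t=1}^T\left[\frac{(L_t-L_{t-1})^2}{2\sigma_K^2}+\frac{(L_t-L_t^*)^2}{2\sigma_t^2}\right]\right)dL_{0:T}\\
&\propto\sigma_K^{-T}\int_{\mathbb{R}^{T+1}}\exp\left(-\frac{1}{2}\left(\frac{L_0^2}{\theta^2}+\sum_{t=1}^T\left[\frac{L_t^2-2L_tL_{t-1}+L_{t-1}^2}{\sigma_K^2}+\frac{L_t^2-2L_tL_t^*}{\sigma_t^2}\right]\right)\right)dL_{0:T}\\
&=\sigma_K^{-T}\int_{\mathbb{R}^{T+1}}\exp\left(-\frac{1}{2}\left(L_0^2(\theta^{-2}+\sigma_K^{-2})-L_T^2\sigma_K^{-2}+\sum_{t=1}^T\left[L_t^2(2\sigma_K^{-2}+\sigma_t^{-2})-2\sigma_K^{-2}L_tL_{t-1}-2\sigma_t^{-2}L_tL_t^*\right]\right)\right)dL_{0:T}\\
&=\sigma_K^{-T}\int_{\mathbb{R}^{T+1}}\exp\left(-\frac{1}{2}L_{0:T}^\top\Sigma^{-1}L_{0:T}+v^\top L_{0:T}\right)dL_{0:T}
\end{aligned}\tag{55}
$$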

where $\Sigma^{-1}\in\mathbb{R}^{(T+1)\times(T+1)}$ is a symmetric tridiagonal positive definite matrix with entries

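$$\Sigma^{-1}_{0,0}=\theta^{-2}+\sigma_K^{-2},\qquad\Sigma^{-1}_{t,t}=2\sigma_K^{-2}+\sigma_t^{-2}\ \text{ for } 0<t<T,\qquad\Sigma^{-1}_{T,T}=\sigma_K^{-2}+\sigma_T^{-2},\qquad\Sigma^{-1}_{t,t-1}=\Sigma^{-1}_{t-1,t}=-\sigma_K^{-2}\ \text{ for } 0<t\le T$$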

and $v\in\mathbb{R}^{T+1}$ is a vector with entries $v_0=0$ and $v_t=L_t^*\sigma_t^{-2}$ for $t=1,\dots,T$. We can integrate out $L_{0:T}$ by completing the square and appropriately normalizing the Gaussian exponential in (55). In particular, we have

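$$
\begin{aligned}
p(n^l_{1:T};n_{1:T},\sigma_K^2)&\;\tilde{\propto}\;\sigma_K^{-T}\int_{\mathbb{R}^{T+1}}\exp\left(-\frac{1}{2}(L_{0:T}-\Sigma v)^\top\Sigma^{-1}(L_{0:T}-\Sigma v)+\frac{1}{2}v^\top\Sigma v\right)dL_{0:T}\\
&\propto\sigma_K^{-T}\exp\left(\frac{1}{2}v^\top\Sigma v\right)\sqrt{|\Sigma|}.
\end{aligned}\tag{56}
$$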

The expression in (56) can then be evaluated numerically for any value of $\sigma_K$. (Note that $\Sigma$, and therefore the determinant $|\Sigma|$ in (56), depends on $\sigma_K$, so an explicit solution for the MLE of $\sigma_K$ is unlikely to be available.)
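A minimal NumPy sketch of the numerical evaluation, assuming the reading of (56) with the Gaussian normalizing constant $\sqrt{|\Sigma|}$. `log_lik_precision` evaluates the log of (56) up to an additive constant by building the tridiagonal $\Sigma^{-1}$ densely (adequate for modest $T$); `log_lik_kalman` computes the exact log marginal likelihood of the pseudo-observations $L_t^*$ by the Kalman filter’s prediction-error decomposition. The two should differ only by a constant that does not depend on $\sigma_K$, which gives a useful check before maximizing over a grid of $\sigma_K$ values. Function names, the synthetic data and the grid are illustrative assumptions, not the paper’s code or data.

```python
import numpy as np

def log_lik_precision(sigma_K, L_star, sigma2, theta):
    """log of (56) up to an additive constant: -T log sigma_K + v' Sigma v / 2 + log|Sigma| / 2."""
    T = len(L_star)
    prec = np.zeros((T + 1, T + 1))                       # Sigma^{-1}, tridiagonal
    prec[0, 0] = theta**-2 + sigma_K**-2
    for t in range(1, T + 1):
        prec[t, t] = 2.0 * sigma_K**-2 + 1.0 / sigma2[t - 1]
        prec[t, t - 1] = prec[t - 1, t] = -sigma_K**-2
    prec[T, T] -= sigma_K**-2                             # L_T has only one random-walk neighbour
    v = np.concatenate(([0.0], np.asarray(L_star) / np.asarray(sigma2)))
    _, logdet_prec = np.linalg.slogdet(prec)              # log|Sigma^{-1}| = -log|Sigma|
    return -T * np.log(sigma_K) + 0.5 * v @ np.linalg.solve(prec, v) - 0.5 * logdet_prec

def log_lik_kalman(sigma_K, L_star, sigma2, theta):
    """Exact log marginal likelihood of the pseudo-observations via the Kalman filter."""
    m, P, ll = 0.0, theta**2, 0.0                         # prior L_0 ~ N(0, theta^2)
    for y, r in zip(L_star, sigma2):
        P = P + sigma_K**2                                # predict
        S = P + r                                         # innovation variance
        ll -= 0.5 * (np.log(2.0 * np.pi * S) + (y - m) ** 2 / S)
        K = P / S                                         # Kalman gain
        m, P = m + K * (y - m), (1.0 - K) * P             # update
    return ll

# Synthetic pseudo-observations, for illustration only.
rng = np.random.default_rng(1)
T = 30
L_star = np.cumsum(rng.normal(0.0, 0.05, T)) + rng.normal(0.0, 0.1, T)
sigma2 = np.full(T, 0.01)
grid = np.linspace(0.005, 0.3, 60)
ll = [log_lik_precision(s, L_star, sigma2, theta=1.0) for s in grid]
sigma_K_hat = grid[int(np.argmax(ll))]
print(sigma_K_hat)
```

For long series a banded solver or the Kalman form itself would avoid the dense $(T+1)\times(T+1)$ matrix; the dense version is used here only because it mirrors (55) and (56) line by line.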

## References

• Abrams, D. M. and M. J. Panaggio. 2012. “A Model Balancing Cooperation and Competition can Explain our Right-Handed World and the Dominance of Left-Handed Athletes.” Journal of the Royal Society Interface 9:2718–2722.

• Aggleton, J. P. and C. J. Wood. 1990. “Is There a Left-Handed Advantage in ‘Ballistic’ Sports?” International Journal of Sport Psychology 21:46–57.

• Akpinar, S., R. L. Sainburg, S. Kirazci, and A. Przybyla. 2015. “Motor Asymmetry in Elite Fencers.” Journal of Motor Behavior, 47:302–311.

• Bačić, B. and A. H. Gazala. 2016. “Left-Handed Representation in top 100 Male Professional Tennis Players: Multi-Disciplinary Perspectives.” http://tmg.aut.ac.nz/tmnz2016/papers/Boris2016.pdf, accessed: 2017-06-15.

• Barber, D. 2012. Bayesian Reasoning and Machine Learning. New York, NY, USA: Cambridge University Press.

• Billiard, S., C. Faurie, and M. Raymond. 2005. “Maintenance of Handedness Polymorphism in Humans: A Frequency-Dependent Selection Model.” Journal of Theoretical Biology 235:85–93.

• Bisiacchi, P. S., H. Ripoll, J. F. Stein, P. Simonet, and G. Azemar. 1985. “Left-Handedness in Fencers: An Attentional Advantage?” Perceptual and Motor Skills 61:507–513.

• Bradley, R. A. and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39:324–345. http://www.jstor.org/stable/2334029

• Breznik, K. 2013. “On the Gender Effects of Handedness in Professional Tennis.” Journal of Sports Science & Medicine 12:346.

• Cui, Y., M.-Á. Gómez, B. Gonçalves, H. Liu, and J. Sampaio. 2017a. “Effects of Experience and Relative Quality in Tennis Match Performance During Four Grand Slams.” International Journal of Performance Analysis in Sport 17:783–801.

• Cui, Y., M.-Á. Gómez, B. Gonçalves, and J. Sampaio. 2017b. “Identifying Different Tennis Player Types: An Exploratory Approach to Interpret Performance Based on Player Features.” in Complex Systems in Sport, International Congress Linking Theory and Practice 97.

• Daems, A. and K. Verfaillie. 1999. “Viewpoint-Dependent Priming Effects in the Perception of Human Actions and Body Postures.” Visual Cognition 6:665–693.

• David, H. A. and H. N. Nagaraja. 2003. Order Statistics. 3rd ed. Hoboken, N.J: Wiley-Interscience.

• Del Corral, J. and J. Prieto-Rodríguez. 2010. “Are Differences in Ranks Good Predictors for Grand Slam Tennis Matches?” International Journal of Forecasting 26:551–563.

• Dochtermann, N. A., C. Gienger, and S. Zappettini. 2014. “Born to Win? Maybe, but Perhaps only Against Inferior Competition.” Animal Behaviour 96:e1–e3.

• Elo, A. E. 1978. The Rating of Chessplayers, Past and Present. Batsford: Arco Pub.

• Fischer, G. H. and I. W. Molenaar. 2012. Rasch Models: Foundations, Recent Developments, and Applications. Springer-Verlag, NY, USA: Springer Science & Business Media.

• Flatt, A. E. 2008. “Is Being Left-Handed a Handicap? The Short and Useless Answer is ‘Yes and No.’” Proceedings of Baylor University Medical Center 21:304–307.

• Gelman, A. and D. B. Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7:457–472.

• Geschwind, N. and A. M. Galaburda. 1985. “Cerebral Lateralization: Biological Mechanisms, Associations, and Pathology: I. A Hypothesis and a Program for Research.” Archives of Neurology 42:428–459.

• Ghirlanda, S., E. Frasnelli, and G. Vallortigara. 2009. “Intraspecific Competition and Coordination in the Evolution of Lateralization.” Philosophical Transactions of the Royal Society of London B: Biological Sciences 364:861–866.

• Glickman, M. E. 1999. “Parameter Estimation in Large Dynamic Paired Comparison Experiments.” Applied Statistics 48:377–394.

• Goldstein, S. R. and C. A. Young. 1996. “‘Evolutionary’ Stable Strategy of Handedness in Major League Baseball.” Journal of Comparative Psychology 110:164–169.

• Grossman, E., M. Donnelly, R. Price, D. Pickens, V. Morgan, G. Neighbor, and R. Blake. 2000. “Brain Areas Involved in Perception of Biological Motion.” Journal of Cognitive Neuroscience 12:711–720.

• Grouios, G., H. Tsorbatzoudis, K. Alexandris, and V. Barkoukis. 2000a. “Do Left-Handed Competitors have an Innate Superiority in Sports?” Perceptual and Motor Skills 90:1273–1282.

• Grouios, G., H. Tsorbatzoudis, K. Alexandris, and V. Barkoukis. 2000b. “Do Left-Handed Competitors have an Innate Superiority in Sports?” Perceptual and Motor Skills 90:1273–1282.

• Gursoy, R. 2009. “Effects of Left-or Right-Hand Preference on the Success of boxers in Turkey.” British Journal of Sports Medicine 43:142–144.

• Herbrich, R., T. Minka, and T. Graepel. 2007. “TrueSkill™: A Bayesian Skill Rating System.” in Advances in Neural Information Processing Systems, 569–576.

• Holtzen, D. W. 2000. “Handedness and Professional Tennis.” International Journal of Neuroscience 105:101–119.

• Liew, J. 2015. “Wimbledon 2015: Once they were Great – but where have all the Lefties Gone?”. The Telegraph, June 27, 2015. https://www.telegraph.co.uk/sport/tennis/wimbledon/11703777/Wimbledon-2015-Once-they-were-great-but-where-have-all-the-lefties-gone.html

• Linderman, S., M. Johnson, and R. P. Adams. 2015. “Dependent Multinomial Models Made Easy: Stick-Breaking with the Pólya-Gamma Augmentation.” in Advances in Neural Information Processing Systems, 3456–3464.

• Loffing, F. 2017. “Left-Handedness and Time Pressure in Elite Interactive Ball Games.” Biology Letters 13. DOI: 10.1098/rsbl.2017.0446.

• Loffing, F. and N. Hagemann. 2015. “Pushing Through Evolution? Incidence and Fight Records of Left-Oriented Fighters in Professional Boxing History.” Laterality: Asymmetries of Body, Brain and Cognition 20:270–286.

• Loffing, F. and N. Hagemann. 2016. “Chapter 12 – Performance Differences between Left- and Right-Sided Athletes in One-on-One Interactive Sports.” in Laterality in Sports, edited by F. Loffing, N. Hagemann, B. Strauss, and C. MacMahon, pp. 249–277. San Diego: Academic Press. https://www.sciencedirect.com/science/article/pii/B9780128014264000122

• Loffing, F., N. Hagemann, and B. Strauss. 2012a. “Left-Handedness in Professional and Amateur Tennis.” PLoS One 7. DOI: 10.1371/journal.pone.0049325.

• Loffing, F., N. Hagemann, and B. Strauss. 2012b. “Left-Handedness in Professional and Amateur Tennis.” PLoS One 7:e49325.

• Luce, R. D. 1959. Individual Choice Behavior: A Theoretical Analysis. Courier Corporation, New York: Wiley Publishing.

• Mosteller, F. 1951. “Remarks on the Method of Paired Comparisons: I. The Least Squares Solution Assuming Equal Standard Deviations and Equal Correlations.” Psychometrika 16:3–9. http://dx.doi.org/10.1007/BF02313422

• Murphy, K. P. 2012. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press.

• Murray, I. and R. P. Adams. 2010. “Slice Sampling Covariance Hyperparameters of Latent Gaussian Models.” in Advances in Neural Information Processing Systems, 1732–1740.

• Murray, I., R. P. Adams, and D. Mackay. 2010. “Elliptical Slice Sampling.” in International Conference on Artificial Intelligence and Statistics, 541–548.

• Nass, R. and M. Gazzaniga. 1987. “Cerebral Lateralization and Specialization in Human Central Nervous System.” Handbook of Physiology. Vol. 5. New York: Oxford University Press. https://doi.org/10.1002/cphy.cp010518

• Neal, R. M. 2011. “MCMC using Hamiltonian Dynamics.” Handbook of Markov Chain Monte Carlo 2:113–162.

• O’Donoghue, P. 2005. “Normative Profiles of Sports Performance.” International Journal of Performance Analysis in Sport 5:104–119.

• O’Donoghue, P. 2009. “Interacting Performances Theory.” International Journal of Performance Analysis in Sport 9:26–46.

• Raymond, M., D. Pontier, A.-B. Dufour, and A. P. Moller. 1996. “Frequency-Dependent Maintenance of Left Handedness in Humans.” Proceedings of the Royal Society of London B: Biological Sciences 263:1627–1633.

• Roberts, G. O., A. Gelman, and W. R. Gilks. 1997. “Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms.” The Annals of Applied Probability 7:110–120.

• Roberts, G. O. and J. S. Rosenthal. 2001. “Optimal Scaling for Various Metropolis-Hastings Algorithms.” Statistical Science 16:351–367.

• Schorer, J., F. Loffing, N. Hagemann, and J. Baker. 2012. “Human Handedness in Interactive Situations: Negative perceptual Frequency Effects Can be Reversed!” Journal of Sports Sciences 30:507–513.

• Stefani, R. T. 1997. “Survey of the Major World Sports Rating Systems.” Journal of Applied Statistics 24:635–646.

• Sterkowicz, S., G. Lech, and J. Blecharz. 2010. “Effects of Laterality on the Technical/Tactical Behavior in View of the Results of Judo Fights.” Archives of Budo 6:173–177.

• Stone, J. V. 1999. “Object Recognition: View-Specificity and Motion-Specificity.” Vision Research 39:4032–4044.

• Taddei, F., M. P. Viggiano, and L. Mecacci. 1991. “Pattern Reversal Visual Evoked Potentials in Fencers.” International Journal of Psychophysiology 11:257–260.

• Tennis Navigator. 2004. “Tennis Software – Tennis Navigator.” http://www.tennisnavigator.com/, accessed: 2015-02-03.

• Tennis Europe. 2015. “About Tennis Europe”. https://www.tenniseurope.org/page/12173, accessed: 2017-01-01.

• The Physical Activity Council. 2016. “2016 Participation Report The Physical Activity Council’s Annual Study Tracking Sports, Fitness, and Recreation Participation in the US.” https://cdn4.sportngin.com/attachments/document/0112/0253/2016_Physical_Activity_Council_Report.pdf

• Thurstone, L. L. 1927. “A Law of Comparative Judgment.” Psychological Review 34:273–286.

• Ullén, F., D. Z. Hambrick, and M. A. Mosing. 2016. “Rethinking Expertise: A Multifactorial Gene–Environment Interaction Model of Expert Performance.” Psychological Bulletin 142:427–446.

• Witelson, S. F. 1985. “The Brain Connection: The Corpus Callosum is Larger in Left-Handers.” Science 229:665–668.

• Wood, C. and J. Aggleton. 1989. “Handedness in Fast Ball Sports: Do Lefthanders have an Innate Advantage?” British Journal of Psychology 80:227–240.

• Yin, S. 2017. “Do Lefties have an Advantage in Sports? It Depends.” The New York Times. https://www.nytimes.com/2017/11/21/science/lefties-sports-advantage.html

## Footnotes

• 1

It should be noted, however, that most of these left-handed golfers play right-handed and so being left-handed and playing left-handed are not the same.

• 2

We do note, however, that there are other hypotheses for explaining the high proportion of lefties in elite sports. They include higher testosterone levels, personality traits, psychological advantages and early childhood selection. See (Loffing and Hagemann 2016) and the references therein.

• 3

See also the related literature on item response theory (IRT) from psychometrics and the Rasch model (Fischer and Molenaar 2012) which is closely related to the BTL model below.

• 4

Throughout this paper we will use the term “innate skill” to refer to all components of a player’s “skill” apart from the advantage/disadvantage associated with being left-handed. We acknowledge that the term “innate” may be quite misleading – see the discussion below – but will continue with it nonetheless for want of a better term.

• 5

We have in mind that N is the total number of players in the world who can play at a good amateur level or above. In tennis, for example, a good amateur level might be the level of varsity players or strong club players. N will obviously vary by sport but what we have in mind is that the player level should be good enough to take advantage of the left-handedness advantage (to the extent that it exists).

• 6

In defense of the normal distribution, we show in Appendix A.2 that a normally distributed $G$ may in fact be a suitable choice if $N$ is not too large. Although $p(H_1=1\mid L,S_1\ge\max\{S_{2:N}\})\to1$ as $N\to\infty$ for normally distributed skills, the rate of convergence is only $1-\Omega\big(\exp(-L\sqrt{2\log N})\big)$. This convergence is sufficiently slow for the normal skill distribution to be somewhat reasonable for moderate values of $N$. However, the result of Proposition 1 suggests that a normal skill distribution would be inappropriate for very large values of $N$.

• 7

We will use $b=1/2$ w.l.o.g. since alternative values of b could be accommodated via changes in σL and σM.

• 8

As a sanity check, suppose that $n_l/n=q$. Then it follows from (10) that $L^*=0$, as expected.

• 9

In each case, we normalized so that the function integrated to 1.

• 10

There is a slight abuse of notation here since we first need to round i to the nearest integer.

• 11

Note that the proportion of left-handers can vary from country to country for various reasons including cultural factors etc. It is therefore difficult to pin down exactly but a value ≈ 11% seems reasonable.

• 12

It is only through the act of conditioning on $\mathrm{Top}_{n,N}$ that we can infer something about the rankings of these players.

• 13

The value of λ = 2.38 is optimal under certain conditions; see Roberts and Rosenthal (2001). We also divide the covariance matrix by the dimension, n, to help counteract the curse of dimensionality, which makes proposing a good point more difficult as the dimension of the state-space increases.

• 14

In the numerical results of Section 5, the MH proposal distribution for L will be a Gaussian with mean equal to the current value of L and where the variance is set during a tuning phase to obtain an acceptance probability ≈ 0.234. (This value is theoretically optimal under certain conditions; see (Roberts, Gelman, and Gilks 1997)).

• 15

Otherwise there may be players that have played and won only one match who are impossible to meaningfully rank.

• 16

It is of interest to note that although Nadal plays left-handed and has done so from a very young age, he is in fact right-handed. Therefore to the extent that the ISH holds, then Nadal may not actually be benefitting from L. In contrast, to the extent the NFDS mechanism is responsible for the lefty advantage, then Nadal should be benefitting from L.

• 17

The 23-15 record is as of writing this article although it should be noted that Federer has won their last 5 encounters. It is also interesting to note that Nadal’s advantage is explained entirely by their clay-court results where he has a 13-2 head-to-head win/loss edge.

• 18

Recent techniques have been developed for MCMC on multinomial linear dynamical systems (Linderman, Johnson, and Adams 2015) that significantly improve the efficiency of inference in large state-space models. However, these methods will still be orders of magnitude slower than performing inference on the reduced state space containing only L.

• 19

This paper was also discussed in a recent New York Times article (Yin 2017) reflecting the general interest in the lefty advantage beyond academia.

• 20

See for example Cui et al. (2017a), Cui et al. (2017b) and O’Donoghue (2005), all of which discuss techniques for identifying different types of tennis players.

• 21

Here [⋅] denotes rounding to the nearest integer.

Accepted: 2018-08-15

Published Online: 2018-09-15

Published in Print: 2019-02-25

Citation Information: Journal of Quantitative Analysis in Sports, Volume 15, Issue 1, Pages 1–25, ISSN (Online) 1559-0410, ISSN (Print) 2194-6388,


©2019 Walter de Gruyter GmbH, Berlin/Boston.