Two symmetric and computationally efficient Gini correlations

Introduction
Measuring the strength of association and correlation between two random variables is of essential importance in many research fields. Many notions of correlation have been proposed and studied [16,21]. Perhaps the most commonly used one is Pearson's correlation coefficient, which measures the linear relationship between two random variables. Pearson's correlation is computationally efficient, with a computation cost of O(n), where n is the sample size. It is the most statistically efficient one for normal variables; however, it is very sensitive to outliers. Even a single outlier might have a large impact on the coefficient's value and its performance [36,37]. An important tool for studying robustness is the influence function, which measures effects due to infinitesimal perturbations of the underlying distribution [13]. It has been proven that the Pearson correlation has an unbounded influence function, indicating its lack of robustness [5].
Alternatively, rank-based correlations such as Spearman's correlation and Kendall's tau are robust to outliers. Kendall's tau is a similarity measure of the ranks of two random variables [17], and Spearman's correlation is the Pearson correlation coefficient evaluated on the ranks of the two variables [39]. Both are widely used for measuring monotonic relationships. They can be computed efficiently at a cost of O(n log n) [18], and their influence functions are bounded [3]. The tradeoff for robustness is a loss of statistical efficiency in normal settings. For three values of the correlation parameter ρ in the normal distribution, the asymptotic relative efficiencies (ARE) of Kendall's tau relative to the Pearson correlation are about 91%, 89% and 84%, respectively, while the AREs of the Spearman correlation are even lower [3].
Standard Gini correlations [1] are based on the covariance between one variable and the rank of the other. More specifically, let H be the joint distribution of the random variables X and Y, and let F and G be the marginal distribution functions of X and Y, respectively. The standard Gini correlations are defined as
$$\gamma_1 = \gamma(X, Y) := \frac{\mathrm{cov}(X, G(Y))}{\mathrm{cov}(X, F(X))} \quad\text{and}\quad \gamma_2 = \gamma(Y, X) := \frac{\mathrm{cov}(Y, F(X))}{\mathrm{cov}(Y, G(Y))}, \qquad (1)$$
reflecting the different roles of X and Y. The representation of the Gini correlations indicates that they have mixed properties of those of the Pearson and Spearman correlations [39]. As expected, the statistical efficiency and robustness of Gini correlations lie between those of the Pearson and Spearman correlations. In terms of the balance between efficiency and robustness, Gini correlations play an important role in measuring association for variables from heavy-tailed distributions [43]. The Gini correlations are computationally efficient and can be computed at a cost of O(n log n) [31]. They are not symmetric in X and Y in general [31,32], i.e., γ(X, Y) ≠ γ(Y, X).
In some applications, this asymmetry is natural and useful [9,12,33]. In other scenarios, symmetry is a desired property for dependence measures. Some researchers [21,27] even list symmetry as one of the axioms of association measures. A symmetric Gini correlation was proposed in [4,28], which is based on the joint rank function. It is more statistically efficient than the standard Gini correlations, but it is not computationally efficient: its $O(n^2)$ complexity is prohibitive for large n. Yitzhaki and Olkin [42] proposed two symmetric Gini correlations, which are the arithmetic mean and the geometric mean of the standard Gini correlations, respectively:
$$r_g^{(1)} = r_g^{(1)}(X, Y) := \frac{\gamma_1 + \gamma_2}{2} \quad\text{and}\quad r_g^{(2)} = r_g^{(2)}(X, Y) := \sqrt{|\gamma_1\gamma_2|}. \qquad (2)$$
Clearly, those symmetric Gini correlations inherit the computational efficiency of O(n log n). However, they have not been well studied in the literature, except that Xu et al. [41] studied one of them under normal settings. In this paper, we systematically study the properties of these two symmetric Gini correlations and explore their statistical efficiency. Their robustness is studied by means of their influence functions. The limiting distributions of the sample symmetric Gini correlations are established. It is interesting to see that there are three kinds of asymptotic sampling distributions of the sample correlation $\hat r_g^{(2)}$, depending on different cases of $r_g^{(2)}$. To the best of our knowledge, this is a novel result and can be applied to geometric-mean-type statistics such as the symmetrized information dependence measure defined in [26].
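As an illustration of definitions (1) and (2), the following Python sketch computes plug-in versions of the two standard Gini correlations and their arithmetic and geometric means. It assumes continuous data without ties and replaces F and G by sample ranks; the function names are hypothetical, and the absolute value inside the square root is an assumption made so that the geometric mean is defined when the two standard correlations differ in sign.

```python
import numpy as np

def ranks(v):
    """Ranks 1..n of the entries of v (assumes no ties)."""
    order = np.argsort(v, kind="stable")
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(1, len(v) + 1)
    return r

def gini_gamma(x, y):
    """Plug-in gamma(x, y) = cov(x, rank(y)) / cov(x, rank(x))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centering x makes the dot products proportional to covariances
    return float(np.dot(xc, ranks(y)) / np.dot(xc, ranks(x)))

def symmetric_gini(x, y):
    """Arithmetic and geometric means of gamma(x, y) and gamma(y, x)."""
    g1, g2 = gini_gamma(x, y), gini_gamma(y, x)
    return 0.5 * (g1 + g2), float(np.sqrt(abs(g1 * g2)))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 100.0]
print(gini_gamma(x, y), gini_gamma(y, x))  # the two values differ: gamma is not symmetric
print(symmetric_gini(x, y))
```

In this toy sample the two standard Gini correlations differ because only one of them uses the extreme value of y directly, while both symmetric versions are unchanged when x and y are swapped.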
It is worthwhile to mention that the Gini correlations in (1) and the symmetric versions in (2) are quite different from the Gini gamma or Gini coefficient [10,24], although the names are very similar. The Gini correlation $\gamma_1$ in (1) is a natural bivariate extension of the univariate Gini mean difference (GMD) through the covariance representation $\mathrm{GMD}(F) = E|X_1 - X_2| = 4\,\mathrm{Cov}(X, F(X))$, where $X_1$, $X_2$ are independent copies of X from F. The Gini gamma was proposed by Gini [11]. Related to the Spearman correlation in a different way, the Gini gamma is a concordance measure defined through the ranks of both X and Y. It is easy to check that the Gini gamma satisfies all axioms of concordance stated in [30]. However, neither $r_g^{(1)}$ nor $r_g^{(2)}$ is a concordance measure, and neither satisfies the coherence axiom.
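The covariance representation of the Gini mean difference can be checked numerically. The sketch below is only an illustration with hypothetical function names: it compares the average absolute pairwise difference with four times a plug-in covariance between the data and their empirical distribution function, rescaled by n/(n-1) so that the two sample quantities agree exactly.

```python
import numpy as np

def gmd_pairwise(x):
    """Average of |x_i - x_j| over all ordered pairs i != j (O(n^2))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return float(np.abs(x[:, None] - x[None, :]).sum() / (n * (n - 1)))

def gmd_covariance(x):
    """4 * cov(x, F_n(x)) with F_n the empirical cdf, rescaled by n / (n - 1)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    ecdf = np.arange(1, n + 1) / n              # F_n at the order statistics
    cov = np.mean((x - x.mean()) * ecdf)        # plug-in covariance (1/n normalisation)
    return float(4.0 * cov * n / (n - 1))

sample = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]
print(gmd_pairwise(sample), gmd_covariance(sample))
```

The second function needs only a sort, which is the same idea that later makes the Gini correlations computable in O(n log n).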
The paper is organized as follows. In Section 2 we provide properties of $r_g^{(1)}$ and $r_g^{(2)}$. Their influence functions are presented in Section 3. The limiting distributions of the sample correlations are established in Section 4. The statistical efficiency and computational efficiency of various correlations are compared in Subsection 4.2, and their finite sample performance is compared through a simulation study on elliptical distributions and an asymmetric bivariate log-normal distribution in Section 5. A real data application on the relationship between GDP per capita and suicide rate is presented in Section 6. Final remarks are provided in Section 7. Proofs are relegated to the Appendix.

Two symmetric Gini correlations
Basic properties of the two symmetric Gini correlations $r_g^{(1)}$ and $r_g^{(2)}$ in (2) are explored. Their relationships with the linear correlation parameter ρ in bivariate elliptical distributions and log-normal distributions are presented.

2.1 General properties
Let X and Y be two random variables from F and G, respectively, with the joint distribution H.
Proposition 2.1. Assume that H is continuous and its first moment exists. Then we have:
3. If X and Y are statistically independent, then $r_g^{(1)}(X, Y) = 0$.
4. If Y is a monotonic increasing (decreasing) function of X, then $r_g^{(1)}(X, Y)$ equals $+1$ ($-1$).
5. $r_g^{(1)}(aX + c, bY + d) = \mathrm{sgn}(ab)\, r_g^{(1)}(X, Y)$ for any constants c, d and nonzero a, b.
The symmetry of $r_g^{(1)}$ and $r_g^{(2)}$ is obvious from the commutative property of addition and multiplication. Apart from Property 5, the remaining properties in the above two propositions follow simply from the properties of the original Gini correlations $\gamma_1$ and $\gamma_2$ shown by [31]. Property 5 states that the two symmetric Gini correlations describe a linear relationship between X and Y.
Note that we assume a continuous H in Propositions 2.1 and 2.2. If H is not continuous, some revisions of the definitions of $\gamma_1$ and $\gamma_2$ are needed for the general properties to hold. For example, replacing F(x) with (F(x) + F(x−))/2 and G(y) with (G(y) + G(y−))/2 in (1) keeps $\gamma_1$ and $\gamma_2$ in the range [−1, 1]. For simplicity, a continuous distribution is assumed throughout the paper.
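The tie-handling revision above can be sketched for sample data as follows. This is a minimal illustration, not the authors' implementation: `mid_ecdf` (a hypothetical name) evaluates the average of the empirical distribution function and its left limit at each observation, and `gini_gamma_ties` plugs it into (1).

```python
import numpy as np

def mid_ecdf(v):
    """(F_n(v) + F_n(v-)) / 2 evaluated at every observation of v."""
    v = np.asarray(v, dtype=float)
    s = np.sort(v)
    below = np.searchsorted(s, v, side="left")       # number of values strictly below
    below_eq = np.searchsorted(s, v, side="right")   # number of values below or equal
    return (below + below_eq) / (2.0 * len(v))

def gini_gamma_ties(x, y):
    """Plug-in Gini correlation gamma(x, y) that tolerates ties."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc, mid_ecdf(y)) / np.dot(xc, mid_ecdf(x)))

x = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 3.0])
y = np.array([2.0, 1.0, 1.0, 3.0, 2.0, 3.0])
print(gini_gamma_ties(x, y))    # stays inside [-1, 1]
print(gini_gamma_ties(x, -x))   # perfectly decreasing relation
```

With this convention a variable paired with itself still gives 1 and a variable paired with its negative still gives −1, even though the data contain repeated values.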
Before we study the symmetric Gini correlations in elliptical distributions and the lognormal distribution, we provide definitions of other measures of association that will be used and compared in the paper. For H with a finite second moment, the Pearson correlation $r_p$ is
$$r_p = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}.$$
The rank-based Spearman and Kendall's tau correlations do not need a moment condition. The Spearman correlation is defined as the Pearson correlation on the ranks of X and Y, that is,
$$r_s = \frac{\mathrm{cov}(F(X), G(Y))}{\sqrt{\mathrm{var}(F(X))\,\mathrm{var}(G(Y))}}.$$
Kendall's tau $r_\tau$ is defined as
$$r_\tau = E\,\mathrm{sgn}\{(X_1 - X_2)(Y_1 - Y_2)\},$$
where $(X_1, Y_1)^T$ and $(X_2, Y_2)^T$ are independently distributed from H.
For $Z = (X, Y)^T$ from H with finite first moment, the joint-rank based symmetric Gini correlation $r_g^{(s)}$ [28] is defined through the spatial rank $R(z) = E_H\{(z - Z)/\|z - Z\|\}$ of $z = (x, y)^T$ with respect to H, where $\|\cdot\|$ is the Euclidean norm. Those correlations have different properties and may take different values under the same distribution. It is preferable to consider their Fisher consistent versions so that they correspond to the same quantity or parameter [7]. For a distribution H with a parameter ρ, $\rho_r$ is Fisher consistent for ρ if $\rho_r(H) = \rho$. We denote the Fisher consistent versions of the Pearson, Spearman and Kendall's tau correlations by $\rho_p$, $\rho_s$ and $\rho_\tau$, respectively.
Next, the symmetric Gini correlations as well as each of the above-mentioned correlations are studied in elliptical distributions and the lognormal distribution.

2.2 Gini correlations in elliptical distributions
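A d-variate continuous random vector Z has an elliptical distribution H if its density function is of the form $f(z \mid \mu, \Sigma) = |\Sigma|^{-1/2}\, g\{(z - \mu)^T \Sigma^{-1} (z - \mu)\}$, where µ is the location parameter, the positive definite matrix Σ is the scatter parameter and the nonnegative function g is the density generating function. One important property of the elliptical distribution is that the nonnegative random variable $R = \|\Sigma^{-1/2}(Z - \mu)\|$ is independent of $U = \Sigma^{-1/2}(Z - \mu)/R$, which is uniformly distributed on the unit sphere. Conventionally, we write the parameters of bivariate elliptical distributions as $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$.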
If the second moment of Z exists, then the covariance matrix exists and is equal to $(E R^2/d)\,\Sigma$. In this case, the Pearson correlation $r_p$ is well defined and is equal to the parameter ρ. For more details on the elliptical distribution family, see [6].
If $\sigma_1 = \sigma_2$, the joint-rank based Gini correlation $r_g^{(s)}$ proposed in [28] has a relationship with ρ that is expressed through the complete elliptic integrals of the first kind and the second kind, $\int_0^{\pi/2} (1 - x \sin^2\theta)^{-1/2}\, d\theta$ and $\int_0^{\pi/2} (1 - x \sin^2\theta)^{1/2}\, d\theta$, respectively. The Fisher consistent version of $r_g^{(s)}$ is hard to obtain in explicit form, but a numerical solution is possible. For Kendall's tau, Blomqvist [2] proved that $r_\tau = (2/\pi) \arcsin(\rho)$ in the normal case. Lindskog et al. [20] proved that this relationship holds under all elliptical distributions. Hence the Fisher consistent version of Kendall's correlation is $\rho_\tau = \sin(\pi r_\tau / 2)$.
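The relationship between Kendall's tau and ρ, and its inversion into a Fisher consistent version, can be written in a few lines. The function names below are hypothetical and the snippet only illustrates the two formulas stated above.

```python
import math

def kendall_from_rho(rho):
    """r_tau = (2 / pi) * arcsin(rho) under an elliptical distribution."""
    return 2.0 / math.pi * math.asin(rho)

def rho_from_kendall(r_tau):
    """Fisher consistent version: rho_tau = sin(pi * r_tau / 2)."""
    return math.sin(math.pi * r_tau / 2.0)

for rho in (-0.9, 0.0, 0.5, 0.9):
    r_tau = kendall_from_rho(rho)
    print(rho, round(r_tau, 4), round(rho_from_kendall(r_tau), 4))
```

Applying the second function to a sample Kendall's tau gives an estimator that targets ρ itself, which is what makes it comparable with the other Fisher consistent estimators.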

2.3 Gini correlations in bivariate lognormal distribution
The random vector $(X, Y)^T$ is said to have a bivariate lognormal distribution with parameters $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ if $(\log X, \log Y)^T$ follows a bivariate normal distribution with the same parameters. Clearly, Kendall's tau and the Spearman correlation are invariant under monotonically increasing transformations, thus equations (6) and (7) still hold. For the Pearson correlation, it is easy to obtain
$$r_p = \frac{\exp(\rho\sigma_1\sigma_2) - 1}{\sqrt{\{\exp(\sigma_1^2) - 1\}\{\exp(\sigma_2^2) - 1\}}}. \qquad (8)$$
Then the Fisher consistent version of the Pearson correlation for the parameter ρ in the lognormal distribution is
$$\rho_p = \frac{1}{\sigma_1\sigma_2} \log\Big(1 + r_p \sqrt{\{\exp(\sigma_1^2) - 1\}\{\exp(\sigma_2^2) - 1\}}\Big).$$
For the two new symmetric Gini correlations, we have derived the functional relationships as below.
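Proposition 2.4. Under the bivariate lognormal distribution with parameters $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, we have
$$\gamma_1 = \frac{2\Phi(\rho\sigma_1/\sqrt{2}) - 1}{2\Phi(\sigma_1/\sqrt{2}) - 1}, \qquad \gamma_2 = \frac{2\Phi(\rho\sigma_2/\sqrt{2}) - 1}{2\Phi(\sigma_2/\sqrt{2}) - 1},$$
so that $r_g^{(1)} = (\gamma_1 + \gamma_2)/2$ and $r_g^{(2)} = \sqrt{|\gamma_1\gamma_2|}$, where Φ is the cdf of the standard normal variable. Further, if $\sigma_1 = \sigma_2 = \sigma$, the Fisher consistent versions of the symmetric Gini correlations are
$$\rho_g^{(1)} = \frac{\sqrt{2}}{\sigma}\, \Phi^{-1}\Big(\frac{1 + r_g^{(1)}\{2\Phi(\sigma/\sqrt{2}) - 1\}}{2}\Big) \quad\text{and}\quad \rho_g^{(2)} = \mathrm{sgn}(\rho)\, \frac{\sqrt{2}}{\sigma}\, \Phi^{-1}\Big(\frac{1 + r_g^{(2)}\{2\Phi(\sigma/\sqrt{2}) - 1\}}{2}\Big). \qquad (13)$$
The proposition states that explicit forms of the Fisher consistent symmetric Gini correlations are only available for the homogeneous case. Also, (13) indicates that the Fisher consistent version of $r_g^{(2)}$ requires information on the sign of ρ. If $\sigma_1 \neq \sigma_2$, we need a numerical method to approximate them.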
Plots in Fig. 1 display the relationship of various correlations to the parameter ρ in the lognormal distributions. In the left plot, with $\sigma_1 = \sigma_2$, we have $r_g^{(1)} = r_g^{(2)} > \rho > r_s > r_p > r_\tau$ if $0 < \rho < 1$; otherwise they are equal at 0 and 1. On the right, with $\sigma_1 = 1$ and $\sigma_2 = 2$, if $0 < \rho < 1$, then $r_g^{(1)} > r_g^{(2)}$, though the differences between $r_g^{(1)}$ and $r_g^{(2)}$ are tiny and unnoticeable in the plot. Also we have $r_g^{(2)} > r_s > r_\tau > r_p$. Note that the Pearson correlation $r_p$ cannot reach 1 when $\sigma_1 \neq \sigma_2$. The maximum value in the plot above is 0.6642169 when ρ = 0.999. From Equation (8), it is easy to prove that $r_p < 1$ for ρ = 1 if $\sigma_1 \neq \sigma_2$. In other words, for a normal random variable X and a positive constant a ≠ 1, $r_p(\exp(X), \exp(aX)) < 1$, meaning that the Pearson correlation is not suitable for describing nonlinear relationships.
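The claim that the Pearson correlation cannot reach 1 under unequal scales can be illustrated numerically. The sketch below uses the standard closed form of the Pearson correlation of a bivariate lognormal vector; the function names are hypothetical.

```python
import math

def pearson_lognormal(rho, s1, s2):
    """Pearson correlation of (exp(N1), exp(N2)) for a bivariate normal
    (N1, N2) with standard deviations s1, s2 and correlation rho."""
    num = math.exp(rho * s1 * s2) - 1.0
    den = math.sqrt((math.exp(s1 ** 2) - 1.0) * (math.exp(s2 ** 2) - 1.0))
    return num / den

def rho_from_pearson_lognormal(r_p, s1, s2):
    """Inverse map: recover rho from the lognormal Pearson correlation."""
    den = math.sqrt((math.exp(s1 ** 2) - 1.0) * (math.exp(s2 ** 2) - 1.0))
    return math.log(1.0 + r_p * den) / (s1 * s2)

print(pearson_lognormal(1.0, 1.0, 1.0))    # equal scales: the value 1 is attained
print(pearson_lognormal(1.0, 1.0, 2.0))    # unequal scales: strictly below 1
print(pearson_lognormal(0.999, 1.0, 2.0))  # close to the maximum quoted for the plot
```

Even with perfectly dependent log-variables, the value stays well below 1 once the two scales differ, which is why the Fisher consistent transformation is needed before comparing the Pearson correlation with the other measures.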

Influence function
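The influence function (IF) introduced by Hampel [13] is now a standard tool which serves two purposes. The first is to measure local robustness, that is, the effects on estimators of infinitesimal perturbations of distribution functions. The second is to derive limiting distributions and asymptotic variances. See also [14]. For a cdf H on $R^d$ and a functional $T : H \mapsto T(H) \in R^m$ with $m \geq 1$, the IF of T at H is defined as
$$IF(z; T, H) = \lim_{\varepsilon \downarrow 0} \frac{T((1 - \varepsilon)H + \varepsilon\delta_z) - T(H)}{\varepsilon}, \quad z \in R^d,$$
where $\delta_z$ denotes the point mass distribution at z. Under regularity conditions on T (see [14,34] for details), we have $E_H\{IF(Z; T, H)\} = 0$ and the von Mises expansion
$$T(H_n) - T(H) = \frac{1}{n} \sum_{i=1}^n IF(z_i; T, H) + o_p(n^{-1/2}), \qquad (14)$$
where $H_n$ denotes the empirical distribution based on a sample $z_1, \ldots, z_n$. This representation shows the connection between the IF and the robustness of T, observation by observation. Further, (14) yields the asymptotic m-variate normality of $T(H_n)$,
$$\sqrt{n}\,(T(H_n) - T(H)) \to_d N\big(0,\; E_H\{IF(Z; T, H)\, IF(Z; T, H)^T\}\big).$$
We first derive the influence functions for the standard Gini correlations $\gamma_1$ and $\gamma_2$, which are stated in the following proposition.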

Proposition 3.1. For any continuous bivariate distribution H with finite first moment, the influence functions of the traditional Gini correlations $\gamma_1$ and $\gamma_2$, evaluated at a point $(u, v)^T$, are available in closed form.
The influence functions of the standard Gini correlations are approximately linear in u and v. Compared with the quadratic effects in the influence function of the Pearson correlation coefficient [5], $\gamma_1$ and $\gamma_2$ are more robust than the Pearson correlation. However, they are not strictly robust since their influence functions are unbounded. Kendall's tau $r_\tau$ and the Spearman correlation $r_s$ have bounded influence functions [3]. In this sense, the standard Gini correlations are more robust than $r_p$ but less robust than $r_\tau$ and $r_s$.
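A finite-sample analogue of the influence function is the sensitivity curve, $(n + 1)\{T(\text{sample} \cup \{z\}) - T(\text{sample})\}$. The sketch below is an illustration under assumed settings (a simulated bivariate normal sample with ρ = 0.5 and n = 200, hypothetical function names, and a simple $O(n^2)$ Kendall's tau); it contrasts how the Pearson correlation and Kendall's tau react to one added point.

```python
import numpy as np

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def kendall_tau(x, y):
    """Simple O(n^2) Kendall's tau (assumes no ties)."""
    n = len(x)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    return float((sx * sy).sum() / (n * (n - 1)))

def sensitivity_curve(stat, x, y, u, v):
    """(n + 1) * (stat with the extra point (u, v) - stat without it)."""
    n = len(x)
    return (n + 1) * (stat(np.append(x, u), np.append(y, v)) - stat(x, y))

rng = np.random.default_rng(0)
n, rho = 200, 0.5                          # assumed illustration settings
z = rng.standard_normal((n, 2))
x = z[:, 0]
y = rho * z[:, 0] + np.sqrt(1.0 - rho ** 2) * z[:, 1]

for u in (5.0, 50.0, 500.0):
    print(u,
          sensitivity_curve(pearson, x, y, u, -u),
          sensitivity_curve(kendall_tau, x, y, u, -u))
```

Once the added point lies outside the range of the data, moving it further away no longer changes any rank, so the Kendall column stops changing, while the Pearson column keeps moving with the size of the outlier.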
Proposition 3.2. For any continuous distribution H with finite first moment, the influence functions of $r_g^{(1)}$ and $r_g^{(2)}$ are given by
$$IF(z; r_g^{(1)}, H) = \tfrac{1}{2}\{IF(z; \gamma_1, H) + IF(z; \gamma_2, H)\}$$
and, for $r_g^{(2)} \neq 0$,
$$IF(z; r_g^{(2)}, H) = \frac{\mathrm{sgn}(\gamma_1\gamma_2)\,\{\gamma_2\, IF(z; \gamma_1, H) + \gamma_1\, IF(z; \gamma_2, H)\}}{2\, r_g^{(2)}}.$$
Since the square root function is not differentiable at zero, the influence function of $r_g^{(2)}$ does not exist when $r_g^{(2)} = 0$. This brings difficulty in deriving the limiting distribution of the sample $\hat r_g^{(2)}$ when $r_g^{(2)} = 0$, as explained further in a later section. The influence function of $r_g^{(1)}$ and that of a nonzero $r_g^{(2)}$ are linear combinations of the influence functions of $\gamma_1$ and $\gamma_2$, and hence are approximately linear in u and v. The symmetric Gini correlation $r_g^{(s)}$ proposed in [28] also has an approximately linear influence function. We expect that the newly studied Gini correlations and the symmetric one based on the joint rank perform similarly in terms of robustness and statistical efficiency.
In Figure 2, we demonstrate the influence functions of $r_p$, $r_\tau$, $r_g^{(s)}$, $r_g^{(1)}$ and $r_g^{(2)}$ under a bivariate normal distribution with a positive correlation ρ. Since we know that $r_g^{(1)} = \rho$ and $r_g^{(2)} = |\rho|$ for bivariate normal distributions, the influence functions of the two Gini correlations are identical for positive ρ, and thus share the same plot in Figure 2. Indeed, under a general elliptical distribution the two influence functions coincide when ρ > 0. Note that the scales of the values of the influence functions in the four plots are quite different.

Estimation
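Estimation of the two new symmetric Gini correlations can be done easily by plugging in estimators $\hat\gamma_1$ and $\hat\gamma_2$ of $\gamma_1$ and $\gamma_2$, respectively. Given a random sample $\mathcal{Z} = \{Z_1, Z_2, \ldots, Z_n\}$ with $Z_i = (X_i, Y_i)^T$, the traditional Gini correlations $\gamma_1$ and $\gamma_2$ can be estimated by ratios of U-statistics. That is, $\hat\gamma_1 = U_1/U_2$ and $\hat\gamma_2 = U_3/U_4$, where $U_1$, $U_2$, $U_3$ and $U_4$ are U-statistics estimating cov(X, G(Y)), cov(X, F(X)), cov(Y, F(X)) and cov(Y, G(Y)), respectively. The authors of [31] applied the U-statistic theorem to establish consistency and asymptotic normality of $\hat\gamma_1$ and $\hat\gamma_2$. The same result can be reached through the influence function approach based on Proposition 3.1. More specifically, for H with finite second moment, $\sqrt{n}(\hat\gamma_k - \gamma_k) \to_d N(0, v_{\gamma_k})$ for $k = 1, 2$, where the asymptotic variances are $v_{\gamma_k} = E_H\{IF(Z; \gamma_k, H)^2\}$. For a bivariate normal distribution, Xu et al. [41] provided an explicit formula for the asymptotic variance. Computing the U-statistics in (16) and (17) directly is time-intensive, with complexity $O(n^2)$. Rewriting $U_1$ and $U_2$ as linear combinations of order statistics reduces the computation to O(n log n) [31]. That is, up to a common normalizing constant,
$$U_1 \propto \sum_{i=1}^n (2i - 1 - n)\, X_{(Y_{(i)})} \quad\text{and}\quad U_2 \propto \sum_{i=1}^n (2i - 1 - n)\, X_{(i)},$$
where $X_{(i)}$ is the i-th order statistic of $X_1, X_2, \ldots, X_n$ and $X_{(Y_{(i)})}$ is the X corresponding to the order statistic $Y_{(i)}$. Similarly, $U_3$ and $U_4$ are linear combinations of order statistics. This provides computational efficiency for $\hat\gamma_1$ and $\hat\gamma_2$. Thus, we have computationally efficient estimators for $r_g^{(1)}$ and $r_g^{(2)}$, namely the arithmetic and geometric means of $\hat\gamma_1$ and $\hat\gamma_2$,
$$\hat r_g^{(1)} = \frac{\hat\gamma_1 + \hat\gamma_2}{2} \quad\text{and}\quad \hat r_g^{(2)} = \sqrt{|\hat\gamma_1 \hat\gamma_2|}, \qquad (18)$$
which are continuous functions of $\hat\gamma_1$ and $\hat\gamma_2$ and can be calculated in O(n log n) time. The strong consistency of $\hat r_g^{(1)}$ and $\hat r_g^{(2)}$ follows directly from the strong consistency of $\hat\gamma_1$ and $\hat\gamma_2$.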
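The equivalence between the pairwise (U-statistic) form and the order-statistic form of the estimator can be checked with a short sketch. The function names are hypothetical, ties are assumed absent, and the kernels are written only up to constants that cancel in the ratio.

```python
import numpy as np

def gamma_pairwise(x, y):
    """O(n^2): sum of (x_i - x_j) * sgn(y_i - y_j) over pairs, divided by sum of |x_i - x_j|."""
    dx = x[:, None] - x[None, :]
    sy = np.sign(y[:, None] - y[None, :])
    return float((dx * sy).sum() / np.abs(dx).sum())

def gamma_orderstat(x, y):
    """O(n log n): weighted sums of x ordered by y and of the order statistics of x."""
    n = len(x)
    w = 2.0 * np.arange(1, n + 1) - 1.0 - n     # weights 2i - 1 - n
    num = np.dot(w, x[np.argsort(y)])           # x values arranged by increasing y
    den = np.dot(w, np.sort(x))                 # order statistics of x
    return float(num / den)

def symmetric_gini(x, y):
    """Arithmetic and geometric means of the two sample Gini correlations."""
    g1, g2 = gamma_orderstat(x, y), gamma_orderstat(y, x)
    return 0.5 * (g1 + g2), float(np.sqrt(abs(g1 * g2)))

rng = np.random.default_rng(1)
x = rng.standard_normal(300)
y = 0.6 * x + 0.8 * rng.standard_normal(300)
print(gamma_pairwise(x, y), gamma_orderstat(x, y))
print(symmetric_gini(x, y))
```

The two routes give the same number, but only the second avoids forming all n(n-1)/2 pairs; its cost is dominated by the two sorts.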
Proposition 4.1. Let $Z_1, Z_2, \ldots, Z_n$ be a random sample from a continuous bivariate distribution H with finite first moment. Then $\hat r_g^{(1)}$ and $\hat r_g^{(2)}$ given in (18) converge almost surely to $r_g^{(1)}$ and $r_g^{(2)}$, respectively.

Limiting distributions
With the influence function derived in Proposition 3.2, we can easily obtain the asymptotic normality of $\hat r_g^{(1)}$; to simplify the presentation, we denote its asymptotic variance by $v_g^{(1)}$.
Under the lognormal distribution, asymptotic normality of the Fisher consistent estimator $\hat\rho_g^{(1)}$ is obtained by the Delta method. Its asymptotic variance is $k(\rho)^{-2}\, v_g^{(1)}$, where $k(\rho) = \partial r_g^{(1)}/\partial\rho$ is the derivative of the map from ρ to $r_g^{(1)}$ in Proposition 2.4, which involves ψ and Φ, the pdf and cdf of the standard normal random variable, respectively.
To study the asymptotic behavior of $\hat r_g^{(2)}$, we have to overcome the difficulty brought about by the nonexistence of the influence function when $r_g^{(2)} = 0$. It is interesting to see that there are three different limiting distributions of $\hat r_g^{(2)}$, corresponding to three cases of $r_g^{(2)}$. We present the results in the following two propositions.
For $r_g^{(2)} \neq 0$, the influence function of $r_g^{(2)}$ exists and can be used to establish the asymptotic normality of $\hat r_g^{(2)}$ and calculate its asymptotic variance. If $r_g^{(2)} = 0$, the influence function of $r_g^{(2)}$ does not exist, and hence we have to rely on U-statistic theory to derive the limiting distributions of $\hat r_g^{(2)}$. There are two different cases resulting from $r_g^{(2)} = 0$, depending on whether or not both $\gamma_1$ and $\gamma_2$ are zero. Without loss of generality, we assume $\gamma_1 = 0$, and the two cases correspond to $\gamma_2 = 0$ and $\gamma_2 \neq 0$, respectively.

4.2 Asymptotic relative efficiency
We compare the asymptotic efficiency of the symmetric Gini correlations with other correlations under elliptical distributions and lognormal distributions. We consider Fisher consistent estimators. Note that the purpose here is not to estimate the parameter ρ, which is usually done by likelihood inference. Rather, the Fisher consistent correlation coefficients estimate the same parameter, and hence their asymptotic variances and statistical efficiencies are comparable. Denote by $\hat\rho_g^{(1)}$, $\hat\rho_g^{(2)}$, $\hat\rho_g^{(s)}$, $\hat\rho_\gamma$, $\hat\rho_\tau$ and $\hat\rho_p$ the corresponding estimators of the symmetric Gini, standard Gini, Kendall's tau and Pearson correlations. The asymptotic variances of those estimators are derived by the Delta method.
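As a worked instance of the Delta method step, consider the Fisher consistent Kendall estimator under an elliptical distribution, where $\rho_\tau = \sin(\pi r_\tau/2)$. The symbol $v_\tau$ for the asymptotic variance of the sample Kendall's tau is notation introduced here for illustration only.

```latex
% Delta method for h(r) = \sin(\pi r / 2), evaluated at r_\tau = (2/\pi)\arcsin(\rho)
\begin{aligned}
h'(r_\tau) &= \frac{\pi}{2}\cos\!\Big(\frac{\pi r_\tau}{2}\Big)
            = \frac{\pi}{2}\cos(\arcsin\rho)
            = \frac{\pi}{2}\sqrt{1-\rho^{2}},\\[4pt]
\sqrt{n}\,(\hat r_\tau - r_\tau) \to_d N(0, v_\tau)
  \;&\Longrightarrow\;
\sqrt{n}\,(\hat\rho_\tau - \rho) \to_d
  N\!\Big(0,\; \frac{\pi^{2}}{4}\,(1-\rho^{2})\, v_\tau\Big).
\end{aligned}
```

The same recipe, with the derivative of the relevant map between the correlation and ρ, gives the asymptotic variances of the other Fisher consistent estimators.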
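We compute the asymptotic variances (ASV) of the Pearson estimator $\hat\rho_p$, and the asymptotic relative efficiencies (ARE) of the estimators $\hat\rho_g^{(1)}$, $\hat\rho_g^{(2)}$, $\hat\rho_g^{(s)}$, $\hat\rho_\gamma$ and $\hat\rho_\tau$ relative to $\hat\rho_p$, which are reported in the first part of Table 1. The asymptotic relative efficiency (ARE) of one estimator $\hat\rho_1$ with respect to another $\hat\rho_2$ is defined by
$$\mathrm{ARE}(\hat\rho_1, \hat\rho_2) = \frac{\mathrm{ASV}(\hat\rho_2)}{\mathrm{ASV}(\hat\rho_1)}.$$
The second part of Table 1 lists the ASVs of all correlations under the lognormal distribution. In this case, the Pearson correlation has extremely large asymptotic variances, a result agreeing well with [19,23]. The asymptotic variance of $\hat r_p$ involves the fourth moments and is given by Witting and Müller-Funk [40] as follows:
$$v_p = \Big(1 + \frac{r_p^2}{2}\Big) \frac{\sigma_{22}}{\sigma_{20}\sigma_{02}} + \frac{r_p^2}{4} \Big( \frac{\sigma_{40}}{\sigma_{20}^2} + \frac{\sigma_{04}}{\sigma_{02}^2} - \frac{4\sigma_{31}}{\sigma_{11}\sigma_{20}} - \frac{4\sigma_{13}}{\sigma_{11}\sigma_{02}} \Big),$$
where $\sigma_{kl} = E\{(X - EX)^k (Y - EY)^l\}$. For one of the lognormal cases, using the Delta method, the ASV of the Fisher consistent Pearson correlation $\hat\rho_p$ is $v_p$ multiplied by 3.12.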
Since we have yet to determine the relationship between ρ and $r_g^{(s)}$ for the lognormal distribution, the asymptotic relative efficiencies of $\hat\rho_g^{(s)}$ under the lognormal distribution are not presented in this paper. Note that by Remark 4.1, we have $\gamma_1 = \gamma_2$, and hence the ASVs of $\hat\rho_g^{(1)}$ and $\hat\rho_g^{(2)}$ are the same for all cases except for the second setup of the lognormal distribution. In that case, the efficiencies of the two estimators differ by 15%, 10% and 2% for the three values of ρ, respectively.
Table 1 shows that the asymptotic variances of $\hat\rho_p$, $\hat\rho_g^{(1)}$, $\hat\rho_g^{(2)}$, $\hat\rho_g^{(s)}$, $\hat\rho_\gamma$ and $\hat\rho_\tau$ all decrease as ρ increases in elliptical distributions. Asymptotic variances increase for t distributions as the degrees of freedom ν decrease. Under normal distributions, the Pearson correlation estimator is the maximum likelihood estimator of ρ, and thus is the most efficient asymptotically. The two proposed symmetric Gini estimators $\hat\rho_g^{(1)}$ and $\hat\rho_g^{(2)}$ are both highly efficient, with AREs greater than 90 percent; thus, they are more efficient than Kendall's estimator $\hat\rho_\tau$ and the traditional Gini correlation estimator $\hat\rho_\gamma$. For heavy-tailed elliptical distributions, the symmetric Gini estimators $\hat\rho_g^{(1)}$ and $\hat\rho_g^{(2)}$ are more efficient than Pearson's estimator $\hat\rho_p$. They are also more efficient than the traditional Gini correlation in all elliptical distributions. The joint-rank based symmetric Gini correlation $\hat\rho_g^{(s)}$ has an efficiency similar to that of $\hat\rho_g^{(1)}$ and $\hat\rho_g^{(2)}$, but it has a slight advantage for larger ρ, e.g. ρ = 0.9. Under the lognormal distribution with $\sigma_1 = \sigma_2$, $\hat\rho_g^{(1)}$ and $\hat\rho_g^{(2)}$ are competitive with Kendall's tau. When $\sigma_1 \neq \sigma_2$, however, a large variation in Y degrades the performance of $\hat\gamma_2$ and consequently of $\hat\rho_g^{(1)}$ and $\hat\rho_g^{(2)}$. Kendall's tau is the most efficient in this case.

Empirical Results
We first conduct a small simulation to compare the computational efficiency of each correlation. Then we compare the finite sample statistical efficiency of these methods.

5.1 Computational efficiency
To study the computational efficiency of these methods in finite samples, we perform a small simulation to compare the calculation times of the two symmetric Gini correlation estimators $\hat r_g^{(1)}$ and $\hat r_g^{(2)}$ with the Kendall's tau $\hat r_\tau$, Spearman $\hat r_s$ and Pearson $\hat r_p$ correlation estimators, as well as the joint-rank based symmetric Gini correlation estimator $\hat r_g^{(s)}$. Samples of three different sizes n were drawn from a bivariate normal distribution with fixed parameters $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$. For each sample, the computation time of each correlation measure was recorded. The procedure was then repeated 30 times to obtain the mean and standard deviation of the computation times for each measure. In Table 2, we display the mean and standard deviation (in parentheses) of the calculation times for $\hat r_g^{(1)}$, $\hat r_g^{(2)}$, $\hat r_g^{(s)}$, $\hat r_\tau$, $\hat r_s$ and $\hat r_p$. The values in Table 2 were obtained on a Windows PC with an Intel® Core™ i7-9700K CPU @ 3.60GHz, 8 cores. The R package "pcaPP" is used for fast computation of Kendall's tau correlation.
Table 2: The mean and standard deviation (in parentheses) of calculation times for $\hat r_g^{(1)}$, $\hat r_g^{(2)}$, $\hat r_g^{(s)}$, $\hat r_\tau$, $\hat r_s$ and $\hat r_p$ under a bivariate normal distribution.
From the complexity study, we know that $\hat r_g^{(1)}$, $\hat r_g^{(2)}$, $\hat r_\tau$ and $\hat r_s$ all have calculation times of O(n log n), $\hat r_g^{(s)}$ has a calculation time of $O(n^2)$, and $\hat r_p$ has a calculation time of O(n). In Table 2, we can see that $\hat r_p$ is the most computationally efficient, with $\hat r_g^{(1)}$, $\hat r_g^{(2)}$, $\hat r_\tau$ and $\hat r_s$ being only slightly less efficient. It is clear from Table 2 that $\hat r_g^{(1)}$, $\hat r_g^{(2)}$, $\hat r_p$, $\hat r_\tau$ and $\hat r_s$ would all perform well for almost all sample sizes; however, $\hat r_g^{(s)}$ would not perform well with large samples.

5.2 Finite sample efficiency
In order to study the efficiency of these methods in finite samples, we conduct a small simulation comparing the two symmetric Gini correlations with the Kendall's τ, Spearman and Pearson correlation estimators. Samples of two sizes n were drawn from four t-distributions with degrees of freedom 1, 5, 15 and ∞, and from the Kotz and lognormal distributions. Let µ and Σ be the location and scatter parameters.
The R package "mnormt" was used to generate data from the multivariate t distributions and the bivariate normal distribution, and from the lognormal distribution by taking the exponential transformation of a bivariate normal random sample. We generate data from the Kotz distribution by first obtaining uniformly distributed random vectors on the unit circle, $u = (\cos\theta, \sin\theta)^T$ with θ uniform in [0, 2π], and then generating r from a Gamma distribution with shape parameter α = 2 and scale parameter β = 1. Thus, we obtain $\Sigma^{1/2} r u + \mu$, a sample from a bivariate Kotz(µ, Σ) distribution.
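The Kotz sampling recipe above can be sketched in Python as follows. This is an illustrative translation rather than the R code used for the simulations; the function name, the choice of the symmetric square root of Σ, and the example values of µ and Σ are assumptions.

```python
import numpy as np

def sample_kotz(n, mu, sigma, rng):
    """Draw n points Sigma^{1/2} r u + mu with u uniform on the unit circle
    and r ~ Gamma(shape=2, scale=1)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    u = np.column_stack((np.cos(theta), np.sin(theta)))   # unit-circle directions
    r = rng.gamma(shape=2.0, scale=1.0, size=n)           # radial part
    vals, vecs = np.linalg.eigh(sigma)                    # symmetric square root of Sigma
    root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    return mu + (r[:, None] * u) @ root.T

rng = np.random.default_rng(2)
mu = np.array([0.0, 0.0])                      # hypothetical example values
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])     # hypothetical example values
z = sample_kotz(20000, mu, sigma, rng)
print(z.mean(axis=0))
print(np.corrcoef(z[:, 0], z[:, 1])[0, 1])
```

Because the construction is elliptical, the sample Pearson correlation should be close to the off-diagonal entry of the standardized Σ, which gives a quick sanity check on the generator.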
An estimator $\hat\rho^{(m)}$ is computed for the m-th sample, and the root mean squared error (RMSE) is used as the criterion for assessing estimators, which is defined as
$$\mathrm{RMSE}(\hat\rho) = \sqrt{\frac{1}{M} \sum_{m=1}^{M} (\hat\rho^{(m)} - \rho)^2}.$$
In our experiment, M is set to be 3000. The procedure is then repeated 30 times to obtain the mean and standard deviation of $\sqrt{n}$RMSE. In Table 3, we display the mean and standard deviation (in parentheses) of $\sqrt{n}$RMSE of $\hat\rho_g^{(1)}$, $\hat\rho_g^{(2)}$, $\hat\rho_\tau$, $\hat\rho_s$ and $\hat\rho_p$. We notice a decreasing trend in $\sqrt{n}$RMSE as ρ increases for each sample size, and an increasing trend as the degrees of freedom ν decrease for t distributions. Under the normal distribution, the $\sqrt{n}$RMSEs of both proposed symmetric Gini estimators, $\hat\rho_g^{(1)}$ and $\hat\rho_g^{(2)}$, are highly competitive with the $\sqrt{n}$RMSE of $\hat\rho_p$. For ρ = 0.1, a symmetric Gini estimator outperforms $\hat\rho_p$ in all distributions. We include the heavy-tailed distribution t(1) to demonstrate the behavior of the Pearson and Gini estimators when their asymptotic variances may not exist. We observe that for the large sample size, the $\sqrt{n}$RMSE of $\hat\rho_p$ is around twice as large as those of both $\hat\rho_g^{(1)}$ and $\hat\rho_g^{(2)}$. When the sample size is small and the degrees of freedom ν are large (15, ∞), a symmetric Gini estimator performs the best. For the lognormal distribution, when ρ is small, we see ...

5.3 Robustness
We also conduct a simulation with contaminated data to demonstrate robustness and show how contamination affects the performance of each correlation. We generate contaminated data of two sample sizes from a mixture normal model with two contamination rates ε, in which the scale σ of the contaminating component takes two values. The majority of the data is highly positively correlated, with contamination by a small portion of negatively correlated outliers. The same criterion $\sqrt{n}$RMSE is used to evaluate the difference between each correlation estimator and the true parameter value 0.9. M and the number of repetitions are the same as in the previous subsection: 3000 and 30, respectively. The result is listed in Table 4.
In each case above, the Pearson correlation has the highest RMSE. This indicates the Pearson correlation's sensitivity to contamination and the high level of degradation those outliers cause in its performance. The most robust correlation is Kendall's tau. The performance of the Gini correlations is between those of the Pearson and Kendall's correlations. This result supports our findings from the derived influence functions in Section 3. The two symmetric Gini correlations $\hat\rho_g^{(1)}$ and $\hat\rho_g^{(2)}$ perform very similarly, but they are less robust than the joint-rank based Gini correlation $\hat\rho_g^{(s)}$.
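The $\sqrt{n}$RMSE criterion used in this and the previous subsection can be sketched as a small Monte Carlo routine. Everything specific below is an assumption made for illustration: the contaminating component (correlation −0.9 and scale 4), the sample size, the number of replications and the function names are hypothetical and are not the settings behind Table 4. Only the Pearson estimator is shown; any other estimator with the same signature can be passed in.

```python
import numpy as np

def bivariate_normal(n, rho, scale, rng):
    """Centered bivariate normal sample with correlation rho and common scale."""
    z = rng.standard_normal((n, 2))
    x = z[:, 0]
    y = rho * z[:, 0] + np.sqrt(1.0 - rho ** 2) * z[:, 1]
    return scale * x, scale * y

def contaminated_sample(n, eps, rng, rho=0.9, out_rho=-0.9, out_scale=4.0):
    """Mixture: with probability eps a point comes from the (hypothetical) outlier component."""
    x, y = bivariate_normal(n, rho, 1.0, rng)
    xo, yo = bivariate_normal(n, out_rho, out_scale, rng)
    bad = rng.random(n) < eps
    return np.where(bad, xo, x), np.where(bad, yo, y)

def sqrt_n_rmse(estimator, n, eps, M, rng, rho=0.9):
    """sqrt(n) times the root mean squared error of the estimator around rho over M samples."""
    est = np.array([estimator(*contaminated_sample(n, eps, rng, rho)) for _ in range(M)])
    return float(np.sqrt(n) * np.sqrt(np.mean((est - rho) ** 2)))

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(3)
clean = sqrt_n_rmse(pearson, n=100, eps=0.0, M=200, rng=rng)
dirty = sqrt_n_rmse(pearson, n=100, eps=0.05, M=200, rng=rng)
print(clean, dirty)
```

Under these assumed settings, a few percent of negatively correlated outliers is enough to inflate the Pearson criterion by a large factor, in line with the qualitative conclusion drawn above.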

Real data analysis
For the purpose of illustration, we apply the developed Gini correlations to the "GDP per captia and Suicide rates" data, which is available on Kaggle. Many factors (mental health issues, weather, culture, etc.) affect suicide. We would like to explore whether or not an economic factor, such as GDP, relates to the suicide rate by measuring the correlation using several correlation coefficients. The data contains information from 160 countries around the world for the years 2000, 2005, 2010, 2015 and 2016. There are 2 missing values in the 2000 data and 5 missing values in the other years. We drop those countries with missing values and consider only the complete data for each year. We analyze how GDP and crude suicide rates are related and how the relationship changes through the years. The crude suicide rate (SR) is the number of suicide deaths in a year, divided by the population and multiplied by 100,000. The countries with the highest suicide rates are Russia and Lithuania. Their suicide rates range from 32 to 52 per 100,000 people. Luxembourg is the country with the highest GDP per capita, $48736 in 2000 and $101305 in 2016. Ethiopia, Burundi and Somalia are the countries with the lowest GDP, $124 in 2000 and $282 in 2016.
There is a high degree of positive skewness in the distribution of GDP, hence we also consider the log transformation of the GDP data to handle the asymmetry. We draw the scatterplot of GDP per capita against SR, as well as the scatterplot of log(GDP) against SR, for each year in Figure 3. We also add a cubic smoothing spline curve to each plot. We used the default parameter values of smooth.spline in R to fit the curves. We can see that the fitted curves demonstrate a non-linear relationship between GDP per capita and SR, but almost linear relationships between log(GDP) and SR, except for the year 2010. We have calculated the symmetric Gini correlations for (GDP, SR) and (log(GDP), SR), as well as the other correlations, presented for comparison in Table 5.
We utilize the jackknife method to estimate the variation of the sample correlations. Let $\hat r_{(-i)}$ be the jackknife pseudo value of a correlation estimator $\hat r$ based on the sample with the i-th observation deleted. Then the jackknife variance is
$$\hat v_{jack} = \frac{n - 1}{n} \sum_{i=1}^n (\hat r_{(-i)} - \hat r_{(\cdot)})^2,$$
where $\hat r_{(\cdot)} = \frac{1}{n} \sum_{i=1}^n \hat r_{(-i)}$. See [35] for more details. Table 5 lists the jackknife standard deviations in parentheses.
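The jackknife variance can be sketched generically for any bivariate estimator. The function name and the simulated data below are hypothetical; the formula is the standard leave-one-out jackknife.

```python
import numpy as np

def jackknife_variance(estimator, x, y):
    """(n - 1)/n times the sum of squared deviations of the leave-one-out estimates."""
    n = len(x)
    idx = np.arange(n)
    loo = np.array([estimator(x[idx != i], y[idx != i]) for i in range(n)])
    return float((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(4)
x = rng.standard_normal(60)
y = 0.5 * x + rng.standard_normal(60)
print(pearson(x, y), np.sqrt(jackknife_variance(pearson, x, y)))
```

A quick check of the implementation is that, for the sample mean of x, the jackknife variance reduces exactly to the usual estimate $s^2/n$.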
From Table 5, we observe that all the listed correlations between GDP per capita and SR are less than 0.5000, which indicates a weak or moderate association between GDP per capita and SR and is consistent with Figure 3. However, we notice an increasing trend in the correlations between GDP and SR over the years. The data suggest that the association between the two becomes stronger as time passes. The values of $\hat r_g^{(1)}$ and $\hat r_g^{(2)}$ are close to each other, but there is a visible difference between the regular Gini correlations $\hat\gamma_1$ and $\hat\gamma_2$. After the log transformation of GDP, the difference becomes less pronounced. The monotonic transformation does not change the ranks of GDP. Kendall's τ and the regular Gini correlation that uses GDP only through its ranks should therefore keep the same values before and after the transformation, which agrees with the values shown in Table 5.
Table 6: Correlations between log(GDP) and SR for the complete data and the deleted data in 2015 and 2016. The standard deviations are in parentheses.
To demonstrate robustness, we delete some outliers and compare the differences of each correlation estimator between the complete data and the edited data. We expect the Pearson correlation to show the largest difference, the Kendall's τ correlation to show the smallest, and the Gini correlations to be somewhere in between. We consider the log(GDP) and SR data from 2015 and 2016. We delete all countries whose SR exceeds a fixed threshold. The results listed in Table 6 confirm what we expect. In 2015, the Pearson correlation estimator changes from 0.295 to 0.353, while the symmetric Gini correlations only change slightly, from 0.321 to 0.335. The Kendall's tau correlation is the most stable. A similar conclusion can be drawn for the 2016 data. This experiment illustrates that the Pearson correlation is not robust and may not be a good measure of association, even though the cubic smoothing spline curves in the scatter plots of Figure 3 are almost linear.

Conclusion
We have systematically studied two symmetric Gini correlations r ( ) g and r ( ) g , which are the arithmetic and geometric means of the traditional Gini correlations γ and γ .We studied basic properties of r ( ) g and r ( ) g , as well as their relationships to the correlation parameter in the elliptical distributions and log-normal distribution.Such relationships enable us to obtain Fisher consistent versions of each correlation.We derived their in uence functions in order to gauge robustness.They are more robust than the Pearson correlation but less robust than Kendall's tau and Spearman correlations.We established asymptotic distributions of the sample correlations.Usual asymptotic normality holds for r( ) g as well as for r( ) g as long as r ( ) g ≠ .Their asymptotic variances are obtained through the in uence function approach.For r ( ) g = , r( ) g has two di erent limiting distributions, depending on whether or not both γ and γ equal 0. We compared their computational e ciency and statistical e ciency with the rank-based symmetric Gini, Kendall's tau and the Pearson correlation.r( ) g and r( ) g can be e ciently calculated with a computational complexity of O(nlogn).Asymptotic eciency and nite sample e ciency of each correlation are obtained under various elliptical distributions and asymmetric lognormal distributions.In summary, the two symmetric Gini correlations balance well among statistical e ciency, robustness, and computational e ciency.
Continuations of this work could advance in several directions. The jackknife empirical likelihood (JEL) method proposed by Jing et al. [15] has been proven to be effective and reliable in dealing with U-statistics. Sang et al. [29] have applied JEL to the classical Gini correlations. It could be beneficial to develop JEL for the two symmetric Gini correlations. In the current work, comparisons among correlations are made in elliptical distributions and lognormal distributions. It would be worthwhile to explore the comparisons in wider families of bivariate distributions, such as copula families and Farlie-Gumbel-Morgenstern models. Fontanari et al. [8] proposed a new class of Archimedean copulas based on the Lorenz curve, which is closely related to the Gini index and Gini correlations. It would be interesting to study correlations in this family. Dang et al. [4] extended the Gini mean difference in one dimension to the Gini covariance matrix (GCM) in high dimensions. However, its computation cost is O(n ). It would be worthwhile to study the GCM based on $r_g^{(1)}$ or $r_g^{(2)}$, which should be more computationally efficient.
Let $(X, Y)^T$ follow a lognormal distribution with parameters $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$. Then the marginal distributions are $F(x) = \Phi((\log x - \mu_1)/\sigma_1)$ and $G(y) = \Phi((\log y - \mu_2)/\sigma_2)$, respectively. $EX = \exp(\mu_1 + \sigma_1^2/2)$ and $E(X|Y) = \exp(\mu_1 + \rho\sigma_1(\log Y - \mu_2)/\sigma_2 + \sigma_1^2(1 - \rho^2)/2)$ by [23]. We have [equation missing] The last equation is due to (22). Also [equation missing] Thus, [equation missing]
A proof of the proposition follows directly from the strong consistency of the U-statistics $U_1, U_2, U_3, U_4$ by the U-statistics theorem [34] and the fact that $\hat r_g^{(1)}$ and $\hat r_g^{(2)}$ are continuous functions of $U_1, U_2, U_3, U_4$. By the continuous mapping theorem [34], the strong consistency of $\hat r_g^{(1)}$ and $\hat r_g^{(2)}$ holds.
Proof of Propositions 4.2 and 4.3. The asymptotic normality of $\hat r_g^{(1)}$, and of $\hat r_g^{(2)}$ when $r_g^{(2)} \neq 0$, is an immediate result of applying the influence function approach [14].
statistics theorem and the continuous mapping theorem [34]. We need to explore the limiting distribution of |U U |. Then by Slutsky's theorem [38], the limiting distribution of $\hat r_g^{(2)}$ follows. Now consider U U , the product of two U-statistics. We have [equation missing] where Rn = op(n − ) and the symmetric kernel is g(z_1, z_2, z_3, z_4) = (1/4!) Σ_p h (z_{i_1}, z_{i_2}) h (z_{i_3}, z_{i_4}), with Σ_p denoting summation over the 4! permutations $(i_1, i_2, i_3, i_4)$ of $(1, 2, 3, 4)$. Define the new U-statistic $U_n = \binom{n}{4}^{-1} \sum_{1 \le i < j < k < l \le n} g(Z_i, Z_j, Z_k, Z_l)$. It is easy to check that U U is asymptotically equivalent to $U_n$. Now consider the first order and second order projections of the kernel g. We define [equation missing]
Y) for any constant c, d and nonzero a, b.
where $\|\cdot\|$ is the Euclidean norm and U is uniformly distributed on the unit sphere. When d = 1, the class of elliptical distributions coincides with the location-scale class. For d = 2, let $Z = (X, Y)^T$ and $\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}$; then the corresponding linear correlation coefficient of X and Y is $\rho = \rho(X, Y) := \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}$.

Remark 4.1.
If $\gamma_1 = \gamma_2 \neq 0$, we have $v_g^{(1)} = v_g^{(2)}$, meaning that the two estimators $\hat r_g^{(1)}$ and $\hat r_g^{(2)}$ have the same statistical efficiency.

Proposition 4.4.
Let $Z_1, Z_2, \ldots, Z_n$ be a random sample from a 2-dimensional distribution H with finite second moment. When $r_g^{(2)} = 0$, we have
1. If γ ≠ , $\hat r_g^{(2)}$ converges to the square root of a folded normal random variable. That is, $n^{1/2} (\hat r_g^{(2)})^2 \xrightarrow{d} |Z|$, where Z is a normal random variable with mean zero and variance given in the proof.
2. If γ = , we have √ s − ), where $\Delta_1 = \mathrm{Cov}(X, F(X))$ and $\Delta_2 = \mathrm{Cov}(Y, G(Y))$ are Gini's mean differences for F and G, respectively, $\chi^2_{1s}$ ($s = 1, 2, \ldots$) are independent $\chi^2_1$ variables and $\{\lambda_s\}$ ($s = 1, 2, \ldots$) are coefficients given in the proof.

Figure 3: Scatter plots of GDP versus Suicide Rate and of log(GDP) versus Suicide Rate in different years. A cubic smoothing spline fitting curve is added in each plot.
Fig. 3 are almost linear in 2015 and 2016, suggesting the use of the Pearson correlation. Other correlations are preferable in this example.
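The outlier-deletion check described for Table 6 can be mimicked on toy data. The Python sketch below uses made-up numbers rather than the GDP and suicide-rate data, and the threshold of 20, the helper names, and the rank-based plug-in Gini estimate are all assumptions chosen for illustration. It reports each correlation before and after removing observations whose second coordinate exceeds the threshold.

```python
import math


def _cov(a, b):
    """Sample covariance with divisor n."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / n


def _ranks(v):
    """1-based ranks from a single sort; ties are not averaged."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r


def pearson(x, y):
    return _cov(x, y) / math.sqrt(_cov(x, x) * _cov(y, y))


def mean_gini(x, y):
    """Arithmetic mean of the two rank-based plug-in Gini correlations."""
    g_xy = _cov(x, _ranks(y)) / _cov(x, _ranks(x))
    g_yx = _cov(y, _ranks(x)) / _cov(y, _ranks(y))
    return (g_xy + g_yx) / 2


def change_after_trimming(x, y, threshold):
    """Return {name: (estimate on all data, estimate with y <= threshold)}."""
    kept = [(u, v) for u, v in zip(x, y) if v <= threshold]
    xk = [u for u, _ in kept]
    yk = [v for _, v in kept]
    estimators = (("pearson", pearson), ("mean_gini", mean_gini))
    return {name: (f(x, y), f(xk, yk)) for name, f in estimators}


# Toy data (hypothetical): a perfect increasing trend plus one point far off it.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [40, 2, 3, 4, 5, 6, 7, 8, 9]
result = change_after_trimming(x, y, threshold=20)
```

On this toy sample the Pearson estimate moves further than the averaged Gini estimate once the single extreme point is dropped, which is the qualitative pattern the experiment looks for. One toy example does not establish that ordering in general.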
U and U are always positive. The denominator U U converges to $\sqrt{\Delta_1 \Delta_2}$ almost surely by the U-

Table 4: The mean and standard deviation (in parentheses) of $\sqrt{n}$ RMSE of each correlation estimator in the contaminated normal data.

Table 5: All types of correlations for (GDP, SR) and (log(GDP), SR), respectively. The standard deviations are in parentheses.