Bivariate box plots based on quantile regression curves

Abstract: In this paper, we propose a procedure to build bivariate box plots (BBP). We first obtain the theoretical BBP for a random vector (X, Y). They are based on the univariate box plot of X and the conditional quantile curves of Y|X. They can be computed from the copula of (X, Y) and the marginal distributions. The main advantage of these BBP is that the coverage probabilities of the regions are distribution-free, so they can be selected by the users with the desired probabilities, and the BBP can be used to perform goodness-of-fit tests. Three reasonable options are proposed. They are illustrated with two examples: a normal model and an exponential model with a Clayton copula. Moreover, several methods to estimate the theoretical BBP are discussed. The main ones are based on linear and non-linear quantile regression. The others are based on empirical estimators and parametric and non-parametric (kernel) copula estimations. All of them can be used to get empirical BBP. Some extensions to the multivariate case are proposed as well.


1 Introduction
Tukey's (univariate) box plots are very useful tools to analyse and compare data (random variables) both in location (mean, median, quantiles, etc.) and in dispersion (range, interquartile range, etc.). They can also be used to detect outliers.
When we analyse bivariate (paired) data we have different options to define bivariate box plots (BBP). From a theoretical point of view, when we study the (real-valued) random vector (X, Y), we can use the regions determined by the Mahalanobis distance and the associated lower bounds for their probabilities provided by the multivariate Chebyshev inequality (see [5,17]). Exact probabilities can be computed for normal (Gaussian) random vectors by using the Chi-squared distribution (see p. 39 in [14]). Extensions to the multivariate case can be obtained through a standard principal component analysis (see [17]). In practice, we can use the regions determined by the estimated means, variances and covariances or those provided by the empirical distribution of the data set (see [16]).
Another popular empirical option is to use the bagplots proposed in [20]. They are based on the concept of depth and provide a region (called the bag) around the depth median which contains 50% of the data. This region is inflated by a factor of three to obtain the fence which separates inliers from outliers.
A third good option is provided by the quantile regression curves. This option is especially useful when we want to estimate Y (response variable) from X (explanatory variable). In the general case we could use the regions proposed in [6] instead. The theoretical quantile regression curves can be obtained from the copula of (X, Y) (see p. 217 in [19]). In practice we can use the empirical linear or non-linear estimators proposed in [10,11]. The package quantreg for the statistical program R can be used for this purpose. A short introduction is given in Section 3 below.
In this paper, we use the quantile regression curves to propose a procedure to build bivariate box plots (BBP). The theoretical BBP are studied in Section 2. The main advantage of this procedure is that, by using conditional distributions, we can choose the exact probability of each region. Three reasonable options are proposed. The empirical bivariate versions are discussed in Section 3. They are obtained by using different techniques to estimate the theoretical regions. Some extensions to the multivariate case are proposed in Section 4. The conclusions are given in Section 5.
If f : Rⁿ → R is a real-valued function of n variables, then ∂ i f denotes the partial derivative of f with respect to its i-th variable. Analogously, ∂ i,j f := ∂ i ∂ j f and so on. Whenever we use a partial derivative, we tacitly assume that it exists.

2 Theoretical BBP

2.1 Definitions
The Tukey's univariate box plot associated to a sample X1, ..., Xn from a random variable X is usually determined by the inverse of the function gn obtained by connecting the points (X i:n, (i − 1)/(n − 1)), i = 1, ..., n (see, e.g., [8]), where X 1:n ≤ · · · ≤ X n:n represent the ordered data. There are different options to define the theoretical univariate box plots. In this case, the Tukey's box plot is replaced by the five points method based on the mean (or the median), two moderate quantiles and two extreme quantiles. If F is the distribution function of X, the (upper) quantile (generalized inverse) function is usually defined as q(u) = F⁻¹(u) := sup{x : F(x) ≤ u} for u ∈ (0, 1). Other authors prefer to use the so called lower quantiles defined as q⁻(u) := inf{x : F(x) ≥ u}. If F is continuous and strictly increasing, they coincide for 0 < u < 1. The quantile function q is available in many statistical programs (as R) for the usual probability models. The theoretical median and quartiles can be defined as me = q2 = q(1/2), q1 = q(1/4) and q3 = q(3/4) (other definitions can be considered as well).
Hence the theoretical box can be defined as [q1, q3]. One can also consider different options to define the theoretical whiskers. As in the empirical case, the limit points (or fences) can be defined as l1 := q1 − 1.5(q3 − q1) and l2 := q3 + 1.5(q3 − q1). The main problem with this definition is that the probabilities Pr(X ∈ [l1, q1]) and Pr(X ∈ [q3, l2]) depend on F. Instead we could consider the probabilities associated to the quantiles q1 − 1.5(q3 − q1) and q3 + 1.5(q3 − q1) in a normal (Gaussian) model. In this model we have Pr(X < l1) = Pr(X > l2) = 0.0035. Therefore, we can define the theoretical limit points for a general distribution F as ℓ1 := q(0.0035) and ℓ2 := q(1 − 0.0035) = q(0.9965). With these definitions, we have Pr(X < ℓ1) = Pr(X > ℓ2) = 0.0035 for any continuous distribution function F. In the normal model both definitions coincide but in other models l i and ℓ i can be different for i = 1, 2. Note that Pr(X < ℓ1) + Pr(X > ℓ2) = 2 · 0.0035 = 0.007.
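The fence probabilities in the normal model can be checked numerically. A minimal Python sketch (the computations in this kind of analysis are often done in R; scipy is assumed here):

```python
from scipy.stats import norm

# Tukey fences for a standard normal: q1 - 1.5*IQR and q3 + 1.5*IQR
q1, q3 = norm.ppf(0.25), norm.ppf(0.75)
iqr = q3 - q1
l1, l2 = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Probability outside the fences (about 0.7% in total in the normal model)
p_out = norm.cdf(l1) + norm.sf(l2)

# Distribution-free alternative: define the limit points directly as the
# quantiles q(0.0035) and q(0.9965), so the coverage is the same for any
# continuous distribution function F.
ell1, ell2 = norm.ppf(0.0035), norm.ppf(0.9965)
```

In the normal model the two definitions essentially coincide (l1 ≈ ℓ1 and l2 ≈ ℓ2), while for other models only the quantile-based limits keep a fixed coverage.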
Therefore, approximately, 0.7% of the data from X will be classified as (false) outliers, that is, the theoretical box plot will contain 99.3% of the data. If we want different coverage probabilities, we can choose other quantiles for the points method. Now we are ready to propose a procedure to define theoretical bivariate box plots (BBP) for a bivariate random vector (X, Y) with an absolutely continuous distribution function F (this assumption can be relaxed later). To this end we are going to use the points method and conditional distributions. For instance, we can consider the conditional distribution (Y|X = x), that is, we consider Y as a response variable and X as an explanatory variable. Then we propose the regions defined as

R := {(x, y) : a ≤ x ≤ b, α(x) ≤ y ≤ β(x)},   (2.1)

where [a, b] is an interval with p1 := Pr(X ∈ [a, b]), and α(x) := q2|1(u|x) and β(x) := q2|1(v|x) are conditional quantile curves of (Y|X = x) for fixed quantiles u and v such that 0 < u < v < 1. Then the following result holds.

Theorem 2.1. For the region R in eq. (2.1) we have Pr((X, Y) ∈ R) = p1 p2, where p2 := v − u.

Proof. From eq. (2.1),

Pr((X, Y) ∈ R) = ∫_a^b Pr(α(x) ≤ Y ≤ β(x) | X = x) f1(x) dx,

where f1 is the probability density function (pdf) of X. By using now that α and β are quantile functions of the conditional random variable (Y|X = x), we get Pr(α(x) ≤ Y ≤ β(x) | X = x) = v − u = p2 for all x with f1(x) > 0, and the result follows.

The preceding theorem is the key tool for defining the theoretical bivariate box plots. Note that here we can use the copula approach to determine the conditional quantile functions F2|1⁻¹ (see, e.g., [19, p. 217]). The result can be stated as follows.

Proposition 2.1. If (X, Y) has copula C and continuous marginal distribution functions F1 and F2, then

q2|1(v|x) = F2⁻¹(du⁻¹(v)) with u = F1(x),   (2.2)

where du(v) := ∂ 1 C(u, v).
Proof. From Sklar's theorem, F(x, y) = C(F1(x), F2(y)), and so the joint pdf is f(x, y) = c(F1(x), F2(y)) f1(x) f2(y) for all x, y, where f1 = F1′ and f2 = F2′ represent the marginal pdf functions and c = ∂ 1,2 C is the pdf of the copula C. Therefore, the conditional pdf of (Y|X = x) is f2|1(y|x) = c(F1(x), F2(y)) f2(y) for x such that f1(x) > 0. Then the conditional distribution function can be represented as F2|1(y|x) = du(F2(y)) with u = F1(x), where du(v) := ∂ 1 C(u, v) for 0 < v < 1, du(0) := 0 and du(1) := 1, since we assume lim v→1 ∂ 1 C(u, v) = 1 for all 0 < u < 1 (see [18] or [19, p. 217]). Therefore (2.2) holds.
The function du : [0, 1] → [0, 1] defined in the preceding proof is a distortion function for all 0 < u < 1, that is, it is increasing from du(0) = 0 to du(1) = 1 (see [18]). Note that du only depends on the copula C. In many copula models, it is continuous and strictly increasing in [0, 1]. Therefore its inverse function du⁻¹ can be obtained easily (by using analytical and/or numerical methods). Properties of the conditional quantile functions (also known as Conditional Value at Risk or CoVaR) of different copulas can be seen in, e.g., [2,3,9]. Note that here we are fixing one of the infinitely many versions of the conditional distribution Y|X. In some cases, this selection might affect the shape of the BBP defined below.
The preceding proposition can be used to obtain different quantile curves. For example, the conditional median (or median regression) curve can be obtained as me2|1(x) := q2|1(1/2 | x). If we want to predict Y for a specific value of X = x, this curve is a good alternative to the classical mean regression curve m2|1(x) := E(Y|X = x). Moreover, it can be used to define a conditional median vector as follows.
Definition 2.1. The conditional median vector for X and Y|X is defined as Me2|1 := (me X, me2|1(me X)), where me X := F1⁻¹(0.5) is the median of X and me2|1 is the median regression curve of Y|X.
The conditional median vector Me1|2 for Y and X|Y can be defined in a similar way. It is in general different from Me2|1 (some examples will be given later) and both are different from the mean vector (or the conditional mean vectors, defined similarly). Analogously, the conditional (theoretical) quartile curves are defined by q(i)2|1(x) := q2|1(i/4 | x) for i = 1, 2, 3. Of course, q(2)2|1 = me2|1. As mentioned above, the function du only depends on the copula. For many copulas, we can obtain an explicit expression for du⁻¹. For example, the explicit expression for Archimedean copulas was given in [19, p. 218]. If we cannot obtain an explicit expression, we can use numerical methods to approximate du⁻¹. If we do not want to use copulas, we can choose models with known conditional distributions (e.g. a normal distribution) or models built by using specific conditional distributions (see [1]).
The regions given in eq. (2.1) with appropriate conditional quantile functions α and β can be used to define the theoretical BBP. We propose here three reasonable options. In all of them, we use the points method applied to the marginal distribution of X and the conditional distribution of Y|X. Other options can be considered as well.
Definition 2.1: BBP, option 1. In this option, the central region R cc is obtained with p1 = p2 = 1/2. In this case, the interval [a, b] coincides with the theoretical (univariate) box for X (i.e. a = q1(X) = F1⁻¹(1/4) and b = q3(X) = F1⁻¹(3/4)) and the curves α and β coincide with the first and the third quartile regression curves (i.e., α = q(1)2|1 and β = q(3)2|1). In the middle of the region we can plot the median regression curve me2|1 = q(2)2|1 (see the examples below). Hence, from Theorem 2.1, the region R cc will contain (approximately) 25% of the data since p1 p2 = 1/4. Next we can define the central-top R ct and central-bottom R cb regions with p1 = 1/2 and p2 = 1/4 − 0.0035 = 0.2465. Both are obtained with [a, b] as above (i.e. the box of X). Then R ct is obtained with α = q(3)2|1 and β(x) = q2|1(0.9965|x), and R cb is obtained with α(x) = q2|1(0.0035|x) and β = q(1)2|1 (i.e. the limit points of the theoretical whiskers of Y|X). Then p1 p2 = 0.1233 (approximately) in each of these two regions. Analogously, the left-central R lc and right-central R rc regions are obtained with p1 = 1/4 − 0.0035 = 0.2465 and p2 = 1/2. The first one is obtained with a = ℓ1 and b = q1(X) and the second with a = q3(X) and b = ℓ2 (i.e. the limit points of the theoretical whiskers of X). Then, in both cases, we take α = q(1)2|1 and β = q(3)2|1. Therefore p1 p2 = 0.1233 as in the preceding regions. The four regions R ct, R cb, R lc, R rc together will contain (approximately) 49.3% of the data. Finally, we define in a similar way the four corner regions R lt, R lb, R rt, R rb (left-top, left-bottom, right-top and right-bottom) with p1 = p2 = 1/4 − 0.0035 = 0.2465 and p1 p2 = 0.0608. Hence, the four corner regions together contain (approximately) 24.3% of the data. The data out of these nine regions can be considered as possible outliers and 1 − 0.993² = 0.0139 is the probability of a false outlier (i.e. approximately 1.4% of the data from (X, Y) will be considered as possible outliers).
Some examples of these regions are given below (see Figures 1 and 5, left). There we add some simulated data (right plots). We also perform goodness-of-fit tests based on the Chi-squared distribution and these regions.

Definition 2.2: BBP, option 2. In this option, the central region R cc is obtained with p1 = p2 = √(1/2) ≈ 0.7071. However, in this case, the interval [a, b] does not coincide with the theoretical (univariate) box for X, since a = F1⁻¹(0.1464) and b = F1⁻¹(0.8536). The same happens with the curves α and β, which do not coincide with the quartile regression curves. They are defined as α(x) = q2|1(0.1464|x) and β(x) = q2|1(0.8536|x). In the middle of the region, as in option 1, we plot the median regression curve me2|1. Hence, from Theorem 2.1, the region R cc defined in this way will contain (approximately) 50% of the data (since p1 p2 = 1/2).
To complete this option we just need to define the other regions as in option 1 but changing the coverage probabilities to get simple regions. For example, a reasonable option could be to define the limit points as ℓ1 = F1⁻¹(0.005) and ℓ2 = F1⁻¹(0.995). In this case Pr(ℓ1 ≤ X ≤ ℓ2) = 0.99. We can do the same with the conditional distributions by choosing the quantile functions q2|1(0.005|x) and q2|1(0.995|x). Hence the nine regions of the BBP will contain (approximately) 98% of the data and 2% of the data will be classified as possible (false) outliers. The four regions R ct, R cb, R lc, R rc will each contain (approximately) 10% of the data and all together 40%. Analogously, the corner regions R lt, R lb, R rt, R rb will each contain (approximately) 2% and all together 8%. Some examples are given below (see Figures 2 and 5, right).

Definition 2.3: BBP, option 3.
Another option to get a central region R cc containing 50% of the data is to choose p1 = 1 and p2 = 0.5, that is, a = −∞, b = ∞, α(x) = q2|1(0.25|x) and β(x) = q2|1(0.75|x). This option is not new since it is a popular option in quantile regression plots (see [10,11]) and the resulting region can be considered as a 50% confidence band for the predictions for Y obtained by using the median regression curve. Note that it also coincides with the union of all the theoretical boxes for Y|X = x for x ∈ R.
Here we also have several options for the whiskers. The most habitual one (option 3.1) in quantile regression plots is to use q2|1(0.005|x) and q2|1(0.995|x) for the lower and upper limits (as in option 2 above). With this choice the three regions all together will contain 99% of the data and they can also be considered as a 99% confidence band for Y|X. The remaining 1% of the data will be considered as false outliers. Instead (option 3.2) we can use the limits for the theoretical whiskers (used in option 1 above) obtained with the limit curves q2|1(0.0035|x) and q2|1(0.9965|x). This is equivalent to using the theoretical box plots for Y|X = x at every point x ∈ R. In all these choices we maintain a = −∞ and b = ∞. Thus, the main advantage of option 3 is that we just have three regions (as in the univariate box plots). The main disadvantage is that we will not be able to detect outliers with extreme values of X but close to the median regression curve (see Figure 3), since we are not using a univariate box plot for X to determine [a, b].

2.2 Main properties
The BBP defined above have the following properties. The proofs are immediate from Theorem 2.1 and the properties of the regression quantile functions.
Proposition 2.2. If R is a region as in eq. (2.1) for (X, Y) with coverage probability p = p1 p2, then R* := {(h1(x), h2(y)) : (x, y) ∈ R} is the analogous region for (h1(X), h2(Y)), for all increasing functions h1, h2 and 0 < u < v < 1. This property is based on Proposition 2.1 and the well known fact that the copula (and so du) does not change under increasing transformations. Therefore, R* is a p = p1 p2 confidence region for (h1(X), h2(Y)) based on quantile regression curves.
In particular, the BBP proposed above are equivariant under location-scale transformations with a positive scale parameter (i.e., when we replace X and Y with aX + b and cY + d for a, c > 0). However, the BBP are not in general equivariant under interchanging X and Y (since they are based on quantile regression curves). If we want plots equivariant in this sense we should use bagplots (see [20]) or the confidence regions based on the multivariate Chebyshev inequality (see [16]).

Proposition 2.3. The coverage probabilities and the expected values for the regions of the BBP in the different options considered above are distribution-free.
The proof is immediate from Theorem 2.1. The definitions given above of the different regions can be modified to get the desired (theoretical) coverage probabilities.
By prolonging the curves/lines used to define the regions proposed in options 1 and 2 above, the possible outliers can also be classified in different regions that indicate how extreme they are (the most extreme ones will be those included in the four corner regions).

2.3 Two theoretical examples
Let us see how to compute the theoretical BBP in two different models: a normal (Gaussian) model (next example) and an exponential model with a Clayton copula (Example 2.2).

Example 2.1. A typical (relevant) case is to consider a normal model. In this case the conditional distributions are known and the quantile regression curves are straight lines (this fact will simplify the estimation procedure considered in the next section). For example, we can consider two standard normal distributions with a positive correlation ρ. Then the conditional distribution (Y|X = x) has a normal distribution with mean ρx and variance 1 − ρ² (see, e.g., [14, p. 63]). Moreover, the conditional median regression curve coincides with the (mean) regression curve, that is, me2|1(x) = m2|1(x) = ρx. In this case, the conditional median vectors coincide and are equal to the mean vector, that is, Me2|1 = Me1|2 = (0, 0). Analogously, the conditional quantile function is q2|1(v|x) = ρx + qnorm(v, 0, σ), where qnorm(v, 0, σ) is the quantile function of a normal distribution with zero mean and standard deviation σ = √(1 − ρ²).
The first and third quartiles of X (and Y) satisfy −q1 = q3 = 0.6745 and the limits for the whiskers satisfy −ℓ1 = ℓ2 = 2.698.
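These values and the conditional quantile lines can be checked numerically. A minimal Python sketch (the paper's own computations use R); the correlation ρ = 0.5 below is an illustrative assumption, not necessarily the value used in the paper's figures:

```python
import numpy as np
from scipy.stats import norm

rho = 0.5                        # illustrative correlation (an assumption)
sigma = np.sqrt(1.0 - rho ** 2)  # standard deviation of Y | X = x

def cond_quantile(v, x):
    """Quantile regression line q(v|x) = rho*x + qnorm(v, 0, sigma)
    for a standard bivariate normal (X, Y)."""
    return rho * x + sigma * norm.ppf(v)

# Marginal quartile and whisker limit of X (standard normal)
q3 = norm.ppf(0.75)              # about 0.6745
ell2 = norm.ppf(0.9965)          # about 2.70
```

Since the conditional quantile curves are parallel straight lines, the median regression line cond_quantile(0.5, x) coincides with the mean regression line ρx.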
As mentioned above, in this case we have ℓ i = l i for i = 1, 2. The nine regions of the BBP (option 1) are plotted in Figure 1, left: the central (blue) region contains 25% of the data and the nine regions all together 98.6%. The conditional median vector Me2|1 = (0, 0) is represented by the + symbol. The data outside these nine regions can be considered as possible outliers. Note that these outliers can also be classified in different regions. For example, a datum on the left (i.e. with X < ℓ1) is just an outlier in the box plot of X. If it is also in the top (i.e. with Y > β(X)), then it is also an outlier in the conditional distribution Y|X.
In the right plot of Figure 1, we add n simulated data from this model. With these data we obtain the corresponding estimations of the median and quartiles of X. In this plot we have two possible (false) outliers. The first one (right-top) is also an outlier in the box plot of X. The second one (bottom) is not detected as an outlier for X. However, it is classified as a possible (false in this case) outlier in the conditional distribution of Y|X since Y < α(X) (by a small margin). The counts of the data in each region are given in Table 1. There, we also provide the expected data values in each region obtained as n p1 p2. Thus, we can perform a Chi-squared fit test with the statistic

T = Σ i=1..m (O i − E i)² / E i,

where O i represents the observed data in each region, E i the expected data and m the number of regions. We can consider m = 10 regions by including a common region for the outliers. It is well known that, with the BBP obtained by using the correct (exact) distribution (null hypothesis), the asymptotic distribution of T (when n → ∞) is a Chi-squared distribution with m − 1 degrees of freedom. The P-value for this test is P-value = Pr(T > To), where To is the observed value of T. The Chi-squared distribution gives a good approximation if all the expected values E i are greater than 5 (see, e.g., [7]). So, if it is necessary, some regions can be joined. If the distribution contains k unknown parameters and they are estimated with a maximum likelihood method, then the asymptotic distribution of T is Chi-squared with m − k − 1 degrees of freedom. With these simulated data and the ten regions of option 1, we obtain a large P-value (see Table 1), where T has (approximately) a Chi-squared distribution with 9 degrees of freedom. Hence, as expected, we do not reject that the data come from this model (we have an expected value less than 5, but the P-value is large enough to support this decision). Note that with the BBP in option 1 (and the correct distribution), the expected values in the regions will only depend on n for any joint distribution F (i.e. they are distribution-free).
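The distribution-free region probabilities and the fit test can be sketched as follows. A Python illustration assuming scipy; the observed counts below are invented for illustration and are not the paper's Table 1:

```python
import numpy as np
from scipy.stats import norm, chisquare

# Distribution-free probabilities of the ten regions in BBP option 1
tail = norm.sf(2.698)                  # about 0.0035, one whisker tail
band = 0.25 - tail                     # quartile-to-fence band, about 0.2465
probs = np.array([0.25]                # central region R_cc
                 + [0.5 * band] * 4    # R_ct, R_cb, R_lc, R_rc
                 + [band ** 2] * 4)    # four corner regions
probs = np.append(probs, 1.0 - probs.sum())   # common outlier "region"

n = 200                                # hypothetical sample size
expected = n * probs                   # E_i = n * p1 * p2
observed = np.array([52, 26, 24, 25, 23, 12, 13, 11, 12, 2])  # invented

# Pearson Chi-squared fit statistic and P-value with m - 1 = 9 df
T, pvalue = chisquare(observed, f_exp=expected)
```

The outlier "region" gets an expected count below 5 here, which is why the text suggests joining regions when necessary.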
The BBP in options 2 and 3 can be obtained in a similar way. They are plotted in Figures 2 and 3, respectively. They can also be used to perform fit tests. As expected, in the BBP of option 2 we have several false outliers but just one extreme outlier (right corner), which lies outside both the region determined by the right vertical red line (ℓ2) and that determined by the top red line (β). Instead, the central (blue) region R cc contains approximately the expected number of data. In the BBP of option 3.1 we also have many false outliers, above the top red line and below the bottom red line. In this case the most extreme datum is the farthest point from the median regression estimation. If we use the limits in option 3.2 (red dashed lines in Figure 3), then this point is the unique outlier. In this case, the point in the right-top corner is not detected as an outlier (remember that it was classified as an outlier in the univariate box plot of X). The same will happen with extreme values of X or Y close to the median regression curve (black line). In the right plot we add the simulated data from this model.

Example 2.2.
There are more models (copulas) whose quantile regression curves are straight lines (see, e.g., Example 5.24 in [19, p. 218]). However, in this second example, we want to choose a model without this property (this fact will hinder the estimation procedures studied in the next section). Thus, we consider the Clayton copula

C(u, v) = uv / (u + v − uv)

(the Clayton copula with dependence parameter θ = 1), which induces a positive dependence between X and Y. As C is the distribution function of U = F1(X) and V = F2(Y), we can just consider the BBP for (U, V). The similar plots for (X, Y) will be obtained with the transformations X = F1⁻¹(U) and Y = F2⁻¹(V) for any continuous marginal distribution functions F1 and F2 and Proposition 2.2.
As the Clayton copulas belong to the wide family of Archimedean copulas, the quantile regression curves can be obtained from the expression given in Example 5.25 of [19, p. 218]. For the copula C given above we can also use a direct calculation as follows. The distortion function du for the conditional distribution Y|X is

du(v) = ∂ 1 C(u, v) = v² / (u + v − uv)²

for 0 ≤ v ≤ 1 and 0 < u < 1. Note that du is the distribution function of V|U = u for 0 < u < 1. It is plotted in Figure 4, left, for different values of u. Its inverse function is

du⁻¹(v) = u √v / (1 − (1 − u) √v)

for 0 ≤ v ≤ 1 and 0 < u < 1. Hence, from eq. (2.2), the quantile regression curves are

q2|1(v|x) = F2⁻¹(du⁻¹(v)) with u = F1(x)

for 0 < v < 1 and x such that f1(x) > 0. In particular, for (U, V), which have uniform marginal distributions in (0, 1), we get q2|1(v|u) = du⁻¹(v) = u √v / (1 − (1 − u) √v) for 0 < u < 1. Thus, for example, the median regression curve for V|U = u is me2|1(u) = u √0.5 / (1 − (1 − u) √0.5).
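These formulas can be checked numerically. A Python sketch, taking the Clayton copula with θ = 1, C(u, v) = uv/(u + v − uv), as an assumed working example (this particular θ is an assumption consistent with the √v appearing in the inverse):

```python
import numpy as np

def C(u, v):
    """Clayton copula with theta = 1 (assumed value)."""
    return u * v / (u + v - u * v)

def d_u(u, v):
    """Distortion d_u(v) = ∂_1 C(u, v), the cdf of V | U = u."""
    return (v / (u + v - u * v)) ** 2

def d_u_inv(u, w):
    """Inverse distortion: the conditional quantile function of V | U = u."""
    s = np.sqrt(w)
    return u * s / (1.0 - (1.0 - u) * s)

# Median regression curve of V | U = u (a non-linear, increasing curve)
u = np.linspace(0.05, 0.95, 19)
median_curve = d_u_inv(u, 0.5)
```

A quick round-trip check d_u(u, d_u_inv(u, w)) = w confirms that the closed-form inverse matches the partial derivative of C.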
In particular, Me2|1 = (0.5, 0.547) (approximately). By the symmetry of the model, we have Me1|2 = (0.547, 0.5) and so, in this case, they do not coincide. Note that they are also different from the mean and median vectors, which coincide at the point (E(U), E(V)) = (me(U), me(V)) = (0.5, 0.5).
The quantile regression curves q2|1 can also be used to get the BBP in options 1 and 2. They are plotted in Figure 5 jointly with simulated data from this model. These simulated data were obtained by using the standard conditional method. Thus we first generate random data u1, ..., u100 from U (i.e. 100 random data in (0, 1)), and then we get one random datum from V|U = u i by using q2|1 for i = 1, ..., 100 (i.e. with the inverse transform method). As in the preceding example, the BBP could be used to perform fit tests for the copula (when we are not sure about the exact copula).
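The conditional sampling scheme just described (inverse transform through the conditional quantile function) can be sketched in Python; the Clayton copula with θ = 1, C(u, v) = uv/(u + v − uv), is used here as an assumed working example:

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 100
u = rng.uniform(size=n)        # u_1, ..., u_100 from U (uniform on (0, 1))
w = rng.uniform(size=n)        # auxiliary uniforms for the inverse transform

# One datum from V | U = u_i via the inverse distortion: for Clayton with
# theta = 1, d_u^{-1}(w) = u*sqrt(w) / (1 - (1 - u)*sqrt(w))
s = np.sqrt(w)
v = u * s / (1.0 - (1.0 - u) * s)
```

The pairs (u, v) form a sample from the copula itself; applying any continuous marginal quantile functions to u and v afterwards gives a sample from the corresponding bivariate model.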
If, for example, the marginal distributions are standard exponentials, that is, F1(x) = F2(x) = 1 − e^(−x) for x ≥ 0, then by applying the inverse transforms F i⁻¹(y) = −ln(1 − y) for i = 1, 2, we obtain the data and the BBP (options 1 and 2) plotted in Figure 6.

3 Practical BBP
As mentioned at the beginning of Section 2, the univariate box plots are an empirical tool. Therefore, it is very important to obtain the practical BBP associated with the theoretical ones proposed in the preceding section.
To this end we can maintain the definitions (options) and just estimate the theoretical regions, or we can modify the definitions by using the sample. In this section (X1, Y1), ..., (Xn, Yn) represent a sample of i.i.d. random vectors from (X, Y).
As the theoretical BBP in option 1 of the preceding section is based on the univariate box plot for X, in practice we have two options for the limit points ℓ1 and ℓ2.

Option 1.a (five points method):
Here we can maintain the definitions and use the function gn obtained from the data X1, ..., Xn (see the first paragraph of Section 2) to estimate ℓ1 and ℓ2 by ℓ̂1 := gn⁻¹(0.0035) and ℓ̂2 := gn⁻¹(0.9965).

Option 1.b (Tukey's method):
Alternatively, we can just use the empirical (Tukey's) box plot for X, that is, we can use the limits L1 and L2 for the whiskers as vertical limits for the BBP.
Note that this second option cannot be applied to the conditional distributions Y|X = x since, in continuous models, we just have one datum (X i, Y i) for each value x = X i, i = 1, ..., n, and we do not have data for the other values of x. Hence we need to estimate the conditional quantile function q2|1 and use the quantile limits proposed in options 1-3. Fortunately, we have several techniques available in the literature (and in some statistical programs) for this purpose. Let us see some of them.

3.1 Linear quantile regression
The linear quantile regression was proposed by Koenker and Bassett in [11] (see also [10]) as an alternative to the well known linear regression model. The key idea is to replace conditional means by conditional medians. It is well known that the empirical median m̂ X = gn⁻¹(0.5) of X1, ..., Xn minimizes the mean absolute error MAE(x) = (1/n) Σ i=1..n |X i − x|, while the empirical mean X̄ minimizes the mean squared error MSE(x) = (1/n) Σ i=1..n (X i − x)². Thus, in a linear median regression model, we consider the linear function y = a + bx which minimizes MAE(a, b) = (1/n) Σ i=1..n |Y i − a − bX i| instead of the MSE used in the linear regression model. The empirical linear median regression curve is then defined as m̂2|1(x) = â + b̂x, where â, b̂ ∈ R are (one of) the solutions of this minimization problem. This definition can be extended when the random variable X is replaced with a random vector (see [11]).
Analogously, the q-th sample quantile x̂ q of X is defined as a solution of

min x (1/n) Σ i=1..n ρ q(X i − x), where ρ q(r) := r (q − 1(r < 0)) is the check function,

for 0 < q < 1. Note that this definition does not coincide with that used above (and in the default method of R) to get the quartiles Q1 and Q3 and to build the univariate box plots (see the first paragraph of Section 2 or [8]). This minimization problem can be used to define the linear regression quantile function as q̂2|1(v|x) = â v + b̂ v x, where â v, b̂ v ∈ R are (some of) the solutions of

min a,b (1/n) Σ i=1..n ρ v(Y i − a − bX i)

for a fixed v ∈ (0, 1). These linear regression quantiles can be obtained in R with the package quantreg (see [13]). After installing this package, the command rq(Y∼X, v) provides the estimated linear regression quantiles for 0 < v < 1, where X and Y are two columns containing the paired data (X i, Y i). Clearly, q̂2|1(v|x) can be considered as an estimator of the theoretical curve q2|1(v|x) defined in the preceding section. Of course, it will be a better estimator if the theoretical curves are straight lines.
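This minimization can also be sketched without the quantreg package. A Python illustration that fits linear regression quantiles by minimizing the check (pinball) loss directly, on data simulated from an assumed normal model with ρ = 0.5 (Nelder-Mead is used only for simplicity; quantreg's interior-point methods are far more efficient):

```python
import numpy as np
from scipy.optimize import minimize

def pinball(params, x, y, v):
    """Mean check loss rho_v for the line y = a + b*x."""
    a, b = params
    r = y - a - b * x
    return np.mean(np.maximum(v * r, (v - 1.0) * r))

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
y = 0.5 * x + np.sqrt(0.75) * rng.normal(size=n)   # rho = 0.5 (assumed)

# One (a_v, b_v) pair per quantile level v
fits = {v: minimize(pinball, [0.0, 0.0], args=(x, y, v),
                    method="Nelder-Mead").x
        for v in (0.25, 0.5, 0.75)}
```

Since the true quantile curves of this model are parallel lines with slope ρ, the three estimated slopes should be close to 0.5, with the intercepts ordered by the quantile level.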
Thus, if we use the data in Example 2.1 (obtained from a normal model), we obtain the BBP (option 1.b) in Figure 7. The continuous lines represent the theoretical regression quantile functions (plotted also in Figure 1) and the dashed lines the associated estimations. The symbols + represent the conditional median vector (black) and its estimation (red). The theoretical median (and mean) regression curve me2|1 (continuous black line) is estimated with the median regression line (dashed red line). In this case it can also be estimated with the linear (mean) regression line (dashed green line). In the right plot, we just plot the empirical BBP and we add the data. Note that the worst estimation is obtained in the lower quantile function (bottom red lines), which fits to the lower data. That is because the lower dashed red line is a regression quantile with probability 0.0035 and so it is obviously attracted by (and even goes through) the lower outliers. As a result, the fences might not identify these outliers (but they mark them as possible outliers). Also note that the greatest datum of X is considered as an outlier and so the upper limit of X is replaced with L2 = Q3 + 1.5(Q3 − Q1) (right red vertical line). The dashed lines can be used to define the nine empirical regions R̂ as in Section 2. The counts of the data in these regions are given in Table 2. As they depend on the data, some regions may contain more data than the theoretical ones (for this sample). However, if they are used to study the data in an independent sample, the expected values should be the same.
Analogously, if we use the data in Example 2.2 (obtained from a Clayton copula), then we obtain the BBP (option 1.b) in Figure 8 for uniform (left) and exponential (right) marginals. The purpose of this figure is to show that the estimated quantile regression lines can be quite far from the theoretical ones when the latter are not straight lines. In this case they are not so bad in the left plot. Note that in the right plot we have five outliers in X due to the fact that the exponential model is quite far from the normal model. So, in the classic univariate (Tukey's) box plot of X (vertical lines), the maximum value has been replaced with L2 = Q3 + 1.5(Q3 − Q1) (detecting five false outliers). In these cases (far from the normal model), it is better to use option 1.a (i.e. the empirical points method). Moreover, with this option, we maintain the expected values in the regions. The counts for the empirical regions in the left plot are in Table 3. They do not coincide with the counts for the regions in the right plot due to the change in the right limit of X.

3.2 Non-linear quantile regression
The basic theory for the non-linear quantile regression can be seen in [12]. It is also included in the R package quantreg. Of course, this method is especially useful for models with non-linear regression curves (as the model studied in Example 2.2). However, it can also be applied to models with linear quantile regression curves. Note that the linear method can also be applied to expressions that are linear in the parameters but non-linear in X. For example, we can obtain quadratic approximations by using X and X². One might think that the fits will be better in this case because we have an additional parameter by using y = a + bx + cx². However, this is not always true. Thus, for example, the quadratic median regression curve obtained with this method will be closer to the data than the linear one but it can be a worse estimation of the theoretical median regression curve. In an extreme case we might add n parameters to get a curve which contains all the data, but this curve will not be a good approximation of the theoretical one. This fact can be observed in Figure 9, where we plot the quadratic estimations obtained from the data in Example 2.1. They can be compared with the linear estimations given in Figure 7. However, if we use the data in Example 2.2, we obtain the BBP in Figure 10 (uniform and exponential marginals). Clearly, in these cases, the quadratic approximations are better than the linear approximations given in Figure 8.
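As an illustration of a linear-in-parameters quadratic fit, a Python sketch using the same check-loss minimization on data simulated from an assumed θ = 1 Clayton copula (design matrix [1, u, u²]):

```python
import numpy as np
from scipy.optimize import minimize

def pinball(beta, X, y, v):
    """Mean check loss for the v-th quantile with design matrix X."""
    r = y - X @ beta
    return np.mean(np.maximum(v * r, (v - 1.0) * r))

rng = np.random.default_rng(11)
n = 400
u = rng.uniform(size=n)
s = np.sqrt(rng.uniform(size=n))
vv = u * s / (1.0 - (1.0 - u) * s)   # V | U = u, Clayton theta = 1 (assumed)

X = np.column_stack([np.ones(n), u, u ** 2])  # quadratic basis
beta = minimize(pinball, np.zeros(3), args=(X, vv, 0.5),
                method="Nelder-Mead").x       # quadratic median curve
```

The true median regression curve u√0.5/(1 − (1 − u)√0.5) is non-linear, so a quadratic basis can track it much better than a straight line can, which is the point made about Figures 8 and 10.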

3.3 Empirical estimators
If we prefer to have "empirical estimators" we just need to estimate the conditional median (quantiles) for Y|X = X i. For example, in a three points method, we can first sort the data from X obtaining X 1:n, ..., X n:n. Then, to estimate the quartiles of Y|X = X i:n, we can consider the three values of Y associated with X i−1:n, X i:n, X i+1:n. They are called concomitants and are usually represented as Y [i−1:n], Y [i:n], Y [i+1:n]. These values are not necessarily ordered, so we need to sort them and then estimate the quartiles of Y|X = X i:n with these three ordered values Q1 i ≤ Q2 i ≤ Q3 i (the median is estimated with the value in the middle). For the extreme points X 1:n and X n:n, we can use the nearest available points (or just use two points). For these points we have the following property: if U 1:3 ≤ U 2:3 ≤ U 3:3 are the order statistics of three independent uniform random variables on (0, 1), then E(U 2:3) = 1/2. Analogously, it can be proved easily that E(U 1:3) = 1/4 and E(U 3:3) = 3/4.
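The three points (concomitant) method can be sketched as follows in Python; repeating the nearest concomitant at the two extremes is one possible boundary convention, chosen here for simplicity:

```python
import numpy as np

def concomitant_quartiles(x, y):
    """Three-points estimates of the quartiles of Y | X = X_{i:n} from
    the sorted concomitants Y_[i-1:n], Y_[i:n], Y_[i+1:n]."""
    order = np.argsort(x)
    ys = y[order]                       # concomitants Y_[1:n], ..., Y_[n:n]
    n = len(ys)
    triples = np.empty((n, 3))
    for i in range(n):
        idx = np.clip([i - 1, i, i + 1], 0, n - 1)
        triples[i] = np.sort(ys[idx])   # ordered values Q1_i <= Q2_i <= Q3_i
    return x[order], triples

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)      # illustrative dependent pair
xs, q = concomitant_quartiles(x, y)
```

The middle column tracks the conditional median, but with only three local values the estimates are very noisy, which is why the text moves on to five- and seven-points versions.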
By applying this method to the data in Example 2.1, we obtain the estimations (continuous lines) plotted in Figure 11, left. As we can see, the estimations are not very good. If we have more data, we could use a five-point or a seven-point method. In the first case, the five closest values can be used to get the box plots, but note that they are biased (except in the case of the median) since E(U_{j:5}) = j/6 for j = 1, ..., 5. However, in the second, we have E(U_{j:7}) = j/8 for j = 1, ..., 7, and so the second, the fourth and the sixth values of the ordered Ys are unbiased estimators for the quartiles at this point. The first and the seventh values can be used to plot the whiskers. They are plotted in Figure 11, right. There we have changed the theoretical limits in option 3 with the quantile regression curves with q = 1/8 and q = 7/8 (dashed red lines). In discrete populations we can use all the data with X = X_i for a fixed i. In the following subsections we propose other techniques to get "smooth" estimations based on copula estimations.
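The unbiasedness facts used above follow from the classical result that U_{j:n} ~ Beta(j, n + 1 − j), so E(U_{j:n}) = j/(n + 1). A quick check (our own sketch, not from the paper):

```python
# E(U_{j:n}) = j/(n+1) for uniform order statistics; with n = 7 the 2nd, 4th
# and 6th order statistics have means 1/4, 1/2 and 3/4 (the quartiles).
from fractions import Fraction
from scipy.stats import beta

def mean_order_stat(j, n):
    """E(U_{j:n}) for an i.i.d. uniform(0,1) sample of size n."""
    return Fraction(j, n + 1)

# three-point method: the extremes estimate the quartiles without bias
print(mean_order_stat(1, 3), mean_order_stat(3, 3))        # 1/4 3/4
# seven-point method: 2nd, 4th, 6th values are unbiased for the quartiles
print([float(mean_order_stat(j, 7)) for j in (2, 4, 6)])   # [0.25, 0.5, 0.75]
# cross-check against the exact Beta(j, n+1-j) mean
print(abs(beta(4, 4).mean() - 0.5) < 1e-12)
```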

3.4 Parametric estimation
As we have seen in Section 2, the conditional regression curves depend on the partial derivative of the copula C and the marginal distributions F_1 and F_2 (see eq. (2.2)). The latter can be estimated with the empirical (or kernel) distribution functions of X and Y. These estimations, F̂_1 and F̂_2, can be used to transform the data into Û_i = F̂_1(X_i) and V̂_i = F̂_2(Y_i) for i = 1, ..., n, which can be considered as a sample from the copula C. Then we just need to use these data to estimate C.
In this subsection, we assume a parametric form for the copula C, that is, we assume that C = C_θ for a given family of copulas C_θ with an unknown parameter θ. Typically, θ represents the degree of dependency between X and Y. Thus, in Example 2.2, we might assume that the data have a Clayton copula (see, e.g., [19, p. 116]) with an unknown dependence parameter θ ∈ [−1, 0) ∪ (0, ∞). If θ → 0, then C_θ goes to the product copula Π(u, v) = uv (i.e. X and Y are independent). So, by definition, we take C_0 := Π.
A typical procedure in copula theory to estimate the dependence parameter θ is to consider a dependence (or concordance) coefficient invariant under monotone increasing transformations. The most usual coefficients are Kendall's tau τ and Spearman's rho ρ_S (see [19, pp. 158, 167]). Then we estimate this coefficient from the data and we choose the value of θ to get this coefficient in C_θ. For example, the Kendall's tau coefficient for the above Clayton copula is τ_θ = θ/(2 + θ) for θ ≥ −1 (see [19, p. 163]). Note that τ_θ ∈ [−1, 1). The empirical Kendall's tau τ̂ is defined (see (5.1.1) in [19, p. 158]) as τ̂ = (c − d)/(c + d), where c is the number of concordant and d is the number of discordant pairs among the pairs of data. We say that the pairs (X_i, Y_i) and (X_j, Y_j) are concordant (discordant) if (X_i − X_j)(Y_i − Y_j) > 0 (< 0). If we apply this formula to the data in Example 2.2, we get the estimation τ̂.
Then, by solving τ_θ = τ̂, we obtain an estimation θ̂ which is close to the theoretical value of θ used in Example 2.2. Finally, under the assumption of a Clayton copula, we use the conditional quantile regression curves of C_θ̂ to estimate the theoretical ones and get the practical BBP. A straightforward calculation from eq. (3.1) gives

d_u(v) = ∂_1 C_θ(u, v) = u^{−θ−1} (u^{−θ} + v^{−θ} − 1)^{−(1+θ)/θ}

for 0 < u < 1 and 0 < v < 1. Hence, by replacing the unknown θ with θ̂ in these curves, we obtain the estimated curves (dashed lines) plotted jointly with the theoretical ones (continuous lines) in Figure 12, left. Note that the estimations for the curves are very good. The estimations for the quartiles of U (vertical dashed lines) are not so good. However, if we know that the data come from a copula, they can be replaced with the exact values (continuous vertical lines). The estimations in Figure 12, right, for the exponential model are not so good due to the use of the empirical estimators of F_1 and F_2^{−1} in eq. (2.2) (they are especially bad in the top red curve). If we want continuous lines, we can use kernel estimators for F_1 and F_2^{−1}. Remember that the vertical dashed lines correspond to the (empirical) classical univariate box plot of X and that the right limit can be replaced with the maximum value of X in the sample, X_{n:n} (here it is also considered as an outlier of X).
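The whole parametric pipeline above can be sketched as follows (our own illustration with simulated data, not the paper's computations; the function names are ours): estimate Kendall's tau from concordant/discordant pairs, invert τ_θ = θ/(2 + θ), and plug θ̂ into the closed-form Clayton conditional quantile curves obtained by solving d_u(v) = q for v.

```python
# Moment-type estimation of the Clayton parameter and the resulting
# conditional quantile curves v_q(u).
import numpy as np

def kendall_tau(x, y):
    """Empirical Kendall's tau: (c - d)/(c + d) over all pairs."""
    c = d = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    return (c - d) / (c + d)

def clayton_theta(tau):
    """Invert tau = theta/(theta + 2) for the Clayton family."""
    return 2 * tau / (1 - tau)

def clayton_quantile_curve(u, q, theta):
    """Solve d_u(v) = u^(-theta-1) (u^-theta + v^-theta - 1)^(-(1+theta)/theta) = q for v."""
    return ((q ** (-theta / (1 + theta)) - 1) * u ** (-theta) + 1) ** (-1 / theta)

# simulate from a Clayton copula by the conditional (inverse-d_u) method
rng = np.random.default_rng(2)
theta = 2.0
u = rng.uniform(size=500)
v = clayton_quantile_curve(u, rng.uniform(size=500), theta)

theta_hat = clayton_theta(kendall_tau(u, v))
print(abs(theta_hat - theta) < 1.0)   # moment estimate lands near the true value
# estimated vs theoretical median curve at u = 0.5
print(abs(clayton_quantile_curve(0.5, 0.5, theta_hat)
          - clayton_quantile_curve(0.5, 0.5, theta)) < 0.1)
```

The same inversion of d_u(v) = q used for the quantile curves also provides the conditional sampler, which is why the simulation step above reuses `clayton_quantile_curve`.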
Moreover, note that the BBP regions can be used to perform fit tests to confirm the assumed model for the copula. There, if we estimate a dependence parameter by using maximum likelihood, then we should reduce the degrees of freedom by one unit in the (approximate) chi-squared distribution of T.

3.5 Non-parametric estimation
If we do not have a parametric model for the copula (as in the preceding subsection), then we have to estimate the copula C and its partial derivative ∂_1 C. C can be estimated by using the empirical copula, but this estimator cannot be used to estimate the partial derivative. For this purpose it is better to use a kernel-type estimator for C. A survey on the application of this kind of estimators to copulas can be seen in [21]. The main problem is that the support of the copula is included in [0, 1]², while the kernel estimators do not have this property.
To avoid this problem, Nagler [16] proposed to transform the data from (U, V), with support included in [0, 1]², to new data with support in ℝ². This procedure is called the transform method. To do that, we can, for example, apply the quantile function of a standard normal univariate distribution. We shall use the following notation: the pdf of this distribution will be denoted by φ, its cdf by Φ, and hence its quantile (inverse) function by Φ^{−1}. By applying this transformation to the sample (U_1, V_1), ..., (U_n, V_n), we obtain a new sample (X_1, Y_1), ..., (X_n, Y_n) from (X, Y), where X = Φ^{−1}(U) and Y = Φ^{−1}(V). Of course, (X, Y) has support ℝ², copula C and standard normal marginals. Then we use the new sample to get a kernel estimator for the joint distribution F of (X, Y). To this end we propose to use a bivariate normal kernel with independent components, i.e.

F̂(x, y) = (1/n) ∑_{i=1}^n Φ((x − X_i)/h_n) Φ((y − Y_i)/h_n)

with a common bandwidth h_n > 0. Then, as F(x, y) = C(Φ(x), Φ(y)), the estimator for C is

Ĉ(u, v) = F̂(Φ^{−1}(u), Φ^{−1}(v))

for u, v ∈ (0, 1). Note that Ĉ is an absolutely continuous bivariate distribution function with support included in [0, 1]², but it is not a copula. Then we can use

d̂_u(v) = ∂_1 Ĉ(u, v)   (3.2)

as an estimator for d_u(v). Here we can use a numerical method to estimate its inverse function d̂_u^{−1} and to get the estimations for the quantile curves from eq. (2.2).
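Under the notation above, the transform method can be sketched numerically as follows (our own illustration with simulated pseudo-observations; the function names are ours, and d̂_u(v) is computed by the chain rule on the normal scale rather than by numerical differentiation):

```python
# Transform-method kernel estimator of the copula C and of d_u(v).
import numpy as np
from scipy.stats import norm, rankdata

def copula_kernel_cdf(u, v, U, V, h):
    """C_hat(u, v) = F_hat(Phi^{-1}(u), Phi^{-1}(v)) with a Gaussian product kernel."""
    x, y = norm.ppf(u), norm.ppf(v)
    X, Y = norm.ppf(U), norm.ppf(V)      # transformed sample on the normal scale
    return np.mean(norm.cdf((x - X) / h) * norm.cdf((y - Y) / h))

def cond_cdf_hat(u, v, U, V, h):
    """d_hat_u(v) = dC_hat(u, v)/du via the chain rule: dPhi^{-1}(u)/du = 1/phi(Phi^{-1}(u))."""
    x, y = norm.ppf(u), norm.ppf(v)
    X, Y = norm.ppf(U), norm.ppf(V)
    dF_dx = np.mean(norm.pdf((x - X) / h) * norm.cdf((y - Y) / h)) / h
    return dF_dx / norm.pdf(x)

# pseudo-observations from a positively dependent sample
rng = np.random.default_rng(3)
n = 400
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
U, V = rankdata(z[:, 0]) / (n + 1), rankdata(z[:, 1]) / (n + 1)
h = n ** (-1 / 6)                        # one typical bandwidth choice (assumed)

print(0.0 < copula_kernel_cdf(0.5, 0.5, U, V, h) < 1.0)
print(cond_cdf_hat(0.5, 0.95, U, V, h) > cond_cdf_hat(0.5, 0.05, U, V, h))
```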
The estimations of the copula C(u, v) obtained for five values of u from the data in Example 2.2 are plotted in Figure 13, left. We have taken a bandwidth h_n of the usual power-of-n form, but we have seen that the estimations are very similar for other typical choices of h_n. The respective estimations for d_u(v) obtained from eq. (3.2) are plotted in Figure 13, right. The worst estimations are obtained for the values of u plotted in red and orange.
Note that some of the functions d̂_u in Figure 13, right, are not distribution functions. This is due to the fact that Ĉ is not a copula. To avoid this problem, the estimator in eq. (3.2) can be replaced with

d̂*_u(v) = ∂_1 Ĉ*(u, v),   (3.3)

where Ĉ*(u, v) = Ĉ(Ĉ_1^{−1}(u), Ĉ_2^{−1}(v)) and Ĉ_1, Ĉ_2 are the marginal distributions of Ĉ. Hence, from eq. (2.2), the d̂*_u(v) are the conditional distribution functions of a proper copula. Unfortunately, the estimations obtained from eq. (3.3), plotted in Figure 14, left, do not improve those obtained in Figure 13, right. They are not distribution functions either, due to the numerical errors in the computer calculations.
Finally, we use eq. (3.2) to approximate the quantile regression curves plotted in Figure 14, right, for the model (U, V) in Example 2.2, with the median curve in black, the quartile curves in blue and the whisker curves in red. These curves can be used to obtain (to estimate) the BBP (option 3) defined in Section 2.

Multivariate box plots
The regions defined in Section 2 can be extended to a random vector X = (X_1, ..., X_n) with n > 2 to get multivariate box plots (MBP). For example, if n = 3, then we can consider the regions R defined by an interval determined by two quantiles of X_1, two quantile curves of (X_2 | X_1 = x_1) and two quantile maps of (X_3 | X_1 = x_1, X_2 = x_2). As in the bivariate case, if we use quantile maps, we have the following property: Pr(X ∈ R) = p_1 p_2 p_3, where p_i = v_i − u_i for the chosen quantile levels u_i < v_i. Proof. If the region R is defined as above and α_3, β_3 are the u_3, v_3-quantile maps of (X_3 | X_1 = x_1, X_2 = x_2), then Pr(α_3(x_1, x_2) ≤ X_3 ≤ β_3(x_1, x_2) | X_1 = x_1, X_2 = x_2) = v_3 − u_3 = p_3 for all x_1, x_2. Analogously, if α_2, β_2 are the u_2, v_2-quantile curves of (X_2 | X_1 = x_1), then Pr(α_2(x_1) ≤ X_2 ≤ β_2(x_1) | X_1 = x_1) = v_2 − u_2 = p_2 for all x_1. Finally, Pr(F_1^{−1}(u_1) ≤ X_1 ≤ F_1^{−1}(v_1)) = v_1 − u_1 = p_1, and so we get the stated result by conditioning.
The expression obtained in the preceding proposition can be used to de ne regions with speci c probabilities.
For example, if we use the first and third quartile curves to define the central region R_{ccc}, then we get p_1 = p_2 = p_3 = 1/2 and Pr(X ∈ R_{ccc}) = 1/8. The other regions can be defined similarly. As in Section 2, other options can be considered as well. The same technique can be used for n > 3. Unfortunately, the preceding approach does not provide useful plots. We can just have 3D plots when n = 3. In this case we obtain 27 regions (obtaining plots similar to Rubik's cube). In the other cases we can just use the regions (but we cannot plot them). So we need to consider other options.
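For independent uniform components, where all conditional quantile curves and maps are flat, the product rule above can be checked by simulation (an illustrative sketch of ours, not from the paper): the central region built from first/third quartiles in each of the three steps has probability (1/2)³ = 1/8.

```python
# Monte Carlo check of Pr(X in R_ccc) = p1 * p2 * p3 = 1/8 for independent
# uniform components, where the region is just the cube [0.25, 0.75]^3.
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(size=(200_000, 3))
inside = np.all((X >= 0.25) & (X <= 0.75), axis=1)
print(abs(inside.mean() - 1 / 8) < 0.01)   # empirical frequency close to 1/8
```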
A typical option is to provide in a common figure all the BBP obtained from pairs of random variables from X (see, e.g., Figure 7 in [20]). Moreover, note that here the BBP for (X_1, X_2) and (X_2, X_1) can be different (i.e. they are not just a symmetric transposition). To show this approach, we plot in Figure 15 the BBP obtained by using linear regression quantile curves for the R data set stackloss, which contains operational data of a plant for the oxidation of ammonia to nitric acid (see [4] or the help included in R about this data set). The plots in the main diagonal represent the univariate box plots. The "linear regression quantile" procedure used in this figure can be replaced with any of the other procedures proposed in Section 3.
To conclude this section, we propose a different approach also based on quantile regression. Throughout the paper we assume that we have a response variable Y that should be approximated when we know the value of an explanatory variable X. Actually, this is what happens in this data set, where a response variable Y (called "stack loss") should be estimated from the variables X_1, X_2, X_3 plotted in Figure 15 (note that they are not "independent"). In linear quantile regression this estimation is provided by the estimated median regression function ŷ = m(x_1, x_2, x_3) given in eq. (4.1). In order to get a bivariate plot of this relationship between four variables, we propose to create the artificial variable Z = m(X_1, X_2, X_3) and use it to get the BBP of (Z, Y) (plotted in Figure 16). Note that the linear regression curves included in this plot can be used to obtain confidence bands for the estimation of Y from the function m in eq. (4.1). However, note that these confidence bands can be different from those obtained from a linear quantile regression with X_1, X_2, X_3. For example, the limits of the 50% confidence band obtained from the first and third quartile lines in (Z, Y) (red lines) are linear functions of z, and so they can be written as q_1(x_1, x_2, x_3) and q_3(x_1, x_2, x_3) by replacing z with m(x_1, x_2, x_3), while the corresponding limits q̃_1 and q̃_3 obtained from a direct linear quantile regression of Y over (X_1, X_2, X_3) are in general different. Note that q̃_1 and q̃_3 cannot be written in terms of Z, and so they cannot be included in Figure 16. In this sense, the artificial random variable Z can be seen as a kind of "first principal component" when we want to approximate Y from a median regression line based on X_1, X_2, X_3, since Z has all the information needed to get this estimation. However, Z does not contain the information needed to compute the limits q̃_1 and q̃_3. Instead we should use q_1 and q_3. The same can be applied to the bands based on other quantile lines or to other approaches based on non-linear quantile regression (see Subsection 3.2).
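The construction of Z and the difference between the two kinds of bands can be sketched on synthetic data (our own hypothetical illustration, not the stackloss computation, whose numeric coefficients are not reproduced here; quantile regression is solved exactly as a linear program):

```python
# Fit the median regression plane m, set Z = m(X1, X2, X3), then compare the
# quartile *lines* fitted in (Z, Y) with the quartile *planes* fitted directly
# on (X1, X2, X3, Y): in general they differ.
import numpy as np
from scipy.optimize import linprog

def quantile_reg(X, y, q):
    """Exact quantile regression via an LP (variables: beta free, u, v >= 0)."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), q * np.ones(n), (1 - q) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    return linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs").x[:p]

rng = np.random.default_rng(5)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
# heteroscedastic noise so that quartile planes are not parallel to the median
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=1 + 0.5 * np.abs(X[:, 1]), size=n)

beta_med = quantile_reg(X, y, 0.5)       # median plane m
Z = X @ beta_med                          # artificial explanatory variable
Zmat = np.column_stack([np.ones(n), Z])
q1_line = quantile_reg(Zmat, y, 0.25)     # first quartile line in (Z, Y)
q1_plane = quantile_reg(X, y, 0.25)       # first quartile plane in (X, Y)
fitted_line = Zmat @ q1_line
fitted_plane = X @ q1_plane
print(not np.allclose(fitted_line, fitted_plane))  # the two bands differ
```

Here the band built from Z collapses the three explanatory variables into one direction, so it cannot reproduce the full quartile plane q̃_1, which is exactly the limitation discussed above.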

Conclusions and open problems
We have proposed different ways to obtain bivariate box plots (BBP). They are a good alternative to other similar plots. They are based on some regions obtained from univariate box plots and conditional distributions. So they are especially useful when we have a response variable Y and an explanatory variable X.
The main advantage of this approach is that the regions in the theoretical BBP have fixed probabilities. In Section 2 we propose three options to fix these probabilities, but the users can consider other options as well. As a consequence of this property, these regions can be used to perform fit tests to study if some data can come from a given bivariate model.
Moreover, the theoretical BBP can be obtained easily from eq. (2.2), the partial derivatives of the copula and the marginal distribution functions. They can also be obtained in a direct way for models with known conditional distributions (as happens for the normal model and all the models proposed in [1]). Two examples illustrate these procedures.
In practice, the theoretical BBP should be estimated. Fortunately, we already have several techniques available in the literature for this purpose. Here we have shown some of them. The first two techniques are based on linear and non-linear quantile regression, so they can be computed easily by using the available packages in R. In the third one, we propose new empirical estimators based on concomitants. In the fourth, we assume a parametric copula model for the dependence and then we estimate this parameter from the data. As mentioned above, the assumed model can be confirmed by using a fit test based on the regions in the BBP. We also propose a fifth option based on new non-parametric kernel estimators for the partial derivative of the copula. All these approaches are illustrated with the two proposed examples.
Finally, we also consider the multivariate case, suggesting two ways to get BBP when we have more than two variables. A real data set (from R) is used to illustrate these options.
This paper is just a first step, so there are many tasks for future research. Let us mention just some of them. From a theoretical point of view, the most difficult one is to decide which option is the most reasonable to define the bivariate (and the univariate) box plots. In practice, we should decide which estimation procedure is the best for our data. There are a lot of results (papers) for the first approaches. However, the last one should be studied in detail. We have proposed two kernel estimators and we do not know which one is the best. Also, we should study how to determine the optimal bandwidth. Moreover, if we assume a bivariate copula regression model, one would like to study some of the properties of the residuals resulting from fitting such a model. These and other tasks are left for future research projects.