# Asymmetric Laplace Regression: Maximum Likelihood, Maximum Entropy and Quantile Regression

Anil K. Bera, Antonio F. Galvao, Gabriel V. Montes-Rojas and Sung Y. Park

# Abstract

This paper studies the connections among the asymmetric Laplace probability density (ALPD), maximum likelihood, maximum entropy and quantile regression. We show that the maximum likelihood problem is equivalent to the solution of a maximum entropy problem where we impose moment constraints given by the joint consideration of the mean and median. The ALPD score functions lead to joint estimating equations that deliver estimates for the slope parameters together with a representative quantile. Asymptotic properties of the estimator are derived within the quasi-maximum likelihood estimation framework. A limited simulation experiment evaluates the finite-sample properties of our estimator. Finally, we illustrate the use of the estimator with an application to US wage data to evaluate the effect of training on wages.

JEL Classification: C14; C31

Corresponding author: Gabriel V. Montes-Rojas, Department of Economics, City University London, 10 Northampton Square, London EC1V 0HB, UK, E-mail:

# Acknowledgments

We are very grateful to the Editor, two anonymous referees, Arnold Zellner, Jushan Bai, Rong Chen, Daniel Gervini, Yongmiao Hong, Carlos Lamarche, Ehsan Soofi, Liang Wang, Zhijie Xiao, and the participants in seminars at University of Wisconsin-Milwaukee, City University London, Info-Metrics Institute Conference, September 2010, World Congress of the Econometric Society, Shanghai, August 2010, Latin American Meeting of the Econometric Society, Argentina, October 2009, Summer Workshop in Econometrics, Tsinghua University, Beijing, China, May 2009, for helpful comments and discussions. However, we retain the responsibility for any remaining errors.

## Appendix

### A. Interpretation of the Z-estimator

In order to interpret $\theta_0$, we take the expectation of the estimating equations with respect to the unknown true density. To simplify the exposition, we consider a simple model without covariates, $y_i = \alpha + u_i$. Our estimating equation vector is defined as

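$$E(\psi_\theta(y)) = E\begin{pmatrix} \frac{1}{\sigma}\left(\tau - 1(y<\alpha)\right) \\ \frac{1-2\tau}{\tau(1-\tau)} - \frac{y-\alpha}{\sigma} \\ -\frac{1}{\sigma} + \frac{1}{\sigma^2}\rho_\tau(y-\alpha) \end{pmatrix} = 0,$$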

and the estimator is such that

$$\frac{1}{n}\sum_{i=1}^{n} \psi_{\hat\theta}(y_i) = 0.$$
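As a numerical sanity check on these estimating equations, the sketch below codes the three score components for the location model $y_i = \alpha + u_i$ (in the order $\alpha$, $\tau$, $\sigma$ used above) and averages them over a simulated ALPD sample at the true parameters, where each average should be close to zero. The sampler relies on the representation $y = \alpha + \sigma(E_1/\tau - E_2/(1-\tau))$ with $E_1, E_2$ independent standard exponentials, which is our assumption for illustration; the helper names `check_loss`, `alpd_scores` and `draw_alpd` and the parameter values are hypothetical.

```python
import numpy as np


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def alpd_scores(y, alpha, tau, sigma):
    """Per-observation score psi_theta(y) for y = alpha + u; columns are (alpha, tau, sigma)."""
    u = np.asarray(y, dtype=float) - alpha
    s_alpha = (tau - (u < 0)) / sigma
    s_tau = (1 - 2 * tau) / (tau * (1 - tau)) - u / sigma
    s_sigma = -1 / sigma + check_loss(u, tau) / sigma**2
    return np.column_stack([s_alpha, s_tau, s_sigma])


def draw_alpd(n, alpha, tau, sigma, rng):
    """Hypothetical ALPD sampler: alpha + sigma * (E1 / tau - E2 / (1 - tau))."""
    e1 = rng.exponential(size=n)
    e2 = rng.exponential(size=n)
    return alpha + sigma * (e1 / tau - e2 / (1 - tau))


rng = np.random.default_rng(0)
y = draw_alpd(200_000, alpha=1.0, tau=0.3, sigma=2.0, rng=rng)
mean_score = alpd_scores(y, alpha=1.0, tau=0.3, sigma=2.0).mean(axis=0)
print(mean_score)  # each entry should be close to 0 at the true parameters
```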

Let $F(y)$ be the cdf of the random variable $y$. We now need to find $E[\psi_\theta(y)]$.

For the first component we have

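$$\frac{1}{\sigma}E[\tau - 1(y<\alpha)] = \frac{1}{\sigma}\left(\int \left(\tau - 1(y<\alpha)\right) dF(y)\right) = \frac{1}{\sigma}\left(\tau - \int_{-\infty}^{\alpha} dF(y)\right) = \frac{1}{\sigma}\left(\tau - F(\alpha)\right).$$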

Thus if we set this equal to zero, we have

$$\alpha = F^{-1}(\tau),$$

which is the usual $\tau$-quantile. Thus, when covariates are included, the interpretation of the parameter $\alpha$ is analogous to that in quantile regression (QR).
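The sample analogue of the first equation can be checked directly: its root is a sample $\tau$-quantile, whatever the shape of $F$. The sketch below is a hypothetical illustration with lognormal data (deliberately not ALPD) and $\tau = 0.25$; the $1/\sigma$ factor is dropped because it does not move the root, and `alpha_equation` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.lognormal(mean=0.0, sigma=1.0, size=10_001)  # skewed data, not ALPD
tau = 0.25


def alpha_equation(alpha, y, tau):
    """Sample analogue of E[tau - 1(y < alpha)]; the 1/sigma factor does not move the root."""
    return tau - np.mean(y < alpha)


alpha_hat = np.quantile(y, tau)                 # sample tau-quantile
at_quantile = alpha_equation(alpha_hat, y, tau)
at_mean = alpha_equation(np.mean(y), y, tau)
print(at_quantile)  # within 1/n of zero
print(at_mean)      # far from zero: the mean is not the 0.25-quantile here
```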

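For the third component of the vector, $-\frac{1}{\sigma} + \frac{1}{\sigma^2}\rho_\tau(y-\alpha)$, we have

$$E\left[-\frac{1}{\sigma} + \frac{1}{\sigma^2}\rho_\tau(y-\alpha)\right] = 0,$$

that is,

$$\sigma = E[\rho_\tau(y-\alpha)].$$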

Thus, as in the least squares case, the scale parameter σ can be interpreted as the expected value of the loss function.
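To make this interpretation of $\sigma$ concrete, take $y$ standard normal, an assumption made only for this illustration. With $\alpha = q_\tau = \Phi^{-1}(\tau)$, a short calculation gives $E[\rho_\tau(y - q_\tau)] = -\tau q_\tau - E[(y - q_\tau)1(y < q_\tau)] = \phi(q_\tau)$, where $\phi$ is the standard normal density. The Monte Carlo sketch below checks this closed form; `check_loss` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import norm


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


rng = np.random.default_rng(2)
y = rng.standard_normal(400_000)

results = {}
for tau in (0.1, 0.5, 0.8):
    q = norm.ppf(tau)                          # alpha = F^{-1}(tau)
    sigma_mc = check_loss(y - q, tau).mean()   # sample analogue of E[rho_tau(y - alpha)]
    results[tau] = (sigma_mc, norm.pdf(q))     # closed form phi(q_tau)
    print(tau, sigma_mc, norm.pdf(q))
```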

Finally, we can interpret τ using the second equation,

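$$E\left[\frac{1-2\tau}{\tau(1-\tau)} - \frac{y-\alpha}{\sigma}\right] = 0,$$

which implies that

$$\frac{1-2\tau}{\tau(1-\tau)} = \frac{E[y] - F^{-1}(\tau)}{\sigma}.$$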

Note that $s(\tau) \equiv \frac{1-2\tau}{\tau(1-\tau)}$ is a measure of the skewness of the distribution. Thus, $\tau$ should be chosen so that $s(\tau)$ equals a measure of asymmetry of the underlying distribution $F(\cdot)$, given by the difference between the mean and the $\tau$-quantile, standardized by $\sigma$. In the special case of a symmetric distribution, the mean coincides with the median (and with the mode, if the distribution is unimodal), so that $E[y] = F^{-1}(1/2)$ and $\tau = 1/2$, which is the most probable quantile, is a solution to our Z-estimator.
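The three equations suggest a simple way to compute the Z-estimator in the location model: for a given $\tau$, the first equation is solved (up to sample discreteness) by the sample $\tau$-quantile and the third by the average check loss, so the second becomes a scalar equation in $\tau$. The sketch below profiles $\alpha$ and $\sigma$ out and solves for $\tau$ with a bracketing root finder. This profiling strategy, the bracket $[0.05, 0.95]$ and all function names are our assumptions for illustration, not the paper's algorithm; with symmetric data, $\hat\tau$ should land near $1/2$.

```python
import numpy as np
from scipy.optimize import brentq


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def skew_measure(tau):
    """s(tau) = (1 - 2 tau) / (tau (1 - tau))."""
    return (1 - 2 * tau) / (tau * (1 - tau))


def profile(y, tau):
    """alpha(tau) solves the first sample equation, sigma(tau) the third."""
    alpha = np.quantile(y, tau)
    sigma = check_loss(y - alpha, tau).mean()
    return alpha, sigma


def tau_equation(y, tau):
    """Sample second equation: s(tau) - (mean(y) - alpha(tau)) / sigma(tau)."""
    alpha, sigma = profile(y, tau)
    return skew_measure(tau) - (np.mean(y) - alpha) / sigma


def fit_location_alpd(y, lo=0.05, hi=0.95):
    """Profile out alpha and sigma, then solve the tau equation by bracketing."""
    if tau_equation(y, lo) * tau_equation(y, hi) > 0:
        raise ValueError("tau equation does not change sign on [lo, hi]")
    tau_hat = brentq(lambda t: tau_equation(y, t), lo, hi)
    alpha_hat, sigma_hat = profile(y, tau_hat)
    return alpha_hat, tau_hat, sigma_hat


rng = np.random.default_rng(3)
y_sym = rng.standard_normal(5_000)  # symmetric data
alpha_hat, tau_hat, sigma_hat = fit_location_alpd(y_sym)
print(alpha_hat, tau_hat, sigma_hat)  # tau_hat should be close to 1/2
```

A bracketing solver is used because the profiled equation is continuous in $\tau$ but only piecewise smooth, so derivative-based methods are a poor fit.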

### B. Lemma A1

In this appendix we state an auxiliary result establishing the Donsker property and stochastic equicontinuity. Let $\mathcal{F} = \{\psi_\theta(y,x), \theta \in \Theta\}$, and define the following empirical process notation for $w = (y, x)$:

$$f \mapsto \mathbb{E}_n[f(w)] = \frac{1}{n}\sum_{i=1}^{n} f(w_i), \qquad f \mapsto \mathbb{G}_n[f(w)] = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \left(f(w_i) - E f(w_i)\right).$$

We follow the empirical process literature, exploiting the monotonicity and boundedness of the indicator function, the boundedness of the moments of $x$ and $y$, and the fact that the problem is parametric.
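The empirical process notation can be mirrored in a few lines. The sketch below assumes the $1/\sqrt{n}$ scaling of $\mathbb{G}_n$ and an i.i.d. sample where $E f(w_i)$ is known, which is only possible in a simulation: here $w_i$ is standard normal and $f$ is an indicator, as in the first score component. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def empirical_mean(f, w):
    """E_n[f(w)] = (1/n) * sum_i f(w_i)."""
    return np.mean(f(w))


def empirical_process(f, w, expected):
    """G_n[f(w)] = n^{-1/2} * sum_i (f(w_i) - E f(w_i)); expected is E f(w_i), known here."""
    return np.sum(f(w) - expected) / np.sqrt(len(w))


def indicator_below(v, c=0.5):
    """f(v) = 1(v < c), an indicator like the one in the first score component."""
    return (v < c).astype(float)


rng = np.random.default_rng(4)
w = rng.standard_normal(10_000)
p = norm.cdf(0.5)  # E f(w_i) for standard normal draws

En = empirical_mean(indicator_below, w)
Gn = empirical_process(indicator_below, w, p)
print(En, Gn)  # En near p; Gn is O_p(1) with variance p(1 - p)
```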

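Lemma A1. Under Assumptions A1–A4, $\mathcal{F}$ is Donsker. Furthermore,

$$\theta \mapsto \mathbb{G}_n\psi_\theta(y,x)$$

is stochastically equicontinuous, that is,

$$\sup_{\|\theta - \theta_0\| \le \delta_n} \left\|\mathbb{G}_n\psi_\theta(y,x) - \mathbb{G}_n\psi_{\theta_0}(y,x)\right\| = o_p(1),$$

for any $\delta_n \downarrow 0$.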

Proof: The proof follows steps similar to those in Chernozhukov and Hansen (2006). To prove the lemma, we check the conditions for independent but not identically distributed processes stated in Theorem 2.11.1 of van der Vaart and Wellner (1996). Note that a class of vector-valued functions $f: \mathcal{X} \to \mathbb{R}^k$ is Donsker if each of the coordinate classes $f_j: \mathcal{X} \to \mathbb{R}$, with $f = (f_1, \dots, f_k)$ ranging over $\mathcal{F}$ ($j = 1, 2, \dots, k$), is Donsker (van der Vaart 1998, 270).

First, the random-entropy condition can be verified by checking that $\mathcal{F}$ satisfies a uniform entropy condition and has a square-integrable envelope. The first element of the vector is $\psi_{1\theta}(y,x) = \frac{(\tau - 1(y < x'\beta))x}{\sigma}$. The functional class $\mathfrak{A} = \{\tau - 1\{y < x'\beta\}\}$, indexed by $\tau \in \mathcal{T}$ and $\beta$, is a VC subgraph class with envelope 2. Its product with $x$ also forms a class, with square-integrable envelope $2\max_j |x_j|$. Finally, the class $\mathcal{F}_1$ is the product of the latter with $1/\sigma$, which is bounded by Assumption A3. Thus, by Assumption A4, $\mathcal{F}_1$ satisfies a uniform entropy condition and its envelope function is square integrable. Therefore, the random entropy condition (2.11.2) in van der Vaart and Wellner (1996) is satisfied.

The second element of the vector is $\psi_{2\theta}(y,x) = \frac{1-2\tau}{\tau(1-\tau)} - \frac{y - x'\beta}{\sigma}$. Define $\mathfrak{H} = \{y - x'\beta\}$, indexed by $\beta$. Note that

$$\left|(y - x'\beta_1) - (y - x'\beta_2)\right| = \left|x'(\beta_2 - \beta_1)\right| \le \|x\|\,\|\beta_2 - \beta_1\|,$$

where the inequality follows from the Cauchy-Schwarz inequality. Thus, by Assumptions A3–A4, the class $\mathfrak{H}$ has a square-integrable envelope function. In addition, $\mathfrak{H}$ is a VC class satisfying a uniform entropy condition, since it is a subset of the vector space of functions spanned by $(y, x_1, \dots, x_p)$, where $p$ is the fixed dimension of $x$ (see, e.g., Example 19.7 in van der Vaart (1998)). Thus, up to the bounded factor $1/\sigma$, the class $\frac{1}{\sigma}\mathfrak{H}$ has the envelope of $\mathfrak{H}$, namely $|y| + \text{const}\cdot\|x\|$, which is square integrable by Assumptions A3–A4. Therefore, $\mathcal{F}_2$ satisfies the random entropy condition.

The third element of the vector is $\psi_{3\theta}(y,x) = -\frac{1}{\sigma} + \frac{1}{\sigma^2}\rho_\tau(y - x'\beta)$. Consider the class $\mathfrak{J} = \{\rho_\tau(y - x'\beta)\}$, indexed by $\tau \in \mathcal{T}$ and $\beta$, which satisfies a uniform entropy condition and has a square-integrable envelope function. The latter follows from Assumptions A3–A4 and the properties of the quantile regression check function, $\rho_\tau(x+y) - \rho_\tau(y) \le 2|x|$ and $\rho_{\tau_1}(y - x't) - \rho_{\tau_2}(y - x't) = (\tau_1 - \tau_2)(y - x't)$. The former follows from the fact that this is a parametric collection of measurable functions indexed by a bounded set. Hence, $\mathcal{F}_3$ satisfies the random entropy condition and its envelope function is square integrable.
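Both check function properties can be verified numerically. The sketch below is a hypothetical check on random draws; the names `a` and `b` replace the generic scalar arguments $x$ and $y$ of the text to avoid a clash with the covariates. It tests the absolute, sharper form $|\rho_\tau(a+b) - \rho_\tau(b)| \le |a|$, which holds because $\rho_\tau$ is Lipschitz with constant $\max(\tau, 1-\tau) \le 1$ and implies the stated bound $2|a|$, together with $\rho_{\tau_1}(u) - \rho_{\tau_2}(u) = (\tau_1 - \tau_2)u$.

```python
import numpy as np


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


rng = np.random.default_rng(5)
a = rng.normal(scale=3.0, size=100_000)  # plays the role of the perturbation "x"
b = rng.normal(scale=3.0, size=100_000)  # plays the role of "y" (a residual)
tau, tau1, tau2 = 0.3, 0.2, 0.7

# |rho_tau(a + b) - rho_tau(b)| <= max(tau, 1 - tau)|a| <= |a|, which implies the 2|a| bound.
lipschitz_ok = bool(np.all(np.abs(check_loss(a + b, tau) - check_loss(b, tau)) <= np.abs(a)))
# rho_tau1(u) - rho_tau2(u) = (tau1 - tau2) * u, since the indicator terms cancel.
linear_ok = bool(np.allclose(check_loss(b, tau1) - check_loss(b, tau2), (tau1 - tau2) * b))
print(lipschitz_ok, linear_ok)  # both should be True
```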

We now turn to the second condition of Theorem 2.11.1 in van der Vaart and Wellner (1996), which delivers the stochastic equicontinuity of the process $\theta \mapsto \mathbb{G}_n\psi_\theta(y,x)$ over $\Theta$ with respect to an $L_2(P)$ pseudo-metric. As in Angrist, Chernozhukov, and Fernández-Val (2006) and Chernozhukov and Hansen (2006), we define the distance $d$ as the following $L_2(P)$ pseudo-metric:

$$d(\theta', \theta) = \sqrt{E\left(\left[\psi_{\theta'} - \psi_\theta\right]^2\right)}.$$

Thus, we need to show that, as $\|\theta - \theta_0\| \to 0$,

$$d(\theta, \theta_0) \to 0, \qquad (28)$$

and the final result follows from Theorem 2.11.1 of van der Vaart and Wellner (1996).
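To see what (28) asks for, the sketch below estimates the pseudo-metric by Monte Carlo in a simple regression with $x = (1, z)'$ and moves every coordinate of $\theta$ by the same $\delta$; the estimated distance should shrink as $\delta$ does. Aggregating the vector of scores with the Euclidean norm, the data-generating process, and the helper names `psi` and `pseudo_metric` are our assumptions for illustration.

```python
import numpy as np


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def psi(y, X, beta, tau, sigma):
    """Stacked ALPD regression scores: one row per observation, one column per component."""
    u = y - X @ beta
    s1 = (tau - (u < 0))[:, None] * X / sigma          # slope block
    s2 = (1 - 2 * tau) / (tau * (1 - tau)) - u / sigma  # tau component
    s3 = -1 / sigma + check_loss(u, tau) / sigma**2     # sigma component
    return np.column_stack([s1, s2, s3])


def pseudo_metric(y, X, theta_a, theta_b):
    """Sample version of d = sqrt(E||psi_a - psi_b||^2), Euclidean norm across components."""
    diff = psi(y, X, *theta_a) - psi(y, X, *theta_b)
    return np.sqrt(np.mean(np.sum(diff**2, axis=1)))


rng = np.random.default_rng(6)
n = 50_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([1.0, 2.0])
y = X @ beta0 + rng.standard_normal(n)
theta0 = (beta0, 0.5, 1.0)

dists = []
for delta in (0.1, 0.01, 0.001):
    theta = (beta0 + delta, 0.5 + delta, 1.0 + delta)  # move every coordinate by delta
    dists.append(pseudo_metric(y, X, theta, theta0))
print(dists)  # should shrink with delta, in line with (28)
```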

To show (28), first note that for each i=1, …, n,

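$$
\begin{aligned}
d_{1i}(\theta',\theta) &= \sqrt{E\left(\left[\psi_{1\theta'}-\psi_{1\theta}\right]^2\right)} = \sqrt{E\left(\left[\frac{(\tau'-1(y_i<x_i'\beta'))x_i}{\sigma'}-\frac{(\tau-1(y_i<x_i'\beta))x_i}{\sigma}\right]^2\right)}\\
&\le \left[\left(E\left|\frac{1}{\sigma'}\left(\tau'-1(y_i<x_i'\beta')\right)-\frac{1}{\sigma}\left(\tau-1(y_i<x_i'\beta)\right)\right|^{\frac{2(2+\epsilon)}{\epsilon}}\right)^{\frac{\epsilon}{2+\epsilon}}\left(E\left(|x_i|^2\right)^{\frac{2+\epsilon}{2}}\right)^{\frac{2}{2+\epsilon}}\right]^{\frac{1}{2}}\\
&= \left(E\left|\left(\frac{\tau'}{\sigma'}-\frac{\tau}{\sigma}\right)+\left(\frac{1}{\sigma}1(y_i<x_i'\beta)-\frac{1}{\sigma'}1(y_i<x_i'\beta')\right)\right|^{\frac{2(2+\epsilon)}{\epsilon}}\right)^{\frac{\epsilon}{2(2+\epsilon)}}\left(E\left(|x_i|^2\right)^{\frac{2+\epsilon}{2}}\right)^{\frac{1}{2+\epsilon}}\\
&\le \left[\left(\left|\frac{\tau'}{\sigma'}-\frac{\tau}{\sigma}\right|^{\frac{2(2+\epsilon)}{\epsilon}}\right)^{\frac{\epsilon}{2(2+\epsilon)}}+\left(E\left|\frac{1}{\sigma}1(y_i<x_i'\beta)-\frac{1}{\sigma'}1(y_i<x_i'\beta')\right|^{\frac{2(2+\epsilon)}{\epsilon}}\right)^{\frac{\epsilon}{2(2+\epsilon)}}\right]\left(E\left(|x_i|^2\right)^{\frac{2+\epsilon}{2}}\right)^{\frac{1}{2+\epsilon}}\\
&\le \left[\left|\frac{\tau'}{\sigma'}-\frac{\tau}{\sigma}\right|+\left(E\left|\bar g_i\,x_i'\left(\frac{\beta'}{\sigma'}-\frac{\beta}{\sigma}\right)\right|\right)^{\frac{\epsilon}{2(2+\epsilon)}}\right]\left(E\|x_i\|^{2+\epsilon}\right)^{\frac{1}{2+\epsilon}}\\
&\le \left[\left|\frac{\tau'}{\sigma'}-\frac{\tau}{\sigma}\right|+\left(E\,\bar g_i\,\|x_i\|\left\|\frac{\beta'}{\sigma'}-\frac{\beta}{\sigma}\right\|\right)^{\frac{\epsilon}{2(2+\epsilon)}}\right]\left(E\|x_i\|^{2+\epsilon}\right)^{\frac{1}{2+\epsilon}},
\end{aligned}
$$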

where the first inequality is Hölder's inequality, the second is Minkowski's inequality, the third is a Taylor expansion as in Angrist, Chernozhukov, and Fernández-Val (2006, p. 560), where $\bar g_i$ is the upper bound of $g_i(y_i|x)$ (Assumption A2), and the last is the Cauchy-Schwarz inequality. Therefore, by Assumptions A2–A4, $\sup_{\|\theta' - \theta\| < \delta_n}\sum_{i=1}^{n} d_{1i} \to 0$ when $\delta_n \to 0$.

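Now rewrite the second element as $\psi_{2\theta}(y,x) = \sigma\frac{1-2\tau}{\tau(1-\tau)} - (y - x'\beta)$ (that is, multiplied by $\sigma > 0$, which leaves its zero unchanged) and note that

$$
\begin{aligned}
d_{2i}(\theta',\theta) &= \sqrt{E\left(\left[\psi_{2\theta'}-\psi_{2\theta}\right]^2\right)} = \sqrt{E\left(\left[\sigma'\frac{1-2\tau'}{\tau'(1-\tau')} - (y_i - x_i'\beta') - \sigma\frac{1-2\tau}{\tau(1-\tau)} + (y_i - x_i'\beta)\right]^2\right)}\\
&= \sqrt{E\left(\left|\sigma'\frac{1-2\tau'}{\tau'(1-\tau')} - \sigma\frac{1-2\tau}{\tau(1-\tau)} + x_i'(\beta' - \beta)\right|^2\right)}\\
&\le \left(\left|\sigma'\frac{1-2\tau'}{\tau'(1-\tau')} - \sigma\frac{1-2\tau}{\tau(1-\tau)}\right|^2\right)^{1/2} + \left(E\left|x_i'(\beta' - \beta)\right|^2\right)^{1/2}\\
&\le \left(\left|\sigma'\frac{1-2\tau'}{\tau'(1-\tau')} - \sigma\frac{1-2\tau}{\tau(1-\tau)}\right|^2\right)^{1/2} + \|\beta' - \beta\|\left(E\|x_i\|^2\right)^{1/2},
\end{aligned}
$$

where the first inequality is Minkowski's inequality, $(E|X+Y|^p)^{1/p} \le (E|X|^p)^{1/p} + (E|Y|^p)^{1/p}$ for $p \ge 1$, and the second is the Cauchy-Schwarz inequality. Hence, Assumptions A3–A4 ensure that $\sup_{\|\theta' - \theta\| < \delta_n}\sum_{i=1}^{n} d_{2i} \to 0$ when $\delta_n \to 0$.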
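When the expectation is replaced by a sample average, Minkowski's and the Cauchy-Schwarz inequalities still hold exactly, so the bound on $d_{2i}$ can be checked deterministically on any simulated sample. The sketch below does so for the rescaled second score; the $t(5)$ errors, the two parameter points and the helper names are arbitrary assumptions for illustration.

```python
import numpy as np


def skew_measure(tau):
    """s(tau) = (1 - 2 tau) / (tau (1 - tau))."""
    return (1 - 2 * tau) / (tau * (1 - tau))


def psi2_rescaled(y, X, beta, tau, sigma):
    """Second score multiplied by sigma: sigma * s(tau) - (y - x'beta)."""
    return sigma * skew_measure(tau) - (y - X @ beta)


rng = np.random.default_rng(7)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.0]) + rng.standard_t(df=5, size=n)

beta, tau, sigma = np.array([0.5, -1.0]), 0.40, 1.20                 # theta
beta_p, tau_p, sigma_p = beta + np.array([0.03, -0.02]), 0.42, 1.25  # theta'

# Left-hand side: empirical-measure version of d_{2i}(theta', theta).
d2 = np.sqrt(np.mean((psi2_rescaled(y, X, beta_p, tau_p, sigma_p)
                      - psi2_rescaled(y, X, beta, tau, sigma)) ** 2))
# Right-hand side after Minkowski and Cauchy-Schwarz.
bound = (abs(sigma_p * skew_measure(tau_p) - sigma * skew_measure(tau))
         + np.linalg.norm(beta_p - beta) * np.sqrt(np.mean(np.sum(X**2, axis=1))))
print(d2, bound)
```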

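Finally, rewrite the third element as $\psi_{3\theta}(y,x) = -\sigma + \rho_\tau(y - x'\beta)$ (that is, multiplied by $\sigma^2$), and thus

$$
\begin{aligned}
d_{3i}(\theta',\theta) &= \sqrt{E\left(\left[\psi_{3\theta'}-\psi_{3\theta}\right]^2\right)} = \sqrt{E\left(\left[-\sigma' + \rho_{\tau'}(y_i - x_i'\beta') + \sigma - \rho_\tau(y_i - x_i'\beta)\right]^2\right)}\\
&= \sqrt{E\left(\left[-\sigma' + \sigma + \rho_{\tau'}(y_i - x_i'\beta') - \rho_\tau(y_i - x_i'\beta)\right]^2\right)}\\
&\le |\sigma - \sigma'| + \left(E\left[\rho_{\tau'}(y_i - x_i'\beta') - \rho_\tau(y_i - x_i'\beta)\right]^2\right)^{1/2}\\
&= |\sigma - \sigma'| + \left(E\left[\rho_{\tau'}(y_i - x_i'\beta') - \rho_\tau(y_i - x_i'\beta') + \rho_\tau(y_i - x_i'\beta') - \rho_\tau(y_i - x_i'\beta)\right]^2\right)^{1/2}\\
&\le |\sigma - \sigma'| + \left(E\left[2\left|x_i'(\beta' - \beta)\right| + |\tau' - \tau|\,|y_i - x_i'\beta'|\right]^2\right)^{1/2}\\
&\le |\sigma - \sigma'| + \left(E\left[2\|x_i\|\,\|\beta' - \beta\| + |\tau' - \tau|\,|y_i - x_i'\beta'|\right]^2\right)^{1/2}\\
&\le |\sigma - \sigma'| + \left(E\left[2\|x_i\|\,\|\beta' - \beta\|\right]^2\right)^{1/2} + \left(E\left[|\tau' - \tau|\,|y_i - x_i'\beta'|\right]^2\right)^{1/2}\\
&= |\sigma - \sigma'| + 2\|\beta' - \beta\|\left(E\|x_i\|^2\right)^{1/2} + |\tau' - \tau|\left(E\left[(y_i - x_i'\beta')^2\right]\right)^{1/2}\\
&\le \text{const}\left(|\sigma - \sigma'| + \|\beta' - \beta\| + |\tau' - \tau|\right),
\end{aligned}
$$

where the first inequality follows from Minkowski's inequality; the second from the QR check function properties $\rho_\tau(x+y) - \rho_\tau(y) \le 2|x|$ and $\rho_{\tau_1}(y - x't) - \rho_{\tau_2}(y - x't) = (\tau_1 - \tau_2)(y - x't)$; the third is the Cauchy-Schwarz inequality; and the fourth is Minkowski's inequality. The last inequality uses Assumption A4, and finally we have that $\sup_{\|\theta' - \theta\| < \delta_n}\sum_{i=1}^{n} d_{3i} \to 0$ when $\delta_n \to 0$.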
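The same deterministic check works for the third component: with sample averages in place of $E$, every step of the chain holds exactly, so the final bound must dominate the empirical distance on any sample. The sketch below uses the factor 2 taken from the check function bound $\rho_\tau(x+y) - \rho_\tau(y) \le 2|x|$ quoted in the text; the simulated design, parameter points and helper names are hypothetical.

```python
import numpy as np


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def psi3_rescaled(y, X, beta, tau, sigma):
    """Third score multiplied by sigma^2: -sigma + rho_tau(y - x'beta)."""
    return -sigma + check_loss(y - X @ beta, tau)


rng = np.random.default_rng(8)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.0]) + rng.standard_t(df=5, size=n)

beta, tau, sigma = np.array([0.5, -1.0]), 0.40, 1.20                 # theta
beta_p, tau_p, sigma_p = beta + np.array([0.03, -0.02]), 0.42, 1.25  # theta'

# Left-hand side: empirical-measure version of d_{3i}(theta', theta).
d3 = np.sqrt(np.mean((psi3_rescaled(y, X, beta_p, tau_p, sigma_p)
                      - psi3_rescaled(y, X, beta, tau, sigma)) ** 2))
# Right-hand side of the chain, with the factor 2 from the check-function bound.
bound = (abs(sigma - sigma_p)
         + 2 * np.linalg.norm(beta_p - beta) * np.sqrt(np.mean(np.sum(X**2, axis=1)))
         + abs(tau_p - tau) * np.sqrt(np.mean((y - X @ beta_p) ** 2)))
print(d3, bound)
```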

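Thus, $\|\theta' - \theta\| \to 0$ implies that $d(\theta', \theta) \to 0$ in every case, and therefore $\sup_{\|\theta' - \theta\| < \delta_n}\sum_{i=1}^{n} d_i \to 0$ when $\delta_n \to 0$. The final condition in Theorem 2.11.1 in van der Vaart and Wellner (1996) is a Lindeberg condition, which is guaranteed by Assumptions A1–A4. Therefore, we conclude that $\mathcal{F}$ is Donsker and

$$\sup_{\|\theta - \theta_0\| \le \delta_n}\left\|\mathbb{G}_n\psi_\theta(y,x) - \mathbb{G}_n\psi_{\theta_0}(y,x)\right\| = o_p(1).$$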

### References

Abadie, A., J. Angrist, and G. Imbens. 2002. “Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings.” Econometrica 70: 91–117. doi:10.1111/1468-0262.00270

Angrist, J., V. Chernozhukov, and I. Fernández-Val. 2006. “Quantile Regression Under Misspecification, with an Application to the U.S. Wage Structure.” Econometrica 74: 539–563. doi:10.1111/j.1468-0262.2006.00671.x

Bloom, H. S. B., L. L. Orr, S. H. Bell, G. Cave, F. Doolittle, W. Lin, and J. M. Bos. 1997. “The Benefits and Costs of JTPA Title II-A Programs: Key Findings from the National Job Training Partnership Act Study.” Journal of Human Resources 32: 549–576. doi:10.2307/146183

Chernozhukov, V., and C. Hansen. 2006. “Instrumental Quantile Regression Inference for Structural and Treatment Effects Models.” Journal of Econometrics 132: 491–525. doi:10.1016/j.jeconom.2005.02.009

Chernozhukov, V., and C. Hansen. 2008. “Instrumental Variable Quantile Regression: A Robust Inference Approach.” Journal of Econometrics 142: 379–398. doi:10.1016/j.jeconom.2007.06.005

Chernozhukov, V., I. Fernández-Val, and B. Melly. 2009. “Inference on Counterfactual Distributions.” CEMMAP Working Paper CWP09/09. doi:10.2139/ssrn.1235529

Ebrahimi, N., E. S. Soofi, and R. Soyer. 2008. “Multivariate Maximum Entropy Identification, Transformation, and Dependence.” Journal of Multivariate Analysis 99: 1217–1231. doi:10.1016/j.jmva.2007.08.004

Firpo, S. 2007. “Efficient Semiparametric Estimation of Quantile Treatment Effects.” Econometrica 75: 259–276. doi:10.1111/j.1468-0262.2007.00738.x

Geraci, M., and M. Bottai. 2007. “Quantile Regression for Longitudinal Data Using the Asymmetric Laplace Distribution.” Biostatistics 8: 140–154. doi:10.1093/biostatistics/kxj039

He, X., and Q.-M. Shao. 1996. “A General Bahadur Representation of M-Estimators and its Applications to Linear Regressions with Nonstochastic Designs.” Annals of Statistics 24: 2608–2630. doi:10.1214/aos/1032181172

He, X., and Q.-M. Shao. 2000. “Quantile Regression Estimates for a Class of Linear and Partially Linear Errors-in-Variables Models.” Statistica Sinica 10: 129–140.

Hinkley, D. V., and N. S. Revankar. 1977. “Estimation of the Pareto Law from Underreported Data: A Further Analysis.” Journal of Econometrics 5: 1–11. doi:10.1016/0304-4076(77)90031-8

Huber, P. J. 1967. “The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions.” In Fifth Symposium on Mathematical Statistics and Probability, 179–195. California: University of California, Berkeley.

Kitamura, Y., and M. Stutzer. 1997. “An Information-Theoretic Alternative to Generalized Method of Moments Estimation.” Econometrica 65: 861–874. doi:10.2307/2171942

Koenker, R. 2005. Quantile Regression. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511754098

Koenker, R., and G. W. Bassett. 1978. “Regression Quantiles.” Econometrica 46: 33–49. doi:10.2307/1913643

Koenker, R., and J. A. F. Machado. 1999. “Goodness of Fit and Related Inference Processes for Quantile Regression.” Journal of the American Statistical Association 94: 1296–1310. doi:10.1080/01621459.1999.10473882

Koenker, R., and Z. Xiao. 2002. “Inference on the Quantile Regression Process.” Econometrica 70: 1583–1612. doi:10.1111/1468-0262.00342

Komunjer, I. 2005. “Quasi-Maximum Likelihood Estimation for Conditional Quantiles.” Journal of Econometrics 128: 137–164. doi:10.1016/j.jeconom.2004.08.010

Komunjer, I. 2007. “Asymmetric Power Distribution: Theory and Applications to Risk Measurement.” Journal of Applied Econometrics 22: 891–921. doi:10.1002/jae.961

Kosorok, M. R. 2008. Introduction to Empirical Processes and Semiparametric Inference. New York, New York: Springer-Verlag Press. doi:10.1007/978-0-387-74978-5

Kotz, S., T. J. Kozubowski, and K. Podgórski. 2002a. “Maximum Entropy Characterization of Asymmetric Laplace Distribution.” International Mathematical Journal 1: 31–35.

Kotz, S., T. J. Kozubowski, and K. Podgórski. 2002b. “Maximum Likelihood Estimation of Asymmetric Laplace Distributions.” Annals of the Institute of Statistical Mathematics 54: 816–826. doi:10.1023/A:1022467519537

LaLonde, R. J. 1995. “The Promise of Public-Sponsored Training Programs.” Journal of Economic Perspectives 9: 149–168. doi:10.1257/jep.9.2.149

Machado, J. A. F. 1993. “Robust Model Selection and M-Estimation.” Econometric Theory 9: 478–493. doi:10.1017/S0266466600007775

Manski, C. F. 1991. “Regression.” Journal of Economic Literature 29: 34–50.

Park, S. Y., and A. K. Bera. 2009. “Maximum Entropy Autoregressive Conditional Heteroskedasticity Model.” Journal of Econometrics 150: 219–230. doi:10.1016/j.jeconom.2008.12.014

Schennach, S. M. 2008. “Quantile Regression with Mismeasured Covariates.” Econometric Theory 24: 1010–1043. doi:10.1017/S0266466608080390

Soofi, E. S., and J. J. Retzer. 2002. “Information Indices: Unification and Applications.” Journal of Econometrics 107: 17–40. doi:10.1016/S0304-4076(01)00111-7

van der Vaart, A. 1998. Asymptotic Statistics. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802256

van der Vaart, A., and J. A. Wellner. 1996. Weak Convergence and Empirical Processes. New York, New York: Springer-Verlag. doi:10.1007/978-1-4757-2545-2

Wei, Y., and R. J. Carroll. 2009. “Quantile Regression with Measurement Error.” Journal of the American Statistical Association 104: 1129–1143. doi:10.1198/jasa.2009.tm08420

Yu, K., and R. A. Moyeed. 2001. “Bayesian Quantile Regression.” Statistics & Probability Letters 54: 437–447. doi:10.1016/S0167-7152(01)00124-9

Yu, K., and J. Zhang. 2005. “A Three-Parameter Asymmetric Laplace Distribution and Its Extension.” Communications in Statistics - Theory and Methods 34: 1867–1879. doi:10.1080/03610920500199018

Zhao, Z., and Z. Xiao. 2011. “Efficient Regressions Via Optimally Combining Quantile Information.” Manuscript, University of Illinois at Urbana-Champaign.

Zou, H., and M. Yuan. 2008. “Composite Quantile Regression and the Oracle Model Selection Theory.” Annals of Statistics 36: 1108–1126. doi:10.1214/07-AOS507

Published Online: 2015-3-3
Published in Print: 2016-1-1