Published by De Gruyter, March 29, 2014

Testing Competing Models for Non-negative Data with Many Zeros

  • João M. C. Santos Silva, Silvana Tenreyro and Frank Windmeijer


In economic applications it is often the case that the variate of interest is non-negative and its distribution has a mass-point at zero. Many regression strategies have been proposed to deal with data of this type but, although there has been a long debate in the literature on the appropriateness of different models, formal statistical tests to choose between the competing specifications are not often used in practice. We use the non-nested hypothesis testing framework of Davidson and MacKinnon (1981) to develop a novel and simple regression-based specification test that can be used to discriminate between these models.

JEL Codes: C12; C52

Corresponding author: João M. C. Santos Silva, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK and CEMAPRE, Rua do Quelhas 6, 1200-781 Lisboa, Portugal, E-mail:


We are grateful to the editor Jason Abrevaya and to an anonymous referee for many helpful and constructive suggestions. We also thank Holger Breinlich, Francesco Caselli, Daniel Dias, Esmeralda Ramalho, Joaquim Ramalho, and Rainer Winkelmann for helpful comments, and John Mullahy for providing one of the datasets used in Section 4. Santos Silva acknowledges partial financial support from Fundação para a Ciência e Tecnologia (Programme PEst-OE/EGE/UI0491/2013). Tenreyro acknowledges financial support from the European Research Council under the European Community’s ERC starting grant agreement 240852, “Research on Economic Fluctuations and Globalization.” Windmeijer acknowledges financial support from ERC grant 269874 – DEVHEALTH.


A1. Asymptotic Distribution and Adjusted Covariance Matrix

The proposed test is based on the OLS estimation of an artificial model of the form


The easiest way of obtaining the asymptotic distribution of the OLS estimates of θ=(δ, α), say θ^=(δ^,α^), is to consider the joint estimation of θ and ϕA=(βA, γA) by system GMM as in Newey (1984). The results in this Appendix are presented for the case in which βA and γA are jointly estimated by maximum likelihood, as in Heckman’s (1979) selection model. For cases such as the two-part model in which γA can be estimated independently of βA, the same results are valid if one considers only the moment conditions for the joint estimation of γA and θ.

Let S1 and S2 denote the vectors of moment conditions for the model under the null and for the test equation, respectively, and define S(λ)=(S1,S2), with λ=(βA,γA,δ,α). The just-identified system-GMM estimator of λ is defined as the solution of

$$\frac{1}{n}\sum_{i=1}^{n} S_i(\hat{\lambda}) = 0,$$

where λ^=(β^A,γ^A,δ^,α^), S_i denotes S evaluated at observation i, and we assume the following standard regularity conditions (see, e.g., theorems 2.6 and 3.4 in Newey and McFadden 1994).

A1 E(S(λ))=0 only if λ=λ0, where λ0 denotes the true value of λ.

A2 λ0 ∈ interior of Λ, which is compact.

A3 S(λ) is continuous at each λ∈Λ with probability one.

A4 With probability approaching one, S(λ) is continuously differentiable in a neighborhood ς of λ0.

A5 E(sup_{λ∈Λ} ‖S(λ)‖)<∞, E(‖S(λ0)‖²)<∞, and E(sup_{λ∈ς} ‖∇λS(λ)‖)<∞.

A6 The matrix M is non-singular, where M=E(∇λS(λ0)).

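Then, the results in Newey and McFadden (1994) imply that

$$\hat{\lambda} \xrightarrow{p} \lambda_0$$

and

$$\sqrt{n}\,(\hat{\lambda}-\lambda_0) \xrightarrow{d} N\!\left(0,\; M^{-1}\,E\!\left[S(\lambda_0)S(\lambda_0)'\right]M^{-1\prime}\right).$$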

Noting that

$$M = \begin{pmatrix} H & 0 \\ H_1 & H_2 \end{pmatrix},$$

where H denotes the expectation of the matrix of derivatives of S1 with respect to ϕA and H1 and H2 denote the expectations of the derivatives of S2 with respect to ϕA and θ, respectively, the variance of θ^ can then be written as




where V(ϕ^A) is the estimated variance of ϕ^A=(β^A,γ^A) and Vθ^ is the uncorrected estimated variance of θ^.
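The origin of the correction can be sketched by expanding the stacked moment conditions around λ0. The block below is a reconstruction of the standard sequential-estimator argument in Newey (1984) under the definitions above, not a quotation of the displayed equations. The symbol $\Omega_{jk}=E(S_jS_k')$ is introduced here for brevity, and $V(\hat\phi_A)$ and $V_{\hat\theta}$ are read as asymptotic variances. Because S1 does not depend on θ, M is block lower-triangular, so the rows of $M^{-1}$ that correspond to θ are $\left(-H_2^{-1}H_1H^{-1},\; H_2^{-1}\right)$.

```latex
\begin{aligned}
\sqrt{n}\,(\hat\theta-\theta_0)
  &\approx -H_2^{-1}\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n}
     \left(S_{2i}-H_1H^{-1}S_{1i}\right),\\[4pt]
V(\hat\theta)
  &= H_2^{-1}\left[\Omega_{22}-H_1H^{-1}\Omega_{12}
     -\Omega_{21}H^{-1\prime}H_1'
     +H_1H^{-1}\Omega_{11}H^{-1\prime}H_1'\right]H_2^{-1\prime}\\[4pt]
  &= V_{\hat\theta}
     +H_2^{-1}\left[H_1V(\hat\phi_A)H_1'
     -H_1H^{-1}\Omega_{12}
     -\Omega_{21}H^{-1\prime}H_1'\right]H_2^{-1\prime},
\end{aligned}
\qquad
V_{\hat\theta}=H_2^{-1}\Omega_{22}H_2^{-1\prime},\quad
V(\hat\phi_A)=H^{-1}\Omega_{11}H^{-1\prime}.
```

In this form, the bracketed term vanishes when $H_1=0$, and it reduces to the positive semidefinite matrix $H_1V(\hat\phi_A)H_1'$ when $\Omega_{21}=0$.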

Whether V(θ^) is smaller than, larger than, or equal to Vθ^, in the positive semidefinite sense, depends on the particular case being considered. For example, if H1=0, the two matrices are equal, and when E(S2S1′)=0, V(θ^) is larger than Vθ^ in the positive semidefinite sense.

In the context of the HPC test, it is of special interest to consider the case where γA is estimated by maximum likelihood. In this case, V(ϕ^A)=−H⁻¹ and E(S2S1′)=−H1, and therefore


implying that V(θ^) is smaller than Vθ^ (see Pierce 1982; Lee 2010, pp. 104–105). Hence, when γA is estimated by maximum likelihood, the test statistic constructed using the uncorrected covariance matrix has asymptotic variance smaller than 1, and the test is asymptotically undersized.
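The direction of the correction can be checked numerically in the simplest setting. The sketch below assumes that ϕA and θ are scalars, that the first-order expansion θ^ − θ0 ≈ −(1/H2)(S2 − (H1/H)S1) applies, and that the maximum-likelihood case satisfies the information-matrix equalities E(S1²) = −H and E(S2S1) = −H1. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
"""Scalar illustration of corrected vs. uncorrected variance of theta-hat.

Assumption: theta_hat - theta_0 ~ -(1/h2) * (S2 - (h1/h) * S1),
where h, h1, h2 play the roles of H, H1, H2 (all scalars here).
"""


def uncorrected_variance(h2, s22):
    """Variance of theta-hat treating phi as known: s22 / h2^2."""
    return s22 / h2**2


def corrected_variance(h, h1, h2, s11, s12, s22):
    """Variance of theta-hat allowing for the estimation of phi.

    s11 = E(S1^2), s12 = E(S1*S2), s22 = E(S2^2).
    """
    r = h1 / h
    return (s22 - 2.0 * r * s12 + r**2 * s11) / h2**2


def ml_corrected_variance(h, h1, h2, s22):
    """Maximum-likelihood case: impose E(S1^2) = -h and E(S2*S1) = -h1."""
    return corrected_variance(h, h1, h2, s11=-h, s12=-h1, s22=s22)


if __name__ == "__main__":
    # Hypothetical values; h < 0 because it is an expected Hessian.
    h, h1, h2, s22 = -2.0, 1.0, -1.0, 1.5
    print("uncorrected: ", uncorrected_variance(h2, s22))          # 1.5
    print("ML corrected:", ml_corrected_variance(h, h1, h2, s22))  # 1.0
```

With these values the corrected variance (1.0) is below the uncorrected one (1.5). Setting `h1 = 0` makes the two coincide, and setting `s12 = 0` makes the corrected variance the larger of the two.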

Finally, we reiterate that the correction of the covariance matrix is needed only when MA is a double-index model.

A2. Correlation in the Two-Part Model

Duan et al. (1984) give an example intended to show that the two error terms in the two-part model can be correlated and that, therefore, this model is not nested in the sample selection model, in the sense that the two-part model cannot be obtained by imposing a restriction on the selection model. Since then, numerous papers have quoted this result, for example, Leung and Yu (1996) and Norton et al. (2008). Here we argue that the example is misleading.

In the notation of Duan et al. (1984), the two-part model is given by


where f is a continuous distribution with mean zero and variance σ2. Hence, (ln(yi)|Ii>0,xi)~f(xiδ2,σ2).

To show that correlation between η1 and η2 is possible, Duan et al. (1984) constructed the following example (pp. 285–286): Let Z1i and Z2i follow a standard bivariate normal distribution with correlation coefficient ρ. Let Gi be the left- and Hi be the right-truncated standard normal cdf, with −xiδ1 as truncation point:


where ϕ denotes the standard normal pdf.

Construct (η1i, η2i) as follows: with probability Φ(xiδ1), define


With probability (1−Φ(xiδ1)), define


Then the two-part model assumptions are satisfied and there is correlation between η1i and η2i. Duan et al. (1984) show that when f is assumed to be normal then the conditional expectation is given by


and consequently η1i and η2i are stochastically dependent and positively associated.

The problem with this argument lies in the fact that with probability Φ(xiδ1) we draw an η1i such that η1i is larger than −xiδ1. This essentially introduces a new uniformly distributed random variable, say ζi, and changes the model to


so ζi determines the outcome Ii>0 and is independent of η2i. Therefore, there is no selection problem, as


Clearly, the model of the example can be specified as

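$$
\begin{aligned}
\zeta_i \mid x_i &\sim U(0,1)\\
I_i &= 1\!\left(\zeta_i < \Phi(x_i\delta_1)\right)\\
\ln(y_i) &= x_i\delta_2 + \eta_{2i}\\
(\eta_{2i} \mid I_i = 1, x_i) &\sim f(0,\sigma^2)\\
(\eta_{2i} \mid I_i = 0, x_i) &\equiv -\infty \quad (y_i \equiv 0),
\end{aligned}
$$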

with ζi distributed independently of η2i, so that the value of η1i is immaterial. Therefore, this example does not show that the errors η1i and η2i in the original model can be correlated.
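A small simulation can illustrate why the reformulated example involves no selection problem. The sketch below draws ζi uniformly on (0,1), sets Ii = 1(ζi < Φ(xiδ1)), and draws η2i independently of ζi. Taking f to be normal, along with the function name, sample size, and index values, is an assumption made here for convenience. It is not a replication of the construction in Duan et al. (1984).

```python
import random
from statistics import NormalDist, fmean


def simulate_selected_errors(x_delta1, sigma=1.0, n=200_000, seed=0):
    """Simulate the reformulated example for one value of x*delta1.

    Returns (share of observations with I = 1,
             sample mean of eta2 among observations with I = 1).
    """
    rng = random.Random(seed)
    p_participate = NormalDist().cdf(x_delta1)  # Phi(x*delta1)
    selected_eta2 = []
    for _ in range(n):
        zeta = rng.random()            # zeta ~ U(0,1) decides participation
        eta2 = rng.gauss(0.0, sigma)   # drawn independently of zeta
        if zeta < p_participate:       # I = 1(zeta < Phi(x*delta1))
            selected_eta2.append(eta2)
    return len(selected_eta2) / n, fmean(selected_eta2)


if __name__ == "__main__":
    for index in (-1.0, 0.0, 1.0):     # hypothetical values of x*delta1
        share, mean_eta2 = simulate_selected_errors(index)
        print(f"x*delta1={index:+.1f}  share={share:.3f}  "
              f"mean(eta2 | I=1)={mean_eta2:+.4f}")
```

Whatever the value of the index, the share of positive outcomes tracks Φ(xiδ1), while the mean of η2i among the selected observations stays close to zero. Participation therefore carries no information about η2i.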

In summary, under the maintained assumptions, there is no evidence to support the view that the two-part model cannot be obtained by imposing a restriction on the sample selection model. However, the assumptions of the sample selection model are unlikely to hold when it is used to describe corner-solutions data, and in that case there is no guarantee that the conditional expectation implied by the sample selection model will fit the data better than the conditional expectation implied by the two-part model. For example, if η2i is homoskedastic but non-normal, the two-part model can be used to consistently estimate the conditional expectation of yi, while that is not possible with the sample selection model. In that sense, the two models are indeed not nested.
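The last claim can be made concrete with a short sketch. It takes the two-part model as described above, with P(Ii>0|xi)=Φ(xiδ1) and ln(yi)=xiδ2+η2i for positive outcomes. It also adds the assumption, made here for illustration, that the distribution of η2i given Ii>0 does not depend on xi. The constant κ is notation introduced here.

```latex
\begin{aligned}
E(y_i \mid x_i)
  &= P(I_i>0 \mid x_i)\,E(y_i \mid I_i>0, x_i)\\
  &= \Phi(x_i\delta_1)\,\exp(x_i\delta_2)\,\kappa,
  \qquad \kappa = E\!\left[\exp(\eta_{2i}) \mid I_i>0\right].
\end{aligned}
```

Normality of f plays no role in this expression. The constant κ can be estimated without specifying f, for instance with the smearing estimator of Duan (1983), which averages exp(η^2i) over the observations with positive yi.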


References

Anderson, J., and Y. Yotov. 2010. “The Changing Incidence of Geography.” American Economic Review 100: 2157–2186. doi:10.1257/aer.100.5.2157

Anderson, J., and E. van Wincoop. 2003. “Gravity with Gravitas: a Solution to the Border Puzzle.” American Economic Review 93: 170–192. doi:10.1257/000282803321455214

Arkolakis, C. 2008. “Market Penetration Costs and the New Consumers Margin in International Trade.” NBER Working Paper No. 14214. doi:10.3386/w14214

Arkolakis, C., A. Costinot, and A. Rodríguez-Clare. 2009. “New Trade Models, Same Old Gains?” NBER Working Paper No. 15628.

Atkinson, A. C. 1970. “A Method for Discriminating Between Models.” Journal of the Royal Statistical Society, Series B 32: 323–353. doi:10.1111/j.2517-6161.1970.tb00845.x

Bierens, H. J. 1982. “Consistent Model Specification Tests.” Journal of Econometrics 20: 105–134. doi:10.1016/0304-4076(82)90105-1

Bierens, H. J. 1990. “A Consistent Conditional Moment Test of Functional Form.” Econometrica 58: 1443–1458. doi:10.2307/2938323

Chaney, T. 2008. “Distorted Gravity: The Intensive and Extensive Margins of International Trade.” American Economic Review 98: 1707–1721. doi:10.1257/aer.98.4.1707

Cox, D. R. 1961. “Tests of Separate Families of Hypotheses.” In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. I, 105–123. Berkeley: University of California Press.

Davidson, R., and J. G. MacKinnon. 1981. “Several Tests for Model Specification in the Presence of Alternative Hypotheses.” Econometrica 49: 781–793. doi:10.2307/1911522

Davidson, R., and J. G. MacKinnon. 2006. “Bootstrap Methods in Econometrics.” In Palgrave Handbook of Econometrics, edited by T. C. Mills and K. Patterson, Vol. 1, Ch. 23, 821–838. London: Palgrave Macmillan.

Deaton, A., and M. Irish. 1984. “Statistical Models for Zero Expenditures in Household Budgets.” Journal of Public Economics 23: 59–80. doi:10.1016/0047-2727(84)90067-7

Dow, W. H., and E. C. Norton. 2003. “Choosing Between and Interpreting the Heckit and Two-Part Models for Corner Solutions.” Health Services & Outcomes Research Methodology 4: 5–18. doi:10.1023/A:1025827426320

Duan, N. 1983. “Smearing Estimate: a Nonparametric Retransformation Method.” Journal of the American Statistical Association 78: 605–610. doi:10.1080/01621459.1983.10478017

Duan, N., W. G. Manning, C. N. Morris, and J. P. Newhouse. 1983. “A Comparison of Alternative Models for the Demand for Medical Care.” Journal of Business and Economic Statistics 1: 115–126.

Duan, N., W. G. Manning, C. N. Morris, and J. P. Newhouse. 1984. “Choosing Between the Sample-Selection Model and the Multi-Part Model.” Journal of Business and Economic Statistics 2: 283–289.

Eaton, J., and A. Tamura. 1994. “Bilateralism and Regionalism in Japanese and US trade and Direct Foreign Investment Patterns.” Journal of the Japanese and International Economies 8: 478–510.

Feinstein, J. S. 1989. “The Safety Regulation of U.S. Nuclear Power Plants: Violations Inspections, and Abnormal Occurrences.” Journal of Political Economy 97: 115–154. doi:10.1086/261595

Fisher, G., and M. McAleer. 1979. “On the Interpretation of the Cox Test in Econometrics.” Economics Letters 4: 145–150. doi:10.1016/0165-1765(79)90225-8

Freedman, D. A. 1981. “Bootstrapping Regression Models.” Annals of Statistics 9: 1218–1228. doi:10.1214/aos/1176345638

Gaudry, M. J. I., and M. G. Dagenais. 1979. “The Dogit Model.” Transportation Research Part B: Methodological 13: 105–111. doi:10.1016/0191-2615(79)90028-6

Gourieroux, C., A. Monfort, and A. Trognon. 1984. “Pseudo Maximum Likelihood Methods: Applications to Poisson Models.” Econometrica 52: 701–720. doi:10.2307/1913472

Gourieroux, C., and A. Monfort. 1994. “Testing Non-Nested Hypotheses.” In Handbook of Econometrics, edited by R. F. Engle and D. McFadden, Vol. IV, Ch. 44, 2583–2637. Amsterdam: Elsevier. doi:10.1016/S1573-4412(05)80013-3

Hallak, J. C. 2006. “Product Quality and the Direction of Trade.” Journal of International Economics 68: 238–265. doi:10.1016/j.jinteco.2005.04.001

Heckman, J. J. 1979. “Sample Selection Bias as a Specification Error.” Econometrica 47: 153–161. doi:10.2307/1912352

Helpman, E., M. J. Melitz, and Y. Rubinstein. 2008. “Estimating Trade Flows: Trading Partners and Trading Volumes.” Quarterly Journal of Economics 123: 441–487. doi:10.1162/qjec.2008.123.2.441

Helpman, E., M. J. Melitz, and S. R. Yeaple. 2004. “Export Versus FDI with Heterogeneous Firms.” American Economic Review 94: 300–316. doi:10.1257/000282804322970814

Jones, A. M. 2000. “Health Econometrics.” In Handbook of Health Economics, edited by J. P. Newhouse and A. J. Culyer, Vol. 1A, Ch. 6, 265–344. Amsterdam: Elsevier. doi:10.1016/S1574-0064(00)80165-1

La Porta, R., F. López-de-Silanes, and G. Zamarripa. 2003. “Related Lending.” The Quarterly Journal of Economics 118: 231–268. doi:10.1162/00335530360535199

Lee, M.-J. 2010. Micro-Econometrics: Methods of Moments and Limited Dependent Variables. 2nd ed. New York, NY: Springer.

Leung, S. F., and S. Yu. 1996. “On the Choice Between Sample-Selection and Two-Part Models.” Journal of Econometrics 72: 197–229. doi:10.1016/0304-4076(94)01720-4

Mukhopadhyay, K., and P. K. Trivedi. 1995. “Regression Models for Under-Recorded Count Data.” Paper presented at the Econometric Society 7th World Congress, Tokyo.

Melitz, M. J. 2003. “The Impact of Trade on Intra-Industry Reallocations and Aggregate Industry Productivity.” Econometrica 71: 1695–1725. doi:10.1111/1468-0262.00467

Mullahy, J. 1986. “Specification and Testing in Some Modified Count Data Models.” Journal of Econometrics 33: 341–365. doi:10.1016/0304-4076(86)90002-3

Mullahy, J. 1998. “Much ado About Two: Reconsidering Retransformation and the Two-Part Model in Health Econometrics.” Journal of Health Economics 17: 247–282. doi:10.1016/S0167-6296(98)00030-7

Newey, W. K. 1984. “A Method of Moments Interpretation of Sequential Estimators.” Economics Letters 14: 201–206. doi:10.1016/0165-1765(84)90083-1

Newey, W. K. 1985. “Maximum Likelihood Specification Testing and Conditional Moment Tests.” Econometrica 53: 1047–1070. doi:10.2307/1911011

Newey, W. K., and D. McFadden. 1994. “Large Sample Estimation and Hypothesis Testing.” In Handbook of Econometrics, edited by R. F. Engle and D. McFadden, Vol. 4, Ch. 36, 2111–2245. Amsterdam: Elsevier. doi:10.1016/S1573-4412(05)80005-4

Norton, E. C., W. H. Dow, and Y. K. Do. 2008. “Specification Tests for the Sample Selection and Two-part Models.” Health Services and Outcomes Research Methodology 8: 201–208. doi:10.1007/s10742-008-0037-8

Pierce, D. A. 1982. “The Asymptotic Effect of Substituting Estimators for Parameters in Certain Types of Statistics.” Annals of Statistics 10: 475–478. doi:10.1214/aos/1176345788

Pesaran, M. H., and A. S. Deaton. 1978. “Testing Non-Nested Nonlinear Regression Models.” Econometrica 46: 677–694. doi:10.2307/1914240

Quandt, R. E. 1974. “A Comparison of Methods for Testing Non-Nested Hypotheses.” Review of Economics and Statistics 56: 251–255. doi:10.2307/1927531

Ramalho, E. A., J. J. S. Ramalho, and J. M. R. Murteira. 2011. “Alternative Estimating and Testing Empirical Strategies for Fractional Regression Models.” Journal of Economic Surveys 25: 19–68. doi:10.1111/j.1467-6419.2009.00602.x

Santos Silva, J. M. C. 2001. “A Score Test for Non-Nested Hypotheses with Applications to Discrete Data Models.” Journal of Applied Econometrics 16: 577–597. doi:10.1002/jae.601

Santos Silva, J. M. C., and S. Tenreyro. 2006. “The Log of Gravity.” The Review of Economics and Statistics 88: 641–658. doi:10.1162/rest.88.4.641

Santos Silva, J. M. C., and S. Tenreyro. 2011. “Further Simulation Evidence on the Performance of the Poisson Pseudo-Maximum Likelihood Estimator.” Economics Letters 112: 220–222. doi:10.1016/j.econlet.2011.05.008

StataCorp. 2013. Stata Release 13. Statistical Software. College Station (TX): StataCorp LP.

Tauchen, G. 1985. “Diagnostic Testing and Evaluation of Maximum Likelihood Models.” Journal of Econometrics 30: 415–443. doi:10.1016/0304-4076(85)90149-6

Tobin, J. 1958. “Estimation of Relationships for Limited Dependent Variables.” Econometrica 26: 24–36. doi:10.2307/1907382

van de Ven, W. P., and B. M. van Praag. 1981. “Risk Aversion of Deductibles in Private Health Insurance: Application of an Adjusted Tobit Model to Family Health Care Expenditures.” In Health, Economics and Health Economics, edited by J. van der Gaag and M. Perlman, 125–148. Amsterdam: North Holland.

Vuong, Q. H. 1989. “Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses.” Econometrica 57: 307–333. doi:10.2307/1912557

Winkelmann, R., and K. F. Zimmermann. 1993. “Poisson Logistic Regression.” Department of Economics, University of Munich, Working Paper No 93-18.

Wooldridge, J. M. 1992. “A Test for Functional Form Against Nonparametric Alternatives.” Econometric Theory 8: 452–475. doi:10.1017/S0266466600013165

Wooldridge, J. M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

Supplemental Material

The online version of this article (DOI: 10.1515/jem-2013-0005) offers supplementary material, available to authorized users.

Published Online: 2014-3-29
Published in Print: 2015-1-1

©2015 by De Gruyter
