Published by De Gruyter May 28, 2013

A Simple GMM Estimator for the Semiparametric Mixed Proportional Hazard Model

Govert E. Bijwaard, Geert Ridder and Tiemen Woutersen

Abstract

Ridder and Woutersen (Ridder, G., and T. Woutersen. 2003. “The Singularity of the Efficiency Bound of the Mixed Proportional Hazard Model.” Econometrica 71: 1579–1589) have shown that under a weak condition on the baseline hazard, there exist root-N consistent estimators of the parameters in a semiparametric Mixed Proportional Hazard model with a parametric baseline hazard and unspecified distribution of the unobserved heterogeneity. We extend the linear rank estimator (LRE) of Tsiatis (Tsiatis, A. A. 1990. “Estimating Regression Parameters using Linear Rank Tests for Censored Data.” Annals of Statistics 18: 354–372) and Robins and Tsiatis (Robins, J. M., and A. A. Tsiatis. 1992. “Semiparametric Estimation of an Accelerated Failure Time Model with Time-Dependent Covariates.” Biometrika 79: 311–319) to this class of models. The optimal LRE is a two-step estimator. We propose a simple one-step estimator that is close to optimal if there is no unobserved heterogeneity. The efficiency gain associated with the optimal LRE increases with the degree of unobserved heterogeneity.


Corresponding author: Govert E. Bijwaard, Netherlands Interdisciplinary Demographic Institute (NIDI), PO Box 11650, NL-2502 AR The Hague, The Netherlands

We thank Kei Hirano and Nicole Lott for very helpful comments. We also thank seminar participants at the University of Western Ontario, and the Netherlands Interdisciplinary Demographic Institute. This paper replaces the paper Method of Moments Estimation of Duration Models with Exogenous Regressors (2003). Financial support from NORFACE research programme on Migration in Europe – Social, Economic, Cultural and Policy Dynamics is gratefully acknowledged.

  1. Horowitz (2001, Theorem 2.2) averages gn(Xi); the STATA program on our website is sufficiently fast to apply the bootstrap to most survey datasets.

  2. Brent’s method combines the bisection method, the secant method and inverse quadratic interpolation. The idea is to use the secant method or inverse quadratic interpolation if possible, because they converge faster, but to fall back to the more robust bisection method if necessary. The secant method can be thought of as a finite difference approximation of the Newton-Raphson method. The Powell method extends the Brent method by searching in a specific direction, rather than changing one parameter at a time.

  3.

  4. In the MLE for models with duration dependence, we do not need the standard identification restriction that the unobserved heterogeneity term has mean one because the baseline hazard is normalized to be equal to 1 in the first interval.

  5. The Gâteaux derivative is a directional derivative; let

    and
    and η>0 then $df(x,a)=\lim_{\eta\to 0}\left[\{f(x+\eta a)-f(x)\}/\eta\right]$.

  6. Our calculations were done in Gauss 6.0 on 3 parallel computers: a Pentium 2.1 PC, a Pentium 2.8 PC and a Pentium 2.0 laptop. The calculations took about 9 weeks of CPU time.

  7. The LRE with a duration dependence on 10 intervals for a sample size of 500 did not converge in seven of the experiments. The average is therefore based on 93 experiments instead of 100.

  8. The results for the parameters of the piecewise constant duration dependence, α2 and α3, are given in Tables A3 and A4 in Appendix A.

  9. The Doob-Meyer decomposition theorem is a theorem in stochastic calculus stating the conditions under which a submartingale may be decomposed in a unique way as the sum of a martingale and a continuous increasing process, see Meyer (1963) and Protter (2005).
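The fallback logic described in footnote 2 can be sketched in a few lines. This is a simplified illustration, not Brent's full algorithm and not the authors' Gauss or STATA code: it omits inverse quadratic interpolation and only shows how a secant step is accepted when it stays inside the bracket and how bisection takes over otherwise. The function name `hybrid_root` and the test function are hypothetical.

```python
def hybrid_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of f in [lo, hi] using secant steps with a bisection fallback."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("the root must be bracketed: f(lo) and f(hi) need opposite signs")

    # The two most recent iterates feed the secant step.
    x_prev, f_prev, x_cur, f_cur = lo, f_lo, hi, f_hi
    force_bisect = False
    for _ in range(max_iter):
        width = hi - lo
        if width < tol:
            break
        x_new = 0.5 * (lo + hi)  # bisection is the default (robust) step
        used_secant = False
        if not force_bisect and f_cur != f_prev:
            candidate = x_cur - f_cur * (x_cur - x_prev) / (f_cur - f_prev)
            if lo < candidate < hi:  # accept the faster step only inside the bracket
                x_new, used_secant = candidate, True
        f_new = f(x_new)
        if f_new == 0.0:
            return x_new
        # Keep the sub-interval on which f changes sign.
        if f_lo * f_new < 0.0:
            hi, f_hi = x_new, f_new
        else:
            lo, f_lo = x_new, f_new
        # If the secant step did not at least halve the bracket, bisect next time.
        force_bisect = used_secant and (hi - lo) > 0.5 * width
        x_prev, f_prev, x_cur, f_cur = x_cur, f_cur, x_new, f_new
    return 0.5 * (lo + hi)


root = hybrid_root(lambda x: x ** 3 - 2.0 * x - 5.0, 2.0, 3.0)
print(root)
```

Because a bisection step is forced whenever a secant step fails to halve the bracket, the bracket width shrinks by at least a factor of two every two iterations, which is the robustness property the footnote attributes to the bisection fallback.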

Appendix A: Additional tables

Table A1

Average Bias of Estimates of the Log α’s Across the Experiments with a Piecewise Constant Duration Dependence on 4 Intervals.

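Entries are average bias, with the standard error in parentheses.

| Estimation method | Parameter | Sample size 500 | Sample size 1000 | Sample size 5000 |
|---|---|---|---|---|
| MLE no hetero | α2 | –0.0480* (0.0150) | –0.0319* (0.0103) | –0.0095* (0.0042) |
| | α3 | –0.0082 (0.0132) | –0.0127 (0.0088) | –0.0094* (0.0041) |
| | α4 | –0.0149 (0.0127) | –0.0102 (0.0089) | –0.0079 (0.0046) |
| MLE 2 points | α2 | 0.0282 (0.0194) | 0.0257 (0.0158) | 0.0140* (0.0053) |
| | α3 | 0.1131* (0.0237) | 0.0713* (0.0175) | 0.0257* (0.0064) |
| | α4 | 0.1480* (0.0273) | 0.1013* (0.0213) | 0.0438* (0.0076) |
| NPMLE | α2 | 0.0785* (0.0210) | 0.0495* (0.0152) | 0.0211* (0.0050) |
| | α3 | 0.2011* (0.0275) | 0.1027* (0.0183) | 0.389* (0.0059) |
| | α4 | 0.2835* (0.0339) | 0.1782* (0.0228) | 0.0612* (0.0079) |
| LRE | α2 | –0.0333 (0.0230) | –0.0234 (0.0184) | –0.0074 (0.0066) |
| | α3 | 0.0391 (0.0306) | 0.0158 (0.0224) | –0.0087 (0.0093) |
| | α4 | 0.0536 (0.0383) | 0.0264 (0.0287) | –0.0109 (0.0128) |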

*p<0.05

Table A2

Average Bias of Estimates of the Log α’s Across the Experiments with a Piecewise Constant Duration Dependence on 10 Intervals.

Entries are average bias, with the standard error in parentheses; column headings give the estimation method and the sample size.

| Parameter | MLE no hetero, 500 | MLE no hetero, 1000 | MLE no hetero, 5000 | MLE 2 points, 500 | MLE 2 points, 1000 | MLE 2 points, 5000 |
|---|---|---|---|---|---|---|
| α2 | –0.0240 (0.0216) | –0.0098 (0.0153) | 0.0068 (0.0063) | 0.0704* (0.0230) | 0.0498* (0.0176) | 0.0464* (0.0080) |
| α3 | –0.0162 (0.0241) | –0.0089 (0.0157) | –0.0090 (0.0061) | 0.1096* (0.0283) | 0.0740* (0.0195) | 0.0420* (0.0086) |
| α4 | –0.0609* (0.0207) | –0.0378* (0.0135) | –0.0069 (0.0054) | 0.0958* (0.0273) | 0.0627* (0.0204) | 0.0590* (0.0098) |
| α5 | 0.0073 (0.0206) | –0.0035 (0.0144) | –0.0115 (0.0069) | 0.1991* (0.0305) | 0.1229* (0.0231) | 0.0690* (0.0117) |
| α6 | –0.0097 (0.0207) | –0.0024 (0.0127) | –0.0059 (0.0067) | 0.1986* (0.0340) | 0.1348* (0.0226) | 0.0766* (0.0123) |
| α7 | –0.0593* (0.0226) | –0.0464* (0.0154) | –0.0074 (0.0072) | 0.1617* (0.0364) | 0.0971* (0.0269) | 0.0823* (0.0135) |
| α8 | –0.0144 (0.0204) | –0.0130 (0.0151) | –0.0023 (0.0070) | 0.2161* (0.0360) | 0.1491* (0.0277) | 0.0963* (0.0141) |
| α9 | –0.0209 (0.0243) | –0.0076 (0.0149) | –0.0120 (0.0075) | 0.2309* (0.0388) | 0.1616* (0.0284) | 0.0964* (0.0137) |
| α10 | –0.0383 (0.0206) | –0.0217 (0.0153) | –0.0078 (0.0071) | 0.2324* (0.0379) | 0.1658* (0.0287) | 0.1068* (0.0154) |

| Parameter | NPMLE, 500 | NPMLE, 1000 | NPMLE, 5000 | LRE, 500 | LRE, 1000 | LRE, 5000 |
|---|---|---|---|---|---|---|
| α2 | 0.1790* (0.0267) | 0.1157* (0.0184) | 0.0703* (0.0088) | –0.0648* (0.0298) | –0.0460* (0.0221) | 0.0088 (0.0106) |
| α3 | 0.3039* (0.0397) | 0.1880* (0.0239) | 0.0871* (0.0099) | –0.0784 (0.0446) | –0.0664* (0.0315) | –0.0070 (0.0136) |
| α4 | 0.3730* (0.0466) | 0.2298* (0.0298) | 0.1181* (0.0120) | –0.1236* (0.0514) | –0.0942* (0.0387) | –0.0041 (0.0166) |
| α5 | 0.5390* (0.0554) | 0.3248* (0.0343) | 0.1372* (0.0146) | –0.0554 (0.0599) | –0.0605 (0.0443) | –0.0093 (0.0203) |
| α6 | 0.5848* (0.0583) | 0.3649* (0.0383) | 0.1573* (0.0151) | –0.0716 (0.0646) | –0.0617 (0.0496) | –0.0050 (0.0220) |
| α7 | 0.5910* (0.0646) | 0.3554* (0.0413) | 0.1692* (0.0170) | –0.1230 (0.0698) | –0.1079* (0.0530) | –0.0078 (0.0245) |
| α8 | 0.6916* (0.0678) | 0.4232* (0.0429) | 0.1884* (0.0179) | –0.0844 (0.0782) | –0.0792 (0.0570) | –0.0042 (0.0258) |
| α9 | 0.7346* (0.0734) | 0.4594* (0.0441) | 0.1918* (0.0191) | –0.0921 (0.0782) | –0.0819 (0.0578) | –0.0157 (0.0278) |
| α10 | 0.7758* (0.0736) | 0.48169* (0.0486) | 0.2123* (0.0209) | –0.1230 (0.0803) | –0.1038 (0.0637) | –0.0117 (0.0309) |

For the sample size of 500, the averages are based on 93 experiments because the estimation procedure did not converge in seven experiments. *p<0.05.

Table A3

Average Bias, Standard error and RMSE of Estimates of Parameters of Piecewise Constant Baseline Hazard Across the Experiments, Second set of Monte Carlo experiments.

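| Duration dependence | Estimation method | Parameter | Bias | Std error | RMSE |
|---|---|---|---|---|---|
| Positive duration dependence | MLE gamma | α2 | 0.0069 | 0.0096 | 0.0118 |
| | | α3 | –0.0149 | 0.0206 | 0.0255 |
| | NPMLE | α2 | 0.0205 | 0.0157 | 0.0258 |
| | | α3 | 0.0091 | 0.0283 | 0.0298 |
| | LRE | α2 | –0.0130 | 0.0200 | 0.0238 |
| | | α3 | –0.0645 | 0.0329 | 0.0724 |
| | LRE-opt | α2 | –0.0134 | 0.0195 | 0.0236 |
| | | α3 | –0.0533 | 0.0327 | 0.0625 |
| Negative duration dependence | MLE gamma | α2 | 0.0211 | 0.0111 | 0.0239 |
| | | α3 | 0.0553* | 0.0229 | 0.0598 |
| | NPMLE | α2 | 0.0345* | 0.0174 | 0.0386 |
| | | α3 | 0.1079* | 0.0310 | 0.1123 |
| | LRE | α2 | 0.0369* | 0.0179 | 0.0410 |
| | | α3 | 0.0643* | 0.0315 | 0.0716 |
| | LRE-opt | α2 | 0.0358* | 0.0178 | 0.0400 |
| | | α3 | 0.0627* | 0.0314 | 0.0701 |
| U-shaped duration dependence | MLE gamma | α2 | –0.0009 | 0.0097 | 0.0097 |
| | | α3 | –0.0338* | 0.0173 | 0.0379 |
| | NPMLE | α2 | 0.0385* | 0.0155 | 0.0416 |
| | | α3 | 0.0149 | 0.0251 | 0.0292 |
| | LRE | α2 | 0.0334 | 0.0186 | 0.0383 |
| | | α3 | –0.0215 | 0.0271 | 0.0346 |
| | LRE-opt | α2 | 0.0261 | 0.0183 | 0.0319 |
| | | α3 | –0.0247 | 0.0263 | 0.0361 |
| Inverse U duration dependence | MLE gamma | α2 | 0.0102 | 0.0104 | 0.0146 |
| | | α3 | –0.0047 | 0.0232 | 0.0237 |
| | NPMLE | α2 | 0.0232 | 0.0140 | 0.0271 |
| | | α3 | 0.0327 | 0.0295 | 0.0440 |
| | LRE | α2 | 0.0335 | 0.0183 | 0.0381 |
| | | α3 | 0.0400 | 0.0336 | 0.0522 |
| | LRE-opt | α2 | 0.0321 | 0.0182 | 0.0369 |
| | | α3 | 0.0344 | 0.0336 | 0.0481 |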

For each DGP (gamma mixture) 100 simulations with 1000 observations each. *p<0.05
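The three numeric columns of Table A3 appear to be linked by the usual identity RMSE² = bias² + (std error)², up to rounding of the printed values. The sketch below checks this for a few rows copied from the table; the identity is an inference from the reported numbers rather than something stated in the text, and the helper name `rmse` is hypothetical.

```python
import math


def rmse(bias, std_error):
    """Root mean squared error implied by a bias and a standard error."""
    return math.sqrt(bias ** 2 + std_error ** 2)


# (label, bias, std error, reported RMSE), copied from Table A3,
# positive duration dependence.
rows = [
    ("MLE gamma, alpha2", 0.0069, 0.0096, 0.0118),
    ("NPMLE, alpha2", 0.0205, 0.0157, 0.0258),
    ("LRE, alpha3", -0.0645, 0.0329, 0.0724),
]

gaps = []
for label, bias, se, reported in rows:
    implied = rmse(bias, se)
    gaps.append(abs(implied - reported))
    print(f"{label}: implied {implied:.4f}, reported {reported:.4f}")
```

A check of this kind is a quick way to spot transcription slips in any single column, since two of the three entries in a row determine the third.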

Table A4

Average Bias, Standard error and RMSE of Estimates of Parameters of Piecewise Constant Baseline Hazard Across the Experiments, Second set of Monte Carlo Experiments, Censored Sample.

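| Duration dependence | Estimation method | Parameter | Bias | Std error | RMSE |
|---|---|---|---|---|---|
| Positive duration dependence | MLE gamma | α2 | 0.0010 | 0.0135 | 0.0135 |
| | | α3 | –0.0267 | 0.0269 | 0.0379 |
| | NPMLE | α2 | 0.0120 | 0.0177 | 0.0213 |
| | | α3 | –0.0204 | 0.0310 | 0.0371 |
| | LRE | α2 | –0.0148 | 0.0199 | 0.0248 |
| | | α3 | –0.0656* | 0.0329 | 0.0734 |
| | LRE-opt | α2 | –0.0138 | 0.0199 | 0.0242 |
| | | α3 | –0.0599 | 0.0328 | 0.0683 |
| Negative duration dependence | MLE gamma | α2 | 0.0347* | 0.0131 | 0.0371 |
| | | α3 | 0.0633* | 0.0277 | 0.0691 |
| | NPMLE | α2 | 0.0417* | 0.0184 | 0.0456 |
| | | α3 | 0.0898* | 0.0325 | 0.0956 |
| | LRE | α2 | 0.0378* | 0.0182 | 0.0420 |
| | | α3 | 0.0539 | 0.0329 | 0.0631 |
| | LRE-opt | α2 | 0.0375* | 0.0181 | 0.0416 |
| | | α3 | 0.0501 | 0.0327 | 0.0598 |
| U-shaped duration dependence | MLE gamma | α2 | 0.0052 | 0.0133 | 0.0143 |
| | | α3 | –0.0269 | 0.0225 | 0.0350 |
| | NPMLE | α2 | 0.0308 | 0.0173 | 0.0353 |
| | | α3 | –0.0159 | 0.0292 | 0.0333 |
| | LRE | α2 | 0.0266 | 0.0184 | 0.0323 |
| | | α3 | –0.0321 | 0.0254 | 0.0410 |
| | LRE-opt | α2 | 0.0263 | 0.0182 | 0.0320 |
| | | α3 | –0.0315 | 0.0253 | 0.0404 |
| Inverse U duration dependence | MLE gamma | α2 | 0.0137 | 0.0123 | 0.0184 |
| | | α3 | –0.0030 | 0.0263 | 0.0264 |
| | NPMLE | α2 | 0.0183 | 0.0149 | 0.0236 |
| | | α3 | 0.0283 | 0.0305 | 0.0416 |
| | LRE | α2 | 0.0340 | 0.0185 | 0.0387 |
| | | α3 | 0.0360 | 0.0335 | 0.0491 |
| | LRE-opt | α2 | 0.0313 | 0.0183 | 0.0363 |
| | | α3 | 0.0290 | 0.0333 | 0.0441 |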

For each DGP (gamma mixture) 100 simulations with 1000 observations each. *p<0.05

Appendix B: Proofs and Technical Details

Technical Details Section 2: A Counting Process Approach

The counting process approach is a very useful framework for analyzing duration data since an indicator can be used to denote whether a transition happened or not. Andersen et al. (1993) have provided an excellent survey of counting processes. Less technical surveys have been given by Klein and Moeschberger (1997), Therneau and Grambsch (2000), and Aalen et al. (2009). The main advantage of this framework is that it allows us to express the duration distribution as a regression model with an error term that is a martingale difference. Regression models with martingale difference errors are the basis for inference in time series models with dependent observations. Hence, it is not surprising that inference is much simplified by using a similar representation in duration models.

To start the discussion, we first introduce some notation. A counting process {N(t)|t≥0} is a stochastic process describing the number of events in the interval [0, t] as time proceeds. The process contains only jumps of size +1. For single duration data, the event can only occur once because the units are observed until the event occurs. Therefore we introduce the observation indicator Y(t)=I(T≥t) that is equal to one if the unit is under observation at time t and zero after the event has occurred. The counting process is governed by its random intensity process, Y(t)κ(t), where κ(t) is the hazard in (2). If we consider a small interval (t–dt, t] of length dt, then Y(t)κ(t)dt is the conditional probability that the increment dN(t)=N(t)–N(t–) jumps in that interval given all that has happened until just before t. By specifying the intensity as the product of this observation indicator and the hazard rate, we effectively limit the number of occurrences of the event to one. It is essential that the observation indicator only depends on events up to time t.

Usually we do not observe T directly. Instead we observe

with g a known function and C a random vector. The most common example is right censoring, where g(T, C)=min (T, C). By defining the observation indicator as the product of the indicator I(t≤T) and, if necessary, an indicator of the observation plan, we capture when a unit is at risk for the event. In the case of right censoring Y(t)=I(t≤T)I(t≤C), and in all cases of interest we have Y(t)=I(t≤T)IA(t) with A a random set that may depend on random variables. We assume that C and T are conditionally independent given X. The history up to and including t, Yh(t) is assumed to be a left continuous function of t. The history of the whole process also includes the history of the covariate process, Xh(t), and V. Thus, we have

The sample paths of the conditioning variables should be up to t–, but because these paths are left continuous we can take them up to t. A fundamental result in the theory of counting processes, the Doob-Meyer decomposition (see footnote 9), allows us to write

where M(t), t≥0 is a martingale with conditional mean and variance given by

The (conditional) mean and variance of the counting process are equal, so the disturbances in (B.2) are heteroscedastic. The probability in (B.1) is zero if the unit is no longer under observation. A counting process can be considered as a sequence of Bernoulli experiments because if dt is small, (B.3) and (6) give the mean and variance of a Bernoulli random variable. The relation between the counting process and the sequence of Bernoulli experiments given in (B.2) can be considered as a regression model with an additive error that is a martingale difference. This equation resembles a time-series regression model. The Doob-Meyer decomposition is very helpful for deriving the distribution of the estimators because the asymptotic behavior of partial sums of martingales is well known.
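As a rough numerical illustration of the decomposition described above, the following sketch simulates single-spell durations and computes the martingale residual N(τ) minus the integrated intensity for each unit. It assumes a constant hazard κ and a fixed right-censoring time (illustrative choices, not taken from the paper), for which the integral of Y(s)κ over [0, τ] is simply κ·min(T, C). All function and variable names are hypothetical.

```python
import random


def martingale_residuals(kappa, censor_time, n_units, seed=0):
    """Simulate N(tau) - integral of Y(s)*kappa ds for right-censored exponential durations."""
    rng = random.Random(seed)
    residuals, compensators = [], []
    for _ in range(n_units):
        duration = rng.expovariate(kappa)               # latent duration T
        observed = min(duration, censor_time)           # time during which Y(s) = 1
        event = 1.0 if duration <= censor_time else 0.0  # value of the counting process
        compensator = kappa * observed                  # integrated intensity
        residuals.append(event - compensator)
        compensators.append(compensator)
    return residuals, compensators


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


residuals, compensators = martingale_residuals(kappa=0.8, censor_time=2.0, n_units=20000)
print("mean residual:", mean(residuals))
print("variance of residual:", variance(residuals))
print("mean integrated intensity:", mean(compensators))
```

In this setup the average residual should be close to zero and its variance close to the average integrated intensity, which mirrors the statement that the conditional mean and variance of the counting process coincide.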

Technical Details Section 3: Assumptions 1–4

To simplify the expressions, we use the notation hi(t, θ)=hi(t, Xh,i(t), θ).

  1. The conditional distribution of T given X(‧) and V has hazard rate

    with X(‧) a K covariate bounded stochastic process that is independent of V and such that if the probability of the event

    some set S with positive measure and for some constants c1, c2, then c1=c2=0. For the baseline hazard, $0<\lim_{t\downarrow 0}\lambda(t,\alpha_0)<\infty$.

  2. For the covariate process X(t), t≥0, we assume that the sample paths are piecewise constant, i.e., its derivative with respect to t is 0 almost everywhere, and left continuous. The hazard that is not conditional on V is

    The observation process is Y(t), t≥0 with Y(t)=I(t≤T)I(t≤C) and we assume

    The support of C is bounded.

  3. The parameter vector θ=(β′, α′)′ is an M vector with β a K vector and α an L vector. The parameter space Θ is convex. The baseline hazard λ(t, α)>0 and is twice differentiable and the second derivative is bounded in α (in the parameter space) and t.

  4. The weight function

    is an M vector of bounded and left continuous functions. If

    then there are functions μ(u, θ) (an M vector), Vβ (u, s, θ) (an M×K matrix), and Vα (u, s, θ) (an M×L matrix) such that

    and

    and

    Define

We assume that the M×M matrix [B(θ0) A(θ0)] is nonsingular.

The restriction on the baseline hazard in Assumption A1 ensures identification (see Section 3) and guarantees that the semiparametric information bound is nonsingular (see below). Assumption A2 states that the covariates and the observation indicator are predetermined. Assumption A4 is about smoothness: suppose that one censors all the data at u=τ+ψ; then the expressions in equations (30) and (31) do not change if the value of ψ varies. The derivation of the asymptotic distribution of the LR estimator follows the proof in Tsiatis (1990). Tsiatis requires that the density of U0 is bounded. For the MPH model, this density is

If E(V)=∞, this density is not bounded at u0=0. Inspection of Tsiatis’ proof shows that this does not change the result, and we do not need to impose the restriction that E(V) is finite. The transformed durations are observed up to τ with τ<∞ such that for some ψ,η>0

Pr[min (U0, C) > τ+ψ]≥η.

In the MPH model, this is just an assumption on the distribution of C because for U0 it is satisfied for all τ<∞.

Technical Details Section 4: Lemmas 2–3

Lemma 2: If the derivative of κ is bounded on [0, τ] then for ε>0 with

and

we have

for u1, u2 with 0<u1<u2<τ.

If Yh,N(t) is bounded away from zero on [0, τ] for large N, then (B.14) and (B.15) imply that if bN=Nc for

then
Note that the uniform convergence holds on a compact subset of [0, τ]. Although this can be generalized to uniform convergence on [0, τ], the variable kernels that are needed for this generalization complicate the asymptotic analysis. In practice, estimation of the hazard is inaccurate near the endpoints, and it may be preferable to exclude observations that are close to the endpoints. Note that the observations near the endpoints are used in the estimation of the hazard. Also, using a bandwidth proportional to $N^{-1/5}$ and
satisfies all the assumptions of this paper.

We do not observe the transformed duration

but rather an estimate
of this transformed duration, and hence we consider the kernel estimator

Lemma 3: The kernel K is positive and bounded on [–1, 1] (and zero elsewhere) and satisfies a Lipschitz condition on this interval. The covariate process X(t) is bounded on [0, τ] and so is

for all α in an open neighborhood of α0. Moreover

uniformly for 0≤u≤τ, θ∈N(θ0) and H has derivatives that are bounded for 0≤u≤τ, θ∈N(θ0). Then for ε>0 such that

we have

Proof: See below.

Note that the conditions on bN are determined in Lemma 2 and that a bandwidth proportional to $N^{-1/5}$ and

satisfies all the assumptions of this paper. The fact that we use estimated transformed durations does not change the restrictions on the bandwidth choice.
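The kind of kernel hazard estimator that Lemmas 2 and 3 are concerned with can be illustrated with a minimal sketch. The Epanechnikov kernel (bounded, supported on [–1, 1], and Lipschitz there), the simulated unit-exponential durations, the fixed censoring time, and all names below are assumptions made for illustration; the estimator smooths the increments 1/(number at risk) at each observed event time, in the spirit of Ramlau-Hansen (1983), with a bandwidth equal to N to the power –1/5.

```python
import random


def epanechnikov(z):
    """Kernel supported on [-1, 1]; zero elsewhere."""
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0


def kernel_hazard(u, durations, events, bandwidth):
    """Kernel-smoothed hazard estimate at u from right-censored durations."""
    n = len(durations)
    order = sorted(range(n), key=lambda i: durations[i])
    total = 0.0
    for rank, i in enumerate(order):
        if events[i]:
            at_risk = n - rank  # units still under observation just before this time
            total += epanechnikov((u - durations[i]) / bandwidth) / at_risk
    return total / bandwidth


rng = random.Random(0)
n_obs = 2000
censor_time = 2.0
latent = [rng.expovariate(1.0) for _ in range(n_obs)]   # true hazard is 1
durations = [min(t, censor_time) for t in latent]
events = [t <= censor_time for t in latent]

bandwidth = n_obs ** (-1.0 / 5.0)
estimate = kernel_hazard(0.5, durations, events, bandwidth)
print("estimated hazard at u = 0.5:", estimate)
```

The evaluation point 0.5 is chosen so that the kernel window stays inside the observation range, in line with the remark that hazard estimates near the endpoints are inaccurate.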

At this point we consider the condition in (B.18) more closely. With

if the duration T is (right) censored at C, Y(t)=I(T≥t)I(C≥t), so

YU (u, θ)=I(h(T, θ)≥u)‧I(h(C, θ)≥u).

If the censoring time and the duration are conditionally independent given the history up to t, i.e.,

then

If N(θ0) is an open neighborhood of θ0, Xi and Ci are i.i.d., and

then

and by the uniform law of large numbers

uniformly for θ∈N(θ0) and 0≤u≤τ. Because by (B.23) the limit is bounded away from zero, we have

uniformly for θ∈N(θ0) and 0≤u≤τ with

Because h(T,θ0)=U0, (B.19) holds for θ=θ0 if κ0(u) is bounded for 0≤u≤τ. From the expression for κU (u, θ) in (9), a sufficient condition for κU(u, θ) to be bounded for all θ in a neighborhood of θ0 and 0≤t≤τ is that λ(t, α)>0 for all t and on a neighborhood of α0. In the same way, (B.20) holds if the hazard of C is bounded and λ(t, α) is bounded away from zero in a neighborhood around α0.
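The at-risk indicator YU(u, θ) above is defined through the transformed durations h(T, θ) and h(C, θ). The sketch below computes such a transformation under the assumption that h(t, θ) is the integrated hazard, i.e. the integral over [0, t] of exp(βX(s))·λ(s, α) ds, with a single piecewise constant covariate (Assumption 2) and a piecewise constant baseline hazard normalized to 1 on the first interval (footnote 4). This form of h is the usual one for MPH models but is not spelled out in this section, so it should be read as an assumption; the function names and the numerical inputs are hypothetical.

```python
import math
from bisect import bisect_right


def step_value(t, breaks, values):
    """Value at t of a step function equal to values[j] from breaks[j] to breaks[j+1]."""
    return values[bisect_right(breaks, t) - 1]


def transformed_duration(t, beta, x_breaks, x_values, a_breaks, a_values):
    """Integrated hazard exp(beta * x(s)) * lambda(s) over [0, t] for step-function inputs.

    Both break lists must start at 0.0. Whether the steps are left or right
    continuous does not matter for the value of the integral.
    """
    grid = sorted({b for b in x_breaks + a_breaks if b < t} | {t})
    total = 0.0
    for left, right in zip(grid[:-1], grid[1:]):
        x = step_value(left, x_breaks, x_values)
        lam = step_value(left, a_breaks, a_values)
        total += math.exp(beta * x) * lam * (right - left)
    return total


# Covariate equals 1 before time 1 and 0 afterwards; baseline hazard is 1, then 2.
h_T = transformed_duration(1.5, 0.5, [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 2.0])
h_C = transformed_duration(3.0, 0.5, [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 2.0])

u = 2.0
at_risk = (h_T >= u) and (h_C >= u)  # Y_U(u, theta) for this unit
print(h_T, h_C, at_risk)
```

Here the first interval contributes exp(0.5)·1·1 and the second 1·2·0.5, so the transformed duration is exp(0.5)+1, and the unit is still at risk at u=2 on the transformed time scale.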

Proof of Lemma 1

is a linearization of
Because SN(θ) is not continuous in θ, it is not possible to linearize this function by a first order Taylor series expansion. Instead we linearize the hazard rate of the transformed durations U(θ). From (4) and (5) we obtain

This relates the hazard of the distribution of U(θ) to that of U0

Because h(h–1(u, θ), θ)=u, we have

The derivatives of κU(u, θ) with respect to θ are

where the last equality follows from a change of variables in the integral. In the same way, we obtain with a change of variable in the integral

The proof consists of checking the conditions for asymptotic linearity of SN(θ) in Tsiatis (1990) and a computation of the coefficients in the linear approximation. In Tsiatis’ proof the covariate in the estimating equation is Xi. We have

and hence the requirement that this is a vector of bounded functions. The equations (9), (10) and (11) are stability conditions [see also Andersen et al. (1993)]. Instead of a mean and variance condition as in Tsiatis (1990), we have a mean and two covariance conditions. Note that by setting s=u, we obtain conditions for uniform convergence to Vα (u, u) and Vβ (u, u). The final condition for linearization is that for u≤τ

The assumptions that λ(t,α) is bounded away from zero for all t≥0 and α in the parameter space, that

for all t≥0 and α in the parameter space, and that X(t) is bounded, imply that the second derivative of κU(u, θ) with respect to θ is bounded for all u≤τ and θ∈Θ. This is sufficient for (B.31) if the parameter space is convex.

Next we linearize SN(θ). Because

we have if |θ–θ0| is small

After substitution of (B.29) and (B.30), the second term is

The normalized vectors of coefficients converge to (B.12) and (B.13) if (B.10) and (11) hold. This proves the lemma.

Proof of Theorem 1

By van der Vaart (1998) Theorem 5.45, we have from Lemma 1

with M0 the martingale associated with the counting process N0 for U0. By the central limit theorem for integrals of predetermined functions with respect to a martingale [see e.g., Andersen et al. (1993)], the sum on the right-hand side converges to a normal distribution with the variance matrix in (24).

Proof of Lemmas 2 and 3

We have

We first consider the second term. Because K is Lipschitz this is bounded by

Moreover by the mean value theorem, we have that for some intermediate

Because Xi(t) is bounded on [0, τ] and so is

for all α in an open neighborhood of α0, (B.36) is bounded by
and substitution in (B.35) gives the upper bound

Because the estimator

is
consistent, the upper bound converges to 0 in probability if

Next we consider the first term in (B.34). By subtraction and addition of expected values, this term is bounded by

The first and second terms converge to 0 in probability if

Because of (B.18) the final term converges in probability to

This expression is bounded (both H and K are bounded) by

The first term goes to 0 in probability if

and the second if
This completes the proof.

References

Aalen, O. O., O. Borgan, and H. K. Gjessing. 2009. Survival and Event History Analysis. New York: Springer Verlag. doi:10.1007/978-0-387-68560-1

Amemiya, T. 1974. “The Nonlinear Two-Stage Least-Squares Estimator.” Journal of Econometrics 2: 105–110. doi:10.1016/0304-4076(74)90033-5

Amemiya, T. 1985. “Instrumental Variable Estimation for the Nonlinear Errors-in-Variables Model.” Journal of Econometrics 28: 273–289. doi:10.1016/0304-4076(85)90001-6

Andersen, P. K., O. Borgan, R. D. Gill, and N. Keiding. 1993. Statistical Models Based on Counting Processes. New York: Springer Verlag. doi:10.1007/978-1-4612-4348-9

Baker, M., and A. Melino. 2000. “Duration Dependence and Nonparametric Heterogeneity: A Monte Carlo Study.” Journal of Econometrics 96: 357–393. doi:10.1016/S0304-4076(99)00064-0

Bearse, P., J. Canals-Cerda, and P. Rilstone. 2007. “Efficient Semiparametric Estimation of Duration Models with Unobserved Heterogeneity.” Econometric Theory 23: 281–308. doi:10.1017/S0266466607070120

Bijwaard, G. E. 2009. “Instrumental Variable Estimation for Duration Data.” In Causal Analysis in Population Studies: Concepts, Methods, Applications, edited by H. Engelhardt, H.-P. Kohler, and A. Fürnkranz-Prskawetz, 111–148. New York: Springer Verlag. doi:10.1007/978-1-4020-9967-0_6

Bijwaard, G. E. 2010. “Immigrant Migration Dynamics Model for The Netherlands.” Journal of Population Economics 23: 1213–1247. doi:10.1007/s00148-008-0228-1

Bijwaard, G. E., and G. Ridder. 2005. “Correcting for Selective Compliance in a Re–employment Bonus Experiment.” Journal of Econometrics 125: 77–111. doi:10.1016/j.jeconom.2004.04.004

Bijwaard, G. E., C. Schluter, and J. Wahba. 2013. “The Impact of Labour Market Dynamics on the Return–Migration of Immigrants.” Review of Economics & Statistics, forthcoming. doi:10.1162/REST_a_00389

Chen, S. 2002. “Rank Estimation of Transformation Models.” Econometrica 70: 1683–1697. doi:10.1111/1468-0262.00347

Chiappori, P. A., and B. Salanie. 2000. “Testing for Asymmetric Information in Insurance Markets.” Journal of Political Economy 108: 56–78. doi:10.1086/262111

Cox, D. R., and D. Oakes. 1984. Analysis of Survival Data. London: Chapman and Hall.

Elbers, C., and G. Ridder. 1982. “True and Spurious Duration Dependence: The Identifiability of the Proportional Hazard Model.” Review of Economic Studies 49: 403–410. doi:10.2307/2297364

Feller, W. 1971. An Introduction to Probability Theory and its Applications. 3rd ed. John Wiley and Sons.

Hahn, J. 1994. “The Efficiency Bound of the Mixed Proportional Hazard Model.” Review of Economic Studies 61: 607–629. doi:10.2307/2297911

Han, A. K. 1987. “Non–parametric Analysis of a Generalized Regression Model: The Maximum Rank Correlation Estimator.” Journal of Econometrics 35: 303–316. doi:10.1016/0304-4076(87)90030-3

Hausman, J. A., and T. Woutersen. 2005. “Estimating a Semi–Parametric Duration Model without Specifying Heterogeneity.” CeMMAP, working paper, CWP11/05.

Heckman, J. J. 1991. “Identifying the Hand of the Past: Distinguishing State Dependence from Heterogeneity.” American Economic Review 81: 75–79.

Heckman, J. J., and B. Singer. 1984a. “Econometric Duration Analysis.” Journal of Econometrics 24: 63–132. doi:10.1016/0304-4076(84)90075-7

Heckman, J. J., and B. Singer. 1984b. “A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data.” Econometrica 52: 271–320. doi:10.2307/1911491

Honoré, B. E. 1990. “Simple Estimation of a Duration Model with Unobserved Heterogeneity.” Econometrica 58: 453–473. doi:10.2307/2938211

Horowitz, J. L. 1996. “Semiparametric Estimation of a Regression Model with an Unknown Transformation of the Dependent Variable.” Econometrica 64: 103–137. doi:10.2307/2171926

Horowitz, J. L. 1999. “Semiparametric Estimation of a Proportional Hazard Model with Unobserved Heterogeneity.” Econometrica 67: 1001–1018. doi:10.1111/1468-0262.00068

Horowitz, J. L. 2001. “The Bootstrap.” In Handbook of Econometrics, Vol. 5, edited by J. J. Heckman and E. Leamer. North-Holland: Amsterdam.

Khan, S. 2001. “Two Stage Rank Estimation of Quantile Index Models.” Journal of Econometrics 100: 319–355. doi:10.1016/S0304-4076(00)00040-3

Khan, S., and E. Tamer. 2007. “Partial Rank Estimation of Duration Models with General forms of Censoring.” Journal of Econometrics 136: 251–280. doi:10.1016/j.jeconom.2006.03.003

Klein, J. P., and M. L. Moeschberger. 1997. Survival Analysis: Techniques for Censored and Truncated Data. New York: Springer Verlag.

Lai, T. L., and Z. Ying. 1991. “Rank Regression Methods for Left–Truncated and Right-Censored Data.” Annals of Statistics 19: 531–556. doi:10.1214/aos/1176348110

Lancaster, T. 1976. “Redundancy, Unemployment and Manpower Policy: A Comment.” Economic Journal 86: 335–338. doi:10.2307/2230754

Lancaster, T. 1979. “Econometric Methods for the Duration of Unemployment.” Econometrica 47: 939–956. doi:10.2307/1914140

Lin, D. Y., and Z. Ying. 1995. “Semiparametric Inference for the Accelerated Life Model with Time-Dependent Covariates.” Journal of Statistical Planning and Inference 44: 47–63. doi:10.1016/0378-3758(94)00039-X

Lindsay, B. G. 1983. “The Geometry of Mixture Likelihoods: A General Theory.” Annals of Statistics 11: 86–94. doi:10.1214/aos/1176346059

Manton, K. G., E. Stallard, and J. W. Vaupel. 1981. “Methods for the Mortality Experience of Heterogeneous Populations.” Demography 18: 389–410. doi:10.2307/2061005

Meyer, P. 1963. “Decomposition of Supermartingales: The Uniqueness Theorem.” Illinois Journal of Mathematics 7: 1–17. doi:10.1215/ijm/1255637477

Newey, W. K., and D. McFadden. 1994. “Large Sample Estimation and Hypothesis Testing.” In Handbook of Econometrics, Vol. 4, edited by R. F. Engle and D. McFadden. North-Holland: Amsterdam. doi:10.1016/S1573-4412(05)80005-4

Powell, M. J. D. 1964. “An Efficient Method for Finding the Minimum of a Function of Several Variables without Calculating Derivatives.” The Computer Journal 7: 155–162. doi:10.1093/comjnl/7.2.155

Prentice, R. L. 1978. “Linear Rank Tests with Right Censored Data.” Biometrika 65: 167–179. doi:10.1093/biomet/65.1.167

Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. 1986. Numerical Recipes: The Art of Scientific Computing. Cambridge: Cambridge University Press. doi:10.1016/S0003-2670(00)82860-3

Protter, P. 2005. Stochastic Integration and Differential Equations. New York: Springer Verlag, 107–113.

Ramlau-Hansen, H. 1983. “Smoothing Counting Process Intensities by Means of Kernel Functions.” Annals of Statistics 11: 453–466. doi:10.1214/aos/1176346152

Ridder, G., and T. Woutersen. 2003. “The Singularity of the Efficiency Bound of the Mixed Proportional Hazard Model.” Econometrica 71: 1579–1589. doi:10.1111/1468-0262.00460

Robins, J. M., and A. A. Tsiatis. 1992. “Semiparametric Estimation of an Accelerated Failure Time Model with Time-Dependent Covariates.” Biometrika 79: 311–319.

Sherman, R. P. 1993. “The Limiting Distribution of the Maximum Rank Correlation Estimator.” Econometrica 61: 123–137. doi:10.2307/2951780

Therneau, T., and P. Grambsch. 2000. Modeling Survival Data: Extending the Cox Model. New York: Springer Verlag. doi:10.1007/978-1-4757-3294-8

Tsiatis, A. A. 1990. “Estimating Regression Parameters using Linear Rank Tests for Censored Data.” Annals of Statistics 18: 354–372. doi:10.1214/aos/1176347504

van der Vaart, A. W. 1998. Asymptotic Statistics. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802256

Wooldridge, J. M. 2005. “Unobserved Heterogeneity and Estimation of Average Partial Effects.” In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, edited by D. W. K. Andrews and J. H. Stock, 27–55. Cambridge University Press. doi:10.1017/CBO9780511614491.004

Woutersen, T. 2000. Consistent Estimators for Panel Duration Data with Endogenous Censoring and Endogenous Regressors. Dissertation Brown University.

Published Online: 2013-05-28
Published in Print: 2013-07-01

©2013 by Walter de Gruyter Berlin Boston