Dependence modeling in stochastic frontier analysis

This review covers several of the core methodological and empirical developments surrounding stochastic frontier models that incorporate various new forms of dependence. Such models apply naturally to panels where cross-sectional observations on firm productivity correlate over time, but also in situations where various components of the error structure correlate with each other and with input variables. Ignoring such dependence patterns is known to lead to severe biases in the estimates of production functions and to incorrect inference.


Introduction
The roots of stochastic frontier analysis (SFA) can be traced back to the origins of classical growth accounting [82] and, perhaps, production planning [43]. These fields deal with production relationships, which are usually modeled through production functions or, more generally, transformation functions (e.g., Shephard's distance functions, directional distance functions, cost functions, etc.). In the classical growth accounting approach, all variation in growth, apart from the variation of inputs, is attributed to the so-called Solow's residual, which under certain restrictions measures what is referred to as the change in total factor productivity (TFP).
Early models assumed that all the decision making units (DMUs) represented in the data as observations (e.g., firms, countries, etc.) were independent of one another and fully efficient. This ignored the numerous inefficiencies that arise in practice, which have arbitrary dependence and can arise due to such factors as simultaneity in DMU decisions, unobserved heterogeneity, common sources of information asymmetry and other market imperfections [84], managerial practices [18], cultural beliefs [34], traditions, expectations, and other unobserved factors inducing unaccounted dependence in models of production [16].
A key feature of several recent developments in SFA is the construction of a statistical model with as few restrictions on its dependence properties as possible. This implicitly recognizes that various forms of dependence are empirical questions that can and should be statistically tested against the data. Modern implementations of SFA provide a framework where shortfalls from the production potential are decomposed into two terms, statistical noise and inefficiency, both of which are unobserved by a researcher but can be estimated for the sample as a whole (e.g., representing an industry) or for each individual DMU under a variety of possible dependence scenarios.
The extensions of SFA that allow for forms of dependence which until recently have been ignored or modeled in overly restrictive ways are the focus of this review. To a large extent, they are triggered by the fact that restricting the nature of dependence of the composed error can lead to severe biases in estimators and incorrect inference. For example, within the confines of the traditional SFA approach one can test for the presence of either inefficiency or noise [25,73]. Thus, the model encompasses the classical approach with a naive assumption of full efficiency (conditional mean) and the deterministic production frontier as special cases. However, such estimators and tests themselves have been derived under the assumption of independence between inefficiency and noise; empirical results suggest that allowing for dependence changes the estimates and tests significantly, potentially distorting the conclusions of the classical models.
Thus, SFA under dependence is a natural relaxation of the extreme assumptions of full efficiency and independence, yet it also encompasses them as special cases, which can still be followed if the data and the statistical tests do not recommend otherwise. If there is evidence in favor of the full efficiency hypothesis after allowing for dependence, one can proceed with regression techniques or growth accounting, but this inference would now be robust to these assumptions.
Accounting for dependence within a production model could be critical for both quantitative and qualitative conclusions and, perhaps more importantly, for the resulting policy implications. For example, El Mehdi and Hafner [28] found that estimated technical efficiency scores across the financing of Moroccan rural districts allowing for dependence tend to be lower than under the assumption of independence but the rankings remained basically the same. Thus, a key difference emerges if one is looking to identify the best versus measure how much improvement can be made.
While some of the methods and models we present here can also be found in previous reviews, e.g., [67] and [9], and it is impossible to give a good review without following them to some degree, here we also summarize many of (what we believe to be) key recent developments as well as (with their help) shed some novel perspectives onto the workhorse methods. We do not claim, however, that this survey comprehensively covers all of the relevant recent developments in modelling dependence in SFA. Many other important references can be found elsewhere.
The rest of the article is structured as follows. Section 2 introduces the classical cross-sectional stochastic frontier models (SFMs) and focuses on dependence between error components in such models. Section 3 considers dependence via sample selection. Section 4 surveys dependence models used in panels. Section 5 discusses dependence that underlies endogeneity in SFM, which is a situation when there is dependence between production inputs and error terms. Section 6 discusses how dependence can help obtain more precise estimates of inefficiency. Section 7 concludes.

The benchmark SFM and dependence within the composed error
In cross-sectional settings, one of the main approaches to study productivity and efficiency of firms is the SFM, independently proposed by Aigner et al. [3] and Meeusen and van den Broeck [61].¹ Using conventional notation, let $Y_i$ be the single output for observation (e.g., firm) $i$ and let $y_i = \ln(Y_i)$. The cross-sectional SFM can be written for a production frontier as

$$y_i = m(x_i; \beta) + \varepsilon_i, \qquad \varepsilon_i = v_i - u_i. \quad (2.1)$$
¹ [15] and [62], while appearing in the same year, are applications of the methods.
Here $m(x_i; \beta)$ represents the production frontier of a firm (or more generally a DMU), with given input vector $x_i$. Observations indexed by $i = 1, \ldots, N$ are assumed to be independent and identically distributed. Our use of $\beta$ is to clearly signify that we are parametrically specifying our production function, most commonly as a Cobb-Douglas or translog (see, e.g., [67] or [68] for a detailed treatment of nonparametric estimation of the SFM).

Canonical independence framework
The main difference between a standard production function setup and the SFM is the presence of two distinct error terms in the model. The $u_i$ term captures inefficiency, shortfall from maximal output dictated by the production technology, while the $v_i$ term captures stochastic shocks. The standard neoclassical production function model assumes full efficiency, so SFA embraces it as a special case, when $u_i = 0$ for all $i$, and allows the researcher to test this statistically.² It is commonly assumed that inputs are exogenous, in the sense that $x$ is independent of $(u, v)$, and that the two components of the error term are independent, $u \perp v$. Many estimation methods require distributional assumptions for both $u$ and $v$ (beyond the assumption of independence). For an assumed distributional pair, one can obtain the implied distribution for $\varepsilon_i$ and then estimate all of the parameters of the SFM with the maximum likelihood estimator (MLE). The most common assumption is that $v_i$ is Normal with variance $\sigma_v^2$ while $u_i$ is Half Normal with scale parameter $\sigma_u$ (or, alternatively, Exponential with parameter $\sigma_u$). The most popular case for the density of the composed error $\varepsilon$ is obtained for the Normal Half Normal specification under independence $u \perp v$. According to Aigner et al. [3], the distribution function of a sum of a Normal and a Truncated Normal was first derived by Weinstein [94]. Let $f_v$ and $f_u$ denote the density of $v$ and $u$, respectively, where $\phi(\cdot)$ is the standard Normal probability density function (pdf). For the Normal Half Normal case, the closed form expression for the pdf can be obtained by convolution as follows:

$$f_\varepsilon(\varepsilon) = \frac{2}{\sigma}\,\phi\!\left(\frac{\varepsilon}{\sigma}\right)\Phi\!\left(-\frac{\lambda\varepsilon}{\sigma}\right), \quad (2.2)$$

where $\Phi(\cdot)$ is the standard Normal cumulative distribution function (cdf), with the parameterization $\sigma^2 = \sigma_u^2 + \sigma_v^2$ and $\lambda = \sigma_u/\sigma_v$; $\lambda$ is commonly interpreted as the proportion of variation in $\varepsilon$ due to inefficiency. The density of $\varepsilon$ in (2.2) can be characterized as that of a Skew Normal random variable with location parameter 0, scale parameter $\sigma$, and skew parameter $-\lambda$.³ This connection has only recently appeared in the efficiency and productivity literature [24].
It is worth noting that the closed form expression in (2.2) is equivalent to

$$f_\varepsilon(\varepsilon) = E_u[f_v(\varepsilon + u)] = E_v[f_u(v - \varepsilon)],$$

where expectations are taken with respect to the relevant distribution. This suggests an alternative, simulation-based, way to construct the density by sampling from the distribution of $u$ or $v$ and evaluating the corresponding sample averages. Among the two sampling options (from the distribution of $u$ or from the distribution of $v$), sampling the $u$'s is more practical as it avoids the need to ensure that $v - \varepsilon > 0$. Sampling $u$ can be easily done by sampling from the standard Normal distribution and taking the absolute values, $\sigma_u |N(0, 1)|$ (in the case of the Half Normal distribution).⁴ Our mathematical formulation will focus on a production frontier as it is the most popular object of study. The framework for dual characterizations (e.g., cost, revenue, profit) or other frontiers is similar and follows with only minor changes in notation. For example, the cost function formulation is obtained by changing the sign in front of $u$ to a "+," which will represent excess, rather than shortfall, of cost above the minimum level.
² Prior to the development of the SFM, approaches which intended to model inefficiency typically ignored $v_i$, leading to estimators of the SFM with less desirable statistical properties: see [1,2,27,71,72,86].
³ The pdf of a Skew Normal random variable $x$ with location $\xi$, scale $\omega$, and skew parameter $\alpha$ is $\frac{2}{\omega}\,\phi\!\left(\frac{x-\xi}{\omega}\right)\Phi\!\left(\alpha\,\frac{x-\xi}{\omega}\right)$. The distribution is right skewed if $\alpha > 0$ and is left skewed if $\alpha < 0$. We can also place the Normal, Truncated Normal pair of distributional assumptions in this class. See [10] and [65] for more details.
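The simulation-based construction can be sketched as follows; this is a minimal Python illustration of averaging $f_v(\varepsilon + u)$ over Half Normal draws of $u$ (function names are ours, not from the literature), compared against the closed form (2.2):

```python
import numpy as np
from scipy.stats import norm

def f_eps_simulated(eps, sigma_v, sigma_u, n_draws=10_000, seed=0):
    """Approximate f_eps(e) = E_u[f_v(e + u)] by averaging the Normal pdf of v
    over Half Normal draws of u (sampling the u's, as discussed above)."""
    rng = np.random.default_rng(seed)
    u = sigma_u * np.abs(rng.standard_normal(n_draws))  # Half Normal draws
    return np.mean(norm.pdf(eps + u, scale=sigma_v))

def f_eps_closed_form(eps, sigma_v, sigma_u):
    """Closed-form Normal Half Normal density (2/sigma) phi(e/sigma) Phi(-lambda e/sigma)."""
    sigma = np.hypot(sigma_v, sigma_u)
    lam = sigma_u / sigma_v
    return (2.0 / sigma) * norm.pdf(eps / sigma) * norm.cdf(-lam * eps / sigma)
```

With enough draws, the two evaluations agree closely at any point $\varepsilon$, which is the basis of simulated likelihood methods for cases where no closed form exists.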

Modeling dependence
Smith [81] relaxed the assumption of independence between $u$ and $v$ by introducing a copula function to model their joint distribution. This is one of the first relaxations of the independence assumption available for SFA and it allowed testing the adequacy of this assumption. If the marginal distributions of $u$ and $v$ are linked by a copula density $c(\cdot, \cdot)$, then their joint density can be expressed as follows:

$$f_{u,v}(u, v) = f_u(u)\, f_v(v)\, c(F_u(u), F_v(v)),$$

where $F_u$ and $F_v$ denote the respective cdfs. It then follows by a similar construction to (2.2) that the density of $\varepsilon$ can be written as

$$f_\varepsilon(\varepsilon) = \int_0^\infty f_u(u)\, f_v(\varepsilon + u)\, c(F_u(u), F_v(\varepsilon + u))\, du. \quad (2.3)$$

For commonly used copula families, this density does not have a closed form expression similar to (2.2), even in the Normal Half Normal case, so a simulation-based approach would often need to be used, where we simulate many draws of $u$ and evaluate the sample analogue of the following expectation with respect to the distribution of $u$:

$$f_\varepsilon(\varepsilon) = E_u[f_v(\varepsilon + u)\, c(F_u(u), F_v(\varepsilon + u))].$$

Smith [81] found that ignoring the dependence can lead to biased estimates and discussed how one can test whether the independence assumption nested within this model is adequate. It is easy to see that the model in (2.2) is a special case of (2.3) when $c(\cdot, \cdot)$ is the independence (or product) copula. From $f_\varepsilon$, along with the assumption of independence over $i$, the log-likelihood function can be written as follows:

$$\ln L = \sum_{i=1}^N \ln f_\varepsilon(y_i - m(x_i; \beta)).$$

The SFM can be estimated using the traditional MLE, if an analytic expression for the integrals is available, or the maximum simulated likelihood estimator (MSLE), if we need to use a simulation-based approach to evaluate the integrals of the density [60].⁵ The benefit of MLE/MSLE is that under the assumption of correct distributional specification of $\varepsilon$, the MLE is asymptotically efficient (i.e., consistent, asymptotically Normal, and its asymptotic variance reaches the Cramer-Rao lower bound).
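As a hedged illustration of the simulation step, here is a sketch of the copula-based density under Normal Half Normal marginals; the Clayton copula and the function names are our own choices for the example, not Smith's [81] specification:

```python
import numpy as np
from scipy.stats import norm

def clayton_density(a, b, theta):
    """Clayton copula density c(a, b; theta), theta > 0."""
    return ((1 + theta) * (a * b) ** (-1 - theta)
            * (a ** -theta + b ** -theta - 1) ** (-2 - 1 / theta))

def f_eps_copula(eps, sigma_v, sigma_u, theta, n_draws=10_000, seed=0):
    """Simulated composed-error density under Clayton dependence between u and v:
    f_eps(e) = E_u[ f_v(e + u) * c(F_u(u), F_v(e + u)) ]."""
    rng = np.random.default_rng(seed)
    u = sigma_u * np.abs(rng.standard_normal(n_draws))  # Half Normal draws
    Fu = 2 * norm.cdf(u / sigma_u) - 1                  # Half Normal cdf at u
    Fv = norm.cdf(eps + u, scale=sigma_v)               # Normal cdf at v = e + u
    fv = norm.pdf(eps + u, scale=sigma_v)
    return np.mean(fv * clayton_density(Fu, Fv, theta))
```

Reusing the same draws of $u$ across evaluation points keeps the simulated density smooth in the parameters, which matters when the MSLE objective is optimized numerically.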
⁴ With modern statistical software it is straightforward to sample from a wide swath of one-sided distributions that have been suggested in the SFA literature: Exponential, Gamma, Truncated Normal, Weibull, Beta, Uniform, Binomial, Generalized Exponential, etc.
⁵ Burns [21], who was Smith's student, proposed using MSLE and, while predating Smith [81], actually attributes the idea to a working paper version of that article.
A further benefit is that a range of testing options are available. For instance, tests related to β can easily be undertaken using any of the classic trilogy of tests: Wald, Lagrange multiplier, or likelihood ratio. The ability to readily and directly conduct asymptotic inference is one of the major benefits of SFA over data envelopment analysis (DEA).⁶ Two main issues that practitioners face when confronting dependence are the choice of copula model and the assumed error distributions that best fit the data. As Wiboonpongse et al. ( [95], p. 34) note "The impact of the independence assumption on technical efficiency estimation has long remained an open issue." Analytical criteria such as AIC or BIC can be used for these purposes, see both [9] and [95] for detailed reviews.
More specifically, Wiboonpongse et al. [95] use MSLE and systematically consider several copula families including the Student-t, Clayton, Gumbel, and Joe families as well as their relevant rotated versions. Their data is a cross section from coffee production in Thailand and they use AIC and BIC to determine which copula model is most appropriate. Wiboonpongse et al. [95] also assume the marginals of the two error components are Normal and Half Normal, then apply a range of copulas to inject dependence. In their empirical application they have a total of 111 observations. The Clayton copula is found to fit best, and plots of technical efficiencies across the 111 farmers show that the best fitting copula model yields nearly uniformly lower TE scores than under independence (though not much different). Finally, it also appears that the ranks are preserved (see their Figure 3).
An unintended benefit of modelling dependence is that it may alleviate the "wrong skewness" issue that is common in the canonical Normal Half Normal SFM [77,89]. The wrong skewness issue arises when the OLS residuals display skewness that is of the wrong sign compared to that stemming from the frontier model (so positive when estimating a production frontier). For specific distributional pairs the model cannot separately identify the variance parameters for both $v$ and $u$. It was noted in [19] that the third central moment of the composed error is

$$E[(\varepsilon - E\varepsilon)^3] = E[(v - Ev)^3] - E[(u - Eu)^3] - 3E[(v - Ev)^2(u - Eu)] + 3E[(v - Ev)(u - Eu)^2].$$

It is clear that the skewness of $\varepsilon$ only depends on the skewness of $u$ when $v$ is assumed to be symmetric and independent of $u$. Once $u$ and $v$ are allowed to be dependent and/or $v$ is allowed to be asymmetric, then the skewness of the composed error does not have to align with the skewness of inefficiency. Thus, modelling dependence is one way in which some of the empirical vagaries of the SFM can be overcome [93].
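The third-central-moment decomposition of $\varepsilon = v - u$ is an algebraic identity, so it can be checked numerically; the following small sketch (our own construction, not from [19]) induces dependence between $u$ and $v$ through correlated normals and verifies the identity on sample moments:

```python
import numpy as np

def third_moment_decomposition(rho=0.7, n=100_000, seed=42):
    """Simulate dependent (v, u): correlated standard normals, with u taken as
    the absolute value of one of them (a simple way to get a Half Normal
    marginal that is dependent on v). Return the third central moment of
    eps = v - u and its four-term decomposition."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    v, u = z[:, 0], np.abs(z[:, 1])
    a, b = v - v.mean(), u - u.mean()           # demeaned components
    m3_eps = np.mean((a - b) ** 3)              # third central moment of eps
    decomp = (np.mean(a ** 3) - np.mean(b ** 3)
              - 3 * np.mean(a ** 2 * b) + 3 * np.mean(a * b ** 2))
    return m3_eps, decomp
```

Under independence ($\rho = 0$) the cross terms vanish in expectation and the skewness of $\varepsilon$ is driven by $-E[(u - Eu)^3] < 0$; with dependence the cross terms can push the sample skewness in either direction.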

Asymmetric dependence
A common feature of all of the papers that have allowed dependence in the SFM is the use of copulas that introduce symmetric dependence. Symmetric dependence means that the noise component $v$ and the inefficiency component $u$ are treated equally in the SFM. However, a recent suggestion by Wei et al. [91] offers a set of copulas that allow for asymmetric dependence. As Wei et al. ( [91], p. 57) note "[...]in practical situations, the inefficiency component u and the error component often play different roles in global inefficiency, and in such cases, the symmetric copulas are not suitable." They define asymmetric copulas as those that have non-exchangeability and/or radial asymmetry properties [92].
Wei et al. [91] introduced the Skew Normal copula and used it to construct their SFM with dependence. An interesting feature of their general setup is that they allow both $v$ and $u$ to be asymmetric along with an asymmetric copula (see their Proposition 3.1). As in [95], Wei et al. [91] recommended selecting the copula model based on AIC/BIC. In their empirical application 31 out of 108 farms have the same efficiency rank (the bottom 5 are in complete agreement, as are 4 of the top 5) across the standard SFM and the asymmetric copula SFM. The point estimates of technical efficiency, however, show large differences between the two competing models, which again provides evidence that ignoring dependence can have an undue influence on the point estimates of technical inefficiency.

Dependence via sample selection
Another way in which dependence can arise is through sample selection. By itself sample selection has only recently become a serious area of focus in the stochastic frontier literature. Several early approaches to deal with potential sample selection follow the two-step correction [37]. In the first stage, the probability of selection is estimated and the inverse Mills ratio is calculated for each observation. This estimated inverse Mills ratio is then included as a regressor in the final SFM. An example of this is the Finnish farming study [80]. This limited information two-step approach works in a standard linear regression framework because of linearity, as Greene [33] makes clear. However, as shown in [51], when inefficiency is present no two-step approach will work and full information maximum likelihood estimation is required.
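The first stage of such a two-step correction can be sketched as follows; this is a generic probit-plus-inverse-Mills-ratio illustration under our own simulated-data assumptions, not the exact specification of [80]:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def probit_fit(d, w):
    """First stage: probit MLE of the selection indicator d on covariates w."""
    def negll(g):
        p = np.clip(norm.cdf(w @ g), 1e-10, 1 - 1e-10)
        return -np.sum(d * np.log(p) + (1 - d) * np.log(1 - p))
    return minimize(negll, np.zeros(w.shape[1]), method="BFGS").x

def inverse_mills(index):
    """Inverse Mills ratio phi(z)/Phi(z) at the estimated selection index."""
    return norm.pdf(index) / norm.cdf(index)
```

In the linear-regression version of the correction, `inverse_mills(w @ g_hat)` would then be appended as a regressor for the selected observations; as the text notes, this shortcut is not valid once the one-sided inefficiency term is present.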
Recognizing the limitations of direct application of the two-stage approach, both Kumbhakar et al. [51] and Greene [33] proposed alternative stochastic frontier selection models. The two approaches differ in how selection arises in the model. The Greene [33] model allows the choice of technology to be influenced by correlation between random error in the selection and frontier models, whereas Kumbhakar et al. [51] constructed a model where the choice of technology is based on some aspect of inefficiency, inducing a different form of sample selection. Beyond the difference in how selection arises, the sample selection stochastic production frontier models [51] and [33] are identical.
Sriboonshitta et al. [83] were the first to recognize that dependence could enter the sample selection model. They work with the Greene [33] stochastic frontier sample selection model and admit dependence into the composite error term. This is termed a double-copula because they have a copula in the sample selection equation and a copula in the SFM. See ( [83], equation (20)) for the likelihood function.
Beyond a small set of simulations, Sriboonshitta et al. [83] applied the double copula sample selection SFM to 200 rice farmers from Kamphaeng Phet province, Thailand, in 2012 using a Cobb-Douglas production frontier and considered eight different copula functions (see their Table 4). Their preferred model based on the AIC is a Gaussian copula with 270-degree rotated Clayton model. They find a substantial difference in estimated TE scores between the Greene [33] model which assumes independence and their double-copula model (see their Figure 5). As Sriboonshitta et al. ( [83], p. 183) note "[...]improperly assuming independence between the two components of the error term in the SFM may result in biased estimates of technical efficiency scores, hence potentially leading to wrong conclusions and recommendations." As a further extension of [83], Liu et al. [59] noted that "this double-copula model neglects the correlation between the unobservables in the selection model and the random error in the SFM, in contrast to Greene's model." Liu et al. [59] generalized the Greene [33] model by modeling the dependence between the unobservables in the selection equation and the two error terms in the production equation using a trivariate Gaussian copula. The key feature is that the trivariate and double copula models rely on different assumptions concerning the joint distribution of v, u, and ξ (ξ here is the error in the selection equation). Liu et al. [59] made note of the decomposition

$$f(\xi, u, v) = f(u, v)\, f(\xi \mid u, v), \quad (3.1)$$

and note that the double copula model assumes that $f(\xi \mid u, v) = f(\xi \mid v - u)$, i.e., the distribution of $\xi$ only depends on the composite error, not the individual pieces. This also implies that the double copula model and the trivariate copula model are nonnested.
Liu et al. [59] provided an application that focuses on Jasmine/non-Jasmine rice farming in Thailand. The data suggest u is Gamma distributed for the most preferred model. As with some of the earlier papers, Liu

Dependence in panel SFM
When repeated observations of the firms are available, then we can allow for richer models that incorporate unobserved components and various other dependence structures. Most importantly, we can extract information about likely time trends in inefficiency and time constant firm-specific characteristics. Pitt and Lee [69] seem to be the first to extend the cross-sectional SFM to a panel structure, and Schmidt and Sickles [76] were the first to propose a panel-specific methodology for SFA.

A benchmark specification
The benchmark panel SFM can be written as follows:

$$y_{it} = m(x_{it}; \beta) + \alpha_i + v_{it} - u_{it}, \qquad \alpha_i = c_i - \eta_i. \quad (4.1)$$
This model differs from (2.1) in many ways. All observed variables and error terms inherited from (2.1) now have a double-index for both firms, $i$, and time, $t = 1, \ldots, T$. In addition, the model contains the so-called firm-specific heterogeneity, $c_i$, and the time-invariant component of inefficiency, $\eta_i$. Compared with (2.1), $c_i$ encapsulates any unobserved factors that affect output (other than inputs) without changing over time, such as unmeasured management or operational specifics of the firm. If such factors are present, dependence between $c_i$ and $x_{it}$ causes omitted variable bias and invalidates inference based on the cross-sectional SFM. In panel models, when ignored, such factors can serve as common sources of dependence in the error term $\varepsilon_{it}$, which also leads to invalid inference.
Another distinguishing feature of (4.1) is the presence of $\eta_i$, a component of inefficiency which is time-invariant. This means that inefficiency is composed of both time-invariant and time-varying components, which are sometimes interpreted as long-run and short-run inefficiency. Since both $c_i$ and $\eta_i$ are unobserved, it will generally be difficult to decompose $\alpha_i$ into its constituent firm-specific heterogeneity and time-invariant inefficiency components.
Classical panel methods (i.e., methods that assume that $\eta_i$ and $u_{it}$ do not exist) allow for various forms of dependence between $c_i$ and $x_{it}$. For example, estimation under the fixed effects (FE) framework allows $x_{it}$ to be correlated with $c_i$ and uses a within transformation to obtain a consistent estimator of $\beta$. Alternatively, estimation in the random effects (RE) framework assumes that $x_{it}$ and $c_i$ are independent and uses OLS or GLS. The difference between OLS and GLS arises due to the fact that the variance-covariance matrix of the composed error term $c_i + v_{it}$ is no longer diagonal, and so, feasible GLS is asymptotically efficient.
The early work on panel SFM assumed inefficiency to be time-invariant. This allowed handling dependence within panels using classical panel methods such as FE and RE estimation [76]. The standard time-invariant SFM is

$$y_{it} = m(x_{it}; \beta) + v_{it} - \eta_i,$$

where, under the FE framework, the firm-specific intercept absorbing $\eta_i$ is allowed to have arbitrary dependence with $x_{it}$. In cases in which there are time-invariant variables of interest in the production model, one can use the RE framework, which also requires no distributional assumptions on $v$ and $\eta$ and can be estimated with OLS or GLS. Alternatively, in such cases, one can rely on distributional assumptions as in [69], where $v_{it}$ is assumed to follow a Normal distribution and $\eta_i$ a Half Normal. Table 1 contains a summary of the classical SFMs allowing for specific forms of serial dependence in $u_{it}$. It also lists any additional dependence structures permitted in these different models such as dependence between $c_i$ and $x_{it}$. See [67] and [50] for a detailed discussion of these methods.
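FE estimation with time-invariant inefficiency, in the spirit of Schmidt and Sickles [76], can be sketched as follows; this is a simplified linear specification with our own function names, where the most efficient firm in the sample serves as the benchmark:

```python
import numpy as np

def schmidt_sickles_fe(y, x, firm_ids):
    """Within (FE) estimation of y_it = alpha_i + x_it'beta + v_it, followed by
    the normalization u_hat_i = max_j alpha_hat_j - alpha_hat_i."""
    firms = np.unique(firm_ids)
    y_dm, x_dm = y.astype(float).copy(), x.astype(float).copy()
    for f in firms:                      # within transformation: remove firm means
        m = firm_ids == f
        y_dm[m] -= y[m].mean()
        x_dm[m] -= x[m].mean(axis=0)
    beta = np.linalg.lstsq(x_dm, y_dm, rcond=None)[0]
    alpha = np.array([(y[firm_ids == f] - x[firm_ids == f] @ beta).mean()
                      for f in firms])   # recovered firm effects
    u_hat = alpha.max() - alpha          # relative (time-invariant) inefficiency
    return beta, dict(zip(firms, u_hat))
```

Because the within transformation sweeps out anything time-invariant, no distributional assumptions on $\eta_i$ are needed, but inefficiency is only identified relative to the best firm in the sample.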

Quasi MLE
If $\alpha_i$ is assumed to be part of $u_{it}$ or $v_{it}$, which are independent of $x$, then we can view the panel model as a special case of the cross-sectional model (2.1), only with the double-index $it$. The MLE method described in Section 2 applies in this case but it uses the sample likelihood obtained leveraging the assumption of independence over both $i$ and $t$, not just $i$. Because independence over $t$ is questionable in panels, this version of MLE is often referred to as quasi-MLE (QMLE). Let $\theta$ collect all the parameters of the model and let $f_{it}$ denote the density of the composed error term evaluated at $\varepsilon_{it} = v_{it} - u_{it}$. Then the QMLE of $\theta$ can be written as

$$\hat{\theta}_{QMLE} = \arg\max_\theta \sum_{i=1}^N \sum_{t=1}^T \ln f_{it}(\theta).$$

The QMLE is known to be consistent even if there is no independence over $t$, but to obtain the correct standard errors one needs to use the so-called "sandwich," or misspecification-robust, estimator of the QMLE asymptotic variance matrix. QMLE is known to be dominated in terms of precision by several other estimators that use the dependence information explicitly (and correctly). However, an appeal of the QMLE in this setting is that assuming independence is more innocuous in the sense that it does not lead to estimation bias, only to a lack of precision, when compared to a misspecification of the type of dependence that can lead to distinct biases.
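For a simple linear frontier, the pooled QMLE objective can be sketched as follows (our own minimal implementation; variance parameters are estimated in logs to keep them positive, and the sandwich standard-error step is omitted):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def nhn_logpdf(eps, sigma_v, sigma_u):
    """Normal Half Normal composed-error log density (the Skew Normal form)."""
    sigma = np.hypot(sigma_v, sigma_u)
    lam = sigma_u / sigma_v
    return (np.log(2.0 / sigma) + norm.logpdf(eps / sigma)
            + norm.logcdf(-lam * eps / sigma))

def qmle(y, x):
    """Pooled QMLE: treat panel observations as independent over both i and t."""
    k = x.shape[1]
    def negll(theta):
        beta, lsv, lsu = theta[:k], theta[k], theta[k + 1]
        eps = y - x @ beta
        return -np.sum(nhn_logpdf(eps, np.exp(lsv), np.exp(lsu)))
    res = minimize(negll, np.zeros(k + 2), method="BFGS")
    return res.x[:k], np.exp(res.x[k]), np.exp(res.x[k + 1])
```

The point estimates ignore any dependence over $t$; only the reported standard errors need the sandwich adjustment mentioned above.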

GMM and improved QMLE
Amsler et al. [5] proposed several estimators that model time dependence in panels. One such estimator can be obtained in the Generalized Method of Moments (GMM) framework. Let $s_{it}(\theta)$ denote the score of the density function $f_{it}(\theta)$, i.e.,

$$s_{it}(\theta) = \nabla_\theta \ln f_{it}(\theta),$$

where $\nabla_\theta$ denotes the gradient with respect to $\theta$. Then, the QMLE of $\theta$ solves

$$\sum_{i=1}^N \sum_{t=1}^T s_{it}(\hat{\theta}) = 0$$

and is identical to the GMM estimator based on the moment condition

$$E\left[\sum_{t=1}^T s_{it}(\theta)\right] = 0, \quad (4.6)$$

where the expectation is with respect to the distribution of $\varepsilon$. However, under time dependence, the summation (over $t$) of the scores in (4.6) is not the optimal weighting. The theory of optimal GMM suggests using correlation of $s_{it}$ over $t$ by applying the GMM machinery to the $T$ score functions written as follows:

$$E[s_{i1}(\theta)] = 0, \; \ldots, \; E[s_{iT}(\theta)] = 0.$$

The optimal GMM estimator based on these moment conditions has an asymptotic variance no larger than that of any other estimator using these moment conditions. In a classical (non-SFA) panel data setting, Prokhorov and Schmidt [70] call this estimator Improved QMLE (IQMLE).

Using a Copula
Alternative estimators that allow explicit modelling of dependence between cross-sectional errors over $t$ have to construct a joint distribution of those errors. Amsler et al. [5] offered two ways of doing so. One is to apply a copula to form $f_\varepsilon$, the joint (over $t$) density of the composed errors $(\varepsilon_{i1}, \ldots, \varepsilon_{iT})$. The other is to use a copula to form $f_u$, the joint distribution of $(u_{i1}, \ldots, u_{iT})$. Given the Normal Half Normal marginals of the $\varepsilon$'s in (2.2) and a copula density $c(\cdot, \ldots, \cdot)$, the joint density $f_\varepsilon$ can be written as follows:

$$f_\varepsilon(\varepsilon_{i1}, \ldots, \varepsilon_{iT}) = c(F(\varepsilon_{i1}), \ldots, F(\varepsilon_{iT})) \prod_{t=1}^T f(\varepsilon_{it}),$$

where, as before, $f(\varepsilon_{it})$ is the pdf of the composed error term evaluated at $\varepsilon_{it}$ and $F(\varepsilon_{it})$ is the corresponding cdf. Once the joint density is obtained we can construct a log-likelihood and run MLE. If we let the copula density have a scalar parameter $\rho$, then the sample log-likelihood can be written as follows:

$$\ln L = \sum_{i=1}^N \left[ \ln c(F(\varepsilon_{i1}), \ldots, F(\varepsilon_{iT}); \rho) + \sum_{t=1}^T \ln f(\varepsilon_{it}) \right].$$

The first term in the summation is what distinguishes this likelihood from QMLE: an explicit modelling of dependence between the composed errors at different $t$. Again, efficiency improvement is, in some circumstances, possible if we instead use the optimal GMM machinery on the moment conditions implied by the scores of this likelihood. However, this improvement may now come at the price of a bias as the copula-based moment conditions may be misspecified, causing inconsistency of GMM and offsetting any benefit of higher precision. So assuming a wrong kind of time dependence may be worse than assuming independence (over time). Prokhorov and Schmidt [70] explored these circumstances. The alternative copula-based specification is to form the joint distribution of $(u_{i1}, \ldots, u_{iT})$. A challenge of this specification is that a $T$-dimensional integration will be needed to form the likelihood in this case. Let $F_u(\cdot)$ denote the cdf of the Half Normal error term, so that the copula-based joint density of the $u$'s is

$$f_u(u_{i1}, \ldots, u_{iT}) = c(F_u(u_{i1}), \ldots, F_u(u_{iT})) \prod_{t=1}^T f_u(u_{it}). \quad (4.9)$$
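For a concrete copula choice in the first specification, a Gaussian copula with an exchangeable (equicorrelated) correlation matrix gives the following log density; the equicorrelation structure is our simplifying assumption for the sketch:

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_logdensity(U, rho):
    """Log density of an exchangeable Gaussian copula with correlation rho.
    Each row of U holds the T probability integral transforms
    F(eps_i1), ..., F(eps_iT) for one firm i."""
    T = U.shape[1]
    R = np.full((T, T), rho) + (1 - rho) * np.eye(T)   # equicorrelation matrix
    z = norm.ppf(U)                                    # map to Normal scores
    _, logdet = np.linalg.slogdet(R)
    quad = np.einsum("ij,jk,ik->i", z, np.linalg.inv(R) - np.eye(T), z)
    return -0.5 * logdet - 0.5 * quad                  # log c(U; R)
```

Adding this term to the sum of the marginal log densities $\sum_t \ln f(\varepsilon_{it})$ reproduces the copula-based log-likelihood above; at $\rho = 0$ the term vanishes and the objective collapses to the QMLE one.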
To form the sample likelihood we need the joint density of the composed error vector $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})$. Given the density of $u$ and assuming, as before, that $v \perp u$, this density can be obtained as follows:

$$f_\varepsilon(\varepsilon_i) = \int_{\mathbb{R}_+^T} \prod_{t=1}^T f_v(\varepsilon_{it} + u_{it})\, f_u(u_{i1}, \ldots, u_{iT})\, du_{i1} \cdots du_{iT}.$$

Similar to the previous section, this integral has no analytical form. Additionally, this is a $T$-dimensional integral, which is computationally strenuous to evaluate using numerical methods. However, it has the form of an expectation over a distribution we can sample from and this, as before, permits application of MSLE, where we simulate the $u$'s and estimate $f_\varepsilon(\varepsilon_i)$ by averaging $\prod_t f_v(\varepsilon_{it} + u_{it})$ over the draws. To be precise, let $S$ denote the number of simulations and let $u_i^{(s)}$ denote the $s$-th draw from the joint distribution of $u$ constructed in (4.9). The direct simulator of $f_\varepsilon(\varepsilon_i)$ can be written as follows:

$$\hat{f}_\varepsilon(\varepsilon_i) = \frac{1}{S} \sum_{s=1}^S \prod_{t=1}^T f_v(\varepsilon_{it} + u_{it}^{(s)}).$$

Then, a simulated log-likelihood can be obtained as $\sum_{i=1}^N \ln \hat{f}_\varepsilon(\varepsilon_i)$. This method is a multivariate extension of the simulation-based estimation of univariate densities discussed earlier. An important additional requirement is the ability to sample from the copula; see ( [64], Ch. 2) for a discussion of how to sample from the copula to allow dependence. Other than that, similar asymptotic arguments suggest that MSLE is asymptotically equivalent to MLE [29].
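The direct simulator can be sketched as follows; in this illustration (ours, not the source's) dependence among the $u$'s is injected through a Gaussian copula with equicorrelation $\rho$, one of several possible choices:

```python
import numpy as np
from scipy.stats import norm

def joint_feps_msle(eps_i, sigma_v, sigma_u, rho, S=5000, seed=0):
    """Direct simulator of f_eps(eps_i1,...,eps_iT): average over S draws of a
    dependent Half Normal vector u the product of Normal pdfs of v = eps + u."""
    T = len(eps_i)
    rng = np.random.default_rng(seed)
    R = np.full((T, T), rho) + (1 - rho) * np.eye(T)
    z = rng.multivariate_normal(np.zeros(T), R, size=S)  # Gaussian copula draws
    # map uniforms Phi(z) to Half Normal marginals via the inverse cdf
    u = sigma_u * norm.ppf((1 + norm.cdf(z)) / 2)
    dens = norm.pdf(eps_i[None, :] + u, scale=sigma_v).prod(axis=1)
    return dens.mean()
```

For $T = 1$ the simulator collapses to the univariate Monte Carlo estimate of the Normal Half Normal density, which provides a convenient sanity check.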

Dependence due to endogeneity
A common assumption in the SFM is that $x$ is either exogenous or independent of both $u$ and $v$. If either of these conditions is violated, then all of the estimators discussed so far will be biased and most likely inconsistent. Yet, it is not difficult to think of settings where endogeneity is likely to exist. For example, if shocks are observed before inputs are chosen, then producers may respond to good or bad shocks by adjusting inputs, leading to correlation between $x$ and $v$. Alternatively, if managers know they are inefficient, they may use this information to guide their level of inputs, again producing endogeneity. In a regression model, dealing with endogeneity is well understood. However, in the composed error setting, these methods cannot simply be transferred over, but require care in how they are implemented [6].
To incorporate endogeneity into the SFM in (2.1), we set $m(x; \beta) = \beta_0 + x_1'\beta_1 + x_2'\beta_2$, where $x_1$ are our exogenous inputs and $x_2$ are the endogenous inputs, where endogeneity may arise through correlation of $x_2$ with $u$, $v$, or both. To deal with endogeneity we require instruments, $w$, and identification necessitates that the dimension of $w$ is at least as large as the dimension of $x_2$. The natural assumption for valid instrumentation is that $w$ is independent of both $u$ and $v$.
Why worry about endogeneity? Economic endogeneity means that the inputs in question are choice variables, chosen to optimize some objective function such as cost minimization or profit maximization. Statistical endogeneity arises from simultaneity, omitted variables, and measurement errors. For example, if the omitted variable is managerial ability, which is part of inefficiency, inefficiency is likely to be correlated with inputs because managerial ability affects inputs. This is the Mundlak argument for why omitting a management quality variable (for us inefficiency) will cause biased parameter estimates. Endogeneity can also be caused by simultaneity, meaning that two or more variables in the model are jointly determined. In many applied settings, it is not clear what researchers mean when they attempt to handle endogeneity inside the SFM. An excellent introduction into the myriad of influences that endogeneity can have on the estimates stemming from the SFM can be found in [63]. Mutter et al. [63] used simulations designed around data based on the California nursing home industry to understand the impact of endogeneity of nursing home quality on inefficiency measurement.
The simplest approach to accounting for endogeneity is to use a corrected two-stage least squares (C2SLS) approach, similar to the common corrected ordinary least squares (COLS) approach that has been used to estimate the SFM. This method estimates the SFM using standard two-stage least squares (2SLS) with instruments $w$; the slope parameters are estimated consistently, and the intercept is then corrected using higher moments of the 2SLS residuals. This represents a simple avenue to account for endogeneity, and it does not require specifying how endogeneity enters the model, i.e., through correlation with $v$, with $u$, or both. However, as with other corrected procedures based on calculations of the second and third (or even higher) moments of the residuals, from [66] and [89], if the initial 2SLS residuals have positive skew (instead of negative), then $\sigma_u^2$ cannot be identified and its estimator is 0. Furthermore, the standard errors from this approach need to be modified for the estimator of the intercept to account for the step-wise nature of the estimation.
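A C2SLS sketch for the Normal Half Normal case is below; this is our own generic implementation of the moment-based correction (not the exact estimator of any cited paper), using the facts that $E[u] = \sigma_u\sqrt{2/\pi}$ and the third central moment of $\varepsilon$ equals $\sigma_u^3\sqrt{2/\pi}\,(1 - 4/\pi)$ under independence:

```python
import numpy as np

def c2sls(y, X, W):
    """Corrected 2SLS sketch: 2SLS for the slopes, then a COLS-style step that
    recovers sigma_u from the third central moment of the residuals and shifts
    the intercept (assumed to be column 0 of X) upward by E[u]."""
    PX = W @ np.linalg.solve(W.T @ W, W.T @ X)   # first-stage fitted values
    b = np.linalg.solve(PX.T @ X, PX.T @ y)      # 2SLS coefficients
    e = y - X @ b
    m3 = np.mean((e - e.mean()) ** 3)
    if m3 >= 0:          # "wrong skewness": sigma_u^2 not identified, set to 0
        return b, 0.0
    sigma_u = (m3 / (np.sqrt(2 / np.pi) * (1 - 4 / np.pi))) ** (1 / 3)
    b = b.copy()
    b[0] += sigma_u * np.sqrt(2 / np.pi)         # intercept correction: + E[u]
    return b, sigma_u
```

With `W = X` the procedure reduces to the familiar COLS correction of an OLS fit, which is a useful special case for checking the moment formulas.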

A likelihood framework
Likelihood-based alternatives allow explicit modelling and estimation of the dependence structure that underlies endogeneity. This has recently been studied by Kutlu [54], Karakaplan and Kutlu [44], Tran and Tsionas [87,88], and Amsler et al. [6]. Our discussion here follows [6], whose derivation of the likelihood relies on a simple conditioning argument rather than the Cholesky decomposition or other approaches used in earlier work. While all approaches lead to a likelihood function, the conditioning idea of Amsler et al. [6] is simpler and more intuitive. Consider the stochastic frontier system y_i = x_i′β + v_i − u_i with reduced form x_i2 = Γ′w_i + η_i, where w_i is the vector of instruments, η_i (different from η_i in Sections 3.1-3.2) is uncorrelated with w_i, and endogeneity of x_i2 arises through the reduced-form errors. Here simultaneity bias (and the resulting inconsistency) exists because η_i is correlated with either v_i, u_i, or both.
We start with the case of dependence between η_i and v_i while u_i is independent of (v_i, η_i). To derive the likelihood function, [6] condition on the instruments, w. Doing this yields f(y_i, x_i2 | w_i) = f(y_i | x_i2, w_i) f(x_i2 | w_i). With the density in this form, the log-likelihood follows suit: ln L = ln L_1 + ln L_2. The subtraction of μ_ci in ln L_1 is an endogeneity correction, while ln L_2 is nothing more than the standard likelihood function of a multivariate normal regression model (as in (5.2)). Estimates of the model parameters, Γ, and Σ_ηη can be obtained by maximizing ln L. While direct estimation of the likelihood function is possible, a two-step approach is also available [54]. However, as pointed out by both Kutlu [54] and Amsler et al. [6], this two-step approach will have incorrect standard errors. Even though the two-step approach might be computationally simpler, it is, in general, different from full optimization of the likelihood function of Amsler et al. [6], because the two-step approach ignores the information provided by Γ and Σ_ηη in ln L_1. In general, full optimization of the likelihood function is recommended as the standard errors (obtained in the usual manner from the inverse of the Fisher information matrix) are valid.⁷
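To make the structure of ln L = ln L_1 + ln L_2 concrete, the following sketch codes the two pieces for the Normal Half Normal case. The names and signatures are ours; the correction μ_ci = Σ_vη Σ_ηη⁻¹ η_i and the conditional variance of v given η follow from assuming joint normality of (v, η), one convenient choice rather than the only one.

```python
import numpy as np
from scipy.stats import norm

def sf_loglik(eps, sigma_v, sigma_u):
    """Standard Normal Half Normal SF log-likelihood (ALS form)."""
    sigma = np.sqrt(sigma_v ** 2 + sigma_u ** 2)
    lam = sigma_u / sigma_v
    return np.sum(np.log(2) - np.log(sigma)
                  + norm.logpdf(eps / sigma)
                  + norm.logcdf(-lam * eps / sigma))

def sf_loglik_endog(y, X, x2, W, beta, Gamma, Sigma_ee, s_ve, sigma_v, sigma_u):
    """Sketch of ln L = ln L1 + ln L2: SF likelihood with the endogeneity
    correction mu_c subtracted, plus the multivariate normal likelihood
    of the reduced-form errors eta."""
    eta = x2 - W @ Gamma                                  # reduced-form errors
    mu_c = eta @ np.linalg.solve(Sigma_ee, s_ve)          # E[v | eta]: the correction
    s2_c = sigma_v ** 2 - s_ve @ np.linalg.solve(Sigma_ee, s_ve)  # Var(v | eta)
    eps = y - X @ beta
    lnL1 = sf_loglik(eps - mu_c, np.sqrt(s2_c), sigma_u)  # corrected SF piece
    k = eta.shape[1]
    _, logdet = np.linalg.slogdet(Sigma_ee)
    quad = np.einsum('ij,ij->i', eta @ np.linalg.inv(Sigma_ee), eta)
    lnL2 = -0.5 * np.sum(k * np.log(2 * np.pi) + logdet + quad)  # normal piece
    return lnL1 + lnL2
```

When the cross-covariance s_ve is zero the correction vanishes and the expression collapses to the standard SF likelihood plus an independent normal likelihood, mirroring the exogenous case.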

A GMM framework
An insightful avenue for modelling dependence due to endogeneity in the SFM, distinct from the traditional corrected methods or maximum likelihood, stems from the GMM framework proposed by Amsler et al. [6], who used the insights of Hansen et al. [35]. Similar to our discussion of GMM in panel estimation, the idea is to use the first-order conditions for maximization of the likelihood function under exogeneity as a GMM problem. Typically, the standard errors can be obtained either through the outer product of gradients (OPG) or through direct estimation of the Hessian matrix of the log-likelihood function. Given the nascency of these methods, it has yet to be determined which of the two is more reliable in practice, though in other settings both tend to work well. One argument for the OPG is that, since it only requires first derivatives, it can be more stable (and more likely to yield an invertible matrix) than calculation of the Hessian. Note also that in finite samples the different estimators of the MLE covariance matrix can give different numerical estimates, possibly even suggesting different inferential conclusions (reject or do not reject the null hypothesis). So, for small samples, it is often advisable to check all feasible estimates whenever there is suspicion of ambiguity in the conclusions (e.g., when a hypothesis is rejected only at, say, the 10% significance level).
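A minimal sketch of the OPG covariance computation follows. The function name and the finite-difference implementation of the scores are ours; any routine returning per-observation log-likelihood contributions could be plugged in.

```python
import numpy as np

def opg_cov(loglik_i, theta, h=1e-5):
    """Outer-product-of-gradients covariance estimate (a sketch).

    loglik_i(theta) must return the n-vector of per-observation
    log-likelihood contributions at parameter vector theta."""
    theta = np.asarray(theta, dtype=float)
    n = loglik_i(theta).shape[0]
    k = theta.size
    G = np.zeros((n, k))
    for j in range(k):                       # central-difference scores
        e = np.zeros(k)
        e[j] = h
        G[:, j] = (loglik_i(theta + e) - loglik_i(theta - e)) / (2 * h)
    return np.linalg.inv(G.T @ G)            # inverse of the sum of g_i g_i'
```

Only first derivatives appear, which is the stability advantage noted above: G′G is positive semi-definite by construction, whereas a numerically computed Hessian need not be.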
Note that these expectations are taken over x_i and y_i (and, by implication, ε_i) and solved for the parameters of the SFM.
The key here is that these first-order conditions (one for σ², one for λ, and the vector for β) are valid under exogeneity, which implies that the MLE is equivalent to the GMM estimator. Under endogeneity, however, this relationship does not hold directly. The seminal idea of Amsler et al. [6] is that the first-order conditions (5.3) and (5.4) are based on the distributional assumptions on v and u, not on the relationship of x with v and/or u. Thus, these moment conditions are valid whether or not x contains endogenous components. The only moment condition that needs to be adjusted is (5.5). In this case, the first-order condition needs to be taken with respect to w, the exogenous variable, not x. Doing so results in an amended first-order condition in which ϕ_i and Φ_i are identical to those in (5.5). It is important to acknowledge that this moment condition is valid when ε_i and w_i are independent, a more stringent requirement than the mere lack of correlation assumed in the typical regression setup. As with the C2SLS approach, the source of endogeneity for x_2 (through v and/or u) does not need to be specified.

An economic model of dependence
with respect to x_j. These first-order conditions are exact, which rarely holds in practice; instead, a stochastic term is added, designed to capture allocative inefficiency. That is, our empirical first-order conditions hold for j = 2, …, J, where η_j captures allocative inefficiency of the jth input relative to input one (the choice of comparison input is without loss of generality).
The idea behind allocative inefficiency is that firms could be fully technically efficient and still have room for improvement due to over- or under-use of inputs, relative to another input, given the price ratio. On the other hand, firms can be technically inefficient because of allocative inefficiency and vice versa, so independence between u and η is hard to justify. Additionally, if firms are cost minimizers and one estimates a production function, the inputs will be endogenous, as these are choice variables of the firm.⁸ In this case, input prices can serve as instruments.  8 It is possible to treat a subset of x as endogenous, i.e., x = (x_1, x_2), where x_1 is endogenous and x_2 is exogenous.
Combining the SFM, under the Cobb-Douglas production function, with the J − 1 conditions in (5.8) with allocative inefficiency built in results in the following system, where x_ij is the log of input j of firm i, p_j is the log of the price of input j, β_j is the coefficient on input j in (5.9), and η_i2, …, η_iJ are the allocative inefficiencies of the J − 1 inputs with respect to input one. See Schmidt and Lovell [74,75] for details.
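For concreteness, the system just described can be written out as follows. This is a reconstruction consistent with Schmidt and Lovell [74,75], allowing firm-specific prices p_ij (the text writes p_j); the exact display in the original may differ:

```latex
\begin{aligned}
y_i &= \beta_0 + \sum_{j=1}^{J} \beta_j x_{ij} + v_i - u_i, \\
x_{i1} - x_{ij} &= \ln(\beta_1/\beta_j) + (p_{ij} - p_{i1}) + \eta_{ij},
  \qquad j = 2, \ldots, J,
\end{aligned}
```

The first line is the Cobb-Douglas frontier in logs; each of the J − 1 first-order conditions equates the log marginal-product ratio of input j and input one to the log price ratio, up to η_ij, with η_ij = 0 indicating allocatively efficient use of input j relative to input one.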

A copula-based approach
Amsler et al. [6] used copulas to obtain a joint distribution for u, v, and η, whereas Amsler et al. [8] developed a new copula family for u and η with properties that reflect the nature of allocative (symmetric) and technical (one-sided) inefficiencies. Here we provide the derivation of a copula-based likelihood for the most general case, which allows us to model dependence between all the components of (u, v, η). We keep the Half Normal marginal for u and Normal marginals for the elements of ψ = (v, η′)′, as before, and assume a copula density c(⋅, …, ⋅). Amsler et al. [6] used the Gaussian copula, which implies that the joint distribution of ψ is Normal, but this is largely done for convenience. This gives the joint density of (u, v, η). However, we need the joint density of ε and η in order to form a sample log-likelihood. This density can be obtained by integrating u out of the joint density; the resulting likelihood involves the marginal parameters and whatever copula parameters appear in c. This permits modelling and testing the validity of independence assumptions between all error terms in the system, including the assumption of exogeneity.

Dependence on determinants of inefficiency
To conclude this section, we consider the extension to a setting where inefficiency depends on covariates and some of these determinants of inefficiency may be endogenous [7,55]. These models can be estimated using traditional instrumental variable methods; however, because the determinants of inefficiency enter the model nonlinearly, nonlinear methods are required. Amsler et al. [7] considered a model in which u* is the baseline inefficiency and u has the property that the scale of its distribution (relative to the distribution of u*) changes depending on the determinants z (the so-called scaling property). The covariates x_i and z_i are partitioned into exogenous and endogenous components, with q_i denoting the traditional outside instruments. Identification of all the parameters requires that the dimension of q be at least as large as the dimension of x_2 plus the dimension of z_2 (the rank condition).
In the model of Amsler et al. [7], endogeneity arises through dependence between a variable in the model (x_2 and/or z_2) and noise, v. That is, both x and z are assumed to be independent of baseline inefficiency u*. Given that E[u_i] is not constant, the COLS approach to endogeneity proposed by Amsler et al. [6] cannot be used here. To develop an appropriate estimator, add and subtract the mean of inefficiency to produce a composed error term that has mean 0. Proper estimation through instrumental variables then requires that the corresponding moment condition holds. The nonlinearity of these moment conditions necessitates nonlinear two-stage least squares (NL2SLS) [4]. Latruffe et al. [55] have a setup similar to Amsler et al. [7], using the model in (5.13), but develop a four-step estimator for the parameters; additionally, only x_2 is treated as endogenous. Latruffe et al.'s [55] approach is based on [23], using the construction of efficient moment conditions. The vector of instruments proposed in [55], w(γ, δ), involves q_i′γ, the linear projection of x_2 on the external instruments q. The four-stage estimator is defined as follows. Step 1 Regress x_2 on q to estimate γ. Denote the OLS estimator of γ as γ̂.
Step 2 Use NLS to estimate the SFM in (5.13). Denote the NLS estimates of (β, δ) as (β̈, δ̈). Use the NLS estimate of δ and the OLS estimate of γ from Step 1 to construct the instruments w(γ̂, δ̈).
Step 3 Using the estimated instrument vector w(γ̂, δ̈), calculate the NL2SLS estimator of (β, δ) as (β̃, δ̃). Use the NL2SLS estimate of δ and the OLS estimate of γ from Step 1 to construct the instrument vector w(γ̂, δ̃). Step 4 Using the estimated instrument vector w(γ̂, δ̃), calculate the final NL2SLS estimator of (β, δ). This multi-step estimator is necessary in the context of efficient moments because the actual set of instruments is not used directly; rather, w(γ, δ) is used, and this instrument vector requires estimates of γ and δ. The first two steps of the algorithm construct estimates of these two unknown parameter vectors. The third step then constructs a consistent estimator of w(γ, δ), which Step 2 does not deliver because the endogeneity of x_2 is ignored there (note that NLS is used as opposed to NL2SLS). The iteration from Step 2 to Step 3 does produce a consistent estimator of w(γ, δ), and as such, Step 4 produces consistent estimators of β and δ. While Latruffe et al. [55] proposed a set of efficient moment conditions to handle endogeneity, the model of Amsler et al. [7] is more general because it can handle endogeneity in the determinants of inefficiency as well. Finally, the presence of z is attractive since it allows the researcher to dispense with distributional assumptions on v and u.
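The NL2SLS building block used in Steps 3 and 4 can be sketched as follows. This is a sketch under an illustrative scaling-property specification y_i = x_i′β − exp(z_i′δ) + e_i, with the mean of baseline inefficiency absorbed into the intercept of δ; the instrument matrix W is passed in directly, since the efficient-instrument construction w(γ, δ) of [55] is model-specific.

```python
import numpy as np
from scipy.optimize import minimize

def nl2sls(y, X, Z, W, theta0):
    """Amemiya-style NL2SLS (sketch): minimize e(theta)' P_W e(theta),
    where e_i(theta) = y_i - x_i'beta + exp(z_i'delta) and P_W projects
    onto the column space of the instrument matrix W."""
    k = X.shape[1]
    WtWinv = np.linalg.inv(W.T @ W)
    def obj(theta):
        e = y - X @ theta[:k] + np.exp(Z @ theta[k:])   # mean-zero composed error
        We = W.T @ e                                    # sample moments W'e
        return We @ WtWinv @ We                         # GMM objective with P_W weighting
    return minimize(obj, theta0, method="BFGS").x
```

In the four-step scheme, Step 1 is an ordinary OLS regression of x_2 on q, Step 2 is plain NLS on the same residual function, and Steps 3 and 4 call a routine like this one with the successively updated instrument vectors.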

Estimation of individual inefficiency using dependence information
Once the parameters of the SFM have been estimated, estimates of firm level productivity and efficiency can be recovered. Observation-specific estimates of inefficiency are one of the main benefits of the SFM relative to neoclassical models of production. Firms can be ranked according to estimated efficiency; the identity of under-performing firms as well as those who are deemed best practice can also be gleaned from the estimated SFM. All of this information is useful in helping to design more efficient public policy or subsidy programs aimed at improving the market, for example, insulating consumers from the poor performance of heavily inefficient firms. As a concrete illustration, consider firms operating electricity distribution networks that typically possess a natural local monopoly given that the construction of competing networks over the same terrain is prohibitively expensive.⁹ It is not uncommon for national governments to establish regulatory agencies which monitor the provision of electricity to ensure that abuse of the inherent monopoly power is not occurring. Regulators face the task of determining an acceptable price for the provision of electricity while having to balance the heterogeneity that exists across the firms (in terms of size of the firm and length of the network). Firms which are inefficient may charge too high a price to recoup a profit, but at the expense of operating below capacity. However, given production and distribution shocks, not all departures from the frontier represent inefficiency. Thus, precise measures designed to account for noise are required to parse information from ε i regarding u i .
Alternatively, further investigation could reveal what it is that makes these establishments attain such high levels of performance. This could then be used to identify appropriate government policy implications and responses or identify processes and/or management practices that should be spread (or encouraged) across the less efficient, but otherwise similar, units. This is the essence of the determinants of inefficiency approach discussed in the previous section. More directly, efficiency rankings are used in regulated industries so that regulators can set tougher future cost reduction targets for the more inefficient companies, in order to ensure that customers do not pay for the inefficiency of firms.
The only direct estimate coming from the Normal Half Normal SFM is σ̂_u². This provides context regarding the shape of the Half Normal distribution on u_i and the industry average efficiency E[u], but not the absolute level of inefficiency of a given firm. If we are only concerned with the average level of technical efficiency for the population, then this is all the information that is needed. Yet, if we want to know about a specific firm, then something else is required. The main approach to estimating firm-level inefficiency is the conditional mean estimator [42], commonly known as the JLMS estimator. Their idea was to calculate the expected value of u_i conditional on the realization of the composed error of the model, ε_i ≡ v_i − u_i, i.e., E[u_i | ε_i].¹⁰ This conditional mean of u_i given ε_i gives a point prediction of u_i. The composed error contains individual-specific information, and the conditional expectation is one measure of firm-specific inefficiency.  9 The SFA literature contains a fairly rich set of examples of the estimation and use of efficiency estimates in different fields of research. For example, in the context of electricity providers, see [36,45,53]; for banking efficiency, see [22] and references cited therein; for the analysis of the efficiency of national health care systems, see [30] and the review [40]; for analyzing efficiency in agriculture, see [13,14,20,58], to mention just a few.
JLMS [42] show that for the Normal Half Normal specification of the SFM, the conditional density of u_i given ε_i, f(u_i | ε_i), is that of an N(μ*_i, σ*²) random variable truncated below at zero. Technical efficiency, e^{−u_i}, is useful in cases where output is measured in logarithmic form; furthermore, it is bounded between 0 and 1, making it somewhat easier to interpret than a raw inefficiency score. Since e^{−u_i} is not directly observable, the idea of JLMS [42] can be deployed here as well, with μ*_i and σ* as defined in (6.1) and (6.2), respectively. Technical efficiency estimates are obtained by replacing the true parameters in (6.4) with MLE estimates from the SFM. When ranking efficiency scores, one should use estimates of 1 − E[u_i | ε_i], which is the first-order approximation of (6.4). Similar expressions for the JLMS [42] and Battese and Coelli [12] efficiency scores can be derived under the assumption that u is Exponential ([49], p. 82), Truncated Normal ([49], p. 86), or Gamma ([49], p. 89); see also [52].
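The JLMS point predictor and the Battese-Coelli efficiency score for the Normal Half Normal case can be sketched as follows. The expressions for μ*_i and σ* are the standard ones from (6.1)-(6.2); the function name is ours.

```python
import numpy as np
from scipy.stats import norm

def jlms(eps, sigma_v, sigma_u):
    """JLMS conditional-mean predictor E[u_i | eps_i] and the
    Battese-Coelli efficiency E[exp(-u_i) | eps_i] for the
    Normal Half Normal SFM."""
    s2 = sigma_v ** 2 + sigma_u ** 2
    mu_star = -eps * sigma_u ** 2 / s2           # mean of the truncated normal u | eps
    s_star = sigma_u * sigma_v / np.sqrt(s2)     # its scale parameter
    r = mu_star / s_star
    e_u = mu_star + s_star * norm.pdf(r) / norm.cdf(r)   # E[u | eps]
    te = (np.exp(-mu_star + 0.5 * s_star ** 2)
          * norm.cdf(r - s_star) / norm.cdf(r))          # E[exp(-u) | eps]
    return e_u, te
```

By the law of iterated expectations, the sample average of E[u_i | ε_i] should be close to the unconditional mean E[u] = σ_u√(2/π), which is a useful sanity check on any implementation.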
An interesting and important finding from [5] and [6] is that when we allow for dependence of the kinds described in Sections 4 and 5, we can potentially improve estimation of inefficiency through the JLMS estimator. We focus on the case of endogeneity (Section 5), but the case of dependence over t in panels (Section 4) is similar. The traditional predictor [42] is E[u_i | ε_i]. However, more information is available when dependence is allowed, namely via η_i. This calls for a modified JLMS estimator, E[u_i | ε_i, η_i]. Note that even though u_i is assumed independent of η_i, as in [6], because η_i is correlated with v_i there is information that can help predict u_i even after conditioning on ε_i.

applications. At present there is little understanding of the direction of impact of unmodelled, or misspecified, dependence on efficiency scores.