In this subsection, we introduce a key complication to demonstrate the risks of estimators that omit sampling weights from the propensity score estimation step. Namely, we introduce a variable $z$ that is used in the construction of the sampling weights but is unavailable when estimating the propensity score. Such variables are not unusual, particularly in datasets that are part of the federal statistical system. For example, sampling weights often incorporate features related to fielding method, geography, and household, any of which could be related to outcomes of interest. These features might not be available to the analyst, or might not be available at the same resolution. There is also the risk that the analyst is not fully aware of which features were used in constructing the design weights, even if all of those features are technically available.

Combining eqs (1) and (3), the estimator of PATT has the form
$\widehat{\mathrm{PATT}}=\frac{\sum_{i=1}^{n}y_{i}t_{i}(1/p_{i})}{\sum_{i=1}^{n}t_{i}(1/p_{i})}-\frac{\sum_{i=1}^{n}y_{i}(1-t_{i})(e_{i}/(1-e_{i}))(1/p_{i})}{\sum_{i=1}^{n}(1-t_{i})(e_{i}/(1-e_{i}))(1/p_{i})}$(4)
where $e_{i}=f(t=1|\mathbf{x}_{i})$ is the propensity score. The first term estimates $\mathrm{E}(y_{1}|t=1)$ regardless of the relationship of $z$ to the treatment assignment or to the potential outcomes. We write out the details of this well-known property to more easily guide the analysis into the more complicated second term.
$\frac{\sum_{i=1}^{n}y_{i}t_{i}(1/p_{i})}{\sum_{i=1}^{n}t_{i}(1/p_{i})}\to \frac{\iiint y_{1}\frac{1}{f(s=1|y_{1},\mathbf{x},z,t=1)}f(y_{1},\mathbf{x},z|s=1,t=1)\,dy_{1}\,d\mathbf{x}\,dz}{\iiint \frac{1}{f(s=1|y_{1},\mathbf{x},z,t=1)}f(y_{1},\mathbf{x},z|s=1,t=1)\,dy_{1}\,d\mathbf{x}\,dz}$(5)
$=\frac{\iiint y_{1}\frac{f(s=1|y_{1},\mathbf{x},z,t=1)f(y_{1},\mathbf{x},z|t=1)}{f(s=1|y_{1},\mathbf{x},z,t=1)f(s=1|t=1)}\,dy_{1}\,d\mathbf{x}\,dz}{\iiint \frac{f(s=1|y_{1},\mathbf{x},z,t=1)f(y_{1},\mathbf{x},z|t=1)}{f(s=1|y_{1},\mathbf{x},z,t=1)f(s=1|t=1)}\,dy_{1}\,d\mathbf{x}\,dz}$(6)
$=\frac{\frac{1}{f(s=1|t=1)}\iiint y_{1}f(y_{1},\mathbf{x},z|t=1)\,dy_{1}\,d\mathbf{x}\,dz}{\frac{1}{f(s=1|t=1)}\iiint f(y_{1},\mathbf{x},z|t=1)\,dy_{1}\,d\mathbf{x}\,dz}$(7)
$=\mathrm{E}(y_{1}|t=1)$(8)
The second term in eq. (4) should estimate $\mathrm{E}(y_{0}|t=1)$, but we will see that this depends on how we estimate the propensity scores, $e_{i}$, and on a few key assumptions. Here we analyze only the numerator since, as seen in eq. (7), the denominator is simply a normalization term.
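To make the structure of eq. (4) concrete, the following is a minimal sketch of the estimator in Python with NumPy. The function name `patt_estimate` and its argument names are illustrative assumptions, not part of the original analysis; the propensity scores `e` are taken as given, so the sketch is agnostic about whether they were estimated with or without sampling weights.

```python
import numpy as np


def patt_estimate(y, t, e, p):
    """Ratio-form PATT estimate in the shape of eq. (4).

    y : observed outcomes
    t : treatment indicators (1 = treated, 0 = comparison)
    e : propensity scores e_i (assumed already estimated)
    p : sampling probabilities p_i, so 1/p_i is the sampling weight
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)
    p = np.asarray(p, dtype=float)

    w = 1.0 / p                                  # sampling weights
    w_treated = t * w                            # first term of eq. (4)
    w_comparison = (1.0 - t) * (e / (1.0 - e)) * w  # odds times sampling weight

    mean_treated = np.sum(y * w_treated) / np.sum(w_treated)
    mean_comparison = np.sum(y * w_comparison) / np.sum(w_comparison)
    return mean_treated - mean_comparison
```

Both terms are normalized by the sum of their own weights, which is why constant factors in the numerator and denominator cancel in the limits derived in this subsection.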

If we use the approach described in Zanutto [3] and DuGoff et al. [1], then we insert the propensity score conditional on $s=1$, that is $f(t=1|\mathbf{x},s=1)$, in place of $e_{i}$.
$\sum_{i=1}^{n}y_{i}(1-t_{i})\frac{e_{i}}{1-e_{i}}\frac{1}{p_{i}}\to n\iiint y_{0}\frac{f(t=1|\mathbf{x},s=1)}{f(t=0|\mathbf{x},s=1)}\frac{f(y_{0},\mathbf{x},z|t=0,s=1)}{f(s=1|y_{0},\mathbf{x},z,t=0)}\,dy_{0}\,d\mathbf{x}\,dz$(9)
$=n\iiint y_{0}\frac{f(t=1|\mathbf{x},s=1,y_{0})}{f(t=0|\mathbf{x},s=1,y_{0})}\frac{f(y_{0},\mathbf{x},z|t=0)}{f(y_{0},\mathbf{x},z|t=0,s=1)f(s=1|t=0)}f(y_{0},\mathbf{x},z|t=0,s=1)\,dy_{0}\,d\mathbf{x}\,dz$(10)
In eq. (10) we need $t$ to be independent of $y_{0}$ conditional on $\mathbf{x}$, the standard independence assumption used in PSA, in order to insert $y_{0}$ in the treatment conditional probabilities. Integrating out $z$ we obtain
$=\frac{n}{f(s=1|t=0)}\iint y_{0}\frac{f(t=1|y_{0},\mathbf{x},s=1)}{f(t=0|y_{0},\mathbf{x},s=1)}f(y_{0},\mathbf{x}|t=0)\,dy_{0}\,d\mathbf{x}$(11)
$=\frac{n}{f(s=1|t=0)}\frac{f(t=1)}{f(t=0)}\iint y_{0}\frac{f(y_{0},\mathbf{x}|t=0)}{f(y_{0},\mathbf{x},s=1|t=0)}f(y_{0},\mathbf{x},s=1|t=1)\,dy_{0}\,d\mathbf{x}$(12)
Bringing in the denominator normalizing term, in which the same leading constants appear and cancel, we arrive at
$\frac{\sum_{i=1}^{n}y_{i}(1-t_{i})(e_{i}/(1-e_{i}))(1/p_{i})}{\sum_{i=1}^{n}(1-t_{i})(e_{i}/(1-e_{i}))(1/p_{i})}\to \iint y_{0}\frac{f(y_{0},\mathbf{x}|t=0)}{f(y_{0},\mathbf{x},s=1|t=0)}f(y_{0},\mathbf{x},s=1|t=1)\,dy_{0}\,d\mathbf{x}$(13)
However, eq. (13) shows that the estimator is not necessarily consistent for $\mathrm{E}(y_{0}|t=1)$. We need $f(y_{0},\mathbf{x},s|t)=f(y_{0},\mathbf{x}|t)f(s|t)$, but the independence of $s$ and $(y_{0},\mathbf{x})$ cannot be guaranteed when $z$, even though no longer directly visible in the expression, might have induced a correlation.

To repair the treatment effect estimator, we need to modify eq. (9) by using the sampling weights in the estimation of the propensity score model so that we obtain a consistent estimate of $f(t=1|\mathbf{x})$. Doing so results in
$\sum_{i=1}^{n}y_{i}(1-t_{i})\frac{e_{i}}{1-e_{i}}\frac{1}{p_{i}}\to n\iiint y_{0}\frac{f(t=1|\mathbf{x})}{f(t=0|\mathbf{x})}\frac{f(y_{0},\mathbf{x},z|t=0,s=1)}{f(s=1|y_{0},\mathbf{x},z,t=0)}\,dy_{0}\,d\mathbf{x}\,dz$(14)
$=\iint y_{0}f(y_{0},\mathbf{x}|t=1)\,dy_{0}\,d\mathbf{x}$(15)
$=\mathrm{E}(y_{0}|t=1)$(16)
The simple example shown in Table 1 makes the situation more concrete. In this example, the sampling and treatment assignment probabilities depend on $z$, but the potential outcomes do not depend on $z$.
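The step from eq. (14) to eq. (15) is compressed. One way to fill it in, offered here as a sketch rather than as the authors' own derivation, reuses the Bayes' rule substitution from eq. (10), integrates out $z$, and then applies the assumption that $t$ is independent of $y_{0}$ conditional on $\mathbf{x}$:

$n\iiint y_{0}\frac{f(t=1|\mathbf{x})}{f(t=0|\mathbf{x})}\frac{f(y_{0},\mathbf{x},z|t=0,s=1)}{f(s=1|y_{0},\mathbf{x},z,t=0)}\,dy_{0}\,d\mathbf{x}\,dz=\frac{n}{f(s=1|t=0)}\iint y_{0}\frac{f(t=1|y_{0},\mathbf{x})}{f(t=0|y_{0},\mathbf{x})}f(y_{0},\mathbf{x}|t=0)\,dy_{0}\,d\mathbf{x}$
$=\frac{n}{f(s=1|t=0)}\frac{f(t=1)}{f(t=0)}\iint y_{0}f(y_{0},\mathbf{x}|t=1)\,dy_{0}\,d\mathbf{x}$

The denominator of the second term in eq. (4) converges to the same leading constant multiplied by $\iint f(y_{0},\mathbf{x}|t=1)\,dy_{0}\,d\mathbf{x}=1$, so the ratio reduces to eq. (15). Unlike the path through eq. (13), no factorization of $f(y_{0},\mathbf{x},s|t)$ is required.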

Table 1: Example scenario with sampling and treatment probabilities dependent on $z$.

The inclusion or exclusion of the sampling weights in the propensity score model, and the assumptions that accompany each choice, do in fact matter. Table 2 compares the asymptotic results for this example. The first column of Table 2 shows the asymptotic means of $x$ and $y$ for the treated group. The second column shows that using sampling weights in both stages of the PSA results in balance on $x$ and an $\mathrm{E}(y_{0}|t=1)$ identical to that of the treated group, consistent with the actual null treatment effect simulated here. The third column shows that without sampling weights in the propensity score model we do not get balance on $x$ and do not correctly estimate the null treatment effect.

Table 2: Asymptotic values for methods with and without sampling weights in the propensity score model.

This analysis demonstrates that when there are case features used in the development of the sampling weights that are unavailable when estimating the propensity score, consistent estimates depend on additional independence assumptions, assumptions that are not needed if the propensity score estimator uses the sampling weights.
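The contrast described above can be checked with an exact, simulation-free calculation on a small discrete population. The sketch below is illustrative only: every probability in it is a hypothetical value chosen for this example, not the contents of Table 1, and the propensity score is computed nonparametrically as a cell proportion within each level of a binary $x$ (an assumption made to avoid fitting a model). As in the example scenario, sampling and treatment depend on $z$ while the potential outcomes depend only on $x$, with a null treatment effect.

```python
import numpy as np

# Hypothetical population; these are NOT the values from Table 1.
pop = np.full((2, 2), 0.25)          # P(x, z), indexed [x, z]
p_treat = np.array([[0.2, 0.8],      # P(t=1 | x, z)
                    [0.5, 0.5]])
p_samp = np.array([0.1, 0.5])        # P(s=1 | z): sampling probability p_i
mu = np.array([1.0, 3.0])            # E(y0 | x) = E(y1 | x): null effect
x_val = np.array([0.0, 1.0])

# Expected share of sampled cases per (x, z) cell, by treatment arm.
samp_t1 = pop * p_treat * p_samp
samp_t0 = pop * (1.0 - p_treat) * p_samp
w = 1.0 / p_samp                     # sampling weight, varies with z only

# Propensity score by x, with and without sampling weights.
e_weighted = (samp_t1 * w).sum(axis=1) / ((samp_t1 + samp_t0) * w).sum(axis=1)
e_unweighted = samp_t1.sum(axis=1) / (samp_t1 + samp_t0).sum(axis=1)


def weighted_mean(values_by_x, cell_weights):
    """Mean of a function of x under weights defined on (x, z) cells."""
    per_x = cell_weights.sum(axis=1)
    return float((values_by_x * per_x).sum() / per_x.sum())


def comparison_weights(e):
    """Final weights for comparison cases: odds(e) times sampling weight."""
    return samp_t0 * (e / (1.0 - e))[:, None] * w


treated_w = samp_t1 * w
treated_x = weighted_mean(x_val, treated_w)
treated_y = weighted_mean(mu, treated_w)

results = {}
for label, e in [("weighted", e_weighted), ("unweighted", e_unweighted)]:
    cw = comparison_weights(e)
    results[label] = {
        "x": weighted_mean(x_val, cw),
        "y0": weighted_mean(mu, cw),
        "patt": treated_y - weighted_mean(mu, cw),
    }

print("treated:", treated_x, treated_y)
print(results)
```

Under these hypothetical values the treated group has asymptotic means of 0.5 for $x$ and 2.0 for $y$. The comparison group weighted with the sampling-weighted propensity score matches both, giving a PATT of 0. With the unweighted propensity score the comparison group has a mean of 0.3 for $x$ and 1.6 for $y_{0}$, so the null effect would be estimated as 0.4.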
