To demonstrate how the `drgee` package can be used for O-estimation, we use the dataset `SLID` from the `car` package (Fox and Weisberg, 2011). This publicly available dataset contains data from the 1994 wave of the Canadian Survey of Labour and Income Dynamics, with variables `wages`, `sex`, `education`, `age` and `language` for composite hourly wage (Canadian dollars), sex, years of education, age and native language, respectively. The variables `sex` and `language` are factors with levels ("`Female`", "`Male`") and ("`English`", "`French`", "`Other`"), respectively. We refer to the `car` package for more thorough documentation of the dataset.

Suppose that we wish to use these data to study whether there is a direct effect of sex on wages, not mediated through education level. Such a direct effect is clearly of substantive interest, since it would be an indication of sex discrimination. To eliminate the mediated effect we wish to control for education level. However, to avoid bias we must then additionally control for covariates that can be confounders for the mediator and the outcome (see Valeri and VanderWeele, 2013, and references therein). Age can plausibly be such a mediator-outcome confounder, since age is likely to be associated with both education level and wages. Native language may also be a mediator-outcome confounder, by being associated with education level and wages through ethnicity and socio-economic status. We thus use a model for the mean of `wages` conditional on `sex`, `education`, `age` and `language`:
$\begin{array}{rl}& E(\mathit{wages} \mid \mathit{sexMale}, \mathit{education}, \mathit{age}, \mathit{languageFrench}, \mathit{languageOther}; \beta, \gamma_{0}, \gamma_{1}, \gamma_{2}, \gamma_{3}, \gamma_{4})\\ & = \beta \cdot \mathit{sexMale} + \gamma_{0} + \gamma_{1} \cdot \mathit{education} + \gamma_{2} \cdot \mathit{age} + \gamma_{3} \cdot \mathit{languageFrench} + \gamma_{4} \cdot \mathit{languageOther}\end{array}$
where the factors `sex` and `language` are recoded as dummy variables, with reference levels "`Female`" and "`English`", respectively. In this model, $Y=\mathit{wages}$, $A=\mathit{sexMale}$, $L=(\mathit{education}, \mathit{age}, \mathit{languageFrench}, \mathit{languageOther})$, $X(L)=1$ and $V(L)=(1,L)$. The target parameter is $\beta$ and the main model is
$\begin{array}{rl}& E(\mathit{wages} \mid \mathit{sexMale}, \mathit{education}, \mathit{age}, \mathit{languageFrench}, \mathit{languageOther})\\ & - E(\mathit{wages} \mid \mathit{sexMale}=0, \mathit{education}, \mathit{age}, \mathit{languageFrench}, \mathit{languageOther})\\ & = \beta \cdot \mathit{sexMale}\end{array}$ [7]
The nuisance parameter is $\gamma=(\gamma_{0}, \gamma_{1}, \gamma_{2}, \gamma_{3}, \gamma_{4})$ and the outcome nuisance model is
$\begin{array}{rl}& E(\mathit{wages} \mid \mathit{sexMale}=0, \mathit{education}, \mathit{age}, \mathit{languageFrench}, \mathit{languageOther})\\ & = \gamma_{0} + \gamma_{1} \cdot \mathit{education} + \gamma_{2} \cdot \mathit{age}\\ & \quad + \gamma_{3} \cdot \mathit{languageFrench} + \gamma_{4} \cdot \mathit{languageOther}\end{array}$ [8]
To use O-estimation based on these models we type:

> library(car)
> library(drgee)
> fit <- drgee(oformula = wages ~ education + age + language,
+ exposure = "sex", estimation.method = "o", data = SLID)

By setting the `estimation.method` argument to "`o`" we tell the `drgee` function to use O-estimation. The `oformula` argument specifies the outcome nuisance model and the `exposure` argument specifies the exposure. There is no need to explicitly specify the outcome, since this is identified by the `drgee` function as the response variable in `oformula`. To summarize the results we type:

> summary(fit)

which gives us the output

Call: drgee(exposure = "sex", oformula = wages ~ education + age + language, data
= SLID, estimation.method = "o")
Outcome: wages
Exposure: sexMale
Covariates: education,age,languageFrench,languageOther
Main model: wages ~ sexMale
Outcome nuisance model: wages ~ education + age + languageFrench + languageOther
Outcome link function: identity
Estimate Std. Error z value Pr(>|z|)
sexMale 3.4554 0.2091 16.53 <2e-16 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
(Note: The estimated parameters quantify the conditional exposure-outcome association, given the covariates included in the nuisance models)
3987 complete observations used
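
The names `sexMale`, `languageFrench` and `languageOther` in this output follow R's default treatment (dummy) coding of factors. As a rough illustration, the sketch below builds a small made-up data frame with the same column names and factor levels as `SLID` (the values are hypothetical, not taken from the real data) and uses base R's `complete.cases` and `model.matrix` to show how incomplete rows are counted and how the factors expand into dummy columns. It does not call `drgee`, and it is not meant to describe how `drgee` builds its design matrices internally.

```r
# Toy data with the same column names and factor levels as SLID.
# The values are invented for illustration; this is NOT the real dataset.
toy <- data.frame(
  wages     = c(10.5, 11.0, NA, 14.2, 17.8, 8.0),
  education = c(15, 13.2, 16, 14, NA, 12),
  age       = c(40, 19, 49, 46, 71, 33),
  sex       = factor(c("Male", "Male", "Male", "Female", "Male", "Female"),
                     levels = c("Female", "Male")),
  language  = factor(c("English", "English", "Other", "French", "English", "Other"),
                     levels = c("English", "French", "Other"))
)

# Number of rows with no missing value in any variable
n_complete <- sum(complete.cases(toy))

# Design matrix for the full outcome model. model.matrix() applies treatment
# (dummy) coding to the factors, with the first level as reference, and drops
# incomplete rows through the default na.action (na.omit).
X <- model.matrix(wages ~ sex + education + age + language, data = toy)

print(n_complete)
print(colnames(X))
```

Applying `sum(complete.cases(SLID))` to the real data, once `car` is loaded, should give the number of observations available to the analysis.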

Only the results for the parameters of the main model are shown in the output. Since `drgee` only uses complete observations, 3428 incomplete observations in the original dataset `SLID` were not used in the calculations. The estimate of $\beta$ indicates that men earned 3.46 dollars more per hour than women in the target population, even when controlling for education level, age and native language. This observed sex difference is highly significant, with a p-value much smaller than the nominal 0.05 level.
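
To make the link between the reported estimate, its standard error and the significance statement explicit, the following sketch recomputes the Wald z statistic, the two-sided p-value and a 95% confidence interval in base R. It assumes the usual normal approximation and uses the rounded values printed in the summary output, so the results are approximate; the variable names are ours and are not part of the `drgee` interface.

```r
# Estimate and standard error copied from the summary output (rounded values)
est <- 3.4554
se  <- 0.2091

z  <- est / se                             # Wald z statistic
p  <- 2 * pnorm(-abs(z))                   # two-sided p-value, normal approximation
ci <- est + c(-1, 1) * qnorm(0.975) * se   # 95% Wald confidence interval

print(round(z, 2))
print(p)
print(round(ci, 2))
```

With these inputs the z statistic is about 16.53, matching the output, and the interval runs from roughly 3.05 to 3.87 dollars per hour.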
