# Epidemiologic Methods

Online ISSN: 2161-962X
Volume 3, Issue 1

# On the Impact of Misclassification in an Ordinal Exposure Variable

Dongxu Wang
• Department of Statistics, University of BC, Earth Sciences Building, 3182-2207 Main Mall, Vancouver, BC V6T1Z4, Canada
Paul Gustafson
• Corresponding author
• Department of Statistics, University of BC, Earth Sciences Building, 3182-2207 Main Mall, Vancouver, BC V6T1Z4, Canada
Published Online: 2014-06-25 | DOI: https://doi.org/10.1515/em-2013-0017

## Abstract

Say that interest focuses on the relationship between an exposure variable and an outcome variable; however, the exposure variable is subject to measurement error. While exceptions have been identified, in almost all circumstances nondifferential measurement error leads to attenuated regression coefficients and lost power to detect associations. In the case of an ordinal exposure variable subject to nondifferential misclassification, we confirm that power is always lost in a general test for association between the exposure and outcome variables. Surprisingly, however, we find that for a linear test of trend, a gain in power is possible.

## 1 Introduction

In many observational data settings focused on the regression relationship between an exposure variable X and a disease outcome variable Y, the exposure variable is prone to measurement error. That is, the available data are realizations of $\left({X}^{\ast },Y\right)$ rather than $\left(X,Y\right)$, where ${X}^{\ast }$ is a noisy surrogate for X. There is a substantial literature on measurement error problems, particularly involving nondifferential measurement error, whereby ${X}^{\ast }$ and Y are conditionally independent given X. Two typical hallmarks of nondifferential measurement error are (i) attenuation and (ii) power loss. Attenuation, or bias toward the null, refers to a regression coefficient describing the $\left({X}^{\ast },Y\right)$ association being smaller in magnitude than the corresponding (X,Y) coefficient. Consequently, reporting inference based on $\left({X}^{\ast },Y\right)$ data as targeting the $\left(X,Y\right)$ relationship of interest is a biased procedure. Power loss refers to $\left({X}^{\ast },Y\right)$ data yielding less power than $\left(X,Y\right)$ data to detect association.

The maxim of nondifferential exposure measurement error leading to attenuation is widely appealed to in applied work; Jurek et al. (2008) refer to it as “a well-known heuristic in epidemiology.” As documented in Jurek et al. (2006), it is not uncommon to see inference on $\left({X}^{\ast },Y\right)$ data reported without any quantitative adjustment for measurement error, but with a claim that this is a conservative procedure. The rationale for such a claim is that if an adjustment were undertaken, it would push the point estimate away from the null, in order to mitigate the attenuation. Against this, however, there are a number of cautionary papers in the literature. These point to exceptional circumstances where the maxim of attenuation can fail. For instance, see Dosemeci et al. (1990); Wacholder (1995); Carroll (1997); Jurek et al. (2005).

In specific contexts where power loss applies, there is recent work on quantifying the magnitude of the loss (Buonaccorsi et al. 2011; Vanderweele 2012). However, circumstances under which power loss is not guaranteed do not seem to have been investigated. This paper focuses on the situation where X is ordinal with more than two categories. We show that for a general test of $\left(X,Y\right)$ association, nondifferential misclassification of X is indeed guaranteed to reduce power. However, another commonly used test when X is ordinal is a test of linear trend. This test is seen to constitute an exceptional circumstance, as nondifferential misclassification of X can in fact increase the power to detect a trend.

## 2 Misclassification

The exposure X is assumed ordinal with $k>2$ levels, represented as $X\in \left\{1,\dots ,k\right\}$. The discussion applies to disease variables Y of various numerical types, for example, binary, count, continuous. We take $\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)$ to describe aspects of the joint distribution of $\left(X,Y\right)$, according to ${\mathrm{\alpha }}_{i}=\phantom{\rule{1pt}{0ex}}\mathrm{p}\mathrm{r}\phantom{\rule{1pt}{0ex}}\left(X=i\right)$, ${\mathrm{\beta }}_{i}=E\left(Y|X=i\right)$, and ${\mathrm{\kappa }}_{i}^{-1}=\phantom{\rule{1pt}{0ex}}\mathrm{v}\mathrm{a}\mathrm{r}\phantom{\rule{1pt}{0ex}}\left(Y|X=i\right)$, for $i=1,\dots ,k$. Exposure misclassification is manifested in terms of ${X}^{\ast }\in \left\{1,\dots ,k\right\}$ being observed, rather than X itself. Nondifferential misclassification is assumed throughout this paper, whereby ${X}^{\ast }$ and Y are conditionally independent given X. Thus, the misclassification is described by a single $k×k$ classification matrix P, with entries ${p}_{ij}=\phantom{\rule{1pt}{0ex}}\mathrm{p}\mathrm{r}\phantom{\rule{1pt}{0ex}}\left({X}^{\ast }=j|X=i\right)$.

A given P induces a joint distribution of $\left({X}^{\ast },Y\right)$ from the joint distribution of $\left(X,Y\right)$. The induced distribution can be described in similar terms as the original. For instance, ${\mathrm{\alpha }}_{i}^{\ast }=\phantom{\rule{1pt}{0ex}}\mathrm{p}\mathrm{r}\phantom{\rule{1pt}{0ex}}\left({X}^{\ast }=i\right)$ is given by ${\mathrm{\alpha }}^{\ast }={P}^{T}\mathrm{\alpha }.$

Similarly, ${\mathrm{\beta }}_{i}^{\ast }=E\left(Y|{X}^{\ast }=i\right)$ is determined by ${\mathrm{\beta }}^{\ast }=Q\left(\mathrm{\alpha }\right)\mathrm{\beta },$

where ${Q}_{ij}\left(\mathrm{\alpha }\right)=\phantom{\rule{1pt}{0ex}}\mathrm{p}\mathrm{r}\phantom{\rule{1pt}{0ex}}\left(X=j|{X}^{\ast }=i\right)$, hence, $Q\left(\mathrm{\alpha }\right)=\left\{\phantom{\rule{1pt}{0ex}}\mathrm{d}\mathrm{i}\mathrm{a}\mathrm{g}\phantom{\rule{1pt}{0ex}}\left({P}^{T}\mathrm{\alpha }\right){\right\}}^{-1}{P}^{T}\phantom{\rule{1pt}{0ex}}\mathrm{d}\mathrm{i}\mathrm{a}\mathrm{g}\phantom{\rule{1pt}{0ex}}\left(\mathrm{\alpha }\right).$

The variance of $\left(Y|{X}^{\ast }\right)$ is $\begin{array}{rl}{\left({\mathrm{\kappa }}_{i}^{\ast }\right)}^{-1}= & \mathrm{var}\left(Y|{X}^{\ast }=i\right)\\ = & E\left\{\mathrm{var}\left(Y|X\right)|{X}^{\ast }=i\right\}+\mathrm{var}\left\{E\left(Y|X\right)|{X}^{\ast }=i\right\}\\ = & \sum _{j=1}^{k}{\mathrm{\kappa }}_{j}^{-1}{Q}_{ij}+\sum _{j=1}^{k}{\mathrm{\beta }}_{j}^{2}{Q}_{ij}\left(1-{Q}_{ij}\right)-2\sum _{a<b}{\mathrm{\beta }}_{a}{\mathrm{\beta }}_{b}{Q}_{ia}{Q}_{ib},\end{array}$

or, more succinctly, $\left(\begin{array}{c}{\left({\mathrm{\kappa }}_{1}^{\ast }\right)}^{-1}\\ \dots \\ {\left({\mathrm{\kappa }}_{k}^{\ast }\right)}^{-1}\end{array}\right)=Q\left(\mathrm{\alpha }\right)\left\{\left(\begin{array}{c}{\mathrm{\kappa }}_{1}^{-1}\\ \dots \\ {\mathrm{\kappa }}_{k}^{-1}\end{array}\right)+\phantom{\rule{1pt}{0ex}}\mathrm{d}\mathrm{i}\mathrm{a}\mathrm{g}\phantom{\rule{1pt}{0ex}}\left(\mathrm{\beta }\right)\mathrm{\beta }\right\}-\phantom{\rule{1pt}{0ex}}\mathrm{d}\mathrm{i}\mathrm{a}\mathrm{g}\phantom{\rule{1pt}{0ex}}\left\{Q\left(\mathrm{\alpha }\right)\mathrm{\beta }\right\}Q\left(\mathrm{\alpha }\right)\mathrm{\beta }.$

Thus, $\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)$, which describes the joint distribution of $\left(X,Y\right)$, maps to $\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)$, which describes the joint distribution of $\left({X}^{\ast },Y\right)$. As a minor comment, if we assume normality of $\left(Y|X\right)$, then $\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)$ completely characterizes the joint distribution of $\left(X,Y\right)$. However, it does not follow that $\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)$ completely characterizes the joint distribution of $\left({X}^{\ast },Y\right)$, as normality of $\left(Y|X\right)$ induces a mixture of normal distributions for $\left(Y|{X}^{\ast }\right)$.
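The mapping from $\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)$ to $\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)$ can be sketched numerically. The following is a minimal illustration in Python with NumPy, not code from the paper; the function name `induced_parameters` and the three-category example values are assumptions chosen for illustration.

```python
import numpy as np

def induced_parameters(alpha, beta, kappa, P):
    """Return (alpha*, beta*, kappa*, Q) for (X*, Y) given the (X, Y) parameters.

    alpha[i] = pr(X = i), beta[i] = E(Y | X = i), 1 / kappa[i] = var(Y | X = i),
    P[i, j] = pr(X* = j | X = i).
    """
    alpha, beta, kappa, P = (np.asarray(v, dtype=float)
                             for v in (alpha, beta, kappa, P))
    alpha_star = P.T @ alpha                      # alpha* = P^T alpha
    # Q[i, j] = pr(X = j | X* = i) = p_ji * alpha_j / alpha*_i
    Q = (P.T * alpha) / alpha_star[:, None]
    beta_star = Q @ beta                          # beta* = Q(alpha) beta
    # var(Y | X* = i) = Q {kappa^{-1} + beta^2} - (Q beta)^2, elementwise
    var_star = Q @ (1.0 / kappa + beta**2) - beta_star**2
    return alpha_star, beta_star, 1.0 / var_star, Q

# Hypothetical three-category example (values are illustrative only).
alpha = np.array([0.5, 0.3, 0.2])
beta = np.array([0.0, 0.5, 1.0])
kappa = np.ones(3)
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
alpha_star, beta_star, kappa_star, Q = induced_parameters(alpha, beta, kappa, P)
print(alpha_star, beta_star, kappa_star)
```

In this sketch each ${\mathrm{\kappa }}_{i}^{\ast }$ is no larger than the common $\mathrm{\kappa }$, reflecting the extra between-category variance that misclassification folds into $\mathrm{var}\left(Y|{X}^{\ast }\right)$.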

Generally, we expect less extreme exposure misclassification to result in less damage, when inferences based on the $\left({X}^{\ast },Y\right)$ data are interpreted directly as applying to the $\left(X,Y\right)$ relationship. To quantify the extent to which the exposure classification is well behaved, P is termed monotone if, for each i, ${p}_{i,i+\mathrm{\delta }}$ and ${p}_{i,i-\mathrm{\delta }}$ are both decreasing in $\mathrm{\delta }$ for $\mathrm{\delta }\ge 0$. That is, for each i, the distribution of $\left({X}^{\ast }|X=i\right)$ is unimodal with mode at ${X}^{\ast }=i$. Thus, under monotonicity, the further a classification is from the true category, the less likely it is. A non-monotone P corresponds to rather pathological misclassification.

Following Ogburn and Vanderweele (2013), if both P and ${P}^{T}$ are monotone, then P is said to be tapered, i.e., the classification probabilities taper off as one moves away vertically or horizontally from the diagonal of P. Hence, tapered misclassification corresponds to better behavior than merely monotone misclassification. We can also consider whether P is a banded matrix. For instance, a tridiagonal P rules out classifications that are off by more than one category. In qualitative terms, a P which is both tapered and tridiagonal might be regarded as the least threatening form of exposure misclassification. Another nice property for P to possess is ${min}_{i}\phantom{\rule{thinmathspace}{0ex}}{p}_{ii}>0.5$, i.e., for all exposure levels correct classification is more likely than incorrect classification.
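These structural properties are straightforward to check for a given matrix. The sketch below is illustrative only: the function names are hypothetical, and "decreasing" is read as non-increasing (an assumption, needed so that the zeros of a banded matrix do not violate monotonicity).

```python
import numpy as np

def is_monotone(P):
    """Each row is non-increasing moving away from the diagonal, left and right."""
    P = np.asarray(P, dtype=float)
    for i in range(P.shape[0]):
        right = P[i, i:]       # p_ii, p_{i,i+1}, ...
        left = P[i, i::-1]     # p_ii, p_{i,i-1}, ...
        if np.any(np.diff(right) > 0) or np.any(np.diff(left) > 0):
            return False
    return True

def is_tapered(P):
    """Both P and its transpose are monotone."""
    P = np.asarray(P, dtype=float)
    return is_monotone(P) and is_monotone(P.T)

def is_tridiagonal(P):
    """No mass on classifications that are off by more than one category."""
    P = np.asarray(P, dtype=float)
    i, j = np.indices(P.shape)
    return bool(np.all(P[np.abs(i - j) > 1] == 0))

def min_correct(P):
    """Smallest probability of correct classification, min_i p_ii."""
    return float(np.min(np.diag(np.asarray(P, dtype=float))))

# Hypothetical examples: the first is tapered, the second monotone but not tapered
# (its middle column has off-diagonal entries exceeding the diagonal entry).
P_tapered = np.array([[0.8, 0.2, 0.0],
                      [0.1, 0.8, 0.1],
                      [0.0, 0.2, 0.8]])
P_monotone_only = np.array([[0.55, 0.45, 0.00],
                            [0.30, 0.40, 0.30],
                            [0.00, 0.45, 0.55]])
print(is_tapered(P_tapered), is_monotone(P_monotone_only), is_tapered(P_monotone_only))
```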

## 3 Hypothesis testing

Say that interest is focused on testing a null hypothesis of the form ${H}_{0}:C\mathrm{\beta }=0$, for an appropriate $q×k$ matrix C, against a general alternative. If correctly classified data are available, in the form of n independent and identically distributed realizations of $\left(X,Y\right)$, then a test can proceed as follows. Let $Z=\left({Z}_{1},\dots ,{Z}_{k}{\right)}^{T}$ be the vector of response means for the k exposure groups, such that if ${n}_{i}$ of the n subjects have $X=i$, then, at least approximately, $Z\sim {N}_{k}\left(\mathrm{\beta },\phantom{\rule{1pt}{0ex}}\mathrm{d}\mathrm{i}\mathrm{a}\mathrm{g}\phantom{\rule{1pt}{0ex}}\left(\mathrm{\nu }\right)\right),$[1]

where ${\mathrm{\nu }}_{i}={n}_{i}^{-1}{\mathrm{\kappa }}_{i}^{-1}$. For the sake of clear exposition, we proceed on the basis that $\mathrm{\kappa }$, and hence $\mathrm{\nu }$, is known. In practice $\mathrm{\kappa }$ would be estimated, either directly on the basis of within-group variation in responses (e.g., if Y is continuous) or on the basis of a postulated mean–variance relationship.

Starting from eq. [1], standard linear model theory gives the test statistic $T={Z}^{T}{C}^{T}{\left\{C\phantom{\rule{1pt}{0ex}}\mathrm{d}\mathrm{i}\mathrm{a}\mathrm{g}\phantom{\rule{1pt}{0ex}}\left(\mathrm{\nu }\right){C}^{T}\right\}}^{-1}CZ$[2]

as central ${\mathrm{\chi }}_{q}^{2}$ distributed under the null, but noncentral ${\mathrm{\chi }}_{q}^{2}$ distributed under the alternative, with noncentrality parameter ${\mathrm{\beta }}^{T}{C}^{T}\left(C\phantom{\rule{1pt}{0ex}}\mathrm{d}\mathrm{i}\mathrm{a}\mathrm{g}\phantom{\rule{1pt}{0ex}}\left(\mathrm{\nu }\right){C}^{T}{\right)}^{-1}C\mathrm{\beta }$. We gain insight by replacing the realized group sizes ${n}_{i}$ with the expected group sizes $n{\mathrm{\alpha }}_{i}$, for $i=1,\dots ,k$. Then the noncentrality parameter becomes $ng\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)$, where $g\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)={\mathrm{\beta }}^{T}{C}^{T}{\left\{CD\left(\mathrm{\alpha },\mathrm{\kappa }\right){C}^{T}\right\}}^{-1}C\mathrm{\beta },$[3]

in which $D\left(\mathrm{\alpha },\mathrm{\kappa }\right)$ is the diagonal matrix with entries ${D}_{ii}\left(\mathrm{\alpha },\mathrm{\kappa }\right)={\mathrm{\alpha }}_{i}^{-1}{\mathrm{\kappa }}_{i}^{-1}$. With the form [3] in hand, the question of whether or not power is lost as a result of misclassification is simply abstracted as the question of whether or not $g\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)\phantom{\rule{thinmathspace}{0ex}}<\phantom{\rule{thinmathspace}{0ex}}g\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)$.
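As a concrete illustration of eq. [3], the per-observation noncentrality $g$ can be evaluated for any contrast matrix C. This is a sketch under stated assumptions: the function names are hypothetical, and the two contrast constructors anticipate the general and trend tests treated in the sections that follow.

```python
import numpy as np

def g(alpha, beta, kappa, C):
    """Per-observation noncentrality: beta^T C^T {C D C^T}^{-1} C beta,
    with D = diag(1 / (alpha_i * kappa_i))."""
    alpha, beta, kappa = (np.asarray(v, dtype=float) for v in (alpha, beta, kappa))
    C = np.atleast_2d(np.asarray(C, dtype=float))
    D = np.diag(1.0 / (alpha * kappa))
    Cb = C @ beta
    return float(Cb @ np.linalg.solve(C @ D @ C.T, Cb))

def general_contrast(k):
    """C = (1_{k-1}, -I_{k-1}): null of equal means across all k levels."""
    return np.hstack([np.ones((k - 1, 1)), -np.eye(k - 1)])

def trend_contrast(k):
    """C = (1-k, 3-k, ..., k-3, k-1): linear trend contrast."""
    return 2.0 * np.arange(1, k + 1) - (k + 1)

# Hypothetical example: uniform exposure, exactly linear means, unit variances.
alpha = np.full(3, 1 / 3)
beta = np.array([0.0, 0.5, 1.0])
kappa = np.ones(3)
g_general = g(alpha, beta, kappa, general_contrast(3))
g_trend = g(alpha, beta, kappa, trend_contrast(3))
print(g_general, g_trend)
```

With exactly linear means the two values coincide in this example, since all of the departure from the null lies along the linear contrast; the noncentrality parameter for a sample of size n is then $ng$.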

## 4 General test for association

A general test for exposure–disease association would focus on the null hypothesis ${\mathrm{\beta }}_{1}={\mathrm{\beta }}_{2}=\cdots ={\mathrm{\beta }}_{k}$. This null, concerning the distribution of $\left(X,Y\right)$, implies the corresponding null for $\left({X}^{\ast },Y\right)$, i.e., ${\mathrm{\beta }}_{1}^{\ast }=\cdots ={\mathrm{\beta }}_{k}^{\ast }$. In the $k=2$ case of binary exposure, it has been known since at least Bross (1954) that nondifferential misclassification induces power loss. However, the literature on misclassification with $k>2$ exposure categories has focused more on estimation than on testing [see, for instance, Dosemeci et al. (1990); Weinberg et al. (1994)]. In the present context, at least for the homoscedastic case that $\phantom{\rule{1pt}{0ex}}\mathrm{v}\mathrm{a}\mathrm{r}\phantom{\rule{1pt}{0ex}}\left(Y|X\right)$ is constant, we can indeed confirm that power is lost as a result of having misclassified data rather than correctly classified data. The following result applies for any misclassification matrix P. In fact, the proof makes no use of the fact that X is ordered, so the result applies to categorical X more generally.

Theorem 1. Consider $C=\left({1}_{k-1}\phantom{\rule{thickmathspace}{0ex}}-{I}_{k-1}\right)$, corresponding to the null hypothesis of equal mean outcomes for all exposure levels. In the homoscedastic situation of equal variances (${\mathrm{\kappa }}_{1}={\mathrm{\kappa }}_{2}=\cdots ={\mathrm{\kappa }}_{k}$), $g\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)\le g\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right).$

That is, nondifferential misclassification cannot increase the power for detecting a relationship between mean outcome and exposure level.

Proof. Direct calculation gives $g\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)=\sum _{i=1}^{k}{\mathrm{\alpha }}_{i}{\mathrm{\kappa }}_{i}{\left({\mathrm{\beta }}_{i}-\stackrel{˜}{\mathrm{\beta }}\right)}^{2},$

with $\stackrel{˜}{\mathrm{\beta }}=\left({\sum }_{i=1}^{k}{\mathrm{\alpha }}_{i}{\mathrm{\kappa }}_{i}{\mathrm{\beta }}_{i}\right)/\left({\sum }_{i=1}^{k}{\mathrm{\alpha }}_{i}{\mathrm{\kappa }}_{i}\right)$. Consequently, $\begin{array}{rl}g\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)=\phantom{\rule{thinmathspace}{0ex}}& \sum _{i=1}^{k}{\mathrm{\alpha }}_{i}^{\ast }{\mathrm{\kappa }}_{i}^{\ast }{\left({\mathrm{\beta }}_{i}^{\ast }-{\stackrel{˜}{\mathrm{\beta }}}^{\ast }\right)}^{2}\\ =\phantom{\rule{thinmathspace}{0ex}}& \underset{s}{min}\sum _{i=1}^{k}{\mathrm{\alpha }}_{i}^{\ast }{\mathrm{\kappa }}_{i}^{\ast }\left({\mathrm{\beta }}_{i}^{\ast }-s{\right)}^{2}\\ \le \phantom{\rule{thinmathspace}{0ex}}& {\mathrm{\kappa }}_{1}\underset{s}{min}\sum _{i=1}^{k}{\mathrm{\alpha }}_{i}^{\ast }\left({\mathrm{\beta }}_{i}^{\ast }-s{\right)}^{2}\\ =\phantom{\rule{thinmathspace}{0ex}}& {\mathrm{\kappa }}_{1}\underset{s}{min}\sum _{i=1}^{k}\left(\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{p}_{ji}\right){\left(\frac{\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}{p}_{ji}}{\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{p}_{ji}}-s\right)}^{2}\\ =\phantom{\rule{thinmathspace}{0ex}}& {\mathrm{\kappa }}_{1}\underset{s}{min}\left[\sum _{i=1}^{k}\frac{{\left(\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}{p}_{ji}\right)}^{2}}{\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{p}_{ji}}-2s\sum _{i=1}^{k}\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}{p}_{ji}+{s}^{2}\sum _{i=1}^{k}\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{p}_{ji}\right]\\ =\phantom{\rule{thinmathspace}{0ex}}& {\mathrm{\kappa }}_{1}\underset{s}{min}\left[\left\{\sum _{i=1}^{k}\frac{{\left(\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}{p}_{ji}\right)}^{2}}{\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{p}_{ji}}\right\}-2s\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}+{s}^{2}\right].\end{array}$

Now, by the Cauchy–Schwarz inequality, ${\left(\frac{{\sum }_{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}{p}_{ji}}{{\sum }_{j=1}^{k}{\mathrm{\alpha }}_{j}{p}_{ji}}\right)}^{2}\le \frac{{\sum }_{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}^{2}{p}_{ji}}{{\sum }_{j=1}^{k}{\mathrm{\alpha }}_{j}{p}_{ji}}.$

Hence, $\begin{array}{rl}g\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)\le \phantom{\rule{thinmathspace}{0ex}}& {\mathrm{\kappa }}_{1}\underset{s}{min}\left(\sum _{i=1}^{k}\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}^{2}{p}_{ji}-2s\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}+{s}^{2}\right)\\ =\phantom{\rule{thinmathspace}{0ex}}& {\mathrm{\kappa }}_{1}\underset{s}{min}\left(\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}^{2}-2s\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\beta }}_{j}+{s}^{2}\right)\\ =\phantom{\rule{thinmathspace}{0ex}}& \underset{s}{min}\sum _{j=1}^{k}{\mathrm{\alpha }}_{j}{\mathrm{\kappa }}_{j}\left({\mathrm{\beta }}_{j}-s{\right)}^{2}\\ =\phantom{\rule{thinmathspace}{0ex}}& g\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right),\end{array}$

as claimed.
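A numerical spot check of Theorem 1 is easy to run, though it is of course no substitute for the proof. The sketch below draws arbitrary classification matrices and homoscedastic parameters and records the largest observed ratio $g\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)/g\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)$. The sampling distributions, seed, and function names are assumptions made purely for illustration.

```python
import numpy as np

def g_general(alpha, beta, kappa):
    """Closed form from the proof: sum_i alpha_i kappa_i (beta_i - beta_tilde)^2."""
    w = alpha * kappa
    beta_tilde = np.sum(w * beta) / np.sum(w)
    return np.sum(w * (beta - beta_tilde) ** 2)

def induced(alpha, beta, kappa, P):
    """(alpha*, beta*, kappa*) implied by classification matrix P."""
    alpha_star = P.T @ alpha
    Q = (P.T * alpha) / alpha_star[:, None]
    beta_star = Q @ beta
    var_star = Q @ (1.0 / kappa + beta**2) - beta_star**2
    return alpha_star, beta_star, 1.0 / var_star

rng = np.random.default_rng(0)
k = 6
worst_ratio = 0.0
for _ in range(2000):
    alpha = rng.dirichlet(np.ones(k))            # arbitrary exposure distribution
    beta = rng.normal(size=k)                    # arbitrary, not necessarily ordered
    kappa = np.full(k, rng.uniform(0.5, 2.0))    # homoscedastic: equal kappa_i
    P = rng.dirichlet(np.ones(k), size=k)        # arbitrary rows summing to one
    ratio = g_general(*induced(alpha, beta, kappa, P)) / g_general(alpha, beta, kappa)
    worst_ratio = max(worst_ratio, ratio)

print(worst_ratio)  # should not exceed 1, up to floating-point error
```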

## 5 Test for trend across categories

As will be demonstrated momentarily, a version of Theorem 1 does not hold for the linear trend test. This test, versions of which were first proposed by Cochran (1954) and Armitage (1955), uses the linear contrast $C=\left(1-k,3-k,\dots ,k-3,k-1\right)$ to define the linear component of $E\left(Y|X\right)$. A direct interpretation of the test is that the null $C\mathrm{\beta }=0$ holds if and only if the least-squares slope when $\left({\mathrm{\beta }}_{1},\dots ,{\mathrm{\beta }}_{k}\right)$ is regressed upon $\left(1,\dots ,k\right)$ is zero.
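The stated equivalence between $C\mathrm{\beta }=0$ and a zero least-squares slope follows because ${C}_{i}=2i-\left(k+1\right)$ is twice the centered score, so that $C\mathrm{\beta }$ equals the slope multiplied by $k\left({k}^{2}-1\right)/6$. A brief numerical check, using a hypothetical $\mathrm{\beta }$ chosen only for illustration:

```python
import numpy as np

k = 6
scores = np.arange(1, k + 1)
C = 2 * scores - (k + 1)            # (1-k, 3-k, ..., k-3, k-1)

# Hypothetical increasing, non-linear mean pattern (illustrative values only).
beta = np.array([0.0, 0.1, 0.3, 0.35, 0.8, 1.0])

slope = np.polyfit(scores, beta, 1)[0]   # least-squares slope of beta on 1..k
lhs = C @ beta
rhs = slope * k * (k**2 - 1) / 6         # 2 * sum_i (i - mean)^2 times the slope
print(lhs, rhs)
```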

To gain a general sense of how this test is affected by misclassification, we compare $g\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)$ to $g\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)$ for a large ensemble of $\mathrm{\alpha }$, $\mathrm{\beta }$, and P values. Specifically, we fix $k=6$ exposure categories and also fix $\mathrm{\kappa }=\left(1,\dots ,1\right)$. Then values of $\left(\mathrm{\alpha },\mathrm{\beta },P\right)$ are drawn from 12 different probability distributions. The first three distributions arise from fixing $\mathrm{\alpha }=\left(1/6,\dots ,1/6\right)$, treating a uniform distribution for X as a simple but important special case. A distribution on $\mathrm{\beta }$ is assigned by fixing ${\mathrm{\beta }}_{1}=0$ without loss of generality, and then taking the increments $\left({\mathrm{\beta }}_{2}-{\mathrm{\beta }}_{1},\dots ,{\mathrm{\beta }}_{k}-{\mathrm{\beta }}_{k-1}\right)$ to be distributed as $\mathrm{D}\mathrm{i}\mathrm{r}\mathrm{i}\mathrm{c}\mathrm{h}\mathrm{l}\mathrm{e}\mathrm{t}\phantom{\rule{1pt}{0ex}}\left(c,\dots ,c\right)$, so that $\mathrm{\beta }$ is increasing and ${\mathrm{\beta }}_{k}=1$ is fixed. In our experiments we take $c=2$, which allows the possibility of far from linear patterns (as c increases the distribution puts more weight on relationships which are closer to linear). The three distributions are then based on sampling P from a specified distribution over tridiagonal classification matrices, and conditioning on either (i) P being non-monotone, (ii) P being monotone but not tapered, or (iii) P being tapered. 
The distribution on P generates tridiagonal matrices by independently drawing $\left({p}_{11},{p}_{12}\right)\sim \phantom{\rule{1pt}{0ex}}\mathrm{b}\mathrm{e}\mathrm{t}\mathrm{a}\phantom{\rule{1pt}{0ex}}\left(a,b\right)$, $\left({p}_{k,k-1},{p}_{kk}\right)\sim \phantom{\rule{1pt}{0ex}}\mathrm{b}\mathrm{e}\mathrm{t}\mathrm{a}\phantom{\rule{1pt}{0ex}}\left(b,a\right)$, and $\left({p}_{i,i-1},{p}_{ii},{p}_{i,i+1}\right)\sim \phantom{\rule{1pt}{0ex}}\mathrm{D}\mathrm{i}\mathrm{r}\mathrm{i}\mathrm{c}\mathrm{h}\mathrm{l}\mathrm{e}\mathrm{t}\phantom{\rule{1pt}{0ex}}\left(b/2,a,b/2\right)$, for $i=2,\dots ,k-1$. We use $\left(a,b\right)=\left(20,10\right)$, giving a mean of 0.667 and a standard deviation of 0.085 for each diagonal element of P, before conditioning. This ensures that a wide range of classification matrices is being generated, with the typical extent of misclassification being quite considerable.
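The sampling scheme for $\mathrm{\beta }$ and tridiagonal P can be sketched as follows. This is an illustrative reading of the description above rather than the authors' code: the function names and seed are hypothetical, and the conditioning on matrix type (which could be done by rejection sampling) is not shown.

```python
import numpy as np

def sample_beta(k, c, rng):
    """beta_1 = 0 and Dirichlet(c, ..., c) increments, so beta is increasing with beta_k = 1."""
    increments = rng.dirichlet(np.full(k - 1, c))
    return np.concatenate([[0.0], np.cumsum(increments)])

def sample_tridiagonal_P(k, a, b, rng):
    """Tridiagonal classification matrix with beta(a, b)-distributed diagonal entries."""
    P = np.zeros((k, k))
    p11 = rng.beta(a, b)                      # (p_11, p_12) ~ beta(a, b)
    P[0, 0], P[0, 1] = p11, 1.0 - p11
    pkk = rng.beta(a, b)                      # (p_{k,k-1}, p_kk) ~ beta(b, a)
    P[k - 1, k - 2], P[k - 1, k - 1] = 1.0 - pkk, pkk
    for i in range(1, k - 1):                 # interior rows ~ Dirichlet(b/2, a, b/2)
        P[i, i - 1:i + 2] = rng.dirichlet([b / 2, a, b / 2])
    return P

rng = np.random.default_rng(1)
k, a, b, c = 6, 20, 10, 2
beta = sample_beta(k, c, rng)
P = sample_tridiagonal_P(k, a, b, rng)

# Analytic mean and standard deviation of a beta(a, b) diagonal entry.
diag_mean = a / (a + b)
diag_sd = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
print(diag_mean, diag_sd)
```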

The next three distributions arise exactly as per the first three, except $\mathrm{\alpha }$ is now also taken as random, with $\mathrm{\alpha }\sim \phantom{\rule{1pt}{0ex}}\mathrm{D}\mathrm{i}\mathrm{r}\mathrm{i}\mathrm{c}\mathrm{h}\mathrm{l}\mathrm{e}\mathrm{t}\phantom{\rule{1pt}{0ex}}\left(d,\dots ,d\right)$, so that each ${\mathrm{\alpha }}_{i}$ has mean $1/6$. This allows departures from X being uniformly distributed (which would correspond to $d\to \mathrm{\infty }$). We wish to take d small enough to engender substantial departures from uniformity, but not so small as to yield distributions that place almost no mass on one or more values. Thus we set $d=8$, which induces a standard deviation of 0.053 for each ${\mathrm{\alpha }}_{i}$.

Results for $\left(\mathrm{\alpha },\mathrm{\beta },P\right)$ arising from these six distributions appear in Figure 1. For both the general test and the trend test, results are given in terms of relative power, defined as the power achieved from misclassified data $\left({X}^{\ast },Y\right)$ at the sample size for which ideal data $\left(X,Y\right)$ yield 80% power. That is, if ${F}_{q}\left(\cdot ;t\right)$ denotes the distribution function for the noncentral ${\mathrm{\chi }}_{q}^{2}$ distribution with noncentrality t, then the relative power is $RP=1-{F}_{q}\left\{{F}_{q}^{-1}\left(0.95;0\right);{t}_{q}\frac{g\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)}{g\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)}\right\},$ where ${t}_{q}$ solves ${F}_{q}\left\{{F}_{q}^{-1}\left(0.95;0\right);t\right\}=0.2$. For the general test, Theorem 1 guarantees that the relative power cannot exceed 80%. Figure 1 shows contrary behavior for the trend test. At least when the distribution of X can depart from uniformity, there are values of P (of all three types) and $\mathrm{\alpha }$ and $\mathrm{\beta }$ for which misclassification induces power gain. Of course these results are based on first-order asymptotic theory; in Appendix A we present a small simulation indicating that the theory does indeed translate to the finite-sample case. Also, while our simulated scenarios never produced a power gain when X is uniformly distributed, in Appendix B we exhibit a $\mathrm{\beta }$ and a tridiagonal, tapered P such that there is power gain in this case.
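The relative power defined above can be computed with SciPy's central and noncentral chi-square distributions. The following is a sketch, not the authors' implementation; the function name, argument names, and root-finding bracket are assumptions, and `ratio` stands for $g\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)/g\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)$, assumed strictly positive.

```python
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def relative_power(ratio, q=1, level=0.05, target=0.80):
    """Power from misclassified data at the sample size giving `target` power with ideal data."""
    crit = chi2.ppf(1.0 - level, q)                 # F_q^{-1}(0.95; 0)

    def power(t):
        return ncx2.sf(crit, q, t)                  # 1 - F_q(crit; t)

    # t_q solves F_q(crit; t) = 1 - target; the bracket is an assumption that
    # comfortably contains the root for small q.
    t_q = brentq(lambda t: power(t) - target, 1e-8, 100.0)
    return power(t_q * ratio)

rp_same = relative_power(1.0)    # no change in noncentrality
rp_loss = relative_power(0.5)    # misclassification halves g
rp_gain = relative_power(1.2)    # misclassification inflates g by 20%
print(rp_same, rp_loss, rp_gain)
```

Here `q=1` corresponds to the trend test, while the general test with k categories would use `q=k-1`.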

Figure 1

Relative power for the trend test and for the general test, with tridiagonal misclassification matrices. The panels give 1,000 realizations from each of the six distributions over $\left(\mathrm{\alpha },\mathrm{\beta },P\right)$ described in the text, with $\mathrm{\alpha }$ fixed (random) in the upper (lower) panels. Quartiles of relative power are indicated (diamonds), and the reference lines correspond to 80% relative power

The remaining six distributions for $\left(\mathrm{\alpha },\mathrm{\beta },P\right)$ are constructed as per the first six described above, but with the underlying distribution of P modified so as to generate non-banded matrices. This is achieved by taking the rows of P to be independently distributed as Dirichlet over all k categories. To mimic the earlier construction, for each row the parameters of the Dirichlet distribution are taken to be a for the correct category and to sum to b over the $k-1$ incorrect categories, implying the same $\phantom{\rule{1pt}{0ex}}\mathrm{b}\mathrm{e}\mathrm{t}\mathrm{a}\phantom{\rule{1pt}{0ex}}\left(a,b\right)$ distribution for diagonal elements as previously. For all but the first and last rows, sub-sums of $b/2$ are forced, both below and above the correct category. The specification is completed by fixing a value of r such that the Dirichlet parameters for a given row decay geometrically by a factor of r as we move further away from the correct category. We take $\left(a,b,r\right)=\left(20,10,0.25\right)$ in generating the results of Figure 2. Qualitatively, we see very similar behavior as before with tridiagonal misclassification.
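One way to realize the non-banded construction is sketched below. It is an interpretation of the description above, not the authors' code: the function names and seed are hypothetical, and the handling of the first and last rows (placing the full mass b on the single available side) is an assumption consistent with the diagonal entries following a beta(a, b) distribution.

```python
import numpy as np

def dirichlet_row_params(i, k, a, b, r):
    """Dirichlet parameters for row i (0-based): a on the correct category,
    geometrically decaying weights (factor r) on incorrect categories."""
    params = np.zeros(k)
    params[i] = a
    below = r ** np.arange(i - 1, -1, -1, dtype=float)   # columns 0..i-1
    above = r ** np.arange(0, k - i - 1, dtype=float)    # columns i+1..k-1
    # Interior rows split b equally between the two sides; edge rows use one side.
    side_total = b / 2 if (below.size and above.size) else b
    if below.size:
        params[:i] = side_total * below / below.sum()
    if above.size:
        params[i + 1:] = side_total * above / above.sum()
    return params

def sample_nonbanded_P(k, a, b, r, rng):
    """Rows drawn independently from Dirichlet distributions over all k categories."""
    return np.vstack([rng.dirichlet(dirichlet_row_params(i, k, a, b, r))
                      for i in range(k)])

rng = np.random.default_rng(2)
k, a, b, r = 6, 20, 10, 0.25
P = sample_nonbanded_P(k, a, b, r, rng)
params_row3 = dirichlet_row_params(2, k, a, b, r)
print(params_row3)
```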

Figure 2

Relative power for the trend test and for the general test, with non-banded misclassification matrices. The panels give 1,000 realizations from each of the six distributions over $\left(\mathrm{\alpha },\mathrm{\beta },P\right)$ described in the text, with $\mathrm{\alpha }$ fixed (random) in the upper (lower) panels. Quartiles of relative power are indicated (diamonds), and the reference lines correspond to 80% relative power

To seek further insight into when misclassification increases power, we focus on the situation of non-uniform X distributions and tapered misclassification matrices, i.e., the settings underlying the lower right panels of both Figures 1 and 2. For these simulated values, in Figure 3 we first plot the relative power of the trend test against the absolute ratio of $C{\mathrm{\beta }}^{\ast }$ to $C\mathrm{\beta }$. Values of this ratio larger than one correspond to the misclassification inflating the slope which summarizes the linear component of the exposure–disease relationship. In situations involving linear regression of a continuous outcome on a continuous exposure, nondifferential exposure misclassification is quite widely guaranteed to attenuate the regression slope being estimated [see, for instance, Gustafson (2004); Carroll et al. (2006)]. Figure 3 shows that such a guarantee does not apply to the present situation, but it is perhaps not surprising that attenuation is much more common than inflation across the simulated scenarios. What is more surprising is that slope inflation is neither a necessary condition nor a sufficient condition for power gain. That is, in some scenarios the misclassification inflates the slope yet power is lost for detecting that the slope is not zero. And in some situations, the misclassification attenuates the slope, yet power is gained for detecting that the slope is not zero. So the behavior of the slopes alone is not determining whether power is lost or gained.

Next in Figure 3 we plot the relative power against ${min}_{i}\phantom{\rule{thinmathspace}{0ex}}{p}_{ii}$, to see whether the worst probability of correct classification across exposure categories is a driving force behind the power of the test based on misclassified data. The plots provide a negative answer here, indicating very little association, if any, between this characteristic of the classification matrix and the relative power.

Finally, we consider the extent to which misclassification pushes the exposure distribution toward or away from uniformity. We define the log sum-of-squares ratio (LSSR) as $\mathrm{L}\mathrm{S}\mathrm{S}\mathrm{R}=log\frac{{\sum }_{i=1}^{k}{\left({\mathrm{\alpha }}_{i}^{\ast }-{k}^{-1}\right)}^{2}}{{\sum }_{i=1}^{k}{\left({\mathrm{\alpha }}_{i}-{k}^{-1}\right)}^{2}}.$ So, for instance, a large positive value of LSSR describes a case where the misclassification induces a distribution of ${X}^{\ast }$ that is much further from uniform than the distribution of X. Figure 3 exhibits quite strong negative associations between relative power and LSSR, in both the tridiagonal and non-banded cases. Thus scenarios with power gain tend to have misclassification which makes ${X}^{\ast }$ much more uniformly distributed than X. But again this is only a stochastic tendency with respect to the chosen distribution of $\left(\mathrm{\alpha },\mathrm{\beta },P\right)$. A low LSSR is neither a necessary nor sufficient condition for power gain. In general, there seems to be considerable complexity in how the distribution of X, the distribution of $\left(Y|X\right)$, and the misclassification matrix P combine to determine the performance of the trend test applied to misclassified data.
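The LSSR is simple to compute from $\mathrm{\alpha }$ and P. The sketch below uses the $\mathrm{\alpha }$ and P reported in Appendix A as inputs; the function name is hypothetical, and the quantity is undefined when X is exactly uniform, since the denominator is then zero.

```python
import numpy as np

def lssr(alpha, P):
    """Log ratio of squared distance from uniformity, X* versus X."""
    alpha = np.asarray(alpha, dtype=float)
    P = np.asarray(P, dtype=float)
    k = alpha.size
    alpha_star = P.T @ alpha                 # distribution of X*
    return float(np.log(np.sum((alpha_star - 1 / k) ** 2)
                        / np.sum((alpha - 1 / k) ** 2)))

# Values reported in Appendix A (the scenario with the highest relative power).
P = np.array([[0.599, 0.401, 0.000, 0.000, 0.000, 0.000],
              [0.080, 0.745, 0.175, 0.000, 0.000, 0.000],
              [0.000, 0.160, 0.718, 0.122, 0.000, 0.000],
              [0.000, 0.000, 0.302, 0.522, 0.176, 0.000],
              [0.000, 0.000, 0.000, 0.065, 0.606, 0.329],
              [0.000, 0.000, 0.000, 0.000, 0.311, 0.689]])
alpha = np.array([0.242, 0.036, 0.232, 0.099, 0.299, 0.092])
value = lssr(alpha, P)
print(value)
```

For these inputs the LSSR is negative, i.e., the misclassified exposure is closer to uniform than the true exposure, in line with the tendency described above.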

Figure 3

Relative power of the trend test versus (i) absolute ratio of slopes (top panels), (ii) smallest diagonal element of P (middle panels), and (iii) LSSR (bottom panels). The misclassification matrices are tapered throughout. The left (right) panels correspond to tridiagonal (non-banded) misclassification matrices. For both distributions of $\left(\mathrm{\alpha },\mathrm{\beta },P\right)$, 1,000 realizations are simulated

## 6 Discussion

The idea that nondifferential exposure misclassification causes attenuation in estimating exposure–disease relationships is well established in the literature, to the point that concerns are raised about this maxim being quoted in situations beyond its domain of applicability. The related idea that nondifferential exposure misclassification causes power loss for detecting exposure–disease association is also present in the literature, albeit with less emphasis. Of course, the situation is more nuanced with hypothesis testing than with estimation, since typically nondifferential exposure misclassification does not invalidate testing the way it invalidates estimation. That is, treating $\left({X}^{\ast },Y\right)$ “as-is” results in valid hypothesis testing for $\left(X,Y\right)$ association, but biased estimation of the magnitude of this association (when non-null). Consequently, comparing the power of a test using $\left({X}^{\ast },Y\right)$ data to that of a test using $\left(X,Y\right)$ data is a natural quantification of the impact of misclassification.

Just as exceptions to the attenuation maxim have been noted, here we have exhibited a situation where power loss is not guaranteed. In the ordinal exposure setting, nondifferential misclassification is guaranteed to reduce power for the general test of association, but not for the test of linear trend. We have not been able to give a deterministic characterization of the distributions for X and $\left(Y|X\right)$ and the classification matrices P for which power is actually gained via misclassification. Indeed, our numerical results point to these three entities combining in a complicated manner to determine the extent to which power is lost, or perhaps gained. Looking for the parameter settings under which power gain occurs seems somewhat akin to looking for needles in a haystack. Thus, there is no “free lunch” – it would be foolish to deliberately add artificial misclassification to exposure data, in the hope of increasing the power of a given study. Rather, we simply think it important to add the potential for power gain with this test to the extant list of caveats concerning heuristics for exposure measurement error.

Settings where the noisy exposure is much more uniformly distributed than the true exposure are seen to be more prone to exhibiting power gain. Upon reflection, there may be some intuitive sense in this. Generally, a wider distribution of a regressor imbues more precise estimation of a regression slope, and this precision contributes to the power of the association test. However, in the linear regression setting, the benefit of increased variance of ${X}^{\ast }$ compared to X is more than offset by the attenuation in the slope itself and by the increase in residual variance of $\left(Y|{X}^{\ast }\right)$ compared to $\left(Y|X\right)$. In the present setting of the trend test for an ordinal exposure, we still lack a complete understanding of why these offsetting mechanisms are not always in full force.

## Appendix A

In the setting of a tapered and tridiagonal classification matrix, the highest relative power obtained in the simulation study was 92.7%, arising from $P=\left(\begin{array}{cccccc}0.599& 0.401& 0& 0& 0& 0\\ 0.080& 0.745& 0.175& 0& 0& 0\\ 0& 0.160& 0.718& 0.122& 0& 0\\ 0& 0& 0.302& 0.522& 0.176& 0\\ 0& 0& 0& 0.065& 0.606& 0.329\\ 0& 0& 0& 0& 0.311& 0.689\end{array}\right),$ $\mathrm{\alpha }=\left(0.242,0.036,0.232,0.099,0.299,0.092\right)$ and $\mathrm{\beta }=\left(0,0.247,0.586,0.907,1\right)$. Using these settings, we generate 5,000 independent datasets, each consisting of $n=500$ observations on $\left(X,{X}^{\ast },Y\right)$, under both the alternative (with the value of $\mathrm{\beta }$ given above) and under the null (with $\mathrm{\beta }=\left(0,\dots ,0\right)$). Note that while n itself is large, for this choice of $\mathrm{\alpha }$ the expected number of subjects in the $X=2$ group is quite small. The simulation is carried out using ${\mathrm{\kappa }}_{i}^{-1}=\mathrm{var}\left(Y|X\right)={2.2}^{2}$, which is found empirically to yield about 80% power for the trend test using $\left(X,Y\right)$ data. For each simulated dataset, a likelihood-ratio test is implemented using the test statistic [2] compared to the ${\mathrm{\chi }}_{1}^{2}$ distribution, with the standard error for each group mean “plugged in” as if it were known. For simulation under the null, empirical type I error rates are 5.5% for $\left(X,Y\right)$ data and also 5.5% for $\left({X}^{\ast },Y\right)$ data, with a Monte Carlo standard error of 0.3% in each case, i.e., the rates are within simulation error of nominal. For simulation under the alternative, empirical power is 78.5% for $\left(X,Y\right)$ data and 92.3% for $\left({X}^{\ast },Y\right)$ data, with a Monte Carlo standard error of 0.6% for the difference between the two powers. Thus, we have a finite-sample demonstration that the test is valid under the null for both X data and ${X}^{\ast }$ data, but the latter yields higher power under the alternative.
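
Some of the quantities quoted in this appendix can be checked with a few lines of Python. The sketch below is illustrative only and rests on assumptions: `P[i, j]` is read as the probability that ${X}^{\ast}=j+1$ given $X=i+1$ (consistent with every row of P summing to one), and the helper `mc_se` is a hypothetical name for the usual binomial standard error of a simulated rejection rate. It does not implement the trend test itself.

```python
import numpy as np

# Classification matrix and exposure distribution from Appendix A.
# Assumption: P[i, j] = Pr(X* = j+1 | X = i+1); each row sums to one.
P = np.array([
    [0.599, 0.401, 0.000, 0.000, 0.000, 0.000],
    [0.080, 0.745, 0.175, 0.000, 0.000, 0.000],
    [0.000, 0.160, 0.718, 0.122, 0.000, 0.000],
    [0.000, 0.000, 0.302, 0.522, 0.176, 0.000],
    [0.000, 0.000, 0.000, 0.065, 0.606, 0.329],
    [0.000, 0.000, 0.000, 0.000, 0.311, 0.689],
])
alpha = np.array([0.242, 0.036, 0.232, 0.099, 0.299, 0.092])
n, n_sims = 500, 5000

# Marginal distribution of the noisy exposure: alpha*_j = sum_i alpha_i P[i, j].
alpha_star = alpha @ P

# Expected group sizes for the true and the noisy exposure.
expected_counts_x = n * alpha
expected_counts_xstar = n * alpha_star

# Largest departure from the uniform distribution over six categories.
dev_x = np.abs(alpha - 1 / 6).max()
dev_xstar = np.abs(alpha_star - 1 / 6).max()


def mc_se(p_hat, n_sims):
    """Binomial Monte Carlo standard error of an estimated rejection rate."""
    return np.sqrt(p_hat * (1 - p_hat) / n_sims)


# Standard error of the empirical type I error rate of 5.5%.
se_type1 = mc_se(0.055, n_sims)

print("alpha*:", np.round(alpha_star, 3))
print("expected counts (X): ", np.round(expected_counts_x, 1))
print("expected counts (X*):", np.round(expected_counts_xstar, 1))
print("max deviation from uniform:", round(dev_x, 3), "vs", round(dev_xstar, 3))
print("Monte Carlo SE of type I error:", round(se_type1, 4))
```

Under these assumptions the expected size of the $X=2$ group is $500\times 0.036=18$, the distribution of ${X}^{\ast}$ is closer to uniform than that of X, and the binomial standard error for a 5.5% rejection rate over 5,000 replicates rounds to the 0.3% quoted above.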

## Appendix B

With $k=6$, $\mathrm{\alpha }=\left(1/6,\dots ,1/6\right)$, $\mathrm{\beta }=\left(0,0.091,0.455,0.545,0.901,1\right)$, $\mathrm{\kappa }=\left(1,\dots ,1\right)$ and $P=\left(\begin{array}{cccccc}0.9& 0.1& 0& 0& 0& 0\\ 0.25& 0.7& 0.05& 0& 0& 0\\ 0& 0& 1& 0& 0& 0\\ 0& 0& 0& 1& 0& 0\\ 0& 0& 0& 0.05& 0.7& 0.25\\ 0& 0& 0& 0& 0.1& 0.9\end{array}\right),$ we obtain $g\left({\mathrm{\alpha }}^{\ast },{\mathrm{\beta }}^{\ast },{\mathrm{\kappa }}^{\ast }\right)/g\left(\mathrm{\alpha },\mathrm{\beta },\mathrm{\kappa }\right)=1.0022$. Thus with a uniform distribution over X and a tapered, tridiagonal classification matrix it is possible for power to increase upon misclassification.
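
The starred parameters entering this ratio can be obtained from the unstarred ones by elementary probability. The following Python sketch shows one way to do so; it is not the authors' code, and it assumes that `P[i, j]` is the probability that ${X}^{\ast}=j+1$ given $X=i+1$, that $(Y|X=i)$ has mean ${\mathrm{\beta }}_{i}$ and variance $1/{\mathrm{\kappa }}_{i}$, and that ${\mathrm{\kappa }}_{j}^{\ast}$ denotes the reciprocal of the variance of $(Y|{X}^{\ast}=j)$. The function name `induced_parameters` is hypothetical, and the function $g$ is not reproduced here.

```python
import numpy as np

# Appendix B settings.
# Assumption: P[i, j] = Pr(X* = j+1 | X = i+1); each row sums to one.
P = np.array([
    [0.90, 0.10, 0.00, 0.00, 0.00, 0.00],
    [0.25, 0.70, 0.05, 0.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 1.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.05, 0.70, 0.25],
    [0.00, 0.00, 0.00, 0.00, 0.10, 0.90],
])
alpha = np.full(6, 1 / 6)
beta = np.array([0.0, 0.091, 0.455, 0.545, 0.901, 1.0])
kappa = np.ones(6)


def induced_parameters(alpha, beta, kappa, P):
    """Distribution of X* and moments of (Y | X*) under nondifferential error."""
    joint = alpha[:, None] * P           # Pr(X = i, X* = j)
    alpha_star = joint.sum(axis=0)       # Pr(X* = j)
    w = joint / alpha_star               # Pr(X = i | X* = j); columns sum to one
    beta_star = w.T @ beta               # E(Y | X* = j), law of total expectation
    second_moment = w.T @ (1 / kappa + beta ** 2)   # E(Y^2 | X* = j)
    var_star = second_moment - beta_star ** 2       # var(Y | X* = j)
    return alpha_star, beta_star, 1 / var_star


alpha_star, beta_star, kappa_star = induced_parameters(alpha, beta, kappa, P)

print("alpha*:", np.round(alpha_star, 4))
print("beta*: ", np.round(beta_star, 4))
print("kappa*:", np.round(kappa_star, 4))
```

Each group mean of $(Y|{X}^{\ast})$ is a weighted average of the true group means, and the mixing over true categories inflates the within-group variance, so every ${\mathrm{\kappa }}_{j}^{\ast}$ is at most the corresponding common value of $\mathrm{\kappa }$ in this example.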

## References

• Armitage, P. (1955). Tests for linear trends in proportions and frequencies. Biometrics, 11:375–386.

• Bross, I. (1954). Misclassification in 2 × 2 tables. Biometrics, 10:478–486.

• Buonaccorsi, J. P., Laake, P., and Veierød, M. B. (2011). On the power of the Cochran–Armitage test for trend in the presence of misclassification. Statistical Methods in Medical Research.

• Carroll, R. (1997). Surprising effects of measurement error on an aggregate data estimator. Biometrika, 84:231–234.

• Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. 2nd Edition. Boca Raton, FL: Chapman & Hall, CRC Press.

• Cochran, W. G. (1954). Some methods for strengthening the common χ² tests. Biometrics, 10:417–451.

• Dosemeci, M., Wacholder, S., and Lubin, J. H. (1990). Does nondifferential misclassification of exposure always bias a true effect toward the null value? American Journal of Epidemiology, 19:746–748.

• Gustafson, P. (2004). Measurement Error and Misclassification in Statistics and Epidemiology: Impact and Bayesian Adjustments. Boca Raton, FL: Chapman & Hall, CRC Press.

• Jurek, A. M., Greenland, S., and Maldonado, G. (2008). Brief report: How far from non-differential does exposure or disease misclassification have to be to bias measures of association away from the null? International Journal of Epidemiology, 37:382–385.

• Jurek, A. M., Greenland, S., Maldonado, G., and Church, T. R. (2005). Proper interpretation of non-differential misclassification effects: expectations vs observations. International Journal of Epidemiology, 34:680–687.

• Jurek, A. M., Maldonado, G., Greenland, S., and Church, T. R. (2006). Exposure-measurement error is frequently ignored when interpreting epidemiologic study results. European Journal of Epidemiology, 21:871–876.

• Ogburn, E. L., and Vanderweele, T. J. (2013). Bias attenuation results for nondifferentially mismeasured ordinal and coarsened confounders. Biometrika, 100:241–248.

• Vanderweele, T. J. (2012). Inference for additive interaction under exposure misclassification. Biometrika, 99:502–508.

• Wacholder, S. (1995). When measurement errors correlate with truth: surprising effects of nondifferential misclassification. Epidemiology, 6:157–161.

• Weinberg, C. R., Umbach, D. M., and Greenland, S. (1994). When will nondifferential misclassification of an exposure preserve the direction of a trend? (with discussion). American Journal of Epidemiology, 140:565–571.

Published in Print: 2014-12-01

Citation Information: Epidemiologic Methods, Volume 3, Issue 1, Pages 97–106, ISSN (Online) 2161-962X, ISSN (Print) 2194-9263.
