If there exists a window ${W}_{0}=[\underline{r},\bar{r}]$ where our randomization-type condition Assumption 1 holds, and this window is known, applying randomization inference procedures to the RD design is straightforward. In practice, however, this window will be unknown and must be chosen by the researcher. This is the main methodological challenge of applying a randomization inference approach to RD designs and is analogous to the problem of bandwidth selection in conventional nonparametric RD approaches [7, 8].

Imposing Assumption 1 throughout, we propose a method to select ${W}_{0}$ based on covariates. These could be either *predetermined* covariates (determined before treatment is assigned and thus, by construction, unaffected by it) or *placebo* covariates (determined after treatment is assigned but nonetheless expected to be unaffected by treatment given prior theoretical knowledge about how the treatment operates). In most RD empirical applications, researchers have access to predetermined covariates and use them to assess the plausibility of the RD assumptions and/or to reduce sampling variability. A typical strategy to validate the design is to test whether there is a treatment effect at the discontinuity for these covariates, and absence of such effect is interpreted as supporting evidence for the RD design.

Our window selection procedure is inspired by this common empirical practice. In particular, we assume that there exists a covariate for each unit, denoted ${x}_{i}(\mathbf{r})$, which is unrelated to the score inside ${W}_{0}$ but related to it outside of ${W}_{0}$. This implies that for a window $W\supset {W}_{0}$, the score and covariate will be associated for units with ${R}_{i}\in W\setminus {W}_{0}$ but not for units with ${R}_{i}\in {W}_{0}$. This means that if the sharp null hypothesis is rejected in a given window, that window is strictly larger than ${W}_{0}$, which leads naturally to a procedure for selecting ${W}_{0}$: perform a sequence of “balance” tests for the covariates, one for each window candidate, beginning with the largest window and sequentially shrinking it until the test fails to reject “balance”.

The first step to formalize this approach is to assume that the treatment effect on the covariate *x* is zero inside the window where Assumption 1 holds. We collect the covariates in $\mathbf{X}=({X}_{1},{X}_{2},\cdots ,{X}_{n}{)}^{\prime}$ where, as before, ${X}_{i}={x}_{i}(\mathbf{R})$.

**Assumption 4: Zero treatment effect for covariate**. For all *i* with ${R}_{i}\in {W}_{0}$: the covariate ${x}_{i}(\mathbf{r})$ satisfies ${x}_{i}(\mathbf{r})={x}_{i}({\mathbf{z}}_{{W}_{0}})={x}_{i}$ for all $\mathbf{r}$.

Assumption 4 states that the sharp null hypothesis holds for ${X}_{i}$ in ${W}_{0}$. This assumption simply states what is known to be true when the available covariate is determined before treatment: treatment could not have possibly affected the covariates and therefore its effect is zero by construction. Note that if ${X}_{i}$ is a predetermined covariate, the sharp null holds everywhere, not only in ${W}_{0}$. However, we require the weaker condition that it holds only in ${W}_{0}$ to include placebo covariates.

The second necessary step to justify our procedure for selecting ${W}_{0}$ based on covariate balance is to require that the covariate and the score be correlated outside of ${W}_{0}$. We formalize this requirement in the following assumption, which is stronger than needed but justifies our proposed window selection procedure in an intuitive way, as further discussed below. Define $\tilde{W}=[\underline{\rho},\underline{r})\cup (\bar{r},\bar{\rho}]$ for a pair $(\underline{\rho},\bar{\rho})$ satisfying $\underline{\rho}<\underline{r}<\bar{r}<\bar{\rho}$, and recall that ${r}_{0}\in {W}_{0}=[\underline{r},\bar{r}]$.

**Assumption 5: Association outside ${W}_{0}$ between covariate and score**. For all *i* with ${R}_{i}\in \tilde{W}$ and for all $r\in \tilde{W}$:

**(a)** ${F}_{{R}_{i}|{R}_{i}\in \tilde{W}}(r)=F\left(r;{x}_{i}(r)\right)$, and

**(b)** For all $j\ne k$, either (i) ${x}_{j}>{x}_{k}\Rightarrow F(r;{x}_{j})<F(r;{x}_{k})$ or (ii) ${x}_{j}>{x}_{k}\Rightarrow F(r;{x}_{j})>F(r;{x}_{k})$.

Assumption 5 is key to obtaining a valid window selector, since it requires a form of non-random selection among units outside ${W}_{0}$ that leads to an observable association between the covariate and the score for those units with ${R}_{i}\notin {W}_{0}$, i.e., between the vectors ${\mathbf{X}}_{\tilde{W}}$ and ${\mathbf{R}}_{\tilde{W}}$. Consequently, under Assumption 5 the vectors ${\mathbf{X}}_{W}$ and ${\mathbf{R}}_{W}$ will be associated for any window *W* such that $W\supset {W}_{0}$. Since *x* is predetermined or placebo, this association cannot arise from a direct effect of *r* on *x*. Instead, it may be that *x* affects *r* (e.g., higher campaign contributions at $t-1$ lead to a higher margin of victory at *t*) or that some observed or unobserved factor affects both *x* and *r* (e.g., more able politicians are both more likely to raise high contributions and to win by high margins). In short, Assumption 5 implies that units with high ${R}_{i}$ have high (or low) ${X}_{i}$, even when ${X}_{i}$ is constant across values of *r*.

Assumptions 1, 4 and 5 justify a simple procedure to find ${W}_{0}$. This procedure finds the widest window for which the covariates and scores are not associated inside this window, but are associated outside of it. We base our procedure on randomization-based tests of the sharp null hypothesis of no effect for each available covariate *x*. Given Assumption 4 above, for units with ${R}_{i}\in {W}_{0}$, the treatment assignment vector ${\mathbf{Z}}_{{W}_{0}}$ has no effect on the covariate vector ${\mathbf{X}}_{{W}_{0}}$. Under this assumption, the size of the test of no effect is known, and therefore we can control the probability with which we accept a window where the assumptions hold. In addition, under Assumption 5 (or a similar assumption), this procedure will be able to detect the true window ${W}_{0}$. Such a procedure can be implemented in different ways. A simple approach is to begin by considering all observations (i.e., choosing the largest possible window ${W}_{0}$), test the sharp null of no effect of ${Z}_{i}$ on ${X}_{i}$ for these observations and, if the null hypothesis is rejected, continue by decreasing the size of the window until the resulting test fails to reject the null hypothesis.

The procedure depends crucially on sequential testing in *nested* windows: if the sharp null hypothesis is rejected for a given window, then this hypothesis will also be rejected in any window that contains it (with a test of sufficiently high power). Thus, the procedure searches windows of different sizes until it finds the largest possible window such that the sharp null hypothesis cannot be rejected for any window contained in it. This procedure can be implemented as follows.

**Window selection procedure based on predetermined covariates**. Select a test statistic of interest, denoted $T(\mathbf{X},\mathbf{R})$. Let ${R}_{(j)}$ be the *j*th order statistic of $\mathbf{R}$ in the sample of all observations indexed by $i=1,\dots ,n$.

Step 1: Define $W({j}_{0},{j}_{1})=[{R}_{({j}_{0})},{R}_{({j}_{1})}]$, and set ${j}_{0}=1$, ${j}_{1}=n$. Choose indices ${j}_{0,\min}$ and ${j}_{1,\min}$ satisfying ${R}_{({j}_{0,\min})}<{r}_{0}<{R}_{({j}_{1,\min})}$, which set the minimum number of observations required in $W({j}_{0,\min},{j}_{1,\min})$.

Step 2: Conduct a test of no effect using $T({\mathbf{X}}_{W({j}_{0},{j}_{1})},{\mathbf{R}}_{W({j}_{0},{j}_{1})})$.

Step 3: If the null hypothesis is rejected, increase ${j}_{0}$ and decrease ${j}_{1}$. If ${j}_{0}<{j}_{0,\min}$ and ${j}_{1}>{j}_{1,\min}$, go back to Step 2; otherwise, stop and conclude that lower and upper ends for ${W}_{0}$ cannot be selected. If the null hypothesis is not rejected, keep ${R}_{({j}_{0})}$ and ${R}_{({j}_{1})}$ as the ends of the selected window.
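The three steps above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function name `select_window`, its arguments, and the rule of dropping one observation from each side per iteration are all choices made here for concreteness, and `test_fn` stands in for any randomization test that returns a *p*-value for the sharp null.

```python
import numpy as np

def select_window(R, X, r0, test_fn, alpha=0.15, min_each_side=10):
    """Sketch of the covariate-based window selector (Steps 1-3).

    R : scores; X : a predetermined (or placebo) covariate; r0 : cutoff.
    test_fn(x, z) must return a p-value for the sharp null of no effect
    of the treatment indicator z on the covariate x.

    Starting from the widest window, drop the outermost observation on
    each side until the test fails to reject at level alpha, or until
    fewer than min_each_side observations remain on either side.
    """
    order = np.argsort(R)
    R = np.asarray(R, dtype=float)[order]
    X = np.asarray(X, dtype=float)[order]
    j0, j1 = 0, len(R) - 1                     # widest window
    while True:
        Rw, Xw = R[j0:j1 + 1], X[j0:j1 + 1]
        z = (Rw >= r0).astype(int)             # treatment indicator
        if (z == 1).sum() < min_each_side or (z == 0).sum() < min_each_side:
            return None                        # no acceptable window found
        if test_fn(Xw, z) >= alpha:            # fail to reject: keep window
            return R[j0], R[j1]
        j0 += 1                                # reject: shrink the window
        j1 -= 1
```

In practice the shrinking rule and the treatment of asymmetric windows can differ; the essential logic is the reject-then-shrink loop with a minimum-sample-size stopping rule.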

An important feature of this approach is that, unlike conventional hypothesis testing, we are particularly concerned about the possibility of failing to reject the null hypothesis when it is false (Type II error). Usually, researchers are concerned about controlling Type I error to avoid rejecting the null hypothesis too often when it is true, and thus prefer testing procedures that are not too “liberal”. In our context, however, rejecting the null hypothesis is used as evidence that the local randomization Assumption 1 does not hold, and our ultimate goal is to learn whether the data support the existence of a neighborhood around the cutoff where our null hypothesis fails to be rejected. In this sense, the roles of Type I and Type II error are interchanged in our context.^{6} This has important implications for the practical implementation of our approach, which we discuss next.

## 3.1 Implementation

Implementing the procedure proposed above requires three choices: (i) a test statistic, (ii) the minimum sample sizes (${j}_{\text{0,min}}$, ${j}_{\text{1,min}}$), and (iii) a testing procedure and associated significance level $\mathrm{\alpha}$. We discuss here how these choices affect our window selector, and give guidelines for researchers who wish to use this procedure in empirical applications.

## 3.1.1 Choice of test statistic

This choice is important because different test statistics will have power against different alternative hypotheses and, as discussed above, we prefer tests with low Type II error. In our procedure, the sharp null hypothesis of no treatment effect could employ different test statistics such as difference-in-means, Wilcoxon rank sum or Kolmogorov–Smirnov, because the null randomization distribution of any of them is known. Lehmann [19] and Rosenbaum [14, 15] provide a discussion and comparison of alternative test statistics. In our application, we employ the difference-in-means test statistic.
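As one concrete possibility, the randomization distribution of the difference-in-means statistic can be approximated by randomly permuting the treatment indicator. The function name and defaults below are illustrative choices, not from the paper:

```python
import numpy as np

def diff_in_means_perm_test(x, z, n_perm=1000, seed=0):
    """Approximate randomization test of the sharp null of no effect.

    x : covariate values for units in the current window.
    z : 0/1 treatment indicators (score above/below the cutoff).
    Returns a two-sided permutation p-value for the
    difference-in-means statistic.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    z = np.asarray(z)
    obs = x[z == 1].mean() - x[z == 0].mean()  # observed statistic
    hits = 0
    for _ in range(n_perm):
        zp = rng.permutation(z)                # re-randomized assignment
        stat = x[zp == 1].mean() - x[zp == 0].mean()
        if abs(stat) >= abs(obs):
            hits += 1
    return hits / n_perm
```

With small windows the full randomization distribution can be enumerated exactly instead of sampled; the Monte Carlo version above is simply cheaper for larger samples.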

## 3.1.2 Choice of minimum sample size

The main goal of setting a minimum sample size is to prevent the procedure from having too few observations when conducting the hypothesis test in the smallest possible window. These constants should be large enough so that the test statistic employed has “good” power properties to detect departures from the null hypothesis. We recommend setting ${j}_{0,\min}$ and ${j}_{1,\min}$ so that roughly at least 10 observations are included at either side of the threshold. One way of justifying this choice is by considering a two-sample standard normal shift model with a true treatment effect of one standard deviation and 10 observations in each group, in which case a randomization-based test of the sharp null hypothesis of no treatment effect using the difference-in-means statistic has power of roughly 80% at significance level 0.15 (and roughly 60% at significance level 0.05). Setting ${j}_{0,\min}$ and ${j}_{1,\min}$ at higher values will increase the power to detect departures from Assumption 1 and will lead to a more conservative choice of ${W}_{0}$ (assuming the chosen window based on those higher values is feasible, that is, has positive length).
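This back-of-the-envelope power calculation can be checked by simulation. The sketch below (hypothetical function names; Monte Carlo and permutation sizes kept small for speed) draws two samples of 10 from a standard normal shift model with a one-standard-deviation effect and estimates the rejection rate of the permutation difference-in-means test at $\alpha = 0.15$:

```python
import numpy as np

def perm_pvalue(x, z, rng, n_perm=300):
    """Two-sided permutation p-value for the difference in means."""
    obs = x[z == 1].mean() - x[z == 0].mean()
    hits = 0
    for _ in range(n_perm):
        zp = rng.permutation(z)
        if abs(x[zp == 1].mean() - x[zp == 0].mean()) >= abs(obs):
            hits += 1
    return hits / n_perm

def simulated_power(n_per_group=10, effect=1.0, alpha=0.15,
                    n_sim=200, seed=0):
    """Monte Carlo power of the randomization test under a standard
    normal shift model (a sketch to check the calculation in the text,
    not part of the selection procedure itself)."""
    rng = np.random.default_rng(seed)
    z = np.array([0] * n_per_group + [1] * n_per_group)
    rejections = 0
    for _ in range(n_sim):
        x = rng.normal(0.0, 1.0, 2 * n_per_group)
        x[z == 1] += effect                     # true treatment effect
        if perm_pvalue(x, z, rng) < alpha:
            rejections += 1
    return rejections / n_sim
```

With these settings the estimated power should come out close to the 80% figure quoted in the text, up to Monte Carlo error.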

## 3.1.3 Choice of testing procedure and $\mathit{\alpha}$

First, our procedure performs hypothesis tests in a sequence of nested windows and thus involves multiple hypothesis testing (see Efron [20] for a recent review). This implies that, even when the null hypothesis is true, it will be rejected several times (e.g., if the hypotheses are independent, they will be rejected roughly as many times as the significance level times the number of windows considered). For the family-wise error rate, multiple testing implies that our window selector will reject more windows than it should, because the associated *p*-values will be too small. But since we are more concerned about failing to reject a false null hypothesis (Type II error) than about rejecting a true one (Type I error), this simply makes our procedure more conservative, selecting a smaller window than the true window (if any) where the local randomization assumption is likely to hold. For this reason, we recommend that researchers do not adjust *p*-values for multiple testing.^{7}

Second, we must choose a significance level $\mathrm{\alpha}$ to test whether the local randomization assumption is rejected in each window. As our focus is on Type II error, this value should be chosen higher than conventional levels for a conservative choice of ${W}_{0}$. Based on the power calculations discussed above, a reasonable choice is $\mathrm{\alpha}=0.15$; higher values will lead to a more conservative choice of ${W}_{0}$ if a feasible window satisfies the stricter requirement. Nonetheless, researchers should report all *p*-values graphically so that others can judge how varying $\mathrm{\alpha}$ would alter the size of the chosen window.

Finally, when the sharp null is tested for multiple covariates in every candidate window, the results of the multiple tests must be aggregated into a single *p*-value. To be as conservative as possible, we choose the minimum *p*-value across all tests in every window.
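The minimum *p*-value aggregation rule amounts to a one-line check per candidate window; `window_balanced` is an illustrative name, not from the paper:

```python
def window_balanced(pvals, alpha=0.15):
    """Decide whether one candidate window passes the balance check.

    pvals : per-covariate p-values from the sharp-null tests in that
    window. The window passes only if every test fails to reject,
    i.e. the minimum p-value is at least alpha (unadjusted for
    multiple testing, per the recommendation in the text).
    """
    return min(pvals) >= alpha
```

Because the minimum over covariates can only shrink as covariates are added, this rule becomes more conservative (selects smaller windows) the more covariates are tested.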

In the upcoming sections, we illustrate how our methodological framework works in practice with a study of party advantages in U.S. Senate elections.
