*Proof of Theorem 3*. The pathwise derivative of $\Psi(Q)$ is defined as $\left.\frac{d}{d\epsilon}\Psi(Q(\epsilon))\right|_{\epsilon=0}$ along paths $\{P_{\epsilon}:\epsilon\}\subset\mathcal{M}$. In particular, these paths are chosen so that
$dQ_{W,\epsilon}=(1+\epsilon H_{W}(W))\,dQ_{W},$
$\text{where } E\,H_{W}(W)=0 \text{ and } C_{W}\stackrel{\Delta}{=}\sup_{w}|H_{W}(w)|<\infty;$
$dQ_{Y,\epsilon}(Y|A,W)=(1+\epsilon H_{Y}(Y|A,W))\,dQ_{Y}(Y|A,W),$
$\text{where } E(H_{Y}|A,W)=0 \text{ and } \sup_{w,a,y}|H_{Y}(y|a,w)|<\infty.$
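
As a sanity check on the form of these paths, the following minimal sketch builds the marginal fluctuation on a finite support. The three-point distribution `q0`, the direction `h_w`, and the helper `fluctuate` are hypothetical and chosen only for illustration. Under the stated conditions (a centered, bounded $H_{W}$ and $|\epsilon|C_{W}<1$), the fluctuated weights should remain a probability distribution sandwiched between $(1-C_{W}|\epsilon|)$ and $(1+C_{W}|\epsilon|)$ times the original weights, which is the bound used in Step 1 below.

```python
def fluctuate(q, h, eps):
    """Return the fluctuated pmf q_eps(w) = (1 + eps * h(w)) * q(w)."""
    return [(1.0 + eps * hw) * qw for qw, hw in zip(q, h)]


# Hypothetical three-point marginal distribution of W (illustration only).
q0 = [0.2, 0.3, 0.5]

# Any bounded function can serve as a direction once it is centered under q0.
raw = [1.0, -2.0, 0.5]
mean_raw = sum(r * p for r, p in zip(raw, q0))
h_w = [r - mean_raw for r in raw]      # now E_{Q_W}[H_W(W)] = 0

c_w = max(abs(x) for x in h_w)         # C_W = sup_w |H_W(w)|
eps = 0.5 / c_w                        # any eps with |eps| * C_W < 1 works
q_eps = fluctuate(q0, h_w, eps)

total_mass = sum(q_eps)                # should equal 1 up to rounding
lower = [(1.0 - c_w * abs(eps)) * p for p in q0]
upper = [(1.0 + c_w * abs(eps)) * p for p in q0]
```

The centering step is what keeps the total mass equal to one, and the sup-norm bound is what keeps every weight nonnegative for small $|\epsilon|$.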

The parameter $\Psi$ is not sensitive to fluctuations of $g_{0}(a|w)=\mathrm{Pr}_{0}(a|w)$, and thus we do not need to fluctuate this portion of the likelihood. Let $\bar{Q}_{b,\epsilon}\stackrel{\Delta}{=}\bar{Q}_{b,P_{\epsilon}}$, $\bar{Q}_{\epsilon}\stackrel{\Delta}{=}\bar{Q}_{P_{\epsilon}}$, $d_{\epsilon}\stackrel{\Delta}{=}d_{P_{\epsilon}}$, $\eta_{\epsilon}\stackrel{\Delta}{=}\eta_{P_{\epsilon}}$, $\tau_{\epsilon}\stackrel{\Delta}{=}\tau_{P_{\epsilon}}$, and $S_{\epsilon}\stackrel{\Delta}{=}S_{P_{\epsilon}}$. First note that
$\bar{Q}_{b,\epsilon}(v)=\bar{Q}_{b,0}(v)+\epsilon h_{\epsilon}(v)\qquad(7)$
for an $h_{\epsilon}$ with
$\sup_{|\epsilon|<1}\sup_{v}|h_{\epsilon}(v)|\stackrel{\Delta}{=}C_{1}<\infty.\qquad(8)$
Note that C4) implies that $d_{0}$ is (almost surely) deterministic, i.e. $d_{0}(U,\cdot)$ is almost surely a fixed function. Let $\tilde{d}_{0}$ represent the deterministic rule $v\mapsto I(\bar{Q}_{b,0}(v)>\tau_{0})$ to which $d_{0}(u,\cdot)$ is (almost surely) equal for all *u*. By Lemma 1,
$\begin{array}{l}\Psi(P_{\epsilon})-\Psi(P_{0})=\int_{w}\left(E_{P_{U}}[d_{\epsilon}(U,V)]-\tilde{d}_{0}(V)\right)\bar{Q}_{b,\epsilon}\,dQ_{W,\epsilon}\\ \quad+\int_{w}\tilde{d}_{0}(V)\left(\bar{Q}_{b,\epsilon}\,dQ_{W,\epsilon}-\bar{Q}_{b,0}\,dQ_{W,0}\right)\\ \quad+E_{P_{\epsilon}}\bar{Q}_{\epsilon}(0,W)-E_{P_{0}}\bar{Q}_{0}(0,W)\\ =\int_{w}\left(E_{P_{U}}[d_{\epsilon}(U,V)]-\tilde{d}_{0}(V)\right)\left(\bar{Q}_{b,\epsilon}-\tau_{0}\right)dQ_{W,\epsilon}\\ \quad+\tau_{0}\int_{w}\left(E_{P_{U}}[d_{\epsilon}(U,V)]\,dQ_{W,\epsilon}-\tilde{d}_{0}(V)\,dQ_{W,0}\right)\\ \quad-\tau_{0}\int_{w}\tilde{d}_{0}(V)\left(dQ_{W,\epsilon}-dQ_{W,0}\right)\\ \quad+\Psi_{d_{0}}(P_{\epsilon})-\Psi_{d_{0}}(P_{0}).\end{array}\qquad(9)$
Dividing the fourth term by $\epsilon$ and taking the limit as $\epsilon\to 0$ gives the pathwise derivative of the mean outcome under the rule that treats $d_{0}$ as known. The third term can be written as $-\epsilon\tau_{0}\int_{w}\tilde{d}_{0}(V)H_{W}\,dQ_{W,0}$, and thus the pathwise derivative of this term is $-\int_{w}\tau_{0}\tilde{d}_{0}(V)H_{W}\,dQ_{W,0}$. If $\tau_{0}>0$, then $E_{P_{U}\times P_{0}}[\tilde{d}_{0}(V)]=\kappa$. The pathwise derivative of this term is zero if $\tau_{0}=0$. Thus, for all $\tau_{0}$,
$\lim_{\epsilon\to 0}-\frac{1}{\epsilon}\tau_{0}\int_{w}\tilde{d}_{0}(V)\left(dQ_{W,\epsilon}-dQ_{W,0}\right)=\int_{w}\left(-\tau_{0}(\tilde{d}_{0}(v)-\kappa)\right)H_{W}(w)\,dQ_{W,0}(w).$
Thus the third term in eq. (9) generates the $v\mapsto-\tau_{0}(\tilde{d}_{0}(v)-\kappa)$ portion of the canonical gradient, or equivalently $v\mapsto-\tau_{0}(E_{P_{U}}[d_{0}(U,v)]-\kappa)$. The remainder of this proof is used to show that the first two terms in eq. (9) are $o(\epsilon)$.
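
The third-term calculation can be checked numerically on a finite support. The sketch below is only an illustration: the four-point distribution `q0`, the rule `d0`, and the value of `tau0` are hypothetical, and `kappa` is set to the mean of `d0` to mirror the case $\tau_{0}>0$. Because the fluctuation is linear in $\epsilon$, the difference quotient of the third term should agree with the centered gradient form up to rounding, and the centering by $\kappa$ should cost nothing because $E\,H_{W}(W)=0$.

```python
# Hypothetical finite-support example (illustration only).
q0 = [0.1, 0.2, 0.3, 0.4]                  # pmf of W under P_0
raw = [2.0, -1.0, 0.5, 0.0]
mean_raw = sum(r * p for r, p in zip(raw, q0))
h_w = [r - mean_raw for r in raw]          # centered direction, E[H_W] = 0

d0 = [1.0, 0.0, 1.0, 0.0]                  # deterministic rule tilde d_0
tau0 = 0.7                                 # assumed positive threshold
kappa = sum(d * p for d, p in zip(d0, q0)) # E[tilde d_0(V)] = kappa when tau0 > 0


def third_term(eps):
    """-tau0 * sum_w d0(w) * (q_eps(w) - q0(w)), the third term of eq. (9)."""
    q_eps = [(1.0 + eps * h) * p for h, p in zip(h_w, q0)]
    return -tau0 * sum(d * (pe - p) for d, pe, p in zip(d0, q_eps, q0))


eps = 1e-3
difference_quotient = third_term(eps) / eps

# Gradient form: integral of -tau0 * (d0 - kappa) * H_W against Q_{W,0}.
gradient_form = sum(-tau0 * (d - kappa) * h * p for d, h, p in zip(d0, h_w, q0))
```

Replacing `kappa` by any other constant would leave `gradient_form` unchanged in this sketch; the specific value $\kappa$ matters because it makes the gradient component mean zero under $P_{0}$.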

**Step 1**: ${\mathrm{\eta}}_{\mathrm{\epsilon}}\to {\mathrm{\eta}}_{0}$.

We refer the reader to eq. (3) for a definition of the quantile $P\mapsto\eta_{P}$. The convergence claimed in this step is a consequence of the continuity of $S_{0}$ in a neighborhood of $\eta_{0}$. For $\gamma>0$,
$|\eta_{\epsilon}-\eta_{0}|>\gamma \text{ implies that } S_{\epsilon}(\eta_{0}-\gamma)\le\kappa \text{ or } S_{\epsilon}(\eta_{0}+\gamma)>\kappa.\qquad(10)$

For positive constants ${C}_{1}$ and ${C}_{W}$,
$S_{\epsilon}(\eta_{0}-\gamma)\ge(1-C_{W}|\epsilon|)\,\mathrm{Pr}_{0}\left(\bar{Q}_{b,\epsilon}>\eta_{0}-\gamma\right)\ge(1-C_{W}|\epsilon|)\,S_{0}(\eta_{0}-\gamma+C_{1}|\epsilon|).$

Fix $\mathrm{\gamma}>0$ small enough so that ${S}_{0}$ is continuous at ${\mathrm{\eta}}_{0}-\mathrm{\gamma}$. In this case we have that ${S}_{0}({\mathrm{\eta}}_{0}-\mathrm{\gamma}+{C}_{1}|\mathrm{\epsilon}|)\to {S}_{0}({\mathrm{\eta}}_{0}-\mathrm{\gamma})$ as $\mathrm{\epsilon}\to 0$. By the infimum in the definition of ${\mathrm{\eta}}_{0}$, we know that ${S}_{0}({\mathrm{\eta}}_{0}-\mathrm{\gamma})>\mathrm{\kappa}$. Thus ${S}_{\mathrm{\epsilon}}({\mathrm{\eta}}_{0}-\mathrm{\gamma})>\mathrm{\kappa}$ for all $|\mathrm{\epsilon}|$ small enough.

Similarly, $S_{\epsilon}(\eta_{0}+\gamma)\le(1+C_{W}|\epsilon|)\,S_{0}(\eta_{0}+\gamma-C_{1}|\epsilon|)$. Fix $\gamma>0$ small enough so that $S_{0}$ is continuous at $\eta_{0}+\gamma$. Then $S_{0}(\eta_{0}+\gamma-C_{1}|\epsilon|)\to S_{0}(\eta_{0}+\gamma)$ as $\epsilon\to 0$. Condition C2) implies the uniqueness of the $\kappa$-quantile of $\bar{Q}_{b,0}$, and thus that $S_{0}(\eta_{0}+\gamma)<\kappa$. It follows that $S_{\epsilon}(\eta_{0}+\gamma)<\kappa$ for all $|\epsilon|$ small enough.

Combining ${S}_{\mathrm{\epsilon}}({\mathrm{\eta}}_{0}-\mathrm{\gamma})>\mathrm{\kappa}$ and ${S}_{\mathrm{\epsilon}}({\mathrm{\eta}}_{0}+\mathrm{\gamma})<\mathrm{\kappa}$ for all $\mathrm{\epsilon}$ close to zero with eq. (10) shows that ${\mathrm{\eta}}_{\mathrm{\epsilon}}\to {\mathrm{\eta}}_{0}$ as $\mathrm{\epsilon}\to 0$.
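
The convergence in Step 1 can be illustrated on a fine grid. The sketch below rests on several assumptions that are not taken from the text: the quantile is computed as $\eta_{P}=\inf\{\tau:S_{P}(\tau)\le\kappa\}$ with $S_{P}(\tau)=\mathrm{Pr}_{P}(\bar{Q}_{b,P}(V)>\tau)$, which is how eq. (10) and the infimum remark above read; $V=W$ lives on a uniform grid over $(0,1)$; and the blip function, its perturbation, and the helper `survival_quantile` are hypothetical choices. The grid only approximates a continuous $S_{0}$, so the comparison is up to grid resolution.

```python
import math


def survival_quantile(values, weights, kappa):
    """Return inf{tau : S(tau) <= kappa}, with S(tau) the weight of values > tau."""
    remaining = sum(weights)
    for value, weight in sorted(zip(values, weights)):
        remaining -= weight            # weight of points sorted after this one
        if remaining <= kappa:
            return value
    return max(values)


n = 2000
kappa = 0.3
w = [(i + 0.5) / n for i in range(n)]  # grid standing in for the law of W
q0 = [1.0 / n] * n

raw = [math.cos(2.0 * math.pi * x) for x in w]
mean_raw = sum(r * p for r, p in zip(raw, q0))
h_w = [r - mean_raw for r in raw]      # centered direction, |H_W| <= C_W

qbar0 = list(w)                        # hypothetical blip function: v -> v


def eta(eps):
    """Quantile eta_eps under the fluctuated weights and perturbed blip."""
    q_eps = [(1.0 + eps * h) * p for h, p in zip(h_w, q0)]
    qbar_eps = [x + eps * math.sin(3.0 * x) for x in w]   # |h_eps| <= C_1 = 1
    return survival_quantile(qbar_eps, q_eps, kappa)


eta0 = eta(0.0)
gaps = {eps: abs(eta(eps) - eta0) for eps in (0.1, 0.01, 0.001)}
```

In this toy setting the gaps in `gaps` shrink roughly in proportion to $|\epsilon|$, which is consistent with the rate established in Step 3.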

**Step 2: Second term of eq. (9) is 0 eventually**.

If $\tau_{0}=0$ then the result is immediate, so suppose $\tau_{0}>0$. By the previous step, $\eta_{\epsilon}\to\eta_{0}$, which implies that $\tau_{\epsilon}\to\tau_{0}>0$ by the continuity of the max function. It follows that $\tau_{\epsilon}>0$ for all $|\epsilon|$ small enough. By eq. (4), $\mathrm{Pr}_{P_{U}\times P_{\epsilon}}(d_{\epsilon}(U,V)=1)=\kappa$ for all sufficiently small $|\epsilon|$ and $\mathrm{Pr}_{0}(\tilde{d}_{0}(V)=1)=\kappa$. Thus the second term of eq. (9) is 0 for all $|\epsilon|$ small enough.

**Step 3**: ${\mathrm{\tau}}_{\mathrm{\epsilon}}-{\mathrm{\tau}}_{0}=O(\mathrm{\epsilon})$.

Note that $\mathrm{\kappa}<{S}_{\mathrm{\epsilon}}({\mathrm{\eta}}_{\mathrm{\epsilon}}-|\mathrm{\epsilon}|)\le (1+{C}_{W}|\mathrm{\epsilon}|){S}_{0}({\mathrm{\eta}}_{\mathrm{\epsilon}}-(1+{C}_{1})|\mathrm{\epsilon}|)$. A Taylor expansion of ${S}_{0}$ about ${\mathrm{\eta}}_{0}$ shows that
$\mathrm{\kappa}<\left(1+{C}_{W}|\mathrm{\epsilon}|\right)\left({S}_{0}({\mathrm{\eta}}_{0})+({\mathrm{\eta}}_{\mathrm{\epsilon}}-{\mathrm{\eta}}_{0}-(1+{C}_{1})|\mathrm{\epsilon}|)(-{f}_{0}({\mathrm{\eta}}_{0})+o(1))\right)$
$=\mathrm{\kappa}+({\mathrm{\eta}}_{\mathrm{\epsilon}}-{\mathrm{\eta}}_{0}-(1+{C}_{1})|\mathrm{\epsilon}|)(-{f}_{0}({\mathrm{\eta}}_{0})+o(1))+O(\mathrm{\epsilon})$
$=\kappa-(\eta_{\epsilon}-\eta_{0})f_{0}(\eta_{0})+o(\eta_{\epsilon}-\eta_{0})+O(\epsilon).\qquad(11)$
The fact that $f_{0}(\eta_{0})\in(0,\infty)$ shows that $\eta_{\epsilon}-\eta_{0}$ is bounded above by some $O(\epsilon)$ sequence. Similarly, $\kappa\ge S_{\epsilon}(\eta_{\epsilon}+|\epsilon|)\ge(1-C_{W}|\epsilon|)S_{0}(\eta_{\epsilon}+(1+C_{1})|\epsilon|)$. Hence,
$\mathrm{\kappa}\ge \left(1-{C}_{W}|\mathrm{\epsilon}|\right)\left({S}_{0}({\mathrm{\eta}}_{0})+({\mathrm{\eta}}_{\mathrm{\epsilon}}-{\mathrm{\eta}}_{0}+(1+{C}_{1})|\mathrm{\epsilon}|)(-{f}_{0}({\mathrm{\eta}}_{0})+o(1))\right)$
$=\mathrm{\kappa}-({\mathrm{\eta}}_{\mathrm{\epsilon}}-{\mathrm{\eta}}_{0}){f}_{0}({\mathrm{\eta}}_{0})+o({\mathrm{\eta}}_{\mathrm{\epsilon}}-{\mathrm{\eta}}_{0})+O(\mathrm{\epsilon}).$

It follows that $\eta_{\epsilon}-\eta_{0}$ is bounded below by some $O(\epsilon)$ sequence. Combining these two bounds shows that $\eta_{\epsilon}-\eta_{0}=O(\epsilon)$, which immediately implies that $\tau_{\epsilon}-\tau_{0}=\max\{O(\epsilon),0\}=O(\epsilon)$.

**Step 4: First term of eq. (9) is** $o(\mathrm{\epsilon})$.

We know that
$\bar{Q}_{b,0}(V)-\tau_{0}+O(\epsilon)\le\bar{Q}_{b,\epsilon}(V)-\tau_{\epsilon}\le\bar{Q}_{b,0}(V)-\tau_{0}+O(\epsilon).$

By C4), it follows that there exists some $\delta>0$ such that $\sup_{|\epsilon|<\delta}\mathrm{Pr}_{0}(\bar{Q}_{b,\epsilon}(V)=\tau_{\epsilon})=0$. By the absolute continuity of $Q_{W,\epsilon}$ with respect to $Q_{W,0}$, $\sup_{|\epsilon|<\delta}\mathrm{Pr}_{P_{\epsilon}}(\bar{Q}_{b,\epsilon}(V)=\tau_{\epsilon})=0$. It follows that, for all small enough $|\epsilon|$ and almost all *u*, $d_{\epsilon}(u,v)=I(\bar{Q}_{b,\epsilon}(v)>\tau_{\epsilon})$. Hence,
$\begin{array}{l}\left|\int_{w}\left(E_{P_{U}}[d_{\epsilon}(U,V)]-\tilde{d}_{0}(V)\right)\left(\bar{Q}_{b,\epsilon}-\tau_{0}\right)dQ_{W,\epsilon}\right|\\ \quad=\left|\int_{w}\left(I(\bar{Q}_{b,\epsilon}>\tau_{\epsilon})-I(\bar{Q}_{b,0}>\tau_{0})\right)\left(\bar{Q}_{b,\epsilon}-\tau_{0}\right)dQ_{W,\epsilon}\right|\\ \quad\le\int_{w}\left|I(\bar{Q}_{b,\epsilon}>\tau_{\epsilon})-I(\bar{Q}_{b,0}>\tau_{0})\right|\left(\left|\bar{Q}_{b,0}-\tau_{0}\right|+C_{1}|\epsilon|\right)dQ_{W,\epsilon}\\ \quad\le\int_{w}I\left(|\bar{Q}_{b,0}-\tau_{0}|\le|\bar{Q}_{b,0}-\tau_{0}-\bar{Q}_{b,\epsilon}+\tau_{\epsilon}|\right)\left(\left|\bar{Q}_{b,0}-\tau_{0}\right|+C_{1}|\epsilon|\right)dQ_{W,\epsilon}\\ \quad=\int_{w}I\left(0<|\bar{Q}_{b,0}-\tau_{0}|\le|\bar{Q}_{b,0}-\tau_{0}-\bar{Q}_{b,\epsilon}+\tau_{\epsilon}|\right)\left(\left|\bar{Q}_{b,0}-\tau_{0}\right|+C_{1}|\epsilon|\right)dQ_{W,\epsilon}\\ \quad\le O(\epsilon)\int_{w}I\left(0<|\bar{Q}_{b,0}-\tau_{0}|\le O(\epsilon)\right)dQ_{W,\epsilon}\\ \quad\le O(\epsilon)(1+C_{W}|\epsilon|)\,\mathrm{Pr}_{0}\left(0<|\bar{Q}_{b,0}-\tau_{0}|\le O(\epsilon)\right),\end{array}$
where the penultimate inequality holds by Step 3 and eq. (7). The last line above is $o(\mathrm{\epsilon})$ because $Pr(0<X\le \mathrm{\epsilon})\to 0$ as $\mathrm{\epsilon}\to 0$ for any random variable *X*. Thus dividing the left-hand side above by $\mathrm{\epsilon}$ and taking the limit as $\mathrm{\epsilon}\to 0$ yields zero.□
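
The strict inequality $0<X$ in the last display is what makes the probability vanish. As a small illustration under a hypothetical law that is not from the text, take $X$ equal to 0 with probability `atom` and uniform on $(0,1)$ otherwise; the helper names below are invented for this sketch. The probability of $0<X\le\epsilon$ goes to zero with $\epsilon$, while the probability of $0\le X\le\epsilon$ stays bounded below by the mass of the atom.

```python
def prob_strict(eps, atom=0.5):
    """Pr(0 < X <= eps) when X = 0 w.p. `atom` and X ~ Uniform(0, 1) otherwise."""
    return (1.0 - atom) * min(max(eps, 0.0), 1.0)


def prob_weak(eps, atom=0.5):
    """Pr(0 <= X <= eps) for the same X and eps >= 0; the atom at 0 is included."""
    return atom + prob_strict(eps, atom)


# Evaluate both probabilities along a sequence eps -> 0.
eps_values = [10.0 ** (-k) for k in range(1, 7)]
strict_probs = [prob_strict(e) for e in eps_values]
weak_probs = [prob_weak(e) for e in eps_values]
```

This mirrors the role of the fifth line of the display, where the event $|\bar{Q}_{b,0}-\tau_{0}|=0$ is removed before the probability is bounded.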
