We start with a toy example: in dimension $d=1$, take $h(y)=y$ and
$\mathcal{S}=\{s\in {\mathbb{R}}_{+}:s\le {s}_{\star}\}$, so that

$\mathcal{I}=\mathbb{E}\bigl[\bigl(\mathbb{E}[{(K-{S}_{{T}^{\prime}})}_{+}\mid {S}_{T}]-{p}_{\star}\bigr)_{+}\,\big|\,{S}_{T}\le {s}_{\star}\bigr];$

${(K-{S}_{{T}^{\prime}})}_{+}$ is the payoff of a Put option written on a stock with price ${({S}_{t})}_{t\ge 0}$,
with strike *K* and maturity ${T}^{\prime}$: this is a standard financial
product used by asset managers to insure their portfolios against a
decrease in the stock price. We take the point of view of the seller of
the contract, who is mostly concerned with large values of the Put
price: he aims at valuing the excess of the Put price at time
$T\in (0,{T}^{\prime})$ beyond a threshold ${p}_{\star}>0$, for stock values
${S}_{T}$ smaller than ${s}_{\star}>0$. We assume that $\{{S}_{t}:t\ge 0\}$
evolves as a geometric Brownian motion, with volatility $\sigma >0$
and zero drift. For the sake of simplicity, we also assume that the
interest rate is 0; the extension to a non-zero interest rate is straightforward.

Upon noting that ${S}_{T}=\xi (Y)$ and ${S}_{{T}^{\prime}}=\xi (Y)\mathrm{exp}(-\frac{1}{2}{\sigma}^{2}\tau +\sigma \sqrt{\tau}Z)$, where $Y,Z$ are independent standard Gaussian
variables and

$\xi (y):={S}_{0}\mathrm{exp}\left(-\frac{1}{2}{\sigma}^{2}T+\sigma \sqrt{T}y\right),\qquad \tau :={T}^{\prime}-T,$

we have

$\mathcal{I}=\mathbb{E}\Bigl[\bigl(\mathbb{E}\bigl[{(K-\xi (Y)\mathrm{exp}(-\tfrac{1}{2}{\sigma}^{2}\tau +\sigma \sqrt{\tau}Z))}_{+}\,\big|\,Y\bigr]-{p}_{\star}\bigr)_{+}\,\Big|\,Y\le {y}_{\star}\Bigr],$

where

${y}_{\star}:=\frac{1}{\sigma \sqrt{T}}\mathrm{ln}\left(\frac{{s}_{\star}}{{S}_{0}}\right)+\frac{1}{2}\sigma \sqrt{T}.$
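As a quick numerical sanity check, the change of variable above can be verified by simulation: the events $\{S_T\le s_\star\}$ and $\{Y\le y_\star\}$ coincide pathwise. The sketch below uses illustrative placeholder parameter values (the paper's actual values are in Table 1).

```python
import numpy as np

# Illustrative placeholder parameters (the paper's values are in Table 1)
S0, sigma, T, s_star = 100.0, 0.2, 1.0, 80.0

def xi(y):
    # xi(y) = S0 * exp(-sigma^2 T / 2 + sigma sqrt(T) y), so that S_T = xi(Y)
    return S0 * np.exp(-0.5 * sigma**2 * T + sigma * np.sqrt(T) * y)

# y_star: threshold on Y equivalent to the threshold s_star on S_T
y_star = np.log(s_star / S0) / (sigma * np.sqrt(T)) + 0.5 * sigma * np.sqrt(T)

rng = np.random.default_rng(0)
Y = rng.standard_normal(100_000)

# {S_T <= s_star} and {Y <= y_star} agree sample by sample
assert np.array_equal(xi(Y) <= s_star, Y <= y_star)
```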

Therefore, problem (3.1) is of the form
(1.1) with

$R={\left(K-\xi (Y)\mathrm{exp}\left(-\frac{1}{2}{\sigma}^{2}\tau +\sigma \sqrt{\tau}Z\right)\right)}_{+},\qquad f(y,r)={(r-{p}_{\star})}_{+},\qquad \mathcal{A}=\{y\in \mathbb{R}:y\le {y}_{\star}\},$

and ${[Y,Z]}^{\prime}\sim {\mathcal{N}}_{2}(0,{I}_{2})$.
In this example, $\mathbb{P}(Y\in \mathcal{A})$ and $\mathbb{E}[R\mid Y]$ are
explicit. Indeed, $\mathbb{P}(Y\in \mathcal{A})=\mathrm{\Phi}({y}_{\star})$, where $\mathrm{\Phi}$ denotes the cumulative distribution function (cdf) of
the standard Gaussian distribution. Furthermore, $\mathbb{E}[R\mid Y]={\mathrm{\Phi}}_{\star}(\xi (Y))$, where

${\mathrm{\Phi}}_{\star}(s):=K\mathrm{\Phi}({d}_{+}(s))-s\mathrm{\Phi}({d}_{-}(s)),\quad \text{with}\quad {d}_{\pm}(s):=\frac{1}{\sigma \sqrt{\tau}}\mathrm{ln}\left(\frac{K}{s}\right)\pm \frac{1}{2}\sigma \sqrt{\tau};$

note that ${\varphi}_{\star}={\mathrm{\Phi}}_{\star}\circ \xi $. The parameter values for the numerical tests
are given in Table 1.

Table 1 Parameter values for the one-dimensional example.
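The closed-form Put price ${\mathrm{\Phi}}_{\star}$ can be checked directly against a crude Monte Carlo estimate. The sketch below uses hypothetical parameter values for $K$, $\sigma$ and $\tau$, not those of Table 1.

```python
import numpy as np
from math import erf, log, sqrt

# Hypothetical parameter values (not those of Table 1)
K, sigma, tau = 100.0, 0.2, 0.5

def Phi(x):
    # Standard Gaussian cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def Phi_star(s):
    # Phi_star(s) = K Phi(d_plus(s)) - s Phi(d_minus(s)), zero interest rate
    d_plus = log(K / s) / (sigma * sqrt(tau)) + 0.5 * sigma * sqrt(tau)
    d_minus = d_plus - sigma * sqrt(tau)
    return K * Phi(d_plus) - s * Phi(d_minus)

# Monte Carlo check of E[(K - S_{T'})_+ | S_T = s]
rng = np.random.default_rng(1)
s = 95.0
Z = rng.standard_normal(2_000_000)
S_Tp = s * np.exp(-0.5 * sigma**2 * tau + sigma * np.sqrt(tau) * Z)
mc = np.maximum(K - S_Tp, 0.0).mean()
assert abs(mc - Phi_star(s)) < 0.05
```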

Figure 1 Normalized histograms of the *M* points from the Markov chains with kernels ${\mathsf{P}}_{\mathrm{GL}}$ (top left) and ${\mathsf{P}}_{\mathrm{NR}}$ (top right), and from the i.i.d. sampler with rejection (bottom left).
Bottom right: Restricted to $[-6,{y}_{\star}]$, the cdf of *Y* given
$\{Y\in \mathcal{A}\}$, two MCMC approximations (with ${\mathsf{P}}_{\mathrm{GL}}$ and ${\mathsf{P}}_{\mathrm{NR}}$) and an i.i.d. approximation.

We first illustrate the behavior of the kernel ${\mathsf{P}}_{\mathrm{GL}}$ described
by Algorithm 2. Since *Y* is a standard
Gaussian random variable, we design ${\mathsf{P}}_{\mathrm{GL}}$ as a
Hastings–Metropolis sampler, with invariant distribution $\mu \mathrm{d}\lambda $ equal to the standard $\mathcal{N}(0,1)$ distribution restricted to $\mathcal{A}$
and with proposal distribution $q(x,\cdot )\mathrm{d}\lambda \equiv \mathcal{N}(\rho x,1-{\rho}^{2})$. Observe that this proposal kernel is reversible with respect to μ, see (2.6). Note that condition
(ii) in Corollary 3
becomes

$\underset{y\le {y}_{\star}}{sup}\mathrm{\Phi}\left(\frac{\rho y-{y}_{\star}}{\sqrt{1-{\rho}^{2}}}\right)<1,$

which holds true since $\rho >0$. In the following, the performance
of the kernel ${\mathsf{P}}_{\mathrm{GL}}$ is compared to that of the kernel
${\mathsf{P}}_{\mathrm{NR}}$, defined as a Hastings–Metropolis kernel with proposal
$q(x,\cdot )\mathrm{d}\lambda \equiv \mathcal{N}((1-\rho ){y}_{\star}+\rho x,1-{\rho}^{2})$ and with invariant distribution the standard Gaussian
distribution restricted to $\mathcal{A}$. As a main difference with ${\mathsf{P}}_{\mathrm{GL}}$, this proposal transition density *q* is not reversible
with respect to μ (whence the notation ${\mathsf{P}}_{\mathrm{NR}}$ for the kernel);
therefore, the acceptance-rejection ratio of the new point *z* is
given by (see equality (2.8))

$(1\wedge \mathrm{exp}({y}_{\star}(x-z)))\,{\mathbf{1}}_{z\le {y}_{\star}}.$
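For concreteness, one step of each kernel can be sketched as follows. This is a minimal illustration: $y_\star$ is set to $\Phi^{-1}(5.6\mathrm{e}-5)\approx -3.86$, consistent with the rare-event probability reported below, and $\rho =0.85$ as in the experiments.

```python
import numpy as np

# Minimal sketch of one step of each Hastings–Metropolis kernel targeting
# N(0,1) restricted to A = (-inf, y_star]; y_star ≈ Phi^{-1}(5.6e-5).
rho, y_star = 0.85, -3.86
rng = np.random.default_rng(2)

def step_GL(x):
    # Reversible proposal N(rho x, 1 - rho^2): the Hastings–Metropolis
    # ratio reduces to the indicator that the proposal stays in A.
    z = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal()
    return z if z <= y_star else x

def step_NR(x):
    # Non-reversible proposal N((1 - rho) y_star + rho x, 1 - rho^2);
    # acceptance ratio (1 ∧ exp(y_star (x - z))) 1_{z <= y_star}.
    z = (1.0 - rho) * y_star + rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal()
    if z <= y_star and rng.uniform() <= min(1.0, np.exp(y_star * (x - z))):
        return z
    return x

def chain(step, x0, M):
    xs, x = np.empty(M), x0
    for m in range(M):
        x = step(x)
        xs[m] = x
    return xs

X_gl = chain(step_GL, y_star, 10_000)
X_nr = chain(step_NR, y_star, 10_000)
# Both chains stay in A and actually move
assert X_gl.max() <= y_star and X_nr.max() <= y_star
assert X_gl.std() > 0 and X_nr.std() > 0
```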

In Figure 1 (bottom right), the true cdf of *Y* given $\{Y\in \mathcal{A}\}$ (a distribution supported on $(-\mathrm{\infty},{y}_{\star}]$) is
displayed on $[-6,{y}_{\star}]$, together with three empirical cdfs $x\mapsto {M}^{-1}{\sum}_{m=1}^{M}{\mathbf{1}}_{\{{X}^{(m)}\le x\}}$: the first one is
computed from i.i.d. $\mathcal{N}(0,1)$ samples kept by rejection on $\mathcal{A}$, and the second one (resp.
the third one) is computed from a Markov chain path ${X}^{(1:M)}$ of length
*M* with kernel ${\mathsf{P}}_{\mathrm{GL}}$ (resp. ${\mathsf{P}}_{\mathrm{NR}}$), started at
${X}^{(0)}={y}_{\star}$. The two kernels provide a similar approximation
of the true cdf. Here $M=1\mathrm{e}6$, and $\rho =0.85$ for both kernels. We also display the normalized histograms of the points ${X}^{(m)}$ sampled respectively from ${\mathsf{P}}_{\mathrm{GL}}$ (top left), ${\mathsf{P}}_{\mathrm{NR}}$ (top right) and the crude rejection algorithm with Gaussian proposal (bottom left). In the latter plot, the histogram is built with only around 50–60 points, which correspond to the accepted points among the $M=1\mathrm{e}6$ proposal points.

Figure 2 For different values of ρ, estimation of the autocorrelation function (over 100 independent runs) of the chain with kernel ${\mathsf{P}}_{\mathrm{GL}}$ (left) and ${\mathsf{P}}_{\mathrm{NR}}$ (right). Each curve is computed using $1\mathrm{e}6$ sampled points.

To assess the speed of convergence of the samplers ${\mathsf{P}}_{\mathrm{GL}}$ and ${\mathsf{P}}_{\mathrm{NR}}$ to their stationary distributions, we additionally plot in Figure 2 the autocorrelation function of both chains. For ${\mathsf{P}}_{\mathrm{GL}}$, the choice of ρ is quite significant, as observed in [11]; values of ρ around 0.9 usually give good results. For ${\mathsf{P}}_{\mathrm{NR}}$, the choice of ρ is less significant in this example, because the proposal takes advantage of the knowledge of the rare set. A comparison of acceptance rates is provided below (see
Figure 3 (left)).
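An empirical autocorrelation function of the kind estimated in Figure 2 can be sketched as below; as a correctness check, it is applied here to a plain AR(1) chain (not the restricted chains of the paper), whose theoretical autocorrelation at lag $k$ is $\rho^{k}$.

```python
import numpy as np

def acf(x, max_lag):
    # Empirical autocorrelation of a scalar chain, lags 0..max_lag
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * c0)
                     for k in range(max_lag + 1)])

# Check on an AR(1) chain z_t = rho z_{t-1} + eps_t: acf(k) ≈ rho^k
rho = 0.9
rng = np.random.default_rng(3)
z = np.empty(200_000)
z[0] = 0.0
eps = np.sqrt(1.0 - rho**2) * rng.standard_normal(len(z))
for t in range(1, len(z)):
    z[t] = rho * z[t - 1] + eps[t]

a = acf(z, 3)
assert abs(a[0] - 1.0) < 1e-10
assert abs(a[1] - rho) < 0.02
```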

Figure 3 Comparison of the MCMC samplers ${\mathsf{P}}_{\mathrm{GL}}$ (top) and
${\mathsf{P}}_{\mathrm{NR}}$ (bottom), for different values of $\rho \in \{0.1,\mathrm{\dots},0.9,0.99\}$. Left: Mean acceptance rate when
computing $\mathbb{P}(Y\le {y}_{\star}\mid Y\le {w}_{J-1})$ after *M*
iterations of the chain. Right: Estimation of $\mathbb{P}(Y\in \mathcal{A})$ by
combining splitting and MCMC.

We also illustrate the behavior of these two MCMC samplers for the
estimation of the rare-event probability $\mathbb{P}(Y\in \mathcal{A})$.
Following the approach of [11], we use the
decomposition

$\mathbb{P}(Y\in \mathcal{A})=\prod _{j=1}^{J}\mathbb{P}(Y\le {w}_{j}\mid Y\le {w}_{j-1})\approx \widehat{\pi}:=\prod _{j=1}^{J}\left(\frac{1}{M}\sum _{m=1}^{M}{\mathbf{1}}_{{X}^{(m,j)}\le {w}_{j}}\right),$

where ${w}_{0}=+\mathrm{\infty}>{w}_{1}>\mathrm{\cdots}>{w}_{J}={y}_{\star}$, and
$\{{X}^{(m,j)}:m\ge 0\}$ is a Markov chain with kernel
${\mathsf{P}}_{\mathrm{GL}}^{(j)}$ or ${\mathsf{P}}_{\mathrm{NR}}^{(j)}$ having the standard Gaussian
distribution restricted to $(-\mathrm{\infty},{w}_{j-1}]$ as invariant distribution. The *J*
intermediate levels are chosen such that $\mathbb{P}(Y\le {w}_{j}\mid Y\le {w}_{j-1})\approx 0.1$.
Figure 3 (right) displays the boxplot of
100 independent realizations of the estimator $\widehat{\pi}$ for
different values of $\rho \in \{0.1,\mathrm{\dots},0.9\}$; the horizontal
dotted line indicates the true value $\mathbb{P}(Y\in \mathcal{A})=5.6\mathrm{e}-5$.
Here $J=5$, $({w}_{1},\mathrm{\dots},{w}_{4})=(0,-1.6,-2.5,-3.2)$ and $M=1\mathrm{e}4$. Figure 3 (left) displays the
boxplot of 100 mean acceptance rates ${M}^{-1}{\sum}_{m=1}^{M}{\mathbf{1}}_{\{{X}^{(m,J)}={\stackrel{~}{X}}^{(m,J)}\}}$ computed along 100
independent chains $\{{X}^{(m,J)}:m\le M\}$, for different values of
ρ; the horizontal dotted line is set to 0.234, which is usually
chosen as the target rate when fixing design parameters in a
Hastings–Metropolis algorithm (see e.g. [22]). We
observe that the use of the non-reversible proposal kernel ${\mathsf{P}}_{\mathrm{NR}}$
yields more accurate results than ${\mathsf{P}}_{\mathrm{GL}}$; this is intuitively
easy to understand, since ${\mathsf{P}}_{\mathrm{NR}}$ better accounts for the point ${y}_{\star}$
around which one should sample.
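A bare-bones version of this splitting estimator with the kernels ${\mathsf{P}}_{\mathrm{GL}}^{(j)}$ can be sketched as follows. It is a simplified illustration: no burn-in is performed and each stage simply starts at the previous level, so a small bias remains; $y_\star \approx -3.86$ matches the reported probability $5.6\mathrm{e}-5$.

```python
import numpy as np

# Splitting estimate of P(Y <= y_star), Y ~ N(0,1), using the reversible
# kernel at each stage; levels as in the text, y_star ≈ Phi^{-1}(5.6e-5).
y_star, rho, M = -3.86, 0.85, 10_000
levels = [np.inf, 0.0, -1.6, -2.5, -3.2, y_star]   # w_0 > w_1 > ... > w_J
rng = np.random.default_rng(4)

pi_hat = 1.0
for w_prev, w in zip(levels[:-1], levels[1:]):
    # Chain invariant for N(0,1) restricted to (-inf, w_prev]; start inside
    x = 0.0 if np.isinf(w_prev) else w_prev
    hits = 0
    for _ in range(M):
        z = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal()
        if z <= w_prev:   # GL acceptance: keep the proposal if it stays in the set
            x = z
        hits += x <= w    # count visits below the next level
    pi_hat *= hits / M

# pi_hat should be close to P(Y <= y_star) ≈ 5.6e-5
assert 5e-6 < pi_hat < 6e-4
```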

Figure 4 Left: 1000 sampled points $({X}^{(m)},{R}^{(m)})$ (using the sampler ${\mathsf{P}}_{\mathrm{GL}}$), together with ${\varphi}_{\star}$. Right: A realization of the error function
$x\mapsto {\widehat{\varphi}}_{M}(x)-{\varphi}_{\star}(x)$ on
$[-5,{y}_{\star}]$, for different values of $L\in \{2,3,4\}$ and two different kernels for sampling ${X}^{(1:M)}$.

We now run Algorithm 1 for the estimation of the conditional
expectation $x\mapsto {\varphi}_{\star}(x)$ on $(-\mathrm{\infty},{y}_{\star}]$. The algorithm
is run with $M=1\mathrm{e}6$, successively with $\mathsf{P}={\mathsf{P}}_{\mathrm{GL}}$ and
$\mathsf{P}={\mathsf{P}}_{\mathrm{NR}}$, both with $\rho =0.85$; the *L* basis functions
are $\{x\mapsto {\varphi}_{\mathrm{\ell}}(x)={(\xi (x))}^{\mathrm{\ell}-1}:\mathrm{\ell}=1,\mathrm{\dots},L\}$ and
we consider successively $L\in \{2,3,4\}$. In
Figure 4 (right), the error function $x\mapsto {\widehat{\varphi}}_{M}(x)-{\varphi}_{\star}(x)$ is displayed for different values of *L* when
computing ${\widehat{\varphi}}_{M}$. It is displayed on the interval $[-5,{y}_{\star}]$,
which has probability larger than $1-5\mathrm{e}-3$
under the distribution of *Y* given $\{Y\in \mathcal{A}\}$ (see
Figure 1). Note that the errors may be quite large for *x* close to $-5$; however, these values are very unlikely (see Figure 1), and therefore these large errors are not representative of the global quadratic error. In Figure 4 (left), we display 1000 sampled points $({X}^{(m)},{R}^{(m)})$. These points are taken from the sampler ${\mathsf{P}}_{\mathrm{GL}}$, every twenty iterations, in order to obtain fairly uncorrelated design points. Observe that the regression function ${\varphi}_{\star}$ looks almost affine, which explains why the results with only $L=2$ are already quite accurate.
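The regression step for $L=2$ can be sketched end to end. The parameter values below are hypothetical; with a deeply in-the-money Put, ${\varphi}_{\star}(x)\approx K-\xi (x)$, so the fitted coefficients should be close to $(K,-1)$, illustrating the near-affine behavior noted above.

```python
import numpy as np

# End-to-end regression sketch for L = 2 (hypothetical parameter values):
# design points from the GL chain, one payoff R per point, least squares
# on the basis {1, xi(x)}.
S0, K, sigma, T, tau = 100.0, 100.0, 0.2, 1.0, 0.5
y_star, rho, M = -3.86, 0.85, 100_000
rng = np.random.default_rng(5)

def xi(y):
    return S0 * np.exp(-0.5 * sigma**2 * T + sigma * np.sqrt(T) * y)

# GL chain targeting N(0,1) restricted to (-inf, y_star]
X, x = np.empty(M), y_star
for m in range(M):
    z = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal()
    if z <= y_star:
        x = z
    X[m] = x

# One conditionally independent payoff per design point
Z = rng.standard_normal(M)
R = np.maximum(K - xi(X) * np.exp(-0.5 * sigma**2 * tau
                                  + sigma * np.sqrt(tau) * Z), 0.0)

# Least squares on phi_1(x) = 1, phi_2(x) = xi(x)
A = np.vander(xi(X), 2, increasing=True)
coef, *_ = np.linalg.lstsq(A, R, rcond=None)

# Deep in the money: phi_star ≈ K - xi, so coef ≈ (K, -1)
assert abs(coef[0] - K) < 10.0 and abs(coef[1] + 1.0) < 0.25
```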

Figure 5 Left: Monte Carlo approximations of $M\mapsto {\mathrm{\Delta}}_{M}$, and fitted curves of the form $M\mapsto \alpha +\beta /M$. Right: For different values of ρ and for three different values of *M*, boxplot of 100 independent estimates ${\widehat{\mathcal{I}}}_{M}$ when ${X}^{(1:M)}$ is sampled from a chain with kernel ${\mathsf{P}}_{\mathrm{GL}}$ (top) and ${\mathsf{P}}_{\mathrm{NR}}$ (bottom).

We finally illustrate Algorithm 1 for the estimation of
$\mathcal{I}$ (see (3.1)). In
Figure 5 (right), the boxplot of 100
independent outputs ${\widehat{\mathcal{I}}}_{M}$ of Algorithm 1
is displayed when run with $\mathsf{P}={\mathsf{P}}_{\mathrm{GL}}$ (top) and $\mathsf{P}={\mathsf{P}}_{\mathrm{NR}}$ (bottom); different values of ρ and *M* are
considered, namely $\rho \in \{0,0.1,0.5,0.85\}$ and $M\in \{5\mathrm{e}2,5\mathrm{e}3,1\mathrm{e}4\}$; the regression step is performed with $L=2$ basis
functions. Figure 5 (right) illustrates well the
benefit of using an MCMC sampler for the current regression problems:
when $\mathsf{P}={\mathsf{P}}_{\mathrm{GL}}$, compare the distributions for $\rho =0$
(i.i.d. samples) and $\rho =0.85$: observe the bias when $\rho =0$,
which does not disappear even when $M=1\mathrm{e}4$, and note that the variance
is very significantly reduced with $\rho =0.85$ (for $M=5\mathrm{e}2,5\mathrm{e}3,1\mathrm{e}4$ respectively,
the standard deviation is reduced by a factor of 1.11, 6.58 and 11.96).

Figure 5 (left) is an empirical verification of
the statement of Theorem 1. One hundred independent
runs of Algorithm 1 are performed, and for different values
of *M*, the quantities ${M}^{-1}{\sum}_{m=1}^{M}{({\widehat{\varphi}}_{M}({X}^{(m)})-{\varphi}_{\star}({X}^{(m)}))}^{2}$ are collected; here ${\widehat{\varphi}}_{M}$ is computed with
$L=2$ basis functions. The mean value over these 100 points is
displayed as a function of *M*; it is a Monte Carlo approximation of
${\mathrm{\Delta}}_{M}$ (see (2.12)). We compare two
implementations of Algorithm 1: first $\mathsf{P}={\mathsf{P}}_{\mathrm{GL}}$
with $\rho =0.85$, and then $\mathsf{P}={\mathsf{P}}_{\mathrm{NR}}$ with $\rho =0.85$.
Theorem 1 establishes that ${\mathrm{\Delta}}_{M}$ is upper
bounded by a quantity of the form $\alpha +\beta /M$; such a curve is
fitted by a least-squares technique (we obtain $\alpha =0.001$ for both
kernels, which is consistent with the theorem, since this term does
not depend on the Monte Carlo stages). The fitted curves are shown in
Figure 5 (left) and they demonstrate a good match between
the theory and the numerical studies.
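The $\alpha +\beta /M$ fit is an ordinary least-squares problem in the regressors $(1,1/M)$. A minimal sketch on synthetic data (illustrative values, not the paper's measurements):

```python
import numpy as np

# Least-squares fit of Delta_M ≈ alpha + beta / M (synthetic data, for
# illustration only -- not the paper's measurements)
Ms = np.array([500.0, 1000.0, 2000.0, 5000.0, 10000.0])
alpha_true, beta_true = 1e-3, 2.0
rng = np.random.default_rng(6)
D = alpha_true + beta_true / Ms + 1e-5 * rng.standard_normal(Ms.size)

# Regressors (1, 1/M); alpha is the M -> infinity limit of the fit
A = np.column_stack([np.ones_like(Ms), 1.0 / Ms])
(alpha, beta), *_ = np.linalg.lstsq(A, D, rcond=None)
assert abs(alpha - alpha_true) < 5e-4
assert abs(beta - beta_true) < 0.2
```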
