As noted in Section 2.5, estimating the counterfactual probability of any outcome of a plate appearance (e.g. $P[{O}_{i}^{({a}_{D},{a}_{S})}=\text{hit}]$) requires integrating over the distribution of intermediate outcomes as in Equation 1. In many applications of the G-computation algorithm, this requires evaluating an integral of very large dimension and usually necessitates some approximation (e.g. Monte Carlo integration). The assumption that the outcome of the *j*th pitch depends on the past pitches only through the count reduces the dimension of the integral considerably. For example, we show in the Supplementary Material that Equation 1 simplifies to

$$\begin{aligned}
P[O_i^{(a_D,a_S)}=\text{hit}\mid A_i=a_H]
&=\sum_j P[O_{ij}^{(a_D,a_S)}=\text{hit}\mid A_i=a_H]\\
&=\sum_j \int\int\int\int P[R_i=r\mid A_i=a]\\
&\quad\times P[C_{ij}^{(a_D,a_S)}=c_j\mid R_i=r,A_i=a]\\
&\quad\times P_D^{(g_j)}[D_{ij}=d_j\mid C_{ij}^{(a_D,a_S)}=c_j,R_i=r,A_i=a_H]\\
&\quad\times P_S^{(g_j)}[S_{ij}=s_j\mid D_{ij}=d_j,C_{ij}^{(a_D,a_S)}=c_j,A_i=a_H]\\
&\quad\times P[O_{ij}^{(d_j,s_j)}=\text{hit}\mid C_{ij}^{(a_D,a_S)}=c_j,A_i=a_H]\\
&\quad\,dr\,dc_j\,dd_j\,ds_j.
\end{aligned}\tag{8}$$

The probability distribution for the counterfactual count of the *j*th pitch, $P[C_{ij}^{(a_D,a_S)}\mid R_i=r,A_i=a_H]$, can be defined recursively from the probability of the count of the previous pitch and the probability of the outcome of the previous pitch. For example, $P[C_{i1}^{(a_D,a_S)}=0\text{-}0\mid R_i=r,A_i=a_H]=1$, and then the probability of having count *b*-*s* on pitch *j* is given by:

$$\begin{aligned}
&P[C_{ij}^{(a_D,a_S)}=b\text{-}s\mid R_i=r,A_i=a_H]\\
&\quad=P[O_{ij-1}^{(a_D,a_S)}=\text{ball}\mid C_{ij-1}^{(a_D,a_S)}=(b-1)\text{-}s,R_i=r,A_i=a_H]\\
&\qquad\times P[C_{ij-1}^{(a_D,a_S)}=(b-1)\text{-}s\mid R_i=r,A_i=a_H]\\
&\quad+P[O_{ij-1}^{(a_D,a_S)}\in\{\text{whiff, called strike, foul}\}\mid C_{ij-1}^{(a_D,a_S)}=b\text{-}(s-1),R_i=r,A_i=a_H]\\
&\qquad\times P[C_{ij-1}^{(a_D,a_S)}=b\text{-}(s-1)\mid R_i=r,A_i=a_H]\\
&\quad+I[s=2]\,P[O_{ij-1}^{(a_D,a_S)}=\text{foul}\mid C_{ij-1}^{(a_D,a_S)}=b\text{-}s,R_i=r,A_i=a_H]\\
&\qquad\times P[C_{ij-1}^{(a_D,a_S)}=b\text{-}s\mid R_i=r,A_i=a_H].
\end{aligned}\tag{9}$$

That is, provided that we can estimate $P[O_{ij-1}^{(a_D,a_S)}=\text{ball}\mid C_{ij-1}^{(a_D,a_S)}=(b-1)\text{-}s,R_i=r,A_i=a_H]$, $P[O_{ij-1}^{(a_D,a_S)}\in\{\text{whiff, called strike, foul}\}\mid C_{ij-1}^{(a_D,a_S)}=b\text{-}(s-1),R_i=r,A_i=a_H]$, and $P[O_{ij-1}^{(a_D,a_S)}=\text{foul}\mid C_{ij-1}^{(a_D,a_S)}=b\text{-}s,R_i=r,A_i=a_H]$, we can estimate $P[C_{ij}^{(a_D,a_S)}=b\text{-}s\mid R_i=r,A_i=a_H]$ recursively. We now discuss how each of those terms may be estimated. We note that if the assumptions in Section 2.6 are valid, the first term on the right-hand side of Equation 9 is given by

$$\begin{aligned}
&P[O_{ij-1}^{(a_D,a_S)}=\text{ball}\mid C_{ij-1}^{(a_D,a_S)}=(b-1)\text{-}s,R_i=r,A_i=a_H]\\
&\quad=\int P_D^{(g_{j-1})}[D_{ij-1}=d_{j-1}\mid C_{ij-1}^{(a_D,a_S)}=(b-1)\text{-}s,R_i=r,A_i=a_H]\\
&\qquad\times P_S^{(g_{j-1})}[S_{ij-1}=0\mid D_{ij-1}=d_{j-1},C_{ij-1}^{(a_D,a_S)}=(b-1)\text{-}s,A_i=a_H]\\
&\qquad\times P[O_{ij-1}^{(a_D,a_S)}=\text{ball}\mid D_{ij-1}=d_{j-1},S_{ij-1}=0,C_{ij-1}^{(a_D,a_S)}=(b-1)\text{-}s,A_i=a_H]\,dd_{j-1}\\
&\quad=\int P[D_{ij-1}=d_{j-1}\mid C_{ij-1}=(b-1)\text{-}s,R_i=r,A_i=a_D]\\
&\qquad\times P[S_{ij-1}=0\mid D_{ij-1}=d_{j-1},C_{ij-1}=(b-1)\text{-}s,A_i=a_S]\\
&\qquad\times P[O_{ij-1}=\text{ball}\mid D_{ij-1}=d_{j-1},S_{ij-1}=0,C_{ij-1}=(b-1)\text{-}s,A_i=a_H]\,dd_{j-1},
\end{aligned}\tag{10}$$

demonstrating how the conditional distribution of potential outcomes (e.g. $O_{ij-1}^{(a_D,a_S)}=\text{ball}$) under regimes $a_D$ and $a_S$ with batter $a_H$ hitting is identified from observed data. We discussed in Sections 3.2–3.4 how to estimate each of the conditional probabilities of the observed data. In particular, the estimated distributions are all discrete, so the integrals simplify to summations. Therefore, we can estimate $P[O_{ij-1}^{(a_D,a_S)}=\text{ball}\mid C_{ij-1}^{(a_D,a_S)}=(b-1)\text{-}s,R_i=r,A_i=a_H]$ using

$$\begin{aligned}
&\widehat{P}[O_{ij-1}^{(a_D,a_S)}=\text{ball}\mid C_{ij-1}^{(a_D,a_S)}=(b-1)\text{-}s,R_i=r,A_i=a_H]\\
&\quad=\sum_{d_{j-1}\in\mathcal{D}_{(b-1)\text{-}s,r}}\widehat{P}[D_{ij-1}=d_{j-1}\mid C_{ij-1}=(b-1)\text{-}s,R_i=r,A_i=a_D]\\
&\qquad\times\widehat{P}[S_{ij-1}=0\mid D_{ij-1}=d_{j-1},C_{ij-1}=(b-1)\text{-}s,A_i=a_S]\\
&\qquad\times\widehat{P}[O_{ij-1}=\text{ball}\mid D_{ij-1}=d_{j-1},S_{ij-1}=0,C_{ij-1}=(b-1)\text{-}s,A_i=a_H],
\end{aligned}\tag{11}$$

where $\mathcal{D}_{c,r}$ indicates the support of pitch characteristics for ball-strike count *c* from pitchers with arsenal *r*, and $\widehat{P}[\cdot]$ are the estimated probabilities from the models described in Section 3 fitted to the data. A similar approach can be used to obtain estimates of $P[O_{ij-1}^{(a_D,a_S)}\in\{\text{whiff, called strike, foul}\}\mid C_{ij-1}^{(a_D,a_S)}=b\text{-}(s-1),R_i=r,A_i=a_H]$ and $P[O_{ij-1}^{(a_D,a_S)}=\text{foul}\mid C_{ij-1}^{(a_D,a_S)}=b\text{-}s,R_i=r,A_i=a_H]$, so that we can estimate $P[C_{ij}^{(a_D,a_S)}=b\text{-}s\mid R_i=r,A_i=a_H]$ by plugging in estimated quantities using the observed data for each term in the recursive formula given in Equation 9.
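The plug-in sum of Equation 11 and the recursion of Equation 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the pitch types (`fastball`, `curve`), and all toy probabilities are hypothetical, and the toy model uses the same probabilities at every count, whereas the fitted models of Section 3 would vary with count, arsenal, and batter. The recursion is written in its equivalent forward form, pushing probability mass from each count at pitch *j* − 1 to the counts reachable at pitch *j*.

```python
from collections import defaultdict


def marginal_outcome_prob(outcome, pitch_dist, swing_prob, outcome_prob):
    """Equation 11 style plug-in sum over pitch types d and swing decisions s.

    pitch_dist[d]           ~ P-hat[D = d | C = c, R = r, A = a_D]
    swing_prob[d]           ~ P-hat[S = 1 | D = d, C = c, A = a_S]
    outcome_prob[(d, s)][o] ~ P-hat[O = o | D = d, S = s, C = c, A = a_H]
    """
    total = 0.0
    for d, p_d in pitch_dist.items():
        for s in (0, 1):
            p_s = swing_prob[d] if s == 1 else 1.0 - swing_prob[d]
            total += p_d * p_s * outcome_prob[(d, s)].get(outcome, 0.0)
    return total


def count_distributions(outcome_probs_at, max_pitches=15):
    """Forward form of the Equation 9 recursion.

    outcome_probs_at(count) returns a dict of per-pitch outcome probabilities
    at count (balls, strikes). Returns a list whose entry j-1 maps each live
    count to its probability at pitch j. Mass that ends the plate appearance
    (ball in play, walk, strikeout) is dropped from the live counts.
    """
    dist = {(0, 0): 1.0}  # pitch 1 is always thrown at 0-0
    history = [dist]
    for _ in range(1, max_pitches):
        nxt = defaultdict(float)
        for (b, s), p_c in dist.items():
            probs = outcome_probs_at((b, s))
            strike_like = probs["whiff"] + probs["called_strike"] + probs["foul"]
            if b < 3:
                nxt[(b + 1, s)] += p_c * probs["ball"]
            if s < 2:
                nxt[(b, s + 1)] += p_c * strike_like
            else:
                # I[s = 2] term: a foul with two strikes leaves the count unchanged
                nxt[(b, s)] += p_c * probs["foul"]
        dist = dict(nxt)
        history.append(dist)
    return history


# Hypothetical toy inputs, identical at every count (an assumption for brevity).
TOY_PITCH_DIST = {"fastball": 0.6, "curve": 0.4}
TOY_SWING_PROB = {"fastball": 0.5, "curve": 0.4}
TOY_OUTCOME_PROB = {
    ("fastball", 0): {"ball": 0.6, "called_strike": 0.4},
    ("curve", 0): {"ball": 0.7, "called_strike": 0.3},
    ("fastball", 1): {"whiff": 0.2, "foul": 0.4, "in_play": 0.4},
    ("curve", 1): {"whiff": 0.3, "foul": 0.3, "in_play": 0.4},
}
OUTCOMES = ("ball", "called_strike", "whiff", "foul", "in_play")


def toy_outcome_probs_at(count):
    return {
        o: marginal_outcome_prob(o, TOY_PITCH_DIST, TOY_SWING_PROB, TOY_OUTCOME_PROB)
        for o in OUTCOMES
    }


history = count_distributions(toy_outcome_probs_at)
for j, dist in enumerate(history[:3], start=1):
    print(j, {c: round(p, 4) for c, p in sorted(dist.items())})
```

Because a ball can only occur on a pitch that is not swung at, the sum for `ball` reduces to the $S_{ij-1}=0$ term, as in Equation 11. The same helper covers the whiff, called strike, and foul terms.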

Returning to Equation 8, using the G-computation algorithm and substituting the estimated conditional probabilities of the observed data for the conditional distribution of the potential outcomes gives

$$\begin{aligned}
&\widehat{P}[O_i^{(a_D,a_S)}=\text{hit}\mid A_i=a_H]\\
&\quad=\sum_j\sum_{r\in\mathcal{R}}\sum_{c_j\in\mathcal{C}}\sum_{d_j\in\mathcal{D}_{c_j,r}}\sum_{s_j\in\{0,1\}}\widehat{P}[R_i=r\mid A_i=a_D]\\
&\qquad\times\widehat{P}[C_{ij}^{(a_D,a_S)}=c_j\mid R_i=r,A_i=a_H]\\
&\qquad\times\widehat{P}[D_{ij}=d_j\mid C_{ij}=c_j,R_i=r,A_i=a_D]\\
&\qquad\times\widehat{P}[S_{ij}=s_j\mid D_{ij}=d_j,C_{ij}=c_j,A_i=a_S]\\
&\qquad\times\widehat{P}[O_{ij}=\text{hit}\mid D_{ij}=d_j,S_{ij}=s_j,C_{ij}=c_j,A_i=a_H],
\end{aligned}\tag{12}$$

where $\mathcal{C}=\{0\text{-}0,1\text{-}0,0\text{-}1,\dots,3\text{-}2,\text{plate appearance complete}\}$ is the set of all possible pitch counts and $\mathcal{R}$ is the set of all arsenals. A very similar approach can be used to estimate $P[O_i^{(a_D,a_S)}=\text{out}\mid A_i=a_H]$ and $P[O_i^{(a_D,a_S)}=\text{walk}\mid A_i=a_H]$, and hence the counterfactual batting average, on-base percentage, and slugging percentage, which we describe in detail in the Supplementary Material.
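A minimal Python sketch of the nested sum in Equation 12 follows. It is an illustration under assumptions, not the authors' code: `hit_probability`, the arsenal labels, the toy lookup functions, and every number are hypothetical, and the counterfactual count distributions are supplied by hand and truncated after two pitches. The "plate appearance complete" state is simply omitted from the count dictionaries, since it contributes nothing to the hit probability.

```python
def hit_probability(arsenal_dist, count_history, pitch_dist, swing_prob, hit_prob):
    """Plug-in sum in the spirit of Equation 12.

    arsenal_dist[r]        ~ P-hat[R = r | A = a_D]
    count_history[r][j][c] ~ P-hat[C_j = c | R = r, A = a_H] (counterfactual count)
    pitch_dist(c, r)[d]    ~ P-hat[D = d | C = c, R = r, A = a_D]
    swing_prob(d, c)       ~ P-hat[S = 1 | D = d, C = c, A = a_S]
    hit_prob(d, s, c)      ~ P-hat[O = hit | D = d, S = s, C = c, A = a_H]
    """
    total = 0.0
    for r, p_r in arsenal_dist.items():            # sum over arsenals r
        for counts_j in count_history[r]:          # sum over pitch index j
            for c, p_c in counts_j.items():        # sum over counts c_j
                for d, p_d in pitch_dist(c, r).items():  # sum over pitch types d_j
                    p_swing = swing_prob(d, c)
                    for s, p_s in ((0, 1.0 - p_swing), (1, p_swing)):
                        total += p_r * p_c * p_d * p_s * hit_prob(d, s, c)
    return total


# Hypothetical toy inputs; every value below is made up for illustration.
TOY_ARSENALS = {"power": 0.5, "finesse": 0.5}
TOY_COUNTS = [{(0, 0): 1.0}, {(1, 0): 0.348, (0, 1): 0.468}]  # pitches 1 and 2 only
TOY_HISTORY = {r: TOY_COUNTS for r in TOY_ARSENALS}


def toy_pitch_dist(count, arsenal):
    return {"fastball": 0.6, "curve": 0.4}


def toy_swing_prob(pitch, count):
    return 0.5 if pitch == "fastball" else 0.4


def toy_hit_prob(pitch, swing, count):
    if swing == 0:
        return 0.0  # a hit requires a swing
    return 0.12 if pitch == "fastball" else 0.10


p_hit = hit_probability(
    TOY_ARSENALS, TOY_HISTORY, toy_pitch_dist, toy_swing_prob, toy_hit_prob
)
print(round(p_hit, 6))
```

Passing the fitted models as callables keeps the summation separate from how each conditional probability is estimated, so the same loop could in principle be reused for the out and walk probabilities by swapping the last argument.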
