We now turn our attention to measuring the impact framing has on the game. Formally, let *S* be a random variable counting the number of runs the pitching team gives up after the current pitch to the end of the half-inning. Using slightly different notation than that in Section 2.3, let **h** encode the handedness of the batter and pitcher and let **lo** be the estimated log-odds of a called strike from the appropriate historical GAM. Let **b**, **ca**, **co**, **p** and **u** denote the batter, catcher, count, pitcher, and umpire involved in the pitch. Finally, denote the baseline catcher Brayan Pena by *ca*_{0}. For compactness, let *ξ* = (**u**, **co**, **lo**, **b**, **p**, **h**) and observe that every pitch in our dataset can be identified by the combination (**ca**, *ξ*). For each catcher *ca*, let ${\mathcal{P}}_{ca}$ be the set of all called pitches caught by catcher *ca*:

$${\mathcal{P}}_{ca}=\{(\mathbf{ca},\xi ):\mathbf{ca}=ca\}.$$

Finally, let *TAKEN* be an indicator for the event that the current pitch was taken and let *CALL* ∈ {*Ball*, *Strike*} be the umpire’s ultimate call. We will be interested in the expected value of *S*, conditioned on (**ca**, *ξ*), the fact that the pitch was taken, and the umpire’s call. Assuming that, conditioned on the count, the fact that the pitch was taken, and the call, *S* is independent of pitch location and participants, we have

$$E[S\,|\,\mathbf{ca},\xi ,TAKEN]=\sum _{CALL}E[S\,|\,COUNT,TAKEN,CALL]\,P(CALL\,|\,\mathbf{ca},\xi ,TAKEN).$$
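To make the law-of-total-expectation step concrete, here is a minimal sketch. The two run-expectancy entries are the empirical 0–1 values quoted later in this section; `p_strike` stands in for the model's posterior called-strike probability on a particular pitch.

```python
# E[S | count, taken, call]: expected runs given up in the rest of the
# half-inning. The 0-1 entries are the empirical averages reported in the text.
run_exp = {("0-1", "Ball"): 0.322, ("0-1", "Strike"): 0.265}

def expected_runs(p_strike, count, run_exp):
    """E[S | ca, xi, TAKEN]: average E[S | count, taken, call] over the
    umpire's call, weighted by P(call | ca, xi, taken)."""
    return ((1.0 - p_strike) * run_exp[(count, "Ball")]
            + p_strike * run_exp[(count, "Strike")])

# A taken 0-1 pitch with a 50% called-strike probability:
e_runs = expected_runs(0.5, "0-1", run_exp)
```

Raising the called-strike probability moves the expectation from the ball value toward the (lower) strike value, which is exactly the channel through which framing affects runs.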

To determine the expected number of runs given up that can be attributed to a catcher *ca*, we may consider the counter-factual scenario in which the catcher is replaced by the baseline catcher, Brayan Pena, with all other factors remaining the same. In this scenario, the expected number of runs the fielding team gives up in the remainder of the half-inning is *E*[*S∣***ca** = *ca*_{0}, *ξ*, *TAKEN*, *CALL*]. We may interpret the difference

$$E[S\,|\,\mathbf{ca}=ca,\xi ,TAKEN,CALL]-E[S\,|\,\mathbf{ca}=ca_{0},\xi ,TAKEN,CALL]$$

as the average number of runs saved (i.e. negative of runs given up) by catcher *ca*’s framing, relative to the baseline. A straightforward calculation shows that this difference is exactly equal to

$$f(ca,\xi )=\left(P(Strike\,|\,\mathbf{ca}=ca,\xi ,TAKEN)-P(Strike\,|\,\mathbf{ca}=ca_{0},\xi ,TAKEN)\right)\times \rho (COUNT),$$

where

$$\rho (COUNT)=E[S\,|\,COUNT,TAKEN,Ball]-E[S\,|\,COUNT,TAKEN,Strike].$$

We can interpret the difference in called strike probabilities above as catcher *ca*’s *framing effect*: it is precisely how much more the catcher adds to the umpire’s called strike probability than the baseline catcher does, over and above the other pitch participants, pitch location, and count. We can easily simulate approximate draws from the posterior distribution of this difference using the posterior samples from Model 3. We interpret *ρ* as the value of a called strike in a given count: it measures how many more runs a team is expected to give up if a taken pitch is called a ball as opposed to a strike.
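As a sketch of how the framing-effect draws are formed, suppose we hold posterior draws of the called-strike probability for catcher *ca* and for the baseline on the same pitch. The Beta draws below are stand-ins for actual Model 3 output; only the 0–1 run value comes from this section.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1000
# Stand-ins for posterior draws of P(Strike | ca = ca, xi, TAKEN) and
# P(Strike | ca = ca0, xi, TAKEN) on one pitch; Model 3 supplies the real ones.
p_ca = rng.beta(60, 40, size=n_draws)
p_ca0 = rng.beta(55, 45, size=n_draws)
rho_01 = 0.057  # run value of a called strike on 0-1 (from the text)

# Posterior draws of f(ca, xi): runs saved by ca's framing on this pitch
f_draws = (p_ca - p_ca0) * rho_01
```

Each draw of the probability difference is simply scaled by the count's run value, so the posterior of *f*(*ca*, *ξ*) is available at no extra cost once the strike-probability draws exist.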

Table 3: Empirical estimates of run expectancy and run value, with standard errors in parentheses.

To compute *ρ*, we begin by computing the difference in the average numbers of runs scored after a called ball and after a called strike in each count. For instance, 182,405 0–1 pitches were taken (140,667 balls, 41,738 called strikes) between 2011 and 2014. The fielding team gave up an average of 0.322 runs following a ball on a taken 0–1 pitch, while they only gave up an average of 0.265 runs following a called strike on a taken 0–1 pitch. So conditional on an 0–1 pitch being taken, a called strike saves the fielding team 0.057 runs, on average. Table 3 shows the number of runs scored after a called ball or a called strike for each count, as well as an estimate of *ρ*. Also shown is the relative proportion of each count among our dataset of taken pitches from 2011 to 2014. We see, for instance, that a called strike is most valuable on a 3–2 pitch but only 2.1% of the taken pitches in our dataset occurred in a 3–2 count. This calculation is very similar to the seminal run expectancy calculation of Lindsey (1963), though ours is based solely on count rather than on the number of outs and base-runner configuration. Albert (2010) also computes a count-based run expectancy, though his valuations are derived using the linear weights formula of Thorn and Palmer (1985) rather than the simple average. See Albert (2015) for a more in-depth discussion of run expectancy.
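The 0–1 example above reduces to a one-line difference of averages; the figures below are exactly those quoted in the text.

```python
# Run value of a called strike on a taken 0-1 pitch, 2011-2014:
# the difference in average runs given up after each call.
n_ball, runs_after_ball = 140667, 0.322
n_strike, runs_after_strike = 41738, 0.265

n_taken = n_ball + n_strike  # 182,405 taken 0-1 pitches
rho_01 = runs_after_ball - runs_after_strike  # runs saved by a called strike
```

Repeating this for every count fills in the *ρ* column of Table 3.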

The weighted average run value of a called strike based on Table 3 is 0.11 runs, slightly smaller than the value of 0.14 used by Judge et al. (2015) and much smaller than the 0.161 figure used by Turkenkopf (2008). The discrepancy stems from the fact that we estimated the run values based only on taken pitches, while most other valuations of strikes include swinging strikes and strikes called off of foul balls. It is worth stressing at this point that in our subsequent calculations of framing impact we use the count-based run valuation as opposed to the weighted average value.
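The weighted average is *ρ* averaged against the count frequencies from Table 3. The frequencies and most run values below are hypothetical stand-ins (the table itself is not reproduced here); only the 0–1 value comes from the text.

```python
# Weighted average run value of a called strike: rho(count) weighted by the
# relative frequency of each count among taken pitches. All entries except
# the 0-1 run value (0.057) are hypothetical stand-ins for Table 3.
rho = {"0-0": 0.08, "0-1": 0.057, "3-2": 0.35}
freq = {"0-0": 0.50, "0-1": 0.30, "3-2": 0.20}

avg_value = sum(freq[c] * rho[c] for c in rho)
```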

Table 4: Top and Bottom 10 catchers according to the posterior mean of RS.

With our posterior samples and estimates of *ρ* in hand, we can simulate draws from the posterior distribution of *f*(*ca*, *ξ*) for each pitch in our dataset. An intuitive measure of the impact of catcher *ca*’s framing, which we denote RS for “runs saved” is

$$\text{RS}\left(ca\right)=\sum _{(\mathbf{ca},\xi )\in {\mathcal{P}}_{ca}}f(ca,\xi ).$$
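The RS computation can be sketched as follows: with posterior draws of *f*(*ca*, *ξ*) stored per pitch, summing across pitches within each draw yields posterior draws of RS, from which the summaries in Table 4 follow. The arrays here are simulated stand-ins, not model output.

```python
import numpy as np

# Stand-in for posterior draws of f(ca, xi) on each of ca's called pitches:
# rows index posterior draws, columns index pitches in P_ca.
rng = np.random.default_rng(1)
n_draws, n_pitches = 1000, 500
f_draws = rng.normal(loc=0.001, scale=0.002, size=(n_draws, n_pitches))

rs_draws = f_draws.sum(axis=1)  # posterior draws of RS(ca)
rs_mean, rs_sd = rs_draws.mean(), rs_draws.std()
rs_ci = np.percentile(rs_draws, [2.5, 97.5])  # 95% credible interval
```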

The calculation of RS is very similar to the one used by Judge et al. (2015) to estimate the impact framing has on the game. Rather than using a fixed baseline catcher, Judge et al. (2015) report the difference in expected runs saved relative to a hypothetical average catcher. According to their model, Brayan Pena, our baseline catcher, was no different from this average catcher, so our estimates of RS may be compared to their results. Table 4 shows the top and bottom 10 catchers, along with the number of pitches in our dataset received by each catcher, and the posterior mean, standard deviation, and 95% credible interval of their RS values. Also shown are Judge et al. (2015)’s estimates of runs saved for these catchers, as well as the number of pitches used in their analysis.

According to our model, there is little posterior uncertainty that the framing effects of the top 10 catchers shown in Table 4 had a positive impact for their teams, relative to the baseline catcher. Similarly, with the exception of Welington Castillo and Chris Iannetta, we are rather certain that the bottom 10 catchers’ framing had an overall negative impact, relative to the baseline. We estimate that Miguel Montero’s framing saved his team 25.71 runs on average, relative to the baseline. That is, had he been replaced by the baseline catcher on each of the 8086 called pitches he received, his team would have given up an additional 25.71 runs, on average. Unsurprisingly, our estimates of framing impact differ from those of Judge et al. (2015)’s model. This is largely due to differences in the model construction, valuation of a called strike, and collection of pitches analyzed. Indeed, in some cases (e.g. Montero and Rene Rivera), they used more pitches to arrive at their estimates of runs saved, while in others (e.g. Mike Zunino and Jonathan Lucroy), we used more pitches. Nevertheless, our estimates are not wholly incompatible with theirs; the correlation between our estimates and theirs is 0.94. Moreover, if we re-scale their estimates to the same number of pitches we consider, we find overwhelmingly that these re-scaled estimates fall within our 95% posterior credible intervals.

## 4.1 Catcher aggregate framing effect

Looking at Table 4, it is tempting to say that Miguel Montero is the best framer. After all, he is estimated to have saved the most expected runs relative to the baseline catcher. We observe, however, that Montero received 8086 called pitches while Hank Conger received only 4743. How much of the difference in the estimated number of runs saved is due to their framing ability and how much to the disparity in the called pitches they received?

A naive solution is to re-scale the RS estimates and compare the average number of runs saved on a per-pitch basis. While this accounts for the differences in the number of pitches received, it does not address the fact that Montero appeared with different players than Conger did and that the spatial distribution of pitches he received is not identical to Conger’s. In other words, even if we convert the results of Table 4 to a per-pitch basis, the results would still be confounded by pitch location, count, and pitch participants.

To overcome this dependence, we propose to *integrate f*(*ca*, *ξ*) over all *ξ* rather than summing *f*(*ca*, *ξ*) over ${\mathcal{P}}_{ca}$. Such a calculation is similar to the spatially aggregate fielding evaluation (SAFE) of Jensen, Shirley, and Wyner (2009). They integrated the average number of runs saved by a player successfully fielding a ball put in play against the estimated density of location and velocity of these balls to derive an overall fielding metric un-confounded by disparities in players’ fielding opportunities. We propose to integrate *f*(*ca*, *ξ*) against the empirical distribution of *ξ* and define catcher *ca*’s “Catcher Aggregate Framing Effect” or CAFE to be

$$\text{CAFE}\left(ca\right)=4000\times \frac{1}{N}\sum _{\xi}f(ca,\xi ).$$(2)

The sum in Equation 2 may be viewed as the number of expected runs catcher *ca* saves relative to the baseline if he participated in every pitch in our dataset. We then re-scale this quantity to reflect the impact of his framing on 4000 “average” pitches. We opted to re-scale CAFE by 4000 as the average number of called pitches received by catchers who appeared in more than 25 games was just over 3992. Of course, we could have easily re-scaled by a different amount.
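Equation 2 can be sketched directly: average the framing effect over every pitch in the dataset, not just the catcher's own, and re-scale to 4000 pitches. The array below is a simulated stand-in for posterior draws of *f* over all *N* pitches.

```python
import numpy as np

# Stand-in for posterior draws of f(ca, xi) over all N pitches in the dataset:
# rows index posterior draws, columns index pitches (every xi, not just P_ca).
rng = np.random.default_rng(2)
n_draws, N = 1000, 10_000
f_all = rng.normal(loc=0.0005, scale=0.001, size=(n_draws, N))

# CAFE(ca) = 4000 * (1/N) * sum over xi of f(ca, xi), per posterior draw
cafe_draws = 4000 * f_all.mean(axis=1)
cafe_mean = cafe_draws.mean()
```

Because every catcher is evaluated against the same empirical distribution of *ξ*, CAFE removes the confounding by opportunity that affects RS.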

Table 5: Top and Bottom 10 catchers according to the posterior mean of CAFE.

Once again, we can use our simulated posterior samples of the Θ^{u}’s to simulate draws from the posterior distribution of CAFE. Table 5 shows the top and bottom 10 catchers ranked according to the posterior mean of their CAFE value, along with the posterior standard deviation and 95% credible interval for their CAFE value. Also shown is a 95% interval of each catcher’s marginal rank according to CAFE.

We see that several of the catchers from Table 4 also appear in Table 5. The new additions to the top ten, Christian Vazquez, Martin Maldonado, Chris Stewart, and Francisco Cervelli, were ranked 13th, 17th, 18th and 19th according to the RS metric. The fact that they rose so much in the rankings when we integrated over all *ξ* indicates that their original rankings were driven primarily by the fact that they all received considerably fewer pitches in the 2014 season than the top 10 catchers in Table 4. In particular, Vazquez received 3198 called pitches, Cervelli received 2424, Stewart received 2370, and Maldonado received only 1861.

Interestingly, we see that Hank Conger now ranks ahead of Miguel Montero according to the posterior mean of CAFE, indicating that the relative rankings in Table 4 were driven at least partially by disparities in the pitches the two received rather than by differences in their framing effects. Though Conger emerges as a slightly better framer than Montero in terms of CAFE, the difference between the two is small, as evidenced by the considerable overlap in their 95% posterior credible intervals.

We find that in 95% of the posterior samples, Conger had anywhere between the largest and 11th largest CAFE. In contrast, we see that in 95% of our posterior samples, Tomas Tellis’s CAFE was among the bottom 3 CAFE values. Interestingly, we find much wider credible intervals for the marginal ranks among the bottom 10 catchers. Some catchers, like Koyie Hill and Austin Romine, appeared very infrequently in our dataset. To wit, Hill received only 409 called pitches and Romine received only 61. As we might expect, there is considerable uncertainty in our estimates of their framing impact, as indicated by the rather wide credible intervals of their marginal ranks.
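The marginal rank intervals reported above can be sketched as follows: within each posterior draw, rank all catchers by CAFE, then take percentiles of each catcher's rank across draws. The CAFE draws here are simulated stand-ins for an illustrative field of 20 catchers.

```python
import numpy as np

# Simulated stand-in: posterior CAFE draws for 20 catchers with evenly
# spaced "true" values, so the ranking is noisy but not arbitrary.
rng = np.random.default_rng(3)
true_cafe = np.linspace(2.0, -2.0, 20)
cafe_draws = rng.normal(loc=true_cafe, scale=0.5, size=(1000, 20))

# Within each draw, rank catchers by CAFE (rank 1 = largest CAFE).
ranks = (-cafe_draws).argsort(axis=1).argsort(axis=1) + 1

# 95% interval of each catcher's marginal rank across posterior draws.
rank_ci = np.percentile(ranks, [2.5, 97.5], axis=0)
```

Catchers with few pitches get diffuse CAFE posteriors and correspondingly wide rank intervals, matching the pattern seen for Hill and Romine.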

## 4.2 Year-to-year reliability of CAFE

We now consider how consistent CAFE is over multiple seasons. We re-fit our model using data from the 2012 to 2015 seasons. For each season, we restrict attention to those pitches within one foot of the approximate rule book strike zone from that season. We also use the log-odds from the GAM models trained on all previous seasons, so that the model fit to the 2012 data uses GAM forecasts trained only on data from 2011 while the model fit to the 2015 data uses GAM forecasts trained only on data from 2011 to 2014. When computing the values of CAFE, we use the run values given in Table 3 for each season. There were a total of 56 catchers who appeared in all four of these seasons. Table 6 shows the correlation between their CAFE values over time.

Table 6: Correlation of CAFE across multiple seasons.

In light of the non-stationarity in strike zone enforcement across seasons, it is encouraging to find moderate to high correlation between a player’s CAFE in one season and the next. In terms of year-to-year reliability, the autocorrelations of 0.5–0.7 place CAFE on par with slugging percentage for batters. Interestingly, the correlation between 2012 CAFE and 2013 CAFE and the correlation between 2013 CAFE and 2014 CAFE are both >0.7, but the correlation between 2014 CAFE and 2015 CAFE is somewhat lower, 0.58. While this could just be an artifact of noise, we do note that there was a marked uptick in awareness of framing between the 2014 and 2015 seasons, especially among fans and in the popular press. One possible reason for the drop in correlation might be umpires responding to certain catchers’ reputations as elite pitch framers by calling stricter strike zones, a possibility suggested by Sullivan (2016).
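The entries of Table 6 are ordinary Pearson correlations over the catchers who appeared in both seasons being compared. A minimal sketch with simulated stand-ins for the 56 catchers:

```python
import numpy as np

# Simulated stand-in: CAFE values for 56 catchers in consecutive seasons,
# constructed with moderate true correlation to mimic Table 6's pattern.
rng = np.random.default_rng(4)
cafe_2014 = rng.normal(size=56)
cafe_2015 = 0.6 * cafe_2014 + 0.8 * rng.normal(size=56)

# Year-to-year reliability: correlation across the common catchers.
r = np.corrcoef(cafe_2014, cafe_2015)[0, 1]
```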
