Published by De Gruyter December 20, 2018

A Bayesian method for computing intrinsic pitch values using kernel density and nonparametric regression estimates

Glenn Healey

Abstract

The deployment of sensors that characterize the trajectory of pitches and batted balls in three dimensions provides the opportunity to assign an intrinsic value to a pitch that depends on its physical properties and not on its observed outcome. We exploit this opportunity by using a Bayesian framework to learn a set of mappings from five-dimensional velocity, movement, and location vectors to intrinsic pitch values. A kernel method generates nonparametric estimates for the component probability density functions in Bayes theorem while nonparametric regression is used to derive a batted ball weight function that is invariant to the defense, ballpark, and atmospheric conditions. Cross-validation is used to determine the parameters of the model. We use Cronbach’s alpha to show that intrinsic pitch values have a significantly higher reliability than outcome-based pitch values. We also develop a method to combine intrinsic values at the individual pitch level into a statistic that captures the value of a pitcher’s collection of pitches over a period of time. We use this statistic to show that pitchers who outperform their intrinsic values during a season tend to perform worse the following year. We also show that this statistic provides better predictive value for future Earned Run Average (ERA) than either current ERA or Fielding Independent Pitching (FIP).

Appendix I: strike zone transformation

For each pitch recorded by the PITCHf/x system, the height of the top and bottom of the batter’s strike zone is specified manually. Since these specifications are error-prone and can vary over time, we use the average of the specified top and bottom for each individual batter over the full season to represent his strike zone in the vertical dimension. For a given season, let Bt and Bb denote the average height of the top and bottom of the strike zone specified for batter B and let Lt and Lb denote the average height of the top and bottom of the strike zone for the league. For the 2014 data set, we used Lt = 3.403 feet and Lb = 1.571 feet. For a pitch with a measured vertical height of z to batter B, the normalized z-coordinate lz is computed as

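$$
l_z = \begin{cases}
L_b - (B_b - z), & z < B_b \\
L_b + (z - B_b)\,\dfrac{L_t - L_b}{B_t - B_b}, & B_b \le z \le B_t \\
L_t + (z - B_t), & z > B_t
\end{cases}
$$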

Thus, a pitch at the bottom of any batter’s strike zone (z = Bb) maps to lz = Lb and a pitch at the top of any batter’s strike zone (z = Bt) maps to lz = Lt.
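The normalization can be sketched in a few lines of Python. This is a minimal illustration, assuming the piecewise-linear form in which heights inside the batter's zone are rescaled linearly onto the league zone and heights outside it keep their distance from the nearest zone edge. The function and parameter names are hypothetical; only the two league constants come from the text.

```python
# League-average strike zone limits for 2014, in feet (values from the text).
LEAGUE_TOP = 3.403
LEAGUE_BOTTOM = 1.571


def normalize_height(z, batter_top, batter_bottom,
                     league_top=LEAGUE_TOP, league_bottom=LEAGUE_BOTTOM):
    """Map a measured pitch height z (feet) to the normalized coordinate lz.

    Assumed form: linear rescaling inside the batter's zone, and a pure
    shift (no rescaling) below the bottom or above the top of the zone.
    """
    if batter_top <= batter_bottom:
        raise ValueError("batter_top must exceed batter_bottom")
    if z < batter_bottom:
        # Below the zone: keep the distance below the bottom edge.
        return league_bottom - (batter_bottom - z)
    if z > batter_top:
        # Above the zone: keep the distance above the top edge.
        return league_top + (z - batter_top)
    # Inside the zone: stretch or shrink to the league-average zone height.
    scale = (league_top - league_bottom) / (batter_top - batter_bottom)
    return league_bottom + (z - batter_bottom) * scale
```

For a hypothetical batter whose zone runs from 1.5 to 3.5 feet, `normalize_height(1.5, 3.5, 1.5)` gives the league bottom and `normalize_height(3.5, 3.5, 1.5)` gives the league top, which matches the two boundary cases stated above.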

Appendix II: distribution of pitches across counts and outcomes

Table 4 presents the distribution of pitches thrown in the RHP vs. RHB configuration in 2014 for each count and with each outcome. As described in Section 2.2, the outcomes are R0 = ball in play, R1 = called ball, R2 = called strike, R3 = swinging strike, R4 = foul ball, and R5 = batter hit-by-pitch, where foul tips that are caught for strikeouts are classified as R3 and not R4.

Table 4:

Number of pitches for each count and with each outcome, RHP vs. RHB, 2014.

Count      R0       R1       R2      R3      R4     R5
0–0      7090   25,233   22,682    4683    6994    151
1–0      4315     7920     6023    2484    4145     41
2–0      1468     2296     2064     639    1315     14
3–0       118      656     1328      40     117      1
0–1      6250   13,552     4054    4215    6079    125
1–1      5581     8481     3085    3317    5390     78
2–1      3102     3417     1491    1494    2825     39
3–1      1208     1235      808     446    1125     11
0–2      3141     7817      739    2438    3394     70
1–2      5265     9057     1106    3705    5464     98
2–2      5141     5517      964    2763    5153     82
3–23485226058221324338
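Counts like those in Table 4 can be turned into relative outcome frequencies for a given count. The sketch below does this for the 0–0 row. Treating these frequencies as the outcome probabilities that enter Bayes' theorem is an assumption made here for illustration, since the sections that define the model are not reproduced in this part of the paper. All names in the code are hypothetical.

```python
# Outcome labels R0..R5 as defined in the text above.
OUTCOMES = ["ball in play", "called ball", "called strike",
            "swinging strike", "foul ball", "hit-by-pitch"]

# Pitch counts for the 0-0 count, RHP vs. RHB, 2014 (first row of Table 4).
COUNTS_0_0 = [7090, 25233, 22682, 4683, 6994, 151]


def outcome_frequencies(counts):
    """Return each outcome's share of all pitches in the row."""
    total = sum(counts)
    if total == 0:
        raise ValueError("row contains no pitches")
    return [c / total for c in counts]


# Assumption: relative frequencies stand in for the outcome probabilities.
priors = dict(zip(OUTCOMES, outcome_frequencies(COUNTS_0_0)))
for name, p in priors.items():
    print(f"{name:16s} {p:.4f}")
```

The frequencies sum to one by construction, so they can be used directly as a discrete distribution over the six outcomes.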

Appendix III: optimal bandwidths for kernel density estimates

In Tables 5–10, we present the optimal bandwidths derived for different configurations using the process described in Section 2.5.

Table 5:

Optimal bandwidths for each outcome, RHP vs. RHB, 0–0 count.

Outcome Rj        σs(j)   σbx(j)  σbz(j)  σlx(j)  σlz(j)
ball in play      1.45    1.30    1.30    0.215   0.255
called ball       1.55    1.30    1.35    0.260   0.290
called strike     1.45    1.20    1.20    0.180   0.185
swinging strike   1.55    1.30    1.40    0.280   0.345
foul ball         1.45    1.30    1.20    0.255   0.285
hit-by-pitch      2.50    1.60    2.45    0.365   0.360

Table 6:

Optimal bandwidths for each outcome, RHP vs. RHB, 0–1 count.

Outcome Rj        σs(j)   σbx(j)  σbz(j)  σlx(j)  σlz(j)
ball in play      1.55    1.25    1.30    0.255   0.280
called ball       1.60    1.40    1.40    0.295   0.375
called strike     1.75    1.55    1.55    0.190   0.225
swinging strike   1.75    1.40    1.40    0.280   0.345
foul ball         1.55    1.25    1.40    0.260   0.300
hit-by-pitch      2.45    2.65    2.00    0.285   0.350

Table 7:

Optimal bandwidths for each outcome, RHP vs. RHB, 1–0 count.

Outcome Rj        σs(j)   σbx(j)  σbz(j)  σlx(j)  σlz(j)
ball in play      1.55    1.25    1.25    0.240   0.270
called ball       1.80    1.45    1.50    0.280   0.335
called strike     1.65    1.50    1.45    0.220   0.235
swinging strike   1.65    1.40    1.45    0.330   0.360
foul ball         1.40    1.45    1.35    0.275   0.285
hit-by-pitch      2.15    2.70    2.00    0.315   0.255

Table 8:

Optimal bandwidths for each outcome, RHP vs. LHB, 0–0 count.

Outcome Rj        σs(j)   σbx(j)  σbz(j)  σlx(j)  σlz(j)
ball in play      1.60    1.25    1.30    0.220   0.260
called ball       1.55    1.30    1.35    0.250   0.300
called strike     1.50    1.25    1.25    0.170   0.190
swinging strike   1.85    1.50    1.40    0.290   0.315
foul ball         1.50    1.30    1.35    0.240   0.285
hit-by-pitch      2.85    2.95    2.75    0.520   0.465

Table 9:

Optimal bandwidths for each outcome, RHP vs. LHB, 0–1 count.

Outcome Rj        σs(j)   σbx(j)  σbz(j)  σlx(j)  σlz(j)
ball in play      1.35    1.30    1.55    0.245   0.290
called ball       1.75    1.45    1.50    0.270   0.355
called strike     1.95    1.55    1.70    0.175   0.240
swinging strike   1.75    1.45    1.50    0.305   0.360
foul ball         1.45    1.35    1.35    0.275   0.300
hit-by-pitch      2.80    2.65    2.90    0.260   1.035

Table 10:

Optimal bandwidths for each outcome, RHP vs. LHB, 1–0 count.

Outcome Rj        σs(j)   σbx(j)  σbz(j)  σlx(j)  σlz(j)
ball in play      1.60    1.35    1.25    0.260   0.280
called ball       1.75    1.50    1.50    0.275   0.325
called strike     1.60    1.40    1.55    0.195   0.235
swinging strike   1.85    1.45    1.55    0.300   0.370
foul ball         1.55    1.35    1.35    0.260   0.290
hit-by-pitch      1.55    1.80    1.55    0.455   0.795
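To show how one bandwidth per dimension enters a kernel density estimate, here is a minimal Python sketch. It assumes a product of one-dimensional Gaussian kernels, which is a common choice but is not confirmed by this part of the paper (the kernel is defined in Section 2.5, which is not reproduced here). The ordering of the five dimensions follows the table headers, read here as speed, two movement components, and two location components. The sample pitches and their units are invented for illustration.

```python
import math


def gaussian_product_kde(x, samples, bandwidths):
    """Estimate a density at point x with a product-Gaussian kernel.

    x          : sequence of d coordinates
    samples    : list of d-dimensional training points
    bandwidths : one kernel standard deviation per dimension
    """
    if not samples:
        raise ValueError("need at least one sample")
    d = len(bandwidths)
    # Normalizing constant of the d-dimensional product kernel.
    norm = 1.0
    for h in bandwidths:
        norm *= math.sqrt(2.0 * math.pi) * h
    total = 0.0
    for s in samples:
        exponent = 0.0
        for i in range(d):
            u = (x[i] - s[i]) / bandwidths[i]
            exponent += u * u
        total += math.exp(-0.5 * exponent)
    return total / (len(samples) * norm)


# Bandwidths listed for "ball in play", RHP vs. RHB, 0-0 count.
BANDWIDTHS = [1.45, 1.30, 1.30, 0.215, 0.255]

# Two invented pitches (speed, bx, bz, lx, lz); values and units are assumptions.
pitches = [
    [92.0, -5.0, 8.0, 0.2, 2.5],
    [84.0, 3.0, -2.0, -0.4, 1.9],
]
density = gaussian_product_kde([91.0, -4.5, 7.5, 0.1, 2.4], pitches, BANDWIDTHS)
```

A larger bandwidth in one dimension spreads each sample's influence further along that axis, which is consistent with the hit-by-pitch rows, where data are sparse, generally carrying the largest values in the tables.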

Appendix IV: data used to evaluate intrinsic pitch statistics

In Tables 11 and 12, we present the data for the 34 pitchers who were used to evaluate the intrinsic pitch statistics as described in Section 4.

Table 11:

RHP with at least 162 innings pitched in 2014 and 2015.

Pitcher              2014 OMI ⋅ 1000   2014 ERA   2015 ERA   ERA Difference
Chris Archer         −11.1             3.33       3.23       −0.10
A.J. Burnett         8.8               4.59       3.18       −1.41
Bartolo Colon        8.0               4.09       4.16       0.07
Johnny Cueto         −6.1              2.25       3.44       1.19
R.A. Dickey          4.7               3.71       3.91       0.20
Yovani Gallardo      16.1              3.51       3.42       −0.09
Kyle Gibson          7.0               4.47       3.84       −0.63
Sonny Gray           −10.8             3.08       2.73       −0.35
Zack Greinke         10.7              2.71       1.66       −1.05
Jason Hammel         8.0               3.47       3.74       0.27
Aaron Harang         6.3               3.57       4.86       1.29
Dan Haren            13.2              4.02       3.60       −0.42
Felix Hernandez      −13.3             2.14       3.53       1.39
Ian Kennedy          3.9               3.63       4.28       0.65
Corey Kluber         3.5               2.44       3.49       1.05
Tom Koehler          −2.8              3.81       4.08       0.27
John Lackey          17.3              3.82       2.77       −1.05
Mike Leake           16.3              3.70       3.70       0.00
Colby Lewis          25.1              5.18       4.66       −0.52
Lance Lynn           −4.9              2.74       3.03       0.29
Shelby Miller        7.9               3.74       3.02       −0.72
Jake Odorizzi        −1.3              4.13       3.35       −0.78
Rick Porcello        0.5               3.43       4.92       1.49
Garrett Richards     −7.3              2.61       3.65       1.04
Tyson Ross           −5.8              2.81       3.26       0.45
Jeff Samardzija      6.1               2.99       4.96       1.97
Max Scherzer         2.1               3.15       2.79       −0.36
James Shields        −0.2              3.21       3.91       0.70
Alfredo Simon        −3.0              3.44       5.05       1.61
Julio Teheran        −5.4              2.89       4.04       1.15
Chris Tillman        3.2               3.34       4.99       1.65
Yordano Ventura      −8.4              3.20       4.08       0.88
Edinson Volquez      −13.3             3.04       3.55       0.51
Jordan Zimmermann    −4.9              2.66       3.66       1.00
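One simple way a reader might explore Table 11 is to compute the Pearson correlation between the 2014 OMI ⋅ 1000 column and the ERA difference (2015 ERA minus 2014 ERA). The Python sketch below does this for the 34 pitchers. It is an illustration only and is not necessarily the evaluation carried out in Section 4. The helper name `pearson` is hypothetical.

```python
import math

# (2014 OMI * 1000, ERA difference = 2015 ERA - 2014 ERA), in Table 11 row order.
ROWS = [
    (-11.1, -0.10), (8.8, -1.41), (8.0, 0.07), (-6.1, 1.19), (4.7, 0.20),
    (16.1, -0.09), (7.0, -0.63), (-10.8, -0.35), (10.7, -1.05), (8.0, 0.27),
    (6.3, 1.29), (13.2, -0.42), (-13.3, 1.39), (3.9, 0.65), (3.5, 1.05),
    (-2.8, 0.27), (17.3, -1.05), (16.3, 0.00), (25.1, -0.52), (-4.9, 0.29),
    (7.9, -0.72), (-1.3, -0.78), (0.5, 1.49), (-7.3, 1.04), (-5.8, 0.45),
    (6.1, 1.97), (2.1, -0.36), (-0.2, 0.70), (-3.0, 1.61), (-5.4, 1.15),
    (3.2, 1.65), (-8.4, 0.88), (-13.3, 0.51), (-4.9, 1.00),
]


def pearson(xs, ys):
    """Return the sample Pearson correlation coefficient of two sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


omi = [row[0] for row in ROWS]
era_diff = [row[1] for row in ROWS]
r = pearson(omi, era_diff)
print(f"Pearson r between OMI * 1000 and ERA difference: {r:.3f}")
```

With only 34 pitchers the coefficient carries substantial sampling uncertainty, so it is best read as a descriptive summary of the table rather than a formal test.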

Table 12:

RHP with at least 162 innings pitched in 2014 and 2015.

Pitcher              2014 ERA   2014 FIP   2014 ERA – FIP   ERA Difference
Chris Archer         3.33       3.39       −0.06            −0.10
A.J. Burnett         4.59       4.14       0.45             −1.41
Bartolo Colon        4.09       3.57       0.52             0.07
Johnny Cueto         2.25       3.30       −1.05            1.19
R.A. Dickey          3.71       4.32       −0.61            0.20
Yovani Gallardo      3.51       3.94       −0.43            −0.09
Kyle Gibson          4.47       3.80       0.67             −0.63
Sonny Gray           3.08       3.46       −0.38            −0.35
Zack Greinke         2.71       2.97       −0.26            −1.05
Jason Hammel         3.47       3.92       −0.45            0.27
Aaron Harang         3.57       3.57       0.00             1.29
Dan Haren            4.02       4.09       −0.07            −0.42
Felix Hernandez      2.14       2.56       −0.42            1.39
Ian Kennedy          3.63       3.21       0.42             0.65
Corey Kluber         2.44       2.35       0.09             1.05
Tom Koehler          3.81       3.84       −0.03            0.27
John Lackey          3.82       3.78       0.04             −1.05
Mike Leake           3.70       3.88       −0.18            0.00
Colby Lewis          5.18       4.46       0.71             −0.52
Lance Lynn           2.74       3.35       −0.61            0.29
Shelby Miller        3.74       4.54       −0.80            −0.72
Jake Odorizzi        4.13       3.75       0.38             −0.78
Rick Porcello        3.43       3.67       −0.24            1.49
Garrett Richards     2.61       2.60       0.01             1.04
Tyson Ross           2.81       3.24       −0.43            0.45
Jeff Samardzija      2.99       3.20       −0.21            1.97
Max Scherzer         3.15       2.85       0.30             −0.36
James Shields        3.21       3.59       −0.38            0.70
Alfredo Simon        3.44       4.33       −0.89            1.61
Julio Teheran        2.89       3.49       −0.60            1.15
Chris Tillman        3.34       4.01       −0.67            1.65
Yordano Ventura      3.20       3.60       −0.40            0.88
Edinson Volquez      3.04       4.15       −1.11            0.51
Jordan Zimmermann    2.66       2.68       −0.02            1.00

Acknowledgement

I am grateful to Sportvision and MLB Advanced Media for providing the HITf/x data which made this work possible. I am also happy to acknowledge the assistance of Qi Shi and Jason Wang in the preparation of this document.


Published Online: 2018-12-20
Published in Print: 2019-02-25

©2019 Walter de Gruyter GmbH, Berlin/Boston
