Clutch, WPA/LI, and the Home Run Bias

The stat Clutch, as published on FanGraphs and Baseball-Reference, is designed to quantify how much better or worse a hitter has produced depending on how critical the situation is in the immediate context of a game.  Players who perform better in more critical situations (for example, late in a close game) than they normally do will have a positive Clutch rating, and players who perform worse in such situations will have a negative Clutch.  It does this by comparing two values for a hitter:  his WPA and his WPA.LI.  I will assume you are familiar enough with these two stats (not necessarily their inner workings, but at least what they are) as a prerequisite for this piece; if not, you can catch up on B-R's or FanGraphs' explanation pages.

WPA.LI follows two key constraints.  The first is that, for a given game state (i.e. the inning, the score, the number of outs, and the placement of any runners on base), the relative value of a play is determined by how much that play affects the team's chances of winning.  If the bases are empty, a walk is credited the same as a single.  If the bases are loaded with the winning run on third, a walk is credited the same as a home run.  This constraint works exactly like WPA (as one might expect from a WPA-based metric).

The second constraint differentiates WPA.LI from WPA.  One of the properties of WPA is that some situations are inherently weighted more strongly than others.  A key at bat late in a close game can swing a team's chances of winning by several times as much as the same result in a blowout, and it is credited accordingly.  WPA.LI, on the other hand, ensures that the average play in every situation gets the same weight.

So, on the one hand, you have WPA, which weights PAs according to their immediate impact on the game.  One clutch PA might be worth as much as 4 or 5 normal PAs, and one mop-up PA might be worth practically nothing.  On the other hand, you have WPA.LI, which weights every PA equally, just like most other stats do.  Basically, it is linear weights, but with the ability to tailor the value of each event to the specific situation rather than sticking to a blanket value for each event across all situations.  While WPA tells the story of clutch hitting (who got the big hit when the team most needed production), WPA.LI tells the story of situational hitting (who got on base when the team needed baserunners, put the ball in play when the strikeout was most costly, or hit for power when advancing runners quickly was more important than getting another guy on first).
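
To make the contrast concrete, here is a minimal Python sketch.  The three plays and their numbers are invented for illustration, and the sketch takes a single play's WPA.LI to be that play's WPA divided by its LI.  The same context-neutral result counts very differently in WPA depending on leverage, but identically in WPA.LI:

```python
# Hypothetical plays: (description, win probability added, leverage index).
# The numbers are invented for illustration, not taken from real games.
plays = [
    ("single, tie game in the 9th", 0.120, 4.0),
    ("single, ordinary situation", 0.030, 1.0),
    ("single, blowout", 0.003, 0.1),
]

def wpa_li(wpa, li):
    """Per-play WPA.LI, taken here as WPA divided by the play's LI (assumes li > 0)."""
    return wpa / li

total_wpa = sum(wpa for _, wpa, _ in plays)
total_wpa_li = sum(wpa_li(wpa, li) for _, wpa, li in plays)

for name, wpa, li in plays:
    print(f"{name:30s} WPA={wpa:+.3f}  WPA.LI={wpa_li(wpa, li):+.3f}")
print(f"total WPA={total_wpa:+.3f}  total WPA.LI={total_wpa_li:+.3f}")
```

In this made-up example the high-leverage single carries most of the WPA total, while each of the three plays contributes the same amount to WPA.LI.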

There is a third important constraint which WPA.LI does not adhere to, however.  Ideally, the average value of each event would match its linear weights value.  If a home run is worth 1.4 runs above average across all situations, then you would like the average WPA.LI value of a HR to be 1.4 runs (or rather, the equivalent value on the wins scale).  That is not the case, however.

The following linear weights values represent the average change in run and win expectancy for that event across all situations, along with the average WPA.LI value of each event.  All three versions have been placed on the runs scale by setting the value of the out at -.27 in order to make them easier to compare directly:



Event     RE      WPA    WPA.LI
1B       0.47    0.47     0.44
2B       0.77    0.75     0.75
3B       1.05    1.06     1.04
HR       1.41    1.42     1.58
BB       0.31    0.30     0.31
K       -0.29   -0.30    -0.29
out     -0.27   -0.27    -0.27

As you can see, WPA.LI does fine at assigning the correct value to most events, but the value of the HR is way off.  This may seem counterintuitive; if WPA.LI just creates custom linear weights for each situation based on the WPA values, why would the average WPA.LI value be different from the average WPA value?  We can look at the mathematical relationship between WPA and WPA.LI to see why this is.

For a single play, we have WPA = WPA.LI * LI.  Now, let X be a variable that represents the set of WPA.LI values for all home runs, and Y be a variable that represents the set of LI values for all home runs:

X = WPA.LI
Y = LI
WPA = XY

The linear weights value of the home run will be the expected value (i.e. the mean) of the set of all home runs:

linear.weights(WPA) = E[XY]
linear.weights(WPA.LI) = E[X]

In order for the linear weights values implied by WPA and WPA.LI to be equal, then E[XY] has to equal E[X].  The relationship between these two values can be explored using covariance, which is defined as:

COV(X,Y) = E[XY] - E[X]*E[Y]

Rearranging, we get:
E[XY] = COV(X,Y) + E[X]*E[Y]

Now, let E[XY] = E[X] + d, where d is an error term representing the difference between E[XY] and E[X].  If E[XY] = E[X], then d=0.

E[X] + d = COV(X,Y) + E[X]*E[Y]
d = COV(X,Y) + E[X]*(E[Y] - 1)

Note that when two variables are independent, their covariance is zero (in which case the first term will be zero), and that when an event occurs randomly across all situations, E[Y] = 1 (because the average Leverage Index is 1), and the second term will equal zero.

From this, we can see that there are two things that can cause the WPA.LI value of an event to deviate from its proper value.  One, the WPA.LI value of an event is not independent of the LI of the situation.  Two, the event does not occur randomly across all situations, so that the average LI value for that event is not 1.  If either or both of these is the case, then the average WPA.LI value of an event will deviate from its WPA value (unless the two error terms cancel each other out).
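
The decomposition can be checked numerically.  The following Python sketch uses synthetic data, not real play-by-play: the LI range, the slope of -0.02, and the noise level are all assumptions chosen only so that X falls slightly as Y rises and the average Y is below 1.

```python
import random

random.seed(0)

# Synthetic (made-up) sample for one event type:
# Y plays the role of LI, X the role of the per-play WPA.LI value.
n = 10_000
Y = [random.uniform(0.1, 1.7) for _ in range(n)]          # averages about 0.9
X = [0.15 - 0.02 * y + random.gauss(0, 0.01) for y in Y]  # falls slightly as LI rises

def mean(values):
    return sum(values) / len(values)

E_X, E_Y = mean(X), mean(Y)
E_XY = mean([x * y for x, y in zip(X, Y)])

d = E_XY - E_X                # gap between the WPA and WPA.LI averages
cov_term = E_XY - E_X * E_Y   # COV(X,Y), using the definition above
li_term = E_X * (E_Y - 1)     # nonzero when the average LI is not 1

print(f"d        = {d:.5f}")
print(f"COV term = {cov_term:.5f}")
print(f"LI term  = {li_term:.5f}")
print(f"sum      = {cov_term + li_term:.5f}")
```

Because the relationship is an algebraic identity, the two terms always add up to d no matter what data is used; only their sizes and signs depend on the sample.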

Both of these are in fact the case for the HR.  Home runs are worth slightly more, relative to other events, in lower-leverage situations (i.e. WPA.LI value is negatively correlated with LI for HR), and home runs, like other extra base hits, happen slightly more often in low-leverage situations than in high-leverage situations.  Both of these sources of error are in the same direction, and their cumulative effect is that the WPA.LI value of a HR is about .016 wins higher than the WPA value (.0108 from the covariance and .0056 from the average LI).
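
As a quick arithmetic check, the two HR error terms quoted above can be plugged into the formula for d.  This is only a sketch: both terms are signed negative here because d was defined as the WPA average minus the WPA.LI average, and the variable names are mine.

```python
# HR error terms quoted in the text, in wins.  Negative signs because
# d = E[XY] - E[X] and the WPA.LI value (E[X]) is the larger of the two.
cov_term = -0.0108   # from the covariance between WPA.LI value and LI
li_term = -0.0056    # from the average LI of home runs being below 1

d = cov_term + li_term
print(f"d = {d:.4f} wins")
print(f"share from covariance: {cov_term / d:.0%}")
print(f"share from average LI: {li_term / d:.0%}")
```

On these figures, roughly two thirds of the gap comes from the covariance term and the rest from the average LI.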

Because WPA.LI assigns a higher value to HR than does WPA, WPA.LI will be skewed high for home run hitters relative both to WPA and to static linear weights.  This complicates comparisons between WPA.LI and other stats.  For example, the stat Clutch, defined as WPA-WPA.LI*, runs into problems with high-HR hitters.


*Clutch is actually WPA/LI - WPA.LI, where WPA/LI is literally WPA divided by the average Leverage Index for the player, but this is really hard to write without confusing WPA divided by LI for WPA.LI, which is usually written as WPA/LI.  That, by the way, is why I have been using WPA.LI instead of WPA/LI to refer to that stat.
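
In code, the definition in this footnote might look like the following Python sketch.  The function name, the input format (a list of per-play WPA and LI pairs), and the two example hitters are assumptions made for illustration; this is not the sites' actual implementation.

```python
def clutch(plays):
    """Return (WPA, WPA.LI, Clutch) for one hitter.

    plays: list of (wpa, li) pairs, one per plate appearance.
    Assumes every li is greater than zero.
    """
    total_wpa = sum(wpa for wpa, _ in plays)
    total_wpa_li = sum(wpa / li for wpa, li in plays)   # context-neutral wins
    avg_li = sum(li for _, li in plays) / len(plays)    # the player's average LI
    return total_wpa, total_wpa_li, total_wpa / avg_li - total_wpa_li

# Made-up hitter whose context-neutral result is the same in every situation.
steady = [(0.060, 2.0), (0.030, 1.0), (0.015, 0.5)]
# Made-up hitter whose best result comes in the highest-leverage spot.
timely = [(0.100, 2.0), (0.030, 1.0), (0.000, 0.5)]

for name, plays in (("steady", steady), ("timely", timely)):
    wpa, wpa_li, c = clutch(plays)
    print(f"{name}: WPA={wpa:.3f}  WPA.LI={wpa_li:.3f}  Clutch={c:+.3f}")
```

The steady hitter comes out at a Clutch of essentially zero, and the timely hitter comes out positive, which is the behavior the stat is meant to have.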

The pattern of sluggers rating worse than contact hitters in Clutch rankings has been noted by various observers.  As shown in the Book Blog thread (see especially Cyril Morong's posts), a player's HR rate correlates strongly with his Clutch rating.  Similarly, a player's HR rate in one season predicts his Clutch in the following year even better than his Clutch from that season does (year-to-year r for Clutch for hitters with at least 300 PAs in both seasons is about .06; for year-1 HR rate to year-2 Clutch, it is about -.12).

Take a look at the top 10 hitters in HR/PA from 2000-2011 (min 1000 PAs):



Player             WPA    WPA.LI    Clutch
Barry Bonds       55.69    58.47     -2.79
Sammy Sosa        15.97    22.66     -6.97
Ryan Howard       24.75    20.67      2.60
Jim Thome         32.39    41.53    -10.04
Alex Rodriguez    45.30    52.63     -7.16
Albert Pujols     55.77    59.30     -2.80
Ryan Braun        -0.58     4.77     -5.33
Manny Ramirez     41.27    44.04     -4.12
Marcus Thames     -1.49     1.69     -3.17
Jason Giambi      39.87    38.53      0.37
AVERAGE                              -3.94

Collectively, these 10 hitters average a Clutch rating of -3.94 wins over about 5000 PAs.  This effect is entirely due to the bias in WPA.LI with regard to home runs, though, and not to any deficiency in clutch hitting by the group.  If we compare WPA not to WPA.LI, but to linear weights (taken as the average WPA value of each event) for these players, we see that their WPA contributions are almost exactly what we would expect from their context-ignorant production:



Player             WPA    linear.weights    Clutch.LW
Barry Bonds       55.69        53.33           2.35
Sammy Sosa        15.97        19.11          -3.43
Ryan Howard       24.75        18.15           5.12
Jim Thome         32.39        33.82          -2.32
Alex Rodriguez    45.30        43.69           1.78
Albert Pujols     55.77        57.97          -1.47
Ryan Braun        -0.58         3.17          -3.72
Manny Ramirez     41.27        42.51          -2.58
Marcus Thames     -1.49         0.51          -1.99
Jason Giambi      39.87        31.70           7.20
AVERAGE                                        0.09

This version of Clutch (WPA - linear weights) removes the HR-bias of the WPA.LI version.  Clutch.LW shows almost no correlation with HR rate (either for the same year or adjacent years), and the leader board becomes HR-neutral.
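
The group averages for the two versions of Clutch can be reproduced from the player rows listed in the two tables.  A short Python sketch, with the values copied from those rows and variable names of my own choosing:

```python
# Clutch values for the ten hitters, in table order
# (Bonds, Sosa, Howard, Thome, Rodriguez, Pujols, Braun, Ramirez, Thames, Giambi).
clutch_wpa_li = [-2.79, -6.97, 2.60, -10.04, -7.16, -2.80, -5.33, -4.12, -3.17, 0.37]
clutch_lw = [2.35, -3.43, 5.12, -2.32, 1.78, -1.47, -3.72, -2.58, -1.99, 7.20]

avg_wpa_li = sum(clutch_wpa_li) / len(clutch_wpa_li)
avg_lw = sum(clutch_lw) / len(clutch_lw)

print(f"average Clutch (WPA.LI baseline):         {avg_wpa_li:.2f}")
print(f"average Clutch (linear weights baseline): {avg_lw:.2f}")
```

This gives about -3.94 wins against the WPA.LI baseline and about 0.09 wins against the linear weights baseline, matching the AVERAGE rows.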

While it appears that many of the top sluggers in the game have been particularly un-clutch based on the FG and BR leader boards, this is probably not actually the case.  The mathematical properties of WPA.LI (specifically, the correlation of its values with LI and the non-random distribution of events across situations) just happen to skew the results in that direction.  This can be addressed by using linear weights values (especially linear weights derived from WPA), rather than WPA.LI, as the context-neutral baseline to compare against WPA.


Note:  All win probability and leverage index numbers used in this post come from the tables created here. These figures are based on 1993-2010 data and are not calibrated to 2000-2011, nor are they adjusted for the different run environments of each park (as the FG and B-R figures are). As a result, the WPA figures here won't match those sites, but should serve fine for illustrative purposes.

3 comments:

Cyril Morong said...

Very interesting. Thanks for mentioning my research. What I posted at tango's blog I went into more detail on mine. See

http://cybermetric.blogspot.com/2010/07/dont-let-your-little-leaguers-grow-up.html

I also did something several years ago called "Do Power Hitters Choke in the Clutch?" it is at

http://cyrilmorong.com/Choke.htm

A study called “Clutch Hitting: Fact or Fiction?” By Andrew Dolphin suggests that they might. It is at

http://www.dolphinsim.com/ratings/notes/clutch.html

I also did something similar to what you did here by just comparing WPA to linear weights in my presentation on clutch in 2002 at the Boston SABR convention. I had Bonds doing better in the clutch than his linear weights stats would predict. Maybe I will post this at my blog

Kincaid said...

Thanks for the additional research, Cyril. Here is the blog post for anyone who is interested:

http://cybermetric.blogspot.com/2012/06/do-power-hitters-choke-in-clutch.html

Cyril Morong said...

You're welcome and thanks for the link to my blog
