Evaluating Pitchers with FIP, Part II

This is the second half of a two-part article. Read the first half here: Evaluating Pitchers with FIP, Part I

Rather than throw out a stat simply because it regresses, you should throw it out only if you think what it does and does not regress is wrong (possibly RBI or W-L record for pitchers, for example) or that how it regresses events is wrong (possibly WHIP, for example). FIP regresses sequencing/timing, leverage/situation, and distribution of batted balls (including things like how many ground balls happen to find holes or how many find fielders), all of which generally fall under the term "Luck", as well as factors like park effects (unless you park-adjust it, which you can if you want) and quality of opponents. It does not regress defensive support. Your decision on whether or not to consider FIP in evaluating what happened should hinge on whether you feel these decisions offer a useful perspective. Maybe you have a problem with all of these decisions. That would make evaluating pitchers very difficult, however.

If you think sequencing or timing should not in any case be regressed, that also means opponent batting lines are out. If you think leverage or situation should not in any case be regressed, then ERA is out. Perhaps you have an issue with regressing anything termed "Luck", which, in the case of FIP, means if a pitcher gives up a line drive in the gap or one right at a fielder, you don't care if it was because of the pitcher's ability or if it was just random chance, you want to evaluate what happened. If you feel that way, however, you should ask whether you also have a problem with the other "Luck" aspects listed in the above paragraph besides batted ball distribution. They also show up in other stats like ERA and opponent batting lines. Is it luck whether a pitcher allows a single with the bases loaded in the bottom of the 9th of a tie game or in a blowout? The former is obviously much more costly measuring by outcome, but opponent batting lines count them the same, and ERA actually counts the latter as worse (because it will often be counted as 2 ER, while the former can only ever count as 1). ERA might not even count the former case at all if there was an error in the inning that would have been a third out. If a shortstop boots a ball, do you say the pitcher had nothing to do with it, or do you not want to consider luck and look only at what happened? Opponent batting lines ignore the actual outcome and pretend the pitcher got the out as if the error never happened. ERA either ignores the event altogether or pretends the pitcher got the out even though he didn't (he won't get credit for the out in the IP part of ERA, but if it becomes the third out of the inning, he is given credit for getting out of the inning so no further runs are earned; in fact, in such a case, ERA can completely throw out even events like walks and home runs as if they never happened).
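The third-out behaviour described above can be sketched in a few lines. This is a deliberately simplified model, not the full official scoring rule: it only reconstructs the inning by treating an error as the out it should have been, it assumes no runs score on the error play itself, and it ignores the separate rule about runners who reached on an error. The function name and the event format are made up for the illustration.

```python
def earned_runs(events):
    """Count earned runs in one inning under a simplified third-out rule.

    events: list of (kind, runs_scored) tuples in order, where kind is
    "out", "error", or any other label ("walk", "hit", "hr", ...).
    An "error" is treated as an out that should have been made.
    """
    reconstructed_outs = 0
    earned = 0
    for kind, runs in events:
        # Outs and would-be outs both advance the reconstructed inning.
        if kind in ("out", "error"):
            reconstructed_outs += 1
        # Anything scoring once the inning "should" be over is unearned.
        if reconstructed_outs < 3:
            earned += runs
    return earned


# Two outs, a booted ball, then a walk and a three-run homer.
with_error = [("out", 0), ("out", 0), ("error", 0), ("walk", 0), ("hr", 3)]

# Two outs, a walk, a two-run homer, then the third out.
clean = [("out", 0), ("out", 0), ("walk", 0), ("hr", 2), ("out", 0)]

print(earned_runs(with_error))  # 0: the walk and homer vanish from ERA
print(earned_runs(clean))       # 2
```

In the first inning the pitcher allowed a walk and a home run, yet the sketch charges him nothing, which is the sense in which ERA can throw out events as if they never happened.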

Maybe you still want to charge batted ball distribution to the pitcher but not other aspects of "Luck", which is OK as long as you understand what you are doing and that you are still regressing "Luck" in other cases rather than charging the pitcher regardless. That doesn't mean you throw out FIP. You still have the other benefits of regressing "Luck" that you have chosen to regress in using other stats, and you have the advantage that FIP regresses some of those factors differently and sometimes better. For example, in ignoring distribution and regressing all events to the average value of the event, opponent batting lines assign an arbitrary value to each event. FIP weights each event so that its value matches the event's value in reality. You could also do this with opponent batting lines, but you would have to do it yourself.
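To make the contrast in weighting concrete, here is a minimal sketch. It assumes the commonly published form of FIP (13 per home run, 3 per walk or hit batter, minus 2 per strikeout, divided by innings, plus a league constant); the constant of 3.10 is only a typical ballpark figure and changes by season, and the pitcher's line is invented for illustration.

```python
def slugging_against(singles, doubles, triples, hr, ab):
    """Opponent SLG: total bases per at-bat, i.e. fixed weights of 1, 2, 3, 4."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab


def fip(hr, bb, hbp, k, ip, constant=3.10):
    """FIP in its commonly published form.

    The 13 / 3 / -2 weights approximate run values relative to a ball
    in play. `constant` puts the result on the ERA scale; 3.10 is an
    assumed placeholder, not a value for any particular season.
    `ip` must be true innings (200 1/3 innings is 200.333..., not 200.1).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season line, not a real pitcher.
print(round(slugging_against(singles=120, doubles=35, triples=3, hr=20, ab=750), 3))
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200.0), 3))
```

The slugging weights are simply a count of bases, so a home run is treated as exactly four singles. The FIP weights are meant to track run value, which is the difference the paragraph above describes.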

You also have the other advantages of FIP, particularly that it does not regress defensive support. If a fielder makes a great play, should we credit that value entirely to the pitcher? The pitcher may have had some effect in creating the out, but did the pitcher really create as much value on a diving play in the hole by the shortstop as he does with a strikeout, or does the shortstop create some of that value? Both outs are worth the same overall (depending on the situation; the strikeout is actually worth slightly more overall, but given that both occur in a situation that gives them the same value), but in one, the pitcher created almost all of the value to the defensive team, while in the other, he shares most of the value with the shortstop. Over the course of the season, some of the deviation from the average value of a ball in play will be due to the pitcher pitching better or worse than average and some will be due to the defense playing better or worse than average. Crediting all of the deviation to either the pitcher or the defense (or rather, all to the pitcher or none to the pitcher, which is what both FIP and other stats like ERA, opponent batting lines, etc. do; none of them actually measure defensive value) is wrong, which means you probably shouldn't throw out either type of stat, because both tell you something about how well the pitcher pitched. Which one is less wrong, though? It depends in part on how big your sample is, but probably crediting none of the variation to the pitcher, at least over one season. You are going to lose less by regressing the effect of the pitcher completely to the mean than by regressing the effect of the fielder completely to the mean. Even if you disagree with that, the regression that each does is wrong to some extent, so you shouldn't take one and say it does not measure value because of its regression and take the other and pretend it doesn't also regress.
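The point about sample size can be illustrated with a simple shrinkage sketch. Crediting all of the deviation to the pitcher is a weight of 1 on his observed rate; crediting none is a weight of 0. A partial regression sits in between, with the weight growing as the sample grows. The function name, the constant `k`, and every number below are hypothetical and chosen only to show the mechanics; `k` is not an estimate of anything.

```python
def regress_to_mean(observed_rate, n, league_rate, k):
    """Shrink an observed rate toward the league rate.

    n: number of balls in play observed.
    k: hypothetical regression constant, the sample size at which the
       observed rate and the league rate get equal weight.
    Returns (weight_on_observed, regressed_rate).
    """
    weight = n / (n + k)
    regressed = weight * observed_rate + (1 - weight) * league_rate
    return weight, regressed


# Invented numbers: a .270 average on balls in play against a .300 league rate.
for n in (500, 1500, 6000):
    weight, rate = regress_to_mean(0.270, n, 0.300, k=1500)
    print(n, round(weight, 2), round(rate, 4))
```

Whenever the weight on the observed rate is below one half, crediting none of the deviation to the pitcher lands closer to the partially regressed figure than crediting all of it. Whether a single season falls on that side depends on the true value of `k`, which this sketch does not try to estimate.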

So in fielding-independent stats, we have a distinct perspective on defensive support that is not provided by traditional stats. We also have a distinct perspective on other issues. Opponent batting lines and FIP both give a sequence-independent perspective, but each perspective takes a different approach in choosing how to group events to regress to the mean value and in how to decide what value to use for those events, as well as how to present that value (opponent batting lines as a pseudo-binomial rate and FIP as a run value rate). ERA and FIP both present a distribution-independent run-value rate that is based on the actual values of events, but each makes different assumptions about what values or factors should or should not be regressed. Some of these assumptions are clearly better in FIP's case (choosing not to regress defensive or bullpen support), some are more grey but still favour FIP (not discarding events that happen after a botched third out), and some simply offer differing perspectives that each have value (choosing to regress the value of events or simply take the outcome, regressing sequencing or not). It is important to consider all of these perspectives in analyzing what happened.

Most people will not consider only one stat in evaluating pitchers because they intuitively understand (even if they aren't aware that this is what's happening) the concept that each stat they look at is regressing factors, usually arbitrarily, and is not measuring all value. Many people also have the idea that some factors should be regressed, or else they wouldn't look at anything related to opponent batting lines (WHIP, AVG/OBP/SLG against, etc.) that regresses the value of each event to some average. They use a combination of stats to get the full picture: opponent production gives a sequence/timing-independent perspective, ERA gives a leverage/situation- and distribution-independent perspective that does not regress sequence, strikeouts and walks give a fielding-independent perspective that doesn't regress defensive support, and wins/losses give a park-independent perspective that does not regress leverage or sequence (though the last one is ignored by a lot of people because it also does a lot of things that it shouldn't, and there are better measures that do the same thing). Each piece gives some part of the picture that is not the full picture. FIP and other stats of its ilk add a fielding-, sequence-, batted-ball-distribution- and leverage-independent perspective, and they give an added dimension to the perspectives considered in other stats. FIP becomes one of many stats to consider to give you a fuller perspective. It is not designed to measure everything or be comprehensive. Sometimes people will want to throw it out because they think it purports to be comprehensive and label it a failure for falling short. This is a mistake; you would not throw out any of the stats you already use for the same failing. FIP is just another of those stats to add to the equation. If you don't like how FIP handles some factors, you shouldn't throw out the entire perspective. You just balance its flaws with other perspectives that handle those factors differently, just as you would include FIP to balance the flaws of those perspectives.

The regression of batted-ball distribution is probably the number one issue taken with FIP. There are a lot of fans who can accept most of the above, and even that distribution of batted ball locations should be regressed, but not batted ball types. If that describes you, you've probably already accepted fielding-independent measures in general (which is really what all of the above is about; FIP is merely the most familiar of these). This is where you would want to consider a number of different fielding-independent measures, such as DIPS, various forms of tRA, and xFIP. Each handles these factors differently, so if you don't like how FIP regresses batted ball types, perhaps you would prefer something like tRA. Just for bonus coverage, though, we can look a bit at whether FIP's handling of BIP is actually wrong.

A common perception is that FIP regresses batted ball type distribution to the average distribution. It doesn't really, though. It just regresses the aggregate value of all balls in play. There is a clear skill in whether a pitcher tends to allow more fly balls or ground balls that manifests even over a single season, so we probably shouldn't regress ground ball or fly ball tendencies. What is the primary difference between ground balls and fly balls, though? Home runs. So this tendency is accounted for in FIP. Once you remove home runs and only look at balls in play, the difference in the value of a ground ball and a fly ball in play is not that great. The distribution of singles and extra base hits will be different for each, but FIP doesn't care about the breakdown of hits, only their aggregate value, which is fairly similar. Ground balls are worth a bit more than fly balls in play, so extreme fly ball pitchers may be slightly undervalued by FIP (though I haven't looked at this to see whether it is true). Line drives are another story. Unlike GB/FB tendencies, line drive tendencies for pitchers tend to regress pretty heavily. Their value is also significantly different from that of ground balls and fly balls. So the question is, should we hold the pitcher responsible for high or low LD rates, or should we regress them? Because of the huge value of a line drive, this decision can make a big difference. Again, I suggest considering both perspectives here to some extent, but if we are already conceding the regression of some factors outside a pitcher's control, the tendency of LD rate to regress should be enough that we at least consider regressing it and leaving only GB/FB tendencies, rather than just dismissing FIP for regressing the influence of line drives.

Maybe after all this, you still don't think FIP et al. provide any value in evaluating pitchers. Maybe you understand that FIP isn't just throwing out events and that the regression in FIP is a concept that appears in traditional stats as well, but you just don't feel the perspective FIP offers adds anything to the picture. If you have considered all of the above and asked yourself what you think is important and what is not important to evaluating pitchers, you are free to choose whatever stats best represent your perspectives. If you are rejecting defense-independent statistics for the more common reasons based on an incomplete understanding of the numbers, however, you should probably reconsider.
