Recently, Baseball-Reference briefly unveiled a new version of ERA+ which rescaled the stat to show how a pitcher's ERA related to the league rather than how the league related to the pitcher's ERA. With the old numbers, 200 meant that the league ERA was 100% higher than the pitcher's ERA; with the new figures, that 200 is instead presented as 150, meaning that the pitcher's ERA was 50% lower than the league ERA. The advantages of the new version, as well as some of the limitations, were discussed recently by Patriot on his blog.
This new version was soon rolled back, as B-R announced that the change was premature and that the official change will require a more organized approach (apparently simply switching important numbers around unannounced with no explanation and waiting to see if anyone notices is confusing, or something like that). However, in the ensuing discussion of the new version in the afore-linked B-R blog post, poster PhilM brought up an interesting point.
This post is not about anything specific to the new version of ERA+ or to the switch at B-R (should I have mentioned this somewhere in the 2-paragraph lead-in? Well, I'm telling you now, anyway), but before I begin (begin? Should I have also done that before the third paragraph?), let me explain one key benefit of the new version of ERA+. Under the old version of the stat, 2 pitchers with ERA+'s of 200 and 100 over the same number of innings would have a combined ERA+ of...133. Strange, huh? Two 150 ERA+ pitchers were better than a 200 and a 100 ERA+ pitcher, because to combine ERA+, you had to take the harmonic mean, not the simple average. Convert those to the new version, and the 200/100 pair of pitchers have ERA+'s of 150 and 100, for a combined 125 (hey, that's what we'd expect!). The 150/150 pair under the old ERA+ would each have an ERA+ of 133 using the new version. With the new ERA+, it's obvious how to combine ERA+, and it's easy to see that two 133 pitchers are better than a 150 pitcher and a 100 pitcher combined. For the rest of this article, I'll refer only to the new version, so when you see ERA+ after this, think of the version where 150/100 average to 125.
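To make the harmonic-mean quirk concrete, here is a rough Python sketch. The function names are mine, and the conversion formula is inferred from the examples above (old 200 becomes new 150, old 150 becomes new 133) rather than taken from anything Baseball-Reference published. It assumes equal innings and the same league ERA for both pitchers, and it ignores park adjustments.

```python
def old_to_new(old_era_plus):
    """Convert old-style ERA+ (100 * lgERA / ERA) to the new scale.

    Assumed formula, inferred from the examples in the text:
    new = 100 * (2 - ERA / lgERA), and ERA / lgERA = 100 / old.
    """
    return 100 * (2 - 100 / old_era_plus)


def combine_old(values):
    """Combine old-style ERA+ over equal innings: harmonic mean."""
    return len(values) / sum(1 / v for v in values)


def combine_new(values):
    """Combine new-style ERA+ over equal innings: simple average."""
    return sum(values) / len(values)


print(round(combine_old([200, 100])))                          # 133
print(round(combine_new([old_to_new(200), old_to_new(100)])))  # 125
print(round(old_to_new(150)))                                  # 133
```

Both combined figures describe the same pair of pitchers (old 133 converts to new 125); the only difference is that on the new scale you get there with an ordinary average.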
Now, back to PhilM (remember PhilM?): PhilM noticed that when he calculated Walter Johnson's career ERA+ by taking the average of each season ERA+ weighted by IP, it was not exactly the same as his actual career ERA+, although it was close (133 vs. 132). This is an important observation. ERA+ represents the percentage of the average number of earned runs a pitcher allows over a given number of innings. A 150 pitcher allows 50% fewer ER than average. A 133 pitcher allows 33% fewer ER than average. A 90 pitcher allows 10% more ER. You arrive at this percentage by dividing the pitcher's ER by the amount of ER a league average pitcher would have allowed (which you figure by prorating the league ERA to the pitcher's IP). A pitcher who allows 50 ER when the league average pitcher would have allowed 100 allows 50% fewer ER than average, so he has a 150 ERA+.
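As a quick illustration of that calculation, here is a minimal sketch in Python. The names `league_er` and `era_plus` are made up for this post, the 200 IP and 4.50 league ERA are hypothetical numbers chosen so that the league-average pitcher allows exactly 100 ER, and park factors are left out entirely.

```python
def league_er(lg_era, ip):
    """Earned runs a league-average pitcher would allow in `ip` innings."""
    return lg_era * ip / 9


def era_plus(er, ip, lg_era):
    """New-style ERA+: 100 is average, 150 means 50% fewer ER than average."""
    return 100 * (2 - er / league_er(lg_era, ip))


# Hypothetical pitcher: 50 ER over 200 IP in a league with a 4.50 ERA.
print(league_er(4.50, 200))     # 100.0
print(era_plus(50, 200, 4.50))  # 150.0
```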
What this means is that if you want to combine ERA+, what you are really doing is figuring the combined ER allowed, and the combined league average ER allowed; in other words, when each individual ERA+ represents the fraction ER/lgER, you are adding the numerators of each fraction and adding the denominators of each fraction (not adding the fractions themselves; you don't care about common denominators or anything of the like, so you may let out a collective sigh of relief here if you feel so moved). If your individual ERA+'s represent the following fractions:
50/100, 20/30, 70/60, 100/100
Then the combined ERA+ will be:
(50 + 20 + 70 + 100) / (100 + 30 + 60 + 100) = 240/290
Simple enough, right? Now, what PhilM noticed was that this calculation for combined ERA+ is not the same as weighting each individual ERA+ by IP and then averaging them, even though that is supposed to be how we combine ERA+. What gives?
The issue is that averaging the individual ERA+'s, weighted by IP, does in fact simplify to the sum of ER allowed divided by the sum of league ER allowed, but only if the league ERA is the same in each ERA+ you are combining. If that is not the case (and it usually isn't), then what you are doing when you average multiple ERA+'s is not calculating the combined ERA+ but estimating it (albeit pretty well).
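Here is a small sketch of that claim, using two made-up seasons (the ER, IP, and league ERA figures are hypothetical, as are the function names). When the league ERA is the same in both seasons, the IP-weighted average and the direct calculation agree; change the league ERA in one season and they drift apart.

```python
def season_era_plus(er, ip, lg_era):
    """New-style ERA+ for a single season (no park adjustment)."""
    return 100 * (2 - er / (lg_era * ip / 9))


def direct(seasons):
    """Career ERA+ from summed ER and summed league-average ER."""
    total_er = sum(er for er, ip, lg_era in seasons)
    total_lg_er = sum(lg_era * ip / 9 for er, ip, lg_era in seasons)
    return 100 * (2 - total_er / total_lg_er)


def ip_weighted(seasons):
    """Career ERA+ estimated as the IP-weighted average of season ERA+."""
    total_ip = sum(ip for er, ip, lg_era in seasons)
    weighted_sum = sum(ip * season_era_plus(er, ip, lg_era)
                       for er, ip, lg_era in seasons)
    return weighted_sum / total_ip


# Each season is (ER, IP, league ERA); season ERA+ is 150 then 100 in both cases.
same_league = [(50, 200, 4.50), (60, 120, 4.50)]
diff_league = [(50, 200, 4.50), (40, 120, 3.00)]

print(round(direct(same_league), 2), round(ip_weighted(same_league), 2))  # 131.25 131.25
print(round(direct(diff_league), 2), round(ip_weighted(diff_league), 2))  # 135.71 131.25
```

In the second case the 150 season comes from the higher-scoring league, so the direct calculation leans toward it.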
This brings us at last to the actual point of this post. PhilM then asked the all-important question: if averaging each season ERA+, weighted by IP, is not the same as calculating career ERA+ directly, which is the better method? PhilM, being the smart guy that he is (I don't really know if PhilM is a smart guy; I assume he is smart because he asked such a good question, and I assume he is a guy because PhilM is a fairly masculine handle), noted that if you have two seasons, one with an ERA+ of 150 and one with an ERA+ of 100, and each had the same number of innings pitched, the intuitive thing would be to say that the pitcher was 125 overall, but in fact, whichever season has the higher league ERA will get more weight. PhilM showed that if the league ERA was twice as high in the 150 ERA+ season, the combined ERA+ would actually be 133, much higher than the 125 we would expect. What's more, if you switched the two seasons so that the league ERA is twice as high in the 100 ERA+ season, the combined ERA+ would be only 117. Of course, the league ERA isn't going to double or halve from year to year, but it will change, and that means that it will make a difference, even if that difference is usually small. Sometimes, such as when a pitcher goes from, say, San Diego to Texas and puts up very different ERA+'s for each team, it might make a pretty noticeable difference. PhilM even noted that the difference, even when small, could actually change the career leaderboard for ERA+, such as flipping Walter Johnson and Lefty Grove.
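PhilM's 133 and 117 are easy to check. The sketch below rebuilds each season's ER from its ERA+ and then does the direct calculation. The 180 IP and the 6.00/3.00 league ERAs are placeholder values I picked (only "equal innings" and "twice as high" matter), and `combined_era_plus` is a made-up name.

```python
def combined_era_plus(seasons):
    """Direct career ERA+ from a list of (season ERA+, IP, league ERA)."""
    total_er = 0.0
    total_lg_er = 0.0
    for season_era_plus, ip, lg_era in seasons:
        lg_er = lg_era * ip / 9                     # league-average ER in those innings
        er = lg_er * (2 - season_era_plus / 100)    # invert ERA+ to recover the pitcher's ER
        total_er += er
        total_lg_er += lg_er
    return 100 * (2 - total_er / total_lg_er)


# League ERA twice as high in the 150 ERA+ season.
print(round(combined_era_plus([(150, 180, 6.00), (100, 180, 3.00)])))  # 133
# League ERA twice as high in the 100 ERA+ season.
print(round(combined_era_plus([(150, 180, 3.00), (100, 180, 6.00)])))  # 117
```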
Let's examine this a little more closely to see what is going on. In PhilM's example, we have a 150 ERA+ season and a 100 ERA+ season, each with the same number of innings. If the league ERA in the former season is double that in the latter, then the combined ERA+ will be 133. That means if we want to combine the two ERA+'s, we can't simply weight by IP and average them. How, then, can we get combined ERA+ from each individual season?
To actually calculate the combined ERA+ by averaging each individual ERA+, you would have to weight each individual ERA+ not just by IP, but by the quantity IP * lgERA. As you can see, seasons where the league ERA is higher than average will get more weight in the calculation, and seasons where the league ERA is lower than average will get less weight.
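The reason this works is that each season's league-average ER is proportional to IP * lgERA, so weighting each ERA+ by that quantity is the same as weighting each ER/lgER fraction by its own denominator, which collapses to total ER over total league ER. A short sketch with three invented seasons (all numbers and names here are hypothetical) checks that the two routes agree:

```python
import math


def era_plus(er, ip, lg_era):
    """New-style ERA+ for one season (no park adjustment)."""
    return 100 * (2 - er / (lg_era * ip / 9))


def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical career: (ER, IP, league ERA) per season.
seasons = [(70, 220, 4.20), (55, 150, 3.10), (90, 200, 4.80)]

season_values = [era_plus(er, ip, lg_era) for er, ip, lg_era in seasons]

# Direct calculation: total ER against total league-average ER.
total_er = sum(er for er, ip, lg_era in seasons)
total_lg_er = sum(lg_era * ip / 9 for er, ip, lg_era in seasons)
direct = 100 * (2 - total_er / total_lg_er)

# Weighted averages of the season ERA+ values.
by_ip_times_lg_era = weighted_average(
    season_values, [ip * lg_era for er, ip, lg_era in seasons])
by_ip_only = weighted_average(
    season_values, [ip for er, ip, lg_era in seasons])

print(math.isclose(direct, by_ip_times_lg_era))  # True
print(math.isclose(direct, by_ip_only))          # False
```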
Think, for a moment, about what ERA+ actually is. At its core, it is a measure of ERA that removes the context of the run-environment. A 2.00 ERA in an environment where 3.00 is average is not the same as a 2.00 ERA in an environment where 4.00 is average; it's more like a 2.67 ERA in the 4.00 environment. The whole point is that a 133 ERA+ is equal to a 133 ERA+ no matter what the run environment is; whether it is a high- or a low-scoring environment doesn't matter. But, as we just saw, it does matter when we combine ERA+. Higher scoring environments get more weight in the combined figure when it is calculated in the traditional way. That goes against the general purpose of ERA+ of removing the effects of the run-environment on how we interpret the numbers. PhilM's proposal of using the weighted average of each season for career ERA+ fixes this issue and gives each season equal weight (by IP, of course) regardless of what the league ERA was in each season.
Given that the current method of calculating career ERA+ weights each season by a combination of IP and of the league ERA that season, I can definitely get behind this change. It's not entirely a clear-cut choice, though. The change would give different definitions of ERA+ for season and career levels, and the two would no longer be calculated in the same way. There is also the problem that each season is actually a combination of games much like a career is a combination of seasons. That means that over the course of a season, games in a higher run-environment get more weight in calculating season ERA+. In fact, this is probably actually a larger issue because, in general, run-environments will vary much more from game to game than from season to season. Does that mean we want to calculate season ERA+ as a weighted average of each game ERA+? That would be a mess to calculate, and even then, you can still break up a game further into innings or PAs. At some point, you have to actually calculate ERA+ in the traditional way and accept the shortcomings, or else you have no pieces to average together to get ERA+ at the higher levels. Choosing where to make this distinction between calculating ERA+ directly and where to start calculating it as a weighted average of individual parts is problematic. Ideally, you make the distinction at as low a level as possible (doing it by game would be better than doing it by season, and doing it by season would be better than not doing it at all if you want to get as close to weighting every IP equally as possible), but each level you go lower makes the stat more of a mess to calculate. Making the distinction at the season level as per PhilM's suggestion is probably as far as you can reasonably go.
Furthermore, the method you use will depend on what precisely you want to convey. If you want ERA+ to just represent the percentage of runs allowed relative to the league average, then higher run environments actually do have more impact. If you want to present ERA+ as a run-environment neutral stat, or you want it to model value or wins rather than just runs relative to average, taking the weighted average is probably better. I think the way ERA+ is generally defined and explained (as percentage of league average in terms only of runs) follows the traditional method more closely, but the way it is generally viewed or interpreted or used (as a run-environment neutral perspective of pitcher value as measured by ERA) is more in line with the weighted average method. Like PhilM, I would at least ask Baseball-Reference to consider this change, but I would also understand if, after considering both methods, they elected to keep doing as they are.
*Note: when I call this version "new", I mean new to Baseball-Reference, which publishes ERA+. The method was proposed some time ago by regular Book Blog poster Guy and has been used for tRA+ at StatCorner for a while.