ZiPS ROS Projections as Estimates of True Talent

Player projections are a great tool. They give us good, objective estimates of a player's talent going forward, which makes them useful for addressing a number of questions. For example, should your team go after Player A or Player B to play shortstop next year, and how much improvement is each expected to provide over Player C, who is already signed? How much should the team offer each if it decides to pursue them? Who is the better option to start between two players competing for a job? Did that trade your team just made make sense, and did you get an improvement in expected performance over what you had? How does the talent on my team compare to that of other teams in the division?

You can even use projections for important questions, like, who should I draft first overall in my fantasy league (I guarantee you will avoid such pitfalls as the infamous Beltran-over-Pujols-et-al. debacle, circa 2005--sorry, Uncle Jeff)?

That's all well and good for looking at the coming season, when you don't know anything about anyone's season yet and your best guess is probably going to be heavily informed by each player's projections. However, the major problem with projections at this point in the season is that most of them are an off-season affair. They are most widely used for projecting the coming season, after all, and they can take a lot of time and computing resources to adapt for mid-season updates and to keep re-running throughout the year.

Each player's current-season performance tells us a lot about how we should estimate his talent going forward, which makes relying primarily on pre-season projections a problem in some cases. As a result, you are more limited if you want projections that incorporate the current season's data to answer questions that require up-to-date estimates of talent: say, how does that trade my team just made look, how does my team shape up for the playoffs and compare to its likely opponents, or should we give a serious look to this September call-up who's been on fire?

Fortunately, there are at least a couple of freely available projection systems that provide in-season updates. The ones I know of are CHONE (published at Baseball Projection) and ZiPS (published at FanGraphs as ROS--rest of season--projections). Both can be good options for estimating a player's current talent level without ignoring information from his performance this season.

Because ZiPS is updated daily (as opposed to the updates every month or so that CHONE provides) and because it is now published at and frequently used by writers for the prominent stat website FanGraphs, it has become a favourite of many fans for estimating hitters' current offensive talent. While it is great that such a tool is available and that it is used in an attempt to form objective, informed opinions, there is a serious caveat with using the current ZiPS projections on FanGraphs as true talent estimates this late in the season.

To illustrate, consider Ryan Ludwick's ZiPS ROS wOBA projection. Right now, it is .375. Before the season started, ZiPS had Ludwick pegged for a .372 wOBA. He has since aged a bit, posted a .334 figure for the year, and moved to a worse park for hitters. How did his projection go up? What is even more confusing, if you track the projections from day to day, is that yesterday, his wOBA projection was at .390 or so. The day before, it was at .385. And, if you really want to wake up in Wonderland, check the ROS projections during the last week or so of the season, when you have 8 dozen guys projected for the same .250 (or whatever it ends up being) wOBA. What is going on?

The issue is that the ZiPS ROS projections on FanGraphs are not, in fact, an estimate of the player's true talent going forward. Rather, the projection gives its best estimate, in whole numbers, for the player's number of singles, doubles, triples, homers, walks, and HBP for the rest of the season, and then FanGraphs figures out what the player's wOBA would be over the rest of the season if he hit each of those figures on the nose. For Ludwick, that means his .375 wOBA projection is not his projected talent level, but the wOBA for the following projected line:


          1B   2B   3B   HR   BB   HBP   PA   appr. wOBA
Ludwick    8    3    0    3    4     1   55        0.375

But remember that each of those components is rounded to the nearest whole number. His projected singles total could be anything from 7.5 to 8.5. Rounding to the nearest whole number costs precision, and when you have a wOBA figure that needs to go to 3 decimal places, that loss of precision can affect the projected wOBA. To see just how much difference this can make, let's pretend all of Ludwick's actual projected components are really .5 lower than the rounded-off whole number (the lowest his actual projected wOBA could be), and then pretend they are all really .5 higher (the highest his actual projected wOBA could be), and see how much that affects his projected wOBA:


       1B   2B   3B   HR   BB   HBP   PA   appr. wOBA
min   7.5  2.5  0.0  2.5  3.5   0.5   55        0.324
max   8.5  3.5  0.5  3.5  4.5   1.5   55        0.440


As you can see, given Ludwick's projections over 55 PA, his actual projected wOBA could theoretically be anywhere from .324 to .440. That is a huge range. Of course, to be close to the extremes of the range, every component would have to be rounded in the same direction by a large amount, so it is more likely to be close to .375 than to .324 or .440.
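If you want to play with that arithmetic yourself, here is a minimal Python sketch of the min/max calculation. It uses the wOBA linear weights quoted later in this piece; since the published projections may use slightly different year-specific weights (and a reached-on-error term ZiPS doesn't report), the output lands within a few points of the .324/.375/.440 figures above rather than matching them exactly.

# Rough sketch of the rounding-range calculation for a displayed ZiPS ROS line.
# The weights are the wOBA linear weights quoted later in this article; the
# published projections may use slightly different year-specific weights, so
# the results land within a few points of the table above rather than matching
# it exactly.

WEIGHTS = {"1B": 0.90, "2B": 1.24, "3B": 1.56, "HR": 1.95, "BB": 0.72, "HBP": 0.75}

def woba(counts, pa):
    """wOBA from component totals (whole-number or fractional) over pa plate appearances."""
    return sum(WEIGHTS[c] * counts[c] for c in WEIGHTS) / pa

# Ludwick's displayed (rounded) rest-of-season line over 55 PA
ludwick = {"1B": 8, "2B": 3, "3B": 0, "HR": 3, "BB": 4, "HBP": 1}
pa = 55

displayed = woba(ludwick, pa)
low = woba({c: max(n - 0.5, 0.0) for c, n in ludwick.items()}, pa)   # every component rounded up
high = woba({c: n + 0.5 for c, n in ludwick.items()}, pa)            # every component rounded down
print(f"displayed ~{displayed:.3f}, possible range ~{low:.3f} to ~{high:.3f}")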

How much more likely? To answer that, we have to know something about the distribution of possible true projected wOBAs for Ludwick, given that FanGraphs is displaying a .375 projection over 55 PA. We can do that by finding the standard deviation of the difference between actual projected wOBA and the rounded wOBA projections displayed on FanGraphs for hitters with 55 projected PA.

The actual projected total for each component, before rounding, can be anywhere from .5 less to .5 more than the rounded total, and we have no idea where in that range it falls. If Ludwick is projected for 8 singles over 55 PA, his true projected singles total is about as likely to be near 7.5 as near 8.5, with everything in between roughly equally likely. This is a uniform distribution, and its standard deviation is .5/sqrt(3)=.289. That means the standard deviation of the difference between Ludwick's projected 1B total without rounding and his projected 1B total rounded to the nearest whole number is .289 singles. This describes the error in the rounded total FanGraphs displays.

Since the possible error for every component has the same uniform distribution from -.5 to .5 (except triples, since the rounded estimate of 0 can't have been rounded up, but we'll ignore that for now), the standard deviation for the error of each component is the same .289. Next, we need to know what that means in terms of affecting wOBA. The formula for wOBA is:

(0.72xNIBB + 0.75xHBP + 0.90x1B + 0.92xRBOE + 1.24x2B + 1.56x3B + 1.95xHR) / PA

That means each walk (non-intentional walk, but ZiPS doesn't differentiate, so we'll just use BB) is worth .72 in the numerator of wOBA, each HBP is worth .75, etc. The standard deviation of the error in the walk total is .289, so the standard deviation of that error's effect on the numerator of wOBA is .72*.289 (the value of each walk times the standard deviation of the error in the walk total). The same process applies to each component. The following table shows the standard deviation and variance of each component's contribution to the error in the numerator:


          error SD   wOBA value   SD of effect   Var of effect
1B           0.289         0.90          0.260           0.068
2B           0.289         1.24          0.358           0.128
3B           0.289         1.56          0.450           0.203
HR           0.289         1.95          0.563           0.317
BB           0.289         0.72          0.208           0.043
HBP          0.289         0.75          0.217           0.047
combined                                 0.897           0.805


The combined row shows the total variance and standard deviation for the combined rounding errors. This is simply the sum of the individual variances, with the standard deviation being the square root of that. This is what we are interested in.
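Reproducing that table takes only a few lines; here is a minimal sketch (same weights as above), where each component's contribution is the uniform rounding SD of .5/sqrt(3) scaled by its wOBA weight, and the variances are then summed:

from math import sqrt

# SD of a rounding error that is uniform on (-0.5, +0.5)
ROUND_SD = 0.5 / sqrt(3)   # ~0.289

WEIGHTS = {"1B": 0.90, "2B": 1.24, "3B": 1.56, "HR": 1.95, "BB": 0.72, "HBP": 0.75}

# Each component's contribution to the error in the wOBA numerator
variances = {c: (ROUND_SD * w) ** 2 for c, w in WEIGHTS.items()}
combined_var = sum(variances.values())
combined_sd = sqrt(combined_var)

for c, w in WEIGHTS.items():
    print(f"{c:>3}: SD {ROUND_SD * w:.3f}, variance {variances[c]:.3f}")
print(f"combined: SD {combined_sd:.3f}, variance {combined_var:.3f}")   # ~0.897 and ~0.805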

.897 is not the standard deviation of wOBA itself, just of its numerator. To get the standard deviation of the rounding error in wOBA, we have to divide by the denominator, which, as the above formula shows, is just PA. Ludwick is projected for 55 PA, so divide .897 by 55:

.897/55 = .016

For players projected for 55 remaining PA on FanGraphs, the standard deviation of the difference between their actual projected wOBAs and the rounded-off projections that are displayed is .016. If Ludwick's actual projected wOBA is .359, the rounding error in his displayed projection would be one standard deviation, which would be a pretty typical observation. Of course, we don't know what anyone's actual projected wOBA is or whether the displayed figure is rounded high or low, just how imprecise the displayed figure is. In some cases, like Ludwick's, we can make a reasonable guess about the direction of the rounding based on what we know about how projections work (i.e., a 31-32 year old with a down year probably isn't raising his projection), but all we can do is make reasonable estimates and acknowledge the limitations the imprecision imposes.

What does this mean about the value of ZiPS ROS projections? It depends on how precise you need to be. The precision drops quickly near the end of the year, but earlier in the year, they can work as good estimates of current talent. To determine how much rounding error you can expect in a projection, just divide .897 by the projected PA total for the rest of the season, and that will give you the standard deviation of the error. For example, with 200 PA projected for the ROS, the SD of the error is .897/200=.004, which is a lot more reasonable. At 20 PA, you get .045, at which point you basically can't estimate the difference between anyone with much certainty.
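That rule of thumb fits in a one-line function; a quick sketch using the ~.897 numerator SD derived above:

def woba_rounding_sd(projected_pa, numerator_sd=0.897):
    """SD of the rounding error in a displayed ZiPS ROS wOBA for a given projected ROS PA total."""
    return numerator_sd / projected_pa

for pa in (200, 55, 20):
    print(pa, round(woba_rounding_sd(pa), 3))   # ~0.004, ~0.016, ~0.045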

As a result, extrapolating the projections over longer periods of time becomes problematic. For example, if you want to compare players for next season, or to measure the magnitude of the difference between them on a full season scale (i.e., Player A is projected to be worth 30 more runs a year on offense than Player B), you are going to be multiplying the large error in wOBA over a large number of PA. Basically, you can't use them to get an idea of large-scale value.

What they are good for, however, is getting a good guess at expected production over the handful of remaining PAs this season. For example, if you want to scour your fantasy league waiver wire and see what everyone is likely to give you over the rest of the season, or if you want to evaluate a fantasy trade proposal, or whatever, then ZiPS ROS projections are great. The key questions for using them are: do you need a precise measure of value, and do you need to extrapolate over a large number of PAs? For anything where you are looking for true talent going forward beyond just what to expect over a handful of remaining PAs, or where you want to discuss value on a full-season scale, you'd want to shy away from ZiPS ROS projections more and more the later in the year you get. For applications where you don't care about precision or how the projection extrapolates beyond the remaining 50 or 20 or however many PAs, and you don't need to pick up differences between players with much certainty, ZiPS ROS projections are fine.

Ted Williams, Saberist

One of my first encounters with sabermetrics came when I was a kid visiting Cooperstown for the first time. There, in that house of tradition and history of all places, was an exhibit of baseballs bolted to the wall in the shape of a strike zone, each one painted some colour with some number written on it. The number was a batting average, and the colour corresponded to how hot or cold the average was (from grey or blue for the mid-.200s all the way up to deep red for .400). Together, they replicated a famous chart Ted Williams had put together by keeping track of how well he hit pitches in any location in the zone. Using this information, Williams estimated how well he would be expected to hit on a pitch thrown to any location in the strike zone. You could plainly see his weakness down and away, exacerbated after he shattered his elbow in 1950, as well as how quickly that target zone for pitches turned into a .380+ wheelhouse for Williams if the pitcher missed by the slightest of margins.

I later learned that the chart the exhibit was based on was from a book Williams had written called The Science of Hitting (I probably learned it reading the info from the exhibit, actually, but I relearned and remembered it later). The book embraced the objective, analytical thought processes that form the basis of sabermetrics. Nowadays, analysts like Jeremy Greenhouse are still building on the work Ted Williams was doing decades ago. So when I was browsing through interviews of the Splendid Splinter recently, it was really no surprise to find him espousing sabermetric wisdom right and left.

Tangotiger of The Book Blog recently praised contemporary sabermetric spokesman Brian Bannister for speaking intelligently on the role of luck in the game. While a player's skill is clearly a huge part of his success or failure, it's impossible to ignore that random chance also factors into that success, and sometimes the effects of that random chance can be significant. As one of the most sabermetrically minded players in the game today, Bannister understands this.

So too, it seems, did Ted Williams. From a Sports Illustrated interview:

TW: I've been a very lucky guy. Even I know how lucky I've been, especially in my baseball career. Anybody who thinks he's had great success or outstanding success, he's a lucky guy. You're damn right.


One of the key ideas in understanding future performance in baseball is that if someone has performed at a spectacular level, and you want to estimate how he will perform in the future, chances are he will not perform as well as he did before. That concept is usually called regression to the mean. Basically, if you have a hitter who just hit for a .400 wOBA, it is possible that his actual expected level of performance is .400 and he hit just like he was expected to, and it's possible that he is really an expected .420 hitter who got unlucky and only hit .400, but it's more likely that he's really an expected .380 hitter who got a bit lucky to hit .400. Another way to look at it: take all the hitters who hit for a .400 wOBA over a given period of time and look at what they do after that. A few of them will keep hitting at .400 or better, but most of them will regress, and as a group they are very likely to hit at a lower level going forward. If you take any one hitter from that group and try to predict whether he will be one of the few who improves or one of the many who declines, the odds are greater that he will decline.
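If you want to see the effect for yourself, here is a toy simulation with completely made-up parameters (true-talent wOBA normally distributed around .330, binomial-style noise over 300 PA). None of these numbers come from real data; the only point is the direction of the effect: the group that hit .400 in the sample is both good and, on average, lucky.

import random

# Toy regression-to-the-mean simulation; all parameters are assumptions for
# illustration only (true talent ~ Normal(.330, .025), ~300 PA of noise).
random.seed(0)
N, PA, TALENT_MEAN, TALENT_SD = 100_000, 300, 0.330, 0.025

group_true, group_next = [], []
for _ in range(N):
    true = random.gauss(TALENT_MEAN, TALENT_SD)
    noise_sd = (true * (1 - true) / PA) ** 0.5
    observed = true + random.gauss(0, noise_sd)
    if observed >= 0.400:                                    # the hitters who "hit .400" in the sample
        group_true.append(true)
        group_next.append(true + random.gauss(0, noise_sd))  # their follow-up sample

print("avg true talent of the .400 group:", round(sum(group_true) / len(group_true), 3))
print("avg follow-up performance:        ", round(sum(group_next) / len(group_next), 3))
# Both come out well below .400: the group was genuinely good, but also lucky.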

That's why whenever we have a player who has performed at a high level, we know, with some degree of certainty, that he was a really good player, but we also know that there was a better than average chance that he was a bit lucky too, and that he really hit a bit higher than his expected level of performance. Ted Williams sums this up rather succinctly in the above quotation.

He expands on this in another interview, this time with Esquire:

Somebody will hit .400, maybe .410 or .415. Oh, you bet. It’s a hard thing to do. Ya gotta be lucky. Baseball might be a little tougher today. They bring in a new pitcher any old time. Ya gotta go through that whole ritual again of trying to find out as much as ya can on six pitches. Ya hit at him four times, ya got a chance of gettin’ him locked in a little better.

In addition to the special bonus material discussing the benefit to a hitter of facing a pitcher multiple times through the order and the added difficulty of hitting relievers even though they are worse pitchers talent-wise than starters (these are among the subjects explored in the modern sabermetric masterpiece The Book), we see Williams again discussing the role of luck, this time in perhaps his most historic feat as the last hitter to bat .400. Williams again says it simply: "Ya gotta be lucky." When he says someone will hit for a .400 AVG again, he acknowledges the role of chance in the game, that there is random variation in a hitter's performance, and that while no one is truly an expected .400 AVG hitter, sometimes, by chance, hitters hit well above their actual talent level, and if you have enough true .330 hitters play enough seasons, eventually one of them will get lucky and hit .400. You've got to be good to even have a chance, but you've also got to be lucky to be the one it happens to.

Going back to the Hall of Fame exhibit, Williams marked his hottest zone right at the center of the strike zone, 1 ball wide and 3 balls high. In that area, just three baseballs in size, Williams estimated that he was a true .400 hitter. That was as good as he got, and nowhere else in the zone was even Ted Williams that good. When he hit .400, it was basically as if he had hit the whole season with pitchers grooving every pitch right down the middle to him. Obviously, they weren't putting every pitch right down the middle, so it's easy to see from the chart that Williams could not have hit .400 simply because that was his true talent level; he hit .400 because he was a ridiculously good hitter who also benefited from some good luck that season and hit better than his expected level. He had to have gotten lucky, and Williams openly acknowledges this.

Earlier in the Esquire interview, Williams discusses another way in which he was lucky:

I was lucky. I’m talkin’ about the fifty thousand balls that was thrown at me, the times I slid, the times I fell. You gotta be lucky to have longevity in any sport. It’s a tough routine. Some people are just a little inherently more tough than the next guy. I think that’s God-given genetics.

Williams had a truly great career as one of the best hitters the game has ever seen. He hit like no one else in the game, and he kept doing it from the time he broke into the Majors at age 20 right up into his 40s. It takes incredible talent and devotion to do that, but, as Williams plainly points out, it takes a lot of luck too. Even with all the talent and desire and work ethic in the world, you can still end up like Herb Score or Dick Allen or Andre Dawson or Jim Edmonds or Ralph Kiner, players whose luck broke the wrong way, to varying degrees, at one time or another and left them never quite the same. Bad luck can push you right out of the game; it can leave you a semi-productive shell hanging around for years; it can even leave you a still-immense talent, robbing you only of the chance to be one of the small handful to reach the Ruthian peak of the game's history. However it gets you, it can get anyone at any time, and for all the players who may have had the talent and everything else to reach that peak, very few have the luck Williams did to be allowed to actually reach it.

None of this is intended to take anything away from how great Ted Williams really was. Williams was an honest and candid man who had no trouble placing himself among the handful of greatest hitters of all-time, and he was absolutely right to place himself there (and this isn't to say he was arrogant about it either; he also refused to claim to be the greatest hitter ever or that he had distinguished himself from the other handful of greatest hitters even as many felt he was and had). The same honesty let him publish his chart saying that he was only truly a .400 hitter on the very fattest of pitches, and saying that if the pitcher could paint the lower-outside corner perfectly against him, he could be reduced to a .230 hitter. Part of that honesty is that Williams had the sense to understand that no matter how great he was, his greatness was enabled by good fortune along the way, and that no amount of greatness can erase the role of chance and luck in the game. It's that honest pursuit of objective knowledge of the game that makes Ted Williams a perfect pioneer in the field of sabermetrics. He looked for the truth of the game around him and learned to understand its workings, and then he very matter-of-factly presented the truths he learned with no bias toward his own career or his teammates or anything other than what he saw to be true. And that, in essence, is sabermetrics.

Hammering Away at the Derby Effect

The HR Derby is having trouble finding participants these days. Players and teams alike are removing themselves or their employees from consideration for fear of hurting their swings and/or their bodies. It's not too hard to cite a handful of players who did well in the Derby only to launch into a second-half slump (exhibit A: Josh Hamilton hits 56* first-half home runs in 2008, only to drag across the finish line with a paltry 11 second-half dingers), so it's unsurprising that players and teams take every precaution with something widely deemed meaningless.


Players who have been confirmed or rumoured to have turned down invitations to participate this year include:

Albert Pujols
Justin Morneau
Ryan Howard
Torii Hunter
Robinson Cano
Ichiro
Micah Owings
Mark McGwire
Barry Bonds
Bobby Thomson (initially agreed, but declined upon learning Ralph Branca would not be available to pitch his rounds)

Meanwhile, eventual participants Chris Young, Corey Hart, Nick Swisher, Vernon Wells, and Hanley Ramirez entered 2010 with collectively fewer home runs than Alex Rodriguez despite their 5500 PA head start.

This article isn't about whether or not the HR Derby sucks, however. After all, it could be worse. For example, it could be the Texas League HR Derby, which was thrown into controversy when participant Koby Clemens failed to homer once after reportedly being threatened with a beaning if he dared take his dad yard. Rather, this article is about whether there has really been any detectable Derby hangover effect holding hitters back.

Before I begin, I should mention that Derek Carty looked at this very issue last year at THT. He has also recently published a follow-up on the same site using a different method. The approach I take here is more similar to the second Carty approach (compare Derby participants to a control group) than to the first (compare Derby participants to their own pre-season projections), but it is worth noting that Carty found similar results using both methods.

Whereas Carty focused only on AB/HR in each half of the season, I have chosen to instead look at all-around hitting using wOBA and total PAs. My reasons for doing so are twofold:

-Derek Carty has already demonstrated, with mostly the same data I am using, that the Derby hangover has not manifested itself in worse HR frequencies than expected
-It is possible that if players or teams are concerned about the Derby affecting a player's swing, the player could hit just as many HR but suffer in other areas of hitting, which would show up in wOBA but not in AB/HR

For my study, I looked at the 80 participants in the HR Derby from 2000-2009 (some of those 80 participants are really just different seasons for the same player). For each player, I split his season into pre- and post-All-Star-Break and recorded his wOBA (by the way, the wOBA I am using here does not include SB/CS, only batter events) for each half. Here are the results:

        1st Half           2nd Half           Diff
        wOBA     PA        wOBA     PA        wOBA     PA
        0.411  29621       0.401  23182       0.010   6439

Like Carty, I found a drop in performance from the first half to the second half for Derby participants, but not a very large one, and, as Carty points out in his work, we would expect to see a drop in performance from any group of players who performed that well in the first half. As for how much of a drop we should expect, or whether a drop of .010 points in wOBA is indicative of a hangover effect, well, that's what we need to look at our control group for.

Carty manually selected comps for each Derby participant for his control group in his second study. Rather than repeat his process, I used a simple rule to select my control group: I ranked all non-Derby participants in each season by first-half wRAA and took the top 8 from each season. I sorted by wRAA rather than wOBA to make sure I was not taking players with a great wOBA in a small number of PAs (since they would not make good comps and would be expected to regress significantly more in the second half than the Derby-participant group). I could have also set a minimum PA threshold and sorted by wOBA; either way accomplishes more or less the same thing.
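For anyone who wants to replicate the selection, here is a rough pandas sketch of that rule. The column names (season, derby, wRAA_1st, woba_1st, pa_1st, woba_2nd, pa_2nd) are placeholders for however your first/second-half splits happen to be stored, not the actual data set I used:

import pandas as pd

def pick_control_group(df: pd.DataFrame, n_per_season: int = 8) -> pd.DataFrame:
    """Top n non-Derby hitters per season, ranked by first-half wRAA."""
    non_derby = df[~df["derby"]]
    return (non_derby.sort_values("wRAA_1st", ascending=False)
                     .groupby("season", group_keys=False)
                     .head(n_per_season))

def half_summary(group: pd.DataFrame) -> pd.Series:
    """PA-weighted wOBA and total PA for each half of the season."""
    return pd.Series({
        "woba_1st": (group["woba_1st"] * group["pa_1st"]).sum() / group["pa_1st"].sum(),
        "pa_1st": group["pa_1st"].sum(),
        "woba_2nd": (group["woba_2nd"] * group["pa_2nd"]).sum() / group["pa_2nd"].sum(),
        "pa_2nd": group["pa_2nd"].sum(),
    })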

Ideally, we want our control group to be as close as possible to the Derby-participant group in the first half so that we can make a good comparison of their second half performances. Let's see how the two groups compare:


          1st Half           2nd Half           Diff
          wOBA     PA        wOBA     PA        wOBA     PA
Derby     0.411  29621       0.401  23182       0.010   6439
Control   0.432  28774       0.400  21442       0.032   7332


Here, we see that our control group lost .032 points in wOBA, way more than the Derby participants lost. What's more, the control group lost more PAs in the second half, so not only are the Derby participants holding up their rate production better; they're also staying in the lineup more, which is important because of the commonly cited health concerns over Derby participation.

Before we get too excited over these results, we should consider that they could simply reflect an issue with the control group. After all, it doesn't really make sense that over 20-30 thousand PA samples, the control group should lose an extra .022 points in wOBA and about 12 extra PAs per hitter in the second half over the Derby participants. If our control group were properly selected and actually representative, this would suggest that the Derby could actually be helping hitters significantly in the second half, and there's no reason to believe that to be the case. So before we accept these results, let's consider what issues might exist with the control group.

The first thing to notice is that we wanted both groups to come out as close as possible in the first half. However, the control group had a significantly higher first-half wOBA, as well as fewer first half PAs. This is by itself problematic. Remember that we expect any group that performs exceptionally in the first half to regress in the second half. The more exceptional the performance of the group in the first half, the more we'd expect them to regress in the second half. Additionally, the fewer PAs each player in the group has taken, the more we'd expect them to regress. An extra .021 points in first half wOBA and fewer PAs per player in the control group mean we would expect more regression in the second half than for the Derby group (which, in short, means this is not a good control group).

One possible way to address this is to select more players for the control group. If the top 8 non-Derby participants each year are collectively much better in the first half than the 8 Derby participants, we can select more hitters until the control group hits at about the same level as the Derby group. For example, while the top 8 hitters in the non-Derby group each year have hit at a .432 level, the top 20 might hit at close to a .411 level, which would make for a better control group. Unfortunately, that would still leave a likely problem.

As noted, even though the control group hit significantly better in the first half than the Derby group, they had fewer PAs, which is a bit unusual since we selected the control group based on the top performing hitters (who are generally given a lot of PAs). A possible reason for this presents an even bigger problem for our control group. With our Derby hitters, we know that they performed well in the first half, and that they were healthy at the All Star Break (at least healthy enough to participate). With our control group, we know that they performed well in the first half, but not that they were healthy at the All Star Break. We also know that, despite out-hitting the Derby participants as a group in the first half, they didn't participate in the Derby. There are many reasons hitters sit out the Derby, including pulling themselves out or being passed over for more well-known if less stellar-performing hitters, but one potential reason for a top-performing hitter to not be in the Derby is that he can't because he is already hurt. This was likely the case for a small number of hitters in the control group. It explains why the control group had fewer PAs despite performing better, as well as why they lost more PAs in the second half and why they regressed so much more.

Since we know none of the hitters in the Derby group were hurt as of the All Star Break, having any injured players (as of the All Star Break) in the control group will screw up the control group. Players who got hurt in the first half and still showed up in the top 8 in wRAA would have to have had a really good wOBA in the first half. That means when we look at the second half for their group, the group not only loses PAs because a player is already hurt going into the second half, they also lose a high-wOBA player, so even if everyone else regresses normally, the group as a whole will regress more than expected from losing one of its better hitters.

What this means for our control group is that we need to ensure that we have the same restrictions on our control group as we have on our Derby group. Namely, we need to ensure they were healthy going into the All Star Break. This is simple enough to do; we can do it the exact same way we got that restriction on the Derby group in the first place. We'll simply select only from the pool of players who participated in the All Star Game but not in the Derby.

More specifically, I narrowed the pool of players for the control group to the ASG starters (just because that was the simplest way to ensure an All Star actually participated and was not just selected, and because, as we'll see, doing so gave me a pretty good match for the control group) who did not participate in the HR Derby. I took the top 8 such players from that pool each year (again, sorted by first-half wRAA). Now, here is what the new control group looks like compared to the Derby group:



          1st Half           2nd Half           Diff
          wOBA     PA        wOBA     PA        wOBA     PA
Derby     0.411  29621       0.401  23182       0.010   6439
Control   0.411  28768       0.395  22332       0.016   6436


Still fewer PAs, but an exact match on the wOBA, and we eliminated the issue of having already-injured players in the control group that was throwing off the comparison before.

With this group, we see that the loss in both wOBA and in PAs is pretty close for both groups. It's possible there are still some selection issues with the control group; for example, Derby participants over the last 10 years might tend to be more highly regarded hitters, so, even though they performed at the same level, they were given more PAs and had a slightly higher true talent level, which would cause them to regress a bit less. Still, I think the restrictions placed should take care of the most important problems, and this control group should be good enough to pick up an effect if there were one. The extra .006 points in wOBA the Derby group held over the control group seems like a pretty reasonable cushion to absorb any further problems with the control group. That gap could be explained by remaining minor selection issues, but with the injury problem controlled for, I think the control group is good enough to demonstrate a lack of any easily detectable effect.

Based on this control group, we see that not only have Derby participants not lost any home run frequency in the second half over what was expected, as Carty has shown; they also have not lost any overall hitting performance or total PAs. What that tells us is that, compared to other All Star participants who had equally good first halves at the plate, hitters who participate in the Derby haven't lost any more production or playing time over the past ten years. If there is any meaningful hangover effect, it is certainly far more subtle and less nefarious than it is often made out to be, and it's not reflected in a simple glance at the second half numbers.


*56 first half HR includes 35 unofficial HR from Derby itself

On Ernie Harwell

As I'm sure you've heard, the great Ernie Harwell passed away yesterday. As the voice of the Tigers for four decades, Harwell was understandably a very meaningful motif in a lot of people's lives. His voice and his eloquence personified baseball for generations of fans in Detroit. It's the type of thing that happens when you take a talent like that and tie it to a presence as strong and as constant as baseball for that long. The two become intertwined and entangled until one is synonymous with the other, and all those moments where baseball and Ernie were present unravel in one long, unbroken ribbon throughout people's lives. It's only natural that Ernie meant so much to the fans of Detroit.

I have to admit, I've never really had that type of presence in a broadcaster. I heard Jack Buck call games when I was a kid, but it wasn't long before age started to get to him. Mike Shannon eventually began taking over, and while he had something of the buffoonish charm of the likes of Harry Caray (or at least of Harry Caray at that age), he lacked the sharply tuned observations or the crisply parsed conveyances that lit up a game you couldn't see unfolding. Occasionally, the observations deteriorated along with the enunciation in the later innings.

That's not to say Shannon can't touch people. I know fans who feel very deeply for Shannon's work. For many whose Ernie Harwell was Jack Buck, it was really Buck and Shannon, and the last decade or two have begun a new thread gently unraveling through the years against that jovial, sometimes slurred voice. For some of the younger generation, Shannon is it. But for me, he isn't.

Then there was TV. Radio has long since loosened the grip it had on the pipeline pumping baseball into people's homes before most people began taking in daily dosages of baseball in technicolor. When the voices in fans' lives grew to cover twice the media sources, the consistent drone of baseball on the radio became a little less consistent. The need for a narrative line dwindled as people began to watch for themselves what was happening. The voices meant less on TV, and people paid less attention. Now, the internet readily provides real-time updates with no audio at all, an iconoclast tearing down the role of the voice of the game further still. The Ernie Harwells of the game seem like dinosaurs in today's media.

We now live in a world where I can watch a game in Houston and the voice of Mark Grace can tell me that Michael Bourn is the Astros' most productive hitter, and it's not that hard to tune out. The voice doesn't mean all that much. I know Michael Bourn is not and has probably never been anybody's most productive hitter, because I don't have to rely on the voice telling me that for any of the information I get anymore. What's more, nobody even cares that he said something so ridiculous. Grace doesn't need to be particularly informed or particularly eloquent or particularly anything other than perhaps affable to Diamondback fans. They can see for themselves how Michael Bourn hits even as Grace says otherwise.

In the early part of my childhood, we didn't get cable, so I wasn't regularly watching games on TV. I don't even know if games were on every day yet back then. Watching the games on basic network stations was a treat, but I never got to know the voices. When we did finally get cable, I hadn't grown up long enough on the radio voices for them to really stick as an integral part of my life.

So I really have no idea what it actually feels like to have an Ernie Harwell in my life. That voice was just never there for me. Now satellite radio picks up games from all over the country (via outer space, which I can't imagine is very efficient), and I can hear some of the new voices in the game. None of them will ever stick like a Harwell, but there are definitely those that I like (Jon Miller of the Giants, for example). And now, with that new media having developed an affinity for the archives of days past when sentimentalism is ripe for exploitation, I've heard my share of Harwell's calls. Nothing like a Tigers fan has, of course, but I've heard him, and the man had talent. I can say that as an outside observer who never felt the kind of connection to a voice that lifts someone like Harwell to legendary status. On the merits of his voice and his wit alone, I can feel in five minutes the tug that lets you latch onto a voice and spin it into the narrative of your life. That generations of Tiger fans speak of Harwell's impact on their lives makes perfect sense to me. That's the kind of talent Harwell was.

There's a song I think is particularly fitting here. It was written by Chuck Brodsky, a Philly fan, after Richie Ashburn died, and I think it conveys the meaning and feeling of the connection between a fan and a voice as well as anything I've ever heard. Without having felt that connection myself, I feel like this really gets it across well, so I think it serves as as good a tribute as any to that connection, be it to Richie Ashburn and Harry Kalas or to Ernie Harwell or to whomever. Hopefully Chuck wouldn't mind me posting his song:


"Gamers" and Biases

In baseball, "gamers" are quite the peculiar phenomenon. If you aren't familiar with the term in its baseball sense, perhaps it will help to think of it in terms of a more accessible definition of the word. Gamers are people who play video games, technologically-inclined contestants who plug into a virtual world of gameplay rather than dust off the old chessboard or backgammon set or whatever board game of choice; in other words, they exemplify the anti-Milton Bradley in gameplay. Same thing in baseball.

But perhaps you're still confused. It's tough to know, say, a gamer from a dirtball from a jerk. Don't worry. Bert Blyleven's got you covered. As it turns out, Blyleven enlightens, a gamer and a dirtball are, in fact, the same thing. They are players who are tough and gritty, who work hard and help their teams win. That seems clear enough. If a guy works hard all the time and helps his team win, he's a gamer, right?

Not so fast, Blyleven says. A guy can work hard and be a great player, but if he's a jerk, why, that's no gamer. Kirk Gibson? Not a gamer. Tough as nails? Check. Great player? Check. Did everything to help his team win? Check. Has enough of a positive influence on players that teams have employed him as a Major League coach for the better part of the last decade? Check. So what's wrong with Kirk Gibson? I'll tell you what: the jerk store called, and they're running out of Kirk Gibson. And he probably slept with your wife.

How about Tim Foli? We are given the perfect anecdote to determine whether the man was a gamer:

(Foli) was a hardnosed shortstop who had an attitude and a fight about him, and was not afraid to say what was on his mind. In fact, I remember an incident where he questioned the tactics of manager Chuck Tanner, and they had an altercation that escalated to the point where Tanner’s hands were around Foley’s (sic) throat.


If you could only give one anecdote to explain why someone is or isn't a gamer, what better one than this? Why, it's such a good example that if you played with hundreds of teammates over a 22 year career and had to select the three biggest gamers you ever played with, this is the kind of thing that vaults you to that level.

So a gamer can question authority, undermine the manager, speak up disruptively, and get into physical altercations with the skipper (and presumably teammates as well).

Compare this description of Foli to another player: Dick Allen. Allen was known for having an attitude and a fight about him, and he was allegedly not afraid to say what was on his mind. He even once got into a fight with teammate Frank Thomas because of this spirit. Pretty much everything that qualifies Foli as such a gamer. Yet Allen was widely viewed as a disruptive force for those same traits and for his altercation with a teammate (which amounted to Thomas hitting Allen with a bat, not Allen being the aggressor). Craig Wright wrote an excellent piece for SABR (PDF file) defending Allen, which covers many of the misperceptions regarding him, including those stemming from the fight, and lists the following details picked up from interviews with Allen's coaches and teammates:

-the fight happened because Allen was defending a young black player from demeaning remarks from Thomas
-Thomas was unpopular on the team and Allen's teammates generally supported Allen in the altercation
-The manager was looking for a reason to get rid of Thomas anyway and the team released Thomas following the altercation (which also meant when the team was forbidden to speak to the press about the fight, Thomas gave his own account uncontested after being released that painted Allen as the bad guy)
-Coaches and teammates said the fight had no effect on the team's morale

So if getting in a fight with the manager with no specified reason can actually make someone a gamer, sticking up for a teammate against the abuse of an unpopular and disruptive veteran should certainly qualify Dick Allen. This is an interesting dichotomy: in most players, the traits and incident Blyleven describes in Foli are viewed as undesirable and disruptive, but when you get a different idea of a player in mind, those things somehow come across as endearing.

This pattern continues in Blyleven's gamer descriptions. For Ed Ott and Pete Rose, the examples given that exemplify their gamer-ness are examples of them needlessly injuring players. For Ott, the anecdote is, "During one incident after he broke up a double-play at second base, he body-slammed Mets infielder Felix Millan so hard he broke his collar bone." I cannot think of a way to cleanly break up a double play by body-slamming someone so hard it breaks his collarbone (though maybe that is just a communication issue as far as what constitutes "body-slamming" to me and to Blyleven), or how that is any more playing hard and helping the team win than just breaking up the double play without doing it in a dirty and dangerous way.

Michael Young is cited as a gamer for the fact that he has changed positions twice: once to shortstop when Texas acquired a worse defensive middle infield starter (Soriano), which is a move I'd imagine most middle infielders would be thrilled with, and once to third when Texas got a real shortstop (Andrus), which prompted anything-for-the-team gamer Young to request a trade rather than willingly switch positions for a clearly superior fielder. Compare that to the second baseman who first moved him, Soriano. His displeasure with his move to left field was widely publicized and derided. Young's is ignored. Both players initially refused to accept the switch their teams asked of them, and both ended up switching anyway because they realized the manager fills out a lineup card and wherever you are, you play there with no recourse to play where you want instead. No one calls Soriano a gamer for making that kind of position switch. What is different about Young? For that matter, when A-Rod joined the Yankees and was widely regarded as the superior shortstop, why was it non-gamer A-Rod and not gamer Derek Jeter who volunteered to switch positions?

Most of the reasons given for a player being a gamer are just terribly generic and nondescript. They feel as if you could replace the name in the heading with any of dozens of other names and not notice. Nick Punto is a gamer because he plays hard and plays good defense and can't hit but still manages to occasionally not end up with the worst possible outcome at the plate. Eric Chavez makes the list because he used to be good and has had injury troubles and now doesn't want to retire, and because dammit, he plays the hardest damn DH you've ever seen. Dustin Pedroia is a gamer because he looks like a midget logger. And he plays hard.

Not that things like playing hard and going all out aren't great traits to have, but really, that describes the majority of players in the Majors, and probably in the minors too. Trying to distinguish a handful of players by how hard they play is probably pointless. Just about anybody with marginal or below-average Major League skills (which is a lot of players) is going to be putting everything he has into the game every time he plays in order to be where he is, and most players who are better than that are probably doing the same. You aren't going to be able to find a short list of players out of the several hundred in MLB who legitimately distinguish themselves from everyone else in that regard. So when you take on the task of selecting just a few players for praise for this aspect of their game, there is a ton of potential for bias in which players you select. Why choose player A, who plays hard and does the little things to help his team win, over player B, who does the same things? Or, as it may be, why choose player C, who undermines the manager's authority and picks fights and is contentious, or player D, who plays catcher for the White Sox and is keeping the jerk store going through the Kirk Gibson shortage by being their number one seller?

Maybe player A looks small and scrappy and his skills are less obvious from an athletic tools standpoint. Maybe he looks like someone you might see at your office or in your neighborhood or at your kid's ballgames watching his kid. Maybe he has a certain reputation in the media or among fans. Maybe it is some other bias.

Consider Latin American players. Consider the economic status many of them come from, or a path where developing under skilled and devoted coaching means standing out from a large crowd of other kids enough to catch the eye of the buscones and hopefully get a look from some pro teams, where making the Majors means distinguishing yourself from your early- to mid-teens as a pro prospect, distinguishing yourself in camps and workouts with countless other prospects, and distinguishing yourself in summer and winter ball in Latin leagues just to break into the low minors, and then distinguishing yourself from there, where most American players are entering the system directly. Consider how hard most of those players have to work to take that route, and then consider the notion that not a single one of them plays hard enough, or does enough of the little effort things to help his team, to distinguish himself among the game's great gamers. That is a ridiculous notion. Of course there are players from Latin America who have spent their lives putting everything into every play as much as anyone in the game does. When no such player is lauded as a gamer, it is not because of a lack of traits or effort or whatever it is that distinguishes one as a gamer.

This isn't to say that there is something wrong with trying to give credit to players you really like or trying to highlight the things you like about them. The issue is, when you do undertake such a subjective task, and when you set out to whittle hundreds of candidates who could all fit the description just as well down to a small handful, biases are going to creep into your selections, be they personal biases or cultural biases or whatever. Because of the nature of a list like this, biases will do more than creep in. They can end up dominating your list. Maybe that is ok--maybe I want to highlight how hard Albert Pujols works and how good an example he sets and how he does all the things I would want in a player because I am a huge Pujols fan, and it is not that important to me that I don't know how hard everyone else is also working or how much of the same effort they are also putting in. And maybe staring down home runs or glaring at umpires is confidence and intensity and an expression of dominance when Albert does it and showboating or upstaging when someone I don't like does it, and I can be biased because those are my feelings and I understand that. Or maybe it's not. I think that's an important question to ask with this type of thing. What biases are influencing your choices? Are you ok with exhibiting those biases? Are you ok not just with who you are choosing to single out for praise but with who you are leaving out and implicitly setting aside as below your chosen group? Does your opinion, with its biases, add anything to our understanding of the issue being discussed, or is it just applying a generic standard to your personal favourites? Are you doing genuine research or just listing things off the top of your head that anyone could have come up with (which is not only not insightful, but also particularly prone to biases that you likely won't identify)?

That means when you single out Tim Foli for his desirable trait of having a fighting attitude and spirit, even when it puts him at odds with the manager or leads to altercations within the team, are you also ready to single out and praise Dick Allen for the same things (or perhaps Carlos Zambrano, or whoever else)? If not, either you are not communicating your reasoning very clearly, or you are communicating a biased viewpoint. Can you call A-Rod a great gamer for willingly switching positions and for hitting further down in the batting order when the manager put him there and he had no choice? Certainly the man plays hard and has tremendous demands for himself when he plays. Or is there something else that you need to communicate that sets your choices apart? If you are ok with these extensions of your reasoning, what led you to single out the players you did over others who exhibit the same traits?

I don't mean to just single out Blyleven here. I use his article because I think it is a good example of something that we are all prone to, and I think looking at his article introduces a lot of questions that I think need to be answered when writing a piece like this.

And, for the hell of it, I close with a gamer-mosaic:



By the way, notice any biases there?

Opening Day

As I am about to take my son to another opening day, I pause to reflect on some amazing opening day experiences and memories I have accumulated over the years. This is not a comprehensive list, but an odd assortment of experiences that every true fan will enjoy. I hope they facilitate a whole fountain of precious memories for you, as well.

After seeing the Cards play on artificial turf for as long as I could remember, I was there opening day in 1996 when the major renovations took place to the interior of Busch, including the installation of grass, the red seats, the manual outfield scoreboard, and the flags in right center field for all those whose numbers had been retired.

I remember 1980 opening day, when I scored some tickets in the front row - directly behind the Pittsburgh bullpen. Rod Scurry was a relief pitcher for the Pirates, and his hands were massive. I have no idea why, of all things, I remember that, but seeing those pitchforks at the end of his arms stuck in my mind. By the way, Pete Vuckovich pitched a brilliant three-hit, complete-game shutout to beat the Pirates 1-0 - the only run scoring when George Hendrick doubled home Bobby Bonds from first in the second inning. How many opening day starters will throw a complete game this year?

In 1983, I was there to see the Cards get their World Series rings. The year before, I camped out three different nights in order to get tickets for both League championship games against the Braves, and for three of the four World Series games against the Brewers (how I missed game 7 tickets is a long and painful story for another time).

In 1995, I took my boys out of school (a fairly common, if not annual, ritual) to attend another opening day, this time in Kansas City. The season got started late because of the players’ strike that cancelled the World Series (I actually saw the last game played in ’94 - when the Cubs lost to the Giants and I was in the bleachers at Wrigley). So, the players’ strike ended, but the umpire strike had just begun. When we showed up at the stadium around 9am, there were Bruce Froemming and about six other umpires walking around the parking lot and making a lot of noise. I was a college and high school umpire, so I thought this was amazing. I spent the morning talking with them and supporting their cause. Twice, reps from the KC front office were sent out to ask them to leave, and I was one of about a half dozen fans who told those front office people to back off. All of a sudden, Bruce and the other umps did a dead sprint across the parking lot (well, by dead sprint I guess I mean the kind of sprint that someone like Froemming does that threatens his death - he did carry a lot of weight with him), yelling obscenities. I thought, “What the hell?” A cab had just pulled up, and the scab crew had gotten out and was walking into the front doors when the umps caught sight of them and gave them hell. It wasn’t long after this that the front office people came out again and threatened to bring the cops out to escort them off the parking lot. Bruce and his clan were thinking about packing it in when I went up to them with what I thought was a brilliant idea. The Royals were trying to win fans back after the strike cancelled the World Series, and they were giving away general admission seats on a first come, first served basis. I told Froemming and his crew to get in line for free tickets, let the franchise that was trying to kick them out actually pay for their entry into the game, and then cause a ruckus while in there. Bruce looked at his colleagues, and they started thinking about it. Shortly after that my sons and I went in, and I guess I just forgot about the umps. Until about half-way through the game, when I heard some unrest kick up in the right field G.A. seats. When I looked to see what was going on, by golly there were the umps lined up and down the stairs in the right field corner, shouting something or other. It didn’t take long for the Royals to boot them out of the stadium - but they made their point. When I got home that night and watched SportsCenter, sure enough, there they were. The story made national sports news.

All these are good stories, but the best ever opening day experience of all time is now to be told. In 2005, I attended the last ever game in the old Busch Stadium. That winter, my son and I attended the ground-breaking ceremony for the new stadium. I would show up at odd hours to watch the demolition of the old one, and stood there on a freezing cold night when, at about 1:00am, the last arch went down. There were a few tears shed. I broke through security fences a number of times to steal odd pieces of the deconstructed stadium, and bought at auction some bleacher seats (where I spent much of my childhood - left field, of course, watching Lou Brock), some red seats, and the large white St. Louis flag that flew at the top of the stadium.
When opening day rolled around, every effort my brother and I had made to get tickets proved fruitless. We showed up early that morning in 2006 with pockets full of money ready to buy a ticket for whatever it cost. There was nothing. We attended all the opening day ceremonies outside the stadium; met the project director for the stadium construction (hold that thought - we’ll come back to it); and with only hours left before the first pitch still had no tickets. My brother finally answered a call from a childhood friend of his who was on the paint crew of the new stadium, and who was actually in the stadium just hours before the first pitch finishing the painting. He said he didn’t know what use they would be for us, but he scored us two visitor’s passes. He came outside and handed them to us.
What’s a visitor’s pass? We had no idea - we still don’t. But once we had them, our thinking was this: walk through every door you can until someone says “What the hell are you doing here?!” So, when we saw some press people walking through a glass door about three hours before the game, we thought - “What the heck, let’s try it.” We went up to the security guard, flashed our cardboard visitor’s passes, and were prepared for anything but what we heard - nothing. Just a quick nod of the head and we were through.


Our first trip was to the Cardinal dugout - in which we sat for a few minutes before we walked down the line into the outfield. The Stadium was empty - gates wouldn’t open for another hour. When we saw the Cardinal pitchers emerge from the dugout and begin some warm-up tosses, we thought we were in trouble. Turns out we weren’t. They were surprised to see us, but as we would learn as the day wore on, as long as we pretended like we belonged, no one would question it. We talked to Jason Isringhausen and Adam Wainwright, whom I had seen pitch five years earlier when he was an 18-year-old starting out his career with the Macon Braves.
We walked through every corridor in the Stadium. We saw very little of the game, telling ourselves that this day was about the Stadium. We stood next to Jack Buck’s widow as she lofted the flag up the new flagpole in the center field plaza right before the playing of the national anthem. We were behind the outfield wall as the Cardinals mounted the red Mustangs that would take them onto the field for their inaugural introduction to the fans - an Opening Day tradition in St. Louis. At one point, we saw a set of stairs go up a wall that led to a door. We climbed it, opened the door, and were in the Cardinal bullpen. You heard that right - in the bullpen. We thought we were in real trouble. Relief pitcher Brad Thompson was standing right there, looked over at the two fans now in his bullpen, and said, “Who are you?!” I was scared crazy, but not my brother. Jimbo has never met a stranger, and he just stuck out his hand and said, “Hi, I’m Jim Dorhauer, and this is my brother John.” Brad shook our hands, and we just stood there with him until the inning ended, exchanging some small talk and well-wishes and admiring the new digs. Then we left. I was halfway down the stairs when I heard the door open again. Jimbo wanted something else. I turned just in time to hear him shout, “Hey, Brad, throw me a ball.” By golly, he did just that, and Jimbo walked away with a prize.
From there, we continued opening every door except one: when we got to the players’ locker room, we just stood for a minute or two in awe and wonder, and out of respect (not fear, mind you - we had gotten long past that) we refused to break the sanctuary that is the players’ locker room.
I found my good friend’s seats and sat with him for a while and told him what we had been doing. He was jealous and amazed, and refused to believe that we actually made it into the bullpen. So, guess what? We went back and took him with us. When we opened the door and he caught a glimpse of the field from there, he was awestruck. He would not go into the bullpen, but he did stick his foot over the threshold just so he could say he was there, and then ducked back down the stairs and went back to his seat with his own story to tell.
From there we went to the luxury boxes. We met Bob Gibson outside one, and Jimbo got pissed when Bob wouldn’t sign the ball for him. We ended up in the owner’s suite - which is located near a whole slew of cooking stations with chefs ready to cook to order whatever cuisine you wanted. There were only a few outs left in the game, and I actually sat in a seat for the first time. Jimbo stood behind me, and before too long I heard him talking to someone. I wasn’t interested in that conversation until I heard him ask: “Are you guys pissed that Selig didn’t show up for this today? I mean, it’s his Brewers you guys are playing?” The guy admitted they were a little miffed. I wondered who the hell this was, and turned around: it was the Stadium project manager we had seen outside the Stadium earlier in the morning - long before we had tickets.
The game ended, and it dawned on both of us that we still didn’t have tickets. We wanted those to keep for memories’ sake - and so we started bidding on tickets from people who had actually paid to see this game. We found a couple of drunks who thought the $40 Jimbo was offering sounded good, and I found one who actually gave his away for $20 (I had watched Jimbo bid for his and realized that if I found someone who was drunk, I could probably get mine cheaper). Thus ended the perfect opening day.

More Esoteric Ramblings About ERA+

Recently, Baseball-Reference briefly unveiled a new version of ERA+ which rescaled the stat to show how a pitcher's ERA related to the league rather than how the league related to the pitcher's ERA. With the old numbers, 200 meant that the league ERA was 100% higher than the pitcher's ERA; with the new figures, that 200 is instead presented as 150, meaning that the pitcher's ERA was 50% lower than the league ERA. The advantages of the new version, as well as some of the limitations, were discussed recently by Patriot on his blog.
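
If it helps to see that rescaling concretely, here is a minimal sketch of both definitions as I read them from the description above (the Python function names and the 2.00/4.00 ERAs are mine for illustration, not anything B-R publishes):

def old_era_plus(era, lg_era):
    # old scale: how the league ERA relates to the pitcher's ERA
    return 100.0 * lg_era / era

def new_era_plus(era, lg_era):
    # new scale: how the pitcher's ERA relates to the league ERA
    return 100.0 * (2.0 - era / lg_era)

# a pitcher at half the league ERA reads 200 on the old scale, 150 on the new
print(old_era_plus(2.00, 4.00))   # 200.0
print(new_era_plus(2.00, 4.00))   # 150.0

# the two scales are linked by new = 200 - 10000 / old
print(200 - 10000 / old_era_plus(2.00, 4.00))   # 150.0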

This new version was soon rolled back, as B-R announced that the change was premature and that the official change will require a more organized approach (apparently simply switching important numbers around unannounced with no explanation and waiting to see if anyone notices is confusing, or something like that). However, in the ensuing discussion of the new version in the afore-linked B-R blog post, poster PhilM brought up an interesting point.

This post is not about anything specific to the new version of ERA+ or to the switch at B-R (should I have mentioned this somewhere in the 2-paragraph lead-in? Well, I'm telling you now, anyway), but before I begin (begin? Should I have also done that before the third paragraph?), let me explain one key benefit of the new version of ERA+. Under the old version of the stat, 2 pitchers with ERA+'s of 200 and 100 over the same number of innings would have a combined ERA+ of...133. Strange, huh? Two 150 ERA+ pitchers were better than a 200 and a 100 ERA+ pitcher, because to combine ERA+, you had to take the harmonic mean, not the simple average. Convert those to the new version, and the 200/100 pair of pitchers have ERA+'s of 150 and 100, for a combined 125 (hey, that's what we'd expect!). The 150/150 pair under the old ERA+ would each have an ERA+ of 133 using the new version. With the new ERA+, it's obvious how to combine ERA+, and it's easy to see that two 133 pitchers are better than a 150 pitcher and a 100 pitcher combined. For the rest of this article, I'll refer only to the new version, so when you see ERA+ after this, think of the version where 150/100 average to 125.
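
To put that averaging quirk in actual numbers, here is a quick sketch, assuming equal innings and a common league ERA purely for illustration:

# two pitchers with equal innings: old-scale ERA+ of 200 and 100
old_a, old_b = 200.0, 100.0

# combining on the old scale requires the harmonic mean
print(2.0 / (1.0 / old_a + 1.0 / old_b))   # 133.3...

# the same pair on the new scale: 150 and 100
new_a, new_b = 200 - 10000 / old_a, 200 - 10000 / old_b

# combining on the new scale is just the simple (IP-weighted) average
print((new_a + new_b) / 2.0)               # 125.0

# and the old 150/150 pair becomes 133.3/133.3, which is indeed better than 125
print(200 - 10000 / 150.0)                 # 133.3...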

Now, back to PhilM (remember PhilM?): PhilM noticed that when he calculated Walter Johnson's career ERA+ by taking the average of each season's ERA+ weighted by IP, it was not exactly the same as his actual career ERA+, although it was close (133 vs. 132). This is an important observation. ERA+ expresses, as a percentage, how a pitcher's earned runs allowed compare to the number of earned runs an average pitcher would allow over the same number of innings. A 150 pitcher allows 50% fewer ER than average. A 133 pitcher allows 33% fewer ER than average. A 90 pitcher allows 10% more ER. You arrive at this percentage by dividing the pitcher's ER by the amount of ER a league-average pitcher would have allowed (which you figure by prorating the league ERA to the pitcher's IP). A pitcher who allows 50 ER when the league-average pitcher would have allowed 100 allows 50% fewer ER than average, so he has a 150 ERA+.

What this means is that if you want to combine ERA+, what you are really doing is figuring the combined ER allowed and the combined league-average ER allowed; in other words, when each individual ERA+ represents the fraction ER/lgER, you are adding the numerators of each fraction and adding the denominators of each fraction (not adding the fractions themselves; you don't care about common denominators or anything of the like - you may let out a collective sigh of relief here if you feel so moved). If your individual ERA+'s represent the following fractions:

50/100, 20/30, 70/60, 100/100

Then the combined ERA+ will be:

(50+20+70+100)/(100+30+60+100)
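
In code, using those same made-up ER/lgER pairs, the combination looks something like this (reading the combined fraction back onto the ERA+ scale, where a 50/100 fraction reads as 150):

# each entry is (ER the pitcher allowed, ER a league-average pitcher would have allowed)
seasons = [(50, 100), (20, 30), (70, 60), (100, 100)]

total_er = sum(er for er, lg_er in seasons)        # 240
total_lg_er = sum(lg_er for er, lg_er in seasons)  # 290

combined_fraction = total_er / total_lg_er         # about 0.83
combined_era_plus = 100 * (2 - combined_fraction)  # about 117

print(combined_fraction, combined_era_plus)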

Simple enough, right? Now, what PhilM noticed was that this calculation for combined ERA+ is not the same as weighting each individual ERA+ by IP and then averaging them, even though that is supposed to be how we combine ERA+. What gives?

The issue is that averaging the individual ERA+'s, weighted by IP, does in fact simplify to the sum of ER allowed divided by the sum of league ER allowed, but only if the league ERA is the same in each ERA+ you are combining. If that is not the case (and it usually isn't), then what you are doing when you average multiple ERA+'s is not calculating the combined ERA+ but estimating it (albeit pretty well).

This brings us at last to the actual point of this post. PhilM then asked the all-important question: if averaging each season's ERA+, weighted by IP, is not the same as calculating career ERA+ directly, which is the better method? PhilM, being the smart guy that he is (I don't really know if PhilM is a smart guy; I assume he is smart because he asked such a good question, and I assume he is a guy because PhilM is a fairly masculine handle), noted that if you have two seasons, one with an ERA+ of 150 and one with an ERA+ of 100, and each had the same number of innings pitched, the intuitive thing would be to say that the pitcher was 125 overall, but in fact, whichever season has the higher league ERA will get more weight. PhilM showed that if the league ERA was twice as high in the 150 ERA+ season, the combined ERA+ would actually be 133, much higher than the 125 we would expect. What's more, if you switch the two seasons so that the league ERA is twice as high in the 100 ERA+ season, the combined ERA+ will be only 117. Of course, the league ERA isn't going to double or halve from year to year, but it will change, and that means it will make a difference, even if that difference is usually small. Sometimes, such as when a pitcher goes from, say, San Diego to Texas and puts up very different ERA+'s for each team, it might make a pretty noticeable difference. PhilM even noted that the difference, even when small, could actually change the career leaderboard for ERA+, such as flipping Walter Johnson and Lefty Grove.
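
To double-check PhilM's numbers, here is the arithmetic with some convenient invented totals - equal innings in both seasons, with only the placement of the doubled league ERA changing:

# Case 1: the 150 ERA+ season comes in the higher-scoring environment.
# Over equal innings, an average pitcher allows 100 ER in that season and 50 in the other.
seasons = [(50, 100),   # ERA+ 150: half the 100 ER an average pitcher allows
           (50, 50)]    # ERA+ 100: exactly the 50 ER an average pitcher allows
er, lg_er = sum(s[0] for s in seasons), sum(s[1] for s in seasons)
print(100 * (2 - er / lg_er))   # 133.3, not the 125 intuition suggests

# Case 2: flip the environments, so the 100 ERA+ season is the higher-scoring one.
seasons = [(25, 50),    # ERA+ 150 in the lower-scoring season
           (100, 100)]  # ERA+ 100 in the higher-scoring season
er, lg_er = sum(s[0] for s in seasons), sum(s[1] for s in seasons)
print(100 * (2 - er / lg_er))   # 116.7, PhilM's roughly 117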

Let's examine this a little more closely to see what is going on. In PhilM's example, we have a 150 ERA+ season and a 100 ERA+ season, each with the same number of innings. If the league ERA in the former season is double that in the latter, then the combined ERA+ will be 133. That means if we want to combine the two ERA+'s, we can't simply weight by IP and average them. How, then, can we get combined ERA+ from each individual season?

To actually calculate the combined ERA+ by averaging each individual ERA+, you would have to weight each individual ERA+ not just by IP, but by the quantity IP * lgERA. As you can see, seasons where the league ERA is higher than average will get more weight in the calculation, and seasons where the league ERA is lower than average will get less weight.
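
Here is a sketch of that weighting, again with invented season totals for PhilM's doubled-league-ERA case, showing that IP * lgERA weights reproduce the directly calculated figure while a plain IP weighting gives the intuitive 125:

def combined_direct(seasons):
    # seasons given as (new-scale ERA+, IP, league ERA); combine as total ER over total league ER
    lg_er = [lg_era * ip / 9.0 for _, ip, lg_era in seasons]
    er = [l * (2.0 - ep / 100.0) for (ep, _, _), l in zip(seasons, lg_er)]
    return 100.0 * (2.0 - sum(er) / sum(lg_er))

def combined_weighted(seasons):
    # average each season's ERA+ weighted by IP * league ERA
    weights = [ip * lg_era for _, ip, lg_era in seasons]
    return sum(ep * w for (ep, _, _), w in zip(seasons, weights)) / sum(weights)

seasons = [(150, 200, 8.0), (100, 200, 4.0)]   # equal IP, league ERA doubled in the 150 season

print(combined_direct(seasons))     # 133.3...
print(combined_weighted(seasons))   # 133.3... -- the IP * lgERA weights match the direct figure

# a plain IP-weighted average gives the "intuitive" answer instead
print(sum(ep * ip for ep, ip, _ in seasons) / sum(ip for _, ip, _ in seasons))   # 125.0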

Think, for a moment, about what ERA+ actually is. At its core, it is a measure of ERA that removes the context of the run environment. A 2.00 ERA in an environment where 3.00 is average is not the same as a 2.00 ERA in an environment where 4.00 is average - it's more like a 2.67 ERA in the 4.00 environment. The whole point is that a 133 ERA+ is equal to a 133 ERA+ no matter what the run environment is - whether it is a high- or a low-scoring environment doesn't matter. But, as we just saw, it does matter when we combine ERA+. Higher-scoring environments get more weight in the combined figure when it is calculated in the traditional way. That goes against the general purpose of ERA+ of removing the effects of the run environment on how we interpret the numbers. PhilM's proposal of using the weighted average of each season for career ERA+ fixes this issue and gives each season equal weight (by IP, of course) regardless of what the league ERA was in each season.

Given that the current method of calculating career ERA+ weights each season by a combination of IP and of the league ERA that season, I can definitely get behind this change. It's not an entirely clear-cut choice, though. The change would give different definitions of ERA+ for the season and career levels, and the two would no longer be calculated in the same way. There is also the problem that each season is actually a combination of games, much like a career is a combination of seasons. That means that over the course of a season, games in a higher run environment get more weight in calculating season ERA+. In fact, this is probably a larger issue because, in general, run environments will vary much more from game to game than from season to season. Does that mean we want to calculate season ERA+ as a weighted average of each game's ERA+? That would be a mess to calculate, and even then, you can still break a game up further into innings or PAs. At some point, you have to actually calculate ERA+ in the traditional way and accept the shortcomings, or else you have no pieces to average together to get ERA+ at the higher levels. Choosing where to make the distinction between calculating ERA+ directly and calculating it as a weighted average of individual parts is problematic. Ideally, you make the distinction at as low a level as possible (doing it by game would be better than doing it by season, and doing it by season would be better than not doing it at all if you want to get as close to weighting every IP equally as possible), but each level you go lower makes the stat more of a mess to calculate. Making the distinction at the season level, as per PhilM's suggestion, is probably as far as you can reasonably go.

Furthermore, the method you use will depend on what precisely you want to convey. If you want ERA+ to just represent the percentage of runs allowed relative to the league average, then higher run environments actually do have more impact. If you want ERA+ to present ERA as a run-environment-neutral stat, or you want it to model value or wins rather than just runs relative to average, taking the weighted average is probably better. I think the way ERA+ is generally defined and explained (as a percentage of the league average in terms of runs only) follows the traditional method more closely, but the way it is generally viewed, interpreted, and used (as a run-environment-neutral perspective of pitcher value as measured by ERA) is more in line with the weighted average method. Like PhilM, I would at least ask Baseball-Reference to consider this change, but I would also understand if, after considering both methods, they elected to keep doing as they are.




*Note - when I call this version "new", I mean new to Baseball-Reference, which publishes ERA+. The method was proposed some time ago by regular Book Blog poster Guy and has been used for tRA+ at StatCorner for a while.