Lights flash throughout the stadium. The noise of the crowd is constant, rising with the excitement but never dipping below the steady din of 40,000 fans' conversations. It's a scene that would make Donald Trump proud. The pitcher sets himself in the cold October night. He delivers, and the ball turns end over end, hurtling toward the plate, tipping and rolling, as the 40,000 sit on edge waiting to see where the dice stop: the crack of wood is a seven, the snap of leather a two. Yes, this is playoff baseball: a wondrous game, an enthralling spectacle, a beautifully wrapped crap shoot, a pair of dice cloaked in the charms of our modern rounders.
You know the story by now. Even if you never read Moneyball, you've no doubt heard Billy Beane's famous quote about the nature of the playoffs. In a game where even the Royals can expect to beat the Yankees 3 out of 10 times (even if they never pitched Zack Greinke), it's fairly obvious how much chance can enter into the outcome of a short series. Never has this been more obvious than in 2006, when the St. Louis Cardinals established themselves as an 83-win team over the course of 161 games, and then proceeded to out-roll the more potent Padres, Mets, and Tigers in one quick crap shoot after another, riding hot rollers Jeff Suppan et al. to a World Series title.
At least that's the story you tend to hear, even from the typically savvy among us. An 83-win team cannot really be better than a 97- or a 95-win team, but, of course, when you set them at the craps table and call out, "First to four!", anything can happen. It is true that there is a lot of chance involved in the outcome of a 5- or 7-game series; the inferior team will win a pretty significant portion of the time. What is missing from this story, however, is that the 162-game season is subject to the same laws of chance. Setting n=162 rather than n=7 certainly seems like a good way to cut out the noise and establish the true talent level of each team, but the reality is that in baseball, where even the best and the worst teams fall within the .400 to .600 range, there's still a lot of chance involved.
Let's look at the 2006 Cardinals as an example. Let's assume, just for example's sake, that the Cardinals' .516 win percentage established their actual ability, and that each team they played was as good as its 162-game record. To win the World Series, the Cardinals had to go through a 5-game set with the 88-win Padres, a 7-game set with the 97-win Mets, and a 7-game set with the 95-win Tigers. While each series on its own might give the Cardinals a decent chance for success, they had to win all three in succession. What are their odds of getting all the way through?
To keep the math simple, we'll ignore home-field advantage and individual pitcher match-ups and just assume that each team's probability of winning each game is the same. Against the .543 Padres, the .516 Cardinals would have a .472 chance of winning each game (using the log5 method), which would mean they have a .448 chance of winning the 5-game series. That's pretty good. Now on to the Mets.
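(For reference, the log5 formula gives the probability that team A beats team B as pA*(1 - pB) / (pA*(1 - pB) + (1 - pA)*pB); plugging in pA = .516 and pB = .543 returns the .472 figure used here.)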
This was a taller order. Not only did the Mets win 9 more games than the Padres, they had 2 extra games to let the better team work its way to the top. Here, the Cards would have just a .416 shot at each game and a .322 shot at the series. Against the similarly strong 95-win Tigers in the Series, the Cardinals had a .347 chance at winning the set.
The combined chance of all of these events happening together is just .050. If you bump up the Tigers a few wins for their tougher AL schedule, it's just .045. That's a mere 5% chance of the Cardinals winning the World Series if you assume that each team's regular season record establishes its true strength. In other words, if that assumption is true, then the Cardinals winning the World Series is a very unlikely outcome-hardly the type of thing you would expect to be described as a crap shoot.
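If you want to check these numbers yourself, here's a minimal sketch in Python. It makes the same assumptions as above: independent games, no home-field adjustment, and each team's true talent taken straight from its record:

```python
from math import comb

def log5(p_a, p_b):
    """Per-game probability that team A beats team B, given each
    team's true win percentage (Bill James's log5 formula)."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + (1 - p_a) * p_b)

def series_win_prob(p, games):
    """Probability of winning a best-of-`games` series when each game
    is an independent win with probability p. Equivalent to playing
    all `games` games and winning a majority of them."""
    wins_needed = games // 2 + 1
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins_needed, games + 1))

cards = 83 / 161  # .516
nlds = series_win_prob(log5(cards, 88 / 162), 5)  # ~.448 vs. the Padres
nlcs = series_win_prob(log5(cards, 97 / 162), 7)  # ~.322 vs. the Mets
ws   = series_win_prob(log5(cards, 95 / 162), 7)  # ~.347 vs. the Tigers
print(round(nlds * nlcs * ws, 3))                 # ~.050
```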
Regular season records don't establish a team's true strength, however. There are, of course, issues of changing personnel, such as teams having health problems throughout the year and getting healthy for the playoff run (as the Cardinals did), but that's not what I'm talking about. The pure random chance that goes into a team's 162-game record can't be ignored any more than the chance that goes into a playoff series can. Recall that the chances of an 83-win team beating an 88-win team, a 97-win team, and then a 95-win team in the playoffs are just 5%. What kind of variability in a team's regular season record would cover that same likelihood?
The Cardinals won 83 of 161 games. We want to know how good a team could actually be and still have a 5% chance of winning no more than 83 of 161 games. The answer is a .583 team, or a 94.4-win team over a full season: a team whose true ability was .583 would win 83 or fewer of 161 games roughly 5% of the time.
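That threshold comes straight out of the binomial distribution. A quick way to verify it, sketched with scipy and assuming each game is an independent trial:

```python
from scipy.stats import binom

p_true = 0.583  # hypothesized true-talent win percentage

# P(a .583 team wins 83 or fewer of its 161 games)
print(binom.cdf(83, 161, p_true))  # ~0.05
print(p_true * 162)                # ~94.4 wins over a full 162-game season
```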
So when you think about the role of chance in the playoffs and remember that the winner of a series is not necessarily the better team, don't forget that the same caveat covers the regular season as well. When you see something like the 83-win Cardinals winning the World Series, the role of chance in the outcome could mean that the Cardinals were a bad team that got lucky in the playoffs, but it could just as well mean that the Cardinals were a good team that got unlucky in the regular season. After all, the chance that a team truly as bad as the Cardinals' record would beat three teams truly as good as the Padres', Mets', and Tigers' records in the playoffs is the same as the chance that a team truly as good as a 94- or 95-win team would perform as poorly as the Cardinals did over the regular season.
Christy Mathewson painting
I finished a new baseball painting today, this one of Christy Mathewson. It was a gift for my dad (who occasionally contributes to 3-DBaseball), and it will be hanging in one of his baseball rooms soon. Note that Mathewson does remarkably well in BSAB (BrushStrokes Above Background) here.
[Image: the Christy Mathewson painting]
Click image for full-size (~700kb). Also, no, he isn't wearing a hat. He was warming up in the bullpen in my stock image, and, being the dashing man that he was, had not donned his hat. I liked it that way, so he's not wearing one here either.
FIP Constants (ERA-scale vs. RA-scale)
One of the long-standing debates in the game's statistical circles is whether ERA or RA should be used to evaluate pitchers. In the wake of the DIPS wave that hit the analytical world, the residue of this debate is the question of which scale should be used for metrics that don't directly measure runs or earned runs allowed. This new question has little to do with the actual ERA vs. RA debate, as it's purely a matter of scale and not of what we should or shouldn't measure (as the original debate is), but it remains a point of contention. Which is more important: the familiarity of the ERA scale and the ease with which casual fans can compare to their old standby (most fans will compare DIPS stats to ERA, not to RA, even if the DIPS stat is scaled to RA), or the intuitiveness of RA and its usability in more advanced applications? On the one hand, we have the original DIPS, FIP, and now tRA at FanGraphs scaled to ERA, choosing the benefits of familiarity over intuitiveness. On the other hand, we have tRA outside of FanGraphs, as well as most analysts who use DIPS metrics to convert to value, using or converting to the scale of RA instead.
For the most part, I don't particularly care one way or the other, since scaling to ERA or RA is, for most practical purposes, the same thing. Divide or multiply by .92 (or lgERA/lgRA), and you can easily go from one to the other. If you're using the metrics for anything more complicated than, say, looking at them, this step is probably the simplest you'll encounter, so I have no problem with either standard, at least as far as practicality goes. The issue is just a matter of presentation and the implications that go with that.
However, there is a related question I am more interested in, specifically with regard to FIP. As discussed here last October, FIP comes from the linear weights values of 4 categories of events (HR, BB, SO, and BIP) and is scaled to the league ERA by adding a constant to the calculation. Similarly, FIP could be scaled to RA instead by changing the constant. It shouldn't matter which scale we choose, since we can easily convert from one scale to the other with a simple calculation. Because of how FIP works, however, it does matter which scale we choose when deciding what constant to use in the calculation.
The problem arises from the fact that the linear weights used to calculate the coefficients in FIP are on the scale of runs, while ERA is on the scale of earned runs. Earned runs are a smaller scale than runs, by a factor of about .92. To see why this creates a problem, consider how FIP is calculated:
FIP = (13*HR + 3*BB - 2*K)/IP + C
let (13*HR + 3*BB - 2*K)/IP = x
FIP = x + C
This is the basic construction of FIP: a value is calculated for each pitcher, and then a constant is added to this value to put FIP on a more usable scale. The end goal of this constant is to convert FIP to the scale of either ERA or RA. Which scale you choose should make no difference because, as mentioned earlier, ERA is just RA multiplied by .92 (or something close to that), and you should be able to convert one to the other with a simple calculation. That is not true in this case. Let the following two equations represent FIP scaled to ERA and to RA, respectively:
erFIP = x + C1; C1 = lgERA - lgx
rFIP = x + C2; C2 = lgRA - lgx
x is the same in both equations. The only difference is the value of the constant. Now, let's convert rFIP to erFIP using our multiply-by-.92 rule:
erFIP = .92*(x + C2)
= .92*x + .92*C2
.92 times a constant is just another constant, so:
= .92*x + C3; C3 = .92*C2
Compare that to the original equation for FIP scaled to ERA:
erFIP1 = x + C1
erFIP2 = .92*x + C3
As long as we choose the correct values of C1 and C3, there shouldn't be a difference between these two values, but there is. To see why, subtract the two equations, assuming erFIP1=erFIP2:
erFIP1-erFIP2 = (x + C1) - (.92*x + C3)
0 = x-.92*x + C1-C3
Here, C1-C3 is another constant, because it is just the difference between two constant numbers:
0 = x*(1-.92) + C4; C4 = C1 - C3
0 = .08*x + C4
This can't be true for all values of x. When C4 is set so that this equation is true on average, erFIP1 is smaller than erFIP2 when x is lower than average (the difference between the two equations as shown above is negative) and larger than erFIP2 when x is higher than average (the difference is positive). In other words, FIP will be lower for good pitchers and higher for bad pitchers if you scale it directly to ERA than if you scale it to RA and then convert to the ERA-scale. The spread between pitchers is larger for the former method than for the latter.
Another way to look at this is to consider FIP as two components: the measure of pitchers' results (x), and the constant (C). x is measured in runs. If C is set to scale to earned runs instead of runs, then x will make up a larger portion of FIP, and, since x is the part of FIP that varies from pitcher to pitcher, the variance of FIP between pitchers will be inflated relative to the scale of the metric.
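To put some numbers on that, here's a small sketch; the league figures below are hypothetical, chosen only so that lgERA/lgRA comes out to .92:

```python
# Hypothetical league values for illustration only
lgERA, lgRA, lg_x = 4.14, 4.50, 1.30    # lgERA/lgRA = .92
C1 = lgERA - lg_x                       # constant scaling FIP to ERA
C2 = lgRA - lg_x                        # constant scaling FIP to RA

for x in (0.4, 1.3, 2.2):               # elite, average, and poor pitcher
    erFIP1 = x + C1                     # scaled directly to ERA
    erFIP2 = (lgERA / lgRA) * (x + C2)  # scaled to RA, then converted to ERA
    # the difference works out to .08*(x - lg_x): zero at average,
    # growing as a pitcher moves away from average
    print(f"x={x:.1f}  erFIP1={erFIP1:.2f}  erFIP2={erFIP2:.2f}  "
          f"diff={erFIP1 - erFIP2:+.2f}")
```

With these made-up league numbers, the two methods agree exactly at the league-average x of 1.3, while the elite pitcher comes out .07 runs better under the ERA constant and the poor pitcher .07 runs worse.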
To illustrate this point, consider the following graph, which is a scatter plot showing erFIP1 and erFIP2 from the above formulae for every pitcher to throw at least 100 IP in a season since 1970, as well as the difference between erFIP1 and erFIP2. The graph is sorted from left to right by the difference between the two figures:
[Graph: erFIP1, erFIP2, and their difference for all 100+ IP seasons since 1970, sorted by the difference]
Notice that when erFIP1 is smaller than erFIP2 (that is, when the constant that scales FIP to ERA returns too small a value for FIP, assuming that scaling to RA is correct), FIP itself is small, and that, without exception, the difference rises as FIP rises for a given value of C. (The graph is really just one pattern stacked on top of itself several times; it's the same pattern plotted for different values of C in different years.)
It shouldn't be surprising that FIP has some inaccuracies. It is, after all, a shortcut for the original DIPS calculations, designed to be much simpler and easier to use at only a small cost in accuracy. The question is how much difference this problem makes. As seen in the above graph, the difference between calculating FIP on the scale of RA and then scaling back to ERA, and calculating FIP directly on the scale of ERA, is small for most pitchers, and in fact approaches 0 as you get closer to average. It is at the edges of the graph, where pitchers are far from average, that the differences start to grow.
For example, take Pedro Martinez in 1999. His FIP, with the constant set to scale to ERA, was 1.51*. With the constant set to scale to RA, it was 1.96, which, scaled back to ERA, is 1.79. Still excellent, obviously, but not as good as his traditional FIP suggests. That's a difference of .28 runs per 9 innings. Say we were to calculate a WAR value for Pedro that year; how much difference would that make? We can ignore park adjustments for this specific purpose, since all we care about is how the two methods of calculating FIP compare. The AL average RA in 1999 was 5.31. Using 1.51 as Pedro's FIP (and dividing by .92 to scale to RA), that gives Pedro a W% of .885. Using .380 as replacement level, that's good for 12.0 WAR over 213.1 IP. Using 1.96 as Pedro's FIP gives him a W% of .851, or 11.2 WAR. The difference here is .8 wins.
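The last step of that arithmetic, for anyone following along, is just win percentage above replacement spread over innings. A sketch (taking the two W% figures as given above, since the FIP-to-W% conversion itself isn't shown here):

```python
def war(win_pct, ip, repl=0.380):
    """WAR: win percentage above replacement, times innings in units of 9."""
    return (win_pct - repl) * ip / 9

print(round(war(0.885, 213.33), 1))  # 12.0, from the ERA-constant FIP of 1.51
print(round(war(0.851, 213.33), 1))  # 11.2, from the RA-constant FIP of 1.96
```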
Since 1970, 12 pitchers have had differences of at least that size between their WAR figures as calculated by these two methods:
Year | Pitcher | IP | erFIP | rFIP | W%1 | W%2 | WAR1 | WAR2 | diff |
1972 | Steve Carlton | 346.3 | 2.20 | 2.63 | 0.659 | 0.682 | 10.7 | 11.6 | -0.9 |
1973 | Bert Blyleven | 325.0 | 2.38 | 2.86 | 0.670 | 0.694 | 10.5 | 11.4 | -0.9 |
1979 | J.R. Richard | 292.3 | 2.25 | 2.74 | 0.679 | 0.705 | 9.7 | 10.6 | -0.9 |
1984 | Doc Gooden | 218.0 | 1.80 | 2.29 | 0.725 | 0.761 | 8.4 | 9.2 | -0.9 |
1985 | Doc Gooden | 276.7 | 2.15 | 2.62 | 0.679 | 0.706 | 9.2 | 10.0 | -0.8 |
1978 | Ron Guidry | 273.7 | 2.22 | 2.69 | 0.687 | 0.714 | 9.3 | 10.2 | -0.8 |
1971 | Tom Seaver | 286.3 | 2.13 | 2.57 | 0.670 | 0.695 | 9.2 | 10.0 | -0.8 |
1999 | Pedro Martinez | 213.3 | 1.51 | 1.96 | 0.851 | 0.885 | 11.2 | 12.0 | -0.8 |
1971 | Vida Blue | 312.0 | 2.19 | 2.61 | 0.662 | 0.685 | 9.8 | 10.6 | -0.8 |
1986 | Mike Scott | 275.3 | 2.20 | 2.65 | 0.685 | 0.711 | 9.3 | 10.1 | -0.8 |
1970 | Bob Gibson | 294.0 | 2.55 | 3.03 | 0.671 | 0.694 | 9.5 | 10.2 | -0.8 |
1973 | Nolan Ryan | 326.0 | 2.57 | 3.05 | 0.646 | 0.667 | 9.6 | 10.4 | -0.8 |
Similarly, poor pitchers have WARs that are too low when measured by traditional FIP, though not by as much, since they pitch far fewer innings than elite pitchers. Across the 5,630 pitcher-seasons with at least 100 IP since 1970 (an average of 175 IP), the RMSD between WAR1 and WAR2 was .15.
I've used language in this article that assumes that the constant that scales to RA is the more correct choice, since the coefficients of FIP are based on the scale of runs rather than earned runs (this is also the method used in the original DIPS statistic), but I haven't gone through the full DIPS calculations to compare. For now, I think it's important just to look at how much difference there is between the two constants and whether it is enough to be worth switching scales. Since the formula would be virtually identical (the only difference would be the constant), I would prefer using the formula that scales to RA rather than ERA. If you prefer the ERA-scale, that adds a step of multiplying by .92 (technically lgERA/lgRA, which you might as well use since you have to calculate the constant anyway), but that's simple enough that I don't think it hurts simplicity or usability any. It's just standard fare for going from one scale to the other.
*The FIP I'm using here differs from FanGraphs' value. There are a few different formulae for FIP floating around; I am using BB-IBB+HBP for the BB term and different constants for the AL and NL, while FanGraphs uses BB+HBP for the BB term and a single constant for both leagues in each season. Also, while on the subject of FanGraphs, the issue raised in this article shouldn't affect their WAR values, because FG uses the constant that scales to RA for its win values.
Back-to-Back Inning-Ending Double Plays (with a catch)
Last night, I was watching Venezuela play Puerto Rico in the Caribbean Series on the MLB Network. Aside from watching former Cardinals pitcher Jason Simontacchi dazzle for a few innings, the most remarkable quirk of the night was that two straight half-innings ended on double plays to the outfield (the first of which--a trapped fly ball by the right fielder that caught the runners unsure whether the ball would be or had been caught, leading to a force-out at second followed by a tag-out at home--was particularly bizarre). Curious, I decided to see if this has ever happened in MLB.
As always, it's Retrosheet to the rescue. Turns out, it's happened twice this decade. The most recent was in the bottom of the third/top of the fourth in a 2007 game between Minnesota and Tampa Bay when Ty Wigginton and Jeff Cirillo each hit into fly-outs with the runner caught tagging at first. Oddly enough, that was also the night when Carlos Pena hit the catwalk in Tropicana Field in two straight ABs, in case you were wondering about that "Single to 2B (Pop Fly)" line for Pena in the bottom of the 10th that led to the winning run.
Before that, it happened in the 9th inning of a 2001 game between Bostons of past and present (in Atlanta, naturally) when both teams spent the last of their days' worth of PAs on such double plays.
But what about a game with back-to-back inning-ending outfield double plays, with one of them seeing both outs come on the infield? None of this fly-out/outfield assist crap. Anyone can do that (even Jeff Cirillo!). You have to go all the way back to 1956 to find a possible match for that. On July 6, Detroit got out of the bottom of the first with a typical, boring sac-fly-slash-9-3-6-double-play, but in the top of the second, the real magic happened. Maybe. Officially, Bill Tuttle grounded into a 9-4-3 double play. Yes, grounded into. Now, I'm a bit skeptical that this double play ever made it to the outfield, just because, well, a 9-4-3 ground out? A straight 9-3 ground out is rare enough without the detour through second. Hell, even just a 9-4 fielder's choice is damn near unheard of. Maybe there was an error in recording the data. Maybe there was a weird 5-infielder shift on, though that would make no sense for the defense to try with a 2-run lead in the top of the second and a runner on first. Or, maybe Bill Tuttle and Jack Phillips both fell down. Maybe they had money on the scoreboard cap-dance game and could not afford to let their eyes wander to that less important game going on below. After all, the money in those days wasn't so great that such a matter would have been trivial. Maybe nothing unusual happened, Tuttle crossed first safely and easily, and the ump just blew the call by 3 or 4 seconds. Whatever it was, that's the one entry in Retrosheet's PBP files that fits, at least officially, all the criteria for what happened last night.
Wait 'Til Next Year
It was nearly dawn. Election officials for Baseball's Hall of Fame were working feverishly to verify the results for that afternoon. Usually, they just casually glanced over the ballots and totaled them up, but not this year. This year, one Bert Blyleven was making things difficult. He fell a mere five votes shy of enshrinement. The higher-ups had ordered a recount, citing some obscure provision in the Hall charter about less than one percent, or some such thing. So there they were, in a small room in Cooperstown under the harsh light of a single bare bulb, meticulously poring through each ballot.
"It can't be," one of them let out at last, stopping with the stack of ballots before him. "Something's not right." He pushed the stack of papers across the table to one of his fellow officials. "Take a look at this."
"What's this?" the second official asked as he stared down at the pile of ballots clipped together with the word "FLORIDA" written across a yellow post-it on top. He began flipping through them. "Huh?" he said. "How many votes did Galarraga get this year?"
"Twenty-two."
"Twenty-two? No shit, huh?"
A third official chimed in, "Don't worry about Galarraga. He's nowhere near getting in, we've only got to double check on Blyleven."
"No, this is interesting. Twenty-two votes. Any of you guys of a mind to vote for Galarraga for the Hall of Fame?" No one was. "And yet, twenty-two votes."
"Who cares? A guy voted for David Segui, for Christ's sake. Two for Eric Karros! No one takes this shit seriously anyways. So twenty-two guys decided to play the same joke, or he scratched someone's back back in the day, or twenty-two writers are morons. Does it matter? What's this got to do with Blyleven?"
"That's the thing. Twenty-two votes. You know how many votes he got from Florida? Twenty-two votes." The others were interested now. "Look at this thing," he laid the ballots out on the table:
[Image: one of the Florida ballots]
They were baffled. None of them had seen a ballot like that before. Right there, Bert Blyleven's name, the second one down, with a line straight over it leading directly to the second circle down, the circle indicating a vote for...Andres Galarraga.
"Any of the other ballots like this?" one of them asked.
"Just the ones from Florida."
"No shit?"
"No shit."
And there it was. Twenty-two votes. Blyleven missed by five. And there wasn't a thing they could do about it but count the votes as cast and run the results out to be released to the public, with Blyleven woefully close, painfully short, his deciding votes butterflied away to Andres Galarraga.
"It can't be," one of them let out at last, stopping with the stack of ballots before him. "Something's not right." He pushed the stack of papers across the table to one of his fellow officials. "Take a look at this."
"What's this?" the second official asked as he stared down at the pile of ballots clipped together with the word "FLORIDA" written across a yellow post-it on top. He began flipping through them. "Huh?" he said. "How many votes did Galarraga get this year?"
"Twenty-two."
"Twenty-two? No shit, huh?"
A third official chimed in, "Don't worry about Galarraga. He's nowhere near getting in, we've only got to double check on Blyleven."
"No, this is interesting. Twenty-two votes. Any of you guys of a mind to vote for Galarraga for the Hall of Fame?" No one was. "And yet, twenty-two votes."
"Who cares? A guy voted for David Segui, for Christ's sake. Two for Eric Karros! No one takes this shit seriously anyways. So twenty-two guys decided to play the same joke, or he scratched someone's back back in the day, or twenty-two writers are morons. Does it matter? What's this got to do with Blyleven?"
"That's the thing. Twenty-two votes. You know how many votes he got from Florida? Twenty-two votes." The others were interested now. "Look at this thing," he laid the ballots out on the table:

They were baffled. None of them had seen a ballot like that before. Right there, Bert Blyleven's name, the second one down, with a line straight over it leading directly to the second circle down, the circle indicating a vote for...Andres Galarraga.
"Any of the other ballots like this?" one of them asked.
"Just the ones from Florida."
"No shit?"
"No shit."
And there it was. Twenty-two votes. Blyleven missed by five. And there wasn't a thing they could do about it but count the votes as cast and run the results out to be released to the public, with Blyleven woefully close, painfully short, his deciding votes butterflied away to Andres Galarraga. Continue Reading...
The Mystery of Andre Dawson
It's Hall of Fame Week (and by week, I mean for however long I feel like writing about HOF topics) here at 3-D Baseball, and the first order of business is to congratulate Andre Dawson. Mr. Dawson (should you be reading this), congratulations.
Now, to the rest of you. Dawson's election has stirred up quite a bit of debate. In fact, his is probably the most controversial HOF selection since that of Jim Rice. Unlike with Rice, I'm not going to criticize Dawson's selection, but I do want to highlight some of what makes his election so hotly debated and what that says about our process of deeming players worthy.
Is Dawson a HOFer? It's a simple enough question, but how do we answer? Do we:
A. Google his Baseball-Reference page, observe that he has been elected to the Hall of Fame, and conclude that he is therefore, by definition, a Hall of Famer; or
B. Try to come up with our own definition for what standards the Hall requires, and see if Dawson fits those standards.
If you answered A, skip the rest of this article and google "Andre Dawson". If you answered B, then we're on the same page. Let's continue, then, shall we?
Again, is Dawson a HOFer? Of course he is. He is one of only 3 players, along with Willie Mays and Barry Bonds, with 400 HR and 300 steals. He joins those 2, plus Ken Griffey, Jr. and Mike Schmidt, as the only players with 400 HR and 8 Gold Gloves. With that kind of company, how can Dawson not be a HOFer? His only peers with that kind of power, speed, and defense are among the greatest players the game has ever seen.
He won an MVP and finished second in the voting two other times. He won an MLB-record (which he shares with several other players) one Rookie of the Year and was an All Star 8 times (eat that, Bert Blyleven!). Baseball-Reference has his HOF Monitor, HOF Standards, and Gray Ink all at HOF levels.
As a hitter, his power struck fear into the hearts of pitchers; as a fielder, his arm gave base-runners nightmares. He was his era's premier 5-tool player. He put up great numbers in the last clean era of the game and only now looks worse because 'roided up players have artificially raised the bar, which shouldn't be held against Dawson's numbers. And hey, now that Jim Rice is in, how can you keep Dawson out?
OR...
In no way does Dawson belong. You thought Jim Rice didn't get on base? Dawson's .323 OBP is nearly 30 points worse than Rice's and 20 points worse than any other outfielder's in the Hall. He would be the only outfielder in the Hall of Fame with a below-average OBP, and never before has OBP been so heavily stressed in our understanding of player value as it is now. He took fewer walks than Jim Rice despite playing 5 more seasons, walking only five-hundred-something times while striking out over 1,500 times. His .279 average is the 4th worst of any HOF outfielder, and he only managed to rack up high counting totals by hanging around for 21 years; even then, he didn't hit any of the magic numbers like 3,000 hits or 500 home runs, which aren't even automatic anymore anyway.
He was never a spectacular player and only had one HOF caliber year. You don't enshrine players for being good but not great for a long time. Put him in that ever-popular "Hall of Very Good" with Dale Murphy (more MVPs than Dawson), et al, but not the Hall of Fame. He was never an obvious candidate, and if it took 9 years for the voters to decide he belonged, he probably doesn't.
Richie Allen has a higher AVG, OBP, and SLG, and he did it in a worse offensive era. So do Tim Salmon, Reggie Smith, Ted Kluszewski, Fred Lynn, Kevin Freakin' Mitchell, and about half of everyone who ever played at Coors.
Dawson wasn't close to the best player on this year's ballot. Blyleven, Larkin, Alomar, Trammell, Martinez, and McGwire are all better candidates. Heck, he wasn't even the best candidate in his own outfield; Tim Raines holds that distinction. How do you justify his selection over theirs?
Now, did you choose A, or did you choose B? Just as in the great literary tradition of choose-your-own-ending mysteries, the path you took for deciding whether or not Dawson belongs often depends largely on a decision you made completely before the fact. If you start with a position or a leaning and then go looking for the stats or evidence that reinforces that position, you end up with an incomplete picture and, ultimately, inconsistent results. You end up putting Jim Rice in and leaving out outfield-mates Fred Lynn and Dwight Evans. You set up Larkin and Alomar for eventual induction while Trammell toils in percentages in the low-20s and Lou Whitaker falls immediately off the ballot. You end up with Durdles killing Edwin Drood every once in a while.
With the mess of stats and opinions and awards results and subjective considerations to wade through, it is essential that we have some consistent, objective starting point. You decide on some system you go to for every player, every time, and decide what other factors you want to consider to make adjustments from there; otherwise, you run the risk of reaching biased and inconsistent conclusions and running in circles with everyone who chose B instead of A. Ideally, you want a system that includes both offensive and defensive considerations. I would recommend Sean Smith's WAR database as a starting point, but you can use whatever you are most comfortable with, as long as it is consistent and objective. If you use that, you might like to know that the borderline point (where a player is about a 50-50 shot to get in) is about 55 WAR (I got that figure from Tom Tango's posts on his blog; in full disclosure, I also basically stole this rant from Tango), but you can decide what your own standards are. From there, maybe you want to add consideration for peak value, or rate value, or playoff performance, or the effects of injuries or other factors that could affect performance, but make sure that you give every player you are considering the same courtesy. For example, if you give Dawson credit for losing a lot of his value to bad knees that took a constant pounding on the turf in Montreal, you should probably also note that Richie Allen could never throw quite right after he cut his hand on a headlight and had trouble with his shoulder after Frank Thomas used it for batting practice (thank goodness it wasn't the Frank Thomas who could hit), that Ron Santo played with diabetes, and that Tim Raines played through serious impairment at times.
So, no more choosing A or B. Is Andre Dawson a Hall of Famer? It's the same question as 'Is Alan Trammell a Hall of Famer?' or 'Is Bert Blyleven a Hall of Famer?' or 'Is Ray Lankford a Hall of Famer?', and they all lead down the same path, toward their various conclusions, not from their various conclusions.
On the Dominance of Barry Bonds
Yesterday, Joe Posnanski wrote a piece looking at the best players in baseball over each 5-year stretch since 1970 using Win Shares. Tom Tango noted on his blog that Win Shares gives a skewed perspective because its baseline (which is zero) is biased, and that Win Shares Above Bench would be better. Curious, I decided to see what rWAR had to say on the subject (which, coincidentally, happens to be just what Larry Granillo had done, albeit with a different process, at wezen-ball a day before Joe published his article).
I found something very interesting:
Most WAR over 5-year span:
From | To | Leader | WAR |
1988 | 1992 | Barry Bonds | 41.9 |
1989 | 1993 | Barry Bonds | 46.3 |
1990 | 1994 | Barry Bonds | 45.0 |
1991 | 1995 | Barry Bonds | 42.6 |
1992 | 1996 | Barry Bonds | 45.1 |
1993 | 1997 | Barry Bonds | 43.9 |
1994 | 1998 | Barry Bonds | 42.6 |
1995 | 1999 | Barry Bonds | 40.2 |
1996 | 2000 | Barry Bonds | 41.6 |
1997 | 2001 | Barry Bonds | 43.3 |
1998 | 2002 | Barry Bonds | 46.7 |
1999 | 2003 | Barry Bonds | 47.7 |
2000 | 2004 | Barry Bonds | 56.1 |
2001 | 2005 | Barry Bonds | 47.8 |
Bonds was the leader in WAR over the previous 5 seasons for 14 straight years. On top of that, he was second in baseball in WAR from 1987-1991 (to Roger Clemens) and from 2002-2006 (to Albert Pujols). It's easy to remember just how good he was in the early part of the 2000s (exemplified here; he managed to keep his 5-year WAR lead in 2005 despite going to the plate only 52 times that season), but it seems like he never gets the credit he deserves for absolutely dominating the '90s. At the time, everyone held up Griffey as the decade's dominant player. Frank Thomas was probably his biggest challenger in most people's minds. Mark McGwire was the favorite of some for a time later in the decade, and Alex Rodriguez had his fair share of proponents. No one seemed to notice that Bonds was better than them all even back then, and it wasn't even close.
As someone who likes to look over the history of the game as a hobby, I'm constantly amazed to find new things that demonstrate just how ridiculously good Bonds was long before anyone started noticing. This was one of those things. If it surprises you, I highly recommend looking at the things Barry did in the earlier part of his career. Look at his batting, his defense, his baserunning, everything. Compare him to the best of his generation and to the best of all time. If you were around back then, you just might find yourself asking, much like I have, what you could have been thinking not to notice.
I guess it's better late than never.