3-D Baseball: Did Wainwright and Carpenter Split the Vote? and other Cy Young Stories

Congrats to Tim Lincecum, who was awarded his second Cy Young in as many seasons yesterday. His selection is intriguing for a number of reasons, in particular the apparent shift it signals in the BBWAA. For the first time ever in a non-shortened season, the voters have dipped their minimum win threshold to as low as 15 to reward a starter for simply out-pitching everyone else in the league. And to top it off, this came a day after they resoundingly favoured Zack Greinke's superiority in spite of his only 16 wins.

This shift comes in part, I'd imagine, because of the shifting electorate. Keith Law and Will Carroll, both internet-based writers, voted for the first time. Theirs were the only two ballots that didn't have the same three names as everyone else's. However, Carroll still voted Wainwright #1, and that's just 2 guys. There's more to it. Maybe there are other new voters in the newspaper ranks too, but I suspect some of the older guard is changing as well. There have always been writers who have been more open to more detailed and less haphazard analysis, but with the advent of FanGraphs and other stat sites, good numbers are easier to come by than ever before, all gathered in one hugely popular place that most voters have probably heard of and even visited. Now those voters are better armed than ever before. There are also those who look at ever-shrinking win totals and the endangerment of the 20-game winner and ask if starting pitchers still have the same ability to influence wins and losses in this age of extensive bullpens and high-powered, deficit-erasing offenses. Those voters have begun to question if 15-7 is really the best measure any longer.

I'd love to write about all that, because it's really the most interesting thing about this vote to me. I'd love to, but, unfortunately, I won't. Two paragraphs are all I can spare right now because, as interesting as that is, everyone I encounter is so damn fixated on painting controversy all over the award.

Maybe I hang around too many Cardinal fans (being one myself). Maybe I shouldn't go near message boards this time of year. Whatever it is, I have heard and read way too much from people who are flat-out angry (example of actual message board topic: "Simple Poll: If you could punch Keith Law in the face, would you?" - I'd link it, but it seems to have been deleted). On the one hand, many people are now concluding, as many in the analytical world did long ago, that these awards are meaningless. On the other, rather than just ignoring them, they are yelling and sending hate mail to voters and pointing others to outlets where they can do the same. So how should you, as a reasonable person, address these concerns?

For a lot of people, you shouldn't even bother. Just let them be angry and don't worry about throwing yourself in their path. Some people really do want explanations though, and don't understand how it worked out this way, or really think some unfortunate circumstances robbed their guy of a rightful award. The first, and probably most basic, thing I want to address is the idea that the two St. Louis candidates split the vote, essentially giving Lincecum the award by default. This idea was floated around before the winner was announced and has been repeated a lot since (it was even discussed on MLB.com's live award show, which, by the way, had to be the most anti-climactic announcement I've ever heard: long corny segment with Captain Morgan mascot, cut to anchor desk, small, forced laugh at Captain Morgan mascot, and, out of nowhere, with no build-up, simply stated as if it were casual discourse, "Tim Lincecum wins the Cy Young." Fine).

This concept has no basis in reality, however. I've yet to see any evidence that voters vote regionally as the theorists claim, but even if they did, this would be impossible. Imagine, for a moment, that voting were entirely provincial, and voters kept their allegiance within their own division and voted purely for their guy. What would happen? The NL West has 10 votes, so they all go to Lincecum. Let's assume the worst-case scenario for the St. Louis two and split the 12 NL Central votes evenly between them. Of course, the 12 Central voters also put the one they didn't vote for first in 2nd. Assuming the same three are all on each ballot, and the West voters are equally split between Carp and Wainwright for 2nd and 3rd, that leaves us with the following scenario:

Pitcher	First	Second	Third	Total points
Lincecum	10	0	12	62
Carpenter	6	11	5	68
Wainwright	6	11	5	68

This is the most favourable outcome for Lincecum that a purely regional split can produce, with the other two dividing everything as evenly as possible. Assuming the East splits its votes between all 3, there is no way for Lincecum to win a vote where all three are considered equal and the deciding factor is regional splitting of votes, because the Central division has more voters. Whichever of Carp and Wainwright can draw the extra first or second place vote from the East would win.

The three-deep ballot makes such a splitting of votes impossible. If voters are really split between Wainwright and Carpenter because of their regional proximity, then those voters simply put the other guy second and push Lincecum to third, and he makes up no ground by the other two splitting the first place votes.
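
The arithmetic behind that table can be checked with a short sketch. The vote counts are copied straight from the hypothetical scenario above (West and Central ballots only); the 5-3-1 point weights are an assumption read off the table, since they are the weights that reproduce its totals.

```python
# Points per ballot slot. ASSUMPTION: 5-3-1, inferred from the table's totals
# (10*5 + 12*1 = 62 and 6*5 + 11*3 + 5*1 = 68).
POINTS = {"first": 5, "second": 3, "third": 1}

# Hypothetical purely regional vote (West + Central only), copied from the table.
SCENARIO = {
    "Lincecum":   {"first": 10, "second": 0,  "third": 12},
    "Carpenter":  {"first": 6,  "second": 11, "third": 5},
    "Wainwright": {"first": 6,  "second": 11, "third": 5},
}


def total_points(slots):
    """Weighted point total for one pitcher's first/second/third counts."""
    return sum(POINTS[slot] * count for slot, count in slots.items())


totals = {name: total_points(slots) for name, slots in SCENARIO.items()}
first_place = {name: slots["first"] for name, slots in SCENARIO.items()}

for name in SCENARIO:
    print(f"{name:<11} firsts={first_place[name]:>2}  points={totals[name]}")
```

In this scenario Lincecum has the most first place votes and still trails both Cardinals on points, because every Central voter who "split" simply listed the other St. Louis pitcher second.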

Of course, we also know that voters don't vote purely regionally. Without comprehensive data on who voted what, it's hard to say exactly how much regionalism appears to affect voting, but I doubt it's at all a significant factor, and anyway, even if it were, it wouldn't matter. The current ballot makes the issue of splitting votes moot.

There's also the issue of fans who feel that the ballots should not go 3 deep, and that the most first place votes should win. The basic idea of going 3 deep is actually to prevent splitting the vote (so don't let anyone try to use both arguments on you; if they are concerned about splitting of the vote, then extending the ballot past 1 name is essential to deal with that concern). This year, there was likely some splitting of the first place votes along many lines: not regional, but based on evaluation methods. Lincecum and Carpenter split the votes of those who looked past wins. Wainwright and Lincecum split the votes of those who considered IP an important factor. Etc. Lincecum and Carpenter were superior to Wainwright in a lot of important stats, but similar to each other rate-wise, so they split a lot of votes between them. More voters put Wainwright first than put anyone else first, but more voters also felt that Lincecum was better than Wainwright than felt Wainwright was better than Lincecum. If you asked the voters, instead of picking one, to choose between those two, Lincecum would win, so is going only by first place votes a better representation of the sentiment of the voters? Should the winner be the guy who got 1 additional first place vote when it's also true that an additional 6 voters felt that guy was the worst of the top 3 candidates?
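
The gap between "most first place votes" and "preferred head-to-head" is easy to see in a toy example. The ballots below are made up purely for illustration and are not the actual 2009 ballots; the names are just labels for three candidates, and the helper functions are hypothetical.

```python
from collections import Counter

# MADE-UP ballots for illustration only (NOT the real 2009 voting).
# Each entry is (ranking from 1st to 3rd, number of voters casting it).
BALLOTS = [
    (("Wainwright", "Lincecum", "Carpenter"), 4),
    (("Lincecum", "Carpenter", "Wainwright"), 3),
    (("Carpenter", "Lincecum", "Wainwright"), 2),
]


def first_place_counts(ballots):
    """Count how many voters listed each candidate first."""
    counts = Counter()
    for ranking, n in ballots:
        counts[ranking[0]] += n
    return counts


def prefers(ballots, a, b):
    """Number of voters who ranked candidate a ahead of candidate b."""
    return sum(n for ranking, n in ballots if ranking.index(a) < ranking.index(b))


firsts = first_place_counts(BALLOTS)
plurality_leader = max(firsts, key=firsts.get)
l_over_w = prefers(BALLOTS, "Lincecum", "Wainwright")
w_over_l = prefers(BALLOTS, "Wainwright", "Lincecum")

print("Most first place votes:", plurality_leader)
print("Lincecum ahead of Wainwright on", l_over_w, "ballots; the reverse on", w_over_l)
```

With these invented ballots, one candidate leads in first place votes while a majority of voters rank another candidate ahead of him, which is the situation a first-place-only count cannot see.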

In practice, it is rare that the most first place votes don't win. The only time it happens is when there is a group of candidates who a lot of voters feel are better than the one who gets the most first place votes, but who split the first place votes among themselves. The last time it happened, Ivan Rodriguez won the 1999 AL MVP while Pedro Martinez carried more first place votes. The voters who considered pitchers more heavily had only one candidate, so Martinez carried all the votes of a minority. The majority of voters considered Pedro's season short of several non-pitchers, but their first place votes were split between multiple candidates. The only other time it happened with the Cy Young, Tom Glavine beat out Trevor Hoffman in 1998 despite Hoffman having 2 more first place votes. Hoffman carried every vote from the voters who weighed relief efforts relatively more than the other voters, whereas those who felt Hoffman's contributions in limited innings fell short of a plethora of other candidates split their first place votes between several starters. As a whole, the voters felt Glavine was better than Hoffman, just as they would feel about Pudge over Pedro in 1999.

There is also the issue of one-name-only voting discouraging voters from putting down the name they truly feel is deserving. Would all of the voters who felt Kevin Brown was the best pitcher in the NL in 1998 have written his name down if they also thought Glavine was easily better than Hoffman, but that Hoffman would win the award over both Glavine and Brown if they didn't put their support into the guy they thought could win the award? Having only one name per ballot asks voters to choose between voting based on who they think will give the award to the most deserving candidate and voting for who they think is the best pitcher. This also means that it's impossible to know for sure who would have had the most first place votes under a one-name-ballot system. Would any voters who went Carpenter-Lincecum have looked at Carp's innings, thought, no way he gets more votes than Lincecum, and then put down Lincecum's name as the most deserving candidate between him and Wainwright? No way to know, but it's feasible, so it's impossible to consider Wainwright a guaranteed winner under that system anyway. You can't collect the votes under one set of rules and then change the rules to decide what the votes really said. Of course, this is seldom if ever raised as an issue except by someone complaining that it robbed his/her favoured candidate. People seem to be searching for arguments against the result by attacking the method rather than arguing against the method itself.

What really seems to set people off is the matter of Vazquez and Haren being on ballots at all. People immediately began deploring their least favourite voters, thinking it could only have been that idiot who did this. THT noted that Jon Heyman blasted "dumb sportswriters" on his Twitter for their omission of Carpenter, only to later back off and apologize, as commenter Zach Sanders wrote, once he found out that one of them was Will Carroll. Even other BBWAA voters, even ones who consider themselves statistically versed (including one who cited day/night splits as a reason for voting Carpenter over Lincecum), have spoken out against their colleagues here and implied that they don't belong in the process.

I don't really know what to say to someone so adamantly opposed to differing opinions, except to show them how it is reasonable to think Dan Haren and Javier Vazquez were in a class with the other top pitchers in the league (besides Lincecum, anyway). I privately wrote up my picks with some explanation nearly a month ago, and to be honest, I'm shocked at the uproar against anyone for putting one of those two in the top 3. I rated them as much closer to Wainwright/Carpenter than either of those guys was to Lincecum, so outrage over 2 votes for Haren or Vazquez over Carpenter or Wainwright, alongside nothing but total apathy toward Wainwright getting 12 first place votes, baffles me. It really does. I don't have the most faith in fans or writers when it comes to analyzing players, but I just don't understand the venom. Here's what I wrote last month:

NL Cy Young:

1. Tim Lincecum
2. Chris Carpenter
3. Javier Vazquez

Despite what all the pundits are saying about this being as tight as a race could possibly be (and maybe they're right as far as the actual voting goes, I don't know), the NL race also has a clear winner to me, although not as clear as in the AL, and then a pack bunched behind him. Carpenter may have made a run at Lincecum if he hadn't gotten hurt, but with a difference of over 30 IP between them, Lincecum pulls well ahead.

I'm sure you want to know how I can put Vazquez ahead of Wainwright. There were 4 pitchers I considered behind Lincecum: Carpenter, Wainwright, Vazquez, and Haren. For all of them (and Lincecum), I looked at their pitching from several perspectives of run prevention, including both traditional perspectives (ERA and RA) and defense-independent perspectives (FIP from FanGraphs, xFIP from THT, and tRA and tRA* from StatCorner). For each perspective, I considered pure rate production (an expected winning percentage unweighted by IP) as well as total production (W% converted to wins based on IP, both above average and above replacement). ERA, RA, and FIP were also park-adjusted using Baseball-Reference's multi-year pitcher park factor for each player's home park (the others already have adjustments that neutralize park to a large extent). I also looked at WPA and WPA/LI alongside the wins above average figures.

Overall, the things that stuck out were:

-Lincecum was clearly ahead of the pack whether I looked at rate production, wins above average, or wins above replacement.

-Carpenter's rate production was closest to Lincecum's. He was dead even rate-wise with Vazquez in just the defense-independent stats, but he pulled away in the traditional stats. However, Carpenter's win value production, especially above replacement (IP have more weight above replacement than above average), fell back to the pack. He was about even with Wainwright in WAR production, just behind Vazquez and Haren, and either slightly ahead of or slightly behind Vazquez for second in WAA depending on how I averaged the different perspectives.

-Haren picked up the most ground from the park adjustment. Pitching in Chase Field probably hurt him quite a bit in most people's eyes because his raw stats aren't as good as he actually pitched.

-Wainwright, and this is the kicker, was consistently at the back of the pack no matter what I looked at. Rate production, WAR, and WAA (both with and without the WPA stats) all rated his production at the bottom of the group. He was really good this year, but if you look at how well he pitched and not just at his traditional numbers, he wasn't quite as good as the best pitchers in the league. When everything I was using to evaluate each pitcher was pointing to Vazquez having pitched better than Wainwright, I just couldn't write him in ahead of Vazquez.

And that's all I can really say to people who treat a selection of Vazquez or Haren as proof of idiocy. When I look at each pitcher's production, I just don't see it that way. It's certainly close enough that it shouldn't incite this kind of reaction. Also, looking back on my work now (not what I wrote above, but where I worked it out), I may have given too much consideration to rate production. I'm still undecided on how to handle "best" in terms of rate (regressed to some extent for those with lower playing time) vs. pure value. Point being, I can easily see Vazquez ahead of Carpenter for second. In fact, if you are most concerned with WAR, I can see any of them ahead of Carpenter. But uproar over Vazquez or Haren over Carpenter and apathy toward Wainwright ahead of everybody? I can't see it. If you are willing to concede that methods can legitimately differ enough to produce any order of Lincecum, Carpenter, and Wainwright, how can a method that puts Vazquez or Haren in the top three be so far off as to be grounds for expulsion from the process?

2 comments:

Unknown said...: You'll be glad to know that Viva El Birdos was quite rational about the whole Cy Young thing, and most of us supported Lincecum.

http://www.vivaelbirdos.com/2009/11/20/1166269/joe-jackson-talks-briefly-to-a
http://www.vivaelbirdos.com/2009/11/22/1167610/the-cy-young-stats

November 24, 2009 at 11:26 AM

Kincaid said...: Indeed, VEB can be counted on for a rational approach.

I like those two articles. Good, concise explanations of the pros and cons of the major valuation tools. I never saw the twist coming in the Shoeless Joe story. Perhaps a revised cut of Eight Men Out is in order.

November 24, 2009 at 8:34 PM
