Baseball fans have always been infatuated with the strikeout when it comes to pitchers. This infatuation has only strengthened in the past couple decades as statistical work has found pitchers to have far less control over what happens to a ball once it's put in play than previously thought. Current baseball statistics such as DIPS and FIP ("Defense Independent" and "Fielding Independent" ERA substitutes) rely heavily on a pitcher's K-rate to determine his value. There's an awful lot of good and not really much bad that can come from a strikeout, unlike a ball put in play, be it a ground ball, line drive, or fly ball. Hard to imagine anyone would not love them, isn't it? There is, however, a pretty strong sentiment among those in the game and those who follow the game that strikeouts carry a significant downside that can even outweigh their benefit at times. Namely, these people are concerned about the effects of high strikeout rates on pitch counts.

The argument goes that strikeouts require more pitches than outs on balls in play. After all, your fielders can convert an out for you on only one pitch, while a strikeout demands a minimum of 3. This argument is most often extended anecdotally to young pitchers who have yet to "learn to pitch" and, because of their inexperience, ego, stupidity, or any combination thereof, take themselves out of games early by running up their pitch counts chasing strikeouts when they should be inviting the batter to hit the ball.

This seems to make a lot of sense. So too, however, did the general notion that good pitchers as a rule were better at inducing outs on balls in play, so perhaps we better check into this theory as well. As it turns out, the two theories are somewhat linked. The assumption that contact plate appearances are better for pitch counts than strikeouts is heavily dependent on the rate at which outs are converted on contact plate appearances. If you can get a hitter to put the ball in play on one pitch, that's great, but if it takes 3 balls in play to record 1 out, then you might as well have just struck the guy out.

Obviously, it does not usually take 3 balls in play to get 1 out, but neither does it usually take 1 pitch to induce a ball in play. The question is, can contact be induced in few enough pitches to offset the number of balls in play it takes to record an out? We are concerning ourselves with pitches per out because the issue with pitch counts is how deep into the game a pitcher can go, and we measure that quantity in innings, or, in other words, in outs. Therefore, the primary concern is how efficiently a pitcher uses his pitches to record outs. This question is central to the issue of how strikeouts and contact PAs affect pitch counts, yet it is largely ignored or taken for granted. The answer is yes, it is more efficient to get outs by inducing contact, but just barely.

Since 2000, pitchers have used .15 more pitches per out to get strikeouts than to get an out on a ball put in play. In recent years, the gap is even smaller (about .10 over the past few years). The reason the difference is so close is that the number of balls in play required to convert an out, on average, very nearly offsets the drop in pitches required to induce contact versus striking a batter out.

Over the span considered, the league BABIP is just under .300. That figure excludes home runs, however, because they are handled separately for most purposes. Since we are only concerned with the rate at which outs are converted once the ball hits the bat, we have no reason to exclude HR, and the rate at which hits are recorded once the ball is hit rises to .325. This figure also leaves out errors, which further reduce the probability that an out is converted on a ball once it is hit. Of course, we must also add in the possibility of the double play, since 2 outs on one pitch are more efficient than just 1. Taken all together, these factors are just enough that it doesn't make much difference to your pitch count whether you strike out more batters or get more outs on balls in play.
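The arithmetic behind this trade-off can be sketched in a few lines of Python. This is a minimal illustration, not the calculation used for the figures above: the function name and its parameters are hypothetical, only the .325 hit rate comes from the text, and the error and double-play rates would have to be supplied from real data.

```python
def pitches_per_out_on_contact(pitches_per_contact_pa, hit_rate,
                               error_rate=0.0, extra_dp_outs=0.0):
    """Average pitches spent per out when the batter puts the ball in play.

    pitches_per_contact_pa: average pitches in a PA that ends on contact
    hit_rate: hits (including HR) per batted ball, e.g. 0.325 from the text
    error_rate: batters reaching on an error per batted ball (assumed input)
    extra_dp_outs: additional outs per batted ball from double plays (assumed input)
    """
    outs_per_contact_pa = 1.0 - hit_rate - error_rate + extra_dp_outs
    if outs_per_contact_pa <= 0:
        raise ValueError("out rate on contact must be positive")
    return pitches_per_contact_pa / outs_per_contact_pa


# The extreme case from earlier: contact on the very first pitch, but only
# one ball in play out of three becomes an out. The cost comes to about
# 3 pitches per out, the same as a three-pitch strikeout.
print(pitches_per_out_on_contact(1.0, hit_rate=2 / 3))
```

The point of the sketch is that the cost of an out on contact is a quotient: fewer pitches per plate appearance in the numerator, but an out rate well below 1 in the denominator.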

To illustrate this point, the following graphs show every pitching season since 2000 with at least 100 IP measuring the pitcher's pitches thrown/9 IP against his K/9 rate and against his H/9 rate. Keep in mind that since most pitchers allow very close to the league average BABIP over significant samples, hit rates are largely dependent on the rate of balls in play a pitcher allows, so we're basically looking at how pitch counts relate to high contact pitchers and to low contact pitchers:

As the data suggest, no pattern emerges to show that either high-strikeout pitchers or high-contact pitchers require more pitches to get through 9 innings. For the most part, how frequently a Major League pitcher strikes out hitters or allows hits has no bearing on how many pitches he has to throw. The trade-offs of one compared to the other mostly cancel out.

What, then, is the primary determinant in a pitcher's pitch count? It's walks, of course. They are the one event that costs pitches with no chance of an out, and it shows. Look at the graph comparing pitchers' pitches/9 IP to their BB rates:

Now here is a clear pattern. Simply put, by far the strongest factor in how many pitches a pitcher needs to get through an inning is how frequently he walks batters. Almost every time you hear a broadcaster or analyst talk about how a pitcher throws too many pitches because he strikes out too many hitters, the real reason is that the pitcher walks too many hitters, not that he strikes out too many (unless, of course, the personality has not checked the facts and is commenting on a pitcher who doesn't even throw more pitches per inning than normal). That's why pitchers like Roy Halladay, CC Sabathia, Cole Hamels, Johan Santana, Josh Beckett, Ricky Nolasco, and even Tim Lincecum were all taking fewer pitches than average to complete each inning in 2008, while pitchers like Barry Zito, Tom Gorzelanny, and Miguel Batista were all taking more pitches than average. It has nothing to do with the strikeouts. It's the walks.

Of course, there is some relationship between walks and strikeouts, but not enough that you can look at a pitcher's K-rate and tell all that much about his walk rate. The correlation between a pitcher's walk rate and K-rate is actually very weak, less than .1, which means that knowing a pitcher's K-rate explains less than 1% of the variation in his BB-rate (the square of the correlation). An individual pitcher's K- and BB-rates will generally rise and fall together, but how those rates relate to each other varies widely from pitcher to pitcher. So while a pitcher with poor control may find it more efficient to cut down on his walks at the potential expense of some of his strikeouts, the strikeouts are really not the culprit, and a pitcher who already has good control has no reason to cut down on his strikeouts purely to reduce his pitch count, as he generally won't see any noticeable results.

### Splitting Hairs: Park Effects vs. Home/Away

One of the easily overlooked but essential factors in player evaluation is the consideration of park effects on player stats. Every park in the Majors affects stats differently, and to truly compare players on an even plane, we must account for this, especially for players in the most extreme parks. The problem, and likely one of the main reasons park effects can at times be overlooked, is that these effects are very difficult to get a good grasp on. We know Coors inflates hitting stats, but by how much, and which stats are most affected? These are difficult adjustments to make intuitively, so too often they are just skipped entirely, or thrown in as an afterthought when fans or mainstream analysts discuss numbers.

On the other hand, there are many who fall prey to another, potentially worse trap in evaluating park effects. They use a player's home/away splits to estimate the effects of the home park on a player's numbers and how his numbers would translate to another park.

There are many pitfalls to this approach. The most obvious is that inherent in home/away splits is a home-hitter bias independent of park. Since 2000, hitters have put up an OPS of .774 at home and .745 on the road. So you should expect to see a difference of about .030 points (.029, to be precise) on average even in a completely neutral park, and for some reason, this is rarely accounted for properly by fans who use this method. The assumption is all too often that a hitter's road line is what you'll get from him in a neutral park. In light of the clear home-hitter bias, this is obviously not the case.

Furthermore, there exists the possibility that a particular player will naturally exhibit a stronger home-away split for whatever reason. Some hitters may be more comfortable hitting at home than normal or more affected by traveling to road parks. An exaggerated home-away split does not isolate the home park as the primary factor, and it is common to see players on the same team post widely varying degrees of home-away splits, which should not be the case if these splits were heavily dependent on park effects.

There are also some indications that at least some parks, particularly those with the most extreme hitting environments, can actually suppress road production in addition to boosting home production. For example, pitches break differently at Coors because of the thinner air at altitude, so hitters who tune their swings to how pitches break in Coors will find the ball slightly more difficult to center in other parks than they would if they did not play half their games in Colorado (this phenomenon shows in Colorado's team line-drive rates, as well as in their opponents' rates in Coors).

The fact is, there are too many factors in play in home-road splits to truly estimate park effects. These other factors can be accounted for, but the reason home/away splits seem to be so popular is the ease with which they can be found and reported, and some of these adjustments are beyond the scope of what can be reasonably expected from most fans. Even if you do attempt them, they could take as much or more work than just using park factors to make your adjustments.

We have still not gotten to perhaps the biggest issue with using a player's home/away splits. Even assuming you have properly adjusted for the other issues with using a player's splits, you are invariably going to run into sample size issues. A full season's batting data is already pushing the lower limits of the number of at-bats you'd like to see. By going to a player's splits, you are more than cutting your sample in half. You will commonly be dealing with AB totals under 250. To get an idea of the uncertainty present in such small samples, a .300 hitter will hit either under .270 or over .330 about 30% of the time over 250 ABs. You simply cannot garner much meaningful information from a player's home/away splits without at least using several years' worth of data, even once you've accounted for everything else.
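The 30% figure can be checked with a short binomial calculation. The sketch below assumes every at-bat is an independent trial with a fixed .300 chance of a hit, which is a simplification of real hitting; the function name is my own.

```python
from math import comb


def prob_outside_band(true_avg, at_bats, low, high):
    """Exact binomial probability that the observed average is < low or > high."""
    total = 0.0
    for hits in range(at_bats + 1):
        observed = hits / at_bats
        if observed < low or observed > high:
            # Probability of exactly this many hits in at_bats trials
            total += (comb(at_bats, hits)
                      * true_avg ** hits
                      * (1 - true_avg) ** (at_bats - hits))
    return total


# A true .300 hitter over 250 ABs: chance of finishing under .270 or over .330
p = prob_outside_band(0.300, 250, 0.270, 0.330)
print(f"{p:.2f}")
```

Under these assumptions the result lands close to the 30% quoted above. Rerunning it with a full season's worth of at-bats shows how much the band tightens as the sample grows.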

Take, for instance, Brad Hawpe's 2006 season, when he had over 270 PAs in both his home and away samples. His OPS was .144 points higher on the road, away from Coors. No one would have argued that he would hit .144 points better in OPS if he were traded to a neutral park. Or, consider the famously consistent Albert Pujols. In 2007, he hit .211 OPS points higher on the road. In 2008, he hit .119 points better at home. It's obvious that these data points are not reliable indicators of park effects, but fans and analysts routinely take equally unreliable single-season splits and use them to project severely overblown adjustments to players.

Of course, I can cherry pick outlier seasons to make my point just like any other fan does to make his, so if the above paragraph contained a bit too much anecdotal evidence for your liking, I'm with you. Don't worry. I've got more. Namely, I've got every set of paired seasons since 2000 where a player had at least 200 ABs both home and away for the same team two years in a row (there are 883 of them). Using this data, we can compare a player's splits one year to his splits the next and see how reliable they are. The following graph plots each player's splits one year on the X-axis and his splits the following year on the Y-axis. If the splits are reliable, we should see a definite line forming going from the lower left to the upper right.

As you can see, there is no such pattern here. The data is pretty weakly correlated (r=.21), indicating that a player's home/away splits one year are not a very good indicator of what they will be the next year. Home/away splits are simply too volatile to draw any reliable conclusions from for single players.

If we break down our sample to players who hit at least .100 OPS points higher at home, 41% of these players showed splits of .030 points or lower the following season. Of players who hit at least .150 points higher at home, 38% showed a split of .029 points or lower the next year. This, as we saw earlier, is the average home/away split in Major League Baseball over this time period. If we include all players who returned to within a standard deviation of the average, the percentages rise to 45% and 40%. Of the players who showed splits greater than .150 points, 63% cut that split by at least half the following year. Players who show very high splits one year are still somewhat more likely to post higher-than-average splits for the same team the following year, but not by nearly enough to tell us anything significant.
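One way to see why extreme splits shrink so much is to regress them toward the league average using the year-to-year correlation. The sketch below is an illustration, not the method behind the percentages above: it uses r = .21 and the .029 average split from the text, and it assumes a linear relationship with roughly the same spread of splits in both seasons. The function name is hypothetical.

```python
def expected_next_split(observed_split, league_avg_split=0.029, r=0.21):
    """Regress an observed home-minus-road OPS split toward the league average.

    Assumes a linear year-to-year relationship with equal spread in both
    seasons, so the expected deviation next year is r times this year's.
    """
    return league_avg_split + r * (observed_split - league_avg_split)


# A player who hit .150 OPS points better at home this year
print(round(expected_next_split(0.150), 3))  # 0.054
```

Under these assumptions, a .150 split projects to a little over .050 the next season, which fits the finding that most such players cut their split by at least half.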

When it comes to adjusting for park effects, player home/away splits just don't cut it as substitutes for park factors. They present a certain allure - they're easy to look up and easy to understand, and they sound convincing in their own pseudo-logical way - and for that they've become quite popular. To get a reliable grasp on park effects on a player's numbers, however, you need to be able to look beyond such crude methods. Ideally, find some park factors, or look up park-adjusted stats. In a lot of cases, you would be better off just not adjusting at all than taking the splits at face value. There are times when home/away splits may be appropriate, but make sure you have a good reason for using them over park factors or park-adjusted stats, make sure you are making the proper adjustments, and make sure you are looking at a large enough sample using several years, or even aggregates of several players if need be. Of course, when you do it right, home/away splits start to resemble park factors more and more. They essentially become a method of calculating park factors. But now we're just splitting hairs.

### King of the ʞastle

Matthew Carruth made a post at Fangraphs recently regarding the correlation between swinging strikes a pitcher gets on his fastball and his K-rate and that between called strikes on the fastball and K-rate. Not surprisingly, it's the pitchers who can make hitters miss who tend to rack up the strikeouts. But what about those pitchers who like to live on the corners? The control artists who somehow manage to rack up strikeouts while, to the untrained eye, throwing what amount to BP fastballs? Surely these exceptions exist. We've all seen them, been baffled by them, been tricked into thinking we could hit Major League pitching by them. They have to exist, don't they?

Of course they exist. And there's one who stands above them all.

To find the pitchers who most excel at the called strikeout, we turn to Retrosheet's pitch sequence data to categorize each strikeout. Starting in 1988, we have most of this data available: from 1988 to 1999, 4.5% of strikeouts are left uncategorized, and the data from 2000 on is complete. Over this time, 70.4% of categorized strikeouts came on swings and misses, much like Mr. Carruth would have expected. 26.9% came on called strikes, and 2.1% came on caught foul tips. A handful came on other events, such as missed or fouled bunt attempts or swing attempts on pitchouts to protect a hit and run. This gives us an average ratio of called strikeouts to swinging strikeouts of .382.
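The categorization step can be sketched as follows. This is a minimal sketch, not the code behind the numbers above: the pitch codes reflect my understanding of Retrosheet's pitch-sequence notation and should be checked against its event file documentation, and the function names are hypothetical.

```python
# Final-pitch codes as I understand Retrosheet's notation (an assumption;
# verify against the event file documentation before relying on it).
THIRD_STRIKE_TYPES = {
    "C": "called",
    "S": "swinging",
    "T": "foul tip",
    "M": "other",          # missed bunt attempt
    "L": "other",          # foul bunt
    "O": "other",          # foul tip on bunt
    "Q": "other",          # swinging on pitchout
    "K": "uncategorized",  # strike of unknown type
}


def classify_strikeout(pitch_sequence):
    """Classify a strikeout by the last pitch letter in its sequence string."""
    for ch in reversed(pitch_sequence):
        if ch.isalpha():
            # Skip trailing markers and pickoff digits; judge the last pitch
            return THIRD_STRIKE_TYPES.get(ch, "uncategorized")
    return "uncategorized"  # empty or missing sequence


def called_to_swinging_ratio(called, swinging):
    """Called strikeouts per swinging strikeout."""
    return called / swinging


print(classify_strikeout("CBFS"))                        # swinging
print(classify_strikeout("BCSC"))                        # called
print(round(called_to_swinging_ratio(26.9, 70.4), 3))    # 0.382
```

The last line reproduces the league-wide ratio from the percentages given above; the same function applied to a single pitcher's called and swinging totals gives his individual ratio.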

It should come as no surprise that the pitcher with the most called strikeouts since 1988, in our partial sample, is Randy Johnson with 1231. He has, after all, struck out more hitters than anyone in the game over that period, and it's not even close. Roger Clemens is in a similar position, with vastly more strikeouts than anyone below him on this list; he comes in at #3 with 1202. The guy between them, with a full 1539 Ks fewer than the Unit and 728 Ks fewer than the Rocket, is Greg Maddux, with 1227 called strikeouts. Keep in mind that the 4-strikeout difference between Johnson and Maddux is nothing compared to the 4.5% of strikeouts from the late '80s and '90s that are missing from these calculations. If we look at the number of uncategorized strikeouts each had and their called-strikeout rates, there's a 78% chance Maddux actually has at least as many called strikeouts as Johnson since 1988, and a 73% chance he has more. So we can say with reasonable confidence that Greg Maddux has more called strikeouts than any pitcher since 1988.

After these three above 1200, only two pitchers had more than 800 called strikeouts: Mussina at 902 and Glavine at 810. By sheer volume and ratio combined, Maddux is the clear king of the called strikeout, at least since 1988. The only pitchers who can touch his total are far behind him in called:swinging ratio, and the only pitchers who are in a league with Maddux in C:S ratio are nowhere near his total.

By C:S ratio alone, however, there are a handful of pitchers better than Maddux at keeping bats on the shoulders of hitters who could otherwise likely have hit the ball. Maddux' ratio of .656 is still the 7th best among pitchers with at least 1000 strikeouts, but the top spot this time belongs to John Burkett. Burkett pitched for 5 teams in 14 full seasons and racked up 1766 strikeouts as a 2-time All Star. In our sample, Burkett struck out .838 batters looking for every 1 he struck out swinging, a full .100 better than the next pitcher on the list (Mark Gardner, with a ratio of .737 in 1256 Ks). From Gardner, there's another big drop to Mike Morgan at .676 in 1090 Ks, and then he, Rick Helling, Bartolo Colon, and Esteban Loaiza are pretty tightly packed in Maddux' range. Like I said, no one of Maddux' notability here. Burkett's 758 called strikeouts are actually 6th, right after Glavine, in our sample, but no one else here has even half of Maddux' called strikeout total.

If we set the cutoff at 2000 Ks to focus on the true strikeout pitchers, Maddux is the clear winner again. The closest to his .656 ratio is Mussina at .524. Glavine, at .507, is the only other pitcher here with at least half as many called strikeouts as swinging strikeouts. For comparison, Clemens' ratio is .458 and Johnson's is .363. Clearly, this generation hasn't seen another strikeout artist in the mold of Greg Maddux.

While Greg could certainly miss his fair share of bats (he's still 6th since 1988 in swinging strikeouts), there is no one else in recent memory who could fool hitters in the zone like he could. With his devastating late movement and complete command of every pitch he threw, hitters were damned if they did and damned if they didn't with Maddux on the mound. With his ability to make hitters walk back to the dugout without so much as a wave at the ball, he's raised more money for kangaroo courts across the NL than anyone in the history of the game, a record as untouchable as Cy Young's 749 complete games, inflation be damned. In short, he's the best there ever was. Since 1988.
