3-D Baseball: A Note About Park Factors

The year was 1996. Baseball was riding high again after the ugly strike that had brought the game screeching to a halt just a short year and a half earlier. Cal Ripken had broken Lou Gehrig's long-standing record the year before. Mark McGwire was just starting to belt home runs like no one had ever seen (he missed 32 games that year and still managed to top 50 for the first time in his career), and no one had yet figured out why. Most importantly for a 10-year-old Cardinal fan, an aging Wizard was making his last rounds before bowing out of the game, and he was doing it for a playoff contender at that. This was the setting in which I hit the road with 3 of my closest relatives in the brotherhood of baseball, 2 generations of biological brothers on the road from St. Louis to Cincinnati, and then onward throughout the northeastern quadrant of the country wherever there was baseball.

We saw the Monster of Fenway, the obtuse angles of the outfield wall, the Pesky pole. We saw the ivy of Wrigley, the bleachers, the crowds on Waveland Avenue. The canopies of old Tiger Stadium. The friezes in Yankee Stadium, the grandstands of Doubleday Field. The visible seams in the turf at the SkyDome. We mercifully skipped over Shea. My brother, a year older but just as wide-eyed, and I learned why they were called cathedrals. My dad and uncle saw for the first time their temples of Mecca jump out from staticky TV sets into tangible form beneath their feet.

Young as I was, I learned something about the field of play that sets baseball apart. Inside the 90-foot square of the diamond, out to 127 feet, 3⅜ inches from home plate, everything is identical. It's meticulously laid out in the rule book with no room for error. Even the baseball itself is more loosely regulated than the diamond within these dimensions. But beyond that, you can do whatever the hell you want. Take my first Little League field, which had no outfield fence, just some tricky playground equipment in play in left field. Anything that rolled into the woods in right, or across the street past the playground and into the neighbor's farm, was assumed to be a home run simply because no one wanted to chase a ball that far. Seeing in person the lots these players roamed as we traveled across the country awakened me to a greater truth of baseball: we weren't just too poor for a real field. We were partaking in one of baseball's grand traditions.

All this is to say that baseball exhibits a much wider variety of playing contexts than other mainstream sports in this country. To a statistician like myself, this can present some problems. A hitter's numbers at Coors don't mean the same thing as identical numbers put up at Petco. For this reason, park factors are one of the most important adjustments we have to make to raw statistics, to the point that any advanced metric that wishes to be taken seriously must include them.

Fortunately, there are places that keep track of basic park factors, and these can be very useful as long as you know how to use them. For example, you may want to know how Matt Holliday's success in Coors would translate into Oakland's park. Or how Adrian Gonzalez would hit had he not screwed up his karma and found himself banished to the caverns of Petco. The problem is, it is not always entirely clear how to use these factors to adjust stats.

One place to find simple park factors is on ESPN.com. They list the park factors for a number of basic statistics along with the formula they use:

PF = ((homeRS + homeRA)/(homeG)) / ((roadRS + roadRA)/(roadG))

This is the basic idea behind all park factors: compare what happens in a team's park to what happens in every other park. You can use this formula (replacing runs scored with whatever stat you want to look at, of course) to calculate your own park factor for any statistic you would like.
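As a rough sketch, the ESPN-style formula above translates directly into a small Python function. The function and argument names are my own, and the team totals in the usage line are made-up numbers chosen only to show the arithmetic.

```python
def simple_park_factor(home_for, home_against, home_games,
                       road_for, road_against, road_games):
    """ESPN-style park factor: home per-game rate over road per-game rate.

    'for' and 'against' are the team's and its opponents' totals of
    whatever stat is being measured (runs, home runs, ...).
    """
    home_rate = (home_for + home_against) / home_games
    road_rate = (road_for + road_against) / road_games
    return home_rate / road_rate


# Hypothetical team: 820 total runs in 81 home games, 732 in 81 road games.
pf = simple_park_factor(420, 400, 81, 370, 362, 81)
print(round(pf, 2))  # 1.12
```

Swapping in home runs, walks, or any other counting stat for runs gives the corresponding factor.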

So how do you use these park factors? Say you want to know what a factor of 1.12 means. Basically, it means that the stat you are looking at is increased by a factor of 1.12 in that park compared to other parks in the league. So if a player hits 40 home runs in a park with a home run factor of 1.12, you would estimate that he got a 12% boost in the home runs he hit at home, and that in a neutral park he probably would have hit fewer. To account for an increase of 12%, you would divide by 1.12.

However, you also have to keep in mind that this type of park factor only tells you about the boost a player gets at home. He also plays half his games on the road, which we assume is a roughly neutral environment on the whole. So if a hitter plays half his games in a 1.12 environment and the other half in a 1.00 (perfectly neutral) environment, we want to average the two to find the effect on his overall numbers: (1.12 + 1.00) / 2 = 1.06. In other words, the hitter played in an overall environment that increased his totals by 6 percent rather than the 12 percent shown in the park factor.

So with ESPN's park factors, or with any you might calculate using the same formula, you have to be careful in applying them to a player's season numbers because the PF only accounts for what he does in his home games. You have to average this with a neutral environment to account for the fact that he plays half his games on the road.
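A minimal sketch of that averaging step, assuming exactly half of a player's games are at home and the road environment is a neutral 1.00. The helper names are hypothetical; the 40-homer season is the example from above.

```python
def blend_with_road(home_pf, road_env=1.00):
    """Season-level factor for a player who plays half his games at home."""
    return (home_pf + road_env) / 2


def neutralize(stat_total, factor):
    """Divide a raw total by a context factor to estimate a neutral-park total."""
    return stat_total / factor


season_factor = blend_with_road(1.12)       # 1.06, not 1.12
neutral_hr = neutralize(40, season_factor)  # season total in a neutral context
print(round(season_factor, 2), round(neutral_hr, 1))  # 1.06 37.7
```

Dividing the full 40 by 1.12 instead would overcorrect, because it treats his road games as if they were played in the home park too.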

Another site that publishes park factors, and one of the best sites for baseball statistics in general, is Baseball-Reference. On every team page, there are a few different park factors listed. Unlike ESPN, they only list run-scoring park factors, but also unlike ESPN, they adjust for a number of problems with the simple formula shown above.

As you will quickly notice, there are both one-year and multi-year park factors. Park factors are estimated from observational data, which is prone to random fluctuation. That isn't really a problem; in fact, it's more or less what all statistics are. But, as with all statistics, more data improves accuracy, and by looking at 3 years of data instead of 1, you get a more reliable park factor. B-R does this and ESPN doesn't. So that's one difference.

Another improvement B-R makes is that it looks at runs per 27 outs rather than runs per game. If a team plays a number of extra-inning games on the road, or has a number of home games where it doesn't bat in the bottom of the 9th, the number of innings played at home and on the road can differ enough to throw off the park factor by a small amount. Using outs (or innings) in the denominator instead of games accounts for this: an out or an inning is a fixed unit of length in baseball. A game is not.

A third improvement takes into account that the road environment is not always neutral. 1.00 is a neutral park based on the league average. However, if an NL park has a factor of 1.12, then the other 15 parks will average a bit less than 1.00, because adding in the 1.12 from the 16th park is what brings the overall average up to 1.00. Instead of averaging 1.12 with a road environment of 1.00, you might have to average it with a road environment of .99. It's a small difference, but for extreme parks, it can matter.
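Here is a simplified sketch of those improvements: pooling several seasons, measuring per 27 outs instead of per game, and estimating the road environment from the other parks. This is not B-R's actual formula. It assumes a 16-team league whose park factors average exactly 1.00, and a team whose road games are spread evenly over the other 15 parks. The function names are hypothetical.

```python
def pooled_pf_per_27_outs(seasons):
    """Pool several seasons, then compare home and road runs per 27 outs.

    Each season is (home_runs, home_outs, road_runs, road_outs), where runs
    and outs count both teams in those games.
    """
    home_runs = sum(s[0] for s in seasons)
    home_outs = sum(s[1] for s in seasons)
    road_runs = sum(s[2] for s in seasons)
    road_outs = sum(s[3] for s in seasons)
    # The 27s cancel in the ratio; they are kept so each rate reads as runs per 9 innings.
    return (27 * home_runs / home_outs) / (27 * road_runs / road_outs)


def road_environment(home_pf, n_teams=16):
    """Average factor of the other parks, if all n_teams parks average 1.00."""
    return (n_teams - home_pf) / (n_teams - 1)


road = road_environment(1.12)   # 14.88 / 15 = 0.992
blended = (1.12 + road) / 2     # 1.056 rather than 1.06
```

Summing raw totals across seasons before dividing, rather than averaging three one-year factors, keeps a season with fewer outs from counting as much as a full one.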

Basically, these are all improvements on the previous type of park factor. So far, there is nothing that would cause you to use the park factor any differently. However, B-R makes two further adjustments you need to know about: one changes how you use the park factor, and the other is doing something completely different. The first is that B-R already averages a team's home and road environments, so there is no need to blend the PF with a neutral road environment the way you do with ESPN's. This is one reason the PFs listed on B-R are generally closer to neutral than ESPN's. So when you see a park factor of 112 on B-R, you divide the season numbers by 1.12, not 1.06.

The other adjustment has nothing to do with the park, but rather with another component of team context. The quality of pitching a hitter faces is not the same as the quality of the league's pitching as a whole, because the league as a whole includes the hitter's teammates, while the group of pitchers he faces does not. So if a hitter is on a team with a very good pitching staff, his numbers get a slight boost from not having to face those good pitchers. B-R's park factors account for this, which is slightly misleading, because it has nothing to do with the park. But it is another important adjustment to make, and since it's used the same way, lumping it in with the park factor makes sense. This is also why there are separate PFs for hitters and pitchers. The effect of the park is the same, but accounting for the quality of pitchers the hitters face and the quality of hitters the pitchers face produces different results.
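To see why a bad staff shortchanges its own hitters, here is a back-of-the-envelope sketch. It is not B-R's method: it assumes a balanced schedule with no interleague games, a 14-team league (the AL of 2001-2003), and a hypothetical Rangers staff that allowed 1.24 times the league-average ISO. The function name is my own.

```python
def opposing_pitching_factor(own_staff_index, n_teams):
    """Average quality index of the staffs a team's hitters actually face.

    own_staff_index is what the team's own pitchers allowed relative to the
    league (1.00 = average). Since the league average includes the team's own
    staff, the other n_teams - 1 staffs must average whatever is left over.
    """
    return (n_teams - own_staff_index) / (n_teams - 1)


factor = opposing_pitching_factor(1.24, n_teams=14)
print(round(factor, 2))  # 0.98: Rangers hitters faced slightly tougher pitching
```

A factor below 1.00 means the hitters faced tougher-than-average pitching, so dividing their stats by it nudges them up.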

You can also calculate this type of park factor for any statistic you like, but it is much more complicated. The methodology is outlined in more detail on B-R's site.

To give an example, I calculated an ISO park factor for the Ballpark in Arlington for 2001-2003. The final park factor was 1.04, meaning Rangers hitters in those years saw a 4% increase in ISO from playing for the Rangers. Using the simple method above, the park factor was 1.13. The improvements described earlier give a more accurate figure of 1.12. That number would be used the same way as the ESPN park factors: average it with a neutral context, then divide the player's ISO by the result. B-R's methodology does this averaging anyway, and we end up with a park factor of 1.06. That is the effect of the park. However, we also have to account for the fact that Texas' pitchers sucked, and that Rangers hitters had the poor fortune of not getting to face them. Facing tougher-than-average pitching knocked about 2% off their ISO, so we drop our final number down to 1.04. So the adjustments to the numbers would go as follows:

due to park: divide by 1.06
due to quality of pitchers faced: divide by .98
combined effect of both: divide by 1.04
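The chain of adjustments above can be checked with a few lines of arithmetic. The factors are the ones from the Rangers example; the .220 ISO is a made-up hitter, used only to show that dividing by the two factors in turn is the same as dividing once by their product.

```python
park_factor = (1.12 + 1.00) / 2      # 1.06: improved PF blended with a neutral road
opposition = 0.98                    # Rangers hitters faced tougher-than-average pitching
combined = park_factor * opposition  # 1.0388, which rounds to 1.04

raw_iso = 0.220                      # hypothetical Rangers hitter
step_by_step = raw_iso / park_factor / opposition
all_at_once = raw_iso / combined

print(round(combined, 2))                             # 1.04
print(round(step_by_step, 3), round(all_at_once, 3))  # 0.212 0.212
```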

Basically, the most important things to keep in mind when dealing with park factors:

-you divide the stat by the park factor to neutralize the context
-with ESPN's PFs, or others calculated similarly, you first have to average the park factor with a neutral 1.00 (which halves its distance from 1.00) and then divide
-with B-R's PFs, or others calculated similarly, you simply divide by the number they give you
-B-R's PFs are more accurate than ESPN's, but B-R only provides them for run scoring, not other stats
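Put together, the two recipes in that list look roughly like this. It is a sketch with hypothetical function names; it assumes ESPN-style factors are quoted as ratios like 1.12 that describe home games only, and B-R-style factors are on a 100 scale and already blend home and road. The 800-run total is made up.

```python
def neutralize_home_only_pf(stat_total, pf, road_env=1.00):
    """ESPN-style factor: blend with the road environment, then divide."""
    return stat_total / ((pf + road_env) / 2)


def neutralize_blended_pf(stat_total, pf_100):
    """B-R-style factor (e.g. 112): already blended, so divide directly."""
    return stat_total / (pf_100 / 100)


print(round(neutralize_home_only_pf(800, 1.12)))  # 755
print(round(neutralize_blended_pf(800, 112)))     # 714
```

The same printed number calls for a bigger adjustment when it comes from B-R, because the road games have already been folded into it.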

This is far from everything there is to know about park factors, but it should give you an idea of what they are, and how to use them.

2 comments:

hostile postulate said...: my main concern with park factors is that they don't take enough information into account. it seems like they're simply a measure of what happened in a given park, not why it happened. opposing pitchers are taken into account, as you mentioned, but is there any compensation for opposing defenses? also, is there any way to distinguish the effects that parks have on right/left-handed hitters? or are all these various other factors rendered moot because they don't have much of an outcome on park factors when it's all said and done?; May 11, 2009 at 9:25 PM
Kincaid said...: Pitchers don't really need to be taken into account to measure the actual park factor, because over a long enough stretch of time, what a team does in its home and road games involves essentially the same sample of pitchers. For the Rangers, for example, the home sample is the Rangers' pitchers pitching to the rest of the league's hitters, and the rest of the league's pitchers against the Rangers' hitters. The road sample is the exact same thing. So if you just want to measure the effect of the park itself, you don't have to worry about it. There are small variations in who pitches or hits where, and how often, but not enough to worry about. The overall effect, if you're looking at enough games in calculating the park factor, is minuscule compared to the amount of work it would take to account for. It's the same for defense, or any other quality of a team's players. As far as the park factor itself goes, it shouldn't make a difference, because it's equally present in both the home and road sample.

The reason you account for hitters and pitchers is to account for the quality of opposition a player faces and has nothing to do with the park itself. In looking at the quality of pitching a team has and adjusting for it, you are really looking at what the team gave up. For example, for a run PF, how many runs the team allowed compared to average. Or for an ISO PF, the ISO the team allowed compared to average. And park-adjusted, of course, since we've already figured the park factor at this point. When we look at what the team allowed to the opposing offense, we call it "pitching", but really, it's a combination of pitching and defense. For example, if the Rangers allowed 1.24 times the ISO an average team would have allowed in the Ballpark in Arlington, that is the combination of the total extra bases their pitching and defense allowed relative to the number of at bats the opposition had. It's mostly pitching and simpler to just say pitching, but it's a combination of both. Just as ERA or WPA for a pitcher reflects both pitching and defense, which is why we have stats like FIP. So defense is already accounted for in what we call "pitching".

Left-/right-handed splits can be calculated in more or less the same way. You would just look at only the stats by left-handed or right-handed hitters instead of entire team stats. I don't know of anyone that publishes these, so you'd have to do them yourself. In some cases, it would probably make a significant difference (like Fenway, for example), but what we have is at least better than nothing, and for most parks the difference is probably not that much. The biggest problem with this is that you can start to run into team composition problems: if a team has a lot of lefties, then the home sample will be skewed high because more of your hitters will be hitting at home in the park you are measuring for, which is a problem because players hit better at home. So your PF will tell you that lefties hit better there even if they don't. Or you have the opposite problem for teams with relatively few lefties. That doesn't mean you can't calculate it, but you would have to account for how many left- or right-handed PAs each team had and add that adjustment into the formula. So, you could do it, it would just be more complicated.; May 12, 2009 at 3:38 PM
