The year was 1996. Baseball was riding high again after the ugly strike that had brought the game screeching to a halt just a short year and a half earlier. Cal Ripken had broken Lou Gehrig's long-standing record the year before. Mark McGwire was just starting to belt home runs like no one had ever seen (he missed 32 games that year and still managed to top 50 for the first time in his career), and no one had yet figured out why. Most importantly for a 10-year-old Cardinal fan, an aging Wizard was making his last rounds before bowing out of the game, and he was doing it for a playoff contender at that. This was the setting in which I hit the road with 3 of my closest relatives in the brotherhood of baseball, 2 generations of biological brothers on the road from St. Louis to Cincinnati, and then onward throughout the northeastern quadrant of the country wherever there was baseball.
We saw the Monster of Fenway, the obtuse angles of the outfield wall, the Pesky pole. We saw the ivy of Wrigley, the bleachers, the crowds on Waveland Avenue. The canopies of old Tiger Stadium. The friezes in Yankee Stadium, the grandstands of Doubleday Field. The visible seams in the turf at the SkyDome. We mercifully skipped over Shea. My brother, a year older but just as wide-eyed, and I learned why they were called cathedrals. My dad and uncle saw for the first time their temples of Mecca jump out from staticky TV sets into tangible form beneath their feet.
Younger still, I learned something about the field of play that sets baseball apart. For 127 feet, 3⅜ inches from home plate, in a 90 foot square, everything identical. It's meticulously laid out in the rule book with no room for error. Even the baseball itself is more loosely regulated than the diamond within these dimensions. But beyond that, you can do whatever the hell you want. Like my first Little League field, which had no outfield fence but some tricky playground equipment to navigate in play in left field. Anything that rolled into the woods in right or across the street past the playground and into the neighbor's farm was assumed to be a home run simply because no one wanted to chase a ball that far. Seeing in person the lots these players roamed as we traveled across the country awakened me to a greater truth of baseball: we weren't just too poor for a real field. We were partaking in one of baseball's grand traditions.
All this is to say that baseball exhibits a wide variety of contexts, much more so than other mainstream sports in this country. To a statistician like myself, this can present some problems. What a hitter does in Coors is not the same as what a hitter who puts up the same numbers in Petco does. For this reason, park factors are one of the most important adjustments we have to make to raw statistics, to the point that any advanced metric that wishes to be taken seriously must include them.
Fortunately, there are places that keep track of basic park factors, and these can be very useful as long as you know how to use them. For example, you may want to know how Matt Holliday's success in Coors would translate into Oakland's park. Or how Adrian Gonzalez would hit had he not screwed up his karma and found himself banished to the caverns of Petco. The problem is, it is not always entirely clear how to use these factors to adjust stats.
One place to find simple park factors is on ESPN.com. They list the park factors for a number of basic statistics along with the formula they use:
PF = ((homeRS + homeRA)/(homeG)) / ((roadRS + roadRA)/(roadG))
This is the basic idea behind all park factors: compare what happens in a team's park to what happens in every other park. You can use this formula (replacing runs scored with whatever stat you want to look at, of course) to calculate your own park factor for any statistic you would like.
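As a rough illustration, the formula above can be written as a small Python function. The function name and the run totals below are made up for the example; they are not real team data.

```python
def park_factor(home_rs, home_ra, home_g, road_rs, road_ra, road_g):
    """Simple home-vs-road park factor, following the formula above.

    RS = runs scored, RA = runs allowed, G = games. Swap in any other
    counting stat (home runs, doubles, ...) to get a factor for that stat.
    """
    home_rate = (home_rs + home_ra) / home_g   # total scoring per home game
    road_rate = (road_rs + road_ra) / road_g   # total scoring per road game
    return home_rate / road_rate


# Hypothetical season totals, for illustration only.
pf = park_factor(home_rs=450, home_ra=430, home_g=81,
                 road_rs=380, road_ra=400, road_g=81)
print(round(pf, 3))
```

Counting both teams' runs (scored plus allowed) is what keeps the quality of the home team from leaking into the number: a great offense scores a lot everywhere, but it does so both at home and on the road.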
So how do you use these park factors? Say you want to know what a factor of 1.12 means. Basically, it means that the stat you are looking at is increased by a factor of 1.12 in that park compared to other parks in the league. So if a player hits 40 home runs while playing his home games in a park with a home run factor of 1.12, you would estimate that he likely got a 12% boost in the home runs he hit at home, and if he played in a neutral park, he probably would have hit fewer. To undo an increase of 12%, you divide by 1.12.
However, you also have to keep in mind that this type of park factor only tells you about the boost a player gets at home. He also plays half his games on the road, which we assume is a roughly neutral environment, on the whole. So if a hitter plays half his games in a 1.12 environment and the other half in a 1.00 (which would be perfectly neutral) environment, then we would want to average these to find the effect on his overall numbers. That average is 1.06, which means the hitter played in an overall environment that increased his totals by 6 percent rather than the 12 percent shown in the park factor.
So with ESPN's park factors, or with any you might calculate using the same formula, you have to be careful in applying them to a player's season numbers because the PF only accounts for what he does in his home games. You have to average this with a neutral environment to account for the fact that he plays half his games on the road.
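A minimal sketch of that adjustment in Python, assuming the road environment is exactly neutral (1.00) and that the player splits his games evenly between home and road. The function names are just for illustration.

```python
def season_factor(home_pf, road_pf=1.0):
    """Average the home park factor with the road environment.

    Assumes half of the player's games are at home and half on the road.
    """
    return (home_pf + road_pf) / 2


def neutralize(stat_total, home_pf, road_pf=1.0):
    """Translate a full-season total into a neutral context."""
    return stat_total / season_factor(home_pf, road_pf)


# The 40 home run hitter in a 1.12 home run park:
print(round(season_factor(1.12), 2))   # overall environment for the season
print(round(neutralize(40, 1.12), 1))  # estimated neutral-park home runs
```

With these inputs the season-long factor comes out to 1.06, and the 40 home runs translate to a little under 38 in a neutral park. Dividing by the full 1.12 instead would overcorrect and leave him with fewer than 36.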
Another site that publishes park factors, and one of the best sites for baseball statistics in general, is Baseball-Reference. On every team page, there are a few different park factors listed. Unlike ESPN, they only list run scoring park factors, but also unlike ESPN, they adjust for a number of problems with the simple formula shown above.
As you will quickly notice, there are both one-year and multi-year park factors. One limitation of these calculations is that they estimate the effect of the park from observational data, which is prone to random fluctuation. That isn't really a flaw; in fact, it's more or less what all statistics are. But, as with all statistics, having more data to look at improves the accuracy, and by looking at 3 years of data instead of 1, you can get a better park factor. B-R does this and ESPN doesn't. So that's one difference.
Another improvement B-R makes is that it looks at runs per 27 outs rather than runs per game. If a team plays a number of extra inning games on the road, or has a number of games at home where they don't play the bottom half of the 9th, the number of innings played on the road or at home can vary enough to throw off your park factor by a small amount. Using outs (or innings) in the denominator instead of games accounts for this: an out or an inning is a fixed unit of length in baseball. A game is not.
A third improvement takes into account that the road environment is not always neutral. 1.00 is a neutral park based on the league average. However, if an NL park has a factor of 1.12, then the other 15 parks must average a bit less than 1.00, because it takes the 1.12 from the 16th park to bring the league average up to 1.00. Instead of averaging 1.12 with a road environment of 1.00, you might have to average it with a road environment of .99. It's a small difference, but for extreme parks, it can matter.
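To see where the .99 comes from, here is a simplified sketch in Python. It assumes every park counts equally toward the league average and that the team plays an even share of its road games in each of the other parks; B-R's actual method is more involved than this, and the function name is only for illustration.

```python
def road_environment(home_pf, n_parks):
    """Average factor of the other parks in the league.

    Assumes all n_parks factors average to exactly 1.00, with each
    park weighted equally.
    """
    return (n_parks * 1.0 - home_pf) / (n_parks - 1)


home_pf = 1.12
road_pf = road_environment(home_pf, n_parks=16)  # the other 15 NL parks
overall = (home_pf + road_pf) / 2                # half home, half road

print(round(road_pf, 3))
print(round(overall, 3))
```

Here the road environment works out to .992, so the season-long factor is 1.056 rather than the 1.06 you get by assuming the road is perfectly neutral.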
Basically, these are all improvements on the previous type of park factor, and so far none of them changes how you use it. However, B-R makes two further adjustments you need to be aware of: one changes how you use the factor, and the other measures something completely different. The first is that B-R already averages the home and road environments of a team, so there is no need to average the PF with a neutral road environment as you do with ESPN's. This is one reason that the PFs listed on B-R are generally closer to neutral than those on ESPN. B-R also lists its factors on a scale where 100 is neutral, so when you see a park factor of 112, you would divide the season numbers by 1.12, not 1.06.
The other adjustment has nothing to do with the park, but rather another component of team context. The quality of pitching a hitter faces is not the same as the quality of the league's pitching as a whole, because the league as a whole includes the hitter's teammates, while the group of pitchers he faces does not. So if a hitter is on a team with a very good pitching staff, his numbers get a slight boost from not having to face those good pitchers. B-R's park factors account for this, which is slightly misleading, because it has nothing to do with the park. But it is another important adjustment to make, and since it's used the same way, lumping it with the park factor makes sense. This is also why there are separate PFs for hitters and pitchers. The effect of the park is the same, but accounting for the quality of hitters the pitchers face and accounting for the quality of pitchers the hitters face produce different results.
You can also calculate this type of park factor for any statistic you like, but it is much more complicated. The methodology is outlined in more detail on B-R's site.
To give an example, I calculated an ISO park factor for The Ballpark in Arlington from the years 2001-2003. The final park factor was 1.04, meaning Rangers hitters in those years saw a 4% increase in ISO from playing for the Rangers. Here is how that number comes together. The park factor, using the simple method above, was 1.13. The adjustments for improving that park factor give a more accurate figure of 1.12. This number would be used the same way as the ESPN park factors: average it with a neutral context and then divide the player's ISO by the final figure. Using B-R's methodology, we do this averaging as part of the calculation, and end up with a park factor of 1.06. That is the effect of the park. However, we also further account for the fact that Texas' pitchers sucked, and that the Rangers had the poor fortune of not getting to face these pitchers. That adjustment knocks 2% off their ISO, so we drop our final number down to 1.04. So the adjustments to the numbers would go as follows:
due to park: divide by 1.06
due to quality of pitchers faced: divide by .98
combined effect of both: divide by 1.04
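The two adjustments combine by multiplication, which a few lines of Python can confirm. The .200 raw ISO is a hypothetical figure chosen only to show the division; it is not any particular Ranger's number.

```python
park_effect = 1.06       # home park averaged with the road environment
pitching_effect = 0.98   # Rangers hitters never faced their own pitchers

combined = park_effect * pitching_effect
print(round(combined, 2))

# Hypothetical raw ISO, for illustration only.
raw_iso = 0.200
neutral_iso = raw_iso / combined
print(round(neutral_iso, 3))
```

The product is 1.0388, which rounds to the 1.04 listed above. Dividing by 1.06 and then by .98 gives the same result as dividing once by the combined figure.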
Basically, the most important things to keep in mind when dealing with park factors:
-you divide the stat by the park factor to neutralize the context
-with ESPN's PFs, or others calculated similarly, you first have to average the park factor with a neutral 1.00 (so 1.12 becomes 1.06) and then divide
-with B-R's PFs, or others calculated similarly, you simply divide by the number they give you
-B-R's PFs are more accurate than ESPN's, but only provide PFs for run scoring and not other stats
This is far from everything there is to know about park factors, but it should give you an idea of what they are, and how to use them.