For the most part, I don't particularly care one way or the other, since scaling to ERA or RA is, for most practical purposes, the same thing. Divide or multiply by .92 (or lgERA/lgRA), and you can easily go from one to the other. If you're using the metrics for anything more complicated than, say, looking at them, this step is probably the simplest you'll encounter, so I have no problem with either standard, at least as far as practicality goes. The issue is just a matter of presentation and the implications that go with that.
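To show how little work the rescaling takes, here is a minimal Python sketch. The function names and the league figures in the usage lines are hypothetical. The figures were chosen only so that lgERA/lgRA comes out to .92.

```python
def ra_to_era(value, lg_era, lg_ra):
    """Put an RA-scale value on the ERA scale by multiplying by lgERA/lgRA."""
    return value * lg_era / lg_ra


def era_to_ra(value, lg_era, lg_ra):
    """Put an ERA-scale value on the RA scale by dividing by lgERA/lgRA."""
    return value * lg_ra / lg_era


# Hypothetical league: 4.60 ERA and 5.00 RA, so lgERA/lgRA = .92.
print(ra_to_era(4.00, 4.60, 5.00))  # about 3.68
print(era_to_ra(3.68, 4.60, 5.00))  # about 4.00
```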
However, there is a related question I am more interested in, specifically with regard to FIP. As discussed here last October, FIP comes from the linear-weights values of 4 categories of events (HR, BB, SO, and BIP) and is scaled to the league ERA by adding a constant to the calculation. Similarly, FIP could be scaled to RA instead by changing the constant. It shouldn't matter which scale we choose, since we can easily convert from one scale to the other with a simple calculation. Because of how FIP works, however, it does matter which scale we choose when deciding what constant to use in the calculation.
The problem arises from the fact that the linear weights used to calculate the coefficients in FIP are on the scale of runs, while ERA is on the scale of earned runs. Earned runs are smaller than runs by a factor of about .92. To see why this creates a problem, consider how FIP is calculated:
FIP = (13*HR + 3*BB - 2*K)/IP + C
let (13*HR + 3*BB - 2*K)/IP = x
FIP = x + C
This is the basic construction of FIP: a value is calculated for each pitcher, and then a constant is added to this value to put FIP on a more usable scale. The end goal of this constant is to convert FIP to the scale of either ERA or RA. Which scale you choose should make no difference because, as mentioned earlier, ERA is just RA multiplied by .92 (or something close to that), and you should be able to convert one to the other with a simple calculation. That is not true in this case. Let the following two equations represent FIP scaled to ERA and to RA, respectively:
erFIP = x + C1; C1 = lgERA - lgx
rFIP = x + C2; C2 = lgRA - lgx
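The two constants can be computed from league totals. The sketch below assumes the three-category formula shown above, with x for the whole league computed the same way as for a single pitcher. It also assumes true innings rather than the 213.1-style notation. The function names, league totals, and pitcher line are hypothetical.

```python
def fip_core(hr, bb, k, ip):
    """x = (13*HR + 3*BB - 2*K) / IP: the part of FIP that varies by pitcher."""
    return (13 * hr + 3 * bb - 2 * k) / ip


def fip_constants(lg_hr, lg_bb, lg_k, lg_ip, lg_er, lg_r):
    """Return (C1, C2), the constants that scale FIP to league ERA and league RA."""
    lg_x = fip_core(lg_hr, lg_bb, lg_k, lg_ip)
    lg_era = 9 * lg_er / lg_ip
    lg_ra = 9 * lg_r / lg_ip
    return lg_era - lg_x, lg_ra - lg_x


# Made-up league totals, for illustration only.
c1, c2 = fip_constants(lg_hr=2000, lg_bb=6000, lg_k=12000,
                       lg_ip=20000, lg_er=10000, lg_r=10900)
x = fip_core(hr=20, bb=60, k=180, ip=200)  # a hypothetical pitcher
er_fip = x + c1  # FIP on the ERA scale
r_fip = x + c2   # FIP on the RA scale
print(f"C1={c1:.3f}  C2={c2:.3f}  erFIP={er_fip:.2f}  rFIP={r_fip:.2f}")
```

By construction, a pitcher whose x equals the league's x lands exactly on lgERA (with C1) or lgRA (with C2).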
x is the same in both equations. The only difference is the value of the constant. Now, let's convert rFIP to erFIP using our multiply-by-.92 rule:
erFIP = .92*(x + C2)
=.92*x + .92*C2
.92 times a constant is just another constant, so:
=.92*x + C3; C3 = .92*C2
Compare that to the original equation for FIP scaled to ERA:
erFIP1 = x + C1
erFIP2 =.92*x + C3
As long as we choose the correct values of C1 and C3, there shouldn't be a difference between these two values, but there is. To see why, subtract the two equations, assuming erFIP1=erFIP2:
erFIP1-erFIP2 = (x + C1) - (.92*x + C3)
0 = x-.92*x + C1-C3
Here, C1-C3 is another constant, because it is just the difference between two constant numbers:
0 = x*(1-.92) + C4; C4 = C1 - C3
0 = .08*x + C4
This can't be true for all values of x. When C4 is set so that this equation is true on average, erFIP1 is smaller than erFIP2 when x is lower than average (the difference between the two equations as shown above is negative), and erFIP1 is larger than erFIP2 when x is higher than average (the difference is positive). In other words, FIP will be lower for good pitchers and higher for bad pitchers if you scale it directly to ERA than if you scale it to RA and then convert to the ERA scale. The spread between pitchers is larger for the former method than for the latter.
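Here is a numerical check of the algebra above as a minimal Python sketch. The league values are hypothetical. The conversion factor is taken as k = lgERA/lgRA rather than a flat .92. Under that assumption, C4 works out to -(1 - k)*lgx, so erFIP1 - erFIP2 = (1 - k)*(x - lgx). That difference is negative below average, zero at average, and positive above it.

```python
def er_fip_direct(x, lg_x, lg_era):
    """erFIP1: constant chosen so FIP lands on the ERA scale (C1 = lgERA - lgx)."""
    return x + (lg_era - lg_x)


def er_fip_via_ra(x, lg_x, lg_era, lg_ra):
    """erFIP2: FIP scaled to RA (C2 = lgRA - lgx), then multiplied by lgERA/lgRA."""
    return (lg_era / lg_ra) * (x + (lg_ra - lg_x))


lg_x, lg_era, lg_ra = 1.0, 4.5, 4.9  # hypothetical league values
k = lg_era / lg_ra                    # about .918
for x in (0.0, 0.5, 1.0, 1.5, 2.0):
    diff = er_fip_direct(x, lg_x, lg_era) - er_fip_via_ra(x, lg_x, lg_era, lg_ra)
    predicted = (1 - k) * (x - lg_x)
    print(f"x={x:.1f}  erFIP1-erFIP2={diff:+.4f}  (1-k)(x-lgx)={predicted:+.4f}")
```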
Another way to look at this is to consider FIP as two components: the measure of pitchers' results (x), and the constant (C). x is measured in runs. If C is set to scale to earned runs instead of runs, then x will make up a larger portion of FIP, and, since x is the part of FIP that varies from pitcher to pitcher, the variance of FIP between pitchers will be inflated relative to the scale of the metric.
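The spread inflation can be shown directly. Every erFIP1 sits x - lgx away from lgERA, while every erFIP2 sits k*(x - lgx) away, where k = lgERA/lgRA. The ratio of their standard deviations is therefore exactly 1/k. This sketch uses made-up x values and hypothetical league figures, not real pitchers.

```python
import statistics

lg_x, lg_era, lg_ra = 1.0, 4.5, 4.9  # hypothetical league values
k = lg_era / lg_ra                    # ERA-to-RA ratio, roughly .92
xs = [0.2, 0.7, 1.0, 1.4, 2.1]        # made-up pitcher x values

er_fip1 = [x + (lg_era - lg_x) for x in xs]       # scaled directly to ERA
er_fip2 = [k * (x + (lg_ra - lg_x)) for x in xs]  # scaled to RA, then to ERA
ratio = statistics.pstdev(er_fip1) / statistics.pstdev(er_fip2)
print(f"spread ratio {ratio:.4f} vs 1/k {1 / k:.4f}")
```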
To illustrate this point, consider the following graph, a scatter plot showing erFIP1 and erFIP2 from the above formulae for every pitcher to throw at least 100 IP in a season since 1970, along with the difference between erFIP1 and erFIP2. The graph is sorted from left to right by the difference between the two figures:
Notice that erFIP1 is smaller than erFIP2 (that is, the constant that scales FIP to ERA returns too small a value, assuming that scaling to RA is correct) when FIP is small, and that, without exception, the difference rises as FIP rises for a given value of C. The graph is really just one pattern stacked on top of itself several times: the same pattern plotted for the different values of C in different years.
It shouldn't be surprising that FIP has some inaccuracies. It is, after all, a shortcut for the original DIPS calculations, designed to be much simpler and easier to use at only a small cost in accuracy. The question is how much difference this problem makes. As seen in the above graph, the difference between calculating FIP on the scale of RA and then scaling back to ERA, and calculating FIP directly on the scale of ERA, is small for most pitchers, and in fact approaches 0 as you get closer to average. It is on the edges of the graph, where pitchers are far from average, that the differences start to grow.
Consider, for example, Pedro Martinez in 1999. His FIP, with the constant set to scale to ERA, was 1.51*. With the constant set to scale to RA, it was 1.96, which, scaled back to the ERA scale, is 1.79. Still excellent, obviously, but not as good as his traditional FIP suggests: a difference of .28 runs per 9 innings. If we were to calculate a WAR value for Pedro that year, how much difference would that make? We can ignore park adjustments here, since all we care about is how the two methods of calculating FIP compare. The AL average RA in 1999 was 5.31. Using 1.51 as Pedro's FIP (and dividing by .92 to scale to RA) gives Pedro a W% of .885. With a replacement level of .380, that's good for 12.0 WAR over 213.1 IP. Using 1.96 (already on the RA scale, so no conversion is needed) gives him a W% of .851, or 11.2 WAR. The difference is .8 wins.
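The article doesn't say which win estimator produced those W% figures. This sketch assumes Pythagenpat, with exponent equal to (runs per nine for both sides)^0.287, and treats the league RA as the pitcher's run support. It also converts 213.1 IP to 213 1/3 innings. With those choices, it lands within a few thousandths of the W% figures above and reproduces the roughly .8-win gap. The exact values depend on the estimator, so treat this as an approximation, not the calculation behind the numbers above. The function names are hypothetical.

```python
def true_innings(ip):
    """Convert baseball IP notation (213.1 = 213 1/3) to innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # the decimal digit counts outs, not tenths
    return whole + outs / 3


def pythagenpat_wpct(runs_for, runs_against, z=0.287):
    """Pythagenpat winning percentage (assumed estimator); inputs are runs per nine."""
    exponent = (runs_for + runs_against) ** z
    return runs_for ** exponent / (runs_for ** exponent + runs_against ** exponent)


def war(pitcher_ra, lg_ra, ip, replacement=0.380):
    """Wins above a .380 replacement level, with league RA as run support."""
    return (pythagenpat_wpct(lg_ra, pitcher_ra) - replacement) * true_innings(ip) / 9


lg_ra_1999_al = 5.31
war1 = war(1.51 / 0.92, lg_ra_1999_al, 213.1)  # erFIP1 rescaled to RA
war2 = war(1.96, lg_ra_1999_al, 213.1)         # rFIP, already on the RA scale
print(f"WAR1={war1:.1f}  WAR2={war2:.1f}  gap={war1 - war2:.2f}")
```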
Since 1970, 12 pitchers have had differences between their WAR figures as calculated by these two methods at least that big:
Conversely, poor pitchers have WARs that are too low when measured by traditional FIP, though not by as much, since they pitch far fewer innings than elite pitchers. Of the 5630 pitcher-seasons with at least 100 IP since 1970, the RMSD between WAR1 and WAR2 (WAR based on erFIP1 and on erFIP2, respectively) was .15, with an average of 175 IP.
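For reference, RMSD here is the root-mean-square difference between paired WAR values. Here is a minimal sketch. The WAR lists are toy values standing in for the 5630 pitcher-seasons, not real data.

```python
import math


def rmsd(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of the same length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


war1 = [12.0, 4.1, 1.8, -0.4]  # toy values: WAR from erFIP1
war2 = [11.2, 4.0, 1.8, -0.3]  # toy values: WAR from erFIP2
print(f"RMSD = {rmsd(war1, war2):.3f}")
```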
I've used language in this article that assumes that using the constant that scales to RA is the more correct choice, since the coefficients of FIP are based on a scale of runs rather than earned runs (this is also the method used in the original DIPS statistic), but I haven't gone through the full DIPS calculations to compare. For now, I think it's important just to look at how much difference there is between using different constants and whether that difference is large enough that switching scales could be worthwhile. Since the formula would be virtually identical (only the constant would change), I would prefer using the formula that scales to RA rather than ERA. If you prefer the ERA scale, that adds a step of multiplying by .92 (technically lgERA/lgRA, which you might as well use since you have to calculate the constant anyway), but that's simple enough that I don't think it hurts simplicity or usability any. It's just standard fare for going from one scale to the other.
*The FIP I'm using here differs from FanGraphs' value. There are a few different formulae for FIP floating around; I am using BB-IBB+HBP for the BB term in FIP, and using different constants for the AL and NL, while FanGraphs uses BB+HBP for the BB term and a single constant for both leagues in each season. Also, while on the subject of FanGraphs, this article shouldn't be an issue for their WAR values, because FG uses the constant that scales to RA for win values.
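The two BB-term conventions in this note differ only in how walks are counted before they enter the 3*BB term. Here is a minimal sketch. The `convention` parameter and its label strings are hypothetical names. Separate AL and NL constants would simply be computed league by league, as in the earlier formulas.

```python
def walk_term(bb, ibb, hbp, convention="bb_minus_ibb"):
    """Walks as counted for FIP's 3*BB term under either convention in this note."""
    if convention == "bb_minus_ibb":  # the version used in this article
        return bb - ibb + hbp
    if convention == "bb_plus_hbp":   # the version attributed to FanGraphs above
        return bb + hbp
    raise ValueError(f"unknown convention: {convention}")


def fip_core(hr, bb, ibb, hbp, k, ip, convention="bb_minus_ibb"):
    """x = (13*HR + 3*walks - 2*K) / IP with the chosen walk convention."""
    walks = walk_term(bb, ibb, hbp, convention)
    return (13 * hr + 3 * walks - 2 * k) / ip


# Hypothetical pitcher line: the two conventions give different x values.
print(fip_core(20, 50, 5, 6, 180, 200))
print(fip_core(20, 50, 5, 6, 180, 200, convention="bb_plus_hbp"))
```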