3-D Baseball: 2017

Weighting Calculator for Past Data

Calculate Weighting for Past Data

decay factor:

Estimate the regression constant (for binomial stats only)

regression constant:

Math Behind Weighting Past Results (THT Article)

In my article "The Math of Weighting Past Results" on the Hardball Times, I gave a formula for finding the proper weighting for past data given certain inputs from the dataset. This formula defined the relationship between weighted results and talent, and the proper weighting was the value that maximized that relationship.

I started with a formula for a sample with exactly two days and then generalized that to cover any length of sample. I more or less explained where the two-day version came from in the article, but not the full version, which was as follows:

This supplement will go through the calculations of generalizing the simpler two-day formula to what we see above. It will rely heavily on the use of geometric series, so I would recommend having some familiarity with those before attempting to follow these calculations.

In the article, we treated each day's results as a separate variable and the overall sample as a sum of these individual daily variables. When we had two days in our sample, the combined variance was defined by the following formula:

This formula can be expanded to include more than two variables, but it starts to get messy really quickly. To make expanding is simpler, the formula can be re-written as a covariance matrix. If you have n variables, then the covariance matrix will be an n X n array, where each entry is the covariance between the variables representing that row and column. For two days, we would fill in the covariance matrix as follows:

	x₁	x₂
x₁	Var_x1	Cov_x1,x2
x₂	Cov_x1,x2	Var_x2

The combined variance is equal to the sum of the items in the matrix, which you can see is equivalent to the above formula.

This makes it much simpler to expand the formula for additional variables since you just have to add more rows and columns to the matrix. In the article, we found that the day-to-day correlation of talent (r) and the decay factor used to weight past data (w) can be used to explain changes in the variances and covariances throughout the sample:

Translating this to our covariance matrix gives us (with Var_x1 and Var_true written as v_x and v_t to save space):

	x₁	wx₂
x₁	v_x	rw*v_t
wx₂	rw*v_t	w²v_x

If we expand this to include additional days, every term except those on the diagonal will include a Var_true factor, and those on the diagonal will instead have a Var_x1 factor. (This is because the terms on the diagonal represent the covariance of each variable with itself, which is just the variance of that variable.) Similarly, every term contains an r factor and a w factor, except that the terms on the diagonal have no r (because these are relating the results of one day to themselves, so it is irrelevant how much talent changes from day to day).

For now, let's strip out the variance factors and focus only on what happens to r and w as we expand the matrix to cover more days. We'll look at r and w separately, but keep in mind these are just factors from the same matrix, not two separate matrices. If you placed one on top of the other, so that each r term lines up with the corresponding w term, and then put the variances back in, you'd get the full matrix.

This covariance matrix is essentially the same as what we worked with for the variance article, except now we are introducing weights for past results. As a result, the only real difference here is what happens with the w's, and the r terms follow the same pattern as in math for the variance article:

	x₁	wx₂	w²x₃	w³x₄	...	w^d-1x_d
x₁	r⁰	r¹	r²	r³	...	r^d-1
wx₂	r¹	r⁰	r¹	r²	...	r^d-2
w²x₃	r²	r¹	r⁰	r¹	...	r^d-3
w³x₄	r³	r²	r¹	r⁰	...	r^d-4
⋮	⋮	⋮	⋮	⋮	⋱	⋮
w^d-1x_d	r^d-1	r^d-2	r^d-3	r^d-4	...	r⁰

The weights also follow a pattern, though not the same one as the r factors. The weight for each term equals the combined weight of the two variables it represents:

	x₁	wx₂	w²x₃	w³x₄	...	w^d-1x_d
x₁	w⁰	w¹	w²	w³	...	w^d-1
wx₂	w¹	w²	w³	w⁴	...	w^d
w²x₃	w²	w³	w⁴	w⁵	...	w^d+1
w³x₄	w³	w⁴	w⁵	w⁶	...	w^d+2
⋮	⋮	⋮	⋮	⋮	⋱	⋮
w^d-1x_d	w^d-1	w^d	w^d+1	w^d+2	...	w^2(d-1)

While the two patterns are different, there are three important things to note that hold for both of them:

1) The terms on the main diagonal form their own distinct pattern.
2) The remaining terms are symmetrical about the diagonal, with the terms above and below the diagonal mirroring each other.
3) The terms on each diagonal parallel to the main diagonal follow a distinct pattern.

We need to find the sum of the matrix to get the variance in the weighted results. Using these three observations, we can simplify the sum by dividing the matrix up into parts.

We'll start with the main diagonal of the matrix. The terms on the diagonal follow the form w²ⁱ*Var_x1. The sum of these terms is a geometric series, which makes it simple to evaluate:

Next, because the matrix is symmetrical about the diagonal, we can focus on the sum for only the terms above or below the diagonal and then double our result later.

We'll compute this sum by continuing to divide the matrix along its diagonal rows. The r values within a given diagonal are all identical, which we can see in this graphic from the math for the previous article on variance:

The w values within each diagonal also follow a set pattern, though slightly more complex than the one for r's. Rather than r¹+r¹+r¹+..., we get w¹+w³+w⁵+... The basic pattern for the first diagonal is:

That's just for the w component of each term. If we include the r and variance components, we get this for the sum of the terms in the first diagonal adjacent the main diagonal:

This is still a geometric series, so we can evaluate the sum for this diagonal.

For the second diagonal, the w's go w²+w⁴+w⁶+..., which gives us:

If we keep going, we'll find that for each additional diagonal, the exponent for r will rise by one, the starting value of i in the summation will rise by one (which also means the summation will have one fewer term, which we can see by looking at the matrix), and each diagonal will alternate having an extra w outside the geometric sum due to the diagonals alternating between odd and even exponents.

Fortunately, the alternating w problem disappears when distribute that w back into the result for the geometric sum of each odd diagonal. We end up with the following pattern for the sum of each diagonal (after factoring out the Var_true component from each term):

This gives us two separate geometric series: the first multiplies by a factor of rw, and the second by a factor of r/w. Simplifying these geometric series gives us:

That gives us the sum of everything above the main diagonal in the covariance matrix. To get the full sum of the matrix, we need to double this (to account for everything below the diagonal, which mirrors this calculation) and add the sum of the main diagonal:

This gives us the full variance of the weighted results. Our formula calls for the standard deviation instead of the variance, so we just take the square root of this.

Next, we need to calculate the covariance between current talent and the weighted observations. We can get this using another covariance matrix based on the idea of "shared" variance mentioned in the Hardball Times article. The covariance between the results and talent for a given day is the same as the variance in talent, since the variance in talent is inherent in the variance of the results (i.e. that variance is shared between the results and the talent levels for that day).

To fill out the rest of the covariance matrix, we use the fact that the covariance between results and current talent drops the further the results are from the present time. The amount the covariance drops is determined by the day-to-day correlation in talent and the weight given to past data:

	x₁	wx₂	w²x₃	w³x₄	...	w^d-1x_d
t₁	(rw)⁰v_t	(rw)¹v_t	(rw)²v_t	(rw)³v_t	...	(rw)^d-1v_t

This is also a geometric series which multiplies by a factor of rw. The sum simplifes to:

As long as we know the values for r, w, Var_true and Var_x1, we can work out what the variance will be over any number of days, which means as long as we know r, Var_true and Var_x1, we can find the value of w which maximizes the relationship between weighted results and current talent.

Typically we would find this by taking the derivative of the formula and finding the point where the derivative equals 0, but given that this is a rather unpleasant derivative to calculate (and most likely will have difficult-to-find zeroes), I would strongly recommend just using the optimize function in R or some other statistical program (the calculator on the Hardball Times uses the same method to minimize/maximize a function as the optimize function in R).

One final note: this all relies on the assumption of exponential decay weighting. Exponential decay is not necessarily implied by the underlying mathematical processes; it's an assumption we are making to make our lives easier. Theoretically, we could fit the weight for each day individually, but this is far, far more complicated and not really worth the effort.

If you had 100 days in your sample, instead of maximizing the correlation for w, you would have to maximize it for a system of 100 different weight variables. If you would like to attempt this, by all means, have fun, but, while the exponential decay assumption is a simplification, it does work pretty well.

The true weight values do tend to drop slightly faster for the most recent data and then level out more for older data than exponential decay allows for, but on the whole, it doesn't make that much difference to use exponential decay. Continue Reading...

Math Behind Regression with Changing Talent Levels (THT Article)

In my article "Regression with Changing Talent Levels: the Effects of Variance" on the Hardball Times, I talk about how changes in players' true talent levels from day to day reduce the variance of talent in the population overall over time. In other words, the spread in talent over a 100-game sample will be smaller than the spread in talent over a one-game sample. In the article, I gave the following formula to calculate how much the spread in talent is reduced, which I will further explain here:

*Note: in the THT article, I used d for the number of days instead of n to avoid confusion with another formula that was referenced from a previous article, which used n for something else. For this article, I'm just going to use n for the number of days.

The value given by the formula is the ratio of talent variance over n days to the talent variance for a single day. In other words, the variance in talent drops by a multiplicative factor that is dependent on the length of the sample and the correlation of talent from day to day.

Now, how do we get that formula?

If we only have two days in our sample, it is not too difficult to calculate the drop in talent variance. Let t₀ be a variable representing player talent levels on Day 1, and t₁ be a variable representing player talent levels on Day 2. We want to find the variance of the average talent levels over both days, or (t₀+t₁)/2.

The following formula gives us the variance of the sum of two variables:

The covariance is directly proportional to the correlation between the two variables and is defined as follows:

(Note that sd_t₀sd_t₁ = var_t₀ = var_t₁ because the standard deviation and variance for both variables are the same.)

Before we continue, there is an important thing to note. Because we are trying to derive a formula for a ratio (variance in talent over n days divided by variance in talent over one day), we don't necessarily need to calculate the numerator and denominator of that ratio exactly. As long as we can calculate values that are proportional to those values by the same factor, the ratio will be preserved.

Technically, we want the variance of the value (t₀+t₁)/2 and not just t₀+t₁, which would be vart(1+r)/2 instead of 2vart(1+r). However, those two values are proportional, so it doesn't really matter for now which we calculate as long as we can also calculate a value for the denominator that is proportional by the same factor.

For two days, the above calculations are simple enough. Once you start adding more days, however, it starts to get more complicated. Fortunately, the above math can also be expressed with a covariance matrix:

	t₀	t₁
t₀	var₀	cov_0,1
t₁	cov_0,1	var₁

The variance of the sum t₀+t₁ is equal to the sum of the terms in the covariance matrix, which you can see just gives us the formula: var_t₀+t₁ = var_t₀ + var_t₁ + 2cov_t₀,t₁. The covariance matrix is convenient because it can be expanded for any number of days:

Covariance matrix between talent n days apart

	t₀	t₁	t₂	t₃	...	t_n-1
t₀	var₀	cov_0,1	cov_0,2	cov_0,3	...	cov_0,n-1
t₁	cov_0,1	var₁	cov_1,2	cov_1,3	...	cov_1,n-1
t₂	cov_0,2	cov_1,2	var₂	cov_2,3	...	cov_2,n-1
t₃	cov_0,3	cov_1,3	cov_2,3	var₃	...	cov_3,n-1
⋮	⋮	⋮	⋮	⋮	⋱	⋮
t_n-1	cov_0,n-1	cov_1,n-1	cov_2,n-1	cov_3,n-1	...	var_n-1

We can also construct a correlation matrix. Given that we know the correlation of talent from one day to the next, this isn't that difficult. If the correlation between talent levels on Day 1 and Day 2 is r, and the correlation between talent levels on Day 2 and Day 3 is also r, we can chain those two facts together to find that the correlation between talent levels on Day 1 and Day 3 is r².

The same logic can be extended for any number of days, so that the correlation between talent levels n days apart is rn:

Correlation matrix between talent n days apart

	t₀	t₁	t₂	t₃	...	t_n-1
t₀	r⁰	r¹	r²	r³	...	r^n-1
t₁	r¹	r⁰	r¹	r²	...	r^n-2
t₂	r²	r¹	r⁰	r¹	...	r^n-3
t₃	r³	r²	r¹	r⁰	...	r^n-4
⋮	⋮	⋮	⋮	⋮	⋱	⋮
t_n-1	r^n-1	r^n-2	r^n-3	r^n-4	...	r⁰

This matrix is more useful than the covariance matrix, because all we need to know to fill in the entire correlation matrix is the value of r. And because correlation is proportional to covariance (cov_t₀,t₁ = r · var_t₀), the sum of the correlation matrix is proportional to the sum of the covariance matrix.

Our next step, then, is to calculate the sum of the correlation matrix. Notice that the terms on each diagonal going from the top left to bottom right are identical:

We can use this pattern to simplify the sum. Since the matrix is symmetrical, we can ignore the terms below the long diagonal and calculate the sum for just the top half of the matrix, and then double it later:

r⁰	r¹	r²	r³	...	r^n-1	→	r^n-1
	r⁰	r¹	r²	⋱	⋮		⋮
		r⁰	r¹	⋱	r³	→	(n-3)r³
			r⁰	⋱	r²	→	(n-2)r²
				⋱	r¹	→	(n-1)r¹
					r⁰	→	nr⁰

There is one r⁰ term in each column of the matrix, so there are n r⁰ terms in the sum. Likewise, there are (n-1) r¹ terms, (n-2) r² terms, etc. If we group each diagonal into its own distinct term, we get a sum whose terms follow the pattern (n-1)*rⁱ:

Applying the distributive property and separating the terms of the sum, we get the following:

The first sum is a simple geometric series, which we can calculate using the formula for geometric series:

The second sum is similar, but the additional i factor makes it a bit trickier since it is no longer a geometric series. We can, however, transform it into a geometric series using a trick where we convert this from a single sum to a double sum, where we replace the expression inside the sum with another sum.

The idea is that each term of the series is itself a separate sum which has i terms of rⁱ. This sum can be written as follows:

Notice that we switched to using the index h rather than i. This means there is nothing inside the sum that increments on each successive term, and the i acts as a static value. In other words, this is just adding up the value rⁱ i times, which is of course equal to irⁱ.

In order to visualize how this double sum works, we can write down the terms of the sum in an array with i rows and h columns, where the value corresponding to each pair of (i,h) values is rⁱ. For example, here is what the array would look like with n=4:

	h=0	h=1	h=2	h=3
i=0	r⁰	r⁰	r⁰	r⁰
i=1	r¹	r¹	r¹	r¹
i=2	r²	r²	r²	r²
i=3	r³	r³	r³	r³

The greyed-out values are included to complete the array, but are not actually part of the sum. If we go through the sum iteratively, we start at i=0, and take the sum of rⁱ from h=0 to h=-1. Since you can't count up from 0 to -1, there are no values to count in this row, which represents the fact that irⁱ = 0 when i=0.

Next, we go to i=1, and fill in the values r¹ for k=0 to k=0. The next row, when i=2, we go from h=0 to h=1. And so on.

We are currently taking the sum of each row and then adding those individual sums together. However, we could also start by taking the sum of each column, which would be equivalent to reversing the order of the two sums in our double series:

Note that the inner sum now goes from i=h+1 to i=n-1, which you can see in the columns of the array of terms above.

This is useful because each column of the array is a geometric series, meaning it will be easy to compute. The sum of each column is just the geometric series from i=0 to i=n-1. Then, to eliminate the greyed-out values from the sum, we subtract the geometric series from i=0 to i=h.

This is the value for our inner sum, so we plug that back into the outer sum:

We now have values for both halves of our original sum, so next we combine them to get the full value:

We still have one more step to go to calculate the full sum of the correlation matrix. Recall that when we started, we were working with a symmetrical correlation matrix, and because the matrix was symmetrical along the diameter, we set out to find the sum for only the upper half of the matrix. In order to get the sum of the full matrix, we have to double this value:

Finally, note that the long diagonal of the correlation matrix only occurs once in the matrix, so by doubling our initial sum, we are double-counting that diagonal. In order to correct for this, we need to subtract the sum of that diagonal, which is just n*1 (since each element in that diagonal equals 1):

This value is proportional to the sum of the covariance matrix, which is proportional to the variance of talent in the population over n days.

Next, we need to come up with a corresponding value to represent the variance of talent over a single day. To do this, we can rely on the fact that as long as talent never changes, the variance in talent over any number of days is the same as the variance in talent over a single day. Instead of comparing to the variance in talent over a single day, we can instead compare to the variance in talent over n days when talent is constant from day to day.

This allows us to construct a similar correlation matrix to represent the constant-talent scenario. Compared to the correlation matrix for changing talent, this is trivially simple: since talent levels are the same throughout the sample, the correlation between talent from one day to the next will always be one.

In other words, the correlation matrix will just be an n x n array of 1s. And the sum of an n x n array of 1s is just n^2.

The ratio of these two values will give us the ratio of talent variance after n days of talent changes to the talent variance when talent is constant:

And that is our formula for finding the ratio of variance in true talent over n days to the variance in true talent on a single day, given the value r for the correlation of true talent from one day to the next. With some simplification, the above formula is equivalent to what was posted in the THT article:

3-D Baseball

Weighting Calculator for Past Data

Calculate Weighting for Past Data

decay factor:

regression constant:

Math Behind Weighting Past Results (THT Article)

Math Behind Regression with Changing Talent Levels (THT Article)

Javier Vazquez K-Watch

Links

Retrosheet Credit

Lahman Credit

Contributors

Blog Archive