Bayes, Regression, and the Red Sox (but mostly Bayes)

WARNING: Math heavy post.

Let's say that we have a hypothetical Major League team that has started the season 2-10. Say we want to see how likely it is that a team that goes 2-10 is still a .500 team. If you've taken any statistics courses, you may be familiar with hypothesis testing where you formulate a null hypothesis that the team is a .500 team, and then calculate the probability that they would go 2-10 by chance. You would then see that a .500 team will go 2-10 (or worse) over a 12 game span less than 2% of the time, and you'd probably reject your null hypothesis, depending on how high you set your significance level.
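That tail probability is easy to check yourself; here is a short Python sketch (the variable names are mine, not anything from the original analysis):

```python
from math import comb

# P(a true .500 team goes 2-10 or worse over 12 games): sum the binomial
# probabilities of winning 0, 1, or 2 of 12 games at p = .5
p_value = sum(comb(12, k) * 0.5**12 for k in range(3))
print(round(p_value, 4))  # 0.0193, i.e. just under 2%
```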

This does not, however, tell you anything about the likelihood that the team actually is a .500 team, only the likelihood that they would perform that badly if they were in fact a .500 team. To address the former issue, we can instead use a Bayesian approach.

To do that, we have to make an assumption about the population the team is drawn from (in other words, about the spread of talent across all teams in the league). Most teams in MLB fall somewhere in the .400-.600 range (at least in true talent, if not in observed win percentage), so for convenience's sake, let's just assume MLB teams follow a normal distribution of talent with a mean of .500 and a standard deviation of .050 W%.

This is our prior distribution. Once we have that, we can apply Bayes' Theorem:

P(A|B) = P(B|A)*P(A)/P(B)

We want to find the probability of A given B [P(A|B)], which is to say that we want to find the probability that a team drawn from our prior population distribution (u=.500, SD=.050) is actually a .500 (or better) team, given that we have observed that team to go 2-10 over 12 games. To do this, we need to find the probabilities P(B|A), P(A), and P(B).

P(A) is the simplest. This is the probability that a random team drawn from our prior distribution is at least a .500 team, ignoring what we observed over the 12 game sample. This is just .500, because the population is centered on .500. So, P(A)=.5.

*NOTE: if you want to use a W% other than .500, all you have to do is find the probability of drawing a W% at least that high from the prior distribution, which is trivial when the prior is normal since normal probabilities are a standard computer function. If you don't have a program that can calculate normal distribution probabilities, you can use an online tool.
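If you'd rather script it than use an online tool, the standard library's error function is enough; a sketch (the .550 threshold is just an illustrative value):

```python
from math import erf, sqrt

def prior_prob_at_least(w_pct, mean=0.500, sd=0.050):
    """P(true W% >= w_pct) under the normal prior -- this is the P(A) term."""
    z = (w_pct - mean) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

print(prior_prob_at_least(0.500))           # 0.5 by symmetry
print(round(prior_prob_at_least(0.550), 3))  # 0.159 (one SD above the mean)
```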

P(B) is a bit more difficult. This is the probability that we observe a random team drawn from our prior distribution to go 2-10 over 12 games. This would be simple to calculate for any one team if we already know its true W%, but we need to know the average probability for all possible teams we could draw from our prior distribution. This will require a bit of calculus.

For any one team with a known true W% (p), the probability of going 2-10 is:

( 12! / (2! * 10!) ) * p^2 * (1-p)^10

where 12 represents the number of total games, 2 the number of wins, and 10 the number of losses. We need to find the average value of that formula across the entire prior distribution.
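For a known true W%, that binomial formula is a one-liner; a sketch (function name is mine):

```python
from math import comb

def prob_of_record(wins, losses, p):
    """Probability that a team with true W% p posts exactly this W-L record."""
    return comb(wins + losses, wins) * p**wins * (1 - p)**losses

print(round(prob_of_record(2, 10, 0.500), 4))  # 0.0161 for a true .500 team
print(round(prob_of_record(2, 10, 0.400), 4))  # 0.0639 for a true .400 team
```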

To do this, we utilize the same principle as a weighted average. In this case, the weight given to each possible value of p is represented by the probability density function of the prior distribution. So, for example, the odds of a .400 team going 2-10 are about 6.4%, and that 6.4% gets weighted by the quantity f(p), where f(p) is the pdf for our prior normal distribution. We repeat this for each possible value of p, add up the weighted terms, and then divide by the sum of the weights. As you have probably guessed, this means taking the definite integral (from p=0 to p=1) of the probabilities weighted by f(p):

INTEGRAL(f(p) * ( ( 12! / (2! * 10!) ) * p^2 * (1-p)^10 ) dp)

f(p) is just the normal distribution pdf, which is:

f(p) = e^((-((p-u)^2))/(2*VAR))/(sqrt(2*Pi*VAR)) ; u=.5, VAR=.05^2=.0025

As fun as it sounds to go ahead and do that by hand, we'll just let computers do this part for us too, using an online definite integral calculator. You can copy and paste this into the function field if you want to try it for yourself:

(12!/(2!*10!)) * exp((-((x-.5)^2))/(2*.0025)) * (x^2*(1-x)^10) / (sqrt(2*Pi*.0025))
or, more generally, if you want to play around with different records or different means and variances for the prior normal distribution:

((W+L)!/(W!*L!)) * exp((-((x-u)^2))/(2*VAR)) * (x^W*(1-x)^L) / (sqrt(2*Pi*VAR))
Integrating that from 0 to 1, we get a total value of .020. Remember that we still have to divide by the total sum of the weights to find the weighted average, but that sum is just the integral of f(p) from 0 to 1, which is essentially equal to one (the normal prior has virtually all of its mass between 0 and 1). Therefore, P(B) = .02.
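If you'd rather not rely on the online calculator, the same number can be reproduced in a few lines of Python; a sketch, with a simple trapezoid rule standing in for the calculator's integration (function names are mine):

```python
from math import comb, exp, pi, sqrt

MEAN, SD = 0.500, 0.050  # the assumed prior on true talent

def integrand(p, wins=2, losses=10):
    """Prior density at p times the probability of the observed record."""
    prior = exp(-(p - MEAN) ** 2 / (2 * SD * SD)) / (SD * sqrt(2 * pi))
    return prior * comb(wins + losses, wins) * p**wins * (1 - p)**losses

def integrate(f, a, b, n=20_000):
    """Trapezoid rule; plenty accurate for this smooth integrand."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

p_b = integrate(integrand, 0.0, 1.0)
print(round(p_b, 4))  # 0.0204
```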

Finally, we have to calculate P(B|A). This is the probability of observing a 2-10 record, given that we are drawing a random team from the prior distribution that fulfills condition A, which is that the team has at least a .500 true W%. This is done very similarly to finding P(B) above, except we are only considering values of p>.5.

Start by calculating the same definite integral as before, but from .5 to 1 instead of from 0 to 1 (this is done by simply changing a textbox below the formula). This gives a value of .0045. That is the weighted sum of all the probabilities; to turn the weighted sum into an average, we still have to divide by the sum of all the weights, which in this case is .5 (this is the cdf F(p) from .5 to 1). Dividing .0045 by .5, we get .009. This is P(B|A).

P(B) = .02
P(B|A) = .009

P(A|B) = .009*.5/.02 = .22
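Putting the three pieces together, here is a self-contained Python sketch of the whole Bayes' Theorem calculation (again using a simple trapezoid rule in place of the online calculator):

```python
from math import comb, exp, pi, sqrt

MEAN, SD = 0.500, 0.050  # the assumed prior on true talent

def integrand(p, wins=2, losses=10):
    # prior density at p times the probability of the observed record
    prior = exp(-(p - MEAN) ** 2 / (2 * SD * SD)) / (SD * sqrt(2 * pi))
    return prior * comb(wins + losses, wins) * p**wins * (1 - p)**losses

def integrate(f, a, b, n=20_000):
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

p_a = 0.5                                            # P(true W% >= .500) under the prior
p_b = integrate(integrand, 0.0, 1.0)                 # ~ .02
p_b_given_a = integrate(integrand, 0.5, 1.0) / p_a   # ~ .009
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.22
```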

The probability of a team that goes 2-10 being at least a true talent .500 team is about 22% (assuming our prior distribution is fairly accurate). As you can see, this is pretty far from what one might conclude from using the hypothesis test to reject the null hypothesis that the team is a .500 team. This is why it is important not to misconstrue the meaning of the hypothesis test. The hypothesis test only tells you a very specific thing, which is how likely or unlikely the observed result is if you assume the null hypothesis to be true. Rejecting the null hypothesis on this basis does not necessarily mean the null hypothesis is unlikely; that depends on the prior distribution of possible hypotheses that exist. Considering potential prior distributions allows us to make more relevant estimates and conclusions about the likelihood of the null hypothesis.

Another advantage of the Bayesian approach is that it gives us a full posterior distribution of possible true talent levels. For example, when we observe a 2-10 team, we can estimate not only the odds that it is at least a true .500 team, but also the odds that it is at least a true .550 team, or .600 team, or whatever. And since we have a full distribution, we can also figure out the expected value of the team's true W%.

The posterior distribution of possible true talent W% for a team that is observed to go 2-10 is represented by the product we integrated earlier:

((W+L)!/(W!*L!)) * exp((-((x-u)^2))/(2*VAR)) * (x^W*(1-x)^L) / (sqrt(2*Pi*VAR))
We find the expected value of that function the same way we found the average value of the above probabilities. This time, we want to find the average value of x (or p, as we were calling it before), so we weight each value of x by the above function, and then divide by the sum of the weights. For the numerator, this means integrating the above function multiplied by x:

x * (((W+L)!/(W!*L!))*exp((-((x-u)^2))/(2*VAR))*(x^W*(1-x)^L)/(sqrt(2*Pi*VAR)))

The denominator is just the integral of the function not multiplied by x, which we already computed above (it is the same as P(B), or .020).

Plugging this into the definite integral calculator, we get:

0.009442586/0.020355243 = 0.464

So a team that goes 2-10 over 12 games will be, on average, about a .464 team (again, assuming our prior distribution is accurate).
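The expected value falls out of the same numerical machinery; a self-contained sketch (trapezoid rule again standing in for the calculator):

```python
from math import comb, exp, pi, sqrt

MEAN, SD = 0.500, 0.050  # the assumed prior on true talent

def integrand(p, wins=2, losses=10):
    # prior density at p times the probability of the observed record
    prior = exp(-(p - MEAN) ** 2 / (2 * SD * SD)) / (SD * sqrt(2 * pi))
    return prior * comb(wins + losses, wins) * p**wins * (1 - p)**losses

def integrate(f, a, b, n=20_000):
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

numerator = integrate(lambda p: p * integrand(p), 0.0, 1.0)
denominator = integrate(integrand, 0.0, 1.0)
expected_w_pct = numerator / denominator
print(round(expected_w_pct, 3))  # 0.464
```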

As a shortcut for the above calculations, one can also use regression to the mean to estimate the expected value of the team's true W%. This is done by finding the point where random variance in the observed record equals the variance of talent in the population. We know the standard deviation of talent in the population, using our assumed prior, is .050 wins per game. The random variance in the observed win total is n*p*(1-p), where n is the number of games and p is the team's true W%. Since p mostly stays around .5, this is approximately equal to n*.25 (it's actually about n*.2475 on average across the population, but the difference is minimal). When you have 100 observed games, the random variance will be 100*.25 = 25. The variance of talent over those 100 games will be (100*.05)^2 = 25. Therefore, after 100 games, the variance due to random variation will equal the variance of talent in the population.

At this point, we would regress exactly 50% to the mean to estimate the expected value of a team's true W%. This is equivalent to adding 100 games of league average performance to the team's observed record, and that works for any number of observed games. To estimate our 2-10 team's true W% with regression to the mean, we would add 50 wins and 50 losses to the observed record and get an expected W% of 52/112 = .464. How well regression to the mean approximates the Bayesian estimate depends on the prior distribution you choose, but in this case it works very well (the two estimates agree to three decimal places).
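The shortcut itself is one line of arithmetic; a sketch (function name and defaults are mine):

```python
def regressed_w_pct(wins, losses, regression_games=100, league_mean=0.500):
    """Regression to the mean: add `regression_games` of league-average ball
    to the observed record, then take the combined W%."""
    return (wins + regression_games * league_mean) / (wins + losses + regression_games)

print(round(regressed_w_pct(2, 10), 3))  # 52/112 -> 0.464
```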

This is all done assuming a prior that presumes nothing about the team in question other than that it comes from a distribution of teams roughly like the one we observe in MLB. What if, in addition to knowing the 2-10 team comes from MLB, we also know that the team employs several really good players and that we believe them to be one of the most talented teams in the league? We can adjust the prior for that information as well. Our prior distribution, after accounting for the amount of talent we believe to be on the team, might have a mean of .590 instead of .500. Let's say that we assume a normal distribution with a mean of .590 and an SD of .030 for our prior. Now we get an expected W%, after observing the team go 2-10, of .572. In this case, the abysmal 12 game observed sample dropped our expectations for the team going forward from .590 to .572. Of course, this all depends on what prior you assume, but as long as you make reasonable assumptions, your estimate should give you a decent idea.
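The same numerical machinery handles the talent-adjusted prior; a sketch, with the mean and SD taken from the assumptions above:

```python
from math import comb, exp, pi, sqrt

def integrand(p, wins, losses, mean, sd):
    # prior density at p times the probability of the observed record
    prior = exp(-(p - mean) ** 2 / (2 * sd * sd)) / (sd * sqrt(2 * pi))
    return prior * comb(wins + losses, wins) * p**wins * (1 - p)**losses

def integrate(f, a, b, n=20_000):
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

# talent-adjusted prior: mean .590, SD .030
args = dict(wins=2, losses=10, mean=0.590, sd=0.030)
num = integrate(lambda p: p * integrand(p, **args), 0.0, 1.0)
den = integrate(lambda p: integrand(p, **args), 0.0, 1.0)
print(round(num / den, 3))  # 0.572
```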


Willy Hu said...

Nicely done!

DSMok1 said...

Indeed, excellent work!

I believe the general form for calculating the posterior distribution from 2 standard distributions (when we're not necessarily using a binomial distribution for the variance) is:

Posterior = (A/stdevA^2 + B/stdevB^2)/(1/stdevA^2 + 1/stdevB^2)

Where A is the mean of distribution A and B is the mean of distribution B.

And the posterior standard deviation is then

PostStdev = 1/sqrt(1/stdevA^2 + 1/stdevB^2)

I derived all of this from the calculus when I was working up my various Bayesian NBA rating systems.
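[For readers who want to try those formulas: they translate directly into code. A quick sketch; the example numbers are purely illustrative.]

```python
from math import sqrt

def combine_normals(mean_a, sd_a, mean_b, sd_b):
    """Precision-weighted combination of two normal estimates."""
    w_a, w_b = 1 / sd_a**2, 1 / sd_b**2
    post_mean = (mean_a * w_a + mean_b * w_b) / (w_a + w_b)
    post_sd = 1 / sqrt(w_a + w_b)
    return post_mean, post_sd

# e.g. a .500 prior (SD .050) combined with a .400 estimate of equal
# reliability lands exactly halfway between the two means
mean, sd = combine_normals(0.500, 0.050, 0.400, 0.050)
print(round(mean, 3), round(sd, 4))  # 0.45 0.0354
```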

Kincaid said...

I made an Excel macro to make calculating Bayesian W% estimates for a group of teams easier. You'll need an Excel spreadsheet with each team's W and L totals, and then paste the linked code into Excel's VBA environment. Read the comments embedded in the code to see what values you might have to alter.

Unknown said...

Outstanding article. I plan to use many of these ideas in a football article, and would love to give you as much credit as possible. Is there a good e-mail I could reach you at?
