Win Expectancy and Leverage Index tables, R Code

This post is just a quick dump of some code you can use to create win-expectancy and leverage index tables like what I used for my recent Baseball PreGUESTus article. It is written for the free statistical program R, and it builds upon the excellent work on run-expectancy and run distribution tables done by Sobchak at ChancesIs.com.

In order to run this code, you will need R with the package plyr installed. You will also need the file bo_transitions.csv from ChancesIs (either the CSV file hosted on that site, or one created using a similar query to the one Sobchak published) and the file game_state_frequency.csv, which you can copy from this table. Sobchak's data and the game_state_frequency table are from the years 1993-2010. You can collect the data for other years by altering Sobchak's SQL query and this game_state_frequency query.

*note-you only need game_state_frequency.csv for calculating LI. You don't need it if all you want is a WE table.


Once you have those files on your computer, you can construct a win-expectancy table with the following R code:

Win Expectancy Table, R code

You will have to change the line
setwd("/Users/Seshoumaru/Desktop/untitled folder/baseball/run-win expectancy")

to the folder path where you saved the necessary CSV files.

The win expectancy values are generated based on Sobchak's simulated run distributions. It is currently set to run 100,000 simulated innings from each state to estimate the distributions. You can raise the number of simulations to increase the precision, but it will take longer to process. On my computer, 100,000 simulations took about 4 minutes to run. 1,000,000 simulations took about an hour. The win expectancies themselves are not simulated, however.

The code limits run scoring to 16 runs for the remainder of the inning you are in, plus 16 runs total for the rest of the game. This is done to greatly reduce processing time. The generated tables cover scores from the home team being down 16 to up 16 (all score differentials are from the perspective of the home team.

The above code assumes equal run distributions for both teams. With a few changes, you can alter the code to include home-field advantage by using separate distributions for the home and away teams. To do this, you will need to alter Sobchak's query to create additional bo_transition files for just the home team and just the away team (called bo_transitions_home.csv and bo_transitions_away.csv). Once you have added those files, you can run the following code:


Win Expectancy Table, HFA version, R code



Continue Reading...