On Miguel Cabrera, Value, and the Triple Crown

“In ’67, the triple crown was never even mentioned once.  We were so involved in the pennant race, I didn’t know I won the triple crown until the next day, when I read it in the paper.”
-Carl Yastrzemski to the Boston Herald, published September 26, 2012

“Is it too early to say that [Cabrera] has a legitimate shot at a Triple Crown this season hitting in front of Fielder? I don't think so.”
-Fox News sports article, published April 13, 2012

The Triple Crown has grown in stature over the years.  That’s not to say it wasn’t a big deal before, but reporters now are asking Carl Yastrzemski about someone else winning it faster than they ever asked him about winning it himself.  In 1942, when Ted Williams won it, no one even had a list of previous winners compiled.  An AP reporter had to research it for his story on Williams’ feat, and he still missed the most recent occurrence (Joe Medwick, whose Triple Crown just five years earlier escaped detection).

Back then, it was a cool thing.  It wasn’t necessarily the historic thing it’s become.  It didn’t yet carry the mythical ethos of the pantheon-dwellers -- Williams, Mantle, Yaz, Frank Robinson, etc. -- who could once do what for so long escaped their modern counterparts.  When someone won it, it didn’t carry the weight of a whole generation of fans who grew up hearing about it and never seeing it.  It was just a cool thing.

I can see getting excited about it.  It’s an impressive feat.  It’s something we’ve waited a long time for.  It's something only a handful of the greats have ever done.

And yet, I have a hard time getting excited.  It was a great season, sure.  A wonderful season at the plate.  But the best season I’ve ever seen?  Not close.  Which means I’ve seen a lot of non-Triple-Crown seasons that were better, because this is the first Triple Crown of my lifetime.  You don’t even have to look that hard to find a better season.  There’s another one right in front of our noses.

I’m talking, of course, about Miguel Cabrera’s 2011 season.

I know that seems, at least on the surface, like a bit of a contrarian statement.  How could he have been better when he hit 14 fewer home runs and drove in 34 fewer runs and didn’t, I don’t know, win the first Triple Crown in four and a half decades?  I don’t mean it as a contrarian viewpoint, though.  I just think Cabrera hit better in 2011 than in 2012.

Let me explain myself.  First, we need to establish what we mean by “better”.

I grew up with a fairly traditional baseball upbringing.  I was the son of a catcher who was the son of a catcher, saved only from the tools of ignorance myself by a bad case of sinistrality (a condition my dad only fully forgave me for when my younger sister took up softball and inherited his old gear).  I learned the game from proud field generals who would rather hold their ground to a hard-charging runner than hit a home run, even if they dropped the ball in the process.

That’s not a bad way to learn the game.  It was a great way to learn it.  But part of that upbringing was growing up thinking that Rickey Henderson was Lou Brock-Lite, and that Ted Sizemore was the ideal #2 hitter, and that Tony Gwynn was the best hitter in the game.  Part of that was drafting Ozzie Smith for my first fantasy league in a three-team-deep league.

It’s not that those things are necessarily wrong.  I don’t remember or care what happened in that fantasy league, other than that I remember drafting my favourite player.  I don’t remember or care how many runs the Padres scored with Tony Gwynn anchoring their lineup, or how many games they won.  I remember that watching Tony Gwynn was unlike watching anyone else in baseball, because you felt like you knew you were going to see something happen.  He was going to put the ball in play, and the defense was going to scramble to field it.  When Tony won, it felt like he won because he could almost place the ball at the spot where it landed.  When the defense won, it felt like they got away with one.  It was exciting to someone who learned the game the way I did.

As far as baseball is a game of entertainment, maybe Tony Gwynn was the best hitter in the game.  Arguing for Tony Gwynn over Frank Thomas, or Barry Bonds, or Fred McGriff, or a handful of other guys as a hitter, though, isn’t really an argument of value or production.  It’s an argument of what “best” means to begin with.  He was better at some things, yeah.  Maybe better at the things that are most important to you.  At some point, though, it started to hit me that, whatever abstract ideals I might hold about what a hitter should be, the very concrete objective of all hitters is the same.  They hit as best they can to win games, and they do so by helping to score runs.

That’s something that’s hard to measure when your statistical upbringing comes mostly from Topps and Donruss.  How many runs is Gwynn’s AVG worth?  How many runs are Thomas’ walks and extra base hits worth?  I don’t know.  It doesn’t say on the back of the card.  We all know when we watch a game that getting on base is important, that making outs is bad, and that getting to second or third is better than getting to first.  How much better?  I don’t know.  And so the argument becomes about what best actually means, because the units of measurement are not helpful.


Clutch, WPA/LI, and the Home Run Bias

The stat Clutch, as published on FanGraphs and Baseball-Reference, is designed to quantify how much better or worse a hitter has produced in situations based on how critical those situations are in the immediate context of a game.  Players who perform better in more critical situations (for example, late in a close game) than they normally do will have a positive Clutch rating, and players who perform worse in such situations will have a negative Clutch.  It does this by comparing two values for a hitter:  his WPA and his WPA.LI.  I will assume you are familiar enough with these two stats (not necessarily their inner workings, but at least what they are) as a prerequisite for this piece; if not, you can catch up on B-R's or FanGraphs' explanation pages.

WPA.LI follows two key constraints.  The first is that, for a given game state (i.e. the inning, the score, the number of outs, and the placement of any runners on base), the relative value of a play is determined by how much that play affects the team's chances of winning.  If the bases are empty, a walk is credited the same as a single.  If the bases are loaded with the winning run on third, a walk is credited the same as a home run.  This constraint works exactly like WPA (as one might expect from a WPA-based metric).

The second constraint differentiates WPA.LI from WPA.  One of the properties of WPA is that some situations are inherently weighted more strongly than others.  A key at bat late in a close game can swing a team's chances of winning by several times as much as the same result in a blowout, and it is credited accordingly.  WPA.LI, on the other hand, ensures that the average play in every situation gets the same weight.

So, on the one hand, you have WPA, which weights PAs according to their immediate impact on the game.  One clutch PA might be worth as much as 4 or 5 normal PAs, and one mop-up PA might be worth practically nothing.  On the other hand, you have WPA.LI, which weights every PA equally, just like most other stats do.  Basically, it is linear weights, but with the ability to tailor the value of each event to the specific situation rather than sticking to a blanket value for each event across all situations.  While WPA tells the story of clutch hitting (who got the big hit when the team most needed production), WPA.LI tells the story of situational hitting (who got on base when the team needed baserunners, put the ball in play when the strikeout was most costly, or hit for power when advancing runners quickly was more important than getting another guy on first).
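
To make the difference in weighting concrete, here is a minimal sketch in R.  The play log is made up for illustration (the column names wpa and li are my own), and the Clutch line follows the definition FanGraphs publishes (WPA divided by the hitter's average leverage, minus WPA.LI); treat it as a toy, not as either site's actual implementation.

```r
# Hypothetical play log for one hitter: one row per plate appearance.
# wpa = change in win expectancy credited to the hitter on that play
# li  = leverage index of the situation when the play began
plays <- data.frame(
  wpa = c(0.020, -0.010, 0.150, -0.004, 0.030),
  li  = c(1.0,    0.5,   3.0,    0.2,   1.5)
)

# WPA: every play counts at face value, so high-leverage PAs dominate.
wpa_total <- sum(plays$wpa)

# WPA.LI: divide each play by its leverage first, so the average play
# in every situation carries the same weight.
wpa_li_total <- sum(plays$wpa / plays$li)

# Clutch, per FanGraphs' published definition: WPA scaled back by the
# hitter's average leverage (pLI), minus his WPA.LI.
p_li   <- mean(plays$li)
clutch <- wpa_total / p_li - wpa_li_total

print(c(WPA = wpa_total, WPA.LI = wpa_li_total, pLI = p_li, Clutch = clutch))
```

In this toy log the one big hit at an LI of 3.0 accounts for most of the WPA, but only a modest share of the WPA.LI, which is why the hitter comes out with a positive Clutch.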

There is a third important constraint which WPA.LI does not adhere to, however.  Ideally, the average value of each event would match its linear weights value.  If a home run is worth 1.4 runs above average across all situations, then you would like the average WPA.LI value of a HR to be 1.4 runs (or rather, the equivalent value on the wins scale).  That is not the case, however.

The following linear weights values represent the average change in run and win expectancy for that event across all situations, along with the average WPA.LI value of each event.  All three versions have been placed on the runs scale by setting the value of the out at -.27 in order to make them easier to compare directly:

Event   Run exp.   Win exp.   WPA.LI
1B        0.47       0.47      0.44
2B        0.77       0.75      0.75
3B        1.05       1.06      1.04
HR        1.41       1.42      1.58
BB        0.31       0.30      0.31
K        -0.29      -0.30     -0.29
out      -0.27      -0.27     -0.27

As you can see, WPA.LI does fine at assigning the correct value to most events, but the value of the HR is way off.  This may seem counterintuitive; if WPA.LI just creates custom linear weights for each situation based on the WPA values, why would the average WPA.LI value be different from the average WPA value?  We can look at the mathematical relationship between WPA and WPA.LI to see why this is.
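
For a quick sense of scale, the sketch below loads the table into R and measures how far each WPA.LI value sits from the run-expectancy value.  The column names are my own, and I am assuming the three columns run in the order the text lists them (run expectancy, win expectancy, WPA.LI).

```r
# Values copied from the table above; column order is assumed to be
# run expectancy, win expectancy, WPA.LI.
weights <- data.frame(
  event  = c("1B", "2B", "3B", "HR", "BB", "K", "out"),
  run    = c(0.47, 0.77, 1.05, 1.41, 0.31, -0.29, -0.27),
  win    = c(0.47, 0.75, 1.06, 1.42, 0.30, -0.30, -0.27),
  wpa_li = c(0.44, 0.75, 1.04, 1.58, 0.31, -0.29, -0.27)
)

# Gap between the WPA.LI value and the run-expectancy value, in runs.
weights$diff_vs_run <- round(weights$wpa_li - weights$run, 2)

# Sort so the event with the largest discrepancy comes first.
ranked <- weights[order(-abs(weights$diff_vs_run)),
                  c("event", "run", "wpa_li", "diff_vs_run")]
print(ranked)
```

The home run lands at the top of the list by a wide margin, with a gap several times larger than that of any other event.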

The Pujols Decision: One Fan's Reflections

Stan Musial is the man in St. Louis.  Nearly 50 years after Musial last played for the Cardinals, he remains the undisputed king of Cardinal baseball.  His statue alone stands tall outside the main entrance to Busch Stadium, a few hundred feet south of the plaza where all the lesser (albeit much more attractive) statues of other Cardinal greats sit.  For decades, no one in St. Louis thought they would ever see a player rival Musial.

And then Albert came along.  Just one year and one Bobby Bonilla injury removed from his professional debut as a 13th round draft pick, Pujols was in the starting lineup and lighting up the National League.  He hit for average.  He hit for power.  He got on base.  He eventually learned to play a very good first base.  For the first time, St. Louis fans saw a player and thought, "this could be the guy who tops Musial."

The accolades came.  The MVPs (three of them, same as Musial), the All Star appearances, the Silver Sluggers and Gold Gloves, the home runs and hits and RBIs; all of them flocked to Pujols' Baseball-Reference page like moths to Matt Holliday's ear.

The wins followed.  Led by Pujols' success, the team made the playoffs 7 out of 11 seasons, winning 3 pennants and 2 World Series along the way.  From 2001-2011, only the high-spending Yankees and Red Sox won more games than did Pujols' Cardinals.  Pujols was the best player in the game, a superstar of an order the franchise had not seen in decades.  Fans watched in awe and wondered how high his career would stack by the time it ended.

Pujols was, over his 11 years with St. Louis, remarkably similar to Musial when Musial was at his best.  Compare Pujols’ career in St. Louis to Musial’s best 11-season stretch (1943-54, not counting 1945, which he missed for military service):

                    PA     H   RBI   HR   BB     R
Musial (1943-54)  7564  2251  1174  281  990  1301
Pujols (2001-11)  7433  2073  1329  445  975  1291

Musial (1943-54) .346 .434 .591 171 88 98
Pujols (2001-11) .328 .420 .617 167 84 88

In both traditional counting totals and more sabermetric evaluations, the two come up as near equals.  Musial got on base a bit better (in an environment where hitters got on base more than they do today) while Pujols hit for more power (in an environment where hitters hit for more power than they did in Musial’s day).  The two were comparable fielders, good for their position, but at the weak end of the fielding spectrum.

Musial rates slightly better in both Baseball-Reference’s and FanGraphs’ implementations of WAR, but they are close enough that which one you would pick will largely depend on how you approach the different eras (i.e. how you want to adjust for things like integration, expansion, population growth, international development, improved scouting, the war years, etc.).  They’re close enough that it would be reasonable to take the position that no Cardinal fan has ever seen one of their own play at a higher level than Pujols has over his 11 years with the team, not even Musial.  It’s not a slam-dunk position; maybe you still take Musial.  But, for the first time since Musial retired, you’d probably at least have to think about it.

Watching Pujols play ignited Cardinal fans like watching Musial did, and we loved every minute of it.  Naturally, we wanted that to continue.  We wanted another all-time great to stay a career Cardinal.  Then, out of nowhere, the report swept in from the winter meetings that Pujols had signed with the Angels.  No build up, nothing.  No one had even talked about the Angels in the weeks of negotiating that led off the offseason.  Just like that, he was gone.


Win Expectancy and Leverage Index tables, R Code

This post is just a quick dump of some code you can use to create win-expectancy and leverage index tables like what I used for my recent Baseball ProGUESTus article. It is written for the free statistical program R, and it builds upon the excellent work on run-expectancy and run distribution tables done by Sobchak at ChancesIs.com.

In order to run this code, you will need R with the package plyr installed. You will also need the file bo_transitions.csv from ChancesIs (either the CSV file hosted on that site, or one created using a similar query to the one Sobchak published) and the file game_state_frequency.csv, which you can copy from this table. Sobchak's data and the game_state_frequency table are from the years 1993-2010. You can collect the data for other years by altering Sobchak's SQL query and this game_state_frequency query.

*Note: you only need game_state_frequency.csv for calculating LI. You don't need it if all you want is a WE table.
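
As a rough illustration of why the frequency table only matters for LI, here is a minimal sketch using the usual definition of leverage index: the average swing in win expectancy from a given state, divided by the frequency-weighted average swing across all states. The three states and their numbers are invented for the example, and this is not necessarily how the linked code arranges the calculation.

```r
# Hypothetical inputs: three game states, the average absolute change in
# win expectancy for a plate appearance starting in each state, and how
# often each state occurs (the role game_state_frequency.csv plays).
states <- data.frame(
  state     = c("blowout", "typical", "late_and_close"),
  avg_swing = c(0.005, 0.030, 0.100),
  frequency = c(300, 600, 100)
)

# Baseline: the average swing over all states, weighted by how often
# each state actually comes up.
baseline <- weighted.mean(states$avg_swing, states$frequency)

# LI is each state's swing relative to that baseline, so the
# frequency-weighted average LI works out to exactly 1.
states$li <- states$avg_swing / baseline

print(states)
```

A WE table never needs the baseline, which is why the frequency file can be skipped if win expectancy is all you are after.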

Once you have those files on your computer, you can construct a win-expectancy table with the following R code:

Win Expectancy Table, R code

You will have to change the line
setwd("/Users/Seshoumaru/Desktop/untitled folder/baseball/run-win expectancy")

to the folder path where you saved the necessary CSV files.

The win expectancy values are generated based on Sobchak's simulated run distributions. It is currently set to run 100,000 simulated innings from each state to estimate the distributions. You can raise the number of simulations to increase the precision, but it will take longer to process. On my computer, 100,000 simulations took about 4 minutes to run. 1,000,000 simulations took about an hour. The win expectancies themselves are not simulated, however.

The code limits run scoring to 16 runs for the remainder of the inning you are in, plus 16 runs total for the rest of the game. This is done to greatly reduce processing time. The generated tables cover scores from the home team being down 16 to up 16 (all score differentials are from the perspective of the home team).
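
To show the general idea of turning run distributions into exact win expectancies, here is a simplified sketch. It is not the linked code: the Poisson distribution is only a placeholder for a simulated run distribution, the function name win_expectancy is my own, both teams are assumed to have the same number of innings left, and a tie at the end of regulation is treated as a coin flip.

```r
# Placeholder distribution of runs scored over the rest of the game,
# capped at 16 runs as described above. Real use would substitute a
# distribution estimated from the simulated innings.
max_runs <- 16
dist <- dpois(0:max_runs, lambda = 2.5)
dist <- dist / sum(dist)   # renormalise after truncating at 16 runs

# Exact home-team win expectancy given the current score differential
# (home minus away) and each team's remaining-run distribution.
win_expectancy <- function(score_diff, home_dist, away_dist = home_dist,
                           tie_value = 0.5) {
  home_runs <- seq_along(home_dist) - 1
  away_runs <- seq_along(away_dist) - 1
  joint  <- outer(home_dist, away_dist)                  # P(home = i, away = j)
  margin <- outer(home_runs, away_runs, "-") + score_diff
  sum(joint[margin > 0]) + tie_value * sum(joint[margin == 0])
}

# One row of a WE table: home team down 16 through up 16.
we_table <- sapply(-16:16, win_expectancy, home_dist = dist)
names(we_table) <- -16:16
print(round(we_table[c("-2", "-1", "0", "1", "2")], 3))
```

Because the probabilities are summed directly rather than sampled, nothing in this step is simulated; only the input distribution is. Passing different vectors for home_dist and away_dist is one way to let the two teams score at different rates.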

The above code assumes equal run distributions for both teams. With a few changes, you can alter the code to include home-field advantage by using separate distributions for the home and away teams. To do this, you will need to alter Sobchak's query to create additional bo_transition files for just the home team and just the away team (called bo_transitions_home.csv and bo_transitions_away.csv). Once you have added those files, you can run the following code:

Win Expectancy Table, HFA version, R code
