
Simulation, CHONE, and the Cardinals

November 18, 2009

At the end of my last post I cautioned that the CHONE projections (and this applies to ALL projections) were a point solution (albeit the best guess at a point solution) and had to be thought of in concert with some error bars.  In an attempt to get my arms around the ramifications of that point, I created a quick little Monte Carlo simulation in Excel/VBA to produce some distributions for offensive runs above average.  I’ll quickly outline the basic methodology used, including my input set, and then I’ll present some initial results.

Since I knew I was going to be using CHONE projections, I went back and collected the archived projections from 2009.  I did a quick comparison between the projected wOBAs and the actual wOBAs from this year for various levels of prior experience, to get insight into what the standard deviation should be for the distributions that feed the simulation.  The generalized results are in the table below.

| Experience | SD |
| --- | --- |
| None/Low | 0.038 |
| Med | 0.030 |
| A Lot | 0.025 |
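As a rough sketch of how standard deviations like these could be derived, the snippet below groups projection errors (actual minus projected wOBA) by experience bucket and takes the sample standard deviation of each group. The rows are made-up placeholders, not the 2009 CHONE data, and the function name `error_sd_by_group` is hypothetical; the original work was done in Excel/VBA, so this is only an illustration of the idea.

```python
from collections import defaultdict
from statistics import stdev

# Placeholder rows: (projected wOBA, actual wOBA, experience bucket).
# These values are invented for illustration, NOT real 2009 data.
rows = [
    (0.330, 0.295, "None/Low"),
    (0.315, 0.352, "None/Low"),
    (0.340, 0.310, "None/Low"),
    (0.345, 0.320, "Med"),
    (0.325, 0.350, "Med"),
    (0.335, 0.341, "Med"),
    (0.380, 0.362, "A Lot"),
    (0.350, 0.371, "A Lot"),
    (0.360, 0.355, "A Lot"),
]

def error_sd_by_group(rows):
    """Return {bucket: sample SD of (actual - projected)} for each bucket."""
    errors = defaultdict(list)
    for projected, actual, bucket in rows:
        errors[bucket].append(actual - projected)
    return {bucket: stdev(errs) for bucket, errs in errors.items()}

sd_by_group = error_sd_by_group(rows)
for bucket, sd in sd_by_group.items():
    print(f"{bucket}: {sd:.3f}")
```

Each bucket needs at least two players for a sample standard deviation to be defined.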

I ran that information through the simulation in combination with this year’s CHONE projections and found what I thought to be a spread that was too wide (both on an aggregate team basis and at the individual level).  I have very little to base this on other than gut feel and combing back through the FanGraphs archives of team totals from seasons past.

To address the variance at the individual level (which in turn addressed the aggregate), I went from using a normal distribution to a truncated normal by placing upper and lower bounds on the simulated wOBA (I implemented this with a re-draw method rather than a rounding method).  The upper and lower bounds were influenced by reviewing the 2009 data and looking for caps that existed for various projected production levels (e.g. a projected 0.300 wOBA never produced beyond a certain actual wOBA).  That still did not give aggregate results that passed the “smell test”, so I cut the standard deviations in half.  This last move was rather arbitrary on my part, and I plan to do some robustness testing along with analyzing a larger data set than just 2009.  That being said, the simulation was “done” and is at a state where I didn’t mind putting results out for all to see.
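The re-draw approach to a truncated normal can be sketched in a few lines of Python. This is not the original Excel/VBA code; the function name `draw_truncated_woba` is hypothetical, and the sketch simply keeps drawing from a normal distribution until the value lands inside the bounds, rather than clamping out-of-range draws to the nearest bound.

```python
import random

def draw_truncated_woba(mean, sd, lower, upper, rng=random):
    """Draw a wOBA from Normal(mean, sd), re-drawing until it is in [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if sd == 0:
        # No spread at all: return the projection, kept inside the bounds.
        return min(max(mean, lower), upper)
    while True:
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x

# Example: a 0.386 projection with SD 0.013 and bounds 0.315 to 0.440.
random.seed(1)
draws = [draw_truncated_woba(0.386, 0.013, 0.315, 0.440) for _ in range(1000)]
print(min(draws), max(draws))
```

Re-drawing keeps the shape of the normal curve inside the bounds, whereas clamping would pile extra probability onto the two bounds themselves.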

For today I have three scenarios to display results from:

  1. Signing Matt Holliday and going with internal options at the rest of the positions
  2. Going entirely with internal options (basically Freese at 3rd and Craig in left)
  3. Same as 2, except adjusting the Freese and Craig projections downward, as they “feel” a bit high

After the jump I’ll have more input data and the results.

The following tables spell out the inputs exactly for all three cases.  First for 1.

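| Player | Projected wOBA | PA | wOBA SD | L (lower bound) | U (upper bound) |
| --- | --- | --- | --- | --- | --- |
| Pujols | 0.433 | 675 | 0.013 | 0.380 | 0.460 |
| Holliday | 0.386 | 675 | 0.013 | 0.315 | 0.440 |
| Ryan | 0.310 | 650 | 0.015 | 0.250 | 0.375 |
| Rasmus | 0.333 | 625 | 0.019 | 0.265 | 0.400 |
| Ludwick | 0.356 | 600 | 0.013 | 0.285 | 0.400 |
| Schumaker | 0.330 | 600 | 0.015 | 0.265 | 0.385 |
| Molina | 0.329 | 575 | 0.015 | 0.255 | 0.385 |
| Freese | 0.336 | 500 | 0.019 | 0.265 | 0.385 |
| Lugo | 0.311 | 300 | 0.013 | 0.250 | 0.375 |
| Pitchers | 0.200 | 300 | 0.000 | 0.190 | 0.210 |
| Craig | 0.345 | 275 | 0.019 | 0.285 | 0.400 |
| Mather | 0.333 | 125 | 0.019 | 0.265 | 0.400 |
| Larue | 0.273 | 115 | 0.013 | 0.220 | 0.360 |
| T Greene / Thurston | 0.313 | 75 | 0.019 | 0.250 | 0.375 |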
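To show how inputs like these can feed a simulation, here is a minimal Python sketch of one possible version of the scenario 1 run. It is not the original Excel/VBA workbook. Two things are assumptions: the conversion from wOBA to runs above average, taken here as (wOBA − league wOBA) / scale × PA with a league wOBA of 0.330 and a scale of 1.15, and the treatment of every player’s draw as independent. The output is therefore not expected to match the charts exactly.

```python
import random
from statistics import mean

# Assumptions, not taken from the post: league-average wOBA and the
# wOBA-to-runs scale used to convert wOBA into runs above average.
LEAGUE_WOBA = 0.330
WOBA_SCALE = 1.15

# (player, projected wOBA, PA, wOBA SD, lower bound, upper bound), scenario 1.
SCENARIO_1 = [
    ("Pujols", 0.433, 675, 0.013, 0.380, 0.460),
    ("Holliday", 0.386, 675, 0.013, 0.315, 0.440),
    ("Ryan", 0.310, 650, 0.015, 0.250, 0.375),
    ("Rasmus", 0.333, 625, 0.019, 0.265, 0.400),
    ("Ludwick", 0.356, 600, 0.013, 0.285, 0.400),
    ("Schumaker", 0.330, 600, 0.015, 0.265, 0.385),
    ("Molina", 0.329, 575, 0.015, 0.255, 0.385),
    ("Freese", 0.336, 500, 0.019, 0.265, 0.385),
    ("Lugo", 0.311, 300, 0.013, 0.250, 0.375),
    ("Pitchers", 0.200, 300, 0.000, 0.190, 0.210),
    ("Craig", 0.345, 275, 0.019, 0.285, 0.400),
    ("Mather", 0.333, 125, 0.019, 0.265, 0.400),
    ("Larue", 0.273, 115, 0.013, 0.220, 0.360),
    ("T Greene / Thurston", 0.313, 75, 0.019, 0.250, 0.375),
]

def draw_woba(proj, sd, lower, upper, rng):
    """Re-draw from Normal(proj, sd) until the value lands in [lower, upper]."""
    if sd == 0:
        return proj
    while True:
        x = rng.gauss(proj, sd)
        if lower <= x <= upper:
            return x

def simulate_team_runs(players, n_sims, seed=0):
    """Return one simulated team total of runs above average per trial."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        total = 0.0
        for _name, proj, pa, sd, lower, upper in players:
            woba = draw_woba(proj, sd, lower, upper, rng)
            total += (woba - LEAGUE_WOBA) / WOBA_SCALE * pa
        totals.append(total)
    return totals

totals = simulate_team_runs(SCENARIO_1, n_sims=5000)
print(f"mean runs above average: {mean(totals):.1f}")
print(f"min / max: {min(totals):.1f} / {max(totals):.1f}")
```

Swapping in the changed rows for scenarios 2 and 3 gives the other two runs; a histogram of `totals` is the kind of distribution compared in the charts.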

Now the changes for 2 (all omitted data is the same)

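| Player | Projected wOBA | PA | wOBA SD | L (lower bound) | U (upper bound) |
| --- | --- | --- | --- | --- | --- |
| Craig | 0.345 | 575 | 0.019 | 0.285 | 0.400 |
| Mather | 0.333 | 325 | 0.019 | 0.265 | 0.400 |
| Jay | 0.323 | 175 | 0.013 | 0.255 | 0.385 |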

and finally for 3

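| Player | Projected wOBA | PA | wOBA SD | L (lower bound) | U (upper bound) |
| --- | --- | --- | --- | --- | --- |
| Freese | 0.330 | 500 | 0.019 | 0.265 | 0.385 |
| Craig | 0.330 | 575 | 0.019 | 0.265 | 0.385 |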

And now some visual results for the three runs.  First, a comparison of the runs with and without Holliday.

Clearly the distribution with Holliday is pushed to the right (more runs on average), and while it’s hidden in the chart, the farthest left it goes is -35.  The “With” distribution is a little narrower because Holliday takes at bats mostly from players that have a higher s.d. (Craig and Mather at 0.019; Jay, at 0.013, has the same s.d. as Holliday).
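A quick back-of-the-envelope check of the narrowing can be done from the input tables alone. The sketch below assumes the player draws are independent and ignores the truncation bounds, so each player contributes roughly (PA × SD)² to the variance of the team total (in wOBA × PA units, before any conversion to runs). Only the rows that differ between the two scenarios are included, since every other player contributes the same amount to both.

```python
# (player, PA, wOBA SD) for the rows that differ between the two scenarios.
with_holliday = [("Holliday", 675, 0.013), ("Craig", 275, 0.019), ("Mather", 125, 0.019)]
without_holliday = [("Craig", 575, 0.019), ("Mather", 325, 0.019), ("Jay", 175, 0.013)]

def variance_contribution(rows):
    """Sum of (PA * SD)^2, assuming independent, untruncated normal draws."""
    return sum((pa * sd) ** 2 for _name, pa, sd in rows)

var_with = variance_contribution(with_holliday)
var_without = variance_contribution(without_holliday)
print(f"with Holliday:    {var_with:.1f}")
print(f"without Holliday: {var_without:.1f}")
```

Under these simplifying assumptions the without-Holliday rows contribute more variance, which is consistent with the wider “Without” distribution.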

And now for the comparison between different projections for the two young starters.


Making the two upcoming rookies league-average hitters instead of using their above-average projections moves the center of the offensive runs above average distribution from somewhere in the low 20s to the upper teens.  And finally, what is probably the most useful chart: the cumulative distribution function (CDF for you stat folks) for the three simulations.

Using these curves we can quickly calculate the probability of the team producing certain values of runs above average.  The greater gain from something like this will come when I incorporate the defensive projections and the pitching projections to come up with distributions and curves for the entire team’s WAR.  Then I can throw those probabilities up against some of the work Nick has done to come up with a set of playoff odds before the season commences.  Anyway, I’m sure I’ll be breaking this out fairly frequently after I get the defensive projections done and CHONE delivers the pitching projections.  Also, once I get it cleaned up a little, I can make it available to anyone else who might want to take it for a spin.
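For anyone who wants to read probabilities off a CDF built from simulation output, here is a small Python sketch of an empirical CDF. The function names are hypothetical, and the `simulated` values are placeholder draws from a normal distribution, standing in for real simulation totals rather than reproducing any of the three runs.

```python
import random
from bisect import bisect_left, bisect_right

def prob_at_most(sorted_totals, x):
    """Empirical CDF: share of simulated totals that are <= x."""
    return bisect_right(sorted_totals, x) / len(sorted_totals)

def prob_at_least(sorted_totals, x):
    """Share of simulated totals that are >= x."""
    return 1 - bisect_left(sorted_totals, x) / len(sorted_totals)

# Placeholder data standing in for simulated runs above average.
rng = random.Random(42)
simulated = sorted(rng.gauss(20, 10) for _ in range(10000))

print(f"P(runs above average <= 0):  {prob_at_most(simulated, 0):.3f}")
print(f"P(runs above average >= 30): {prob_at_least(simulated, 30):.3f}")
```

The totals must be sorted once up front; after that each probability lookup is a single binary search.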

  1. Pete DuBois permalink
    January 8, 2010 3:16 PM

    This is great stuff! Projections like this (which give a range of likely performance) would be much more useful than the single-point projections we get now.

    There are lots of variables, and in a perfect world I’d think different types of players would have different standard deviation values. Now that this work is out there, though, I’m sure there will be those trying to expand on it (as I’m sure you will); I would also think that those who do the actual projections should be able to do at least some of the basic work. However, it won’t be me – that stuff is beyond my expertise/experience/patience level.

    Like I said, great stuff.


