# Simulation, CHONE, and the Cardinals

At the end of my last post I cautioned that the CHONE projections (and this applies to ALL projections) are a point estimate (albeit the best guess at one) and have to be thought of in concert with some error bars. In an attempt to get my arms around the ramifications of that point, I created a quick little Monte Carlo simulation in Excel/VBA to produce some distributions for offensive runs above average. I’ll quickly outline the basic methodology, including my input set, and then present some initial results.

Since I knew I was going to be using CHONE projections, I went back and collected the archived projections from 2009. I did a quick comparison between the projected wOBAs and the actual wOBAs from this year, split by level of prior experience, to get insight into what the standard deviation should be for the distributions that feed the simulation. The generalized results are in the table below.

Experience | SD |
---|---|
None/Low | 0.038 |
Med | 0.030 |
A Lot | 0.025 |
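As a rough illustration of how a table like this could be built, here is a minimal Python sketch (the original work was done in Excel/VBA, so this is not the actual code). It assumes the spread is measured as the sample standard deviation of actual minus projected wOBA within each experience group; the function name `error_sd_by_group` is hypothetical and the sample rows are made up, not real 2009 data.

```python
import statistics
from collections import defaultdict


def error_sd_by_group(rows):
    """rows: iterable of (experience_group, projected_woba, actual_woba)."""
    errors = defaultdict(list)
    for group, projected, actual in rows:
        # A positive error means the player beat his projection.
        errors[group].append(actual - projected)
    # Sample standard deviation of the misses; needs two or more players per group.
    return {g: statistics.stdev(e) for g, e in errors.items() if len(e) >= 2}


# Made-up rows for illustration only (NOT real 2009 projections or results).
sample_rows = [
    ("None/Low", 0.320, 0.290),
    ("None/Low", 0.330, 0.365),
    ("None/Low", 0.310, 0.300),
    ("A Lot", 0.360, 0.350),
    ("A Lot", 0.340, 0.355),
    ("A Lot", 0.380, 0.372),
]

sd_by_group = error_sd_by_group(sample_rows)
for group, sd in sd_by_group.items():
    print(f"{group}: {sd:.3f}")
```

With real data, each row would be one player's 2009 projection paired with his actual 2009 wOBA, and the grouping would follow whatever experience cutoffs you choose.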

I ran that information through the simulation in combination with this year’s CHONE projections and found what I thought to be a spread that was too wide (both on an aggregate team basis and on an individual level). I have very little to base this on other than gut feel and combing back through the FanGraphs archives of team totals from seasons past.

To address the variances on the individual level (which in turn addressed the aggregate solution), I went from using a normal distribution to a truncated normal by placing upper and lower bounds on the simulated wOBA (I implemented this by re-drawing any out-of-bounds value rather than rounding it to the bound). The upper and lower bounds were influenced by reviewing the 2009 data and looking for caps that existed for various projected production levels (e.g., a projected 0.300 wOBA never produced beyond a certain actual wOBA). That still did not give results on the aggregate that passed the “smell test”, so I cut the standard deviations in half. This last move was rather arbitrary on my part, and I plan to do some robustness testing along with analyzing a larger data set than just 2009. That being said, the simulation was “done” and at a state where I didn’t mind putting results out for all to see.
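The re-draw approach to a truncated normal can be sketched in a few lines of Python. This is only an illustration of the idea, not the Excel/VBA code behind the post; the function name `draw_truncated_woba` and the `max_tries` safety limit are assumptions of mine.

```python
import random


def draw_truncated_woba(mean, sd, lower, upper, rng=random, max_tries=10_000):
    """Draw from a normal(mean, sd), re-drawing until the value lands in [lower, upper]."""
    if sd == 0:
        # Nothing to redraw when there is no spread (assumes mean is inside the bounds).
        return mean
    for _ in range(max_tries):
        value = rng.gauss(mean, sd)
        if lower <= value <= upper:
            return value
    raise RuntimeError("bounds are too far from the mean to hit by re-drawing")


# Example: a 0.333 projection with SD 0.019 and bounds of 0.265 and 0.400.
rng = random.Random(0)
draws = [draw_truncated_woba(0.333, 0.019, 0.265, 0.400, rng) for _ in range(1000)]
print(min(draws), max(draws))
```

Re-drawing keeps the shape of the normal curve inside the bounds, whereas rounding out-of-range values to the nearest bound would pile up probability at the two caps.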

For today I have three scenarios to display results from:

1. Signing Matt Holliday and going with internal options at the rest of the positions
2. Going entirely with internal options (basically Freese at 3rd and Craig in left)
3. Same as 2, except with the Freese and Craig projections adjusted downward, as they “feel” a bit high

After the jump I’ll have more input data and the results.

The following tables spell out the inputs exactly for all three cases. First, for scenario 1:

Player | Projected wOBA | PA | wOBA SD | Lower | Upper |
---|---|---|---|---|---|
Pujols | 0.433 | 675 | 0.013 | 0.380 | 0.460 |
Holliday | 0.386 | 675 | 0.013 | 0.315 | 0.440 |
Ryan | 0.310 | 650 | 0.015 | 0.250 | 0.375 |
Rasmus | 0.333 | 625 | 0.019 | 0.265 | 0.400 |
Ludwick | 0.356 | 600 | 0.013 | 0.285 | 0.400 |
Schumaker | 0.330 | 600 | 0.015 | 0.265 | 0.385 |
Molina | 0.329 | 575 | 0.015 | 0.255 | 0.385 |
Freese | 0.336 | 500 | 0.019 | 0.265 | 0.385 |
Lugo | 0.311 | 300 | 0.013 | 0.250 | 0.375 |
Pitchers | 0.200 | 300 | 0.000 | 0.190 | 0.210 |
Craig | 0.345 | 275 | 0.019 | 0.285 | 0.400 |
Mather | 0.333 | 125 | 0.019 | 0.265 | 0.400 |
Larue | 0.273 | 115 | 0.013 | 0.220 | 0.360 |
T Greene / Thurston | 0.313 | 75 | 0.019 | 0.250 | 0.375 |

Now the changes for scenario 2 (all omitted data is the same):

Player | Projected wOBA | PA | wOBA SD | Lower | Upper |
---|---|---|---|---|---|
Craig | 0.345 | 575 | 0.019 | 0.285 | 0.400 |
Mather | 0.333 | 325 | 0.019 | 0.265 | 0.400 |
Jay | 0.323 | 175 | 0.013 | 0.255 | 0.385 |

And finally, for scenario 3:

Player | Projected wOBA | PA | wOBA SD | Lower | Upper |
---|---|---|---|---|---|
Freese | 0.330 | 500 | 0.019 | 0.265 | 0.385 |
Craig | 0.330 | 575 | 0.019 | 0.265 | 0.385 |
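To make the mechanics concrete, here is a hedged Python sketch of how inputs like these could be turned into a distribution of team offensive runs above average. It is not the Excel/VBA simulation itself. In particular, the conversion from wOBA to runs is my assumption: it uses the commonly cited form (wOBA − league wOBA) / scale × PA with a league wOBA of 0.330 and a scale of 1.15, so the totals it produces should not be expected to match the charts in this post. The names `LEAGUE_WOBA`, `WOBA_SCALE`, `draw_woba`, and `simulate_team_raa` are all hypothetical.

```python
import random
import statistics

LEAGUE_WOBA = 0.330  # assumption: league-average wOBA
WOBA_SCALE = 1.15    # assumption: commonly cited wOBA-to-runs scale

# (player, projected wOBA, PA, SD, lower, upper), copied from the scenario 1 table.
SCENARIO_1 = [
    ("Pujols", 0.433, 675, 0.013, 0.380, 0.460),
    ("Holliday", 0.386, 675, 0.013, 0.315, 0.440),
    ("Ryan", 0.310, 650, 0.015, 0.250, 0.375),
    ("Rasmus", 0.333, 625, 0.019, 0.265, 0.400),
    ("Ludwick", 0.356, 600, 0.013, 0.285, 0.400),
    ("Schumaker", 0.330, 600, 0.015, 0.265, 0.385),
    ("Molina", 0.329, 575, 0.015, 0.255, 0.385),
    ("Freese", 0.336, 500, 0.019, 0.265, 0.385),
    ("Lugo", 0.311, 300, 0.013, 0.250, 0.375),
    ("Pitchers", 0.200, 300, 0.000, 0.190, 0.210),
    ("Craig", 0.345, 275, 0.019, 0.285, 0.400),
    ("Mather", 0.333, 125, 0.019, 0.265, 0.400),
    ("Larue", 0.273, 115, 0.013, 0.220, 0.360),
    ("T Greene / Thurston", 0.313, 75, 0.019, 0.250, 0.375),
]


def draw_woba(mean, sd, lower, upper, rng):
    """Truncated-normal draw via re-drawing out-of-bounds values."""
    if sd == 0:
        return mean
    while True:
        value = rng.gauss(mean, sd)
        if lower <= value <= upper:
            return value


def simulate_team_raa(lineup, trials, seed=None):
    """Return one simulated team runs-above-average total per trial."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for _name, mean, pa, sd, lower, upper in lineup:
            woba = draw_woba(mean, sd, lower, upper, rng)
            total += (woba - LEAGUE_WOBA) / WOBA_SCALE * pa
        totals.append(total)
    return totals


totals = simulate_team_raa(SCENARIO_1, trials=2000, seed=1)
mean_raa = statistics.mean(totals)
sd_raa = statistics.stdev(totals)
print(f"mean {mean_raa:.1f}, sd {sd_raa:.1f}")
```

Scenarios 2 and 3 would reuse the same function with the relevant rows swapped out. Each player is drawn independently here, which is a simplification: anything that moves the whole lineup together (park, league run environment) is ignored.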

And now some visual results for the three runs. First a comparison of the with and without Holliday runs.

Clearly the distribution with Holliday is pushed to the right (more runs on average), and, while it’s hidden, the farthest left it goes is -35. The “With” distribution is a little narrower because Holliday takes plate appearances from players that have a higher s.d. (Craig, Mather and Jay).
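The narrowing can be checked with some back-of-the-envelope arithmetic. The sketch below assumes players vary independently and ignores the truncation bounds, so each player contributes roughly (PA × SD)² to the variance of the team total; a constant wOBA-to-runs factor would scale both scenarios equally and is left out. Only the rows that differ between scenarios 1 and 2 are compared, since every other row contributes the same amount to both. The helper name `variance_contribution` is hypothetical.

```python
def variance_contribution(rows):
    """Sum of (PA * SD)^2 over rows of (player, PA, wOBA SD)."""
    return sum((pa * sd) ** 2 for _player, pa, sd in rows)


# Rows that change between the scenarios, taken from the input tables.
with_holliday = [("Holliday", 675, 0.013), ("Craig", 275, 0.019), ("Mather", 125, 0.019)]
without_holliday = [("Craig", 575, 0.019), ("Mather", 325, 0.019), ("Jay", 175, 0.013)]

var_with = variance_contribution(with_holliday)
var_without = variance_contribution(without_holliday)
print(f"with: {var_with:.1f}, without: {var_without:.1f}")
```

The "without" group carries more variance over the same 675 plate appearances, which is consistent with the slightly wider distribution in that run.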

And now for the comparison between different projections for the two young starters.

Making the two upcoming rookies league-average hitters instead of using their above-average projections takes the offensive runs above average total from being centered somewhere in the low 20s to being centered in the upper teens. And finally, what is probably the most useful chart: the cumulative distribution function (CDF for you stat folks) for the three simulations.

Using these curves we can quickly calculate the probability of the team producing certain values of runs above average. The greater gain from something like this will come when I incorporate the defensive projections and the pitching projections to come up with distributions and curves for the entire team’s WAR. Then I can throw those probabilities up against some of the work Nick has done to come up with a set of playoff odds before the season commences. Anyway, I’m sure I’ll be breaking this out fairly frequently after I get the defensive projections done and CHONE delivers the pitching projections. Also, once I get it cleaned up a little, I can make it available to anyone else who might want to take it for a spin.
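Reading a probability off a CDF curve amounts to counting how many simulated totals fall at or below (or at or above) a threshold. A minimal Python sketch follows; the function names are hypothetical and the ten totals are made-up placeholders, not output from the simulation in this post.

```python
from bisect import bisect_left, bisect_right


def prob_at_most(sorted_totals, threshold):
    """Empirical CDF: share of simulated totals <= threshold."""
    return bisect_right(sorted_totals, threshold) / len(sorted_totals)


def prob_at_least(sorted_totals, threshold):
    """Share of simulated totals >= threshold."""
    return 1 - bisect_left(sorted_totals, threshold) / len(sorted_totals)


# Made-up simulated runs-above-average totals, sorted ascending (illustration only).
simulated = sorted([-5, 3, 10, 12, 18, 20, 25, 31, 40, 52])

p_at_most_20 = prob_at_most(simulated, 20)
p_at_least_25 = prob_at_least(simulated, 25)
print(p_at_most_20, p_at_least_25)
```

With thousands of simulated seasons instead of ten, the same two functions give the probabilities for any runs-above-average cutoff of interest.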

This is great stuff! Projections like this (which give a range of likely performance) would be much more useful than the single-point projections we get now.

There are lots of variables, and in a perfect world I’d think different types of players would get their own standard deviation values. Now that this work is out there, though, I’m sure there will be those trying to expand on it (as I’m sure you will); I would also think that those who do the actual projections should be able to do at least some of the basic work. However, it won’t be me – that stuff is beyond my expertise/experience/patience level.

Like I said, great stuff.