Monday, May 19, 2014

Using Hits in a Team's Pythagorean Expectation

Pythagorean Expectation was created by Bill James to estimate a team's "true" win percentage from its runs scored (RS) and runs allowed (RA). It is calculated as follows:

$$\text{Win \%} = \frac{RS^2}{RS^2 + RA^2}$$

The exponents have been tweaked slightly since James originally derived this formula (I use 1.83 for MLB as opposed to 2), and can also be altered to fit other sports.
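
A minimal Python sketch of the formula with an adjustable exponent might look like this. The function name `pythagorean_expectation` and its parameter names are illustrative assumptions, not code from the original analysis; only the exponent values 2 and 1.83 come from the post.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=1.83):
    """Estimate win percentage from runs scored and runs allowed.

    exponent=2 is Bill James's original form; 1.83 is the MLB value
    used in this post. Other sports would need a different exponent.
    """
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# A team that scores exactly as many runs as it allows is rated at .500,
# whatever exponent is used. The 700-run totals are made up.
print(pythagorean_expectation(700, 700))     # 0.5
print(pythagorean_expectation(700, 700, 2))  # 0.5
```

A larger exponent pushes the estimate further from .500 for the same run totals, which is why the value is tuned per sport.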

So runs scored and runs allowed are used to give a more accurate representation of wins and losses, which can then be used to calculate win percentage. My hypothesis was to take this a level deeper: use hits and hits allowed to estimate runs scored and runs allowed, which in turn stand in for wins and losses and yield a win percentage.
Using data from the past 10 seasons, I found an average of 0.507 runs per hit. Here's an example of how I then use this to calculate various takes on "true" win percentage for my Tampa Bay Rays:

Their actual record is 19-26, which gives an actual win percentage of 19 / (19 + 26) = 0.422.
They've scored 179 runs and allowed 201, so their Pythagorean Expectation (Pyth_Runs) estimates their win percentage at 179^1.83 / (179^1.83 + 201^1.83) = 0.447, suggesting they should start winning more games.
Finally, they've collected 382 hits and allowed 392. Multiplying these numbers by the factor of 0.507 estimates their runs scored at 194 and runs allowed at 199, which gives a Pyth_Hits win percentage of 194^1.83 / (194^1.83 + 199^1.83) = 0.488, an even better figure.
Win %:       0.422
Pyth_Runs %: 0.447
Pyth_Hits %: 0.488
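
The three Rays figures can be reproduced with a short Python sketch. The inputs (19-26 record, 179 runs scored, 201 allowed, 382 hits, 392 hits allowed, 0.507 runs per hit, exponent 1.83) are taken from the post; the helper name `pyth` and the variable names are assumptions made for illustration.

```python
RUNS_PER_HIT = 0.507  # 10-season average quoted in the post
EXPONENT = 1.83       # MLB exponent quoted in the post


def pyth(scored, allowed, exponent=EXPONENT):
    """Pythagorean win percentage from any scored/allowed pair."""
    return scored ** exponent / (scored ** exponent + allowed ** exponent)


wins, losses = 19, 26
runs, runs_allowed = 179, 201
hits, hits_allowed = 382, 392

# Actual win percentage from the record.
win_pct = wins / (wins + losses)

# Pythagorean estimate from real runs.
pyth_runs = pyth(runs, runs_allowed)

# Estimate runs from hits, then feed those into the same formula.
est_runs = hits * RUNS_PER_HIT
est_runs_allowed = hits_allowed * RUNS_PER_HIT
pyth_hits = pyth(est_runs, est_runs_allowed)

print(f"Estimated runs: {est_runs:.0f} scored, {est_runs_allowed:.0f} allowed")
print(f"Win %:       {win_pct:.3f}")
print(f"Pyth_Runs %: {pyth_runs:.3f}")
print(f"Pyth_Hits %: {pyth_hits:.3f}")
```

Because the same 0.507 factor multiplies both hit totals, it cancels in the ratio, so `pyth(hits, hits_allowed)` returns the same Pyth_Hits value up to floating-point rounding.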
However, these results don't mean much in a vacuum. How well does each version of win percentage work as a way of rating teams? I used each calculation, based on regular season data, to predict playoff series results for the past 5 seasons (for uniformity, I left out the single-elimination wild card games of the past two years).
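
One way such a test could be scored is sketched below in Python: for each series, pick the team with the higher rating and count how often that pick won. This is an assumed procedure, not necessarily the exact one used for the results here. The function name `score_predictor`, the team labels, the ratings, and the series outcomes are all made-up placeholders rather than real data.

```python
def score_predictor(series, ratings):
    """Return (correct, incorrect) when picking the higher-rated team.

    series  -- list of (team_a, team_b, winner) tuples
    ratings -- dict mapping team -> rating (e.g. Win %, Pyth_Runs, Pyth_Hits)
    Ties go to team_b here; that tie rule is an arbitrary assumption.
    """
    correct = incorrect = 0
    for team_a, team_b, winner in series:
        pick = team_a if ratings[team_a] > ratings[team_b] else team_b
        if pick == winner:
            correct += 1
        else:
            incorrect += 1
    return correct, incorrect


# Placeholder data for illustration only, not real teams or results.
ratings = {"A": 0.580, "B": 0.540, "C": 0.560, "D": 0.550}
series = [("A", "B", "A"), ("C", "D", "D"), ("A", "D", "A")]

correct, incorrect = score_predictor(series, ratings)
print(f"{correct}-{incorrect}")  # 2-1
```

Running the same function once per rating (Win %, Pyth_Runs, Pyth_Hits) for each season would produce records in the same "correct-incorrect" form as the table.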

Year            Win %   Pyth_Runs   Pyth_Hits
2009            6-1     6-1         5-2
2010            4-3     4-3         5-2
2011            3-4     3-4         4-3
2012            3-4     2-5         5-2
2013            5-2     5-2         5-2
Total Correct   21      20          23
Pyth_Hits had the best record over this 5-year span and was also the most consistent: Win % and Pyth_Runs had losing records in 2011 and 2012, while Pyth_Hits never did worse than 4-3.
