Wednesday, April 23, 2014

Modeling MLS, MLB

I’ve finished building the MDS Model for MLS.

The ratings are, as before:
Matrix: A retrodictive rating that calculates a “true” win percentage based only on results (W/D/L; margin of victory is not considered) and the strength of each opponent
Pyth: A predictive rating designed to predict future games. It takes into account an adjusted score for each game based on the strength of each opponent (from the Matrix model)
Comp: A composite of the two ratings to rank the teams
SOS: Strength of schedule

The MDS Model actually lines up pretty closely with that of Jeff Sagarin, USA Today’s statistician. 


Even though the San Jose Earthquakes haven’t gotten a win yet, they’re ranked higher than their record indicates (10th overall in my model) thanks to the 2nd highest SOS in MLS so far (also according to my model).

I also finished the MDS Model for MLB.

Friday, April 18, 2014

Building a Retrodictive Model

First, regarding the Matrix model I ended up going with: I had concerns about the invertibility of the matrix because the teams are only loosely connected (there aren't many games played between teams in different parts of the country). In the end, though, I only had to omit one team, which forfeited all of its matches and so never played a game.
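As a rough illustration of the connectedness concern (not the code behind the model), here is a minimal Python sketch that groups teams into connected components based on who has played whom. The function name and the sample games are hypothetical. If the schedule splits into more than one component, teams in different components can't be compared through any chain of common opponents, and a team with no games at all never shows up in the schedule graph.

```python
from collections import defaultdict, deque


def connected_components(games):
    """Group teams into sets that are linked by at least one chain of games.

    games is a list of (team_a, team_b) pairs; the result does not depend on
    who won, only on who played whom.
    """
    graph = defaultdict(set)
    for a, b in games:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    components = []
    for start in list(graph):
        if start in seen:
            continue
        # Breadth-first search from an unvisited team.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            team = queue.popleft()
            for opponent in graph[team]:
                if opponent not in seen:
                    seen.add(opponent)
                    component.add(opponent)
                    queue.append(opponent)
        components.append(component)
    return components


# Hypothetical schedule: two regions that never play each other.
sample_games = [("East1", "East2"), ("East2", "East3"), ("West1", "West2")]
sample_components = connected_components(sample_games)
```

With the hypothetical schedule above, the East and West teams land in separate components, which is the situation that would make a single league-wide rating ill-defined.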

The Matrix model simply takes into account the strength of each opponent and wins/losses (and connects all the teams to determine their relative strength). It's not designed to be predictive, but still should predict future results decently based on each team's past performance. I don't have any working knowledge of the distribution of Ultimate scores, so I can't adapt it to build a predictive portion of the model. The best I could do is something like split the difference between hockey and football, but I don't have any real mathematical justification for doing so.

My original idea for a retrodictive model was one of best fit, using least absolute deviation rather than least squares. The end result would be a Pyth rating (like my previous models give) that minimizes the luck term in the following equation:


Σ P(x) - W(x) = luck(x)

Where W(x) = # of wins for team x, and each individual P(x) is a Log5 probability:

P(x) = (A - A·B) / (A + B - 2·A·B)

Where A is the Pyth rating for team x, and B is the Pyth rating for an opponent on x's schedule. The model would calculate each team's Pyth rating to minimize each team's "luck", and thus provide a model that retrodictively predicts each past game. luck(x) would be the amount x got "lucky" with respect to their "expected" win total.
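To make the definitions concrete, here is a minimal Python sketch of the quantity being minimized, assuming the standard Log5 formula and a schedule stored as (winner, loser) pairs. The function names, ratings, and games are hypothetical placeholders, not values from the model.

```python
def log5(a, b):
    """Log5 probability that a team rated a beats a team rated b."""
    return (a - a * b) / (a + b - 2 * a * b)


def luck(ratings, games):
    """Return {team: expected wins minus actual wins}.

    ratings maps team -> Pyth-style rating strictly between 0 and 1.
    games is a list of (winner, loser) pairs.
    """
    result = {team: 0.0 for team in ratings}
    for winner, loser in games:
        p = log5(ratings[winner], ratings[loser])
        result[winner] += p - 1.0   # expected p wins, actually won 1
        result[loser] += 1.0 - p    # expected 1 - p wins, actually won 0
    return result


def total_absolute_luck(ratings, games):
    """Least-absolute-deviation objective: the sum of |luck(x)| over all teams."""
    return sum(abs(value) for value in luck(ratings, games).values())


# Hypothetical two-team example: the teams split a pair of games.
example_ratings = {"A": 0.6, "B": 0.5}
example_games = [("A", "B"), ("B", "A")]
example_objective = total_absolute_luck(example_ratings, example_games)
```

A fitting routine would then search for the set of ratings that makes `total_absolute_luck` as small as possible.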

I tried this out in Matlab, but had issues solving a nonlinear minimization problem with more than one variable (in this case I would've needed to solve for some 200+ variables across 1500+ games). Solving the original Matrix model was trivial compared to that.
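One possible way around a general-purpose nonlinear solver, sketched here as an assumption rather than as what the model does: Log5 is equivalent to a Bradley-Terry model on the odds scale s = A / (1 - A), and the classic fixed-point iteration for that model converges to ratings where every team's expected wins equal its actual wins, so every luck term is zero. The sketch below swaps that iteration in for the least absolute deviation search. It assumes every team has at least one win and one loss and that the schedule is connected through wins in both directions; the function names and data are hypothetical.

```python
import math


def fit_ratings(games, iterations=2000):
    """Fit Log5 ratings from a list of (winner, loser) pairs."""
    teams = sorted({team for game in games for team in game})
    wins = {team: 0 for team in teams}
    for winner, _ in games:
        wins[winner] += 1

    strength = {team: 1.0 for team in teams}  # odds-scale strength A / (1 - A)
    for _ in range(iterations):
        denom = {team: 0.0 for team in teams}
        for winner, loser in games:
            share = 1.0 / (strength[winner] + strength[loser])
            denom[winner] += share
            denom[loser] += share
        strength = {team: wins[team] / denom[team] for team in teams}
        # Strengths are only defined up to scale, so pin the geometric mean to 1.
        mean_log = sum(math.log(s) for s in strength.values()) / len(teams)
        scale = math.exp(mean_log)
        strength = {team: s / scale for team, s in strength.items()}

    # Convert odds-scale strengths back to ratings between 0 and 1.
    return {team: s / (1.0 + s) for team, s in strength.items()}


def expected_wins(ratings, games):
    """Sum each team's Log5 win probabilities over its schedule."""
    totals = {team: 0.0 for team in ratings}
    for winner, loser in games:
        a, b = ratings[winner], ratings[loser]
        p = (a - a * b) / (a + b - 2 * a * b)
        totals[winner] += p
        totals[loser] += 1.0 - p
    return totals


# Hypothetical three-team schedule in which every team has a win and a loss.
sample_games = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "C"),
                ("B", "C"), ("C", "B"), ("A", "C"), ("C", "A")]
sample_ratings = fit_ratings(sample_games)
```

Each pass costs one sweep over the games, so 200+ teams and 1500+ games are not a problem in terms of size. The catch is the assumption above: an undefeated or winless team has no finite rating under this approach and would need special handling.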

Applying the MDS Model to Ultimate Frisbee

I've applied the Matrix portion of the MDS Model to Ultimate Frisbee with the goal of obtaining a retrodictive model that improves on the limitations of the official ranking algorithm. (Why include sin and pi? Why not use e? Why not just use the Elo system?)
Plus it gives me the opportunity to use the name The Ultimate Matrix.

The Ultimate Matrix does the following things:
• It takes into account all official matches between College division teams in the United States
• It omits games that were forfeits
• It does not take into account final score: only wins/losses (the official algorithm takes into account score)
• It emphasizes the strength of each team's opponent (much more so than the official algorithm)
• It fully updates the model after each game (and is thus Bayesian) instead of adding a rating differential to the teams' previous ratings

Thursday, April 17, 2014

NBA, NHL Playoff Brackets

The NBA Playoffs start this weekend and the NHL Playoffs started yesterday, so I've updated the MDS Model to pick both brackets! This year the NHL format has been altered slightly, as the bracket is no longer reseeded.