Monday, December 29, 2014

NFL Playoff Picks 2014-15

The NFL Playoffs start this weekend, and the top seeds (NE, DEN, SEA, and GB) are appropriate: they represent 4 of the top 5 teams in Pyth, and all four are heavily favored. The lone missing team is Kansas City, who disappointingly didn't make the playoffs.

Saturday, December 27, 2014

Creating an NCAAF Model as a Function of Stadium Size

My colleague (who's also my friend and roommate), "Stan Hooper", suggested that a decent model for predicting college football game outcomes would be to rate teams based on the sizes of their stadiums.

I took the size of each team's stadium and predicted the winner of each game simply by which team had the larger capacity. To factor in home-field advantage, I added the average stadium capacity to the home team's stadium size. (For reference, home-field advantage in terms of points is 3.5, which is approximately the average margin of victory by home teams; this season it is 3.78.)
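
As a rough illustration of the rule described above, here is a minimal Python sketch. The team names, capacities, and game results are made-up placeholders, not real data, and the function names are hypothetical. Breaking ties in favor of the home team is also an assumption, since the post doesn't say how ties were handled.

```python
def predict_winner(home, away, capacity, home_bonus):
    """Pick the team with the larger capacity, after boosting the home team."""
    home_score = capacity[home] + home_bonus
    away_score = capacity[away]
    # Assumption: ties go to the home team.
    return home if home_score >= away_score else away


def accuracy(games, capacity):
    """Fraction of games whose straight-up winner the rule predicts.

    games is a list of (home, away, actual_winner) tuples.
    """
    # Home-field bonus: the average stadium capacity across all teams.
    home_bonus = sum(capacity.values()) / len(capacity)
    correct = sum(
        predict_winner(home, away, capacity, home_bonus) == winner
        for home, away, winner in games
    )
    return correct / len(games)


# Placeholder data for illustration only.
capacity = {"Team A": 100_000, "Team B": 60_000, "Team C": 30_000}
games = [
    ("Team C", "Team A", "Team A"),  # big visitor beats small home team
    ("Team B", "Team A", "Team B"),  # home bonus tips the pick to Team B
    ("Team C", "Team B", "Team B"),  # rule picks Team C here and misses
]

print(f"{accuracy(games, capacity):.2%}")
```

With real data, `capacity` would come from the imported stadium list and `games` from the season's results.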

On the season, the Home-Field Model predicted 60.96% of straight-up winners. For comparison, my MDS Model predicted 71.71% correctly, and Vegas tabbed 75.11% correctly. Honestly, I'm just relieved that a model created in 30 seconds by importing a Wikipedia page into a spreadsheet didn't beat the one I worked an entire semester on.

Monday, December 22, 2014

Kentucky is So Good They Broke My Model

The Matrix model uses a series of n x n matrices involving only wins and losses to calculate a "true" win percentage that factors in the strength of each team's opponents, where n is the number of teams (351 in NCAAB). There is a matrix for wins and a matrix for losses. Each is inverted, factoring in the total number of games each team has played, and then multiplied by an n x 1 vector of each team's net wins:

net wins = wins - losses
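
The post doesn't spell out the exact win and loss matrices, so the following is only a sketch of the general least-squares idea from the Massey reference cited at the end of this post, with every game counted as a one-point margin so that the right-hand side is the net wins vector. It uses a single combined matrix rather than separate win and loss matrices. The sum-to-zero constraint, the function names, and the tiny three-team schedule are assumptions for illustration, not the model's actual setup.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; the schedule may be disconnected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def win_loss_ratings(n_teams, games):
    """Least-squares ratings from (winner, loser) index pairs only."""
    m = [[0.0] * n_teams for _ in range(n_teams)]
    net_wins = [0.0] * n_teams
    for winner, loser in games:
        m[winner][winner] += 1  # diagonal: total games played
        m[loser][loser] += 1
        m[winner][loser] -= 1   # off-diagonal: minus games between the pair
        m[loser][winner] -= 1
        net_wins[winner] += 1   # net wins = wins - losses
        net_wins[loser] -= 1
    # The raw matrix is singular (every row sums to 0), so replace the last
    # equation with "ratings sum to 0" to pin down a unique solution.
    m[-1] = [1.0] * n_teams
    net_wins[-1] = 0.0
    return solve_linear_system(m, net_wins)


# Toy schedule: team 0 beats 1 and 2, team 1 beats 2.
ratings = win_loss_ratings(3, [(0, 1), (1, 2), (0, 2)])
print(ratings)
```

Solving the system directly is equivalent to multiplying by the inverse and is the more common way to do it numerically.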

Each game is simply denoted by a "1". Therefore, the outputs of the win and loss matrices are bounded between -1 and 1. If a team has a winning "adjusted" record, its rating is above 0; otherwise it's negative. I then standardize this rating to be analogous to win percentage, bounded between 0 and 1:

i = initial Matrix rating; -1 <= i <= 1
i + 1 = i'
i' / 2 = f
f = final Matrix rating; 0 <= f <= 1
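
The rescaling step can be written as a one-line function. This is a small sketch with a hypothetical function name, and it also shows why the step itself can't repair a bad input: any initial rating above 1 maps to a final rating above 1.

```python
def rescale(initial_rating):
    """Map an initial rating in [-1, 1] to a final rating in [0, 1]."""
    return (initial_rating + 1) / 2


print(rescale(-1.0))  # lower bound maps to 0.0
print(rescale(0.0))   # an even adjusted record maps to 0.5
print(rescale(1.0))   # upper bound maps to 1.0
print(rescale(1.01) > 1)  # an out-of-range input stays out of range
```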

Kentucky has a final rating of 1.005. By construction, the final rating has an upper bound of 1, so the issue must be within the win and loss matrices themselves.

Kentucky's loss matrix rating is 1.000; this checks out, since they're undefeated and thus have 0 losses. The issue then is pinpointed to their win matrix: their rating is 1.010.

In their win matrix, they have 12 wins, and 12 total games (Note: only Division 1 games are included, but in this case all of Kentucky's opponents have been Division 1). This checks out too. Their strength of schedule is very high: 0.645 (0.500 is average). This is the only explanation I can offer as to why Kentucky's rating is so high: they've played a very tough slate of teams and beaten them all. Even so, it shouldn't be above 1. 

A full rundown of the Matrix method can be found here on page 31, "Least Squares Ratings", written by Kenneth Massey.