Probabilis

Monday, May 19, 2014

Using Hits in a Team's Pythagorean Expectation

Pythagorean Expectation was created by Bill James to estimate a team's more "true" win percentage based on their runs scored and runs allowed. It is calculated as follows:

Win% = Runs_Scored^2 / (Runs_Scored^2 + Runs_Allowed^2)

The exponents have been tweaked slightly since James originally derived this formula (I use 1.83 for MLB as opposed to 2), and can also be altered to fit other sports.
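As a rough illustration, the calculation can be sketched in a few lines of Python. The function name and the sample run totals are made up for the example; the 1.83 default is the MLB exponent mentioned above.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=1.83):
    """Estimate a team's "true" win percentage from runs scored and allowed.

    exponent=1.83 is the MLB value used in this post; James's original
    formula used 2. Other sports would need a different exponent.
    """
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    # Undefined if a team has neither scored nor allowed a run.
    return scored / (scored + allowed)


# Hypothetical team: 700 runs scored, 650 runs allowed.
print(round(pythagorean_expectation(700, 650), 3))
```

A team that scores exactly as many runs as it allows comes out at .500, and swapping the two inputs gives the complementary win percentage.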

So runs scored and runs allowed are used to give a more accurate representation of wins and losses, which can then be used to calculate win percentage. My hypothesis was to take this a level deeper: use hits and hits allowed to estimate runs scored and runs allowed, and then use those estimated runs to represent wins and losses, which in turn give win percentage.

Revisiting STFC and Picking NCAAB ATS

As I wrote last year, if you had simply followed the crowd of Streak for the Cash in picking college basketball against the spread, you would have made a decent return, winning above the break-even threshold of 52.38%. I wanted to revisit this idea with a full season's worth of data, since the best model only picked ATS winners correctly 50.93% of the time for the 2013-14 season.

I simply found regression to the mean. For the entire NCAAB season on STFC, the consensus went 116-152: 43.28%. In fact, if you had faded (gone against) these picks, you would've won 56.72% of your picks, netting an 8.28% return.

But in last season's analysis, I only took into account January 1 to the end of the season. For the 2013-14 season (from January 1 on), the trend was still reversed: 79-109, 42.02%. Again, fading these picks would've won 57.98% of the time, for a 10.69% return.
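For anyone who wants to check the arithmetic, here is a minimal sketch of how the break-even threshold and the returns work out. It assumes every pick is a flat one-unit bet at standard -110 odds, which is the pricing that produces the 52.38% break-even figure; the function names are just for illustration.

```python
def break_even_rate(odds=-110):
    """Win rate needed to break even at negative American odds (assumed -110)."""
    risk = -odds  # amount risked to win 100
    return risk / (risk + 100)


def flat_bet_return(wins, losses, odds=-110):
    """Return per unit wagered when every pick is a one-unit bet at `odds`."""
    payout = 100 / -odds  # profit on a winning one-unit bet
    return (wins * payout - losses) / (wins + losses)


print(f"{break_even_rate():.2%}")           # 52.38%
print(f"{flat_bet_return(152, 116):.2%}")   # fading the full season: 8.28%
print(f"{flat_bet_return(109, 79):.2%}")    # fading from January 1 on: 10.69%
```

Both helpers only handle negative American odds, which is all that is needed for standard spread bets.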

Will next year's STFC consensus picks beat Vegas or will fading those picks be the winning strategy? Your guess is as good as mine. I'm sticking with my math.

Thursday, May 15, 2014

The effects of “market size” on team performance in the NBA, NFL, and MLB

For my semester research project in ECON 570H (Econometrics) at UNC, I analyzed "the effects of “market size” on team performance in the three major North American professional sports leagues (NBA, NFL, and MLB)".

Here are my findings.

Wednesday, May 14, 2014

The FIFA Strategy of High Volume Shooting: Does It Work in the World Cup?

So-called "video game strategies" revolve around high-octane offenses that are designed to generate scoring. Their real-life application has been studied in both football and basketball, including whether NFL teams go for it on 4th down often enough and how NBA teams can score efficiently (shooting threes and layups).

Ultimately, my question was whether my "shoot enough times and eventually you'll score" FIFA strategy holds up at the highest level: the World Cup. I analyzed all games from 2002, 2006, and 2010, and then fit a Tobit censored regression model (with standard errors clustered by year) that uses Shots and Shots on Goal to predict the number of goals scored by each team. The model was as follows:

Goals = -0.6120012 - 0.0535694*Shots + 0.3903682*Shots_on_Goal

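Both effects were statistically significant, and these coefficients indicate that shots on goal have the larger impact on how many goals are scored. This makes sense since shots on goal are higher quality chances. Note that the coefficient on total shots is negative: holding shots on goal fixed, each additional shot is associated with slightly fewer goals.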

The average number of shots on goal for a team in these three World Cups was 5.3, and ranged from 0 to 15. So, I was interested in looking at only the upper end of this range: teams that took 9 or more shots on goal.

In the 192 games played (384 team performances), only 49 teams attempted 9+ shots on goal. Within this group, only 12 of them had a positive residual: i.e. they scored more than would be expected by the above model. The other 37 teams scored fewer than expected, and on average, teams that had 9+ shots on goal scored 0.09 fewer goals per match than the model predicts.
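To make the residual idea concrete, here is a minimal sketch that plugs a team's shot totals into the fitted equation above and compares the result to the goals actually scored. It uses the linear index only (the full Tobit expected value also depends on the estimated error variance, which is not reported here), and the example stat line is invented.

```python
# Coefficients from the fitted equation in this post.
INTERCEPT = -0.6120012
B_SHOTS = -0.0535694
B_SHOTS_ON_GOAL = 0.3903682


def fitted_goals(shots, shots_on_goal):
    """Linear index of the model; an approximation, not the full Tobit E[Goals]."""
    return INTERCEPT + B_SHOTS * shots + B_SHOTS_ON_GOAL * shots_on_goal


def residual(goals, shots, shots_on_goal):
    """Positive means the team scored more than the model expects."""
    return goals - fitted_goals(shots, shots_on_goal)


# Hypothetical team: 15 shots, 9 on goal, 2 goals scored.
print(round(fitted_goals(15, 9), 2))
print(round(residual(2, 15, 9), 2))
```

In this made-up case the team would fall into the "scored fewer than expected" group, since its two goals sit just under the fitted value.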

This suggests that high volume shooting does not generate more goals; rather, quality chances are more indicative of success, and teams that "park the bus" and are defensively oriented (like Italy) have more success than their high-shooting counterparts.

Friday, May 9, 2014

Predicting Lines (Against the Spread)

The MDS Model is essentially a strength of opponent-adjusted Pythagorean expectation for each team. Thus, each team's rating is designed to estimate "true win percentage" and is bounded by [0, 1], which allows for it to be used to predict matchup win probabilities by Log5.
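For reference, the Log5 calculation can be sketched as follows. The two ratings in the example are hypothetical stand-ins for "true win percentages", not actual MDS Model output.

```python
def log5(p_a, p_b):
    """Probability that Team A beats Team B, given each team's true win percentage.

    Undefined when both ratings are 0 or both are 1 (the denominator is zero).
    """
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# Hypothetical ratings: Team A at .650, Team B at .550.
p = log5(0.650, 0.550)
print(round(p, 3), round(1 - p, 3))
```

Against a .500 opponent the formula simply returns the team's own rating, and the two sides of any matchup always sum to 1.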

However, Log5 only gives the probability of Team A beating Team B (and inversely, Team B beating Team A). A major use of modeling sports is for picking games "against the spread", which is considerably harder than picking "straight up" winners: lines (i.e. spreads) are designed to be 50/50 - having even money on each side - and the best professional gamblers in the world win only 55% of these picks.