Thursday, June 26, 2014

Simulating the Knockout Rounds of the World Cup

Once again, the input for the Monte Carlo simulator is ESPN's SPI.

The main difference here from FiveThirtyEight's predictions is likely Brazil's home field advantage, which is included in my simulator as 0.5 goals (they don't specify the figure they use).

Country        SPI Rating  Quarters  Semis   Final  Champion
Costa Rica     78.6        57.60%    23.26%  8.83%  2.43%
United States  78.9        43.37%    15.75%  7.14%  1.96%
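As a rough illustration of how a 0.5-goal home field advantage can enter a knockout-round simulation, here is a minimal sketch. It is not the simulator behind the table: the Poisson scoring model, the `expected_goals` mapping and its constants, and the coin flip for level matches are all assumptions made for the example. Only the 0.5-goal figure comes from the post.

```python
import math
import random

HOME_ADVANTAGE = 0.5  # goals added to the host's expected score (figure from the post)


def poisson(rng, mean):
    """Draw a Poisson-distributed goal count using Knuth's method."""
    limit, goals, running = math.exp(-mean), 0, 1.0
    while True:
        running *= rng.random()
        if running <= limit:
            return goals
        goals += 1


def expected_goals(own, opponent, is_home=False):
    """Hypothetical rating-to-goals mapping; the constants are placeholders."""
    base = max(0.2, 1.3 + (own - opponent) / 20.0)
    return base + (HOME_ADVANTAGE if is_home else 0.0)


def advance_probability(rating_a, rating_b, a_is_home=False, trials=20000, seed=2014):
    """Share of simulated knockout matches in which team A goes through."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        goals_a = poisson(rng, expected_goals(rating_a, rating_b, a_is_home))
        goals_b = poisson(rng, expected_goals(rating_b, rating_a))
        if goals_a == goals_b:
            wins += rng.random() < 0.5  # assumption: a level match is a coin flip
        else:
            wins += goals_a > goals_b
    return wins / trials
```

Calling `advance_probability(80.0, 80.0)` and then `advance_probability(80.0, 80.0, a_is_home=True)` shows how far the extra half goal moves two otherwise equal teams away from an even matchup.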

Sunday, June 22, 2014

Scenarios for Advancing for Every Team in the World Cup

Here are the scenarios every team needs in the third round of group games to advance (if possible):

Group A
Brazil: Advances with win/draw vs Cameroon OR loss and Mexico win OR loss and Croatia win and win goal differential
Mexico: Advances with win/draw vs Croatia OR loss and Cameroon win and win goal differential
Croatia: Advances with win vs Mexico OR draw and Cameroon win and win goal differential

Group B
Netherlands: Advances; 1st in group with win/draw vs Chile, 2nd in group with loss
Chile: Advances; 1st in group with win vs Netherlands, 2nd in group with draw/loss
Spain: Eliminated.

Group C
Colombia: Advances; almost certainly as 1st in group (falls to 2nd only with loss vs Japan and Ivory Coast win and lose goal differential)
Ivory Coast: Advances with win vs Greece OR draw and Colombia win/draw OR draw and Colombia loss (pending goal differential with Japan)
Japan: Advances with win vs Colombia and Greece win and better goal differential with Greece OR win and Ivory Coast/Greece draw and better goal differential with Ivory Coast
Greece: Advances with win vs Ivory Coast and Colombia win/draw OR win and Japan win and better goal differential with Japan

Group D
Costa Rica: Advances; likely as 1st in group (falls to 2nd only with loss vs England and Italy or Uruguay win and lose goal differential)
Italy: Advances with win/draw vs Uruguay
Uruguay: Advances with win vs Italy
England: Eliminated.

Group E
France: Advances with win/draw vs Ecuador as 1st in group; is eliminated only with loss and Switzerland win and massive change in goal differential
Ecuador: Advances with win vs France and Switzerland draw/loss OR win and Switzerland win and win goal differential OR draw and Switzerland loss OR draw and Switzerland draw OR loss and Switzerland loss and win goal differential
Switzerland: Advances with win vs Honduras and France win/draw OR win and France loss and win goal differential OR draw and France win OR loss and France win and win goal differential
Honduras: Advances with win and France win and win goal differential

Group F
Argentina: Advances; 1st in group with win/draw vs Nigeria, 2nd with loss
Nigeria: Advances with win/draw vs Argentina (as 1st with win, 2nd with draw) OR loss and Bosnia win/draw OR loss and Bosnia loss and win goal differential
Iran: Advances with win vs Bosnia and Argentina win and win goal differential
Bosnia-Herzegovina: Eliminated.

Group G
Germany: Advances with win/draw vs United States (as 1st in group) OR loss and Portugal draw OR loss and Portugal win and win goal differential OR loss and Ghana win and win goal differential
United States: Advances with win/draw vs Germany (1st in group with win, 2nd with draw) OR loss and Portugal draw OR loss and Portugal win and win goal differential OR loss and Ghana win and win goal differential
Ghana: Advances with win vs Portugal and Germany win and win goal differential OR win and United States win and win goal differential
Portugal: Advances with win vs Ghana and Germany win and win large goal differential OR win and United States win and win large goal differential

Group H
Belgium: Advances; likely as 1st in group, only falls to 2nd with loss vs South Korea and Algeria win and lose goal differential to Algeria
Algeria: Advances with win/draw vs Russia and Belgium win/draw OR draw and South Korea win and win goal differential
Russia: Advances with win vs Algeria and Belgium win/draw OR win and South Korea win and win goal differential
South Korea: Advances with win vs Belgium and Russia win and win goal differential OR win and Russia draw and win goal differential
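Scenarios like these can be checked by enumerating every win/draw/loss combination of the two remaining games in a group. The sketch below does that for Group D, assuming the standings after two rounds were Costa Rica 6 points, Italy 3, Uruguay 3, and England 0. The function names are made up for this example, and it stops at points: whenever teams finish level for a qualifying spot it simply reports "goal differential" instead of resolving the tie.

```python
from itertools import product

# Assumed points after two rounds in Group D.
POINTS = {"Costa Rica": 6, "Italy": 3, "Uruguay": 3, "England": 0}
# Remaining fixtures in the third round of group games.
FIXTURES = [("Italy", "Uruguay"), ("Costa Rica", "England")]


def final_table(outcomes):
    """Apply one result per fixture: 'W' (first team wins), 'D', or 'L'."""
    table = dict(POINTS)
    for (first, second), result in zip(FIXTURES, outcomes):
        if result == "W":
            table[first] += 3
        elif result == "L":
            table[second] += 3
        else:
            table[first] += 1
            table[second] += 1
    return table


def status(team, table):
    """Classify a team's fate from points alone (top two advance)."""
    above = sum(p > table[team] for t, p in table.items() if t != team)
    level = sum(p == table[team] for t, p in table.items() if t != team)
    if above >= 2:
        return "out"
    if above + level <= 1:
        return "advances"
    return "goal differential"


def scenarios(team):
    """Map every combination of remaining results to the team's status."""
    return {
        outcomes: status(team, final_table(outcomes))
        for outcomes in product("WDL", repeat=len(FIXTURES))
    }
```

For example, `scenarios("Italy")` reports "advances" for every Italy win, "out" for every Italy loss, and "goal differential" for a draw, which is the case where the existing goal differentials decide between Italy and Uruguay.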

Tuesday, June 10, 2014

The Long Run Effect of the "Simple Strategy" with Blackjack

The House has a built-in mathematical advantage in Blackjack (unless you can count cards). So I set out to test how much of an advantage the "simple strategy" really gives the player. The strategy can be followed like so:

I wrote a Blackjack-playing program in Java that would play by the "simple strategy" no matter what (simplified even further by only hitting or standing).  
Note: the automated dealer followed normal Vegas rules: it had to hit on 16 or below and on soft 17s, and blackjack paid 3:2. More information on Blackjack rules can be found here.

The program never makes mistakes, never gets fatigued, and always follows the strategy. I started the computer "player" off with $1,000, had it play 100,000 hands, betting $1 on each hand, and after all that, it finished in the black, up $151. That's only a 0.151% return on the $100,000 wagered over 100,000 hands of play.
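To put a +$151 finish in context, it helps to compare it with the swing that chance alone produces over 100,000 hands. The sketch below is a back-of-the-envelope calculation, not part of the Java program. It assumes hands are independent and that the per-hand standard deviation is about one bet (`SD_PER_HAND` is an assumed round figure for hit-or-stand play, not a measured value).

```python
import math

HANDS = 100_000
BET = 1.0           # dollars wagered per hand, as in the experiment
SD_PER_HAND = 1.0   # assumption: per-hand standard deviation, in units of the bet


def standard_error(hands, bet=BET, sd_per_hand=SD_PER_HAND):
    """Standard deviation of the net result after `hands` independent hands."""
    return bet * sd_per_hand * math.sqrt(hands)


def z_score(net_result, hands, edge_per_hand=0.0):
    """How many standard errors the net result sits from its expected value."""
    expected = edge_per_hand * BET * hands
    return (net_result - expected) / standard_error(hands)


def return_on_wagered(net_result, hands, bet=BET):
    """Net result as a fraction of the total amount wagered."""
    return net_result / (hands * bet)
```

Under these assumptions `standard_error(HANDS)` is a little over $316, so `z_score(151, HANDS)` comes out below 0.5. A single run ending at +$151 is therefore well within normal luck around break-even and cannot by itself separate a small player edge from a small House edge.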

What I found the most surprising was how much variance there was in the player's performance: its net loss fell as low as -$127, and its net gain got as high as +$177. The House can always withstand these large runs; however, players with small bankrolls cannot. Also, about half of the time the computer player was in the red. I plotted every 1,000 hands to illustrate this variability:

Sunday, June 8, 2014

Why are the Colorado Rockies Favored by the MDS Model?

The Colorado Rockies are currently 3 games below .500 and 11 games back in the NL West. However, they're #9 in my MDS Model and #7 in the predictive Pyth component, far higher than a below-average team should rank. My question is: why? They have the 26th-ranked strength of schedule, so I hypothesized that it's due to their output at home, which is boosted by the high altitude of Coors Field. However, this effect should boost their runs allowed at home too.

Since Pyth simply adjusts runs scored and runs allowed by the strength of each opponent, I then hypothesized that the margin of victory (MOV) at home might be significantly different from the norm, which would boost their rating. I checked the data so far in 2014. For Colorado home games, the average MOV has been 4.78, while the league average (for 2013) was 3.28. However, this difference isn't significant, with a p-value of 0.2776. So this might still be the reason they're favored in my model, but I can't be certain. If the Rockies start rattling off a winning streak... You heard it here first.
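The post does not say which significance test produced the p-value, so the following is only one common way to compare two sets of margins: Welch's t statistic with a two-sided p-value from a normal approximation (a simplification that is rough for small samples). The function name and the sample lists are invented for illustration and are not the 2013 or 2014 game data.

```python
import math
from statistics import NormalDist, mean, variance


def welch_test(sample_a, sample_b):
    """Return (t, p): Welch's t statistic and a normal-approximation p-value."""
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided
    return t, p


# Made-up margins of victory, purely to show the call.
home_margins = [6, 1, 8, 3, 7]
league_margins = [2, 4, 1, 3, 5]
t_stat, p_value = welch_test(home_margins, league_margins)
```

A large p-value from a test like this means the gap in average MOV could plausibly be noise, which is the conclusion drawn above.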

Saturday, June 7, 2014

Spread of MOV in MLB (Predicting Run Lines)

I undertook the same margin of victory (MOV) analysis I did with Ultimate Frisbee to determine the spread of MLB games, with the intention to then use this to predict run lines. As mentioned in the above post, to predict MOV (lines), I take the inverse of the normal cumulative distribution at each win percentage and then multiply this by a parameter for each league. I had not yet calculated this parameter for baseball/MLB.

I took all games played in the 2013 season (both regular and postseason) and found that the standard deviation in MOVs is 2.546. To be conservative (as I usually am when predicting lines), I bump this up to 3. Fittingly, 62.91% of games were decided by 3 runs or fewer (close to the 68% expected within one standard deviation of normally distributed data). As with Ultimate, though, more games fall at the extremes than this normal distribution would predict. Still, run lines are always set at +/-1.5, so the range I'm most concerned with is 1 to 2 runs. So I'm sticking with an SD of 3.

SD for MOV:
NBA:      10.5
NCAAB:    10
NFL:      10.5
NCAAF:    13.5
NHL:      2
MLB:      3
Soccer:   1.5
Ultimate: 4
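The line calculation described above (inverse normal of the win percentage, scaled by the league's SD) can be sketched with Python's standard library. The SD values are the ones listed above. The function names are mine, and `cover_probability` is an extra step that assumes the actual margin is normally distributed around the predicted margin with the same SD.

```python
from statistics import NormalDist

SD_BY_LEAGUE = {
    "NBA": 10.5, "NCAAB": 10, "NFL": 10.5, "NCAAF": 13.5,
    "NHL": 2, "MLB": 3, "Soccer": 1.5, "Ultimate": 4,
}


def predicted_margin(win_prob, league):
    """Predicted MOV: inverse standard normal CDF of the win probability times SD."""
    return SD_BY_LEAGUE[league] * NormalDist().inv_cdf(win_prob)


def cover_probability(win_prob, league, line=1.5):
    """Chance of winning by more than `line`, under the normal-margin assumption."""
    sd = SD_BY_LEAGUE[league]
    margin = NormalDist(mu=predicted_margin(win_prob, league), sigma=sd)
    return 1 - margin.cdf(line)
```

For instance, `predicted_margin(0.6, "MLB")` gives the predicted margin for a 60% favorite, and `cover_probability(0.6, "MLB")` estimates how often that favorite covers a -1.5 run line. With `line=0` the function returns the win probability itself, so the two steps are consistent with each other.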

Monday, June 2, 2014

Simulating the 2014 World Cup

I adapted my Monte Carlo simulator (not very efficiently: it took about 1,500 lines of code) to simulate this year’s World Cup 10,000 times. Its input was ESPN’s Soccer Power Index.

Below are the results, including the average points each team earned in the group stage and the percentage of simulations in which each team placed 1st in its group, placed 2nd, advanced out of the group (simply these 2 numbers added together), reached the quarterfinals, semifinals, and final, and won the entire tournament.
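The group-stage quantities listed above (average points, share of 1st and 2nd place finishes, and their sum) can be tallied with a loop like the one below. This is a heavily simplified sketch, not the 1,500-line simulator. The ratings are placeholders (not SPI values), the Poisson scoring model and `expected_goals` constants are assumptions, and ties in the standings are broken by goal differential and then at random.

```python
import itertools
import math
import random

# Placeholder ratings for one four-team group; not actual SPI values.
RATINGS = {"Team A": 85.0, "Team B": 80.0, "Team C": 75.0, "Team D": 70.0}


def poisson(rng, mean):
    """Draw a Poisson-distributed goal count using Knuth's method."""
    limit, goals, running = math.exp(-mean), 0, 1.0
    while True:
        running *= rng.random()
        if running <= limit:
            return goals
        goals += 1


def expected_goals(own, opponent):
    """Hypothetical rating-to-goals mapping; the constants are placeholders."""
    return max(0.2, 1.3 + (own - opponent) / 20.0)


def play_group(ratings, rng):
    """Play one round robin; return the finishing order and each team's points."""
    points = dict.fromkeys(ratings, 0)
    goal_diff = dict.fromkeys(ratings, 0)
    for a, b in itertools.combinations(ratings, 2):
        goals_a = poisson(rng, expected_goals(ratings[a], ratings[b]))
        goals_b = poisson(rng, expected_goals(ratings[b], ratings[a]))
        goal_diff[a] += goals_a - goals_b
        goal_diff[b] += goals_b - goals_a
        if goals_a > goals_b:
            points[a] += 3
        elif goals_b > goals_a:
            points[b] += 3
        else:
            points[a] += 1
            points[b] += 1
    order = sorted(
        ratings,
        key=lambda team: (points[team], goal_diff[team], rng.random()),
        reverse=True,
    )
    return order, points


def simulate(ratings, trials=10000, seed=2014):
    """Repeat the group many times and summarise each team's results."""
    rng = random.Random(seed)
    tally = {team: {"points": 0, "first": 0, "second": 0} for team in ratings}
    for _ in range(trials):
        order, points = play_group(ratings, rng)
        tally[order[0]]["first"] += 1
        tally[order[1]]["second"] += 1
        for team in ratings:
            tally[team]["points"] += points[team]
    return {
        team: {
            "avg_points": t["points"] / trials,
            "first": t["first"] / trials,
            "second": t["second"] / trials,
            "advance": (t["first"] + t["second"]) / trials,
        }
        for team, t in tally.items()
    }
```

`simulate(RATINGS)` returns one summary per team. Extending this to the knockout rounds means feeding each group's top two into a bracket and repeating the same counting for each later round.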

Sunday, June 1, 2014

This World Cup’s Multiple Groups of Death

Group G, which consists of Germany, Portugal, United States, and Ghana, was dubbed this World Cup’s “Group of Death” as soon as the draw was announced. However, Group G might actually be the 3rd toughest group: Group B and Group D also have legitimate claims to the “Group of Death” title.

This depends upon how the toughness of a group is gauged: the group with the highest average team rating, or the group that is the “deepest”? Per ESPN’s Soccer Power Index, the highest average rating is found in Group B, the second highest is Group D’s, and then Group G follows in third. In the case of Group B, the first three teams are stacked: at least one of Spain, Chile, or Netherlands will not advance. However, Australia has the second-lowest rating of the tournament (Algeria is last). 

If you consider “toughness” by how deep the groups are, Group D has the second highest average rating, and also has the highest rating for its “worst” team. Costa Rica trails Uruguay, England, and Italy (again, at least one of these teams will not advance), but would be the second-best team in Group A, and third-best in Groups C, F, and H. By this measure, Group G is the second-toughest “Group of Death”, as Ghana is no lightweight either.
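The two ways of gauging toughness discussed here, highest average rating versus highest rating for the weakest team, are easy to compute side by side. The sketch below uses invented group names and ratings rather than actual SPI values, chosen so that the two measures disagree.

```python
from statistics import mean

# Placeholder groups and ratings, not actual SPI values.
GROUPS = {
    "Group X": {"Team 1": 88.0, "Team 2": 86.0, "Team 3": 85.0, "Team 4": 68.0},
    "Group Y": {"Team 5": 84.0, "Team 6": 83.0, "Team 7": 82.0, "Team 8": 77.0},
}


def toughness(groups):
    """Average rating and weakest-team rating for each group."""
    return {
        name: {
            "average": mean(list(ratings.values())),
            "weakest": min(ratings.values()),
        }
        for name, ratings in groups.items()
    }


def rank_by(groups, measure):
    """Group names ordered from toughest to easiest on the chosen measure."""
    scores = toughness(groups)
    return sorted(scores, key=lambda name: scores[name][measure], reverse=True)
```

Here `rank_by(GROUPS, "average")` puts the top-heavy Group X first, while `rank_by(GROUPS, "weakest")` puts the deeper Group Y first, which mirrors the Group B versus Group D argument.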