Thursday, May 28, 2015

Stephen Curry Will Likely Break Danny Green's NBA Finals Record

Just two years ago, Danny Green broke Ray Allen's NBA Finals record for 3-pointers made, hitting 23 in the 7-game series against the Miami Heat. However, his record may not last long...

Stephen Curry has been incredible in this year's edition of the NBA Playoffs, averaging 4.9 threes made a game. He's already broken Reggie Miller's overall playoff record, and had done so in only 13 games (Miller's previous record came in 22). So, it seems natural to think Curry has a good chance to break Green's NBA Finals record as well. But just how good of a chance?

If the series goes to 5 games or more, Curry should be expected to hit more than 23 threes (4.9*5 = 24.5). But to determine exactly how likely he is to achieve this mark, I wrote a simulator (in Python) to simulate both the series itself and how many threes Curry makes.

Overall, he breaks Green's record 62.68% of the time, and ties it another 5.73% of the time (for a total likelihood of 68.41% that he'll finish the series as the co-record holder or better). He also finishes as NBA champion 88.47% of the time.

Intuitively, the longer the series goes, the more often the record is broken:

# of Games | Breaks Record

That's not a misprint: if the NBA Finals goes the full seven games, Curry breaks the record an incredible 99% of the time. And if each team wins at least one game, it's more likely than not that he gets the job done.
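
The simulator itself isn't shown here, but the general idea can be sketched roughly as follows. This is only an illustration under assumptions that do not come from the analysis above: a fixed per-game win probability for the Warriors (`P_WARRIORS_WIN`, a made-up 0.70) and Poisson-distributed threes per game with a mean of 4.9. Its output will not reproduce the exact percentages quoted above.

```python
import math
import random
from collections import defaultdict

RECORD = 23             # Finals record cited in the post
THREES_PER_GAME = 4.9   # Curry's playoff average cited in the post
P_WARRIORS_WIN = 0.70   # ASSUMPTION: hypothetical per-game win probability


def poisson(rng, mean):
    """Draw a Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_series(rng):
    """Play one best-of-seven; return (games, Curry's threes, Warriors won?)."""
    wins = losses = threes = 0
    while wins < 4 and losses < 4:
        if rng.random() < P_WARRIORS_WIN:
            wins += 1
        else:
            losses += 1
        # ASSUMPTION: threes in each game are independent Poisson draws
        threes += poisson(rng, THREES_PER_GAME)
    return wins + losses, threes, wins == 4


def run(n_sims=50_000, seed=2015):
    """Tally how often the record falls, overall and by series length."""
    rng = random.Random(seed)
    broke = tied = titles = 0
    by_length = defaultdict(lambda: [0, 0])  # games -> [series, record broken]
    for _ in range(n_sims):
        games, threes, champion = simulate_series(rng)
        broke += threes > RECORD
        tied += threes == RECORD
        titles += champion
        by_length[games][0] += 1
        by_length[games][1] += threes > RECORD
    summary = {
        "breaks": broke / n_sims,
        "ties": tied / n_sims,
        "champion": titles / n_sims,
    }
    rates = {g: hits / count for g, (count, hits) in sorted(by_length.items())}
    return summary, rates


if __name__ == "__main__":
    summary, rates = run()
    print(summary)
    for games, rate in rates.items():
        print(f"{games} games: record broken in {rate:.1%} of series")
```

Swapping in a different per-game win probability or a different distribution for threes (for example, sampling from Curry's actual game logs) would change the numbers, so treat the output as a rough feel for the method and not a replication.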

Tuesday, May 26, 2015

How Often We Played Brawl, Mario Party, etc During College

After considering how my college friends and I still play Backyard Football 2002 13 years after it came out, I wondered whether it was easily the most cost effective video game I/my parents had ever purchased. I don't have the data to see how much I played it in the past 4 years (since unfortunately I don't play it through Steam), but I do have the entire day-by-day log of my Wii for comparison to get a feel for determining if I'm correct.

I'm not.

I figured we played Mario Party the most, since we routinely would play it for 3+ hours straight. However, I was very, very, very, very wrong. I GREATLY underestimated how much my roommate (and I, and our friends, but mainly my roommate) had played Brawl over the past 4 years:

Game | Minutes | Hours
Super Smash Bros. Brawl | 18483 | 308.05
Mario Party 2 | 3352 | 55.87
Super Mario 64 | 1318 | 21.97
Mario Kart 64 | 390 | 6.50

The 18,483 minutes of Brawl are equal to 12 days, 20 hours, and 3 minutes. That's almost 2 full weeks of gameplay, and the vast majority (89%) occurred during freshman and sophomore year:

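Game | Freshman | Sophomore | Junior | Senior
Mario Party 2 | 493 | 908 | 549 | 1402
Super Mario 64 | 231 | 455 | 612 | 20
Mario Kart 64 | 55 (freshman and sophomore combined) | 264 | 71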

Here's some more context that makes this even more obscene:
  • I only had Brawl for one semester in each of those first two years: spring semester freshman year, and fall semester sophomore year
  • This includes a stretch of 17 straight days in the month of November 2012 in which Brawl was played, totaling 28.83 hours in that time period alone
  • There were only 9 days between August 18, 2012 (the start of school) and November 20, 2012 (Thanksgiving break, when I "lost" the game) in which Brawl was not played; this is when the entirety of the 9,296 sophomore year minutes were played. That's 83 out of 92 days, averaging 112 minutes per day
  • The max time played in one day was 9 hours, 10 minutes (for Brawl)
I may have been wrong about which game we played the most, but I was right that we played Mario Party longer in each sitting: per day played, Mario Party averaged 139.67 minutes versus 116.98 minutes for Brawl.
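
For anyone who wants to check the arithmetic, here is a small sketch of the two conversions used above. The helper name `split_minutes` is just something made up for this example; the inputs are the figures quoted in the post.

```python
def split_minutes(total_minutes):
    """Break a minute count into whole days, hours, and minutes."""
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return days, hours, minutes


# Total Brawl playtime quoted in the post
print(split_minutes(18_483))   # (12, 20, 3)

# Sophomore-year Brawl minutes spread over the 83 days it was played
print(9_296 / 83)              # 112.0
```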

Sunday, May 24, 2015

The SJ Giants' Recent Win Streak

Continuing the recent theme of "unlikely things to happen to the San Jose Giants since I started working there": prior to my first game of the season, the Giants were 12-26, sitting firmly in last place, with a win percentage of 0.316. Since then, they've gone 5-0, and I've yet to witness a loss this season.

Just how unlikely was this win streak? By using the win-loss records of the Giants and their next two opponents (the Modesto Nuts and the Rancho Cucamonga Quakes) prior to the 5-game win streak, and then calculating the standard Log5 formula to estimate "the probability that team A will win (the) game" for each game, I determined this likelihood directly:

Opponent | Modesto | Modesto | Modesto | R. Cucamonga | R. Cucamonga
Game | Win G1 | Win G2 | Win G3 | Win G4 | Win G5
Game Prob | 29.35% | 29.35% | 29.35% | 20.00% | 20.00%
Cumulative Prob | 29.35% | 8.61% | 2.53% | 0.51% | 0.10%

So prior to this homestand, the Giants had a 0.10% chance of winning every one of those 5 games. They now sit a half game out of 4th place, and we'll see if the good luck continues.
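
Here is a rough sketch of the Log5 calculation behind the table. The Giants' 12-26 record comes from the post; the opponents' records are not given above, so the values for `modesto` and `rancho_cucamonga` below are assumptions picked only because they reproduce the 29.35% and 20.00% per-game figures. The streak probability also assumes the five games are independent.

```python
def log5(p_a, p_b):
    """Log5 estimate of the probability that team A beats team B,
    given each team's overall win percentage."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


giants = 12 / 38  # 12-26 record from the post

# ASSUMPTION: hypothetical opponent win percentages, back-solved so that
# they match the per-game probabilities in the table.
modesto = 20 / 38
rancho_cucamonga = 24 / 37

# Three games against Modesto, then two against Rancho Cucamonga
schedule = [modesto] * 3 + [rancho_cucamonga] * 2

cumulative = 1.0
for game, opponent in enumerate(schedule, start=1):
    p_win = log5(giants, opponent)
    cumulative *= p_win  # treats each game as independent
    print(f"G{game}: win {p_win:.2%}, streak so far {cumulative:.2%}")
```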

Saturday, May 23, 2015

How the Odds of Two Teams Having Identical Box Scores Change As Runs Scored Increase

The previous matching box score simulation brought up yet another research question: how does the frequency of this occurrence change with the number of runs scored by both teams? In other words, is a low-scoring game more likely to have matching runs inning-by-inning, or do more runs lead to more matches?

For simplicity (at least relative simplicity), we'll assume the visiting team is shut out (which is the single most likely outcome, and the math is straightforward on this):

Note: Data from Baseball-Reference

So for now we'll focus on the other team (in this case the home team, the Giants). Overall, the total probability two teams match is 0.6022%. Over all runs, the results are as follows:

Total Runs | Total Prob | Frequency

As seen in the chart below, the likelihood increases up to 3 runs, and decreases as runs increase from there (with the final bar showing the frequency of 10 or more runs):

There is in fact a pattern to this distribution, and the (still very small) chances of the scores matching do change with respect to the total number of runs scored.

Back to the instance in which one team is shut out: 6.2057% * 6.2057% is 0.3851%, but it seems two teams in two different games get shut out more often than once in every 260 tries. However, this 1 in 260 refers to the number of pairs, so consider a typical day of baseball: there are 15 games, and 1 team has to win (and thus score 1 or more runs) in each game, which leaves us with 15 possible teams to match. 15 choose 2 is represented as follows:
$$\binom{15}{2} = \frac{15!}{2!\,(15-2)!}$$
which is equal to 105. So there are actually 105 possible pairs in any given (typical) night of MLB games, meaning we should see 0.003851 * 105 = 0.404 matches of shutouts on average. That's about 1 every 2.5 nights, which seems to fit reality.
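
The pair-counting arithmetic can be checked with a few lines of Python. This is just a sketch using the figures quoted above; the variable names are my own.

```python
import math

p_shutout = 0.062057               # chance a given team is shut out (from above)
p_pair = p_shutout ** 2            # chance two specific teams are both shut out

pairs_per_night = math.comb(15, 2)  # ways to pick 2 of the 15 possible teams
expected_matches = p_pair * pairs_per_night

print(pairs_per_night)                # 105
print(round(1 / p_pair))              # 260
print(round(expected_matches, 3))     # 0.404
print(round(1 / expected_matches, 1)) # 2.5 (nights per expected match)
```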

"What are the odds?" That Any Two Teams Share an Identical Box Score

Yesterday I wrote of the 1 in 411,577,489 occurrence of the San Francisco Giants sharing the same inning-by-inning run totals as one of their minor league affiliates, the San Jose Giants. However, I only looked at the odds of that exact box score (a 2-0 final, with runs in the bottom of the 3rd and bottom of the 8th) happening, honestly because the math was more straightforward. But the more interesting question is the probability that ANY set of inning-by-inning run totals is shared by an MLB team and one of their affiliates.

To do this, I wrote a simulator (in Python), using the figures from Baseball-Reference as the basis for each inning's runs scored probability, with the aim being to see how often any two teams would match their run totals inning-for-inning. Over 1 million simulations (I assumed this wouldn't occur very often), two teams (in different games) completely matched their scoring 0.6022% of the time (so it actually happened more often than I expected). This means that the probability of both teams in both games matching their scoring would be 0.003626%, or 1 in 27,576.

As before, there's a 1/15 chance that these two matching box scores would involve an MLB team and its affiliate, so (1/15) * (1/27576) = 0.000242%, or 1 in 413,628. This is still an extremely rare occurrence, but not as extreme as the 1 in 411,577,489 chance of the exact box score that occurred the other day.
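
For a feel of how such a simulation might work, here is a minimal sketch. It is not the original simulator: the per-inning run probabilities in `PROBS` are made up for illustration (the real ones came from Baseball-Reference and are not reproduced in this post), every line score is treated as exactly 9 independent innings, and unplayed bottom-of-the-9th innings and extra innings are ignored.

```python
import random

# ASSUMPTION: hypothetical chances of scoring 0, 1, 2, 3, or 4 runs in an
# inning; these are placeholders, not the Baseball-Reference figures.
RUNS = [0, 1, 2, 3, 4]
PROBS = [0.73, 0.15, 0.07, 0.03, 0.02]
INNINGS = 9


def line_score(rng):
    """Simulate one team's inning-by-inning run totals."""
    return tuple(rng.choices(RUNS, weights=PROBS, k=INNINGS))


def match_frequency(n_trials, seed=0):
    """Fraction of trials in which two independent line scores are identical."""
    rng = random.Random(seed)
    matches = sum(line_score(rng) == line_score(rng) for _ in range(n_trials))
    return matches / n_trials


def exact_match_probability():
    """Closed form under the same assumptions: two innings match with
    probability sum(p^2), and all 9 innings must match."""
    per_inning = sum(p * p for p in PROBS)
    return per_inning ** INNINGS


if __name__ == "__main__":
    one_side = match_frequency(100_000)
    print(f"simulated: {one_side:.4%}, exact: {exact_match_probability():.4%}")
    # Both sides of both box scores matching, then the 1/15 affiliate factor
    both_sides = one_side ** 2
    print(f"both sides: 1 in {1 / both_sides:,.0f}")
    print(f"MLB team and its affiliate: 1 in {15 / both_sides:,.0f}")
```

With independent innings the simulation isn't strictly necessary, since the closed form gives the same answer; it mainly serves as a sanity check and would still work if the inning probabilities were made to depend on the inning number.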