Monday, August 24, 2020

"What are the odds?" An MLB Team Hits a Grand Slam Four Games in a Row

Last week, the San Diego Padres hit a grand slam in four consecutive games (all against the Texas Rangers) - something that had never been done before:


As CBS Sports points out above, there have been roughly 407,000 games in MLB history - so what were the odds that it hadn't happened yet?

I simulated each game of the Padres/Rangers series (2 in Arlington, 2 in San Diego) and estimated the probability of each team hitting a grand slam in each game as:


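| Game | Team | Prob. of Grand Slam |
|---|---|---|
| 1 | SDP | 5.64% |
| 1 | TEX | 2.27% |
| 2 | SDP | 4.77% |
| 2 | TEX | 2.16% |
| 3 | SDP | 2.58% |
| 3 | TEX | 1.63% |
| 4 | SDP | 3.17% |
| 4 | TEX | 1.54% |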

Multiplying the four per-game probabilities together for each team results in a truly unlikely series of events:

| Team | Prob. of 4 in a Row | 1 in... |
|---|---|---|
| SDP | 0.000220% | 454,489 |
| TEX | 0.000012% | 8,124,789 |
| Avg | 0.000116% | 860,825 |

So over the course of Major League history, is it truly unlikely that it hadn't happened yet? There are slightly fewer four-game stretches than games, since the first three games of a season can't complete a streak of four. As an estimate, I removed 3 games per year times 117 years = 351 games from the 407,000, which leaves approximately 406,649 sequences of four games in a row.

Using this San Diego/Texas series as a proxy, there is roughly a 99.999884% chance that a four game stretch does NOT have a grand slam in each game (1 - 0.000116%).

So 99.999884% ^ 406,649 four-game sequences = 62.35% - the odds that this had not happened yet. In other words, there was a 62.35% chance of making it this far into MLB history without the feat occurring, and only a 37.65% chance that it would already have happened.
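The arithmetic above can be reproduced in a few lines. This is only a sketch of the calculation as described in the post, using the rounded per-game percentages from the table, so the last digits may differ slightly; the variable names are my own.

```python
from math import prod

# Per-game grand slam probabilities from the table above (rounded values).
sdp = [0.0564, 0.0477, 0.0258, 0.0317]
tex = [0.0227, 0.0216, 0.0163, 0.0154]

# Probability of a grand slam in all four games, per team and averaged.
p_sdp = prod(sdp)
p_tex = prod(tex)
p_avg = (p_sdp + p_tex) / 2

# Approximate number of four-game sequences in MLB history:
# drop 3 games per season that cannot complete a streak of four.
total_games = 407_000
seasons = 117
sequences = total_games - 3 * seasons

# Chance that none of those sequences produced the feat,
# treating every sequence as independent (a simplification).
p_never = (1 - p_avg) ** sequences
p_at_least_once = 1 - p_never

print(f"SDP: 1 in {1 / p_sdp:,.0f}")
print(f"TEX: 1 in {1 / p_tex:,.0f}")
print(f"Avg: 1 in {1 / p_avg:,.0f}")
print(f"Sequences: {sequences:,}")
print(f"Not yet happened: {p_never:.2%}, already happened: {p_at_least_once:.2%}")
```

Treating overlapping four-game sequences as independent is the same simplification the estimate above makes; it is good enough for a rough answer.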

Therefore, it might be fair to guess that the baseball gods were punishing the Rangers for griping about Tatis hitting the first grand slam on a 3-0 count late in a blowout.

Sunday, August 23, 2020

Effect of Serve Rules/Scoring in Ping Pong/Table Tennis

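In ping pong, there seem to be three primary rule variations around switching which player serves:
  • Switch serve every 2 points
  • Switch serve every 5 points
  • Switch serve when the server loses the point, with no point recorded (only the server can score)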

How do these rule variations benefit/hurt the better player?

In all cases, I'm going to assume a best-of-5 series (first player to win 3 games wins), and each game is to 11, win by 2.

To calibrate the simulator, I'm assigning a "favorite" and an "underdog", where the "favorite" has a slightly higher chance of winning a point when serving than when returning:
  • Favorite wins point on serve: 55%
  • Favorite wins point on return: 50%
I then ran each rule set described above 10,000 times, with the "favorite" serving first. The probability the "favorite" wins a best-of-5 match:
  • Switch every 2 points: 73.83%
  • Switch every 5 points: 73.93%
  • Switch serve on lost point, no point recorded: 76.69%
So switching serve when the returning player wins the point (and not recording a point) is a clear advantage to the better player, worth roughly 3 percentage points here. It effectively lengthens the game, since points are only recorded while serving. It's a well-known phenomenon that the shorter the game, the more randomness is exhibited, and the better the underdog's chances.

However, this advantage shrinks when the "underdog" serves first, and only the third rule set is noticeably affected:
  • Switch every 2 points: 73.58%
  • Switch every 5 points: 73.77%
  • Switch serve on lost point, no point recorded: 74.39%
The first two serving patterns (switching every 2 or 5 points) are fairer, since switching serve is independent of who scored points, which results in virtually the same win probability regardless of who serves first.
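For anyone who wants to experiment with these rule sets, here is a minimal Monte Carlo sketch. It is not the simulator used for the numbers above and will not necessarily reproduce them. The function names are hypothetical, and it makes two assumptions the post does not spell out: the fixed rotation continues unchanged at deuce, and the same player serves first in every game of a match.

```python
import random

P_FAV_SERVE = 0.55   # favorite wins the point on serve (from the post)
P_FAV_RETURN = 0.50  # favorite wins the point on return (from the post)


def play_game(switch_every, fav_serves_first, rng, target=11):
    """Play one game to `target`, win by 2; return True if the favorite wins.

    switch_every: 2 or 5 for a fixed rotation, or None for the
    "switch serve on lost point, no point recorded" rule.
    """
    fav = dog = 0
    fav_serving = fav_serves_first
    rallies = 0
    while not (max(fav, dog) >= target and abs(fav - dog) >= 2):
        p = P_FAV_SERVE if fav_serving else P_FAV_RETURN
        fav_won = rng.random() < p
        if switch_every is None:
            if fav_won == fav_serving:
                # Server won the rally, so the server scores.
                if fav_serving:
                    fav += 1
                else:
                    dog += 1
            else:
                # Returner won: serve changes hands, nobody scores.
                fav_serving = not fav_serving
        else:
            if fav_won:
                fav += 1
            else:
                dog += 1
            rallies += 1
            # Assumption: rotation stays the same even at deuce.
            if rallies % switch_every == 0:
                fav_serving = not fav_serving
    return fav > dog


def play_match(switch_every, fav_serves_first, rng, games_to_win=3):
    """Best-of-5 by default; assumes the same first server every game."""
    fav_games = dog_games = 0
    while fav_games < games_to_win and dog_games < games_to_win:
        if play_game(switch_every, fav_serves_first, rng):
            fav_games += 1
        else:
            dog_games += 1
    return fav_games == games_to_win


def favorite_win_rate(switch_every, fav_serves_first, n_matches=10_000, seed=0):
    """Fraction of simulated matches won by the favorite."""
    rng = random.Random(seed)
    wins = sum(play_match(switch_every, fav_serves_first, rng)
               for _ in range(n_matches))
    return wins / n_matches


if __name__ == "__main__":
    rules = [("every 2", 2), ("every 5", 5), ("on lost point", None)]
    for label, rule in rules:
        for first in (True, False):
            rate = favorite_win_rate(rule, fav_serves_first=first)
            print(f"{label:>14} | favorite serves first={first}: {rate:.2%}")
```

With 10,000 matches per setting, the sampling error on each win rate is roughly half a percentage point, so small gaps between the two fixed-rotation rules should be read as noise.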

Friday, August 21, 2020

NBA Playoffs: Comparing Simulation Output vs SRS Model

Originally, I ran my play-by-play NBA simulator on this year's playoffs to estimate each team's chances, and then separately simplified those results to an SRS model so each team could easily be directly compared.

But if I run that SRS model back through the simulator, how would the predictions change?

The original projections were:


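| Seed | Conference | Team | Round 2 | Conf Finals | Finals | Champion |
|---|---|---|---|---|---|---|
| 1 | East | MIL | 84.4% | 49.2% | 34.4% | 22.4% |
| 8 | East | ORL | 15.6% | 3.1% | 0.9% | 0.3% |
| 4 | East | IND | 18.3% | 4.1% | 1.4% | 0.4% |
| 5 | East | MIA | 81.7% | 43.6% | 28.8% | 17.5% |
| 3 | East | BOS | 65.0% | 32.2% | 11.0% | 4.8% |
| 6 | East | PHI | 35.0% | 10.3% | 2.4% | 0.7% |
| 2 | East | TOR | 83.0% | 51.5% | 20.0% | 10.1% |
| 7 | East | BKN | 17.0% | 5.9% | 1.2% | 0.3% |

| Seed | Conference | Team | Round 2 | Conf Finals | Finals | Champion |
|---|---|---|---|---|---|---|
| 1 | West | LAL | 62.5% | 28.2% | 13.6% | 5.4% |
| 8 | West | POR | 37.5% | 12.5% | 4.1% | 1.1% |
| 4 | West | HOU | 44.3% | 23.8% | 11.0% | 4.4% |
| 5 | West | OKC | 55.7% | 35.5% | 19.7% | 9.4% |
| 3 | West | DEN | 46.3% | 12.8% | 4.7% | 1.5% |
| 6 | West | UTA | 53.7% | 19.6% | 8.4% | 3.0% |
| 2 | West | LAC | 60.6% | 43.6% | 26.3% | 13.5% |
| 7 | West | DAL | 39.4% | 24.0% | 12.3% | 5.2% |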

The SRS model then gave these relative ratings:

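| Team | MMult | Matrix Rank |
|---|---|---|
| MIA | 3.74 | 1 |
| MIL | 3.72 | 2 |
| LAC | 3.11 | 3 |
| OKC | 2.32 | 4 |
| TOR | 2.05 | 5 |
| DAL | 1.76 | 6 |
| LAL | 1.44 | 7 |
| HOU | 1.09 | 8 |
| BOS | 0.61 | 9 |
| UTA | -0.18 | 10 |
| DEN | -2.18 | 11 |
| POR | -2.37 | 12 |
| PHI | -3.12 | 13 |
| IND | -3.51 | 14 |
| ORL | -3.68 | 15 |
| BKN | -4.10 | 16 |

So I then have to run these ratings through a Log5-style conversion: take the difference in ratings as the expected margin of victory, convert it to a single-game win probability using a normal distribution with a standard deviation of 13.47 points for NBA games, and then simulate each round again (or do the math explicitly).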

For example, take the LAC/DAL series. The original simulation output had:
  • LAC single game win probability: 54.88%
  • Average MOV: 1.65
  • Over a 7 game series, this is equivalent to: 60.57% series win probability
Now let's take the above ratings. We have to invert the first calculation:
  • LAC rating - DAL rating = 3.11 - 1.76: 1.35 average MOV
  • Normal distribution; mean = 0, standard deviation = 13.47, x = 1.35: 53.99% LAC single game win probability
  • Over a 7 game series, this is equivalent to: 58.67% series win probability
    • The full math on this is at the end of this post
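The two conversions in the LAC/DAL example can be sketched as follows. The function names are my own; the only inputs taken from the post are the ratings, the 13.47 standard deviation, and the neutral-court assumption that the same single-game probability applies to every game of the series.

```python
from math import comb, erf, sqrt

NBA_MOV_SD = 13.47  # standard deviation of NBA margin of victory (from the post)


def win_prob_from_mov(mov, sd=NBA_MOV_SD):
    """Normal CDF with standard deviation `sd`, evaluated at the expected margin."""
    return 0.5 * (1 + erf(mov / (sd * sqrt(2))))


def series_win_prob(p, wins_needed=4):
    """Chance of winning a best-of-(2*wins_needed - 1) series.

    Sums over k = number of losses before the clinching win;
    comb(wins_needed - 1 + k, k) counts where those losses can fall.
    """
    q = 1 - p
    return sum(comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
               for k in range(wins_needed))


# LAC rating minus DAL rating gives the expected margin of victory.
p_game = win_prob_from_mov(3.11 - 1.76)
p_series = series_win_prob(p_game)
print(f"LAC single game: {p_game:.2%}, LAC series: {p_series:.2%}")

# Same check against the original simulation output (MOV 1.65, 54.88% per game).
print(f"From MOV 1.65: {win_prob_from_mov(1.65):.2%}")
print(f"Series at 54.88%: {series_win_prob(0.5488):.2%}")
```

Because the ratings are rounded to two decimals, the last digit can differ slightly from the figures quoted above.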
Running this through the playoff bracket gives the following probabilities:

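| Seed | Conference | Team | Round 2 | Conf Finals | Finals | Champion |
|---|---|---|---|---|---|---|
| 1 | East | MIL | 88.5% | 48.1% | 32.1% | 19.9% |
| 8 | East | ORL | 11.5% | 1.8% | 0.4% | 0.1% |
| 4 | East | IND | 12.0% | 2.0% | 0.5% | 0.1% |
| 5 | East | MIA | 88.0% | 48.0% | 32.1% | 20.0% |
| 3 | East | BOS | 72.8% | 33.9% | 11.0% | 4.7% |
| 6 | East | PHI | 27.2% | 7.0% | 1.0% | 0.2% |
| 2 | East | TOR | 84.1% | 54.6% | 22.3% | 11.5% |
| 7 | East | BKN | 15.9% | 4.5% | 0.5% | 0.1% |

| Seed | Conference | Team | Round 2 | Conf Finals | Finals | Champion |
|---|---|---|---|---|---|---|
| 1 | West | LAL | 73.2% | 34.9% | 17.0% | 7.0% |
| 8 | West | POR | 26.8% | 6.7% | 1.8% | 0.4% |
| 4 | West | HOU | 42.1% | 22.8% | 10.6% | 4.1% |
| 5 | West | OKC | 57.9% | 35.6% | 19.3% | 9.0% |
| 3 | West | DEN | 37.3% | 8.3% | 2.3% | 0.5% |
| 6 | West | UTA | 62.7% | 20.7% | 8.2% | 2.6% |
| 2 | West | LAC | 58.7% | 43.5% | 26.4% | 13.6% |
| 7 | West | DAL | 41.3% | 27.6% | 14.4% | 6.2% |

This gives a strange result: the Bucks are barely more likely than the Heat to reach the conference finals, yet the Heat are slightly more likely to make the Finals and win it all. The Bucks are marginally more likely to win their first-round series, while the Heat are the slightest of favorites in each game against the Bucks.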
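Doing the bracket math explicitly, rather than simulating, might look something like the sketch below. It is my own illustration rather than the code behind the table: it assumes a fixed bracket in the seed order shown (1 v 8, 4 v 5, 3 v 6, 2 v 7), a neutral court, and the rounded ratings listed earlier, so its output should land close to the table without matching every digit. All function names are hypothetical.

```python
from math import comb, erf, sqrt

SD = 13.47  # standard deviation of NBA margin of victory (from the post)
RATINGS = {
    "MIA": 3.74, "MIL": 3.72, "LAC": 3.11, "OKC": 2.32, "TOR": 2.05,
    "DAL": 1.76, "LAL": 1.44, "HOU": 1.09, "BOS": 0.61, "UTA": -0.18,
    "DEN": -2.18, "POR": -2.37, "PHI": -3.12, "IND": -3.51, "ORL": -3.68,
    "BKN": -4.10,
}
# Bracket order: adjacent teams meet in round 1, adjacent winners in round 2, etc.
EAST = ["MIL", "ORL", "IND", "MIA", "BOS", "PHI", "TOR", "BKN"]
WEST = ["LAL", "POR", "HOU", "OKC", "DEN", "UTA", "LAC", "DAL"]


def game_prob(a, b):
    """Single-game win probability for team a over team b (normal CDF)."""
    z = (RATINGS[a] - RATINGS[b]) / SD
    return 0.5 * (1 + erf(z / sqrt(2)))


def series_prob(p, wins=4):
    """Best-of-7 series win probability from a single-game probability."""
    q = 1 - p
    return sum(comb(wins - 1 + k, k) * p**wins * q**k for k in range(wins))


def advance(slots):
    """Pair adjacent slots; each slot maps team -> probability of being there."""
    next_slots = []
    for i in range(0, len(slots), 2):
        left, right = slots[i], slots[i + 1]
        winner = {}
        for a, pa in left.items():
            winner[a] = pa * sum(pb * series_prob(game_prob(a, b))
                                 for b, pb in right.items())
        for b, pb in right.items():
            winner[b] = pb * sum(pa * series_prob(game_prob(b, a))
                                 for a, pa in left.items())
        next_slots.append(winner)
    return next_slots


def bracket_odds(order):
    """Return one dict per round: team -> probability of winning that round."""
    slots = [{team: 1.0} for team in order]
    rounds = []
    while len(slots) > 1:
        slots = advance(slots)
        merged = {}
        for slot in slots:
            merged.update(slot)
        rounds.append(merged)
    return rounds


# rounds[0] = reach Round 2, [1] = Conf Finals, [2] = Finals, [3] = Champion
rounds = bracket_odds(EAST + WEST)
for team in EAST + WEST:
    print(team, "  ".join(f"{r[team]:6.1%}" for r in rounds))
```

Carrying a probability distribution through each bracket slot avoids sampling noise entirely, which makes small gaps like the MIL/MIA one easier to trust.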

Nevertheless, we get different results! Directionally they're almost the same (same picks in the first and second rounds), but there are large differences in magnitude in the early rounds. The differences (SRS-based run minus original simulation) are:

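| Seed | Conference | Team | Round 2 | Conf Finals | Finals | Champion |
|---|---|---|---|---|---|---|
| 1 | East | MIL | 4.1% | -1.0% | -2.3% | -2.5% |
| 8 | East | ORL | -4.1% | -1.3% | -0.5% | -0.2% |
| 4 | East | IND | -6.3% | -2.1% | -0.9% | -0.3% |
| 5 | East | MIA | 6.3% | 4.5% | 3.3% | 2.5% |
| 3 | East | BOS | 7.8% | 1.7% | -0.1% | -0.2% |
| 6 | East | PHI | -7.8% | -3.3% | -1.3% | -0.5% |
| 2 | East | TOR | 1.0% | 3.0% | 2.4% | 1.4% |
| 7 | East | BKN | -1.0% | -1.4% | -0.7% | -0.2% |

| Seed | Conference | Team | Round 2 | Conf Finals | Finals | Champion |
|---|---|---|---|---|---|---|
| 1 | West | LAL | 10.6% | 6.6% | 3.4% | 1.6% |
| 8 | West | POR | -10.6% | -5.8% | -2.3% | -0.7% |
| 4 | West | HOU | -2.2% | -1.0% | -0.4% | -0.3% |
| 5 | West | OKC | 2.2% | 0.1% | -0.4% | -0.4% |
| 3 | West | DEN | -9.0% | -4.5% | -2.3% | -1.0% |
| 6 | West | UTA | 9.0% | 1.0% | -0.1% | -0.4% |
| 2 | West | LAC | -1.9% | -0.1% | 0.1% | 0.1% |
| 7 | West | DAL | 1.9% | 3.6% | 2.1% | 1.0% |

Calculating Series Probability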

Neutral court makes this calculation much easier - we can just calculate each possible outcome (winning in 4, 5, 6, or 7 games).

Take our LAC/DAL example: 53.99% LAC win probability in any game. We just have to calculate the following outcomes, multiplied by the number of possible combinations for each series:
  • Win in 4: WWWW, 8.5%, 1 possible outcome
  • Win in 5: WWWLW, 3.91%, 4 possible outcomes
    • Think of it as 4 Choose 1 (nCr calculation): there are 4 places (games 1, 2, 3, 4) to put the 1 loss
  • Win in 6: WWWLLW, 1.8%, 10 possible outcomes
    • 5 Choose 2 = 10
  • Win in 7: WWWLLLW, 0.83%, 20 possible outcomes
    • 6 Choose 3 = 20


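$$\binom{6}{3} = \frac{6!}{3!\,(6-3)!} = 20$$

| Outcome | G1 | G2 | G3 | G4 | G5 | G6 | G7 | Win Series | Combos | Total Prob | Series Prob |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Win in 4 | 54% | 54% | 54% | 54% | | | | 8.50% | 1 | 8.50% | 58.67% |
| Win in 5 | 54% | 54% | 54% | 46% | 54% | | | 3.91% | 4 | 15.64% | |
| Win in 6 | 54% | 54% | 54% | 46% | 46% | 54% | | 1.80% | 10 | 17.99% | |
| Win in 7 | 54% | 54% | 54% | 46% | 46% | 46% | 54% | 0.83% | 20 | 16.55% | |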
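The four rows can be collapsed into one general expression. Writing $p$ for the single-game win probability and $k$ for the number of losses before the clinching win (my own notation, not from the original calculation), and assuming the same $p$ applies to every game:

$$
\begin{aligned}
P(\text{win series}) &= \sum_{k=0}^{3} \binom{3+k}{k}\, p^{4} (1-p)^{k} \\
&= p^{4}\left[\,1 + 4(1-p) + 10(1-p)^{2} + 20(1-p)^{3}\right]
\end{aligned}
$$

With $p = 0.5399$ the four terms are the 8.50%, 15.64%, 17.99%, and 16.55% in the table, which sum to the 58.67% series probability.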