Probabilis: August 2020

Monday, August 24, 2020

"What are the odds?" An MLB Team Hits a Grand Slam Four Games in a Row

Last week, the San Diego Padres hit a grand slam in four consecutive games (all against the Texas Rangers) - something that had never been done before:

A post shared by CBS Sports (@cbssports) on Aug 20, 2020 at 8:41pm PDT

As CBS Sports points out above, there have been roughly 407,000 games in MLB history, so what were the odds that it hadn't happened yet?

I simulated each game of the Padres/Rangers series (2 in Arlington, 2 in San Diego) and estimated the probability of each team hitting a grand slam in each game as:

GameNum	Team	ProbGrandSlam
1	SDP	5.64%
1	TEX	2.27%
2	SDP	4.77%
2	TEX	2.16%
3	SDP	2.58%
3	TEX	1.63%
4	SDP	3.17%
4	TEX	1.54%

Multiplying each team's four per-game probabilities together results in a truly unlikely series of events:

Team	Prob4InARow	1 in...
SDP	0.000220%	454,489
TEX	0.000012%	8,124,789
Avg	0.000116%	860,825
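
As a quick check on the table above, here is a minimal sketch that multiplies the rounded per-game probabilities. The variable names are mine, and because the inputs are the rounded percentages shown in the post, the "1 in..." figures may differ from the table in the last digit or two.

```python
from math import prod

# Per-game grand slam probabilities, copied from the table (rounded values).
sdp_games = [0.0564, 0.0477, 0.0258, 0.0317]
tex_games = [0.0227, 0.0216, 0.0163, 0.0154]

# Treating the games as independent, a four-game streak is the product.
p_sdp = prod(sdp_games)
p_tex = prod(tex_games)
p_avg = (p_sdp + p_tex) / 2  # simple average of the two teams

for team, p in [("SDP", p_sdp), ("TEX", p_tex), ("Avg", p_avg)]:
    print(f"{team}\t{p:.6%}\t1 in {1 / p:,.0f}")
```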

So over the course of Major League history, how unlikely is it that this hadn't happened yet? There are slightly fewer four-game sequences than games, since the first three games of a season can't complete a streak of four. As an estimate, I removed 3 games per year times 117 years = 351 games from the 407,000, which gives approximately 406,649 sequences of four games in a row.

Using this San Diego/Texas series as a proxy, there is roughly a 99.999884% chance that a four game stretch does NOT have a grand slam in each game (1 - 0.000116%).

So 99.999884% ^ 406,649 four-game sequences = 62.35%, the probability that this had not happened yet. Put the other way, there was only a 37.65% chance that the feat would already have occurred at least once by this point in MLB history.
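
The arithmetic above can be sketched in a few lines. This assumes, as the post does, that every four-game sequence is independent and shares the same averaged probability. The names are mine, and since the input is the rounded 0.000116%, the result lands near 62.4% rather than exactly on the 62.35% figure.

```python
# Approximate number of four-game sequences in MLB history.
total_games = 407_000
seasons = 117
sequences = total_games - 3 * seasons  # 406,649

# Averaged chance that a given four-game stretch has a grand slam in every game.
p_streak = 0.000116 / 100

# Probability that none of the sequences produces the streak.
p_never = (1 - p_streak) ** sequences
# Complement: the streak shows up at least once.
p_at_least_once = 1 - p_never

print(f"sequences: {sequences:,}")
print(f"never happened: {p_never:.2%}")
print(f"happened at least once: {p_at_least_once:.2%}")
```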

So it might be fair to guess that the baseball gods were punishing the Rangers for griping about Tatis hitting the first grand slam on a 3-0 count late in a blowout.

Sunday, August 23, 2020

Effect of Serve Rules/Scoring in Ping Pong/Table Tennis

In ping pong, there seem to be three primary rule variations around switching which player serves:

Switch every 2 cumulative points (points can be scored regardless of serving)
Switch every 5 cumulative points (points can be scored regardless of serving)
Switch upon loss of a point, but don't record a point (points can only be scored when serving)

How do these rule variations benefit/hurt the better player?

In all cases, I'm going to assume a best-of-5 series (first player to win 3 games wins), and each game is to 11, win by 2.

To calibrate the simulator, I'm assigning a "favorite" and an "underdog", where the "favorite" has a slightly higher chance of winning a point when serving than when returning:

Favorite wins point on serve: 55%
Favorite wins point on return: 50%

I then ran each rule set described above 10,000 times, with the "favorite" serving first. The probability the "favorite" wins a best-of-5 match:

Switch every 2 points: 73.83%
Switch every 5 points: 73.93%
Switch serve on lost point, no point recorded: 76.69%

So switching serve when the returning player wins the point (and not recording a point) is a sizable advantage to the better player, because points are only recorded while serving, which effectively lengthens the game. It's a well-known phenomenon that the shorter the game, the more randomness is exhibited, and the better the chances are for the underdog.

However, much of that advantage disappears when the "underdog" serves first, and only the third rule set is noticeably affected:

Switch every 2 points: 73.58%
Switch every 5 points: 73.77%
Switch serve on lost point, no point recorded: 74.39%

The first two serving patterns (switching every 2 or 5 points) are fairer: switching serve is independent of who scored, so the win probability is virtually the same regardless of who serves first.
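
The three serve rules can be sketched as a small Monte Carlo simulation. This is not the simulator used for the numbers above. It is a minimal sketch under assumptions of my own: the same player serves first in every game of a match, there is no special serve rotation at 10-10, and the function names and `switch_every` parameter are hypothetical. Because of those assumptions it will not necessarily reproduce the percentages listed in this post.

```python
import random

P_FAV_ON_SERVE = 0.55   # favorite wins the rally when serving
P_FAV_ON_RETURN = 0.50  # favorite wins the rally when returning


def favorite_wins_game(switch_every, fav_serves_first, rng):
    """Play one game to 11, win by 2; return True if the favorite wins.

    switch_every=2 or 5 rotates the serve on cumulative points.
    switch_every=None means only the server can score, and the serve
    changes hands whenever the server loses the rally.
    """
    fav, dog = 0, 0
    fav_serving = fav_serves_first
    while max(fav, dog) < 11 or abs(fav - dog) < 2:
        p = P_FAV_ON_SERVE if fav_serving else P_FAV_ON_RETURN
        fav_won_rally = rng.random() < p
        if switch_every is None:
            if fav_won_rally == fav_serving:  # server won: record a point
                if fav_serving:
                    fav += 1
                else:
                    dog += 1
            else:  # server lost: serve changes hands, no point recorded
                fav_serving = not fav_serving
        else:
            if fav_won_rally:
                fav += 1
            else:
                dog += 1
            if (fav + dog) % switch_every == 0:
                fav_serving = not fav_serving
    return fav > dog


def favorite_match_win_rate(switch_every, fav_serves_first=True,
                            n_matches=10_000, seed=0):
    """Estimate how often the favorite wins a best-of-5 match."""
    rng = random.Random(seed)
    match_wins = 0
    for _ in range(n_matches):
        fav_games = dog_games = 0
        while fav_games < 3 and dog_games < 3:
            # Assumption: the same player serves first in every game.
            if favorite_wins_game(switch_every, fav_serves_first, rng):
                fav_games += 1
            else:
                dog_games += 1
        match_wins += fav_games == 3
    return match_wins / n_matches


if __name__ == "__main__":
    for label, rule in [("every 2", 2), ("every 5", 5), ("on lost point", None)]:
        print(f"Switch {label}: {favorite_match_win_rate(rule):.2%}")
```

With 10,000 matches per rule set, the sampling error on each estimate is a bit under half a percentage point, so gaps of a tenth of a point between rule sets are within the noise.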

Friday, August 21, 2020

NBA Playoffs: Comparing Simulation Output vs SRS Model

Originally, I ran my play-by-play NBA simulator on this year's playoffs to estimate each team's chances, and then separately simplified those results to an SRS model so each team could easily be directly compared.

But if I run that SRS model back through the simulator, how would the predictions change?

The original projections were:

Seed	Conference	Team	Round 2	Conf Finals	Finals	Champion
1	East	MIL	84.4%	49.2%	34.4%	22.4%
8	East	ORL	15.6%	3.1%	0.9%	0.3%
4	East	IND	18.3%	4.1%	1.4%	0.4%
5	East	MIA	81.7%	43.6%	28.8%	17.5%
3	East	BOS	65.0%	32.2%	11.0%	4.8%
6	East	PHI	35.0%	10.3%	2.4%	0.7%
2	East	TOR	83.0%	51.5%	20.0%	10.1%
7	East	BKN	17.0%	5.9%	1.2%	0.3%
Seed	Conference	Team	Round 2	Conf Finals	Finals	Champion
1	West	LAL	62.5%	28.2%	13.6%	5.4%
8	West	POR	37.5%	12.5%	4.1%	1.1%
4	West	HOU	44.3%	23.8%	11.0%	4.4%
5	West	OKC	55.7%	35.5%	19.7%	9.4%
3	West	DEN	46.3%	12.8%	4.7%	1.5%
6	West	UTA	53.7%	19.6%	8.4%	3.0%
2	West	LAC	60.6%	43.6%	26.3%	13.5%
7	West	DAL	39.4%	24.0%	12.3%	5.2%

The SRS model then gave these relative ratings:

Team	MMult	Matrix Rank
MIA	3.74	1
MIL	3.72	2
LAC	3.11	3
OKC	2.32	4
TOR	2.05	5
DAL	1.76	6
LAL	1.44	7
HOU	1.09	8
BOS	0.61	9
UTA	-0.18	10
DEN	-2.18	11
POR	-2.37	12
PHI	-3.12	13
IND	-3.51	14
ORL	-3.68	15
BKN	-4.10	16

So I then have to run these ratings through Log5, converting the expected margin of victory to a win probability using a standard deviation of 13.47 points for NBA games, and then simulate each round again (or do the math explicitly).

For example, take the LAC/DAL series. The original simulation output had:

LAC single game win probability: 54.88%
Average MOV: 1.65
Over a 7 game series, this is equivalent to: 60.57% series win probability

Now let's take the above ratings. We have to invert the first calculation:

LAC rating - DAL rating = 3.11 - 1.76: 1.35 average MOV
Normal distribution; mean = 0, standard deviation = 13.47, x = 1.35: 53.99% LAC single game win probability
Over a 7 game series, this is equivalent to: 58.67% series win probability
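
The rating-to-probability step can be sketched with Python's standard library. This assumes the game margin is normally distributed around the rating difference with the 13.47 standard deviation quoted above. The function name is hypothetical.

```python
from statistics import NormalDist

NBA_MARGIN_SD = 13.47  # standard deviation of NBA game margins, from the post


def single_game_win_prob(rating_a, rating_b, sd=NBA_MARGIN_SD):
    """Chance that team A beats team B in one neutral-court game."""
    expected_margin = rating_a - rating_b
    # Normal CDF with mean 0 and the given sd, evaluated at the expected margin.
    return NormalDist(mu=0.0, sigma=sd).cdf(expected_margin)


p_lac = single_game_win_prob(3.11, 1.76)  # LAC vs DAL ratings from the table
print(f"LAC single game win probability: {p_lac:.2%}")
```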

The full math on this is at the end of this post.

Running this through the playoff bracket gives the following probabilities:

Seed	Conference	Team	Round 2	Conf Finals	Finals	Champion
1	East	MIL	88.5%	48.1%	32.1%	19.9%
8	East	ORL	11.5%	1.8%	0.4%	0.1%
4	East	IND	12.0%	2.0%	0.5%	0.1%
5	East	MIA	88.0%	48.0%	32.1%	20.0%
3	East	BOS	72.8%	33.9%	11.0%	4.7%
6	East	PHI	27.2%	7.0%	1.0%	0.2%
2	East	TOR	84.1%	54.6%	22.3%	11.5%
7	East	BKN	15.9%	4.5%	0.5%	0.1%
Seed	Conference	Team	Round 2	Conf Finals	Finals	Champion
1	West	LAL	73.2%	34.9%	17.0%	7.0%
8	West	POR	26.8%	6.7%	1.8%	0.4%
4	West	HOU	42.1%	22.8%	10.6%	4.1%
5	West	OKC	57.9%	35.6%	19.3%	9.0%
3	West	DEN	37.3%	8.3%	2.3%	0.5%
6	West	UTA	62.7%	20.7%	8.2%	2.6%
2	West	LAC	58.7%	43.5%	26.4%	13.6%
7	West	DAL	41.3%	27.6%	14.4%	6.2%

This gives the strange result that the Bucks are barely more likely than the Heat to reach the conference finals, yet the Heat are slightly more likely to make the Finals and win it all. The Bucks are marginally more likely to win their first-round series, while the Heat are only the slightest of favorites in each game against the Bucks.

Nevertheless, we get different results! Directionally they're almost the same (same picks in the first and second round), but there are large differences in magnitude in these early rounds. The change in each probability (SRS rerun minus original simulation, in percentage points):

Seed	Conference	Team	Round 2	Conf Finals	Finals	Champion
1	East	MIL	4.1%	-1.0%	-2.3%	-2.5%
8	East	ORL	-4.1%	-1.3%	-0.5%	-0.2%
4	East	IND	-6.3%	-2.1%	-0.9%	-0.3%
5	East	MIA	6.3%	4.5%	3.3%	2.5%
3	East	BOS	7.8%	1.7%	-0.1%	-0.2%
6	East	PHI	-7.8%	-3.3%	-1.3%	-0.5%
2	East	TOR	1.0%	3.0%	2.4%	1.4%
7	East	BKN	-1.0%	-1.4%	-0.7%	-0.2%
Seed	Conference	Team	Round 2	Conf Finals	Finals	Champion
1	West	LAL	10.6%	6.6%	3.4%	1.6%
8	West	POR	-10.6%	-5.8%	-2.3%	-0.7%
4	West	HOU	-2.2%	-1.0%	-0.4%	-0.3%
5	West	OKC	2.2%	0.1%	-0.4%	-0.4%
3	West	DEN	-9.0%	-4.5%	-2.3%	-1.0%
6	West	UTA	9.0%	1.0%	-0.1%	-0.4%
2	West	LAC	-1.9%	-0.1%	0.1%	0.1%
7	West	DAL	1.9%	3.6%	2.1%	1.0%

Calculating Series Probability

Neutral court makes this calculation much easier - we can just calculate each possible outcome (winning in 4, 5, 6, or 7 games).

Take our LAC/DAL example: 53.99% LAC win probability in any game. We just have to calculate the following outcomes, multiplied by the number of possible combinations for each series:

Win in 4: WWWW, 8.5%, 1 possible outcome
Win in 5: WWWLW, 3.91%, 4 possible outcomes

Think of it as 4 Choose 1 (nCr calculation): there are 4 places (games 1, 2, 3, 4) to put the 1 loss

Win in 6: WWWLLW, 1.8%, 10 possible outcomes

5 Choose 2 = 10

Win in 7: WWWLLLW, 0.83%, 20 possible outcomes

6 Choose 3 = 20

C(n, r) = C(6, 3) = \frac{6!}{3! \, (6 - 3)!} = 20

Outcome	G1	G2	G3	G4	G5	G6	G7	Win Series	Combos	Total Prob	Series Prob
Win in 4	54%	54%	54%	54%				8.50%	1	8.50%	58.67%
Win in 5	54%	54%	54%	46%	54%			3.91%	4	15.64%
Win in 6	54%	54%	54%	46%	46%	54%		1.80%	10	17.99%
Win in 7	54%	54%	54%	46%	46%	46%	54%	0.83%	20	16.55%
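
The table above can be reproduced with a short function. This is a sketch that assumes a constant single-game win probability (neutral court, independent games), and the function name is mine.

```python
from math import comb


def series_win_prob(p, wins_needed=4):
    """Probability of winning a best-of-(2 * wins_needed - 1) series."""
    q = 1 - p
    total = 0.0
    for losses in range(wins_needed):
        # The last game must be a win, so the losses can go in any of the
        # earlier (wins_needed - 1 + losses) games.
        orderings = comb(wins_needed - 1 + losses, losses)
        total += orderings * p**wins_needed * q**losses
    return total


print(f"LAC over DAL: {series_win_prob(0.5399):.2%}")
```

The `orderings` values come out to 1, 4, 10, and 20 for a win in 4, 5, 6, and 7 games, matching the Combos column.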