Sunday, January 18, 2015

Fitting the Exponential Distribution to TV Timeouts in College Basketball

My "dream" is to be in attendance at a college basketball game in which an entire TV timeout is skipped: i.e. the under-16, under-12, under-8, or under-4. I include called timeouts in this; in other words, I want to go through the entirety of one of these 4-minute intervals without a timeout occurring.

My assumption is that the timing of these timeouts follows a Poisson process, and thus the waiting time after each 4-minute mark follows the exponential distribution: the shorter the wait after a 4-minute mark, the more likely a timeout is to occur in that moment. I took a clustered simple random sample of 60 games (yielding 480 data points, since there are 8 TV timeouts per game) by clustering on each day of the season and then randomly choosing one game from every two-day window.
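As a rough illustration of the sampling scheme, here is a minimal sketch that picks one random game from each two-day window of a season. The schedule itself is hypothetical (a placeholder of 120 days with 6 games each), not the actual 2014-15 slate used for the sample:

```python
import random

# Hypothetical season schedule: {day index: [game ids]} -- placeholder data,
# not the real schedule the post sampled from.
schedule = {day: [f"day{day}_game{g}" for g in range(6)] for day in range(120)}

random.seed(0)  # fixed seed so the sketch is reproducible
sampled_games = []
# One randomly chosen game from every two-day window of the season.
for window_start in range(0, 120, 2):
    day = random.choice([window_start, window_start + 1])
    sampled_games.append(random.choice(schedule[day]))

# 60 sampled games x 8 TV timeout windows = 480 data points.
print(len(sampled_games), len(sampled_games) * 8)
```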

I graphed the frequency of how long it took for a timeout to occur after each 4-minute mark, and the distribution does appear to be exponential:

Over the 480 instances I sampled, there were 4 in which no timeout occurred during the 4-minute stretch, or 0.83%. However, this works out to roughly 6.67% of games (since there are 8 TV timeout windows per game), or 1 in every 15 games, which is fairly often. In my time at Carolina, I've attended 44 men's basketball games (home, away, and neutral site), which means I should have seen this occur roughly 2.9 times (and it never has). So how likely is it that I haven't witnessed a skipped TV timeout?
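The arithmetic above can be checked in a few lines. Note that multiplying the per-window rate by 8 is a linear approximation; the exact chance of at least one skipped window in a game, 1 - (1 - p)^8, is slightly lower (about 6.48%), but the approximation is close:

```python
skipped = 4
windows = 480
per_window = skipped / windows       # 0.83% of 4-minute windows had no timeout
per_game = per_window * 8            # post's approximation: 8 windows per game
expected_in_44 = per_game * 44       # expected sightings over 44 games attended
exact_per_game = 1 - (1 - per_window) ** 8  # exact "at least one per game"
print(round(per_game, 4), round(expected_in_44, 2), round(exact_per_game, 4))
```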

We can answer this question using the exponential distribution and the average time until a timeout occurs under each 4-minute mark, which was 36.6 seconds in the sample. P(X > 240) = 0.000078, or 0.0078% (240 seconds being 4 minutes). For a given game, we multiply this by 8, giving about 0.06% per game. Assuming games are independent of one another, we take the probability of NOT seeing this occur in a game (99.94%) and raise it to the 44th power, which gives a 97.29% chance of never having witnessed it in those 44 games.
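Here is a sketch of that chain of calculations, plugging in the tail probability quoted above rather than re-deriving it from the fitted distribution:

```python
p_window = 0.000078              # quoted P(X > 240) from the fitted exponential
p_game = 8 * p_window            # per-game probability, as in the post (~0.06%)
p_never_in_44 = (1 - p_game) ** 44  # chance of never seeing it in 44 games
print(round(p_game, 6), round(p_never_in_44, 4))
```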

But is this distribution really a good fit? For 240 seconds, P(X > 240) = 0.000078, which implies a timeout should have been skipped about 0.037 times in the sample — yet it happened 4 times. For 60 seconds, P(X > 60) = 0.0106, yet there are actually 88 instances among the 480 data points in which the time until a timeout exceeds 60 seconds (18.33%). This mismatch indicates my assumption is incorrect.
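The goodness-of-fit comparison driving this paragraph can be tabulated directly — expected exceedance counts under the quoted model probabilities versus what was observed in the sample:

```python
n = 480
# {threshold seconds: (quoted model tail probability, observed count)}
checks = {240: (0.000078, 4), 60: (0.0106, 88)}
for t, (p_model, observed) in checks.items():
    expected = n * p_model  # how many exceedances the model predicts
    print(f"X > {t}s: expected {expected:.3f}, observed {observed}")
```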

The issue is that a Poisson process has independent increments, and thus the exponential distribution has the memoryless property: given that time has passed and the event has not yet occurred, it is just as likely to occur now as it was at time t = 0. That is likely not the case here, since the longer play goes on without a dead ball, the more likely it is that a coach for either team will call a timeout, possibly to substitute players or alter strategy.
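The memoryless property can be verified numerically: for an exponential with mean μ, P(X > t) = exp(-t/μ), and the conditional probability P(X > s + t | X > s) reduces to exactly P(X > t):

```python
import math

def tail(t, mean):
    """P(X > t) for an exponential distribution with the given mean."""
    return math.exp(-t / mean)

mean = 36.6   # sample mean waiting time from the post, in seconds
s, t = 30.0, 20.0
# Memorylessness: having already waited s seconds tells us nothing.
conditional = tail(s + t, mean) / tail(s, mean)
print(round(conditional, 6), round(tail(t, mean), 6))
```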

So I removed all called timeouts and focused only on dead-ball situations, in which a TV timeout occurred naturally. This changed the average time to 35.27 seconds, eliminated 40 data points, and changed the resulting frequency graph to this:

It still appears exponentially distributed, and we can now be more confident that this represents a Poisson process, since dead-ball occurrences should happen independently of one another. However, P(X > 240) = 0.000063, which is even smaller, and implies a 99.95% chance of my not witnessing a skipped TV timeout in a given game. So why did this occur 4 times in the sample of 480? Perhaps ESPN's play-by-play logs have errors, and these 4 "missing" TV timeouts simply were never entered into the feed for those games. It would be reasonable to assume that once we reach the NEXT 4-minute interval, a coach becomes more likely to call a timeout, but since called timeouts have been removed, these 4 data points supposedly reflect natural dead-ball timeouts.
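Converting the dead-ball-only tail probability into the per-game figure quoted above is one line of arithmetic (here using the exact "no skipped window in any of 8 windows" form, which rounds to the same 99.95%):

```python
p_window = 0.000063                    # quoted P(X > 240), called timeouts removed
p_game_none = (1 - p_window) ** 8      # no skipped TV timeout in any of 8 windows
print(round(p_game_none, 4))
```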

What does this mean for me as a fan and my "dream"? I would have to attend a total of 1398 college basketball games before the likelihood of seeing a skipped TV timeout crosses 50%.
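That break-even count comes from solving (1 - p)^n = 0.5 for n, where p is the per-game probability of a skipped timeout. A sketch using the rounded tail probability quoted above lands in the high 1300s; the small gap from 1398 comes from rounding the quoted 0.000063:

```python
import math

p_window = 0.000063                    # dead-ball-only P(X > 240), as quoted
p_game = 1 - (1 - p_window) ** 8       # chance of >= 1 skipped timeout per game
# Smallest n with (1 - p_game)^n <= 0.5, i.e. a >= 50% chance of seeing one.
games = math.ceil(math.log(0.5) / math.log(1 - p_game))
print(games)
```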