Wednesday, August 19, 2020

Building a Simple Rating System (SRS) Model (NBA)

Previously, I simulated every game via my play-by-play simulator to estimate the chances of each team in each round of this year's NBA playoffs.

While simulation is the most flexible technique, as it allows for all kinds of customization (adding/removing players, tweaking lineup minutes, quantifying individual locations' home court advantage, factoring in weather for outdoor sports), it's rigid in its specificity: its results can't be generalized beyond the exact conditions scripted into the simulator.

A math or stats based model, on the other hand, can approximate more general projections, such as matching up any two teams overall. 

So I took the output of each possible matchup combination and worked backwards to build a Simple Rating System (SRS). With some linear algebra, this technique also works over a normal schedule where not every team plays every other team, unlike in my simulations (additional resources on this at the end of this post). It is also the newly adapted methodology for the matrix component of my original MDS Model.

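First I set up a matrix of every simulated outcome, where each cell is the row team's simulated margin of victory (MOV) over the column team (negative for a projected loss):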


Team   MIL   ORL   IND   MIA   BOS   PHI   TOR   BKN   LAL   POR   HOU   OKC   DEN   UTA   LAC   DAL
MIL    0.0   6.2   6.0   0.4   3.6   4.8   2.4   6.5   3.3   4.9   2.7   2.3   4.1   3.6   1.2   2.0
ORL   -6.2   0.0  -0.8  -6.5  -3.5  -1.1  -3.9   0.2  -3.7  -1.7  -3.2  -3.4  -0.5  -2.6  -4.0  -4.1
IND   -6.0   0.8   0.0  -5.6  -3.2  -0.6  -3.3   0.1  -4.6  -1.9  -3.5  -4.4  -1.9  -2.4  -4.6  -3.5
MIA   -0.4   6.5   5.6   0.0   2.4   5.5   1.9   4.9   2.0   4.8   2.1   0.7   4.6   3.6   0.2   2.7
BOS   -3.6   3.5   3.2  -2.4   0.0   2.4  -0.7   3.4  -0.4   3.1   0.4  -2.3   0.7   0.4  -1.9  -1.7
PHI   -4.8   1.1   0.6  -5.5  -2.4   0.0  -4.3   0.6  -2.4   0.2  -2.7  -3.6  -1.0  -2.8  -4.8  -3.8
TOR   -2.4   3.9   3.3  -1.9   0.7   4.3   0.0   5.9   0.6   2.4   0.7  -0.8   2.1   0.8  -1.0   0.8
BKN   -6.5  -0.2  -0.1  -4.9  -3.4  -0.6  -5.9   0.0  -4.4  -1.2  -4.2  -4.7  -1.6  -3.3  -5.4  -4.6
LAL   -3.3   3.7   4.6  -2.0   0.4   2.4  -0.6   4.4   0.0   2.0   0.2  -1.5   2.3   0.6  -1.4  -0.5
POR   -4.9   1.7   1.9  -4.8  -3.1  -0.2  -2.4   1.2  -2.0   0.0  -1.9  -3.3  -0.1  -1.6  -4.3  -2.7
HOU   -2.7   3.2   3.5  -2.1  -0.4   2.7  -0.7   4.2  -0.2   1.9   0.0  -0.9   0.3   1.1  -2.0   0.1
OKC   -2.3   3.4   4.4  -0.7   2.3   3.6   0.8   4.7   1.5   3.3   0.9   0.0   4.2   2.6  -0.7   0.7
DEN   -4.1   0.5   1.9  -4.6  -0.7   1.0  -2.1   1.6  -2.3   0.1  -0.3  -4.2   0.0  -0.6  -4.3  -2.7
UTA   -3.6   2.6   2.4  -3.6  -0.4   2.8  -0.8   3.3  -0.6   1.6  -1.1  -2.6   0.6   0.0  -3.0  -0.8
LAC   -1.2   4.0   4.6  -0.2   1.9   4.8   1.0   5.4   1.4   4.3   2.0   0.7   4.3   3.0   0.0   1.7
DAL   -2.0   4.1   3.5  -2.7   1.7   3.8  -0.8   4.6   0.5   2.7  -0.1  -0.7   2.7   0.8  -1.7   0.0

Note how some of these results are intransitive, just like real life:

  • BOS beats HOU by 0.4
  • HOU beats DAL by 0.1
  • DAL beats BOS by 1.7
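
Cycles like this one can be found by scanning every ordered triple of teams. The sketch below is only an illustration in Python, not necessarily how the tables in this post were produced; the function name `find_cycles` and the four-team subset (margins copied from the matchup matrix) are assumptions made for the example.

```python
from itertools import permutations

# Simulated margins (row team minus column team) for a four-team subset
# of the matchup matrix; the subset itself is chosen only for illustration.
TEAMS = ["BOS", "HOU", "DAL", "BKN"]
MARGINS = [
    [0.0, 0.4, -1.7, 3.4],    # BOS
    [-0.4, 0.0, 0.1, 4.2],    # HOU
    [1.7, -0.1, 0.0, 4.6],    # DAL
    [-3.4, -4.2, -4.6, 0.0],  # BKN
]

def find_cycles(teams, margins):
    """Return every 3-team cycle (a beats b, b beats c, c beats a)."""
    cycles = []
    for a, b, c in permutations(range(len(teams)), 3):
        # Require a to be the lowest index so each cycle is reported once.
        if a < b and a < c:
            if margins[a][b] > 0 and margins[b][c] > 0 and margins[c][a] > 0:
                cycles.append((teams[a], teams[b], teams[c]))
    return cycles

print(find_cycles(TEAMS, MARGINS))  # [('BOS', 'HOU', 'DAL')]
```

BKN loses to all three of the other teams in this subset, so it cannot be part of any cycle.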

Next I set up a "Win Matrix", with the total number of games played/simulated along the diagonal:


(Team A, Team B) =
MOV if Team A beat Team B;
else if Team A = Team B then # games played;
else 0


Team   MIL   ORL   IND   MIA   BOS   PHI   TOR   BKN   LAL   POR   HOU   OKC   DEN   UTA   LAC   DAL
MIL   15.0   6.2   6.0   0.4   3.6   4.8   2.4   6.5   3.3   4.9   2.7   2.3   4.1   3.6   1.2   2.0
ORL    0.0  15.0   0.0   0.0   0.0   0.0   0.0   0.2   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
IND    0.0   0.8  15.0   0.0   0.0   0.0   0.0   0.1   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
MIA    0.0   6.5   5.6  15.0   2.4   5.5   1.9   4.9   2.0   4.8   2.1   0.7   4.6   3.6   0.2   2.7
BOS    0.0   3.5   3.2   0.0  15.0   2.4   0.0   3.4   0.0   3.1   0.4   0.0   0.7   0.4   0.0   0.0
PHI    0.0   1.1   0.6   0.0   0.0  15.0   0.0   0.6   0.0   0.2   0.0   0.0   0.0   0.0   0.0   0.0
TOR    0.0   3.9   3.3   0.0   0.7   4.3  15.0   5.9   0.6   2.4   0.7   0.0   2.1   0.8   0.0   0.8
BKN    0.0   0.0   0.0   0.0   0.0   0.0   0.0  15.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
LAL    0.0   3.7   4.6   0.0   0.4   2.4   0.0   4.4  15.0   2.0   0.2   0.0   2.3   0.6   0.0   0.0
POR    0.0   1.7   1.9   0.0   0.0   0.0   0.0   1.2   0.0  15.0   0.0   0.0   0.0   0.0   0.0   0.0
HOU    0.0   3.2   3.5   0.0   0.0   2.7   0.0   4.2   0.0   1.9  15.0   0.0   0.3   1.1   0.0   0.1
OKC    0.0   3.4   4.4   0.0   2.3   3.6   0.8   4.7   1.5   3.3   0.9  15.0   4.2   2.6   0.0   0.7
DEN    0.0   0.5   1.9   0.0   0.0   1.0   0.0   1.6   0.0   0.1   0.0   0.0  15.0   0.0   0.0   0.0
UTA    0.0   2.6   2.4   0.0   0.0   2.8   0.0   3.3   0.0   1.6   0.0   0.0   0.6  15.0   0.0   0.0
LAC    0.0   4.0   4.6   0.0   1.9   4.8   1.0   5.4   1.4   4.3   2.0   0.7   4.3   3.0  15.0   1.7
DAL    0.0   4.1   3.5   0.0   1.7   3.8   0.0   4.6   0.5   2.7   0.0   0.0   2.7   0.8   0.0  15.0
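
The Win Matrix rule is mechanical enough to express in a few lines. Here is a minimal sketch in Python; the function name `build_win_matrix` is hypothetical, and the three-team input is just the ORL/IND/BKN corner of the matchup matrix, used to keep the example small.

```python
def build_win_matrix(margins, games_played):
    """Keep positive margins, put games played on the diagonal, zero the rest."""
    n = len(margins)
    win = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                win[a][b] = float(games_played)
            elif margins[a][b] > 0:
                win[a][b] = margins[a][b]
    return win

# Three-team subset (ORL, IND, BKN) of the simulated margins, for illustration.
margins = [
    [0.0, -0.8, 0.2],   # ORL
    [0.8, 0.0, 0.1],    # IND
    [-0.2, -0.1, 0.0],  # BKN
]
win = build_win_matrix(margins, games_played=15)
for row in win:
    print(row)
```

With `games_played=15` (each team's 15 simulated matchups), the three rows printed agree with the ORL, IND and BKN entries of the full Win Matrix.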

Next I take the inverse of this matrix (which is only invertible once enough games have been played that every team is connected to every other team through the graph of games played):

Win' = Win^-1
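
On a real schedule, the connectivity condition can be checked before attempting the inversion. The following is a rough sketch using a breadth-first search; the function name `is_connected` and the four-team schedule are made up for illustration and are not taken from the simulations.

```python
from collections import deque

def is_connected(teams, games):
    """Return True if every team can reach every other via games played."""
    neighbors = {team: set() for team in teams}
    for a, b in games:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Breadth-first search from an arbitrary starting team.
    start = teams[0]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in neighbors[current]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(teams)

# Hypothetical schedule, not real results.
teams = ["MIL", "ORL", "LAL", "POR"]
early = [("MIL", "ORL"), ("LAL", "POR")]   # two separate islands
later = early + [("ORL", "LAL")]           # one game bridges them
print(is_connected(teams, early))  # False
print(is_connected(teams, later))  # True
```

In my simulations every team plays every other team, so the check passes trivially there.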

I then multiply the inverse by each team's total margin of victory (MOV) over all games played:

Rating = Win' * MOV

Team   MIL   ORL   IND   MIA   BOS   PHI   TOR   BKN   LAL   POR   HOU   OKC   DEN   UTA   LAC   DAL    MOV  mmult
MIL   0.07 -0.01 -0.01  0.00 -0.01  0.00 -0.01  0.00 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01   54.0   3.83
ORL   0.00  0.07  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  -45.1  -2.96
IND   0.00  0.00  0.07  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  -44.7  -2.79
MIA   0.00 -0.01 -0.01  0.07 -0.01 -0.01 -0.01  0.00 -0.01 -0.01 -0.01  0.00 -0.01 -0.01  0.00 -0.01   47.0   4.44
BOS   0.00 -0.01 -0.01  0.00  0.07 -0.01  0.00 -0.01  0.00 -0.01  0.00  0.00  0.00  0.00  0.00  0.00    4.1   2.69
PHI   0.00  0.00  0.00  0.00  0.00  0.07  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  -35.4  -1.88
TOR   0.00 -0.01 -0.01  0.00  0.00 -0.02  0.07 -0.02  0.00 -0.01  0.00  0.00 -0.01  0.00  0.00  0.00   19.5   3.99
BKN   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.07  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  -51.1  -3.41
LAL   0.00 -0.01 -0.02  0.00  0.00 -0.01  0.00 -0.02  0.07 -0.01  0.00  0.00 -0.01  0.00  0.00  0.00   11.3   3.63
POR   0.00 -0.01 -0.01  0.00  0.00  0.00  0.00 -0.01  0.00  0.07  0.00  0.00  0.00  0.00  0.00  0.00  -26.4  -0.80
HOU   0.00 -0.01 -0.01  0.00  0.00 -0.01  0.00 -0.02  0.00 -0.01  0.07  0.00  0.00  0.00  0.00  0.00    8.0   3.03
OKC   0.00  0.00 -0.01  0.00 -0.01 -0.01  0.00 -0.01 -0.01 -0.01  0.00  0.07 -0.02 -0.01  0.00  0.00   28.7   3.52
DEN   0.00  0.00 -0.01  0.00  0.00  0.00  0.00 -0.01  0.00  0.00  0.00  0.00  0.07  0.00  0.00  0.00  -20.8  -0.45
UTA   0.00 -0.01 -0.01  0.00  0.00 -0.01  0.00 -0.01  0.00 -0.01  0.00  0.00  0.00  0.07  0.00  0.00   -3.3   1.94
LAC   0.00  0.00 -0.01  0.00 -0.01 -0.01  0.00 -0.01 -0.01 -0.01 -0.01  0.00 -0.01 -0.01  0.07 -0.01   37.7   4.04
DAL   0.00 -0.01 -0.01  0.00 -0.01 -0.01  0.00 -0.01  0.00 -0.01  0.00  0.00 -0.01  0.00  0.00  0.07   16.4   3.77
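
As a sanity check on this step, ORL, IND and BKN form a closed block: their rows of the Win Matrix are nonzero only in each other's columns, so their ratings can be solved for without the other 13 teams. The sketch below solves Win * Rating = MOV for that block, which is equivalent to multiplying MOV by the inverse. The hand-rolled `solve` helper is an assumption made to keep the example dependency-free; a linear algebra library would normally do this. Because the published inputs are rounded, the results should match the mmult column only to about 0.01.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

teams = ["ORL", "IND", "BKN"]
# Rows and columns of the Win Matrix restricted to these three teams.
win = [
    [15.0, 0.0, 0.2],   # ORL beat BKN by 0.2
    [0.8, 15.0, 0.1],   # IND beat ORL by 0.8 and BKN by 0.1
    [0.0, 0.0, 15.0],   # BKN beat no one
]
mov = [-45.1, -44.7, -51.1]  # total MOV column from the table

ratings = dict(zip(teams, solve(win, mov)))
for team in teams:
    print(team, f"{ratings[team]:.2f}")  # roughly ORL -2.96, IND -2.80, BKN -3.41
```

BKN is the easiest case to verify by hand: its row has only the diagonal entry, so its rating is simply -51.1 / 15.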

This now gives me a rating over the games won in the simulations. However, it doesn't account for losses (or lack thereof); obviously the more you win, the less you lose, but if your losses are all close, you should get credit for that. Thus I repeat the same process on the transpose of the initial win matrix:

Loss = Win^T

This results in the following ratings set, where I average the win and loss ratings:

Team  MMult  Matrix Rank  WMatrix  LMatrix
MIA    3.74            1     4.44     3.04
MIL    3.72            2     3.83     3.60
LAC    3.11            3     4.04     2.19
OKC    2.32            4     3.52     1.11
TOR    2.05            5     3.99     0.12
DAL    1.76            6     3.77    -0.24
LAL    1.44            7     3.63    -0.76
HOU    1.09            8     3.03    -0.85
BOS    0.61            9     2.69    -1.47
UTA   -0.18           10     1.94    -2.30
DEN   -2.18           11    -0.45    -3.91
POR   -2.37           12    -0.80    -3.94
PHI   -3.12           13    -1.88    -4.37
IND   -3.51           14    -2.79    -4.23
ORL   -3.68           15    -2.96    -4.39
BKN   -4.10           16    -3.41    -4.78
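
The loss side can be checked by hand for the top two teams. No one beats MIL in the simulations, and only MIL beats MIA, so the MIL and MIA rows of the transposed matrix involve only those two teams. The sketch below reproduces their LMatrix values and averaged ratings to within rounding; the helper names `transpose` and `solve_2x2` are illustrative assumptions, and the win-side ratings are copied from the mmult column of the inverse table.

```python
def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]

def solve_2x2(a, b):
    """Solve a 2x2 system a x = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]

teams = ["MIL", "MIA"]
# MIL/MIA block of the Win Matrix: MIL beat MIA by 0.4; 15 games on the diagonal.
# Only the transposed (loss) block is self-contained, not the win block itself.
win_block = [
    [15.0, 0.4],
    [0.0, 15.0],
]
mov = [54.0, 47.0]                       # total MOV for MIL and MIA
w_rating = {"MIL": 3.83, "MIA": 4.44}    # win-side ratings from the mmult column

loss_block = transpose(win_block)        # Loss = Win^T
l_values = solve_2x2(loss_block, mov)    # same result as inverse(Loss) * MOV
l_rating = dict(zip(teams, l_values))

# Final rating is the average of the win-side and loss-side ratings.
final = {t: (w_rating[t] + l_rating[t]) / 2 for t in teams}
for t in teams:
    print(t, f"loss {l_rating[t]:.3f}", f"final {final[t]:.3f}")
```

MIL's loss-side rating works out to 54.0 / 15 = 3.60 exactly, because its row of the transposed matrix contains only the diagonal entry.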

Interestingly, this method has Miami by a hair as the best team - compared to Milwaukee's, their simulated MOV was higher over ORL, PHI, DEN, UTA, and DAL, and they only project to lose to Milwaukee by 0.4 points (51.2% win probability in a single game).

These are designed to be centered around 0 (in this case, 0 is the average playoff team, not average NBA team). With the lack of transitivity shown earlier, these ratings won't exactly reproduce the simulated results of each matchup (MIL projects to beat ORL by 7.4 points by these ratings, yet 6.2 points in their simulations). But they allow for a much easier direct comparison of the strength of each team.
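
To make the comparison concrete, a projected margin between any two teams is just the difference of their ratings. A minimal sketch, with the ratings copied from the MMult column of the final table and a hypothetical helper name `projected_margin`:

```python
# Averaged ratings (MMult column) from the final table.
RATINGS = {
    "MIA": 3.74, "MIL": 3.72, "LAC": 3.11, "OKC": 2.32,
    "TOR": 2.05, "DAL": 1.76, "LAL": 1.44, "HOU": 1.09,
    "BOS": 0.61, "UTA": -0.18, "DEN": -2.18, "POR": -2.37,
    "PHI": -3.12, "IND": -3.51, "ORL": -3.68, "BKN": -4.10,
}

def projected_margin(team_a, team_b, ratings=RATINGS):
    """Projected margin for team_a over team_b: the difference in ratings."""
    return ratings[team_a] - ratings[team_b]

rating_based = projected_margin("MIL", "ORL")   # about 7.4 points
simulated = 6.2                                 # MIL over ORL in the matchup matrix
print(f"ratings: {rating_based:.1f}, simulation: {simulated:.1f}")

# The ratings should average out close to zero.
mean_rating = sum(RATINGS.values()) / len(RATINGS)
print(f"mean rating: {mean_rating:.2f}")
```

The rounded ratings average to slightly above zero rather than exactly zero, which is consistent with rounding in the published table.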

Resources
Pro Football Reference matrix description
Visual guide by Ed Feng
In depth dissertation by Kenneth Massey
