Probabilis: misc


Monday, August 3, 2015

If You Could Remove One Opposing Team's Player in a Game, How Would You Do It? Part 2

This post is a continuing analysis of the "Tranquilizer Theory": what if you could remove one opposing player for the remainder of the game, at any point in the game? When would you use it? Who would you use it on? (Continued from Part 1.)

Previously I only looked at using the tranquilizer on the best player on the opposing team at the beginning of the game and how it would affect your team's win probability. Now I'll look at how the timing of the tranquilizer affects your odds of winning, assuming you wait until some point during the game to use it. One critical component of this, however, is that you must be in the lead late in the game for your team to gain the largest advantage (by taking away the game-changing play that would have let the losing team come back). Intuitively, this makes sense: when you're trailing you have more to gain; thus, as the leading team, you have more to lose.

If You Could Remove One Opposing Team's Player in a Game, How Would You Do It? Part 1

While watching a myriad of different sporting events in college, my friends and I would often discuss a theory that would make any sport more interesting and complex: what if you could remove one opposing player for the remainder of the game, at any point in the game? In other words (and this is the PC version), what if you had a tranquilizer, with one dose, that could take out someone on the other team at any time? When would you use it? Who would you use it on?

The second question is a lot easier than the first, so let's assume we use the tranquilizer right when the game starts, so the opposing player would be out for the entirety of the game. Your goal then would be to remove the player who would most likely have the largest positive impact on the game for their team. Here we're talking about the general case, for an average team; in some cases this choice would be obvious based on your opponent.

Downfall of Parsing in Google Sheets

With the start of college basketball rapidly approaching, I went through to set up my model for the new season. In doing so I checked for errors (probably the most intensive and frustrating part of writing code), and found that ESPN's schedules had changed the names of certain colleges, slightly. Every "State" is changed to "St", with the exception of a few, including Ohio State, NC State, and Iowa State. Those alterations and the rest are as follows:

Parsing Data in Google Sheets

I figured I would give a rundown of how I parse data in Google Sheets (which can be translated for use in Microsoft Excel).

To start, I import data from an external site; it has to be in a table or .csv format. In this example, I'm importing from ESPN's schedule for the Tampa Bay Rays.

Building a Retrodictive Model

First, regarding the Matrix model I ended up going with, I had concerns about the invertibility of the matrix due to the limited connectedness of teams since there aren't many games played between teams in different parts of the country. However, I only ultimately had to omit one team (due to the team not playing any games as a result of forfeiting all of their matches).

The Matrix model simply takes into account the strength of each opponent and wins/losses (and connects all the teams to determine their relative strength). It's not designed to be predictive, but still should predict future results decently based on each team's past performance. I don't have any working knowledge of the distribution of Ultimate scores, so I can't adapt it to build a predictive portion of the model. The best I could do is something like split the difference between hockey and football, but I don't have any real mathematical justification for doing so.

My original idea for a retrodictive model was one of best fit, using least absolute deviation rather than least squares. The end result would be a Pyth rating (like my previous models give) that minimizes the following equation:

Σ P(x) - W(x) = luck(x)

Where W(x) = # of wins for team x, and each individual P(x) is a Log5 probability:

P(x) = A(1 - B) / (A(1 - B) + B(1 - A))

Where A is the Pyth rating for team x, and B is the Pyth rating for an opponent on x's schedule. The model would calculate each team's Pyth rating to minimize each team's "luck", and thus provide a model that retrodictively predicts each past game. luck(x) would be the amount x got "lucky" with respect to their "expected" win total.
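As a rough illustration of the quantities described above, here is a minimal Python sketch. It assumes the standard Log5 form and Pyth ratings strictly between 0 and 1. The function names, the three-team schedule, and the ratings are hypothetical and only show the arithmetic; this is not the Matlab code I actually tried.

```python
def log5(a, b):
    """Log5 probability that a team rated a beats a team rated b.

    Assumes both ratings are strictly between 0 and 1.
    """
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))


def luck(team, ratings, schedule, wins):
    """Sum of Log5 win probabilities over the team's schedule, minus actual wins."""
    expected = sum(log5(ratings[team], ratings[opp]) for opp in schedule[team])
    return expected - wins[team]


def total_absolute_luck(ratings, schedule, wins):
    """Least-absolute-deviation objective: sum of |luck(x)| over all teams."""
    return sum(abs(luck(team, ratings, schedule, wins)) for team in ratings)


# Hypothetical three-team round robin, used only to show the calculation.
ratings = {"A": 0.75, "B": 0.50, "C": 0.25}
schedule = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
wins = {"A": 2, "B": 1, "C": 0}

for team in ratings:
    print(team, round(luck(team, ratings, schedule, wins), 4))
print("objective:", round(total_absolute_luck(ratings, schedule, wins), 4))
```

A fitting routine would then search for the ratings that make the objective as small as possible, which is the hard part with 200+ teams.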

I tried this out in Matlab, but had issues with calculating a nonlinear minimization function with more than one variable (and in this case I would've needed some 200+ variables solved for involving 1500+ games). Solving for the original Matrix model was trivial compared to that.

Saturday, July 27, 2013

New blog launched devoted to sports picks

I've decided to keep this blog entirely devoted to sports and statistics, and have now launched Probabilis Sports Picks where I'll post my daily picks.

Sunday, July 14, 2013

STFC and "sheep" pickers: why going chalk is actually the best strategy for most wins in a month

On Streak for the Cash, many players consider following the "sheep" pickers (those blindly following the favorite with the majority of the picks) a bad strategy. But are favorites really overfavored by STFC players? This premise depends upon the reasoning behind what percentage should back each side of a prop: should the percentages accurately reflect the probability of each respective side winning, or should the larger percentage simply take the favorite? That is, if the favorite wins 70% of the time, should 70% of picks back the favorite? Since the goal of STFC is simply to pick winners (and you don't have to consider value or losing money), the best strategy is to maximize your expected number of wins. With this in mind, the "sheep" are actually playing the ideal strategy: even if one side's chances are only slightly above 50%, that side is the better pick.
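To make the 70% example concrete, here is a small Python sketch comparing two ways of picking a single prop: always taking the favorite, versus backing each side in proportion to its chance of winning. The function names are mine, and the sketch assumes you know the favorite's true win probability, which in practice you don't.

```python
def win_rate_always_favorite(p):
    """Expected win rate when you always pick the more likely side."""
    return max(p, 1 - p)


def win_rate_proportional(p):
    """Expected win rate when you pick each side as often as it wins.

    You pick the first side with probability p and it wins with probability p;
    you pick the other side with probability 1 - p and it wins with 1 - p.
    """
    return p * p + (1 - p) * (1 - p)


for p in (0.51, 0.55, 0.70, 0.90):
    print(p, round(win_rate_always_favorite(p), 4), round(win_rate_proportional(p), 4))
```

At p = 0.70, always picking the favorite wins 70% of the time, while picking proportionally wins only 58% of the time. The gap narrows as p approaches 50% but never reverses.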

This theory is backed by the numbers: I analyzed 6,799 STFC picks between 2010 and 2011, and the favorite won 54.54% of the time. When I broke down these picks by sport/league, only in one instance did the favorite have a losing record: 9-13 (40.91%) in Auto Racing. In every other category, the favorite "sheep" pick won more often than the underdog. I also analyzed 1,649 picks from 2013, and this trend continued: 52.82% of the 2013 favorites have been winners. The most lopsided picks fared best: for all 2013 props with 75-100% of the picks backing one side, 56.22% of these favorites have won, compared to sides with 60-75% of the picks (46.29% correct) and 50-60% (47.09% correct).

Keep in mind that this strategy is ideal for picking the most wins in a month, but not necessarily for getting a "streak". The premise behind Streak for the Cash is that each prop is close to 50/50, and you need 27 wins in a row to win the grand prize "stash". Even if you could find a favorite with a 55% chance of winning every time, the chances of getting 27 correct picks in a row are about 0.0000097%. If each prop is truly 50%, this falls to about 0.00000075%. Basically, you have to get extremely lucky no matter what "strategy" you employ. However, when the goal is to get the most wins in a month, you want to maximize your expected number of wins, and picking the favorite every time is the way to go.
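The streak odds above are just the single-pick win probability raised to the 27th power. A quick Python sketch, assuming every pick is independent and has the same chance of winning (the function name is mine):

```python
def streak_probability(p, n=27):
    """Chance of n correct picks in a row when each pick wins with probability p."""
    return p ** n


for p in (0.50, 0.55):
    chance = streak_probability(p)
    # Show the result as a percentage and as "1 in N" attempts.
    print(f"p = {p:.2f}: {chance * 100:.8f}% (about 1 in {1 / chance:,.0f})")
```

Even with a 55% edge on every pick, a 27-pick streak is roughly a 1-in-10-million shot; at a true 50% it is 1 in 2^27, or about 1 in 134 million.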

Numbers Numbers Numbers!!!

I watch sports. A lot. My roommate at UNC and I have determined that, when we combine the screen time of our two TVs, we watch a total of 60 hours of sports in a typical week. So, I figured I should be generating some sort of output from this, and thus this blog was born.

I intend to write about statistics and probability, primarily concerning sports and perhaps the stock market and other topics that interest me (Carolina basketball will definitely be a focus when the season starts in November). I'm also currently refining a "betting system" I've been working on for any sport I have a model for: from MLB to NFL to NBA to even WNBA, and I'll hopefully start posting daily picks if my methods are successful.

My main influences have been Nate Silver, Ken Pomeroy, and Jeff Sagarin; their work is fantastic. Finally, regarding probabilities and picks, as a great mathematician at UNC once told me, "You can never be 100% certain!"
