I’ve spent the last few weeks introducing some non-projection metrics to augment our idea of the range of outcomes in PGA DFS. Some of these metrics can, in fact, be priced into projections, but it’s still unclear whether these other metrics matter more in cash games or in GPPs. (Example: Does having low volume result in a 50 percent penalty 10 percent of the time, or a 10 percent penalty 50 percent of the time? Both will have the same average effect over the long run, but the latter is far worse for cash-game lineups.)
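To make the parenthetical concrete, here is a minimal sketch of the arithmetic. The two penalty profiles come from the example above; the 100-point baseline is a made-up number used only to give the percentages something to act on.

```python
BASELINE = 100.0  # assumed baseline score, purely illustrative

def summarize(penalty, prob, baseline=BASELINE):
    """Return (expected score, standard deviation) for a two-outcome penalty."""
    hit = baseline * (1 - penalty)  # score in the slates where the penalty applies
    mean = prob * hit + (1 - prob) * baseline
    var = prob * (hit - mean) ** 2 + (1 - prob) * (baseline - mean) ** 2
    return mean, var ** 0.5

# 50 percent penalty, 10 percent of the time
rare_big = summarize(penalty=0.50, prob=0.10)
# 10 percent penalty, 50 percent of the time
frequent_small = summarize(penalty=0.10, prob=0.50)

print("rare, big penalty:      ", rare_big)
print("frequent, small penalty:", frequent_small)
```

The two profiles share the same expected score but are distributed very differently: one is a rare disaster, the other a small hit in half of all slates. A projection only captures the average, which is why the average alone can't tell us how a metric should be handled by game type.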
You could put together some distributions, hypothesize about which is more important, and come up with some rules, but that’s still a little too theoretical for my taste. Instead, we can take an empirical approach: Collect all of these metrics (projection and non-projection), attach them to a bunch of lineups, and see which metrics are important for game selection.
To put this into practice, I went back through all the golf slates for 2016, loaded up each player’s metrics at the time of each tournament, and generated 100,000 lineups for each tournament. I calculated the sum total of each metric (projections, total volume/recency indices, floor/ceiling, etc.) for each lineup, as well as an indicator for whether each lineup hit the double-up cash line. This approach would be a little limited in other sports because of the interactive effects of having two players in the same lineup (think stacking in MLB/NBA), but in golf, players’ outcomes don’t correlate all that much with one another. Sure, you have things like weather conditions that affect groups of players at a time, but the interactive effects between a pair of players are kept to a minimum.
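As a rough sketch of what one row of that data set looks like, the snippet below builds random lineups from a player pool, sums each metric, and flags whether the lineup cleared a cash line. Everything specific here is an assumption for illustration: the player pool is invented, the six-golfer roster and 50,000 salary cap are assumed contest rules, the cash line is arbitrary, and uniform random sampling is only one way the lineups could have been generated.

```python
import random

# Hypothetical player pool: every name, salary, and metric value is invented.
pool_rng = random.Random(0)
POOL = [
    {
        "name": f"golfer_{i}",
        "salary": pool_rng.randrange(6000, 11000, 100),
        "projection": pool_rng.uniform(40, 90),
        "volume": pool_rng.uniform(0, 1),
        "actual": pool_rng.uniform(20, 110),  # points actually scored
    }
    for i in range(60)
]
METRICS = ("projection", "volume")
ROSTER_SIZE = 6      # assumed roster size
SALARY_CAP = 50000   # assumed salary cap

def random_lineup(pool, rng):
    """Rejection-sample one lineup of distinct players that fits under the cap."""
    while True:
        lineup = rng.sample(pool, ROSTER_SIZE)
        if sum(p["salary"] for p in lineup) <= SALARY_CAP:
            return lineup

def lineup_row(lineup, cash_line):
    """Sum each metric across the lineup and flag whether it cashed."""
    row = {m: sum(p[m] for p in lineup) for m in METRICS}
    row["cashed"] = int(sum(p["actual"] for p in lineup) >= cash_line)
    return row

rng = random.Random(1)
rows = [lineup_row(random_lineup(POOL, rng), cash_line=400.0) for _ in range(1000)]
print(rows[0])
```

Summing metrics per lineup only makes sense because of the weak player-to-player correlation described above; in a sport with stacking, pairwise features would be needed as well.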
So for each of the lineups in the data set (2.4 million in total), we have a bunch of summed metrics. In order to see which ones drive success, I used a simple logistic regression model to test how predictive each metric is in terms of having a winning cash lineup. There are other modeling approaches that may be more accurate at the margins of predicting cash-game success, but for the purposes of this research, I’m mostly interested in how important each of the metrics is relative to the others, and the coefficients of the logistic regression model will do that just fine.
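Here is a minimal sketch of reading relative importance off logistic regression coefficients. It is not the model used for this research: the data is synthetic, the two metric names are placeholders, and a hand-rolled gradient descent stands in for whatever statistics package you prefer. It also assumes the metrics are standardized first, since coefficients are only comparable across metrics when the inputs share a common scale.

```python
import math
import random

def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    sd = (sum((x - mean) ** 2 for x in column) / len(column)) ** 0.5
    return [(x - mean) / sd for x in column]

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (bias, weights)."""
    n, k = len(X), len(X[0])
    bias, weights = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights

# Synthetic lineups: by construction, "projection" carries three times the
# weight of "volume" on the log-odds scale. These numbers are invented.
rng = random.Random(42)
projection = [rng.gauss(400, 20) for _ in range(2000)]
volume = [rng.gauss(3, 1) for _ in range(2000)]
zp, zv = standardize(projection), standardize(volume)
cashed = [
    int(rng.random() < 1 / (1 + math.exp(-(1.5 * p + 0.5 * v))))
    for p, v in zip(zp, zv)
]

bias, (w_projection, w_volume) = fit_logistic(list(zip(zp, zv)), cashed)
print(f"projection: {w_projection:.2f}, volume: {w_volume:.2f}")
```

With standardized inputs, each coefficient is the change in the log-odds of cashing per one-standard-deviation move in that metric, so a larger absolute value means a metric that matters more.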
In addition, this approach probably wouldn’t work unless the sample set is large enough and the slates are diverse enough. Example: Based on the results from Sawgrass this year, if you had Jason Day plus Rory McIlroy in your lineups, you’d cash no matter what, so those results would pull the cash/no-cash model heavily in favor of high Vegas odds being important. But if that were one of only five tournaments in the sample set, that metric might be overweighted based on a limited set of tournaments. It’s a long-winded way of saying lineups from a specific tournament are highly correlated, and the best way to get general rules is to have a lot of tournaments in the sample set.
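One way to check how much a single tournament is steering the results (a suggestion, not something done for this article) is to hold out one tournament at a time, refit on the rest, and see whether the coefficients move. The sketch below shows only the splitting step; the `tournament` and `lineup_id` field names and the tiny data set are hypothetical.

```python
from collections import defaultdict

def leave_one_tournament_out(rows):
    """Yield (held_out_tournament, train_rows, test_rows), one split per tournament."""
    by_tournament = defaultdict(list)
    for row in rows:
        by_tournament[row["tournament"]].append(row)
    for held_out, test_rows in by_tournament.items():
        train_rows = [
            r
            for name, group in by_tournament.items()
            if name != held_out
            for r in group
        ]
        yield held_out, train_rows, test_rows

# Tiny invented data set: three tournaments, two lineups each.
rows = [{"tournament": t, "lineup_id": i} for t in ("A", "B", "C") for i in range(2)]
splits = list(leave_one_tournament_out(rows))

for held_out, train_rows, test_rows in splits:
    print(held_out, len(train_rows), len(test_rows))
```

Splitting by tournament rather than by lineup matters because 100,000 lineups from the same event share the same leaderboard, so they behave much more like one observation than like 100,000 independent ones.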
We’ll go over the outputs of the cash-game model next week and see if we can use it to come up with some cash-game rules that incorporate those non-projection metrics.