Building an Optimal Model for the WGC-Bridgestone Invitational

Full disclosure: This is a bit of an exploratory piece. Let me start with why I’m writing it in the first place: Often, PGA DFS players hear something along the lines of, “Course X is a distance course.” On our PGA Daily Fantasy Flex podcast, we make similar statements. I believe we provide context and degree to those statements, but some analysts don’t. That’s a problem, because degrees matter when you’re building models to predict player performance for a golf tournament. Course X favors distance — but how much? Should you weight Long-Term Driving Distance 20/100 in Models? 30/100?

Thankfully, we can (sort of) deal with this problem if we’re willing to be a little creative. Our Models and Trends tools share the same metrics, so we can use Trends to test which metrics deserve weight in Models, and how much. And that’s exactly what I did: I took every metric within Models, calculated how golfers in the top 20 percent of that metric performed historically at this week’s course, Firestone Country Club, and then assigned weights. This is an imperfect science for many reasons, one of which is that some metrics test below the baseline. For example, golfers with a Long-Term Driving Distance in the top 20 percent have historically scored a -7.01 Plus/Minus. That is 0.46 points worse than the baseline for all golfers at Firestone (-6.55), and thus I eliminated it from the Model. Ideally, we would weight metrics both positively and negatively; since I can’t do that, I used only metrics that tested above baseline and built a model accordingly.
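
To make the filtering step concrete, here is a minimal Python sketch of the baseline comparison. The Plus/Minus values and the -6.55 baseline come from this article; the function and variable names are just illustrative, and this is not how the Trends tool itself is implemented.

```python
# Baseline Plus/Minus for all golfers at Firestone, as quoted in the article.
BASELINE = -6.55


def differential(plus_minus, baseline=BASELINE):
    """Return how far a metric's top-20-percent Plus/Minus sits above the baseline."""
    return round(plus_minus - baseline, 2)


# Plus/Minus of golfers in the top 20 percent of each metric (values from the article).
top_quintile_plus_minus = {
    "Long-Term Driving Distance": -7.01,
    "Long-Term Eagle Percentage": 2.26,
    "Recent Adjusted Round Score": -3.90,
}

# Differential for every metric, then keep only those that beat the baseline.
diffs = {name: differential(pm) for name, pm in top_quintile_plus_minus.items()}
kept = {name: d for name, d in diffs.items() if d > 0}

for name, d in diffs.items():
    status = "kept" if name in kept else "dropped"
    print(f"{name}: {d:+.2f} ({status})")
```

Long-Term Driving Distance comes out below zero and gets dropped, while the other two metrics survive with the same differentials shown in the list of positive metrics.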

Kelly McCann goes into more detail in his course breakdown about Firestone and the metrics that have historically backtested well here, but here’s the Plus/Minus for each positive metric at Firestone, followed by its differential (the metric’s Plus/Minus minus the -6.55 baseline):

  • Long-Term Eagle Percentage: +2.26, +8.81
  • Recent Birdie Percentage: -0.36, +6.19
  • Recent Scrambling Percentage: -1.55, +5.00
  • Recent Eagle Percentage: -1.87, +4.68
  • Course Driving Distance: -1.98, +4.57
  • Pro Trends Rating: -3.62, +2.93
  • Long-Term Birdie Percentage: -3.69, +2.86
  • Recent Adjusted Round Score: -3.90, +2.65
  • Recent Driving Accuracy: -4.03, +2.52
  • Recent Driving Distance: -4.58, +1.97
  • Long-Term Count: -4.60, +1.95
  • Odds: -4.79, +1.76
  • Recent MC Score: -4.86, +1.69
  • Recent Greens in Regulation: -4.88, +1.67
  • Long-Term Adjusted Round Score: -4.96, +1.59
  • Putts Per Round Differential: -6.02, +0.53

From here, I translated all of those values into weights out of 100. For example, with a +2.26 Plus/Minus, Long-Term Eagle Percentage is given a weight of 17/100 in the model because of how that value compares to the values of the rest of the metrics. With a Plus/Minus of -3.90, Recent Adjusted Round (Adj Rd) Score is given a weight of 5/100. (The quoted weights match each metric’s differential divided by the sum of all 16 differentials, 51.37.) And so forth; using the positive metrics, we not only have an idea of what has historically been important at Firestone but also how important it has been relative to other metrics.
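
The weights quoted here (17, 5, and later 3 for LT Adj Rd Score) are consistent with a simple normalization: each differential divided by the sum of all differentials, scaled to 100 and rounded. Here is a short Python sketch under that assumption. The differentials are copied from the list above; the rounding rule is my guess at the method, not something the Models tool documents.

```python
# Metric/baseline differentials for the 16 positive metrics, from the list above.
differentials = {
    "Long-Term Eagle Percentage": 8.81,
    "Recent Birdie Percentage": 6.19,
    "Recent Scrambling Percentage": 5.00,
    "Recent Eagle Percentage": 4.68,
    "Course Driving Distance": 4.57,
    "Pro Trends Rating": 2.93,
    "Long-Term Birdie Percentage": 2.86,
    "Recent Adjusted Round Score": 2.65,
    "Recent Driving Accuracy": 2.52,
    "Recent Driving Distance": 1.97,
    "Long-Term Count": 1.95,
    "Odds": 1.76,
    "Recent MC Score": 1.69,
    "Recent Greens in Regulation": 1.67,
    "Long-Term Adjusted Round Score": 1.59,
    "Putts Per Round Differential": 0.53,
}

# Each metric's share of the total differential, scaled to a 100-point model.
total = sum(differentials.values())
weights = {name: round(100 * d / total) for name, d in differentials.items()}

for name, weight in weights.items():
    print(f"{name}: {weight}/100")
print("Total points used:", sum(weights.values()))
```

With these particular numbers the rounded weights happen to add up to exactly 100. That isn’t guaranteed in general; with other inputs you might need to nudge a weight by a point to make the total work.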

The Importance of Eagles

Of course, there are some probable flaws here. For one, eagles are incredibly rare, especially on such a difficult course; Dustin Johnson won last year at just six under par. Second, this model gives little weight to LT Adj Rd Score, which has tested as one of the most predictive proprietary metrics (or any metric, really) at FantasyLabs. With a weight of only 3/100, it still counts, but far less than it does in the Pro Models. For instance, in the Colin Davy Model, LT Adj Rd Score has a weight of 40/100.

Still, somehow this odd model with LT Eagle Percentage at the top backtested brilliantly, with a Plus/Minus of +5.00.

What gives, especially given the small sample of eagle data?

I’ll likely continue to explore this in future weeks, but my theory is that eagle data works as a decent proxy for daily fantasy value. LT Adj Rd Score is still the all-in-one metric with the most utility, since it best identifies the most talented golfers, but there’s sometimes a gap between Adj Rd Score and DraftKings scoring, which favors the golfers who can accumulate birdies and eagles. Take the difference between Bill Haas and Justin Thomas in terms of their long-term metrics:

  • Haas: 69.0 LT Adj Rd, 0.2 Adjusted Eagles, 11.6 Adjusted Birdies, 18 percent MC
  • Thomas: 69.0 LT Adj Rd, 0.5 Adjusted Eagles, 13.6 Adjusted Birdies, 28 percent MC

Haas and Thomas have the same Long-Term Adj Rd Score, but their ceilings are nowhere near the same in terms of daily fantasy scoring. This isn’t a huge problem in Models, because we have 100 points with which to work. Even a model that weights LT Adj Rd Score heavily, as Colin’s Model does at 40/100, still has 60/100 points left to bump Thomas above Haas given his superior marks in birdie percentage and so forth. But if eagle percentage is actually a decent proxy and correlates well with both LT Adj Rd and LT Birdies, then perhaps it’s a more useful stat than first realized; perhaps it can help identify the golfers who are both the most talented and have the most upside.

Here’s the correlation between eagle percentage and the other metrics for players specifically in the WGC-Bridgestone Invitational this week:

Correlations (R2) with LT Eagle Percentage:

  • PPG: 0.49462
  • LTAdj: 0.42198
  • LTBird: 0.61483
  • MC: 0.47452

Indeed, LT Eagle Percentage does correlate well with LT Adj Rd Score and especially LT Birdie Average. For reference, the correlation between LT Adj Rd and LT Birdie Average in this field is 0.81119, which is incredibly high. I guess my point is this: While giving heavy weight to both LT Adj Rd and LT Birdie Average highlights the best golfers with upside and likely teases out the Haases of the world, perhaps LT Eagle Percentage makes the difference even starker.
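
If you want to run this kind of check on your own field data, here is a minimal Python sketch of a Pearson correlation and its square. The article doesn’t say exactly how the figures above were computed, so treat this as one plausible approach; the function names are mine, and the sample numbers are made-up placeholders, not real golfer data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / sqrt(var_x * var_y)


def r_squared(xs, ys):
    """Share of variance in one metric explained by a linear fit on the other."""
    return pearson_r(xs, ys) ** 2


# Placeholder values for five hypothetical golfers (NOT real data).
eagle_pct = [0.1, 0.2, 0.3, 0.4, 0.5]
birdie_avg = [11.0, 12.5, 12.0, 13.5, 14.0]

print("r:", pearson_r(eagle_pct, birdie_avg))
print("R2:", r_squared(eagle_pct, birdie_avg))
```

One thing worth keeping straight when comparing numbers: r and R2 are not on the same scale. If the 0.81119 figure were an r value, for instance, its R2 would be roughly 0.66.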

Again, I’ll keep exploring this idea of optimizing a model before optimizing a lineup as time goes on. Perhaps this eagle anomaly is a Firestone-only phenomenon, given the small, loaded field and the lack of a cut line. Perhaps we’ll see LT Adj Rd and Birdie metrics become more important at other courses. Stay tuned.

For whatever it’s worth, this specific model’s highest-rated golfer currently is Brooks Koepka.
