Colin Davy: Strokes Gained Is Overrated in PGA DFS

When the first iterations of the FantasyLabs PGA product were in development, whether to include strokes gained was one of the first questions we looked at. It’s not hard to see why: it’s derived from very granular data, it’s used frequently by some of the smartest minds in golf analytics, and it’s become the de facto standard metric in DFS write-ups and previews. So I imagine a lot of people will be surprised to see strokes gained missing from our PGA product.

As we’ve stated many times before, our products are always designed to steer our customers towards making correct decisions, and sometimes that entails excluding counterproductive data even if customers would be otherwise free to ignore it. And that’s why you won’t find strokes gained in our PGA tools, because for DFS purposes, it is exactly that: data that causes more trouble than it’s worth.

To be clear, I think strokes gained is an incredibly valuable metric for golf in general. Pros use it with great success, and there have been some great research and white papers derived from ShotLink data. The PGA Tour has been incredibly successful at marketing it as well, which partially explains its status as a default space-age metric that does everything.

However, its uses for DFS purposes haven’t been explored in detail or, more importantly, compared against more conventional data sources. That’s an important step, because a lot of the added predictive value of strokes gained can be captured by transformations and adjustments to conventional data. Methods like z-scores and field adjustments – which we do here at FantasyLabs – normalize results according to course conditions and difficulty, capturing a huge chunk of the value of strokes gained.
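To make the z-score idea concrete, here is a minimal sketch of a field adjustment. It is not the actual FantasyLabs model, which isn’t described in this post; the function name, the golfer labels, and the scores are all made up for illustration. The point is only that the same raw score can mean very different things depending on how the rest of the field played that day.

```python
from statistics import mean, pstdev

def field_adjusted_z_scores(scores):
    """Convert raw round scores into z-scores relative to the field.

    Lower golf scores are better, so the sign is flipped: a positive
    z-score means the golfer beat the field average.
    """
    values = list(scores.values())
    field_mean = mean(values)
    field_sd = pstdev(values)
    if field_sd == 0:
        # Everyone shot the same score, so nobody gained on the field.
        return {name: 0.0 for name in scores}
    return {name: (field_mean - s) / field_sd for name, s in scores.items()}

# Hypothetical data: Golfer A shoots 68 on an easy day and on a hard day.
easy_course = {"Golfer A": 68, "Golfer B": 66, "Golfer C": 67, "Golfer D": 69}
hard_course = {"Golfer A": 68, "Golfer E": 72, "Golfer F": 74, "Golfer G": 70}

easy_z = field_adjusted_z_scores(easy_course)
hard_z = field_adjusted_z_scores(hard_course)

print(round(easy_z["Golfer A"], 2))  # below the field average on the easy day
print(round(hard_z["Golfer A"], 2))  # well above the field average on the hard day
```

The same 68 comes out negative against the easy-day field and strongly positive against the hard-day field, which is the kind of course-and-conditions normalization that raw scores alone don’t give you.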

In addition, the areas in which strokes gained really shines – granular metrics like approach shots from the rough, putts inside 15 feet, fairway sand save percentages, etc. – just aren’t as important in predicting DFS scores as everyone thinks they are. There will be more articles in the future quantifying the effect of player stats, but I’ve been deep in the weeds on this stuff, and a base rating derived from generic stats gets you at least 90% of the way there.

Okay, so that’s plenty of words on why conventional data can be almost as good as strokes gained, but that’s still not a reason to exclude it outright. Why not just include both, and let our users make their own decisions? Here’s where strokes gained gets borderline actively harmful for DFS purposes: it’s not available in non-PGA events. As I mentioned in the opening podcast, golfers play a lot of non-PGA events, so if you’re relying exclusively on strokes gained, you’re potentially missing critical information from those tours. Recent form arguably matters more in PGA than in any other DFS sport, so you can’t afford to miss out on any developments.
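Here is a small sketch of what that coverage gap looks like in practice. Everything in it is hypothetical: the event list, the `field_z` and `strokes_gained` values, and the `recent_form` helper are invented for illustration and don’t come from any real golfer or from our tools. It assumes only that strokes gained is missing for non-PGA events, as described above.

```python
from statistics import mean

# Hypothetical recent results for one golfer, newest first.
# field_z is a field-adjusted score (positive = beat the field);
# strokes_gained is None where the tour does not provide it.
recent_events = [
    {"tour": "EURO", "field_z": 2.1, "strokes_gained": None},
    {"tour": "PGA", "field_z": 0.3, "strokes_gained": 0.4},
    {"tour": "EURO", "field_z": 1.2, "strokes_gained": None},
    {"tour": "PGA", "field_z": -0.2, "strokes_gained": -0.1},
]

def recent_form(events, key):
    """Average a metric over the events where it exists.

    Returns (average, events_used, events_total) so the coverage gap
    is visible alongside the number itself.
    """
    values = [e[key] for e in events if e[key] is not None]
    avg = mean(values) if values else None
    return avg, len(values), len(events)

z_form = recent_form(recent_events, "field_z")
sg_form = recent_form(recent_events, "strokes_gained")

print(z_form)   # uses all four events
print(sg_form)  # uses only the two PGA events
```

In this made-up sample, the conventional metric sees all four starts, while the strokes gained average is built from half of them and never sees the two non-PGA results, including the best one.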

 

An Example

Let’s take an actual example and see how this will play out: at the start of February, Danny Willett won a Euro event. Three weeks later, he was the third-highest-priced golfer in the Valspar Championship. You need to decide whether to roster him. You have his conventional data and his strokes gained data side by side, and you know strokes gained is missing an outright win. How do you resolve the gap in the data and make a go/no-go decision?

  • Do you ignore him and stick to PGA players only, where you’ll have full strokes gained data? You’re sticking your head in the sand on purpose and missing a lot of potential value.

  • Do you try to translate conventional data into strokes gained equivalents? That’s a really error-prone process, since there’s no way to know whether you’re doing the translation correctly, and you’re potentially doing more harm than good. At that point, why not just use conventional data, since it’s already an apples-to-apples comparison?

  • Do you limit the use of strokes gained to tiebreaking purposes when you’re stuck between two players (Danny Willett vs. Patrick Reed, for example)? That’s closer to a reasonable response, but it still doesn’t address the gap in the data. Also, look at where you’ve ended up going this route: strokes gained is no longer the primary metric you use for evaluation, but rather a third-tier tiebreaker. And that proves the overall point that it’s not nearly as important as you might think.

I still believe additional information should only lead to better predictions, and there is a theoretical way to resolve all of these gaps with a sound methodology. But that solution will be full of gray areas and introduce massive amounts of subjectivity, which is in direct opposition to the spirit of our PGA product.

If I thought strokes gained was this massive fountain of edge that made the subjectivity worth it, I would include it, but all of the data and potential uses point to the costs outweighing the benefits too often. I am still excited about ShotLink data being used to advance golf analytics in general, but remember: it was never intended to be used to develop out-of-sample predictions.

So let’s stop pretending like it has to be used in DFS.
