The Myth of the Specialist

DFS has its moments where its strategies can teach people a thing or two about predictive analytics. Even your boilerplate write-ups for the main sports do a pretty good job of emphasizing which stats are most predictive of future outcomes and why (RB carries/WR targets in football, usage rate and number of possessions in basketball, etc.).

PGA is in the analytics dark ages relative to the other sports. As a result, there's been a proliferation of analysis that relies on "key statistics" where little to no legwork has been done showing the out-of-sample predictive value of those statistics. I'm not saying that metrics like long-term AdjRd or SG:T2G will have no future predictive value, but rather that in the absence of understanding how predictive each of the generally used stats is, they all get mashed together and treated interchangeably. That can have some major consequences for your research process.

There are a lot of research processes that are guilty of not checking the predictive value of their stats, but I think the PGA DFS practice of finding "specialists" each week is one of the most egregious. The concept is simple enough: look at course-specific conditions each week (course difficulty, field strength, total course length, etc.), see which golfers have overperformed their averages in each of those categories, and pick the golfers who overachieve across the board. To be clear: I think the specialist approach could contain valuable information in the abstract. Things like grass type and less-than-driver courses are course fit-type angles that can probably add to traditional course-fit approaches.
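To make the setup concrete, here's a minimal sketch of what that filter-based screen typically amounts to. The DataFrame, column names, and numbers are all hypothetical, invented purely for illustration; they aren't from any real data feed or tool.

```python
import pandas as pd

# Hypothetical per-golfer "splits": average strokes gained under a given
# condition minus the golfer's overall average (positive = overachieves there).
# All names and numbers below are made up for illustration.
splits = pd.DataFrame({
    "golfer":        ["A", "B", "C", "D", "E"],
    "hard_courses":  [0.4, -0.2, 0.1, 0.6, -0.5],
    "strong_fields": [0.3, 0.5, -0.1, 0.2, -0.3],
    "long_courses":  [0.2, -0.4, 0.3, 0.1, 0.0],
})

# The "specialist" screen: keep golfers who beat their own baseline in every
# category that applies to this week's course.
this_week = ["hard_courses", "strong_fields", "long_courses"]
specialists = splits[(splits[this_week] > 0).all(axis=1)]
print(specialists["golfer"].tolist())  # golfers flagged as specialists
```

Nothing in that filter asks whether any of those splits actually persist from one sample to the next, which is the whole problem.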

However, as it’s set up now, finding a specialist by filtering on relative performance in each category is the Warren Buffett coin-flipping fallacy incarnate: let enough people flip coins and some of them will string together impressive runs of heads on luck alone. By variance alone, you’re pretty much guaranteed to see a spread of golfers performing better or worse than average in each of these conditions. And until you can prove otherwise, you should honestly assume that category-specific performance is random and not predictive of future outcomes in that category (we’ll elaborate on this in a future article). To illustrate, here are some other specialist-type categories you could just as easily include in that type of approach:

  • Courses in a city with at least two professional sports teams
  • Courses in a state with one Democratic and one Republican Senator
  • Courses ending in a vowel
  • Courses where the sponsor has a market cap of at least $10 billion

You’re probably looking at these and thinking they’re ridiculous examples. None of these factors have anything to do with the actual golf being played, so it’s silly to include them (a true statement, to be clear). But I bring up these ridiculous examples because I guarantee you there are golfers who perform better and worse in each of these categories. How would you prove that the “legitimate” specialist categories (course par score, course difficulty, wind level, etc.) are okay to use in this approach, but these exaggerated categories aren’t? You could invoke some sort of subject-matter judgment as to what is and isn’t important, but the entire recent history of sports analytics is littered with examples of that kind of expertise being disproven. The only objective way to determine legitimate specialist-type categories is to prove which of them have actual, meaningful out-of-sample predictive power.
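If you want to see the coin-flipping effect for yourself, here's a small simulation sketch. Every number in it is an assumption made up for the example: a field of golfers with identical true skill and a completely arbitrary event label still turns up plenty of apparent "specialists."

```python
import numpy as np

rng = np.random.default_rng(0)
n_golfers, n_events = 150, 40

# Every golfer has the *same* true skill; scores are pure noise.
scores = rng.normal(loc=0.0, scale=2.5, size=(n_golfers, n_events))

# An arbitrary binary label for each event, e.g. "course name ends in a vowel".
# It's assigned at random, so it carries no information by construction.
vowel_course = rng.random(n_events) < 0.5

# Each golfer's "split": average on vowel courses minus overall average.
split = scores[:, vowel_course].mean(axis=1) - scores.mean(axis=1)

# A filter-based screen would still flag a healthy handful of "specialists".
print(f"{(split > 0.5).sum()} of {n_golfers} identical golfers look like "
      "vowel-course specialists by 0.5+ strokes")
```

None of those "specialists" carry any edge into the next sample, because there was never anything to carry.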

Why don’t you see this type of proof, even though it’s basically required? My guess is that it’s because it’s a fair amount of work. It’s a lot easier to pick and choose your categories based on intuition; coming up with prior expectations and correlating year-over-year metrics to see what’s predictive is a lot harder. You could probably talk yourself into the effort not being worth it, because you already know what’s relevant.
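For what it's worth, the basic year-over-year check doesn't have to be elaborate. Here's a minimal sketch of the idea; the season labels and the random inputs are assumptions for illustration, not real results.

```python
import numpy as np

def year_over_year_r(split_y1, split_y2):
    """Correlation of a golfer-level category split across two seasons.
    Each array holds (category average minus overall average) per golfer,
    aligned by golfer. A correlation near zero means the split doesn't
    persist and is probably noise; a solidly positive one is evidence of
    real, repeatable signal."""
    return np.corrcoef(split_y1, split_y2)[0, 1]

# Purely illustrative inputs: a category split that is random in both seasons.
rng = np.random.default_rng(1)
split_2022 = rng.normal(scale=0.4, size=200)  # hypothetical year-1 splits
split_2023 = rng.normal(scale=0.4, size=200)  # hypothetical year-2 splits
print(round(year_over_year_r(split_2022, split_2023), 3))  # ~0.0, no persistence
```

Run the same check with real splits for each candidate category and you have a rough but defensible way of separating the categories worth filtering on from the ones that are just noise.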

But you would be shocked at how many of these specialist-type factors aren’t all that predictive out of sample. We’ll get into a couple of examples in next week’s article to illustrate the point. But in the meantime, golf analysis could stand a little more statistical proof of the predictive power of the metrics you hear about from week to week.
