How To Measure Course History

Quantifying the Unexplainable

I missed the entire debate on course history when it happened last month. For all I know, it might still be going on. Even hearing about it secondhand, though, I’m honestly shocked that the split is even close to 50/50 on whether to incorporate it into building PGA lineups. Course history absolutely matters.

Don’t get me wrong: I think that it’s the most frustrating metric in all of PGA. Course history essentially says something like this:

We don’t know why this golfer’s performance is so different at this one place, but it just is.

The job of data is to explain as much as possible, and course history is the unexplainable leftover at the end.

When phrased that way, the argument against using course history becomes a little more persuasive. If you can’t otherwise explain results, the chance that the data is just noise skyrockets. Fold in the fact that course history has an extremely small sample size, and I can see why you might think it’s too noisy to consider.

However, if you can at least quantify the unexplainable leftover that is course history, you can backtest it to see if it is in fact predictive of future outcomes (a practice you should be doing for all your metrics). The short version is that course history is in fact predictive. The long version will come in a future article.
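
As a rough illustration, here is a minimal sketch in Python of what such a backtest could look like. The column names are hypothetical ("actual" and "expected" stand in for whatever performance and expectation measures you trust); this is one simple way to frame the test, not my actual backtest.

    # Minimal backtest sketch: does past course-history residual predict the next visit?
    # Assumes a DataFrame `events` with hypothetical columns:
    #   golfer, course, year, actual, expected
    # where residual = actual - expected is performance relative to expectation.
    import pandas as pd

    def backtest_course_history(events: pd.DataFrame) -> float:
        df = events.sort_values(["golfer", "course", "year"]).copy()
        df["residual"] = df["actual"] - df["expected"]

        # Course history entering each event = mean residual over that golfer's
        # *prior* visits to the same course (shift(1) avoids peeking at the result).
        grp = df.groupby(["golfer", "course"])["residual"]
        df["prior_history"] = grp.transform(lambda s: s.shift(1).expanding().mean())

        # Keep events where the golfer had at least one prior visit, then see
        # how well prior history lines up with the new residual.
        sample = df.dropna(subset=["prior_history"])
        return sample["prior_history"].corr(sample["residual"])

A consistently positive correlation over enough events is the kind of evidence I mean by "predictive."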

“It’s in the Data”

As uncomfortable as I might be using a metric that by definition encompasses unexplained variation, I’d be less comfortable assuming that if a model can’t explain something, it’s reality’s fault and not the model’s. There are plenty of narrative-type explanations (such as confidence, unrecorded experience, and general comfort level) that could explain course history, but we simply don’t have the data to root-cause it. I’ll settle for lumping it into a metric that I know I can’t explain but that I can measure.

I’ve still heard a fair number of data-driven arguments against course history, each of which is some variation on “if course history is predictive, it should come out in the data.” While that should always be the standard for making data-driven decisions, “it’s in the data” isn’t a catch-all phrase that settles an argument on its own.

Different people can come to different conclusions depending on how they analyze the data. I don’t know exactly why I’m on the other side of these data-driven arguments since I have no visibility into the nitty-gritty of what constitutes “in the data.”

However, I’ve had enough experience building other models for other sports to have some ideas as to why course history may not show up in other people’s data-driven approaches. Chief among them: It matters immensely exactly how you quantify and measure course history, since it’s such a nebulous concept. I have a way of measuring course history. I’m not necessarily convinced that I have the best way.

But I do have some lessons learned about what makes for good principles for measuring course history. I hope that someone takes these as a starting point and comes up with a better way to measure it: I’m becoming increasingly convinced that course history deserves more credit than most people give it.

Colin’s Commandments

Here are the commandments I’ve come up with to date for measuring course history:

– Course history should be measured relative to expectations. I start by measuring course history as how much better or worse a golfer performed at a given course than I would otherwise have expected, all else being equal. This has a few advantages: It accounts for a player’s change in skill level from one course visit to the next, it minimizes the influence of variable playing conditions, and when done correctly it is independent of field strength. Measuring absolute performance instead will leave your course history metric too much at the mercy of the inherent randomness of all of these factors. (A sketch that combines this commandment with the sample-size and five-year commandments below appears after the list.)

– Your metric for expectation matters. Choose a non-noisy one. You might start with an adjusted-round-type metric, i.e. how many strokes per round a golfer gained or lost compared to what you expected. Strokes per round, however, is a function of a lot of things, including skill level, weather, variation in course conditions, and course fit. It’s extremely subject to noise, and the noisier your expectation metric is, the harder it will be to separate out signal. Personally, I’m a fan of using head-to-head probabilities as my target expectation: not only does that minimize a lot of those factors, it also takes advantage of the fact that you get roughly 150 data points per tournament (one per opponent in the field) instead of the one you’d get from strokes per round (a toy example of this also appears after the list). Speaking of which . . .

– Incorporate sample size. Which do you think is a better course history: one year with a T5 finish, or three years of finishes in the 21st-30th range? I’m not sure we know enough to say conclusively yet, but I’m leaning toward the latter. Joint probability says the odds of random chance explaining three above-average finishes are much smaller than for a single tournament: if, say, luck alone produces an above-average finish 40% of the time, it produces three straight only about 6% of the time. The exact weighting and balance of quality of finishes versus number of finishes is very much up for debate, but you absolutely need both.

– A rolling five-year window is better than including all history. This is a recent conclusion of mine, reached after doing a preview for a course that had a major redesign in the last five years. Sometimes courses get major overhauls in redesigns, but more often they’ll see subtler changes that reflect the ever-changing intent of the Tour (e.g. Tiger-proofing). This commandment definitely improves accuracy in the general case. However, a five-year window causes you to lose out on potential cases of elite long-term course history (e.g. Sergio Garcia at Sawgrass). I’m still working through the best way to balance those tradeoffs, but for most uses of course history, the five-year window ends up being better.
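
To make the commandments concrete, here is a minimal sketch of a course-history score that combines the first, third, and fourth: residuals measured relative to expectation, shrunk toward zero when the sample is small, and restricted to a rolling five-year window. The column names and the shrinkage constant are illustrative assumptions, not the actual values I use.

    # Sketch of a course-history score: expectation-relative, sample-size aware,
    # and limited to a rolling five-year window. Column names and the shrinkage
    # prior K_PRIOR are illustrative assumptions, not tuned values.
    import pandas as pd

    K_PRIOR = 4  # pseudo-visits of "no edge"; larger values are more skeptical of small samples

    def course_history_scores(events: pd.DataFrame, current_year: int) -> pd.DataFrame:
        # Commandment 4: only the last five years of visits count.
        recent = events[events["year"] > current_year - 5].copy()

        # Commandment 1: measure each visit relative to expectation.
        recent["residual"] = recent["actual"] - recent["expected"]

        # Commandment 3: balance quality of finishes against number of finishes.
        # mean residual * n / (n + K_PRIOR) pulls small samples toward zero
        # ("no course history") and lets larger samples speak for themselves.
        agg = (recent.groupby(["golfer", "course"])["residual"]
                     .agg(mean_residual="mean", visits="count")
                     .reset_index())
        agg["history_score"] = agg["mean_residual"] * agg["visits"] / (agg["visits"] + K_PRIOR)
        return agg[["golfer", "course", "visits", "history_score"]]

With a setup like this, three modestly above-expectation visits can end up scoring higher than one spectacular visit, which is the kind of behavior the sample-size commandment is after.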
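
And for the second commandment, a toy example of turning head-to-head results into an expectation-relative number. The probability model that produces the matchup odds is assumed, not shown; the point is simply that every opponent in the field contributes a data point.

    # Toy head-to-head residual for one golfer in one event (commandment 2).
    # `p_beat` holds pre-tournament probabilities of beating each opponent in the
    # field (from whatever skill model you trust); `beaten` records whether he
    # actually finished ahead of them. Both inputs are assumptions here.
    def head_to_head_residual(p_beat: list[float], beaten: list[bool]) -> float:
        expected_wins = sum(p_beat)              # ~150 matchups in a full-field event
        actual_wins = float(sum(beaten))
        return (actual_wins - expected_wins) / len(p_beat)  # per-matchup over/underperformance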

Next week, we’ll actually put up my measurement of course history and quantify its effect on roster construction.
