In my last article, I looked at strong inference and White Swans in order to elucidate the idea that we do not focus on negative trends nearly as much as we should when making fantasy sports decisions. Early in the analytical process, identifying the players to fade is just as important as (and maybe more important than) homing in on the potential players for your lineup. In this article, which can be thought of as an addendum to the previous piece, I want to explore the idea of selection bias and how it often manifests in the ways that we evaluate slates and players.

This is the 19th installment of The Labyrinthian, a series dedicated to exploring random fields of knowledge in order to give you unconventional theoretical, philosophical, strategic, and/or often rambling guidance on daily fantasy sports. Consult the introductory piece to the series for further explanation.

“Selection Bias”: A Fancy Phrase for Screwing Up

Selection bias, as a concept, is exactly what it sounds like: the bias that occurs when a researcher studies a data set in a nonrandom (selective) manner. In the way that academics and statisticians talk about it, almost any mistake can be considered selection bias. It is a huge mansion with many rooms of error, each bearing a formal-sounding name:

  • Sampling bias: A sample population is created or sorted in a skewed manner.
  • Faulty bootstrapping: A sample of a larger population is re-sampled in a manner inconsistent with the original sampling.
  • Premature termination: A study is ended early so that the results at that time support preconceived notions.
  • Confirmation bias: A researcher pursues affirming instead of negating evidence.
  • Cherry picking: Specific data is used to support a position and other relevant data undermining that position is ignored.
  • Availability heuristic: The ideas, theories, or information that are more readily accessed by our thought processes are deemed more important than those that aren’t.
  • Recency bias: The overvaluation of recent information and occurrences.
  • Suggestibility: A researcher is easily influenced by external input, which impacts the act of gathering and analyzing data.

I could go on, but you get the idea. For almost every inclination we have, researchers have theorized a formal type of bias describing how we humans tend to select and sort data in a skewed, nonrandom manner.

Basically, to be human is to be constantly engaging in selection bias while constantly trying not to.

The Consequences of Selection Bias

Selection bias can have many negative consequences, most of which fall into two broad types:

  1. The unrepresentativeness of what we see.
  2. Our suboptimal perception of what we see, and the suboptimal actions that stem from that perception.

When we create a data set to analyze (whatever that data set may be), if we do so selectively, then what we see will be an inaccurate representation of reality. It will purport to tell us that the world behaves in one way when the world might not behave that way at all. When the data we analyze is incorrect, the beliefs we form as a result of that analysis are very likely to be wrong.

Of course, there are times when the data set is correct but we perceive it through a selective lens. In these instances, we often interpret data in a way that confirms our beliefs about the world. Insofar as the data is subordinate to our preconceived notions, whether the data is accurate or not is almost irrelevant. It might as well not even exist for all the good (and harm) it does. When this is the case, the predictions we make will often be wildly inaccurate, despite the fact that they have “data” attached to them. They will seemingly be grounded in facts, but really they will be grounded in desires, and we will thus behave in a highly suboptimal manner.

And then there are the situations in which, even if the data and our perceptions of it are accurate, our transition from perception to action is flawed. For instance, the connection between theory and practice in the researcher's mind could be severed. Someone might want to act in a new, informed manner but simply fall back into a default mode of action because his or her habits selectively favor that course of behavior.

However we err, whether in collecting, analyzing, or acting upon data, when we do so selectively the result is often that we believe or behave as if the world is something other than it is. And when that happens, we often view the chess game of existence as if it were checkers.

And, as DFS players, we lose money.

Balancing Positive and Negative Plus/Minus DFS Trends

As I stated in my previous piece, I believe that “in DFS far too many people start their lineup construction process by looking for players to put in their lineups, not players not to put in their lineups.”

I also suggested that, when using our Trends tool, you consider creating only negative Plus/Minus trends, which would appear in the “My Trends” column of our Models and be balanced out on the positive side by our Pro Trends, which also appear in the models. That idea of balance is of great importance, and it is what I want to explore for the rest of the article.

Some people emailed me to say that, although they liked the piece, they weren’t thrilled with the idea of no longer having the benefit of the positive trends they have created. As one person put it, “I have worked very hard on creating positive Trends with high plus/minus and consistency. It makes no sense to eliminate these profitable positive Trends.” And that’s entirely fair.

You don’t need to get rid of your positive trends if you don’t want to. If, though, you do decide to keep your positive trends, then you should also think about ways of ensuring that your “My Trends” section is balanced, since a balance in the models (with a positive Pro Trends column and a negative “My Trends” column) is really what I was trying to get you to think about in that last piece.

Are Your DFS Trends Unbalanced?

I bet that there is already way too much selection bias occurring in DFS. People select players on “hot streaks” for their lineups. They use too many of the players from their favorite teams. They use too many players competing against teams they hate. They unknowingly prioritize the wrong types of statistics when deciding between small groups of players. In DFS, selection bias is everywhere.

And, unfortunately, it probably exists in the trends that you create. I was talking with a subscriber last week who said that he was considering using a player who matched for some of his positive trends. But he was apprehensive about using that player because he also matched for a negative trend. I asked for specifics, and the subscriber said that the player in question matched for maybe five positive trends and one negative trend. I asked the subscriber how many trends he had in total. He had around 25: 23 positive and two negative. (And I’ve had multiple conversations like this with various subscribers.)

On the one hand, a player who matches roughly 20 percent of your positive trends (5 of 23) and 50 percent of your negative trends (1 of 2) is indeed problematic. On the other hand, in this situation we have no accurate way of knowing how problematic that player is, because the trend section is highly unbalanced. It might be the case that if you created 21 more negative trends, the player in question would still match for only one negative trend, in which case he would probably be a strong play. It’s also possible that he would match for roughly 50 percent of those 23 negative trends, in which case he would probably be a player to avoid. The problem is that, because of the imbalance in the number of trends, we are simply engaging in conjecture.
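To make the arithmetic in the subscriber example concrete, here is a minimal Python sketch. It uses only the counts from that conversation (5 of 23 positive trends, 1 of 2 negative trends) and asks a hypothetical question: if the negative side were built out to 23 trends, what range of negative match rates would still be possible? The function names are illustrative and are not part of the Trends tool.

```python
from fractions import Fraction


def match_rate(matched: int, total: int) -> Fraction:
    """Share of a trend set that a player matches, kept as an exact fraction."""
    return Fraction(matched, total)


def rate_range_if_expanded(matched: int, total: int, new_total: int) -> tuple[Fraction, Fraction]:
    """Possible match rates if a trend set grew from `total` to `new_total` trends.

    The player keeps the matches he already has; each added trend could go either way.
    """
    added = new_total - total
    lowest = Fraction(matched, new_total)           # matches none of the added trends
    highest = Fraction(matched + added, new_total)  # matches every added trend
    return lowest, highest


# Counts from the subscriber conversation.
positive = match_rate(5, 23)
negative = match_rate(1, 2)

# Hypothetical: build the negative side out to 23 trends to mirror the positive side.
low, high = rate_range_if_expanded(1, 2, 23)

print(f"positive: {float(positive):.1%}")
print(f"negative: {float(negative):.1%}")
print(f"negative, if expanded to 23 trends: {float(low):.1%} to {float(high):.1%}")
```

The possible spread, from about 4 percent to about 96 percent, is the conjecture described above: two negative trends cannot tell us which end of that range the player belongs to.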

In those instances in which a great disparity exists between our positive and negative trends, we have fallen victim to selection bias: we have collected and sorted data points in a skewed manner that gives us a highly uncertain representation of reality.

Combatting Selection Bias in Our DFS Trends

Selection bias is natural, but we can mitigate it in the way that we create DFS trends. In theory, when we evaluate slates and players we want to take the entire pool of lineup candidates into account, and so we want our portfolio of trends to reflect that goal. Clearly, not every trend will be expansive (since the goal of a trend is to separate some players from others on the basis of data), but if we make sure that our trends, individually and collectively, aren’t constructed so as to select for players for or against whom we have biases, then we can use our trends to analyze slates and players thoroughly and comprehensively. Even when we are separating players with our trends, the goal is to separate them as evenly as possible.

For one thing, the less particular and more universal our trends are, the better. Although I have previously recommended that you might want to create some highly specific trends with small sample sizes (such trends can be useful when chasing Black Swans), you will likely want most of your trends to be very open and applicable to a large number of players. With large trends, we have a much better chance of touching on all the players in a slate and of creating sample populations that are truly representative.
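One rough way to check whether a portfolio of trends touches the whole slate is to measure coverage: the share of players matched by at least one trend. The sketch below models a trend as a named yes/no filter over a hypothetical `Player` record. The players, the `minutes_avg` field, and the narrow "Under 12 Minutes-SF" trend are invented for illustration; real Trends filters combine many more criteria.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Player:
    name: str
    position: str
    minutes_avg: float  # hypothetical field: average minutes per game over the last year


# Assumption: a trend is just a name plus a yes/no filter over players.
Trend = tuple[str, Callable[[Player], bool]]


def matched_by_any(player: Player, trends: list[Trend]) -> bool:
    return any(rule(player) for _, rule in trends)


def coverage(slate: list[Player], trends: list[Trend]) -> float:
    """Share of the slate matched by at least one trend."""
    return sum(matched_by_any(p, trends) for p in slate) / len(slate)


def untouched(slate: list[Player], trends: list[Trend]) -> list[str]:
    """Players no trend says anything about: the portfolio's blind spots."""
    return [p.name for p in slate if not matched_by_any(p, trends)]


# Invented four-player slate.
slate = [
    Player("A", "SF", 10.0),
    Player("B", "SF", 22.0),
    Player("C", "SF", 31.0),
    Player("D", "PG", 34.0),
]

narrow: list[Trend] = [
    ("Under 12 Minutes-SF", lambda p: p.position == "SF" and p.minutes_avg < 12),
]
broad: list[Trend] = [
    ("Under 25 Minutes-SF", lambda p: p.position == "SF" and p.minutes_avg < 25),
    ("Over 25 Minutes-SF", lambda p: p.position == "SF" and p.minutes_avg >= 25),
]

print(coverage(slate, narrow), untouched(slate, narrow))
print(coverage(slate, broad), untouched(slate, broad))
```

With these toy players, the narrow trend touches one of four players, while the broad pair touches three of four. The point guard stays untouched either way, which is itself a useful signal that the portfolio has a blind spot.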

We can also combat selection bias by creating complementary trends, which should provide us with a fuller perspective. In my last piece, I presented an example of a simple trend matching for small forwards who have averaged no more than 25 minutes per game over the last year:

Under 25 Minutes-SF-2

This is an actionable trend, but it might have even more utility when paired with a complementary trend capable of placing the original trend in context. This second trend would match for small forwards averaging at least 25 minutes per game across that same timeframe:

Over 25 Minutes-SF

In seeing this second trend, we can fully appreciate the extent to which small forwards who average fewer than 25 minutes per game underperform. It’s not just that they score fewer points than the small forwards averaging more than 25 minutes, which we would entirely expect. It’s that the low-minute small forwards underperform their salary-adjusted expectations while the high-minute small forwards outdo theirs. There’s no reason why this should inherently be the case (DraftKings doesn’t need to overvalue the bad players and undervalue the good players), but at small forward that is exactly what it does.
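The paired small forward trends can be sketched as two complementary filters over the same population. The Python below assumes, for illustration only, a list of invented player-game records of the form (average minutes, Plus/Minus); none of the numbers are real results, and they were chosen simply to echo the pattern described above. The sketch assigns exactly 25 minutes to one side only, since "no more than 25" and "at least 25" would both claim a player at exactly 25 and the pair would no longer split the population cleanly.

```python
from statistics import mean

# Hypothetical player-game records: (average minutes, Plus/Minus for that game).
# These numbers are invented for illustration; they are not FantasyLabs data.
games = [
    (18.0, -2.5), (22.5, -1.0), (24.0, 0.5),
    (27.0, 1.5), (33.0, 2.0), (36.5, -0.5),
]


def under_25(minutes: float) -> bool:
    return minutes < 25


def over_25(minutes: float) -> bool:
    return minutes >= 25  # exactly 25 belongs to this side only


under = [pm for mins, pm in games if under_25(mins)]
over = [pm for mins, pm in games if over_25(mins)]

# Complementary trends should partition the population: every record lands in exactly one.
assert len(under) + len(over) == len(games)
assert all(under_25(m) != over_25(m) for m, _ in games)

print(f"Under 25 Minutes-SF: n={len(under)}, avg Plus/Minus={mean(under):+.2f}")
print(f"Over 25 Minutes-SF:  n={len(over)}, avg Plus/Minus={mean(over):+.2f}")
```

Because the two filters cover the whole population without overlap, reading them side by side shows the entire position rather than one hand-picked slice of it.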

In other words, as counterintuitive as this sounds, to get your money’s worth when investing in small forwards it actually makes sense (per the paired trends) to spend more money at that position, not less. Maybe you already knew this. Maybe you didn’t. But with the paired trends you were able to evaluate the entire small forward position and have the data to guide your decision.

And, finally, you can combat selection bias through balance. Note the small forward trends above. In terms of minutes (0-25, 25-50), they are balanced. And they are also balanced in terms of numbers (one positive trend, one negative trend). And they are relatively balanced in sample size. Both of them screen for thousands of players and are about as encompassing as trends can be. In creating these trends, I believe that I was able to circumvent selection bias.
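Balance in count and in sample size can be audited mechanically. Here is a minimal sketch, assuming each trend is summarized by its average Plus/Minus and the number of player-games it matches (a hypothetical `TrendSummary` record, not an export format of the Trends tool). A ratio near 1 on both measures suggests a balanced portfolio, while the 23-to-2 split from the subscriber conversation stands out immediately. The sample sizes and Plus/Minus values below are placeholders.

```python
from dataclasses import dataclass


@dataclass
class TrendSummary:
    name: str
    plus_minus: float  # average Plus/Minus of the trend's matches
    sample_size: int   # number of player-games the trend matches


def _ratio(a: float, b: float) -> float:
    """a / b, or infinity when the second side is empty."""
    return a / b if b else float("inf")


def balance_report(trends: list[TrendSummary]) -> dict[str, float]:
    """Compare the positive and negative sides of a trend portfolio.

    Trends with an average Plus/Minus of exactly 0 are counted on neither side.
    Ratios above 1 lean positive; ratios below 1 lean negative.
    """
    positives = [t for t in trends if t.plus_minus > 0]
    negatives = [t for t in trends if t.plus_minus < 0]
    pos_sample = sum(t.sample_size for t in positives)
    neg_sample = sum(t.sample_size for t in negatives)
    return {
        "positive_trends": len(positives),
        "negative_trends": len(negatives),
        "count_ratio": _ratio(len(positives), len(negatives)),
        "sample_ratio": _ratio(pos_sample, neg_sample),
    }


# The complementary small forward pair (sample sizes are hypothetical).
pair = balance_report([
    TrendSummary("Under 25 Minutes-SF", plus_minus=-1.0, sample_size=3200),
    TrendSummary("Over 25 Minutes-SF", plus_minus=1.0, sample_size=2800),
])

# The subscriber's portfolio: 23 positive and 2 negative trends
# (Plus/Minus values and sample sizes are placeholders).
subscriber = balance_report(
    [TrendSummary(f"positive-{i}", 1.0, 500) for i in range(23)]
    + [TrendSummary(f"negative-{i}", -1.0, 500) for i in range(2)]
)

print(pair)
print(subscriber)
```

Reporting both ratios matters: a portfolio could have equal numbers of positive and negative trends while the negative side screens only a handful of player-games.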

Balance in the Labyrinth

The goal of the Trends section is to help you make the most accurate predictions possible so that you can make informed money-making DFS decisions. Whatever strategies you have with our Trends section, if they are working for you, then by all means keep them in place. I’m not encouraging you to change the core of the process that has led to your success. Rather, I’m offering ideas that might improve what you do.

If you believe that you’re not getting out of the Trends tools all that you could be, that might be the result of an unbalanced approach. The more balanced and inclusive your trends are, the more knowledge you gain about the values of players — and probably about “value” as a DFS concept.

Balance is what prevents us from selectively focusing solely on positive Plus/Minus trends that yield sample populations that are small and possibly unrepresentative. With balanced trends, you will be in a much better position to identify players who might look like good plays but really aren’t (and vice versa).

In the labyrinth, balance isn’t how you get to the center. Balance is the center.


The Labyrinthian: 2016, 19

Previous installments of The Labyrinthian can be accessed via my author page. If you have suggestions on material I should know about or even write about in a future Labyrinthian, please contact me via email or on Twitter @MattFtheOracle.