When Does a Small Sample Size Become a Statistically Relevant Sample Size?

One of the most common questions that I get from Fantasy Labs users and readers is how to interpret data. It’s an issue that is very important to DFS players – when should we rely on small sample sizes and when should we trust larger data samples? Or perhaps more importantly: when do small sample sizes hit the point where they’re no longer “small samples” and can thus be safely trusted?

I’m going to continue with this article and give you my extended thoughts on this subject, but I do want to be clear from the get-go – I don’t know.

To explain, let’s first get a little philosophical. Technically, everything can be considered a small sample. It’s why the concept of infinity is so difficult to understand – okay, so something is infinite, but when does it end? What happened before the universe existed? I don’t want to go too far down this rabbit hole, but the idea of time is completely relative.

When we’re trying to evaluate sample sizes in sports, we have a similar issue – everything is relative. Data from five minutes of a player is less than data from five games, which is less than data from 50 games, which is less than data from five seasons, which is... well, you get my point. Where along that timeline is the exact moment when the data stops being “small sample-y” and becomes statistically relevant?

I believe it comes down to an important phrase: increasing confidence. I know that you probably just want me to give you an exact moment where the data is statistically relevant so you can go on with your day. But instead of thinking about it as a single moment, think of it as a continuous process: you can become increasingly confident as time goes on and the data set grows larger. Of course, the question will always be there – “what is enough confidence?” – but that just sends us back into the never-ending, useless loop of trying to find a non-existent point.
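To make "increasing confidence" a bit more concrete, here is a minimal sketch of how the uncertainty around a player's average shrinks as games pile up. It leans on the textbook normal approximation for a sample mean, which is crude at very small samples and assumes games are independent. The function name ci_half_width and the 10-point per-game standard deviation are my own made-up assumptions for illustration, not real player data.

```python
import math


def ci_half_width(std_dev: float, n_games: int, z: float = 1.96) -> float:
    """Approximate 95% confidence-interval half-width for a sample mean.

    Uses the normal approximation z * std_dev / sqrt(n), which is only a
    rough guide when n_games is very small.
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    return z * std_dev / math.sqrt(n_games)


# Hypothetical assumption: fantasy points vary by about 10 per game.
ASSUMED_STD_DEV = 10.0

for n in (4, 16, 64, 256):
    width = ci_half_width(ASSUMED_STD_DEV, n)
    print(f"{n:>3} games: average pinned down to roughly +/- {width:.1f} points")
```

Notice there is no line in the output where the data flips from "small" to "relevant." The range just keeps narrowing, and every time you quadruple the number of games it is cut in half.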

Despite not having a point to hold onto, we can continually gauge our confidence level with data. And this is important – different contests require different levels of confidence in data. Am I confident enough in Karl-Anthony Towns’ statistics through four games to put a ton of money down on him in cash games? Maybe not. But am I confident enough in them to make him a core part of my GPP lineups? I think that’s definitely reasonable.

I typically trust the larger data set when I’m evaluating players. Anthony Davis has struggled this year, but we have a vastly larger sample of him being incredible at basketball than of whatever is happening right now. I know that’s an extreme example because, duh, Anthony Davis is obviously still really good, but the point is to figure out what you have confidence in and base your decisions on that. And on that note, many DFS players will base their decisions on recent, small samples, which in turn can make the larger data set – one that would predict regression – even more valuable in tournaments.
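One simple way to put "trust the larger data set" into numbers is to blend the recent average with the long-run average, giving the recent sample more say as it grows. This is only a sketch of a generic weighted-average (shrinkage) approach, not a description of how any Fantasy Labs tool works. The function blended_projection, the prior_weight_games knob, and every number below are hypothetical.

```python
def blended_projection(
    recent_avg: float,
    recent_games: int,
    long_run_avg: float,
    prior_weight_games: float = 30.0,
) -> float:
    """Blend a recent average with a long-run average.

    prior_weight_games is an assumed tuning knob: the number of recent
    games needed before the recent sample counts as much as the long run.
    """
    if recent_games < 0 or prior_weight_games <= 0:
        raise ValueError("recent_games must be >= 0 and prior_weight_games > 0")
    weight = recent_games / (recent_games + prior_weight_games)
    return weight * recent_avg + (1.0 - weight) * long_run_avg


# Made-up numbers: a star averaging 45 fantasy points over the long run
# who has averaged only 30 over his last 5 games.
projection = blended_projection(recent_avg=30.0, recent_games=5, long_run_avg=45.0)
print(f"Blended projection: {projection:.1f} fantasy points")
```

With only a handful of recent games, the projection stays close to the long-run number, which is the regression the larger data set predicts. How big prior_weight_games should be is exactly the "how confident am I?" question, so treat it as a dial you set for yourself rather than a known constant.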

This may sound harsh, but if you’re asking for an exact moment or point where a small sample turns into a statistically relevant one, you’re thinking about it incorrectly. Daily fantasy is a zero-sum game, which makes all data relevant – even its non-relevancy makes it relevant. The thing you have to find as a DFS player is your confidence level in both the small and the large data sets and your willingness to trust them in different contests. And that’s a personal decision based on a variety of factors, including your bankroll, average win rate, the time you’re able to invest in DFS research, and so on.
