My (Quick) Thoughts on Batter vs Pitcher Data (BvP)

I think it’s natural and probably accurate to assume that certain batters perform better against some pitchers than against others. Whether it’s the mechanics of a pitcher’s throwing motion, the type of pitches he throws, or whatever, we’d certainly expect some batters to crush certain pitchers and struggle versus others, even over the long run.

But there’s a difference between a particular phenomenon existing and being able to use it to make accurate predictions. At the end of the day, I don’t really care how a particular batter has performed in the past—we don’t get fantasy points because Miguel Cabrera spanked the ball last week—I just care about predicting what’s going to happen in the future with some degree of accuracy.

I think a data-driven, mathematical approach to daily fantasy sports is the best way to do that, but not all data is equally useful. Just because analytics can help us make money in daily fantasy sports doesn’t mean all numbers should be treated equally. For the most part, all we care about is whether a number helps us make more accurate predictions, and when it comes to BvP, I’m not aware of any data suggesting it has predictive value for projecting hitter performance.

When it comes down to it, the sample sizes involved with BvP are almost always too small to draw meaningful conclusions; the numbers describe past events well, but they don’t help us project forward. As a batter accrues more and more plate appearances against a single pitcher, we can become more and more confident that the results we’ve witnessed are representative of reality. The problem is that the point at which we can be semi-confident in BvP numbers is probably somewhere in the range of 100 or more plate appearances, a figure almost no one reaches against a particular pitcher.
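To put rough numbers on the sample-size problem, here is a minimal sketch of how wide the uncertainty around an observed hit rate is at different sample sizes. It assumes every plate appearance is an independent trial with the same success probability, and the .270 "true" rate is a made-up figure for illustration, so treat the output as a ballpark and not a real projection.

```python
import math


def margin_of_error(true_rate: float, plate_appearances: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed hit rate.

    Uses the normal approximation to the binomial, which is crude for
    very small samples. Assumes independent, identical plate appearances.
    """
    standard_error = math.sqrt(true_rate * (1 - true_rate) / plate_appearances)
    return z * standard_error


# Hypothetical hitter whose true hit rate against this pitcher is .270.
for pa in (10, 25, 50, 100, 400):
    moe = margin_of_error(0.270, pa)
    print(f"{pa:>4} PA: .270 +/- {moe:.3f}")
```

Under these assumptions the margin is roughly .174 at 25 plate appearances and still about .087 at 100, which is why a typical BvP line can't tell a good matchup from a bad one.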

Ultimately, it’s just very, very difficult to separate a real BvP effect from variance. Of course we’re always going to see a wide range of performances for a batter against different pitchers, but for the most part, it will be challenging to know if those results are a signal of something greater, or just noise.

So do I use BvP data? Almost never—maybe a couple times per season. There are two occasions when I believe maybe, just maybe, we can use BvP data as a minor component of our decision-making.

The first is when a hitter has truly extreme results against a particular pitcher. The matchup that comes to mind when talking about outlying results is Paul Goldschmidt against Tim Lincecum. At the time of this writing, Goldschmidt has 28 at-bats against Lincecum with 15 hits, seven home runs, 17 RBI, a 1.357 slugging percentage, and a 1.916 OPS.

Goldschmidt’s results are so extreme that, even though they could be the result of chance, it’s probably more likely that he truly has Lincecum’s number. And by that, I mean he’ll probably continue to crush him in the future, albeit not to that degree. And again, almost no BvP data is extreme enough to overcome small sample sizes.
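One way to gauge how extreme a line like that is: ask how often a hitter would get at least 15 hits in 28 at-bats by luck alone. The sketch below computes that binomial tail probability. The 15-for-28 figures come from the matchup above; the .300 "true" average is my assumption for illustration, and the model treats at-bats as independent.

```python
from math import comb


def prob_at_least(hits: int, at_bats: int, true_avg: float) -> float:
    """P(X >= hits) for X ~ Binomial(at_bats, true_avg)."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )


# 15 hits in 28 at-bats is from the post; the .300 true average is assumed.
p = prob_at_least(15, 28, 0.300)
print(f"Chance of 15+ hits in 28 AB for a true .300 hitter: {p:.4f}")
```

Under those assumptions the chance comes out below 1%. That understates the role of luck somewhat, since we noticed this matchup precisely because it was an outlier, but it gives a sense of why results this lopsided are harder to dismiss than a typical BvP split.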

The second time that I consider using BvP data is when analyzing a pitcher, specifically when a pitcher faces a group of hitters he has faced often in the past. Maybe he has 30 career matchups against one guy, 35 against another, 20 against another, and so on. In isolation, we can’t be confident that the data from any one of those matchups is meaningful, but in aggregate, it might be actionable because the combined sample is large enough to overcome the sample size issue.
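Here is a small sketch of why pooling helps. It combines several individual matchups into one sample and compares the standard error of the pooled rate with the standard error of each matchup on its own. The sample sizes (30, 35, 20) are the ones mentioned above; the hit counts are invented for illustration, and the calculation again assumes independent plate appearances.

```python
import math


def rate_standard_error(rate: float, plate_appearances: int) -> float:
    """Binomial standard error of an observed rate."""
    return math.sqrt(rate * (1 - rate) / plate_appearances)


def pooled_summary(matchups):
    """Pool (plate_appearances, hits) pairs into a single sample.

    Returns (total_pa, pooled_rate, standard_error).
    """
    total_pa = sum(pa for pa, _ in matchups)
    total_hits = sum(hits for _, hits in matchups)
    rate = total_hits / total_pa
    return total_pa, rate, rate_standard_error(rate, total_pa)


# Sample sizes are from the post; the hit counts are made up.
matchups = [(30, 6), (35, 8), (20, 4)]

for pa, hits in matchups:
    r = hits / pa
    print(f"single matchup: {pa} PA, rate {r:.3f}, SE {rate_standard_error(r, pa):.3f}")

total_pa, rate, se = pooled_summary(matchups)
print(f"pooled:         {total_pa} PA, rate {rate:.3f}, SE {se:.3f}")
```

The pooled standard error is noticeably smaller than any single matchup's, though with only three hitters it is still wide. A full lineup of familiar hitters would tighten it further.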

I still don’t use BvP data much for pitchers, but when I do, it’s typically for in-division games. In those games, a pitcher might have enough collective matchups against the opposition’s hitters that we can start to get a sense of whether or not his past numbers are indicative of future play. But again, the sample size needs to be large and the results still pretty extreme, in one direction or the other.
