Thursday, May 19, 2016

The Statistical Theory Behind the Shift

The Cardinals Executing the Shift
For years now, Major League teams have been implementing "the shift". The shift is a strategic repositioning of fielders according to where a batter hits the ball most often. A shift for a left-handed hitter will traditionally have the shortstop play right up the middle, the second baseman play about two-thirds of the way between first and second base, the first baseman play close to the line, and the third baseman play in shallow right field, with all the outfielders shading toward right field.



Why do teams shift?

Through years of compiling detailed data about players' batted balls, teams came to the conclusion that the traditional positioning of fielders may be inefficient against certain players. These teams observed that some players hit to one particular field more than the others. The mistake that teams had been making for decades was assuming a uniform distribution of hits. In other words, teams were assuming that when a player hit a ball, that ball had an equal probability of landing anywhere on the field. When teams discovered that the hitting distribution was not uniform, and was in fact skewed, they adjusted their fielders accordingly.

The Mathematics Behind The Shift

Behind any analytical claim there should be numbers backing it up. In fact, not many people will accept your claim without some mathematical or statistical support. The mathematics that supports the shift is something called the sum of squared residuals. A residual is the difference between an observed value and the expected value. For example, Derek Jeter's career batting average before his last season was .312, so the expected value for his batting average in his last season was .312. He ended up hitting .256 in his final season, so the residual would be (.256 - .312), which equals -0.056. For predictive models you want to minimize the sum of squared residuals. If the sum of squared residuals is equal to zero, every prediction matched what was observed and the model could not fit any better. By predicting where the ball may land, the shift minimizes the distance a player has to run to get the ball. In this case the observed value is the distance a player has to run, and the expected value is zero. The claim is that shifting minimizes the sum of the squared distances the fielders need to cover.
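To make the definition concrete, here is a minimal Python sketch of a residual and a sum of squared residuals, using the Jeter numbers above. The function names are my own and are only for illustration.

```python
def residual(observed, expected):
    """Difference between an observed value and the expected value."""
    return observed - expected


def sum_of_squared_residuals(observed_values, expected_values):
    """Square each residual and add them up."""
    return sum(
        residual(o, e) ** 2 for o, e in zip(observed_values, expected_values)
    )


# Jeter's final season: observed .256, expected .312 (his prior career average).
jeter_residual = residual(0.256, 0.312)
print(round(jeter_residual, 3))  # -0.056
```

Squaring the residuals keeps positive and negative misses from canceling each other out and penalizes large misses more than small ones.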

Example of the Effectiveness of The Shift

Let's compare two players, Jacoby Ellsbury and Chris Davis, in 2015. Jacoby Ellsbury had a hit distribution that was close to uniform. He hit to right field at a rate of 37.8%, to center field at a rate of 35.1%, and to left field at a rate of 27.0%. Teams typically play Ellsbury straight up, meaning the traditional positioning of the fielders. A uniform hit distribution would have about 33.3% to each field, so you can see that Ellsbury's hit distribution is a bit skewed. In theory, then, a defense should shift slightly toward right field for Ellsbury. Chris Davis, on the other hand, hit to right field at a rate of 55.9%, to center field at a rate of 26.5%, and to left field at a rate of 17.6%. Davis' hit distribution is very skewed toward right field, so theoretically you would want to position your players closer to right field than the traditional alignment does.
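One rough way to put a number on "how skewed" each hitter is would be to measure how far each field's share sits from the uniform 33.3% and add up the squared gaps. This is only an illustrative sketch using the percentages quoted above; the scoring function is my own assumption, not a metric that teams are known to use.

```python
UNIFORM_SHARE = 100 / 3  # percent of hits to each field if hits were uniform

# 2015 hit distributions quoted above, in percent.
spray = {
    "Ellsbury": {"right": 37.8, "center": 35.1, "left": 27.0},
    "Davis": {"right": 55.9, "center": 26.5, "left": 17.6},
}


def squared_deviation_from_uniform(shares):
    """Sum of squared gaps between each field's share and the uniform share."""
    return sum((share - UNIFORM_SHARE) ** 2 for share in shares.values())


scores = {name: squared_deviation_from_uniform(s) for name, s in spray.items()}
for name, score in scores.items():
    print(name, round(score, 1))
```

Davis' score comes out more than ten times larger than Ellsbury's, which matches the intuition that Davis is the far better candidate for a shift.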

The Shift: Visual Example

For this example, imagine a left-handed pull hitter such as Chris Davis. The first diagram shows a traditional positioning for a defense, and the second diagram shows the shifted positioning. The red dots are where a ball landed, and the black lines are the distances the fielders had to run to get the ball.

Notice that the distances the fielders had to run are 10 ft, 5 ft, and 15 ft. The expected value that the fielders had to run was 0 ft. So the residuals would be:
10 - 0 = 10
5 - 0 = 5
15 - 0 = 15
And the sum of these values squared is:
10^2 + 5^2 + 15^2 = 350.




In this diagram the red dots are in the exact same locations, but the defense is shifted toward right field. Notice that the distances the fielders had to run are much smaller. The residuals would be:
10 - 0 = 10
2 - 0 = 2
1 - 0 = 1
And the sum of these values squared is:
10^2 + 2^2 + 1^2 = 105

The sum of squared residuals is much smaller for the shifted defense than it is for the traditional defense.
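The arithmetic for both diagrams can be checked with a few lines of Python. This is just a sketch of the calculation above; the function name is my own.

```python
def sum_of_squared_residuals(distances_ft, expected_ft=0.0):
    """Sum of squared gaps between each run distance and the expected distance."""
    return sum((d - expected_ft) ** 2 for d in distances_ft)


traditional = [10, 5, 15]  # feet run in the traditional alignment
shifted = [10, 2, 1]       # feet run in the shifted alignment

print(sum_of_squared_residuals(traditional))  # 350.0
print(sum_of_squared_residuals(shifted))      # 105.0
```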


Conclusions 

The shift is here to stay. Physically, players hit the ball the farthest when they pull it, so a big guy like Chris Davis will try to pull the ball to hit a home run. The trade-off, hitting into the shift in exchange for more power, is something that Davis can live with as a player. Since 2012 Chris Davis has averaged 40 home runs per season, but has only a .256 batting average. He has made a decision as a player that he would rather produce power numbers than hit safely more often. There are situations in which it would make logical sense to hit a ground ball toward the left side of the field for a hit, such as in a close game in the 9th inning, but Davis always has the potential to hit the ball out of the ballpark.

Every player in MLB is different, and that calls for a different shift for each player. It is up to the analytics department to keep up with that and determine the best shift for a player, and they do that by finding the positioning with the minimal sum of squared residuals.
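As a toy illustration of what "finding the minimal sum of squared residuals" can mean, consider one fielder and a handful of landing spots. The spot that minimizes the sum of squared distances to a set of points is their average position (the centroid). The coordinates below are hypothetical numbers made up for this sketch, not real batted-ball data, and a real analytics department would have to account for every fielder, hang time, and much more.

```python
def sum_of_squared_distances(fielder, landing_spots):
    """Sum of squared straight-line distances from the fielder to each spot."""
    fx, fy = fielder
    return sum((x - fx) ** 2 + (y - fy) ** 2 for x, y in landing_spots)


def best_position(landing_spots):
    """Centroid of the landing spots, which minimizes the sum above."""
    n = len(landing_spots)
    mean_x = sum(x for x, _ in landing_spots) / n
    mean_y = sum(y for _, y in landing_spots) / n
    return (mean_x, mean_y)


# HYPOTHETICAL landing spots in feet (x toward right field, y away from home).
landing_spots = [(60.0, 120.0), (75.0, 110.0), (90.0, 130.0), (70.0, 125.0)]
traditional_spot = (40.0, 130.0)  # HYPOTHETICAL unshifted starting position

shifted_spot = best_position(landing_spots)
print(shifted_spot)  # (73.75, 121.25)
print(sum_of_squared_distances(traditional_spot, landing_spots))
print(sum_of_squared_distances(shifted_spot, landing_spots))
```

Moving the fielder away from the centroid in any direction makes the sum larger, which is the sense in which the shifted spot is "best" under this simple model.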