Saturday, June 20, 2015

Getting Inside the Manager's Head

Do you ever watch a game and think that the manager is making a boneheaded decision by leaving a pitcher in or taking a pitcher out? It turns out that the manager's 'gut' feeling can be quantified. In fact, through the miracles of regression analysis, knowledge of baseball, and critical thinking, fans can now understand when and why a starting pitcher is going to be taken out of a game.

Joe Girardi Calling to the Bullpen
I tested several continuous and categorical variables. The number of pitches, the age of the pitcher, runs allowed, and whether the game is played at home or away had the most effect on the manager's decision to pull a pitcher. We use these variables to predict innings pitched.

A note on runs allowed: if a pitcher allows a base runner, that base runner is to be considered a potential run for as long as he is on base. So when calculating how many innings a pitcher will pitch, include the runners on base in the current inning as runs. This is because a manager's decision to take out a pitcher is often based on the possibility of giving up more runs, and base runners are exactly that possibility.

In the regression equation, runs allowed has the greatest impact on whether a manager will take out the pitcher, which makes plenty of sense. It conveniently works out that each run allowed costs about one third of an inning pitched. Pitch count also has a lot to do with when a pitcher comes out of a game. The more pitches a pitcher throws, the further he goes into a game, but a high pitch count by itself does not necessarily mean he will be taken out.

An interesting thing I discovered when running this regression model was that starting pitchers go 0.5 innings deeper at home than away. Some may say that makes plenty of sense, because a home team that wins always has to pitch the top of the ninth, while the away team does not pitch the bottom of the ninth when the home team is ahead. But managerial tendencies in today's game make it unlikely for a starting pitcher to throw a complete game, so the extra half inning does not have much to do with last licks. It seems to come strictly from being at home.

Regression Equation:

Away: Innings = 1.57 + 0.04734 Pitches - 0.3219 Runs + 0.0226 Age

Home: Innings = 2.07 + 0.04734 Pitches - 0.3219 Runs + 0.0226 Age

Clayton Kershaw handing the ball over to manager Don Mattingly
The older the pitcher, the further he tends to go into the game, either due to experience or the manager's respect for the pitcher.

This equation is meant to be used as a game goes on, because with each pitch thrown the pitcher is projected to go further into the game. 100 pitches without giving up a run results in about 6 innings pitched. 100 pitches and 3 runs allowed results in about 5 innings pitched.
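To make the equation easier to play with, here is a minimal Python sketch that plugs numbers into it. The function name and the age of 28 are my own assumptions for illustration. The examples in this post appear to leave the age term out, so the numbers from this sketch run a bit higher than the 6 and 5 innings quoted above.

```python
def predicted_innings(pitches, runs, age, home):
    """Innings a starter is projected to last, using the coefficients from the post.

    runs should include runners currently on base, as described in the post.
    """
    intercept = 2.07 if home else 1.57  # home starters get the extra 0.5
    return intercept + 0.04734 * pitches - 0.3219 * runs + 0.0226 * age


# Hypothetical 28-year-old starter at 100 pitches (the age is an assumption).
for home in (False, True):
    for runs in (0, 3):
        where = "home" if home else "away"
        innings = predicted_innings(100, runs, 28, home)
        print(f"{where}, {runs} runs: {innings:.2f} innings")
```

Because the prediction is updated with every pitch and every base runner, the idea is to rerun it as the game goes on and compare the result with the inning the starter is actually in.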

If a pitcher is approaching the inning that the equation yields, he becomes more and more likely to get pulled from the game. If a pitcher pitches past the inning that the equation yields, he becomes very likely to be taken out. The regression model had an R^2 value of 65.38%. This means that 65.38% of the variation in how long starting pitchers stay in games is explained by these factors. Other factors that are harder to quantify and may affect a manager's decision to take out a pitcher are a reliable bullpen, injury history, lopsided scores, game delays, and ejections.

One variable that I tested that turned out not to have much effect on when a manager would take a pitcher out of a game was the categorical variable of League. Starters in National League games stay in for about as long as starters in American League games.


 

Tuesday, May 26, 2015

Environmental Factors on a Game

A few months ago I did some research with Andrew Nave and Terrance McCabe, my classmates at SUNY Oneonta, on how environmental factors affect how many runs are scored. We collected data from 100 games in the 2014 season. For each game we recorded the number of runs scored, the temperature, the location of the stadium, home or away, time of game, and ballpark size.

Our sample was collected by randomly selecting a number from 1 to 30, each representing a different team, then randomly selecting a number from 1 to 162, each representing a game played by that team. This data collection strategy gave us a very good representation of the population.
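For anyone who wants to repeat the draw, the two-step selection can be sketched in a few lines of Python. This is only an illustration and not the exact procedure we ran: the function name, the fixed seed, and the choice to redraw duplicate team and game pairs are all assumptions.

```python
import random

NUM_TEAMS = 30         # one number per MLB team
GAMES_PER_TEAM = 162   # one number per game on that team's schedule


def draw_sample(n_games, seed=None):
    """Pick n_games distinct (team, game) pairs uniformly at random."""
    rng = random.Random(seed)
    picks = set()
    while len(picks) < n_games:
        team = rng.randint(1, NUM_TEAMS)       # inclusive on both ends
        game = rng.randint(1, GAMES_PER_TEAM)
        picks.add((team, game))                # assumption: duplicates are redrawn
    return sorted(picks)


sample = draw_sample(100, seed=2014)
print(sample[:5])
```

Each pair would then be looked up on that team's schedule to pull the runs, temperature, and the other variables for that game.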

The first hypothesis we tested was whether more runs are scored at home or away. We predicted that, on average, the same number of runs is scored at home and away, to debunk the age-old belief in home field advantage. In our sample the mean (average) number of runs scored at home is 4.125 and the mean number of runs scored away is 4.047. In raw terms teams do in fact score more runs at home, but a two-sample t-test yielded a p-value of 0.896, meaning the difference is nowhere near statistically significant. So we found no evidence that more runs are scored at home than away. In baseball, home field advantage is more of a long-term advantage, where a club can build its roster around its ballpark. For example, the New York Yankees sign left-handed hitters because of the short distance to right field, and the San Francisco Giants sign league-average pitchers who can post better-than-average ERAs in their spacious ballpark.

Another belief about baseball games is that more runs are scored in the summer heat than in the cold. We tested the belief by splitting our sample into two categories: games where the game-time temperature was 80 degrees or above and games where it was below 80 degrees. Games at 80+ degrees had a mean of 4.138 runs and games below 80 degrees had a mean of 4.071. Using a two-sample t-test with the claim that more runs are scored during hot games, the result was a p-value of 0.458. Since this is greater than the alpha value of 0.05, we cannot conclude from our sample that more runs are scored in the heat than in the cold.
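The comparison behind both of these tests can be sketched with nothing but the Python standard library. The run totals below are invented for illustration and are not our data, and I am assuming the unequal-variance (Welch) form of the two-sample t-test. The sketch stops at the t statistic and degrees of freedom; turning those into a p-value needs a t distribution, which a package such as SciPy provides through scipy.stats.ttest_ind with equal_var=False.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return the Welch t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Made-up runs per game, for illustration only.
home_runs = [4, 2, 7, 5, 3, 6, 1, 5]
away_runs = [3, 5, 2, 6, 4, 4, 7, 1]

t_stat, dof = welch_t(home_runs, away_runs)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

A t statistic close to zero, like the one these made-up lists produce, is what sits behind a large p-value: the gap between the two means is small compared with how much scoring bounces around from game to game.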

The west coast often gets a bad reputation for its lack of runs scored because of the marine layer, but this is yet another theory we tried to debunk. By separating the sample into East, Central, and West we found that the mean runs scored were 3.968, 4.023, and 4.375 respectively. We tested the claim that the number of runs scored is equal regardless of location. Using an Analysis of Variance (ANOVA) test we found a p-value of 0.858, which means there is not enough evidence to reject our claim. One may think that runs scored are affected solely by some teams being better or worse than others, but the games that teams play on the road are also included in the sample.

Perhaps the most important finding from our data collection came from the regression equation. Through backwards elimination we removed variables to find the best regression model. The variables that have the most effect on how many runs are scored in a game are time of game, elevation, and temperature. In the equations below, 0 represents day games and 1 represents night games.

0 (day): Runs = 1.27 + 0.0283 (Temp) + 0.000678 (Elevation)
1 (night): Runs = 1.96 + 0.0283 (Temp) + 0.000678 (Elevation)
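Here is a small Python sketch of the two equations as a single function. The function name is my own, and so are the units (temperature in degrees Fahrenheit, elevation in feet) and the sample inputs of 75 degrees and 500 feet, which are assumptions chosen just to show the arithmetic.

```python
def predicted_runs(temp, elevation, night):
    """Runs a team is projected to score, using the coefficients from the post."""
    intercept = 1.96 if night else 1.27  # 1 = night game, 0 = day game
    return intercept + 0.0283 * temp + 0.000678 * elevation


# Hypothetical game conditions: 75 degrees at a park 500 feet above sea level.
for night in (False, True):
    label = "night" if night else "day"
    print(f"{label} game: {predicted_runs(75, 500, night):.2f} runs")
```

The only difference between the two cases is the intercept, so a night game always comes out 0.69 runs higher than a day game at the same temperature and elevation.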

Our regression analysis yielded an R^2 value of 10.47%, which means that 10.47% of the variation in runs scored from game to game is explained by the environmental factors mentioned above. Put loosely, the environment accounts for about a tenth of the difference in scoring between one game and another, which is not the same as saying one run out of every ten is scored because of it.

You can't predict baseball, but if you had to choose how many runs a team will score on a particular day, the regression equation will give a decent estimate. The number of runs scored will most likely be around 4.

Monday, March 23, 2015

Hideki Matsui and the Rush for International Free Agents

The desire for international free agents has skyrocketed in the past few years. With better scouting, more talent is being discovered and brought to the United States. Players have been coming to the United States from the Dominican Republic and Cuba for decades, but the desire for Japanese free agents started with the success Ichiro found in Seattle.

Undoubtedly the most successful Japanese player to make the transition to MLB, Ichiro sparked a period of high spending on international free agents. One player who often gets overlooked as a great success is Hideki Matsui.

Hideki Matsui
In 2003 Matsui signed a 3 year $21 Million contract with the New York Yankees, which today would be worth $26.67 Million. Over this 3 year period Matsui had an average annual WAR of 3.9, which is considered above average and above the value of the contract. Over that 3 year span Matsui also played an average of 162.33 games a season.

Following his 3 year contract Matsui signed a 4 year $52 Million deal, an average annual value of $13 Million, which is the modern day equivalent of a 4 year $66 Million contract. Over this span Matsui played an average of only 107 games a season with an average annual WAR of 2.2. He did not do as well in his last 4 years with the Yankees as he did in his first 3, but considering what international free agents are signing for today, the Matsui deal was a steal for the Yankees.

Yoenis Cespedes signed a 4 year $36 Million deal with the Oakland Athletics in 2012. Because CPI changed only slightly from 2012 to 2015, the nominal and real dollars of the deal are close to equal. The average annual value of the first 3 years of the Cespedes deal was $8.5 Million and his average annual WAR for those 3 years was 3.2, a little below Matsui's but at a similar average annual value.

Another international free agent who signed in 2012 was Yu Darvish, who signed a 6 year $56 Million deal with the Texas Rangers. Over the first 3 years of the deal Darvish was paid $25 Million, an average annual value of $8.33 Million, and his average annual WAR of 4.3 was greater than both Cespedes's and Matsui's.

In 2014 international free agents began signing for much more. Part of this is due to a change in the posting system in Japan. Instead of competing to post the highest bid for the right to negotiate with a player, teams can now post a maximum bid of $20 Million, and every team that posts the bid can negotiate with the player.

Masahiro Tanaka 
Masahiro Tanaka was a player who had teams post the max bid, and he ended up signing a 7 year $155 Million deal with the Yankees. In his first season he had a WAR of 3.3, but at a much higher cost: an average annual value of $22 Million.

Also in 2014 Rusney Castillo, a Cuban prospect, signed a 7 year $72.5 Million deal with the Boston Red Sox. Castillo only played 10 games in 2014, but his average annual value is much higher than that of most other international prospects. The interesting part of the contract is the length. Castillo is only 27 years old, and the Red Sox made a 7 year commitment, just like the Yankees did with Tanaka, before either of them had played a single game in the major leagues.

Jose Abreu signed a 6 year $66.3 Million deal with the Chicago White Sox in 2014 and was paid $7 Million in 2014. His WAR for the 2014 season was 5.5 as he hit his way to a Rookie of the Year Award in an MVP caliber season. He had never played a game in the major leagues prior to signing the contract, but the White Sox definitely got their bargain.
Yoan Moncada 

Yoan Moncada is just 19 years old and he signed with the Red Sox for a signing bonus of $31.5 Million. The international free agent market is getting more competitive: more money is being invested, players are signing at a younger age, and longer commitments are being made to the players. The days when a team could sign an international free agent like Hideki Matsui for a contract similar to his initial 3 year $21 Million deal are gone. The risk involved in signing international free agents is higher than ever.
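One rough way to line these deals up side by side is to divide each player's average annual value by his average annual WAR. The Python sketch below does that with the figures quoted in this post. Cost per WAR is my own shorthand here rather than an official metric, the labels are mine, and the dollar amounts mix 2015-adjusted figures for Matsui with nominal figures for the others, so treat the output as a loose comparison.

```python
# (label, average annual value in $ millions, average annual WAR), all from the post.
deals = [
    ("Matsui, first deal (2015 dollars)", 26.67 / 3, 3.9),
    ("Matsui, second deal (2015 dollars)", 66 / 4, 2.2),
    ("Cespedes, first 3 years", 8.5, 3.2),
    ("Darvish, first 3 years", 25 / 3, 4.3),
    ("Tanaka, first season", 22.0, 3.3),
    ("Abreu, 2014 salary", 7.0, 5.5),
]

# Millions of dollars paid per win above replacement, per season.
cost_per_war = {label: aav / war for label, aav, war in deals}

# Cheapest production first.
for label, cost in sorted(cost_per_war.items(), key=lambda item: item[1]):
    print(f"{label}: ${cost:.2f} Million per WAR")
```

Castillo and Moncada are left out because neither had enough major league playing time at the point of signing to produce a meaningful WAR figure.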