Monday, June 20, 2016

Who Would Win If The Team Names Were Taken Literally?

What would happen if the teams in the MLB battled each other and their names were taken literally? Which team would come out on top? Let's find out.

To set up the matchups, we're going to seed the American League and National League so that the team with the best record plays the team with the worst record, the team with the second best record plays the team with the second worst record, and so on. The team right in the middle gets a bye, because no one noticed them.

American League

Rangers vs Twins
Orioles vs Athletics
Red Sox vs Angels
Blue Jays vs Rays
Indians vs White Sox
Royals vs Astros
Mariners vs Yankees
Tigers get a bye

National League

Cubs vs Braves
Giants vs Reds
Nationals vs Padres
Dodgers vs Phillies
Marlins vs Brewers
Mets vs DBacks
Cardinals vs Rockies
Pirates get a bye



Round 1:

AL:

The Rangers easily handle the Twins, who are outnumbered and clearly outskilled in combat.

The Orioles are no match for the country's top athletes, the Athletics. The birds had no way of completing a strong attack.

A pair of Red Sox couldn't stop the Angels, who just sat in the clouds and watched the Sox tumble in the dryer.

The Blue Jays were flying around for days trying to find the Rays but were drastically unsuccessful. Maybe if they were still the Devil Rays it would have been a different story.

Much like the Red Sox, the White Sox stood no chance against the Indians, who are some of the best in combat in the MLB.

No matter how royal they may be, the Royals could not compete with the technology that the Astros had on their side.

The Mariners made quick work of the Yankees, who continued to attempt to swim to their boats, but their wallets kept dragging them down.

NL:

A group of baby bears was not able to fend off the Braves. Things may have been different if they had help from their football companions, the Bears.

The Giants had an easy time seeing the Reds in the fields and easily stomped them out.

The Nationals took a lot of pride in their country, but it wasn't enough. The Padres had the lord on their side, and the Nationals were all struck by lightning.

Although people from Philadelphia can get very violent, the Dodgers just jumped out of the way. The Phillies eventually got frustrated and began fighting each other.

The Marlins were ready to fight, but the Brewers never showed up. When investigating what happened, an associate of the MLB found them all passed out drunk in the brewery.

The Metropolitans (Mets) were really confusing the Diamondbacks early by taking the subway, but eventually the Dbacks found them and injected their venom into the Mets.

It was taking millions of years for the Rockies to move so this round is going to the Cardinals.

Round 2:


AL: 

Rangers vs Athletics
Angels vs Rays
Indians vs Astros
Mariners vs Tigers

The Athletics were able to beat the Orioles with their amazing athleticism, but the Rangers use their weapons to take them out in round 2.

The Rays are unable to use their ultraviolet radiation to cause long term damage to the Angels. The Angels move on to the next round.

Once again the Indians put up a great fight, but the Astros' technology is too complex and blows away the Tribe.

The Tigers swam to the Mariners' boats, not giving up, and leapt aboard. It looked like something out of Life of Pi. The Tigers beat the Mariners and ate all their fish.

NL:

Braves vs Giants
Padres vs Dodgers
Marlins vs DBacks
Cardinals vs Pirates

With dozens of arrows in their legs, the Giants stopped playing games, grew frustrated with the Braves, and just stomped them out like they did the Reds.

The Padres had the lord on their side in the first round but the Dodgers could not be hit. The lord eventually had better things to do and the Padres went down.

Did you know that Diamondbacks could swim? That must be why they have a pool in their park. Well they jumped in the water and injected venom into all the Marlins.

The pirate's life was not for the Cardinals as they went out to sea and the Pirates swiped them down with their swords one by one.

Round 3:

AL:

Rangers vs Angels
Astros vs Tigers

The Angels got this far without ever having to fight, but the Rangers were not the most religious people and didn't believe they were real. The Angels never showed up.

The Astros have made it this far with their technology, but the Tigers have found their way into the control area. They take out the engineers behind the controls.

NL:

Giants vs Dodgers
Diamondbacks vs Pirates

The Giants were too big for the Dodgers to dodge this time around. The Giants take the NL West rivalry once again.

The Pirates, already suffering from scurvy and drunk from all the rum they have been drinking since the victory over the Cardinals, get injected with the Diamondbacks' venom and go quickly.

Round 4:

AL:

Rangers vs Tigers

The Rangers on their horses cannot outrun the Tigers. Their guns are nearly out of ammo, and the Tigers are too agile to hit. The Tigers take down the Rangers.

NL:

Giants vs Diamondbacks

Even with all the venom in the Diamondbacks combined the Giants are just too big to be affected by the venom. The Dbacks are very agile and avoid being stomped out for a very long time. This series went all the way to game seven, but the Giants find themselves victorious.

Final Round: 

Giants vs Tigers


The Tigers are hungry for blood and the Giants have a lot of it. Quickly avoiding attacks by the Giants, the Tigers are getting tired. Much like the 2012 World Series the Giants easily stomp out the Tigers and are victorious with their pure size and strength.






Thursday, May 19, 2016

The Statistical Theory Behind the Shift

The Cardinals Executing the Shift
For years now Major League teams have been implementing "The Shift". The shift is a strategic repositioning of fielders according to where a batter hits the ball most often. A shift for a left-handed hitter will traditionally have the shortstop play right up the middle, the second baseman play about two-thirds of the way between first and second base, the first baseman close to the line, and the third baseman play in shallow right field, with all the outfielders shading toward right field.



Why do teams shift?

Through years of compiling detailed data about players' batted balls, teams came to the conclusion that the traditional positioning of fielders may be inefficient for certain players. These teams observed that some players hit to one particular field more than others. The mistake that teams were making for decades was assuming a uniform distribution of hits. This means that teams were assuming that when a player hit a ball, it had an equal probability of landing anywhere on the field. When teams discovered that the hitting distribution was not uniform, and was in fact skewed, they adjusted their fielders accordingly.

The Mathematics Behind The Shift

Any analytical claim needs numbers backing it up. In fact not many people will accept your claim without any mathematical or statistical backing. The mathematics that supports the shift is something called the sum of squared residuals. A residual is the difference between an observed value and the expected value. For example, Derek Jeter's career batting average before his last season was .312, so the expected value for his batting average in his last season was .312. He ended up hitting .256 in his final season, so the residual would be (.256 - .312), which equals -0.056. For predictive models you want to minimize the sum of squared residuals; if it were equal to zero, the model would predict every observation perfectly. By predicting where the ball may land, the shift minimizes the distance a player will have to run to get the ball. In this case the observed value is the distance a player has to run, and the expected value is zero. The claim is that by shifting, the sum of the squared distances the fielders need to cover is minimized.

Example of the Effectiveness of The Shift

Let's compare two players, Jacoby Ellsbury and Chris Davis, in 2015. Jacoby Ellsbury had a hit distribution that was close to uniform. He hit to right field at a rate of 37.8%, to center field at a rate of 35.1%, and to left field at a rate of 27.0%. Teams typically play Ellsbury straight up, the traditional positioning of the fielders. A uniform hit distribution would have about 33.3% to all fields, so you can see that Ellsbury's hit distribution is a bit skewed. So in theory a defense should shift slightly toward right field for Ellsbury. Chris Davis, on the other hand, hit to right field at a rate of 55.9%, to center field at a rate of 26.5%, and to left field at a rate of 17.6%. Davis' hit distribution is very skewed to right field. So theoretically you would want to position your players closer to right field than the traditional way.
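To put a number on how far each hitter is from a uniform distribution, here is a minimal Python sketch. The score it uses (the sum of squared differences between each field's share and one third) is my own illustrative choice and is not a metric from the analysis above; the percentages are the ones quoted in the paragraph.

```python
# Share of hits to each field in 2015, as quoted in the text.
HIT_SHARES = {
    "Jacoby Ellsbury": {"right": 0.378, "center": 0.351, "left": 0.270},
    "Chris Davis": {"right": 0.559, "center": 0.265, "left": 0.176},
}

UNIFORM_SHARE = 1 / 3  # every field equally likely


def skew_from_uniform(shares):
    """Sum of squared gaps between each field's share and 1/3.

    Illustrative score only (an assumption, not the post's method):
    0 means perfectly uniform, larger means more skewed.
    """
    return sum((share - UNIFORM_SHARE) ** 2 for share in shares.values())


for player, shares in HIT_SHARES.items():
    favorite_field = max(shares, key=shares.get)
    score = skew_from_uniform(shares)
    print(f"{player}: skew score {score:.4f}, most hits to {favorite_field} field")
```

By this rough score Davis comes out more than ten times as skewed as Ellsbury, which matches the idea that he is the one worth shifting heavily against.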

The Shift: Visual Example

For this example imagine a left handed pull hitter, such as Chris Davis. The first diagram is a traditional positioning for a defense and on the second diagram is the shifted positioning for a defense. The red dots are where a ball landed and the black lines are the distance the fielder had to run to get the ball.

Notice that the distances the fielders had to run are 10 ft, 5 ft, and 15 ft. The expected distance for each fielder was 0 ft. So the residuals would be:
10 - 0 = 10
5 - 0 = 5
15 - 0 = 15
And the sum of these values squared is:
10^2 + 5^2 + 15^2 = 350




In this diagram the red dots are in the exact same locations, but the defense is shifted toward right field. Notice that the distances the fielders had to run are much smaller. The residuals would be:
10 - 0 = 10
2 - 0 = 2
1 - 0 = 1
And the sum of these values squared is:
10^2 + 2^2 + 1^2 = 105

The sum of squared residuals is much smaller for the shifted defense than it is for the traditional defense.
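The two calculations above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name is my own, and the distances are the ones read off the two diagrams.

```python
def sum_of_squared_residuals(observed, expected=0):
    """Add up (observed - expected)^2 over every observation."""
    return sum((value - expected) ** 2 for value in observed)


# Distances (in feet) each fielder ran in the two diagrams.
traditional_ft = [10, 5, 15]
shifted_ft = [10, 2, 1]

ssr_traditional = sum_of_squared_residuals(traditional_ft)
ssr_shifted = sum_of_squared_residuals(shifted_ft)

print("Traditional defense:", ssr_traditional)
print("Shifted defense:", ssr_shifted)
```

Squaring is what makes the long runs count the most: the single 15 ft run contributes 225 of the traditional total, so cutting it down to 1 ft accounts for most of the improvement.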


Conclusions 

The shift is here to stay. Physically, players hit the ball the farthest when they pull it, so a big guy like Chris Davis will try to pull the ball to hit a home run. Pulling the ball into the shift costs him hits, but this trade-off is something that Davis can live with as a player. Since 2012 Chris Davis has averaged 40 home runs per season, but has only a .256 batting average. He has made a decision as a player that he would rather produce power numbers than hit safely more often. There are situations where it would make logical sense to hit a ground ball toward the left side of the field for a hit, such as in a close game in the 9th inning, but Davis has the potential to hit the ball out of the ballpark.

Every player in the MLB is different, and that calls for a different shift for each player. It is up to the analytics department to keep up on that and determine the best shift for a player, and they do that by finding the positioning with the minimal sum of squared residuals.

Saturday, June 20, 2015

Getting Inside the Manager's Head

Do you ever watch a game and think that the manager is making a boneheaded decision by leaving a pitcher in or taking a pitcher out of a game? It turns out that the manager's 'gut' feeling can be quantified. In fact through the miracles of regression analysis, knowledge of baseball, and critical thinking, fans can now understand when and why a starting pitcher is going to be taken out of a game.

Joe Girardi Calling to the Bullpen
Out of several continuous and categorical variables, the number of pitches, the age of the pitcher, runs allowed, and whether the game is being played at home or away had the most effect on the manager's decision to pull a pitcher. We use these variables to predict innings pitched.

A note on runs allowed: if a pitcher allows a base runner, that base runner is to be considered a potential run until he is no longer on base. So when calculating how many innings a pitcher will pitch, include the runners on base in the current inning as runs. This is because a manager's decision to take out a pitcher is often based on the possibility of giving up more runs, which is exactly what base runners represent.

In the regression equation, runs allowed has the greatest impact on whether a manager will take out the pitcher, which makes plenty of sense. It conveniently works out that each run allowed results in a deduction of about one third of an inning pitched. Pitch count also has a lot to do with when a pitcher is going to come out of a game. The more pitches a pitcher throws, the further he has gone into a game, but a high pitch count by itself does not necessarily mean he will be taken out.

An interesting thing I discovered when running this regression model was that 0.5 more innings are pitched by starting pitchers at home than away. People may say that makes plenty of sense because in order to conclude a win at home the home team has to pitch the top of the ninth but the away team does not have to pitch the bottom of the ninth. Managerial tendencies in today's game show that it is unlikely for a starting pitcher to pitch a complete game, so the extra half an inning does not have much to do with the concept of last licks, but rather strictly being at home.

Regression Equation:

Away:  Innings = 1.57 + 0.04734 Pitches - 0.3219 Runs + 0.0226 Age

Home:  Innings = 2.07 + 0.04734 Pitches - 0.3219 Runs + 0.0226 Age

Clayton Kershaw handing the ball over to manager Don Mattingly
The older the pitcher, the further he tends to go into the game, either due to experience or the manager's respect for the pitcher.

This equation is to be used as a game goes on, because with each pitch he throws, the pitcher goes further into the game. 100 pitches without giving up a run results in about 6 innings pitched. 100 pitches and 3 runs allowed results in about 5 innings pitched.
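For anyone who wants to plug in their own numbers during a game, here is a small Python sketch of the two regression equations. The function and constant names are my own, and the age of 28 in the example is an assumed value chosen only for illustration. Note that the age term adds roughly 0.6 of an inning at that age, so these outputs land a little above the rounded figures quoted above.

```python
# Intercepts from the two equations; the slopes are shared by both.
INTERCEPT = {"away": 1.57, "home": 2.07}
PER_PITCH = 0.04734
PER_RUN = -0.3219
PER_YEAR_OF_AGE = 0.0226


def predicted_innings(pitches, runs, age, venue="away"):
    """Predicted innings for a starter.

    `runs` should include runners currently on base, as noted earlier.
    """
    return (
        INTERCEPT[venue]
        + PER_PITCH * pitches
        + PER_RUN * runs
        + PER_YEAR_OF_AGE * age
    )


# Age 28 is a hypothetical pitcher, not a value from the data set.
for runs in (0, 3):
    for venue in ("away", "home"):
        innings = predicted_innings(pitches=100, runs=runs, age=28, venue=venue)
        print(f"{venue}, 100 pitches, {runs} runs: {innings:.2f} innings")
```

The result is a plain decimal number of innings, so 6.33 means six and a third, not the 6.1 you would see in a box score.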

If a pitcher is approaching the inning which the equation yields, he becomes more and more likely to get pulled from the game. If a pitcher pitches past the inning which the equation yields, he becomes very likely to be taken out of the game. The regression model summary had an R^2 value of 65.38%. This means that 65.38% of the variation in how deep a starter pitches is explained by these factors. Other factors not included in the model that may affect a manager's decision to take out a pitcher are the reliability of the bullpen, injury history, lopsided scores, game delays, and ejections.

One variable that I tested that turned out not to have much effect on when a manager would take a pitcher out of a game was the categorical variable of League. Pitchers pitching in National League games stay in games to similar lengths as pitchers pitching in American League games.


 

Tuesday, May 26, 2015

Environmental Factors on a Game

A few months ago I did some research with Andrew Nave and Terrance McCabe, my classmates at SUNY Oneonta, on how environmental factors affect how many runs are scored. We collected data from 100 games in the 2014 season. From these games we observed the number of runs scored, the temperature, location of the stadium, home or away, time of game, and ballpark size.

Our sample was collected by randomly selecting a number 1-30, each representing a different team, then randomly selecting a number 1-162, each representing a game played by that team. This data collection strategy gave us a very good representation of the population.
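The selection procedure can be sketched in Python as follows. The function name and the seed are my own, and the sketch does not deal with duplicate draws or with the same game being reached through both teams, since the write-up above does not say how those cases were handled.

```python
import random


def sample_games(n_games=100, n_teams=30, games_per_team=162, seed=None):
    """Draw (team number, game number) pairs the way the text describes."""
    rng = random.Random(seed)
    return [
        (rng.randint(1, n_teams), rng.randint(1, games_per_team))
        for _ in range(n_games)
    ]


# Seed chosen arbitrarily so the draw can be repeated.
games = sample_games(seed=2014)
print(games[:5])
```

Each pair would then be looked up by hand, for example team 7's 93rd game of the season.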

The first hypothesis we tested was whether more runs were scored at home or away. We predicted that the same number of runs are scored at home and away on average, to debunk the age old belief of home field advantage. From our sample the mean (average) number of runs scored at home is 4.125 and the mean number of runs scored at away games is 4.047. Nominally teams do in fact score more runs at home, but a 2 sample T-test yielded a p-value of .896, meaning the difference is nowhere near statistically significant. So our sample gives no evidence that more runs are scored at home than away. In baseball home field advantage is more of a long term advantage, where a team can build their roster in a certain way. For example, the New York Yankees sign left handed hitters due to the short distance to right field, and the San Francisco Giants sign league average pitchers to pitch in their spacious ballpark and get above average ERAs.

Another belief about baseball games is that more runs are scored in the summer heat than in the cold. We tested the belief by splitting our sample into two categories, games where the starting game temperature was 80 degrees or above and games where it was below 80 degrees. Games with a game time temperature of 80+ degrees had a mean of 4.138 runs and games with a temperature of less than 80 degrees had a mean of 4.071. Using a 2 sample T-test with the claim that more runs are scored during hot games, a p-value of .458 was the result. Since this is greater than the alpha value of 0.05, our sample does not provide enough evidence that more runs are scored in the heat than in the cold.

Often the west coast gets a bad reputation for its lack of runs scored because of the marine layer, but this is yet another theory we tried to debunk. By separating the sample into East, Central, and West we found that the mean runs scored were 3.968, 4.023, and 4.375 respectively. We tested the claim that the mean number of runs scored is equal regardless of location. By using an Analysis of Variance test (ANOVA test) we found a p-value of 0.858, which means there is not enough evidence to reject our claim. One may think that the runs scored are solely affected by the teams being better or worse than other teams, but the games that teams play on the road are also included in the sample.

Perhaps the most important find from our data collection was from the regression equation. Through backwards elimination we eliminated variables to find the best regression model. The variables that have the most effect on how many runs are scored in a game are time of game, elevation, and temperature. In the equations, 0 represents day games and 1 represents night games.

0: Runs = 1.27 + 0.0283 (Temp) + 0.000678 (Elevation)
1: Runs = 1.96 + 0.0283 (Temp) + 0.000678 (Elevation)
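Here is a short Python sketch of the two equations for anyone who wants to try their own inputs. The function name is my own, and the example game (75 degrees, 500 of elevation) is made up for illustration. I am assuming temperature is in degrees Fahrenheit and elevation is in feet, which is how the data would normally be listed.

```python
# Intercepts keyed by the time-of-game code: 0 = day game, 1 = night game.
INTERCEPT = {0: 1.27, 1: 1.96}
PER_DEGREE = 0.0283
PER_UNIT_ELEVATION = 0.000678


def predicted_runs(temp, elevation, night):
    """Predicted runs for one team from the regression equations."""
    code = 1 if night else 0
    return INTERCEPT[code] + PER_DEGREE * temp + PER_UNIT_ELEVATION * elevation


# Hypothetical game conditions, not taken from the sample.
for night in (False, True):
    label = "night" if night else "day"
    print(f"{label} game: {predicted_runs(75, 500, night):.2f} runs")
```

With those made-up conditions the day and night predictions sit on either side of 4 runs, and the night game is always 0.69 runs higher because only the intercept changes.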

Our regression analysis yielded an R^2 value of 10.47%, which means that 10.47% of the variation in runs scored from game to game is explained by the environmental factors mentioned above. The other roughly 90% comes from everything else, such as the hitters, the pitchers, and luck.

You can't predict baseball, but if you had to choose how many runs a team will score on a particular day, the regression equation will give a decent estimate. The number of runs scored will most likely be around 4.

Monday, March 23, 2015

Hideki Matsui and the Rush for International Free Agents

The desire for international free agents has skyrocketed in the past few years. With better scouting more talents are being discovered and coming to the United States. Players have been coming to the United States from the Dominican Republic and Cuba for decades, but the desire for Japanese free agents started with the success Ichiro found in Seattle.

Undoubtedly the most successful Japanese player to make the transition to the MLB, Ichiro sparked a period of high spending on international free agents. One player that often gets overlooked as a great success is Hideki Matsui.

Hideki Matsui
In 2003 Matsui signed a 3 year $21 Million contract with the New York Yankees, which today would be worth $26.67 Million. Over this 3 year period Matsui had an average annual WAR of 3.9, which is considered above average and a good value for the money. Also over that 3 year span Matsui played an average of 162.33 games.

Following his 3 year contract, Matsui signed a 4 year $52 Million deal, yielding an average annual value of $13 Million, which is the modern day equivalent of a 4 year $66 Million contract. Over this span Matsui only played an average of 107 games with an average annual WAR of 2.2. He did not do as well in his last 4 years with the Yankees as he did in his first 3, but considering the value that international free agents are signing for today, the Matsui deal was a steal for the Yankees.

Yoenis Cespedes signed a 4 year $36 Million deal with the Oakland Athletics in 2012. Due to the slight change in CPI from 2012 to 2015, the nominal and real dollars of the deal are close to equal. The average annual value of the first 3 years of the Cespedes deal was $8.5 Million and his average annual WAR for those 3 years was 3.2, just less than Matsui's, at a similar average annual value.

Another international free agent who signed in 2012 was Yu Darvish, who signed a 6 year $56 Million deal with the Texas Rangers. Over the first 3 years of the deal Darvish was paid $25 Million, which yields an average annual value of $8.33 Million, and his average annual WAR of 4.3 was greater than both Cespedes' and Matsui's.

In 2014 international free agents began signing for much more. Part of this is due to a change in the posting system in Japan. Instead of the teams competing to post the highest bid to negotiate with a player, they can now post a max bid of $20 Million and all teams that posted the bid can negotiate with the player.

Masahiro Tanaka 
Masahiro Tanaka was a player that had teams post the max bid, and he ended up signing a 7 year $155 Million deal with the Yankees. In his first season he had a WAR of 3.3, but at a much higher cost, with an average annual value of $22 Million.

Also in 2014 Rusney Castillo, a Cuban prospect, signed a 7 year $72.5 Million deal with the Boston Red Sox. Castillo only played 10 games in 2014, but his average annual value is much higher than most other international prospects. The interesting part of the contract is the length. Castillo is only 27 years old and the Red Sox made a 7 year commitment, just like the Yankees did with Tanaka, before either of them played a single game in the Major Leagues.

Jose Abreu signed a 6 year $66.3 Million deal with the Chicago White Sox in 2014 and was paid $7 Million in 2014. His WAR for the 2014 season was 5.5 as he hit his way to a Rookie of the Year Award in an MVP caliber season. He had never played a game in the Major Leagues prior to signing the contract, but the White Sox definitely got their bargain.
Yoan Moncada 

Yoan Moncada is just 19 years old and he signed with the Red Sox for a signing bonus of $31.5 Million. The international free agent market is getting more competitive, with more money being invested, players signing at a younger age, and longer commitments being made to the players. The days where a team can sign an international free agent like Hideki Matsui for a contract similar to his initial 3 year $21 Million contract are gone. The risk involved in signing international free agents is higher than ever.
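One rough way to line these deals up is dollars paid per win above replacement. The Python sketch below uses only the salary and WAR figures quoted in this post; the cost-per-WAR framing and the function name are my own additions. The comparison is loose, since Matsui's figure is in 2015 dollars, Tanaka's salary is the average over his full contract against a single season of WAR, and Abreu's is one season's pay.

```python
# (player and span, average annual value in $ millions, average annual WAR),
# all taken from the figures quoted in the post.
DEALS = [
    ("Hideki Matsui, first 3 years (2015 dollars)", 26.67 / 3, 3.9),
    ("Yoenis Cespedes, first 3 years", 8.5, 3.2),
    ("Yu Darvish, first 3 years", 25 / 3, 4.3),
    ("Masahiro Tanaka, 2014", 22.0, 3.3),
    ("Jose Abreu, 2014", 7.0, 5.5),
]


def cost_per_war(aav_millions, war):
    """Millions of dollars paid per win above replacement."""
    return aav_millions / war


# Print the deals from cheapest to most expensive per WAR.
for player, aav, war in sorted(DEALS, key=lambda deal: cost_per_war(deal[1], deal[2])):
    print(f"{player}: ${cost_per_war(aav, war):.2f} Million per WAR")
```

On these numbers Tanaka's first season cost several times more per WAR than any of the earlier deals, which is the jump in price described above.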


Sunday, December 7, 2014

Justin Upton

The Winter Meetings started today and the Braves' left fielder will definitely be a hot topic of discussion. The Braves already traded slick fielding outfielder Jason Heyward in a deal for Shelby Miller, but signed another great outfielder in Nick Markakis. Justin Upton's name has been floated around in trade discussions so far this offseason. Many thought the Braves were in on the Yasmany Tomas sweepstakes and that if they signed the Cuban slugger, Justin Upton would be on the move. The Diamondbacks ended up signing Tomas, so now the question for the Braves is whether or not to trade Justin Upton.

Justin Upton will be a free agent after the 2015 season and will be just 28 years old next season. Upton was already traded once (by the Diamondbacks after the 2012 season), and the Braves definitely won the trade. Being one of the best young players in the game, Upton can fetch a large payout in a trade and will be in store for a large pay day whether it be through an extension or free agency.  

Since 2011 Upton has been consistently healthy, playing in at least 149 games each season, and has hit for both power and average. His stolen base total has decreased dramatically since being traded to the Braves, but his other statistics have remained consistent except for strikeout percentage. In his final two seasons with the Diamondbacks Upton had a strikeout percentage of 19%. In the two seasons he has spent with the Braves his strikeout percentage has gone up to 26%, which means he is striking out on average once per game.

Besides the problem with the strikeouts, Justin Upton can be very valuable to a team seeking a powerful bat. His slash line over the last three seasons is .271/.350/.462, equating to an OPS+ of 122. The Braves are very short on pitching, but they already traded one of their young outfielders. Would they be willing to trade another? One theory that has been tossed around is using Evan Gattis in left field while letting catching prospect Christian Bethancourt get a chance behind the plate, but the Braves would definitely miss Upton's power.

Possible landing spots for Upton could be Seattle, Baltimore, San Francisco, and San Diego. The Braves will be seeking young pitchers who are ready for or already established at the Major League level. The Mariners and Orioles have these pitchers in the form of Taijuan Walker, James Paxton, and Kevin Gausman. The only other outfield sluggers left on the market are Melky Cabrera, Matt Kemp, Alex Rios, and a defensively challenged Michael Morse. As more names come off the board, Upton will be in higher demand.
 

Saturday, December 6, 2014

Player Profile: Daniel Murphy

Daniel Murphy played his way to his first All Star appearance in 2014, but he is underrated in many respects given his position. Murphy is a below average defensive second baseman, but his value comes from the fact that he can play multiple positions. In 2014 Murphy logged innings at third base and first base, and he has experience playing left field. Daniel Murphy will never win an MVP and most likely will never win a Gold Glove award, but he can be a pivotal and important part of any team.

A comparable player to Daniel Murphy is Ben Zobrist, who is much more valuable than Murphy given the variety of positions he plays, the high on base percentage he posts every season, and his much higher power numbers. Murphy is entering his age 30 season and his trade value may be higher than ever. He will be a free agent after the 2015 season so the Mets may be willing to trade him.

An important aspect of the game for players who do not hit for much power is strikeout rate. Over the course of the past four seasons, Murphy has struck out just 12.8% of the time. Over the course of the last three seasons Murphy has played an average of 153 games, showing that he can remain healthy for an entire season and stay consistent. Over that same span of three seasons he posted a .288/.327/.407 slash line, equating to a 107 OPS+. He can hit for some power (average of 9 home runs and 38 doubles per season) and can swipe a few bags (average of 15 per season). He reached career highs in home runs and stolen bases in 2013, when he hit 13 and swiped 23 respectively.

Daniel Murphy would make a lot of sense for a lot of different teams. Second base is a position that lacks good hitters. Although it is headlined by names like Robinson Cano, Jose Altuve, and Ian Kinsler, only about a third of the league has an OPS over .700. Competing teams like the Orioles, Blue Jays, and Diamondbacks may be interested in Murphy's services.