Tuesday, May 26, 2015

Environmental Factors on a Game

A few months ago, together with Andrew Nave and Terrance McCabe, my classmates at SUNY Oneonta, I did some research on how environmental factors affect how many runs are scored. We collected data from 100 games in the 2014 season. For each game we recorded the number of runs scored, the temperature, the location of the stadium, whether the team was home or away, the time of the game, and the ballpark size.

Our sample was collected by randomly selecting a number from 1 to 30, each representing a different team, then randomly selecting a number from 1 to 162, each representing a game played by that team. Because every team and every game on the schedule had the same chance of being picked, this strategy should give us a good representation of the population.
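The selection procedure described above can be sketched in a few lines of Python. This is only an illustration, not the tool we actually used: the function name `draw_sample`, the seed, and the choice to redraw duplicate picks are all assumptions.

```python
import random

NUM_TEAMS = 30         # one number per team
GAMES_PER_TEAM = 162   # one number per game on that team's schedule

def draw_sample(n_games, seed=None):
    """Pick n_games distinct (team, game) pairs uniformly at random.

    Redrawing duplicates is an assumption; the write-up does not say
    how repeated picks were handled.
    """
    rng = random.Random(seed)
    picks = set()
    while len(picks) < n_games:
        team = rng.randint(1, NUM_TEAMS)       # inclusive on both ends
        game = rng.randint(1, GAMES_PER_TEAM)  # inclusive on both ends
        picks.add((team, game))                # a set ignores repeated pairs
    return sorted(picks)

sample = draw_sample(100, seed=2014)
print(len(sample), "team-games selected")
```

One thing the sketch does not handle: the same real game can show up twice, once from each team's schedule, so a stricter version would also need to check for that.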

The first hypothesis we tested was whether more runs were scored at home or away. We predicted that the same number of runs is scored at home and away on average, to debunk the age-old belief in home field advantage. From our sample, the mean (average) number of runs scored at home is 4.125 and the mean number of runs scored in away games is 4.047. Nominally, teams do in fact score more runs at home, but a two-sample t-test yielded a p-value of .896, meaning the difference between the two samples is nowhere near statistically significant. So our sample gives no evidence that more runs are scored at home than away. In baseball, home field advantage is more of a long-term advantage, where a team can build its roster in a certain way. For example, the New York Yankees sign left-handed hitters because of the short distance to right field, and the San Francisco Giants sign league-average pitchers who can post above-average ERAs in their spacious ballpark.

Another belief about baseball games is that more runs are scored in the summer heat than in the cold. We tested the belief by splitting our sample into two categories: games where the starting temperature was 80 degrees or above and games where it was below 80 degrees. Games with a game-time temperature of 80+ degrees had a mean of 4.138 runs, and games with a temperature of less than 80 degrees had a mean of 4.071. Using a two-sample t-test with the claim that more runs are scored during hot games, the result was a p-value of .458. Since this is greater than the alpha value of 0.05, our sample does not provide enough evidence to say that more runs are scored in the heat than in the cold.

Often the West Coast gets a bad reputation for its lack of runs scored because of the marine layer, but this is yet another theory we tried to debunk. Separating the sample into East, Central, and West, we found that the mean runs scored were 3.968, 4.023, and 4.375, respectively. We tested the claim that the mean number of runs scored is the same regardless of location. Using an analysis of variance (ANOVA) test, we found a p-value of 0.858, which means there is not enough evidence to reject our claim. One may think that the runs scored are solely affected by some teams being better or worse than others, but the games that teams play on the road are also included in the sample.

Perhaps the most important finding from our data collection came from the regression equation. Through backward elimination we removed variables to find the best regression model. The variables that have the most effect on how many runs are scored in a game are time of game, elevation, and temperature. In the equations below, 0 represents day games and 1 represents night games:

0: Runs = 1.27 + 0.0283 (Temp) + 0.000678 (Elevation)
1: Runs = 1.96 + 0.0283 (Temp) + 0.000678 (Elevation)
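
To make the two equations concrete, here is a minimal sketch that plugs numbers into them. The coefficients come straight from the equations above; the function name `predicted_runs` and the example inputs (75 degrees, elevation 0) are made up for illustration, and the inputs are assumed to be in the same units as the data we collected.

```python
# Coefficients taken from the two regression equations above.
INTERCEPT_DAY = 1.27      # equation 0 (day games)
INTERCEPT_NIGHT = 1.96    # equation 1 (night games)
TEMP_COEF = 0.0283
ELEVATION_COEF = 0.000678

def predicted_runs(temp, elevation, night_game):
    """Evaluate the regression equation for one team in one game."""
    intercept = INTERCEPT_NIGHT if night_game else INTERCEPT_DAY
    return intercept + TEMP_COEF * temp + ELEVATION_COEF * elevation

# Hypothetical inputs: a 75-degree game at elevation 0.
day = predicted_runs(temp=75, elevation=0, night_game=False)
night = predicted_runs(temp=75, elevation=0, night_game=True)
print(f"day: {day:.2f}, night: {night:.2f}")  # day: 3.39, night: 4.08
```

Since the two equations differ only in the intercept, the model always predicts 0.69 more runs for a night game than for a day game at the same temperature and elevation.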

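Our regression analysis yielded an R^2 value of 10.47%, which means that the environmental factors mentioned above explain about 10.47% of the game-to-game variation in runs scored. The other roughly 90% of the variation comes from things the model does not include. This is a statement about how much scoring varies from game to game, not about individual runs, so it does not mean that 1 run in a 10-run game was scored because of the environment.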

You can't predict baseball, but if you had to choose how many runs a team will score on a particular day, the regression equation will give a decent estimate. The number of runs scored will most likely be around 4.