W e a v e r | 1
1 Introduction
Baseball is arguably the most complicated of the major North American sports, and even though
a professional baseball league has existed since 1876 [1], there is still much interest and debate
in how to play optimally [2-5]. Considering the amount of money and fame involved in winning
baseball games, if one team is playing more optimally than another it can have large economic
effects on the franchises and the people who depend on them. One of the most controversial
decisions according to baseball pundits is whether to sacrifice bunt, with many baseball
columnists claiming that the strategy is overused [6-8]. A bunt is when a batter tries to gently tap
the ball without swinging in an attempt to get on base or to advance the players already on base. A
sacrifice bunt is a bunt that occurs when there is at least one player on base already and fewer than
two outs (it is called a sacrifice because the bunt often causes the batter to sacrifice himself to advance
the other players).
The decision of whether to bunt occurs during a pitcher-batter confrontation (also known
as a plate appearance). The three possible outcomes of this confrontation are as follows: the
batter is out, the batter reaches base (1st, 2nd or 3rd) or the batter scores (he reaches home plate).
During this plate appearance, the batter essentially has two strategies: to swing fully (nonbunt) or
to bunt (there are other strategies, as players can alter their swing to hit for power or
contact, but these decisions are static when considering whether to bunt or swing fully).
While the batter-pitcher confrontation is occurring, the game is in some state and the
confrontation leads to a change in the game state. The game starts with zero outs and no players on
base, and after three outs are recorded the teams switch from being on offence to defence and
vice versa. This period from the starting state until three outs are recorded is often referred to as a frame;
two consecutive frames (each team has one frame on offence) are known as an inning. In addition
to the number of outs, there is also some combination of players already on the bases (this
combination is referred to as a base state); each base either has a player on it or is empty, which
means there are 8 possible base states. By combining the number of outs with the
base state into a single entity, any game state in a frame can be described. These
entities are called base/out states [5], and as there are three possibilities for the number of outs and
eight base states, there are 24 base/out states. This model also considers a special 25th
base/out state, which occurs when three outs are recorded (it does not matter where the runners
are as the third out means the frame is over and no more runs can be scored).
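As a minimal sketch of this counting argument, the following Python snippet enumerates the 24 live base/out states and appends the end-of-frame state. The four-character labels (one character per base, then the number of outs) and the label "end" are conventions assumed here for illustration only.

```python
from itertools import product


def base_out_states():
    """Return 25 labels: 8 base states x 3 out counts, plus the end state."""
    states = []
    for outs in range(3):
        # Each base is either empty (False) or occupied (True).
        for on1, on2, on3 in product((False, True), repeat=3):
            bases = (
                ("1" if on1 else "0")
                + ("2" if on2 else "0")
                + ("3" if on3 else "0")
            )
            states.append(f"{bases}{outs}")
    states.append("end")  # assumed label for the three-out state
    return states


STATES = base_out_states()
print(len(STATES))
```

Under this labelling, "1201" stands for runners on 1st and 2nd with one out, and "end" stands for the state reached once the third out is recorded.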
In creating a model to analyze whether to use the bunt strategy or the normal strategy,
Markov chains will be used extensively. This is a common method in analyzing decisions in
baseball and has been widely accepted as very accurate [9-11]. The overall method of deciding
whether to use the bunt strategy or the normal strategy is to consider the distributions of base/out
states after a bunt and after hitting normally, while considering the ability of the players
involved. It can then be determined which distribution of base/out states gives the team a better
chance of winning the game; the model will be refined as more variables are considered.
2 The Model
2.1 Basic Notation
While there does exist a body of literature on baseball models that make extensive use of Markov
chains, there is no common standard for notation, and as a result some notation will need to be
developed prior to the investigation. The notation developed here tries to be consistent with other
major works on the subject, while being as natural and rigorous as needed.
Base/out states will be denoted by the letter $s$ with an appropriate subscript of four
numbers; the first three numbers describe the base state and the fourth the number of outs. The
format for the subscript is consistent with notation developed in a highly influential work in the
area [5]. As an example, $s_{0000}$ means that there is nobody on 1st, 2nd or 3rd base and there are
zero outs, while $s_{1201}$ means that there is a player on 1st and 2nd base (but not 3rd) and there is
one out. The final base/out state, which signifies the end of the frame, is denoted $s_{end}$.
There are also some key functions that will be used throughout the investigation, namely
$v(\cdot)$, $p(\cdot \mid \cdot)$ and $r(\cdot \mid \cdot)$. The first of these, $v(s)$, is the value of $s$, which is the number of
expected runs from a certain base/out state to the end of the frame. Next, $p(s' \mid s)$ is the
probability of transitioning from state $s$ (base/out state in the current period) to state $s'$ (base/out state in the
next period). Last, $r(s' \mid s)$ is the number of runs scored during the transition from state $s$ to state
$s'$. For example, $r(s'_{0000} \mid s_{0000}) = 1$ (for instance, a solo home run) and $r(s'_{1000} \mid s_{0000}) = 0$ (for instance, a single).
2.2 Run Expectancies
An essential tool to this investigation is that of run expectancies (RE). These are the
average number of runs scored from a certain base/out state to the end of the frame; as the model
uses Markov chains it does not matter how a team got to a certain base/out state. Using empirical
data of the number of runs scored from a certain base/out state to the end of the frame, the run
expectancies are determined for each base/out state (these are given in the empirical part of the
paper).
This is not the only way that run expectancies can be calculated, though. They can also be
found by a Markov process where only the probability of going from one state to another is
observed. There is strong evidence that run expectancies determined empirically and
by Markov processes are very similar [5], which is confirmed in this paper. However, the
reason for developing this method in this paper is not just to reconfirm that Markov chains are a
reasonable assumption in modelling baseball; the method of finding run expectancies by Markov
processes also introduces terminology and techniques that will be needed to compare the bunt to the
nonbunt strategy.
When considering a base/out state involving no players on base, the formula for run
expectancy can easily be determined (Equation 1).
$$v(s) = \sum_{s'} \left[ v(s') + r(s' \mid s) \right] \left[ p(s' \mid s) \right] \qquad \text{(Eq. 1)}$$
Sadly, this simple equation is only sufficient when there are no players on base. When the
bases are not empty, a transition in base/out state can occur with or without a change in
batter. For example, if there is a player on first base with zero outs (base/out state of $s_{1000}$) and
the player steals second base then the batter is the same, but the game state is now $s_{0200}$. During
this transition, the batter stays the same and no runs are scored and so $r(s'_{0200} \mid s_{1000}) = 0$.
Likewise, if the batter doubles and the player on first scores on the play, then the game state is
also $s_{0200}$; during this transition the batter changes and one run is scored on the play, meaning
that $r(s'_{0200} \mid s_{1000}) = 1$. This causes a slight problem, as it means that $r(s' \mid s)$ can have
different values, and so is ambiguous as currently defined.
To deal with this, we will create a set, $E(s' \mid s)$, which is the set containing all events that
can lead to a transition in the game state; for our purposes, this set contains only two elements,
which are sets themselves. The first of these elements is the set of all events that lead to a
transition in game state without the batter changing. Most events that fall into this element are
due to what is known as baserunning in baseball, and as a result this element of $E(s' \mid s)$ will be
denoted bsr. The second of these elements is the set of all events that lead to a transition in game
state as a result of a change of batter. It will be said that all of these events are due to
non-baserunning, and so this element of $E(s' \mid s)$ will be denoted nbsr. Note that bsr and nbsr are
mutually exclusive and that every event leading to a change in game state must belong to either
bsr or nbsr.
With regards to notation, this leads to some minor amendments as more information is
required. It is no longer sufficient to use $r(s' \mid s)$ as this does not specify whether the change of
state was due to bsr or nbsr. To incorporate this information, $r(s' \mid s, e)$ is used where $e \in E(s' \mid s)$; this
is the number of runs scored given a state $s$ and an event $e$ to transition to the state $s'$. Similarly,
$p(s', e \mid s)$ is now used, which is the probability of transitioning to state $s'$ by event $e$ given
starting state $s$.
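One way to picture the event-tagged notation is as a lookup keyed by the pair (next state, event). The sketch below is only an illustration: the dictionary layout, the function names and the two probabilities are assumptions, not observed values, and only the two transitions from the stolen-base and double examples are listed.

```python
# Event-tagged transitions out of s_1000 (runner on first, zero outs).
# Keys are (next_state, event); values are (probability, runs scored).
# The probabilities are made up for illustration and cover only two of the
# many possible transitions, so they do not sum to 1.
transitions_from_1000 = {
    ("0200", "bsr"): (0.03, 0),   # runner steals second; same batter, no run
    ("0200", "nbsr"): (0.02, 1),  # batter doubles and the runner scores
}


def marginal_probability(transitions, next_state):
    """p(s'|s): sum of p(s', e|s) over the events e in {bsr, nbsr}."""
    return sum(
        prob
        for (state, _event), (prob, _runs) in transitions.items()
        if state == next_state
    )


def runs(transitions, next_state, event):
    """r(s'|s, e): runs scored on the transition to s' by event e."""
    return transitions[(next_state, event)][1]


p_0200 = marginal_probability(transitions_from_1000, "0200")
```

Tagging each transition with its event removes the ambiguity in the number of runs, while summing over the two events recovers the untagged transition probability.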
With this new notation in place, Equation 1 can be updated to give the run expectancy of
any base/out state $s$. A useful property used is that $p(s', \text{bsr} \mid s) + p(s', \text{nbsr} \mid s) = p(s' \mid s)$.
$$\begin{aligned}
v(s) &= \sum_{s'} \sum_{e \in E(s' \mid s)} \left[ v(s') + r(s' \mid s, e) \right] \left[ p(s', e \mid s) \right] \\
&= \sum_{s'} \sum_{e \in E(s' \mid s)} \left[ r(s' \mid s, e) \right] \left[ p(s', e \mid s) \right] + \sum_{s'} v(s')\, p(s' \mid s)
\end{aligned} \qquad \text{(Eq. 2)}$$
As $p(s' \mid s)$ and $p(s', e \mid s)$ are observed and the value of $r(s' \mid s, e)$ is determined by $s$, $s'$ and $e$, which are
known, only $v(s)$ and $v(s')$ are unknown, which means that by Equation 2, $v(s)$ can be written as a
linear combination of the values of the 25 base/out states. Doing this for every state $s$, a set of 25 equations
with 25 variables arises, which can then be put into a matrix and solved. The solution gives the
Markov run expectancies, which are based solely on observed values of probabilities of
specific state transitions.
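The linear system described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the computation used in the paper: the end state is assigned a value of zero, the toy chain (bases always empty, each batter either homers with probability H or makes an out) is invented so that the answer can be checked by hand, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                for k in range(col, n + 1):
                    m[row][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def markov_run_expectancies(states, transitions):
    """Solve Equation 2 for v(s).

    transitions[s] lists (next_state, probability, runs) triples, one per
    (s', e) pair. The end state is assumed to have value 0, so it is not
    a variable in the system.
    """
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for s in states:
        for nxt, prob, runs in transitions[s]:
            b[index[s]] += prob * runs            # expected immediate runs
            if nxt != "end":
                a[index[s]][index[nxt]] -= prob   # move v(s') to the left side
    return dict(zip(states, solve_linear(a, b)))


# Toy chain (invented): home run with probability H, otherwise an out.
H = 0.1
toy_states = ["0000", "0001", "0002"]
toy_transitions = {
    "0000": [("0000", H, 1), ("0001", 1 - H, 0)],
    "0001": [("0001", H, 1), ("0002", 1 - H, 0)],
    "0002": [("0002", H, 1), ("end", 1 - H, 0)],
}
v = markov_run_expectancies(toy_states, toy_transitions)
```

For this toy chain the solution can be verified by hand: each remaining out is worth H / (1 - H) expected runs, so the value with zero outs is 3H / (1 - H). The real system has the same shape, with 24 unknowns once the end state is fixed at zero.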
By a similar process, the probability that a specific number of runs are scored from a
certain base/out state to the end of the frame can be ascertained. Let $z_0(\cdot)$ be a function for the
probability of a base/out state leading to zero runs at the end of the frame. Then an equation can
be created to find the probability of a certain base/out state leading to zero runs in terms of
probabilities of other base/out states leading to zero runs. After this is done for all 25 base/out
states, there are 25 equations, which can then be solved. The equation desired (Equation 3) is the
sum of the probabilities of going from one state to another multiplied by the probability of
scoring zero runs in this new state for all possible states; transitions where one or more runs are
scored are then discounted.
$$z_0(s) = \sum_{s'} \sum_{e \in E(s' \mid s)} \left[ z_0(s')\, \delta_0(s' \mid s, e)\, p(s', e \mid s) \right],
\quad \text{where } \delta_0(s' \mid s, e) = \begin{cases} 1 & \text{if } r(s' \mid s, e) = 0 \\ 0 & \text{if } r(s' \mid s, e) \neq 0 \end{cases} \qquad \text{(Eq. 3)}$$
Moreover, the probability of a base/out state leading to $n$ runs at the end of the frame is denoted
by $z_n(\cdot)$ and is calculated by the following formula.
$$z_n(s) = \sum_{s'} \sum_{e \in E(s' \mid s)} \sum_{k=0}^{n} \left[ z_k(s')\, \delta_k(s' \mid s, e)\, p(s', e \mid s) \right],
\quad \text{where } \delta_k(s' \mid s, e) = \begin{cases} 1 & \text{if } r(s' \mid s, e) = n - k \\ 0 & \text{otherwise} \end{cases} \qquad \text{(Eq. 4)}$$
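Equations 3 and 4 can be illustrated with a short recursive sketch. Several things here are assumptions made for the example: the toy chain (bases always empty, home run with probability H or an out), the boundary condition that the end state yields zero further runs with certainty, and the use of recursion, which only works because this toy chain has no cycle of zero-run transitions. For the full set of base/out states the equations would be solved as a linear system instead.

```python
from functools import lru_cache

# Toy chain (invented for illustration): each batter homers with
# probability H (one run, state unchanged) or makes an out.
H = 0.1
TRANSITIONS = {
    "0000": (("0000", H, 1), ("0001", 1 - H, 0)),
    "0001": (("0001", H, 1), ("0002", 1 - H, 0)),
    "0002": (("0002", H, 1), ("end", 1 - H, 0)),
}


@lru_cache(maxsize=None)
def z(n, state):
    """Probability that exactly n more runs score from `state` to frame end."""
    if state == "end":
        return 1.0 if n == 0 else 0.0  # assumed boundary condition
    total = 0.0
    for nxt, prob, runs in TRANSITIONS[state]:
        k = n - runs                   # runs still needed after this play
        if k >= 0:                     # the indicator is 1 only when r = n - k
            total += z(k, nxt) * prob
    return total


distribution = [z(n, "0000") for n in range(4)]
```

In this toy chain the number of runs in a frame is the number of home runs hit before the third out, so the recursion can be checked against the closed form C(n + 2, 2) H^n (1 - H)^3.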
2.3 The Basic Model
The basic model aims to determine, for a given game state, whether the bunt
strategy or the nonbunt strategy will lead to the higher expected run total by the end of
the frame. Choosing the strategy that leads to the higher expected run total may not always be the
best choice, however, as there may be situations where only one run is needed to win the game.
In these situations it would be best to pick the strategy that has the highest probability of scoring
at least one run, which may be a different strategy than the one that yields the higher expected
number of runs. Nonetheless, in the basic model the only goal is to find which strategy yields the
higher number of expected runs. There are three conditions/assumptions that must be made to
achieve this goal.
First, only plate appearances where the batter uses only the bunt strategy or only the
nonbunt strategy will be considered. Before every pitch the batter must decide whether to use the
bunt or nonbunt strategy as there is too little time to decide once the pitcher throws it. For most
plate appearances, the batter chooses only one of these strategies for every pitch he faces, but
there exist times when a batter uses both the bunt and nonbunt strategy in a plate appearance. To
avoid the complications of doing a pitch by pitch analysis, the basic model will discard plate
appearances where both the bunt and nonbunt strategy are used; this allows a simpler model
where only the outcome of the plate appearance need be noted. Later on in the investigation,
mixed strategies of bunt and nonbunt will be considered as well.
Second, only the effects occurring from the last pitch of a plate appearance (via either the
bunt or nonbunt strategy) will be considered in the basic model. The assumption here (perhaps
erroneously as will be seen later) is that changes in the game state that occur during the plate
appearance are independent of the strategy chosen by the batter. In reality, though, it might be
the case that changes in the game state during a plate appearance are less favourable (or more
favourable) for the offence by use of the bunt strategy rather than the nonbunt strategy. These
effects will be considered at a later time.
To illustrate what is meant in the last paragraph, consider the scenario where a player
comes to bat in base/out state $s_{1000}$ (recall this means that there is a runner on 1st base, no runner
on 2nd or 3rd and there are zero outs). In case one, the batter uses the bunt strategy for every pitch,
and after the plate appearance the game is in state $s_{0201}$. In case two, the batter uses the nonbunt
strategy for every pitch, but during the middle of the plate appearance, the runner steals second,
putting the game in state $s_{0200}$. On the last pitch of this plate appearance he hits the ball but gets
out, and the game state transitions from $s_{0200}$ to $s_{0031}$. In the basic model, the value of the bunt
strategy is that of changing the game state from $s_{1000}$ to $s_{0201}$. The value of the nonbunt strategy
is that of changing the game state from $s_{0200}$ to $s_{0031}$, even though the game state has changed
from $s_{1000}$ to $s_{0031}$ as a result of that one plate appearance. The assumption is that the runner
stealing second had nothing to do with the choice of strategy of bunt or nonbunt; the real value of
the bunt or nonbunt strategy was the change in game state that occurred during the last pitch of
the plate appearance.
The last condition imposed in the basic model is that it only considers bunts that occur in
the first five innings. The reason for this condition is that when empirical data is considered the
goal of the investigation is to see whether players are playing optimally. It might be the case in
later innings that scoring at least a run is a better strategy than gaining the highest expected
number of runs, and so a player might be playing optimally by not playing the suggested strategy
from the basic model.
The choice of the fifth inning as the cut-off is somewhat arbitrary, but there is plenty of
reasoning behind the decision. The strategy that should be chosen in a game is the one that
increases the odds of winning by the most; in other words, the strategy that yields the highest win
expectancy. The concept of win expectancy has been thoroughly analyzed and one study [5]
comes to the conclusion that early in a game in order to maximize win expectancy a team should
try to maximize run expectancy. They chose a cut-off of the sixth inning as a time when
maximizing run expectancy may not fully maximize win expectancy and when the strategy of
playing for at least one run starts to have some significant effect. Using their argument, and just
to be especially safe, we have set the cut-off at the fifth inning. This should act as a good cut-off so
that enough data can be collected for analysis, and so that it only includes innings where
maximizing run expectancy is the optimal strategy.
Overall, the basic model states that when a player goes to the plate in the first five innings
and can use only either a pure bunt strategy or a pure nonbunt strategy (and changes to the game
state during a plate appearance are assumed independent of the choice of strategy) then he should
choose the strategy that will maximize run expectancy. By maximizing run expectancy, the
assumption is that once the game state has changed as a result of the bunt or nonbunt, players
will play as they usually do in the new game state regardless of how they got there; the caveat is that
players may not usually play optimally, and if they did play optimally it could change which
strategy the first player wishes to use.
Mathematically, the basic model states that the player wishes to do the following:
$$\max\{ v(s, \text{bunt}),\ v(s, \text{nbunt}) \}
= \max\left\{ \sum_{s'} \left[ v(s') + r(s' \mid s) \right] \left[ p(s', \text{bunt} \mid s) \right],\
\sum_{s'} \left[ v(s') + r(s' \mid s) \right] \left[ p(s', \text{nbunt} \mid s) \right] \right\} \qquad \text{(Eq. 5)}$$
With regards to notation, $v(s, \text{bunt})$ is the value of game state $s$ when the pure bunt strategy is
used. Similarly, $p(s', \text{bunt} \mid s)$ is the probability of getting to game state $s'$ by a bunt given that
the starting game state was $s$. The nonbunt strategy is denoted nbunt. Furthermore, the union of
the events in the game that fall under bunt and the events in the game that fall under nbunt
is nbsr. This is because all events that lead to a change in the game state because of a change of
batter are a result of either a bunt or nonbunt; this is also why $r(s' \mid s)$ is the same regardless of
whether the bunt or nonbunt strategy is used.
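The comparison in Equation 5 can be sketched numerically. In the snippet below the values v(s') are the Markov run expectancies reported in Table 4, but the outcome probabilities for each strategy are invented purely for illustration, as are the variable and function names. The result therefore says nothing about which strategy is actually better from $s_{1000}$.

```python
# Markov run expectancies v(s') taken from Table 4 (2013 MLB season).
V = {"0201": 0.6391, "1200": 1.4075, "1001": 0.4929,
     "0200": 1.0890, "0002": 0.0930}

# Hypothetical outcome distributions from s_1000, as
# (next_state, probability, runs). These probabilities are NOT observed data.
BUNT = [("0201", 0.70, 0),    # successful sacrifice
        ("1200", 0.10, 0),    # everyone safe
        ("1001", 0.20, 0)]    # lead runner forced out
NBUNT = [("1200", 0.30, 0),   # batter reaches, runner advances one base
         ("1001", 0.55, 0),   # batter out, runner holds
         ("0002", 0.10, 0),   # double play
         ("0200", 0.05, 1)]   # double scores the runner


def strategy_value(outcomes, values):
    """Sum of [v(s') + r(s'|s)] * p(s', strategy|s) over next states s'."""
    return sum((values[nxt] + runs) * prob for nxt, prob, runs in outcomes)


v_bunt = strategy_value(BUNT, V)
v_nbunt = strategy_value(NBUNT, V)
best = "bunt" if v_bunt > v_nbunt else "nbunt"
```

With real data, the two lists would be replaced by the observed distributions of next states after pure-bunt and pure-nonbunt plate appearances.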
2.4 Player Ability and Park Factor Improvements to Basic Model
This section will incorporate player ability and park factor into the Basic Model. The same three
conditions that are imposed on the basic model will be imposed here as well; solutions to these
conditions will be considered afterwards. The main philosophy behind this section is that
comparing the bunt and nonbunt strategy cannot be done by considering all bunts and nonbunts
at once. For example, bunts are more common with weaker batters and against good pitchers.
This means the expected value of the bunt cannot be compared with the expected value of the
nonbunt directly, as there needs to be some way to account for the fact that bunts are more
common in situations where the nonbunt strategy is expected to yield a low value.
To incorporate player ability into the model, there needs to be a way of modelling the
expected outcome of a plate appearance by considering the specific players involved.
Fortunately, much research has been done on modelling pitcher-batter confrontations as a way of
predicting the outcome of a plate appearance from previous statistics. The main conclusion drawn
was that how a player (either a batter or a pitcher) usually fares against a player of a certain strength
is the best indicator of future success [4-5, 12]. The alternatives considered were looking at how
a particular batter fared against a particular pitcher, and seeing how a batter fared against a
family of pitchers who were deemed to have similar characteristics. One study showed that
considering how a player usually fares against players of certain strengths was consistently a
better indicator of future success than how that player fared against a certain pitcher or a certain
family of pitchers [5]. Two other studies confirm this [4, 12], though they note that there is some
evidence that some players consistently fare a little better or worse than would be
expected against some pitchers [12] or against some families of pitchers [4]. However, they both
concede that quantifying such relationships can be difficult, and that while more complex models
may do a better job there is some way to go in perfecting them [4, 12]. What all three studies did
agree on is that the effects of handedness are real [4-5, 12]. For example, if a right-handed batter
is facing a left-handed pitcher, it is much better to consider how a right-handed batter of that
strength usually does against a left-handed pitcher of that strength than to consider overall how a
batter of a certain strength does against a pitcher of a certain strength. Last, and rather
importantly, one study showed that "good pitching beats good hitting as much as good hitting
beats good pitching" [5]. That is to say that if a pitcher allows $x$ runs per inning against batters of
a certain handedness and if a batter (if he hits every plate appearance) scores $y$ runs per inning
against pitchers of that handedness, then we would expect the same result if the pitcher allowed $y$
runs per inning and the batter scored $x$ runs per inning.
One last factor that has been well documented (and accepted by sabermetricians) to have
an effect on the outcome of a pitcher-batter confrontation is that of park factor, which is a factor
to account for the fact that it is easier to score runs in some ballparks than others. This can be due
to many reasons including park dimensions, air density (the thin air in Denver leads to Coors
Field being a hitter-friendly park) and field conditions (especially turf versus grass). A simple
park factor, as calculated by ESPN, is given by the following formula [13]:
$$\text{Park Factor} = \frac{(\text{Runs Scored at Home} + \text{Runs Allowed at Home}) \,/\, \text{Number of Games at Home}}{(\text{Runs Scored Away} + \text{Runs Allowed Away}) \,/\, \text{Number of Games Away}}$$
Clearly, such a formula is lacking as it fails to take into account such factors as discrepancies in
player strengths in these games. A team of well-regarded sabermetricians has tried to account
for such disparities and has a much more complicated method of calculating park factor [14].
While, as they admit, there is no perfect method of calculating park factors, it is by this method
that park factors will be calculated for this investigation. Baseball-Reference.com calculates park
factors by this method and reports them on its website. For example, Coors Field in Colorado
had a park factor of 119 in 2013, which means that the same team would be expected to
score 1.19 times as many runs there as it would at an average (neutral park
factor) park [15].
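The simple park factor is easy to compute directly. The sketch below implements only the simple ratio given above, not the more complicated method adopted for this investigation; the function name and the season totals are hypothetical.

```python
def simple_park_factor(runs_scored_home, runs_allowed_home, home_games,
                       runs_scored_away, runs_allowed_away, away_games):
    """Ratio of total runs per game at home to total runs per game away."""
    home_rate = (runs_scored_home + runs_allowed_home) / home_games
    away_rate = (runs_scored_away + runs_allowed_away) / away_games
    return home_rate / away_rate


# Hypothetical season totals for one team (not real data).
pf = simple_park_factor(400, 380, 81, 350, 330, 81)
```

A value above 1 indicates that more runs per game were scored in the team's home games than in its away games; multiplying by 100 gives the scale on which a park factor such as 119 is reported.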
In the end, what these studies essentially show (when park factor is considered) is
that averaging a pitcher's ability against a certain handedness and a batter's ability against a
certain handedness gives a fairly accurate expected result for a pitcher-batter confrontation. Of
course there is no perfect model (nor a way to quantify errors) as players are human and sudden
unpredictable things can cause a sharp change in their abilities. For our purposes, this model will
suffice; moreover, there is no mainstream alternative at the current time. From these conclusions,
it will be possible to incorporate expected outcomes of pitcher-batter confrontations into the
Basic Model developed in Section 2.3.
The way this modelling will be of use is by categorizing all pitcher-batter confrontations
into four groups, and seeing how batters fared in certain situations by group when using the bunt
and nonbunt strategies. These groups will be called Group 1, Group 2, Group 3 and Group 4, where
Group 1 is the group of confrontations that are most beneficial to the batter and Group 4 is the
group of confrontations that are least beneficial to the batter. Splitting confrontations into groups is a
way of getting around the issues in modelling pitcher-batter confrontations: despite the small
issues, we can feel confident that we can properly partition these confrontations (except for small
errors around the boundaries) into four large groups. Another advantage of splitting them into four
groups is that it can give managers/players some rules by which to optimize their strategy, as a
manager/player probably has a decent idea in real time as to which of the four groups his current
matchup is in. If it is an all-star batter against a weak bullpen pitcher, they probably know it is a
matchup in Group 1, while if it is an all-star batter facing a Cy Young candidate pitcher, they
may realize the matchup is in Group 2 or Group 3 (depending on the players' relative strengths). The
group method is limited, and may not be the optimal method, but it is a start. For example, if
in a certain situation the bunt strategy is much better than the nonbunt strategy for Group 4,
it does not guarantee that it is better for every matchup in Group 4. Nevertheless, if batters in
Group 4 are almost always playing the nonbunt strategy, then they are likely not playing
optimally.
(The discussion of why four groups were chosen, and how they were chosen, is deferred until
the data have been explored.)
Once the four groups have been determined, the Basic Model just needs to be
applied to each of the four groups. The same three conditions apply from the Basic Model, which
are as follows: only plate appearances in the first 5 innings are considered, the player can only use a
pure bunt or nonbunt strategy, and changes in game state during a plate appearance are
independent of the choice of strategy by the batter. To play optimally under these conditions, a batter
should pick the strategy that maximizes run expectancy in the group that contains their current
confrontation. Mathematically, the improved basic model states that the batter should do the
following, where $g$ is the group number:
$$\max\{ v(s, \text{bunt}, g),\ v(s, \text{nbunt}, g) \}
= \max\left\{ \sum_{s'} \left[ v(s') + r(s' \mid s) \right] \left[ p(s', \text{bunt} \mid s, g) \right],\
\sum_{s'} \left[ v(s') + r(s' \mid s) \right] \left[ p(s', \text{nbunt} \mid s, g) \right] \right\} \qquad \text{(Eq. 6)}$$
3 Empirical Investigation
3.1 Basic Empirical Information
Now that the model has been presented, actual empirical evidence can be analyzed to see which
strategies should be played in certain situations. Table 1 shows how many plays start with a certain
base/out state (rare errors in the play-by-play data are possible) for all 2430 regular
season games in the 2013 MLB season (henceforth, the 2013 MLB season refers to all regular season
games excluding the tiebreaker game between the Texas Rangers and Tampa Bay Rays on
September 30, 2013, even though MLB counts that as a regular season game). Each play causes
the current base/out state to transition into another base/out state, and so a distribution of the
changes from one state to another can be created. Appendix 1 gives the distribution of transitions
from every specific base/out state.
Table 1: Number (Percentage) of Plays Starting in a Certain Base/Out State (2013 MLB Season)
Base State 0 outs 1 out 2 outs
000 45574 (23.83%) 32846 (17.17%) 26163 (13.68%)
100 10995 (5.75%) 13194 (6.90%) 13550 (7.09%)
020 3355 (1.75%) 5648 (2.95%) 7294 (3.81%)
003 478 (0.25%) 1837 (0.96%) 2950 (1.54%)
120 2590 (1.35%) 4638 (2.43%) 5802 (3.03%)
103 974 (0.51%) 2097 (1.10%) 2987 (1.56%)
023 604 (0.32%) 1525 (0.80%) 1877 (0.98%)
123 629 (0.33%) 1622 (0.85%) 2017 (1.05%)
The 2013 MLB regular season had 43780 full and partial frames played. A full frame is
one that is played to three outs, while a partial frame is one that ends prematurely. A frame can
end prematurely for numerous reasons, such as one team winning the game before the frame ends
or a game being called due to rain after the fifth inning while the winning team is at bat.
The number of runs scored in each frame (including partial ones) is given in Table 2.
Table 2: Frequency Table of Runs Scored for All Frames in the 2013 Season
Runs Count (%) Runs Count (%)
0 32303 (73.783%) 6 81 (0.185%)
1 6395 (14.607%) 7 38 (0.087%)
2 2898 (6.619%) 8 14 (0.032%)
3 1255 (2.867%) 9 4 (0.009%)
4 584 (1.334%) 10 1 (0.002%)
5 207 (0.473%) 11 1 (0.002%)
3.2 Empirical Run Expectancies vs Markov Run Expectancies
The empirical method for finding run expectancies is to find the average number of runs scored
from a certain base/out state to the end of the frame. The issue is that some games end with a
partial frame, and more runs could have been scored if the frame had been played to its third out. To
deal with this, the final frame of every game has been disregarded (a partial frame can only occur
during the last frame of a game). Alternatively, only partial frames could be disregarded, but this might skew
the results as partial frames tend to be ones where runs are scored; even disregarding all final
frames of games might skew the run expectancies, as will be seen shortly. Regardless, tabulating
the empirical run expectancies using data from the 2013 MLB season gives Table 3. The run
distributions are also given, which show the probability that a certain base/out state leads to a
specific number of runs.
Table 3: Empirical Scoring Distribution by Base/Out State for 2013 MLB Season (Final Frames Not Included)
Base State Outs RE 0 runs 1 run 2 runs 3 runs 4 runs 5+ runs
000 0 0.4684 73.68% 14.46% 6.74% 2.96% 1.35% 0.82%
000 1 0.2456 84.86% 9.15% 3.74% 1.43% 0.57% 0.25%
000 2 0.0931 93.64% 4.32% 1.39% 0.44% 0.16% 0.05%
100 0 0.8296 59.51% 16.93% 12.74% 6.00% 2.92% 1.91%
100 1 0.5039 73.40% 11.86% 8.97% 3.62% 1.40% 0.75%
100 2 0.2155 87.64% 5.92% 4.43% 1.43% 0.44% 0.14%
020 0 1.1062 39.44% 33.70% 13.63% 7.58% 3.04% 2.61%
020 1 0.6294 61.69% 23.39% 8.64% 4.04% 1.43% 0.80%
020 2 0.3107 78.84% 14.54% 4.29% 1.56% 0.59% 0.18%
003 0 1.3156 18.76% 55.01% 11.73% 8.53% 3.62% 2.35%
003 1 0.9176 34.83% 48.07% 10.43% 4.54% 1.57% 0.56%
003 2 0.3491 74.52% 19.28% 3.89% 1.65% 0.46% 0.21%
120 0 1.3989 39.65% 22.11% 15.96% 11.50% 6.24% 4.54%
120 1 0.8558 59.15% 16.98% 10.85% 8.15% 3.08% 1.79%
120 2 0.4080 78.24% 10.59% 5.59% 3.84% 1.29% 0.45%
103 0 1.8167 14.41% 41.74% 16.95% 13.56% 6.67% 6.67%
103 1 1.1157 36.91% 36.51% 12.47% 8.64% 3.83% 1.64%
103 2 0.4860 72.19% 15.97% 5.56% 4.30% 1.50% 0.49%
023 0 2.0086 12.33% 26.03% 33.73% 14.55% 7.19% 6.16%
023 1 1.3885 31.37% 28.14% 23.88% 9.40% 3.50% 3.71%
023 2 0.5569 74.73% 5.16% 13.65% 3.80% 1.76% 0.91%
123 0 2.1926 15.91% 26.97% 22.61% 13.07% 10.39% 11.06%
123 1 1.5764 33.61% 25.00% 16.32% 10.43% 8.55% 6.09%
123 2 0.7261 69.05% 8.08% 11.84% 5.30% 4.22% 1.49%
The other way to find run expectancies is by a Markov process. This involves only
observing the probabilities of going from one game state to another to calculate run expectancies
(the method is outlined in full in Section 2.2). Using the data from the 2013 MLB season gives
Table 4.
Table 4: Markov Chain Scoring Distribution by Base/Out State for 2013 MLB Season
Base State Outs RE 0 runs 1 run 2 runs 3+ runs
000 0 0.4661 73.66% 14.69% 6.61% 5.04%
000 1 0.2459 84.75% 9.41% 3.65% 2.19%
000 2 0.0930 93.54% 4.49% 1.38% 0.59%
100 0 0.8251 59.47% 17.45% 12.38% 10.70%
100 1 0.4929 73.91% 12.03% 8.48% 5.58%
100 2 0.2138 87.58% 6.15% 4.45% 1.82%
020 0 1.0890 39.16% 34.40% 14.13% 12.31%
020 1 0.6391 61.07% 23.88% 8.91% 6.14%
020 2 0.3017 79.08% 14.73% 4.08% 2.11%
003 0 1.3265 17.39% 55.14% 14.64% 12.83%
003 1 0.9294 34.91% 48.36% 9.89% 6.84%
003 2 0.3492 74.65% 19.01% 4.11% 2.23%
120 0 1.4075 39.19% 22.46% 15.64% 22.71%
120 1 0.8843 58.88% 16.69% 10.72% 13.71%
120 2 0.4164 77.94% 10.84% 5.47% 5.75%
103 0 1.7157 15.87% 41.71% 18.06% 24.36%
103 1 1.1472 36.39% 36.95% 12.30% 14.36%
103 2 0.4941 72.17% 15.35% 6.26% 6.22%
023 0 1.9665 13.45% 27.20% 32.11% 27.24%
023 1 1.3565 32.77% 27.82% 22.89% 16.52%
023 2 0.5453 74.75% 5.25% 13.95% 6.05%
123 0 2.1825 15.87% 26.72% 21.23% 36.18%
123 1 1.5586 33.67% 25.32% 16.39% 24.62%
123 2 0.7347 68.75% 8.59% 11.44% 11.22%
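The run expectancies in Tables 3 and 4 can be compared state by state. The following sketch copies the RE columns of the two tables and finds the base/out state with the largest gap; the variable names are illustrative and the comparison is an informal check, not part of the model.

```python
# RE by base state for 0, 1 and 2 outs: Table 3 (empirical), Table 4 (Markov).
EMPIRICAL = {
    "000": (0.4684, 0.2456, 0.0931),
    "100": (0.8296, 0.5039, 0.2155),
    "020": (1.1062, 0.6294, 0.3107),
    "003": (1.3156, 0.9176, 0.3491),
    "120": (1.3989, 0.8558, 0.4080),
    "103": (1.8167, 1.1157, 0.4860),
    "023": (2.0086, 1.3885, 0.5569),
    "123": (2.1926, 1.5764, 0.7261),
}
MARKOV = {
    "000": (0.4661, 0.2459, 0.0930),
    "100": (0.8251, 0.4929, 0.2138),
    "020": (1.0890, 0.6391, 0.3017),
    "003": (1.3265, 0.9294, 0.3492),
    "120": (1.4075, 0.8843, 0.4164),
    "103": (1.7157, 1.1472, 0.4941),
    "023": (1.9665, 1.3565, 0.5453),
    "123": (2.1825, 1.5586, 0.7347),
}

# Empirical minus Markov run expectancy for each (base state, outs) pair.
diffs = {
    (base, outs): EMPIRICAL[base][outs] - MARKOV[base][outs]
    for base in EMPIRICAL
    for outs in range(3)
}
worst = max(diffs, key=lambda key: abs(diffs[key]))
mean_abs_diff = sum(abs(d) for d in diffs.values()) / len(diffs)
```

The largest gap, about 0.10 runs, occurs with runners on 1st and 3rd and zero outs, a state that starts only 974 plays in Table 1.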
The final issue to consider is which set of run expectancies should be used in the analysis:
the empirical run expectancies or those found through Markov processes. There are issues with
both. First baseball may not be a perfect Markov process, which means that a game state could
have a different value depending on the previous game state. For example game state π β² could
have a higher value if the previous game state was π instead of π . Second, the empirical run
expectancies do not include data for the final half innings of games and these frames could have
a different run distribution (perhaps games are more likely to end in frames where no runs are
scored). Last, some game states are rare in baseball so even with huge data sets differences could
persist between the Markov run expectancies and the empirical run expectancies even if baseball
is a perfect Markov game.
For this investigation, whenever run expectancies for the entire game are needed, we
will use the Markov run expectancies. This is because once the final frames of games are
incorporated into the empirical method, the Markov run expectancies and the
empirical run expectancies are nearly identical. Consider base/out state $S_{0000}$ (bases
empty, zero outs). When the final frames of games are not included, $S_{0000}$ occurs 43072
times, and 20177 runs are scored from those occurrences to the end of their frames; this is a
run expectancy of 0.4684. If the final frames are included, $S_{0000}$ occurs 45574 times and
leads to 21051 runs, for a run expectancy of 0.4619. The second figure is lower because a final
frame can end in a game state from which more runs would have been possible had the frame
been played to completion. Using the empirical run expectancies (excluding final frames), it
was found that the 2502 occurrences of $S_{0000}$ in final frames led to game states whose run
expectancies sum to another 163.09 runs. Adding this to the 21051 runs scored gives an overall
empirical run expectancy of 0.4655 for state $S_{0000}$ over all frames. This is just 0.0006 (or
0.13%) below the Markov run expectancy of 0.4661 in Table 4, which is a very small difference.
Rather than repeating this calculation for each game state (and because the calculation itself
rests on some minor assumptions), the Markov run expectancies will be used, since they are
essentially equivalent to the empirical run expectancies (and there are issues with both methods).
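The arithmetic in the preceding paragraph can be reproduced in a few lines. The counts and the 163.09-run adjustment are the figures quoted in the text, and 0.4661 is the Markov run expectancy for base state 000 with zero outs in Table 4; the variable names are chosen here only for illustration.

```python
# Figures quoted in the text for the bases-empty, zero-out state (2013 season).
occurrences_excl_final = 43072   # occurrences, final frames excluded
runs_excl_final = 20177          # runs scored after them, final frames excluded
occurrences_incl_final = 45574   # occurrences, final frames included
runs_incl_final = 21051          # runs scored after them, final frames included
unplayed_runs = 163.09           # run expectancy left over when final frames ended
markov_re = 0.4661               # Table 4, base state 000, zero outs

re_excl = runs_excl_final / occurrences_excl_final
re_incl = runs_incl_final / occurrences_incl_final
re_adjusted = (runs_incl_final + unplayed_runs) / occurrences_incl_final

final_frame_occurrences = occurrences_incl_final - occurrences_excl_final
gap = markov_re - re_adjusted
relative_gap = gap / markov_re

print(f"excluding final frames: {re_excl:.4f}")      # 0.4684
print(f"including final frames: {re_incl:.4f}")      # 0.4619
print(f"adjusted, all frames:   {re_adjusted:.4f}")  # 0.4655
print(f"final-frame occurrences: {final_frame_occurrences}")  # 2502
print(f"gap to Markov value: {gap:.4f} ({relative_gap:.2%})")  # 0.0006 (0.13%)
```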
3.3 The Basic Model Applied to Empirical Data
The Basic Model will now be used to evaluate empirical data from the 2013 MLB season. As a
reminder, here are the three major conditions/assumptions of the Basic Model.
Only plate appearances in the first 5 innings are considered
The player can only use a pure bunt or nonbunt strategy
Changes in game state during a plate appearance are independent of the choice of strategy by the batter
As a result of the first condition, the run expectancy tables derived in Section 3.2 are not
suitable, as they give run expectancies over the entire game; the first five innings have a
different run expectancy distribution. In addition, the issue of partial frames does not arise when only
the first five innings are considered, so the run expectancies used for the first five innings in this
investigation will be the empirical run expectancies. The reason is that a game must last at least five
innings to be official, and no team can win via a walk-off in the first five innings (there is a very
rare case of a game being called during the bottom half of the fifth inning if the team in the lead is
at bat and the weather suddenly becomes extreme; in these cases, previous game states in the
frame will be valued using the final game state in the frame). Table 5 shows the empirical run
expectancies for the first five innings of the game.
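One way such first-five-innings empirical run expectancies could be tallied from play-by-play data is sketched below: for every occurrence of a base/out state, count the runs scored from that plate appearance to the end of the frame, then average by state. The record layout (a list of `(inning, plays)` frames, each play a `(state, runs_on_play)` pair), the function name `empirical_run_expectancy`, and the three toy frames are assumptions made for illustration, not the data or code used in this investigation; the rare called-game case mentioned above is not handled.

```python
from collections import defaultdict


def empirical_run_expectancy(frames, max_inning=5):
    """Average runs scored from each base/out state to the end of its frame.

    frames: iterable of (inning, plays); plays is a list of
    (state, runs_on_play), where state describes the bases and outs
    before the plate appearance.
    """
    total_runs = defaultdict(float)
    occurrences = defaultdict(int)
    for inning, plays in frames:
        if inning > max_inning:
            continue  # keep only frames from the innings of interest
        runs_remaining = sum(runs for _, runs in plays)
        for state, runs_on_play in plays:
            # Runs from this state onward include the runs on this play.
            total_runs[state] += runs_remaining
            occurrences[state] += 1
            runs_remaining -= runs_on_play
    return {state: total_runs[state] / occurrences[state] for state in occurrences}


# Toy frames (made up): states are (base state, outs) as in Table 4.
toy_frames = [
    (1, [(("000", 0), 0), (("000", 1), 1), (("000", 1), 0), (("000", 2), 0)]),
    (3, [(("000", 0), 0), (("100", 0), 0), (("020", 1), 1), (("100", 1), 0)]),
    (7, [(("000", 0), 1), (("000", 0), 0), (("000", 1), 0), (("000", 2), 0)]),
]
first_five = empirical_run_expectancy(toy_frames)
whole_game = empirical_run_expectancy(toy_frames, max_inning=9)
print(first_five[("000", 0)], whole_game[("000", 0)])
```

In the toy data the seventh-inning frame is dropped by the default filter, so the bases-empty, zero-out state averages 1.0 runs over the first five innings but 0.75 when all three frames are included.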
TABLE 5 WILL GO HERE
TO BE COMPLETED
Working Bibliography
[1] http://www.amazon.ca/The-Baseball-Encyclopedia-Complete-Definitive/dp/0028608151
[2] http://www.amazon.ca/The-Bill-James-Handbook-2014/dp/0879465131
[3] http://www.amazon.ca/Baseball-Prospectus-2014/dp/1118459237
[4] http://www.hardballtimes.com/tht-live/its-the-hardball-times-annual-2014/
[5] http://www.amazon.ca/The-Book-Playing-Percentages-Baseball/dp/1597971294
[6] http://bleacherreport.com/articles/1639658-explaining-why-the-bunt-is-foolish-in-todays-mlb
[7] http://www.lookoutlanding.com/2013/8/5/4589844/the-evolution-of-the-sacrifice-bunt-part-1
[8] http://espn.go.com/blog/sweetspot/post/_/id/33556/the-sacrifice-bunt-isnt-dead-yet
[9] http://m.njit.edu/~bukiet/Papers/ball.pdf
[10] https://www.edsolio.com/media/2/265/files/Tesar_FinalDraft.pdf
[11] https://www.stat.berkeley.edu/~aldous/157/Papers/albert_streaky.pdf
[12] A Player Based Approach to Baseball Simulation (PHD THESIS) - A.P. Sugano
[13] http://espn.go.com/mlb/stats/parkfactor
[14] http://www.amazon.ca/Total-Baseball-Official-Encyclopedia-League/dp/1930844018
[15] http://www.baseball-reference.com/teams/COL/attend.shtml