Preemptive Depot Returns
for a Dynamic Same-Day Delivery Problem
Marlin W. Ulmer Barrett W. Thomas Dirk C. Mattfeld
Abstract
In this paper, we explore same-day delivery routing and particularly how same-day delivery
vehicles can better integrate dynamic requests into delivery routes by taking advantage of
preemptive depot returns. A preemptive depot return occurs when a delivery vehicle returns to
the depot before delivering all of the packages currently on-board the vehicle. In this paper,
we assume that a single, uncapacitated vehicle serves requests in a particular delivery area.
Beginning the day with some known deliveries, the vehicle seeks to serve the known requests
as well as additional new requests that are received throughout the day. To serve the new
requests, the vehicle must return to the depot to pick up the packages for delivery. In contrast to
previous work on same-day delivery routing, in this paper, we allow the vehicle to return to the
depot before serving all loaded packages. To solve the problem, we couple an approximation
of the value of choosing any particular subset of requests for delivery with a routing heuristic.
Our approximation procedure is based on approximate dynamic programming and allows us
to capture both the current value of a subset selection decision and its impact on future
rewards. Using extensive computational tests, we demonstrate the value both of preemptive
depot returns and of the proposed approximation scheme in supporting preemptive
returns. We also identify characteristics of instances for which preemptive depot returns are
most likely to offer improvement.
Keywords: stochastic dynamic vehicle routing, same-day delivery, preemptive depot returns,
approximate dynamic programming
1 Introduction
In 2014, e-commerce continued its rapid growth, with sales increasing 20% and bringing the market
total to nearly $840 billion. Albeit at a slower pace, growth is expected to continue, with sales forecasted
to be over $1.5 trillion by the end of 2018 (Ben-Shabat et al. 2015). With Amazon continuing to
expand its same-day delivery services (Fedde 2016) and companies such as BestBuy and Macy’s
adding such services in the last year (Addady 2015, Kumar 2016), the same-day delivery segment
is expected to outpace general e-commerce growth with an annual growth rate of 40% (Yahoo!
Finance 2016). As companies continue to seek to take advantage of this growth and enter the
same-day delivery market, competition will force them to deliver same-day packages as
efficiently as possible.
In this paper, we explore same-day delivery routing and particularly how same-day delivery
vehicles can better integrate dynamic requests into delivery routes by taking advantage of a pre-
emptive depot return. A preemptive depot return occurs when a delivery vehicle returns to the
depot before delivering all of the packages currently on-board the vehicle. In this paper, we as-
sume that a single, uncapacitated vehicle serves requests in a particular delivery area. This vehicle
can be viewed as part of a fleet, perhaps serving a particular service area, but operating indepen-
dently of the other vehicles in the fleet. The vehicle begins the day with a set of known requests
and additional new requests are received throughout the day. These known requests represent or-
ders placed before the start of a day’s deliveries. To serve the new requests, the vehicle must return
to the depot to pick up the packages for delivery. In contrast to previous work on same-day de-
livery routing, in this paper, we allow the vehicle to return to the depot before serving all loaded
packages. This preemption of a route allows the vehicle to take advantage of the possibility of
efficiently integrating new service requests that are located close to the existing route. The vehicle
seeks to serve the known requests as well as many as possible of the same-day delivery requests
that occur over the problem’s time horizon. We assume that the vehicle can decide which subset of
dynamic requests to serve and that all remaining requests are served by an alternative, but more ex-
pensive means. Generally, the problem can be defined as a stochastic dynamic one-to-many pickup
and delivery problem (SDPD).
To facilitate same-day delivery routing with preemptive depot returns, we introduce an approach
that we call anticipatory preemptive depot return (APDR). In APDR, we couple an ap-
proximation of the value of choosing any particular subset of requests for delivery with a routing
heuristic. Our approximation procedure is based on approximate dynamic programming (ADP)
and allows us to capture both the current value of a subset selection decision and its impact on
future rewards. Our approximation procedure relies on offline simulation and a reduction of the
state space via aggregation. Our proposed aggregation scheme explicitly captures planned depot
returns. While the aggregation scheme allows us to overcome the problem's large state space, we
must also take steps to overcome the problem's large decision space, which is the result of the need
not only to select subsets of customers for service but also to route them. We overcome this challenge by
introducing a routing heuristic that accounts for preemptive depot returns.
This paper makes a number of contributions to the literature on same-day delivery routing.
First, this paper is the first to introduce a method for preemptive depot returns for same-day de-
livery routing. Further, the proposed APDR approach makes use of offline simulation to generate
approximations and thus allows instant online decision making. Existing approaches rely on online
sampling or rollout procedures. In extensive computational studies, we demonstrate the value both
of preemptive depot returns and of the proposed aggregation scheme in supporting
preemptive returns. We also show that the number of customers at the start of the day has a strong
influence on the value of preemptive depot returns as does the combination of depot location and
the customer distribution.
The remainder of this paper is structured as follows. In §2, we present the related literature. In
§3, we present a formal problem description and a Markov decision process model. The solution
approach APDR and the benchmark heuristics are defined in §4. We describe our experimental
design in §5 and the results of our computational experiments in §6. The paper concludes with a
summary of the results and directions for future research in §7.
2 Literature Review
The SDPD is dynamic and stochastic following the definition given in Kall and Wallace (1994).
The problem is dynamic because the problem information changes over the course of the day and
because the dispatcher can adapt plans and decisions in response to newly learned information.
The problem is stochastic because the customer realizations that appear over the course of the
day follow a known spatial and temporal probability distribution. In this literature review, we first
survey work on same-day delivery and then related work in the areas of dynamic pickup and
delivery, vehicle routing with stochastic demands, vehicle routing with stochastic requests, and
grocery delivery problems.
The most closely related work to that in this paper is the small set of literature concerning
same-day delivery. The literature on same-day delivery is primarily contained in three papers:
Voccia et al. (2015), Azi et al. (2012), and Klapp et al. (2015). There are two key differences
between the three papers and this paper. First, none of the three papers allows preemptive vehicle
returns. As we show in Section 6, allowing preemptive returns in a same-day delivery problem
has the potential to improve the number of customers who can be served on a given day. Second,
the bulk of the computation time for our solution approach comes from offline simulation that
takes place in advance of the problem horizon. The result is that, at runtime, decision making is
almost instantaneous. The methods presented in the three papers cited above all rely on online
solution methods. Thus, at runtime, the three cited papers require computation to make a decision,
potentially impacting the ability to make decisions in real time.
Voccia et al. (2015) present a problem setting similar to that in this paper, in which
requests arrive over the course of the day and must be served on that day through multiple returns to
the depot. Each request must be served within a time window or by a deadline. Voccia et al. (2015)
introduce a sample-scenario planning approach to solve the problem. Rather than operating on
an estimation of the value function as we do in this paper, sample-scenario planning generates a
sample of future requests, combines them with existing requests, and produces solutions for each
set. Then, the method chooses the solution that is most similar to the others. Unlike the work in
this paper, the solution method in Voccia et al. (2015) does not explicitly account for preemptive
depot returns. Rather, vehicles must deliver all loaded packages before returning to the depot.
Such a strategy is known as plan-at-home (PAH). However, while Voccia et al. (2015) does not
allow preemptive depot returns, route selection in the sample-scenario method is designed such
that shorter routes and thus many depot returns are generated. We note that our H3 benchmark
is an ADP analog to the method in Voccia et al. (2015) in that it too can choose shorter or longer
routes depending on the approximated future value of a particular route length. Thus, while not
implementing it directly, we do use a benchmark that is similar to the approach presented in Voccia
et al. (2015) with the advantage that the calculation is conducted offline.
Also related to this paper is the work of Azi et al. (2012). Similar to this paper and Voccia et al.
(2015), Azi et al. (2012) study a problem in which requests are received throughout the day and are
served through multiple trips to a depot. Similar to Voccia et al. (2015), Azi et al. (2012) develop
a sample-scenario planning approach, and like Voccia et al. (2015), do not explicitly consider
preemptive returns to the depot. Instead, Azi et al. (2012) constrain the length of the tours on the
vehicles leaving the depot. In this way, Azi et al. (2012) implicitly recognize the value of depot
returns, but in contrast to this paper and Voccia et al. (2015), the tour-length limitations proposed
in Azi et al. (2012) are fixed throughout the horizon and do not adapt to changing state information.
Klapp et al. (2015) also explore the same-day delivery problem, but assume that all customers
are on a line. Thus, the routing and subset selection problem are integrated in a single decision,
importantly eliminating the expensive step of evaluating the cost of a chosen subset of customers,
a step that is necessary when considering a general network as we do in this problem. Further,
the formulation does not allow for preemptive depot returns. Given their problem setting, Klapp
et al. (2015) demonstrate how to efficiently find optimal a priori routes and then introduce an
approximate dynamic programming approach known as rollout (see Goodson et al. (2016b) for an
overview of rollout algorithms) to leverage the a priori routes in a dynamic setting.
The SDPD can also be seen as a special case of a dynamic pickup and delivery problem.
In stochastic and dynamic many-to-many pickup and delivery problems (DPDPs), usually both
origin and destination of a request (order) are stochastic. Thus, the main difference between the
SDPD and DPDPs is that, in the SDPD, the pickup location is known beforehand. An overview
on dynamic pickup and delivery problems is given by Berbeglia et al. (2010). Work considering
stochastic requests is conducted, e.g., by Saez et al. (2008), Pureza and Laporte (2008), Mes et al.
(2010). From a methodological perspective, the single pickup location in the SDPD allows us
to propose a unique state-space aggregation scheme that allows for improved estimation of future
costs. Further, we propose a solution method that relies on offline computation.
The concept of preemptive depot returns is mainly found in the literature on vehicle routing
with stochastic demands (VRPSD). For these problems, the customers are known but the amount
of demand at each customer is unknown prior to arrival. Thus, the vehicles may need to return to
the depot to replenish capacity. The most recent work on preemptive depot returns for the VRPSD
can be found in Goodson et al. (2016a). Goodson et al. (2016a) provide a comprehensive review
of literature related to preemptive depot returns for the VRPSD and present a rollout approach that
embeds a dynamic-programming approach to find optimal preemptive returns for fixed sequences
of customers. Other work on preemptive depot returns in the VRPSD literature can be found in
Bertsimas et al. (1996), Yang et al. (2000), and Secomandi (2003).
The work in this paper differs from the VRPSD literature in a number of ways. First, in the
SDPD, each customer requires a unique good, and as a result, the vehicle must return to the depot to
serve any customer who is not loaded on the vehicle at the start of the day. Further, we must make a
subset selection decision on every return to the depot, thus combining subset selection with routing
decisions when at the depot. In some of the VRPSD literature, only a subset of customers is served,
but the subset selection and routing do not need to be done simultaneously at the depot. Finally,
we note that it is possible that a variant of the solution approach and particularly the aggregation
scheme used in this paper could be adapted to the VRPSD.
The SDPD can also be seen as a generalization of the dynamic vehicle routing problem with
stochastic customer requests (VRPSR). For the VRPSR, only a few customers are known at the
beginning of the horizon. Additional customers request service throughout the day, and the vehicle
seeks to visit the requesting customer locations. However, time limits mean that not all of the
requests can be served, and decisions are made about which subsets of requests to accept and the
assignment and routing of the requests. For the VRPSR, Ulmer et al. (2016a) present an approxi-
mate dynamic programming approach that uses aggregated states and approximate value iteration
to develop a lookup table. They call their aggregation scheme the anticipatory time budgeting ap-
proach (ATB). We use a variant of the ATB approach as one of the benchmarks in this paper. In
this paper, we also extend the aggregation to include problem information specific to the SDPD
and demonstrate the value that this additional information brings to the approximation of the value
function. Ulmer et al. (2016a) demonstrate superior solution quality for the VRPSR compared to
state-of-the-art approaches in Ghiani et al. (2011) and Meisel (2011). Ulmer et al. (2015) combine
ATB with a rollout approach to generate solutions that improve on those of Ulmer et al. (2016a).
The combination of the ATB variant proposed in this paper and rollout is an opportunity for future
work. Currently, we have not identified a way to generate heuristic policies without substantial
computation, thus limiting the ability to develop a rollout approach that can be executed in real
time.
Older VRPSR literature relied on waiting strategies, strategies that have the vehicle wait in
particular locations in anticipation of future requests. Mitrovic-Minic and Laporte (2004) provide
a nice overview of the strategies. Of the strategies, only the wait-at-start heuristic (WAS) presented
by Mitrovic-Minic and Laporte (2004) can be adapted to the problem in this paper. For WAS, the
vehicle idles at the depot as long as possible. It is possible to adapt WAS because all assignments
take place when the vehicle is located at the depot. For the SDPD, the application of WAS does
not require any depot returns after the vehicle has left the depot. However, preliminary tests reveal
WAS to perform poorly for the SDPD.
Also related to same-day delivery is the grocery-delivery problem. While the deliveries for
grocery delivery usually take place the day after orders are made, at the time an order is placed, the
decision maker must determine whether or not the order can be feasibly served on the next day’s
routes given the existing requests. In that case, the evaluation of each request is related to the need
in this problem to evaluate the routing cost of a subset of customers. Recent work can be found in
Ehmke et al. (2015) and Ehmke and Campbell (2014).
3 Problem Description and Model
In this section, we present the SDPD and model it as a Markov Decision Process (MDP).
3.1 Problem Description
We assume that a single vehicle delivers customer orders in a service area A. The vehicle starts the
tour at a depot D, travels with constant speed ν, and returns before the time limit tmax. The travel
time between two customers C1, C2 ∈ A is given by a travel time function d(C1, C2) ∈ N+. The service
time at a customer is ζc. In the beginning, a set of initial orders (IOs) C0 ⊂ A is known and must
be served. We assume that these IOs are loaded on the vehicle at the beginning of the day. During
the day, stochastic orders (SOs) C+ occur in A. When the vehicle is located at a customer or at
the depot, the dispatcher determines which customer to visit next or whether to return to the depot.
When at the depot, the vehicle can be loaded with packages destined for realized and assigned
Table 1: MDP: Customer Set Notation
Notation Description
C0 Set of initial orders
C+ Set of stochastic orders
Cl(k) Set of loaded orders in Sk
Cι(k) Set of not loaded but preliminarily assigned orders in Sk
Cε(k) Set of not loaded and preliminarily excluded orders in Sk
Cr(k) ⊂ Cε(k) Set of new requests in Sk
SOs. The loading time at the depot ζd is independent of the number of loaded orders. We assume
that the IOs are loaded before the start of the horizon. In determining which SOs should be served,
the dispatcher ensures the existence of a feasible tour serving all assigned IOs, previously loaded
SOs, and newly loaded SOs. We assume that once loaded on the vehicle, the packages destined for
SOs must be delivered. The objective is to maximize the expected number of SOs served over the
horizon of the problem.
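For concreteness, the ingredients of this problem description can be sketched as a small data container. This is only an illustration; all names and numeric values below are hypothetical and not part of the formal model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Instance:
    """Hypothetical container for the SDPD problem data of Section 3.1."""
    t_max: int          # time limit by which the vehicle must return to the depot
    speed: float        # constant travel speed (nu)
    zeta_c: int         # service time at a customer
    zeta_d: int         # loading time at the depot (independent of load size)
    depot: Tuple[int, int]                       # depot location D
    initial_orders: FrozenSet[Tuple[int, int]]   # IOs C0, loaded before the horizon starts

# Illustrative instance; all values are made up.
inst = Instance(t_max=480, speed=1.0, zeta_c=5, zeta_d=10,
                depot=(0, 0), initial_orders=frozenset({(3, 4), (6, 8)}))
```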
3.2 Markov Decision Process Model
In this section, we model the SDPD as a Markov Decision Process (MDP). The MDP for the
SDPD can be formulated such that assignment decisions are made only when the vehicle is located
at the depot, that is, when orders are loaded. For algorithmic purposes, however, we consider
preliminary assignments of customers at every decision point. These preliminary assignments
are not binding and can be changed at future decisions, provided that the assigned customers
have not yet been loaded on the vehicle. To facilitate these preliminary assignments, we present
an alternative MDP-formulation that integrates a planned tour θk into the state variable Sk, the
decision variable x, and the reward function R(Sk, x). Because the planned tour is used only
for the purposes of a heuristic solution method, it does not alter the validity of the model and could
be omitted.
A decision point or decision epoch k occurs when the vehicle visits a customer or the depot.
The decision state Sk ∈ S is defined as follows. At a minimum, the state must contain all of the
data necessary for determining feasible decisions and the cost of those decisions, as well as for defining
the transitions to future states. For the SDPD, we can think of the state in terms of resources and
information. The resources associated with the SDPD are the available time t(k) and the location
of the vehicle Pk. The information in the SDPD is the information about service requests and the
statuses of those requests. For this purpose, we have three sets per state, summarized in Table 1.
We let Cl(k) ⊂ C0 ∪ C+ be the set of customers whose packages have been loaded on the vehicle
for delivery. We call these loaded customers. We denote the set of assigned, but not loaded SOs
as Cι(k) ⊂ C+ and the set of excluded SOs, Cε(k). The set Cε(k) denotes the currently excluded
SOs containing Cr(k) ⊂ Cε(k), the SOs realized between t(k − 1) and t(k) for k > 0. As noted
previously, we also include in the state the planned feasible tour θk = (Pk, . . . ,D). This planned
tour θk = (C1, . . . , Cn,D, Cn+1, . . . , Cn+m,D) defines the planned sequence of customer and
depot returns and may contain preliminarily assigned SOs. These preliminary SOs are SOs that
are part of the tour for planning purposes but are not yet loaded to the vehicle. These preliminary
SOs may in fact not be served by the vehicle if later decisions change their assignment status. Thus,
the state at decision point k is given by the tuple Sk = (t(k), θk,Pk, Cl(k), Cι(k), Cε(k)).
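A minimal sketch of this state tuple, with hypothetical Python names for the components (the model itself does not prescribe any data layout):

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Customer = Tuple[int, int]  # a customer identified by coordinates (illustrative)

@dataclass(frozen=True)
class State:
    """Pre-decision state S_k = (t(k), theta_k, P_k, C_l(k), C_iota(k), C_eps(k))."""
    t: int                       # point of time t(k)
    tour: Tuple[Customer, ...]   # planned tour theta_k, starting at P_k and ending at the depot
    position: Customer           # vehicle position P_k
    loaded: FrozenSet[Customer]    # C_l(k): packages on board
    assigned: FrozenSet[Customer]  # C_iota(k): assigned but not yet loaded SOs
    excluded: FrozenSet[Customer]  # C_eps(k): currently excluded SOs

# Illustrative state at t = 120 with one loaded customer remaining.
s = State(t=120, tour=((1, 1), (0, 0)), position=(2, 2),
          loaded=frozenset({(1, 1)}), assigned=frozenset(), excluded=frozenset())
```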
Decisions x ∈ X(Sk) are made about the subsets C(ι,x)(k) ⊂ Cε(k) ∪ Cι(k) to preliminarily
assign, the resulting subset C(ε,x)(k) to preliminarily exclude, and the corresponding update of θk to
θxk. The update of the planned tour determines the next location to visit. This location Cnext ∈
{D} ∪ Cl(k) ∪ {Pk} can be chosen from the set of loaded customers, the depot, or the current
location, a choice which implies that the vehicle idles at the current location for a length of time t.
If the vehicle is located at the depot, Pk = D, the selected assigned SOs C(ι,x)(k) at epoch k are
loaded onto the vehicle. Decision x is feasible if θxk starts in Pk, serves all customers C(l,x)(k) ∪ C(ι,x)(k),
and returns to the depot within the time limit. Formally, feasibility is given by
d(θxk) ≤ tmax − t(k).
Each decision results in an immediate reward. This reward is equal to the number of newly
assigned SOs minus the number of formerly included SOs that are now excluded. Formally, the reward R(Sk, x)
is defined as
R(Sk, x) = |C(l,x)(k)|+ |C(ι,x)(k)| − |Cl(k)| − |Cι(k)|.
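A small illustration of this reward, computed as the change in the number of loaded plus assigned orders; the function name and customer identifiers are hypothetical:

```python
def reward(loaded_before, assigned_before, loaded_after, assigned_after):
    """R(S_k, x): |C_(l,x)(k)| + |C_(iota,x)(k)| - |C_l(k)| - |C_iota(k)|,
    i.e., the net change in committed orders caused by decision x."""
    return (len(loaded_after) + len(assigned_after)
            - len(loaded_before) - len(assigned_before))

# Mirrors the example of Section 3.3: two SOs are newly assigned and one
# formerly assigned (not yet loaded) SO is excluded, so R = 2 - 1 = 1.
r = reward(loaded_before={"A"}, assigned_before={"B"},
           loaded_after={"A"}, assigned_after={"C", "D"})
```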
The execution of x results in a post-decision state Sxk ∈ Sx. The post decision-state represents
the deterministic transition from Sk that results from the execution of x. Notably, the post-decision
state captures the updated route plan and the updated sets of loaded, assigned, and excluded cus-
tomers resulting from x: C(l,x)(k), C(ι,x)(k), C(ε,x)(k). If x calls for the loading of orders, the corre-
sponding sets of loaded and not loaded customers are modified as follows. The customers corresponding
to the loaded goods are transferred from Cι(k) to C(l,x)(k) and C(ι,x)(k) = ∅. The transition also
updates the vehicle location to the next visit location resulting from x. Formally, given Sk and x,
the post-decision state is
Sxk = (t(k), θxk , Cnext, C(l,x)(k), C(ι,x)(k), C(ε,x)(k)).
The next decision epoch occurs at the point in time at which the vehicle arrives at the next location
and finishes service at that location. The time of this decision point is a function of the location to
which the vehicle is traveling and is deterministic. The next decision point occurs at t(k + 1) =
t(k) + d(Pk, Pk+1) + ζ with ζ ∈ {ζc, ζd, 0} depending on the next location Pk+1. Time ζ = 0 is
the special case when k + 1 = K. That is, the vehicle eventually returns to the depot at the end of
the horizon.
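The timing of the next decision epoch can be sketched as follows; the function name and the default service and loading times are illustrative, not part of the model:

```python
def next_epoch_time(t_k, travel_time, next_is_depot, is_final_return,
                    zeta_c=5, zeta_d=10):
    """t(k+1) = t(k) + d(P_k, P_{k+1}) + zeta, with zeta in {zeta_c, zeta_d, 0}.

    zeta = 0 for the final return to the depot (k+1 = K), zeta_d for an
    intermediate depot stop, and zeta_c for a customer visit.
    """
    if is_final_return:
        zeta = 0
    elif next_is_depot:
        zeta = zeta_d
    else:
        zeta = zeta_c
    return t_k + travel_time + zeta
```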
With the new decision point, there occurs a stochastic transition from the post-decision state
to the next pre-decision state. This transition is defined by the realization ωk+1 ∈ Ωk+1. This
realization identifies the set of customer orders that were realized between t(k) and t(k + 1).
Specifically, the realization ωk+1 provides a set of customers Cr(k + 1) = {C1, . . . , Ch}. The next
state Sk+1 contains the time t(k + 1), the remaining loaded and unloaded customers dependent on
C(l,x)(k), C(ι,x)(k), the vehicle’s location Pk+1 = Pxk , and the planned tour θk+1. If no customer is
visited at t(k + 1), Cl(k + 1) = C(l,x)(k) and θk+1 = θxk remain. Otherwise, the visited customer
Ck+1 is removed from the corresponding set, and θk+1 = θxk \ Ck+1.
The initial state S0 is defined by t(0) = 0, P0 = D, the initial orders C0, and the initial tour
θ0 = (D,D). The termination state SK is defined by t(K) = tmax, PK = D, Cl(K) = ∅, and θK = (D).
The objective for the SDPD is to determine an optimal decision policy π∗ ∈ Π leading to the
highest expected sum of rewards. Formally, the objective is
Figure 1: Exemplary State, Decision, and Post-Decision State (legend: vehicle position; loaded, not-loaded, and excluded customers; depot; both panels at t = 120)
π∗ = arg max_{π∈Π} E[ ∑_{k=0}^{K} R(Sk, Xπk(Sk)) | S0 ]. (1)
Decision rule Xπk (Sk) determines the decision x selected by policy π in state Sk.
3.3 Example
In this section, we present an example of the components of the MDP for the SDPD. The example
is given in Figure 1, which depicts a state at time t(k) = 120, the vehicle
having just served a customer. The planned tour θk is depicted by the dashed lines. Tour θk plans
to visit the loaded customer on the left side of the area, return to the depot, and visit the unloaded
but assigned customer and the loaded customer at the bottom of the area. Currently, two SOs,
located on the right of the service area, are excluded. They may be new SOs or orders formerly
excluded.
The selection of x leads to a post-decision state Sxk . In Figure 1, this post-decision state is
represented on the right-hand side of the block arrow. Decision x excludes the customer on the
bottom left that is not currently loaded and includes the SOs located on the right side of the service
area. The next location to visit is the customer on the left side. The dashed lines represent the
feasible tour θxk . This planned tour may change as the vehicle returns to the depot before serving
the currently assigned but not yet loaded customers. Since two orders are assigned and one is
excluded, the resulting reward associated with the decision x is R(Sk, x) = 2− 1 = 1.
4 Anticipatory Preemptive Depot Return Approach
It is well known that Equation 1 can be solved by using backward induction applied to the Bellman
Equation, or
V(Sk) = max_{x∈X(Sk)} { R(Sk, x) + E[ V(Sk+1) | Sk ] }. (2)
However, the backward induction approach suffers from the “curse of dimensionality.” That is,
in many problems, including the problem presented in this paper, the number of states is so large that
a backward induction approach is impossible both in terms of computation time and the memory
needed to store the solution. This curse has led to the development of what is known as approximate
dynamic programming (see Powell (2011) for an overview of approximate dynamic programming).
In contrast to backward dynamic programming, approximate dynamic programming relies on a
forward approach. However, in stepping forward, the second term of Equation 2 is unknown and
must be approximated. As a result, these methods operate on the approximate Bellman Equation
given by
V̄(Sk) = max_{x∈X(Sk)} { R(Sk, x) + E[ V̄(Sk+1) | Sk ] }. (3)
For our purposes, it will be more convenient to operate on an equivalent approximate Bellman
Equation, the post-decision Bellman Equation, given as
V̄(Sk) = max_{x∈X(Sk)} { R(Sk, x) + V̄(Sxk) }, (4)
where V̄(Sxk) is known as the value of the post-decision state.
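A sketch of how this post-decision Bellman equation is evaluated at runtime with a lookup table; the hook names (`feasible_decisions`, `post_state`) are assumptions for illustration, not the paper's implementation:

```python
def greedy_decision(state, feasible_decisions, reward, post_state, values):
    """Pick the decision x maximizing R(S_k, x) + V(S_k^x), with V read from
    a lookup table `values`; unseen post-decision states default to 0.0.
    The myopic policy is the special case of an empty table."""
    best_x, best_val = None, float("-inf")
    for x in feasible_decisions(state):
        val = reward(state, x) + values.get(post_state(state, x), 0.0)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Toy illustration: decision "b" earns less now but leads to a valuable state.
x, v = greedy_decision(
    state=None,
    feasible_decisions=lambda s: ["a", "b"],
    reward=lambda s, d: {"a": 1.0, "b": 0.0}[d],
    post_state=lambda s, d: d,
    values={"b": 5.0},
)
```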
There are many methods for creating this approximation. The simplest is the myopic approach
that simply sets the second term of Equation 4 to zero. We will use such an approach as one of our
benchmarks. However, to anticipate the impact of the current state decision on future decisions and
orders, we seek to learn values of the second term in Equation 4. To do so, we employ an offline
simulation procedure known as Approximate Value Iteration (AVI). We highlight the specifics of
our AVI procedure in §4.1. Because of the large number of states, it is not possible to implement
AVI directly. We must instead operate on an aggregation of the states. We detail our aggregation
scheme in §4.2.
Unfortunately, in addition to the proliferation of states, the problem discussed in this paper also
suffers from another curse of dimensionality, the size of the decision space. Notably, not only must
we select a subset at each decision epoch, but we must also route that selected subset. In this
paper, we heuristically reduce the decision space by using a simple routing heuristic that
incorporates depot returns. The details of our procedure can be found in §4.3. Combining the
three elements of the solution approach, we call our solution approach the anticipatory preemptive
depot return approach (APDR).
4.1 Approximate Value Iteration
In this section, we define our method for determining the approximate value of the second term in
Equation 4. In this section, we present the method generally and as if we are determining an ap-
proximate value for each post-decision state. However, in execution, we operate on an aggregated
set of states. We define this aggregation in the following section.
Our AVI method is derived from Powell (2011, pp. 391ff) and uses offline simulation to deter-
mine approximated values. The approximated values are stored in a lookup table and can then be
used to solve Equation 4 in real time. Thus, the computational burden of the approach is incurred
mainly prior to when decision making is required, greatly reducing the computational burden at
runtime.
The procedure is described in Algorithm 1. AVI starts with initial values V̄0(Sxk) for every post-
decision state Sxk. AVI then iterates through a set of sample-path realizations Ω = {ω1, . . . , ωN}.
At each iteration i and each step in a given sample-path realization ωi, the algorithm solves
the approximate Bellman equation (line 12) using the current approximation of the post-decision
state values V̄i−1. The value of the selected decision, given in line 14, is used in line 20 to update the ap-
proximated post-decision state values. The algorithm returns values V̄N that we use to approximate
the second term of Equation 4 at runtime.
4.2 State Space Aggregation
Because of the large number of post-decision states, we cannot actually find values for
each post-decision state and instead develop the approximation on aggregated post-decision states.
Algorithm 1: Approximate Value Iteration
Input: Initial values V̄0, realizations Ω = {ω1, . . . , ωN}, step size α
Output: Values V̄N
2  // Simulation
3  i ← 1
4  while (i ≤ N) do
5      k ← −1
6      R−1 ← 0
7      S0 ← S0^{ωi}
8      while (Sxk ≠ SK) do
9          k ← k + 1
10         if k ≥ 1 then Sk ← (Sxk−1, ωik−1)
11         else Sk ← S0
12         xk ← arg max_{x∈X(Sk)} { R(Sk, x) + V̄i−1(Sxk) }
13         Sxk ← (Sk, xk)
14         Rk ← Rk−1 + R(Sk, xk)
15         Sx ← Sx ∪ {Sxk}
16     end
18     // Update
19     for all Sxk ∈ Sx do
20         V̄i(Sxk) ← (1 − α)V̄i−1(Sxk) + α(RK − Rk)
21     end
22     i ← i + 1
23 end
24 return V̄N
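The outer loop of Algorithm 1 can be sketched in a few lines, assuming a `simulate_path` hook (a hypothetical name) that performs the greedy forward simulation of one sample path against the current value table and returns the visited post-decision states with their rewards-to-go:

```python
def approximate_value_iteration(sample_paths, alpha, simulate_path,
                                initial_values=None):
    """Sketch of the AVI update loop; names are illustrative.

    `simulate_path(omega, values)` is assumed to return pairs
    (post_decision_state, reward_to_go), i.e., (S_k^x, R_K - R_k),
    for one sample-path realization omega.
    """
    values = dict(initial_values or {})
    for omega in sample_paths:                    # iterations i = 1, ..., N
        for s_x, reward_to_go in simulate_path(omega, values):
            old = values.get(s_x, 0.0)
            # smoothed update with step size alpha (line 20 of Algorithm 1)
            values[s_x] = (1.0 - alpha) * old + alpha * reward_to_go
    return values

# Toy check: one post-decision state "p" observed with reward-to-go 10 on
# every path; with alpha = 0.5 the estimate moves 0 -> 5.0 -> 7.5.
vals = approximate_value_iteration(
    sample_paths=[1, 2],
    alpha=0.5,
    simulate_path=lambda omega, values: [("p", 10.0)],
)
```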
In aggregating the post-decision states, we seek to meet two criteria first presented in Barto
(1998, p. 193). First, the resulting space needs to be of a size that allows a sufficient number
of observations and a reliable approximation. Second, the aggregation must maintain the main
distinguishing features of the post-decision space. As a starting point, we draw on Ulmer et al.
(2016a), which proposes the parameters point of time t and free time budget b as the basis for
aggregation for the VRPSR.
The current point of time t(k) is given in the state. The free time budget b(k) follows from the
current time and current planned tour. Essentially, the free time budget is the amount of time left
before the end of the horizon after serving the remaining planned tour θxk starting at time t(k). For
the VRPSR, the earlier it is in the horizon and the more free time budget is left, the higher the
value of a post-decision state may be. Formally, we define the free time budget as
b(k) = tmax − t(k) − d(θxk). (5)
While point of time and free time budget provide a sufficient aggregation for the VRPSR, the
SDPD is complicated by the return trips to the depot, and future depot returns may significantly
impact the value of a state. For example, a depot return early in the horizon might find only a
few orders available for loading, and to serve customers requesting later in the horizon, the vehicle
may need to return to the depot an additional time. Such an early return will then likely have a
relatively low future value. Alternatively, a depot return near the end of the horizon might find
many requests that need to be served, but very little time to serve them. Then, again, the future
value associated with such a depot return is low.
Given the dynamic nature of the problem, we do not know exactly when, if at all, the vehicle will
return to the depot. However, with θk in the state, we do know at what time a depot return is currently
scheduled. We integrate the time of this scheduled return into the aggregation. We denote the time
associated with the first return to the depot given Sk as a(k).
The proposed aggregation A : Sx → P ⊂ N3 results in post-decision states A(Sxk) = pk
represented by 3-dimensional vectors pk = (t(k), b(k), a(k)) ∈ P. Representation P spans a
3-dimensional vector space as defined in Equation 6:

P = {A(Sxk) : Sxk ∈ Sx}. (6)
The value of a post-decision state Sxk can now be represented by the value V of the vector pk:
V(Sxk) ≈ V(A(Sxk)) = V(pk). The values Vi(Sxk) are therefore replaced by Vi(pk) in lines 12 and
20 of Algorithm 1. The application of A results in a significantly smaller vector space P. Since
for the SDPD all three parameters are discrete, P can be associated with a 3-dimensional lookup
table (LT) with dimensions t, b, a ∈ {0, . . . , tmax}. Based on the aggregated post-decision states in
the LT, we are now able to store the values for AVI.
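As a sketch, the aggregation and lookup can be implemented as follows; the `PostDecisionState` container and its field names are assumptions made for illustration:

```python
# Sketch of the 3-dimensional aggregation A: a post-decision state is
# reduced to the key (t, b, a) of a lookup table holding value
# estimates. The state container below is an illustrative assumption.

from dataclasses import dataclass

T_MAX = 480  # horizon length in minutes

@dataclass
class PostDecisionState:
    t: int                   # current point in time t(k)
    tour_duration: int       # duration d of the remaining planned tour
    first_depot_return: int  # scheduled time a(k) of the first depot return

def aggregate(s: PostDecisionState) -> tuple:
    """A(S_k^x) = (t(k), b(k), a(k)) with b(k) = tmax - t(k) - d."""
    b = T_MAX - s.t - s.tour_duration
    return (s.t, b, s.first_depot_return)

# All states sharing (t, b, a) map to the same lookup-table entry:
lookup = {}
s = PostDecisionState(t=180, tour_duration=200, first_depot_return=260)
lookup[aggregate(s)] = 4.2
# aggregate(s) -> (180, 100, 260)
```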
4.3 Subset Selection and Preemptive Depot Return Routing
The final component of our solution approach overcomes the challenge of the large decision space
present in this problem. The decision space's dimensionality is vast for two reasons. First, as
long as a request is not loaded onto the vehicle, the MDP model allows for reconsideration of
SO-assignments. In combination with new SOs, this leads to a significant subset selection subproblem.
Second, the set of potential routing plans is vast, especially when considering potential depot
returns.
Given this curse of dimensionality in the decision space, in the APDR, we use a heuristic to
limit the number of subsets and routing schemes that must be considered when solving line 12
of Algorithm 1 and at runtime when solving Equation 4. These heuristics can be thought of as a
means of restricting the decision space at each decision epoch.
To alleviate the subset selection complexity, at every decision point, APDR and the benchmark
policies presented in §4.4 maintain the already assigned SOs and determine only the subset of new
SOs to assign. An SO that is not assigned at a particular decision point is permanently excluded.
We do not reconsider assignments and exclusions due to the combinatorial complexity of
considering all possible subsets.
For each state on each sample path of each iteration of Algorithm 1, a decision is selected in
line 12. This decision involves the selection and routing of a subset of customers. As discussed
previously, the size of the decision space is such that it is impossible to solve Equation 4 optimally.
Instead, for each subset of new customer requests, we heuristically generate the new planned tour
θxk , for a state Sk, a current tour θk, and a set of SOs Cr. We call the approach the preemptive depot
return routing approach (PDR).
To heuristically generate tours for a set of SOs, PDR draws on a modification of cheapest
insertion (CI), first introduced by Rosenkrantz et al. (1974). We derive our implementation
from that proposed in Azi et al. (2012). CI has the advantage of being efficient at every decision
point. Further, the resulting routes are such that they are comprehensible to the driver. Because CI
maintains the sequence of customers, the dispatcher might even be able to communicate approxi-
16
1. 2. 3. 4.
Figure 2: Routing and Insertion for PDR
mate delivery times (Ulmer and Thomas 2016). A downside of PDR is that it does not necessarily
return optimal routes and thus may reduce the set of feasible orders.
The procedure of PDR is described in Algorithm 2 found in the Appendix. Let D denote the
depot, Pk the vehicle’s position, Cl the loaded IOs. Further, we let Cn represent the assigned
unloaded SOs. The current planned tour can then be described as
θk = (Pk, Cl, . . . , Cl,D, Cn, Cl, Cn, . . . , Cl,D).
Let θjk refer to the jth component of θk, e.g., θ1k = Pk. Further, let Cr = {C1r, . . . , Chr} be the
subset of new SOs to assign. PDR first removes the depot from θk, leading to an infeasible tour θ.
In this infeasible tour, the customers Cr are subsequently inserted via CI at the cheapest position.
Procedure Insert(θ, θ∗, C∗) inserts the new order C∗ after θ∗ in tour θ. When all new customers
are inserted, the depot is inserted between the current position and the first not loaded customer
(Cn or Cr) via CI resulting in a tour θxk . If θxk does not violate the time limit, the tour is feasible.
We assume an initial tour θ0 = (D,D) without customers, starting and ending at the depot. Due
to the stochasticity of the problem, the integration of the IOs at k = 0 may lead to an initial tour
duration higher than the time limit. In these cases, the vehicle serves all IOs and none of the SOs.
For k > 0, there always exists a feasible decision x assigning no new SOs to the tour. In these
cases, θxk is equal to θk.
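Under simplifying assumptions, the PDR construction can be sketched as below: travel cost is Euclidean, the tour is a list of (location, loaded) stops with the depot already removed, and the depot is reinserted directly before the first not-loaded stop rather than via CI over the admissible positions. All names are illustrative, not taken from Algorithm 2:

```python
# Sketch of PDR: insert each new order at its cheapest position, then
# reinsert the depot before the first not-loaded stop. Euclidean cost,
# the (location, loaded) tour encoding, and a simplified depot
# reinsertion are assumptions for illustration.

import math

def insertion_cost(prev, point, nxt):
    """Extra travel incurred by visiting `point` between prev and nxt."""
    if nxt is None:
        return math.dist(prev, point)
    return math.dist(prev, point) + math.dist(point, nxt) - math.dist(prev, nxt)

def cheapest_insert(tour, point):
    """tour: list of (location, loaded) stops, vehicle position first."""
    costs = []
    for j in range(1, len(tour) + 1):
        prev = tour[j - 1][0]
        nxt = tour[j][0] if j < len(tour) else None
        costs.append((insertion_cost(prev, point, nxt), j))
    _, best = min(costs)
    tour.insert(best, (point, False))  # new SOs are not yet loaded

def pdr_route(position, tour, new_orders, depot):
    stops = [(position, True)] + tour  # depot assumed already removed
    for order in new_orders:
        cheapest_insert(stops, order)
    for j, (_, loaded) in enumerate(stops):
        if not loaded:                 # first not-loaded stop
            stops.insert(j, (depot, True))
            break
    return [loc for loc, _ in stops]

route = pdr_route(position=(0, 0),
                  tour=[((2, 0), True), ((4, 0), False)],
                  new_orders=[(3, 0)], depot=(1, 0))
# -> [(0, 0), (2, 0), (1, 0), (3, 0), (4, 0)]
```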
Figure 2 shows an example for PDR. The first step shows state Sk, θk, and the candidate set of
new SOs Cr. The state contains three loaded, one assigned but not loaded, and one new customer.
The current depot return is planned after serving the customer located at the top left of the service
area. In the second step, PDR removes the depot from the sequence θk. The resulting tour
is infeasible because a depot return is required to pick up customer orders at the depot. In the
third step, PDR then inserts the candidate subset of new customers via CI. The resulting tour is
again infeasible. Feasibility is restored by the addition of a depot return before the first not loaded
customer in the tour. The fourth step shows the depot return being inserted between Pk and the
first not loaded customer Cn.
4.4 Benchmark Heuristics
In this section, we present the benchmark heuristics that we use to test the quality of the proposed
approach. As with our proposed approach, each benchmark includes a strategy for estimating
the future value of a decision and a strategy for heuristically routing a subset of requests. For
anticipation, we consider both ATB aggregation proposed by Ulmer et al. (2016b) for the VRPSR
and a myopic assignment strategy. For routing, we consider both the preemptive method proposed
in the previous section and the well-established plan-at-home heuristic (PAH). The combination
of these anticipation and routing strategies results in four benchmarks. We also consider a fifth
benchmark derived from combining the proposed three-parameter aggregation scheme described
in §4.2 with the PAH routing scheme described subsequently.
Anticipation: Myopic and ATB
We compare our proposed anticipation approach to both ATB and myopic anticipation. As de-
scribed previously, ATB is similar to the approach proposed in this paper, but the aggregation does
not include information about depot returns. Thus, ATB aggregates over only the point of time and
the free time budget. For this paper, the lookup table for ATB is created using an AVI procedure
analogous to that described in Algorithm 1.
For an additional point of comparison, we also consider using a myopic policy for anticipation.
The myopic approach sets the second term of Equation 4 to zero. The myopic assignment strategy
selects the decision x leading to the assignment of the largest feasible subset at every decision point k.
If several decisions with the same subset cardinality exist, the strategy selects the one
leading to the highest free time budget.
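This tie-breaking rule can be sketched in a few lines; representing a decision by the pair (subset size, resulting free time budget) is an illustrative assumption:

```python
# Sketch of the myopic rule: choose the feasible decision assigning the
# largest subset, breaking ties by the resulting free time budget.
# The (subset_size, free_time_budget) encoding is an assumption.

def myopic_choice(decisions):
    return max(decisions, key=lambda d: (d[0], d[1]))

# Two decisions assign 3 SOs; the one leaving a budget of 120 wins:
choice = myopic_choice([(2, 200), (3, 80), (3, 120)])
# -> (3, 120)
```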
Depot Returns: Plan at Home
This paper introduces a preemptive return strategy. As a benchmark, we consider the PAH. In the
benchmark, we replace the subset selection and routing discussed in §4.3 with PAH. Particularly,
PAH is an approach that does not account for the possibility of preemptive depot returns. Thus, at
time of the return to the depot, the vehicle is empty. Upon return, a set of new requests is selected
for service, and the vehicle begins a new route.
Specifically, the PAH approach is a modification of Algorithm 2. The modification removes
lines 2 through 6 and lines 18 through 25. In addition, line 10 is modified such that j = k, . . . , |θ| − 1,
where k is the position of the depot in θ. We note that the modification of line 10 means that
PAH does not serve IOs and SOs on the same tours. Like the approach discussed in §4.3, PAH
is a means of restricting the decision space. At each decision epoch, the PAH approach seeks to
accept some newly occurring requests for inclusion on the tour that will take place once the vehicle
returns to the depot. The routing of these accepted requests follows the just described version of
Algorithm 2. In the case of a myopic assignment strategy, the selection of requests amounts to
choosing the maximal feasible subset of requests. In the case of the three-parameter aggregation
strategy and ATB, the subset selection follows the scheme discussed in §4.3 and the routing is
replaced by the modification of Algorithm 2 described in the previous paragraph.
As noted in §2, Azi et al. (2012), Voccia et al. (2015), and Klapp et al. (2015) implement PAH
strategies. Our three-parameter and ATB anticipation schemes mimic the schemes in Azi et al.
(2012), Voccia et al. (2015), and Klapp et al. (2015) that, while not making preemptive depot
returns, control the length of the route to induce a depot return in the PAH scheme.
Policy Notation
The combination of routing and assignment strategies results in six different policies Pg and Hg,
g = 1, 2, 3. Parameter g indicates the assignment strategy; P and H indicate the routing. The value g = 1
indicates myopic assignments, g = 2 the ATB-assignment based on the 2-dimensional aggregation,
and g = 3 the assignments based on 3-dimensional aggregation presented in §4.2. Indicator P
represents preemptive PDR-routing and H PAH-routing.
5 Experimental Design
In this section, we describe the test instances and implementation that we use to demonstrate the
value of preemptive depot returns and of our proposed solution approach. We first present the
scheme for generating instances and then the details of the implementation of our APDR approach
as well as the proposed benchmarks.
5.1 Instance Generation
For all instances, we assume a closed, rectangular 20km × 20km service area A, a
time horizon of 480 minutes discretized into 1-minute increments, and a vehicle speed ν of 20km/h.
Assuming a minimum travel time of 1 minute, the travel time between any two points (ax1, ay1) and
(ax2, ay2) in A is given by

d(C1, C2) = max( ⌈ ((ax1 − ax2)² + (ay1 − ay2)²)^(1/2) / (ν/60) ⌉ , 1 ). (7)
For all instances, we also assume the service time at a customer is ζc = 2 minutes and the loading
time at the depot is ζd = 5 minutes.
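Equation 7 translates directly into code; only the function name is an assumption:

```python
# Travel time of Equation 7: Euclidean distance divided by the speed in
# km per minute, rounded up, with a minimum of 1 minute.

import math

def travel_time(p1, p2, nu=20.0):
    """Travel time in minutes between points p1, p2 (coordinates in km)."""
    km = math.dist(p1, p2)
    return max(math.ceil(km / (nu / 60.0)), 1)

# 10 km at 20 km/h (1/3 km per minute) takes 30 minutes;
# very short hops are rounded up to the 1-minute minimum.
assert travel_time((0.0, 0.0), (10.0, 0.0)) == 30
assert travel_time((0.0, 0.0), (0.1, 0.0)) == 1
```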
Each instance is defined by a set of parameters: expected number of customers c, the degree
of dynamism dod, depot locations D, and customer distribution F . The expected number of cus-
tomers is the sum of the IOs and SOs. We test instances with c = 30, 40, 50, 60, 80, 100 expected
customers. The degree of dynamism, first discussed in Larsen et al. (2002), is the percentage of the
expected number of customers that are dynamic. That is, the degree of dynamism is the percent
of customers who are SOs. We test instances with dod = 0.25, 0.5, 0.75. We denote the expected
number of IOs as c0 = c · (1− dod).
To analyze the interdependency of depot location and customer distribution, we define three
different depot locations and customer distributions, respectively. We set the depot locations at
D1 = (10, 10), D2 = (0, 20), and D3 = (0, 0). The latter two depot locations represent the
situation in which the vehicle is part of a fleet, but operates independently in a predefined service
area. For customer locations, we consider uniform and clustered customer distributions. We refer
to uniformly distributed customer locations asU . We define two clustered distributions of customer
locations. The first is a two cluster distribution, called 2C, with two clusters centered at µ1 = (5, 5)
20
Table 2: Instance Parameters
Parameter Values
Service area A 20km× 20km
Vehicle speed ν 20km/h
Expected number of customers c 30, 40, 50, 60, 80, 100
Degree of Dynamism dod 0.25, 0.5, 0.75
Depot location D ∈ A D1 = (10, 10), D2 = (0, 20), D3 = (0, 0)
Customer distribution F U, 2C, 3C
and µ2 = (15, 15). Customer requests are equally assigned to the clusters, and the locations follow
Normal distributions with respect to the cluster centers and standard deviation of σ = 1. Finally,
we define a three-cluster distribution of locations, called 3C. In 3C, the cluster centers are located
at µ1 = (5, 5), µ2 = (5, 15), and µ3 = (15, 5). We assign 50% of the orders to the second cluster,
25% to each of the other clusters. The standard deviations are set to σ = 1.
A summary of the instance parameters is given in Table 2. In combination, we generate a set
of 162 instances. We note that, for the uniform customer distribution, depot positions D2 and
D3 result in identical instance settings. For each instance setting, we generate 1,000 realizations.
We apply the proposed APDR and benchmarks to every realization. The details of realization
generation can be found in the Appendix.
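A sketch of the location sampling for the three distributions follows; clipping draws to the service area is our assumption, since the paper does not state how out-of-area draws are handled:

```python
# Sketch of customer-location sampling for U, 2C, and 3C. Clipping
# normal draws to the 20km x 20km area is an assumption.

import random

AREA = 20.0  # km

def sample_location(dist, rng):
    if dist == "U":
        return (rng.uniform(0, AREA), rng.uniform(0, AREA))
    if dist == "2C":
        centers, weights = [(5, 5), (15, 15)], [0.5, 0.5]
    else:  # "3C": 50% to the second cluster, 25% to each other cluster
        centers, weights = [(5, 5), (5, 15), (15, 5)], [0.25, 0.5, 0.25]
    cx, cy = rng.choices(centers, weights=weights)[0]
    clip = lambda v: min(max(v, 0.0), AREA)
    return (clip(rng.gauss(cx, 1.0)), clip(rng.gauss(cy, 1.0)))

rng = random.Random(42)
points = [sample_location("2C", rng) for _ in range(1000)]
```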
5.2 Implementation Details
For AVI, we run 5 million approximation runs. For an effective and efficient approximation, we
partition the vector space with the dynamic lookup table (DLT) approach introduced in Ulmer
et al. (2016a). The DLT starts with a coarse-grained initial partitioning. During the approximation
process, this partitioning adapts with respect to the observations and value deviation. Entries with
a high number of observations and high value deviation are considered in more detail, while other
entries stay in their initial design. For the SDPD, all DLTs start with equidistant intervals of 16 in
each dimension. Based on preliminary tests, the disaggregation thresholds are set to τ = 3.0 for
both APDR and H3 and τ = 1.5 for both P2 and H2. A disaggregation divides each interval of
the entry into two equidistant halves. The number of observations is distributed equally to the new
entries. The standard deviation of the new entries is set to the standard deviation of the original
entry. The disaggregation of an entry stops when the entry reaches an interval length of 1 minute.
The update parameter α is set to the inverse of the number of observations. Ulmer et al. (2016a)
demonstrates the quality of this step-size rule for AVI coupled with DLT.
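A simplified, one-dimensional sketch of the disaggregation step is given below; the dictionary-based entry layout is an assumption, and the actual procedure of Ulmer et al. (2016a) handles all three dimensions:

```python
# Simplified sketch of one DLT disaggregation: an entry's interval is
# split into two equidistant halves, observations are shared equally,
# and the parent's value and standard deviation are inherited. The
# dict-based entry layout is an illustrative assumption.

def split_entry(entry):
    lo, hi = entry["interval"]
    if hi - lo <= 1:                  # stop at 1-minute resolution
        return [entry]
    mid = (lo + hi) // 2
    half = dict(entry, n_obs=entry["n_obs"] / 2)
    return [dict(half, interval=(lo, mid)), dict(half, interval=(mid, hi))]

parent = {"interval": (0, 16), "n_obs": 100, "value": 4.2, "std": 2.0}
children = split_entry(parent)
# -> intervals (0, 8) and (8, 16), each with 50 observations
```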
6 Computational Evaluation
In this section, we present the results of our computational experiments. We compare the proposed
APDR approach to the five previously described benchmarks. Our results demonstrate the quality
of the proposed approach and also the value of preemptive depot returns. In our presentation, we
characterize the instance parameters that favor preemptive depot returns and those that do not.
6.1 Overall Solution Quality
In this section, we analyze the solution quality of the six different policies. Detailed results for
every instance are available in Table A1 in the Appendix. To analyze the improvement, we compare
the five benchmark policies to the APDR. We first compare the overall quality of the approaches
using the average, over all instance settings, of the percentage differences in the average number of
SOs served per instance setting. To do so, for each benchmark i and for the APDR, we compute the
average number of SOs served over all realizations of each instance setting j; we denote these values
Qij for benchmark i and QAPDR,j for the APDR approach. Then, for every benchmark i and instance
setting j, we compute the percentage difference between benchmark i and the APDR as

(QAPDR,j − Qij) / QAPDR,j × 100%. (8)

We then average these percentage differences to obtain the average percentage difference between
APDR and each benchmark i.
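The comparison metric is straightforward to compute; the function names below are illustrative:

```python
# Equation 8 and its average over instance settings. Names are
# illustrative assumptions.

def pct_diff(q_apdr, q_bench):
    """Percentage by which a benchmark falls short of APDR."""
    return (q_apdr - q_bench) / q_apdr * 100.0

def avg_pct_diff(apdr_by_setting, bench_by_setting):
    diffs = [pct_diff(a, b) for a, b in zip(apdr_by_setting, bench_by_setting)]
    return sum(diffs) / len(diffs)

# A benchmark serving 9 and 8 SOs where APDR serves 10 in both settings:
assert avg_pct_diff([10.0, 10.0], [9.0, 8.0]) == 15.0
```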
Figure 3: Percentage Difference of the Average Stochastic Orders Served by APDR and the Benchmark Policies (P1: 4.6%, P2: 7.4%, H3: 9.7%, H2: 18.3%, H1: 19.3%)

Figure 3 presents the average percentage difference in solution quality of the approaches. On
the x-axis, each benchmark policy i is depicted. On the y-axis, the percentage improvement relative
to APDR is shown. Positive values show that APDR is outperforming the benchmark.
The values indicate that the proposed APDR approach is best overall. The improvement of
APDR over the benchmarks is at least 4.6% and reaches 19.3% when compared to H1. The results also
show that the quality of the APDR approach is due both to the preemptive returns and also to
the inclusion of planned depot return information in the aggregation. With P1 being 4.6% worse
than APDR and P2 being 7.4% worse than APDR, both less than the difference between the
APDR and the plan-at-home approaches, the results also show the relative advantage of preemptive
depot returns. In §6.3, we analyze the reason that preemptive depot returns are beneficial and also
characterize the instance settings in which preemptive returns provide the most value.
We also observe a significant gap between APDR and benchmarks P2 and P1 as well as
between H3 and benchmarks H2 and H1. Notably, the difference between APDR and P2 is 7.4%,
even higher than that between APDR and P1. Recall that policy P2 is based on the
2-dimensional aggregation of the state space. This aggregation ignores the planned arrival time
of a return to the depot. Likewise, H3 is 8.6 and 9.6 percentage points better than H2 and
H1, respectively. These significant improvements of the 3-dimensional aggregation over the
2-dimensional case indicate the benefit of capturing the planned depot return time in the aggregation.
In §6.2, we investigate how including the planned arrival time in the aggregation impacts the value
of a state and the resulting subset selection.
6.2 The Value of Including Planned Depot Arrival Time in the State Space
Aggregation
In this section, we analyze why the 3-dimensional aggregation, with its inclusion of the
planned depot arrival time, is beneficial. To do this, we use an example to show how the
value of the aggregated states changes with the planned depot arrival times for both the APDR
and H3 approaches. Showing these changes demonstrates the sensitivity of the post-decision state
value to the planned depot return time.
Specifically, we focus on the instance setting in which c = 50, dod = 0.5, the customers are
distributed in two clusters (2C), and the depot is in the center (D1). For this instance setting, the
solution qualities of APDR and H3 are nearly identical, with 10.0 and 10.1 assignments on average,
respectively. For the purposes of the example, we focus on time t = 180. At t = 180, for H3,
the vehicle has usually not returned to the depot yet. We select a free time budget of b = 100
since preliminary tests have revealed frequent observations for the combination of t = 180 and
b = 100. With the time and time budget fixed, only the planned arrival time to the depot varies in
our example. As a result, only arrival times of 180 ≤ a ≤ 380 = 480− b are possible.
For the just described setting, Figure 4 presents the post-decision state values across planned
arrival time values for both the APDR and H3 at time 180 and time budget 100. The x-axis shows
the planned arrival time a and the y-axis the value of the corresponding vector. That is, the y-axis
shows how many assignments are expected for the corresponding post-decision states. The solid line
depicts the values of the APDR approach and the dashed line those of H3. The occasional plateaus in the
values are the result of the varying interval sizes of the DLT.
Figure 4 shows that the value of the post-decision state is sensitive to the planned arrival time.
For example, the post-decision state value of the APDR at a = 200 is 3.86 while for a = 350,
the value is 4.82, a difference of nearly 25%. Likewise, the post-decision state value of the H3 at
a = 200 is 4.12 while for a = 350, the value is 4.74, a difference of nearly 15%. In contrast, P2
and H2 neglect parameter a and evaluate every post-decision state with t = 180, b = 100 with the
values 4.62 and 3.83, respectively. As a result, the performance of these benchmarks is inferior to
APDR and H3, respectively.

Figure 4: Values for APDR and H3, Instance Setting c = 50, dod = 0.5, 2C, D1, Point of Time
t = 180, Free Time Budget b = 100
The question remains as to why the value of the post-decision state is sensitive to the time of the
planned depot return. To answer this question, we first examine H3. We observe an increase in the
value of the post-decision state until about time a = 260. For 260 ≤ a ≤ 300, the value remains
relatively constant and drops for a > 300. This behavior can be explained by two factors.
First, as more time passes, more new requests accumulate. Further, because
it is a plan-at-home policy, the insertion costs for H3 decline as more customers are added to a
tour. That is, insertion costs improve with density. Accumulating enough customers to achieve this
density takes time. This factor explains the initial rise in the value of the post-decisions states for
H3.
The second factor is the length of the initial tour. As a plan-at-home strategy, the H3 approach
must finish its initial tour before returning to the depot. For the instance setting chosen for this
example, the average initial tour is 288.2 minutes and thus the H3 strategy often achieves only a single
depot return. As a result, the majority of SOs requesting after the first depot return are not assigned
to the vehicle. Thus, while a later arrival time allows the accumulation of more assignments and
thus more efficient tours of those assignments, a depot return too late in the horizon begins to limit
the number of orders that can be served. Yet, it is important to note that, as the arrival
time approaches 380, the value of a planned arrival decreases dramatically. The information about
the arrival time is valuable in determining the requests that should be loaded at the vehicle's first
return to the depot. Essentially, the inclusion of the planned depot return time in the aggregation
helps determine whether the second tour should be longer or shorter.
The APDR post-decision state values exhibit a different behavior than those for H3. The
post-decision state values increase until a ≈ 300, at a much slower rate of increase than is exhibited by
H3. After a = 300, a behavior similar to that of H3 is observed. As with H3, the increasing value
of the post-decision state for a < 300 is the result of the need to accumulate requests
and the more efficient tours that result from the accumulation. However, because of the preemptive
returns possible with APDR, the increase in value is slower than with H3. If the initial tour is
too long, the APDR strategy can simply choose to return to the depot to pick up accumulated requests.
Again though, depot returns too late in the horizon offer little value as there is simply too little
time to service additional requests.
We also note that, in this example, the value of APDR is generally lower than that of H3. This
behavior does not imply that APDR performs worse than H3. Rather, at this point in time, APDR
has already assigned more customers to the planned route. Therefore, the expected value of the
future is lower than that of H3.
To further study the impact that the planned depot returns have on routing decisions, we turn to
a second example. The example draws on a realization of the instance setting with 80 customers, a
degree of dynamism of 0.5, two clusters of customers (2C), and the depot in the third position (D3).
We choose this example because it is a good demonstration of the value of combining preemption
with the planned depot return times. For this instance setting, the average initial free time budget is
only b(0) = 46.1 minutes. That is, less than 10% of the horizon is available to serve new requests.
The average required detour to return to the depot for APDR is δ = 15.0 minutes plus five minutes
of loading at the depot. Details can be found in Tables A1 and A2 in the Appendix.
Figure 5: Routing for a realization of Instance c = 80, dod = 0.5, D2, and 2C: (a) APDR; (b) P1, P2; (c) H1, H2, H3
Figure 5 depicts the routes for the APDR and benchmarks. The first tour is depicted by the
circles, the second tour, that occurring after a depot return, by the triangles. The blank markers
indicate IOs, the filled markers assigned SOs. The routing of policy APDR is shown in Figure 5a.
Figure 5b shows the routes for P1 and P2, which are the same for this realization. Figure 5c shows
the routes for H1, H2, and H3, which are also the same for this realization.
For all H-policies, the vehicle serves all IOs before returning to the depot and is only able
to serve a single SO in the second tour. Policies P2 and P1, preemptive approaches that do not
consider planned depot return times, return to the depot immediately after serving the first
IO. As a result, almost no SO requests have accumulated, and only two SOs are assigned to
the second tour. Due to its consideration of the depot arrival time, the APDR approach
avoids the early return and instead chooses to return as the vehicle is about to travel from one
cluster to the next. As a result, there has been more time for SOs to accumulate, and five SOs can
be integrated.
6.3 The Value of Preemptive Depot Returns
As seen in §6.1, APDR performs on average 9.7% better than H3. Yet, as the first example in the
previous section indicates, there exist some instance settings for which preemption does not add
value. In this section, we analyze the instance settings with respect to the improvement enabled by
preemptive depot returns.

Figure 6: Improvement of APDR compared to H3
Figure 6 shows the percentage difference between the average solution value returned by APDR and
that returned by H3 across all instance settings. Each column of the figure represents a degree of dynamism
and each row a different number of customers. The y-axis of each row represents the depot locations,
and the x-axis of each column the percentage difference of the average solution values.
Figure 6 reveals a general pattern with respect to the number of customers and the degree of
dynamism. As the number of customers increases and the degree of dynamism decreases, the
performance of APDR compared to H3 improves. Of note, when either the number of customers is large
or the degree of dynamism is low, the expected number of initial orders is relatively high.
For example, with 80 customers and a degree of dynamism of 0.5, 40 IOs can be expected per
realization. In cases such as this, it is more likely that a realized SO is close to an existing IO and
thus the marginal cost of serving this SO is relatively lower. Preemptive returns allow APDR to
take advantage of these lower marginal insertion costs. As the number of expected IOs decreases,
the marginal costs of serving SOs in the existing tour increases and the value of preemption and
thus APDR declines. For example, in the case of 30 customers and a degree of dynamism of 0.75, depicted in
the first row of the right column, only 7.5 IOs are expected per realization. Accordingly, APDR
does not offer improvement.

Figure 7: Improvement with respect to the expected number of IOs c0
We further examine the impact of the expected number of IOs in Figure 7. The x-axis is the
expected number of IOs. The y-axis represents the improvement of APDR relative to H3 for the
given expected number of IOs. A trendline runs from left to right. The general pattern described
previously is evident. We observe an increasing positive difference between APDR and H3 with an
increasing number of IOs. The trendline suggests that the shift from negative to positive happens
just before 20 expected IOs. For the proposed service area size and travel speed, 20 IOs creates
a density such that value is gained by integrating SOs into the existing tour. This result suggests
that, when partitioning an area into service zones for fleets, care should be taken to partition in a
way that allows each zone to have a sufficient number of initial requests.
While there exists a general pattern with regard to the number of customers and the degree of
dynamism, the patterns are less clear with regard to the depot locations and customer distributions.
To better understand this phenomenon, we focus on the instance settings with 80 customers and
a degree of dynamism of 0.5. These results are depicted in the second column and fifth row of
Figure 6 and, for convenience, in Figure 8. The second example in §6.2 is given in the third bar
from the bottom in Figure 8.

Figure 8: Improvement of APDR compared to H3 for c = 80, dod = 0.5

In this setting, the first depot position (D1) generally results in a
positive difference between APDR and H3. Essentially, the first depot position is generally placed
among the customers such that the cost in terms of travel time of a depot return does not overwhelm
the relatively low cost of being able to insert new requests into the existing route.
The same cannot be said for the second and third depot positions. Consider the case of the
two cluster customer distribution (2C). In this case, with the depot in the third position (D3),
improvement of APDR over H3 is 72.7% as we showed in the second example in the previous
section. Yet, for the second depot position (D2), the difference is negative at −5.4%. This negative
difference can be explained by the typical routing for this instance setting. The depot is located in
the lower left corner of the service area, close to the first customer cluster. The second cluster is
far away. For the first and third depot locations, a preemptive depot return is either conducted after
serving customers in the first cluster or after serving customers in both clusters. For the second
depot position, a depot return after serving the first cluster is costly. In the second case, the routing
of APDR is similar to H3. In both cases, the potential of preemptive returns cannot be exploited.
For the three-cluster customer distribution (3C), the relationship is reversed: the third depot position
experiences a negative difference and the second a positive difference. The difference results from
the relative cost of a return to the depot and a sufficient passage of time to accumulate SOs. In
essence, the depot location significantly impacts the potential of preemptive depot returns. The
results related to the 2C and 3C customer distributions suggest that the decision of whether or not to
implement preemptive depot returns should be made per vehicle based on the characteristics of
the service area served by that vehicle.
7 Conclusion and Outlook
In this paper, we explore preemptive depot returns for the SDPD, a dynamic one-to-many pickup
and delivery problem induced by a same-day delivery application. We present an anticipatory
assignment and routing policy APDR. APDR is based on approximate dynamic programming and
enables explicit decisions about preemptive depot returns. In extensive computational studies, we
show that preemptive depot returns, and our APDR approach in particular, increase the number of
deliveries per workday. Our analysis of the computational tests shows that APDR is most beneficial when
density is high enough to reduce the relative marginal cost of serving a new request. Our results
also show that preemptive returns are most effective when the returns occur late enough in the
horizon that sufficient time has passed for stochastic customer requests to accumulate,
but not so late that there is no longer time to serve the new requests. If considering
a fleet of vehicles, these results provide guidelines for how the delivery area can be partitioned so
that the delivery vehicles can benefit from preemptive depot returns.
There are a number of directions for future research. First, the presented state-space aggrega-
tion does not explicitly account for spatial information. Notably, the routing behavior presented
in Figure 5 is not achieved for every realization. For some realizations, APDR results in the same
routing as P2 and P1. Such cases might benefit from the inclusion of spatial information in the
aggregation scheme. The authors are not aware of any fully offline approximate dynamic programming
approach, whether based on state-space aggregation or value-function approximation, that has
successfully incorporated spatial information in the routing of vehicles.
A second area of future research might consider a fleet of vehicles that are not constrained by
delivery zones. As noted previously, our second and third depot positions can represent the position
of a depot for a fleet divided into delivery zones, but we do not explicitly consider integrated
decision making for a fleet. In the integrated fleet context, the approach presented in this paper,
particularly the state-space aggregation, would require alteration to consider the impact of multiple,
interacting vehicles.
A third area of future research would consider variants of the problem that incorporate third-party
and/or crowdsourced vehicles. In addition, APDR may be extended to communicate potential de-
livery times to the customers. These could also be used to support pricing decisions for time windows.
Finally, the general area of same-day delivery offers additional challenges at the strategic and tactical
decision levels. For instance, future research might consider suitable depot locations as well as
the flow of inventory between depots.
References
Addady, Michal. 2015. Macy's is taking on Amazon with same-day delivery in 17 cities. Fortune. Available
from http://fortune.com/2015/08/04/macys-amazon-delivery/, accessed on July 14, 2016.
Azi, Nabila, Michel Gendreau, Jean-Yves Potvin. 2012. A dynamic vehicle routing problem with multiple
delivery routes. Annals of Operations Research 199(1) 103–112.
Barto, Andrew G. 1998. Reinforcement learning: An introduction. MIT Press.
Ben-Shabat, Hana, Parvaneh Nilforoushan, Mike Moriarty, Christine Yuen. 2015. The 2015 Global
Retail E-Commerce Index: Global retail e-commerce keeps on clicking. Tech. rep., A.T. Kearney.
Available from https://www.atkearney.com/documents/10192/5691153/Global+Retail+E-Commerce+Keeps+On+Clicking.pdf/abe38776-2669-47ba-9387-5d1653e40409,
accessed on July 14, 2016.
Berbeglia, Gerardo, Jean-François Cordeau, Gilbert Laporte. 2010. Dynamic pickup and delivery problems.
European Journal of Operational Research 202(1) 8–15.
Bertsimas, Dimitris J, P. Chervi, M. Peterson. 1996. Computational approaches to stochastic vehicle routing
problems. Transportation Science 29(4) 342–352.
Ehmke, Jan Fabian, Ann Melissa Campbell. 2014. Customer acceptance mechanisms for home deliveries in
metropolitan areas. European Journal of Operational Research 233(1) 193–207.
Ehmke, Jan Fabian, Ann Melissa Campbell, Timothy L Urban. 2015. Ensuring service levels in routing
problems with time windows and stochastic travel times. European Journal of Operational Research
240(2) 539–550.
Fedde, Corey. 2016. Amazon expands same-day delivery – for some. Christian Science Monitor. Available
from http://www.csmonitor.com/Business/2016/0407/Amazon-expands-same-day-delivery-for-some,
accessed on July 14, 2016.
Ghiani, G., E. Manni, B. W. Thomas. 2011. A comparison of anticipatory algorithms for the dynamic
and stochastic traveling salesman problem. Transportation Science 46(3) 374–387.
Goodson, Justin C, Barrett W Thomas, Jeffrey W Ohlmann. 2016a. Restocking-based rollout policies for
the vehicle routing problem with stochastic demand and duration limits. Transportation Science 50(2)
591 – 607.
Goodson, Justin C, Barrett W Thomas, Jeffrey W Ohlmann. 2016b. A rollout algorithm framework
for heuristic solutions to finite-horizon stochastic dynamic programs. Available from http://
www.slu.edu/~goodson/papers/GoodsonRolloutFramework.pdf.
Kall, P, SW Wallace. 1994. Stochastic Programming. John Wiley & Sons.
Klapp, Mathias, Alan L Erera, Alejandro Toriello. 2015. The one-dimensional dynamic dispatch waves
problem. Tech. rep., Georgia Institute of Technology.
Kumar, Kavita. 2016. Best Buy rolls out same-day delivery in 13 markets. Minneapolis Star Tribune. Available
from http://www.startribune.com/best-buy-rolls-out-same-day-delivery-in-a-dozen-markets/374807051/,
accessed on July 14, 2016.
Larsen, Allan, Oli B. G. Madsen, Marius Solomon. 2002. Partially dynamic vehicle routing – models and
algorithms. Journal of the Operational Research Society 53(6) 637–646.
Meisel, Stephan. 2011. Anticipatory Optimization for Dynamic Decision Making, Operations Re-
search/Computer Science Interfaces Series, vol. 51. Springer Science+Business Media, New York.
Mes, Martijn, Matthieu van der Heijden, Peter Schuur. 2010. Look-ahead strategies for dynamic pickup and
delivery problems. OR Spectrum 32(2) 395–421.
Mitrovic-Minic, Snezana, Gilbert Laporte. 2004. Waiting strategies for the dynamic pickup and delivery
problem with time windows. Transportation Research Part B: Methodological 38(7) 635–655.
Powell, W. 2011. Approximate Dynamic Programming: Solving the Curses of Dimensionality. 2nd ed. John
Wiley and Sons, Hoboken, NJ, USA.
Pureza, Vitoria, Gilbert Laporte. 2008. Waiting and buffering strategies for the dynamic pickup and delivery
problem with time windows. INFOR: Information Systems and Operational Research 46(3) 165–176.
Rosenkrantz, Daniel J, Richard Edwin Stearns, P. M. Lewis. 1974. Approximate algorithms for the traveling
salesperson problem. Proceedings of the 15th Annual IEEE Symposium on Switching and Automata
Theory. IEEE, 33–42.
Sáez, Doris, Cristian E Cortés, Alfredo Núñez. 2008. Hybrid adaptive predictive control for the multi-vehicle
dynamic pick-up and delivery problem based on genetic algorithms and fuzzy clustering. Computers
& Operations Research 35(11) 3412–3438.
Secomandi, Nicola. 2003. Analysis of a rollout approach to sequencing problems with stochastic routing
applications. Journal of Heuristics 9(4) 321–352.
Ulmer, Marlin W, Justin C Goodson, Dirk C Mattfeld, Marco Hennig. 2015. Offline-online approximate
dynamic programming for dynamic vehicle routing with stochastic requests. Tech. rep., Technische
Universität Braunschweig, Germany.
Ulmer, Marlin W., Dirk C. Mattfeld, Felix Köster. 2016a. Budgeting time for dynamic vehicle routing with
stochastic customer requests. To appear in Transportation Science.
Ulmer, Marlin W, Dirk C Mattfeld, Ninja Soeffker. 2016b. Dynamic multi-period vehicle routing: approxi-
mate value iteration based on dynamic lookup tables. Tech. rep., Technische Universität Braunschweig,
Germany.
Ulmer, Marlin W, Barrett W Thomas. 2016. Estimating arrival times for service vehicle routing with stochas-
tic and dynamic requests. Working Paper.
Voccia, Stacy A, Ann Melissa Campbell, Barrett W Thomas. 2015. The same-day delivery problem for
online purchases. Tech. rep., University of Iowa.
Yahoo! Finance. 2016. Increased value-added services expected to boost the same-day delivery market in
the US, says Technavio. Yahoo! Finance. Available from http://finance.yahoo.com/news/
increased-value-added-services-expected-204000803.html, accessed on July 14, 2016.
Yang, W., K. Mathur, R. Ballou. 2000. Stochastic vehicle routing problem with restocking. Transportation
Science 34(1) 99–112.
Appendix
In the Appendix, we present instance generation details, the PDR algorithm, and the results and
parameters for every instance setting.
A.1 Instance Generation Details
In the following, we describe how the realizations for the computational evaluation are generated.
The number of customers and the order times for a realization are generated by a Poisson process
P. With c0 the expected number of IOs, the number of IOs is generated by P(c0). The spatial
and temporal probability distribution for order times and locations is divided into two independent
probability distributions. The times of SO occurrences are (discretely) uniformly distributed t ∼
UZ[1, tmax − 1]. Customer locations f(C) ∈ A are realizations f ∼ F of the spatial probability
distribution F : A → [0, 1]. A realization of the order times is again conducted by a Poisson
process P for every minute 0 < t < tmax. Given two points of time 0 < tj < th < tmax, this results
in an expected number of customers c_{tj}^{th} = E_{ω∈Ω} |{C_i^ω ∈ C_+^ω : tj < ti ≤ th}| ordering
in times tj < ti ≤ th, as described in Equation (A1):

c_{tj}^{th} = dod · c · (th − tj) / (T − 2)    (A1)
A.2 Preemptive Depot Returns: Algorithm
This section presents a detailed algorithm for the PDR routing heuristic that was described in §4.3.
Algorithm 2: Preemptive Depot Returns
Input : Tour θk = (Pk, Cl, . . . , Cl, D, Cn, Cl, Cn, . . . , Cl, D), New orders Cr = C1r, . . . , Chr
Output : New tour θ
1 θ ← ∅
2 // Remove Depot
3 for all θik, i = 1, . . . , |θk| − 1 do
4 if θik ≠ D then θ ← θ ∪ θik
5 end
6 θ ← θ ∪ D
7 // Integrate Orders
8 while Cr 6= ∅ do
9 δ ←M
10 for all Ci, θj , i = 1, . . . , |Cr|, j = 1, . . . , |θ| − 1 do
11 if d(θj , Ci) + d(Ci, θj+1)− d(θj , θj+1) ≤ δ then
12 C∗ ← Ci, θ∗ ← θj
13 δ ← d(θj , Ci) + d(Ci, θj+1)− d(θj , θj+1)
14 end
15 end
16 θ ← Insert(θ, θ∗, C∗), Cr ← Cr\C∗
17 end
18 // Integrate Depot
19 δ ←M , j ← 0
20 while θj ∉ Cn ∪ Cr ∪ {D} do
21 if d(θj ,D) + d(D, θj+1)− d(θj , θj+1) ≤ δ then
22 θ∗ ← θj , δ ← d(θj ,D) + d(D, θj+1)− d(θj , θj+1)
23 end
24 j ← j + 1
25 end
26 θ ← Insert(θ, θ∗,D)
27 return θ
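For readers who prefer executable code, the following Python transcription of Algorithm 2 is a sketch under our own naming: `dist` is the travel-time function, stops are arbitrary hashable labels, `depot` plays the role of D, and `must_follow_depot` stands in for the set Cn of assigned but not-yet-loaded customers. It assumes the tour starts at the current vehicle position and ends at the depot.

```python
import math

def cheapest_insertion_pdr(tour, new_orders, dist, depot="D", must_follow_depot=()):
    # tour: current tour, starting at the vehicle position and ending at the
    # depot; the depot may also appear mid-tour from an earlier planned return.
    # new_orders: requests whose packages must first be picked up at the depot.

    # Lines 1-6: remove interior depot visits, keep the final depot stop.
    route = [s for s in tour[:-1] if s != depot] + [depot]

    # Lines 8-17: insert each new order at its overall cheapest position.
    pending = list(new_orders)
    while pending:
        best_delta, best_order, best_pos = math.inf, None, None
        for c in pending:
            for j in range(len(route) - 1):
                delta = (dist(route[j], c) + dist(c, route[j + 1])
                         - dist(route[j], route[j + 1]))
                if delta < best_delta:
                    best_delta, best_order, best_pos = delta, c, j
        route.insert(best_pos + 1, best_order)
        pending.remove(best_order)

    # Lines 19-26: reinsert the depot at its cheapest position, restricted to
    # the prefix before the first stop that requires a depot visit.
    blocked = set(new_orders) | set(must_follow_depot) | {depot}
    best_delta, best_pos, j = math.inf, None, 0
    while route[j] not in blocked:
        delta = (dist(route[j], depot) + dist(depot, route[j + 1])
                 - dist(route[j], route[j + 1]))
        if delta < best_delta:
            best_delta, best_pos = delta, j
        j += 1
    route.insert(best_pos + 1, depot)
    return route
```

On a small Euclidean example with loaded customers and one new order, the heuristic first places the new order by cheapest insertion and then schedules the depot return at the cheapest feasible point before that order is served.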
A.3 Detailed Results
In this section, we present the detailed results for the computational experiments discussed in
§6. The first table presents the average number of assignments for APDR and each benchmark
over all realizations for each instance setting. The second table presents the average number of
depot returns, the average time required for a depot return, and the initial free time budget over all
realizations for each instance setting.
Table A1: Average Number of Assignments
Distribution U 2C 3C
dod Depot c APDR P2 P1 H3 H2 H1 APDR P2 P1 H3 H2 H1 APDR P2 P1 H3 H2 H1
0.25 D1 30 2.7 2.7 2.7 2.4 2.2 2.3 4.6 4.6 4.6 4.5 4.5 4.5 4.5 4.5 4.5 4.6 4.5 4.5
0.5 D1 30 6.4 6.3 6.1 6 5.5 5.6 8.6 8.6 8.7 8.5 8.5 8.5 8.3 8.2 8.3 8.3 8.3 8.3
0.75 D1 30 10.6 10.5 10.1 10.3 10.2 9.7 12.4 12.3 12.5 12.3 12.1 12.1 12.2 12.2 12.2 12.4 12.2 12.2
0.25 D2 30 1.1 1 1.1 1 0.9 1 3.2 3.1 3.1 3.3 3.3 3.3 2.6 2.5 2.6 2.8 2.6 2.8
0.5 D2 30 3.8 3.5 3.6 3.8 3.3 3.5 6.5 6.3 6.3 6.9 6.8 6.7 5.6 5.3 5.4 6.4 6.3 6.1
0.75 D2 30 7.3 6.9 7 7.8 7.7 7.2 10 9.6 9.8 10.6 10.4 10.5 9 8.8 8.8 9.7 9.6 9.5
0.25 D3 30 1.1 1.1 1.1 1 0.9 1 2.2 2.2 2.2 2.2 2 2.1 3.2 3.1 3.2 3.5 3.4 3.4
0.5 D3 30 3.8 3.5 3.6 3.8 3.3 3.5 4.7 4.5 4.6 5.3 5.2 5.1 6.4 6 6.1 6.9 6.8 6.7
0.75 D3 30 7.3 7 7 7.8 7.7 7.2 8.1 7.9 8 8.8 8.8 8.6 10.1 9.6 9.6 10.6 10.2 10.3
0.25 D1 40 1.5 1.4 1.4 1.1 1 1 4.9 4.8 4.9 4.8 4.6 4.7 4.4 4.4 4.4 4.4 3.8 4.3
0.5 D1 40 5.9 5.6 5.4 5 4.4 4.3 9.6 9.3 9.6 9.8 9.5 9.6 9 8.8 8.9 9.3 8.8 9
0.75 D1 40 11.5 11.2 10.6 10.9 10.6 9.5 14.6 14.4 14.8 14.7 14 14.2 14.1 13.8 13.9 14.3 13.9 13.9
0.25 D2 40 0.4 0.4 0.4 0.4 0.4 0.4 3.2 3.1 3.1 3.3 3.2 3.2 2.2 2.1 2.2 2.2 1.9 2.1
0.5 D2 40 3 2.6 2.8 2.8 2.3 2.5 7.3 6.7 6.9 7.8 7.6 7 5.6 5 5.4 6.3 5.8 5.6
0.75 D2 40 7.7 7.1 7.2 8.1 7.7 6.6 11.8 10.9 11.3 12.6 12.4 12.3 10.2 9.7 9.9 11.5 11.4 10.7
0.25 D3 40 0.4 0.4 0.4 0.3 0.3 0.3 1.7 1.6 1.7 1.1 1 1.1 2.8 2.5 2.7 3.1 2.5 2.9
0.5 D3 40 2.9 2.6 2.8 2.8 2.3 2.5 5.2 4.9 4.9 4.9 4.6 4.5 6.8 6 6.3 7.6 7.3 6.7
0.75 D3 40 7.8 7.3 7.3 8.3 7.9 6.7 9.2 8.8 8.8 10.2 10 9.3 11.7 10.7 10.9 12.5 12.1 11.6
0.25 D1 50 0.4 0.4 0.4 0.3 0.3 0.3 4.6 4.4 4.5 4.2 3.4 4 3.7 3.6 3.6 3.2 2.6 3
0.5 D1 50 4.6 4.2 3.9 3.5 3 2.8 10 9.6 9.9 10.1 9.3 9.3 9.3 8.9 8.9 9.3 7.9 8.3
0.75 D1 50 12.2 11.7 10.8 11.4 10.5 8.9 16.4 15.9 16.3 16.3 15.8 15.8 15.3 14.8 15 15.7 15.4 14.8
0.25 D2 50 0.1 0.1 0.1 0.1 0.1 0.1 2.7 2.4 2.6 2.9 2.2 2.8 1.3 1.1 1.2 1.2 1 1.1
0.5 D2 50 1.8 1.7 1.7 1.7 1.4 1.5 7.4 6.4 6.7 7.8 7.6 6.5 5.4 4.6 5.1 5.5 4.7 4.5
0.75 D2 50 7.8 7.2 7.3 8.1 7.5 6 13.2 11.7 12.4 13.7 13.4 12.7 11.2 10.4 10.7 12.6 12.5 10.7
0.25 D3 50 0.1 0.1 0.1 0.1 0.1 0.1 0.8 0.7 0.8 0.3 0.2 0.3 2 1.6 1.8 2.1 1.5 2
0.5 D3 50 1.8 1.6 1.7 1.7 1.4 1.5 4.9 4.4 4.5 3.4 3.1 3.1 6.7 5.4 5.8 7.4 6.2 5.8
0.75 D3 50 8 7.4 7.3 8.1 7.4 5.9 10 9.6 9.4 10.6 10.3 8.6 12.7 10.9 11.5 13.5 13.3 11.5
0.25 D1 60 0.1 0.1 0.1 0 0 0 3.9 3.8 3.8 3 2.2 2.8 2.5 2.3 2.3 1.6 1.3 1.6
0.5 D1 60 2.6 2.3 2.3 1.8 1.6 1.5 10 9.6 9.6 9.8 8.4 8.3 9.1 8.5 8.5 8.4 6.3 6.9
0.75 D1 60 12.2 11.4 10.1 10.7 9.3 7.6 17.7 17 17.3 17.6 17.1 16.9 16.4 15.7 15.7 16.7 16.2 15
0.25 D2 60 0 0 0 0 0 0 1.7 1.4 1.6 1.8 1.3 1.8 0.6 0.5 0.6 0.5 0.4 0.5
0.5 D2 60 1 0.9 0.9 0.9 0.8 0.8 7.3 5.9 6.4 7.9 6.6 6.4 4.6 3.8 4.2 4.3 3.2 3.5
0.75 D2 60 7.6 6.9 6.8 7.5 6.6 5.1 14.3 12.3 13.1 14.7 14.4 11.8 11.4 10.4 11 12.7 12.2 9.5
0.25 D3 60 0 0 0 0 0 0 0.2 0.1 0.2 0 0 0 0.9 0.7 0.8 0.9 0.6 0.9
0.5 D3 60 0.8 0.7 0.8 0.7 0.6 0.7 3.9 3.3 3.5 1.9 1.6 1.7 6 4.4 4.9 6.7 4.3 5.2
0.75 D3 60 7.6 6.9 6.9 7.5 6.3 5.2 10.7 10.2 10.2 10.3 10 7.5 13.5 11.4 11.9 14.4 14.2 10.9
0.25 D1 80 0 0 0 0 0 0 1.4 1.3 1.3 0.5 0.4 0.5 0.3 0.3 0.3 0.1 0.1 0.1
0.5 D1 80 0.4 0.4 0.4 0.3 0.3 0.2 9.2 8.5 7.9 7.2 4.8 5.2 6.4 5.4 5.4 4.4 3 3.4
0.75 D1 80 11.2 10.1 8.5 8.7 7.2 5.6 19.3 18 17.9 18.7 17.6 15.7 17.7 16.5 16.1 17.2 15.8 12.7
0.25 D2 80 0 0 0 0 0 0 0.2 0.2 0.2 0.2 0.1 0.2 0 0 0 0 0 0
0.5 D2 80 0.1 0.1 0.1 0.1 0.1 0.1 4.8 3.6 4.1 5.1 3.1 4.4 2.1 1.9 2 1.8 1.2 1.6
0.75 D2 80 5.7 5.1 5.1 5.3 4.1 3.8 14.9 12.5 13.5 15.8 15.5 9.6 11.3 10.3 10.8 11.4 10.8 7.1
0.25 D3 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0.5 D3 80 0.1 0.1 0.1 0.1 0.1 0.1 1 0.9 0.9 0.3 0.2 0.3 2.9 2.3 2.5 2.9 1.7 2.4
0.75 D3 80 5.5 4.9 4.9 5.1 3.9 3.6 11.3 10.5 10.1 8.5 7.8 5.8 13.8 11.2 11.9 15.3 14.5 8.8
0.25 D1 100 0 0 0 0 0 0 0.1 0.1 0.1 0 0 0 0 0 0 0 0 0
0.5 D1 100 0 0 0 0 0 0 6.2 5.5 4.9 3.1 2 2.4 2.8 2.3 2.4 1.4 1.1 1.2
0.75 D1 100 7.9 7 5.7 5.3 4.6 3.5 20.2 18.3 17.2 18.4 17.4 12 17.8 15.6 14.9 15.7 14.1 9.8
0.25 D2 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0.5 D2 100 0 0 0 0 0 0 1.8 1.5 1.7 1.6 1.1 1.4 0.4 0.4 0.4 0.4 0.3 0.3
0.75 D2 100 3.5 3 3 3.1 2.4 2.3 14.2 11.9 12.9 15.7 14.3 8.3 10.4 9.4 9.9 9.7 8.7 5.6
0.25 D3 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0.5 D3 100 0 0 0 0 0 0 0.1 0.1 0.1 0 0 0 0.6 0.5 0.6 0.5 0.4 0.5
0.75 D3 100 3.6 3.1 3.2 3.2 2.4 2.4 10.4 9.2 8.5 5.7 5.3 4.2 12.3 9.9 10.7 13.4 11.3 7
Table A2: Parameters for APDR
Distribution U 2C 3C
dod Depot c Depot Visits Detour δ Budget b(0) Depot Visits Detour δ Budget b(0) Depot Visits Detour δ Budget b(0)
0.25 D1 30 2.4 18.7 95.3 3.7 23.4 207.2 3.6 21.8 189.2
0.5 D1 30 3.4 21.7 173.2 4.5 25.3 255.7 4.4 24 240.9
0.75 D1 30 4.2 23.1 268.3 5 26.6 311.9 5 24.7 308.8
0.25 D2 30 1.5 31 76.4 2.6 52.9 181.4 2.2 63.8 155.6
0.5 D2 30 2.2 56.3 148.2 3.2 50.7 228.3 2.7 66.4 210.3
0.75 D2 30 2.6 61.7 235.7 3.4 50.2 277.1 3 66.8 271.9
0.25 D3 30 1.5 31 77.3 1.9 78.2 150.4 2.5 51.5 162
0.5 D3 30 2.2 56.8 148 2.2 88.8 201 3.1 50.5 215.8
0.75 D3 30 2.6 61.6 236.6 2.6 88.1 257.7 3.4 51.9 281.8
0.25 D1 40 1.3 9.1 38.5 3.4 22.7 160.7 3.1 21.2 136.6
0.5 D1 40 2.8 18.8 121 4.2 23.9 223.6 4.1 22.3 205.5
0.75 D1 40 3.8 22 232.8 4.8 25.8 292.1 4.7 24.1 284.9
0.25 D2 40 0.8 10.4 28.1 2.5 43.8 136.9 2 48.7 107.7
0.5 D2 40 1.8 42.6 99.6 3.1 45.7 195.8 2.5 60.7 171
0.75 D2 40 2.4 56.5 202.3 3.4 46.8 257.8 2.9 62.8 248.8
0.25 D3 40 0.8 9.8 27.1 1.6 55.2 103.4 2.2 42.4 110.8
0.5 D3 40 1.8 43.1 100.9 2 85.5 165.8 2.8 46.9 177.7
0.75 D3 40 2.4 56 205.6 2.4 89.5 235.6 3.3 47.9 255.2
0.25 D1 50 0.4 2.5 9.3 2.9 18.9 118.5 2.5 17.4 90.5
0.5 D1 50 2.1 14.6 77.1 3.9 21.5 191.8 3.7 20.3 168.7
0.75 D1 50 3.5 20.5 203.5 4.5 24.1 273.1 4.5 22.8 260.2
0.25 D2 50 0.2 1.5 5.5 2.1 37.6 94.1 1.5 28.9 64.2
0.5 D2 50 1.4 24.8 56.9 2.8 43.9 167.5 2.2 54.6 138.6
0.75 D2 50 2.3 53.3 176.9 3.4 43.9 243.1 2.8 57.9 228.4
0.25 D3 50 0.2 1.6 5.4 1.2 24.3 61.6 1.6 29.1 66.9
0.5 D3 50 1.4 24.7 57.4 1.8 76.1 133.5 2.4 45.4 145.5
0.75 D3 50 2.3 52.2 177.5 2.1 91.9 217.3 3.2 44.8 236.7
0.25 D1 60 0.1 0.3 1.3 2.4 13.5 78.4 1.7 11.1 47.6
0.5 D1 60 1.2 8.8 38.1 3.4 19.1 161.5 3.2 18.8 136.4
0.75 D1 60 3.2 19 172.4 4.3 23 255.3 4.3 22.6 241.3
0.25 D2 60 0 0.2 0.6 1.4 26.2 57.6 0.9 11.5 29.4
0.5 D2 60 0.9 11.4 28.8 2.4 42.4 136.9 2 45.3 107.9
0.75 D2 60 2.1 49.7 147.8 3.2 42.3 228.1 2.6 56.9 208.4
0.25 D3 60 0 0.1 0.5 0.7 5.2 25.3 1 14.2 32.9
0.5 D3 60 0.9 11.7 28.4 1.5 56.5 101.8 2 42 112.5
0.75 D3 60 2.1 49 149.3 2 91 200.7 2.9 43.8 216.4
0.25 D1 80 0 0 0 0.8 3.2 17.8 0.3 1.4 5.1
0.5 D1 80 0.3 1.6 6.3 2.7 13.6 104.5 2.1 13.7 73
0.75 D1 80 2.5 16.2 121.3 3.8 21.5 222.6 3.7 20.3 203.9
0.25 D2 80 0 0 0 0.4 3.4 9 0.1 0.4 1.8
0.5 D2 80 0.2 1.1 3.1 1.7 34.3 81.6 1.3 24.6 51
0.75 D2 80 1.8 38.8 100.5 2.8 40.7 198 2.3 51.7 173.2
0.25 D3 80 0 0 0 0.1 0 1.3 0.1 0.7 2.3
0.5 D3 80 0.1 0.8 2.5 1 15 46.1 1.3 24.7 53.3
0.75 D3 80 1.8 38.3 99.5 1.8 81.5 164.7 2.4 42.9 177.2
0.25 D1 100 0 0 0 0.1 0.1 0.9 0 0.1 0.2
0.5 D1 100 0 0.1 0.3 1.7 7 54 1 6.6 28.7
0.75 D1 100 1.7 12.9 76.4 3.4 19.4 190.7 3.2 18.4 170.1
0.25 D2 100 0 0 0 0 0.1 0.4 0 0 0
0.5 D2 100 0 0 0 1.1 15.8 35.5 0.5 5.8 14.5
0.75 D2 100 1.4 24.4 57.6 2.4 39.6 168.2 2.1 46.2 139.6
0.25 D3 100 0 0 0 0 0 0 0 0 0
0.5 D3 100 0 0.1 0.1 0.4 1.4 12.3 0.5 6.2 14.7
0.75 D3 100 1.4 24.6 60.4 1.6 70.6 132.9 2 40.9 144.1