Discrete MDP problem with Admission and Inventory Control in Service
Facility Systems
C. Selvakumar, P. Maheswari, and C. Elango
Research Department of Mathematics, Cardamom Planters’ Association College,
Bodinayakanur- 625 513.
E-mail: [email protected]
Abstract
In this article, we study a discrete-time MDP model of a service facility system
maintaining inventory. Decisions are taken at discrete time epochs to control both admission
to the service facility and inventory replenishment. The queue before the server is divided
into an eligible queue and a potential queue, and a control mechanism transfers customers
from the potential queue to the eligible queue. An MDP based on the average cost criterion is
used to find the optimal policy to be implemented for the system. A numerical example is
provided to illustrate the problem vividly.
Keywords:
Markov Decision Processes, Inventory Control, Admission Control, Service Facility
System, Average Cost Criteria.

1. Introduction
The Markov decision model is a versatile and powerful tool for analyzing probabilistic sequential
decision processes with an infinite planning horizon. The model is a fusion of two concepts:
the Markov process and dynamic programming. The dynamic programming (DP) concept was
developed by Bellman in the early 1950s. The basic principles of DP are states, the principle of
optimality and functional equations.
At much the same time as Bellman (1957) developed the theory of dynamic
programming, Howard (1960) used basic principles of Markov chain theory and dynamic
programming to develop a policy-iteration algorithm for Markov decision process
problems, that is, sequential decision processes with an infinite planning horizon. Theoretical
foundations for Howard's policy-iteration method were developed by Blackwell (1962),
Denardo and Fox (1968) and Veinott (1966). The linear programming method for Markov
decision models was first given by De Ghellinck (1960) and Manne (1960), and developed
further by Derman (1970) and Hordijk and Kallenberg (1979, 1984). Another method for
solving MDP problems is the value-iteration algorithm, developed by Odoni (1969) and
Hastings (1971), who derived lower and upper bounds for the minimal average cost.
International Journal of Computational and Applied Mathematics. ISSN 1819-4966 Volume 12, Number 1 (2017) © Research India Publications http://www.ripublication.com
The Markov decision model finds applications in a wide variety of fields. Important
applications in the field of machine maintenance were made in the 1980s (Golabi et al. (1982),
Kawai (1983), Stengos and Thomas (1980) and Tijms and Van der Duyn Schouten (1985)). A
survey of real applications of MDP models can be found in White (1985).
In this article, we consider a discrete-time MDP for a service facility system in which
inventory is maintained to complete the service. The arrival of customers to the system is
controlled by taking decisions at discrete decision epochs. Demand for service exists
throughout each period, and arriving customers wait in a queue while a customer is at the
counter for service.
The revenue/cost structure and the demand distribution are constant throughout the time
period. The maximum inventory of the system is assumed to be M. In the last section a
numerical example is provided to illustrate the model.
2. Model description
(i) The system is observed every $t_0$ units of time and the decision epochs are $0, t_0, 2t_0, \ldots$
(ii) Admission to the service facility is controlled by splitting the queue into an eligible queue
and a potential queue. At each decision epoch the controller observes the number of items in
stock and the number of customers in the system (eligible queue + server).
(iii) Number of customers admitted at epoch $t$ = number of items in stock $-$ number of
customers in the eligible queue at time $t$. Other customers are rejected.
(iv) Arrivals to the service facility system follow a probability distribution $g$, and
arriving customers are placed in the potential customer queue.
(v) Only the eligible queue (main queue) customers get service.
(vi) No partial service completion is allowed during any period.
(vii) Every serviced customer takes one unit of item from inventory and departs the system at
the end of the period.
(viii) The $(M-1, M)$ policy is adopted for replenishing inventory (one-for-one policy).
Replenishment is instantaneous.
(ix) The decision to order additional stock is made at the beginning of each period and delivery
occurs instantaneously.
Let $X_t$ denote the number of customers in the system immediately prior to the decision
epoch $t$ and $Z_t$ the number of customers arriving in period $t$. Customers arriving in
period $t-1$ enter the potential customer queue. At decision epoch $t$ the controller admits
$u_t \le I_t - X_t$ customers from the potential customer queue into the system. Let $Y_t$ denote
the number of "possible service completions" during period $t$, and let $I_t$ denote the number of
items in stock at time epoch $t$.
Time | Potential Queue | System
$t$ | $Z_{t-1}$ | $X_t$
$t^+$ | $0$ | $X_t + u_t$
$t+1$ | $Z_t$ | $X_{t+1}$

Hence $t^+$ denotes a point in time immediately after the control has been implemented but
prior to any service completions.
The system state is denoted by the pair $(X_t, I_t)$.
The two components of the system state evolve as
\[
X_{t+1} = \begin{cases}
X_t + Z_t - Y_t, & \text{if } Z_t \le I_t - X_t,\\
I_t - Y_t, & \text{if } Z_t > I_t - X_t,
\end{cases}
\qquad
I_{t+1} = I_t - Y_t.
\]
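As an illustration, the one-period update above can be sketched in code. The function below is a minimal sketch assuming the dynamics as reconstructed here (all waiting customers are admitted when they fit within stock, otherwise admission is capped at the stock level), with both components truncated at zero; the function name `step` is ours, not the paper's.

```python
def step(x, inv, z, y):
    """One period of the (X_t, I_t) dynamics.

    x   -- customers in the eligible queue + service (X_t)
    inv -- items in stock (I_t)
    z   -- customers waiting in the potential queue (Z_t)
    y   -- possible service completions (Y_t)
    """
    if z <= inv - x:            # all waiting customers fit within stock
        x_next = x + z - y
    else:                       # admission capped at the stock level
        x_next = inv - y
    x_next = max(x_next, 0)     # no negative queue length
    inv_next = max(inv - y, 0)  # each completed service consumes one item
    return x_next, inv_next
```

For example, `step(2, 4, 1, 2)` admits the single waiting customer and completes two services, leaving one customer and two items.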
We can admit only $u_t \le I_t - X_t$ customers, so that $0 \le u_t \le I_t$.
The random variable $Y_t$ assumes non-negative integer values and follows a time-invariant
probability distribution $f(n) = \Pr\{Y_t = n\}$, $n = 0, 1, 2, \ldots$, and $Z_t$ assumes
non-negative integer values following a time-invariant probability distribution
$g(n) = \Pr\{Z_t = n\}$, $n = 0, 1, 2, \ldots$
Reward/cost structure:
The stationary cost structure consists of three components: a constant cost of $R$ units
for every completed service, an expected holding cost $h(x)$ per period when there are $x$
items in inventory, and a waiting cost $k(y)$ per period when there are $y$ customers in the
system.
3. MDP formulation
We consider an MDP having the five components (tuples) $\{T, S, A_s, p_t(\cdot \mid s, a), r_t\}$.
Decision Epochs:
$T = \{0, t_0, 2t_0, \ldots\}$
States:
$S_1$ - the number of customers in the system
$S_2$ - the number of items in stock.
$S = S_1 \times S_2$, where $S_1 = \{0, 1, 2, \ldots, N\}$, $S_2 = \{0, 1, 2, \ldots, M\}$ and $N \le M$.
Actions:
$A_s = \{(a_1, a_2)\}$, where $a_1$ is the admission decision and $a_2$ the replenishment decision:
$A_{(0,0)} = \{(2, 2)\}$
$A_{(0, s_2)} = \{(2, 0), (2, 1)\}$, $1 \le s_2 \le M - 1$
$A_{(0, M)} = \{(2, 0)\}$
$A_{(s_1, s_2)} = \{(0, 0), (0, 1), (1, 0), (1, 1)\}$, $1 \le s_1 \le N - 1$, $1 \le s_2 \le M - 1$
$A_{(s_1, M)} = \{(0, 0), (1, 0)\}$, $1 \le s_1 \le N - 1$
$A_{(N, M)} = \{(0, 0)\}$.
Cost:
\[
c_t(s, a) = R\, E[\min(Y_t, s_1 + a_1)] + h(s_2 + a_2) + k(s_1 + a_1), \quad a = (a_1, a_2) \in A_s,\; s = (s_1, s_2) \in S.
\]
Transition Probability:
\[
P(s' \mid s, a) = \begin{cases}
f(s_1 + a_1 - s_1')\, g(s_2') & \text{if } 0 < s_1' \le s_1 + a_1,\\[2pt]
\sum_{i = s_1 + a_1}^{\infty} f(i)\, g(s_2') & \text{if } s_1' = 0,\; s_1 + a_1 > 0,\\[2pt]
g(s_2') & \text{if } s_1' = s_1 + a_1 = 0,\\[2pt]
0 & \text{if } s_1' > s_1 + a_1,
\end{cases}
\]
where $s = (s_1, s_2)$, $s' = (s_1', s_2')$.
The expected number of service completions in period $t$ is
\[
E[\min(Y_t, s_1 + a_1)] = \sum_{i=1}^{s_1 + a_1 - 1} i\, f(i) + (s_1 + a_1) \sum_{i = s_1 + a_1}^{\infty} f(i).
\]
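The identity above can be checked numerically for any finite service distribution $f$. The snippet below compares the direct expectation of $\min(Y_t, n)$ with the two-sum form, using an arbitrary illustrative distribution (the particular values of `f` and `n` are ours, chosen only for the check).

```python
# Illustrative service-completion distribution f(i), i = 0..4 (sums to 1).
f = [0.1, 0.2, 0.3, 0.25, 0.15]
n = 3  # plays the role of s1 + a1

# Direct expectation of min(Y, n).
direct = sum(min(i, n) * p for i, p in enumerate(f))

# Two-sum form: sum_{i=1}^{n-1} i f(i) + n * sum_{i >= n} f(i).
formula = sum(i * f[i] for i in range(1, n)) + n * sum(f[i] for i in range(n, len(f)))

print(direct, formula)  # both approximately 2.0
```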
4. Analysis
The one-step costs are given by $c_t(s, a)$, $s = (s_1, s_2)$.
Let $(X_t, I_t)$ denote the state of the system at decision epoch $t$ (beginning of the $t$-th
period). Assume a stationary policy $R$; then the transition probabilities
\[
p(s' \mid s, a) = \Pr\{(X_{t+1}, I_{t+1}) = s' \mid (X_t, I_t) = s\}, \quad s = (s_1, s_2),\; s' = (s_1', s_2'),
\]
are independent of the past history of the system up to time epoch $t$.
Then $\{(X_t, I_t) : t \ge 0\}$ is a Markov chain with discrete state space $S = S_1 \times S_2$. The $t$-step
transition probabilities of the Markov chain under policy $R$ are given by
\[
p_t(s' \mid s, R) = \Pr\{(X_t, I_t) = s' \mid (X_0, I_0) = s\}, \quad s, s' \in S.
\]
Let $V_t(s, R)$, $s = (s_1, s_2)$, denote the total expected cost over the first $t$ decision epochs
with initial state $s$ when policy $R$ is adopted.
Then
\[
V_t(s, R) = \sum_{k=0}^{t-1} \sum_{s' \in S} p_k(s' \mid s, R)\, c_{s'}(R),
\]
where
$c_s(R)$ = service cost per period + holding cost of inventory per period + waiting cost of
customers per period
$= C_1 K + h I + C_2 L$,
where $K$ denotes the number of customers served per period, $I$ the average inventory in
stock during the $t$-th period, and $L$ the number of customers in the eligible queue plus the
one at the service counter.
5. Cost Analysis
The average cost function $g_s(R)$ is given by
\[
g_s(R) = \lim_{t \to \infty} \frac{1}{t} V_t(s, R), \quad s = (s_1, s_2) \in S.
\]
The existence of this limit follows from the theorems in Puterman (1994) and Tijms (2003).
Theorem 5.1
For all $s = (s_1, s_2)$, $s' = (s_1', s_2') \in S$, the limit
\[
\lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t} p_k(s' \mid s, R)
\]
always exists, and for any $s' = (s_1', s_2') \in S$,
\[
\lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t} p_k(s' \mid s) =
\begin{cases}
\dfrac{1}{\mu_{s'}} & \text{if state } s' \text{ is recurrent},\\[4pt]
0 & \text{if state } s' \text{ is transient},
\end{cases}
\]
where $\mu_{s'}$ denotes the mean recurrence time of state $s' = (s_1', s_2')$ to itself.
Also
\[
\lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t} p_k(s' \mid s) = f_{s s'} \cdot \frac{1}{\mu_{s'}},
\]
where $f_{s s'}$ denotes the probability that the chain ever reaches $s'$ starting from $s$.
Since the Markov chain $\{(X_t, I_t) : t = 0, 1, 2, \ldots\}$ is a unichain which is irreducible,
all its states are ergodic and have a unique equilibrium distribution.
Thus,
\[
\pi_{s'}(R) = \lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t} p_k(s' \mid s, R), \quad s, s' \in S,
\]
exists and is independent of the initial state $s$, and $\pi = (\pi_s)_{s \in S}$ satisfies
$\pi P = \pi$ and $\sum_{s \in S} \pi_s = 1$.
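For a concrete instance, the equilibrium equations $\pi P = \pi$, $\sum_s \pi_s = 1$ can be solved directly. The sketch below uses the admission-control transition matrix from the numerical example in Section 8 (states ordered 5, 4, 3, 2, 1, 0), in which only states 1 and 0 are recurrent.

```python
import numpy as np

# Admission-control transition matrix from Section 8, states ordered 5,4,3,2,1,0.
P = np.array([
    [0.2, 0.55, 0.25, 0.00, 0.00, 0.00],
    [0.0, 0.15, 0.65, 0.20, 0.00, 0.00],
    [0.0, 0.00, 0.25, 0.60, 0.15, 0.00],
    [0.0, 0.00, 0.00, 0.15, 0.75, 0.10],
    [0.0, 0.00, 0.00, 0.00, 0.15, 0.85],
    [0.0, 0.00, 0.00, 0.00, 0.90, 0.10],
])
n = P.shape[0]
# Stack the balance equations pi (P - I) = 0 with the normalization sum(pi) = 1.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi.round(6))
```

Only the last two entries come out positive (18/35 ≈ 0.514286 on state 1 and 17/35 ≈ 0.485714 on state 0); the transient states 5, ..., 2 receive limiting probability zero, in line with Theorem 5.1.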
6. Optimal Policy
A stationary policy $R^*$ is said to be average cost optimal if
$g_{(s_1, s_2)}(R^*) \le g_{(s_1, s_2)}(R)$ for each stationary policy $R$, uniformly in the initial state $(s_1, s_2)$.
The relative values associated with a given policy $R$ provide a tool for constructing a
new policy $R^*$ whose average cost is no more than that of the current policy $R$.
The objective is to improve the given policy $R$, whose average cost is $g(R)$ and
relative values are $v_s(R)$, $s = (s_1, s_2) \in S$, by constructing a new policy $R^*$ such that
for each $s = (s_1, s_2) \in S$,
\[
c_s(R^*) - g(R) + \sum_{s' \in S} p(s' \mid s, R^*)\, v_{s'}(R) \le v_s(R), \qquad (1)
\]
where $s = (s_1, s_2)$ and $s' = (s_1', s_2')$.
We then obtain an improved policy $R^*$ with $g(R^*) \le g(R)$. To find the action $R^*_s$
satisfying (1), we minimize the cost function
\[
c_s(a) - g(R) + \sum_{s' \in S} p_t(s' \mid s, a)\, v_{s'}(R)
\]
over all actions $a \in A(s)$.
7. Algorithm
Step 0: (Initialization)
Choose a stationary policy $R$ for the periodic-review admission control in the
service facility system maintaining inventory.
Step 1: (Value determination step)
For the current policy $R$, compute the unique solution $\{g(R), v_s(R)\}$ of the following
linear equations:
\[
v_s = c_s(R) - g(R) + \sum_{s' \in S} p_t(s' \mid s, R)\, v_{s'}, \quad s = (s_1, s_2) \in S,
\]
\[
v_{\bar{s}} = 0, \quad \text{where } \bar{s} = (\bar{s}_1, \bar{s}_2) \text{ is an arbitrarily chosen state in } S.
\]
Step 2: (Policy improvement)
For each state $s = (s_1, s_2) \in S$, determine an action $a_s$ yielding
\[
\min_{a \in A(s)} \Big[ c_s(a) - g(R) + \sum_{s' \in S} p_t(s' \mid s, a)\, v_{s'}(R) \Big].
\]
The new stationary policy $R^*$ is obtained by choosing $R^*_s = a_s$.
Step 3: (Convergence test)
If the new policy $R^* = R$, the old one, then the search stops with policy $R$.
Otherwise go to Step 1 with $R$ replaced by the new $R^*$.
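Steps 0-3 can be sketched as a generic average-cost policy-iteration routine for any finite MDP. The routine below is a minimal sketch; the two-state problem at the end is a hypothetical toy example of ours, not the service facility model of this paper.

```python
import numpy as np

def policy_iteration(P, c, tol=1e-9):
    """Average-cost policy iteration (Howard).

    P[s][a] -- transition row out of state s under action a (length-n list)
    c[s][a] -- one-step cost in state s under action a
    Returns the average cost g and an optimal stationary policy.
    """
    n = len(P)
    policy = [sorted(P[s])[0] for s in range(n)]  # Step 0: any stationary policy
    while True:
        # Step 1: value determination -- solve v_s = c_s - g + sum p(s'|s) v_s'
        # with normalization v_0 = 0; unknown vector is (g, v_0, ..., v_{n-1}).
        A = np.zeros((n + 1, n + 1))
        b = np.zeros(n + 1)
        for s in range(n):
            A[s, 0] = 1.0                                   # coefficient of g
            A[s, 1:] = np.eye(n)[s] - np.asarray(P[s][policy[s]])
            b[s] = c[s][policy[s]]
        A[n, 1] = 1.0                                       # v_0 = 0
        x = np.linalg.solve(A, b)
        g, v = x[0], x[1:]
        # Step 2: policy improvement, keeping the old action on ties.
        new_policy = []
        for s in range(n):
            best = policy[s]
            best_val = c[s][best] + np.dot(P[s][best], v)
            for a in P[s]:
                val = c[s][a] + np.dot(P[s][a], v)
                if val < best_val - tol:
                    best, best_val = a, val
            new_policy.append(best)
        # Step 3: convergence test.
        if new_policy == policy:
            return g, policy
        policy = new_policy

# Hypothetical two-state example: in state 0 one may 'stay' (cost 1) or
# 'switch' (cost 1) to state 1, from which a free action returns to state 0.
P = [{'stay': [1.0, 0.0], 'switch': [0.0, 1.0]}, {'back': [1.0, 0.0]}]
c = [{'stay': 1.0, 'switch': 1.0}, {'back': 0.0}]
g, policy = policy_iteration(P, c)
print(g, policy)  # average cost 0.5 with policy ['switch', 'back']
```

Keeping the current action when it already attains the minimum (the `tol` guard) prevents the iteration from cycling between equally good policies.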
8. Numerical Example:
Consider an MDP formulation of a service facility system with inventory maintenance which
controls customer admission to the system and the ordering level of inventory.
Decisions at equidistant time epochs are taken to admit the eligible number of customers by
observing the inventory level of the system.
The inventory maintained in the system is reviewed at decision epochs and refilled up to the
maximum level M. Decisions are made at each time epoch for inventory.
For the system we take $N = 5$ and $M = 5$. Let the state space be $S_1 = \{0,1,2,3,4,5\}$ and
$S_2 = \{0,1,2,3,4,5\}$, with
$S_1 \times S_2 = \{(0,0), (0,1), (0,2), (0,3), (0,4), (0,5), (1,1), (1,2), (1,3), (1,4), (1,5), (2,2),
(2,3), (2,4), (2,5), (3,3), (3,4), (3,5), (4,4), (4,5), (5,5)\}$, where $N = M$.
Admission Control:
Assume that the costs for holding at level $s_1 \in S_1$ are respectively: c4 = 3, c3 = 5, c2 = 7,
c1 = 9, c0 = cf = 10.
$s_1 \backslash s_1'$   5     4     3     2     1     0
5 0.2 0.55 0.25 0 0 0
4 0 0.15 0.65 0.20 0 0
3 0 0 0.25 0.6 0.15 0
2 0 0 0 0.15 0.75 0.1
1 0 0 0 0 0.15 0.85
0 0 0 0 0 0.9 0.1
Inventory Control:
Let us assume that the costs for ordering inventory at level $s_2 \in S_2$ are respectively:
cp4 = 5, cp3 = 4, cp2 = 3.2, cp1 = 2, cp0 = cf = 1.5.
Then Ch (holding cost) = 0.1 per item of inventory and the inventory cost = 0.3 per item.
$s_2 \backslash s_2'$   5     4     3     2     1     0
5 0.3 0.4 0.2 0.1 0 0
4 0 0.2 0.4 0.3 0.1 0
3 0 0 0.25 0.55 0.15 0.05
2 0 0 0 0.25 0.65 0.1
1 0 0 0 0 0.4 0.6
0 0 0 0 0 0 1.0
Computational Procedure:
For any given policy $R$, the policy-improvement quantity is given by
\[
T_s(a, R) = c_s(a) - g(R) + \sum_{s' \in S} p_t(s' \mid s, a)\, v_{s'}(R),
\]
where $T_s(a, R) = v_s(R)$ for $a = R_s$.
Iteration 1:
For the given policy $R^{(1)} = (0, 0, 0, 0, 0, 2)$ (admission control), the linear equations
connecting the average cost $g(R^{(1)})$ and the relative values are given by
v5 = -g + 0.2v5 + 0.55v4 + 0.25v3
v4 = -g + 0.15v4 + 0.65v3 + 0.2v2
v3 = -g + 0.25v3 + 0.6v2 + 0.15v1
v2 = -g + 0.15v2 + 0.75v1 + 0.1v0
v1 = -g + 0.15v1 + 0.85v0
v0 = 10- g + 0.9v1 + 0.1v0
By assuming v5 = 0 and solving we get
$v_5(R^{(1)}) = 0$, $v_4(R^{(1)}) = 4.687757456$, $v_3(R^{(1)}) = 9.115505026$, $v_2(R^{(1)}) = 14.58329214$,
$v_1(R^{(1)}) = 19.62530895$, $v_0(R^{(1)}) = 25.33959466$, $g(R^{(1)}) = 4.857142857$.
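The six equations above can be solved mechanically. As a check, the sketch below writes them in matrix form with unknown vector $(g, v_4, v_3, v_2, v_1, v_0)$ and $v_5$ fixed to 0, and recovers the values reported.

```python
import numpy as np

# Each row is one of the equations v5..v0, rearranged as
#   g + (I - P_R) v = one-step cost,  with v5 fixed to 0.
# Unknown vector: x = (g, v4, v3, v2, v1, v0).
A = np.array([
    [1.0, -0.55, -0.25,  0.00,  0.00,  0.00],  # v5 equation
    [1.0,  0.85, -0.65, -0.20,  0.00,  0.00],  # v4 equation
    [1.0,  0.00,  0.75, -0.60, -0.15,  0.00],  # v3 equation
    [1.0,  0.00,  0.00,  0.85, -0.75, -0.10],  # v2 equation
    [1.0,  0.00,  0.00,  0.00,  0.85, -0.85],  # v1 equation
    [1.0,  0.00,  0.00,  0.00, -0.90,  0.90],  # v0 equation (one-step cost 10)
])
b = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 10.0])
g, v4, v3, v2, v1, v0 = np.linalg.solve(A, b)
print(round(g, 9))  # 4.857142857
```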
For the given policy $R^{(1)} = (0, 0, 0, 0, 0, 2)$ (inventory control), the linear equations
connecting the average cost $m(R^{(1)})$ and the relative values are given by
w5 = -m + 0.3w5 + 0.4w4 + 0.2w3 + 0.1w2
w4 = -m + 0.2w4 + 0.4w3 + 0.3w2 + 0.1w1
w3 = -m + 0.25w3 + 0.55w2 + 0.15w1 + 0.05w0
w2 = -m + 0.25w2 + 0.65w1 + 0.1w0
w1 = -m + 0.4w1 + 0.6w0
w0 = 1.5 – m + 1.0w0
By assuming w5 = 0 and solving we get
$w_5(R^{(1)}) = 0$, $w_4(R^{(1)}) = 1.527777778$, $w_3(R^{(1)}) = 2.500000000$, $w_2(R^{(1)}) = 3.888888889$,
$w_1(R^{(1)}) = 5.555555556$, $w_0(R^{(1)}) = 8.055555556$, $m(R^{(1)}) = 1.500000000$.
The test quantity $T_s(a, R)$ for admission and inventory control takes the following values
$T_{(s_1, s_2)}((a_1, a_2), R)$, where $(s_1, s_2)$ denotes the number of customers admitted to
the system (eligible queue) and the number of items in stock respectively, and $(a_1, a_2)$
denotes the decision (action) for admission and inventory control respectively.
$T_{(0,1)}((a_1, a_2), R^{(1)})$: (2,0) = 15.65555555, (2,1) = 13.3
$T_{(0,2)}((a_1, a_2), R^{(1)})$: (2,0) = 14.08888889, (2,1) = 14.3
$T_{(0,3)}((a_1, a_2), R^{(1)})$: (2,0) = 12.80000000, (2,1) = 14.9
$T_{(0,4)}((a_1, a_2), R^{(1)})$: (2,0) = 11.92777779, (2,1) = 15.7
$T_{(1,1)}((a_1, a_2), R^{(1)})$: (0,0) = 25.28086450, (0,1) = 22.92530895, (1,0) = 14.65555556, (1,1) = 12.3
$T_{(1,2)}((a_1, a_2), R^{(1)})$: (0,0) = 23.71419784, (0,1) = 23.92530895, (1,0) = 13.08888889, (1,1) = 13.3
$T_{(1,3)}((a_1, a_2), R^{(1)})$: (0,0) = 22.42530895, (0,1) = 24.52530895, (1,0) = 11.80000000, (1,1) = 13.9
$T_{(1,4)}((a_1, a_2), R^{(1)})$: (0,0) = 21.55308674, (0,1) = 25.32530895, (1,0) = 10.92777778, (1,1) = 14.7
$T_{(2,2)}((a_1, a_2), R^{(1)})$: (0,0) = 18.67218103, (0,1) = 18.88329214, (1,0) = 11.08888889, (1,1) = 11.3
$T_{(2,3)}((a_1, a_2), R^{(1)})$: (0,0) = 17.38329214, (0,1) = 19.48329214, (1,0) = 9.800000000, (1,1) = 11.9
$T_{(2,4)}((a_1, a_2), R^{(1)})$: (0,0) = 16.51106993, (0,1) = 20.28329214, (1,0) = 8.927777779, (1,1) = 12.7
$T_{(3,3)}((a_1, a_2), R^{(1)})$: (0,0) = 11.91550503, (0,1) = 14.01550503, (1,0) = 7.800000000, (1,1) = 9.9
$T_{(3,4)}((a_1, a_2), R^{(1)})$: (0,0) = 11.04328281, (0,1) = 14.81550503, (1,0) = 6.927777779, (1,1) = 10.7
$T_{(4,4)}((a_1, a_2), R^{(1)})$: (0,0) = 6.615535235, (0,1) = 10.38775746, (1,0) = 4.927777779, (1,1) = 8.7
$T_{(1,5)}((a_1, a_2), R^{(1)})$: (0,0) = 19.62530895, (1,0) = 9
$T_{(2,5)}((a_1, a_2), R^{(1)})$: (0,0) = 14.58329214, (1,0) = 7
$T_{(3,5)}((a_1, a_2), R^{(1)})$: (0,0) = 9.115505027, (1,0) = 5
$T_{(4,5)}((a_1, a_2), R^{(1)})$: (0,0) = 4.687757456, (1,0) = 3
$T_{(0,5)}((2,0), R^{(1)}) = 10$
$T_{(0,0)}((2,2), R^{(1)}) = 10 + 1.5$
$T_{(5,5)}((0,0), R^{(1)}) = 0$
The new policy will be $R^{(2)} = (0, 1, 1, 1, 1, 2)$ (admission control) and
$R^{(2)} = (0, 0, 0, 0, 1, 2)$ (inventory control). Since the new policy $R^{(2)}$ is different
from the initial policy $R^{(1)}$, we perform another iteration.
Iteration 2:
For the policy $R^{(2)}$ (admission control) the linear equations connecting the average cost
$g(R^{(2)})$ and the relative values are given by
v5 = -g + 0.2v5 + 0.55v4 + 0.25v3
v4 = 3 - g + 0.15v4 + 0.65v3 + 0.2v2
v3 = 5 - g + 0.25v3 + 0.6v2 + 0.15v1
v2 = 7 - g + 0.15v2 + 0.75v1 + 0.1v0
v1 = 9 - g + 0.15v1 + 0.85v0
v0 = 10 - g + 0.9v1 + 0.1v0
By assuming v5 = 0 and solving we get
$v_5(R^{(2)}) = 0$, $v_4(R^{(2)}) = 9.870448179$, $v_3(R^{(2)}) = 16.22787115$, $v_2(R^{(2)}) = 21.63739496$,
$v_1(R^{(2)}) = 24.49453782$, $v_0(R^{(2)}) = 25.06596639$, $g(R^{(2)}) = 9.485714286$.
For the policy $R^{(2)}$ (inventory control) the linear equations connecting the average cost
$m(R^{(2)})$ and the relative values are given by
w5 = -m + 0.3w5 + 0.4w4 + 0.2w3 + 0.1w2
w4 = -m + 0.2w4 + 0.4w3 + 0.3w2 + 0.1w1
w3 = -m +0.25w3 + 0.55w2 + 0.15w1 + 0.05w0
w2 = -m + 0.25w2 + 0.65w1 + 0.1w0
w1 = 2 - m + 0.4w1 + 0.6w0
w0 = 1.5 - m + 1.0w0
By assuming w5 = 0 and solving we get
$w_5(R^{(2)}) = 0$, $w_4(R^{(2)}) = 1.558994709$, $w_3(R^{(2)}) = 2.423809524$, $w_2(R^{(2)}) = 3.916402116$,
$w_1(R^{(2)}) = 6.027513228$, $w_0(R^{(2)}) = 5.194179894$, $m(R^{(2)}) = 1.500000000$.
The test quantity $T_s(a, R)$ for admission and inventory control takes the following values:
$T_{(0,1)}((a_1, a_2), R^{(2)})$: (2,0) = 14.12751323, (2,1) = 13.3
$T_{(0,2)}((a_1, a_2), R^{(2)})$: (2,0) = 14.11640212, (2,1) = 14.3
$T_{(0,3)}((a_1, a_2), R^{(2)})$: (2,0) = 12.72380952, (2,1) = 14.9
$T_{(0,4)}((a_1, a_2), R^{(2)})$: (2,0) = 11.95899471, (2,1) = 15.7
$T_{(1,1)}((a_1, a_2), R^{(2)})$: (0,0) = 19.62205105, (0,1) = 18.79453782, (1,0) = 13.12751323, (1,1) = 12.3
$T_{(1,2)}((a_1, a_2), R^{(2)})$: (0,0) = 19.61093994, (0,1) = 19.79453782, (1,0) = 13.11640212, (1,1) = 13.3
$T_{(1,3)}((a_1, a_2), R^{(2)})$: (0,0) = 18.21834733, (0,1) = 20.39453782, (1,0) = 11.72380952, (1,1) = 13.9
$T_{(1,4)}((a_1, a_2), R^{(2)})$: (0,0) = 17.45353253, (0,1) = 21.19453782, (1,0) = 10.95899471, (1,1) = 14.7
$T_{(2,2)}((a_1, a_2), R^{(2)})$: (0,0) = 18.75379709, (0,1) = 18.93739497, (1,0) = 11.11640212, (1,1) = 11.3
$T_{(2,3)}((a_1, a_2), R^{(2)})$: (0,0) = 17.36120448, (0,1) = 19.53739497, (1,0) = 9.723809524, (1,1) = 11.9
$T_{(2,4)}((a_1, a_2), R^{(2)})$: (0,0) = 16.59638968, (0,1) = 20.33739497, (1,0) = 8.958994710, (1,1) = 12.7
$T_{(3,3)}((a_1, a_2), R^{(2)})$: (0,0) = 13.95168067, (0,1) = 16.12787116, (1,0) = 7.723809524, (1,1) = 9.9
$T_{(3,4)}((a_1, a_2), R^{(2)})$: (0,0) = 13.18686587, (0,1) = 16.92787116, (1,0) = 6.958994710, (1,1) = 10.7
$T_{(4,4)}((a_1, a_2), R^{(2)})$: (0,0) = 8.829442893, (0,1) = 12.57044818, (1,0) = 4.958994710, (1,1) = 8.7
$T_{(1,5)}((a_1, a_2), R^{(2)})$: (0,0) = 15.49453782, (1,0) = 9
$T_{(2,5)}((a_1, a_2), R^{(2)})$: (0,0) = 14.63739497, (1,0) = 7
$T_{(3,5)}((a_1, a_2), R^{(2)})$: (0,0) = 11.22787116, (1,0) = 5
$T_{(4,5)}((a_1, a_2), R^{(2)})$: (0,0) = 6.870448183, (1,0) = 3
$T_{(0,5)}((2,0), R^{(2)}) = 10$
$T_{(0,0)}((2,2), R^{(2)}) = 10 + 1.5$
$T_{(5,5)}((0,0), R^{(2)}) = 0$
The new policy is $R^{(3)} = (0, 1, 1, 1, 1, 2)$ (admission control) and
$R^{(3)} = (0, 0, 0, 0, 1, 2)$ (inventory control), which is identical with the policy $R^{(2)}$.
After two iterations we have obtained the optimal policies $R^* = (0, 1, 1, 1, 1, 2)$ (admission
control) and $R^* = (0, 0, 0, 0, 1, 2)$ (inventory control). It is beneficial to admit customers
to the system at states 1, 2, 3, 4 only, and a replenishment order is placed when the inventory
level is in state 1. At state (0, 0) compulsory admission and replenishment is suggested.
9. Conclusion and Future Research:
In this article we presented an application of the Markov Decision Process (MDP) to
admission and replenishment control using the classical approach, namely policy iteration.
This result can be extended to admission and service control in service facility systems. We
are currently studying Markov decision processes in discrete time with admission and service
control. In future we would like to extend the model to control both service and replenishment
orders simultaneously.
Acknowledgement
P. Maheswari's research is supported by the University Grants Commission, Govt. of India,
under the NFOBC Scheme (F./2015-16/NFO-2015-17-OBC-TAM-46773/(SA-III/Website)).
References
[1] Bellman, R., (1957), Dynamic Programming, Princeton University Press, Princeton, NJ.
[2] Berman, O. and Sapna, K.P., (2001), Optimal Control of Service for facilities holding
inventory, Computers and operations Research, 28: 429-441.
[3] Blackwell, D., (1962), Discrete dynamic programming, Ann. Math. Statist., 33, 719-726.
[4] De Ghellinck, G., (1960), Les problèmes de décisions séquentielles, Cahiers Centre Études
Recherche Opér., 2, 161-179.
[5] Denardo, E.V. and Fox, B.L., (1968), Multi-chain Markov renewal programs, SIAM J.
Appl. Math. 16, 468-487.
[6] Derman, C., (1970), Finite State Markovian Decision Processes, Academic Press, New
York.
[7] Elango, C. and Rozario, G.M., Optimal Policy for an Inventory System with Partial
Backlogging, Working paper, Madurai Kamaraj University.
[8] Golabi, K., Kulkarni, R.B. and Way, C.B., (1982), A statewide pavement management
system, Interfaces, 12, no. 6, 5-21.
[9] Hastings, N.A.J., (1971), Bounds on the gain of a Markov decision process, Operat. Res.,
19, 240-244.
[10] He, Q.-M. and Buzacott J., (2002), Optimal and near-optimal inventory control policies
for a make-to-order inventory-production system, European Journal of Operational Research,
141: 113-132.
[11] Hordijk, A. and Kallenberg, L.C.M., (1979), Linear programming and Markov decision
chains, Management Sci., 25, 352-362.
[12] Hordijk, A. and Kallenberg, L.C.M., (1984), Constrained undiscounted stochastic
dynamic programming, Math. Operat. Res., 9, 276-289.
[13] Howard, R.A., (1960), Dynamic Programming and Markov Processes, John Wiley and
sons, Inc, New York.
[14] Kawai, H., (1983), An optimal ordering and replacement policy of a Markovian
degradation system under complete observation, part I. J. Operat. Res. Soc. Japan, 26, 279-
290.
[15] Manne, A., (1960), Linear programming and sequential decisions, Management Sci., 6,
259-267.
[16] Mine, H. and Osaki, S., (1970), Markov Decision Processes, American Elsevier
Publishing Company Inc, New York.
[17] Odoni, A., (1969), On finding the maximal gain for Markov decision processes, Operat.
Res., 17, 857-860.
[18] Puterman, M.L., (1994), Markov Decision Processes: Discrete Stochastic Dynamic
Programming, John Wiley and Sons, Inc New York.
[19] Selvakumar, C., Maheswari, P. and Elango, C., Discrete MDP Problem in Service Facility
Systems with Inventory Management, communicated to (ICMMCMSE2017).
[20] Stengos, D. and Thomas, L.C., (1980), The blast furnaces problem Eur. J. Operat. Res., 4,
330-336.
[21] Tijms, H.C., (2003), A First Course in Stochastic Models, John Wiley and Sons Ltd,
England.
[22] Tijms, H.C. and Van der Duyn Schouten, F.A., (1985), A Markov decision algorithm for
optimal inspections and revisions in a maintenance system with partial information, Eur. J.
Operat. Res., 21, 245-253.
[23] Veinott, A.F. Jr, (1966), On finding optimal policies in discrete dynamic programming
with no discounting, Ann. Math. Statist., 37, 1284-1294.
[24] White, J., (1985), Real Applications of Markov Decision Processes, Interfaces, 15:6, 73-83.