Applying reinforcement learning to single and multi-agent economic problems
Applying reinforcement learning to economics
Neal Hughes
Australian National University
November 17, 2014
Neal Hughes (ANU) Applying reinforcement learning to economics November 17, 2014 1 / 23
Machine learning

Machine learning
- algorithms that 'learn' from data, i.e., build models from data with minimal theory / human involvement
- goes hand in hand with 'Big Data'

Supervised learning
- estimating functions mapping 'input' variables X to 'target' variables Y
- aka non-parametric regression

Reinforcement learning
- learning to make optimal (reward-maximising) decisions in dynamic environments: learning optimal policy functions for Markov Decision Processes (MDPs)
- aka approximate dynamic programming
Reinforcement learning

Figure: The agent-environment loop. The agent observes state s_t and chooses action a_t; the environment returns reward r_t and the next state s_{t+1}.
A (single agent) water storage problem

Figure: River system schematic. Inflow I_{t+1} enters storage S_t; water passes the release point (flow F1_t, node 1) to the extraction point (flow F2_t, node 2), where the demand node extracts E_t and contributes return flow R_t, with flow F3_t at the end of the system (node 3).
A (single agent) water storage problem

\[
\max_{\{W_t\}_{t=0}^{\infty}} \; E\left\{ \sum_{t=0}^{\infty} \beta^t \Pi(Q_t, I_t) \right\}
\]

Subject to:

\[
S_{t+1} = \min\left\{ S_t - W_t - \delta_0 \alpha S_t^{2/3} + I_{t+1},\; K \right\}
\]
\[
0 \le W_t \le S_t
\]
\[
Q_t \le \max\left\{ (1 - \delta_{1b}) W_t - \delta_{1a},\; 0 \right\}
\]
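The storage constraint and the delivery-loss constraint can be sketched directly in Python. The parameter values below are illustrative placeholders only; the slides do not state them:

```python
# Hypothetical parameter values, for illustration only.
K = 1000.0                     # storage capacity (GL)
delta0, alpha = 0.1, 1.0       # evaporation-loss parameters
delta1a, delta1b = 5.0, 0.05   # fixed and proportional delivery losses

def storage_transition(S, W, I_next):
    """Next-period storage: release and evaporation deducted, inflow added, capped at K."""
    evaporation = delta0 * alpha * S ** (2.0 / 3.0)
    return min(S - W - evaporation + I_next, K)

def water_delivered(W):
    """Upper bound on water Q_t reaching users, after fixed and proportional losses."""
    return max((1 - delta1b) * W - delta1a, 0.0)
```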
Why reinforcement learning?

Figure: Scatter plot of inflow (GL) against storage (GL).
The Q function

The standard Bellman equation with state value function V(s):

\[
V^*(s) = \max_a \left\{ R(s, a) + \beta \int_S T(s, a, s') \, V^*(s') \, ds' \right\}
\]

The Bellman equation with action-value function Q(a, s):

\[
Q^*(a, s) = R(s, a) + \beta \int_S T(s, a, s') \max_{a'} Q^*(a', s') \, ds'
\]
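A standard reason for preferring the action-value form: the optimal policy can be read off directly from Q*, without evaluating the reward and transition functions at decision time:

```latex
\pi^*(s) = \arg\max_{a} Q^*(a, s)
```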
Fitted Q Iteration

Algorithm 1: Fitted Q Iteration
1. Initialise s_0
2. Run a simulation with exploration for T periods
3. Store the samples {a_t, s_t, s_{t+1}, r_t}, t = 0, ..., T
4. Initialise Q(a_t, s_t)
5. Repeat until a stopping rule is satisfied:
6.   For t = 0 to T: set Q̂_t = r_t + β max_a Q(a, s_{t+1})
7.   Estimate Q by regressing Q̂_t against (a_t, s_t)

With large dense data, computing max_a Q(a, .) for each point is wasteful. Alternative: max over a sample of points and fit a value function (fitted Q-V iteration).
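The algorithm above can be sketched in Python. The toy storage dynamics and the per-cell-average 'regression' below are illustrative stand-ins, not the model or the approximator from the slides:

```python
import random

random.seed(0)
beta = 0.9
states = range(5)     # discretised storage levels (toy example)
actions = range(3)    # discretised releases

def step(s, a):
    """Toy dynamics: releasing w units yields reward w; inflow is random."""
    w = min(a, s)                                   # cannot release more than stored
    s_next = min(s - w + random.choice([0, 1, 2]), 4)
    return s_next, float(w)

# Steps 1-3: simulate with (purely random) exploration and store the samples
samples, s = [], 4
for _ in range(2000):
    a = random.choice(list(actions))
    s_next, r = step(s, a)
    samples.append((a, s, s_next, r))
    s = s_next

# Steps 4-7: initialise Q, then iterate until (approximate) convergence
Q = {(a, s): 0.0 for a in actions for s in states}
for _ in range(100):
    targets = {}
    for a, s, s_next, r in samples:
        q_hat = r + beta * max(Q.get((b, s_next), 0.0) for b in actions)
        targets.setdefault((a, s), []).append(q_hat)
    # the 'regression' step: here just a per-cell average of the targets
    Q = {k: sum(v) / len(v) for k, v in targets.items()}
```

A greedy policy is then recovered as argmax_a Q(a, s) for each state.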
Single agent reinforcement learning

Figure: An approximately equidistant grid in two dimensions. (a) 10000 iid standard normal points; (b) 100 points at least 0.4 apart.
Tilecoding

Figure: Two offset tiling layers over the input space; an input point X_t activates one tile in each layer.
Single fine grid

Figure: Approximation of a function on [0, 1] with a single fine grid.
Single chunky grid

Figure: Approximation of the same function with a single coarse ('chunky') grid.
Tilecoding: many chunky grids

Figure: Approximation of the same function by combining many offset coarse grids.
Tilecoding

Fitting
- Averaging
- Averages or stochastic gradient descent

Setup
- Regular grids
- 'Optimal' displacement vectors
- Linear extrapolation

Implementation
- Cython with OpenMP
- Perfect 'hashing'
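A minimal one-dimensional tile coder in Python, supporting both prediction (averaging over layers) and stochastic gradient updates. The layer count and evenly spaced offsets below are illustrative defaults, not the 'optimal' displacement vectors from the slide:

```python
class TileCoder:
    """Several offset regular grids over [0, 1); prediction averages the
    weights of the activated tile in each layer."""

    def __init__(self, n_layers=4, n_tiles=10):
        self.n_layers = n_layers
        self.n_tiles = n_tiles
        # each layer is shifted by a fraction of one tile width
        self.offsets = [i / (n_layers * n_tiles) for i in range(n_layers)]
        self.weights = [[0.0] * (n_tiles + 1) for _ in range(n_layers)]

    def tiles(self, x):
        """Index of the activated tile in each layer for x in [0, 1)."""
        return [int((x + off) * self.n_tiles) for off in self.offsets]

    def predict(self, x):
        active = self.tiles(x)
        return sum(self.weights[l][t] for l, t in enumerate(active)) / self.n_layers

    def update(self, x, target, lr=0.1):
        """Stochastic gradient step toward target (squared-error loss)."""
        error = target - self.predict(x)
        for l, t in enumerate(self.tiles(x)):
            self.weights[l][t] += lr * error
```

In practice each tile index would be mapped to a flat weight array via a perfect hash, and updates parallelised (e.g. with OpenMP in Cython), as on the slide.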
A test case

Figure: Social welfare as a percentage of SDP, by number of samples, for SDP, TC-A and TC-ASGD.
A test case

Table: Computation time by number of samples

Samples:   5000   10000   20000   50000   80000
SDP         6.6     7.2     7.5     7.4     7.4
TC-A        0.4     0.4     0.5     0.6     0.8
TC-ASGD     0.4     0.6     0.9     1.3     1.9
Multi agent problems

Nash equilibrium concepts for stochastic games (Economics)
- Markov Perfect Equilibrium
- Oblivious Equilibrium

Learning in games (Economics)
- Fictitious play
- Partial best response dynamic

Multi-agent learning (Computer Science / Economics)
- each agent follows a single-agent RL method
- or we combine RL with game theory / equilibrium concepts
Multi-agent fitted Q-V iteration

Each agent follows a fitted Q-V iteration algorithm, except:
- only a sample of agents update their policies each stage (similar to partial best response)
- each new batch of samples is blended with the existing batch of samples (similar to fictitious play)
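The two modifications can be sketched as an outer simulation loop. The 'agents', their policy update and the simulation below are placeholders for the fitted Q-V machinery, purely to show the partial-update and batch-blending structure:

```python
import random

random.seed(0)

n_agents = 100
update_fraction = 0.2   # share of agents updating per stage (partial best response)
blend = 0.5             # probability an old sample is dropped when blending

policies = [0.0] * n_agents
batches = [[] for _ in range(n_agents)]

def simulate(policies):
    """Placeholder simulation: returns one new sample per agent."""
    return [random.random() for _ in policies]

for stage in range(10):
    new_samples = simulate(policies)
    for i in range(n_agents):
        # blend the new sample into the existing batch (fictitious-play flavour)
        batches[i] = [x for x in batches[i] if random.random() > blend]
        batches[i].append(new_samples[i])
    # only a random sample of agents update their policies this stage
    for i in random.sample(range(n_agents), int(update_fraction * n_agents)):
        # placeholder for the agent's fitted Q-V policy update
        policies[i] = sum(batches[i]) / len(batches[i])
```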
Conclusions

RL can be successfully applied to economic problems.

Batch methods (such as fitted Q-V iteration) are suited to our context.

Tilecoding is a great approximation method for low-dimensional problems.

Our multi-agent method provides a middle ground between macro-DP methods and agent-based / evolutionary methods.

It allows us to consider complex multi-agent problems with externalities, while still having near-optimal agents.
A (multi-agent) water storage problem

Figure: The same river system schematic as the single-agent problem: inflow I_{t+1}, storage S_t, release point F1_t, demand node extracting E_t at the extraction point F2_t, return flow R_t and end-of-system flow F3_t.
Example: capacity sharing

Total inflow: 20 ML; inflow credit: +10 ML to each user; internal spill: 10 ML.

                  Initial balance   Updated balance
User 1 volume          10 ML             30 ML
User 2 volume          50 ML             50 ML
User 1 airspace        40 ML             20 ML

User 2's account cannot absorb its full inflow credit, so 10 ML spills internally into User 1's account: User 1 ends with 10 + 10 + 10 = 30 ML, while User 2 remains at 50 ML.
A test case

Figure: Mean storage S_t (GL) by iteration, for CS, NS, OA and SWA.
A test case

Figure: Mean social welfare, sum over i of u_it ($M), by iteration, for CS, NS, OA and SWA.