Applying reinforcement learning to single and multi-agent economic problems

Crawford PhD Conference 2014

Transcript of Applying reinforcement learning to single and multi-agent economic problems

Page 1: Applying reinforcement learning to single and multi-agent economic problems

Applying reinforcement learning to economics

Neal Hughes

Australian National University

[email protected]

November 17, 2014

Page 2: Applying reinforcement learning to single and multi-agent economic problems

Machine learning

Machine learning
- algorithms that ‘learn’ from data, i.e. build models from data with minimal theory / human involvement
- goes hand in hand with ‘Big Data’

Supervised learning
- estimating functions mapping ‘input’ variables X to ‘target’ variables Y
- aka non-parametric regression

Reinforcement learning
- learning to make optimal (reward maximising) decisions in dynamic environments: learning optimal policy functions for Markov Decision Processes (MDPs)
- aka approximate dynamic programming

Page 3: Applying reinforcement learning to single and multi-agent economic problems

Reinforcement learning

[Diagram: the agent-environment loop. The agent observes state s_t, chooses action a_t, receives reward r_t, and the environment moves to state s_{t+1}.]
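
A minimal Python sketch of this loop (the `env` interface, the `policy` function and the parameter values are illustrative placeholders, not from the slides):

```python
def run_episode(env, policy, T=100, beta=0.95):
    """Simulate the agent-environment loop for T periods.

    Assumes `env` exposes reset() -> s0 and step(s, a) -> (s_next, r),
    and `policy` maps a state to an action; both are placeholders.
    """
    s = env.reset()
    discounted_return = 0.0
    for t in range(T):
        a = policy(s)                # agent chooses action a_t given state s_t
        s_next, r = env.step(s, a)   # environment returns reward r_t and next state s_{t+1}
        discounted_return += beta**t * r
        s = s_next
    return discounted_return
```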

Page 4: Applying reinforcement learning to single and multi-agent economic problems

A (single agent) water storage problem

[Diagram: a river system with storage S_t receiving inflow I_{t+1}; water released at the release point (flow F1_t, node 1) travels to an extraction point (flow F2_t, node 2) where extraction E_t supplies a demand node, with return flow R_t, and the remainder reaches the end of the system (flow F3_t, node 3).]

Page 5: Applying reinforcement learning to single and multi-agent economic problems

A (single agent) water storage problem

\[
\max_{\{W_t\}_{t=0}^{\infty}} \; E\left\{ \sum_{t=0}^{\infty} \beta^t \, \Pi(Q_t, I_t) \right\}
\]

Subject to:

\[
S_{t+1} = \min\left\{ S_t - W_t - \delta_0 \alpha S_t^{2/3} + I_{t+1},\; K \right\}
\]
\[
0 \le W_t \le S_t
\]
\[
Q_t \le \max\left\{ (1 - \delta_{1b}) W_t - \delta_{1a},\; 0 \right\}
\]
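
A minimal Python sketch of these storage dynamics and constraints (the parameter values are illustrative placeholders; the payoff function Π(Q_t, I_t) is left unspecified, as on the slide):

```python
# Illustrative parameter values only; the slides do not report calibrated values.
K = 1000.0                      # storage capacity
alpha, delta0 = 0.01, 1.0       # evaporation parameters
delta1a, delta1b = 5.0, 0.05    # fixed and proportional delivery losses

def next_storage(S_t, W_t, I_next):
    """S_{t+1} = min{ S_t - W_t - delta0 * alpha * S_t^(2/3) + I_{t+1}, K }."""
    return min(S_t - W_t - delta0 * alpha * S_t ** (2.0 / 3.0) + I_next, K)

def release_is_feasible(S_t, W_t):
    """Releases are bounded by current storage: 0 <= W_t <= S_t."""
    return 0.0 <= W_t <= S_t

def max_water_use(W_t):
    """Water use is bounded by deliveries net of losses: Q_t <= max{(1 - delta1b) W_t - delta1a, 0}."""
    return max((1.0 - delta1b) * W_t - delta1a, 0.0)
```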

Page 6: Applying reinforcement learning to single and multi-agent economic problems

Why reinforcement learning?

[Figure: plot of inflow (GL) against storage (GL).]

Page 7: Applying reinforcement learning to single and multi-agent economic problems

The Q function

The standard Bellman equation with state value function V(s):

\[
V^*(s) = \max_a \left\{ R(s, a) + \beta \int_S T(s, a, s') \, V^*(s') \, ds' \right\}
\]

The Bellman equation with action-value function Q(a, s):

\[
Q^*(a, s) = R(s, a) + \beta \int_S T(s, a, s') \, \max_{a'} Q^*(a', s') \, ds'
\]

The two are linked by V*(s) = max_a Q*(a, s), so the greedy policy π*(s) = argmax_a Q*(a, s) can be recovered from Q* alone, without knowledge of the transition density T.

Page 8: Applying reinforcement learning to single and multi-agent economic problems

Fitted Q Iteration

Algorithm 1: Fitted Q Iteration

1. Initialise s_0
2. Run a simulation with exploration for T periods
3. Store the samples {a_t, s_t, s_{t+1}, r_t}, t = 0, ..., T
4. Initialise Q(a_t, s_t)
5. Repeat until a stopping rule is satisfied:
   - for t = 0 to T: set Q̂_t = r_t + β max_a Q(a, s_{t+1})
   - estimate Q by regressing Q̂_t against (a_t, s_t)

With large, dense data, computing max_a Q(a, .) for each sample point is wasteful. Alternative: take the max over a sample of points and fit a value function (fitted Q-V iteration).
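
A compact Python sketch of the algorithm above, using a generic regressor for Q (here scikit-learn's ExtraTreesRegressor) and a finite grid of candidate actions to approximate max_a Q(a, s_{t+1}); the sample format, the regressor and the action grid are assumptions, not details from the talk:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor  # any flexible regression method would do

def fitted_q_iteration(samples, action_grid, beta=0.95, n_iter=50):
    """Fitted Q iteration on a batch of samples {(a_t, s_t, s_{t+1}, r_t)}.

    `samples` is a list of (a, s, s_next, r) tuples with scalar actions and
    1-D state arrays; `action_grid` holds candidate actions for the max step.
    """
    A = np.array([[a] for a, s, s_next, r in samples])
    S = np.array([s for a, s, s_next, r in samples])
    S_next = np.array([s_next for a, s, s_next, r in samples])
    R = np.array([r for a, s, s_next, r in samples])
    X = np.hstack([A, S])                        # regressors: (a_t, s_t)

    Q = ExtraTreesRegressor(n_estimators=50)
    Q.fit(X, R)                                  # initialise Q with the immediate rewards
    for _ in range(n_iter):                      # fixed number of sweeps as a simple stopping rule
        # approximate max_a Q(a, s_{t+1}) over the candidate action grid
        Q_next = np.column_stack([
            Q.predict(np.hstack([np.full((len(S_next), 1), a), S_next]))
            for a in action_grid
        ])
        targets = R + beta * Q_next.max(axis=1)  # Q_hat_t = r_t + beta * max_a Q(a, s_{t+1})
        Q.fit(X, targets)                        # re-estimate Q by regressing Q_hat_t on (a_t, s_t)
    return Q
```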

Page 9: Applying reinforcement learning to single and multi-agent economic problems

Single agent reinforcement learning

Figure: An approximately equidistant grid in two dimensions.
(a) 10000 iid standard normal points. (b) 100 points at least 0.4 apart.
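
One simple way to construct such a grid is to greedily thin a cloud of sample points subject to a minimum separation, as sketched below (the 0.4 threshold matches panel (b); the exact construction used in the talk may differ):

```python
import numpy as np

def equidistant_subset(points, min_dist=0.4):
    """Greedily keep points that are at least `min_dist` from every point already kept."""
    kept = []
    for x in points:
        if all(np.linalg.norm(x - y) >= min_dist for y in kept):
            kept.append(x)
    return np.array(kept)

rng = np.random.default_rng(0)
samples = rng.standard_normal((10000, 2))   # 10000 iid standard normal points
grid = equidistant_subset(samples, 0.4)     # approximately equidistant grid points
```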

Page 10: Applying reinforcement learning to single and multi-agent economic problems

Tilecoding

[Diagram: an input space covered by two offset tiling layers; an input point X_t activates one tile in each layer.]

Page 11: Applying reinforcement learning to single and multi-agent economic problems

Single fine grid

[Figure: function approximation on [0, 1] with a single fine grid.]

Page 12: Applying reinforcement learning to single and multi-agent economic problems

Single chunky grid

[Figure: function approximation on [0, 1] with a single chunky (coarse) grid.]

Page 13: Applying reinforcement learning to single and multi-agent economic problems

Tilecoding: many chunky grids

[Figure: function approximation on [0, 1] with tilecoding: many overlapping chunky grids.]

Page 14: Applying reinforcement learning to single and multi-agent economic problems

Tilecoding

Fitting
- Averaging
- Stochastic gradient descent

Setup
- Regular grids
- ‘Optimal’ displacement vectors
- Linear extrapolation

Implementation
- Cython with OpenMP
- Perfect ‘hashing’
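
A minimal one-dimensional sketch of tilecoding with fitting by averaging (the grid sizes, displacements and pure-Python implementation are illustrative only; the talk's version uses regular grids with ‘optimal’ displacement vectors and is implemented in Cython with OpenMP):

```python
import numpy as np

class TileCoder1D:
    """Several coarse, displaced grids over [0, 1]; predictions average the activated tiles."""

    def __init__(self, n_layers=5, n_tiles=10):
        self.n_layers, self.n_tiles = n_layers, n_tiles
        # displace each layer by a different fraction of one tile width
        self.offsets = np.linspace(0.0, 1.0 / n_tiles, n_layers, endpoint=False)
        self.values = np.zeros((n_layers, n_tiles + 1))
        self.counts = np.zeros((n_layers, n_tiles + 1))

    def _tiles(self, x):
        # index of the activated tile in each layer
        return np.clip(((x + self.offsets) * self.n_tiles).astype(int), 0, self.n_tiles)

    def fit_average(self, X, Y):
        # each tile's value is the running average of the targets falling inside it
        rows = np.arange(self.n_layers)
        for x, y in zip(X, Y):
            idx = self._tiles(x)
            self.counts[rows, idx] += 1.0
            self.values[rows, idx] += (y - self.values[rows, idx]) / self.counts[rows, idx]

    def predict(self, x):
        # average the values of the activated tiles across layers
        return self.values[np.arange(self.n_layers), self._tiles(x)].mean()
```

The TC-A and TC-ASGD variants in the following results presumably correspond to the averaging and stochastic gradient descent fits listed above; an SGD variant would update the tile weights incrementally rather than keeping running averages.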

Page 15: Applying reinforcement learning to single and multi-agent economic problems

A test case

[Figure: social welfare as a percentage of SDP, plotted against the number of samples, for SDP, TC-A and TC-ASGD.]

Page 16: Applying reinforcement learning to single and multi-agent economic problems

A test case

Table: Computation time by number of samples

            5000   10000   20000   50000   80000
SDP          6.6     7.2     7.5     7.4     7.4
TC-A         0.4     0.4     0.5     0.6     0.8
TC-ASGD      0.4     0.6     0.9     1.3     1.9

Page 17: Applying reinforcement learning to single and multi-agent economic problems

Multi-agent problems

Nash equilibrium concepts for stochastic games (Economics)
- Markov Perfect Equilibrium
- Oblivious Equilibrium

Learning in games (Economics)
- Fictitious play
- Partial best response dynamics

Multi-agent learning (Computer Science / Economics)
- each agent follows a single-agent RL method
- or we combine RL with game theory / equilibrium concepts

Page 18: Applying reinforcement learning to single and multi-agent economic problems

Multi-agent fitted Q-V iteration

Each agent follows a fitted Q-V iteration algorithm, except:

- only a sample of agents update their policies each stage (similar to partial best response)
- each new batch of samples is blended with the existing batch of samples (similar to fictitious play)
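
A schematic Python sketch of that loop (the agent objects, the `simulate` function and the blending and update fractions are placeholders, not details from the talk):

```python
import random

def multi_agent_fitted_qv(agents, simulate, n_stages=20, update_share=0.2, blend=0.5):
    """Each stage, only a random sample of agents re-fit their policies (partial
    best response), on new samples blended with their existing batch (in the
    spirit of fictitious play). `agents` is a list of objects with a
    fit_policy(batch) method; `simulate` returns a dict agent -> [(a, s, s_next, r), ...].
    """
    batches = {agent: [] for agent in agents}
    for stage in range(n_stages):
        new_samples = simulate(agents)
        for agent in agents:
            if batches[agent]:
                keep = int(blend * len(batches[agent]))
                # blend the new batch with a subsample of the existing batch
                batches[agent] = random.sample(batches[agent], keep) + new_samples[agent]
            else:
                batches[agent] = new_samples[agent]
        n_update = max(1, int(update_share * len(agents)))
        for agent in random.sample(agents, n_update):   # only a sample of agents update
            agent.fit_policy(batches[agent])            # one fitted Q-V iteration update
    return agents
```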

Page 19: Applying reinforcement learning to single and multi-agent economic problems

Conclusions

- RL can be successfully applied to economic problems
- Batch methods (such as fitted Q-V iteration) are suited to our context
- Tilecoding is a great approximation method for low-dimensional problems
- Our multi-agent method provides a middle ground between macro-DP methods and agent-based / evolutionary methods
- It allows us to consider complex multi-agent problems with externalities, while still having near-optimal agents

Page 20: Applying reinforcement learning to single and multi-agent economic problems

A (multi-agent) water storage problem

[Diagram: the same river system as in the single-agent problem: storage S_t with inflow I_{t+1}, release point (F1_t), extraction point (F2_t) with extraction E_t supplying a demand node, return flow R_t, and end of system (F3_t).]

Page 21: Applying reinforcement learning to single and multi-agent economic problems

Example: capacity sharing

[Diagram: user accounts before and after an inflow event. Total inflow: 20 ML; inflow credit: +10 ML to each user; internal spill: 10 ML.]

                   Initial balance   Updated balance
User 1 volume            10 ML             30 ML
User 2 volume            50 ML             50 ML
User 1 airspace          40 ML             20 ML
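
A Python sketch of one reading of this accounting (the users' capacity shares and the rule that credits above a full share spill internally to users with spare airspace are assumptions, chosen to be consistent with the numbers in the diagram):

```python
def update_balances(volumes, capacities, inflow_credits):
    """Credit inflows to each user's account; any credit above a user's
    capacity share spills internally to users with remaining airspace."""
    spill = 0.0
    for user in volumes:
        volumes[user] += inflow_credits[user]
        excess = max(volumes[user] - capacities[user], 0.0)
        volumes[user] -= excess
        spill += excess
    for user in volumes:                       # allocate the internal spill
        take = min(capacities[user] - volumes[user], spill)
        volumes[user] += take
        spill -= take
    return volumes

# The slide's example: a 20 ML inflow credited 10 ML to each user.
volumes = {"user1": 10.0, "user2": 50.0}      # ML
capacities = {"user1": 50.0, "user2": 50.0}   # ML (user 1 starts with 40 ML airspace)
update_balances(volumes, capacities, {"user1": 10.0, "user2": 10.0})
# -> user1: 30 ML (20 ML airspace remaining), user2: 50 ML
```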

Page 22: Applying reinforcement learning to single and multi-agent economic problems

A test case

Figure: Mean storage S_t (GL) by iteration, for CS, NS, OA and SWA.

Page 23: Applying reinforcement learning to single and multi-agent economic problems

A test case

Figure: Mean social welfare, sum_{i=1}^{n} u_{it} ($M), by iteration, for CS, NS, OA and SWA.
