Multi-Agent Shared Hierarchy Reinforcement Learning

Transcript of Multi-Agent Shared Hierarchy Reinforcement Learning

Page 1: Multi-Agent Shared Hierarchy Reinforcement Learning

Multi-Agent Shared Hierarchy Reinforcement Learning

Neville Mehta

Prasad Tadepalli

School of Electrical Engineering and Computer Science

Oregon State University

Page 2: Multi-Agent Shared Hierarchy Reinforcement Learning


Highlights

– Sharing value functions
– Coordination
– Framework to express sharing & coordination with hierarchies
– RTS domain

Page 3: Multi-Agent Shared Hierarchy Reinforcement Learning


Previous Work

– MAXQ, Options, ALisp
– Coordination in the hierarchical setting (Makar, Mahadevan)
– Sharing flat value functions (Tan)
– Concurrent reinforcement learning for multiple effectors (Marthi, Russell, …)

Page 4: Multi-Agent Shared Hierarchy Reinforcement Learning


Outline

– Average Reward Learning
– RTS domain
– Hierarchical ARL
– MASH framework
– Experimental results
– Conclusion & future work

Page 5: Multi-Agent Shared Hierarchy Reinforcement Learning


SMDP

A Semi-Markov Decision Process (SMDP) extends MDPs by allowing for temporally extended actions:
– States S
– Actions A
– Transition function P(s', N | s, a)
– Reward function R(s' | s, a)
– Time function T(s' | s, a)

Given an SMDP, an agent in state s following policy π achieves gain

$\rho^\pi(s) = \lim_{N \to \infty} \frac{E\left[\sum_{i=0}^{N} r_i\right]}{E\left[\sum_{i=0}^{N} t_i\right]}$
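As a concrete reading of the gain definition, the following minimal sketch estimates ρ from a sampled trajectory of (reward, duration) pairs; the function name and example trajectory are illustrative assumptions, not part of the original slides.

```python
def estimate_gain(trajectory):
    """Estimate the gain (average reward per unit time) from a sampled
    trajectory of (reward, duration) pairs, per the SMDP definition above."""
    total_reward = sum(r for r, t in trajectory)
    total_time = sum(t for r, t in trajectory)
    return total_reward / total_time

# Example: three temporally extended actions with rewards and durations.
print(estimate_gain([(5.0, 2), (-1.0, 1), (10.0, 4)]))  # 14 / 7 = 2.0
```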

Page 6: Multi-Agent Shared Hierarchy Reinforcement Learning


Average Reward Learning

Taking action a in state s yields:
– Immediate reward r(s, a)
– Action duration t(s, a)

Average-adjusted reward: $r(s, a) - \rho^\pi t(s, a)$

The optimal policy $\pi^*$ maximizes the RHS and leads to the optimal gain: $\rho^{\pi^*} \geq \rho^\pi$.

$h^\pi(s_0) = E\left[(r(s_0, a) - \rho t(s_0, a)) + (r(s_1, a) - \rho t(s_1, a)) + \cdots\right]$

$\Rightarrow h^\pi(s_0) = E\left[r(s_0, a) - \rho t(s_0, a)\right] + h^\pi(s_1)$

[Figure: a state trajectory $s_0, s_1, \ldots, s_n$ with the average-adjusted rewards $r(s_i, a_i) - \rho t(s_i, a_i)$ labeling the transitions, and a parent task and child task spanning segments of the trajectory.]
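To make the average-adjusted update concrete, here is a minimal tabular sketch in the spirit of the recurrence above, i.e., a model-free R-learning-style update over SMDP transitions. The function name, learning rates, and the greedy-step gain update are illustrative assumptions, not the authors' H-learning algorithm.

```python
from collections import defaultdict

# Minimal tabular sketch of an average-adjusted value update over SMDP
# transitions, in the spirit of h(s0) = E[r - rho*t] + h(s1) above.
h = defaultdict(float)    # h[s]: average-adjusted value of state s
rho = 0.0                 # running estimate of the gain
alpha, beta = 0.1, 0.01   # learning rates (illustrative choices)

def update(s, r, t, s_next, greedy_step):
    """Apply one transition with reward r and duration t from s to s_next."""
    global rho
    # Average-adjusted temporal-difference target.
    target = (r - rho * t) + h[s_next]
    h[s] += alpha * (target - h[s])
    if greedy_step:
        # Adjust the gain estimate on greedy steps (one common convention;
        # an assumption here, not necessarily the authors' update rule).
        rho += beta * (r - rho * t + h[s_next] - h[s])
```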

Page 7: Multi-Agent Shared Hierarchy Reinforcement Learning


RTS Domain

– Grid-world domain
– Multiple peasants mine resources (wood, gold) to replenish the home stock
– Avoid collisions with one another
– Attack the enemy’s base

Page 8: Multi-Agent Shared Hierarchy Reinforcement Learning


RTS Domain Task Hierarchy

[Task hierarchy figure: Root at the top, with tasks Harvest(l), Deposit, Goto(k), Offense(e), Idle, Pick, Put, Attack, and the movement actions North, South, East, West; the legend distinguishes primitive tasks from composite tasks.]

MAXQ task hierarchy:
– The original SMDP is split into sub-SMDPs (subtasks)
– Solving the Root task solves the entire SMDP

Each subtask Mi is defined by <Bi, Ai, Gi>:
– State abstraction Bi
– Actions Ai
– Termination (goal) predicate Gi
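As an illustration of the <Bi, Ai, Gi> triple, here is a minimal sketch of how a subtask could be represented; the class, field names, and the Goto(k) example are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Subtask:
    """A subtask M_i = <B_i, A_i, G_i> in a MAXQ-style hierarchy (illustrative sketch)."""
    name: str
    abstraction: Callable[[dict], tuple]   # B_i: features of the world state this subtask sees
    actions: List[str]                     # A_i: child subtasks or primitive actions
    goal: Callable[[dict], bool]           # G_i: termination / goal predicate

# Hypothetical example loosely modeled on the Goto(k) task above.
goto = Subtask(
    name="Goto(k)",
    abstraction=lambda s: (s["agent_pos"], s["target_pos"]),
    actions=["North", "South", "East", "West"],
    goal=lambda s: s["agent_pos"] == s["target_pos"],
)
```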

Page 9: Multi-Agent Shared Hierarchy Reinforcement Learning


Hierarchical Average Reward Learning

Value function decomposition for a recursively gain-optimal policy in hierarchical H-learning:

$h_i(s) = r(s) - \rho \, t(s)$, if $i$ is a primitive subtask
$h_i(s) = 0$, if $s$ is a terminal/goal state for $i$
$h_i(s) = \max_{a \in A_i(s)} \left\{ h_a(B_a(s)) + \sum_{s' \in S} P(s' \mid s, a) \, h_i(s') \right\}$, otherwise

If the state abstractions are sound, then $h_a(B_a(s)) = h_a(s)$, and the equation for the Root task is the Bellman equation for the original SMDP.
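A minimal sketch of how the decomposition above could be evaluated with a known model: one synchronous sweep over a composite subtask's states, given current child values. The model interface used here (is_terminal, children, is_primitive, abstract, reward, duration, transitions) is an assumption for illustration, not the authors' learning algorithm.

```python
def sweep(task, states, model, rho, h):
    """One synchronous sweep of the h_i recursion above for composite task `task`.

    `h` maps (task, state) pairs to current value estimates; the `model`
    interface is an illustrative assumption."""
    new_values = {}
    for s in states:
        if model.is_terminal(task, s):
            new_values[(task, s)] = 0.0
            continue
        best = float("-inf")
        for a in model.children(task, s):
            sa = model.abstract(a, s)          # B_a(s): child's abstracted view of s
            if model.is_primitive(a):
                child = model.reward(a, sa) - rho * model.duration(a, sa)
            else:
                child = h.get((a, sa), 0.0)    # h_a(B_a(s))
            # Expected value of continuing task i after child a terminates.
            cont = sum(p * h.get((task, s2), 0.0) for s2, p in model.transitions(s, a))
            best = max(best, child + cont)
        new_values[(task, s)] = best
    return new_values
```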

Page 10: Multi-Agent Shared Hierarchy Reinforcement Learning


Hierarchical Average Reward Learning

– No pseudo-rewards
– No completion function
– Scheduling is a learned behavior

Page 11: Multi-Agent Shared Hierarchy Reinforcement Learning


Hierarchical Average Reward Learning

– Sharing requires coordination
– Coordination is part of the state, not the action (Mahadevan)
– No need for each subtask to see the reward
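One way to read "coordination is part of the state" concretely: a subtask's state abstraction can include features of the other agents, so coordinated behavior (e.g., collision avoidance) is learned through ordinary value updates rather than through joint actions. The abstraction below is a hypothetical illustration, not the authors' feature set.

```python
# Hypothetical coordination-aware state abstraction for Goto(k): besides the
# agent's own position and its target, include the other agents' positions so
# collision avoidance can be learned from the value function.
def goto_abstraction(world_state, agent_id, target):
    others = tuple(sorted(pos for aid, pos in world_state["agent_pos"].items()
                          if aid != agent_id))
    return (world_state["agent_pos"][agent_id], target, others)
```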

Page 12: Multi-Agent Shared Hierarchy Reinforcement Learning


Single Hierarchical Agent

[Figure: the full task hierarchy for a single agent, with one instantiated path highlighted: Root → Harvest(W1) → Goto(W1) → North.]

Page 13: Multi-Agent Shared Hierarchy Reinforcement Learning


Simple Multi-Agent Setup

[Figure: each agent owns a complete copy of the task hierarchy; the agents' instantiated paths are Root → Offense(E1) → Attack and Root → Harvest(W1) → Goto(W1) → North.]

Page 14: Multi-Agent Shared Hierarchy Reinforcement Learning


MASH Setup

[Figure: a single task hierarchy shared by all agents; the same instantiated paths as before (Root → Offense(E1) → Attack and Root → Harvest(W1) → Goto(W1) → North) now run through shared subtasks.]
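To make the contrast with the previous slide concrete, here is a minimal sketch of value-function sharing: all agents index one table per subtask instead of keeping per-agent copies. The names, table structure, and example calls are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

# One shared h-table per subtask, indexed by abstracted state; every agent
# reads from and writes to the same tables (illustrative sketch of sharing).
shared_h = defaultdict(lambda: defaultdict(float))   # shared_h[subtask][abstract_state]

def agent_update(subtask, abstract_state, target, alpha=0.1):
    """Any agent's experience updates the single shared value function."""
    table = shared_h[subtask]
    table[abstract_state] += alpha * (target - table[abstract_state])

# Two different agents contributing experience to the same Goto(k) table.
agent_update("Goto(k)", ((2, 3), (5, 5)), target=-4.0)   # agent 1
agent_update("Goto(k)", ((7, 1), (5, 5)), target=-6.0)   # agent 2
```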

Page 15: Multi-Agent Shared Hierarchy Reinforcement Learning


Experimental Results

– 2 agents in a 15 × 15 grid; Pr(Resource Regeneration) = 5%; Pr(Enemy) = 1%; Rewards = (-1, 100, -5, 50); 30 runs
– 4 agents in a 25 × 25 grid; Pr(Resource Regeneration) = 7.5%; Pr(Enemy) = 1%; Rewards = (0, 100, -5, 50); 30 runs
– The separate-agents-with-coordination configuration could not be run for 4 agents on the 25 × 25 grid
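For reference, the two reported setups can be captured as simple config dictionaries; the key names are assumptions, and the reward tuple is kept in the order given on the slide without guessing what each entry corresponds to.

```python
# Illustrative config dictionaries for the two reported experimental setups.
EXPERIMENTS = [
    {"agents": 2, "grid": (15, 15), "p_resource_regen": 0.05,
     "p_enemy": 0.01, "rewards": (-1, 100, -5, 50), "runs": 30},
    {"agents": 4, "grid": (25, 25), "p_resource_regen": 0.075,
     "p_enemy": 0.01, "rewards": (0, 100, -5, 50), "runs": 30},
]
```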

Page 16: Multi-Agent Shared Hierarchy Reinforcement Learning


Experimental Results

Page 17: Multi-Agent Shared Hierarchy Reinforcement Learning


Experimental Results (2)

Page 18: Multi-Agent Shared Hierarchy Reinforcement Learning


Conclusion

– Sharing value functions
– Coordination
– Framework to express sharing & coordination with hierarchies

Page 19: Multi-Agent Shared Hierarchy Reinforcement Learning


Future Work

– Non-Markovian & non-stationary settings
– Learning the task hierarchy:
  – Task–subtask relationships
  – State abstractions
  – Termination conditions
– Combining the MASH framework with factored action models
– Recognizing opportunities for sharing & coordination

Page 20: Multi-Agent Shared Hierarchy Reinforcement Learning


Current Work

– Features from concurrent reinforcement learning for multiple effectors (Marthi, Russell)