Distributed Reinforcement Learning for a Traffic Engineering Application


Transcript of Distributed Reinforcement Learning for a Traffic Engineering Application

Page 1: Distributed Reinforcement Learning for a Traffic Engineering Application

Distributed Reinforcement Learning for a Traffic Engineering Application

Mark D. Pendrith, DaimlerChrysler Research & Technology Center

Presented by: Christina Schweikert

Page 2: Distributed Reinforcement Learning for a Traffic Engineering Application

Distributed Reinforcement Learning for Traffic Engineering Problem

Intelligent Cruise Control System

Lane change advisory system based on traffic patterns

Optimize a group policy by maximizing freeway utilization as a shared resource

Introduce 2 new algorithms (Monte Carlo-based Piecewise Policy Iteration, Multi-Agent Distributed Q-Learning) and compare their performance in this domain

Page 3: Distributed Reinforcement Learning for a Traffic Engineering Application

Distronic Adaptive Cruise Control

Page 4: Distributed Reinforcement Learning for a Traffic Engineering Application

Distronic Adaptive Cruise Control

Signals come from a radar sensor, which scans the full width of a three-lane motorway over a distance of approximately 100 m and recognizes any moving vehicles ahead

The reflection of the radar impulses and the change in their frequency enable the system to calculate the distance and the relative speed between the vehicles
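
For context, the standard radar relations behind this (not stated on the slide): range follows from the round-trip time of flight, and relative speed from the Doppler frequency shift.

```latex
% Standard radar relations (context only, not from the slide):
% range from round-trip time of flight, relative speed from the Doppler shift.
d = \frac{c\,\Delta t}{2}, \qquad v_{\mathrm{rel}} = \frac{c\,\Delta f}{2 f_0}
```

Here c is the speed of light, Δt the round-trip delay of the impulse, Δf the measured frequency shift, and f0 the transmitted frequency.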

Page 5: Distributed Reinforcement Learning for a Traffic Engineering Application

Distronic Adaptive Cruise Control

Distance to the vehicle in front decreases – the cruise control system immediately reduces acceleration or, if necessary, applies the brakes

Distance increases – the system acts as a conventional cruise control and, at speeds between 30 and 180 km/h, maintains the programmed desired speed

The driver is alerted in emergency situations
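
A minimal Python sketch of the follow/resume behaviour described on this slide; the thresholds, function name, and return values are illustrative assumptions, not Distronic internals.

```python
# Illustrative sketch of the follow/resume behaviour described above.
# Thresholds and the interface are assumptions, not Distronic internals.
def acc_decision(gap_m, closing_speed_mps, current_speed_kmh, set_speed_kmh,
                 desired_gap_m=40.0):
    """Return a coarse control action for an adaptive cruise controller."""
    if gap_m < desired_gap_m and closing_speed_mps > 0:
        # Vehicle ahead is inside the desired gap and we are closing in:
        # back off, or brake if the gap is already small.
        return "brake" if gap_m < 0.5 * desired_gap_m else "reduce_acceleration"
    if 30 <= current_speed_kmh <= 180:
        # Road ahead is clear: behave like conventional cruise control and
        # hold the programmed speed.
        return "hold_set_speed" if current_speed_kmh >= set_speed_kmh else "accelerate"
    # Outside the 30-180 km/h operating range the system takes no action.
    return "no_action"
```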

Page 6: Distributed Reinforcement Learning for a Traffic Engineering Application

Distronic Adaptive Cruise Control

Automatically maintains a constant distance to the vehicle in front, helping to prevent rear-end collisions

The reaction time of drivers using Distronic is up to 40 per cent faster than that of drivers without this assistance system

Page 7: Distributed Reinforcement Learning for a Traffic Engineering Application

Distributed Reinforcement Learning

State – agents within sensing range

Agents share a partially observable environment

Goal – integrate agents' experiences to learn an observation-based policy that maximizes group performance

Agents share a common policy, giving a homogeneous population of agents

Page 8: Distributed Reinforcement Learning for a Traffic Engineering Application

Traffic Engineering Problem

Population of cars, each with a desired traveling speed, sharing a freeway network

Subpopulation with radar capability to detect relative speeds and distances of cars immediately ahead, behind, and around them

Page 9: Distributed Reinforcement Learning for a Traffic Engineering Application

Problem Formulation

Optimize the average per-time-step reward by minimizing the per-car average loss at each time step (see the reconstructed formula below)

vd(i) – desired speed of car i

va(i) – actual speed of car i

n – number of cars in the simulation at that time step
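
The formula itself did not survive the transcript; a plausible reconstruction from the symbols above, consistent with the per-step reward of -11.9 reported later (each agent travelling about 11.9 mph below its desired speed), is:

```latex
% Per-time-step reward as the negative of the per-car average speed loss
r_t = -\frac{1}{n}\sum_{i=1}^{n}\bigl(v_d(i) - v_a(i)\bigr)
```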

Page 10: Distributed Reinforcement Learning for a Traffic Engineering Application

State Representation

View of the world for each car is represented by an 8-dimensional feature vector – the relative distances and speeds of the surrounding cars

AL AC AR

CL Car CR

BL BC BR

Page 11: Distributed Reinforcement Learning for a Traffic Engineering Application

Pattern of Cars in Front of Agent

AL AC AR

0 – lane is clear (no car in radar range or nearest car is faster than agent’s desired speed)

1 – fastest car ahead is slower than the agent's desired speed

2 – slower

3 – still slower

Page 12: Distributed Reinforcement Learning for a Traffic Engineering Application

Pattern of Cars Behind Agent

BL BC BR

0 – lane is clear (no car in radar range or nearest car is slower than agent’s current speed)

1 – slowest car behind is faster than the agent's desired speed

2 – faster

3 – still faster

Page 13: Distributed Reinforcement Learning for a Traffic Engineering Application

Lane Change

CL Car CR

0 – lane change not valid

1 – lane change valid

If there is not a safe gap both in front and behind, a lane change is illegal.
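
Putting the three preceding slides together, here is a minimal Python sketch of how the 8-dimensional observation could be assembled; the function name, argument layout, and slot ordering are assumptions, while the value codes follow the slides.

```python
# Sketch of the 8-d observation described on the preceding slides.
# Value codes follow the slides; names and ordering are assumptions:
#   ahead slots  AL, AC, AR in {0, 1, 2, 3}  (0 = lane clear, 1-3 = increasingly slow)
#   behind slots BL, BC, BR in {0, 1, 2, 3}  (0 = lane clear, 1-3 = increasingly fast)
#   lane-change flags CL, CR in {0, 1}       (1 = safe gap in front and behind)
def encode_observation(ahead_codes, behind_codes, left_change_valid, right_change_valid):
    AL, AC, AR = ahead_codes             # each in 0..3
    BL, BC, BR = behind_codes            # each in 0..3
    CL = 1 if left_change_valid else 0
    CR = 1 if right_change_valid else 0
    return (AL, AC, AR, CL, CR, BL, BC, BR)
```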

Page 14: Distributed Reinforcement Learning for a Traffic Engineering Application

Monte Carlo-based Piecewise Policy Iteration

Performs approximate piecewise policy iteration where possible policy changes for each state are evaluated by Monte Carlo estimation

Piecewise - Policy for each state is changed one at a time, rather than in parallel

Searches the space of deterministic policies directly without representing the value function

Page 15: Distributed Reinforcement Learning for a Traffic Engineering Application

Policy Iteration

Start with an arbitrary deterministic policy for the given MDP

Generate a better policy by calculating the best single improvement possible for each state (evaluated by Monte Carlo estimation)

Combine all changes to generate the successor policy

Continue until no improvement is possible – optimal policy
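
A minimal Python sketch combining the two slides: candidate policy changes are scored by Monte Carlo estimation and applied one state at a time, as in the piecewise variant. The `mc_return` routine, which would run simulated trials, is a placeholder.

```python
# Sketch of Monte Carlo-based piecewise policy iteration as described above.
# `mc_return(policy, state, action)` is a placeholder for a Monte Carlo estimate
# of the return from taking `action` in `state` and following `policy` afterwards.
def piecewise_policy_iteration(states, actions, mc_return, initial_policy):
    policy = dict(initial_policy)            # deterministic policy: state -> action
    improved = True
    while improved:                          # stop when no single-state change helps
        improved = False
        for s in states:                     # piecewise: one state at a time
            # Score each candidate action with the rest of the policy held fixed.
            estimates = {a: mc_return(policy, s, a) for a in actions[s]}
            best_action = max(estimates, key=estimates.get)
            if best_action != policy[s]:
                policy[s] = best_action      # apply the single-state improvement
                improved = True
    return policy
```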

Page 16: Distributed Reinforcement Learning for a Traffic Engineering Application

Multi-Agent Distributed Q-Learning

Q-Learning: Q-value estimates are updated after each time step, based on the state transition that follows the selected action

In standard Q-learning, only one state transition and one action are used per time step to update the Q-value estimates

In DQL, there can be as many state transitions per time step as there are agents
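
For reference, the standard one-step Q-learning update that DQL generalises (textbook form, not reproduced on the slide):

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \bigl( r + \gamma \max_{a'} Q(s', a') - Q(s,a) \bigr)
```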

Page 17: Distributed Reinforcement Learning for a Traffic Engineering Application

Multi-Agent Distributed Q-Learning

Takes the average backup value for a state/action pair <s, a> over all agents that selected action a from state s at the last time step

The Qmax component of the backup value is calculated only over the actions that are valid for that particular agent to select at the next time step
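
A minimal Python sketch of the averaged backup described above; the tabular Q representation, transition layout, and hyperparameter values are assumptions.

```python
from collections import defaultdict

# Sketch of the distributed Q-learning (DQL) backup described above.
# `transitions` holds one (s, a, r, s_next, valid_next_actions) tuple per agent that
# acted in the last time step; Q is a tabular mapping, e.g. defaultdict(float),
# keyed by (state, action).
def dql_update(Q, transitions, alpha=0.1, gamma=0.95):
    backups = defaultdict(list)
    for s, a, r, s_next, valid_next_actions in transitions:
        # Qmax is taken only over the actions this agent may select next time step.
        q_max = max(Q[(s_next, a2)] for a2 in valid_next_actions)
        backups[(s, a)].append(r + gamma * q_max)
    # One update per <s, a>, using the average backup value over all agents
    # that selected action a from state s.
    for (s, a), values in backups.items():
        target = sum(values) / len(values)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q
```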

Page 18: Distributed Reinforcement Learning for a Traffic Engineering Application

Simulation for Offline Learning

Advantages:
o Since the true state of the environment is known, the loss metric can be measured directly
o Can be run faster, allowing many long learning trials
o Safety

Learn policies offline, then integrate them into an intelligent cruise control system with lane advisory, route planning, etc.

Page 19: Distributed Reinforcement Learning for a Traffic Engineering Application

Traffic Simulation Specifications

Circular 3-lane freeway, 13.3 miles long, with 200 cars

Half of the cars follow a "selfish drone" policy; the rest follow the current learnt policy plus active exploration decisions

Gaussian distribution of desired speeds, with a mean of 60 mph

Cars have low-level collision avoidance but differ in lane change strategy
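
The stated specification could be captured in a small configuration object; the numbers come from the slide, while the field names and the unspecified standard deviation are assumptions.

```python
from dataclasses import dataclass

# Simulation parameters from the slide; field names are illustrative.
@dataclass
class TrafficSimConfig:
    track_shape: str = "circular"
    lanes: int = 3
    length_miles: float = 13.3
    num_cars: int = 200
    selfish_drone_fraction: float = 0.5    # half follow the "selfish drone" policy
    desired_speed_mean_mph: float = 60.0   # Gaussian distribution of desired speeds
    desired_speed_std_mph: float = 5.0     # not given in the transcript; assumed
```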

Page 20: Distributed Reinforcement Learning for a Traffic Engineering Application

Experimental Results

Selfish drone policy – consistent per-step reward of -11.9 (each agent traveling, on average, 11.9 mph below its desired speed)

APPIA and DQL found policies 3-5% better

Best policies were found with the "look ahead" only model

The "look behind" model provided more stable learning

"Look behind" outperforms "look ahead" at times when a good policy is lost