Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots
Ivomar Brito Soares, Yann-Michael De Hauwere, Kris Januarius, Tim Brys, Thierry Salvant, Ann Nowe
[email protected] - [email protected]
ITSC 2015, September 2015
Summary
Introduction
Reinforcement Learning
Departure MANagement
DMAN RL Model
Experiments
Conclusions
Large Scale Multi-Agent Systems (MAS)
Examples: smart energy grids, intelligent traffic systems, warehouse planning, the air traffic system.
Large Scale MAS
Characteristics
- Resources and control are inherently distributed.
- Different actors with mutual and conflicting interests.
- Highly stochastic.
- The full system dynamics is unknown (e.g., not all constraints are known).
A full mathematical description or a global centralized solution is difficult to compute.
Why use RL?
- Artificial Intelligence (AI)
  - Machine Learning (ML)
    - Reinforcement Learning (RL)
Reinforcement Learning
- Agent-based modelling effort is reduced.
- Exceeds human controller performance.
- Adaptive and can change its decisions dynamically.
Single-Agent RL
Single-Agent Reinforcement Learning (RL) Model [Kaelbling et al. (1996)]
- An approach to solving a Markov Decision Process (MDP).
- The agent must learn by itself (trial and error).
- Maximize a long-term numerical reward signal.
Multi-Agent RL
Multi-Agent Reinforcement Learning (MARL) [Nowe (2011)]
- Markov Game (MG).
- Multiple agents / multiple sequential decisions.
- More complex than single-agent RL.
Air Traffic System
- Air Traffic System (ATS)
  - Airport Ground Operations (AGO)
    - Departure MANagement (DMAN)
Organize the movement of departing aircraft from the gate to take-off clearance.
DMAN Tasks
- Respect assigned Target Take-Off Time Windows (TTOTW).
- Increase runway throughput and airport capacity.
- Reduce fuel consumption, noise and CO2 emissions.
Target Take-Off Time Window (TTOTW)
Respecting the TTOTW allows for a better usage of the ATS infrastructure (CFMU slots ≡ take-off time slots).
Flight Plan Callsign | TTOTWmin | TTOT     | TTOTWmax
AAL0005D             | 07:19:12 | 07:20:12 | 07:21:12
DAL0067D             | 07:22:12 | 07:23:12 | 07:24:12
JBU0065D             | 07:25:12 | 07:26:12 | 07:27:12
AAL0007D             | 07:28:12 | 07:29:12 | 07:30:12
DAL0009D             | 07:31:12 | 07:32:12 | 07:33:12
Some TTOT windows generated for learning scenario 6.
At Charles de Gaulle Airport (LFPG) in Paris, France, roughly 80% of the flights succeed in taking off inside their slots [Gotteland, 2003].
Human Controllers' Approaches to Respecting the TTOTW
1. Gate controllers: clear the aircraft off-block at TTOT minus an estimate of the duration between off-block and take-off.
   - Average: the average over all departure aircraft at KJFK.
   - Exact: the duration when the aircraft taxis alone.
2. Runway controllers: make the aircraft wait before lining up if they estimate that it would miss its window by taking off too early.
Fast Time Simulation (FTS)
- Modeling: en-route flight phase, Terminal Maneuvering Area (TMA), aircraft and airport handlers, ground movements of vehicles.
- The simulation clock runs faster than a regular clock (fast-time).
- Examples: AirTOp, SIMMOD, TAAM, Arc Port ALTO, etc.
John F. Kennedy International Airport (KJFK)
[Figure: the KJFK airport layout modeled as a grid world environment; New York City, USA (NY-Metro).]
Environment
- Fast Time Simulation: AirTOp.
- Departure flights: Off-Block → Pushing-Back → Taxiing → Runway acceleration → Take-Off → Standard Instrument Departure (SID) → En-Route.
- Arrival flights: En-Route → Standard Terminal Arrival Route (STAR) → Landing → Runway deceleration → Taxiing → In-Block.
- Safety requirements: wake-vortex separations, runway usage.
Markov Decision Process (MDP)
- Agent: an FP controller agent.
- States:
  1. Parked (initial state): one per departure aircraft with a TTOT window.
     - Departure gate
     - Entry time at the departure gate
  2. Taxiing (intermediate states, finite):
     - Entry node
     - Exit node
     - Entry time at the entry node
  3. Taken-Off (goal/absorbing states):
     - Taken-Off Inside Window
     - Taken-Off Outside Window
- Actions:
  1. Delay Off-Block
  2. Delay During Taxiing
  3. Take-Off
- Reward function:
  - Inside window: positive reward penalizing delay on the ground.
  - Outside window: 0.
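A minimal sketch of how these states and actions could be represented in code; the class and field names are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass
from enum import Enum

class ActionType(Enum):
    # The three action families available to an FP controller agent.
    DELAY_OFF_BLOCK = "delay_off_block"
    DELAY_DURING_TAXIING = "delay_during_taxiing"
    TAKE_OFF = "take_off"

@dataclass(frozen=True)
class ParkedState:
    # Initial state: one per departure aircraft with a TTOT window.
    departure_gate: str
    entry_time_s: int  # entry time at the departure gate, seconds since midnight

@dataclass(frozen=True)
class TaxiingState:
    # Intermediate state between two taxi-route nodes.
    entry_node: str
    exit_node: str
    entry_time_s: int  # entry time at the entry node

@dataclass(frozen=True)
class TakenOffState:
    # Goal/absorbing state: the aircraft has taken off.
    inside_window: bool  # True for Taken-Off Inside Window, False otherwise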
RLDMAN: Single-Agent Single-State to Multi-Agent Multi-State
1. Single-Agent Single-State (N-armed bandit):
   - Parked state / Delay Off-Block actions only.
   - The delay is absorbed at the gate → reduced fuel consumption.
2. Multi-Agent Single-State: multiple single-state agents learning in a shared environment.
3. Single-Agent Multi-State: it is not always possible to absorb all the delay at the gate, e.g. when:
   - an arriving flight is requesting the gate;
   - the aircraft must avoid traffic in the vicinity of the gate.
4. Multi-Agent Multi-State: multiple multi-state agents learning in a shared environment.
Single-Agent Multi-State Example
- TTOT Window (TTOTW): [08:16:00, 08:18:00]; taxi legs a-b = b-c = c-d = 5 min.
- Reward function parameters: rmax = 100, rTTOTW,out = 0, ftaxiing = 0.5. Q-learning: α = 0.2, γ = 0.8.
[Figure: taxi route from the terminal gate (a) through taxiway nodes b and c to the runway entry d (runways 27/09), with the parked, taxiing and taken-off states and their entry/exit times annotated along the route.]
1. The aircraft parks at the gate at 08:00:00.
2. It goes off-blocks (AOBT) at 08:00:30. R = 0, Q = 0.
3. No stop at node b. R = 0, Q = 0.
4. The aircraft stops for 30 s at node c. R = 0, Q = 0.
5. The aircraft takes off (ATOT) at 08:16:00, inside its window: rTTOTW,in = 100 − 0.5 · 60 = 70; Q = 0.2 · 70 = 14.
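A small sketch reproducing the numbers in this example (parameter and function names are illustrative): 30 s of delay before off-block plus 30 s at node c give 60 s of ground delay, so the inside-window reward is 100 − 0.5 · 60 = 70 and the first Q-learning update from Q = 0 gives 0.2 · 70 = 14.

R_MAX = 100.0    # reward for taking off inside the TTOTW
F_TAXIING = 0.5  # penalty factor per second of delay on the ground
ALPHA = 0.2      # Q-learning learning rate
GAMMA = 0.8      # discount factor

def inside_window_reward(delay_on_ground_s: float) -> float:
    # r_in = rmax - ftaxiing * delay; the reward is 0 if the window is missed.
    return R_MAX - F_TAXIING * delay_on_ground_s

def q_update(q_old: float, reward: float, max_q_next: float) -> float:
    # One Q-learning step: Q <- Q + alpha * (R + gamma * max_a' Q(s', a') - Q)
    return q_old + ALPHA * (reward + GAMMA * max_q_next - q_old)

r = inside_window_reward(delay_on_ground_s=60.0)   # 30 s at the gate + 30 s at node c
q = q_update(q_old=0.0, reward=r, max_q_next=0.0)  # absorbing state, so max Q' = 0
print(r, q)  # 70.0 14.0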
Learning Problem Size
Large Multi-Agent System (MAS)
- Number of departure flights (agents): 698
- Number of arrival flights: 711
- Two days of operations at KJFK
Independent Learning Scenarios
- Total: 42
Scenario index  | 6-38 | 5  | 0 | 39 | 1,3,4 | 2,41 | 40
# of departures | 20   | 13 | 8 | 6  | 3     | 1    | 0
Agent type      | MA   | MA | MA | MA | MA   | SA   | -
Number of departure flights per learning scenario index (Dep: departure flights, AT: agent type, MA: multi-agent, SA: single-agent).
Learning Scenario 6: Single-State vs. Multi-State Comparison
- Deterministic environment
- 20 independent learning agents
[Figures: average number of TTOTWs respected, average fuel consumption of departure flights (kg), average reward (all agents and per agent), average gate delay and average taxiing delay (all agents).]
All Learning Scenarios: Percentage of Windows Respected
Percentage of Windows Respected
Case                                | Deterministic | Stochastic
Machine Learning                    | 99            | 96
Gate Controllers (Average)          | 85            | 71
Gate Controllers (Exact)            | 97            | 44
Gate + Runway Controllers (Average) | 87            | 70
Gate + Runway Controllers (Exact)   | 96            | 44
Percentage of windows respected for all scenarios.
All Learning Scenarios: Fuel Consumption
Fuel Consumption (kg)
Case                                | Deterministic | Stochastic
Machine Learning                    | 34,806        | 37,989
Gate Controllers (Average)          | 35,057        | 38,106
Gate Controllers (Exact)            | 34,839        | 37,865
Gate + Runway Controllers (Average) | 35,412        | 36,613
Gate + Runway Controllers (Exact)   | 34,847        | 37,872
Fuel consumption for departure aircraft for all scenarios.
Conclusions
- Reinforcement Learning (RL) has shown good potential for modeling and finding solutions for respecting assigned take-off windows for departure aircraft.
- A realistic, real-world application of RL.
- Single-State
  - Advantages: reduced fuel consumption and a smaller learning problem, since no taxiing states are visited.
  - Disadvantages: increased gate delay, and not being able to find a solution for all cases, e.g. when the aircraft needs to avoid traffic taxiing in the vicinity of the gate.
- Multi-State
  - Advantages: reduced gate delay; finds solutions that avoid disturbing traffic on the path to the runway.
  - Disadvantages: increased fuel consumption; a larger learning problem.
Questions?
Ivomar Brito Soares
[email protected] - [email protected]
AI Lab (VUB): https://ai.vub.ac.be
Airtopsoft SA: http://www.airtopsoft.com
Innoviris: http://www.innoviris.be
RLDMAN YouTube channel: http://www.youtube.com/channel/UC8uJBsMej5A1as8trbVxbbQ
References
1. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
2. K. Tumer and A. Agogino, "Improving Air Traffic Management with a Learning Multiagent System," IEEE Intelligent Systems, vol. 24, no. 1, pp. 18-21, 2009.
3. R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine Learning: An Artificial Intelligence Approach. Springer Science & Business Media, 2013.
4. A. Nowe, P. Vrancx, and Y.-M. De Hauwere, "Game Theory and Multi-agent Reinforcement Learning," in Reinforcement Learning: State of the Art. Springer, 2012, pp. 441-470.
5. Y.-M. De Hauwere, Sparse Interactions in Multi-Agent Reinforcement Learning, 2011.
6. R. De Neufville and A. Odoni, Airport Systems: Planning, Design and Management, 2013.
7. S. Stroiney, B. Levy, and C. Knickerbocker, "Departure Management: Savings in Taxi Time, Fuel Burn, and Emissions," in Integrated Communications Navigation and Surveillance Conference (ICNS). IEEE, 2010.
8. J.-B. Gotteland, N. Durand, and J.-M. Alliot, "Handling CFMU Slots in Busy Airports," in 5th USA/Europe Air Traffic Management Research and Development Seminar, 2003.
9. Airtopsoft, AirTOp Fast Time Simulator, http://www.airtopsoft.com/, 2005. [Online; accessed 23-February-2015].
10. C. J. Watkins and P. Dayan, "Q-Learning," Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.
Abstract
This paper considers how existing Reinforcement Learning (RL) techniques can be used to model and learn solutions for large scale Multi-Agent Systems (MAS). The large scale MAS of interest is the movement of departure flights in big airports, commonly known as the Departure MANagement (DMAN) problem. A particular DMAN subproblem is how to respect Central Flow Management Unit (CFMU) take-off time windows, which are time windows planned by flow management authorities to be respected for the take-off time of departure flights. An RL model to handle this problem is proposed, including the Markov Decision Process (MDP) definition, the behavior of the learning agents, and how the problem can be modeled using RL ranging from the simplest to the full RL problem. Several experiments are also shown that illustrate the performance of the machine learning algorithm, with a comparison to how these problems are commonly handled by airport controllers nowadays. The environment in which the agents learn is provided by the Fast Time Simulator (FTS) AirTOp, and the airport case study is John F. Kennedy International Airport (KJFK) in New York City, USA, one of the busiest airports in the world.
Reinforcement Learning (RL) Overview
- Markov Decision Process (MDP): (S, A, Tsa, γ, R).
  - S = {s1, ..., sN} is a finite set of states.
  - A = {a1, ..., ak} are the actions available to the agent.
  - Tsa: each combination of starting state si, action choice al ∈ A and next state sj has an associated transition probability T(si, al, sj).
  - R: immediate reward R(si, al).
  - γ ∈ [0, 1) is the discount factor (e.g., 0.9).
- Learn a policy π maximizing Jπ ≡ E[ Σt=0..∞ γ^t R(s(t), π(s(t))) ]
  - E: expected value.
- Q-values: Q(s, a) = R(s, a) + γ Σs' T(s, a, s') maxa' Q(s', a')
  - s': next state; a': action taken in the next state.
- Q-learning update rule [Watkins, 1992]: Q(s, a) ← Q(s, a) + α[ R(s, a) + γ maxa' Q(s', a') − Q(s, a) ]
  - α: learning rate.
- ε-greedy action selection with decay: ε(episode) = ε0 · τ^episode
  - ε0 is the initial value (e.g., 1.0); τ ∈ (0, 1) is the decay factor (e.g., 0.995, reaching ε = 0.001 after about 1378 episodes).
Algorithm 1: RL update with Q-learning and ε-greedy with decay
1:  Initialize Q(s, a) (e.g., to 0)
2:  for each episode do
3:      Agent returns to the initial state
4:      Decay ε: ε ← ε0 · τ^episode
5:      while the final/absorbing state is not reached do
6:          Generate a random number n ∈ [0, 1)
7:          if n ≤ ε then
8:              Explore: choose a at random
9:          else
10:             Exploit: choose among the actions a with the highest Q
11:         Execute action a
12:         Observe reward r and next state s'
13:         Update Q(s, a): Q(s, a) ← Q(s, a) + αt[ R(s, a) + γ maxa' Q(s', a') − Q(s, a) ]  [Watkins, 1992]
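A runnable sketch of Algorithm 1 in Python for a generic tabular problem. The environment interface (reset, actions, step) and its method names are assumptions made for illustration, not part of the original RLDMAN implementation.

import random
from collections import defaultdict

def q_learning(env, episodes=1378, alpha=0.2, gamma=0.8, eps0=1.0, tau=0.995):
    # Tabular Q-learning with epsilon-greedy action selection and exponential decay.
    Q = defaultdict(float)                          # Q(s, a), initialized to 0
    for episode in range(episodes):
        state = env.reset()                         # agent returns to its initial state
        eps = eps0 * tau ** episode                 # decay epsilon
        done = False
        while not done:                             # until a final/absorbing state
            acts = env.actions(state)
            if random.random() <= eps:              # explore
                action = random.choice(acts)
            else:                                   # exploit: action with the highest Q
                action = max(acts, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(state, action)
            max_next = 0.0 if done else max(Q[(next_state, a)]
                                            for a in env.actions(next_state))
            # Q-learning update rule [Watkins, 1992]
            Q[(state, action)] += alpha * (reward + gamma * max_next - Q[(state, action)])
            state = next_state
    return Q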
Target Take-Off Time Window (TTOTW)
Respecting the TTOTW allows for a better usage of the ATS infrastructure.
- Departure aircraft i: ac^dep_i, with assigned target take-off time TTOT_i.
- TTOT Window (TTOTW_i):
  - Width: TTOTW^w_i > 0.
  - Range: [TTOTW^min_i, TTOTW^max_i].
  - Constraints: TTOTW^min_i < TTOT_i < TTOTW^max_i and TTOTW^max_i = TTOTW^min_i + TTOTW^w_i.
- TTOTW_i respected: the Actual Take-Off Time (ATOT) satisfies ATOT_i ∈ [TTOTW^min_i, TTOTW^max_i].
At Charles de Gaulle Airport (LFPG) in Paris, France, roughly 80% of the flights succeed in taking off inside their slots [Gotteland, 2003].
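A one-function check of the "window respected" condition ATOT_i ∈ [TTOTW^min_i, TTOTW^max_i]; the datetime-based signature and the sample ATOT of 07:20:45 are illustrative assumptions (the window is the first one from learning scenario 6).

from datetime import datetime

def ttotw_respected(atot: datetime, ttotw_min: datetime, ttotw_max: datetime) -> bool:
    # A TTOTW is respected when the Actual Take-Off Time falls inside the window.
    return ttotw_min <= atot <= ttotw_max

fmt = "%H:%M:%S"
print(ttotw_respected(datetime.strptime("07:20:45", fmt),   # hypothetical ATOT
                      datetime.strptime("07:19:12", fmt),   # TTOTW^min (AAL0005D)
                      datetime.strptime("07:21:12", fmt)))  # TTOTW^max -> True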
Human Controllers' Approaches to Respecting the TTOTW
1. Gate controllers: clear the aircraft off-block at TTOT − T^ot_i, where T^ot_i is an estimate of the off-block-to-take-off duration.
   - Average: T^ot,a_i = average push-back duration (00:02:35) + average taxi time (total taxi length / 14 kt) + average runway line-up duration (00:00:28) + runway acceleration duration.
   - Exact: T^ot,e_i = Actual Take-Off Time (ATOT) − Actual Off-Block Time (AOBT).
2. Runway controllers: make the aircraft wait before lining up if they estimate that it would miss its window by taking off too early.
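A sketch of the "Average" gate-controller heuristic: the off-block clearance time is TTOT minus an estimated off-block-to-take-off duration built from the average push-back, taxi, line-up and runway-acceleration durations. The 14 kt taxi speed and the fixed push-back and line-up durations come from the slide; the 30 s runway-acceleration default, the 3000 m taxi length and the function names are assumptions for illustration.

from datetime import datetime, timedelta

KNOT_MS = 0.514444  # metres per second in one knot

def average_offblock_to_takeoff(taxi_length_m, runway_acceleration_s=30.0):
    # T^ot,a = avg push-back + avg taxi time (taxi length / 14 kt)
    #          + avg runway line-up + runway acceleration duration.
    push_back = timedelta(minutes=2, seconds=35)              # 00:02:35
    taxi = timedelta(seconds=taxi_length_m / (14 * KNOT_MS))  # taxi length / 14 kt
    line_up = timedelta(seconds=28)                           # 00:00:28
    acceleration = timedelta(seconds=runway_acceleration_s)   # assumed value
    return push_back + taxi + line_up + acceleration

def offblock_clearance_time(ttot, taxi_length_m):
    # Gate controllers clear the aircraft off-block at TTOT - T^ot,a.
    return ttot - average_offblock_to_takeoff(taxi_length_m)

ttot = datetime.strptime("07:20:12", "%H:%M:%S")   # TTOT of AAL0005D
print(offblock_clearance_time(ttot, taxi_length_m=3000.0).time())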
RLDMAN: Single-Agent Single-State to Multi-Agent Multi-State
1. Single-Agent Single-State (N-armed bandit):
   - Only one learning agent, and the only state considered is the Parked state with the Delay Off-Block actions.
   - The delay is absorbed at the gate while the aircraft engines are turned off, thus reducing fuel consumption.
2. Multi-Agent Single-State: multiple single-state agents learning in a shared environment.
3. Single-Agent Multi-State: it is not always possible to absorb all the delay at the gate, e.g. when:
   - an arriving flight is requesting the gate;
   - the aircraft must avoid traffic in the vicinity of the gate.
4. Multi-Agent Multi-State: multiple multi-state agents learning in a shared environment.
Markov Decision Process (MDP)
- Agent: an FP controller agent a_i.
- States (S):
  1. Parked (initial state) s^p_i:
     - Departure gate (g^d)
     - Entry time (e^p)
  2. Taxiing (intermediate states) S^t_i = {s^t_i,1, ..., s^t_i,N}:
     - Entry node (n^e)
     - Exit node (n^x)
     - Entry time (e^t)
  3. Taken-Off (goal/absorbing states):
     - Taken-Off Inside Window (s^TTOTW,in_i)
     - Taken-Off Outside Window (s^TTOTW,out_i)
- Actions (A):
  1. Delay Off-Block (A^o = {a^o_1, ..., a^o_L})
  2. Delay During Taxiing (A^t = {a^t_1, ..., a^t_M})
  3. Take-Off (a^e)
- Reward function (R):
  - Inside window: r^TTOTW,in_i = r^max − p^taxiing_i = r^max − f^taxiing · d^taxiing_i
  - Outside window: r^TTOTW,out_i = 0.
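A direct reading of this reward function in code; the parameter defaults shown (r^max = 100, f^taxiing = 0.5) are the values used in the earlier worked example, not the experimental setting listed in the RL set-up.

def reward(inside_window: bool, taxiing_delay_s: float,
           r_max: float = 100.0, f_taxiing: float = 0.5) -> float:
    # r^TTOTW,in_i  = r^max - f^taxiing * d^taxiing_i  (inside the window)
    # r^TTOTW,out_i = 0                                 (outside the window)
    if not inside_window:
        return 0.0
    return r_max - f_taxiing * taxiing_delay_s

print(reward(True, 60.0))   # 70.0, as in the worked example
print(reward(False, 60.0))  # 0.0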
RL Set Up
- Independent learners.
- Q-learning: α = 0.2, γ = 0.8.
- Action selection mechanism: ε-greedy with parameter decay, ε0 = 1.0, τ = 0.995.
- Competitive setting: r^max = 10,000, f^taxiing = −1.0.
- Environment: deterministic, stochastic (non-deterministic).
RL Set Up
I Delay off-block actions: Ao with a range of [−10min, 10min]centered around TTOT − T ot,e and a step of 10sec for everyagent (121 actions per spi ).
I Delay during taxiing actions: At were defined with a range of[0, 1min] and a step of 10sec (7 actions per S t
i ) close to eachapron exit.
I Learning trial ends when ε = 0.001 for all agents (1378episodes).
All Learning Scenarios: Percentage of Windows Respected
[Chart: percentage of KJFK departure flights that take off inside their window, per learning scenario.]
All Learning Scenarios: Fuel Consumption
[Chart: fuel consumption of the KJFK departure flights, per learning scenario.]