DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention
Bo He1, Jingyu Wang1, Qi Qi1, Haifeng Sun1, Zirui Zhuang1, Cong Liu2, Jianxin Liao1
1Beijing University of Posts & Telecommunications  2China Mobile Research Institute
State Key Laboratory of Networking and Switching Technology
Outline
• Routing in Edge Networks
• Challenges
• Our DeepHop Framework
• Experimental results
• Conclusions
Routing in Edge Networks
[Figure: an edge network link (v1, v2) over WiFi/LTE/Ethernet; the sender's packets pass through per-node queues; link state covers delay, bandwidth, and packet loss rate]
Challenges
Traditional full-path routing can hardly handle the unexpected traffic stress on distributed edge nodes.
Traffic fluctuation makes some elements of the network state vary greatly, so heuristic centralized routing can hardly capture the dynamic transitions of the whole network state.
Network state elements have different significance for routing, but their implications are hard to distinguish and exploit reasonably during traffic forwarding.
Contributions of DeepHop Framework
• Challenge: traditional full-path routing can hardly handle unexpected traffic stress on distributed edge nodes → Solution: a hop-by-hop approach
• Challenge: traffic fluctuation makes elements of the network state vary greatly, so heuristic centralized routing can hardly capture the global dynamic state transitions → Solution: a Multi-Agent DRL method
• Challenge: network state elements differ in significance for routing, and their implications are hard to distinguish and exploit reasonably during forwarding → Solution: a self-attention mechanism
• Plus: the MAPOKTR training algorithm
Our DeepHop Framework
Hop-by-hop routing mechanism of DeepHop
Objective 1: minimizing the ratio of discarded packets
Objective 2: minimizing the average slowdown of each traffic packet
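The objective formulas themselves are shown only graphically on the slide and are not captured in this transcript; a plausible formalization, where P is the set of injected packets, D ⊆ P the discarded packets, c_p the completion time of a delivered packet p, and t_p its ideal (uncongested) transmission time (all four symbols are our notation, not the paper's):

```latex
% Objective 1: ratio of discarded packets (the Task Unfinished Ratio)
\min \; \mathrm{TUR} = \frac{|D|}{|P|}

% Objective 2: average slowdown of each delivered traffic packet
\min \; \frac{1}{|P \setminus D|} \sum_{p \in P \setminus D} \frac{c_p}{t_p}
```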
Our DeepHop Framework
Multi-Agent Deep Reinforcement Learning (MADRL) model
• State Space
• Action Space
• Reward Function
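The concrete state, action, and reward definitions appear only in the slide graphics. As a minimal sketch of the framing, assuming the three observation groups listed later in the talk (destination index, network state, transmission priority), an action that selects a next-hop neighbor, and a hypothetical reward that rewards delivery and penalizes per-hop delay and drops (the `HopAgent` class and the reward shaping are ours, not the paper's):

```python
import random

class HopAgent:
    """Sketch of one DeepHop agent deployed on an edge node (illustrative only)."""

    def __init__(self, node_id, neighbors):
        self.node_id = node_id
        self.neighbors = neighbors  # candidate next hops = the action space

    def observe(self, packet, link_states):
        # O1: destination index, O2: network state, O3: transmission priority
        return (packet["dst"], link_states, packet["priority"])

    def act(self, obs):
        # A trained policy network would score neighbors from obs;
        # a random placeholder stands in for it here.
        return random.choice(self.neighbors)

def reward(delivered, hop_delay, dropped):
    # Hypothetical shaping: +1 on delivery, minus the hop delay, -1 on a drop.
    return (1.0 if delivered else 0.0) - hop_delay - (1.0 if dropped else 0.0)
```

Each node runs its own agent, so a packet's route emerges one hop at a time rather than being computed end-to-end in advance.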
Our DeepHop Framework
The Multi-Agent Deep Reinforcement Learning algorithm MAPOKTR: Multi-Agent Policy Optimization using Kronecker-Factored Trust Region
Its loss function builds on:
• the ratio of probability distributions
• the point probability distance
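The loss formula itself is not in the transcript. As a hedged illustration only: combining a PPO-style clipped probability-ratio surrogate with an added point-probability-distance penalty matches the two ingredients named above, but the exact MAPOKTR form, clip range `eps`, and penalty weight `beta` below are our assumptions:

```python
import numpy as np

def surrogate_loss(p_new, p_old, advantages, eps=0.2, beta=1.0):
    """Sketch of a clipped surrogate loss with a point-probability distance
    penalty (an assumed form of the MAPOKTR loss, not the paper's exact one)."""
    ratio = p_new / p_old                      # ratio of probability distributions
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    point_dist = np.abs(p_new - p_old)         # point probability distance
    # Maximize the surrogate while penalizing large per-action probability jumps.
    return -surrogate.mean() + beta * point_dist.mean()
```

Penalizing how far individual action probabilities move, on top of clipping the ratio, is one way to keep each policy update conservative enough that improvement stays monotonic.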
Our DeepHop Framework
The sub-layer structure of DRL agents; State Space:
• O_1 = O^T_{τ,j}: the destination node index
• O_2 = O^N_τ: the network state (links and nodes)
• O_3 = O^P_{τ,j}: the transmission priority
Our DeepHop Framework
The Semantic Attention Mechanism (SAM)
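The SAM details are shown only graphically on the slide. A minimal NumPy sketch of the idea, assuming each observation group (destination, network state, priority) is first embedded by its own sub-layer and the group embeddings are then mixed by scaled dot-product self-attention; all dimensions and weights below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # shared embedding dimension (invented)

def sublayer(x, W):
    # One per-group sub-layer: a projection of that group into the shared space.
    return np.tanh(x @ W)

def self_attention(X):
    # Scaled dot-product self-attention over the three state-group embeddings.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# Embed the three observation groups O1 (destination), O2 (network state),
# O3 (priority); the raw group sizes 4, 16, 2 are placeholders.
groups = [rng.standard_normal(g) for g in (4, 16, 2)]
Ws = [rng.standard_normal((g.shape[0], d)) for g in groups]
X = np.stack([sublayer(g, W) for g, W in zip(groups, Ws)])  # shape (3, d)
out = self_attention(X)                                     # shape (3, d)
```

The attention weights let the policy network emphasize whichever state group matters most for the current packet, which is how differing semantic significance is modeled.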
Experimental results
The experimental environments
• Platform: a Dell desktop (CPU: Intel i7-8700 3.2 GHz; Memory: 32 GB DDR4L 2666 MHz; OS: 64-bit Ubuntu 16.04 LTS)
• Network topologies: two real network topologies, Net1 (12 nodes) and Net2 (20 nodes)
• Traffic data: an open real traffic dataset including 8 classes of traffic
Experimental results
Performance of the hop-by-hop approach
Metrics and baseline:
• Task Unfinished Ratio (TUR): the ratio of packets discarded before reaching their destination nodes
• Speed of injection: varied to produce different degrees of congestion
• Coexistent Routing and Flooding (CRF): a state-of-the-art heuristic routing protocol for edge networks
Findings:
• The MADRL-based DeepHop handles the routing tasks better than the heuristic CRF protocol under all degrees of congestion in edge networks
• DeepHop performs more stably than CRF at lower degrees of congestion, according to the Net1 results
Experimental results
Performance of different MADRL algorithms
Baselines:
• MAACKTR: Multi-Agent Actor-Critic using Kronecker-Factored Trust Region
• MADDPG: Multi-Agent Deep Deterministic Policy Gradient
[Figures: comparison at injection speeds of 3000, 4000, 5000, and 6000 in Net1 and Net2]
Experimental results
Performance of different neural network structures in the DRL agents
Compared structures:
• MADRL-SAM: neural networks with sub-layers and the self-attention mechanism
• MADRL-SM: neural networks with sub-layers only
• MADRL: neural networks with only fully-connected layers
Findings:
• The structure of separate sub-layers makes the neural network understand the semantics effectively
• The attention mechanism accelerates the convergence of policies
[Figures: comparison at injection speeds of 3000, 4000, 5000, and 6000 in Net1]
Experimental results
Performance of different reward functions
• Reward: the full reward function; Reward1, Reward2, Reward3: ablated variants (the formulas are shown on the slide and not captured in this transcript)
• All elements of the reward function are shown to have a positive effect when the agents learn to handle the routing tasks
[Figures: comparison at injection speeds of 3000, 4000, 5000, and 6000 in Net1]
Experimental results
Performance of DeepHop in highly dynamic networks
• To generate time-varying network congestion, the number of injected packets is randomly selected from 3,000 to 6,000 per second in the experiments
• DeepHop can cope with the time-varying network congestion
• DeepHop shows good robustness and is suitable for real edge network environments
Conclusions
DeepHop utilizes multi-agent deep reinforcement learning to determine hop-by-hop routes for traffic packets and deploys the agents on edge nodes using MEC technology.
To learn the semantics of complicated state elements, DeepHop introduces a self-attention mechanism that helps the DRL agents learn better policies faster.
DeepHop adopts a novel MADRL algorithm, MAPOKTR, which introduces the point probability into its surrogate loss function to keep the policy monotonically improving.
In future work, we will further explore communication between the agents on nodes and address re-training the MADRL model when the topology of the edge network changes significantly.
Thanks!