Multi-Objective Reinforcement Learning for Cognitive Radio ...
Transcript of Multi-Objective Reinforcement Learning for Cognitive Radio ...
Multi-Objective Reinforcement Learning for Cognitive Radio-based Satellite Communications
1
NASA GRC Grant: “Intelligent Media Access Protocol for SDR-based Satellite Communications.”
Grant number: NNC14AA01AP. V. R. Ferreira © 2016
Paulo Ferreira, Randy Paffenroth, Alexander Wyglinski - (WPI)
Timothy Hackett, Sven Bilen - (Penn State)
Richard Reinhart, Dale Mortensen - (NASA GRC)
Worcester Polytechnic Institute
Agenda
• What is a Cognitive Radio?
• CR applications
• The problem: Multi-objective performance
• Reinforcement Learning: The solution
• Satcom RL performance
2 P. V. R. Ferreira © 2016
What is a Cognitive Radio ?
3 P. V. R. Ferreira © 2016
Worcester Polytechnic Institute
What is a Cognitive Radio ?
4 P. V. R. Ferreira © 2016
What you want
What you see
What you can do
What you can tune
T. Collins, A.M. Wyglinski. “Enabling Security in Cognitive Radios and Wireless Spectrum.” MILCOM Tutorial, 2014.
Worcester Polytechnic Institute
What is a Cognitive Radio ?
• Learning algorithm
─ Explore vs. Exploit
5 P. V. R. Ferreira © 2016
0
100
Err
or
(%)
Global minima
Local minima
Iteration number
CR applications
6 P. V. R. Ferreira © 2016
Worcester Polytechnic Institute
CR applications
• Satcom
7 P. V. R. Ferreira © 2016
Reinhart, R. C. Using International Space Station For Cognitive System Research And Technology With Space-based Reconfigurable Software Defined Radios. 66th International Astronautical Congress, IAC 2015.
Worcester Polytechnic Institute
• Modulation scheme
• Encoding scheme
• Symbol rate
• Bandwidth
• Carrier frequency
• ADC/DAC resolution
• Antenna
• Transmission power level
P. V. R. Ferreira © 20168
CR Satcom – What you tune: PHY layer
Multi-Objective Comms Performance
The problem
9 P. V. R. Ferreira © 2016
Worcester Polytechnic Institute
Multi-objective comms performance
P. V. R. Ferreira © 201610
M – modulation and encoding schemesR – data rateBER – bit error rateP – transmission powerW – bandwidthEb – energy per bit
Reinforcement Learning
The solution
11 P. V. R. Ferreira © 2016
Worcester Polytechnic Institute
Reinforcement Learning
12 P. V. R. Ferreira © 2016
Agent
EnvironmentTx Rx
𝑨𝒕
𝑺𝒕+𝟏 𝑹𝒕+𝟏
𝑹𝒕
𝑺𝒕
Communications channel
Space segment
Ground segment
𝐴𝑡=action𝑅𝑡= reward𝑆𝑡=state
Worcester Polytechnic Institute
Reinforcement Learning
13 P. V. R. Ferreira © 2016
Worcester Polytechnic Institute
Reinforcement Learning
14 P. V. R. Ferreira © 2016
• Satcom 𝑸 𝑺,𝑨
𝑄𝑘+1 𝑠𝑘 , 𝑢𝑘 = 𝑄𝑘 𝑠𝑘 , 𝑢𝑘 + 𝛼𝑘𝑟𝑘+1
𝑢𝑘 = ℎ 𝑠𝑘
𝑠𝑘+1 = 𝑔 𝑠𝑘 , 𝑢𝑘
𝑟𝑘+1 = 𝜌 𝑠𝑘 , 𝑢𝑘
State-action policy
State-transition function
Reward function
Satcom RL performance
15 P. V. R. Ferreira © 2016
Worcester Polytechnic Institute
Fixed exploration probability (𝜀 = 0.5)
16 P. V. R. Ferreira © 2016
Worcester Polytechnic Institute
Variable exploration probability 𝜀
17 P. V. R. Ferreira © 2016
Worcester Polytechnic Institute
Time spent at performance levels
P. V. R. Ferreira © 201618
Mission 1: Launch/re-entry Mission 2: Multimedia
Worcester Polytechnic Institute
Time spent at performance levels
P. V. R. Ferreira © 201619
Mission 3: Power saving Mission 4: Normal
Worcester Polytechnic Institute
Time spent at performance levels
P. V. R. Ferreira © 201620
Mission 5: Cooperation
THANK YOU!
WIWireless
Applied Research Solutions for a Connected Society
LaboratoryInnovation