Multi-Objective Reinforcement Learning for Cognitive Radio ...

22
Multi-Objective Reinforcement Learning for Cognitive Radio- based Satellite Communications 1 NASA GRC Grant: “Intelligent Media Access Protocol for SDR-based Satellite Communications.” Grant number: NNC14AA01A P. V. R. Ferreira © 2016 Paulo Ferreira, Randy Paffenroth, Alexander Wyglinski - (WPI) Timothy Hackett, Sven Bilen - (Penn State) Richard Reinhart, Dale Mortensen - (NASA GRC)

Transcript of Multi-Objective Reinforcement Learning for Cognitive Radio ...

Page 1: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Multi-Objective Reinforcement Learning for Cognitive Radio-based Satellite Communications

1

NASA GRC Grant: “Intelligent Media Access Protocol for SDR-based Satellite Communications.”

Grant number: NNC14AA01AP. V. R. Ferreira © 2016

Paulo Ferreira, Randy Paffenroth, Alexander Wyglinski - (WPI)

Timothy Hackett, Sven Bilen - (Penn State)

Richard Reinhart, Dale Mortensen - (NASA GRC)

Page 2: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

Agenda

• What is a Cognitive Radio?

• CR applications

• The problem: Multi-objective performance

• Reinforcement Learning: The solution

• Satcom RL performance

2 P. V. R. Ferreira © 2016

Page 3: Multi-Objective Reinforcement Learning for Cognitive Radio ...

What is a Cognitive Radio ?

3 P. V. R. Ferreira © 2016

Page 4: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

What is a Cognitive Radio ?

4 P. V. R. Ferreira © 2016

What you want

What you see

What you can do

What you can tune

T. Collins, A.M. Wyglinski. “Enabling Security in Cognitive Radios and Wireless Spectrum.” MILCOM Tutorial, 2014.

Page 5: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

What is a Cognitive Radio ?

• Learning algorithm

─ Explore vs. Exploit

5 P. V. R. Ferreira © 2016

0

100

Err

or

(%)

Global minima

Local minima

Iteration number

Page 6: Multi-Objective Reinforcement Learning for Cognitive Radio ...

CR applications

6 P. V. R. Ferreira © 2016

Page 7: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

CR applications

• Satcom

7 P. V. R. Ferreira © 2016

Reinhart, R. C. Using International Space Station For Cognitive System Research And Technology With Space-based Reconfigurable Software Defined Radios. 66th International Astronautical Congress, IAC 2015.

Page 8: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

• Modulation scheme

• Encoding scheme

• Symbol rate

• Bandwidth

• Carrier frequency

• ADC/DAC resolution

• Antenna

• Transmission power level

P. V. R. Ferreira © 20168

CR Satcom – What you tune: PHY layer

Page 9: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Multi-Objective Comms Performance

The problem

9 P. V. R. Ferreira © 2016

Page 10: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

Multi-objective comms performance

P. V. R. Ferreira © 201610

M – modulation and encoding schemesR – data rateBER – bit error rateP – transmission powerW – bandwidthEb – energy per bit

Page 11: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Reinforcement Learning

The solution

11 P. V. R. Ferreira © 2016

Page 12: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

Reinforcement Learning

12 P. V. R. Ferreira © 2016

Agent

EnvironmentTx Rx

𝑨𝒕

𝑺𝒕+𝟏 𝑹𝒕+𝟏

𝑹𝒕

𝑺𝒕

Communications channel

Space segment

Ground segment

𝐴𝑡=action𝑅𝑡= reward𝑆𝑡=state

Page 13: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

Reinforcement Learning

13 P. V. R. Ferreira © 2016

Page 14: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

Reinforcement Learning

14 P. V. R. Ferreira © 2016

• Satcom 𝑸 𝑺,𝑨

𝑄𝑘+1 𝑠𝑘 , 𝑢𝑘 = 𝑄𝑘 𝑠𝑘 , 𝑢𝑘 + 𝛼𝑘𝑟𝑘+1

𝑢𝑘 = ℎ 𝑠𝑘

𝑠𝑘+1 = 𝑔 𝑠𝑘 , 𝑢𝑘

𝑟𝑘+1 = 𝜌 𝑠𝑘 , 𝑢𝑘

State-action policy

State-transition function

Reward function

Page 15: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Satcom RL performance

15 P. V. R. Ferreira © 2016

Page 16: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

Fixed exploration probability (𝜀 = 0.5)

16 P. V. R. Ferreira © 2016

Page 17: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

Variable exploration probability 𝜀

17 P. V. R. Ferreira © 2016

Page 18: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

Time spent at performance levels

P. V. R. Ferreira © 201618

Mission 1: Launch/re-entry Mission 2: Multimedia

Page 19: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

Time spent at performance levels

P. V. R. Ferreira © 201619

Mission 3: Power saving Mission 4: Normal

Page 20: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Worcester Polytechnic Institute

Time spent at performance levels

P. V. R. Ferreira © 201620

Mission 5: Cooperation

Page 21: Multi-Objective Reinforcement Learning for Cognitive Radio ...

THANK YOU!

WIWireless

Applied Research Solutions for a Connected Society

LaboratoryInnovation

Page 22: Multi-Objective Reinforcement Learning for Cognitive Radio ...

Contact us

http://www.wireless.wpi.edu/

[email protected]@wpi.edu