Reinforcement Learning in Telescope Scheduling
Transcript of Reinforcement Learning in Telescope Scheduling
Reinforcement Learning in Telescope Scheduling
Nicholas Ermolov, Anne Foley
Observing the Night Sky
● Logical approach:
○ Obtain high-quality images/data
○ Observe multiple objects
○ Applies even in recreational observations
● Often requires scheduling for efficiency and optimization
“Messier Marathon”
From NASA’s Hubble Telescope
Robotic Telescopes
● Sloan Digital Sky Survey (SDSS)
○ Over 300 million objects
○ Manual planning
● Dark Energy Survey (DES)
○ 16,000 pointings, image every 2 minutes
○ Hand-tuned computerized simulations
SDSS telescope (right) and sky coverage (above)
Dark Energy Camera (right) and DES ‘footprint’ (above)
![Page 5: Reinforcement Learning in Telescope Scheduling](https://reader031.fdocuments.net/reader031/viewer/2022013012/61cf8f6f63298629f320bfb5/html5/thumbnails/5.jpg)
Sky Surveying vs. Classic Games
● Surveying the sky: like navigating a 2D plane
● Can be compared to classic video games
● How have computers learned to play games?
○ Reinforcement learning and neural networks
Neural net using reinforcement learning to play Doom
Our Project
● Understand the principles of reinforcement learning
● Apply the concepts of reinforcement learning to schedule optimization
● Represent telescope-sky interactions in the context of machine learning
● Implement software that can learn optimal policies on modeled data
Reinforcement Learning
Reinforcement Learning (RL) Concepts
● Agent interacts with its environment and “learns”
● Rewards fed back from the environment adjust the agent’s policy
● Objective: maximize cumulative reward
● Example: Chess
Diagram: the agent’s policy selects actions; the environment returns states and rewards (trajectory S1, A1, S2, A2, S3, A3).
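The agent-environment loop above can be sketched in a few lines. The toy environment, its rewards, and the fixed policy here are hypothetical stand-ins, not our telescope simulator.

```python
class ToyEnvironment:
    """Four states (0-3); reaching state 3 ends the episode."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action: +1 (advance) or -1 (go back)
        self.state = max(0, self.state + action)
        reward = 1 if action == 1 else -1   # reward fed back to the agent
        done = self.state >= 3              # terminal state reached
        return self.state, reward, done

def policy(state):
    """A fixed (not yet learned) policy: always advance."""
    return 1

env = ToyEnvironment()
total_reward, done = 0, False
while not done:
    action = policy(env.state)              # agent acts...
    state, reward, done = env.step(action)  # ...environment responds
    total_reward += reward                  # cumulative reward to maximize

print(total_reward)  # 3
```

A learning agent would replace the fixed `policy` with one that changes in response to the rewards it collects.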
Applications of RL
Why RL is Useful for our Project
● RL is powerful for learning Markov Decision Processes
○ Learning the best actions between different states
○ Works for stochastic or deterministic processes
● Our environment can be modeled as an episodic MDP
Bellman Equations
State value representation:

$V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s\right]$

State-action pair value:

$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s, a_t = a\right]$

Optimal policy:

$\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)$

Using a statistical representation to navigate an MDP
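As an illustration of the Bellman optimality idea, here is a sketch of value iteration on a hypothetical 4-state chain MDP (not our telescope model): repeatedly apply V(s) ← max_a [r + γ·V(s′)] until the values settle.

```python
gamma = 0.9
states = [0, 1, 2, 3]        # state 3 is terminal
actions = [-1, +1]           # step left or right along the chain

def step(s, a):
    """Deterministic transition; reward 1 only for reaching the terminal state."""
    s2 = min(3, max(0, s + a))
    r = 1.0 if s2 == 3 else 0.0
    return s2, r

V = {s: 0.0 for s in states}
for _ in range(50):          # sweep: V(s) <- max_a [r + gamma * V(s')]
    for s in states[:-1]:    # the terminal state keeps value 0
        V[s] = max(r + gamma * V[s2]
                   for s2, r in (step(s, a) for a in actions))

# Optimal policy: the action that maximizes the one-step lookahead
pi = {s: max(actions, key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
      for s in states[:-1]}

print(pi)  # every state steps right: {0: 1, 1: 1, 2: 1}
```

The values decay geometrically with distance from the goal (V[2] = 1, V[1] = 0.9, V[0] = 0.81), which is exactly the discounting the equations above describe.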
From Equations to Code
● Want to estimate the values of states
○ Used as a baseline for moving forward in our code
● Use ‘advantage’ to estimate relative values of actions
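The ‘advantage’ idea can be sketched as follows; the episode rewards are made up, and the mean return stands in for a learned state-value baseline V(s):

```python
gamma = 0.99

def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over an episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return list(reversed(out))

rewards = [1.0, 0.0, 2.0]               # one short (hypothetical) episode
returns = discounted_returns(rewards, gamma)

baseline = sum(returns) / len(returns)  # crude stand-in for a learned V(s)
advantages = [G - baseline for G in returns]

# Positive advantage: the action did better than the baseline expected
print(advantages[0] > 0)  # True
```

Subtracting the baseline leaves the *relative* value of each action, which reduces variance when the policy is updated.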
Neural Networks
Neural Networks (NN)
● Common approach to machine learning
● Input, hidden, and output layers
○ Path from one layer to the next consists of weights
● A series of linear transformations and nonlinear activations
● Variables within the network can be adjusted
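A forward pass through such a network can be sketched with NumPy. The layer sizes, random weights, and ReLU activation here are illustrative choices, not our actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # nonlinear activation applied after each linear transformation
    return np.maximum(0.0, x)

# input layer (4 features) -> hidden layer (8 units) -> output layer (3 units)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

x = rng.normal(size=(1, 4))   # one input example
h = relu(x @ W1 + b1)         # hidden activations (weights, then nonlinearity)
y = h @ W2 + b2               # network output

print(y.shape)  # (1, 3)
```

Training adjusts the variables `W1, b1, W2, b2`; the wiring of the forward pass stays fixed.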
Loss and Policy Gradients
● Use gradient descent to adjust the
neural network (via the chain rule)
● Goal is to minimize a defined loss
○ Along ‘dimensions’ that
represent weights and biases
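Gradient descent via the chain rule can be sketched on a one-parameter ‘network’; the loss, input, and learning rate are illustrative:

```python
# Minimize the squared loss L(w) = (w*x - t)**2 by stepping against
# dL/dw, which the chain rule gives as 2*(w*x - t)*x.

x, t = 2.0, 6.0      # input and target, so the optimum is w = 3
w = 0.0              # initial weight
lr = 0.05            # learning rate

for _ in range(200):
    error = w * x - t        # forward pass
    grad = 2.0 * error * x   # backward pass via the chain rule
    w -= lr * grad           # descend the loss surface

print(round(w, 4))  # 3.0
```

A real network does exactly this along thousands of ‘dimensions’ at once, one per weight and bias.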
Building NNs Using TensorFlow
● TensorFlow library
○ Creates nodes and layers
○ Calculates loss gradients
● Tensorboard for visualization
Building our network using TF
Reinforcement Learning w/ Neural Networks
● Neural network becomes part of the agent
○ Input is the state, outputs an action
○ Reward adjusts the weights (policy)
○ Eventually becomes fine-tuned for optimization
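This loop can be sketched with a REINFORCE-style policy gradient on a toy one-state, two-action problem. The single-layer ‘network’ (two logits) and the reward scheme are hypothetical stand-ins for our scheduler:

```python
import math, random

random.seed(0)
logits = [0.0, 0.0]   # the policy network's adjustable weights
lr = 0.1

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

for _ in range(500):
    probs = softmax(logits)
    a = 0 if random.random() < probs[0] else 1   # network outputs an action
    reward = 1.0 if a == 1 else 0.0              # environment's feedback
    # gradient of log pi(a) w.r.t. logit i: (1 if i == a else 0) - probs[i]
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr * reward * grad          # reward adjusts the weights

print(softmax(logits)[1] > 0.9)  # True: the policy now prefers action 1
```

The reward scales the update, so actions that pay off get reinforced, which is the fine-tuning described above.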
Challenges of Reinforcement Learning
● Difficult to define a loss function
○ Policy gradient adjustment method
○ Actor-critic method
● ‘Exploitation’ versus ‘Exploration’
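The exploitation/exploration trade-off is often handled with epsilon-greedy action selection, sketched below with illustrative value estimates:

```python
import random

random.seed(1)

def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # explore: random action
    # exploit: the action with the highest estimated value
    return max(range(len(q_values)), key=q_values.__getitem__)

q = [0.2, 0.9, 0.1]   # current value estimates for three actions
choices = [epsilon_greedy(q, epsilon=0.1) for _ in range(1000)]

# Most picks are the greedy action 1, but not all of them
print(choices.count(1) / 1000 > 0.85)  # True
```

Without the occasional random pick, the agent could lock onto an early, mediocre strategy and never discover a better one.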
Tensorboard plots (data plotted once every 10 episodes): total reward per episode, PG loss scalar, value loss scalar, and value loss + PG loss, each versus episode / 10.
Observation Planning using Reinforcement Learning
How to Represent the States and Actions
● Option 1: agent takes the current time as input
○ Neural network learns the environment
○ Must be trained on every new environment
● Option 2: agent takes the environment as the input
○ Outputs the entire schedule
○ Network is reusable for different environments
Summer Triangle
Example reward values assigned to targets: reward = 1, reward = 2, reward = 3, reward = -3.
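A minimal sketch of this reward setup, with hypothetical target names, visibility windows, and a greedy one-object-per-timestep scheduler:

```python
# Each observable target carries a reward; a schedule's score is the sum
# of rewards collected. Names and visibility windows are made up.
rewards = {"Vega": 3, "Deneb": 2, "Altair": 1, "below_horizon": -3}

# Which targets are (hypothetically) visible at each hourly timestep
visible = {
    0: ["Altair", "Deneb"],
    1: ["Vega", "Altair"],
    2: ["Vega", "Deneb"],
}

# Greedy schedule: at each timestep, point at the highest-reward target
schedule = {t: max(objs, key=rewards.get) for t, objs in visible.items()}
total = sum(rewards[obj] for obj in schedule.values())

print(schedule)  # {0: 'Deneb', 1: 'Vega', 2: 'Vega'}
print(total)     # 8
```

The RL agent has to *learn* this lookup from rewards alone rather than being handed the table, which is what makes the approach generalize.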
Tensorboard for Summer Triangle
Tensorboard plots for the Summer Triangle run: total reward per episode versus episode number / 10, plus the network’s weight and bias values.
Neural Network and Training Parameters
● Cannot just ‘put together’ a neural network
● Need to adjust hyper-parameters
○ Setting up the neural network
○ Training the network
● Evaluate reward and loss
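Hyper-parameter adjustment can be sketched as a small sweep; the quadratic loss and the candidate learning rates here are illustrative, not tuned recommendations:

```python
def final_loss(lr, steps=100):
    """Run gradient descent on L(w) = (w - 3)**2 and report the final loss."""
    w = 0.0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)   # gradient step
    return (w - 3.0) ** 2

candidates = [1.5, 0.5, 0.1, 0.001]   # hyper-parameter grid to evaluate
best_lr = min(candidates, key=final_loss)

print(best_lr)  # 0.5
```

Too large a rate (1.5) diverges and too small a rate (0.001) barely moves, which is why the reward and loss curves have to be evaluated for each setting rather than guessed.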
A Less Stringent Approach
● Method used until now
○ Using the timestep (i.e. hour of night) as input state
○ Choosing one object per timestep
○ Jumping from object to object
● Robotic telescopes
○ Capture entire images
○ Impractical to frequently redirect
Image from the Dark Energy Camera
Closer to an Atari Game
● Map celestial sphere onto 2D plane
● Reward function: define optimal and poor observation conditions
○ Distance from horizon, number of objects in frame
● Advantage: can account for more factors
● Disadvantage: less deterministic, more processing power
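Such a reward function can be sketched as below; the weighting between altitude and object count, and the object positions, are hypothetical:

```python
# Score a telescope pointing by the number of catalog objects in the
# frame plus a bonus for distance above the horizon.
objects = [(10.0, 40.0), (12.0, 42.0), (200.0, 5.0)]  # (azimuth, altitude)
FOV = 5.0                                             # half-width of frame, degrees

def reward(az, alt):
    if alt <= 0.0:
        return -1.0                                   # pointing below the horizon
    in_frame = sum(1 for oaz, oalt in objects
                   if abs(oaz - az) <= FOV and abs(oalt - alt) <= FOV)
    return in_frame + 0.01 * alt                      # more objects, higher up

# Pointing near the pair of high objects beats pointing at the lone low one
print(reward(11.0, 41.0) > reward(200.0, 5.0))  # True
```

Extra factors (weather, filters, slew cost) would simply become additional terms in `reward`, at the price of a larger, less deterministic search space.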
Next Steps using Reinforcement Learning
Accounting for More Factors
● Add more complexity to environment simulations
○ Weather
○ Multiple observations of each object
○ Apply correct wavelength filters
Crab Nebula
LSST and More
● Survey strategy is mostly fixed
○ Multiple objectives and classes of objects
○ Arguably more complex than SDSS and DES
○ Combination of multiple projects
● However, open for white papers on ‘mini surveys’
○ 10-20% of survey
Key Takeaways
● Machine learning, neural networks, and reinforcement learning are simply iterative
algorithms that utilize linear algebra and statistics to achieve certain goals
● Reinforcement learning is a powerful tool for finding an optimal policy to a Markov
Decision Process
● Neural networks can provide the foundation for reinforcement learning algorithms
● Reinforcement learning is a fitting approach for optimizing telescope survey
strategies
Our Code and More Resources
Our code on GitHub
Neural Networks and Deep Learning, by Michael Nielsen
TensorFlow Tutorials, from the TensorFlow website
Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto
Gym by OpenAI, toolkit for reinforcement learning
Special Thanks
Jim Annis: SCD Scientist
Tom Carter: COD, QuarkNet Sponsor
George Dzuricsko: QuarkNet Teacher
Joshua Einstein-Curtis: AD Engineer
Angela Fava: ND Scientist, QuarkNet Mentor
Eric Neilsen: SCD Scientist
Brian Nord: SCD Scientist, Project Mentor
Gabe Perdue: SCD Scientist