Introduction to Machine Learning 2 - CERN
![Page 1: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/1.jpg)
Nov., 2018
D. Ratner,
SLAC National Accelerator Laboratory
Introduction to
Machine Learning 2
![Page 2: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/2.jpg)
Unsupervised learning
Supervised learning: X, y
Unsupervised learning: X
What can be accomplished without labels?
1. Clustering (classification)
2. Decomposition (e.g. “cocktail party problem”, species identification)
3. Anomaly/breakout detection (e.g. fault detection/prediction)
4. Generation (e.g. creating new examples within a class)
[Figure labels: Cat, Dog]
![Page 3: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/3.jpg)
Unsupervised learning
http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html
Clustering: Divide x into k categories
K-means
K-means algorithm:
a. Pick ‘k’ random centroids
b. Loop until convergence {
   1. Assign examples to nearest centroid
   2. Update centroids to mean of clusters
}
See also: Hierarchical clustering, DBSCAN, etc.
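The K-means loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the linked visualization: the function name `kmeans`, the iteration cap, the seeds, and the two synthetic blobs are all assumptions made for the example.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means on a list of equal-length tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # a. pick k random examples as initial centroids
    labels = None
    for _ in range(iters):             # b. loop until convergence (or the iteration cap)
        # 1. assign each example to its nearest centroid (squared Euclidean distance)
        new_labels = [
            min(range(k), key=lambda j: sum((p - c) ** 2 for p, c in zip(x, centroids[j])))
            for x in points
        ]
        if new_labels == labels:       # assignments unchanged: converged
            break
        labels = new_labels
        # 2. move each centroid to the mean of its cluster
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:                # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical data: two well-separated 2D blobs of 50 points each
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
centroids, labels = kmeans(blob_a + blob_b, k=2)
```

The result depends on the random initial centroids; on harder data it is common to rerun from several seeds and keep the tightest clustering.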
![Page 4: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/4.jpg)
Unsupervised learning
Time series data: Anomaly/Breakout/Changepoint Detection
Anomaly detection: identify points that are statistical outliers from a distribution
Breakout/Changepoint detection: find point in time at which distribution changed
PyAstronomy: Generalized ESD (GESD) (available via pip install)
Student’s t-test
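One way to turn a t-test into a breakout detector is to scan every candidate split point and keep the one where the two sides differ most. The sketch below is an assumption about how that might look, not the method used on the slide: it uses Welch's unequal-variance form of the t statistic rather than the pooled Student's form, and the names `welch_t`, `find_breakout`, and the synthetic step signal are invented for the example.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))

def find_breakout(series, min_size=5):
    """Return (index, |t|) of the split that maximizes |t| between left and right segments."""
    best_idx, best_t = None, 0.0
    for i in range(min_size, len(series) - min_size + 1):
        t = abs(welch_t(series[:i], series[i:]))
        if t > best_t:
            best_idx, best_t = i, t
    return best_idx, best_t

rng = random.Random(0)
# Hypothetical signal: the mean steps from 0 to 3 at sample 60
signal = [rng.gauss(0, 1) for _ in range(60)] + [rng.gauss(3, 1) for _ in range(40)]
idx, t_stat = find_breakout(signal)
```

Because the maximum is taken over many splits, the resulting |t| should not be read against a standard t-table; a threshold would need to be calibrated on data without a breakout.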
![Page 5: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/5.jpg)
Unsupervised learning
Generating new data
Deep dreaming of dogs
If you train a network to
recognize dogs…
…it will hallucinate dogs
Style transfer
Gatys, et al.
Unsupervised learning with neural networks:
train a model to generate new examples based on training set
![Page 6: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/6.jpg)
Unsupervised learning
Generating new data
Generative Adversarial Network (GAN)
Training Set
Generator
Discriminator
Real
Fake
Noise
Cross entropy (log loss)
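As a rough illustration of how cross entropy (log loss) drives the two networks, the sketch below scores a discriminator on real and fake samples and scores a generator on how well it fools the discriminator. The function names and the example probabilities are hypothetical, and the generator loss shown is the common "non-saturating" variant, which is an assumption rather than something stated on the slide.

```python
import math

def log_loss(p, label):
    """Binary cross entropy for one predicted probability p and a 0/1 label."""
    eps = 1e-12  # guard against log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Discriminator wants D(real) -> 1 and D(fake) -> 0."""
    real_term = sum(log_loss(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(log_loss(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Generator wants the discriminator to label its samples as real."""
    return sum(log_loss(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real)
confident = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
fooled = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

When the discriminator can no longer tell real from fake, every output is 0.5 and its loss settles at 2 ln 2.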
![Page 7: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/7.jpg)
Reinforcement learning
Machine Learning for Games
AlphaGo
Third category: partial supervision
e.g. when playing a game, will not
have a known label for every position,
but will know who wins at the end
States: s
Actions: a
Transition probability: p
Rewards: r
https://en.wikipedia.org/wiki/Reinforcement_learning
Goal is to solve for a “policy”,
i.e. the optimal action a for a given state s
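When the transition probabilities p and rewards r are known, one textbook way to obtain a policy is value iteration. The sketch below applies it to a made-up four-state chain; the environment, the discount factor `gamma`, and all names are assumptions for illustration. Game-playing systems such as AlphaGo do not have such a small tabulated model and instead learn from play.

```python
# Hypothetical 4-state chain: states 0..3, state 3 is terminal (the "win")
states = [0, 1, 2, 3]
actions = ["left", "right"]
gamma = 0.9  # discount factor (assumed)

def transitions(s, a):
    """Return a list of (probability, next_state, reward): the model p and r."""
    if s == 3:
        return [(1.0, 3, 0.0)]               # terminal state: nothing more happens
    step = 1 if a == "right" else -1
    intended = min(max(s + step, 0), 3)
    reward = 1.0 if intended == 3 else 0.0   # reward only on reaching the end
    # the move succeeds with probability 0.8, otherwise the agent stays put
    return [(0.8, intended, reward), (0.2, s, 0.0)]

def q_value(s, a, V):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions(s, a))

V = {s: 0.0 for s in states}
for _ in range(200):                         # value iteration sweeps
    V = {s: max(q_value(s, a, V) for a in actions) for s in states}

# The policy picks, in each non-terminal state, the action with the highest value
policy = {s: max(actions, key=lambda a: q_value(s, a, V)) for s in states if s != 3}
```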
![Page 8: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/8.jpg)
Applications to Accelerators
But is it useful??
Look at examples for Free Electron Lasers:
1. Computer vision to process screens
2. Neural networks to solve inverse problems
3. GANs to augment data sets
4. Breakout detection for fault recovery
5. Bayesian optimization and RL for online tuning
6. Regularization and convex optimization to analyze data
![Page 9: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/9.jpg)
Supervised Learning:
Data Analysis – Pulse reconstruction
Computer vision
Aphex34 https://commons.wikimedia.org/w/index.php?curid=45679374
[Figure axes: e- energy (γ) vs. Time (fs)]
XTCAV
Longitudinal phase space of an electron beam for an XFEL
Stanford CS231n
![Page 10: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/10.jpg)
Supervised Learning:
Data Analysis – Pulse reconstruction
Computer vision
Stanford CS231n
D. Mendez
Aphex34 https://commons.wikimedia.org/w/index.php?curid=45679374
![Page 11: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/11.jpg)
Supervised Learning:
Data Analysis – Pulse reconstruction
Xinyu Ren
XTCAV Analysis
[Figure panels: Before lasing; After lasing (e- energy (γ) vs. time)]
[Figure: Power (W) vs. Time (fs), CNN prediction and XTCAV]
![Page 12: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/12.jpg)
Supervised Learning:
Data Analysis – Pulse reconstruction
Full beam reconstruction
Measure amplitude of power/spectrum: can I recover phase?
[Diagram: measure power and spectral power → guess field F(t), F(ω); repeated for Pulse #1 … Pulse #n (… 10^6-10^9 times)]
[Diagram: Input (measured power, measured spectral power) → Hidden layers → Output (guess field F(t), F(ω))]
Only solve once!
![Page 13: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/13.jpg)
Supervised Learning:
Data Analysis – Pulse reconstruction
Xiao Zhang
Full beam reconstruction
Measure amplitude of power/spectrum: can I recover phase?
[Diagram: Input (measured power, measured spectral power) → Hidden layers → Output (guess field F(t), F(ω))]
Only solve once!
[Figure: Amplitude and Phase vs. Time (fs), ground truth and prediction]
![Page 14: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/14.jpg)
Unsupervised learning
Surrogate Models – FEL simulations
Genesis:
~1000 cpu-sec
GAN (neural net):
~0.001 gpu-sec
Generative adversarial network (GAN)
X. Ren
LCLS users need 100k to 1M pulses to prepare for beamtime
At ~1000 cpu-sec per pulse, that is up to 1 billion cpu-seconds (~280k cpu-hours)!
Conditional GAN (CGAN):
provide knob to control parameters
Electron
energy
![Page 15: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/15.jpg)
Unsupervised Learning:
Anomaly/Breakout/Changepoint detection
1. Alarm handling (e.g. drifting temperatures)
2. Identification of anomalous conditions (e.g. shorted quadrupole magnet)
3. Machine configuration setup (e.g. global optimization)
LCLS Temperature monitor (timing system)
N. Norvell
Breakout Detection
![Page 16: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/16.jpg)
Unsupervised Learning:
Anomaly/Breakout/Changepoint detection
[Figure panels: FEL Pulse Energy; Cathode QE; zoom]
Cathode quantum efficiency drop caused hours of downtime.
D. Sanzone
Breakout Detection
Median statistics avoid anomalies
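A small example shows why median statistics are less disturbed by anomalies than the mean. The readings and the helper `rolling_median` below are hypothetical; this is only a sketch of the idea, not the detector used at LCLS.

```python
import statistics

# Hypothetical readings: a steady level near 10 with two spurious spikes
readings = [10.1, 9.9, 10.0, 55.0, 10.2, 9.8, 10.0, 80.0, 10.1, 9.9]

mean_level = statistics.mean(readings)      # pulled far from 10 by the two spikes
median_level = statistics.median(readings)  # stays close to 10

def rolling_median(series, window=5):
    """Median over a centered window; the window is truncated at the edges."""
    half = window // 2
    return [statistics.median(series[max(0, i - half): i + half + 1])
            for i in range(len(series))]

smoothed = rolling_median(readings)
```

A breakout test run on `smoothed` sees a level shift only when most of a window moves, so isolated spikes do not trigger it.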
![Page 17: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/17.jpg)
Supervised Learning:
Optimization – Online tuning
Online tuning:
• Twice daily, ~500 hours/year
• A single task, quadrupole tuning, required 1 hour/day
Ocelot
![Page 18: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/18.jpg)
Joe Duris, Mitch McIntire
Supervised Learning:
Optimization – Bayesian optimization
Model-based optimization
Gradient optimizer
Advantage 1: Balance “exploitation vs. exploration”
Find global maximum
![Page 19: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/19.jpg)
Supervised Learning:
Optimization – Bayesian optimization
Model-based optimization
Advantage 1: Balance “exploitation vs. exploration”
Acquisition point
Acquisition
function
Ground truth
Posterior
Joe Duris, Mitch McIntire
Find global maximum
![Page 20: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/20.jpg)
Supervised Learning:
Optimization – Bayesian optimization
Model-based optimization
Advantage 2: Model can incorporate physics, experience
Learn from data, simulations
[Figure: FEL vs quads, Measured and Modeled; axes: X-focus Quad, Y-focus Quad]
[Figure labels: Acquisition function; Ground truth; Posterior]
![Page 21: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/21.jpg)
Supervised Learning:
Optimization – Bayesian optimization
Gaussian process: instance-based learning method
Kernel (covariance):
observations
new point
prior mean
new point
to predict
taken from M. Ebner, GP for Regression
![Page 22: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/22.jpg)
Supervised Learning:
Optimization – Bayesian optimization
Gaussian process: instance-based learning method
observations
new point
prior mean
new point
to predict
Prediction of new point:
Variance of new point:
taken from M. Ebner, GP for Regression
Kernel (covariance):
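The equations on this slide did not survive extraction. For orientation, the standard textbook expressions for Gaussian process regression are given below; the notation (observations y at inputs X, new point x_*, prior mean m, noise variance σ_n², kernel amplitude σ_f and length scale ℓ) is assumed here and may differ from the slide's. The squared-exponential kernel is shown only as one common choice.

```latex
% Assumed notation: observations y at inputs X = (x_1, ..., x_n), new point x_*,
% prior mean m(x), kernel k(x, x'), and observation noise variance \sigma_n^2.
\begin{aligned}
K_{ij} &= k(x_i, x_j), \qquad (k_*)_i = k(x_i, x_*) \\
\mu(x_*) &= m(x_*) + k_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} \left(y - m(X)\right) \\
\sigma^2(x_*) &= k(x_*, x_*) - k_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} k_* \\
k(x, x') &= \sigma_f^2 \exp\!\left(-\frac{\lVert x - x' \rVert^2}{2\ell^2}\right)
\end{aligned}
```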
![Page 23: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/23.jpg)
Supervised Learning:
Optimization – Bayesian optimization
Gaussian process: instance-based learning method
observations
new point
prior mean
new point
to predict
Acquisition
function:
Kernel (covariance):
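The slide's acquisition function was lost in extraction. As a sketch of how an acquisition function steers the search, the example below uses an upper confidence bound (posterior mean plus two standard deviations), which is one common choice and an assumption here. It is paired with a zero-prior-mean Gaussian process with a squared-exponential kernel in plain Python. The one-dimensional `objective`, the grid, and all parameter values are hypothetical stand-ins for a real machine response.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel (assumed choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Zero-prior-mean GP: posterior mean and variance at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, max(var, 0.0)   # clamp tiny negative values from round-off

def objective(x):
    """Hypothetical machine response with a local and a global maximum."""
    return math.exp(-(x - 2.0) ** 2) + 0.5 * math.exp(-(x + 1.5) ** 2)

grid = [i * 0.1 - 4.0 for i in range(81)]   # candidate settings in [-4, 4]
xs = [-3.0, 3.5]                            # two initial measurements
ys = [objective(x) for x in xs]

for _ in range(15):
    def ucb(x):
        mean, var = gp_posterior(xs, ys, x)
        return mean + 2.0 * math.sqrt(var)  # exploitation term + exploration term
    x_next = max(grid, key=ucb)             # acquisition point
    xs.append(x_next)
    ys.append(objective(x_next))

best_x = xs[ys.index(max(ys))]
```

The weight on the standard deviation sets the balance: a larger value favors unexplored regions, a smaller one favors settings near the best measurement so far.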
![Page 24: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/24.jpg)
Supervised Learning:
Optimization – Bayesian optimization
Model knows history, makes educated guesses about where to explore
Example: tuning quadrupoles from noise
![Page 25: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/25.jpg)
X-ray energy
Experiment: Taper profile
Treat optimization like a game: FEL power is the score
Juhao Wu
AlphaGo
LCLS Experiment: 5.5 keV Self-seeding FEL,
Zig-zag doubles power from continuous profile
Reinforcement Learning:
Optimization – Online tuning
![Page 26: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/26.jpg)
Supervised Learning:
Statistical methods for data analysis
Ghost Imaging / Single Pixel Camera
Riddle: How can I take a picture with a spectrometer?
Answer: Have a friend with a flashlight
![Page 27: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/27.jpg)
Supervised Learning:
Statistical methods for data analysis
Target
reconstruction
target
B = [ 5037, 4783, 4891, 5940, … ]
Random patterns
Images, I
Ghost Imaging / Single Pixel Camera
B = A x
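The relation B = A x can be illustrated with a toy single-pixel camera: each row of A is one random illumination pattern, x is the unknown target, and B holds the total signal recorded for each pattern. The sketch below recovers x by correlating the recorded signal with each pattern pixel, which is the classic ghost-imaging estimate. The 4x4 target, the number of shots, and the binary patterns are assumptions made for the example.

```python
import random

rng = random.Random(0)
n_pixels, n_shots = 16, 2000

# Hypothetical 4x4 target (flattened): 1 = transmitting pixel, 0 = opaque
x_true = [0, 1, 1, 0,
          1, 0, 0, 1,
          1, 0, 0, 1,
          0, 1, 1, 0]

# Each row of A is one random pattern; B is the single-pixel signal for that pattern
A = [[rng.randint(0, 1) for _ in range(n_pixels)] for _ in range(n_shots)]
B = [sum(a * x for a, x in zip(row, x_true)) for row in A]

# Correlate signal fluctuations with each pattern pixel: x_j ~ cov(B, A_j) / var(A_j)
b_mean = sum(B) / n_shots
x_est = []
for j in range(n_pixels):
    col = [row[j] for row in A]
    a_mean = sum(col) / n_shots
    cov = sum((b - b_mean) * (a - a_mean) for b, a in zip(B, col)) / n_shots
    var = sum((a - a_mean) ** 2 for a in col) / n_shots
    x_est.append(cov / var)

# Threshold the estimate to get a binary image
x_binary = [1 if v > 0.5 else 0 for v in x_est]
```

This estimate needs many more patterns than pixels to average down the noise, which is the gap that compressive sensing aims to close.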
![Page 28: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/28.jpg)
Supervised Learning:
Statistical methods for data analysis
Compressive sensing + ghost imaging
Compressive ghost imaging
Compressive sensing concept:
Why record 660k points if only 39k needed?
Wikimedia
![Page 29: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/29.jpg)
Supervised Learning:
Statistical methods for data analysis
S. Li
200 images: pretty good!
Target
Transmission at
camera
Experimental Results
![Page 30: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/30.jpg)
LCLS drive laser
Don’t need DMD:
exploit natural variation,
jitter of drive laser
Supervised Learning:
Statistical methods for data analysis
Siqi Li
Example application: photocathode quantum efficiency
[Diagram labels: DMD; Cathode laser; Cathode target; DMD modulation]
[Figure panels: Ground truth; Reconstruction]
![Page 31: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/31.jpg)
Supervised Learning:
Statistical methods for data analysis
As system (CO) evolves after absorbing X-ray, electron energies change
A. Picon
[Diagram labels: Pump X-ray; Probe X-ray; Time; e-; Sample, CO]
[Figure axes: Electron energy (eV) vs. Delay (fsec)]
![Page 32: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/32.jpg)
Supervised Learning:
Statistical methods for data analysis
As system (CO) evolves after absorbing X-ray, electron energies change
Goal: can we recover R from M?
A. Picon
[Figure axes: Electron energy (eV) vs. Delay (fsec)]
![Page 33: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/33.jpg)
Supervised Learning:
Statistical methods for data analysis
Solving for the response, R(d,E):
Optimization formalism:
![Page 34: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/34.jpg)
L. Rokach
Conclusion
Some personal opinions on ML for accelerators
1. No data, no learning: #1 task is building data
2. Noise, outliers, dropped data will dominate performance: #2 task is cleaning
3. Deep learning is the dream… but time spent thinking about physics is well rewarded.
![Page 35: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/35.jpg)
Conclusion
Applications for XFELs (and other accelerators)
1. Online tuning: transverse matching, longitudinal phase space, X-ray spectrum, emittance minimization, etc.
2. Surrogate modeling: efficient machine design, user support, predictive control
3. Data analysis: X-ray pulse reconstructions, electron parameters, user experiments
4. Prognostics: fault prediction, fault recovery, identification of anomalous conditions
When is ML useful?
• Tasks that humans can do, but hard to describe…
• When data is abundant and well labeled
• When simple algorithms fail
• When the goal is worth the effort
ML should be your last resort!
![Page 36: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/36.jpg)
Conclusion
Thanks for your attention!
And thanks to the people who did the work
shown here:
E. Cropp, J. Duris, A. Edelen, K. Kabra, D.
Kennedy, T. J. Lane, S. Li, T. Maxwell, P.
Musumeci, X. Ren, J. Wu, X. Zhang
![Page 37: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/37.jpg)
Data Analysis:
Statistical methods for data analysis
Siqi Li
Experimental Results
[Figure panels: Ground truth; ADMM reconstruction; Target; Transmission at camera]
[Diagram labels: DMD; Cathode laser; Cathode target]
![Page 38: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/38.jpg)
J. Qiang
measurement simulation
Surrogate Models: Accelerator models
High fidelity physics simulations are remarkable:
…but also slow. (e.g. hours on NERSC)
How can we best support design of a new machine?
LCLS-II simulations, Y. Ding
LCLS microbunching
instability
![Page 39: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/39.jpg)
Surrogate Models: Accelerator models
Emma, Edelen, et al., in preparation
• Predict XTCAV image from other diagnostic output or upstream
machine settings to create a non-destructive virtual diagnostic
• Simulation + neural network results match well for FACET-II (see left)
• Small study with LCLS machine data and XTCAV images (scan of
L1S phase and BC2 peak current at 13.4 GeV)
![Page 40: Introduction to Machine Learning 2 - CERN](https://reader036.fdocuments.net/reader036/viewer/2022062309/62a76a2c48b63038b0627e43/html5/thumbnails/40.jpg)
TDGI Example
Signal determined by probability of two photons separated by time d
[Figure: Power (arb. units) vs. Time (fs)]
[Figure: Autocorrelation (arb. units) vs. Delay (fs), with delays d1, d2, d3 marked]