Using Bayesian Neural Networks for UQ of Hyperspectral ... ·...

Sandia National Laboratories is a multimis-sion laboratory managed and operated byNational Technology and Engineering So-lutions of Sandia, LLC., a wholly owned

subsidiary of Honeywell International, Inc.,for the U.S. Department of Energy’s National

Nuclear Security Administration under contractDE-NA-0003525. SAND NO. 2019-3409 C

Using Bayesian Neural Networks for UQof Hyperspectral Image Target Detection

AUTHORS

Daniel Ries, Braden Smith,Josh Zollweg,Lauren Hund

Outline

1. What is a neural network?2. What is a Bayesian neural network (BNN)?3. Estimating BNNs with variational inference4. Hyperspectral images and Megascene5. BNN on Megascene6. Concluding remarks

What is a neural network?

Logistic Regression as a Neural Network

Suppose Y1, ...,Yn are binary random variables, and x1, ...,xn arep-length vectors

If want to estimate P(Yi = 1|xi), we could use logistic regression:

Yiiid∼ Bernoulli(πi)

log(

πi1− πi

)= x′

iβ

Logistic Regression as a Neural Network

𝑥2

.

.

.

𝑥1

𝑥𝑝

𝑃 𝑌 = 1 𝒙𝑌

𝑖=1

𝑝

𝑥𝑖𝑤𝑖 + 𝑏

= 𝑧

𝑔(𝜋)𝑓 𝑧 = 𝜋

= 𝜋

A Simple Neural Network

𝑥2

.

.

.

𝑥1

𝑥𝑝


𝑖=1

𝑝

𝑥𝑖𝑤1𝑖 + 𝑏1

= 𝑧11

𝑖=1

𝑝


= 𝑧12

𝑖=1

2

𝑣1𝑖𝑤3𝑖 + 𝑏3

= 𝑧21

𝑓2 𝑧21 = 𝜋 𝑔(𝜋)

𝑓1 𝑧11 = 𝑣11

𝑓1 𝑧12 = 𝑣12= 𝜋

More Complicated Neural Network

𝑥2

.

.

.

𝑥1

𝑥𝑝

𝑃(𝑌 = 1|𝒙) 𝑌

𝑖=1

𝑝


= 𝑧11

𝑖=1

𝑝


= 𝑧12

𝑖=1

𝑝


= 𝑧13

𝑖=1

𝑝

𝑥𝑖𝑤1𝐾1𝑖 + 𝑏1𝐾1

= 𝑧1𝐾1

𝑖=1

𝐾1

𝑣1𝑖𝑤21𝑖 + 𝑏21

= 𝑧21

𝑖=1

𝐾1

𝑣1𝑖𝑤22𝑖 + 𝑏22

= 𝑧22

𝑖=1

𝐾1

𝑣1𝑖𝑤23𝑖 + 𝑏23

= 𝑧23

𝑖=1

𝐾1

𝑣1𝑖𝑤2𝐾2𝑖 + 𝑏2𝐾2

= 𝑧2𝐾2

𝑖=1

𝐾2

𝑣𝐿𝑖𝑤𝐿𝑖 + 𝑏𝐿

= 𝑧𝐿

.

.

.

.

.

.

𝑓 𝑧𝐿

𝑓1 𝑧1𝑖= 𝑣1𝑖

…

𝑓𝐿 𝑧𝐿𝑖= 𝑣𝐿𝑖

= 𝜋

𝑔(𝜋)

What is a Bayesian Neural Network?

For those of you who “do machine learning, not statistics”:BNNs are simply NNs that have an implied loss function (youdon’t have to choose and you get UQ for free!)

For those of you who “do statistics, not machine learning”:BNNs are simply a highly nonlinear model where you put a prioron your parameters (and a Bernoulli likelihood for our binarydata)

For those of you who “do both”:You’re not fooling anyone, I know you associate with one or theother.

For those of you who “do neither”:You are already “resting your eyes.”

An Example BNN

Yi ∼ Bernoulli(πi)

πi = f2(z21)

= f2

2∑j=1

f1

( p∑k=1

xkwjk + bj

)w2j + b3

wjk

iid∼ N (0, 1), ∀j, k

bliid∼ N (0, 1), l = 1, 2, 3

𝑥2

.

.

.

𝑥1

𝑥𝑝


𝑖=1

𝑝


= 𝑧11

𝑖=1

𝑝


= 𝑧12

𝑖=1

2

𝑣1𝑖𝑤3𝑖 + 𝑏3

= 𝑧21

𝑓2 𝑧21 = 𝜋 𝑔(𝜋)

𝑓1 𝑧11 = 𝑣11

𝑓1 𝑧12 = 𝑣12= 𝜋

Estimating (or Training) BNNs

Bayesians love to sample their posterior distributions via MCMC

But this can be slowwww

Enter Variational Inference:• Instead approximate posterior with q(θ) by solving:

argminq∗

KL(q∗(θ)||p(w,b|y))

= argminq∗

KL(q∗(θ)||p(w,b, y))

• Put restrictions on set of q∗

• Result is a loss function we can do gradient descent as usualon


Bayesians love to sample their posterior distributions via MCMCBut this can be slowwww


argminq∗


= argminq∗






Enter Variational Inference:

• Instead approximate posterior with q(θ) by solving:

argminq∗


= argminq∗







argminq∗


= argminq∗




Hyperspectral Images (HSI)

• Airborne spectrometers construct a (x, y, z) tensor• x and y describe the spatial dimension• z describes the spectra at a single pixel (x, y)

• HSI Target detection is where specific materials are identifiedby their reflectance

• The target object might be smaller than the projection of apixel on the ground

Megascene

• Simulated HSI scene using DIRSIG model• Targets inserted using paint reflectanceand bi-directional reflectancedistribution function data from NEFDS

• 3 environments, and 3 times of day for 9total scenes

• 125 targets placed randomly throughouteach scene

• Targets range from large to subpixel(radius of 0.1m to 4m), size distributednearly uniform

• Some targets only partially visible

BNN on Megascene

• Architecture: Input-10-5-2-Output• Activations: Hyperbolic tan between hidden layers, inverselogit for output

• Priors: Standard normal on all weights and biases• Optimizer: Stochastic gradient descent with momentum• Training Data: Left half of mid latitude summer environmentat 1200

• Test Data: Right half of all environments at all times• Software: Edward and Tensorflow

BNN on Megascene

• Architecture: Input-10-5-2-Output

• Activations: Hyperbolic tan between hidden layers, inverselogit for output



BNN on Megascene




BNN on Megascene


• Priors: Standard normal on all weights and biases

• Optimizer: Stochastic gradient descent with momentum• Training Data: Left half of mid latitude summer environmentat 1200


BNN on Megascene


• Priors: Standard normal on all weights and biases• Optimizer: Stochastic gradient descent with momentum

• Training Data: Left half of mid latitude summer environmentat 1200


BNN on Megascene




BNN on Megascene



• Test Data: Right half of all environments at all times

• Software: Edward and Tensorflow

BNN on Megascene




Test Set Performance

Full test set. High confidence set.

High confidence set: 80% credible interval forπ ∈ (0, 0.2) ∪ (0.8, 1)

Moving Forward

There is still a lot of work to be done on BNNs

• Explore deeper and wider, more complicated (convolutional)architectures

• Full tuning of hyperparameters• Explore effects of prior• Assess coverage of credible intervals

Thank you for listening!

[email protected]


BNN

Stan-dardNN

Full test set High confidence set

Abundance vs Probability

Using Bayesian Neural Networks for UQ of Hyperspectral ... ·...

Documents

Transcript of Using Bayesian Neural Networks for UQ of Hyperspectral ... ·...