
German Research Center for Artificial Intelligence (DFKI)

ALL RIGHTS RESERVED. No part of this work may be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without express written permission from the authors.

Understanding Deep Networks through Properties of the Input Space

GTC 2019

By: Sebastian Palacio


Deep Neural Networks Work. DUH!


...yet they can be easily tricked


Safeguarding becomes a "thing"

● Filter
● Harden
● Flag

Cat and Mouse Chase

How do Attacks Work?

[Diagram: input → features → features → features → output]

● Modify the Network
● Modify the Input

How do Attacks Work?

1. Pass the input through the network: f(x)
2. Compute the sensitivity: f'(x)
3. Modify the input according to the sensitivity.
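As a concrete illustration of these three steps, here is a minimal FGSM-style sketch in PyTorch. It is a hedged example, not the exact attack used in the talk; `model`, `x` (images scaled to [0, 1]) and `y` (integer labels) are assumed to exist.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """One-step gradient attack following the recipe above."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)        # 1. pass the input through the network: f(x)
    grad = torch.autograd.grad(loss, x)[0]     # 2. compute the sensitivity: f'(x)
    x_adv = x + eps * grad.sign()              # 3. modify the input along the sensitivity
    return x_adv.clamp(0, 1).detach()
```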

Gradients are good estimators of the input space's distribution.

How do Attacks Work?

[Figure: input image, its gradient, and the resulting perturbation]

1. Reconstruction gradients
2. Classification gradients
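A hedged sketch of how these two kinds of input gradients can be obtained; `autoencoder` and `classifier` are assumed to be pretrained PyTorch modules, and the reconstruction loss is plain MSE for illustration only.

```python
import torch
import torch.nn.functional as F

def reconstruction_gradient(autoencoder, x):
    """Input gradient of a reconstruction loss (structure-oriented)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.mse_loss(autoencoder(x), x.detach())
    return torch.autograd.grad(loss, x)[0]

def classification_gradient(classifier, x, y):
    """Input gradient of a classification loss (semantics-oriented)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(x), y)
    return torch.autograd.grad(loss, x)[0]
```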

Idea against attacks: "Give me Gradients!"

● Reconstruction Gradients
● Classification Gradients ← AVOID THIS

Hypothesis: bigger problems are better

● Reconstruction Gradients
● Classification Gradients

[Gradient visualizations on MNIST vs. ImageNet]

...so we tried: a SegNet autoencoder trained on YFCC100M (69× the size of ImageNet), paired with an ImageNet classifier.

Perceptually similar!

How to Compare:

[Plot: Model Accuracy vs. Noise Level for ResNet-50 and SegNet gradients]
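One possible way to run that comparison: add a gradient-derived perturbation of increasing strength and track the classifier's accuracy. This is a hedged sketch; `gradient_fn` is one of the hypothetical helpers above and `loader` is assumed to yield 4D image batches with labels.

```python
import torch

def accuracy_under_perturbation(model, loader, gradient_fn, eps):
    """Accuracy after perturbing each input along its (unit-norm) gradient direction."""
    correct, total = 0, 0
    for x, y in loader:
        g = gradient_fn(x)                                             # reconstruction or classification gradient
        g = g / (g.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)   # normalize per image
        with torch.no_grad():
            pred = model((x + eps * g).clamp(0, 1)).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```

Sweeping `eps` over a range of values gives one accuracy curve per gradient type, which is how the plot above can be read.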

Targeted vs. Untargeted Attacks

[Diagram: class scores (0.3, 0.5, 0.2) and the change Δy needed to flip the prediction]

● Untargeted: push the true class down until any other wins.
● Targeted: push a randomly selected target up until it wins.

Quick, pick one at random!

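In code, the two variants only differ in which class the gradient step favors. A minimal, hedged single-step sketch (the talk's experiments use iterative attacks such as PGD); `target` is a tensor of target labels, or None for the untargeted case.

```python
import torch
import torch.nn.functional as F

def attack_step(model, x, y_true, eps=0.03, target=None):
    """Untargeted: push the true class down. Targeted: push the chosen class up."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    if target is None:
        loss = F.cross_entropy(logits, y_true)   # increase the true-class loss
        sign = 1.0
    else:
        loss = F.cross_entropy(logits, target)   # decrease the target-class loss
        sign = -1.0
    grad = torch.autograd.grad(loss, x)[0]
    return (x + sign * eps * grad.sign()).clamp(0, 1).detach()
```

Sampling `target` uniformly over the wrong classes matches the "pick one at random" strategy above.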

HYPOTHESIS

[Diagram: adversarial vs. non-adversarial inputs, their input gradients, and the corresponding perturbations]

Adversaries fighting an attack-agnostic Autoencoder on ImageNet

[Plot legend:]
● Baseline (no attack)
● Classifier only (no defense)
● Classifier with Autoencoder
● ALP for targeted PGD (Kannan et al. 2018)
● ALP for untargeted PGD (Engstrom et al. 2018)

Attacks evaluated:
● Simple attack
● Loop with clipping
● Amount of noise
● Same but in a loop
● Fancy optimization


No AE: 74.02 · With AE: 71.19

Structural Gradients Obstruct Gradient-Based Attacks*


Reconstruction Gradients

Classification Gradients

*as long as structure is not tightly related to semantics

A closer look at adversarial noise (MNIST)

● Expectation: structural change
● Reality: non-structural changes

Uninformative dimensions!

Effects of extra dimensions

[Diagram: a classification problem in 1D vs. the same data embedded in 2D, and the perturbation Δx needed to change the prediction]

From 2D to 3D:

● Semantic information lives on the x- and y-axes
● The z-axis is uninformative

Decision Boundaries


Expected Boundary
● The z-axis does not interfere
● Perturbations need to go in the direction of the training samples

Vulnerable Boundary
● Small perturbations along the "extra" dimension change the predicted class!

Vulnerable Boundary
● The class boundary extends over the domain of other classes
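A tiny NumPy construction of this failure mode (hypothetical toy data, not from the talk): the labels depend only on x and y, all training points lie on the z = 0 plane, so any weight on z fits the data equally well, and a boundary with a large z-weight is flipped by a small off-manifold step.

```python
import numpy as np

rng = np.random.default_rng(0)
X2d = rng.normal(size=(200, 2))
y = (X2d.sum(axis=1) > 0).astype(int)              # labels depend only on the x- and y-axes
X3d = np.hstack([X2d, np.zeros((200, 1))])         # embed in 3D: the z-axis is uninformative (always 0)

w = np.array([1.0, 1.0, 5.0])                      # linear boundary; w_z is unconstrained by the data
print("train accuracy:", np.mean((X3d @ w > 0) == y))            # 1.0: the large w_z costs nothing

x_clean = np.array([0.5, 0.5, 0.0])                # comfortably inside class 1 on the data manifold
x_adv = x_clean + np.array([0.0, 0.0, -0.25])      # small step along the "extra" dimension
print("clean:", int(x_clean @ w > 0), "adversarial:", int(x_adv @ w > 0))   # 1 -> 0: the class flips
```

Flipping the same point while staying on the z = 0 plane would require a step several times larger, since it would have to cross the boundary in the direction of the training samples.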

Extrapolating... 1D → 2D → 3D → ... → 784D → ...

Preserve only the information that is useful for classification.

● Step 1: train a classifier (ImageNet)
● Step 2: train an autoencoder (YFCC100M)
● Step 3: fine-tune the decoder with gradients from the classifier (sketched below)

Palacio, Sebastian et al. "What do Deep Networks Like to See?" CVPR (2018)
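A hedged sketch of those three steps in PyTorch. Names such as `autoencoder.encoder`, the data loaders, and the optimizers are assumptions made for illustration; the actual setup (ResNet-50 classifier, SegNet-style autoencoder) is described in the cited paper.

```python
import torch.nn.functional as F

def train_s2snet(classifier, autoencoder, imagenet_loader, yfcc_loader,
                 clf_opt, ae_opt, dec_opt):
    # Step 1: train the classifier on ImageNet (standard supervised training).
    for x, y in imagenet_loader:
        clf_opt.zero_grad()
        F.cross_entropy(classifier(x), y).backward()
        clf_opt.step()

    # Step 2: train the autoencoder on YFCC100M (reconstruction only, no labels needed).
    for x in yfcc_loader:
        ae_opt.zero_grad()
        F.mse_loss(autoencoder(x), x).backward()
        ae_opt.step()

    # Step 3: freeze the classifier and the encoder, then fine-tune only the decoder
    # with gradients coming back from the classification loss.
    for p in classifier.parameters():
        p.requires_grad_(False)
    for p in autoencoder.encoder.parameters():       # assumes an `encoder` submodule
        p.requires_grad_(False)
    for x, y in imagenet_loader:                     # dec_opt is assumed to hold only decoder params
        dec_opt.zero_grad()
        F.cross_entropy(classifier(autoencoder(x)), y).backward()
        dec_opt.step()
```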

Accuracy on ResNet-50:
● No AE: 74.02
● With AE: 71.19 (-2.83pp)
● With Fine-tuned AE: 74.94 (+0.92pp)

Palacio, Sebastian et al. "What do Deep Networks Like to See?" CVPR (2018)

Looking up Reconstructions

[Figure: original images and their ResNet-50 reconstructions]

Experiments with S2SNets (on ImageNet)

[Plot legend:]
● Baseline (no attack)
● Classifier only (no defense)
● Classifier with Autoencoder
● Classifier with S2SNet

● Consistent offset (projection of unnecessary input signal)
● Not tied to any specific adversarial attack
● Zero compromise for clean images (no attack): 74.94 with S2SNet
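At inference time, the defense is simply the autoencoder chained in front of the classifier. A minimal wrapper sketch, assuming the modules trained above:

```python
import torch.nn as nn

class S2SDefense(nn.Module):
    """Reconstruct the input first, then classify the reconstruction."""
    def __init__(self, autoencoder, classifier):
        super().__init__()
        self.autoencoder = autoencoder
        self.classifier = classifier

    def forward(self, x):
        return self.classifier(self.autoencoder(x))
```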

So, did we solve adversarial attacks?

● Functions as a proof of concept for a defense principle:
  ○ Gradients are stable but convey information that is less effective for adversarial attacks.
  ○ No gradient obfuscation :)
● Content dependent.
● Still vulnerable under some specific but common threat conditions.

Summary

● Manifold exploration is possible through input gradients. They express different things depending on the task.
● If structural info != semantic info, autoencoders can help against adversarial attacks.
● Projection of redundant dimensions can be achieved via S2SNets.
● High dimensionality of the input space induces (exploitable) irregularities in decision boundaries.
● It's a sound design principle against gradient-based attacks, enhancing robustness against adversarial attacks!

Thank you!

Sebastian Palacio · sebastian.palacio@dfki.de · @spalaciob

“Adversarial Defense using Structure-to-Signal Autoencoders”https://arxiv.org/abs/1803.07994

In collaboration with:
● Joachim Folz (equal contribution)
● Jörn Hees
● Federico Raue

Supervisor:
● Andreas Dengel

DFKI Kaiserslautern

Some images have been taken from www.pexels.com and www.openclipart.org