
Smoothed Inference for Improving Adversarial Robustness

Yaniv Nemcovsky*1, Evgenii Zheltonozhskii*1 [0000-0002-5400-9321], Chaim Baskin*1, Brian Chmiel*1,2, Maxim Fishman1,2, Alex M. Bronstein1, and Avi Mendelson1

1 Technion – Israel Institute of Technology
2 Intel Artificial Intelligence Products Group (AIPG)

[email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

Abstract. Deep neural networks are known to be vulnerable to adversarial attacks. Current methods of defense from such attacks are based on either implicit or explicit regularization, e.g., adversarial training. Randomized smoothing, the averaging of the classifier outputs over a random distribution centered in the sample, has been shown to guarantee the performance of a classifier subject to bounded perturbations of the input. In this work, we study the application of randomized smoothing as a way to improve performance on unperturbed data as well as to increase robustness to adversarial attacks. The proposed technique can be applied on top of any existing adversarial defense, but works particularly well with the randomized approaches. We examine its performance on common white-box (PGD) and black-box (transfer and NAttack) attacks on CIFAR-10 and CIFAR-100, substantially outperforming previous art in most scenarios and performing comparably in the others. For example, we achieve 60.4% accuracy under a PGD attack on CIFAR-10 using ResNet-20, outperforming previous art by 11.7%. Since our method is based on sampling, it lends itself well to trading off between the model inference complexity and its performance. A reference implementation of the proposed techniques is provided at the paper repository.

Keywords: Adversarial defenses, Randomized smoothing, CNN

1 Introduction

Deep neural networks (DNNs) are showing spectacular performance in a variety of computer vision tasks, but at the same time are susceptible to adversarial examples – small perturbations that alter the output of the network [16, 37]. Since the initial discovery of this phenomenon in 2013, increasingly stronger defenses [16, 18, 23, 29, 35, 45, 49, 50] and counterattacks [1, 7, 10, 16, 25, 29, 31, 32] were proposed in the literature. Adversarial attacks have also been shown to

* Equal contribution.



occur in tasks beyond image classification, where they were first discovered: in real-life object recognition [2, 6, 47], object detection [38], natural language processing [9, 12, 21], reinforcement learning [14], speech-to-text [8], and point cloud classification [43], just to mention a few. Moreover, adversarial attacks can be used to improve the performance of DNNs on unperturbed data [15, 36, 44].

Understanding the root cause of adversarial examples, how they are created and how we can detect and prevent such attacks, is at the center of many research works. Gilmer et al. [13] argued that adversarial examples are an inevitable property of high-dimensional data manifolds rather than a weakness of specific models. In view of this, the true goal of an adversarial defense is not to get rid of adversarial examples, but rather to make their search hard.

Current defense methods are based on either implicit or explicit regularization. Explicit regularization methods aim to increase the performance under adversarial attack by directly incorporating a suitable term into the loss of the network during training, usually by incorporating adversarial examples into the dataset used in the training process. In contrast, implicit regularization methods that do not change the objective, such as variational dropout [24], seek to train the network to be robust against any perturbations without taking into account adversarial examples. In particular, adding randomness to the network can be especially successful [4, 18, 27], since information acquired from previous runs cannot be directly applied to a current run. Another way to make use of randomness to improve classifier robustness is randomized smoothing [11]: averaging of the outputs of the classifier over some random distribution centered in the input data point. The effects of these three approaches (explicit regularization, implicit regularization, and smoothing) do not necessarily line up with or contradict each other. Thus, one could use a combination of the three when devising adversarial defenses.

Previous works that discuss randomized smoothing do so exclusively in the context of certified robustness [11, 34]. In contrast, we consider smoothing as a viable method to increase both the performance and adversarial robustness of the model. We show this effect on top of adversarial regularization methods – both implicit [18, 51] and explicit [49]. We discuss several smoothed inference methods, and ways to optimize a pre-trained adversarial model to improve the accuracy of the smoothed classifier.

Contributions. This paper makes the following contributions. Firstly, we study the effect of randomized smoothing on the empirical accuracy of classifiers, both on perturbed and clean data. We show that even for a small number of samples, accuracy increases in both cases. In addition, since the performance grows with the sample size, smoothing introduces a trade-off between inference time complexity and accuracy.

Secondly, we show that the smoothing can be applied along with any adversarial defense, and that it is especially efficient for methods based on implicit regularization by noise injection, such as PNI [18].


Lastly, we propose a new family of attacks based on smoothing and demonstrate their advantage for adversarial training over conventional PGD.

The rest of the paper is organized as follows: Section 2 reviews the related work, Section 3 describes our proposed method, Section 4 describes integration of the method with adversarial training, Section 5 provides the experimental results, and Section 6 concludes the paper.

2 Related work

In this section, we briefly review the notions of white and black box attacks and describe the existing approaches for adversarial defense and certified defense.

Adversarial attacks. Adversarial attacks were first proposed by Szegedy et al. [37], who noted that it was possible to use the gradients of a neural network to discover small perturbations of the input that drastically change its output. Moreover, it is usually possible to change the prediction to a particular class, i.e., perform a targeted attack. It is common to divide adversarial attacks into two classes: white box attacks, which have access to the internals of the model (in particular, its gradients); and black box attacks, which have access only to the output of the model for a given input.

White box attacks. One of the first practical white box adversarial attacks is the fast gradient sign method (FGSM) [16], which utilizes the (normalized) sign of the gradient as an adversarial perturbation:

$$\tilde{x} = x + \varepsilon \cdot \operatorname{sign}(\nabla_x \mathcal{L}), \tag{1}$$

where $x$ and $\tilde{x}$ denote the clean and perturbed inputs, respectively, $\mathcal{L}$ is the loss function of the network, which the attacker tries to maximize, and $\varepsilon$ is the desired attack strength.
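A minimal PyTorch-style sketch of this step is given below; the `model`, `loss_fn`, and pixel-range clamp are illustrative assumptions rather than details taken from the paper's repository.

```python
import torch

def fgsm(model, loss_fn, x, y, epsilon):
    """Single-step FGSM, Eq. (1): move each pixel by epsilon in the direction
    of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)        # loss the attacker wants to increase
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep the image in a valid range
```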

Madry et al. [29] proposed using iterative optimization – specifically, projected gradient ascent – to find stronger adversarial examples:

$$x^{k} = \Pi_{B(x,\varepsilon)}\left[x^{k-1} + \alpha \cdot \operatorname{sign}(\nabla_x \mathcal{L})\right]. \tag{2}$$

The projection operator $\Pi$ restricts the perturbed input to be in some vicinity $B(x, \varepsilon)$ of the unperturbed input. The iterations are initialized with $x^0 = x$. This attack, referred to as PGD in the literature, is one of the most powerful attacks known to date. Gowal et al. [17] further improved it by running a targeted PGD over a set of classes and choosing the best performing example, showing a notable improvement in different settings.
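The sketch below implements the iteration of Eq. (2) for an ℓ∞ ball; the variable names and the pixel-range clamp are again assumptions, not the paper's code.

```python
import torch

def pgd(model, loss_fn, x, y, epsilon, alpha, k):
    """k-step PGD, Eq. (2): gradient-sign ascent followed by projection onto
    the L-infinity ball B(x, epsilon); iterations start from x0 = x."""
    x_adv = x.clone().detach()
    for _ in range(k):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # projection: clip to the epsilon-ball around x, then to the image range
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv.detach()
```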

C&W [7] is a family of attacks using other norm constraints, in particular, the $L_0$, $L_2$ and $L_\infty$ norms. They solve a minimization problem

$$\min_{\delta}\; \|\delta\| + c\,\mathcal{L}(x+\delta). \tag{3}$$


In contrast to FGSM and PGD, which have a strict bound on the attack norm, the C&W attack can work in unbounded settings, seeking a minimal perturbation that achieves the misclassification.

Rony et al. [32] proposed to improve this method via decoupling direction and norm (DDN) optimization, motivated by the fact that finding the adversarial example in a predefined region is a simpler task. The attack iteratively changes the norm depending on the success of a previous step:

$$x^{k} = \Pi_{B(x,\varepsilon_k)}\left[x^{k-1} + \alpha \cdot \nabla_x \mathcal{L}\right] \tag{4}$$

$$\varepsilon_k = (1 + s \cdot \gamma)\,\varepsilon_{k-1}, \tag{5}$$

where $s = -1$ if $x^{k-1}$ is misclassified and $s = 1$ otherwise.
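A compact sketch of one DDN iteration under these update rules follows; the per-example $L_2$ projection and the assumption of 4-D image batches are illustrative choices, not details of the original implementation.

```python
import torch

def ddn_step(model, loss_fn, x, x_adv, y, eps, alpha, gamma):
    """One DDN iteration, Eqs. (4)-(5): gradient step, projection onto the
    current L2 ball B(x, eps_k), then shrink/grow the ball depending on success."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    logits = model(x_adv)
    grad = torch.autograd.grad(loss_fn(logits, y), x_adv)[0]
    with torch.no_grad():
        delta = x_adv + alpha * grad - x
        norm = delta.flatten(1).norm(p=2, dim=1).clamp_min(1e-12)
        scale = (eps / norm).clamp(max=1.0).view(-1, 1, 1, 1)  # project onto B(x, eps_k)
        x_new = x + delta * scale
        misclassified = logits.argmax(dim=1) != y
        s = 1.0 - 2.0 * misclassified.float()                  # s = -1 if misclassified
        eps_new = (1.0 + s * gamma) * eps
    return x_new, eps_new
```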

Black box attacks. The simplest way to attack a model F without accessing its gradients is to train a substitute model F′ to predict the outputs of F [31] and then use its gradients to apply any of the available white box attacks. Liu et al. [28] extended this idea to transferring the adversarial examples from one model (or ensemble of models) to another, not necessarily distilled from one another.

Other works proposed alternative methods for estimating the gradient. ZOO [10] estimates it numerically, NATTACK [25] uses natural evolution strategies [41], and BayesOpt [33] employs Gaussian processes. A detailed review of these strategies is beyond the scope of this paper.

Adversarial defenses. Szegedy et al. [37] proposed to generate adversarial examples during training and use them for training adversarially robust models, optimizing the loss

$$\mathcal{L}_{\mathrm{adv}}(x, y) = (1 - w) \cdot \mathcal{L}_{\mathrm{CE}}(f(x), y) + w \cdot \mathcal{L}_{\mathrm{CE}}(f(\tilde{x}), y), \tag{6}$$

where $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy loss, $f$ is the classifier, $x$ is a training instance with the label $y$, $\tilde{x}$ is the corresponding adversarial example, and $w$ is a hyperparameter usually set to $w = 0.5$.

This method is particularly convenient if the generation of adversarial examples is fast [16]. When combined with stronger attacks, it provides a powerful baseline for adversarial defenses [29], and is often utilized as part of defense procedures.

Many works proposed improvements over regular adversarial training by applying stronger attacks during the adversarial training phase. For example, Khoury and Hadfield-Menell [23] proposed to use Voronoi cells instead of ε-balls as a possible space for adversarial examples in the training phase. Liu et al. [26] added adversarial noise to all the activations, not only to the input. Jiang et al. [20] proposed using a learning-to-learn framework, training an additional DNN to generate adversarial examples, which is used to adversarially train the classifier, resembling GAN training. Balaji et al. [3] heuristically updated the per-image attack strength $\varepsilon_i$, decreasing it if the attack succeeded and increasing it otherwise. Wang et al. [39] proposed to gradually increase the strength of the attacks based on a first-order stationary condition for constrained optimization.


Randomization of the neural network can be a very powerful adversarial defense since, even if provided access to gradients, the attacker does not have access to the network, but rather some randomly perturbed version thereof. One of the first works involving randomization [52] proposed to improve robustness by reducing the distance between two samples differing by a normally distributed variable with a small variance. Zhang and Liang [50] proposed to add normal noise to the input, which was shown to reduce the Kullback-Leibler divergence between the clean and the adversarially-perturbed inputs.

TRADES [49] uses a batch of randomly perturbed inputs to better cover the neighbourhood of the point. Together with replacing $\mathcal{L}_{\mathrm{CE}}(f(\tilde{x}), y)$ in Eq. (6) by $\mathcal{L}_{\mathrm{CE}}(f(\tilde{x}), f(x))$, this provided a significant improvement over standard adversarial training. MART [40] further improves the defense performance by differentiating the misclassified and correctly classified examples.

Another random-based defense based on adversarial training is parametric noise injection (PNI) [18]. PNI improves network robustness by injecting Gaussian noise into parameters (weights or activations) with learned variance. Zheltonozhskii et al. [51] extended the idea of PNI by proposing to inject low-rank multivariate Gaussian noise instead of independent noise.

Athalye et al. [1] observed that many defenses do not improve robustness of the defended network but rather obfuscate gradients, making gradient-based optimization methods less effective. They identified common properties of obfuscated gradients and organized them in a checklist. In addition, they proposed techniques to overcome common instances of obfuscated gradients: in particular, approximating non-differentiable functions with a differentiable substitute and using averaging on the randomized ones.

Randomized smoothing [11] is a method for increasing the robustness of a classifier by averaging its outputs over some random distribution in the input space centered at the input sample. Cohen et al. [11] have shown that randomized smoothing is useful for certification of the classifier [42] – that is, proving its performance under norm-bounded input perturbations. In particular, they have shown a tight bound on $L_2$ certification using randomized smoothing with a normal distribution. Consequently, Salman et al. [34] used the smoothed classifier to generate stronger “smoothed” adversarial attacks, and utilized them for adversarial training. Such training allows the generation of a more accurate base classifier, and as a result, improves certified robustness properties.

3 Randomized smoothing as an adversarial defense

A smoothed classifier $\hat{f}_\theta$ is a map assigning to an input $x$ the class label that the base classifier $f_\theta$ is most likely to return for $x$ under a random perturbation $\eta$,

$$\hat{f}_\theta(x) = \arg\max_{y}\; P\left(f_\theta(x+\eta) = y\right) = \arg\max_{y} \int_{\mathbb{R}^n} \mathrm{d}\eta\; p(\eta)\, \mathbb{1}\!\left[f_\theta(x+\eta) = y\right], \tag{7}$$

where $\eta$ is some random vector and $p$ is its density function. Since the integral in (7) is intractable, it should be approximated with the Monte Carlo method, i.e., by


averaging over a number of points from the distribution sampled independently. We denote by $M$ the number of samples used for such an approximation.

In its most general form, the smoothed model output can be written as

$$\hat{f}_\theta(x) = \arg\max_{y} \int_{\mathbb{R}^n} \mathrm{d}\eta\; p_\eta(\eta)\, V\!\left(p_y(x+\eta)\right) \approx \arg\max_{y} \sum_{i=1}^{M} V\!\left(p^{i}_{y}(x)\right), \tag{8}$$

where $V$ is some voting function and $p^{i}_{y}(x)$ is the probability of class $y$ being predicted for the $i$-th sample $x + \eta_i$.

Since the smoothing is independent of the architecture of the base classifier $f_\theta$, we can take any (robust) model and test the overall improvement provided by smoothing. In contrast to previous works that also employed randomized smoothing [11, 34], we use it to improve the empirical adversarial robustness rather than the certified one. While certification is a very strong and desired guarantee, researchers were unable to achieve practical $\ell_\infty$ certification radii, and it is unclear whether this is possible at all [5, 48]. Nevertheless, smoothing can still be employed as a practical method of improving the empirical performance of the classifier.

In what follows, we describe three different implementations of smoothing, differing in their voting function $V$.

Prediction smoothing. The simplest possible way to use multiple predictions is to perform prediction voting, i.e., to output the most frequent prediction among the samples. In other words, we take $V$ to be the following indicator function:

$$V(p^{i}_{y}) = \mathbb{1}\!\left[\arg\max_{y'} p^{i}_{y'} = y\right], \tag{9}$$

yielding the following smoothed classifier,

$$\hat{f}_\theta(x) = \arg\max_{y} \sum_{i=1}^{M} \mathbb{1}\!\left[\arg\max_{y'} f_{y'}(x+\eta_i) = y\right], \tag{10}$$

where $f_{y'}(x)$ denotes the predicted probability that input $x$ will belong to class $y'$.

This is the randomized smoothing previously discussed by Cohen et al. [11] and Salman et al. [34] for certified robustness. In Section 5 we show that by taking into account $M$ predictions, the accuracy of the classifier increases on both clean and adversarially-perturbed inputs. It is important to emphasize that this voting scheme only takes into account the classification of each of the $M$ samples, discarding the predicted probabilities of each class.
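A short sketch of this majority-vote inference is given below, assuming Gaussian noise and a PyTorch classifier that returns logits; the helper name and signature are illustrative.

```python
import torch
import torch.nn.functional as F

def prediction_smoothing(model, x, sigma, m, num_classes):
    """Hard-vote smoothing, Eq. (10): classify M noisy copies of each input and
    return the class that received the most votes."""
    votes = torch.zeros(x.size(0), num_classes, device=x.device)
    with torch.no_grad():
        for _ in range(m):
            noisy = x + sigma * torch.randn_like(x)
            labels = model(noisy).argmax(dim=1)          # hard decision per sample
            votes += F.one_hot(labels, num_classes).float()
    return votes.argmax(dim=1)                           # most frequent prediction
```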

Weighted smoothing. The former method can be generalized by assigning some weight to the top-$k$ predictions. Let us denote by $k(p^{i}_{y})$ the rank of class $y$ as predicted by $f_\theta(x + \eta_i)$, i.e., $k = 1$ for the most probable class, $k = 2$ for the second most probable one, and so on. Then, for example, the following choices of $V$ are possible:

$$V(p^{i}_{y}) = 2^{\,1-k(p^{i}_{y})}, \quad\text{or} \tag{11}$$

$$V_C(p^{i}_{y}) = \begin{cases} 1 & k(p^{i}_{y}) = 1 \\ C & k(p^{i}_{y}) = 2 \\ 0 & \text{otherwise.} \end{cases} \tag{12}$$

In particular, $V_C$ expresses the dependency on the second most prominent class noted by Cohen et al. [11].
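The following sketch implements the $V_C$ variant of Eq. (12) on top of the same sampling loop; the default value of `c` is an arbitrary placeholder.

```python
import torch

def weighted_smoothing(model, x, sigma, m, num_classes, c=0.5):
    """Rank-weighted voting, Eq. (12): weight 1 for the top class of every noisy
    sample, weight C for the runner-up, and 0 for all remaining classes."""
    scores = torch.zeros(x.size(0), num_classes, device=x.device)
    with torch.no_grad():
        for _ in range(m):
            logits = model(x + sigma * torch.randn_like(x))
            top2 = logits.topk(2, dim=1).indices                  # [batch, 2] class indices
            scores.scatter_add_(1, top2[:, :1], torch.ones_like(scores[:, :1]))
            scores.scatter_add_(1, top2[:, 1:2], c * torch.ones_like(scores[:, :1]))
    return scores.argmax(dim=1)
```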

Soft prediction smoothing. In this case, we calculate the expectation of the probability of each class and use it to generate the prediction:

$$\hat{f}_\theta(x) = \arg\max_{y} \sum_{i=1}^{M} \operatorname{softmax}\!\left(f_{y}(x+\eta_i)\right). \tag{13}$$

This method was previously mentioned by Salman et al. [34] as a way to apply adversarial training to a smoothed classifier. Since the probabilities for each class are now differentiable, this allows the classifier to be trained end-to-end. In contrast to prediction smoothing, we now fully take into account the predicted class probabilities of each of the $M$ samples. If, however, used as an adversarial defense, soft smoothing is easier to overcome by attackers. Even if the attack is not successful, the probability of competing classes increases, which means that even an unsuccessful attack on a base model does affect the prediction of the smoothed model. In Section 4.2 we describe how this kind of smoothing can be used to create stronger attacks that can be used for adversarial training.
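A sketch of the soft-vote averaging follows; because the output stays differentiable, the same routine can be placed inside an attack or a training loop (the function name and signature are illustrative).

```python
import torch

def soft_smoothing(model, x, sigma, m):
    """Soft smoothing, Eq. (13): average the softmax outputs of M noisy copies.
    No torch.no_grad() here, so gradients flow through every sample."""
    probs = 0.0
    for _ in range(m):
        logits = model(x + sigma * torch.randn_like(x))
        probs = probs + torch.softmax(logits, dim=1)
    return probs / m            # argmax over dim=1 yields the smoothed prediction
```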

3.1 Relation to implicit regularization

In the case of prediction smoothing (or any other case in which $V$ is not differentiable), we cannot train the smoothed model directly. We, therefore, would like to train the base model in a way that optimizes the loss of the smoothed model. To this end, we use the 0-1 loss of $n$ training samples

$$\mathcal{L}_{01} = \sum_{i=1}^{n} \ell_{01}\!\left(y_i, \hat{f}_\theta(x_i)\right), \tag{14}$$

with the pointwise terms

$$\ell_{01}\!\left(y_i, \hat{f}_\theta(x_i)\right) = 1 - \mathbb{1}\!\left[y_i = \arg\max_{y\in\mathcal{Y}} P_\eta\!\left[f_\theta(x_i + \eta) = y\right]\right], \tag{15}$$


minimized over the model parameters $\theta$. Denoting for brevity $P_y = P_\eta\!\left[f_\theta(x_i + \eta) = y\right]$, we can approximate the indicator as

$$\mathbb{1}\!\left[y_i = \arg\max_{y\in\mathcal{Y}} P_y\right] = \mathbb{1}\!\left[P_{y_i} \geq \max_{y'\in\mathcal{Y}\setminus\{y_i\}} P_{y'}\right] \approx \tag{16}$$

$$\approx \operatorname{ReLU}\!\left[P_{y_i} - \max_{y'\in\mathcal{Y}\setminus\{y_i\}} P_{y'}\right] = \max_{y\in\mathcal{Y}}\left[P_y - \max_{y'\in\mathcal{Y}\setminus\{y_i\}} P_{y'}\right], \tag{17}$$

where we approximate the Heaviside function $\mathbb{1}$ with a better-behaved ReLU function on the interval $[-1, 1]$. The last equality follows from the fact that $y = y'$ unless $y_i$ is the most probable class. The expression in Eq. (17) resembles the bound Cohen et al. [11] have suggested for the radius of certification under adversarial attacks.

We now show a relation to training the base model under perturbation, similarly denoting $\mathbb{1}_y = \mathbb{1}\!\left[f_\theta(x_i + \eta) = y\right]$:

$$\ell_{01}\!\left(y_i, \hat{f}_\theta(x_i)\right) = 1 - \max_{y\in\mathcal{Y}} \mathbb{E}_\eta\!\left[\mathbb{1}_y - \max_{y'\in\mathcal{Y}\setminus\{y_i\}} \mathbb{1}_{y'}\right]. \tag{18}$$

Written in this form, the 0-1 loss is now amenable to Monte Carlo approximation; however, working with such a non-convex loss is problematic. We, therefore, bound the max over the expectation by the expectation over the max,

$$\ell_{01}\!\left(y_i, \hat{f}_\theta(x_i)\right) \leq 1 - \mathbb{E}_\eta \max_{y\in\mathcal{Y}} \mathbb{1}_y + \mathbb{E}_\eta \max_{y'\in\mathcal{Y}\setminus\{y_i\}} \mathbb{1}_{y'}. \tag{19}$$

Since the classification events are disjoint, we obtain

$$\mathcal{L}_{01} = \mathbb{E}_\eta \sum_{i=1}^{n}\left[1 - \sum_{y\in\mathcal{Y}} \mathbb{1}_y + \sum_{y'\in\mathcal{Y}\setminus\{y_i\}} \mathbb{1}_{y'}\right] = \tag{20}$$

$$= \mathbb{E}_\eta\!\left[\sum_{i=1}^{n} 1 - \mathbb{1}_{y_i}\right] = \mathbb{E}_\eta \sum_{i=1}^{n} \ell_{01}\!\left(y_i, f_\theta(x_i)\right), \tag{21}$$

which is the 0-1 loss of the base classifier under Gaussian perturbation. In addition, for the $\ell$-th layer, we can rewrite the network inference as

$$f(x_i) = f_2 \circ f_1(x_i) = f_2(x'_i), \tag{22}$$

where $f_1$ denotes the first $\ell - 1$ layers and $f_2$ stands for the rest of the network. Repeating the computation for $f_2$ and $x'_i$ instead of $f$ and $x_i$ shows that implicit regularization in the form of injecting Gaussian noise in any layer is beneficial for the smoothed classifier.

Noise strength learning. Having discussed the importance of noise injection for our method, we note that choosing the variance of the noise is not a trivial task. The optimal noise is dependent on both the architecture of the network and the task. Therefore, while considering the combination of our method with implicit regularization, we allow the variance of the noise injected into the layers to be learned, as in PNI [18].
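As an illustration, a noise-injection layer with a learnable scale can be written as the module below; this is a minimal sketch in the spirit of PNI [18], not the exact parameterization used in that work.

```python
import torch
from torch import nn

class LearnedNoiseInjection(nn.Module):
    """Additive Gaussian noise whose scale is trained jointly with the network.
    Placing such modules on the input or on intermediate activations realizes the
    perturbation eta whose strength the text proposes to learn."""
    def __init__(self, init_scale=0.1):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(float(init_scale)))

    def forward(self, x):
        return x + self.scale * torch.randn_like(x)
```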


Fig. 1: Accuracy as a function of injected noise strength with different numbers of samples for a CPNI base model with prediction smoothing on clean data (left) and under PGD attack (right).

4 Training a smoothed classifier

Salman et al. [34] have shown that incorporating the smoothing into the training procedure of the base classifier improves the aforementioned certification guarantees. We also use a similar concept to improve the empirical accuracy of the smoothed classifier. We consider two approaches to training the smoothed classifier: one based on implicit regularization and prediction smoothing, and another one based on soft prediction smoothing. In both cases, we start from an adversarially pre-trained base model and fine-tune it to improve the robustness of the smoothed classifier.

4.1 Prediction smoothing training

From Eq. (21) we conclude that by injecting Gaussian noise into the base classifier layers, we minimize the loss of the smoothed counterpart. The way the injected noise propagates through the layers of the network, however, renders the noise injected into the input layer particularly important. In addition, smoothing is inherently dependent on the strength of the noise injected during the process, and we show that similar models tend to achieve the best results around specific values of noise strength. We, therefore, fine-tune the classifier by training under noise injection of the same type and strength.

The noise η is not restricted to the Gaussian distribution as long as the expectations in the derivations presented in the previous sections exist. One can furthermore combine Gaussian noise injection with adversarial training by letting

$$\eta \sim \mathcal{N}(0, \sigma^2 I) + B \cdot \delta, \qquad B \sim \operatorname{Ber}(q), \tag{23}$$


where δ is the adversarial attack and q is the weight of adversarial samples in the training procedure. Such adversarial training under Gaussian perturbations relates to minimizing the 0-1 loss of the smoothed classifier under adversarial attack. We refer to this method as CNI-I+W (input + weight noise).
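A sketch of sampling this training perturbation is shown below, assuming image-shaped tensors and an adversarial perturbation `delta` produced by any attack; the names are illustrative.

```python
import torch

def sample_training_perturbation(x, delta, sigma, q):
    """Eq. (23): Gaussian input noise plus an adversarial perturbation delta that
    is switched on per example with probability q (Bernoulli gate B)."""
    gauss = sigma * torch.randn_like(x)
    gate = (torch.rand(x.size(0), device=x.device) < q).float()
    gate = gate.view(-1, *([1] * (x.dim() - 1)))   # broadcast over C, H, W
    return gauss + gate * delta
```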

4.2 Soft smoothing training

The smoothing of the classifier gives rise to a new family of attacks targeting the smoothed classifier rather than the base model. In the case of soft smoothing, the output of the smoothed classifier is differentiable and thus can be used as a source of gradients for white box attacks. Salman et al. [34] discussed this family of attacks as a way to improve the generalization of a base model and thus improve certified accuracy of a smoothed model.

We consider adversarial training with such attacks as a way to directly optimize the smoothed model by training the base model. We expect it to increase the adversarial robustness of both the base and the smoothed models. Specifically, we consider PGD attacks where the gradients are computed based on several randomized samples. In our experiments, we chose to limit the number of samples to M = 8, since further improvement in the prediction ability is counter-balanced by the increase in the computational complexity.

Expectation PGD. The aforementioned soft smoothing was previously explored in various ways. Kaur et al. [22] showed that targeted attacks on softly smoothed models have perceptually-aligned gradients, i.e., they perturb the input to make it resemble a member of a different class, even for models that were not adversarially trained. In contrast, non-smoothed classifiers exhibit this phenomenon only in adversarially-trained models. In general, perceptual alignment of the gradients is indirect evidence of model robustness [19, 30]. We expect such a perturbation, if strong enough, to be able to attack even a perfect classifier. Such phenomena might indicate that adversarial training of the model makes the adversarial attacks converge to soft smoothing-based attacks, which use the spatial information in proximity to the data point.

An important difference between the SmoothAdvPGD attack considered by Salman et al. [34] and Kaur et al. [22], and the expectation PGD (EPGD) attack proposed in this paper is the order of the averaging and the softmax operator:

$$g_{\mathrm{SmoothAdv}}(x) = \operatorname{softmax}\!\left(\frac{1}{M}\sum_{i=1}^{M} f_c(x+\eta_i)\right) \quad\text{vs.} \tag{24}$$

$$g_{\mathrm{EPGD}}(x) = \sum_{i=1}^{M} \operatorname{softmax}\!\left(f_c(x+\eta_i)\right). \tag{25}$$

Notice that an EPGD attack with k PGD steps and M samples requires computing the gradients k × M times, making its computational cost comparable with that of a PGD attack with k × M steps. Therefore, we compare the accuracy of PGD, EPGD and SmoothAdvPGD [34] as a function of k × M and find that each has a distinct behavior.
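The two surrogates can be contrasted in a few lines of PyTorch; here `loss_fn` is assumed to accept log-probabilities (e.g., an NLL loss), and these sketches only compute the attacker's objective, leaving the PGD step machinery out.

```python
import torch

def smoothadv_objective(model, loss_fn, x, y, sigma, m):
    """Eq. (24): average the logits over M noisy samples, then apply softmax."""
    logits = sum(model(x + sigma * torch.randn_like(x)) for _ in range(m)) / m
    return loss_fn(torch.log_softmax(logits, dim=1), y)

def epgd_objective(model, loss_fn, x, y, sigma, m):
    """Eq. (25): apply softmax per noisy sample and average the probabilities;
    gradients flow through each of the M samples separately."""
    probs = sum(torch.softmax(model(x + sigma * torch.randn_like(x)), dim=1)
                for _ in range(m)) / m
    return loss_fn(torch.log(probs + 1e-12), y)
```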

5 Experiments

To demonstrate the effectiveness of the proposed method, we evaluated the proposed adversarial defenses on CIFAR-10 and CIFAR-100 under white and black box attacks for ResNet-20 and Wide-ResNet34-10. Finally, we performed an extensive ablation study of our method.

Experimental settings. We studied the performance of each of the proposed smoothing methods (prediction, soft prediction, and weighted smoothing) over different base models. We first considered implicit regularization methods and combined them with the adversarial training proposed by Madry et al. [29]. We trained five such models: adversarially-trained CNI-W (CNI with noise added to weights), and four additional models obtained by fine-tuning the base model with the training methods described in Section 4. One of those models is CNI-I+W (CNI with noise added to weights and input), while the remaining ones were obtained by using different attacks during the fine-tuning. We used EPGD attacks, SmoothAdvPGD attacks, or PGD with a large number of iterations k. For EPGD attacks, we selected models based on clean validation set performance: one with the best validation accuracy (labelled EPGD-converged) and another with the worst validation accuracy (EPGD-diverged). The motivation for considering the latter derives from the fact that PGD and EPGD are highly similar in nature, to the point that PGD converges to EPGD with adversarial training. Therefore, training until convergence may result in a model very similar to CNI-W.

For the PGD attack we used k = 56 to get computational complexity similar to EPGD. Since this attack is too strong, the base model achieved significantly worse performance on both clean and adversarial data. Nevertheless, after the application of smoothing, this model showed the highest performance on adversarial data with a somewhat lower accuracy on clean data. Secondly, we used TRADES (model denoted CNI-W+T) to combine both implicit and explicit regularization in a single model, expecting to improve on each of those separately. For CNI-I+W, we fine-tuned the CNI-W model based on the noise maximizing performance, σ = 0.24. For SmoothAdvPGD, we used the method described in [34]. We report a number of the results for each base model and smoothing method in Table 4, and a comparison of different setups in Figs. 1 and 2. From Fig. 2 we conclude that the EPGD-converged and CNI-W base models result in similar performance for all setups.

For ResNet-20 on CIFAR-10, we used the CNI-W base model [51] trained for 400 epochs and chose the model with the highest performance on a validation set. The obtained model was fine-tuned with EPGD adversarial training, CNI-I+W training or CNI-W with a higher k for up to 200 epochs. For Wide-ResNet34-10 on CIFAR-100, we used the PNI-W model [18] trained for 100 epochs and chose the model with the highest performance on a clean validation set. For Wide-ResNet34-10 on CIFAR-10, we used the same training regime, additionally using the PNI-W+Trades model that was obtained by training a PNI-W architecture with the TRADES method instead of adversarial training.

5.1 Comparison to other adversarial defenses

Table 1: Comparison of our method to prior art on CIFAR-10 on ResNet-20, under PGD attack with the number of iterations k = 7 and the ℓ∞ radius of attack ε = 8/255. † denotes our results based on code provided by the authors or our re-implementation. Smoothed CNI-W uses k = 7 and M = 512 with σ = 0.24 for the regular setting and σ = 0.00 for the no-noise setting during inference. For the high k setting, k = 56 was used for adversarial training and M = 512, σ = 0.01 during inference.

Method | Clean accuracy, mean±std % | PGD accuracy, mean±std %
Adversarial training [29]† | 83.84 | 39.14 ± 0.05
PNI [18] | 82.84 ± 0.22 | 46.11 ± 0.43
TRADES (1/λ = 1) [49]† | 75.62 | 47.18
TRADES (1/λ = 6) [49]† | 75.47 | 47.75
CNI [51] | 78.48 ± 0.41 | 48.84 ± 0.55
Smoothed CNI-W (ours) | 81.44 ± 0.06 | 55.92 ± 0.22
Smoothed CNI-W (no noise, ours) | 84.63 ± 0.05 | 52.8 ± 0.2
Smoothed CNI-W (high k, ours) | 78.10 ± 0.12 | 60.06 ± 0.29

For ResNet-20 on CIFAR-10, we compared our best-performing instance of the defense (smoothed CNI-W fine-tuned with a high k) to the current state-of-the-art. As summarized in Table 1, in terms of adversarial accuracy, we outperformed the best existing method by 11.7%. In addition, we presented a configuration of smoothed classifiers that achieved the highest accuracy on clean data, i.e., CNI-W without noise. Table 2a presents the results of our method for Wide-ResNet-34 on CIFAR-10 against PGD with k = 10 steps. Our smoothed PNI-W+T shows comparable results in terms of adversarial accuracy compared to the previous state-of-the-art. For Wide-ResNet-34 on CIFAR-100, our smoothed PNI-W substantially outperforms prior art in terms of adversarial accuracy, as shown in Table 2b.

We tested our defense against black-box attacks, in particular, the transferable attack [28] and NAttack [25]. For the transferable attack, we trained another instance of the CNI-W model and used it as a source model in two configurations: PGD with and without smoothing. The results are reported in Tables 3a and 3b. For the transferable attack, we conclude that our model performs well even if the source model is not smoothed, which is an argument against a randomized gradient obfuscation effect. The NAttack comparison shows that our method substantially outperforms the baseline benchmarks.


Table 2: Comparison of our method to prior art on CIFAR-10 (a) and CIFAR-100 (b) on Wide-ResNet-34, under PGD attack with k = 10 and ε = 8/255. † denotes our results based on code provided by the authors or our re-implementation. + denotes our results based on the checkpoint provided by the authors. For CIFAR-10 smooth PNI-W+T and PNI-W, we use k = 10 and M = 64 with σ = 0.17 for the regular setting and σ = 0.03 for the low noise setting during inference. For CIFAR-100, we use k = 10 and M = 16 with σ = 0.05.

(a) CIFAR-10

Method | Clean accuracy, % | PGD accuracy, %
Adv. training [29]† | 89.68 | 46.61
PNI [18]† | 89.26 | 52.9
IAAT [3] | 91.3 | 48.53
CSAT [35] | 87.65 | 54.77
TRADES [49]+ | 84.92 | 56.5
MART [40]+ | 83.62 | 57.3
Smooth PNI-W (our) | 89.27 | 54.73
Smooth PNI-W+T (1/λ = 6, our) | 85.71 | 56.82

(b) CIFAR-100

Method | Clean accuracy, % | PGD accuracy, %
IAAT [3] | 68.1 | 26.17
Adv. training [29]† | 65.8 | 26.15
PNI [18]† | 61.72 | 27.33
Smooth PNI-W (our) | 66.30 | 29.68

Table 3: Results of black-box attacks on our method applied on PNI-W and CNI-W with prediction smoothing: (a) transferable attacks with σ = 0.02; (b) NAttack. Our Smooth PNI-W and CNI-W method uses M = 1, 4, 8, σ = 0.24. † denotes our results based on code provided by authors or our re-implementation. In both cases we present results for ResNet-20 on CIFAR-10.

(a) Transferable attack

Iterations | PGD accuracy, mean±std | PGD-s accuracy, mean±std
4 | 58.01 ± 0.35 | 59.35 ± 0.15
8 | 58.49 ± 0.10 | 60.23 ± 0.14

(b) NAttack

Method | Accuracy, %
Adv. training [29]† | 33.11
Smooth PNI-W (our, m = 1) | 47.17
Smooth PNI-W (our, m = 4) | 50.29
Smooth PNI-W (our, m = 8) | 50.78
Smooth CNI-W (our, m = 1) | 48.83
Smooth CNI-W (our, m = 4) | 50.98
Smooth CNI-W (our, m = 8) | 51.56

5.2 Ablation study

We compared different numbers of Monte Carlo samples, $M = 2^n$, $n \in \{0, \dots, 9\}$. For each value of M, we considered multiple noise standard deviations in the range [0, 0.5]. The upper bound that we chose is the result of significant degradation of the model's performance. The best results were obtained with prediction smoothing, even for a low number of samples. Although the advantage is of the order of a single standard deviation, this difference is persistent across base models, number of iterations, and attack types, and thus is likely to be systematic. This could indicate that the attackers utilize the additional information provided by other methods better than the defender.

Table 4: Results of Smooth CNI-W for the PGD white-box attacks on CIFAR-10. Mean and standard deviation over five runs are presented in the form of mean±std.

Base Model | Smoothing | Iterations | Noise | Clean accuracy, mean±std | PGD accuracy, mean±std
CNI-W | Prediction | 512 | 0.0 | 84.63 ± 0.05 | 52.8 ± 0.29
CNI-W | Prediction | 512 | 0.24 | 81.44 ± 0.06 | 55.92 ± 0.22
EPGD-diverged | Prediction | 256 | 0.0 | 83.05 ± 0.01 | 57.31 ± 0.22
EPGD-diverged | Prediction | 256 | 0.18 | 81.07 ± 0.01 | 58.56 ± 0.14
EPGD-diverged | Soft | 512 | 0.0 | 83.13 ± 0.07 | 57.07 ± 0.15
EPGD-diverged | Soft | 512 | 0.19 | 80.98 ± 0.07 | 58.32 ± 0.33
EPGD-diverged | Weighted | 256 | 0.19 | 80.98 ± 0.07 | 58.47 ± 0.18
EPGD-converged | Prediction | 256 | 0.22 | 81.82 ± 0.05 | 55.9 ± 0.37
CNI-I+W | Prediction | 256 | 0.37 | 81.25 ± 0.06 | 55.23 ± 0.37
CNI-W (k = 56) | Prediction | 512 | 0.11 | 77.28 ± 0.05 | 60.54 ± 0.26

Fig. 2: Accuracy as a function of injected noise strength with different numbers of samples for CNI-I+W (left) and EPGD-diverged (right) with prediction smoothing for PGD.

For the CNI-W and EPGD models, accuracy on clean data drops as the noise variance increases. The CNI-I+W model is more resilient to noise injection, attaining its maximum at an approximately 40% higher standard deviation of the injected noise. This can be related to the fact that the certified radius is twice as small as the standard deviation of the noise used in the smoothed classifier.

In contrast to the certified robustness scheme [11], we did not observe improvement of the CNI-I+W model over CNI-W, which might be a result of the interference of the CNI-induced noise and the Gaussian noise.

6 Conclusions

We proposed an approach to adversarial defense based on randomized smoothing, which shows state-of-the-art results for white-box attacks, namely PGD, on CIFAR-10 and CIFAR-100, even with a small number of iterations. We also confirmed the efficiency of our defense against black-box attacks by successfully defending against adversarial examples transferred from different models. Our method offers a practical trade-off between the inference time and model performance, and can be incorporated into any adversarial defense. We showed that adversarial training of a smoothed classifier is a non-trivial task and studied several approaches to it. In addition, we investigated a family of attacks that take smoothing into account against smoothed classifiers. By incorporating those attacks into adversarial training, we were able to train classifiers with higher performance in smoothed settings. Complexity-wise, however, when taking into account the computations required for smoothing during the training, they are comparable to but not better than other known attacks. Finally, we showed that our method can exploit both implicit and explicit regularization, which emphasizes the importance of incorporating implicit regularization, explicit regularization and smoothed inference together into adversarial defenses.

Acknowledgments. The research was funded by the Hyundai Motor Company through the HYUNDAI-TECHNION-KAIST Consortium, National Cyber Security Authority, and the Hiroshi Fujiwara Technion Cyber Security Research Center.


Bibliography

1. Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 274–283, PMLR, Stockholmsmässan, Stockholm, Sweden (10–15 Jul 2018), URL http://proceedings.mlr.press/v80/athalye18a.html (cited on pp. 1 and 5)
2. Athalye, A., Engstrom, L., Ilyas, A., Kwok, K.: Synthesizing robust adversarial examples. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 284–293, PMLR, Stockholmsmässan, Stockholm, Sweden (10–15 Jul 2018), URL http://proceedings.mlr.press/v80/athalye18b.html (cited on p. 2)
3. Balaji, Y., Goldstein, T., Hoffman, J.: Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets. arXiv preprint arXiv:1910.08051 (2019), URL http://arxiv.org/abs/1910.08051 (cited on pp. 4 and 13)
4. Bietti, A., Mialon, G., Chen, D., Mairal, J.: A kernel perspective for regularizing deep neural networks. arXiv preprint arXiv:1810.00363 (2018), URL http://arxiv.org/abs/1810.00363 (cited on p. 2)
5. Blum, A., Dick, T., Manoj, N., Zhang, H.: Random smoothing might be unable to certify ℓ∞ robustness for high-dimensional images. arXiv preprint arXiv:2002.03517 (2020), URL http://arxiv.org/abs/2002.03517 (cited on p. 6)
6. Brown, T.B., Mane, D., Roy, A., Abadi, M., Gilmer, J.: Adversarial patch. arXiv preprint arXiv:1712.09665 (2017), URL http://arxiv.org/abs/1712.09665 (cited on p. 2)
7. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57 (May 2017), https://doi.org/10.1109/SP.2017.49 (cited on pp. 1 and 3)
8. Carlini, N., Wagner, D.: Audio adversarial examples: Targeted attacks on speech-to-text. In: 2018 IEEE Security and Privacy Workshops (SPW), pp. 1–7, IEEE (2018) (cited on p. 2)
9. Chaturvedi, A., KP, A., Garain, U.: Exploring the robustness of NMT systems to nonsensical inputs. arXiv preprint arXiv:1908.01165 (2019), URL http://arxiv.org/abs/1908.01165 (cited on p. 2)
10. Chen, P.Y., Zhang, H., Sharma, Y., Yi, J., Hsieh, C.J.: ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26, AISec '17, ACM, New York, NY, USA (2017), ISBN 978-1-4503-5202-4, https://doi.org/10.1145/3128572.3140448, URL http://doi.acm.org/10.1145/3128572.3140448 (cited on pp. 1 and 4)


11. Cohen, J., Rosenfeld, E., Kolter, Z.: Certified adversarial robustness via randomized smoothing. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 1310–1320, PMLR, Long Beach, California, USA (09–15 Jun 2019), URL http://proceedings.mlr.press/v97/cohen19c.html (cited on pp. 2, 5, 6, 7, 8, and 15)
12. Gao, J., Lanchantin, J., Soffa, M.L., Qi, Y.: Black-box generation of adversarial text sequences to evade deep learning classifiers. In: 2018 IEEE Security and Privacy Workshops (SPW), pp. 50–56, IEEE (2018) (cited on p. 2)
13. Gilmer, J., Metz, L., Faghri, F., Schoenholz, S.S., Raghu, M., Wattenberg, M., Goodfellow, I.: Adversarial spheres. arXiv preprint arXiv:1801.02774 (2018), URL http://arxiv.org/abs/1801.02774 (cited on p. 2)
14. Gleave, A., Dennis, M., Kant, N., Wild, C., Levine, S., Russell, S.: Adversarial policies: Attacking deep reinforcement learning. arXiv preprint arXiv:1905.10615 (2019), URL http://arxiv.org/abs/1905.10615 (cited on p. 2)
15. Gong, C., Ren, T., Ye, M., Liu, Q.: Maxup: A simple way to improve generalization of neural network training. arXiv preprint arXiv:2002.09024 (2020), URL http://arxiv.org/abs/2002.09024 (cited on p. 2)
16. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014), URL http://arxiv.org/abs/1412.6572 (cited on pp. 1, 3, and 4)
17. Gowal, S., Uesato, J., Qin, C., Huang, P.S., Mann, T., Kohli, P.: An alternative surrogate loss for PGD-based adversarial testing. arXiv preprint arXiv:1910.09338 (2019), URL http://arxiv.org/abs/1910.09338 (cited on p. 3)
18. He, Z., Rakin, A.S., Fan, D.: Parametric noise injection: Trainable randomness to improve deep neural network robustness against adversarial attack. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019), URL http://openaccess.thecvf.com/content_CVPR_2019/html/He_Parametric_Noise_Injection_Trainable_Randomness_to_Improve_Deep_Neural_Network_CVPR_2019_paper.html (cited on pp. 1, 2, 5, 8, 11, 12, and 13)
19. Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., Madry, A.: Adversarial examples are not bugs, they are features. arXiv preprint arXiv:1905.02175 (2019), URL http://arxiv.org/abs/1905.02175 (cited on p. 10)
20. Jiang, H., Chen, Z., Shi, Y., Dai, B., Zhao, T.: Learning to defense by learning to attack. arXiv preprint arXiv:1811.01213 (2018), URL http://arxiv.org/abs/1811.01213 (cited on p. 4)
21. Jin, D., Jin, Z., Zhou, J.T., Szolovits, P.: Is BERT really robust? Natural language attack on text classification and entailment. arXiv preprint arXiv:1907.11932 (2019), URL http://arxiv.org/abs/1907.11932 (cited on p. 2)


22. Kaur, S., Cohen, J., Lipton, Z.C.: Are perceptually-aligned gradients a general property of robust classifiers? arXiv preprint arXiv:1910.08640 (2019), URL http://arxiv.org/abs/1910.08640 (cited on p. 10)
23. Khoury, M., Hadfield-Menell, D.: Adversarial training with Voronoi constraints. arXiv preprint arXiv:1905.01019 (2019), URL http://arxiv.org/abs/1905.01019 (cited on pp. 1 and 4)
24. Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28, pp. 2575–2583, Curran Associates, Inc. (2015), URL http://papers.nips.cc/paper/5666-variational-dropout-and-the-local-reparameterization-trick.pdf (cited on p. 2)
25. Li, Y., Li, L., Wang, L., Zhang, T., Gong, B.: NATTACK: Learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 3866–3876, PMLR, Long Beach, California, USA (09–15 Jun 2019), URL http://proceedings.mlr.press/v97/li19g.html (cited on pp. 1, 4, 12, and 22)
26. Liu, A., Liu, X., Zhang, C., Yu, H., Liu, Q., He, J.: Training robust deep neural networks via adversarial noise propagation. arXiv preprint arXiv:1909.09034 (2019), URL http://arxiv.org/abs/1909.09034 (cited on p. 4)
27. Liu, X., Cheng, M., Zhang, H., Hsieh, C.J.: Towards robust neural networks via random self-ensemble. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 369–385 (2018) (cited on p. 2)
28. Liu, Y., Chen, X., Liu, C., Song, D.: Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770 (2016), URL http://arxiv.org/abs/1611.02770 (cited on pp. 4 and 12)
29. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations (2018), URL https://openreview.net/forum?id=rJzIBfZAb (cited on pp. 1, 3, 4, 11, 12, 13, and 24)
30. Nakkiran, P.: A discussion of 'Adversarial examples are not bugs, they are features': Adversarial examples are just bugs, too. Distill (2019), https://doi.org/10.23915/distill.00019.5, https://distill.pub/2019/advex-bugs-discussion/response-5 (cited on p. 10)
31. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519, ASIA CCS '17, ACM, New York, NY, USA (2017), ISBN 978-1-4503-4944-4, https://doi.org/10.1145/3052973.3053009, URL http://doi.acm.org/10.1145/3052973.3053009 (cited on pp. 1 and 4)
32. Rony, J., Hafemann, L.G., Oliveira, L.S., Ayed, I.B., Sabourin, R., Granger, E.: Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019) (cited on pp. 1 and 4)

33. Ru, B., Cobb, A., Blaas, A., Gal, Y.: BayesOpt adversarial attack. In: International Conference on Learning Representations (2020), URL https://openreview.net/forum?id=Hkem-lrtvH (cited on p. 4)
34. Salman, H., Yang, G., Li, J., Zhang, P., Zhang, H., Razenshteyn, I., Bubeck, S.: Provably robust deep learning via adversarially trained smoothed classifiers. arXiv preprint arXiv:1906.04584 (2019), URL http://arxiv.org/abs/1906.04584 (cited on pp. 2, 5, 6, 7, 9, 10, and 11)
35. Sarkar, A., Gupta, N.K., Iyengar, R.: Enforcing linearity in DNN succours robustness and adversarial image generation. arXiv preprint arXiv:1910.08108 (2019), URL http://arxiv.org/abs/1910.08108 (cited on pp. 1 and 13)
36. Sun, H., Wang, R., Chen, K., Utiyama, M., Sumita, E., Zhao, T.: Robust unsupervised neural machine translation with adversarial training. arXiv preprint arXiv:2002.12549 (2020), URL http://arxiv.org/abs/2002.12549 (cited on p. 2)
37. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013), URL http://arxiv.org/abs/1312.6199 (cited on pp. 1, 3, and 4)
38. Wang, D., Li, C., Wen, S., Nepal, S., Xiang, Y.: Daedalus: Breaking non-maximum suppression in object detection via adversarial examples. arXiv preprint arXiv:1902.02067 (2019), URL http://arxiv.org/abs/1902.02067 (cited on p. 2)
39. Wang, Y., Ma, X., Bailey, J., Yi, J., Zhou, B., Gu, Q.: On the convergence and robustness of adversarial training. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 6586–6595, PMLR, Long Beach, California, USA (09–15 Jun 2019), URL http://proceedings.mlr.press/v97/wang19i.html (cited on p. 4)
40. Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., Gu, Q.: Improving adversarial robustness requires revisiting misclassified examples. In: International Conference on Learning Representations (2020), URL https://openreview.net/forum?id=rklOg6EFwS (cited on pp. 5, 13, and 24)
41. Wierstra, D., Schaul, T., Peters, J., Schmidhuber, J.: Natural evolution strategies. In: 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pp. 3381–3387 (June 2008), https://doi.org/10.1109/CEC.2008.4631255 (cited on p. 4)
42. Wong, E., Kolter, Z.: Provable defenses against adversarial examples via the convex outer adversarial polytope. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 5286–5295, PMLR, Stockholmsmässan, Stockholm, Sweden (10–15 Jul 2018), URL http://proceedings.mlr.press/v80/wong18a.html (cited on p. 5)


43. Xiang, C., Qi, C.R., Li, B.: Generating 3D adversarial point clouds. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019) (cited on p. 2)
44. Xie, C., Tan, M., Gong, B., Wang, J., Yuille, A., Le, Q.V.: Adversarial examples improve image recognition. arXiv preprint arXiv:1911.09665 (2019), URL http://arxiv.org/abs/1911.09665 (cited on p. 2)
45. Xie, C., Wu, Y., Maaten, L.v.d., Yuille, A.L., He, K.: Feature denoising for improving adversarial robustness. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 501–509 (June 2019) (cited on p. 1)
46. Xu, H., Caramanis, C., Mannor, S.: Robustness and regularization of support vector machines. Journal of Machine Learning Research 10(Jul), 1485–1510 (2009) (cited on p. 21)
47. Xu, K., Zhang, G., Liu, S., Fan, Q., Sun, M., Chen, H., Chen, P.Y., Wang, Y., Lin, X.: Evading real-time person detectors by adversarial t-shirt. arXiv preprint arXiv:1910.11099 (2019), URL http://arxiv.org/abs/1910.11099 (cited on p. 2)
48. Yang, G., Duan, T., Hu, E., Salman, H., Razenshteyn, I., Li, J.: Randomized smoothing of all shapes and sizes. arXiv preprint arXiv:2002.08118 (2020), URL http://arxiv.org/abs/2002.08118 (cited on p. 6)
49. Zhang, H., Yu, Y., Jiao, J., Xing, E., Ghaoui, L.E., Jordan, M.: Theoretically principled trade-off between robustness and accuracy. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 7472–7482, PMLR, Long Beach, California, USA (09–15 Jun 2019), URL http://proceedings.mlr.press/v97/zhang19p.html (cited on pp. 1, 2, 5, 12, and 13)
50. Zhang, Y., Liang, P.: Defending against whitebox adversarial attacks via randomized discretization. In: Chaudhuri, K., Sugiyama, M. (eds.) Proceedings of Machine Learning Research, vol. 89, pp. 684–693, PMLR (16–18 Apr 2019), URL http://proceedings.mlr.press/v89/zhang19b.html (cited on pp. 1 and 5)
51. Zheltonozhskii, E., Baskin, C., Nemcovsky, Y., Chmiel, B., Mendelson, A., Bronstein, A.M.: Colored noise injection for training adversarially robust neural networks. arXiv preprint arXiv:2003.02188 (2020), URL http://arxiv.org/abs/2003.02188 (cited on pp. 2, 5, 11, and 12)
52. Zheng, S., Song, Y., Leung, T., Goodfellow, I.: Improving the robustness of deep neural networks via stability training. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4480–4488 (June 2016), https://doi.org/10.1109/CVPR.2016.485 (cited on p. 5)


A Defense by random noise injection

To get some intuition on the effect of noise injection on adversarial attacks, we perform an analysis of a simple classification model – a support vector machine (SVM) f(x) = w · x + b with the parameters w and b. Specifically, we consider the formulation of the SVM under adversarial attacks with zero-mean random noise injected into the input. We assume that the attacker is aware of this noise, but is limited to observing the effect of a single realization thereof.

We start from the expectation of the SVM objective on a single input sample x_i with ground truth label y_i:

f(x_i) = \mathbb{E}_{\eta} \max_{\delta} \mathrm{ReLU}\bigl[ 1 - y_i \bigl( w \cdot (x_i + \eta_i - \delta_i) + b \bigr) \bigr], \qquad (A.26)

where η_i is the injected noise and δ_i is the adversarial noise. Denoting δ′_i = δ_i − η_i and using the result of Xu et al. [46], we write Eq. (A.26) as

f(x_i) = \mathbb{E}_{\eta} \max_{\delta} \, w \cdot \delta'_i + \mathrm{ReLU}\bigl[ 1 - y_i ( w \cdot x_i + b ) \bigr]. \qquad (A.27)

Since the expectation of η is zero, E_η[w · η_i] = 0, leading to

f(x_i) = \max_{\delta} \, w \cdot \delta_i + \mathrm{ReLU}\bigl[ 1 - y_i ( w \cdot x_i + b ) \bigr], \qquad (A.28)

which is nothing but the SVM objective without the injected noise. Thus, noise injection does not change the expectation of the adversarial objective, and it is unclear whether the attacker can devise a better strategy to breach the defense effect provided by the noise injection.
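As a sanity check of this step, the following minimal Monte Carlo sketch (all dimensions and parameter values are hypothetical toy choices, not taken from the paper) estimates the noise-averaged objective of Eq. (A.27) and compares it with the noiseless objective of Eq. (A.28) for a fixed worst-case perturbation:

```python
# Minimal Monte Carlo sketch (toy values) of the step from Eq. (A.27) to
# Eq. (A.28): zero-mean injected noise eta leaves the expected value of the
# linear term w . delta' unchanged.
import numpy as np

rng = np.random.default_rng(0)
d, eps, sigma = 32, 8 / 255, 0.24          # input dim, L_inf budget, noise std

w = rng.normal(size=d)                     # SVM weights (toy values)
b = 0.1
x = rng.normal(size=d)                     # a single input sample
y = 1.0                                    # its label in {-1, +1}

hinge = max(0.0, 1.0 - y * (w @ x + b))    # ReLU[1 - y (w . x + b)]
delta = eps * np.sign(w)                   # maximizer of w . delta over the L_inf ball

obj_clean = w @ delta + hinge              # Eq. (A.28): no injected noise

etas = sigma * rng.normal(size=(100_000, d))
obj_noisy = np.mean((delta - etas) @ w) + hinge   # Eq. (A.27) with delta' = delta - eta

print(f"noiseless objective : {obj_clean:.4f}")
print(f"noise-averaged (MC) : {obj_noisy:.4f}")   # agrees up to Monte Carlo error
```

The two printed values coincide up to Monte Carlo error, mirroring the cancellation of the E_η[w · η_i] term in the derivation above.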

The effect of noise injection is similar to that of random gradient obfuscation. Let x be some input point, η a realization of the random perturbation, δ the adversarial attack that would have been chosen for x, and δ′ the adversarial attack corresponding to x + η. Since we add noise to the input in each forward iteration, the adversary computes δ′ instead of δ, which has some random distribution around the true adversarial direction. Denoting by Π_a the projection onto the chosen adversarial direction, by Π_⊥ the projection onto the subspace orthogonal to this direction, and by ‖δ‖_p ≤ ε the L_p bound on the attack strength, we obtain

\Pi_{\perp}(\delta') = \Pi_{\perp}(\eta) \equiv \eta_0, \qquad (A.29)

\|\Pi_{a}(\delta')\|_p = \bigl( \varepsilon^p - \|\eta_0\|_p^p \bigr)^{1/p}. \qquad (A.30)

For p = ∞, the right-hand side of Eq. (A.30) equals ε. The orthogonal component in Eq. (A.29), however, is a random variable that moves δ′ farther away from the adversarial direction and, therefore, decreases the probability of a successful adversarial attack. This effect accumulates when the adversary computes the gradients multiple times (as in PGD).
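To make the decomposition concrete, the toy sketch below (all quantities are assumptions of the sketch: a random unit vector stands in for the true adversarial direction, and the along-direction component is given Euclidean length ε√d, roughly that of a sign-like L∞ perturbation) measures how the alignment between δ′ and the true adversarial direction degrades as the injected-noise level grows:

```python
# Numerical sketch of Eqs. (A.29)-(A.30): the injected noise eta contributes an
# orthogonal component eta_0 that tilts the attacker's perturbation delta' away
# from the true adversarial direction a.
import numpy as np

rng = np.random.default_rng(0)
d, eps = 3 * 32 * 32, 8 / 255                 # CIFAR-like dimension, L_inf radius
a = rng.normal(size=d)
a /= np.linalg.norm(a)                        # true adversarial direction (unit norm)

for sigma in (0.0, 0.1, 0.24, 0.5):
    cos = []
    for _ in range(200):
        eta = sigma * rng.normal(size=d)
        eta_orth = eta - (eta @ a) * a        # Pi_perp(eta) = eta_0,  Eq. (A.29)
        # Along-direction part: Euclidean length eps * sqrt(d) (sketch assumption)
        delta_prime = eps * np.sqrt(d) * a + eta_orth
        cos.append(delta_prime @ a / np.linalg.norm(delta_prime))
    print(f"sigma = {sigma:.2f}: mean cosine with true direction = {np.mean(cos):.3f}")
```

With σ = 0 the perturbation points exactly along the adversarial direction; as σ grows, the orthogonal noise component dominates and the attack direction becomes increasingly misaligned.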

Fig. B.1: Accuracy as a function of injected noise strength (x-axis: injected noise standard deviation, 0.0–0.5; y-axis: accuracy) with M = 8 for all base models (CNI, EPGD-d, EPGD-c, Gauss, CNI-high_k) and all smoothing methods (prediction, soft, weighted) on clean data (left) and under PGD attack (right).

Fig. B.2: Accuracy as a function of injected noise strength (x-axis: injected noise standard deviation, 0.0–0.5; y-axis: accuracy) with different numbers of samples (1 to 512) for a CNI-W (k = 56) fine-tuned model with prediction smoothing on clean data (left) and under PGD attack (right).

B Additional results

Table B.1 presents the additional evaluation of our method on a black-box attack (NAttack [25]) compared with prior art. Our method substantially outperforms it. In addition, we evaluated the impact of noise injection levels on a PGD-56 pre-trained model for different numbers of samples. From Fig. B.2 we conclude that clean and adversarial accuracy reach their nominal values already at almost vanishing noise levels.
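For reference, the following PyTorch sketch shows prediction smoothing at inference time, i.e., classifying M noisy copies of the input and taking a majority vote over the predicted classes. The names smoothed_predict, model, and num_classes are illustrative placeholders, not the API of the released code:

```python
# Minimal PyTorch sketch of prediction smoothing at inference time: classify
# M noisy copies of the input and return the majority-vote class.
import torch

@torch.no_grad()
def smoothed_predict(model, x, num_classes, m=8, sigma=0.24):
    """x: a single input of shape (C, H, W); returns the voted class index."""
    noisy = x.unsqueeze(0) + sigma * torch.randn(m, *x.shape)   # M noisy copies
    votes = model(noisy).argmax(dim=1)                          # per-copy predictions
    return torch.bincount(votes, minlength=num_classes).argmax().item()
```

With m = 1 this reduces to a single noisy forward pass; increasing m trades inference cost for accuracy, which is the trade-off visible in Fig. B.2.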

Fig. B.1 demonstrates that there is a single optimal value of noise strength for each base model, independent of the smoothing method.

Fig. B.3: Accuracy of the CNI-W model with prediction smoothing under PGD, EPGD, and SmoothAdvPGD attacks with different iteration numbers k (x-axis: k × M; y-axis: adversarial accuracy). We used M = 1, 8, 512 for smoothing, with a fixed noise standard deviation σ = 0.24. The EPGD and SmoothAdvPGD attacks averaged the gradients over M_backward = 8 samples in each attack iteration.

In all the figures we see a single maximum, which is fixed for each of the base models and unrelated to the smoothing method. This indicates that for each such model, there should be an optimal standard deviation of the injected noise.

In Fig. B.3 we compare the effectiveness of the PGD, EPGD, and SmoothAdvPGD attacks as a function of their computational complexity k × M. PGD attacks are the most effective for lower k × M, but converge to the same value as EPGD attacks without smoothing. In contrast, SmoothAdvPGD attacks, both with and without smoothing, are less effective for all values of k × M and converge to a higher accuracy in all cases. This could indicate that EPGD makes better use of the spatial information encoded in the multiple samples, so much so that it produces an attack comparable to PGD with 1/8 of the required iterations. Moreover, smoothing is more effective against EPGD and SmoothAdvPGD, since both use smoothing while constructing the attack. The accuracy under smoothing is better and converges to higher values for all attacks, which further validates the effectiveness of our method.
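As an illustration of this family of attacks, the sketch below implements a PGD-style loop whose gradient at every iteration is averaged over M_backward noisy copies of the current iterate. It is a simplified stand-in for EPGD/SmoothAdvPGD rather than the exact attacks used in Fig. B.3; the step size, iteration count, and function names are assumptions:

```python
# Sketch of a PGD-style attack that averages gradients over several noise
# samples per iteration, in the spirit of the EPGD / SmoothAdvPGD attacks.
import torch
import torch.nn.functional as F

def noise_averaged_pgd(model, x, y, eps=8/255, alpha=2/255, steps=7,
                       m_backward=8, sigma=0.24):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.zeros_like(x_adv)
        for _ in range(m_backward):                  # average over noise draws
            noisy = x_adv + sigma * torch.randn_like(x_adv)
            loss = F.cross_entropy(model(noisy), y)
            grad += torch.autograd.grad(loss, x_adv)[0]
        # (dividing grad by m_backward would not change the sign step below)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()      # ascent step (L_inf)
            x_adv = x + (x_adv - x).clamp(-eps, eps) # project onto the eps-ball
            x_adv = x_adv.clamp(0, 1)                # keep a valid image
    return x_adv.detach()
```

Setting m_backward = 1 recovers a plain noisy PGD step, while larger values average out the randomness of the defense at the cost of proportionally more backward passes, which is exactly the k × M budget plotted in Fig. B.3.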

We evaluated our best-performing model against PGD attacks of different strengths ε, to study how the defense transfers to attacks of different strengths. Results are shown in Fig. B.4.

Fig. B.4: Accuracy of the CNI-W model with prediction smoothing over 8 iterations under the PGD attack with different attack radii (x-axis: attack radius on the 0–255 scale; y-axis: accuracy). The standard deviation is smaller than the line width.

Table B.1: Results of NAttack black-box attacks on our method applied on PNI-W and CNI-W with prediction smoothing: (a) ResNet-20 on CIFAR-10; (b) Wide-ResNet-34 on CIFAR-10. Our Smooth PNI-W and CNI-W methods use M = 1, 4, 8 and σ = 0.24. † denotes our results based on code provided by the authors or our re-implementation.

(a) ResNet-20 on CIFAR-10

    Method                        Accuracy, %
    Adv. training [29]†           33.11
    Smooth PNI-W (our, m = 1)     47.17
    Smooth PNI-W (our, m = 4)     50.29
    Smooth PNI-W (our, m = 8)     50.78
    Smooth CNI-W (our, m = 1)     48.83
    Smooth CNI-W (our, m = 4)     50.98
    Smooth CNI-W (our, m = 8)     51.56

(b) Wide-ResNet-34 on CIFAR-10

    Method                        Accuracy, %
    Adv. training [29]†           43.75
    MART [40]+                    47.02
    Smooth PNI-W (our, m = 1)     56.25
    Smooth PNI-W (our, m = 4)     55.06