
I recently read an intriguing paper by Ilyas et al., titled “Adversarial Examples Are Not Bugs, They Are Features”, that proposes a radically different way to view adversarial examples 1.

The authors propose the existence of so-called “robust” and “non-robust” features in the images used for training image classifiers. Robust features can be thought of as features that humans naturally use for classification, e.g. flappy ears are indicative of certain breeds of dogs, while black and white stripes are indicative of zebras. Non-robust features, on the other hand, are features that humans aren’t sensitive to 2, but that are truly indicative of a particular class (i.e. they correlate with the class across both the train and test sets). The authors argue that adversarial examples are produced by replacing the non-robust features in an image with the non-robust features of another class.
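To make "features that humans aren’t sensitive to" a little more concrete: adversarial perturbations are usually restricted to a small set, such as an L2 ball of radius epsilon around the original image. The sketch below shows only the projection step that keeps a perturbation inside that ball. The function name `project_l2`, the random stand-in data, and the value of `eps` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def project_l2(delta, eps):
    """Scale `delta` back onto the L2 ball of radius `eps` if it lies outside."""
    norm = np.linalg.norm(delta)
    if norm <= eps:
        return delta
    return delta * (eps / norm)

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(3, 8, 8))   # stand-in for a real image
delta = rng.normal(size=image.shape)            # stand-in for an attack's update step
delta = project_l2(delta, eps=0.5)              # eps=0.5 is an arbitrary example value
adversarial = np.clip(image + delta, 0.0, 1.0)  # keep pixel values in a valid range
```

A real attack would alternate a gradient step on the classifier's loss with this projection; only the constraint is shown here.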

I highly recommend reading the paper or, at the very least, the accompanying blog post.

One figure in the paper that particularly struck me was the following graph, which shows the correlation between the transferability of adversarial examples and the ability to learn similar non-robust features.


One way to interpret this graph is that it shows how well a particular architecture is able to capture non-robust features in an image. 3

Notice how far back VGG is compared to the other models.

In the unrelated field of neural style transfer, VGG is also quite special since non-VGG architectures are known to not work very well 4 without some sort of parameterization trick. The above interpretation of the graph provides an alternative explanation for this phenomenon. Since VGG is unable to capture non-robust features as well as other architectures, the outputs for style transfer actually look more correct to humans! 5

Before proceeding, let’s quickly discuss the results obtained by Mordvintsev et al. in Differentiable Image Parameterizations, where they show that non-VGG architectures can be used for style transfer with a simple technique. In their experiment, instead of optimizing the output image in RGB space, they optimize it in Fourier space, and run the image through a series of transformations (e.g. jitter, rotation, scaling) before passing it through the neural network.

Style transfer on non-VGG architectures via decorrelated parameterization and transformation robustness. From Differentiable Image Parameterizations by Mordvintsev et al.
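The two ingredients described above can be sketched in a few lines of NumPy. This is a simplified illustration rather than the authors' implementation: it handles a single channel, uses an assumed 1/f scaling of the spectrum, and shows only jitter (not rotation or scaling). The names `fourier_to_image` and `random_jitter` are hypothetical.

```python
import numpy as np

def fourier_to_image(spectrum, height, width):
    """Map learnable Fourier coefficients to a single-channel image."""
    # Frequency magnitude for each coefficient of a real 2-D FFT.
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.rfftfreq(width)[None, :]
    freq = np.sqrt(fx ** 2 + fy ** 2)
    # Assumed 1/f scaling that damps high frequencies; the floor avoids
    # dividing by zero at the DC term.
    scale = 1.0 / np.maximum(freq, 1.0 / max(height, width))
    return np.fft.irfft2(spectrum * scale, s=(height, width))

def random_jitter(image, max_shift, rng):
    """Shift the image by a random offset, wrapping around the borders."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))

rng = np.random.default_rng(0)
h, w = 32, 32
coeff_shape = (h, w // 2 + 1)
spectrum = rng.normal(size=coeff_shape) + 1j * rng.normal(size=coeff_shape)

image = fourier_to_image(spectrum, h, w)       # the optimizer would update `spectrum`
network_input = random_jitter(image, 4, rng)   # the network would see this version
```

In an actual optimization loop, gradients would flow back through both functions to `spectrum`, and a fresh random transformation would be sampled at every step.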

Can we reconcile this result with our hypothesis linking neural style transfer and non-robust features?

One possible theory is that all of these image transformations weaken or even destroy non-robust features. Since the optimization can no longer reliably manipulate non-robust features to bring down the loss, it is forced to use robust features instead, which are presumably more resistant to the applied image transformations (a rotated and jittered flappy ear still looks like a flappy ear).

Testing this hypothesis is fairly straightforward: use an adversarially robust classifier for (regular) neural style transfer and see what happens.

A quick experiment

Fortunately, Engstrom et al. open-sourced their code and model weights for a robust ResNet-50, saving me the trouble of having to train my own. I compared how a regularly trained (non-robust) ResNet-50 and a robustly trained ResNet-50 perform on Gatys et al.’s original neural style transfer algorithm. For comparison, I also performed the style transfer with a regular VGG-19.
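As a reminder of what the style transfer objective looks like, here is a minimal NumPy sketch of Gram-matrix style and content losses in the spirit of Gatys et al. It is not the code from my notebook: the feature maps are random stand-ins for activations from a pretrained classifier, and the normalization constants and `style_weight` value are assumptions chosen for readability.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a (C, H, W) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def style_loss(output_feats, style_feats):
    """Mean squared difference between Gram matrices, summed over layers."""
    return sum(
        np.mean((gram_matrix(o) - gram_matrix(s)) ** 2)
        for o, s in zip(output_feats, style_feats)
    )

def content_loss(output_feat, content_feat):
    """Mean squared difference between feature maps of one layer."""
    return np.mean((output_feat - content_feat) ** 2)

rng = np.random.default_rng(0)
shapes = [(4, 16, 16), (8, 8, 8)]                    # two hypothetical layers
style_feats = [rng.normal(size=s) for s in shapes]   # features of the style image
output_feats = [rng.normal(size=s) for s in shapes]  # features of the output image
content_feat = rng.normal(size=shapes[1])            # features of the content image

style_weight = 1e3  # hypothetical value; tuned per network in practice
total = content_loss(output_feats[1], content_feat) + style_weight * style_loss(
    output_feats, style_feats
)
```

Because both losses are computed purely from the classifier's features, swapping the classifier's weights changes what the optimization is rewarded for, while the surrounding code stays the same.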

My experiments can be fully reproduced inside this Colab notebook. To ensure a fair comparison despite the different networks having different optimal hyperparameters, I performed a small grid search for each image and manually picked the best output per network. Further details are in footnote 6.

The results of the experiment can be explored in the diagram below.

[Interactive diagram: content image, style image, and the style transfer outputs.]

Success! The robust ResNet shows drastic improvement over the regular ResNet. Remember, all we did was switch the ResNet’s weights; the rest of the code for performing style transfer is exactly the same!

A more interesting comparison can be done between VGG-19 and the robust ResNet. At first glance, the robust ResNet’s outputs seem on par with VGG-19. Looking closer, however, the ResNet’s outputs seem slightly noisier and exhibit some artifacts 7.

A comparison of artifacts between textures synthesized with VGG (mild artifacts) and with the robust ResNet (severe artifacts). This diagram was repurposed from Deconvolution and Checkerboard Artifacts by Odena et al.

It is currently unclear exactly what causes these artifacts. One theory is that they are checkerboard artifacts (Odena et al.) caused by a kernel size that is not divisible by the stride in the convolution layers. They could also be artifacts caused by the presence of max pooling layers (Henaff et al.). Whatever the case, these artifacts, while problematic, seem largely distinct from the problem that adversarial robustness solves in neural style transfer.
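The kernel-size and stride explanation can be illustrated along one axis, following the intuition in Odena et al.: when kernel windows are placed `stride` apart, each position is covered by a number of windows, and that count is uneven whenever the kernel size is not a multiple of the stride. The helper `overlap_counts` below is a hypothetical illustration, not code from either paper.

```python
import numpy as np

def overlap_counts(num_inputs, kernel_size, stride):
    """Count how many kernel windows touch each position along one axis."""
    length = (num_inputs - 1) * stride + kernel_size
    counts = np.zeros(length, dtype=int)
    for i in range(num_inputs):
        counts[i * stride : i * stride + kernel_size] += 1
    return counts

# Kernel size 3 is not divisible by stride 2: coverage alternates 2, 1, 2, 1, ...
uneven = overlap_counts(num_inputs=6, kernel_size=3, stride=2)
print(uneven)  # [1 1 2 1 2 1 2 1 2 1 2 1 1]

# Kernel size 4 is divisible by stride 2: coverage is uniform away from the edges.
even = overlap_counts(num_inputs=6, kernel_size=4, stride=2)
print(even)    # [1 1 2 2 2 2 2 2 2 2 2 2 1 1]
```

In two dimensions the uneven counts along each axis multiply, which is what produces the checkerboard look. The same overlap pattern appears in the gradient that a strided convolution passes back to its input, which is the quantity an image optimization like style transfer relies on.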

VGG remains a mystery

Although this experiment started because of an observation about a special characteristic of VGG nets, it did not provide an explanation for this phenomenon. Indeed, if we are to accept the theory that adversarial robustness is the reason VGG works out of the box with neural style transfer, surely we’d find some indication in existing literature that VGG is naturally more robust than other architectures.

Unfortunately, I could not find anything supporting this.

If anything, I found evidence that AlexNet is actually above VGG in terms of “natural robustness” (Table 5 in Galloway et al., Figure 3 in Hendrycks et al.).

Perhaps adversarial robustness just happens to fix or cover up the true reason non-VGG architectures fail at style transfer (or other similar algorithms 8). In other words, adversarial robustness may be a sufficient but not a necessary condition for good style transfer. Whatever it is, I think further examination of VGG is a very interesting direction for future work.

Future work

Admittedly, my little experiment probably raises a lot more questions than it answers. Aside from figuring out VGG’s mysteries, here are a few other ideas for future work:

  • Figure out the cause of the robust ResNet artifacts and attempt to fix them. This Medium post by Sahil Singla shows a few good techniques. Adjusting the stride value so it can cleanly divide the kernel size might eliminate checkerboard artifacts. Replacing max pooling layers with average pooling layers might also help reduce artifacts. One can also try the techniques from Differentiable Image Parameterizations and apply image transformations and a decorrelated parameterization in conjunction with robustness.
  • Experiment with hyperparameters, particularly the layers used for style and content. I stuck with the same set of layers for ResNet and did not do a lot of exploration in this area.
  • To my knowledge, the robust ResNet I used from Engstrom et al. was trained on a restricted set of ImageNet with only 9 classes. It would be interesting to see if a robust classifier trained on the full ImageNet dataset would produce better outputs.
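As a small illustration of why the pooling swap mentioned in the first item could matter for image optimization, the sketch below compares how a single 2x2 pooling window passes gradient back to its inputs. Max pooling routes all of the gradient to one pixel, while average pooling spreads it evenly. Whether this accounts for the artifacts seen here is an untested assumption, and the helper names are hypothetical.

```python
import numpy as np

def max_pool_grad(window):
    """Gradient of max(window) w.r.t. each entry: one-hot at the maximum."""
    grad = np.zeros_like(window, dtype=float)
    grad[np.unravel_index(np.argmax(window), window.shape)] = 1.0
    return grad

def avg_pool_grad(window):
    """Gradient of mean(window) w.r.t. each entry: the same value everywhere."""
    return np.full(window.shape, 1.0 / window.size)

window = np.array([[0.1, 0.9],
                   [0.4, 0.3]])

print(max_pool_grad(window))  # [[0. 1.] [0. 0.]]
print(avg_pool_grad(window))  # [[0.25 0.25] [0.25 0.25]]
```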

If you’d like to build on this experiment, all the code is available in this Colab notebook.


This post was mostly inspired by this series of papers by Ilyas et al., Engstrom et al., and Santurkar et al., and built on top of their open-sourced code and model weights. The diagram comparing artifacts was repurposed from Odena et al.’s Deconvolution and Checkerboard Artifacts. Chris Olah pointed out that feature visualization works well on VGG without priors or regularization. All experiments were performed on Google Colaboratory.


If you found this work useful, please cite it as:

title={Neural Style Transfer with Adversarially Robust Classifiers}, 
author={Reiichiro Nakano}, 
  1. Adversarial examples are inputs that are specially crafted by an attacker to trick a classifier into producing an incorrect label for that input. There is an entire field of research dedicated to adversarial attacks and defenses in deep learning literature. 

  2. This is usually defined as being in some pre-defined perturbation set such as an L2 ball. Humans don’t notice individual pixels changing within some pre-defined epsilon, so any perturbations within this set can be used to create an adversarial example. 

  3. Since the non-robust features here are defined as the ones ResNet-50 captures, what this graph really shows is how well each architecture captures ResNet-50’s non-robust features.

  4. This phenomenon is discussed at length in this Reddit thread.

  5. To follow this argument, note that the perceptual losses used in neural style transfer are dependent on matching features learned by a separately trained image classifier. If these learned features don’t make sense to humans (non-robust features), the outputs for neural style transfer won’t make sense either. 

  6. L-BFGS was used for optimization as it converged faster than Adam. For ResNet-50, the style layers used were the ReLU outputs after each of the 4 residual blocks, while the content layer used was . For VGG-19, style layers were used with a content layer . In VGG-19, max pooling layers were replaced with average pooling layers, as in the original paper by Gatys et al. 

  7. This is more obvious when the output image is initialized not with the content image, but with Gaussian noise. 

  8. In fact, neural style transfer is not the only pretrained classifier-based iterative image optimization technique that magically works better with adversarial robustness. In a more recent paper, Engstrom et al. show that feature visualization via activation maximization works on robust classifiers without enforcing any of the priors or regularization (e.g. image transformations and decorrelated parameterization) used by previous work. In a recent chat, Chris Olah pointed out that these feature visualization techniques actually work well on VGG without these priors, just like style transfer!