How do we interpret these vision models?
Visualizing what they have learned
Filters
Can simply visualize the first layer’s filters as small images to see the shapes they respond to
- Filters above the first layer are much more complicated and hard to interpret directly
Final Layer Features
Can do cool stuff with a given image’s final-layer feature vector
For example
- Get L2 neighbors in that feature space
- Visualize the space of feature vectors (using dimensionality reduction)
- Can even plot the images on an x-y grid to see which images the network considers similar
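A minimal sketch of the two ideas above, assuming the final-layer features have already been extracted into a NumPy array (one row per image). The toy `features` array and the helper names `l2_nearest_neighbors` and `pca_2d` are made up for illustration; PCA is just one possible choice of dimensionality reduction.

```python
import numpy as np

def l2_nearest_neighbors(features, query_index, k=3):
    """Return indices of the k rows closest (in L2 distance) to the query row."""
    query = features[query_index]
    # Euclidean distance from the query to every feature vector
    dists = np.linalg.norm(features - query, axis=1)
    dists[query_index] = np.inf  # exclude the query image itself
    return np.argsort(dists)[:k]

def pca_2d(features):
    """Project feature vectors onto their top two principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # (num_images, 2) coordinates for an x-y plot

# Toy stand-in for final-layer features: 6 images, 4-dimensional vectors
features = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 5.0],
    [0.0, 0.2, 0.0, 0.0],
    [5.0, 5.1, 5.0, 5.0],
    [9.0, 9.0, 9.0, 9.0],
])
neighbors = l2_nearest_neighbors(features, query_index=0, k=2)
coords = pca_2d(features)
```

Here the nearest neighbors of image 0 are images 1 and 3, and `coords` gives one 2-D point per image that could be used to place thumbnails on a grid.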
Activations
Understanding input pixels
Important Pixels
Process
- Run images through the network and record the values of a chosen channel
- Visualize image patches which correspond to maximal activations
Saliency via occlusion
- Mask part of an image and see how much predicted probabilities change
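A rough sketch of occlusion saliency, assuming some function that maps an image to the predicted probability of the class of interest. `toy_predict_prob` is a hypothetical stand-in for a real network, and the patch size and fill value are arbitrary choices.

```python
import numpy as np

def occlusion_saliency(image, predict_prob, patch=2, fill=0.0):
    """Drop in class probability when each patch of the image is masked."""
    h, w = image.shape[:2]
    base = predict_prob(image)
    heatmap = np.zeros((h, w))
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            # Large drop means the masked region mattered for the prediction
            heatmap[top:top + patch, left:left + patch] = base - predict_prob(occluded)
    return heatmap

# Hypothetical model: the "probability" depends only on the top-left 2x2 corner
def toy_predict_prob(img):
    return float(img[:2, :2].mean())

image = np.ones((4, 4))
heatmap = occlusion_saliency(image, toy_predict_prob)
```

For this toy model only the top-left patch gets a nonzero value, since masking anything else leaves the prediction unchanged.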
Saliency via backprop
Do a forward pass on an image and compute the gradient of the class score with respect to the image pixels
- Take the absolute value and the max over RGB channels
This generates a saliency map where white pixels are the ones whose change most affects the class score
This can help illuminate biases
- e.g. classifying a husky based on the white snow in the background rather than the dog
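A small sketch of the post-processing step (absolute value, then max over RGB channels). To keep it runnable without an autograd library, it assumes a toy linear score whose gradient is known analytically; with a real network the gradient would come from backprop instead.

```python
import numpy as np

def saliency_map(grad):
    """Collapse a (3, H, W) image gradient into an (H, W) saliency map."""
    return np.abs(grad).max(axis=0)

# Toy model (an assumption): class score = sum(weights * image),
# so the gradient of the score with respect to the image is just `weights`
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 4, 4))
grad = weights
saliency = saliency_map(grad)
```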
Guided backprop to visualize features
Process
- Pick a single intermediate channel
- Compute gradient of neuron value with respect to image pixels
- Illuminates intermediate features
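The notes above do not spell out what makes the backprop "guided". As the technique is usually described, the difference is in the ReLU backward pass: negative upstream gradients are zeroed in addition to the usual masking by the forward input. A sketch under that assumption, with made-up function names:

```python
import numpy as np

def relu_backward(x, grad_out):
    """Standard ReLU backward pass: block gradient where the input was not positive."""
    return grad_out * (x > 0)

def guided_relu_backward(x, grad_out):
    """Guided backprop rule: additionally block negative upstream gradients."""
    return grad_out * (x > 0) * (grad_out > 0)

x = np.array([-1.0, 2.0, 3.0, -4.0])         # ReLU inputs from the forward pass
grad_out = np.array([0.5, -0.7, 0.9, -0.2])  # gradient arriving from the layer above
plain = relu_backward(x, grad_out)
guided = guided_relu_backward(x, grad_out)
```

Only the third entry survives the guided rule, because it is the only position with both a positive input and a positive upstream gradient.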
Gradient ascent to visualize features
Gradient ascent
- Generate synthetic image which maximally activates a neuron
Process
- Initialize image to zeros
- Repeat the following
  - Forward pass to get current score
  - Backprop to get gradient of neuron value with respect to image pixels
  - Make small update (gradient ascent) to the image
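Ascent objective: $I^* = \arg\max_I \; S_c(I) - \lambda \lVert I \rVert_2^2$
- $S_c(I)$: score for class c (before softmax)
- $\lambda \lVert I \rVert_2^2$: simple regularizer (L2 penalty on the image)
Can do cool stuff with “multi-faceted” visualization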
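The gradient ascent loop for class visualization can be sketched as follows. This assumes an L2 penalty as the simple regularizer and a toy linear score so the gradient is available in closed form; `score_grad`, `lam`, and `lr` are illustrative names, and a real network would supply the gradient through backprop.

```python
import numpy as np

def gradient_ascent_image(score_grad, shape, lam=0.1, lr=0.5, steps=200):
    """Maximize score(I) - lam * ||I||^2 by gradient ascent on the image I."""
    image = np.zeros(shape)                         # initialize image to zeros
    for _ in range(steps):
        grad = score_grad(image) - 2 * lam * image  # gradient of the objective
        image = image + lr * grad                   # small ascent step
    return image

# Toy class score S_c(I) = sum(w * I); its gradient with respect to I is w
w = np.array([[1.0, -2.0], [0.5, 0.0]])
image = gradient_ascent_image(lambda img: w, w.shape)
```

For this toy score the optimum is `w / (2 * lam)`, which the loop converges to. The regularizer is what keeps the image from growing without bound.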
Adversarial perturbations
General process
- Pick an arbitrary image
- Pick an arbitrary class
- Modify image to maximize the class score
- Repeat until network is fooled
Very subtle changes!
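A toy version of this loop, assuming a linear classifier in place of a real network so the gradient of the target score is simply one row of the weight matrix. The rows of `W` are made orthonormal purely to keep the toy well behaved; that is not a property of real networks.

```python
import numpy as np

def fool_classifier(image, W, target, lr=0.05, max_steps=1000):
    """Nudge the image until a linear classifier (scores = W @ image) predicts target."""
    adv = image.copy()
    for _ in range(max_steps):
        if (W @ adv).argmax() == target:  # network is fooled, stop
            break
        adv += lr * W[target]             # gradient of the target score w.r.t. the image
    return adv

rng = np.random.default_rng(0)
W = np.linalg.qr(rng.normal(size=(16, 3)))[0].T  # (3, 16) weights, orthonormal rows
image = rng.normal(size=16)                      # arbitrary flattened image
original_class = int((W @ image).argmax())
target = (original_class + 1) % 3                # arbitrary different class
adv = fool_classifier(image, W, target)
```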
Style Transfer
Feature Inversion
Given a CNN feature vector, get a new image which
- matches feature vector
- looks natural
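Basically
- $x^* = \arg\min_x \; \lVert \Phi(x) - \Phi_0 \rVert^2 + \lambda R(x)$
- $\Phi_0$ is the given feature vector; $R(x)$ is a regularizer that keeps the image natural-looking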
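A minimal sketch of feature inversion by gradient descent. It assumes a linear feature extractor `A @ x` in place of a CNN and an L2 penalty as a crude stand-in for "looks natural"; both are simplifications chosen so the gradient can be written by hand.

```python
import numpy as np

def inversion_loss(x, A, target, lam):
    """Feature-matching term plus an L2 penalty on the image."""
    return np.sum((A @ x - target) ** 2) + lam * np.sum(x ** 2)

def invert_features(A, target, lam=0.01, steps=500):
    """Gradient descent on the inversion loss for linear features phi(x) = A @ x."""
    x = np.zeros(A.shape[1])
    # Step size 1/L, where L bounds the curvature of this quadratic loss
    lr = 1.0 / (2 * (np.linalg.norm(A, 2) ** 2 + lam))
    for _ in range(steps):
        grad = 2 * A.T @ (A @ x - target) + 2 * lam * x
        x -= lr * grad
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 4))   # hypothetical linear "network"
x_true = rng.normal(size=4)   # image whose features we are given
target = A @ x_true           # the given feature vector
x_rec = invert_features(A, target)
```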
Deep dream
Instead of synthesizing an image to maximize a specific neuron, amplify neuron activations at some layer in the network
Basic process
- Choose image + layer in CNN
- Repeat
  - Compute layer’s activations
  - Set gradient of layer equal to its activation
  - Backprop to compute gradient on image
  - Update image
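One DeepDream update can be sketched for a toy single layer with activations `relu(W @ x)`. Setting the layer's gradient equal to its activation is the same as ascending half the squared norm of the activations. The layer, step size, and starting image here are assumptions for illustration only.

```python
import numpy as np

def deepdream_step(x, W, lr=0.01):
    """One DeepDream update for a toy layer with activations relu(W @ x)."""
    act = np.maximum(W @ x, 0.0)  # compute the layer's activations
    grad_act = act                # set the layer's gradient equal to its activation
    grad_x = W.T @ grad_act       # backprop through the layer to the image
    return x + lr * grad_x        # update the image

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))
x = W[0].copy()  # starting image, chosen so that at least unit 0 is active
before = np.sum(np.maximum(W @ x, 0.0) ** 2)
for _ in range(10):
    x = deepdream_step(x, W)
after = np.sum(np.maximum(W @ x, 0.0) ** 2)
```

Whatever the layer already responds to gets amplified, so `after` ends up larger than `before`.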
Texture Synthesis
Goal: patch of texture → bigger image of same texture
Couple of methods
Nearest neighbor
- Typical nearest neighbors
- Generate pixels one at a time in scanline order: form a neighborhood of already generated pixels and copy the nearest neighbor from the input
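A brute-force sketch of the scanline procedure for a grayscale texture. The four-pixel causal neighborhood (`OFFSETS`) and the squared-difference cost are assumptions; practical implementations use larger windows and faster search.

```python
import numpy as np

# Causal neighborhood: three pixels in the row above plus the pixel to the left
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]

def synthesize_texture(sample, out_h, out_w):
    """Fill the output in scanline order by copying the best-matching sample pixel."""
    sh, sw = sample.shape
    out = np.zeros((out_h, out_w), dtype=sample.dtype)
    for y in range(out_h):
        for x in range(out_w):
            # Offsets that land inside the output, hence on already generated pixels
            valid = [(dy, dx) for dy, dx in OFFSETS
                     if 0 <= y + dy and 0 <= x + dx < out_w]
            best, best_cost = sample[0, 0], np.inf
            for sy in range(sh):
                for sx in range(sw):
                    # Candidate must have the same neighbors available in the sample
                    if not all(0 <= sy + dy and 0 <= sx + dx < sw for dy, dx in valid):
                        continue
                    cost = sum((out[y + dy, x + dx] - sample[sy + dy, sx + dx]) ** 2
                               for dy, dx in valid)
                    if cost < best_cost:
                        best, best_cost = sample[sy, sx], cost
            out[y, x] = best
    return out

sample = np.tile(np.array([0.0, 1.0]), (4, 2))  # 4x4 patch of vertical stripes
bigger = synthesize_texture(sample, 6, 6)
```

On the striped patch, the 6x6 output continues the same vertical stripes.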
Neural Texture Synthesis: Gram Matrix
- Each layer of CNN gives
  - C x H x W tensor of features
  - Equal to: H x W grid of C-dimensional vectors
- From outer product of two C-dimensional vectors, get C x C matrix measuring co-occurrence
- Average over all HW pairs of vectors, gives
  - Gram matrix of shape C x C
Process
- Pretrain CNN
- Run input texture forward through CNN, record activations
- At each layer compute Gram matrix
- Initialize generated image from random noise
- Pass generated image through CNN, compute Gram matrix on each layer
  - Shape is C x C (C = number of channels at that layer)
- Compute loss
  - Weighted sum of L2 distance between Gram matrices
- Backprop to get gradient on image
- Gradient step on image
- Repeat from the forward pass of the generated image
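The Gram matrix and the loss from the process above can be sketched in NumPy. The feature tensors here are random stand-ins for CNN activations, and the layer weights are arbitrary; the backprop and image update steps are left out because they need an autograd framework.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature tensor, averaged over the H*W positions."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)  # each column is one C-dimensional vector
    return flat @ flat.T / (h * w)     # average of the HW outer products

def texture_loss(target_feats, generated_feats, layer_weights):
    """Weighted sum over layers of squared L2 distance between Gram matrices."""
    return sum(
        wt * np.sum((gram_matrix(t) - gram_matrix(g)) ** 2)
        for wt, t, g in zip(layer_weights, target_feats, generated_feats)
    )

rng = np.random.default_rng(0)
# Hypothetical activations at two layers for the texture and for the generated image
texture_feats = [rng.normal(size=(3, 4, 4)), rng.normal(size=(5, 2, 2))]
noise_feats = [rng.normal(size=(3, 4, 4)), rng.normal(size=(5, 2, 2))]
G = gram_matrix(texture_feats[0])
loss_same = texture_loss(texture_feats, texture_feats, [1.0, 0.5])
loss_diff = texture_loss(texture_feats, noise_feats, [1.0, 0.5])
```

Because the Gram matrix sums over positions, it discards spatial layout and keeps only which channels fire together, which is why it works as a texture descriptor.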
Neural Style Transfer
Feature + Gram reconstruction
Basic idea: content image + style image → stylized image (i.e. style transfer)
TODO review
Cons
- Many forward / backward passes
Solution: fast style transfer
- Train another neural network to perform style transfer for us
Quick review
- Lots of ways to understand CNN’s representations
- Activations
  - Nearest neighbors
  - Dimensionality reduction
  - Maximal patches
  - Occlusion
- Gradients
  - Saliency maps
  - Class visualization
  - Fooling images
  - Feature inversion
- Fun stuff
  - DeepDream: amplify neuron activations at some layer in the network
  - Style transfer: uses Gram matrices