Remember the classes of learning
- Supervised vs unsupervised
- Supervised: learn a function mapping x → y, given pairs (x, y)
- Semantic segmentation
- Object detection
- Image captioning
- Classification
- Unsupervised: learn some underlying hidden structure of the data- given just data, no labels
- Clustering
- Dimensionality reduction
- Feature learning
- Density estimation
Discriminative vs Generative (for data x and label y)
- Discriminative: learn p(y|x)
- Can’t handle unreasonable inputs- gives a label distribution for all images
- Functions: assign labels to data (feature learning with labels)
- Generative: learn p(x)
- Requires deep understanding of images
- Can “reject” unreasonable inputs
- Functions: detect outliers, feature learning without labels, sample from p(x) to generate new data
- Conditional generative model- learn p(x|y)
- Assign labels while rejecting outliers
- Generate new data conditioned on input labels
Remember Bayes’ Rule: p(x|y) = p(y|x) p(x) / p(y)
- Well, we can build a conditional generative model p(x|y) if we learn all of these (a discriminative model p(y|x), an unconditional generative model p(x), and p(y))!
- And p(y) is the prior over labels
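A toy numeric sketch of this idea, with made-up probabilities (not from the lecture): build p(x|y) from a discriminative model p(y|x), an unconditional generative model p(x), and the label prior p(y).

```python
import torch

# Hypothetical values for 3 images x and 2 labels y, purely for illustration.
p_y_given_x = torch.tensor([[0.9, 0.1],    # p(y|x): discriminative model
                            [0.2, 0.8],
                            [0.5, 0.5]])
p_x = torch.tensor([0.5, 0.3, 0.2])        # p(x): unconditional generative model
p_y = (p_y_given_x * p_x[:, None]).sum(0)  # p(y) = sum_x p(y|x) p(x): prior over labels

# Bayes' rule: p(x|y) = p(y|x) p(x) / p(y)
p_x_given_y = p_y_given_x * p_x[:, None] / p_y[None, :]

print(p_x_given_y)          # one distribution over images per label
print(p_x_given_y.sum(0))   # each column sums to 1
```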
Taxonomy of generative models
- Explicit density: can compute p(x)
- Tractable density- can compute p(x) exactly
- Approximate density- approximation to p(x)
- Variational
- Markov Chain
- Implicit density- does not explicitly compute p(x), but can sample from it
- Markov Chain
- Direct
Autoregressive models
Tractable density model
Goal: write down p(x) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) … = ∏_i p(x_i | x_1, …, x_{i-1})
- probability of next subpart given all previous subparts
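A minimal sketch of scoring one image under this factorization, assuming a hypothetical `ar_model` that maps the previous subparts to logits over the next one (here, 256 pixel values):

```python
import torch
import torch.nn.functional as F

def log_likelihood(ar_model, x):
    """x: LongTensor of shape (N,) holding one image flattened to pixel values in [0, 255]."""
    total = 0.0
    for i in range(len(x)):
        logits = ar_model(x[:i])                  # condition on all previous subparts x_1..x_{i-1}
        log_probs = F.log_softmax(logits, dim=-1)
        total = total + log_probs[x[i]]           # add log p(x_i | x_<i)
    return total                                   # = log p(x)
```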
PixelRNN
Generates pixels one at a time
Compute a hidden state for each pixel, which depends on the hidden states / RGB values from the pixels to the left and above (LSTM recurrence)
For each pixel- predict red, then blue, then green- softmax over [0, 1, …, 255]
Problem: Slow (N x N image requires 2N - 1 sequential steps)
PixelCNN
- Still start from the corner- but the dependency on previous pixels is modeled using a CNN
Training: still maximize the likelihood of training images
Still generate starting from corner
Faster, but still sequential generation
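A sketch of the masked convolution idea behind PixelCNN, so that each output position only sees pixels above it and to its left (preserving the autoregressive order from the corner). Layer sizes here are placeholders, not the ones from the paper.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv2d whose kernel is masked so outputs depend only on already-generated pixels."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")             # "A": first layer (hide center), "B": later layers
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2 + (mask_type == "B"):] = 0  # zero out center/right of the middle row
        mask[kH // 2 + 1:, :] = 0                          # zero out all rows below the middle
        self.register_buffer("mask", mask[None, None])

    def forward(self, x):
        return nn.functional.conv2d(x, self.weight * self.mask, self.bias,
                                    self.stride, self.padding)

# Toy stack: per-pixel 256-way logits for a grayscale image.
net = nn.Sequential(
    MaskedConv2d("A", 1, 64, kernel_size=7, padding=3),
    nn.ReLU(),
    MaskedConv2d("B", 64, 256, kernel_size=3, padding=1),  # logits over [0, 255]
)
```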
Variational Autoencoders
Regular Autoencoders
- Unsupervised method for learning feature vectors by encoding then decoding, without labels, just raw data
- Basic loss function is MSE- L2 distance between input + reconstructed data
After training, throw away decoder- use encoder for downstream task
These autoencoders learn latent features
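A minimal autoencoder sketch with the MSE reconstruction loss described above; the layer sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(x):                    # x: (batch, 784) raw data, no labels
    z = encoder(x)                    # latent features
    x_hat = decoder(z)                # reconstruction
    loss = ((x_hat - x) ** 2).mean()  # MSE / L2 distance between input and reconstruction
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# After training, throw away `decoder` and reuse `encoder` for a downstream task.
```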
Variational Autoencoders
Probabilistic spin on regular autoencoders
- learn latent features
- Sample from the model to generate new data
Assume training data is generated from an unobserved latent representation z
- Intuition: x is an image, z is latent factors used to generate x
Sampling new data
- Sample x from the conditional p(x|z)
- Sample z from the prior p(z)
Assume a simple prior p(z) - e.g. Gaussian
Represent p(x|z) with a neural network (similar to the decoder from an autoencoder)
Decoder must be probabilistic
- But how? Have the decoder output the mean and (diagonal) covariance of a Gaussian over x, then sample x from that Gaussian
Encoder network
- Input: x (image, flattened to vector); output: mean and diagonal covariance of a Gaussian over the latent code z
Decoder network
- Input: z (vector); output: mean and diagonal covariance of a Gaussian over the data x
Note that both use the diagonal Gaussian trick
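A sketch of what these two networks might look like with the diagonal Gaussian trick (each network outputs a mean and a per-dimension log-variance instead of a single point). The 784 / 20 / 256 sizes are placeholders.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):            # models q(z|x)
    def __init__(self, x_dim=784, z_dim=20, h=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU())
        self.mu = nn.Linear(h, z_dim)
        self.logvar = nn.Linear(h, z_dim)   # log of the diagonal covariance

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):            # models p(x|z)
    def __init__(self, x_dim=784, z_dim=20, h=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU())
        self.mu = nn.Linear(h, x_dim)
        self.logvar = nn.Linear(h, x_dim)

    def forward(self, z):
        h = self.body(z)
        return self.mu(h), self.logvar(h)
```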
Training goal:
- Maximize the variational lower bound: log p(x) >= E_{z ~ q(z|x)}[ log p(x|z) ] - KL( q(z|x) || p(z) )
How to train though?
- Run input through encoder to get a distribution q(z|x) over latent codes
- Encoder output should match the prior p(z)
- This is the second (KL-divergence) term of the lower bound
- Basically we want the output to match the prior we have chosen
- Sample code from encoder output
- Run code through decoder → distribution over data samples
- We want to maximize the likelihood of the original data x under the distribution predicted by the decoder when we feed in a sample z
- Data reconstruction term
- Original input data should be likely under the distribution output from step 4
- Can sample a reconstruction from 4
Basically, we’re putting a limit (through the KL divergence) on the kind of latent variables we predict, while jointly training the decoder network that reconstructs those latent variables back into images (see the sketch below)
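A sketch of one training step following steps 1-5 above, assuming the Encoder/Decoder sketched earlier and a unit Gaussian prior p(z). The reparameterization z = mu + sigma * eps is how the sampling step stays differentiable.

```python
import torch

def vae_loss(encoder, decoder, x):
    mu_z, logvar_z = encoder(x)                          # 1-2. q(z|x), to be matched to the prior
    eps = torch.randn_like(mu_z)
    z = mu_z + torch.exp(0.5 * logvar_z) * eps           # 3. sample a code z from q(z|x)
    mu_x, logvar_x = decoder(z)                          # 4. distribution over data p(x|z)

    # 5. data reconstruction term: log-likelihood of x under the decoder's
    #    diagonal Gaussian (up to an additive constant)
    recon = -0.5 * (logvar_x + (x - mu_x) ** 2 / torch.exp(logvar_x)).sum(dim=1)

    # KL( q(z|x) || N(0, I) ) in closed form for diagonal Gaussians
    kl = 0.5 * (torch.exp(logvar_z) + mu_z ** 2 - 1.0 - logvar_z).sum(dim=1)

    elbo = recon - kl                                    # lower bound on log p(x)
    return -elbo.mean()                                  # minimize the negative ELBO
```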
once trained
Generating new data
- Sample z from the prior p(z)
- Run z through the decoder to get a distribution over data x
- Sample from the distribution in step 2 to generate data
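The same steps as a short sketch, assuming `decoder` is a trained instance of the Decoder sketched earlier (unit Gaussian prior over a 20-dim z).

```python
import torch

with torch.no_grad():
    z = torch.randn(16, 20)                  # 1. sample z from the prior p(z)
    mu_x, logvar_x = decoder(z)              # 2. decoder outputs a distribution over x
    x_new = mu_x + torch.exp(0.5 * logvar_x) * torch.randn_like(mu_x)  # 3. sample new data
    # In practice people often just use mu_x as the generated image.
```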
Editing images after training
Since we enforce a diagonal prior on the distribution of z, the dimensions of z are independent, so each latent variable should encode something different
- Run input data through the encoder to get a distribution over latent codes
- Sample code from encoder output
- Modify some dimensions of sampled code
- Run the modified z through the decoder to get a distribution over data samples
- Sample new data from step 4
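A sketch of these editing steps, assuming trained `encoder`/`decoder` instances from the earlier sketches; which dimension to modify and by how much is arbitrary here.

```python
import torch

x = torch.rand(1, 784)                                        # placeholder input image batch

with torch.no_grad():
    mu_z, logvar_z = encoder(x)                               # 1. distribution over latent codes
    z = mu_z + torch.exp(0.5 * logvar_z) * torch.randn_like(mu_z)  # 2. sample a code
    z_edit = z.clone()
    z_edit[:, 0] += 2.0                                       # 3. modify one latent dimension
    mu_x, logvar_x = decoder(z_edit)                          # 4. distribution over data
    x_edit = mu_x                                             # 5. take the mean as the edited image
```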
Summary of variational autoencoders
- Probabilistic spin on traditional autoencoders to allow for data generation
- Functions: define intractable density → derive + optimize variational lower bound
- Pro
- Principled approach
- Allows inference of q(z|x)- can be useful for other tasks
- Cons
- Maximizes a lower bound of the likelihood- not as good an evaluation as directly optimizing the likelihood
- Blurrier samples than other methods
Generative models summary
- Autoregressive: directly maximize likelihood of training data
- Variational autoencoders: introduce a latent z, maximize a lower bound on the likelihood
- GANs: give up on modeling p(x), but allow sampling from p(x)
GANs
Assume that we have data x_i drawn from p_data(x)- we want to sample from p_data
Idea: introduce a latent variable z with a simple prior p(z)
Sample z ~ p(z) and pass it to a generator network → x = G(z)
Then, x is a sample from the generator distribution p_G.
- Goal: We want p_G = p_data
How?
- Train G to convert z into fake data x sampled from p_G by fooling the discriminator D
- Goal: p_G converges to p_data
Train discriminator to classify data as real or fake
Called the minimax game: min_G max_D ( E_{x ~ p_data}[ log D(x) ] + E_{z ~ p(z)}[ log(1 - D(G(z))) ] )
The above equation just shows that:
- Discriminator is trying to maximize
- The probability that real data is classified as 1
- The probability that fake data is classified as 0
- Generator is trying to minimize
- The probability that fake data is classified as 0 (equivalently, it wants fake data classified as 1)
Train G and D with alternating gradient updates
But how do we look at loss??? No overall loss or training curves
- Pretty challenging to train
Beginning of training
- Generator sucks at the start- D(G(z)) is near 0, where log(1 - D(G(z))) has vanishing gradients
- Solution: train G to maximize log D(G(z)) instead of minimizing log(1 - D(G(z))) (see the sketch below)
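A sketch of the alternating updates with this non-saturating generator loss. The generator/discriminator architectures and sizes are placeholders, and binary cross-entropy on real/fake labels is one standard way to write the two objectives.

```python
import torch
import torch.nn as nn

z_dim = 64
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(x_real):
    n = x_real.size(0)

    # Discriminator step: push real data toward 1 and fake data toward 0.
    z = torch.randn(n, z_dim)
    x_fake = G(z).detach()
    d_loss = bce(D(x_real), torch.ones(n, 1)) + bce(D(x_fake), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step (non-saturating): make D label fakes as 1,
    # i.e. maximize log D(G(z)).
    z = torch.randn(n, z_dim)
    g_loss = bce(D(G(z)), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```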
Some GAN Architectures
Can interpolate between points in latent space
Lots of stuff more recently probably