PyTorch
TODO good tutorials
3 levels of abstraction
Barebones: low-level PyTorch Tensors
PyTorch Module API: nn.Module, to define arbitrary neural network architectures
PyTorch Sequential API: nn.Sequential, for stacking layers in sequence
1: Barebones
PyTorch has high-level APIs, but for now let's just use barebones PyTorch elements to understand autograd.
First, let's make a simple two-layer fully-connected ReLU network (one hidden layer) with no biases, and use PyTorch autograd to compute gradients.
Flatten function
A Tensor is like a numpy array.
Image data is typically stored in Tensors with dimensions N x C x H x W, where:
N: number of datapoints
C: number of channels
H: pixel height of the intermediate feature map
W: pixel width of the intermediate feature map
view is analogous to reshape in numpy.
def flatten(x):
    N = x.shape[0]  # read in N, C, H, W
    return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image
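A minimal usage sketch (the shapes below are made up just to show what flatten does):
import torch

x = torch.arange(24).view(2, 3, 2, 2)  # pretend minibatch: N=2, C=3, H=2, W=2
print(flatten(x).shape)                # torch.Size([2, 12]) -- one row of C*H*W values per image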
Barebones PyTorch: Two-Layer Network
It's important to understand this implementation.
import torch
import torch.nn.functional as F  # useful stateless functions

def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully connected -> ReLU -> fully connected layer.
    Note that this function only defines the forward pass;
    PyTorch will take care of the backward pass for us.

    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).

    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    # first we flatten the image
    x = flatten(x)  # shape: [batch_size, C x H x W]

    w1, w2 = params

    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    # You can also use `.clamp(min=0)`, equivalent to F.relu()
    x = F.relu(x.mm(w1))
    x = x.mm(w2)
    return x
def two_layer_fc_test():
    hidden_layer_size = 42
    # dtype (e.g. torch.float32) is assumed to be defined earlier in the notebook setup
    x = torch.zeros((64, 50), dtype=dtype)  # minibatch size 64, feature dimension 50
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)
    scores = two_layer_fc(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]

two_layer_fc_test()
# torch.Size([64, 10])
Implement a 3-layer ConvNet with the following specification using nn.functional
Architecture
Convolutional layer (with bias) with channel_1 filters, each with shape KH1 x KW1, and zero-padding of two
ReLU
Convolutional layer (with bias) with channel_2 filters, each with shape KH2 x KW2, and zero-padding of one
ReLU
Fully-connected layer with bias, producing scores for C classes.
def three_layer_convnet(x, params):
    """
    Performs the forward pass of a three-layer convolutional network with the
    architecture defined above.

    Inputs:
    - x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
    - params: A list of PyTorch Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
        for the first convolutional layer
      - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
        convolutional layer
      - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
        weights for the second convolutional layer
      - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
        convolutional layer
      - fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
        figure out what the shape should be?
      - fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
        figure out what the shape should be?

    Returns:
    - scores: PyTorch Tensor of shape (N, C) giving classification scores for x
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
Answer
def three_layer_convnet(x, params):
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    x = F.relu(F.conv2d(x, conv_w1, conv_b1, padding=2))
    x = F.relu(F.conv2d(x, conv_w2, conv_b2, padding=1))
    scores = flatten(x).mm(fc_w) + fc_b
    return scores
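A quick shape sanity check, in the same spirit as two_layer_fc_test. The filter counts 6 and 9 below are just illustrative choices, not part of the spec, and dtype is assumed to be defined as in the notebook setup:
def three_layer_convnet_test():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch of 64 CIFAR-sized images

    conv_w1 = torch.zeros((6, 3, 5, 5), dtype=dtype)   # 6 filters, each 3x5x5
    conv_b1 = torch.zeros((6,), dtype=dtype)
    conv_w2 = torch.zeros((9, 6, 3, 3), dtype=dtype)   # 9 filters, each 6x3x3
    conv_b2 = torch.zeros((9,), dtype=dtype)

    # with zero-padding of 2 and 1 the spatial size stays 32x32,
    # so the FC layer sees 9 * 32 * 32 features per image
    fc_w = torch.zeros((9 * 32 * 32, 10), dtype=dtype)
    fc_b = torch.zeros(10, dtype=dtype)

    scores = three_layer_convnet(x, [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b])
    print(scores.size())  # you should see [64, 10]

three_layer_convnet_test()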
Barebones PyTorch: Initialization
Understand it
Separate functions for
Initializing a weight tensor with the Kaiming normalization method
Initializing a weight tensor with all zeros
import numpy as np

def random_weight(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Kaiming normalization: sqrt(2 / fan_in)
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        fan_in = np.prod(shape[1:])  # conv weight [out_channel, in_channel, kH, kW]
    # randn is a standard normal distribution generator.
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w

def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)
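For example (device and dtype are assumed to be defined in the notebook setup; the shape is arbitrary):
w = random_weight((3, 5))          # a hypothetical 3x5 FC weight
print(w.shape, w.requires_grad)    # torch.Size([3, 5]) True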
Barebones PyTorch: Training Loop
Understand it
Uses torch.nn.functional.cross_entropy to compute the loss
def train_part2(model_fn, params, learning_rate):
    """
    Train a model on CIFAR-10.

    Inputs:
    - model_fn: A Python function that performs the forward pass of the model.
      It should have the signature scores = model_fn(x, params) where x is a
      PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
      model weights, and scores is a PyTorch Tensor of shape (N, C) giving
      scores for the elements in x.
    - params: List of PyTorch Tensors giving weights for the model
    - learning_rate: Python scalar giving the learning rate to use for SGD

    Returns: Nothing
    """
    for t, (x, y) in enumerate(loader_train):
        # Move the data to the proper device (GPU or CPU)
        x = x.to(device=device, dtype=dtype)
        y = y.to(device=device, dtype=torch.long)

        # Forward pass: compute scores and loss
        scores = model_fn(x, params)
        loss = F.cross_entropy(scores, y)

        # Backward pass: PyTorch figures out which Tensors in the computational
        # graph have requires_grad=True and uses backpropagation to compute the
        # gradient of the loss with respect to these Tensors, and stores the
        # gradients in the .grad attribute of each Tensor.
        loss.backward()

        # Update parameters. We don't want to backpropagate through the
        # parameter updates, so we scope the updates under a torch.no_grad()
        # context manager to prevent a computational graph from being built.
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad

                # Manually zero the gradients after running the backward pass
                w.grad.zero_()

        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss.item()))
            check_accuracy_part2(loader_val, model_fn, params)
            print()
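A usage sketch for the two-layer net, assuming loader_train, loader_val, device, dtype, print_every, and check_accuracy_part2 come from the notebook setup (the hidden size and learning rate are just reasonable illustrative values):
hidden_layer_size = 4000
learning_rate = 1e-2

w1 = random_weight((3 * 32 * 32, hidden_layer_size))
w2 = random_weight((hidden_layer_size, 10))

train_part2(two_layer_fc, [w1, w2], learning_rate)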
2: PyTorch Module API
Now let's use nn.Module
torch.nn docs
A few steps to use it
Subclass nn.Module, with an intuitive name like TwoLayerFC
In the constructor __init__(), define all the layers you need as class attributes. Note: layer objects like nn.Conv2d are themselves nn.Module subclasses.
In the forward() method, define the connectivity of your network. Use the attributes defined in __init__ as function calls that take tensors as input and output transformed tensors. Don't create any new layers with learnable parameters in forward()! Leave that for __init__().
Example 2-layer network
import torch.nn as nn

class TwoLayerFC(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # assign layer objects to class attributes
        self.fc1 = nn.Linear(input_size, hidden_size)
        # nn.init package contains convenient initialization methods
        # http://pytorch.org/docs/master/nn.html#torch-nn-init
        nn.init.kaiming_normal_(self.fc1.weight)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        nn.init.kaiming_normal_(self.fc2.weight)

    def forward(self, x):
        # forward always defines connectivity
        x = flatten(x)
        scores = self.fc2(F.relu(self.fc1(x)))
        return scores

def test_TwoLayerFC():
    input_size = 50
    x = torch.zeros((64, input_size), dtype=dtype)  # minibatch size 64, feature dimension 50
    model = TwoLayerFC(input_size, 42, 10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]

test_TwoLayerFC()
Implement a three-layer ConvNet according to these specs
Architecture
Convolutional layer with channel_1 5x5 filters with zero-padding of 2
ReLU
Convolutional layer with channel_2 3x3 filters with zero-padding of 1
ReLU
Fully-connected layer to num_classes classes
Initialize with the Kaiming normal initialization method
class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
        pass  # define layers here

    def forward(self, x):
        pass  # define connectivity here
Answer
class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channel, channel_1, 5, padding=2)
        nn.init.kaiming_normal_(self.conv1.weight)
        self.conv2 = nn.Conv2d(channel_1, channel_2, 3, padding=1)
        nn.init.kaiming_normal_(self.conv2.weight)
        self.fc = nn.Linear(channel_2 * 32 * 32, num_classes)
        nn.init.kaiming_normal_(self.fc.weight)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        scores = self.fc(flatten(x))
        return scores
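A quick shape check in the same style as test_TwoLayerFC (the channel counts 12 and 8 are just illustrative; dtype is assumed to be defined as before):
def test_ThreeLayerConvNet():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch of 64 CIFAR-sized images
    model = ThreeLayerConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]

test_ThreeLayerConvNet()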
Module API: how to check accuracy
def check_accuracy_part34(loader, model):
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))
Module API: how to run a training loop
Rather than updating the values of the weights ourselves, we use an Optimizer object from the torch.optim package, which abstracts the notion of an optimization algorithm and provides implementations of most of the algorithms commonly used to optimize neural networks.
def train_part34(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.

    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for

    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backwards pass: compute the gradient of the loss with
            # respect to each parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)
                print()
Module API: Training Loop
Now training just looks like this:
import torch.optim as optim

hidden_layer_size = 4000
learning_rate = 1e-2
model = TwoLayerFC(3 * 32 * 32, hidden_layer_size, 10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
train_part34(model, optimizer)
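The same pattern trains the ThreeLayerConvNet defined above; the learning rate below is just an illustrative choice:
learning_rate = 3e-3
channel_1 = 32
channel_2 = 16

model = ThreeLayerConvNet(in_channel=3, channel_1=channel_1, channel_2=channel_2, num_classes=10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

train_part34(model, optimizer)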
3: Sequential API
This API merges the three steps of the Module API (subclass, __init__, forward) into a single declaration.
Understand the Sequential API: defining a 2-layer fully-connected network
Also training it
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)

hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=0.9, nesterov=True)

train_part34(model, optimizer)
Build a 3-layer ConvNet according to these specs with nn.Sequential
Architecture
Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
ReLU
Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
ReLU
Fully-connected layer (with bias) to compute scores for 10 classes
Default weight initialization
Optimize with SGD with Nesterov momentum 0.9
channel_1 = 32
channel_2 = 16
learning_rate = 1e-2

model = None
optimizer = None
# Your code

train_part34(model, optimizer)
Answer
Use nn.Sequential to construct the architecture (reusing the Flatten() module defined above).
Use optim.SGD to construct the optimizer.
channel_1 = 32
channel_2 = 16
learning_rate = 1e-2

model = nn.Sequential(
    nn.Conv2d(3, channel_1, 5, padding=2),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, 3, padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2 * 32 * 32, 10),
)

optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=0.9, nesterov=True)

train_part34(model, optimizer)