PyTorch

TODO good tutorials

3 levels of abstraction

  1. Barebones: low-level PyTorch Tensors
  2. PyTorch Module API: nn.Module to define arbitrary neural network architectures
  3. PyTorch Sequential API: nn.Sequential

1: Barebones

PyTorch has high-level APIs, but for now let’s use barebones PyTorch elements to understand autograd.

First, let’s make a simple two-layer fully-connected ReLU network with no biases, and use PyTorch autograd to compute gradients.
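
The snippets below assume some shared notebook setup (imports, a dtype, a device, data loaders). Here is a minimal sketch of those assumed globals; the loader and print_every values are placeholders, not part of the original notes:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

dtype = torch.float32  # we use float throughout
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print_every = 100      # how often the training loops print the loss

# loader_train and loader_val are assumed to be DataLoaders over CIFAR-10
# (e.g. built with torchvision.datasets.CIFAR10); they are not defined here.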

Flatten function

A PyTorch Tensor is like a numpy array.

Image data is typically stored in Tensors with dimensions N x C x H x W

  • N: no. of datapoints
  • C: no. of channels
  • H: pixel height of the intermediate feature map
  • W: pixel width of the intermediate feature map

view is analogous to reshape in numpy

def flatten(x):
    N = x.shape[0] # read in N, C, H, W
    return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image
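
A quick sanity check of flatten (illustrative shapes, not from the original notes):

x = torch.arange(12).view(2, 1, 3, 2)  # shape (N=2, C=1, H=3, W=2)
print(flatten(x).shape)                # torch.Size([2, 6])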

Barebones PyTorch: Two-Layer Network

Important to understand this implementation

import torch.nn.functional as F  # useful stateless functions
 
def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully-connected layer -> ReLU -> fully-connected layer.
    Note that this function only defines the forward pass; 
    PyTorch will take care of the backward pass for us.
    
    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.
    
    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).
    
    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    # first we flatten the image
    x = flatten(x)  # shape: [batch_size, C x H x W]
    
    w1, w2 = params
    
    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    # you can also use `.clamp(min=0)`, equivalent to F.relu()
    x = F.relu(x.mm(w1))
    x = x.mm(w2)
    return x
    
 
def two_layer_fc_test():
    hidden_layer_size = 42
    x = torch.zeros((64, 50), dtype=dtype)  # minibatch size 64, feature dimension 50
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)
    scores = two_layer_fc(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]
 
two_layer_fc_test()
     
# torch.Size([64, 10])
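
To see autograd in action with this network, here is a sketch (random weights and a dummy scalar loss, purely illustrative): calling .backward() on the loss fills in the .grad attribute of every Tensor that has requires_grad=True.

x = torch.randn(64, 50, dtype=dtype)
w1 = torch.randn(50, 42, dtype=dtype, requires_grad=True)
w2 = torch.randn(42, 10, dtype=dtype, requires_grad=True)

scores = two_layer_fc(x, [w1, w2])
loss = scores.sum()   # any scalar works for a demonstration
loss.backward()       # autograd populates w1.grad and w2.grad

print(w1.grad.shape)  # torch.Size([50, 42])
print(w2.grad.shape)  # torch.Size([42, 10])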

Implement a three-layer ConvNet with the following specification using torch.nn.functional

Architecture

  1. Convolutional layer (with bias) with channel_1 filters, each with shape KH1 x KW1, and zero-padding of 2
  2. ReLU
  3. Convolutional layer (with bias) with channel_2 filters, each with shape KH2 x KW2, and zero-padding of 1
  4. ReLU
  5. Fully-connected layer (with bias), producing scores for C classes.

def three_layer_convnet(x, params):
    """
    Performs the forward pass of a three-layer convolutional network with the
    architecture defined above.
 
    Inputs:
    - x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
    - params: A list of PyTorch Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
        for the first convolutional layer
      - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
        convolutional layer
      - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
        weights for the second convolutional layer
      - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
        convolutional layer
      - fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
        figure out what the shape should be?
      - fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
        figure out what the shape should be?
    
    Returns:
    - scores: PyTorch Tensor of shape (N, C) giving classification scores for x
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None

Answer

def three_layer_convnet(x, params):
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
 
 
    x = F.relu(F.conv2d(x, conv_w1, conv_b1, padding=2))
    x = F.relu(F.conv2d(x, conv_w2, conv_b2, padding=1))
    scores = flatten(x).mm(fc_w) + fc_b
 
    return scores
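
A shape check in the same style as two_layer_fc_test (a sketch; the channel counts and filter sizes are arbitrary, and fc_w's first dimension is channel_2 * 32 * 32 because both paddings preserve the 32x32 spatial size):

def three_layer_convnet_test():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch of 64 CIFAR-sized images

    conv_w1 = torch.zeros((6, 3, 5, 5), dtype=dtype)   # [out_channel, in_channel, kH, kW]
    conv_b1 = torch.zeros((6,), dtype=dtype)
    conv_w2 = torch.zeros((9, 6, 3, 3), dtype=dtype)
    conv_b2 = torch.zeros((9,), dtype=dtype)

    # 5x5 filters with padding=2 and 3x3 filters with padding=1 keep H and W at 32
    fc_w = torch.zeros((9 * 32 * 32, 10), dtype=dtype)
    fc_b = torch.zeros(10, dtype=dtype)

    scores = three_layer_convnet(x, [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b])
    print(scores.size())  # you should see [64, 10]

three_layer_convnet_test()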

Barebones PyTorch: Initialization

Understand it

Separate functions for

  • Initializing a weight tensor with the Kaiming normalization method
  • Initializing a weight tensor with all zeros

def random_weight(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Kaiming normalization: sqrt(2 / fan_in)
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        fan_in = np.prod(shape[1:]) # conv weight [out_channel, in_channel, kH, kW]
    # randn is standard normal distribution generator. 
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w
 
def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)
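
Example usage (a sketch; the shapes are arbitrary illustrations, e.g. a first conv layer for 3-channel input):

w_fc   = random_weight((3 * 32 * 32, 4000))  # FC weight: fan_in = 3 * 32 * 32
w_conv = random_weight((32, 3, 5, 5))        # conv weight: fan_in = 3 * 5 * 5
b_conv = zero_weight((32,))                  # biases start at zero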

Barebones PyTorch: Training Loop

Understand it

Uses torch.nn.functional.cross_entropy (F.cross_entropy) to compute the loss

def train_part2(model_fn, params, learning_rate):
    """
    Train a model on CIFAR-10.
    
    Inputs:
    - model_fn: A Python function that performs the forward pass of the model.
      It should have the signature scores = model_fn(x, params) where x is a
      PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
      model weights, and scores is a PyTorch Tensor of shape (N, C) giving
      scores for the elements in x.
    - params: List of PyTorch Tensors giving weights for the model
    - learning_rate: Python scalar giving the learning rate to use for SGD
    
    Returns: Nothing
    """
    for t, (x, y) in enumerate(loader_train):
        # Move the data to the proper device (GPU or CPU)
        x = x.to(device=device, dtype=dtype)
        y = y.to(device=device, dtype=torch.long)
 
        # Forward pass: compute scores and loss
        scores = model_fn(x, params)
        loss = F.cross_entropy(scores, y)
 
        # Backward pass: PyTorch figures out which Tensors in the computational
        # graph have requires_grad=True and uses backpropagation to compute the
        # gradient of the loss with respect to these Tensors, and stores the
        # gradients in the .grad attribute of each Tensor.
        loss.backward()
 
        # Update parameters. We don't want to backpropagate through the
        # parameter updates, so we scope the updates under a torch.no_grad()
        # context manager to prevent a computational graph from being built.
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad
 
                # Manually zero the gradients after running the backward pass
                w.grad.zero_()
 
        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss.item()))
            check_accuracy_part2(loader_val, model_fn, params)
            print()
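
Putting the barebones pieces together, training the two-layer net would look roughly like this (a sketch: loader_train, check_accuracy_part2 and the hyperparameter values come from the surrounding notebook):

hidden_layer_size = 4000
learning_rate = 1e-2

w1 = random_weight((3 * 32 * 32, hidden_layer_size))
w2 = random_weight((hidden_layer_size, 10))

train_part2(two_layer_fc, [w1, w2], learning_rate)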
 

2: PyTorch Module API

Now let’s use nn.Module.

torch.nn docs

A few steps to use it

  1. Subclass nn.Module with an intuitive name like TwoLayerFC
  2. In the constructor __init__(), define all the layers you need as class attributes
    1. Note: layer objects like nn.Conv2d are nn.Module subclasses
  3. In the forward() method, define the connectivity of your network. Use the attributes defined in __init__ as function calls that take Tensors as input and output transformed Tensors. Don’t create any new layers with learnable parameters in forward()! Leave that for __init__().

Example 2-layer network

class TwoLayerFC(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # assign layer objects to class attributes
        self.fc1 = nn.Linear(input_size, hidden_size)
        # nn.init package contains convenient initialization methods
        # http://pytorch.org/docs/master/nn.html#torch-nn-init 
        nn.init.kaiming_normal_(self.fc1.weight)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        nn.init.kaiming_normal_(self.fc2.weight)
    
    def forward(self, x):
        # forward always defines connectivity
        x = flatten(x)
        scores = self.fc2(F.relu(self.fc1(x)))
        return scores
 
def test_TwoLayerFC():
    input_size = 50
    x = torch.zeros((64, input_size), dtype=dtype)  # minibatch size 64, feature dimension 50
    model = TwoLayerFC(input_size, 42, 10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_TwoLayerFC()

Implement a three-layer ConvNet according to these specs

Architecture

  1. Convolutional layer with channel_1 5x5 filters with zero-padding of 2
  2. ReLU
  3. Convolutional layer with channel_2 3x3 filters with zero-padding of 1
  4. ReLU
  5. Fully-connected layer to num_classes classes

Initialize the weights with the Kaiming normal initialization method

class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
    def forward(self, x):
        pass

Answer

class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channel, channel_1, 5, padding=2)
        nn.init.kaiming_normal_(self.conv1.weight)
        self.conv2 = nn.Conv2d(channel_1, channel_2, 3, padding=1)
        nn.init.kaiming_normal_(self.conv2.weight)
        self.fc = nn.Linear(channel_2 * 32 * 32, num_classes)
        nn.init.kaiming_normal_(self.fc.weight)
 
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        scores = self.fc(flatten(x))
        return scores
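
A shape check in the same style as test_TwoLayerFC (a sketch; the channel counts are arbitrary):

def test_ThreeLayerConvNet():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch of 64 CIFAR-sized images
    model = ThreeLayerConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]

test_ThreeLayerConvNet()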

Module API: how to check accuracy

def check_accuracy_part34(loader, model):
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')   
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

Module API: how to run a training loop

Rather than updating the values of the weights ourselves, we use an Optimizer object from the torch.optim package, which abstracts the notion of an optimization algorithm and provides implementations of most of the algorithms commonly used to optimize neural networks.

def train_part34(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.
    
    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    
    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
 
            scores = model(x)
            loss = F.cross_entropy(scores, y)
 
            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()
 
            # This is the backwards pass: compute the gradient of the loss with
            # respect to each  parameter of the model.
            loss.backward()
 
            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()
 
            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)
                print()

Module API: Training a Two-Layer Network

With the training loop above, training a model now just looks like:

hidden_layer_size = 4000
learning_rate = 1e-2
model = TwoLayerFC(3 * 32 * 32, hidden_layer_size, 10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
 
train_part34(model, optimizer)

3: Sequential API

This API merges the three steps of the Module API into one: just list the layers (modules) in the order you want them applied.

Understand Sequential API: Defining a two-layer fully-connected network

Also training it

class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)
 
hidden_layer_size = 4000
learning_rate = 1e-2
 
model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)
 
# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)
 
train_part34(model, optimizer)

Build a three-layer ConvNet according to these specs with nn.Sequential

Architecture

  1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
  2. ReLU
  3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
  4. ReLU
  5. Fully-connected layer (with bias) to compute scores for 10 classes

Default weight init

Optimize with SGD with Nesterov momentum 0.9

channel_1 = 32
channel_2 = 16
learning_rate = 1e-2
 
model = None
optimizer = None
 
# Your code
 
train_part34(model, optimizer)

Answer

  • Use nn.Sequential to construct the architecture (reusing the Flatten module defined above)
  • Use optim.SGD to construct the optimizer

channel_1 = 32
channel_2 = 16
learning_rate = 1e-2
 
model = nn.Sequential(
    nn.Conv2d(3, channel_1, 5, padding=2),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, 3, padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2 * 32 * 32, 10)
)
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, nesterov=True)
 
train_part34(model, optimizer)
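
As a quick sanity check after training (a sketch; the model now lives on device, so the input has to as well):

x = torch.zeros((64, 3, 32, 32), device=device, dtype=dtype)
print(model(x).size())  # you should see [64, 10]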