Intro#

PyTorch is a great library that provides convenient and flexible interfaces for building neural networks.

For installation instructions, check this page.

import torch
import torch.nn.functional as F

Tensor#

A tensor is a generalisation of a matrix to an arbitrary number of dimensions; it is the basic entity that torch operates with. Find out more on the specific page.


The following example demonstrates how to create a specific tensor. In this tensor, the elements are denoted as \(\left[ijk\right]\), where \(i\) represents the layer index in the third dimension, \(j\) denotes the row index, and \(k\) indicates the column index.

torch.tensor([
    [
        [111,112,113,114],
        [121,122,123,124],
        [131,132,133,134]
    ],
    [
        [211,212,213,214],
        [221,222,223,224],
        [231,232,233,234]
    ],
])
tensor([[[111, 112, 113, 114],
         [121, 122, 123, 124],
         [131, 132, 133, 134]],

        [[211, 212, 213, 214],
         [221, 222, 223, 224],
         [231, 232, 233, 234]]])

Gradient#

A key feature of PyTorch that sets it apart from NumPy is its ability to automatically compute gradients for tensors involved in computations. You just need to call the backward method on the result of your computations. The tensors that participated in these computations will then have a grad attribute containing the gradients. Find out more on the relevant page.


As an example, consider the function:

\[f(\overline{X})=\sum_i x_i^2, \overline{X} = (x_1, x_2, x_3)\]

Suppose we want to calculate the gradient of \(f\) with respect to \(\overline{X}\) at the point \((1,2,3)\):

\[\nabla f=(2x_1, 2x_2, 2x_3) \Rightarrow \nabla f(1,2,3)=(2,4,6)\]

Now let's repeat the same procedure with torch.

X = torch.tensor([1,2,3], dtype=torch.float, requires_grad=True)
res = (X**2).sum()
res.backward()
X.grad
tensor([2., 4., 6.])
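
One detail worth keeping in mind (not shown above): repeated backward calls accumulate gradients into grad rather than overwriting it, so in training loops the attribute is usually reset between computations. A minimal sketch of this behaviour:

X = torch.tensor([1, 2, 3], dtype=torch.float, requires_grad=True)
(X**2).sum().backward()
(X**2).sum().backward()  # gradients from the second call are added to the first
print(X.grad)            # tensor([ 4.,  8., 12.])
X.grad.zero_()           # reset the accumulated gradients in place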

Loss functions#

Torch implements common loss functions. The following table shows some of them:

| Loss Function | Description |
|---|---|
| torch.nn.functional.binary_cross_entropy | Binary Cross Entropy |
| torch.nn.functional.binary_cross_entropy_with_logits | Binary Cross Entropy with Logits |
| torch.nn.functional.cross_entropy | Cross Entropy Loss |
| torch.nn.functional.hinge_embedding_loss | Hinge Embedding Loss |
| torch.nn.functional.kl_div | Kullback-Leibler Divergence Loss |
| torch.nn.functional.l1_loss | Mean Absolute Error Loss |
| torch.nn.functional.mse_loss | Mean Squared Error Loss |
| torch.nn.functional.margin_ranking_loss | Margin Ranking Loss |
| torch.nn.functional.multilabel_margin_loss | Multi-Label Margin Loss |
| torch.nn.functional.multilabel_soft_margin_loss | Multi-Label Soft Margin Loss |
| torch.nn.functional.smooth_l1_loss | Smooth L1 Loss |
| torch.nn.functional.triplet_margin_loss | Triplet Margin Loss |
| torch.nn.functional.nll_loss | Negative Log Likelihood Loss |
| torch.nn.functional.cosine_embedding_loss | Cosine Embedding Loss |

Find out more on the relevant page.


The following cell shows how to apply mse_loss.

F.mse_loss(
    torch.tensor([1,2,3], dtype=torch.float),
    torch.tensor([2,3,4], dtype=torch.float)
)
tensor(1.)
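
Some of the losses in the table operate on logits and integer class labels rather than on two tensors of the same shape. The following is a minimal sketch (the numbers are arbitrary) of cross_entropy, which expects unnormalized scores of shape (batch, classes) and a vector of class indices:

logits = torch.tensor([
    [2.0, 0.5, -1.0],
    [0.1, 1.5, 0.3]
])  # unnormalized scores for 2 samples and 3 classes
targets = torch.tensor([0, 1])  # correct class index for each sample
F.cross_entropy(logits, targets)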

Reduction#

The reduction parameter allows you to specify the type of aggregation to apply to the results of the function. The three commonly used values are none, mean, and sum.


The following cell demonstrates how different types of reduction are applied to the same inputs:

tens1 = torch.tensor([1,2,3], dtype=torch.float)
tens2 = torch.tensor([2,3,4], dtype=torch.float)

for reduction in ["mean", "sum", "none"]:
    res = F.mse_loss(tens1, tens2, reduction=reduction)
    print(f"reduction - {reduction}, res={res}")
reduction - mean, res=1.0
reduction - sum, res=3.0
reduction - none, res=tensor([1., 1., 1.])

Layers#

PyTorch provides a variety of tools for creating neural network layers. Find out more on the relevant page.

Note: In theory, the term “layer” often refers to a combination of connections and activation functions. However, PyTorch has a more specific abstraction where there are dedicated layers for different functionalities. It’s important to keep this in mind to avoid confusion.
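
For example, what is often described as a single "dense layer with ReLU activation" corresponds in torch to two separate modules. A minimal sketch (the sizes are arbitrary):

dense_block = torch.nn.Sequential(
    torch.nn.Linear(4, 3),  # the connections (weights and biases)
    torch.nn.ReLU()         # the activation function, as a separate module
)
dense_block(torch.rand(2, 4)).shape
torch.Size([2, 3])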

The following table categorizes layers and lists the layers corresponding to each category. All the objects mentioned in the table are accessible via torch.nn.

| Category | Layers | Description |
|---|---|---|
| Linear Layer | Linear | Fully connected layers for mapping inputs to outputs. |
| Convolutional Layers | Conv1d, Conv2d, Conv3d | Layers for applying convolution operations on data. |
| Inverse Convolution | ConvTranspose1d, ConvTranspose2d, ConvTranspose3d | Layers that invert convolution. |
| Pooling Layers | MaxPool1d, MaxPool2d, MaxPool3d, AvgPool1d, AvgPool2d, AvgPool3d | Layers for downsampling data while preserving important features. |
| Normalization Layers | BatchNorm1d, BatchNorm2d, BatchNorm3d, LayerNorm, InstanceNorm1d, InstanceNorm2d, InstanceNorm3d | Layers that normalize the input to improve training stability. |
| Activation Functions | Sigmoid, ReLU, LeakyReLU, ELU, RReLU, SELU, Softmax, Softplus, Softshrink, Tanh, Hardtanh | Non-linear functions applied to layer outputs to introduce non-linearity. |
| Recurrent Layers | RNN, LSTM, GRU | Layers designed for processing sequential data. |
| Padding Layers | ZeroPad1d, ZeroPad2d, ZeroPad3d, ConstantPad1d, ConstantPad2d, ConstantPad3d, ReplicationPad1d, ReplicationPad2d, ReplicationPad3d, ReflectionPad1d, ReflectionPad2d, ReflectionPad3d, CircularPad1d, CircularPad2d, CircularPad3d | Layers that modify the dimensions of data by adding padding. |
| Other Layers | Embedding, Dropout, Transformer, TransformerEncoder, TransformerDecoder, Flatten, Unflatten | Miscellaneous layers including embedding, dropout, and transformer components. |

Consider the typical features of such objects. As an example, let's take a linear layer without going into its peculiarities.


The following cell shows that you can apply a layer to an input tensor.

layer = torch.nn.Linear(10, 3)
layer(torch.rand(3, 10))
tensor([[ 0.5347, -0.0643, -0.2821],
        [ 0.2541,  0.2737, -0.2114],
        [ 0.4634,  0.2516, -0.2575]], grad_fn=<AddmmBackward0>)

You can use a layer as part of the computation, and it can participate in the backward pass to compute gradients. The following cell demonstrates how to obtain the gradient for the weight attribute of a layer.

layer(torch.rand(3, 10)).sum().backward()
layer.weight.grad

Managing network#

Neural networks are built by composing layers. PyTorch provides powerful tools and concepts for building these compositions. This section will delve into these important concepts.

The core of this is the torch.nn.Module class, which represents the network and handles its parameters and operations.

Find out more on the relevant page.


Now let's take a quick practical overview of the capabilities of that class. The following cell defines ShowNN as a descendant of torch.nn.Module; it uses a Linear and a Sigmoid layer inside.

class ShowNN(torch.nn.Module):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.layer1 = torch.nn.Linear(3,4)
        self.layer2 = torch.nn.Sigmoid()

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        return self.layer2(self.layer1(X))

The following cell demonstrates that instances of torch.nn.Module are aware of the layers within them, and when used as callable objects, they apply the transformations described in the forward method.

network = ShowNN()

print(network)
network(torch.normal(mean=0, std=10, size=[4,3]))
ShowNN(
  (layer1): Linear(in_features=3, out_features=4, bias=True)
  (layer2): Sigmoid()
)
tensor([[1.5974e-02, 5.8761e-01, 9.9243e-01, 5.0744e-01],
        [2.9262e-01, 4.2199e-02, 3.4829e-01, 2.0401e-01],
        [2.6856e-02, 8.9178e-01, 9.9853e-01, 7.6134e-01],
        [1.1712e-04, 9.8912e-01, 1.0000e+00, 8.3337e-01]],
       grad_fn=<SigmoidBackward0>)

Device#

For tensors and models, you can select the device on which they are stored and on which computations are performed. Find out more on the specific page.


The following example shows how to check the device for your tensor. By default it’s cpu.

torch.randn([5, 5]).device
device(type='cpu')
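
To place a tensor on a different device, you can move it with the to method (or pass a device argument at creation). A minimal sketch that falls back to CPU when CUDA is not available:

device = "cuda" if torch.cuda.is_available() else "cpu"
moved = torch.randn([5, 5]).to(device)  # returns a copy on the selected device
moved.device

Models (torch.nn.Module instances) expose the same to method, which moves all of their parameters at once.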

In PyTorch, most objects encapsulate tensors, so you can access their device through those tensors. The following example demonstrates how to access the devices of the weights and biases of two linear layers.

example_sequential = torch.nn.Sequential(
    torch.nn.Linear(3,3),
    torch.nn.Linear(3,3)
)

for param in example_sequential.parameters():
    print(param.device)
cpu
cpu
cpu
cpu

Reproducibility#

Reproducibility is fundamental when working with data, but there is inherent randomness in machine learning algorithms. To control this in PyTorch, the following tools are available:

  • Set seeds using torch.manual_seed, torch.cuda.manual_seed, and torch.mps.manual_seed.

  • Some objects, like torch.utils.data.DataLoader, require setting a generator object to control randomness.

  • Certain CUDA algorithms use non-deterministic approaches. By using torch.use_deterministic_algorithms(True), you can ensure that PyTorch avoids non-deterministic operations where possible, and it will alert you when it can’t.

For more details, check the official reproducibility guideline.
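
For instance, the DataLoader shuffling mentioned above can be made reproducible by passing a seeded generator object. A minimal sketch with an arbitrary TensorDataset:

generator = torch.Generator().manual_seed(42)
dataset = torch.utils.data.TensorDataset(torch.arange(10))
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=True, generator=generator
)
for batch in data_loader:
    print(batch)  # the order of the shuffled batches is now fixed by the seed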


The following example generates a random tensor with the seed set to 10:

torch.manual_seed(10)
torch.randn(4, 4)
tensor([[-0.8173, -0.5556, -0.8267, -1.2970],
        [-0.1974, -0.9643, -0.5133,  2.6278],
        [-0.7465,  1.0051, -0.2568,  0.4765],
        [-0.6652, -0.3627, -1.4504, -0.2496]])

By running the same code again, the exact same tensor will be returned.

torch.manual_seed(10)
torch.randn(4, 4)
tensor([[-0.8173, -0.5556, -0.8267, -1.2970],
        [-0.1974, -0.9643, -0.5133,  2.6278],
        [-0.7465,  1.0051, -0.2568,  0.4765],
        [-0.6652, -0.3627, -1.4504, -0.2496]])

Data primitives#

Torch provides special tools for handling data: torch.utils.data.Dataset and torch.utils.data.DataLoader. Dataset contains the data, while DataLoader allows you to split the data into batches.

Learn more on the special Data primitives page.


The following cell defines an example of a custom Dataset. This dataset contains 10 items, where each item returns a number from 0 to 9 as the x observation, and the square of the corresponding x as the y value.

class SimpleDataset(torch.utils.data.Dataset):
    def __init__(self):
        self.data = list(range(10))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        y = x ** 2
        return x, y

Here is an example of using that class.

simple_dataset = SimpleDataset()
simple_dataset[4]
(4, 16)

The main advantage here is that you can use it in combination with DataLoader. DataLoader requires a Dataset and acts as an iterable object that, at each iteration, returns a batch with x and y values. The following cell shows this.

data_loader = torch.utils.data.DataLoader(
    simple_dataset, batch_size=3, shuffle=True
)
for batch in data_loader:
    print(batch)
[tensor([3, 0, 5]), tensor([ 9,  0, 25])]
[tensor([8, 2, 6]), tensor([64,  4, 36])]
[tensor([4, 1, 9]), tensor([16,  1, 81])]
[tensor([7]), tensor([49])]

We get shuffled batches, but each x still corresponds to its square in y.