Intro#

PyTorch is a great library that provides convenient and flexible interfaces for building neural networks.

For installation instructions, check this page.

import torch
import torch.nn.functional as F

Tensor#

A tensor is a generalisation of a matrix to an arbitrary number of dimensions; it is the basic entity that torch operates with. Find out more on the specific page.


The following example demonstrates how to create a specific tensor. In this tensor, the elements are denoted as \(\left[ijk\right]\), where \(i\) represents the layer index in the third dimension, \(j\) denotes the row index, and \(k\) indicates the column index.

torch.tensor([
    [
        [111,112,113,114],
        [121,122,123,124],
        [131,132,133,134]
    ],
    [
        [211,212,213,214],
        [221,222,223,224],
        [231,232,233,234]
    ],
])
tensor([[[111, 112, 113, 114],
         [121, 122, 123, 124],
         [131, 132, 133, 134]],

        [[211, 212, 213, 214],
         [221, 222, 223, 224],
         [231, 232, 233, 234]]])

Gradient#

A key feature of PyTorch that sets it apart from NumPy is its ability to automatically compute gradients for tensors involved in computations. You just need to call the backward method on the result of your computations. The tensors that participated in these computations will then have a grad attribute containing the gradients. Find out more on the relevant page.


As an example, consider the function:

\[f(\overline{X})=\sum_i x_i^2, \overline{X} = (x_1, x_2, x_3)\]

Suppose we want to calculate the gradient of \(f\) with respect to \(\overline{X}\) at the point \((1,2,3)\):

\[\nabla f=(2x_1, 2x_2, 2x_3) \Rightarrow \nabla f(1,2,3)=(2,4,6)\]

Now let's repeat the same procedure with torch.

X = torch.tensor([1,2,3], dtype=torch.float, requires_grad=True)
res = (X**2).sum()
res.backward()
X.grad
tensor([2., 4., 6.])
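
One detail worth keeping in mind (not shown above): repeated backward calls accumulate gradients into grad rather than overwriting it, so in training loops the attribute is usually reset between computations. A minimal sketch of this behaviour:

X = torch.tensor([1, 2, 3], dtype=torch.float, requires_grad=True)
(X**2).sum().backward()
(X**2).sum().backward()  # gradients from the second call are added to the first
print(X.grad)            # tensor([ 4.,  8., 12.])
X.grad.zero_()           # reset the accumulated gradients in place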

Loss functions#

Torch implements common loss functions. The following table shows some of them:

| Loss Function | Description |
|---|---|
| torch.nn.functional.binary_cross_entropy | Binary Cross Entropy |
| torch.nn.functional.binary_cross_entropy_with_logits | Binary Cross Entropy with Logits |
| torch.nn.functional.cross_entropy | Cross Entropy Loss |
| torch.nn.functional.hinge_embedding_loss | Hinge Embedding Loss |
| torch.nn.functional.kl_div | Kullback-Leibler Divergence Loss |
| torch.nn.functional.l1_loss | Mean Absolute Error Loss |
| torch.nn.functional.mse_loss | Mean Squared Error Loss |
| torch.nn.functional.margin_ranking_loss | Margin Ranking Loss |
| torch.nn.functional.multilabel_margin_loss | Multi-Label Margin Loss |
| torch.nn.functional.multilabel_soft_margin_loss | Multi-Label Soft Margin Loss |
| torch.nn.functional.smooth_l1_loss | Smooth L1 Loss |
| torch.nn.functional.triplet_margin_loss | Triplet Margin Loss |
| torch.nn.functional.nll_loss | Negative Log Likelihood Loss |
| torch.nn.functional.cosine_embedding_loss | Cosine Embedding Loss |

Find out more on the relevant page.


The following cell shows how to apply mse_loss.

F.mse_loss(
    torch.tensor([1,2,3], dtype=torch.float),
    torch.tensor([2,3,4], dtype=torch.float)
)
tensor(1.)
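
Some of the losses in the table operate on logits and integer class labels rather than on two tensors of the same shape. The following is a minimal sketch (the numbers are arbitrary) of cross_entropy, which expects unnormalized scores of shape (batch, classes) and a vector of class indices:

logits = torch.tensor([
    [2.0, 0.5, -1.0],
    [0.1, 1.5, 0.3]
])  # unnormalized scores for 2 samples and 3 classes
targets = torch.tensor([0, 1])  # correct class index for each sample
F.cross_entropy(logits, targets)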

Reduction#

The reduction parameter allows you to specify the type of aggregation to apply to the results of the function. The three commonly used values are none, mean, and sum.


The following cell demonstrates how different types of reduction are applied to the same inputs:

tens1 = torch.tensor([1,2,3], dtype=torch.float)
tens2 = torch.tensor([2,3,4], dtype=torch.float)

for reduction in ["mean", "sum", "none"]:
    res = F.mse_loss(tens1, tens2, reduction=reduction)
    print(f"reduction - {reduction}, res={res}")
reduction - mean, res=1.0
reduction - sum, res=3.0
reduction - none, res=tensor([1., 1., 1.])

Layers#

PyTorch provides a variety of tools for creating neural network layers. Find out more on the relevant page.

Note: In theory, the term “layer” often refers to a combination of connections and activation functions. However, PyTorch has a more specific abstraction where there are dedicated layers for different functionalities. It’s important to keep this in mind to avoid confusion.
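
For example, what is often described as a single "dense layer with ReLU activation" corresponds in torch to two separate modules. A minimal sketch (the sizes are arbitrary):

dense_block = torch.nn.Sequential(
    torch.nn.Linear(4, 3),  # the connections (weights and biases)
    torch.nn.ReLU()         # the activation function, as a separate module
)
dense_block(torch.rand(2, 4)).shape
torch.Size([2, 3])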

The following table categorizes layers and lists the layers corresponding to each category. All the objects mentioned in the table are accessible via torch.nn.

| Category | Layers | Description |
|---|---|---|
| Linear Layer | Linear | Fully connected layers for mapping inputs to outputs. |
| Convolutional Layers | Conv1d, Conv2d, Conv3d | Layers for applying convolution operations on data. |
| Inverse Convolution | ConvTranspose1d, ConvTranspose2d, ConvTranspose3d | Layers that invert convolution. |
| Pooling Layers | MaxPool1d, MaxPool2d, MaxPool3d, AvgPool1d, AvgPool2d, AvgPool3d | Layers for downsampling data while preserving important features. |
| Normalization Layers | BatchNorm1d, BatchNorm2d, BatchNorm3d, LayerNorm, InstanceNorm1d, InstanceNorm2d, InstanceNorm3d | Layers that normalize the input to improve training stability. |
| Activation Functions | Sigmoid, ReLU, LeakyReLU, ELU, RReLU, SELU, Softmax, Softplus, Softshrink, Tanh, Hardtanh | Non-linear functions applied to layer outputs to introduce non-linearity. |
| Recurrent Layers | RNN, LSTM, GRU | Layers designed for processing sequential data. |
| Padding Layers | ZeroPad1d, ZeroPad2d, ZeroPad3d, ConstantPad1d, ConstantPad2d, ConstantPad3d, ReplicationPad1d, ReplicationPad2d, ReplicationPad3d, ReflectionPad1d, ReflectionPad2d, ReflectionPad3d, CircularPad1d, CircularPad2d, CircularPad3d | Layers that modify the dimensions of data by adding padding. |
| Other Layers | Embedding, Dropout, Transformer, TransformerEncoder, TransformerDecoder, Flatten, Unflatten | Miscellaneous layers including embedding, dropout, and transformer components. |

Consider the typical features of such objects. As an example, let's take a linear layer without going into its peculiarities.


The following cell shows that you can apply a layer to an input tensor.

layer = torch.nn.Linear(10, 3)
layer(torch.rand(3, 10))
tensor([[ 0.5347, -0.0643, -0.2821],
        [ 0.2541,  0.2737, -0.2114],
        [ 0.4634,  0.2516, -0.2575]], grad_fn=<AddmmBackward0>)

You can use a layer as part of the computation, and it can participate in the backward pass to compute gradients. The following cell demonstrates how to obtain the gradient for the weight attribute of a layer.

layer(torch.rand(3, 10)).sum().backward()
layer.weight.grad

Managing network#

Neural networks are built by composing layers. PyTorch provides powerful tools and concepts for building these compositions. This section will delve into these important concepts.

The core of this is the torch.nn.Module class, which represents the network and handles its parameters and operations.

Find out more on the relevant page.


Now let's take a quick practical overview of the capabilities of that class. The following cell defines ShowNN as a descendant of torch.nn.Module; it uses a Linear and a Sigmoid layer inside.

class ShowNN(torch.nn.Module):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.layer1 = torch.nn.Linear(3,4)
        self.layer2 = torch.nn.Sigmoid()

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        return self.layer2(self.layer1(X))

The following cell demonstrates that instances of torch.nn.Module are aware of the layers within them, and when used as callable objects, they apply the transformations described in the forward method.

network = ShowNN()

print(network)
network(torch.normal(mean=0, std=10, size=[4,3]))
ShowNN(
  (layer1): Linear(in_features=3, out_features=4, bias=True)
  (layer2): Sigmoid()
)
tensor([[1.5974e-02, 5.8761e-01, 9.9243e-01, 5.0744e-01],
        [2.9262e-01, 4.2199e-02, 3.4829e-01, 2.0401e-01],
        [2.6856e-02, 8.9178e-01, 9.9853e-01, 7.6134e-01],
        [1.1712e-04, 9.8912e-01, 1.0000e+00, 8.3337e-01]],
       grad_fn=<SigmoidBackward0>)

Device#

For tensors and models, you can select the device on which they are stored and on which computations are performed. Find out more on the specific page.


The following example shows how to check the device for your tensor. By default it’s cpu.

torch.randn([5, 5]).device
device(type='cpu')
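
To place a tensor on a different device, you can move it with the to method (or pass a device argument at creation). A minimal sketch that falls back to CPU when CUDA is not available:

device = "cuda" if torch.cuda.is_available() else "cpu"
moved = torch.randn([5, 5]).to(device)  # returns a copy on the selected device
moved.device

Models (torch.nn.Module instances) expose the same to method, which moves all of their parameters at once.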

In PyTorch, most objects encapsulate tensors, so you can access their device through those tensors. The following example demonstrates how to access the devices of the weights and biases of two linear layers.

example_sequential = torch.nn.Sequential(
    torch.nn.Linear(3,3),
    torch.nn.Linear(3,3)
)

for param in example_sequential.parameters():
    print(param.device)
cpu
cpu
cpu
cpu

Reproducibility#

Reproducibility is fundamental when working with data, but there is inherent randomness in machine learning algorithms. To control this in PyTorch, the following tools are available:

  • Set seeds using torch.manual_seed, torch.cuda.manual_seed, and torch.mps.manual_seed.

  • Some objects, like torch.utils.data.DataLoader, require setting a generator object to control randomness.

  • Certain CUDA algorithms use non-deterministic approaches. By using torch.use_deterministic_algorithms(True), you can ensure that PyTorch avoids non-deterministic operations where possible, and it will alert you when it can’t.

For more details, check the official reproducibility guideline.
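
For instance, the DataLoader shuffling mentioned above can be made reproducible by passing a seeded generator object. A minimal sketch with an arbitrary TensorDataset:

generator = torch.Generator().manual_seed(42)
dataset = torch.utils.data.TensorDataset(torch.arange(10))
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=True, generator=generator
)
for batch in data_loader:
    print(batch)  # the order of the shuffled batches is now fixed by the seed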


The following example generates a random tensor with the seed set to 10:

torch.manual_seed(10)
torch.randn(4, 4)
tensor([[-0.8173, -0.5556, -0.8267, -1.2970],
        [-0.1974, -0.9643, -0.5133,  2.6278],
        [-0.7465,  1.0051, -0.2568,  0.4765],
        [-0.6652, -0.3627, -1.4504, -0.2496]])

By running the same code again, the exact same tensor will be returned.

torch.manual_seed(10)
torch.randn(4, 4)
tensor([[-0.8173, -0.5556, -0.8267, -1.2970],
        [-0.1974, -0.9643, -0.5133,  2.6278],
        [-0.7465,  1.0051, -0.2568,  0.4765],
        [-0.6652, -0.3627, -1.4504, -0.2496]])

Data primitives#

Torch provides special tools for handling data: torch.utils.data.Dataset and torch.utils.data.DataLoader. Dataset contains the data, while DataLoader allows you to split the data into batches.

Learn more on the special Data primitives page.


The following cell defines an example of a custom Dataset. This dataset contains 10 items, where each item returns a number from 0 to 9 as the x observation, and the square of the corresponding x as the y value.

class SimpleDataset(torch.utils.data.Dataset):
    def __init__(self):
        self.data = list(range(10))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        y = x ** 2
        return x, y

Here is an example of using that class.

simple_dataset = SimpleDataset()
simple_dataset[4]
(4, 16)

The main advantage here is that you can use it in combination with DataLoader. DataLoader requires a Dataset and acts as an iterable object that, at each iteration, returns a batch with x and y values. The following cell shows this.

data_loader = torch.utils.data.DataLoader(
    simple_dataset, batch_size=3, shuffle=True
)
for batch in data_loader:
    print(batch)
[tensor([3, 0, 5]), tensor([ 9,  0, 25])]
[tensor([8, 2, 6]), tensor([64,  4, 36])]
[tensor([4, 1, 9]), tensor([16,  1, 81])]
[tensor([7]), tensor([49])]

We get shuffled batches, but each x still corresponds to its square in y.