Intro#
PyTorch is a great library that provides convenient and flexible interfaces for building neural networks.
For installation instructions, check this page.
import torch
import torch.nn.functional as F
Tensor#
A tensor is a generalisation of a matrix to the case of arbitrary dimensionality; it is the basic entity that torch operates on. Find out more on the specific page.
The following example demonstrates how to create a specific tensor. In this tensor, the elements are denoted as \(\left[ijk\right]\), where \(i\) represents the layer index in the third dimension, \(j\) denotes the row index, and \(k\) indicates the column index.
torch.tensor([
    [
        [111, 112, 113, 114],
        [121, 122, 123, 124],
        [131, 132, 133, 134]
    ],
    [
        [211, 212, 213, 214],
        [221, 222, 223, 224],
        [231, 232, 233, 234]
    ],
])
tensor([[[111, 112, 113, 114],
         [121, 122, 123, 124],
         [131, 132, 133, 134]],

        [[211, 212, 213, 214],
         [221, 222, 223, 224],
         [231, 232, 233, 234]]])
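To make the \(\left[ijk\right]\) convention concrete, here is a minimal sketch (the variable name t is introduced only for illustration) that stores the tensor above and reads back the element labelled 123 using zero-based indices:

t = torch.tensor([
    [
        [111, 112, 113, 114],
        [121, 122, 123, 124],
        [131, 132, 133, 134]
    ],
    [
        [211, 212, 213, 214],
        [221, 222, 223, 224],
        [231, 232, 233, 234]
    ],
])

# the shape reflects (layers, rows, columns)
print(t.shape)     # torch.Size([2, 3, 4])
# zero-based indices: first layer, second row, third column -> element 123
print(t[0, 1, 2])  # tensor(123)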
Gradient#
A key feature of PyTorch that sets it apart from NumPy is its ability to automatically compute gradients for tensors involved in computations. You just need to call the backward method on the result of your computations. The tensors that participated in these computations will then have a grad attribute containing the gradients. Find out more on the relevant page.
As an example, consider the function \(f(x) = \sum_i x_i^2\).
Suppose we want to calculate the gradient of \(f\) with respect to \(x\) at the point \((1, 2, 3)\). Analytically, \(\nabla f = 2x\), so the result is \((2, 4, 6)\).
Now repeat the same procedure with torch.
X = torch.tensor([1,2,3], dtype=torch.float, requires_grad=True)
res = (X**2).sum()
res.backward()
X.grad
tensor([2., 4., 6.])
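The same machinery works for any differentiable expression. As an extra sketch (not from the original example), take \(f(x) = \sum_i x_i^3\), whose gradient is \(3x^2\) and therefore equals \((3, 12, 27)\) at the point \((1, 2, 3)\):

X = torch.tensor([1, 2, 3], dtype=torch.float, requires_grad=True)
# f(x) = sum of cubes, so df/dx_i = 3 * x_i ** 2
(X ** 3).sum().backward()
X.grad  # expected: tensor([ 3., 12., 27.])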
Loss functions#
Torch implements common loss functions. The following table shows some of them:
Loss Function | Description |
---|---|
F.binary_cross_entropy | Binary Cross Entropy |
F.binary_cross_entropy_with_logits | Binary Cross Entropy with Logits |
F.cross_entropy | Cross Entropy Loss |
F.hinge_embedding_loss | Hinge Embedding Loss |
F.kl_div | Kullback-Leibler Divergence Loss |
F.l1_loss | Mean Absolute Error Loss |
F.mse_loss | Mean Squared Error Loss |
F.margin_ranking_loss | Margin Ranking Loss |
F.multilabel_margin_loss | Multi-Label Margin Loss |
F.multilabel_soft_margin_loss | Multi-Label Soft Margin Loss |
F.smooth_l1_loss | Smooth L1 Loss |
F.triplet_margin_loss | Triplet Margin Loss |
F.nll_loss | Negative Log Likelihood Loss |
F.cosine_embedding_loss | Cosine Embedding Loss |
Find out more on the relevant page.
The following cell shows how to apply mse_loss.
F.mse_loss(
torch.tensor([1,2,3], dtype=torch.float),
torch.tensor([2,3,4], dtype=torch.float)
)
tensor(1.)
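Other losses from the table are used in the same way. For instance, a quick sketch with F.l1_loss (mean absolute error) on the same tensors: every absolute difference is 1, so the mean is 1 as well.

F.l1_loss(
    torch.tensor([1, 2, 3], dtype=torch.float),
    torch.tensor([2, 3, 4], dtype=torch.float)
)  # expected: tensor(1.)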
Reduction#
The reduction parameter allows you to specify the type of aggregation to apply to the results of the function. The three commonly used values are none, mean, and sum.
The following cell demonstrates how different types of reduction are applied to the same inputs:
tens1 = torch.tensor([1,2,3], dtype=torch.float)
tens2 = torch.tensor([2,3,4], dtype=torch.float)
for reduction in ["mean", "sum", "none"]:
    res = F.mse_loss(tens1, tens2, reduction=reduction)
    print(f"reduction - {reduction}, res={res}")
reduction - mean, res=1.0
reduction - sum, res=3.0
reduction - none, res=tensor([1., 1., 1.])
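The aggregations are related in the obvious way: mean and sum are simply the mean and the sum of the element-wise (none) result. A short sketch to confirm it:

elementwise = F.mse_loss(tens1, tens2, reduction="none")
print(elementwise.mean())  # matches reduction="mean" -> tensor(1.)
print(elementwise.sum())   # matches reduction="sum"  -> tensor(3.)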
Layers#
PyTorch provides a variety of tools for creating neural network layers. Find out more on the relevant page.
Note: In theory, the term “layer” often refers to a combination of connections and activation functions. However, PyTorch has a more specific abstraction where there are dedicated layers for different functionalities. It’s important to keep this in mind to avoid confusion.
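To make the note concrete, here is a minimal sketch (the name classic_layer is introduced only for illustration) where a “layer” in the classical sense is assembled from two separate PyTorch modules, a linear transformation followed by an activation:

classic_layer = torch.nn.Sequential(
    torch.nn.Linear(10, 3),  # connections: weights and biases
    torch.nn.ReLU()          # activation function
)
classic_layer(torch.rand(5, 10)).shape  # torch.Size([5, 3])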
The following table categorizes the layers and lists the layers corresponding to each category. All the objects mentioned in the table are accessible via torch.nn.
Category | Layers | Description |
---|---|---|
Linear Layers | nn.Linear, nn.Bilinear | Fully connected layers for mapping inputs to outputs. |
Convolutional Layers | nn.Conv1d, nn.Conv2d, nn.Conv3d | Layers for applying convolution operations on data. |
Inverse Convolution Layers | nn.ConvTranspose1d, nn.ConvTranspose2d, nn.ConvTranspose3d | Layers that invert convolution. |
Pooling Layers | nn.MaxPool1d, nn.MaxPool2d, nn.AvgPool2d | Layers for downsampling data while preserving important features. |
Normalization Layers | nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm | Layers that normalize the input to improve training stability. |
Activation Functions | nn.ReLU, nn.Sigmoid, nn.Tanh, nn.Softmax | Non-linear functions applied to layer outputs to introduce non-linearity. |
Recurrent Layers | nn.RNN, nn.LSTM, nn.GRU | Layers designed for processing sequential data. |
Padding Layers | nn.ZeroPad2d, nn.ReflectionPad2d, nn.ReplicationPad2d | Layers that modify the dimensions of data by adding padding. |
Other Layers | nn.Embedding, nn.Dropout, nn.Transformer | Miscellaneous layers including embedding, dropout, and transformer components. |
The table shows representative layers for each category alongside their descriptions.
Consider the typical features of such objects. As an example, let’s take a linear layer without going into its peculiarities.
The following cell shows that you can apply a layer to an operand.
layer = torch.nn.Linear(10, 3)
layer(torch.rand(3, 10))
tensor([[ 0.5347, -0.0643, -0.2821],
[ 0.2541, 0.2737, -0.2114],
[ 0.4634, 0.2516, -0.2575]], grad_fn=<AddmmBackward0>)
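Under the hood, the layer stores its trainable parameters as tensors. For the Linear(10, 3) layer above, the weight has shape (3, 10) and the bias has shape (3,):

print(layer.weight.shape)  # torch.Size([3, 10])
print(layer.bias.shape)    # torch.Size([3])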
You can use a layer as part of a computation, and it can participate in the backward pass to compute gradients. The following cell demonstrates how to obtain the gradient for the weight attribute of a layer.
layer(torch.rand(3, 10)).sum().backward()
layer.weight.grad
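The gradient always has the same shape as the parameter it corresponds to, which is what makes it directly usable for updating the weights:

# weight is (3, 10), and so is its gradient
print(layer.weight.grad.shape)  # torch.Size([3, 10])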
Managing network#
Neural networks are built by composing layers. PyTorch provides powerful tools and concepts for building these compositions. This section will delve into these important concepts.
The core of this is the torch.nn.Module class, which represents the network and handles its parameters and operations.
Find out more on the relevant page.
Now let’s take a quick practical overview of the capabilities of that class. The following cell defines ShowNN as a descendant of torch.nn.Module; it uses a Linear and a Sigmoid layer inside it.
class ShowNN(torch.nn.Module):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.layer1 = torch.nn.Linear(3, 4)
        self.layer2 = torch.nn.Sigmoid()

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        return self.layer2(self.layer1(X))
The following cell demonstrates that instances of torch.nn.Module are aware of the layers within them and, when used as callable objects, apply the transformations described in the forward method.
network = ShowNN()
print(network.__str__())
network(torch.normal(mean=0, std=10, size=[4,3]))
ShowNN(
(layer1): Linear(in_features=3, out_features=4, bias=True)
(layer2): Sigmoid()
)
tensor([[1.5974e-02, 5.8761e-01, 9.9243e-01, 5.0744e-01],
[2.9262e-01, 4.2199e-02, 3.4829e-01, 2.0401e-01],
[2.6856e-02, 8.9178e-01, 9.9853e-01, 7.6134e-01],
[1.1712e-04, 9.8912e-01, 1.0000e+00, 8.3337e-01]],
grad_fn=<SigmoidBackward0>)
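Another ability worth showing: torch.nn.Module automatically tracks the parameters of the layers registered inside it. A small sketch (only layer1 has trainable parameters, Sigmoid has none):

for name, param in network.named_parameters():
    print(name, tuple(param.shape))
# layer1.weight (4, 3)
# layer1.bias (4,)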
Device#
For tensors and models, you can select the device on which they are stored and used. Find out more on the specific page.
The following example shows how to check the device of a tensor. By default it is the CPU.
torch.randn([5, 5]).device
device(type='cpu')
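Tensors can be moved between devices with the to method. A minimal sketch that picks a GPU only if one is available and otherwise stays on the CPU:

device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.randn([5, 5]).to(device)
t.device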
In PyTorch, most objects encapsulate tensors, so you can access an object’s device through the devices of its tensors. The following example demonstrates how to access the devices of the weights and biases of two linear layers.
example_sequential = torch.nn.Sequential(
    torch.nn.Linear(3, 3),
    torch.nn.Linear(3, 3)
)
for param in example_sequential.parameters():
    print(param.device)
cpu
cpu
cpu
cpu
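The same to method works for whole modules: it moves every tensor the module encapsulates, so all parameters end up on the chosen device. A sketch, assuming a GPU may or may not be present:

device = "cuda" if torch.cuda.is_available() else "cpu"
example_sequential.to(device)
for param in example_sequential.parameters():
    print(param.device)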
Reproducibility#
Reproducibility is fundamental when working with data, but there is inherent randomness in machine learning algorithms. To control this in PyTorch, the following tools are available:
- Set seeds using torch.manual_seed, torch.cuda.manual_seed, and torch.mps.manual_seed.
- Some objects, like torch.utils.data.DataLoader, require setting a generator object to control randomness (a sketch is shown below).
- Certain CUDA algorithms use non-deterministic approaches. By using torch.use_deterministic_algorithms(True), you can ensure that PyTorch avoids non-deterministic operations where possible, and it will alert you when it can’t.
For more details, check the official reproducibility guideline.
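Here is a minimal sketch of the generator approach mentioned in the list above: a seeded torch.Generator passed to a DataLoader fixes the shuffling order. The toy dataset (a plain list) is used only for illustration.

generator = torch.Generator()
generator.manual_seed(42)

loader = torch.utils.data.DataLoader(
    list(range(10)),  # any map-style dataset works here
    batch_size=3,
    shuffle=True,
    generator=generator
)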
The following example generates a random tensor with the seed set to 10:
torch.manual_seed(10)
torch.randn(4, 4)
tensor([[-0.8173, -0.5556, -0.8267, -1.2970],
[-0.1974, -0.9643, -0.5133, 2.6278],
[-0.7465, 1.0051, -0.2568, 0.4765],
[-0.6652, -0.3627, -1.4504, -0.2496]])
By running the same code again, the exact same tensor will be returned.
torch.manual_seed(10)
torch.randn(4, 4)
tensor([[-0.8173, -0.5556, -0.8267, -1.2970],
[-0.1974, -0.9643, -0.5133, 2.6278],
[-0.7465, 1.0051, -0.2568, 0.4765],
[-0.6652, -0.3627, -1.4504, -0.2496]])
Data primitives#
Torch provides special tools for handling data: torch.utils.data.Dataset and torch.utils.data.DataLoader. Learn more in the Datasets & DataLoaders tutorial on the official website.
Dataset contains the data, while DataLoader allows you to split the data into batches. Check the special Data primitives page.
The following cell defines an example of a custom Dataset. This dataset contains 10 items, where each item returns a number from 0 to 9 as the x observation and the square of the corresponding x as the y value.
class SimpleDataset(torch.utils.data.Dataset):
    def __init__(self):
        self.data = list(range(10))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        y = x ** 2
        return x, y
Here is an example of using that class.
simple_dataset = SimpleDataset()
simple_dataset[4]
(4, 16)
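Since the dataset defines __len__, the built-in len also works on it:

len(simple_dataset)  # 10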
The main advantage here is that you can use it in combination with DataLoader. DataLoader requires a Dataset and acts as an iterable object that, at each iteration, returns a batch with x and y values. The following cell shows it.
data_loader = torch.utils.data.DataLoader(
    simple_dataset, batch_size=3, shuffle=True
)
for batch in data_loader:
    print(batch)
[tensor([3, 0, 5]), tensor([ 9, 0, 25])]
[tensor([8, 2, 6]), tensor([64, 4, 36])]
[tensor([4, 1, 9]), tensor([16, 1, 81])]
[tensor([7]), tensor([49])]
We get shuffled batches, but each x still corresponds to its square in y.