Managing network#
torch.nn.Module is a class that allows you to define complex neural networks in Torch. Simply create your own class as its descendant.
import torch
from torch import nn
from pathlib import Path
from copy import deepcopy
Sequential#
You can use torch.nn.Sequential
to combine multiple network layers into a sequential chain. Find out more on the specific page.
The following cell demonstrates a basic example where a linear transformation is applied to the input, followed by a ReLU activation function.
size = 3
sequential = torch.nn.Sequential(
    torch.nn.Linear(size, size, bias=False),
    torch.nn.ReLU()
)
X = torch.randn([3, 3])
sequential(X)
tensor([[0.0000, 0.0000, 0.8781],
[0.4362, 0.0000, 0.7350],
[0.0000, 0.0000, 1.1225]], grad_fn=<ReluBackward0>)
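A Sequential container also supports indexing, so you can access an individual layer of the chain; this is used later when freezing parameters. A quick illustration, not part of the original example:
# the first step of the chain is the Linear layer defined above
sequential[0]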
Separate class#
You can define a neural network as a separate class, which allows you to add custom logic for initialization or network-specific procedures. To create a network class, follow these rules:

- Inherit from torch.nn.Module: this establishes your class as a PyTorch module, providing access to its functionality.
- Call super().__init__() in the constructor: this initializes the base nn.Module class, ensuring proper setup.
- Define a forward method: this method implements the computational procedure of your network. It defines how input data flows through your layers to produce output.
The following cell defines a set of Linear layers whose size is determined during class creation. The forward method standardizes the data before applying the network.
class ExampleNetwork(torch.nn.Module):
    def __init__(self, layers_number: int, neurons: int):
        super().__init__()
        self.network = torch.nn.Sequential(*[
            torch.nn.Linear(neurons, neurons)
            for i in range(layers_number)
        ])

    def forward(self, X: torch.Tensor):
        X = (X - X.mean(axis=0, keepdim=True)) / X.std(axis=0, keepdim=True)
        return self.network(X)
Let’s check if the network we’ve defined works as expected.
ExampleNetwork(layers_number=10, neurons=3)(X=torch.randn([5, 3]))
tensor([[-0.2482, 0.0882, 0.4507],
[-0.2465, 0.0897, 0.4466],
[-0.2531, 0.0827, 0.4587],
[-0.2463, 0.0899, 0.4459],
[-0.2461, 0.0892, 0.4429]], grad_fn=<AddmmBackward0>)
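Inheriting from torch.nn.Module also gives the class its standard behavior for free; for instance, printing an instance lists all registered submodules. A small illustration:
# the repr provided by nn.Module shows every registered layer
print(ExampleNetwork(layers_number=2, neurons=3))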
Parameters#
To optimize a network properly, you need tools that allow you to access its parameters and manage them. In this section, we consider typical methods that help to manage model parameters in Torch.
Access parameters#
To access model parameters, use the torch.nn.Module.parameters
method. This method returns a generator that iterates over the parameters of all layers in the network.
Check the official documentation on the parameters
method.
In the following cell we have an empty nn.Module, so when we unpack its parameters generator into a list, we get just an empty list:
class EmptyNetwork(nn.Module):
    pass
empty_network = EmptyNetwork()
[i for i in empty_network.parameters()]
[]
This cell implements such a descendant of nn.Module, which takes its parameters from its fields. To be more specific, two fully connected layers are defined here, so we end up with four tensors: the weight matrices of the fully connected layers and their biases:
class ParametersNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.foo = nn.Linear(3, 3)
        self.bar = nn.Linear(5, 5)

network = ParametersNetwork()
for i in network.parameters():
    print(i.data)
tensor([[-0.4889, -0.2448, -0.1750],
[ 0.0770, -0.0333, 0.2421],
[-0.0755, -0.2302, -0.4851]])
tensor([ 0.1668, -0.5771, 0.4508])
tensor([[ 0.1828, -0.3526, -0.3598, 0.4468, -0.2286],
[-0.0492, 0.3426, 0.2613, 0.2133, -0.2792],
[-0.2052, 0.2514, 0.0616, 0.4382, -0.2944],
[-0.2796, -0.0471, -0.4185, 0.4359, 0.2697],
[ 0.3577, 0.4372, -0.0179, 0.1575, 0.2003]])
tensor([-0.4275, 0.0130, 0.4131, -0.2934, 0.3826])
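There is also the torch.nn.Module.named_parameters method, which additionally yields the attribute path of each parameter tensor; it is used in the next section. A quick sketch with the same network:
# named_parameters() yields (name, parameter) pairs, e.g. "foo.weight" and "foo.bias"
for name, parameter in network.named_parameters():
    print(name, tuple(parameter.shape))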
Requires grad#
You can manipulate the set of parameters that will compute gradients in a neural network. You can do this directly by accessing the parameters and setting their requires_grad
attribute to False
. However, there is a requires_grad_()
method that allows you to set the gradient property for all weights of an nn.Module
.
The following cell defines a network and prints the requires_grad
attribute of its weights.
torch.manual_seed(10)
model = torch.nn.Sequential(
    torch.nn.Linear(in_features=10, out_features=10),
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=10, out_features=1),
)

for name, parameters in model.named_parameters():
    print(name, parameters.requires_grad)
0.weight True
0.bias True
2.weight True
2.bias True
By default, all parameters require gradients. The following cell applies requires_grad_(False)
to the entire network and then sets requires_grad_(True)
for just one of the layers.
model.requires_grad_(False)
model[2].requires_grad_(True)
for name, parameters in model.named_parameters():
    print(name, parameters.requires_grad)
0.weight False
0.bias False
2.weight True
2.bias True
As a result, only the corresponding parameters will require gradients. During the optimization process, only those parameters that require gradients will be updated.
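To illustrate this, you can run a single backward pass and check which parameters actually accumulated gradients (a minimal sketch, using a random input that is not part of the original example):
# only parameters with requires_grad=True receive a .grad tensor after backward()
loss = model(torch.randn(4, 10)).sum()
loss.backward()
for name, parameters in model.named_parameters():
    print(name, parameters.grad is not None)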
Saving model#
Saving and loading PyTorch models is crucial because any model you build needs to be transferred and deployed in some way. Check the official tutorial. Here, we’ll experiment with the options from the tutorial.
There are various ways to save a Torch model:

- Convert the model to a state dict and save the state dict.
- Apply the torch.save function directly to the model.
- Use TorchScript.

All of these methods are discussed on the specific page.
The preferred method is to save by converting the model into a state dict. The following code shows ways to accomplish this.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 3),
    torch.nn.Linear(3, 3)
)

torch.save(
    obj=model.state_dict(),
    f=Path("/tmp")/"model.pth"
)
Now, by using torch.load
, you can extract the state dict you saved earlier.
state_dict = torch.load(f=Path("/tmp")/"model.pth", weights_only=False)
state_dict
OrderedDict([('0.weight',
tensor([[ 0.4638, -0.0362, -0.3954],
[-0.5537, -0.2766, -0.4095],
[ 0.0272, 0.4614, -0.5757]])),
('0.bias', tensor([-0.5566, -0.5410, -0.0075])),
('1.weight',
tensor([[ 0.1332, -0.5475, 0.4317],
[ 0.1927, -0.4506, -0.2670],
[-0.3406, -0.4758, 0.4249]])),
('1.bias', tensor([ 0.4226, -0.5480, -0.3352]))])
When you have the state dict, you can easily load it into the model using torch.nn.Module.load_state_dict
, as shown in the following code.
model.load_state_dict(state_dict)
<All keys matched successfully>
Copying model#
There are many cases where you will need to make a copy of a torch model. However, there are some issues associated with this. This section discusses those issues.
An obvious example is that copying through the =
operator merely assigns a new name to the same object. The following cell creates a simple torch.nn.Module
and displays its parameters.
model = torch.nn.Linear(in_features=3, out_features=3, bias=False)
next(iter(model.parameters()))
Parameter containing:
tensor([[-0.3469, 0.2282, -0.4584],
[ 0.3923, -0.0788, -0.3660],
[ 0.3362, -0.4471, -0.4651]], requires_grad=True)
Now, suppose, you’ve made a copy of the model using the =
operator.
copy_model = model
Now, imagine the original object was edited in some way—the following cell assigns a matrix of ones to the parameters we saw before. The key point is that the parameters of the “copy” were also edited.
next(iter(model.parameters())).data = torch.ones(3, 3)
next(iter(copy_model.parameters()))
Parameter containing:
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]], requires_grad=True)
Deep copy#
One option is to use the “classical” Python copy.deepcopy
.
The following cell demonstrates that a copy made through deepcopy
preserves all the properties of the original model at the moment of creation.
model = torch.nn.Linear(in_features=3, out_features=3, bias=False)
models_copy = deepcopy(model)
next(iter(model.parameters())).data = torch.ones(3, 3)
next(iter(models_copy.parameters()))
Parameter containing:
tensor([[-0.0346, -0.4097, 0.0998],
[ 0.2439, 0.1766, -0.2577],
[-0.0638, -0.3555, 0.1458]], requires_grad=True)
Note: deepcopy
will replicate the model on the device the original model is on. The following cell attempts to deepcopy
a model on the GPU, and the copied model will also be on the GPU.
model = torch.nn.Linear(
    in_features=3,
    out_features=10,
    bias=False
).to(
    device=torch.device("cuda")
)
model_copy = deepcopy(model)
next(iter(model_copy.parameters())).device.type
'cuda'
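If you need the copy on the CPU instead, you can move it explicitly right after copying. A small sketch based on the same model:
# deepcopy keeps the original device, so move the copy afterwards if needed
model_cpu_copy = deepcopy(model).to("cpu")
next(iter(model_cpu_copy.parameters())).device.type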
State dict#
Another option is to recreate the model and pass the state_dict of the original model to the load_state_dict method of the recreated model.
In the following cell, this trick is demonstrated — copy_model
still retains the parameters of the copied model as they were at the moment of copying.
model = torch.nn.Linear(
    in_features=3,
    out_features=3,
    bias=False
)
copy_model = torch.nn.Linear(
    in_features=3,
    out_features=3,
    bias=False
)
copy_model.load_state_dict(model.state_dict())
next(iter(model.parameters())).data = torch.zeros(3, 3)
next(iter(copy_model.parameters()))
Parameter containing:
tensor([[-0.2850, -0.3748, 0.5246],
[ 0.0166, -0.5607, -0.5215],
[ 0.4310, -0.5429, -0.2672]], requires_grad=True)
And there aren’t any issues with GPUs. The copied model will retain the device it had during creation — meaning the state dict will be loaded to the cpu
, which is the preferable option in most cases.
model = torch.nn.Linear(
    in_features=3,
    out_features=10,
    bias=False
).to(device=torch.device("cuda"))

copy_model = torch.nn.Linear(
    in_features=3,
    out_features=10,
    bias=False
)
copy_model.load_state_dict(state_dict=model.state_dict())
next(iter(copy_model.parameters())).device.type
'cpu'
Float type#
Torch typically stores tensors of parameters as float32
. Suppose, for some reason, you want to handle everything in floats of a different precision. You can achieve this by changing the dtype
of each parameter tensor in your network.
Consider a simple instance of torch.nn.Module:
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=4),
    nn.Linear(in_features=4, out_features=7)
)
If you try to pass a tensor with the torch.float16
dtype, you will get an error.
X = torch.randn(20, 10, dtype=torch.float16)
try:
    model(X)
except Exception as e:
    print(e)
mat1 and mat2 must have the same dtype, but got Half and Float
But after iterating over all parameters of the network and changing their types to float16
, everything works fine.
for p in model.parameters():
    p.data = p.data.to(torch.float16)
model(X)
tensor([[ 0.4626, 0.5537, 0.5444, -0.9111, -0.7881, 0.7598, -0.2137],
[ 0.2803, 0.2041, 0.1069, -0.4792, 0.4983, -0.0650, -0.7432],
[ 0.2164, 0.2216, 0.0839, -0.4746, 0.7012, 0.0741, -0.7227],
[ 0.3647, 0.4707, 0.0446, -0.0297, 0.2322, 0.1843, -0.3499],
[ 0.3674, 0.3972, 0.0907, -0.2739, 0.1069, 0.1489, -0.5220],
[ 0.2817, 0.4045, 0.0179, -0.0303, 0.6118, 0.1503, -0.3899],
[ 0.3469, 0.6929, -0.1037, 0.4578, 0.3857, 0.4629, -0.0354],
[ 0.3003, 0.6733, -0.0101, 0.0889, 0.2793, 0.7119, -0.1522],
[ 0.3501, 0.1066, 0.2686, -0.5713, 0.3899, -0.3650, -0.6670],
[ 0.3330, 0.4702, -0.1165, 0.2805, 0.5518, 0.0474, -0.3538],
[ 0.3865, 0.2443, 0.2698, -0.4753, 0.1649, -0.1420, -0.5107],
[ 0.3003, 0.2277, 0.1753, -0.5820, 0.3350, 0.0219, -0.7061],
[ 0.3093, 0.3950, -0.0828, 0.0101, 0.4932, 0.0659, -0.5503],
[ 0.3025, 0.0906, 0.4180, -1.2441, 0.0133, 0.0803, -0.9263],
[ 0.3074, 0.6479, 0.3215, -0.8247, -0.4050, 1.1904, -0.3408],
[ 0.2764, 0.1730, 0.1339, -0.4465, 0.6064, -0.1582, -0.6943],
[ 0.1396, 0.2974, -0.1041, -0.3875, 0.8413, 0.3357, -0.8579],
[ 0.2384, 0.1720, 0.3137, -0.9600, 0.3567, 0.2079, -0.7729],
[ 0.3989, 0.4893, 0.2861, -0.2346, 0.0409, 0.2910, -0.1187],
[ 0.2690, 0.4705, 0.0439, -0.2369, 0.3794, 0.4604, -0.4500]],
dtype=torch.float16, grad_fn=<AddmmBackward0>)
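The loop over parameters above is mostly for illustration: nn.Module.to also accepts a dtype (and there is a half() shortcut), which converts all floating-point parameters and buffers of the module in one call. A small sketch:
# converts every floating-point parameter of the module to float16 at once
model = model.to(torch.float16)
model(X).dtype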