Normalisation layers#

This page discusses layers that normalize input data.

import torch

Batch norm#

With torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, and torch.nn.BatchNorm3d, you can normalize the data passed through those layers. In training mode they normalize each batch using that batch's own statistics while updating running estimates of the mean and variance; in inference mode they simply apply the accumulated running statistics.
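These running statistics live in the layer's running_mean and running_var buffers. The following minimal sketch (with made-up toy data) shows how a training-mode forward pass moves them toward the current batch's statistics, scaled by the default momentum of 0.1.

bn = torch.nn.BatchNorm1d(num_features=3)
bn.running_mean, bn.running_var  # initialized to zeros and ones

batch = torch.normal(2, 1, [10, 3])
bn(batch)  # training mode: running statistics are updated
# now roughly (1 - 0.1) * 0 + 0.1 * batch.mean(axis=0), with the default momentum=0.1
bn.running_mean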


Consider a two-dimensional example where each object is represented by a row vector.

objects_number = 5
features_number = 4
X = (
    torch.normal(0, 1, [objects_number, features_number])
    + torch.arange(features_number)
)
X
tensor([[ 2.2821,  0.4937,  2.2798,  3.1782],
        [ 0.8915,  0.6284,  0.9562,  2.9096],
        [ 0.3376,  0.5322, -0.5936,  2.1784],
        [-0.2011, -0.0289,  2.5192,  3.6398],
        [ 0.0640,  0.2830,  1.4539,  3.7226]])

Let’s examine standardization implemented manually: simply subtract the mean and divide by the standard deviation along each column. Note that Tensor.std applies Bessel’s correction by default, i.e. it uses the unbiased estimator.

(X - X.mean(axis=0)) / X.std(axis=0)
tensor([[ 1.6310,  0.4277,  0.7702,  0.0838],
        [ 0.2199,  0.9421, -0.2954, -0.3453],
        [-0.3422,  0.5747, -1.5431, -1.5133],
        [-0.8889, -1.5676,  0.9630,  0.8212],
        [-0.6199, -0.3769,  0.1053,  0.9536]])

The following cell applies the most straightforward of these layers, torch.nn.BatchNorm1d, with its default settings.

bn = torch.nn.BatchNorm1d(num_features=features_number)
bn(X)
tensor([[ 1.8235,  0.4781,  0.8612,  0.0937],
        [ 0.2459,  1.0532, -0.3303, -0.3860],
        [-0.3826,  0.6425, -1.7252, -1.6919],
        [-0.9938, -1.7524,  1.0766,  0.9181],
        [-0.6930, -0.4213,  0.1177,  1.0661]],
       grad_fn=<NativeBatchNormBackward0>)

We obtained results very close to the manual normalization, but not perfectly identical. The discrepancy has two sources: in training mode the layer normalizes each batch with the biased variance, whereas Tensor.std applies Bessel’s correction, and it adds a small eps (1e-5 by default) under the square root for numerical stability.
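To confirm this, assuming only the default eps value, we can reproduce the training-mode computation by hand; the result matches the layer’s output above.

eps = 1e-5  # default eps of torch.nn.BatchNorm1d
(X - X.mean(axis=0)) / torch.sqrt(X.var(axis=0, unbiased=False) + eps)

Conversely, the layer can be made to reproduce the classical calculation: set eps=0, disable the affine transformation, and use momentum=1 so that the running statistics are exactly the statistics of the last batch. Because the running variance is stored with Bessel’s correction, switching the layer to evaluation mode then applies the same unbiased statistics as the manual computation. The following cell replicates these conditions.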

bn = torch.nn.BatchNorm1d(
    num_features=features_number, eps=0, affine=False, momentum=1
)
bn(X)  # training pass: the running statistics become this batch's statistics
bn.eval()  # switch to inference mode
bn(X)  # normalize with the stored (unbiased) running statistics
tensor([[ 1.6310,  0.4277,  0.7702,  0.0838],
        [ 0.2199,  0.9421, -0.2954, -0.3453],
        [-0.3422,  0.5747, -1.5431, -1.5133],
        [-0.8889, -1.5676,  0.9630,  0.8212],
        [-0.6199, -0.3769,  0.1053,  0.9536]])

This time the result is numerically identical to the manual implementation.
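
The higher-dimensional variants mentioned at the beginning behave the same way, but compute statistics per channel. As a small sketch (with arbitrary tensor shapes), torch.nn.BatchNorm2d normalizes each channel over the batch and both spatial dimensions:

bn2d = torch.nn.BatchNorm2d(num_features=3, eps=0, affine=False)
images = torch.normal(0, 1, [4, 3, 5, 5])  # (batch, channels, height, width)
out = bn2d(images)
# each channel now has zero mean and unit (biased) variance over (batch, height, width)
out.mean(dim=(0, 2, 3)), out.var(dim=(0, 2, 3), unbiased=False)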