Linear#
The torch.nn.Linear layer performs the following operation:

\[Y_{n \times k} = X_{n \times l} \cdot \omega_{k \times l}^T + b_k\]
Where:
\(l\): number of inputs
\(k\): number of outputs
\(n\): number of input samples
\(X_{n \times l}\): input tensor
\(\omega_{k \times l}\): weight matrix of the layer
\(b_k\): bias vector of the layer
import torch
from torch.nn import Linear
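As a quick sanity check, the formula above can be reproduced manually with an ordinary matrix product. This is a minimal sketch; the variable names n, l, k, X, Y and the layer itself are illustrative, not part of the original example.

# Illustrative sketch: reproduce Linear's output as X @ weight.T + bias
n, l, k = 6, 3, 4                          # samples, inputs, outputs
layer = Linear(in_features=l, out_features=k)

X = torch.randn(n, l)                      # input tensor of shape (n, l)
Y = layer(X)                               # output tensor of shape (n, k)

manual = X @ layer.weight.T + layer.bias   # the operation written out by hand
print(torch.allclose(Y, manual))           # True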
Access to parameters#
The parameters of this layer can be accessed through its weight
and bias
attributes.
Here’s an example that overwrites them with fixed values:
linear_layer = Linear(in_features=3, out_features=4)

# Target values: all-ones weights and all-zeros biases
default_weights = torch.ones_like(linear_layer.weight)
default_biases = torch.zeros_like(linear_layer.bias)

# Overwrite the parameters in place without tracking gradients
with torch.no_grad():
    linear_layer.weight.copy_(default_weights)
    linear_layer.bias.copy_(default_biases)
After this operation, the weight
tensor is filled with ones and the bias
tensor with zeros:
print(linear_layer.weight)
print(linear_layer.bias)
Parameter containing:
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], requires_grad=True)
Parameter containing:
tensor([0., 0., 0., 0.], requires_grad=True)
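With these parameters, every output feature reduces to the plain sum of the input features, which gives a quick way to verify the result. The sample tensor below is illustrative and not part of the original example.

# Illustrative check: all-ones weights and zero biases sum the inputs
sample = torch.tensor([[1.0, 2.0, 3.0]])
print(linear_layer(sample))   # every output equals 1 + 2 + 3 = 6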
More dimensions#
Unlike classical matrix multiplication, a Linear layer can operate on tensors with higher dimensionality.
Suppose you have a tensor with dimensions \(\left(d_1, d_2, \dots, d_{m-2}, d_{m-1}, d_m\right)\).
A Linear layer designed for such input will have \(d_m\) input features and \(k\) output features, with a weight matrix \(\omega \in \mathbb{R}^{k \times d_m}\).
The output will retain the shape \(\left(d_1, d_2, \dots, d_{m-2}, d_{m-1}, k\right)\), where each of the \(\prod_{i=1}^{m-2} d_i\) subtensors of size \(d_{m-1} \times d_m\) is independently multiplied by \(\omega^T\).
For example, consider an input of shape \((3, 5, 4)\):
X = torch.randn(3, 5, 4)
X
tensor([[[ 0.6648,  0.0795, -0.3961,  0.0717],
         [ 2.2550,  1.3696, -1.4603,  0.9347],
         [ 0.2754, -0.6647, -0.0767,  0.2089],
         [ 0.7514,  0.3045, -1.1518, -0.4475],
         [-0.8777,  0.4888, -0.1978, -0.9798]],

        [[-1.7346,  0.5344, -1.8987,  0.5710],
         [ 0.5810, -0.0143,  0.7732, -0.3079],
         [-0.6366,  0.5068, -1.8391,  1.4452],
         [-1.1583,  0.9299,  0.6273, -1.8185],
         [ 0.7702, -1.7367, -0.8410, -0.3621]],

        [[ 0.2885, -0.1347,  0.8165, -0.4481],
         [-0.1231,  0.8926, -0.1328,  0.8820],
         [-0.9528,  1.1596, -0.3776, -0.5287],
         [ 0.2178, -0.4286,  1.1390,  1.9489],
         [-0.7107,  2.1834,  0.6254,  1.3248]]])
It is convenient to think of this as 3 matrices, each of size \(5 \times 4\).
A layer that can handle such input is created in the next cell.
linear = Linear(
    in_features=4,
    out_features=2,
    bias=False
)
Applying the layer directly to the data yields \(3\) matrices of size \(5 \times 2\).
linear(X)
tensor([[[-0.1042, -0.1680],
         [-0.8402, -0.3794],
         [ 0.2441, -0.3321],
         [-0.1327, -0.1373],
         [-0.0276,  0.4840]],

        [[ 0.1779, -0.0869],
         [-0.1385,  0.1409],
         [-0.0116, -0.4490],
         [-0.2179,  1.0413],
         [ 0.7147, -0.7942]],

        [[-0.0410,  0.1841],
         [-0.3835,  0.0896],
         [-0.3047,  0.5820],
         [-0.0099, -0.4006],
         [-0.9295,  0.6689]]], grad_fn=<UnsafeViewBackward0>)
The same result can be obtained by taking the input matrices one by one and multiplying each by the transposed weight matrix of the layer.
torch.stack([x @ linear.weight.data.T for x in X])
tensor([[[-0.1042, -0.1680],
         [-0.8402, -0.3794],
         [ 0.2441, -0.3321],
         [-0.1327, -0.1373],
         [-0.0276,  0.4840]],

        [[ 0.1779, -0.0869],
         [-0.1385,  0.1409],
         [-0.0116, -0.4490],
         [-0.2179,  1.0413],
         [ 0.7147, -0.7942]],

        [[-0.0410,  0.1841],
         [-0.3835,  0.0896],
         [-0.3047,  0.5820],
         [-0.0099, -0.4006],
         [-0.9295,  0.6689]]])
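Equivalently, the leading dimensions can be flattened, the whole batch multiplied by \(\omega^T\) in one ordinary matrix product, and the result reshaped back. This is a small sketch of the same idea rather than part of the original example.

# Illustrative sketch: flatten the leading dimensions, multiply, reshape back
flat = X.reshape(-1, 4)                               # (3*5, 4)
out = (flat @ linear.weight.data.T).reshape(3, 5, 2)  # (3, 5, 2)
print(torch.allclose(out, linear(X)))                 # True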