# Transformers

This page discusses the PyTorch layers that implement the transformer architecture.

```python
import torch
```

## Encoder layer

The encoder layer is implemented by `torch.nn.TransformerEncoderLayer`.

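The crucial parameters:

- `d_model`: the dimensionality of each element of the input sequence (the embedding size).
- `nhead`: the number of attention heads in the multi-head self-attention block; `d_model` must be divisible by `nhead`.
- `batch_first`: if `True`, the input has the shape `(batch, sequence, feature)`; with the default `False`, it is `(sequence, batch, feature)`.

The output of the layer has the same shape as the input.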
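The sketch below illustrates two details of these parameters: `d_model` must be divisible by `nhead` (each head works on a `d_model // nhead`-dimensional projection), and `batch_first` decides whether the batch axis or the sequence axis comes first. The sizes `d_model=8` and `nhead=2` are arbitrary values chosen for illustration.

```python
import torch

# Each head attends over a d_model // nhead dimensional projection.
layer = torch.nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True)
print(layer.self_attn.head_dim)  # 4

# A combination where d_model is not divisible by nhead is rejected at construction.
try:
    torch.nn.TransformerEncoderLayer(d_model=5, nhead=2)
    rejected = False
except (AssertionError, ValueError):
    rejected = True
print("d_model=5, nhead=2 rejected:", rejected)

# batch_first=True: input is (batch, sequence, feature).
x_bf = torch.rand(32, 10, 8)
print(layer(x_bf).shape)  # torch.Size([32, 10, 8])

# batch_first=False (the default): input is (sequence, batch, feature).
layer_sf = torch.nn.TransformerEncoderLayer(d_model=8, nhead=2)
x_sf = torch.rand(10, 32, 8)
print(layer_sf(x_sf).shape)  # torch.Size([10, 32, 8])
```

In both layouts the output keeps the shape of the input; only the meaning of the first two axes changes.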

---

The following cell defines a regular encoder layer.

```python
encoder = torch.nn.TransformerEncoderLayer(
    d_model=5, nhead=1, batch_first=True
)
```

The following cell defines an input of the kind the layer expects to process:

```python
inp = torch.rand(32, 10, 5)
```

This is a batch of 32 sequences, each consisting of 10 elements from $\mathbb{R}^5$.

```python
encoder(inp).shape
```

```
torch.Size([32, 10, 5])
```
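The sequence length of 10 is not fixed by the layer: self-attention has no weights tied to sequence positions, so the same layer accepts sequences of any length, and only the last dimension has to equal `d_model`. A minimal sketch, reusing the `d_model=5`, `nhead=1` configuration from above (the batch size of 4 and the lengths are arbitrary):

```python
import torch

encoder = torch.nn.TransformerEncoderLayer(d_model=5, nhead=1, batch_first=True)

# Same layer, different sequence lengths: only the feature size must be d_model.
for seq_len in (1, 10, 50):
    out = encoder(torch.rand(4, seq_len, 5))
    print(seq_len, tuple(out.shape))

# A feature size other than d_model is rejected.
try:
    encoder(torch.rand(4, 10, 6))
    mismatch_rejected = False
except (RuntimeError, AssertionError, ValueError):
    mismatch_rejected = True
print("feature size 6 rejected:", mismatch_rejected)
```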

## Transformer encoder

A transformer encoder (`torch.nn.TransformerEncoder`) stacks a specified number (`num_layers`) of copies of an encoder layer. The output of each layer becomes the input of the next one.
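A sketch of what this stacking means in practice: the encoder keeps its layers in the `layers` attribute, each one a deep copy of the layer passed in, and in eval mode (dropout disabled) calling the encoder gives the same result as feeding the input through those layers one after another. The sizes are the ones used in this page; `torch.manual_seed(0)` is only there for reproducibility.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.TransformerEncoderLayer(d_model=5, nhead=1, batch_first=True)
stack = torch.nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=False)

# The stack holds num_layers deep copies of `layer`, not references to it.
print(len(stack.layers))                   # 2
print(stack.layers[0] is layer)            # False
print(stack.layers[0] is stack.layers[1])  # False
# Deep copies start out with identical parameter values.
print(torch.equal(stack.layers[0].linear1.weight, stack.layers[1].linear1.weight))  # True

# In eval mode (dropout off) the stack equals a manual loop over its layers.
stack.eval()
inp = torch.rand(32, 10, 5)
x = inp
for mod in stack.layers:
    x = mod(x)  # the output of one layer feeds the next
print(torch.allclose(stack(inp), x))  # True
```

Because the copies begin with the same weights, they only become different from each other during training.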

---

The next cell shows how to initialize a transformer encoder.

```python
transformer_encoder = torch.nn.TransformerEncoder(
    encoder_layer=torch.nn.TransformerEncoderLayer(
        d_model=5, nhead=1, batch_first=True
    ),
    num_layers=2,
    enable_nested_tensor=False
)
```

Here is an example input that the `TransformerEncoder` defined above can process.

```python
inp = torch.rand(32, 10, 5)
```

Passing the input through the encoder produces an output of the same shape.

```python
transformer_encoder(inp).shape
```

```
torch.Size([32, 10, 5])
```

### Nested tensor

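`TransformerEncoder` has an `enable_nested_tensor` parameter (`True` by default). The `True` value allows PyTorch to use a special data structure, the nested tensor, which is optimized for batches of sequences with different lengths: each sequence is stored without padding, so computation on padding positions can be skipped.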
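A minimal look at the data structure itself, assuming a PyTorch version that ships the `torch.nested` module (it is marked as a prototype API and may print a warning). Two sequences of lengths 3 and 7 are stored as one nested tensor without padding; `torch.nested.to_padded_tensor` converts it back into a regular tensor padded to the longest sequence.

```python
import torch

# Two sequences of different lengths (3 and 7), each element from R^5.
seqs = [torch.rand(3, 5), torch.rand(7, 5)]
nt = torch.nested.nested_tensor(seqs)
print(nt.is_nested)  # True

# Unbinding recovers the original, unpadded sequences.
print([tuple(t.shape) for t in nt.unbind()])  # [(3, 5), (7, 5)]

# A regular tensor needs padding up to the longest sequence (here with zeros).
padded = torch.nested.to_padded_tensor(nt, 0.0)
print(padded.shape)  # torch.Size([2, 7, 5])
```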

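**Note:** The nested-tensor path is available only for certain configurations of the encoder layer, for example `batch_first=True` and an even number of attention heads (`nhead`). If the configuration does not qualify, PyTorch turns the nested-tensor path off and emits a warning explaining why. In recent versions this decision is stored in the internal `use_nested_tensor` attribute, while `enable_nested_tensor` keeps the value that was passed.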

---

The following cell attempts to define a `TransformerEncoder` with `enable_nested_tensor=True` on top of a `TransformerEncoderLayer` whose configuration does not qualify for nested tensors (`d_model=5`, `nhead=1`).

```python
transformer_encoder = torch.nn.TransformerEncoder(
    encoder_layer=torch.nn.TransformerEncoderLayer(
        d_model=5, nhead=1, batch_first=True
    ),
    num_layers=2,
    enable_nested_tensor=True
)
```
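For contrast with the cell above, a sketch of a configuration that can use the nested-tensor path. Assumptions: an even `nhead`, `batch_first=True`, and a recent PyTorch version in which the path is taken only in inference (eval mode, gradients disabled) and only when a `src_key_padding_mask` is passed, with `True` marking padding positions placed at the end of each sequence; otherwise PyTorch falls back to the regular computation. The sizes and lengths are arbitrary.

```python
import torch

# Even number of heads and batch_first=True, so the nested-tensor path is allowed.
layer = torch.nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True)
encoder = torch.nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)
encoder.eval()  # nested tensors are only used in inference mode

# Two sequences padded to length 10: the first has 6 real elements, the second 10.
src = torch.rand(2, 10, 8)
lengths = torch.tensor([6, 10])
# True marks padding positions; padding sits at the end of each sequence.
padding_mask = torch.arange(10).unsqueeze(0) >= lengths.unsqueeze(1)

with torch.no_grad():
    out = encoder(src, src_key_padding_mask=padding_mask)

print(out.shape)  # torch.Size([2, 10, 8])
# Expected to be 0 if the nested-tensor path was taken (padded outputs are zero-filled).
print(out[0, 6:].abs().sum().item())
```

Whichever path runs, the output has the padded shape of the input, and only the positions of real elements carry meaningful values.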

