Forward input#

This page discusses the input of the forward pass of the recurrent layer.

import torch
from torch.nn import RNN

Batched vs unbatched#

You can pass either batched or unbatched input to the RNN layers. Batched input means that a group of sequences is passed to the layer at once.


Consider what options might be available with the layer using the parameters defined in the following cell:

input_size = 2
hidden_size = 3
sequence_len = 10

rnn = RNN(input_size=input_size, hidden_size=hidden_size)

If you pass a two-dimensional input, it is treated as unbatched input. You can think of it as a sequence of vectors. The following code demonstrates this:

input = torch.randn(sequence_len, input_size)
rnn(input)
(tensor([[-0.7128,  0.2305,  0.6574],
         [-0.6411,  0.1731,  0.8624],
         [-0.9517, -0.0183,  0.7720],
         [-0.7536,  0.2490,  0.8430],
         [-0.4232, -0.1032,  0.8938],
         [-0.4463,  0.0858,  0.8568],
         [-0.6446,  0.1017,  0.8546],
         [-0.5241, -0.5186,  0.8515],
         [-0.7050,  0.0501,  0.7555],
         [-0.5168, -0.2317,  0.8506]], grad_fn=<SqueezeBackward1>),
 tensor([[-0.5168, -0.2317,  0.8506]], grad_fn=<SqueezeBackward1>))

The first element of the result holds the hidden state at each step of the sequence; the second element is the final hidden state.
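As a sanity check (a small sketch with the same single-layer setup, not part of the original example), the last row of the output coincides with the returned final hidden state:

```python
import torch
from torch.nn import RNN

torch.manual_seed(0)

rnn = RNN(input_size=2, hidden_size=3)
input = torch.randn(10, 2)  # unbatched: (sequence_len, input_size)
output, hidden = rnn(input)

# output holds the hidden state at every step; hidden holds only the last one
print(output.shape)  # torch.Size([10, 3])
print(hidden.shape)  # torch.Size([1, 3])
print(torch.allclose(output[-1], hidden[0]))  # True
```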

The other variant passes a set of sequences to the layer at once. The following cell demonstrates passing a batch of samples_number sequences through the RNN layer.

samples_number = 5
input = torch.randn(sequence_len, samples_number, input_size)
output, hidden = rnn(input)
output.shape, hidden.shape
(torch.Size([10, 5, 3]), torch.Size([1, 5, 3]))

As a result, you get a matrix for each step of the sequence; each row of the matrix corresponds to one element of the batch.
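To make this layout concrete, the following sketch (the slicing variable names are my own) cuts the batched output along both axes:

```python
import torch
from torch.nn import RNN

rnn = RNN(input_size=2, hidden_size=3)
# default layout is (sequence_len, batch, input_size), since batch_first=False
input = torch.randn(10, 5, 2)
output, hidden = rnn(input)

step_matrix = output[0]      # hidden states of all 5 samples at the first step
sample_track = output[:, 0]  # hidden states of the first sample at every step
print(step_matrix.shape)     # torch.Size([5, 3])
print(sample_track.shape)    # torch.Size([10, 3])
```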

Hidden state#

With the second argument of the forward method, you can pass the initial hidden states with which the layer should start. It should be a tensor of dimensionality \(\left(D \times L , H_{out} \right)\) for unbatched input and \(\left(D \times L, N , H_{out} \right)\) for batched input.

Where:

  • \(D\): 2 if the layer is bidirectional and 1 otherwise.

  • \(H_{out}\): Size of the vector describing the hidden state.

  • \(N\): Batch size.

  • \(L\): Number of layers.
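The shape rule above can be checked directly. The following sketch (the parameter values are illustrative assumptions) builds an initial hidden state of dimensionality \(\left(D \times L, N , H_{out} \right)\) and confirms that the returned hidden state has the same shape:

```python
import torch
from torch.nn import RNN

D, L, N, H_out = 2, 3, 5, 4  # bidirectional, 3 layers, batch of 5, hidden size 4

rnn = RNN(input_size=2, hidden_size=H_out, num_layers=L, bidirectional=(D == 2))
h0 = torch.zeros(D * L, N, H_out)  # (D x L, N, H_out) for batched input
output, hidden = rnn(torch.randn(7, N, 2), h0)
print(hidden.shape)  # torch.Size([6, 5, 4])
```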


Consider an example with parameters defined in the following cell.

input_size = 2
hidden_size = 3
sequence_len = 10
samples_number = 5

Note the second argument of the forward method: it is the initial hidden state, one for each sample in the batch.

rnn = RNN(input_size=input_size, hidden_size=hidden_size)
output, hidden = rnn(
    torch.randn(sequence_len, samples_number, input_size),
    torch.randn(1, samples_number, hidden_size)
)
output.shape
torch.Size([10, 5, 3])

Note: The extra outer dimension of the hidden state might look redundant, but it appears because we are dealing with a layer that has just one unidirectional layer, so \(D \times L = 1\).

The following cell defines an RNN with the arguments num_layers=3 and bidirectional=True. The number of hidden states increases accordingly: \(D \times L = 2 \times 3 = 6\).

num_layers = 3
rnn = RNN(
    input_size=input_size, 
    hidden_size=hidden_size, 
    bidirectional=True, 
    num_layers=num_layers
)
output, hidden = rnn(
    torch.randn(sequence_len, samples_number, input_size),
    torch.randn(num_layers*2, samples_number, hidden_size)
)
output.shape
torch.Size([10, 5, 6])
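As a follow-up sketch (not part of the original example), the \(2 \times L\) hidden states can be viewed as (layer, direction) pairs, and the doubled last dimension of the output comes from concatenating both directions:

```python
import torch
from torch.nn import RNN

input_size, hidden_size, num_layers = 2, 3, 3
sequence_len, samples_number = 10, 5

rnn = RNN(input_size=input_size, hidden_size=hidden_size,
          bidirectional=True, num_layers=num_layers)
output, hidden = rnn(torch.randn(sequence_len, samples_number, input_size))

# the 2 * num_layers hidden states split into (layer, direction) pairs
per_direction = hidden.view(num_layers, 2, samples_number, hidden_size)
print(per_direction.shape)  # torch.Size([3, 2, 5, 3])

# the output concatenates both directions, hence the last dimension of 6
print(output.shape)  # torch.Size([10, 5, 6])
```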