In PyTorch, the nn.RNN class is used to build a recurrent neural network over sequences. Its constructor is as follows:
nn.RNN(input_size, hidden_size, num_layers=1, nonlinearity='tanh', bias=True, batch_first=False, dropout=0, bidirectional=False)
- The structure of an RNN:
An RNN can be viewed as the same neural network replicated across time steps, with each module passing a message to the next one; unrolling the loop makes this chain structure explicit.
- The parameters are explained below:
  - input_size: The number of expected features in the input x, i.e. the dimensionality of the input features. An RNN's inputs are typically word vectors, in which case input_size equals the dimension of a single word vector.
  - hidden_size: The number of features in the hidden state h, i.e. the number of hidden-layer neurons, also called the output dimension (since the RNN outputs the hidden state at each time step).
  - num_layers: Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1. In other words, the number of layers in the network.
  - nonlinearity: The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'. This is the activation function.
  - bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True. Whether to use bias terms.
  - batch_first: If True, then the input and output tensors are provided as (batch, seq, feature). Default: False. This controls the layout of the input data: with the default False, the format is (seq_len (num_steps), batch, input_size), i.e. the sequence length comes first and the batch size second; setting it to True swaps them to (batch, seq_len, input_size).
  - dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0. Dropout is off by default; to enable it, set this to a number between 0 and 1.
  - bidirectional: If True, becomes a bidirectional RNN. Default: False. Whether to use a bidirectional RNN; off by default.
The most important parameters of nn.RNN() are input_size and hidden_size; make sure you understand these two. The remaining parameters usually do not need to be set and can be left at their default values.
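As a quick illustration of how the constructor arguments above interact, here is a minimal sketch; all sizes are made-up values chosen only for demonstration:

```python
import torch
from torch import nn

# Illustrative sizes, not from the example later in this post.
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2,
             nonlinearity='relu', batch_first=True,
             dropout=0.5, bidirectional=True)

# With batch_first=True the input is (batch, seq, feature).
x = torch.randn(3, 7, 10)   # batch=3, seq_len=7, input_size=10
output, h_n = rnn(x)
print(output.shape)         # torch.Size([3, 7, 40]): num_directions * hidden_size = 40
print(h_n.shape)            # torch.Size([4, 3, 20]): num_layers * num_directions = 4
```

Note that batch_first only changes the layout of input and output; h_n always keeps the (num_layers * num_directions, batch, hidden_size) layout.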
- Input and output shapes of the RNN
  - Inputs: input, h_0
    - input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable-length sequence. See torch.nn.utils.rnn.pack_padded_sequence or torch.nn.utils.rnn.pack_sequence for details (a minimal sketch follows this list).
    - h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
  - Outputs: output, h_n
    - output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
    - h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size).
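Since the docs quoted above mention packed variable-length sequences, here is a minimal runnable sketch of packing a padded batch before feeding it to nn.RNN; all sizes and lengths are made up for illustration:

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.RNN(input_size=4, hidden_size=6)    # illustrative sizes

# Two sequences of different lengths, padded to the same seq_len.
seqs = torch.randn(5, 2, 4)                  # (seq_len=5, batch=2, input_size=4)
lengths = torch.tensor([5, 3])               # actual length of each sequence (sorted descending)

packed = pack_padded_sequence(seqs, lengths) # pack the padded batch
packed_out, h_n = rnn(packed)                # the output is also a packed sequence
output, out_lengths = pad_packed_sequence(packed_out)

print(output.shape)  # torch.Size([5, 2, 6]): (seq_len, batch, hidden_size)
print(h_n.shape)     # torch.Size([1, 2, 6]): (num_layers * num_directions, batch, hidden_size)
```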
Shape:
  - Input1: (L, N, H_in) tensor containing input features, where H_in = input_size and L represents a sequence length.
  - Input2: (S, N, H_out) tensor containing the initial hidden state for each element in the batch, where S = num_layers * num_directions and H_out = hidden_size. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
  - Output1: (L, N, H_all) where H_all = num_directions * hidden_size.
  - Output2: (S, N, H_out) tensor containing the next hidden state for each element in the batch.
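The docs above note that for a bidirectional RNN the two directions can be separated with output.view; a minimal sketch (sizes are illustrative):

```python
import torch
from torch import nn

rnn = nn.RNN(input_size=4, hidden_size=6, bidirectional=True)  # illustrative sizes
x = torch.randn(5, 2, 4)                                       # (seq_len, batch, input_size)
output, h_n = rnn(x)

print(output.shape)  # torch.Size([5, 2, 12]): num_directions * hidden_size = 12
# Separate the two directions: forward is direction 0, backward is direction 1.
directions = output.view(5, 2, 2, 6)  # (seq_len, batch, num_directions, hidden_size)
forward_out, backward_out = directions[:, :, 0], directions[:, :, 1]
print(forward_out.shape, backward_out.shape)  # both torch.Size([5, 2, 6])
```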
Input shape: input_shape = [num_steps (seq_length), batch_size, input_size], i.e. [number of time steps, batch size, feature dimension].
After the forward computation, the output and the hidden state are returned separately. The output refers to the hidden states that the hidden layer computes and emits at each time step; these usually serve as the input to a subsequent output layer. It should be emphasized that this "output" does not itself involve any output-layer computation; its shape is output_shape = [num_steps (seq_length), batch_size, hidden_size]. The hidden state refers to the hidden layer's state at the final time step: when the hidden layer has multiple layers, each layer's hidden state is recorded in this variable. For models such as long short-term memory (LSTM), the hidden state is a tuple of the hidden state and the cell state (the plain RNN here has only a single value). The hidden state's shape is hidden_shape = [num_layers, batch_size, hidden_size].
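As the paragraph notes, for nn.LSTM the returned state is a tuple; a minimal sketch contrasting the two (sizes are illustrative):

```python
import torch
from torch import nn

x = torch.randn(35, 2, 8)        # (num_steps, batch_size, input_size)

rnn = nn.RNN(input_size=8, hidden_size=16)
_, state = rnn(x)
print(type(state), state.shape)  # plain RNN: a single tensor of shape [1, 2, 16]

lstm = nn.LSTM(input_size=8, hidden_size=16)
_, (h, c) = lstm(x)
print(h.shape, c.shape)          # LSTM: a (hidden state, cell state) tuple
```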
Code (defining the model, where vocab_size = 1027 and num_hiddens = 256):
import torch
from torch import nn

vocab_size, num_hiddens = 1027, 256
rnn_layer = nn.RNN(input_size=vocab_size, hidden_size=num_hiddens)

num_steps = 35
batch_size = 2
state = None  # the initial hidden state need not be given; it defaults to zeros
X = torch.rand(num_steps, batch_size, vocab_size)
Y, state_new = rnn_layer(X, state)
print(Y.shape, len(state_new), state_new.shape)
Output
torch.Size([35, 2, 256]) 1 torch.Size([1, 2, 256])
The concrete computation is as follows. For ease of inspection, assume num_steps = 1; the dimensions then evolve as
H_1 = tanh(X_1 · W_xh + H_0 · W_hh + bias), with shapes [batch_size, input_size] * [input_size, hidden_size] + [batch_size, hidden_size] * [hidden_size, hidden_size] + bias.
Notice that every hidden state has shape [batch_size, hidden_size]; in fact, the per-step output has the same shape.
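To make this concrete, here is a minimal sketch (sizes are illustrative) that reproduces one step of the computation by hand using the layer's own weights and compares it with the module's output; weight_ih_l0 / weight_hh_l0 and the two biases are nn.RNN's standard parameter attributes:

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, input_size, hidden_size = 2, 4, 6  # illustrative sizes
rnn = nn.RNN(input_size, hidden_size)          # nonlinearity defaults to tanh

X = torch.rand(1, batch_size, input_size)      # one time step: (1, batch, input_size)
Y, h_n = rnn(X)

# Manual computation of the same step with a zero initial hidden state:
# H_1 = tanh(X_1 @ W_ih^T + b_ih + H_0 @ W_hh^T + b_hh)
H0 = torch.zeros(batch_size, hidden_size)
H1 = torch.tanh(X[0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                + H0 @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)

print(torch.allclose(Y[0], H1, atol=1e-6))     # True: both are [batch_size, hidden_size]
```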
Alternatively, you can analyze the implementation by reading the source code in the rnn.py file:
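One way to locate that file, using Python's standard inspect module (a generic trick, not specific to this post):

```python
import inspect
from torch import nn

# Print the path of the file where nn.RNN is defined (torch/nn/modules/rnn.py).
print(inspect.getsourcefile(nn.RNN))
```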
Reference: https://blog.csdn.net/orangerfun/article/details/103934290