GRU and GRUCell
The most intuitive illustration
1. API
1.1 GRU
The official computation is as follows:
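Per the official PyTorch documentation, for each element in the input sequence, each GRU layer computes:

r_t = σ(W_ir x_t + b_ir + W_hr h_(t-1) + b_hr)
z_t = σ(W_iz x_t + b_iz + W_hz h_(t-1) + b_hz)
n_t = tanh(W_in x_t + b_in + r_t ⊙ (W_hn h_(t-1) + b_hn))
h_t = (1 - z_t) ⊙ n_t + z_t ⊙ h_(t-1)

where h_t is the hidden state at time t, x_t is the input at time t, r_t, z_t, and n_t are the reset, update, and new gates, σ is the sigmoid function, and ⊙ is the element-wise (Hadamard) product.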
The API is as follows:
nn.GRU(*args, **kwargs):
input_size: The number of features in each element of the input (it may be easier to think of this as feature_num)
hidden_size: The number of features in the hidden state `h`
num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
would mean stacking two GRUs together to form a `stacked GRU`,
with the second GRU taking in outputs of the first GRU and
computing the final results. Default: 1
bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
Default: ``True``
batch_first: If ``True``, then the input and output tensors are provided
as (batch, seq, feature). Default: ``False``
dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
GRU layer except the last layer, with dropout probability equal to
:attr:`dropout`. Default: 0
bidirectional: If ``True``, becomes a bidirectional GRU. Default: ``False``
Example:
import torch
import torch.nn as nn
rnn = nn.GRU(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
# (batch_size, seq_len, input_size)
input = torch.randn(3, 5, 10)
# (num_layers * num_directions, batch_size, hidden_size)
h0 = torch.randn(2, 3, 20)
# output shape: (batch_size, seq_len, hidden_size) = [3, 5, 20]
output, hn = rnn(input, h0)
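The example above does not exercise the bidirectional flag from the parameter list. A small sketch of how it changes the shapes (per the PyTorch docs, output's last dimension becomes 2 * hidden_size, while hn's first dimension becomes num_layers * 2):

```python
import torch
import torch.nn as nn

# bidirectional=True makes num_directions = 2: the output concatenates the
# forward and backward hidden states along the feature dimension.
rnn = nn.GRU(input_size=10, hidden_size=20, num_layers=2,
             batch_first=True, bidirectional=True)
x = torch.randn(3, 5, 10)  # (batch_size, seq_len, input_size)
output, hn = rnn(x)        # h0 defaults to zeros when omitted
print(output.shape)        # torch.Size([3, 5, 40])  = (batch, seq_len, 2 * hidden_size)
print(hn.shape)            # torch.Size([4, 3, 20])  = (num_layers * 2, batch, hidden_size)
```

Note that batch_first only affects the layout of the input and output tensors; hn always keeps the (num_layers * num_directions, batch, hidden_size) layout.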
1.2 GRUCell
The computation is:
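Per the official PyTorch documentation, a GRUCell performs a single step of the GRU update:

r = σ(W_ir x + b_ir + W_hr h + b_hr)
z = σ(W_iz x + b_iz + W_hz h + b_hz)
n = tanh(W_in x + b_in + r ⊙ (W_hn h + b_hn))
h' = (1 - z) ⊙ n + z ⊙ h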
Look familiar? It is exactly the same form as above. They are fundamentally the same thing: a GRU can contain multiple GRUCells, and a GRUCell applied over multiple time steps can be viewed as a GRU.
The function signature (the parameters mean the same as in GRU, so they are not repeated here):
nn.GRUCell(input_size, hidden_size, bias=True)
Example:
import torch
import torch.nn as nn
rnn = nn.GRUCell(10, 20)
# (seq_len, batch_size, input_size)
input = torch.randn(6, 3, 10)
# (batch_size, hidden_size)
hx = torch.randn(3, 20)
output = []
for i in range(6):
    hx = rnn(input[i], hx)
    output.append(hx)
# 6
print(len(output))
# (batch_size, hidden_size) = [3, 20]
print(output[0].shape)
2. Usage and Differences
As already shown above, a GRUCell applied over multiple time steps can be viewed as a GRU. But why does GRUCell exist at all when GRU is already so nicely packaged? Why take it apart again?
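The claim that a GRUCell unrolled over time is a GRU can be checked directly: copy a single-layer GRU's weights into a GRUCell (both store their gates in weight_ih/weight_hh with the same layout) and compare the outputs. A sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=10, hidden_size=20)  # single layer, input is (seq, batch, feature)
cell = nn.GRUCell(10, 20)

# Copy the GRU's layer-0 parameters into the cell so both compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(gru.weight_ih_l0)
    cell.weight_hh.copy_(gru.weight_hh_l0)
    cell.bias_ih.copy_(gru.bias_ih_l0)
    cell.bias_hh.copy_(gru.bias_hh_l0)

x = torch.randn(6, 3, 10)        # (seq_len, batch_size, input_size)
h0 = torch.zeros(1, 3, 20)

out_gru, _ = gru(x, h0)

# Unroll the cell over the time dimension by hand.
hx = torch.zeros(3, 20)
steps = []
for t in range(6):
    hx = cell(x[t], hx)
    steps.append(hx)
out_cell = torch.stack(steps)    # (seq_len, batch_size, hidden_size)

print(torch.allclose(out_gru, out_cell, atol=1e-5))  # True
```

The flip side of this equivalence is the reason GRUCell exists: with the manual loop you control each time step, so you can modify the hidden state, feed the output back in as the next input, or stop early, none of which the packaged GRU loop allows.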
Presumably it has its uses elsewhere. Personally I mostly use GRU and have hardly used GRUCell; if I later run into a case that clearly requires GRUCell, I will come back and expand this section.