The most intuitive illustration of GRU and GRUCell
1. API
1.1 GRU
The official computation is as follows:
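These are the per-time-step update equations, reproduced here from the PyTorch `nn.GRU` documentation ($\sigma$ is the sigmoid function, $\odot$ the element-wise product):

$$
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}) \\
n_t &= \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn})) \\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
$$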
The API is as follows:
nn.GRU(*args, **kwargs):
input_size: The number of features of each element in the input (it may be easier to think of it as feature_num)
hidden_size: The number of features in the hidden state `h`
num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
would mean stacking two GRUs together to form a `stacked GRU`,
with the second GRU taking in outputs of the first GRU and
computing the final results. Default: 1
bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
Default: ``True``
batch_first: If ``True``, then the input and output tensors are provided
as (batch, seq, feature). Default: ``False``
dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
GRU layer except the last layer, with dropout probability equal to
:attr:`dropout`. Default: 0
bidirectional: If ``True``, becomes a bidirectional GRU. Default: ``False``
Example:
import torch
import torch.nn as nn
rnn = nn.GRU(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
# input: (batch_size, seq_len, input_size)
input = torch.randn(3, 5, 10)
# h0: (num_layers * num_directions, batch_size, hidden_size)
h0 = torch.randn(2, 3, 20)
# output shape: (batch_size, seq_len, hidden_size) = [3, 5, 20]  (batch_first=True)
# hn shape:     (num_layers * num_directions, batch_size, hidden_size) = [2, 3, 20]
output, hn = rnn(input, h0)
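A small follow-up check (not part of the original example, continuing with the variables above): for a unidirectional GRU, the last time step of output is exactly the final hidden state of the top layer held in hn.

# output: (batch_size, seq_len, hidden_size)    -> [3, 5, 20]
# hn:     (num_layers, batch_size, hidden_size) -> [2, 3, 20]
print(output.shape, hn.shape)
# The last time step of `output` equals the final hidden state of the last layer
print(torch.allclose(output[:, -1, :], hn[-1]))  # True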
1.2 GRUCell
The computation:
Looks familiar, right? It is exactly the same form as above, because they really are the same thing: a GRU can contain multiple GRUCells, and GRUCells chained across multiple time steps can be viewed as a GRU.
The function signature (the arguments mean the same as in GRU, so they are not repeated here):
nn.GRUCell(input_size, hidden_size, bias=True)
Example:
import torch
import torch.nn as nn
rnn = nn.GRUCell(10, 20)
# input: (seq_len, batch_size, input_size)
input = torch.randn(6, 3, 10)
# hx: (batch_size, hidden_size)
hx = torch.randn(3, 20)
output = []
for i in range(6):
    hx = rnn(input[i], hx)
    output.append(hx)
# 6 time steps
print(len(output))
# (batch_size, hidden_size) = [3, 20]
print(output[0].shape)
2. Usage and differences
As noted above, GRUCells chained across multiple time steps can be viewed as a GRU. But why does GRUCell exist at all when GRU already wraps everything up so nicely? Why take it apart again? Presumably because GRUCell exposes each individual time step: you can intervene between steps (for example, modify the hidden state or build a custom decoding loop), whereas GRU consumes the whole sequence in a single call. Personally I use GRU most of the time and have barely used GRUCell; if I run into a case that truly requires GRUCell, I will come back and expand this part.
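To back up the claim that a GRU is just a GRUCell unrolled over time, here is a minimal sketch (my own, not from the original post), assuming a single unidirectional layer and the standard PyTorch parameter attribute names (weight_ih_l0 / weight_hh_l0 / bias_ih_l0 / bias_hh_l0 on nn.GRU, weight_ih / weight_hh / bias_ih / bias_hh on nn.GRUCell): copying the GRU's weights into a GRUCell and unrolling it by hand reproduces the GRU's output.

import torch
import torch.nn as nn

torch.manual_seed(0)

gru = nn.GRU(input_size=10, hidden_size=20, num_layers=1)  # seq-first layout (default)
cell = nn.GRUCell(10, 20)

# Copy the single layer's parameters from the GRU into the cell
# so both compute with identical weights.
with torch.no_grad():
    cell.weight_ih.copy_(gru.weight_ih_l0)
    cell.weight_hh.copy_(gru.weight_hh_l0)
    cell.bias_ih.copy_(gru.bias_ih_l0)
    cell.bias_hh.copy_(gru.bias_hh_l0)

x = torch.randn(5, 3, 10)    # (seq_len, batch_size, input_size)
h0 = torch.zeros(1, 3, 20)   # (num_layers, batch_size, hidden_size)

# One call to GRU processes the whole sequence
out_gru, hn = gru(x, h0)

# Unrolling the GRUCell by hand over the 5 time steps
hx = h0[0]
outs = []
for t in range(5):
    hx = cell(x[t], hx)
    outs.append(hx)
out_cell = torch.stack(outs)  # (seq_len, batch_size, hidden_size)

print(torch.allclose(out_gru, out_cell, atol=1e-6))  # True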