GRU and GRUCell

For the most intuitive diagrams, see the companion post: Illustrated LSTM and LSTMCell (圖解LSTM and LSTMCell).

1. API

1.1 GRU

The official computation is defined as follows:
$$
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\
n_t &= \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\
h_t &= (1 - z_t) * n_t + z_t * h_{(t-1)}
\end{aligned}
$$
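To see how these formulas map onto the module's parameters, here is a small sketch of my own (assuming input_size=10 and hidden_size=20): PyTorch stacks the three input-side matrices W_ir, W_iz, W_in into a single weight_ih_l0 tensor, and likewise for the hidden-side matrices and the biases.

import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20)

# W_ir, W_iz, W_in stacked along dim 0 -> (3 * hidden_size, input_size)
print(gru.weight_ih_l0.shape)   # torch.Size([60, 10])
# W_hr, W_hz, W_hn stacked along dim 0 -> (3 * hidden_size, hidden_size)
print(gru.weight_hh_l0.shape)   # torch.Size([60, 20])
# b_ir, b_iz, b_in and b_hr, b_hz, b_hn
print(gru.bias_ih_l0.shape)     # torch.Size([60])
print(gru.bias_hh_l0.shape)     # torch.Size([60])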
The API is as follows:

nn.GRU(*args, **kwargs):

input_size: The number of features in each element of the input (it may be easier to think of this as feature_num)
hidden_size: The number of features in the hidden state `h`
num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
        would mean stacking two GRUs together to form a `stacked GRU`,
        with the second GRU taking in outputs of the first GRU and
        computing the final results. Default: 1
bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
        Default: ``True``
batch_first: If ``True``, then the input and output tensors are provided
        as (batch, seq, feature). Default: ``False``
dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
        GRU layer except the last layer, with dropout probability equal to
        :attr:`dropout`. Default: 0
bidirectional: If ``True``, becomes a bidirectional GRU. Default: ``False``

Example:

import torch
import torch.nn as nn

rnn = nn.GRU(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

# (batch_size, seq_len, input_size)
input = torch.randn(3, 5, 10)

# (num_layers * num_directions, batch_size, hidden_size)
h0 = torch.randn(2, 3, 20)

# output shape: (batch_size, seq_len, hidden_size) = (3, 5, 20)
# hn shape: (num_layers * num_directions, batch_size, hidden_size) = (2, 3, 20)
output, hn = rnn(input, h0)
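A quick sketch of my own (not part of the original example) showing how num_layers and bidirectional change the output shapes; h0 is omitted here, in which case it defaults to zeros:

import torch
import torch.nn as nn

birnn = nn.GRU(input_size=10, hidden_size=20, num_layers=2,
               batch_first=True, bidirectional=True)
out, hn = birnn(torch.randn(3, 5, 10))

# bidirectional=True doubles the feature dimension of output
print(out.shape)   # (batch_size, seq_len, num_directions * hidden_size) = (3, 5, 40)
# and hn's first dimension becomes num_layers * num_directions
print(hn.shape)    # (num_layers * num_directions, batch_size, hidden_size) = (4, 3, 20)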

1.2 GRUCell

The computation is:
$$
\begin{aligned}
r &= \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\
z &= \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\
n &= \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\
h' &= (1 - z) * n + z * h
\end{aligned}
$$
Looks familiar, right? It is exactly the same form as above. They really are the same thing: a GRU is built from GRUCells, and a GRUCell applied over multiple time steps can be viewed as a GRU.

The function signature (the parameters mean the same as for GRU, so I won't repeat them here):

nn.GRUCell(input_size, hidden_size, bias=True)

Example:

import torch
import torch.nn as nn

rnn = nn.GRUCell(10, 20)

# (seq_len, batch_size, input_size)
input = torch.randn(6, 3, 10)

# (batch, hidden_size)
hx = torch.randn(3, 20)
output = []
for i in range(6):
    hx = rnn(input[i], hx)
    output.append(hx)
# 6
print(len(output))

# (batch_size, hidden_size) = (3, 20)
print(output[0].shape)
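To confirm that the formula above is exactly what nn.GRUCell computes, here is a minimal verification sketch (my own addition). It relies on the documented layout of weight_ih / weight_hh / bias_ih / bias_hh, where the three gates are stacked in the order r, z, n:

import torch
import torch.nn as nn

cell = nn.GRUCell(10, 20)
x = torch.randn(3, 10)    # (batch_size, input_size)
h = torch.randn(3, 20)    # (batch_size, hidden_size)

# split the stacked parameters back into the per-gate matrices from the formula
W_ir, W_iz, W_in = cell.weight_ih.chunk(3, dim=0)
W_hr, W_hz, W_hn = cell.weight_hh.chunk(3, dim=0)
b_ir, b_iz, b_in = cell.bias_ih.chunk(3, dim=0)
b_hr, b_hz, b_hn = cell.bias_hh.chunk(3, dim=0)

r = torch.sigmoid(x @ W_ir.T + b_ir + h @ W_hr.T + b_hr)
z = torch.sigmoid(x @ W_iz.T + b_iz + h @ W_hz.T + b_hz)
n = torch.tanh(x @ W_in.T + b_in + r * (h @ W_hn.T + b_hn))
h_new = (1 - z) * n + z * h

# should print True: the manual step matches the module's output
print(torch.allclose(h_new, cell(x, h), atol=1e-6))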

2. Usage and Differences

As noted above, a GRUCell applied over multiple time steps can be viewed as a GRU. But why does GRUCell exist when we already have GRU? GRU is already so nicely encapsulated, so why break it apart again?

Presumably it has its own important uses. Personally I mostly use GRU and have hardly used GRUCell; if I later run into a case where GRUCell is clearly needed, I will come back and add to this.
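One scenario where GRUCell is genuinely needed (a hypothetical sketch, not something from the original post) is when each step's input depends on the state just produced, e.g., in a decoder; nn.GRU consumes the whole sequence in one call and cannot express this. proj below is an illustrative helper, not a fixed API:

import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=20, hidden_size=20)
proj = nn.Linear(20, 20)        # hypothetical: maps the state to the next input

batch_size, steps = 3, 5
h = torch.zeros(batch_size, 20)
x = torch.zeros(batch_size, 20) # initial input (e.g., a start-token embedding)
outputs = []
for _ in range(steps):
    h = cell(x, h)              # one time step
    outputs.append(h)
    x = proj(h)                 # next input depends on the state just produced

print(torch.stack(outputs).shape)   # (steps, batch_size, hidden_size) = (5, 3, 20)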

Reference

nn.GRU

nn.GRUCell
