GRU and GRUCell

For the most intuitive diagrams, see the companion post: Illustrated LSTM and LSTMCell (圖解LSTM and LSTMCell).

1. API

1.1 GRU

The official computation is defined as follows:
$$
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\
n_t &= \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\
h_t &= (1 - z_t) * n_t + z_t * h_{(t-1)}
\end{aligned}
$$
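To see how these formulas map onto the module's parameters, here is a small sketch of my own (assuming input_size=10 and hidden_size=20): PyTorch stacks the three input-side matrices W_ir, W_iz, W_in into a single weight_ih_l0 tensor, and likewise for the hidden-side matrices and the biases.

import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20)

# W_ir, W_iz, W_in stacked along dim 0 -> (3 * hidden_size, input_size)
print(gru.weight_ih_l0.shape)   # torch.Size([60, 10])
# W_hr, W_hz, W_hn stacked along dim 0 -> (3 * hidden_size, hidden_size)
print(gru.weight_hh_l0.shape)   # torch.Size([60, 20])
# b_ir, b_iz, b_in and b_hr, b_hz, b_hn
print(gru.bias_ih_l0.shape)     # torch.Size([60])
print(gru.bias_hh_l0.shape)     # torch.Size([60])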
The API is as follows:

nn.GRU(*args, **kwargs):

input_size: The number of features in each element of the input (it may be easier to think of this as feature_num)
hidden_size: The number of features in the hidden state `h`
num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
        would mean stacking two GRUs together to form a `stacked GRU`,
        with the second GRU taking in outputs of the first GRU and
        computing the final results. Default: 1
bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
        Default: ``True``
batch_first: If ``True``, then the input and output tensors are provided
        as (batch, seq, feature). Default: ``False``
dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
        GRU layer except the last layer, with dropout probability equal to
        :attr:`dropout`. Default: 0
bidirectional: If ``True``, becomes a bidirectional GRU. Default: ``False``

Example:

import torch
import torch.nn as nn

rnn = nn.GRU(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

# (batch_size, seq_len, input_size)
input = torch.randn(3, 5, 10)

# (num_layers * num_directions, batch_size, hidden_size)
h0 = torch.randn(2, 3, 20)

# output shape: (batch_size, seq_len, hidden_size) = (3, 5, 20)
# hn shape: (num_layers * num_directions, batch_size, hidden_size) = (2, 3, 20)
output, hn = rnn(input, h0)
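A quick sketch of my own (not part of the original example) showing how num_layers and bidirectional change the output shapes; h0 is omitted here, in which case it defaults to zeros:

import torch
import torch.nn as nn

birnn = nn.GRU(input_size=10, hidden_size=20, num_layers=2,
               batch_first=True, bidirectional=True)
out, hn = birnn(torch.randn(3, 5, 10))

# bidirectional=True doubles the feature dimension of output
print(out.shape)   # (batch_size, seq_len, num_directions * hidden_size) = (3, 5, 40)
# and hn's first dimension becomes num_layers * num_directions
print(hn.shape)    # (num_layers * num_directions, batch_size, hidden_size) = (4, 3, 20)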

1.2 GRUCell

The computation is:
$$
\begin{aligned}
r &= \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\
z &= \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\
n &= \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\
h' &= (1 - z) * n + z * h
\end{aligned}
$$
Looks familiar, right? It is exactly the same form as above. They really are the same thing: a GRU is built from GRUCells, and a GRUCell applied over multiple time steps can be viewed as a GRU.

The function signature (the parameters mean the same as for GRU, so I won't repeat them here):

nn.GRUCell(input_size, hidden_size, bias=True)

Example:

import torch
import torch.nn as nn

rnn = nn.GRUCell(10, 20)

# (seq_len, batch_size, input_size)
input = torch.randn(6, 3, 10)

# (batch, hidden_size)
hx = torch.randn(3, 20)
output = []
for i in range(6):
    hx = rnn(input[i], hx)
    output.append(hx)
# 6
print(len(output))

# (batch_size, hidden_size) = (3, 20)
print(output[0].shape)
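To confirm that the formula above is exactly what nn.GRUCell computes, here is a minimal verification sketch (my own addition). It relies on the documented layout of weight_ih / weight_hh / bias_ih / bias_hh, where the three gates are stacked in the order r, z, n:

import torch
import torch.nn as nn

cell = nn.GRUCell(10, 20)
x = torch.randn(3, 10)    # (batch_size, input_size)
h = torch.randn(3, 20)    # (batch_size, hidden_size)

# split the stacked parameters back into the per-gate matrices from the formula
W_ir, W_iz, W_in = cell.weight_ih.chunk(3, dim=0)
W_hr, W_hz, W_hn = cell.weight_hh.chunk(3, dim=0)
b_ir, b_iz, b_in = cell.bias_ih.chunk(3, dim=0)
b_hr, b_hz, b_hn = cell.bias_hh.chunk(3, dim=0)

r = torch.sigmoid(x @ W_ir.T + b_ir + h @ W_hr.T + b_hr)
z = torch.sigmoid(x @ W_iz.T + b_iz + h @ W_hz.T + b_hz)
n = torch.tanh(x @ W_in.T + b_in + r * (h @ W_hn.T + b_hn))
h_new = (1 - z) * n + z * h

# should print True: the manual step matches the module's output
print(torch.allclose(h_new, cell(x, h), atol=1e-6))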

2. Usage and Differences

As noted above, a GRUCell applied over multiple time steps can be viewed as a GRU. But why does GRUCell exist when we already have GRU? GRU is already so nicely encapsulated, so why break it apart again?

Presumably it has its own important uses. Personally I mostly use GRU and have hardly used GRUCell; if I later run into a case where GRUCell is clearly needed, I will come back and add to this.
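One scenario where GRUCell is genuinely needed (a hypothetical sketch, not something from the original post) is when each step's input depends on the state just produced, e.g., in a decoder; nn.GRU consumes the whole sequence in one call and cannot express this. proj below is an illustrative helper, not a fixed API:

import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=20, hidden_size=20)
proj = nn.Linear(20, 20)        # hypothetical: maps the state to the next input

batch_size, steps = 3, 5
h = torch.zeros(batch_size, 20)
x = torch.zeros(batch_size, 20) # initial input (e.g., a start-token embedding)
outputs = []
for _ in range(steps):
    h = cell(x, h)              # one time step
    outputs.append(h)
    x = proj(h)                 # next input depends on the state just produced

print(torch.stack(outputs).shape)   # (steps, batch_size, hidden_size) = (5, 3, 20)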

Reference

nn.GRU

nn.GRUCell
