The most intuitive illustration of GRU and GRUCell
1. API
1.1 GRU
The official computation is as follows:
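These are the per-time-step update equations, reproduced here from the PyTorch `nn.GRU` documentation ($\sigma$ is the sigmoid function, $\odot$ the element-wise product):

$$
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}) \\
n_t &= \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn})) \\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
$$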
The API is as follows:
nn.GRU(*args, **kwargs):
input_size: The number of features of each element in the input (it may be easier to think of it as feature_num)
hidden_size: The number of features in the hidden state `h`
num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
would mean stacking two GRUs together to form a `stacked GRU`,
with the second GRU taking in outputs of the first GRU and
computing the final results. Default: 1
bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
Default: ``True``
batch_first: If ``True``, then the input and output tensors are provided
as (batch, seq, feature). Default: ``False``
dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
GRU layer except the last layer, with dropout probability equal to
:attr:`dropout`. Default: 0
bidirectional: If ``True``, becomes a bidirectional GRU. Default: ``False``
Example:
import torch
import torch.nn as nn
rnn = nn.GRU(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
# input: (batch_size, seq_len, input_size)
input = torch.randn(3, 5, 10)
# h0: (num_layers * num_directions, batch_size, hidden_size)
h0 = torch.randn(2, 3, 20)
# output shape: (batch_size, seq_len, hidden_size) = [3, 5, 20]  (batch_first=True)
# hn shape:     (num_layers * num_directions, batch_size, hidden_size) = [2, 3, 20]
output, hn = rnn(input, h0)
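A small follow-up check (not part of the original example, continuing with the variables above): for a unidirectional GRU, the last time step of output is exactly the final hidden state of the top layer held in hn.

# output: (batch_size, seq_len, hidden_size)    -> [3, 5, 20]
# hn:     (num_layers, batch_size, hidden_size) -> [2, 3, 20]
print(output.shape, hn.shape)
# The last time step of `output` equals the final hidden state of the last layer
print(torch.allclose(output[:, -1, :], hn[-1]))  # True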
1.2 GRUCell
The computation:
Looks familiar, right? It is exactly the same form as above, because they really are the same thing: a GRU can contain multiple GRUCells, and GRUCells chained across multiple time steps can be viewed as a GRU.
The function signature (the arguments mean the same as in GRU, so they are not repeated here):
nn.GRUCell(input_size, hidden_size, bias=True)
Example:
import torch
import torch.nn as nn
rnn = nn.GRUCell(10, 20)
# input: (seq_len, batch_size, input_size)
input = torch.randn(6, 3, 10)
# hx: (batch_size, hidden_size)
hx = torch.randn(3, 20)
output = []
for i in range(6):
    hx = rnn(input[i], hx)
    output.append(hx)
# 6 time steps
print(len(output))
# (batch_size, hidden_size) = [3, 20]
print(output[0].shape)
2. Usage and differences
As noted above, GRUCells chained across multiple time steps can be viewed as a GRU. But why does GRUCell exist at all when GRU already wraps everything up so nicely? Why take it apart again? Presumably because GRUCell exposes each individual time step: you can intervene between steps (for example, modify the hidden state or build a custom decoding loop), whereas GRU consumes the whole sequence in a single call. Personally I use GRU most of the time and have barely used GRUCell; if I run into a case that truly requires GRUCell, I will come back and expand this part.
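To back up the claim that a GRU is just a GRUCell unrolled over time, here is a minimal sketch (my own, not from the original post), assuming a single unidirectional layer and the standard PyTorch parameter attribute names (weight_ih_l0 / weight_hh_l0 / bias_ih_l0 / bias_hh_l0 on nn.GRU, weight_ih / weight_hh / bias_ih / bias_hh on nn.GRUCell): copying the GRU's weights into a GRUCell and unrolling it by hand reproduces the GRU's output.

import torch
import torch.nn as nn

torch.manual_seed(0)

gru = nn.GRU(input_size=10, hidden_size=20, num_layers=1)  # seq-first layout (default)
cell = nn.GRUCell(10, 20)

# Copy the single layer's parameters from the GRU into the cell
# so both compute with identical weights.
with torch.no_grad():
    cell.weight_ih.copy_(gru.weight_ih_l0)
    cell.weight_hh.copy_(gru.weight_hh_l0)
    cell.bias_ih.copy_(gru.bias_ih_l0)
    cell.bias_hh.copy_(gru.bias_hh_l0)

x = torch.randn(5, 3, 10)    # (seq_len, batch_size, input_size)
h0 = torch.zeros(1, 3, 20)   # (num_layers, batch_size, hidden_size)

# One call to GRU processes the whole sequence
out_gru, hn = gru(x, h0)

# Unrolling the GRUCell by hand over the 5 time steps
hx = h0[0]
outs = []
for t in range(5):
    hx = cell(x[t], hx)
    outs.append(hx)
out_cell = torch.stack(outs)  # (seq_len, batch_size, hidden_size)

print(torch.allclose(out_gru, out_cell, atol=1e-6))  # True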