A flowchart that isn't so simple either
Layer API
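The flowchart shows that we need a few layers. For the encoder we use nn.Embedding. It is followed by a recurrent neural network (RNN) layer and then a decoder.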
nn.Embedding
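- torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None)
- num_embeddings: the vocabulary size, i.e. the number of distinct tokens. If your corpus has 5,000 distinct tokens, passing 100 here won't work, because every token index of 100 or above is out of range.
- embedding_dim: how many numbers are used to represent each token's vector.
- padding_idx: optional. If set, the vector at this index receives no gradient, so training does not update it and it can serve as a padding token. A newly constructed layer initializes that vector to all zeros. If left as None (the default), no index is treated specially.
- Output: (*, H), where * is the input shape and H = embedding_dim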
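The sketch below illustrates these parameters. It uses hypothetical sizes: a 5,000-token vocabulary, 8-dimensional vectors, and index 0 reserved for padding. It checks three things: the output shape, that the padding row stays at zero with no gradient, and that an index equal to num_embeddings is rejected.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 5,000 distinct tokens, 8 numbers per token, index 0 = padding.
emb = nn.Embedding(num_embeddings=5000, embedding_dim=8, padding_idx=0)

# Token indices shaped (seq_len=3, batch=2); 0 marks padded positions.
tokens = torch.tensor([[12, 7],
                       [4999, 0],
                       [3, 0]])
vectors = emb(tokens)
print(vectors.shape)  # torch.Size([3, 2, 8]): input shape plus embedding_dim

# The padding row is initialized to zeros...
print(emb.weight[0].abs().sum().item())  # 0.0
# ...and receives no gradient, so an optimizer step leaves it unchanged.
vectors.sum().backward()
print(emb.weight.grad[0].abs().sum().item())  # 0.0

# Valid indices are 0 .. num_embeddings - 1, so 5000 is out of range.
out_of_range_rejected = False
try:
    emb(torch.tensor([5000]))
except (IndexError, RuntimeError):
    out_of_range_rejected = True
print(out_of_range_rejected)  # True
```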
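Recurrent neural network (RNN)
First, recall the formula. For each element of the input sequence, each layer computes

$$h_t = \tanh(x_t W_{ih}^T + b_{ih} + h_{t-1} W_{hh}^T + b_{hh})$$

- where $h_t$ is the hidden state at time t,
- $x_t$ is the input at time t (for layers after the first, this is the hidden state output by the layer below),
- $h_{t-1}$ is the hidden state at time t-1, or the initial hidden state at time 0.
- If nonlinearity is 'relu', then ReLU is used instead of tanh.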
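To check the formula, the sketch below runs one time step of a single-layer nn.RNN. It then recomputes $h_t$ by hand from the layer's own parameters (weight_ih_l0, bias_ih_l0, weight_hh_l0, bias_hh_l0). The sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, batch = 4, 3, 2  # hypothetical sizes

# Single-layer tanh RNN; we feed it a sequence of length 1.
rnn = nn.RNN(input_size, hidden_size, num_layers=1, nonlinearity="tanh")

x_t = torch.randn(1, batch, input_size)      # (seq_len=1, batch, input_size)
h_prev = torch.randn(1, batch, hidden_size)  # (num_layers, batch, hidden_size)

_, h_t = rnn(x_t, h_prev)

# h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)
manual = torch.tanh(
    x_t[0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
    + h_prev[0] @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
)
print(torch.allclose(h_t[0], manual, atol=1e-5))  # True
```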
API parameters
- input_size – The number of expected features in the input x. With one-hot inputs this would be the number of tokens in the vocabulary. Here, though, the input has already passed through the embedding layer, so it equals embedding_dim.
- hidden_size – The number of features in the hidden state h. This one is less intuitive, so look back at the formula. The previous hidden state is added to the transformed new input, and matrices of different shapes cannot be added. PyTorch handles this by creating W_ih with shape (hidden_size, input_size) and W_hh with shape (hidden_size, hidden_size), so both terms end up with hidden_size features. You only need to choose hidden_size as a hyperparameter.
- num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
- batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False
- dropout – If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional – If True, becomes a bidirectional RNN. Default: False
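The sketch below shows how several of these parameters change tensor shapes. It uses hypothetical sizes with num_layers=2, batch_first=True and bidirectional=True. Two points are worth noting. First, batch_first does not apply to the returned hidden state. Second, hidden_size fixes the shapes of the weight matrices.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration.
input_size, hidden_size, num_layers = 8, 16, 2
batch, seq_len = 3, 5

rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers,
             batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)  # batch_first=True: (batch, seq, feature)
output, h_n = rnn(x)                          # h_0 defaults to zeros when omitted

# output: (batch, seq, num_directions * hidden_size)
print(output.shape)  # torch.Size([3, 5, 32])
# h_n ignores batch_first: (num_layers * num_directions, batch, hidden_size)
print(h_n.shape)     # torch.Size([4, 3, 16])

# Weight shapes follow from input_size and hidden_size.
print(rnn.weight_ih_l0.shape)  # torch.Size([16, 8]): input_size -> hidden_size
print(rnn.weight_hh_l0.shape)  # torch.Size([16, 16]): hidden_size -> hidden_size
# Layer 2 reads both directions of layer 1, so its input width is 2 * hidden_size.
print(rnn.weight_ih_l1.shape)  # torch.Size([16, 32])
```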
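Plain RNNs are rarely used nowadays. LSTM and GRU have largely replaced them. The idea is the same and the APIs are nearly identical: nn.LSTM and nn.GRU take the same main constructor arguments, minus nonlinearity. Once you can use one, you can use the others, so I won't go into more detail here.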
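The sketch below uses hypothetical sizes to show the one practical difference that matters for the model code. nn.LSTM carries a tuple (h, c) of hidden and cell states, while nn.GRU, like nn.RNN, carries only h. It also shows that the gate weights are stacked into a single matrix.

```python
import torch
import torch.nn as nn

input_size, hidden_size, num_layers = 8, 16, 2  # hypothetical sizes
seq_len, batch = 5, 3
x = torch.randn(seq_len, batch, input_size)  # default layout: (seq, batch, feature)

lstm = nn.LSTM(input_size, hidden_size, num_layers)
gru = nn.GRU(input_size, hidden_size, num_layers)

# LSTM returns two states: hidden state h and cell state c.
out_lstm, (h_n, c_n) = lstm(x)
# GRU returns a single hidden state, like the plain RNN.
out_gru, h_gru = gru(x)

print(out_lstm.shape, h_n.shape, c_n.shape)
# torch.Size([5, 3, 16]) torch.Size([2, 3, 16]) torch.Size([2, 3, 16])
print(out_gru.shape, h_gru.shape)
# torch.Size([5, 3, 16]) torch.Size([2, 3, 16])

# Gate weights are stacked: 4 blocks for LSTM, 3 for GRU.
print(lstm.weight_ih_l0.shape)  # torch.Size([64, 8])
print(gru.weight_ih_l0.shape)   # torch.Size([48, 8])
```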
Decoder
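This part is simple. An nn.Linear layer maps each hidden state (nhid features) to one score per vocabulary word (ntoken outputs). It's straightforward, so I won't say more.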
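nn.Linear operates on the last dimension of its input. The sketch below, with hypothetical sizes, shows that applying the decoder directly to a 3-D RNN output gives the same result as first flattening to (seq_len * batch, nhid) and then reshaping back.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, nhid, ntoken = 5, 3, 16, 100  # hypothetical sizes

decoder = nn.Linear(nhid, ntoken)
rnn_output = torch.randn(seq_len, batch, nhid)

# Applied directly: Linear maps the last dimension nhid -> ntoken.
direct = decoder(rnn_output)
# Flatten, decode, reshape back.
flat = decoder(rnn_output.view(seq_len * batch, nhid)).view(seq_len, batch, ntoken)

print(direct.shape)                             # torch.Size([5, 3, 100])
print(torch.allclose(direct, flat, atol=1e-6))  # True
```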
Here's the code
```python
import torch.nn as nn


class RNNModel(nn.Module):
    """A simple recurrent neural network."""

    def __init__(self, rnn_type, ntoken, ninp, nhid, nlayers, dropout=0.5):
        '''The model contains the following layers:
        - a word embedding layer
        - a recurrent layer (RNN, LSTM, or GRU)
        - a linear layer mapping the hidden state to the output vocabulary
        - a dropout layer for regularization
        '''
        super(RNNModel, self).__init__()
        self.drop = nn.Dropout(dropout)
        # Encoder: token index -> ninp-dimensional vector
        self.encoder = nn.Embedding(ntoken, ninp)
        if rnn_type in ['LSTM', 'GRU']:
            # nn.LSTM and nn.GRU share the same constructor arguments
            self.rnn = getattr(nn, rnn_type)(ninp, nhid, nlayers, dropout=dropout)
        else:
            try:
                nonlinearity = {'RNN_TANH': 'tanh', 'RNN_RELU': 'relu'}[rnn_type]
            except KeyError:
                raise ValueError("""An invalid option for `--model` was supplied,
                                 options are ['LSTM', 'GRU', 'RNN_TANH' or 'RNN_RELU']""")
            self.rnn = nn.RNN(ninp, nhid, nlayers, nonlinearity=nonlinearity, dropout=dropout)
        # Decoder: hidden state (nhid) -> one score per vocabulary word (ntoken)
        self.decoder = nn.Linear(nhid, ntoken)

        self.init_weights()

        self.rnn_type = rnn_type
        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        # Small uniform init for embedding and decoder weights; zero decoder bias
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def init_hidden(self, bsz, requires_grad=True):
        # Borrow any parameter so the new tensors get the same dtype and device
        weight = next(self.parameters())
        if self.rnn_type == 'LSTM':
            # LSTM needs a (h_0, c_0) tuple, each of shape (nlayers, bsz, nhid)
            return (weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad),
                    weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad))
        else:
            # RNN and GRU need only h_0
            return weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad)

    def forward(self, input, hidden):
        '''Forward pass:
        - word embedding
        - feed the embeddings through the recurrent network
        - a linear layer turns each hidden state into scores over the output vocabulary
        '''
        emb = self.drop(self.encoder(input))    # (seq_len, batch, ninp)
        output, hidden = self.rnn(emb, hidden)  # (seq_len, batch, nhid)
        output = self.drop(output)
        decoded = self.decoder(output.view(output.size(0)*output.size(1), output.size(2)))
        return decoded.view(output.size(0), output.size(1), decoded.size(1)), hidden
```
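The next sketch shows how a model like this might be driven for one training step. It assumes a next-word prediction objective, where the target at step t is the token at step t + 1; the section above does not state this. To keep the block self-contained, the same embedding, LSTM, dropout and linear pipeline is wired up inline instead of importing RNNModel. The sizes, the SGD optimizer and the learning rate are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy sizes.
ntoken, ninp, nhid, nlayers = 50, 16, 32, 2
seq_len, bsz = 6, 4

# Same layers as RNNModel's LSTM variant, built inline so this block runs alone.
encoder = nn.Embedding(ntoken, ninp)
rnn = nn.LSTM(ninp, nhid, nlayers, dropout=0.5)
decoder = nn.Linear(nhid, ntoken)
drop = nn.Dropout(0.5)
params = list(encoder.parameters()) + list(rnn.parameters()) + list(decoder.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)  # assumed optimizer and learning rate

# Assumed objective: predict token t + 1 from tokens up to t.
data = torch.randint(0, ntoken, (seq_len + 1, bsz))
inputs, targets = data[:-1], data[1:]

# Zero initial state shaped like init_hidden's LSTM branch: two (nlayers, bsz, nhid) tensors.
hidden = (torch.zeros(nlayers, bsz, nhid), torch.zeros(nlayers, bsz, nhid))

output, hidden = rnn(drop(encoder(inputs)), hidden)
logits = decoder(drop(output))  # (seq_len, bsz, ntoken)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, ntoken), targets.reshape(-1))

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape, loss.item())
```

CrossEntropyLoss expects (N, C) scores and (N,) class indices, so the sequence and batch dimensions are flattened together before computing the loss.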