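Copyright notice: this is an original article and may not be used for commercial purposes without the author's permission.
This is my first RNN program, written for practice. Given the first line of a couplet (上聯), it outputs the second line (下聯). It uses a seq2seq model, shown in the figure below.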
(Image source: https://jeddy92.github.io/JEddy92.github.io/ts_seq2seq_intro/)
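Model description
First, a word embedding maps each Chinese character to a 500-dimensional vector. The embedded sequence then passes through the encoderRNN (a bidirectional GRU) and the decoderRNN (a unidirectional GRU). At every decoding step, the decoderRNN uses attention to weight the encoder's top-layer outputs across all time steps. The decoderRNN's first input is the start-of-sentence token SOS.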
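To make the attention step concrete, the sketch below computes Luong-style `dot` attention weights for a single decoder step using random tensors. The sizes (`max_len`, `batch`, `hidden_size`) are arbitrary assumptions, much smaller than the 500-dimensional vectors used by the model, and the code only illustrates the tensor shapes, not the trained model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
max_len, batch, hidden_size = 5, 2, 4  # arbitrary illustrative sizes

hidden = torch.randn(1, batch, hidden_size)                 # decoder GRU output for one step
encoder_outputs = torch.randn(max_len, batch, hidden_size)  # one vector per input position

# 'dot' score: elementwise product summed over the feature dimension.
# `hidden` broadcasts along the time axis, giving shape (max_len, batch).
energies = torch.sum(hidden * encoder_outputs, dim=2)

# Transpose to (batch, max_len), normalize over time, add a middle dimension.
attn_weights = F.softmax(energies.t(), dim=1).unsqueeze(1)   # (batch, 1, max_len)

# Context vector: attention-weighted sum of the encoder outputs.
context = attn_weights.bmm(encoder_outputs.transpose(0, 1))  # (batch, 1, hidden_size)
```

Each row of `attn_weights` sums to 1, so `context` is a convex combination of the encoder outputs for that batch element.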
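- The model uses GRU as the RNN cell and adds Luong attention. The word embedding is trained jointly with the model.
- Because the output length is not known in advance, an end-of-sentence token EOS is introduced. Once the decoderRNN emits EOS, the output is considered complete.
- Because RNNs are prone to exploding gradients, gradient clipping is applied and GRU is used as the cell. GRU was chosen over LSTM to reduce the number of parameters and speed up training.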
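The clipping step itself does not appear in the listing in this post. As a minimal sketch, assuming gradient-norm clipping with PyTorch's `torch.nn.utils.clip_grad_norm_`, one training step might look like the following. The threshold `clip = 50.0`, the toy GRU, and the dummy loss are illustrative assumptions, not values taken from the original code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the model: a small GRU (sizes are illustrative only).
gru = nn.GRU(input_size=8, hidden_size=8)
optimizer = torch.optim.Adam(gru.parameters(), lr=1e-3)
clip = 50.0  # assumed threshold, not taken from the original code

inputs = torch.randn(5, 3, 8)  # (seq_len, batch, features)
outputs, _ = gru(inputs)
# Deliberately scaled dummy loss so that the gradients are large.
loss = (outputs * 1000.0).pow(2).mean()

optimizer.zero_grad()
loss.backward()

# Rescale all gradients in place so that their combined L2 norm is at most `clip`.
# The return value is the norm measured before clipping.
norm_before = torch.nn.utils.clip_grad_norm_(gru.parameters(), clip)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in gru.parameters()))

optimizer.step()
```

Clipping is placed between `loss.backward()` and `optimizer.step()` so that the optimizer only ever sees the rescaled gradients.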
The code is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F


# Bidirectional GRU encoder. Returns the top-layer outputs at every time step
# (forward and backward directions summed) together with the final hidden state.
class EncoderRNN(nn.Module):
    def __init__(self, hidden_size, embedding, n_layers=1, dropout=0):
        super(EncoderRNN, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.embedding = embedding
        # Initialize GRU; the input_size and hidden_size params are both set to 'hidden_size'
        # because our input size is a word embedding with number of features == hidden_size
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers,
                          dropout=(0 if n_layers == 1 else dropout), bidirectional=True)

    def forward(self, input_seq, input_lengths, hidden=None):
        # Use the word embedding to convert input character indices to vectors
        embedded = self.embedding(input_seq)
        # Pack the padded batch so the GRU skips the padding positions
        packed = nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
        outputs, hidden = self.gru(packed, hidden)
        # Unpack padding
        outputs, _ = nn.utils.rnn.pad_packed_sequence(outputs)
        # Sum the forward and backward outputs of the bidirectional GRU
        outputs = outputs[:, :, :self.hidden_size] + outputs[:, :, self.hidden_size:]
        return outputs, hidden
# Luong attention layer
class Attn(nn.Module):
    def __init__(self, method, hidden_size):
        super(Attn, self).__init__()
        self.method = method
        if self.method not in ['dot', 'general', 'concat']:
            raise ValueError(f"{self.method} is not an appropriate attention method.")
        self.hidden_size = hidden_size
        if self.method == 'general':
            self.attn = nn.Linear(self.hidden_size, hidden_size)
        elif self.method == 'concat':
            self.attn = nn.Linear(self.hidden_size * 2, hidden_size)
            # Note: torch.FloatTensor(n) allocates uninitialized memory
            self.v = nn.Parameter(torch.FloatTensor(hidden_size))

    def dot_score(self, hidden, encoder_output):
        return torch.sum(hidden * encoder_output, dim=2)

    def general_score(self, hidden, encoder_output):
        energy = self.attn(encoder_output)
        return torch.sum(hidden * energy, dim=2)

    def concat_score(self, hidden, encoder_output):
        energy = self.attn(torch.cat((hidden.expand(encoder_output.size(0), -1, -1), encoder_output), 2)).tanh()
        return torch.sum(self.v * energy, dim=2)

    def forward(self, hidden, encoder_outputs):
        # Calculate the attention weights (energies) based on the given method
        if self.method == 'general':
            attn_energies = self.general_score(hidden, encoder_outputs)
        elif self.method == 'concat':
            attn_energies = self.concat_score(hidden, encoder_outputs)
        elif self.method == 'dot':
            attn_energies = self.dot_score(hidden, encoder_outputs)
        # Transpose max_length and batch_size dimensions
        attn_energies = attn_energies.t()
        # Return the softmax normalized probability scores (with added dimension)
        return F.softmax(attn_energies, dim=1).unsqueeze(1)
# Decoder with Luong attention
class LuongAttnDecoderRNN(nn.Module):
    def __init__(self, attn_model, embedding, hidden_size, output_size, n_layers=1, dropout=0.1):
        super(LuongAttnDecoderRNN, self).__init__()
        # Keep for reference
        self.attn_model = attn_model
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.dropout = dropout
        # Define layers
        self.embedding = embedding
        self.embedding_dropout = nn.Dropout(dropout)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers, dropout=(0 if n_layers == 1 else dropout))
        self.concat = nn.Linear(hidden_size * 2, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        self.attn = Attn(attn_model, hidden_size)

    def forward(self, input_step, last_hidden, encoder_outputs):
        # Note: we run this one step (word) at a time
        # Embed the current input token (SOS on the first step)
        embedded = self.embedding(input_step)
        embedded = self.embedding_dropout(embedded)
        # Forward through unidirectional GRU
        rnn_output, hidden = self.gru(embedded, last_hidden)
        # Compute the attention weights
        attn_weights = self.attn(rnn_output, encoder_outputs)
        # Compute the context vector: the attention-weighted sum of the encoder outputs
        context = attn_weights.bmm(encoder_outputs.transpose(0, 1))
        # Concatenate the context vector with the GRU output for this step
        rnn_output = rnn_output.squeeze(0)
        context = context.squeeze(1)
        concat_input = torch.cat((rnn_output, context), 1)
        concat_output = torch.tanh(self.concat(concat_input))
        # Project back to the vocabulary and normalize to probabilities over characters
        output = self.out(concat_output)
        output = F.softmax(output, dim=1)
        # Return output and final hidden state
        return output, hidden
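The decoder is called one step at a time, so producing a second line requires a loop that starts from SOS and stops at EOS. The sketch below shows one way to write that loop as greedy decoding. The token indices `SOS_token = 1` and `EOS_token = 2`, the `max_length` cap, and the `ToyDecoder` stand-in are all hypothetical. The stand-in only mimics the `(output, hidden)` return signature of `LuongAttnDecoderRNN.forward` so that the example runs on its own.

```python
import torch

SOS_token, EOS_token = 1, 2  # hypothetical special-token indices


class ToyDecoder:
    """Stand-in with the same call signature as LuongAttnDecoderRNN.forward.

    It ignores its inputs and emits a fixed script of tokens as one-hot rows.
    """

    def __init__(self, script, vocab_size):
        self.script = script
        self.vocab_size = vocab_size
        self.step = 0

    def __call__(self, input_step, last_hidden, encoder_outputs):
        probs = torch.zeros(1, self.vocab_size)
        probs[0, self.script[self.step]] = 1.0  # one-hot "probabilities"
        self.step += 1
        return probs, last_hidden


def greedy_decode(decoder, decoder_hidden, encoder_outputs, max_length=20):
    """Feed SOS first, then feed back the most probable token until EOS appears."""
    decoder_input = torch.tensor([[SOS_token]])  # shape (1, batch=1)
    tokens = []
    for _ in range(max_length):
        output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_outputs)
        next_token = output.argmax(dim=1)        # shape (batch,)
        if next_token.item() == EOS_token:
            break                                # EOS marks the end of the output
        tokens.append(next_token.item())
        decoder_input = next_token.unsqueeze(0)  # back to shape (1, batch)
    return tokens


toy = ToyDecoder(script=[5, 7, EOS_token, 9], vocab_size=10)
result = greedy_decode(toy, decoder_hidden=None, encoder_outputs=None)
```

With the real model, `decoder_hidden` would be initialized from the encoder's final hidden state. How the bidirectional state is reduced to the decoder's shape is not shown in this post, so that step is left out here.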
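Dataset
- The training data is the Chinese couplet dataset from Kesci (科賽), which contains more than 770,000 couplets and more than 9,000 distinct Chinese characters. To be on the safe side, I am not posting it online.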
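The preprocessing code is not shown in this post. As a rough sketch of how a character-level vocabulary with special tokens could be built, assume hypothetical indices `PAD_token = 0`, `SOS_token = 1`, `EOS_token = 2` and a tiny in-memory list standing in for the real dataset. The sample strings are borrowed from the outputs shown later in this post and serve only as placeholder data.

```python
PAD_token, SOS_token, EOS_token = 0, 1, 2  # hypothetical special-token indices


def build_vocab(lines):
    """Map each distinct character to an integer index placed after the special tokens."""
    char2index = {}
    for line in lines:
        for ch in line:
            if ch not in char2index:
                char2index[ch] = len(char2index) + 3  # indices 0..2 are reserved
    return char2index


def encode(line, char2index):
    """Convert a line to a list of indices and append EOS."""
    return [char2index[ch] for ch in line] + [EOS_token]


# Placeholder pairs standing in for the (first line, second line) training data.
pairs = [("米飯", "油茶"), ("山花", "野禽")]
char2index = build_vocab(line for pair in pairs for line in pair)
encoded = encode("山花", char2index)
```

Indices are assigned in order of first appearance, so the mapping depends on the order in which the lines are read.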
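Training results
Because an RNN processes time steps sequentially, it does not parallelize well on a GPU, so the number of training iterations was limited. The Model folder contains the model after 29 epochs of training.
The couplets below are outputs of the CharRNN (because the memory in the GRU is random at the start of each run, the outputs are also somewhat random):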
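1 character
First line: <s>天<\s>
Second line: <s>地<\s>
First line: <s>雨<\s>
Second line: <s>煙<\s>
2 characters
First line: <s>米飯<\s>
Second line: <s>油茶<\s>
First line: <s>山花<\s>
Second line: <s>野禽<\s>
3 characters
First line: <s>雞冠花<\s>
Second line: <s>龍牙梨<\s>
First line: <s>孔夫子<\s>
Second line: <s>毛小公<\s>
More
First line: <s>今天打雷下雨<\s>
Second line: <s>昨日打人走人<\s>
First line: <s>狗和貓打架不分勝負<\s>
Second line: <s>狼與狗進球就是高多<\s>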
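The longer the input, the less coherent the output becomes, and the character counts of the two lines may fail to match, as in the following case:
First line: <s>人生沒有彩排,每一天都是現場直播<\s>
Second line: <s>世海無多解勢,衆今豈來地網先爭<\s>
My understanding is that with enough training iterations the model could produce better results.
The complete code is available on GitHub.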