[Pytorch] - No.2 Implementing an RNN Language Model in PyTorch

Recently I used PyTorch to build an RNN language model (RNNLM). The goal is to start from the one-hot encoding of every word in the vocabulary (a high-dimensional sparse vector) and learn a dense vector for it. This post does not explain how RNNs work or why one would use an RNN language model; it only walks through the PyTorch code.
PyTorch material is still fairly scarce, so I mainly got into PyTorch by reading the PyTorch documentation and asking on the official PyTorch forums.
The full code is as follows:

import torch
import torch.nn.functional as F
from torch import nn, optim
from torch.autograd import Variable
from torch.utils.data import DataLoader
from mydataset import MyDataset

BATCH_SIZE = 5
sentence_set = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.""".split()

HIDDEN_UNITS = 200

# build the vocabulary: every distinct word gets the next free index
word_to_ix = {}
for word in sentence_set:
    if word not in word_to_ix:
        word_to_ix[word] = len(word_to_ix)
print(word_to_ix)

# one-hot dimension: vocabulary size plus one extra slot for unknown words
EMBDDING_DIM = len(word_to_ix) + 1


def make_word_to_ix(word, word_to_ix):
    # one-hot encode `word`; unknown words light up the extra last slot
    vec = torch.zeros(EMBDDING_DIM)
    if word in word_to_ix:
        vec[word_to_ix[word]] = 1
    else:
        vec[len(word_to_ix)] = 1
    return vec


# each word is a sample; the word that follows it is its label
data_words = []
data_labels = []
for i in range(len(sentence_set) - 1):
    word = sentence_set[i]
    label = sentence_set[i + 1]
    data_words.append(make_word_to_ix(word, word_to_ix))
    data_labels.append(make_word_to_ix(label, word_to_ix))

dataset = MyDataset(data_words, data_labels)
# drop_last=True keeps every batch at exactly BATCH_SIZE samples,
# which the fixed-size hidden state below relies on
train_loader = DataLoader(dataset, batch_size=BATCH_SIZE, drop_last=True)

'''
for _, batch in enumerate(train_loader):
    print("word_batch------------>\n")
    print(batch[0])
    print("label batch----------->\n")
    print(batch[1])
'''
class RNNModel(nn.Module):
    def __init__(self, embdding_size, hidden_size):
        super(RNNModel, self).__init__()
        # nn.RNN expects input of shape (seq_len, batch, input_size)
        self.rnn = nn.RNN(embdding_size, hidden_size, num_layers=1, nonlinearity='relu')
        self.linear = nn.Linear(hidden_size, embdding_size)

    def forward(self, x, hidden):
        # treat each incoming batch as a length-1 sequence: (1, batch, input_size)
        x = x.view(1, x.size(0), -1)
        output1, h_n = self.rnn(x, hidden)
        output2 = self.linear(output1.squeeze(0))
        log_prob = F.log_softmax(output2, dim=1)
        return log_prob, h_n


rnnmodel = RNNModel(EMBDDING_DIM, HIDDEN_UNITS)
# the model already returns log-probabilities, so NLLLoss is the matching loss;
# CrossEntropyLoss would apply log_softmax a second time
criterion = nn.NLLLoss()
optimizer = optim.SGD(rnnmodel.parameters(), lr=1e-3)

# quick smoke test:
#input_hidden = Variable(torch.randn(1, BATCH_SIZE, HIDDEN_UNITS))
#x = Variable(torch.rand(BATCH_SIZE, EMBDDING_DIM))
#y, _ = rnnmodel(x, input_hidden)
#print(y)
for epoch in range(3):
    print('epoch: {}'.format(epoch + 1))
    print('*' * 10)
    running_loss = 0
    # initial hidden state: (num_layers, batch, hidden_size)
    input_hidden = Variable(torch.randn(1, BATCH_SIZE, HIDDEN_UNITS))
    for _, batch in enumerate(train_loader):
        x = Variable(batch[0])
        y = Variable(batch[1])
        # forward
        out, input_hidden = rnnmodel(x, input_hidden)
        # cut the graph here so gradients do not flow into earlier batches
        input_hidden = input_hidden.detach()
        trgt = torch.max(y, 1)[1]   # class index of each one-hot label
        loss = criterion(out, trgt)
        running_loss += loss.item()
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('Loss: {:.6f}'.format(running_loss / len(train_loader)))
#print(rnnmodel.state_dict().keys())


f = open("res-0104-rnn.txt", "w+")
# for a one-hot input e_i, the hidden pre-activation W_ih @ e_i is exactly
# column i of rnn.weight_ih_l0, so that column is the word's dense vector
alpha = rnnmodel.state_dict()['rnn.weight_ih_l0']
for word in word_to_ix:
    line = word + " " + str(alpha[:, word_to_ix[word]].tolist()) + "\n"
    f.write(line)
f.close()
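
As a follow-up (my addition, not part of the original post), the saved vectors can be read back and compared. The helpers load_vectors and cosine below are hypothetical names, and the parsing assumes the word [v1, v2, ...] line format written above:

import json

def load_vectors(path):
    # parse lines of the form: word [v1, v2, ...]
    vectors = {}
    with open(path) as f:
        for line in f:
            word, vec_str = line.strip().split(" ", 1)
            vectors[word] = json.loads(vec_str)
    return vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / ((sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))

vectors = load_vectors("res-0104-rnn.txt")
print(cosine(vectors["thy"], vectors["beauty's"]))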

Preprocessing the corpus

Preprocessing here just means splitting the string into a list of words, building the vocabulary, and generating a one-hot encoding for every vocabulary word. Other kinds of corpora would need a real tokenizer; here a plain split() is enough, so I won't dwell on it.

Building the vocabulary

word_to_ix = {}
for word in sentence_set:
    if word not in word_to_ix:
        word_to_ix[word] = len(word_to_ix)
print(word_to_ix)

First declare a dict; then, for every word in the list that is not yet in the dict, add it with the next free index.
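
For decoding a predicted index back into a word later, it can also help to keep the inverse mapping (an addition of mine, not in the original code):

# inverse mapping: index -> word, handy for decoding a predicted class index
ix_to_word = {ix: word for word, ix in word_to_ix.items()}
print(ix_to_word[0])   # 'When', the first word that was added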

Generating the One-Hot encodings


def make_word_to_ix(word, word_to_ix):
    # one-hot encode `word`; unknown words light up the extra last slot
    vec = torch.zeros(EMBDDING_DIM)
    if word in word_to_ix:
        vec[word_to_ix[word]] = 1
    else:
        vec[len(word_to_ix)] = 1
    return vec

EMBDDING_DIM is the dimensionality of the one-hot encoding. It is set to the vocabulary size plus one, reserving one extra slot for words that do not appear in the vocabulary.

For example, with the vocabulary {Apple: 0, Banana: 1, Orange: 2}:
Apple's one-hot vector is [1,0,0,0]
Banana's is [0,1,0,0]
Orange's is [0,0,1,0]
Lemon's (out of vocabulary) is [0,0,0,1]

The returned value is a torch.FloatTensor; in PyTorch, vectors and matrices are represented as tensors (Tensor).
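
A quick check of the function's behavior (illustrative; 'zebra' stands in for any out-of-vocabulary word):

vec = make_word_to_ix('When', word_to_ix)
print(vec.size())                 # torch.Size([EMBDDING_DIM])
print(vec[word_to_ix['When']])    # 1 -- the single hot entry
oov = make_word_to_ix('zebra', word_to_ix)
print(oov[len(word_to_ix)])       # 1 -- the extra slot for unknown words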

Dataset

PyTorch provides Dataset as an abstract class: subclass torch.utils.data.Dataset to implement your own dataset, then load it with a DataLoader.
For a detailed introduction to Dataset and DataLoader, see the PyTorch documentation (English or Chinese).
In short, we wrap the samples (data plus labels) in a Dataset and use a DataLoader to read them out for training, as in the sketch and code below.
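
The mydataset module imported at the top is not shown in the post. A minimal implementation consistent with how MyDataset is used here (two parallel lists; indexing returns a (data, label) pair) might look like this sketch, not necessarily the author's original file:

# mydataset.py -- a minimal Dataset over two parallel lists
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data_words, data_labels):
        self.data_words = data_words
        self.data_labels = data_labels

    def __len__(self):
        # how many samples the dataset holds
        return len(self.data_words)

    def __getitem__(self, idx):
        # the DataLoader collates these pairs into (word_batch, label_batch)
        return self.data_words[idx], self.data_labels[idx]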

# each word is a sample; the word that follows it is its label
data_words = []
data_labels = []
for i in range(len(sentence_set) - 1):
    word = sentence_set[i]
    label = sentence_set[i + 1]
    data_words.append(make_word_to_ix(word, word_to_ix))
    data_labels.append(make_word_to_ix(label, word_to_ix))

dataset = MyDataset(data_words, data_labels)
# drop_last=True keeps every batch at exactly BATCH_SIZE samples
train_loader = DataLoader(dataset, batch_size=BATCH_SIZE, drop_last=True)

Using the one-hot encodings from before, we generate the samples: every word in the list is a piece of sample data, and the word that follows it is that sample's label.

The neural network model

PyTorch provides torch.nn.Module as the base class of all neural network models. A custom network should subclass nn.Module and implement the __init__() and forward() methods: __init__() defines the structure of the network, and forward() defines how the forward pass is computed.
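
A minimal example of that contract (mine, for illustration; not from the original post): layers are declared in __init__(), and forward() only describes the computation.

import torch
from torch import nn

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(4, 2)      # layers (and their parameters) live here

    def forward(self, x):
        return torch.relu(self.fc(x))  # the forward computation lives here

net = TinyNet()
print(net(torch.randn(3, 4)).size())   # torch.Size([3, 2])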

class RNNModel(nn.Module):
    def __init__(self, embdding_size, hidden_size):
        super(RNNModel, self).__init__()
        # nn.RNN expects input of shape (seq_len, batch, input_size)
        self.rnn = nn.RNN(embdding_size, hidden_size, num_layers=1, nonlinearity='relu')
        self.linear = nn.Linear(hidden_size, embdding_size)

    def forward(self, x, hidden):
        # treat each incoming batch as a length-1 sequence: (1, batch, input_size)
        x = x.view(1, x.size(0), -1)
        output1, h_n = self.rnn(x, hidden)
        output2 = self.linear(output1.squeeze(0))
        log_prob = F.log_softmax(output2, dim=1)
        return log_prob, h_n

rnnmodel = RNNModel(EMBDDING_DIM, HIDDEN_UNITS)
criterion = nn.NLLLoss()
optimizer = optim.SGD(rnnmodel.parameters(), lr=1e-3)

Next we define the loss function and the optimizer. The optimizer uses torch.optim, whose job is to take the gradients produced by backpropagation and update the parameters. When creating the optimizer we hand it the model's parameters and set the learning rate lr; SGD means the update rule is stochastic gradient descent.
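
Under the hood, vanilla SGD updates every parameter as p = p - lr * grad. A tiny sketch (illustrative, not part of the original code) that checks this on a single parameter:

import torch
from torch import nn, optim

w = nn.Parameter(torch.ones(1))
opt = optim.SGD([w], lr=0.1)

loss = (w * 3).sum()   # d(loss)/dw = 3
loss.backward()
opt.step()
print(w.data)          # 1 - 0.1 * 3 = 0.7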

The training process

for epoch in range(3):
    print('epoch: {}'.format(epoch + 1))
    print('*' * 10)
    running_loss = 0
    # initial hidden state: (num_layers, batch, hidden_size)
    input_hidden = Variable(torch.randn(1, BATCH_SIZE, HIDDEN_UNITS))
    for _, batch in enumerate(train_loader):
        x = Variable(batch[0])
        y = Variable(batch[1])
        # forward
        out, input_hidden = rnnmodel(x, input_hidden)
        # cut the graph here so gradients do not flow into earlier batches
        input_hidden = input_hidden.detach()
        trgt = torch.max(y, 1)[1]   # class index of each one-hot label
        loss = criterion(out, trgt)
        running_loss += loss.item()
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('Loss: {:.6f}'.format(running_loss / len(train_loader)))

epoch is the number of passes we make over the whole dataset; in the code it is 3.
torch.autograd.Variable is the basic unit of PyTorch's graph computation: every tensor that takes part in the computation is wrapped in a Variable.
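
A small illustration of Variable and backward() (in recent PyTorch versions Variable has been merged into Tensor, but the old API still works; this snippet is mine, not from the original code):

import torch
from torch.autograd import Variable

v = Variable(torch.ones(2, 2), requires_grad=True)
out = (v * 3).sum()    # a scalar computed from v
out.backward()         # populate v.grad with d(out)/dv
print(v.grad)          # a 2x2 tensor filled with 3s

The part that deserves the most explanation here is the loss function: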

criterion = nn.NLLLoss()
...
trgt = torch.max(y, 1)[1]
loss = criterion(out, trgt)

When we created the DataLoader we set batch_size. This means training uses minibatches: rather than feeding a single sample through the network and computing its loss right away, we feed a whole batch and compute the loss over the batch.
For example, if each sample (say a one-hot vector) is 100-dimensional and batch_size is 3, the x actually fed into the RNN is 3 x 100.
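
A toy shape check of this (made-up sizes, using torch's built-in TensorDataset instead of MyDataset):

import torch
from torch.utils.data import DataLoader, TensorDataset

data = torch.randn(9, 100)    # 9 samples, each a 100-dim vector
labels = torch.zeros(9).long()
loader = DataLoader(TensorDataset(data, labels), batch_size=3)
x, y = next(iter(loader))
print(x.size())               # torch.Size([3, 100])
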
Because the model's forward pass already ends in log_softmax, the loss used here is NLLLoss (negative log-likelihood); CrossEntropyLoss combines log_softmax with NLLLoss internally, so stacking it on top of log-probabilities would normalize twice. Both losses are meant for classifying data into C classes, and both take arguments of the same form:

input (N, C): N is the batch size, C the number of classes
target (N), with 0 <= target[i] <= C-1: a length-N vector holding each sample's class index

That is why torch.max(y, 1) is needed: it picks out the maximum of each row of y and also returns the column index where that maximum sits; for a one-hot label this index is exactly the target class. For example:
例如:

>>> a = torch.randn(4, 4)
>>> a

 0.0692  0.3142  1.2513 -0.5428
 0.9288  0.8552 -0.2073  0.6409
 1.0695 -0.0101 -2.4507 -1.2230
 0.7426 -0.7666  0.4862 -0.6628
[torch.FloatTensor of size 4x4]

>>> torch.max(a, 1)
(
 1.2513
 0.9288
 1.0695
 0.7426
[torch.FloatTensor of size 4]
,
 2
 0
 0
 0
[torch.LongTensor of size 4]
)

The second argument dim of torch.max(input, dim) is the dimension to reduce over, and therefore depends on input's dimensionality: for a 2-D input, dim=1 takes the maximum within each row; for a 3-D input, dim=2 takes it over the last dimension.

In[2]: import torch
In[3]: a = torch.randn(2,2,2)
In[4]: a
Out[4]: 

(0 ,.,.) = 
  0.4905 -0.2557
 -0.4251  0.1878

(1 ,.,.) = 
 -0.4327  0.0734
 -1.2723 -0.1210
[torch.FloatTensor of size 2x2x2]
In[5]: torch.max(a,2)
Out[5]: 
(
  0.4905  0.1878
  0.0734 -0.1210
 [torch.FloatTensor of size 2x2], 
  0  1
  1  1
 [torch.LongTensor of size 2x2])

Then comes the backward pass. Note that the gradients must be zeroed on every iteration (optimizer.zero_grad()), otherwise they keep accumulating, and that the hidden state is detached from the previous batch's graph so that backpropagation stops at the batch boundary. Finally, optimizer.step() updates the parameters.
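
To see why the zeroing matters, note that backward() accumulates into .grad (an illustrative snippet, not from the post):

import torch

w = torch.ones(1, requires_grad=True)
(w * 3).sum().backward()
print(w.grad)        # tensor([3.])

(w * 3).sum().backward()    # without zeroing, the new gradient is added on top
print(w.grad)        # tensor([6.])

w.grad.zero_()       # what optimizer.zero_grad() does for every parameter
print(w.grad)        # tensor([0.])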
