PyTorch Exercise: Computing Word Embeddings: Continuous Bag-of-Words

PyTorch Tutorial

In PyTorch, the exercise on training word embeddings is described as follows:

The Continuous Bag-of-Words model (CBOW) is frequently used in NLP deep learning. It is a model that tries to predict words given the context of a few words before and a few words after the target word. This is distinct from language modeling, since CBOW is not sequential and does not have to be probabilistic. Typically, CBOW is used to quickly train word embeddings, and these embeddings are used to initialize the embeddings of some more complicated model. Usually, this is referred to as pretraining embeddings. It almost always helps performance by a couple of percent.
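
As a rough illustration of that last point, pretrained CBOW embeddings can simply be copied into the embedding layer of a downstream model before training it. The DownstreamModel class and its arguments below are hypothetical, just a sketch of the idea:

import torch.nn as nn

class DownstreamModel(nn.Module):  # hypothetical "more complicated model"
    def __init__(self, vocab_size, embedding_dim, pretrained_weights):
        super(DownstreamModel, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        # initialize from the CBOW-trained embedding matrix instead of random weights
        self.embeddings.weight.data.copy_(pretrained_weights)
        # ... the rest of the downstream architecture would go here ...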

The CBOW model is as follows. Given a target word $w_i$ and an $N$ context window on each side, $w_{i-1}, \dots, w_{i-N}$ and $w_{i+1}, \dots, w_{i+N}$, referring to all context words collectively as $C$, CBOW tries to minimize

$$-\log p(w_i \mid C) = -\log \operatorname{Softmax}\left(A\left(\sum_{w \in C} q_w\right) + b\right)$$

where $q_w$ is the embedding for word $w$.
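
The loss above maps directly onto a handful of tensor operations. The following is only a sketch with made-up sizes (a vocabulary of 10 words, 5-dimensional embeddings, a 4-word context, target index 2); it is not part of the exercise:

import torch
import torch.nn.functional as F

q = torch.randn(10, 5)               # embedding table, one row q_w per word
A = torch.randn(10, 5)               # weight of the affine map A
b = torch.randn(10)                  # bias b
C = torch.LongTensor([1, 3, 5, 7])   # indices of the context words

summed = q[C].sum(dim=0)             # sum of q_w over w in C
scores = A.mv(summed) + b            # A(sum of q_w) + b
loss = -F.log_softmax(scores, dim=0)[2]  # -log p(w_i | C) for target word index 2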

Implement this model in PyTorch by filling in the class below. Some tips:

  • Think about which parameters you need to define.
  • Make sure you know what shape each operation expects. Use .view() if you need to reshape.
The code is as follows:
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

CONTEXT_SIZE = 2  # 2 words to the left, 2 to the right
raw_text = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells.""".split()

# By deriving a set from `raw_text`, we deduplicate the array
vocab = set(raw_text)
vocab_size = len(vocab)

word_to_ix = {word: i for i, word in enumerate(vocab)}
data = []
for i in range(2, len(raw_text) - 2):
    context = [raw_text[i - 2], raw_text[i - 1],
               raw_text[i + 1], raw_text[i + 2]]
    target = raw_text[i]
    data.append((context, target))
print(data[:5])


class CBOW(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super(CBOW, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)  # trainable parameters: the embedding table (q_w for every word)
        self.linear1 = nn.Linear(embedding_dim, vocab_size)  # trainable parameters: A and b


    def forward(self, inputs):
        embeds = self.embeddings(inputs)
        add_embeds = torch.sum(embeds, dim=0).view(1, -1)  # sum the context embeddings, then reshape to (1, embedding_dim)
        out = self.linear1(add_embeds)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs

# create your model and train.  here are some functions to help you make
# the data ready for use by your module


def make_context_vector(context, word_to_ix):
    idxs = [word_to_ix[w] for w in context]
    tensor = torch.LongTensor(idxs)
    return Variable(tensor)


make_context_vector(data[0][0], word_to_ix)  # example
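
# Shape note (assuming embedding_dim=20 as below): the context vector above is a
# LongTensor of 4 word indices; inside CBOW.forward, embeds has shape (4, 20),
# add_embeds becomes (1, 20) after the sum and .view(), and log_probs is (1, vocab_size).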

# declare the loss function, model, and optimizer
losses = []
loss_function = nn.NLLLoss()
model = CBOW(vocab_size, embedding_dim=20)
optimizer = optim.SGD(model.parameters(), lr=0.001)

# train for 10 epochs
for epoch in range(10):
    total_loss = torch.FloatTensor([0])
    for context, target in data:
        context_idxs = [word_to_ix[w] for w in context]
        target_idx = word_to_ix[target]
        context_var = Variable(torch.LongTensor(context_idxs))
        target_var = Variable(torch.LongTensor([target_idx]))
        model.zero_grad()
        log_probs = model(context_var)

        loss = loss_function(log_probs, target_var)
        loss.backward()
        optimizer.step()

        total_loss += loss.data
    losses.append(total_loss)
print(losses)

Output:

[(['We', 'are', 'to', 'study'], 'about'), (['are', 'about', 'study', 'the'], 'to'), (['about', 'to', 'the', 'idea'], 'study'), (['to', 'study', 'idea', 'of'], 'the'), (['study', 'the', 'of', 'a'], 'idea')]
[
 260.2805
[torch.FloatTensor of size 1]
, 
 255.0300
[torch.FloatTensor of size 1]
, 
 249.8967
[torch.FloatTensor of size 1]
, 
 244.8781
[torch.FloatTensor of size 1]
, 
 239.9720
[torch.FloatTensor of size 1]
, 
 235.1766
[torch.FloatTensor of size 1]
, 
 230.4900
[torch.FloatTensor of size 1]
, 
 225.9105
[torch.FloatTensor of size 1]
, 
 221.4367
[torch.FloatTensor of size 1]
, 
 217.0672
[torch.FloatTensor of size 1]
]
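
After training, a quick spot-check is possible. The snippet below is only a sketch using the variables defined above; it is not part of the original exercise or its printed output:

# predict the target word for the first (context, target) pair in `data`
context, target = data[0]
log_probs = model(make_context_vector(context, word_to_ix))
scores = log_probs.data.view(-1).tolist()
ix_to_word = {i: w for w, i in word_to_ix.items()}
print(context, '->', ix_to_word[scores.index(max(scores))], '(true target:', target, ')')

# the learned embedding for the target word is one row of the embedding table
print(model.embeddings.weight.data[word_to_ix[target]])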

