Learning RNN: Part 2

RNN Learning: Word Vectors

Word embedding layers are hard to train from scratch, so in practice we load a pre-trained model. From this code you will learn three things:

  • Load a word embedding layer and measure word similarity with the cosine formula (see the formula below)
  • Use word embeddings to solve word analogy problems, e.g. given man → woman, the model learns king → ?
  • Some word embeddings need to be modified to reduce unwanted bias such as gender stereotypes (a sketch follows the analogy example below)
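
For reference, the cosine similarity used in the code below measures the angle between two word vectors:

$$\mathrm{CosineSimilarity}(u, v) = \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2} = \cos(\theta)$$

Values close to 1 mean the two words are used in similar ways; values near 0 mean they are largely unrelated.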

Hands-on code:

# 1 Imports
import numpy as np
from w2v_utils import *
words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

# 2 Given the embedding matrix, compute the similarity between two words
def cosine_similarity(u, v):
    dot = np.dot(u, v)
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)
    cosine_similarity = dot / norm_u / norm_v
    return cosine_similarity
# 2.1 Similarity between two words
father = word_to_vec_map["father"]
mother = word_to_vec_map["mother"]
print("cosine_similarity(father, mother) = ", cosine_similarity(father, mother))  # 0.89
# 2.2 Given an analogy pair a:b, find the word d that best completes c:d
def complete_analogy(word_a, word_b, word_c, word_to_vec_map):
    word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()
    e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c]
    words = word_to_vec_map.keys()
    max_cosine_sim = -100              
    best_word = None                   
    for w in words:        
        if w in [word_a, word_b, word_c] :
            continue
        cosine_sim = cosine_similarity((e_b - e_a), (word_to_vec_map[w] - e_c))
        if cosine_sim > max_cosine_sim:
            max_cosine_sim = cosine_sim
            best_word = w
    return best_word
triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'), ('man', 'woman', 'boy'), ('small', 'smaller', 'large')]
for triad in triads_to_try:
    print ('{} -> {} :: {} -> {}'.format( *triad, complete_analogy(*triad,word_to_vec_map)))
'''italy -> italian :: spain -> spanish
india -> delhi :: japan -> tokyo
man -> woman :: boy -> girl
small -> smaller :: large -> larger'''
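
The third item in the list above (modifying embeddings to reduce bias) is not implemented in this post. As a rough sketch only, the standard neutralization step projects a word vector onto a bias direction (here a hypothetical gender axis g built from a single word pair) and subtracts that component:

# Sketch: remove the component of a word vector that lies along the gender axis g
g = word_to_vec_map['woman'] - word_to_vec_map['man']           # rough gender direction
def neutralize(word, g, word_to_vec_map):
    e = word_to_vec_map[word]
    e_biascomponent = (np.dot(e, g) / np.sum(g * g)) * g        # projection of e onto g
    return e - e_biascomponent                                  # e with the bias component removed
print(cosine_similarity(neutralize('receptionist', g, word_to_vec_map), g))  # close to 0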

RNN Learning: Sentiment Analysis

Using an embedding layer, we attach an emoji to the input text and output the original text together with the emoji.

Hands-on code 1: build the Emojifier with a baseline model (averaged word vectors + softmax)

# 1 Imports
import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt
X_train, Y_train = read_csv('data/train_emoji.csv') # m=127
X_test, Y_test = read_csv('data/tesss.csv') # m=56
maxLen = len(max(X_train, key=len).split())
# 1.1 Preview an example
index = 1
print(X_train[index], label_to_emoji(Y_train[index]))
"""I am proud of your achievements ?"""
# 1.2 Preprocessing: convert Y to (m, 5) one-hot labels
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)
# 1.3 Load the GloVe vectors and preview the vocabulary
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt') # 400,001 words
word = "cucumber"
index = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])
"""the index of cucumber in the vocabulary is 113317
the 289846th word in the vocabulary is potatos"""

# 2 Implement the model
# 2.1 Turn a sentence into an averaged word vector
def sentence_to_avg(sentence, word_to_vec_map):
    """
    Extract the GloVe representation of each word in the sentence, sum them,
    and divide by the number of words to get the sentence's feature vector."""
    words = sentence.lower().split()
    avg = np.zeros(50,)
    for w in words:
        avg += word_to_vec_map[w]
    avg = avg / len(words)
    return avg
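# Quick check (example sentence; any sentence whose words are in the GloVe vocabulary works)
print(sentence_to_avg("food is ready for you", word_to_vec_map).shape)  # (50,)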
# 2.2 Build the baseline softmax model
def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Arguments:
    X -- shape (m, 1)
    Y -- shape (m, 1)
    
    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """
    np.random.seed(1)
    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes  
    n_h = 50                                # dimensions of the GloVe vectors 
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))
    Y_oh = convert_to_one_hot(Y, C = n_y) 
    # Optimization loop
    for t in range(num_iterations):                       
        for i in range(m):                                
            avg = sentence_to_avg(X[i],word_to_vec_map)
            z = np.dot(W,avg) + b
            a = softmax(z)
            cost = -np.sum(np.multiply(Y_oh[i], np.log(a)))  # cross-entropy loss
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
            db = dz
            W = W - learning_rate * dW
            b = b - learning_rate * db       
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)
    return pred, W, b
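# For reference, the inner loop above is plain softmax regression on the averaged word vector:
#   z = W @ avg + b,  a = softmax(z),  loss = -sum(Y_oh[i] * log(a))
#   dloss/dz = a - Y_oh[i],  dloss/dW = outer(dz, avg),  dloss/db = dz
# which is exactly what the dz, dW and db lines implement.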
# 2.3 Train
pred, W, b = model(X_train, Y_train, word_to_vec_map)
'''Epoch: 0 --- cost = [ 2.82117539  2.22537435  3.90409976  3.65077617  4.17192113]
Accuracy: 0.348484848485
Epoch: 100 --- cost = [  7.39085514   6.39666398   0.15943637   9.61056197  11.77782592]
Accuracy: 0.931818181818
Epoch: 200 --- cost = [  7.86956435   7.883712     0.08912738  11.25652113  13.75952996]
Accuracy: 0.954545454545
Epoch: 300 --- cost = [  8.06494045   8.67838712   0.06864535  12.0741376   14.92485916]
Accuracy: 0.969696969697'''
# 2.4 Evaluate the model
print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)
'''Training set:
Accuracy: 0.977272727273
Test set:
Accuracy: 0.857142857143'''
X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4],[3]])
pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)
'''
Accuracy: 0.833333333333

i adore you ❤️
i love you ❤️
funny lol ?
lets play with a ball ⚾
food is ready ?
not feeling happy ?'''

Amazing! Because adore has an embedding similar to love, the algorithm has generalized correctly even to a word it has never seen before. Words such as heart, dear, beloved or adore have embedding vectors similar to love, and so might work too.

What you should remember from this part:

  • Even with only 127 training examples, you can get a reasonably good model for Emojifying. This is due to the generalization power word vectors give you.
  • Emojify-V1 will perform poorly on sentences such as “This movie is not good and not enjoyable” because it doesn’t understand combinations of words; it just averages all the words’ embedding vectors together, without paying attention to the ordering of words. You will build a better algorithm in the next part.

Hands-on code 2: build the Emojifier with an LSTM (Emojify-V2)

# 1 Imports
import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)

# 2 Input preprocessing: convert sentences to indices and pad
def sentences_to_indices(X, word_to_index, max_len):
    """
    Convert the sentences in X into arrays of word indices and zero-pad them to max_len.
    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary mapping each word to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this. 
    
    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """    
    m = X.shape[0]                                   
    X_indices = np.zeros((m,max_len))
    
    for i in range(m):                               
        sentence_words = X[i].lower().split()
        j = 0
        for w in sentence_words:
            X_indices[i, j] = word_to_index[w]
            j = j+1
    return X_indices
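# Quick check (example sentences): shorter phrases get zero-padded on the right
X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
print(sentences_to_indices(X1, word_to_index, max_len=5))  # shape (3, 5)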

# 3 Pretrained embedding layer
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
    
    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """
    
    vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # define dimensionality of your GloVe word vectors (= 50)
    emb_matrix = np.zeros((vocab_len,emb_dim))
    # Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]
    embedding_layer = Embedding(input_dim = vocab_len,output_dim = emb_dim,trainable=False)
    # Build the embedding layer, it is required before setting the weights of the embedding layer. Do not modify the "None".
    embedding_layer.build((None,))
    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])    
    return embedding_layer
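# Sanity check (example): a word's row in the layer weights should equal its GloVe vector
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print(np.allclose(embedding_layer.get_weights()[0][word_to_index["cucumber"]],
                  word_to_vec_map["cucumber"]))  # True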

# 4 Build the model
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.
    
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(input_shape, dtype = 'int32')
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences = True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences = False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with softmax activation to get back a batch of 5-dimensional vectors.
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation("softmax")(X)
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs = sentence_indices, outputs=X)
    return model
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()
"""_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 10)                0         
_________________________________________________________________
embedding_3 (Embedding)      (None, 10, 50)            20000050  
_________________________________________________________________
lstm_3 (LSTM)                (None, 10, 128)           91648     
_________________________________________________________________
dropout_3 (Dropout)          (None, 10, 128)           0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dropout_4 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 645       
_________________________________________________________________
activation_2 (Activation)    (None, 5)                 0         
=================================================================
Total params: 20,223,927
Trainable params: 223,877
Non-trainable params: 20,000,050
_________________________________________________________________"""
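# Parameter check: the frozen embedding holds 400,001 x 50 = 20,000,050 weights (the non-trainable count);
# the 223,877 trainable parameters are 91,648 (first LSTM) + 131,584 (second LSTM) + 645 (Dense).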

# 5 Training
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)
model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
"""Epoch 50/50
132/132 [==============================] - 0s - loss: 0.0797 - acc: 0.9848     - ETA: 0s - loss: 0.0812 - acc: 0.984"""

# 6 Evaluation
X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print()
print("Test accuracy = ", acc)
"""Test accuracy =  0.925000008515"""
# This code allows you to see the mislabelled examples
C = 5
y_test_oh = np.eye(C)[Y_test.reshape(-1)]
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    x = X_test_indices
    num = np.argmax(pred[i])
    if(num != Y_test[i]):
        print('Expected emoji:'+ label_to_emoji(Y_test[i]) + ' prediction: '+ X_test[i] + label_to_emoji(num).strip())
"""
Expected emoji:❤️ prediction: I love taking breaks	?
Expected emoji:? prediction: she is a bully	?
Expected emoji:? prediction: she said yes	?
Expected emoji:❤️ prediction: I love you to the stars and back	?"""
# Change the sentence below to see your prediction. Make sure all the words are in the GloVe embeddings.
x_test = np.array(['not feeling happy'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+  label_to_emoji(np.argmax(model.predict(X_test_indices))))
"""not feeling happy ?"""

What we can learn

  • In Keras, every example in a mini-batch must have the same length so that X can be vectorized, but sentences usually differ in length. That is why we pad.
  • Learn how to create a Keras embedding layer with keras.layers.Embedding(vocab_len, emb_dim)
    • Step 1: split X into mini-batches and convert each sentence into a list of word indices
    • Step 2: pad every list of indices to the max length
    • Step 3: feed the indices into the embedding layer; the embedding matrix E has shape (400001, 50)
    • Step 4: the layer outputs the corresponding embedding vectors for each sentence
  • If you have an NLP task where the training set is small, using word embeddings can help your algorithm significantly. Word embeddings allow your model to work on words in the test set that may not even have appeared in your training set.
  • Training sequence models in Keras (and in most other deep learning frameworks) requires a few important details:
    • To use mini-batches, the sequences need to be padded so that all the examples in a mini-batch have the same length.
    • An Embedding() layer can be initialized with pretrained values. These values can be either fixed or trained further on your dataset. If however your labeled dataset is small, it’s usually not worth trying to train a large pre-trained set of embeddings.
    • LSTM() has a flag called return_sequences to decide if you would like to return every hidden state or only the last one (see the snippet after this list).
    • You can use Dropout() right after LSTM() to regularize your network.
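
As a small, self-contained illustration of the return_sequences flag (shapes match the Emojify_V2 model above, assuming 10-step sequences of 50-dimensional embeddings):

from keras.layers import Input, LSTM
seq = Input(shape=(10, 50))                          # a batch of 10-step, 50-dim embedding sequences
h_all = LSTM(128, return_sequences=True)(seq)        # (None, 10, 128): one hidden state per time step
h_last = LSTM(128, return_sequences=False)(h_all)    # (None, 128): only the last hidden state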