Natural Language Processing (NLP): 15 An Illustrated Walkthrough of the Attention Computation (02)

Original

走在前方

2020-06-16 15:42

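Focused on text classification, keyword extraction, text summarization, FAQ question answering systems, natural language understanding (NLU) for dialogue systems, knowledge graphs, and related topics. Combines concrete industry cases with the latest academic research to bring NLP techniques into production. For more content, join the "NLP Technology Exchange Group".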

This article introduces the seq2seq framework and how attention is applied in machine translation.

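Problems facing seq2seq

Forgetting

When a sentence is too long, it is hard for the model to retain earlier information.

Alignment

When the input and output sequences differ, how should they be aligned?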

Fig. 0.1: seq2seq with an input sequence of length 4

Fig. 0.2: seq2seq with an input sequence of length 64

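The attention computation

Attention computes a score for each word.
The scores are passed through a softmax.
The encoder hidden states are combined in a weighted sum, giving the context vector for the current time step.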
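
The three steps above can be written compactly as follows. The notation is an assumption introduced here for illustration, not taken from the figures: $s$ is the current decoder hidden state, $h_i$ is the $i$-th encoder hidden state, $n$ is the number of encoder hidden states, $e_i$ is the raw score, $\alpha_i$ the softmaxed score, and $c$ the context vector.

```latex
\begin{aligned}
e_i      &= \operatorname{score}(s, h_i), \\
\alpha_i &= \frac{\exp(e_i)}{\sum_{j=1}^{n} \exp(e_j)}, \\
c        &= \sum_{i=1}^{n} \alpha_i \, h_i .
\end{aligned}
```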

[Step 0: Prepare hidden states.]

Fig. 1.0: Getting ready to pay attention

All of the encoder hidden states (green)

4 hidden states

The decoder hidden state (red)

The last encoder hidden state is fed to the decoder as its input

[Step 1: Obtain a score for every encoder hidden state]

Fig. 1.1: Get the scores

Compute a score (a measure of relevance) between each encoder hidden state and the decoder hidden state.

The score is computed by a score function; in this example we use the dot product.

decoder_hidden = [10, 5, 10]
encoder_hidden  score
---------------------
     [0, 1, 1]     15 (= 10×0 + 5×1 + 10×1, the dot product)
     [5, 0, 1]     60
     [1, 1, 0]     15
     [0, 5, 1]     35
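
The scores in the table can be reproduced with a few lines of plain Python. This is a minimal sketch; the helper name `dot` is chosen here for illustration and does not come from the article.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

decoder_hidden = [10, 5, 10]
encoder_hiddens = [[0, 1, 1], [5, 0, 1], [1, 1, 0], [0, 5, 1]]

# One score per encoder hidden state.
scores = [dot(h, decoder_hidden) for h in encoder_hiddens]
print(scores)  # [15, 60, 15, 35]
```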

Other score functions (also called alignment score functions or alignment models) can be used in place of the dot product.
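
For reference, three score functions that are widely used in the attention literature are sketched below. The selection is an assumption about which alternatives are meant, since the text does not list them. The symbols are introduced here for illustration: $s$ is the decoder hidden state, $h_i$ an encoder hidden state, $[s; h_i]$ their concatenation, and $W$, $v$ are learned parameters.

```latex
\begin{aligned}
\text{dot product:}       \quad & \operatorname{score}(s, h_i) = s^{\top} h_i \\
\text{general:}           \quad & \operatorname{score}(s, h_i) = s^{\top} W h_i \\
\text{additive (concat):} \quad & \operatorname{score}(s, h_i) = v^{\top} \tanh\!\left(W [s; h_i]\right)
\end{aligned}
```

The worked example in this article uses the first form, which has no learned parameters.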

[Step 2: Run all the scores through a softmax layer.]

Fig. 1.2: Get the softmaxed scores

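The computed scores are passed through a softmax layer, which normalizes them into softmaxed scores (each a scalar) that sum to 1. In the table below the softmaxed scores (score^) are shown rounded to the nearest integer.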

encoder_hidden  score  score^
-----------------------------
     [0, 1, 1]     15       0
     [5, 0, 1]     60       1
     [1, 1, 0]     15       0
     [0, 5, 1]     35       0
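
The softmax step can be sketched in plain Python as follows. The function name `softmax` and the max-subtraction trick for numerical stability are choices made here, not details from the article. The exact softmaxed scores are not literally 0 and 1: the three smaller entries are tiny positive numbers, which round to 0 as in the table.

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [15, 60, 15, 35]
weights = softmax(scores)

# Rounded to the nearest integer, as in the table above.
print([round(w) for w in weights])  # [0, 1, 0, 0]
```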

[Step 3: Multiply each encoder hidden state by its softmaxed score.]

Fig. 1.3: Get the alignment vectors

Each encoder hidden state is multiplied by its softmaxed score, which gives one new vector per hidden state (the alignment vector).

encoder  score  score^  alignment
---------------------------------
[0, 1, 1]   15     0  [0, 0, 0]
[5, 0, 1]   60     1  [5, 0, 1]
[1, 1, 0]   15     0  [0, 0, 0]
[0, 5, 1]   35     0  [0, 0, 0]
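
A minimal sketch of this scaling step, using the rounded softmaxed scores from the table. The helper name `scale` is hypothetical.

```python
encoder_hiddens = [[0, 1, 1], [5, 0, 1], [1, 1, 0], [0, 5, 1]]
weights = [0, 1, 0, 0]  # rounded softmaxed scores from the table

def scale(vector, weight):
    """Multiply every component of a vector by a scalar weight."""
    return [weight * x for x in vector]

# One alignment vector per encoder hidden state.
alignment = [scale(h, w) for h, w in zip(encoder_hiddens, weights)]
print(alignment)  # [[0, 0, 0], [5, 0, 1], [0, 0, 0], [0, 0, 0]]
```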

[Step 4: Sum up the alignment vectors.]

Fig. 1.4: Get the context vector

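Here the alignment vectors are summed component by component to give the final context vector, which is therefore a weighted sum of the encoder hidden states.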

encoder  score  score^  alignment
---------------------------------
[0, 1, 1]   15     0  [0, 0, 0]
[5, 0, 1]   60     1  [5, 0, 1]
[1, 1, 0]   15     0  [0, 0, 0]
[0, 5, 1]   35     0  [0, 0, 0]
context = [0+5+0+0, 0+0+0+0, 0+1+0+0] = [5, 0, 1]
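
The component-wise sum can be sketched as follows, starting from the alignment vectors in the table.

```python
alignment = [[0, 0, 0], [5, 0, 1], [0, 0, 0], [0, 0, 0]]

# zip(*alignment) groups the vectors by component, so each sum
# adds up one coordinate across all alignment vectors.
context = [sum(component) for component in zip(*alignment)]
print(context)  # [5, 0, 1]
```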

[Step 5: Feed the context vector into the decoder.]

Fig. 1.5: Feed the context vector to decoder

[Summary: the whole attention computation]

An animated demonstration of the whole attention process
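
Steps 1 to 4 can also be put together in a single function. This is a sketch under stated assumptions: the function name `attention` and its return values are chosen here for illustration, the score function is the dot product used in this article, and the softmaxed scores are left unrounded, so the context vector matches the table only up to a tiny numerical difference.

```python
import math

def attention(decoder_hidden, encoder_hiddens):
    """Dot-product attention for one decoder time step.

    Returns (weights, context): the softmaxed scores and the
    weighted sum of the encoder hidden states.
    """
    # Step 1: one dot-product score per encoder hidden state.
    scores = [sum(a * b for a, b in zip(h, decoder_hidden))
              for h in encoder_hiddens]
    # Step 2: softmax (max subtracted for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Steps 3 and 4: scale each hidden state by its weight, then sum.
    dim = len(encoder_hiddens[0])
    context = [sum(w * h[k] for w, h in zip(weights, encoder_hiddens))
               for k in range(dim)]
    return weights, context

weights, context = attention(
    [10, 5, 10],
    [[0, 1, 1], [5, 0, 1], [1, 1, 0], [0, 5, 1]],
)
print([round(c, 6) for c in context])  # [5.0, 0.0, 1.0]
```

Step 5, feeding the context vector into the decoder, depends on the decoder architecture and is not covered by this sketch.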
