Sequence Model - Sequence Models & Attention Mechanism

Original

2021-08-24 13:58

Various Sequence To Sequence Architectures

Basic Models

Sequence to sequence model

Image captioning

First use a CNN (e.g. AlexNet) to encode the image as a 4096-dimensional feature vector, then feed that vector to an RNN that generates the caption.

Picking the Most Likely Sentence

Translate a French sentence \(x\) into the most likely English sentence \(y\).

That is, find

\[\argmax_{y^{<1>}, \dots, y^{<T_y>}} P(y^{<1>}, \dots, y^{<T_y>} | x) \]

Why not a greedy search?

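Greedy search picks the single most likely word at each step. That does not maximize the joint probability \(P(y | x)\) of the whole sentence, so the result may be a more verbose, less likely translation.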

Beam Search

Set \(B = 3\) (beam width) and keep the \(3\) most likely first words.
For each kept first word, score every possible second word, then keep the \(B\) most likely two-word prefixes overall.
Repeat, one word at a time, until \(<EOS>\).

If \(B = 1\), beam search reduces to greedy search.
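The steps above can be sketched in a few lines of Python. This is only an illustration: `toy_model` and its probability table are made up, standing in for the RNN decoder that would return \(P(y^{<t>} | x, y^{<1>}, \dots, y^{<t - 1>})\).

```python
import math

def toy_model(prefix):
    # Hypothetical conditional distribution P(next | prefix), for illustration only.
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.4, "y": 0.3, "<EOS>": 0.3},
        ("b",): {"z": 0.9, "<EOS>": 0.1},
    }
    probs = table.get(tuple(prefix), {"<EOS>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

def beam_search(next_log_probs, beam_width, eos, max_len):
    """Return hypotheses as (log_prob, tokens), most likely first."""
    beams = [(0.0, [])]  # (sum of log-probabilities, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            # Expand each kept prefix with every possible next token.
            for token, logp in next_log_probs(tokens).items():
                candidates.append((score + logp, tokens + [token]))
        # Keep only the B most likely prefixes over all expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    return sorted(finished + beams, key=lambda c: c[0], reverse=True)

greedy = beam_search(toy_model, beam_width=1, eos="<EOS>", max_len=5)
wide = beam_search(toy_model, beam_width=2, eos="<EOS>", max_len=5)
print(greedy[0][1], math.exp(greedy[0][0]))
print(wide[0][1], math.exp(wide[0][0]))
```

In this toy table, \(B = 1\) commits to "a" first and ends with a sentence of probability about 0.24, while \(B = 2\) also keeps "b" and finds a sentence of probability about 0.36.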

Length normalization

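\[\argmax_{y} \prod_{t = 1}^{T_y} P(y^{<t>}|x, y^{<1>}, \dots, y^{<t - 1>}) \]

Each factor \(P\) is less than \(1\), so the product gets very close to \(0\) and can underflow. Take the \(\log\) and maximize a sum instead:

\[\argmax_{y} \sum_{t = 1}^{T_y} \log P(y^{<t>}|x, y^{<1>}, \dots, y^{<t - 1>}) \]

Every term of the sum is negative, so this objective tends to prefer short sentences.

So you can normalize by the length (\(\alpha\) is a hyperparameter; \(\alpha = 0\) means no normalization, \(\alpha = 1\) means full normalization):

\[\argmax_{y} \frac 1 {T_y^{\alpha}} \sum_{t = 1}^{T_y} \log P(y^{<t>}|x, y^{<1>}, \dots, y^{<t - 1>}) \]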
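A small sketch of the normalized objective, to show how \(\alpha\) changes which candidate wins. The per-word probabilities below are made up, and the default `alpha=0.7` is just an assumed starting value, not something fixed by the formula.

```python
import math

def normalized_score(token_probs, alpha=0.7):
    """(1 / T_y^alpha) * sum of log P(y^<t> | x, y^<1>, ..., y^<t-1>)."""
    log_sum = sum(math.log(p) for p in token_probs)
    return log_sum / (len(token_probs) ** alpha)

# Hypothetical per-word probabilities of two candidate translations.
short = [0.5, 0.5]
long = [0.6, 0.6, 0.6, 0.6]

for alpha in (0.0, 0.7, 1.0):
    best = "short" if normalized_score(short, alpha) > normalized_score(long, alpha) else "long"
    print(alpha, best)
```

With \(\alpha = 0\) the short candidate wins only because it has fewer negative terms; with normalization the longer candidate, whose words are individually more likely, scores higher.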

Beam search discussion

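large \(B\) : more candidates are kept, so the result is usually better, but search is slower and uses more memory
small \(B\) : fewer candidates are kept, so the result is usually worse, but search is faster and uses less memory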

Error Analysis in Beam Search

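Let \(y^*\) be a high-quality human translation and \(\hat y\) be the algorithm's output. Use the RNN to compute \(P(y^* | x)\) and \(P(\hat y | x)\), then compare:

\(P(y^* | x) > P(\hat y | x)\) : beam search is at fault, because it failed to find the more likely \(y^*\)
\(P(y^* | x) \le P(\hat y | x)\) : the RNN model is at fault, because it rates the worse translation at least as likely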
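The comparison can be repeated over many bad dev-set examples and tallied to see which component deserves the effort. A minimal sketch follows; the function names and the probability pairs are hypothetical.

```python
from collections import Counter

def blame(p_human, p_output):
    """Apply the rule above to one example."""
    return "beam search" if p_human > p_output else "RNN model"

def tally(pairs):
    """pairs: iterable of (P(y*|x), P(y_hat|x)) for mistranslated examples."""
    return Counter(blame(p_star, p_hat) for p_star, p_hat in pairs)

# Made-up probabilities, for illustration only.
examples = [(2e-10, 1e-10), (1e-10, 3e-10), (5e-9, 1e-9)]
print(tally(examples))
```

If most errors are attributed to beam search, a larger \(B\) may help; if most go to the RNN model, the model itself needs work.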

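Bleu (bilingual evaluation understudy) Score

Given one or more good reference translations, the Bleu score measures how close a machine translation \(\hat y\) is to them. The modified precision on n-grams is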

\[p_n = \frac{\sum_{\text{n-grams} \in \hat y} \text{Count}_{\text{clip}}(\text{n-grams})} {\sum_{\text{n-grams} \in \hat y} \text{Count}(\text{n-grams})} \]
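A sketch of \(p_n\) in Python, assuming sentences are already split into word lists. Here \(\text{Count}_{\text{clip}}\) is taken to mean clipping each n-gram count at the largest count of that n-gram in any single reference; the helper names are made up.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # Largest count of each n-gram in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip every candidate count by the reference maximum.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split(),
              "there is a cat on the mat".split()]
print(modified_precision(candidate, references, 1))
```

The degenerate candidate repeats "the" seven times, but "the" appears at most twice in a reference, so \(p_1 = 2/7\) instead of \(7/7\).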

Bleu details

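Combine the precisions for \(n = 1, \dots, 4\) into a single score: \(BP \cdot \exp(\frac{1}{4} \sum_{n = 1}^4 \log p_n)\)

BP = brevity penalty

\[BP = \begin{cases} 1 & \text{if } \text{MT\_output\_length} > \text{reference\_output\_length}\\ \exp(1 - \text{reference\_output\_length} / \text{MT\_output\_length}) & \text{otherwise} \end{cases} \]

The penalty is needed because a very short translation can reach high precision too easily.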
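A sketch of the combined score. It uses the geometric mean of the precisions (the exponential of the averaged \(\log p_n\)), which is how the standard Bleu definition combines them, together with the brevity penalty above. The precision values in the example are made up, and `mt_len` is assumed to be positive.

```python
import math

def brevity_penalty(mt_len, ref_len):
    # No penalty when the MT output is longer than the reference.
    if mt_len > ref_len:
        return 1.0
    return math.exp(1 - ref_len / mt_len)

def bleu(precisions, mt_len, ref_len):
    # A zero precision makes the geometric mean zero.
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty(mt_len, ref_len) * math.exp(log_mean)

# Hypothetical p_1..p_4 for one translation.
p = [0.5, 0.5, 0.5, 0.5]
print(bleu(p, mt_len=10, ref_len=10))
print(bleu(p, mt_len=5, ref_len=10))
```

With the same precisions, the output that is half as long as the reference is scaled down by \(\exp(-1)\).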

Attention Model Intuition

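A plain encoder-decoder has to compress the whole input sentence into one fixed vector, which is hard for long sentences.

Instead, for each output word, compute attention weights over the input positions and predict the word from the resulting context.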

Attention Model

Use a BiRNN or BiLSTM.

\[\begin{aligned} a^{<t'>} &= (\overrightarrow{a}^{<t'>}, \overleftarrow{a}^{<t'>})\\ \sum_{t'} \alpha^{<t, t'>} &= 1\\ c^{<t>} &= \sum_{t'} \alpha^{<t, t'>} a^{<t'>} \end{aligned} \]

Computing attention

\[\begin{aligned} \alpha^{<t, t'>} &= \text{amount of "attention" } y^{<t>} \text{ should pay to } a^{<t'>}\\ &= \frac{\exp(e^{<t, t'>})}{\sum_{t'' = 1}^{T_x} \exp(e^{<t, t''>})} \end{aligned} \]

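The scores \(e^{<t, t'>}\) come from a very small neural network, trained jointly with the rest of the model, that takes the previous decoder state \(s^{<t - 1>}\) and \(a^{<t'>}\) as input.

Computing all the weights costs \(\mathcal O(T_x T_y)\) , i.e. quadratic in the sentence lengths.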
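A sketch of one attention step in plain Python: a softmax turns the scores \(e^{<t, t'>}\) into weights that sum to \(1\), and the context is the weighted sum of the encoder activations. The scores are passed in directly here, since the small scoring network is not shown; all values are made up.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(scores, activations):
    """scores: e^<t,t'> for one output step t; activations: a^<t'> as lists."""
    alphas = softmax(scores)
    dim = len(activations[0])
    context = [sum(alpha * a[d] for alpha, a in zip(alphas, activations))
               for d in range(dim)]
    return alphas, context

# Two input positions with 2-dimensional activations (hypothetical values).
alphas, context = context_vector([0.0, math.log(3)], [[1.0, 0.0], [0.0, 1.0]])
print(alphas, context)
```

Running this once per output step, over all \(T_x\) input positions, is where the \(T_x T_y\) cost comes from.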

Speech Recognition - Audio Data

Speech recognition

\(x(\text{audio clip}) \to y(\text{transcript})\)

Attention model for speech recognition

The attention model reads the audio frames and generates the transcript character by character.

CTC cost for speech recognition

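CTC (Connectionist temporal classification)

The network emits one character (or a blank "_") per input time step, so the raw output is as long as the input.

"ttt_h_eee___ ____qqq\(\dots\)" \(\rightarrow\) "the quick brown fox"

Basic rule: collapse repeated characters not separated by "blank", then remove the blanks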
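The collapsing rule is short enough to write out. This sketch assumes the per-step outputs arrive as a string and that "_" is the blank symbol; the function name is made up.

```python
from itertools import groupby

def ctc_collapse(frames, blank="_"):
    # Merge runs of identical characters, then drop the blanks.
    merged = (char for char, _ in groupby(frames))
    return "".join(char for char in merged if char != blank)

print(ctc_collapse("ttt_h_eee___ ___qqq__"))
print(ctc_collapse("hel_lo"))
```

The blank is what allows a genuine double letter: "hel_lo" keeps both l's, while "hello" without a blank would collapse to "helo".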

Trigger Word Detection

In the training labels, set the output to \(1\) for several time steps right after the trigger word ends, and to \(0\) everywhere else.
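A sketch of how such a label sequence could be built, assuming the time steps at which each trigger word ends are already known. The function name, the `ones_after` parameter and its default of 3 are assumptions for illustration.

```python
def make_labels(num_steps, trigger_end_steps, ones_after=3):
    """Return a 0/1 label per time step, with 1s right after each trigger word."""
    labels = [0] * num_steps
    for end in trigger_end_steps:
        # Mark a few steps after the trigger word ends, staying inside the clip.
        for t in range(end + 1, min(end + 1 + ones_after, num_steps)):
            labels[t] = 1
    return labels

print(make_labels(10, [2, 7]))
```

Using several 1s per trigger word, rather than a single one, makes the labels less heavily dominated by 0s.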
