NLP (2)

Cross-Entropy Error

\(H(p,q)=-\sum^{C}_{c=1}{p(c)\log q(c)}\)

  • p(c) is the true class probability
  • q(c) is the softmax probability
    \(J(\theta)=\frac{1}{N}\sum^{N}_{i=1}{-\log\left(\frac{e^{f_{y_i}}}{\sum^{C}_{c=1}e^{f_c}}\right)}+\lambda\sum_{k}{\theta_k^2}\)
    \(\lambda\sum_{k}{\theta_k^2}\) is the regularization term, which helps avoid overfitting or exploding weights (see the code sketch below)
    Q: When should the word vectors be updated?
    A: Keep the word vectors fixed, since they can overfit on a smaller corpus
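As a rough illustration of this objective, here is a minimal NumPy sketch assuming a plain linear classifier \(f=Wx\); the names `regularized_ce_loss`, `W`, `X`, `y` and the value of the regularization coefficient `lam` are illustrative choices, not from the original notes.

```python
import numpy as np

def softmax(f):
    """Numerically stable softmax over the class scores f."""
    e = np.exp(f - np.max(f))
    return e / e.sum()

def regularized_ce_loss(W, X, y, lam):
    """Mean cross-entropy -log softmax(W x_i)[y_i] over N examples, plus an L2 term."""
    N = X.shape[0]
    loss = 0.0
    for i in range(N):
        q = softmax(W @ X[i])        # q(c): predicted softmax distribution
        loss += -np.log(q[y[i]])     # H(p, q) with a one-hot p(c)
    return loss / N + lam * np.sum(W ** 2)

# toy usage: 3 examples, 4 features, 2 classes
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
X = rng.normal(size=(3, 4))
y = np.array([0, 1, 1])
print(regularized_ce_loss(W, X, y, lam=1e-3))
```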

Word Window Classification

Suppose we want to identify person names, locations, organizations, and other entities (four classes).
... museums in Paris are amazing...
\(x_{window}=[\,x_{museums}\;\;x_{in}\;\;x_{Paris}\;\;x_{are}\;\;x_{amazing}\,]^{T}\)
\(x_{window}\in\mathbb{R}^{5d}\)
With \(x=x_{window}\):
\(\hat{y}_{y}=p(y|x)=\frac{\exp(W_{y\cdot}x)}{\sum^{C}_{c=1}\exp(W_{c\cdot}x)}\)
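A minimal sketch of the window construction and softmax prediction, assuming toy random word vectors (d = 4) in place of real pretrained embeddings; the variable names and shapes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                   # word-vector dimension (toy value)
window = ["museums", "in", "Paris", "are", "amazing"]
vocab = {w: rng.normal(size=d) for w in window}    # stand-in for real embeddings

# concatenate the five window words: x_window lives in R^{5d}
x_window = np.concatenate([vocab[w] for w in window])

C = 4                                   # person, location, organization, other
W = rng.normal(size=(C, 5 * d))         # softmax weight matrix

f = W @ x_window                        # class scores, f_c = W_c. x
y_hat = np.exp(f - f.max()) / np.exp(f - f.max()).sum()   # p(y | x)
print(y_hat)
```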
Define:
y: softmax probability output vector (as above)
t: target probability distribution (one-hot)
\(f=f(x)=Wx\in\mathbb{R}^{C}\), and \(f_{c}\) = the c-th element of the vector f
\(\nabla_{x}J=\frac{\partial}{\partial x}\bigl(-\log p(y|x)\bigr)=\sum^{C}_{c=1}\delta_{c}W^{T}_{c\cdot}=W^{T}\delta\)  (derivative of the cost function, with \(\delta=\hat{y}-t\))
Let \(\delta_{x_{window}}=W^{T}\delta\).
With \(x_{window}=[\,x_{museums}\;\;x_{in}\;\;x_{Paris}\;\;x_{are}\;\;x_{amazing}\,]^{T}\),
\(\delta_{window}=[\,\delta_{x_{museums}}\;\;\delta_{x_{in}}\;\;\delta_{x_{Paris}}\;\;\delta_{x_{are}}\;\;\delta_{x_{amazing}}\,]^{T}\)
Q: How do we update the word vectors that were concatenated into this window?
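Continuing the toy sketch above, one answer is to split \(\delta_{window}=W^{T}\delta\) into one d-dimensional block per window word and take a gradient step on each of those word vectors; the target class and learning rate below are illustrative.

```python
# continuing the sketch above (reuses x_window, W, y_hat, vocab, window, d, C)
t = np.zeros(C)
t[1] = 1.0                              # one-hot target, e.g. "location" (illustrative)
delta = y_hat - t                       # error signal at the softmax output

delta_window = W.T @ delta              # gradient w.r.t. the concatenated window, shape (5d,)

lr = 0.1                                # learning rate (illustrative)
for i, w in enumerate(window):
    # the i-th d-dimensional block of delta_window is the gradient for word w
    vocab[w] -= lr * delta_window[i * d:(i + 1) * d]
```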

A single layer neural network


Essential difference from softmax: the output can be fed into the loss function of another neuron.
It is a combination of a linear layer and a nonlinearity:
\(z=Wx+b\)
\(a=f(z)\)
The neural activation a can then be used to compute some output.
For instance, a probability via softmax:
\(p(y|x)=\mathrm{softmax}(Wa)\)
Or an unnormalized score (even simpler):
\(\mathrm{score}(x)=U^{T}a\in\mathbb{R}\)
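A minimal sketch of this single-layer scorer, assuming \(f=\tanh\) as the nonlinearity and toy dimensions; all variable names here are assumptions for illustration.

```python
import numpy as np

def single_layer_score(x, W, b, U):
    """z = Wx + b, a = f(z) with f = tanh, score(x) = U^T a (a scalar)."""
    z = W @ x + b
    a = np.tanh(z)
    return U @ a

# toy dimensions: window input of size 5d, hidden layer of size h
rng = np.random.default_rng(2)
d, h = 4, 8
x = rng.normal(size=5 * d)
W = rng.normal(size=(h, 5 * d))
b = rng.normal(size=h)
U = rng.normal(size=h)
print(single_layer_score(x, W, b, U))
```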
