NLP (2)

Cross-Entropy Error

\(H(p,q)=-\sum^{C}_{c=1}{p(c)\log q(c)}\)

  • p(c) is the true class probability
  • q(c) is the softmax probability
    \(J(\theta)=\frac{1}{N}\sum^{N}_{i=1}{-\log\left(\frac{e^{f_{y_i}}}{\sum^{C}_{c=1}e^{f_c}}\right)}+\lambda\sum_{k}{\theta_k^2}\)
    \(\lambda\sum_{k}{\theta_k^2}\) is the regularization term, which helps avoid overfitting or exploding weights (see the code sketch below)
    Q: When should the word vectors be updated?
    A: Keep the word vectors fixed, since they can overfit on a smaller corpus
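As a rough illustration of this objective, here is a minimal NumPy sketch assuming a plain linear classifier \(f=Wx\); the names `regularized_ce_loss`, `W`, `X`, `y` and the value of the regularization coefficient `lam` are illustrative choices, not from the original notes.

```python
import numpy as np

def softmax(f):
    """Numerically stable softmax over the class scores f."""
    e = np.exp(f - np.max(f))
    return e / e.sum()

def regularized_ce_loss(W, X, y, lam):
    """Mean cross-entropy -log softmax(W x_i)[y_i] over N examples, plus an L2 term."""
    N = X.shape[0]
    loss = 0.0
    for i in range(N):
        q = softmax(W @ X[i])        # q(c): predicted softmax distribution
        loss += -np.log(q[y[i]])     # H(p, q) with a one-hot p(c)
    return loss / N + lam * np.sum(W ** 2)

# toy usage: 3 examples, 4 features, 2 classes
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
X = rng.normal(size=(3, 4))
y = np.array([0, 1, 1])
print(regularized_ce_loss(W, X, y, lam=1e-3))
```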

Word Window Classification

Suppose we want to identify person names, locations, organizations, and other entities (four classes).
... museums in Paris are amazing...
\(x_{window}=[\,x_{museums}\;\;x_{in}\;\;x_{Paris}\;\;x_{are}\;\;x_{amazing}\,]^{T}\)
\(x_{window}\in\mathbb{R}^{5d}\)
With \(x=x_{window}\):
\(\hat{y}_{y}=p(y|x)=\frac{\exp(W_{y\cdot}x)}{\sum^{C}_{c=1}\exp(W_{c\cdot}x)}\)
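A minimal sketch of the window construction and softmax prediction, assuming toy random word vectors (d = 4) in place of real pretrained embeddings; the variable names and shapes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                   # word-vector dimension (toy value)
window = ["museums", "in", "Paris", "are", "amazing"]
vocab = {w: rng.normal(size=d) for w in window}    # stand-in for real embeddings

# concatenate the five window words: x_window lives in R^{5d}
x_window = np.concatenate([vocab[w] for w in window])

C = 4                                   # person, location, organization, other
W = rng.normal(size=(C, 5 * d))         # softmax weight matrix

f = W @ x_window                        # class scores, f_c = W_c. x
y_hat = np.exp(f - f.max()) / np.exp(f - f.max()).sum()   # p(y | x)
print(y_hat)
```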
Define:
y: softmax probability output vector (as above)
t: target probability distribution (one-hot)
\(f=f(x)=Wx\in\mathbb{R}^{C}\), and \(f_{c}\) = the c-th element of the vector f
\(\nabla_{x}J=\frac{\partial}{\partial x}\bigl(-\log p(y|x)\bigr)=\sum^{C}_{c=1}\delta_{c}W^{T}_{c\cdot}=W^{T}\delta\)  (derivative of the cost function, with \(\delta=\hat{y}-t\))
Let \(\delta_{x_{window}}=W^{T}\delta\).
With \(x_{window}=[\,x_{museums}\;\;x_{in}\;\;x_{Paris}\;\;x_{are}\;\;x_{amazing}\,]^{T}\),
\(\delta_{window}=[\,\delta_{x_{museums}}\;\;\delta_{x_{in}}\;\;\delta_{x_{Paris}}\;\;\delta_{x_{are}}\;\;\delta_{x_{amazing}}\,]^{T}\)
Q: How do we update the word vectors that were concatenated into this window?
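Continuing the toy sketch above, one answer is to split \(\delta_{window}=W^{T}\delta\) into one d-dimensional block per window word and take a gradient step on each of those word vectors; the target class and learning rate below are illustrative.

```python
# continuing the sketch above (reuses x_window, W, y_hat, vocab, window, d, C)
t = np.zeros(C)
t[1] = 1.0                              # one-hot target, e.g. "location" (illustrative)
delta = y_hat - t                       # error signal at the softmax output

delta_window = W.T @ delta              # gradient w.r.t. the concatenated window, shape (5d,)

lr = 0.1                                # learning rate (illustrative)
for i, w in enumerate(window):
    # the i-th d-dimensional block of delta_window is the gradient for word w
    vocab[w] -= lr * delta_window[i * d:(i + 1) * d]
```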

A single layer neural network


Essential difference from softmax: the output can be fed into the loss function of another neuron.
It is a combination of a linear layer and a nonlinearity:
\(z=Wx+b\)
\(a=f(z)\)
The neural activation a can then be used to compute some output.
For instance, a probability via softmax:
\(p(y|x)=\mathrm{softmax}(Wa)\)
Or an unnormalized score (even simpler):
\(\mathrm{score}(x)=U^{T}a\in\mathbb{R}\)
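A minimal sketch of this single-layer scorer, assuming \(f=\tanh\) as the nonlinearity and toy dimensions; all variable names here are assumptions for illustration.

```python
import numpy as np

def single_layer_score(x, W, b, U):
    """z = Wx + b, a = f(z) with f = tanh, score(x) = U^T a (a scalar)."""
    z = W @ x + b
    a = np.tanh(z)
    return U @ a

# toy dimensions: window input of size 5d, hidden layer of size h
rng = np.random.default_rng(2)
d, h = 4, 8
x = rng.normal(size=5 * d)
W = rng.normal(size=(h, 5 * d))
b = rng.normal(size=h)
U = rng.normal(size=h)
print(single_layer_score(x, W, b, U))
```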
