Derivatives of the Softmax Function, Cross-Entropy, and KL Divergence

Reference: https://blog.csdn.net/qian99/article/details/78046329

Cross-Entropy

Consider a classification neural network $f$ with output $z=f(x;\theta)$, $z=[z_{0},z_{1},\cdots,z_{C-1}]$, where $z$ is the logits vector, $C$ is the number of classes, and $y$ is the one-hot label of $x$. The probabilities are obtained by softmax normalization:
p_{i}=\frac{\exp{z_{i}}}{\sum_{j}{\exp{z_{j}}}}
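As a sanity check, the softmax itself is a few lines of numpy. A minimal sketch (the max-subtraction trick for numerical stability is an implementation detail, not part of the derivation above):

```python
import numpy as np

def softmax(z):
    # Subtract the max logit before exponentiating; this avoids overflow
    # and does not change the result.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p, p.sum())  # the probabilities sum to 1
```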
The cross-entropy loss (the negative log-likelihood) is:
\mathcal{L}=-\sum_{i}y_{i}\log{p_{i}}
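Since $y$ is one-hot, only the true-class term survives, so the loss reduces to $-\log p_{c}$ for the labeled class $c$. A tiny illustration with made-up values:

```python
import numpy as np

p = np.array([0.1, 0.7, 0.2])   # softmax output (made-up values)
y = np.array([0.0, 1.0, 0.0])   # one-hot label for class 1
print(-np.sum(y * np.log(p)))   # equals -log(0.7): only the true-class term survives
```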
The gradient of the loss with respect to the probabilities is:
\frac{\partial \mathcal{L}}{\partial p_{i}}=-y_{i}\frac{1}{p_{i}}
Next, compute \frac{\partial p_{i}}{\partial z_{k}} for k=0,1,\ldots,C-1:
(1) When k=i,
\frac{\partial p_{i}}{\partial z_{i}}=\frac{\partial}{\partial z_{i}}\left(\frac{\exp{z_{i}}}{\sum_{j}{\exp{z_{j}}}}\right)=\frac{\exp{z_{i}}\sum_{j}\exp{z_{j}}-(\exp{z_{i}})^{2}}{(\sum_{j}{\exp{z_{j}}})^{2}} \\
=\left(\frac{\exp{z_{i}}}{\sum_{j}{\exp{z_{j}}}}\right)\left(1-\frac{\exp{z_{i}}}{\sum_{j}{\exp{z_{j}}}}\right)=p_{i}(1-p_{i})

(2) When k\neq i,
\frac{\partial p_{i}}{\partial z_{k}}=\frac{\partial}{\partial z_{k}}\left(\frac{\exp{z_{i}}}{\sum_{j}{\exp{z_{j}}}}\right)=\frac{-\exp{z_{i}}\exp{z_{k}}}{(\sum_{j}{\exp{z_{j}}})^{2}}=-p_{i}p_{k}
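The two cases combine into the Jacobian \frac{\partial p}{\partial z}=\mathrm{diag}(p)-pp^{T}. A quick finite-difference check of this Jacobian (a sketch; the class count, seed, and tolerance are arbitrary choices):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
C = 5
z = rng.normal(size=C)
p = softmax(z)

# Analytic Jacobian: p_i(1 - p_i) on the diagonal, -p_i p_k off the diagonal.
jac = np.diag(p) - np.outer(p, p)

# Central finite-difference approximation of dp_i/dz_k.
eps = 1e-6
num = np.zeros((C, C))
for k in range(C):
    dz = eps * np.eye(C)[k]
    num[:, k] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.allclose(jac, num, atol=1e-8))  # True
```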
By the chain rule:
\frac{\partial \mathcal{L}}{\partial z_{k}}=\sum_{j}\frac{\partial \mathcal{L}}{\partial p_{j}}\frac{\partial p_{j}}{\partial z_{k}} \\
=\sum_{j\neq k}\frac{\partial \mathcal{L}}{\partial p_{j}}\frac{\partial p_{j}}{\partial z_{k}}+\frac{\partial \mathcal{L}}{\partial p_{k}}\frac{\partial p_{k}}{\partial z_{k}} \\
=\sum_{j\neq k}\left(-y_{j}\frac{1}{p_{j}}\right)\left(-p_{j}p_{k}\right)+\left(-y_{k}\frac{1}{p_{k}}\right)p_{k}(1-p_{k}) \\
=\sum_{j\neq k}y_{j}p_{k}-y_{k}+y_{k}p_{k} \\
=p_{k}\sum_{j}y_{j}-y_{k}
Since y is a one-hot encoding, \sum_{j}y_{j}=1, i.e.,
\frac{\partial \mathcal{L}}{\partial z_{k}}=p_{k}-y_{k}
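This simple form, p_{k}-y_{k}, is one reason deep learning frameworks typically fuse softmax and cross-entropy into a single op. A numerical check of the result (a sketch; the class count and seed are arbitrary):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def ce_loss(z, y):
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.default_rng(0)
C = 4
z = rng.normal(size=C)
y = np.eye(C)[1]                 # one-hot label for class 1

analytic = softmax(z) - y        # the derived gradient p - y
eps = 1e-6
numeric = np.array([
    (ce_loss(z + eps * np.eye(C)[k], y) - ce_loss(z - eps * np.eye(C)[k], y)) / (2 * eps)
    for k in range(C)
])
print(np.allclose(analytic, numeric, atol=1e-8))  # True
```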

Relative Entropy (KL Divergence)

Let p be the predicted probability distribution and q the true distribution. The KL divergence is:
\mathcal{L}=KL(q\|p)=\sum_{k}q_{k}\log{\frac{q_{k}}{p_{k}}}
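For concreteness, a direct evaluation of this formula on two made-up distributions:

```python
import numpy as np

q = np.array([0.7, 0.2, 0.1])     # "true" distribution
p = np.array([0.5, 0.3, 0.2])     # predicted distribution
print(np.sum(q * np.log(q / p)))  # KL(q||p) >= 0, and 0 iff p == q
```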
The gradient with respect to the probability p_{k} is:
\frac{\partial \mathcal{L}}{\partial p_{k}}=-\frac{q_{k}}{p_{k}}
The gradient with respect to the logit z_{k} is:
\frac{\partial \mathcal{L}}{\partial z_{k}}=\sum_{j}\frac{\partial \mathcal{L}}{\partial p_{j}}\frac{\partial p_{j}}{\partial z_{k}} \\
=\sum_{j\neq k}\frac{\partial \mathcal{L}}{\partial p_{j}}\frac{\partial p_{j}}{\partial z_{k}}+\frac{\partial \mathcal{L}}{\partial p_{k}}\frac{\partial p_{k}}{\partial z_{k}} \\
=\sum_{j\neq k}\left(-\frac{q_{j}}{p_{j}}\right)\left(-p_{j}p_{k}\right)+\left(-\frac{q_{k}}{p_{k}}\right)p_{k}(1-p_{k}) \\
=\sum_{j\neq k}q_{j}p_{k}+q_{k}p_{k}-q_{k} \\
=p_{k}\sum_{j}q_{j}-q_{k}
Since q is a probability distribution, \sum_{j}q_{j}=1, so this too reduces to
\frac{\partial \mathcal{L}}{\partial z_{k}}=p_{k}-q_{k}
which has the same form as the cross-entropy gradient, with the one-hot y replaced by the soft target q.
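The same finite-difference check as in the cross-entropy case confirms this; here q is an arbitrary soft target rather than a one-hot vector (a sketch with arbitrary seed and tolerance):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def kl_loss(z, q):
    p = softmax(z)
    return np.sum(q * np.log(q / p))

rng = np.random.default_rng(1)
C = 4
z = rng.normal(size=C)
q = rng.random(C)
q /= q.sum()                     # arbitrary soft target distribution

analytic = softmax(z) - q        # the derived gradient p - q
eps = 1e-6
numeric = np.array([
    (kl_loss(z + eps * np.eye(C)[k], q) - kl_loss(z - eps * np.eye(C)[k], q)) / (2 * eps)
    for k in range(C)
])
print(np.allclose(analytic, numeric, atol=1e-8))  # True
```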
