Backpropagation Through Softmax

The softmax formula:

Given a vector \textbf{x} of length n, with x_{i} denoting the i-th element of \textbf{x}, the softmax value of that element is:

y_{i}=\frac{e^{x_{i}}}{\sum_{k=1}^{n}e^{x_{k}}}
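This formula can be sketched directly in NumPy (NumPy is not used in the original post; the max-subtraction is a standard numerical-stability trick that does not change the result, since softmax is invariant to adding a constant to every x_{k}):

```python
import numpy as np

def softmax(x):
    # Shift by the max for numerical stability; softmax(x) == softmax(x - c)
    # for any constant c, so this leaves the result unchanged.
    e = np.exp(x - np.max(x))
    return e / e.sum()

y = softmax(np.array([1.0, 2.0, 3.0]))
print(y)                          # ≈ [0.090, 0.245, 0.665]
assert np.isclose(y.sum(), 1.0)   # outputs form a probability distribution
```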

The softmax backward pass:

For j \ne i, the contribution to \frac{\partial l}{\partial x_{i}} is \sum_{j=1,\,j\ne i}^{n}\frac{\partial l}{\partial y_{j}}\frac{\partial y_{j}}{\partial x_{i}}=\sum_{j=1,\,j\ne i}^{n}\frac{\partial l}{\partial y_{j}}\frac{-e^{x_{i}}\cdot e^{x_{j}}}{\left(\sum_{k=1}^{n}e^{x_{k}}\right)^{2}}

For j = i, the contribution is \frac{\partial l}{\partial y_{i}}\frac{\partial y_{i}}{\partial x_{i}}=\frac{\partial l}{\partial y_{i}}\frac{e^{x_{i}}\sum_{k=1}^{n}e^{x_{k}}-e^{x_{i}}\cdot e^{x_{i}}}{\left(\sum_{k=1}^{n}e^{x_{k}}\right)^{2}}

 

Adding the two contributions gives

\frac{\partial l}{\partial x_{i}} =\sum_{j=1,\,j\ne i}^{n}\frac{\partial l}{\partial y_{j}}\frac{\partial y_{j}}{\partial x_{i}}+\frac{\partial l}{\partial y_{i}}\frac{\partial y_{i}}{\partial x_{i}}=\sum_{j=1,\,j\ne i}^{n}\frac{\partial l}{\partial y_{j}}\frac{-e^{x_{i}}\cdot e^{x_{j}}}{\left(\sum_{k=1}^{n}e^{x_{k}}\right)^{2}}+\frac{\partial l}{\partial y_{i}}\frac{e^{x_{i}}\sum_{k=1}^{n}e^{x_{k}}-e^{x_{i}}\cdot e^{x_{i}}}{\left(\sum_{k=1}^{n}e^{x_{k}}\right)^{2}}=\sum_{j=1}^{n}\frac{\partial l}{\partial y_{j}}\frac{-e^{x_{i}}\cdot e^{x_{j}}}{\left(\sum_{k=1}^{n}e^{x_{k}}\right)^{2}}+\frac{\partial l}{\partial y_{i}}\frac{e^{x_{i}}\sum_{k=1}^{n}e^{x_{k}}}{\left(\sum_{k=1}^{n}e^{x_{k}}\right)^{2}}

=-\sum_{j=1}^{n}\frac{\partial l}{\partial y_{j}}\cdot y_{j}\cdot y_{i}+\frac{\partial l}{\partial y_{i}}\cdot y_{i}

=-\left(\sum_{j=1}^{n}\frac{\partial l}{\partial y_{j}}\cdot y_{j}\right)\cdot y_{i}+\frac{\partial l}{\partial y_{i}}\cdot y_{i}

Note that the quantity in parentheses does not depend on x_{i}: it is the sum of the element-wise products of \frac{\partial l}{\partial y_{j}} and y_{j}. Denote it \sigma. Then, with all multiplications taken element-wise,

{\frac{\partial l}{\partial \mathbf{x}}} = -\sigma \cdot \mathbf{y}+{\frac{\partial l}{\partial \mathbf{y}}} \cdot \mathbf{y}= \mathbf{y}\cdot({\frac{\partial l}{\partial \mathbf{y}}}-\sigma )
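This closed form can be checked against the explicit Jacobian J = \mathrm{diag}(\mathbf{y}) - \mathbf{y}\mathbf{y}^{T}. The following is a minimal NumPy sketch (not from the original post; function names are my own):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_backward(y, dy):
    # sigma = sum_j (dl/dy_j * y_j); then dx = y * (dy - sigma),
    # with all products element-wise, exactly as in the derivation above.
    sigma = np.dot(dy, y)
    return y * (dy - sigma)

# Sanity check against the full Jacobian J = diag(y) - y y^T
x  = np.array([0.5, -1.2, 3.0, 0.1])
dy = np.array([0.2, 1.0, -0.3, 0.7])
y  = softmax(x)
J  = np.diag(y) - np.outer(y, y)
assert np.allclose(softmax_backward(y, dy), J @ dy)
```

The element-wise version does O(n) work per output, while the Jacobian-vector product does O(n²).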

 

Why does this matter?

It shows that the softmax backward pass can be implemented without splitting the computation into the i = j and i \ne j cases.

Taking Caffe as an example: bottom_diff = top_data * (top_diff - sum(top_diff * top_data)), where * denotes element-wise multiplication.

So computing the backward pass needs no GEMM matrix multiplication; element-wise operations suffice.
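The Caffe-style one-liner can also be validated end to end with a finite-difference check. A sketch, assuming an arbitrary linear loss l(\mathbf{x}) = \mathbf{w}\cdot\mathrm{softmax}(\mathbf{x}) chosen purely for testing (the weights w and variable names are mine, not Caffe's):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=5)
w = rng.normal(size=5)   # fixed weights: test loss l(x) = w . softmax(x)

top_data = softmax(x)
top_diff = w             # dl/dy for this loss
# Caffe-style element-wise backward: no case split, no GEMM
bottom_diff = top_data * (top_diff - np.dot(top_diff, top_data))

# Central finite differences for dl/dx
eps = 1e-6
num = np.array([
    (np.dot(w, softmax(x + eps * np.eye(5)[i]))
     - np.dot(w, softmax(x - eps * np.eye(5)[i]))) / (2 * eps)
    for i in range(5)
])
assert np.allclose(bottom_diff, num, atol=1e-6)
```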

 
