Softmax Derivation

Let's discuss the simplest case:

     Take a neural network as the example:

     Assume the input to the softmax layer is a one-dimensional array of N values, and the output is the probability of each class in a C-class classification problem.

--1. input: x --> input data of dimension N, which can be written as $$(x_0, x_1, x_2, ..., x_{N-1})$$; in a neural network, this is the output of the last hidden layer.

               W, b --> the affine weights, with shapes (N, C) and (C, )

                 y --> the target label of the data; y takes a value in {0, ..., C-1}. I will transform y into a one-hot vector of the form $$(y_0, y_1, ..., y_k, ..., y_{C-1}),\ \text{where}\ y_k=1\ \text{and}\ y_i=0\ \text{for}\ i \ne k$$.

--2: Derivation

      Define the softmax output as $$S_i = \frac{e^{z_i}}{\sum_{j}{e^{z_j}}},\ i \in \{0, ..., C-1\},\ \text{where}\ z_i = W_{:, i}^T x + b_i$$.
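A minimal NumPy sketch of this softmax (the function and variable names are my own; subtracting the maximum is a standard numerical-stability trick not mentioned above):

```python
import numpy as np

def softmax(z):
    """S_i = exp(z_i) / sum_j exp(z_j) for a 1-D score vector z."""
    e = np.exp(z - np.max(z))  # shift by max(z) for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
S = softmax(z)  # a valid probability distribution over C = 3 classes
```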

     Define the loss function:

                                                         $$Loss = \sum_{i=0}^{C-1}{y_i \log(S_i)}$$

     (Note: this is the log-likelihood; the usual cross-entropy loss is its negative, which flips the sign of every gradient below.)
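For a one-hot target the sum collapses to the single term $$\log S_k$$. A minimal sketch under this sign convention (the values are illustrative):

```python
import numpy as np

z = np.array([2.0, 1.0, 0.1])             # scores z from the affine layer
e = np.exp(z - z.max())
S = e / e.sum()                           # softmax output

k = 0                                     # target label
y = np.zeros_like(S); y[k] = 1.0          # one-hot target

loss = np.sum(y * np.log(S))              # equals log(S[k]) since y is one-hot
```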

     Derive the backpropagated gradients dx, dW:

     Here I first compute the intermediate gradient dz, then use dz to derive dx and dW.

                                                    $$\frac{\partial Loss}{\partial z_i}=\sum_{j}{\frac{\partial Loss}{\partial S_j} \frac{\partial S_j}{\partial z_i}}$$

                                                     $$\frac{\partial Loss}{\partial S_j} = \frac{y_j}{S_j}$$

                                                      $$\frac{\partial S_j}{\partial z_i} = -S_i S_j,\ \text{if}\ i \ne j.$$

                                                      $$\frac{\partial S_j}{\partial z_i} = (1-S_i)S_i,\ \text{if}\ i = j.$$
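Both cases can be packed into one Jacobian, $$J_{ji} = \partial S_j / \partial z_i = \delta_{ij} S_i - S_i S_j$$, and checked numerically with central differences (a sketch; the step size and example values are my own choices):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([0.5, -1.2, 2.0])
S = softmax(z)
C = z.size

# Analytic Jacobian: J[j, i] = dS_j/dz_i = S_i*(1 - S_i) on the diagonal,
# -S_i*S_j off the diagonal.
J = np.diag(S) - np.outer(S, S)

# Numerical Jacobian by central differences.
h = 1e-6
J_num = np.zeros((C, C))
for i in range(C):
    zp, zm = z.copy(), z.copy()
    zp[i] += h
    zm[i] -= h
    J_num[:, i] = (softmax(zp) - softmax(zm)) / (2 * h)
```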

So:

                                                      $$\frac{\partial Loss}{\partial z_i} = \sum_{j=0}^{C-1} \frac{y_j}{S_j} \, \delta_{j} $$

where:

                                                     $$\delta_{j} = \frac{\partial S_j}{\partial z_i} = (1-S_i)S_i\ \text{if}\ j=i,\ \text{else}\ -S_i S_j$$

                                                    $$\frac{\partial Loss}{\partial z_i} = y_i (1-S_i) + \sum_{j \ne i} y_j( -S_i)$$

Rewriting, and using the one-hot property $$\sum_j y_j = 1$$ (only one term of the target vector is nonzero):

                                                   $$\frac{\partial Loss}{\partial z_i} = y_i - S_i \left( y_i + \sum_{j \ne i} y_j \right) = y_i - S_i$$

                                                   In particular, at the target label k the gradient is $$1 - S_k$$, and at every other index i it is $$-S_i$$.
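Elementwise this is $$\partial Loss/\partial z_i = y_i - S_i$$ (using $$\sum_j y_j = 1$$), i.e. dz = y - S as vectors. A quick numerical check under the sign convention above (step size and example values are my own):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([0.3, -0.7, 1.5, 0.0])
S = softmax(z)
k = 2                                     # target label
y = np.zeros_like(z); y[k] = 1.0

dz = y - S                                # analytic gradient of sum_i y_i*log(S_i)

# Central-difference check on Loss(z) = log(softmax(z)[k]).
h = 1e-6
dz_num = np.zeros_like(z)
for i in range(z.size):
    zp, zm = z.copy(), z.copy()
    zp[i] += h
    zm[i] -= h
    dz_num[i] = (np.log(softmax(zp)[k]) - np.log(softmax(zm)[k])) / (2 * h)
```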

The same procedure gives the gradient for every component of z; then dx and dW follow from dz through the affine layer $$z = W^T x + b$$: $$dx = W\, dz,\ dW = x\, dz^T,\ db = dz$$.
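A sketch of this last affine step with the shapes from above (W is (N, C)); the central-difference computation at the end checks one entry of dW, and the values are randomly generated for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
N, C = 5, 3
x = rng.normal(size=N)                    # last hidden-layer output
W = rng.normal(size=(N, C))               # affine weights, shape (N, C)
b = rng.normal(size=C)                    # bias, shape (C,)
k = 1                                     # target label
y = np.zeros(C); y[k] = 1.0

z = W.T @ x + b                           # scores, shape (C,)
dz = y - softmax(z)                       # dLoss/dz, derived above

dW = np.outer(x, dz)                      # dLoss/dW, shape (N, C)
dx = W @ dz                               # dLoss/dx, shape (N,)
db = dz                                   # dLoss/db, shape (C,)

# Central-difference check of dW[0, 0] on Loss = log(softmax(z)[k]).
h = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += h
Wm[0, 0] -= h
num = (np.log(softmax(Wp.T @ x + b)[k])
       - np.log(softmax(Wm.T @ x + b)[k])) / (2 * h)
```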

                Comments welcome!
