Single Sample
Symbols
$$X = \begin{pmatrix} x_1 \\ \vdots \\ x_{n_x} \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ \vdots \\ y_{n_y} \end{pmatrix},$$
$$Z^{[l]} = \begin{pmatrix} z^{[l]}_1 \\ \vdots \\ z^{[l]}_{n_l} \end{pmatrix}, \quad 1 \le l \le L$$
$$A^{[l]} = \begin{pmatrix} a^{[l]}_1 \\ \vdots \\ a^{[l]}_{n_l} \end{pmatrix}, \quad \tilde{A}^{[l]} = \begin{pmatrix} a^{[l]}_0 \\ a^{[l]}_1 \\ \vdots \\ a^{[l]}_{n_l} \end{pmatrix} = \begin{pmatrix} 1 \\ A^{[l]} \end{pmatrix}, \quad 0 \le l \le L$$
$$W^{[l]} = \left( w^{[l]}_{ij} \right)_{n_l \times n_{l-1}}, \quad {w'}^{[l]} = \begin{pmatrix} w^{[l]}_{1,0} \\ \vdots \\ w^{[l]}_{n_l,0} \end{pmatrix}, \quad \tilde{W}^{[l]} = \begin{pmatrix} {w'}^{[l]} & W^{[l]} \end{pmatrix}, \quad 1 \le l \le L$$
Neural Network Architecture
$$X = A^{[0]} \to Z^{[1]} \to A^{[1]} \to \cdots \to Z^{[L]} \to A^{[L]} = \hat{Y}$$
Loss Function
$$z^{[l]}_i = \sum_{j=0}^{n_{l-1}} w^{[l]}_{ij} \, \tilde{a}^{[l-1]}_j, \quad 1 \le i \le n_l, \quad 1 \le l \le L$$
that is,
$$Z^{[l]} = \tilde{W}^{[l]} \tilde{A}^{[l-1]}, \quad 1 \le l \le L$$
$$a^{[l]}_i = g\left(z^{[l]}_i\right), \quad 1 \le i \le n_l, \quad 1 \le l \le L$$
that is,
$$A^{[l]} = g\left(Z^{[l]}\right), \quad 1 \le l \le L$$
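The forward pass above can be sketched in NumPy for a single sample. This is a minimal illustration, not a reference implementation: it assumes $g$ is the sigmoid function, stores each bias-augmented matrix $\tilde{W}^{[l]}$ with the bias in column 0, and uses hypothetical names (`sigmoid`, `forward`, `layer_sizes`) and random weights chosen only for the example.

```python
import numpy as np

def sigmoid(z):
    """Assumed activation g; the derivation later relies on its derivative."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    """Single-sample forward pass.

    x       : vector A[0] of shape (n_0,)
    weights : list of bias-augmented matrices W~[l], shape (n_l, n_{l-1} + 1)
    Returns the lists [Z[1], ..., Z[L]] and [A[0], ..., A[L]].
    """
    a = np.asarray(x, dtype=float)
    zs, activations = [], [a]
    for w_tilde in weights:
        a_tilde = np.concatenate(([1.0], a))  # prepend a_0 = 1 to get A~[l-1]
        z = w_tilde @ a_tilde                 # Z[l] = W~[l] A~[l-1]
        a = sigmoid(z)                        # A[l] = g(Z[l])
        zs.append(z)
        activations.append(a)
    return zs, activations

# Hypothetical 3-4-2 network with random weights, for illustration only.
rng = np.random.default_rng(0)
layer_sizes = [3, 4, 2]
weights = [rng.normal(size=(layer_sizes[l], layer_sizes[l - 1] + 1))
           for l in range(1, len(layer_sizes))]
zs, activations = forward(np.array([0.5, -1.0, 2.0]), weights)
y_hat = activations[-1]  # network output A[L]
```

Keeping both `zs` and `activations` is a design choice: the backward pass needs the activations of every layer, so caching them avoids recomputation.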
$$\operatorname{loss}(X, Y) = -\sum_{i=1}^{n_y} \left[ y_i \ln \hat{y}_i + (1 - y_i) \ln\left(1 - \hat{y}_i\right) \right]$$
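As a small sketch, the cross-entropy loss can be written directly from its definition. The function name `loss` and the sample values are assumptions for illustration; no clipping is applied, so the predictions must lie strictly between 0 and 1.

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss summed over the n_y output units of one sample."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return -np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

# An uninformative prediction of 0.5 costs ln 2 per output unit.
uninformative = loss([0.5, 0.5], [1.0, 0.0])
# A nearly correct prediction costs much less.
confident = loss([0.999, 0.001], [1.0, 0.0])
```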
Formulas
$$\begin{aligned}
\frac{\partial}{\partial z^{[L]}_i} \operatorname{loss}(X, Y)
&= \frac{d \hat{y}_i}{d z^{[L]}_i} \cdot \frac{\partial}{\partial \hat{y}_i} \operatorname{loss}(X, Y) \\
&= -g'\left(z^{[L]}_i\right) \left[ y_i \cdot \frac{1}{\hat{y}_i} - (1 - y_i) \cdot \frac{1}{1 - \hat{y}_i} \right] \\
&= -\hat{y}_i \left(1 - \hat{y}_i\right) \left[ y_i \cdot \frac{1}{\hat{y}_i} - (1 - y_i) \cdot \frac{1}{1 - \hat{y}_i} \right] \\
&= (1 - y_i)\, \hat{y}_i - y_i \left(1 - \hat{y}_i\right) \\
&= \hat{y}_i - y_i, \quad 1 \le i \le n_L
\end{aligned}$$
The third line assumes that $g$ is the sigmoid function, for which $g'(z) = g(z)\left(1 - g(z)\right)$, so $g'\left(z^{[L]}_i\right) = \hat{y}_i \left(1 - \hat{y}_i\right)$.
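The result $\hat{y}_i - y_i$ can be sanity-checked numerically. The sketch below assumes a sigmoid output layer and compares the analytic gradient with a central finite difference; `loss_from_z` and `numerical_grad` are hypothetical helper names, and the values of `z` and `y` are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_from_z(z, y):
    """Cross-entropy loss as a function of the output pre-activations Z[L]."""
    y_hat = sigmoid(z)
    return -np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def numerical_grad(f, z, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at z."""
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        grad[i] = (f(z + step) - f(z - step)) / (2.0 * eps)
    return grad

z = np.array([0.3, -1.2, 2.0])   # arbitrary pre-activations
y = np.array([1.0, 0.0, 1.0])    # arbitrary binary targets

analytic = sigmoid(z) - y        # y_hat - y from the derivation
numeric = numerical_grad(lambda v: loss_from_z(v, y), z)
```

If the derivation holds, `analytic` and `numeric` agree up to finite-difference error.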
$$\begin{aligned}
\frac{\partial}{\partial z^{[l]}_j} \operatorname{loss}(X, Y)
&= \sum_{i=1}^{n_{l+1}} \frac{\partial z^{[l+1]}_i}{\partial z^{[l]}_j} \cdot \frac{\partial}{\partial z^{[l+1]}_i} \operatorname{loss}(X, Y) \\
&= \sum_{i=1}^{n_{l+1}} g'\left(z^{[l]}_j\right) w^{[l+1]}_{ij} \cdot \frac{\partial}{\partial z^{[l+1]}_i} \operatorname{loss}(X, Y) \\
&= g'\left(z^{[l]}_j\right) \sum_{i=1}^{n_{l+1}} w^{[l+1]}_{ij} \cdot \frac{\partial}{\partial z^{[l+1]}_i} \operatorname{loss}(X, Y), \quad 1 \le j \le n_l, \quad 1 \le l < L
\end{aligned}$$
Therefore,
$$\frac{\partial}{\partial Z^{[l]}} \operatorname{loss}(X, Y) =
\begin{cases}
A^{[L]} - Y, & l = L \\
g'\left(Z^{[l]}\right) \mathbin{.*} \left( \left(W^{[l+1]}\right)^{\top} \dfrac{\partial}{\partial Z^{[l+1]}} \operatorname{loss}(X, Y) \right), & 1 \le l < L
\end{cases}$$
where $.*$ denotes the element-wise product.
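The backward recursion can be sketched for a single sample as follows. This is an illustrative sketch under stated assumptions: $g$ is the sigmoid, each stored matrix is the bias-augmented $\tilde{W}^{[l]}$ with the bias in column 0 (so $W^{[l+1]}$ is obtained by dropping that column), and the names `forward`, `backward_deltas`, and `layer_sizes` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    """Return [A[0], ..., A[L]] for one sample, given bias-augmented weights."""
    activations = [np.asarray(x, dtype=float)]
    for w_tilde in weights:
        a_tilde = np.concatenate(([1.0], activations[-1]))
        activations.append(sigmoid(w_tilde @ a_tilde))
    return activations

def backward_deltas(activations, weights, y):
    """Return [dloss/dZ[1], ..., dloss/dZ[L]] using the recursion above.

    List index k holds layer l = k + 1, so activations[k + 1] is A[l]
    and weights[k + 1] is W~[l + 1].
    """
    num_layers = len(weights)
    deltas = [None] * num_layers
    deltas[-1] = activations[-1] - y              # l = L: A[L] - Y
    for k in range(num_layers - 2, -1, -1):
        a = activations[k + 1]
        g_prime = a * (1.0 - a)                   # sigmoid: g'(Z[l]) = A[l] .* (1 - A[l])
        w_next = weights[k + 1][:, 1:]            # W[l+1]: drop the bias column
        deltas[k] = g_prime * (w_next.T @ deltas[k + 1])
    return deltas

# Hypothetical 3-4-2 network with random weights, for illustration only.
rng = np.random.default_rng(1)
layer_sizes = [3, 4, 2]
weights = [rng.normal(size=(layer_sizes[l], layer_sizes[l - 1] + 1))
           for l in range(1, len(layer_sizes))]
x = np.array([0.5, -1.0, 2.0])
y = np.array([1.0, 0.0])
activations = forward(x, weights)
deltas = backward_deltas(activations, weights, y)
```

The bias column is excluded from the recursion because the constant entry $a_0 = 1$ does not depend on the previous layer's pre-activations.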
$$\frac{\partial}{\partial w^{[l]}_{ij}} \operatorname{loss}(X, Y) = \frac{\partial}{\partial z^{[l]}_i} \operatorname{loss}(X, Y) \cdot \tilde{a}^{[l-1]}_j, \quad 1 \le i \le n_l, \quad 0 \le j \le n_{l-1}, \quad 1 \le l \le L$$
Therefore,
$$\frac{\partial}{\partial \tilde{W}^{[l]}} \operatorname{loss}(X, Y) = \frac{\partial}{\partial Z^{[l]}} \operatorname{loss}(X, Y) \cdot \left(\tilde{A}^{[l-1]}\right)^{\top}, \quad 1 \le l \le L$$
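For a single sample, the weight gradient is an outer product of the layer's error vector and the bias-augmented input to that layer. The sketch below illustrates this for the output layer only ($l = L$), assuming a sigmoid activation; `weight_gradient`, `a_prev`, and the random values are hypothetical, and a finite-difference loop is included as a check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w_tilde, a_prev, y):
    """Cross-entropy loss of one sigmoid layer with bias-augmented weights."""
    a_tilde = np.concatenate(([1.0], a_prev))
    y_hat = sigmoid(w_tilde @ a_tilde)
    return -np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def weight_gradient(w_tilde, a_prev, y):
    """dloss/dW~[L] = dloss/dZ[L] . (A~[L-1])^T, an outer product."""
    a_tilde = np.concatenate(([1.0], a_prev))
    delta = sigmoid(w_tilde @ a_tilde) - y     # dloss/dZ[L] = A[L] - Y
    return np.outer(delta, a_tilde)

rng = np.random.default_rng(2)
a_prev = rng.normal(size=3)                    # stands in for A[L-1]
y = np.array([1.0, 0.0])
w_tilde = rng.normal(size=(2, 4))              # n_L x (n_{L-1} + 1)
grad = weight_gradient(w_tilde, a_prev, y)

# Central finite differences over every entry of W~, as a check.
eps = 1e-6
numeric = np.zeros_like(w_tilde)
for idx in np.ndindex(*w_tilde.shape):
    step = np.zeros_like(w_tilde)
    step[idx] = eps
    numeric[idx] = (loss(w_tilde + step, a_prev, y)
                    - loss(w_tilde - step, a_prev, y)) / (2.0 * eps)
```

Column 0 of `grad` is the bias gradient; because $\tilde{a}_0 = 1$, it equals the error vector itself.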
Multiple Samples
Symbols
$$X = \left( X^{(1)}, \cdots, X^{(m)} \right),$$
$$Y = \left( Y^{(1)}, \cdots, Y^{(m)} \right),$$
$$Z^{[l]} = \left( Z^{[l](1)}, \cdots, Z^{[l](m)} \right), \quad 1 \le l \le L$$
$$A^{[l]} = \left( A^{[l](1)}, \cdots, A^{[l](m)} \right), \quad 0 \le l \le L$$
$$\tilde{A}^{[l]} = \left( \tilde{A}^{[l](1)}, \cdots, \tilde{A}^{[l](m)} \right), \quad 0 \le l \le L$$
$$\partial Z^{[l]} = \left( \frac{\partial}{\partial Z^{[l]}} \operatorname{loss}\left(X^{(1)}, Y^{(1)}\right), \cdots, \frac{\partial}{\partial Z^{[l]}} \operatorname{loss}\left(X^{(m)}, Y^{(m)}\right) \right)_{n_l \times m}, \quad 1 \le l \le L$$
Cost Function
$$\operatorname{cost}(X, Y) = \frac{1}{m} \sum_{i=1}^{m} \operatorname{loss}\left(X^{(i)}, Y^{(i)}\right)$$
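With one sample per column, the cost is the mean of the per-sample losses. The following sketch assumes predictions and targets are stored as arrays of shape `(n_y, m)`; the name `cost` and the example values are illustrative only.

```python
import numpy as np

def cost(Y_hat, Y):
    """Mean over the m columns (samples) of the per-sample cross-entropy loss."""
    Y_hat = np.asarray(Y_hat, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m = Y.shape[1]
    # Sum over output units (axis 0) gives one loss per sample.
    per_sample = -np.sum(Y * np.log(Y_hat) + (1.0 - Y) * np.log(1.0 - Y_hat), axis=0)
    return per_sample.sum() / m

# Three samples with two outputs each, all predicted as 0.5.
Y = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Y_hat = np.full_like(Y, 0.5)
value = cost(Y_hat, Y)  # each sample contributes 2 ln 2, so the mean is 2 ln 2
```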
Formulas
$$Z^{[l]} = \tilde{W}^{[l]} \tilde{A}^{[l-1]}, \quad 1 \le l \le L$$
$$A^{[l]} = g\left(Z^{[l]}\right), \quad 1 \le l \le L$$
$$g'\left(Z^{[l]}\right) = A^{[l]} \mathbin{.*} \left( \mathbf{1}_{n_l \times m} - A^{[l]} \right), \quad 1 \le l \le L$$
$$\partial Z^{[l]} =
\begin{cases}
A^{[L]} - Y, & l = L \\
g'\left(Z^{[l]}\right) \mathbin{.*} \left( \left(W^{[l+1]}\right)^{\top} \cdot \partial Z^{[l+1]} \right), & 1 \le l < L
\end{cases}$$
$$\frac{\partial}{\partial \tilde{W}^{[l]}} \operatorname{cost}(X, Y) = \frac{1}{m}\, \partial Z^{[l]} \cdot \left(\tilde{A}^{[l-1]}\right)^{\top}, \quad 1 \le l \le L$$
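Putting the vectorized formulas together, a batch version of backpropagation might look like the sketch below. It assumes sigmoid activations in every layer, samples stored as columns, and bias-augmented weight matrices with the bias in column 0. All function names (`add_bias_row`, `forward`, `cost`, `gradients`, `numerical_gradients`) and the random data are hypothetical, and the finite-difference routine is there only to check the analytic gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias_row(A):
    """Build A~ from A by prepending a row of ones (one per sample)."""
    return np.vstack([np.ones((1, A.shape[1])), A])

def forward(X, weights):
    """Return [A[0], ..., A[L]] with one sample per column."""
    activations = [X]
    for W_tilde in weights:
        Z = W_tilde @ add_bias_row(activations[-1])   # Z[l] = W~[l] A~[l-1]
        activations.append(sigmoid(Z))                # A[l] = g(Z[l])
    return activations

def cost(X, Y, weights):
    Y_hat = forward(X, weights)[-1]
    m = X.shape[1]
    return -np.sum(Y * np.log(Y_hat) + (1.0 - Y) * np.log(1.0 - Y_hat)) / m

def gradients(X, Y, weights):
    """Return [dcost/dW~[1], ..., dcost/dW~[L]]."""
    m = X.shape[1]
    activations = forward(X, weights)
    dZ = activations[-1] - Y                          # dZ[L] = A[L] - Y
    grads = [None] * len(weights)
    for k in range(len(weights) - 1, -1, -1):         # k = l - 1
        grads[k] = dZ @ add_bias_row(activations[k]).T / m
        if k > 0:
            A = activations[k]
            # dZ[l-1] = g'(Z[l-1]) .* (W[l]^T dZ[l]), bias column dropped
            dZ = (A * (1.0 - A)) * (weights[k][:, 1:].T @ dZ)
    return grads

def numerical_gradients(X, Y, weights, eps=1e-6):
    """Central finite differences over every weight, for checking only."""
    grads = []
    for W in weights:
        G = np.zeros_like(W)
        for idx in np.ndindex(*W.shape):
            old = W[idx]
            W[idx] = old + eps
            plus = cost(X, Y, weights)
            W[idx] = old - eps
            minus = cost(X, Y, weights)
            W[idx] = old                              # restore the weight
            G[idx] = (plus - minus) / (2.0 * eps)
        grads.append(G)
    return grads

# Hypothetical 3-4-2 network and 5 random samples, for illustration only.
rng = np.random.default_rng(3)
layer_sizes = [3, 4, 2]
weights = [rng.normal(size=(layer_sizes[l], layer_sizes[l - 1] + 1))
           for l in range(1, len(layer_sizes))]
X = rng.normal(size=(3, 5))
Y = rng.integers(0, 2, size=(2, 5)).astype(float)

analytic = gradients(X, Y, weights)
numeric = numerical_gradients(X, Y, weights)
```

Each gradient is computed before `dZ` is propagated one layer back, so only one error matrix needs to be held at a time.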