DeepLearning-L8-ResNet


In 2015, Kaiming He et al. proposed ResNet in "Deep Residual Learning for Image Recognition", raising network depth to 152 layers and winning first place in ILSVRC 2015.

1. The Problem with Deep Networks

Deep networks can represent very complex functions, but during backpropagation the gradient gradually vanishes: with the sigmoid activation the derivative is at most 0.25, so each layer a unit-magnitude signal passes backward through scales its gradient down to at most a quarter of its previous size, and the attenuation compounds with depth (a quick numeric check follows). As a result, the weights of the earliest layers can no longer be adjusted effectively.
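As a back-of-envelope check on that 0.25 figure (my own illustration, not from the original post): the sigmoid derivative is σ(x)(1 − σ(x)), which peaks at 0.25, so a gradient backpropagating through n sigmoid layers is scaled by at most 0.25ⁿ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25, attained at x = 0

print(sigmoid_grad(0.0))  # 0.25, the best case

# Upper bound on the gradient after backpropagating through n sigmoid layers
for n in (1, 5, 10, 20):
    print(f"{n:2d} layers: gradient scaled by at most {0.25 ** n:.2e}")
```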
As the number of layers increases, model accuracy keeps improving at first, but past a certain depth both training and test accuracy drop rapidly. This degradation shows that very deep plain networks become harder to train, so deeper is not automatically better.

2. ResNet Building Blocks

ResNets use "shortcut" (also called "skip") connections, which let the gradient propagate directly back to earlier layers.
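In code terms, a residual block computes ReLU(F(x) + x) rather than ReLU(F(x)), so the identity branch gives gradients an unobstructed path backward. A minimal Keras-style sketch of the idea (the helper name residual_connection and the callable main_path are my own illustration):

```python
from tensorflow.keras.layers import Add, Activation

def residual_connection(x, main_path):
    """Wrap a skip connection around an arbitrary main path.

    `main_path` is any callable mapping x to a tensor of the same
    shape (e.g. a few Conv/BatchNorm layers). Adding x back in lets
    the gradient flow straight through the identity branch.
    """
    shortcut = x                     # skip connection: x passes through unchanged
    out = Add()([main_path(x), shortcut])
    return Activation('relu')(out)   # ReLU applied after the addition
```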

(1) The identity block

Input and output dimensions are the same (a full code sketch follows the component list below).

Identity block. Skip connection "skips over" 2 layers.


Identity block. Skip connection "skips over" 3 layers.

First component of main path:

  • The first CONV2D has $F_1$ filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2a'. Use 0 as the seed for the random initialization.
  • The first BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2a'.
  • Then apply the ReLU activation function. This has no name and no hyperparameters.

Second component of main path:

  • The second CONV2D has $F_2$ filters of shape $(f,f)$ and a stride of (1,1). Its padding is “same” and its name should be conv_name_base + '2b'. Use 0 as the seed for the random initialization.
  • The second BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2b'.
  • Then apply the ReLU activation function. This has no name and no hyperparameters.

Third component of main path:

  • The third CONV2D has $F_3$ filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2c'. Use 0 as the seed for the random initialization.
  • The third BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2c'. Note that there is no ReLU activation function in this component.

Final step:

  • The shortcut and the input are added together.
  • Then apply the ReLU activation function. This has no name and no hyperparameters.
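
Putting the three components together, here is a sketch of the identity block in the Keras functional API. I am assuming conv_name_base and bn_name_base follow the usual 'res{stage}{block}_branch' / 'bn{stage}{block}_branch' convention, which the text above references but does not define:

```python
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Add
from tensorflow.keras.initializers import glorot_uniform

def identity_block(X, f, filters, stage, block):
    """Identity block: input and output dimensions match,
    so the shortcut is the raw input X."""
    # Assumed naming scheme for the layer names used in the text
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    F1, F2, F3 = filters

    X_shortcut = X  # save the input for the skip connection

    # First component: 1x1 conv, BN over the channels axis, ReLU
    X = Conv2D(F1, (1, 1), strides=(1, 1), padding='valid',
               name=conv_name_base + '2a',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    # Second component: fxf conv with 'same' padding, BN, ReLU
    X = Conv2D(F2, (f, f), strides=(1, 1), padding='same',
               name=conv_name_base + '2b',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component: 1x1 conv, BN, no ReLU yet
    X = Conv2D(F3, (1, 1), strides=(1, 1), padding='valid',
               name=conv_name_base + '2c',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # Final step: add the shortcut, then apply ReLU
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    return X
```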

(2) The convolutional block

Input and output dimensions differ (a full code sketch follows the component list below).
First component of main path:

  • The first CONV2D has $F_1$ filters of shape (1,1) and a stride of (s,s). Its padding is “valid” and its name should be conv_name_base + '2a'.
  • The first BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2a'.
  • Then apply the ReLU activation function. This has no name and no hyperparameters.

Second component of main path:

  • The second CONV2D has $F_2$ filters of shape $(f,f)$ and a stride of (1,1). Its padding is “same” and its name should be conv_name_base + '2b'.
  • The second BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2b'.
  • Then apply the ReLU activation function. This has no name and no hyperparameters.

Third component of main path:

  • The third CONV2D has $F_3$ filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2c'.
  • The third BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2c'. Note that there is no ReLU activation function in this component.

Shortcut path:

  • The CONV2D has $F_3$ filters of shape (1,1) and a stride of (s,s). Its padding is “valid” and its name should be conv_name_base + '1'.
  • The BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '1'.

Final step:

  • The shortcut and the main path values are added together.
  • Then apply the ReLU activation function. This has no name and no hyperparameters.
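
A sketch of the convolutional block in the same style (again assuming the 'res{stage}{block}_branch' naming convention):

```python
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Add
from tensorflow.keras.initializers import glorot_uniform

def convolutional_block(X, f, filters, stage, block, s=2):
    """Convolutional block: input and output dimensions differ,
    so the shortcut path also has a 1x1 conv (stride s) plus BN."""
    conv_name_base = 'res' + str(stage) + block + '_branch'  # assumed naming scheme
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    F1, F2, F3 = filters
    X_shortcut = X

    # Main path: 1x1 conv with stride (s,s), BN, ReLU
    X = Conv2D(F1, (1, 1), strides=(s, s), padding='valid',
               name=conv_name_base + '2a',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    # fxf conv with 'same' padding, BN, ReLU
    X = Conv2D(F2, (f, f), strides=(1, 1), padding='same',
               name=conv_name_base + '2b',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # 1x1 conv, BN, no ReLU before the addition
    X = Conv2D(F3, (1, 1), strides=(1, 1), padding='valid',
               name=conv_name_base + '2c',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # Shortcut path: 1x1 conv with stride (s,s) to match dimensions, then BN
    X_shortcut = Conv2D(F3, (1, 1), strides=(s, s), padding='valid',
                        name=conv_name_base + '1',
                        kernel_initializer=glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis=3, name=bn_name_base + '1')(X_shortcut)

    # Final step: add the two paths, then apply ReLU
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    return X
```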

3. ResNet Architecture

ResNet-50 model (a code sketch of the full assembly follows the stage list below):

  • Zero-padding pads the input with a pad of (3,3)
  • Stage 1:
    • The 2D Convolution has 64 filters of shape (7,7) and uses a stride of (2,2). Its name is “conv1”.
    • BatchNorm is applied to the channels axis of the input.
    • MaxPooling uses a (3,3) window and a (2,2) stride.
  • Stage 2:
    • The convolutional block uses three sets of filters of size [64,64,256], “f” is 3, “s” is 1 and the block is “a”.
    • The 2 identity blocks use three sets of filters of size [64,64,256], “f” is 3 and the blocks are “b” and “c”.
  • Stage 3:
    • The convolutional block uses three sets of filters of size [128,128,512], “f” is 3, “s” is 2 and the block is “a”.
    • The 3 identity blocks use three sets of filters of size [128,128,512], “f” is 3 and the blocks are “b”, “c” and “d”.
  • Stage 4:
    • The convolutional block uses three sets of filters of size [256, 256, 1024], “f” is 3, “s” is 2 and the block is “a”.
    • The 5 identity blocks use three sets of filters of size [256, 256, 1024], “f” is 3 and the blocks are “b”, “c”, “d”, “e” and “f”.
  • Stage 5:
    • The convolutional block uses three sets of filters of size [512, 512, 2048], “f” is 3, “s” is 2 and the block is “a”.
    • The 2 identity blocks use three sets of filters of size [512, 512, 2048], “f” is 3 and the blocks are “b” and “c”.
  • The 2D Average Pooling uses a window of shape (2,2) and its name is “avg_pool”.
  • The flatten doesn’t have any hyperparameters or name.
  • The Fully Connected (Dense) layer reduces its input to the number of classes using a softmax activation. Its name should be 'fc' + str(classes).
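
Chaining the stages gives the full network. A condensed sketch reusing identity_block and convolutional_block from above; input_shape=(64, 64, 3) and classes=6 are illustrative defaults, and the 'bn_conv1' name for the Stage 1 BatchNorm is my assumption:

```python
from tensorflow.keras.layers import (Input, ZeroPadding2D, Conv2D, BatchNormalization,
                                     Activation, MaxPooling2D, AveragePooling2D,
                                     Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.initializers import glorot_uniform

def ResNet50(input_shape=(64, 64, 3), classes=6):
    X_input = Input(input_shape)
    X = ZeroPadding2D((3, 3))(X_input)   # pad the input with (3,3)

    # Stage 1: 7x7 conv, BN, ReLU, 3x3 max pool with stride 2
    X = Conv2D(64, (7, 7), strides=(2, 2), name='conv1',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name='bn_conv1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f=3, filters=[64, 64, 256], stage=2, block='a', s=1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

    # Stage 3
    X = convolutional_block(X, f=3, filters=[128, 128, 512], stage=3, block='a', s=2)
    for block in ['b', 'c', 'd']:
        X = identity_block(X, 3, [128, 128, 512], stage=3, block=block)

    # Stage 4
    X = convolutional_block(X, f=3, filters=[256, 256, 1024], stage=4, block='a', s=2)
    for block in ['b', 'c', 'd', 'e', 'f']:
        X = identity_block(X, 3, [256, 256, 1024], stage=4, block=block)

    # Stage 5
    X = convolutional_block(X, f=3, filters=[512, 512, 2048], stage=5, block='a', s=2)
    X = identity_block(X, 3, [512, 512, 2048], stage=5, block='b')
    X = identity_block(X, 3, [512, 512, 2048], stage=5, block='c')

    # Average pool, flatten, fully connected softmax
    X = AveragePooling2D((2, 2), name='avg_pool')(X)
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', name='fc' + str(classes),
              kernel_initializer=glorot_uniform(seed=0))(X)

    return Model(inputs=X_input, outputs=X, name='ResNet50')
```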

The follow-up paper "Identity Mappings in Deep Residual Networks" introduced ResNet V2. By analyzing the propagation formulas of the ResNet residual unit, the authors found that the forward and backward signals can be transmitted directly between units, so the nonlinear activation (e.g. ReLU) on the "shortcut connection" is replaced with an identity mapping. ResNet V2 also applies Batch Normalization in every layer. With these changes, the new residual unit is easier to train and generalizes better than before.
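
For contrast, a minimal sketch (my own, not the paper's code) of a ResNet V2 pre-activation unit, where BN and ReLU precede each convolution and the shortcut remains a pure identity:

```python
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Add

def preact_residual_unit(X, filters, f=3):
    """ResNet V2 style unit: BN and ReLU come *before* each conv
    ("pre-activation"), and the shortcut stays a pure identity,
    so signals pass through the addition unmodified."""
    X_shortcut = X

    X = BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)
    X = Conv2D(filters, (f, f), padding='same')(X)

    X = BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)
    X = Conv2D(filters, (f, f), padding='same')(X)

    # No BN/ReLU after the addition: the shortcut is an identity mapping
    return Add()([X, X_shortcut])
```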
