In 2015, Kaiming He et al. introduced ResNet in “Deep Residual Learning for Image Recognition”. ResNet pushed network depth to 152 layers and won first place in ILSVRC 2015.
1. Problems with deep networks
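Deep networks can represent very complex functions, but during backpropagation the gradient gradually vanishes. Take the Sigmoid activation as an example. Its derivative is at most 0.25, so a gradient of magnitude 1 shrinks to at most 0.25 after passing back through a single layer (ignoring the weights). The more layers there are, the stronger the attenuation. As a result, the weights of the early layers cannot be adjusted effectively.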
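The attenuation described above can be reproduced with a minimal Python sketch. It rests on a simplified model that does not come from the original text: a chain of scalar Sigmoid layers whose weights are all 1. Under that model, the gradient reaching the first layer is just the product of the per-layer Sigmoid derivatives. The helper names (sigmoid_grad, gradient_after) are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    # d/dx sigmoid(x) = s * (1 - s); its maximum, 0.25, is reached at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_after(depth: int, pre_activation: float = 0.0) -> float:
    """Factor applied to the gradient after `depth` Sigmoid layers.

    Hypothetical simplification: every layer has weight 1 and sees the same
    pre-activation, so each layer multiplies the gradient by sigmoid_grad().
    """
    return sigmoid_grad(pre_activation) ** depth


for depth in (1, 5, 10, 20):
    print(f"{depth:2d} layers: gradient x {gradient_after(depth):.3e}")
```

Even in the best case (pre-activation 0), 20 layers scale the gradient by 0.25**20, roughly 9e-13. Any other pre-activation makes the factor smaller still.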
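As the number of layers increases, model accuracy at first keeps improving. Once the depth reaches a certain point, however, both training and test accuracy degrade rapidly. Because training accuracy drops as well, this is not overfitting: very deep networks simply become harder to train. Deeper is therefore not always better.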
2. ResNet building blocks
ResNets use a “shortcut”, also called a “skip connection”, which lets the gradient propagate directly back to earlier layers.
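A toy scalar model shows why the shortcut helps. In a plain stack, each layer computes y = F(x), so backpropagation multiplies the gradient by F'(x). In a residual stack, each block computes y = x + F(x), so the factor becomes 1 + F'(x), and the identity term keeps the product from collapsing. This is a hypothetical sketch: F here is a single Sigmoid, whereas the real ResNet branch consists of convolution, BatchNorm and ReLU, and the function names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)


def plain_chain_grad(x: float, depth: int) -> float:
    """d(output)/d(input) for y_{k+1} = F(y_k), with F = sigmoid."""
    grad = 1.0
    for _ in range(depth):
        grad *= sigmoid_grad(x)  # chain rule: multiply by F'(y_k)
        x = sigmoid(x)           # forward pass to the next layer
    return grad


def residual_chain_grad(x: float, depth: int) -> float:
    """d(output)/d(input) for y_{k+1} = y_k + F(y_k), with F = sigmoid."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + sigmoid_grad(x)  # the shortcut contributes the "+ 1"
        x = x + sigmoid(x)             # forward pass: identity plus residual
    return grad


for depth in (5, 20):
    print(depth, plain_chain_grad(0.0, depth), residual_chain_grad(0.0, depth))
```

After 20 layers, the plain chain's gradient falls below 0.25**19. The residual chain's gradient never drops below 1, because each factor 1 + F'(x) is at least 1 in this toy setting.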
(1) The identity block
Used when the input and output dimensions are the same.
First component of main path:
- The first CONV2D has filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2a'. Use 0 as the seed for the random initialization.
- The first BatchNorm normalizes the channels axis. Its name should be bn_name_base + '2a'.
- Then apply the ReLU activation function. This has no name and no hyperparameters.
Second component of main path:
- The second CONV2D has filters of shape (f,f) and a stride of (1,1). Its padding is “same” and its name should be conv_name_base + '2b'. Use 0 as the seed for the random initialization.
- The second BatchNorm normalizes the channels axis. Its name should be bn_name_base + '2b'.
- Then apply the ReLU activation function. This has no name and no hyperparameters.
Third component of main path:
- The third CONV2D has filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2c'. Use 0 as the seed for the random initialization.
- The third BatchNorm normalizes the channels axis. Its name should be bn_name_base + '2c'. Note that there is no ReLU activation function in this component.
Final step:
- The shortcut (the unchanged input) and the output of the main path are added together.
- Then apply the ReLU activation function. This has no name and no hyperparameters.
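The identity block only works because the main path leaves the shape unchanged. The (1,1) convolutions use “valid” padding with stride 1, the (f,f) convolution uses “same” padding, and the third filter count equals the input channel count, so the sum is well defined. The sketch below tracks (height, width, channels) through the three convolutions in plain Python. It is a hypothetical shape check, not a Keras implementation. It assumes Keras-style padding rules: “valid” gives floor((n - f) / s) + 1 and “same” gives ceil(n / s).

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def conv2d_shape(shape: Shape, filters: int, f: int, s: int, padding: str) -> Shape:
    """Output shape of an f x f convolution with stride s (Keras-style padding rules)."""
    h, w, _ = shape
    if padding == "valid":
        return ((h - f) // s + 1, (w - f) // s + 1, filters)
    if padding == "same":
        return (-(-h // s), -(-w // s), filters)  # ceil(n / s)
    raise ValueError(f"unknown padding: {padding!r}")


def identity_block_shape(shape: Shape, f: int, filters: Tuple[int, int, int]) -> Shape:
    """Shape after the identity block; BatchNorm and ReLU do not change shapes."""
    f1, f2, f3 = filters
    x = conv2d_shape(shape, f1, 1, 1, "valid")  # '2a': (1,1), stride 1
    x = conv2d_shape(x, f2, f, 1, "same")       # '2b': (f,f), stride 1
    x = conv2d_shape(x, f3, 1, 1, "valid")      # '2c': (1,1), stride 1
    if x != shape:
        raise ValueError(f"main path {x} cannot be added to shortcut {shape}")
    return x  # the addition and final ReLU keep this shape


print(identity_block_shape((15, 15, 256), f=3, filters=(64, 64, 256)))
```

Suppose the input's channel count differs from the third filter count, for example (15, 15, 64) with filters (64, 64, 256). The function then raises ValueError. That mismatch is exactly the case the convolutional block handles.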
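(2) The convolutional block
Used when the input and output dimensions differ. A convolution on the shortcut path resizes the input so that it can be added to the main path.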
First component of main path:
- The first CONV2D has filters of shape (1,1) and a stride of (s,s). Its padding is “valid” and its name should be conv_name_base + '2a'.
- The first BatchNorm normalizes the channels axis. Its name should be bn_name_base + '2a'.
- Then apply the ReLU activation function. This has no name and no hyperparameters.
Second component of main path:
- The second CONV2D has filters of shape (f,f) and a stride of (1,1). Its padding is “same” and its name should be conv_name_base + '2b'.
- The second BatchNorm normalizes the channels axis. Its name should be bn_name_base + '2b'.
- Then apply the ReLU activation function. This has no name and no hyperparameters.
Third component of main path:
- The third CONV2D has filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2c'.
- The third BatchNorm normalizes the channels axis. Its name should be bn_name_base + '2c'. Note that there is no ReLU activation function in this component.
Shortcut path:
- The CONV2D has filters of shape (1,1) and a stride of (s,s). Its padding is “valid” and its name should be conv_name_base + '1'.
- The BatchNorm normalizes the channels axis. Its name should be bn_name_base + '1'.
Final step:
- The shortcut and the main path values are added together.
- Then apply the ReLU activation function. This has no name and no hyperparameters.
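In the convolutional block, the shortcut is no longer the raw input. The (1,1) convolution with stride (s,s) on the shortcut path produces exactly the same height, width and channel count as the main path, so the two can be added. The plain-Python shape check below illustrates this. It assumes Keras-style padding rules and uses hypothetical helper names; it is not the assignment's code.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def conv2d_shape(shape: Shape, filters: int, f: int, s: int, padding: str) -> Shape:
    """Output shape of an f x f convolution with stride s (Keras-style padding rules)."""
    h, w, _ = shape
    if padding == "valid":
        return ((h - f) // s + 1, (w - f) // s + 1, filters)
    if padding == "same":
        return (-(-h // s), -(-w // s), filters)  # ceil(n / s)
    raise ValueError(f"unknown padding: {padding!r}")


def convolutional_block_shape(shape: Shape, f: int, filters: Tuple[int, int, int], s: int) -> Shape:
    f1, f2, f3 = filters
    # Main path: only the first (1,1) convolution uses stride (s,s)
    main = conv2d_shape(shape, f1, 1, s, "valid")  # '2a'
    main = conv2d_shape(main, f2, f, 1, "same")    # '2b'
    main = conv2d_shape(main, f3, 1, 1, "valid")   # '2c'
    # Shortcut path: (1,1) convolution, stride (s,s), same filter count as '2c'
    shortcut = conv2d_shape(shape, f3, 1, s, "valid")  # '1'
    if main != shortcut:
        raise ValueError(f"main path {main} does not match shortcut {shortcut}")
    return main  # BatchNorm, the addition and the final ReLU keep this shape


print(convolutional_block_shape((15, 15, 256), f=3, filters=(128, 128, 512), s=2))
```

Take a hypothetical 15×15×256 input. With s = 2, the block turns it into 8×8×512, a change in both size and depth that an identity block cannot produce.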
3. ResNet network architecture
ResNet-50 model
- Zero-padding pads the input with a pad of (3,3)
- Stage 1:
  - The 2D Convolution has 64 filters of shape (7,7) and uses a stride of (2,2). Its name is “conv1”.
  - BatchNorm is applied to the channels axis of the input.
  - MaxPooling uses a (3,3) window and a (2,2) stride.
- Stage 2:
  - The convolutional block uses three sets of filters of size [64,64,256], “f” is 3, “s” is 1 and the block is “a”.
  - The 2 identity blocks use three sets of filters of size [64,64,256], “f” is 3 and the blocks are “b” and “c”.
- Stage 3:
  - The convolutional block uses three sets of filters of size [128,128,512], “f” is 3, “s” is 2 and the block is “a”.
  - The 3 identity blocks use three sets of filters of size [128,128,512], “f” is 3 and the blocks are “b”, “c” and “d”.
- Stage 4:
  - The convolutional block uses three sets of filters of size [256, 256, 1024], “f” is 3, “s” is 2 and the block is “a”.
  - The 5 identity blocks use three sets of filters of size [256, 256, 1024], “f” is 3 and the blocks are “b”, “c”, “d”, “e” and “f”.
- Stage 5:
  - The convolutional block uses three sets of filters of size [512, 512, 2048], “f” is 3, “s” is 2 and the block is “a”.
  - The 2 identity blocks use three sets of filters of size [512, 512, 2048], “f” is 3 and the blocks are “b” and “c”.
- The 2D Average Pooling uses a window of shape (2,2) and its name is “avg_pool”.
- The Flatten layer has no hyperparameters and no name.
- The Fully Connected (Dense) layer reduces its input to the number of classes using a softmax activation. Its name should be 'fc' + str(classes).
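Chained together, these stages change the spatial size only at the padding, conv1, the max pool, the stride-2 convolutional blocks and the average pool. The sketch below traces a square input through the layout listed above. It also counts the weight layers: conv1, three convolutions per block, and the Dense layer. The shortcut convolutions are conventionally not counted, and this tally is where the “50” comes from. Several choices are assumptions made for illustration: the 64×64 input size; “valid” padding for conv1 and both pooling layers, which the list does not state; and an average-pooling stride equal to its (2,2) window. The function name is also hypothetical.

```python
def conv_out(n: int, f: int, s: int, padding: str) -> int:
    """Output length along one spatial axis (Keras-style padding rules)."""
    return (n - f) // s + 1 if padding == "valid" else -(-n // s)


def resnet50_trace(size: int = 64) -> dict:
    size = conv_out(size + 2 * 3, 7, 2, "valid")  # zero-pad (3,3), then conv1: 7x7, stride 2
    size = conv_out(size, 3, 2, "valid")          # max pool: 3x3 window, stride 2
    channels, weight_layers, stage_sizes = 64, 1, []
    # (filters of each block, stride s of block "a", number of identity blocks)
    stages = [
        ((64, 64, 256), 1, 2),
        ((128, 128, 512), 2, 3),
        ((256, 256, 1024), 2, 5),
        ((512, 512, 2048), 2, 2),
    ]
    for filters, s, n_identity in stages:
        # Only the convolutional block's stride-s (1,1) conv changes the spatial size;
        # the (f,f) convs use "same" padding and identity blocks keep the shape.
        size = conv_out(size, 1, s, "valid")
        channels = filters[2]
        weight_layers += 3 * (1 + n_identity)  # three convolutions per block
        stage_sizes.append(size)
    size = conv_out(size, 2, 2, "valid")  # average pool (2,2), stride assumed equal to window
    weight_layers += 1                    # the Dense 'fc' layer
    return {
        "stage_sizes": stage_sizes,
        "features": size * size * channels,  # length of the flattened vector
        "weight_layers": weight_layers,
    }


print(resnet50_trace(64))
```

For a 64×64×3 input, this gives spatial sizes of 15, 8, 4 and 2 after Stages 2 to 5. Average pooling then yields a 1×1×2048 tensor, so 2048 features enter the Dense layer.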
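ResNet V2 was proposed in “Identity Mappings in Deep Residual Networks”. The authors analyzed the propagation formulas of the ResNet residual unit. They found that forward and backward signals can propagate directly from one unit to any other when both the shortcut connection and the step after the addition are identity mappings. ResNet V2 therefore replaces the nonlinear activation (ReLU) applied after the addition with an identity mapping. It also moves Batch Normalization and ReLU in front of each convolution in the residual branch (“pre-activation”). With these changes, the new residual unit is easier to train and generalizes better than the original.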
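For reference, the propagation formulas analyzed in that paper can be written as follows. Here x_l is the input to the l-th residual unit, F is the residual function with weights W_i, and E is the loss. The formulas hold when both the shortcut and the step after the addition are identity mappings:

```latex
\begin{aligned}
x_{l+1} &= x_l + \mathcal{F}(x_l, \mathcal{W}_l) \\
x_L &= x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \\
\frac{\partial \mathcal{E}}{\partial x_l}
  &= \frac{\partial \mathcal{E}}{\partial x_L}
     \left(1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i)\right)
\end{aligned}
```

Applying the first formula recursively gives the second: any deeper unit x_L receives x_l unchanged, plus a sum of residuals. Differentiating the second gives the third. There, the term 1 carries the gradient from x_L straight back to x_l without passing through any weight layer. The summed term is unlikely to equal -1 for every sample in a mini-batch, so the gradient is unlikely to vanish.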