Understanding ResNet

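       As network depth increases, accuracy saturates and then degrades rapidly. Unexpectedly, this degradation is not caused by overfitting: adding more layers to a suitably deep model leads to higher training error. To make much deeper networks trainable, a new architecture was proposed, the deep residual network (ResNet). The deep residual network won first place in the ILSVRC & COCO 2015 competitions, and also took first place in ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.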

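I. Residual block

A residual block has two parts: the identity mapping and the residual mapping. In the figure, the identity mapping is the curved shortcut path, and the residual mapping is the straight (non-curved) path through the stacked layers.

Formally, denote the desired underlying mapping as H(x), and let the stacked nonlinear layers fit another mapping F(x) = H(x) - x. The original mapping then becomes F(x) + x. The hypothesis is that the residual mapping is easier to optimize than the original, unreferenced mapping.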
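
As a minimal illustration of the F(x) + x formulation, the sketch below uses plain Python lists rather than Keras tensors. The names `residual_block`, `zero_residual`, and `small_residual` are hypothetical and chosen only for this example. It shows that when the residual branch outputs zeros, the block reduces to the identity mapping.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x element-wise, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) and x must have the same length to be added")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    # A residual branch that outputs zeros (for example, all-zero weights).
    return [0.0 for _ in x]


def small_residual(x):
    # Hypothetical toy branch: a small correction computed from the input.
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)   # same values as x
adjusted_out = residual_block(x, small_residual)  # x plus a small correction
print(identity_out, adjusted_out)
```

In this toy setting the block only has to learn the correction on top of x, which is the intuition behind fitting F(x) instead of H(x) directly.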

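II. Deep bottleneck architecture

Residual blocks come in two forms, as shown in the figure above: the one on the left suits smaller networks, while the one on the right suits deeper networks and mainly serves to reduce computation and parameter count.

The block on the right uses two 1*1 convolutions and one 3*3 convolution: the first 1*1 convolution reduces the number of feature-map channels, and the second 1*1 convolution increases it again.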

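Parameters of a residual block with the bottleneck design: 1 * 1 * 256 * 64 + 3 * 3 * 64 * 64 + 1 * 1 * 64 * 256 = 69632

Parameters of a residual block without the bottleneck design: 3 * 3 * 256 * 256 * 2 = 1179648

The two parameter counts differ by a factor of about 17.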
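
The arithmetic above can be checked with a short sketch. The helper `conv_weights` is a hypothetical name introduced here; it counts only convolution weights and ignores biases and batch-normalization parameters, matching the figures in the text.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weights in a kernel x kernel convolution, ignoring biases."""
    return kernel * kernel * in_channels * out_channels


# Bottleneck block: 1*1 (256 -> 64), 3*3 (64 -> 64), 1*1 (64 -> 256)
bottleneck = (conv_weights(1, 256, 64)
              + conv_weights(3, 64, 64)
              + conv_weights(1, 64, 256))

# Non-bottleneck block: two 3*3 convolutions on 256 channels
basic = 2 * conv_weights(3, 256, 256)

ratio = basic / bottleneck
print(bottleneck, basic, round(ratio, 2))  # 69632 1179648 16.94
```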

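III. Network architecture

1. Plain network

The plain baseline is mainly inspired by VGG networks and follows two simple design rules:

(1) Layers with the same output feature-map size have the same number of filters.

(2) When the feature-map size is halved, the number of filters is doubled, so that the time complexity per layer is preserved.

Feature maps are downsampled by convolutional layers with a stride of 2, as shown by the middle network in the figure below.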
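
Rule (2) can be verified numerically. The sketch below is an illustration under assumptions: `conv_mults` is a hypothetical helper, and the sizes (a 56 x 56 map with 64 channels versus a 28 x 28 map with 128 channels) are example values, not taken from the text. It counts the multiplications of a 3*3 convolution whose output map has the given size.

```python
def conv_mults(height, width, kernel, in_channels, out_channels):
    """Multiplications for a conv layer producing a height x width output map."""
    return height * width * kernel * kernel * in_channels * out_channels


# Before downsampling: 56 x 56 map, 64 -> 64 channels
before = conv_mults(56, 56, 3, 64, 64)

# After downsampling: map size halved, number of filters doubled
after = conv_mults(28, 28, 3, 128, 128)

print(before, after, before == after)
```

Halving height and width divides the spatial term by 4, while doubling both input and output channels multiplies the channel term by 4, so the per-layer cost is unchanged.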

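2. Residual network

The residual network adds shortcut connections to the plain network. When the input and output dimensions are the same, the identity shortcut can be used directly; when they differ, the dimensions must be matched. The two main options are:

(1) pad with zeros to increase the dimensions;

(2) use a 1 * 1 convolutional layer to increase the dimensions.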
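
The two options can be sketched for a single spatial position, treating its channels as a plain Python list. The function names and the weight values are hypothetical, and spatial downsampling is ignored to keep the example small. A 1 * 1 convolution applied at one position is a matrix-vector product over the channels.

```python
def zero_pad_shortcut(pixel, out_channels):
    """Option (1): append zero channels; adds no parameters."""
    if out_channels < len(pixel):
        raise ValueError("zero-padding can only increase the channel count")
    return list(pixel) + [0.0] * (out_channels - len(pixel))


def projection_shortcut(pixel, weights):
    """Option (2): 1 * 1 convolution at one position.

    weights has one row per output channel and one column per input channel.
    """
    return [sum(w * v for w, v in zip(row, pixel)) for row in weights]


pixel = [1.0, 2.0]  # one position with 2 input channels
padded = zero_pad_shortcut(pixel, 4)

# Hypothetical 4 x 2 weight matrix mapping 2 channels to 4 channels
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
projected = projection_shortcut(pixel, weights)

print(padded)     # [1.0, 2.0, 0.0, 0.0]
print(projected)  # [1.0, 2.0, 3.0, -0.5]
```

Option (1) keeps the shortcut parameter-free, while option (2) adds learnable weights to the shortcut path.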

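IV. ResNet models

The main residual networks are ResNet50, ResNet101, and ResNet152:

ResNet50 is organized into five stages, preceded by a zero-padding step: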

  • zero-padding: pad the input with (3, 3);
  • stage 1:
  •     conv1: kernel shape (7, 7), 64 filters, strides (2, 2)
  •     BN: applied along the channels axis of the input
  •     max pooling: shape (3, 3), strides (2, 2)
  • stage 2:
  •     conv block: filters [64, 64, 256], kernel size f = 3, stride s = 1, block a
  •     identity block: filters [64, 64, 256], f = 3, blocks b, c
  • stage 3:
  •     conv block: filters [128, 128, 512], f = 3, s = 2, block a
  •     identity block: filters [128, 128, 512], f = 3, blocks b, c, d
  • stage 4:
  •     conv block: filters [256, 256, 1024], f = 3, s = 2, block a
  •     identity block: filters [256, 256, 1024], f = 3, blocks b, c, d, e, f
  • stage 5:
  •     conv block: filters [512, 512, 2048], f = 3, s = 2, block a
  •     identity block: filters [512, 512, 2048], f = 3, blocks b, c
  •     avg_pool: shape (2, 2)
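
The "50" in ResNet50 can be recovered by counting weighted layers. The sketch below assumes the usual counting convention (the first convolution, three convolutions per bottleneck block, and one final fully connected layer, with shortcut projections not counted) and the per-stage block counts of the standard ResNet-50/101/152 designs. `bottleneck_resnet_depth` is a hypothetical helper name.

```python
def bottleneck_resnet_depth(blocks_per_stage):
    """conv1 + 3 convolutions per bottleneck block + final fully connected layer."""
    return 1 + 3 * sum(blocks_per_stage) + 1


# Blocks in stages 2 to 5 (one conv block plus identity blocks per stage)
depth_50 = bottleneck_resnet_depth([3, 4, 6, 3])
depth_101 = bottleneck_resnet_depth([3, 4, 23, 3])
depth_152 = bottleneck_resnet_depth([3, 8, 36, 3])

print(depth_50, depth_101, depth_152)  # 50 101 152
```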

 

V. ResNet50 implementation

The following reproduces the ResNet50 network in Keras; the network consists of five stages.

from keras.layers import Input
from keras.layers import MaxPooling2D, GlobalAveragePooling2D, ZeroPadding2D
from keras.layers import Conv2D, BatchNormalization, Activation, add, Dense
from keras.models import Model
from keras.utils import plot_model


def conv_block(input_tensor, kernel_size, filters, stage, block, strides):
    assert len(filters) == 3
    filter1, filter2, filter3 = filters
    conv_name_base = 'conv'+ str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    # 1 * 1
    x = Conv2D(filter1, kernel_size=(1, 1),
               strides=strides, # stage2, strides=1; stage345, strides=2
               kernel_initializer='he_normal',
               name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(axis=3, name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)
    # 3 * 3
    x = Conv2D(filter2, kernel_size, padding='same',
               kernel_initializer='he_normal', name=conv_name_base + '2b')(x)
    x = BatchNormalization(axis=3, name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)
    # 1 * 1
    x = Conv2D(filter3, kernel_size=(1, 1),
               kernel_initializer='he_normal',
               name=conv_name_base + '2c')(x)
    x = BatchNormalization(axis=3, name=bn_name_base + '2c')(x)
    shortcut = Conv2D(filter3, kernel_size=(1, 1),
                      strides=strides,
                      kernel_initializer='he_normal',
                      name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(axis=3, name=bn_name_base + '1')(shortcut)
    x = add([x, shortcut])
    x = Activation('relu')(x)
    return x

def identity_block(input_tensor, kernel_size, filters, stage, block):
    assert len(filters) == 3
    filter1, filter2, filter3 = filters
    conv_name = 'res' + str(stage) + block + '_branch'
    bn_name = 'bn' + str(stage) + block + '_branch'
    # 1 * 1
    x = Conv2D(filter1, kernel_size=(1, 1),
               kernel_initializer='he_normal',
               name=conv_name + '2a')(input_tensor)
    x = BatchNormalization(axis=3, name=bn_name + '2a')(x)
    x = Activation('relu')(x)
    # 3 * 3
    x = Conv2D(filter2, kernel_size,
               padding='same',
               kernel_initializer='he_normal',
               name=conv_name + '2b')(x)
    x = BatchNormalization(axis=3, name=bn_name + '2b')(x)
    x = Activation('relu')(x)
    # 1 * 1
    x = Conv2D(filter3, kernel_size=(1, 1),
               kernel_initializer='he_normal',
               name=conv_name + '2c')(x)
    x = BatchNormalization(axis=3, name=bn_name + '2c')(x)
    # x and input_tensor must have the same shape for the identity shortcut
    x = add([x, input_tensor])
    x = Activation('relu')(x)
    return x

def resnet50(input_tensor, include_top=True, classes=1000):
    # stage 1
    x = ZeroPadding2D((3, 3), name='padding')(input_tensor)
    x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2),
               kernel_initializer='he_normal',
               name='conv1')(x)
    x = BatchNormalization(axis=3, name='conv1_bn')(x)
    x = Activation('relu', name='conv1_relu')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='pool')(x)
    # stage 2 repeat 3
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')
    # stage 3 repeat 4
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', strides=(2, 2))
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')
    # stage 4 repeat 6
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', strides=(2, 2))
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')
    # stage 5 repeat 3
    x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', strides=(2, 2))
    x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
    x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')

    if include_top:
        x = GlobalAveragePooling2D(name='avg_pool')(x)
        x = Dense(classes, activation='softmax', name='fc1000')(x)
    # Build the model from the function argument, not the global `inputs`
    model = Model(inputs=input_tensor, outputs=x, name='ResNet50')
    return model


inputs = Input(shape=(224, 224, 3))
model = resnet50(inputs)
plot_model(model, to_file='./resnet50.jpg', show_shapes=True)
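
To sanity-check the spatial sizes without building the model, the following sketch traces the feature-map side length through the stem and the four residual stages for a 224 x 224 input. It is a standalone approximation of the layer arithmetic, not part of the Keras code; `out_size` and `trace_resnet50` are hypothetical helpers, and the padding rules follow Keras's 'valid' and 'same' conventions.

```python
import math


def out_size(size, kernel, stride, padding):
    """Output side length of a conv/pool layer under Keras-style padding."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


def trace_resnet50(input_size=224, pool_padding="valid"):
    """Return side lengths: input, conv1, pool, stage 2, stage 3, stage 4, stage 5."""
    sizes = [input_size]
    size = input_size + 2 * 3                  # ZeroPadding2D((3, 3))
    size = out_size(size, 7, 2, "valid")       # conv1: 7x7, stride 2
    sizes.append(size)
    size = out_size(size, 3, 2, pool_padding)  # max pooling: 3x3, stride 2
    sizes.append(size)
    sizes.append(size)                         # stage 2: stride 1, size unchanged
    for _ in range(3):                         # stages 3 to 5: 1x1 conv, stride 2
        size = out_size(size, 1, 2, "valid")
        sizes.append(size)
    return sizes


print(trace_resnet50(pool_padding="valid"))  # [224, 112, 55, 55, 28, 14, 7]
print(trace_resnet50(pool_padding="same"))   # [224, 112, 56, 56, 28, 14, 7]
```

The padding mode of the pooling layer decides whether stage 2 works on 55 x 55 or 56 x 56 maps; in both cases the final stage produces 7 x 7 maps before global average pooling.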

VI. ResNet50 network structure diagram

 
