[CV07] How to Implement VGG, Inception, and ResNet Modules with Keras

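The previous article gave a brief introduction to the classic convolutional neural network models. This article shows how to implement the key building block of each model with TensorFlow.Keras.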



1. VGG Blocks

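The VGG convolutional neural network architecture is named after the Visual Geometry Group at the University of Oxford and was an important milestone in applying deep learning to computer vision. The key innovation of the model is the repeated stacking of VGG blocks: each block is a group of convolutional layers with small filters (such as 3×3) followed by a 2×2 max pooling layer with a stride of 2.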
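To make the size arithmetic concrete, here is a minimal sketch in plain Python (no Keras needed) of how the spatial size of a feature map changes through a VGG block. The helper names `conv_same_size`, `pool_size`, and `vgg_block_size` are hypothetical, and the pooling formula assumes no padding, which is the default for `MaxPooling2D` in Keras. The list of convolution counts per block is only an illustration.

```python
def conv_same_size(size, stride=1):
    """Spatial size after a convolution with 'same' padding: ceil(size / stride)."""
    return -(-size // stride)


def pool_size(size, pool=2, stride=2):
    """Spatial size after pooling without padding: floor((size - pool) / stride) + 1."""
    return (size - pool) // stride + 1


def vgg_block_size(size, n_conv):
    """Spatial size after n_conv 'same' convolutions and one 2x2, stride-2 pooling."""
    for _ in range(n_conv):
        size = conv_same_size(size)  # 'same' padding with stride 1 keeps the size
    return pool_size(size)           # pooling halves the size


size = 256
# hypothetical number of convolutional layers in each of three stacked blocks
for block, n_conv in enumerate([2, 2, 3], start=1):
    size = vgg_block_size(size, n_conv)
    print(f"after block {block}: {size}x{size}")
```

Each block halves the height and width regardless of how many convolutional layers it contains, because only the pooling layer changes the spatial size.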

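When developing a new model, a convolutional neural network built from VGG blocks is a good starting point, because it is easy to implement and extracts features from images very effectively.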

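The example below defines a VGG block that stacks several convolutional layers. The layers share the same number of filters, a 3×3 kernel size, and a 1×1 stride, use padding so that the output feature map has the same size as the input feature map, and use the relu activation function. They are followed by a max pooling layer whose size and stride are both 2×2. The input shape is defined as 256×256×3, and the network architecture is plotted. The complete example follows. (Anti-plagiarism watermark: CSDN: datamonday. This article is published only on CSDN.)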

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D
from keras.utils import plot_model

# VGG block
def vgg_block(layer_in, n_filters, n_conv):
    
    # add convolutional layers
    for i in range(n_conv):
        layer_in = Conv2D(n_filters, (3,3), padding='same', activation='relu', name=f'conv_{i}')(layer_in)
    
    # add max pooling layer
    layer_in = MaxPooling2D((2,2), strides=(2,2), name='maxpool')(layer_in)
    
    return layer_in

# define model input
visible = Input(shape=(256, 256, 3), name='input')

# add vgg module
layer = vgg_block(visible, 64, 2)

# create model
model = Model(inputs=visible, outputs=layer, name='VGG Block')
model.summary()

# plot model architecture
plot_model(model, show_shapes=True, show_layer_names=False, to_file='vgg_block.png', dpi=200)

Output:

Model: "VGG Block"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 256, 256, 3)       0         
_________________________________________________________________
conv_0 (Conv2D)              (None, 256, 256, 64)      1792      
_________________________________________________________________
conv_1 (Conv2D)              (None, 256, 256, 64)      36928     
_________________________________________________________________
maxpool (MaxPooling2D)       (None, 128, 128, 64)      0         
=================================================================
Total params: 38,720
Trainable params: 38,720
Non-trainable params: 0
_________________________________________________________________

[Figure: architecture plot of the VGG block, produced by plot_model]
[Parameter count calculation]:

  • conv_0: $(3 \times 3 \times 3 + 1) \times 64 = 1792$
  • conv_1: $(3 \times 3 \times 64 + 1) \times 64 = 36928$
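The same calculation can be written as a small function. This is a minimal sketch in plain Python, and the helper name `conv2d_params` is hypothetical. It assumes a standard convolution with one bias term per filter, which matches the counts reported by `model.summary()` above.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, n_filters, use_bias=True):
    """Number of trainable parameters in a standard 2D convolutional layer."""
    weights_per_filter = kernel_h * kernel_w * in_channels
    bias_per_filter = 1 if use_bias else 0
    return (weights_per_filter + bias_per_filter) * n_filters


conv_0 = conv2d_params(3, 3, 3, 64)   # 3 input channels (RGB image)
conv_1 = conv2d_params(3, 3, 64, 64)  # 64 input channels (output of conv_0)
print(conv_0, conv_1, conv_0 + conv_1)  # 1792 36928 38720
```

The max pooling layer has no trainable parameters, so the sum of the two convolutional layers equals the total reported for the model.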

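Going a step further, the block can be extended into a model with three VGG blocks. In the code below, the first block has three convolutional layers with 64 filters, the second block has two convolutional layers with 128 filters, and the third block has two convolutional layers with 256 filters, followed by three fully connected layers. This is a common use of VGG blocks, where the number of filters increases with the depth of the model.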

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dense
from keras.utils import plot_model

# VGG block
def vgg_block(layer_in, n_filters, n_conv_start, n_conv_end, pool_index):
    '''
    layer_in: input tensor
    n_filters: number of filters in each convolutional layer
    n_conv_start: index of the first convolutional layer (used in the layer names)
    n_conv_end: index of the last convolutional layer + 1 (half-open range)
    pool_index: index of the pooling layer (used in its name)
    '''
    # add convolutional layers
    for i in range(n_conv_start, n_conv_end):
        layer_in = Conv2D(n_filters, (3,3), padding='same', activation='relu', name=f'conv_{i}')(layer_in)
    
    # add max pooling layer
    layer_in = MaxPooling2D((2,2), strides=(2,2), name=f'maxpool_{pool_index}')(layer_in)
    
    return layer_in

# define model input
visible = Input(shape=(256, 256, 3), name='input')

# add vgg module
layer = vgg_block(visible, 64, 1, 4, 1)
layer = vgg_block(layer, 128, 4, 6, 2)
layer = vgg_block(layer, 256, 6, 8, 3)


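# add dense (fc) layers; there is no Flatten layer, so each Dense layer
# is applied along the last (channel) axis at every spatial position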
dense = Dense(1024, name='dense_1')(layer)
dense = Dense(1024, name='dense_2')(dense)
dense = Dense(512, name='dense_3')(dense)

# create model
model = Model(inputs=visible, outputs=dense, name='VGG Naive Model')
model.summary()

# plot model architecture
plot_model(model, show_shapes=True, show_layer_names=True, to_file='vgg_block.png', dpi=200)

Output:

Model: "VGG Naive Model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 256, 256, 3)       0         
_________________________________________________________________
conv_1 (Conv2D)              (None, 256, 256, 64)      1792      
_________________________________________________________________
conv_2 (Conv2D)              (None, 256, 256, 64)      36928     
_________________________________________________________________
conv_3 (Conv2D)              (None, 256, 256, 64)      36928     
_________________________________________________________________
maxpool_1 (MaxPooling2D)     (None, 128, 128, 64)      0         
_________________________________________________________________
conv_4 (Conv2D)              (None, 128, 128, 128)     73856     
_________________________________________________________________
conv_5 (Conv2D)              (None, 128, 128, 128)     147584    
_________________________________________________________________
maxpool_2 (MaxPooling2D)     (None, 64, 64, 128)       0         
_________________________________________________________________
conv_6 (Conv2D)              (None, 64, 64, 256)       295168    
_________________________________________________________________
conv_7 (Conv2D)              (None, 64, 64, 256)       590080    
_________________________________________________________________
maxpool_3 (MaxPooling2D)     (None, 32, 32, 256)       0         
_________________________________________________________________
dense_1 (Dense)              (None, 32, 32, 1024)      263168    
_________________________________________________________________
dense_2 (Dense)              (None, 32, 32, 1024)      1049600   
_________________________________________________________________
dense_3 (Dense)              (None, 32, 32, 512)       524800    
=================================================================
Total params: 3,019,904
Trainable params: 3,019,904
Non-trainable params: 0
_________________________________________________________________

[Figure: architecture plot of the three-block VGG model, produced by plot_model]
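[Parameter count calculation]:

The model has no Flatten layer, so each Dense layer is applied along the last (channel) axis, and its input dimension is the channel count of the previous layer.

  • dense_1: $(256 + 1) \times 1024 = 263168$
  • dense_2: $(1024 + 1) \times 1024 = 1049600$
  • dense_3: $(1024 + 1) \times 512 = 524800$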
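The fully connected counts can be checked the same way. The sketch below is plain Python with a hypothetical helper `dense_params`. The second part is an assumption that goes beyond the model in this section: it shows what the first dense layer would cost if a Flatten layer were placed before it, as is usual in a complete VGG-style classifier.

```python
def dense_params(in_features, units):
    """Number of trainable parameters in a dense layer with a bias term."""
    return (in_features + 1) * units


# as in the model above: Dense acts on the 256-channel last axis
dense_layers = [
    dense_params(256, 1024),   # dense_1
    dense_params(1024, 1024),  # dense_2
    dense_params(1024, 512),   # dense_3
]
print(dense_layers, sum(dense_layers))

# hypothetical variant (not in the original code): Flatten before dense_1,
# so the input is the whole 32 x 32 x 256 feature map
flattened_features = 32 * 32 * 256
print(dense_params(flattened_features, 1024))
```

Under that assumption the first dense layer alone would hold roughly 268 million parameters, which is why the position of the flattening step matters so much for model size.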

2. Inception Module

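The key innovation of the Inception (GoogLeNet) model is the Inception module: a block of parallel convolutional layers with different filter sizes (for example 1×1, 3×3, and 5×5) together with a parallel 3×3 max pooling layer, whose results are then concatenated.

This is a very simple yet powerful building block. Instead of learning parallel filters of a single size, the model learns parallel filters of different sizes, which lets it learn at multiple scales.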
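Because the branches are concatenated along the channel axis, the number of output channels of a module is simply the sum of the output filters of its four branches. The sketch below is plain Python with a hypothetical helper `inception_out_channels`. The filter counts are the ones used for the two modules in the example of this section.

```python
def inception_out_channels(f1, f2_out, f3_out, f4_out):
    """Channels after concatenating the 1x1, 3x3, 5x5, and pooling branches."""
    return f1 + f2_out + f3_out + f4_out


# output filters of each branch for the two stacked modules
first_module = inception_out_channels(f1=64, f2_out=128, f3_out=32, f4_out=32)
second_module = inception_out_channels(f1=128, f2_out=192, f3_out=96, f4_out=64)
print(first_module, second_module)  # 256 480
```

All branches use 'same' padding and a stride of 1, so the height and width stay unchanged and only the channel count grows.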

# Inception module
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, concatenate
from keras.utils import plot_model

def inception_module(layer_in, f1, f2_in, f2_out, f3_in, f3_out, f4_out):
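    '''
    layer_in: input tensor
    f1: number of filters in the 1×1 convolution branch
    f2_in: number of 1×1 filters that reduce the channels before the 3×3 convolution
    f2_out: number of filters in the 3×3 convolution
    f3_in: number of 1×1 filters that reduce the channels before the 5×5 convolution
    f3_out: number of filters in the 5×5 convolution
    f4_out: number of 1×1 filters applied after the max pooling layer
    '''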
    # 1x1 conv
    conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(layer_in)
    
    # 3x3 conv
    conv3 = Conv2D(f2_in, (1,1), padding='same', activation='relu')(layer_in)
    conv3 = Conv2D(f2_out, (3,3), padding='same', activation='relu')(conv3)
    
    # 5x5 conv
    conv5 = Conv2D(f3_in, (1,1), padding='same', activation='relu')(layer_in)
    conv5 = Conv2D(f3_out, (5,5), padding='same', activation='relu')(conv5)
    
    # 3x3 max pooling
    pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(layer_in)
    pool = Conv2D(f4_out, (1,1), padding='same', activation='relu')(pool)
    
    # concatenate filters, assumes filters/channels last
    layer_out = concatenate([conv1, conv3, conv5, pool], axis=-1)
    
    return layer_out

# define model input
visible = Input(shape=(256, 256, 3), name='input')

# add two inception modules
layer = inception_module(visible, 64, 96, 128, 16, 32, 32)
layer = inception_module(layer, 128, 128, 192, 32, 96, 64)

# create model
model = Model(inputs=visible, outputs=layer, name='Inception Model')
model.summary()

# plot model architecture
plot_model(model, show_shapes=True, show_layer_names=False, to_file='inception_module.png', dpi=200)

Output:

Model: "Inception Model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input (InputLayer)              (None, 256, 256, 3)  0                                            
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 256, 256, 96) 384         input[0][0]                      
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 256, 256, 16) 64          input[0][0]                      
__________________________________________________________________________________________________
max_pooling2d_6 (MaxPooling2D)  (None, 256, 256, 3)  0           input[0][0]                      
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 256, 256, 64) 256         input[0][0]                      
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 256, 256, 128 110720      conv2d_20[0][0]                  
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 256, 256, 32) 12832       conv2d_22[0][0]                  
__________________________________________________________________________________________________
conv2d_24 (Conv2D)              (None, 256, 256, 32) 128         max_pooling2d_6[0][0]            
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 256, 256, 256 0           conv2d_19[0][0]                  
                                                                 conv2d_21[0][0]                  
                                                                 conv2d_23[0][0]                  
                                                                 conv2d_24[0][0]                  
__________________________________________________________________________________________________
conv2d_26 (Conv2D)              (None, 256, 256, 128 32896       concatenate_3[0][0]              
__________________________________________________________________________________________________
conv2d_28 (Conv2D)              (None, 256, 256, 32) 8224        concatenate_3[0][0]              
__________________________________________________________________________________________________
max_pooling2d_7 (MaxPooling2D)  (None, 256, 256, 256 0           concatenate_3[0][0]              
__________________________________________________________________________________________________
conv2d_25 (Conv2D)              (None, 256, 256, 128 32896       concatenate_3[0][0]              
__________________________________________________________________________________________________
conv2d_27 (Conv2D)              (None, 256, 256, 192 221376      conv2d_26[0][0]                  
__________________________________________________________________________________________________
conv2d_29 (Conv2D)              (None, 256, 256, 96) 76896       conv2d_28[0][0]                  
__________________________________________________________________________________________________
conv2d_30 (Conv2D)              (None, 256, 256, 64) 16448       max_pooling2d_7[0][0]            
__________________________________________________________________________________________________
concatenate_4 (Concatenate)     (None, 256, 256, 480 0           conv2d_25[0][0]                  
                                                                 conv2d_27[0][0]                  
                                                                 conv2d_29[0][0]                  
                                                                 conv2d_30[0][0]                  
==================================================================================================
Total params: 513,120
Trainable params: 513,120
Non-trainable params: 0
__________________________________________________________________________________________________

[Figure: architecture plot of the two stacked Inception modules, produced by plot_model]
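The 1×1 convolutions placed before the 3×3 and 5×5 convolutions in the code above reduce the number of channels those larger filters have to process. As a rough illustration, the sketch below compares the parameter count of the 5×5 and 3×3 branches of the second module with and without that reduction. It is plain Python, the helper names are hypothetical, and the "without reduction" variant is an assumption for comparison only, not something the original code builds.

```python
def conv2d_params(kernel, in_channels, n_filters):
    """Parameters of a square-kernel convolution with one bias per filter."""
    return (kernel * kernel * in_channels + 1) * n_filters


def reduced_branch_params(kernel, in_channels, reduce_to, n_filters):
    """1x1 reduction to `reduce_to` channels, then a kernel x kernel convolution."""
    return (conv2d_params(1, in_channels, reduce_to)
            + conv2d_params(kernel, reduce_to, n_filters))


in_channels = 256  # the second module receives the 256-channel concatenation

# 5x5 branch: reduce to 32 channels, then 96 filters
print(reduced_branch_params(5, in_channels, 32, 96))   # 85120
print(conv2d_params(5, in_channels, 96))               # 614496 without reduction

# 3x3 branch: reduce to 128 channels, then 192 filters
print(reduced_branch_params(3, in_channels, 128, 192))  # 254272
print(conv2d_params(3, in_channels, 192))               # 442560 without reduction
```

The reduced counts match the sums of the corresponding layers in the summary above (8224 + 76896 and 32896 + 221376).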


3. Residual Module

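A key innovation in ResNet is the residual module. The residual module, and specifically the identity residual module, is a block of two convolutional layers with the same number of filters and a small filter size, where the output of the second layer is added to the input of the first convolutional layer. Drawn as a graph, the input of the module is added to the output of the module, and this link is called a shortcut connection.

If the number of channels in the input does not match the number of filters in the last convolutional layer of the module, the addition raises an error. One solution is to use a 1×1 convolution (often called a projection layer) either to increase the number of channels of the input or to reduce the number of filters in the last convolutional layer of the module. The former makes more sense and is the approach proposed in the paper, where it is called a projection shortcut.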
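To illustrate why the projection is needed, here is a toy sketch in plain Python that treats a feature map as nested lists in channels-last order. The function names, weights, and values are all hypothetical. For simplicity the projection here is linear, whereas the Keras code in this section applies a relu activation to its 1×1 convolution.

```python
def project_1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution to an H x W x C_in feature map.

    weights has shape C_in x C_out and bias has length C_out.
    """
    return [
        [
            [
                sum(pixel[c] * weights[c][k] for c in range(len(pixel))) + bias[k]
                for k in range(len(bias))
            ]
            for pixel in row
        ]
        for row in feature_map
    ]


def add_and_relu(a, b):
    """Element-wise sum of two H x W x C feature maps, followed by relu."""
    if len(a[0][0]) != len(b[0][0]):
        raise ValueError("channel counts differ; a projection shortcut is needed")
    return [
        [[max(0.0, x + y) for x, y in zip(pa, pb)] for pa, pb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


# toy 1 x 2 feature map with 3 channels (the module input)
module_input = [[[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]]
# stand-in for the output of the two 3x3 convolutions, with 2 channels
conv_output = [[[1.0, 1.0], [1.0, 1.0]]]

weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps 3 channels to 2
bias = [0.0, 0.5]

shortcut = project_1x1(module_input, weights, bias)
print(add_and_relu(conv_output, shortcut))  # [[[5.0, 6.5], [1.0, 2.5]]]
```

Calling `add_and_relu(conv_output, module_input)` directly raises the `ValueError`, which mirrors the shape error described above.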

# identity or projection residual module
from keras.models import Model
from keras.layers import Input, Activation, Conv2D
from keras.layers import add
from keras.utils import plot_model

def residual_module(layer_in, n_filters):
    merge_input = layer_in
    
    # check whether the number of channels needs to be increased (assumes channels-last format);
    # if so, project the input with a 1x1 convolution
    if layer_in.shape[-1] != n_filters:
        merge_input = Conv2D(n_filters, (1,1), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
    
    # conv layer
    conv1 = Conv2D(n_filters, (3,3), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
    conv2 = Conv2D(n_filters, (3,3), padding='same', activation='linear', kernel_initializer='he_normal')(conv1)
    
    # add filters, assumes filters(channels last)
    layer_out = add([conv2, merge_input])
    
    # activation function
    layer_out = Activation('relu')(layer_out)
    
    return layer_out

# define model input
visible = Input(shape=(256, 256, 3), name='input')

# add residual module
layer = residual_module(visible, 64)

# create model
model = Model(inputs=visible, outputs=layer, name='Residual Model')
model.summary()

# plot model architecture
plot_model(model, show_shapes=True, show_layer_names=False, to_file='residual_module.png', dpi=200)

Output:

Model: "Residual Model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input (InputLayer)              (None, 256, 256, 3)  0                                            
__________________________________________________________________________________________________
conv2d_41 (Conv2D)              (None, 256, 256, 64) 1792        input[0][0]                      
__________________________________________________________________________________________________
conv2d_42 (Conv2D)              (None, 256, 256, 64) 36928       conv2d_41[0][0]                  
__________________________________________________________________________________________________
conv2d_40 (Conv2D)              (None, 256, 256, 64) 256         input[0][0]                      
__________________________________________________________________________________________________
add_4 (Add)                     (None, 256, 256, 64) 0           conv2d_42[0][0]                  
                                                                 conv2d_40[0][0]                  
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 256, 256, 64) 0           add_4[0][0]                      
==================================================================================================
Total params: 38,976
Trainable params: 38,976
Non-trainable params: 0
__________________________________________________________________________________________________

[Figure: architecture plot of the residual module, produced by plot_model]


References:
https://machinelearningmastery.com/how-to-implement-major-architecture-innovations-for-convolutional-neural-networks/
https://blog.csdn.net/hzhj2007/article/details/80164909
