【CV07】How to Develop VGG, Inception, and ResNet Modules with Keras

The previous article gave a brief overview of classic convolutional neural network models; this section shows how to implement each of these modules with TensorFlow Keras.



1. VGG Blocks

The VGG convolutional neural network architecture, named after the Visual Geometry Group at the University of Oxford, is an important milestone in applying deep learning to computer vision. The model's key innovation is the repeated stacking of VGG blocks: convolutional layers with small filters (e.g. 3×3), followed by a 2×2 max pooling layer with a stride of 2.

When developing a new model, a convolutional neural network built from VGG blocks is a good starting point, because it is easy to implement and very effective at extracting features from images.

The example below stacks several convolutional layers into a VGG block. The layers share the same number of filters, a 3×3 kernel size, a 1×1 stride, 'same' padding so that the output feature map has the same size as the input, and a ReLU activation; the block ends with a max pooling layer whose size and stride are both 2×2. The input shape is defined as 256×256×3, and the network diagram is plotted. The complete example is as follows:

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D
from keras.utils import plot_model

# VGG block
def vgg_block(layer_in, n_filters, n_conv):
    
    # add convolutional layers
    for _ in range(n_conv):
        layer_in = Conv2D(n_filters, (3,3), padding='same', activation='relu', name=f'conv_{_}')(layer_in)
    
    # add max pooling layer
    layer_in = MaxPooling2D((2,2), strides=(2,2), name='maxpool')(layer_in)
    
    return layer_in

# define model input
visible = Input(shape=(256, 256, 3), name='input')

# add vgg module
layer = vgg_block(visible, 64, 2)

# create model
model = Model(inputs=visible, outputs=layer, name='VGG Block')
model.summary()

# plot model architecture
plot_model(model, show_shapes=True, show_layer_names=False, to_file='vgg_block.png', dpi=200)

Output:

Model: "VGG Block"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 256, 256, 3)       0         
_________________________________________________________________
conv_0 (Conv2D)              (None, 256, 256, 64)      1792      
_________________________________________________________________
conv_1 (Conv2D)              (None, 256, 256, 64)      36928     
_________________________________________________________________
maxpool (MaxPooling2D)       (None, 128, 128, 64)      0         
=================================================================
Total params: 38,720
Trainable params: 38,720
Non-trainable params: 0
_________________________________________________________________

[Figure: VGG block architecture diagram (vgg_block.png)]
【Parameter count】:

  • conv_0: (3×3×3 + 1) × 64 = 1792
  • conv_1: (3×3×64 + 1) × 64 = 36928
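As a quick sanity check, these counts follow the standard Conv2D parameter formula, (kernel height × kernel width × input channels + 1 bias) × number of filters; a minimal sketch (the helper name conv2d_params is just for illustration):

# Conv2D parameter count: (kernel_h * kernel_w * channels_in + 1) * n_filters
def conv2d_params(kernel_h, kernel_w, channels_in, n_filters):
    return (kernel_h * kernel_w * channels_in + 1) * n_filters

print(conv2d_params(3, 3, 3, 64))    # conv_0: 1792
print(conv2d_params(3, 3, 64, 64))   # conv_1: 36928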

Going further, this can be extended to a model with three VGG blocks: the first block has three convolutional layers with 64 filters, the second has two convolutional layers with 128 filters, and the third has two convolutional layers with 256 filters. This is a common usage pattern for VGG blocks, where the number of filters increases with the depth of the model. Three Dense layers are appended after the last block for illustration.

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dense
from keras.utils import plot_model

# VGG block
def vgg_block(layer_in, n_filters, n_conv_start, n_conv_end, pool_index):
    '''
    layer_in: input tensor
    n_filters: number of filters
    n_conv_start: starting index used to name the conv layers
    n_conv_end: end index (exclusive), i.e. the half-open interval [start, end)
    pool_index: index used to name the pooling layer
    '''
    # add convolutional layers
    for _ in range(n_conv_start, n_conv_end):
        layer_in = Conv2D(n_filters, (3,3), padding='same', activation='relu', name=f'conv_{_}')(layer_in)
    
    # add max pooling layer
    layer_in = MaxPooling2D((2,2), strides=(2,2), name=f'maxpool_{pool_index}')(layer_in)
    
    return layer_in

# define model input
visible = Input(shape=(256, 256, 3), name='input')

# add vgg module
layer = vgg_block(visible, 64, 1, 4, 1)
layer = vgg_block(layer, 128, 4, 6, 2)
layer = vgg_block(layer, 256, 6, 8, 3)


# add dense(fc) layer
dense = Dense(1024, name='dense_1')(layer)
dense = Dense(1024, name='dense_2')(dense)
dense = Dense(512, name='dense_3')(dense)

# create model
model = Model(inputs=visible, outputs=dense, name='VGG Naive Model')
model.summary()

# plot model architecture
plot_model(model, show_shapes=True, show_layer_names=True, to_file='vgg_block.png', dpi=200)

Output:

Model: "VGG Navie Model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 256, 256, 3)       0         
_________________________________________________________________
conv_1 (Conv2D)              (None, 256, 256, 64)      1792      
_________________________________________________________________
conv_2 (Conv2D)              (None, 256, 256, 64)      36928     
_________________________________________________________________
conv_3 (Conv2D)              (None, 256, 256, 64)      36928     
_________________________________________________________________
maxpool_1 (MaxPooling2D)     (None, 128, 128, 64)      0         
_________________________________________________________________
conv_4 (Conv2D)              (None, 128, 128, 128)     73856     
_________________________________________________________________
conv_5 (Conv2D)              (None, 128, 128, 128)     147584    
_________________________________________________________________
maxpool_2 (MaxPooling2D)     (None, 64, 64, 128)       0         
_________________________________________________________________
conv_6 (Conv2D)              (None, 64, 64, 256)       295168    
_________________________________________________________________
conv_7 (Conv2D)              (None, 64, 64, 256)       590080    
_________________________________________________________________
maxpool_3 (MaxPooling2D)     (None, 32, 32, 256)       0         
_________________________________________________________________
dense_1 (Dense)              (None, 32, 32, 1024)      263168    
_________________________________________________________________
dense_2 (Dense)              (None, 32, 32, 1024)      1049600   
_________________________________________________________________
dense_3 (Dense)              (None, 32, 32, 512)       524800    
=================================================================
Total params: 3,019,904
Trainable params: 3,019,904
Non-trainable params: 0
_________________________________________________________________

[Figure: three-block VGG model architecture diagram (vgg_block.png)]
【Parameter count】:

  • dense_1: (256 + 1) × 1024 = 263168
  • dense_2: (1024 + 1) × 1024 = 1049600
  • dense_3: (1024 + 1) × 512 = 524800
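Note that these Dense layers receive a 4D tensor, so Keras applies them position-wise along the last (channel) axis; that is why dense_1 sees an input size of 256 rather than a flattened 32×32×256 vector. A minimal sketch reproducing the counts with the usual Dense formula, (inputs + 1 bias) × units (the helper name dense_params is just for illustration):

# Dense parameter count: (n_inputs + 1) * n_units, bias included
def dense_params(n_inputs, n_units):
    return (n_inputs + 1) * n_units

print(dense_params(256, 1024))    # dense_1: 263168
print(dense_params(1024, 1024))   # dense_2: 1049600
print(dense_params(1024, 512))    # dense_3: 524800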

2. Inception Module

The key innovation of this model is the Inception module: a block of parallel convolutional layers with filters of different sizes (e.g. 1×1, 3×3, 5×5), together with a parallel 3×3 max pooling path, whose outputs are then concatenated.

This is a simple yet powerful structural unit that lets the model learn parallel filters not only of the same size but also of different sizes, enabling learning at multiple scales.

# Inception module
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, concatenate
from keras.utils import plot_model

def inception_module(layer_in, f1, f2_in, f2_out, f3_in, f3_out, f4_out):
    '''
    layer_in: input tensor
    f1: number of filters for the 1x1 conv branch
    f2_in: number of 1x1 filters before the 3x3 conv (bottleneck)
    f2_out: number of 3x3 filters
    f3_in: number of 1x1 filters before the 5x5 conv (bottleneck)
    f3_out: number of 5x5 filters
    f4_out: number of 1x1 filters after the max pooling branch
    '''
    # 1x1 conv
    conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(layer_in)
    
    # 3x3 conv
    conv3 = Conv2D(f2_in, (1,1), padding='same', activation='relu')(layer_in)
    conv3 = Conv2D(f2_out, (3,3), padding='same', activation='relu')(conv3)
    
    # 5x5 conv
    conv5 = Conv2D(f3_in, (1,1), padding='same', activation='relu')(layer_in)
    conv5 = Conv2D(f3_out, (5,5), padding='same', activation='relu')(conv5)
    
    # 3x3 max pooling
    pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(layer_in)
    pool = Conv2D(f4_out, (1,1), padding='same', activation='relu')(pool)
    
    # concatenate filters, assumes filters/channels last
    layer_out = concatenate([conv1, conv3, conv5, pool], axis=-1)
    
    return layer_out

# define model input
visible = Input(shape=(256, 256, 3), name='input')

# add two inception modules
layer = inception_module(visible, 64, 96, 128, 16, 32, 32)
layer = inception_module(layer, 128, 128, 192, 32, 96, 64)

# create model
model = Model(inputs=visible, outputs=layer, name='Inception Model')
model.summary()

# plot model architecture
plot_model(model, show_shapes=True, show_layer_names=False, to_file='inception_module.png', dpi=200)

Output:

Model: "Inception Model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input (InputLayer)              (None, 256, 256, 3)  0                                            
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 256, 256, 96) 384         input[0][0]                      
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 256, 256, 16) 64          input[0][0]                      
__________________________________________________________________________________________________
max_pooling2d_6 (MaxPooling2D)  (None, 256, 256, 3)  0           input[0][0]                      
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 256, 256, 64) 256         input[0][0]                      
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 256, 256, 128 110720      conv2d_20[0][0]                  
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 256, 256, 32) 12832       conv2d_22[0][0]                  
__________________________________________________________________________________________________
conv2d_24 (Conv2D)              (None, 256, 256, 32) 128         max_pooling2d_6[0][0]            
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 256, 256, 256 0           conv2d_19[0][0]                  
                                                                 conv2d_21[0][0]                  
                                                                 conv2d_23[0][0]                  
                                                                 conv2d_24[0][0]                  
__________________________________________________________________________________________________
conv2d_26 (Conv2D)              (None, 256, 256, 128 32896       concatenate_3[0][0]              
__________________________________________________________________________________________________
conv2d_28 (Conv2D)              (None, 256, 256, 32) 8224        concatenate_3[0][0]              
__________________________________________________________________________________________________
max_pooling2d_7 (MaxPooling2D)  (None, 256, 256, 256 0           concatenate_3[0][0]              
__________________________________________________________________________________________________
conv2d_25 (Conv2D)              (None, 256, 256, 128 32896       concatenate_3[0][0]              
__________________________________________________________________________________________________
conv2d_27 (Conv2D)              (None, 256, 256, 192 221376      conv2d_26[0][0]                  
__________________________________________________________________________________________________
conv2d_29 (Conv2D)              (None, 256, 256, 96) 76896       conv2d_28[0][0]                  
__________________________________________________________________________________________________
conv2d_30 (Conv2D)              (None, 256, 256, 64) 16448       max_pooling2d_7[0][0]            
__________________________________________________________________________________________________
concatenate_4 (Concatenate)     (None, 256, 256, 480 0           conv2d_25[0][0]                  
                                                                 conv2d_27[0][0]                  
                                                                 conv2d_29[0][0]                  
                                                                 conv2d_30[0][0]                  
==================================================================================================
Total params: 513,120
Trainable params: 513,120
Non-trainable params: 0
__________________________________________________________________________________________________

[Figure: Inception module architecture diagram (inception_module.png)]
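Because the four branches are concatenated along the channel axis, the depth of each module's output is simply the sum of the branch filter counts, matching the Concatenate layers in the summary above; a minimal check:

# output channels = f1 + f2_out + f3_out + f4_out
print(64 + 128 + 32 + 32)    # first module: 256
print(128 + 192 + 96 + 64)   # second module: 480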


3. Residual Module

A key innovation in ResNet is the residual module. A residual module, in particular the identity residual module, is a block of two convolutional layers with the same number of filters and a small filter size, where the output of the second layer is added to the input of the first convolutional layer. Drawn graphically, the input of the module is added to the output of the module, which is known as a shortcut connection.

If the number of filters in the input does not match the number of filters in the module's last convolutional layer, the addition will raise an error. One solution is to use a 1×1 convolution, often called a projection layer, to either increase the number of filters of the input or reduce the number of filters of the module's last convolutional layer. The former makes more sense and is the approach proposed in the paper, referred to as a projection shortcut.

# identity or projection residual module
from keras.models import Model
from keras.layers import Input, Activation, Conv2D, MaxPooling2D
from keras.layers import add
from keras.utils import plot_model

def residual_module(layer_in, n_filters):
    merge_input = layer_in
    
    # check if the number of filters needs to be increased (assumes channels-last format)
    if layer_in.shape[-1] != n_filters:
        merge_input = Conv2D(n_filters, (1,1), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
    
    # conv layer
    conv1 = Conv2D(n_filters, (3,3), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
    conv2 = Conv2D(n_filters, (3,3), padding='same', activation='linear', kernel_initializer='he_normal')(conv1)
    
    # add filters, assumes filters(channels last)
    layer_out = add([conv2, merge_input])
    
    # activation function
    layer_out = Activation('relu')(layer_out)
    
    return layer_out

# define model input
visible = Input(shape=(256, 256, 3), name='input')

# add residual module
layer = residual_module(visible, 64)

# create model
model = Model(inputs=visible, outputs=layer, name='Residual Model')
model.summary()

# plot model architecture
plot_model(model, show_shapes=True, show_layer_names=False, to_file='residual_module.png', dpi=200)

Output:

Model: "Residual Model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input (InputLayer)              (None, 256, 256, 3)  0                                            
__________________________________________________________________________________________________
conv2d_41 (Conv2D)              (None, 256, 256, 64) 1792        input[0][0]                      
__________________________________________________________________________________________________
conv2d_42 (Conv2D)              (None, 256, 256, 64) 36928       conv2d_41[0][0]                  
__________________________________________________________________________________________________
conv2d_40 (Conv2D)              (None, 256, 256, 64) 256         input[0][0]                      
__________________________________________________________________________________________________
add_4 (Add)                     (None, 256, 256, 64) 0           conv2d_42[0][0]                  
                                                                 conv2d_40[0][0]                  
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 256, 256, 64) 0           add_4[0][0]                      
==================================================================================================
Total params: 38,976
Trainable params: 38,976
Non-trainable params: 0
__________________________________________________________________________________________________

[Figure: residual module architecture diagram (residual_module.png)]
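For the parameter count, the two 3×3 convolutions contribute the same 1792 and 36928 parameters as in the VGG example, and the 1×1 projection on the shortcut adds (1×1×3 + 1) × 64 = 256, giving the total of 38,976 shown above; a minimal check:

# main path: two 3x3 convs; shortcut: one 1x1 projection conv
print((3 * 3 * 3 + 1) * 64)     # first 3x3 conv: 1792
print((3 * 3 * 64 + 1) * 64)    # second 3x3 conv: 36928
print((1 * 1 * 3 + 1) * 64)     # 1x1 projection conv: 256
print(1792 + 36928 + 256)       # total: 38976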


