Network Parameter Configuration Explained: A Mask/No-Mask Classifier Built with the VGG-16 Convolutional Neural Network (Python + PaddlePaddle)

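After the last two days of study, I have a basic understanding of deep learning networks. The "depth" in deep learning is simply the depth of the network, that is, how many layers it has. Today we look at another classic convolutional neural network: VGG-16.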

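The 16 in VGG-16 is the number of layers in the network. By convention, only layers with trainable parameters count as layers; pooling layers have no parameters to learn, so they are usually left out of the count.
The network has 2+2+3+3+3=13 convolutional layers and 3 fully connected layers, which adds up to 16 layers.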
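
The count above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for this example and are not part of the model code.

```python
# Number of convolutional layers in each of the five VGG-16 blocks,
# taken from the count above (2 + 2 + 3 + 3 + 3).
conv_layers_per_block = [2, 2, 3, 3, 3]
fc_layers = 3
# One pooling layer closes each block; pooling has no trainable parameters.
pool_layers = len(conv_layers_per_block)

# Only layers with trainable parameters are counted.
weighted_layers = sum(conv_layers_per_block) + fc_layers

print("convolutional layers:", sum(conv_layers_per_block))
print("layers with parameters:", weighted_layers)
print("pooling layers (not counted):", pool_layers)
```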

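With this many layers, writing out all 16 one by one as in the previous article would be tedious, so here a ConvPool class wraps a group of convolutional layers together with the pooling layer that follows them: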

class ConvPool(fluid.dygraph.Layer):
    '''Convolution + pooling block'''
    def __init__(self,
                 num_channels,
                 num_filters,
                 filter_size,
                 pool_size,
                 pool_stride,
                 groups,
                 conv_stride=1,
                 conv_padding=1,
                 pool_padding=0,
                 pool_type='max',
                 act=None):
        super(ConvPool, self).__init__()

        self._conv2d_list = []

        for i in range(groups):
            conv2d = self.add_sublayer(   # registers the sublayer and returns it
                'bb_%d' % i,
                fluid.dygraph.Conv2D(
                    # the first conv takes the block input; later ones take num_filters channels
                    num_channels=num_channels if i == 0 else num_filters,
                    num_filters=num_filters,   # number of kernels (output channels)
                    filter_size=filter_size,   # kernel size
                    stride=conv_stride,        # stride
                    padding=conv_padding,      # padding size (ConvPool defaults to 1)
                    act=act)
            )
            # append inside the loop so that every conv is applied in forward()
            self._conv2d_list.append(conv2d)

        self._pool2d = fluid.dygraph.Pool2D(
            pool_size=pool_size,           # pooling window size
            pool_type=pool_type,           # pooling type, max pooling by default
            pool_stride=pool_stride,       # pooling stride
            pool_padding=pool_padding      # padding size
            )

    def forward(self, inputs):
        x = inputs
        for conv in self._conv2d_list:
            x = conv(x)
        x = self._pool2d(x)
        return x

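That is a lot of code, but we only need to focus on the parameters of the __init__ method:

  • num_channels is the number of input channels; for a three-channel (RGB) image, num_channels=3
  • num_filters is the number of filters (convolution kernels); each filter produces one output feature map, so this is also the number of output channels
  • filter_size is the size of the filter (convolution kernel); filter_size=3 means a 3*3 kernel
  • pool_size is the size of the pooling window, analogous to filter_size
  • pool_stride is the stride of the pooling layer
  • groups is, in this class, the number of convolutional layers stacked before the pooling layer (the loop for i in range(groups) creates one Conv2D per iteration). It should not be confused with the groups argument of Conv2D itself, which splits the channels: with groups=2 there, the first half of the filters connects only to the first half of the input feature maps, and the second half only to the second half.
  • conv_stride is the stride of the convolutional layers
  • conv_padding is the padding added to the vertical and horizontal borders before each convolution
  • pool_padding is the padding added to the vertical and horizontal borders before pooling
  • pool_type is the downsampling method of the pooling layer, either average or max
  • act is the activation function applied to the output, such as tanh, softmax, sigmoid, or relu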
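
In the ConvPool code, the loop `for i in range(groups)` creates one convolution per iteration, so a block with groups=3 stacks three convolutions before pooling. In the standard VGG layout, only the first convolution of a block sees the block's input channels, and the following ones receive num_filters channels. The sketch below lists the channel pairs for one block; `block_channels` is a hypothetical helper written for this illustration, not a PaddlePaddle function.

```python
def block_channels(num_channels, num_filters, groups):
    """Return (in_channels, out_channels) for each stacked convolution in a block."""
    pairs = []
    in_channels = num_channels
    for _ in range(groups):
        pairs.append((in_channels, num_filters))
        # Every later convolution consumes the output of the previous one.
        in_channels = num_filters
    return pairs


# First VGG-16 block: 3 input channels, 64 filters, 2 convolutions.
print(block_channels(3, 64, 2))
# Third VGG-16 block: 128 input channels, 256 filters, 3 convolutions.
print(block_channels(128, 256, 3))
```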

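The least familiar parameter is probably padding, so let's walk through it.
Suppose the stride is 2. As the 3 * 3 kernel slides across the image, it can reach a position near the border where a complete 3 * 3 window no longer fits. No output is computed for that position, so the information at the edge of the image is lost (the part marked in red in the original figure).

The fix is to surround the image with a border of blank (zero-valued) pixels, which is the padding.
After this treatment, the edge of the image is no longer lost.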
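
The effect of padding can be made concrete with the usual output-size formula, floor((n + 2p - k) / s) + 1, for input size n, kernel size k, stride s, and padding p. The function below is a minimal sketch of that formula; `conv_output_size` is an illustrative name, not part of PaddlePaddle.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size along one spatial dimension.

    Floor division drops any window that does not fit completely.
    """
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Without padding, a 3*3 kernel with stride 1 shrinks a 224-pixel side.
print(conv_output_size(224, 3, stride=1, padding=0))
# With padding=1 the side length is preserved.
print(conv_output_size(224, 3, stride=1, padding=1))
# With stride 2 on an 8-pixel side, the last column is never covered
# unless padding is added.
print(conv_output_size(8, 3, stride=2, padding=0))
print(conv_output_size(8, 3, stride=2, padding=1))
```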

Now let's build the VGG-16 network itself:

class VGGNet(fluid.dygraph.Layer):
    '''
    VGG network
    '''
    def __init__(self):
        super(VGGNet, self).__init__()
        self.convpool01 = ConvPool(num_channels=3, num_filters=64, filter_size=3, pool_size=2, pool_stride=2, groups=2, act = "relu")
        self.convpool02 = ConvPool(64, 128, 3, 2, 2, 2, act = "relu")
        self.convpool03 = ConvPool(128, 256, 3, 2, 2, 3, act = "relu")
        self.convpool04 = ConvPool(256, 512, 3, 2, 2, 3, act = "relu")
        self.convpool05 = ConvPool(512, 512, 3, 2, 2, 3, act = "relu")

        self.pool_5_shape = 512 * 7 * 7
        self.fc01 = fluid.dygraph.Linear(self.pool_5_shape,4096,act="relu")
        self.fc02 = fluid.dygraph.Linear(4096,4096,act="relu")
        self.fc03 = fluid.dygraph.Linear(4096,2,act="softmax")
        

    def forward(self, inputs, label=None):
        """Forward pass"""
        print(inputs.shape)
        out = self.convpool01(inputs)
        print(out.shape)
        out = self.convpool02(out)
        print(out.shape)
        out = self.convpool03(out)
        print(out.shape)
        out = self.convpool04(out)
        print(out.shape)
        out = self.convpool05(out)
        print(out.shape)

        out = fluid.layers.reshape(out, shape=[-1, 512*7*7])
        print(out.shape)
        out = self.fc01(out)
        print(out.shape)
        out = self.fc02(out)
        print(out.shape)
        out = self.fc03(out)
        print(out.shape)

        if label is not None:
            acc = fluid.layers.accuracy(input=out, label=label)
            return out, acc
        else:
            return out
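
To get a feel for where the parameters of this network sit, the sketch below counts them by hand. It assumes the standard VGG-16 layout (each later convolution in a block takes the block's output channels as input) and one bias per output channel or output unit; `conv_params` and `linear_params` are hypothetical helpers for this illustration.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# (input channels, filters, number of stacked convolutions) per block,
# matching the five ConvPool blocks in VGGNet.
blocks = [(3, 64, 2), (64, 128, 2), (128, 256, 3), (256, 512, 3), (512, 512, 3)]

conv_total = 0
for in_channels, out_channels, groups in blocks:
    for _ in range(groups):
        conv_total += conv_params(in_channels, out_channels)
        in_channels = out_channels

fc_total = (linear_params(512 * 7 * 7, 4096)
            + linear_params(4096, 4096)
            + linear_params(4096, 2))

print("convolutional parameters:", conv_total)
print("fully connected parameters:", fc_total)
print("total:", conv_total + fc_total)
```

Under these assumptions, most of the parameters belong to the first fully connected layer, since it connects all 512*7*7 flattened features to 4096 units.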

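Before configuring a network, we should always check the dimensions of the input.
Here the input has shape [8,3,224,224]: a batch of 8 images, each with 3 channels and 224*224 pixels. Let's go through the network step by step:

  1. In the first block, ConvPool(num_channels=3, num_filters=64, filter_size=3, pool_size=2, pool_stride=2, groups=2, act="relu"), num_filters=64 means the number of feature maps becomes 64. With filter_size=3 and padding=1, the image is first padded by one pixel on every side, 224+2=226, so it becomes 226*226. The convolution then gives 226-3+1=224, so the image size is unchanged. The pooling layer with stride 2 halves both height and width to 112*112, so the output shape is [8,64,112,112].
  2. The following blocks work the same way. After all 13 convolutional layers the shape is [8,512,7,7], and reshape(out, shape=[-1, 512*7*7]) flattens it to [8,25088].
  3. Feeding this into the first fully connected layer, Linear(25088,4096,act="relu"), gives a matrix with 4096 columns.
  4. After the three fully connected layers there are 2 outputs, because deciding whether a mask is worn is a binary classification problem (mask or no mask).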
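
The shape changes described above can be traced without running the model. The following is a small sketch, assuming 3*3 convolutions with stride 1 and padding 1 and 2*2 max pooling with stride 2, as configured in VGGNet; the helper names are made up for this example.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Side length after one convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2, padding=0):
    """Side length after one pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(batch=8, channels=3, size=224):
    """Return the tensor shape after the input, each block, and the flatten step."""
    # (num_filters, number of stacked convolutions) for the five blocks.
    blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]
    shapes = [[batch, channels, size, size]]
    for num_filters, groups in blocks:
        for _ in range(groups):
            size = conv_out(size)   # padding=1 keeps the size unchanged
        size = pool_out(size)       # pooling halves height and width
        channels = num_filters
        shapes.append([batch, channels, size, size])
    shapes.append([batch, channels * size * size])  # flattened for the FC layers
    return shapes


for shape in trace_shapes():
    print(shape)
```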

Here is the complete code of the program:

import os
import zipfile
import random
import json
import paddle
import sys
import numpy as np
from PIL import Image
from PIL import ImageEnhance
import paddle.fluid as fluid
from multiprocessing import cpu_count
import matplotlib.pyplot as plt

'''
Parameter configuration
'''
train_parameters = {
    "input_size": [3, 224, 224],                              # shape of the input image
    "class_dim": -1,                                          # number of classes
    "src_path":"/home/aistudio/work/maskDetect.zip",          # path to the raw dataset
    "target_path":"/home/aistudio/data/",                     # directory to extract into
    "train_list_path": "/home/aistudio/data/train.txt",       # path to train.txt
    "eval_list_path": "/home/aistudio/data/eval.txt",         # path to eval.txt
    "readme_path": "/home/aistudio/data/readme.json",         # path to readme.json
    "label_dict":{},                                          # label dictionary
    "num_epochs": 1,                                          # number of training epochs
    "train_batch_size": 8,                                    # batch size during training
    "learning_strategy": {                                    # optimizer settings
        "lr": 0.001                                           # learning rate (hyperparameter)
    } 
}

def unzip_data(src_path,target_path):
    '''
    Extract the raw dataset: unzip the archive at src_path into the data directory
    '''
    if(not os.path.isdir(target_path + "maskDetect")):     
        z = zipfile.ZipFile(src_path, 'r')
        z.extractall(path=target_path)
        z.close()

def get_data_list(target_path,train_list_path,eval_list_path):
    '''
    Generate the data lists
    '''
    # information about every class
    class_detail = []
    # names of the folders that hold each class
    data_list_path=target_path+"maskDetect/"
    class_dirs = os.listdir(data_list_path)  
    # total number of images
    all_class_images = 0
    # current class label
    class_label=0
    # number of classes
    class_dim = 0
    # lines to be written to eval.txt and train.txt
    trainer_list=[]
    eval_list=[]
    # iterate over the classes, e.g. ['maskimages', 'nomaskimages']
    for class_dir in class_dirs:
        if class_dir != ".DS_Store":
            class_dim += 1
            # information about this class
            class_detail_list = {}
            eval_sum = 0
            trainer_sum = 0
            # number of images in this class
            class_sum = 0
            # path of the class folder
            path = data_list_path  + class_dir
            # list all images in the folder
            img_paths = os.listdir(path)
            for img_path in img_paths:                                  # iterate over every image in the folder
                name_path = path + '/' + img_path                       # path of this image
                if class_sum % 10 == 0:                                 # every 10th image goes to the validation set
                    eval_sum += 1                                       # eval_sum counts the validation images
                    eval_list.append(name_path + "\t%d" % class_label + "\n")
                else:
                    trainer_sum += 1                                    # trainer_sum counts the training images
                    trainer_list.append(name_path + "\t%d" % class_label + "\n")
                class_sum += 1                                          # images in this class
                all_class_images += 1                                   # images across all classes
             
            # class_detail entries for the descriptive json file
            class_detail_list['class_name'] = class_dir             # class name, e.g. maskimages
            class_detail_list['class_label'] = class_label          # class label
            class_detail_list['class_eval_images'] = eval_sum       # validation images in this class
            class_detail_list['class_trainer_images'] = trainer_sum # training images in this class
            class_detail.append(class_detail_list)  
            # fill in the label dictionary
            train_parameters['label_dict'][str(class_label)] = class_dir
            class_label += 1 
            
    # set the number of classes
    train_parameters['class_dim'] = class_dim

    # shuffle
    random.shuffle(eval_list)
    with open(eval_list_path, 'a') as f:
        for eval_image in eval_list:
            f.write(eval_image) 
            
    random.shuffle(trainer_list)
    with open(train_list_path, 'a') as f2:
        for train_image in trainer_list:
            f2.write(train_image) 

    # contents of the descriptive json file
    readjson = {}
    readjson['all_class_name'] = data_list_path                  # parent directory of the class folders
    readjson['all_class_images'] = all_class_images
    readjson['class_detail'] = class_detail
    jsons = json.dumps(readjson, sort_keys=True, indent=4, separators=(',', ': '))
    with open(train_parameters['readme_path'],'w') as f:
        f.write(jsons)
    print ('Data lists generated!')

def custom_reader(file_list):
    '''
    Custom reader
    '''
    def reader():
        with open(file_list, 'r') as f:
            lines = [line.strip() for line in f]
            for line in lines:
                img_path, lab = line.strip().split('\t')
                img = Image.open(img_path) 
                if img.mode != 'RGB': 
                    img = img.convert('RGB') 
                img = img.resize((224, 224), Image.BILINEAR)
                img = np.array(img).astype('float32') 
                img = img.transpose((2, 0, 1))  # HWC to CHW 
                img = img/255                # scale pixel values to [0, 1] 
                yield img, int(lab) 
    return reader

'''
Parameter initialization
'''
src_path=train_parameters['src_path']
target_path=train_parameters['target_path']
train_list_path=train_parameters['train_list_path']
eval_list_path=train_parameters['eval_list_path']
batch_size=train_parameters['train_batch_size']
print("batch_size:",batch_size)

'''
Extract the raw data to the target path
'''
unzip_data(src_path,target_path)

'''
Split into training and validation sets, shuffle, and generate the data lists
'''
# empty train.txt and eval.txt before every run (opening with 'w' truncates the file)
with open(train_list_path, 'w') as f: 
    pass
with open(eval_list_path, 'w') as f: 
    pass
# generate the data lists
get_data_list(target_path,train_list_path,eval_list_path)

'''
Build the data providers
'''
train_reader = paddle.batch(custom_reader(train_list_path),
                            batch_size=batch_size,
                            drop_last=True)
eval_reader = paddle.batch(custom_reader(eval_list_path),
                            batch_size=batch_size,
                            drop_last=True)

class ConvPool(fluid.dygraph.Layer):
    '''Convolution + pooling block'''
    def __init__(self,
                 num_channels,
                 num_filters,
                 filter_size,
                 pool_size,
                 pool_stride,
                 groups,
                 conv_stride=1,
                 conv_padding=1,
                 pool_padding=0,
                 pool_type='max',
                 act=None):
        super(ConvPool, self).__init__()

        self._conv2d_list = []

        for i in range(groups):
            conv2d = self.add_sublayer(   # registers the sublayer and returns it
                'bb_%d' % i,
                fluid.dygraph.Conv2D(
                    # the first conv takes the block input; later ones take num_filters channels
                    num_channels=num_channels if i == 0 else num_filters,
                    num_filters=num_filters,   # number of kernels (output channels)
                    filter_size=filter_size,   # kernel size
                    stride=conv_stride,        # stride
                    padding=conv_padding,      # padding size (ConvPool defaults to 1)
                    act=act)
            )
            # append inside the loop so that every conv is applied in forward()
            self._conv2d_list.append(conv2d)

        self._pool2d = fluid.dygraph.Pool2D(
            pool_size=pool_size,           # pooling window size
            pool_type=pool_type,           # pooling type, max pooling by default
            pool_stride=pool_stride,       # pooling stride
            pool_padding=pool_padding      # padding size
            )

    def forward(self, inputs):
        x = inputs
        for conv in self._conv2d_list:
            x = conv(x)
        x = self._pool2d(x)
        return x

class VGGNet(fluid.dygraph.Layer):
    '''
    VGG network
    '''
    def __init__(self):
        super(VGGNet, self).__init__()
        self.convpool01 = ConvPool(num_channels=3, num_filters=64, filter_size=3, pool_size=2, pool_stride=2, groups=2, act = "relu")
        self.convpool02 = ConvPool(64, 128, 3, 2, 2, 2, act = "relu")
        self.convpool03 = ConvPool(128, 256, 3, 2, 2, 3, act = "relu")
        self.convpool04 = ConvPool(256, 512, 3, 2, 2, 3, act = "relu")
        self.convpool05 = ConvPool(512, 512, 3, 2, 2, 3, act = "relu")

        self.pool_5_shape = 512 * 7 * 7
        self.fc01 = fluid.dygraph.Linear(self.pool_5_shape,4096,act="relu")
        self.fc02 = fluid.dygraph.Linear(4096,4096,act="relu")
        self.fc03 = fluid.dygraph.Linear(4096,2,act="softmax")
        

    def forward(self, inputs, label=None):
        """Forward pass"""
        print(inputs.shape)
        out = self.convpool01(inputs)
        print(out.shape)
        out = self.convpool02(out)
        print(out.shape)
        out = self.convpool03(out)
        print(out.shape)
        out = self.convpool04(out)
        print(out.shape)
        out = self.convpool05(out)
        print(out.shape)

        out = fluid.layers.reshape(out, shape=[-1, 512*7*7])
        print(out.shape)
        out = self.fc01(out)
        print(out.shape)
        out = self.fc02(out)
        print(out.shape)
        out = self.fc03(out)
        print(out.shape)

        if label is not None:
            acc = fluid.layers.accuracy(input=out, label=label)
            return out, acc
        else:
            return out

all_train_iter=0
all_train_iters=[]
all_train_costs=[]
all_train_accs=[]

def draw_train_process(title,iters,costs,accs,label_cost,label_acc):
    plt.title(title, fontsize=24)
    plt.xlabel("iter", fontsize=20)
    plt.ylabel("cost/acc", fontsize=20)
    plt.plot(iters, costs,color='red',label=label_cost) 
    plt.plot(iters, accs,color='green',label=label_acc) 
    plt.legend()
    plt.grid()
    plt.show()


def draw_process(title,color,iters,data,label):
    plt.title(title, fontsize=24)
    plt.xlabel("iter", fontsize=20)
    plt.ylabel(label, fontsize=20)
    plt.plot(iters, data,color=color,label=label) 
    plt.legend()
    plt.grid()
    plt.show()

'''
Model training
'''
#with fluid.dygraph.guard(place = fluid.CUDAPlace(0)):
with fluid.dygraph.guard():
    print(train_parameters['class_dim'])
    print(train_parameters['label_dict'])
    vgg = VGGNet()
    optimizer=fluid.optimizer.AdamOptimizer(learning_rate=train_parameters['learning_strategy']['lr'],parameter_list=vgg.parameters()) 
    for epoch_num in range(train_parameters['num_epochs']):
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array([x[0] for x in data]).astype('float32')           
            y_data = np.array([x[1] for x in data]).astype('int64')      
            y_data = y_data[:, np.newaxis]

            # convert the NumPy arrays into inputs that DyGraph accepts
            img = fluid.dygraph.to_variable(dy_x_data)
            label = fluid.dygraph.to_variable(y_data)

            out,acc = vgg(img,label)
            loss = fluid.layers.cross_entropy(out, label)
            avg_loss = fluid.layers.mean(loss)

            # backward() runs the backward pass
            avg_loss.backward()
            optimizer.minimize(avg_loss)
             
            # clear the parameter gradients so the next step starts clean
            vgg.clear_gradients()
            

            all_train_iter=all_train_iter+train_parameters['train_batch_size']
            all_train_iters.append(all_train_iter)
            # record the batch-average loss rather than the first sample's loss
            all_train_costs.append(avg_loss.numpy()[0])
            all_train_accs.append(acc.numpy()[0])
                
            if batch_id % 1 == 0:
                print("Loss at epoch {} step {}: {}, acc: {}".format(epoch_num, batch_id, avg_loss.numpy(), acc.numpy()))

    draw_train_process("training",all_train_iters,all_train_costs,all_train_accs,"training cost","training acc")  
    draw_process("training loss","red",all_train_iters,all_train_costs,"training loss")
    draw_process("training acc","green",all_train_iters,all_train_accs,"training acc")  
    
    # save the model parameters
    fluid.save_dygraph(vgg.state_dict(), "vgg")   
    print("Final loss: {}".format(avg_loss.numpy()))

'''
Model evaluation
'''
with fluid.dygraph.guard():
    model, _ = fluid.load_dygraph("vgg")
    vgg = VGGNet()
    vgg.load_dict(model)
    vgg.eval()
    accs = []
    for batch_id, data in enumerate(eval_reader()):
        dy_x_data = np.array([x[0] for x in data]).astype('float32')
        y_data = np.array([x[1] for x in data]).astype('int64')  # int64 labels, same as in training
        y_data = y_data[:, np.newaxis]
        
        img = fluid.dygraph.to_variable(dy_x_data)
        label = fluid.dygraph.to_variable(y_data)

        out, acc = vgg(img, label)
        lab = np.argsort(out.numpy())
        accs.append(acc.numpy()[0])
print(np.mean(accs))

def load_image(img_path):
    '''
    Preprocess an image for prediction
    '''
    img = Image.open(img_path) 
    if img.mode != 'RGB': 
        img = img.convert('RGB') 
    img = img.resize((224, 224), Image.BILINEAR)
    img = np.array(img).astype('float32') 
    img = img.transpose((2, 0, 1))  # HWC to CHW 
    img = img/255                # scale pixel values to [0, 1] 
    return img

label_dic = train_parameters['label_dict']

'''
Model prediction
'''
with fluid.dygraph.guard():
    model, _ = fluid.dygraph.load_dygraph("vgg")
    vgg = VGGNet()
    vgg.load_dict(model)
    vgg.eval()
    
    # show the image to be predicted
    infer_path='/home/aistudio/data/data23615/infer_mask01.jpg'
    img = Image.open(infer_path)
    plt.imshow(img)          # draw the image from its array
    plt.show()               # display the image

    # preprocess the image to be predicted
    infer_imgs = []
    infer_imgs.append(load_image(infer_path))
    infer_imgs = np.array(infer_imgs)
   
    for  i in range(len(infer_imgs)):
        data = infer_imgs[i]
        dy_x_data = np.array(data).astype('float32')
        dy_x_data=dy_x_data[np.newaxis,:, : ,:]
        img = fluid.dygraph.to_variable(dy_x_data)
        out = vgg(img)
        lab = np.argmax(out.numpy())  # argmax(): index of the largest value
        print("Sample {} is predicted as: {}".format(i+1,label_dic[str(lab)]))
        
print("Done")

Running the prediction step displays the input image and prints the class predicted for it.
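
The last step of the prediction code, turning the two softmax outputs into a class name, can be sketched in plain Python. The probabilities and the label ordering below are made-up values for illustration only; in the real program the ordering of label_dict depends on the order in which os.listdir returns the class folders, and `predict_label` is a hypothetical helper.

```python
def predict_label(probabilities, label_dict):
    """Pick the index of the largest probability and look up its class name."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return label_dict[str(best_index)]


# Assumed ordering, for illustration only.
label_dict = {"0": "maskimages", "1": "nomaskimages"}

# Made-up softmax outputs for two hypothetical images.
print(predict_label([0.9, 0.1], label_dict))
print(predict_label([0.2, 0.8], label_dict))
```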
