Network Parameter Configuration Explained: A Mask / No-Mask Classifier with the VGG-16 Convolutional Neural Network (Python + PaddlePaddle)

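After the last two days of study I have a basic understanding of deep learning networks. The "deep" in deep learning simply refers to the depth of the network. Today let's look at another classic convolutional neural network: VGG-16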

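The 16 in VGG-16 is the number of layers. By convention, only layers with trainable parameters count as layers; pooling layers have no parameters to learn, so they are usually left out of the count:
[Image]
There are 2+2+3+3+3=13 convolutional layers and 3 fully connected layers, 16 layers in total.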
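
The layer count above can be checked with a few lines of Python. This is only a small sketch; the variable names are made up for illustration and are not part of the article's code.

```python
# Convolution layers per block and the fully connected layers, as listed in the text.
conv_layers_per_block = [2, 2, 3, 3, 3]
num_fc_layers = 3

# One pooling layer closes each block, but pooling has no trainable
# parameters, so it is not included in the "16".
num_pool_layers = len(conv_layers_per_block)

num_conv_layers = sum(conv_layers_per_block)
weighted_layers = num_conv_layers + num_fc_layers

print("conv layers:", num_conv_layers)
print("pool layers (not counted):", num_pool_layers)
print("layers with parameters:", weighted_layers)
```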

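Because the network has so many layers, writing out all 16 of them one by one as in the previous article would be tedious, so here a ConvPool class wraps the convolutional layers and the pooling layer of one block: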

class ConvPool(fluid.dygraph.Layer):
    '''Convolution + pooling'''
    def __init__(self,
                 num_channels,
                 num_filters,
                 filter_size,
                 pool_size,
                 pool_stride,
                 groups,
                 conv_stride=1,
                 conv_padding=1,
                 pool_padding=0,
                 pool_type='max',
                 act=None):
        super(ConvPool, self).__init__()  

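        self._conv2d_list = []

        for i in range(groups):
            conv2d = self.add_sublayer(   # registers the sublayer and returns it
                'bb_%d' % i,
                fluid.dygraph.Conv2D(
                num_channels=num_channels, # number of input channels
                num_filters=num_filters,   # number of kernels (output channels)
                filter_size=filter_size,   # kernel size
                stride=conv_stride,        # stride
                padding=conv_padding,      # padding size (Conv2D's own default is 0; ConvPool passes 1 by default)
                act=act)
            )
            # Append inside the loop so that every conv layer of the block runs in forward()
            self._conv2d_list.append(conv2d)
            # Later convs in the block take the previous conv's output as input
            num_channels = num_filters

        self._pool2d = fluid.dygraph.Pool2D(
            pool_size=pool_size,           # pooling window size
            pool_type=pool_type,           # pooling type, max pooling by default
            pool_stride=pool_stride,       # pooling stride
            pool_padding=pool_padding      # padding size
            )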

    def forward(self, inputs):
        x = inputs
        for conv in self._conv2d_list:
            x = conv(x)
        x = self._pool2d(x)
        return x

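That is a lot of code; the part to focus on is the parameters of the __init__ method:

  • num_channels is the number of input channels; for one three-channel image, num_channels=3
  • num_filters is the number of filters (convolution kernels); each filter produces one output feature map
  • filter_size is the filter (kernel) size; filter_size=3 means a 3*3 kernel
  • pool_size is the size of the pooling window, analogous to filter_size
  • pool_stride is the stride of the pooling layer
  • groups is the number of convolutional layers in the block, because __init__ loops with for i in range(groups). It is not the groups argument of Conv2D (grouped convolution, where with groups=2 the first half of the filters connects only to the first half of the input feature maps and the second half only to the second half); that argument is never passed here.
  • conv_stride is the stride of the convolutional layer
  • conv_padding is the vertical and horizontal border padding of the convolutional layer
  • pool_padding is the vertical and horizontal border padding of the pooling layer
  • pool_type is the downsampling method of the pooling layer: average or max
  • act is the activation function applied to the output, such as tanh, softmax, sigmoid or relu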
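
Because ConvPool uses groups as a loop count (for i in range(groups)), one block stacks several convolutions. In a standard VGG block only the first convolution sees num_channels input channels; each later one receives the num_filters channels produced by the convolution before it. The helper below, block_channel_plan, is a hypothetical name introduced only to illustrate that chaining in plain Python; it is not part of PaddlePaddle.

```python
def block_channel_plan(num_channels, num_filters, groups):
    """Return (in_channels, out_channels) for each conv layer of one block."""
    plan = []
    in_channels = num_channels
    for _ in range(groups):
        plan.append((in_channels, num_filters))
        # The next conv consumes what this conv produced.
        in_channels = num_filters
    return plan


first_block = block_channel_plan(num_channels=3, num_filters=64, groups=2)
third_block = block_channel_plan(num_channels=128, num_filters=256, groups=3)

print(first_block)
print(third_block)
```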

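The less familiar parameter is padding, so let's draw a picture first:
[Image]
Suppose the stride is 2. During convolution, the window at the edge of the image cannot cover a complete 3 * 3 patch, so no output is computed there, and the edge of the image is lost (the red part).

The fix is to add a border of blank pixels around the image, which is the padding:
[Image]
After this treatment, the edge of the image is no longer lost.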
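
The effect of padding can also be seen from the usual output-size formula for a convolution, floor((input + 2 * padding - filter) / stride) + 1. The sketch below is plain Python; conv_output_size is an illustrative helper, not a PaddlePaddle function, and the 6-pixel input is an assumed toy example.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution."""
    return (input_size + 2 * padding - filter_size) // stride + 1


# Toy case: 6-pixel side, 3*3 kernel, stride 2.
# Without padding the last row/column never fits a full window.
no_pad_small = conv_output_size(6, 3, stride=2, padding=0)
pad_small = conv_output_size(6, 3, stride=2, padding=1)

# The setting used in this article: 224-pixel side, 3*3 kernel, stride 1.
no_pad_vgg = conv_output_size(224, 3, stride=1, padding=0)
pad_vgg = conv_output_size(224, 3, stride=1, padding=1)

print(no_pad_small, pad_small)
print(no_pad_vgg, pad_vgg)
```

With padding=1 a 3*3, stride-1 convolution keeps the spatial size unchanged, which is why ConvPool uses conv_padding=1 by default.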

Now let's actually build the VGG-16 network:
[Image]

class VGGNet(fluid.dygraph.Layer):
    '''
    VGG network
    '''
    def __init__(self):
        super(VGGNet, self).__init__()
        self.convpool01 = ConvPool(num_channels=3, num_filters=64, filter_size=3, pool_size=2, pool_stride=2, groups=2, act = "relu")
        self.convpool02 = ConvPool(64, 128, 3, 2, 2, 2, act = "relu")
        self.convpool03 = ConvPool(128, 256, 3, 2, 2, 3, act = "relu")
        self.convpool04 = ConvPool(256, 512, 3, 2, 2, 3, act = "relu")
        self.convpool05 = ConvPool(512, 512, 3, 2, 2, 3, act = "relu")

        self.pool_5_shape = 512 * 7 * 7
        self.fc01 = fluid.dygraph.Linear(self.pool_5_shape,4096,act="relu")
        self.fc02 = fluid.dygraph.Linear(4096,4096,act="relu")
        self.fc03 = fluid.dygraph.Linear(4096,2,act="softmax")
        

    def forward(self, inputs, label=None):
        """Forward pass"""
        print(inputs.shape)
        out = self.convpool01(inputs)
        print(out.shape)
        out = self.convpool02(out)
        print(out.shape)
        out = self.convpool03(out)
        print(out.shape)
        out = self.convpool04(out)
        print(out.shape)
        out = self.convpool05(out)
        print(out.shape)

        out = fluid.layers.reshape(out, shape=[-1, 512*7*7])
        print(out.shape)
        out = self.fc01(out)
        print(out.shape)
        out = self.fc02(out)
        print(out.shape)
        out = self.fc03(out)
        print(out.shape)

        if label is not None:
            acc = fluid.layers.accuracy(input=out, label=label)
            return out, acc
        else:
            return out

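Before configuring the network, always check the dimensions of the input:
[Image]
The original shape of this input is [8,3,224,224], that is, a batch of 8 three-channel 224*224 images. Let's go through it in detail:

  1. In the first block, ConvPool(num_channels=3, num_filters=64, filter_size=3, pool_size=2, pool_stride=2, groups=2, act="relu"), num_filters=64, so the number of feature maps becomes 64. From filter_size=3 and padding=1 the image size changes as follows: padding=1 adds one pixel on every side, 224+2=226, so the padded image is 226*226. The convolution then gives 226-3 (filter_size=3)+1=224, so the image size is unchanged. The pooling layer with stride 2 then halves both height and width to 112*112, so the output shape is [8,64,112,112]
  2. The following blocks work the same way. After all 13 convolutional layers (and 5 pooling layers) the shape becomes [8,512,7,7], and reshape(out, shape=[-1, 512*7*7]) turns it into [8,25088]
  3. Feeding this into the first fully connected layer, Linear(25088,4096,act="relu"), gives a matrix with 4096 columns
  4. After the three fully connected layers there are 2 outputs, because wearing a mask or not is a binary classification problem (mask or no mask)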
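
The shape walk-through above can be reproduced without PaddlePaddle. The sketch below assumes the settings used in this article (3*3 convolutions with stride 1 and padding 1, 2*2 max pooling with stride 2 and no padding) and a full VGG-16 layout of 2, 2, 3, 3, 3 convolutions per block; the helper names are illustrative only.

```python
def conv_output_size(size, filter_size=3, stride=1, padding=1):
    """Spatial size after one convolution."""
    return (size + 2 * padding - filter_size) // stride + 1


def pool_output_size(size, pool_size=2, stride=2, padding=0):
    """Spatial size after one pooling layer."""
    return (size + 2 * padding - pool_size) // stride + 1


# (num_filters, number of conv layers) for the five blocks.
blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

batch, channels, height, width = 8, 3, 224, 224
shapes = []
for num_filters, groups in blocks:
    for _ in range(groups):
        height = conv_output_size(height)
        width = conv_output_size(width)
    height = pool_output_size(height)
    width = pool_output_size(width)
    channels = num_filters
    shapes.append([batch, channels, height, width])

flattened = channels * height * width

for shape in shapes:
    print(shape)
print("flattened features per image:", flattened)
```

The last entry of shapes is the tensor that gets reshaped before the first fully connected layer, and flattened is the input width that Linear must be given.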

Here is the complete code of the program:

import os
import zipfile
import random
import json
import paddle
import sys
import numpy as np
from PIL import Image
from PIL import ImageEnhance
import paddle.fluid as fluid
from multiprocessing import cpu_count
import matplotlib.pyplot as plt

'''
Parameter configuration
'''
train_parameters = {
    "input_size": [3, 224, 224],                              # shape of the input image
    "class_dim": -1,                                          # number of classes (filled in by get_data_list)
    "src_path":"/home/aistudio/work/maskDetect.zip",          # path of the raw dataset
    "target_path":"/home/aistudio/data/",                     # directory to extract into
    "train_list_path": "/home/aistudio/data/train.txt",       # path of train.txt
    "eval_list_path": "/home/aistudio/data/eval.txt",         # path of eval.txt
    "readme_path": "/home/aistudio/data/readme.json",         # path of readme.json
    "label_dict":{},                                          # label dictionary
    "num_epochs": 1,                                          # number of training epochs
    "train_batch_size": 8,                                    # batch size during training
    "learning_strategy": {                                    # optimizer settings
        "lr": 0.001                                           # learning rate (hyperparameter)
    } 
}

def unzip_data(src_path,target_path):
    '''
    Unzip the raw dataset: extract the zip archive at src_path into target_path
    '''
    if(not os.path.isdir(target_path + "maskDetect")):     
        z = zipfile.ZipFile(src_path, 'r')
        z.extractall(path=target_path)
        z.close()

def get_data_list(target_path,train_list_path,eval_list_path):
    '''
    Generate the data lists
    '''
    # Information about every class
    class_detail = []
    # Names of the folders that hold each class
    data_list_path=target_path+"maskDetect/"
    class_dirs = os.listdir(data_list_path)  
    # Total number of images
    all_class_images = 0
    # Label of the current class
    class_label=0
    # Number of classes
    class_dim = 0
    # Lines to be written to eval.txt and train.txt
    trainer_list=[]
    eval_list=[]
    # Iterate over the classes, ['maskimages', 'nomaskimages']
    for class_dir in class_dirs:
        if class_dir != ".DS_Store":
            class_dim += 1
            # Information about this class
            class_detail_list = {}
            eval_sum = 0
            trainer_sum = 0
            # Number of images in this class
            class_sum = 0
            # Path of the class folder
            path = data_list_path  + class_dir
            # All images in the folder
            img_paths = os.listdir(path)
            for img_path in img_paths:                                  # every image in the folder
                name_path = path + '/' + img_path                       # path of this image
                if class_sum % 10 == 0:                                 # every 10th image goes to the validation set
                    eval_sum += 1                                       # eval_sum counts the validation images
                    eval_list.append(name_path + "\t%d" % class_label + "\n")
                else:
                    trainer_sum += 1                                    # trainer_sum counts the training images
                    trainer_list.append(name_path + "\t%d" % class_label + "\n")
                class_sum += 1                                          # images in this class
                all_class_images += 1                                   # images in all classes
             
            # class_detail entry for the readme json file
            class_detail_list['class_name'] = class_dir             # class name, e.g. maskimages
            class_detail_list['class_label'] = class_label          # class label
            class_detail_list['class_eval_images'] = eval_sum       # validation images in this class
            class_detail_list['class_trainer_images'] = trainer_sum # training images in this class
            class_detail.append(class_detail_list)  
            # Fill in the label dictionary
            train_parameters['label_dict'][str(class_label)] = class_dir
            class_label += 1 
            
    # Set the number of classes
    train_parameters['class_dim'] = class_dim

   
    
    # Shuffle
    random.shuffle(eval_list)
    with open(eval_list_path, 'a') as f:
        for eval_image in eval_list:
            f.write(eval_image) 
            
    random.shuffle(trainer_list)
    with open(train_list_path, 'a') as f2:
        for train_image in trainer_list:
            f2.write(train_image) 

    # Contents of the readme json file
    readjson = {}
    readjson['all_class_name'] = data_list_path                  # parent directory of the class folders
    readjson['all_class_images'] = all_class_images
    readjson['class_detail'] = class_detail
    jsons = json.dumps(readjson, sort_keys=True, indent=4, separators=(',', ': '))
    with open(train_parameters['readme_path'],'w') as f:
        f.write(jsons)
    print ('Data lists generated!')

def custom_reader(file_list):
    '''
    Custom reader
    '''
    def reader():
        with open(file_list, 'r') as f:
            lines = [line.strip() for line in f]
            for line in lines:
                img_path, lab = line.strip().split('\t')
                img = Image.open(img_path) 
                if img.mode != 'RGB': 
                    img = img.convert('RGB') 
                img = img.resize((224, 224), Image.BILINEAR)
                img = np.array(img).astype('float32') 
                img = img.transpose((2, 0, 1))  # HWC to CHW 
                img = img/255                # scale pixel values to [0, 1]
                yield img, int(lab) 
    return reader

'''
Parameter initialization
'''
src_path=train_parameters['src_path']
target_path=train_parameters['target_path']
train_list_path=train_parameters['train_list_path']
eval_list_path=train_parameters['eval_list_path']
batch_size=train_parameters['train_batch_size']
print("batch_size:",batch_size)

'''
Unzip the raw data to the target path
'''
unzip_data(src_path,target_path)

'''
Split into training and validation sets, shuffle, and generate the data lists
'''
# Empty train.txt and eval.txt before generating the data lists
# (opening with 'w' already truncates the file; seek/truncate just make it explicit)
with open(train_list_path, 'w') as f: 
    f.seek(0)
    f.truncate() 
with open(eval_list_path, 'w') as f: 
    f.seek(0)
    f.truncate() 
# Generate the data lists
get_data_list(target_path,train_list_path,eval_list_path)

'''
Build the data providers
'''
train_reader = paddle.batch(custom_reader(train_list_path),
                            batch_size=batch_size,
                            drop_last=True)
eval_reader = paddle.batch(custom_reader(eval_list_path),
                            batch_size=batch_size,
                            drop_last=True)

class ConvPool(fluid.dygraph.Layer):
    '''Convolution + pooling'''
    def __init__(self,
                 num_channels,
                 num_filters,
                 filter_size,
                 pool_size,
                 pool_stride,
                 groups,
                 conv_stride=1,
                 conv_padding=1,
                 pool_padding=0,
                 pool_type='max',
                 act=None):
        super(ConvPool, self).__init__()  

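        self._conv2d_list = []

        for i in range(groups):
            conv2d = self.add_sublayer(   # registers the sublayer and returns it
                'bb_%d' % i,
                fluid.dygraph.Conv2D(
                num_channels=num_channels, # number of input channels
                num_filters=num_filters,   # number of kernels (output channels)
                filter_size=filter_size,   # kernel size
                stride=conv_stride,        # stride
                padding=conv_padding,      # padding size (Conv2D's own default is 0; ConvPool passes 1 by default)
                act=act)
            )
            # Append inside the loop so that every conv layer of the block runs in forward()
            self._conv2d_list.append(conv2d)
            # Later convs in the block take the previous conv's output as input
            num_channels = num_filters

        self._pool2d = fluid.dygraph.Pool2D(
            pool_size=pool_size,           # pooling window size
            pool_type=pool_type,           # pooling type, max pooling by default
            pool_stride=pool_stride,       # pooling stride
            pool_padding=pool_padding      # padding size
            )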

    def forward(self, inputs):
        x = inputs
        for conv in self._conv2d_list:
            x = conv(x)
        x = self._pool2d(x)
        return x

class VGGNet(fluid.dygraph.Layer):
    '''
    VGG network
    '''
    def __init__(self):
        super(VGGNet, self).__init__()
        self.convpool01 = ConvPool(num_channels=3, num_filters=64, filter_size=3, pool_size=2, pool_stride=2, groups=2, act = "relu")
        self.convpool02 = ConvPool(64, 128, 3, 2, 2, 2, act = "relu")
        self.convpool03 = ConvPool(128, 256, 3, 2, 2, 3, act = "relu")
        self.convpool04 = ConvPool(256, 512, 3, 2, 2, 3, act = "relu")
        self.convpool05 = ConvPool(512, 512, 3, 2, 2, 3, act = "relu")

        self.pool_5_shape = 512 * 7 * 7
        self.fc01 = fluid.dygraph.Linear(self.pool_5_shape,4096,act="relu")
        self.fc02 = fluid.dygraph.Linear(4096,4096,act="relu")
        self.fc03 = fluid.dygraph.Linear(4096,2,act="softmax")
        

    def forward(self, inputs, label=None):
        """Forward pass"""
        print(inputs.shape)
        out = self.convpool01(inputs)
        print(out.shape)
        out = self.convpool02(out)
        print(out.shape)
        out = self.convpool03(out)
        print(out.shape)
        out = self.convpool04(out)
        print(out.shape)
        out = self.convpool05(out)
        print(out.shape)

        out = fluid.layers.reshape(out, shape=[-1, 512*7*7])
        print(out.shape)
        out = self.fc01(out)
        print(out.shape)
        out = self.fc02(out)
        print(out.shape)
        out = self.fc03(out)
        print(out.shape)

        if label is not None:
            acc = fluid.layers.accuracy(input=out, label=label)
            return out, acc
        else:
            return out

all_train_iter=0
all_train_iters=[]
all_train_costs=[]
all_train_accs=[]

def draw_train_process(title,iters,costs,accs,label_cost,label_acc):
    plt.title(title, fontsize=24)
    plt.xlabel("iter", fontsize=20)
    plt.ylabel("cost/acc", fontsize=20)
    plt.plot(iters, costs,color='red',label=label_cost) 
    plt.plot(iters, accs,color='green',label=label_acc) 
    plt.legend()
    plt.grid()
    plt.show()


def draw_process(title,color,iters,data,label):
    plt.title(title, fontsize=24)
    plt.xlabel("iter", fontsize=20)
    plt.ylabel(label, fontsize=20)
    plt.plot(iters, data,color=color,label=label) 
    plt.legend()
    plt.grid()
    plt.show()

'''
Model training
'''
#with fluid.dygraph.guard(place = fluid.CUDAPlace(0)):
with fluid.dygraph.guard():
    print(train_parameters['class_dim'])
    print(train_parameters['label_dict'])
    vgg = VGGNet()
    optimizer=fluid.optimizer.AdamOptimizer(learning_rate=train_parameters['learning_strategy']['lr'],parameter_list=vgg.parameters()) 
    for epoch_num in range(train_parameters['num_epochs']):
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array([x[0] for x in data]).astype('float32')           
            y_data = np.array([x[1] for x in data]).astype('int64')      
            y_data = y_data[:, np.newaxis]

            # Convert the NumPy arrays into variables accepted by DyGraph
            img = fluid.dygraph.to_variable(dy_x_data)
            label = fluid.dygraph.to_variable(y_data)

            out,acc = vgg(img,label)
            loss = fluid.layers.cross_entropy(out, label)
            avg_loss = fluid.layers.mean(loss)

            # backward() runs the backward pass
            avg_loss.backward()
            optimizer.minimize(avg_loss)
             
            # Clear the parameter gradients so the next step starts clean
            vgg.clear_gradients()
            

            all_train_iter=all_train_iter+train_parameters['train_batch_size']
            all_train_iters.append(all_train_iter)
            # Record the batch-mean loss rather than only the first sample's loss
            all_train_costs.append(avg_loss.numpy()[0])
            all_train_accs.append(acc.numpy()[0])
                
            if batch_id % 1 == 0:
                print("Loss at epoch {} step {}: {}, acc: {}".format(epoch_num, batch_id, avg_loss.numpy(), acc.numpy()))

    draw_train_process("training",all_train_iters,all_train_costs,all_train_accs,"training cost","training acc")  
    draw_process("training loss","red",all_train_iters,all_train_costs,"training loss")
    draw_process("training acc","green",all_train_iters,all_train_accs,"training acc")  
    
    # Save the model parameters
    fluid.save_dygraph(vgg.state_dict(), "vgg")   
    print("Final loss: {}".format(avg_loss.numpy()))

'''
Model evaluation
'''
with fluid.dygraph.guard():
    model, _ = fluid.load_dygraph("vgg")
    vgg = VGGNet()
    vgg.load_dict(model)
    vgg.eval()
    accs = []
    for batch_id, data in enumerate(eval_reader()):
        dy_x_data = np.array([x[0] for x in data]).astype('float32')
        y_data = np.array([x[1] for x in data]).astype('int64')  # int64 labels, same as in training
        y_data = y_data[:, np.newaxis]
        
        img = fluid.dygraph.to_variable(dy_x_data)
        label = fluid.dygraph.to_variable(y_data)

        out, acc = vgg(img, label)
        lab = np.argsort(out.numpy())
        accs.append(acc.numpy()[0])
print(np.mean(accs))

def load_image(img_path):
    '''
    Preprocess an image for prediction
    '''
    img = Image.open(img_path) 
    if img.mode != 'RGB': 
        img = img.convert('RGB') 
    img = img.resize((224, 224), Image.BILINEAR)
    img = np.array(img).astype('float32') 
    img = img.transpose((2, 0, 1))  # HWC to CHW 
    img = img/255                # scale pixel values to [0, 1]
    return img

label_dic = train_parameters['label_dict']

'''
Model prediction
'''
with fluid.dygraph.guard():
    model, _ = fluid.dygraph.load_dygraph("vgg")
    vgg = VGGNet()
    vgg.load_dict(model)
    vgg.eval()
    
    # Show the image to predict
    infer_path='/home/aistudio/data/data23615/infer_mask01.jpg'
    img = Image.open(infer_path)
    plt.imshow(img)          # draw the image from its pixel array
    plt.show()               # display the image

    # Preprocess the image to predict
    infer_imgs = []
    infer_imgs.append(load_image(infer_path))
    infer_imgs = np.array(infer_imgs)
   
    for  i in range(len(infer_imgs)):
        data = infer_imgs[i]
        dy_x_data = np.array(data).astype('float32')
        dy_x_data=dy_x_data[np.newaxis,:, : ,:]
        img = fluid.dygraph.to_variable(dy_x_data)
        out = vgg(img)
        lab = np.argmax(out.numpy())  # argmax() returns the index of the largest value
        print("Sample {} is predicted as: {}".format(i+1,label_dic[str(lab)]))
        
print("Done")

Let's look at the result:
[Image]
