Deep Learning Series (9): Model Generalization in Computer Vision (Image Augmentation and Fine-Tuning) 2020.6.26

Original

2020-07-03 22:46

Introduction

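The previous posts covered the classic deep learning models and some optimization techniques.
Starting with this post, I move on to applications of deep learning in computer vision.
This post covers two methods for improving a model's generalization ability: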

Image augmentation
Fine-tuning

1. Image augmentation

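Image augmentation

applies a series of random changes to the training images,
producing training examples that are similar to, yet different from, the originals,
which enlarges the training dataset. Because the model sees varied versions of each image, it also relies less on incidental attributes such as an object's exact position or color, which improves generalization.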
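A minimal sketch of the idea in plain Python rather than MXNet: an image is assumed to be a list of pixel rows, and a random left-right flip is applied with an assumed probability of 0.5. Drawing several views of the same image (as the apply helper in the code below does with MXNet transforms) gives versions that differ from each other while the label stays the same. The function names are hypothetical.

```python
import random

def flip_left_right(img):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in img]

def flip_top_bottom(img):
    """Reverse the order of the pixel rows."""
    return img[::-1]

def random_flip_left_right(img, rng, p=0.5):
    """Flip left-right with probability p; otherwise return the image unchanged."""
    return flip_left_right(img) if rng.random() < p else img

def sample_views(img, rng, n=8):
    """Draw n randomly augmented views of one training image (its label is unchanged)."""
    return [random_flip_left_right(img, rng) for _ in range(n)]

rng = random.Random(0)
cat = [[1, 2, 3],
       [4, 5, 6]]
print(flip_left_right(cat))  # [[3, 2, 1], [6, 5, 4]]
print(flip_top_bottom(cat))  # [[4, 5, 6], [1, 2, 3]]
views = sample_views(cat, rng)
print(sum(v != cat for v in views), "of", len(views), "views were flipped")
```

In a real training loop the random transform is applied every time an image is read, so each epoch sees a different mix of flipped and unflipped images rather than a fixed, precomputed copy of the dataset.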

Implementation

import d2lzh as d2l
import mxnet as mx
from mxnet import autograd, gluon, image, init, nd
from mxnet.gluon import data as gdata, loss as gloss, utils as gutils
import sys
import time

"""Image augmentation"""
# Read an example image whose height and width are 400 and 500 pixels, respectively
d2l.set_figsize()
img = image.imread('../img/cat1.jpg')
d2l.plt.imshow(img.asnumpy())

# Plotting helper: show imgs in a num_rows x num_cols grid without axes
def show_images(imgs, num_rows, num_cols, scale=2):
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)
    for i in range(num_rows):
        for j in range(num_cols):
            axes[i][j].imshow(imgs[i * num_cols + j].asnumpy())
            axes[i][j].axes.get_xaxis().set_visible(False)
            axes[i][j].axes.get_yaxis().set_visible(False)
    return axes
# Helper for observing augmentation: apply aug to img num_rows * num_cols times and plot the results
def apply(img, aug, num_rows=2, num_cols=4, scale=1.5):
    Y = [aug(img) for _ in range(num_rows * num_cols)]
    show_images(Y, num_rows, num_cols, scale)

# Flipping
apply(img, gdata.vision.transforms.RandomFlipLeftRight())  # random left-right flip
apply(img, gdata.vision.transforms.RandomFlipTopBottom())  # random top-bottom flip

# Cropping
# Crop a region whose area is 10%-100% of the original and whose width-to-height ratio
# is drawn from 0.5-2, then rescale the region's width and height to 200 pixels each
shape_aug = gdata.vision.transforms.RandomResizedCrop((200, 200), scale=(0.1, 1), ratio=(0.5, 2))
apply(img, shape_aug)

# Changing colors
apply(img, gdata.vision.transforms.RandomBrightness(0.5))  # brightness: scaled by a random factor in [0.5, 1.5]
apply(img, gdata.vision.transforms.RandomHue(0.5))  # hue
# Randomly change brightness, contrast, saturation and hue together
color_aug = gdata.vision.transforms.RandomColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
apply(img, color_aug)

# Combine the augmentations above
augs = gdata.vision.transforms.Compose([gdata.vision.transforms.RandomFlipLeftRight(), color_aug, shape_aug])
apply(img, augs)
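To make the RandomResizedCrop parameters above concrete, here is a plain-Python sketch of how such a crop box can be sampled for the 500 x 400 example image: the crop area is a uniformly drawn fraction of the image area within scale, and the width-to-height ratio is drawn log-uniformly within ratio. The log-uniform ratio follows common practice (for example torchvision's RandomResizedCrop); the 10 attempts and the whole-image fallback are simplifying assumptions, and MXNet's internal details may differ. Rescaling the crop to 200 x 200 is not shown, and sample_crop_box is a hypothetical name.

```python
import math
import random

def sample_crop_box(width, height, rng, scale=(0.1, 1.0), ratio=(0.5, 2.0), attempts=10):
    """Return (x0, y0, w, h) of a random crop inside a width x height image."""
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        target_area = rng.uniform(*scale) * area          # a fraction of the image area within scale
        aspect = math.exp(rng.uniform(log_lo, log_hi))   # width / height within ratio
        w = int(round(math.sqrt(target_area * aspect)))  # so that w * h ~= target_area
        h = int(round(math.sqrt(target_area / aspect)))  # and w / h ~= aspect
        if 0 < w <= width and 0 < h <= height:
            x0 = rng.randint(0, width - w)               # random position of the crop
            y0 = rng.randint(0, height - h)
            return x0, y0, w, h
    return 0, 0, width, height                           # fallback: keep the whole image

rng = random.Random(0)
for _ in range(3):
    print(sample_crop_box(500, 400, rng))
```

Sampling the ratio on a log scale makes 0.5 and 2 (equally "squashed" in opposite directions) equally likely, which a uniform draw over [0.5, 2] would not.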

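Results

[Figure: original image]

[Figure: flipping]

[Figure: cropping]

[Figure: hue change]

[Figure: several augmentations combined]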
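Stacking augmentations, as in the last figure, applies each transform in turn to the output of the previous one. The sketch below shows this with a hypothetical compose helper and a brightness jitter that, like RandomBrightness(0.5), multiplies pixel values by a random factor drawn from [0.5, 1.5]. It assumes a grayscale image stored as a list of rows with values in [0, 255]; clamping to that range is an assumption of this sketch, not necessarily what MXNet does.

```python
import random

def flip_lr(img, rng):
    """Left-right flip; takes rng only so it shares the signature of the random transforms."""
    return [row[::-1] for row in img]

def random_brightness(img, rng, brightness=0.5):
    """Scale every pixel by one factor drawn from [max(0, 1 - brightness), 1 + brightness]."""
    alpha = rng.uniform(max(0.0, 1.0 - brightness), 1.0 + brightness)
    # Clamp to the valid 8-bit range (an assumption of this sketch).
    return [[min(255.0, max(0.0, p * alpha)) for p in row] for row in img]

def compose(*transforms):
    """Chain transforms so that each one receives the previous transform's output."""
    def apply_all(img, rng):
        for t in transforms:
            img = t(img, rng)
        return img
    return apply_all

augs = compose(flip_lr, random_brightness)
rng = random.Random(0)
print(augs([[10, 20, 30]], rng))
```

Because the order matters for some transforms (for example cropping before versus after resizing), the list passed to Compose is applied left to right.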

2. Fine-tuning

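Fine-tuning is a common technique in transfer learning. It consists of the following four steps.

Pretrain a neural network model, the source model, on a source dataset (e.g. the ImageNet dataset).
Create a new neural network model, the target model. It copies all of the source model's design and parameters except the output layer. We assume that these parameters contain knowledge learned from the source dataset and that this knowledge also applies to the target dataset. We also assume that the source model's output layer is closely tied to the labels of the source dataset, so it is not used in the target model.
Add to the target model an output layer whose output size equals the number of classes in the target dataset, and randomly initialize the parameters of this layer.
Train the target model on the target dataset (e.g. a chair dataset). The output layer is trained from scratch, while the parameters of all other layers are fine-tuned starting from the source model's parameters.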
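The four steps can be sketched with plain Python dictionaries standing in for model parameters. Everything here is hypothetical: the parameter names, the toy sizes, the "output." prefix convention and the helper names are illustrative assumptions, not the MXNet API. The target model reuses every source parameter except the output layer, gets a new randomly initialized output layer sized for its own classes, and, mirroring the lr_mult setting of 10 in the implementation below, trains that output layer with a 10x learning rate.

```python
import math
import random

def build_target_params(source_params, num_classes, feature_dim, rng, output_prefix="output."):
    """Steps 2-3: reuse all non-output parameters, then add a fresh output layer."""
    target = {name: value for name, value in source_params.items()
              if not name.startswith(output_prefix)}
    # Xavier-style uniform initialization for the new output layer (assumed here).
    bound = math.sqrt(6.0 / (feature_dim + num_classes))
    target[output_prefix + "weight"] = [[rng.uniform(-bound, bound) for _ in range(feature_dim)]
                                        for _ in range(num_classes)]
    target[output_prefix + "bias"] = [0.0] * num_classes
    return target

def per_param_learning_rates(params, base_lr, output_lr_mult=10, output_prefix="output."):
    """Step 4: the new output layer trains with a larger learning rate than the copied layers."""
    return {name: base_lr * (output_lr_mult if name.startswith(output_prefix) else 1)
            for name in params}

# Toy "source model": a 4-dimensional feature layer and a 1000-class output layer.
rng = random.Random(0)
source = {
    "features.weight": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    "output.weight": [[0.0] * 4 for _ in range(1000)],
    "output.bias": [0.0] * 1000,
}
target = build_target_params(source, num_classes=2, feature_dim=4, rng=rng)
print(len(target["output.weight"]))                             # 2
print(target["features.weight"] is source["features.weight"])  # True
print(per_param_learning_rates(target, base_lr=0.01))
```

The copied layers keep the small base learning rate because they already start from useful values, while the randomly initialized output layer needs larger steps to catch up.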

Implementation

import d2lzh as d2l
from mxnet import gluon, init, nd
from mxnet.gluon import data as gdata, loss as gloss, model_zoo
from mxnet.gluon import utils as gutils
import os
import zipfile

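"""Fine-tuning on the hot dog dataset"""
# Data: download and extract the hot dog dataset, then load the training and test splits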
data_dir = '../data'
base_url = 'https://apache-mxnet.s3-accelerate.amazonaws.com/'
fname = gutils.download(
    base_url + 'gluon/dataset/hotdog.zip',
    path=data_dir, sha1_hash='fba480ffa8aa7e0febbb511d181409f899b9baa5')
with zipfile.ZipFile(fname, 'r') as z:
    z.extractall(data_dir)
train_imgs = gdata.vision.ImageFolderDataset(os.path.join(data_dir, 'hotdog/train'))
test_imgs = gdata.vision.ImageFolderDataset(os.path.join(data_dir, 'hotdog/test'))

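"""During training, we first crop a random region of random size and random aspect ratio from the image,
then rescale that region to a 224 x 224 pixel input.
During testing, we rescale both the height and the width of the image to 256 pixels,
then crop the central 224 x 224 pixel region as the input.
In addition, we standardize the values of the three RGB (red, green, blue) color channels:
each value has its channel's mean subtracted and is then divided by its channel's standard deviation.
The means and standard deviations below are the per-channel ImageNet statistics, matching the data the source model was pretrained on."""
# Normalize the RGB channels with the given per-channel means and standard deviations
# (applied after ToTensor, which scales pixel values to [0, 1])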
normalize = gdata.vision.transforms.Normalize(
    [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
train_augs = gdata.vision.transforms.Compose([
    gdata.vision.transforms.RandomResizedCrop(224),
    gdata.vision.transforms.RandomFlipLeftRight(),
    gdata.vision.transforms.ToTensor(),
    normalize])
test_augs = gdata.vision.transforms.Compose([
    gdata.vision.transforms.Resize(256),
    gdata.vision.transforms.CenterCrop(224),
    gdata.vision.transforms.ToTensor(),
    normalize])

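# Model
pretrained_net = model_zoo.vision.resnet18_v2(pretrained=True)  # ResNet-18 pretrained on ImageNet as the source model
finetune_net = model_zoo.vision.resnet18_v2(classes=2)  # target model with a 2-class output layer
finetune_net.features = pretrained_net.features  # the features member of the target model finetune_net reuses the source model's parameters for those layers
finetune_net.output.initialize(init.Xavier())  # randomly initialize the new output layer
finetune_net.output.collect_params().setattr('lr_mult', 10)  # parameters in output use a 10x larger learning rate during training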

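# Fine-tuning: train net on the augmented hot dog data, on all available GPUs (or the CPU if there are none)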
def train_fine_tuning(net, learning_rate, batch_size=128, num_epochs=5):
    train_iter = gdata.DataLoader(train_imgs.transform_first(train_augs), batch_size, shuffle=True)
    test_iter = gdata.DataLoader(test_imgs.transform_first(test_augs), batch_size)
    ctx = d2l.try_all_gpus()
    net.collect_params().reset_ctx(ctx)
    net.hybridize()
    loss = gloss.SoftmaxCrossEntropyLoss()
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': learning_rate, 'wd': 0.001})
    d2l.train(train_iter, test_iter, net, loss, trainer, ctx, num_epochs)

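train_fine_tuning(finetune_net, 0.01)  # use a small Trainer learning rate, e.g. 0.01, since the pretrained parameters only need small adjustments
# For comparison, define an identical model but initialize all of its parameters randomly (trained with a larger learning rate)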
scratch_net = model_zoo.vision.resnet18_v2(classes=2)
scratch_net.initialize(init=init.Xavier())
train_fine_tuning(scratch_net, 0.1)
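As a worked check of the preprocessing described in the docstring of the code above, this plain-Python sketch computes the test-time center-crop box (after both sides are resized to 256 pixels) and the value a pixel takes after ToTensor-style scaling to [0, 1] followed by Normalize with the mean and standard deviation used in the code. The helper names are hypothetical, and rounding the crop offset down to an integer is an assumption.

```python
MEAN = (0.485, 0.456, 0.406)  # per-channel means passed to Normalize above
STD = (0.229, 0.224, 0.225)   # per-channel standard deviations passed to Normalize above

def center_crop_box(width, height, size):
    """(x0, y0, w, h) of a centered size x size crop."""
    return (width - size) // 2, (height - size) // 2, size, size

def normalize_pixel(rgb):
    """Scale 0-255 channel values to [0, 1], then standardize each channel: (x - mean) / std."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))

print(center_crop_box(256, 256, 224))                            # (16, 16, 224, 224)
print([round(x, 4) for x in normalize_pixel((255, 255, 255))])  # [2.2489, 2.4286, 2.64]
print([round(x, 4) for x in normalize_pixel((0, 0, 0))])
```

So after normalization, pixel values fall roughly in [-2.1, 2.6] rather than [0, 255], on the same scale as the inputs the pretrained ResNet-18 saw on ImageNet.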

Conclusion

Computer vision is a big field.
I plan to spend a few days getting an overview of it.
