Getting Started with Transfer Learning in PyTorch: Image Classification with VGG16

Introduction

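Transfer learning means keeping a model that solves one problem and reusing it on a different but related problem. For example, a model trained to recognize cars can also help recognize trucks. In many cases transfer learning makes a model simpler or cheaper to build, and it can still reach good accuracy.
This article uses PyTorch to demonstrate transfer learning on a small image dataset: it shows how to use a pre-trained model and compares the result against a convolutional neural network (CNN) built from scratch.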

About the dataset

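VGG16 expects input images of shape (224,224,3), that is, 224x224 color images. The dataset I use for this experiment contains images of emergency and non-emergency vehicles, which are resized to that shape below. Emergency vehicles here means police cars, fire trucks and ambulances. The dataset includes a file emergency_train.csv that stores the labels of the training samples.
Dataset download: [extraction code: pyne]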

Choosing a pre-trained model

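A pre-trained model is a model that a person or team has already designed and trained to solve a specific problem. Pre-trained models are very useful in deep learning projects because not everyone has enough compute; we often have to work on a local machine, so a pre-trained model can save a lot of time. A pre-trained model shares what it has learned by passing its weight and bias matrices to a new model. So before doing transfer learning, I first need to choose a suitable pre-trained model and then pass its weights and biases to the new model.
Many pre-trained models may be available for a given deep learning task, so the question is which one best fits the task at hand. Based on the dataset described above, I choose VGG16 pre-trained on ImageNet rather than a model pre-trained on MNIST: our dataset consists of vehicle images, and ImageNet contains many vehicle images, so the former is the more reasonable choice. In short, the deciding factor when choosing a pre-trained model is less its parameter count or benchmark performance than how related the tasks are and how similar the datasets are.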

Data preparation

# PyTorch libraries and modules
import torch
from torch.autograd import Variable
from torch.nn import Linear, ReLU, CrossEntropyLoss, Sequential, Conv2d, MaxPool2d, Module, Softmax, BatchNorm2d, Dropout
from torch.optim import Adam, SGD
import pandas as pd
import numpy as np
from tqdm import tqdm

# torchvision for pre-trained models
from torchvision import models

# tools for reading and displaying images
from skimage.io import imread
from skimage.transform import resize
import matplotlib.pyplot as plt

# splitting the data to create a validation set
from sklearn.model_selection import train_test_split

# model evaluation
from sklearn.metrics import accuracy_score

Next, read the .csv file that holds the image names and their labels, and take a look at its contents:

# loading dataset
train = pd.read_csv('emergency_train.csv')
print(train.shape)
train.head(10)

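The csv file contains two columns:

  • image_names: the file name of each image in the dataset
  • emergency_or_not: whether the image belongs to the emergency class. 0 means a non-emergency vehicle, 1 means an emergency vehicle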

Next, load all the images and store them as arrays:

# loading training images
train_img = []
for img_name in tqdm(train['image_names']):
    # defining the image path
    image_path = 'images/' + img_name
    # reading the image
    img = imread(image_path)
    # normalizing the pixel values
    img = img/255
    # resizing the image to (224,224,3)
    img = resize(img, output_shape=(224,224,3), mode='constant', anti_aliasing=True)
    # converting the type of pixel to float 32
    img = img.astype('float32')
    # appending the image into the list
    train_img.append(img)

# converting the list to numpy array
train_x = np.array(train_img)
train_x.shape
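Holding every image in a single in-memory array is convenient, but it has a cost. As a rough back-of-the-envelope sketch (the helper name `array_bytes` is hypothetical; 1,646 is the training-set size reported in this article), the size of the float32 array can be estimated like this:

```python
# Rough memory estimate for a stack of float32 images (illustrative helper).
def array_bytes(n_images, height=224, width=224, channels=3, bytes_per_value=4):
    """Bytes needed to hold n_images of shape (height, width, channels)."""
    return n_images * height * width * channels * bytes_per_value

total = array_bytes(1646)
print(f"{total / 1024**3:.2f} GiB")  # 0.92 GiB
```

For a dataset of this size that still fits in memory on most machines; for much larger datasets, loading batches lazily would probably be the safer choice.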

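Loading the images took about 22 seconds. The dataset has 1,646 training images. VGG16 needs every image in the same fixed shape, so each one is resized to (224,224,3). Now let's visualize an image from the dataset: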

# Exploring the data
index = 10
plt.imshow(train_x[index])
if (train['emergency_or_not'][index] == 1):
    print('It is an Emergency vehicle')
else:
    print('It is a Non-Emergency vehicle')

This is an ordinary car, so it carries the non-emergency label. Now store the target values (0 or 1) in a separate variable:

# defining the target
train_y = train['emergency_or_not'].values

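Let's split the dataset with sklearn. Here only a validation set is used to evaluate the model; you can also try splitting the data into three parts: training, validation and test.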

# create validation set
train_x, val_x, train_y, val_y = train_test_split(train_x, train_y, test_size = 0.1, random_state = 13, stratify=train_y)
(train_x.shape, train_y.shape), (val_x.shape, val_y.shape)
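The sizes of the two parts follow directly from test_size = 0.1. As a small sketch of the arithmetic (assuming the 1,646 images mentioned above; for a fractional test_size, scikit-learn rounds the held-out count up):

```python
import math

n_samples = 1646      # images in the training CSV, as reported in this article
test_size = 0.1       # fraction passed to train_test_split

# For a float test_size, scikit-learn rounds the validation count up
n_val = math.ceil(test_size * n_samples)
n_train = n_samples - n_val
print(n_train, n_val)  # 1481 165
```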

The split leaves 1,481 images in the training set and 165 in the validation set. Now convert the data to torch format:

# convert the training set first
# converting training images into torch format:
# move the channel axis from (N, H, W, C) to (N, C, H, W); a plain reshape would scramble the pixels
train_x = train_x.transpose(0, 3, 1, 2)
train_x  = torch.from_numpy(train_x)

# converting the target into torch format (CrossEntropyLoss expects int64 labels)
train_y = train_y.astype(np.int64)
train_y = torch.from_numpy(train_y)

# shape of training data
train_x.shape, train_y.shape

# convert the validation set in the same way
# converting validation images into torch format
val_x = val_x.transpose(0, 3, 1, 2)
val_x  = torch.from_numpy(val_x)

# converting the target into torch format
val_y = val_y.astype(np.int64)
val_y = torch.from_numpy(val_y)

# shape of validation data
val_x.shape, val_y.shape
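PyTorch convolution layers expect images in (N, C, H, W) order, while the arrays loaded above are in (N, H, W, C) order. Reordering axes calls for a transpose; a reshape keeps the memory order and therefore mixes pixel values across channels. A minimal NumPy sketch on a toy array (the values are made up for illustration):

```python
import numpy as np

# A tiny "image" batch in channels-last layout: (N, H, W, C) = (1, 2, 2, 3)
batch = np.arange(12).reshape(1, 2, 2, 3)

# transpose moves the channel axis: each channel plane keeps its own pixels
channels_first = batch.transpose(0, 3, 1, 2)
# reshape only reinterprets memory: values get mixed across channels
reshaped = batch.reshape(1, 3, 2, 2)

print(channels_first[0, 0].tolist())  # [[0, 3], [6, 9]]  channel 0 of each pixel
print(reshaped[0, 0].tolist())        # [[0, 1], [2, 3]]  first four raw values
```

Both results have shape (1, 3, 2, 2), so the mistake does not raise an error; only the transposed array still represents the original image.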

Our data is ready! In the next section we build a convolutional neural network (CNN) before turning to the pre-trained model.

Using a plain CNN

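Now for model building. Before solving the problem with transfer learning, we first train a plain CNN as a baseline.
We start with a very simple CNN architecture: two convolutional layers extract features from the images, and a single fully connected layer classifies them: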

class Net(Module):   
    def __init__(self):
        super(Net, self).__init__()

        self.cnn_layers = Sequential(
            # Defining a 2D convolution layer
            Conv2d(3, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
            # Defining another 2D convolution layer
            Conv2d(4, 8, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(8),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
        )

        self.linear_layers = Sequential(
            Linear(8 * 56 * 56, 2)
        )

    # Defining the forward pass    
    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
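The input size of the linear layer, 8 * 56 * 56, comes from tracking the spatial size through the layers. A short sketch using the standard output-size formula for convolution and pooling layers (the helper `conv_out` is a hypothetical name, not part of PyTorch):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224
size = conv_out(size, kernel=3, stride=1, padding=1)  # Conv2d(3, 4)  -> 224
size = conv_out(size, kernel=2, stride=2)             # MaxPool2d     -> 112
size = conv_out(size, kernel=3, stride=1, padding=1)  # Conv2d(4, 8)  -> 112
size = conv_out(size, kernel=2, stride=2)             # MaxPool2d     -> 56

flat_features = 8 * size * size   # 8 output channels after the second conv
print(size, flat_features)        # 56 25088
```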

Now define the optimizer, learning rate and loss function for the model, and train on the GPU if one is available:

# defining the model
model = Net()
# defining the optimizer
optimizer = Adam(model.parameters(), lr=0.0001)
# defining the loss function
criterion = CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()

print(model)
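Note that Net ends in a plain Linear layer without a softmax. That works because PyTorch's CrossEntropyLoss takes raw scores (logits) and applies log-softmax internally. As a rough NumPy sketch of the same computation (the function name `cross_entropy` and the sample values are made up for illustration):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy from raw logits, computed via a stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick the log-probability of the true class for every sample
    return -log_probs[np.arange(len(targets)), targets].mean()

logits = np.array([[2.0, 0.0], [0.0, 0.0]])   # made-up scores for two samples
targets = np.array([0, 1])
loss = cross_entropy(logits, targets)
print(loss)
```

The second sample has equal scores for both classes, so its contribution is log(2), the loss of a coin flip.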

Next, set the number of epochs and the batch size. Here batch_size = 128 and epochs = 15:

# batch size of the model
batch_size = 128

# number of epochs to train the model
n_epochs = 15

for epoch in range(1, n_epochs+1):

    # keep track of training and validation loss
    train_loss = 0.0
        
    permutation = torch.randperm(train_x.size()[0])

    training_loss = []
    for i in tqdm(range(0,train_x.size()[0], batch_size)):

        indices = permutation[i:i+batch_size]
        batch_x, batch_y = train_x[indices], train_y[indices]
        
        if torch.cuda.is_available():
            batch_x, batch_y = batch_x.cuda(), batch_y.cuda()
        
        optimizer.zero_grad()
        # forward pass and loss computation
        outputs = model(batch_x)
        loss = criterion(outputs,batch_y)

        training_loss.append(loss.item())
        loss.backward()
        optimizer.step()
        
    training_loss = np.average(training_loss)
    print('epoch: \t', epoch, '\t training loss: \t', training_loss)
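The loop builds its mini-batches by slicing a random permutation of the sample indices, so every sample is visited exactly once per epoch in a fresh order. A small NumPy sketch of the same idea (using the 1,481 training images and batch size 128 from this article; the seed is arbitrary):

```python
import numpy as np

n_samples, batch_size = 1481, 128
rng = np.random.default_rng(0)          # fixed seed only for reproducibility
permutation = rng.permutation(n_samples)

# consecutive slices of the permutation form the mini-batches
batches = [permutation[i:i + batch_size] for i in range(0, n_samples, batch_size)]
seen = np.concatenate(batches)

print(len(batches), len(batches[-1]))   # 12 73
```

The last batch is shorter (73 samples) because 1,481 is not a multiple of 128.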

The loop prints the training progress and loss; normally the training loss decreases after each epoch. Next, check the accuracy:

# prediction for training set
model.eval()  # evaluation mode: BatchNorm uses its running statistics
prediction = []
target = []
for i in tqdm(range(0,train_x.size()[0], batch_size)):
    batch_x, batch_y = train_x[i:i+batch_size], train_y[i:i+batch_size]

    if torch.cuda.is_available():
        batch_x = batch_x.cuda()

    with torch.no_grad():
        output = model(batch_x)

    # the class with the highest score is the prediction
    predictions = torch.argmax(output, dim=1).cpu().numpy()
    prediction.append(predictions)
    target.append(batch_y.numpy())
    
# training accuracy over all samples
accuracy = accuracy_score(np.concatenate(target), np.concatenate(prediction))
print('training accuracy: \t', accuracy)
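Since exp and softmax are both monotonic, the predicted class is the same whether argmax is taken over the raw outputs, their exponentials, or the normalized softmax probabilities. A quick NumPy check with made-up scores:

```python
import numpy as np

logits = np.array([[1.2, -0.3], [-2.0, 0.5], [0.1, 0.4]])  # made-up scores

exp_scores = np.exp(logits)
softmax = exp_scores / exp_scores.sum(axis=1, keepdims=True)  # rows sum to 1

pred_from_logits = logits.argmax(axis=1)
pred_from_exp = exp_scores.argmax(axis=1)
pred_from_softmax = softmax.argmax(axis=1)

print(pred_from_logits.tolist())  # [0, 1, 1]
```

The softmax step only matters when actual probabilities are needed, not for picking the label.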

# prediction for validation set
model.eval()  # evaluation mode: BatchNorm uses its running statistics
prediction_val = []
target_val = []
for i in tqdm(range(0,val_x.size()[0], batch_size)):
    batch_x, batch_y = val_x[i:i+batch_size], val_y[i:i+batch_size]

    if torch.cuda.is_available():
        batch_x = batch_x.cuda()

    with torch.no_grad():
        output = model(batch_x)

    # the class with the highest score is the prediction
    predictions = torch.argmax(output, dim=1).cpu().numpy()
    prediction_val.append(predictions)
    target_val.append(batch_y.numpy())
    
# validation accuracy over all samples
accuracy_val = accuracy_score(np.concatenate(target_val), np.concatenate(prediction_val))
print('validation accuracy: \t', accuracy_val)
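One detail worth keeping in mind when accuracy is collected batch by batch: averaging per-batch accuracies weights a short final batch as heavily as a full one, so it can differ from the accuracy over all samples. A sketch with made-up correctness flags for 165 validation samples split into batches of 128 and 37:

```python
import numpy as np

# Made-up correctness flags: a full batch of 128 and a short final batch of 37
batch_correct = [np.array([1] * 96 + [0] * 32), np.array([1] * 37)]

per_batch = [b.mean() for b in batch_correct]
mean_of_batches = np.mean(per_batch)                 # (0.75 + 1.0) / 2
overall = np.concatenate(batch_correct).mean()       # 133 / 165

print(round(float(mean_of_batches), 3), round(float(overall), 3))  # 0.875 0.806
```

Computing the score once over the concatenated predictions avoids this bias.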

The validation accuracy is 69.7%. With this baseline in hand, we now use transfer learning on the same classification problem.

Classification with transfer learning

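We will use a VGG16 model pre-trained on the ImageNet dataset. First, the steps for training a model with transfer learning:

  1. Load the weights of the pre-trained model, in this case VGG16
  2. Fine-tune the model for the problem at hand (the parameters of some layers of the pre-trained model are not updated)
  3. Use the pre-trained weights to extract features from our training images
  4. Finally, train the fine-tuned model on the extracted features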

So let's first load the weights of the pre-trained model:
Reference: torchvision, using pre-trained models

# loading the pretrained model (vgg16_bn is the VGG16 variant with batch normalization)
model = models.vgg16_bn(pretrained=True)

Now we fine-tune the model. We will not train the existing layers of VGG16, so we freeze their weights:

# Freeze model weights
for param in model.parameters():
    param.requires_grad = False

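We only need to predict 2 classes, while VGG16 was trained on the 1000 ImageNet classes, so the last layer has to be replaced to fit our task. Only this last layer is trained: setting requires_grad=True on its parameters restricts the weight updates to it. The model is also moved to the GPU if one is available: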

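# Add on classifier: replace the last layer with a 2-class output layer
model.classifier[6] = Sequential(
                      Linear(4096, 2))
# The output size is set to 2, and only the weights of this new layer are updated.
# No extra activation or dropout is added to the new layer,
# to keep the comparison with the CNN above fair.
for param in model.classifier[6].parameters():
    param.requires_grad = True

# checking if GPU is available; the model is moved after the new layer is attached,
# so that the new layer lands on the same device as the rest of the network
if torch.cuda.is_available():
    model = model.cuda()
model.classifier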
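A quick way to confirm that only the new layer will be updated is to count the parameters that still have requires_grad=True. The sketch below uses a small hypothetical stand-in network instead of VGG16, so it runs without downloading any weights; the layer sizes are arbitrary:

```python
from torch.nn import Linear, ReLU, Sequential

# Hypothetical stand-in for a pre-trained network (not VGG16 itself)
backbone = Sequential(Linear(16, 8), ReLU(), Linear(8, 1000))

# Freeze everything, then swap in a fresh 2-class output layer
for param in backbone.parameters():
    param.requires_grad = False
backbone[2] = Linear(8, 2)   # newly created layers are trainable by default

trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in backbone.parameters() if not p.requires_grad)
print(trainable, frozen)     # 18 136
```

The same two sums applied to the real model should report trainable parameters only for the replaced output layer.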

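Now use the pre-trained model to extract features for the training and validation images, with batch_size set to 128 (again, you can increase or decrease this batch_size as needed):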

# batch_size
batch_size = 128

# extracting features for train data
data_x = []
label_x = []

inputs,labels = train_x, train_y
device = next(model.features.parameters()).device  # device the conv layers live on
model.features.eval()  # BatchNorm layers use their pre-trained running statistics

for i in tqdm(range(0, train_x.shape[0], batch_size)):
    input_data = inputs[i:i+batch_size].to(device)
    label_data = labels[i:i+batch_size]
    with torch.no_grad():  # no gradients are needed for feature extraction
        x = model.features(input_data)
    data_x.extend(x.cpu().numpy())
    label_x.extend(label_data.numpy())


# extracting features for validation data
data_y = []
label_y = []

inputs,labels = val_x, val_y

for i in tqdm(range(0, val_x.shape[0], batch_size)):
    input_data = inputs[i:i+batch_size].to(device)
    label_data = labels[i:i+batch_size]
    with torch.no_grad():
        x = model.features(input_data)
    data_y.extend(x.cpu().numpy())
    label_y.extend(label_data.numpy())

Next, convert these features to torch format:

# converting the features into torch format
x_train  = torch.from_numpy(np.array(data_x))
x_train = x_train.view(x_train.size(0), -1)
y_train  = torch.from_numpy(np.array(label_x))
x_val  = torch.from_numpy(np.array(data_y))
x_val = x_val.view(x_val.size(0), -1)
y_val  = torch.from_numpy(np.array(label_y))
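The flattened feature vectors can be fed straight into model.classifier because their length matches the input size of its first linear layer. A small sketch of the arithmetic, assuming 224x224 inputs and the standard VGG16 layout (five 2x2 max-pooling stages, 512 channels in the last block):

```python
size = 224
for _ in range(5):       # VGG16 has five 2x2 max-pooling stages
    size //= 2           # the 3x3 convolutions use padding=1 and keep the size

channels = 512           # width of the last VGG16 convolutional block
flat_features = channels * size * size
print(size, flat_features)   # 7 25088
```

So each image becomes a 512 x 7 x 7 feature map, flattened by view() into a vector of 25,088 values.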

We still need to define an optimizer and a loss function for the model:

import torch.optim as optim

# specify loss function (categorical cross-entropy)
criterion = CrossEntropyLoss()

# specify optimizer (Adam) and learning rate; only the new last layer is optimized
optimizer = optim.Adam(model.classifier[6].parameters(), lr=0.0005)

Now train the model. For a fair comparison, it again runs for 15 epochs with batch_size set to 128:

# batch size
batch_size = 128
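model = model.cpu()  # cuda() kept raising errors for this block on Colab, so training runs on the CPU here; switch back to cuda later if needed.
# number of epochs to train the model
n_epochs = 15  # 15 epochs, the same as for the CNN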

for epoch in tqdm(range(1, n_epochs+1)):

    # keep track of training and validation loss
    train_loss = 0.0
        
    permutation = torch.randperm(x_train.size()[0])

    training_loss = []
    for i in range(0,x_train.size()[0], batch_size):

        indices = permutation[i:i+batch_size]
        batch_x, batch_y = x_train[indices], y_train[indices]
        
        # if torch.cuda.is_available():
        #     batch_x, batch_y = batch_x.cuda(), batch_y.cuda()
        
        optimizer.zero_grad()
        # forward pass through the classifier only, then loss computation
        outputs = model.classifier(batch_x)
        loss = criterion(outputs,batch_y)

        training_loss.append(loss.item())
        loss.backward()
        optimizer.step()
        
    training_loss = np.average(training_loss)
    print('epoch: \t', epoch, '\t training loss: \t', training_loss)


# prediction for training set
model.classifier.eval()  # disable the Dropout layers inside the VGG16 classifier
device = next(model.classifier.parameters()).device  # device the classifier lives on
prediction = []
target = []
for i in tqdm(range(0,x_train.size()[0], batch_size)):
    batch_x, batch_y = x_train[i:i+batch_size], y_train[i:i+batch_size]

    with torch.no_grad():
        output = model.classifier(batch_x.to(device))

    # the class with the highest score is the prediction
    predictions = torch.argmax(output, dim=1).cpu().numpy()
    prediction.append(predictions)
    target.append(batch_y.numpy())
    
# training accuracy over all samples
accuracy = accuracy_score(np.concatenate(target), np.concatenate(prediction))
print('training accuracy: \t', accuracy)

The training accuracy reaches 82.5%. Now let's check the validation accuracy:

# prediction for validation set
model.classifier.eval()  # disable the Dropout layers inside the VGG16 classifier
device = next(model.classifier.parameters()).device  # device the classifier lives on
prediction_val = []
target_val = []
for i in tqdm(range(0,x_val.size()[0], batch_size)):
    batch_x, batch_y = x_val[i:i+batch_size], y_val[i:i+batch_size]

    with torch.no_grad():
        output = model.classifier(batch_x.to(device))

    # the class with the highest score is the prediction
    predictions = torch.argmax(output, dim=1).cpu().numpy()
    prediction_val.append(predictions)
    target_val.append(batch_y.numpy())
    
# validation accuracy over all samples
accuracy_val = accuracy_score(np.concatenate(target_val), np.concatenate(prediction_val))
print('validation accuracy: \t', accuracy_val)

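The validation accuracy is similar at 80.2%. Training and validation accuracy are close (82.5% vs 80.2%), which suggests the model generalizes well. Compared with the CNN, the pre-trained VGG16 model improves validation accuracy. Here is a summary of our results: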

Model    Training Accuracy    Validation Accuracy
CNN      87.6%                69.7%
VGG16    82.5%                80.2%
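The gap between training and validation accuracy is a simple indicator of overfitting. Computed from the figures reported in the table (a small illustrative calculation, not an additional experiment):

```python
# Accuracies in percent, copied from the results table
results = {
    "CNN":   {"train": 87.6, "val": 69.7},
    "VGG16": {"train": 82.5, "val": 80.2},
}

# train/validation gap in percentage points
gaps = {name: round(acc["train"] - acc["val"], 1) for name, acc in results.items()}
print(gaps)  # {'CNN': 17.9, 'VGG16': 2.3}
```

The CNN loses about 18 points between training and validation, while the VGG16-based model loses about 2.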

Conclusion

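We have solved an image classification problem with a pre-trained model and transfer learning. We first looked at what a pre-trained model is and how to choose the right one for a given problem. We then worked through a case study: classifying vehicle images as emergency or non-emergency. We solved it first with a CNN and then with the pre-trained VGG16 model, and found that the pre-trained VGG16 model gave better validation accuracy.
You should now have a first feel for transfer learning with PyTorch. Image classification is a good place to start, because it is a fundamental type of problem; applying transfer learning to such problems will help you understand how it works.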

References:
Transfer Learning in Pytorch
Master the Powerful Art of Transfer Learning using PyTorch
Transfer Learning for Computer Vision Tutorial
