A Hands-On Guide to Building an Image Classification Model from Scratch with PyTorch

https://zhuanlan.zhihu.com/p/38236978

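Over the past few years, quite a few deep learning frameworks have appeared. Among them, PyTorch, released by Facebook, is relatively new and quite distinctive. Thanks to its flexibility, speed, and simplicity, PyTorch has grown rapidly and won over many users.

With PyTorch, we can easily define custom model layers and take full control of the training process, including gradient propagation. This article walks you step by step through building a complete image classifier from scratch with PyTorch.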

Installing PyTorch

Thanks to prebuilt binaries, PyTorch is quite easy to install and runs well on all major systems.

Installing on Windows

CPU only:

pip3 install http://download.Pytorch.org/whl/cpu/torch-0.4.0-cp35-cp35m-win_amd64.whl

pip3 install torchvision

With GPU support:

pip3 install http://download.Pytorch.org/whl/cu80/torch-0.4.0-cp35-cp35m-win_amd64.whl

pip3 install torchvision

Installing on Linux

CPU only:

pip3 install http://download.Pytorch.org/whl/cpu/torch-0.4.0-cp35-cp35m-linux_x86_64.whl

pip3 install torchvision

With GPU support:

pip3 install torch torchvision

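Installing on OSX

CPU only:

pip3 install torch torchvision

With GPU support:

Follow the detailed instructions on the official PyTorch website (https://pytorch.org/).

Note: to follow along with this tutorial in practice, you should have a CUDA-capable GPU. If you don't, no problem! You can use a cloud-based GPU for free at https://colab.research.google.com/.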

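A brief introduction to convolutional neural networks

The model we use in this article is a convolutional neural network (CNN). It mainly consists of convolutional layers stacked together, usually with some normalization and activation layers as well. The components of a CNN can be summarized as follows:

CNN: a stack of convolutional layers.
Convolutional layer: detects certain features and has a specific number of channels.
Channel: detects one specific feature in the image.
Kernel/filter: the feature each channel detects. It has a fixed size, usually 3 x 3.

Put simply, a convolutional layer is a feature detection layer. Each convolutional layer has a specific number of channels, and each channel detects a specific feature in the image. Each feature to be detected is often called a kernel or filter, and has a fixed size, usually 3 x 3.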

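Defining the model architecture

In PyTorch, a model is defined by a custom class that extends the Module class. All the components of a model can be found in the torch.nn package, so we only need to import that package. Here we will build a simple CNN to classify RGB images from the CIFAR-10 dataset. The dataset contains 50,000 training images and 10,000 test images, all of size 32 x 32.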


# Import the required packages
import torch
import torch.nn as nn
 
 
class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()
 
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
 
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()
 
        self.pool = nn.MaxPool2d(kernel_size=2)
 
        self.conv3 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=3, stride=1, padding=1)
        self.relu3 = nn.ReLU()
 
        self.conv4 = nn.Conv2d(in_channels=24, out_channels=24, kernel_size=3, stride=1, padding=1)
        self.relu4 = nn.ReLU()
 
        self.fc = nn.Linear(in_features=16 * 16 * 24, out_features=num_classes)
 
    def forward(self, input):
        output = self.conv1(input)
        output = self.relu1(output)
 
        output = self.conv2(output)
        output = self.relu2(output)
 
        output = self.pool(output)
 
        output = self.conv3(output)
        output = self.relu3(output)
 
        output = self.conv4(output)
        output = self.relu4(output)
 
        output = output.view(-1, 16 * 16 * 24)
 
        output = self.fc(output)
 
        return output

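In the code above, we first define a new class named SimpleNet, which extends nn.Module. In the constructor of this class we specify all the layers of the network. Our network is structured as: convolutional layer, ReLU layer, convolutional layer, ReLU layer, pooling layer, convolutional layer, ReLU layer, convolutional layer, ReLU layer, linear layer.

Let's go through them one by one.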

Convolutional layer

nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)

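Since our input is an RGB image with 3 channels (red, green, blue), we set in_channels to 3. Next we want to apply 12 feature detectors to the image, so we set out_channels to 12. Here we use the standard 3 x 3 kernel size. The stride is set to 1, and it should generally stay that way unless you plan to reduce the spatial dimensions of the image. With a stride of 1, the convolution moves 1 pixel at a time. Finally, we set the padding to 1: this pads the image border with zeros so that the output has the same height and width as the input.

For now, you don't need to worry much about stride and padding; focus on in_channels and out_channels.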
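To make the effect of stride and padding concrete, here is a small sketch of the standard output-size formula for a convolution, floor((size + 2 * padding - kernel_size) / stride) + 1. The helper name conv_output_size is made up for this illustration and is not part of PyTorch.

```python
def conv_output_size(size, kernel_size=3, stride=1, padding=1):
    """Spatial size after a convolution along one dimension (hypothetical helper)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# 3 x 3 kernel, stride 1, padding 1: the 32-pixel side is preserved
same = conv_output_size(32)
# Without padding the border pixels are lost
no_padding = conv_output_size(32, padding=0)
# A stride of 2 halves the side length
strided = conv_output_size(32, stride=2)

print(same, no_padding, strided)
```

With the settings used in this article (kernel 3, stride 1, padding 1), the side length stays at 32, which is why only the pooling layers shrink the image.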

Note that the out_channels of this layer serves as the in_channels of the next layer, as shown below:

nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)
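As a quick check of how the channel counts chain together, the sketch below pushes a random batch through two such layers and inspects the shapes. The batch size of 4 and the random input are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)

# A fake batch of 4 RGB images of size 32 x 32 (random values, illustration only)
batch = torch.randn(4, 3, 32, 32)

features1 = conv1(batch)      # 3 channels in, 12 channels out
features2 = conv2(features1)  # consumes the 12 channels produced by conv1

print(features1.shape, features2.shape)
```

If conv2 were declared with a different in_channels value, the second call would raise a shape mismatch error.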

ReLU

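This is the standard ReLU activation function, which turns all incoming features into values of 0 or greater. Put simply, when you apply ReLU to the input features, any number less than 0 becomes 0, and all other values stay the same.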
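A minimal illustration of this behavior on a hand-picked tensor (the values are arbitrary):

```python
import torch
import torch.nn as nn

relu = nn.ReLU()
features = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

# Negative entries are clamped to 0, non-negative entries pass through unchanged
activated = relu(features)
print(activated.tolist())
```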

MaxPool2d

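With kernel_size set to 2, this layer reduces the dimensions of the image by halving its width and height. It takes the maximum pixel value within each 2 x 2 region of the image and uses it to represent the whole region, so 4 pixels become just 1.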
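The following sketch applies the same pooling layer to a tiny 4 x 4 single-channel "image" with hand-picked values, so the result can be checked by eye.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)

# One image, one channel, 4 x 4 pixels (values chosen for readability)
image = torch.tensor([[1.0, 2.0, 5.0, 6.0],
                      [3.0, 4.0, 7.0, 8.0],
                      [9.0, 10.0, 13.0, 14.0],
                      [11.0, 12.0, 15.0, 16.0]]).reshape(1, 1, 4, 4)

# Each non-overlapping 2 x 2 region is replaced by its maximum
pooled = pool(image)
print(pooled.shape)
print(pooled.squeeze().tolist())
```

The 4 x 4 input becomes 2 x 2, and every output value is the largest of the four pixels it replaces.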

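Linear layer

The last layer of our network is a linear layer. This is a standard fully connected layer that computes a score for each class, 10 classes in our case.

Note: before feeding the feature maps from the last convolution-ReLU layer into the linear layer, we have to flatten them. The last layer has 24 output channels, and because of the 2 x 2 max pooling, our image has become 16 x 16 at this point (32/2 = 16). The flattened feature map therefore has 16 x 16 x 24 elements, which is done in code as follows: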

output = output.view(-1, 16 * 16 * 24)

In our linear layer, we have to set in_features to 16 x 16 x 24 as well, and out_features should match the number of classes we want.
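As a rough sanity check of these numbers, the sketch below flattens a random stand-in for the last feature map and feeds it to a linear layer. The batch size of 8 is an arbitrary assumption.

```python
import torch
import torch.nn as nn

# Stand-in for the output of the last conv-ReLU pair: 8 images, 24 channels, 16 x 16
feature_maps = torch.randn(8, 24, 16, 16)

# Flatten each image's feature maps into one vector of 16 * 16 * 24 = 6144 values
flat = feature_maps.view(-1, 16 * 16 * 24)

fc = nn.Linear(in_features=16 * 16 * 24, out_features=10)
scores = fc(flat)  # one score per class for every image

print(flat.shape, scores.shape)
```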

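Note the simple rule for defining models in PyTorch: define the layers in the constructor, and pass the input through them in the forward function.

Hopefully this gives you a basic understanding of how to define models in PyTorch.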

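Modularity

The code above is cool, but not cool enough: if we wanted to write a very deep network, the code would become bloated. The key to keeping code clean is modularity. In the example above, we can put the convolution and ReLU into a separate module and then stack many of these modules in our SimpleNet.

To do this, we first define a new module as follows: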


class Unit(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Unit, self).__init__()
 
        self.conv = nn.Conv2d(in_channels=in_channels, kernel_size=3, out_channels=out_channels, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()
 
    def forward(self, input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)
 
        return output

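As shown above, this unit consists of a convolutional layer, a batch normalization layer, and a ReLU layer.

Unlike the first example, here we include BatchNorm2d before ReLU. Batch normalization standardizes its inputs to have zero mean and unit variance, and it typically gives a substantial boost to the accuracy of CNN models.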
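The normalization effect can be observed directly. The sketch below, which assumes a freshly created BatchNorm2d layer in training mode and a deliberately shifted and scaled random input, checks that each channel of the output has a mean close to 0 and a variance close to 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(num_features=3)
bn.train()  # in training mode, statistics are computed from the current batch

# Random input with mean around 10 and standard deviation around 5
x = torch.randn(16, 3, 8, 8) * 5.0 + 10.0
y = bn(x)

# Per-channel statistics over the batch and spatial dimensions
channel_mean = y.mean(dim=(0, 2, 3))
channel_var = y.var(dim=(0, 2, 3), unbiased=False)

print(channel_mean.tolist(), channel_var.tolist())
```

In evaluation mode (after model.eval()) the layer uses its running statistics instead of the batch statistics, so this check only holds in training mode.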

With the unit defined, we can now stack many of them together.


class SimpleNet(nn.Module):
    def __init__(self,num_classes=10):
        super(SimpleNet,self).__init__()
 
        #Create 14 layers of the unit with max pooling in between
        self.unit1 = Unit(in_channels=3,out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)
 
        self.pool1 = nn.MaxPool2d(kernel_size=2)
 
        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        self.unit7 = Unit(in_channels=64, out_channels=64)
 
        self.pool2 = nn.MaxPool2d(kernel_size=2)
 
        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        self.unit11 = Unit(in_channels=128, out_channels=128)
 
        self.pool3 = nn.MaxPool2d(kernel_size=2)
 
        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        self.unit14 = Unit(in_channels=128, out_channels=128)
 
        self.avgpool = nn.AvgPool2d(kernel_size=4)
 
        #Add all the units into the Sequential layer in exact order
        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4, self.unit5, self.unit6
                                 ,self.unit7, self.pool2, self.unit8, self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14, self.avgpool)
 
        self.fc = nn.Linear(in_features=128,out_features=num_classes)
 
    def forward(self, input):
        output = self.net(input)
        output = output.view(-1,128)
        output = self.fc(output)
        return output

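Our whole network is now complete. It consists of 14 convolutional layers, 14 ReLU layers, 14 batch normalization layers, 4 pooling layers, and 1 linear layer, 47 layers in total!

Note that we put all layers except the fully connected layer into a Sequential container to make the code more compact. This further simplifies the code in the forward function.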

self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4, self.unit5, self.unit6, self.unit7, self.pool2, self.unit8, self.unit9, self.unit10, self.unit11, self.pool3,self.unit12, self.unit13, self.unit14, self.avgpool)

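In addition, the AvgPool2d layer after the last unit computes the average of all activations in each channel. The output of that unit has 128 channels, and after pooling 3 times our 32 x 32 images have become 4 x 4. We apply AvgPool2d with a kernel size of 4, which reduces our feature map to 1 x 1 x 128.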


self.avgpool = nn.AvgPool2d(kernel_size=4)
 
As a result, the linear layer has 1 x 1 x 128 = 128 input features.
self.fc = nn.Linear(in_features=128,out_features=num_classes)
 
We likewise flatten the output of the network so that it has 128 features.
output = output.view(-1,128)
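To illustrate the last two steps in isolation, the sketch below applies the same average pooling and flattening to a random stand-in for the 4 x 4 x 128 feature map (the batch size of 2 is arbitrary) and confirms that each of the 128 outputs is the mean of one channel.

```python
import torch
import torch.nn as nn

# Stand-in for the output of the last unit: 2 images, 128 channels, 4 x 4
features = torch.randn(2, 128, 4, 4)

avgpool = nn.AvgPool2d(kernel_size=4)
pooled = avgpool(features)    # each 4 x 4 channel collapses to a single value
flat = pooled.view(-1, 128)   # drop the two trailing 1 x 1 dimensions

# Reference: the plain mean over the spatial dimensions of every channel
channel_means = features.mean(dim=(2, 3))

print(pooled.shape, flat.shape)
```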

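Loading and augmenting data

Thanks to the torchvision package, loading data is very easy in PyTorch. As an example, we load the CIFAR-10 dataset used in this article.

First, we need 3 additional import statements.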


from torchvision.datasets import CIFAR10
from torchvision.transforms import transforms
from torch.utils.data import DataLoader

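To load the dataset, we follow these steps:

Define the transformations to be applied to the images

Load the dataset with torchvision

Create a DataLoader instance to serve the images in batches

The code looks like this: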


# Define the training-set transforms: random horizontal flip, random crop, then mean/std normalization
train_transformations = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32,padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5,0.5,0.5), (0.5,0.5,0.5))
])
 
# Load the training set
train_set =CIFAR10(root="./data",train=True,transform=train_transformations,download=True)
 
# Create a loader for the training set
train_loader = DataLoader(train_set,batch_size=32,shuffle=True,num_workers=4)

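First, we pass a list of transformations to transforms.Compose. RandomHorizontalFlip randomly flips images horizontally. RandomCrop randomly crops images.

Finally, the two most important steps: ToTensor converts the images into a format PyTorch can use, and Normalize puts all pixel values in the range -1 to +1. Note that when declaring the transformations, ToTensor and Normalize must come last and in exactly the order defined above. This is mainly because the other transformations are applied to the input image while it is still a PIL image.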
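The arithmetic behind these two steps is simple enough to check by hand. The sketch below uses plain Python and two made-up helper names: to_unit_range mimics how ToTensor rescales 8-bit pixel values to the range 0 to 1, and normalize mimics Normalize with a mean and standard deviation of 0.5.

```python
def to_unit_range(pixel):
    """Rescale an 8-bit pixel value (0..255) to 0..1 (hypothetical helper)."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, as Normalize does per channel (hypothetical helper)."""
    return (value - mean) / std


darkest = normalize(to_unit_range(0))      # black pixel
brightest = normalize(to_unit_range(255))  # white pixel
middle = normalize(0.5)                    # mid-gray in the 0..1 range

print(darkest, middle, brightest)
```

So with mean 0.5 and standard deviation 0.5, the darkest pixel maps to -1 and the brightest to +1.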

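Data augmentation helps the model classify images correctly regardless of the perspective from which they are shown.

Next, we load the training set with the CIFAR10 class, and finally we create a loader for the training set with a batch size of 32 images.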
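To see how a DataLoader splits a dataset into batches without downloading anything, here is a small sketch that uses a TensorDataset of 100 random "images" as a stand-in for CIFAR-10. The dataset size and the random contents are assumptions made purely for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 fake RGB images of size 32 x 32 with random labels from 10 classes
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=32, shuffle=False)

# Collect the size of every batch the loader yields
batch_sizes = [batch_images.size(0) for batch_images, _ in loader]
print(batch_sizes)
```

The last batch is smaller because 100 is not a multiple of 32. This is also why the training code later multiplies the batch loss by images.size(0) rather than by a fixed 32.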

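We repeat these steps for the test set, except that the transformations include only ToTensor and Normalize. We do not apply other transformations to the test set.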


# Define the transforms for the test set
test_transformations = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
 
])
 
# Load the test set; note that train is set to False here
test_set = CIFAR10(root="./data", train=False, transform=test_transformations, download=True)
 
# Create a loader for the test set; note that shuffle is set to False here
test_loader = DataLoader(test_set, batch_size=32, shuffle=False, num_workers=4)

The first time you run this code, the dataset of about 170 MB will be downloaded to your system.

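Training the model

Training a neural network with PyTorch is clear and explicit, and you have full control over the training process. Let's go through it step by step.

Import the Adam optimizer as follows: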

from torch.optim import Adam

Step 1: Instantiate the model and create the optimizer and loss function


from torch.optim import Adam
 
 
# Check whether a GPU is available
cuda_avail = torch.cuda.is_available()

# Create the model
model = SimpleNet(num_classes=10)

# If a GPU is available, move the model to the GPU
if cuda_avail:
    model.cuda()

# Define the optimizer and loss function
optimizer = Adam(model.parameters(), lr=0.001, weight_decay=0.0001)
loss_fn = nn.CrossEntropyLoss()

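Step 2: Write a function to adjust the learning rate

Create a learning rate adjustment function that divides the learning rate by 10 every 30 epochs.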


# Create a learning rate adjustment function that divides the learning rate by 10 every 30 epochs
def adjust_learning_rate(epoch):
    lr = 0.001
 
    if epoch > 180:
        lr = lr / 1000000
    elif epoch > 150:
        lr = lr / 100000
    elif epoch > 120:
        lr = lr / 10000
    elif epoch > 90:
        lr = lr / 1000
    elif epoch > 60:
        lr = lr / 100
    elif epoch > 30:
        lr = lr / 10
 
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr

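Each time the epoch count passes another multiple of 30, the learning rate is divided by 10 again, and the new value is written into every parameter group of the optimizer.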
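The if/elif chain can also be expressed as a single formula. The sketch below is one possible compact equivalent, not the author's code: the function name scheduled_lr is made up, and it only computes the value rather than updating an optimizer. It reproduces the same thresholds (the rate drops once epoch exceeds 30, 60, and so on, up to 180).

```python
import math


def scheduled_lr(epoch, base_lr=0.001):
    """Learning rate for a given epoch (hypothetical helper mirroring the chain above)."""
    # Number of thresholds (30, 60, ..., 180) that the epoch has strictly exceeded
    steps = min(max(epoch - 1, 0) // 30, 6)
    return base_lr / (10 ** steps)


for epoch in (0, 30, 31, 60, 61, 181, 500):
    print(epoch, scheduled_lr(epoch))
```

Epochs 0 through 30 keep the base rate of 0.001, epoch 31 is the first to use 0.0001, and beyond epoch 180 the rate stays at one millionth of the base rate.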

Step 3: Write functions to save and evaluate the model


from torch.autograd import Variable


def save_models(epoch):
    # Save the model weights (state_dict) for the given epoch
    torch.save(model.state_dict(), "cifar10model_{}.model".format(epoch))
    print("Checkpoint saved")


def test():
    model.eval()
    test_acc = 0.0
    for i, (images, labels) in enumerate(test_loader):

        if cuda_avail:
            images = Variable(images.cuda())
            labels = Variable(labels.cuda())

        # Predict classes using images from the test set
        outputs = model(images)
        _, prediction = torch.max(outputs.data, 1)

        # .item() converts the count of correct predictions to a Python number
        test_acc += torch.sum(prediction == labels.data).item()

    # Compute the average accuracy over all 10000 test images
    test_acc = test_acc / 10000

    return test_acc

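To evaluate the accuracy of the model on the test set, we iterate over the test loader. At each step, we move the images and labels to the GPU, if one is available, and wrap them in a Variable. We pass the images into the model to get predictions. We pick the class with the highest score and compare it with the actual class to count the correct predictions. Finally, we return the average accuracy.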

Step 4: Write the training function


def train(num_epochs):
    best_acc = 0.0

    for epoch in range(num_epochs):
        model.train()
        train_acc = 0.0
        train_loss = 0.0
        for i, (images, labels) in enumerate(train_loader):
            # If a GPU is available, move the images and labels to the GPU
            if cuda_avail:
                images = Variable(images.cuda())
                labels = Variable(labels.cuda())

            # Clear all accumulated gradients
            optimizer.zero_grad()
            # Predict classes using images from the training set
            outputs = model(images)
            # Compute the loss from the predictions and the actual labels
            loss = loss_fn(outputs, labels)
            # Backpropagate the loss
            loss.backward()

            # Adjust the parameters according to the computed gradients
            optimizer.step()

            # loss.item() returns the batch loss as a Python float
            train_loss += loss.item() * images.size(0)
            _, prediction = torch.max(outputs.data, 1)

            train_acc += torch.sum(prediction == labels.data).item()

        # Call the learning rate adjustment function
        adjust_learning_rate(epoch)

        # Compute the accuracy and loss over all 50000 training images
        train_acc = train_acc / 50000
        train_loss = train_loss / 50000

        # Evaluate on the test set
        test_acc = test()

        # Save the model if the test accuracy is higher than the best so far
        if test_acc > best_acc:
            save_models(epoch)
            best_acc = test_acc

        # Print the metrics
        print("Epoch {}, Train Accuracy: {} , TrainLoss: {} , Test Accuracy: {}".format(epoch, train_acc, train_loss, test_acc))

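Although the training function above is commented, some parts may still be confusing. Let's explain in detail what is going on.

First, we loop over the loader for the training set: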


for i, (images,labels) in enumerate(train_loader):
 
Next, if a GPU is available, we move the images and labels to the GPU:
if cuda_avail:
    images = Variable(images.cuda())
    labels = Variable(labels.cuda())

The next line clears all currently accumulated gradients:

optimizer.zero_grad()

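This is important because the network weights are adjusted based on the gradients accumulated for each batch. The gradients must be reset to 0 for every new batch, so that images from previous batches do not propagate gradients into the new batch.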
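The accumulation behavior is easy to observe on a toy example. The sketch below uses a two-element parameter and an SGD optimizer, both chosen only for illustration, and shows that calling backward() twice without clearing doubles the stored gradient, while zero_grad() gives a fresh start.

```python
import torch
from torch.optim import SGD

# A toy "model": a single parameter vector
w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = SGD([w], lr=0.1)

# First backward pass: d/dw of sum(w^2) is 2w = [2, 4]
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without clearing: gradients add up to [4, 8]
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Clearing first gives the correct gradient for the new pass again
optimizer.zero_grad()
(w ** 2).sum().backward()
after_reset = w.grad.clone()

print(first.tolist(), accumulated.tolist(), after_reset.tolist())
```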

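In the next steps, we pass the images into the model. The model returns predictions, and we then pass the predictions and the actual labels into the loss function.

We call loss.backward() to propagate the gradients, and then call optimizer.step() to update the model parameters according to the propagated gradients.

These are the main steps of training.

The remaining code computes the metrics: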


train_loss += loss.item() * images.size(0)
_, prediction = torch.max(outputs.data, 1)
 
train_acc += torch.sum(prediction == labels.data).item()

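Here we retrieve the actual loss value and obtain the class with the highest score. Finally, we count the correct predictions in each batch and add them to the running train_acc.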
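The accuracy bookkeeping can be tried on a hand-made batch. In the sketch below, the scores and labels are invented for illustration: four samples, three classes, and one deliberately wrong prediction.

```python
import torch

# Model scores for 4 samples and 3 classes (made-up values)
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.2, 0.3],
                        [0.0, 0.1, 3.0],
                        [2.0, 1.0, 0.5]])
labels = torch.tensor([1, 0, 2, 1])

# torch.max along dimension 1 returns (max values, indices); the indices are the predicted classes
_, prediction = torch.max(outputs, 1)

# Count the matches and convert the 0-dim tensor to a Python number
correct = torch.sum(prediction == labels).item()
accuracy = correct / labels.size(0)

print(prediction.tolist(), correct, accuracy)
```

The last sample is predicted as class 0 while its label is 1, so 3 of 4 predictions are correct.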

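More importantly, we keep track of the best accuracy: if the current test accuracy is higher than our best so far, we call the function that saves the model.

Full code on GitHub: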

https://gist.github.com/johnolafenwa/96b3322aabb61d4d36fd870a77f02aa3

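After running this code for 35 epochs, you should get an accuracy of over 90%.

Inference with the saved model

Once the model is trained, it can be used to run inference on new images.

The steps for running inference are as follows:

Define and instantiate the same model you built during training
Load the saved checkpoint into the model
Pick an image from the file system
Pass the image through the model and retrieve the highest prediction
Convert the predicted class index into a class name

We will illustrate this with the SqueezeNet model with pretrained ImageNet weights. This lets us pick almost any image and get a prediction for it.

Torchvision provides predefined models covering most mainstream architectures.

First, import all the required packages and classes, and create an instance of the SqueezeNet model.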


# Import the required packages
import torch
import torch.nn as nn
from torchvision.transforms import transforms
from torch.autograd import Variable
from torchvision.models import squeezenet1_1
import requests
import shutil
from io import open
import os
from PIL import Image
import json
 
 
model = squeezenet1_1(pretrained=True)
model.eval()

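Note that in the code above, by setting pretrained to True, the SqueezeNet model is downloaded the first time you run the function. The model is only 4.7 MB in size.

Next, create a prediction function as follows: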


def predict_image(image_path):
    print("Prediction in progress")
    image = Image.open(image_path)

    # Define transformations for the image (note that ImageNet models are trained with image size 224)
    transformation = transforms.Compose([
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

    ])

    # Preprocess the image
    image_tensor = transformation(image).float()

    # Add an extra batch dimension, since PyTorch treats all images as batches
    image_tensor = image_tensor.unsqueeze_(0)

    # The model above was left on the CPU, so the input tensor stays on the CPU too

    # Turn the input into a Variable
    input = Variable(image_tensor)

    # Predict the class of the image
    output = model(input)

    index = output.data.numpy().argmax()

    return index

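The code above contains the same components we used when training and evaluating the model. See the comments in the code for details.

Finally, we run the prediction in the main section of the script. We download an image from the web and save it to disk. We also download the class map that maps every class index to its actual class name. This is needed because the model returns the index of the predicted class, according to how the class names were encoded, and the actual class name is then looked up in the index-to-class map.

After that, we run the prediction function on the saved image and use the saved class map to get the correct class name.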
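The lookup itself is a plain dictionary access. The sketch below assumes the class map follows the layout implied by the expression class_map[str(index)][1] used in this article, that is, a JSON object whose keys are class indices as strings and whose values are [identifier, class name] pairs. The two entries are a tiny excerpt written inline for illustration; the real file has 1000 entries.

```python
import json

# Inline excerpt standing in for the downloaded class index file (assumed layout)
class_map_text = '{"0": ["n01440764", "tench"], "1": ["n01443537", "goldfish"]}'
class_map = json.loads(class_map_text)

# Pretend the model returned class index 1
index = 1

# JSON object keys are strings, so the integer index has to be converted first
prediction = class_map[str(index)][1]
print("Predicted Class ", prediction)
```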


if __name__ == "__main__":
 
    imagefile = "image.png"
 
    imagepath = os.path.join(os.getcwd(), imagefile)
    # Download image if it doesn't exist
    if not os.path.exists(imagepath):
        data = requests.get(
            "https://github.com/OlafenwaMoses/ImageAI/raw/master/images/3.jpg", stream=True)
 
        with open(imagepath, "wb") as file:
            shutil.copyfileobj(data.raw, file)
 
        del data
 
    index_file = "class_index_map.json"
 
    indexpath = os.path.join(os.getcwd(), index_file)
    # Download class index if it doesn't exist
    if not os.path.exists(indexpath):
        data = requests.get('https://github.com/OlafenwaMoses/ImageAI/raw/master/imagenet_class_index.json')
 
        with open(indexpath, "w", encoding="utf-8") as file:
            file.write(data.text)
 
    class_map = json.load(open(indexpath))
 
    # Run the prediction function and obtain the predicted class index
    index = predict_image(imagepath)

    prediction = class_map[str(index)][1]

    print("Predicted Class ", prediction)

Here is the complete code for the inference process:


# Import needed packages
import torch
import torch.nn as nn
from torchvision.transforms import transforms
import matplotlib.pyplot as plt
import numpy as np
from torch.autograd import Variable
from torchvision.models import squeezenet1_1
import torch.functional as F
import requests
import shutil
from io import open
import os
from PIL import Image
import json
 
""" Instantiate the model; this downloads the 4.7 MB SqueezeNet the first time it is called.
To use your own model, re-define your trained network and load the weights as below
 
checkpoint = torch.load("pathtosavemodel")
model = SimpleNet(num_classes=10)
 
 
model.load_state_dict(checkpoint)
model.eval()
"""
 
 
model = squeezenet1_1(pretrained=True)
model.eval()
 
 
def predict_image(image_path):
    print("Prediction in progress")
    image = Image.open(image_path)
 
    # Define transformations for the image, should (note that imagenet models are trained with image size 224)
    transformation = transforms.Compose([
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
 
    ])
 
    # Preprocess the image
    image_tensor = transformation(image).float()
 
    # Add an extra batch dimension since pytorch treats all images as batches
    image_tensor = image_tensor.unsqueeze_(0)
 
    # The model above was left on the CPU, so the input tensor stays on the CPU too
 
    # Turn the input into a Variable
    input = Variable(image_tensor)
 
    # Predict the class of the image
    output = model(input)
 
    index = output.data.numpy().argmax()
 
    return index
 
 
if __name__ == "__main__":
 
    imagefile = "image.png"
 
    imagepath = os.path.join(os.getcwd(), imagefile)
    # Download image if it doesn't exist
    if not os.path.exists(imagepath):
        data = requests.get(
            "https://github.com/OlafenwaMoses/ImageAI/raw/master/images/3.jpg", stream=True)
 
        with open(imagepath, "wb") as file:
            shutil.copyfileobj(data.raw, file)
 
        del data
 
    index_file = "class_index_map.json"
 
    indexpath = os.path.join(os.getcwd(), index_file)
    # Download class index if it doesn't exist
    if not os.path.exists(indexpath):
        data = requests.get('https://github.com/OlafenwaMoses/ImageAI/raw/master/imagenet_class_index.json')
 
        with open(indexpath, "w", encoding="utf-8") as file:
            file.write(data.text)
 
    class_map = json.load(open(indexpath))
 
    # Run the prediction function and obtain the predicted class index
    index = predict_image(imagepath)
 
    prediction = class_map[str(index)][1]
 
    print("Predicted Class ", prediction)

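The sample image used above comes from ImageAI. If you want to run inference with a network you built yourself, such as the SimpleNet we built earlier, you only need to replace the model loading part: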


checkpoint = torch.load("pathtosavemodel")
model = SimpleNet(num_classes=10)
 
 
model.load_state_dict(checkpoint)
model.eval()

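Note that if your model was trained on ImageNet, your num_classes must be 1000 instead of 10.

All other parts of the code stay the same, with one difference: if we predict with a model trained on CIFAR-10, then in the transformations we need to change transforms.CenterCrop(224) to transforms.Resize(32).

However, if your model was trained on ImageNet, no change is needed.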
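The save and load round trip can be tried without training anything. The sketch below uses a tiny nn.Linear layer as a stand-in for SimpleNet and an in-memory buffer instead of a checkpoint file; both are assumptions made to keep the example self-contained.

```python
import io

import torch
import torch.nn as nn

# Stand-in for a trained network
model = nn.Linear(4, 2)

# Save only the weights (the state_dict), as save_models does, here into memory
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Re-create the same architecture, then load the saved weights into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))
restored.eval()  # switch to evaluation mode before inference

print(torch.equal(model.weight, restored.weight))
```

Because only the weights are stored, the class that defines the architecture has to be available and instantiated with the same arguments before load_state_dict is called.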

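Conclusion

In this article, we covered how to build an image classifier with PyTorch, and how to use the trained model to make predictions on other data.

For the differences between PyTorch and TensorFlow, see this article of ours:

https://zhuanlan.zhihu.com/p/37102973

Reference:
https://heartbeat.fritz.ai/basics-of-image-classification-with-pytorch-2f8973c51864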

Cool丶白鼠
