PyTorch Classification Network: Python Training, Testing, and Model Conversion, plus Windows LibTorch C++ Deployment

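Part 1: Preface

Hello everyone, it has been a while since I last updated this blog. This post shows how to take a network trained in PyTorch, convert the model, and deploy it in pure C++ on Windows for real-world use. The PyTorch model is not ported to another deep learning framework; the C++ deployment is done with PyTorch's own LibTorch.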

Part 2: Environment

PyTorch version: Torch-1.4.0-cu101

LibTorch version: LibTorch-1.4.0-cu101

Anaconda version: Anaconda3-Python3.6

GPU: GTX1080

Visual Studio version: VS2017 (used to compile the LibTorch code)

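Part 3: Open-Source Link

GITHUB (if this helps you, please consider giving it a star to encourage the author to keep going down the open-source road ^v^)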

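Part 4: Training Data

The Kaggle Dogs vs. Cats dataset is used as the example. The data is laid out as follows:

1. Training data paths: data/train/cat/*.jpg, data/train/dog/*.jpg

2. Validation data paths: data/val/cat/*.jpg, data/val/dog/*.jpg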
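
The layout above is the one-subfolder-per-class structure that the training script reads with `datasets.ImageFolder`. As a minimal sketch, assuming the raw Kaggle images sit in one flat folder and are named like `cat.0.jpg` and `dog.0.jpg` (an assumption about the download, not something stated in this post), a helper such as the hypothetical `split_dataset` below could arrange them into that layout. The validation ratio and the seed are illustrative values.

```python
import os
import random
import shutil


def split_dataset(src_dir, dst_dir, val_ratio=0.1, seed=0):
    """Copy flat '<class>.<id>.jpg' files into dst_dir/{train,val}/<class>/.

    Hypothetical helper: the file naming scheme and val_ratio are assumptions.
    Returns a dict {(split, class_name): number_of_files}.
    """
    by_class = {}
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith('.jpg'):
            continue
        class_name = name.split('.')[0]  # 'cat.0.jpg' -> 'cat'
        by_class.setdefault(class_name, []).append(name)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for class_name, names in by_class.items():
        rng.shuffle(names)
        n_val = int(len(names) * val_ratio)
        splits = {'val': names[:n_val], 'train': names[n_val:]}
        for split, split_names in splits.items():
            out_dir = os.path.join(dst_dir, split, class_name)
            os.makedirs(out_dir, exist_ok=True)
            for name in split_names:
                shutil.copy(os.path.join(src_dir, name),
                            os.path.join(out_dir, name))
            counts[(split, class_name)] = len(split_names)
    return counts
```

Calling `split_dataset('raw', 'data')` would then produce `data/train/cat`, `data/train/dog`, `data/val/cat`, and `data/val/dog`, where `raw` is a placeholder for wherever the flat download lives.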

Part 5: Training Code

from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import datasets, models, transforms

import time
import os
import copy

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
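        # Read the current rate from the optimizer; scheduler.get_lr() is not meant to be called directly
        print('learning rate: {}'.format(optimizer.param_groups[0]['lr']))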
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                
            if phase == 'train':
                scheduler.step()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),data_transforms[x]) for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,shuffle=True, num_workers=0) for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# model_conv = torchvision.models.resnet18(pretrained=False)
# print(model_conv)
# for param in model_conv.parameters():
#     param.requires_grad = False

model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 2)

model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=25)

torch.save(model_ft, 'model.pkl')
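
With `StepLR(step_size=7, gamma=0.1)` and a base rate of 0.001, the learning rate in the script above is multiplied by 0.1 every 7 epochs. A small stand-alone sketch of that schedule in plain Python (the helper name `step_lr` is illustrative and not part of PyTorch):

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under step decay."""
    return base_lr * gamma ** (epoch // step_size)


if __name__ == '__main__':
    # Print the rate at the epochs where it changes, plus the last epoch
    for epoch in (0, 6, 7, 14, 21, 24):
        print('epoch {:2d}: lr = {:.0e}'.format(epoch, step_lr(0.001, epoch)))
```

Over the 25 epochs used here, the rate therefore drops three times: at epochs 7, 14, and 21.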

Part 6: Test Code

from __future__ import print_function, division

import os

import torch
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image

classes = ['cat', 'dog']
test_path = "data/val/"

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load the model once, outside the loop, instead of once per image
model = torch.load('model.pkl', map_location=device)
model = model.to(device)
model.eval()

# Same preprocessing as the 'val' transform in the training script
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

true_count = 0
all_count = 0

# Each subfolder of test_path is named after its class ('cat' or 'dog')
for test_dir in sorted(os.listdir(test_path)):
    test_dir_path = os.path.join(test_path, test_dir)
    for img_name in sorted(os.listdir(test_dir_path)):
        img_path = os.path.join(test_dir_path, img_name)
        print(img_path)

        # Force 3-channel RGB so grayscale or RGBA files do not break Normalize
        image = Image.open(img_path).convert('RGB')
        image_transformed = transform(image).unsqueeze(0)

        # Inference only: no gradients needed
        with torch.no_grad():
            output = model(image_transformed.to(device))
            output = F.softmax(output, dim=1)
            predict_value, predict_idx = torch.max(output, 1)

        if classes[predict_idx.item()] == test_dir:
            true_count += 1

        all_count += 1

print("acc: {}/{}={}".format(true_count, all_count, float(true_count) / float(all_count)))
# acc: 1966/2000=0.983
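
The test script turns the two raw network outputs (logits) into probabilities with softmax and takes the larger one as the prediction. A minimal plain-Python sketch of that step, using made-up logits purely for illustration (the helper names are not from the original code):

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, classes=('cat', 'dog')):
    """Return (class_name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return classes[idx], probs[idx]


if __name__ == '__main__':
    # Made-up logits, not real model output
    print(predict([0.3, 2.1]))
```

Because softmax preserves ordering, the predicted class is the same with or without it; softmax only matters when the probability itself is reported.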

Part 7: Model Conversion Code

"""
This python script converts the network into Script Module---CPU
"""
import torch

# Load the model saved by the training script (nothing is downloaded here)
model = torch.load("model.pkl",map_location='cpu')

model.eval()

example_input = torch.rand(1, 3, 224, 224)
script_module = torch.jit.trace(model, example_input)
script_module.save('model_cpu.pt')

#"""
#This python script converts the network into Script Module---GPU
#"""
#import torch
#
#device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
#
## Load the model saved by the training script (nothing is downloaded here)
#model = torch.load("model.pkl")
#
#model.eval()
#
#example_input = torch.rand(1, 3, 224, 224)
#script_module = torch.jit.trace(model, example_input.to(device))
#script_module.save('model_gpu.pt')
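
Before handing the `.pt` file to C++, it can help to confirm that the traced module reproduces the original model's output. The following is a minimal sketch, assuming a tiny stand-in network (`TinyNet` is hypothetical, used only so the example runs without `model.pkl`); the same comparison would apply to the ResNet-18 trained above.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the trained classifier (2 output classes)."""

    def __init__(self):
        super(TinyNet, self).__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        return self.fc(torch.flatten(x, 1))


def trace_and_check(model, example_input, path):
    """Trace `model`, save it to `path`, reload it, and compare outputs."""
    model.eval()
    traced = torch.jit.trace(model, example_input)
    traced.save(path)
    reloaded = torch.jit.load(path, map_location='cpu')
    with torch.no_grad():
        expected = model(example_input)
        actual = reloaded(example_input)
    return torch.allclose(expected, actual, atol=1e-6)


if __name__ == '__main__':
    out_path = os.path.join(tempfile.mkdtemp(), 'tiny_cpu.pt')
    print(trace_and_check(TinyNet(), torch.rand(1, 3, 224, 224), out_path))
```

If the comparison fails for a real model, a common cause is that the model was still in training mode when it was traced.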

Part 8: LibTorch C++ Implementation

#include <torch/torch.h>
#include <torch/script.h> // One-stop header.
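#include <cstdlib>  // system
#include <ctime>    // clock, clock_t
#include <iostream>
#include <memory>
#include <string>
#include <vector>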

#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

Mat pilResize(Mat &img, int size) {
	int imgWidth = img.cols;
	int imgHeight = img.rows;
	if ((imgWidth <= imgHeight && imgWidth == size) || (imgHeight <= imgWidth && imgHeight == size)) {
		return img;
	}
	Mat output;
	if (imgWidth < imgHeight) {
		int outWidth = size;
		int outHeight = int(size * imgHeight / (float)imgWidth);
		resize(img, output, Size(outWidth, outHeight));
	}
	else {
		int outHeight = size;
		int outWidth = int(size * imgWidth / (float)imgHeight);
		resize(img, output, Size(outWidth, outHeight));
	}

	return output;
}

Mat pilCropCenter(Mat &img, int output_size) {
	Rect imgRect;
	imgRect.x = int(round((img.cols - output_size) / 2.));
	imgRect.y = int(round((img.rows - output_size) / 2.));
	imgRect.width = output_size;
	imgRect.height = output_size;

	return img(imgRect).clone();
}

Mat setNorm(Mat &img) {
	Mat img_rgb;
	// imread returns BGR; the network was trained on RGB (PIL) images
	cvtColor(img, img_rgb, COLOR_BGR2RGB);

	Mat img_resize = pilResize(img_rgb, 256);
	Mat img_crop = pilCropCenter(img_resize, 224);

	Mat image_resized_float;
	img_crop.convertTo(image_resized_float, CV_32F, 1.0 / 255.0);

	return image_resized_float;
}

Mat setMean(Mat &image_resized_float) {
	// ImageNet per-channel statistics, in RGB order
	vector<float> mean = { 0.485f, 0.456f, 0.406f };
	// Named stddev rather than std so it cannot be confused with the std namespace
	vector<float> stddev = { 0.229f, 0.224f, 0.225f };

	vector<Mat> image_resized_split;
	split(image_resized_float, image_resized_split);
	for (size_t ch = 0; ch < image_resized_split.size(); ch++) {
		image_resized_split[ch] -= mean[ch];
		image_resized_split[ch] /= stddev[ch];
	}
	Mat image_resized_merge;
	merge(image_resized_split, image_resized_merge);

	return image_resized_merge;
}

int main() {
	torch::DeviceType device_type;
	if (torch::cuda::is_available()) {
		std::cout << "CUDA available! Test on GPU." << std::endl;
		device_type = torch::kCUDA;
	}
	else {
		std::cout << "Test on CPU." << std::endl;
		device_type = torch::kCPU;
	}
	torch::Device device(device_type);

	// Deserialize the ScriptModule from a file using torch::jit::load().
	torch::jit::script::Module model = torch::jit::load("model_cpu.pt");
	model.to(device);

	vector<string> classes = { "cat","dog" };

	string test_path = "val/dog/";
	vector<string> img_paths;
	glob(test_path, img_paths);

	int truth_count = 0;

	for (int i = 0; i < img_paths.size(); i++) {
		Mat img = imread(img_paths[i]);
		if (img.empty()) {
			// Skip files that OpenCV cannot decode
			continue;
		}

		clock_t start_t = clock();

		//norm
		Mat image_resized_float = setNorm(img);
		//mean
		Mat image_resized_merge = setMean(image_resized_float);

		auto img_tensor = torch::from_blob(image_resized_merge.data, { 224, 224, 3 }, torch::kFloat32);
		auto img_tensor_ = torch::unsqueeze(img_tensor, 0);
		img_tensor_ = img_tensor_.permute({ 0, 3, 1, 2 });

		// Create a vector of inputs.
		vector<torch::jit::IValue> inputs;
		inputs.push_back(img_tensor_.to(device));

		// Inference only: disable autograd bookkeeping for this forward pass
		torch::NoGradGuard no_grad;
		torch::Tensor prob = model.forward(inputs).toTensor();
		torch::Tensor output = torch::softmax(prob, 1);
		auto predict = torch::max(output, 1);

		//cout << "cost time:" << clock() - start_t << endl;

		cout << img_paths[i] << "\t";
		cout << "class: " << classes[get<1>(predict).item<int>()] <<
			", prob: " << get<0>(predict).item<float>() << endl;

		// test_path points at val/dog, so class index 1 ("dog") is the ground truth here
		if (get<1>(predict).item<int>() == 1) {
			truth_count++;
		}
	}

	cout << truth_count << "/" << img_paths.size() << endl;
	system("pause");

	return 0;
}
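
The arithmetic in `pilResize` and `pilCropCenter` can be checked without any image library. The sketch below mirrors the C++ formulas in plain Python for a hypothetical 500x375 input (the function names are illustrative). One detail worth knowing: C++ `round` rounds halves away from zero, while Python 3's built-in `round` rounds halves to even, so a crop offset computed with each can differ by one pixel when the margin is odd.

```python
import math


def resize_dims(width, height, size=256):
    """Output (width, height) after scaling the shorter side to `size`,
    truncating the longer side the same way the C++ pilResize does."""
    if width < height:
        return size, int(size * height / float(width))
    return int(size * width / float(height)), size


def round_half_away(x):
    """Mimic C++ round() for non-negative x (halves round up)."""
    return int(math.floor(x + 0.5))


def crop_origin(width, height, output_size=224, rounder=round_half_away):
    """Top-left corner (x, y) of the centered output_size x output_size crop."""
    x = int(rounder((width - output_size) / 2.0))
    y = int(rounder((height - output_size) / 2.0))
    return x, y
```

For the 500x375 example, `resize_dims(500, 375)` gives `(341, 256)`; `crop_origin(341, 256)` then gives `(59, 16)`, whereas `crop_origin(341, 256, rounder=round)` gives `(58, 16)`. If the Python side rounds with the built-in `round`, that one-pixel shift is a possible source of small differences between Python and C++ outputs.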

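Part 9: Notes

1. PyTorch (through torchvision) loads image data with PIL by default, which means RGB channel order. This is important!

2. The pretrained ResNet weights need to be loaded; otherwise the training results are noticeably worse!

3. The model saved by the training script is in pkl format (a pickled Python object). It must be converted to pt format (TorchScript) before C++ can load it with torch::jit::load.

4. The conversion can be done for CPU or for GPU, and the C++ code can load either result.

5. In the C++ implementation, the image read by OpenCV must be brought into the same form as the PIL image (RGB channel order, same resize and center crop); otherwise the input data is inconsistent and the test results are wrong!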
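
To make note 5 concrete: OpenCV's `imread` returns pixels as height x width x channel in BGR order with values from 0 to 255, while the Python pipeline feeds the network channel x height x width in RGB order, scaled to [0, 1] and normalized. A minimal NumPy sketch of that conversion follows; the function name `bgr_hwc_to_model_input` is illustrative, and the input is assumed to be already resized and cropped to 224x224.

```python
import numpy as np

# ImageNet per-channel statistics in RGB order, as used in the scripts above
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def bgr_hwc_to_model_input(img_bgr):
    """Convert a uint8 HxWx3 BGR image into a float32 1x3xHxW RGB batch."""
    rgb = img_bgr[:, :, ::-1].astype(np.float32) / 255.0  # BGR -> RGB, scale to [0, 1]
    rgb = (rgb - MEAN) / STD                               # per-channel normalization
    chw = np.transpose(rgb, (2, 0, 1))                     # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis])           # add the batch dimension
```

This is the same sequence the C++ code performs with `cvtColor`, `convertTo`, `setMean`, and `permute`, so it can serve as a reference when comparing tensors between the two sides.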

For any questions, please contact the author on QQ, the only contact channel: 2258205918 (username samylee)!
