A Step-by-Step Guide to Building a Model in PyTorch (GPU and CPU Versions)

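The example code in this article is the GPU version. For a CPU version, remove the device-related parts of the code yourself (or simply set device = torch.device('cpu')).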


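A fully connected neural network is essentially a multilayer perceptron (MLP). Below I build one from scratch. This covers the node parameters (weights and biases), the loss function, the optimization method, and the other pieces.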
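Written as equations, the network built below looks roughly like this. It has two hidden layers, and the notation follows the code. X is a mini-batch of B flattened images. W_1, W_2, W_3 and b_1, b_2, b_3 are the weights and biases. η is the learning rate, called lr in the code. The block shows the forward pass, the cross-entropy loss, and the SGD update:

```latex
\begin{aligned}
H_1 &= \operatorname{ReLU}(X W_1 + b_1), \qquad X \in \mathbb{R}^{B \times 784},\; H_1 \in \mathbb{R}^{B \times 128} \\
H_2 &= \operatorname{ReLU}(H_1 W_2 + b_2), \qquad H_2 \in \mathbb{R}^{B \times 128} \\
\hat{Y} &= \operatorname{softmax}(H_2 W_3 + b_3), \qquad
\operatorname{softmax}(o)_j = \frac{\exp(o_j)}{\sum_{k=1}^{10} \exp(o_k)} \\
\ell &= -\frac{1}{B} \sum_{i=1}^{B} \log \hat{Y}_{i,\, y_i} \\
\theta &\leftarrow \theta - \eta \, \frac{\partial \ell}{\partial \theta}, \qquad \theta \in \{W_1, b_1, W_2, b_2, W_3, b_3\}
\end{aligned}
```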

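Most models already reach fairly high classification accuracy on the classic MNIST dataset, so I use the Fashion-MNIST dataset for these experiments instead. It contains 10 classes of fashion products and is only a few tens of MB in size.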

When loading the datasets, we convert every sample to a tensor. transforms.ToTensor turns each image into a torch.float32 tensor with values in [0.0, 1.0].

1 Importing the Dataset

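## Import the dataset
import torch
import torchvision as tv
from torchvision import transforms
from IPython import display

# Render inline plots as SVG (deprecated in recent IPython versions in favor of
# matplotlib_inline.backend_inline.set_matplotlib_formats)
display.set_matplotlib_formats('svg')

# ToTensor converts each image to a float32 tensor of shape (1, 28, 28) with values in [0.0, 1.0]
mn_train = tv.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
mn_test = tv.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())

print(len(mn_train), len(mn_test))           # number of training / test samples
print(mn_train[0][0].shape, mn_train[0][1])  # image shape and label of the first sample
# Train on the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')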

The output of this code is:

60000 10000
torch.Size([1, 28, 28]) 9

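The output shows that each sample is a single-channel 28×28 image, hence the shape [1, 28, 28]. It also shows that the label of the first image is 9. Next, let's see what the data actually looks like.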
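Because the model is a fully connected network, each 1×28×28 image will later be flattened into a 784-element vector. The sketch below uses a random batch of 5 images; the batch size is arbitrary. It shows that view(-1, 784) does this flattening while keeping one row per image:

```python
import torch

# A fake batch of 5 single-channel 28x28 images, shaped like a DataLoader batch
x = torch.rand(5, 1, 28, 28)

flat = x.view(-1, 28 * 28)  # -1 lets PyTorch infer the batch dimension
print(flat.shape)            # torch.Size([5, 784])

# Row i of `flat` is image i read row by row
print(torch.equal(flat[2], x[2, 0].reshape(-1)))  # True
```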

2 Viewing the Data

## Plotting
%matplotlib inline
import matplotlib.pyplot as plt

# Fashion-MNIST class names, indexed by label (0-9)
labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
          'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
imgs, y = [], []
for i in range(10):  # take the first 10 training samples
    imgs.append(mn_train[i][0])
    y.append(mn_train[i][1])

titles = [labels[int(i)] for i in y]  # map numeric labels to class names

plt.rcParams['figure.figsize'] = (12, 12)
_, figs = plt.subplots(1, len(imgs))

for f, img, title in zip(figs, imgs, titles):
    f.imshow(img.view((28, 28)).numpy())  # drop the channel dimension for imshow
    f.set_title(title)
    f.axes.get_xaxis().set_visible(False)
    f.axes.get_yaxis().set_visible(False)
plt.show()

Output: [figure: the first ten training images, each titled with its class name]

Next, we use DataLoader to build a data iterator. For details, see this blog post, which explains in depth how to read data with PyTorch.

3 Reading the Data

## Read the data
from torch.utils.data import DataLoader
batch_size = 128

# Shuffle the training set every epoch; keep the test set in a fixed order
train_iter = DataLoader(mn_train, batch_size=batch_size, shuffle=True, num_workers=4)
test_iter = DataLoader(mn_test, batch_size=batch_size, shuffle=False, num_workers=4)
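The training set has 60,000 samples and batch_size = 128. One pass over train_iter therefore yields ceil(60000 / 128) = 469 batches. The last batch holds only 60000 - 468 × 128 = 96 samples, because DataLoader keeps the incomplete final batch by default (drop_last=False). The sketch below confirms this with a dummy TensorDataset of the same length. It uses zeros instead of real images and leaves num_workers at its default of 0:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset with as many samples as the Fashion-MNIST training set
dummy = TensorDataset(torch.zeros(60000, 1), torch.zeros(60000, dtype=torch.long))
loader = DataLoader(dummy, batch_size=128, shuffle=True)

sizes = [xb.shape[0] for xb, yb in loader]
print(len(loader), len(sizes))  # 469 469
print(sizes[0], sizes[-1])      # 128 96
```

This is why a loss that divides by a fixed batch_size would slightly misweight the last batch of each epoch.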

Next, we build the model.

4 Defining the Model Parameters

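Here we build a model with two hidden layers. There is a big pitfall involving device when defining the parameters. I fell into it myself and hope you won't. See this article for details.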

## Define the model parameters
import numpy as np

# 784 = 28*28 input pixels, two hidden layers of 128 units, 10 output classes
num_inputs, num_hiddens1, num_hiddens2, num_outputs = 784, 128, 128, 10

# Weights are drawn from N(0, 0.01^2) and biases start at 0. Each tensor is created
# directly on `device` with requires_grad=True, so it is a leaf whose .grad gets filled
w1 = torch.tensor(np.random.normal(0, 0.01, size=(num_inputs, num_hiddens1)),
                  dtype=torch.float, device=device, requires_grad=True)
b1 = torch.zeros(num_hiddens1, device=device, requires_grad=True)

w2 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens1, num_hiddens2)),
                  dtype=torch.float, device=device, requires_grad=True)
b2 = torch.zeros(num_hiddens2, device=device, requires_grad=True)

w3 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens2, num_outputs)),
                  dtype=torch.float, device=device, requires_grad=True)
b3 = torch.zeros(num_outputs, device=device, requires_grad=True)

params = [w1, b1, w2, b2, w3, b3]
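The article does not spell out the device pitfall it links to. The sketch below shows one common pitfall of this kind. It is an illustration only and not necessarily the one the author meant. The problem arises when you set requires_grad=True on a tensor and then move or cast it with .to(...). The result is a new, non-leaf tensor. Its .grad stays None after backward(), so SGD never updates it.

The sketch uses a dtype cast so that it also reproduces on a CPU-only machine. On the CPU, .to('cpu') alone returns the same tensor and hides the problem:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Problematic: requires_grad is set first, then .to() creates a NEW tensor.
# That tensor is the output of an operation, so it is NOT a leaf.
w_bad = torch.zeros(3, requires_grad=True).to(device=device, dtype=torch.float64)

# Safe: create the tensor directly with the target device and dtype.
w_good = torch.zeros(3, dtype=torch.float64, device=device, requires_grad=True)

(w_bad.sum() + w_good.sum()).backward()

print(w_bad.is_leaf, w_good.is_leaf)  # False True
print(w_good.grad)                    # all ones: the gradient reached the leaf
# w_bad.grad is None (PyTorch also warns when .grad of a non-leaf is read)
```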

5 Defining the Loss Function, Optimizer, Activation Function and Evaluation Function

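def cross_entropy(y_hat, y):
    # -log(probability of the true class) for each sample, divided by the ACTUAL
    # batch size (the last batch may be smaller than batch_size); .sum() over the
    # result therefore gives the mean loss of the batch
    los = - torch.log(y_hat.gather(1, y.view(-1, 1))) / y.shape[0]
    return los

def sgd(params, lr):
    # Plain mini-batch SGD: update the parameters in place without tracking gradients
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad

def relu(x):
    # Element-wise max(x, 0); works on any device
    return torch.clamp(x, min=0.0)

def evaluate_accuracy(y_hat, y):
    # Returns the NUMBER of correct predictions in the batch, not a ratio
    return (y_hat.argmax(dim=1) == y).float().sum().item()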
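The hand-checkable example below shows what gather does inside cross_entropy. It uses made-up probabilities for two samples and three classes; these are toy values, not model output. gather(1, y.view(-1, 1)) picks, for each row, the probability assigned to the true class. Take the negative log of those values and average it over the batch. The result should match PyTorch's built-in F.nll_loss applied to log-probabilities:

```python
import torch
import torch.nn.functional as F

# Toy predicted probabilities: 2 samples, 3 classes (each row sums to 1)
y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])
y = torch.tensor([2, 0])  # true class of each sample

picked = y_hat.gather(1, y.view(-1, 1))  # probability of the true class per row
print(picked.view(-1))                    # tensor([0.6000, 0.3000])

# Same formula as cross_entropy: -log(p_true) / batch size, then summed
manual = (-torch.log(picked) / y.shape[0]).sum()
builtin = F.nll_loss(torch.log(y_hat), y)  # mean reduction by default
print(torch.allclose(manual, builtin))     # True

# Count of correct predictions, as in evaluate_accuracy (argmax is [2, 2])
correct = (y_hat.argmax(dim=1) == y).float().sum().item()
print(correct)                             # 1.0
```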

6 Building the Model

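## Build the model

### Define softmax
def softmax(x):
    # Subtract the row-wise maximum before exponentiating to avoid overflow;
    # softmax is shift-invariant, so the result is unchanged
    x_exp = (x - x.max(dim=1, keepdim=True).values).exp()
    partition = x_exp.sum(dim=1, keepdim=True)
    return x_exp / partition  # broadcasting divides each row by its sum

def net(x):
    x = x.view(-1, num_inputs)            # flatten (B, 1, 28, 28) -> (B, 784)
    h1 = relu(torch.matmul(x, w1) + b1)   # first hidden layer
    h2 = relu(torch.matmul(h1, w2) + b2)  # second hidden layer
    output = softmax(torch.matmul(h2, w3) + b3)  # class probabilities

    return output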
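Computing softmax literally as exp(x) / sum(exp(x)) overflows once a logit is large. For example, exp(1000) is inf in float32, and inf / inf gives nan. Softmax does not change when the same constant is subtracted from every logit in a row. Subtracting the row maximum first therefore avoids the overflow. The comparison below uses made-up logits:

```python
import torch

def softmax_naive(x):
    x_exp = x.exp()
    return x_exp / x_exp.sum(dim=1, keepdim=True)

def softmax_stable(x):
    # Shift each row so its largest entry is 0; exp() then never exceeds 1
    x_exp = (x - x.max(dim=1, keepdim=True).values).exp()
    return x_exp / x_exp.sum(dim=1, keepdim=True)

logits = torch.tensor([[1000.0, 1001.0],
                       [1.0, 2.0]])

print(softmax_naive(logits))   # first row is nan
print(softmax_stable(logits))  # both rows are about [0.2689, 0.7311]
print(torch.allclose(softmax_stable(logits), torch.softmax(logits, dim=1)))  # True
```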

7 Training the Model

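Pay attention to the learning rate here: a learning rate that is too large will keep the model from converging. See this article for details.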
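The simplest possible case shows why a too-large learning rate prevents convergence. Consider gradient descent on f(w) = w², a toy function rather than the network's loss. The gradient is 2w, so each step computes w ← w - lr·2w = (1 - 2·lr)·w. The iterate shrinks when |1 - 2·lr| < 1, that is, when 0 < lr < 1. Otherwise it grows without bound:

```python
def gradient_descent(lr, w=1.0, steps=20):
    # Minimize f(w) = w**2, whose gradient is 2*w
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(gradient_descent(lr=0.1))  # about 0.0115 (= 0.8**20): converges toward 0
print(gradient_descent(lr=1.1))  # about 38.3 (= 1.2**20): diverges
```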

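epochs = 20
lr = 0.1

loss = cross_entropy

### Train the model
for epoch in range(epochs):
    train_ls = 0   # accumulated training loss
    train_acc = 0  # number of correct training predictions
    test_acc = 0   # number of correct test predictions
    n = 0          # number of training samples seen
    ### Training set
    for x, y in train_iter:
        x = x.to(device)
        y = y.to(device)
        y_hat = net(x)
        ls = loss(y_hat, y).sum()  # batch loss (cross_entropy already divides by the batch size)
        ls.backward()  # backpropagation
        sgd(params, lr)
        for param in params:
            param.grad.zero_()  # clear gradients before the next batch
        train_ls += ls.item()
        train_acc += evaluate_accuracy(y_hat, y)
        n += y.shape[0]

    ## Test set (no gradients are needed for evaluation)
    n_test = 0
    with torch.no_grad():
        for x, y in test_iter:
            x = x.to(device)
            y = y.to(device)
            y_hat = net(x)
            test_acc += evaluate_accuracy(y_hat, y)
            n_test += y.shape[0]
    # train_ls sums per-batch losses that are already averaged over the batch, so
    # train_ls/n is roughly the mean per-sample loss divided by the batch size
    print('epoch: {}, loss: {:.4f}, train acc: {:.4f}, test acc: {:.4f}'.format(epoch+1, train_ls/n, train_acc/n, test_acc/n_test))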
Output:

epoch: 1, loss: 0.0117, train acc: 0.4318, test acc: 0.6827
epoch: 2, loss: 0.0055, train acc: 0.7444, test acc: 0.7803
epoch: 3, loss: 0.0043, train acc: 0.8033, test acc: 0.8212
epoch: 4, loss: 0.0037, train acc: 0.8291, test acc: 0.8363
epoch: 5, loss: 0.0034, train acc: 0.8426, test acc: 0.8391
epoch: 6, loss: 0.0032, train acc: 0.8518, test acc: 0.8370
epoch: 7, loss: 0.0030, train acc: 0.8587, test acc: 0.8487
epoch: 8, loss: 0.0029, train acc: 0.8647, test acc: 0.8541
epoch: 9, loss: 0.0028, train acc: 0.8713, test acc: 0.8500
epoch: 10, loss: 0.0027, train acc: 0.8749, test acc: 0.8641
epoch: 11, loss: 0.0026, train acc: 0.8784, test acc: 0.8671
epoch: 12, loss: 0.0025, train acc: 0.8813, test acc: 0.8710
epoch: 13, loss: 0.0025, train acc: 0.8842, test acc: 0.8730
epoch: 14, loss: 0.0024, train acc: 0.8885, test acc: 0.8651
epoch: 15, loss: 0.0023, train acc: 0.8903, test acc: 0.8763
epoch: 16, loss: 0.0023, train acc: 0.8927, test acc: 0.8750
epoch: 17, loss: 0.0022, train acc: 0.8945, test acc: 0.8791
epoch: 18, loss: 0.0022, train acc: 0.8970, test acc: 0.8716
epoch: 19, loss: 0.0021, train acc: 0.8981, test acc: 0.8762
epoch: 20, loss: 0.0021, train acc: 0.8998, test acc: 0.8760
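For comparison, the same 784-128-128-10 architecture can be written with PyTorch's built-in modules. This sketch is an alternative and not the approach used in this article. There are two differences. First, nn.Linear uses its own default initialization rather than N(0, 0.01²). Second, nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, so the network has no final softmax layer. The random batch below only checks the shapes and confirms that one SGD step runs:

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Sequential(
    nn.Flatten(),                    # (B, 1, 28, 28) -> (B, 784)
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),              # logits; no softmax here
).to(device)

loss_fn = nn.CrossEntropyLoss()      # log-softmax + NLL, averaged over the batch
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One training step on a random batch standing in for Fashion-MNIST images
x = torch.rand(8, 1, 28, 28, device=device)
y = torch.randint(0, 10, (8,), device=device)

optimizer.zero_grad()
logits = model(x)
loss = loss_fn(logits, y)
loss.backward()
optimizer.step()

num_params = sum(p.numel() for p in model.parameters())
print(logits.shape, num_params)  # torch.Size([8, 10]) 118282
```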

8 Saving and Loading the Model

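Once the model is trained, we naturally want to save it so that it can be loaded later. For details, see my blog post >> A Complete Guide to Saving and Loading Models in PyTorch: Every Way to Read and Write Models with PyTorch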
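This article's parameters are plain tensors rather than an nn.Module. One simple option is to put them in a dict and use torch.save / torch.load. This is a sketch, not necessarily the method described in the linked post. The tensors below are random stand-ins for the six parameters. An in-memory buffer replaces a file so that the example runs anywhere. With a real file you would pass a path instead, such as the hypothetical 'mlp_params.pt'.

```python
import io
import torch

# Random stand-ins with the same shapes as w1, b1, w2, b2, w3, b3
shapes = {'w1': (784, 128), 'b1': (128,), 'w2': (128, 128),
          'b2': (128,), 'w3': (128, 10), 'b3': (10,)}
params = {name: torch.randn(shape, requires_grad=True) for name, shape in shapes.items()}

buffer = io.BytesIO()  # stands in for a file on disk
# detach() stores plain tensors without autograd history
torch.save({name: p.detach().cpu() for name, p in params.items()}, buffer)

buffer.seek(0)
loaded = torch.load(buffer)
# Move to the device FIRST, then mark as trainable, so each tensor is a leaf
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
restored = {name: t.to(device).requires_grad_() for name, t in loaded.items()}

print(all(torch.equal(restored[k].cpu(), params[k].detach()) for k in params))  # True
```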

miguemath
