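The example code in this post targets the GPU. If you need a CPU version, remove the device-related parts of the code.
A fully connected neural network is equivalent to a multilayer perceptron. Below I build one from scratch, including the node parameters, the loss function, and the optimization method.
Since most models already achieve good classification performance on the classic MNIST dataset, I use the Fashion-MNIST dataset for the experiments here. It contains 10 classes and is only a few tens of MB in size.
When loading the data we convert everything to Tensor with transforms.ToTensor, which turns the data into a torch.float32 tensor with values in [0.0, 1.0].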
1 Import the dataset
## Import the dataset
import torch
import torchvision as tv
from torchvision import transforms
from IPython import display
## Render figures as SVG. Newer IPython versions deprecate this call in favor of
## matplotlib_inline.backend_inline.set_matplotlib_formats.
display.set_matplotlib_formats('svg')
mn_train = tv.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
mn_test = tv.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())
print(len(mn_train), len(mn_test))
print(mn_train[0][0].shape, mn_train[0][1])
device = torch.device('cuda')  ## train on the GPU
This code prints:
60000 10000
torch.Size([1, 28, 28]) 9
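As the output shows, each sample in the dataset is a single-channel 28×28 image, and 9 is the label of the first image. Next, let's look at what the data actually looks like.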
2 View the data
## Plotting
%matplotlib inline
import matplotlib.pyplot as plt
class_names = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
               'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
imgs, y = [], []
## Collect the first ten training images and their integer labels
for i in range(10):
    imgs.append(mn_train[i][0])
    y.append(mn_train[i][1])
## Map the integer labels to readable class names
titles = [class_names[int(i)] for i in y]
plt.rcParams['figure.figsize'] = (12, 12)
_, figs = plt.subplots(1, len(imgs))
for f, img, title in zip(figs, imgs, titles):
    f.imshow(img.view((28, 28)).numpy())
    f.set_title(title)
    f.axes.get_xaxis().set_visible(False)
    f.axes.get_yaxis().set_visible(False)
plt.show()
Output (the first ten training images in a row, each titled with its class name):
Next we use DataLoader to build a data reader. For details, see this blog post, which gives a detailed introduction to reading data with PyTorch.
3 Read the data
## Read the data
from torch.utils.data import DataLoader
batch_size = 128
## Shuffle only the training set; the test set is read in a fixed order
train_iter = DataLoader(mn_train, batch_size=batch_size, shuffle=True, num_workers=4)
test_iter = DataLoader(mn_test, batch_size=batch_size, shuffle=False, num_workers=4)
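As a quick illustration of what a DataLoader yields, the sketch below iterates over a small synthetic dataset instead of Fashion-MNIST. The names `fake_x`, `fake_y`, and `loader` are made up for this example. It also shows that the last batch can be smaller than batch_size: the real training set has 60000 = 468 × 128 + 96 samples, so its final batch holds 96 images.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the real data: 300 fake 1x28x28 images with labels 0..9.
fake_x = torch.rand(300, 1, 28, 28)
fake_y = torch.randint(0, 10, (300,))

loader = DataLoader(TensorDataset(fake_x, fake_y), batch_size=128, shuffle=False)

# Each iteration yields one (images, labels) pair of batched tensors.
batch_sizes = [labels.shape[0] for _, labels in loader]
first_images, first_labels = next(iter(loader))

print(batch_sizes)          # [128, 128, 44]
print(first_images.shape)   # torch.Size([128, 1, 28, 28])
```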
Next, we build the model.
4 Define the model parameters
Here we build a model with two hidden layers. There is a major pitfall in how device is used when defining the parameters. I fell into it myself and hope you can avoid it. See this article for details.
## Define the model parameters
import numpy as np
num_inputs, num_hiddens1, num_hiddens2, num_outputs = 784, 128, 128, 10
## Weights are drawn from N(0, 0.01^2); biases start at zero.
## Every tensor is created directly on the target device with requires_grad=True.
w1 = torch.tensor(np.random.normal(0, 0.01, size=(num_inputs, num_hiddens1)),
                  dtype=torch.float, device=device, requires_grad=True)
b1 = torch.zeros(num_hiddens1, device=device, requires_grad=True)
w2 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens1, num_hiddens2)),
                  dtype=torch.float, device=device, requires_grad=True)
b2 = torch.zeros(num_hiddens2, device=device, requires_grad=True)
w3 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens2, num_outputs)),
                  dtype=torch.float, device=device, requires_grad=True)
b3 = torch.zeros(num_outputs, device=device, requires_grad=True)
params = [w1, b1, w2, b2, w3, b3]
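The linked article is not reproduced here, so the following is only a guess at the kind of pitfall meant. A common one is creating a tensor with requires_grad=True and converting it afterwards: the conversion is an operation recorded by autograd, so the result is no longer a leaf tensor and its .grad is not populated by backward(). The sketch uses a dtype conversion so that it behaves the same on any machine; calling .to(device) to move a tensor to a different device has the same effect. The names `w_direct` and `w_converted` are made up for this example.

```python
import torch

# Created directly with the final dtype: a leaf tensor that receives gradients.
w_direct = torch.zeros(3, 2, dtype=torch.float64, requires_grad=True)

# Created first, converted afterwards: the result of .to(...) is a non-leaf tensor.
w_converted = torch.zeros(3, 2, requires_grad=True).to(torch.float64)

print(w_direct.is_leaf, w_converted.is_leaf)  # True False

# Only the leaf tensor ends up with a populated .grad after backward().
w_direct.sum().backward()
print(w_direct.grad is not None)  # True
```

Passing device= and dtype= at creation time, as the parameter code in this section does, avoids the problem.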
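5 Build the loss function, optimization method, activation function, and evaluation function
def cross_entropy(y_hat, y, batch_size=batch_size):
    ## Pick the predicted probability of the true class for each sample,
    ## take the negative log, and scale by 1 / batch_size
    los = -torch.log(y_hat.gather(1, y.view(-1, 1))) / batch_size
    return los
def sgd(params, lr):
    ## Plain gradient descent step, applied in place
    for param in params:
        param.data -= lr * param.grad
def relu(x):
    ## Element-wise max(x, 0)
    return torch.max(input=x, other=torch.tensor(0.0).to(device))
def evaluate_accuracy(y_hat, y):
    ## Returns the number of correct predictions in the batch, not a ratio
    acc = 0
    acc += (y_hat.argmax(dim=1) == y).float().sum().item()
    return acc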
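The gather call in cross_entropy selects, for every row, the predicted probability of the true class. As a sanity check, the sketch below compares that hand-written formulation with PyTorch's built-in torch.nn.functional.cross_entropy, which takes raw logits instead of probabilities. The random `logits` and the labels `y` are made-up inputs for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # hypothetical scores: 4 samples, 10 classes
y = torch.tensor([1, 0, 9, 3])       # hypothetical true labels

# Hand-written version: softmax, pick the true-class probability, negative log, mean.
probs = torch.softmax(logits, dim=1)
picked = probs.gather(1, y.view(-1, 1))   # shape [4, 1]
manual = -torch.log(picked).mean()

# Built-in version: applies log-softmax to the logits internally.
builtin = F.cross_entropy(logits, y)

print(torch.allclose(manual, builtin, atol=1e-6))
```

For a full batch, summing the post's per-sample values (already divided by batch_size) gives the same mean.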
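6 Build the model
## Build the model
### Define softmax
def softmax(x):
    x_exp = x.exp()
    partition = x_exp.sum(dim=1, keepdim=True)
    return x_exp / partition  # broadcasting is applied here
def net(x):
    x = x.view(-1, num_inputs)  ### flatten each 1x28x28 image into a 784-dim vector
    h1 = relu(torch.matmul(x, w1) + b1)  ### first hidden layer
    h2 = relu(torch.matmul(h1, w2) + b2)  ### second hidden layer
    output = softmax(torch.matmul(h2, w3) + b3)  ### output layer: class probabilities
    return output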
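One caveat with a softmax written this way is that exp can overflow for large inputs. The sketch below is a common variant, not the code used in this post: it subtracts the row maximum before exponentiating, which leaves the result mathematically unchanged but keeps the exponentials finite. The function names and the input `x` are made up for this example.

```python
import torch

def softmax_naive(x):
    # Same formulation as the softmax defined in this section.
    x_exp = x.exp()
    return x_exp / x_exp.sum(dim=1, keepdim=True)

def softmax_stable(x):
    # Shifting each row by its maximum makes the largest exponent exp(0) = 1.
    shifted = x - x.max(dim=1, keepdim=True).values
    x_exp = shifted.exp()
    return x_exp / x_exp.sum(dim=1, keepdim=True)

x = torch.tensor([[1000.0, 1001.0, 1002.0]])
naive = softmax_naive(x)     # exp(1000) overflows to inf in float32, so inf / inf = nan
stable = softmax_stable(x)   # same as softmax of [0, 1, 2]

print(naive)
print(stable)
```

With weights initialized at a scale of 0.01 the logits in this post stay small, so the simple version works here; the stable form matters once the logits grow.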
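7 Train the model
Pay attention to the learning rate here: a learning rate that is too large will keep the model from converging. See this article for details.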
epochs = 20
lr = 0.1
loss = cross_entropy
### Train the model
for epoch in range(epochs):
    train_ls = 0  ## train loss
    train_acc = 0
    test_acc = 0
    n = 0
    ### Training set
    for x, y in train_iter:
        x = x.to(device)
        y = y.to(device)
        y_hat = net(x)
        ls = loss(y_hat, y).sum()
        ls.backward()  ## backpropagation
        sgd(params, lr)
        ## Reset the gradients so they do not accumulate across batches
        for param in params:
            param.grad.data.zero_()
        train_ls += ls.item()
        train_acc += evaluate_accuracy(y_hat, y)
        n += y.shape[0]
    ## Test set (no gradients are needed for evaluation)
    n_test = 0
    with torch.no_grad():
        for x, y in test_iter:
            x = x.to(device)
            y = y.to(device)
            y_hat = net(x)
            test_acc += evaluate_accuracy(y_hat, y)
            n_test += y.shape[0]
    ## Note: ls is already divided by batch_size, so train_ls / n is the
    ## per-sample loss scaled down by roughly another factor of batch_size
    print('epoch: {}, loss: {:.4f}, train acc: {:.4f}, test acc: {:.4f}'.format(epoch+1, train_ls/n, train_acc/n, test_acc/n_test))
Output:
epoch: 1, loss: 0.0117, train acc: 0.4318, test acc: 0.6827
epoch: 2, loss: 0.0055, train acc: 0.7444, test acc: 0.7803
epoch: 3, loss: 0.0043, train acc: 0.8033, test acc: 0.8212
epoch: 4, loss: 0.0037, train acc: 0.8291, test acc: 0.8363
epoch: 5, loss: 0.0034, train acc: 0.8426, test acc: 0.8391
epoch: 6, loss: 0.0032, train acc: 0.8518, test acc: 0.8370
epoch: 7, loss: 0.0030, train acc: 0.8587, test acc: 0.8487
epoch: 8, loss: 0.0029, train acc: 0.8647, test acc: 0.8541
epoch: 9, loss: 0.0028, train acc: 0.8713, test acc: 0.8500
epoch: 10, loss: 0.0027, train acc: 0.8749, test acc: 0.8641
epoch: 11, loss: 0.0026, train acc: 0.8784, test acc: 0.8671
epoch: 12, loss: 0.0025, train acc: 0.8813, test acc: 0.8710
epoch: 13, loss: 0.0025, train acc: 0.8842, test acc: 0.8730
epoch: 14, loss: 0.0024, train acc: 0.8885, test acc: 0.8651
epoch: 15, loss: 0.0023, train acc: 0.8903, test acc: 0.8763
epoch: 16, loss: 0.0023, train acc: 0.8927, test acc: 0.8750
epoch: 17, loss: 0.0022, train acc: 0.8945, test acc: 0.8791
epoch: 18, loss: 0.0022, train acc: 0.8970, test acc: 0.8716
epoch: 19, loss: 0.0021, train acc: 0.8981, test acc: 0.8762
epoch: 20, loss: 0.0021, train acc: 0.8998, test acc: 0.8760
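The remark at the start of this section about overly large learning rates can be illustrated on a toy problem. The sketch below is not the network from this post: it runs plain gradient descent on f(w) = w², whose gradient is 2w, so each step multiplies w by (1 − 2·lr). The function `gradient_descent` and its arguments are made up for this example.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimise f(w) = w**2 (gradient 2*w) with plain gradient descent."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w

small = gradient_descent(lr=0.1)   # each step multiplies w by 0.8, so w shrinks toward 0
large = gradient_descent(lr=1.1)   # each step multiplies w by -1.2, so |w| keeps growing

print(abs(small), abs(large))
```

After 20 steps the small learning rate has brought w close to the minimum at 0, while the large one has moved it far away and it keeps oscillating with growing magnitude.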
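9 Save and load the model
Once the model is trained, it should of course be saved so that it can be loaded and used again later. For details, see my blog post >> The Complete Guide to Saving and Loading Models in PyTorch: Everything About Reading and Writing Models with PyTorch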
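Because the model in this post is a plain list of tensors rather than an nn.Module, one possible approach (a minimal sketch, not necessarily what the linked guide recommends) is to store the parameters in a dictionary with torch.save and restore them with torch.load. The dictionary keys, the small stand-in tensors, and the file name mlp_params.pt are all made up for this example.

```python
import os
import tempfile
import torch

# Small stand-ins for trained parameters such as w1 and b1.
trained = {
    'w': torch.randn(3, 2, requires_grad=True),
    'b': torch.zeros(2, requires_grad=True),
}

# Hypothetical file location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'mlp_params.pt')

# Save detached CPU copies so the file does not depend on the autograd graph or the GPU.
torch.save({name: p.detach().cpu() for name, p in trained.items()}, path)

# Load the tensors back and re-enable gradients if training should continue.
loaded = torch.load(path, map_location='cpu')
restored = {name: t.clone().requires_grad_(True) for name, t in loaded.items()}

print(torch.equal(restored['w'], trained['w'].detach()))
```

To resume on the GPU, the restored tensors would be created on that device before setting requires_grad, for the same leaf-tensor reason that applies when defining the parameters.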