I. Getting Started with PyTorch
1. Resource links
- Complete PyTorch videos and materials
PyTorch NLP tutorial videos and hands-on projects:
Link: https://pan.baidu.com/s/1tbvLeUTPvxKy_gcVhiT8rw
Extraction code: hgkk
PyTorch tutorials:
http://pytorchchina.com/
https://pytorch-cn.readthedocs.io/zh/latest/
http://pytorch123.com/
Bilibili video link: https://www.bilibili.com/video/av66421076
2. Introduction to PyTorch
(1) c1_cpu和gpu速度對比.py (CPU vs. GPU speed comparison)
import time
import torch
print(torch.__version__)
# is_available is a function and must be called; without the parentheses
# print() shows the function object instead of True/False.
print(torch.cuda.is_available())
a = torch.randn(10000, 1000)
b = torch.randn(1000, 2000)
t0 = time.time()
c = torch.matmul(a, b)
t1 = time.time()
print(a.device, t1 - t0, c.norm(2))
device = torch.device('cuda')
a = a.to(device)
b = b.to(device)
# The first CUDA call also pays for one-time environment initialization,
# so it takes noticeably longer.
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))
# The second run reflects the usual GPU-accelerated speed.
# Note: CUDA kernels launch asynchronously, so for precise timings call
# torch.cuda.synchronize() before reading the clock.
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))
console:
1.2.0
True
cpu 0.484722375869751 tensor(141163.7188)
cuda:0 2.04582142829895 tensor(141564.8281, device='cuda:0')
cuda:0 0.007996797561645508 tensor(141564.8281, device='cuda:0')
(2) c2_autograd.py: differentiation with PyTorch
import torch
x = torch.tensor(1.)
# requires_grad=True tells PyTorch that gradients are needed for a, b and c
a = torch.tensor(1., requires_grad=True)
b = torch.tensor(2., requires_grad=True)
c = torch.tensor(3., requires_grad=True)
y = a ** 2 * x + b * x + c
print('before: ', a.grad, b.grad, c.grad)
# differentiate y with respect to a, b and c
grads = torch.autograd.grad(y, [a, b, c])
print('after: ', grads[0], grads[1], grads[2])
console:
before: None None None
after: tensor(2.) tensor(1.) tensor(1.)
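The gradients printed above can be checked by hand: with x = 1, y = a^2 * x + b * x + c gives dy/da = 2ax = 2, dy/db = x = 1 and dy/dc = 1. The following is a minimal sketch in plain Python (no PyTorch needed) that confirms these values with central finite differences. The helper names f and numeric_grad are illustrative and are not part of the original script.

```python
def f(a, b, c, x=1.0):
    """Same expression as in c2_autograd.py: y = a^2 * x + b * x + c."""
    return a ** 2 * x + b * x + c


def numeric_grad(func, args, index, eps=1e-6):
    """Central finite-difference estimate of d func / d args[index]."""
    hi = list(args)
    lo = list(args)
    hi[index] += eps
    lo[index] -= eps
    return (func(*hi) - func(*lo)) / (2 * eps)


point = (1.0, 2.0, 3.0)  # a, b, c as in the script
grads = [numeric_grad(f, point, i) for i in range(3)]
print([round(g, 4) for g in grads])  # approximately [2.0, 1.0, 1.0]
```

Autograd computes the same derivatives exactly by applying the chain rule to the recorded operations, rather than by perturbing the inputs.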
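3. Tensors
Tensors
Tensors are similar to NumPy's ndarrays, with the addition that tensors can also run computations on a GPU.
# With this import, print must be called as a function with parentheses even on
# Python 2.x, just as in Python 3.x. Tip: in Python 2.x print is a statement and
# needs no parentheses, whereas Python 3.x requires them.
from __future__ import print_function
import torch
Construct a 5×3 matrix, uninitialized.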
x = torch.empty(5, 3)
print(x)
console:
# Output:
tensor(1.00000e-04 *
[[-0.0000, 0.0000, 1.5135],
[ 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000]])
Construct a randomly initialized matrix:
x = torch.rand(5, 3)
print(x)
console:
tensor([[ 0.6291, 0.2581, 0.6414],
[ 0.9739, 0.8243, 0.2276],
[ 0.4184, 0.1815, 0.5131],
[ 0.5533, 0.5440, 0.0718],
[ 0.2908, 0.1850, 0.5297]])
Construct a matrix filled with zeros and of dtype long:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)
console:
tensor([[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0]])
Construct a tensor directly from data:
x = torch.tensor([5.5, 3])
print(x)
console:
tensor([ 5.5000, 3.0000])
Create a tensor based on an existing tensor.
x = x.new_ones(5, 3, dtype=torch.double)
# new_* methods take in sizes
print(x)
x = torch.randn_like(x, dtype=torch.float)
# override dtype
print(x)
# result has the same size
console:
tensor([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]], dtype=torch.float64)
tensor([[-0.2183, 0.4477, -0.4053],
[ 1.7353, -0.0048, 1.2177],
[-1.1111, 1.0878, 0.9722],
[-0.7771, -0.2174, 0.0412],
[-2.1750, 1.3609, -0.3322]])
Get its size:
print(x.size())
console:
torch.Size([5, 3])
Note: torch.Size is in fact a tuple, so it supports all tuple operations.
Addition: syntax 1
y = torch.rand(5, 3)
print(x + y)
console:
tensor([[-0.1859, 1.3970, 0.5236],
[ 2.3854, 0.0707, 2.1970],
[-0.3587, 1.2359, 1.8951],
[-0.1189, -0.1376, 0.4647],
[-1.8968, 2.0164, 0.1092]])
Addition: syntax 2
print(torch.add(x, y))
console:
tensor([[-0.1859, 1.3970, 0.5236],
[ 2.3854, 0.0707, 2.1970],
[-0.3587, 1.2359, 1.8951],
[-0.1189, -0.1376, 0.4647],
[-1.8968, 2.0164, 0.1092]])
Addition: providing an output tensor as an argument
result = torch.empty(5, 3)
# store the result in result
torch.add(x, y, out=result)
print(result)
console:
tensor([[-0.1859, 1.3970, 0.5236],
[ 2.3854, 0.0707, 2.1970],
[-0.3587, 1.2359, 1.8951],
[-0.1189, -0.1376, 0.4647],
[-1.8968, 2.0164, 0.1092]])
Addition: in-place
# adds x to y
y.add_(x)
print(y)
console:
tensor([[-0.1859, 1.3970, 0.5236],
[ 2.3854, 0.0707, 2.1970],
[-0.3587, 1.2359, 1.8951],
[-0.1189, -0.1376, 0.4647],
[-1.8968, 2.0164, 0.1092]])
Note: any operation that mutates a tensor in place has the suffix _. For example, x.copy_(y) and x.t_() will change x.
You can use standard NumPy-like indexing:
print(x[:, 1])
console:
tensor([ 0.4477, -0.0048, 1.0878, -0.2174, 1.3609])
Resizing: to resize or reshape a tensor, use Tensor.view (called as x.view(...)):
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8) # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())
console:
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
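The -1 in x.view(-1, 8) is resolved from the total number of elements: 16 elements divided by 8 columns leaves 2 rows. The sketch below reproduces that arithmetic in plain Python. The function infer_view_shape is a hypothetical helper written for illustration and is not a PyTorch API.

```python
def infer_view_shape(numel, shape):
    """Resolve a single -1 in `shape` so the product equals `numel`."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = 1
    for s in shape:
        if s != -1:
            known *= s
    if -1 in shape:
        if known == 0 or numel % known != 0:
            raise ValueError("shape is incompatible with the number of elements")
        return tuple(numel // known if s == -1 else s for s in shape)
    if known != numel:
        raise ValueError("shape is incompatible with the number of elements")
    return tuple(shape)


numel = 4 * 4  # x = torch.randn(4, 4) holds 16 elements
print(infer_view_shape(numel, (16,)))    # (16,)
print(infer_view_shape(numel, (-1, 8)))  # (2, 8)
```

A request such as (-1, 5) is rejected because 16 is not divisible by 5, which matches the kind of error view raises for incompatible shapes.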
If you have a one-element tensor, use .item() to get its value as a Python number.
x = torch.randn(1)
print(x)
print(x.item())
console:
tensor([ 0.9422])
0.9422121644020081
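Converting between NumPy arrays and tensors
- A torch tensor and the NumPy array obtained from it with .numpy() share the same underlying memory (as does a tensor created with torch.from_numpy()), so modifying one also modifies the other.
- All tensors on the CPU except CharTensor support conversion to and from NumPy.
- Converting a torch tensor to a NumPy array
# convert the torch tensor to a NumPy array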
a = torch.ones(5)
b = a.numpy()
print(a)
print(b)
# after modifying the tensor in place, the associated NumPy array changes as well
a.add_(1)
print(a)
print(b)
console:
tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]
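- Converting a NumPy array to a torch tensor
import numpy as np
import torch
a = np.ones(5)
# Note: torch.Tensor(a) copies the data (and converts it to float32), so b does
# not follow later changes to a; use torch.from_numpy(a) to share memory.
b = torch.Tensor(a)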
np.add(a, 1, out=a)
print(a)
print(b)
console:
[2. 2. 2. 2. 2.]
tensor([1., 1., 1., 1., 1.])
- Tensors can be moved to any device with the .to method
if torch.cuda.is_available():
    x = torch.rand(5, 5)
    # a CUDA device object
    device = torch.device("cuda")
    # create a tensor directly on the GPU
    y = torch.ones_like(x, device=device)
    # or just use the string form: .to("cuda")
    x = x.to(device)
    z = x + y
    print(z)
    # .to can also change the dtype at the same time
    print(z.to("cpu", torch.double))
console:
tensor([[1.4933, 1.9654, 1.8140, 1.1782, 1.9465],
[1.4439, 1.9591, 1.0066, 1.6454, 1.2359],
[1.2481, 1.5360, 1.9592, 1.3101, 1.6361],
[1.0741, 1.6382, 1.2640, 1.9733, 1.7078],
[1.8020, 1.4749, 1.4589, 1.8869, 1.2460]], device='cuda:0')
tensor([[1.4933, 1.9654, 1.8140, 1.1782, 1.9465],
[1.4439, 1.9591, 1.0066, 1.6454, 1.2359],
[1.2481, 1.5360, 1.9592, 1.3101, 1.6361],
[1.0741, 1.6382, 1.2640, 1.9733, 1.7078],
[1.8020, 1.4749, 1.4589, 1.8869, 1.2460]], dtype=torch.float64)
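II. Autograd: Automatic Differentiation
1. Introduction
- The autograd package is at the core of all neural networks in PyTorch. We first introduce it briefly and then go on to train our first neural network. The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that the backward pass is defined by how your code runs, and every iteration can be different. Let's look at some examples with tensors and gradients.
- TENSOR
- torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on the tensor. When the computation is finished, you can call .backward() to have all gradients computed automatically. The gradient for this tensor is accumulated into its .grad attribute.
- To stop a tensor from tracking history, call .detach(), which detaches it from the computation history and prevents future computation from being tracked.
- To stop tracking history (and using memory), you can also wrap a code block in with torch.no_grad():. This is particularly helpful when evaluating a model: the model has trainable parameters with requires_grad=True for the training phase, but gradients are not needed during evaluation.
- One more class is very important for the autograd implementation: Function. Tensor and Function are interconnected and build up an acyclic graph that encodes the complete history of the computation. Each tensor has a .grad_fn attribute that references the Function that created it (for tensors created by the user, grad_fn is None).
- To compute derivatives, call Tensor.backward(). If the Tensor is a scalar (i.e. it holds a single element), backward() needs no arguments; if it has more elements, you must pass a gradient argument, a tensor of matching shape.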
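2. How do you use it?
- Track all operations on a Tensor: set the attribute requires_grad=True
- Compute all gradients automatically: call .backward()
- Stop tracking a Tensor:
- Option 1: call detach()
- Option 2: wrap the code in a with torch.no_grad(): block
Create a tensor and set requires_grad=True to track computation with it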
x = torch.ones(2, 2, requires_grad=True)
print(x)
# apply an operation to the tensor
y = x + 2
print(y)
# y was created as the result of an operation, so it has a grad_fn
print(y.grad_fn)
# apply more operations
z = y * y * 3
out = z.mean()
print(z, out)
console:
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
# each tensor has a .grad_fn attribute referencing the Function that created it
tensor([[3., 3.],
[3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x000001989F20D7B8>
tensor([[27., 27.],
[27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
.requires_grad_( … ) changes an existing tensor's requires_grad flag in place.
- A tensor's requires_grad flag defaults to False when it is not specified at creation.
import torch
a = torch.randn(2, 2)
a = ((a * .3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
console:
False
True
<SumBackward0 object at 0x000001B498D8D7B8>
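Gradients:
- backward is the entry point for backpropagation: calling backward on the node to be differentiated computes the gradients and stores them on the corresponding leaf nodes. backward takes an important argument, gradient (named grad_tensors in torch.autograd.backward), which can be omitted when the node holds a single scalar value. We now backpropagate; because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)). Calling backward() without this argument on a non-scalar tensor raises an error like the following: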
backward should be called only on a scalar (i.e, 1-element tensor) or with gradient w.r.t the variable
import torch
x = torch.ones(2, 2, requires_grad=True)
print(x)
# apply an operation to the tensor
y = x + 2
print(y)
# y was created as the result of an operation, so it has a grad_fn
print(y.grad_fn)
# apply more operations
z = y * y * 3
out = z.mean()
print(z)
print(out)
out.backward()
print(x.grad)
console:
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
tensor([[3., 3.],
[3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x00000210E6595080>
tensor([[27., 27.],
[27., 27.]], grad_fn=<MulBackward0>)
tensor(27., grad_fn=<MeanBackward0>)
tensor([[4.5000, 4.5000],
[4.5000, 4.5000]])
- Explanation of the principle:
- English explanation
- Chinese explanation:
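The value 4.5 in x.grad can be derived by hand. The following worked derivation is added for illustration and uses only the definitions of y, z and out from the code above, with every x_i equal to 1:

```latex
\begin{aligned}
out &= \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,y_i^2 = 3\,(x_i + 2)^2 \\
\frac{\partial\, out}{\partial x_i} &= \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2) \\
\left.\frac{\partial\, out}{\partial x_i}\right|_{x_i = 1} &= \frac{3}{2}\cdot 3 = 4.5
\end{aligned}
```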
An example of a vector-Jacobian product:
import torch
# Tensors created with requires_grad=True are tracked by autograd.
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(x)
print(y)
# In this case y is no longer a scalar. torch.autograd cannot compute the full
# Jacobian directly, but if we only want the vector-Jacobian product, we simply
# pass the vector to backward as an argument.
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
# You can also stop autograd from tracking history on tensors with
# requires_grad=True by wrapping the code in with torch.no_grad():
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)
console:
tensor([ 0.2052, 0.6057, -0.6355], requires_grad=True)
tensor([ 420.3014, 1240.4666, -1301.4670], grad_fn=<MulBackward0>)
tensor([2.0480e+02, 2.0480e+03, 2.0480e-01])
True
True
False
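In this example y is just x multiplied by a power of two, y = s * x, so the Jacobian is s times the identity matrix and y.backward(v) leaves s * v in x.grad. The sketch below replays the doubling loop in plain Python, using the rounded x values printed in the run above, to recover s and the resulting product. It is an illustration of the arithmetic, not of how autograd is implemented.

```python
import math

x = [0.2052, 0.6057, -0.6355]  # values of x printed in the run above
v = [0.1, 1.0, 0.0001]         # the vector passed to y.backward(v)

# Reproduce the doubling loop and track the overall scale factor s (y = s * x).
scale = 2.0
y = [scale * xi for xi in x]
while math.sqrt(sum(yi * yi for yi in y)) < 1000:
    scale *= 2.0
    y = [scale * xi for xi in x]

# For y = s * x the Jacobian is s * I, so the vector-Jacobian product is s * v.
vjp = [scale * vi for vi in v]
print(scale)  # 2048.0
print(vjp)
```

With s = 2048 the product is 204.8, 2048 and 0.2048, which agrees with the x.grad values 2.0480e+02, 2.0480e+03 and 2.0480e-01 in the console output.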
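III. Neural Networks in PyTorch
1. Introduction
- Neural networks can be constructed using the torch.nn package.
- Now that you have some understanding of autograd: nn depends on autograd to define models and differentiate them. An nn.Module contains layers and a method forward(input) that returns the output.
- For example, look at this network that classifies digit images:
- It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and finally gives the output.
- A typical training procedure for a neural network is as follows:
- 1. Define a neural network that has some learnable parameters
- 2. Iterate over a dataset of inputs
- 3. Process the input through the network
- 4. Compute the loss
- 5. Propagate gradients back into the network's parameters
- 6. Update the network's weights, typically using a simple update rule: weight = weight - learning_rate * gradient
2. Meaning of the neural network parameters
3. Defining the neural network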
# -*- coding: utf-8 -*-
"""
Neural Networks
===============
Neural networks can be constructed using the ``torch.nn`` package.
Now that you had a glimpse of ``autograd``, ``nn`` depends on
``autograd`` to define models and differentiate them.
An ``nn.Module`` contains layers, and a method ``forward(input)`` that
returns the ``output``.
For example, look at this network that classifies digit images:
.. figure:: /_static/img/mnist.png
:alt: convnet
convnet
It is a simple feed-forward network. It takes the input, feeds it
through several layers one after the other, and then finally gives the
output.
A typical training procedure for a neural network is as follows:
- Define the neural network that has some learnable parameters (or
weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
``weight = weight - learning_rate * gradient``
Define the network
------------------
Let’s define this network:
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = Net()
print(net)
print('---------------------')
########################################################################
# You just have to define the ``forward`` function, and the ``backward``
# function (where gradients are computed) is automatically defined for you
# using ``autograd``.
# You can use any of the Tensor operations in the ``forward`` function.
#
# The learnable parameters of a model are returned by ``net.parameters()``
params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight
print('---------------------')
########################################################################
# Let's try a random 32x32 input.
# Note: Expected input size to this net (LeNet) is 32x32. To use this net on
# the MNIST dataset, please resize the images from the dataset to 32x32.
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
print('---------------------')
########################################################################
# Zero the gradient buffers of all parameters and backprop with random
# gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
print('---------------------')
########################################################################
# .. note::
#
# ``torch.nn`` only supports mini-batches. The entire ``torch.nn``
# package only supports inputs that are a mini-batch of samples, and not
# a single sample.
#
# For example, ``nn.Conv2d`` will take in a 4D Tensor of
# ``nSamples x nChannels x Height x Width``.
#
# If you have a single sample, just use ``input.unsqueeze(0)`` to add
# a fake batch dimension.
#
# Before proceeding further, let's recap all the classes you’ve seen so far.
#
# **Recap:**
# - ``torch.Tensor`` - A *multi-dimensional array* with support for autograd
# operations like ``backward()``. Also *holds the gradient* w.r.t. the
# tensor.
# - ``nn.Module`` - Neural network module. *Convenient way of
# encapsulating parameters*, with helpers for moving them to GPU,
# exporting, loading, etc.
# - ``nn.Parameter`` - A kind of Tensor, that is *automatically
# registered as a parameter when assigned as an attribute to a*
# ``Module``.
# - ``autograd.Function`` - Implements *forward and backward definitions
# of an autograd operation*. Every ``Tensor`` operation, creates at
# least a single ``Function`` node, that connects to functions that
# created a ``Tensor`` and *encodes its history*.
#
# **At this point, we covered:**
# - Defining a neural network
# - Processing inputs and calling backward
#
# **Still Left:**
# - Computing the loss
# - Updating the weights of the network
#
# Loss Function
# -------------
# A loss function takes the (output, target) pair of inputs, and computes a
# value that estimates how far away the output is from the target.
#
# There are several different
# `loss functions <https://pytorch.org/docs/nn.html#loss-functions>`_ under the
# nn package .
# A simple loss is: ``nn.MSELoss`` which computes the mean-squared error
# between the input and the target.
#
# For example:
output = net(input)
target = torch.randn(10) # a dummy target, for example
target = target.view(1, -1) # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
print('---------------------')
########################################################################
# Now, if you follow ``loss`` in the backward direction, using its
# ``.grad_fn`` attribute, you will see a graph of computations that looks
# like this:
#
# ::
#
# input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
# -> view -> linear -> relu -> linear -> relu -> linear
# -> MSELoss
# -> loss
#
# So, when we call ``loss.backward()``, the whole graph is differentiated
# w.r.t. the loss, and all Tensors in the graph that has ``requires_grad=True``
# will have their ``.grad`` Tensor accumulated with the gradient.
#
# For illustration, let us follow a few steps backward:
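print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear
# The upstream tutorial labels the next node ReLU; in the run recorded in the
# console output it prints AccumulateGrad (the bias of the last Linear layer).
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])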
print('---------------------')
########################################################################
# Backprop
# --------
# To backpropagate the error all we have to do is to ``loss.backward()``.
# You need to clear the existing gradients though, else gradients will be
# accumulated to existing gradients.
#
#
# Now we shall call ``loss.backward()``, and have a look at conv1's bias
# gradients before and after the backward.
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
print('---------------------')
########################################################################
# Now, we have seen how to use loss functions.
#
# **Read Later:**
#
# The neural network package contains various modules and loss functions
# that form the building blocks of deep neural networks. A full list with
# documentation is `here <https://pytorch.org/docs/nn>`_.
#
# **The only thing left to learn is:**
#
# - Updating the weights of the network
#
# Update the weights
# ------------------
# The simplest update rule used in practice is the Stochastic Gradient
# Descent (SGD):
#
# ``weight = weight - learning_rate * gradient``
#
# We can implement this using simple python code:
#
# .. code:: python
#
# learning_rate = 0.01
# for f in net.parameters():
# f.data.sub_(f.grad.data * learning_rate)
#
# However, as you use neural networks, you want to use various different
# update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.
# To enable this, we built a small package: ``torch.optim`` that
# implements all these methods. Using it is very simple:
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
print('---------------------')
###############################################################
# .. Note::
#
# Observe how gradient buffers had to be manually set to zero using
# ``optimizer.zero_grad()``. This is because gradients are accumulated
# as explained in `Backprop`_ section.
console:
Net(
(conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
---------------------
10
torch.Size([6, 1, 5, 5])
---------------------
tensor([[-0.0588, -0.0427, -0.1616, 0.0437, 0.0163, 0.0543, -0.1478, -0.0592,
-0.0509, 0.0549]], grad_fn=<AddmmBackward>)
---------------------
---------------------
tensor(0.4980, grad_fn=<MseLossBackward>)
---------------------
<MseLossBackward object at 0x0000024FE0C6A8D0>
<AddmmBackward object at 0x0000024F8313D4A8>
<AccumulateGrad object at 0x0000024FE0C6A8D0>
---------------------
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-5.8280e-03, 1.1338e-02, 1.7925e-03, -6.9680e-07, 9.8157e-03,
2.1737e-03])
---------------------
---------------------
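You only have to define the forward function; the backward function (where gradients are computed) is defined for you automatically by autograd. You can use any tensor operation in the forward function.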
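The 16 * 5 * 5 input size of fc1 and the 32x32 input requirement both follow from the layer shapes. The sketch below traces the spatial size through the network in plain Python and counts the learnable parameters. It assumes the defaults used in the script (stride 1 and no padding for the convolutions, stride equal to the window size for max pooling). The helper conv_out is illustrative and is not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                  # expected input is 32x32
size = conv_out(size, kernel=5)            # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)  # max pool 2x2: 28 -> 14
size = conv_out(size, kernel=5)            # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)  # max pool 2x2: 10 -> 5
flat_features = 16 * size * size           # 16 channels out of conv2
print(size, flat_features)                 # 5 400

# Parameter count: weights plus biases for every layer.
layers = {
    "conv1": 1 * 6 * 5 * 5 + 6,
    "conv2": 6 * 16 * 5 * 5 + 16,
    "fc1": 400 * 120 + 120,
    "fc2": 120 * 84 + 84,
    "fc3": 84 * 10 + 10,
}
total_params = sum(layers.values())
print(total_params)  # 61706
```

Each of the five layers contributes one weight tensor and one bias tensor, which is why len(params) prints 10 in the console output, and fc1's in_features=400 matches flat_features.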
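4. Loss function
- A loss function takes an (output, target) pair of inputs and computes a value that estimates how far the output is from the target.
- There are several different loss functions in the nn package. A simple one is nn.MSELoss, which computes the mean squared error between the input and the target.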
output = net(input)
target = torch.randn(10) # a dummy target, for example
target = target.view(1, -1) # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
console:
tensor(0.6660, grad_fn=<MseLossBackward>)
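What nn.MSELoss computes with its default mean reduction is the average of the squared element-wise differences. A minimal plain-Python sketch of that formula follows. The function mse_loss and the numbers are made up for illustration and do not come from the run above.

```python
def mse_loss(output, target):
    """Mean of squared element-wise differences (the default 'mean' reduction)."""
    if len(output) != len(target):
        raise ValueError("output and target must have the same length")
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


output = [0.5, -1.0, 2.0]  # made-up network outputs
target = [0.0, -1.0, 1.0]  # made-up targets
loss = mse_loss(output, target)
print(loss)  # (0.25 + 0.0 + 1.0) / 3
```

This is also why the target in the script is reshaped with view(1, -1): output and target must line up element by element.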
- Now, if you follow loss in the backward direction using its .grad_fn attribute, you will see a computation graph that looks like this:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
-> view -> linear -> relu -> linear -> relu -> linear
-> MSELoss
-> loss
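- So when we call loss.backward(), the whole graph is differentiated with respect to the loss, and every tensor in the graph that has requires_grad=True will have its .grad tensor accumulated with the gradient.
- For illustration, let us follow a few steps backward:
print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear
# labelled ReLU in the upstream tutorial; this run prints AccumulateGrad (the bias of the last Linear layer)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])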
console:
<MseLossBackward object at 0x0000019183144C18>
<AddmmBackward object at 0x0000019183144D30>
<AccumulateGrad object at 0x0000019183144D30>
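5. Backpropagation
- To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though, otherwise the new gradients will be accumulated into the existing ones.
- Now we call loss.backward() and look at the gradients of conv1's bias before and after the backward pass.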
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
console:
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-5.8280e-03, 1.1338e-02, 1.7925e-03, -6.9680e-07, 9.8157e-03,
2.1737e-03])
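- Now we have seen how to use loss functions. The only thing left is to update the network's parameters. The simplest update rule is stochastic gradient descent (SGD):
weight = weight - learning_rate * gradient
- We can implement this rule with plain Python:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
- However, as you use neural networks you will want to use different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, PyTorch provides a small package, torch.optim, that implements all these methods. Using it is very simple: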
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
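To see the update rule weight = weight - learning_rate * gradient converge, it helps to run it on a problem small enough to follow by hand. The sketch below applies it to a toy one-parameter loss (w - 3)^2 in plain Python. The loss, the starting point and the learning rate are assumptions chosen for illustration and have nothing to do with the network above.

```python
def grad(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)


weight = 0.0
learning_rate = 0.1
for step in range(100):
    # in this toy case the whole zero_grad / backward / step cycle
    # collapses to a single application of the update rule
    weight = weight - learning_rate * grad(weight)

print(round(weight, 4))  # 3.0
```

Each step shrinks the distance to the minimum at w = 3 by a factor of 0.8, so the weight settles there. optim.SGD applies the same rule to every parameter tensor of the network.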
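IV. CIFAR10 Image Classification
1. Introduction
- Generally, when you have to deal with image, text, audio, or video data, you can use standard Python packages that load the data into a NumPy array, and then convert this array into a torch.*Tensor.
- For images, packages such as Pillow and OpenCV are useful
- For audio, packages such as scipy and librosa
- For text, either raw Python or Cython based loading, or NLTK and SpaCy
- Specifically for vision, there is a package called torchvision, which has data loaders for common datasets such as Imagenet, CIFAR10, and MNIST, and data transformers for images, namely torchvision.datasets and torch.utils.data.DataLoader.
- This provides a huge convenience and avoids writing boilerplate code.
- For this tutorial we will use the CIFAR10 dataset. It has ten classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR-10 are of size 3×32×32, i.e. 3 RGB color channels of 32×32 pixels each.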
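2. Function reference
- make_grid tiles several images into a single image. Its padding argument sets the width, in pixels, of the border between neighbouring sub-images.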
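The effect of padding is easiest to see in the size of the resulting grid. The sketch below computes that size in plain Python, assuming make_grid's documented defaults nrow=8 (images per row) and padding=2, and assuming the border is added once per tile plus once more at the far edge. The helper grid_size is hypothetical and only mirrors the layout arithmetic as understood here; it does not call torchvision.

```python
import math


def grid_size(n_images, height, width, nrow=8, padding=2):
    """Return (grid_height, grid_width) in pixels for a tiled image grid."""
    cols = min(nrow, n_images)
    rows = math.ceil(n_images / cols)
    grid_h = rows * (height + padding) + padding
    grid_w = cols * (width + padding) + padding
    return grid_h, grid_w


print(grid_size(4, 32, 32))             # (36, 138)
print(grid_size(4, 32, 32, padding=0))  # (32, 128)
```

For a batch of four 32x32 CIFAR10 images this gives a single row 36 pixels high and 138 pixels wide.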
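3. Training an image classifier
- We will do the following steps in order:
- Load and normalize the CIFAR10 training and test datasets using torchvision
- Define a convolutional neural network
- Define a loss function
- Train the network on the training data
- Test the network on the test data
The complete script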
# -*- coding: utf-8 -*-
"""
Training a Classifier
=====================
This is it. You have seen how to define neural networks, compute loss and make
updates to the weights of the network.
Now you might be thinking,
What about data?
----------------
Generally, when you have to deal with image, text, audio or video data,
you can use standard python packages that load data into a numpy array.
Then you can convert this array into a ``torch.*Tensor``.
- For images, packages such as Pillow, OpenCV are useful
- For audio, packages such as scipy and librosa
- For text, either raw Python or Cython based loading, or NLTK and
SpaCy are useful
Specifically for vision, we have created a package called
``torchvision``, that has data loaders for common datasets such as
Imagenet, CIFAR10, MNIST, etc. and data transformers for images, viz.,
``torchvision.datasets`` and ``torch.utils.data.DataLoader``.
This provides a huge convenience and avoids writing boilerplate code.
For this tutorial, we will use the CIFAR10 dataset.
It has the classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’,
‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR-10 are of
size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.
.. figure:: /_static/img/cifar10.png
:alt: cifar10
cifar10
Training an image classifier
----------------------------
We will do the following steps in order:
1. Load and normalizing the CIFAR10 training and test datasets using
``torchvision``
2. Define a Convolutional Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data
1. Loading and normalizing CIFAR10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using ``torchvision``, it’s extremely easy to load CIFAR10.
"""
import torch
import torchvision
import torchvision.transforms as transforms
########################################################################
# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors of normalized range [-1, 1].
transform = transforms.Compose(
[transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
########################################################################
# Let us show some of the training images, for fun.
import matplotlib.pyplot as plt
import numpy as np
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
# get some random training images
dataiter = iter(trainloader)
# next(dataiter) works across PyTorch versions; newer releases no longer
# provide dataiter.next()
images, labels = next(dataiter)
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
########################################################################
# 2. Define a Convolutional Neural Network
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Copy the neural network from the Neural Networks section before and modify it to
# take 3-channel images (instead of 1-channel images as it was defined).
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
########################################################################
# 3. Define a Loss function and optimizer
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Let's use a Classification Cross-Entropy loss and SGD with momentum.
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
########################################################################
# 4. Train the network
# ^^^^^^^^^^^^^^^^^^^^
#
# This is when things start to get interesting.
# We simply have to loop over our data iterator, and feed the inputs to the
# network and optimize.
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
########################################################################
# 5. Test the network on the test data
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# We have trained the network for 2 passes over the training dataset.
# But we need to check if the network has learnt anything at all.
#
# We will check this by predicting the class label that the neural network
# outputs, and checking it against the ground-truth. If the prediction is
# correct, we add the sample to the list of correct predictions.
#
# Okay, first step. Let us display an image from the test set to get familiar.
dataiter = iter(testloader)
images, labels = next(dataiter)  # next() works across PyTorch versions
# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
########################################################################
# Okay, now let us see what the neural network thinks these examples above are:
outputs = net(images)
########################################################################
# The outputs are energies for the 10 classes.
# Higher the energy for a class, the more the network
# thinks that the image is of the particular class.
# So, let's get the index of the highest energy:
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
for j in range(4)))
########################################################################
# The results seem pretty good.
#
# Let us look at how the network performs on the whole dataset.
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
########################################################################
# That looks waaay better than chance, which is 10% accuracy (randomly picking
# a class out of 10 classes).
# Seems like the network learnt something.
#
# Hmmm, what are the classes that performed well, and the classes that did
# not perform well:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):  # 4 is the batch size set in the DataLoader
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
########################################################################
# Okay, so what next?
#
# How do we run these neural networks on the GPU?
#
# Training on GPU
# ----------------
# Just like how you transfer a Tensor on to the GPU, you transfer the neural
# net onto the GPU.
#
# Let's first define our device as the first visible cuda device if we have
# CUDA available:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assume that we are on a CUDA machine, then this should print a CUDA device:
print(device)
########################################################################
# The rest of this section assumes that `device` is a CUDA device.
#
# Then these methods will recursively go over all modules and convert their
# parameters and buffers to CUDA tensors:
#
# .. code:: python
#
# net.to(device)
#
#
# Remember that you will have to send the inputs and targets at every step
# to the GPU too:
#
# .. code:: python
#
# inputs, labels = inputs.to(device), labels.to(device)
#
# Why dont I notice MASSIVE speedup compared to CPU? Because your network
# is realllly small.
#
# **Exercise:** Try increasing the width of your network (argument 2 of
# the first ``nn.Conv2d``, and argument 1 of the second ``nn.Conv2d`` –
# they need to be the same number), see what kind of speedup you get.
#
# **Goals achieved**:
#
# - Understanding PyTorch's Tensor library and neural networks at a high level.
# - Train a small neural network to classify images
#
# Training on multiple GPUs
# -------------------------
# If you want to see even more MASSIVE speedup using all of your GPUs,
# please check out :doc:`data_parallel_tutorial`.
#
# Where do I go next?
# -------------------
#
# - :doc:`Train neural nets to play video games </intermediate/reinforcement_q_learning>`
# - `Train a state-of-the-art ResNet network on imagenet`_
# - `Train a face generator using Generative Adversarial Networks`_
# - `Train a word-level language model using Recurrent LSTM networks`_
# - `More examples`_
# - `More tutorials`_
# - `Discuss PyTorch on the Forums`_
# - `Chat with other users on Slack`_
#
# .. _Train a state-of-the-art ResNet network on imagenet: https://github.com/pytorch/examples/tree/master/imagenet
# .. _Train a face generator using Generative Adversarial Networks: https://github.com/pytorch/examples/tree/master/dcgan
# .. _Train a word-level language model using Recurrent LSTM networks: https://github.com/pytorch/examples/tree/master/word_language_model
# .. _More examples: https://github.com/pytorch/examples
# .. _More tutorials: https://github.com/pytorch/tutorials
# .. _Discuss PyTorch on the Forums: https://discuss.pytorch.org/
# .. _Chat with other users on Slack: https://pytorch.slack.com/messages/beginner/
console:
Files already downloaded and verified
Files already downloaded and verified
plane cat deer horse
[1, 2000] loss: 2.286
[1, 4000] loss: 1.921
[1, 6000] loss: 1.731
[1, 8000] loss: 1.616
[1, 10000] loss: 1.568
[1, 12000] loss: 1.484
[2, 2000] loss: 1.408
[2, 4000] loss: 1.382
[2, 6000] loss: 1.345
[2, 8000] loss: 1.322
[2, 10000] loss: 1.304
[2, 12000] loss: 1.271
Finished Training
GroundTruth: cat ship ship plane
Predicted: cat ship plane plane
Accuracy of the network on the 10000 test images: 55 %
Accuracy of plane : 63 %
Accuracy of car : 50 %
Accuracy of bird : 38 %
Accuracy of cat : 41 %
Accuracy of deer : 37 %
Accuracy of dog : 41 %
Accuracy of frog : 77 %
Accuracy of horse : 63 %
Accuracy of ship : 70 %
Accuracy of truck : 71 %
cuda:0
Downloading the dataset
- Load and normalize CIFAR10. With torchvision, loading the CIFAR10 data is very simple.
import torch
import torchvision
import torchvision.transforms as transforms
# ToTensor scales pixels to [0, 1]; Normalize then maps each channel to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
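To make the effect of transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) concrete, here is a minimal sketch in plain Python of the per-channel arithmetic it applies, (x - mean) / std, together with the inverse mapping. The helper names normalize and unnormalize are illustrative only and are not torchvision APIs.

```python
# Illustrative helpers (not torchvision APIs): the per-channel arithmetic
# behind transforms.Normalize(mean, std), i.e. (x - mean) / std.

def normalize(x, mean=0.5, std=0.5):
    """Map a pixel value in [0, 1] (as produced by ToTensor) to [-1, 1]."""
    return (x - mean) / std


def unnormalize(y, mean=0.5, std=0.5):
    """Inverse mapping; with mean = std = 0.5 this equals y / 2 + 0.5."""
    return y * std + mean


pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(p) for p in pixels]
restored = [unnormalize(v) for v in normalized]
print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

The inverse, y / 2 + 0.5, is the same expression the imshow helper in this walkthrough uses to undo the normalization before plotting.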
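Show some of the training images
import matplotlib.pyplot as plt
import numpy as np
# functions to show an image
def imshow(img):
    print(img)  # debug: the normalized tensor, values in [-1, 1]
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    print(npimg)  # debug: the unnormalized array, values in [0, 1]
    # matplotlib expects (height, width, channel); tensors are (channel, height, width)
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
# get some random training images
dataiter = iter(trainloader)
# use the built-in next(); the iterator's .next() method is not available in recent PyTorch releases
images, labels = next(dataiter)
# show images
imshow(torchvision.utils.make_grid(images, nrow=2, padding=1))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))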
console:
plane dog car cat
Define a convolutional neural network. Copy the network from the earlier neural-network section and modify it to take 3-channel images (it was previously defined for 1-channel images).
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # 3 input channels, 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)  # 10 output classes
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)  # flatten to (batch, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
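The 16 * 5 * 5 input size of fc1 follows from the image size and the layer settings. The sketch below retraces that arithmetic with the standard output-size formula for convolution and pooling, assuming 32x32 CIFAR10 inputs and no padding (the nn.Conv2d default); conv_out is a hypothetical helper written for this note, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


side = 32                            # CIFAR10 images are 3 x 32 x 32
side = conv_out(side, 5)             # conv1, 5x5 kernel        -> 28
side = conv_out(side, 2, stride=2)   # max pool, 2x2, stride 2  -> 14
side = conv_out(side, 5)             # conv2, 5x5 kernel        -> 10
side = conv_out(side, 2, stride=2)   # max pool, 2x2, stride 2  -> 5

flat_features = 16 * side * side     # conv2 produces 16 channels
print(side, flat_features)           # 5 400
```

If the kernel sizes or the input resolution change, the same calculation gives the new value to use in both nn.Linear and x.view.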
Define a loss function and an optimizer. We use classification cross-entropy as the loss function and SGD with momentum as the optimizer.
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
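As a rough illustration of what these two objects compute, the following plain-Python sketch evaluates cross-entropy for a single sample and applies one scalar momentum-SGD update. It assumes the update form v <- momentum * v + grad, p <- p - lr * v with no dampening or weight decay; cross_entropy and sgd_momentum_step are hypothetical names for this note, not PyTorch functions.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One scalar update: v <- momentum * v + grad; p <- p - lr * v."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Equal scores for all 10 classes give a loss of ln(10)
loss = cross_entropy([0.0] * 10, target=3)
print(round(loss, 3))  # 2.303

# Two steps with the same gradient: the second step is larger than the first
p, v = 1.0, 0.0
p, v = sgd_momentum_step(p, grad=2.0, velocity=v)
p, v = sgd_momentum_step(p, grad=2.0, velocity=v)
print(p, v)
```

A loss near ln(10) ≈ 2.303 corresponds to an untrained 10-class classifier that spreads its probability evenly, which is roughly where the first logged losses in the training output start. nn.CrossEntropyLoss additionally averages this per-sample value over the batch by default.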
Train the network. This is where things get interesting: we simply loop over the data iterator, feed the inputs to the network, and run the optimizer.
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
console:
[1, 2000] loss: 2.210
[1, 4000] loss: 1.856
[1, 6000] loss: 1.638
[1, 8000] loss: 1.578
[1, 10000] loss: 1.514
[1, 12000] loss: 1.471
[2, 2000] loss: 1.391
[2, 4000] loss: 1.380
[2, 6000] loss: 1.381
[2, 8000] loss: 1.333
[2, 10000] loss: 1.293
[2, 12000] loss: 1.299
Finished Training
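4、Testing on the test set
- Test the network on the test set. We have trained the network for 2 passes over the training dataset, but we still need to check whether it has learned anything. We do this by taking the class label that the network outputs as its prediction and comparing it with the ground-truth label. If the prediction is correct, we add the sample to the list of correct predictions. As a first step, let us display a batch of images from the test set to get familiar with it.
- torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor) returns the maximum values along dimension dim. torch.max(a, 0) returns the largest element of each column together with its index (the row index of that element within the column). Likewise, torch.max(outputs, 1) returns, for each row (each sample), the highest score and the index of the class that produced it.
dataiter = iter(testloader)
# use the built-in next(); the iterator's .next() method is not available in recent PyTorch releases
images, labels = next(dataiter)
# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
####################################
outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))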
console:
GroundTruth: cat ship ship plane
Predicted: cat ship plane plane
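The dim argument of torch.max is easy to mix up, so here is a small sketch on a 2x3 tensor of made-up scores (two samples, three classes). It shows that dim=1 yields one (value, index) pair per row, which is the form used for predictions above, while dim=0 yields one pair per column.

```python
import torch

# Made-up scores: 2 samples (rows) x 3 classes (columns)
scores = torch.tensor([[0.5, 2.0, -1.0],
                       [3.0, 0.5, 0.25]])

# dim=1: reduce across the columns, one result per row (per sample)
values, indices = torch.max(scores, 1)
print(values.tolist())    # [2.0, 3.0]
print(indices.tolist())   # [1, 0]

# dim=0: reduce across the rows, one result per column
col_values, col_indices = torch.max(scores, 0)
print(col_values.tolist())   # [3.0, 2.0, 0.25]
print(col_indices.tolist())  # [1, 0, 1]
```

With network outputs of shape (batch, 10), the indices returned for dim=1 can be used directly to look up names in classes.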
Check how the network performs on the whole test set
correct = 0
total = 0
# no gradients are needed for evaluation
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
console:
Accuracy of the network on the 10000 test images: 55 %
Check how the network performs on each class
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):  # 4 samples per batch (batch_size=4)
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
console:
Accuracy of plane : 63 %
Accuracy of car : 50 %
Accuracy of bird : 38 %
Accuracy of cat : 41 %
Accuracy of deer : 37 %
Accuracy of dog : 41 %
Accuracy of frog : 77 %
Accuracy of horse : 63 %
Accuracy of ship : 70 %
Accuracy of truck : 71 %
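Training on the GPU
- Move the network to the GPU, and remember that the inputs and targets must also be sent to the GPU at every step:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assume that we are on a CUDA machine, then this should print a CUDA device:
print(device)
net.to(device)
# Remember that the inputs and targets must also be sent to the GPU at every step:
inputs, labels = inputs.to(device), labels.to(device)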
console:
cuda:0
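The snippet above shows the two .to(device) calls in isolation. The sketch below places them inside a training loop so it is clearer where each call belongs. To keep it self-contained it uses hypothetical stand-ins: a tiny model instead of Net and a short list of random batches instead of trainloader. It falls back to the CPU when CUDA is unavailable.

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for `net` and `trainloader` so the sketch runs alone
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
batches = [(torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,)))
           for _ in range(5)]

# Move the model once, before building the optimizer
model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for inputs, labels in batches:
    # Batches arrive on the CPU, so each one is moved to the device
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()

print(next(model.parameters()).device, loss.item())
```

The model is moved a single time, whereas the inputs and labels are moved inside the loop because every new batch starts out on the CPU. The same applies to the evaluation loops: images and labels need the same .to(device) call before being passed to a network that lives on the GPU.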