(1) c1_cpu和gpu速度對比.py: CPU vs. GPU speed comparison

import time
import torch


a = torch.randn(10000, 1000)
b = torch.randn(1000, 2000)

t0 = time.time()
c = torch.matmul(a, b)
t1 = time.time()
print(a.device, t1 - t0, c.norm(2))

device = torch.device('cuda')
a = a.to(device)
b = b.to(device)

# The first run on CUDA includes one-time environment initialization, so it takes noticeably longer
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))

# The second run shows the normal GPU-accelerated speed
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))

<function is_available at 0x0000014EA22A72F0>
cpu 0.484722375869751 tensor(141163.7188)
cuda:0 2.04582142829895 tensor(141564.8281, device='cuda:0')
cuda:0 0.007996797561645508 tensor(141564.8281, device='cuda:0')
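
CUDA kernels are launched asynchronously, so the tiny second GPU time above may partly measure the kernel launch rather than the full multiplication. The sketch below is one way to time it more carefully. The helper name timed_matmul and the smaller matrix sizes are assumptions made for illustration, and the code falls back to the CPU when no GPU is present.

```python
import time
import torch

def timed_matmul(a, b):
    """Return (seconds, result) for one matmul, waiting for the GPU if one is used."""
    if a.is_cuda:
        torch.cuda.synchronize()   # finish pending GPU work before starting the clock
    start = time.perf_counter()
    c = torch.matmul(a, b)
    if a.is_cuda:
        torch.cuda.synchronize()   # wait until the kernel has actually finished
    return time.perf_counter() - start, c

# Hypothetical sizes, smaller than in the script above so the sketch runs quickly.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a = torch.randn(1000, 500, device=device)
b = torch.randn(500, 200, device=device)

timed_matmul(a, b)                 # warm-up run absorbs one-time initialization
elapsed, c = timed_matmul(a, b)
print(a.device, elapsed, tuple(c.shape))
```

The warm-up call plays the same role as the first CUDA run in the script: it pays the initialization cost so that the second measurement reflects steady-state speed.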

(2) c2_autograd.py: differentiation with PyTorch

import torch

x = torch.tensor(1.)
# requires_grad=True tells PyTorch that gradients are needed for a, b and c
a = torch.tensor(1., requires_grad=True)
b = torch.tensor(2., requires_grad=True)
c = torch.tensor(3., requires_grad=True)

y = a ** 2 * x + b * x + c

print('before: ', a.grad, b.grad, c.grad)
# use PyTorch to differentiate y with respect to a, b and c
grads = torch.autograd.grad(y, [a, b, c])
print('after: ', grads[0], grads[1], grads[2])

before:  None None None
after:  tensor(2.) tensor(1.) tensor(1.)


Tensors

Tensors are similar to NumPy's ndarrays, with the addition that Tensors can also run computations on a GPU.

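# After this import, print must be called with parentheses even on
# Python 2.x, just as on Python 3.x. Tip: in Python 2.x print is a
# statement and needs no parentheses, while in Python 3.x it is a function.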
from __future__ import print_function
import torch


x = torch.empty(5, 3)
print(x)

# Output:
tensor(1.00000e-04 *
       [[-0.0000,  0.0000,  1.5135],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000]])


x = torch.rand(5, 3)
print(x)

tensor([[ 0.6291,  0.2581,  0.6414],
        [ 0.9739,  0.8243,  0.2276],
        [ 0.4184,  0.1815,  0.5131],
        [ 0.5533,  0.5440,  0.0718],
        [ 0.2908,  0.1850,  0.5297]])

Construct a matrix filled with zeros and of dtype long.

x = torch.zeros(5, 3, dtype=torch.long)
print(x)

tensor([[ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0]])


x = torch.tensor([5.5, 3])
print(x)

tensor([ 5.5000,  3.0000])

Create a tensor based on an existing tensor.

x = x.new_ones(5, 3, dtype=torch.double)
# new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)
# override dtype; the result has the same size
print(x)

tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]], dtype=torch.float64)
tensor([[-0.2183,  0.4477, -0.4053],
        [ 1.7353, -0.0048,  1.2177],
        [-1.1111,  1.0878,  0.9722],
        [-0.7771, -0.2174,  0.0412],
        [-2.1750,  1.3609, -0.3322]])



print(x.size())

torch.Size([5, 3])

Note: torch.Size is in fact a tuple, so it supports all tuple operations.

Addition: syntax 1

y = torch.rand(5, 3)
print(x + y)

tensor([[-0.1859,  1.3970,  0.5236],
        [ 2.3854,  0.0707,  2.1970],
        [-0.3587,  1.2359,  1.8951],
        [-0.1189, -0.1376,  0.4647],
        [-1.8968,  2.0164,  0.1092]])

Addition: syntax 2

print(torch.add(x, y))

tensor([[-0.1859,  1.3970,  0.5236],
        [ 2.3854,  0.0707,  2.1970],
        [-0.3587,  1.2359,  1.8951],
        [-0.1189, -0.1376,  0.4647],
        [-1.8968,  2.0164,  0.1092]])

Addition: providing an output tensor as an argument

result = torch.empty(5, 3)
# write the result of the addition into result
torch.add(x, y, out=result)
print(result)

tensor([[-0.1859,  1.3970,  0.5236],
        [ 2.3854,  0.0707,  2.1970],
        [-0.3587,  1.2359,  1.8951],
        [-0.1189, -0.1376,  0.4647],
        [-1.8968,  2.0164,  0.1092]])

Addition: in-place

# adds x to y
y.add_(x)
print(y)

tensor([[-0.1859,  1.3970,  0.5236],
        [ 2.3854,  0.0707,  2.1970],
        [-0.3587,  1.2359,  1.8951],
        [-0.1189, -0.1376,  0.4647],
        [-1.8968,  2.0164,  0.1092]])

Note: any operation that mutates a tensor in place has an underscore (_) suffix. For example, x.copy_(y) and x.t_() will change x.

You can use standard NumPy-like indexing

print(x[:, 1])

tensor([ 0.4477, -0.0048,  1.0878, -0.2174,  1.3609])

Resizing: if you want to resize or reshape a tensor, you can use torch.view:

x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
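
As a small sketch of what view does under the hood: the result shares storage with the original tensor, and the -1 dimension is inferred from the element count. The values below are made up for illustration; the fallback to reshape for a transposed (non-contiguous) tensor is an addition that the notes above do not cover.

```python
import torch

x = torch.arange(16.).view(4, 4)
z = x.view(-1, 8)          # -1 is inferred as 16 / 8 = 2
z[0, 0] = 100.             # writes through to x, because a view shares storage
print(z.shape, x[0, 0].item())

t = x.t()                  # a transposed tensor is no longer contiguous
try:
    t.view(16)             # view cannot reinterpret this memory layout
    view_failed = False
except RuntimeError:
    view_failed = True
flat = t.reshape(16)       # reshape copies the data when a view is impossible
print(view_failed, flat.shape)
```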

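If you have a one-element tensor, use .item() to get its value as a Python number.

x = torch.randn(1)
print(x)
# x.item() returns the same value as a plain Python float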

tensor([ 0.9422])


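  • A torch tensor on the CPU and the NumPy array obtained from it share the same underlying memory, so modifying one also modifies the other
  • All tensors on the CPU except CharTensor support conversion to and from NumPy
  • Converting a torch tensor to a NumPy array
# convert the torch tensor to a NumPy array
a = torch.ones(5)
b = a.numpy()
print(a)
print(b)

# after the NumPy array is modified in place, the tensor tied to it changes as well
b += 1
print(a)
print(b)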

tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]
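  • Converting a NumPy array to a torch tensor
import numpy as np
import torch

a = np.ones(5)
# torch.Tensor(a) copies the float64 data into a new float32 tensor, so b does
# not follow later changes to a; torch.from_numpy(a) would share memory instead
b = torch.Tensor(a)
np.add(a, 1, out=a)
print(a)
print(b)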

[2. 2. 2. 2. 2.]
tensor([1., 1., 1., 1., 1.])
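
In the output above the tensor stays at 1 while the array becomes 2, which means this particular conversion made a copy. A minimal sketch contrasting a memory-sharing conversion with a copying one follows; the variable names shared and copied are chosen for illustration only.

```python
import numpy as np
import torch

a = np.ones(5)
shared = torch.from_numpy(a)   # shares memory with a and keeps dtype float64
copied = torch.tensor(a)       # always copies the data
np.add(a, 1, out=a)            # modify the NumPy array in place

print(shared)   # follows the change made to a
print(copied)   # still holds the original values
```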
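  • Tensors can be moved onto any device with the .to method
if torch.cuda.is_available():
    x = torch.rand(5, 5)
    # a CUDA device object
    device = torch.device("cuda")
    # create a tensor directly on the GPU
    y = torch.ones_like(x, device=device)
    # or just use the string form: .to("cuda")
    x = x.to(device)
    z = x + y
    print(z)
    # .to can also change the dtype in the same call
    print(z.to("cpu", torch.double))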
tensor([[1.4933, 1.9654, 1.8140, 1.1782, 1.9465],
        [1.4439, 1.9591, 1.0066, 1.6454, 1.2359],
        [1.2481, 1.5360, 1.9592, 1.3101, 1.6361],
        [1.0741, 1.6382, 1.2640, 1.9733, 1.7078],
        [1.8020, 1.4749, 1.4589, 1.8869, 1.2460]], device='cuda:0')
tensor([[1.4933, 1.9654, 1.8140, 1.1782, 1.9465],
        [1.4439, 1.9591, 1.0066, 1.6454, 1.2359],
        [1.2481, 1.5360, 1.9592, 1.3101, 1.6361],
        [1.0741, 1.6382, 1.2640, 1.9733, 1.7078],
        [1.8020, 1.4749, 1.4589, 1.8869, 1.2460]], dtype=torch.float64)



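  • The autograd package is at the core of all neural networks in PyTorch. We first introduce it briefly and then train our first neural network. The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that backpropagation is defined by how your code runs, and every iteration can be different. Let us look at some examples with tensors and gradients.
    • torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on the tensor. When the computation is finished, you can call .backward() to have all the gradients computed automatically. The gradient for this tensor is accumulated into its .grad attribute.
    • To stop a tensor from tracking history, you can call .detach(), which detaches it from the computation history and prevents future computation from being tracked.
    • To stop tracking history (and using memory), you can also wrap a code block in with torch.no_grad():. This is particularly helpful when evaluating a model: the model has trainable parameters with requires_grad=True for training, but no gradients are needed during evaluation.
    • One more class is very important for the autograd implementation: Function. Tensor and Function are interconnected and build up an acyclic graph that encodes the complete history of the computation. Every tensor has a .grad_fn attribute that references the Function that created it (for tensors created by the user, grad_fn is None).
    • If you want to compute derivatives, call Tensor.backward(). If the Tensor is a scalar (that is, it holds a single element), backward() needs no arguments; if it has more elements, you must pass a gradient argument that is a tensor of matching shape.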


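  • Track all operations on a Tensor: set the attribute requires_grad=True
  • Compute all gradients automatically: call .backward()
  • Stop tracking a Tensor:
    • Option 1: call detach()
    • Option 2: wrap the code block in with torch.no_grad():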
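
The two ways of stopping tracking listed above can be sketched as follows. This is a minimal illustration with made-up tensors, not code from the original notes.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 2                  # tracked: y has a grad_fn

d = y.detach()             # option 1: same values, cut off from the history
with torch.no_grad():      # option 2: nothing inside the block is tracked
    w = x * 2

print(y.requires_grad, d.requires_grad, w.requires_grad)
print(y.grad_fn is not None, d.grad_fn, w.grad_fn)
```

detach() is convenient for a single tensor, while the no_grad block suits whole sections of code such as an evaluation loop.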

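Create a tensor and set requires_grad=True to track computation with it

x = torch.ones(2, 2, requires_grad=True)
print(x)
# do an operation on the tensor
y = x + 2
# y was created as the result of an operation, so it has a grad_fn
print(y, y.grad_fn)
# do more operations on y
z = y * y * 3
out = z.mean()
print(z, out)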

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
# every tensor has a .grad_fn attribute that references the Function that created it
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>) <AddBackward0 object at 0x000001989F20D7B8>
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

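.requires_grad_( … ) changes an existing tensor's requires_grad flag in place.

  • A tensor's requires_grad flag defaults to False if it is not given when the tensor is created.
import torch

a = torch.randn(2, 2)
a = ((a * .3) / (a - 1))
# a was created without requires_grad, so turn tracking on before using it
a.requires_grad_(True)
b = (a * a).sum()
print(b.grad_fn)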

<SumBackward0 object at 0x000001B498D8D7B8>


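  • backward is the entry point of backpropagation: calling backward on the node to be differentiated computes the gradients and stores them on the corresponding leaf tensors. backward takes one important argument, gradient, which can be omitted when the node holds a single scalar value. Since out holds a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)). Calling backward on a non-scalar tensor without this argument raises the following error: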
backward should be called only on a scalar (i.e, 1-element tensor) or with gradient w.r.t the variable
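import torch
x = torch.ones(2, 2, requires_grad=True)
print(x)
# do an operation on the tensor
y = x + 2
# y was created as the result of an operation, so it has a grad_fn
print(y)
print(y.grad_fn)
# do more operations on y
z = y * y * 3
print(z)
out = z.mean()
print(out)
# backpropagate from the scalar out and print d(out)/dx
out.backward()
print(x.grad)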



tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x00000210E6595080>
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor(27., grad_fn=<MeanBackward0>)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
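  • How it works: out = (1/4) * sum_i z_i with z_i = 3 * (x_i + 2)^2, so d(out)/dx_i = (3/2) * (x_i + 2), which equals 4.5 at x_i = 1. This matches the x.grad printed above.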


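import torch
# autograd tracks tensors created with requires_grad=True
x = torch.randn(3, requires_grad=True)
print(x)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
# In this case y is no longer a scalar. torch.autograd cannot compute the full
# Jacobian directly, but if we only want the vector-Jacobian product, we simply
# pass the vector to backward as an argument.
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
# Wrapping code in with torch.no_grad() stops autograd from tracking history
# on tensors that have requires_grad=True.
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)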


tensor([ 0.2052,  0.6057, -0.6355], requires_grad=True)
tensor([  420.3014,  1240.4666, -1301.4670], grad_fn=<MulBackward0>)
tensor([2.0480e+02, 2.0480e+03, 2.0480e-01])
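
To see what passing a vector to backward computes, here is a small deterministic sketch. The function y = x ** 2 and the numbers are assumptions chosen so that the Jacobian is easy to write down by hand.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                            # the Jacobian dy/dx is diag(2 * x)
v = torch.tensor([0.1, 1.0, 10.0])

y.backward(v)                         # accumulates the product v^T J into x.grad
expected = v * 2 * x.detach()         # the same product written out by hand
print(x.grad, expected)
```

Because the Jacobian is diagonal here, the product reduces to an element-wise multiplication, which is also why the x.grad printed above is v scaled by a common factor.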



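  • Neural networks can be constructed using the torch.nn package.

  • Now that you have had a glimpse of autograd: nn depends on autograd to define models and differentiate them. An nn.Module contains layers and a method forward(input) that returns the output.

  • For example, look at this network that classifies digit images:

  • It is a simple feed-forward network: it takes the input, feeds it through several layers one after the other, and finally gives the output.

  • A typical training procedure for a neural network consists of the following steps:

    • 1. Define a neural network that has some learnable parameters
    • 2. Iterate over a dataset of inputs
    • 3. Process the input through the network
    • 4. Compute the loss
    • 5. Propagate gradients back into the network's parameters
    • 6. Update the network's parameters, typically using a simple update rule: weight = weight - learning_rate * gradient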
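
The six steps above can be sketched on a toy problem. Everything here (a one-dimensional linear model, the data ys = 3 * xs + 1, the learning rate and the step count) is a hypothetical setup chosen to keep the example short; it is not the network used later in these notes.

```python
import torch

# 1. a "network" with learnable parameters: y = w * x + b
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

xs = torch.linspace(-1, 1, 20)
ys = 3 * xs + 1                            # made-up targets
learning_rate = 0.1
losses = []

for step in range(200):                    # 2. iterate over the inputs
    pred = w * xs + b                      # 3. process the input
    loss = ((pred - ys) ** 2).mean()       # 4. compute the loss
    loss.backward()                        # 5. backpropagate the gradients
    with torch.no_grad():                  # 6. weight = weight - lr * gradient
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
    w.grad.zero_()                         # clear gradients for the next step
    b.grad.zero_()
    losses.append(loss.item())

print(w.item(), b.item(), losses[0], losses[-1])
```

The parameters should end up close to 3 and 1, and the last loss should be far below the first.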




# -*- coding: utf-8 -*-
Neural Networks

Neural networks can be constructed using the ``torch.nn`` package.

Now that you had a glimpse of ``autograd``, ``nn`` depends on
``autograd`` to define models and differentiate them.
An ``nn.Module`` contains layers, and a method ``forward(input)`` that
returns the ``output``.

For example, look at this network that classifies digit images:

.. figure:: /_static/img/mnist.png
   :alt: convnet


It is a simple feed-forward network. It takes the input, feeds it
through several layers one after the other, and then finally gives the
output.

A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or
  weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
  ``weight = weight - learning_rate * gradient``

Define the network

Let’s define this network:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

# You just have to define the ``forward`` function, and the ``backward``
# function (where gradients are computed) is automatically defined for you
# using ``autograd``.
# You can use any of the Tensor operations in the ``forward`` function.
# The learnable parameters of a model are returned by ``net.parameters()``

params = list(net.parameters())
print(params[0].size())  # conv1's .weight

# Let try a random 32x32 input
# Note: Expected input size to this net(LeNet) is 32x32. To use this net on
# MNIST dataset, please resize the images from the dataset to 32x32.

input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

# Zero the gradient buffers of all parameters and backprops with random
# gradients:

net.zero_grad()
out.backward(torch.randn(1, 10))

# .. note::
#     ``torch.nn`` only supports mini-batches. The entire ``torch.nn``
#     package only supports inputs that are a mini-batch of samples, and not
#     a single sample.
#     For example, ``nn.Conv2d`` will take in a 4D Tensor of
#     ``nSamples x nChannels x Height x Width``.
#     If you have a single sample, just use ``input.unsqueeze(0)`` to add
#     a fake batch dimension.
# Before proceeding further, let's recap all the classes you’ve seen so far.
# **Recap:**
#   -  ``torch.Tensor`` - A *multi-dimensional array* with support for autograd
#      operations like ``backward()``. Also *holds the gradient* w.r.t. the
#      tensor.
#   -  ``nn.Module`` - Neural network module. *Convenient way of
#      encapsulating parameters*, with helpers for moving them to GPU,
#      exporting, loading, etc.
#   -  ``nn.Parameter`` - A kind of Tensor, that is *automatically
#      registered as a parameter when assigned as an attribute to a*
#      ``Module``.
#   -  ``autograd.Function`` - Implements *forward and backward definitions
#      of an autograd operation*. Every ``Tensor`` operation, creates at
#      least a single ``Function`` node, that connects to functions that
#      created a ``Tensor`` and *encodes its history*.
# **At this point, we covered:**
#   -  Defining a neural network
#   -  Processing inputs and calling backward
# **Still Left:**
#   -  Computing the loss
#   -  Updating the weights of the network
# Loss Function
# -------------
# A loss function takes the (output, target) pair of inputs, and computes a
# value that estimates how far away the output is from the target.
# There are several different
# `loss functions <https://pytorch.org/docs/nn.html#loss-functions>`_ under the
# nn package .
# A simple loss is: ``nn.MSELoss`` which computes the mean-squared error
# between the input and the target.
# For example:

output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

# Now, if you follow ``loss`` in the backward direction, using its
# ``.grad_fn`` attribute, you will see a graph of computations that looks
# like this:
# ::
#     input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
#           -> view -> linear -> relu -> linear -> relu -> linear
#           -> MSELoss
#           -> loss
# So, when we call ``loss.backward()``, the whole graph is differentiated
# w.r.t. the loss, and all Tensors in the graph that has ``requires_grad=True``
# will have their ``.grad`` Tensor accumulated with the gradient.
# For illustration, let us follow a few steps backward:

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU

# Backprop
# --------
# To backpropagate the error all we have to do is to ``loss.backward()``.
# You need to clear the existing gradients though, else gradients will be
# accumulated to existing gradients.
# Now we shall call ``loss.backward()``, and have a look at conv1's bias
# gradients before and after the backward.

net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

# Now, we have seen how to use loss functions.
# **Read Later:**
#   The neural network package contains various modules and loss functions
#   that form the building blocks of deep neural networks. A full list with
#   documentation is `here <https://pytorch.org/docs/nn>`_.
# **The only thing left to learn is:**
#   - Updating the weights of the network
# Update the weights
# ------------------
# The simplest update rule used in practice is the Stochastic Gradient
# Descent (SGD):
#      ``weight = weight - learning_rate * gradient``
# We can implement this using simple python code:
# .. code:: python
#     learning_rate = 0.01
#     for f in net.parameters():
#         f.data.sub_(f.grad.data * learning_rate)
# However, as you use neural networks, you want to use various different
# update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.
# To enable this, we built a small package: ``torch.optim`` that
# implements all these methods. Using it is very simple:

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()     # compute the gradients that the optimizer will use
optimizer.step()    # Does the update

# .. Note::
#       Observe how gradient buffers had to be manually set to zero using
#       ``optimizer.zero_grad()``. This is because gradients are accumulated
#       as explained in `Backprop`_ section.


Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
torch.Size([6, 1, 5, 5])
tensor([[-0.0588, -0.0427, -0.1616,  0.0437,  0.0163,  0.0543, -0.1478, -0.0592,
         -0.0509,  0.0549]], grad_fn=<AddmmBackward>)
tensor(0.4980, grad_fn=<MseLossBackward>)
<MseLossBackward object at 0x0000024FE0C6A8D0>
<AddmmBackward object at 0x0000024F8313D4A8>
<AccumulateGrad object at 0x0000024FE0C6A8D0>
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-5.8280e-03,  1.1338e-02,  1.7925e-03, -6.9680e-07,  9.8157e-03,

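You only defined the forward function; the backward function is defined for you automatically through autograd. You can use any tensor operation in the forward function.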


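  • A loss function takes a pair of inputs, the model output and the target, and computes a value that estimates how far the output is from the target.
  • There are several different loss functions in the nn package. A simple one is nn.MSELoss, which computes the mean squared error between the input and the target.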
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)


tensor(0.6660, grad_fn=<MseLossBackward>)
  • Now, if you follow loss in the backward direction using its .grad_fn attribute, you will see a graph of computations that looks like this:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss
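  • So when we call loss.backward(), the whole graph is differentiated with respect to the loss, and every tensor in the graph that has requires_grad=True has the gradient accumulated into its .grad tensor.
  • For illustration, let us follow a few steps backward.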
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU

<MseLossBackward object at 0x0000019183144C18>
<AddmmBackward object at 0x0000019183144D30>
<AccumulateGrad object at 0x0000019183144D30>


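  • To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though, otherwise the new gradients are accumulated onto the existing ones.
  • Now we call loss.backward() and look at conv1's bias gradients before and after the backward pass.
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)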


conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-5.8280e-03,  1.1338e-02,  1.7925e-03, -6.9680e-07,  9.8157e-03,
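
Why the gradients must be cleared before each backward pass can be seen on a single scalar. The sketch below uses a made-up parameter w rather than the network above.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(w * 3).backward()
first = w.grad.item()      # gradient of 3 * w with respect to w

(w * 3).backward()         # a second backward adds to the stored gradient
second = w.grad.item()

w.grad.zero_()             # clearing restores the expected value
(w * 3).backward()
third = w.grad.item()

print(first, second, third)
```

net.zero_grad() and optimizer.zero_grad() perform this clearing for every parameter of a model at once.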
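  • We have now seen how to use loss functions. The only thing left is updating the weights of the network. The simplest update rule is stochastic gradient descent (SGD):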
weight = weight - learning_rate * gradient
  • We can implement this rule with simple Python code:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
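  • However, as you use neural networks you will want to try various update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, there is a small package, torch.optim, that implements all of these methods. Using it is very simple.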
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()     # compute the gradients that the optimizer will use
optimizer.step()    # Does the update
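
As a sanity check on how the optimizer relates to the manual rule, the following sketch compares one optim.SGD step (without momentum) against weight = weight - learning_rate * gradient computed by hand. The small nn.Linear model and the random data are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)                       # hypothetical tiny model
optimizer = optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(4, 3)
target = torch.randn(4, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), target)
loss.backward()

before = model.weight.detach().clone()        # weights before the update
grad = model.weight.grad.clone()
optimizer.step()

manual = before - 0.01 * grad                 # the manual SGD rule
print(torch.allclose(model.weight.detach(), manual))
```

With momentum or another optimizer such as Adam the update differs from this plain rule, which is the reason for using torch.optim instead of hand-written updates.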



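  • Generally, when you have to deal with image, text, audio or video data, you can use standard Python packages to load the data into a NumPy array and then convert this array into a torch.*Tensor

    • For images, packages such as Pillow and OpenCV are useful
    • For audio, packages such as scipy and librosa
    • For text, either raw Python or Cython based loading, or NLTK and SpaCy
  • Specifically for vision, there is a package called torchvision, which has data loaders for common datasets such as Imagenet, CIFAR10, MNIST, etc. and data transformers for images, namely torchvision.datasets and torch.utils.data.DataLoader.

  • This provides a huge convenience and avoids writing boilerplate code.

  • For this tutorial we use the CIFAR10 dataset. It has ten classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR-10 are of size 3x32x32, that is, 3 RGB color channels of 32x32 pixels each.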
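
To make the 3x32x32 layout concrete, the sketch below builds a random stand-in batch with that shape and applies, by hand, the per-channel normalization with mean 0.5 and standard deviation 0.5 that the loading code in this tutorial uses. The random data is an assumption; no real CIFAR-10 images are loaded here.

```python
import torch

batch = torch.rand(4, 3, 32, 32)     # stand-in for 4 RGB images with values in [0, 1]

mean = torch.tensor([0.5, 0.5, 0.5]).view(1, 3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(1, 3, 1, 1)

normalized = (batch - mean) / std    # maps [0, 1] to [-1, 1], channel by channel
restored = normalized * std + mean   # the same as img / 2 + 0.5 in imshow

print(tuple(batch.shape), normalized.min().item(), normalized.max().item())
```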



  • make_grid tiles several images into a single image. Its padding argument sets how wide the border between neighbouring sub-images is.


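  • We will do the following steps in order:
    • Load and normalize the CIFAR10 training and test datasets using torchvision
    • Define a convolutional neural network
    • Define a loss function
    • Train the network on the training data
    • Test the network on the test data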


# -*- coding: utf-8 -*-
Training a Classifier

This is it. You have seen how to define neural networks, compute loss and make
updates to the weights of the network.

Now you might be thinking,

What about data?

Generally, when you have to deal with image, text, audio or video data,
you can use standard python packages that load data into a numpy array.
Then you can convert this array into a ``torch.*Tensor``.

-  For images, packages such as Pillow, OpenCV are useful
-  For audio, packages such as scipy and librosa
-  For text, either raw Python or Cython based loading, or NLTK and
   SpaCy are useful

Specifically for vision, we have created a package called
``torchvision``, that has data loaders for common datasets such as
Imagenet, CIFAR10, MNIST, etc. and data transformers for images, viz.,
``torchvision.datasets`` and ``torch.utils.data.DataLoader``.

This provides a huge convenience and avoids writing boilerplate code.

For this tutorial, we will use the CIFAR10 dataset.
It has the classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’,
‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR-10 are of
size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.

.. figure:: /_static/img/cifar10.png
   :alt: cifar10


Training an image classifier

We will do the following steps in order:

1. Load and normalize the CIFAR10 training and test datasets using
   ``torchvision``
2. Define a Convolutional Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data

1. Loading and normalizing CIFAR10

Using ``torchvision``, it’s extremely easy to load CIFAR10.
import torch
import torchvision
import torchvision.transforms as transforms

# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors of normalized range [-1, 1].

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Let us show some of the training images, for fun.

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # the built-in next() works across PyTorch versions

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

# 2. Define a Convolutional Neural Network
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Copy the neural network from the Neural Networks section before and modify it to
# take 3-channel images (instead of 1-channel images as it was defined).

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

# 3. Define a Loss function and optimizer
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Let's use a Classification Cross-Entropy loss and SGD with momentum.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# 4. Train the network
# ^^^^^^^^^^^^^^^^^^^^
# This is when things start to get interesting.
# We simply have to loop over our data iterator, and feed the inputs to the
# network and optimize.

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

# 5. Test the network on the test data
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# We have trained the network for 2 passes over the training dataset.
# But we need to check if the network has learnt anything at all.
# We will check this by predicting the class label that the neural network
# outputs, and checking it against the ground-truth. If the prediction is
# correct, we add the sample to the list of correct predictions.
# Okay, first step. Let us display an image from the test set to get familiar.

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

# Okay, now let us see what the neural network thinks these examples above are:

outputs = net(images)

# The outputs are energies for the 10 classes.
# Higher the energy for a class, the more the network
# thinks that the image is of the particular class.
# So, let's get the index of the highest energy:
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

# The results seem pretty good.
# Let us look at how the network performs on the whole dataset.

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

# That looks waaay better than chance, which is 10% accuracy (randomly picking
# a class out of 10 classes).
# Seems like the network learnt something.
# Hmmm, what are the classes that performed well, and the classes that did
# not perform well:

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

# Okay, so what next?
# How do we run these neural networks on the GPU?
# Training on GPU
# ----------------
# Just like how you transfer a Tensor on to the GPU, you transfer the neural
# net onto the GPU.
# Let's first define our device as the first visible cuda device if we have
# CUDA available:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assume that we are on a CUDA machine, then this should print a CUDA device:

print(device)


# The rest of this section assumes that `device` is a CUDA device.
# Then these methods will recursively go over all modules and convert their
# parameters and buffers to CUDA tensors:
# .. code:: python
#     net.to(device)
# Remember that you will have to send the inputs and targets at every step
# to the GPU too:
# .. code:: python
#         inputs, labels = inputs.to(device), labels.to(device)
# Why dont I notice MASSIVE speedup compared to CPU? Because your network
# is realllly small.
# **Exercise:** Try increasing the width of your network (argument 2 of
# the first ``nn.Conv2d``, and argument 1 of the second ``nn.Conv2d`` –
# they need to be the same number), see what kind of speedup you get.
# **Goals achieved**:
# - Understanding PyTorch's Tensor library and neural networks at a high level.
# - Train a small neural network to classify images
# Training on multiple GPUs
# -------------------------
# If you want to see even more MASSIVE speedup using all of your GPUs,
# please check out :doc:`data_parallel_tutorial`.
# Where do I go next?
# -------------------
# -  :doc:`Train neural nets to play video games </intermediate/reinforcement_q_learning>`
# -  `Train a state-of-the-art ResNet network on imagenet`_
# -  `Train a face generator using Generative Adversarial Networks`_
# -  `Train a word-level language model using Recurrent LSTM networks`_
# -  `More examples`_
# -  `More tutorials`_
# -  `Discuss PyTorch on the Forums`_
# -  `Chat with other users on Slack`_
# .. _Train a state-of-the-art ResNet network on imagenet: https://github.com/pytorch/examples/tree/master/imagenet
# .. _Train a face generator using Generative Adversarial Networks: https://github.com/pytorch/examples/tree/master/dcgan
# .. _Train a word-level language model using Recurrent LSTM networks: https://github.com/pytorch/examples/tree/master/word_language_model
# .. _More examples: https://github.com/pytorch/examples
# .. _More tutorials: https://github.com/pytorch/tutorials
# .. _Discuss PyTorch on the Forums: https://discuss.pytorch.org/
# .. _Chat with other users on Slack: https://pytorch.slack.com/messages/beginner/

Files already downloaded and verified
Files already downloaded and verified
plane   cat  deer horse
[1,  2000] loss: 2.286
[1,  4000] loss: 1.921
[1,  6000] loss: 1.731
[1,  8000] loss: 1.616
[1, 10000] loss: 1.568
[1, 12000] loss: 1.484
[2,  2000] loss: 1.408
[2,  4000] loss: 1.382
[2,  6000] loss: 1.345
[2,  8000] loss: 1.322
[2, 10000] loss: 1.304
[2, 12000] loss: 1.271
Finished Training
GroundTruth:    cat  ship  ship plane
Predicted:    cat  ship plane plane
Accuracy of the network on the 10000 test images: 55 %
Accuracy of plane : 63 %
Accuracy of   car : 50 %
Accuracy of  bird : 38 %
Accuracy of   cat : 41 %
Accuracy of  deer : 37 %
Accuracy of   dog : 41 %
Accuracy of  frog : 77 %
Accuracy of horse : 63 %
Accuracy of  ship : 70 %
Accuracy of truck : 71 %


  • Loading and normalizing CIFAR10: with torchvision, loading the CIFAR10 data is very easy.
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


import matplotlib.pyplot as plt
import numpy as np

# functions to show an image

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images,nrow=2, padding=1))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

plane   dog   car   cat
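
To see why imshow computes img / 2 + 0.5: transforms.Normalize(mean, std) computes (x - mean) / std per channel, so with mean 0.5 and std 0.5 image values in [0, 1] land in [-1, 1], and dividing by 2 then adding 0.5 reverses that. A minimal plain-Python sketch of the arithmetic (the helper names normalize and unnormalize are illustrative, not part of torchvision):

```python
def normalize(x, mean=0.5, std=0.5):
    # Same per-value formula that Normalize applies to each channel.
    return (x - mean) / std

def unnormalize(y):
    # Inverse used by imshow: y * 0.5 + 0.5.
    return y / 2 + 0.5

pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(p) for p in pixels]
restored = [unnormalize(v) for v in normalized]
print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```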


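Define a convolutional neural network. First copy the network from the Neural Networks section and modify it to take 3-channel images (it was originally defined for 1-channel images).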

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
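
The 16 * 5 * 5 input size of fc1 follows from the layer shapes. Here is a small sketch of that arithmetic, assuming 32x32 CIFAR10 inputs and the default stride 1 and zero padding of nn.Conv2d (the helper functions conv_out and pool_out are illustrative only):

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output width/height of a convolution along one spatial dimension.
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Output width/height of max pooling along one spatial dimension.
    return (size - kernel) // stride + 1

size = 32                            # CIFAR10 images are 32x32
size = pool_out(conv_out(size, 5))   # conv1: 32 -> 28, pool: 28 -> 14
size = pool_out(conv_out(size, 5))   # conv2: 14 -> 10, pool: 10 -> 5
flat_features = 16 * size * size     # 16 channels come out of conv2
print(size, flat_features)           # 5 400
```

If the kernel size or the input resolution changes, the first argument of fc1 and the view call in forward have to change with it.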

Define a loss function and optimizer. Let's use classification cross-entropy as the loss function and SGD with momentum as the optimizer.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
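
For intuition, here is a plain-Python sketch of the update that SGD with momentum performs on a single parameter. It follows the form given in the torch.optim.SGD documentation (velocity v = momentum * v + grad, then p = p - lr * v), with dampening and weight decay left at zero. The quadratic loss (p - 3)^2 and the larger learning rate are made up for the example:

```python
def sgd_momentum_step(p, grad, velocity, lr=0.001, momentum=0.9):
    # Accumulate a running direction, then step along it.
    velocity = momentum * velocity + grad
    p = p - lr * velocity
    return p, velocity

p, v = 0.0, 0.0
for step in range(2000):
    grad = 2 * (p - 3.0)          # derivative of (p - 3)^2
    p, v = sgd_momentum_step(p, grad, v, lr=0.01, momentum=0.9)
print(round(p, 3))                # settles at the minimum, p = 3.0
```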

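Train the network. This is where things start to get interesting: we simply loop over the data iterator, feed the inputs to the network, and optimize.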

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')


[1,  2000] loss: 2.210
[1,  4000] loss: 1.856
[1,  6000] loss: 1.638
[1,  8000] loss: 1.578
[1, 10000] loss: 1.514
[1, 12000] loss: 1.471
[2,  2000] loss: 1.391
[2,  4000] loss: 1.380
[2,  6000] loss: 1.381
[2,  8000] loss: 1.333
[2, 10000] loss: 1.293
[2, 12000] loss: 1.299
Finished Training
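
A quick sanity check on these numbers: cross-entropy for one sample is the negative log of the softmax probability assigned to the true class, so a model that spreads probability evenly over the 10 classes scores ln(10), about 2.303. The first logged value sits just below that, and the loss falls from there. Below is a plain-Python sketch of the per-sample computation (the function name and the logits are made up for illustration; nn.CrossEntropyLoss additionally averages over the mini-batch by default):

```python
import math

def cross_entropy(logits, label):
    # -log(softmax(logits)[label]), using the log-sum-exp trick for stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

uniform = cross_entropy([0.0] * 10, label=3)            # no preference at all
confident = cross_entropy([0.0] * 9 + [5.0], label=9)   # strong, correct guess
print(round(uniform, 3))     # 2.303
print(confident < uniform)   # True
```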


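  • Test the network on the test set. We have trained the network for 2 passes over the training set, but we need to check whether it has learned anything. We check this by comparing the class label the network predicts against the ground-truth label. If the prediction is correct, we add the sample to the list of correct predictions. As a first step, let's display an image from the test set to get familiar with it.
  • torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor) returns the maximum values along dimension dim together with their indices. torch.max(a, 0) returns the largest element of each column and its index (the row index of that element within the column).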
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
outputs = net(images)
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

GroundTruth:    cat  ship  ship plane
Predicted:    cat  ship plane plane
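
To make the torch.max call concrete, here is a small sketch on a hand-written 2x3 score matrix (the values are made up; rows play the role of samples and columns the role of classes). With dim=1 the indices are the predicted class per row, which is how predicted is obtained above:

```python
import torch

scores = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.3, 0.2]])

# dim=1: maximum over the columns of each row (one result per sample).
values, indices = torch.max(scores, 1)
print(values.tolist())    # [2.0, 1.5]
print(indices.tolist())   # [1, 0]

# dim=0: maximum over the rows of each column, with the row index of each.
col_values, col_rows = torch.max(scores, 0)
print(col_rows.tolist())  # [1, 0, 1]
```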


correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
Accuracy of the network on the 10000 test images: 55 %


class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

Accuracy of plane : 63 %
Accuracy of   car : 50 %
Accuracy of  bird : 38 %
Accuracy of   cat : 41 %
Accuracy of  deer : 37 %
Accuracy of   dog : 41 %
Accuracy of  frog : 77 %
Accuracy of horse : 63 %
Accuracy of  ship : 70 %
Accuracy of truck : 71 %
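
The per-class bookkeeping above reduces to counting, for each true label, how often the prediction matched. Here is a plain-Python sketch with made-up labels and predictions for three classes (no real model output is used):

```python
classes = ('plane', 'car', 'bird')
labels =    [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]   # hypothetical ground truth
predicted = [0, 1, 1, 1, 0, 2, 2, 0, 2, 0]   # hypothetical predictions

class_correct = [0] * len(classes)
class_total = [0] * len(classes)
for label, pred in zip(labels, predicted):
    class_correct[label] += int(pred == label)
    class_total[label] += 1

accuracy = [100 * c / t for c, t in zip(class_correct, class_total)]
for name, acc in zip(classes, accuracy):
    print('Accuracy of %5s : %2d %%' % (name, acc))
```

Iterating over the paired labels and predictions avoids hard-coding the batch size, which the range(4) in the loop above assumes.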


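  • Remember that you also have to send the inputs and targets to the GPU at every step:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assume that we are on a CUDA machine, then this should print a CUDA device:
print(device)
# Move the network's parameters and buffers to the same device
net.to(device)
# Remember that you also have to send the inputs and targets to the GPU at every step: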
inputs, labels = inputs.to(device), labels.to(device)
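
Put together, one step on the chosen device might look like the sketch below. It is a self-contained illustration rather than the tutorial's network: a small nn.Linear stands in for net, and the inputs and labels are random placeholders. It falls back to the CPU when no CUDA device is available.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in model (hypothetical); the real net would be moved the same way.
model = nn.Linear(8, 2).to(device)

# Placeholder batch: 4 samples with 8 features each, 2 classes.
inputs = torch.randn(4, 8)
labels = torch.tensor([0, 1, 1, 0])

# The model and the data must live on the same device.
inputs, labels = inputs.to(device), labels.to(device)
outputs = model(inputs)
loss = nn.CrossEntropyLoss()(outputs, labels)
print(outputs.device, loss.item())
```

Note that .to(device) on a tensor returns a new tensor, so the result has to be assigned back, whereas calling it on a module moves the parameters in place.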

