【PyTorch】From Getting Started to Building a Classification Network: An Extra-Long, Extra-Detailed Tutorial (with Code)


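I. Getting Started with PyTorch

1. Resource links

Includes complete PyTorch videos and materials.

PyTorch NLP tutorial videos and hands-on projects:
Link: https://pan.baidu.com/s/1tbvLeUTPvxKy_gcVhiT8rw
Extraction code: hgkk

PyTorch tutorials:
		http://pytorchchina.com/
		https://pytorch-cn.readthedocs.io/zh/latest/
		http://pytorch123.com/

Bilibili video link: https://www.bilibili.com/video/av66421076

2. Introduction to PyTorch

(1) c1_cpu和gpu速度對比.py (comparing CPU and GPU speed)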

import time
import torch

print(torch.__version__)
print(torch.cuda.is_available())

a = torch.randn(10000, 1000)
b = torch.randn(1000, 2000)

t0 = time.time()
c = torch.matmul(a, b)
t1 = time.time()
print(a.device, t1 - t0, c.norm(2))

device = torch.device('cuda')
a = a.to(device)
b = b.to(device)

# The first run on CUDA also pays for one-time initialization (CUDA context, libraries), so it takes noticeably longer
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))

# The second run shows the normal GPU-accelerated speed. Note that CUDA kernels run
# asynchronously, so without torch.cuda.synchronize() this timing may mostly
# measure the kernel launch rather than the full computation
t0 = time.time()
c = torch.matmul(a, b)
t2 = time.time()
print(a.device, t2 - t0, c.norm(2))


console:
1.2.0
True
cpu 0.484722375869751 tensor(141163.7188)
cuda:0 2.04582142829895 tensor(141564.8281, device='cuda:0')
cuda:0 0.007996797561645508 tensor(141564.8281, device='cuda:0')
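The GPU timings above are probably optimistic. CUDA kernels are launched asynchronously, so `time.time()` can return before the matrix multiply has finished. The sketch below is a fairer comparison. It assumes smaller matrices and `time.perf_counter` are acceptable, `time_matmul` is a hypothetical helper, and the GPU branch only runs when CUDA is available:

```python
import time
import torch

def time_matmul(a, b, repeats=3):
    """Best wall-clock time (seconds) of torch.matmul(a, b) over `repeats` runs."""
    best = float("inf")
    c = None
    for _ in range(repeats):
        if a.is_cuda:
            torch.cuda.synchronize()   # make sure no earlier GPU work is still running
        t0 = time.perf_counter()
        c = torch.matmul(a, b)
        if a.is_cuda:
            torch.cuda.synchronize()   # wait until the asynchronous kernel has finished
        best = min(best, time.perf_counter() - t0)
    return best, c

a = torch.randn(1000, 500)
b = torch.randn(500, 800)
cpu_time, c_cpu = time_matmul(a, b)
print("cpu", cpu_time)

if torch.cuda.is_available():
    a_gpu, b_gpu = a.to("cuda"), b.to("cuda")
    time_matmul(a_gpu, b_gpu, repeats=1)   # warm-up: pays the one-time CUDA initialization
    gpu_time, c_gpu = time_matmul(a_gpu, b_gpu)
    print("cuda", gpu_time)
```

Taking the best of several runs after a warm-up keeps one-off costs out of the comparison.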

(2) c2_autograd.py: computing derivatives with PyTorch

import torch

x = torch.tensor(1.)
# requires_grad=True tells PyTorch that gradients with respect to a, b and c are needed
a = torch.tensor(1., requires_grad=True)
b = torch.tensor(2., requires_grad=True)
c = torch.tensor(3., requires_grad=True)

y = a ** 2 * x + b * x + c

print('before: ', a.grad, b.grad, c.grad)
# use PyTorch to differentiate y with respect to a, b and c
grads = torch.autograd.grad(y, [a, b, c])
print('after: ', grads[0], grads[1], grads[2])


console:
before:  None None None
after:  tensor(2.) tensor(1.) tensor(1.)
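You can also get the same derivatives with `y.backward()`. Instead of returning them, it accumulates them into the `.grad` attribute of each leaf tensor. Here is a minimal sketch that reuses the values above:

```python
import torch

x = torch.tensor(1.)
a = torch.tensor(1., requires_grad=True)
b = torch.tensor(2., requires_grad=True)
c = torch.tensor(3., requires_grad=True)

y = a ** 2 * x + b * x + c   # dy/da = 2ax = 2, dy/db = x = 1, dy/dc = 1
y.backward()                 # fills .grad of every leaf that requires grad
print(a.grad, b.grad, c.grad)  # tensor(2.) tensor(1.) tensor(1.)
print(x.grad)                  # None: x was created without requires_grad=True
```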

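3. Tensors

Tensors

Tensors are similar to NumPy's ndarrays, with the addition that Tensors can also be used on a GPU to accelerate computation.

# With this import, print must be called with parentheses as in Python 3.x,
# even under older Python 2.x. Tip: in Python 2.x print is a statement and
# needs no parentheses, whereas in Python 3.x it is a function and does.
from __future__ import print_function
import torch

Construct an uninitialized 5×3 matrix: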

x = torch.empty(5, 3)
print(x)

console:
tensor(1.00000e-04 *
       [[-0.0000,  0.0000,  1.5135],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000]])

Construct a randomly initialized matrix:

x = torch.rand(5, 3)
print(x)

console:
tensor([[ 0.6291,  0.2581,  0.6414],
        [ 0.9739,  0.8243,  0.2276],
        [ 0.4184,  0.1815,  0.5131],
        [ 0.5533,  0.5440,  0.0718],
        [ 0.2908,  0.1850,  0.5297]])

Construct a matrix filled with zeros and of dtype long:

x = torch.zeros(5, 3, dtype=torch.long)
print(x)

console:
tensor([[ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0]])

Construct a tensor directly from data:

x = torch.tensor([5.5, 3])
print(x)

console:
tensor([ 5.5000,  3.0000])

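Create a tensor based on an existing tensor. These methods reuse properties of the input tensor, such as its dtype, unless new values are provided: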

x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

console:
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]], dtype=torch.float64)
tensor([[-0.2183,  0.4477, -0.4053],
        [ 1.7353, -0.0048,  1.2177],
        [-1.1111,  1.0878,  0.9722],
        [-0.7771, -0.2174,  0.0412],
        [-2.1750,  1.3609, -0.3322]])

Get its size:

print(x.size())

console:
torch.Size([5, 3])

Note: torch.Size is in fact a tuple, so it supports all tuple operations.
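Because torch.Size is a tuple subclass, ordinary tuple operations such as unpacking, indexing and `len()` work on it. A small sketch (the shape is arbitrary):

```python
import torch

x = torch.zeros(5, 3)
size = x.size()
rows, cols = size                        # unpacking works like any tuple
print(isinstance(size, tuple))           # True
print(rows * cols, len(size), size[-1])  # 15 2 3
print(x.shape == (5, 3))                 # True: .shape returns the same torch.Size
```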

Addition: syntax 1

y = torch.rand(5, 3)
print(x + y)

console:
tensor([[-0.1859,  1.3970,  0.5236],
        [ 2.3854,  0.0707,  2.1970],
        [-0.3587,  1.2359,  1.8951],
        [-0.1189, -0.1376,  0.4647],
        [-1.8968,  2.0164,  0.1092]])

Addition: syntax 2

print(torch.add(x, y))

console:
tensor([[-0.1859,  1.3970,  0.5236],
        [ 2.3854,  0.0707,  2.1970],
        [-0.3587,  1.2359,  1.8951],
        [-0.1189, -0.1376,  0.4647],
        [-1.8968,  2.0164,  0.1092]])

Addition: providing an output tensor as an argument

result = torch.empty(5, 3)
# store the sum in result
torch.add(x, y, out=result)
print(result)

console:
tensor([[-0.1859,  1.3970,  0.5236],
        [ 2.3854,  0.0707,  2.1970],
        [-0.3587,  1.2359,  1.8951],
        [-0.1189, -0.1376,  0.4647],
        [-1.8968,  2.0164,  0.1092]])

Addition: in-place

# adds x to y
y.add_(x)
print(y)

console:
tensor([[-0.1859,  1.3970,  0.5236],
        [ 2.3854,  0.0707,  2.1970],
        [-0.3587,  1.2359,  1.8951],
        [-0.1189, -0.1376,  0.4647],
        [-1.8968,  2.0164,  0.1092]])

Note: any operation that mutates a tensor in place is post-fixed with an underscore `_`. For example, x.copy_(y) and x.t_() will change x.
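This sketch contrasts the out-of-place and in-place (trailing underscore) versions of the same operation, using an arbitrary 2x2 tensor:

```python
import torch

x = torch.ones(2, 2)
y = x.add(1)            # out-of-place: returns a new tensor, x is unchanged
print(x.sum().item(), y.sum().item())   # 4.0 8.0

x.add_(1)               # in-place (trailing underscore): x itself now holds 2s
print(x.sum().item())   # 8.0

src = torch.zeros(2, 2)
x.copy_(src)            # in-place copy of src's values into x
print(x.sum().item())   # 0.0
```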

You can use standard NumPy-like indexing:

print(x[:, 1])

console:
tensor([ 0.4477, -0.0048,  1.0878, -0.2174,  1.3609])

Resizing: if you want to resize or reshape a tensor, you can use Tensor.view (the result shares its data with the original tensor):

x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

console:
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
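`view` only works when the requested shape is compatible with the tensor's memory layout. After a transpose, that is usually not the case. The sketch below contrasts `view` with `reshape`; the values are arbitrary, and `reshape` falls back to copying when a view is impossible:

```python
import torch

x = torch.arange(6).view(2, 3)   # [[0, 1, 2], [3, 4, 5]]
t = x.t()                        # transpose: same storage, different strides
print(t.is_contiguous())         # False

view_failed = False
try:
    t.view(6)                    # the strides of t cannot be merged into one dimension
except RuntimeError:
    view_failed = True
print(view_failed)               # True

flat = t.reshape(6)              # reshape copies when a view is impossible
print(flat.tolist())             # [0, 3, 1, 4, 2, 5]

v = x.view(6)                    # x itself is contiguous, so view works
v[0] = 100                       # ...and shares memory with x
print(x[0, 0].item())            # 100
```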

If you have a one-element tensor, use .item() to get the value as a Python number:

x = torch.randn(1)
print(x)
print(x.item())

console:
tensor([ 0.9422])
0.9422121644020081

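Converting between NumPy and Tensors

A Torch tensor on the CPU and a NumPy array converted from it (or converted to it with torch.from_numpy) share their underlying memory, so modifying one also modifies the other.
All tensors on the CPU except a CharTensor support converting to NumPy and back.
Converting a Torch tensor to a NumPy array:

# convert a torch tensor to a numpy array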
a = torch.ones(5)
b = a.numpy()
print(a)
print(b)

# modifying the tensor in place also changes the numpy array that shares its memory
a.add_(1)
print(a)
print(b)

console:
tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]

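Converting a NumPy array to a Torch tensor: torch.from_numpy(a) shares memory with a, so changing the array also changes the tensor. By contrast, torch.Tensor(a) or torch.tensor(a) would copy the data and leave the tensor unchanged:

import numpy as np
import torch

a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

console:
[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)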
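Whether memory is shared depends on how the tensor is created. Here is a minimal sketch comparing `torch.from_numpy`, which shares memory, with `torch.tensor`, which always copies:

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a (dtype float64 is kept)
copied = torch.tensor(a)       # always copies the data
a += 1                         # modify the NumPy array in place
print(shared)                  # tensor([2., 2., 2.], dtype=torch.float64)
print(copied)                  # tensor([1., 1., 1.], dtype=torch.float64)
```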

Tensors can be moved onto any device using the .to method:

if torch.cuda.is_available():
    x = torch.rand(5, 5)
    # a CUDA device object
    device = torch.device("cuda")
    # directly create a tensor on the GPU
    y = torch.ones_like(x, device=device)
    # or just use strings: .to("cuda")
    x = x.to(device)
    z = x + y
    print(z)
    # .to can also change the dtype at the same time
    print(z.to("cpu", torch.double))
    
console:
tensor([[1.4933, 1.9654, 1.8140, 1.1782, 1.9465],
        [1.4439, 1.9591, 1.0066, 1.6454, 1.2359],
        [1.2481, 1.5360, 1.9592, 1.3101, 1.6361],
        [1.0741, 1.6382, 1.2640, 1.9733, 1.7078],
        [1.8020, 1.4749, 1.4589, 1.8869, 1.2460]], device='cuda:0')
tensor([[1.4933, 1.9654, 1.8140, 1.1782, 1.9465],
        [1.4439, 1.9591, 1.0066, 1.6454, 1.2359],
        [1.2481, 1.5360, 1.9592, 1.3101, 1.6361],
        [1.0741, 1.6382, 1.2640, 1.9733, 1.7078],
        [1.8020, 1.4749, 1.4589, 1.8869, 1.2460]], dtype=torch.float64)
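Code that should run both with and without a GPU usually picks the device once and then passes it around. Here is a sketch of that pattern; the variable names are arbitrary:

```python
import torch

# pick the GPU when there is one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(5, 5, device=device)     # created directly on the chosen device
y = torch.ones_like(x)                  # *_like keeps x's device and dtype
z = (x + y).to("cpu", torch.double)     # move back and change dtype in one call
print(device, z.device, z.dtype)
```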

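II. Autograd: Automatic Differentiation

1. Introduction

The autograd package is central to all neural networks in PyTorch. We first take a brief look at it, and then go on to train our first neural network. The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework: backpropagation is defined by how your code runs, and every single iteration can be different. The following examples use tensors and gradients.
TENSOR
- torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on it. When you finish your computation, you can call .backward() to have all the gradients computed automatically. The gradient for this tensor will be accumulated into its .grad attribute.
- To stop a tensor from tracking history, you can call .detach(). It returns a new tensor detached from the computation history, so future computation on it is not tracked.
- To stop tracking history (and the memory it uses), you can also wrap the code block in with torch.no_grad():. This is particularly helpful when evaluating a model. During training, the model's trainable parameters have requires_grad=True so they can be tuned, but during evaluation we don't need their gradients.
- One more class is very important for the autograd implementation: Function. Tensor and Function are interconnected and build up an acyclic graph that encodes the complete history of the computation. Each tensor has a .grad_fn attribute that references the Function that created the tensor (for tensors created by the user, grad_fn is None).
- If you want to compute derivatives, you can call Tensor.backward(). If the Tensor is a scalar (i.e. it holds a single element), you don't need to pass any arguments to backward(). If it has more elements, you need to pass a gradient argument: a tensor with the same shape as the Tensor.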
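This minimal sketch shows the behaviour described in the list above. Leaf tensors have no grad_fn, results of operations do, and both `.detach()` and `torch.no_grad()` produce tensors that are not tracked:

```python
import torch

x = torch.ones(3, requires_grad=True)   # leaf tensor created by the user
y = x * 2                               # result of an operation
d = y.detach()                          # same values, cut out of the graph

print(x.grad_fn)                        # None: user-created leaf
print(y.grad_fn is not None)            # True
print(d.requires_grad, d.grad_fn)       # False None

with torch.no_grad():                   # nothing inside this block is tracked
    w = x * 2
print(w.requires_grad)                  # False
```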

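2. How to use it

Track all operations on a Tensor: set requires_grad=True
Compute all gradients automatically: call .backward()
Stop tracking a Tensor:
- Option 1: call .detach()
- Option 2: wrap the code block in with torch.no_grad():

Create a tensor and set requires_grad=True to track computation with it: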

x = torch.ones(2, 2, requires_grad=True)
print(x)
# do a tensor operation
y = x + 2
print(y)
# y was created as the result of an operation, so it has a grad_fn
print(y.grad_fn)
# do more operations on y
z = y * y * 3
out = z.mean()
print(z, out)

console:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
# each tensor has a .grad_fn attribute referencing the Function that created it
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x000001989F20D7B8>
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

.requires_grad_( … ) changes an existing Tensor's requires_grad flag in place.

A tensor's requires_grad flag defaults to False if it is not set when the tensor is created.

import torch

a = torch.randn(2, 2)
a = ((a * .3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

console:
False
True
<SumBackward0 object at 0x000001B498D8D7B8>

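Gradients:

The backward function is the entry point of backpropagation. Calling backward on a node computes the gradients and accumulates them into the corresponding leaf tensors. backward takes an important argument, gradient (called grad_tensors in torch.autograd.backward), which can be omitted only when the node holds a single scalar value. Let's backprop now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)). Calling backward() without this argument on a tensor with more than one element raises an error like the following: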

backward should be called only on a scalar (i.e, 1-element tensor) or with gradient w.r.t the variable

import torch
x = torch.ones(2, 2, requires_grad=True)
print(x)
# 針對張量做一個操作
y = x + 2
print(y)
# y 作爲操作的結果被創建，所以它有 grad_fn
print(y.grad_fn)
# 針對張量做更多操作
z = y * y * 3
out = z.mean()
print(z)
print(out)

out.backward()
print(x.grad)

console:

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x00000210E6595080>
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor(27., grad_fn=<MeanBackward0>)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

How it works (figures: English and Chinese explanations of the derivation):
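Calling the out tensor $o$, a short derivation reproduces the 4.5 printed for x.grad above. It also states what `backward(v)` computes for a non-scalar output. You can check the general statement against the Jacobian example that follows: there $y = 2048\,x$, so $J = 2048\,I$ and x.grad $= 2048\,v$.

```latex
o = \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3(x_i + 2)^2, \qquad z_i\big|_{x_i=1} = 27

\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i + 2)
\quad\Longrightarrow\quad
\frac{\partial o}{\partial x_i}\Big|_{x_i=1} = \frac{9}{2} = 4.5

\text{For } \vec{y} = f(\vec{x}):\quad
J = \begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix},
\qquad
\texttt{y.backward(v)} \;\Rightarrow\; \texttt{x.grad} = J^{\top} v,
\quad (J^{\top} v)_j = \sum_i v_i \frac{\partial y_i}{\partial x_j}
```

If $v = \partial l / \partial \vec{y}$ for some scalar $l = g(\vec{y})$, then by the chain rule $J^{\top} v = \partial l / \partial \vec{x}$.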

An example of a vector-Jacobian product:

import torch
# autograd tracks all operations on a tensor with requires_grad=True
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(x)
print(y)
# Now y is no longer a scalar. torch.autograd cannot compute the full Jacobian directly,
# but if we only want the vector-Jacobian product, we simply pass the vector to backward as an argument.
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
# You can also stop autograd from tracking history on tensors with
# requires_grad=True by wrapping the code block in with torch.no_grad():
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)

console:

tensor([ 0.2052,  0.6057, -0.6355], requires_grad=True)
tensor([  420.3014,  1240.4666, -1301.4670], grad_fn=<MulBackward0>)
tensor([2.0480e+02, 2.0480e+03, 2.0480e-01])
True
True
False
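You can check the same vector-Jacobian product on a function whose Jacobian is known exactly. The sketch below uses a hypothetical fixed matrix A, so that y = A @ x has Jacobian A. It shows three things: backward(v) should give A^T v, repeating the product with unit vectors rebuilds the whole Jacobian, and calling backward() with no argument on a non-scalar output fails. The exact error message varies between versions:

```python
import torch

# Hypothetical fixed matrix: for y = A @ x the Jacobian dy/dx is exactly A
A = torch.tensor([[1., 2., 0.],
                  [0., 1., 3.],
                  [4., 0., 1.]])
v = torch.tensor([0.1, 1.0, 0.0001])

x = torch.randn(3, requires_grad=True)
y = A @ x
y.backward(v)                                  # x.grad = J^T v = A^T v
print(torch.allclose(x.grad, A.t() @ v))       # True

# Recover the full Jacobian row by row: one vector-Jacobian product per output,
# using the unit vector e_i as the "v" for row i
x2 = torch.randn(3, requires_grad=True)
y2 = A @ x2
rows = []
for i in range(3):
    e = torch.zeros(3)
    e[i] = 1.0
    (row,) = torch.autograd.grad(y2, x2, grad_outputs=e, retain_graph=True)
    rows.append(row)
J = torch.stack(rows)
print(torch.allclose(J, A))                    # True

# Without a vector argument, backward() on a non-scalar output raises an error
raised = False
try:
    (A @ torch.randn(3, requires_grad=True)).backward()
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```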

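III. PyTorch Neural Networks

1. Introduction

Neural networks can be constructed using the torch.nn package.
Now that you have had a glimpse of autograd: nn depends on autograd to define models and differentiate them. An nn.Module contains layers and a method forward(input) that returns the output.
For example, consider a network that classifies digit images.
It is a simple feed-forward network: it takes the input, feeds it through several layers one after the other, and finally gives the output.
A typical training procedure for a neural network is as follows:
- 1. Define the neural network that has some learnable parameters (weights)
- 2. Iterate over a dataset of inputs
- 3. Process the input through the network
- 4. Compute the loss (how far the output is from being correct)
- 5. Propagate the gradients back into the network's parameters
- 6. Update the network's weights, typically with a simple update rule: weight = weight - learning_rate * gradient

2. Meaning of the network's parameters

3. Defining the neural network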

# -*- coding: utf-8 -*-
"""
Neural Networks
===============

Neural networks can be constructed using the ``torch.nn`` package.

Now that you had a glimpse of ``autograd``, ``nn`` depends on
``autograd`` to define models and differentiate them.
An ``nn.Module`` contains layers, and a method ``forward(input)`` that
returns the ``output``.

For example, look at this network that classifies digit images:

.. figure:: /_static/img/mnist.png
   :alt: convnet

   convnet

It is a simple feed-forward network. It takes the input, feeds it
through several layers one after the other, and then finally gives the
output.

A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or
  weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
  ``weight = weight - learning_rate * gradient``

Define the network
------------------

Let’s define this network:
"""
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
print('---------------------')

########################################################################
# You just have to define the ``forward`` function, and the ``backward``
# function (where gradients are computed) is automatically defined for you
# using ``autograd``.
# You can use any of the Tensor operations in the ``forward`` function.
#
# The learnable parameters of a model are returned by ``net.parameters()``:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
print('---------------------')


########################################################################
# Let's try a random 32x32 input.
# Note: the expected input size of this net (LeNet) is 32x32. To use this net on
# the MNIST dataset, please resize the images from the dataset to 32x32.
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
print('---------------------')


########################################################################
# Zero the gradient buffers of all parameters and backprop with random
# gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
print('---------------------')


########################################################################
# .. note::
#
#     ``torch.nn`` only supports mini-batches. The entire ``torch.nn``
#     package only supports inputs that are a mini-batch of samples, and not
#     a single sample.
#
#     For example, ``nn.Conv2d`` will take in a 4D Tensor of
#     ``nSamples x nChannels x Height x Width``.
#
#     If you have a single sample, just use ``input.unsqueeze(0)`` to add
#     a fake batch dimension.
#
# Before proceeding further, let's recap all the classes you’ve seen so far.
#
# **Recap:**
#   -  ``torch.Tensor`` - A *multi-dimensional array* with support for autograd
#      operations like ``backward()``. Also *holds the gradient* w.r.t. the
#      tensor.
#   -  ``nn.Module`` - Neural network module. *Convenient way of
#      encapsulating parameters*, with helpers for moving them to GPU,
#      exporting, loading, etc.
#   -  ``nn.Parameter`` - A kind of Tensor, that is *automatically
#      registered as a parameter when assigned as an attribute to a*
#      ``Module``.
#   -  ``autograd.Function`` - Implements *forward and backward definitions
#      of an autograd operation*. Every ``Tensor`` operation, creates at
#      least a single ``Function`` node, that connects to functions that
#      created a ``Tensor`` and *encodes its history*.
#
# **At this point, we covered:**
#   -  Defining a neural network
#   -  Processing inputs and calling backward
#
# **Still Left:**
#   -  Computing the loss
#   -  Updating the weights of the network
#
# Loss Function
# -------------
# A loss function takes the (output, target) pair of inputs, and computes a
# value that estimates how far away the output is from the target.
#
# There are several different
# `loss functions <https://pytorch.org/docs/nn.html#loss-functions>`_ under the
# nn package .
# A simple loss is: ``nn.MSELoss`` which computes the mean-squared error
# between the input and the target.
#
# For example:

output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
print('---------------------')


########################################################################
# Now, if you follow ``loss`` in the backward direction, using its
# ``.grad_fn`` attribute, you will see a graph of computations that looks
# like this:
#
# ::
#
#     input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
#           -> view -> linear -> relu -> linear -> relu -> linear
#           -> MSELoss
#           -> loss
#
# So, when we call ``loss.backward()``, the whole graph is differentiated
# w.r.t. the loss, and all Tensors in the graph that has ``requires_grad=True``
# will have their ``.grad`` Tensor accumulated with the gradient.
#
# For illustration, let us follow a few steps backward:

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # intended ReLU, but the output below shows AccumulateGrad (a parameter leaf)
print('---------------------')


########################################################################
# Backprop
# --------
# To backpropagate the error all we have to do is to ``loss.backward()``.
# You need to clear the existing gradients though, else gradients will be
# accumulated to existing gradients.
#
#
# Now we shall call ``loss.backward()``, and have a look at conv1's bias
# gradients before and after the backward.


net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
print('---------------------')


########################################################################
# Now, we have seen how to use loss functions.
#
# **Read Later:**
#
#   The neural network package contains various modules and loss functions
#   that form the building blocks of deep neural networks. A full list with
#   documentation is `here <https://pytorch.org/docs/nn>`_.
#
# **The only thing left to learn is:**
#
#   - Updating the weights of the network
#
# Update the weights
# ------------------
# The simplest update rule used in practice is the Stochastic Gradient
# Descent (SGD):
#
#      ``weight = weight - learning_rate * gradient``
#
# We can implement this using simple python code:
#
# .. code:: python
#
#     learning_rate = 0.01
#     for f in net.parameters():
#         f.data.sub_(f.grad.data * learning_rate)
#
# However, as you use neural networks, you want to use various different
# update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.
# To enable this, we built a small package: ``torch.optim`` that
# implements all these methods. Using it is very simple:

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
print('---------------------')


###############################################################
# .. Note::
#
#       Observe how gradient buffers had to be manually set to zero using
#       ``optimizer.zero_grad()``. This is because gradients are accumulated
#       as explained in `Backprop`_ section.


console:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
---------------------
10
torch.Size([6, 1, 5, 5])
---------------------
tensor([[-0.0588, -0.0427, -0.1616,  0.0437,  0.0163,  0.0543, -0.1478, -0.0592,
         -0.0509,  0.0549]], grad_fn=<AddmmBackward>)
---------------------
---------------------
tensor(0.4980, grad_fn=<MseLossBackward>)
---------------------
<MseLossBackward object at 0x0000024FE0C6A8D0>
<AddmmBackward object at 0x0000024F8313D4A8>
<AccumulateGrad object at 0x0000024FE0C6A8D0>
---------------------
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-5.8280e-03,  1.1338e-02,  1.7925e-03, -6.9680e-07,  9.8157e-03,
         2.1737e-03])
---------------------
---------------------

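You only have to define the forward function. The backward function (where gradients are computed) is defined for you automatically by autograd. You can use any Tensor operation in the forward function.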
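This sketch illustrates two points from the notes in the code above. First, tensors assigned as nn.Parameter (and the parameters of submodules) are registered automatically. Second, a single sample can be given a fake batch dimension with unsqueeze(0). ScaledLinear is a hypothetical toy module, not part of the tutorial:

```python
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):                     # hypothetical toy module
    def __init__(self):
        super(ScaledLinear, self).__init__()
        self.alpha = nn.Parameter(torch.ones(1))   # registered because it is an nn.Parameter
        self.fc = nn.Linear(4, 2)                  # submodule: its weight and bias are registered too
        self.offset = torch.zeros(2)               # a plain tensor attribute is NOT a parameter

    def forward(self, x):
        return self.alpha * self.fc(x) + self.offset

m = ScaledLinear()
names = [name for name, _ in m.named_parameters()]
print(names)                       # ['alpha', 'fc.weight', 'fc.bias']

sample = torch.randn(4)            # a single sample without a batch dimension
out = m(sample.unsqueeze(0))       # add a fake batch dimension: shape (1, 4)
print(out.shape)                   # torch.Size([1, 2])

out.sum().backward()               # backward is derived automatically from forward
print(m.alpha.grad is not None, m.fc.weight.grad.shape)  # True torch.Size([2, 4])
```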

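4. Loss function

A loss function takes a pair of inputs, (output, target), and computes a value that estimates how far away the output is from the target.
There are several different loss functions in the nn package. A simple one is nn.MSELoss, which computes the mean-squared error between the input and the target.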

output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

console:

tensor(0.6660, grad_fn=<MseLossBackward>)
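To see exactly what nn.MSELoss computes with its default reduction='mean', this sketch compares it against the formula written out by hand. The numbers are arbitrary:

```python
import torch
import torch.nn as nn

output = torch.tensor([[1., 2., 3.]])
target = torch.tensor([[1., 0., 0.]])
loss = nn.MSELoss()(output, target)          # default reduction='mean'
manual = ((output - target) ** 2).mean()     # (0 + 4 + 9) / 3 = 13/3
print(loss.item(), manual.item())
print(torch.allclose(loss, manual))          # True
print(nn.MSELoss(reduction='sum')(output, target).item())  # 13.0
```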

Now, if you follow loss in the backward direction using its .grad_fn attribute, you will see a graph of computations that looks like this:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

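So, when we call loss.backward(), the whole graph is differentiated with respect to the loss. Every Tensor in the graph with requires_grad=True will have its .grad Tensor accumulated with the gradient.
For illustration, let us follow a few steps backward: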

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # intended ReLU, but the output below shows AccumulateGrad (a parameter leaf)

console:
<MseLossBackward object at 0x0000019183144C18>
<AddmmBackward object at 0x0000019183144D30>
<AccumulateGrad object at 0x0000019183144D30>
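Following next_functions[0][0] only explores the first input of each node. That is why the third line above shows an AccumulateGrad node rather than a ReLU. Below is a sketch of a hypothetical helper, walk, that visits every input recursively. Node class names such as AddBackward0 can vary slightly between PyTorch versions:

```python
import torch

def walk(fn, depth=0, names=None):
    """Hypothetical helper: depth-first walk over a grad_fn graph, printing and collecting node names."""
    if names is None:
        names = []
    if fn is None:                 # inputs that need no gradient appear as None
        return names
    name = type(fn).__name__
    names.append(name)
    print("  " * depth + name)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names

w = torch.randn(3, requires_grad=True)
b = torch.randn(1, requires_grad=True)
loss = (w * 2).sum() + b.sum()
names = walk(loss.grad_fn)
```

Calling walk(loss.grad_fn) on the LeNet loss above would print the whole graph, including the ReLU nodes. When several paths reach the same leaf, that leaf is printed once per path.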

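5. Backpropagation

To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though; otherwise the new gradients will be accumulated into the existing ones.
Now we call loss.backward() and look at conv1's bias gradients before and after the backward pass.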

net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

console:

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-5.8280e-03,  1.1338e-02,  1.7925e-03, -6.9680e-07,  9.8157e-03,
         2.1737e-03])
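Zeroing matters because backward() adds into .grad rather than overwriting it. Here is a minimal sketch with a single scalar parameter:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

for _ in range(2):           # two backward passes without clearing
    loss = w * 2             # d(loss)/dw = 2
    loss.backward()
accumulated = w.grad.item()
print(accumulated)           # 4.0: 2 + 2, the gradients were summed

w.grad.zero_()               # similar in effect to net.zero_grad() for this one parameter
(w * 2).backward()
print(w.grad)                # tensor(2.)
```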

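Now we have seen how to use loss functions. The only thing left is to update the network's weights. The simplest update rule used in practice is Stochastic Gradient Descent (SGD):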

weight = weight - learning_rate * gradient

We can implement this rule in plain Python:

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
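f.data.sub_ still works, but current PyTorch code usually performs such manual updates under torch.no_grad(). The sketch below assumes a toy nn.Linear model and random data. It checks that the hand-written rule gives the same result as optim.SGD without momentum:

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net_manual = nn.Linear(3, 1)            # toy model standing in for Net
net_optim = copy.deepcopy(net_manual)   # identical starting weights
inputs = torch.randn(8, 3)              # random stand-in data
targets = torch.randn(8, 1)
criterion = nn.MSELoss()
learning_rate = 0.01

# 1) hand-written rule: weight = weight - learning_rate * gradient
net_manual.zero_grad()
criterion(net_manual(inputs), targets).backward()
with torch.no_grad():                   # the update itself must not be tracked by autograd
    for f in net_manual.parameters():
        f -= learning_rate * f.grad

# 2) the same step through torch.optim
optimizer = optim.SGD(net_optim.parameters(), lr=learning_rate)
optimizer.zero_grad()
criterion(net_optim(inputs), targets).backward()
optimizer.step()

same = all(torch.allclose(p, q)
           for p, q in zip(net_manual.parameters(), net_optim.parameters()))
print(same)  # True
```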

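However, as you use neural networks, you will want various update rules such as SGD, Nesterov-SGD, Adam and RMSProp. For this, PyTorch provides a small package, torch.optim, that implements all these methods. Using it is very simple: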

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

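IV. CIFAR10 Image Classification

1. Introduction

Generally, when you deal with image, text, audio or video data, you can use standard Python packages to load the data into a NumPy array and then convert that array into a torch.*Tensor.
- For images, packages such as Pillow and OpenCV are useful
- For audio, packages such as scipy and librosa
- For text, either raw Python or Cython based loading, or NLTK and SpaCy
Specifically for vision, there is a package called torchvision. It has data loaders for common datasets such as ImageNet, CIFAR10 and MNIST (torchvision.datasets) and data transformers for images (torchvision.transforms), and torch.utils.data.DataLoader batches the data.
This provides a huge convenience and avoids writing boilerplate code.
For this tutorial, we will use the CIFAR10 dataset. It has ten classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel RGB color images of 32x32 pixels.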

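2. Notes on the functions used

make_grid (torchvision.utils.make_grid) tiles several images into a single image. Its padding argument sets how wide, in pixels, the padding between neighbouring sub-images is.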

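3. Training an image classifier

We will do the following steps in order:
- Load and normalize the CIFAR10 training and test datasets using torchvision
- Define a convolutional neural network
- Define a loss function
- Train the network on the training data
- Test the network on the test data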
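Two numbers in the program are worth checking first. The 16 * 5 * 5 input size of fc1 follows from the 32x32 input: each 5x5 convolution (no padding) removes 4 pixels and each 2x2 max-pool halves the size, so 32 -> 28 -> 14 -> 10 -> 5. Also, Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) computes (x - 0.5) / 0.5. This maps ToTensor's [0, 1] range to [-1, 1], and imshow later undoes it with img / 2 + 0.5. The sketch below verifies both on random data; the layer sizes are copied from the Net defined below:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(3, 6, 5)      # same sizes as in Net below
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(4, 3, 32, 32)   # a random batch of 4 CIFAR-sized images
shapes = []
x = pool(F.relu(conv1(x)))      # 32 -> 28 (conv) -> 14 (pool)
shapes.append(tuple(x.shape))
x = pool(F.relu(conv2(x)))      # 14 -> 10 (conv) -> 5 (pool)
shapes.append(tuple(x.shape))
x = x.view(x.size(0), -1)       # flatten everything but the batch dimension
shapes.append(tuple(x.shape))
print(shapes)  # [(4, 6, 14, 14), (4, 16, 5, 5), (4, 400)]

# What Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) does to each channel,
# written out by hand: (x - mean) / std
img = torch.rand(3, 32, 32)     # ToTensor() output lies in [0, 1]
normed = (img - 0.5) / 0.5      # now in [-1, 1]
restored = normed / 2 + 0.5     # the "unnormalize" step used in imshow()
print(normed.min().item() >= -1, normed.max().item() <= 1)  # True True
print(torch.allclose(restored, img))                        # True
```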

Complete code

# -*- coding: utf-8 -*-
"""
Training a Classifier
=====================

This is it. You have seen how to define neural networks, compute loss and make
updates to the weights of the network.

Now you might be thinking,

What about data?
----------------

Generally, when you have to deal with image, text, audio or video data,
you can use standard python packages that load data into a numpy array.
Then you can convert this array into a ``torch.*Tensor``.

-  For images, packages such as Pillow, OpenCV are useful
-  For audio, packages such as scipy and librosa
-  For text, either raw Python or Cython based loading, or NLTK and
   SpaCy are useful

Specifically for vision, we have created a package called
``torchvision``, that has data loaders for common datasets such as
Imagenet, CIFAR10, MNIST, etc. and data transformers for images, viz.,
``torchvision.datasets`` and ``torch.utils.data.DataLoader``.

This provides a huge convenience and avoids writing boilerplate code.

For this tutorial, we will use the CIFAR10 dataset.
It has the classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’,
‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR-10 are of
size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.

.. figure:: /_static/img/cifar10.png
   :alt: cifar10

   cifar10


Training an image classifier
----------------------------

We will do the following steps in order:

1. Load and normalize the CIFAR10 training and test datasets using
   ``torchvision``
2. Define a Convolutional Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data

1. Loading and normalizing CIFAR10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using ``torchvision``, it’s extremely easy to load CIFAR10.
"""
import torch
import torchvision
import torchvision.transforms as transforms

########################################################################
# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors of normalized range [-1, 1].

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

########################################################################
# Let us show some of the training images, for fun.

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))


########################################################################
# 2. Define a Convolutional Neural Network
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Copy the neural network from the Neural Networks section before and modify it to
# take 3-channel images (instead of 1-channel images as it was defined).

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

########################################################################
# 3. Define a Loss function and optimizer
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Let's use a Classification Cross-Entropy loss and SGD with momentum.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

########################################################################
# 4. Train the network
# ^^^^^^^^^^^^^^^^^^^^
#
# This is when things start to get interesting.
# We simply have to loop over our data iterator, and feed the inputs to the
# network and optimize.

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

########################################################################
# 5. Test the network on the test data
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# We have trained the network for 2 passes over the training dataset.
# But we need to check if the network has learnt anything at all.
#
# We will check this by predicting the class label that the neural network
# outputs, and checking it against the ground-truth. If the prediction is
# correct, we add the sample to the list of correct predictions.
#
# Okay, first step. Let us display an image from the test set to get familiar.

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

########################################################################
# Okay, now let us see what the neural network thinks these examples above are:

outputs = net(images)

########################################################################
# The outputs are energies for the 10 classes.
# Higher the energy for a class, the more the network
# thinks that the image is of the particular class.
# So, let's get the index of the highest energy:
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

########################################################################
# The results seem pretty good.
#
# Let us look at how the network performs on the whole dataset.

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

########################################################################
# That looks waaay better than chance, which is 10% accuracy (randomly picking
# a class out of 10 classes).
# Seems like the network learnt something.
#
# Hmmm, what are the classes that performed well, and the classes that did
# not perform well:

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

########################################################################
# Okay, so what next?
#
# How do we run these neural networks on the GPU?
#
# Training on GPU
# ----------------
# Just like how you transfer a Tensor on to the GPU, you transfer the neural
# net onto the GPU.
#
# Let's first define our device as the first visible cuda device if we have
# CUDA available:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assume that we are on a CUDA machine, then this should print a CUDA device:

print(device)

########################################################################
# The rest of this section assumes that `device` is a CUDA device.
#
# Then these methods will recursively go over all modules and convert their
# parameters and buffers to CUDA tensors:
#
# .. code:: python
#
#     net.to(device)
#
#
# Remember that you will have to send the inputs and targets at every step
# to the GPU too:
#
# .. code:: python
#
#         inputs, labels = inputs.to(device), labels.to(device)
#
# Why don't I notice MASSIVE speedup compared to CPU? Because your network
# is really small.
#
# **Exercise:** Try increasing the width of your network (argument 2 of
# the first ``nn.Conv2d``, and argument 1 of the second ``nn.Conv2d`` –
# they need to be the same number), see what kind of speedup you get.
#
# **Goals achieved**:
#
# - Understand PyTorch's Tensor library and neural networks at a high level.
# - Train a small neural network to classify images
#
# Training on multiple GPUs
# -------------------------
# If you want to see even more MASSIVE speedup using all of your GPUs,
# please check out :doc:`data_parallel_tutorial`.
#
# Where do I go next?
# -------------------
#
# -  :doc:`Train neural nets to play video games </intermediate/reinforcement_q_learning>`
# -  `Train a state-of-the-art ResNet network on imagenet`_
# -  `Train a face generator using Generative Adversarial Networks`_
# -  `Train a word-level language model using Recurrent LSTM networks`_
# -  `More examples`_
# -  `More tutorials`_
# -  `Discuss PyTorch on the Forums`_
# -  `Chat with other users on Slack`_
#
# .. _Train a state-of-the-art ResNet network on imagenet: https://github.com/pytorch/examples/tree/master/imagenet
# .. _Train a face generator using Generative Adversarial Networks: https://github.com/pytorch/examples/tree/master/dcgan
# .. _Train a word-level language model using Recurrent LSTM networks: https://github.com/pytorch/examples/tree/master/word_language_model
# .. _More examples: https://github.com/pytorch/examples
# .. _More tutorials: https://github.com/pytorch/tutorials
# .. _Discuss PyTorch on the Forums: https://discuss.pytorch.org/
# .. _Chat with other users on Slack: https://pytorch.slack.com/messages/beginner/

console:
Files already downloaded and verified
Files already downloaded and verified
plane   cat  deer horse
[1,  2000] loss: 2.286
[1,  4000] loss: 1.921
[1,  6000] loss: 1.731
[1,  8000] loss: 1.616
[1, 10000] loss: 1.568
[1, 12000] loss: 1.484
[2,  2000] loss: 1.408
[2,  4000] loss: 1.382
[2,  6000] loss: 1.345
[2,  8000] loss: 1.322
[2, 10000] loss: 1.304
[2, 12000] loss: 1.271
Finished Training
GroundTruth:    cat  ship  ship plane
Predicted:    cat  ship plane plane
Accuracy of the network on the 10000 test images: 55 %
Accuracy of plane : 63 %
Accuracy of   car : 50 %
Accuracy of  bird : 38 %
Accuracy of   cat : 41 %
Accuracy of  deer : 37 %
Accuracy of   dog : 41 %
Accuracy of  frog : 77 %
Accuracy of horse : 63 %
Accuracy of  ship : 70 %
Accuracy of truck : 71 %
cuda:0

Download the dataset

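Load and normalize CIFAR10. With torchvision, loading CIFAR10 is very easy. The torchvision datasets yield PILImage images with values in [0, 1]; the transform below converts them to tensors and normalizes them to the range [-1, 1].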

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
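transforms.ToTensor() produces a float tensor of shape (C, H, W) with values in [0, 1], and transforms.Normalize(mean, std) then computes (x - mean) / std separately for each channel. With mean = std = 0.5 for all three channels, [0, 1] is mapped to [-1, 1]. The minimal sketch below reproduces that arithmetic with plain tensor operations on a made-up 3×2×2 tensor, so it needs neither torchvision nor the dataset; it mirrors the formula rather than calling Normalize itself.

```python
import torch

# A made-up "image" already in [0, 1], shaped (C, H, W) like ToTensor() output
x = torch.tensor([[[0.0, 0.25], [0.5, 1.0]]] * 3)   # shape (3, 2, 2)

# Per-channel mean and std, reshaped to (C, 1, 1) so they broadcast over H and W
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

# The per-channel formula applied by Normalize: (x - mean) / std
normalized = (x - mean) / std
print(normalized[0])
```

Each channel of the result runs from -1 (input 0) to 1 (input 1), with 0.5 landing on 0.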

Show some of the training images

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


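def imshow(img):
    # print(img)            # optional: inspect the normalized tensor
    img = img / 2 + 0.5     # unnormalize: invert Normalize(mean=0.5, std=0.5)
    npimg = img.numpy()
    # print(npimg)          # optional: inspect the NumPy array
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # (C, H, W) -> (H, W, C)
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # newer DataLoader iterators have no .next() method

# show images
imshow(torchvision.utils.make_grid(images, nrow=2, padding=1))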
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

console:
plane   dog   car   cat
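imshow undoes the normalization with img / 2 + 0.5, the inverse of (x - 0.5) / 0.5, and reorders the axes with np.transpose(npimg, (1, 2, 0)) because PyTorch stores images as (C, H, W) while plt.imshow expects (H, W, C). The sketch below checks both steps on a random 3×32×32 tensor standing in for a CIFAR10 image, so no dataset or plotting is needed.

```python
import numpy as np
import torch

torch.manual_seed(0)
original = torch.rand(3, 32, 32)          # stand-in image in [0, 1), layout (C, H, W)
normalized = (original - 0.5) / 0.5       # what Normalize with mean = std = 0.5 computes
restored = normalized / 2 + 0.5           # the "unnormalize" step inside imshow

npimg = restored.numpy()
hwc = np.transpose(npimg, (1, 2, 0))      # (C, H, W) -> (H, W, C) for plt.imshow
print(npimg.shape, hwc.shape)
```

After the transpose, hwc[:, :, 0] is the same 32×32 plane as npimg[0], the first (red) channel.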

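Define a convolutional neural network. Copy the neural network from the Neural Networks section and modify it to take 3-channel images (it was previously defined for 1-channel images).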

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
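The 16 * 5 * 5 input size of fc1 follows from the 32×32 CIFAR10 images. A 5×5 convolution with stride 1 and no padding shrinks each side by 4 (out = in - 5 + 1, so 32 -> 28), 2×2 max pooling halves it (28 -> 14), the second convolution gives 10, the second pooling gives 5, and conv2 has 16 output channels. The sketch below rebuilds the same layers standalone and pushes a dummy batch of 4 random images through them, recording the shape after each step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# Same layer definitions as Net above, rebuilt so this block runs on its own
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)
fc1 = nn.Linear(16 * 5 * 5, 120)
fc2 = nn.Linear(120, 84)
fc3 = nn.Linear(84, 10)

x = torch.randn(4, 3, 32, 32)        # dummy batch shaped like CIFAR10 input
shapes = {}
x = F.relu(conv1(x))
shapes["conv1"] = tuple(x.shape)     # (4, 6, 28, 28)
x = pool(x)
shapes["pool1"] = tuple(x.shape)     # (4, 6, 14, 14)
x = F.relu(conv2(x))
shapes["conv2"] = tuple(x.shape)     # (4, 16, 10, 10)
x = pool(x)
shapes["pool2"] = tuple(x.shape)     # (4, 16, 5, 5)
x = x.view(-1, 16 * 5 * 5)
shapes["flatten"] = tuple(x.shape)   # (4, 400)
x = fc3(F.relu(fc2(F.relu(fc1(x)))))
shapes["fc3"] = tuple(x.shape)       # (4, 10): one score per class

for name, shape in shapes.items():
    print(name, shape)
```

If the input size changes, the first Linear layer is the one that has to follow, which is why it pays to trace the shapes like this.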

Define a loss function and optimizer. We use classification cross-entropy as the loss function and SGD with momentum as the optimizer.


import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
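nn.CrossEntropyLoss takes raw class scores (the fc3 output, with no softmax applied) and integer class labels, and combines log_softmax with the negative log-likelihood, averaged over the batch by default. optim.SGD with momentum keeps a velocity buffer per parameter. In PyTorch's formulation with the default dampening=0, the buffer starts as the first gradient, then becomes v = momentum * v + grad, and each step applies p = p - lr * v. The sketch below checks both on toy numbers; the logits, targets, and the toy loss 2 * w are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Cross-entropy on made-up scores for a batch of 2 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])
ce = nn.CrossEntropyLoss()(logits, targets)
# Same value by hand: mean over the batch of -log softmax(logits)[target]
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()
print(ce.item(), manual.item())

# Two SGD-with-momentum steps on the toy loss 2 * w, whose gradient is always 2
w = torch.tensor(1.0, requires_grad=True)
opt = optim.SGD([w], lr=0.1, momentum=0.9)
history = []
for _ in range(2):
    opt.zero_grad()
    loss = 2 * w
    loss.backward()
    opt.step()
    history.append(w.item())
# Step 1: v = 2,            w = 1.0 - 0.1 * 2   = 0.8
# Step 2: v = 0.9 * 2 + 2,  w = 0.8 - 0.1 * 3.8 = 0.42
print(history)
```

The second step moves w almost twice as far as the first even though the gradient did not change; that accumulated velocity is what momentum adds over plain SGD.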

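Train the network. This is where things start to get interesting: we simply loop over the data iterator, feed the inputs to the network, and optimize.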

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

console:

[1,  2000] loss: 2.210
[1,  4000] loss: 1.856
[1,  6000] loss: 1.638
[1,  8000] loss: 1.578
[1, 10000] loss: 1.514
[1, 12000] loss: 1.471
[2,  2000] loss: 1.391
[2,  4000] loss: 1.380
[2,  6000] loss: 1.381
[2,  8000] loss: 1.333
[2, 10000] loss: 1.293
[2, 12000] loss: 1.299
Finished Training
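Every iteration of the loop repeats the same five steps: zero the gradients, run the forward pass, compute the loss, backpropagate, and let the optimizer update the weights. The sketch below runs exactly those steps on a tiny made-up two-class problem (two Gaussian blobs and a single nn.Linear layer, assumptions chosen for illustration rather than CIFAR10), using the whole dataset as one batch so it finishes instantly and the falling loss is easy to see.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Synthetic data: class 0 centred at (-2, -2), class 1 centred at (2, 2)
n = 100
x = torch.cat([torch.randn(n, 2) - 2, torch.randn(n, 2) + 2])
y = torch.cat([torch.zeros(n, dtype=torch.long), torch.ones(n, dtype=torch.long)])

model = nn.Linear(2, 2)                   # outputs 2 raw class scores
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

losses = []
for epoch in range(50):
    optimizer.zero_grad()                 # 1. clear gradients from the previous step
    outputs = model(x)                    # 2. forward pass
    loss = criterion(outputs, y)          # 3. compute the loss
    loss.backward()                       # 4. backpropagate
    optimizer.step()                      # 5. update the weights
    losses.append(loss.item())

print('first loss: %.3f, last loss: %.3f' % (losses[0], losses[-1]))
```

The CIFAR10 loop above differs only in drawing mini-batches of 4 from trainloader and averaging the printed loss over 2000 of them.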

4、Test the network on the test data

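Test the network on the test data. We have trained the network for 2 passes over the training dataset, but we need to check whether it has learned anything at all. We do this by taking the class label that the network outputs as its prediction and checking it against the sample's ground-truth label. If the prediction is correct, we add the sample to the list of correct predictions. As a first step, let us display some images from the test set to get familiar with them.

torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor) returns the maximum values along dimension dim together with their indices. For a 2-D tensor a, torch.max(a, 0) returns the largest element of each column and its index (the row index of that maximum within the column); torch.max(outputs, 1) therefore gives, for each image, the highest class score and the index of the predicted class.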
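A small example makes the role of dim concrete: dim=0 reduces down the columns and dim=1 across the rows. The 2×3 tensor below is made up; with the network's outputs of shape (4, 10), dim=1 is the one that picks a class per image.

```python
import torch

a = torch.tensor([[1.0, 5.0, 2.0],
                  [4.0, 3.0, 6.0]])

col_values, col_indices = torch.max(a, 0)   # max of each column and its row index
row_values, row_indices = torch.max(a, 1)   # max of each row and its column index
print(col_values, col_indices)   # tensor([4., 5., 6.]) tensor([1, 0, 1])
print(row_values, row_indices)   # tensor([5., 6.]) tensor([1, 2])
```

The code below discards the values with _ and keeps only the indices, since the index of the highest score is the predicted class.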

dataiter = iter(testloader)
images, labels = next(dataiter)  # newer DataLoader iterators have no .next() method

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
####################################
outputs = net(images)
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

console:
GroundTruth:    cat  ship  ship plane
Predicted:    cat  ship plane plane

Check how the network performs on the whole test set

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
    
console:
Accuracy of the network on the 10000 test images: 55 %
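The counting in that loop relies on element-wise comparison: predicted == labels yields a boolean tensor, .sum() counts the True entries, and .item() turns the count into a Python int. The sketch below applies the same bookkeeping to two made-up batches of predictions and labels instead of real network output.

```python
import torch

# Made-up (predicted, labels) pairs for two batches of 4, not real CIFAR10 output
batches = [
    (torch.tensor([3, 8, 8, 0]), torch.tensor([3, 8, 8, 0])),   # 4 of 4 correct
    (torch.tensor([1, 2, 5, 9]), torch.tensor([1, 4, 5, 7])),   # 2 of 4 correct
]

correct = 0
total = 0
for predicted, labels in batches:
    total += labels.size(0)                           # number of samples in this batch
    correct += (predicted == labels).sum().item()     # count matching positions
print('Accuracy: %d %%' % (100 * correct / total))    # Accuracy: 75 %
```

Note that %d truncates rather than rounds, so the tutorial's "55 %" means somewhere between 55% and 56%.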

Check how the network performs on each class

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels)
        for i in range(labels.size(0)):  # batch size, not a hard-coded 4
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
        
console:

Accuracy of plane : 63 %
Accuracy of   car : 50 %
Accuracy of  bird : 38 %
Accuracy of   cat : 41 %
Accuracy of  deer : 37 %
Accuracy of   dog : 41 %
Accuracy of  frog : 77 %
Accuracy of horse : 63 %
Accuracy of  ship : 70 %
Accuracy of truck : 71 %
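The per-class loop walks over each sample in Python. A vectorized alternative, offered here as a sketch rather than the tutorial's own method, counts samples and hits per class with torch.bincount: counting all labels gives the per-class totals, and counting only the labels where the prediction matched gives the per-class correct counts. The labels and predictions below are made up, and three classes stand in for CIFAR10's ten.

```python
import torch

num_classes = 3
labels = torch.tensor([0, 0, 1, 1, 1, 2, 2])       # made-up ground truth
predicted = torch.tensor([0, 1, 1, 1, 0, 2, 2])    # made-up predictions

# Samples per class
class_total = torch.bincount(labels, minlength=num_classes)
# Correct predictions per class: count only the labels where prediction == label
class_correct = torch.bincount(labels[predicted == labels], minlength=num_classes)

for i in range(num_classes):
    print('class %d: %d/%d correct' % (i, class_correct[i].item(), class_total[i].item()))
```

Inside the test loop, both counts would be accumulated across batches (for example class_total += torch.bincount(labels, minlength=10)); minlength keeps the result the same length even when a batch lacks some classes.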

Training on the GPU

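Just as you transfer a Tensor onto the GPU, you transfer the neural network onto the GPU: net.to(device) recursively goes over all modules and converts their parameters and buffers to CUDA tensors. Remember that you also have to send the inputs and targets to the GPU at every step: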

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assume that we are on a CUDA machine, then this should print a CUDA device:
print(device)
net.to(device)
# Inside the training loop, send the inputs and targets to the GPU at every step:
inputs, labels = inputs.to(device), labels.to(device)

console:
cuda:0
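One detail that is easy to miss: Module.to(device) moves parameters and buffers in place and returns the module itself, so net.to(device) works without reassignment, while Tensor.to(device) leaves the original tensor untouched and returns a tensor on the target device, which is why inputs and labels are reassigned. The sketch uses a stand-in nn.Linear model and random data (not the tutorial's Net or CIFAR10) and falls back to the CPU when no CUDA device is available.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = nn.Linear(4, 2)            # stand-in model, not the tutorial's Net
moved = net.to(device)           # Module.to() works in place and returns the same module

inputs = torch.randn(3, 4)
labels = torch.tensor([0, 1, 0])
# Tensor.to() does not modify the tensor it is called on, so reassign the result
inputs, labels = inputs.to(device), labels.to(device)

outputs = net(inputs)            # model and data now live on the same device
print(moved is net, outputs.device)
```

Forgetting the reassignment typically surfaces as a runtime error complaining that tensors are on different devices.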
