PyTorch: neural network

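This post follows one of the official PyTorch tutorials, with extra notes picked up while learning; it is updated continuously: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html

It mainly records the essentials needed to build a neural network in PyTorch.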


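A typical training procedure for a neural network is as follows:

Define a neural network that has some learnable parameters (or weights)
Iterate over a dataset of inputs
Process the input through the network
Compute the loss (how far the output is from the correct value, also called the ground truth or target)
Propagate gradients back into the network's parameters
Update the weights of the network, typically with a simple update rule:
weights = weights - learning_rate * gradient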
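
The steps above can be sketched as one small loop. This is only a minimal illustration under stated assumptions: the single linear layer, the toy dataset (targets are `x0 + x1`), the batch size and the learning rate are all made up for the example and do not come from the tutorial.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 1. Define a network with learnable parameters (here just one linear layer).
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
learning_rate = 0.1

# Hypothetical toy dataset: targets follow y = x0 + x1, split into mini-batches of 16.
inputs = torch.randn(64, 2)
targets = inputs.sum(dim=1, keepdim=True)
batches = list(zip(inputs.split(16), targets.split(16)))

losses = []
for epoch in range(50):
    for x, y in batches:                 # 2. iterate over the dataset
        output = model(x)                # 3. process the input through the network
        loss = criterion(output, y)      # 4. compute the loss
        model.zero_grad()                # discard gradients from the previous step
        loss.backward()                  # 5. propagate gradients back to the parameters
        with torch.no_grad():            # 6. weights = weights - learning_rate * gradient
            for p in model.parameters():
                p -= learning_rate * p.grad
    losses.append(loss.item())           # loss of the last batch in this epoch

print(losses[0], losses[-1])
```

The recorded loss should shrink over the epochs, since the toy targets are an exact linear function of the inputs.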

Defining a neural network with some learnable parameters (or weights)

Comparison of Parameter and Variable in PyTorch
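
The linked comparison is not reproduced here. As a rough sketch of the practical point: since PyTorch 0.4, Variable has been merged into Tensor, so the distinction that matters today is between nn.Parameter and a plain tensor. A tensor wrapped in nn.Parameter and assigned to a module attribute is registered as a learnable parameter; a plain tensor attribute is not. The `Scale` module below is a hypothetical example, not part of the tutorial.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):  # hypothetical toy module for illustration
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))  # registered as a learnable parameter
        self.offset = torch.zeros(3)               # plain tensor: not registered

    def forward(self, x):
        return x * self.weight + self.offset


m = Scale()
param_names = [name for name, _ in m.named_parameters()]
print(param_names)                                     # ['weight']
print(m.weight.requires_grad, m.offset.requires_grad)  # True False
```

Only `weight` would be returned by `m.parameters()` and therefore seen by an optimizer.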

Defining the network

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
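
The `16 * 6 * 6` input size of fc1 can be checked by tracing tensor shapes through the two convolution and pooling stages. The sketch below assumes a 32x32 single-channel input, which is the size the official tutorial uses; the standalone `conv1` and `conv2` layers mirror the ones in Net but are created separately so the snippet runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 1, 32, 32)            # assumed input: batch of one 32x32 image
x = F.max_pool2d(F.relu(conv1(x)), 2)
shape_after_block1 = tuple(x.shape)      # 32 -> 30 (3x3 conv) -> 15 (2x2 pool)
x = F.max_pool2d(F.relu(conv2(x)), 2)
shape_after_block2 = tuple(x.shape)      # 15 -> 13 (3x3 conv) -> 6 (2x2 pool, floor)
flat = x.view(x.size(0), -1)             # flatten everything except the batch dimension

print(shape_after_block1)  # (1, 6, 15, 15)
print(shape_after_block2)  # (1, 16, 6, 6)
print(tuple(flat.shape))   # (1, 576), and 576 == 16 * 6 * 6
```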

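PyTorch's nn.Module is similar to Keras's keras.Model: layers are defined at initialization, and the network is wired together when the model is called. The difference is that keras.Model describes the model's flow in call(), whereas PyTorch describes it in forward().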

tf.keras.Model

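Both are invoked with (), that is, the model instance followed by parentheses. The reason lies in how Python calls a class's __init__() and __call__() methods.
A Python class can be used as class_name()(): the first pair of parentheses constructs the instance and runs __init__(), and the second pair runs __call__() on that instance. nn.Module's forward() and keras.Model's call() are both invoked from __call__(). In other words, the base class implements __call__() and dispatches to forward() or call(); a subclass overrides forward() or call(), not __call__() itself.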
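
The dispatch described above can be sketched in plain Python, without PyTorch. The `Module` class below is a hypothetical stand-in for nn.Module that keeps only the `__call__` to `forward()` hand-off; the real class does considerably more (hooks, tracing).

```python
class Module:
    """Hypothetical stand-in for nn.Module: __call__ dispatches to forward()."""

    def __call__(self, *args, **kwargs):
        # The real nn.Module also runs hooks around this call.
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError


class Scaler(Module):
    def __init__(self, factor):
        self.factor = factor

    def forward(self, x):
        return x * self.factor


# First (): builds the instance and runs __init__; second (): runs __call__ -> forward.
result = Scaler(3)(5)
print(result)  # 15
```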

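Here is the __call__() method of the nn.Module class, as found in the PyTorch release this post was written against (the implementation differs in later releases):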

def __call__(self, *input, **kwargs):
    for hook in self._forward_pre_hooks.values():
        result = hook(self, input)
        if result is not None:
            if not isinstance(result, tuple):
                result = (result,)
            input = result
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        result = self.forward(*input, **kwargs)
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            result = hook_result
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
                wrapper = functools.partial(hook, self)
                functools.update_wrapper(wrapper, hook)
                grad_fn.register_hook(wrapper)
    return result
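
One practical consequence of this method is that hooks only run when the module is called through `__call__`, not when `forward()` is called directly. The sketch below illustrates that with `register_forward_hook`; the layer sizes and the `record_output_shape` helper are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
seen_shapes = []


def record_output_shape(module, inputs, output):
    # Forward hooks receive the module, its positional inputs (a tuple) and its output.
    seen_shapes.append(tuple(output.shape))


handle = layer.register_forward_hook(record_output_shape)

x = torch.randn(3, 4)
layer(x)           # goes through __call__, so the hook fires
layer.forward(x)   # bypasses __call__, so the hook does not fire
handle.remove()
layer(x)           # hook removed, nothing is recorded

print(seen_shapes)  # [(3, 2)]
```

This is why models are normally invoked as `net(input)` rather than `net.forward(input)`.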

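Predefined layers in PyTorch: convolution, pooling, activation, normalization, dropout, etc.

There are two ways to use them: nn.Xxx and nn.functional.xxx

Everything defined as nn.Xxx is a class derived from a common ancestor, Module; everything defined as nn.functional.xxx is an operation in pure-function form. Some operations, such as conv, call functions written in C++ to do the computation.

nn.Xxx is essentially a wrapper around nn.functional.xxx, much like Keras relative to the Keras backend. TensorFlow now ships its own Keras, tf.keras, but the idea is about the same.

Take the conv2d operation as an example. nn.functional.conv2d takes (input, weight, bias, stride, padding, dilation, groups), so weight and bias have to be passed in by hand. nn.Conv2d, by contrast, initializes weight and bias in __init__() (more precisely in the _ConvNd class: Conv2d inherits from _ConvNd, and _ConvNd inherits from Module) and calls nn.functional.conv2d in forward(), passing self.weight, self.bias, self.stride and so on.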
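
A minimal sketch of that relationship: calling the nn.Conv2d module and calling nn.functional.conv2d with the module's own weight and bias should give the same result. The layer configuration and the 32x32 input are arbitrary choices for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(1, 6, 3)            # the module creates and owns weight and bias
x = torch.randn(1, 1, 32, 32)

out_module = conv(x)

# The functional form is stateless: weight and bias are passed in explicitly.
out_functional = F.conv2d(
    x,
    conv.weight,
    conv.bias,
    stride=conv.stride,
    padding=conv.padding,
    dilation=conv.dilation,
    groups=conv.groups,
)

print(tuple(conv.weight.shape))                    # (6, 1, 3, 3)
print(torch.allclose(out_module, out_functional))  # True
```

In practice the module form is the usual choice for layers with parameters, while the functional form is handy for parameter-free operations such as relu or max_pool2d, as in the Net class above.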

torch.nn.Module

Related posts on the difference between nn.Xxx and nn.functional.xxx:

https://www.jianshu.com/p/7bb495573cb9
https://www.zhihu.com/question/66782101

Processing input through the network

torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, and not a single sample.

For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width.

If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.

In short, inputs to torch.nn are expected as mini-batches, so the input needs one extra leading dimension for the samples.
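
A small sketch of adding that batch dimension with unsqueeze(0), assuming a single 32x32 single-channel sample and an arbitrary Conv2d layer:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)

single = torch.randn(1, 32, 32)   # one sample: nChannels x Height x Width
batch = single.unsqueeze(0)       # add a fake batch dimension at position 0
out = conv(batch)

print(tuple(single.shape))  # (1, 32, 32)
print(tuple(batch.shape))   # (1, 1, 32, 32)
print(tuple(out.shape))     # (1, 6, 30, 30)
```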

Computing the loss

https://pytorch.org/docs/stable/nn.html#l1loss

input = torch.randn(1, 1, 32, 32)  # a dummy 32x32 single-channel image, batch of 1
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Now, if you follow loss in the backward direction, using its .grad_fn attribute, you will see a graph of computations that looks like this:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss
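
The backward chain can also be walked programmatically through `.grad_fn` and `.next_functions`. The sketch below uses a single linear layer rather than the full Net to stay short. The exact node class names depend on the PyTorch version, so none are listed as expected output here.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
output = layer(torch.randn(1, 4))
loss = nn.MSELoss()(output, torch.zeros(1, 2))

# Follow the first input of each backward node until the chain ends.
names = []
node = loss.grad_fn
while node is not None:
    names.append(type(node).__name__)
    node = node.next_functions[0][0] if node.next_functions else None

print(names)  # starts with the MSE loss node; names vary across versions
```

Each entry in `next_functions` is a pair of a backward node and an input index; the node is None for inputs that do not require gradients, such as the target.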

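Propagating gradients back into the network's parameters

When loss.backward() is called, the loss is differentiated through the whole graph, and every leaf tensor in the graph that has requires_grad=True (the network parameters, for example) accumulates its gradient in its .grad tensor.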
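
A minimal sketch of what backward() leaves behind, using two hand-made tensors instead of a network; the values are arbitrary.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf tensor tracked by autograd
x = torch.tensor([3.0, 4.0])                      # plain input, not tracked

loss = (w * x).sum()   # loss = 3*w0 + 4*w1
loss.backward()

print(w.grad)  # tensor([3., 4.]), i.e. d(loss)/dw = x
print(x.grad)  # None, because x does not require gradients
```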

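Gradient update

The gradient PyTorch computes is the mean over the batch, i.e. the summed gradient divided by batch_size. This comes from the loss functions' default reduction='mean' rather than from the optimizer; for element-wise losses such as nn.MSELoss the mean is taken over every output element. Also, if net.zero_grad() is not called, gradients accumulate across every backward pass.
This post ran an experiment on it: https://www.jb51.net/article/168006.htm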
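
Both effects can be checked on a toy model. In the sketch below, the linear layer, the batch of 4 samples and the `grad_of` helper are all hypothetical choices for the example. With one output per sample, the output has 4 elements, so the reduction='sum' gradient should be 4 times the reduction='mean' gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
x = torch.randn(4, 3)   # batch of 4 samples
y = torch.randn(4, 1)


def grad_of(reduction):
    """Weight gradient of the MSE loss under the given reduction."""
    model.zero_grad()
    nn.MSELoss(reduction=reduction)(model(x), y).backward()
    return model.weight.grad.clone()


g_mean = grad_of("mean")
g_sum = grad_of("sum")

# Two backward passes without zero_grad() in between: the gradients add up.
model.zero_grad()
nn.MSELoss()(model(x), y).backward()
nn.MSELoss()(model(x), y).backward()
g_twice = model.weight.grad.clone()

print(torch.allclose(g_sum, 4 * g_mean))    # True
print(torch.allclose(g_twice, 2 * g_mean))  # True
```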

Updating the network's weights with a simple update rule

# Stochastic Gradient Descent (SGD)
learning_rate = 0.01
with torch.no_grad():  # the update itself must not be tracked by autograd
    for f in net.parameters():
        f.sub_(f.grad * learning_rate)

Using other gradient-descent methods: `torch.optim`

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()

optimizer.zero_grad() clears the gradients accumulated on the weights; every batch needs to clear the gradients left over from the previous batch.
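
As a sanity check, one step of plain optim.SGD (no momentum, no weight decay) should match the hand-written rule weights = weights - learning_rate * gradient. The sketch below compares the two on identical copies of a toy linear layer; the model, data and learning rate are arbitrary assumptions for the example.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net_manual = nn.Linear(3, 1)
net_optim = copy.deepcopy(net_manual)   # identical starting weights
x, y = torch.randn(8, 3), torch.randn(8, 1)
criterion = nn.MSELoss()
lr = 0.01

# Hand-written update rule.
criterion(net_manual(x), y).backward()
with torch.no_grad():
    for p in net_manual.parameters():
        p -= lr * p.grad

# The same step through torch.optim.
optimizer = optim.SGD(net_optim.parameters(), lr=lr)
optimizer.zero_grad()
criterion(net_optim(x), y).backward()
optimizer.step()

same = all(
    torch.allclose(a, b)
    for a, b in zip(net_manual.parameters(), net_optim.parameters())
)
print(same)  # True
```

Other optimizers in torch.optim, such as Adam or RMSprop, take the same `net.parameters()` argument and are driven by the same zero_grad / backward / step sequence.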
