PyTorch Notes (3)

Learning PyTorch with Examples, continued

These notes continue from PyTorch Notes (2).

PyTorch's nn (neural network) module

From what we learned above, a neural network consists of a forward pass and a backward pass, where the nodes are Tensors and the edges are Functions. It is exactly this combination of data (Tensors) and operations on that data (Functions) that makes up our neural network, which I think is very similar to a system block diagram (as in Signals and Systems or control theory).

Although we already have a way to build neural networks, working like this is too low-level. In the typical case, if we just want to use modules that are already mature, we don't need to spell out the computation ourselves; instead we want to focus on building the structure, such as a Layer or a Block.

# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

Walking through the code:

  • Here we build the layers with torch.nn.Sequential. Sequential means exactly that: a continuous, top-to-bottom stack. It contains a linear layer, a ReLU activation layer, and a final output linear layer, which is the same structure as the network we built before.

  • When updating the parameters we use model.parameters() to get the model's learnable parameters and use each parameter's own grad to update its value. In other words, each param is an object that holds both its own value and the gradient saved from the backward pass, as the sketch below shows.
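
As a minimal sketch (assuming the model, x, y, and loss_fn defined above), we can inspect the parameters and confirm that each one carries both its data and, after a backward pass, its gradient:

# Minimal sketch: each learnable parameter holds its value and, after
# backward(), its gradient (assumes model / x / y / loss_fn from above).
loss = loss_fn(model(x), y)
model.zero_grad()
loss.backward()
for name, param in model.named_parameters():
    # param holds the current value; param.grad holds the gradient
    print(name, param.shape, param.grad.shape)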

Optimizers (torch.optim)

Remember how we updated the parameters above? Inside with torch.no_grad(): we updated every parameter's value one by one. That works, but it is laborious and very basic. In practice there are already many mature optimization algorithms for this update step, such as Adam, RMSProp, AdaGrad, and so on, and they are all included in the optim package.

# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

Code analysis:

  • The most obvious change is that the with torch.no_grad() block at the bottom is gone, replaced by optimizer.step(). We now have a new object, the optimizer, constructed from torch.optim.Adam, which takes model.parameters() and a learning rate. This is really the same idea as before: previously we manipulated the model's parameters ourselves, and now we simply hand them to the optimizer and let it perform the update after the backward pass for us (see the sketch below).
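
To make the connection explicit, here is a small comparison sketch (my own, not from the tutorial) showing that a plain torch.optim.SGD step does essentially what our manual with torch.no_grad() loop did:

# Sketch: one training step with plain SGD; optimizer.step() replaces the
# manual "param -= learning_rate * param.grad" loop from the previous example.
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

optimizer.zero_grad()            # clear accumulated gradients
loss_fn(model(x), y).backward()  # compute gradients for all parameters
optimizer.step()                 # apply param -= lr * param.grad to each one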

Customizing your own Module

Above we used a Sequential module. Its defining characteristic is that it runs strictly top-to-bottom, as if there were only a single line, much like the LinearLayout we meet when learning UI layouts, which only cares about one vertical direction.

So what if we want a richer module? We can inherit from torch.nn.Module and write our own subclass. Below we implement a two-layer module as such a subclass.

# -*- coding: utf-8 -*-
import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
  • Here we can see that we defined our own two-layer module.

This module can also be written like this:

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.relu=torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h = self.linear1(x)
        h_relu=self.relu(h)
        y_pred = self.linear2(h_relu)
        return y_pred
  • The ReLU activation is special: in the backward pass its gradient is 0 for negative inputs and 1 for positive inputs, and it adds no learnable parameters, so we can either use clamp to truncate at zero or use nn.ReLU; the two are equivalent (see the quick check below).
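
A quick sanity check (not part of the tutorial) that clamp(min=0) and nn.ReLU really compute the same thing:

# Sketch: clamp(min=0) and nn.ReLU() produce identical outputs.
import torch

t = torch.randn(4, 5)
relu = torch.nn.ReLU()
print(torch.equal(t.clamp(min=0), relu(t)))  # prints True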

Control flow and weight sharing

Next is an example of control flow inside a network. This kind of structure is mostly used in RNNs or networks with LSTM modules, whose structure changes from pass to pass; for ordinary fully connected networks or CNNs it is rarely needed.

# -*- coding: utf-8 -*-
import random
import torch


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
  • As you can see, this is not very different from before, except that the number of hidden layers is not fixed on each training iteration (see the sketch below).
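
Because middle_linear is reused inside the loop, its weights are shared across all of those repeated applications. A quick check (my own, not from the tutorial) on the DynamicNet above shows it contributes only one weight/bias pair to the parameter list:

# Sketch: the reused middle_linear appears only once among the parameters,
# no matter how many times forward() applies it.
net = DynamicNet(D_in, H, D_out)
for name, p in net.named_parameters():
    print(name, tuple(p.shape))
# input_linear.*, middle_linear.* (listed once), output_linear.*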

Here is the official documentation's explanation of MSELoss (link).
As you can see, with reduction='none' it returns the per-element losses, with 'mean' (the default) it averages them, and with 'sum' it sums them.
As in our previous notes, our loss function is
$Loss = \sum (Y_{pred} - Y_{true})^2$
so here we set reduction to 'sum'.
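
A small illustration (toy tensors, my own example) of the three reduction modes:

# Sketch: MSELoss with reduction='none' / 'mean' / 'sum' on toy tensors.
import torch

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.5, 2.0, 1.0])

print(torch.nn.MSELoss(reduction='none')(pred, target))  # per-element squared errors
print(torch.nn.MSELoss(reduction='mean')(pred, target))  # their mean (the default)
print(torch.nn.MSELoss(reduction='sum')(pred, target))   # their sum, as used in these notes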
