PyTorch Examples: An Introduction to PyTorch by Example

From and thanks to: github jcjohnson/pytorch-examples

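    This article introduces the basic concepts of PyTorch through self-contained examples. jcjohnson's examples make it easy to see how PyTorch relates to numpy and TensorFlow, and to understand PyTorch's own concepts and design.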

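    At its core, PyTorch provides two main features:

1) An n-dimensional Tensor, similar to a numpy array but able to run on GPUs

2) Automatic differentiation for building and training neural networks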

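    We will use a fully-connected ReLU network as our running example. The network has a single hidden layer and is trained with gradient descent to fit random data by minimizing the squared Euclidean distance between the network output and the true output.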

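    Note: these examples have been updated for PyTorch 0.4, which made several major changes to the core PyTorch API. Most notably, before 0.4 Tensors had to be wrapped in Variable objects to use autograd; this functionality has now been added directly to Tensors, and Variables are deprecated.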

Contents

Warm-up: numpy

PyTorch: Tensors

PyTorch: Autograd

PyTorch: Defining new autograd functions

TensorFlow: Static Graphs

PyTorch: nn

PyTorch: optim

PyTorch: Custom nn Modules

PyTorch: Control Flow + Weight Sharing


Warm-up: numpy

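    Numpy is a scientific computing framework for creating and manipulating n-dimensional arrays. Numpy itself knows nothing about computational graphs, deep learning, or gradients, but a simple neural network is easy to build from numpy operations. Here is a 2-layer network (one hidden layer) implemented in numpy: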

# Code in file tensor/two_layer_net_numpy.py
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
  # Forward pass: compute predicted y
  h = x.dot(w1)
  h_relu = np.maximum(h, 0)
  y_pred = h_relu.dot(w2)
  
  # Compute and print loss
  loss = np.square(y_pred - y).sum()
  print(t, loss)
  
  # Backprop to compute gradients of w1 and w2 with respect to loss
  grad_y_pred = 2.0 * (y_pred - y)
  grad_w2 = h_relu.T.dot(grad_y_pred)
  grad_h_relu = grad_y_pred.dot(w2.T)
  grad_h = grad_h_relu.copy()
  grad_h[h < 0] = 0
  grad_w1 = x.T.dot(grad_h)
 
  # Update weights
  w1 -= learning_rate * grad_w1
  w2 -= learning_rate * grad_w2
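
    The hand-written backward pass above can be sanity-checked against finite differences. The following is a minimal sketch, not part of the original examples: the tiny layer sizes, the fixed seed, and the helper names loss_fn, analytic_grads, and numeric_grad are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small, hypothetical sizes so the check runs instantly.
N, D_in, H, D_out = 4, 5, 3, 2
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn(w1, w2):
    """Sum-of-squares loss of the two-layer ReLU network."""
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

def analytic_grads(w1, w2):
    """Same backward pass as the numpy listing above."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    return grad_w1, grad_w2

def numeric_grad(f, w, eps=1e-6):
    """Central finite differences, perturbing one entry of w at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        plus = f()
        w[idx] = old - eps
        minus = f()
        w[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

grad_w1, grad_w2 = analytic_grads(w1, w2)
num_w1 = numeric_grad(lambda: loss_fn(w1, w2), w1)
num_w2 = numeric_grad(lambda: loss_fn(w1, w2), w2)

# Both comparisons should agree up to finite-difference error.
print(np.allclose(grad_w1, num_w1, atol=1e-4))
print(np.allclose(grad_w2, num_w2, atol=1e-4))
```

    A check like this is only reliable away from the ReLU kink at 0, where the derivative is not defined.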

PyTorch: Tensors

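    Numpy is powerful, but it cannot use GPUs to accelerate numerical computation. For modern deep neural networks, GPUs often provide speedups of 50x or more, so numpy alone is not enough for modern deep learning.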

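    PyTorch resembles numpy but supports GPUs. A PyTorch Tensor is conceptually identical to a numpy array: it is an n-dimensional array, and PyTorch provides many functions for operating on Tensors, just as numpy does for arrays. Most numerical computations that can be written with numpy can therefore also be written with PyTorch; both are general-purpose tools for scientific computing. To run a Tensor on the GPU, pass a device argument when creating it (for example: device = torch.device('cuda'); x = torch.randn(10, 20, device=device)).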
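
    As a small illustration of the numpy/Tensor correspondence, the sketch below computes the same matrix product both ways. It is not from the original examples; falling back to the CPU via torch.cuda.is_available() and the toy array values are assumptions made so that the snippet runs on any machine.

```python
import numpy as np
import torch

# Pick the GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a_np = np.arange(6, dtype=np.float32).reshape(2, 3)
b_np = np.ones((3, 2), dtype=np.float32)

# torch.from_numpy wraps the array without copying; .to(device) copies
# the data only if the target device differs from the current one.
a = torch.from_numpy(a_np).to(device)
b = torch.from_numpy(b_np).to(device)

# The same matrix product, once with numpy and once with PyTorch.
prod_np = a_np.dot(b_np)
prod_torch = a.mm(b)

# Move the result back to the CPU before converting it to a numpy array.
print(np.allclose(prod_np, prod_torch.cpu().numpy()))
```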

    Here is the 2-layer network implemented with PyTorch Tensors:

# Code in file tensor/two_layer_net_tensor.py
import torch

device = torch.device('cpu')
# device = torch.device('cuda') # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

learning_rate = 1e-6
for t in range(500):
  # Forward pass: compute predicted y
  h = x.mm(w1)
  h_relu = h.clamp(min=0)
  y_pred = h_relu.mm(w2)

  # Compute and print loss; loss is a scalar, and is stored in a PyTorch Tensor
  # of shape (); we can get its value as a Python number with loss.item().
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Backprop to compute gradients of w1 and w2 with respect to loss
  grad_y_pred = 2.0 * (y_pred - y)
  grad_w2 = h_relu.t().mm(grad_y_pred)
  grad_h_relu = grad_y_pred.mm(w2.t())
  grad_h = grad_h_relu.clone()
  grad_h[h < 0] = 0
  grad_w1 = x.t().mm(grad_h)

  # Update weights using gradient descent
  w1 -= learning_rate * grad_w1
  w2 -= learning_rate * grad_w2

PyTorch: Autograd

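    In the examples above we had to implement both the forward and backward passes of the network by hand. Writing the backward pass by hand is not a big deal for a small two-layer network, but it quickly becomes difficult for large, complex networks.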

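    Fortunately, we can use automatic differentiation to compute the backward pass of a neural network automatically. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of the network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then makes it easy to compute gradients.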

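    To compute gradients with respect to a Tensor, set requires_grad=True when creating it. Every PyTorch operation on that Tensor then builds a computational graph through which a backward pass can later be run. For example, if x is a Tensor with requires_grad=True, then after the backward pass x.grad is another Tensor holding the gradient of some scalar value with respect to x.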
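
    A minimal sketch of this behaviour follows. The values of x and c and the scalar s are made up for illustration; only requires_grad, backward(), and .grad come from the text above.

```python
import torch

# x takes part in autograd; c is a plain constant Tensor.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
c = torch.tensor([4.0, 5.0, 6.0])

# The forward pass records the graph: s = sum(c * x^2) is a scalar.
s = (c * x.pow(2)).sum()
s.backward()

# ds/dx = 2 * c * x is stored in x.grad; c never required a gradient,
# so c.grad stays None.
print(x.grad)
print(c.grad)
```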

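    Sometimes you may not want PyTorch to build a computational graph while operating on Tensors; for example, when training a neural network we usually do not want to backpropagate through the weight update steps. In that case we can use torch.no_grad() to prevent the graph from being built.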
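
    The effect of torch.no_grad() can be seen on a toy weight vector. This is a sketch under assumed values: the tensor w and the step size 0.1 are illustrative and not taken from the original examples.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)

# Outside no_grad, results that depend on w are tracked by autograd.
tracked = w * 2
# Inside no_grad, nothing is recorded.
with torch.no_grad():
    untracked = w * 2

print(tracked.requires_grad, untracked.requires_grad)

# One gradient-descent step on loss = sum(w^2). The in-place update of a
# leaf Tensor that requires grad has to happen inside no_grad.
loss = (w ** 2).sum()
loss.backward()
with torch.no_grad():
    w -= 0.1 * w.grad
    w.grad.zero_()
print(w)
```

    After the update w is still a leaf Tensor with requires_grad=True, so the next forward pass builds a fresh graph from the new values.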

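    Note that in a neural network the gradients we need for the update are those of the weights; the inputs, the target outputs, and the hidden-layer activations do not need requires_grad=True. This matches the two cases described above.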

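    Here we use PyTorch Tensors and autograd to implement the 2-layer network; we no longer need to implement the backward pass by hand. The example calls the backward() method to run backpropagation.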

# Code in file autograd/two_layer_net_autograd.py
import torch

device = torch.device('cpu')
# device = torch.device('cuda') # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Create random Tensors for weights; setting requires_grad=True means that we
# want to compute gradients for these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
  # Forward pass: compute predicted y using operations on Tensors. Since w1 and
  # w2 have requires_grad=True, operations involving these Tensors will cause
  # PyTorch to build a computational graph, allowing automatic computation of
  # gradients. Since we are no longer implementing the backward pass by hand we
  # don't need to keep references to intermediate values.
  y_pred = x.mm(w1).clamp(min=0).mm(w2)
  
  # Compute and print loss. Loss is a Tensor of shape (), and loss.item()
  # is a Python number giving its value.
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Use autograd to compute the backward pass. This call will compute the
  # gradient of loss with respect to all Tensors with requires_grad=True.
  # After this call w1.grad and w2.grad will be Tensors holding the gradient
  # of the loss with respect to w1 and w2 respectively.
  loss.backward()

  # Update weights using gradient descent. For this step we just want to mutate
  # the values of w1 and w2 in-place; we don't want to build up a computational
  # graph for the update steps, so we use the torch.no_grad() context manager
  # to prevent PyTorch from building a computational graph for the updates
  with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad

    # Manually zero the gradients after running the backward pass
    w1.grad.zero_()
    w2.grad.zero_()

PyTorch: Defining new autograd functions

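    Under the hood, each primitive autograd operation is really two functions that operate on Tensors: the forward function computes output Tensors from input Tensors; the backward function receives the gradient of the output Tensors and computes the gradient of the input Tensors.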

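    In PyTorch we can define our own autograd operation by subclassing torch.autograd.Function and implementing the forward and backward functions as static methods. We can then use the new operation by calling its apply method like a function, passing in the input Tensors.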

    In this example we define a custom autograd function that performs the ReLU nonlinearity and use it to implement the 2-layer network.

# Code in file autograd/two_layer_net_custom_function.py
import torch

class MyReLU(torch.autograd.Function):
  """
  We can implement our own custom autograd Functions by subclassing
  torch.autograd.Function and implementing the forward and backward passes
  which operate on Tensors.
  """
  @staticmethod
  def forward(ctx, x):
    """
    In the forward pass we receive a context object and a Tensor containing the
    input; we must return a Tensor containing the output, and we can use the
    context object to cache objects for use in the backward pass.
    """
    ctx.save_for_backward(x)
    return x.clamp(min=0)

  @staticmethod
  def backward(ctx, grad_output):
    """
    In the backward pass we receive the context object and a Tensor containing
    the gradient of the loss with respect to the output produced during the
    forward pass. We can retrieve cached data from the context object, and must
    compute and return the gradient of the loss with respect to the input to the
    forward function.
    """
    x, = ctx.saved_tensors
    grad_x = grad_output.clone()
    grad_x[x < 0] = 0
    return grad_x


device = torch.device('cpu')
# device = torch.device('cuda') # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and output
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
  # Forward pass: compute predicted y using operations on Tensors; we call our
  # custom ReLU implementation using the MyReLU.apply function
  y_pred = MyReLU.apply(x.mm(w1)).mm(w2)
 
  # Compute and print loss
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Use autograd to compute the backward pass.
  loss.backward()

  with torch.no_grad():
    # Update weights using gradient descent
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad

    # Manually zero the gradients after running the backward pass
    w1.grad.zero_()
    w2.grad.zero_()
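
    One way to gain confidence in a custom Function is to compare its gradient with that of the built-in operation. The sketch below does this for a ReLU written in the same style as the listing above; the class name SketchReLU and the test inputs are hypothetical, and both methods are declared static so that the function is called through apply.

```python
import torch

class SketchReLU(torch.autograd.Function):
    """Hypothetical name; written in the same style as the MyReLU listing."""

    @staticmethod
    def forward(ctx, x):
        # Cache the input so backward can tell where it was negative.
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0
        return grad_x

# Inputs chosen away from 0, where the ReLU derivative is not defined.
a = torch.tensor([-2.0, -0.5, 0.5, 3.0], requires_grad=True)
b = a.detach().clone().requires_grad_(True)

# Same scalar function, once through the custom op and once through torch.relu.
SketchReLU.apply(a).pow(2).sum().backward()
torch.relu(b).pow(2).sum().backward()

print(a.grad)
print(torch.equal(a.grad, b.grad))
```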

TensorFlow: Static Graphs

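    PyTorch autograd looks a lot like TensorFlow: both frameworks define a computational graph and use automatic differentiation to compute gradients. The biggest difference between them is that TensorFlow's computational graphs are static, while PyTorch's are dynamic.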

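    In TensorFlow, we define the computational graph once and then execute the same graph over and over, feeding it different input data each time. In PyTorch, each forward pass defines a new computational graph.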

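    Static graphs are nice because the graph can be optimized up front; for example, a framework might decide to fuse some graph operations for efficiency, or come up with a strategy for distributing the graph across many GPUs or many machines. If the same graph is reused over and over, the cost of this potentially expensive up-front optimization is amortized as the graph is rerun.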

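    One aspect in which static and dynamic graphs differ is control flow. For some models we may wish to perform a different computation for each data point; for example, a recurrent network might be unrolled for a different number of time steps for each data point, and this unrolling can be implemented as a loop. With a static graph the loop construct needs to be part of the graph; for this reason TensorFlow provides operators such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: since we build the graph on the fly for each example, we can use ordinary imperative control flow to perform a computation that differs for each input.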
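
    The sketch below illustrates data-dependent control flow with a dynamic graph. It is a toy assumption rather than a recurrent network: the helper shrink_until_small and the input values are hypothetical, and the only point is that an ordinary Python while loop decides how many operations end up in the graph.

```python
import torch

def shrink_until_small(x, w):
    """Multiply by w until the value drops below 1.

    The number of multiplications depends on the data, so each call may
    build a graph of a different depth.
    """
    steps = 0
    out = x
    while out.item() >= 1.0:
        out = out * w
        steps += 1
    return out, steps

w = torch.tensor(0.5, requires_grad=True)

results = []
# Both inputs are >= 1, so the loop runs at least once and out depends on w.
for x_value in (3.0, 20.0):
    out, steps = shrink_until_small(torch.tensor(x_value), w)
    w.grad = None  # discard the gradient from the previous input
    out.backward()
    # out = x * w**steps, so d(out)/dw = steps * x * w**(steps - 1)
    results.append((x_value, steps, out.item(), w.grad.item()))

for row in results:
    print(row)
```

    The first input needs two multiplications and the second needs five, yet the same backward() call handles both without any loop operator in the graph.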

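    A supplementary note on TensorFlow: some nodes in a TF graph are variables and some are computations on variables, which differs somewhat from PyTorch. In addition, TF defines the whole graph first (this is what "static graph" means) but at execution time only runs the parts that are requested (in the example below, the loss and the weight updates: sess.run([loss, new_w1, new_w2], ...)) and returns their results; the other nodes those parts depend on are computed as well, but their values are not returned. This is flexible, but it can feel unfamiliar to beginners because it does not match ordinary programming habits. PyTorch, by contrast, does not predefine the graph; the graph is defined as it is used, so it can, for example, be built inside a loop, which matches the way we usually write programs. This dynamic-graph behaviour is what makes PyTorch feel simple and easy to use.

    To contrast with the PyTorch autograd example above, here we use TensorFlow (the 1.x graph API, with tf.placeholder and tf.Session) to implement a simple 2-layer network: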

# Code in file autograd/tf_two_layer_net.py
import tensorflow as tf
import numpy as np

# First we set up the computational graph:

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create placeholders for the input and target data; these will be filled
# with real data when we execute the graph.
x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))

# Create Variables for the weights and initialize them with random data.
# A TensorFlow Variable persists its value across executions of the graph.
w1 = tf.Variable(tf.random_normal((D_in, H)))
w2 = tf.Variable(tf.random_normal((H, D_out)))

# Forward pass: Compute the predicted y using operations on TensorFlow Tensors.
# Note that this code does not actually perform any numeric operations; it
# merely sets up the computational graph that we will later execute.
h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)

# Compute loss using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)

# Compute gradient of the loss with respect to w1 and w2.
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Update the weights using gradient descent. To actually update the weights
# we need to evaluate new_w1 and new_w2 when executing the graph. Note that
# in TensorFlow the act of updating the value of the weights is part of
# the computational graph; in PyTorch this happens outside the computational
# graph.
learning_rate = 1e-6
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

# Now we have built our computational graph, so we enter a TensorFlow session to
# actually execute the graph.
with tf.Session() as sess:
  # Run the graph once to initialize the Variables w1 and w2.
  sess.run(tf.global_variables_initializer())

  # Create numpy arrays holding the actual data for the inputs x and targets y
  x_value = np.random.randn(N, D_in)
  y_value = np.random.randn(N, D_out)
  for _ in range(500):
    # Execute the graph many times. Each time it executes we want to bind
    # x_value to x and y_value to y, specified with the feed_dict argument.
    # Each time we execute the graph we want to compute the values for loss,
    # new_w1, and new_w2; the values of these Tensors are returned as numpy
    # arrays.
    loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                feed_dict={x: x_value, y: y_value})
    print(loss_value)

PyTorch: nn

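    Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; for large neural networks, however, raw autograd can be a bit too low-level.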

    When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters that are optimized during learning.

    In TensorFlow, packages such as Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.

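    In PyTorch, the nn package serves the same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of loss functions that are commonly used when training neural networks.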
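
    To make "internal state" concrete, the sketch below inspects a single nn.Linear Module and a loss Module. The layer sizes, the seed, and the all-zero target are illustrative assumptions; reduction='sum' assumes PyTorch 0.4.1 or later.

```python
import torch

torch.manual_seed(0)

linear = torch.nn.Linear(3, 2)

# The Module's internal state: learnable weight and bias Tensors.
for name, param in linear.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

x = torch.randn(4, 3)
# Calling the Module computes x @ weight.T + bias.
out = linear(x)
manual = x.mm(linear.weight.t()) + linear.bias
print(torch.allclose(out, manual, atol=1e-6))

# A loss Module maps (prediction, target) to a scalar Tensor.
loss_fn = torch.nn.MSELoss(reduction='sum')
target = torch.zeros(4, 2)
loss = loss_fn(out, target)
print(torch.allclose(loss, out.pow(2).sum()))
```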

    In this example we use the nn package to implement our 2-layer network:

# Code in file nn/two_layer_net_nn.py
import torch

device = torch.device('cpu')
# device = torch.device('cuda') # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# After constructing the model we use the .to() method to move it to the
# desired device.
model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.ReLU(),
          torch.nn.Linear(H, D_out),
        ).to(device)

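# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function. Passing
# reduction='sum' (PyTorch >= 0.4.1; it replaces the deprecated
# size_average=False) sums the squared errors instead of averaging them.
loss_fn = torch.nn.MSELoss(reduction='sum')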

learning_rate = 1e-4
for t in range(500):
  # Forward pass: compute predicted y by passing x to the model. Module objects
  # override the __call__ operator so you can call them like functions. When
  # doing so you pass a Tensor of input data to the Module and it produces
  # a Tensor of output data.
  y_pred = model(x)

  # Compute and print loss. We pass Tensors containing the predicted and true
  # values of y, and the loss function returns a Tensor containing the loss.
  loss = loss_fn(y_pred, y)
  print(t, loss.item())
  
  # Zero the gradients before running the backward pass.
  model.zero_grad()

  # Backward pass: compute gradient of the loss with respect to all the learnable
  # parameters of the model. Internally, the parameters of each Module are stored
  # in Tensors with requires_grad=True, so this call will compute gradients for
  # all learnable parameters in the model.
  loss.backward()

  # Update the weights using gradient descent. Each parameter is a Tensor, so
  # we can access its data and gradients like we did before.
  with torch.no_grad():
    for param in model.parameters():
      param.data -= learning_rate * param.grad

PyTorch: optim

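    Up to this point we have updated the weights of our models by manually mutating the Tensors that hold the learnable parameters. This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks with more sophisticated optimizers such as AdaGrad, RMSProp, and Adam.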

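    The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms. These algorithms use more sophisticated rules than plain gradient descent to choose the update direction and step size for the weights.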
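
    For the simplest optimizer the abstraction can be checked directly: a plain torch.optim.SGD step should match the manual update used in the earlier sections. The sketch below assumes a toy quadratic loss, a fixed seed, and a learning rate of 0.1, none of which come from the original examples.

```python
import torch

torch.manual_seed(0)

lr = 0.1
w_manual = torch.randn(3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)
target = torch.tensor([1.0, 2.0, 3.0])

# The first argument lists the Tensors the optimizer is allowed to update.
optimizer = torch.optim.SGD([w_optim], lr=lr)

for _ in range(5):
    # Manual update, as in the earlier sections.
    loss = (w_manual - target).pow(2).sum()
    loss.backward()
    with torch.no_grad():
        w_manual -= lr * w_manual.grad
        w_manual.grad.zero_()

    # The same update delegated to the optimizer.
    optimizer.zero_grad()
    loss = (w_optim - target).pow(2).sum()
    loss.backward()
    optimizer.step()

print(torch.allclose(w_manual, w_optim, atol=1e-6))
```

    Swapping torch.optim.SGD for torch.optim.Adam changes only the constructor line; the zero_grad, backward, step loop stays the same.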

    In this example we define our model with the nn package as before, but optimize it with the Adam algorithm provided by the optim package:

# Code in file nn/two_layer_net_optim.py
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs.
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.ReLU(),
          torch.nn.Linear(H, D_out),
        )
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
  # Forward pass: compute predicted y by passing x to the model.
  y_pred = model(x)

  # Compute and print loss.
  loss = loss_fn(y_pred, y)
  print(t, loss.item())
  
  # Before the backward pass, use the optimizer object to zero all of the
  # gradients for the Tensors it will update (which are the learnable weights
  # of the model)
  optimizer.zero_grad()

  # Backward pass: compute gradient of the loss with respect to model parameters
  loss.backward()

  # Calling the step function on an Optimizer makes an update to its parameters
  optimizer.step()

PyTorch: Custom nn Modules

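    Sometimes you will want to specify models that are more complex than a sequence of existing Modules; in these cases you can define your own Module by subclassing nn.Module and defining a forward method that receives input Tensors and produces output Tensors using other Modules or other autograd operations on Tensors.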

    In this example we implement our 2-layer network as a custom Module subclass:

# Code in file nn/two_layer_net_module.py
import torch

class TwoLayerNet(torch.nn.Module):
  def __init__(self, D_in, H, D_out):
    """
    In the constructor we instantiate two nn.Linear modules and assign them as
    member variables.
    """
    super(TwoLayerNet, self).__init__()
    self.linear1 = torch.nn.Linear(D_in, H)
    self.linear2 = torch.nn.Linear(H, D_out)

  def forward(self, x):
    """
    In the forward function we accept a Tensor of input data and we must return
    a Tensor of output data. We can use Modules defined in the constructor as
    well as arbitrary (differentiable) operations on Tensors.
    """
    h_relu = self.linear1(x).clamp(min=0)
    y_pred = self.linear2(h_relu)
    return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above.
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
  # Forward pass: Compute predicted y by passing x to the model
  y_pred = model(x)

  # Compute and print loss
  loss = loss_fn(y_pred, y)
  print(t, loss.item())

  # Zero gradients, perform a backward pass, and update the weights.
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()
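
    The call to model.parameters() works because assigning a Module to an attribute registers its parameters with the parent. The sketch below shows this on a hypothetical TinyNet; the class name, the layer sizes, and the unregistered_list attribute are assumptions made for illustration.

```python
import torch

class TinyNet(torch.nn.Module):
    """Hypothetical module used only for this illustration."""

    def __init__(self):
        super().__init__()
        # Assigning a Module as an attribute registers its parameters.
        self.linear1 = torch.nn.Linear(4, 3)
        self.linear2 = torch.nn.Linear(3, 2)
        # Modules kept in a plain Python list are not registered.
        self.unregistered_list = [torch.nn.Linear(3, 3)]

    def forward(self, x):
        return self.linear2(self.linear1(x).clamp(min=0))

net = TinyNet()
names = [name for name, _ in net.named_parameters()]
print(names)

out = net(torch.randn(5, 4))
print(tuple(out.shape))
```

    Only the weights and biases of linear1 and linear2 appear in the list, so an optimizer built from net.parameters() would leave the Module inside the plain list untouched.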

PyTorch: Control Flow + Weight Sharing

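    As an example of dynamic graphs and weight sharing, we implement a very strange model: a fully-connected ReLU network that, on each forward pass, chooses a random number between 1 and 4 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers.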

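    For this model we can use ordinary Python flow control to implement the loop, and we can implement weight sharing among the innermost layers by simply reusing the same Module multiple times when defining the forward pass.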

    We can easily implement this model as a Module subclass:

# Code in file nn/dynamic_net.py
import random
import torch

class DynamicNet(torch.nn.Module):
  def __init__(self, D_in, H, D_out):
    """
    In the constructor we construct three nn.Linear instances that we will use
    in the forward pass.
    """
    super(DynamicNet, self).__init__()
    self.input_linear = torch.nn.Linear(D_in, H)
    self.middle_linear = torch.nn.Linear(H, H)
    self.output_linear = torch.nn.Linear(H, D_out)

  def forward(self, x):
    """
    For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
    and reuse the middle_linear Module that many times to compute hidden layer
    representations.

    Since each forward pass builds a dynamic computation graph, we can use normal
    Python control-flow operators like loops or conditional statements when
    defining the forward pass of the model.

    Here we also see that it is perfectly safe to reuse the same Module many
    times when defining a computational graph. This is a big improvement from Lua
    Torch, where each Module could be used only once.
    """
    h_relu = self.input_linear(x).clamp(min=0)
    for _ in range(random.randint(0, 3)):
      h_relu = self.middle_linear(h_relu).clamp(min=0)
    y_pred = self.output_linear(h_relu)
    return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs.
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
  # Forward pass: Compute predicted y by passing x to the model
  y_pred = model(x)

  # Compute and print loss
  loss = criterion(y_pred, y)
  print(t, loss.item())

  # Zero gradients, perform a backward pass, and update the weights.
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()
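
    Weight sharing can also be observed in isolation. In the sketch below a single nn.Linear is applied a variable number of times; the helper run, the layer size, and the chosen depths are hypothetical. However deep the graph, there is only one weight and one bias, and the gradient for every use is accumulated into the same Tensor.

```python
import torch

torch.manual_seed(0)

shared = torch.nn.Linear(3, 3)
x = torch.randn(2, 3)

def run(depth):
    """Apply the same Linear Module `depth` times and return the weight gradient."""
    shared.zero_grad()
    h = x
    for _ in range(depth):
        h = shared(h).clamp(min=0)
    h.sum().backward()
    return shared.weight.grad.clone()

# Reusing the Module adds no parameters: still one weight and one bias.
n_params = sum(1 for _ in shared.parameters())
grad_depth1 = run(1)
grad_depth3 = run(3)

print(n_params)
print(tuple(grad_depth1.shape), tuple(grad_depth3.shape))
```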