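I. Multilayer perceptron basics

Deep learning is mainly concerned with multilayer models. Here we use the multilayer perceptron (MLP) as an example to introduce the concept of a multilayer neural network.

1.1 Hidden layers

The accompanying figure shows the network diagram of a multilayer perceptron with a single hidden layer containing 5 hidden units.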

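1.2 Formulation

Concretely, consider a minibatch of samples $\textbf{X}\in \mathbb{R}^{n\times d}$ with batch size $n$ and $d$ inputs. Suppose the multilayer perceptron has a single hidden layer with $h$ hidden units. Denote the output of the hidden layer (also called the hidden-layer variable or hidden variable) by $\textbf{H}$, with $\textbf{H}\in \mathbb{R}^{n\times h}$. Because the hidden layer and the output layer are both fully connected layers, let the weight and bias parameters of the hidden layer be $\textbf{W}_{h}\in \mathbb{R}^{d\times h}$ and $\textbf{b}_{h}\in \mathbb{R}^{1\times h}$, and those of the output layer be $\textbf{W}_{o}\in \mathbb{R}^{h\times q}$ and $\textbf{b}_{o}\in \mathbb{R}^{1\times q}$, where $q$ is the number of outputs.

We first look at one design of a multilayer perceptron with a single hidden layer. Its output $\textbf{O}\in \mathbb{R}^{n\times q}$ is computed as

$$\textbf{H} = \textbf{X}\textbf{W}_{h} + \textbf{b}_{h},$$
$$\textbf{O} = \textbf{H}\textbf{W}_{o} + \textbf{b}_{o},$$

that is, the output of the hidden layer is fed directly into the output layer. Combining the two equations gives

$$\textbf{O} = (\textbf{X}\textbf{W}_{h} + \textbf{b}_{h})\textbf{W}_{o} + \textbf{b}_{o} = \textbf{X}\textbf{W}_{h}\textbf{W}_{o} + \textbf{b}_{h}\textbf{W}_{o} + \textbf{b}_{o}.$$

The combined equation shows that, even though the network has a hidden layer, it is still equivalent to a single-layer neural network whose output-layer weight is $\textbf{W}_{h}\textbf{W}_{o}$ and whose bias is $\textbf{b}_{h}\textbf{W}_{o}+\textbf{b}_{o}$. It follows that adding even more hidden layers to this design still yields a network equivalent to a single-layer network with only an output layer.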
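The collapse described above can be checked numerically. The following is a minimal NumPy sketch, not part of the original notebook; the sizes `n, d, h, q` and the random values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, q = 4, 3, 5, 2  # batch size, inputs, hidden units, outputs (illustrative)

X = rng.normal(size=(n, d))
W_h, b_h = rng.normal(size=(d, h)), rng.normal(size=(1, h))
W_o, b_o = rng.normal(size=(h, q)), rng.normal(size=(1, q))

# Two stacked fully connected layers with no activation function in between.
H = X @ W_h + b_h
O_two_layers = H @ W_o + b_o

# The equivalent single layer: weight W_h W_o, bias b_h W_o + b_o.
W_single = W_h @ W_o
b_single = b_h @ W_o + b_o
O_single = X @ W_single + b_single

print(np.allclose(O_two_layers, O_single))  # True
```

Both computations give the same outputs up to floating-point rounding, which is why a hidden layer without a nonlinearity adds no expressive power.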

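1.3 Activation functions

The root of the problem above is that a fully connected layer only applies an affine transformation to the data, and a composition of affine transformations is still an affine transformation. One way to solve this is to introduce a nonlinear transformation, for example by applying an elementwise nonlinear function to the hidden variables before passing them to the next fully connected layer. This nonlinear function is called an activation function.

Below we introduce some commonly used activation functions:

1. ReLU function

The ReLU (rectified linear unit) function provides a very simple nonlinear transformation. Given an element $x$, the function is defined as

$$\text{ReLU}(x) = \max(x, 0).$$

As the definition shows, ReLU keeps only positive elements and sets negative elements to zero. To observe this nonlinear transformation visually, we first define a plotting function xyplot.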

%matplotlib inline
import torch
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l
print(torch.__version__)

1.3.0

def xyplot(x_vals, y_vals, name):
    # d2l.set_figsize(figsize=(5, 2.5))
    plt.plot(x_vals.detach().numpy(), y_vals.detach().numpy())
    plt.xlabel('x')
    plt.ylabel(name + '(x)')

x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)
y = x.relu()
xyplot(x, y, 'relu')

y.sum().backward()
xyplot(x, x.grad, 'grad of relu')
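As a numerical cross-check of the gradient plot, here is a small NumPy sketch (not from the original notebook). The helper names `relu` and `relu_grad` are illustrative, and setting the derivative to 0 at exactly `x == 0` is a convention assumed here, since ReLU is not differentiable at that point.

```python
import numpy as np

def relu(x):
    # Elementwise max(x, 0).
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Derivative is 1 for x > 0 and 0 for x < 0; 0 is chosen at x == 0 by convention.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.5, 3.0])
eps = 1e-6
# Central finite difference, evaluated away from the kink at 0.
numeric = (relu(x + eps) - relu(x - eps)) / (2 * eps)

print(relu(x))                              # [0.  0.  0.5 3. ]
print(relu_grad(x))                         # [0. 0. 1. 1.]
print(np.allclose(numeric, relu_grad(x)))   # True
```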

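2. Sigmoid function

The sigmoid function maps the value of an element into the range between 0 and 1:

$$\text{sigmoid}(x) = \frac{1}{1 + \exp(-x)}.$$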

y = x.sigmoid()
xyplot(x, y, 'sigmoid')

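By the chain rule, the derivative of the sigmoid function is

$$\text{sigmoid}'(x) = \text{sigmoid}(x)\left(1 - \text{sigmoid}(x)\right).$$

The derivative of the sigmoid function is plotted below. It reaches its maximum value of 0.25 when the input is 0, and it approaches 0 as the input moves further away from 0.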

x.grad.zero_()
y.sum().backward()
xyplot(x, x.grad, 'grad of sigmoid')
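The closed form of the sigmoid derivative, s(x)(1 - s(x)), can also be verified without autograd. The sketch below is an illustrative NumPy check, not part of the original notebook; the function names are assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Closed-form derivative: sigmoid(x) * (1 - sigmoid(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-8.0, 8.0, 161)
eps = 1e-5
# Central finite difference as an independent estimate of the derivative.
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

print(sigmoid_grad(np.array([0.0])))                     # [0.25]
print(np.allclose(numeric, sigmoid_grad(x), atol=1e-8))  # True
```

At the ends of the plotted range the derivative is already tiny (below 0.001 at |x| = 8), which illustrates how it vanishes for inputs far from 0.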

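1.4 Multilayer perceptrons

A multilayer perceptron is a neural network made of fully connected layers that has at least one hidden layer, where the output of each hidden layer is transformed by an activation function. The number of layers and the number of hidden units in each hidden layer are hyperparameters. Taking a single hidden layer as an example and reusing the notation defined earlier in this section, the multilayer perceptron computes its output as follows:

$$\textbf{H} = \phi(\textbf{X}\textbf{W}_{h} + \textbf{b}_{h}),$$
$$\textbf{O} = \textbf{H}\textbf{W}_{o} + \textbf{b}_{o},$$

where $\phi$ denotes the activation function.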
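To make the single-hidden-layer computation concrete, the following NumPy sketch (an illustration with assumed sizes and random values, not code from the original notebook) implements H = phi(X W_h + b_h) and O = H W_o + b_o, then compares ReLU against the identity function as phi.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(X, W_h, b_h, W_o, b_o, phi=relu):
    H = phi(X @ W_h + b_h)   # hidden variables, shape (n, h)
    return H @ W_o + b_o     # outputs, shape (n, q)

rng = np.random.default_rng(0)
n, d, h, q = 4, 3, 5, 2      # illustrative sizes
X = rng.normal(size=(n, d))
W_h, b_h = rng.normal(size=(d, h)), np.zeros((1, h))
W_o, b_o = rng.normal(size=(h, q)), np.zeros((1, q))

O = mlp_forward(X, W_h, b_h, W_o, b_o)                          # with ReLU
O_linear = mlp_forward(X, W_h, b_h, W_o, b_o, phi=lambda z: z)  # no nonlinearity

print(O.shape)                                      # (4, 2)
print(np.allclose(O_linear, X @ (W_h @ W_o)))       # True
```

With the identity as phi the network reduces to the single matrix product `X @ (W_h @ W_o)`. With ReLU the negative pre-activations are zeroed out, so the output is in general no longer an affine function of `X`.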

II. Implementing a multilayer perceptron from scratch

import torch
import numpy as np
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l
print(torch.__version__)

1.3.0

2.1 Loading the dataset

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size,root='/home/kesci/input/FashionMNIST2065')

2.2 Defining the model parameters

num_inputs, num_outputs, num_hiddens = 784, 10, 256

W1 = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float)
b1 = torch.zeros(num_hiddens, dtype=torch.float)
W2 = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)
b2 = torch.zeros(num_outputs, dtype=torch.float)

params = [W1, b1, W2, b2]
for param in params:
    param.requires_grad_(requires_grad=True)
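As a quick sanity check on the sizes above, the plain-Python sketch below counts the trainable parameters. It is an added illustration, and the helper `numel` is a hypothetical name. The input size 784 corresponds to flattened 28 × 28 Fashion-MNIST images.

```python
num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Shapes of the four parameter tensors defined in this section.
shapes = {
    "W1": (num_inputs, num_hiddens),
    "b1": (num_hiddens,),
    "W2": (num_hiddens, num_outputs),
    "b2": (num_outputs,),
}

def numel(shape):
    # Number of elements in a tensor of the given shape.
    count = 1
    for dim in shape:
        count *= dim
    return count

counts = {name: numel(shape) for name, shape in shapes.items()}
total = sum(counts.values())

print(counts)  # {'W1': 200704, 'b1': 256, 'W2': 2560, 'b2': 10}
print(total)   # 203530
```

Almost all of the roughly 200 thousand parameters sit in the first weight matrix, because it connects every input pixel to every hidden unit.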

2.3 Defining the activation function

def relu(X):
    return torch.max(input=X, other=torch.tensor(0.0))

2.4 Defining the network

def net(X):
    X = X.view((-1, num_inputs))
    H = relu(torch.matmul(X, W1) + b1)
    return torch.matmul(H, W2) + b2

2.5 Defining the loss function

loss = torch.nn.CrossEntropyLoss()
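Note that the network returns raw scores (logits) with no softmax layer. `torch.nn.CrossEntropyLoss` applies log-softmax internally and, with its default settings, averages the negative log-likelihood over the batch. The NumPy sketch below illustrates that computation; it is a simplified stand-in written for this explanation (the function name is hypothetical), not PyTorch's actual implementation.

```python
import numpy as np

def cross_entropy_from_logits(logits, labels):
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Take the log-probability of the true class in each row, then average over the batch.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
print(cross_entropy_from_logits(logits, labels))

# With uniform logits over 3 classes the loss equals log(3).
uniform = cross_entropy_from_logits(np.zeros((4, 3)), np.array([0, 1, 2, 0]))
print(np.isclose(uniform, np.log(3)))  # True
```

For the 10-class problem here, an untrained model with near-uniform predictions would therefore start at a loss of about log(10) ≈ 2.30 per sample.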

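2.6 Training

# The learning rate is deliberately large: d2l.sgd is passed batch_size
# (presumably to divide the gradient by it), while CrossEntropyLoss
# already averages the loss over the batch.
num_epochs, lr = 5, 100.0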
# def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
#               params=None, lr=None, optimizer=None):
#     for epoch in range(num_epochs):
#         train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
#         for X, y in train_iter:
#             y_hat = net(X)
#             l = loss(y_hat, y).sum()
#             
#             # Zero the gradients
#             if optimizer is not None:
#                 optimizer.zero_grad()
#             elif params is not None and params[0].grad is not None:
#                 for param in params:
#                     param.grad.data.zero_()
#            
#             l.backward()
#             if optimizer is None:
#                 d2l.sgd(params, lr, batch_size)
#             else:
#                 optimizer.step()  # used in the section "Concise implementation of softmax regression"
#             
#             
#             train_l_sum += l.item()
#             train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
#             n += y.shape[0]
#         test_acc = evaluate_accuracy(test_iter, net)
#         print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
#               % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)

epoch 1, loss 0.0030, train acc 0.712, test acc 0.806
epoch 2, loss 0.0019, train acc 0.821, test acc 0.806
epoch 3, loss 0.0017, train acc 0.847, test acc 0.825
epoch 4, loss 0.0015, train acc 0.856, test acc 0.834
epoch 5, loss 0.0015, train acc 0.863, test acc 0.847
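The learning rate of 100.0 looks unusually large. One likely explanation, stated here as an assumption because the body of `d2l.sgd` is not shown in this post, is that the helper divides each gradient by `batch_size` (the call `d2l.sgd(params, lr, batch_size)` in the commented-out loop suggests this). Since `CrossEntropyLoss` already averages over the batch, the effective step size would then be `lr / batch_size`. The sketch below uses a hypothetical `sgd_step` to illustrate that arithmetic.

```python
import numpy as np

def sgd_step(params, grads, lr, batch_size):
    # Hypothetical stand-in for d2l.sgd: in-place update param -= lr * grad / batch_size.
    for param, grad in zip(params, grads):
        param -= lr * grad / batch_size

lr, batch_size = 100.0, 256
w = np.array([1.0, -2.0])
g = np.array([0.5, 0.5])   # made-up gradient of a batch-averaged loss

sgd_step([w], [g], lr, batch_size)

print(lr / batch_size)  # 0.390625
print(w)                # [ 0.8046875 -2.1953125]
```

Under this assumption the effective learning rate is about 0.39, which is close to the 0.5 used with `torch.optim.SGD` in the PyTorch version. The same reasoning applies to the printed loss: if `d2l.train_ch3` matches the commented-out loop, it sums batch-mean losses and divides by the number of samples, so a reported 0.0030 corresponds to a mean per-sample loss of roughly 0.0030 × 256 ≈ 0.77.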

III. Implementing a multilayer perceptron with PyTorch

import torch
from torch import nn
from torch.nn import init
import numpy as np
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

print(torch.__version__)

3.1 Initializing the model and its parameters

num_inputs, num_outputs, num_hiddens = 784, 10, 256
    
net = nn.Sequential(
        d2l.FlattenLayer(),
        nn.Linear(num_inputs, num_hiddens),
        nn.ReLU(),
        nn.Linear(num_hiddens, num_outputs), 
        )
    
for params in net.parameters():
    init.normal_(params, mean=0, std=0.01)
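For readers without the `d2lzh1981` package, the sketch below builds the same architecture using only PyTorch. It assumes that `d2l.FlattenLayer` simply flattens every dimension except the batch dimension, and swaps in the built-in `nn.Flatten` (available in recent PyTorch releases) as a stand-in. The dummy input is random data, not Fashion-MNIST.

```python
import torch
from torch import nn
from torch.nn import init

num_inputs, num_outputs, num_hiddens = 784, 10, 256

net = nn.Sequential(
    nn.Flatten(),  # assumed equivalent of d2l.FlattenLayer: (N, 1, 28, 28) -> (N, 784)
    nn.Linear(num_inputs, num_hiddens),
    nn.ReLU(),
    nn.Linear(num_hiddens, num_outputs),
)

# Same initialization as in the post: every parameter ~ N(0, 0.01^2), biases included.
for params in net.parameters():
    init.normal_(params, mean=0, std=0.01)

X = torch.rand(2, 1, 28, 28)  # dummy batch of two 28x28 single-channel images
out = net(X)
print(out.shape)  # torch.Size([2, 10])
```

The output has one row of 10 class scores per image, which is the shape `CrossEntropyLoss` expects for its first argument.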

3.2 Training

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size,root='/home/kesci/input/FashionMNIST2065')
loss = torch.nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(net.parameters(), lr=0.5)

num_epochs = 5
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)

epoch 1, loss 0.0031, train acc 0.701, test acc 0.774
epoch 2, loss 0.0019, train acc 0.821, test acc 0.806
epoch 3, loss 0.0017, train acc 0.841, test acc 0.805
epoch 4, loss 0.0015, train acc 0.855, test acc 0.834
epoch 5, loss 0.0014, train acc 0.866, test acc 0.840

陳小蝦

Dive into Deep Learning (《動手學深度學習》), Day 3: Multilayer perceptrons
