2 PyTorch autograd

%matplotlib inline

Autograd: Automatic Differentiation

autograd is the core of building neural networks in PyTorch

Tensor

The automatic differentiation workflow (a minimal end-to-end sketch follows this list):

Set the Tensor attribute .requires_grad to True
As computations run, each result Tensor's .grad_fn attribute automatically records the operation that created it (for user-created Tensors this attribute is None)
After the computation finishes, call .backward() on the scalar result to compute all gradients automatically
The gradients are accumulated into each tensor's .grad attribute
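
Putting the four steps together, a minimal end-to-end sketch (the same flow this section walks through step by step below; the values are only illustrative):

import torch

# 1. create a leaf tensor and ask autograd to track it
x = torch.ones(2, 2, requires_grad=True)

# 2. every operation records a grad_fn on its result
out = (3 * (x + 2) ** 2).mean()

# 3. call backward() on the scalar result
out.backward()

# 4. the gradient d(out)/dx has been accumulated into x.grad
print(x.grad)  # tensor([[4.5000, 4.5000], [4.5000, 4.5000]])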

import torch

Create a tensor and set requires_grad=True to track computations on it:

x = torch.ones(2, 2, requires_grad=True)
print(x)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Perform tensor operations:

y = x + 2
print(y)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

y was created as the result of an operation, so it has a grad_fn:

print(y.grad_fn)
<AddBackward0 object at 0x0000020B017AE978>
z = y * y * 3
out = z.mean()

print(z, out)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

.requires_grad_( ... ) changes an existing Tensor's requires_grad flag in place. The flag defaults to False if not given.

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
False
True
<SumBackward0 object at 0x0000020B017B3940>

Gradients

Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

out.backward()

Print the gradient $\dfrac{d(out)}{dx}$:

print(x.grad)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

Here $o = \dfrac{1}{4}\sum_i z_i$,
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.

Therefore $\dfrac{\partial o}{\partial x_i} = \dfrac{1}{4}\cdot 6(x_i+2) = \dfrac{3}{2}(x_i+2)$, hence
$\dfrac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \dfrac{9}{2} = 4.5$.
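
As a quick check (a small sketch on fresh tensors, so the gradients above are not mixed in), passing the scalar gradient explicitly gives the same 4.5 everywhere:

x2 = torch.ones(2, 2, requires_grad=True)
out2 = (3 * (x2 + 2) ** 2).mean()
out2.backward(torch.tensor(1.))   # explicit form of out2.backward()
print(x2.grad)  # tensor([[4.5000, 4.5000], [4.5000, 4.5000]])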

For a vector-valued function $\vec{y}=f(\vec{x})$,
the Jacobian matrix is:

\begin{align}J=\left(\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right)\end{align}

torch.autograd is an engine for computing vector-Jacobian products:
given any vector $v=\left(\begin{array}{cccc} v_{1} & v_{2} & \cdots & v_{m}\end{array}\right)^{T}$,
it computes the product $v^{T}\cdot J$.
If $v$ is the gradient of a scalar $l=g\left(\vec{y}\right)$, that is
$v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}$, then by the chain rule

\begin{align}J^{T}\cdot v=\left(\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right)\left(\begin{array}{c}
\frac{\partial l}{\partial y_{1}}\\
\vdots\\
\frac{\partial l}{\partial y_{m}}
\end{array}\right)=\left(\begin{array}{c}
\frac{\partial l}{\partial x_{1}}\\
\vdots\\
\frac{\partial l}{\partial x_{n}}
\end{array}\right)\end{align}

This property of the vector-Jacobian product makes it very convenient to feed external gradients into a model with a non-scalar output.

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)
tensor([971.0132, 473.3496, 795.3892], grad_fn=<MulBackward0>)

\color{#FF0000}{Note}

Here y is not a scalar, so calling y.backward() directly raises RuntimeError: grad can be implicitly created only for scalar outputs.
For a tensor like y, pass a vector v into y.backward(); autograd then computes the vector-Jacobian product $v^{T}\cdot J$ rather than the full Jacobian.
For example, v = torch.tensor([1, 1, 1], dtype=torch.float) yields the column sums of the Jacobian.
This is also why, for a scalar output, y.backward() is equivalent to y.backward(torch.tensor(1.)).

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)
tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
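
To check the result, the full Jacobian of the same computation can be built with torch.autograd.functional.jacobian (available in recent PyTorch releases; this comparison is a sketch, not part of the original example). The x.grad produced by y.backward(v) should equal $v^{T}\cdot J$:

from torch.autograd.functional import jacobian

def f(t):
    out = t * 2
    while out.norm() < 1000:
        out = out * 2
    return out

J = jacobian(f, x.detach())           # full Jacobian of y with respect to x
print(torch.allclose(x.grad, v @ J))  # True: backward(v) computed v^T · J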

Inside a with torch.no_grad(): block, autograd stops tracking operations even on tensors with .requires_grad=True:

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
	print((x ** 2).requires_grad)
True
True
False
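
A typical use of torch.no_grad() is a manual parameter update, which must not itself be recorded by autograd (a minimal sketch on a fresh tensor, not tied to the x above):

w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
with torch.no_grad():
    w -= 0.1 * w.grad   # the update itself is not tracked
    w.grad.zero_()      # clear the accumulated gradient before the next step
print(w.requires_grad)  # still True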

.detach() creates a new Tensor with the same content but with .requires_grad=False:

print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())
True
False
tensor(True)
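
The detached tensor shares storage with the original: in-place modifications on either one are visible in the other (a small sketch on a fresh tensor, so the x above is left untouched):

t = torch.ones(3, requires_grad=True)
t_view = t.detach()
t_view.zero_()   # in-place edit through the detached tensor
print(t)         # tensor([0., 0., 0.], requires_grad=True)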

Further reading:

For documentation on autograd.Function, see
https://pytorch.org/docs/stable/autograd.html#function
