Training a deep-learning model means repeatedly updating its weights, and each update requires gradients. Deriving gradients by hand is tedious, so PyTorch provides an automatic differentiation system: once the forward computation graph is built, the gradients of all tensors can be obtained automatically.
torch.autograd.backward()
torch.autograd.backward(tensors,
                        grad_tensors=None,
                        retain_graph=None,
                        create_graph=False)
Purpose: compute gradients automatically
- tensors: the tensors to differentiate, e.g. the loss
- retain_graph: keep the computation graph. PyTorch uses a dynamic-graph mechanism, so the graph is freed after each backward pass; set this to True if the graph is needed again
- create_graph: build a computation graph of the derivatives themselves, enabling higher-order differentiation
- grad_tensors: weights for multiple gradients; when several losses contribute to the gradient, set a weight for each loss
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = torch.add(x, w)  # a = x + w
b = torch.add(w, 1)  # b = w + 1
y = torch.mul(a, b)  # y = (x + w) * (w + 1)
y.backward()
print(w.grad)  # dy/dw = (w + 1) + (x + w) = tensor([5.])
Debugging:
Set a breakpoint at y.backward()
Click step into to enter the method; its body is a single line, showing that y.backward() simply calls torch.autograd.backward()
torch.autograd.backward(self, gradient, retain_graph, create_graph)
Click step over to return to y.backward()
Stop debugging
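Since y.backward() is only a thin wrapper, calling torch.autograd.backward() directly produces the same gradient. A minimal check:

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
y = torch.mul(torch.add(x, w), torch.add(w, 1))  # y = (x + w) * (w + 1)

# Equivalent to y.backward(): dy/dw = (w + 1) + (x + w) = 2 + 3 = 5
torch.autograd.backward(y)
print(w.grad)  # tensor([5.])
```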
Calling y.backward() more than once raises an error, because the computation graph has been freed; the fix is to pass retain_graph=True on the first backward pass: y.backward(retain_graph=True)
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = torch.add(x, w)
b = torch.add(w, 1)
y = torch.mul(a, b)
y.backward() # correct usage: y.backward(retain_graph=True)
y.backward() # run backward a second time
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
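With retain_graph=True on the first call, the second backward pass succeeds; note that the gradients of the two passes accumulate:

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = torch.add(x, w)
b = torch.add(w, 1)
y = torch.mul(a, b)  # y = (x + w) * (w + 1)

y.backward(retain_graph=True)  # keep the graph alive for another pass
print(w.grad)  # tensor([5.])
y.backward()                   # second pass now works; gradients accumulate
print(w.grad)  # tensor([10.])
```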
Usage of the grad_tensors parameter
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = torch.add(x, w)
b = torch.add(w, 1)
y0 = torch.mul(a, b)  # y0 = (x + w) * (w + 1), dy0/dw = 5
y1 = torch.add(a, b)  # y1 = (x + w) + (w + 1), dy1/dw = 2
loss = torch.cat([y0, y1], dim=0)
grad_t = torch.tensor([1., 2.])  # weight for each component of loss
loss.backward(gradient=grad_t)
print(w.grad)
tensor([9.])
Explanation: w.grad is the weighted sum of the component gradients: 1 × dy0/dw + 2 × dy1/dw = 1 × 5 + 2 × 2 = 9.
torch.autograd.grad()
torch.autograd.grad(outputs,
                    inputs,
                    grad_outputs=None,
                    retain_graph=None,
                    create_graph=False)
Purpose: compute gradients
- outputs: the tensors to differentiate, e.g. the loss above
- inputs: the tensors whose gradients are required, e.g. w above
- create_graph: build a computation graph of the derivatives themselves, enabling higher-order differentiation
- retain_graph: keep the computation graph
- grad_outputs: weights for multiple gradients
Computing the second derivative of y = x²
x = torch.tensor([3.], requires_grad=True)
y = torch.pow(x, 2)  # y = x ** 2
grad1 = torch.autograd.grad(y, x, create_graph=True)  # create_graph=True builds a graph of the derivative itself, enabling higher-order differentiation
print(grad1)  # dy/dx = 2x = 6
grad2 = torch.autograd.grad(grad1[0], x)  # differentiate the first derivative
print(grad2)  # d²y/dx² = 2
(tensor([6.], grad_fn=<MulBackward0>),)
(tensor([2.]),)
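grad_outputs plays the same role for torch.autograd.grad() that grad_tensors plays for backward(): a weight for each output component. A sketch mirroring the earlier grad_tensors example:

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = torch.add(x, w)
b = torch.add(w, 1)
y0 = torch.mul(a, b)  # dy0/dw = (w + 1) + (x + w) = 5
y1 = torch.add(a, b)  # dy1/dw = 1 + 1 = 2
loss = torch.cat([y0, y1], dim=0)

# weight dy0/dw by 1 and dy1/dw by 2: 1 * 5 + 2 * 2 = 9
grads = torch.autograd.grad(loss, w, grad_outputs=torch.tensor([1., 2.]))
print(grads)  # (tensor([9.]),)
```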
Tips:
- Gradients are not cleared automatically; they accumulate across backward passes
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
for i in range(3):
    a = torch.add(x, w)
    b = torch.add(w, 1)
    y = torch.mul(a, b)
    y.backward()
    print(w.grad)
tensor([5.])
tensor([10.])
tensor([15.])
This accumulation gives incorrect results, so the gradient must be cleared manually:
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
for i in range(3):
    a = torch.add(x, w)
    b = torch.add(w, 1)
    y = torch.mul(a, b)
    y.backward()
    print(w.grad)
    w.grad.zero_()  # clear the gradient (the trailing _ marks an in-place operation)
tensor([5.])
tensor([5.])
tensor([5.])
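In a real training loop the clearing is usually done through the optimizer rather than tensor by tensor. A minimal sketch using torch.optim.SGD (the learning rate 0.1 is an arbitrary choice for illustration):

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for i in range(3):
    y = torch.mul(torch.add(x, w), torch.add(w, 1))  # y = (x + w) * (w + 1)
    optimizer.zero_grad()  # clears w.grad, same effect as w.grad.zero_()
    y.backward()
    optimizer.step()       # w <- w - lr * w.grad
    print(w.grad, w)
```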
- Nodes that depend on leaf nodes have requires_grad=True by default
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = torch.add(x, w)
b = torch.add(w, 1)
y = torch.mul(a, b)
print(a.requires_grad, b.requires_grad, y.requires_grad)
True True True
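The is_leaf attribute distinguishes leaf nodes from intermediate nodes; note also that gradients of non-leaf nodes are freed during backward unless retain_grad() is called on them. A short check:

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = torch.add(x, w)
y = torch.mul(a, torch.add(w, 1))

print(w.is_leaf, a.is_leaf, y.is_leaf)  # True False False
y.backward()
print(w.grad)  # tensor([5.])
print(a.grad)  # None: non-leaf grads are freed unless a.retain_grad() is called
```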
- Leaf nodes must not be modified in place. The forward pass records each leaf tensor's memory address, and the backward pass looks up the data at that address when computing gradients; an in-place operation overwrites the data at that address, so the gradient computation would be wrong.
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = torch.add(x, w)
b = torch.add(w, 1)
y = torch.mul(a, b)
w.add_(1)  # in-place modification of a leaf tensor that requires grad
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
An in-place operation modifies the data in its original memory location; methods whose names end with an underscore (_) perform in-place operations.
a = torch.tensor([1])
print(id(a), a)
a = a + torch.tensor([1])  # allocates a new tensor at a new memory address
print(id(a), a)
a += torch.tensor([1])  # in-place operation: the address is unchanged
print(id(a), a)
3008015174696 tensor([1])
3008046791240 tensor([2])
3008046791240 tensor([3])
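A leaf tensor can still be updated in place once autograd is told not to track the operation, which is exactly how optimizers update weights. A minimal sketch with torch.no_grad():

```python
import torch

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
y = torch.mul(torch.add(x, w), torch.add(w, 1))  # y = (x + w) * (w + 1)
y.backward()  # w.grad = tensor([5.])

lr = 0.1
with torch.no_grad():    # suspend graph tracking for the update
    w.sub_(lr * w.grad)  # in-place update, no RuntimeError
print(w)  # tensor([0.5000], requires_grad=True)
```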