Gradients of a tensor with respect to a tensor in PyTorch: the gradient parameter of backward explained

Original post

阿喵酱紫糖

2020-05-10 02:32

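1. Where the question came from:
While reading the "Tensor and Autograd" section of the book 《python深度學習：基於pytorch》 (Python Deep Learning: Based on PyTorch), I came across subsection 2.5.4 on backpropagation for non-scalar outputs.

2. What puzzled me:

The book sets the backward(gradient=) argument to (1,1) without giving any reason, then switches to (1,0) and (0,1), after which the result comes out right. It never explains how this argument should be chosen or how the backward gradient computation actually proceeds.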

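3. Analysis:

After reading the blog post https://www.cnblogs.com/zhouyang209117/p/11023160.html

I was convinced that the gradient argument is chosen according to what the user wants to compute. For example, setting it to (1,0) yields exactly the derivative of y1 with respect to x.

The two sources write the computation differently: the book uses (transpose of J) * (transpose of v), while the blog assumes v * J. The underlying idea is the same in both: choose v so that each component of y is differentiated with respect to x in turn, then stack the resulting vectors to obtain the full derivative of y with respect to x.

Although the two procedures are written differently, the partial derivative values they produce are identical.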
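
To make the "choose v per component, then stack" idea concrete, here is a minimal sketch. It assumes the three functions from the blog's example (listed later in this post), and it builds y with torch.stack instead of in-place assignment, which is my substitution and not the blog's code. Each one-hot v should pick out one row of the Jacobian.

```python
import torch

def f(x):
    # Assumed for illustration: the three functions from the blog's example.
    return torch.stack([x[0] * x[1] * x[2],
                        x[0] + x[1] + x[2],
                        x[0] + x[1] * x[2]])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

rows = []
for i in range(3):
    v = torch.zeros(3)
    v[i] = 1.0              # one-hot vector: select y_i only
    x.grad = None           # clear the gradient accumulated so far
    f(x).backward(v)        # x.grad now holds d y_i / d x
    rows.append(x.grad.clone())

J = torch.stack(rows)       # J[i, j] = d y_i / d x_j
print(J)
print(torch.ones(3) @ J)    # v = (1, 1, 1) sums the rows of J
```

With v = (1, 1, 1) the product v * J is just the sum of the three rows, which is why that choice mixes the gradients of all components together.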

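The book reports x.grad = [[6,9]], whereas the blog only prints the gradients of x1, x2 and x3. Even if the blog's code is changed to ask for the gradient with respect to x, the result is x.grad = None:

x = torch.tensor([1, 2, 3], requires_grad=True, dtype=torch.float)

y.backward(torch.tensor([1, 1, 1], dtype=torch.float))

print(x.grad) # None, because y was assigned from the scalars x1, x2, x3 and has no connection to x.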
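
The None result can be reproduced in isolation. The sketch below is my own arrangement, not the blog's code: it builds y with torch.stack from the scalar leaves and creates a separate x holding the same numbers. Since x never takes part in computing y, backward has nothing to write into x.grad.

```python
import torch

x1 = torch.tensor(1.0, requires_grad=True)
x2 = torch.tensor(2.0, requires_grad=True)
x3 = torch.tensor(3.0, requires_grad=True)

# y depends only on the scalar leaves x1, x2, x3.
y = torch.stack([x1 * x2 * x3, x1 + x2 + x3, x1 + x2 * x3])

# A fresh tensor with the same values; it is never used to compute y.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y.backward(torch.ones(3))

print(x.grad)                      # None: x is not in y's graph
print(x1.grad, x2.grad, x3.grad)   # the gradients land on the leaves that were used
```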

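If the code is changed to the following:

y[0] = x[0]*x[1]*x[2]
y[1] = x[0]+x[1]+x[2]
y[2] = x[0]+x[1]*x[2]
y.backward(torch.tensor([1, 1, 1], dtype=torch.float))
print(x.grad) # tensor([8., 7., 5.])

then the result is tensor([8., 7., 5.]). The gradient has shape (3,), the same as the shape of the input x.

In the book, the calculation is written as (transpose of J) * (transpose of v), which gives the column vector (6,9), yet the printed value is [[6,9]], a row vector. The two forms contain the same numbers and differ only by a transpose. Since x.grad always takes the shape of x, the check above indicates that what PyTorch returns corresponds to gradient * Jacobian.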
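
The same check can be run on a case with the book's numbers. The functions below are an assumption on my part: y1 = x1^2 + 3*x2 and y2 = x2^2 + 2*x1 are chosen only because they give [[6, 9]] at x = [[2, 3]] with v = (1, 1), and they are not necessarily the book's exact code. The full Jacobian comes from torch.autograd.functional.jacobian, so that x.grad can be compared with v * J.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical functions, picked to be consistent with [[6., 9.]] at x = [[2., 3.]].
    y1 = x[0, 0] ** 2 + 3 * x[0, 1]
    y2 = x[0, 1] ** 2 + 2 * x[0, 0]
    return torch.stack([y1, y2]).reshape(1, 2)

x = torch.tensor([[2.0, 3.0]], requires_grad=True)

# jacobian returns shape (1, 2, 1, 2); flatten it to J[i, j] = d y_i / d x_j.
J = jacobian(f, x).reshape(2, 2)

for values in ([1.0, 1.0], [1.0, 0.0], [0.0, 1.0]):
    v = torch.tensor([values])   # shape (1, 2), same as y
    x.grad = None                # clear the previous result
    f(x).backward(v)
    print(values, x.grad.tolist(), (v @ J).tolist(), x.grad.shape)
```

Under these assumptions x.grad and v @ J should agree for each v, and x.grad keeps the (1, 2) shape of x.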

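I also checked the description in the official docs. The explanation of the grad_tensors parameter is not very clear; it only mentions the Jacobian-vector product. If you have worked out the exact computation that passage describes, or if your understanding differs from mine, feel free to get in touch. I did not go on to check the source code, because the results of both test snippets agree with grad_tensors * Jacobian, as in my reading and in the blog above.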
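
For reference, the argument is called gradient on Tensor.backward and grad_tensors on torch.autograd.backward. The sketch below calls both on the same inputs; build_y is a hypothetical helper of mine that wraps the three functions from the blog's example. As far as I can tell, the two calls give the same x.grad.

```python
import torch

def build_y(x):
    # Hypothetical helper: the three functions from the blog's example.
    return torch.stack([x[0] * x[1] * x[2],
                        x[0] + x[1] + x[2],
                        x[0] + x[1] * x[2]])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 0.2, 0.3])

# Form 1: the Tensor method, where the argument is named `gradient`.
build_y(x).backward(gradient=v)
grad_method = x.grad.clone()

# Form 2: the module-level function, where the argument is named `grad_tensors`.
x.grad = None
torch.autograd.backward([build_y(x)], grad_tensors=[v])
grad_function = x.grad.clone()

print(grad_method)
print(grad_function)
```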

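The original code block from the book is as follows:

The blog's code block is as follows. Its assumption about A is a good one, but what we actually want is the partial derivative of y with respect to x. So we construct v, the vector of partial derivatives of A with respect to y, and J, the matrix of partial derivatives of y with respect to x, and take their product; the components obtained this way are the partial derivatives of y with respect to x: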

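# coding: utf-8
import torch

# Three scalar leaf tensors
x1 = torch.tensor(1, requires_grad=True, dtype=torch.float)
x2 = torch.tensor(2, requires_grad=True, dtype=torch.float)
x3 = torch.tensor(3, requires_grad=True, dtype=torch.float)
y = torch.randn(3)  # placeholder values, overwritten below
y[0] = x1 * x2 * x3
y[1] = x1 + x2 + x3
y[2] = x1 + x2 * x3
x = torch.tensor([x1, x2, x3])  # copies the values; not connected to y's graph
y.backward(torch.tensor([0.1, 0.2, 0.3], dtype=torch.float))
print(x1.grad)  # 0.1*(x2*x3) + 0.2*1 + 0.3*1  = 1.1
print(x2.grad)  # 0.1*(x1*x3) + 0.2*1 + 0.3*x3 = 1.4
print(x3.grad)  # 0.1*(x1*x2) + 0.2*1 + 0.3*x2 = 1.0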
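
One way to read the role of A is sketched below. It assumes a hypothetical scalar A = 0.1*y1 + 0.2*y2 + 0.3*y3, so that the partial derivatives of A with respect to y are exactly v. That particular A is my choice for illustration; the blog only requires that dA/dy equals v. Calling an ordinary scalar backward on A should then match y.backward(v).

```python
import torch

def build_y(x):
    # Hypothetical helper: the three functions from the blog's example.
    return torch.stack([x[0] * x[1] * x[2],
                        x[0] + x[1] + x[2],
                        x[0] + x[1] * x[2]])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 0.2, 0.3])

# Route 1: pass v as the gradient argument of a non-scalar backward.
build_y(x).backward(v)
grad_from_v = x.grad.clone()

# Route 2: build a scalar A with dA/dy = v and call a plain scalar backward.
x.grad = None
A = (v * build_y(x)).sum()
A.backward()
grad_from_A = x.grad.clone()

print(grad_from_v)
print(grad_from_A)
```

Seen this way, gradient=v plays the part of the upstream derivative dA/dy in the chain rule dA/dx = (dA/dy) * J.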

My test code block is as follows:

# coding: utf-8
import torch

#x1 = torch.tensor(1, requires_grad=True, dtype=torch.float)
#x2 = torch.tensor(2, requires_grad=True, dtype=torch.float)
#x3 = torch.tensor(3, requires_grad=True, dtype=torch.float)
x = torch.tensor([1, 2, 3], requires_grad=True, dtype=torch.float)

y = torch.randn(3)  # placeholder values, overwritten below
#y[0] = x1 * x2 * x3
#y[1] = x1 + x2 + x3
#y[2] = x1 + x2 * x3
#x = torch.tensor([x1, x2, x3],requires_grad=True)

# Build y from the elements of x so that y is connected to x in the graph
y[0] = x[0]*x[1]*x[2]
y[1] = x[0]+x[1]+x[2]
y[2] = x[0]+x[1]*x[2]
y.backward(torch.tensor([1, 1, 1], dtype=torch.float))  # gradients can only be computed for float tensors
#print(x1.grad)
#print(x2.grad)
#print(x3.grad)
print(x.grad)  # tensor([8., 7., 5.])

# Shape check for a 2-D input such as [[2, 3]]
x = torch.tensor([[2, 3]])
print(x.shape)  # torch.Size([1, 2])
