Code notebook: https://github.com/lartpang/mypython/blob/master/2019-09-25%E8%AE%A1%E7%AE%97%E5%B1%80%E9%83%A8%E7%9B%B8%E5%85%B3%E6%80%A7%E7%9F%A9%E9%98%B5/%E8%AE%A1%E7%AE%97%E5%B1%80%E9%83%A8%E7%9B%B8%E5%85%B3%E6%80%A7.ipynb
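Problem statement
Given data of shape N,C,H,W, we want to compute its local correlation, that is, the dot product between any two points inside a region of a given size, for example a 2x2 region.
Try writing the corresponding code.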
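Problem analysis
The task is to compute local correlation, and the statement says to use the dot product between any two points of a local region. So what we actually need is the dot product between any two C-dimensional vectors inside each 2*2 region, which gives a 4*4 relation matrix. If each vector is divided by its norm when taking the dot product, the result is the cosine similarity of the two vectors.
Cosine similarity uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals. Compared with distance metrics, cosine similarity focuses on the difference in direction between two vectors rather than on distance or length.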
https://blog.csdn.net/weixin_38659482/article/details/85045537
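So the "divide by the norm" normalization can be applied before the dot product. In practice this means dividing by the L2 norm computed along the C dimension.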
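As a quick check of this claim, the sketch below normalizes along C with `torch.nn.functional.normalize` and compares the dot product of two unit vectors with `F.cosine_similarity`. The tensor shape and the two positions picked are arbitrary choices for illustration, not part of the original notebook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(1, 2, 3, 4)  # N,C,H,W

# Divide every C-dimensional vector by its L2 norm (taken along dim=1).
x_unit = F.normalize(x, p=2, dim=1)

# Two arbitrary positions, chosen only for illustration.
u, v = x[0, :, 0, 0], x[0, :, 1, 2]
u_unit, v_unit = x_unit[0, :, 0, 0], x_unit[0, :, 1, 2]

dot_of_unit = torch.dot(u_unit, v_unit)     # dot product after normalization
cosine = F.cosine_similarity(u, v, dim=0)   # cosine similarity of the raw vectors
print(dot_of_unit.item(), cosine.item())
```

The two printed values should agree up to floating-point error, which is why the normalization can be done once up front rather than per pair.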
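The most direct idea is a plain loop over every region, but that is impractical because it is too slow. To exploit the GPU's parallelism, the natural choice is matrix operations.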
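For reference, the loop-based approach might look like the sketch below. The function name `local_gram_naive` and the choice of stride 1 with no padding are assumptions made for this illustration; it is meant as a slow baseline to compare against, not as the recommended implementation.

```python
import torch

def local_gram_naive(x, k=2):
    """Loop-based reference: Gram matrix of every k x k window (stride 1, no padding)."""
    n, c, h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    out = torch.empty(n, out_h * out_w, k * k, k * k)
    for b in range(n):
        for i in range(out_h):
            for j in range(out_w):
                # Collect the k*k C-dimensional vectors of this window as columns.
                patch = x[b, :, i:i + k, j:j + k].reshape(c, k * k)  # C, k*k
                out[b, i * out_w + j] = patch.t() @ patch            # k*k, k*k
    return out

x = torch.rand(1, 2, 3, 4)
print(local_gram_naive(x).shape)  # torch.Size([1, 6, 4, 4])
```

The three nested Python loops are what makes this slow on large inputs: every window is a separate tiny matrix product.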
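For the N,C,H,W data we need dot products. For tensors with three or more dimensions, torch.matmul can be used; the multiplication then takes place over the last two dimensions, and the leading dimensions are treated as batch dimensions.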
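A small sketch of this batched behaviour, with shapes chosen to mirror the N, NumOfRegions, 4, 4 result discussed in this post (the tensors `p` and `q` are placeholders):

```python
import torch

p = torch.rand(1, 6, 4, 2)  # batch dims (1, 6), matrices of shape 4x2
q = torch.rand(1, 6, 2, 4)  # batch dims (1, 6), matrices of shape 2x4

r = torch.matmul(p, q)      # one 4x4 product per batch entry
print(r.shape)              # torch.Size([1, 6, 4, 4])

# Each batch entry is an ordinary 2-D matrix product.
print(torch.allclose(r[0, 3], p[0, 3] @ q[0, 3]))
```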
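We can anticipate that the final result is a tensor of size N * NumOfRegions * 4 * 4, where NumOfRegions is the total number of regions processed. In PyTorch, the only method I know of that gathers the data of each region without any extra operations is torch.nn.Unfold, so it is used here.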
Implementation
For matrix multiplication, the simplest way to think is in terms of dimension matching.
import torch
import torch.nn as nn
a = torch.rand(1, 2, 3, 4)
b = torch.rand(1, 2, 3, 4)
print("a=>\n", a)
print("b=>\n", b)
a=>
tensor([[[[0.4818, 0.9888, 0.8039, 0.7089],
[0.7667, 0.2273, 0.9956, 0.4739],
[0.9515, 0.1896, 0.7928, 0.0173]],
[[0.1723, 0.8767, 0.4832, 0.6515],
[0.9487, 0.6301, 0.5711, 0.7781],
[0.2017, 0.9220, 0.2793, 0.2675]]]])
b=>
tensor([[[[0.1417, 0.3510, 0.1170, 0.1698],
[0.4311, 0.1535, 0.6087, 0.6646],
[0.1880, 0.4103, 0.0289, 0.1094]],
[[0.3398, 0.8751, 0.8299, 0.3514],
[0.0333, 0.2831, 0.8086, 0.0514],
[0.3168, 0.2895, 0.5107, 0.4949]]]])
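Two tensors are defined here. They are unrelated to each other, and the computations that follow on them are independent too; the second one is only there to show a bit more.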
unfold_func = nn.Unfold(2, 1, 0, 1)  # kernel_size=2, dilation=1, padding=0, stride=1
unfold_a = unfold_func(a)
print("unfold_a=>\n", unfold_a)
unfold_b = unfold_func(b)
print("unfold_b=>\n", unfold_b)
unfold_a=>
tensor([[[0.4818, 0.9888, 0.8039, 0.7667, 0.2273, 0.9956],
[0.9888, 0.8039, 0.7089, 0.2273, 0.9956, 0.4739],
[0.7667, 0.2273, 0.9956, 0.9515, 0.1896, 0.7928],
[0.2273, 0.9956, 0.4739, 0.1896, 0.7928, 0.0173],
[0.1723, 0.8767, 0.4832, 0.9487, 0.6301, 0.5711],
[0.8767, 0.4832, 0.6515, 0.6301, 0.5711, 0.7781],
[0.9487, 0.6301, 0.5711, 0.2017, 0.9220, 0.2793],
[0.6301, 0.5711, 0.7781, 0.9220, 0.2793, 0.2675]]])
unfold_b=>
tensor([[[0.1417, 0.3510, 0.1170, 0.4311, 0.1535, 0.6087],
[0.3510, 0.1170, 0.1698, 0.1535, 0.6087, 0.6646],
[0.4311, 0.1535, 0.6087, 0.1880, 0.4103, 0.0289],
[0.1535, 0.6087, 0.6646, 0.4103, 0.0289, 0.1094],
[0.3398, 0.8751, 0.8299, 0.0333, 0.2831, 0.8086],
[0.8751, 0.8299, 0.3514, 0.2831, 0.8086, 0.0514],
[0.0333, 0.2831, 0.8086, 0.3168, 0.2895, 0.5107],
[0.2831, 0.8086, 0.0514, 0.2895, 0.5107, 0.4949]]])
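After the unfold operation, the outer brackets drop from four levels to three, which means the shape has gone from N,C,H,W to N,C*2*2,L, where L is the number of sliding-window positions. With stride 1 and no padding, as here, L = (H-1)*(W-1) = 6; it would be H/2*W/2 only if the stride were equal to the window size.
Along the L dimension, the window positions are laid out in a single row in row-major order as the window slides.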
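The number of windows and their row-major ordering can be checked with a short sketch. The helper `num_windows` is a hypothetical name; its formula follows the output-size expression given in the `torch.nn.Unfold` documentation, applied to one spatial axis.

```python
import torch
import torch.nn as nn

def num_windows(size, kernel_size, dilation=1, padding=0, stride=1):
    """Number of window positions along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

h, w = 3, 4
x = torch.rand(1, 2, h, w)
out_h, out_w = num_windows(h, 2), num_windows(w, 2)
n_regions = out_h * out_w

unfolded = nn.Unfold(kernel_size=2, dilation=1, padding=0, stride=1)(x)
print(n_regions, unfolded.shape)  # 6 torch.Size([1, 8, 6])

# Window (i, j) ends up in column i * out_w + j (row-major order).
i, j = 1, 2
window = x[0, :, i:i + 2, j:j + 2].reshape(-1)  # C*2*2 values, channel first
print(torch.equal(unfolded[0, :, i * out_w + j], window))
```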
unfold_a_reshape = unfold_a.transpose(1, 2).view(1, (3-1)*(4-1), 2, 4) # N,H'W',C,2*2
print("unfold_a_reshape=>\n", unfold_a_reshape)
unfold_b_reshape = unfold_b.transpose(1, 2).view(1, (3-1)*(4-1), 2, 4)
print("unfold_b_reshape=>\n", unfold_b_reshape)
unfold_a_reshape=>
tensor([[[[0.4818, 0.9888, 0.7667, 0.2273],
[0.1723, 0.8767, 0.9487, 0.6301]],
[[0.9888, 0.8039, 0.2273, 0.9956],
[0.8767, 0.4832, 0.6301, 0.5711]],
[[0.8039, 0.7089, 0.9956, 0.4739],
[0.4832, 0.6515, 0.5711, 0.7781]],
[[0.7667, 0.2273, 0.9515, 0.1896],
[0.9487, 0.6301, 0.2017, 0.9220]],
[[0.2273, 0.9956, 0.1896, 0.7928],
[0.6301, 0.5711, 0.9220, 0.2793]],
[[0.9956, 0.4739, 0.7928, 0.0173],
[0.5711, 0.7781, 0.2793, 0.2675]]]])
unfold_b_reshape=>
tensor([[[[0.1417, 0.3510, 0.4311, 0.1535],
[0.3398, 0.8751, 0.0333, 0.2831]],
[[0.3510, 0.1170, 0.1535, 0.6087],
[0.8751, 0.8299, 0.2831, 0.8086]],
[[0.1170, 0.1698, 0.6087, 0.6646],
[0.8299, 0.3514, 0.8086, 0.0514]],
[[0.4311, 0.1535, 0.1880, 0.4103],
[0.0333, 0.2831, 0.3168, 0.2895]],
[[0.1535, 0.6087, 0.4103, 0.0289],
[0.2831, 0.8086, 0.2895, 0.5107]],
[[0.6087, 0.6646, 0.0289, 0.1094],
[0.8086, 0.0514, 0.5107, 0.4949]]]])
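The shape is adjusted here, guided by the idea of dimension matching, so that the matrix multiplication that follows can build the matrix describing the relation between any two points inside a region.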
mm_unfold_a = torch.matmul(unfold_a_reshape.transpose(2, 3), unfold_a_reshape) # N,H'W',2*2,2*2
print("mm_unfold_a=>\n", mm_unfold_a)
mm_unfold_b = torch.matmul(unfold_b_reshape.transpose(2, 3), unfold_b_reshape)
print("mm_unfold_b=>\n", mm_unfold_b)
mm_unfold_a=>
tensor([[[[0.2619, 0.6275, 0.5329, 0.2181],
[0.6275, 1.7462, 1.5898, 0.7771],
[0.5329, 1.5898, 1.4878, 0.7720],
[0.2181, 0.7771, 0.7720, 0.4487]],
[[1.7462, 1.2184, 0.7771, 1.4851],
[1.2184, 0.8796, 0.4871, 1.0763],
[0.7771, 0.4871, 0.4487, 0.5862],
[1.4851, 1.0763, 0.5862, 1.3174]],
[[0.8796, 0.8847, 1.0763, 0.7569],
[0.8847, 0.9270, 1.0779, 0.8429],
[1.0763, 1.0779, 1.3174, 0.9163],
[0.7569, 0.8429, 0.9163, 0.8301]],
[[1.4878, 0.7720, 0.9209, 1.0200],
[0.7720, 0.4487, 0.3433, 0.6240],
[0.9209, 0.3433, 0.9459, 0.3664],
[1.0200, 0.6240, 0.3664, 0.8860]],
[[0.4487, 0.5862, 0.6240, 0.3562],
[0.5862, 1.3174, 0.7153, 0.9488],
[0.6240, 0.7153, 0.8860, 0.4078],
[0.3562, 0.9488, 0.4078, 0.7065]],
[[1.3174, 0.9163, 0.9488, 0.1700],
[0.9163, 0.8301, 0.5930, 0.2164],
[0.9488, 0.5930, 0.7065, 0.0884],
[0.1700, 0.2164, 0.0884, 0.0719]]]])
mm_unfold_b=>
tensor([[[[0.1355, 0.3471, 0.0724, 0.1180],
[0.3471, 0.8891, 0.1805, 0.3017],
[0.0724, 0.1805, 0.1869, 0.0756],
[0.1180, 0.3017, 0.0756, 0.1037]],
[[0.8891, 0.7674, 0.3017, 0.9213],
[0.7674, 0.7025, 0.2530, 0.7424],
[0.3017, 0.2530, 0.1037, 0.3224],
[0.9213, 0.7424, 0.3224, 1.0244]],
[[0.7025, 0.3115, 0.7424, 0.1204],
[0.3115, 0.1523, 0.3875, 0.1309],
[0.7424, 0.3875, 1.0244, 0.4461],
[0.1204, 0.1309, 0.4461, 0.4443]],
[[0.1869, 0.0756, 0.0916, 0.1865],
[0.0756, 0.1037, 0.1186, 0.1450],
[0.0916, 0.1186, 0.1357, 0.1689],
[0.1865, 0.1450, 0.1689, 0.2522]],
[[0.1037, 0.3224, 0.1450, 0.1490],
[0.3224, 1.0244, 0.4839, 0.4306],
[0.1450, 0.4839, 0.2522, 0.1597],
[0.1490, 0.4306, 0.1597, 0.2616]],
[[1.0244, 0.4461, 0.4306, 0.4668],
[0.4461, 0.4443, 0.0455, 0.0982],
[0.4306, 0.0455, 0.2616, 0.2559],
[0.4668, 0.0982, 0.2559, 0.2569]]]])
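The multiplication is carried out here, and the result is exactly the relation matrix for each region. Its size is N, NumOfRegion, 2*2, 2*2. (The norm is not computed here; in practice the vectors should be divided by their norms.)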
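A minimal sketch of the normalized variant, assuming the normalization is done along C with `torch.nn.functional.normalize` before unfolding. The variable names are illustrative, and `reshape` with `-1` is used in place of the hard-coded region count.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
a = torch.rand(1, 2, 3, 4)  # N,C,H,W

# Unit-normalize along C first, then reuse the same unfold/reshape/matmul steps.
a_unit = F.normalize(a, p=2, dim=1)
patches = nn.Unfold(kernel_size=2)(a_unit)              # N, C*2*2, L
patches = patches.transpose(1, 2).reshape(1, -1, 2, 4)  # N, L, C, 2*2
cos_a = torch.matmul(patches.transpose(2, 3), patches)  # N, L, 2*2, 2*2

print(cos_a.shape)
# Self-similarity sits on the diagonal of every 4x4 matrix.
print(torch.diagonal(cos_a, dim1=-2, dim2=-1))
```

Every diagonal entry should be close to 1, since each vector has cosine similarity 1 with itself. Note that `F.normalize` leaves an all-zero vector at zero (it clamps the norm with a small `eps`), so such positions would give 0 rather than 1.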
a_ = a[0, :2, :2, :2]
b_ = b[0, :2, :2, :2]
print(a_.shape, b_.shape)
a_ = a_.reshape(1, 2, 2*2) # N,C,2*2
b_ = b_.reshape(1, 2, 2*2)
print("torch.matmul(a_.t, a_)=>\n", torch.matmul(a_.transpose(1, 2), a_))
print("torch.matmul(b_.t, b_)=>\n", torch.matmul(b_.transpose(1, 2), b_))
print(torch.matmul(a_.transpose(1, 2), a_)[0] == mm_unfold_a[0, 0])
print(torch.matmul(b_.transpose(1, 2), b_)[0] == mm_unfold_b[0, 0])
torch.Size([2, 2, 2]) torch.Size([2, 2, 2])
torch.matmul(a_.t, a_)=>
tensor([[[0.2619, 0.6275, 0.5329, 0.2181],
[0.6275, 1.7462, 1.5898, 0.7771],
[0.5329, 1.5898, 1.4878, 0.7720],
[0.2181, 0.7771, 0.7720, 0.4487]]])
torch.matmul(b_.t, b_)=>
tensor([[[0.1355, 0.3471, 0.0724, 0.1180],
[0.3471, 0.8891, 0.1805, 0.3017],
[0.0724, 0.1805, 0.1869, 0.0756],
[0.1180, 0.3017, 0.0756, 0.1037]]])
tensor([[True, True, True, True],
[True, True, True, True],
[True, True, True, True],
[True, True, True, True]])
tensor([[True, True, True, True],
[True, True, True, True],
[True, True, True, True],
[True, True, True, True]])
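This shows that unfold, reshape (view), and matmul together compute the local correlation matrix (local here meaning the kernel_size of the sliding window) for data of shape N,C,H,W, and that they do so quickly compared with the naive sliding-window loop.
As for writing the code, this confirms an idea: simply following the principle of matching matrix dimensions is enough to write out the local relation matrix directly: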
N,C,H,W --(unfold, Ws*Ws)-->
N,C*Ws*Ws,H/Ws*W/Ws --(transpose)-->
N,H/Ws*W/Ws,C*Ws*Ws --(view)-->
N,H/Ws*W/Ws,C,Ws*Ws --(matmul with its own transpose)-->
N,H/Ws*W/Ws,Ws*Ws,Ws*Ws
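Here H/Ws*W/Ws is the number of blocks. Using plain division corresponds to the case where the window size divides the height and width exactly, the stride equals the window size, and there is no padding.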
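A sketch of that non-overlapping case, assuming `stride=ws` is passed to `nn.Unfold` and an input whose height and width are multiples of `ws` (the shape 1,3,4,6 is an arbitrary example):

```python
import torch
import torch.nn as nn

ws = 2
x = torch.rand(1, 3, 4, 6)  # H and W are both multiples of ws
n, c, h, w = x.shape

blocks = nn.Unfold(kernel_size=ws, stride=ws)(x)  # N, C*ws*ws, (H/ws)*(W/ws)
blocks = blocks.transpose(1, 2).reshape(n, (h // ws) * (w // ws), c, ws * ws)
gram = torch.matmul(blocks.transpose(2, 3), blocks)  # N, (H/ws)*(W/ws), ws*ws, ws*ws
print(gram.shape)  # torch.Size([1, 6, 4, 4])
```

With stride equal to the window size, each pixel belongs to exactly one block, so the block count really is (H/ws)*(W/ws).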
As the code given earlier shows, this value has to be adjusted when the stride is 1.
unfold_func = nn.Unfold(2, 1, 0, 1)  # kernel_size=2, dilation=1, padding=0, stride=1
...
unfold_a_reshape = unfold_a.transpose(1, 2).view(1, (3-1)*(4-1), 2, 4)  # (H-2+1)*(W-2+1) = 6 windows
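One way to avoid adjusting the region count by hand is to let `reshape` infer it with `-1`. The sketch below wraps the steps in a hypothetical helper `local_gram` (not part of the original notebook), assuming an integer `kernel_size` and using `torch.nn.functional.unfold`, the functional form of `nn.Unfold`.

```python
import torch
import torch.nn.functional as F

def local_gram(x, kernel_size=2, stride=1, padding=0, dilation=1):
    """Gram matrix of every kernel_size x kernel_size window of an N,C,H,W tensor."""
    n, c = x.shape[:2]
    cols = F.unfold(x, kernel_size, dilation=dilation, padding=padding, stride=stride)
    # cols: N, C*k*k, L. Let reshape infer L (-1) instead of hard-coding it.
    cols = cols.transpose(1, 2).reshape(n, -1, c, kernel_size * kernel_size)
    return torch.matmul(cols.transpose(2, 3), cols)  # N, L, k*k, k*k

a = torch.rand(1, 2, 3, 4)
print(local_gram(a).shape)                                 # torch.Size([1, 6, 4, 4])
print(local_gram(torch.rand(1, 2, 4, 4), stride=2).shape)  # torch.Size([1, 4, 4, 4])
```

The same function then covers both the stride-1 case used in this post and the non-overlapping case where the stride equals the window size.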