Non-local Neural Networks
PDF: https://arxiv.org/pdf/1711.07971.pdf
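PyTorch code: https://github.com/shanglianlm0525/PyTorch-Networks
A Non-local Neural Network is somewhat similar to Non-Local Means, the non-local averaging filter used for denoising. An ordinary filter uses a small kernel, typically 3×3, that slides across the whole image and only processes the information inside that 3×3 neighborhood. Non-Local Means instead draws on a much larger search range and takes a weighted average over it.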
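To make the weighted-average idea concrete, here is a minimal sketch of Non-Local Means on a 1-D signal. It is a toy version under stated assumptions, not the reference algorithm: the helper name `nl_means_1d`, the parameters `patch_radius` and `h`, and the choice to search the whole signal instead of a bounded window are all illustrative.

```python
import torch
import torch.nn.functional as F


def nl_means_1d(signal, patch_radius=1, h=0.5):
    """Toy non-local means: each sample becomes a weighted mean of ALL samples.

    Hypothetical helper for illustration; weights depend on how similar the
    small patches around two samples are, not on how close the samples are.
    """
    n = signal.numel()
    padded = F.pad(signal.view(1, 1, n), (patch_radius, patch_radius), mode="replicate")
    # patches[i] is the neighborhood centered on sample i: [n, 2 * patch_radius + 1]
    patches = padded.reshape(-1).unfold(0, 2 * patch_radius + 1, 1)
    # Squared patch distance between every pair of positions: [n, n]
    dist2 = (patches[:, None, :] - patches[None, :, :]).pow(2).sum(-1)
    # Similar patches get large weights; each row sums to 1
    weights = torch.softmax(-dist2 / (h * h), dim=1)
    return weights @ signal


if __name__ == "__main__":
    torch.manual_seed(0)
    clean = torch.cat([torch.zeros(32), torch.ones(32)])
    noisy = clean + 0.1 * torch.randn(64)
    denoised = nl_means_1d(noisy)
    # Mean absolute error before and after filtering
    print((noisy - clean).abs().mean().item(), (denoised - clean).abs().mean().item())
```

The point of the sketch is the weight matrix: it has one entry per pair of positions, which is the same structure the non-local operation uses on feature maps.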
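1 Overview
- Non-local operations capture long-range dependencies directly by computing the interaction between any two positions, instead of being restricted to neighboring points. This is comparable to building a convolution kernel as large as the feature map itself, so more information can be retained.
- A non-local block can be used as a component, combined with other network architectures, and applied to other vision tasks.
- Non-local blocks give notable gains on video classification.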
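The long-range claim in the list above can be checked with a small experiment. The sketch below is illustrative only: `gaussian_non_local` is a hypothetical, parameter-free stand-in for a non-local operation, and the all-zero feature map is chosen so the numbers are easy to verify by hand.

```python
import torch
import torch.nn as nn


def gaussian_non_local(x):
    """Parameter-free non-local op: y_i = sum_j softmax_j(x_i . x_j) * x_j."""
    b, c, h, w = x.shape
    flat = x.reshape(b, c, h * w)                               # [B, C, HW]
    attn = torch.softmax(flat.transpose(1, 2) @ flat, dim=-1)   # [B, HW, HW]
    y = attn @ flat.transpose(1, 2)                             # [B, HW, C]
    return y.transpose(1, 2).reshape(b, c, h, w)


conv = nn.Conv2d(4, 4, kernel_size=3, padding=1, bias=False)

x = torch.zeros(1, 4, 16, 16)
x_far = x.clone()
x_far[:, :, 15, 15] = 5.0  # change only the bottom-right position

with torch.no_grad():
    # How much does the output at the top-left position (0, 0) change?
    conv_delta = (conv(x_far) - conv(x))[0, :, 0, 0].abs().max().item()
    nl_delta = (gaussian_non_local(x_far) - gaussian_non_local(x))[0, :, 0, 0].abs().max().item()

print(conv_delta)  # the 3x3 window at (0, 0) never sees position (15, 15)
print(nl_delta)    # the non-local output at (0, 0) averages over all 256 positions
```

Here the 3×3 convolution leaves the top-left output unchanged, while the non-local output shifts by 5/256, because the query at (0, 0) is all zeros and therefore weights every position equally.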
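2 Non-local operation
Following the notation of the paper, the non-local operation can be written as
$$y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$$
where i is the index of an output position, j enumerates all positions, and C(x) is a normalization factor.
g is a linear transformation, $g(x_j) = W_g x_j$.
f is the function that computes the similarity between i and j. The paper lists four concrete implementations:
Gaussian: $f(x_i, x_j) = e^{x_i^T x_j}$, with $C(x) = \sum_{\forall j} f(x_i, x_j)$
Embedded Gaussian: $f(x_i, x_j) = e^{\theta(x_i)^T \phi(x_j)}$, with $C(x) = \sum_{\forall j} f(x_i, x_j)$
Dot product: $f(x_i, x_j) = \theta(x_i)^T \phi(x_j)$, with $C(x) = N$, the number of positions
Concatenation: $f(x_i, x_j) = \mathrm{ReLU}(w_f^T [\theta(x_i), \phi(x_j)])$, with $C(x) = N$
Putting it together, for the embedded Gaussian version the operation becomes
$$y = \mathrm{softmax}(x^T W_\theta^T W_\phi x)\, g(x)$$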
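The generic form y_i = (1 / C(x)) Σ_j f(x_i, x_j) g(x_j) and the four choices of f can be sketched in one small module. This is an illustration under assumptions, not the code from the linked repository: the class name `NonLocalOp`, the use of `nn.Linear` on flattened `[B, N, C]` features in place of 1×1 convolutions, and the halved embedding width are all choices made for this example.

```python
import torch
import torch.nn as nn


class NonLocalOp(nn.Module):
    """y_i = (1 / C(x)) * sum_j f(x_i, x_j) g(x_j) on features of shape [B, N, C]."""

    MODES = ("gaussian", "embedded_gaussian", "dot_product", "concatenation")

    def __init__(self, channels, mode="embedded_gaussian"):
        super().__init__()
        assert mode in self.MODES
        self.mode = mode
        inter = max(channels // 2, 1)          # assumed bottleneck width
        self.g = nn.Linear(channels, inter, bias=False)
        if mode != "gaussian":                 # plain Gaussian compares raw features
            self.theta = nn.Linear(channels, inter, bias=False)
            self.phi = nn.Linear(channels, inter, bias=False)
        if mode == "concatenation":
            self.w_f = nn.Linear(2 * inter, 1, bias=False)

    def pairwise_weights(self, x):
        """Return f(x_i, x_j) / C(x) as a [B, N, N] tensor."""
        n = x.size(1)
        if self.mode == "gaussian":
            # exp(x_i . x_j) normalized by its sum over j is a softmax over j
            return torch.softmax(x @ x.transpose(1, 2), dim=-1)
        theta, phi = self.theta(x), self.phi(x)
        if self.mode == "embedded_gaussian":
            return torch.softmax(theta @ phi.transpose(1, 2), dim=-1)
        if self.mode == "dot_product":
            return theta @ phi.transpose(1, 2) / n      # C(x) = N
        # Concatenation: ReLU(w_f^T [theta_i, phi_j]) with C(x) = N
        pairs = torch.cat(
            [theta.unsqueeze(2).expand(-1, -1, n, -1),
             phi.unsqueeze(1).expand(-1, n, -1, -1)],
            dim=-1,
        )                                               # [B, N, N, 2 * inter]
        return torch.relu(self.w_f(pairs).squeeze(-1)) / n

    def forward(self, x):
        return self.pairwise_weights(x) @ self.g(x)     # [B, N, inter]


if __name__ == "__main__":
    feats = torch.randn(2, 10, 8)
    for mode in NonLocalOp.MODES:
        print(mode, NonLocalOp(8, mode)(feats).shape)
```

Only `pairwise_weights` differs between the variants; the weighted sum over g(x_j) is shared, which matches the structure of the formula.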
3 Non-local block
3-1 Schematic diagram
3-2 Detailed diagram
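For reference, the paper wraps the non-local operation in a residual connection. In its notation, with y_i the output of the non-local operation at position i and W_z a learned projection back to the input channel count, the block computes:

```latex
z_i = W_z \, y_i + x_i
```

Because of the `+ x_i` term, a block whose W_z is initialized to zero starts out as an identity mapping, which the paper gives as the reason the block can be inserted into a pretrained network without changing its initial behavior.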
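4 Ablations
- a Adding non-local blocks improves on the baseline, but the different similarity functions perform about the same.
- b Adding a non-local block at different stages of the network improves performance in every case, but the gain is small on the smaller feature maps.
- c The more non-local blocks are added, the larger the improvement, at the cost of more computation.
- d Applying the non-local operation jointly in space and time gives the best results.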
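The computation trade-off behind items c and d can be estimated by counting how many pairs (i, j) one block has to evaluate. The sketch below is back-of-the-envelope arithmetic only: the function name `pairwise_entries`, the feature-map sizes, and the `kv_pool` parameter are hypothetical. `kv_pool` loosely mirrors the idea of subsampling the positions j with spatial pooling before computing the pairwise function.

```python
def pairwise_entries(t, h, w, spacetime=True, kv_pool=1):
    """Number of f(x_i, x_j) evaluations for one block on a [T, H, W] feature map.

    kv_pool is a hypothetical spatial pooling factor applied to the j positions only.
    """
    queries_per_frame = h * w
    keys_per_frame = (h // kv_pool) * (w // kv_pool)
    if spacetime:
        # Every position in the clip attends to every (pooled) position in the clip
        return (t * queries_per_frame) * (t * keys_per_frame)
    # Space-only: each frame attends only within itself
    return t * queries_per_frame * keys_per_frame


if __name__ == "__main__":
    # Illustrative sizes, not taken from the paper's tables
    for name, (t, h, w) in {"larger map": (4, 28, 28), "smaller map": (4, 14, 14)}.items():
        print(
            name,
            "space-only:", pairwise_entries(t, h, w, spacetime=False),
            "space-time:", pairwise_entries(t, h, w),
            "space-time, pooled j:", pairwise_entries(t, h, w, kv_pool=2),
        )
```

The count grows quadratically with the number of positions, so a space-time block over T frames costs T times as many pairs as T independent spatial blocks, and halving the spatial resolution of the j positions cuts the count to a quarter.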
PyTorch code:
import torch
import torch.nn as nn


class NonLocalBlock(nn.Module):
    def __init__(self, channel):
        super(NonLocalBlock, self).__init__()
        # The embeddings use half of the input channels (bottleneck)
        self.inter_channel = channel // 2
        self.conv_phi = nn.Conv2d(in_channels=channel, out_channels=self.inter_channel, kernel_size=1, stride=1, padding=0, bias=False)
        self.conv_theta = nn.Conv2d(in_channels=channel, out_channels=self.inter_channel, kernel_size=1, stride=1, padding=0, bias=False)
        self.conv_g = nn.Conv2d(in_channels=channel, out_channels=self.inter_channel, kernel_size=1, stride=1, padding=0, bias=False)
        # Normalize each row i over all positions j (the last dimension)
        self.softmax = nn.Softmax(dim=-1)
        # Project back to the input channel count
        self.conv_mask = nn.Conv2d(in_channels=self.inter_channel, out_channels=channel, kernel_size=1, stride=1, padding=0, bias=False)

    def forward(self, x):
        # [N, C, H, W]
        b, c, h, w = x.size()
        # [N, C/2, H * W]; the convolutions output inter_channel channels, not c
        x_phi = self.conv_phi(x).view(b, self.inter_channel, -1)
        # [N, H * W, C/2]
        x_theta = self.conv_theta(x).view(b, self.inter_channel, -1).permute(0, 2, 1).contiguous()
        x_g = self.conv_g(x).view(b, self.inter_channel, -1).permute(0, 2, 1).contiguous()
        # [N, H * W, H * W]
        mul_theta_phi = torch.matmul(x_theta, x_phi)
        mul_theta_phi = self.softmax(mul_theta_phi)
        # [N, H * W, C/2]
        mul_theta_phi_g = torch.matmul(mul_theta_phi, x_g)
        # [N, C/2, H, W]
        mul_theta_phi_g = mul_theta_phi_g.permute(0, 2, 1).contiguous().view(b, self.inter_channel, h, w)
        # [N, C, H, W]
        mask = self.conv_mask(mul_theta_phi_g)
        # Residual connection
        out = mask + x
        return out


if __name__ == '__main__':
    model = NonLocalBlock(channel=16)
    print(model)

    x = torch.randn(1, 16, 64, 64)
    out = model(x)
    print(out.shape)
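For video input, the same embedded Gaussian block can be applied jointly over space and time by flattening the T, H and W axes into a single set of positions. The following is a minimal sketch under assumptions, not code from the linked repository: the class name `NonLocalBlock3D` is made up for this example, and zero-initializing the output projection is an illustrative choice so that the block starts as an identity mapping.

```python
import torch
import torch.nn as nn


class NonLocalBlock3D(nn.Module):
    """Space-time non-local block for inputs of shape [N, C, T, H, W] (illustrative)."""

    def __init__(self, channel):
        super().__init__()
        self.inter_channel = channel // 2
        self.conv_theta = nn.Conv3d(channel, self.inter_channel, kernel_size=1, bias=False)
        self.conv_phi = nn.Conv3d(channel, self.inter_channel, kernel_size=1, bias=False)
        self.conv_g = nn.Conv3d(channel, self.inter_channel, kernel_size=1, bias=False)
        self.conv_mask = nn.Conv3d(self.inter_channel, channel, kernel_size=1, bias=False)
        # Assumption for this sketch: zero init makes the block an identity at the start
        nn.init.zeros_(self.conv_mask.weight)

    def forward(self, x):
        b, c, t, h, w = x.size()
        theta = self.conv_theta(x).flatten(2).transpose(1, 2)   # [N, THW, C/2]
        phi = self.conv_phi(x).flatten(2)                        # [N, C/2, THW]
        g = self.conv_g(x).flatten(2).transpose(1, 2)            # [N, THW, C/2]
        attn = torch.softmax(theta @ phi, dim=-1)                # [N, THW, THW], rows sum to 1
        y = (attn @ g).transpose(1, 2).reshape(b, self.inter_channel, t, h, w)
        return x + self.conv_mask(y)                             # residual connection


if __name__ == "__main__":
    block = NonLocalBlock3D(channel=8)
    clip = torch.randn(2, 8, 4, 7, 7)
    print(block(clip).shape)
```

The attention matrix here has (T·H·W)² entries per sample, so this variant is best suited to the lower-resolution stages of a network.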