- Key factors that affect the performance of convolutional neural networks:
  - Depth: VGG, ResNet
  - Width: GoogLeNet
  - Cardinality: Xception, ResNeXt
  - Attention: channel attention, spatial attention
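- Attention plays an important role in human perception. A key property of the human visual system is that it does not try to process a whole scene at once. Instead, to better capture visual structure, humans take a series of partial glimpses and selectively focus on the salient parts.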
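CBAM simply applies channel attention and spatial attention in sequence:
- Channel attention: focuses on what feature maps are meaningful. Its fully connected layers are implemented as convolutions with kernel size 1.
- Spatial attention: focuses on where the informative parts are. The feature map is pooled along the channel axis (mean, plus max in the code below).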
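
To illustrate why a 1x1 convolution can stand in for a fully connected layer here, the following minimal sketch compares the two on a pooled tensor of shape (N, C, 1, 1). The sizes `channels` and `reduced` are illustrative assumptions and are not taken from the notes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

channels, reduced = 64, 4  # illustrative sizes (assumption)

# A bias-free linear layer and a bias-free 1x1 convolution sharing the same weights.
linear = nn.Linear(channels, reduced, bias=False)
conv = nn.Conv2d(channels, reduced, kernel_size=1, bias=False)
with torch.no_grad():
    # Conv2d weight has shape (out, in, 1, 1); Linear weight has shape (out, in).
    conv.weight.copy_(linear.weight.view(reduced, channels, 1, 1))

# Same shape as the output of nn.AdaptiveAvgPool2d(1).
pooled = torch.randn(2, channels, 1, 1)

out_conv = conv(pooled)                 # shape (2, reduced, 1, 1)
out_linear = linear(pooled.flatten(1))  # shape (2, reduced)

# Both layers compute the same matrix product, up to floating-point error.
same = torch.allclose(out_conv.flatten(1), out_linear, atol=1e-5)
print(same)
```

Using the convolution avoids reshaping the pooled tensor before and after the MLP, which is presumably why implementations prefer it.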
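The attention maps are multiplied element-wise with the feature map. Broadcasting (copying) happens automatically during the multiplication: channel attention is broadcast along the spatial dimensions, and spatial attention is broadcast along the channel dimension.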
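
The broadcasting described above can be checked with a small shape experiment. This is only a sketch: the tensor sizes are arbitrary, and random tensors stand in for real attention maps.

```python
import torch

x = torch.randn(2, 8, 5, 5)           # feature map: (N, C, H, W)
channel_att = torch.rand(2, 8, 1, 1)  # one weight per channel (stand-in values)
spatial_att = torch.rand(2, 1, 5, 5)  # one weight per spatial location (stand-in values)

# Channel attention is broadcast over H and W.
x_c = channel_att * x
# Spatial attention is broadcast over C.
x_cs = spatial_att * x_c

# Broadcasting gives the same result as explicitly expanding the attention map.
explicit = channel_att.expand_as(x) * x
assert torch.equal(x_c, explicit)

print(x_c.shape, x_cs.shape)
```

Both products keep the shape of the input feature map, so the attention steps can be inserted into a network without changing any downstream layer.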
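import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        # Global average and max pooling; each yields an (N, C, 1, 1) descriptor.
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Shared MLP built from 1x1 convolutions; `ratio` sets the bottleneck width.
        self.fc1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Both pooled descriptors pass through the same MLP.
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        # Per-channel weights in (0, 1), shape (N, C, 1, 1).
        return self.sigmoid(out)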


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        # "Same" padding so the attention map keeps the input's H and W.
        padding = 3 if kernel_size == 7 else 1
        # Two input channels: the channel-wise mean map and max map.
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Pool along the channel axis; each result has shape (N, 1, H, W).
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        # Per-location weights in (0, 1), shape (N, 1, H, W).
        return self.sigmoid(x)
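
To show how the two attention steps might be chained, here is a minimal self-contained sketch. `CBAMSketch` is a hypothetical name that does not come from the reference code, and it re-declares compact versions of both modules so the snippet runs on its own.

```python
import torch
import torch.nn as nn


class CBAMSketch(nn.Module):
    """Hypothetical wrapper: channel attention followed by spatial attention."""

    def __init__(self, channels, ratio=16, kernel_size=7):
        super().__init__()
        # Shared MLP for channel attention, built from 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // ratio, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // ratio, channels, 1, bias=False),
        )
        # Convolution over the stacked [mean, max] maps for spatial attention.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention: (N, C, 1, 1), broadcast over H and W.
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = torch.sigmoid(avg + mx) * x
        # Spatial attention: (N, 1, H, W), broadcast over C.
        avg_s = torch.mean(x, dim=1, keepdim=True)
        max_s = torch.amax(x, dim=1, keepdim=True)
        att = torch.sigmoid(self.conv(torch.cat([avg_s, max_s], dim=1)))
        return att * x


block = CBAMSketch(channels=64)
x = torch.randn(2, 64, 32, 32)
y = block(x)
print(y.shape)
```

The output has the same shape as the input. Because both attention maps lie in (0, 1), the block can only scale activations down, never up.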
Reference code: https://github.com/luuuyi/CBAM.PyTorch/blob/master/model/resnet_cbam.py