CAM Implementation Walkthrough (PyTorch)

I previously wrote a simplified version of this visualization (link to the simplified version); that version did not take the relationships between channels into account. This post walks through the full CAM pipeline.
The next post covers the Grad-CAM implementation.

Flowchart

[Figure: CAM flowchart]

Algorithm Overview

  1. Feed the image you want to visualize into the network and obtain its predicted class
  2. Grab the output feature maps of the last convolutional layer
  3. Use the predicted class to look up the corresponding weights, weight each channel of the feature maps accordingly, and sum them into a single-channel map

An Example

Suppose we feed in an image and the network predicts class 500 (out of 1000 classes), and the extracted feature maps have shape (1, 512, 13, 13). Assume the classification head consists of a 1 x 1 convolution (which counts as part of the classifier here, not as the last convolutional layer) followed by global average pooling. The 1000 classes then have 1000 sets of weights, i.e., there are 1000 different ways to weight the feature maps. Each set of weights attends to different evidence, which is why we need to know which class the image belongs to. Once we know it is class 500, we simply take the 500th set of weights and apply it to the feature maps.
CAM comes with one constraint: it requires a global average pooling layer. If the network instead ends in multiple fully connected layers, CAM no longer applies. Take VGG16: its last convolutional layer is followed by three fully connected layers. Because the feature maps must be flattened before entering them, after three fully connected layers the channel-wise correspondence is lost, and there is no longer a clean way to compute per-channel importance weights. In that situation you need the Grad-CAM algorithm.
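Why does this weighted sum give a localization map? In the notation of the CAM paper (with $A_k(x,y)$ the $k$-th feature map, $w^c_k$ the classifier weight of class $c$ for channel $k$, and $Z$ the number of spatial positions), global average pooling commutes with the weighted sum, so the class score decomposes over locations:

$$s_c = \sum_k w^c_k \cdot \frac{1}{Z} \sum_{x,y} A_k(x,y) = \frac{1}{Z} \sum_{x,y} \sum_k w^c_k A_k(x,y)$$

The inner sum $M_c(x,y) = \sum_k w^c_k A_k(x,y)$ is exactly the class activation map: it measures how much each spatial location contributes to the score of class $c$. This is also why extra fully connected layers break CAM: they mix spatial positions, so the score no longer decomposes location by location.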

Code Walkthrough

First, prepare the image, the labels, and the model.
To download the class labels:
First install axel:
sudo apt-get install axel
Then run the download command:
axel -n 5 https://s3.amazonaws.com/outcome-blog/imagenet/labels.json
Image download:
axel -n 5 http://media.mlive.com/news_impact/photo/9933031-large.jpg
Model downloads:
squeezenet1_1: axel -n 5 https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth
resnet18: axel -n 5 https://download.pytorch.org/models/resnet18-5c106cde.pth
densenet161: axel -n 5 https://download.pytorch.org/models/densenet161-8d451a50.pth

1. Import packages and load the class labels

from PIL import Image
import torch
from torchvision import models, transforms
from torch.autograd import Variable
from torch.nn import functional as F
import numpy as np
import cv2
import json

# load the ImageNet class labels
json_path = './cam/labels.json'
with open(json_path, 'r') as load_f:
    load_json = json.load(load_f)
classes = {int(key): value for (key, value)
           in load_json.items()}

2. Load and preprocess the image

# load a test image from the ImageNet dataset
img_path = './cam/9933031-large.jpg'
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)

# image preprocessing
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize
])

img_pil = Image.open(img_path)
img_tensor = preprocess(img_pil)
img_variable = Variable(img_tensor.unsqueeze(0))
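Note: Variable was merged into Tensor in PyTorch 0.4, so on recent versions the wrapper above is unnecessary; a plain batched tensor works the same way:

img_variable = img_tensor.unsqueeze(0)  # no Variable wrapper needed on PyTorch >= 0.4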

3. Load a pretrained model

# load a pretrained model
model_id = 1
if model_id == 1:
    net = models.squeezenet1_1(pretrained=False)
    pthfile = r'./pretrained/squeezenet1_1-f364aa15.pth'
    net.load_state_dict(torch.load(pthfile))
    finalconv_name = 'features'  # name of the module that outputs the conv feature maps
elif model_id == 2:
    net = models.resnet18(pretrained=False)
    finalconv_name = 'layer4'
elif model_id == 3:
    net = models.densenet161(pretrained=False)
    finalconv_name = 'features'
net.eval()  # set the model to evaluation mode
print(net)

I only downloaded squeezenet1_1; if you want to use one of the other two models, adapt the code along the same lines (see the sketch below).
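For instance, a sketch of loading resnet18 from a local file (the path is my assumption, matching the download command above):

net = models.resnet18(pretrained=False)
pthfile = r'./pretrained/resnet18-5c106cde.pth'  # assumed local path
net.load_state_dict(torch.load(pthfile))
finalconv_name = 'layer4'  # the last conv block of resnet18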
Printing the model gives:

SqueezeNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (3): Fire(
      (squeeze): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (6): Fire(
      (squeeze): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (7): Fire(
      (squeeze): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (8): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (9): Fire(
      (squeeze): Conv2d(256, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (10): Fire(
      (squeeze): Conv2d(384, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (11): Fire(
      (squeeze): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (12): Fire(
      (squeeze): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AdaptiveAvgPool2d(output_size=(1, 1))
  )
)

You can see that feature extraction happens in (features) and classification in (classifier).

4. Capture the feature maps

features_blobs = []     # will hold the captured feature maps

def hook_feature(module, input, output):
    features_blobs.append(output.data.cpu().numpy())

# capture the output of the 'features' module
net._modules.get(finalconv_name).register_forward_hook(hook_feature)

register_forward_hook lets you grab the output of an intermediate layer; see the PyTorch documentation for details.
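A minimal, self-contained illustration of the hook mechanism (the toy module is mine, not part of the CAM code):

import torch
import torch.nn as nn

toy = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
captured = []

def capture(module, input, output):
    captured.append(output.detach())

handle = toy[0].register_forward_hook(capture)  # hook the conv layer
toy(torch.randn(1, 3, 8, 8))                    # the forward pass triggers the hook
print(captured[0].shape)                        # torch.Size([1, 8, 6, 6])
handle.remove()                                 # detach the hook when done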

5. Get the weights

# get the weights
params = list(net.parameters())
print(len(params))		# 52
weight_softmax = np.squeeze(params[-2].data.numpy())	# shape: (1000, 512)

params holds all of the model's parameter tensors, so how do we index the one we need? Look back at the printed model: pooling, dropout, and ReLU layers hold no parameters, and counting the weight and bias of every convolution gives 52 parameter tensors in total. The weights that connect the features module to the class scores are the parameters of (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1)) inside classifier. Since the final AdaptiveAvgPool2d has no parameters, the last two tensors in params are that convolution's weight and bias, so its weight sits at index -2 (and its bias at -1).
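Indexing by position works but is fragile. Since the printout above shows the classifier layout, the same tensor can be fetched by name, which should be equivalent:

fc_weight = net.classifier[1].weight                  # the 1 x 1 conv in classifier
print(fc_weight.shape)                                # torch.Size([1000, 512, 1, 1])
weight_softmax = np.squeeze(fc_weight.data.numpy())   # (1000, 512)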

logit = net(img_variable)				# forward pass through the network
print(logit.shape)						# torch.Size([1, 1000])
print(params[-2].data.numpy().shape)	# 1000 sets of weights: (1000, 512, 1, 1)
print(features_blobs[0].shape)			# the feature maps: (1, 512, 13, 13)

# the output covers 1000 classes; sort the scores and keep the sort indices
h_x = F.softmax(logit, dim=1).data.squeeze()
print(h_x.shape)						# torch.Size([1000])
probs, idx = h_x.sort(0, True)
probs = probs.numpy()					# probabilities in descending order
idx = idx.numpy()						# class indices, highest probability first

# look at the names and probabilities of the top-5 classes
for i in range(0, 5):
    print('{:.3f} -> {}'.format(probs[i], classes[idx[i]]))
'''
0.678 -> mountain bike, all-terrain bike, off-roader
0.088 -> bicycle-built-for-two, tandem bicycle, tandem
0.042 -> unicycle, monocycle
0.038 -> horse cart, horse-cart
0.019 -> lakeside, lakeshore

'''
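As an aside, sorting all 1000 scores just to keep five is a bit wasteful; topk should give the same result directly:

probs, idx = h_x.topk(5)    # top-5 probabilities and their class indices, already sorted
probs, idx = probs.numpy(), idx.numpy()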

6. Define the CAM computation

# define the function that computes the CAMs
def returnCAM(feature_conv, weight_softmax, class_idx):
    # upsample the class activation maps to 256 x 256
    size_upsample = (256, 256)
    bz, nc, h, w = feature_conv.shape
    output_cam = []
    # Apply the class weights to the feature maps:
    #   weight_softmax has shape (1000, 512), feature_conv (1, 512, 13, 13)
    #   weight_softmax[idx] picks a single class's weights, shape (512,)
    #   feature_conv.reshape((nc, h * w)) has shape (512, 169)
    # Looping lets the function handle several class indices at once.
    for idx in class_idx:
        cam = weight_softmax[idx].dot(feature_conv.reshape((nc, h * w)))
        print(cam.shape)		# after the matrix product: (169,), one value per location
        cam = cam.reshape(h, w)	# back to a single-channel map
        # normalize all elements to 0-1
        cam_img = (cam - cam.min()) / (cam.max() - cam.min())
        # then rescale to 0-255
        cam_img = np.uint8(255 * cam_img)
        output_cam.append(cv2.resize(cam_img, size_upsample))
    return output_cam

7. Generate the images

# generate the class activation map for the highest-probability class
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[0]])
# blend the class activation map with the original image
img = cv2.imread(img_path)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.7
cv2.imwrite('CAM0.jpg', result)

I won't go over what cv2.applyColorMap does; it was covered in the previous post.
[Figure: CAM overlay for the highest-probability class]


# generate the class activation map for the fifth-ranked class
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[4]])
# blend the class activation map with the original image
img = cv2.imread(img_path)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.7
cv2.imwrite('CAM1.jpg', result)

[Figure: CAM overlay for the fifth-ranked class]
The difference between the two maps is obvious at a glance.
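Since returnCAM accepts a list of class indices, you can also produce overlays for all five top classes in one call (a sketch; the output filenames are my own choice):

# generate CAMs for the five highest-probability classes in one call
CAMs = returnCAM(features_blobs[0], weight_softmax, idx[:5])
img = cv2.imread(img_path)
height, width, _ = img.shape
for rank, cam in enumerate(CAMs):
    heatmap = cv2.applyColorMap(cv2.resize(cam, (width, height)), cv2.COLORMAP_JET)
    result = heatmap * 0.3 + img * 0.7
    cv2.imwrite('CAM_top{}.jpg'.format(rank), result)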

