Caffe Learning Notes 11 -- Net Surgery

This is the fourth of the official Caffe Notebook Examples; link: http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb

This example shows how to edit Caffe model parameters to fit particular needs; all of the network's data, diffs, and parameters are exposed through pycaffe.

Unlike the earlier classification examples, the final output here is not a class-probability vector for the whole image but a classification map that labels each region of the image with a class. In this example, a 451x451 image is used as input and the output is an 8x8 classification map, where each number is the predicted class index of the corresponding region, for example:

[[282 282 281 281 281 281 277 282]
 [281 283 283 281 281 281 281 282]
 [283 283 283 283 283 283 287 282]
 [283 283 283 281 283 283 283 259]
 [283 283 283 283 283 283 283 259]
 [283 283 283 283 283 283 259 259]
 [283 283 283 283 259 259 259 277]
 [335 335 283 259 263 263 263 277]]

Here 282 is tiger cat, 281 is tabby cat, and 283 is Persian cat.

1. Import Python packages and set the path

import numpy as np
import matplotlib.pyplot as plt
#%matplotlib inline
from PIL import Image  # with Pillow; old PIL used `import Image`
caffe_root = '/home/sindyz/caffe-master/'
import sys
sys.path.insert(0, caffe_root+'python')
import caffe
# configure plotting
plt.rcParams['figure.figsize'] = (10, 10)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

model_file = caffe_root + 'examples/net_surgery/conv.prototxt'
image_file = caffe_root + 'examples/images/cat_gray.jpg'


2. Design filters

To show how to load, manipulate, and save parameters, we design our own filters in a simple network that has a single convolution layer. It has two blobs: `data` for the input and `conv` for the convolution output. The `conv` parameters hold the convolution filters' weights and biases.

# Load the net, list its data and params, and filter an example image.
caffe.set_mode_cpu()
net = caffe.Net(model_file, caffe.TEST)
print("blobs {}\nparams {}".format(net.blobs.keys(), net.params.keys()))
# load image and prepare as a single input batch for Caffe
im = np.array(Image.open(image_file))
plt.title("original image")
plt.imshow(im)
plt.axis('off')

im_input = im[np.newaxis, np.newaxis, :, :]
net.blobs['data'].reshape(*im_input.shape)
net.blobs['data'].data[...] = im_input

blobs ['data', 'conv']
params ['conv']


The network definition:

# Simple single-layer network to showcase editing model parameters.
name: "convolution"
input: "data"
input_shape {
  dim: 1
  dim: 1
  dim: 100
  dim: 100
}
layer {
  name: "conv"
  type: "Convolution"
  bottom: "data"
  top: "conv"
  convolution_param {
    num_output: 3
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
The convolution weights are initialized with Gaussian noise and the biases are initialized to zero. Such random filters can produce an effect somewhat like edge detection.

3. Show the filter outputs

# helper to show filter outputs
def show_filters(net):
    net.forward()
    plt.figure()
    filt_min, filt_max = net.blobs['conv'].data.min(), net.blobs['conv'].data.max()
    for i in range(3):
        plt.subplot(1,4,i+2)
        plt.title("filter #{} output".format(i))
        plt.imshow(net.blobs['conv'].data[0, i], vmin=filt_min, vmax=filt_max)
        plt.tight_layout()
        plt.axis('off')

# filter the image with the initial random weights
show_filters(net)
Output:


4. Raising a filter's bias raises its output accordingly

conv0 = net.blobs['conv'].data[0, 0]
print("pre-surgery output mean {:.2f}".format(conv0.mean()))
# set first filter bias to 1
net.params['conv'][1].data[0] = 1.
net.forward()
print("post-surgery output mean {:.2f}".format(conv0.mean()))

pre-surgery output mean -12.93
post-surgery output mean -11.93
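The arithmetic behind this: adding a constant bias b to every element of a feature map shifts its mean by exactly b, which matches the jump from -12.93 to -11.93 after the bias is set to 1. A minimal NumPy sketch (the random map here is just a stand-in for a conv output, not data from the example):

```python
import numpy as np

# Stand-in for one convolution output map; any values work,
# since an elementwise +bias shifts the mean by exactly bias.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((96, 96)).astype(np.float32)

bias = 1.0
shift = float((fmap + bias).mean() - fmap.mean())
print(round(shift, 2))  # -> 1.0
```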

5. Edit the filters

The filters themselves can also be changed: any kernel works, such as a Gaussian blur or the Sobel edge operator. In the edit below, filter 0 becomes a Gaussian blur, and filters 1 and 2 become the horizontal and vertical Sobel operators respectively.

ksize = net.params['conv'][0].data.shape[2:]
# make Gaussian blur
sigma = 1.
y, x = np.mgrid[-ksize[0]//2 + 1:ksize[0]//2 + 1, -ksize[1]//2 + 1:ksize[1]//2 + 1]
g = np.exp(-((x**2 + y**2)/(2.0*sigma**2)))
gaussian = (g / g.sum()).astype(np.float32)
net.params['conv'][0].data[0] = gaussian
# make Sobel operator for edge detection
net.params['conv'][0].data[1:] = 0.
sobel = np.array((-1, -2, -1, 0, 0, 0, 1, 2, 1), dtype=np.float32).reshape((3,3))
net.params['conv'][0].data[1, 0, 1:-1, 1:-1] = sobel  # horizontal
net.params['conv'][0].data[2, 0, 1:-1, 1:-1] = sobel.T  # vertical
show_filters(net)

As the outputs show, image 0 is blurred, image 1 picks out horizontal edges, and image 2 picks out vertical edges.


6. Casting the classifier into a fully convolutional network

Now we transform the "CaffeNet" ImageNet model bundled with Caffe into a fully convolutional network, for efficient, dense inference on large inputs. Instead of a single classification, this model produces a classification map covering the input. In particular, on a 451x451 input an 8x8 classification map gives 64x the output in only about 3x the time. The computation exploits a natural efficiency of convolutional network (convnet) structure by amortizing the computation of overlapping receptive fields.

The only change needed is to convert CaffeNet's InnerProduct matrix-multiplication layers into Convolution layers; the other layer types are agnostic to spatial size: convolution is translation-invariant and activations are elementwise operations. The fc6 inner-product layer is replaced by fc6-conv, which filters the pool5 output with 6x6 filters at stride 1. Back in image space, this gives one classification per 227x227 window of the input at a stride of 32; the output map size follows the receptive-field tiling: output = (input - kernel_size) / stride + 1.
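The output-size formula above can be checked with a quick sketch (plain Python, independent of Caffe; the helper name is mine):

```python
# output = (input - kernel_size) / stride + 1, applied per spatial axis
def output_size(input_size, kernel_size, stride):
    return (input_size - kernel_size) // stride + 1

# The 227x227 classifier slid over a 451x451 input at stride 32:
print(output_size(451, 227, 32))  # -> 8, i.e. the 8x8 classification map
# The toy conv net earlier: 100x100 input, 5x5 kernel, stride 1:
print(output_size(100, 5, 1))     # -> 96
```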

Run:

!diff examples/net_surgery/bvlc_caffenet_full_conv.prototxt models/bvlc_reference_caffenet/deploy.prototxt
Output:

1,2c1
< # Fully convolutional network version of CaffeNet.
< name: "CaffeNetConv"
---
> name: "CaffeNet"
5c4
<   dim: 1
---
>   dim: 10
7,8c6,7
<   dim: 451
<   dim: 451
---
>   dim: 227
>   dim: 227
154,155c153,154     # the changed lines
<   name: "fc6-conv"      # fc6 renamed to fc6-conv
<   type: "Convolution"      # the InnerProduct layer becomes a Convolution layer
---
>   name: "fc6"
>   type: "InnerProduct"
157,158c156,157
<   top: "fc6-conv"
<   convolution_param {    # the new convolution layer needs these parameters
---
>   top: "fc6"
>   inner_product_param {
160d158
<     kernel_size: 6     # pool5's spatial output is 6x6, so the filters are 6x6
166,167c164,165
<   bottom: "fc6-conv"
<   top: "fc6-conv"
---
>   bottom: "fc6"
>   top: "fc6"
172,173c170,171
<   bottom: "fc6-conv"
<   top: "fc6-conv"
---
>   bottom: "fc6"
>   top: "fc6"
179,183c177,181
<   name: "fc7-conv"
<   type: "Convolution"
<   bottom: "fc6-conv"
<   top: "fc7-conv"
<   convolution_param {
---
>   name: "fc7"
>   type: "InnerProduct"
>   bottom: "fc6"
>   top: "fc7"
>   inner_product_param {
185d182
<     kernel_size: 1
191,192c188,189
<   bottom: "fc7-conv"
<   top: "fc7-conv"
---
>   bottom: "fc7"
>   top: "fc7"
197,198c194,195
<   bottom: "fc7-conv"
<   top: "fc7-conv"
---
>   bottom: "fc7"
>   top: "fc7"
204,208c201,205
<   name: "fc8-conv"
<   type: "Convolution"
<   bottom: "fc7-conv"
<   top: "fc8-conv"
<   convolution_param {
---
>   name: "fc8"
>   type: "InnerProduct"
>   bottom: "fc7"
>   top: "fc8"
>   inner_product_param {
210d206
<     kernel_size: 1
216c212
<   bottom: "fc8-conv"
---
>   bottom: "fc8"

As the diff shows, the only structural change needed is to turn the fully connected inner-product classifier layers into convolution layers with 6x6 filters, since the reference model's classifier takes pool5's 6x6 spatial output as the input to fc6-conv. To keep the classification dense, the stride is 1. Note that the layers are renamed so that Caffe does not try to load the old parameters for them when it reads the pretrained model.

# Load the original network to get the fully connected layers' parameters
net = caffe.Net('models/bvlc_reference_caffenet/deploy.prototxt', 
                'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel', 
                caffe.TEST)

params = ['fc6', 'fc7', 'fc8']
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}
for fc in params:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(fc, fc_params[fc][0].shape, fc_params[fc][1].shape))

Output:

fc6 weights are (4096, 9216) dimensional and biases are (4096,) dimensional
fc7 weights are (4096, 4096) dimensional and biases are (4096,) dimensional
fc8 weights are (1000, 4096) dimensional and biases are (1000,) dimensional

Looking at the inner-product parameter shapes: the weights are output x input in size, and the biases are output-sized.

# Load the fully convolutional network to transplant the parameters into
net_full_conv = caffe.Net('examples/net_surgery/bvlc_caffenet_full_conv.prototxt', 
                          'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                          caffe.TEST)
params_full_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']
# conv_params = {name: (weights, biases)}
conv_params = {pr: (net_full_conv.params[pr][0].data, net_full_conv.params[pr][1].data) for pr in params_full_conv}

for conv in params_full_conv:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(conv, conv_params[conv][0].shape, conv_params[conv][1].shape))

Output:

fc6-conv weights are (4096, 256, 6, 6) dimensional and biases are (4096,) dimensional
fc7-conv weights are (4096, 4096, 1, 1) dimensional and biases are (4096,) dimensional
fc8-conv weights are (1000, 4096, 1, 1) dimensional and biases are (1000,) dimensional

The convolution weights are arranged as output x input x height x width. To map the inner-product weights onto the convolution filters, we would reshape each weight row into a channels x height x width filter matrix; but since the values are already laid out this way in memory (row-major order), the two are identical and we can assign them directly.
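This equivalence can be sketched in plain NumPy with toy shapes (4 outputs and a 2x3x3 input here, standing in for fc6's 4096 x (256*6*6) = 4096 x 9216): copying via `.flat` gives exactly the same result as an explicit reshape, because both follow row-major order.

```python
import numpy as np

# Toy stand-ins for the fc weight matrix and the conv filter bank.
out_ch, in_ch, kh, kw = 4, 2, 3, 3
fc_weights = np.arange(out_ch * in_ch * kh * kw, dtype=np.float32).reshape(
    out_ch, in_ch * kh * kw)

# Transplant the fc rows into the conv filters, as done for fc6 -> fc6-conv.
conv_weights = np.zeros((out_ch, in_ch, kh, kw), dtype=np.float32)
conv_weights.flat = fc_weights.flat  # same row-major layout, so a plain copy

# Identical to reshaping each row into a channels x height x width filter.
assert np.array_equal(conv_weights,
                      fc_weights.reshape(out_ch, in_ch, kh, kw))
```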

The biases are identical to those of the inner-product layers.

Do the transplant:

for pr, pr_conv in zip(params, params_full_conv):
    conv_params[pr_conv][0].flat = fc_params[pr][0].flat  # flat unrolls the arrays
    conv_params[pr_conv][1][...] = fc_params[pr][1]

Save the new model:

net_full_conv.save('examples/net_surgery/bvlc_caffenet_full_conv.caffemodel')
Finally, make a classification map from the example cat image and visualize the classification confidence as a probability heatmap. This produces an 8x8 classification map for the 451x451 input:
im = caffe.io.load_image('examples/images/cat.jpg')
transformer = caffe.io.Transformer({'data': net_full_conv.blobs['data'].data.shape})
transformer.set_mean('data', np.load('python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1))
transformer.set_transpose('data', (2,0,1))
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)
# make classification map by forward and print prediction indices at each location
out = net_full_conv.forward_all(data=np.asarray([transformer.preprocess('data', im)]))
print(out['prob'][0].argmax(axis=0))
# show net input and confidence map (probability of the top prediction at each location)
plt.subplot(1, 2, 1)
plt.imshow(transformer.deprocess('data', net_full_conv.blobs['data'].data[0]))
plt.subplot(1, 2, 2)
plt.imshow(out['prob'][0,281])

Output:

[[282 282 281 281 281 281 277 282]
 [281 283 283 281 281 281 281 282]
 [283 283 283 283 283 283 287 282]
 [283 283 283 281 283 283 283 259]
 [283 283 283 283 283 283 283 259]
 [283 283 283 283 283 283 259 259]
 [283 283 283 283 259 259 259 277]
 [335 335 283 259 263 263 263 277]]



The classification results cover various cats: 282 tiger cat, 281 tabby cat, 283 Persian cat, plus foxes and other mammals.

In this way the fully convolutional network can be used to extract dense features over an image, which is often even more useful than the classification map itself.

Note that this model is not perfectly suited to sliding-window detection, since it was trained for whole-image classification. Nevertheless, sliding-window training and finetuning can be done by defining a sliding-window ground truth and loss, so that a loss map is made up of one loss per location; the network can then be solved as usual.



References:

http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb


I'm new to Caffe; comments and discussion are welcome.

