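Deep Learning (6): Getting Started with Caffe

Original article: http://blog.csdn.net/hjimce/article/details/48933813

Author: hjimce

This post walks through the complete Caffe workflow. It is aimed at beginners: after reading it you should have a clear picture of how a project is trained and tested. It is an introductory tutorial, so experienced users can skip it.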

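After Caffe has been compiled, a build directory appears under the Caffe root directory. It contains a tools directory, which in turn contains an executable named caffe:

[Figure: the caffe executable in build/tools]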

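With this executable we can train models; all we need to learn is how to invoke it. This is the simplest way to learn Caffe, because you do not need to understand much of its internals. If you can set the parameters, you can define your own network and call the executable to train it. If you want to change the underlying algorithms rather than just tune parameters, you will have to study Caffe's lower-level function calls, which will be covered in a later post. Now to the main topic: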

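I. Overall workflow

Training a simple network of your own and predicting with it involves the following steps:

1. Prepare the data. Pack the images (.jpg, .png, and so on) together with their labels into a file that Caffe can read directly. How to pack your own data is explained in detail later.

2. Write the network definition file, which has the .prototxt extension. It describes how many layers the network has, how many feature maps each layer produces, the inputs and outputs, and so on. As an example, look at caffe/examples/mnist/lenet_train_test.prototxt, the network definition for handwritten digit recognition. We need to learn how to modify this file for our own needs: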

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"  # data layer
  top: "data"
  top: "label"
  include {
    phase: TRAIN  # used during training
  }
  transform_param {
    # Every input value is multiplied by scale; 0.00390625 = 1/256,
    # so pixel values 0..255 are mapped into [0, 1)
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"  # path to the training data
    batch_size: 64  # 64 images per training iteration (mini-batch)
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST  # used during testing
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"  # path to the test data
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"  # first convolutional layer of the network
  type: "Convolution"  # this layer performs convolution
  bottom: "data"  # its input is the data layer
  top: "conv1"  # its output blob
  param {
    lr_mult: 1  # learning-rate multiplier for the weights
  }
  param {
    lr_mult: 2  # learning-rate multiplier for the biases
  }
  convolution_param {
    num_output: 20  # number of output feature maps
    kernel_size: 5  # convolution kernel size
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"  # pooling layer
  bottom: "conv1"  # its input is the layer named conv1
  top: "pool1"
  pooling_param {
    pool: MAX  # max pooling
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

In the network definition above, the Data layers specify where the training and test data come from and how the images are transformed.
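To make the layer parameters concrete, here is a small sketch that traces the blob sizes through the LeNet definition above. It assumes a 28x28 single-channel MNIST input and no padding. The helper names (`conv_out`, `pool_out`, `lenet_shapes`) are made up for this illustration and are not part of Caffe; the rounding rules (down for convolution, up for pooling) follow how Caffe computes output sizes.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Side length after a convolution (rounded down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride, pad=0):
    """Side length after pooling (rounded up)."""
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1


def lenet_shapes(input_size=28):
    """Return (blob name, channels, side length) for the spatial blobs."""
    shapes = [("data", 1, input_size)]
    side = conv_out(input_size, kernel=5, stride=1)   # conv1, num_output 20
    shapes.append(("conv1", 20, side))
    side = pool_out(side, kernel=2, stride=2)         # pool1
    shapes.append(("pool1", 20, side))
    side = conv_out(side, kernel=5, stride=1)         # conv2, num_output 50
    shapes.append(("conv2", 50, side))
    side = pool_out(side, kernel=2, stride=2)         # pool2
    shapes.append(("pool2", 50, side))
    return shapes


for name, channels, side in lenet_shapes():
    print("%-5s %2d x %2d x %2d" % (name, channels, side, side))

# ip1 sees pool2 flattened into one vector per image
_, channels, side = lenet_shapes()[-1]
ip1_inputs = channels * side * side
print("ip1 inputs per image: %d" % ip1_inputs)
```

Under these assumptions the sizes go 28, 24, 12, 8, 4, so ip1 receives 50 x 4 x 4 = 800 values per image and ip2 produces the 10 class scores.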

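3. Write the solver file. We usually name it solver.prototxt; it also has the .prototxt extension. It holds the parameters used to solve the network: the gradient-descent settings, the number of iterations, and so on. Here is the solver file for the handwritten digit example: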

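# The network definition file written in the previous step
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100  # number of test batches per validation pass
test_interval: 500  # run a validation pass on the test data every 500 iterations
base_lr: 0.01  # base learning rate
momentum: 0.9  # momentum
weight_decay: 0.0005  # weight decay coefficient
lr_policy: "inv"  # learning-rate decay policy
gamma: 0.0001
power: 0.75
display: 100  # print progress every 100 iterations
max_iter: 10000  # maximum number of iterations
snapshot: 5000  # save a snapshot every 5000 iterations
snapshot_prefix: "examples/mnist/lenet"  # path prefix for the saved snapshots
solver_mode: GPU  # train on the GPU or the CPU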

The input to this file is the network definition from the previous step.
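The "inv" policy together with gamma and power determines how the learning rate shrinks over time. Caffe documents it as base_lr * (1 + gamma * iter) ^ (-power). The sketch below evaluates that formula with the solver values above; the function name `inv_learning_rate` is made up for this illustration.

```python
def inv_learning_rate(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under the "inv" policy at a given iteration."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


for it in (0, 500, 5000, 10000):
    print("iter %5d  lr = %.6f" % (it, inv_learning_rate(it)))

# One validation pass covers test_iter batches of the TEST batch_size
test_iter = 100
test_batch_size = 100
images_per_validation_pass = test_iter * test_batch_size
print("images per validation pass: %d" % images_per_validation_pass)
```

The rate starts at 0.01 and has fallen to roughly 0.006 by iteration 10000. With test_iter 100 and a test batch size of 100, each validation pass covers 10000 images, which is the whole MNIST test set.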

4. Once the solver file is written, the CNN is fully specified. Next we pass this file to the caffe executable as an argument to start training. The command is:

./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt  

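That is all it takes; training now starts. The first part of the command, caffe, is the executable that was produced when we compiled Caffe. The --solver argument is the solver file written in step 3. Entering the command above in an Ubuntu terminal starts the training.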

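To recap the chain of files: the caffe executable reads solver.prototxt, which refers to the network definition lenet_train_test.prototxt, which in turn refers to the training image data. So to train your own model you need to prepare three things: the LMDB data (which contains the training data), the network definition lenet_train_test.prototxt, and the solver file solver.prototxt. The file names are arbitrary, but do not change the file extensions. Put all three in the same directory, then run caffe from a terminal to start training.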
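The chain of references described above can be checked before launching a run. This sketch pulls the `net:` path out of a solver file and the `source:` paths out of a network file with regular expressions. It is a simplification that assumes each quoted path sits on its own line; it is not a real protobuf parser, and the function names are invented for this illustration.

```python
import re


def referenced_net(solver_text):
    """Return the path given by the solver's net: field, or None."""
    match = re.search(r'^\s*net\s*:\s*"([^"]+)"', solver_text, re.MULTILINE)
    return match.group(1) if match else None


def referenced_sources(net_text):
    """Return every data path given by a source: field in the network file."""
    return re.findall(r'^\s*source\s*:\s*"([^"]+)"', net_text, re.MULTILINE)


# Small inline samples standing in for the real files
sample_solver = 'net: "examples/mnist/lenet_train_test.prototxt"\nmax_iter: 10000\n'
sample_net = (
    'layer {\n  data_param {\n'
    '    source: "examples/mnist/mnist_train_lmdb"  # training data\n  }\n}\n'
    'layer {\n  data_param {\n'
    '    source: "examples/mnist/mnist_test_lmdb"\n  }\n}\n'
)

print("solver -> %s" % referenced_net(sample_solver))
for path in referenced_sources(sample_net):
    print("net    -> %s" % path)
```

On real files you would read the text with `open(path).read()` and then check each returned path with `os.path.exists`, relative to the directory from which caffe is launched.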

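II. Details

1. Creating the LMDB data

I prefer to feed training images to Caffe in LMDB format. There is another format, LevelDB, which I have not used, so this section covers only LMDB. The scripts in the caffe/examples/imagenet folder help us produce the data Caffe needs quickly.

The script create_imagenet.sh generates LMDB files. We only need to copy it and modify it slightly to pack our training images and label files into LMDB format. The script looks like this: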

#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
EXAMPLE=.                # folder where the generated lmdb data is written
TOOLS=../../build/tools  # Caffe tools directory; normally left unchanged
DATA=.                   # folder holding the label lists produced by a Python script
TRAIN_DATA_ROOT=train/   # folder of training images to process
VAL_DATA_ROOT=val/       # folder of validation images to process
# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=true  # resize the images
if $RESIZE; then
  RESIZE_HEIGHT=256
  RESIZE_WIDTH=256
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi
if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet training data is stored."
  exit 1
fi
if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet validation data is stored."
  exit 1
fi
echo "Creating train lmdb..."
# $DATA/train.txt is the label list for the training set. Keep comments off the
# continued lines: anything after a trailing backslash breaks the command.
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/train_lmdb
echo "Creating val lmdb..."
# $DATA/val.txt is the label list for the validation set
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/val_lmdb
echo "Done."

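We also need to prepare the following four items:

1. A folder train, holding the training images.

2. A folder val, holding the validation images.

3. A file train.txt, which lists the path of each image (relative to the train folder) and its label: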

first_batch/train_female/992.jpg 1
first_batch/train_female/993.jpg 1
first_batch/train_female/994.jpg 1
first_batch/train_female/995.jpg 1
first_batch/train_female/996.jpg 1
first_batch/train_female/997.jpg 1
first_batch/train_female/998.jpg 1
first_batch/train_female/999.jpg 1
first_batch/train_male/1000.jpg 0
first_batch/train_male/1001.jpg 0
first_batch/train_male/1002.jpg 0
first_batch/train_male/1003.jpg 0
first_batch/train_male/1004.jpg 0
first_batch/train_male/1005.jpg 0
first_batch/train_male/1006.jpg 0
first_batch/train_male/1007.jpg 0
first_batch/train_male/1008.jpg 0

In the list above, label 1 means female and label 0 means male.

4. A file val.txt, which likewise lists the image paths and their labels.
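The label lists can be generated rather than typed by hand. The following is a minimal sketch, assuming the images are sorted into class folders such as train_female and train_male as in the listing above. The function names and the folder-to-label mapping are illustrative assumptions, not part of Caffe.

```python
import os


def build_label_list(root, class_to_label, extensions=(".jpg", ".jpeg", ".png")):
    """Return 'relative/path label' lines for the images found below root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        class_name = os.path.basename(dirpath)
        if class_name not in class_to_label:
            continue  # not a class folder, keep descending
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                rel = rel.replace(os.sep, "/")
                lines.append("%s %d" % (rel, class_to_label[class_name]))
    return lines


def write_label_list(lines, out_path):
    """Write one 'path label' entry per line."""
    with open(out_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


# Hypothetical usage for the gender example:
# labels = {"train_female": 1, "train_male": 0}
# write_label_list(build_label_list("train", labels), "train.txt")
```

The paths are written relative to the image root because convert_imageset prepends the root folder given on its command line to each entry.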

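All four items are used by the LMDB script above (the modified create_imagenet.sh). Once they are ready, run the script. When it finishes, it will have created two folders, train_lmdb and val_lmdb.

Each folder contains two files (for LMDB these are data.mdb and lock.mdb).

Afterwards, check the file sizes. If the data file is only a few KB, the training data was most likely not built correctly, unless you really have only a handful of training images.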
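The size check can be scripted. This is a rough sketch: data.mdb is the file LMDB stores its records in, while the function names and the 100 KB threshold are arbitrary assumptions chosen for illustration.

```python
import os


def lmdb_data_size(lmdb_dir):
    """Size in bytes of data.mdb inside lmdb_dir, or None if it is missing."""
    path = os.path.join(lmdb_dir, "data.mdb")
    if not os.path.isfile(path):
        return None
    return os.path.getsize(path)


def looks_too_small(lmdb_dir, min_bytes=100 * 1024):
    """True if the database is missing or suspiciously small."""
    size = lmdb_data_size(lmdb_dir)
    return size is None or size < min_bytes


# Hypothetical usage after running the conversion script:
# for folder in ("train_lmdb", "val_lmdb"):
#     print(folder, lmdb_data_size(folder), looks_too_small(folder))
```

A small file is only a warning sign; the threshold should be adjusted to the number and size of your images.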

III. Training

1. Training directly

#!/usr/bin/env sh
TOOLS=../cafferead/build/tools
# The -gpu option selects which GPU(s) to train on
$TOOLS/caffe train --solver=gender_solver.prototxt -gpu all

The -gpu option selects a GPU by its ID; -gpu all uses all available GPUs for training.

2. Training by fine-tuning

# The -weights option loads the parameters of an existing model
$TOOLS/caffe train --solver=gender_solver.prototxt -weights gender_net.caffemodel

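The -weights option is very useful and is used often. Much of the current CNN literature fine-tunes an existing model, because most of us lack training data. Unlike wealthy companies such as Google and Baidu, which employ many people to label data, small companies usually have little labeled training data. So we generally fine-tune, to get the best accuracy we can from a small dataset. With the -weights option, the parameters of an already trained model are used as the initial values and training continues from there.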
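If training is launched from Python rather than a shell script, the two variants can share one helper. The sketch below only assembles the command lines shown above; `build_train_command` and `run_training` are names invented for this illustration, and the paths are the ones used in this post.

```python
import subprocess


def build_train_command(caffe_bin, solver, gpu=None, weights=None):
    """Assemble the argument list for 'caffe train'."""
    cmd = [caffe_bin, "train", "--solver=%s" % solver]
    if gpu is not None:
        cmd += ["-gpu", str(gpu)]      # a GPU id, or "all"
    if weights is not None:
        cmd += ["-weights", weights]   # start from a trained model
    return cmd


def run_training(cmd):
    """Run the command and return caffe's exit code."""
    return subprocess.call(cmd)


caffe_bin = "../cafferead/build/tools/caffe"
direct = build_train_command(caffe_bin, "gender_solver.prototxt", gpu="all")
finetune = build_train_command(caffe_bin, "gender_solver.prototxt",
                               weights="gender_net.caffemodel")
print(" ".join(direct))
print(" ".join(finetune))
```

Calling `run_training(direct)` or `run_training(finetune)` would start the actual run, provided the caffe binary and the files exist at those paths.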

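IV. Calling the Python interface

After training finishes we have a trained Caffe model, and the next goal is to run predictions and look at the results. Caffe provides a convenient Python interface in the pycaffe module, so we also need to know how to use pycaffe for testing and for inspecting results. Here is an example of prediction with pycaffe: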

# coding=utf-8
import sys

import numpy as np
from matplotlib import pyplot as plt


# RGB and BGR have to be swapped before the image can be displayed
def showimage(im):
    if im.ndim == 3:
        im = im[:, :, ::-1]
    plt.set_cmap('jet')
    plt.imshow(im)
    plt.show()


# Feature visualization; padval sets the brightness of the gaps between tiles
def vis_square(data, padsize=1, padval=0):
    # Normalize to [0, 1] on a copy so the network's own arrays are not modified
    data = data - data.min()
    data = data / data.max()
    # All feature maps of a layer go into one figure, so work out the
    # n x n grid and pad the stack to fill it
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))
    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    showimage(data)


# Path to the Caffe source tree
caffe_root = '../../../caffe/'
sys.path.insert(0, caffe_root + 'python')
import caffe

# Load the mean file
mean_filename = './imagenet_mean.binaryproto'
with open(mean_filename, 'rb') as f:
    proto_data = f.read()
a = caffe.io.caffe_pb2.BlobProto.FromString(proto_data)
mean = caffe.io.blobproto_to_array(a)[0]

# Create the network and load the trained model file
gender_net_pretrained = './caffenet_train_iter_1500.caffemodel'
gender_net_model_file = './deploy_gender.prototxt'
gender_net = caffe.Classifier(gender_net_model_file, gender_net_pretrained,
                              mean=mean,
                              channel_swap=(2, 1, 0),  # swap RGB to BGR
                              raw_scale=255,  # load_image returns values in [0, 1]; scale them to [0, 255]
                              image_dims=(256, 256))  # size the input image is resized to

# Predict the class and visualize the features
gender_list = ['Male', 'Female']
input_image = caffe.io.load_image('1.jpg')  # read the image
prediction_gender = gender_net.predict([input_image])  # predict the gender

# Print the shape of each layer's trained parameters
print('params:')
for k, v in gender_net.params.items():
    # Each layer's params hold two blobs: v[0] is the weights, v[1] the biases
    print('weight:')
    print(k, v[0].data.shape)
    print('b:')
    print(k, v[1].data.shape)

# Visualize the conv1 filters
filters = gender_net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))

# Visualize the conv2 filters (disabled)
# filters = gender_net.params['conv2'][0].data
# vis_square(filters[:48].reshape(48**2, 5, 5))

# Feature maps
print('feature maps:')
for k, v in gender_net.blobs.items():
    print(k, v.data.shape)
    if v.data.ndim != 4:
        continue  # fully connected outputs have no spatial maps to tile
    feat = v.data[0, 0:4]  # up to 4 feature maps of blob k for the first image
    vis_square(feat, padval=1)

# Show the original image and the predicted class
str_gender = gender_list[prediction_gender[0].argmax()]
print(str_gender)
plt.imshow(input_image)
plt.title(str_gender)
plt.show()

The example above shows how to use pycaffe to load a trained model, run prediction, and visualize the features.
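The grid arithmetic inside vis_square is easier to follow with concrete numbers. This sketch reproduces only the size calculation, without NumPy; `tile_layout` is a name invented for this illustration, and the example figures use LeNet's conv1 (20 filters of 5x5) rather than the gender network, whose layer sizes are not given in this post.

```python
import math


def tile_layout(num_maps, height, width, padsize=1):
    """Return (grid side n, blank tiles, image height, image width)."""
    n = int(math.ceil(math.sqrt(num_maps)))   # smallest square grid that fits
    blanks = n * n - num_maps                 # tiles filled with padval
    # Each tile gains padsize rows and columns before being laid out
    return n, blanks, n * (height + padsize), n * (width + padsize)


n, blanks, out_h, out_w = tile_layout(num_maps=20, height=5, width=5)
print("grid %dx%d, %d blank tiles, image %dx%d" % (n, n, blanks, out_h, out_w))
```

For 20 filters of 5x5 this gives a 5x5 grid with 5 blank tiles and a 30x30 output image, matching the two reshape steps in vis_square.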
