Preface
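Faster R-CNN is a deep learning framework for object detection, proposed by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun as a follow-up to Girshick's Fast R-CNN. It is faster and reaches a higher mAP. Its main improvement over Fast R-CNN is in the region proposal stage: it introduces a Region Proposal Network (RPN) to generate proposals, and the RPN shares the full-image convolutional features with the detection network, which makes region proposals nearly cost-free.
For a detailed introduction to Faster R-CNN, see my previous blog post.
The Faster R-CNN code is open source and comes in two versions: a MATLAB version (faster_rcnn) and a Python version (py-faster-rcnn).
I mainly use the Python version here. It is about 10% slower than the MATLAB version at test time because some operations in the Python layers run on the CPU, but the accuracy should be about the same.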
Preparation 1: Building, installing, and testing py-faster-rcnn
Building and installing py-faster-rcnn
Clone the Faster R-CNN repository:
git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
Be sure to include the --recursive flag. The rest of this post assumes the cloned folder is named py-faster-rcnn.
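Build the Cython modules:
cd py-faster-rcnn/lib
make
Build the bundled Caffe and pycaffe:
cd py-faster-rcnn/caffe-fast-rcnn
# Build it the same way you would build Caffe.
# Remember to edit Makefile.config; installing Caffe itself is not covered here.
# Build:
make -j8 && make pycaffe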
Here is my Makefile.config; adjust it to match your own setup:
## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0

# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
# You should not set this flag if you will be reading LMDBs with any
# possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1

# Uncomment if you're using OpenCV 3
OPENCV_VERSION := 3

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_20,code=sm_21 \
        -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := mkl
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
# BLAS_LIB := /path/to/your/blas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
MATLAB_DIR := /usr/local/MATLAB/R2016b
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
# PYTHON_INCLUDE := /usr/include/python2.7 \
#         /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
ANACONDA_HOME := $(HOME)/anaconda
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
        $(ANACONDA_HOME)/include/python2.7 \
        $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \
        /usr/include/python2.7

# Uncomment to use Python 3 (default is Python 2)
# PYTHON_LIBRARIES := boost_python3 python3.5m
# PYTHON_INCLUDE := /usr/include/python3.5m \
#         /usr/lib/python3.5/dist-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
# PYTHON_LIB := /usr/lib
PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
# INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
# LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @
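Running the demo
To check that py-faster-rcnn was installed correctly, the authors provide a demo that runs models pre-trained on the PASCAL VOC2007 dataset. The steps are:
Download the pre-trained Faster R-CNN detectors:
cd py-faster-rcnn
./data/scripts/fetch_faster_rcnn_models.sh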
This command downloads a file named faster_rcnn_models.tgz. Extracting it creates the data/faster_rcnn_models folder, which contains two models:
- ZF_faster_rcnn_final.caffemodel: trained with the ZF network
- VGG16_faster_rcnn_final.caffemodel: trained with the VGG16 network
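Before running the demo it can help to confirm that both model files actually landed where expected. The following is a minimal sketch, not part of py-faster-rcnn: the helper name missing_models is hypothetical, and it assumes the script is run from the repository root so that data/faster_rcnn_models resolves correctly.

```python
import os

# File names taken from the list above.
EXPECTED_MODELS = (
    "ZF_faster_rcnn_final.caffemodel",
    "VGG16_faster_rcnn_final.caffemodel",
)


def missing_models(model_dir):
    """Return the expected model files that are not present in model_dir."""
    return [name for name in EXPECTED_MODELS
            if not os.path.isfile(os.path.join(model_dir, name))]


if __name__ == "__main__":
    # Assumption: the current directory is the py-faster-rcnn checkout.
    missing = missing_models(os.path.join("data", "faster_rcnn_models"))
    print("missing models: %s" % (", ".join(missing) if missing else "none"))
```

If a name is reported as missing, re-running the fetch script is the first thing to try.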
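Run the demo:
cd py-faster-rcnn
./tools/demo.py
The demo runs detection on 5 images stored in the data/demo/ folder.
[Figure: detection result for one of the demo images]
If none of the steps above produced an error, py-faster-rcnn has been built and installed successfully.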
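Preparation 2: The Caltech dataset
Because some of the Faster R-CNN experiments were run on PASCAL VOC2007, the first step toward training Faster R-CNN on our own data is to understand the directory layout, images, and annotation format of PASCAL VOC2007. We can then build a VOC2007-style dataset from our own data for Faster R-CNN to train and test on.
Getting the PASCAL VOC2007 dataset
This part is optional. If you need PASCAL VOC2007, you can fetch it with the commands below, but our main reason for downloading it is to inspect its file structure and file contents so that we can build our own dataset to match.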
Create a dedicated place to store datasets, say the $HOME/data folder, and download the PASCAL VOC2007 training, validation, and test sets:
cd $HOME/data
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
After the download finishes, extract the archives:
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
This gives the following file structure:
$HOME/data/VOCdevkit/                     # root folder
$HOME/data/VOCdevkit/VOC2007              # VOC2007 folder
$HOME/data/VOCdevkit/VOC2007/Annotations  # annotation folder
$HOME/data/VOCdevkit/VOC2007/ImageSets    # holds train.txt, test.txt, val.txt, and similar files
$HOME/data/VOCdevkit/VOC2007/JPEGImages   # image folder
# ... plus other folders and subfolders ...
Create a symlink that points to where the VOC dataset is stored:
cd py-faster-rcnn/data
ln -s $HOME/data/VOCdevkit/ VOCdevkit
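Replace $HOME/data/VOCdevkit/ with the path where you keep the VOCdevkit folder. Using symlinks is recommended so that a single copy of the dataset is shared, rather than duplicated in several places where it wastes disk space.
The VOC dataset is now set up.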
Analysis of the PASCAL VOC dataset
The file structure of the PASCAL VOC dataset is as follows:
└── VOCdevkit
    └── VOC2007
        ├── Annotations
        ├── ImageSets
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages
        ├── SegmentationClass
        └── SegmentationObject
Annotations
This folder holds the image annotations (the ground truth) as .xml files, one per image. Here is one of them, annotated:
<annotation>
    <folder>VOC2007</folder> <!-- required: name of the parent folder -->
    <filename>000005.jpg</filename> <!-- required -->
    <source> <!-- optional -->
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>325991873</flickrid>
    </source>
    <owner> <!-- optional -->
        <flickrid>archintent louisville</flickrid>
        <name>?</name>
    </owner>
    <size> <!-- image size -->
        <width>500</width>
        <height>375</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented> <!-- used for segmentation -->
    <object> <!-- object info: class and bounding box; one object element per object in the image -->
        <name>chair</name>
        <pose>Rear</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>263</xmin>
            <ymin>211</ymin>
            <xmax>324</xmax>
            <ymax>339</ymax>
        </bndbox>
    </object>
    <object>
        <name>chair</name>
        <pose>Unspecified</pose>
        <truncated>1</truncated>
        <difficult>1</difficult>
        <bndbox>
            <xmin>5</xmin>
            <ymin>244</ymin>
            <xmax>67</xmax>
            <ymax>374</ymax>
        </bndbox>
    </object>
</annotation>
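To make the structure concrete, here is a small sketch of how such a file could be read with Python's standard xml.etree.ElementTree module. It is an illustration only, not the loader that py-faster-rcnn uses; the function name read_objects and the trimmed SAMPLE string are made up for this example.

```python
import xml.etree.ElementTree as ET

# A trimmed-down annotation with a single object, for illustration only.
SAMPLE = """
<annotation>
  <filename>000005.jpg</filename>
  <object>
    <name>chair</name>
    <difficult>0</difficult>
    <bndbox>
      <xmin>263</xmin><ymin>211</ymin><xmax>324</xmax><ymax>339</ymax>
    </bndbox>
  </object>
</annotation>
"""


def read_objects(xml_text):
    """Return a list of (name, difficult, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.find(tag).text)
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        difficult = int(obj.find("difficult").text)
        objects.append((obj.find("name").text, difficult, coords))
    return objects


if __name__ == "__main__":
    for name, difficult, coords in read_objects(SAMPLE):
        print("%s difficult=%d box=%s" % (name, difficult, coords))
```

Only the name, difficult, and bndbox fields are read here, since those are the ones that carry the class label and the box; the other fields are ignored in this sketch.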
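Note that in the XML annotation files you prepare yourself, the coordinate values in the <xmin> and <ymin> tags of every <object> should be greater than 0 and must never be negative. Otherwise training fails with: AssertionError: assert (boxes[:, 2] >= boxes[:, 0]).all()
So, to train without trouble, check carefully that the top-left coordinates in your XML files are all positive. This bug cost me a day or two; training only went through after I had tracked down every bad coordinate in my annotations.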
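A script can do this check faster than reading the files by hand. The sketch below scans a folder of VOC-style XML files and reports suspicious boxes. It is a hypothetical helper, not part of py-faster-rcnn. A plausible cause of the assertion (an assumption worth verifying against your own checkout) is that the VOC loader subtracts 1 from each coordinate and stores boxes as unsigned integers, so a value of 0 or less wraps around to a huge number. The two extra checks (box outside the image, inverted box) are additions that go beyond what the text above requires.

```python
import glob
import os
import xml.etree.ElementTree as ET


def box_problems(xml_path):
    """Return a list of human-readable problems found in one VOC-style XML file."""
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    problems = []
    for index, obj in enumerate(root.findall("object")):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (int(float(box.find(tag).text))
                                  for tag in ("xmin", "ymin", "xmax", "ymax"))
        if xmin < 1 or ymin < 1:
            problems.append("object %d: top-left (%d, %d) is not positive"
                            % (index, xmin, ymin))
        if xmax > width or ymax > height:
            # Extra check (assumption): coordinates are 1-based pixel indices.
            problems.append("object %d: bottom-right (%d, %d) is outside %dx%d"
                            % (index, xmax, ymax, width, height))
        if xmax < xmin or ymax < ymin:
            problems.append("object %d: box is inverted" % index)
    return problems


def scan_annotations(annotation_dir):
    """Map each problematic XML file name to its list of problems."""
    report = {}
    for path in sorted(glob.glob(os.path.join(annotation_dir, "*.xml"))):
        problems = box_problems(path)
        if problems:
            report[os.path.basename(path)] = problems
    return report
```

Calling scan_annotations on the Annotations folder returns an empty dict when nothing looks wrong; otherwise each key is a file that needs fixing.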
ImageSets
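The ImageSets folder has three subfolders; only Main matters here. The files used in Main are train.txt, val.txt, test.txt, and trainval.txt, each of which lists the names of the files used for training, validation, or testing.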
JPEGImages
The JPEGImages folder simply holds all the input images as .jpg files, so there is nothing more to say about it.
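The three folders have to agree with each other: every name listed in a file under ImageSets/Main should have a matching .xml file in Annotations and a matching .jpg file in JPEGImages. The following sketch checks that; the function names are hypothetical, and it assumes each line of the list file starts with the image name without an extension.

```python
import os


def read_image_ids(list_path):
    """Read one image ID per non-empty line, keeping only the first token."""
    with open(list_path) as handle:
        return [line.split()[0] for line in handle if line.strip()]


def missing_files(voc_root, split):
    """Return (image_id, relative_path) pairs that the split lists but that do not exist."""
    list_path = os.path.join(voc_root, "ImageSets", "Main", split + ".txt")
    missing = []
    for image_id in read_image_ids(list_path):
        for folder, ext in (("Annotations", ".xml"), ("JPEGImages", ".jpg")):
            rel = os.path.join(folder, image_id + ext)
            if not os.path.isfile(os.path.join(voc_root, rel)):
                missing.append((image_id, rel))
    return missing
```

For example, missing_files(voc_root, "train") returns an empty list when train.txt is consistent with the other two folders, where voc_root is the folder that contains Annotations, ImageSets, and JPEGImages.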
Building a VOC-style Caltech dataset
Following the analysis of the PASCAL VOC file structure above, we start by creating a similar file structure:
└── VOCdevkit
    └── VOC2007
        └── Caltech
            ├── Annotations
            ├── ImageSets
            │   └── Main
            └── JPEGImages
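I recommend creating the Caltech folder elsewhere and symlinking it into the VOCdevkit folder, because this makes the later changes to the training code easier.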
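The folder skeleton and the symlink can be created by hand, or with a few lines of Python. This is a minimal sketch under stated assumptions: both function names are hypothetical, the dataset is assumed to live outside the devkit, and the parent folder for the link is left as a parameter because the exact location depends on how you later edit the training code.

```python
import os


def make_skeleton(dataset_root):
    """Create the Annotations, ImageSets/Main, and JPEGImages folders if absent."""
    for sub in ("Annotations", os.path.join("ImageSets", "Main"), "JPEGImages"):
        path = os.path.join(dataset_root, sub)
        if not os.path.isdir(path):
            os.makedirs(path)


def link_into_devkit(dataset_root, devkit_dir, link_name="Caltech"):
    """Symlink dataset_root into devkit_dir under link_name; return the link path."""
    link_path = os.path.join(devkit_dir, link_name)
    if not os.path.lexists(link_path):
        # An absolute target keeps the link valid regardless of the working directory.
        os.symlink(os.path.abspath(dataset_root), link_path)
    return link_path
```

Both functions are safe to call more than once: existing folders and an existing link are left alone.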
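- As for converting the Caltech dataset from .seq files into individual .jpg images, refer to an existing conversion guide.
- The .xml annotation files in Annotations were given to me by a senior labmate. The conversion method mentioned above can also produce them, but its output does not meet the requirements here.
- The train.txt file in ImageSets was derived from the .xml files, and test.txt was built by taking one frame out of every 30 in each seq.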
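The last point can be sketched in code. This is only an illustration of the idea, not the script actually used: the function names are hypothetical, the frame naming scheme in the usage example is invented, and the choice of which frame to keep within each group of 30 (here the first one) is an assumption.

```python
import os


def ids_from_annotations(xml_names):
    """train.txt idea: one ID per annotation file, i.e. the name without .xml."""
    return sorted(os.path.splitext(name)[0]
                  for name in xml_names if name.endswith(".xml"))


def every_nth_frame(frames_by_seq, step=30):
    """test.txt idea: from each sequence keep one frame out of every `step`."""
    picked = []
    for seq in sorted(frames_by_seq):
        # Assumption: frames are already in temporal order; keep frames 0, step, 2*step, ...
        picked.extend(frames_by_seq[seq][::step])
    return picked


def write_id_list(path, ids):
    """Write one ID per line, the format used by the files in ImageSets/Main."""
    with open(path, "w") as handle:
        handle.write("\n".join(ids) + "\n")


if __name__ == "__main__":
    # Hypothetical frame names for a single 65-frame sequence.
    frames = {"V000": ["V000_%04d" % i for i in range(65)]}
    print(every_nth_frame(frames))
```

In practice ids_from_annotations would be fed os.listdir of the Annotations folder, and the two resulting lists would be written to ImageSets/Main/train.txt and ImageSets/Main/test.txt with write_id_list.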