[Deep Learning] Configuring Faster R-CNN + Win10 + tensorflow 1.12.0 + Python 3.6 + CUDA 9.0 + cuDNN 7.3

This post briefly describes how to set up Faster R-CNN on 64-bit Windows 10. The Faster R-CNN source code used here is https://github.com/endernewton/tf-faster-rcnn.

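The rest of the environment:
tensorflow-gpu 1.12.0
Python 3.6 (installed through Anaconda)
CUDA 9.0 and cuDNN 7.3 (on a personal laptop with an NVIDIA 840M GPU)
VS2015 or VS2017 (needed later to compile cpu_nms/gpu_nms and the COCO API. If the full IDE feels too heavy, the build tools alone are enough. This post uses the VS2015 build tools, VC++ 14 BUILD TOOLS, installed with the online installer. Link: https://pan.baidu.com/s/1-VE1AxRvHaX2xDjtm8uAug password: ag1v)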

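Other notes:

When installing tensorflow-gpu, make sure its version matches the installed CUDA and cuDNN versions. The tested combinations are listed on the install/source_windows page of the TensorFlow documentation. A mismatch typically produces the error ImportError: DLL load failed: The specified module could not be found, and the fix is to bring the tensorflow-gpu and CUDA versions into agreement.
When installing CUDA, check that the CUDA version and the driver version are compatible. Otherwise you may see CUDA driver version is insufficient for CUDA runtime version. Here the CUDA runtime version is the CUDA toolkit version (CUDA 10.0, CUDA 9.0 and so on), and the CUDA driver version is the driver version reported by the nvidia-smi command. NVIDIA's website publishes a table of CUDA toolkit versions and the driver versions they require. The CUDA installer also installs a driver. If a driver is already installed and is older than the one bundled with the CUDA installer, the newer driver is installed over it (provided the GPU supports the newer driver). If the installed driver is newer than the bundled one, the driver step is skipped. A driver somewhat newer than the toolkit's default is fine. For example, CUDA 10.0.130 runs with any driver version >= 411.31, because the toolkit's interfaces ultimately rely on the driver and drivers are backward compatible.
If you need a tensorflow build compiled against other versions of Python, CUDA or cuDNN, see https://github.com/fo40225/tensorflow-windows-wheel.
If tensorflow-gpu is installed into a virtual environment created with conda, conda usually downloads the cudatoolkit package and the matching cudnn package by default. As long as the GPU driver supports the cudatoolkit version from conda, activating the environment with conda activate <environment name> lets tensorflow use the cudatoolkit package directly instead of a locally installed CUDA. (Without activating the environment, tensorflow cannot use the cudatoolkit package. A local CUDA installation matching tensorflow is then required, and tensorflow loads the dynamic libraries it needs from the local CUDA installation directory. See the description of activate in the conda guide.) Note that Anaconda's cudatoolkit does not contain all the files of a full CUDA installation. It only contains the shared libraries needed by tensorflow, pytorch, xgboost, Cupy and similar packages, for the sake of portability and to avoid a local CUDA installation (source: https://zhihu.com/question/344950161/answer/819075473).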
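
The driver rule in the notes above can be checked mechanically. The following is a minimal sketch, assuming the installed driver version is read by hand from nvidia-smi. The function names are hypothetical, and the only minimum used is the 411.31 figure for CUDA 10.0.130 quoted above.

```python
def parse_version(text):
    """Turn a dotted version string such as '411.31' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(installed_driver, minimum_driver):
    """Return True if the installed driver is at least the minimum required."""
    return parse_version(installed_driver) >= parse_version(minimum_driver)


# Minimum driver for CUDA 10.0.130, as quoted in the note above.
CUDA_10_0_MIN_DRIVER = "411.31"

print(driver_supports("417.35", CUDA_10_0_MIN_DRIVER))  # a newer driver
print(driver_supports("388.13", CUDA_10_0_MIN_DRIVER))  # an older driver
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "99.0" sorting after "411.31".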

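Some links on the theory and on reading the source code:
Faster R-CNN source code analysis: https://blog.csdn.net/u012457308/article/details/79566195
Understanding Faster RCNN in one article: https://zhuanlan.zhihu.com/p/31426458
A reading of the Tensorflow version of the Faster RCNN code: https://zhuanlan.zhihu.com/p/32230004
Learning Faster R-CNN from an implementation perspective (with a minimal implementation): https://zhuanlan.zhihu.com/p/32404424 (the code discussed is the PyTorch version, but the basic ideas are the same and the explanation is clear.)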

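I. Downloading the source code

I use git-bash + MinGW to emulate a Linux command line. Unless a shell command in this post is explicitly marked as run in a CMD terminal, it is run in git-bash.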

git clone https://github.com/endernewton/tf-faster-rcnn.git

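II. Modifying some files under `lib` and compiling the Cython modules

These changes are taken from https://github.com/endernewton/tf-faster-rcnn/issues/335. Refer to the original issue if anything here is unclear.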
1. In lib/nms/cpu_nms.pyx, line 25, change

cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1]

to:

cdef np.ndarray[np.int64_t, ndim=1] order = scores.argsort()[::-1]

2. In lib/nms/gpu_nms.pyx, line 25, change

cdef np.ndarray[np.int_t, ndim=1] \

to:

cdef np.ndarray[np.int64_t, ndim=1] \

3. In lib/datasets/pascal_voc.py, line 226, change

'{:s}.xml')

to:

'{0:s}.xml')

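4. In lib/datasets/voc_eval.py, line 121, change

with open(cachefile, 'w') as f:

to the following (the cache is written with pickle, which needs a file opened in binary mode under Python 3):

with open(cachefile, 'wb') as f: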

5. Replace lib/setup.py with:

# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------

import os
from os.path import join as pjoin
import numpy as np
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

def find_in_path(name, path):
    "Find a file in a search path"
    # adapted from http://code.activestate.com/recipes/52224-find-a-file-given-a-search-path/
    for dir in path.split(os.pathsep):
        binpath = pjoin(dir, name)
        if os.path.exists(binpath):
            return os.path.abspath(binpath)
    return None

def locate_cuda():
    """Locate the CUDA environment on the system

    Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64'
    and values giving the absolute path to each directory.

    Starts by looking for the CUDAHOME env variable. If not found, everything
    is based on finding 'nvcc' in the PATH.
    """

    # first check if the CUDAHOME env variable is in use
    if 'CUDAHOME' in os.environ:
        home = os.environ['CUDAHOME']
        nvcc = pjoin(home, 'bin', 'nvcc')
    else:
        # otherwise, search the PATH for NVCC
        nvcc = find_in_path('nvcc.exe', os.environ['PATH'])
        if nvcc is None:
            raise EnvironmentError('The nvcc binary could not be '
                'located in your $PATH. Either add it to your path, or set $CUDAHOME')
        home = os.path.dirname(os.path.dirname(nvcc))

    cudaconfig = {'home':home, 'nvcc':nvcc,
                  'include': pjoin(home, 'include'),
                  'lib64': pjoin(home, 'lib', 'x64')}
    for k, v in iter(cudaconfig.items()):
        if not os.path.exists(v):
            raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v))

    return cudaconfig
CUDA = locate_cuda()

# Obtain the numpy include directory.  This logic works across numpy versions.
try:
    numpy_include = np.get_include()
except AttributeError:
    numpy_include = np.get_numpy_include()

def customize_compiler_for_nvcc(self):
    # _msvccompiler.py imports:
    import os
    import shutil
    import stat
    import subprocess
    import winreg

    from distutils.errors import DistutilsExecError, DistutilsPlatformError, \
                                 CompileError, LibError, LinkError
    from distutils.ccompiler import CCompiler, gen_lib_options
    from distutils import log
    from distutils.util import get_platform

    from itertools import count

    super = self.compile
    self.src_extensions.append('.cu')
    # find python include
    import sys
    py_dir = sys.executable.replace('\\', '/').split('/')[:-1]
    py_include = pjoin('/'.join(py_dir), 'include')

    # override method in _msvccompiler.py, starts from line 340
    def compile(sources,
                output_dir=None, macros=None, include_dirs=None, debug=0,
                extra_preargs=None, extra_postargs=None, depends=None):

        if not self.initialized:
            self.initialize()
        compile_info = self._setup_compile(output_dir, macros, include_dirs,
                                           sources, depends, extra_postargs)
        macros, objects, extra_postargs, pp_opts, build = compile_info

        compile_opts = extra_preargs or []
        compile_opts.append('/c')
        if debug:
            compile_opts.extend(self.compile_options_debug)
        else:
            compile_opts.extend(self.compile_options)

        add_cpp_opts = False

        for obj in objects:
            try:
                src, ext = build[obj]
            except KeyError:
                continue
            if debug:
                # pass the full pathname to MSVC in debug mode,
                # this allows the debugger to find the source file
                # without asking the user to browse for it
                src = os.path.abspath(src)

            if ext in self._c_extensions:
                input_opt = "/Tc" + src
            elif ext in self._cpp_extensions:
                input_opt = "/Tp" + src
                add_cpp_opts = True
            elif ext in self._rc_extensions:
                # compile .RC to .RES file
                input_opt = src
                output_opt = "/fo" + obj
                try:
                    self.spawn([self.rc] + pp_opts + [output_opt, input_opt])
                except DistutilsExecError as msg:
                    raise CompileError(msg)
                continue
            elif ext in self._mc_extensions:
                # Compile .MC to .RC file to .RES file.
                #   * '-h dir' specifies the directory for the
                #     generated include file
                #   * '-r dir' specifies the target directory of the
                #     generated RC file and the binary message resource
                #     it includes
                #
                # For now (since there are no options to change this),
                # we use the source-directory for the include file and
                # the build directory for the RC file and message
                # resources. This works at least for win32all.
                h_dir = os.path.dirname(src)
                rc_dir = os.path.dirname(obj)
                try:
                    # first compile .MC to .RC and .H file
                    self.spawn([self.mc, '-h', h_dir, '-r', rc_dir, src])
                    base, _ = os.path.splitext(os.path.basename(src))
                    rc_file = os.path.join(rc_dir, base + '.rc')
                    # then compile .RC to .RES file
                    self.spawn([self.rc, "/fo" + obj, rc_file])

                except DistutilsExecError as msg:
                    raise CompileError(msg)
                continue
            elif ext == '.cu':
                # a trigger for cu compile
                try:
                    # use the cuda for .cu files
                    # self.set_executable('compiler_so', CUDA['nvcc'])
                    # use only a subset of the extra_postargs, which are 1-1 translated
                    # from the extra_compile_args in the Extension class
                    postargs = extra_postargs['nvcc']
                    arg = [CUDA['nvcc']] + sources + ['-odir', pjoin(output_dir, 'nms')]
                    for include_dir in include_dirs:
                        arg.append('-I')
                        arg.append(include_dir)
                    arg += ['-I', py_include]
                    # arg += ['-lib', CUDA['lib64']]
                    arg += ['-Xcompiler', '/EHsc,/W3,/nologo,/Ox,/MD']
                    arg += postargs
                    self.spawn(arg)
                    continue
                except DistutilsExecError as msg:
                    # raise CompileError(msg)
                    continue
            else:
                # how to handle this file?
                raise CompileError("Don't know how to compile {} to {}"
                                   .format(src, obj))

            args = [self.cc] + compile_opts + pp_opts
            if add_cpp_opts:
                args.append('/EHsc')
            args.append(input_opt)
            args.append("/Fo" + obj)
            args.extend(extra_postargs)

            try:
                self.spawn(args)
            except DistutilsExecError as msg:
                raise CompileError(msg)

        return objects

    self.compile = compile

# run the customize_compiler
class custom_build_ext(build_ext):
    def build_extensions(self):
        customize_compiler_for_nvcc(self.compiler)
        build_ext.build_extensions(self)

ext_modules = [
    Extension(
        "utils.cython_bbox",
        ["utils/bbox.pyx"],
        extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]},
        include_dirs = [numpy_include]
    ),
    Extension(
        "nms.cpu_nms",
        ["nms/cpu_nms.pyx"],
        include_dirs = [numpy_include]
    ),
    Extension('nms.gpu_nms',
        ['nms/nms_kernel.cu', 'nms/gpu_nms.pyx'],
        library_dirs=[CUDA['lib64']],
        libraries=['cudart'],
        language='c++',
        # this syntax is specific to this build system
        # we're only going to use certain compiler args with nvcc and not with gcc
        # the implementation of this trick is in customize_compiler_for_nvcc() above
        extra_compile_args={'gcc': ["-Wno-unused-function"],
                            'nvcc': ['-arch=sm_52',
                                     '--ptxas-options=-v',
                                     '-c']},
        include_dirs = [numpy_include, CUDA['include']]
    )
]

setup(
    name='tf_faster_rcnn',
    ext_modules=ext_modules,
    # inject our custom trigger
    cmdclass={'build_ext': custom_build_ext},
)

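Why modify setup.py? This file compiles the Cython modules. The original source targets Linux and therefore compiles with gcc, whereas on Windows MSVC has to be used. As mentioned at the start of this post, installing VC++ 14 BUILD TOOLS takes care of this.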
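
A related platform difference explains the np.int_t to np.int64_t changes in steps 1 and 2. In NumPy releases before 2.0, np.int_ corresponds to the C long type, which is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux, while argsort() returns pointer-sized indices. The sketch below uses only the standard ctypes module to show the two sizes on the current machine. The function name is hypothetical.

```python
import ctypes


def index_type_sizes():
    """Report the byte sizes of C long and of a pointer-sized signed integer."""
    return {
        # 4 on 64-bit Windows, 8 on 64-bit Linux
        "c_long": ctypes.sizeof(ctypes.c_long),
        # 8 on any 64-bit Python build
        "c_ssize_t": ctypes.sizeof(ctypes.c_ssize_t),
    }


sizes = index_type_sizes()
print(sizes)
if sizes["c_long"] != sizes["c_ssize_t"]:
    print("C long is narrower than a pointer-sized index on this platform")
```

When the two sizes differ, a buffer declared with the narrower type cannot receive the 64-bit index array, which is why the declaration is pinned to np.int64_t.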
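Also change the '-arch=sm_52' argument on line 224 of lib/setup.py so that it matches the architecture of your own GPU. For my GPU I changed it to sm_50. See the installation guide in the source repository, and NVIDIA GPU Compilation for more background.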
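
The -arch value is sm_ followed by the GPU's compute capability with the dot removed. A minimal sketch follows. The helper name nvcc_arch_flag is hypothetical, and the compute capability of your own GPU has to be looked up in NVIDIA's documentation.

```python
def nvcc_arch_flag(compute_capability):
    """Build the nvcc -arch flag from a compute capability such as '5.0'."""
    major, minor = compute_capability.split(".")
    return "-arch=sm_{}{}".format(int(major), int(minor))


print(nvcc_arch_flag("5.0"))  # -arch=sm_50, the value used in this post
print(nvcc_arch_flag("5.2"))  # -arch=sm_52, the value in the original setup.py
```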
Once the changes are made, go into the lib directory and run python setup.py build_ext. This fails with:

nms\gpu_nms.cpp(2075): error C2664: "void _nms(int *,int *,const float *,int,int,float,int)": cannot convert parameter 1 from '__pyx_t_5numpy_int32_t *' to 'int *'

To fix it, change lib/nms/gpu_nms.cpp from:

_nms((&(*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_int32_t *, __pyx_pybuffernd_keep.rcbuffer->pybuffer.buf, __pyx_t_10, ...

to:

_nms((&(*__Pyx_BufPtrStrided1d(int *, __pyx_pybuffernd_keep.rcbuffer->pybuffer.buf, __pyx_t_10, ...

Then run python setup.py build_ext again and copy all the files generated under lib/build/lib.win-amd64-3.6 into lib. That completes the compilation.
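
The copy step can also be scripted. This is only a sketch, assuming the build output keeps the same sub-folder layout as lib (for example nms and utils). The function name is hypothetical.

```python
import os
import shutil


def copy_build_output(build_dir, lib_dir):
    """Copy every file under build_dir into lib_dir, keeping sub-folders."""
    copied = []
    for root, _dirs, files in os.walk(build_dir):
        rel = os.path.relpath(root, build_dir)
        target_dir = lib_dir if rel == "." else os.path.join(lib_dir, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(root, name), os.path.join(target_dir, name))
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(copied)


# Example call, run from the lib directory (path taken from the text above):
# copy_build_output(os.path.join("build", "lib.win-amd64-3.6"), ".")
```

The function returns the relative paths it copied, which makes it easy to confirm that the compiled extension files ended up next to their .pyx sources.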

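One small caveat: if you want to recompile after a successful build, or the build fails with strange errors, try deleting the generated .c and .cpp files (lib/nms/cpu_nms.c, lib/nms/gpu_nms.cpp, lib/utils/bbox.c) and the build folder under lib. Otherwise the build either skips those modules without compiling them (this applies only when recompiling after a successful build) or reports errors.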
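
The cleanup described above can be sketched as a small helper. The function name is hypothetical, and the file list is exactly the one named in the caveat.

```python
import os
import shutil

# Generated files named in the caveat above, relative to the lib directory.
GENERATED_FILES = [
    os.path.join("nms", "cpu_nms.c"),
    os.path.join("nms", "gpu_nms.cpp"),
    os.path.join("utils", "bbox.c"),
]


def clean_cython_build(lib_dir):
    """Delete the generated .c/.cpp files and the build folder under lib_dir."""
    removed = []
    for rel_path in GENERATED_FILES:
        path = os.path.join(lib_dir, rel_path)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(rel_path)
    build_dir = os.path.join(lib_dir, "build")
    if os.path.isdir(build_dir):
        shutil.rmtree(build_dir)
        removed.append("build")
    return removed
```

Deleting gpu_nms.cpp also discards the manual int * edit described earlier, so that edit has to be reapplied once Cython regenerates the file.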

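III. Installing the COCO API

The Python COCO API shipped with the source does not support Windows. This part of the tutorial is taken from "Installing the COCO API (pycocotools) on Windows".
There are two ways to install it:
(1) Install with pip
Run the following in a CMD terminal: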

pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI

(2) Install from source

Download the source from https://github.com/philferriere/cocoapi and extract it.
In a terminal, go to the */cocoapi-master/PythonAPI directory and run:

python setup.py build_ext

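Running make also works, but it requires MinGW.
Note: this method also needs Microsoft Visual C++ 14.0 to compile the COCO source.
This post uses method 2. To match the directory layout expected by the Faster R-CNN source, cocoapi-master is renamed to coco and copied into tf-faster-rcnn/data.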

IV. Downloading the data

Download the pascal VOC dataset, which includes the train, val, trainval and test sets.

http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar

Extract the three archives and merge them. The main folder structure is:

├── VOCdevkit
    ├── VOC2007
    |   ├── Annotations
    |   ├── ImageSets
    |   ├── JPEGImages
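    |   ├── SegmentationClass  # used for image segmentation
    |   ├── SegmentationObject  # used for image segmentation
    ├── results  # testing temporarily saves result files under VOC2007/Main in this folder; if the folder is deleted, testing fails with a "No such file or directory" error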

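The remaining folders and .m files can be deleted.
In this post VOCdevkit is renamed to VOCdevkit2007 and copied into tf-faster-rcnn/data, so that the dataset can be used later to train a model.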
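
The extract, merge and rename steps can be sketched with the standard tarfile module. This assumes that each of the three archives unpacks into a top-level VOCdevkit folder and that data_dir does not yet contain a VOCdevkit2007 folder. The function name is hypothetical.

```python
import os
import tarfile


def extract_voc(tar_paths, data_dir, target_name="VOCdevkit2007"):
    """Extract the VOC archives into data_dir and rename VOCdevkit."""
    for tar_path in tar_paths:
        with tarfile.open(tar_path) as archive:
            # Assumption: every archive unpacks into VOCdevkit/, so they merge.
            archive.extractall(data_dir)
    source = os.path.join(data_dir, "VOCdevkit")
    target = os.path.join(data_dir, target_name)
    os.rename(source, target)
    # Testing writes temporary result files here, so make sure the folder exists.
    os.makedirs(os.path.join(target, "results", "VOC2007", "Main"), exist_ok=True)
    return target


# Example call with the three downloaded archives:
# extract_voc(["VOCtrainval_06-Nov-2007.tar",
#              "VOCtest_06-Nov-2007.tar",
#              "VOCdevkit_08-Jun-2007.tar"], "tf-faster-rcnn/data")
```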

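V. Running the demo and testing the pretrained model

(1) Download the pretrained model

The original author provides trained models. Link: pretrained model, extraction code: 4k5c.
The pretrained model can be used through a soft link, as the original author suggests, with the following shell commands: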

NET=res101
TRAIN_IMDB=voc_2007_trainval+voc_2012_trainval
mkdir -p output/${NET}/${TRAIN_IMDB}
cd output/${NET}/${TRAIN_IMDB}
ln -s ../../../data/voc_2007_trainval+voc_2012_trainval ./default
cd ../../..

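Alternatively, after extracting the archive, put the voc_2007_trainval+voc_2012_trainval folder under the data directory, create output/res101/voc_2007_trainval+voc_2012_trainval/default under the tf-faster-rcnn root, and copy the 4 model files from voc_2007_trainval+voc_2012_trainval into default.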
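
The copy-based alternative can be sketched in Python, which avoids relying on soft links under Windows. The function name is hypothetical. It copies every regular file found in the extracted model folder.

```python
import os
import shutil


def install_pretrained(model_dir, repo_root, net="res101",
                       train_imdb="voc_2007_trainval+voc_2012_trainval"):
    """Copy the downloaded model files into output/<net>/<train_imdb>/default."""
    default_dir = os.path.join(repo_root, "output", net, train_imdb, "default")
    os.makedirs(default_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(model_dir)):
        source = os.path.join(model_dir, name)
        if os.path.isfile(source):
            shutil.copy2(source, os.path.join(default_dir, name))
            copied.append(name)
    return default_dir, copied


# Example call from the tf-faster-rcnn root:
# install_pretrained("data/voc_2007_trainval+voc_2012_trainval", ".")
```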

(2) Run the demo

Run demo.py with the following shell commands to test the model on the provided sample images.

# at the tf-faster-rcnn repository root
GPU_ID=0
CUDA_VISIBLE_DEVICES=${GPU_ID} ./tools/demo.py

(3) Test the pretrained model

To run the .sh files on Windows, first delete the word time from line 58 of /experiments/scripts/test_faster_rcnn.sh

  CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/test_net.py \

and from line 67

  CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/test_net.py \

Likewise, so that the training script runs later, delete time from lines 62 and 73 of /experiments/scripts/train_faster_rcnn.sh.
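
The same edit can be applied with a short script instead of by hand. This is a sketch with hypothetical function names. It removes only a standalone time that directly precedes python and keeps the script's original line endings, which matters for .sh files edited on Windows.

```python
import re


def strip_time(line):
    """Remove the standalone 'time' command that precedes 'python' in a line."""
    return re.sub(r"\btime\s+(?=python\b)", "", line)


def strip_time_in_file(path):
    """Rewrite a script in place, preserving its original line endings."""
    with open(path, "r", encoding="utf-8", newline="") as handle:
        lines = handle.readlines()
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.writelines(strip_time(line) for line in lines)


original = "  CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/test_net.py \\"
print(strip_time(original))
```

Calling strip_time_in_file on experiments/scripts/test_faster_rcnn.sh and experiments/scripts/train_faster_rcnn.sh covers all four lines mentioned above.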
Then run the following shell commands to start testing:

# at the tf-faster-rcnn repository root
GPU_ID=0
./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101

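VI. Training a model on the pascal VOC dataset

(1) Download the pretrained models and weights

The source currently supports the VGG16 and Resnet V1 models. Download links: VGG16, extraction code: 2b7y. Resnet101, extraction code: x73i.
Then create an imagenet_weights folder under tf-faster-rcnn/data, put the extracted models there, and rename vgg_16.ckpt to vgg16.ckpt and resnet_v1_101.ckpt to res101.ckpt. This post uses only res101.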

(2) Train the model (and test and evaluate it)

Run the following commands:

./experiments/scripts/train_faster_rcnn.sh [GPU_ID] [DATASET] [NET]
# GPU_ID is the GPU you want to test on
# NET in {vgg16, res50, res101, res152} is the network arch to use
# DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh
# Examples:
./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc res101
./experiments/scripts/train_faster_rcnn.sh 1 coco vgg16

By default, trained models are saved under:

output/[NET]/[DATASET]/default/

Test outputs are saved under:

output/[NET]/[DATASET]/default/[SNAPSHOT]/

Tensorboard information for training and validation is saved under:

tensorboard/[NET]/[DATASET]/default/
tensorboard/[NET]/[DATASET]/default_val/

Notes:
(1) The DATASET in the output paths is not the same as the DATASET in the shell command above. With the pascal_voc dataset, lines 19/20 of train_faster_rcnn.sh read:

case ${DATASET} in
  pascal_voc)
    TRAIN_IMDB="voc_2007_trainval"
    TEST_IMDB="voc_2007_test"

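so the DATASET in the output path is voc_2007_trainval during training and voc_2007_test during testing.
(2) train_faster_rcnn.sh runs test_faster_rcnn.sh at the end to test the model and to run the evaluation that computes mAP. With the pascal_voc dataset, for example, the number of iterations can be adjusted by changing ITERS in the pascal_voc branch of the DATASET case in both train_faster_rcnn.sh and test_faster_rcnn.sh, and other parameters can be set there as needed. (If the goal is only to get the code running and to track down errors, set ITERS in both .sh files to a low value such as 30 to shorten the run.)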
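
The mapping in note (1) can be written out explicitly. The sketch below is illustrative only: the function and dictionary names are hypothetical, and the table holds just the pascal_voc entry shown in the script excerpt above.

```python
# DATASET argument -> (TRAIN_IMDB, TEST_IMDB); only the pascal_voc entry is
# taken from the train_faster_rcnn.sh excerpt above.
IMDB_NAMES = {
    "pascal_voc": ("voc_2007_trainval", "voc_2007_test"),
}


def output_dirs(net, dataset, tag="default"):
    """Return the folders used by a run, written with forward slashes."""
    train_imdb, test_imdb = IMDB_NAMES[dataset]
    return {
        "model": "/".join(["output", net, train_imdb, tag]),
        "test": "/".join(["output", net, test_imdb, tag]),
        "tensorboard": "/".join(["tensorboard", net, train_imdb, tag]),
    }


paths = output_dirs("res101", "pascal_voc")
print(paths["model"])  # output/res101/voc_2007_trainval/default
print(paths["test"])   # output/res101/voc_2007_test/default
```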

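VII. Training on your own project data with the Faster R-CNN framework

Everything so far was about getting Faster R-CNN to run. This part covers how to apply Faster R-CNN to your own data.

(1) Data annotation and format conversion

The source code of many object detection frameworks uses data in pascal VOC format, and Faster R-CNN is no exception. labelImg can be used to annotate the data, and it produces annotations in pascal VOC format directly.
For details, see "Making a PASCAL VOC2007-format dataset for object detection".

Some remarks:

I do not think the image renaming step described there is required. My own images were already named with strings of digits, such as 03150_1.jpg, and training worked without renaming them.
If your images are not in jpg format, either convert them to jpg before building the dataset, or set self._image_ext on line 43 of tf-faster-rcnn/lib/datasets/pascal_voc.py to your image format.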

Following these steps gives a dataset directory of this form:

├── VOC2007
    ├── Annotations
    ├── ImageSets
    |    ├──Main
    |        ├── test.txt
    |        ├── train.txt
    |        ├── trainval.txt
    |        ├── val.txt
    ├── JPEGImages

Replace the VOC2007 folder in tf-faster-rcnn/data/VOCdevkit2007 with this new dataset.
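
The four list files under ImageSets/Main can be generated from the annotation folder. This is a minimal sketch, not the procedure from the referenced article. The function name and the split ratios (80% trainval, of which 75% train) are assumptions, and each list holds one image id per line without a file extension.

```python
import os
import random


def write_image_sets(voc_dir, trainval_ratio=0.8, train_ratio=0.75, seed=0):
    """Split annotated image ids into train/val/trainval/test list files."""
    ann_dir = os.path.join(voc_dir, "Annotations")
    ids = sorted(os.path.splitext(name)[0]
                 for name in os.listdir(ann_dir) if name.endswith(".xml"))
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible

    n_trainval = int(len(ids) * trainval_ratio)
    n_train = int(n_trainval * train_ratio)
    splits = {
        "trainval": ids[:n_trainval],
        "train": ids[:n_train],
        "val": ids[n_train:n_trainval],
        "test": ids[n_trainval:],
    }

    main_dir = os.path.join(voc_dir, "ImageSets", "Main")
    os.makedirs(main_dir, exist_ok=True)
    for split, names in splits.items():
        path = os.path.join(main_dir, split + ".txt")
        with open(path, "w", newline="\n") as handle:
            handle.writelines(name + "\n" for name in sorted(names))
    return {split: len(names) for split, names in splits.items()}
```

By construction train and val are disjoint and together make up trainval, while test holds the remaining ids.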

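(2) Start training

First, change the classes on line 36 of pascal_voc.py in tf-faster-rcnn/lib/datasets to your own. Never delete '__background__'. Replace the original 20 labels after it with your own labels. The number of classes does not need to be changed, because the code computes it from the length of the tuple. Other parameters such as the number of iterations can be changed in the files mentioned earlier, and parameters such as the learning rate are changed in tf-faster-rcnn/lib/model/config.py.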
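
The idea behind the class tuple can be illustrated on its own. The labels below are hypothetical, and the snippet mirrors the principle described above rather than the exact code in pascal_voc.py.

```python
# Hypothetical labels for illustration; replace them with your own.
CLASSES = ('__background__',  # always index 0, must stay first
           'cat', 'dog', 'person')

# The class count is derived from the tuple, so it never has to be edited.
num_classes = len(CLASSES)
class_to_ind = dict(zip(CLASSES, range(num_classes)))

print(num_classes)           # 4
print(class_to_ind['cat'])   # 1
```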
Next, delete the models and the cache produced by earlier training runs. They are under tf-faster-rcnn/output/res101/voc_2007_trainval/default and tf-faster-rcnn/data/cache respectively.
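
Removing the two folders can be sketched as follows. The function name is hypothetical and the default arguments match the res101 / pascal_voc setup used in this post.

```python
import os
import shutil


def remove_stale_dirs(repo_root, net="res101", train_imdb="voc_2007_trainval"):
    """Delete the previous model folder and the dataset cache if they exist."""
    targets = [
        os.path.join(repo_root, "output", net, train_imdb, "default"),
        os.path.join(repo_root, "data", "cache"),
    ]
    removed = []
    for path in targets:
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Example call from the tf-faster-rcnn root:
# remove_stale_dirs(".")
```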
Finally, start training with the same shell command as before:

./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc res101

The rest is up to the GPU.

VIII. References

Training on your own data with python3 + Tensorflow + Faster R-CNN: https://blog.csdn.net/char_QwQ/article/details/80980505
