「Computer Vision」Note on CornerNet

QQ Group: 428014259
http://blog.csdn.net/dgyuanshaofeng/article/details/82048113

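2019-04-13: Read arXiv v2, which is typeset in the IJCV format. [I had previously read arXiv v1 and the ECCV version.]

CornerNet [1] is by Hei Law and Jia Deng of Princeton University; it is a new approach to object detection. Jia Deng is also known for other well-known work, such as Stacked Hourglass Networks (ECCV 2016) in human pose estimation and the ImageNet dataset (CVPR 2009) in image classification.

Authors: Hei Law, Jia Deng
Affiliation: Princeton University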

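0 Abstract

The paper represents an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, which removes the need to design anchor boxes. Anchor boxes are commonly used in detectors such as SSD and Faster R-CNN. The paper also proposes corner pooling to localize corners accurately. On MS COCO, CornerNet reaches 42.1% [revised to 42.2%] AP, outperforming all existing one-stage detectors. (I find this comparison questionable: one-stage detectors emphasize speed over accuracy, so beating them on accuracy alone seems a little unfair.)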

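1 Introduction

Anchor boxes are predefined boxes of various sizes and aspect ratios. One-stage detectors rely on anchor boxes to reach accuracy comparable to two-stage detectors while being more efficient. However, anchor boxes have two drawbacks. First, a large number of them is needed: DSSD uses 40k and RetinaNet uses 100k, which creates a huge imbalance between positive and negative anchor boxes. Second, they introduce many hyperparameters and design choices. Figure 1 shows the CornerNet pipeline. The convolutional network outputs a heatmap for all top-left corners, a heatmap for all bottom-right corners, and an embedding vector for each detected corner.

Figure 1: The CornerNet pipeline

Another component of CornerNet is corner pooling, shown in Figure 2. In short, a corner often lies outside the object, so locating it requires information from elsewhere rather than only the features near the corner itself.

Figure 2: Corner pooling

The authors give two hypothesized reasons why detecting corners works better than detecting bounding boxes. First, locating a box center depends on all four sides of the object, whereas locating a corner depends on only two. Second, corners are a more efficient representation: O(wh) corners can represent O(w^2 * h^2) possible anchor boxes.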
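To make the O(wh) versus O(w^2 * h^2) argument concrete, here is a small counting sketch. The function names are my own, and I assume that a box is any pair of grid positions with the top-left at or before the bottom-right on both axes (so one-pixel boxes count too).

```python
def count_corner_locations(w: int, h: int) -> int:
    """Number of candidate corner positions on a w x h grid: O(wh)."""
    return w * h


def count_boxes(w: int, h: int) -> int:
    """Number of axis-aligned boxes whose corners lie on the grid: O(w^2 h^2).

    A box is a top-left (x1, y1) and a bottom-right (x2, y2) with
    x1 <= x2 and y1 <= y2 (assumption: degenerate boxes are counted too).
    """
    return (w * (w + 1) // 2) * (h * (h + 1) // 2)


def count_boxes_brute_force(w: int, h: int) -> int:
    """Same count by explicit enumeration, as a sanity check for small grids."""
    return sum(
        1
        for x1 in range(w) for x2 in range(x1, w)
        for y1 in range(h) for y2 in range(y1, h)
    )


if __name__ == "__main__":
    for w, h in [(4, 3), (128, 128)]:
        print(w, h, count_corner_locations(w, h), count_boxes(w, h))
```

Even on a tiny 4 x 3 grid there are far more boxes than corner positions, and the gap grows quadratically with the grid area.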

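2 Related Work

The paper reviews the two families of object detectors; among them, DSSD and RON both use hourglass-like networks.
It focuses on the differences between DeNet and this work.
It highlights Associative Embedding [2], the work that inspired this paper. That paper appeared at NIPS 2017 and is worth reading closely, since vision papers at NIPS are still relatively rare.
This work also uses a novel variant of focal loss.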

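3 CornerNet

3.1 Overview

Object detection is cast as detecting a pair of keypoints of the bounding box: the top-left corner and the bottom-right corner. [That is, [min_x, min_y] and [max_x, max_y], except that each corner is represented with a Gaussian kernel, i.e. it covers a radius rather than a single pixel.]
Figure 4 gives an overview of CornerNet. It uses an hourglass network as the backbone. The backbone is followed by two prediction modules: the upper one predicts top-left corners and the lower one predicts bottom-right corners. Each prediction module has its own corner pooling (rightmost part of Figure 4), which pools the backbone features before predicting heatmaps, embeddings, and corner offsets. The authors use only the output features of the backbone here; unlike other object detectors, they do not use features at different scales to predict objects of different sizes.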
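As a rough illustration of the "corner with a radius" idea in the overview, the sketch below renders a box into two ground-truth corner heatmaps with a Gaussian bump around each corner. The helper names are hypothetical, and sigma = radius / 3 is my assumption based on my reading of the paper; the released code may differ in details.

```python
import math


def draw_gaussian(heatmap, cx, cy, radius):
    """Write an unnormalized Gaussian bump centered at (cx, cy) into heatmap."""
    sigma = max(radius, 1) / 3.0  # assumption: sigma is one third of the radius
    h, w = len(heatmap), len(heatmap[0])
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            value = math.exp(-d2 / (2.0 * sigma * sigma))
            # Keep the larger value where bumps of several objects overlap.
            heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


def box_to_corner_heatmaps(box, width, height, radius):
    """box = (min_x, min_y, max_x, max_y) in heatmap coordinates."""
    x1, y1, x2, y2 = box
    top_left = [[0.0] * width for _ in range(height)]
    bottom_right = [[0.0] * width for _ in range(height)]
    draw_gaussian(top_left, x1, y1, radius)
    draw_gaussian(bottom_right, x2, y2, radius)
    return top_left, bottom_right


if __name__ == "__main__":
    tl, br = box_to_corner_heatmaps((2, 1, 6, 5), width=8, height=8, radius=2)
    for row in tl:
        print(" ".join(f"{v:.2f}" for v in row))
```

The exact corner location gets the value 1, and nearby locations get values between 0 and 1, which is what later allows the detection loss to penalize near misses less.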

Figure 4: CornerNet

3.2 Detecting Corners

Variant of focal loss:
$L_{det}=...$
Smooth L1 loss from Fast R-CNN:
$L_{off}=...$
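Since the two formulas are elided in these notes, here is a minimal sketch of how I understand both losses from the paper. The hyperparameters alpha = 2 and beta = 4, the normalization by the number of positives, and all function names are assumptions on my part, not something taken from the released code.

```python
import math


def corner_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """Focal-loss variant over flat lists of predicted scores and targets.

    gt is 1.0 at a true corner and a Gaussian-decayed value in [0, 1)
    elsewhere; the (1 - gt) ** beta factor softens the penalty near a corner.
    """
    num_pos = sum(1 for y in gt if y == 1.0)
    loss = 0.0
    for p, y in zip(pred, gt):
        if y == 1.0:
            loss -= (1.0 - p) ** alpha * math.log(p + eps)
        else:
            loss -= (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p + eps)
    return loss / max(num_pos, 1)


def corner_offset(x, y, stride):
    """Fractional part lost when mapping image coordinates to the heatmap."""
    return x / stride - x // stride, y / stride - y // stride


def smooth_l1(x):
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5


def offset_loss(pred_offsets, gt_offsets):
    """Mean smooth L1 distance between predicted and true (dx, dy) offsets."""
    total = 0.0
    for (px, py), (gx, gy) in zip(pred_offsets, gt_offsets):
        total += smooth_l1(px - gx) + smooth_l1(py - gy)
    return total / max(len(gt_offsets), 1)


if __name__ == "__main__":
    print(corner_focal_loss([0.9, 0.6], [1.0, 0.8]))
    print(corner_offset(10, 7, stride=4))
```

The second call shows why offsets are needed at all: a corner at pixel (10, 7) lands on heatmap cell (2, 1) with stride 4, and the lost fraction has to be predicted to recover a tight box.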

3.3 Grouping Corners

The "pull" loss, which groups the two corners of the same object:
$L_{pull}=...$
The "push" loss, which separates corners of different objects:
$L_{push}=...$
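The two grouping losses are also elided here, so the following is only a sketch of my understanding: each object has a 1-D embedding for its top-left corner and one for its bottom-right corner, the pull term draws both toward their mean, and the push term keeps the means of different objects at least a margin apart. The margin of 1 and the function name are assumptions.

```python
def pull_push_loss(tl_embeds, br_embeds, delta=1.0):
    """Return (pull, push) for per-object 1-D corner embeddings.

    tl_embeds[k] and br_embeds[k] belong to the same object k.
    delta is the push margin (assumed to be 1 here).
    """
    n = len(tl_embeds)
    if n == 0:
        return 0.0, 0.0
    means = [(t + b) / 2.0 for t, b in zip(tl_embeds, br_embeds)]

    # Pull: both corners of an object should sit close to their mean.
    pull = sum(
        (t - m) ** 2 + (b - m) ** 2
        for t, b, m in zip(tl_embeds, br_embeds, means)
    ) / n

    # Push: means of different objects should be at least delta apart.
    push = 0.0
    if n > 1:
        for k in range(n):
            for j in range(n):
                if j != k:
                    push += max(0.0, delta - abs(means[k] - means[j]))
        push /= n * (n - 1)
    return pull, push


if __name__ == "__main__":
    print(pull_push_loss([0.0, 3.0], [0.0, 3.0]))  # well separated objects
    print(pull_push_loss([0.0, 0.5], [1.0, 0.5]))  # objects with equal means
```

At test time only the distance between a top-left and a bottom-right embedding matters, so the absolute embedding values are free to be anything.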

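3.4 Corner Pooling

Figure 6 explains very clearly how corner pooling works, using the top-left corner as an example. For the implementation, read the authors' C++ code closely.
1. Horizontally, scan from right to left, compare adjacent columns, and keep the larger value.
2. Vertically, scan from bottom to top, compare adjacent rows, and keep the larger value.
3. Add the two resulting feature maps element-wise.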
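The three steps above can be sketched in plain Python on nested lists. This is only an illustration of the scan order, not the authors' C++ implementation; the function names are mine, and for simplicity the same feature map can be passed for both directions.

```python
def left_pool(fmap):
    """Step 1: scan each row from right to left, keeping the running maximum."""
    out = [row[:] for row in fmap]
    for y in range(len(out)):
        for x in range(len(out[0]) - 2, -1, -1):
            out[y][x] = max(out[y][x], out[y][x + 1])
    return out


def top_pool(fmap):
    """Step 2: scan each column from bottom to top, keeping the running maximum."""
    out = [row[:] for row in fmap]
    for y in range(len(out) - 2, -1, -1):
        for x in range(len(out[0])):
            out[y][x] = max(out[y][x], out[y + 1][x])
    return out


def top_left_corner_pool(top_feat, left_feat):
    """Step 3: add the two pooled maps element-wise."""
    pooled_top = top_pool(top_feat)
    pooled_left = left_pool(left_feat)
    return [
        [a + b for a, b in zip(row_t, row_l)]
        for row_t, row_l in zip(pooled_top, pooled_left)
    ]


if __name__ == "__main__":
    fmap = [
        [0, 1, 0],
        [2, 0, 0],
        [0, 0, 3],
    ]
    for row in top_left_corner_pool(fmap, fmap):
        print(row)
```

After pooling, the value at a location summarizes the strongest response anywhere to its right and anywhere below it, which is exactly the evidence a top-left corner needs.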

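Figure 6: Corner pooling, using the top-left corner as an example

Figure 5 shows the prediction module. See the open-source code and the paper for the specific modifications.

Figure 5: Prediction module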

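3.5 Hourglass Network

The authors modify the hourglass network [3] substantially.
They use 2 hourglass modules.
1. Stride-2 convolutions replace max pooling.
2. The features are downsampled 5 times, with the channel count increasing along the way: [256, 384, 384, 384, 512].
3. When upsampling, 2 residual modules are applied first, followed by nearest-neighbor upsampling.
4. Each skip connection also contains 2 residual modules.
5. Before the hourglass modules, the image is downsampled by a factor of 4. [This can also be seen as a form of high-resolution representation learning, as in HRNet.]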
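To get a feel for the resolutions involved, the sketch below traces spatial sizes through the 4x stem and the 5 downsampling steps. The kernel sizes and paddings, the 511-pixel input, and the pairing of one channel count per downsampling step are my assumptions for illustration; only the stride-2 convolutions, the 4x reduction, and the channel list come from the notes above.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-size formula for a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_output(image_size: int) -> int:
    """4x reduction before the hourglass modules.

    Assumption: a 7x7 stride-2 convolution followed by a stride-2 block
    with a 3x3 kernel, both with "same"-style padding.
    """
    size = conv_out(image_size, kernel=7, stride=2, padding=3)
    return conv_out(size, kernel=3, stride=2, padding=1)


def hourglass_levels(feature_size: int, channels=(256, 384, 384, 384, 512)):
    """(spatial size, channels) after each of the 5 stride-2 downsamplings."""
    levels = []
    size = feature_size
    for c in channels:
        size = conv_out(size, kernel=3, stride=2, padding=1)
        levels.append((size, c))
    return levels


if __name__ == "__main__":
    feature_size = stem_output(511)  # assumed training resolution
    print(feature_size)
    for size, c in hourglass_levels(feature_size):
        print(size, c)
```

Under these assumptions the bottleneck of each hourglass works on a very small map, which is why the skip connections with their own residual modules matter for localization.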

4 Experiments

4.1 Training Details

Total training loss:
$L=L_{det}+\alpha L_{pull}+\beta L_{push}+\gamma L_{off}$
$\alpha$ =0.1
$\beta$ =0.1
$\gamma$ =1
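As a tiny worked example of the weighting, the sketch below combines four already-computed loss terms with the weights listed above. The function name and the sample loss values are made up for illustration.

```python
def total_loss(l_det, l_pull, l_push, l_off, alpha=0.1, beta=0.1, gamma=1.0):
    """L = L_det + alpha * L_pull + beta * L_push + gamma * L_off."""
    return l_det + alpha * l_pull + beta * l_push + gamma * l_off


if __name__ == "__main__":
    # Hypothetical per-batch values, only to show the relative weighting.
    print(total_loss(l_det=1.0, l_pull=2.0, l_push=3.0, l_off=4.0))
```

With these weights the grouping terms contribute an order of magnitude less than the detection and offset terms at equal raw values.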

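4.2 Testing Details

A $3 \times 3$ max pooling is applied to the corner heatmaps.
Both the original and the flipped image are tested, and the detections are then combined with soft-NMS.
Average inference time is 244 ms per image on a Titan X.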
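The 3 x 3 max pooling acts as a cheap non-maximum suppression on the heatmap: a location survives only if it equals the maximum of its 3 x 3 neighborhood. Here is a minimal pure-Python sketch of that idea; the function name and the threshold parameter are my own, and the real code works on batched tensors.

```python
def heatmap_peaks(heatmap, threshold=0.0):
    """Return (x, y, score) for local maxima of a 2-D heatmap, best first."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v <= threshold:
                continue
            # Maximum over the 3x3 window, clipped at the borders.
            window_max = max(
                heatmap[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            if v >= window_max:
                peaks.append((x, y, v))
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks


if __name__ == "__main__":
    heatmap = [
        [0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.3, 0.0],
        [0.1, 0.3, 0.2, 0.6],
        [0.0, 0.0, 0.1, 0.5],
    ]
    print(heatmap_peaks(heatmap))
```

Only the surviving peaks are then paired by embedding distance, so this step keeps the number of candidate corner pairs manageable.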

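4.3 MS COCO

trainval2014 is used for training.
minival2014 is used for validation.
testdev2017 is used for testing (no public ground truth; results are uploaded to the evaluation server).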

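4.4 Ablation Study

4.4.1 Corner Pooling

Results are shown in Table 1. Corner pooling improves AP by 2.0 points.

Table 1: Effect of corner pooling

4.4.2 Stability of Corner Pooling over Larger Areas

Results are shown in Table 3. Corner pooling yields a gain of 2.8 points.

Table 3: Effect of corner pooling

4.4.3 Reducing the Penalty at Negative Locations

This helps medium and large objects but hurts small objects.
Results are shown in Table 2.

Table 2: Corner radius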

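4.4.4 Hourglass Network

1. Train CornerNet with FPN w/ ResNet-101 as the backbone: initialize the networks from scratch and only use the final output of FPN for predictions.
2. Train an anchor-box-based detector with the hourglass network as the backbone: each hourglass module predicts anchor boxes during its upsampling stage, with intermediate supervision.
Results are shown in Table 4 and indicate that the hourglass + corners combination works best. FPN may perform so poorly because it is trained from scratch.

Table 4: CornerNet relies on the stacked hourglass network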

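4.4.5 Quality of the Bounding Boxes

CornerNet is compared with RetinaNet, Cascade R-CNN, and IoU-Net in Table 5. At the high IoU threshold of 0.9, CornerNet outperforms all of these methods, which suggests it has an advantage when high-quality boxes are required. [In 3D images, however, engineering practice rarely sets such a high IoU threshold; a low IoU threshold is the norm.]

Table 5: CornerNet has an advantage at high IoU thresholds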
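To see how demanding an IoU threshold of 0.9 is, here is a small IoU sketch for axis-aligned boxes. The function name and the (x1, y1, x2, y2) box convention are my own choices for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = (0, 0, 10, 10)
    # Top-left corner shifted by one unit on a 10 x 10 box.
    shifted = (1, 1, 10, 10)
    value = iou(ground_truth, shifted)
    for threshold in (0.5, 0.75, 0.9):
        print(threshold, value >= threshold)
```

Shifting a single corner by one unit on a 10 x 10 box already drops the IoU to about 0.81, so a detection like this counts as correct at thresholds 0.5 and 0.75 but not at 0.9.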

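4.4.6 Error Analysis

CornerNet outputs heatmaps, offsets, and embeddings simultaneously, and all three affect detection performance.
The authors first replace the predicted heatmaps with ground-truth values and evaluate on the validation set. As Table 6 shows, AP rises from 38.4 to 73.1.
They then also replace the predicted offsets with the ground-truth offsets. As Table 6 shows, AP rises further to 86.1.
These two experiments show that there is still a lot of room for improvement.
Figure 9 shows some inaccurate results: boundaries of different objects get merged, and similar embeddings are predicted for different objects.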

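4.5 Comparison with State-of-the-Art Detectors

MS COCO test-dev
The 42.2% mAP is obtained with multi-scale evaluation.
Results are shown in Table 7. At a single scale, CornerNet outperforms RetinaNet800 and RefineDet512, as well as the two-stage Mask R-CNN. At multiple scales, CornerNet outperforms RefineDet512. $AP^{s}$ = 19.1 indicates that CornerNet has no large advantage on small objects.

Table 7: Comparison of CornerNet with other methods

5 Conclusion

Omitted.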

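6 Practice

The PyTorch code released by the authors does run.
The model is close to 800 MB. [Even VGG is only about 500 MB.]
Inference uses 3.1 GB of GPU memory.
Training is estimated to take about a week (170 / 24 = 7.08 days), which nearly brought me to tears.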

    training loss at iteration 5: 434.4440002441406                                 
    training loss at iteration 10: 88.7403335571289                                 
    training loss at iteration 15: 38.28278732299805                                
    training loss at iteration 20: 22.313444137573242                               
    training loss at iteration 25: 10.515382766723633                               
    training loss at iteration 30: 9.795891761779785                                
      0%|                                   | 30/500000 [00:42<170:59:38,  1.23s/it]

6.1 Compiling Corner Pooling Layers

Open setup.py.

from setuptools import setup # import the setup function from setuptools
from torch.utils.cpp_extension import BuildExtension, CppExtension
# import BuildExtension and CppExtension
# torch.utils.cpp_extension also provides CUDAExtension

setup(
    name="cpools",
    ext_modules=[
        CppExtension("top_pool", ["src/top_pool.cpp"]),
        CppExtension("bottom_pool", ["src/bottom_pool.cpp"]),
        CppExtension("left_pool", ["src/left_pool.cpp"]),
        CppExtension("right_pool", ["src/right_pool.cpp"])
    ],
    cmdclass={
        "build_ext": BuildExtension
    }
)

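Read up on setuptools (code and docs). Reference [4] explains some of the parameters.
Read up on torch.utils.cpp_extension.

6.1.1 Reading and Understanding Corner Pooling through top_pool.cpp

The code below has three parts:
1. Forward pass
2. Backward pass
3. Binding C++ to Python with pybind11; see reference [5] for details

The code of top_pool.cpp is as follows: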

#include <torch/torch.h>
// https://github.com/pytorch/pytorch/tree/master/torch/csrc/api/include/torch
#include <vector>

std::vector<at::Tensor> top_pool_forward( // forward pass
    at::Tensor input // input feature map
) {
    // Initialize output
    at::Tensor output = at::zeros_like(input); // output, initialized to zeros

    // Get height
    int64_t height = input.size(2); // input is [n, c, h, w]; take the size of h
                                    // a 3D version would probably need a change here, since the layout becomes [n, c, d, h, w]

    // Copy the last column (note: dim 2 is the height axis, so this is really the last row)
    at::Tensor input_temp  = input.select(2, height - 1);
    // https://pytorch.org/docs/0.4.1/tensors.html#torch.Tensor.select
    // select(dim, index) → Tensor
    // Slices the self tensor along the selected dimension at the given index.
    // This takes the last row, tensor[:, :, height - 1, :]:
    // for an input of shape [1, 3, 3, 5], input_temp has shape [1, 3, 5].
    at::Tensor output_temp = output.select(2, height - 1);
    // the last row of output, tensor[:, :, height - 1, :]
    output_temp.copy_(input_temp);
    // https://pytorch.org/docs/0.4.1/tensors.html#torch.Tensor.copy_
    // copy_(src, non_blocking=False) → Tensor
    // Copies the elements from src into self tensor and returns self.

    at::Tensor max_temp;
    for (int64_t ind = 1; ind < height; ++ind) {
        input_temp  = input.select(2, height - ind - 1);
        // from bottom to top
        // input row, starting from the second-to-last row
        output_temp = output.select(2, height - ind);
        // from bottom to top
        // output row just below, starting from the last row
        max_temp    = output.select(2, height - ind - 1);
        // from bottom to top
        // output row to be written, starting from the second-to-last row
        at::max_out(max_temp, input_temp, output_temp);
        // https://pytorch.org/cppdocs/api/function_namespaceat_1ab40751edb25d9ed68d4baa5047564a89.html
        // static Tensor &at::max_out(Tensor &out, const Tensor &self, const Tensor &other)
        // max_temp is a view of a row of output that has not been written yet, so it starts at 0;
        // input_temp is the current row of input;
        // output_temp is the row below in output, which already holds the running maximum;
        // the element-wise max of input_temp and output_temp is written into output through max_temp, as illustrated in Fig. 6
    }

    return { 
        output // the "top" feature map for the top-left corner; the "left" feature map computed by left_pool.cpp is still needed
    };
}

std::vector<at::Tensor> top_pool_backward( // backward pass
    at::Tensor input, // input of the forward pass
    at::Tensor grad_output // gradient with respect to the forward output
) {
    auto output = at::zeros_like(input); // gradient with respect to the input, initialized to zeros

    int32_t batch   = input.size(0); // batch size
    int32_t channel = input.size(1); // number of channels
    int32_t height  = input.size(2); // height
    int32_t width   = input.size(3); // width

    auto max_val = at::zeros(torch::CUDA(at::kFloat), {batch, channel, width}); // running maximum, float
    auto max_ind = at::zeros(torch::CUDA(at::kLong),  {batch, channel, width}); // row index of the running maximum, long

    auto input_temp = input.select(2, height - 1); // select the last row
    max_val.copy_(input_temp);
    // Copies the elements from src into self tensor and returns self.

    max_ind.fill_(height - 1);
    // Fills self tensor with the specified value, here the index of the last row

    auto output_temp      = output.select(2, height - 1); // select the last row
    auto grad_output_temp = grad_output.select(2, height - 1); // select the last row
    output_temp.copy_(grad_output_temp);
    // Copies the elements from src into self tensor and returns self.

    auto un_max_ind = max_ind.unsqueeze(2);
    // Returns a new tensor with a dimension of size one inserted at the specified position.
    auto gt_mask    = at::zeros(torch::CUDA(at::kByte),  {batch, channel, width});
    auto max_temp   = at::zeros(torch::CUDA(at::kFloat), {batch, channel, width});
    for (int32_t ind = 1; ind < height; ++ind) {
        input_temp = input.select(2, height - ind - 1);
        // start from the second-to-last row
        at::gt_out(gt_mask, input_temp, max_val); 
        // https://pytorch.org/docs/stable/torch.html#torch.gt
        // input_temp is the current row (the second-to-last row on the first iteration)
        // max_val is the running maximum of the rows below (the last row on the first iteration)

        at::masked_select_out(max_temp, input_temp, gt_mask);
        // https://pytorch.org/docs/stable/torch.html#torch.masked_select
        // Returns a new 1-D tensor which indexes the input tensor according to the binary mask mask which is a ByteTensor.
        // take the values of input_temp selected by gt_mask and store them in max_temp
        max_val.masked_scatter_(gt_mask, max_temp);
        // https://pytorch.org/docs/stable/tensors.html#torch.Tensor.masked_scatter_
        // Copies elements from source into self tensor at positions where the mask is one. 
        max_ind.masked_fill_(gt_mask, height - ind - 1);
        // https://pytorch.org/docs/stable/tensors.html#torch.Tensor.masked_fill_
        // Fills elements of self tensor with value where mask is one.
        // where gt_mask is 1, store height - ind - 1, i.e. the index of the current row

        grad_output_temp = grad_output.select(2, height - ind - 1).unsqueeze(2);
        output.scatter_add_(2, un_max_ind, grad_output_temp);
        // https://pytorch.org/docs/stable/tensors.html#torch.Tensor.scatter_add_
        // scatter_add_(dim, index, other) → Tensor
        // This is the key step to understand: the gradient of the current row is added
        // to the row that holds the running maximum (un_max_ind is a view of max_ind).
    }

    return {
        output
    };
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { // in Python, the module is later imported under the name TORCH_EXTENSION_NAME expands to
    m.def(
        "forward", &top_pool_forward, "Top Pool Forward", // TORCH_EXTENSION_NAME.forward
        py::call_guard<py::gil_scoped_release>()
    );
    m.def(
        "backward", &top_pool_backward, "Top Pool Backward", // TORCH_EXTENSION_NAME.backward
        py::call_guard<py::gil_scoped_release>()
    );
}
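The backward pass is the harder half to read, so here is how I would restate it in plain Python for a single 2-D map (no batch or channel axes). This is my own simplified sketch of the logic in top_pool_backward, not a drop-in replacement: for every column it scans from bottom to top, tracks the row of the running maximum, and sends the gradient of each output row to that row.

```python
def top_pool_backward(inp, grad_out):
    """Gradient of top pooling with respect to its input.

    inp and grad_out are lists of rows with identical shapes. The forward
    output at row y is the maximum of inp over rows y..h-1, so the gradient
    of that output flows to the row where this maximum was found.
    """
    h, w = len(inp), len(inp[0])
    grad_in = [[0.0] * w for _ in range(h)]
    for x in range(w):
        # Start at the last row: it is trivially its own maximum.
        max_val = inp[h - 1][x]
        max_ind = h - 1
        grad_in[h - 1][x] += grad_out[h - 1][x]
        for y in range(h - 2, -1, -1):
            # Strict comparison, mirroring gt_out: ties keep the lower row.
            if inp[y][x] > max_val:
                max_val = inp[y][x]
                max_ind = y
            # Mirrors scatter_add_: accumulate at the argmax row.
            grad_in[max_ind][x] += grad_out[y][x]
    return grad_in


if __name__ == "__main__":
    inp = [
        [1, 5],
        [3, 0],
        [2, 4],
    ]
    grad_out = [[1.0, 1.0] for _ in range(3)]
    for row in top_pool_backward(inp, grad_out):
        print(row)
```

In the first column the value 3 in the middle row is the maximum for both the top and the middle output rows, so it collects two units of gradient, while the top input value 1 never wins and receives none.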

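Understanding scatter_add_: the example below (1) creates a random matrix x, (2) creates an index matrix y, and (3) runs scatter_add_ along dim 0 on a matrix of ones, so that self[y[i][j]][j] += x[i][j]. Column 0 of y is (0, 2): x[0][0] = 0.7404 is added to row 0 and x[1][0] = 0.7953 to row 2, giving 1.7404 and 1.7953, while row 1 stays at 1.0000. Column 1 of y is (1, 0), so the two values swap rows: 0.2009 goes to row 0 (1.2009) and 0.0427 to row 1 (1.0427), and row 2 stays at 1.0000. Column 2 of y is (2, 0): 0.9154 goes to row 0 (1.9154) and 0.6480 to row 2 (1.6480), and row 1 stays at 1.0000. Column 3 of y is (0, 1), so the values keep their order and row 2 stays at 1.0000. Column 4 of y is (0, 2), the same pattern as column 0.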

    >>> x = torch.rand(2, 5)
    >>> x
    tensor([[0.7404, 0.0427, 0.6480, 0.3806, 0.8328],
            [0.7953, 0.2009, 0.9154, 0.6782, 0.9620]])
    >>> y = torch.tensor([[0, 1, 2, 0, 0], [2, 0, 0, 1, 2]])
    >>> y
    tensor([[0, 1, 2, 0, 0],
            [2, 0, 0, 1, 2]])
    >>> torch.ones(3, 5).scatter_add_(0, y, x)
    tensor([[1.7404, 1.2009, 1.9154, 1.3806, 1.8328],
            [1.0000, 1.0427, 1.0000, 1.6782, 1.0000],
            [1.7953, 1.0000, 1.6480, 1.0000, 1.9620]])
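To double-check the rule, the same result can be reproduced without PyTorch. The helper below is my own pure-Python re-implementation of scatter_add_ along dim 0 for 2-D lists, using the x and y values printed above.

```python
def scatter_add_dim0(dest, index, src):
    """In place: dest[index[i][j]][j] += src[i][j] for every element of src."""
    for i, row in enumerate(src):
        for j, value in enumerate(row):
            dest[index[i][j]][j] += value
    return dest


if __name__ == "__main__":
    x = [
        [0.7404, 0.0427, 0.6480, 0.3806, 0.8328],
        [0.7953, 0.2009, 0.9154, 0.6782, 0.9620],
    ]
    y = [
        [0, 1, 2, 0, 0],
        [2, 0, 0, 1, 2],
    ]
    ones = [[1.0] * 5 for _ in range(3)]
    for row in scatter_add_dim0(ones, y, x):
        print([round(v, 4) for v in row])
```

The index only chooses the destination row; the column of each value never changes, which is why top_pool_backward can use it to route gradients within a column.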

6.2 Compiling NMS

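The main files to look at are the Makefile, setup.py, and nms.pyx.
.pyx is the file extension for Cython source files. nms.pyx was written by Ross Girshick (RBG) and also contains the soft-NMS proposed by the University of Maryland.
For 3D data, rewrite nms and soft-nms for 3D and then run make.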
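Before porting anything to 3D it helps to have the 2-D logic in a compact form. Below is a minimal pure-Python sketch of Gaussian soft-NMS as I understand it: instead of discarding boxes that overlap the current best box, their scores are decayed by exp(-IoU^2 / sigma). The function names, the default sigma of 0.5, and the score threshold are assumptions for illustration and are not copied from nms.pyx.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft-NMS. Returns (box, score) pairs in selection order."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Pick the highest-scoring remaining box.
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        box, score = candidates.pop(best)
        kept.append((box, score))
        # Decay the scores of the others according to their overlap with it.
        rescored = []
        for other, s in candidates:
            s *= math.exp(-(iou(box, other) ** 2) / sigma)
            if s >= score_threshold:
                rescored.append((other, s))
        candidates = rescored
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    for box, score in soft_nms(boxes, scores):
        print(box, round(score, 4))
```

For a 3D port, only the overlap computation has to change (volumes instead of areas); the rescoring loop stays the same.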

6.3 Installing MS COCO APIs

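Install the Python version of the MS COCO API, which is used to parse the annotations.
The main files to look at are the Makefile and setup.py in PythonAPI and the files in pycocotools. In particular, maskApi.c and _mask.pyx are worth reading.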

from setuptools import setup, Extension
import numpy as np

# To compile and install locally run "python setup.py build_ext --inplace"
# To install library to Python site-packages run "python setup.py build_ext install"

ext_modules = [
    Extension(
        'pycocotools._mask',
        sources=['../common/maskApi.c', 'pycocotools/_mask.pyx'],
        include_dirs = [np.get_include(), '../common'],
        extra_compile_args=['-Wno-cpp', '-Wno-unused-function', '-std=c99'],
    )
]

setup(
    name='pycocotools',
    packages=['pycocotools'],
    package_dir = {'pycocotools': 'pycocotools'},
    install_requires=[
        'setuptools>=18.0',
        'cython>=0.27.3',
        'matplotlib>=2.1.0'
    ],
    version='2.0',
    ext_modules= ext_modules
)

[1] CornerNet: Detecting Objects as Paired Keypoints. ECCV 2018.
[2] Associative Embedding: End-to-End Learning for Joint Detection and Grouping. NIPS 2017.
[3] Stacked Hourglass Networks for Human Pose Estimation. ECCV 2016.
[4] A Simple Guide to Packaging and Distributing Python Libraries (Writing setup.py).
[5] Compiling C++ Code into a Python Module with pybind11.
