ROI Pooling: Principles and Implementation

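An object detection architecture can usually be split into two stages:
(1) Region proposal: given an input image, find all locations where objects might be. The output of this stage is a list of bounding boxes for possible object positions, usually called region proposals or regions of interest (ROIs). Methods used at this stage include sliding windows and selective search.
(2) Final classification: for each region proposal from the previous stage, decide whether it belongs to one of the target classes or to the background.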

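This architecture has several problems:

Generating a large number of region proposals causes performance problems and makes real-time object detection hard to achieve.
It is suboptimal in terms of processing speed.
It cannot be trained end-to-end.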

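These problems are the motivation for ROI pooling. The ROI pooling layer significantly speeds up both training and testing and improves detection accuracy. The layer takes two inputs:

A fixed-size feature map obtained from a deep network with several convolution and pooling layers;
An N*5 matrix representing all ROIs, where N is the number of ROIs. The first column is the image index, and the remaining four columns are the coordinates of the region's top-left and bottom-right corners;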

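ROI pooling works as follows:

Map each ROI from the input image to the corresponding position on the feature map;
Divide the mapped region into sections of roughly equal size (the number of sections equals the output dimensions);
Apply max pooling to each section;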
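The first step, projecting an ROI onto the feature map, can be sketched as follows. This is a minimal illustration, not the method of any particular library: the function name `project_roi`, the stride of 16, and the rounding convention (top-left rounded down, bottom-right rounded up) are all assumptions, and real implementations differ in how they round.

```python
import math

def project_roi(roi, stride):
    """Map an ROI from image pixels to feature-map cells.

    roi is (x1, y1, x2, y2) in image coordinates; stride is the total
    downsampling factor of the network (a hypothetical value here).
    """
    x1, y1, x2, y2 = roi
    # Assumed convention: round the top-left corner down and the
    # bottom-right corner up so the projected box covers the original one.
    return (math.floor(x1 / stride), math.floor(y1 / stride),
            math.ceil(x2 / stride), math.ceil(y2 / stride))

projected = project_roi((0, 48, 112, 128), stride=16)
print(projected)  # (0, 3, 7, 8)
```

With these assumed numbers, an image-space box from (0, 48) to (112, 128) lands on feature-map cells (0, 3) to (7, 8).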

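In this way, boxes of different sizes all yield feature maps of the same fixed size. Notably, the size of the output feature maps depends neither on the size of the ROI nor on the size of the convolutional feature map; it is determined only by the number of sections. The biggest benefit of ROI pooling is a large gain in processing speed, because the same convolutional feature map is reused for every ROI.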

ROI pooling example
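We have an 8*8 feature map, one ROI, and an output size of 2*2.

The input is the fixed-size feature map.
Position of the region proposal after projection (top-left and bottom-right coordinates): (0, 3), (7, 8).
The region is divided into 2*2 sections, because the output size is 2*2.
Max pooling each section gives the 2*2 output.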

Putting the whole process together:

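Note: in this example the region proposal is 5*7 (5 rows by 7 columns) and pooling must produce a 2*2 output, so the 5*7 region cannot be divided evenly. For the rows, 5/2 rounds down to 2, so the first row of sections is 2 cells high and the second takes the remaining 3. For the columns, 7/2 rounds down to 3, so the first column of sections is 3 cells wide and the second takes the remaining 4.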
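The uneven split described in the note can be checked with a short NumPy sketch. The helper names `section_edges` and `max_pool_region` are hypothetical, and the region is filled with placeholder values 0 to 34, since the actual feature-map values of the example are not given in the text. Section boundaries are computed by rounding down, which is the rule the note describes.

```python
import numpy as np

def section_edges(length, n_sections):
    """Boundaries of n_sections along an axis, rounding each one down."""
    return [i * length // n_sections for i in range(n_sections + 1)]

def max_pool_region(region, out_h, out_w):
    """Max-pool a 2-D region into an out_h x out_w grid of sections."""
    rows = section_edges(region.shape[0], out_h)
    cols = section_edges(region.shape[1], out_w)
    out = np.empty((out_h, out_w), dtype=region.dtype)
    for i in range(out_h):
        for j in range(out_w):
            section = region[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
            out[i, j] = section.max()
    return out

row_edges = section_edges(5, 2)   # [0, 2, 5] -> section heights 2 and 3
col_edges = section_edges(7, 2)   # [0, 3, 7] -> section widths 3 and 4

# Placeholder 5x7 region (values 0..34), not the article's feature map.
region = np.arange(35).reshape(5, 7)
pooled = max_pool_region(region, 2, 2)
print(pooled)  # [[ 9 13]
               #  [30 34]]
```

This rounding-down rule assumes the region is at least as large as the output grid along each axis; a smaller region would produce empty sections and needs separate handling.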

In short, ROI pooling pools feature maps of different sizes into feature maps of the same size, which makes them easy to feed into the next layer of the network.

Code implementation

$ git clone git@github.com:deepsense-io/roi-pooling.git
$ cd roi-pooling
$ python setup.py install

from __future__ import print_function

import tensorflow as tf
import numpy as np

from roi_pooling.roi_pooling_ops import roi_pooling



# 4x4 feature map with only 1 channel
input_value = [[
    [[1], [2], [4], [4]],
    [[3], [4], [1], [2]],
    [[6], [2], [1], [7]],
    [[1], [3], [2], [8]]
]]
input_value = np.asarray(input_value, dtype='float32')

# regions of interest as lists of:
# feature map index, upper left, bottom right coordinates
rois_value = [
    [0, 0, 0, 1, 3],
    [0, 2, 2, 3, 3],
    [0, 1, 0, 3, 2]
]
rois_value = np.asarray(rois_value, dtype='int32')

# in this case we have 3 RoI pooling operations, all on feature map 0;
# each region is given as (x, y) upper left to (x, y) bottom right:
# * feature map 0, rectangular region (0, 0) to (1, 3)
#              xx..
#              xx..
#              xx..
#              xx..
#
# * feature map 0, rectangular region (2, 2) to (3, 3)
#              ....
#              ....
#              ..xx
#              ..xx
# * feature map 0, rectangular region (1, 0) to (3, 2)
#              .xxx
#              .xxx
#              .xxx
#              ....

# Build the op directly from constants. This uses the TensorFlow 1.x
# graph/session API; the unused placeholders and feed_dict were dropped.
input_const = tf.constant(input_value, tf.float32)
rois_const = tf.constant(rois_value, tf.int32)
y = roi_pooling(input_const, rois_const, pool_height=2, pool_width=2)

with tf.Session() as sess:
    y_output = sess.run(y)
    print(y_output)

Output

[[[[ 3.  4.]
   [ 6.  3.]]]


 [[[ 1.  7.]
   [ 2.  8.]]]


 [[[ 4.  4.]
   [ 4.  7.]]]]
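To see where these numbers come from, the pooling can be replayed in plain NumPy. This sketch is an assumption about the binning rule, not a copy of the library's code: it treats each ROI as (x1, y1, x2, y2) with inclusive corners and lets section boundaries round outward (floor for the start, ceil for the end), so neighbouring sections may overlap. That rule happens to reproduce all three outputs above; the function name `roi_max_pool` is hypothetical.

```python
import math
import numpy as np

feature_map = np.array([
    [1, 2, 4, 4],
    [3, 4, 1, 2],
    [6, 2, 1, 7],
    [1, 3, 2, 8],
], dtype=np.float32)

def roi_max_pool(fmap, roi, pool_h, pool_w):
    """Max-pool one ROI (x1, y1, x2, y2, inclusive) to pool_h x pool_w."""
    x1, y1, x2, y2 = roi
    region = fmap[y1:y2 + 1, x1:x2 + 1]
    h, w = region.shape
    out = np.empty((pool_h, pool_w), dtype=fmap.dtype)
    for i in range(pool_h):
        # Assumed rule: section start rounds down, section end rounds up.
        r0 = math.floor(i * h / pool_h)
        r1 = math.ceil((i + 1) * h / pool_h)
        for j in range(pool_w):
            c0 = math.floor(j * w / pool_w)
            c1 = math.ceil((j + 1) * w / pool_w)
            out[i, j] = region[r0:r1, c0:c1].max()
    return out

rois = [(0, 0, 1, 3), (2, 2, 3, 3), (1, 0, 3, 2)]
results = [roi_max_pool(feature_map, roi, 2, 2) for roi in rois]
for result in results:
    print(result)
```

Under this rule the third ROI covers columns 1 to 3 and rows 0 to 2, and its 3*3 region is split into four overlapping 2*2 sections, which is why the value 4 appears three times in its output.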
