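Reposting this here in case the original article disappears.

A Detailed Walkthrough of the Faster R-CNN Source Code: The RPN

Original post, 2018-04-02 22:08:09

After a break of nearly three months, I am updating this blog again. I apologize for not posting during the past two-plus months and ask for your understanding.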

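This post is about the Faster R-CNN source code. First, why write a source-code walkthrough of Faster R-CNN? There are three reasons:

1. My research is related to object detection. Although I do not work purely on detection, a classic framework like Faster R-CNN has to be understood in some depth, especially an algorithm as influential as the RPN. Moreover, although Faster R-CNN dates from 2015, its core architecture, and the RPN in particular, is still used by a great many current methods. So I start with the RPN of Faster R-CNN.

2. Most of the material online covers the principles of Faster R-CNN; very little explains the code. The principles matter, but once you understand them you should read the code carefully, because that is where most of the learning happens. Also, I do not simply follow what other bloggers write. If a topic is already well covered online, I will not write about it again. I would rather write about the things people find difficult and for which few resources exist, since that is what helps readers most.

3. Faster R-CNN is not only a classic framework. I also consider its code some of the most classic, most difficult and most representative in deep learning. Within the RPN, the harder parts are how the anchors are generated and how the label of each anchor (classification and box regression) is computed. I hope this walkthrough of the RPN resolves those questions.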

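With the motivation out of the way, a few more remarks before we get to the code:

1. To follow this post, or to get anything out of it, you need to understand the Faster R-CNN framework. I recommend the following routes:

1) Read the paper directly: https://arxiv.org/abs/1506.01497

2) Faster R-CNN relies on a lot of prior work. If you find the paper hard going, you may want to read my earlier post:

Mask R-CNN for Instance Segmentation Explained: From R-CNN, Fast R-CNN and Faster R-CNN to Mask R-CNN (實例分割模型Mask R-CNN詳解：從R-CNN，Fast R-CNN，Faster R-CNN再到Mask R-CNN)

3) There is also an introduction to Faster R-CNN on Zhihu that I think is good:

Understanding Faster R-CNN in One Article (一文讀懂Faster R-CNN)

2. I am only a master's student, so I can only promise to make this walkthrough as detailed as I can. If you find problems, have questions, or spot omissions, please point them out in the comments. I would be very grateful.

3. (Very important) The Faster R-CNN code analysed here is the TensorFlow version at https://github.com/kevinjliang/tf-Faster-RCNN. Many of its interfaces still follow Girshick's py-faster-rcnn, and the main modules are implemented the same way. Please download the code and get familiar with its overall structure first; otherwise this post will be hard to follow.

Now for the substance:

First, in faster_rcnn_resnet50ish.py, let us look at what the data layer provides during training: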

# Train data  
self.x['TRAIN'] = tf.placeholder(tf.float32, [1, None, None, 3])  # image
self.im_dims['TRAIN'] = tf.placeholder(tf.int32, [None, 2])  # image dimensions [height, width]
self.gt_boxes['TRAIN'] = tf.placeholder(tf.int32, [None, 5])  # ground-truth boxes [x1, y1, x2, y2, class]

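The network first takes the image, then the image's height and width, because images of different sizes produce different anchor coordinates. Last comes the ground-truth box information: its second dimension has five entries, of which the first four are the box coordinates and the last is the object class.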
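
To make the three placeholders concrete, here is a small sketch of what could be fed for a single image. The image size, box coordinates and class ids are invented for illustration, and the [x1, y1, x2, y2, class] column order is my reading of how gt_boxes is used later in the code.

```python
import numpy as np

# Hypothetical values for one training step (batch size 1).
image = np.zeros((1, 600, 800, 3), dtype=np.float32)    # [1, height, width, 3]
im_dims = np.array([[600, 800]], dtype=np.int32)        # [height, width]

# One row per object: x1, y1, x2, y2 (pixel corners), then the class id.
gt_boxes = np.array([[ 48,  60, 320, 410,  7],
                     [400, 120, 760, 580, 12]], dtype=np.int32)

# The stored dimensions must agree with the image tensor itself.
assert image.shape[1:3] == tuple(im_dims[0])
print(image.shape, im_dims.shape, gt_boxes.shape)
```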

Next we move to faster_rcnn_networks.py, where the rpn class lives. As usual, I paste the annotated source first:

# -*- coding: utf-8 -*-  
""" 
Created on Fri Dec 30 16:14:48 2016 
 
@author: Kevin Liang 
 
Faster R-CNN detection and classification networks. 
 
Contains the Region Proposal Network (RPN), ROI proposal layer, and the RCNN. 
 
TODO: -Split off these three networks into their own files OR add to Layers 
"""  
  
import sys  
  
sys.path.append('../')  
  
from Lib.TensorBase.tensorbase.base import Layers  
  
from Lib.faster_rcnn_config import cfg  
from Lib.loss_functions import rpn_cls_loss, rpn_bbox_loss, fast_rcnn_cls_loss, fast_rcnn_bbox_loss  
from Lib.roi_pool import roi_pool  
from Lib.rpn_softmax import rpn_softmax  
from Networks.anchor_target_layer import anchor_target_layer  
from Networks.proposal_layer import proposal_layer  
from Networks.proposal_target_layer import proposal_target_layer  
  
import tensorflow as tf  
  
  
class rpn:  
    '''
    Region Proposal Network (RPN): From the convolutional feature maps 
    (TensorBase Layers object) of the last layer, generate bounding boxes 
    relative to anchor boxes and give an "objectness" score to each 
 
    In evaluation mode (eval_mode==True), gt_boxes should be None. 
    '''  
  
    def __init__(self, featureMaps, gt_boxes, im_dims, _feat_stride, eval_mode):  
        self.featureMaps = featureMaps  # shared convolutional features
        self.gt_boxes = gt_boxes  # ground truth, shape [None, 5]: top-left and bottom-right corners plus class
        self.im_dims = im_dims  # image dimensions, shape [None, 2]: [height, width]
        self._feat_stride = _feat_stride  # downsampling factor from the image to the feature map
        self.anchor_scales = cfg.RPN_ANCHOR_SCALES  # anchor scales, [8, 16, 32]
        self.eval_mode = eval_mode  # True for evaluation, False for training

        self._network()  # build the network
  
    def _network(self):  
        # There shouldn't be any gt_boxes if in evaluation mode  
        if self.eval_mode is True:  # no ground truth is available in evaluation mode
            assert self.gt_boxes is None, \  
                'Evaluation mode should not have ground truth boxes (or else what are you detecting for?)'  
  
        _num_anchors = len(self.anchor_scales)*3  # 9 (3 scales x 3 aspect ratios) anchors per sliding-window position

        rpn_layers = Layers(self.featureMaps)  # start the RPN from the shared features

        with tf.variable_scope('rpn'):
            # Spatial windowing
            for i in range(len(cfg.RPN_OUTPUT_CHANNELS)):  # a 3x3 convolution with 512 output channels
                rpn_layers.conv2d(filter_size=cfg.RPN_FILTER_SIZES[i], output_channels=cfg.RPN_OUTPUT_CHANNELS[i])  
                  
            features = rpn_layers.get_output()  
  
            with tf.variable_scope('cls'):  
                # Box-classification layer (objectness)  
                self.rpn_bbox_cls_layers = Layers(features)  # the 1x1 convolution below outputs 18 (9 x 2) channels
                self.rpn_bbox_cls_layers.conv2d(filter_size=1, output_channels=_num_anchors*2, activation_fn=None)  
  
            with tf.variable_scope('target'):  # compute the training target of every anchor
                # Only calculate targets in train mode. No ground truth boxes in evaluation mode  
                if self.eval_mode is False:  
                    # Anchor Target Layer (anchors and deltas)  
                    rpn_cls_score = self.rpn_bbox_cls_layers.get_output()  
                    self.rpn_labels, self.rpn_bbox_targets, self.rpn_bbox_inside_weights, self.rpn_bbox_outside_weights = \  
                        anchor_target_layer(rpn_cls_score=rpn_cls_score, gt_boxes=self.gt_boxes, im_dims=self.im_dims,  
                                            _feat_stride=self._feat_stride, anchor_scales=self.anchor_scales)  
  
            with tf.variable_scope('bbox'):  # a 1x1 convolution outputs 36 (9 x 4) channels
                # Bounding-Box regression layer (bounding box predictions)  
                self.rpn_bbox_pred_layers = Layers(features)  
                self.rpn_bbox_pred_layers.conv2d(filter_size=1, output_channels=_num_anchors*4, activation_fn=None)  
  
    # Get functions  
    def get_rpn_cls_score(self):  # foreground/background scores predicted by the RPN for each anchor
        return self.rpn_bbox_cls_layers.get_output()  
  
    def get_rpn_labels(self):  # ground-truth label of each anchor: foreground, background, or ignored
        assert self.eval_mode is False, 'No RPN labels without ground truth boxes'  
        return self.rpn_labels  
  
    def get_rpn_bbox_pred(self):  # the four box offsets predicted by the RPN for each anchor
        return self.rpn_bbox_pred_layers.get_output()  
  
    def get_rpn_bbox_targets(self):  # the four ground-truth box offsets of each anchor
        assert self.eval_mode is False, 'No RPN bounding box targets without ground truth boxes'  
        return self.rpn_bbox_targets  
  
    def get_rpn_bbox_inside_weights(self):  # used in the box loss; non-zero only for foreground anchors
        assert self.eval_mode is False, 'No RPN inside weights without ground truth boxes'  
        return self.rpn_bbox_inside_weights  
  
    def get_rpn_bbox_outside_weights(self):  # used in the box loss; non-zero only for sampled foreground and background anchors
        assert self.eval_mode is False, 'No RPN outside weights without ground truth boxes'  
        return self.rpn_bbox_outside_weights  
  
    # Loss functions  
    def get_rpn_cls_loss(self):  # RPN classification loss
        assert self.eval_mode is False, 'No RPN cls loss without ground truth boxes'  
        rpn_cls_score = self.get_rpn_cls_score()  
        rpn_labels = self.get_rpn_labels()  
        return rpn_cls_loss(rpn_cls_score, rpn_labels)  
  
    def get_rpn_bbox_loss(self):  # RPN box-regression loss; note the use of the inside and outside weights
        assert self.eval_mode is False, 'No RPN bbox loss without ground truth boxes'  
        rpn_bbox_pred = self.get_rpn_bbox_pred()  
        rpn_bbox_targets = self.get_rpn_bbox_targets()  
        rpn_bbox_inside_weights = self.get_rpn_bbox_inside_weights()  
        rpn_bbox_outside_weights = self.get_rpn_bbox_outside_weights()  
        return rpn_bbox_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights)  
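
As a shape check, the two 1×1 heads of the class above can be mimicked in NumPy. This is only a sketch: the feature-map size is made up, the weights are random, and conv1x1 is a hypothetical helper of mine, not part of TensorBase.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, A = 4, 6, 512, 9    # hypothetical feature-map size; A = 3 scales x 3 ratios
features = rng.standard_normal((1, H, W, C)).astype(np.float32)

def conv1x1(x, out_channels):
    """A 1x1 convolution is a matrix multiply over the channel axis at every position."""
    w = (rng.standard_normal((x.shape[-1], out_channels)) * 0.01).astype(np.float32)
    return x @ w

cls_score = conv1x1(features, A * 2)   # objectness logits: 2 values per anchor
bbox_pred = conv1x1(features, A * 4)   # box offsets: 4 values per anchor
print(cls_score.shape, bbox_pred.shape)
```

Both heads read the same `features` tensor, so every spatial position yields 18 scores and 36 offsets for its 9 anchors.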

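In training, the rpn class does two main things: get_rpn_cls_loss computes the RPN classification loss, and get_rpn_bbox_loss computes the RPN anchor box-regression loss. The hardest part of computing these two losses is obtaining the ground truth, which is done by the anchor_target_layer function. Let us look at that function first; as usual, the source comes first: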

# -*- coding: utf-8 -*-  
""" 
Created on Sun Jan  1 16:11:17 2017 
 
@author: Kevin Liang (modifications) 
 
Anchor Target Layer: Creates all the anchors in the final convolutional feature 
map, assigns anchors to ground truth boxes, and applies labels of "objectness" 
 
Adapted from the official Faster R-CNN repo:  
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/anchor_target_layer.py 
"""  
  
# --------------------------------------------------------  
# Faster R-CNN  
# Copyright (c) 2015 Microsoft  
# Licensed under The MIT License [see LICENSE for details]  
# Written by Ross Girshick and Sean Bell  
# --------------------------------------------------------  
  
import sys  
sys.path.append('../')  
  
import numpy as np  
import numpy.random as npr  
import tensorflow as tf  
  
from Lib.bbox_overlaps import bbox_overlaps  
from Lib.bbox_transform import bbox_transform  
from Lib.faster_rcnn_config import cfg  
from Lib.generate_anchors import generate_anchors  
  
# Computes the ground truth of every anchor (foreground/background label and box offsets)
def anchor_target_layer(rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales):
    '''
    Make Python version of _anchor_target_layer_py below Tensorflow compatible
    '''
    # Wrap _anchor_target_layer_py with tf.py_func. Inputs: the predicted RPN class scores, the ground-truth
    # boxes, the image dimensions, the feature stride relative to the image, and the anchor scales
    rpn_labels,rpn_bbox_targets,rpn_bbox_inside_weights,rpn_bbox_outside_weights = \  
        tf.py_func(_anchor_target_layer_py, [rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales],  
                   [tf.float32, tf.float32, tf.float32, tf.float32])  
  
    # Name the outputs as tensors (the labels are cast to int32)
    rpn_labels = tf.convert_to_tensor(tf.cast(rpn_labels,tf.int32), name = 'rpn_labels')  
    rpn_bbox_targets = tf.convert_to_tensor(rpn_bbox_targets, name = 'rpn_bbox_targets')  
    rpn_bbox_inside_weights = tf.convert_to_tensor(rpn_bbox_inside_weights , name = 'rpn_bbox_inside_weights')  
    rpn_bbox_outside_weights = tf.convert_to_tensor(rpn_bbox_outside_weights , name = 'rpn_bbox_outside_weights')  
  
    return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights  
  
  
def _anchor_target_layer_py(rpn_cls_score, gt_boxes, im_dims, _feat_stride, anchor_scales):  
    """ 
    Python version     
     
    Assign anchors to ground-truth targets. Produces anchor classification 
    labels and bounding-box regression targets. 
     
    # Algorithm: 
    # 
    # for each (H, W) location i 
    #   generate 9 anchor boxes centered on cell i 
    #   apply predicted bbox deltas at cell i to each of the 9 anchors 
    # filter out-of-image anchors 
    # measure GT overlap 
    """  
    im_dims = im_dims[0]  # dimensions of the input image, [height, width]
    _anchors = generate_anchors(scales=np.array(anchor_scales))  # the 9 base anchors, shape [9, 4]
    _num_anchors = _anchors.shape[0]  # 9
      
    # allow boxes to sit over the edge by a small amount  
    _allowed_border =  0  # anchors may not cross the image boundary at all
      
    # Only minibatch of 1 supported: check that the batch size is 1
    assert rpn_cls_score.shape[0] == 1, \  
        'Only single item batches are supported'      
      
    # map of shape (..., H, W)  
    height, width = rpn_cls_score.shape[1:3]  # H and W of the RPN output; the total number of anchors is H x W x 9
      
    # 1. Generate proposals from bbox deltas and shifted anchors  
    # Generate the anchors in image coordinates
    shift_x = np.arange(0, width) * _feat_stride  # shape [width,]
    shift_y = np.arange(0, height) * _feat_stride  # shape [height,]
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)  # grid; both have shape [height, width]
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()  # shape [height*width, 4]
  
    # add A anchors (1, A, 4) to  
    # cell K shifts (K, 1, 4) to get  
    # shift anchors (K, A, 4)  
    # reshape to (K*A, 4) shifted anchors  
    A = _num_anchors  # A = 9
    K = shifts.shape[0]  # K = height * width (of the feature map)
    all_anchors = (_anchors.reshape((1, A, 4)) +
                   shifts.reshape((1, K, 4)).transpose((1, 0, 2)))  # all anchors, shape [K, A, 4]
    all_anchors = all_anchors.reshape((K * A, 4))
    total_anchors = int(K * A)  # total number of anchors

    # anchors inside the image: inds_inside indexes the anchors that do not cross the image boundary
    inds_inside = np.where(  
        (all_anchors[:, 0] >= -_allowed_border) &  
        (all_anchors[:, 1] >= -_allowed_border) &  
        (all_anchors[:, 2] < im_dims[1] + _allowed_border) &  # width  
        (all_anchors[:, 3] < im_dims[0] + _allowed_border)    # height  
    )[0]  
      
    # keep only inside anchors  
    anchors = all_anchors[inds_inside, :]  # keep only the anchors that lie fully inside the image
      
    # label: 1 is positive, 0 is negative, -1 is dont care  
    labels = np.empty((len(inds_inside), ), dtype=np.float32)  # one label per inside anchor
    labels.fill(-1)  # initialise every label to -1
      
    # overlaps between the anchors and the gt boxes  
    # overlaps (ex, gt)  
    # IoU between every inside anchor and every gt box, shape [len(anchors), len(gt_boxes)]
    overlaps = bbox_overlaps(  
        np.ascontiguousarray(anchors, dtype=np.float),  
        np.ascontiguousarray(gt_boxes, dtype=np.float))  
    argmax_overlaps = overlaps.argmax(axis=1)  # for each anchor, the index of the gt box with the highest IoU, shape [len(anchors),]
    max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]  # for each anchor, that highest IoU value, shape [len(anchors),]
    gt_argmax_overlaps = overlaps.argmax(axis=0)  # for each gt box, the index of the anchor with the highest IoU, shape [len(gt_boxes),]
    gt_max_overlaps = overlaps[gt_argmax_overlaps,
                               np.arange(overlaps.shape[1])]  # for each gt box, its highest IoU with any anchor, shape [len(gt_boxes),]
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]  # every anchor that attains the highest IoU for some gt box (ties included, so there may be more than len(gt_boxes) entries)
      
    if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # positives take priority: assign background first so that foreground labels can overwrite it
        # assign bg labels first so that positive labels can clobber them
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose highest IoU is below the threshold (0.3) become background
  
    # fg label: for each gt, anchor with highest overlap  
    labels[gt_argmax_overlaps] = 1  # for each gt box, the anchor(s) with the highest IoU become foreground
  
    # fg label: above threshold IOU  
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1  # anchors whose highest IoU reaches the threshold (0.7) become foreground
  
    if cfg.TRAIN.RPN_CLOBBER_POSITIVES:  # negatives take priority: assign background last
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # anchors whose highest IoU is below the threshold (0.3) become background
  
    # subsample positive labels if we have too many  
    num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # maximum number of foreground anchors in one training batch
    fg_inds = np.where(labels == 1)[0]  # anchors labelled foreground
    if len(fg_inds) > num_fg:
        disable_inds = npr.choice(
            fg_inds, size=(len(fg_inds) - num_fg), replace=False)
        labels[disable_inds] = -1  # too many foreground anchors: randomly set the surplus to ignored
  
    # subsample negative labels if we have too many  
    num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)  # number of background anchors needed to fill the batch
    bg_inds = np.where(labels == 0)[0]  # anchors labelled background
    if len(bg_inds) > num_bg:
        disable_inds = npr.choice(
            bg_inds, size=(len(bg_inds) - num_bg), replace=False)
        labels[disable_inds] = -1  # too many background anchors: randomly set the surplus to ignored
  
    # bbox_targets: The deltas (relative to anchors) that Faster R-CNN should   
    # try to predict at each anchor  
    # TODO: This "weights" business might be deprecated. Requires investigation  
    # For every anchor, the four offsets (tx, ty, tw, th)
    bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)  # zero-initialised; overwritten on the next line
    bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])  # offsets from each anchor to the gt box it overlaps most

    bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)  # inside weights start at 0
    bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)  # set only for foreground anchors

    bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)  # outside weights start at 0
    if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:  # a negative RPN_POSITIVE_WEIGHT selects uniform weighting
        # uniform weighting of examples (given non-uniform sampling)  
        num_examples = np.sum(labels >= 0)  
        positive_weights = np.ones((1, 4)) * 1.0 / num_examples  # positive and negative weights are identical
        negative_weights = np.ones((1, 4)) * 1.0 / num_examples  
    else:  
        assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &  
                (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))  # otherwise RPN_POSITIVE_WEIGHT must lie strictly between 0 and 1
        positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /  
                            np.sum(labels == 1))  
        negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /  
                            np.sum(labels == 0))  # positives and negatives are weighted separately
    bbox_outside_weights[labels == 1, :] = positive_weights  
    bbox_outside_weights[labels == 0, :] = negative_weights  # write both weights into bbox_outside_weights
  
    # map up to original set of anchors  
    labels = _unmap(labels, total_anchors, inds_inside, fill=-1)  # map the labels back to the full anchor set; anchors crossing the boundary get -1
    bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)  # same for the box targets; anchors crossing the boundary get 0
    bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)  # same for the inside weights (fill 0)
    bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)  # same for the outside weights (fill 0)
      
    # labels  
    labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)  
    labels = labels.reshape((1, 1, A * height, width))  # the labels end up with shape [1, 1, 9*height, width]
    rpn_labels = labels  
  
    # bbox_targets  
    rpn_bbox_targets = bbox_targets.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)  # box targets, shape [1, 9*4, height, width]
      
    # bbox_inside_weights  
    rpn_bbox_inside_weights = bbox_inside_weights.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)  # inside weights, shape [1, 9*4, height, width]
  
    # bbox_outside_weights  
    rpn_bbox_outside_weights = bbox_outside_weights.reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)  # outside weights, shape [1, 9*4, height, width]
  
    return rpn_labels,rpn_bbox_targets,rpn_bbox_inside_weights,rpn_bbox_outside_weights  # all ground-truth values
      
  
def _unmap(data, count, inds, fill=0):  # scatter values computed for the inside anchors back into arrays covering all anchors
    """ Unmap a subset of item (data) back to the original set of items (of 
    size count) """  
    if len(data.shape) == 1:  
        ret = np.empty((count, ), dtype=np.float32)  
        ret.fill(fill)  
        ret[inds] = data  
    else:  
        ret = np.empty((count, ) + data.shape[1:], dtype=np.float32)  
        ret.fill(fill)  
        ret[inds, :] = data  
    return ret  
  
def _compute_targets(ex_rois, gt_rois):  # box offsets from each anchor to its matched gt box
    """Compute bounding-box regression targets for an image."""  
  
    assert ex_rois.shape[0] == gt_rois.shape[0]  
    assert ex_rois.shape[1] == 4  
    assert gt_rois.shape[1] == 5  
  
    return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)  

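anchor_target_layer mostly just calls _anchor_target_layer_py and turns the outputs into tensors, so let us look closely at _anchor_target_layer_py. It first calls generate_anchors to produce 9 base anchors, then places them at the image position corresponding to every sliding-window position on the shared feature map, giving all_anchors. Next it discards every anchor that crosses the image boundary, giving anchors; all later steps work only on these inside anchors. bbox_overlaps then computes the IoU between every inside anchor and every ground-truth box. After that, anchors whose highest IoU lies between 0.3 and 0.7 are excluded by leaving their label at -1 (unless an anchor is the best match for some ground-truth box, in which case it is still marked foreground), and a suitable number of foreground and background anchors is sampled for training. _compute_targets then computes the offsets (tx, ty, tw, th) of every anchor and stores them in bbox_targets. bbox_inside_weights and bbox_outside_weights are computed next; these two arrays play a major role in training the anchor box correction. Finally, _unmap maps the values computed for the inside anchors back to the full set of anchors.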
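
The anchor grid and the boundary filter are easier to see on a tiny example. The sketch below uses two made-up base anchors instead of the real nine, and a hypothetical 2×3 feature map on a 32×48 image; only the broadcasting pattern is meant to mirror the code above.

```python
import numpy as np

# Two invented base anchors (x1, y1, x2, y2); the real code uses nine.
base_anchors = np.array([[ 0,  0, 15, 15],
                         [-8, -8, 23, 23]], dtype=np.float32)
feat_stride = 16
height, width = 2, 3            # feature-map size (hypothetical)
im_height, im_width = 32, 48    # image size (hypothetical)

# One (x, y, x, y) shift per feature-map cell, in image pixels.
shift_x, shift_y = np.meshgrid(np.arange(width) * feat_stride,
                               np.arange(height) * feat_stride)
shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                   shift_x.ravel(), shift_y.ravel()], axis=1)    # (K, 4)

# Broadcast (1, A, 4) + (K, 1, 4) -> (K, A, 4), then flatten to (K*A, 4).
all_anchors = (base_anchors[np.newaxis, :, :] + shifts[:, np.newaxis, :]).reshape(-1, 4)

# Keep only anchors that lie fully inside the image (allowed border of 0).
inside = np.where((all_anchors[:, 0] >= 0) & (all_anchors[:, 1] >= 0) &
                  (all_anchors[:, 2] < im_width) & (all_anchors[:, 3] < im_height))[0]
print(all_anchors.shape)
print(inside)
```

The flattened index is `cell * A + anchor`, so in this toy setup the surviving indices are exactly the even ones: the small base anchor fits at every cell, while the larger one always crosses a border.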

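If this looks confusing at first, do not worry. anchor_target_layer exists to produce two things. The first is the class of every anchor generated for an image: training needs a certain number of positive samples (foreground) and a certain number of negative samples (background), and all remaining anchors are set to -1, which means they are ignored during training. The second is the box correction of every anchor. Only foreground anchors contribute to the box-regression loss, and this is implemented through bbox_inside_weights and bbox_outside_weights: bbox_inside_weights is 0 for every anchor that is not foreground, and bbox_outside_weights is 0 for every anchor that is neither foreground nor background.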
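
The labelling rule and the two weight arrays can be sketched on a hand-made IoU matrix. Assumptions on my side: assign_labels is a hypothetical helper, the thresholds 0.3 and 0.7 are the values quoted in the comments, RPN_CLOBBER_POSITIVES is off, the inside weight is 1.0, and the uniform branch (negative RPN_POSITIVE_WEIGHT) is used for the outside weights. Subsampling is left out.

```python
import numpy as np

def assign_labels(overlaps, neg_thresh=0.3, pos_thresh=0.7):
    """overlaps: (num_anchors, num_gt) IoU matrix. Returns labels in {1, 0, -1}."""
    labels = np.full(overlaps.shape[0], -1, dtype=np.float32)
    max_overlaps = overlaps.max(axis=1)       # best IoU of each anchor
    gt_max_overlaps = overlaps.max(axis=0)    # best IoU of each gt box
    # Anchors that attain the best IoU of some gt box (ties included).
    gt_argmax = np.where(overlaps == gt_max_overlaps)[0]

    labels[max_overlaps < neg_thresh] = 0     # background first ...
    labels[gt_argmax] = 1                     # ... so that positives can overwrite it
    labels[max_overlaps >= pos_thresh] = 1
    return labels

overlaps = np.array([[0.80, 0.10],    # clear foreground for gt 0
                     [0.50, 0.05],    # between the thresholds: ignored
                     [0.10, 0.25],    # below 0.3, but the best match for gt 1: foreground
                     [0.05, 0.00]])   # background
labels = assign_labels(overlaps)

inside_w = np.zeros((len(labels), 4), dtype=np.float32)
inside_w[labels == 1] = 1.0                        # only foreground anchors regress boxes
outside_w = np.zeros((len(labels), 4), dtype=np.float32)
outside_w[labels >= 0] = 1.0 / np.sum(labels >= 0)  # uniform weight over fg + bg anchors
print(labels)
```

The third anchor shows why background is assigned first: its IoU is under 0.3, yet it ends up foreground because no other anchor matches the second box better.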

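anchor_target_layer relies on several important functions. The first is generate_anchors, whose job is to generate the 9 anchors, covering 3 aspect ratios and 3 areas. The source and comments are as follows: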

# -*- coding: utf-8 -*-  
""" 
Created on Sun Jan  1 16:11:17 2017 
 
@author: Kevin Liang (modifications) 
 
generate_anchors and supporting functions: generate reference windows (anchors) 
for Faster R-CNN. Specifically, it creates a set of k (default of 9) relative  
coordinates. These references will be added on to all positions of the final 
convolutional feature maps. 
 
Adapted from the official Faster R-CNN repo:  
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/generate_anchors.py 
 
Note: the produced anchors have indices off by 1 of what the comments claim.  
Probably due to MATLAB being 1-indexed, while Python is 0-indexed. 
"""  
  
# --------------------------------------------------------  
# Faster R-CNN  
# Copyright (c) 2015 Microsoft  
# Licensed under The MIT License [see LICENSE for details]  
# Written by Ross Girshick and Sean Bell  
# --------------------------------------------------------  
  
import numpy as np  
  
# Verify that we compute the same anchors as Shaoqing's matlab implementation:  
#  
#    >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat  
#    >> anchors  
#  
#    anchors =  
#  
#       -83   -39   100    56  
#      -175   -87   192   104  
#      -359  -183   376   200  
#       -55   -55    72    72  
#      -119  -119   136   136  
#      -247  -247   264   264  
#       -35   -79    52    96  
#       -79  -167    96   184  
#      -167  -343   184   360  
  
#array([[ -83.,  -39.,  100.,   56.],  
#       [-175.,  -87.,  192.,  104.],  
#       [-359., -183.,  376.,  200.],  
#       [ -55.,  -55.,   72.,   72.],  
#       [-119., -119.,  136.,  136.],  
#       [-247., -247.,  264.,  264.],  
#       [ -35.,  -79.,   52.,   96.],  
#       [ -79., -167.,   96.,  184.],  
#       [-167., -343.,  184.,  360.]])  
  
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],  
                     scales=2**np.arange(3, 6)):  
    """ 
    Generate anchor (reference) windows by enumerating aspect ratios X 
    scales wrt a reference (0, 0, 15, 15) window. 
    """  
    # An anchor can be written in two ways: by its top-left and bottom-right corners, or by its centre plus width and height
    # The base anchor is built in corner form: [0, 0, 15, 15]
    base_anchor = np.array([1, 1, base_size, base_size]) - 1  # [0, 0, 15, 15]
    ratio_anchors = _ratio_enum(base_anchor, ratios)  # shape [3, 4]: one anchor per aspect ratio
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])  # the nine anchors, shape [9, 4]
    return anchors  
  
def _whctrs(anchor):  # corner form in; width, height and centre coordinates out
    """ 
    Return width, height, x center, and y center for an anchor (window). 
    """  
  
    w = anchor[2] - anchor[0] + 1  
    h = anchor[3] - anchor[1] + 1  
    x_ctr = anchor[0] + 0.5 * (w - 1)  
    y_ctr = anchor[1] + 0.5 * (h - 1)  
    return w, h, x_ctr, y_ctr  
  
def _mkanchors(ws, hs, x_ctr, y_ctr):  # centre plus widths and heights in; windows in corner form out
    """ 
    Given a vector of widths (ws) and heights (hs) around a center 
    (x_ctr, y_ctr), output a set of anchors (windows). 
    """  
  
    ws = ws[:, np.newaxis] #shape: [3,1]  
    hs = hs[:, np.newaxis] #shape: [3,1]  
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),  
                         y_ctr - 0.5 * (hs - 1),  
                         x_ctr + 0.5 * (ws - 1),  
                         y_ctr + 0.5 * (hs - 1)))  
    return anchors  # shape [3, 4]: top-left and bottom-right corners of each anchor
  
def _ratio_enum(anchor, ratios):  # anchors of roughly equal area for the different aspect ratios
    """ 
    Enumerate a set of anchors for each aspect ratio wrt an anchor. 
    """  
  
    w, h, x_ctr, y_ctr = _whctrs(anchor)  # width, height and centre of the anchor
    size = w * h  # area of the anchor
    size_ratios = size / ratios  # helper for the aspect ratios: array([512., 256., 128.])
    ws = np.round(np.sqrt(size_ratios))  # widths for each aspect ratio: array([23., 16., 11.])
    hs = np.round(ws * ratios)  # heights for each aspect ratio: array([12., 16., 22.])
    # Note that ws * hs is roughly 256 at every position
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)  # the new anchors, shape [3, 4], in corner form
    return anchors  
  
def _scale_enum(anchor, scales):  # for one aspect ratio, anchors at the different scales
    """ 
    Enumerate a set of anchors for each scale wrt an anchor. 
    """  
  
    w, h, x_ctr, y_ctr = _whctrs(anchor)  # width, height and centre of the anchor
    ws = w * scales  # shape [3,]: widths at each scale
    hs = h * scales  # shape [3,]: heights at each scale
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)  # anchors at the different scales, in corner form
    return anchors  
  
if __name__ == '__main__':  
    import time  
    t = time.time()  
    a = generate_anchors()  
    print(time.time() - t)  
    print(a)  
    from IPython import embed; embed()  

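The idea of the code above is to start from one base anchor, derive from it three anchors with different aspect ratios and approximately the same area, and then generate three anchors of different scale for each aspect ratio, which gives 9 anchors in total. See the code comments for details.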
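
The same arithmetic can be condensed into a few vectorised lines, which makes the numbers easy to check by hand. make_anchors is a hypothetical name of mine; it is a compact re-derivation under the default base_size, ratios and scales, not the repository's function.

```python
import numpy as np

def make_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    ratios = np.asarray(ratios, dtype=float)
    scales = np.asarray(scales, dtype=float)
    ctr = 0.5 * (base_size - 1)                       # centre of the (0, 0, 15, 15) window: 7.5
    ws = np.round(np.sqrt(base_size * base_size / ratios))   # widths per ratio: 23, 16, 11
    hs = np.round(ws * ratios)                                # heights per ratio: 12, 16, 22
    # Combine every ratio with every scale: (3, 1) * (1, 3) -> (3, 3) -> 9 values,
    # ordered ratio-major like the np.vstack loop in generate_anchors.
    ws = (ws[:, None] * scales[None, :]).ravel()
    hs = (hs[:, None] * scales[None, :]).ravel()
    return np.stack([ctr - 0.5 * (ws - 1), ctr - 0.5 * (hs - 1),
                     ctr + 0.5 * (ws - 1), ctr + 0.5 * (hs - 1)], axis=1)

anchors = make_anchors()
print(anchors)
```

For the first anchor (ratio 0.5, scale 8) the width is 23 × 8 = 184 and the height 12 × 8 = 96, so the corners are 7.5 ∓ 91.5 and 7.5 ∓ 47.5, that is (-84, -40, 99, 55). This is one less than the MATLAB values listed in the file's comments, which agrees with the off-by-one note in its docstring.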

The second important function is bbox_overlaps, which computes the IoU between every anchor and every ground-truth box. The code is as follows:

# -*- coding: utf-8 -*-  
""" 
Created on Sun Jan  1 20:25:19 2017 
 
@author: Kevin Liang (modification) 
 
Calculates bounding box overlaps between N bounding boxes, and K query boxes  
(anchors) and return a matrix of overlap proportions 
 
Written in Cython for optimization. 
"""  
# --------------------------------------------------------  
# Fast R-CNN  
# Copyright (c) 2015 Microsoft  
# Licensed under The MIT License [see LICENSE for details]  
# Written by Sergey Karayev  
# --------------------------------------------------------  
  
cimport cython  
import numpy as np  
cimport numpy as np  
  
DTYPE = np.float  
ctypedef np.float_t DTYPE_t  
  
def bbox_overlaps(  # IoU: area of the intersection of two boxes divided by the area of their union
        np.ndarray[DTYPE_t, ndim=2] boxes,  
        np.ndarray[DTYPE_t, ndim=2] query_boxes):  
    """ 
    Parameters 
    ---------- 
    boxes: (N, 4) ndarray of float 
    query_boxes: (K, 4) ndarray of float 
    Returns 
    ------- 
    overlaps: (N, K) ndarray of overlap between boxes and query_boxes 
    """  
    cdef unsigned int N = boxes.shape[0]  
    cdef unsigned int K = query_boxes.shape[0]  
    cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)  
    cdef DTYPE_t iw, ih, box_area  
    cdef DTYPE_t ua  
    cdef unsigned int k, n  
    for k in range(K):  
        box_area = (  
            (query_boxes[k, 2] - query_boxes[k, 0] + 1) *  
            (query_boxes[k, 3] - query_boxes[k, 1] + 1)  
        )  
        for n in range(N):  
            iw = (  
                min(boxes[n, 2], query_boxes[k, 2]) -  
                max(boxes[n, 0], query_boxes[k, 0]) + 1  
            )  
            if iw > 0:  
                ih = (  
                    min(boxes[n, 3], query_boxes[k, 3]) -  
                    max(boxes[n, 1], query_boxes[k, 1]) + 1  
                )  
                if ih > 0:  
                    ua = float(  
                        (boxes[n, 2] - boxes[n, 0] + 1) *  
                        (boxes[n, 3] - boxes[n, 1] + 1) +  
                        box_area - iw * ih  
                    )  
                    overlaps[n, k] = iw * ih / ua  
    return overlaps  
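
For readers who do not want to compile the Cython file, the same IoU matrix can be sketched in plain NumPy with broadcasting. bbox_overlaps_np is a hypothetical name; the "+ 1" pixel convention is copied from the loop above, and only the first four columns of each input are read.

```python
import numpy as np

def bbox_overlaps_np(boxes, query_boxes):
    """IoU between every box (N rows) and every query box (K rows); returns (N, K)."""
    boxes = np.asarray(boxes, dtype=np.float64)
    query_boxes = np.asarray(query_boxes, dtype=np.float64)
    area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    q_area = ((query_boxes[:, 2] - query_boxes[:, 0] + 1) *
              (query_boxes[:, 3] - query_boxes[:, 1] + 1))
    # Width and height of the intersection; non-positive values mean no overlap.
    iw = (np.minimum(boxes[:, None, 2], query_boxes[None, :, 2]) -
          np.maximum(boxes[:, None, 0], query_boxes[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], query_boxes[None, :, 3]) -
          np.maximum(boxes[:, None, 1], query_boxes[None, :, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    union = area[:, None] + q_area[None, :] - inter
    return inter / union

anchors = [[0, 0, 9, 9]]
gt = [[0, 0, 9, 9], [5, 5, 14, 14], [20, 20, 29, 29]]
iou = bbox_overlaps_np(anchors, gt)
print(iou)
```

The middle pair overlaps in a 5×5 patch, so its IoU is 25 / (100 + 100 - 25) = 1/7; the identical pair gives 1 and the disjoint pair 0.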

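The third important piece is bbox_transform, used when computing the anchors' offsets. Note that for this computation the anchors are converted to the centre-plus-width-and-height form. The code and comments are as follows: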

# -*- coding: utf-8 -*-  
""" 
Created on Sun Jan  1 21:18:58 2017 
 
@author: Kevin Liang (modifications) 
 
bbox_transform and its inverse operation 
"""  
  
# --------------------------------------------------------  
# Fast R-CNN  
# Copyright (c) 2015 Microsoft  
# Licensed under The MIT License [see LICENSE for details]  
# Written by Ross Girshick  
# --------------------------------------------------------  
  
import numpy as np  
  
def bbox_transform(ex_rois, gt_rois):  
    '''
    Receives two sets of bounding boxes, denoted by two opposite corners  
    (x1,y1,x2,y2), and returns the target deltas that Faster R-CNN should aim  
    for. 
    '''  
    ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0  
    ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0  
    ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths  
    ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights  # centre, width and height of every anchor
  
    gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0  
    gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0  
    gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths  
    gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights  # centre, width and height of the gt box matched to every anchor
  
    targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths  # the four offsets
    targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights  
    targets_dw = np.log(gt_widths / ex_widths)  
    targets_dh = np.log(gt_heights / ex_heights)  
  
    targets = np.vstack(  
        (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()  # four offsets per anchor, shape [num_anchors, 4]
    return targets  
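
A single worked pair shows what the four offsets mean. encode repeats the formulas above for one box; decode is my own exact algebraic inverse, written only to verify the round trip, and is not claimed to match the repository's inverse function.

```python
import numpy as np

def _center_form(box):
    """(x1, y1, x2, y2) -> width, height, centre x, centre y, as in bbox_transform."""
    w, h = box[2] - box[0] + 1.0, box[3] - box[1] + 1.0
    return w, h, box[0] + 0.5 * w, box[1] + 0.5 * h

def encode(anchor, gt):
    aw, ah, ax, ay = _center_form(anchor)
    gw, gh, gx, gy = _center_form(gt)
    return np.array([(gx - ax) / aw, (gy - ay) / ah, np.log(gw / aw), np.log(gh / ah)])

def decode(anchor, deltas):
    aw, ah, ax, ay = _center_form(anchor)
    cx, cy = deltas[0] * aw + ax, deltas[1] * ah + ay
    w, h = np.exp(deltas[2]) * aw, np.exp(deltas[3]) * ah
    # Undo the centre/size convention used by _center_form.
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w - 1.0, cy + 0.5 * h - 1.0])

anchor = np.array([0.0, 0.0, 15.0, 15.0])   # 16 x 16 anchor
gt = np.array([4.0, 2.0, 27.0, 17.0])       # 24 x 16 ground-truth box
deltas = encode(anchor, gt)
print(deltas)
print(decode(anchor, deltas))
```

Here the centre moves by (8, 2) pixels, which divided by the anchor size 16 gives dx = 0.5 and dy = 0.125; the width grows by a factor of 1.5, so dw = log 1.5, and the height is unchanged, so dh = 0.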

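That completes the walkthrough of anchor_target_layer. It is one of the most important functions in the RPN source, because it returns the class and the box correction of every anchor, ready for the loss computation. For completeness, here are the functions that compute the RPN losses, with comments: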

#!/usr/bin/env python3  
# -*- coding: utf-8 -*-  
""" 
Created on Tue Jan 17 15:05:05 2017 
 
@author: Kevin Liang 
 
Loss functions 
"""  
  
from .faster_rcnn_config import cfg  
  
import tensorflow as tf  
  
  
def rpn_cls_loss(rpn_cls_score,rpn_labels):  
    '''
    Calculate the Region Proposal Network classifier loss. Measures how well  
    the RPN is able to propose regions by the performance of its "objectness"  
    classifier. 
     
    Standard cross-entropy loss on logits 
    '''  
    with tf.variable_scope('rpn_cls_loss'):  
        # input shape dimensions  
        shape = tf.shape(rpn_cls_score)  
          
        # Stack all classification scores into 2D matrix  
        rpn_cls_score = tf.transpose(rpn_cls_score,[0,3,1,2])  
        rpn_cls_score = tf.reshape(rpn_cls_score,[shape[0],2,shape[3]//2*shape[1],shape[2]])  
        rpn_cls_score = tf.transpose(rpn_cls_score,[0,2,3,1])  
        rpn_cls_score = tf.reshape(rpn_cls_score,[-1,2])  
          
        # Stack labels  
        rpn_labels = tf.reshape(rpn_labels,[-1])  # flatten the labels into a 1-D vector

        # Ignore label=-1 (Neither object nor background: IoU between 0.3 and 0.7)
        # Drop the scores at positions where the label is -1 and reshape to [-1, 2] for the cross-entropy loss
        rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score,tf.where(tf.not_equal(rpn_labels,-1))),[-1,2])
        # Keep only the labels that are not -1, that is, the sampled foreground (1) and background (0) anchors
        rpn_labels = tf.reshape(tf.gather(rpn_labels,tf.where(tf.not_equal(rpn_labels,-1))),[-1])

        # Cross entropy error
        rpn_cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_labels))  
      
    return rpn_cross_entropy  
      
      
def rpn_bbox_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_inside_weights, rpn_outside_weights):  
    '''
    Calculate the Region Proposal Network bounding box loss. Measures how well  
    the RPN is able to propose regions by the performance of its localization. 
 
    lam/N_reg * sum_i(p_i^* * L_reg(t_i,t_i^*)) 
 
    lam: classification vs bbox loss balance parameter      
    N_reg: Number of anchor locations (~2500) 
    p_i^*: ground truth label for anchor (loss only for positive anchors) 
    L_reg: smoothL1 loss 
    t_i: Parameterized prediction of bounding box 
    t_i^*: Parameterized ground truth of closest bounding box 
    '''      
    with tf.variable_scope('rpn_bbox_loss'):  
        # Transposing  
        rpn_bbox_targets = tf.transpose(rpn_bbox_targets, [0,2,3,1])  
        rpn_inside_weights = tf.transpose(rpn_inside_weights, [0,2,3,1])  
        rpn_outside_weights = tf.transpose(rpn_outside_weights, [0,2,3,1])  
          
        # How far off was the prediction?
        # Subtract the targets from the predicted offsets and multiply by rpn_inside_weights,
        # so that only positive anchors contribute to the bbox loss
        diff = tf.multiply(rpn_inside_weights, rpn_bbox_pred - rpn_bbox_targets)
        # Smooth L1 of the weighted difference (smoothL1 is a helper not shown in this excerpt)
        diff_sL1 = smoothL1(diff, 3.0)

        # Only count loss for positive anchors. Make sure it's a sum.
        # Multiply by rpn_outside_weights (the per-anchor normalisation) and sum
        rpn_bbox_reg = tf.reduce_sum(tf.multiply(rpn_outside_weights, diff_sL1))

        # Constant for weighting bounding box loss with classification loss
        # Scale the box loss by the lambda parameter to obtain the final box loss
        rpn_bbox_reg = cfg.TRAIN.RPN_BBOX_LAMBDA * rpn_bbox_reg
      
    return rpn_bbox_reg  # the final box loss

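As the functions above show, rpn_cls_loss drops the entries whose label is -1. In other words, it keeps only the sampled anchors inside the image that were labelled foreground (highest IoU with a ground-truth box of at least 0.7, or the best match for some ground-truth box) or background (highest IoU below 0.3). In rpn_bbox_loss, the initial multiplication by rpn_inside_weights means that only foreground anchors contribute to the bbox loss, because rpn_inside_weights is 0 for every other anchor.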
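
Both losses can be reproduced on three toy anchors in NumPy. Two assumptions to flag: smooth_l1 below follows the sigma-parameterised definition from py-faster-rcnn, which I assume the smoothL1 helper matches, and rpn_losses is a hypothetical function that leaves out the lambda factor and all tensor reshaping.

```python
import numpy as np

def smooth_l1(x, sigma=3.0):
    """Quadratic for |x| below 1/sigma^2, linear beyond (py-faster-rcnn convention)."""
    sigma2 = sigma ** 2
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0 / sigma2, 0.5 * sigma2 * x ** 2, abs_x - 0.5 / sigma2)

def rpn_losses(scores, labels, pred, targets, inside_w, outside_w):
    keep = labels != -1                                    # drop ignored anchors
    logits = scores[keep]
    kept = labels[keep].astype(int)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerically stable softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    cls_loss = -log_prob[np.arange(len(kept)), kept].mean()
    bbox_loss = np.sum(outside_w * smooth_l1(inside_w * (pred - targets)))
    return cls_loss, bbox_loss

labels = np.array([1, -1, 0])            # foreground, ignored, background
scores = np.zeros((3, 2))                # columns: background logit, foreground logit
targets = np.zeros((3, 4))
pred = np.array([[0.05, 0.0, 1.0, 0.0],  # only this row should count for the box loss
                 [5.0, 5.0, 5.0, 5.0],
                 [5.0, 5.0, 5.0, 5.0]])
inside_w = np.zeros((3, 4))
inside_w[labels == 1] = 1.0
outside_w = np.zeros((3, 4))
outside_w[labels >= 0] = 1.0 / np.sum(labels >= 0)

cls_loss, bbox_loss = rpn_losses(scores, labels, pred, targets, inside_w, outside_w)
print(cls_loss, bbox_loss)
```

With all logits at zero the classification loss is log 2 over the two kept anchors. The large errors on the second and third rows vanish because their inside weights are 0, which is the masking effect described above.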

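That nearly concludes the RPN code of Faster R-CNN. I find two parts of it particularly clever:

1) How the H×W×9 anchors are generated: first generate 9 anchors with different aspect ratios and areas, then place these 9 anchors at every sliding-window position of the feature map.

2) How the class (foreground or background) and the box offsets of every anchor are computed. First the IoU between every anchor and every ground-truth box is computed. Anchors whose highest IoU lies between 0.3 and 0.7 are ignored, those below 0.3 are background, and those at 0.7 or above are foreground. The box offsets are the four values tx, ty, tw, th between the anchor and the ground-truth box it overlaps most.

After reading the whole RPN code, I am truly in awe of the programming skill of the Faster R-CNN authors. It also drove home how important reading source code is: theory has to be combined with code to gain a deeper understanding and make real progress.

Finally, let me stress again that following this post requires a solid understanding of the Faster R-CNN algorithm. My analysis may also contain oversights. If you find any, please let me know; I would be sincerely grateful.

You are welcome to read my future posts. Your support and encouragement are my greatest motivation!

written by jiong

Where the Way lies, I will go, though thousands stand in my path.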
