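As the saying goes, a craftsman who wants to do good work must first sharpen his tools. To really understand how Faster-RCNN works, you first need to get to know the RCNN family.

I. Introduction to the RCNN family of papers

To fully understand Faster-RCNN, I recommend reading the papers in the order 1 -> 2 -> 3.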

1、Rich feature hierarchies for accurate object detection and semantic segmentation

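My impression is that the paper is built around a comparison between CNN feature extraction and traditional feature extractors such as SIFT and HOG, which leads into the proposed RCNN network: a detector in which the CNN does the feature extraction.

The network structure in the paper is shown in the figure below. Roughly: first use the SS algorithm (selective search) to generate about 2000 proposals from the input image, then run a CNN on every box to extract features, then train SVM classifiers and a bounding-box regressor on those features. Each proposal ends up with a score, and NMS (non-maximum suppression) picks the final boxes. If you are curious about SS you can look it up yourself, but the algorithm is rather dated and I don't think it is worth digging into. NMS is explained in detail later, in the Faster-RCNN implementation part.

RCNN_Paper download link: https://pan.baidu.com/s/13WVWSzL6tYNWpFDnUHNRHw
Extraction code: rz9e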

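2. Fast R-CNN

What a great name for a paper: practically a single word, and it packs a punch! My impression is that the paper opens by criticizing other object detection models of the time, SPPnet for example, for being too slow, and then presents its own design, Fast-RCNN, which needs only 0.3 s per image (not counting proposal generation) and reaches a very high mAP on the VOC datasets.

The network model in the paper is shown in the figure below. The candidate boxes (proposals) are still generated with the SS algorithm used by RCNN, but the whole image now goes through the convolutional layers only once, and the RoI pooling layer in the figure downsamples the region of each RoI on that shared feature map to a fixed size. The big improvement over RCNN is then obvious: the convolutional computation is shared by all proposals instead of being repeated for each of the roughly 2000 boxes, which greatly improves efficiency. The pooled features go through fully connected layers into two sibling outputs (classification and box regression), followed by NMS. The paper also mentions that the fully connected layers can be accelerated with truncated SVD (singular value decomposition). The details of the layers after RoI pooling, including the loss, are covered later in the Faster-RCNN implementation part.

Fast-RCNN_Paper download link: https://pan.baidu.com/s/1v0wp3KYytwkh3uFUX_qkJA
Extraction code: w4q4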

3、Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

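My impression is that the paper starts by laying into RCNN: generating k candidate boxes with SS and then extracting features with conv layers wastes a lot of time. It then turns to Fast-RCNN, first praising it for being nearly real-time, but (!) only "when ignoring the time spent on region proposals", which is a dig at how it generates candidate boxes. The authors point out that the GPU can be used to cut the time spent on proposals, and so they design the RPN network to replace the SS algorithm that Fast-RCNN uses for candidate boxes.

The network model in the paper is shown in the figure below. A pretrained deep convolutional network (vgg family, resnet family) extracts a feature map from the input image, the rpn network generates proposals on it, and after NMS the RoI pooling layer resizes each proposal to a fixed size before the fully connected layers.

Faster-RCNN_Paper download link: https://pan.baidu.com/s/1rRAdtWNWgbdnmtMaHXfrlA
Extraction code: md67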

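II. Faster-RCNN in detail

Keep this figure firmly in mind! My implementation may differ slightly from the gray dashed box, but that does not affect understanding the overall structure.

1. Feature extraction network

The code uses a pretrained vgg16 model. The pretrained weights can be downloaded directly with models.vgg16(pretrained=True); recent torchvision releases deprecate pretrained=True in favor of the weights argument.

The decom_VGG16 function serves as the feature extractor. Its argument is the local path of the pretrained weights.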

	from torchvision import models
	from torch import nn
	import torch
	
	
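	def decom_VGG16(path):
	    model = load_pretrained_vgg16(path)
	    print(model)
	    # Use the first 30 layers of vgg16.features (everything except the last max-pool) for feature extraction
	    features = list(model.features)[:30]
	    features = nn.Sequential(*features)
	
	    # Take the classifier (fully connected) layers of vgg16
	    classifier = list(model.classifier)
	    # Drop the final 1000-way Linear layer (index 6) and the two Dropout layers (indices 5 and 2);
	    # deleting from the highest index down keeps the remaining indices valid
	    del classifier[6]
	    del classifier[5]
	    del classifier[2]
	    classifier = nn.Sequential(*classifier)
	
	    # Freeze the first 10 layers (the first four conv layers): their parameters are not updated
	    for layer in features[:10]:
	        for p in layer.parameters():
	            p.requires_grad = False
	    return features, classifier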
	
	
	def load_pretrained_vgg16(path):
	    vgg16 = models.vgg16()
	    vgg16.load_state_dict(torch.load(path))
	    return vgg16
	    # return models.vgg16(pretrained=True)
	
	
	if __name__ == '__main__':
	    path = '../vgg16-397923af.pth'
	    # model = torch.load(path)
	    # vgg16_model = models.vgg16().load_state_dict(model)
	    vgg16_model = load_pretrained_vgg16(path)
	    print(vgg16_model)
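The three del statements in decom_VGG16 only work because they run from the highest index down. A minimal sketch of why, using plain strings as stand-ins for the layers (the names mirror the layout that print(model) shows for torchvision's VGG16 classifier; the strings themselves are illustrative only):

```python
# Stand-ins for the seven entries of vgg16.classifier (indices 0..6).
classifier = ["Linear(25088, 4096)", "ReLU", "Dropout",
              "Linear(4096, 4096)", "ReLU", "Dropout",
              "Linear(4096, 1000)"]

# Delete from the highest index down so an earlier deletion never shifts a later target.
for index in (6, 5, 2):
    del classifier[index]

print(classifier)
```

What remains is the two 4096-wide fully connected layers with their ReLUs, which is what the Fast-RCNN head later receives as classifier.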

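2. RPN network

Overall idea: in my code, every pixel of the feature map produced by the pretrained feature extractor from part 1 generates 9 anchors (prior boxes that may become regions of interest, i.e. rois). For the vgg16 feature map used here that gives 38 x 38 x 9 = 12996 anchors. The feature map then goes through a 3x3 convolution, and the output of that convolution is fed to two separate 1x1 convolutions (not one after the other: they are independent branches, one for classification and one for regression, and their outputs feed one of the losses computed later). Next, the regression output rpn_locs is used to fine-tune the anchors into rois, and NMS (non-maximum suppression), based on the iou between rois, reduces the number of regions of interest.

① Generating anchors

This is the code that generates 9 anchors for every pixel of the feature map. generate_base_anchors computes the coordinates of the 9 anchors for a single pixel and returns them. center_x and center_y are the pixel's offsets; they exist to support the straightforward approach in enumerate_shifted_anchor, which calls generate_base_anchors once per pixel. The code that is not commented out is an expert's implementation: pure array operations that rely on NumPy broadcasting to get the result.

Generating the 9 anchors is simple: it is every combination of 3 ratios and 3 scales. I did trip up here, though. I assumed that scales directly rescaled the sides of each box, so other people's code made no sense to me, and then the next morning it suddenly clicked: the square of scales is the ratio of areas. Every anchor at a given scale has area (base_size * scale)^2, and the ratio only redistributes that area between width and height. Once you see that, the code below is easy. Later I went back to the paper and found that the authors do explain this; I just had not read carefully enough...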
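To make the "scale squared is the area ratio" point concrete, here is a small worked sketch. It uses the same arithmetic as generate_base_anchors; the helper name anchor_shape is made up for this illustration.

```python
import math

base_size = 16
ratios = (0.5, 1, 2)   # height / width
scales = (8, 16, 32)   # square roots of the area scales


def anchor_shape(scale, ratio, base=base_size):
    """Width and height of one anchor: the area is fixed by the scale, the shape by the ratio."""
    area = (base * scale) ** 2
    width = math.sqrt(area / ratio)
    height = width * ratio
    return width, height


for scale in scales:
    for ratio in ratios:
        w, h = anchor_shape(scale, ratio)
        print(f"scale={scale:>2} ratio={ratio:<3} -> w={w:7.1f} h={h:7.1f} area={w * h:9.0f}")
```

Within each scale the printed area stays the same while width and height trade off against each other.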

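Note! Whether it is anchors, proposals, or the rois later on, each box is stored as the coordinates of its top-left and bottom-right corners (in the order top-left x, top-left y, bottom-right x, bottom-right y). Also, in computer vision image coordinates, x increases from left to right and y increases from top to bottom.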

	import numpy as np
	
	
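	def generate_base_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32), center_x=0, center_y=0):
	    """
	    function description: generate the k template anchors centered at (center_x, center_y), (0, 0) by default
	
	    :param base_size: size in the original image that one feature-map pixel corresponds to (the feature stride)
	    :param ratios: height / width ratios
	    :param scales: square roots of the area scales
	    :param center_x: x offset of the anchor center
	    :param center_y: y offset of the anchor center
	    :return:
	        base_anchor: array of shape [len(ratios) * len(scales), 4], rows are (xmin, ymin, xmax, ymax)
	    """
	    base_anchor = np.zeros((len(ratios) * len(scales), 4), dtype=np.float32)
	
	    # Core idea: for each scale the area stays fixed while the ratio changes the shape,
	    # so a single pixel yields 9 anchors
	    for i in range(len(scales)):
	        for j in range(len(ratios)):
	            index = i * len(ratios) + j
	            area = (base_size * scales[i]) ** 2
	            width = np.sqrt(area * 1.0 / ratios[j])
	            height = width * ratios[j]
	
	            # Only the top-left and bottom-right corner coordinates need to be stored
	            base_anchor[index, 0] = -width / 2. + center_x
	            base_anchor[index, 1] = -height / 2. + center_y
	            base_anchor[index, 2] = width / 2. + center_x
	            base_anchor[index, 3] = height / 2. + center_y
	
	    return base_anchor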
	
	
	def enumerate_shifted_anchor(base_anchor, base_size, width, height):
	    """
	    function description: slide the template anchors over the whole feature map instead of
	                          recomputing generate_base_anchors for every pixel
	
	    :param base_anchor: template anchors, shape: [k, 4]
	    :param base_size: size in the original image that one feature-map pixel corresponds to (the feature stride)
	    :param height: height of the feature map
	    :param width: width of the feature map
	    :return:
	        anchor: all prior boxes (anchors), shape: [width*height*k, 4]
	    """
	    # Offset in the original image of every feature-map pixel
	    shift_x = np.arange(0, width * base_size, base_size)
	    shift_y = np.arange(0, height * base_size, base_size)
	    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
	    print('shift_x: ', shift_x.shape, 'shift_y: ', shift_y.shape)
	
	    # TODO the most orthodox approach is probably still to loop over the center points
	    # anchor_list = []
	    # for x, y in zip(shift_x.ravel(), shift_y.ravel()):
	    #     anchor_list.append(generate_base_anchors(center_x=x, center_y=y))
	    # anchors = np.concatenate(anchor_list, axis=0)
	
	    # TODO broadcasting directly achieves the same thing
	    # shift_x.ravel() flattens the grid to 1-D, so shift has shape: [width*height, 4]
	    shift = np.stack((shift_x.ravel(), shift_y.ravel(), shift_x.ravel(), shift_y.ravel(),), axis=1)
	    A = base_anchor.shape[0]
	    K = shift.shape[0]
	    anchor = base_anchor.reshape((1, A, 4)) + shift.reshape((K, 1, 4))
	
	    # Merge everything into the full set of prior boxes: k (9) anchors for every feature-map pixel
	    anchors = anchor.reshape((K * A, 4)).astype(np.float32)
	    print('result: ', anchors.shape)
	    return anchors
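The broadcasting step is easier to follow on a toy case. The sketch below uses two made-up template boxes and three made-up offsets (not the real anchor values) and applies the same reshape-and-add pattern as enumerate_shifted_anchor.

```python
import numpy as np

base = np.array([[-8, -8, 8, 8],
                 [-16, -4, 16, 4]], dtype=np.float32)   # A = 2 template boxes
shift = np.array([[0, 0, 0, 0],
                  [16, 0, 16, 0],
                  [0, 16, 0, 16]], dtype=np.float32)     # K = 3 cell offsets

# (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4): every offset is added to every template.
anchors = (base.reshape(1, 2, 4) + shift.reshape(3, 1, 4)).reshape(-1, 4)

print(anchors.shape)   # (6, 4)
print(anchors[2])      # first template shifted by (16, 0)
```

Row k * A + a of the result is template a moved to cell k, so all anchors of one cell sit next to each other.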

I also compared how long it takes to build the anchors with direct array operations versus nested for loops.

Test code:

	if __name__ == '__main__':
	    import time
	    import matplotlib.pyplot as plt
	
	    start = time.time()
	    nine_anchors = generate_base_anchors()
	
	    height, width, base_size = 38, 38, 16
	    all_anchors = enumerate_shifted_anchor(nine_anchors, base_size, width, height)
	
	    fig = plt.figure()
	    ax = fig.add_subplot(111)
	    # With x and y limits of roughly [-10, 600], every feature-map pixel position can be drawn
	    plt.ylim(-10, 600)
	    plt.xlim(-10, 600)
	    shift_x = np.arange(0, width * base_size, base_size)
	    shift_y = np.arange(0, height * base_size, base_size)
	    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
	    plt.scatter(shift_x, shift_y)
	
	    box_widths = all_anchors[:, 2] - all_anchors[:, 0]
	    box_heights = all_anchors[:, 3] - all_anchors[:, 1]
	    print(all_anchors.shape)
	
	    for i in range(all_anchors.shape[0]):
	        rect = plt.Rectangle([all_anchors[i, 0], all_anchors[i, 1]], box_widths[i],
	                             box_heights[i], color="r", fill=False)
	        ax.add_patch(rect)
	    end = time.time()
	    print('all consumes {0} seconds'.format(end - start))
	    plt.show()

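Time with for loops:

Time with direct array operations:

Both produce the result in the figure below. The difference is actually fairly small, and the brute-force for loops are easier to understand.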
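For anyone who wants to repeat the comparison without the plotting overhead, here is a self-contained sketch. The helper names (base_anchors, shifts, anchors_loop, anchors_broadcast) are invented for this example, and the loop version iterates over cells rather than reproducing the commented-out code exactly. It only checks that both routes give the same 12996 boxes and prints the elapsed time, which will vary by machine.

```python
import time
import numpy as np


def base_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    boxes = []
    for s in scales:
        for r in ratios:
            w = np.sqrt((base_size * s) ** 2 / r)
            h = w * r
            boxes.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(boxes, dtype=np.float32)


def shifts(width, height, stride):
    sx, sy = np.meshgrid(np.arange(0, width * stride, stride),
                         np.arange(0, height * stride, stride))
    return np.stack((sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()), axis=1)


def anchors_loop(base, shift):
    # One Python-level iteration per feature-map cell.
    return np.concatenate([base + s for s in shift], axis=0).astype(np.float32)


def anchors_broadcast(base, shift):
    # A single broadcasted addition over all cells.
    return (base[None, :, :] + shift[:, None, :]).reshape(-1, 4).astype(np.float32)


base, shift = base_anchors(), shifts(38, 38, 16)
for fn in (anchors_loop, anchors_broadcast):
    start = time.perf_counter()
    result = fn(base, shift)
    print(fn.__name__, result.shape, f"{time.perf_counter() - start:.6f}s")
```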

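② Fine-tuning the anchors with rpn_locs

The anchors are fine-tuned with the bounding-box regression values predicted by the RPN network. This is a straight implementation of the formulas in the paper.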

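	import numpy as np
	
	
	def loc2box(anchors, locs):
	    """
	    function description: adjust all anchors with the locs values predicted by the rpn
	
	    :param anchors: prior boxes, rows are (xmin, ymin, xmax, ymax)
	    :param locs: locs predicted by the rpn, rows are (tx, ty, tw, th)
	    :return:
	        roi: regions of interest, rows are (xmin, ymin, xmax, ymax)
	    """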
	    anchors_width = anchors[:, 2] - anchors[:, 0]
	    anchors_height = anchors[:, 3] - anchors[:, 1]
	    anchors_center_x = anchors[:, 0] + 0.5 * anchors_width
	    anchors_center_y = anchors[:, 1] + 0.5 * anchors_height
	
	    tx = locs[:, 0]
	    ty = locs[:, 1]
	    tw = locs[:, 2]
	    th = locs[:, 3]
	
	    center_x = tx * anchors_width + anchors_center_x
	    center_y = ty * anchors_height + anchors_center_y
	    width = np.exp(tw) * anchors_width
	    height = np.exp(th) * anchors_height
	
	    # Convert from (center, width, height) back to corner coordinates
	    roi = np.zeros(locs.shape, dtype=locs.dtype)
	    roi[:, 0] = center_x - 0.5 * width  # xmin
	    roi[:, 2] = center_x + 0.5 * width  # xmax
	    roi[:, 1] = center_y - 0.5 * height  # ymin
	    roi[:, 3] = center_y + 0.5 * height  # ymax
	    return roi
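The decoding that loc2box performs is x = tx * wa + xa, y = ty * ha + ya, w = exp(tw) * wa, h = exp(th) * ha, where (xa, ya, wa, ha) are the anchor's center and size. A single-box sketch in plain Python (the function name decode is made up for this example) shows what the offsets mean:

```python
import math


def decode(anchor, loc):
    """anchor = (xmin, ymin, xmax, ymax); loc = (tx, ty, tw, th)."""
    xmin, ymin, xmax, ymax = anchor
    tx, ty, tw, th = loc
    wa, ha = xmax - xmin, ymax - ymin
    xa, ya = xmin + 0.5 * wa, ymin + 0.5 * ha
    x, y = tx * wa + xa, ty * ha + ya
    w, h = math.exp(tw) * wa, math.exp(th) * ha
    return (x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h)


anchor = (0.0, 0.0, 100.0, 50.0)
# Zero offsets leave the anchor unchanged.
print(decode(anchor, (0.0, 0.0, 0.0, 0.0)))
# tx = 0.1 moves the center right by 10% of the width; tw = log(2) doubles the width.
print(decode(anchor, (0.1, 0.0, math.log(2), 0.0)))
```

Translations are expressed relative to the anchor size and scalings in log space, so the same offset means the same relative change for a small anchor and a large one.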

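③ NMS (non-maximum suppression)

NMS puts all rois in an array. Each round it picks the roi with the highest score and adds its index to the result, computes the iou (intersection / union) between it and every other roi, and removes from the array the rois whose iou exceeds the threshold. This repeats until the array is empty.

Computing the iou is even simpler: take the maximum of the two top-left corners and the minimum of the two bottom-right corners, then check whether the resulting inner rectangle actually exists. Here too the conventional loop version is commented out and the expert's array version is used.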

	import numpy as np
	
	
	def calculate_iou(valid_anchors, boxes):
	    """
	    function description: compute the IoU (intersection / union) between two sets of boxes
	
	    :param valid_anchors: prior boxes (anchors) inside the image, shape: [valid_anchors_num, 4]
	    :param boxes: ground-truth boxes in the image, shape: [boxes_num, 4]
	    :return:
	        ious: 2-D array with the iou of every anchor and every box, shape: [valid_anchors_num, boxes_num]
	    """
	    # if valid_anchors.shape[1] != 4 or boxes.shape[1] != 4:
	    #     raise IndexError
	
	    # boxes = boxes.detach().cpu().numpy()
	    # TODO conventional approach: max of the two top-left corners, min of the two bottom-right corners,
	    #      then check whether the inner rectangle exists
	    # ious = np.zeros((len(valid_anchors), len(boxes)), dtype=np.float32)
	    # naming: 1 is the top-left corner, 2 is the bottom-right corner
	    # for i, point_i in enumerate(valid_anchors):
	    #     xa1, ya1, xa2, ya2 = point_i
	    #     anchor_area = (ya2 - ya1) * (xa2 - xa1)
	    #     for j, point_j in enumerate(boxes):
	    #         xb1, yb1, xb2, yb2 = point_j
	    #         box_area = (yb2 - yb1) * (xb2 - xb1)
	    #
	    #         inter_x1 = max(xa1, xb1)
	    #         inter_y1 = max(ya1, yb1)
	    #         inter_x2 = min(xa2, xb2)
	    #         inter_y2 = min(ya2, yb2)
	    #         if inter_x1 < inter_x2 and inter_y1 < inter_y2:
	    #             overlap_area = (inter_x2 - inter_x1) * (inter_y2 - inter_y1)
	    #             iou = (overlap_area) * 1.0 / (anchor_area + box_area - overlap_area)
	    #         else:
	    #             iou = 0.
	    #         ious[i][j] = iou
	
	    # TODO direct array operations
	    # Top-left corner of every pairwise intersection, shape: [valid_anchors_num, boxes_num, 2]
	    tl = np.maximum(valid_anchors[:, None, :2], boxes[:, :2])
	    # Bottom-right corner of every pairwise intersection, shape: [valid_anchors_num, boxes_num, 2]
	    br = np.minimum(valid_anchors[:, None, 2:], boxes[:, 2:])
	
	    # Intersection area (zero where the boxes do not overlap), shape: [valid_anchors_num, boxes_num]
	    area_overlap = np.prod(br - tl, axis=2) * (tl < br).all(axis=2)
	    # Area of every anchor, shape: [valid_anchors_num]
	    area_1 = np.prod(valid_anchors[:, 2:] - valid_anchors[:, :2], axis=1)
	    # Area of every box, shape: [boxes_num]
	    area_2 = np.prod(boxes[:, 2:] - boxes[:, :2], axis=1)
	    # area_1[:, None] adds an axis, giving shape [valid_anchors_num, 1]
	    ious = area_overlap / (area_1[:, None] + area_2 - area_overlap)
	    # After broadcasting the result has shape [valid_anchors_num, boxes_num]
	    return ious
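As a sanity check on the corner logic, here is a tiny pure-Python version for a single pair of boxes. It is an illustrative sketch and not part of the original code; the function name iou is made up.

```python
def iou(box_a, box_b):
    """Boxes are (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection: larger of the top-left corners, smaller of the bottom-right corners.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))    # overlap 5 x 5 = 25, union 100 + 100 - 25 = 175
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes
```

Clamping the intersection width and height at zero plays the same role as the (tl < br).all(axis=2) mask in the array version.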

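Now for the NMS algorithm itself. I did think about factoring out the shared computation, since the code below overlaps with the iou code above, but decided not to bother. I'm just a code monkey doing copy-paste-driven and search-engine-driven programming.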

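	import numpy as np
	
	
	def non_maximum_suppression(roi, thresh):
	    """
	    function description: non-maximum suppression. Each round takes the roi with the highest score, computes
	                          its iou with every other roi, and drops the rois whose iou exceeds the threshold;
	                          this repeats until no rois are left
	
	    :param roi: regions of interest, already sorted by score in descending order
	    :param thresh: iou threshold
	    :return:
	        roi_after_nms: the rois that survive NMS
	    """
	    # Top-left corner coordinates
	    xmin = roi[:, 0]
	    ymin = roi[:, 1]
	    # Bottom-right corner coordinates
	    xmax = roi[:, 2]
	    ymax = roi[:, 3]
	
	    areas = (xmax - xmin) * (ymax - ymin)
	    keep = []
	    order = np.arange(roi.shape[0])
	    while order.size > 0:
	        i = order[0]
	        keep.append(i)
	        # TODO slightly redundant with the iou computation
	        xx1 = np.maximum(xmin[i], xmin[order[1:]])
	        yy1 = np.maximum(ymin[i], ymin[order[1:]])
	        xx2 = np.minimum(xmax[i], xmax[order[1:]])
	        yy2 = np.minimum(ymax[i], ymax[order[1:]])
	
	        width = np.maximum(0.0, xx2 - xx1)
	        height = np.maximum(0.0, yy2 - yy1)
	        inter = width * height
	        # Compute the iou
	        iou = inter / (areas[i] + areas[order[1:]] - inter)
	
	        idx = np.where(iou <= thresh)[0]  # keep only rois whose iou with the top-scoring roi is within the threshold
	        order = order[1 + idx]  # +1 skips the top-scoring roi, which has already been kept
	    roi_after_nms = roi[keep]
	    return roi_after_nms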
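A three-box toy run makes the suppression step visible. This is a simplified pure-Python sketch rather than the function above: it sorts by score itself and returns the kept indices, and the names iou and nms are made up for the example.

```python
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms(boxes, scores, thresh):
    # Indices sorted by score, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, thresh=0.5))
```

Boxes 0 and 1 overlap with an iou of 81 / 119, about 0.68, so at a threshold of 0.5 the lower-scoring box 1 is suppressed while the distant box 2 survives.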

④ The complete RPN layer

	from torch import nn
	import torch
	import torch.nn.functional as F
	from nets.anchors_creator import generate_base_anchors, enumerate_shifted_anchor
	from nets.proposal_creator import ProposalCreator
	from utils.util import normal_init
	from configs.config import in_channels, mid_channels, feature_stride, anchors_scales, anchors_ratios
	
	
	class RPN(nn.Module):
	    def __init__(self):
	        super(RPN, self).__init__()
	
	        self.in_channels = in_channels  # channels of the feature map from the pretrained feature extractor
	        self.mid_channels = mid_channels  # output channels of the rpn's first 3 x 3 conv layer
	        self.feature_stride = feature_stride  # stride of the feature map relative to the input image (downsampling factor)
	        self.anchor_scales = anchors_scales  # square roots of the anchor area scales
	        self.anchor_ratios = anchors_ratios  # height / width ratios of the anchors
	
	        # The rpn itself is passed in so the layer can read its mode: 2000 rois when training, 300 when testing
	        self.proposal_layer = ProposalCreator(parent_model=self)
	
	        self.base_anchors = generate_base_anchors(scales=self.anchor_scales, ratios=self.anchor_ratios)
	
	        # The RPN conv layer takes the feature map and outputs a feature map with mid_channels (512) channels
	        # TODO removed the earlier bias=True argument
	        self.RPN_conv = nn.Conv2d(in_channels=in_channels, out_channels=self.mid_channels, kernel_size=3, stride=1,
	                                  padding=1)
	
	        anchors_num = self.base_anchors.shape[0]
	        # 2 x k(9) scores: classify whether each anchor at each grid point contains an object (1 means it does);
	        # this is a 1 x 1 conv, so only the channel dimension changes
	        self.RPN_cls_layer = nn.Conv2d(in_channels=self.mid_channels, out_channels=anchors_num * 2, kernel_size=1,
	                                       stride=1,
	                                       padding=0)
	
	        # 4 x k(9) coordinates: regress the adjustment of each anchor at each grid point;
	        # this is a 1 x 1 conv, so only the channel dimension changes
	        self.RPN_reg_layer = nn.Conv2d(in_channels=self.mid_channels, out_channels=anchors_num * 4, kernel_size=1,
	                                       stride=1,
	                                       padding=0)
	
	        # As in the paper, initialize the new layers from a zero-mean Gaussian with standard deviation 0.01
	        normal_init(self.RPN_conv, mean=0, stddev=0.01)
	        normal_init(self.RPN_cls_layer, mean=0, stddev=0.01)
	        normal_init(self.RPN_reg_layer, mean=0, stddev=0.01)
	
	    def forward(self, base_feature_map, img_size):
	        """
	        function description: forward pass of the rpn network
	
	        :param base_feature_map: output of the pretrained feature extractor, shape: [batch_size, 512, 38, 38]
	        :param img_size: size of the original image, used to clip the adjusted anchors when turning them into rois
	        :return:
	            rpn_locs: rpn regression output, the predicted adjustment of every anchor, shape: [n, h*w*k, 4]
	            rpn_scores: rpn classification output, whether each anchor contains an object, shape: [n, h*w*k, 2]
	            anchors: the k prior boxes generated for every feature-map pixel, shape: [h*w*k, 4]
	            rois: the rois obtained by adjusting the anchors with the rpn locs and then applying NMS
	        """
	        # PyTorch feature maps are laid out as [n, channels, height, width]
	        n, _, height, width = base_feature_map.shape
	
	        # Compute the shifted anchors during the forward pass
	        anchors = enumerate_shifted_anchor(self.base_anchors, base_size=self.feature_stride,
	                                           width=width, height=height)
	
	        h = F.relu(self.RPN_conv(base_feature_map), inplace=True)  # inplace=True works in place and saves memory
	
	        # Regression branch: the four values in the last dimension are the offsets (tx, ty, tw, th) used by loc2box
	        rpn_locs = self.RPN_reg_layer(h)
	        # [n, 4*k, h, w] -> [n, h, w, 4*k] -> [n, h*w*k, 4]
	        rpn_locs = rpn_locs.permute(0, 2, 3, 1).contiguous().view(n, -1, 4)
	
	        # Classification branch: in the last dimension index 1 is the object score, index 0 the background score
	        rpn_scores = self.RPN_cls_layer(h)
	        # [n, 2*k, h, w] -> [n, h, w, 2*k] -> [n, h*w*k, 2]
	        rpn_scores = rpn_scores.permute(0, 2, 3, 1).contiguous().view(n, -1, 2)
	
	        print('rpn_locs: ', rpn_locs.shape)
	        print('rpn_scores: ', rpn_scores.shape)
	
	        # Adjust and clip the anchors with the rpn regression output to get rois, which are handed to the Fast-RCNN part
	        rois = self.proposal_layer(rpn_locs[0].detach().cpu().numpy(),
	                                   rpn_scores[0].detach().cpu().numpy(),
	                                   anchors,
	                                   img_size)
	
	        return rpn_locs, rpn_scores, anchors, rois
	
	    @staticmethod
	    def reshape(x, width):
	        # input_size = x.size()
	        # x = x.view(input_size[0], int(d), int(float(input_size[1] * input_size[2]) / float(d)), input_size[3])
	        height = float(x.size(1) * x.size(2)) / width
	        x = x.view(x.size(0), int(width), int(height), x.size(3))
	        return x
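The permute-then-view step in forward only works if the flattened rows line up with the anchor order (cell by cell, k anchors per cell). The sketch below checks that on a fake output, with NumPy's transpose and reshape standing in for permute(0, 2, 3, 1).contiguous().view(n, -1, 4). The tensor contents are invented so that each value reveals which cell and channel it came from.

```python
import numpy as np

n, k, h, w = 1, 2, 2, 3   # batch, anchors per cell, feature-map height and width

# Fake regression output in channel-first layout (n, 4*k, h, w):
# value = cell_index * 100 + channel_index.
cell = np.arange(h * w).reshape(1, 1, h, w) * 100
channel = np.arange(4 * k).reshape(1, 4 * k, 1, 1)
rpn_locs = cell + channel

# Move channels last, then cut the 4*k channels of each cell into k rows of 4.
flat = rpn_locs.transpose(0, 2, 3, 1).reshape(n, -1, 4)

print(flat.shape)    # (1, 12, 4)
print(flat[0, :3])   # cell 0 / anchor 0, cell 0 / anchor 1, cell 1 / anchor 0
```

Row cell * k + anchor holds that anchor's four offsets, the same cell-major order that enumerate_shifted_anchor produces.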

⑤ The ProposalCreator class

ProposalCreator wraps the anchors -> rois conversion and the NMS algorithm.

	import numpy as np
	from utils.util import loc2box, non_maximum_suppression


	class ProposalCreator:
	    def __init__(self,
	                 parent_model,
	                 nms_thresh=0.7,
	                 n_train_pre_nms=12000,
	                 n_train_post_nms=2000,
	                 n_test_pre_nms=6000,
	                 n_test_post_nms=300,
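	                 min_size=16):
	        """
	        :param parent_model: the owning model; its training flag tells training from testing
	        :param nms_thresh: iou threshold for non-maximum suppression
	        :param n_train_pre_nms: number of boxes kept before NMS when training
	        :param n_train_post_nms: number of boxes kept after NMS when training
	        :param n_test_pre_nms: number of boxes kept before NMS when testing
	        :param n_test_post_nms: number of boxes kept after NMS when testing
	        :param min_size: minimum width and height of a roi, so that it does not shrink to nothing in the RoI pooling layer
	        """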
	        self.parent_model = parent_model
	        self.nms_thresh = nms_thresh
	        self.n_train_pre_nms = n_train_pre_nms
	        self.n_train_post_nms = n_train_post_nms
	        self.n_test_pre_nms = n_test_pre_nms
	        self.n_test_post_nms = n_test_post_nms
	        self.min_size = min_size
	
	    def __call__(self, locs, scores, anchors, img_size):
	        """
	        function description: adjust the anchors with the locs predicted by the rpn, apply NMS,
	                              and return a fixed number of rois
	
	        :param locs: output of one of the rpn's 1x1 convs, shape: [w*h*k, 4]
	        :param scores: output of the rpn's other 1x1 conv, shape: [w*h*k, 2]
	        :param anchors: prior boxes
	        :param img_size: size of the image fed into the whole Faster-RCNN network
	        :return:
	            roi_after_nms: the rois obtained by adjusting the anchors with the rpn locs and then applying NMS
	        """
	        if self.parent_model.training:
	            n_pre_nms = self.n_train_pre_nms
	            n_post_nms = self.n_train_post_nms
	        else:
	            n_pre_nms = self.n_test_pre_nms
	            n_post_nms = self.n_test_post_nms
	
	        # Fine-tune the prior boxes with rpn_locs, i.e. turn the anchors into rois
	        roi = loc2box(anchors, locs)
	
	        # Keep the proposals (rois) inside the image borders
	        roi[:, [0, 2]] = np.clip(roi[:, [0, 2]], 0, img_size[0])  # clip along x (img_size[0] is used as the x limit)
	        roi[:, [1, 3]] = np.clip(roi[:, [1, 3]], 0, img_size[1])  # clip along y (img_size[1] is used as the y limit)
	
	        # Drop rois whose width or height is below min_size, so they do not shrink to nothing in RoI pooling
	        min_size = self.min_size
	        roi_width = roi[:, 2] - roi[:, 0]
	        roi_height = roi[:, 3] - roi[:, 1]
	        keep = np.where((roi_width >= min_size) & (roi_height >= min_size))[0]  # row indices that satisfy the condition
	        roi = roi[keep, :]
	
	        scores = scores[:, 1]
	        scores = scores[keep]
	        # argsort() returns indices in ascending order; the [::-1] slice reverses them
	        order = scores.argsort()[::-1]  # indices of the rois sorted by rpn score, highest first
	        # Keep the n_pre_nms highest-scoring rois
	        order = order[: n_pre_nms]
	        roi = roi[order, :]
	
	        # Non-maximum suppression (non_maximum_suppression returns only the surviving rois)
	        roi_after_nms = non_maximum_suppression(roi, thresh=self.nms_thresh)
	        # After NMS keep the n_post_nms highest-scoring rois
	        roi_after_nms = roi_after_nms[:n_post_nms]
	
	        return roi_after_nms
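The pre-NMS bookkeeping (clip, size filter, top-k by score) can be traced on three hand-made boxes. The numbers below are invented for illustration, and the image is assumed to be 600 wide and 400 high.

```python
import numpy as np

roi = np.array([[-5.0, 10.0, 40.0, 60.0],      # sticks out on the left
                [100.0, 100.0, 108.0, 150.0],   # only 8 px wide
                [50.0, 50.0, 700.0, 300.0]])    # sticks out on the right
scores = np.array([0.6, 0.9, 0.8])
img_w, img_h, min_size, n_pre_nms = 600, 400, 16, 2

# 1. Clip every box to the image.
roi[:, [0, 2]] = np.clip(roi[:, [0, 2]], 0, img_w)
roi[:, [1, 3]] = np.clip(roi[:, [1, 3]], 0, img_h)

# 2. Drop boxes that are narrower or shorter than min_size.
keep = np.where((roi[:, 2] - roi[:, 0] >= min_size) & (roi[:, 3] - roi[:, 1] >= min_size))[0]
roi, scores = roi[keep], scores[keep]

# 3. Sort by score, highest first, and keep the top n_pre_nms.
order = scores.argsort()[::-1][:n_pre_nms]
roi = roi[order]
print(roi)
```

The highest-scoring box is discarded because it is too narrow, which is why the size filter has to run before the scores are ranked.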

3. Fast-RCNN network

In my own implementation, this network contains only the RoI pooling layer and the two fully connected layers.

① The complete Fast-RCNN code

	from torch import nn
	from nets.roi_pooling_2d import RoIPooling2D
	from nets.vgg16 import decom_VGG16
	from utils.util import normal_init
	
	
	class FastRCNN(nn.Module):
	    def __init__(self,
	                 n_class,
	                 roi_size,
	                 spatial_scale,
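	                 classifier):
	        """
	        function description:
	            "project" the rois supplied by the rpn onto the vgg16 feature map, crop each one and max-pool it
	            (RoI max pooling), flatten the result from 2d to 1d, feed it through the two fc layers, and then
	            through two sibling fc layers that produce the cls and reg outputs
	
	        :param n_class: total number of classes
	        :param roi_size: output size of RoIPooling2D
	        :param spatial_scale: factor by which a roi (a region in the original image proposed by the rpn) is shrunk
	                              when projected onto the feature map, i.e. 1 / feature stride
	        :param classifier: the two fc layers (with ReLU) taken from vgg16
	        """
	        super(FastRCNN, self).__init__()
	
	        self.classifier = classifier
	        self.cls_layer = nn.Linear(4096, n_class)
	        self.reg_layer = nn.Linear(4096, n_class * 4)
	        # The Fast R-CNN paper initializes the classification layer with std 0.01 and the regression layer with std 0.001
	        normal_init(self.cls_layer, 0, 0.01)
	        normal_init(self.reg_layer, 0, 0.001)
	        self.n_class = n_class
	        self.roi_size = roi_size
	        self.spatial_scale = spatial_scale
	        self.roi = RoIPooling2D((self.roi_size, self.roi_size), self.spatial_scale)
	
	    def forward(self, x, sample_rois):
	        """
	        function description: pool every roi from the feature map and predict its class scores and box offsets
	
	        :param x: output of the pretrained feature extractor, i.e. the feature map
	        :param sample_rois: rois after NMS
	        :return:
	            roi_locs: regression output for every roi, used later in the regression loss
	            roi_scores: classification scores for every roi, used later in the classification loss
	        """
	        pool = self.roi(x, sample_rois)
	        pool = pool.view(pool.size(0), -1)
	        fc7 = self.classifier(pool)
	
	        roi_scores = self.cls_layer(fc7)
	        roi_locs = self.reg_layer(fc7)
	        return roi_locs, roi_scores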

The RoIPooling2D class wraps the max pooling that resizes every roi to a fixed size.

	import torch
	from torch import nn
	
	
	class RoIPooling2D(nn.Module):
	    def __init__(self, output_size, spatial_scale, return_indices=False):
	        super(RoIPooling2D, self).__init__()
	
	        self.output_size = output_size
	        self.spatial_scale = spatial_scale
	        self.return_indices = return_indices
	        # Pools any input down to output_size (a tuple) in its spatial dimensions
	        self.adp_max_pool_2D = nn.AdaptiveMaxPool2d(output_size, return_indices)
	
	    def forward(self, x, rois):
	        """
	        function description: map the sampled rois from the original image onto the feature map and pool each one
	
	        :param x: output of the pretrained feature extractor, i.e. the feature map (batch size 1)
	        :param rois: sampled rois, numpy array with rows (xmin, ymin, xmax, ymax) in image coordinates
	        :return:
	            output: pooled features, one entry per roi along dimension 0
	        """
	        rois_ = torch.from_numpy(rois).float()
	        rois = rois_.mul(self.spatial_scale)
	        rois = rois.long()
	
	        num_rois = rois.size(0)
	        output = []
	
	        for i in range(num_rois):
	            # roi has shape [4]: (xmin, ymin, xmax, ymax) on the feature map
	            roi = rois[i]
	            # The feature map is [..., height, width], so rows are sliced with y and columns with x
	            im = x[..., roi[1]:(roi[3] + 1), roi[0]:(roi[2] + 1)]
	            try:
	                output.append(self.adp_max_pool_2D(im))  # each element has shape (1, channel, 7, 7)
	            except RuntimeError:
	                print("roi:", roi)
	                print("im shape:", im.shape)
	                raise
	
	        output = torch.cat(output, 0)
	        return output
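To see what "crop, then pool to a fixed size" does without needing PyTorch, here is a NumPy sketch. The bin boundaries (start = floor(i * in / out), end = ceil((i + 1) * in / out)) follow my understanding of how adaptive pooling splits its input, so treat that rule, the helper name adaptive_max_pool_2d, and the example roi as assumptions.

```python
import math
import numpy as np


def adaptive_max_pool_2d(feature, out_h, out_w):
    """feature: (channels, H, W) -> (channels, out_h, out_w)."""
    channels, in_h, in_w = feature.shape
    pooled = np.empty((channels, out_h, out_w), dtype=feature.dtype)
    for i in range(out_h):
        y0, y1 = (i * in_h) // out_h, math.ceil((i + 1) * in_h / out_h)
        for j in range(out_w):
            x0, x1 = (j * in_w) // out_w, math.ceil((j + 1) * in_w / out_w)
            pooled[:, i, j] = feature[:, y0:y1, x0:x1].max(axis=(1, 2))
    return pooled


# A 10 x 12 single-channel feature map whose value at (row, col) is row * 12 + col.
feature_map = np.arange(1 * 10 * 12, dtype=np.float32).reshape(1, 10, 12)
spatial_scale = 1.0 / 16

# A roi given in image pixels, projected onto the feature map by scaling and truncating.
xmin, ymin, xmax, ymax = (int(v * spatial_scale) for v in (32, 16, 150, 120))
crop = feature_map[:, ymin:ymax + 1, xmin:xmax + 1]   # rows use y, columns use x
pooled = adaptive_max_pool_2d(crop, 2, 2)
print(crop.shape, pooled.shape)
print(pooled[0])
```

Whatever the size of the crop, the pooled output has the requested shape, which is what lets the fully connected layers accept rois of any size.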

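The weight initialization function. Its truncated argument selects a truncated normal initialization, in which samples are kept within two standard deviations of the mean. It is unrelated to the SVD (singular value decomposition) speed-up mentioned in the paper.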

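	def normal_init(m, mean, stddev, truncated=False):
	    """
	    function description: weight initialization function
	
	    :param m: the layer to initialize
	    :param mean: mean
	    :param stddev: standard deviation
	    :param truncated: whether to use a truncated normal (samples folded into two standard deviations of the mean)
	    :return:
	    """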
	    if truncated:
	        m.weight.data.normal_().fmod_(2).mul_(stddev).add_(mean)
	    else:
	        m.weight.data.normal_(mean, stddev)
	        m.bias.data.zero_()
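The truncated branch is easier to judge with numbers. The NumPy sketch below mimics normal_().fmod_(2).mul_(stddev).add_(mean) with np.fmod, which, like the tensor method, keeps the sign of the dividend. Note that this folds out-of-range samples back into range rather than resampling them, so it is only an approximation of a true truncated normal.

```python
import numpy as np

rng = np.random.default_rng(0)
mean, stddev = 0.0, 0.01

samples = rng.standard_normal(100_000)

# fmod folds every standard-normal sample into (-2, 2) before scaling and shifting ...
truncated = np.fmod(samples, 2) * stddev + mean
# ... so no initial weight ends up more than two standard deviations from the mean.
print(np.abs(truncated - mean).max() <= 2 * stddev)

# The plain normal initialization has no such bound.
plain = samples * stddev + mean
print(np.abs(plain - mean).max())
```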

No tests yet.

