Smart Object Detection 27: Building a Faster R-CNN Object Detection Platform with PyTorch

Foreword

The PyTorch series ought to have a Faster R-CNN too.

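What Is the Faster R-CNN Object Detection Algorithm

Faster R-CNN is a very effective object detection algorithm. Although the paper is relatively old, it remains the foundation of many object detection algorithms today.

Faster R-CNN is a two-stage algorithm. Compared with one-stage algorithms, two-stage algorithms are more complex and slower, but their detection accuracy is higher.

That is indeed how it turns out in practice: Faster R-CNN detects very well, but its detection speed and training speed leave room for improvement.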

Source Code Download

https://github.com/bubbliiiing/faster-rcnn-pytorch
If you like it, feel free to give it a star.

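Faster R-CNN Implementation Approach

I. Prediction

1. Backbone Network

Faster R-CNN can use a variety of backbone feature extraction networks; common choices are VGG, ResNet and Xception. This post uses ResNet as the example.

Faster R-CNN does not require a fixed input size, but the shorter side of the input image is usually resized to 600. For example, a 1200x1800 image is resized without distortion to 600x900.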
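As a minimal sketch of the resizing rule above, the helper below scales both sides by the same factor so that the shorter one becomes 600. The function name resize_keep_ratio and the rounding choice are assumptions made for illustration, not code from the repository.

```python
def resize_keep_ratio(height, width, short_side=600):
    """Scale (height, width) so that the shorter side equals short_side."""
    scale = short_side / min(height, width)
    # Both sides use the same scale factor, so the aspect ratio is preserved.
    return int(round(height * scale)), int(round(width * scale))


print(resize_keep_ratio(1200, 1800))  # (600, 900)
```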

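ResNet50 has two basic blocks, named Conv Block and Identity Block. The input and output dimensions of a Conv Block differ, so Conv Blocks cannot be chained back to back; their role is to change the dimensions of the network. The input and output dimensions of an Identity Block are the same, so Identity Blocks can be chained to deepen the network.
The structure of the Conv Block is as follows:

The structure of the Identity Block is as follows:

Both are residual structures.

The backbone feature extraction part of Faster R-CNN only contains the layers up to the fourth downsampling of height and width; the layers of the fifth downsampling are used in the RoI stage. The layers that Faster R-CNN uses in its backbone are shown in the figure.

Taking a 600x600 input image as an example, the shape changes as follows:

The output of the last layer is the shared feature layer.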
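The spatial sizes can be checked with the usual convolution output-size formula. The sketch below assumes the layer settings of the ResNet code in this post (7x7 stride-2 conv1 with padding 3, 3x3 stride-2 max pooling with ceil_mode=True, and a stride-2 1x1 convolution at the start of layer2 and layer3); the helper names conv_out and backbone_sizes are illustrative.

```python
import math


def conv_out(size, kernel, stride, padding=0, ceil_mode=False):
    """Output length of a conv or pooling layer along one spatial axis."""
    span = (size + 2 * padding - kernel) / stride
    return (math.ceil(span) if ceil_mode else math.floor(span)) + 1


def backbone_sizes(size):
    """Side length after each stage of the shared feature extractor."""
    sizes = {"input": size}
    size = conv_out(size, kernel=7, stride=2, padding=3)        # conv1
    sizes["conv1"] = size
    size = conv_out(size, kernel=3, stride=2, ceil_mode=True)   # maxpool
    sizes["maxpool"] = size
    sizes["layer1"] = size                                      # stride 1, size unchanged
    size = conv_out(size, kernel=1, stride=2)                   # layer2
    sizes["layer2"] = size
    size = conv_out(size, kernel=1, stride=2)                   # layer3
    sizes["layer3"] = size
    return sizes


print(backbone_sizes(600))
```

For a 600x600 input this gives 300, 150, 150, 75 and 38, so the shared feature layer is 38x38.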

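In the code, we use the resnet50() function to obtain the shared feature layer of ResNet50.

Here features is the shared feature layer, and classifier is the classifier used in the second stage.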

def resnet50():
    model = ResNet(Bottleneck, [3, 4, 6, 3])
    # Feature extraction part: everything up to and including layer3
    features = list([model.conv1, model.bn1, model.relu, model.maxpool, model.layer1, model.layer2, model.layer3])
    # Classification part: layer4 and the average pooling
    classifier = list([model.layer4, model.avgpool])
    features = nn.Sequential(*features)
    classifier = nn.Sequential(*classifier)
    return features,classifier

The full implementation is:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import math
import torch.utils.model_zoo as model_zoo
import pdb


model_urls = {
'resnet18': 'https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth',
'resnet34': 'https://s3.amazonaws.com/pytorch/models/resnet34-333f7ec4.pth',
'resnet50': 'https://s3.amazonaws.com/pytorch/models/resnet50-19c8e357.pth',
'resnet101': 'https://s3.amazonaws.com/pytorch/models/resnet101-5d3b4d8f.pth',
'resnet152': 'https://s3.amazonaws.com/pytorch/models/resnet152-b121ed2d.pth',
}


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False) # change
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, # change
                    padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                    bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=0, ceil_mode=True) # change
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        
        self.avgpool = nn.AvgPool2d(7)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x

def resnet50():
    model = ResNet(Bottleneck, [3, 4, 6, 3])
    # Feature extraction part: everything up to and including layer3
    features = list([model.conv1, model.bn1, model.relu, model.maxpool, model.layer1, model.layer2, model.layer3])
    # Classification part: layer4 and the average pooling
    classifier = list([model.layer4, model.avgpool])
    features = nn.Sequential(*features)
    classifier = nn.Sequential(*classifier)
    return features,classifier

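2. Obtaining the Proposal Boxes

The shared feature layer is the Feature Map in the figure. It is used in two ways: one is together with ROIPooling; the other is to apply a 3x3 convolution, followed by an 18-channel 1x1 convolution and a 36-channel 1x1 convolution.

In Faster R-CNN, num_priors, the number of prior boxes (anchors) per grid point, is 9, so the outputs of the two 1x1 convolutions are in fact:

The 9 x 4 convolution predicts the adjustment of every prior box at every grid point of the shared feature layer. (Why "adjustment"? Because the Faster R-CNN predictions have to be combined with the prior boxes to obtain the predicted boxes, so what is predicted is how each prior box changes.)

The 9 x 2 convolution predicts whether each predicted box at every grid point of the shared feature layer contains an object; the entry at index 1 is the probability that it contains an object.

When the input image has shape 600x600x3, the shared feature layer has shape 38x38x1024. This is equivalent to dividing the input image into a 38x38 grid with 9 prior boxes in each cell. The prior boxes come in different sizes and densely cover the image.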
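To make the "9 prior boxes of different sizes" concrete, here is a minimal sketch of how a set of base boxes can be built from three aspect ratios and three scales. The RPN code in this post calls a function named generate_anchor_base but does not show its body, so the version below is an assumption that follows the common convention (ratio = height / width, box centred on the origin), not the repository's exact code.

```python
import math


def generate_anchor_base(base_size=16, ratios=(0.5, 1, 2), anchor_scales=(8, 16, 32)):
    """Return 9 (x_min, y_min, x_max, y_max) boxes centred on the origin."""
    anchors = []
    for ratio in ratios:
        for scale in anchor_scales:
            # ratio is height / width; the area stays close to (base_size * scale) ** 2.
            h = base_size * scale * math.sqrt(ratio)
            w = base_size * scale * math.sqrt(1.0 / ratio)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


anchors = generate_anchor_base()
print(len(anchors))   # 9
print(anchors[3])     # the square 128x128 box: (-64.0, -64.0, 64.0, 64.0)
```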

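The result of the 9 x 4 convolution adjusts these prior boxes to produce new boxes.
The 9 x 2 convolution judges whether each of these new boxes contains an object.

At this point we have a set of useful boxes, and the 9 x 2 convolution tells us which of them contain an object.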
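The two values per box become a probability through a softmax, and the entry at index 1 is read as "contains an object". A small pure-Python sketch of that step for a single box; the function name foreground_probability is illustrative.

```python
import math


def foreground_probability(scores):
    """Softmax over (background, foreground) scores; return the foreground part."""
    bg, fg = scores
    peak = max(bg, fg)  # subtract the maximum for numerical stability
    exp_bg, exp_fg = math.exp(bg - peak), math.exp(fg - peak)
    return exp_fg / (exp_bg + exp_fg)


print(foreground_probability((0.0, 0.0)))    # 0.5, the network is undecided
print(foreground_probability((-2.0, 2.0)))   # close to 1, likely an object
```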

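Up to this point we only have a rough set of boxes, namely the proposal boxes. We then continue to look for objects inside the proposal boxes.

The implementation is: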

class RegionProposalNetwork(nn.Module):
    def __init__(
            self, in_channels=512, mid_channels=512, ratios=[0.5, 1, 2],
            anchor_scales=[8, 16, 32], feat_stride=16,
            mode = "training",
    ):
        super(RegionProposalNetwork, self).__init__()
        self.anchor_base = generate_anchor_base(anchor_scales=anchor_scales, ratios=ratios)
        # Stride: the downsampling factor of the feature map
        self.feat_stride = feat_stride
        self.proposal_layer = ProposalCreator(mode)
        # Number of default prior boxes at each grid point
        n_anchor = self.anchor_base.shape[0]
        # First apply a 3x3 convolution
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1)
        # Classification: does the prior box contain an object?
        self.score = nn.Conv2d(mid_channels, n_anchor * 2, 1, 1, 0)
        # Regression: adjustment of the prior box
        self.loc = nn.Conv2d(mid_channels, n_anchor * 4, 1, 1, 0)
        normal_init(self.conv1, 0, 0.01)
        normal_init(self.score, 0, 0.01)
        normal_init(self.loc, 0, 0.01)

    def forward(self, x, img_size, scale=1.):
        n, _, hh, ww = x.shape
        # Apply the 3x3 convolution to the shared feature layer
        h = F.relu(self.conv1(x))
        # Regression prediction
        rpn_locs = self.loc(h)
        rpn_locs = rpn_locs.permute(0, 2, 3, 1).contiguous().view(n, -1, 4)
        # Classification prediction
        rpn_scores = self.score(h)
        rpn_scores = rpn_scores.permute(0, 2, 3, 1).contiguous().view(n, -1, 2)
        # Apply softmax
        rpn_softmax_scores = F.softmax(rpn_scores, dim=-1)
        rpn_fg_scores = rpn_softmax_scores[:, :, 1].contiguous()
        rpn_fg_scores = rpn_fg_scores.view(n, -1)
        rpn_scores = rpn_scores.view(n, -1, 2)

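3. Decoding the Proposal Boxes

Step 2 gives us the predictions for 38x38x9 prior boxes. The predictions consist of two parts.

The 9 x 4 convolution predicts the adjustment of every prior box at every grid point of the shared feature layer.

The 9 x 2 convolution predicts whether each predicted box at every grid point of the shared feature layer contains an object.

This is equivalent to dividing the whole image into a 38x38 grid and building 9 prior boxes at the centre of each cell, 38x38x9 = 12996 prior boxes in total.

When the input image shape changes, the number of prior boxes changes as well.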
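The count can be reproduced by copying a set of 9 base boxes to every cell of the grid. This is a sketch under stated assumptions: the function name enumerate_shifted_anchors is illustrative, the base boxes are placeholders whose sizes do not matter for the count, and the boxes are centred on each cell as the text describes.

```python
def enumerate_shifted_anchors(anchor_base, feat_stride, feat_h, feat_w):
    """Copy every base box to the centre of every cell of a feat_h x feat_w grid."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of the cell in image coordinates.
            shift_x = (col + 0.5) * feat_stride
            shift_y = (row + 0.5) * feat_stride
            for x1, y1, x2, y2 in anchor_base:
                anchors.append((x1 + shift_x, y1 + shift_y, x2 + shift_x, y2 + shift_y))
    return anchors


# Nine placeholder base boxes; only their number matters for this check.
base = [(-8.0 * k, -8.0 * k, 8.0 * k, 8.0 * k) for k in range(1, 10)]
print(len(enumerate_shifted_anchors(base, 16, 38, 38)))  # 12996
print(len(enumerate_shifted_anchors(base, 16, 38, 57)))  # 19494, a wider grid gives more boxes
```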

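A prior box can represent a certain box position and size, but it is limited and cannot represent arbitrary cases, so it still needs to be adjusted.

In 9 x 4, the 9 is the number of prior boxes at each grid point, and the 4 is the adjustment of the box centre and of its width and height.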
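Written out, the adjustment takes the form below, which matches the loc2bbox function in this section. The symbols are notation introduced here for illustration: (x_a, y_a, w_a, h_a) are the centre, width and height of a prior box, and (d_x, d_y, d_w, d_h) are the four predicted values.

```latex
\begin{aligned}
x &= d_x \, w_a + x_a, & y &= d_y \, h_a + y_a, \\
w &= w_a \, e^{d_w},   & h &= h_a \, e^{d_h}, \\
(x_{\min},\, y_{\min},\, x_{\max},\, y_{\max}) &= \left(x - \tfrac{w}{2},\; y - \tfrac{h}{2},\; x + \tfrac{w}{2},\; y + \tfrac{h}{2}\right).
\end{aligned}
```

The centre shifts are expressed in units of the prior box size, and the size changes go through an exponential so that the resulting width and height are always positive.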

The implementation is as follows:

class ProposalCreator():
    def __init__(self,
                 mode,
                 nms_thresh=0.7,
                 n_train_pre_nms=3000,
                 n_train_post_nms=300,
                 n_test_pre_nms=3000,
                 n_test_post_nms=300,
                 min_size=16
                 ):
        self.mode = mode
        self.nms_thresh = nms_thresh
        self.n_train_pre_nms = n_train_pre_nms
        self.n_train_post_nms = n_train_post_nms
        self.n_test_pre_nms = n_test_pre_nms
        self.n_test_post_nms = n_test_post_nms
        self.min_size = min_size

    def __call__(self, loc, score,
                 anchor, img_size, scale=1.):
        if self.mode == "training":
            n_pre_nms = self.n_train_pre_nms
            n_post_nms = self.n_train_post_nms
        else:
            n_pre_nms = self.n_test_pre_nms
            n_post_nms = self.n_test_post_nms
        # Convert the RPN predictions into proposal boxes
        roi = loc2bbox(anchor, loc)

        # Clip with slices so that proposals do not extend past the image border
        roi[:, slice(0, 4, 2)] = np.clip(roi[:, slice(0, 4, 2)], 0, img_size[1])
        roi[:, slice(1, 4, 2)] = np.clip(roi[:, slice(1, 4, 2)], 0, img_size[0])

        # Width and height must not be smaller than min_size (16 by default)
        min_size = self.min_size * scale
        # Compute width and height
        ws = roi[:, 2] - roi[:, 0]
        hs = roi[:, 3] - roi[:, 1]
        # Drop proposals that are too small
        keep = np.where((hs >= min_size) & (ws >= min_size))[0]
        roi = roi[keep, :]
        score = score[keep]
        # Keep the highest-scoring proposals
        order = score.ravel().argsort()[::-1]
        if n_pre_nms > 0:
            order = order[:n_pre_nms]
        roi = roi[order, :]
        roi = nms(roi,self.nms_thresh)
        roi = torch.Tensor(roi)
        roi = roi[:n_post_nms]
        return roi

def loc2bbox(src_bbox, loc):
    if src_bbox.shape[0] == 0:
        return np.zeros((0, 4), dtype=loc.dtype)

    src_bbox = src_bbox.astype(src_bbox.dtype, copy=False)
    src_width = src_bbox[:, 2] - src_bbox[:, 0]
    src_height = src_bbox[:, 3] - src_bbox[:, 1]
    src_ctr_x = src_bbox[:, 0] + 0.5 * src_width
    src_ctr_y = src_bbox[:, 1] + 0.5 * src_height

    dx = loc[:, 0::4]
    dy = loc[:, 1::4]
    dw = loc[:, 2::4]
    dh = loc[:, 3::4]

    ctr_x = dx * src_width[:, np.newaxis] + src_ctr_x[:, np.newaxis]
    ctr_y = dy * src_height[:, np.newaxis] + src_ctr_y[:, np.newaxis]
    w = np.exp(dw) * src_width[:, np.newaxis]
    h = np.exp(dh) * src_height[:, np.newaxis]

    dst_bbox = np.zeros(loc.shape, dtype=loc.dtype)
    dst_bbox[:, 0::4] = ctr_x - 0.5 * w
    dst_bbox[:, 1::4] = ctr_y - 0.5 * h
    dst_bbox[:, 2::4] = ctr_x + 0.5 * w
    dst_bbox[:, 3::4] = ctr_y + 0.5 * h

    return dst_bbox

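4. Using the Proposal Boxes (RoiPoolingConv)

Let us get an overall understanding of the proposal boxes:
a proposal box is in effect a preliminary screening of which regions of the image contain an object.

Through the backbone feature extraction network we obtain a shared feature layer. When the input image is 600x600x3, its shape is 38x38x1024, and the proposal boxes are then used to crop this shared feature layer.

The 38x38 of the shared feature layer corresponds to 38x38 regions of the image, and each point in the 38x38 grid is a condensed summary of all the features inside its region.

The proposal boxes crop these 38x38 regions, that is, they assume those regions contain a target, and the cropped result is then resized to 14x14x1024.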
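The crop-and-resize step can be sketched with classic RoI max pooling on a single-channel feature map: map the box from image coordinates to feature-map cells, split it into out_size x out_size bins, and take the maximum of each bin. This is an illustrative pure-Python version under the assumption spatial_scale = 1/16; the function name roi_max_pool is hypothetical, and the RoIPooling2D layer used by the repository may differ in details such as rounding.

```python
def roi_max_pool(feature, roi, out_size, spatial_scale):
    """Max-pool one image-space RoI on a 2-D feature map to out_size x out_size."""
    feat_h, feat_w = len(feature), len(feature[0])
    # Map image coordinates onto feature-map cells and keep them inside the map.
    x1, y1, x2, y2 = (int(round(v * spatial_scale)) for v in roi)
    x1, y1 = min(max(x1, 0), feat_w - 1), min(max(y1, 0), feat_h - 1)
    x2, y2 = min(max(x2, x1 + 1), feat_w), min(max(y2, y1 + 1), feat_h)
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        # Bin i covers rows [r0, r1); every bin holds at least one cell.
        r0 = y1 + (i * roi_h) // out_size
        r1 = max(y1 + ((i + 1) * roi_h + out_size - 1) // out_size, r0 + 1)
        row = []
        for j in range(out_size):
            c0 = x1 + (j * roi_w) // out_size
            c1 = max(x1 + ((j + 1) * roi_w + out_size - 1) // out_size, c0 + 1)
            row.append(max(feature[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A 4x4 feature map whose values are simply 0..15, row by row.
feature = [[r * 4 + c for c in range(4)] for r in range(4)]
print(roi_max_pool(feature, (0, 0, 64, 64), out_size=2, spatial_scale=1 / 16))
# [[5, 7], [13, 15]]
```

In the real network the same pooling runs on all 1024 channels and out_size is 14, which yields the 14x14x1024 tensor mentioned above.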

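Each proposal box then goes through the original fifth downsampling stage of ResNet. After that comes an average pooling and a Flatten, and finally two fully connected layers, one with num_classes outputs and one with (num_classes)x4 outputs.

The num_classes fully connected layer classifies the final boxes, and the (num_classes)x4 fully connected layer adjusts the corresponding proposal boxes.

Through these operations we obtain the adjustment of every proposal box and the class of the object inside each adjusted box.

In fact, the proposal boxes obtained in the previous step serve as the prior boxes of the RoI stage.

The process of using the proposal boxes, together with the shape changes, is shown in the figure:

The adjusted proposal boxes are the final predictions and can be drawn on the image.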

class Resnet50RoIHead(nn.Module):
    def __init__(self, n_class, roi_size, spatial_scale,
                 classifier):
        # n_class includes the background
        super(Resnet50RoIHead, self).__init__()
        # Layers used for classification
        self.classifier = classifier
        self.cls_loc = nn.Linear(2048, n_class * 4)
        self.score = nn.Linear(2048, n_class)

        normal_init(self.cls_loc, 0, 0.001)
        normal_init(self.score, 0, 0.01)
        # Number of classes, including the background
        self.n_class = n_class
        # With VGG as the backbone, roi_size is 7
        self.roi_size = roi_size
        self.spatial_scale = spatial_scale  
        self.roi = RoIPooling2D(self.roi_size, self.roi_size, self.spatial_scale)

    def forward(self, x, rois, roi_indices):
        roi_indices = torch.Tensor(roi_indices).cuda().float()
        rois = torch.Tensor(rois).cuda().float()
        indices_and_rois = torch.cat([roi_indices[:, None], rois], dim=1)

        xy_indices_and_rois = indices_and_rois[:, [0, 1, 2, 3, 4]]
        indices_and_rois =  xy_indices_and_rois.contiguous()
        # Crop the shared feature layer with the proposal boxes
        pool = self.roi(x, indices_and_rois)
        fc7 = self.classifier(pool)
        fc7 = fc7.view(fc7.size(0), -1)
        roi_cls_locs = self.cls_loc(fc7)
        roi_scores = self.score(fc7)
        return roi_cls_locs, roi_scores

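5. Drawing on the Original Image

At the end of step 4 we decode the proposal boxes once more, which gives the positions of the predicted boxes on the original image, and these predicted boxes have already been filtered. The filtered boxes can be drawn directly on the image to obtain the result.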
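The filtering itself is not spelled out in this section. A common choice, and the one ProposalCreator already applies to proposals through its nms call, is greedy non-maximum suppression. Below is a minimal pure-Python sketch, assuming boxes are (x_min, y_min, x_max, y_max); the function names and the 0.3 threshold are illustrative and not taken from the repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for idx in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[idx], boxes[k]) <= iou_thresh for k in keep):
            keep.append(idx)
    return keep


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_thresh=0.3))  # [0, 2]
```

The second box overlaps the first almost completely and has a lower score, so it is dropped; the third box is far away and survives.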

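6. Overall Execution Flow

A few tips:
1. There are two decoding steps in total.
2. Do a coarse screening first, then fine-tune.
3. The decoded proposal boxes from the first step are used to crop the shared feature layer (feature map).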

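II. Training

Like prediction, training Faster R-CNN has two parts: first train the network that produces the proposal boxes, then train the network that uses the RoIs to produce the final predictions.

1. Training the Proposal Network

To obtain proposal predictions from the shared feature layer, we apply a 3x3 convolution, followed by an 18-channel 1x1 convolution and a 36-channel 1x1 convolution.

In Faster R-CNN, num_priors, the number of prior boxes per grid point, is 9, so the outputs of the two 1x1 convolutions are in fact:

The 9 x 4 convolution predicts the adjustment of every prior box at every grid point of the shared feature layer.

The 9 x 2 convolution predicts whether each predicted box at every grid point of the shared feature layer contains an object.

In other words, the raw output of the Faster R-CNN proposal network is not the real position of the proposal boxes on the image; it has to be decoded to obtain the real position.

During training we need to compute the loss, and this loss is defined relative to the predictions of the proposal network. We feed the image into the current proposal network to get its predictions; we also need to encode, that is, convert the position information of the ground-truth boxes into the format of the proposal network's predictions.

That is, for every ground-truth box of every training image we need to find the corresponding prior boxes and work out what the proposal network would have to predict in order to produce that ground-truth box.

Going from the proposal predictions to real boxes is called decoding, and going from ground-truth boxes to the proposal prediction format is encoding.

So encoding is simply the decoding process reversed.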
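A round trip on a single box shows that the two operations are inverses of each other. This is a pure-Python sketch for one prior box and one ground-truth box, with boxes given as (x_min, y_min, x_max, y_max); the names encode and decode are illustrative, and the small-width safeguard that bbox2loc applies with eps is left out for brevity.

```python
import math


def encode(anchor, gt):
    """Offsets (dx, dy, dw, dh) that move `anchor` onto `gt`."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    ax, ay = anchor[0] + 0.5 * aw, anchor[1] + 0.5 * ah
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gx, gy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode(anchor, loc):
    """Apply (dx, dy, dw, dh) to `anchor` and return the resulting box."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    ax, ay = anchor[0] + 0.5 * aw, anchor[1] + 0.5 * ah
    dx, dy, dw, dh = loc
    cx, cy = dx * aw + ax, dy * ah + ay
    w, h = math.exp(dw) * aw, math.exp(dh) * ah
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


anchor = (0.0, 0.0, 100.0, 100.0)
gt = (20.0, 10.0, 120.0, 210.0)
loc = encode(anchor, gt)
print(loc)                   # roughly (0.2, 0.6, 0.0, 0.693)
print(decode(anchor, loc))   # recovers gt up to floating-point error
```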

The implementation is as follows:


class AnchorTargetCreator(object):
    def __init__(self,
                 n_sample=256,
                 pos_iou_thresh=0.7, neg_iou_thresh=0.3,
                 pos_ratio=0.5):
        self.n_sample = n_sample
        self.pos_iou_thresh = pos_iou_thresh
        self.neg_iou_thresh = neg_iou_thresh
        self.pos_ratio = pos_ratio

    def __call__(self, bbox, anchor, img_size):
        argmax_ious, label = self._create_label(anchor, bbox)
        # Encode each prior box against its matched ground-truth box
        loc = bbox2loc(anchor, bbox[argmax_ious])

        return loc, label

    def _create_label(self, anchor, bbox):
        # 1 = positive sample, 0 = negative sample, -1 = ignored
        label = np.empty((len(anchor),), dtype=np.int32)
        label.fill(-1)

        # argmax_ious: for each prior box, the index of the ground-truth box with the highest IoU
        # max_ious: for each prior box, that highest IoU value
        # gt_argmax_ious: for each ground-truth box, the indices of the prior boxes with the highest IoU
        argmax_ious, max_ious, gt_argmax_ious = \
            self._calc_ious(anchor, bbox)

        # Below the negative threshold: negative sample
        label[max_ious < self.neg_iou_thresh] = 0

        # Every ground-truth box gets at least one prior box
        label[gt_argmax_ious] = 1

        # At or above the positive threshold: positive sample
        label[max_ious >= self.pos_iou_thresh] = 1

        # If there are more than n_pos positives (128 by default), ignore the surplus
        n_pos = int(self.pos_ratio * self.n_sample)
        pos_index = np.where(label == 1)[0]
        if len(pos_index) > n_pos:
            disable_index = np.random.choice(
                pos_index, size=(len(pos_index) - n_pos), replace=False)
            label[disable_index] = -1

        # Balance positives and negatives so that the total is n_sample (256 by default)
        n_neg = self.n_sample - np.sum(label == 1)
        neg_index = np.where(label == 0)[0]
        if len(neg_index) > n_neg:
            disable_index = np.random.choice(
                neg_index, size=(len(neg_index) - n_neg), replace=False)
            label[disable_index] = -1

        return argmax_ious, label

    def _calc_ious(self, anchor, bbox):
        # IoU between every prior box and every ground-truth box
        ious = bbox_iou(anchor, bbox)
        # Rows are prior boxes, columns are ground-truth boxes
        argmax_ious = ious.argmax(axis=1)
        # Highest IoU of each prior box over all ground-truth boxes
        max_ious = ious[np.arange(len(anchor)), argmax_ious]
        # Rows are prior boxes, columns are ground-truth boxes
        gt_argmax_ious = ious.argmax(axis=0)
        # Highest IoU of each ground-truth box over all prior boxes
        gt_max_ious = ious[gt_argmax_ious, np.arange(ious.shape[1])]
        # Indices of the prior boxes that reach that highest IoU for some ground-truth box
        gt_argmax_ious = np.where(ious == gt_max_ious)[0]

        return argmax_ious, max_ious, gt_argmax_ious

def bbox2loc(src_bbox, dst_bbox):
    width = src_bbox[:, 2] - src_bbox[:, 0]
    height = src_bbox[:, 3] - src_bbox[:, 1]
    ctr_x = src_bbox[:, 0] + 0.5 * width
    ctr_y = src_bbox[:, 1] + 0.5 * height

    base_width = dst_bbox[:, 2] - dst_bbox[:, 0]
    base_height = dst_bbox[:, 3] - dst_bbox[:, 1]
    base_ctr_x = dst_bbox[:, 0] + 0.5 * base_width
    base_ctr_y = dst_bbox[:, 1] + 0.5 * base_height

    eps = np.finfo(height.dtype).eps
    width = np.maximum(width, eps)
    height = np.maximum(height, eps)

    dx = (base_ctr_x - ctr_x) / width
    dy = (base_ctr_y - ctr_y) / height
    dw = np.log(base_width / width)
    dh = np.log(base_height / height)

    loc = np.vstack((dx, dy, dw, dh)).transpose()
    return loc

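Training the proposal network ignores prior boxes whose overlap is fairly high but not very high: prior boxes whose IoU falls between 0.3 and 0.7 are generally ignored.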
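The thresholding rule can be summarised for a single prior box as follows. This sketch takes the highest IoU of a prior box with any ground-truth box as input and uses the default thresholds of AnchorTargetCreator; the function name label_anchor is illustrative. It leaves out the extra rule in the class above that forces the best-matching prior box of every ground-truth box to be positive.

```python
def label_anchor(max_iou, pos_iou_thresh=0.7, neg_iou_thresh=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for one prior box."""
    if max_iou >= pos_iou_thresh:
        return 1
    if max_iou < neg_iou_thresh:
        return 0
    return -1


print([label_anchor(v) for v in (0.05, 0.3, 0.5, 0.7, 0.92)])  # [0, -1, -1, 1, 1]
```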

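2. Training the RoI Network

With the previous step the proposal network can be trained. The proposal network supplies position proposals; the RoI network crops the shared feature layer according to these proposals and produces the corresponding predictions. In effect, the proposals from the previous step are treated as the prior boxes of the RoI network.

We therefore compute the overlap between every proposal box and the ground-truth boxes and filter on it: if the IoU between a proposal box and a ground-truth box is 0.5 or more, the proposal is a positive sample; if it is below 0.5, the proposal is a negative sample.

We can then encode the ground-truth boxes relative to the proposal boxes: given these proposals, what must the RoI prediction network output in order to adjust them into the ground-truth boxes?

Each training step uses 128 proposal boxes, taking care to balance positive and negative samples.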
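As a minimal sketch of the sampling idea, the helper below picks at most half of the 128 boxes from the positives and fills the rest with negatives, falling back to sampling with replacement when there are too few negatives. It works on plain lists of indices; the function name sample_rois and the rng argument are assumptions for illustration.

```python
import random


def sample_rois(pos_index, neg_index, n_sample=128, pos_ratio=0.5, rng=None):
    """Pick up to n_sample * pos_ratio positives and fill the rest with negatives."""
    rng = rng or random.Random()
    n_pos = min(int(round(n_sample * pos_ratio)), len(pos_index))
    chosen_pos = rng.sample(pos_index, n_pos)
    n_neg = n_sample - n_pos
    if len(neg_index) >= n_neg:
        chosen_neg = rng.sample(neg_index, n_neg)                    # without replacement
    elif neg_index:
        chosen_neg = [rng.choice(neg_index) for _ in range(n_neg)]   # with replacement
    else:
        chosen_neg = []
    return chosen_pos, chosen_neg


pos, neg = sample_rois(list(range(200)), list(range(200, 700)), rng=random.Random(0))
print(len(pos), len(neg))  # 64 64
pos, neg = sample_rois(list(range(10)), list(range(200, 700)), rng=random.Random(0))
print(len(pos), len(neg))  # 10 118
```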
The implementation is as follows:

# Encoding
class ProposalTargetCreator(object):
    def __init__(self,n_sample=128,
                 pos_ratio=0.5, pos_iou_thresh=0.5,
                 neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.0
                 ):
        self.n_sample = n_sample
        self.pos_ratio = pos_ratio
        self.pos_iou_thresh = pos_iou_thresh
        self.neg_iou_thresh_hi = neg_iou_thresh_hi
        self.neg_iou_thresh_lo = neg_iou_thresh_lo  # NOTE:default 0.1 in py-faster-rcnn

    def __call__(self, roi, bbox, label,
                 loc_normalize_mean=(0., 0., 0., 0.),
                 loc_normalize_std=(0.1, 0.1, 0.2, 0.2)):
        n_bbox, _ = bbox.shape

        # Add the ground-truth boxes to the proposals and compute the positive-sample budget
        roi = np.concatenate((roi, bbox), axis=0)
        pos_roi_per_image = np.round(self.n_sample * self.pos_ratio)
        iou = bbox_iou(roi, bbox)
        gt_assignment = iou.argmax(axis=1)
        max_iou = iou.max(axis=1)
        # Ground-truth labels are shifted by +1 because index 0 is the background
        gt_roi_label = label[gt_assignment] + 1

        # Indices of the RoIs whose best IoU reaches the positive threshold
        pos_index = np.where(max_iou >= self.pos_iou_thresh)[0]
        pos_roi_per_this_image = int(min(pos_roi_per_image, pos_index.size))
        if pos_index.size > 0:
            pos_index = np.random.choice(
                pos_index, size=pos_roi_per_this_image, replace=False)

        # Balance positives and negatives: RoIs with IoU in
        # [neg_iou_thresh_lo, neg_iou_thresh_hi) are negatives
        neg_index = np.where((max_iou < self.neg_iou_thresh_hi) &
                             (max_iou >= self.neg_iou_thresh_lo))[0]
        if neg_index.size > 0:
            try:
                neg_index = np.random.choice(
                    neg_index, size=self.n_sample - pos_roi_per_this_image, replace=False)
            except ValueError:
                # Too few negatives: sample with replacement instead
                neg_index = np.random.choice(
                    neg_index, size=self.n_sample - pos_roi_per_this_image, replace=True)

        # Gather the labels of the sampled boxes; negatives get label 0 (background)
        keep_index = np.append(pos_index, neg_index)
        gt_roi_label = gt_roi_label[keep_index]
        gt_roi_label[pos_roi_per_this_image:] = 0
        sample_roi = roi[keep_index]

        # Encode the matched ground-truth boxes relative to the sampled RoIs, then normalise
        gt_roi_loc = bbox2loc(sample_roi, bbox[gt_assignment[keep_index]])
        gt_roi_loc = ((gt_roi_loc - np.array(loc_normalize_mean, np.float32)
                       ) / np.array(loc_normalize_std, np.float32))

        return sample_roi, gt_roi_loc, gt_roi_label

def bbox2loc(src_bbox, dst_bbox):
    width = src_bbox[:, 2] - src_bbox[:, 0]
    height = src_bbox[:, 3] - src_bbox[:, 1]
    ctr_x = src_bbox[:, 0] + 0.5 * width
    ctr_y = src_bbox[:, 1] + 0.5 * height

    base_width = dst_bbox[:, 2] - dst_bbox[:, 0]
    base_height = dst_bbox[:, 3] - dst_bbox[:, 1]
    base_ctr_x = dst_bbox[:, 0] + 0.5 * base_width
    base_ctr_y = dst_bbox[:, 1] + 0.5 * base_height

    eps = np.finfo(height.dtype).eps
    width = np.maximum(width, eps)
    height = np.maximum(height, eps)

    dx = (base_ctr_x - ctr_x) / width
    dy = (base_ctr_y - ctr_y) / height
    dw = np.log(base_width / width)
    dh = np.log(base_height / height)

    loc = np.vstack((dx, dy, dw, dh)).transpose()
    return loc

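Training Your Own Faster R-CNN Model

The overall folder layout of Faster R-CNN is as follows:

This post uses the VOC format for training.
Before training, put the label files in the Annotation folder under VOC2007 inside VOCdevkit.

Before training, put the image files in the JPEGImages folder under VOC2007 inside VOCdevkit.

Before training, run voc2faster-rcnn.py to generate the corresponding txt files.

Then run voc_annotation.py in the root directory; before running it, change classes to your own classes.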

classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

This generates the corresponding 2007_train.txt, in which each line holds the path of an image and the positions of its ground-truth boxes.
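The exact line layout is not shown in this post. The sketch below assumes a layout in which the image path comes first, followed by space-separated boxes written as x_min,y_min,x_max,y_max,class_id; both that layout and the sample line are assumptions for illustration, so check a generated 2007_train.txt before relying on them.

```python
def parse_annotation_line(line):
    """Split one line into an image path and a list of (x_min, y_min, x_max, y_max, class_id)."""
    parts = line.strip().split()
    image_path, boxes = parts[0], []
    for item in parts[1:]:
        # Each box is assumed to be five comma-separated integers.
        x_min, y_min, x_max, y_max, class_id = (int(v) for v in item.split(","))
        boxes.append((x_min, y_min, x_max, y_max, class_id))
    return image_path, boxes


# Hypothetical sample line, not copied from a real file.
sample = "VOCdevkit/VOC2007/JPEGImages/000001.jpg 48,240,195,371,11 8,12,352,498,14\n"
print(parse_annotation_line(sample))
```

A path that contains spaces would break this simple split, so a real parser may need a more careful rule.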

Before training, also edit the voc_classes.txt file in model_data and change the classes to your own.

NUM_CLASSES in train.py must also be set to the number of classes you want to detect.

Run train.py to start training.
