28 Object Detection

1. Object Detection

1.1 What is object detection?


Object detection: determine the category and the location of each object in an image.

The two components of object detection (a toy example follows the list):

  1. Classification: a class-probability vector [p0, …, pn]
  2. Regression: bounding-box coordinates [x1, y1, x2, y2]
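As a concrete illustration (hypothetical numbers, not tied to any particular model), one image's detections can be represented by a class-probability vector per box plus four box coordinates per box:

import torch

# Hypothetical output for 3 detected boxes and 4 foreground classes
# plus background, i.e. 5 scores per box.
class_probs = torch.tensor([
    [0.01, 0.95, 0.01, 0.02, 0.01],   # mostly class 1
    [0.05, 0.05, 0.80, 0.05, 0.05],   # mostly class 2
    [0.90, 0.04, 0.02, 0.02, 0.02],   # mostly background (index 0)
])
boxes = torch.tensor([                 # [x1, y1, x2, y2] in pixels
    [ 48.,  60., 180., 300.],
    [200.,  90., 340., 280.],
    [  0.,   0.,  30.,  40.],
])
labels = class_probs.argmax(dim=1)     # the classification element
print(labels)                          # tensor([1, 2, 0])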

1.2 Code example

# -*- coding: utf-8 -*-

import os
import time
import torch.nn as nn
import torch
import numpy as np
import torchvision.transforms as transforms
import torchvision
from PIL import Image
from matplotlib import pyplot as plt

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# classes_coco
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]


if __name__ == "__main__":

    # path_img = os.path.join(BASE_DIR, "demo_img1.png")
    path_img = os.path.join(BASE_DIR, "demo_img2.png")

    # config
    preprocess = transforms.Compose([
        transforms.ToTensor(),
    ])

    # 1. load data & model
    input_image = Image.open(path_img).convert("RGB")
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    # 2. preprocess
    img_chw = preprocess(input_image)

    # 3. to device
    if torch.cuda.is_available():
        img_chw = img_chw.to('cuda')
        model.to('cuda')

    # 4. forward
    input_list = [img_chw]
    with torch.no_grad():
        tic = time.time()
        print("input img tensor shape:{}".format(input_list[0].shape))
        output_list = model(input_list)
        output_dict = output_list[0]
        print("pass: {:.3f}s".format(time.time() - tic))
        for k, v in output_dict.items():
            print("key:{}, value:{}".format(k, v))

    # 5. visualization
    out_boxes = output_dict["boxes"].cpu()
    out_scores = output_dict["scores"].cpu()
    out_labels = output_dict["labels"].cpu()

    fig, ax = plt.subplots(figsize=(12, 12))
    ax.imshow(input_image, aspect='equal')

    num_boxes = out_boxes.shape[0]
    max_vis = 40
    thres = 0.5

    for idx in range(0, min(num_boxes, max_vis)):

        score = out_scores[idx].numpy()
        bbox = out_boxes[idx].numpy()
        class_name = COCO_INSTANCE_CATEGORY_NAMES[out_labels[idx]]

        if score < thres:
            continue

        ax.add_patch(plt.Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0], bbox[3] - bbox[1], fill=False,
                                   edgecolor='red', linewidth=3.5))
        ax.text(bbox[0], bbox[1] - 2, '{:s} {:.3f}'.format(class_name, score), bbox=dict(facecolor='blue', alpha=0.5),
                fontsize=14, color='white')
    plt.show()
    plt.close()



    # appendix
    classes_pascal_voc = ['__background__',
                       'aeroplane', 'bicycle', 'bird', 'boat',
                       'bottle', 'bus', 'car', 'cat', 'chair',
                       'cow', 'diningtable', 'dog', 'horse',
                       'motorbike', 'person', 'pottedplant',
                       'sheep', 'sofa', 'train', 'tvmonitor']


2. Implementing Object Detection

2.1 How does a model perform object detection?

The model maps a 3-D feature tensor to two output tensors (a minimal sketch follows the list):

  1. Classification tensor: shape [N, c+1]
  2. Bounding-box tensor: shape [N, 4]
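A minimal sketch of this mapping (the channel counts here are assumptions for illustration): two 1x1 convolution heads turn a feature map into per-location class scores and box offsets, which flatten into the [N, c+1] and [N, 4] tensors above:

import torch
import torch.nn as nn

c = 20                                            # number of foreground classes (assumed)
feat = torch.randn(1, 256, 14, 14)                # 3-D feature tensor with a batch dim

cls_head = nn.Conv2d(256, c + 1, kernel_size=1)   # classification head
reg_head = nn.Conv2d(256, 4, kernel_size=1)       # box-regression head

cls_out = cls_head(feat)                          # [1, c+1, 14, 14]
reg_out = reg_head(feat)                          # [1, 4, 14, 14]

# One prediction per spatial location: N = 14 * 14 = 196
cls_tensor = cls_out.flatten(2).transpose(1, 2)   # [1, N, c+1]
reg_tensor = reg_out.flatten(2).transpose(1, 2)   # [1, N, 4]
print(cls_tensor.shape, reg_tensor.shape)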

2.2 How is the number of bounding boxes N determined?

Traditional approach: the sliding-window strategy

Drawbacks:

  1. Heavy redundant computation
  2. The window size is hard to choose

Reducing the redundant computation with convolutions:

[Figure: a sliding-window classifier implemented as a fully convolutional network]
Explanation: in the figure, the top-left vector of the final 2x2x4 output corresponds to the classification output produced by the CNN for the 14x14 window at the top-left of the original image; likewise, the other three vectors correspond to the other three windows. So by converting the FC layers into convolutional layers, a single convolution pass yields the classification output for every window of the original image, which implements the sliding-window strategy with convolutions. A runnable sketch follows.
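Here is the conversion with toy layer sizes (the 14x14/16x16 numbers match the figure; everything else is an assumption): copying an FC layer's weights into a 5x5 convolution lets the same classifier score one 14x14 window, or all four 14x14 windows of a 16x16 input, in a single pass:

import torch
import torch.nn as nn

# A tiny classifier for 14x14 inputs: conv + pool shrink the input to a
# 5x5 map, then an FC layer produces 4 class scores.
conv = nn.Conv2d(3, 8, kernel_size=5)      # 14x14 -> 10x10
pool = nn.MaxPool2d(2)                     # 10x10 -> 5x5
fc = nn.Linear(8 * 5 * 5, 4)               # 4-way classification

# Equivalent convolutional head: the kernel covers the whole 5x5 map.
fc_as_conv = nn.Conv2d(8, 4, kernel_size=5)
fc_as_conv.weight.data = fc.weight.data.view(4, 8, 5, 5)
fc_as_conv.bias.data = fc.bias.data

small = torch.randn(1, 3, 14, 14)
large = torch.randn(1, 3, 16, 16)          # contains four 14x14 windows (stride 2)

print(fc_as_conv(pool(conv(small))).shape)   # [1, 4, 1, 1]: one window
print(fc_as_conv(pool(conv(large))).shape)   # [1, 4, 2, 2]: four windows at once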

Key concept: one pixel of a feature map corresponds to a region of the original image, its receptive field.
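This correspondence can be computed layer by layer (a small helper written for this post, not from the original): a layer with kernel k and stride s grows the receptive field r by (k-1) times the accumulated stride j:

def receptive_field(layers):
    """layers: (kernel, stride) pairs, listed from input to output."""
    r, j = 1, 1                # receptive-field size, accumulated stride
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r, j

# conv5 -> pool2 -> conv5, the toy network above: every output pixel
# "sees" a 14x14 region of the input, and neighbors are 2 pixels apart.
print(receptive_field([(5, 1), (2, 2), (5, 1)]))   # (14, 2)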

3. Overview of Deep-Learning Detection Models

3.1 Deep-learning object detection models

For an overview of detection models, see the survey Object Detection in 20 Years: A Survey.

3.2 One-stage vs. two-stage

One-stage:
The input image passes through the network once and directly yields class and location predictions.

Two-stage:
The image first goes through a proposal-generation stage, which outputs the locations of candidate boxes; the candidate regions then pass through an ROI pooling layer and further convolutions to produce the final class and location predictions.

Note:
Proposal generation outputs candidate-box coordinates, not a feature map; the number of candidates usually defaults to 2000.

3.3 Pipelines of classic detection models

3.3.1 One-stage: YOLO

Overall pipeline: the input is a 3-D tensor; convolutions produce a feature vector, which is resized into a feature map; the map is divided into an n×n grid (each cell corresponding to a region of the original image), and each cell then performs box regression and classification. A shape-level sketch of the output follows.
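A shape-level sketch of a YOLO-v1-style output (S, B, and C below are the paper's defaults, taken here as assumptions): each of the SxS grid cells predicts B boxes of [x, y, w, h, confidence] plus C class probabilities:

import torch

S, B, C = 7, 2, 20                         # YOLO v1 defaults (assumed)
out = torch.randn(1, S, S, B * 5 + C)      # network output reshaped onto the grid

cell = out[0, 3, 4]                        # predictions of the cell at row 3, col 4
boxes = cell[:B * 5].view(B, 5)            # B boxes: [x, y, w, h, confidence]
class_probs = cell[B * 5:]                 # C class scores for this cell
print(boxes.shape, class_probs.shape)      # torch.Size([2, 5]) torch.Size([20])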

3.3.2 Two-stage: Faster RCNN

Overall pipeline: the input is a 3-D tensor; the backbone network produces a feature map. The RPN classifies each anchor box on that map as foreground or background and regresses candidate boxes for the foreground regions; the candidates are sorted by classification score in descending order and filtered with non-maximum suppression (NMS), yielding 2000 proposals (the NMS step is sketched below).
The adaptive pooling layer in the ROI head pools each candidate region to a fixed size; fully connected layers then perform the final location regression and classification.
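The sort-then-suppress step can be sketched with torchvision.ops.nms (toy boxes and scores; the 2000 cap mirrors the default mentioned above). nms accepts scores in any order and returns the kept indices sorted by descending score:

import torch
from torchvision.ops import nms

# Toy proposals: two heavily overlapping boxes and one separate box.
boxes = torch.tensor([[ 10.,  10., 100., 100.],
                      [ 12.,  12., 102., 102.],
                      [200., 200., 280., 290.]])
scores = torch.tensor([0.9, 0.8, 0.7])

keep = nms(boxes, scores, iou_threshold=0.7)   # drops the lower-scoring duplicate
proposals = boxes[keep][:2000]                 # keep at most 2000 proposals
print(keep)                                    # tensor([0, 2])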

4. Training Faster RCNN in PyTorch

4.1 Faster RCNN code structure

  1. torchvision.models.detection.fasterrcnn_resnet50_fpn() returns a FasterRCNN instance
  2. class FasterRCNN(GeneralizedRCNN)
  3. class GeneralizedRCNN(nn.Module)

Inheritance chain: FasterRCNN inherits from GeneralizedRCNN, which inherits from nn.Module. This can be checked directly:
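For example (constructing the model without downloading weights, just to inspect the classes):

import torch.nn as nn
import torchvision
from torchvision.models.detection.faster_rcnn import FasterRCNN
from torchvision.models.detection.generalized_rcnn import GeneralizedRCNN

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, pretrained_backbone=False)
print(isinstance(model, FasterRCNN))        # True
print(isinstance(model, GeneralizedRCNN))   # True
print(isinstance(model, nn.Module))         # True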

The main steps in the forward() of FasterRCNN/GeneralizedRCNN:

  1. features = self.backbone(images.tensors)
  2. proposals, proposal_losses = self.rpn(images, features, targets)
  3. detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)

Part 1: the backbone extracts features from the input image, producing feature maps.

Part 2: two modules, the RPN plus NMS.
rpn: maps the input features to classification vectors and box-regression vectors (implemented by self.head()).
NMS (filter_proposals): selects num_anchors_per_level proposals out of the hundreds of thousands of candidates.

Part 3: roi_heads (the ROI pooling layer and everything after it).
select_training_samples(): further samples 512 of the 2000 proposals.
box_roi_pool(): crops each proposal region out of the feature maps, producing fixed-size feature maps (sketched after this list).
box_head(): passes the pooled features through two FC layers, yielding the box_features vectors.
box_predictor(): maps each feature vector through two FC heads, producing two vectors (class and location).
The final output, detections (classes, box coordinates, and loss values), is then mapped back to the original image size.
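The fixed-size cropping done by box_roi_pool can be sketched with torchvision.ops.roi_align (toy feature map and boxes; the real model uses MultiScaleRoIAlign over the FPN levels):

import torch
from torchvision.ops import roi_align

feat = torch.randn(1, 256, 50, 50)           # feature map from the backbone
# Two proposals in image coordinates; spatial_scale maps them onto the
# feature map (an 800x800 image downsampled to 50x50 gives 1/16).
boxes = [torch.tensor([[ 32.,  32., 256., 256.],
                       [128., 160., 480., 640.]])]

pooled = roi_align(feat, boxes, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)                          # torch.Size([2, 256, 7, 7])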

Main components of Faster RCNN:

  1. backbone
  2. rpn
  3. filter_proposals(NMS)
  4. roi_heads

Data flow through Faster RCNN:

  1. Feature map: [256, h_f, w_f]
  2. 2-way softmax (foreground/background): [num_anchors, h_f, w_f]
  3. Regressors: [num_anchors*4, h_f, w_f]
  4. NMS OUT: [n_proposals=2000, 4]
  5. ROI Layer: [512, 256, 7, 7] (512 sampled from the 2000 proposals)
  6. FC1 FC2: [512, 1024]
  7. (c+1)-way softmax: [512, c+1]
  8. Regressors: [512, (c+1)*4]

4.2 Faster RCNN: pedestrian detection

Data: the PennFudanPed dataset, 170 pedestrian images with 345 labeled pedestrians
Official page: http://www.cis.upenn.edu/~jshi/ped_html/
Model: fine-tune fasterrcnn_resnet50_fpn
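The training script below imports PennFudanDataset from tools/my_dataset.py, which the post does not show. Here is a minimal sketch of what it presumably looks like, closely following the official torchvision detection tutorial (the PNGImages/PedMasks layout and the _mask.png suffix come from the dataset itself):

import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class PennFudanDataset(Dataset):
    def __init__(self, data_dir, transforms=None):
        self.transforms = transforms
        self.img_dir = os.path.join(data_dir, "PNGImages")
        self.mask_dir = os.path.join(data_dir, "PedMasks")
        self.names = sorted(n[:-4] for n in os.listdir(self.img_dir) if n.endswith(".png"))

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.img_dir, self.names[idx] + ".png")).convert("RGB")
        mask = np.array(Image.open(os.path.join(self.mask_dir, self.names[idx] + "_mask.png")))

        # Each pedestrian instance is painted with its own pixel value.
        obj_ids = np.unique(mask)[1:]          # drop the background value 0
        boxes = []
        for oid in obj_ids:
            ys, xs = np.where(mask == oid)
            boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])

        target = {
            "boxes": torch.as_tensor(boxes, dtype=torch.float32),
            "labels": torch.ones((len(obj_ids),), dtype=torch.int64),  # 1 = pedestrian
            "image_id": torch.tensor([idx]),
        }
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.names)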

# -*- coding: utf-8 -*-
"""
# @file name  : fasterrcnn_train.py
# @author     : TingsongYu https://github.com/TingsongYu
# @date       : 2019-11-30
# @brief      : train Faster RCNN
"""

import os
import time
import torch.nn as nn
import torch
import random
import numpy as np
import torchvision.transforms as transforms
import torchvision
from PIL import Image
from tools.my_dataset import PennFudanDataset
from tools.common_tools import set_seed
from torch.utils.data import DataLoader
from matplotlib import pyplot as plt
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.transforms import functional as F

set_seed(1)  # set the random seed

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# classes_coco
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]


def vis_bbox(img, output, classes, max_vis=40, prob_thres=0.4):
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.imshow(img, aspect='equal')
    
    out_boxes = output["boxes"].cpu()
    out_scores = output["scores"].cpu()
    out_labels = output["labels"].cpu()
    
    num_boxes = out_boxes.shape[0]
    for idx in range(0, min(num_boxes, max_vis)):

        score = out_scores[idx].numpy()
        bbox = out_boxes[idx].numpy()
        class_name = classes[out_labels[idx]]

        if score < prob_thres:
            continue

        ax.add_patch(plt.Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0], bbox[3] - bbox[1], fill=False,
                                   edgecolor='red', linewidth=3.5))
        ax.text(bbox[0], bbox[1] - 2, '{:s} {:.3f}'.format(class_name, score), bbox=dict(facecolor='blue', alpha=0.5),
                fontsize=14, color='white')
    plt.show()
    plt.close()


class Compose(object):
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target


class RandomHorizontalFlip(object):
    def __init__(self, prob):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:
            height, width = image.shape[-2:]
            image = image.flip(-1)
            bbox = target["boxes"]
            bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
            target["boxes"] = bbox
        return image, target


class ToTensor(object):
    def __call__(self, image, target):
        image = F.to_tensor(image)
        return image, target


if __name__ == "__main__":

    # config
    LR = 0.001
    num_classes = 2
    batch_size = 1
    start_epoch, max_epoch = 0, 30
    train_dir = os.path.join(BASE_DIR, "..", "..", "data", "PennFudanPed")
    train_transform = Compose([ToTensor(), RandomHorizontalFlip(0.5)])

    # step 1: data
    train_set = PennFudanDataset(data_dir=train_dir, transforms=train_transform)

    # collate function: keep images and targets as tuples, since they vary in size
    def collate_fn(batch):
        return tuple(zip(*batch))

    train_loader = DataLoader(train_set, batch_size=batch_size, collate_fn=collate_fn)

    # step 2: model
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes) # replace the pre-trained head with a new one

    model.to(device)

    # step 3: loss
    # in lib/python3.6/site-packages/torchvision/models/detection/roi_heads.py
    # def fastrcnn_loss(class_logits, box_regression, labels, regression_targets)

    # step 4: optimizer scheduler
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=LR, momentum=0.9, weight_decay=0.0005)
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

    # step 5: Iteration

    for epoch in range(start_epoch, max_epoch):

        model.train()
        for iter, (images, targets) in enumerate(train_loader):

            images = list(image.to(device) for image in images)
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

            # if torch.cuda.is_available():
            #     images, targets = images.to(device), targets.to(device)

            loss_dict = model(images, targets)  # images is list; targets is [ dict["boxes":**, "labels":**], dict[] ]

            losses = sum(loss for loss in loss_dict.values())

            print("Training:Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} ".format(
                epoch, max_epoch, iter + 1, len(train_loader), losses.item()))

            optimizer.zero_grad()
            losses.backward()
            optimizer.step()

        lr_scheduler.step()

    # test
    model.eval()

    # config
    vis_num = 5
    vis_dir = os.path.join(BASE_DIR, "..", "..", "data", "PennFudanPed", "PNGImages")
    img_names = list(filter(lambda x: x.endswith(".png"), os.listdir(vis_dir)))
    random.shuffle(img_names)
    preprocess = transforms.Compose([transforms.ToTensor(), ])

    for i in range(0, vis_num):

        path_img = os.path.join(vis_dir, img_names[i])
        # preprocess
        input_image = Image.open(path_img).convert("RGB")
        img_chw = preprocess(input_image)

        # to device
        if torch.cuda.is_available():
            img_chw = img_chw.to('cuda')
            model.to('cuda')

        # forward
        input_list = [img_chw]
        with torch.no_grad():
            tic = time.time()
            print("input img tensor shape:{}".format(input_list[0].shape))
            output_list = model(input_list)
            output_dict = output_list[0]
            print("pass: {:.3f}s".format(time.time() - tic))

        # visualization
        vis_bbox(input_image, output_dict, COCO_INSTANCE_CATEGORY_NAMES, max_vis=20, prob_thres=0.5)
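After training, the fine-tuned weights can be saved and restored in the usual way (a usage note continuing from the script above; the checkpoint path is illustrative):

# save the fine-tuned detector
path_ckpt = os.path.join(BASE_DIR, "fasterrcnn_pennfudan.pth")
torch.save(model.state_dict(), path_ckpt)

# reload later: rebuild the same architecture, then load the weights
model2 = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
in_features = model2.roi_heads.box_predictor.cls_score.in_features
model2.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model2.load_state_dict(torch.load(path_ckpt, map_location=device))
model2.eval()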

