(Repost) TensorFlow in Practice: Chapter 8, Part 1 (Mask R-CNN Introduction and Implementation)

https://blog.csdn.net/u011974639/article/details/78483779

Introduction

Paper: Mask R-CNN
Source code: matterport - github

The code comes from the matterport team; you can fork their work on GitHub.

Software Requirements

This Mask R-CNN implementation is based on Python 3, Keras, and TensorFlow:

  • Python 3.4+
  • TensorFlow 1.3+
  • Keras 2.0.8+
  • Jupyter Notebook
  • Numpy, skimage, scipy

A recent Anaconda3 installation together with a TensorFlow-GPU build is recommended.



Mask R-CNN Paper Review

Mask R-CNN (MRCNN for short) builds on the R-CNN series, FPN, FCIS, and related work. The idea is simple: Faster R-CNN produces two outputs for each candidate region, a class label and a bbox offset; MRCNN adds a branch on top of Faster R-CNN that produces a third output, the object mask.

First, a quick recap of Faster R-CNN. It consists of two stages: a Region Proposal Network (RPN) and the underlying Fast R-CNN model.

  • The RPN generates region proposals.

  • Fast R-CNN extracts features from each proposal with an RoIPool layer and performs object classification and bbox regression.

MRCNN uses the same two stages as Faster R-CNN, with an identical first stage (the RPN). In the second stage, in addition to predicting the class and the bbox regression, it predicts a binary mask for each RoI in parallel.

This turns the whole task into a simple multi-stage pipeline and decouples the sub-tasks from one another; as things stand, this brings quite a few benefits.

Main Contributions

Loss Function

As before, a multi-task loss is used; for each RoI it is defined as

L = L_cls + L_box + L_mask

For each RoI, the mask branch produces a K·m²-dimensional output: K binary masks of resolution m×m, one for each of the K classes.
The mask-branch loss is computed as follows:

  1. The mask branch predicts K binary masks of size m×m for each RoI, one per class.
  2. The class of the RoI (its ground-truth class during training) selects one of the K masks.
  3. Only that selected mask contributes to the RoI's mask loss Lmask.


For the predicted binary mask output, a per-pixel sigmoid is applied, and the overall loss is defined as the mean binary cross-entropy.
Predicting K outputs lets each class generate its own mask without competition between classes, which decouples mask prediction from class prediction. This differs from the FCN approach, which applies a per-pixel softmax with a multinomial cross-entropy; that couples the classes, introduces inter-class competition, and ultimately degrades the segmentation.
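
To make the loss concrete, here is a minimal NumPy sketch of the idea (an illustrative assumption, not the repository's implementation): per-pixel sigmoid probabilities, binary cross-entropy averaged over the m×m mask, computed only on the channel belonging to the RoI's ground-truth class.

    import numpy as np

    def mask_loss(pred_masks, gt_masks, gt_class_ids, eps=1e-7):
        """pred_masks: [num_rois, m, m, K] sigmoid probabilities
        gt_masks: [num_rois, m, m] binary ground-truth masks
        gt_class_ids: [num_rois] class index per RoI (0 = background)"""
        losses = []
        for i, c in enumerate(gt_class_ids):
            if c == 0:                  # background RoIs contribute no mask loss
                continue
            p = np.clip(pred_masks[i, :, :, c], eps, 1 - eps)
            y = gt_masks[i]
            bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
            losses.append(bce.mean())   # average over the m*m pixels
        return np.mean(losses) if losses else 0.0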

From Mask Representation to the RoIAlign Layer

In Faster R-CNN, predicting the class label or the bbox offsets squeezes the feature map through FC layers into a vector, and that squeeze loses the spatial (planar) information. A mask, by contrast, is a spatial encoding of the input object, so it is best represented in a convolutional form that keeps the pixel-to-pixel correspondence.

Outputting the mask does not require collapsing to a vector, so an FCN (Fully Convolutional Network) can be used, which is both efficient and light on parameters. To better preserve the pixel correspondence between the RoI input and the FCN output features, the RoIAlign layer is proposed.
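
As an illustration of what such an FCN-style mask head can look like, here is a rough Keras sketch (an assumption for illustration, not the repository's mask-head code): a few convolutions over the RoI feature, one transposed convolution to upsample, and a 1×1 convolution producing per-class sigmoid masks.

    from keras import layers, models

    def simple_mask_head(pool_size=14, depth=256, num_classes=81):
        """Toy FCN mask head: [pool_size, pool_size, depth] RoI feature -> K masks."""
        inp = layers.Input(shape=(pool_size, pool_size, depth))
        x = inp
        for _ in range(4):                                   # keep the spatial layout with convs
            x = layers.Conv2D(256, 3, padding="same", activation="relu")(x)
        x = layers.Conv2DTranspose(256, 2, strides=2, activation="relu")(x)  # 14 -> 28
        masks = layers.Conv2D(num_classes, 1, activation="sigmoid")(x)       # one mask per class
        return models.Model(inp, masks)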

First, a recap of the RoIPool layer:

Its core idea is to take RoIs of different sizes, quantize each RoI on the feature map into a grid of bins, and then pool within each bin to extract features.

In SPPNet the RoI is pooled at several granularities; Faster R-CNN uses only a single granularity of feature map.

There is a problem here: in the quantization above, the actual computation uses [x/16], where [·] denotes rounding. This quantization/rounding is reasonably robust when extracting features for detection (object detection tolerates small translations), but it is quite harmful for mask localization.
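
The effect of the rounding is easy to see in a toy sketch (an illustrative assumption, not code from the repository): RoIPool snaps coordinates to whole feature-map cells, while RoIAlign keeps the fractional coordinates and later samples them with bilinear interpolation.

    import numpy as np

    def roipool_coords(box, stride=16):
        # box: (x1, y1, x2, y2) in image pixels; rounding loses sub-cell precision
        return np.round(np.array(box) / stride).astype(int)

    def roialign_coords(box, stride=16):
        # no rounding: coordinates stay fractional and are sampled bilinearly later
        return np.array(box, dtype=float) / stride

    print(roipool_coords((23, 40, 220, 315)))   # snapped to whole cells
    print(roialign_coords((23, 40, 220, 315)))  # fractional cell coordinates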

To address this, the RoIAlign layer is proposed: it avoids quantizing the RoI boundaries or bins and uses bilinear interpolation when sampling the feature map. The surrounding architecture follows the FPN paper.


The original Faster R-CNN makes its predictions from the topmost feature map only, which discards high-resolution information; the visible effect is that small objects are missed and fine detail is lost. Inspired by SSD, FPN predicts from multiple feature levels instead. It uses a top-down architecture that upsamples high-level features and merges them into the lower levels (so the result has both semantics and resolution); the bilinear interpolation mentioned in the MRCNN paper is the interpolation used in this top-down upsampling path.
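
For reference, this is the bilinear sampling operation that RoIAlign relies on, written as a minimal NumPy sketch (an assumption for illustration, not the repository's code):

    import numpy as np

    def bilinear_sample(fmap, y, x):
        """fmap: [H, W] feature map; (y, x): fractional coordinates."""
        y0, x0 = int(np.floor(y)), int(np.floor(x))
        y1, x1 = min(y0 + 1, fmap.shape[0] - 1), min(x0 + 1, fmap.shape[1] - 1)
        dy, dx = y - y0, x - x0
        return (fmap[y0, x0] * (1 - dy) * (1 - dx) +
                fmap[y0, x1] * (1 - dy) * dx +
                fmap[y1, x0] * dy * (1 - dx) +
                fmap[y1, x1] * dy * dx)

    fmap = np.arange(16, dtype=float).reshape(4, 4)
    print(bilinear_sample(fmap, 1.25, 2.5))  # weighted mix of the 4 neighbouring cells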

Summary

MRCNN achieves excellent results. Beyond the contribution of the mask branch itself, much of this comes from the stronger backbone: the paper uses ResNeXt-101 with an FPN top-down pathway, which has very strong feature-learning capacity, and the experiments also mix in many engineering tricks.

That said, MRCNN's drawbacks are also obvious: it needs a lot of compute and is slow, so there is still a long way to go before practical deployment. Looking forward to further progress from the community!



How to Use the Code

The project's source code is at: github/Mask R-CNN

  1. Set up the runtime environment

  • Python 3.4+
  • TensorFlow 1.3+
  • Keras 2.0.8+
  • Jupyter Notebook
  • Numpy, skimage, scipy, Pillow (installing Anaconda3 covers all of these)
  • cv2
  • Download the code

    • On Linux, clone the repo directly:

    git clone https://github.com/matterport/Mask_RCNN.git
  • On Windows, just download the code from the address above.

  • Download the weights pre-trained on the COCO dataset (mask_rcnn_coco.h5) from the releases page.

  • To train or test on the COCO dataset you need pycocotools: clone it, run make to build the files, and copy the result into the project directory (see the README.md in the repo for details).

  • To use the COCO dataset you will also need the COCO data itself.

  • All the code analysis below is run in Jupyter.



    Code Analysis: Data Preprocessing

    Project source code: matterport - github

    inspect_data.ipynb walks through the preprocessing steps used to prepare the training data.

    Imports

    The imported coco package needs the data-handling code from coco/PythonAPI; compile it locally with make, then copy the generated pycocotools into the project root, i.e. the same directory as this inspect_data.ipynb.

    import os
    import sys
    import itertools
    import math
    import logging
    import json
    import re
    import random
    from collections import OrderedDict
    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    import matplotlib.patches as patches
    import matplotlib.lines as lines
    from matplotlib.patches import Polygon
    
    import utils
    import visualize
    from visualize import display_images
    import model as modellib
    from model import log
    
    %matplotlib inline 
    
    ROOT_DIR = os.getcwd()
    
    # Pick one of the two blocks below
    # import shapes
    # config = shapes.ShapesConfig()    # dataset generated in code; introduced in a later section

    # MS COCO dataset
    import coco
    config = coco.CocoConfig()
    COCO_DIR = "/root/模型復現/Mask_RCNN-master/coco"  # location of the COCO data
    

    Load the dataset

    The COCO training set contains 82,081 images in 81 classes.

    # COCO is used here
    if config.NAME == 'shapes':
        dataset = shapes.ShapesDataset()
        dataset.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
    elif config.NAME == "coco":
        dataset = coco.CocoDataset()
        dataset.load_coco(COCO_DIR, "train")
    
    # Must call before using the dataset
    dataset.prepare()
    
    print("Image Count: {}".format(len(dataset.image_ids)))
    print("Class Count: {}".format(dataset.num_classes))
    for i, info in enumerate(dataset.class_info):
        print("{:3}. {:50}".format(i, info['name']))
    
    >>>
    >>>
    loading annotations into memory...
    Done (t=7.68s)
    creating index...
    index created!
    Image Count: 82081
    Class Count: 81
      0. BG                                                
      1. person                                            
      2. bicycle   
     ...
     77. scissors                                          
     78. teddy bear                                        
     79. hair drier                                        
     80. toothbrush

    Pick a few random images and take a look:

    # Load and display a few random images with their masks
    image_ids = np.random.choice(dataset.image_ids, 4)
    for image_id in image_ids:
        image = dataset.load_image(image_id)
        mask, class_ids = dataset.load_mask(image_id)
        visualize.display_top_masks(image, mask, class_ids, dataset.class_names)


    Bounding Boxes(bbox)

    Here we do not use the bbox coordinates provided by the dataset; instead we compute bboxes from the masks, which lets us treat bboxes the same way across datasets. And because the bbox is computed from the mask, scaling, rotating, or cropping the image is much easier than it would be with bboxes derived from the image.
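
    As a rough illustration of what utils.extract_bboxes computes (a simplified sketch under that assumption, not the repo's exact code), the bbox of one instance is simply the extent of its non-zero mask pixels:

    import numpy as np

    def bbox_from_mask(mask):
        """mask: [H, W] binary array for one instance -> (y1, x1, y2, x2)."""
        rows = np.any(mask, axis=1)
        cols = np.any(mask, axis=0)
        if not rows.any():                       # empty mask -> degenerate box
            return np.zeros(4, dtype=np.int32)
        y1, y2 = np.where(rows)[0][[0, -1]]
        x1, x2 = np.where(cols)[0][[0, -1]]
        return np.array([y1, x1, y2 + 1, x2 + 1], dtype=np.int32)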

    # Load random image and mask.
    image_id = random.choice(dataset.image_ids)
    image = dataset.load_image(image_id)
    mask, class_ids = dataset.load_mask(image_id)
    # Compute Bounding box
    bbox = utils.extract_bboxes(mask)
    
    # Display image and additional stats
    print("image_id ", image_id, dataset.image_reference(image_id))
    log("image", image)
    log("mask", mask)
    log("class_ids", class_ids)
    log("bbox", bbox)
    # Display image and instances
    visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)
    
    >>>
    >>>
    image_id  41194 http://cocodataset.org/#explore?id=190360
    image                    shape: (428, 640, 3)         min:    0.00000  max:  255.00000
    mask                     shape: (428, 640, 5)         min:    0.00000  max:    1.00000
    class_ids                shape: (5,)                  min:    1.00000  max:   59.00000
    bbox                     shape: (5, 4)                min:    1.00000  max:  640.00000
    
    


    Resize the images

    Training is done in batches of several images, so the model needs a fixed input size. The training images are therefore resized to a fixed size (1024×1024) while keeping the aspect ratio; if an image is not square, its borders are padded with zeros. (This is discussed in the R-CNN paper.)

    Note that when the original image is resized, the corresponding mask has to be resized too. Since our bboxes are computed from the masks, no further code changes are needed.
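
    The resizing logic is roughly the following (a simplified sketch, assuming behaviour similar to utils.resize_image, not the repo's exact code): scale by the same factor on both axes, zero-pad to a square canvas, and keep the scale and padding so the masks can be transformed the same way.

    import numpy as np
    import scipy.ndimage

    def resize_with_padding(image, max_dim=1024):
        h, w = image.shape[:2]
        scale = max_dim / max(h, w)                       # keep the aspect ratio
        image = scipy.ndimage.zoom(image, (scale, scale, 1), order=1)
        h, w = image.shape[:2]
        top, left = (max_dim - h) // 2, (max_dim - w) // 2
        padding = [(top, max_dim - h - top), (left, max_dim - w - left), (0, 0)]
        return np.pad(image, padding, mode='constant'), scale, padding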

    # Load random image and mask.
    image_id = np.random.choice(dataset.image_ids, 1)[0]
    image = dataset.load_image(image_id)
    mask, class_ids = dataset.load_mask(image_id)
    original_shape = image.shape
    # Resize to the fixed size
    image, window, scale, padding = utils.resize_image(
        image, 
        min_dim=config.IMAGE_MIN_DIM, 
        max_dim=config.IMAGE_MAX_DIM,
        padding=config.IMAGE_PADDING)
    mask = utils.resize_mask(mask, scale, padding)  # the mask must be resized as well
    # Compute Bounding box
    bbox = utils.extract_bboxes(mask)
    
    # Display image and additional stats
    print("image_id: ", image_id, dataset.image_reference(image_id))
    print("Original shape: ", original_shape)
    log("image", image)
    log("mask", mask)
    log("class_ids", class_ids)
    log("bbox", bbox)
    # Display image and instances
    visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)
    
    >>>
    >>>
    image_id:  6104 http://cocodataset.org/#explore?id=139889
    Original shape:  (426, 640, 3)
    image                    shape: (1024, 1024, 3)       min:    0.00000  max:  255.00000
    mask                     shape: (1024, 1024, 2)       min:    0.00000  max:    1.00000
    class_ids                shape: (2,)                  min:   24.00000  max:   24.00000
    bbox                     shape: (2, 4)                min:  169.00000  max:  917.00000

    The original image is scaled up from (426, 640, 3) to (1024, 1024, 3); the top and bottom are padded with zeros (the black regions):


    Mini Mask

    When training on high-resolution images, the binary mask representing each object also becomes very large. For a 1024×1024 training image, one object's mask takes about 1 MB of memory (one boolean per pixel); an image with 100 objects needs 100 MB. And the mask matrix is mostly zeros, which is a lot of wasted space.

    To save memory and speed up training, we optimize the mask representation: instead of storing all those zeros, we compress the data by storing only the region that actually carries values, similar in spirit to a compression algorithm.

    • We store mask pixels only inside the object's bounding box, rather than over the whole image. Most objects are small relative to the image, so the saving comes from not storing the zeros around the object.
    • The mask is resized to a small fixed size, 56×56. For large objects this loses some precision, but most annotations are not very accurate anyway, so the loss is usually negligible. (The mini-mask size can be set in the Config class.)

    In short, while preparing the data we first compute the bbox from the annotated mask, then use that bbox to re-encode the mask. This standardizes the processing and reduces both memory use and computation.
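
    In code, the mini-mask idea boils down to something like this sketch (an assumption in the spirit of utils.minimize_mask / utils.expand_mask, not the exact implementation): crop the mask to its bbox, shrink it to a fixed small shape, and reverse the operation when a full-size mask is needed.

    import numpy as np
    import skimage.transform

    def minimize_mask(bbox, mask, mini_shape=(56, 56)):
        y1, x1, y2, x2 = bbox
        crop = mask[y1:y2, x1:x2].astype(float)
        mini = skimage.transform.resize(crop, mini_shape, order=1)
        return mini >= 0.5                            # back to a binary mask

    def expand_mask(bbox, mini_mask, image_shape):
        y1, x1, y2, x2 = bbox
        crop = skimage.transform.resize(mini_mask.astype(float), (y2 - y1, x2 - x1), order=1)
        full = np.zeros(image_shape[:2], dtype=bool)
        full[y1:y2, x1:x2] = crop >= 0.5
        return full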

    image_id = np.random.choice(dataset.image_ids, 1)[0]
    # Use load_image_gt to get the bbox and mask
    image, image_meta, bbox, mask = modellib.load_image_gt(
        dataset, config, image_id, use_mini_mask=False)
    
    log("image", image)
    log("image_meta", image_meta)
    log("bbox", bbox)
    log("mask", mask)
    
    display_images([image]+[mask[:,:,i] for i in range(min(mask.shape[-1], 7))])
    
    >>>
    >>>
    image                    shape: (1024, 1024, 3)       min:    0.00000  max:  252.00000
    image_meta               shape: (89,)                 min:    0.00000  max: 66849.00000
    bbox                     shape: (1, 5)                min:   62.00000  max:  987.00000
    mask                     shape: (1024, 1024, 1)       min:    0.00000  max:    1.00000
    

    Pick a random image; you can see the objects are small relative to the image itself:


    visualize.display_instances(image, bbox[:,:4], mask, bbox[:,4], dataset.class_names)

    Calling load_image_gt with use_mini_mask=True enables the mini-mask handling:

    # load_image_gt integrates the mini-mask handling
    image, image_meta, bbox, mask = modellib.load_image_gt(
        dataset, config, image_id, augment=True, use_mini_mask=True)
    log("mask", mask)
    display_images([image]+[mask[:,:,i] for i in range(min(mask.shape[-1], 7))])
    
    >>>
    >>>
    mask                     shape: (56, 56, 1)           min:    0.00000  max:    1.00000


    To visualize the effect, expand the mini-mask back to a full-image mask with expand_mask and draw it again:

    mask = utils.expand_mask(bbox, mask, image.shape)
    visualize.display_instances(image, bbox[:,:4], mask, bbox[:,4], dataset.class_names)

    You can see the jagged edges, a side effect of the compression, but overall the result is acceptable.

    Anchors

    Anchors were introduced in Faster R-CNN.
    The model works with feature maps at several levels and therefore with a very large number of anchors, so handling the anchor ordering properly matters: for example, the order of the anchors has to match the order in which the convolutions process the feature maps.

    For an FPN, the anchor order has to match the order of the convolutional outputs:

    • Sort by pyramid level first: all anchors of the first level, then all anchors of the second level, and so on. This makes it easy to separate anchors by level.
    • Within each level, order anchors following the feature-map processing sequence. A convolution typically scans a feature map from the top-left corner, row by row, left to right.
    • For each feature-map cell, the anchors of the different aspect ratios may be in any order, as long as it matches the ratio order passed to the corresponding functions.

    Anchor stride: in an FPN, the first few feature maps are high resolution. For example, with a 1024×1024 input the first feature map is 256×256, which would produce roughly 200K anchors (256×256×3); each of these anchors is 32×32, and their stride relative to the image is 4 pixels (1024/256 = 4), so they overlap heavily. Generating anchors for only a subset of the feature-map cells reduces the load significantly: with an anchor stride of 2, the number of anchors drops by a factor of 4.

    Here we use an anchor stride of 2, unlike the paper. Three aspect ratios ([0.5, 1, 2]) are configured in the Config class. Taking the first feature map (256×256) as an example, the anchor count is feature_map² × ratios / stride² = 256 × 256 × 3 / 2² = 49152.
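
    The per-level counts printed below can be reproduced with a couple of lines (a small sketch, assuming 3 ratios per cell and an anchor stride of 2):

    backbone_shapes = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
    ratios, anchor_stride = 3, 2
    counts = [(h * w * ratios) // anchor_stride ** 2 for h, w in backbone_shapes]
    print(counts, sum(counts))   # [49152, 12288, 3072, 768, 192] 65472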

    # Generate anchors
    anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES, 
                                              config.RPN_ANCHOR_RATIOS,
                                              config.BACKBONE_SHAPES,
                                              config.BACKBONE_STRIDES, 
                                              config.RPN_ANCHOR_STRIDE)
    
    # Print summary of anchors
    print("Scales: ", config.RPN_ANCHOR_SCALES)
    print("ratios: {}, \nAnchors_per_cell:{}".format(config.RPN_ANCHOR_RATIOS , len(config.RPN_ANCHOR_RATIOS)))
    print("backbone_shapes: ",config.BACKBONE_SHAPES)
    print("backbone_strides: ",config.BACKBONE_STRIDES)
    print("rpn_anchor_stride: ",config.RPN_ANCHOR_STRIDE)
    
    anchors_per_cell = len(config.RPN_ANCHOR_RATIOS)  # 3 anchors per cell, used below
    num_levels = len(config.BACKBONE_SHAPES)
    print("Count: ", anchors.shape[0])
    print("Levels: ", num_levels)
    anchors_per_level = []
    for l in range(num_levels):
        num_cells = config.BACKBONE_SHAPES[l][0] * config.BACKBONE_SHAPES[l][1]
        anchors_per_level.append(anchors_per_cell * num_cells // config.RPN_ANCHOR_STRIDE**2)
        print("Anchors in Level {}: {}".format(l, anchors_per_level[l]))
    
    
    >>>
    >>>
    Scales:  (32, 64, 128, 256, 512)
    ratios: [0.5, 1, 2], 
     anchors_per_cell:3
    backbone_shapes:  [[256 256] [128 128] [ 64  64]  [ 32  32]  [ 16  16]]
    backbone_strides:  [4, 8, 16, 32, 64]
    rpn_anchor_stride:  2
    Count:  65472
    Levels:  5
    Anchors in Level 0: 49152
    Anchors in Level 1: 12288
    Anchors in Level 2: 3072
    Anchors in Level 3: 768
    Anchors in Level 4: 192
    

    Look at how the anchors of the different levels are placed at the cell in the center of the image:

    # Load and draw random image
    image_id = np.random.choice(dataset.image_ids, 1)[0]
    image, image_meta, _, _ = modellib.load_image_gt(dataset, config, image_id)
    fig, ax = plt.subplots(1, figsize=(10, 10))
    ax.imshow(image)
    
    levels = len(config.BACKBONE_SHAPES)  # 5 levels; 3 anchors per cell x 5 levels = 15 anchors at the center
    
    for level in range(levels):
        colors = visualize.random_colors(levels)
        # Compute the index of the anchors at the center of the image
        level_start = sum(anchors_per_level[:level]) # sum of anchors of previous levels
        level_anchors = anchors[level_start:level_start+anchors_per_level[level]]
        print("Level {}. Anchors: {:6}  Feature map Shape: {}".format(level, level_anchors.shape[0], 
                                                                    config.BACKBONE_SHAPES[level]))
        center_cell = config.BACKBONE_SHAPES[level] // 2
        center_cell_index = (center_cell[0] * config.BACKBONE_SHAPES[level][1] + center_cell[1])
        level_center = center_cell_index * anchors_per_cell 
        center_anchor = anchors_per_cell * (
            (center_cell[0] * config.BACKBONE_SHAPES[level][1] / config.RPN_ANCHOR_STRIDE**2) \
            + center_cell[1] / config.RPN_ANCHOR_STRIDE)
        level_center = int(center_anchor)
    
        # Draw anchors. Brightness show the order in the array, dark to bright.
        for i, rect in enumerate(level_anchors[level_center:level_center+anchors_per_cell]):
            y1, x1, y2, x2 = rect
            p = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=2, facecolor='none',
                                  edgecolor=(i+1)*np.array(colors[level]) / anchors_per_cell)
            ax.add_patch(p)
    
    
    >>>
    >>>
    Level 0. Anchors:  49152  Feature map Shape: [256 256]
    Level 1. Anchors:  12288  Feature map Shape: [128 128]
    Level 2. Anchors:   3072  Feature map Shape: [64 64]
    Level 3. Anchors:    768  Feature map Shape: [32 32]
    Level 4. Anchors:    192  Feature map Shape: [16 16]
    


    Code Analysis: Training the Model on Your Own Dataset

    Project source code: matterport - github

    train_shapes.ipynb shows how to train Mask R-CNN on your own dataset.

    To train the model on your own dataset, create subclasses of the following two base classes:

    • The Config class, which holds the default configuration; subclass it to customize the configuration for your dataset.
    • The Dataset class, which provides a uniform API; a new dataset subclasses it and overrides the relevant methods, so multiple datasets (even at the same time) can be used without touching the model code.

    Both Dataset and Config are base classes; to use them, subclass them and customize as needed. See the demo below for a usage example.

    Imports

    The dataset used in this demo is generated with OpenCV, so there is nothing extra to download. To keep the model running properly, this demo should still be run on a GPU.

    import os
    import sys
    import random
    import math
    import re
    import time
    import numpy as np
    import cv2
    import matplotlib
    import matplotlib.pyplot as plt
    
    from config import Config
    import utils
    import model as modellib
    import visualize
    from model import log
    
    %matplotlib inline 
    
    ROOT_DIR = os.getcwd()  # Root directory of the project
    MODEL_DIR = os.path.join(ROOT_DIR, "logs") # Directory to save logs and trained model
    COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5") # Path to COCO trained weights
    

    Build your own dataset

    Here we use OpenCV to create a dataset directly; it consists of images of simple shapes (triangles, squares, circles) placed on a blank canvas.

    The dataset class must inherit from utils.Dataset, expose its data through a load_shapes() method, and override the following methods:

    • load_image()
    • load_mask()
    • image_reference()

    Dataset construction code:

    class ShapesDataset(utils.Dataset):
        """
        Generates a dataset of images of simple shapes (triangles, squares, circles) placed on a blank canvas.
        """
    
        def load_shapes(self, count, height, width):
            """
            Generate the requested number of fixed-size images.
            count: number of images to generate
            height, width: size of the generated images
            """
            # Register the shape classes
            self.add_class("shapes", 1, "square")
            self.add_class("shapes", 2, "circle")
            self.add_class("shapes", 3, "triangle")
    
            # Generate random shape specs; each image is identified by its image_id
            for i in range(count):
                bg_color, shapes = self.random_image(height, width)
                self.add_image("shapes", image_id=i, path=None,
                               width=width, height=height,
                               bg_color=bg_color, shapes=shapes)
    
        def load_image(self, image_id):
            """
            Generate the image for the given image_id.
            Normally this function would read an image file; here we look up the specs stored in image_info for this image_id and render the image from them.
            """
            info = self.image_info[image_id]
            bg_color = np.array(info['bg_color']).reshape([1, 1, 3])
            image = np.ones([info['height'], info['width'], 3], dtype=np.uint8)
            image = image * bg_color.astype(np.uint8)
            for shape, color, dims in info['shapes']:
                image = self.draw_shape(image, shape, dims, color)
            return image
    
        def image_reference(self, image_id):
            """Return the shapes data of the image."""
            info = self.image_info[image_id]
            if info["source"] == "shapes":
                return info["shapes"]
            else:
                super(self.__class__).image_reference(self, image_id)
    
        def load_mask(self, image_id):
            """依據給定的image_id產生相應的規格形狀的掩膜"""
            info = self.image_info[image_id]
            shapes = info['shapes']
            count = len(shapes)
            mask = np.zeros([info['height'], info['width'], count], dtype=np.uint8)
            for i, (shape, _, dims) in enumerate(info['shapes']):
                mask[:, :, i:i+1] = self.draw_shape(mask[:, :, i:i+1].copy(),
                                                    shape, dims, 1)
            # Handle occlusions
            occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
            for i in range(count-2, -1, -1):
                mask[:, :, i] = mask[:, :, i] * occlusion
                occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
            # Map class names to class IDs.
            class_ids = np.array([self.class_names.index(s[0]) for s in shapes])
            return mask, class_ids.astype(np.int32)
    
        def draw_shape(self, image, shape, dims, color):
            """繪製給定的形狀."""
            # Get the center x, y and the size s
            x, y, s = dims
            if shape == 'square':
                image = cv2.rectangle(image, (x-s, y-s), (x+s, y+s), color, -1)
            elif shape == "circle":
                image = cv2.circle(image, (x, y), s, color, -1)
            elif shape == "triangle":
                points = np.array([[(x, y-s),
                                    (x-s/math.sin(math.radians(60)), y+s),
                                    (x+s/math.sin(math.radians(60)), y+s),
                                    ]], dtype=np.int32)
                image = cv2.fillPoly(image, points, color)
            return image
    
        def random_shape(self, height, width):
            """
            Generate a random shape within the given height/width bounds.

            Returns a tuple of three values:
            * shape: shape name (square, circle, ...)
            * color: shape color (a tuple of 3 values, RGB)
            * dimensions: center position and size of the shape (center_x, center_y, size)
            """
            # Shape
            shape = random.choice(["square", "circle", "triangle"])
            # Color
            color = tuple([random.randint(0, 255) for _ in range(3)])
            # Center x, y
            buffer = 20
            y = random.randint(buffer, height - buffer - 1)
            x = random.randint(buffer, width - buffer - 1)
            # Size
            s = random.randint(buffer, height//4)
            return shape, color, (x, y, s)
    
        def random_image(self, height, width):
            """
            Generate the specs of a random image containing multiple shapes.
            Returns the background color and a list of shape specs that can be used to draw the image.
            """
            # Random background color (three channels)
            bg_color = np.array([random.randint(0, 255) for _ in range(3)])
            # Generate a few random shapes and record their bboxes
            shapes = []
            boxes = []
            N = random.randint(1, 4)
            for _ in range(N):
                shape, color, dims = self.random_shape(height, width)
                shapes.append((shape, color, dims))
                x, y, s = dims
                boxes.append([y-s, x-s, y+s, x+s])
            # Use non-max suppression (IoU threshold 0.3) to avoid heavy overlap between shapes
            keep_ixs = utils.non_max_suppression(np.array(boxes), np.arange(N), 0.3)
            shapes = [s for i, s in enumerate(shapes) if i in keep_ixs]
            return bg_color, shapes
    

    Build some data with the dataset class above and take a look (config here must already be the ShapesConfig instance defined further below):

    # Build the training set, 500 images
    dataset_train = ShapesDataset()
    dataset_train.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
    dataset_train.prepare()
    
    # Build the validation set, 50 images
    dataset_val = ShapesDataset()
    dataset_val.load_shapes(50, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
    dataset_val.prepare()
    
    # Pick 4 random samples
    image_ids = np.random.choice(dataset_train.image_ids, 4)  
    
    for image_id in image_ids:
        image = dataset_train.load_image(image_id)
        mask, class_ids = dataset_train.load_mask(image_id)
        visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)
    


    Now define a matching ShapesConfig class for the dataset constructed above; it gathers the model configuration parameters in one place and must inherit from Config:

    class ShapesConfig(Config):
        """
        Training configuration for the shapes dataset.
        Derives from the base Config class.
        """
        NAME = "shapes" # 該配置類的識別符
    
        # Batch size is 8 (GPUs * images/GPU).
        GPU_COUNT = 1  # number of GPUs
        IMAGES_PER_GPU = 8  # images per GPU (the synthetic images are small, so several fit at once)
    
        # Number of classes (including background)
        NUM_CLASSES = 1 + 3  # background + 3 shapes
    
        # Small images train faster
        IMAGE_MIN_DIM = 128  # short side of the image
        IMAGE_MAX_DIM = 128  # long side of the image
    
        # Use small anchors because the images and objects are small
        RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels
    
        # Reduce the number of training ROIs per image because the images are small and contain few objects.
        # Aim to allow ROI sampling to pick 33% positive ROIs.
        TRAIN_ROIS_PER_IMAGE = 32
    
        STEPS_PER_EPOCH = 100     # the data is simple, so keep each epoch short
    
        VALIDATION_STPES = 5    # few validation steps, since the epochs are small
    
    config = ShapesConfig()
    config.display()
    
    >>>
    >>>
    
    Configurations:
    BACKBONE_SHAPES                [[32 32]
     [16 16]
     [ 8  8]
     [ 4  4]
     [ 2  2]]
    BACKBONE_STRIDES               [4, 8, 16, 32, 64]
    BATCH_SIZE                     8
    BBOX_STD_DEV                   [ 0.1  0.1  0.2  0.2]
    DETECTION_MAX_INSTANCES        100
    DETECTION_MIN_CONFIDENCE       0.7
    DETECTION_NMS_THRESHOLD        0.3
    GPU_COUNT                      1
    IMAGES_PER_GPU                 8
    IMAGE_MAX_DIM                  128
    IMAGE_MIN_DIM                  128
    IMAGE_PADDING                  True
    IMAGE_SHAPE                    [128 128   3]
    LEARNING_MOMENTUM              0.9
    LEARNING_RATE                  0.002
    MASK_POOL_SIZE                 14
    MASK_SHAPE                     [28, 28]
    MAX_GT_INSTANCES               100
    MEAN_PIXEL                     [ 123.7  116.8  103.9]
    MINI_MASK_SHAPE                (56, 56)
    NAME                           shapes
    NUM_CLASSES                    4
    POOL_SIZE                      7
    POST_NMS_ROIS_INFERENCE        1000
    POST_NMS_ROIS_TRAINING         2000
    ROI_POSITIVE_RATIO             0.33
    RPN_ANCHOR_RATIOS              [0.5, 1, 2]
    RPN_ANCHOR_SCALES              (8, 16, 32, 64, 128)
    RPN_ANCHOR_STRIDE              2
    RPN_BBOX_STD_DEV               [ 0.1  0.1  0.2  0.2]
    RPN_TRAIN_ANCHORS_PER_IMAGE    256
    STEPS_PER_EPOCH                100
    TRAIN_ROIS_PER_IMAGE           32
    USE_MINI_MASK                  True
    USE_RPN_ROIS                   True
    VALIDATION_STPES               5
    WEIGHT_DECAY                   0.0001
    
    

    Load the model and train

    With the dataset and its Config ready, load the pre-trained model:

    # The model has two modes: training and inference
    # Create the model in training mode
    model = modellib.MaskRCNN(mode="training", config=config,
                              model_dir=MODEL_DIR)
    
    # Choose which weights to initialize from; here we use the COCO pre-trained weights
    init_with = "coco"  # imagenet, coco, or last
    
    if init_with == "imagenet":
        model.load_weights(model.get_imagenet_weights(), by_name=True)
    elif init_with == "coco":
        # Load weights pre-trained on MS COCO, skipping layers that differ because of the different number of classes
        model.load_weights(COCO_MODEL_PATH, by_name=True,
                           exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", 
                                    "mrcnn_bbox", "mrcnn_mask"])
    elif init_with == "last":
        # Load the last model you trained and continue from it
        model.load_weights(model.find_last()[1], by_name=True)
    

    Train the model

    Since the base layers are initialized from the pre-trained model, training on top of it proceeds in two steps:

    • Train only the head. To avoid destroying what the base layers have learned, freeze all backbone layers and train only the randomly initialized layers; this is done by passing layers='heads' to the train() method.
    • Fine-tune all layers. After training the heads for a while, fine-tune everything with layers='all' so the model adapts better to the new dataset.

    These two steps are the standard recipe for transfer learning.

    1. Train the head

    
    # 通過傳入參數layers="heads" 凍結處理head部分的所有層。可以通過傳入一個正則表達式選擇要訓練的層
    model.train(dataset_train, dataset_val, 
                learning_rate=config.LEARNING_RATE, 
                epochs=1, 
                layers='heads')
    
    >>>
    >>>
    Starting at epoch 0. LR=0.002
    
    Checkpoint Path: /root/Mask_RCNNmaster/logs/shapes20171103T2047/mask_rcnn_shapes_{epoch:04d}.h5
    Selecting layers to train
    fpn_c5p5               (Conv2D)
    fpn_c4p4               (Conv2D)
    fpn_c3p3               (Conv2D)
    fpn_c2p2               (Conv2D)
    fpn_p5                 (Conv2D)
    fpn_p2                 (Conv2D)
    fpn_p3                 (Conv2D)
    fpn_p4                 (Conv2D)
    In model:  rpn_model
        rpn_conv_shared        (Conv2D)
        rpn_class_raw          (Conv2D)
        rpn_bbox_pred          (Conv2D)
    mrcnn_mask_conv1       (TimeDistributed)
    ...
    mrcnn_mask_conv4       (TimeDistributed)
    mrcnn_mask_bn4         (TimeDistributed)
    mrcnn_bbox_fc          (TimeDistributed)
    mrcnn_mask_deconv      (TimeDistributed)
    mrcnn_class_logits     (TimeDistributed)
    mrcnn_mask             (TimeDistributed)
    
    Epoch 1/1
    100/100 [==============================] - 37s 371ms/step - loss: 2.5472 - rpn_class_loss: 0.0244 - rpn_bbox_loss: 1.1118 - mrcnn_class_loss: 0.3692 - mrcnn_bbox_loss: 0.3783 - mrcnn_mask_loss: 0.3223 - val_loss: 1.7634 - val_rpn_class_loss: 0.0143 - val_rpn_bbox_loss: 0.9989 - val_mrcnn_class_loss: 0.1673 - val_mrcnn_bbox_loss: 0.0857 - val_mrcnn_mask_loss: 0.1559
    

    2. Fine-tune all layers

    
    # 通過傳入參數layers="all"所有層
    model.train(dataset_train, dataset_val, 
                learning_rate=config.LEARNING_RATE / 10,
                epochs=2, 
                layers="all")
    
    >>>
    >>>
    
    Starting at epoch 1. LR=0.0002
    
    Checkpoint Path: /root/Mask_RCNN-master/logs/shapes20171103T2047/mask_rcnn_shapes_{epoch:04d}.h5
    Selecting layers to train
    conv1                  (Conv2D)
    bn_conv1               (BatchNorm)
    res2a_branch2a         (Conv2D)
    bn2a_branch2a          (BatchNorm)
    res2a_branch2b         (Conv2D)
    ...
    ...
    res5c_branch2c         (Conv2D)
    bn5c_branch2c          (BatchNorm)
    fpn_c5p5               (Conv2D)
    fpn_c4p4               (Conv2D)
    fpn_c3p3               (Conv2D)
    fpn_c2p2               (Conv2D)
    fpn_p5                 (Conv2D)
    fpn_p2                 (Conv2D)
    fpn_p3                 (Conv2D)
    fpn_p4                 (Conv2D)
    In model:  rpn_model
        rpn_conv_shared        (Conv2D)
        rpn_class_raw          (Conv2D)
        rpn_bbox_pred          (Conv2D)
    mrcnn_mask_conv1       (TimeDistributed)
    mrcnn_mask_bn1         (TimeDistributed)
    mrcnn_mask_conv2       (TimeDistributed)
    mrcnn_class_conv1      (TimeDistributed)
    mrcnn_mask_bn2         (TimeDistributed)
    mrcnn_class_bn1        (TimeDistributed)
    mrcnn_mask_conv3       (TimeDistributed)
    mrcnn_mask_bn3         (TimeDistributed)
    mrcnn_class_conv2      (TimeDistributed)
    mrcnn_class_bn2        (TimeDistributed)
    mrcnn_mask_conv4       (TimeDistributed)
    mrcnn_mask_bn4         (TimeDistributed)
    mrcnn_bbox_fc          (TimeDistributed)
    mrcnn_mask_deconv      (TimeDistributed)
    mrcnn_class_logits     (TimeDistributed)
    mrcnn_mask             (TimeDistributed)
    
    Epoch 2/2
    100/100 [==============================] - 38s 381ms/step - loss: 11.4351 - rpn_class_loss: 0.0190 - rpn_bbox_loss: 0.9108 - mrcnn_class_loss: 0.2085 - mrcnn_bbox_loss: 0.1606 - mrcnn_mask_loss: 0.2198 - val_loss: 11.2957 - val_rpn_class_loss: 0.0173 - val_rpn_bbox_loss: 0.8740 - val_mrcnn_class_loss: 0.1590 - val_mrcnn_bbox_loss: 0.0997 - val_mrcnn_mask_loss: 0.2296
    

    Model Inference

    Inference also needs its own config class, InferenceConfig; most settings are the same as for training:

    class InferenceConfig(ShapesConfig):
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1
    
    inference_config = InferenceConfig()
    
    # Recreate the model in inference mode
    model = modellib.MaskRCNN(mode="inference", 
                              config=inference_config,
                              model_dir=MODEL_DIR)
    
    # Get the path of the last saved weights, or specify the path manually
    # model_path = os.path.join(ROOT_DIR, ".h5 file name here")
    model_path = model.find_last()[1]
    
    # Load the weights
    assert model_path != "", "Provide path to trained weights"
    print("Loading weights from ", model_path)
    model.load_weights(model_path, by_name=True)
    
    # Test on a random image
    image_id = random.choice(dataset_val.image_ids)
    original_image, image_meta, gt_bbox, gt_mask =\
        modellib.load_image_gt(dataset_val, inference_config, 
                               image_id, use_mini_mask=False)
    
    log("original_image", original_image)
    log("image_meta", image_meta)
    log("gt_bbox", gt_bbox)
    log("gt_mask", gt_mask)
    
    visualize.display_instances(original_image, gt_bbox[:,:4], gt_mask, gt_bbox[:,4], 
                                dataset_train.class_names, figsize=(8, 8))
    
    >>>
    >>>
    original_image           shape: (128, 128, 3)         min:   18.00000  max:  231.00000
    image_meta               shape: (12,)                 min:    0.00000  max:  128.00000
    gt_bbox                  shape: (2, 5)                min:    1.00000  max:  115.00000
    gt_mask                  shape: (128, 128, 2)         min:    0.00000  max:    1.00000

    Take a look at a random validation image with its ground truth:


    Run prediction with the model:

    def get_ax(rows=1, cols=1, size=8):
        """返回Matplotlib Axes數組用於可視化.提供中心點控制圖形大小"""
        _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
        return ax
    
    results = model.detect([original_image], verbose=1)  # run detection
    
    r = results[0]
    visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'], 
                                dataset_val.class_names, r['scores'], ax=get_ax())
    
    >>>
    >>>
    Processing 1 images
    image                    shape: (128, 128, 3)         min:   18.00000  max:  231.00000
    molded_images            shape: (1, 128, 128, 3)      min:  -98.80000  max:  127.10000
    image_metas              shape: (1, 12)               min:    0.00000  max:  128.00000


    Compute the AP:

    # Compute VOC-Style mAP @ IoU=0.5
    # Running on 10 images. Increase for better accuracy.
    image_ids = np.random.choice(dataset_val.image_ids, 10)
    APs = []
    for image_id in image_ids:
        # Load the data
        image, image_meta, gt_bbox, gt_mask =\
            modellib.load_image_gt(dataset_val, inference_config,
                                   image_id, use_mini_mask=False)
        molded_images = np.expand_dims(modellib.mold_image(image, inference_config), 0)
        # Run object detection
        results = model.detect([image], verbose=0)
        r = results[0]
        # Compute AP
        AP, precisions, recalls, overlaps =\
            utils.compute_ap(gt_bbox[:,:4], gt_bbox[:,4],
                             r["rois"], r["class_ids"], r["scores"])
        APs.append(AP)
    
    print("mAP: ", np.mean(APs))
    
    >>>
    >>>
    mAP:  0.9


    Code Analysis: Inspecting the Mask R-CNN Model

    Test, debug, and evaluate the Mask R-CNN model.

    Imports

    This part uses two custom COCO subsets: the 5K minival set and the 35K validation-minus-minival set. (They are slow to download from the original source, so instead of the original links I uploaded them to my CSDN page; if you are short on download points, send me a private message.)

    import os
    import sys
    import random
    import math
    import re
    import time
    import numpy as np
    import scipy.misc
    import tensorflow as tf
    import matplotlib
    import matplotlib.pyplot as plt
    import matplotlib.patches as patches
    
    import utils
    import visualize
    from visualize import display_images
    import model as modellib
    from model import log
    
    %matplotlib inline 
    
    ROOT_DIR = os.getcwd()  # Root directory of the project
    MODEL_DIR = os.path.join(ROOT_DIR, "logs") # Directory to save logs and trained model
    COCO_MODEL_PATH = os.path.join(ROOT_DIR, "coco/mask_rcnn_coco.h5")  # Path to trained weights file
    SHAPES_MODEL_PATH = os.path.join(ROOT_DIR, "log/shapes20171103T2047/mask_rcnn_shapes_0002.h5") # Path to Shapes trained weights
    
    # Shapes toy dataset
    # import shapes
    # config = shapes.ShapesConfig()
    
    # MS COCO Dataset
    import coco
    config = coco.CocoConfig()
    COCO_DIR = os.path.join(ROOT_DIR, "coco")  # TODO: enter value here
    
    def get_ax(rows=1, cols=1, size=16):
        """控制繪圖大小"""
        _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
        return ax
    
    # Create an InferenceConfig class for testing the pre-trained model
    class InferenceConfig(config.__class__):
        # Run detection on one image at a time
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1
    
    config = InferenceConfig()
    DEVICE = "/cpu:0"  # /cpu:0 or /gpu:0
    TEST_MODE = "inference" # values: 'inference' or 'training'
    
    # Load the validation set
    if config.NAME == 'shapes':
        dataset = shapes.ShapesDataset()
        dataset.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
    elif config.NAME == "coco":
        dataset = coco.CocoDataset()
        dataset.load_coco(COCO_DIR, "minival")
    
    # Must call before using the dataset
    dataset.prepare()
    
    # Create the model in inference mode
    with tf.device(DEVICE):
        model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR,
                                  config=config)
    
    # Set weights file path
    if config.NAME == "shapes":
        weights_path = SHAPES_MODEL_PATH
    elif config.NAME == "coco":
        weights_path = COCO_MODEL_PATH
    # Or, uncomment to load the last model you trained
    # weights_path = model.find_last()[1]
    
    # Load weights
    print("Loading weights ", weights_path)
    model.load_weights(weights_path, by_name=True)
    
    
    image_id = random.choice(dataset.image_ids)
    image, image_meta, gt_bbox, gt_mask =\
        modellib.load_image_gt(dataset, config, image_id, use_mini_mask=False)
    info = dataset.image_info[image_id]
    print("image ID: {}.{} ({}) {}".format(info["source"], info["id"], image_id, 
                                           dataset.image_reference(image_id)))
    gt_class_id = gt_bbox[:, 4]
    
    # Run object detection
    results = model.detect([image], verbose=1)
    
    # Display results
    ax = get_ax(1)
    r = results[0]
    # visualize.display_instances(image, gt_bbox[:,:4], gt_mask, gt_bbox[:,4], 
    #                             dataset.class_names, ax=ax[0], title="Ground Truth")
    visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], 
                                dataset.class_names, r['scores'], ax=ax,
                                title="Predictions")
    log("gt_class_id", gt_class_id)
    log("gt_bbox", gt_bbox)
    log("gt_mask", gt_mask)
    

    Pick a random image from the dataset and look at the result:


    Region Proposal Network (RPN)

    The RPN's job is region proposal: from Selective Search in R-CNN to the anchor mechanism in Faster R-CNN, the goal has always been to produce better RoIs faster.

    The RPN places a large number of boxes (anchors) over the image and runs a lightweight binary classifier on them that returns object/no-object scores. Anchors with high scores (positive anchors) are passed to the next stage for classification.

    A positive anchor usually does not cover the object exactly, so while scoring each anchor the RPN also regresses an offset and a scale, which are used to correct the anchor's position and size.

    RPN Target

    The RPN targets identify the anchors that contain objects, which are then passed on for classification and the other downstream tasks. The RPN covers the whole image with anchors of many shapes and computes each anchor's IoU with the annotated ground-truth boxes (GT boxes): IoU ≥ 0.7 makes a positive sample, IoU ≤ 0.3 a negative sample, and anything in between is neutral and excluded from training.
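
    A minimal NumPy sketch of this labeling rule (an illustrative assumption; the repo's build_rpn_targets also handles details such as forcing at least one positive anchor per GT box):

    import numpy as np

    def iou(box, boxes):
        """box: (y1, x1, y2, x2); boxes: [N, 4]. IoU of box with each row of boxes."""
        y1 = np.maximum(box[0], boxes[:, 0]); x1 = np.maximum(box[1], boxes[:, 1])
        y2 = np.minimum(box[2], boxes[:, 2]); x2 = np.minimum(box[3], boxes[:, 3])
        inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
        area = (box[2] - box[0]) * (box[3] - box[1])
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        return inter / (area + areas - inter)

    def label_anchors(anchors, gt_boxes, pos_th=0.7, neg_th=0.3):
        match = np.zeros(len(anchors), dtype=np.int8)
        for i, a in enumerate(anchors):
            best = iou(a, gt_boxes).max()
            match[i] = 1 if best >= pos_th else (-1 if best <= neg_th else 0)
        return match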

    As mentioned above, while the RPN is trained it also regresses an offset and a scale for each anchor, used to correct the anchor's position and size so it covers the ground truth more tightly.
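
    The refinement applies the regressed deltas (dy, dx, log(dh), log(dw)) to the anchor, roughly like this sketch (an assumption in the spirit of utils.apply_box_deltas):

    import numpy as np

    def apply_delta(box, delta):
        """box: (y1, x1, y2, x2); delta: (dy, dx, log_dh, log_dw)."""
        h, w = box[2] - box[0], box[3] - box[1]
        cy, cx = box[0] + 0.5 * h, box[1] + 0.5 * w
        cy += delta[0] * h                 # shift the center
        cx += delta[1] * w
        h *= np.exp(delta[2])              # rescale height and width
        w *= np.exp(delta[3])
        return np.array([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w])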

    # Generate RPN training targets
    # target_rpn_match is 1 for positive anchors, -1 for negative, 0 for neutral.
    target_rpn_match, target_rpn_bbox = modellib.build_rpn_targets(
        image.shape, model.anchors, gt_bbox, model.config)
    
    log("target_rpn_match", target_rpn_match)
    log("target_rpn_bbox", target_rpn_bbox)
    
    # Split all anchors by label
    positive_anchor_ix = np.where(target_rpn_match[:] == 1)[0]
    negative_anchor_ix = np.where(target_rpn_match[:] == -1)[0]
    neutral_anchor_ix = np.where(target_rpn_match[:] == 0)[0]
    
    positive_anchors = model.anchors[positive_anchor_ix]
    negative_anchors = model.anchors[negative_anchor_ix]
    neutral_anchors = model.anchors[neutral_anchor_ix]
    log("positive_anchors", positive_anchors)
    log("negative_anchors", negative_anchors)
    log("neutral anchors", neutral_anchors)
    
    # Apply the regressed refinement to the positive anchors
    refined_anchors = utils.apply_box_deltas(
        positive_anchors,
        target_rpn_bbox[:positive_anchors.shape[0]] * model.config.RPN_BBOX_STD_DEV)
    log("refined_anchors", refined_anchors, )
    
    >>>
    >>>
    target_rpn_match         shape: (65472,)              min:   -1.00000  max:    1.00000
    target_rpn_bbox          shape: (256, 4)              min:   -3.66348  max:    7.29204
    positive_anchors         shape: (19, 4)               min:  -53.01934  max: 1030.62742
    negative_anchors         shape: (237, 4)              min:  -90.50967  max: 1038.62742
    neutral anchors          shape: (65216, 4)            min: -362.03867  max: 1258.03867
    refined_anchors          shape: (19, 4)               min:   -0.00000  max: 1024.00000
    

    Look at the positive anchors and their refined versions:

    visualize.draw_boxes(image, boxes=positive_anchors, refined_boxes=refined_anchors, ax=get_ax())
    

    RPN Prediction

    # Run RPN sub-graph
    pillar = model.keras_model.get_layer("ROI").output  # node to start searching from
    rpn = model.run_graph([image], [
        ("rpn_class", model.keras_model.get_layer("rpn_class").output),
        ("pre_nms_anchors", model.ancestor(pillar, "ROI/pre_nms_anchors:0")),
        ("refined_anchors", model.ancestor(pillar, "ROI/refined_anchors:0")),
        ("refined_anchors_clipped", model.ancestor(pillar, "ROI/refined_anchors_clipped:0")),
        ("post_nms_anchor_ix", model.ancestor(pillar, "ROI/rpn_non_max_suppression:0")),
        ("proposals", model.keras_model.get_layer("ROI").output),
    ])
    
    >>>
    >>>
    rpn_class                shape: (1, 65472, 2)         min:    0.00000  max:    1.00000
    pre_nms_anchors          shape: (1, 10000, 4)         min: -362.03867  max: 1258.03870
    refined_anchors          shape: (1, 10000, 4)         min: -1030.40588  max: 2164.92578
    refined_anchors_clipped  shape: (1, 10000, 4)         min:    0.00000  max: 1024.00000
    post_nms_anchor_ix       shape: (1000,)               min:    0.00000  max: 1879.00000
    proposals                shape: (1, 1000, 4)          min:    0.00000  max:    1.00000
    

    Look at the top-scoring anchors (before refinement):

    limit = 100
    sorted_anchor_ids = np.argsort(rpn['rpn_class'][:,:,1].flatten())[::-1]
    visualize.draw_boxes(image, boxes=model.anchors[sorted_anchor_ids[:limit]], ax=get_ax())


    Look at the refined top-scoring anchors; the parts that extend past the image boundary are clipped:

    limit = 50
    ax = get_ax(1, 2)
    visualize.draw_boxes(image, boxes=rpn["pre_nms_anchors"][0, :limit], 
               refined_boxes=rpn["refined_anchors"][0, :limit], ax=ax[0])
    visualize.draw_boxes(image, refined_boxes=rpn["refined_anchors_clipped"][0, :limit], ax=ax[1])

    Apply non-max suppression to the anchors above:

    limit = 50
    ixs = rpn["post_nms_anchor_ix"][:limit]
    visualize.draw_boxes(image, refined_boxes=rpn["refined_anchors_clipped"][0, ixs], ax=get_ax())


    The final proposals are the same as in the previous step, except the coordinates are normalized:

    limit = 50
    # Convert back to image coordinates for display
    h, w = config.IMAGE_SHAPE[:2]
    proposals = rpn['proposals'][0, :limit] * np.array([h, w, h, w])
    visualize.draw_boxes(image, refined_boxes=proposals, ax=get_ax())


    Measure the RPN recall (the fraction of objects covered by anchors). We compute it in three ways:

    • all anchors
    • all refined anchors
    • refined anchors after non-max suppression
    iou_threshold = 0.7
    
    recall, positive_anchor_ids = utils.compute_recall(model.anchors, gt_bbox, iou_threshold)
    print("All Anchors ({:5})       Recall: {:.3f}  Positive anchors: {}".format(
        model.anchors.shape[0], recall, len(positive_anchor_ids)))
    
    recall, positive_anchor_ids = utils.compute_recall(rpn['refined_anchors'][0], gt_bbox, iou_threshold)
    print("Refined Anchors ({:5})   Recall: {:.3f}  Positive anchors: {}".format(
        rpn['refined_anchors'].shape[1], recall, len(positive_anchor_ids)))
    
    recall, positive_anchor_ids = utils.compute_recall(proposals, gt_bbox, iou_threshold)
    print("Post NMS Anchors ({:5})  Recall: {:.3f}  Positive anchors: {}".format(
        proposals.shape[0], recall, len(positive_anchor_ids)))
    
    >>>
    >>>
    All Anchors (65472)       Recall: 0.263  Positive anchors: 5
    Refined Anchors (10000)   Recall: 0.895  Positive anchors: 126
    Post NMS Anchors (   50)  Recall: 0.526  Positive anchors: 12
    

    Proposal Classification

    The RPN stage above generated the region proposals; now they get classified.

    Proposal Classification

    The proposals selected by the RPN are fed to the classification head, which produces the class probability distribution and the bbox regression.

    # Get input and output to classifier and mask heads.
    mrcnn = model.run_graph([image], [
        ("proposals", model.keras_model.get_layer("ROI").output),
        ("probs", model.keras_model.get_layer("mrcnn_class").output),
        ("deltas", model.keras_model.get_layer("mrcnn_bbox").output),
        ("masks", model.keras_model.get_layer("mrcnn_mask").output),
        ("detections", model.keras_model.get_layer("mrcnn_detection").output),
    ])
    
    >>>
    >>>
    proposals                shape: (1, 1000, 4)          min:    0.00000  max:    1.00000
    probs                    shape: (1, 1000, 81)         min:    0.00000  max:    0.99825
    deltas                   shape: (1, 1000, 81, 4)      min:   -3.31265  max:    2.86541
    masks                    shape: (1, 100, 28, 28, 81)  min:    0.00003  max:    0.99986
    detections               shape: (1, 100, 6)           min:    0.00000  max:  930.00000
    

    Get the detected classes, trimming the zero padding:

    det_class_ids = mrcnn['detections'][0, :, 4].astype(np.int32)
    det_count = np.where(det_class_ids == 0)[0][0]
    det_class_ids = det_class_ids[:det_count]
    detections = mrcnn['detections'][0, :det_count]
    
    print("{} detections: {}".format(
        det_count, np.array(dataset.class_names)[det_class_ids]))
    
    captions = ["{} {:.3f}".format(dataset.class_names[int(c)], s) if c > 0 else ""
                for c, s in zip(detections[:, 4], detections[:, 5])]
    visualize.draw_boxes(
        image, 
        refined_boxes=detections[:, :4],
        visibilities=[2] * len(detections),
        captions=captions, title="Detections",
        ax=get_ax())
    
    >>>
    >>>
    11 detections: ['person' 'person' 'person' 'person' 'person' 'orange' 'person' 'orange'
     'dog' 'handbag' 'apple']


    Step by Step Detection

    # Proposals are in normalized coordinates; scale them back to image coordinates
    h, w = config.IMAGE_SHAPE[:2]
    proposals = np.around(mrcnn["proposals"][0] * np.array([h, w, h, w])).astype(np.int32)
    
    # Class ID, score, and mask per proposal
    roi_class_ids = np.argmax(mrcnn["probs"][0], axis=1)
    roi_scores = mrcnn["probs"][0, np.arange(roi_class_ids.shape[0]), roi_class_ids]
    roi_class_names = np.array(dataset.class_names)[roi_class_ids]
    roi_positive_ixs = np.where(roi_class_ids > 0)[0]
    
    # How many ROIs vs empty rows?
    print("{} Valid proposals out of {}".format(np.sum(np.any(proposals, axis=1)), proposals.shape[0]))
    print("{} Positive ROIs".format(len(roi_positive_ixs)))
    
    # Class counts
    print(list(zip(*np.unique(roi_class_names, return_counts=True))))
    
    >>>
    >>>
    1000 Valid proposals out of 1000
    106 Positive ROIs
    [('BG', 894), ('apple', 25), ('cup', 2), ('dog', 4), ('handbag', 2), ('orange', 36), ('person', 36), ('sandwich', 1)]
    
    

    Look at a random sample of proposals. Background (BG) proposals are not drawn; the interesting ones are those assigned a class, shown with their scores:

    limit = 200
    ixs = np.random.randint(0, proposals.shape[0], limit)
    captions = ["{} {:.3f}".format(dataset.class_names[c], s) if c > 0 else ""
                for c, s in zip(roi_class_ids[ixs], roi_scores[ixs])]
    visualize.draw_boxes(image, boxes=proposals[ixs],
                         visibilities=np.where(roi_class_ids[ixs] > 0, 2, 1),
                         captions=captions, title="ROIs Before Refinment",
                         ax=get_ax())


    Apply the bbox refinement:

    # Class-specific bounding box shifts.
    roi_bbox_specific = mrcnn["deltas"][0, np.arange(proposals.shape[0]), roi_class_ids]
    log("roi_bbox_specific", roi_bbox_specific)
    
    # Apply bounding box transformations
    # Shape: [N, (y1, x1, y2, x2)]
    refined_proposals = utils.apply_box_deltas(
        proposals, roi_bbox_specific * config.BBOX_STD_DEV).astype(np.int32)
    log("refined_proposals", refined_proposals)
    
    # Show positive proposals
    # ids = np.arange(roi_boxes.shape[0])  # Display all
    limit = 5
    ids = np.random.randint(0, len(roi_positive_ixs), limit)  # Display random sample
    captions = ["{} {:.3f}".format(dataset.class_names[c], s) if c > 0 else ""
                for c, s in zip(roi_class_ids[roi_positive_ixs][ids], roi_scores[roi_positive_ixs][ids])]
    visualize.draw_boxes(image, boxes=proposals[roi_positive_ixs][ids],
                         refined_boxes=refined_proposals[roi_positive_ixs][ids],
                         visibilities=np.where(roi_class_ids[roi_positive_ixs][ids] > 0, 1, 0),
                         captions=captions, title="ROIs After Refinment",
                         ax=get_ax())
    
    >>>
    >>>
    roi_bbox_specific        shape: (1000, 4)             min:   -3.31265  max:    2.86541
    refined_proposals        shape: (1000, 4)             min:   -1.00000  max: 1024.00000


    Filter out low-scoring detections:

    # Remove boxes classified as background
    keep = np.where(roi_class_ids > 0)[0]
    print("Keep {} detections:\n{}".format(keep.shape[0], keep))
    
    # Remove low confidence detections
    keep = np.intersect1d(keep, np.where(roi_scores >= config.DETECTION_MIN_CONFIDENCE)[0])
    print("Remove boxes below {} confidence. Keep {}:\n{}".format(
        config.DETECTION_MIN_CONFIDENCE, keep.shape[0], keep))
    
    >>>
    >>>
    Keep 106 detections:
    [  0   1   2   3   4   5   6   7   9  10  11  12  13  14  15  16  17  18
      19  22  23  24  25  26  27  28  31  34  35  36  37  38  41  43  47  51
      56  65  66  67  68  71  73  75  82  87  91  92 101 102 105 109 110 115
     117 120 123 138 156 164 171 175 177 184 197 205 241 253 258 263 265 280
     287 325 367 430 451 452 464 469 491 514 519 527 554 597 610 686 697 712
     713 748 750 780 815 871 911 917 933 938 942 947 949 953 955 981]
    
    Remove boxes below 0.7 confidence. Keep 44:
    [  0   1   2   3   4   5   6   9  12  13  14  17  19  26  31  34  38  41
      43  47  67  75  82  87  92 120 123 164 171 175 177 205 258 325 452 469
     519 697 713 815 871 911 917 949]
    

    Apply per-class non-max suppression:

    # Apply per-class non-max suppression
    pre_nms_boxes = refined_proposals[keep]
    pre_nms_scores = roi_scores[keep]
    pre_nms_class_ids = roi_class_ids[keep]
    
    nms_keep = []
    for class_id in np.unique(pre_nms_class_ids):
        # Pick detections of this class
        ixs = np.where(pre_nms_class_ids == class_id)[0]
        # Apply NMS
        class_keep = utils.non_max_suppression(pre_nms_boxes[ixs], 
                                                pre_nms_scores[ixs],
                                                config.DETECTION_NMS_THRESHOLD)
        # Map indicies
        class_keep = keep[ixs[class_keep]]
        nms_keep = np.union1d(nms_keep, class_keep)
        print("{:22}: {} -> {}".format(dataset.class_names[class_id][:20], 
                                       keep[ixs], class_keep))
    
    keep = np.intersect1d(keep, nms_keep).astype(np.int32)
    print("\nKept after per-class NMS: {}\n{}".format(keep.shape[0], keep))
    
    >>>
    >>>
    person                : [  0   1   2   3   5   9  12  13  14  19  26  41  43  47  82  92 120 123
     175 177 258 452 469 519 871 911 917] -> [ 5 12  1  2  3 19]
    dog                   : [  6  75 171] -> [75]
    handbag               : [815] -> [815]
    apple                 : [38] -> [38]
    orange                : [  4  17  31  34  67  87 164 205 325 697 713 949] -> [ 4 87]
    
    Kept after per-class NMS: 11
    [  1   2   3   4   5  12  19  38  75  87 815]
    

    Look at the final result:

    ixs = np.arange(len(keep))  # Display all
    # ixs = np.random.randint(0, len(keep), 10)  # Display random sample
    captions = ["{} {:.3f}".format(dataset.class_names[c], s) if c > 0 else ""
                for c, s in zip(roi_class_ids[keep][ixs], roi_scores[keep][ixs])]
    visualize.draw_boxes(
        image, boxes=proposals[keep][ixs],
        refined_boxes=refined_proposals[keep][ixs],
        visibilities=np.where(roi_class_ids[keep][ixs] > 0, 1, 0),
        captions=captions, title="Detections after NMS",
        ax=get_ax())
    


    Generating Masks

    Building on the instances produced in the previous stage, the mask head generates a segmentation mask for each instance.

    Mask Target

    These are the training targets of the mask branch:

    display_images(np.transpose(gt_mask, [2, 0, 1]), cmap="Blues")


    Predicted Masks

    # Get predictions of mask head
    mrcnn = model.run_graph([image], [
        ("detections", model.keras_model.get_layer("mrcnn_detection").output),
        ("masks", model.keras_model.get_layer("mrcnn_mask").output),
    ])
    
    # Get detection class IDs. Trim zero padding.
    det_class_ids = mrcnn['detections'][0, :, 4].astype(np.int32)
    det_count = np.where(det_class_ids == 0)[0][0]
    det_class_ids = det_class_ids[:det_count]
    
    print("{} detections: {}".format(
        det_count, np.array(dataset.class_names)[det_class_ids]))
    
    # Masks
    det_boxes = mrcnn["detections"][0, :, :4].astype(np.int32)
    det_mask_specific = np.array([mrcnn["masks"][0, i, :, :, c] 
                                  for i, c in enumerate(det_class_ids)])
    det_masks = np.array([utils.unmold_mask(m, det_boxes[i], image.shape)
                          for i, m in enumerate(det_mask_specific)])
    log("det_mask_specific", det_mask_specific)
    log("det_masks", det_masks)
    
    display_images(det_mask_specific[:4] * 255, cmap="Blues", interpolation="none")
    
    >>>
    >>>
    detections               shape: (1, 100, 6)           min:    0.00000  max:  930.00000
    masks                    shape: (1, 100, 28, 28, 81)  min:    0.00003  max:    0.99986
    11 detections: ['person' 'person' 'person' 'person' 'person' 'orange' 'person' 'orange'
     'dog' 'handbag' 'apple']
    
    det_mask_specific        shape: (11, 28, 28)          min:    0.00016  max:    0.99985
    det_masks                shape: (11, 1024, 1024)      min:    0.00000  max:    1.00000
    


    display_images(det_masks[:4] * 255, cmap="Blues", interpolation="none")
