The open-source Mask R-CNN project is designed to be easy to extend: with a few simple modifications you can train it on your own dataset.

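1. Annotating the data

For this walkthrough I simply picked two classes of images from the ImageNet 2012 dataset, cats and dogs. Fifty images per class form the training set, another twenty per class form the validation set, and a further ten per class form the test set.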
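
The split described above can be sketched as follows. This is only an illustration, not the procedure actually used for this post: the function name `split_dataset`, the fixed seed, and the file names are all hypothetical.

```python
import random

def split_dataset(filenames, n_train=50, n_val=20, n_test=10, seed=0):
    """Shuffle one class's file names and cut them into three disjoint subsets."""
    needed = n_train + n_val + n_test
    if len(filenames) < needed:
        raise ValueError(f"need {needed} images, got {len(filenames)}")
    shuffled = sorted(filenames)           # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, independent of global state
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:needed],
    }

# Hypothetical file names standing in for the images of one ImageNet class.
cat_files = [f"cat_{i:03d}.JPEG" for i in range(100)]
cat_split = split_dataset(cat_files)
print({name: len(files) for name, files in cat_split.items()})
```

Running the same function once per class keeps the three subsets balanced between cats and dogs.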

The images were annotated with the VGG Image Annotator (VIA) tool.

For usage, see the tutorial "Deep learning image annotation tool VGG Image Annotator (VIA): usage guide".

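2. Modifying the source code

The Mask R-CNN repository already contains several samples to use as a reference. I created a new folder, catvsdog, under the samples directory, copied samples/balloon/balloon.py into samples/catvsdog/, and renamed it catvsdog.py.

2.1 Modifying the config

I originally intended to have 2 classes, but my training set turned out to contain some images that were neither cats nor dogs, so I defined an extra not_defined class while annotating. NUM_CLASSES is therefore 1 + 3, where the 1 is the background, which counts as a class of its own. IMAGES_PER_GPU is changed to 1; the other parameters are left unchanged for now.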

class CatVSDogConfig(Config):
    """Configuration for training on the cat-vs-dog dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "catvsdog"

    # Use one image per GPU so the model fits on a smaller GPU.
    # Raise this if your GPU has enough memory.
    IMAGES_PER_GPU = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 3  # Background + cat + dog + not_defined

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9
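
To see how STEPS_PER_EPOCH relates to the size of the training set, here is a small arithmetic sketch. It assumes that the batch size is GPU_COUNT * IMAGES_PER_GPU and that a single GPU is used; the helper `steps_for_one_pass` is hypothetical and not part of the Mask R-CNN code.

```python
import math

def steps_for_one_pass(num_images, gpu_count=1, images_per_gpu=1):
    """Number of training steps needed to see every image once."""
    batch_size = gpu_count * images_per_gpu
    return math.ceil(num_images / batch_size)

# 50 cat + 50 dog training images, one GPU, IMAGES_PER_GPU = 1
print(steps_for_one_pass(100))
# The same data with two images per GPU
print(steps_for_one_pass(100, images_per_gpu=2))
```

Under these assumptions, STEPS_PER_EPOCH = 100 corresponds to roughly one pass over the 100 training images per epoch.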

2.2 Modifying the Dataset class

2.2.1 Modifying the load_xxx function

First add the classes, then parse the annotation information.

    def load_cat_dog(self, dataset_dir, subset):
        """Load a subset of the CatVSDog dataset.
        dataset_dir: Root directory of the dataset.
        subset: Subset to load: train or val
        """
        # Add classes. We have three classes to add.
        self.add_class("catvsdog", 1, "cat")
        self.add_class("catvsdog", 2, "dog")
        self.add_class("catvsdog", 3, "not_defined")

        # Train or validation dataset?
        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)

        # Load annotations
        # This loader expects each image entry of the VIA export to look like:
        # { 'filename': '28503151_5b5b7ec140_b.jpg',
        #   'regions': [
        #       {
        #           'region_attributes': {'name': 'cat'},
        #           'shape_attributes': {
        #               'name': 'rect',
        #               'x': ..., 'y': ..., 'width': ..., 'height': ...}},
        #       ... more regions ...
        #   ],
        #   'size': 100202
        # }
        # We mostly care about the rectangle and the class name of each region
        with open(os.path.join(dataset_dir, "via_region_data.json")) as f:
            annotations = json.load(f)
        annotations = list(annotations.values())  # don't need the dict keys

        # The VIA tool saves images in the JSON even if they don't have any
        # annotations. Skip unannotated images.
        annotations = [a for a in annotations if a['regions']]
        
        # Add images
        for a in annotations:
            # Get the rectangles that outline each object instance and the
            # class name of each one. The rectangles are stored in
            # shape_attributes and the names in region_attributes
            # (see the JSON format above).
            rects = [r['shape_attributes'] for r in a['regions']]
            names = [r['region_attributes']['name'] for r in a['regions']]
            name_dict = {"cat": 1, "dog": 2, "not_defined": 3}
            name_id = [name_dict[n] for n in names]

            # load_mask() needs the image size to convert rects to masks.
            # Unfortunately, VIA doesn't include it in JSON, so we must read
            # the image. This is only manageable since the dataset is tiny.
            image_path = os.path.join(dataset_dir, a['filename'])
            image = skimage.io.imread(image_path)
            height, width = image.shape[:2]

            self.add_image(
                "catvsdog",
                image_id=a['filename'],  # use file name as a unique image id
                path=image_path,
                class_id=name_id,
                width=width, height=height,
                polygons=rects)
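
The parsing step can be tried in isolation before any images are loaded. The sketch below is a standalone illustration, not code from the repository: `parse_via_entry` and the sample entry are made up, and the dict-versus-list handling assumes that older VIA exports store regions as a dict keyed by index while newer ones use a list.

```python
import json

CLASS_IDS = {"cat": 1, "dog": 2, "not_defined": 3}

def parse_via_entry(entry, class_ids=CLASS_IDS):
    """Return (rects, ids) for one image entry of a VIA export."""
    regions = entry.get("regions") or []
    # Assumption: regions may be a dict (older exports) or a list (newer ones).
    if isinstance(regions, dict):
        regions = list(regions.values())
    rects, ids = [], []
    for region in regions:
        label = region["region_attributes"].get("name")
        if label not in class_ids:
            # Fail with the file name so a mislabeled region is easy to find.
            raise ValueError(f"{entry.get('filename')}: unknown label {label!r}")
        rects.append(region["shape_attributes"])
        ids.append(class_ids[label])
    return rects, ids

# A made-up entry in the shape the loader expects.
sample = json.loads("""
{"filename": "example.jpg", "size": 1234,
 "regions": [
   {"shape_attributes": {"name": "rect", "x": 10, "y": 20, "width": 30, "height": 40},
    "region_attributes": {"name": "cat"}},
   {"shape_attributes": {"name": "rect", "x": 5, "y": 6, "width": 7, "height": 8},
    "region_attributes": {"name": "dog"}}
 ]}
""")
rects, ids = parse_via_entry(sample)
print(ids)
```

Raising on an unknown label, rather than hitting a bare KeyError, makes annotation mistakes easier to trace back to a file.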

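2.2.2 Modifying the load_mask function

To keep things simple I annotated with rectangular boxes only, so this function uses skimage.draw.rectangle instead of the skimage.draw.polygon used in the balloon sample.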

    def load_mask(self, image_id):
        """Generate instance masks for an image.
        Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a catvsdog dataset image, delegate to parent class.
        image_info = self.image_info[image_id]
        if image_info["source"] != "catvsdog":
            return super().load_mask(image_id)

        # Convert rectangles to a bitmap mask of shape
        # [height, width, instance_count]
        rects = image_info["polygons"]
        mask = np.zeros([image_info["height"], image_info["width"], len(rects)],
                        dtype=np.uint8)
        class_ids = np.array(image_info["class_id"], dtype=np.int32)

        for i, p in enumerate(rects):
            # Get indexes of pixels inside the rectangle and set them to 1.
            # shape= clips the rectangle to the image bounds.
            rr, cc = skimage.draw.rectangle((p['y'], p['x']),
                                            extent=(p['height'], p['width']),
                                            shape=mask.shape[:2])
            mask[rr, cc, i] = 1

        # Return the mask and the array of class IDs, one per instance.
        # np.bool was removed from recent NumPy releases; plain bool works everywhere.
        return mask.astype(bool), class_ids
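
For rectangular annotations the same mask can also be built with plain NumPy slicing. The sketch below is an alternative illustration rather than the code used in this post: `rects_to_mask` is a hypothetical helper, and it treats each rectangle as covering exactly width x height pixels.

```python
import numpy as np

def rects_to_mask(rects, height, width):
    """Build a [height, width, len(rects)] boolean mask from VIA-style rects."""
    mask = np.zeros((height, width, len(rects)), dtype=bool)
    for i, r in enumerate(rects):
        # Clamp both corners into the image so slicing never wraps around
        # or runs out of bounds.
        y0 = min(max(int(r["y"]), 0), height)
        x0 = min(max(int(r["x"]), 0), width)
        y1 = min(max(int(r["y"] + r["height"]), 0), height)
        x1 = min(max(int(r["x"] + r["width"]), 0), width)
        mask[y0:y1, x0:x1, i] = True
    return mask

rects = [
    {"x": 1, "y": 2, "width": 3, "height": 2},
    {"x": 4, "y": 0, "width": 10, "height": 1},   # sticks out on the right
]
mask = rects_to_mask(rects, height=5, width=6)
print(mask.shape, mask[:, :, 0].sum(), mask[:, :, 1].sum())
```

Each instance gets its own channel, which is the layout load_mask is expected to return.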

2.2.3 Modifying the image_reference function

    def image_reference(self, image_id):
        """Return the path of the image."""
        info = self.image_info[image_id]
        if info["source"] == "catvsdog":
            return info["path"]
        else:
            # Return the parent's result instead of silently returning None.
            return super().image_reference(image_id)

2.2.4 Modifying the train function

def train(model):
    """Train the model."""
    # Training dataset.
    dataset_train = CatVSDogDataset()
    dataset_train.load_cat_dog(args.dataset, "train")
    dataset_train.prepare()

    # Validation dataset
    dataset_val = CatVSDogDataset()
    dataset_val.load_cat_dog(args.dataset, "val")
    dataset_val.prepare()

    # *** This training schedule is an example. Update to your needs ***
    # Since we're using a very small dataset, and starting from
    # COCO trained weights, we don't need to train too long. Also,
    # no need to train all layers, just the heads should do it.
    print("Training network heads")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=30,
                layers='heads')

3. Training

Download the COCO pre-trained weights file mask_rcnn_coco.h5 beforehand.

I ran the following from the root directory of the Mask R-CNN repository:

python3 catvsdog.py train --dataset=/path/to/myCatVSDog --weights=coco

Note that ROOT_DIR in the script must be adjusted to match the directory from which you run the command.
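
One way to avoid editing ROOT_DIR for every working directory is to derive it from the location of the script itself. This is a sketch under the assumption that the script lives in samples/catvsdog/ inside the repository; `repo_root` is a hypothetical helper and the Mask_RCNN folder name is only an example.

```python
import os

def repo_root(script_path, levels_up=2):
    """Walk `levels_up` directories up from the folder containing script_path."""
    path = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        path = os.path.dirname(path)
    return path

# For <repo>/samples/catvsdog/catvsdog.py the repository root is two levels up.
example = os.path.join(os.sep, "work", "Mask_RCNN", "samples", "catvsdog", "catvsdog.py")
print(repo_root(example))
```

Inside the script itself this would be called as repo_root(__file__), so the result no longer depends on where the command is run.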

When training finishes, a series of model checkpoint files has been generated.
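
To pick the most recent checkpoint for testing, a small helper can scan the log directory. The sketch below assumes the checkpoints are named like mask_rcnn_catvsdog_0029.h5, the file name used in the test script later in this post; `latest_checkpoint` and the run folder name are hypothetical.

```python
import os
import re
import tempfile

CHECKPOINT_RE = re.compile(r"mask_rcnn_catvsdog_(\d{4})\.h5$")

def latest_checkpoint(log_dir):
    """Return the path of the highest-numbered checkpoint under log_dir, or None."""
    best_epoch, best_path = -1, None
    for folder, _subdirs, files in os.walk(log_dir):
        for name in files:
            match = CHECKPOINT_RE.match(name)
            if match and int(match.group(1)) > best_epoch:
                best_epoch = int(match.group(1))
                best_path = os.path.join(folder, name)
    return best_path

# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = os.path.join(tmp, "catvsdog_run")   # hypothetical run folder name
    os.makedirs(run_dir)
    for epoch in (1, 15, 29):
        open(os.path.join(run_dir, f"mask_rcnn_catvsdog_{epoch:04d}.h5"), "w").close()
    latest_name = os.path.basename(latest_checkpoint(tmp))
print(latest_name)
```

If the log directory holds several training runs, this picks the highest epoch number across all of them, which is not necessarily the newest run.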

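4. Testing

I am not used to working with .ipynb files, so I converted the demo notebook to a .py file. Open samples/demo.ipynb with Jupyter Notebook.

Choose File --> Download as --> Python (.py) from the menu to save it as a Python file.

Then modify the code as follows: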

import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import config
sys.path.append(os.path.join(ROOT_DIR, "samples/catvsdog/"))  # To find local version
import catvsdog

#get_ipython().run_line_magic('matplotlib', 'inline')

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "catvsdog_logs")

# Local path to the weights trained above
WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_catvsdog_0029.h5")
# These weights come from our own training run, so they cannot be downloaded.
if not os.path.exists(WEIGHTS_PATH):
    raise FileNotFoundError("Trained weights not found: " + WEIGHTS_PATH)


class InferenceConfig(catvsdog.CatVSDogConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()


# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load the weights trained on the cat-vs-dog dataset
model.load_weights(WEIGHTS_PATH, by_name=True)


class_names = ['BG', 'cat', 'dog', 'not_defined']

image = skimage.io.imread('ILSVRC2012_val_00037858.JPEG')

# Run detection
results = model.detect([image], verbose=1)

# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], 
                            class_names, r['scores'])
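
Besides the visualization, the detections can be summarized as text. The sketch below is a standalone illustration: `summarize_detections` is a hypothetical helper, and the numbers are made-up stand-ins for r['class_ids'] and r['scores'].

```python
from collections import Counter

def summarize_detections(class_ids, scores, class_names, min_score=0.9):
    """Return (label, score) pairs above min_score, best first, plus per-class counts."""
    kept = [
        (class_names[int(cid)], float(score))
        for cid, score in zip(class_ids, scores)
        if score >= min_score
    ]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept, Counter(label for label, _ in kept)

class_names = ['BG', 'cat', 'dog', 'not_defined']
# Made-up values standing in for r['class_ids'] and r['scores'].
detections, counts = summarize_detections([2, 1, 2], [0.97, 0.99, 0.42], class_names)
print(detections)
print(dict(counts))
```

With real results the call would be summarize_detections(r['class_ids'], r['scores'], class_names).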

Run:

python3 demo.py
