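The open-source Mask R-CNN project is designed to be easy to extend: with a few simple modifications you can train it on your own dataset.
1. Annotating the data
I simply picked two classes of images from the ImageNet2012 dataset, cats and dogs, with 50 images per class as the training set. I then took another 20 images per class as the validation set and another 10 images per class as the test set.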
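The 50/20/10 split described above can be sketched as follows. This is only an illustration of one way to do it, not the procedure actually used here: the function name `split_files`, the seed, and the file names are all hypothetical.

```python
import random

def split_files(filenames, n_train=50, n_val=20, n_test=10, seed=0):
    """Shuffle one class's file names and cut them into three disjoint subsets."""
    needed = n_train + n_val + n_test
    if len(filenames) < needed:
        raise ValueError(f"need at least {needed} files, got {len(filenames)}")
    shuffled = sorted(filenames)           # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, independent of global state
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:needed],
    }

# Hypothetical file names standing in for the images of one ImageNet class
cats = [f"cat_{i:04d}.JPEG" for i in range(100)]
subsets = split_files(cats)
print({name: len(files) for name, files in subsets.items()})
```

Running the same function once per class keeps the subsets balanced between cats and dogs.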
The images were annotated with the VGG Image Annotator (VIA) tool.
For usage instructions, see: Deep Learning Image Annotation Tool VGG Image Annotator (VIA) Usage Tutorial
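Before training, it can help to check which labels actually ended up in the VIA export. The sketch below counts region labels in an already loaded export. It assumes the label is stored in a region attribute called `name` (the attribute read by the loading code later in this post); the helper `count_labels` and the tiny sample export are hypothetical.

```python
import json
from collections import Counter

def count_labels(via_export, attribute="name"):
    """Count region labels in a VIA export that was loaded with json.load()."""
    counts = Counter()
    for entry in via_export.values():
        regions = entry.get("regions", [])
        # VIA 1.x stores regions as a dict keyed by index, VIA 2.x as a list
        if isinstance(regions, dict):
            regions = list(regions.values())
        for region in regions:
            label = region.get("region_attributes", {}).get(attribute, "<missing>")
            counts[label] += 1
    return counts

# Tiny hand-written export used only for illustration
SAMPLE = """
{
  "a.jpg1": {"filename": "a.jpg", "regions": [
    {"shape_attributes": {"name": "rect", "x": 1, "y": 2, "width": 3, "height": 4},
     "region_attributes": {"name": "cat"}},
    {"shape_attributes": {"name": "rect", "x": 5, "y": 6, "width": 7, "height": 8},
     "region_attributes": {"name": "dog"}}]},
  "b.jpg2": {"filename": "b.jpg", "regions": {
    "0": {"shape_attributes": {"name": "rect", "x": 0, "y": 0, "width": 2, "height": 2},
          "region_attributes": {"name": "dog"}}}},
  "c.jpg3": {"filename": "c.jpg", "regions": []}
}
"""
label_counts = count_labels(json.loads(SAMPLE))
print(dict(label_counts))
```

A label that shows up here but is missing from the class list would cause a KeyError when the dataset is loaded, so this check is cheap insurance.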
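2. Modifying the source code
The Mask R-CNN repository already contains several samples to refer to. I created a new folder, catvsdog, under the samples directory, copied samples/balloon/balloon.py to samples/catvsdog/, and renamed it catvsdog.py.
2.1 Modifying the config
I originally intended to have 2 classes, but some images that are neither cat nor dog got mixed into my training set, so I defined a not_defined class while annotating. NUM_CLASSES is therefore 1 + 3, where the 1 stands for the background, which counts as a class. I changed IMAGES_PER_GPU to 1 and left the other parameters unchanged for now.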
class CatVSDogConfig(Config):
    """Configuration for training on the cat-vs-dog dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "catvsdog"

    # Train one image per GPU. Raise this if your GPU has memory to spare.
    IMAGES_PER_GPU = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 3  # Background + cat + dog + not_defined

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9
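The class arithmetic behind NUM_CLASSES can be written out explicitly. The sketch below is independent of the mrcnn package; the helper names `num_classes`, `class_id_map` and `batch_size` are hypothetical and exist only to show how the numbers relate.

```python
CLASS_NAMES = ["cat", "dog", "not_defined"]  # foreground labels used in this post

def num_classes(foreground_names):
    """Background takes class ID 0, so the total is 1 + number of foreground classes."""
    return 1 + len(foreground_names)

def class_id_map(foreground_names):
    """Assign IDs 1..N to the foreground classes, leaving 0 for the background."""
    return {name: i for i, name in enumerate(foreground_names, start=1)}

def batch_size(gpu_count, images_per_gpu):
    """Effective batch size is GPU_COUNT * IMAGES_PER_GPU."""
    return gpu_count * images_per_gpu

print(num_classes(CLASS_NAMES))
print(class_id_map(CLASS_NAMES))
print(batch_size(1, 1))
```

Deriving both the count and the ID mapping from a single list avoids the two drifting apart when a class is added or removed.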
2.2 Modifying the Dataset class
2.2.1 Modifying the load_xxx function
First add the classes, then parse the annotation information.
# Method of CatVSDogDataset
def load_cat_dog(self, dataset_dir, subset):
    """Load a subset of the CatVSDog dataset.
    dataset_dir: Root directory of the dataset.
    subset: Subset to load: train or val
    """
    # Add classes. We have three classes to add.
    self.add_class("catvsdog", 1, "cat")
    self.add_class("catvsdog", 2, "dog")
    self.add_class("catvsdog", 3, "not_defined")

    # Train or validation dataset?
    assert subset in ["train", "val"]
    dataset_dir = os.path.join(dataset_dir, subset)

    # Load annotations
    # VGG Image Annotator saves each image in the form:
    # { 'filename': '28503151_5b5b7ec140_b.jpg',
    #   'regions': {
    #       '0': {
    #           'region_attributes': {'name': 'cat'},
    #           'shape_attributes': {
    #               'name': 'rect',
    #               'x': ..., 'y': ..., 'width': ..., 'height': ...}},
    #       ... more regions ...
    #   },
    #   'size': 100202
    # }
    # We mostly care about the label and the rectangle of each region
    with open(os.path.join(dataset_dir, "via_region_data.json")) as f:
        annotations = json.load(f)
    annotations = list(annotations.values())  # don't need the dict keys

    # The VIA tool saves images in the JSON even if they don't have any
    # annotations. Skip unannotated images.
    annotations = [a for a in annotations if a['regions']]

    name_dict = {"cat": 1, "dog": 2, "not_defined": 3}

    # Add images
    for a in annotations:
        # 'regions' is a dict in VIA 1.x and a list in VIA 2.x
        regions = a['regions']
        if isinstance(regions, dict):
            regions = list(regions.values())

        # Get the rects that outline each object instance. They are stored
        # in shape_attributes (see JSON format above)
        rects = [r['shape_attributes'] for r in regions]
        names = [r['region_attributes']['name'] for r in regions]
        name_id = [name_dict[n] for n in names]

        # load_mask() needs the image size to convert rects to masks.
        # Unfortunately, VIA doesn't include it in JSON, so we must read
        # the image. This is only manageable since the dataset is tiny.
        image_path = os.path.join(dataset_dir, a['filename'])
        image = skimage.io.imread(image_path)
        height, width = image.shape[:2]

        self.add_image(
            "catvsdog",
            image_id=a['filename'],  # use file name as a unique image id
            path=image_path,
            class_id=name_id,
            width=width, height=height,
            polygons=rects)
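2.2.2 Modifying the load_mask function
To keep things simple, I only used rectangular boxes when annotating, so this function uses skimage.draw.rectangle instead of the skimage.draw.polygon used in the balloon sample.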
# Method of CatVSDogDataset
def load_mask(self, image_id):
    """Generate instance masks for an image.
    Returns:
    masks: A bool array of shape [height, width, instance count] with
        one mask per instance.
    class_ids: a 1D array of class IDs of the instance masks.
    """
    # If not a catvsdog dataset image, delegate to parent class.
    info = self.image_info[image_id]
    if info["source"] != "catvsdog":
        return super(CatVSDogDataset, self).load_mask(image_id)

    # Convert rectangles to a bitmap mask of shape
    # [height, width, instance_count]
    mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                    dtype=np.uint8)
    class_ids = np.array(info["class_id"], dtype=np.int32)
    for i, p in enumerate(info["polygons"]):
        # Get indexes of pixels inside the rectangle and set them to 1.
        # shape= clips the rectangle to the image bounds.
        rr, cc = skimage.draw.rectangle((p['y'], p['x']),
                                        extent=(p['height'], p['width']),
                                        shape=mask.shape[:2])
        mask[rr, cc, i] = 1

    # Return the mask and the array of class IDs of each instance.
    # np.bool was removed from recent NumPy releases, so use the builtin bool.
    return mask.astype(bool), class_ids
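To make the rectangle-to-mask step concrete, here is a dependency-free sketch of the same idea: a VIA rect (x, y, width, height) is turned into a row range and a column range, clipped to the image. It is not how skimage.draw.rectangle is implemented; `rect_to_slices` is a hypothetical helper, and the mask is a nested list only to keep the example free of third-party imports.

```python
def rect_to_slices(rect, height, width):
    """Turn a VIA rect into (row_slice, col_slice) clipped to the image bounds."""
    top = max(0, int(rect["y"]))
    left = max(0, int(rect["x"]))
    bottom = min(height, int(rect["y"]) + int(rect["height"]))
    right = min(width, int(rect["x"]) + int(rect["width"]))
    # max() guard: a rect entirely outside the image gives an empty slice
    return slice(top, max(top, bottom)), slice(left, max(left, right))

# A 4x4 image and a rect that sticks out past the right edge
rows, cols = rect_to_slices({"x": 2, "y": 1, "width": 3, "height": 2},
                            height=4, width=4)
mask = [[0] * 4 for _ in range(4)]
for r in range(rows.start, rows.stop):
    for c in range(cols.start, cols.stop):
        mask[r][c] = 1
for row in mask:
    print(row)
```

With a NumPy mask of shape [height, width, instance_count], the same slices could be used directly as mask[rows, cols, i] = 1.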
2.2.3 Modifying the image_reference function
# Method of CatVSDogDataset
def image_reference(self, image_id):
    """Return the path of the image."""
    info = self.image_info[image_id]
    if info["source"] == "catvsdog":
        return info["path"]
    else:
        # Return the parent's result instead of silently returning None
        return super(CatVSDogDataset, self).image_reference(image_id)
2.2.4 Modifying the train function
def train(model):
    """Train the model."""
    # Training dataset.
    dataset_train = CatVSDogDataset()
    dataset_train.load_cat_dog(args.dataset, "train")
    dataset_train.prepare()

    # Validation dataset
    dataset_val = CatVSDogDataset()
    dataset_val.load_cat_dog(args.dataset, "val")
    dataset_val.prepare()

    # *** This training schedule is an example. Update to your needs ***
    # Since we're using a very small dataset, and starting from
    # COCO trained weights, we don't need to train too long. Also,
    # no need to train all layers, just the heads should do it.
    print("Training network heads")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=30,
                layers='heads')
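3. Training
Download the COCO pre-trained weights, mask_rcnn_coco.h5, in advance.
I ran the following from the root directory of the Mask R-CNN repository:
python3 catvsdog.py train --dataset=/path/to/myCatVSDog --weights=coco
Note that ROOT_DIR must be adjusted to match the folder from which you run the command.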
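The command line above is handled by an argument parser in the script. The sketch below is a trimmed illustration of how such a parser could look with argparse. It is an assumption, not a copy of the parser in catvsdog.py or balloon.py, which may define more options; `build_parser` is a hypothetical name.

```python
import argparse

def build_parser():
    """Build a minimal parser for: catvsdog.py train --dataset=... --weights=..."""
    parser = argparse.ArgumentParser(
        description="Train Mask R-CNN on the cat/dog dataset.")
    parser.add_argument("command", choices=["train"],
                        help="action to run")
    parser.add_argument("--dataset", required=True,
                        metavar="/path/to/myCatVSDog",
                        help="root directory holding the train/ and val/ folders")
    parser.add_argument("--weights", required=True,
                        metavar="/path/to/weights.h5",
                        help="path to an .h5 weights file, or 'coco'")
    return parser

# Parse the same arguments as the command shown above
args = build_parser().parse_args(
    ["train", "--dataset=/path/to/myCatVSDog", "--weights=coco"])
print(args.command, args.dataset, args.weights)
```

The train function shown earlier reads args.dataset, which is why the parsed arguments need to be available at module level in the script.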
After training finishes, a series of model weight files is generated.
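Picking the most recent of those files by hand is error-prone, so a small helper can do it. The sketch below assumes the files follow the pattern of the name used later in this post, mask_rcnn_catvsdog_0029.h5, with a four-digit epoch number; that generalization, the helper `latest_checkpoint`, and the sample file names are assumptions.

```python
import re

# Assumed naming pattern: mask_rcnn_catvsdog_<4-digit epoch>.h5
CHECKPOINT_RE = re.compile(r"^mask_rcnn_catvsdog_(\d{4})\.h5$")

def latest_checkpoint(filenames):
    """Return the file name with the highest epoch number, or None if none match."""
    best_name, best_epoch = None, -1
    for name in filenames:
        match = CHECKPOINT_RE.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_name, best_epoch = name, int(match.group(1))
    return best_name

# Hypothetical directory listing, e.g. the result of os.listdir(log_dir)
listing = ["mask_rcnn_catvsdog_0001.h5", "events.out.tfevents.123",
           "mask_rcnn_catvsdog_0029.h5", "mask_rcnn_catvsdog_0010.h5"]
print(latest_checkpoint(listing))
```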
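4. Testing
I'm not very comfortable with .ipynb files, so I converted the demo to a .py file. Open samples/demo.ipynb in Jupyter Notebook.
From the menu, choose File --> Download as --> Python (.py) and save it as a Python file.
Then modify the code: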
import os
import sys
import skimage.io

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
import mrcnn.model as modellib
from mrcnn import visualize

# Import config
sys.path.append(os.path.join(ROOT_DIR, "samples/catvsdog/"))  # To find local version
import catvsdog

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "catvsdog_logs")

# Local path to the weights file produced by training
CATVSDOG_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_catvsdog_0029.h5")
# These weights cannot be downloaded, so fail early if the file is missing
if not os.path.exists(CATVSDOG_MODEL_PATH):
    raise FileNotFoundError(CATVSDOG_MODEL_PATH)

class InferenceConfig(catvsdog.CatVSDogConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load the weights trained on the cat-vs-dog dataset
model.load_weights(CATVSDOG_MODEL_PATH, by_name=True)

# Index in this list must match the class IDs used during training
class_names = ['BG', 'cat', 'dog', 'not_defined']

image = skimage.io.imread('ILSVRC2012_val_00037858.JPEG')

# Run detection
results = model.detect([image], verbose=1)

# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])
Run:
python3 demo.py
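Besides the visualization, it can be useful to print the detections as text. The sketch below maps class IDs and scores, such as those found in r['class_ids'] and r['scores'], to readable lines. The helper `summarize` is hypothetical and the input values are made up for illustration; they are not results from the model trained here.

```python
def summarize(class_ids, scores, class_names):
    """Format each detection as '<class name>: <score>' with two decimals."""
    lines = []
    for class_id, score in zip(class_ids, scores):
        lines.append(f"{class_names[int(class_id)]}: {float(score):.2f}")
    return lines

class_names = ['BG', 'cat', 'dog', 'not_defined']

# Made-up values standing in for r['class_ids'] and r['scores']
example_ids = [1, 2]
example_scores = [0.98, 0.91]
for line in summarize(example_ids, example_scores, class_names):
    print(line)
```

The int() and float() conversions let the same function accept NumPy scalars as well as plain Python numbers.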