Training Detectron2 on Your Own Dataset (Fairly Detailed)

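The previous post covered how to set up the Detectron2 environment on CentOS 7. This post covers how to train on your own dataset, focusing on object detection.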

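The official GETTING_STARTED.md mentions a tutorial on training with a custom dataset, but I could not open the link from here, so I had to search for it myself on Baidu. Fortunately, some helpful bloggers had already written it up.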

So how do you train on your own dataset?

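1 Convert your dataset to COCO format. For the conversion code, see this GitHub repository; it is detailed enough to convert your dataset.
2 Register the dataset with Detectron2. This sounds fancy, but it simply means making your dataset loadable by the project.
A few files matter here. In detectron2/tools/ there is a README.md that explains what each .py file in that folder does, so it is worth reading. For training, the relevant files are train_net.py and plain_train_net.py, and in practice train_net.py alone is enough. Copy train_net.py to train.py [copy it as is; do not change the contents yet].
The file defines a Trainer class that inherits from DefaultTrainer and overrides two methods, build_evaluator and test_with_TTA. As the English comments indicate, the first creates the evaluator(s) for a given dataset, and the second runs an evaluation with test-time augmentation (TTA), which is supported only for some R-CNN models. For now it is enough to know that they exist. Below is train.py right after copying: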

#!/usr/bin/env python
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
Detection Training Script.

This scripts reads a given config file and runs the training or evaluation.
It is an entry point that is made to train standard models in detectron2.

In order to let one script support training of many models,
this script contains logic that are specific to these built-in models and therefore
may not be suitable for your own project.
For example, your research project perhaps only needs a single "evaluator".

Therefore, we recommend you to use detectron2 as an library and take
this file as an example of how to use the library.
You may want to write your own script with your datasets and other customizations.
"""

import logging
import os
from collections import OrderedDict
import torch

import detectron2.utils.comm as comm
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, hooks, launch
from detectron2.evaluation import (
    CityscapesInstanceEvaluator,
    CityscapesSemSegEvaluator,
    COCOEvaluator,
    COCOPanopticEvaluator,
    DatasetEvaluators,
    LVISEvaluator,
    PascalVOCDetectionEvaluator,
    SemSegEvaluator,
    verify_results,
)
from detectron2.modeling import GeneralizedRCNNWithTTA


class Trainer(DefaultTrainer):
    """
    We use the "DefaultTrainer" which contains pre-defined default logic for
    standard training workflow. They may not work for you, especially if you
    are working on a new research project. In that case you can write your
    own training loop. You can use "tools/plain_train_net.py" as an example.
    """

    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        """
        Create evaluator(s) for a given dataset.
        This uses the special metadata "evaluator_type" associated with each builtin dataset.
        For your own dataset, you can simply create an evaluator manually in your
        script and do not have to worry about the hacky if-else logic here.
        """
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        evaluator_list = []
        evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type
        if evaluator_type in ["sem_seg", "coco_panoptic_seg"]:
            evaluator_list.append(
                SemSegEvaluator(
                    dataset_name,
                    distributed=True,
                    num_classes=cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES,
                    ignore_label=cfg.MODEL.SEM_SEG_HEAD.IGNORE_VALUE,
                    output_dir=output_folder,
                )
            )
        if evaluator_type in ["coco", "coco_panoptic_seg"]:
            evaluator_list.append(COCOEvaluator(dataset_name, cfg, True, output_folder))
        if evaluator_type == "coco_panoptic_seg":
            evaluator_list.append(COCOPanopticEvaluator(dataset_name, output_folder))
        if evaluator_type == "cityscapes_instance":
            assert (
                torch.cuda.device_count() >= comm.get_rank()
            ), "CityscapesEvaluator currently do not work with multiple machines."
            return CityscapesInstanceEvaluator(dataset_name)
        if evaluator_type == "cityscapes_sem_seg":
            assert (
                torch.cuda.device_count() >= comm.get_rank()
            ), "CityscapesEvaluator currently do not work with multiple machines."
            return CityscapesSemSegEvaluator(dataset_name)
        elif evaluator_type == "pascal_voc":
            return PascalVOCDetectionEvaluator(dataset_name)
        elif evaluator_type == "lvis":
            return LVISEvaluator(dataset_name, cfg, True, output_folder)
        if len(evaluator_list) == 0:
            raise NotImplementedError(
                "no Evaluator for the dataset {} with the type {}".format(
                    dataset_name, evaluator_type
                )
            )
        elif len(evaluator_list) == 1:
            return evaluator_list[0]
        return DatasetEvaluators(evaluator_list)

    @classmethod
    def test_with_TTA(cls, cfg, model):
        logger = logging.getLogger("detectron2.trainer")
        # In the end of training, run an evaluation with TTA
        # Only support some R-CNN models.
        logger.info("Running inference with test-time augmentation ...")
        model = GeneralizedRCNNWithTTA(cfg, model)
        evaluators = [
            cls.build_evaluator(
                cfg, name, output_folder=os.path.join(cfg.OUTPUT_DIR, "inference_TTA")
            )
            for name in cfg.DATASETS.TEST
        ]
        res = cls.test(cfg, model, evaluators)
        res = OrderedDict({k + "_TTA": v for k, v in res.items()})
        return res


def setup(args):
    """
    Create configs and perform basic setups.
    """
    cfg = get_cfg()
    cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    cfg.freeze()
    default_setup(cfg, args)
    return cfg


def main(args):
    cfg = setup(args)

    if args.eval_only:
        model = Trainer.build_model(cfg)
        DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
            cfg.MODEL.WEIGHTS, resume=args.resume
        )
        res = Trainer.test(cfg, model)
        if cfg.TEST.AUG.ENABLED:
            res.update(Trainer.test_with_TTA(cfg, model))
        if comm.is_main_process():
            verify_results(cfg, res)
        return res

    """
    If you'd like to do anything fancier than the standard training logic,
    consider writing your own training loop (see plain_train_net.py) or
    subclassing the trainer.
    """
    trainer = Trainer(cfg)
    trainer.resume_or_load(resume=args.resume)
    if cfg.TEST.AUG.ENABLED:
        trainer.register_hooks(
            [hooks.EvalHook(0, lambda: trainer.test_with_TTA(cfg, trainer.model))]
        )
    return trainer.train()


if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    print("Command Line Args:", args)
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )
Modifying the train.py above

1 Register your own dataset

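Pay attention to the CLASS_NAMES list: its order must match the order of the category IDs in your COCO-format file. The program maps CLASS_NAMES onto the range [0, len(CLASS_NAMES)). If the category_id values in your COCO-format dataset start from 1, it is best to add an entry with category_id:0 and name:background to your json file. This background class does not need any annotations (region, width, height, and so on), but the category_id:0 entry itself must exist; otherwise you will get a nasty surprise when you test after training. Alternatively, make the category_id values in your json file start from 0 directly, for example category_id:0, name:name1, category_id:1, name:name2, … Use English class names wherever possible; Chinese names will be displayed as garbled text.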
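The ID rule above can be checked before training. The following is a minimal sketch, not part of train.py: the helper names and the tiny inline annotation file are hypothetical, and it assumes the standard COCO layout with a top-level "categories" list whose entries have "id" and "name" fields.

```python
import json


def class_names_in_id_order(coco: dict) -> list:
    """Return the category names sorted by category id."""
    categories = sorted(coco["categories"], key=lambda c: c["id"])
    return [c["name"] for c in categories]


def ids_are_contiguous_from_zero(coco: dict) -> bool:
    """True if the category ids are exactly 0, 1, ..., N-1."""
    ids = sorted(c["id"] for c in coco["categories"])
    return ids == list(range(len(ids)))


# Hypothetical miniature annotation file, inlined so the sketch runs as-is.
example = json.loads("""
{
  "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
  "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                   "bbox": [10, 20, 100, 50], "area": 5000, "iscrowd": 0}],
  "categories": [{"id": 0, "name": "__background__"},
                 {"id": 1, "name": "name_1"},
                 {"id": 2, "name": "name_2"}]
}
""")

print(class_names_in_id_order(example))       # candidate value for CLASS_NAMES
print(ids_are_contiguous_from_zero(example))  # whether ids already start at 0 with no gaps
```

For a real file, replace the inline string with `json.load(open(path))` on your own train.json; if the second check fails, the IDs need the adjustment described above.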

The registration code in this section can go directly into the copied train.py.
Store your data in the following layout:

----- yourdataDir
--------- JPEGImages
-------------***.jpg
-------------…jpg
--------- COCOformat
--------------train.json
--------------test.json
--------------val.json
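Before wiring these paths into train.py, it may help to confirm that the layout is complete. This is a small standalone sketch, separate from train.py; the function name is hypothetical, and the expected entries are simply the folder and file names from the layout above.

```python
from pathlib import Path


def missing_dataset_entries(root: str) -> list:
    """Return the expected paths that do not exist under root."""
    base = Path(root)
    expected = [
        base / "JPEGImages",
        base / "COCOformat" / "train.json",
        base / "COCOformat" / "val.json",
        base / "COCOformat" / "test.json",
    ]
    return [str(p) for p in expected if not p.exists()]
```

Calling `missing_dataset_entries` on your dataset root returns an empty list when every expected entry is present.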

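# Add the following imports
import cv2
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data.datasets.coco import load_coco_json
from detectron2.utils.visualizer import Visualizer
import pycocotools
# Declare the classes; keep them in the same order as the category IDs in your json file
CLASS_NAMES = ["__background__", "name_1", "name_2"]  # ... followed by the rest of your class names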
# Dataset paths
DATASET_ROOT = '/home/Yourdatadir'
ANN_ROOT = os.path.join(DATASET_ROOT, 'COCOformat')

TRAIN_PATH = os.path.join(DATASET_ROOT, 'JPEGImages')
VAL_PATH = os.path.join(DATASET_ROOT, 'JPEGImages')

TRAIN_JSON = os.path.join(ANN_ROOT, 'train.json')
#VAL_JSON = os.path.join(ANN_ROOT, 'val.json')
VAL_JSON = os.path.join(ANN_ROOT, 'test.json')

# Declare the dataset splits
PREDEFINED_SPLITS_DATASET = {
    "coco_my_train": (TRAIN_PATH, TRAIN_JSON),
    "coco_my_val": (VAL_PATH, VAL_JSON),
}

# Register the datasets (this step registers the custom datasets with Detectron2)
def register_dataset():
    """
    purpose: register all splits of dataset with PREDEFINED_SPLITS_DATASET
    """
    for key, (image_root, json_file) in PREDEFINED_SPLITS_DATASET.items():
        register_dataset_instances(name=key,
                                   json_file=json_file,
                                   image_root=image_root)


# Register one dataset instance and load the object instances in it
def register_dataset_instances(name, json_file, image_root):
    """
    purpose: register dataset to DatasetCatalog,
             register metadata to MetadataCatalog and set attribute
    """
    DatasetCatalog.register(name, lambda: load_coco_json(json_file, image_root, name))
    MetadataCatalog.get(name).set(json_file=json_file,
                                  image_root=image_root,
                                  evaluator_type="coco")


# Register the datasets and their metadata
def plain_register_dataset():
    # Training set
    DatasetCatalog.register("coco_my_train", lambda: load_coco_json(TRAIN_JSON, TRAIN_PATH))
    MetadataCatalog.get("coco_my_train").set(thing_classes=CLASS_NAMES,  # optional; Chinese names cannot be displayed, so omit this if your class names are Chinese
                                                    evaluator_type='coco', # evaluator type
                                                    json_file=TRAIN_JSON,
                                                    image_root=TRAIN_PATH)

    #DatasetCatalog.register("coco_my_val", lambda: load_coco_json(VAL_JSON, VAL_PATH, "coco_2017_val"))
    # Validation/test set
    DatasetCatalog.register("coco_my_val", lambda: load_coco_json(VAL_JSON, VAL_PATH))
    MetadataCatalog.get("coco_my_val").set(thing_classes=CLASS_NAMES, # optional; Chinese names cannot be displayed, so omit this if your class names are Chinese
                                                evaluator_type='coco', # evaluator type
                                                json_file=VAL_JSON,
                                                image_root=VAL_PATH)
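# Visualize the dataset annotations to check that they are correct.
# You can also write your own script for this; it basically checks whether any box extends beyond the image boundary.
# Using this function is optional. It requires cv2 and detectron2.utils.visualizer.Visualizer to be imported.
def checkout_dataset_annotation(name="coco_my_val"):
    #dataset_dicts = load_coco_json(TRAIN_JSON, TRAIN_PATH, name)
    dataset_dicts = load_coco_json(TRAIN_JSON, TRAIN_PATH)
    print(len(dataset_dicts))
    os.makedirs('out', exist_ok=True)  # make sure the output folder exists before writing images
    for i, d in enumerate(dataset_dicts,0):
        #print(d)
        img = cv2.imread(d["file_name"])
        visualizer = Visualizer(img[:, :, ::-1], metadata=MetadataCatalog.get(name), scale=1.5)
        vis = visualizer.draw_dataset_dict(d)
        #cv2.imshow('show', vis.get_image()[:, :, ::-1])
        cv2.imwrite('out/'+str(i) + '.jpg',vis.get_image()[:, :, ::-1])
        #cv2.waitKey(0)
        if i == 200:
            break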
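As the comment above notes, the boundary check can also be done with a script of your own instead of by eye. The following is one possible sketch of such a script, independent of Detectron2; the function name and the in-memory sample are hypothetical, and it assumes COCO boxes in [x, y, width, height] form and images that carry "width" and "height" fields.

```python
def boxes_outside_image(coco: dict) -> list:
    """Return the ids of annotations whose [x, y, w, h] box leaves its image."""
    sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}
    bad = []
    for ann in coco["annotations"]:
        width, height = sizes[ann["image_id"]]
        x, y, w, h = ann["bbox"]
        # A box is suspicious if it is empty or crosses any image edge.
        if x < 0 or y < 0 or w <= 0 or h <= 0 or x + w > width or y + h > height:
            bad.append(ann["id"])
    return bad


# Hypothetical in-memory example: the second box extends past the right edge.
sample = {
    "images": [{"id": 1, "width": 640, "height": 480}],
    "annotations": [
        {"id": 10, "image_id": 1, "bbox": [10, 20, 100, 50]},
        {"id": 11, "image_id": 1, "bbox": [600, 20, 100, 50]},
    ],
}
print(boxes_outside_image(sample))
```

On a real dataset, pass in the dictionary loaded from train.json and inspect or fix the reported annotations before training.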

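That completes dataset registration. Make sure one of the registration functions above is actually called before the trainer is built (for example, at the start of main). Next, set the hyperparameters for your specific task, mainly in the setup function.
Taking RetinaNet_R_50 as an example: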

def setup(args):
    """
    Create configs and perform basic setups.
    """
    cfg = get_cfg()
    args.config_file = "../configs/COCO-Detection/retinanet_R_50_FPN_3x.yaml"
    cfg.merge_from_file(args.config_file)   # merge settings from the config file
    cfg.merge_from_list(args.opts)          # merge overrides from the CLI arguments

    # Override configuration values
    cfg.DATASETS.TRAIN = ("coco_my_train",) # name of the training dataset
    cfg.DATASETS.TEST = ("coco_my_val",)
    cfg.DATALOADER.NUM_WORKERS = 4  # number of data-loading worker processes

    cfg.INPUT.CROP.ENABLED = True
    cfg.INPUT.MAX_SIZE_TRAIN = 640 # maximum input size for training images
    cfg.INPUT.MAX_SIZE_TEST = 640 # maximum input size for test images
    cfg.INPUT.MIN_SIZE_TRAIN = (512, 768) # minimum input size for training images; can be set up for multi-scale training
    cfg.INPUT.MIN_SIZE_TEST = 640
    # cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING has two options, choice and range:
    # range: the shorter side of the image is sampled randomly from 512 to 768
    # choice: the input image is resized to one of a fixed set of sizes, i.e. the shorter side can only be 512 or 768
    cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = 'range'

    cfg.MODEL.RETINANET.NUM_CLASSES = 81  # number of classes + 1 (because of the background entry)
    #cfg.MODEL.WEIGHTS="/home/yourstorePath/.pth"
    cfg.MODEL.WEIGHTS = "/home/yourstorePath/model_final_5bd44e.pkl"    # pretrained model weights
    cfg.SOLVER.IMS_PER_BATCH = 4  # batch_size=4; iters_in_one_epoch = dataset_imgs/batch_size

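    # Compute the number of iterations per epoch from the number of training images and the batch size
    # 9000 is the total number of training images; change it to match your dataset
    ITERS_IN_ONE_EPOCH = int(9000 / cfg.SOLVER.IMS_PER_BATCH)

    # Maximum number of iterations
    cfg.SOLVER.MAX_ITER = (ITERS_IN_ONE_EPOCH * 12) - 1 # 12 epochs
    # Initial learning rate
    cfg.SOLVER.BASE_LR = 0.002
    # Optimizer momentum
    cfg.SOLVER.MOMENTUM = 0.9
    # Weight decay
    cfg.SOLVER.WEIGHT_DECAY = 0.0001
    cfg.SOLVER.WEIGHT_DECAY_NORM = 0.0
    # Learning-rate decay factor
    cfg.SOLVER.GAMMA = 0.1
    # Iterations at which the learning rate is decayed
    cfg.SOLVER.STEPS = (7000,)
    # Warm-up before regular training: the learning rate ramps up gradually to the initial learning rate
    cfg.SOLVER.WARMUP_FACTOR = 1.0 / 1000
    # Number of warm-up iterations
    cfg.SOLVER.WARMUP_ITERS = 1000

    cfg.SOLVER.WARMUP_METHOD = "linear"
    # Checkpoint saving period; the number in the saved model's file name is reduced by 1
    cfg.SOLVER.CHECKPOINT_PERIOD = ITERS_IN_ONE_EPOCH - 1

    # Run an evaluation every this many iterations
    cfg.TEST.EVAL_PERIOD = ITERS_IN_ONE_EPOCH
    #cfg.TEST.EVAL_PERIOD = 100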

    #cfg.merge_from_file(args.config_file)
    #cfg.merge_from_list(args.opts)
    cfg.freeze()
    default_setup(cfg, args)
    return cfg
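To see what the solver settings in setup amount to, the arithmetic can be reproduced in plain Python. This is a rough sketch under stated assumptions rather than Detectron2's own scheduler code: the function names are hypothetical, and the learning-rate formula assumes the usual linear warm-up (interpolating from WARMUP_FACTOR to 1 over WARMUP_ITERS) multiplied by a step decay of GAMMA at each entry in STEPS.

```python
from bisect import bisect_right


def iterations_for(num_images: int, ims_per_batch: int, epochs: int):
    """Mirror the ITERS_IN_ONE_EPOCH and MAX_ITER arithmetic from setup."""
    iters_per_epoch = int(num_images / ims_per_batch)
    max_iter = iters_per_epoch * epochs - 1
    return iters_per_epoch, max_iter


def lr_at(iteration, base_lr=0.002, gamma=0.1, steps=(7000,),
          warmup_factor=1.0 / 1000, warmup_iters=1000):
    """Assumed schedule: linear warm-up followed by multi-step decay."""
    if iteration < warmup_iters:
        alpha = iteration / warmup_iters
        warmup = warmup_factor * (1 - alpha) + alpha
    else:
        warmup = 1.0
    # Each milestone already reached multiplies the rate by gamma once.
    return base_lr * warmup * gamma ** bisect_right(list(steps), iteration)


print(iterations_for(9000, 4, 12))
for it in (0, 500, 1000, 6999, 7000):
    print(it, lr_at(it))
```

With 9000 images and a batch size of 4, one epoch is 2250 iterations and MAX_ITER is 26999, so the single decay step at iteration 7000 falls a little after the third epoch. If your dataset size differs, STEPS probably needs rescaling along with MAX_ITER.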

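Once the dataset registration and the configuration above are in place, you can start training with the commands below.
During training, the logs, saved models, and other outputs all go under tools/output. The log records the entire training run, which is convenient, and log output from testing is appended to it automatically.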

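Training:
python train.py --num-gpus 1 # training command; the pretrained weights, learning rate, and batch_size are already set in setup, so they do not need to be passed by hand
Resuming from a checkpoint:
python train.py --resume # --num-gpus 1 is optional here, because this command resumes from last_checkpoint by default; to train from a specific .pth instead, treat it as the pretrained model, in which case add --num-gpus 1.
Testing:
Change the file that coco_my_val refers to from val.json to test.json, then run
python train.py --eval-only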
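To check which checkpoint --resume would pick up, you can look at the marker file in the output folder. The sketch below is a standalone helper with a hypothetical name; it assumes the checkpointer records the file name of the most recent checkpoint in a small text file called last_checkpoint inside OUTPUT_DIR.

```python
import os


def find_resume_checkpoint(output_dir: str):
    """Return the checkpoint path recorded in output_dir/last_checkpoint, or None."""
    marker = os.path.join(output_dir, "last_checkpoint")
    if not os.path.isfile(marker):
        return None  # nothing to resume from
    with open(marker) as f:
        name = f.read().strip()
    return os.path.join(output_dir, name)
```

Running `find_resume_checkpoint("output")` from the tools folder gives the path that a resumed run would be expected to load, or None if no training run has saved a checkpoint there yet.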

Training process: (screenshot not preserved)
Validation set: (screenshot not preserved)
Test set: (screenshot not preserved)

Good luck with your testing!

Reference: [a very detailed blog post]
