TTFNet Practice Notes

Training-Time-Friendly Network for Real-Time Object Detection

Paper: https://arxiv.org/abs/1909.00700
Code: https://github.com/ZJULearning/ttfnet

1. Environment setup

See https://github.com/ZJULearning/ttfnet/blob/master/INSTALL.md

conda create -n ttfnet python=3.6 
conda activate ttfnet
  • Clone the project
git clone https://github.com/ZJULearning/ttfnet.git
cd ttfnet
  • Install the Python dependencies
# Install Cython
conda install cython
# Install the remaining dependencies
pip install -v -e .

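2. Testing a single image

  • Download a model
    Download the model you need from the project page, for example TTFNet-53 (2x).
  • Test script: create a new script in the tools folder with the following content: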
from mmdet.apis import init_detector, inference_detector, show_result
import mmcv

config_file = '../configs/ttfnet/ttfnet_d53_2x.py'  # network config file
checkpoint_file = '../checkpoints/ttfnet53_2x-b381dd.pth'  # path to the model file downloaded above

# build the model from a config file and a checkpoint file
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# test a single image and show the results
img = '../images/pose.jpg'  # path to the test image
# or img = mmcv.imread(img), which will only load it once
result = inference_detector(model, img)
show_result(img, result, model.CLASSES)

# test a list of images and write the results to image files
# imgs = ['../images/16004479832_a748d55f21_k.jpg', '../images/17790319373_bd19b24cfc_k.jpg']
# for i, result in enumerate(inference_detector(model, imgs)):
#     show_result(imgs[i], result, model.CLASSES, out_file='result_{}.jpg'.format(i))

# test a video and show the results
# video = mmcv.VideoReader('video.mp4')
# for frame in video:
#     result = inference_detector(model, frame)
#     show_result(frame, result, model.CLASSES, wait_time=1)
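
Beyond displaying the image, it is often useful to post-process the detections yourself. The sketch below assumes that, for a box-only detector in this generation of mmdet, `result` is a list with one `(n, 5)` array per class whose columns are `x1, y1, x2, y2, score`; that layout is an assumption to check against your installed version. `filter_detections` and `fake_result` are hypothetical names used only for illustration.

```python
import numpy as np


def filter_detections(result, class_names, score_thr=0.3):
    """Flatten a per-class result into (class_name, score, box) tuples.

    Assumes `result` holds one (n, 5) array per class with columns
    x1, y1, x2, y2, score.
    """
    kept = []
    for class_name, dets in zip(class_names, result):
        dets = np.asarray(dets, dtype=np.float32).reshape(-1, 5)
        for x1, y1, x2, y2, score in dets:
            if score >= score_thr:
                box = (float(x1), float(y1), float(x2), float(y2))
                kept.append((class_name, float(score), box))
    # Highest-confidence detections first
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


# Stand-in for the value returned by inference_detector(model, img)
fake_result = [
    np.array([[10, 20, 110, 220, 0.9], [0, 0, 5, 5, 0.05]], dtype=np.float32),
    np.zeros((0, 5), dtype=np.float32),
]
detections = filter_detections(fake_result, ('car', 'person'), score_thr=0.3)
for name, score, box in detections:
    print(name, round(score, 2), box)
```

In a real script, `fake_result` would be replaced by the `result` variable and the class tuple by `model.CLASSES`.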
  • Test result
    [Figure: test result]

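3. Data preparation

Both the COCO format and the author's custom format are supported; see GETTING_STARTED.md.

  • For the COCO format, Pascal VOC data or your own dataset can be converted to COCO format.
  • Alternatively, skip COCO and use the author's custom format, referred to here as freestyle.
  1. In the freestyle format, the annotations are a list. Each element of the list is a dict that represents one image and has four keys: filename, width, height, and ann. ann is itself a dict holding the bounding-box information for that image: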
[
    {
        'filename': 'a.jpg', # relative path
        'width': 1280, # image width
        'height': 720, # image height
        'ann': {
            'bboxes': <np.ndarray, float32> (n, 4), # n is the number of boxes in this image
            'labels': <np.ndarray, int64> (n, ),
            'bboxes_ignore': <np.ndarray, float32> (k, 4), # usually no boxes in an image need to be ignored,
                                                           # so simply store np.zeros((0, 4))
            'labels_ignore': <np.ndarray, int64> (k, ) (optional field) # np.zeros((0, ))
        }
    },
    ...
]
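  2. Once the information and annotations for all images have been collected in the list, it can be saved in any of several file formats; the supported extensions are 'json', 'yaml', 'yml', 'pickle', and 'pkl'. The saved file can then be used directly.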
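
As a rough illustration of these two steps, the sketch below builds one record in the freestyle format and saves the list with `pickle`. The helper names `make_record` and `save_annotations` are hypothetical, and the `[x1, y1, x2, y2]` box order and 1-based labels (0 reserved for background) are assumptions that should be checked against the dataset code you are using.

```python
import os
import pickle
import tempfile

import numpy as np


def make_record(filename, width, height, boxes, labels):
    """Build one image record in the freestyle annotation format."""
    return {
        'filename': filename,  # path relative to the image directory
        'width': width,
        'height': height,
        'ann': {
            # Assumed box layout: [x1, y1, x2, y2] in pixels
            'bboxes': np.asarray(boxes, dtype=np.float32).reshape(-1, 4),
            # Assumed label convention: 1-based, 0 kept for background
            'labels': np.asarray(labels, dtype=np.int64).reshape(-1),
            # Nothing to ignore, so store empty arrays
            'bboxes_ignore': np.zeros((0, 4), dtype=np.float32),
            'labels_ignore': np.zeros((0,), dtype=np.int64),
        },
    }


def save_annotations(records, path):
    """Write the list of records to a pickle file."""
    with open(path, 'wb') as f:
        pickle.dump(records, f)


records = [make_record('a.jpg', 1280, 720, [[10, 20, 110, 220]], [1])]

# Round trip through a temporary file to confirm the file loads back intact
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_file = os.path.join(tmp_dir, 'data.pkl')
    save_annotations(records, pkl_file)
    with open(pkl_file, 'rb') as f:
        loaded = pickle.load(f)

print(loaded[0]['filename'], loaded[0]['ann']['bboxes'].shape)
```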
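  3. The mmdet/datasets folder holds all the dataset definitions. Create a new dataset file there, for example mydataset.py, and in it define a MyDataSet class that inherits from CustomDataset, so that the file produced above can be read directly. The script is: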
from .custom import CustomDataset
from .registry import DATASETS
import pickle

'''
    Annotation format:
    [
        {
            'filename': 'a.jpg',
            'width': 1280,
            'height': 720,
            'ann': {
                'bboxes': <np.ndarray> (n, 4),
                'labels': <np.ndarray> (n, ),
                'bboxes_ignore': <np.ndarray> (k, 4),
                'labels_ignore': <np.ndarray> (k, ) (optional field)
            }
        },
        ...
    ]
'''
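@DATASETS.register_module
class MyDataSet(CustomDataset):
    CLASSES = ('car', 'person', 'dog')  # background class not included

    # Override the two methods below: one loads the annotations of the whole
    # dataset, the other returns the annotation of a single image.
    def load_annotations(self, ann_file):
        # ann_file is the pickle file produced in the previous step
        with open(ann_file, 'rb') as f:
            info = pickle.load(f)
        return info

    def get_ann_info(self, idx):
        anno = self.img_infos[idx]['ann']
        return anno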

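  4. Once the class is written, update the registration file: the datasets folder contains an __init__.py. Add the new dataset to it by adding from .mydataset import MyDataSet to the import section and adding MyDataSet to __all__. That is all that is needed.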
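
Why registration matters can be shown with a stripped-down registry. This is a sketch of the general pattern, not mmdet's actual implementation: `Registry` and `build_from_cfg` here are simplified stand-ins written for illustration. The decorator stores the class under its name, and the `type` string in the config is later used to look it up, which is why the module has to be imported in __init__.py so that the decorator actually runs.

```python
class Registry:
    """Simplified stand-in for a name -> class registry."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self, cls):
        # Used as a decorator: remember the class under its own name
        self._module_dict[cls.__name__] = cls
        return cls

    def get(self, key):
        return self._module_dict.get(key)


DATASETS = Registry('dataset')


@DATASETS.register_module
class MyDataSet:
    def __init__(self, ann_file):
        self.ann_file = ann_file


def build_from_cfg(cfg, registry):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    args = dict(cfg)
    type_name = args.pop('type')
    cls = registry.get(type_name)
    if cls is None:
        raise KeyError('{} is not registered in {}'.format(type_name, registry.name))
    return cls(**args)


dataset = build_from_cfg(dict(type='MyDataSet', ann_file='/path/to/data.pkl'), DATASETS)
print(type(dataset).__name__, dataset.ann_file)
```

If the class were never imported, the lookup would fail with the `KeyError` above, which mirrors the kind of error seen when a dataset is missing from __init__.py.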
  5. Next, edit the config file: copy configs/ttfnet/ttfnet_d53_2x.py to a new file, for example configs/ttfnet/ttfnet_d53_2x_example.py, and modify the copy:

# model settings
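NUM_CLASSES = 4 # number of classes + 1
model = dict(
    type='TTFNet', # network type, no need to change
    # pretrained='./pretrain/darknet53.pth', # the author provides a trained model, so there is no need to train from the backbone; commented out
    pretrained=None,                         # the rest of the `model` section needs no changes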
    backbone=dict(
        type='DarknetV3',
        layers=[1, 2, 8, 8, 4],
        inplanes=[3, 32, 64, 128, 256, 512],
        planes=[32, 64, 128, 256, 512, 1024],
        norm_cfg=dict(type='BN'),
        out_indices=(1, 2, 3, 4),
        frozen_stages=1,
        norm_eval=False),
    neck=None,
    bbox_head=dict(
        type='TTFHead',
        inplanes=(128, 256, 512, 1024),
        head_conv=128,
        wh_conv=64,
        hm_head_conv_num=2,
        wh_head_conv_num=2,
        num_classes=NUM_CLASSES,
        wh_offset_base=16,
        wh_agnostic=True,
        wh_gaussian=True,
        shortcut_cfg=(1, 2, 3),
        norm_cfg=dict(type='BN'),
        alpha=0.54,
        hm_weight=1.,
        wh_weight=5.))
cudnn_benchmark = True
# training and testing settings
train_cfg = dict(
    vis_every_n_iters=100,
    debug=False)
test_cfg = dict(
    score_thr=0.01,
    max_per_img=100)
# dataset settings: the part below is the important one and needs careful editing
dataset_type = 'MyDataSet' # we are using the "MyDataSet" class, so set it here
data_root = '/path/to/images/' # absolute path of the directory that holds all images
                # source: img = mmcv.imread(osp.join(self.img_prefix, img_info['filename']))
pkl_path = '/path/to/data.pkl' # absolute path of the annotation file generated above
# img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
img_norm_cfg = dict(mean=[0,0,0], std=[1,1,1], to_rgb=True) # mean and std of the current dataset;
                                                # if they have not been computed, use mean=[0,0,0], std=[1,1,1]
                                                # def imnormalize(img, mean, std, to_rgb=True):
                                                #     img = img.astype(np.float32)
                                                #     if to_rgb:
                                                #         img = bgr2rgb(img)
                                                #     return (img - mean) / std
data = dict(
    imgs_per_gpu=4,
    workers_per_gpu=2,
    train=dict(                           # only the `train` part is used during training, so `val` and `test` below can simply mirror `train`
        type=dataset_type,
        ann_file=pkl_path,
        img_prefix=data_root,
        img_scale=(512, 512), # size the training data is resized to, (w, h)
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=False,
        with_label=True,
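        resize_keep_ratio=False), # when this is False, img_scale above should preferably be a proportional scaling of the original image,
                                  # e.g. for an original of (w=100, h=200) use img_scale=(50, 100), keeping the w:h ratio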
    val=dict(
        type=dataset_type,
        ann_file=pkl_path,
        img_prefix=data_root,
        img_scale=(512, 512),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=False,
        with_label=True,
        resize_keep_ratio=False),
    test=dict(
        type=dataset_type,
        ann_file=pkl_path,
        img_prefix=data_root,
        img_scale=(512, 512),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=False,
        with_label=False,
        test_mode=True,
        resize_keep_ratio=False))
# optimizer
# optimizer = dict(type='SGD', lr=0.015, momentum=0.9, weight_decay=0.0004,
optimizer = dict(type='SGD', lr=0.0002, momentum=0.9, weight_decay=0.0004,
                 paramwise_options=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 5,
    step=[18, 22])
checkpoint_config = dict(interval=4)
bbox_head_hist_config = dict(
    model_type=['ConvModule', 'DeformConvPack'],
    sub_modules=['bbox_head'],
    save_every_n_steps=500)
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
    ])
# yapf:enable
# runtime settings
total_epochs = 24
device_ids = range(8)
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = 'work_dirs/ttfnet53_2x'
# Difference between resume_from and load_from: resume_from loads both the model weights and the optimizer state,
# and the epoch is also inherited from the specified checkpoint.
# It is usually used to resume a training process that was interrupted accidentally.
# load_from only loads the model weights, and training starts from epoch 0.
# It is usually used for finetuning.
load_from = '/path/to/ttfnet-master/checkpoints/ttfnet53_2x-b381dd.pth' # set this if you are finetuning
resume_from = None # set this if you are resuming training, e.g. from epoch 10
workflow = [('train', 1)] # e.g. [('train', 2), ('val', 1)] means: run 1 evaluation after every 2 training epochs;
                          # [('train', 1)] means train continuously without evaluation
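
The config above falls back to mean=[0,0,0], std=[1,1,1] when dataset statistics are not available. A minimal sketch for computing per-channel statistics is given below; `channel_mean_std` is a hypothetical helper, and it assumes the images are already loaded as H x W x 3 arrays in RGB order with values in 0-255. Since imnormalize converts BGR to RGB before subtracting the mean when to_rgb=True, the values passed to img_norm_cfg should be in RGB order in that case.

```python
import numpy as np


def channel_mean_std(images):
    """Per-channel mean and std over an iterable of H x W x 3 arrays."""
    total = np.zeros(3, dtype=np.float64)
    total_sq = np.zeros(3, dtype=np.float64)
    count = 0
    for img in images:
        pixels = np.asarray(img, dtype=np.float64).reshape(-1, 3)
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        count += pixels.shape[0]
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2, clipped at 0 to absorb rounding error
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean, std


# Two tiny synthetic "images" standing in for a real dataset
images = [np.zeros((2, 2, 3)), np.full((2, 2, 3), 2.0)]
mean, std = channel_mean_std(images)
img_norm_cfg = dict(mean=mean.tolist(), std=std.tolist(), to_rgb=True)
print(img_norm_cfg)
```

Accumulating sums instead of stacking all pixels keeps memory use flat, which matters once the dataset no longer fits in RAM.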

  6. CustomDataset
class CustomDataset(Dataset):
    """Custom dataset for detection.

    Annotation format:
    [
        {
            'filename': 'a.jpg',
            'width': 1280,
            'height': 720,
            'ann': {
                'bboxes': <np.ndarray> (n, 4),
                'labels': <np.ndarray> (n, ),
                'bboxes_ignore': <np.ndarray> (k, 4),
                'labels_ignore': <np.ndarray> (k, ) (optional field)
            }
        },
        ...
    ]
    The `ann` field is optional for testing.
    """

    CLASSES = None # tuple, e.g. ('car', 'person')

    def __init__(self,
                 ann_file, # path to the annotation file
                 img_prefix, # the annotations store image names such as xxx.jpg; this is the directory they live in
                 img_scale, # e.g. (512, 512), the network input size
                 img_norm_cfg, # basic dataset statistics, e.g. dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
                 multiscale_mode='value', # for multi-scale training, one of ['value', 'range']
                 size_divisor=None, # e.g. 32: if the network input size is not a multiple of 32, it is padded
                 proposal_file=None,
                 num_max_proposals=1000,
                 flip_ratio=0,
                 with_mask=True,
                 with_crowd=True,
                 with_label=True,
                 with_semantic_seg=False,
                 seg_prefix=None,
                 seg_scale_factor=1,
                 extra_aug=None,
                 resize_keep_ratio=True,
                 corruption=None,
                 corruption_severity=1,
                 skip_img_without_anno=True,
                 test_mode=False):
        ...  # body omitted
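  • __init__ first runs self.img_infos = self.load_annotations(ann_file), which stores each image's filename, width, and height.
  • During training, __getitem__ is called to produce the data.
  • __getitem__ directly calls prepare_train_img(idx) to preprocess the raw annotation.
  • prepare_train_img in turn calls get_ann_info(idx) to obtain the boxes and labels.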
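
The call chain in the bullets above can be sketched with a toy class. `ToyCustomDataset` is a hypothetical stand-in that keeps only the control flow; the real CustomDataset also loads the image, resizes, flips, and normalizes it, none of which is reproduced here. To stay self-contained, the toy takes the records in memory rather than reading them from a file.

```python
import numpy as np


class ToyCustomDataset:
    """Toy illustration of the load -> __getitem__ -> prepare -> ann flow."""

    def __init__(self, records):
        # Step 1: load every image record once, up front
        self.img_infos = self.load_annotations(records)

    def load_annotations(self, records):
        # The real class reads a file; here the records are passed in memory
        return list(records)

    def get_ann_info(self, idx):
        # Step 4: return the boxes and labels of a single image
        return self.img_infos[idx]['ann']

    def prepare_train_img(self, idx):
        # Step 3: combine image info and annotation into one sample
        img_info = self.img_infos[idx]
        ann = self.get_ann_info(idx)
        return {
            'filename': img_info['filename'],
            'gt_bboxes': ann['bboxes'],
            'gt_labels': ann['labels'],
        }

    def __len__(self):
        return len(self.img_infos)

    def __getitem__(self, idx):
        # Step 2: called by the data loader during training
        return self.prepare_train_img(idx)


records = [{
    'filename': 'a.jpg',
    'width': 1280,
    'height': 720,
    'ann': {
        'bboxes': np.array([[10, 20, 110, 220]], dtype=np.float32),
        'labels': np.array([1], dtype=np.int64),
    },
}]
toy = ToyCustomDataset(records)
sample = toy[0]
print(sample['filename'], sample['gt_bboxes'].shape, sample['gt_labels'])
```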

4. Training

  • Train with a single GPU
    python tools/train.py ${CONFIG_FILE}
  • Train with multiple GPUs
    ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
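
To get a feel for what the lr_config from section 3 does during training, the sketch below reproduces the schedule numerically. It assumes the usual mmcv behaviour for policy='step' with linear warmup: the learning rate is multiplied by a decay factor (assumed here to be the default 0.1) at each epoch listed in step, and during the first warmup_iters iterations it ramps linearly from warmup_ratio times the regular value up to the regular value. `learning_rate` is a hypothetical helper, not part of mmcv.

```python
def learning_rate(epoch, cur_iter, base_lr=0.0002, steps=(18, 22), gamma=0.1,
                  warmup_iters=500, warmup_ratio=1.0 / 5):
    """Approximate learning rate at a given epoch and global iteration."""
    # Step policy: one decay for every milestone epoch already reached
    num_decays = sum(1 for s in steps if epoch >= s)
    regular_lr = base_lr * gamma ** num_decays
    # Linear warmup over the first warmup_iters iterations
    if cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    return regular_lr


for epoch, cur_iter in [(0, 0), (0, 250), (0, 500), (18, 10000), (22, 20000)]:
    print(epoch, cur_iter, learning_rate(epoch, cur_iter))
```

The iteration numbers paired with epochs 18 and 22 are arbitrary values past the warmup phase; the actual count depends on dataset size and batch size.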