Training-Time-Friendly Network for Real-Time Object Detection

Paper: https://arxiv.org/abs/1909.00700
Project: https://github.com/ZJULearning/ttfnet

1. Environment setup

See https://github.com/ZJULearning/ttfnet/blob/master/INSTALL.md

Install the NCCL dependency (see https://blog.csdn.net/lwplwf/article/details/82788818), then create a new conda environment and activate it:

conda create -n ttfnet python=3.6 
conda activate ttfnet

Clone the project:

git clone https://github.com/ZJULearning/ttfnet.git
cd ttfnet

Install the Python dependencies:

# Install Cython
conda install cython
# Install the remaining dependencies (editable install of the repo)
pip install -v -e .

2. Testing on a single image

Download a model
Download the model you need from the project page, e.g. TTFNet-53 (2x).
Test script: create a new script in the tools folder containing:

from mmdet.apis import init_detector, inference_detector, show_result
import mmcv

config_file = '../configs/ttfnet/ttfnet_d53_2x.py'  # network config file
checkpoint_file = '../checkpoints/ttfnet53_2x-b381dd.pth'  # path to the model you just downloaded

# build the model from a config file and a checkpoint file
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# test a single image and show the results
img = '../images/pose.jpg'  # path to the test image
# or img = mmcv.imread(img), which will only load it once
result = inference_detector(model, img)
show_result(img, result, model.CLASSES)

# test a list of images and write the results to image files
# imgs = ['../images/16004479832_a748d55f21_k.jpg', '../images/17790319373_bd19b24cfc_k.jpg']
#for i, result in enumerate(inference_detector(model, imgs)):
#    show_result(imgs[i], result, model.CLASSES, out_file='result_{}.jpg'.format(i))

# test a video and show the results
# video = mmcv.VideoReader('video.mp4')
# for frame in video:
#     result = inference_detector(model, frame)
#     show_result(frame, result, model.CLASSES, wait_time=1)
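The `result` returned above is what `show_result` draws. For a bbox-only detector such as TTFNet it is, as far as I know for mmdet of this generation, a list with one `(n, 5)` NumPy array per foreground class, each row being `[x1, y1, x2, y2, score]`. The sketch below assumes that layout and turns it into a sorted list of detections above a score threshold; `filter_detections` is a hypothetical helper and the 0.3 default threshold is an arbitrary choice.

```python
import numpy as np


def filter_detections(result, class_names, score_thr=0.3):
    """Flatten a per-class detection list into (class_name, score, box) tuples.

    Assumption: `result` is a list with one (n, 5) array per class, each row
    being [x1, y1, x2, y2, score] (the usual mmdet layout for bbox-only
    detectors; check it against your version).
    """
    detections = []
    for class_id, boxes in enumerate(result):
        # reshape(-1, 5) also handles empty arrays/lists for classes with no hits
        boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 5)
        for x1, y1, x2, y2, score in boxes:
            if score >= score_thr:
                detections.append(
                    (class_names[class_id], float(score),
                     (float(x1), float(y1), float(x2), float(y2))))
    # Highest-scoring detections first
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections
```

For example, `filter_detections(result, model.CLASSES)` right after `inference_detector` gives `(class_name, score, (x1, y1, x2, y2))` tuples, highest score first, which is convenient for logging or writing results to a file.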

Test results

3. Data preparation

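Both the COCO format and the author's custom format are supported; see GETTING_STARTED.md.

For the COCO route, convert Pascal VOC or your own dataset into COCO format.
Alternatively, skip COCO and use the author's custom format, which we will call freestyle here.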

In the freestyle format, the annotations are a list in which each element is a dict describing one image. Each dict has four keys: filename, width, height and ann. ann is itself a dict holding that image's bounding-box information:

[
    {
        'filename': 'a.jpg',  # relative path (joined with img_prefix)
        'width': 1280,  # image width
        'height': 720,  # image height
        'ann': {
            'bboxes': <np.ndarray, float32> (n, 4),  # n = number of boxes in this image
            'labels': <np.ndarray, int64> (n, ),
            'bboxes_ignore': <np.ndarray, float32> (k, 4),  # usually no boxes need to be ignored,
                                                            # so just save np.zeros((0, 4))
            'labels_ignore': <np.ndarray, int64> (k, ) (optional field)  # np.zeros((0, ))
        }
    },
    ...
]
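Before saving, it can help to check each entry against the schema above. A minimal sketch, assuming the dtypes and shapes listed there and assuming boxes are `[x1, y1, x2, y2]` in pixels (the usual mmdet convention; the schema itself does not say); `check_entry` is a hypothetical helper, not part of mmdet:

```python
import numpy as np

REQUIRED_KEYS = ('filename', 'width', 'height', 'ann')


def check_entry(entry):
    """Raise ValueError if one image entry does not match the freestyle schema."""
    for key in REQUIRED_KEYS:
        if key not in entry:
            raise ValueError('missing key: {}'.format(key))
    ann = entry['ann']
    bboxes, labels = ann['bboxes'], ann['labels']
    if bboxes.dtype != np.float32 or bboxes.ndim != 2 or bboxes.shape[1] != 4:
        raise ValueError('bboxes must be a float32 array of shape (n, 4)')
    if labels.dtype != np.int64 or labels.shape != (bboxes.shape[0],):
        raise ValueError('labels must be an int64 array of shape (n,)')
    # bboxes_ignore is usually empty; default to an empty (0, 4) array if absent
    ignore = ann.get('bboxes_ignore', np.zeros((0, 4), dtype=np.float32))
    if ignore.ndim != 2 or ignore.shape[1] != 4:
        raise ValueError('bboxes_ignore must have shape (k, 4)')
    if len(bboxes):
        # Assumed [x1, y1, x2, y2] layout: boxes must be non-degenerate ...
        if (bboxes[:, 2] <= bboxes[:, 0]).any() or (bboxes[:, 3] <= bboxes[:, 1]).any():
            raise ValueError('each box needs x2 > x1 and y2 > y1')
        # ... and must not extend past the image size
        if (bboxes[:, [0, 2]] > entry['width']).any() or (bboxes[:, [1, 3]] > entry['height']).any():
            raise ValueError('box extends past the image border')
    return True
```

Running `check_entry` over the whole list before pickling catches the most common mistakes (float labels, a flat `(4,)` box array for a single object, swapped width and height) early, instead of in the middle of training.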

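Once the information and annotations of all images are in the list, save it in any supported format (file extensions 'json', 'yaml', 'yml', 'pickle', 'pkl') and it is ready to use. Note that the MyDataSet class below reads the file with pickle, so save a .pkl file if you use that class.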
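As an example of producing such a file, the sketch below converts Pascal VOC style XML annotations into freestyle entries and pickles the list. Assumptions not taken from the original post: labels are 1-based with 0 reserved for background (consistent with `NUM_CLASSES = number of classes + 1` in the config), objects marked `difficult` go into `bboxes_ignore`, and raw VOC pixel coordinates are kept unchanged. `CLASSES`, `voc_xml_to_entry` and `save_annotations` are illustrative names.

```python
import pickle
import xml.etree.ElementTree as ET

import numpy as np

CLASSES = ('car', 'person', 'dog')  # assumed: same order as MyDataSet.CLASSES
# Assumed 1-based labels; 0 is left for the background class
CAT2LABEL = {name: i + 1 for i, name in enumerate(CLASSES)}


def voc_xml_to_entry(xml_text):
    """Convert one Pascal VOC XML annotation string into a freestyle dict."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    bboxes, labels, bboxes_ignore, labels_ignore = [], [], [], []
    for obj in root.findall('object'):
        name = obj.find('name').text
        if name not in CAT2LABEL:
            continue  # skip classes we are not training on
        box = obj.find('bndbox')
        coords = [float(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        difficult = obj.find('difficult')
        if difficult is not None and int(difficult.text) == 1:
            bboxes_ignore.append(coords)
            labels_ignore.append(CAT2LABEL[name])
        else:
            bboxes.append(coords)
            labels.append(CAT2LABEL[name])
    return {
        'filename': root.find('filename').text,
        'width': int(size.find('width').text),
        'height': int(size.find('height').text),
        'ann': {
            # reshape keeps the (0, 4) shape when an image has no boxes
            'bboxes': np.array(bboxes, dtype=np.float32).reshape(-1, 4),
            'labels': np.array(labels, dtype=np.int64),
            'bboxes_ignore': np.array(bboxes_ignore, dtype=np.float32).reshape(-1, 4),
            'labels_ignore': np.array(labels_ignore, dtype=np.int64),
        },
    }


def save_annotations(entries, pkl_path):
    """Pickle the list of entries; this is the file later used as ann_file."""
    with open(pkl_path, 'wb') as f:
        pickle.dump(entries, f)
```

In practice you would read each XML file, call `voc_xml_to_entry` on its contents, collect the results in a list and pass that list to `save_annotations('/path/to/data.pkl')`.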
All dataset definitions live in mmdet/datasets. Create a new dataset file there, e.g. mydataset.py, and in it define a MyDataSet class that inherits from CustomDataset and reads the file we just created. The script is:

from .custom import CustomDataset
from .registry import DATASETS
import pickle

'''
    Annotation format:
    [
        {
            'filename': 'a.jpg',
            'width': 1280,
            'height': 720,
            'ann': {
                'bboxes': <np.ndarray> (n, 4),
                'labels': <np.ndarray> (n, ),
                'bboxes_ignore': <np.ndarray> (k, 4),
                'labels_ignore': <np.ndarray> (k, ) (optional field)
            }
        },
        ...
    ]
'''
@DATASETS.register_module
class MyDataSet(CustomDataset):
    CLASSES = ('car', 'person', 'dog')  # foreground classes only, no background

    # Override these two methods: the first loads the information of the whole
    # dataset, the second returns the annotation of a single image
    def load_annotations(self, ann_file):
        with open(ann_file, 'rb') as f:
            info = pickle.load(f)
        return info

    def get_ann_info(self, idx):
        anno = self.img_infos[idx]['ann']
        return anno

Then register the dataset: in the __init__.py under mmdet/datasets, add from .mydataset import MyDataSet to the imports and 'MyDataSet' to __all__. That's it.
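Why the `__init__.py` edit matters: `@DATASETS.register_module` only runs when `mydataset.py` is imported, and it records the class under its name so that the string `dataset_type = 'MyDataSet'` in the config can be turned into a class. The toy registry below illustrates that mechanism under simplifying assumptions; it is not mmcv's actual `Registry`, and `Registry`, `build_dataset` and this stripped-down `MyDataSet` are hypothetical stand-ins.

```python
class Registry:
    """Toy name -> class mapping (hypothetical, simplified stand-in for mmcv's Registry)."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self, cls):
        # Used as a bare decorator, as in `@DATASETS.register_module`
        if cls.__name__ in self._module_dict:
            raise KeyError('{} is already registered in {}'.format(cls.__name__, self.name))
        self._module_dict[cls.__name__] = cls
        return cls

    def get(self, key):
        return self._module_dict.get(key)


DATASETS = Registry('dataset')


def build_dataset(cfg):
    """Instantiate the class named by cfg['type'], passing the other keys as kwargs."""
    args = dict(cfg)
    obj_type = args.pop('type')
    cls = DATASETS.get(obj_type)
    if cls is None:
        # This is the failure you get when mydataset.py was never imported
        raise KeyError('{} is not registered; was its module imported?'.format(obj_type))
    return cls(**args)


@DATASETS.register_module  # runs at import time and records 'MyDataSet'
class MyDataSet:
    def __init__(self, ann_file, img_prefix):
        self.ann_file = ann_file
        self.img_prefix = img_prefix
```

If the import line is missing from `__init__.py`, the decorator never runs, the name never reaches the registry, and looking up `'MyDataSet'` fails even though the file exists.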
Next, edit the config file: copy configs/ttfnet/ttfnet_d53_2x.py to a new file, e.g. configs/ttfnet/ttfnet_d53_2x_example.py, and change it as follows:

# model settings
NUM_CLASSES = 4  # number of classes + 1 (background)
model = dict(
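    type='TTFNet',  # network type; no need to change
    # pretrained='./pretrain/darknet53.pth',  # commented out: we start from the author's trained model (load_from below), so backbone weights are not needed
    pretrained=None,                          # the rest of `model` below needs no changes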
    backbone=dict(
        type='DarknetV3',
        layers=[1, 2, 8, 8, 4],
        inplanes=[3, 32, 64, 128, 256, 512],
        planes=[32, 64, 128, 256, 512, 1024],
        norm_cfg=dict(type='BN'),
        out_indices=(1, 2, 3, 4),
        frozen_stages=1,
        norm_eval=False),
    neck=None,
    bbox_head=dict(
        type='TTFHead',
        inplanes=(128, 256, 512, 1024),
        head_conv=128,
        wh_conv=64,
        hm_head_conv_num=2,
        wh_head_conv_num=2,
        num_classes=NUM_CLASSES,  # set via NUM_CLASSES at the top
        wh_offset_base=16,
        wh_agnostic=True,
        wh_gaussian=True,
        shortcut_cfg=(1, 2, 3),
        norm_cfg=dict(type='BN'),
        alpha=0.54,
        hm_weight=1.,
        wh_weight=5.))
cudnn_benchmark = True
# training and testing settings
train_cfg = dict(
    vis_every_n_iters=100,
    debug=False)
test_cfg = dict(
    score_thr=0.01,
    max_per_img=100)
# dataset settings: the part below is what matters, edit it carefully
dataset_type = 'MyDataSet'  # we use the MyDataSet class, so set it here
data_root = '/path/to/images/'  # absolute path of the directory holding all images
# (the source loads images with img = mmcv.imread(osp.join(self.img_prefix, img_info['filename'])))
pkl_path = '/path/to/data.pkl'  # absolute path of the annotation file generated above
# img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
img_norm_cfg = dict(mean=[0, 0, 0], std=[1, 1, 1], to_rgb=True)
# Mean and std of your own data; if you have not computed them, use mean=[0, 0, 0], std=[1, 1, 1].
# The commented-out line above is the original setting (ImageNet statistics). mmcv normalizes as:
# def imnormalize(img, mean, std, to_rgb=True):
#     img = img.astype(np.float32)
#     if to_rgb:
#         img = bgr2rgb(img)
#     return (img - mean) / std
data = dict(
    imgs_per_gpu=4,
    workers_per_gpu=2,
    train=dict(  # only `train` is used during training, so `val` and `test` can simply mirror it
        type=dataset_type,
        ann_file=pkl_path,
        img_prefix=data_root,
        img_scale=(512, 512),  # size (w, h) that training images are resized to
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=False,
        with_label=True,
        resize_keep_ratio=False),  # with False, every image is resized to exactly img_scale, so img_scale
                                   # should keep the originals' w:h ratio, e.g. (w=100, h=200) -> img_scale=(50, 100)
    val=dict(
        type=dataset_type,
        ann_file=pkl_path,
        img_prefix=data_root,
        img_scale=(512, 512),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=False,
        with_label=True,
        resize_keep_ratio=False),
    test=dict(
        type=dataset_type,
        ann_file=pkl_path,
        img_prefix=data_root,
        img_scale=(512, 512),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=False,
        with_label=False,
        test_mode=True,
        resize_keep_ratio=False))
# optimizer
# original config used lr=0.015 with the same SGD settings; lowered here for fine-tuning
optimizer = dict(type='SGD', lr=0.0002, momentum=0.9, weight_decay=0.0004,
                 paramwise_options=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 5,
    step=[18, 22])
checkpoint_config = dict(interval=4)
bbox_head_hist_config = dict(
    model_type=['ConvModule', 'DeformConvPack'],
    sub_modules=['bbox_head'],
    save_every_n_steps=500)
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
    ])
# yapf:enable
# runtime settings
total_epochs = 24
device_ids = range(8)
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = 'work_dirs/ttfnet53_2x'
# Difference between resume_from and load_from: resume_from loads both the model weights and
# the optimizer state, and the epoch is also inherited from the specified checkpoint.
# It is usually used to resume a training run that was interrupted accidentally.
# load_from only loads the model weights, and training starts from epoch 0.
# It is usually used for fine-tuning.
load_from = '/path/to/ttfnet-master/checkpoints/ttfnet53_2x-b381dd.pth'  # set this when fine-tuning
resume_from = None  # set this to continue an interrupted run from one of its saved checkpoints
workflow = [('train', 1)]  # e.g. [('train', 2), ('val', 1)] alternates 2 training epochs with 1 validation epoch;
                           # [('train', 1)] trains only, without validation
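If you want real statistics for `img_norm_cfg` instead of `mean=[0, 0, 0], std=[1, 1, 1]`, they can be computed per channel over the training images. Because `to_rgb=True` converts BGR to RGB before `(img - mean) / std` (see the `imnormalize` excerpt in the config), the values should be in RGB order. A minimal sketch, assuming the images are already decoded as HxWx3 uint8 arrays in BGR order, which is what `mmcv.imread` returns by default; `channel_mean_std` is a hypothetical helper:

```python
import numpy as np


def channel_mean_std(images_bgr):
    """Per-channel mean and std, in RGB order, over an iterable of HxWx3 BGR images.

    Sums are accumulated in float64, so images of different sizes can be mixed
    and the whole dataset never has to be held in memory at once.
    """
    total = np.zeros(3, dtype=np.float64)
    total_sq = np.zeros(3, dtype=np.float64)
    count = 0
    for img in images_bgr:
        pixels = img.reshape(-1, 3).astype(np.float64)[:, ::-1]  # BGR -> RGB
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        count += pixels.shape[0]
    mean = total / count
    # Var = E[x^2] - E[x]^2; clamp tiny negative values caused by rounding
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean.tolist(), std.tolist()
```

The two lists can then be pasted into the config as `img_norm_cfg = dict(mean=..., std=..., to_rgb=True)`. Passing a generator that reads images one by one keeps memory use flat for large datasets.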

The CustomDataset class (mmdet/datasets/custom.py), for reference:

from torch.utils.data import Dataset


class CustomDataset(Dataset):
    """Custom dataset for detection.

    Annotation format:
    [
        {
            'filename': 'a.jpg',
            'width': 1280,
            'height': 720,
            'ann': {
                'bboxes': <np.ndarray> (n, 4),
                'labels': <np.ndarray> (n, ),
                'bboxes_ignore': <np.ndarray> (k, 4),
                'labels_ignore': <np.ndarray> (k, ) (optional field)
            }
        },
        ...
    ]

    The `ann` field is optional for testing.
    """

    CLASSES = None  # a tuple, e.g. ('car', 'person')

    def __init__(self,
                 ann_file,  # path to the annotation file
                 img_prefix,  # image directory; the annotations store only file names such as xxx.jpg
                 img_scale,  # e.g. (512, 512), the network input size
                 img_norm_cfg,  # normalization settings, e.g. dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
                 multiscale_mode='value',  # for multi-scale training, one of ['value', 'range']
                 size_divisor=None,  # e.g. 32: if the input size is not a multiple of 32, it is padded
                 proposal_file=None,
                 num_max_proposals=1000,
                 flip_ratio=0,
                 with_mask=True,
                 with_crowd=True,
                 with_label=True,
                 with_semantic_seg=False,
                 seg_prefix=None,
                 seg_scale_factor=1,
                 extra_aug=None,
                 resize_keep_ratio=True,
                 corruption=None,
                 corruption_severity=1,
                 skip_img_without_anno=True,
                 test_mode=False):
        ...
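Two of the size-related options can be checked by hand. `size_divisor=32` pads the resized image up to the next multiple of 32 on each side, and the config comment on `resize_keep_ratio=False` recommends an `img_scale` with the same w:h ratio as the original images. A minimal sketch of both calculations; `padded_size` and `proportional_scale` are hypothetical helpers, and rounding to the nearest integer in `proportional_scale` is an assumption:

```python
import math


def padded_size(width, height, divisor=32):
    """(w, h) after padding each side up to a multiple of `divisor` (what size_divisor does)."""
    return (int(math.ceil(width / divisor)) * divisor,
            int(math.ceil(height / divisor)) * divisor)


def proportional_scale(orig_w, orig_h, target_long_side):
    """(w, h) keeping the original aspect ratio, with the longer side set to target_long_side."""
    factor = target_long_side / max(orig_w, orig_h)
    return (int(round(orig_w * factor)), int(round(orig_h * factor)))
```

For the example in the config comment, `proportional_scale(100, 200, 100)` gives `(50, 100)`, and a 500x375 image ends up padded to 512x384 with `size_divisor=32`.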

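First, self.img_infos = self.load_annotations(ann_file) runs and stores the per-image information (filename, width, height and ann).
During training, __getitem__ is called to produce each sample.
__getitem__ calls prepare_train_img(idx) to preprocess the raw annotation.
prepare_train_img in turn calls get_ann_info(idx) to obtain the boxes and labels.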
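The call chain just described can be mimicked in plain Python to see what each step returns. This is a simplified stand-in, not mmdet's `CustomDataset`: there is no image loading or augmentation, and the retry for images without boxes only imitates the intent of `skip_img_without_anno=True` (this sketch moves to the next index, whereas mmdet, as far as I know, picks another image at random). `ToyDataset` is hypothetical.

```python
class ToyDataset:
    """Mimics the CustomDataset call order (hypothetical, simplified):
    load_annotations -> __getitem__ -> prepare_train_img -> get_ann_info.
    No image I/O or augmentation is done."""

    def __init__(self, infos, skip_img_without_anno=True):
        self.skip_img_without_anno = skip_img_without_anno
        # Like mmdet, keep the result of load_annotations in self.img_infos
        self.img_infos = self.load_annotations(infos)

    def load_annotations(self, infos):
        # The real class reads ann_file from disk; here the list is passed in directly
        return list(infos)

    def get_ann_info(self, idx):
        return self.img_infos[idx]['ann']

    def prepare_train_img(self, idx):
        info = self.img_infos[idx]
        ann = self.get_ann_info(idx)
        if len(ann['bboxes']) == 0 and self.skip_img_without_anno:
            return None  # tells __getitem__ to try another image
        return {'filename': info['filename'],
                'gt_bboxes': ann['bboxes'],
                'gt_labels': ann['labels']}

    def __len__(self):
        return len(self.img_infos)

    def __getitem__(self, idx):
        # Retry until an image with boxes is found (deterministic next index here)
        for _ in range(len(self)):
            data = self.prepare_train_img(idx)
            if data is not None:
                return data
            idx = (idx + 1) % len(self)
        raise RuntimeError('no image in the dataset has annotations')
```

This also explains why overriding only `load_annotations` and `get_ann_info` in MyDataSet is enough: the rest of the chain calls them and never reads the file format directly.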

4. Training

Train with a single GPU
python tools/train.py ${CONFIG_FILE}
Train with multiple GPUs
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
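During training, the learning rate follows `lr_config` in the config: a linear warmup over the first 500 iterations starting at `warmup_ratio` = 1/5 of the base lr, then step decay at epochs 18 and 22. The sketch below reproduces that arithmetic, assuming mmcv's step policy (decay factor 0.1 by default, 0-based epoch counter) and its linear warmup formula; `lr_at` is a hypothetical helper, so compare it with the lr printed by TextLoggerHook before relying on it.

```python
def lr_at(epoch, cur_iter, base_lr=0.0002, steps=(18, 22), gamma=0.1,
          warmup_iters=500, warmup_ratio=1.0 / 5):
    """Learning rate for a 0-based epoch and a global iteration count.

    Assumed to mirror mmcv's 'step' policy with linear warmup; defaults are
    taken from the config above (base lr 0.0002, step=[18, 22]).
    """
    # Step decay: multiply by gamma once for every milestone already reached
    exp = sum(1 for s in steps if epoch >= s)
    regular_lr = base_lr * gamma ** exp
    if cur_iter >= warmup_iters:
        return regular_lr
    # Linear warmup: starts at warmup_ratio * lr and reaches lr at warmup_iters
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return regular_lr * (1 - k)
```

With these settings, the first iteration runs at 0.00004, the lr reaches 0.0002 after 500 iterations, drops to 0.00002 from epoch index 18, and to 0.000002 from epoch index 22 until `total_epochs = 24` ends training.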
