Training-Time-Friendly Network for Real-Time Object Detection

Paper: https://arxiv.org/abs/1909.00700
Code: https://github.com/ZJULearning/ttfnet

1. Environment setup

See https://github.com/ZJULearning/ttfnet/blob/master/INSTALL.md

Install the NCCL dependency; see https://blog.csdn.net/lwplwf/article/details/82788818
Create a new conda environment and activate it:

conda create -n ttfnet python=3.6 
conda activate ttfnet

Clone the project:

git clone https://github.com/ZJULearning/ttfnet.git
cd ttfnet

Install the Python dependencies:

# Install Cython
conda install cython
# Install the remaining dependencies
pip install -v -e .
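
Before moving on, it can save time to confirm that the key packages are visible in the new environment. The snippet below is a minimal sketch using only the standard library; the helper name `find_missing` and the package list are assumptions made for illustration and are not part of the TTFNet repository.

```python
import importlib.util


def find_missing(packages):
    """Return the names in `packages` that cannot be located in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list: the test script in the next section imports mmdet and mmcv,
    # and both depend on torch.
    missing = find_missing(["torch", "mmcv", "mmdet"])
    print("missing packages:", missing or "none")
```

An empty result only shows that the packages can be found, not that CUDA or NCCL are working.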

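2. Testing a single image

Download a model
Download the model you need from the project homepage, for example TTFNet-53 (2x).
Test script: create a new script in the tools directory with the following content: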

from mmdet.apis import init_detector, inference_detector, show_result
import mmcv

config_file = '../configs/ttfnet/ttfnet_d53_2x.py' # network config file
checkpoint_file = '../checkpoints/ttfnet53_2x-b381dd.pth' # path to the model file downloaded above

# build the model from a config file and a checkpoint file
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# test a single image and show the results
img = '../images/pose.jpg' # path to the test image
# or img = mmcv.imread(img), which will only load it once
result = inference_detector(model, img)
show_result(img, result, model.CLASSES)

# test a list of images and write the results to image files
#imgs = ['../images/16004479832_a748d55f21_k.jpg', '../images/17790319373_bd19b24cfc_k.jpg']
#for i, result in enumerate(inference_detector(model, imgs)):
#    show_result(imgs[i], result, model.CLASSES, out_file='result_{}.jpg'.format(i))

# test a video and show the results
# video = mmcv.VideoReader('video.mp4')
# for frame in video:
#     result = inference_detector(model, frame)
#     show_result(frame, result, model.CLASSES, wait_time=1)
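
To work with the detections directly instead of only displaying them, it helps to know the shape of `result`. Assuming the mmdetection 1.x convention that this code base follows, a box-only detector returns a list with one array per class, each of shape (n, 5) holding x1, y1, x2, y2 and score. Under that assumption, the sketch below keeps only detections above a score threshold; the helper name `filter_detections` and the sample data are made up for illustration.

```python
import numpy as np


def filter_detections(result, class_names, score_thr=0.3):
    """Flatten a per-class result list into (class_name, box, score) tuples above score_thr."""
    kept = []
    for class_id, dets in enumerate(result):
        for x1, y1, x2, y2, score in dets:
            if score >= score_thr:
                box = (float(x1), float(y1), float(x2), float(y2))
                kept.append((class_names[class_id], box, float(score)))
    return kept


# Synthetic stand-in for the output of inference_detector: two classes,
# the first with two candidate boxes, the second with none.
fake_result = [
    np.array([[10, 20, 110, 220, 0.9], [5, 5, 50, 50, 0.1]], dtype=np.float32),
    np.zeros((0, 5), dtype=np.float32),
]
detections = filter_detections(fake_result, ("person", "car"))
print(detections)
```

With a real model, `model.CLASSES` would take the place of the hard-coded class names.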

Test result (the result image is not included here)

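3. Data preparation

Both COCO and the author's custom format are supported; see GETTING_STARTED.md.

For the COCO format, Pascal VOC or your own dataset can be converted to COCO.
Alternatively, skip COCO and use the author's custom format, which we will call freestyle here.

In the freestyle format, the annotations are a list whose items are dicts, one per image. Each dict has four keys: filename, width, height and ann. ann is itself a dict that stores the bounding-box information for the image: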

[
    {
        'filename': 'a.jpg', # relative path
        'width': 1280, # image width
        'height': 720, # image height
        'ann': {
            'bboxes': <np.ndarray, float32> (n, 4), # n is the number of boxes in this image
            'labels': <np.ndarray, int64> (n, ),
            'bboxes_ignore': <np.ndarray, float32> (k, 4), # usually no boxes in an image need to be ignored,
                                                           # so simply store np.zeros((0, 4))
            'labels_ignore': <np.ndarray, int64> (k, ) (optional field) # np.zeros((0, ))
        }
    },
    ...
]

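Once the information and annotations for all images are collected in the list, it can be saved in any of several file formats (supported suffixes: 'json', 'yaml', 'yml', 'pickle', 'pkl') and used directly.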
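
As a concrete illustration of the format above, the following sketch builds two records and saves them with pickle. It is a minimal example under stated assumptions: the helper name `make_entry`, the box coordinates, and the temporary output path are invented for illustration, boxes are assumed to be (x1, y1, x2, y2) in pixels, and labels are assumed to start at 1.

```python
import os
import pickle
import tempfile

import numpy as np


def make_entry(filename, width, height, boxes, labels):
    """Build one image record in the custom ("freestyle") annotation format."""
    return {
        "filename": filename,
        "width": width,
        "height": height,
        "ann": {
            # reshape(-1, 4) keeps the (n, 4) shape even when there are no boxes.
            "bboxes": np.array(boxes, dtype=np.float32).reshape(-1, 4),
            "labels": np.array(labels, dtype=np.int64),
            # Nothing is ignored in this example, so both arrays are empty.
            "bboxes_ignore": np.zeros((0, 4), dtype=np.float32),
            "labels_ignore": np.zeros((0,), dtype=np.int64),
        },
    }


annotations = [
    make_entry("a.jpg", 1280, 720, [[100, 120, 300, 400], [50, 60, 200, 220]], [1, 2]),
    make_entry("b.jpg", 640, 480, [], []),  # an image without any boxes
]

# Write to a temporary directory here; a real project would pick a fixed path.
pkl_path = os.path.join(tempfile.mkdtemp(), "data.pkl")
with open(pkl_path, "wb") as f:
    pickle.dump(annotations, f)

# Read the file back to confirm the round trip.
with open(pkl_path, "rb") as f:
    reloaded = pickle.load(f)
print(len(reloaded), reloaded[0]["ann"]["bboxes"].shape)
```

Pickle is used because it stores the NumPy arrays as they are, which matches how the dataset class later in this section reads the file.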
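All dataset files live in the mmdet/datasets folder. Create a new dataset file there, for example mydataset.py, and in it define a MyDataSet class that inherits from CustomDataset and reads the file we just created. The script is as follows: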

from .custom import CustomDataset
from .registry import DATASETS
import pickle

'''
    Annotation format:
    [
        {
            'filename': 'a.jpg',
            'width': 1280,
            'height': 720,
            'ann': {
                'bboxes': <np.ndarray> (n, 4),
                'labels': <np.ndarray> (n, ),
                'bboxes_ignore': <np.ndarray> (k, 4),
                'labels_ignore': <np.ndarray> (k, ) (optional field)
            }
        },
        ...
    ]
'''
@DATASETS.register_module
class MyDataSet(CustomDataset):
    CLASSES = ('car', 'person', 'dog') # the background class is not included

    # Override the following two methods: one loads the information for the
    # whole dataset, the other returns the annotation of a single image.
    def load_annotations(self, ann_file):
        with open(ann_file, 'rb') as f:
            info = pickle.load(f)
        return info

    def get_ann_info(self, idx):
        anno = self.img_infos[idx]['ann']
        return anno

Once the class is written, it has to be registered. In __init__.py in the datasets folder, add from .mydataset import MyDataSet to the imports and add 'MyDataSet' to __all__.
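
Format mistakes in the annotation file tend to show up only once training has started, so a quick check beforehand can help. The function below is a sketch, not part of mmdetection: the name `check_annotations` is hypothetical, and it assumes labels run from 1 to the number of classes, with 0 reserved for background.

```python
import numpy as np

REQUIRED_KEYS = ("filename", "width", "height", "ann")


def check_annotations(annotations, num_classes):
    """Return a list of human-readable problems found in a freestyle annotation list."""
    problems = []
    for i, info in enumerate(annotations):
        missing = [key for key in REQUIRED_KEYS if key not in info]
        if missing:
            problems.append(f"item {i}: missing keys {missing}")
            continue
        bboxes = np.asarray(info["ann"]["bboxes"])
        labels = np.asarray(info["ann"]["labels"])
        if bboxes.ndim != 2 or bboxes.shape[1] != 4:
            problems.append(f"item {i}: bboxes has shape {bboxes.shape}, expected (n, 4)")
            continue
        if labels.shape != (bboxes.shape[0],):
            problems.append(f"item {i}: labels has shape {labels.shape}, expected ({bboxes.shape[0]},)")
        # Assumption: valid labels are 1..num_classes (0 would be background).
        if labels.size and (labels.min() < 1 or labels.max() > num_classes):
            problems.append(f"item {i}: labels outside 1..{num_classes}")
    return problems


sample = [
    {"filename": "a.jpg", "width": 1280, "height": 720,
     "ann": {"bboxes": np.zeros((2, 4), dtype=np.float32), "labels": np.array([1, 4])}},
    {"filename": "b.jpg", "width": 640},
]
for problem in check_annotations(sample, num_classes=3):
    print(problem)
```

In practice the list would come from the pickle file and `num_classes` from `len(MyDataSet.CLASSES)`.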
Next, edit the config file. Copy configs/ttfnet/ttfnet_d53_2x.py to a new file, for example configs/ttfnet/ttfnet_d53_2x_example.py:

# model settings
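NUM_CLASSES = 4 # number of classes + 1 (background)
model = dict(
    type='TTFNet', # network type; no need to change
    # pretrained='./pretrain/darknet53.pth', # the author provides a trained model, so the backbone need not be trained from scratch; commented out
    pretrained=None,                         # the rest of the `model` section needs no changes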
    backbone=dict(
        type='DarknetV3',
        layers=[1, 2, 8, 8, 4],
        inplanes=[3, 32, 64, 128, 256, 512],
        planes=[32, 64, 128, 256, 512, 1024],
        norm_cfg=dict(type='BN'),
        out_indices=(1, 2, 3, 4),
        frozen_stages=1,
        norm_eval=False),
    neck=None,
    bbox_head=dict(
        type='TTFHead',
        inplanes=(128, 256, 512, 1024),
        head_conv=128,
        wh_conv=64,
        hm_head_conv_num=2,
        wh_head_conv_num=2,
        num_classes=NUM_CLASSES,
        wh_offset_base=16,
        wh_agnostic=True,
        wh_gaussian=True,
        shortcut_cfg=(1, 2, 3),
        norm_cfg=dict(type='BN'),
        alpha=0.54,
        hm_weight=1.,
        wh_weight=5.))
cudnn_benchmark = True
# training and testing settings
train_cfg = dict(
    vis_every_n_iters=100,
    debug=False)
test_cfg = dict(
    score_thr=0.01,
    max_per_img=100)
# dataset settings: the part below is the important one and needs careful editing
dataset_type = 'MyDataSet' # we use the "MyDataSet" class, so set it here
data_root = '/path/to/images/' # absolute path to the directory that holds all images
                # source: img = mmcv.imread(osp.join(self.img_prefix, img_info['filename']))
pkl_path = '/path/to/data.pkl' # absolute path to the annotation file generated above
# img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
img_norm_cfg = dict(mean=[0,0,0], std=[1,1,1], to_rgb=True) # mean and standard deviation of your own data;
# if they have not been computed, use mean=[0,0,0], std=[1,1,1]. For reference, the normalization is:
# def imnormalize(img, mean, std, to_rgb=True):
#     img = img.astype(np.float32)
#     if to_rgb:
#         img = bgr2rgb(img)
#     return (img - mean) / std
data = dict(
    imgs_per_gpu=4,
    workers_per_gpu=2,
    train=dict(                           # only the `train` part is used during training, so `val` and `test` below can simply mirror `train`
        type=dataset_type,
        ann_file=pkl_path,
        img_prefix=data_root,
        img_scale=(512, 512), # size the training images are resized to, (w, h)
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=False,
        with_label=True,
        resize_keep_ratio=False), # when this is False, img_scale should ideally keep the aspect ratio of the originals,
                                  # e.g. for originals of (w=100, h=200), use img_scale=(50, 100) to keep the w:h ratio
    val=dict(
        type=dataset_type,
        ann_file=pkl_path,
        img_prefix=data_root,
        img_scale=(512, 512),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=False,
        with_label=True,
        resize_keep_ratio=False),
    test=dict(
        type=dataset_type,
        ann_file=pkl_path,
        img_prefix=data_root,
        img_scale=(512, 512),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=False,
        with_label=False,
        test_mode=True,
        resize_keep_ratio=False))
# optimizer
# optimizer = dict(type='SGD', lr=0.015, momentum=0.9, weight_decay=0.0004,
optimizer = dict(type='SGD', lr=0.0002, momentum=0.9, weight_decay=0.0004,
                 paramwise_options=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 5,
    step=[18, 22])
checkpoint_config = dict(interval=4)
bbox_head_hist_config = dict(
    model_type=['ConvModule', 'DeformConvPack'],
    sub_modules=['bbox_head'],
    save_every_n_steps=500)
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
    ])
# yapf:enable
# runtime settings
total_epochs = 24
device_ids = range(8)
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = 'work_dirs/ttfnet53_2x'
# Difference between resume_from and load_from: resume_from loads both the model weights and the optimizer state,
# and the epoch is also inherited from the specified checkpoint.
# It is usually used to resume a training run that was interrupted accidentally.
# load_from only loads the model weights, and training starts from epoch 0.
# It is usually used for finetuning.
load_from = '/path/to/ttfnet-master/checkpoints/ttfnet53_2x-b381dd.pth' # set this when finetuning
resume_from = None # set this to continue an interrupted run, for example from epoch 10
workflow = [('train', 1)] # e.g. [('train', 2), ('val', 1)] runs one evaluation after every 2 training epochs;
                          # [('train', 1)] trains only, with no evaluation
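
The img_norm_cfg entry in the config expects per-channel statistics of your own data. One way to obtain them is sketched below with NumPy, under stated assumptions: the function name `channel_mean_std` is hypothetical and tiny synthetic images stand in for a real dataset. Because the normalization converts BGR to RGB before subtracting the mean when to_rgb=True, the statistics should be computed on images in RGB channel order.

```python
import numpy as np


def channel_mean_std(images):
    """Per-channel mean and std over a list of HxWx3 images, which may differ in size."""
    pixel_sum = np.zeros(3, dtype=np.float64)
    pixel_sq_sum = np.zeros(3, dtype=np.float64)
    pixel_count = 0
    for img in images:
        pixels = img.reshape(-1, 3).astype(np.float64)
        pixel_sum += pixels.sum(axis=0)
        pixel_sq_sum += (pixels ** 2).sum(axis=0)
        pixel_count += pixels.shape[0]
    mean = pixel_sum / pixel_count
    # Var[x] = E[x^2] - (E[x])^2, accumulated over all pixels of all images.
    std = np.sqrt(pixel_sq_sum / pixel_count - mean ** 2)
    return mean, std


# Synthetic RGB images standing in for a real dataset.
images = [
    np.full((4, 4, 3), (10, 20, 30), dtype=np.uint8),
    np.full((2, 2, 3), (30, 40, 50), dtype=np.uint8),
]
mean, std = channel_mean_std(images)
print("mean:", mean.tolist(), "std:", std.tolist())
```

The running sums mean that only one image has to be in memory at a time, so the same loop works when the images are read from disk one by one.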

The CustomDataset class

"""
Custom dataset for detection.
    Annotation format:
    [
        {
            'filename': 'a.jpg',
            'width': 1280,
            'height': 720,
            'ann': {
                'bboxes': <np.ndarray> (n, 4),
                'labels': <np.ndarray> (n, ),
                'bboxes_ignore': <np.ndarray> (k, 4),
                'labels_ignore': <np.ndarray> (k, ) (optional field)
            }
        },
        ...
    ]
    The `ann` field is optional for testing.
"""

    CLASSES = None # a tuple, e.g. ('car', 'person')

    def __init__(self,
                 ann_file, # path to the annotation file
                 img_prefix, # the annotations store image names such as xxx.jpg, so a directory path is needed
                 img_scale, # e.g. (512, 512), the network input size
                 img_norm_cfg, # basic dataset statistics, e.g. dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
                 multiscale_mode='value', # for multi-scale training; options are ['value', 'range']
                 size_divisor=None, # e.g. 32: if the network input size is not a multiple of 32, it is padded
                 proposal_file=None,
                 num_max_proposals=1000,
                 flip_ratio=0,
                 with_mask=True,
                 with_crowd=True,
                 with_label=True,
                 with_semantic_seg=False,
                 seg_prefix=None,
                 seg_scale_factor=1,
                 extra_aug=None,
                 resize_keep_ratio=True,
                 corruption=None,
                 corruption_severity=1,
                 skip_img_without_anno=True,
                 test_mode=False)

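First, self.img_infos = self.load_annotations(ann_file) runs and stores the filename, width and height of every image.
During training, __getitem__ is called to produce the data.
__getitem__ directly calls prepare_train_img(idx) to preprocess the raw annotation.
prepare_train_img in turn calls get_ann_info(idx) to obtain the boxes and labels.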
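
The call order described above can be traced with a small stand-in class. This is a toy sketch that copies only the method names; it is not the real CustomDataset, it loads no images, and the hard-coded record and the `calls` list are invented for illustration.

```python
class ToyDataset:
    """Stand-in that mirrors the call order of CustomDataset without any image processing."""

    def __init__(self, ann_file):
        self.calls = []  # records the order in which methods run
        self.img_infos = self.load_annotations(ann_file)

    def load_annotations(self, ann_file):
        self.calls.append("load_annotations")
        # A real dataset would read `ann_file`; here one record is hard-coded.
        return [{"filename": "a.jpg", "width": 1280, "height": 720,
                 "ann": {"bboxes": [[10, 20, 110, 220]], "labels": [1]}}]

    def get_ann_info(self, idx):
        self.calls.append("get_ann_info")
        return self.img_infos[idx]["ann"]

    def prepare_train_img(self, idx):
        self.calls.append("prepare_train_img")
        img_info = self.img_infos[idx]
        ann = self.get_ann_info(idx)
        # The real method would also load, resize, flip and normalize the image.
        return {"filename": img_info["filename"],
                "gt_bboxes": ann["bboxes"],
                "gt_labels": ann["labels"]}

    def __len__(self):
        return len(self.img_infos)

    def __getitem__(self, idx):
        self.calls.append("__getitem__")
        return self.prepare_train_img(idx)


dataset = ToyDataset("unused.pkl")
sample = dataset[0]
print(dataset.calls)
```

This also shows why a subclass only needs to override load_annotations and get_ann_info: the other steps reach the data through those two methods.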

4. Training

Train with a single GPU
python tools/train.py ${CONFIG_FILE}
Train with multiple GPUs
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
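
To get a feel for what the lr_config in the config from section 3 does during training, the function below sketches a step schedule with linear warmup. It reflects my understanding of how mmcv applies these settings and should be read as an approximation: the function name `learning_rate` is hypothetical, the decay factor gamma=0.1 is assumed to be the default, and epochs and iterations are counted from 0.

```python
def learning_rate(epoch, cur_iter, base_lr=0.0002, steps=(18, 22), gamma=0.1,
                  warmup_iters=500, warmup_ratio=1.0 / 5):
    """Approximate learning rate at a given epoch and global iteration."""
    # Step policy: multiply by gamma once for every milestone epoch already reached.
    num_decays = sum(1 for step in steps if epoch >= step)
    lr = base_lr * gamma ** num_decays
    # Linear warmup: ramp from warmup_ratio * lr up to lr over the first warmup_iters iterations.
    if cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        lr *= 1 - k
    return lr


# The (epoch, iteration) pairs below are arbitrary sample points.
for epoch, cur_iter in [(0, 0), (0, 250), (0, 500), (18, 10000), (22, 20000)]:
    print(epoch, cur_iter, learning_rate(epoch, cur_iter))
```

With the values from the config, the rate starts at one fifth of lr, reaches the full lr after 500 iterations, and is cut by a factor of ten at epochs 18 and 22 of the 24 total.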
