Training YOLOv3 on Your Own Data: A Detailed Guide

1. Configuring Darknet

  • Download the darknet source: git clone https://github.com/pjreddie/darknet
  • Enter the darknet directory: cd darknet
  • On a CPU-only machine you can simply run make; otherwise edit the Makefile to enable CUDA and cuDNN and point NVCC at your toolkit:
GPU=1
CUDNN=1
NVCC=/usr/local/cuda-8.0/bin/nvcc
  • If you want to read from a camera, also set OPENCV=1. Note that with OPENCV=1 the test step may fail; according to an issue I saw on GitHub this is caused by an OpenCV version that is too new, and switching to OpenCV 2 for testing works around it. (A camera demo command is sketched after this list.)
  • Download the YOLOv3 weights: wget https://pjreddie.com/media/files/yolov3.weights
  • Run a test: ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
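
If OPENCV=1 is compiled in, the webcam demo can be started with darknet's detector demo subcommand; a minimal example using the pretrained COCO model downloaded above:

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights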
    2. Building a PASCAL VOC Dataset

    This section explains how to build a PASCAL VOC dataset. First, look at the structure of the VOC dataset:

    [Figure: directory structure of VOCdevkit/VOC2007]
    When training on our own data we only need to touch three folders: Annotations, ImageSets, and JPEGImages; ignore voc_label for now, it comes up in the last step. Start with the Annotations folder, which stores each image's bounding-box coordinates in XML files of this form:
    [Figure: example VOC annotation XML]
    Before generating the XMLs in this folder, first put the training images into JPEGImages. Then the script below can be used to generate each XML under Annotations.
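
    Before running it, note the input format the script assumes (judging from its parsing code): each line of train_labels.txt holds one box as filename,xmin ymin xmax ymax, one line per box, with an empty field after the comma for an image that contains no object. A hypothetical example:

    000001.jpg,100 120 200 260
    000001.jpg,300 80 380 170
    000002.jpg,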

    # coding=utf-8
    import os
    import glob  # used to collect files matching a pattern
    from PIL import Image
    
    # directory holding the safety-hat training images
    src_img_dir = "/home/zxy/PycharmProjects/Acmtest/input/train"
    # ground-truth file for the safety-hat images
    src_txt_dir = "/home/zxy/PycharmProjects/Acmtest/gt/train_labels.txt"
    # where the generated VOC XML files go
    src_xml_dir = "/home/zxy/PycharmProjects/darknet/VOCdevkit/VOC2007/Annotations"
    
    img_Lists = glob.glob(src_img_dir + '/*.jpg')
    # image file names (with extension)
    img_basenames = []
    for item in img_Lists:
        img_basenames.append(os.path.basename(item))
    
    print(len(img_basenames))
    # image names without the extension
    image_names = []
    for item in img_basenames:
        temp1, temp2 = os.path.splitext(item)
        image_names.append(temp1)
    
    # read the corresponding ground-truth txt file:
    # one "filename,xmin ymin xmax ymax" box per line
    now_gt = {}
    
    fopen = open(src_txt_dir, 'r')
    lines = fopen.readlines()
    fopen.close()
    for num, line in enumerate(lines):
        temp1, temp2 = line.split(',')
        if len(temp2.replace('\n', '').strip()) != 0:
            t1, t2, t3, t4 = temp2.replace('\n', '').strip().split(' ')
            if temp1 not in now_gt.keys():
                now_gt[temp1] = [[t1, t2, t3, t4]]
            else:
                now_gt[temp1].append([t1, t2, t3, t4])
        else:
            # an empty box field marks an image with no objects
            now_gt[temp1] = []
        print(num, ' is processing ... ')
    print(len(now_gt.keys()))
    
    total = 0
    
    for img in image_names:
        total += 1
        im = Image.open(src_img_dir + '/' + img + '.jpg')
        width, height = im.size
        xml_file = open(src_xml_dir + '/' + img + '.xml', 'w')
        xml_file.write('<annotation>\n')
        xml_file.write('    <folder>VOC2007</folder>\n')
        xml_file.write('    <filename>' + str(img) + '.jpg' + '</filename>\n')
        xml_file.write('    <size>\n')
        xml_file.write('        <width>' + str(width) + '</width>\n')
        xml_file.write('        <height>' + str(height) + '</height>\n')
        xml_file.write('        <depth>3</depth>\n')
        xml_file.write('    </size>\n')
        # one <object> block per ground-truth box
        for spt in now_gt[img + ".jpg"]:
            xml_file.write('    <object>\n')
            xml_file.write('        <name>safetyhat</name>\n')
            xml_file.write('        <pose>Unspecified</pose>\n')
            xml_file.write('        <truncated>0</truncated>\n')
            xml_file.write('        <difficult>0</difficult>\n')
            xml_file.write('        <bndbox>\n')
            xml_file.write('            <xmin>' + str(spt[0]) + '</xmin>\n')
            xml_file.write('            <ymin>' + str(spt[1]) + '</ymin>\n')
            xml_file.write('            <xmax>' + str(spt[2]) + '</xmax>\n')
            xml_file.write('            <ymax>' + str(spt[3]) + '</ymax>\n')
            xml_file.write('        </bndbox>\n')
            xml_file.write('    </object>\n')
    
        xml_file.write('</annotation>')
        xml_file.close()  # make sure the XML is flushed to disk
    
    print(total)
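
    Note that the writing loop assumes every image name appears at least once in train_labels.txt; an image with no line at all (as opposed to an empty box field) raises a KeyError at now_gt[img + ".jpg"].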
    
    

    After generating the XMLs under Annotations, we can generate the four txt files under ImageSets/Main. These files store the names (without extension) of the XML files from the previous step: trainval plus test together cover all XML files, and train plus val together make up trainval. Since the Annotations and ImageSets/Main paths below are relative, run the script from inside VOCdevkit/VOC2007. The code is as follows:

    import os
    import random
    
    # fraction of all XMLs that go into trainval (the rest is test),
    # and fraction of trainval that goes into train (the rest is val)
    trainval_percent = 0.5
    train_percent = 0.5
    xmlfilepath = 'Annotations'
    txtsavepath = 'ImageSets/Main'
    total_xml = os.listdir(xmlfilepath)
    
    num = len(total_xml)
    indices = range(num)
    tv = int(num * trainval_percent)
    tr = int(tv * train_percent)
    trainval = random.sample(indices, tv)
    train = random.sample(trainval, tr)
    
    ftrainval = open(txtsavepath + '/trainval.txt', 'w')
    ftest = open(txtsavepath + '/test.txt', 'w')
    ftrain = open(txtsavepath + '/train.txt', 'w')
    fval = open(txtsavepath + '/val.txt', 'w')
    
    for i in indices:
        name = total_xml[i][:-4] + '\n'  # strip the .xml extension
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)
    
    ftrainval.close()
    ftrain.close()
    fval.close()
    ftest.close()
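
    As a quick sanity check of the split: with, say, 1000 XML files and both percentages at 0.5, trainval and test each receive 500 names, and trainval is further split into 250 names for train and 250 for val.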
    
    

    The last step is to generate the label format YOLO actually uses. First download the conversion script: wget https://pjreddie.com/media/files/voc_label.py, then open voc_label.py with gedit and make the following edits:

    # we are not using any VOC2012 data, so only the 2007 sets remain
    sets=[('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
    # change to the object classes you want to detect
    classes = ["safetyhat"]
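
    For reference, the coordinate conversion voc_label.py performs is essentially the function below (reproduced from my reading of the script, so treat it as a sketch): it maps a pixel-space box given as (xmin, xmax, ymin, ymax) to the normalized (x_center, y_center, width, height) format that YOLO's .txt labels use.

    def convert(size, box):
        # size is (image_width, image_height); box is (xmin, xmax, ymin, ymax)
        dw = 1. / size[0]
        dh = 1. / size[1]
        x = (box[0] + box[1]) / 2.0  # box center x, in pixels
        y = (box[2] + box[3]) / 2.0  # box center y, in pixels
        w = box[1] - box[0]          # box width, in pixels
        h = box[3] - box[2]          # box height, in pixels
        return (x * dw, y * dh, w * dw, h * dh)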
    
    

    Run voc_label.py to finish the conversion; note that it must be run from the directory that contains VOCdevkit (here, the darknet root), since it opens the annotations through relative paths. We use the train and val data together for training, so merge the two lists: cat 2007_train.txt 2007_val.txt > train.txt.
    That's it: the VOC dataset is ready, and we can move on to training YOLOv3.

    3. Training YOLOv3 on the Data

    Edit the data config for PASCAL: open cfg/voc.data and modify it as follows:

    classes= 1  # number of classes in your dataset
    train  = /home/xxx/darknet/train.txt  # path to the train list
    valid  = /home/xxx/darknet/2007_test.txt   # path to the test list
    names = /home/xxx/darknet/data/voc.names # use an absolute path
    backup = backup # folder where model snapshots are saved
    
    

    Note that you must create a folder named backup under the darknet directory, otherwise training fails with: Couldn't open file: backup/yolov3-voc.backup. Finally, open data/voc.names and replace the class names with those of your own dataset.
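
    For this single-class safety-hat dataset, data/voc.names lists one class name per line, i.e. it contains just:

    safetyhat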

    Next, edit cfg/yolov3-voc.cfg. First set the number of classes to your own count. Then note that the training batch and subdivisions near the top of the file are commented out: uncomment them for training, and switch them back when testing. You can also raise the momentum to 0.99 and lower the learning rate, which helps avoid floods of nan values during training. Finally, change filters in the [convolutional] layer before each [yolo] layer to 18; for YOLOv3, filters = 3 * (classes + 5), so one class gives 18. For details on this change see this issue: https://github.com/pjreddie/darknet/issues/582; a sketch of the edits follows below. Once the cfg is ready, training starts with ./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74 (the pretrained backbone weights can be fetched with wget https://pjreddie.com/media/files/darknet53.conv.74).
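
    Here is a minimal sketch of the three kinds of edits in cfg/yolov3-voc.cfg, assuming a single class; the filters/classes pair must be changed at all three [yolo] blocks, and everything not shown stays as shipped:

    [net]
    # uncomment for training; switch back to batch=1, subdivisions=1 for testing
    batch=64
    subdivisions=16
    ...

    [convolutional]
    filters=18        # 3 * (classes + 5) = 3 * (1 + 5) = 18
    activation=linear

    [yolo]
    classes=1
    ...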

    4. What the Training Output Means

    • Region xx: the index of the yolo layer in the cfg file;
    • Avg IOU: the mean IOU between predicted boxes and labeled boxes in the current iteration; higher is better, ideally 1;
    • Class: classification accuracy on the labeled objects; higher is better, ideally 1;
    • Obj: the predicted objectness at labeled object locations; higher is better, ideally 1;
    • No Obj: the average objectness where there is no object; lower is better;
    • .5R: recall at an IOU threshold of 0.5, where recall = detected positives / actual positives;
    • .75R: recall at an IOU threshold of 0.75;
    • count: the number of positive samples.
    • (more to be added)
    Loaded: 0.000034 seconds
    Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000009, .5R: -nan, .75R: -nan,  count: 0
    Region 94 Avg IOU: 0.790078, Class: 0.996943, Obj: 0.777700, No Obj: 0.001513, .5R: 1.000000, .75R: 0.833333,  count: 6
    Region 106 Avg IOU: 0.701132, Class: 0.998590, Obj: 0.710799, No Obj: 0.000800, .5R: 0.857143, .75R: 0.571429,  count: 14
    Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000007, .5R: -nan, .75R: -nan,  count: 0
    Region 94 Avg IOU: 0.688576, Class: 0.998360, Obj: 0.855777, No Obj: 0.000512, .5R: 1.000000, .75R: 0.500000,  count: 2
    Region 106 Avg IOU: 0.680646, Class: 0.998413, Obj: 0.675553, No Obj: 0.000405, .5R: 0.857143, .75R: 0.428571,  count: 7
    Region 82 Avg IOU: 0.478347, Class: 0.999972, Obj: 0.999957, No Obj: 0.000578, .5R: 0.000000, .75R: 0.000000,  count: 1
    Region 94 Avg IOU: 0.901106, Class: 0.999994, Obj: 0.999893, No Obj: 0.000308, .5R: 1.000000, .75R: 1.000000,  count: 1
    Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000025, .5R: -nan, .75R: -nan,  count: 0
    Region 82 Avg IOU: 0.724108, Class: 0.988430, Obj: 0.765983, No Obj: 0.003308, .5R: 1.000000, .75R: 0.400000,  count: 5
    Region 94 Avg IOU: 0.752382, Class: 0.996165, Obj: 0.848303, No Obj: 0.002020, .5R: 1.000000, .75R: 0.500000,  count: 8
    Region 106 Avg IOU: 0.652267, Class: 0.998596, Obj: 0.646115, No Obj: 0.000728, .5R: 0.818182, .75R: 0.545455,  count: 11
    Region 82 Avg IOU: 0.755896, Class: 0.999879, Obj: 0.999514, No Obj: 0.001232, .5R: 1.000000, .75R: 1.000000,  count: 1
    Region 94 Avg IOU: 0.749224, Class: 0.999670, Obj: 0.988916, No Obj: 0.000441, .5R: 1.000000, .75R: 0.500000,  count: 2
    Region 106 Avg IOU: 0.601608, Class: 0.999661, Obj: 0.714591, No Obj: 0.000147, .5R: 0.750000, .75R: 0.250000,  count: 4
    Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000011, .5R: -nan, .75R: -nan,  count: 0
    Region 94 Avg IOU: 0.797704, Class: 0.997323, Obj: 0.910817, No Obj: 0.001006, .5R: 1.000000, .75R: 0.750000,  count: 4
    Region 106 Avg IOU: 0.727626, Class: 0.998225, Obj: 0.798596, No Obj: 0.000121, .5R: 1.000000, .75R: 0.500000,  count: 2
    Region 82 Avg IOU: 0.669070, Class: 0.998607, Obj: 0.958330, No Obj: 0.001297, .5R: 1.000000, .75R: 0.000000,  count: 2
    Region 94 Avg IOU: 0.832890, Class: 0.999755, Obj: 0.965164, No Obj: 0.000829, .5R: 1.000000, .75R: 1.000000,  count: 1
    Region 106 Avg IOU: 0.613751, Class: 0.999541, Obj: 0.791765, No Obj: 0.000554, .5R: 0.833333, .75R: 0.333333,  count: 12
    Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000007, .5R: -nan, .75R: -nan,  count: 0
    Region 94 Avg IOU: 0.816189, Class: 0.999966, Obj: 0.999738, No Obj: 0.000673, .5R: 1.000000, .75R: 1.000000,  count: 2
    Region 106 Avg IOU: 0.756419, Class: 0.999139, Obj: 0.891591, No Obj: 0.000712, .5R: 1.000000, .75R: 0.500000,  count: 12
    12010: 0.454202, 0.404766 avg, 0.000100 rate, 2.424004 seconds, 768640 images
    Loaded: 0.000034 seconds
    

    This output shows one batch; how the batch is split into groups is governed by the subdivisions parameter in yolov3-voc.cfg. With batch = 256 and subdivisions = 8 in the .cfg file I used, each iteration of the training output contains 8 groups of Region lines (one per subdivision), each group processing 256 / 8 = 32 images, consistent with the sample above.

    • Batch output. Looking at the last line of the batch above: 12010 is the current iteration number; 0.454202 is the total loss; 0.404766 avg is the average loss, and lower is better; as a rule of thumb, training can usually be stopped once this value drops below about 0.060730 avg. 0.000100 rate is the current learning rate; 2.424004 seconds is the time spent on this batch; and 768640 is the cumulative number of images used in training so far, i.e. the iteration count times the batch size.
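
    If you redirect the training output to a file, the loss curve is easy to extract. A minimal sketch (the log-line pattern is taken from the sample output above; train.log is a hypothetical file name):

    import re

    def parse_avg_loss(log_path):
        # matches lines like "12010: 0.454202, 0.404766 avg, 0.000100 rate, ..."
        pattern = re.compile(r'^\s*(\d+): [\d.]+, ([\d.]+) avg,')
        points = []
        with open(log_path) as f:
            for line in f:
                m = pattern.match(line)
                if m:
                    points.append((int(m.group(1)), float(m.group(2))))
        return points

    print(parse_avg_loss('train.log')[-5:])  # last few (iteration, avg loss) pairs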

    5. Batch Testing and Exporting Box Locations with YOLOv3

    Prediction on a single image uses: ./darknet detect cfg/yolov3-voc.cfg yolov3-voc_900.weights test3.jpg. For batch testing, yolo.c has to be modified and darknet recompiled; the modified code is:

    void validate_yolo(char *cfgfile, char *weightfile)
    {
        network net = parse_network_cfg(cfgfile);
        if(weightfile){
            load_weights(&net, weightfile);
        }
        set_batch_network(&net, 1);
        fprintf(stderr, "Learning Rate: %g, Momentum: %g, Decay: %g\n", net.learning_rate, net.momentum, net.decay);
        srand(time(0));
    
        char *base = "results/comp4_det_test_";
        //list *plist = get_paths("data/voc.2007.test"); // see the official site for how this list is generated; it names the files to test
        list *plist = get_paths("/home/pjreddie/data/voc/2007_test.txt"); // this .txt file holds the absolute paths of the images to test, in the same form as train.txt
        //list *plist = get_paths("data/voc.2012.test");
        char **paths = (char **)list_to_array(plist);
    
        layer l = net.layers[net.n-1];
        int classes = l.classes;
        int square = l.sqrt;
        int side = l.side;
    
        int j;
        FILE **fps = calloc(classes, sizeof(FILE *));
        for(j = 0; j < classes; ++j){
            char buff[1024];
            snprintf(buff, 1024, "%s%s.txt", base, voc_names[j]);
            fps[j] = fopen(buff, "w");
        }
        box *boxes = calloc(side*side*l.n, sizeof(box));
        float **probs = calloc(side*side*l.n, sizeof(float *));
        for(j = 0; j < side*side*l.n; ++j) probs[j] = calloc(classes, sizeof(float *));
    
        int m = plist->size;
        int i=0;
        int t;
    
        float thresh = .001;
        int nms = 1;
        float iou_thresh = .5;
    
        int nthreads = 2;
        image *val = calloc(nthreads, sizeof(image));
        image *val_resized = calloc(nthreads, sizeof(image));
        image *buf = calloc(nthreads, sizeof(image));
        image *buf_resized = calloc(nthreads, sizeof(image));
        pthread_t *thr = calloc(nthreads, sizeof(pthread_t));
    
        load_args args = {0};
        args.w = net.w;
        args.h = net.h;
        args.type = IMAGE_DATA;
    
        for(t = 0; t < nthreads; ++t){
            args.path = paths[i+t];
            args.im = &buf[t];
            args.resized = &buf_resized[t];
            thr[t] = load_data_in_thread(args);
        }
        time_t start = time(0);
        for(i = nthreads; i < m+nthreads; i += nthreads){
            fprintf(stderr, "%d\n", i);
            for(t = 0; t < nthreads && i+t-nthreads < m; ++t){
                pthread_join(thr[t], 0);
                val[t] = buf[t];
                val_resized[t] = buf_resized[t];
            }
            for(t = 0; t < nthreads && i+t < m; ++t){
                args.path = paths[i+t];
                args.im = &buf[t];
                args.resized = &buf_resized[t];
                thr[t] = load_data_in_thread(args);
            }
            for(t = 0; t < nthreads && i+t-nthreads < m; ++t){
                char *path = paths[i+t-nthreads];
                char *id = basecfg(path);
                float *X = val_resized[t].data;
                float *predictions = network_predict(net, X);
                int w = val[t].w;
                int h = val[t].h;
                convert_yolo_detections(predictions, classes, l.n, square, side, w, h, thresh, probs, boxes, 0);
                if (nms) do_nms_sort(boxes, probs, side*side*l.n, classes, iou_thresh);
                print_yolo_detections(fps, id, boxes, probs, side*side*l.n, classes, w, h);
                free(id);
                free_image(val[t]);
                free_image(val_resized[t]);
            }
        }
        fprintf(stderr, "Total Detection Time: %f Seconds\n", (double)(time(0) - start));
    }
    
    void print_yolo_detections(FILE **fps, char *id, box *boxes, float **probs, int total, int classes, int w, int h)
    {
        int i, j;
        for(i = 0; i < total; ++i){
            float xmin = boxes[i].x - boxes[i].w/2.;
            float xmax = boxes[i].x + boxes[i].w/2.;
            float ymin = boxes[i].y - boxes[i].h/2.;
            float ymax = boxes[i].y + boxes[i].h/2.;
    
            if (xmin < 0) xmin = 0;
            if (ymin < 0) ymin = 0;
            if (xmax > w) xmax = w;
            if (ymax > h) ymax = h;
    
            for(j = 0; j < classes; ++j){
                if (probs[i][j]) fprintf(fps[j], "%s %f %f %f %f %f\n", id, probs[i][j], xmin, ymin, xmax, ymax);
            }
        }
    }
    

    Then run: ./darknet yolo valid cfg/yolov3-voc.cfg yolov3-voc_900.weights to generate results for the whole test set in one go. Each class gets a results/comp4_det_test_<class>.txt file whose lines have the form image_id score xmin ymin xmax ymax, as written by print_yolo_detections above.
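
    A minimal sketch for consuming those result files (results/comp4_det_test_safetyhat.txt is an assumed name, following base + voc_names[j] in validate_yolo with the single safetyhat class):

    def load_detections(path, score_thresh=0.5):
        # each line: image_id score xmin ymin xmax ymax
        dets = []
        with open(path) as f:
            for line in f:
                image_id, score, xmin, ymin, xmax, ymax = line.split()
                if float(score) >= score_thresh:
                    dets.append((image_id, float(score), float(xmin),
                                 float(ymin), float(xmax), float(ymax)))
        return dets

    for det in load_detections('results/comp4_det_test_safetyhat.txt'):
        print(det)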

    6. Tricks Encountered While Tuning

    • CUDA: out of memory, or resizing errors? The GPU memory is insufficient: reduce the batch size, and turn off multi-scale training with random = 0.
    • Training YOLOv3 produces nan? If GPU memory allows, increasing the batch size can reduce the NANs to some extent, and the momentum can be raised to 0.99.
    • What do the parameters printed by YOLOv3 mean? See the forward_yolo_layer function in yolo_layer.c:
    printf("Region %d Avg IOU: %f, Class: %f, Obj: %f, No Obj: %f, .5R: %f, .75R: %f,  count: %d\n", net.index, avg_iou/count, avg_cat/class_count, avg_obj/count, avg_anyobj/(l.w*l.h*l.n*l.batch), recall/count, recall75/count, count);
    
    

    At the start of training the network has not yet learned to predict the targets, so the recall values (.5R, .75R) are low and the output contains large stretches of zeros (and -nan where count is 0); this is normal.
