Data Format Conversion for Deep Learning (MobileNet+SSD, CenterNet)

Generating an Initial VOC2012-Style Dataset

1. Data Annotation

1.1 Annotating images that contain targets

Use labelImg to annotate the images. Download link: https://github.com/tzutalin/labelImg
Points to note when annotating (adapted from https://blog.csdn.net/chenmaolin88/article/details/79357502):

What to annotate: every instance of every predefined class (if there are three raccoons in the image, annotate all three raccoons), unless you cannot tell what the object is: it is extremely small (use your judgment on scale), or less than 10-20% of it is visible so you cannot tell which class it belongs to. For example, if all you can see is a wheel and you cannot tell whether it belongs to a truck or a car, you may skip it. If the objects in an image are hard to recognize even with the naked eye, discard the image.
Difficult: if the object can roughly be recognized by eye but you are not very confident, tick the difficult checkbox to mark it as hard to recognize.
Bounding box: draw the box around the visible region of the object only; do not include invisible regions or non-object regions. The box should contain all of the object's visible pixels and nothing more, unless covering a tiny object part would require greatly enlarging the box. For example, a car's antenna can be left out: it is very small, and it is not an essential feature of a car.
Truncated: if more than 15-20% of the object lies outside the bounding box, mark it as Truncated, meaning the box does not contain a complete object instance. This attribute cannot be ticked in LabelImg; you must hand-edit the corresponding tag in the XML file (see the sample annotation after this list).
Occluded: if more than 5% of the object inside the box is blocked, mark it as Occluded, indicating that the image inside the box is partially occluded. This attribute likewise must be hand-edited into the XML file.
Clothes, snow, mud, etc.: if the occluder is strongly associated with the object, do not mark it as occlusion; for example, the clothes on a person count as part of the person.
Transparency: objects seen through glass should also be annotated, but if the glass is somewhat reflective, the reflection on the glass should be marked as occlusion.
Mirrors: objects in mirrors should also be annotated.
Posters: objects on posters, magazines, and the like inside the image should also be annotated, unless they are exaggerated cartoon drawings.
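The difficult/truncated/occluded flags live on each <object> element of the PASCAL VOC XML. Below is a minimal hand-edited sketch; the file name, class, and coordinates are made up, and whether <occluded> is honored depends on your training pipeline, so treat that tag as an assumption:

<annotation>
    <filename>000123.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>raccoon</name>
        <truncated>1</truncated>   <!-- over 15-20% of the object lies outside the box -->
        <difficult>1</difficult>   <!-- recognizable by eye, but with low confidence -->
        <occluded>1</occluded>     <!-- over 5% of the object is blocked -->
        <bndbox>
            <xmin>48</xmin>
            <ymin>240</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
</annotation>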

1.2 Annotating images without targets

Use the following code to automatically generate XML files for images that contain no annotated targets.
Reference: https://www.jianshu.com/p/5b2254fdf8f8

#! /usr/bin/python
# -*- coding:UTF-8 -*-
import os, sys
import glob
from PIL import Image
 
# directory holding the source (negative-sample) images
src_img_dir = "/home/lehui/Desktop/負樣本700"
# directory where the generated xml files are written
src_xml_dir = "/home/lehui/Desktop/xml"
 
img_Lists = glob.glob(src_img_dir + '/*.jpg')
 
img_basenames = [] # e.g. 100.jpg
for item in img_Lists:
    img_basenames.append(os.path.basename(item))
 
img_names = [] # e.g. 100
for item in img_basenames:
    temp1, temp2 = os.path.splitext(item)
    img_names.append(temp1)
 
for img in img_names:
    im = Image.open((src_img_dir + '/' + img + '.jpg'))
    width, height = im.size
    # write the xml file (no <object> element, since the image contains no targets)
    xml_file = open((src_xml_dir + '/' + img + '.xml'), 'w')
    xml_file.write('<annotation>\n')
    xml_file.write('    <folder>VOC2007</folder>\n')
    xml_file.write('    <filename>' + str(img) + '.jpg' + '</filename>\n')
    xml_file.write('    <path>' + src_img_dir + '/' + str(img) + '.jpg' + '</path>\n')
    xml_file.write('    <source>\n')
    xml_file.write('        <database>' + "Unknown" + '</database>\n')
    xml_file.write('    </source>\n')
    xml_file.write('    <size>\n')
    xml_file.write('        <width>' + str(width) + '</width>\n')
    xml_file.write('        <height>' + str(height) + '</height>\n')
    xml_file.write('        <depth>3</depth>\n')
    xml_file.write('    </size>\n')
    xml_file.write('    <segmented>0</segmented>\n')
    xml_file.write('</annotation>')
    xml_file.close()

2. Splitting into Training, Validation, and Test Sets

Use the following code to obtain the xml sets corresponding to train, val, and test.
Reference: https://www.cnblogs.com/gezhuangzhuang/p/10613468.html

import os  
import random  
import time  
import shutil
# xmlfilepath: path containing all the xml files; saveBasePath: output path,
# under which three folders are created: train, validation, test
xmlfilepath=r'./Annotations'  
saveBasePath=r"./Annotations"

trainval_percent=0.8
train_percent=0.8   # 0.8*0.8 gives 64% train, 16% validation, 20% test overall
total_xml = os.listdir(xmlfilepath)  
num=len(total_xml)  
indices = range(num)   # avoid shadowing the builtin list
tv=int(num*trainval_percent)  
tr=int(tv*train_percent)  
trainval= random.sample(indices,tv)  
train=random.sample(trainval,tr)  
print("train and val size",tv)  
print("train size",tr) 

start = time.time()

test_num=0  
val_num=0  
train_num=0  

for i in indices:  
    name=total_xml[i]
    if i in trainval:  #train and val set 
        if i in train: 
            directory="train"  
            train_num += 1  
            xml_path = os.path.join(os.getcwd(), 'Annotations/{}'.format(directory))  
            if(not os.path.exists(xml_path)):  
                os.mkdir(xml_path)  
            filePath=os.path.join(xmlfilepath,name)  
            newfile=os.path.join(saveBasePath,os.path.join(directory,name))  
            shutil.copyfile(filePath, newfile)
        else:
            directory="validation"  
            xml_path = os.path.join(os.getcwd(), 'Annotations/{}'.format(directory))  
            if(not os.path.exists(xml_path)):  
                os.mkdir(xml_path)  
            val_num += 1  
            filePath=os.path.join(xmlfilepath,name)   
            newfile=os.path.join(saveBasePath,os.path.join(directory,name))  
            shutil.copyfile(filePath, newfile)

    else:
        directory="test"  
        xml_path = os.path.join(os.getcwd(), 'Annotations/{}'.format(directory))  
        if(not os.path.exists(xml_path)):  
            os.mkdir(xml_path)  
        test_num += 1  
        filePath=os.path.join(xmlfilepath,name)  
        newfile=os.path.join(saveBasePath,os.path.join(directory,name))  
        shutil.copyfile(filePath, newfile)

end = time.time()  
seconds=end-start  
print("train total : "+str(train_num))  
print("validation total : "+str(val_num))  
print("test total : "+str(test_num))  
total_num=train_num+val_num+test_num  
print("total number : "+str(total_num))  
print( "Time taken : {0} seconds".format(seconds))

3. Dataset Directory Layout

Annotations: all the xml files
JPEGImages: all the images
ImageSets/Main: the train/validation/test lists generated in step 2
Label-map configuration file (label_map.txt):

item {
  id: 1    # ids are numbered starting from 1
  name: 'red pedestrian'
}

item {
  id: 2
  name: 'green pedestrian'
}
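As a quick sanity check, the label map can be parsed with the Object Detection API's label_map_util (a sketch; the file path and max_num_classes are placeholders for your setup):

from object_detection.utils import label_map_util

label_map = label_map_util.load_labelmap('label_map.txt')
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=2, use_display_name=True)
print(categories)  # expect the two pedestrian classes with ids 1 and 2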

Generating tfrecord data

Use the script \models\research\object_detection\dataset_tools\create_pascal_tf_record.py,
modifying the corresponding parameters, to generate data in tfrecord format.
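A typical invocation looks like the following sketch; the directory layout, label map name, and output path are assumptions about your setup, while the flags are the ones defined in create_pascal_tf_record.py:

python object_detection/dataset_tools/create_pascal_tf_record.py \
    --data_dir=VOCdevkit \
    --year=VOC2012 \
    --set=train \
    --annotations_dir=Annotations \
    --label_map_path=label_map.pbtxt \
    --output_path=pascal_train.record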

Generating json data

Generate the corresponding COCO-style json from the xml files.
Reference: https://blog.csdn.net/weixin_41765699/article/details/100124689

import xml.etree.ElementTree as ET
import os
import json
 
coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []
 
category_set = dict()
image_set = set()
 
category_item_id = -1
image_id = 20180000000
annotation_id = 0
 
 
def addCatItem(name):
    global category_item_id
    category_item = dict()
    category_item['supercategory'] = 'none'
    category_item_id += 1
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_set[name] = category_item_id
    return category_item_id
 
 
def addImgItem(file_name, size):
    global image_id
    if file_name is None:
        raise Exception('Could not find filename tag in xml file.')
    if size['width'] is None:
        raise Exception('Could not find width tag in xml file.')
    if size['height'] is None:
        raise Exception('Could not find height tag in xml file.')
    image_id += 1
    image_item = dict()
    image_item['id'] = image_id
    image_item['file_name'] = file_name
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    coco['images'].append(image_item)
    image_set.add(file_name)
    return image_id
 
 
def addAnnoItem(object_name, image_id, category_id, bbox):
    global annotation_id
    annotation_item = dict()
    annotation_item['segmentation'] = []
    seg = []
    # bbox[] is x,y,w,h
    # left_top
    seg.append(bbox[0])
    seg.append(bbox[1])
    # left_bottom
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    # right_bottom
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    # right_top
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])
 
    annotation_item['segmentation'].append(seg)
 
    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['ignore'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_id += 1
    annotation_item['id'] = annotation_id
    coco['annotations'].append(annotation_item)
 
 
def parseXmlFiles(xml_path):
    for f in os.listdir(xml_path):
        if not f.endswith('.xml'):
            continue
 
        bndbox = dict()
        size = dict()
        current_image_id = None
        current_category_id = None
        file_name = None
        size['width'] = None
        size['height'] = None
        size['depth'] = None
 
        xml_file = os.path.join(xml_path, f)
        print(xml_file)
 
        tree = ET.parse(xml_file)
        root = tree.getroot()
        if root.tag != 'annotation':
            raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))
 
        # elem is <folder>, <filename>, <size>, <object>
        for elem in root:
            current_parent = elem.tag
            current_sub = None
            object_name = None
 
            if elem.tag == 'folder':
                continue
 
            if elem.tag == 'filename':
                file_name = elem.text
                if file_name in category_set:
                    raise Exception('file_name duplicated')
 
            # add img item only after parse <size> tag
            elif current_image_id is None and file_name is not None and size['width'] is not None:
                if file_name not in image_set:
                    current_image_id = addImgItem(file_name, size)
                    print('add image with {} and {}'.format(file_name, size))
                else:
                    raise Exception('duplicated image: {}'.format(file_name))
            # subelem is <width>, <height>, <depth>, <name>, <bndbox>
            for subelem in elem:
                bndbox['xmin'] = None
                bndbox['xmax'] = None
                bndbox['ymin'] = None
                bndbox['ymax'] = None
 
                current_sub = subelem.tag
                if current_parent == 'object' and subelem.tag == 'name':
                    object_name = subelem.text
                    if object_name not in category_set:
                        current_category_id = addCatItem(object_name)
                    else:
                        current_category_id = category_set[object_name]
 
                elif current_parent == 'size':
                    if size[subelem.tag] is not None:
                        raise Exception('xml structure broken at size tag.')
                    size[subelem.tag] = int(subelem.text)
 
                # option is <xmin>, <ymin>, <xmax>, <ymax>, when subelem is <bndbox>
                for option in subelem:
                    if current_sub == 'bndbox':
                        if bndbox[option.tag] is not None:
                            raise Exception('xml structure corrupted at bndbox tag.')
                        bndbox[option.tag] = int(option.text)
 
                # only after parse the <object> tag
                if bndbox['xmin'] is not None:
                    if object_name is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_image_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_category_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    bbox = []
                    # x
                    bbox.append(bndbox['xmin'])
                    # y
                    bbox.append(bndbox['ymin'])
                    # w
                    bbox.append(bndbox['xmax'] - bndbox['xmin'])
                    # h
                    bbox.append(bndbox['ymax'] - bndbox['ymin'])
                    print('add annotation with {},{},{},{}'.format(object_name, current_image_id, current_category_id,
                                                                   bbox))
                    addAnnoItem(object_name, current_image_id, current_category_id, bbox)
 
 
if __name__ == '__main__':
    xml_path = r'Z:\pycharm_projects\ssd\VOCtest60\Annotations'   # directory containing the xml files
    json_file = './test.json'                                     # the json file to be generated
    parseXmlFiles(xml_path)                                       # only these two paths need changing
    json.dump(coco, open(json_file, 'w'))
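To sanity-check the result, the generated file can be indexed with pycocotools (a sketch, assuming pycocotools is installed; test.json is the file written above):

from pycocotools.coco import COCO

coco_api = COCO('./test.json')   # builds an index over images, annotations, categories
print(len(coco_api.imgs), 'images,', len(coco_api.anns), 'annotations')
print([c['name'] for c in coco_api.loadCats(coco_api.getCatIds())])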

4. Model Conversion

4.1 pb to pbtxt

Modify the corresponding input parameters in D:\Program Files\opencv\sources\samples\dnn\tf_text_graph_ssd.py and run it to obtain the .pbtxt file.
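The full command is the one shown in the Troubleshooting section below:

python tf_text_graph_ssd.py --input frozen_inference_graph.pb --config ssd_mobilenet_v2_coco.config --output graph.pbtxt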

4.2 pt to pth
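Both .pt and .pth are ordinary PyTorch pickle files, so the conversion is just a load followed by a re-save; a minimal sketch, assuming the .pt file contains a whole pickled model rather than a bare state_dict:

import torch

model = torch.load('model.pt', map_location='cpu')  # load the pickled model object
torch.save(model.state_dict(), 'model.pth')         # re-save only the weights as a state_dict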

4.3 pth to onnx
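A minimal export sketch using torch.onnx.export; the network class, the 300x300 input size, and the tensor names are placeholders for your own model:

import torch

# build the network first, e.g. model = MyDetector()  (hypothetical class name)
model.load_state_dict(torch.load('model.pth', map_location='cpu'))
model.eval()

dummy_input = torch.randn(1, 3, 300, 300)  # NCHW dummy input matching the network
torch.onnx.export(model, dummy_input, 'model.onnx',
                  input_names=['input'], output_names=['output'],
                  opset_version=11)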

4.4 onnx to ncnn
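ncnn ships a command-line converter, onnx2ncnn, and it is common to simplify the graph with onnx-simplifier first so that operators ncnn does not support get folded away (a sketch; adjust the file names to your model):

python -m onnxsim model.onnx model-sim.onnx
onnx2ncnn model-sim.onnx model.param model.bin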

5. Running the MobileNet-SSD Model with the OpenCV DNN Library

Copy the four files that describe the model (suffixes .data-00000-of-00001, .index, .meta, plus the checkpoint file), then run the following command:

python object_detection/export_inference_graph.py --input_type=image_tensor
 --pipeline_config_path=\models\ssd_mobilenet_v2_coco.config 
 --trained_checkpoint_prefix=\pedestrian_data\model\model.ckpt-1589 
--output_directory=\pedestrian_data\test

This produces the corresponding pb model.
To convert the pb model trained with ssd_mobilenet_v2 into a pbtxt file, modify the corresponding input parameters in D:\Program Files\opencv\sources\samples\dnn\tf_text_graph_ssd.py as in 4.1. Then use OpenCV's dnn module to load the network and run detection on the C++ side; the code is as follows.
Reference: https://blog.csdn.net/atpalain_csdn/article/details/100098720

#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/dnn.hpp>

#include <string>
#include <iostream>
#include <time.h>

using namespace std;
using namespace cv;
using namespace dnn;

float confThreshold, nmsThreshold;
std::vector<std::string> classes;

void postprocess(Mat& frame, const std::vector<Mat>& out, Net& net);
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);

int main(int argc, char** argv)
{
	// Configuration for the chosen detection model files
	confThreshold = 0.5;
	nmsThreshold = 0.4;

	float scale = 1.0;
	Scalar mean = { 0, 0, 0 };
	bool swapRB = true;
	int inpWidth = 300;
	int inpHeight = 300;

	String modelPath = "frozen_inference_graph.pb";
	String configPath = "frozen_inference_graph.pbtxt";
	string image_file = "E:\\project\\data\\hands_data\\img1125\\";
	String framework = "";

	int backendId = cv::dnn::DNN_BACKEND_OPENCV;
	int targetId = cv::dnn::DNN_TARGET_CPU;

	//String classesFile = R"(object_detection_classes_coco.txt)";

	// Open file with classes names.
	//if (!classesFile.empty()) {
	//	const std::string& file = classesFile;
	//	std::ifstream ifs(file.c_str());
	//	if (!ifs.is_open())
	//		CV_Error(Error::StsError, "File " + file + " not found");
	//	std::string line;
	//	while (std::getline(ifs, line)) {
	//		classes.push_back(line);
	//	}
	//}
	classes.push_back("raiseHand");
	

	// Load a model.
	Net net = readNet(modelPath, configPath, framework);
	net.setPreferableBackend(backendId);
	net.setPreferableTarget(targetId);

	std::vector<String> outNames = net.getUnconnectedOutLayersNames();

	// Create a window
	static const std::string kWinName = "Deep learning object detection in OpenCV";

	// Process frames.
	Mat frame, blob;
	namedWindow(kWinName, 0);
	vector< cv::String > files;
	cv::glob(image_file, files);
	for (int i = 0; i < files.size(); i++)
	{
		frame = cv::imread(files[i]);

		//cv::Mat image(90, 120, CV_8UC3, cv::Scalar::all(0));
		if (frame.empty())
		{
			return 0;
		}

		//frame = imread("E:\\project\\data\\hands_data\\train_ssd\\hands_data_pos+neg\\raisehandData\\test\\0010.jpg");

		// Create a 4D blob from a frame.
		Size inpSize(inpWidth > 0 ? inpWidth : frame.cols,
			inpHeight > 0 ? inpHeight : frame.rows);
		blobFromImage(frame, blob, scale, inpSize, mean, swapRB, false);

		// Run a model.
		net.setInput(blob);
		if (net.getLayer(0)->outputNameToIndex("im_info") != -1)  // Faster-RCNN or R-FCN
		{
			resize(frame, frame, inpSize);
			Mat imInfo = (Mat_<float>(1, 3) << inpSize.height, inpSize.width, 1.6f);
			net.setInput(imInfo, "im_info");
		}

		std::vector<Mat> outs;
		net.forward(outs, outNames);

		postprocess(frame, outs, net);

		// Put efficiency information.
		std::vector<double> layersTimes;
		double freq = getTickFrequency() / 1000;
		double t = net.getPerfProfile(layersTimes) / freq;
		std::string label = format("Inference time: %.2f ms", t);
		cout << label << endl;
		putText(frame, label, Point(0, 15), FONT_HERSHEY_PLAIN, 0.5, Scalar(0, 255, 0));
		
		imshow(kWinName, frame);
		waitKey(1);
	}
	return 0;
}

void postprocess(Mat& frame, const std::vector<Mat>& outs, Net& net)
{
	static std::vector<int> outLayers = net.getUnconnectedOutLayers();
	static std::string outLayerType = net.getLayer(outLayers[0])->type;

	std::vector<int> classIds;
	std::vector<float> confidences;
	std::vector<Rect> boxes;
	if (net.getLayer(0)->outputNameToIndex("im_info") != -1)  // Faster-RCNN or R-FCN
	{
		// Network produces output blob with a shape 1x1xNx7 where N is a number of
		// detections and an every detection is a vector of values
		// [batchId, classId, confidence, left, top, right, bottom]
		CV_Assert(outs.size() == 1);
		float* data = (float*)outs[0].data;
		for (size_t i = 0; i < outs[0].total(); i += 7) {
			float confidence = data[i + 2];
			if (confidence > confThreshold) {
				int left = (int)data[i + 3];
				int top = (int)data[i + 4];
				int right = (int)data[i + 5];
				int bottom = (int)data[i + 6];
				int width = right - left + 1;
				int height = bottom - top + 1;
				classIds.push_back((int)(data[i + 1]) - 1);  // Skip 0th background class id.
				boxes.push_back(Rect(left, top, width, height));
				confidences.push_back(confidence);
			}
		}
	}
	else if (outLayerType == "DetectionOutput") {
		// Network produces output blob with a shape 1x1xNx7 where N is a number of
		// detections and an every detection is a vector of values
		// [batchId, classId, confidence, left, top, right, bottom]
		CV_Assert(outs.size() == 1);
		float* data = (float*)outs[0].data;
		for (size_t i = 0; i < outs[0].total(); i += 7) {
			float confidence = data[i + 2];
			if (confidence > confThreshold) {
				int left = (int)(data[i + 3] * frame.cols);
				int top = (int)(data[i + 4] * frame.rows);
				int right = (int)(data[i + 5] * frame.cols);
				int bottom = (int)(data[i + 6] * frame.rows);
				int width = right - left + 1;
				int height = bottom - top + 1;
				classIds.push_back((int)(data[i + 1]) - 1);  // Skip 0th background class id.
				boxes.push_back(Rect(left, top, width, height));
				confidences.push_back(confidence);
			}
		}
	}
	else if (outLayerType == "Region") {
		for (size_t i = 0; i < outs.size(); ++i) {
			// Network produces output blob with a shape NxC where N is a number of
			// detected objects and C is a number of classes + 4 where the first 4
			// numbers are [center_x, center_y, width, height]
			float* data = (float*)outs[i].data;
			for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols) {
				Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
				Point classIdPoint;
				double confidence;
				minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
				if (confidence > confThreshold) {
					int centerX = (int)(data[0] * frame.cols);
					int centerY = (int)(data[1] * frame.rows);
					int width = (int)(data[2] * frame.cols);
					int height = (int)(data[3] * frame.rows);
					int left = centerX - width / 2;
					int top = centerY - height / 2;

					classIds.push_back(classIdPoint.x);
					confidences.push_back((float)confidence);
					boxes.push_back(Rect(left, top, width, height));
				}
			}
		}
	}
	else
		CV_Error(Error::StsNotImplemented, "Unknown output layer type: " + outLayerType);

	std::vector<int> indices;
	NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
	for (size_t i = 0; i < indices.size(); ++i) {
		int idx = indices[i];
		Rect box = boxes[idx];
		drawPred(classIds[idx], confidences[idx], box.x, box.y,
			box.x + box.width, box.y + box.height, frame);
	}
}

void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
	rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0));

	std::string label = format("%.2f", conf);
	if (!classes.empty()) {
		CV_Assert(classId < (int)classes.size());
		label = classes[classId] + ": " + label;
	}

	int baseLine;
	Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.2, 1, &baseLine);

	top = max(top, labelSize.height);
	rectangle(frame, Point(left, top - labelSize.height),
		Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED);
	putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.2, Scalar());
}

Troubleshooting

Error when converting pb to pbtxt:

python tf_text_graph_ssd.py --input frozen_inference_graph.pb --config ssd_mobilenet_v2_coco.config --output graph.pbtxt
Scale: [0.200000-0.950000]
Aspect ratios: [1.0, 2.0, 0.5, 3.0, 0.3333]
Reduce boxes in the lowest layer: True
Number of classes: 1
Number of layers: 6
box predictor: convolutional
Input image size: 300x300
Traceback (most recent call last):
  File "tf_text_graph_ssd.py", line 368, in <module>
    createSSDGraph(args.input, args.config, args.output)
  File "tf_text_graph_ssd.py", line 232, in createSSDGraph
    assert(graph_def.node[0].op == 'Placeholder')
AssertionError

Before converting, first process the pb file as follows:

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph
 
with tf.gfile.FastGFile('ssdlite.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    # sort the nodes into execution order so the input Placeholder comes first
    graph_def = TransformGraph(graph_def, ['image_tensor'], ['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections'], ['sort_by_execution_order'])
    with tf.gfile.FastGFile('ssdlite_new.pb', 'wb') as f:
        f.write(graph_def.SerializeToString())  # save the new model

Then run the conversion script again to obtain the pbtxt file; sorting the graph by execution order puts the Placeholder input node first, which is exactly what the failing assert(graph_def.node[0].op == 'Placeholder') checks for.
