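Data format conversion for deep learning (mobilenet+ssd, centernet)
Building an initial VOC2012-style dataset
1. Data annotation
1.1 Annotating images that contain targets
Use labelImg to annotate the images. Download link: https://github.com/tzutalin/labelImg
Points to note when annotating (adapted from https://blog.csdn.net/chenmaolin88/article/details/79357502):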
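What to label | Every instance of every predefined class (for example, if an image contains 3 raccoons, label each of the 3 separately), unless you cannot tell what the object is: it is extremely small (use your own judgement on scale), or less than 10-20% of it is visible, so you cannot be sure which class it belongs to. For example, if you can only see a tyre and cannot tell whether it belongs to a truck or a car, you may leave it unlabeled. If the objects in an image are hard to recognize even by eye, discard the image. |
Difficult | If the object can be roughly recognized by eye but with low confidence, tick the difficult checkbox to indicate that the object is hard to recognize. |
Bounding box | Draw the box around the visible region of the object only; do not label invisible regions or non-object regions. The box should contain all visible pixels of the object and nothing else, unless including a very small part of the object would require enlarging the box considerably. For example, a car's antenna can be left out, because it is tiny and is not a key feature of the car. |
Truncated | If more than 15-20% of the object lies outside the bounding box, mark the object as Truncated. This flag means that the box does not contain the complete object instance. It cannot be set directly in LabelImg; edit the corresponding tag in the XML file by hand. |
Occluded | If more than 5% of the object inside the bounding box is occluded, mark it as Occluded. This flag indicates that the image inside the box is partly occluded. It cannot be set directly in LabelImg; edit the corresponding tag in the XML file by hand. |
Clothing, snow, mud, etc. | If the occluding material is strongly associated with the object, do not mark it as occluded. For example, clothes worn by a person count as part of the person. |
Transparency | Objects seen through glass should also be labeled, but if the glass is somewhat reflective, the reflection on the glass should be treated as occlusion. |
Mirrors | Objects seen in a mirror should also be labeled. |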
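Posters | Objects shown on posters, magazines and the like in the image should also be labeled, unless they are highly exaggerated cartoon drawings. |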
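Since the Truncated and Occluded flags have to be edited in the XML by hand, a small script can save some typing. The following is only a sketch: the tag names `truncated` and `occluded` are assumed from the usual VOC layout, and `set_object_flags` is a hypothetical helper, not part of LabelImg.

```python
import xml.etree.ElementTree as ET


def set_object_flags(xml_text, object_index, truncated=None, occluded=None):
    """Return VOC XML text with the given flags set on one <object>."""
    root = ET.fromstring(xml_text)
    obj = root.findall("object")[object_index]
    for tag, value in (("truncated", truncated), ("occluded", occluded)):
        if value is None:
            continue  # leave this flag untouched
        node = obj.find(tag)
        if node is None:
            node = ET.SubElement(obj, tag)  # create the tag if it is missing
        node.text = "1" if value else "0"
    return ET.tostring(root, encoding="unicode")


# Minimal made-up annotation, used only to show the effect
sample = (
    "<annotation><object><name>raccoon</name>"
    "<truncated>0</truncated><difficult>0</difficult>"
    "</object></annotation>"
)
updated = set_object_flags(sample, 0, truncated=True, occluded=True)
print(updated)
```

For real files, read the XML text from disk, pass it through the helper, and write the result back.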
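1.2 Annotating images without targets
Use the following code to automatically generate XML files that contain no annotated objects.
Reference: https://www.jianshu.com/p/5b2254fdf8f8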
#! /usr/bin/python
# -*- coding:UTF-8 -*-
import glob
import os

from PIL import Image

# Directory containing the source images
src_img_dir = "/home/lehui/Desktop/負樣本700"
# Directory where the generated XML files are stored
src_xml_dir = "/home/lehui/Desktop/xml"

for img_path in glob.glob(src_img_dir + '/*.jpg'):
    # e.g. 100.jpg -> 100
    img = os.path.splitext(os.path.basename(img_path))[0]
    with Image.open(img_path) as im:
        width, height = im.size

    # Write a VOC-style annotation that has no <object> entries
    with open(src_xml_dir + '/' + img + '.xml', 'w') as xml_file:
        xml_file.write('<annotation>\n')
        xml_file.write('    <folder>VOC2007</folder>\n')
        xml_file.write('    <filename>' + str(img) + '.jpg' + '</filename>\n')
        # <path> points at the image itself, not at the XML directory
        xml_file.write('    <path>' + img_path + '</path>\n')
        xml_file.write('    <source>\n')
        xml_file.write('        <database>' + "Unknown" + '</database>\n')
        xml_file.write('    </source>\n')
        xml_file.write('    <size>\n')
        xml_file.write('        <width>' + str(width) + '</width>\n')
        xml_file.write('        <height>' + str(height) + '</height>\n')
        xml_file.write('        <depth>3</depth>\n')
        xml_file.write('    </size>\n')
        xml_file.write('    <segmented>0</segmented>\n')
        xml_file.write('</annotation>')
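As an alternative to concatenating strings, the same empty annotation can be built with `xml.etree.ElementTree`, which escapes special characters in file names automatically. This is a minimal sketch under that assumption; `empty_voc_annotation` is a hypothetical helper name and it omits the `<path>` and `<source>` fields for brevity.

```python
import xml.etree.ElementTree as ET


def empty_voc_annotation(filename, width, height, depth=3, folder="VOC2007"):
    """Build a VOC-style <annotation> element that has no <object> children."""
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = str(depth)
    ET.SubElement(root, "segmented").text = "0"
    return root


# The file name is made up; the "&" shows that escaping is handled for us
xml_text = ET.tostring(empty_voc_annotation("a&b.jpg", 640, 480), encoding="unicode")
print(xml_text)
```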
2. Splitting into training, validation and test sets
Use the following code to obtain the sets of XML files for train, val and test.
Reference: https://www.cnblogs.com/gezhuangzhuang/p/10613468.html
import os
import random
import shutil
import time

# xmlfilepath: directory with all XML files; saveBasePath: where the results are saved.
# Three folders are created under saveBasePath: train, validation and test.
xmlfilepath = r'./Annotations'
saveBasePath = r"./Annotations"

trainval_percent = 0.8  # share of all files used for train + validation
train_percent = 0.8     # share of train + validation used for train

# Keep only XML files, so the sub-folders left by an earlier run are skipped
total_xml = [f for f in os.listdir(xmlfilepath) if f.endswith('.xml')]
num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = set(random.sample(indices, tv))
train = set(random.sample(sorted(trainval), tr))

print("train and val size", tv)
print("train size", tr)

start = time.time()
counts = {"train": 0, "validation": 0, "test": 0}
for i in indices:
    name = total_xml[i]
    if i in train:
        directory = "train"
    elif i in trainval:
        directory = "validation"
    else:
        directory = "test"
    counts[directory] += 1
    target_dir = os.path.join(saveBasePath, directory)
    os.makedirs(target_dir, exist_ok=True)
    shutil.copyfile(os.path.join(xmlfilepath, name), os.path.join(target_dir, name))

seconds = time.time() - start
print("train total : " + str(counts["train"]))
print("validation total : " + str(counts["validation"]))
print("test total : " + str(counts["test"]))
print("total number : " + str(sum(counts.values())))
print("Time taken : {0} seconds".format(seconds))
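To make the effect of the two percentages concrete: with both set to 0.8, train gets 0.8 × 0.8 = 64% of the files, validation 16% and test 20%. The sketch below repeats the same arithmetic on plain indices without touching any files; `split_indices` and its fixed `seed` are illustrative additions, not part of the referenced script.

```python
import random


def split_indices(num, trainval_percent=0.8, train_percent=0.8, seed=0):
    """Split range(num) into disjoint train / validation / test index sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    tv = int(num * trainval_percent)
    tr = int(tv * train_percent)
    trainval = rng.sample(range(num), tv)
    train = set(rng.sample(trainval, tr))
    val = set(trainval) - train
    test = set(range(num)) - set(trainval)
    return train, val, test


train, val, test = split_indices(100)
print(len(train), len(val), len(test))  # prints: 64 16 20
```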
3. Layout of the dataset files
: all XML files
: all images
: the txt files for the training, test and validation sets generated in step 2
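The script in step 2 copies XML files into folders, whereas VOC-style tools usually expect one txt file per split that lists image ids line by line. The sketch below shows one way to write such lists. The `ImageSets/Main` location follows the common VOC convention and is an assumption here, as are the helper name `write_split_lists` and the demo file names.

```python
import os
import tempfile


def write_split_lists(split_to_xml_names, out_dir):
    """Write one <split>.txt per split, one image id (file stem) per line."""
    os.makedirs(out_dir, exist_ok=True)
    written = {}
    for split, xml_names in split_to_xml_names.items():
        ids = sorted(os.path.splitext(n)[0] for n in xml_names if n.endswith(".xml"))
        path = os.path.join(out_dir, split + ".txt")
        with open(path, "w") as f:
            f.write("\n".join(ids) + ("\n" if ids else ""))
        written[split] = path
    return written


# Hypothetical file names, used only to demonstrate the output format
demo = {"train": ["002.xml", "001.xml"], "val": ["003.xml"], "test": ["004.xml"]}
out_dir = os.path.join(tempfile.mkdtemp(), "ImageSets", "Main")
paths = write_split_lists(demo, out_dir)
with open(paths["train"]) as f:
    train_ids = f.read().split()
print(train_ids)
```

In practice the lists of XML names would come from `os.listdir` on the train, validation and test folders created in step 2.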
Label-class configuration file (label_map.txt)
item {
  id: 1  # ids are numbered starting from 1
  name: 'red pedestrian'
}
item {
  id: 2
  name: 'green pedestrian'
}
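With more than a handful of classes it is easy to mistype an id, so the label map can also be generated from a list of class names. This is a minimal sketch; `make_label_map` is a hypothetical helper, and it simply numbers the names from 1 in the order given.

```python
def make_label_map(class_names):
    """Build label-map text with ids numbered from 1, in the order given."""
    blocks = []
    for idx, name in enumerate(class_names, start=1):
        blocks.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "".join(blocks)


label_map_text = make_label_map(["red pedestrian", "green pedestrian"])
print(label_map_text)
```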
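Generating TFRecord data
This relies on \models\research\object_detection\dataset_tools\create_pascal_tf_record.py;
adjust the relevant parameters to produce data in TFRecord format.
Generating JSON data
Generate the corresponding JSON from the XML files.
Reference: https://blog.csdn.net/weixin_41765699/article/details/100124689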
import xml.etree.ElementTree as ET
import os
import json
coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []
category_set = dict()
image_set = set()
category_item_id = -1
image_id = 20180000000
annotation_id = 0
def addCatItem(name):
    global category_item_id
    category_item = dict()
    category_item['supercategory'] = 'none'
    category_item_id += 1
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_set[name] = category_item_id
    return category_item_id


def addImgItem(file_name, size):
    global image_id
    if file_name is None:
        raise Exception('Could not find filename tag in xml file.')
    if size['width'] is None:
        raise Exception('Could not find width tag in xml file.')
    if size['height'] is None:
        raise Exception('Could not find height tag in xml file.')
    image_id += 1
    image_item = dict()
    image_item['id'] = image_id
    image_item['file_name'] = file_name
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    coco['images'].append(image_item)
    image_set.add(file_name)
    return image_id


def addAnnoItem(object_name, image_id, category_id, bbox):
    global annotation_id
    annotation_item = dict()
    annotation_item['segmentation'] = []
    seg = []
    # bbox[] is x,y,w,h
    # left_top
    seg.append(bbox[0])
    seg.append(bbox[1])
    # left_bottom
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    # right_bottom
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    # right_top
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])
    annotation_item['segmentation'].append(seg)
    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['ignore'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_id += 1
    annotation_item['id'] = annotation_id
    coco['annotations'].append(annotation_item)


def parseXmlFiles(xml_path):
    for f in os.listdir(xml_path):
        if not f.endswith('.xml'):
            continue
        bndbox = dict()
        size = dict()
        current_image_id = None
        current_category_id = None
        file_name = None
        size['width'] = None
        size['height'] = None
        size['depth'] = None
        xml_file = os.path.join(xml_path, f)
        print(xml_file)
        tree = ET.parse(xml_file)
        root = tree.getroot()
        if root.tag != 'annotation':
            raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))
        # elem is <folder>, <filename>, <size>, <object>
        for elem in root:
            current_parent = elem.tag
            current_sub = None
            object_name = None
            if elem.tag == 'folder':
                continue
            if elem.tag == 'filename':
                file_name = elem.text
                # duplicates are tracked in image_set, not category_set
                if file_name in image_set:
                    raise Exception('file_name duplicated')
            # add img item only after parse <size> tag
            elif current_image_id is None and file_name is not None and size['width'] is not None:
                if file_name not in image_set:
                    current_image_id = addImgItem(file_name, size)
                    print('add image with {} and {}'.format(file_name, size))
                else:
                    raise Exception('duplicated image: {}'.format(file_name))
            # subelem is <width>, <height>, <depth>, <name>, <bndbox>
            for subelem in elem:
                bndbox['xmin'] = None
                bndbox['xmax'] = None
                bndbox['ymin'] = None
                bndbox['ymax'] = None
                current_sub = subelem.tag
                if current_parent == 'object' and subelem.tag == 'name':
                    object_name = subelem.text
                    if object_name not in category_set:
                        current_category_id = addCatItem(object_name)
                    else:
                        current_category_id = category_set[object_name]
                elif current_parent == 'size':
                    if size[subelem.tag] is not None:
                        raise Exception('xml structure broken at size tag.')
                    size[subelem.tag] = int(subelem.text)
                # option is <xmin>, <ymin>, <xmax>, <ymax>, when subelem is <bndbox>
                for option in subelem:
                    if current_sub == 'bndbox':
                        if bndbox[option.tag] is not None:
                            raise Exception('xml structure corrupted at bndbox tag.')
                        bndbox[option.tag] = int(option.text)
                # only after parse the <object> tag
                if bndbox['xmin'] is not None:
                    if object_name is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_image_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_category_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    bbox = []
                    # x
                    bbox.append(bndbox['xmin'])
                    # y
                    bbox.append(bndbox['ymin'])
                    # w
                    bbox.append(bndbox['xmax'] - bndbox['xmin'])
                    # h
                    bbox.append(bndbox['ymax'] - bndbox['ymin'])
                    print('add annotation with {},{},{},{}'.format(object_name, current_image_id, current_category_id,
                                                                   bbox))
                    addAnnoItem(object_name, current_image_id, current_category_id, bbox)


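if __name__ == '__main__':
    # Directory that holds the XML files (raw string, so the backslashes are kept literally)
    xml_path = r'Z:\pycharm_projects\ssd\VOCtest60\Annotations'
    json_file = './test.json'  # JSON file to generate
    parseXmlFiles(xml_path)  # only the two parameters above need to be changed
    with open(json_file, 'w') as f:
        json.dump(coco, f)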
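The core of the conversion is the change of box convention: VOC stores two corners (xmin, ymin, xmax, ymax), while the JSON written above stores [x, y, w, h] plus a four-point polygon in the order left-top, left-bottom, right-bottom, right-top. The sketch below isolates just that step; the helper names and the sample coordinates are made up for illustration.

```python
def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert VOC corner coordinates to a COCO-style [x, y, w, h] box and its area."""
    if xmax < xmin or ymax < ymin:
        raise ValueError("max corner must not be smaller than min corner")
    w = xmax - xmin
    h = ymax - ymin
    return [xmin, ymin, w, h], w * h


def bbox_to_polygon(bbox):
    """Rectangle polygon: left-top, left-bottom, right-bottom, right-top."""
    x, y, w, h = bbox
    return [x, y, x, y + h, x + w, y + h, x + w, y]


bbox, area = voc_to_coco_bbox(48, 240, 195, 371)
polygon = bbox_to_polygon(bbox)
print(bbox, area)  # prints: [48, 240, 147, 131] 19257
print(polygon)
```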
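4. Model conversion
4.1 pb to pbtxt
Edit the corresponding input parameters in D:\Program Files\opencv\sources\samples\dnn\tf_text_graph_ssd.py and run it to obtain the .pbtxt file.
4.2 pt to pth
4.3 pth to onnx
4.4 onnx to ncnn
5. Calling the mobilenet-ssd model through the OpenCV dnn module
Copy the four files that describe the model (the files ending in .data-00000-of-00001, .index and .meta, plus the checkpoint file), then run the following command: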
python object_detection/export_inference_graph.py --input_type=image_tensor
--pipeline_config_path=\models\ssd_mobilenet_v2_coco.config
--trained_checkpoint_prefix=\pedestrian_data\model\model.ckpt-1589
--output_directory=\pedestrian_data\test
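This produces the corresponding pb model.
To convert the pb model trained with ssd_mobilenet_v2 into pbtxt, edit the corresponding input parameters in D:\Program Files\opencv\sources\samples\dnn\tf_text_graph_ssd.py and run it to obtain the .pbtxt file. The network can then be loaded with the dnn module in OpenCV to run detection on the C++ side, using the following code:
Reference: https://blog.csdn.net/atpalain_csdn/article/details/100098720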
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/dnn.hpp>
#include <string>
#include <iostream>
#include <time.h>
using namespace std;
using namespace cv;
using namespace dnn;
float confThreshold, nmsThreshold;
std::vector<std::string> classes;
void postprocess(Mat& frame, const std::vector<Mat>& out, Net& net);
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);
int main(int argc, char** argv)
{
    // Configure according to the chosen detection model files
    confThreshold = 0.5;
    nmsThreshold = 0.4;
    float scale = 1.0;
    Scalar mean = { 0, 0, 0 };
    bool swapRB = true;
    int inpWidth = 300;
    int inpHeight = 300;
    String modelPath = "frozen_inference_graph.pb";
    String configPath = "frozen_inference_graph.pbtxt";
    string image_file = "E:\\project\\data\\hands_data\\img1125\\";
    String framework = "";
    int backendId = cv::dnn::DNN_BACKEND_OPENCV;
    int targetId = cv::dnn::DNN_TARGET_CPU;
    //String classesFile = R"(object_detection_classes_coco.txt)";
    // Open file with classes names.
    //if (!classesFile.empty()) {
    //    const std::string& file = classesFile;
    //    std::ifstream ifs(file.c_str());
    //    if (!ifs.is_open())
    //        CV_Error(Error::StsError, "File " + file + " not found");
    //    std::string line;
    //    while (std::getline(ifs, line)) {
    //        classes.push_back(line);
    //    }
    //}
    classes.push_back("raiseHand");
    // Load a model.
    Net net = readNet(modelPath, configPath, framework);
    net.setPreferableBackend(backendId);
    net.setPreferableTarget(targetId);
    std::vector<String> outNames = net.getUnconnectedOutLayersNames();
    // Create a window
    static const std::string kWinName = "Deep learning object detection in OpenCV";
    // Process frames.
    Mat frame, blob;
    namedWindow(kWinName, 0);
    vector< cv::String > files;
    cv::glob(image_file, files);
    for (size_t i = 0; i < files.size(); i++)
    {
        frame = cv::imread(files[i]);
        //cv::Mat image(90, 120, CV_8UC3, cv::Scalar::all(0));
        if (frame.empty())
        {
            return 0;
        }
        //frame = imread("E:\\project\\data\\hands_data\\train_ssd\\hands_data_pos+neg\\raisehandData\\test\\0010.jpg");
        // Create a 4D blob from a frame.
        Size inpSize(inpWidth > 0 ? inpWidth : frame.cols,
                     inpHeight > 0 ? inpHeight : frame.rows);
        blobFromImage(frame, blob, scale, inpSize, mean, swapRB, false);
        // Run a model.
        net.setInput(blob);
        if (net.getLayer(0)->outputNameToIndex("im_info") != -1) // Faster-RCNN or R-FCN
        {
            resize(frame, frame, inpSize);
            Mat imInfo = (Mat_<float>(1, 3) << inpSize.height, inpSize.width, 1.6f);
            net.setInput(imInfo, "im_info");
        }
        std::vector<Mat> outs;
        net.forward(outs, outNames);
        postprocess(frame, outs, net);
        // Put efficiency information.
        std::vector<double> layersTimes;
        double freq = getTickFrequency() / 1000;
        double t = net.getPerfProfile(layersTimes) / freq;
        std::string label = format("Inference time: %.2f ms", t);
        cout << label << endl;
        putText(frame, label, Point(0, 15), FONT_HERSHEY_PLAIN, 0.5, Scalar(0, 255, 0));
        imshow(kWinName, frame);
        waitKey(1);
    }
    return 0;
}

void postprocess(Mat& frame, const std::vector<Mat>& outs, Net& net)
{
    static std::vector<int> outLayers = net.getUnconnectedOutLayers();
    static std::string outLayerType = net.getLayer(outLayers[0])->type;
    std::vector<int> classIds;
    std::vector<float> confidences;
    std::vector<Rect> boxes;
    if (net.getLayer(0)->outputNameToIndex("im_info") != -1) // Faster-RCNN or R-FCN
    {
        // Network produces output blob with a shape 1x1xNx7 where N is a number of
        // detections and an every detection is a vector of values
        // [batchId, classId, confidence, left, top, right, bottom]
        CV_Assert(outs.size() == 1);
        float* data = (float*)outs[0].data;
        for (size_t i = 0; i < outs[0].total(); i += 7) {
            float confidence = data[i + 2];
            if (confidence > confThreshold) {
                int left = (int)data[i + 3];
                int top = (int)data[i + 4];
                int right = (int)data[i + 5];
                int bottom = (int)data[i + 6];
                int width = right - left + 1;
                int height = bottom - top + 1;
                classIds.push_back((int)(data[i + 1]) - 1); // Skip 0th background class id.
                boxes.push_back(Rect(left, top, width, height));
                confidences.push_back(confidence);
            }
        }
    }
    else if (outLayerType == "DetectionOutput") {
        // Network produces output blob with a shape 1x1xNx7 where N is a number of
        // detections and an every detection is a vector of values
        // [batchId, classId, confidence, left, top, right, bottom]
        CV_Assert(outs.size() == 1);
        float* data = (float*)outs[0].data;
        for (size_t i = 0; i < outs[0].total(); i += 7) {
            float confidence = data[i + 2];
            if (confidence > confThreshold) {
                int left = (int)(data[i + 3] * frame.cols);
                int top = (int)(data[i + 4] * frame.rows);
                int right = (int)(data[i + 5] * frame.cols);
                int bottom = (int)(data[i + 6] * frame.rows);
                int width = right - left + 1;
                int height = bottom - top + 1;
                classIds.push_back((int)(data[i + 1]) - 1); // Skip 0th background class id.
                boxes.push_back(Rect(left, top, width, height));
                confidences.push_back(confidence);
            }
        }
    }
    else if (outLayerType == "Region") {
        for (size_t i = 0; i < outs.size(); ++i) {
            // Network produces output blob with a shape NxC where N is a number of
            // detected objects and C is a number of classes + 4 where the first 4
            // numbers are [center_x, center_y, width, height]
            float* data = (float*)outs[i].data;
            for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols) {
                Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
                Point classIdPoint;
                double confidence;
                minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
                if (confidence > confThreshold) {
                    int centerX = (int)(data[0] * frame.cols);
                    int centerY = (int)(data[1] * frame.rows);
                    int width = (int)(data[2] * frame.cols);
                    int height = (int)(data[3] * frame.rows);
                    int left = centerX - width / 2;
                    int top = centerY - height / 2;
                    classIds.push_back(classIdPoint.x);
                    confidences.push_back((float)confidence);
                    boxes.push_back(Rect(left, top, width, height));
                }
            }
        }
    }
    else
        CV_Error(Error::StsNotImplemented, "Unknown output layer type: " + outLayerType);
    std::vector<int> indices;
    NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); ++i) {
        int idx = indices[i];
        Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, frame);
    }
}

void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
    rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0));
    std::string label = format("%.2f", conf);
    if (!classes.empty()) {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ": " + label;
    }
    int baseLine;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.2, 1, &baseLine);
    top = max(top, labelSize.height);
    rectangle(frame, Point(left, top - labelSize.height),
              Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED);
    putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.2, Scalar());
}
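For an SSD model, the branch that matters in `postprocess` is the `DetectionOutput` one: each detection is a row of 7 values, with the box corners normalized to the 0..1 range. The Python sketch below mirrors only that decoding step so it can be checked without OpenCV. `decode_detections` is a hypothetical helper and the input rows are made up.

```python
def decode_detections(rows, frame_width, frame_height, conf_threshold=0.5):
    """Turn rows of [batchId, classId, confidence, left, top, right, bottom]
    (coordinates normalized to 0..1) into pixel boxes."""
    results = []
    for _batch_id, class_id, confidence, left, top, right, bottom in rows:
        if confidence <= conf_threshold:
            continue  # drop low-confidence detections
        x1 = int(left * frame_width)
        y1 = int(top * frame_height)
        x2 = int(right * frame_width)
        y2 = int(bottom * frame_height)
        results.append({
            "class_id": int(class_id) - 1,  # skip the background class (id 0)
            "confidence": confidence,
            "box": (x1, y1, x2 - x1 + 1, y2 - y1 + 1),  # x, y, width, height
        })
    return results


# Made-up network output for a 300x300 frame, for illustration only
rows = [
    [0, 1, 0.9, 0.25, 0.25, 0.75, 0.5],
    [0, 1, 0.3, 0.0, 0.0, 1.0, 1.0],
]
detections = decode_detections(rows, 300, 300)
print(detections)
```

The second row is discarded because its confidence is below the threshold. Non-maximum suppression, which the C++ code does with `NMSBoxes`, is not covered by this sketch.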
Troubleshooting
Error when converting pb to pbtxt:
_graph_ssd.py --input frozen_inference_graph.pb --config ssd_mobilenet_v2_coco.config --output graph.pbtxt
Scale: [0.200000-0.950000]
Aspect ratios: [1.0, 2.0, 0.5, 3.0, 0.3333]
Reduce boxes in the lowest layer: True
Number of classes: 1
Number of layers: 6
box predictor: convolutional
Input image size: 300x300
Traceback (most recent call last):
File "tf_text_graph_ssd.py", line 368, in <module>
createSSDGraph(args.input, args.config, args.output)
File "tf_text_graph_ssd.py", line 232, in createSSDGraph
assert(graph_def.node[0].op == 'Placeholder')
AssertionError
Before converting, first process the pb file as follows:
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load the frozen graph (TensorFlow 1.x API)
with tf.gfile.FastGFile('ssdlite.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    # Reorder the nodes by execution order
    graph_def = TransformGraph(graph_def, ['image_tensor'], ['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections'], ['sort_by_execution_order'])
with tf.gfile.FastGFile('ssdlite_new.pb', 'wb') as f:
    f.write(graph_def.SerializeToString())  # save the new model
Then run the conversion script again to obtain the pbtxt file.
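The failing assertion checks that the first node of the graph is a Placeholder, so the error presumably means the nodes in the frozen graph were stored in some other order. The toy sketch below is not TensorFlow code. It only illustrates, on a made-up three-node graph with a hand-written sort, how ordering nodes by execution order can move an input placeholder to the front.

```python
def sort_by_execution_order(nodes):
    """Order nodes so that every node comes after all of its inputs.

    `nodes` is a list of dicts with 'name', 'op' and 'inputs' keys.
    """
    by_name = {n["name"]: n for n in nodes}
    ordered, visited = [], set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for dep in by_name[name]["inputs"]:
            visit(dep)  # emit the inputs first
        ordered.append(by_name[name])

    for n in nodes:
        visit(n["name"])
    return ordered


# Made-up graph in which the Placeholder is stored last
toy_graph = [
    {"name": "conv", "op": "Conv2D", "inputs": ["image_tensor", "weights"]},
    {"name": "weights", "op": "Const", "inputs": []},
    {"name": "image_tensor", "op": "Placeholder", "inputs": []},
]
sorted_graph = sort_by_execution_order(toy_graph)
print([n["op"] for n in sorted_graph])  # prints: ['Placeholder', 'Const', 'Conv2D']
```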