1. Environment Setup
The API uses protobuf to configure the model and training parameters, so the protobuf libraries must be compiled before the API can be used. Pre-built releases can be downloaded here (https://github.com/google/protobuf/releases). After unpacking the archive, add protoc to your PATH:
$ cd tensorflow/models
$ protoc object_detection/protos/*.proto --python_out=.  # Note: the * wildcard sometimes fails with a file-not-found error; you can list the .proto files by name instead
(I had added protoc to my PATH but still hit the "*.proto file not found" error; placing protoc.exe directly in the models/object_detection directory and re-running fixed it.)
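If the wildcard keeps failing, another workaround is to compile the files one at a time. A small sketch, run from the same directory as the protoc command above, assuming protoc is on your PATH:

import glob
import subprocess

# compile each .proto file individually, sidestepping shells that do not
# expand the * wildcard
for proto in glob.glob("object_detection/protos/*.proto"):
    subprocess.check_call(["protoc", proto, "--python_out=."])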
Then add models and slim (TF's high-level library) to the Python path:
PYTHONPATH=$PYTHONPATH:/your/path/to/tensorflow/models:/your/path/to/tensorflow/models/slim
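To confirm the paths took effect, a quick import check helps (any ImportError means PYTHONPATH is still wrong); this assumes the models/slim layout of that era, which provides the nets package:

# these imports succeed only if models/ and models/slim are both on PYTHONPATH
from object_detection.utils import dataset_util  # provided by models/
from nets import mobilenet_v1                    # provided by models/slim
print("environment OK")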
2. Data Preparation
I built my own dataset of speed-limit signs: 250 images in total, 200 for training and 50 for testing. Once you have the images, they need labels: you manually draw a box around the speed-limit sign in each image. A good annotation tool is LabelImg; the label is sign, and it produces VOC-format data. The VOC-style folder layout:

my_images
└── VOCdevkit
    └── VOC2012
        ├── Annotations   # XML labels (file names like 2007_0000.xml)
        ├── JPEGImages    # JPEG images (file names like 2007_0000.jpg)
        └── ImageSets
            └── Main
                ├── train.txt
                └── val.txt
import os

# list every image and print its basename (redirect the output into
# ImageSets/Main/train.txt or val.txt)
pt = "/tensorflow/model/research/object_detection/my_images/VOCdevkit/VOC2012/JPEGImages"
image_name = os.listdir(pt)
for temp in image_name:
    if temp.endswith(".jpg"):
        print(temp.replace('.jpg', ''))
The script above prints the train/val image lists; redirect its output into train.txt and val.txt under ImageSets/Main.
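Alternatively, a minimal sketch (same paths assumed as above) that shuffles the names and writes the 200/50 split files directly:

import os
import random

# root of the VOC-style dataset created above
voc_root = "/tensorflow/model/research/object_detection/my_images/VOCdevkit/VOC2012"

# collect image basenames (without the .jpg extension)
names = [f[:-4] for f in os.listdir(os.path.join(voc_root, "JPEGImages"))
         if f.endswith(".jpg")]
random.shuffle(names)

# 200 images for training, the remaining 50 for validation
splits = {"train": names[:200], "val": names[200:]}

main_dir = os.path.join(voc_root, "ImageSets", "Main")
if not os.path.isdir(main_dir):
    os.makedirs(main_dir)
for split_name, split_items in splits.items():
    with open(os.path.join(main_dir, split_name + ".txt"), "w") as f:
        f.write("\n".join(split_items) + "\n")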
Converting the VOC data to TFRecord: see dataset_tools/create_pascal_tf_record.py and adapt it to your own paths and layout. My version is below:
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

r"""Convert raw PASCAL dataset to TFRecord for object_detection.

Example usage:
    python object_detection/dataset_tools/create_pascal_tf_record.py \
        --data_dir=/home/user/VOCdevkit \
        --year=VOC2012 \
        --output_path=/home/user/pascal.record
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import hashlib
import io
import logging
import os

from lxml import etree
import PIL.Image
import tensorflow as tf

from object_detection.utils import dataset_util
from object_detection.utils import label_map_util

flags = tf.app.flags
flags.DEFINE_string('data_dir', '', 'Root directory to raw PASCAL VOC dataset.')
flags.DEFINE_string('set', 'train', 'Convert training set, validation set or '
                    'merged set.')
flags.DEFINE_string('annotations_dir', 'Annotations',
                    '(Relative) path to annotations directory.')
flags.DEFINE_string('year', 'VOC2007', 'Desired challenge year.')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('label_map_path', 'data/pascal_label_map.pbtxt',
                    'Path to label map proto')  # change this to your own label map
flags.DEFINE_boolean('ignore_difficult_instances', False, 'Whether to ignore '
                     'difficult instances')
FLAGS = flags.FLAGS

SETS = ['train', 'val', 'trainval', 'test']
YEARS = ['VOC2007', 'VOC2012', 'merged']


def dict_to_tf_example(data,
                       dataset_directory,
                       label_map_dict,
                       ignore_difficult_instances=False,
                       image_subdirectory='JPEGImages'):
  """Convert XML derived dict to tf.Example proto.

  Notice that this function normalizes the bounding box coordinates provided
  by the raw data.

  Args:
    data: dict holding PASCAL XML fields for a single image (obtained by
      running dataset_util.recursive_parse_xml_to_dict)
    dataset_directory: Path to root directory holding PASCAL dataset
    label_map_dict: A map from string label names to integers ids.
    ignore_difficult_instances: Whether to skip difficult instances in the
      dataset (default: False).
    image_subdirectory: String specifying subdirectory within the
      PASCAL dataset directory holding the actual image data.

  Returns:
    example: The converted tf.Example.

  Raises:
    ValueError: if the image pointed to by data['filename'] is not a valid JPEG
  """
  img_path = os.path.join(image_subdirectory, data['filename'])  # modified: image_subdirectory = 'JPEGImages'
  year1 = 'VOC2012'  # modified: hard-code the year directory
  full_path = os.path.join(dataset_directory, year1, img_path)  # modified
  with tf.gfile.GFile(full_path, 'rb') as fid:
    encoded_jpg = fid.read()
  encoded_jpg_io = io.BytesIO(encoded_jpg)
  image = PIL.Image.open(encoded_jpg_io)
  if image.format != 'JPEG':
    raise ValueError('Image format not JPEG')
  key = hashlib.sha256(encoded_jpg).hexdigest()

  width = int(data['size']['width'])
  height = int(data['size']['height'])

  xmin = []
  ymin = []
  xmax = []
  ymax = []
  classes = []
  classes_text = []
  truncated = []
  poses = []
  difficult_obj = []
  if 'object' in data:
    for obj in data['object']:
      difficult = bool(int(obj['difficult']))
      if ignore_difficult_instances and difficult:
        continue

      difficult_obj.append(int(difficult))

      xmin.append(float(obj['bndbox']['xmin']) / width)
      ymin.append(float(obj['bndbox']['ymin']) / height)
      xmax.append(float(obj['bndbox']['xmax']) / width)
      ymax.append(float(obj['bndbox']['ymax']) / height)
      classes_text.append(obj['name'].encode('utf8'))
      classes.append(label_map_dict[obj['name']])
      truncated.append(int(obj['truncated']))
      poses.append(obj['pose'].encode('utf8'))

  example = tf.train.Example(features=tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/source_id': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
      'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
      'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
      'image/object/truncated': dataset_util.int64_list_feature(truncated),
      'image/object/view': dataset_util.bytes_list_feature(poses),
  }))
  return example


def main(_):
  if FLAGS.set not in SETS:
    raise ValueError('set must be in : {}'.format(SETS))
  if FLAGS.year not in YEARS:
    raise ValueError('year must be in : {}'.format(YEARS))

  data_dir = FLAGS.data_dir
  years = ['VOC2007', 'VOC2012']
  if FLAGS.year != 'merged':
    years = [FLAGS.year]

  writer = tf.python_io.TFRecordWriter(FLAGS.output_path)

  label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path)

  for year in years:
    logging.info('Reading from PASCAL %s dataset.', year)
    examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main',
                                 FLAGS.set + '.txt')
    annotations_dir = os.path.join(data_dir, year, FLAGS.annotations_dir)
    examples_list = dataset_util.read_examples_list(examples_path)
    for idx, example in enumerate(examples_list):
      if idx % 100 == 0:
        logging.info('On image %d of %d', idx, len(examples_list))
      path = os.path.join(annotations_dir, example + '.xml')  # modified
      with tf.gfile.GFile(path, 'r') as fid:
        xml_str = fid.read()
      xml = etree.fromstring(xml_str)
      data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation']

      tf_example = dict_to_tf_example(data, FLAGS.data_dir, label_map_dict,
                                      FLAGS.ignore_difficult_instances)
      writer.write(tf_example.SerializeToString())

  writer.close()


if __name__ == '__main__':
  tf.app.run()
Modify data/pascal_label_map.pbtxt to hold your own labels. I have only one class, and ids start from 1:
item {
  id: 1
  name: 'sign'
}
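A quick way to confirm the label map parses is to load it with the same label_map_util helper the conversion script imports:

from object_detection.utils import label_map_util

# should print {'sign': 1} for the single-class map above
label_map_dict = label_map_util.get_label_map_dict('data/pascal_label_map.pbtxt')
print(label_map_dict)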
Generate the training data, pascal_train.record:
python dataset_tools/create_pascal_tf_record.py --data_dir=my_images/VOCdevkit/ --year=VOC2012 --output_path=my_images/VOCdevkit/pascal_train.record --set=train
Generate the test data, pascal_val.record:
python dataset_tools/create_pascal_tf_record.py --data_dir=my_images/VOCdevkit/ --year=VOC2012 --output_path=my_images/VOCdevkit/pascal_val.record --set=val
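To verify the conversion, you can count the examples in each record file (expect 200 and 50 respectively); a small check using TF 1.x's record iterator:

import tensorflow as tf

# count the serialized tf.Examples in each record file
for record in ["my_images/VOCdevkit/pascal_train.record",
               "my_images/VOCdevkit/pascal_val.record"]:
    count = sum(1 for _ in tf.python_io.tf_record_iterator(record))
    print(record, count)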
3. Download the Model: ssd_mobilenet_v1_coco
After downloading ssd_mobilenet_v1_coco, extract it into the my_images folder and put the three model.ckpt.* files (data, index, meta) under VOC2012.
Next, create the pipeline config. The samples/configs/ folder contains example files to model ours on; I referred to the faster_rcnn_inception_resnet_v2_atrous_coco.config file and placed my copy under VOC2012.
File name: ssd_mobilenet_v1.config
model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "my_images/VOCdevkit/VOC2012/model.ckpt"
  from_detection_checkpoint: true
  # Note: the line below limits training to 20k steps. Because decay_steps is
  # much larger, this effectively bypasses the learning rate schedule (the
  # learning rate will never decay). Remove the line to train indefinitely.
  num_steps: 20000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "my_images/VOCdevkit/pascal_train.record"
  }
  label_map_path: "data/pascal_label_map.pbtxt"
}

eval_config: {
  num_examples: 50
  # Note: the line below limits the evaluation process to 10 evaluations.
  # Remove it to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "my_images/VOCdevkit/pascal_val.record"
  }
  label_map_path: "data/pascal_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
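Before launching training it can save time to parse the config with the API's config utility, which catches protobuf syntax errors early; a quick sketch, assuming object_detection is on PYTHONPATH:

from object_detection.utils import config_util

# parse the pipeline config; a malformed file raises an error here instead
# of minutes into a training run
configs = config_util.get_configs_from_pipeline_file(
    "my_images/VOCdevkit/VOC2012/ssd_mobilenet_v1.config")
print(configs["model"].ssd.num_classes)  # expect: 1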
4. Train the Model
Run the following under object_detection; it creates checkpoint files in my_images/train:
python legacy/train.py --train_dir=my_images/train/ --pipeline_config_path=my_images/VOCdevkit/VOC2012/ssd_mobilenet_v1.config
Create an eval directory under my_images to store the evaluation files. Open another terminal and run:
/tensorflow/models/research/object_detection$ python legacy/eval.py \
    --logtostderr \
    --pipeline_config_path=my_images/VOCdevkit/VOC2012/ssd_mobilenet_v1.config \
    --checkpoint_dir=my_images/train \
    --eval_dir=my_images/eval
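If you hit GPU memory errors during training or evaluation (see the note below), a simple mitigation is to make only one GPU visible to TensorFlow; a minimal sketch, equivalent to prefixing the command with CUDA_VISIBLE_DEVICES=0:

import os

# make only GPU 0 visible; this must run before TensorFlow is imported
os.environ["CUDA_VISIBLE_DEVICES"] = "0"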
A special note: I recommend legacy/train.py; I never got object_detection/model_main.py to work. You may also run into GPU memory errors, so consider pinning the job to a specific GPU (as in the sketch above). I trained for 20k steps; test accuracy came out around 94% to 97%. With so little data the numbers vary from run to run, but the results are still quite good. A few sample detections are shown below:
Next post: exporting the trained TensorFlow model and testing it.