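To monitor driver fatigue effectively and help prevent traffic accidents, this post presents a real-time fatigue-driving detection method based on facial landmarks. The driver's face is monitored in real time while driving. First, the face is detected and its landmarks are located with the ERT (Ensemble of Regression Trees) algorithm. Next, the eye aspect ratio (EAR) is computed from the coordinates of the eye-region landmarks to describe how far the eyes are open; with a suitable EAR threshold, each frame can be classified as eyes open or eyes closed. Finally, from the measured EAR values and the EAR threshold, the percentage of eyelid closure over time (PERCLOS) is computed over the monitoring video as a measure of the driver's subjective fatigue level, and comparing it with a preset fatigue threshold decides whether the driver is fatigued.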
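As a rough illustration of the EAR measure described above, the sketch below uses the common six-point definition, EAR = (|p2-p6| + |p3-p5|) / (2|p1-p4|). The point ordering (p1 and p4 as the eye corners, p2 and p3 on the upper lid, p6 and p5 on the lower lid) and the 0.2 threshold are assumptions for illustration, not values taken from this project.

```python
import math


def _dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def eye_aspect_ratio(eye):
    """eye: six (x, y) points in the assumed order p1..p6."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = _dist(p2, p6) + _dist(p3, p5)
    horizontal = _dist(p1, p4)
    if horizontal == 0:
        return 0.0  # degenerate eye region, treat as closed
    return vertical / (2.0 * horizontal)


def is_eye_closed(eye, ear_threshold=0.2):
    """The threshold is a placeholder; it has to be tuned on real data."""
    return eye_aspect_ratio(eye) < ear_threshold


# Synthetic examples: a wide-open eye and an almost closed one
open_eye = [(0, 0), (2, -2), (4, -2), (6, 0), (4, 2), (2, 2)]
closed_eye = [(0, 0), (2, -0.2), (4, -0.2), (6, 0), (4, 0.2), (2, 0.2)]
print(eye_aspect_ratio(open_eye), is_eye_closed(closed_eye))
```

With a 37-point layout such as the one used later in this post, the indices of the six eye points would have to be looked up in your own annotation order.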
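I. Facial landmark detection
Facial landmark detection is built on this library: https://github.com/610265158/Peppa_Pig_Face_Engine. I tried many open-source frameworks, including dlib, openface, pfld and clnf, but none of them handled closed eyes well. Then I found this library and was genuinely impressed: detection is excellent and very stable. On an 8th-generation i7 CPU it takes about 40 ms per frame on average (the project machine has no GPU, so I did not measure GPU speed, but it should be fast). For details see the author's article, which is really well written, and I will vouch for that: https://blog.csdn.net/qq_35606924/article/details/99711208
[Figure: closed-eye detection results on my dataset]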
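II. Training on your own dataset (TF1-based)
Open datasets contain too few closed-eye samples, so we have to add our own by hand. I used dlib's annotation tool (imglab) for this.
1. Annotate the dataset. Since the goal is mainly to detect the eyes and head pose, I labeled 37 points per face (a little over 5,000 images in total).
** Annotation tool: mistakes are common while annotating, so I added features such as undo.
2. When annotation is finished, an XML file containing all the annotations is generated (check it to make sure no image has too few or too many points). Then do the following: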
   1. Shuffle the order
      imglab --shuffle dataset_0402.xml
   2. Split the dataset (training set and test set)
      imglab --split-train-test 0.95 dataset_0402.xml
   3. You can also flip the dataset, remove near-duplicate samples, and so on
      imglab --rmdupes xml/mydataset.xml ## remove near-duplicate samples
      imglab --flip xml/mydataset.xml ## flip the images
   4. For more operations see: https://blog.csdn.net/u010168781/article/details/91048497
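The check recommended in step 2 (every face should carry exactly the expected number of points) can be automated. The following is a minimal sketch, assuming the imglab XML layout that the conversion script in this post reads (image elements with a file attribute, each holding box elements with part children). The function name and the sample data are made up for the example.

```python
from xml.dom.minidom import parseString


def find_bad_annotations(xml_text, expected_points=37):
    """Return (file, point_count) for every box whose point count is wrong."""
    root = parseString(xml_text).documentElement
    bad = []
    for image in root.getElementsByTagName("image"):
        file_path = image.getAttribute("file")
        boxes = image.getElementsByTagName("box")
        if not boxes:
            bad.append((file_path, 0))  # image without any face box
            continue
        for box in boxes:
            count = len(box.getElementsByTagName("part"))
            if count != expected_points:
                bad.append((file_path, count))
    return bad


# Tiny synthetic file: a.jpg has two points, b.jpg has only one
sample = """<dataset><images>
<image file='a.jpg'><box top='1' left='1' width='9' height='9'>
<part name='00' x='1' y='2'/><part name='01' x='3' y='4'/></box></image>
<image file='b.jpg'><box top='1' left='1' width='9' height='9'>
<part name='00' x='1' y='2'/></box></image>
</images></dataset>"""

print(find_bad_annotations(sample, expected_points=2))
```

For a real annotation file, read its contents with open(...).read() and pass the text in with expected_points=37.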
3. Convert the training and test sets into the format expected by the author's code
import json
from xml.dom.minidom import parse

from tqdm import tqdm


def json_to_txt(json_file, txt_file):
    """Convert the JSON written by read_xml_to_json into 'path|x1 y1 x2 y2 ...' lines."""
    with open(json_file, 'r') as f:
        data = json.load(f)
    with open(txt_file, mode='w') as out:
        for sub_data in data:
            coords = ''.join('{} {} '.format(p[0], p[1]) for p in sub_data['keypoints'])
            out.write(sub_data['image_path'] + '|' + coords + '\n')


def read_xml_to_json(path, out_file_path):
    """Convert an imglab XML file into a JSON list of annotation records."""
    dom_tree = parse(path)
    # Document root element
    root_node = dom_tree.documentElement
    images = root_node.getElementsByTagName("image")
    train_json_list = []
    for image in tqdm(images):
        if not image.hasAttribute("file"):
            continue
        # Image file path
        file_path = image.getAttribute("file")
        boxes = image.getElementsByTagName("box")
        if not boxes:
            continue  # skip images without a face box
        box = boxes[0]
        # Same order as before: [top, left, width, height]
        bbox = [float(box.getAttribute(k)) for k in ("top", "left", "width", "height")]
        parts = box.getElementsByTagName("part")
        if len(parts) == 0:
            continue  # skip boxes without landmarks
        key = [[float(part.getAttribute("x")), float(part.getAttribute("y"))] for part in parts]
        train_json_list.append({
            'image_path': file_path,
            'keypoints': key,
            'bbox': bbox,
            'attr': None,
        })
    with open(out_file_path, 'w') as f:
        json.dump(train_json_list, f, indent=2)


def read_xml_to_txt(path, out_txt_file_path):
    """Convert an imglab XML file directly into 'path|x1 y1 x2 y2 ...' lines."""
    dom_tree = parse(path)
    # Document root element
    root_node = dom_tree.documentElement
    images = root_node.getElementsByTagName("image")
    with open(out_txt_file_path, 'w') as f:
        for image in tqdm(images):
            if not image.hasAttribute("file"):
                continue
            boxes = image.getElementsByTagName("box")
            if not boxes:
                continue
            parts = boxes[0].getElementsByTagName("part")
            if len(parts) == 0:
                # Skip before writing anything, so that no dangling 'path|'
                # ends up glued to the next line of the output
                continue
            coords = ''.join(
                '{} {} '.format(float(part.getAttribute("x")), float(part.getAttribute("y")))
                for part in parts)
            f.write(image.getAttribute("file") + '|' + coords + '\n')


if __name__ == '__main__':
    data_path = ["data/test.xml", "data/train.xml"]
    out_path = ["data/test.txt", "data/train.txt"]
    for path, out in zip(data_path, out_path):
        read_xml_to_txt(path, out)
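For reference, each line written by read_xml_to_txt has the form image_path|x1 y1 x2 y2 ... A small sketch for reading such a line back and sanity-checking it follows; the helper name is hypothetical and not part of the original project.

```python
def parse_annotation_line(line):
    """Split 'path|x1 y1 x2 y2 ...' into the path and a list of (x, y) points."""
    path, _, coords = line.strip().partition('|')
    values = [float(v) for v in coords.split()]
    if len(values) % 2 != 0:
        raise ValueError("odd number of coordinates for " + path)
    points = list(zip(values[0::2], values[1::2]))
    return path, points


path, points = parse_annotation_line("imgs/0001.jpg|10.0 20.0 30.5 40.5 \n")
print(path, len(points))
```

Running this over train.txt and checking that every line yields the same number of points is a cheap way to catch conversion problems before training.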
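4. Configure the training parameters
   1. Update the landmark indices. The author's dataset is based on 68 points, so the indices will differ. If your points are labeled in the same order as in the author's dataset, no change is needed.
   2. Set the training and test dataset paths and the other parameters; see train_config.py.
5. Change how the data is read
   1. If you start training right away, you get this error: Can't pickle local object 'DataFromGenerator.__init__.<locals>.<lambda>'. The author trained on Linux while I trained on Windows, and the multi-process prefetching most likely does not work on Windows because worker processes there are spawned rather than forked, so everything handed to them must be picklable. If you comment out ds = MultiProcessPrefetchData(ds, self.prefetch_size, self.process_num) and train again, training becomes very slow: 10 iterations take about 1 min and GPU utilization is around 2%. My guess is that without prefetching most of the time goes into reading data.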
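The pickling error can be reproduced without any training code. Functions defined inside another function (including lambdas) cannot be pickled, and pickling is what a spawned worker process needs. A minimal sketch of that behaviour, with names made up for the demonstration:

```python
import pickle


def make_local_callable():
    # Defined inside a function, so pickle cannot look it up by name
    return lambda x: x + 1


def module_level_callable(x):
    # Defined at module level, so pickle can refer to it by name
    return x + 1


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print("local lambda picklable:", can_pickle(make_local_callable()))
print("module-level function picklable:", can_pickle(module_level_callable))
```

When run as a script, the first check fails and the second succeeds, which is why moving such callables to module level, or avoiding process-based prefetching altogether as done below, gets around the error.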
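   2. Read the data through TFRecord instead, which speeds training up considerably. On an MX150 GPU, 10 iterations take about 5 s, at least 10 times faster. The one painful part is that the generated record file is very large (see the notes on oversized files further down). The implementation follows:
      1. Generate the TFRecord file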
# When I wrote this code, only God and I knew what it did.
# Now, only God knows.
# @File : generate_tfrecord.py
# @Time : 2020/4/24 14:10
# @Author : J.
# @desc : Generate TFRecord files
import argparse
import sys

import tensorflow as tf
from tqdm import tqdm

from lib.dataset.dataietr import FaceKeypointDataIter
from train_config import config as cfg


def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def create_tf_example(image_file, is_train):
    # _map_func loads and preprocesses one sample. The reader decodes both
    # fields as float32, so both arrays are expected to be float32 here.
    crop_image, label = _train_data._map_func(image_file, is_train)
    tf_example = tf.train.Example(
        features=tf.train.Features(
            feature={
                'image': bytes_feature(crop_image.tobytes()),
                'label': bytes_feature(label.tobytes())
            }
        ))
    return tf_example


def generate_tfrecord(images_files, record_path, is_train=True):
    num_tf_example = 0
    writer = tf.python_io.TFRecordWriter(record_path)
    with tqdm(images_files, ncols=100) as files:
        for image in files:
            tf_example = create_tf_example(image, is_train)
            writer.write(tf_example.SerializeToString())
            num_tf_example += 1
    writer.close()
    print("{} tf_examples have been created successfully and saved in {}".format(num_tf_example, record_path))


def main(_):
    global _train_data
    global _val_data
    _train_data = FaceKeypointDataIter(cfg.TRAIN.batch_size, cfg.TRAIN.epoch, cfg.DATA.root_path,
                                       FLAGS.train_data,
                                       True)
    _val_data = FaceKeypointDataIter(cfg.TRAIN.batch_size, cfg.TRAIN.epoch, cfg.DATA.root_path,
                                     FLAGS.val_data,
                                     False)
    print("================== generate train tf_record start ===================")
    train_images_files = _train_data.get_parse_file()
    generate_tfrecord(train_images_files, FLAGS.train_save_path, True)
    print("================== generate train tf_record end ===================")
    print("================== generate val tf_record start ===================")
    val_images_files = _val_data.get_parse_file()
    generate_tfrecord(val_images_files, FLAGS.val_save_path, False)
    print("================== generate val tf_record end ===================")


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--train_data',
        type=str,
        default='data/train.txt',
        help='Training data list.')
    parser.add_argument(
        '--val_data',
        type=str,
        default='data/test.txt',
        help='Validation data list.')
    parser.add_argument(
        '--train_save_path',
        type=str,
        default='record/train.record',
        help='Output path for the training record file.')
    parser.add_argument(
        '--val_save_path',
        type=str,
        default='record/val.record',
        help='Output path for the validation record file.')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
      2. Read the record data
import tensorflow as tf


class read_face_data():
    def __init__(self, tfrecord_path, batch_size, out_channel, win, hin, num_threads):
        self.tfrecord_path = tfrecord_path
        self.batch_size = batch_size
        self.win = win
        self.hin = hin
        self.num_threads = num_threads
        self.out_channel = out_channel

    def read_and_decode(self):
        filename_queue = tf.train.string_input_producer([self.tfrecord_path], shuffle=False)
        reader = tf.TFRecordReader()
        _, serialized_example = reader.read(filename_queue)
        features = tf.parse_single_example(serialized_example,
                                           features={
                                               'image': tf.FixedLenFeature([], tf.string),
                                               'label': tf.FixedLenFeature([], tf.string)
                                           })
        # The raw bytes carry no dtype or shape, so both must match what was written
        images = tf.decode_raw(features['image'], tf.float32)
        images = tf.reshape(images, [self.win, self.hin, 3])
        labels = tf.decode_raw(features['label'], tf.float32)
        labels = tf.reshape(labels, [self.out_channel])
        # capacity: maximum number of elements in the queue
        # min_after_dequeue: minimum number of elements left in the queue after
        # a dequeue, which controls how well the elements are mixed
        _images, _labels = tf.train.shuffle_batch([images, labels],
                                                  num_threads=self.num_threads,
                                                  batch_size=self.batch_size,
                                                  capacity=self.batch_size * 2,
                                                  min_after_dequeue=self.batch_size)
        return _images, _labels
      3. Modify net_work.py
      Change the loop, _train and _val methods:
def loop(self, ):
    self.build()
    self.load_weight()
    # A separate session that only runs the TFRecord input pipeline
    sess = tf.Session()
    # train_txt_path / val_txt_path now point to the record files
    train_face_data = read_face_data(cfg.DATA.train_txt_path, cfg.TRAIN.batch_size, cfg.MODEL.out_channel,
                                     cfg.MODEL.win, cfg.MODEL.hin, 8)
    self.train_image_data, self.train_label_data = train_face_data.read_and_decode()
    val_face_data = read_face_data(cfg.DATA.val_txt_path, cfg.TRAIN.batch_size, cfg.MODEL.out_channel,
                                   cfg.MODEL.win, cfg.MODEL.hin, 8)
    self.val_image_data, self.val_label_data = val_face_data.read_and_decode()
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # Start the queue-runner threads that feed the input pipeline
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    with self._graph.as_default():
        # Create a saver.
        self.saver = tf.train.Saver(tf.global_variables(), max_to_keep=None)
        # Build the summary operation from the last tower summaries.
        self.summary_op = tf.summary.merge(self.summaries)
        self.summary_writer = tf.summary.FileWriter(cfg.MODEL.model_path, self._sess.graph)
        # epoch 2000
        min_loss_control = 1000.
        for epoch in range(cfg.TRAIN.epoch):
            self._train(epoch, sess)
            val_loss = self._val(epoch, sess)
            logger.info('**************'
                        'val_loss %f ' % (val_loss))
            # tmp_model_name=cfg.MODEL.model_path + \
            #                'epoch_' + str(epoch ) + \
            #                'L2_' + str(cfg.TRAIN.weight_decay_factor) + \
            #                '.ckpt'
            # logger.info('save model as %s \n'%tmp_model_name)
            # self.saver.save(self.sess, save_path=tmp_model_name)
            # "if 1" saves a checkpoint after every epoch; use
            # "if val_loss < min_loss_control" to keep only improving ones
            if 1:
                min_loss_control = val_loss
                low_loss_model_name = cfg.MODEL.model_path + \
                                      'epoch_' + str(epoch) + \
                                      'L2_' + str(cfg.TRAIN.weight_decay_factor) + '.ckpt'
                logger.info('A new low loss model saved as %s \n' % low_loss_model_name)
                self.saver.save(self._sess, save_path=low_loss_model_name)
    # Stop the input threads first, then close both sessions
    coord.request_stop()
    coord.join(threads)
    sess.close()
    self._sess.close()
def _train(self, _epoch, sess):
    # config.TRAIN.train_set_size // config.TRAIN.num_gpu // config.TRAIN.batch_size
    for step in range(cfg.TRAIN.iter_num_per_epoch):
        self.ite_num += 1
        start_time = time.time()
        # Shapes: images 64 x 160 x 160 x 3, labels 64 x 143
        example_images, example_labels = sess.run([self.train_image_data, self.train_label_data])
        # example_images, example_labels = next(self.train_ds)
        # example_images = train_iter_data['image']
        # example_labels = train_iter_data['label']
        ######## show_flag: check the data
        if cfg.TRAIN.vis:
            for i in range(cfg.TRAIN.batch_size):
                example_image = example_images[i, :, :, :] / 255.
                example_label = example_labels[i, :]
                # 136 = 68 points x 2; adjust if you labeled a different number of points
                Landmark = example_label[0:136]
                cla = example_label[136:]
                # print(np.max(example_image))
                # print(np.min(example_image))
                # print(Landmark)
                print(cla)
                Landmark = Landmark.reshape([-1, 2])
                _h, _w, _ = example_image.shape
                for _index in range(Landmark.shape[0]):
                    x_y = Landmark[_index]
                    # x is scaled by the width and y by the height
                    cv2.circle(example_image, center=(int(x_y[0] * _w), int(x_y[1] * _h)), color=(122, 122, 122),
                               radius=1, thickness=1)
                # cv2.putText(img_show, 'left_eye:open', (xmax, ymin),
                #             cv2.FONT_HERSHEY_SIMPLEX, 1,
                #             (255, 0, 255), 2)
                cv2.namedWindow('img', 0)
                cv2.imshow('img', example_image)
                cv2.waitKey(0)
        fetch_duration = time.time() - start_time
        feed_dict = {}
        # Each fetch returns a single batch, so this slicing assumes num_gpu == 1
        for n in range(cfg.TRAIN.num_gpu):
            feed_dict[self.inputs[0][n]] = example_images[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
                                           :, :, :]
            feed_dict[self.inputs[1][n]] = example_labels[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
                                           :]
        feed_dict[self.inputs[2]] = True
        _, total_loss_value, loss_value, leye_loss_value, reye_loss_value, mouth_loss_value, \
        leye_cla_accuracy_value, reye_cla_accuracy_value, mouth_cla_accuracy_value, l2_loss_value, learn_rate, = \
            self._sess.run([*self.outputs],
                           feed_dict=feed_dict)
        duration = time.time() - start_time
        run_duration = duration - fetch_duration
        if self.ite_num % cfg.TRAIN.log_interval == 0:
            num_examples_per_step = cfg.TRAIN.batch_size * cfg.TRAIN.num_gpu
            examples_per_sec = num_examples_per_step / duration
            sec_per_batch = duration / cfg.TRAIN.num_gpu
            format_str = ('epoch %d: iter %d, '
                          'total_loss=%.6f '
                          'loss=%.6f '
                          'leye_loss=%.6f '
                          'reye_loss=%.6f '
                          'mouth_loss=%.6f '
                          'leye_acc=%.6f '
                          'reye_acc=%.6f '
                          'mouth_acc=%.6f '
                          'l2_loss=%.6f '
                          'learn_rate =%e '
                          '(%.1f examples/sec; %.3f sec/batch) '
                          'fetch data time = %.6f '
                          'run time = %.6f')
            logger.info(format_str % (_epoch,
                                      self.ite_num,
                                      total_loss_value,
                                      loss_value,
                                      leye_loss_value,
                                      reye_loss_value,
                                      mouth_loss_value,
                                      leye_cla_accuracy_value,
                                      reye_cla_accuracy_value,
                                      mouth_cla_accuracy_value,
                                      l2_loss_value,
                                      learn_rate,
                                      examples_per_sec,
                                      sec_per_batch,
                                      fetch_duration,
                                      run_duration))
        if self.ite_num % 100 == 0:
            summary_str = self._sess.run(self.summary_op, feed_dict=feed_dict)
            self.summary_writer.add_summary(summary_str, self.ite_num)
def _val(self, _epoch, sess):
    all_total_loss = 0
    for step in range(cfg.TRAIN.val_iter):
        # example_images, example_labels = next(self.val_ds)
        # Fetch one batch of images and labels from the input session
        example_images, example_labels = sess.run([self.val_image_data, self.val_label_data])
        feed_dict = {}
        for n in range(cfg.TRAIN.num_gpu):
            feed_dict[self.inputs[0][n]] = example_images[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
                                           :, :, :]
            feed_dict[self.inputs[1][n]] = example_labels[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
                                           :]
        feed_dict[self.inputs[2]] = False
        total_loss_value, loss_value, leye_loss_value, reye_loss_value, mouth_loss_value, \
        leye_cla_accuracy_value, reye_cla_accuracy_value, mouth_cla_accuracy_value, l2_loss_value, learn_rate = \
            self._sess.run([*self.val_outputs],
                           feed_dict=feed_dict)
        # Report the validation loss without the L2 regularization term
        all_total_loss += total_loss_value - l2_loss_value
    return all_total_loss / cfg.TRAIN.val_iter
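      4. Start training again. It is now much faster: GPU utilization reaches 95%, and 10 iterations take about 5 s.
      5. My final loss is around 5.5, while the author's is around 3, so my training parameters probably still need tuning. Then convert the resulting model to a .pb file and check the results on a few images or videos.
** Dealing with oversized TFRecord files:
After data augmentation the TFRecord file can become very large. In one attempt I wrote a little over 1 million records and the file came to nearly 270G.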
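A rough size estimate shows why the file grows so quickly. The sketch assumes each record stores a 160 x 160 x 3 float32 image plus a 143-value float32 label (the shapes noted in the training code comments) and ignores the small per-record TFRecord overhead, so the result is only an order-of-magnitude check against the figure reported above.

```python
def raw_record_bytes(height=160, width=160, channels=3, label_len=143, bytes_per_value=4):
    """Payload size of one uncompressed record, in bytes."""
    return (height * width * channels + label_len) * bytes_per_value


per_record = raw_record_bytes()
total_gb = per_record * 1_000_000 / 1e9  # one million records, in GB (10^9 bytes)
print(per_record, "bytes per record,", round(total_gb, 1), "GB per million records")
```

Under these assumptions almost all of the space is image data stored as 4-byte floats.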
Solutions:
1. Compress the data
# Write compressed records
writer_options = tf.python_io.TFRecordOptions(
    tf.python_io.TFRecordCompressionType.ZLIB)
writer = tf.python_io.TFRecordWriter(record_path, options=writer_options)
# Read compressed records: pass the same options to the reader
tfrecord_options = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.ZLIB)
reader = tf.TFRecordReader(options=tfrecord_options)
2. Split the data across several record files. When reading, just pass the list of record file paths to tf.train.string_input_producer.
Reference: https://www.ppkanshu.com/index.php/post/2856.html
# Here self.tfrecord_path is a text file listing one record file path per line
with open(self.tfrecord_path, "r") as f:
    files_list = [line.rstrip() for line in f if line.strip()]
filename_queue = tf.train.string_input_producer(files_list, shuffle=False)
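One way to prepare the inputs for the reading code above is to decide up front which shard each sample goes to and to write the shard paths, one per line, into a list file. A sketch in plain Python follows. The helper names and the train-00000.record naming scheme are assumptions, and writing each shard would reuse generate_tfrecord from earlier.

```python
import os
import tempfile


def split_into_shards(items, num_shards):
    """Round-robin split, so shard sizes differ by at most one."""
    return [items[i::num_shards] for i in range(num_shards)]


def shard_paths(out_dir, prefix, num_shards):
    """Hypothetical naming scheme: <prefix>-00000.record, <prefix>-00001.record, ..."""
    return [os.path.join(out_dir, "%s-%05d.record" % (prefix, i)) for i in range(num_shards)]


def write_list_file(list_path, paths):
    """Write one record path per line, the format read back above."""
    with open(list_path, "w") as f:
        f.write("\n".join(paths) + "\n")


samples = ["img_%d.jpg" % i for i in range(10)]
shards = split_into_shards(samples, 3)
paths = shard_paths("record", "train", 3)
list_file = os.path.join(tempfile.mkdtemp(), "train_records.txt")
write_list_file(list_file, paths)
print([len(s) for s in shards], paths[0])
```

Each (shard, path) pair can then be passed to generate_tfrecord(shard, path), and the list file becomes the tfrecord_path given to the reader.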
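*** After compression, 10,000 records take a little over 700M.
III. Fatigue detection
Using the detected landmarks, the minimum eyelid distance is computed to decide whether the eyes are closed. PERCLOS is then the fraction of a unit of time (typically 1 minute or 30 seconds) during which the eyes are closed beyond a given degree (70% or 80%), and this value is used to decide whether the driver is drowsy.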
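The PERCLOS decision described above can be sketched as a sliding window over per-frame eye states. The class below is an illustration only: the window length in frames, the 0.4 fatigue threshold and the class name are assumptions, and the per-frame closed/open flag is expected to come from whatever eye-closure test is in use.

```python
from collections import deque


class PerclosMonitor:
    def __init__(self, window_frames, fatigue_threshold=0.4):
        # Keep only the most recent window_frames eye states (1 = closed, 0 = open)
        self.frames = deque(maxlen=window_frames)
        self.fatigue_threshold = fatigue_threshold  # assumed value, tune on real data

    def update(self, eye_closed):
        """Record one frame and return the current PERCLOS value."""
        self.frames.append(1 if eye_closed else 0)
        return self.perclos()

    def perclos(self):
        """Fraction of frames in the window with the eyes closed."""
        if not self.frames:
            return 0.0
        return sum(self.frames) / len(self.frames)

    def is_fatigued(self):
        # Only decide once a full window has been observed
        window_full = len(self.frames) == self.frames.maxlen
        return window_full and self.perclos() >= self.fatigue_threshold


monitor = PerclosMonitor(window_frames=10)
for closed in [False] * 5 + [True] * 5:
    monitor.update(closed)
print(monitor.perclos(), monitor.is_fatigued())
```

At the roughly 40 ms per frame measured earlier (about 25 frames per second), a 30-second window would correspond to about 750 frames.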
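IV. Source code
Adapted training code (TF1): https://download.csdn.net/download/haiyangyunbao813/12363123
*** If the TFRecord file is too large, compress it or split the dataset.
Inference (see the author's source code): https://github.com/610265158/Peppa_Pig_Face_Engine
Annotation tool (dlib): https://download.csdn.net/download/haiyangyunbao813/12365194
*** Adds undo, delete-current-image and similar features to make annotation easier.
V. Closing remarks
I am still a beginner and there is a lot I do not understand well; I mostly just like to tinker. If you have suggestions or ideas, please leave a comment below, and do point out anything I got wrong.
[Figures: a few detection results]
** Closed-eye detection works very well.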