Fatigue Detection Based on Facial Landmarks

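To monitor whether a driver is fatigued and help prevent traffic accidents, this post presents a real-time fatigue detection method based on facial landmarks. The driver's face is monitored in real time while driving. First, the face is detected and the ERT (ensemble of regression trees) algorithm locates the facial landmarks. Next, the eye aspect ratio (EAR) is computed from the coordinates of the eye-region landmarks to describe how open the eye is; comparing it with a suitable EAR threshold tells whether the eye is open or closed. Finally, the measured EAR values and the EAR threshold are used to compute the percentage of eyelid closure over time (PERCLOS) for the monitoring video, which measures the driver's subjective fatigue level. Comparing PERCLOS with a preset fatigue threshold decides whether the driver is fatigued.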
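To make the EAR and PERCLOS steps concrete, here is a minimal sketch. It assumes the common six-point eye layout (p1 and p4 are the eye corners, p2 and p3 lie on the upper lid, p6 and p5 on the lower lid) and an illustrative EAR threshold of 0.2. The function names, the point ordering, the threshold and the sample coordinates are all assumptions for illustration, not values taken from this project.

```python
import math


def eye_aspect_ratio(eye):
    """EAR for six (x, y) eye landmarks ordered p1..p6.

    Assumed layout: p1/p4 are the horizontal eye corners,
    p2/p3 lie on the upper lid and p6/p5 on the lower lid.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def perclos(ear_values, ear_threshold=0.2):
    """Fraction of frames whose EAR is below the (assumed) closed-eye threshold."""
    if not ear_values:
        return 0.0
    closed = sum(1 for ear in ear_values if ear < ear_threshold)
    return closed / len(ear_values)


# Hypothetical landmark coordinates for an open and a nearly closed eye.
open_eye = [(0, 0), (2, 2), (4, 2), (6, 0), (4, -2), (2, -2)]
closed_eye = [(0, 0), (2, 0.25), (4, 0.25), (6, 0), (4, -0.25), (2, -0.25)]

ear_per_frame = [eye_aspect_ratio(open_eye), eye_aspect_ratio(closed_eye)]
print(ear_per_frame, perclos(ear_per_frame))
```

In a real pipeline the EAR list would cover a sliding window of video frames, and the driver would be flagged as fatigued when PERCLOS over that window exceeds the chosen fatigue threshold.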

I. Facial landmark detection

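Facial landmark detection is built on this library: https://github.com/610265158/Peppa_Pig_Face_Engine

I tried many open-source frameworks, including dlib, openface, pfld and clnf, and none of them did well at closed-eye detection. Then I found this library and was genuinely impressed: detection is accurate and very stable. On an 8th-generation i7 CPU, one frame takes about 40 ms on average (the project machine has no GPU, so I did not measure GPU speed, but it should be fast). For details see the library author's article, which is really well written; if you disagree, come find me: https://blog.csdn.net/qq_35606924/article/details/99711208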

Closed-eye detection results on my dataset:

II. Training on your own dataset (based on TF1)

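Open-source datasets contain too few closed-eye samples, so we have to add our own by hand. I used dlib's annotation tool for this.

1. Annotate the dataset. Since the goal is mainly to detect the eyes and the head pose, I annotated 37 points per face (a little over 5,000 images in total).

** Annotation tool: (I made mistakes often while annotating, so I added features such as undo.)

2. When annotation is finished, an XML file containing all the annotations is generated (it is worth checking that no image has too few or too many points). Then do the following:

1. Shuffle the order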

imglab --shuffle dataset_0402.xml

2. Split the dataset (training set and test set)

imglab --split-train-test 0.95 dataset_0402.xml

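3. You can also flip the dataset, remove near-duplicate samples, and so on

imglab --rmdupes  xml/mydataset.xml ## remove near-duplicate samples
imglab --flip   xml/mydataset.xml  ## flip the images

4. For more operations, see: https://blog.csdn.net/u010168781/article/details/91048497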
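The check recommended in step 2 (no image with too few or too many points) can be automated. The sketch below assumes the usual imglab layout, where each image element holds box elements that in turn hold part elements; the function name and the constant are hypothetical helpers, not part of imglab or of the library.

```python
import xml.etree.ElementTree as ET

EXPECTED_PARTS = 37  # number of landmarks annotated per face in this post


def find_bad_images(root, expected_parts=EXPECTED_PARTS):
    """Return (file, part_count) pairs for boxes with the wrong number of parts.

    Images without any box are reported with a count of 0.
    """
    bad = []
    for image in root.iter("image"):
        boxes = image.findall("box")
        if not boxes:
            bad.append((image.get("file"), 0))
            continue
        for box in boxes:
            count = len(box.findall("part"))
            if count != expected_parts:
                bad.append((image.get("file"), count))
    return bad


# Usage on a real file (same file name as in the commands above):
# root = ET.parse("dataset_0402.xml").getroot()
# for file_name, count in find_bad_images(root):
#     print(file_name, count)
```

Running this before shuffling and splitting means a bad annotation is caught once, instead of surfacing later as a shape error during training.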

3. Convert the training and test sets to the format provided by the library author

import json
from xml.dom.minidom import parse
from tqdm import tqdm


def json_to_txt(json_file, txt_file):
    """Convert the JSON annotations to 'path|x1 y1 x2 y2 ...' lines."""
    with open(json_file, 'r') as f:
        data = json.load(f)

    lines = []
    for sub_data in data:
        # One line per image: the path, then the space-separated keypoint coordinates.
        coords = ''.join(str(p[0]) + ' ' + str(p[1]) + ' ' for p in sub_data['keypoints'])
        lines.append(sub_data['image_path'] + '|' + coords + '\n')

    # Using a with-block makes sure the output file is flushed and closed.
    with open(txt_file, mode='w') as out:
        out.writelines(lines)


def read_xml_to_json(path, out_file_path):
    domTree = parse(path)
    # Document root element
    rootNode = domTree.documentElement
    images = rootNode.getElementsByTagName("image")
    with open(out_file_path, 'w') as f:
        train_json_list = []
        for image in tqdm(images):
            one_image_ann = {}
            if image.hasAttribute("file"):
                info = ""
                # File path
                file_path = image.getAttribute("file")
                print("path:" + file_path)

                one_image_ann['image_path'] = file_path

                box = image.getElementsByTagName("box")

                top = box[0].getAttribute("top")
                left = box[0].getAttribute("left")
                width = box[0].getAttribute("width")
                height = box[0].getAttribute("height")

                print("top:" + top + " left:" + left + " width:" + width + " height:" + height)
                bbox = [float(top), float(left), float(width), float(height)]

                parts = box[0].getElementsByTagName("part")

                if len(parts) == 0:
                    continue
                key = []
                for part in parts:
                    key.append([float(part.getAttribute("x")), float(part.getAttribute("y"))])
                    print("x:" + part.getAttribute("x") + " y:" + part.getAttribute("y"))

                one_image_ann['keypoints'] = key
                one_image_ann['bbox'] = bbox
                one_image_ann['attr'] = None

                train_json_list.append(one_image_ann)

        json.dump(train_json_list, f, indent=2)


def read_xml_to_txt(path, out_txt_file_path):
    """Write one 'path|x1 y1 x2 y2 ...' line per annotated image."""
    domTree = parse(path)
    # Document root element
    rootNode = domTree.documentElement
    images = rootNode.getElementsByTagName("image")
    with open(out_txt_file_path, 'w') as f:
        for image in tqdm(images):
            if not image.hasAttribute("file"):
                continue
            # File path
            file_path = image.getAttribute("file")

            box = image.getElementsByTagName("box")
            if len(box) == 0:
                continue
            parts = box[0].getElementsByTagName("part")

            # Skip unannotated images before writing anything, so that a
            # dangling "path|" is never merged into the next image's line.
            if len(parts) == 0:
                continue

            coords = ''
            for part in parts:
                coords += str(float(part.getAttribute("x"))) + ' ' + str(
                    float(part.getAttribute("y"))) + ' '

            f.write(file_path + '|' + coords + '\n')


if __name__ == '__main__':
    data_path = ["data/test.xml", "data/train.xml"]
    out_path = ["data/test.txt", "data/train.txt"]
    for path, out in zip(data_path, out_path):
        read_xml_to_txt(path, out)
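To verify the generated files, it helps to read a line back. The sketch below parses the 'path|x1 y1 x2 y2 ...' layout that the conversion script above writes; the function name is hypothetical, and the library's own loader may parse the file differently.

```python
def parse_annotation_line(line):
    """Split one 'path|x1 y1 x2 y2 ...' line into (path, [(x, y), ...])."""
    path, _, coords = line.rstrip("\n").partition("|")
    values = [float(v) for v in coords.split()]
    if len(values) % 2 != 0:
        raise ValueError("odd number of coordinates for %s" % path)
    # Pair up consecutive values as (x, y) points.
    points = list(zip(values[0::2], values[1::2]))
    return path, points


# Hypothetical line with two landmarks.
path, points = parse_annotation_line("img/a.jpg|1.0 2.0 3.5 4.5 \n")
print(path, points)
```

For this dataset every line should yield 37 points, so checking len(points) over data/train.txt and data/test.txt is a quick way to confirm the conversion.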

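4. Configure the training parameters

1. Adjust the landmark indices. The author's dataset is based on 68 points, so the indices are bound to differ. If your points are annotated in the same order as the author's dataset, no change is needed.

2. Set the training and test dataset paths and the other parameters; see train_config.py.

5. Change how the data is read

1. If you start training right away, you get this error: Can't pickle local object 'DataFromGenerator.init..'. The author trained on Linux and I trained on Windows, and Windows apparently cannot run this multi-process prefetching as written. A likely reason is that Windows has no fork, so multiprocessing has to pickle the objects it hands to worker processes, and locally defined objects cannot be pickled. If you comment out the line ds = MultiProcessPrefetchData(ds, self.prefetch_size, self.process_num) and train again, training becomes very slow: 10 iterations take about 1 min and GPU utilization is around 2%. My guess is that without prefetching, most of the time is spent reading data.

2. Read the data through tfrecord instead, which speeds up training considerably. On an MX150 GPU, 10 iterations take about 5 s, at least a 10x speedup. The one painful part is that the generated record file is very large (see the end of the post for how to deal with that). The implementation follows:

1. Generate the tfrecord file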

# When I wrote this code, only God and I knew what it did
# Now, only God knows
# @File : generate_tfrecord.py
# @Time : 2020/4/24 14:10
# @Author : J.
# @desc : generate the tfrecord files

import tensorflow as tf
from lib.dataset.dataietr import FaceKeypointDataIter
from train_config import config as cfg
from tqdm import tqdm
import argparse
import sys


def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def create_tf_example(image_file, is_train):
    crop_image, label = _train_data._map_func(image_file, is_train)
    tf_example = tf.train.Example(
        features=tf.train.Features(
            feature={
                'image': bytes_feature(crop_image.tobytes()),
                'label': bytes_feature(label.tobytes())
            }
        ))
    return tf_example


def generate_tfrecord(images_files, record_path, is_train=True):
    num_tf_example = 0
    writer = tf.python_io.TFRecordWriter(record_path)
    with tqdm(images_files, ncols=100) as files:
        for image in files:
            tf_example = create_tf_example(image, is_train)
            writer.write(tf_example.SerializeToString())
            num_tf_example += 1
            # if num_tf_example % 100 == 0:
            #     print("Create %d TF_Example" % num_tf_example)
        writer.close()
        print("{} tf_examples has been created successfully, which are saved in {}".format(num_tf_example, record_path))


def main(_):
    global _train_data
    global _val_data
    _train_data = FaceKeypointDataIter(cfg.TRAIN.batch_size, cfg.TRAIN.epoch, cfg.DATA.root_path,
                                       FLAGS.train_data,
                                       True)

    _val_data = FaceKeypointDataIter(cfg.TRAIN.batch_size, cfg.TRAIN.epoch, cfg.DATA.root_path,
                                     FLAGS.val_data,
                                     False)

    print("==================  generate train tf_record start  ===================")
    train_images_files = _train_data.get_parse_file()
    generate_tfrecord(train_images_files, FLAGS.train_save_path, True)
    print("==================  generate train tf_record end  ===================")

    print("==================  generate val tf_record start  ===================")
    val_images_files = _val_data.get_parse_file()
    generate_tfrecord(val_images_files, FLAGS.val_save_path, False)
    print("==================  generate val tf_record end  ===================")


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--train_data',
        type=str,
        default='data/train.txt',
        help='Training data list.')

    parser.add_argument(
        '--val_data',
        type=str,
        default='data/test.txt',
        help='Validation data list.')

    parser.add_argument(
        '--train_save_path',
        type=str,
        default='record/train.record',
        help='Output path for the training record file.')

    parser.add_argument(
        '--val_save_path',
        type=str,
        default='record/val.record',
        help='Output path for the validation record file.')

    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

2. Read the record data

import tensorflow as tf


class read_face_data():

    def __init__(self, tfrecord_path, batch_size, out_channel, win, hin, num_threads):
        self.tfrecord_path = tfrecord_path
        self.batch_size = batch_size
        self.win = win
        self.hin = hin
        self.num_threads = num_threads
        self.out_channel = out_channel

    def read_and_decode(self):
        filename_queue = tf.train.string_input_producer([self.tfrecord_path], shuffle=False)
        reader = tf.TFRecordReader()
        _, serialized_example = reader.read(filename_queue)
        features = tf.parse_single_example(serialized_example,
                                           features={
                                               'image': tf.FixedLenFeature([], tf.string),
                                               'label': tf.FixedLenFeature([], tf.string)
                                           })

        images = tf.decode_raw(features['image'], tf.float32)
        images = tf.reshape(images, [self.win, self.hin, 3])

        labels = tf.decode_raw(features['label'], tf.float32)
        labels = tf.reshape(labels, [self.out_channel])

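        # capacity: maximum number of elements in the queue
        # min_after_dequeue: minimum number of elements left in the queue after a
        # dequeue, which controls how well the elements are mixed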
        _images, _labels = tf.train.shuffle_batch([images, labels],
                                                  num_threads=self.num_threads,
                                                  batch_size=self.batch_size,
                                                  capacity=self.batch_size * 2,
                                                  min_after_dequeue=self.batch_size)

        return _images, _labels
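The reader above uses capacity = 2 * batch_size and min_after_dequeue = batch_size, so samples are only mixed within a window of about two batches. As a hedged sketch, the helper below applies the rule of thumb from the TensorFlow 1.x input pipeline guide, capacity = min_after_dequeue + (num_threads + a small safety margin) * batch_size. The helper name, the default margin of 3 and the default buffer of 10 batches are assumptions for illustration, not settings from this project.

```python
def shuffle_queue_sizes(batch_size, num_threads, min_after_dequeue=None, safety_margin=3):
    """Return (capacity, min_after_dequeue) for tf.train.shuffle_batch.

    A larger min_after_dequeue mixes samples better but costs more memory
    and a slower start-up; the default of 10 batches is an assumed value.
    """
    if min_after_dequeue is None:
        min_after_dequeue = 10 * batch_size
    capacity = min_after_dequeue + (num_threads + safety_margin) * batch_size
    return capacity, min_after_dequeue


capacity, min_after = shuffle_queue_sizes(batch_size=64, num_threads=8)
print(capacity, min_after)
```

Every queued element is a full float32 image, so a larger queue trades memory for better shuffling. Since the XML was already shuffled with imglab, the small queue used in this post may well be enough.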

3. Modify net_work.py

Modify the loop, _train and _val methods

    def loop(self, ):

        self.build()
        self.load_weight()

        sess = tf.Session()

        train_face_data = read_face_data(cfg.DATA.train_txt_path, cfg.TRAIN.batch_size, cfg.MODEL.out_channel,
                                         cfg.MODEL.win, cfg.MODEL.hin, 8)
        self.train_image_data, self.train_label_data = train_face_data.read_and_decode()

        val_face_data = read_face_data(cfg.DATA.val_txt_path, cfg.TRAIN.batch_size, cfg.MODEL.out_channel,
                                       cfg.MODEL.win, cfg.MODEL.hin, 8)
        self.val_image_data, self.val_label_data = val_face_data.read_and_decode()
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        ## Start the queue-runner threads that feed the input data
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        with self._graph.as_default():
            # Create a saver.
            self.saver = tf.train.Saver(tf.global_variables(), max_to_keep=None)

            # Build the summary operation from the last tower summaries.
            self.summary_op = tf.summary.merge(self.summaries)

            self.summary_writer = tf.summary.FileWriter(cfg.MODEL.model_path, self._sess.graph)

        # epoch 2000
        min_loss_control = 1000.
        for epoch in range(cfg.TRAIN.epoch):
            self._train(epoch, sess)
            val_loss = self._val(epoch, sess)
            logger.info('**************'
                        'val_loss %f ' % (val_loss))

            # tmp_model_name=cfg.MODEL.model_path + \
            #               'epoch_' + str(epoch ) + \
            #               'L2_' + str(cfg.TRAIN.weight_decay_factor) + \
            #               '.ckpt'
            # logger.info('save model as %s \n'%tmp_model_name)
            # self.saver.save(self.sess, save_path=tmp_model_name)

            if 1:
                min_loss_control = val_loss
                low_loss_model_name = cfg.MODEL.model_path + \
                                      'epoch_' + str(epoch) + \
                                      'L2_' + str(cfg.TRAIN.weight_decay_factor) + '.ckpt'
                logger.info('A new low loss model  saved as %s \n' % low_loss_model_name)
                self.saver.save(self._sess, save_path=low_loss_model_name)

        self._sess.close()
        # Stop and join the queue-runner threads before closing the session they use.
        coord.request_stop()
        coord.join(threads)
        sess.close()

    def _train(self, _epoch, sess):
        #   config.TRAIN.train_set_size // config.TRAIN.num_gpu // config.TRAIN.batch_size
        for step in range(cfg.TRAIN.iter_num_per_epoch):
            self.ite_num += 1
            start_time = time.time()
            # 64 * 160 * 160 *3   64*143
            example_images, example_labels = sess.run([self.train_image_data, self.train_label_data])
            # example_images, example_labels = next(self.train_ds)
            # example_images = train_iter_data['image']
            # example_labels = train_iter_data['label']

            ########show_flag check the data
            if cfg.TRAIN.vis:
                for i in range(cfg.TRAIN.batch_size):
                    example_image = example_images[i, :, :, :] / 255.
                    example_label = example_labels[i, :]

                    Landmark = example_label[0:136]
                    cla = example_label[136:]

                    # print(np.max(example_image))
                    # print(np.min(example_image))
                    # print(Landmark)
                    print(cla)
                    Landmark = Landmark.reshape([-1, 2])
                    _h, _w, _ = example_image.shape
                    for _index in range(Landmark.shape[0]):
                        x_y = Landmark[_index]
                        # Landmarks are normalized: scale x by the width and y by the height.
                        cv2.circle(example_image, center=(int(x_y[0] * _w), int(x_y[1] * _h)), color=(122, 122, 122),
                                   radius=1, thickness=1)

                    # cv2.putText(img_show, 'left_eye:open', (xmax, ymin),
                    #             cv2.FONT_HERSHEY_SIMPLEX, 1,
                    #             (255, 0, 255), 2)
                    cv2.namedWindow('img', 0)
                    cv2.imshow('img', example_image)
                    cv2.waitKey(0)

            fetch_duration = time.time() - start_time

            feed_dict = {}
            for n in range(cfg.TRAIN.num_gpu):
                feed_dict[self.inputs[0][n]] = example_images[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
                                               :, :, :]
                feed_dict[self.inputs[1][n]] = example_labels[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
                                               :]

            feed_dict[self.inputs[2]] = True
            _, total_loss_value, loss_value, leye_loss_value, reye_loss_value, mouth_loss_value, \
            leye_cla_accuracy_value, reye_cla_accuracy_value, mouth_cla_accuracy_value, l2_loss_value, learn_rate, = \
                self._sess.run([*self.outputs],
                               feed_dict=feed_dict)

            duration = time.time() - start_time
            run_duration = duration - fetch_duration
            if self.ite_num % cfg.TRAIN.log_interval == 0:
                num_examples_per_step = cfg.TRAIN.batch_size * cfg.TRAIN.num_gpu
                examples_per_sec = num_examples_per_step / duration
                sec_per_batch = duration / cfg.TRAIN.num_gpu

                format_str = ('epoch %d: iter %d, '
                              'total_loss=%.6f '
                              'loss=%.6f '
                              'leye_loss=%.6f '
                              'reye_loss=%.6f '
                              'mouth_loss=%.6f '
                              'leye_acc=%.6f '
                              'reye_acc=%.6f '
                              'mouth_acc=%.6f '
                              'l2_loss=%.6f '
                              'learn_rate =%e '
                              '(%.1f examples/sec; %.3f sec/batch) '
                              'fetch data time = %.6f '
                              'run time = %.6f')
                logger.info(format_str % (_epoch,
                                          self.ite_num,
                                          total_loss_value,
                                          loss_value,
                                          leye_loss_value,
                                          reye_loss_value,
                                          mouth_loss_value,
                                          leye_cla_accuracy_value,
                                          reye_cla_accuracy_value,
                                          mouth_cla_accuracy_value,
                                          l2_loss_value,
                                          learn_rate,
                                          examples_per_sec,
                                          sec_per_batch,
                                          fetch_duration,
                                          run_duration))

            if self.ite_num % 100 == 0:
                summary_str = self._sess.run(self.summary_op, feed_dict=feed_dict)
                self.summary_writer.add_summary(summary_str, self.ite_num)

    def _val(self, _epoch, sess):
        all_total_loss = 0
        for step in range(cfg.TRAIN.val_iter):

            # example_images, example_labels = next(self.val_ds)  # fetch image and label in the session
            example_images, example_labels = sess.run([self.val_image_data, self.val_label_data])

            feed_dict = {}
            for n in range(cfg.TRAIN.num_gpu):
                feed_dict[self.inputs[0][n]] = example_images[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
                                               :, :, :]
                feed_dict[self.inputs[1][n]] = example_labels[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
                                               :]
            feed_dict[self.inputs[2]] = False
            total_loss_value, loss_value, leye_loss_value, reye_loss_value, mouth_loss_value, \
            leye_cla_accuracy_value, reye_cla_accuracy_value, mouth_cla_accuracy_value, l2_loss_value, learn_rate = \
                self._sess.run([*self.val_outputs],
                               feed_dict=feed_dict)

            all_total_loss += total_loss_value - l2_loss_value

        return all_total_loss / cfg.TRAIN.val_iter

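4. Start training again and it flies: GPU utilization reaches 95% and 10 iterations take about 5 s.

5. My final training loss is around 5.5, while the author's is around 3, so my training parameters probably still need tuning. Then convert the trained model to a pb file and check the detection results on a few images or videos.

** Dealing with tfrecord files that are too large:

After data augmentation the tfrecord file can become very large. I once wrote a little over 1 million records and the tfrecord file came close to 270 GB.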
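A quick back-of-the-envelope calculation shows why the file gets this large. The sketch assumes each record holds a 160 x 160 x 3 image and a 143-value label, both stored as raw float32, as suggested by the shape comment in _train and the tobytes() calls in the generator script. The helper name is hypothetical and the TFRecord framing overhead is ignored.

```python
def record_payload_bytes(height, width, channels, label_len, bytes_per_value=4):
    """Raw payload of one example when image and label are stored as float32."""
    return (height * width * channels + label_len) * bytes_per_value


per_record = record_payload_bytes(160, 160, 3, 143)
total_gib = per_record * 1_000_000 / 1024 ** 3
print(per_record, round(total_gib, 1))
```

Under these assumptions one record is roughly 300 KB, so a million records land in the same range as the size observed above. Almost all of that is the float32 image, which is why compression and sharding both help.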

Solutions:

1. Compress the data

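# Compress the data when writing
writer_options = tf.python_io.TFRecordOptions(
    tf.python_io.TFRecordCompressionType.ZLIB)
writer = tf.python_io.TFRecordWriter(record_path, options=writer_options)


# Decompress the data when reading: pass the same options to the reader
tfrecord_options = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.ZLIB)
reader = tf.TFRecordReader(options=tfrecord_options)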

2. Split the data across several record files. When reading, just pass the list of record file paths to tf.train.string_input_producer.

Reference: https://www.ppkanshu.com/index.php/post/2856.html

# self.tfrecord_path now points to a text file with one record file path per line
with open(self.tfrecord_path, "r") as f:
    lines = f.readlines()
    files_list = []
    for line in lines:
        files_list.append(line.rstrip())

filename_queue = tf.train.string_input_producer(files_list, shuffle=False)
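On the writing side, the split could look roughly like the sketch below: the examples are distributed round-robin over a fixed number of shards and each shard gets its own file name. The helper names and the "-00000-of-00004" naming scheme are assumptions for illustration; only the idea of several record files comes from this post.

```python
def shard_paths(prefix, num_shards):
    """File names such as 'record/train-00000-of-00004.record' (assumed scheme)."""
    return ["%s-%05d-of-%05d.record" % (prefix, i, num_shards) for i in range(num_shards)]


def assign_to_shards(items, num_shards):
    """Round-robin split of items into num_shards lists."""
    shards = [[] for _ in range(num_shards)]
    for index, item in enumerate(items):
        shards[index % num_shards].append(item)
    return shards


# Hypothetical example: five annotation lines spread over two shards.
paths = shard_paths("record/train", 2)
shards = assign_to_shards(["a", "b", "c", "d", "e"], 2)
for path, shard in zip(paths, shards):
    print(path, shard)
```

Each shard would then be written with its own TFRecordWriter, and writing the shard paths one per line into a text file produces the list that the reading code expects.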

*** After compression, 10,000 records take a little over 700 MB.