dlib Face Recognition: Installation and Usage Tutorial

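1 Installing and Building dlib Locally

1.1 Downloading the dlib source

Download: https://github.com/davisking/dlib (latest release at the time of writing: dlib 19.15)

To keep versions apart, the download directory is named dlib-19-15, as shown in the figure above.

1.2 Building the dlib C++ example programs

1.2.1 Building the dlib library

The build requires Visual Studio. The version installed here is Visual Studio 2017 (version 15), which CMake calls "Visual Studio 15 2017".

Go to the dlib-master/dlib-19-15 directory and run: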

# mkdir build
# cd build
# cmake .. 
# cmake --build .

To specify the generator (64-bit target) and the host toolset explicitly:

# cmake .. -G "Visual Studio 15 2017 Win64" -T host=x64

If the generated .lib dependency appears in the directory shown in the figure above, the dlib library was built successfully.

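1.2.2 Configuring and running a C++ example

examples/train_shape_predictor_ex.cpp is used as the example here; the other example programs are set up the same way.
1. Create ConsoleApplication1.cpp and add source.cpp
Open VS and create a new C++ console project. Copy the code of train_shape_predictor_ex.cpp into ConsoleApplication1.cpp, then add source.cpp to the project with "Add Existing Item". source.cpp is located in dlib-master/dlib-19-15/dlib/all.

2. Change the stdafx (precompiled header) setting
Under Project > Properties, make the following change to avoid errors caused by the precompiled header.

3. Add the include directory

4. Add the path of the generated .lib dependency

5. Configure image-format support
Add the preprocessor definitions DLIB_JPG_SUPPORT, DLIB_JPEG_SUPPORT, and DLIB_JPEG_STATIC.

Once the project is configured, choose Build > Build Solution; ConsoleApplication1.exe is generated in the project directory. Run ConsoleApplication1.exe from the command line, or choose Debug > Start Without Debugging in VS. Programs that take input need it supplied as command-line arguments.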

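1.3 Building the dlib Python API

Method 1:
Go to the dlib source directory and run:

# python setup.py install

After that, the Python examples in python_examples can be run.

Method 2:

# pip3 install dlib

At the time of writing, this method could not install dlib 19.15 on the local machine. It only installed an older dlib release, so some function calls in the Python examples may not work.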
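
Because the two installation methods can leave different dlib versions on a machine, it may help to check which version is actually installed before running the examples. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is an illustrative assumption and is not part of dlib.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("dlib")
    if version is None:
        print("dlib is not installed in this environment")
    else:
        print("dlib version:", version)
```

If the reported version is older than the source tree you downloaded, newer functions used by the examples may be missing.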

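2 Main Features of dlib and Accuracy Evaluation

The main features of dlib covered here are face detection, facial landmark detection, and face recognition. This part looks at the python_examples sample code; the C++ examples are similar. The evaluation code is mostly adapted from the sample code listed in section 2.1.

2.1 Overview of the code

The main code is in the python_examples directory of dlib. The model files it needs can be downloaded from http://dlib.net/files:

face_detector.py
Frontal face detector; mainly uses dlib.get_frontal_face_detector().
cnn_face_detector.py
CNN face detector; mainly uses dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat'). According to the official documentation it is more accurate than dlib.get_frontal_face_detector().
face_landmark.py
Facial landmark detection; mainly uses dlib.get_frontal_face_detector() and dlib.shape_predictor('shape_predictor_68_face_landmarks.dat').
face_recognition.py
Face recognition; mainly uses dlib.get_frontal_face_detector(), dlib.shape_predictor('shape_predictor_5_face_landmarks.dat'), and dlib.face_recognition_model_v1('dlib_face_recognition_resnet_model_v1.dat').
opencv_webcam_face_detection.py
Face detection on video; mainly uses dlib.get_frontal_face_detector() and cv2.VideoCapture().
train_object_detector.py
Training of a frontal face detector; produces the file detector.svm.
train_shape_predictor.py
Training of a facial landmark predictor; produces the file predictor.dat.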

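2.2 Face detection and facial landmarks

2.2.1 Dataset and code preparation

Reference code: python_examples/face_landmark_detection.py. To collect detection statistics it was rewritten and named face_landmark.py. At present the statistics only cover images that contain a single face (an overall accuracy is hard to define when an image contains several faces).

Required model file: shape_predictor_68_face_landmarks.dat, a pretrained facial landmark predictor.

Test images: the LFW dataset.

The code of face_landmark.py is as follows: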

import os
import dlib
from skimage import io

# Folder of face images to test
faces_folder_path = "lfwdata"

# Step 1: load the face detector and the facial landmark predictor
# Face detector
detector = dlib.get_frontal_face_detector()
# Facial landmark predictor
predictor = dlib.shape_predictor("../shape_predictor_68_face_landmarks.dat")

# Step 2: walk through the images, run both models, and display the results
# Display window
win = dlib.image_window()
# tol counts the images processed; ans counts the images with at least one face
tol = ans = 0
# Walk the folder tree and pick up the .jpg and .png images
for (path, dirnames, filenames) in os.walk(faces_folder_path):
    for filename in filenames:
        if filename.endswith('.jpg') or filename.endswith('.png'):
            tol += 1
            img_path = os.path.join(path, filename)
            print("Processing file: {}".format(img_path))
            # Read the image
            img = io.imread(img_path)

            win.clear_overlay()
            win.set_image(img)

            # Run the face detector (upsampling the image once)
            dets = detector(img, 1)
            # Detection counts as successful when at least one face is found
            face_num = len(dets)
            print("Number of faces detected: {}".format(len(dets)))
            if face_num > 0:
                ans += 1
            else:
                print("fail")
            for k, d in enumerate(dets):
                print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
                    k, d.left(), d.top(), d.right(), d.bottom()))
                # Run the facial landmark predictor on this face box
                shape = predictor(img, d)
                print("Part 0: {}, Part 1: {} ...".format(shape.part(0), shape.part(1)))

                win.add_overlay(shape)

            win.add_overlay(dets)
            # Uncomment to step through the images with the Enter key
            # dlib.hit_enter_to_continue()

# Step 3: compute and print the detection rate
print("correct:{},total:{}".format(ans, tol))
# Guard against an empty folder to avoid dividing by zero
if tol > 0:
    print("detection rate:{}".format(ans / tol))

2.2.2 Test result images

2.2.3 Accuracy

Of the 13234 images tested, a face was detected in 13172, a detection rate of 13172/13234 = 99.53%.
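
The figure above is simply the number of images with at least one detection divided by the number of images processed. A small sketch of that arithmetic follows; the function name detection_rate is an illustrative assumption, and the two counts are the ones reported in this section.

```python
def detection_rate(detected, total):
    """Fraction of images in which at least one face was found."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


if __name__ == "__main__":
    # Counts reported for the LFW run in this section
    rate = detection_rate(13172, 13234)
    print("{:.2%}".format(rate))
```

Note that this measures how often a face is found at all, not whether the box or the landmarks are placed correctly.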

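Most of the images that failed show partial faces, profile faces, over-exposed faces, or occluded faces. This is largely because the code uses the frontal face detector dlib.get_frontal_face_detector(). Some of the failed images are shown below: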

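2.3 Face recognition

2.3.1 Dataset and code preparation

Reference code: face_recognition.py. To collect accuracy statistics it was rewritten and named face_recog.py.

Required model files: shape_predictor_68_face_landmarks.dat, a pretrained facial landmark predictor, and dlib_face_recognition_resnet_model_v1.dat, a pretrained ResNet face recognition model.

Dataset: from LFW, 398 frontal faces were selected as candidate faces (one image per person) and 525 frontal faces as test faces (a person may have several images).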
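
LFW file names have the form Name_Surname_0001.jpg, so the person's label can be recovered from the file name. The sketch below shows one portable way to do that with the standard library; the helper name label_from_lfw_path is an illustrative assumption, and it accepts both / and \ as path separators.

```python
import os
import re


def label_from_lfw_path(path):
    """Strip the directory, the extension, and the trailing _NNNN index."""
    # Normalize Windows separators so the same code works on any platform
    normalized = path.replace("\\", "/")
    stem = os.path.splitext(os.path.basename(normalized))[0]
    # LFW appends a four-digit image index to the person's name
    return re.sub(r"_\d{4}$", "", stem)


if __name__ == "__main__":
    print(label_from_lfw_path("recog_train/Aaron_Eckhart_0001.jpg"))
```

Matching only a four-digit suffix at the end of the name avoids cutting names that happen to contain "_0" elsewhere.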

The code of face_recog.py is as follows:

import os
import dlib
import glob
import numpy
from skimage import io

# Folder of training (candidate) faces
faces_folder_path = "recog_train"
# Folder of faces to recognize
img_folder_path = "recog_test"


# Step 2: build the training labels and descriptors used for recognition
# For each face image in the folder:
# 1. detect the face
# 2. detect the landmarks
# 3. extract the descriptor
# Returns the list of training labels and the list of descriptors
def train(faces_folder_path):
    trainlabel = []
    train_descriptors = []
    for file in glob.glob(os.path.join(faces_folder_path, "*.jpg")):
        # Label = file name up to the "_0NNN" index; basename works on any OS
        labelName = os.path.basename(file).split('_0')[0]
        # print("Processing file: {}".format(labelName))

        face = io.imread(file)
        # 1. Face detection
        dets = detector(face, 1)
        # print("Number of faces detected: {}".format(len(dets)))
        if len(dets) == 0:
            # Skip images without a detection so labels and descriptors stay aligned
            continue

        # Each candidate image shows one person, so only the first detection is used
        # 2. Landmark detection
        shape = predictor(face, dets[0])

        # 3. Descriptor extraction (128-D vector), converted to a numpy array
        face_descriptor = facerec.compute_face_descriptor(face, shape)
        trainlabel.append(labelName)
        train_descriptors.append(numpy.array(face_descriptor))
    return trainlabel, train_descriptors


# Step 3: decide which person each test face belongs to
def recognition(trainlabel, train_descriptors):
    ans_right = 0
    ans_wrong = 0
    # Process the faces to recognize in the same way
    for file in glob.glob(os.path.join(img_folder_path, "*.jpg")):
        img = io.imread(file)
        nametest = os.path.basename(file).split('_0')[0]
        # Face detection
        dets = detector(img, 1)
        if len(dets) == 0:
            # No face found: count it as a failure instead of indexing an empty list
            print(nametest, "no face detected")
            ans_wrong += 1
            continue

        # Landmark detection and descriptor extraction for the first detected face
        shape = predictor(img, dets[0])
        test_descriptor = facerec.compute_face_descriptor(img, shape)
        d_test = numpy.array(test_descriptor)

        # Euclidean distance from the test face to every training face
        dists = [numpy.linalg.norm(d_train - d_test) for d_train in train_descriptors]
        # Pair each training label with its distance and sort by distance
        c_d = dict(zip(trainlabel, dists))
        cd_sorted = sorted(c_d.items(), key=lambda d: d[1])

        print(cd_sorted[0][1])
        # Apply the threshold to decide who it is
        if cd_sorted[0][1] < 0.6:
            namepredict = cd_sorted[0][0]
        else:
            namepredict = "Unknown"
        print(nametest, namepredict)

        # Check whether the prediction is correct
        if (namepredict == nametest) or (namepredict == "Unknown" and nametest not in trainlabel):
            print("right")
            ans_right += 1
        else:
            print("wrong")
            ans_wrong += 1
        # dlib.hit_enter_to_continue()
    print("total:", ans_right + ans_wrong, "\nright:", ans_right, "\nwrong:", ans_wrong)


if __name__ == '__main__':
    # Step 1: load the three models
    # 1. Frontal face detector
    detector = dlib.get_frontal_face_detector()
    # 2. Facial landmark predictor
    predictor = dlib.shape_predictor("../shape_predictor_68_face_landmarks.dat")
    # 3. Face recognition model
    facerec = dlib.face_recognition_model_v1("../dlib_face_recognition_resnet_model_v1.dat")

    # Step 2: build the training labels and descriptors used for recognition
    trainlabel, train_descriptors = train(faces_folder_path)

    # Step 3: recognize the test faces and report the accuracy
    recognition(trainlabel, train_descriptors)

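2.3.2 Face recognition steps

First, each face in the candidate folder goes through:
1. Face detection
2. Landmark detection, marking the face region and the landmarks
3. Descriptor extraction: a 128-D vector, converted to a numpy array
4. Extraction of the person's name from the image file name, to build the candidate list

Then each test face is processed in the same way:
1. Face detection, landmark detection, descriptor extraction
2. Compute the Euclidean distance between the test descriptor and each candidate descriptor
3. Collect the distances from the test face to all candidates in a dict
4. Sort by distance
5. The candidate with the smallest distance is taken as the same person, provided that distance is below the 0.6 threshold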
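
The matching rule in the steps above can be sketched without dlib. The example below is a minimal illustration in plain Python that uses short toy vectors instead of real 128-D descriptors; the function name nearest_label and the sample gallery are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(query, gallery, threshold=0.6):
    """Return (label, distance) of the closest gallery entry.

    gallery is a list of (label, descriptor) pairs. The label is "Unknown"
    when the smallest distance is not below the threshold.
    """
    best_label, best_dist = "Unknown", float("inf")
    for label, descriptor in gallery:
        dist = euclidean(query, descriptor)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist >= threshold:
        return "Unknown", best_dist
    return best_label, best_dist


if __name__ == "__main__":
    # Toy 3-D vectors stand in for 128-D face descriptors
    gallery = [("alice", [0.0, 0.0, 0.0]), ("bob", [1.0, 0.0, 0.0])]
    print(nearest_label([0.1, 0.0, 0.0], gallery))
    print(nearest_label([5.0, 5.0, 5.0], gallery))
```

Keeping labels and descriptors together as pairs avoids the two lists drifting out of step when an image yields no face.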

2.3.3 Accuracy

Of the 525 test images, 503 were recognized correctly, an accuracy of 503/525 = 95.81%.

2.4 Face detection and face recognition in video

2.4.1 Detection time with camera input

The code is named face_detector_video.py and is as follows:

import cv2
import dlib
import time

# Initialize the dlib face detector
detector = dlib.get_frontal_face_detector()

# Initialize the display window
win = dlib.image_window()
# Open a video file with OpenCV
# cap = cv2.VideoCapture(r'../test.mp4')
cap = cv2.VideoCapture(0)  # open the camera

while True:
    start = time.time()
    ret, cv_img = cap.read()
    # Stop when no frame could be read
    if not ret or cv_img is None:
        break

    # Scale the frame to 1/4 of its width and height
    cv_img = cv2.resize(cv_img, (0, 0), fx=0.25, fy=0.25)

    # OpenCV reads frames in BGR order while dlib expects RGB, so this conversion is required
    img = cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB)

    # Detect faces
    dets = detector(img, 1)
    print("Number of faces detected: {}".format(len(dets)))

    for i, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            i, d.left(), d.top(), d.right(), d.bottom()))

    # Time spent on this frame, in seconds
    print(time.time() - start)
    win.clear_overlay()
    win.set_image(img)
    win.add_overlay(dets)

cap.release()

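dlib's face detection is considerably more accurate than the detector bundled with OpenCV, so the dlib face detector is used here. Frames are read from the camera, OpenCV splits the video stream into image frames, and the frontal face detector dlib.get_frontal_face_detector() runs on each frame.

Test result image:

Output during the test:

Speed:
0.09 s to 0.11 s per frame.

2.4.2 Detection time with mp4 file input

In the code from section 2.4.1, replace the statement that opens the camera with one that opens an mp4 file. The video is again split into frames, and the frontal face detector dlib.get_frontal_face_detector() runs on each frame.
Test result image:

Output during the test:

Speed:
0.09 s to 0.11 s per frame.

Note: the speed of face detection on a video file depends heavily on the frame size (frame height and frame width).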
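
To compare runs at different frame sizes, it can help to average the per-frame time over many frames and convert it to frames per second. The sketch below uses only the standard library; the helper names and the dummy process function are illustrative assumptions, and in practice process would wrap the detector call.

```python
import time


def mean_seconds_per_frame(process, frames):
    """Average wall-clock seconds that `process` spends per frame."""
    frames = list(frames)
    if not frames:
        raise ValueError("no frames to process")
    start = time.perf_counter()
    for frame in frames:
        process(frame)
    return (time.perf_counter() - start) / len(frames)


def frames_per_second(seconds_per_frame):
    """Convert a per-frame time into a frame rate."""
    return 1.0 / seconds_per_frame


if __name__ == "__main__":
    # A dummy workload stands in for the face detector here
    spf = mean_seconds_per_frame(lambda frame: sum(range(1000)), range(100))
    print("seconds per frame:", spf)
    # The times measured in this section, expressed as frame rates
    print(round(frames_per_second(0.09), 1), round(frames_per_second(0.11), 1))
```

The measured 0.09 s to 0.11 s per frame corresponds to roughly 9 to 11 frames per second.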

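2.4.3 Face recognition in video

Two implementations are tried: the face recognition in dlib itself, in code named face_recogn_video.py, and the face recognition in face_recognition, a package built on top of dlib, in code named face_recognition_video.py.

The code of face_recogn_video.py is as follows: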

import dlib
import numpy as np
import cv2
import os
import glob

# Candidate face dataset
faces_folder_path = r'../train_person'
video_path = r'../test.mp4'

# Build the training labels and face descriptors
def train(faces_folder_path):
    trainlabel = []
    train_descriptors = []
    for file in glob.glob(os.path.join(faces_folder_path, "*.jpg")):
        # Label = file name without directory and extension (works on any OS)
        labelName = os.path.splitext(os.path.basename(file))[0]
        print("Processing file: {}".format(labelName))

        # cv2.imread returns BGR, but dlib expects RGB
        face = cv2.cvtColor(cv2.imread(file), cv2.COLOR_BGR2RGB)
        # 1. Face detection
        dets = detector(face, 1)
        # print("Number of faces detected: {}".format(len(dets)))

        for k, d in enumerate(dets):
            # 2. Landmark detection
            shape = predictor(face, d)

            # 3. Descriptor extraction (128-D vector)
            face_descriptor = facerec.compute_face_descriptor(face, shape)
            # Append label and descriptor together so the two lists stay aligned
            trainlabel.append(labelName)
            train_descriptors.append(np.array(face_descriptor))
    return trainlabel, train_descriptors

# Decide which person a descriptor belongs to
def findNearestClassForImage(face_descriptor, trainlabel, train_descriptors):
    if len(train_descriptors) == 0:
        return 'Unknown'
    train_descriptors = np.array(train_descriptors)
    # Convert the dlib vector to numpy before subtracting
    dist = np.linalg.norm(np.array(face_descriptor) - train_descriptors, axis=1)
    min_distance = dist.min()
    print('distance: ', min_distance)
    if min_distance > threshold:
        return 'Unknown'
    index = np.argmin(dist)
    return trainlabel[index]

# Face recognition on one frame
def recognition(img, trainlabel, train_descriptors):
    # dlib expects RGB; the BGR frame is kept for drawing and display
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Face detection
    dets = detector(rgb, 1)
    for k, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            k, d.left(), d.top(), d.right(), d.bottom()))
        # Facial landmark predictor
        shape = predictor(rgb, d)
        # Face recognition descriptor
        face_descriptor = facerec.compute_face_descriptor(rgb, shape)

        # Decide which person it is
        class_pre = findNearestClassForImage(face_descriptor, trainlabel, train_descriptors)
        print(class_pre)
        cv2.rectangle(img, (d.left(), d.top() + 10), (d.right(), d.bottom()), (0, 255, 0), 2)
        cv2.putText(img, class_pre, (d.left(), d.top()), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2, cv2.LINE_AA)

    cv2.imshow('image', img)

if __name__ == '__main__':
    # Load the models
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor('../shape_predictor_68_face_landmarks.dat')
    facerec = dlib.face_recognition_model_v1('../dlib_face_recognition_resnet_model_v1.dat')
    # Recognition threshold
    threshold = 0.6

    # Training labels and face descriptors
    trainlabel, train_descriptors = train(faces_folder_path)
    # cap = cv2.VideoCapture(0)
    cap = cv2.VideoCapture(video_path)
    # Save the video (disabled)
    # fps = 10
    # size = (640, 480)
    # fourcc = cv2.VideoWriter_fourcc(*'XVID')
    # videoWriter = cv2.VideoWriter('video.MP4', fourcc, fps, size)

    while True:
        ret, frame = cap.read()
        if not ret:
            # End of the video or read failure
            break
        # Scale the frame to 1/4 of its width and height
        frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)

        # Face recognition
        recognition(frame, trainlabel, train_descriptors)
        # videoWriter.write(frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    # Release the writer only if the video-saving lines above are enabled
    # videoWriter.release()
    cv2.destroyAllWindows()

The code of face_recognition_video.py is as follows:

import face_recognition
import cv2
import os
import glob

# Video path and folder of known faces
video_path = r'../test.mp4'
faces_folder_path = '../train_person'


# Read the names and face encodings of the training set
def train(faces_folder_path):
    known_face_names = []
    known_face_encodings = []
    for file in glob.glob(os.path.join(faces_folder_path, "*.jpg")):
        # Label = file name without directory and extension (works on any OS)
        labelName = os.path.splitext(os.path.basename(file))[0]

        image = face_recognition.load_image_file(file)
        encodings = face_recognition.face_encodings(image)
        if not encodings:
            # Skip images in which no face is found
            continue
        known_face_names.append(labelName)
        known_face_encodings.append(encodings[0])
    return known_face_names, known_face_encodings

def recognition(rgb_small_frame, known_face_names, known_face_encodings):
    # Locate the faces and compute their encodings
    face_locations = face_recognition.face_locations(rgb_small_frame)
    face_encodings = face_recognition.face_encodings(rgb_small_frame, face_locations)

    face_names = []
    for face_encoding in face_encodings:
        # compare_faces returns one True/False per known face
        matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
        # Default to Unknown
        name = "Unknown"

        if True in matches:
            first_match_index = matches.index(True)
            name = known_face_names[first_match_index]
        face_names.append(name)
    return face_locations, face_names

def main():
    face_locations = []
    face_names = []

    # Set up the display window
    wnd = 'OpenCV Video'
    cv2.namedWindow(wnd, flags=0)
    cv2.resizeWindow(wnd, 1920, 1080)

    known_face_names, known_face_encodings = train(faces_folder_path)

    # Open the video
    # video_capture = cv2.VideoCapture(0)
    video_capture = cv2.VideoCapture(video_path)
    # Frame counter used to run recognition only every few frames
    process_this_frame = 0
    while True:
        # Read one frame
        ret, frame = video_capture.read()
        if not ret:
            # End of the video or read failure
            break
        # Shrink the frame; a smaller image means less computation
        small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
        # OpenCV frames are BGR but face_recognition needs RGB, so convert
        rgb_small_frame = cv2.cvtColor(small_frame, cv2.COLOR_BGR2RGB)

        process_this_frame += 1
        if process_this_frame % 5 == 0:
            # Locations and names
            face_locations, face_names = recognition(rgb_small_frame, known_face_names, known_face_encodings)

        # Draw the detected faces
        for (top, right, bottom, left), name in zip(face_locations, face_names):
            # Scale the coordinates back up to the full-size frame
            top *= 4
            right *= 4
            bottom *= 4
            left *= 4

            # Bounding box
            cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)

            # Name label
            cv2.rectangle(frame, (left, bottom - 35), (right, bottom), (0, 0, 255), cv2.FILLED)
            font = cv2.FONT_HERSHEY_DUPLEX
            cv2.putText(frame, name, (left + 6, bottom - 6), font, 1.0, (255, 255, 255), 1)

        # Display
        cv2.imshow(wnd, frame)

        # Press Q to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    video_capture.release()
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()

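Test result image:

Test result:
In this test, the face recognition in face_recognition ran faster than the face recognition in dlib.

2.4.4 Optimizations

Two optimizations were added to the face_recognition version of the code:
1. The frame is scaled to 1/4 for recognition, and the results are scaled back to the original size for display.
2. Face recognition runs only once every 5 frames.
With these, recognition speed is close to real time, that is, close to the normal playback speed of the video.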
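
The bookkeeping behind these two optimizations can be shown without OpenCV. The sketch below is a minimal illustration in plain Python; the helper names scale_box and frames_to_process are illustrative assumptions, and the boxes use the (top, right, bottom, left) order of the face_recognition code.

```python
def scale_box(box, factor):
    """Scale a (top, right, bottom, left) box from the small frame back up."""
    return tuple(int(value * factor) for value in box)


def frames_to_process(num_frames, every=5):
    """Frame counters (starting at 1) at which recognition actually runs."""
    return [i for i in range(1, num_frames + 1) if i % every == 0]


if __name__ == "__main__":
    # A box found on the 1/4-size frame, mapped back to the full-size frame
    print(scale_box((10, 40, 30, 20), 4))
    # With 12 frames and recognition every 5 frames, only two frames are processed
    print(frames_to_process(12))
```

Frames that are skipped reuse the most recent boxes and names, which is why the overlay can lag slightly behind fast motion.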

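3 Training Your Own Models in Python

Training your own model with the python_examples sample code is fairly simple. To speed up training, the sample code is run on a Linux server. For installing dlib on the Linux server, see section 1.3; the method used here is pip3 install dlib.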

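3.1 Annotating the dataset

3.1.1 About imglab

imglab is a tool provided with dlib for building datasets. You label the images and it produces an xml file.

3.1.2 Using imglab

The tool is included in the official dlib source under tools/imglab.
cmake must be installed before use.

Steps:

1. Open cmd
2. Go to the tools/imglab directory
3. Create a build folder and enter it
4. Run: cmake ..
5. Run: cmake --build . --config Release
6. Enter the Release folder
7. Create an image folder and copy all the training images into it
8. In the Release directory, run: imglab -c mydataset.xml image, which creates the file mydataset.xml
9. Run: imglab mydataset.xml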

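The imglab annotation window opens and you can start labeling.

Annotation controls:

Hold Shift and drag with the left mouse button to draw a box. Releasing the mouse button first keeps the box; releasing Shift first cancels it.
Double-click a box and press Delete to remove it.
Double-click a box and press i to mark the object as ignore, that is, an unclear object to be skipped.
Press e to change the exposure of the displayed image; the effect is shown below.
Hold Ctrl and scroll the mouse wheel to zoom the image while labeling.
After double-clicking a box to select it, Shift+left-click adds landmark points.
When the face boxes and landmarks are done, choose File > Save, then Exit. The face detection dataset can then be seen in mydataset.xml.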
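
A quick way to check the saved annotations is to read the xml file back. The sketch below parses a small inline sample with the standard library; the sample content and the function name summarize_annotations are illustrative assumptions, while the element and attribute names (image, box, part, top, left, width, height, name, x, y) follow the layout imglab writes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the layout of an imglab dataset file
SAMPLE_XML = """<dataset>
<name>imglab dataset</name>
<images>
  <image file='image/face1.jpg'>
    <box top='50' left='60' width='100' height='110'>
      <part name='0' x='80' y='90'/>
      <part name='1' x='130' y='92'/>
    </box>
  </image>
  <image file='image/face2.jpg'>
  </image>
</images>
</dataset>
"""


def summarize_annotations(xml_text):
    """Map each image file to a list of (box, parts) tuples."""
    root = ET.fromstring(xml_text)
    summary = {}
    for image in root.iter("image"):
        boxes = []
        for box in image.findall("box"):
            rect = {key: int(box.get(key)) for key in ("top", "left", "width", "height")}
            parts = {part.get("name"): (int(part.get("x")), int(part.get("y")))
                     for part in box.findall("part")}
            boxes.append((rect, parts))
        summary[image.get("file")] = boxes
    return summary


if __name__ == "__main__":
    for file_name, boxes in summarize_annotations(SAMPLE_XML).items():
        print(file_name, "boxes:", len(boxes))
```

Images that list no box, like the second one in the sample, are worth reviewing before training, since they may simply have been missed during labeling.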

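3.2 Training your own facial landmark predictor

3.2.1 Dataset

Using imglab, face boxes and landmarks (5 landmarks: eyes, nose, mouth) were annotated on the training and test images: 7 training images and 5 test images. This produces the annotation files train_landmarks.xml and test_landmarks.xml. The directory layout is shown below; the train and test folders hold the training and test images.

3.2.2 Training

The training code is based on python_examples/train_shape_predictor.py: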

import os
import sys
import glob
import dlib

options = dlib.shape_predictor_training_options()
# Now make the object responsible for training the model.
# This algorithm has a bunch of parameters you can mess with.  The
# documentation for the shape_predictor_trainer explains all of them.
# You should also read Kazemi's paper which explains all the parameters
# in great detail.  However, here I'm just setting three of them
# differently than their default values.  I'm doing this because we
# have a very small dataset.  In particular, setting the oversampling
# to a high amount (300) effectively boosts the training set size, so
# that helps this example.
options.oversampling_amount = 300
# I'm also reducing the capacity of the model by explicitly increasing
# the regularization (making nu smaller) and by using trees with
# smaller depths.
options.nu = 0.05
options.tree_depth = 2
options.be_verbose = True

# dlib.train_shape_predictor() does the actual training.  It will save the
# final predictor to predictor.dat.  The input is an XML file that lists the
# images in the training dataset and also contains the positions of the face
# parts.
training_xml_path = '/home/users/chenzhuo/program/dlib-19-15/python_test/mytest/train_landmarks.xml'
dlib.train_shape_predictor(training_xml_path, "predictor.dat", options)

# Now that we have a model we can test it.  dlib.test_shape_predictor()
# measures the average distance between a face landmark output by the
# shape_predictor and where it should be according to the truth data.
print("\nTraining accuracy: {}".format(
    dlib.test_shape_predictor(training_xml_path, "predictor.dat")))
# The real test is to see how well it does on data it wasn't trained on.  We
# trained it on a very small dataset so the accuracy is not extremely high, but
# it's still doing quite good.  Moreover, if you train it on one of the large
# face landmarking datasets you will obtain state-of-the-art results, as shown
# in the Kazemi paper.
testing_xml_path = '/home/users/chenzhuo/program/dlib-19-15/python_test/mytest/test_landmarks.xml'
print("Testing accuracy: {}".format(
    dlib.test_shape_predictor(testing_xml_path, "predictor.dat")))

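Save the code above as shape_predictor_train.py, change training_xml_path (and testing_xml_path) to the paths of your own dataset xml files, go to the directory containing the .py file, and run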

# python3 shape_predictor_train.py
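
The values printed as "Training accuracy" and "Testing accuracy" are really errors: as the comments in the code say, dlib.test_shape_predictor() measures the average distance between predicted and true landmark positions, so smaller is better. The sketch below illustrates that idea for a single face in plain Python; it is a simplified illustration rather than dlib's implementation, and the function name mean_landmark_error is an assumption.

```python
import math


def mean_landmark_error(predicted, truth):
    """Average Euclidean distance between matching (x, y) landmark pairs."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need two non-empty point lists of equal length")
    total = sum(math.hypot(px - tx, py - ty)
                for (px, py), (tx, ty) in zip(predicted, truth))
    return total / len(truth)


if __name__ == "__main__":
    # Two landmarks: one predicted exactly, one off by a 3-4-5 triangle
    print(mean_landmark_error([(0, 0), (3, 4)], [(0, 0), (0, 0)]))
```

With only 7 training images, a large gap between the training and testing values is to be expected.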

3.2.3 Testing

The test code is based on python_examples/train_shape_predictor.py:

import os
import sys
import glob
import cv2
import dlib

if len(sys.argv) != 2:
    print(
        "Give the path to the examples/faces directory as the argument to this "
        "program. For example, if you are in the python_examples folder then "
        "execute this program by running:\n"
        "    ./shape_predictor_test.py ../examples/faces")
    exit()
faces_folder = sys.argv[1]

# Now let's use it as you would in a normal application.  First we will load it
# from disk. We also need to load a face detector to provide the initial
# estimate of the facial location.
predictor = dlib.shape_predictor("predictor.dat")
detector = dlib.get_frontal_face_detector()

# Now let's run the detector and shape_predictor over the images in the faces
# folder and display the results.
print("Showing detections and predictions on the images in the faces folder...")
win = dlib.image_window()
for f in glob.glob(os.path.join(faces_folder, "*.jpg")):
    print("Processing file: {}".format(f))
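    # cv2.imread returns BGR, but dlib expects RGB, so convert after loading
    # (dlib.load_rgb_image(f) loads RGB directly where it is available)
    img = cv2.cvtColor(cv2.imread(f), cv2.COLOR_BGR2RGB)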

    win.clear_overlay()
    win.set_image(img)

    # Ask the detector to find the bounding boxes of each face. The 1 in the
    # second argument indicates that we should upsample the image 1 time. This
    # will make everything bigger and allow us to detect more faces.
    dets = detector(img, 1)
    print("Number of faces detected: {}".format(len(dets)))
    for k, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            k, d.left(), d.top(), d.right(), d.bottom()))
        # Get the landmarks/parts for the face in box d.
        shape = predictor(img, d)
        print("Part 0: {}, Part 1: {} ...".format(shape.part(0),
                                                  shape.part(1)))
        # Draw the face landmarks on the screen.
        win.add_overlay(shape)

    win.add_overlay(dets)
    dlib.hit_enter_to_continue()

Save the code above as shape_predictor_test.py, open a VNC client (the script opens display windows), go to the directory containing the .py file, and run

# python3 shape_predictor_test.py /home/users/chenzhuo/program/dlib-19-15/examples/faces

3.2.4 Possible improvements

Training can use multi-pose data, for example annotated datasets of frontal, left-profile, and right-profile faces.

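3.3 Training your own face detector

3.3.1 Dataset

# wget http://dlib.net/files/data/dlib_face_detector_training_data.tar.gz

This is the dataset dlib uses for training. It contains annotations for several thousand faces; only frontal_faces.xml is used here, as shown in the figure below.

3.3.2 Training

The training code is based on python_examples/train_object_detector.py: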

import os
import sys
import glob
import dlib

# Now let's do the training.  The train_simple_object_detector() function has a
# bunch of options, all of which come with reasonable default values.  The next
# few lines goes over some of these options.
# Hyperparameters
options = dlib.simple_object_detector_training_options()
# Since faces are left/right symmetric we can tell the trainer to train a
# symmetric detector.  This helps it get the most value out of the training
# data.
# Symmetric detector
options.add_left_right_image_flips = True
# The trainer is a kind of support vector machine and therefore has the usual
# SVM C parameter.  In general, a bigger C encourages it to fit the training
# data better but might lead to overfitting.  You must find the best C value
# empirically by checking how well the trained detector works on a test set of
# images you haven't trained on.  Don't just leave the value set at 5.  Try a
# few different C values and see what works best for your data.
options.C = 5
# Tell the code how many CPU cores your computer has for the fastest training.
options.num_threads = 4
options.be_verbose = True

training_xml_path = '/home/users/chenzhuo/program/dlib-19-15/python_test/dlib_face_detector_training_data/frontal_faces.xml'
# testing_xml_path = '/home/users/chenzhuo/program/dlib-19-15/python_test/cats/cats_test/cat_test.xml'
# This function does the actual training.  It will save the final detector to
# detector.svm.  The input is an XML file that lists the images in the training
# dataset and also contains the positions of the face boxes.  To create your
# own XML files you can use the imglab tool which can be found in the
# tools/imglab folder.  It is a simple graphical tool for labeling objects in
# images with boxes.  To see how to use it read the tools/imglab/README.txt
# file.  But for this example, we just use the training.xml file included with
# dlib.
dlib.train_simple_object_detector(training_xml_path, "detector.svm", options)

# Now that we have a face detector we can test it.  The first statement tests
# it on the training data.  It will print the precision, recall, and then
# average precision.
print("")  # Print blank line to create gap from previous output
print("Training accuracy: {}".format(
    dlib.test_simple_object_detector(training_xml_path, "detector.svm")))
# However, to get an idea if it really worked without overfitting we need to
# run it on images it wasn't trained on.  The next line does this.  Happily, we
# see that the object detector works perfectly on the testing images.
# print("Testing accuracy: {}".format(
#    dlib.test_simple_object_detector(testing_xml_path, "detector.svm")))

Save the code above as object_detection_train.py, change training_xml_path to the path of your own dataset xml file, go to the directory containing the .py file, and run

# python3 object_detection_train.py
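
The training script reports precision, recall, and average precision. As a reminder of what the first two mean, the sketch below computes them from detection counts in plain Python. It is a simplified illustration rather than dlib's implementation; the function name and the convention of returning 1.0 when a denominator is zero are assumptions, and average precision is left out.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision and recall from detection counts.

    precision = correct detections / all detections
    recall    = correct detections / all annotated faces
    """
    detections = true_positives + false_positives
    truths = true_positives + false_negatives
    # Assumed convention: an empty denominator yields 1.0
    precision = true_positives / detections if detections else 1.0
    recall = true_positives / truths if truths else 1.0
    return precision, recall


if __name__ == "__main__":
    # 8 correct boxes, 2 false alarms, no missed faces
    print(precision_recall(8, 2, 0))
```

High precision with low recall means the detector rarely raises false alarms but misses faces, which matches the behavior of a frontal-only detector on profile faces.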

3.3.3 Testing

The test code is based on python_examples/train_object_detector.py:

import os
import sys
import glob
import dlib
import cv2

if len(sys.argv) != 2:
    print(
        "Give the path to the examples/faces directory as the argument to this "
        "program. For example, if you are in the python_examples folder then "
        "execute this program by running:\n"
        "    ./train_object_detector.py ../examples/faces")
    exit()
faces_folder = sys.argv[1]

# Now let's use the detector as you would in a normal application.  First we
# will load it from disk.
detector = dlib.simple_object_detector("detector.svm")

# We can look at the HOG filter we learned.  It should look like a face.  Neat!
win_det = dlib.image_window()
win_det.set_image(detector)

# Now let's run the detector over the images in the faces folder and display the
# results.
print("Showing detections on the images in the faces folder...")
win = dlib.image_window()
for f in glob.glob(os.path.join(faces_folder, "*.jpg")):
    print("Processing file: {}".format(f))
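    # cv2.imread returns BGR, but dlib expects RGB, so convert after loading
    # (dlib.load_rgb_image(f) loads RGB directly where it is available)
    img = cv2.cvtColor(cv2.imread(f), cv2.COLOR_BGR2RGB)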
    dets = detector(img)
    print("Number of faces detected: {}".format(len(dets)))
    for k, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            k, d.left(), d.top(), d.right(), d.bottom()))

    win.clear_overlay()
    win.set_image(img)
    win.add_overlay(dets)
    dlib.hit_enter_to_continue()

Save the code above as object_detection_test.py, open a VNC client (the script opens display windows), go to the directory containing the .py file, and run

# python3 object_detection_test.py /home/users/chenzhuo/program/dlib-19-15/examples/faces

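3.3.4 Possible improvements

The detector trained so far is a frontal face detector and performs poorly on profile faces.
To improve detection accuracy, several face detectors can be trained and combined, for example a frontal detector, a left-profile detector, and a right-profile detector. The key calls are: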

image = dlib.load_rgb_image(faces_folder + '/2008_002506.jpg')
detector1 = dlib.fhog_object_detector("detector.svm")
detector2 = dlib.fhog_object_detector("detector.svm")
detectors = [detector1, detector2]
boxes, confidences, detector_idxs = dlib.fhog_object_detector.run_multiple(detectors, image, upsample_num_times=1, adjust_threshold=0.0)
for i in range(len(boxes)):
    print("detector {} found box {} with confidence {}.".format(detector_idxs[i], boxes[i], confidences[i]))

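3.4 Summary

As the training workflow above shows, dlib can be used not only for face detection and recognition but also for detecting and recognizing other kinds of objects.

4 Training Your Own Models in C++

4.1 Training your own facial landmark predictor

For the project configuration of each program, see section 1.2.2. Configure and run the projects in Release mode to speed up execution.

4.1.1 Dataset

4.1.2 Training

After configuring a project with examples/train_shape_predictor_ex.cpp, enter the directory containing the annotation xml files as the command argument and choose Debug > Start Without Debugging to train the model. This produces the model file sp.dat.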

load_image_dataset(images_train, face_boxes_train,faces_directory+"\\***.xml");
load_image_dataset(images_test,face_boxes_test, faces_directory+"\\***.xml");

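Test error:

4.1.3 Testing

After configuring a project with examples/face_landmark_detection_ex.cpp, enter the path of the generated model file sp.dat and the paths of the images to process as command arguments, then choose Debug > Start Without Debugging. The results are as follows:

4.1.4 Possible improvements

Training can use multi-pose data, for example annotated datasets of frontal, left-profile, and right-profile faces.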

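4.2 Training your own face detector

4.2.1 Dataset

Using imglab, face boxes were annotated on the training and test images: 7 training images and 5 test images. This produces the annotation files train.xml and test.xml.

4.2.2 Training

After configuring a project with examples/fhog_object_detector_ex.cpp, edit the following statements in the code so that they use the names of your own annotation xml files.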

load_image_dataset(images_train, face_boxes_train,faces_directory+"\\***.xml");
load_image_dataset(images_test,face_boxes_test, faces_directory+"\\***.xml");

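Choose Debug > Start Without Debugging. The training output is shown below, and the model file face_predictor.svm is generated:

4.2.3 Testing

The examples do not include test code, so this part was written for this tutorial and named face_object_detection: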

/*
Face detector test
*/

#include <dlib/svm_threaded.h>
#include <dlib/gui_widgets.h>
#include <dlib/image_processing.h>
#include <dlib/data_io.h>

#include <iostream>
#include <fstream>


using namespace std;
using namespace dlib;

// ----------------------------------------------------------------------------------------

int main(int argc, char** argv)
{

	try
	{
		// This program loads a trained face detector and runs it on images.
		// Supply the path to the model file as the first command line
		// argument, followed by the images to process.
		if (argc < 3)
		{
			cout << "Call this program like this:" << endl;
			cout << "./face_object_detection face_predictor.svm faces/*.jpg" << endl;
			return 0;
		}

		// Define the scanner type that scans the image and extracts HOG features
		typedef scan_fhog_pyramid<pyramid_down<6> > image_scanner_type;
		// Load the model
		object_detector<image_scanner_type> detector;
		deserialize(argv[1]) >> detector;

		// Show the learned HOG filter
		image_window hogwin(draw_fhog(detector), "Learned fHOG detector");

		// Show the face detection results on the test images
		image_window win;
		// Loop over all the images provided on the command line.
		for (int i = 2; i < argc; ++i)
		{
			cout << "processing image " << argv[i] << endl;
			array2d<rgb_pixel> img;
			// Load the image data
			load_image(img, argv[i]);
			// Make the image larger so we can detect small faces.
			pyramid_up(img);

			// Now tell the face detector to give us a list of bounding boxes
			// around all the faces in the image.
			// Run face detection
			std::vector<rectangle> dets = detector(img);
			cout << "Number of faces detected: " << dets.size() << endl;
			win.clear_overlay();
			win.set_image(img);
			win.add_overlay(dets, rgb_pixel(255, 0, 0));
			cout << "Hit enter to process the next image..." << endl;
			cin.get();
		} 

	}
	catch (exception& e)
	{
		cout << "\nexception thrown!" << endl;
		cout << e.what() << endl;
	}
	system("pause");
}

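Enter the path of the generated model file face_predictor.svm and the paths of the images to process as command arguments, then choose Debug > Start Without Debugging. The results are as follows:

4.2.4 Possible improvements

The detector trained so far is a frontal face detector and performs poorly on profile faces: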

frontal_face_detector detector = get_frontal_face_detector();

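Several face detectors can be trained and combined for prediction, for example a frontal detector, a left-profile detector, and a right-profile detector. The key calls are: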

std::vector<object_detector<image_scanner_type> > my_detectors;
my_detectors.push_back(detector);
std::vector<rectangle> dets = evaluate_detectors(my_detectors, image);