[dlib Code Walkthrough] Training a Facial Landmark Detector

1. Source code

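First, the test result: the predicted landmarks are not particularly accurate, because the training set is too small.

The complete training code for the facial landmark detector follows. See Part 2 for a detailed walkthrough.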

/* faceLandmarksTrain.cpp
function: train a custom facial landmark detector with dlib (based on dlib/examples/train_shape_predictor_ex)
date: 2016/11/6
author: Elaine_Bao
*/

#include <dlib/image_processing.h>
#include <dlib/data_io.h>
#include <iostream>

using namespace dlib;
using namespace std;

// ----------------------------------------------------------------------------------------
// Compute interocular distances: the returned D[i][j] is the distance between the eyes of the face in objects[i][j]
std::vector<std::vector<double> > get_interocular_distances(
    const std::vector<std::vector<full_object_detection> >& objects
    );

// ----------------------------------------------------------------------------------------

int main(int argc, char** argv)
{
    try
    {
        // I. Preprocessing
        // 1. Load the training set and the test set
        const std::string faces_directory = "faces";
        dlib::array<array2d<unsigned char> > images_train, images_test;
        std::vector<std::vector<full_object_detection> > faces_train, faces_test;

        load_image_dataset(images_train, faces_train, faces_directory + "/training_with_face_landmarks.xml");
        load_image_dataset(images_test, faces_test, faces_directory + "/testing_with_face_landmarks.xml");

        // II. Training
        // 1. Create the trainer
        shape_predictor_trainer trainer;
        // Set the training parameters
        trainer.set_oversampling_amount(300); 
        trainer.set_nu(0.05);
        trainer.set_tree_depth(2);
        trainer.be_verbose();

        // 2. Train and produce the facial landmark detector
        shape_predictor sp = trainer.train(images_train, faces_train);


        // III. Testing
        cout << "mean training error: " <<
            test_shape_predictor(sp, images_train, faces_train, get_interocular_distances(faces_train)) << endl;
        cout << "mean testing error:  " <<
            test_shape_predictor(sp, images_test, faces_test, get_interocular_distances(faces_test)) << endl;

        // IV. Save the model
        serialize("sp.dat") << sp;
    }
    catch (exception& e)
    {
        cout << "\nexception thrown!" << endl;
        cout << e.what() << endl;
    }
}

// ----------------------------------------------------------------------------------------

double interocular_distance(
    const full_object_detection& det
    )
{
    dlib::vector<double, 2> l, r;
    double cnt = 0;
    // Find the center of the left eye by averaging the points around 
    // the eye.
    for (unsigned long i = 36; i <= 41; ++i)
    {
        l += det.part(i);
        ++cnt;
    }
    l /= cnt;

    // Find the center of the right eye by averaging the points around 
    // the eye.
    cnt = 0;
    for (unsigned long i = 42; i <= 47; ++i)
    {
        r += det.part(i);
        ++cnt;
    }
    r /= cnt;

    // Now return the distance between the centers of the eyes
    return length(l - r);
}

std::vector<std::vector<double> > get_interocular_distances(
    const std::vector<std::vector<full_object_detection> >& objects
    )
{
    std::vector<std::vector<double> > temp(objects.size());
    for (unsigned long i = 0; i < objects.size(); ++i)
    {
        for (unsigned long j = 0; j < objects[i].size(); ++j)
        {
            temp[i].push_back(interocular_distance(objects[i][j]));
        }
    }
    return temp;
}

// ----------------------------------------------------------------------------------------

2. Code walkthrough, step by step

2.1 Preprocessing

2.1.1 Loading the training and test sets

const std::string faces_directory = "faces";
dlib::array<array2d<unsigned char> > images_train, images_test;
std::vector<std::vector<full_object_detection> > faces_train, faces_test;

load_image_dataset(images_train, faces_train, faces_directory + "/training_with_face_landmarks.xml");
load_image_dataset(images_test, faces_test, faces_directory + "/testing_with_face_landmarks.xml");

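The training and test images are stored in the "faces" folder. That folder must also contain training_with_face_landmarks.xml and testing_with_face_landmarks.xml, which record, for each image, the position of the face bounding box and the positions of the 68 facial landmarks.

The types of faces_train and faces_test are built on full_object_detection. Its constructor (defined in full_object_detection.h) is shown below: rect stores the face bounding box, and parts stores the positions of the 68 landmarks.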

full_object_detection(
            const rectangle& rect_,
            const std::vector<point>& parts_
        ) : rect(rect_), parts(parts_) {}

2.2 Training

2.2.1 Defining the trainer used to train the facial landmark detector

shape_predictor_trainer trainer;
// Set the training parameters
trainer.set_oversampling_amount(300); 
trainer.set_nu(0.05);
trainer.set_tree_depth(2);
trainer.be_verbose();

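The algorithm behind the landmark detector comes mainly from [1]. Put simply, it regresses the landmark positions with a multi-level cascade of regression trees, which [1] writes as:

S^(t+1) = S^(t) + r_t(I, S^(t))

Here S^(t) is the shape estimate entering level t of the cascade, I is the input image, and r_t is the update predicted by the level-t regressor. Each regressor is learned with GBDT (gradient boosted decision trees), so every level fits the residual between the current shape and the ground-truth shape.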
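To make the cascaded update concrete, here is a minimal sketch in plain C++ in which each level adds its predicted correction to the current shape. Shape, Regressor and run_cascade are hypothetical names chosen for illustration, not dlib types, and the image argument of a real regressor is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A shape is a flat list of landmark coordinates: x0, y0, x1, y1, ...
using Shape = std::vector<double>;

// Hypothetical stand-in for one cascade level r_t. Given the current shape
// estimate it returns an update of the same length. A real regressor would
// also read pixel values from the image; that input is omitted here.
using Regressor = std::function<Shape(const Shape&)>;

// Applies S^(t+1) = S^(t) + r_t(S^(t)) once for every level of the cascade.
Shape run_cascade(Shape shape, const std::vector<Regressor>& cascade)
{
    for (const Regressor& r : cascade)
    {
        const Shape delta = r(shape);
        assert(delta.size() == shape.size());
        for (std::size_t i = 0; i < shape.size(); ++i)
            shape[i] += delta[i];
    }
    return shape;
}
```

Calling run_cascade with an initial guess (for example a mean shape) and a vector of ten regressors would mirror a 10-level cascade: the output of one level is the input of the next.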
The trainer, shape_predictor_trainer, is defined in shape_predictor.h with the following defaults:

shape_predictor_trainer (
)
{
    _cascade_depth = 10;
    _tree_depth = 4;
    _num_trees_per_cascade_level = 500;
    _nu = 0.1;
    _oversampling_amount = 20;
    _feature_pool_size = 400;
    _lambda = 0.1;
    _num_test_splits = 20;
    _feature_pool_region_padding = 0;
    _verbose = false;
}

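The parameters, one by one:
(1) _cascade_depth: the number of levels in the cascade. The default is 10.
(2) _tree_depth: the depth of each tree. A tree therefore has 2^_tree_depth leaf nodes.
(3) _num_trees_per_cascade_level: the number of trees in each cascade level, 500 by default. The whole model contains _cascade_depth * _num_trees_per_cascade_level trees, which is 5000 with the defaults.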

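(4) _nu: the shrinkage (learning-rate) regularization parameter. A larger nu fits the training samples more closely and is therefore more likely to overfit. _nu must lie in (0,1]; the default is 0.1.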

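(5) _oversampling_amount: enlarges the training set by applying random deformations to the training samples. If you start with N training images, the trainer works with N * _oversampling_amount samples. Larger values are generally better, but training takes longer.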

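(6) _feature_pool_size: at each cascade level, _feature_pool_size pixels are sampled at random from the image to form the feature pool used to train the regression trees. This sparse sampling keeps the cost well below that of training on every pixel of the image. Larger values usually improve accuracy, but training takes longer. _feature_pool_size must be greater than 1.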

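(7) _lambda: each node of a regression tree splits on the intensity difference between a pair of pixels, compared against a threshold. Following [1], the test at a node can be written as:

h(I, θ) = 1 if I(u) - I(v) > τ, and 0 otherwise, with θ = (τ, u, v)

where u and v are the two pixel positions and τ is the threshold. The outcome decides which child node a sample descends to.

The pixel pairs are sampled at random from the feature pool described above, with a preference for pixels that lie close together. _lambda controls how strong that preference is: a small value favors nearby pixels, while a large value cares less about whether the pair is close. Reasonable values of _lambda lie in (0,1).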

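(8) _num_test_splits: how is a node split? While growing a regression tree, _num_test_splits candidate splits are generated at random at each node and the best one is kept. Larger values give more accurate results, but training takes longer.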

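(9) _feature_pool_region_padding: the pixels for the feature pool are sampled from a box fitted around the training landmarks, and this parameter pads that box. With _feature_pool_region_padding=0 the box is the tightest one containing the landmarks, which counts as a 1*1 box in normalized units. The padding is added to each side, so a larger value widens the sampling region.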
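The pixel-pair test from item (7) and the "prefer nearby pixels" behavior of _lambda can be sketched as follows. This is an illustration only: passes_split_test and pair_accept_probability are hypothetical names, and the exponential acceptance rule is an assumption modeled on the exponential distance prior in [1], not code copied from dlib.

```cpp
#include <cassert>
#include <cmath>

// Split test at a tree node: compare the intensity difference of a pixel
// pair against a threshold. The boolean result selects one of the two
// children of the node.
bool passes_split_test(double intensity_u, double intensity_v, double threshold)
{
    return intensity_u - intensity_v > threshold;
}

// Assumed acceptance rule for a randomly drawn pixel pair: the probability
// of keeping the pair decays exponentially with the distance between the
// two pixels. dist is in normalized units (a fraction of the sampling box),
// and a smaller lambda makes distant pairs much less likely to be kept.
double pair_accept_probability(double dist, double lambda)
{
    assert(dist >= 0 && lambda > 0);
    return std::exp(-dist / lambda);
}
```

Under this assumed rule, a pair half a box apart is kept with probability exp(-5) when lambda is 0.1 but exp(-0.5) when lambda is 1, which matches the description that small values favor nearby pixels.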

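With the parameters understood, it is fairly clear what values suit a given case. In this example _oversampling_amount is set to 300 because the training set is very small, and oversampling enlarges it. Lowering _nu to 0.05 and _tree_depth to 2 (the defaults are 0.1 and 4) is likewise meant to guard against overfitting.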
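The arithmetic behind these settings is easy to check by hand. The sketch below is plain C++ with a hypothetical TrainerSettings struct (not a dlib type) that mirrors four of the trainer's fields; the face count passed to effective_samples is an assumed number, since the size of the training set is not stated here.

```cpp
#include <cassert>

// Hypothetical mirror of four shape_predictor_trainer settings.
struct TrainerSettings
{
    unsigned long cascade_depth;
    unsigned long tree_depth;
    unsigned long num_trees_per_cascade_level;
    unsigned long oversampling_amount;
};

// A tree of depth d has 2^d leaves.
unsigned long leaves_per_tree(const TrainerSettings& s)
{
    assert(s.tree_depth < 32);
    return 1UL << s.tree_depth;
}

// Every cascade level holds the same number of trees.
unsigned long total_trees(const TrainerSettings& s)
{
    return s.cascade_depth * s.num_trees_per_cascade_level;
}

// N annotated faces become N * oversampling_amount training samples.
unsigned long effective_samples(unsigned long num_faces, const TrainerSettings& s)
{
    return num_faces * s.oversampling_amount;
}
```

With this post's settings (cascade depth 10 and 500 trees per level left at their defaults, tree depth 2, oversampling 300), each tree has 4 leaves and the model has 5000 trees. A hypothetical set of 10 annotated faces would be expanded to 3000 training samples.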

2.2.2 Training and producing the facial landmark detector

shape_predictor sp = trainer.train(images_train, faces_train);

Training builds the regression trees of every cascade level with GBDT.
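The role of the shrinkage factor nu in boosting can be illustrated with a toy, one-dimensional example. This is a simplified sketch, not dlib's training loop: residual_after_boosting is a hypothetical function, and the weak learner is idealized so that it fits the current residual exactly in every round.

```cpp
#include <cassert>

// Runs `rounds` boosting steps toward a single scalar target. Each round
// measures the current residual and adds nu times that residual to the
// running prediction. Returns the residual that remains at the end.
double residual_after_boosting(double target, double nu, unsigned rounds)
{
    assert(nu > 0 && nu <= 1);
    double prediction = 0;
    for (unsigned i = 0; i < rounds; ++i)
    {
        const double residual = target - prediction;
        prediction += nu * residual;
    }
    return target - prediction;
}
```

In this toy setting the remaining residual after k rounds is (1 - nu)^k times the target, so a smaller nu approaches the training target more slowly. That slower fit is why a lower nu acts as a regularizer.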

2.3 Testing

cout << "mean training error: " <<
            test_shape_predictor(sp, images_train, faces_train, get_interocular_distances(faces_train)) << endl;
cout << "mean testing error:  " <<
            test_shape_predictor(sp, images_test, faces_test, get_interocular_distances(faces_test)) << endl;
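The errors printed above are scaled by the interocular distance, so they are expressed as a fraction of the distance between the eyes rather than in raw pixels. The idea for a single face can be sketched in plain C++ as below. Point2 and normalized_mean_error are hypothetical names, and the sketch is not dlib's implementation, which aggregates over all faces in the dataset.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal 2-D point for the sketch.
struct Point2
{
    double x;
    double y;
};

// Mean Euclidean distance between predicted and true landmarks of one face,
// divided by a scale such as that face's interocular distance.
double normalized_mean_error(const std::vector<Point2>& predicted,
                             const std::vector<Point2>& truth,
                             double scale)
{
    assert(predicted.size() == truth.size());
    assert(!truth.empty() && scale > 0);
    double sum = 0;
    for (std::size_t i = 0; i < truth.size(); ++i)
        sum += std::hypot(predicted[i].x - truth[i].x, predicted[i].y - truth[i].y);
    return sum / static_cast<double>(truth.size()) / scale;
}
```

Dividing by the eye distance makes errors comparable between large and small faces: the same 5-pixel miss matters far more on a face whose eyes are 20 pixels apart than on one whose eyes are 200 pixels apart.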

// Save the model
serialize("sp.dat") << sp;

sp.dat can now be used to detect facial landmarks in other images. For usage, see dlib/examples/face_landmark_detection_ex.cpp.

[1] One Millisecond Face Alignment with an Ensemble of Regression Trees by Vahid Kazemi and Josephine Sullivan, CVPR 2014.
