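Facial landmark detection, also called facial landmark localization or face alignment, builds on face detection: given a detected face, it locates key points such as the mouth corners and eye corners.
Facial landmark detection has many uses, for example:
(1) Improving face recognition: the landmarks are used to align each face to a mean face, and recognition algorithms work better on the aligned images.
(2) Face averaging: the detected landmarks are used to blend several faces into a new average face. I tried this on the whole 2017 first-team squad of FC Barcelona (FCB), as shown in the figure below. Pretty handsome, right?!
(3) Face swapping: the landmarks are used to swap one face seamlessly onto another. I tried putting Beckham's face onto Messi, as shown in the figure below.
(4) Face makeup and dress-up: there are many applications here, and this is probably where the commercial value is highest. Plenty of fun effects from everyday apps fall into this group, such as adding cat or dog whiskers, rabbit ears, blush or a Santa hat to your face. Meitu Xiuxiu (美圖秀秀) style makeup cameras and beauty cameras also belong here: foundation, blush, lip color, eye shadow and eyeliner, eyelashes, double eyelids, colored contact lenses, eye brightening, eye enlarging, a higher nose bridge, automatic face slimming, eyebrows and so on are all built on facial landmark detection. I have to say that today's photo retouching ("PS") is powerful, comes in one-tap form, and has a huge user base…
These applications show how useful and important landmark detection is. Only when landmarks are detected quickly and accurately can the downstream work go smoothly.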
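The paper Facial Landmark Detection by Deep Multi-task Learning gives very good landmark detection results, as shown in the figure below, and is very robust. However, only a demo program was released, with no source code and no usable interface, so it cannot be used in practice, and implementing and training the method from the paper is quite difficult.
Happynear's GitHub page has an implementation of the paper Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks, which I have not used yet.
Seetaface was developed by the face recognition group led by researcher Shan Shiguang at the Institute of Computing Technology, Chinese Academy of Sciences. It is written in C++, does not depend on third-party libraries, and is free and open source. Its face alignment module detects 5 facial key points, using Coarse-to-Fine Auto-encoder Networks (CFAN) to solve this complex nonlinear mapping. The Dlib library implements a classic 2014 facial landmark paper, One Millisecond Face Alignment with an Ensemble of Regression Trees, and its landmark detection is both fast and accurate. libfacedetect, released for free by Yu Shiqi of Shenzhen University, is also very fast with good results; like Dlib it detects 68 landmarks, but it is less robust than Dlib. Seetaface, Dlib and libfacedetect all provide interfaces for landmark detection.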
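The following four ways of implementing facial landmark detection are introduced below.
1. Facial landmark detection with cascaded regression CNNs
2. Facial landmark detection with Dlib
3. Facial landmark detection with libfacedetect
4. Facial landmark detection with Seetaface
1. Facial landmark detection with cascaded regression CNNs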
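This cascaded regression CNN method locates 5 landmarks on a face. On my machine (GTX 1060) it takes 7 ms, which is fairly fast (although dlib, libfacedetect and others detect 68 landmarks even faster than this…). At present most of the time is spent in the face detection that has to run first. The dataset samples used to train the network with Caffe, together with the Python scripts for building the datasets and predicting landmarks, are packaged here: download link
Facial landmark detection works on top of face detection: landmark positions are predicted inside the face box. Many face datasets provide both the face box and the landmark coordinates for each sample image, so what has to be trained is a network that predicts where the landmarks lie relative to the face box. At prediction time, a face detector first produces the face box, and the landmark coordinates are then predicted inside it.
A convolutional neural network can be used for both classification and regression. For classification, the last fully connected layer has as many outputs as there are classes and is followed by a Softmax layer with Softmax Loss. For regression, the last fully connected layer has as many outputs as there are coordinate values to regress, and the loss is the Euclidean Loss.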
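As a minimal sketch of the regression loss, the snippet below computes the quantity that Caffe's EuclideanLoss layer reports: the sum of squared differences over a batch of N samples, divided by 2N. The function name `euclidean_loss` and the sample numbers are illustrative assumptions and are not part of the original scripts.

```python
def euclidean_loss(pred, target):
    """Sum of squared differences over the batch, divided by 2 * batch size."""
    assert len(pred) == len(target)
    total = 0.0
    for p_row, t_row in zip(pred, target):
        total += sum((p - t) ** 2 for p, t in zip(p_row, t_row))
    return total / (2.0 * len(pred))


# Two samples with a 2-value label each (made-up numbers for illustration).
pred = [[0.5, 0.5], [0.2, 0.8]]
target = [[0.5, 0.0], [0.2, 0.8]]
loss = euclidean_loss(pred, target)
```

In the first-stage network each label row would hold 10 values (5 landmarks), but the computation is the same.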
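Here a CNN is trained to regress landmark coordinates; I only detect 5 facial landmarks (as shown in the figure above). With a single regression network the predicted coordinates are not accurate enough. To locate landmarks faster and more accurately, a cascaded regression CNN is used, borrowing the cascade idea from cascaded CNNs to localize the landmarks in stages. The approach is:
(1) First train a network on the whole face image (blue box) for a coarse regression of the landmark coordinates. The network actually used takes a 39x39 grayscale face region as input and gives the approximate landmark positions at prediction time.
(2) Design a second regression network that is trained on the local region around each landmark (red box). The network actually used takes a 15x15 grayscale local region as input and predicts a more accurate landmark position.
Note that because the Euclidean loss is used, coordinates are expressed relative to a box rather than as absolute pixel values. In (1) the nose point is given relative to the face box (blue box), in the range 0~1; in (2) it is given relative to the chosen surrounding region (red box). This helps the model converge and keeps training from diverging.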
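The relative-coordinate conversion can be sketched in a few lines. The snippet assumes a box given as (left, right, top, bottom), the same order the BBox class in the scripts takes; the helper names `project` and `reproject` mirror that class, but this standalone version and its numbers are only an illustration.

```python
def project(point, box):
    """Absolute (x, y) -> coordinates relative to box = (left, right, top, bottom)."""
    left, right, top, bottom = box
    return ((point[0] - left) / float(right - left),
            (point[1] - top) / float(bottom - top))


def reproject(rel, box):
    """Relative (x, y) in the box -> absolute pixel coordinates."""
    left, right, top, bottom = box
    return (left + rel[0] * (right - left), top + rel[1] * (bottom - top))


face_box = (100, 300, 50, 250)          # hypothetical 200x200 face box
nose_abs = (200, 150)                   # hypothetical nose position in pixels
nose_rel = project(nose_abs, face_box)  # centre of the box
nose_back = reproject(nose_rel, face_box)
```

A landmark inside the box always maps into the range 0~1, whatever the face size in pixels, which is what keeps the regression targets on a common scale.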
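With the idea clear, the next steps are building the datasets and designing or choosing the networks, starting with the datasets. The MTFL face database is used. Under the data\face_fp folder, the lfw_5590 and net_7876 folders (see figure) contain all the samples (training and validation sets). Each line of the label files trainImageList.txt and testImageList.txt holds, in order, the image path, the face box coordinates, and the coordinates of the five landmarks; see Readme.txt for details.
For first-stage training the dataset is augmented (training set only). Besides mirroring, the face box is also enlarged by two extra scales and shifted in four directions, because the face box found at detection time may be imprecise and this improves generalization. The face region is then cropped and resized to 39x39. This gives 3x5=15 box variants per face, each of which is also mirrored; it lengthens training but does not affect test time. In practice this augmentation makes the first-stage predictions more accurate. I also tried two random small-angle rotations of the face box, but they did not noticeably improve accuracy. During augmentation the landmark coordinates have to be transformed along with the image and converted to relative coordinates (relative to the face box in the first stage, 0~1).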
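The box variants described above can be enumerated as follows. This is a simplified sketch, assuming boxes given as (left, right, top, bottom) and integer rounding of the offsets; the helper names `expand`, `shift` and `box_variants` are hypothetical and only approximate what the BBox methods in the scripts do.

```python
def expand(box, scale):
    """Grow the box by `scale` times its width/height on every side."""
    left, right, top, bottom = box
    w, h = right - left, bottom - top
    return (left - int(w * scale), right + int(w * scale),
            top - int(h * scale), bottom + int(h * scale))


def shift(box, dx, dy):
    """Move the box by a fraction (dx, dy) of its width/height."""
    left, right, top, bottom = box
    w, h = right - left, bottom - top
    return (left + int(w * dx), right + int(w * dx),
            top + int(h * dy), bottom + int(h * dy))


# No shift, right, left, down, up (each by 10% of the box size).
SHIFTS = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]


def box_variants(box):
    """3 expansions (0.0, 0.1, 0.2) x 5 shifts = 15 candidate boxes."""
    return [shift(expand(box, 0.1 * e), dx, dy)
            for e in range(3) for dx, dy in SHIFTS]


variants = box_variants((100, 200, 100, 200))
```

Variants that fall outside the image would still have to be discarded, and each remaining crop is stored both as-is and mirrored.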
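Caffe is used to train the CNN. This is a regression problem with multiple labels, and lmdb does not support multiple labels (Caffe's source can be modified to support them, but that is unnecessary here), so the HDF5 format is used instead. The stage1.py script under data\face_fp generates the augmented first-stage training set and the validation set in HDF5 format, together with the corresponding list files, and writes them to the data\face_fp\1_F folder.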
# -*- coding: utf-8 -*-
"""
Created on Mon May 15 21:34:35 2017
@author: Administrator
"""
import os
from os.path import join, exists
import cv2
import numpy as np
import h5py
from common_utils import shuffle_in_unison_scary, logger, processImage, getDataFromTxt, BBox
from utils import flip, rotate
import time

### Stage 1: roughly locate the landmarks
TRAIN = './'
OUTPUT = './1_F'
if not exists(OUTPUT):
    os.mkdir(OUTPUT)
assert(exists(TRAIN) and exists(OUTPUT))

### Generate the HDF5 files; only the training set is augmented
def generate_hdf5(ftxt, output, mode='train', augment=False):
    # Arguments: label file (image paths, face boxes, landmarks), output directory
    # for the h5 files, h5 file name prefix, whether to augment the data
    data = getDataFromTxt(ftxt)  # list of (full image path, face box, absolute landmark coordinates)
    F_imgs = []  # face crops
    F_landmarks = []  # landmarks relative to the face box
    if not augment:  # no augmentation
        for (imgPath, bbox, landmarkGt) in data:
            img = cv2.imread(imgPath)
            assert(img is not None)  # make sure the image was read
            logger("process %s" % imgPath)
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            f_bbox = bbox
            f_face = gray[f_bbox.top:f_bbox.bottom+1, f_bbox.left:f_bbox.right+1]  # face crop
            landmarkGt_p = f_bbox.projectLandmark(landmarkGt)  # to coordinates relative to the face box
            ### original image
            f_face = cv2.resize(f_face, (39, 39))
            F_imgs.append(f_face.reshape((1, 39, 39)))
            F_landmarks.append(landmarkGt_p.reshape(10))
    else:
        for (imgPath, bbox, landmarkGt) in data:
            img = cv2.imread(imgPath)
            assert(img is not None)  # make sure the image was read
            logger("process %s" % imgPath)
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            height, width = gray.shape
            for exp in range(3):  # 3 expansions x 5 shifts = 15 box variants
                bbox_e = bbox.expand(0.1*exp)  # expand by 0.0, 0.1, 0.2
                for ori in range(5):  # 5 shifts (including no shift)
                    if ori == 1:
                        bbox_s = bbox_e.subBBox(0.1, 1.1, 0.0, 1.0)  # shift right by 0.1
                    elif ori == 2:
                        bbox_s = bbox_e.subBBox(-0.1, 0.9, 0.0, 1.0)  # shift left by 0.1
                    elif ori == 3:
                        bbox_s = bbox_e.subBBox(0.0, 1.0, 0.1, 1.1)  # shift down by 0.1
                    elif ori == 4:
                        bbox_s = bbox_e.subBBox(0.0, 1.0, -0.1, 0.9)  # shift up by 0.1
                    else:
                        bbox_s = bbox_e
                    f_bbox = BBox([int(bbox_s.left), int(bbox_s.right), int(bbox_s.top), int(bbox_s.bottom)])  # face box
                    if (f_bbox.top < 0 or f_bbox.left < 0 or f_bbox.bottom + 1 > height or f_bbox.right + 1 > width):
                        continue  # skip boxes that leave the image
                    f_face = gray[f_bbox.top:f_bbox.bottom+1, f_bbox.left:f_bbox.right+1]  # face crop
                    landmarkGt_p = f_bbox.projectLandmark(landmarkGt)  # to coordinates relative to the face box
                    # horizontal mirror
                    face_flipped, landmark_flipped = flip(f_face, landmarkGt_p)  # mirror the crop and its landmarks together
                    face_flipped = cv2.resize(face_flipped, (39, 39))  # resize to a common size (bilinear by default)
                    F_imgs.append(face_flipped.reshape((1, 39, 39)))  # add a channel axis: (h, w) -> (c, h, w)
                    F_landmarks.append(landmark_flipped.reshape(10))  # flatten the 5x2 label to 10 values
                    ### original image
                    f_face = cv2.resize(f_face, (39, 39))
                    F_imgs.append(f_face.reshape((1, 39, 39)))
                    F_landmarks.append(landmarkGt_p.reshape(10))
    length = len(F_imgs)
    print('length = %d' % length)
    F_imgs, F_landmarks = np.asarray(F_imgs), np.asarray(F_landmarks)  # convert to arrays
    F_imgs = processImage(F_imgs)  # preprocessing: mean subtraction and normalization
    shuffle_in_unison_scary(F_imgs, F_landmarks)  # shuffle images and labels together
    logger("generate %s" % output)
    num = length // 100000  # number of full 100000-sample files
    h5files = []
    for index in range(num):
        suffix = '_%d.h5' % index
        h5file = join(output, mode + suffix)  # full path of the h5 file
        h5files.append(h5file)
        with h5py.File(h5file, 'w') as h5:  # open the h5 file for writing
            h5['data'] = F_imgs[index*100000 : (index + 1)*100000].astype(np.float32)  # images as float32
            h5['landmark'] = F_landmarks[index*100000 : (index + 1)*100000].astype(np.float32)  # labels as float32
    # remaining samples go into the last file
    suffix = '_%d.h5' % num
    h5file = join(output, mode + suffix)
    h5files.append(h5file)
    with h5py.File(h5file, 'w') as h5:
        h5['data'] = F_imgs[num*100000 : length].astype(np.float32)
        h5['landmark'] = F_landmarks[num*100000 : length].astype(np.float32)
    # write the full paths of the h5 files to a list file
    with open(join(output, mode + '.txt'), 'w') as fd:
        for h5file in h5files:
            fd.write(h5file + '\n')

if __name__ == '__main__':
    np.random.seed(int(time.time()))  # seed from the clock so each run shuffles differently
    # train data
    train_txt = join(TRAIN, 'trainImageList.txt')  # join builds the full path from directory and file name
    generate_hdf5(train_txt, OUTPUT, 'train', True)  # training set, with augmentation
    test_txt = join(TRAIN, 'testImageList.txt')
    generate_hdf5(test_txt, OUTPUT, 'test')  # test set, no augmentation
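With the first-stage dataset ready, here is the first-stage network, 1_F_train.prototxt. The input is a 39x39 single-channel grayscale image, the last fully connected layer has 10 outputs for the coordinates of the 5 landmarks, and the final layer is a Euclidean Loss, which accumulates the squared error between the predicted and ground-truth coordinates (both relative values).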
# This file gives the CNN model to predict all landmark in Stage1
name: "landmark_1_F"
layer {
name: "hdf5_train_data"
type: "HDF5Data"
top: "data"
top: "landmark"
include {
phase: TRAIN
}
hdf5_data_param {
source: "../../data/face_fp/1_F/train.txt"
batch_size: 128
}
}
layer {
name: "hdf5_test_data"
type: "HDF5Data"
top: "data"
top: "landmark"
include {
phase: TEST
}
hdf5_data_param {
source: "../../data/face_fp/1_F/test.txt"
batch_size: 64
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 4
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 40
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 60
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv3"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv4"
type: "Convolution"
bottom: "pool3"
top: "conv4"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 80
kernel_size: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "pool3_flat"
type: "Flatten"
bottom: "pool3"
top: "pool3_flat"
}
layer {
name: "conv4_flat"
type: "Flatten"
bottom: "conv4"
top: "conv4_flat"
}
layer {
name: "concat"
type: "Concat"
bottom: "pool3_flat" ###
bottom: "conv4_flat" ###
top: "faker"
concat_param {
concat_dim: 1
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "faker"
top: "fc1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 120
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu_fc1"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu_fc2"
type: "ReLU"
bottom: "fc2"
top: "fc2"
}
layer {
name: "error"
type: "EuclideanLoss"
bottom: "fc2"
bottom: "landmark"
top: "error"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "fc2"
bottom: "landmark"
top: "loss"
include {
phase: TRAIN
}
}
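To check how the 39x39 input shrinks through the layers above, the feature-map sizes can be computed by hand. The sketch below assumes Caffe's usual size rules (convolution rounds down, pooling rounds up) with no padding, as in the listing; the helper names `conv_out` and `pool_out` are illustrative.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output size of a convolution (rounded down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride, pad=0):
    """Output size of a pooling layer (rounded up)."""
    return -(-(size + 2 * pad - kernel) // stride) + 1


size = 39
size = pool_out(conv_out(size, 4), 2, 2)   # conv1 (4x4) + pool1
size = pool_out(conv_out(size, 3), 2, 2)   # conv2 (3x3) + pool2
pool3 = pool_out(conv_out(size, 3), 2, 2)  # conv3 (3x3) + pool3
conv4 = conv_out(pool3, 2)                 # conv4 (2x2)

# fc1 sees pool3 (60 channels) and conv4 (80 channels) flattened and concatenated.
fc1_inputs = 60 * pool3 ** 2 + 80 * conv4 ** 2
```

Concatenating pool3 with conv4 lets the first fully connected layer see features at two scales.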
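The solver file 1_F_solver.prototxt, which holds the training hyperparameters, is shown below. Training can then begin; after 200000 iterations the loss is already very small.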
net: "./1_F_train.prototxt"
test_iter: 55 # ceil(3466/64) = 55
test_interval: 1000
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
#lr_policy: "step"
#gamma: 0.1
#stepsize: 50000
display: 200
max_iter: 500000
snapshot: 20000
snapshot_prefix: "./1_F/"
test_compute_loss: true
solver_mode: GPU
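The "inv" policy in this solver decays the learning rate smoothly as base_lr * (1 + gamma * iter) ^ (-power). A small sketch makes the schedule concrete; the function name `inv_lr` is an assumption, and the default arguments are simply the values from the solver file above.

```python
def inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under Caffe's "inv" policy at a given iteration."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


# Learning rate at the start and at each snapshot interval up to 200000.
schedule = [(it, inv_lr(it)) for it in range(0, 200001, 20000)]
```

With these settings the rate has fallen to roughly a tenth of base_lr by iteration 200000, the snapshot later used for prediction.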
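With 1_F_deploy.prototxt prepared, first look at the predictions made with the first stage alone, shown in the figure below. The first stage finds the approximate landmark positions but is not yet accurate enough, so a second training stage is needed.
The second stage covers the 5 landmarks, with two datasets for each. The first dataset takes a local box centered on the landmark with size (2*0.16*W, 2*0.16*H), where W and H are the width and height of the face box. The box is shifted by a small random amount so that the landmark sits at a random position inside it, and the local image is cropped and resized to 15x15. The second dataset is built the same way, except that the local box is (2*0.18*W, 2*0.18*H). For each landmark the same network architecture is trained on both datasets, giving two trained models; at prediction time the mean of the two predictions is taken as the result, which improves accuracy.
The code that builds these second-stage datasets is in stage2.py. Again, the landmark labels must be converted to coordinates relative to the local box (0~1). The script finally writes the HDF5 dataset files and the matching train.txt and test.txt.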
import time
from collections import defaultdict
import cv2
import numpy as np
import h5py
from common_utils import logger, createDir, getDataFromTxt, getPatch, processImage
from common_utils import shuffle_in_unison_scary
from utils import randomShiftWithArgument #,randomShift

# (landmark index, dataset name, padding): 5 landmarks, two paddings each
types = [(0, 'LE1', 0.16),
         (0, 'LE2', 0.18),
         (1, 'RE1', 0.16),
         (1, 'RE2', 0.18),
         (2, 'N1', 0.16),
         (2, 'N2', 0.18),
         (3, 'LM1', 0.16),
         (3, 'LM2', 0.18),
         (4, 'RM1', 0.16),
         (4, 'RM2', 0.18)]
for t in types:
    d = './2_%s' % t[1]
    createDir(d)

def generate(ftxt, mode, augment=False):
    """
    Generate Training Data for LEVEL-2
    mode = train or test
    """
    data = getDataFromTxt(ftxt)  # list of (full image path, face box, absolute landmark coordinates)
    trainData = defaultdict(lambda: dict(patches=[], landmarks=[]))  # samples per dataset name
    for (imgPath, bbox, landmarkGt) in data:
        img = cv2.imread(imgPath, cv2.IMREAD_GRAYSCALE)  # read as grayscale
        assert(img is not None)  # make sure the image was read
        logger("process %s" % imgPath)
        landmarkGt_p = bbox.projectLandmark(landmarkGt)  # absolute -> relative to the face box
        # two sets of randomly shifted landmarks; 0.05 is the maximum shift relative to the face box
        landmarkPs = randomShiftWithArgument(landmarkGt_p, 0.05, 2)
        if not augment:
            landmarkPs = [landmarkPs[0]]  # the test set uses a single random shift
        for landmarkP in landmarkPs:  # for each set of shifted landmarks
            for idx, name, padding in types:  # for each landmark and padding
                # local patch and its box, centered on the shifted landmark
                patch, patch_bbox = getPatch(img, bbox, landmarkP[idx], padding)
                patch = cv2.resize(patch, (15, 15))  # resize the patch to 15x15
                patch = patch.reshape((1, 15, 15))  # (c, h, w); stacking gives (n, c, h, w)
                trainData[name]['patches'].append(patch)
                # label: the true landmark (absolute) expressed relative to the patch box
                _ = patch_bbox.project(landmarkGt[idx])
                trainData[name]['landmarks'].append(_)
    for idx, name, padding in types:
        logger('writing training data of %s'%name)
        patches = np.asarray(trainData[name]['patches'])
        landmarks = np.asarray(trainData[name]['landmarks'])
        patches = processImage(patches)  # preprocessing: mean subtraction and normalization
        shuffle_in_unison_scary(patches, landmarks)  # shuffle patches and labels together
        with h5py.File('./2_%s/%s.h5'%(name, mode), 'w') as h5:  # write train.h5 or test.h5
            h5['data'] = patches.astype(np.float32)
            h5['landmark'] = landmarks.astype(np.float32)
        with open('./2_%s/%s.txt'%(name, mode), 'w') as fd:  # write train.txt or test.txt with the h5 path
            fd.write('./2_%s/%s.h5'%(name, mode))

if __name__ == '__main__':
    np.random.seed(int(time.time()))  # seed from the clock so each run differs
    # trainImageList.txt
    generate('./trainImageList.txt', 'train', augment=True)  # training set: one extra set of random shifts
    # testImageList.txt
    generate('./testImageList.txt', 'test')
    # Done
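The script above relies on `randomShiftWithArgument` and `getPatch`, whose source is not shown. The sketch below is a guess at the geometry they implement, not the actual helpers: `random_shift`, `patch_box` and `project` are hypothetical names, boxes are (left, right, top, bottom), and cropping and resizing of the image are left out.

```python
import random


def random_shift(landmarks, max_shift, rng=random):
    """Jitter each relative (x, y) by at most +/- max_shift."""
    return [(x + rng.uniform(-max_shift, max_shift),
             y + rng.uniform(-max_shift, max_shift)) for x, y in landmarks]


def patch_box(face_box, rel_point, padding):
    """Box of half-size padding * (W, H) centered on a face-relative point."""
    left, right, top, bottom = face_box
    w, h = right - left, bottom - top
    cx, cy = left + rel_point[0] * w, top + rel_point[1] * h
    return (cx - w * padding, cx + w * padding,
            cy - h * padding, cy + h * padding)


def project(point, box):
    """Absolute (x, y) -> coordinates relative to box."""
    left, right, top, bottom = box
    return ((point[0] - left) / float(right - left),
            (point[1] - top) / float(bottom - top))


face_box = (0, 100, 0, 100)   # hypothetical face box
nose_abs = (50, 50)           # hypothetical ground-truth landmark in pixels
shifted = (0.52, 0.47)        # a jittered patch centre, relative to the face box
box = patch_box(face_box, shifted, 0.16)
label = project(nose_abs, box)  # training label, relative to the patch
```

Because the patch centre is jittered while the label still points at the true landmark, the network learns to correct small errors left by the first stage.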
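There are 5 landmarks with two datasets each, all sharing the same network architecture, so 10 models are trained in the end. Below is 2_LE1_train.prototxt, the training network for the first left-eye dataset in the second stage; the other training networks only need different dataset paths.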
# This file gives the CNN model to predict landmark in Stage2
name: "landmark_2_LE1"
layer {
name: "hdf5_train_data"
type: "HDF5Data"
top: "data"
top: "landmark"
include {
phase: TRAIN
}
hdf5_data_param {
source: "../../data/face_fp/2_LE1/train.txt"
batch_size: 64
}
}
layer {
name: "hdf5_test_data"
type: "HDF5Data"
top: "data"
top: "landmark"
include {
phase: TEST
}
hdf5_data_param {
source: "../../data/face_fp/2_LE1/test.txt"
batch_size: 64
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 4
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 40
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool2"
top: "fc1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 60
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu_fc1"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu_fc2"
type: "ReLU"
bottom: "fc2"
top: "fc2"
}
layer {
name: "error"
type: "EuclideanLoss"
bottom: "fc2"
bottom: "landmark"
top: "error"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "fc2"
bottom: "landmark"
top: "loss"
include {
phase: TRAIN
}
}
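The matching solver file is 2_LE1_solver.prototxt. Note that different landmarks may need different initial learning rates for the model to converge well. Having to train 10 small networks this way is rather tedious…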
net: "./2_LE1_train.prototxt"
test_iter: 55 # ceil(3466/64) = 55
test_interval: 1000
base_lr: 0.005
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
#lr_policy: "step"
#gamma: 0.1
#stepsize: 50000
display: 200
max_iter: 100000
snapshot: 20000
snapshot_prefix: "./2_LE1/"
test_compute_loss: true
solver_mode: GPU
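Training can now begin; after 100000 iterations the loss has more or less stopped falling. Then train the remaining 9 networks. The learning rate may need adjusting, but the batch size does not.
Next prepare 2_LE1_deploy.prototxt for prediction; the other 9 deploy.prototxt files are identical to it. The cascaded prediction results are shown in the figure: the landmarks are more accurate, but robustness is still lacking.
A larger network would predict landmarks more accurately and robustly but would take longer. To balance speed and accuracy, small networks are used in a cascade: a coarse detection first, then fine adjustment of each landmark position.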
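The fine-adjustment step can be sketched without Caffe: each second-stage network outputs a point relative to its own patch, which is mapped back to pixels and then to face-relative coordinates before the two results are averaged. The helper names and the stand-in predictions below are assumptions for illustration; boxes are (left, right, top, bottom).

```python
def patch_box(face_box, rel_point, padding):
    """Box of half-size padding * (W, H) centered on a face-relative point."""
    left, right, top, bottom = face_box
    w, h = right - left, bottom - top
    cx, cy = left + rel_point[0] * w, top + rel_point[1] * h
    return (cx - w * padding, cx + w * padding,
            cy - h * padding, cy + h * padding)


def project(point, box):
    """Absolute (x, y) -> coordinates relative to box."""
    left, right, top, bottom = box
    return ((point[0] - left) / float(right - left),
            (point[1] - top) / float(bottom - top))


def reproject(rel, box):
    """Relative (x, y) in box -> absolute pixel coordinates."""
    left, right, top, bottom = box
    return (left + rel[0] * (right - left), top + rel[1] * (bottom - top))


def refine(face_box, coarse_rel, patch_preds, paddings):
    """Average the patch-level predictions in face-relative coordinates."""
    sx = sy = 0.0
    for pred, padding in zip(patch_preds, paddings):
        box = patch_box(face_box, coarse_rel, padding)  # each padding has its own box
        x, y = project(reproject(pred, box), face_box)
        sx += x
        sy += y
    n = float(len(paddings))
    return (sx / n, sy / n)


# Stand-in network outputs: both patches place the landmark right of centre.
refined = refine((0, 100, 0, 100), (0.5, 0.5),
                 [(0.75, 0.5), (0.75, 0.5)], [0.16, 0.18])
```

Each prediction has to be mapped back through the box of the patch it was computed on; the two patches have different sizes, so the same network output means a different pixel offset in each.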
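Below is landmarks_detection.py, which makes the final landmark predictions. Face detection uses either the cascaded CNN or the OpenCV face detector; landmarks are then predicted inside each detected face, and the predicted relative positions are converted to absolute image coordinates.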
#coding:utf-8
import os
from os.path import join
import cv2
import caffe
import numpy as np
from face_detection_functions import *
from load_model_functions import *
import time
# CNN wrapper: loads a network and returns the result of a forward pass
class CNN(object):
    """
    Generalized CNN for simple run forward with given Model
    """
    def __init__(self, net, model):
        self.net = net
        self.model = model
        self.cnn = caffe.Net(net, model, caffe.TEST) # failed if not exists

    def forward(self, data, layer='fc2'):
        print(data.shape)
        fake = np.zeros((len(data), 1, 1, 1))  # dummy labels required by set_input_arrays
        self.cnn.set_input_arrays(data.astype(np.float32), fake.astype(np.float32))  # pass contiguous data and labels
        self.cnn.forward()  # forward pass
        result = self.cnn.blobs[layer].data[0]  # output of the requested layer
        # reshape the flat coordinate vector into (n, 2) points, e.g. 10 values -> 5x2
        t = lambda x: np.asarray([np.asarray([x[2*i], x[2*i+1]]) for i in range(len(x)//2)])
        result = t(result)
        return result
class BBox(object):
    """
    Bounding Box of face
    """
    def __init__(self, bbox):
        self.left = int(bbox[0])
        self.right = int(bbox[1])
        self.top = int(bbox[2])
        self.bottom = int(bbox[3])
        self.x = bbox[0]
        self.y = bbox[2]
        self.w = bbox[1] - bbox[0]
        self.h = bbox[3] - bbox[2]

    def expand(self, scale=0.05):  # grow the box outwards
        bbox = [self.left, self.right, self.top, self.bottom]
        bbox[0] -= int(self.w * scale)
        bbox[1] += int(self.w * scale)
        bbox[2] -= int(self.h * scale)
        bbox[3] += int(self.h * scale)
        return BBox(bbox)

    def project(self, point):  # absolute point -> coordinates relative to this box
        x = (point[0]-self.x) / self.w
        y = (point[1]-self.y) / self.h
        return np.asarray([x, y])

    def reproject(self, point):  # coordinates relative to this box -> absolute point
        x = self.x + self.w*point[0]
        y = self.y + self.h*point[1]
        return np.asarray([x, y])

    def reprojectLandmark(self, landmark):  # all landmarks: relative -> absolute
        print(len(landmark))
        if not len(landmark) == 5:
            landmark = landmark[0]
        p = np.zeros((len(landmark), 2))
        for i in range(len(landmark)):
            p[i] = self.reproject(landmark[i])
        return p

    def projectLandmark(self, landmark):  # all landmarks: absolute -> relative
        p = np.zeros((len(landmark), 2))
        for i in range(len(landmark)):
            p[i] = self.project(landmark[i])
        return p

    def subBBox(self, leftR, rightR, topR, bottomR):  # sub-box given as ratios of this box
        leftDelta = self.w * leftR
        rightDelta = self.w * rightR
        topDelta = self.h * topR
        bottomDelta = self.h * bottomR
        left = self.left + leftDelta
        right = self.left + rightDelta
        top = self.top + topDelta
        bottom = self.top + bottomDelta
        return BBox([left, right, top, bottom])

    def cropImage(self, img):  # crop the image to this box
        """
        crop img with left,right,top,bottom
        **Make Sure is not out of box**
        """
        return img[self.top:self.bottom+1, self.left:self.right+1]
class Landmarker(object):
    """
    class Landmarker wrapper functions for predicting facial landmarks
    """
    def __init__(self):
        """
        Initialize Landmarker with files under VERSION
        """
        #model_path = join(PROJECT_ROOT, VERSION)
        deploy_path = "../../models/face_fp"
        model_path = "../../models/face_fp"
        CNN_TYPES = ['LE1', 'RE1', 'N1', 'LM1', 'RM1', 'LE2', 'RE2', 'N2', 'LM2', 'RM2']
        level1 = [(join(deploy_path, '1_F_deploy.prototxt'), join(model_path, '1_F/_iter_200000.caffemodel'))]
        level2 = [(join(deploy_path, '2_%s_deploy.prototxt'%name), join(model_path, '2_%s/_iter_100000.caffemodel'%name)) \
                  for name in CNN_TYPES]
        self.level1 = [CNN(p, m) for p, m in level1]  # first-stage network
        self.level2 = [CNN(p, m) for p, m in level2]  # second-stage networks

    def detectLandmark(self, image, bbox):
        """
        Predict landmarks for face with bbox in image
        apply level-1 and level-2
        """
        #if not isinstance(bbox, BBox) or image is None:
        #    return None, False
        face = bbox.cropImage(image)  # crop the face
        #face = image
        #print face.shape
        face = cv2.resize(face, (39, 39))  # resize the face crop to 39x39
        #print face.shape
        face = face.reshape((1, 1, 39, 39))  # to [n, c, h, w]
        face = self._processImage(face)  # normalize
        # level-1, only F is implemented
        landmark = self.level1[0].forward(face)  # stage 1: landmarks relative to the face box
        # level-2
        # stage 2: cut a local patch around each stage-1 landmark for both paddings and regress again
        landmark = self._level(image, bbox, landmark, self.level2, [0.16, 0.18])
        return landmark

    def _level(self, img, bbox, landmark, cnns, padding):
        """
        LEVEL-?
        """
        for i in range(5):  # five landmarks
            x, y = landmark[i]  # landmark predicted by the previous stage
            # first padding: local patch and its box
            patch, patch_bbox1 = self._getPatch(img, bbox, (x, y), padding[0])
            patch = cv2.resize(patch, (15, 15)).reshape((1, 1, 15, 15))  # resize the patch to 15x15
            patch = self._processImage(patch)  # normalize
            d1 = cnns[i].forward(patch)  # prediction relative to the first patch
            # second padding: local patch and its box
            patch, patch_bbox2 = self._getPatch(img, bbox, (x, y), padding[1])
            patch = cv2.resize(patch, (15, 15)).reshape((1, 1, 15, 15))
            patch = self._processImage(patch)
            d2 = cnns[i+5].forward(patch)  # prediction relative to the second patch
            # patch-relative -> absolute -> face-relative, each with its own patch box
            d1 = bbox.project(patch_bbox1.reproject(d1[0]))
            d2 = bbox.project(patch_bbox2.reproject(d2[0]))
            landmark[i] = (d1 + d2) / 2  # average the two predictions
        return landmark

    def _getPatch(self, img, bbox, point, padding):
        """
        Get a patch image around the given point in bbox with padding
        point: relative_point in [0, 1] in bbox
        """
        point_x = bbox.x + point[0] * bbox.w
        point_y = bbox.y + point[1] * bbox.h
        patch_left = point_x - bbox.w * padding
        patch_right = point_x + bbox.w * padding
        patch_top = point_y - bbox.h * padding
        patch_bottom = point_y + bbox.h * padding
        # slice indices must be integers
        patch = img[int(patch_top): int(patch_bottom)+1, int(patch_left): int(patch_right)+1]
        patch_bbox = BBox([patch_left, patch_right, patch_top, patch_bottom])
        return patch, patch_bbox  # patch image and patch box

    def _processImage(self, imgs):
        """
        process images before feeding to CNNs
        imgs: N x 1 x W x H
        """
        imgs = imgs.astype(np.float32)
        for i, img in enumerate(imgs):
            m = img.mean()
            s = img.std()
            imgs[i] = (img - m) / s  # zero mean, unit variance per image
        return imgs
def drawLandmark(img, landmark):
    for x, y in landmark:
        cv2.circle(img, (int(x), int(y)), 2, (0,255,0), -1)
    return img

# Face detection with OpenCV's Haar + AdaBoost cascade
def detectFaces(cascadeCls, img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)
    faces = cascadeCls.detectMultiScale(gray, 1.1, 3, 0, (64,64), (256,256))  # multi-scale face detection
    return faces
if __name__ == '__main__':
    # Cascaded CNN face detection + staged landmark detection
    # ================== load models ======================================
    net_12c_full_conv, net_12_cal, net_24c, net_24_cal, net_48c, net_48_cal = load_face_models(loadNet=True)
    nets = (net_12c_full_conv, net_12_cal, net_24c, net_24_cal, net_48c, net_48_cal)
    min_face_size = 48
    stride = 5
    get_landmark = Landmarker()
    result_folder = './result-folder/'
    test_folder = './test-folder/'
    test_images = os.listdir(test_folder)
    start_time = time.time()
    for test_image in test_images:
        imgPath = test_folder + test_image
        img = cv2.imread(imgPath)
        assert(img is not None)
        print('imgPath: %s' % imgPath)
        print(img.shape)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img_forward = np.array(img, dtype=np.float32)
        img_forward -= np.array((104,117,123))  # subtract the ImageNet mean used when training the cascaded CNN
        rects = detect_faces_net(nets, img_forward, min_face_size, stride, True, 1.414, 0.85)  # cascaded CNN face detection
        for rect in rects:
            cv2.rectangle(img, (rect[0],rect[1]), (rect[2],rect[3]), (255,0,0), 2)
            bbox = BBox([rect[0], rect[2], rect[1], rect[3]])
            final_landmark = get_landmark.detectLandmark(gray, bbox)  # landmarks relative to the face box
            final_landmark = bbox.reprojectLandmark(final_landmark)  # back to absolute coordinates
            img = drawLandmark(img, final_landmark)
        cv2.imwrite(result_folder + test_image, img)
    end_time = time.time()
    print('the time of face detection and feature points location per image: %s ms' % ((end_time - start_time)*1000/len(test_images)))
'''
### OpenCV (Haar + AdaBoost) face detection in a video + staged landmark detection
xmlPath = 'D:/OPENCV2.4.9/opencv/sources/data/haarcascades/haarcascade_frontalface_alt2.xml'
cascadeCls = cv2.CascadeClassifier(xmlPath)  # load the face detection xml file
get_landmark = Landmarker()  # landmark predictor
video = cv2.VideoCapture('himetan.avi')
if video.isOpened():
    success,frame = video.read()
    while success:
        faces = detectFaces(cascadeCls,frame)
        if len(faces) == 0:
            cv2.imshow('image',frame)
        else:
            img = frame.copy()
            gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
            for face in faces:
                bbox = BBox([face[0],face[0]+face[2],face[1],face[1]+face[3]])
                cv2.rectangle(img,(bbox.left,bbox.top),(bbox.right,bbox.bottom),(255,0,0),2)
                final_landmark= get_landmark.detectLandmark(gray, bbox)  # landmarks relative to the face box
                final_landmark = bbox.reprojectLandmark(final_landmark)  # back to absolute coordinates
                img = drawLandmark(img, final_landmark)  # draw all landmarks on the image
            cv2.imshow('image',img)
        if cv2.waitKey(1) > 0:
            break
        success,frame = video.read()
    video.release()
cv2.destroyAllWindows()
'''
'''
### OpenCV (Haar + AdaBoost) face detection on a folder of images + staged landmark detection
xmlPath = 'D:/OPENCV2.4.9/opencv/sources/data/haarcascades/haarcascade_frontalface_alt2.xml'
cascadeCls = cv2.CascadeClassifier(xmlPath)  # load the face detection xml file
result_folder = './result-folder/'
test_folder = './test-folder/'
test_images = os.listdir(test_folder)
get_landmark = Landmarker()  # landmark predictor
start_time = time.time()
for image in test_images:
    img = cv2.imread(test_folder+image)
    #bbox = BBox([320,391,55,152])  # face box: left, right, top, bottom
    faces = detectFaces(cascadeCls,img)
    if len(faces) == 0:
        continue  # no face in this image, move on to the next one
    gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    for face in faces:
        bbox = BBox([face[0],face[0] + face[2],face[1],face[1] + face[3]])
        cv2.rectangle(img, (bbox.left, bbox.top), (bbox.right, bbox.bottom), (0,0,255), 2)
        #cv2.resize(gray,(256,256))
        final_landmark= get_landmark.detectLandmark(gray, bbox)  # landmarks relative to the face box
        final_landmark = bbox.reprojectLandmark(final_landmark)  # back to absolute coordinates
        img = drawLandmark(img, final_landmark)  # draw all landmarks on the image
    #cv2.imwrite(result_folder+'level1-'+image, img)
    #cv2.imwrite(result_folder+'level2-'+image, img)
end_time = time.time()
print('the time of face detection and feature points location per image: %s ms' % ((end_time - start_time)*1000/len(test_images)))
'''