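Head Pose Estimation: Principles and Visualization

1. Overview

Head pose estimation recovers the orientation angles of a head from a single face image. In 3D space, the rotation of an object can be described by three Euler angles: pitch (rotation about the X axis), yaw (rotation about the Y axis), and roll (rotation about the Z axis). In everyday terms these correspond to nodding the head up and down, turning it left and right, and tilting it sideways. A picture is worth a thousand words, so here is a schematic: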

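2. Principles

If you are familiar with camera calibration, this is fairly easy to follow, because the hard parts of head pose estimation have already been solved by experts. A classic head pose estimation algorithm generally proceeds in these steps:

(1) detect 2D facial landmarks;
(2) match them to a 3D face model;
(3) solve for the transformation between the 3D points and their corresponding 2D points;
(4) compute the Euler angles from the rotation matrix.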

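As is well known, the pose of an object relative to the camera can be expressed by a rotation matrix and a translation vector:

(1) Translation: the position of the object relative to the camera, denoted T;

(2) Rotation: the orientation of the object relative to the camera, denoted R.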

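This inevitably involves conversions between coordinate systems. Four are used here: the world coordinate system (UVW), the camera coordinate system (XYZ), the image-center coordinate system (uv), and the pixel coordinate system (xy), as shown in the figure below: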

World coordinate system to camera coordinate system:
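The equation image from the original post is not available here. Using the symbols defined above (world coordinates UVW, camera coordinates XYZ, rotation R, translation T), the standard rigid transform would read as follows; this is a reconstruction, not the author's exact figure.

```latex
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= R \begin{bmatrix} U \\ V \\ W \end{bmatrix} + T
```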

Camera coordinate system to pixel coordinate system:
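For an ideal pinhole camera without distortion, this step is usually written with an intrinsic matrix. The symbols f_x, f_y (focal lengths in pixels), c_x, c_y (principal point), and the scale factor s are assumptions introduced here for illustration; they do not appear in the original text.

```latex
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
  \begin{bmatrix} X \\ Y \\ Z \end{bmatrix},
\qquad s = Z
```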

The relationship between the pixel coordinate system and the world coordinate system is:
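Chaining the rigid transform and the pinhole projection gives the usual combined form. Here K denotes the 3x3 camera intrinsic matrix (focal lengths and principal point) and s is a scale factor equal to the depth Z; both names are assumptions, since the original equation image is missing.

```latex
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= K \begin{bmatrix} R & T \end{bmatrix}
  \begin{bmatrix} U \\ V \\ W \\ 1 \end{bmatrix}
```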

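The equation above can be solved with the DLT (Direct Linear Transform) algorithm combined with iterative least-squares refinement. Real cameras are never perfect: once radial and tangential distortion are taken into account, the relationship becomes somewhat more complicated. Camera coordinates are first converted to image-center coordinates, and image-center coordinates are then converted to pixel coordinates: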
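One common way to write this two-stage mapping is the distortion model used by OpenCV with four coefficients. This is a sketch under assumptions: (u, v) is taken to be the normalized image-center coordinate, k_1 and k_2 are radial coefficients, p_1 and p_2 are tangential coefficients, and f_x, f_y, c_x, c_y are the camera intrinsics. None of these symbols are given in the original text.

```latex
\begin{aligned}
u &= X / Z, \qquad v = Y / Z, \qquad r^2 = u^2 + v^2 \\
u_d &= u\,(1 + k_1 r^2 + k_2 r^4) + 2 p_1 u v + p_2 (r^2 + 2u^2) \\
v_d &= v\,(1 + k_1 r^2 + k_2 r^4) + p_1 (r^2 + 2v^2) + 2 p_2 u v \\
x &= f_x u_d + c_x, \qquad y = f_y v_d + c_y
\end{aligned}
```

With all four coefficients set to zero, this reduces to the ideal pinhole projection.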

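Determining the pose means determining the transformation from the 3D model to the face in the image, which contains both rotation and translation. Given the positions of points in the world coordinate system, their pixel coordinates, and the camera parameters, the rotation and translation can be solved for. The relationship is clearly nonlinear, but OpenCV already provides a function for the PnP problem, solvePnP(). Its outputs include a rotation vector and a translation vector. Only the rotation matters here, so the rest of this section works with the rotation vector.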
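To make the PnP idea concrete: with its iterative method, solvePnP looks for the rotation and translation that minimize the reprojection error, i.e. the distance between the observed pixels and the projected model points. The sketch below computes that error in plain Python for an ideal pinhole camera without distortion. The function names `project_point` and `reprojection_error` are illustrative and are not part of OpenCV.

```python
import math


def project_point(point, R, T, focal_length, center):
    """Project a 3D world point to pixel coordinates (ideal pinhole, no distortion)."""
    # World -> camera: Xc = R * Xw + T
    cam = [sum(R[i][j] * point[j] for j in range(3)) + T[i] for i in range(3)]
    # Camera -> pixel: perspective divide, then scale and shift by the intrinsics
    x = focal_length * cam[0] / cam[2] + center[0]
    y = focal_length * cam[1] / cam[2] + center[1]
    return (x, y)


def reprojection_error(model_points, image_points, R, T, focal_length, center):
    """Root-mean-square pixel distance between observed and projected points."""
    total = 0.0
    for p3d, p2d in zip(model_points, image_points):
        x, y = project_point(p3d, R, T, focal_length, center)
        total += (x - p2d[0]) ** 2 + (y - p2d[1]) ** 2
    return math.sqrt(total / len(model_points))


# Hypothetical setup: camera looking straight at a model placed 1000 units away
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
translation = [0.0, 0.0, 1000.0]
points_3d = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
points_2d = [project_point(p, identity, translation, 500.0, (320.0, 240.0))
             for p in points_3d]
```

A pose that explains the observations perfectly yields an error of zero. Any other rotation or translation raises it, and that is the quantity the solver drives down.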

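Once the rotation is known, the Euler angles can be derived. The rotation vector is one way of representing an object's rotation, and it is the representation OpenCV commonly uses. Other representations include Euler angles, the rotation matrix, the direction cosine matrix, quaternions, and the axis-angle representation. Since Euler angles are what I need, only the conversion from a rotation vector to Euler angles is covered here.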

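Any rotation in 3D space can be expressed as a rotation by some angle about some axis; this is the axis-angle representation. The axis can be given as a 3D vector (x, y, z) and theta as an angle value, so intuitively a 4D vector (theta, x, y, z) can describe any rotation in 3D space.
Note that the vector (x, y, z) only indicates the direction of the axis. A more compact form therefore uses a unit vector for the axis direction and encodes the angle theta as the length of the vector. In this way a single 3D vector (theta*x, theta*y, theta*z) can describe any rotation in 3D space, provided (x, y, z) is a unit vector. This is the rotation vector representation.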
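The relationship between the two forms can be sketched in a few lines of plain Python. The helper names below are hypothetical, and the axis returned for a zero rotation is an arbitrary choice, since any axis describes "no rotation".

```python
import math


def rotation_vector_to_axis_angle(rvec):
    """Split a rotation vector into (theta, unit_axis)."""
    theta = math.sqrt(sum(c * c for c in rvec))  # vector length encodes the angle
    if theta < 1e-12:
        # No rotation: the axis is undefined, so return an arbitrary one
        return 0.0, (1.0, 0.0, 0.0)
    axis = tuple(c / theta for c in rvec)  # direction encodes the axis
    return theta, axis


def axis_angle_to_rotation_vector(theta, axis):
    """Scale a unit axis by theta to obtain the rotation vector."""
    return tuple(theta * c for c in axis)


# A quarter turn about the Z axis
theta, axis = rotation_vector_to_axis_angle((0.0, 0.0, math.pi / 2))
```

Here `theta` comes out as pi/2 and `axis` as (0, 0, 1), and scaling the axis by theta recovers the original vector.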

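Quaternions are another common rotation representation. The formula for converting a quaternion to Euler angles is fairly simple, so I first convert the rotation vector to a quaternion. Let (x, y, z) be the unit vector along the axis and theta the angle of rotation about it; the quaternion is then: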
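The equation image is missing from this copy. The standard unit-quaternion form for a rotation by theta about the unit axis (x, y, z) is given below. The component names q_w, q_x, q_y, q_z are introduced here to keep them distinct from the axis components.

```latex
q = (q_w,\; q_x,\; q_y,\; q_z)
  = \left( \cos\frac{\theta}{2},\;
           x \sin\frac{\theta}{2},\;
           y \sin\frac{\theta}{2},\;
           z \sin\frac{\theta}{2} \right)
```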

The conversion from a quaternion to Euler angles is:
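The original formula image is not available. Assuming the common Z-Y-X composition R = R_z(roll) R_y(yaw) R_x(pitch), with pitch about X, yaw about Y, and roll about Z as named in this article, and a unit quaternion (q_w, q_x, q_y, q_z), the widely used conversion is:

```latex
\begin{aligned}
\text{pitch} &= \arctan\frac{2\,(q_w q_x + q_y q_z)}{1 - 2\,(q_x^2 + q_y^2)} \\
\text{yaw}   &= \arcsin\bigl( 2\,(q_w q_y - q_z q_x) \bigr) \\
\text{roll}  &= \arctan\frac{2\,(q_w q_z + q_x q_y)}{1 - 2\,(q_y^2 + q_z^2)}
\end{aligned}
```

Other axis orders lead to different formulas, so the convention should be checked against whatever produced the quaternion.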

arctan and arcsin return values in [-pi/2, pi/2], which does not cover all possible Euler angles, so atan2 is used in place of arctan:
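A minimal sketch of the whole rotation vector to quaternion to Euler angle chain, in plain Python, is shown here. It assumes the Z-Y-X composition R = R_z(roll) R_y(yaw) R_x(pitch) with this article's axis naming. The function names are illustrative, and the arguments `w, x, y, z` of `quaternion_to_euler` are quaternion components, not axis coordinates.

```python
import math


def rotation_vector_to_quaternion(rvec):
    """Convert a rotation vector (theta * unit_axis) to a unit quaternion (w, x, y, z)."""
    theta = math.sqrt(sum(c * c for c in rvec))
    if theta < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)  # identity rotation
    s = math.sin(theta / 2.0) / theta  # rvec * s == unit_axis * sin(theta / 2)
    return (math.cos(theta / 2.0), rvec[0] * s, rvec[1] * s, rvec[2] * s)


def quaternion_to_euler(w, x, y, z):
    """Return (pitch, yaw, roll) in degrees for a unit quaternion."""
    # pitch: rotation about the X axis; atan2 covers the full (-180, 180] range
    pitch = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # yaw: rotation about the Y axis; clamp to guard against rounding outside [-1, 1]
    sin_yaw = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    yaw = math.asin(sin_yaw)
    # roll: rotation about the Z axis
    roll = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return tuple(math.degrees(a) for a in (pitch, yaw, roll))


# A 200-degree turn about X: plain arctan could not place this in the right quadrant
q = rotation_vector_to_quaternion((math.radians(200.0), 0.0, 0.0))
pitch, yaw, roll = quaternion_to_euler(*q)
```

With atan2 the 200-degree rotation is reported as the equivalent pitch of -160 degrees. A single-argument arctan would fold it into [-90, 90] and lose the quadrant.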

3. Code Implementation

import cv2
import math
import numpy as np


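def face_orientation(frame, landmarks):
    # landmarks is a flat sequence of 12 integers: x, y pairs for the left eye,
    # right eye, nose tip, left mouth corner, right mouth corner and chin.
    size = frame.shape  # (height, width, color_channel)

    # 2D landmark positions in the image, in the same order as model_points
    image_points = np.array([
                            (landmarks[4], landmarks[5]),     # Nose tip
                            (landmarks[10], landmarks[11]),   # Chin
                            (landmarks[0], landmarks[1]),     # Left eye left corner
                            (landmarks[2], landmarks[3]),     # Right eye right corner
                            (landmarks[6], landmarks[7]),     # Left mouth corner
                            (landmarks[8], landmarks[9])      # Right mouth corner
                        ], dtype="double")
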
    # Matching 3D points on a generic face model (nose tip at the origin)
    model_points = np.array([
                            (0.0, 0.0, 0.0),             # Nose tip
                            (0.0, -330.0, -65.0),        # Chin
                            (-165.0, 170.0, -135.0),     # Left eye left corner
                            (165.0, 170.0, -135.0),      # Right eye right corner
                            (-150.0, -150.0, -125.0),    # Left mouth corner
                            (150.0, -150.0, -125.0)      # Right mouth corner
                        ])

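    # Camera internals: approximated from the image size, taking the image
    # center as the principal point and a 60-degree horizontal field of view
    center = (size[1]/2, size[0]/2)
    focal_length = center[0] / np.tan(60/2 * np.pi / 180)   # focal length in pixels
    camera_matrix = np.array(
                         [[focal_length, 0, center[0]],
                         [0, focal_length, center[1]],
                         [0, 0, 1]], dtype="double"
                         )

    dist_coeffs = np.zeros((4, 1))  # Distortion coefficients; assuming no lens distortion
    # Solve for the rotation vector and the translation vector
    # (cv2.CV_ITERATIVE from OpenCV 2.x is named cv2.SOLVEPNP_ITERATIVE in OpenCV 3 and later)
    (success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)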


    # End points of the model's X, Y and Z axes, projected for visualization
    axis = np.float32([[500, 0, 0],
                          [0, 500, 0],
                          [0, 0, 500]])

    imgpts, jac = cv2.projectPoints(axis, rotation_vector, translation_vector, camera_matrix, dist_coeffs)
    modelpts, jac2 = cv2.projectPoints(model_points, rotation_vector, translation_vector, camera_matrix, dist_coeffs)
    rvec_matrix = cv2.Rodrigues(rotation_vector)[0]  # rotation vector -> 3x3 rotation matrix

    proj_matrix = np.hstack((rvec_matrix, translation_vector))  # 3x4 matrix [R | T]
    eulerAngles = cv2.decomposeProjectionMatrix(proj_matrix)[6]  # Euler angles in degrees, shape (3, 1)

    # Flatten first so that each angle is a scalar rather than a 1-element array
    pitch, yaw, roll = [math.radians(angle) for angle in eulerAngles.ravel()]

    # Fold each angle into [-90, 90] degrees; roll is negated
    pitch = math.degrees(math.asin(math.sin(pitch)))
    roll = -math.degrees(math.asin(math.sin(roll)))
    yaw = math.degrees(math.asin(math.sin(yaw)))

    return imgpts, modelpts, (str(int(roll)), str(int(pitch)), str(int(yaw))), (landmarks[4], landmarks[5])

# Each line of landmark.txt: image path followed by 12 integer landmark coordinates
with open('/home/jerry/Documents/test/test/landmark.txt', 'r') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        img_info = line.split()
        img_path = img_info[0]
        frame = cv2.imread(img_path)
        landmarks = list(map(int, img_info[1:]))  # a list, so it can be indexed in Python 3

        print(img_path)
        imgpts, modelpts, rotate_degree, nose = face_orientation(frame, landmarks)

        # Draw the projected axes from the nose tip (colors are in BGR order);
        # OpenCV drawing functions expect integer pixel coordinates
        cv2.line(frame, nose, tuple(int(v) for v in imgpts[1].ravel()), (0, 255, 0), 3)  # GREEN
        cv2.line(frame, nose, tuple(int(v) for v in imgpts[0].ravel()), (255, 0, 0), 3)  # BLUE
        cv2.line(frame, nose, tuple(int(v) for v in imgpts[2].ravel()), (0, 0, 255), 3)  # RED

        # Map landmark order in the file to the order used in model_points
        remapping = [2, 3, 0, 4, 5, 1]
        for index in range(len(landmarks) // 2):
            random_color = tuple(int(c) for c in np.random.randint(0, 256, size=3))

            # Large dot: detected landmark; small dot: reprojected model point
            cv2.circle(frame, (landmarks[index*2], landmarks[index*2+1]), 5, random_color, -1)
            cv2.circle(frame, tuple(int(v) for v in modelpts[remapping[index]].ravel()), 2, random_color, -1)

#        cv2.putText(frame, rotate_degree[0]+' '+rotate_degree[1]+' '+rotate_degree[2], (10, 30),
#                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0),
#                    thickness=2, lineType=2)

        # Write roll, pitch and yaw (in that order) onto the image
        for j in range(len(rotate_degree)):
            cv2.putText(frame, ('{:05.2f}').format(float(rotate_degree[j])), (10, 30 + (50 * j)), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), thickness=2, lineType=2)

        cv2.imwrite('test2/' + img_path.split('/')[1], frame)
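One step in face_orientation that may look odd is passing each angle through asin(sin(...)). This folds an angle into the range [-90, 90] degrees. That is presumably there so that values near +/-180 degrees are reported as small head rotations, although the original post does not state the reason. The helper name `fold_angle` below is hypothetical and only isolates that step.

```python
import math


def fold_angle(deg):
    """Fold an angle in degrees into [-90, 90] using asin(sin(angle))."""
    return math.degrees(math.asin(math.sin(math.radians(deg))))


# Angles already inside [-90, 90] are unchanged; others are reflected back into it
folded = [round(fold_angle(a), 6) for a in (30.0, 170.0, -170.0)]
```

For these inputs, 30 stays at 30, 170 becomes 10, and -170 becomes -10.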

Result of running the face_orientation script:
