TensorFlow + CNN-Based Facial Expression Recognition

I have recently been learning the TensorFlow framework. When I got to the chapter on CNNs (convolutional neural networks), I followed the book and built a small facial expression recognition project based on a CNN.

First, a note on my hardware:

CPU: G4560 (yes, I am still using a G4560 in this day and age, believe it or not)

GPU: GTX 1050, 4 GB

My Python version is 3.6 and my TensorFlow version is 1.5.

Specifically, I used the GPU build, tensorflow-gpu 1.5. While running I hit an error about the runtime library version not matching the version the code was compiled against; downgrading cuDNN solved it. Enough preamble, let's get to the main topic.
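Before going further, it can save some head-scratching to confirm which TensorFlow build is loaded and whether the GPU is actually visible. A quick check like this (my own addition, not from the book) will do:

import tensorflow as tf

print(tf.__version__)                # should report 1.5.x here
print(tf.test.is_built_with_cuda())  # True for the tensorflow-gpu build
print(tf.test.gpu_device_name())     # empty string if no GPU is visible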

1. Data Source

  The facial expression dataset I used comes from a Kaggle competition. The URL is https://inclass.kaggle.com/c/facial-keypoints-detector/data, where the dataset can be downloaded, although you need to register a Kaggle account first.

  The dataset contains three .csv files: test.csv, train.csv, and train_identity.csv.

  Each image's expression is labeled with a single digit from 0 to 6: 0 = anger, 1 = disgust, 2 = fear, 3 = happy, 4 = sad, 5 = surprise, 6 = neutral.
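  Before building anything it helps to peek at train.csv. Here is a small optional snippet (my own addition, assuming the same 'Emotion' and 'Pixels' columns used by the helper script in section 3):

import pandas as pd

emotion_names = {0: 'anger', 1: 'disgust', 2: 'fear', 3: 'happy',
                 4: 'sad', 5: 'surprise', 6: 'neutral'}

df = pd.read_csv("EmotionDetector/train.csv")
print(df['Emotion'].map(emotion_names).value_counts())  # class distribution
print(len(df['Pixels'].iloc[0].split()))                # 2304 = 48 * 48 pixel values per row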

2. Network Architecture

The architecture I started with was:

input--->conv1-->pool1-->conv2-->pool2--->fully connected layer--->output layer

input: input layer, 48 × 48 × 1

conv1: convolutional layer, 5 × 5 × 1 × 32

pool1: 2 × 2 max pooling

conv2: convolutional layer, 3 × 3 × 32 × 64

pool2: 2 × 2 max pooling

fully connected layer: 256 neurons; input: the flattened 12 × 12 × 64 tensor

output layer: 7 neurons, one per expression class

Following the book, I started with two convolutional layers, two pooling layers, and two fully connected layers. After training and validation, the model's accuracy was not very good, so I changed the architecture and added two more convolutional layers. The idea comes from the VGG architecture, but because of my hardware I did not follow VGG exactly. The modified architecture is:

input---->conv1---->conv2---->pool1---->conv3----->conv4--->pool2--->fully connected layer--->output layer

input: input layer, 48 × 48 × 1

conv1: convolutional layer, 3 × 3 × 1 × 64

conv2: convolutional layer, 3 × 3 × 64 × 64

pool1: 2 × 2 max pooling

conv3: convolutional layer, 3 × 3 × 64 × 128

conv4: convolutional layer, 3 × 3 × 128 × 128

pool2: 2 × 2 max pooling

fully connected layer: 256 neurons; input: the flattened 12 × 12 × 128 tensor (see the quick shape check after this list)

output layer: 7 neurons, one per expression class
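Here is a quick sanity check (my own, not from the book) of where the 12 × 12 × 128 flattened size comes from: with SAME padding the convolutions keep the spatial size, and each 2 × 2 max-pool layer with stride 2 halves it.

IMAGE_SIZE = 48
pooled = IMAGE_SIZE // 2 // 2     # 48 -> 24 after pool1 -> 12 after pool2
flat_dim = pooled * pooled * 128  # 12 * 12 * 128 = 18432 inputs to the fully connected layer
print(pooled, flat_dim)           # 12 18432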

After validation, the modified model reached about 90% accuracy, a big improvement over the original, though still not especially high. With a larger and more diverse training set, tuned hyperparameters, and further changes to the architecture, a better model should be achievable.
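For example, one cheap way to make the training set more diverse is horizontal flipping, since a mirrored face keeps its expression label. A minimal sketch (my own addition) that works on arrays shaped [N, 48, 48, 1] like the ones used by the training code below:

import numpy as np

def augment_with_flips(images, labels):
    """Mirror every image along the width axis and double the dataset."""
    flipped = images[:, :, ::-1, :]
    return (np.concatenate([images, flipped], axis=0),
            np.concatenate([labels, labels], axis=0))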

3. Building and Training the Model

import tensorflow as tf
import numpy as np
# import os, sys, inspect
from datetime import datetime
import EmotionDetectorUtils


FLAGS = tf.flags.FLAGS
tf.flags.DEFINE_string("data_dir", "EmotionDetector/", "Path to data files")
tf.flags.DEFINE_string("logs_dir", "logs/EmotionDetector_logs/", "Path to where log files are to be saved")
tf.flags.DEFINE_string("mode", "train", "mode: train (Default)/ test")

BATCH_SIZE = 128
LEARNING_RATE = 0.001  # learning rate (passed to the Adam optimizer below)
MAX_ITERATIONS = 1001  # maximum number of training iterations
REGULARIZATION = 1e-2  # weight of the L2 regularization term
IMAGE_SIZE = 48  # input image size (48 x 48)
NUM_LABELS = 7  # number of output classes
VALIDATION_PERCENT = 0.1

# Add L2 penalties for the given weights and biases to the "losses" collection
def add_to_regularization_loss(W, b):
    tf.add_to_collection("losses", tf.nn.l2_loss(W))
    tf.add_to_collection("losses", tf.nn.l2_loss(b))

# Initialize a weight variable of the given shape from a truncated normal distribution
def weight_variable(shape, stddev=0.02, name=None):
    initial = tf.truncated_normal(shape, stddev=stddev)
    if name is None:
        return tf.Variable(initial)
    else:
        return tf.get_variable(name, initializer=initial)

# Initialize a bias variable of the given shape with zeros
def bias_variable(shape, name=None):
    initial = tf.constant(0.0, shape=shape)
    if name is None:
        return tf.Variable(initial)
    else:
        return tf.get_variable(name, initializer=initial)


# Basic stride-1 SAME convolution plus bias (kept from the book's code; not used below)
def conv2d_basic(x, W, bias):
    conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")
    return tf.nn.bias_add(conv, bias)

# 2 x 2 max pooling with stride 2
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding="SAME")

# The CNN model: conv1 -> conv2 -> pool1 -> conv3 -> conv4 -> pool2 -> fc1 -> fc2
def emotion_cnn(dataset):
    print("input dataset's shape-->", dataset.shape)
    with tf.name_scope("conv1") as scope:
        tf.summary.histogram("W_conv1", weights['wc1'])
        tf.summary.histogram("b_conv1", biases['bc1'])
        conv_1 = tf.nn.conv2d(dataset, weights['wc1'],
                              strides=[1, 1, 1, 1], padding="SAME")
        print("conv_1's shape--->", conv_1.shape)
        h_conv1 = tf.nn.bias_add(conv_1, biases['bc1'])
        h_1 = tf.nn.relu(h_conv1)
        # h_pool1 = max_pool_2x2(h_1)
        # print("h_pool1 shape-->", h_pool1.shape)
        add_to_regularization_loss(weights['wc1'], biases['bc1'])

    with tf.name_scope("conv2") as scope:
        tf.summary.histogram("W_conv2", weights['wc2'])
        tf.summary.histogram("b_conv2", biases['bc2'])
        # conv_2 = tf.nn.conv2d(h_pool1, weights['wc2'], strides=[1, 1, 1, 1], padding="SAME")
        conv_2 = tf.nn.conv2d(h_1, weights['wc2'], strides=[1, 1, 1, 1], padding="SAME")
        print("conv_2's shape--->", conv_2.shape)
        h_conv2 = tf.nn.bias_add(conv_2, biases['bc2'])
        h_2 = tf.nn.relu(h_conv2)
        h_pool2 = max_pool_2x2(h_2)
        add_to_regularization_loss(weights['wc2'], biases['bc2'])

    with tf.name_scope("conv3") as scope:
        tf.summary.histogram("W_conv3", weights['wc3'])
        tf.summary.histogram("b_conv3", biases['bc3'])
        conv_3 = tf.nn.conv2d(h_pool2, weights['wc3'], strides=[1, 1, 1, 1], padding="SAME")
        print("conv_3 shape-->", conv_3.shape)
        h_conv3 = tf.nn.bias_add(conv_3, biases['bc3'])
        h_3 = tf.nn.relu(h_conv3)
        # h_pool3 = max_pool_2x2(h_3)
        # print("h_pool3 shape-->", h_pool3.shape)
        add_to_regularization_loss(weights['wc3'], biases['bc3'])

    with tf.name_scope("conv4") as scope:
        tf.summary.histogram("W_conv4", weights['wc4'])
        tf.summary.histogram("b_conv4", biases['bc4'])
        # conv_4 = tf.nn.conv2d(h_pool3, weights['wc4'], strides=[1, 1, 1, 1], padding="SAME")
        conv_4 = tf.nn.conv2d(h_3, weights['wc4'], strides=[1, 1, 1, 1], padding="SAME")
        print("conv_4 shape-->", conv_4.shape)
        h_conv4 = tf.nn.bias_add(conv_4, biases['bc4'])
        h_4 = tf.nn.relu(h_conv4)
        h_pool4 = max_pool_2x2(h_4)
        print("h_pool4 shape-->", h_pool4.shape)
        add_to_regularization_loss(weights['wc4'], biases['bc4'])

    with tf.name_scope("fc_1") as scope:
        prob = 0.5  # dropout keep probability (hard-coded, so dropout is also applied at validation time)
        image_size = IMAGE_SIZE // 4
        h_flat = tf.reshape(h_pool4, [-1, image_size * image_size * 128])
        print("h_flat shape--->", h_flat.shape)
        tf.summary.histogram("W_fc1", weights['wf1'])
        tf.summary.histogram("b_fc1", biases['bf1'])
        h_fc1 = tf.nn.relu(tf.matmul(h_flat, weights['wf1']) + biases['bf1'])
        print("h_fc1'shape--->", h_fc1.shape)
        h_fc1_dropout = tf.nn.dropout(h_fc1, prob)
        print("h_fc1_dropout shape-->", h_fc1_dropout.shape)

    with tf.name_scope("fc_2") as scope:
        tf.summary.histogram("W_fc2", weights['wf2'])
        tf.summary.histogram("b_fc2", biases['bf2'])
        pred = tf.matmul(h_fc1_dropout, weights['wf2']) + biases['bf2']
        print("pred shape-->", pred.shape)

    return pred


weights = {
    'wc1': weight_variable([3, 3, 1, 64], name="W_conv1"),
    'wc2': weight_variable([3, 3, 64, 64], name="W_conv2"),
    'wc3': weight_variable([3, 3, 64, 128], name="W_conv3"),
    'wc4': weight_variable([3, 3, 128, 128], name="W_conv4"),
    'wf1': weight_variable([(IMAGE_SIZE // 4) * (IMAGE_SIZE // 4) * 128, 256], name="W_fc1"),
    'wf2': weight_variable([256, NUM_LABELS], name="W_fc2")
}

biases = {
    'bc1': bias_variable([64], name="b_conv1"),
    'bc2': bias_variable([64], name="b_conv2"),
    'bc3': bias_variable([128], name="b_conv3"),
    'bc4': bias_variable([128], name="b_conv4"),
    'bf1': bias_variable([256], name="b_fc1"),
    'bf2': bias_variable([NUM_LABELS], name="b_fc2")
}


def loss(pred, label):
    cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred, labels=label))
    tf.summary.scalar('Entropy', cross_entropy_loss)
    reg_losses = tf.add_n(tf.get_collection("losses"))
    # tf.summary.scalar('Reg_loss', reg_losses)
    return cross_entropy_loss + REGULARIZATION * reg_losses


def train(loss, step):
    # Adam optimizer using the learning rate defined above
    return tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss, global_step=step)


def get_next_batch(images, labels, step):
    offset = (step * BATCH_SIZE) % (images.shape[0] - BATCH_SIZE)
    batch_images = images[offset: offset + BATCH_SIZE]
    batch_labels = labels[offset:offset + BATCH_SIZE]
    return batch_images, batch_labels


# Entry point
def main(argv=None):
    # Load the data
    train_images, train_labels, valid_images, valid_labels, test_images = EmotionDetectorUtils.read_data(FLAGS.data_dir)
    print("Train size: %s" % train_images.shape[0])
    print('Validation size: %s' % valid_images.shape[0])
    print("Test size: %s" % test_images.shape[0])

    # global_step tracks how many optimization steps have been run so far;
    # trainable=False means TensorFlow will not try to optimize this variable
    global_step = tf.Variable(0, trainable=False)
    dropout_prob = tf.placeholder(tf.float32)  # defined but never fed in; see the note in fc_1
    # Placeholder for the input images: None means any number of images can be fed,
    # each IMAGE_SIZE x IMAGE_SIZE pixels with a single (grayscale) channel
    input_dataset = tf.placeholder(tf.float32, [None, IMAGE_SIZE, IMAGE_SIZE, 1], name="input")
    # Placeholder for the true labels of the images fed into input_dataset
    input_labels = tf.placeholder(tf.float32, [None, NUM_LABELS])
    # Network output (logits)
    pred = emotion_cnn(input_dataset)
    # output_pred holds the predicted class probabilities, used for testing and validation
    output_pred = tf.nn.softmax(pred, name="output")
    correct_prediction = tf.equal(tf.argmax(output_pred, 1), tf.argmax(input_labels, 1))
    # tf.cast(x, dtype, name=None) converts the input to the given dtype
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("accuracy", accuracy)
    # loss_val is the error between the predictions (pred) and the true labels (input_labels)
    loss_val = loss(pred, input_labels)
    # Training operation
    train_op = train(loss_val, global_step)
    # summary_op merges all summaries for TensorBoard visualization
    summary_op = tf.summary.merge_all()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        summary_writer = tf.summary.FileWriter(FLAGS.logs_dir, sess.graph_def)
        # Create a saver so the model can be stored and restored
        saver = tf.train.Saver()
        ckpt = tf.train.get_checkpoint_state(FLAGS.logs_dir)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
            print("Model Restored!")
        # Start training
        for step in range(MAX_ITERATIONS):
            # Fetch the next batch of BATCH_SIZE training samples
            batch_image, batch_label = get_next_batch(train_images, train_labels, step)
            # print("batch image's shape--->", batch_image.shape)
            feed_dict = {input_dataset: batch_image, input_labels: batch_label}
            # Run one optimization step, feeding the batch through the placeholders
            sess.run(train_op, feed_dict=feed_dict)
            if step % 10 == 0:
                train_loss, summary_str = sess.run([loss_val, summary_op], feed_dict=feed_dict)
                summary_writer.add_summary(summary_str, global_step=step)
                print("Training Loss: %f" % train_loss)
            # Every 100 steps, evaluate the model on the validation set and save a checkpoint
            if step % 100 == 0:
                valid_loss = sess.run(loss_val, feed_dict={input_dataset: valid_images, input_labels: valid_labels})
                print("%s Validation Loss: %f" % (datetime.now(), valid_loss))
                print("Accuracy: ", accuracy.eval(feed_dict={input_dataset: valid_images, input_labels: valid_labels}))
                saver.save(sess, FLAGS.logs_dir + 'model.ckpt', global_step=step)


if __name__ == "__main__":
    tf.app.run()

There is also a helper file, EmotionDetectorUtils.py:

import pandas as pd
import numpy as np
import os, sys, inspect
from six.moves import cPickle as pickle
import scipy.misc as misc

IMAGE_SIZE = 48
NUM_LABELS = 7
VALIDATION_PERCENT = 0.1  # use 10 percent of training images for validation

IMAGE_LOCATION_NORM = IMAGE_SIZE // 2

np.random.seed(0)

emotion = {0: 'anger', 1: 'disgust',
           2: 'fear', 3: 'happy',
           4: 'sad', 5: 'surprise', 6: 'neutral'}


class testResult:

    def __init__(self):
        self.anger = 0
        self.disgust = 0
        self.fear = 0
        self.happy = 0
        self.sad = 0
        self.surprise = 0
        self.neutral = 0

    def evaluate(self, label):

        if (0 == label):
            self.anger = self.anger + 1
        if (1 == label):
            self.disgust = self.disgust + 1
        if (2 == label):
            self.fear = self.fear + 1
        if (3 == label):
            self.happy = self.happy + 1
        if (4 == label):
            self.sad = self.sad + 1
        if (5 == label):
            self.surprise = self.surprise + 1
        if (6 == label):
            self.neutral = self.neutral + 1

    def display_result(self, evaluations):
        print("anger = " + str((self.anger / float(evaluations)) * 100) + "%")
        print("disgust = " + str((self.disgust / float(evaluations)) * 100) + "%")
        print("fear = " + str((self.fear / float(evaluations)) * 100) + "%")
        print("happy = " + str((self.happy / float(evaluations)) * 100) + "%")
        print("sad = " + str((self.sad / float(evaluations)) * 100) + "%")
        print("surprise = " + str((self.surprise / float(evaluations)) * 100) + "%")
        print("neutral = " + str((self.neutral / float(evaluations)) * 100) + "%")


def read_data(data_dir, force=False):
    def create_onehot_label(x):
        label = np.zeros((1, NUM_LABELS), dtype=np.float32)
        label[:, int(x)] = 1
        return label

    pickle_file = os.path.join(data_dir, "EmotionDetectorData.pickle")
    if force or not os.path.exists(pickle_file):
        train_filename = os.path.join(data_dir, "train.csv")
        data_frame = pd.read_csv(train_filename)
        data_frame['Pixels'] = data_frame['Pixels'].apply(lambda x: np.fromstring(x, sep=" ") / 255.0)
        data_frame = data_frame.dropna()
        print("Reading train.csv ...")

        train_images = np.vstack(data_frame['Pixels']).reshape(-1, IMAGE_SIZE, IMAGE_SIZE, 1)
        print(train_images.shape)
        train_labels = np.array(list(map(create_onehot_label, data_frame['Emotion'].values))).reshape(-1, NUM_LABELS)
        print(train_labels.shape)

        permutations = np.random.permutation(train_images.shape[0])
        train_images = train_images[permutations]
        train_labels = train_labels[permutations]
        validation_percent = int(train_images.shape[0] * VALIDATION_PERCENT)
        validation_images = train_images[:validation_percent]
        validation_labels = train_labels[:validation_percent]
        train_images = train_images[validation_percent:]
        train_labels = train_labels[validation_percent:]

        print("Reading test.csv ...")
        test_filename = os.path.join(data_dir, "test.csv")
        data_frame = pd.read_csv(test_filename)
        data_frame['Pixels'] = data_frame['Pixels'].apply(lambda x: np.fromstring(x, sep=" ") / 255.0)
        data_frame = data_frame.dropna()
        test_images = np.vstack(data_frame['Pixels']).reshape(-1, IMAGE_SIZE, IMAGE_SIZE, 1)

        with open(pickle_file, "wb") as file:
            try:
                print('Pickling ...')
                save = {
                    "train_images": train_images,
                    "train_labels": train_labels,
                    "validation_images": validation_images,
                    "validation_labels": validation_labels,
                    "test_images": test_images,
                }
                pickle.dump(save, file, pickle.HIGHEST_PROTOCOL)

            except:
                print("Unable to pickle file :/")

    with open(pickle_file, "rb") as file:
        save = pickle.load(file)
        train_images = save["train_images"]
        train_labels = save["train_labels"]
        validation_images = save["validation_images"]
        validation_labels = save["validation_labels"]
        test_images = save["test_images"]

    return train_images, train_labels, validation_images, validation_labels, test_images

4. Model Testing

  Images for testing the model can be downloaded from the web or taken yourself. Since photos you take or download are not as clean as the training data, they contain a lot of noise that hurts accuracy. Running face detection first and feeding only the cropped face region to the classifier should work better; in other words, combining face detection with expression recognition may well give a 1+1>2 effect, though that is only a guess on my part.
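  As a rough illustration of that idea, here is a minimal sketch using OpenCV's bundled Haar cascade to crop the face region before resizing it to the 48 × 48 input the network expects. It assumes opencv-python is installed, and the file name test.jpg is just an example:

import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("test.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces) > 0:
    x, y, w, h = faces[0]
    face = gray[y:y + h, x:x + w]
    face = cv2.resize(face, (48, 48)) / 255.0  # proper interpolation, unlike np.resize
    image_test = face.reshape(1, 48, 48, 1)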

from scipy import misc
import numpy as np
import matplotlib.cm as cm
import tensorflow as tf
import os, sys, inspect
from datetime import datetime
from matplotlib import pyplot as plt
import matplotlib.image as mpimg
import EmotionDetectorUtils
from EmotionDetectorUtils import testResult
import time


def rgb_to_gray(rgb_img):
    """
    Convert an RGB color image to grayscale.
    :param rgb_img: the input image
    :return: the grayscale image
    """
    # print(type(rgb_img))
    return np.dot(rgb_img[..., :3], [0.299, 0.587, 0.114])

l = []
image_list = []
# Load the test images
img = mpimg.imread('myself.jpg')
img2 = mpimg.imread('author_img.jpg')
img3 = mpimg.imread('test.jpg')
img4 = mpimg.imread('test1.jpg')
img5 = mpimg.imread('test2.jpg')
image_list.append(img)
image_list.append(img2)
image_list.append(img3)
image_list.append(img4)
image_list.append(img5)
for img in image_list:
    # Convert each color image to grayscale
    gray = rgb_to_gray(img)
    l.append(gray)


sess = tf.InteractiveSession()
# Restore the model saved during training
new_saver = tf.train.import_meta_graph('logs/EmotionDetector_logs/model.ckpt-1000.meta')
new_saver.restore(sess, 'logs/EmotionDetector_logs/model.ckpt-1000')
tf.get_default_graph().as_graph_def()

x = sess.graph.get_tensor_by_name("input:0")
y_conv = sess.graph.get_tensor_by_name("output:0")


tResult = testResult()
num_evaluation = 1000
for img_gray in l:
    # Note: np.resize does not interpolate, so the test images should already be (close to) 48 x 48 pixels
    image_test = np.resize(img_gray, (1, 48, 48, 1))
    # Display the image
    plt.imshow(img_gray, cmap=plt.get_cmap('gray'))
    plt.show()
    print("Starting evaluation")
    start_time = time.time()
    for i in range(0, num_evaluation):
        result = sess.run(y_conv, feed_dict={x: image_test})
        # Take the most probable class (np.argmax avoids building a new graph op on every iteration)
        label = int(np.argmax(result, 1)[0])
        tResult.evaluate(label)
    end_time = time.time()
    tResult.display_result(num_evaluation)
    print("Elapsed time ----> %s seconds" % (end_time - start_time))

5. Summary

  If you want a really strong facial expression recognition model, you can try enlarging the dataset, improving the network architecture, choosing better hyperparameters, and so on. Before doing any of that, it is worth looking at the loss curves to judge whether the model is currently suffering from high bias or high variance, and then deciding what to work on. Since the training script already writes tf.summary scalars, the curves can be viewed with TensorBoard pointed at logs/EmotionDetector_logs/, or read back directly as sketched below.
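  For reference, here is a small sketch (my own addition) of pulling the cross-entropy values logged by the training script back out of the TensorBoard event file so the curve can be plotted directly; the file name pattern is the standard one TensorFlow uses:

import glob
import tensorflow as tf
import matplotlib.pyplot as plt

event_file = glob.glob("logs/EmotionDetector_logs/events.out.tfevents.*")[0]
steps, losses = [], []
for event in tf.train.summary_iterator(event_file):
    for value in event.summary.value:
        if value.tag == "Entropy":  # the scalar name used in loss() above
            steps.append(event.step)
            losses.append(value.simple_value)

plt.plot(steps, losses, label="training cross-entropy")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()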

  If anything here is wrong, please point it out, thanks. -- a rookie programmer
