Baidu PaddlePaddle >>> 7. Gesture Recognition with Deep Learning


Preface

Today I experimented with gesture recognition using a deep neural network.

Here is the gesture dataset: Dataset.zip

I. Data Preparation

1. Unzipping the data

First, unzip the dataset archive and delete the irrelevant files (just change the directory to your own):

cd /home/aistudio/data/data23668 && unzip -qo Dataset.zip
cd /home/aistudio/data/data23668/Dataset && rm -f */.DS_Store  # delete irrelevant files

Then generate image lists from the pictures in the dataset:

2. Generating the image lists

# Generate the image lists
import os

data_path = '/home/aistudio/data/data23668/Dataset'
character_folders = os.listdir(data_path)

if os.path.exists('./train_data.list'):
    os.remove('./train_data.list')
if os.path.exists('./test_data.list'):
    os.remove('./test_data.list')

with open('./train_data.list', 'a') as f_train, open('./test_data.list', 'a') as f_test:
    for character_folder in character_folders:
        if character_folder == '.DS_Store':
            continue
        character_imgs = os.listdir(os.path.join(data_path, character_folder))
        count = 0
        for img in character_imgs:
            if img == '.DS_Store':
                continue
            # Each list line is "image_path<TAB>label"; every 10th image goes to the test list
            line = os.path.join(data_path, character_folder, img) + '\t' + character_folder + '\n'
            if count % 10 == 0:
                f_test.write(line)
            else:
                f_train.write(line)
            count += 1
print('Lists generated')
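The 1-in-10 train/test split above can be checked in isolation with pure Python (the file names here are made up for illustration):

```python
# Reproduce the split rule used above: every 10th sample goes to the test set
def split_samples(img_names):
    train, test = [], []
    for count, img in enumerate(img_names):
        (test if count % 10 == 0 else train).append(img)
    return train, test

train, test = split_samples(['img_%d.jpg' % i for i in range(20)])
print(len(train), len(test))  # 18 2
```

With 20 images per class this yields an 18/2 split, i.e. roughly 90% training and 10% test data per gesture folder.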

3. Defining the data reader

# Define the readers for the training and test sets
import numpy as np
import paddle
from PIL import Image
from multiprocessing import cpu_count

def data_mapper(sample):
    img, label = sample
    img = Image.open(img)
    img = img.resize((100, 100), Image.ANTIALIAS)
    img = np.array(img).astype('float32')
    img = img.transpose((2, 0, 1))  # HWC -> CHW
    img = img / 255.0               # scale pixel values to [0, 1]
    return img, label

def data_reader(data_list_path):
    def reader():
        with open(data_list_path, 'r') as f:
            for line in f:
                img, label = line.split('\t')
                yield img, int(label)
    return paddle.reader.xmap_readers(data_mapper, reader, cpu_count(), 512)

# Data provider for training
train_reader = paddle.batch(reader=paddle.reader.shuffle(reader=data_reader('./train_data.list'), buf_size=256), batch_size=32)
# Data provider for testing
test_reader = paddle.batch(reader=data_reader('./test_data.list'), batch_size=32)
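What data_mapper does to each image can be verified on a synthetic array, without any real image file; np.array on a PIL RGB image yields exactly this kind of uint8 HWC array:

```python
import numpy as np

# A synthetic 100x100 "RGB image" as a uint8 HWC array, pure red
hwc = np.zeros((100, 100, 3), dtype=np.uint8)
hwc[..., 0] = 255

# Same steps as data_mapper: cast to float32, HWC -> CHW, scale to [0, 1]
chw = hwc.astype('float32').transpose((2, 0, 1)) / 255.0
print(chw.shape)  # (3, 100, 100)
```

After the transpose, channel 0 (red) is all ones and the other channels are all zeros, which is the layout Paddle's Conv2D expects.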


II. Defining the Network

As a starting point, here are two network definitions: a relatively simple fully connected neural network, and a more complex convolutional neural network.

1. Fully connected neural network

# Define the DNN
import paddle.fluid as fluid
from paddle.fluid.dygraph import Linear

class MyDNN(fluid.dygraph.Layer):
    def __init__(self):
        super(MyDNN, self).__init__()
        # Linear acts on the last axis, so these layers mix only the 100 columns of each row
        self.hidden1 = Linear(100, 300, act='relu')
        self.hidden2 = Linear(300, 300, act='relu')
        self.hidden3 = Linear(300, 100, act='relu')
        self.hidden4 = Linear(3 * 100 * 100, 10, act='softmax')

    def forward(self, input):
        x = self.hidden1(input)
        x = self.hidden2(x)
        x = self.hidden3(x)
        x = fluid.layers.reshape(x, shape=[-1, 3 * 100 * 100])  # flatten before the output layer
        y = self.hidden4(x)
        return y

This network has 4 layers in total (excluding the input layer): 3 hidden layers and 1 output layer.

In practice, though, it performs poorly: across many runs it reaches at most **50%-60%** recognition accuracy.
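One way to see where this DNN's capacity sits is to count its parameters (each Linear layer has in_dim×out_dim weights plus out_dim biases); the arithmetic below follows the layer sizes defined above:

```python
def linear_params(in_dim, out_dim):
    # Weights plus biases of one fully connected layer
    return in_dim * out_dim + out_dim

layers = [(100, 300), (300, 300), (300, 100), (3 * 100 * 100, 10)]
counts = [linear_params(i, o) for i, o in layers]
print(counts, sum(counts))  # [30300, 90300, 30100, 300010] 450710
```

Roughly two thirds of the ~450k parameters sit in the final flatten-then-classify layer, while the early layers never mix information across rows or channels, which helps explain the weak results.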

2. Convolutional neural network

import paddle.fluid as fluid
from paddle.fluid.dygraph import Conv2D, Pool2D, Linear

class MyDNN(fluid.dygraph.Layer):
    def __init__(self, training=True):
        super(MyDNN, self).__init__()
        self.conv1 = Conv2D(num_channels=3, num_filters=32, filter_size=3, act='relu')
        self.pool1 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')
        self.conv2 = Conv2D(num_channels=32, num_filters=32, filter_size=3, act='relu')
        self.pool2 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')
        self.conv3 = Conv2D(num_channels=32, num_filters=64, filter_size=3, act='relu')
        self.pool3 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')
        self.conv4 = Conv2D(num_channels=64, num_filters=128, filter_size=3, act='relu')
        self.pool4 = Pool2D(pool_size=2, pool_stride=2, pool_type='max')
        self.conv5 = Conv2D(num_channels=128, num_filters=256, filter_size=3, act='relu')
        self.fc1 = Linear(input_dim=1024, output_dim=5000, act='relu')
        self.drop_ratio = 0.5 if training else 0.0  # disable dropout at inference time
        self.fc2 = Linear(input_dim=5000, output_dim=10)

    def forward(self, input1):
        # conv1 -> pool1 -> conv2 -> pool2 -> conv3 -> pool3 -> conv4 -> pool4 -> conv5 -> fc1 -> dropout -> fc2
        conv1 = self.conv1(input1)
        pool1 = self.pool1(conv1)
        conv2 = self.conv2(pool1)
        pool2 = self.pool2(conv2)
        conv3 = self.conv3(pool2)
        pool3 = self.pool3(conv3)
        conv4 = self.conv4(pool3)
        pool4 = self.pool4(conv4)
        conv5 = self.conv5(pool4)
        rs_1 = fluid.layers.reshape(conv5, [conv5.shape[0], -1])  # flatten to (N, 1024)
        fc1 = self.fc1(rs_1)
        drop1 = fluid.layers.dropout(fc1, self.drop_ratio)
        y = self.fc2(drop1)
        return y

As you can see, this convolutional network is considerably more complex than the fully connected network above;

it contains 5 convolutional layers, 4 pooling layers, and 2 fully connected layers.
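The input_dim=1024 of fc1 comes from tracing the feature-map size through the network: each 3x3 convolution (no padding, stride 1) shrinks the map by 2 pixels, and each 2x2 max pool (stride 2) halves it, so a 100x100 input ends up 2x2 with 256 channels. The trace can be verified directly:

```python
def conv_out(size, kernel=3, stride=1, pad=0):
    # Standard output-size formula for a convolution
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    return (size - kernel) // stride + 1

size = 100
for i in range(5):
    size = conv_out(size)       # each 3x3 conv removes a 1-pixel border
    if i < 4:
        size = pool_out(size)   # pooling follows the first four convs only
print(size, 256 * size * size)  # 2 1024
```

The intermediate sizes are 100 → 98 → 49 → 47 → 23 → 21 → 10 → 8 → 4 → 2, and 256 × 2 × 2 = 1024 matches the flatten in forward.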

III. Model Training

Training is done in dynamic-graph (dygraph) mode:

# Train with a dynamic graph
with fluid.dygraph.guard():
    model = MyDNN(True)  # instantiate the model
    model.train()        # training mode
    # opt = fluid.optimizer.SGDOptimizer(learning_rate=0.01, parameter_list=model.parameters())  # alternative: plain SGD
    opt = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9, parameter_list=model.parameters())
    epochs_num = 160  # number of epochs

    for pass_num in range(epochs_num):

        for batch_id, data in enumerate(train_reader()):

            images = np.array([x[0].reshape(3, 100, 100) for x in data], np.float32)

            labels = np.array([x[1] for x in data]).astype('int64')
            labels = labels[:, np.newaxis]

            image = fluid.dygraph.to_variable(images)
            label = fluid.dygraph.to_variable(labels)
            predict = model(image)  # forward pass (raw logits)
            sf_predict = fluid.layers.softmax(predict)
            # loss = fluid.layers.cross_entropy(predict, label)
            loss = fluid.layers.softmax_with_cross_entropy(predict, label)
            avg_loss = fluid.layers.mean(loss)  # mean loss over the batch

            acc = fluid.layers.accuracy(sf_predict, label)  # batch accuracy

            if batch_id != 0 and batch_id % 50 == 0:
                print("train_pass:{},batch_id:{},train_loss:{},train_acc:{}".format(pass_num, batch_id, avg_loss.numpy(), acc.numpy()))

            avg_loss.backward()
            opt.minimize(avg_loss)
            model.clear_gradients()

    fluid.save_dygraph(model.state_dict(), 'MyDNN')  # save the model
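The loss above uses softmax_with_cross_entropy, which fuses the softmax and the cross-entropy into one numerically stable operation instead of computing cross_entropy on a separately softmaxed output. A minimal NumPy equivalent for hard integer labels (a sketch, not Paddle's actual implementation) looks like this:

```python
import numpy as np

def softmax_with_ce(logits, label):
    # Subtract the row max first: this avoids overflow in exp()
    # without changing the resulting probabilities
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(label)), label]

logits = np.array([[2.0, 1.0, 0.1]])
loss = softmax_with_ce(logits, np.array([0]))
print(loss)
```

This matches -log(softmax(logits)[label]) computed naively, but stays stable for large logits where a plain exp() would overflow.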

After the 160 epochs of training:

IV. Testing

Evaluate the model on the test set to validate it:

# Model validation
with fluid.dygraph.guard():
    accs = []
    model_dict, _ = fluid.load_dygraph('MyDNN')
    model = MyDNN()
    model.load_dict(model_dict)  # load the model parameters
    model.eval()                 # evaluation mode
    for batch_id, data in enumerate(test_reader()):  # test set
        images = np.array([x[0].reshape(3, 100, 100) for x in data], np.float32)
        labels = np.array([x[1] for x in data]).astype('int64')
        labels = labels[:, np.newaxis]

        image = fluid.dygraph.to_variable(images)
        label = fluid.dygraph.to_variable(labels)

        predict = model(image)
        acc = fluid.layers.accuracy(predict, label)
        accs.append(acc.numpy()[0])
    avg_acc = np.mean(accs)
    print(avg_acc)
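Note that accuracy is computed here on the raw logits, without the softmax used in the training loop. That is fine: softmax is strictly monotonic per row, so it never changes which class has the highest score, as a quick NumPy check (with made-up logits) shows:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, -1.0, 0.5],
                   [0.1,  3.0, 1.2]])
# argmax of the logits equals argmax of the softmax probabilities
same = np.array_equal(logits.argmax(axis=-1), softmax(logits).argmax(axis=-1))
print(same)  # True
```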

After those 160 training epochs, this convolutional network reaches over 94% accuracy.

Now test it on a single prediction image:

# Load the prediction image and preprocess it

def load_image(path):
    img = Image.open(path)
    img = img.resize((100, 100), Image.ANTIALIAS)
    img = np.array(img).astype('float32')
    img = img.transpose((2, 0, 1))  # HWC -> CHW
    img = img / 255.0
    print(img.shape)
    return img

# Build the prediction pass in dygraph mode
with fluid.dygraph.guard():
    infer_path = '手勢.JPG'
    model = MyDNN()  # instantiate the model
    model_dict, _ = fluid.load_dygraph('MyDNN')
    model.load_dict(model_dict)  # load the model parameters
    model.eval()  # evaluation mode
    infer_img = load_image(infer_path)
    infer_img = np.array(infer_img).astype('float32')
    infer_img = infer_img[np.newaxis, :, :, :]  # add the batch dimension
    infer_img = fluid.dygraph.to_variable(infer_img)
    result = model(infer_img)
    display(Image.open('手勢.JPG'))
    print(np.argmax(result.numpy()))  # predicted class index

