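Machine learning needs a lot of data. After collecting raw images from multiple sources, the faces in those images have to be extracted to build the training set.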
The code is as follows:
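```python
import os
import time
import shutil  # only needed if the "move images without faces" lines below are enabled

import cv2


def getAllPath(dirpath, *suffix):
    """Recursively collect files under dirpath whose extension is in suffix."""
    PathArray = []
    for r, ds, fs in os.walk(dirpath):
        for fn in fs:
            if os.path.splitext(fn)[1] in suffix:
                PathArray.append(os.path.join(r, fn))
    return PathArray


def readPicSaveFace_1(sourcePath, targetPath, invalidPath, *suffix):
    # cv2.imwrite fails silently (returns False) if the folder does not exist
    os.makedirs(targetPath, exist_ok=True)
    ImagePaths = getAllPath(sourcePath, *suffix)
    # haarcascade_frontalface_alt.xml is a pre-trained classifier shipped with OpenCV;
    # it is in data/haarcascades under the OpenCV install directory
    # (with the opencv-python package, cv2.data.haarcascades points to that folder)
    face_cascade = cv2.CascadeClassifier(
        r'E:\opencv-3.4.2\data\haarcascades\haarcascade_frontalface_alt.xml')
    if face_cascade.empty():
        raise IOError('Could not load the cascade file')
    count = 0
    # Check the images one by one and write every detected face to targetPath
    for imagePath in ImagePaths:
        img = cv2.imread(imagePath)
        if img is None:  # imread returns None for unreadable or non-image files
            continue
        try:
            # scaleFactor=1.1, minNeighbors=5
            faces = face_cascade.detectMultiScale(img, 1.1, 5)
        except cv2.error:
            continue
        for (x, y, w, h) in faces:
            # Skip faces narrower or shorter than 16 pixels
            if w < 16 or h < 16:
                continue
            count += 1
            # File name: Unix timestamp followed by the running count
            fileName = '%d%d.jpg' % (int(time.time()), count)
            # Crop bounds, clamped to the image; adjust these to enlarge the crop
            X, Y = int(x), int(y)
            W = min(int(x + w), img.shape[1])
            H = min(int(y + h), img.shape[0])
            cv2.imwrite(os.path.join(targetPath, fileName), img[Y:H, X:W])
        if len(faces):
            print(imagePath + ' has face(s)')
        # else:
        #     shutil.move(imagePath, invalidPath)  # set aside images with no face
    print('Found ' + str(count) + ' faces, saved to ' + targetPath)


if __name__ == '__main__':
    invalidPath = r'E:\unused'
    sourcePath = r'E:\test'
    targetPath1 = r'E:\target\alt'
    # Extensions need the leading dot to match what os.path.splitext returns
    readPicSaveFace_1(sourcePath, targetPath1, invalidPath, '.jpg', '.JPG', '.png', '.PNG')
```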
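The crop step in the script notes that the saved region can be enlarged by adjusting the coordinates. One way to do that is sketched below with a hypothetical helper, `expand_box`, which is not part of OpenCV or the original code. It pads an `(x, y, w, h)` detection by a fraction of its size on every side and clamps the result to the image, so the crop can include some hair and chin. The 20% default margin is an arbitrary assumption, not a value from the original.

```python
def expand_box(x, y, w, h, img_w, img_h, margin=0.2):
    """Grow an (x, y, w, h) box by `margin` of its width/height on each side,
    clamp it to an img_w x img_h image, and return (x0, y0, x1, y1) slice bounds."""
    dx = int(w * margin)  # horizontal padding in pixels
    dy = int(h * margin)  # vertical padding in pixels
    x0 = max(0, x - dx)
    y0 = max(0, y - dy)
    x1 = min(img_w, x + w + dx)
    y1 = min(img_h, y + h + dy)
    return x0, y0, x1, y1


if __name__ == '__main__':
    # A 30x30 face near the bottom-right corner of a 100x100 image:
    # padding is clipped at the image border
    print(expand_box(60, 60, 30, 30, 100, 100, margin=0.5))
```

Inside the detection loop, this would replace the `X, Y, W, H` computation: `x0, y0, x1, y1 = expand_box(int(x), int(y), int(w), int(h), img.shape[1], img.shape[0])`, followed by saving `img[y0:y1, x0:x1]`.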
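I tried every classifier in the haarcascades directory of OpenCV 3.4.2 and settled on four of them: alt, alt2, default, and profileface (haarcascade_frontalface_alt.xml, haarcascade_frontalface_alt2.xml, haarcascade_frontalface_default.xml, and haarcascade_profileface.xml). alt, alt2, and profileface detected fewer faces, but a higher share of their detections were real faces. default detected more faces but also produced more false detections.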
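The trade-off described above is a precision/recall trade-off: alt-style cascades favour precision and default favours recall. To check it on your own data, you could hand-label face boxes in a few images and score each cascade's detections against them. The sketch below uses hypothetical helpers (`iou`, `precision_recall`), not OpenCV API. It assumes boxes in the `(x, y, w, h)` form that `detectMultiScale` returns and an IoU threshold of 0.5, which is a common convention but an assumption here. Because Haar cascades return no confidence scores, matching is done greedily in detection order.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, threshold=0.5):
    """Greedily match each detection to the best unused labeled box with
    IoU >= threshold; return (precision, recall)."""
    matched = set()  # indices of labeled boxes already claimed
    true_pos = 0
    for det in detections:
        best_idx, best_iou = None, threshold
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            score = iou(det, gt)
            if score >= best_iou:
                best_idx, best_iou = i, score
        if best_idx is not None:
            matched.add(best_idx)
            true_pos += 1
    # len() checks also work when detections is a NumPy array
    precision = true_pos / len(detections) if len(detections) else 0.0
    recall = true_pos / len(ground_truth) if len(ground_truth) else 0.0
    return precision, recall


if __name__ == '__main__':
    labeled = [(0, 0, 10, 10), (20, 20, 10, 10)]
    found = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 10, 10)]
    print(precision_recall(found, labeled))
```

To compare cascades, run each one's `detectMultiScale` on the same labeled images and sum true positives, detections, and labeled faces across images before dividing. If the observation above holds for your data, alt, alt2, and profileface should score higher on precision and default higher on recall.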
Reference: https://blog.csdn.net/haohuajie1988/article/details/79163318