Note: these are notes taken after reading a blog post; the code comes from someone else's blog.
1. CIFAR-10 image classification with a linear SVM
Blog post: "SVM image classification (Python)". Code repository for the post: https://github.com/452896915/cs231n_course_homework
1.1 Composition of the CIFAR-10 dataset: http://www.cs.toronto.edu/~kriz/cifar.html
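The training set consists of 5 batches of 10,000 images each (50,000 in total). The test set is a single batch of 10,000 images. All images are labeled (10 classes), and each image is 32×32×3.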
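The sketch below reads these batches into arrays. It is minimal and assumes the official "CIFAR-10 python version" files (`data_batch_1` to `data_batch_5` and `test_batch`) have been unpacked into one directory. The function names `parse_cifar_batch`, `load_cifar_batch` and `load_cifar10` are hypothetical helpers and do not come from the blog's repository. The layout they decode is the one described on the CIFAR-10 page: each row of `b'data'` holds the red, green and blue 32×32 planes one after another.

```python
import os
import pickle

import numpy as np


def parse_cifar_batch(batch):
    """Turn one unpickled CIFAR-10 batch dict into (images, labels).

    images: (N, 32, 32, 3) uint8 in R, G, B order; labels: (N,) int64 in 0..9.
    """
    data = np.asarray(batch[b'data'], dtype=np.uint8)  # (N, 3072)
    # Each row: 1024 red values, then 1024 green, then 1024 blue,
    # each block being a 32x32 image stored in row-major order
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    labels = np.asarray(batch[b'labels'], dtype=np.int64)
    return images, labels


def load_cifar_batch(path):
    with open(path, 'rb') as f:
        # The batches were pickled by Python 2, so the keys come back as bytes
        return parse_cifar_batch(pickle.load(f, encoding='bytes'))


def load_cifar10(root):
    """50k training images from data_batch_1..5, 10k test images from test_batch."""
    parts = [load_cifar_batch(os.path.join(root, 'data_batch_%d' % b)) for b in range(1, 6)]
    X_train = np.concatenate([images for images, _ in parts])
    y_train = np.concatenate([labels for _, labels in parts])
    X_test, y_test = load_cifar_batch(os.path.join(root, 'test_batch'))
    return X_train, y_train, X_test, y_test
```

For example, `load_cifar10('cifar-10-batches-py')` should return training images of shape (50000, 32, 32, 3) and test images of shape (10000, 32, 32, 3). `X.reshape(len(X), -1)` then gives the 3072-dimensional raw-pixel vectors used in 1.3.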
1.2 Current results
What test accuracy does the current state of the art reach on CIFAR-10?
1.3 Image classification with a linear SVM
A better blog post on linear SVMs: https://blog.csdn.net/red_stone1/article/details/80661133
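- Method: feed the 32×32×3 image directly into the SVM, with no hand-crafted feature extraction. The input therefore has 3072 dimensions, one feature per pixel channel value.
- The linear SVM classifier was written by the blogger. The key parts are the hinge loss and the derivation of its gradient, which make clear how a multiclass SVM works.
- Training uses mini-batch stochastic gradient descent with a batch size of 200.
- Before any data enters the SVM, the mean of the 50k training samples is subtracted from it. This is the per-feature average over the whole training set.
- For classification, results are discussed in terms of accuracy (here accuracy = 0.35), not training error.
- With 3072 features and 50k training samples, there is essentially no overfitting. Training and test error are nearly the same, and both are well above human-level error, so the model is underfitting.
- A learning curve showing how accuracy changes with the number of training samples can be plotted to check this.
- Because the model is underfitting, adding more samples no longer changes much. Instead:
  - use more (or better) features
  - try a more complex model
  - reduce regularization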
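The pieces listed above can be sketched in NumPy: the hinge loss and its gradient, mini-batch SGD with batch size 200, mean subtraction, and a learning curve. This is a minimal vectorized version in the spirit of the cs231n assignment, not the blogger's exact code. Several details are assumptions: the margin `delta=1.0`, the learning rate, the regularization strength, the initialization, and the helper names (`svm_loss_and_grad`, `train_sgd`, `predict`, `center`, `learning_curve`).

```python
import numpy as np


def svm_loss_and_grad(W, X, y, reg, delta=1.0):
    """Multiclass SVM (hinge) loss and its gradient, fully vectorized.

    W: (D, C) weights, X: (N, D) samples, y: (N,) integer labels in [0, C).
    loss = mean_i sum_{j != y_i} max(0, s_ij - s_iy_i + delta) + reg * ||W||^2
    """
    n = X.shape[0]
    rows = np.arange(n)
    scores = X @ W                          # (N, C) class scores
    correct = scores[rows, y][:, None]      # (N, 1) score of the true class
    margins = np.maximum(0.0, scores - correct + delta)
    margins[rows, y] = 0.0                  # the true class contributes no loss
    loss = margins.sum() / n + reg * np.sum(W * W)

    # Each positive margin adds +x_i to column j and -x_i to column y_i
    coeff = (margins > 0).astype(np.float64)
    coeff[rows, y] = -coeff.sum(axis=1)
    grad = X.T @ coeff / n + 2 * reg * W
    return loss, grad


def train_sgd(X, y, num_classes, lr=1e-3, reg=1e-4, batch_size=200, num_iters=1000, seed=0):
    """Mini-batch SGD on the hinge loss (batch size 200, as in the notes)."""
    rng = np.random.default_rng(seed)
    W = 1e-3 * rng.standard_normal((X.shape[1], num_classes))
    for _ in range(num_iters):
        idx = rng.choice(X.shape[0], batch_size, replace=True)
        _, grad = svm_loss_and_grad(W, X[idx], y[idx], reg)
        W -= lr * grad
    return W


def predict(W, X):
    return np.argmax(X @ W, axis=1)


def center(X_train, X_test):
    """Subtract the per-feature training-set mean from both splits."""
    mean = X_train.mean(axis=0)
    return X_train - mean, X_test - mean


def learning_curve(X_train, y_train, X_test, y_test, num_classes, sizes, **sgd_kwargs):
    """(size, train accuracy, test accuracy) for growing training subsets."""
    points = []
    for m in sizes:
        W = train_sgd(X_train[:m], y_train[:m], num_classes, **sgd_kwargs)
        train_acc = np.mean(predict(W, X_train[:m]) == y_train[:m])
        test_acc = np.mean(predict(W, X_test) == y_test)
        points.append((m, train_acc, test_acc))
    return points
```

To use it on CIFAR-10, flatten the images to (N, 3072) float arrays and pass them through `center`. If the train and test accuracies returned by `learning_curve` stay close together and low as the subset size grows, that is the underfitting pattern described above. A numerical gradient check on a few entries of `W` is a cheap way to validate the analytic gradient before a long training run.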
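1.4 CIFAR-10 image classification with HOG + SVM
Section 1.3 feeds the raw 32×32×3 image straight into the SVM. This approach instead extracts HOG features first and then feeds those to the SVM.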
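- Convert the 3-channel color image to a single-channel grayscale image, then compute the HOG of the grayscale image.
- The HOG feature has 288 dimensions. In the saved feature arrays, an extra last column holds the class label.
- skimage's hog (the commented-out call in the code) uses 9 orientation bins, 8×8-pixel cells and 2×2-cell blocks. For a 32×32 image it gives 3×3 blocks × 36 values = 324 dimensions. The hand-written extractor uses 8 bins instead, so each block has 2×2×8 = 32 values and the total is 9 × 32 = 288.
- Training accuracy is only 0.50, and test accuracy is 0.49.
- So there is essentially no generalization gap, and the model is still underfitting. It needs richer features, a more complex network, and so on.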
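(1) The earlier features were 3072-dimensional but consisted of raw pixels, and the model still underfit, so those features are not good enough.
(2) The features now have only 288 dimensions, yet training accuracy rises to 0.50 from 0.36 before: fewer dimensions, higher accuracy.
(3) In both models the generalization error is about equal to the training error, so the problem is still underfitting.
- Underfitting calls for a more complex model, a better feature representation, and regularization that is not too strong.
(4) The 288-dimensional HOG features raise accuracy by 14 percentage points over the 3072-dimensional pixel features, so features matter a great deal. Even so, this accuracy is still far from human-level performance, which is why neural networks are needed.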
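A few lines are enough to check the 288 vs. 324 arithmetic and the grayscale conversion step. `hog_feature_dim` and `rgb_to_gray` are hypothetical helpers. The first assumes blocks that overlap and slide by one cell, as in both the hand-written extractor and skimage's hog. The second uses the luma weights 0.299, 0.587 and 0.114 that OpenCV documents for `cv2.COLOR_RGB2GRAY`, but returns floats instead of rounding to uint8.

```python
import numpy as np


def hog_feature_dim(height, width, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2)):
    """Length of a HOG vector whose blocks overlap and slide by one cell."""
    cells_h = height // pixels_per_cell[0]
    cells_w = width // pixels_per_cell[1]
    blocks_h = cells_h - cells_per_block[0] + 1
    blocks_w = cells_w - cells_per_block[1] + 1
    values_per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_h * blocks_w * values_per_block


def rgb_to_gray(image):
    """Luma Y = 0.299 R + 0.587 G + 0.114 B for an (H, W, 3) RGB image, as floats."""
    return image[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])


print(hog_feature_dim(32, 32, orientations=9))  # skimage setting -> 324
print(hog_feature_dim(32, 32, orientations=8))  # hand-written extractor -> 288
```

With scikit-image installed, `skimage.feature.hog(np.zeros((32, 32)), orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2)).shape` should likewise be `(324,)`.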
I slightly modified the data-reading part of the code:
import os
import pickle
import time

import cv2
import numpy as np
import tqdm
from skimage.feature import hog  # only needed for the alternative extractor in extract_feats
from sklearn.svm import LinearSVC


class Classifier(object):
    def __init__(self, filePath):
        # Directory containing data_batch_1 ... data_batch_5 and test_batch
        self.filePath = filePath

    def unpickle(self, file):
        # The CIFAR-10 batches were pickled with Python 2; encoding='bytes'
        # keeps the dictionary keys as bytes (b'data', b'labels', b'filenames')
        with open(file, 'rb') as fo:
            batch = pickle.load(fo, encoding='bytes')
        return batch

    def read_batch(self, fileName):
        data = self.unpickle(os.path.join(self.filePath, fileName))
        # Each row of b'data' is 3072 uint8 values: 1024 R, then 1024 G,
        # then 1024 B, each a row-major 32x32 plane
        images = np.reshape(data[b'data'], (10000, 3, 32 * 32))
        labels = np.reshape(data[b'labels'], (10000, 1))
        fileNames = np.reshape(data[b'filenames'], (10000, 1))
        return list(zip(images, labels, fileNames))

    def get_data(self):
        TrainData = []
        # Read data_batch_1 ... data_batch_5 in a fixed order instead of
        # iterating over os.listdir(), whose order is arbitrary
        for b in range(1, 6):
            TrainData.extend(self.read_batch('data_batch_%d' % b))
        TestData = self.read_batch('test_batch')
        print("data read finished!")
        return TrainData, TestData

    def get_hog_feat(self, image, orientations=8, pixels_per_cell=(8, 8), cells_per_block=(2, 2)):
        # Hand-written HOG for a 2-D grayscale image. With the defaults and a
        # 32x32 image: 4x4 cells -> 3x3 overlapping blocks (stride = one cell)
        # -> 3 * 3 * (2 * 2 * 8) = 288 values
        cell_h, cell_w = pixels_per_cell
        block_h, block_w = cells_per_block
        h, w = image.shape
        n_cells_h, n_cells_w = h // cell_h, w // cell_w
        n_blocks_h = n_cells_h - block_h + 1
        n_blocks_w = n_cells_w - block_w + 1
        eps = 1e-5

        # Central differences on interior pixels; border gradients stay 0
        gx = np.zeros((h, w), dtype=np.float32)
        gy = np.zeros((h, w), dtype=np.float32)
        gx[1:-1, 1:-1] = image[1:-1, :-2] - image[1:-1, 2:]
        gy[1:-1, 1:-1] = image[2:, 1:-1] - image[:-2, 1:-1]

        # Signed gradient orientation in [0, 360) degrees, and magnitude
        angle = np.degrees(np.arctan(gy / (gx + eps)))
        angle[gx < 0] += 180
        angle = (angle + 360) % 360
        magnitude = np.sqrt(gx ** 2 + gy ** 2)

        # Orientation bin of every pixel (8 bins of 45 degrees by default)
        bin_width = 360.0 / orientations
        bins = np.minimum((angle // bin_width).astype(int), orientations - 1)

        # Magnitude-weighted orientation histogram of every cell
        cell_hist = np.zeros((n_cells_h, n_cells_w, orientations))
        for r in range(n_cells_h):
            for c in range(n_cells_w):
                rows = slice(r * cell_h, (r + 1) * cell_h)
                cols = slice(c * cell_w, (c + 1) * cell_w)
                cell_hist[r, c] = np.bincount(bins[rows, cols].ravel(),
                                              weights=magnitude[rows, cols].ravel(),
                                              minlength=orientations)

        # Concatenate the cell histograms of each block and normalize per block
        normalised_blocks = np.zeros((n_blocks_h, n_blocks_w, block_h * block_w * orientations))
        for y in range(n_blocks_h):
            for x in range(n_blocks_w):
                hist_block = cell_hist[y:y + block_h, x:x + block_w].ravel()
                # All bins are non-negative, so sqrt(sum(h) ** 2 + eps) is
                # essentially sum(h): this is L1 block normalization
                normalised_blocks[y, x, :] = hist_block / np.sqrt(hist_block.sum() ** 2 + eps)
        return normalised_blocks.ravel()

    def extract_feats(self, dataset):
        feats = []
        for image_flat, label, _ in tqdm.tqdm(dataset):
            # (3, 1024) channel planes -> (1024, 3) -> (32, 32, 3) HWC image
            image = np.reshape(image_flat.T, (32, 32, 3))
            # CIFAR-10 stores the channels as R, G, B, hence RGB2GRAY
            gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) / 255.
            # Use the hand-written HOG above, or skimage's hog
            # (commented out; 9 orientations -> 324 values)
            fd = self.get_hog_feat(gray)
            # fd = hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
            # Append the label as the last column
            feats.append(np.concatenate((fd, label)))
        return np.array(feats)

    def get_feat(self, TrainData, TestData):
        test_feat = self.extract_feats(TestData)
        np.save("test_feat.npy", test_feat)
        print("Test features are extracted and saved.")
        train_feat = self.extract_feats(TrainData)
        np.save("train_feat.npy", train_feat)
        print("Train features are extracted and saved.")
        return train_feat, test_feat

    def classification(self, train_feat, test_feat):
        # Every row is [HOG features..., label]
        X_train, y_train = train_feat[:, :-1], train_feat[:, -1].astype(int)
        X_test, y_test = test_feat[:, :-1], test_feat[:, -1].astype(int)
        t0 = time.time()
        clf = LinearSVC()
        print("Training a Linear SVM Classifier.")
        clf.fit(X_train, y_train)
        test_rate = np.mean(clf.predict(X_test) == y_test)
        t1 = time.time()
        print('The testing classification accuracy is %f' % test_rate)
        print('Time for training and testing: %f s' % (t1 - t0))
        train_rate = np.mean(clf.predict(X_train) == y_train)
        print('The training classification accuracy is %f' % train_rate)

    def run(self):
        # Reuse cached features if present; delete the .npy files to re-extract
        if os.path.exists("train_feat.npy") and os.path.exists("test_feat.npy"):
            train_feat = np.load("train_feat.npy")
            test_feat = np.load("test_feat.npy")
        else:
            TrainData, TestData = self.get_data()
            train_feat, test_feat = self.get_feat(TrainData, TestData)
        self.classification(train_feat, test_feat)


if __name__ == '__main__':
    # filePath = r'F:\DataSets\cifar-10-batches-py'
    filePath = os.path.join('.', 'datasets')
    cf = Classifier(filePath)
    cf.run()