As a struggling master's student pushed to the wire by my thesis, I was truly desperate. With little confidence I tried using sklearn's SVM to classify the CU partition problem. It took a long time and I nearly gave up halfway, but the thought that giving up would leave my thesis at barely thirty pages kept me going. The final result does not beat the state of the art, but it should be enough to scrape past the graduation bar.
1. Dataset and Data Preprocessing
The dataset is CPIH, built specifically for the CU partition problem in intra coding; the GitHub link is below. I had previously read the authors' paper on a CNN-based fast CU partition algorithm and spent the time before the new year trying to reproduce it, running into plenty of difficulties; in the end I never got good results because training simply would not converge. Thanks to the paper's author from Beihang University for his helpful email replies. https://github.com/HEVC-Projects/CPIH
I downloaded the dataset and, following the instructions, extracted 36 data files: the combinations of 4 QPs and 3 CU sizes, each further split into Train, Test, and Valid.
Then a Python script parses the data and extracts the features.
Three features are extracted: Variation, Block Flatness, and SubVariation (not described in detail here).
The script follows. parseData is the core function: it reads samples from the dataset, extracts the features and the label, and writes them to a file. When computing some of the features, be careful about numeric overflow (e.g. when squaring 8-bit pixel values). Otherwise there is nothing tricky.
import numpy as np

# get equal numbers of positive and negative samples

def getSubBlock(cuData, cuSize):
    # split a cuSize x cuSize block (stored row-major in cuData) into its four
    # half-size quadrants: sub0/sub1 = top-left/right, sub2/sub3 = bottom-left/right
    t0 = t1 = t2 = t3 = 0
    total_num = cuSize * cuSize
    half_num = total_num // 2    # integer division: keep these as ints in Python 3
    half_size = cuSize // 2
    sub0 = np.zeros(half_size * half_size)
    sub1 = np.zeros(half_size * half_size)
    sub2 = np.zeros(half_size * half_size)
    sub3 = np.zeros(half_size * half_size)
    k = 0
    while k < half_num:          # top half of the block
        i = 0
        while i < half_size:
            sub0[t0] = cuData[k]
            t0 += 1
            k += 1
            i += 1
        while i < cuSize:
            sub1[t1] = cuData[k]
            t1 += 1
            k += 1
            i += 1
    while k < total_num:         # bottom half of the block
        i = 0
        while i < half_size:
            sub2[t2] = cuData[k]
            t2 += 1
            k += 1
            i += 1
        while i < cuSize:
            sub3[t3] = cuData[k]
            t3 += 1
            k += 1
            i += 1
    return sub0, sub1, sub2, sub3

def getSCCD(cuData, size):
    # SubVariation: mean squared deviation of the four sub-block variances
    sub0, sub1, sub2, sub3 = getSubBlock(cuData, size)
    var0 = sub0.var()
    var1 = sub1.var()
    var2 = sub2.var()
    var3 = sub3.var()
    var_mean = (var0 + var1 + var2 + var3) / 4
    SCCD0 = (var0 - var_mean) ** 2
    SCCD1 = (var1 - var_mean) ** 2
    SCCD2 = (var2 - var_mean) ** 2
    SCCD3 = (var3 - var_mean) ** 2
    SCCD = (SCCD0 + SCCD1 + SCCD2 + SCCD3) / 4
    return SCCD

def calBlockFlatness(cuData):
    # BF = (sum(x) / sum(x^2)) * mean(x); square in int64 so that
    # uint8 pixel values do not overflow
    temp = cuData.astype(np.int64) ** 2
    t1 = float(cuData.sum()) / temp.sum()
    t2 = float(cuData.sum()) / cuData.size
    bf = t1 * t2
    return bf

def getCuData(cuData, label):
    # alternative output format: dump the raw pixels followed by the label
    chunk = " ".join(str(v) for v in cuData)
    return chunk + " " + str(label) + "\n"

def parseData(readData, writelabel, dirWritePicPath, size):
    # core function: read samples, keep at most 8000 per class for balance,
    # extract the features and write one "SCCD Var BF label" line per CU
    samplePos = 8000
    sampleNeg = 8000
    while samplePos > 0 or sampleNeg > 0:
        label = readData.read(1)
        if label == b'':                  # EOF: read(1) on a binary file returns b'', not ""
            print("End Processing!")
            break
        if label != b'\x01' and label != b'\x00':
            print("label Error!")
            break
        if label == b'\x01':
            if samplePos > 0:
                label = 1
                samplePos -= 1
            else:                         # positive quota reached: skip this block
                readData.read(size * size)
                continue
        elif label == b'\x00':
            if sampleNeg > 0:
                label = 0
                sampleNeg -= 1
            else:                         # negative quota reached: skip this block
                readData.read(size * size)
                continue
        cuData = readData.read(size * size)
        cuData = np.frombuffer(cuData, dtype=np.uint8)   # pixel values 0 ~ 255
        SCCD = getSCCD(cuData, size)
        cuMean = cuData.mean()            # only used by the commented-out variants
        cuVar = cuData.var()
        cuBF = calBlockFlatness(cuData)
        #chunk = str(cuMean) + " " + str(cuVar) + " " + str(cuBF) + " " + str(label) + "\n"
        #chunk = str(cuVar) + " " + str(cuBF) + " " + str(label) + "\n"
        chunk = str(SCCD) + " " + str(cuVar) + " " + str(cuBF) + " " + str(label) + "\n"
        #chunk = str(SCCD) + " " + str(cuBF) + " " + str(label) + "\n"
        #chunk = str(SCCD) + " " + str(cuVar) + " " + str(label) + "\n"
        #chunk = getCuData(cuData, label)
        writelabel.write(chunk)
    print(samplePos)
    print(sampleNeg)

def parseSet(dirReadData, seqName, cuSize):
    readData = open(dirReadData, 'rb')
    dirWritelabel = '/Users/mengwang/Documents/MyCode/Machine Learning/cuSplit/' + seqName + '_labels.data'
    writelabel = open(dirWritelabel, 'w')
    dirWriteDataPath = '/Users/mengwang/Documents/MyCode/Machine Learning/cuSplit/' + seqName + '/'
    parseData(readData, writelabel, dirWriteDataPath, cuSize)
    readData.close()
    writelabel.close()

def parseWrapper(cuSize, qp, dataSet):
    print("Extracting CU" + str(cuSize) + "_QP" + str(qp) + "_" + dataSet + "...")
    dirReadData = '/Users/mengwang/Documents/MyCode/Machine Learning/cuSplit/data/CU' + str(cuSize) + 'Samples_AI_CPIH_768_1536_2880_4928_qp' + str(qp) + '_' + dataSet + '.dat'
    seqName = 'CU' + str(cuSize) + '_QP' + str(qp) + '_' + dataSet
    parseSet(dirReadData, seqName, cuSize)

if __name__ == "__main__":
    parseWrapper(64, 22, "Train")
    parseWrapper(64, 22, "Test")
In the extracted data, column 1 is SCCD, column 2 is Var, column 3 is BF, and column 4 is the label: split = 1, non-split = 0.
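For reference, the same three features can be computed in compact vectorized form; a sketch equivalent to the loop-based code above (block is a flattened uint8 array, and features is my own helper name):

import numpy as np

def features(block, cu_size):
    # view the flat buffer as a 2-D block so the quadrants can be sliced directly
    px = block.astype(np.float64).reshape(cu_size, cu_size)
    half = cu_size // 2
    quads = [px[:half, :half], px[:half, half:], px[half:, :half], px[half:, half:]]
    sub_vars = np.array([q.var() for q in quads])
    sccd = ((sub_vars - sub_vars.mean()) ** 2).mean()   # SubVariation (SCCD)
    var = px.var()                                      # Variation
    bf = (px.sum() / (px ** 2).sum()) * px.mean()       # Block Flatness
    return sccd, var, bf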
2. SVM Classification
I used sklearn's svm.LinearSVC, choosing a linear SVM mainly for its low computational complexity and ease of integration (the second concern turned out to be unfounded: the C++ libsvm can be integrated into the encoder directly).
The following code loads the data and standardizes it.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, preprocessing
from sklearn.metrics import accuracy_score, classification_report

path = '/Users/mengwang/Documents/MyCode/Machine Learning/cuSplit/CU64_QP22_Train_labels.data'
data = np.loadtxt(path, dtype=str, delimiter=' ')
data = data.astype(float)
x_train, y_train = np.split(data, (3,), axis=1)   # first 3 columns are features, column 4 is the label
print(x_train.mean(axis=0), 'mean of x_train')
print(x_train.std(axis=0), 'std of x_train')
x_train = preprocessing.scale(x_train, axis=0)    # standardize: zero mean, unit variance per feature

path = '/Users/mengwang/Documents/MyCode/Machine Learning/cuSplit/CU64_QP22_Test_labels.data'
data = np.loadtxt(path, dtype=str, delimiter=' ')
data = data.astype(float)
x_test, y_test = np.split(data, (3,), axis=1)
print(x_test.mean(axis=0), 'mean of x_test')
print(x_test.std(axis=0), 'std of x_test')
x_test = preprocessing.scale(x_test, axis=0)
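One caveat: preprocessing.scale standardizes the test set with the test set's own mean and std. Strictly, the training-set statistics should be reused on the test set, which is also what the HM integration in section 3 does. A sketch of the cleaner way with sklearn's StandardScaler:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(x_train)   # learn mean/std on the training set only
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)        # apply the same statistics to the test set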
Next, train the model and print the results:
clf = svm.LinearSVC(random_state=3, multi_class='ovr', C=1, class_weight={1: 1, 0: 1})
clf.fit(x_train, y_train.ravel())
y_pred_train = clf.predict(x_train)
print(accuracy_score(y_train, y_pred_train), 'accuracy in train set')
y_pred = clf.predict(x_test)
print(accuracy_score(y_test, y_pred), 'accuracy in test set')   # accuracy_score is only for classification problems
print(clf.get_params())
print(clf.coef_)
print(clf.intercept_)
print(classification_report(y_test, y_pred, target_names=['non-split', 'split']))
To illustrate that the feature space overlaps, the next plots use only two features, Var and SubVar; after taking log10 of each, I draw the decision regions and a scatter plot (the classifier here is fit on just these two log10 features). The code and figure follow:
x_min, x_max = x_train[:, 0].min() - 1, x_train[:, 0].max() + 1
y_min, y_max = x_train[:, 1].min() - 1, x_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])   # classify every point of the grid
Z = Z.reshape(xx.shape)
plt.xlabel("log10(SubVar)")
plt.ylabel("log10(Var)")
plt.contourf(xx, yy, Z, alpha=0.3)               # decision regions
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train.ravel(), alpha=0.7, marker='+', linewidths=10)
plt.show()
The overlap is clearly visible, and a large partition error means serious coding performance loss. So what if we classify with two SVMs instead, like this:
That looks much better. The trick is the class_weight parameter when initializing the model:
the number before the colon is the class label ("1" or "0"), and the number after it is that class's penalty weight; the default is {1: 1, 0: 1}.
Setting {1: 2, 0: 1} increases the penalty weight of the "split" class, meaning that misclassifying a true "split" sample as "non-split" costs double the loss. As a result, compared with {1: 1, 0: 1}, the precision of the "non-split" predictions is higher! Likewise, to raise the precision of "split" predictions, give label 0 a larger weight.
I wasn't familiar with matplotlib plotting before, but it looks quite formulaic now. To draw that "three-class-looking" figure, just initialize and train two new weighted SVMs, clf and clf0 (a sketch follows), and then run the plotting code below.
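The construction of clf and clf0 isn't shown above; a minimal sketch reusing x_train/y_train, with illustrative weights (the actual values would be tuned per QP and CU size):

clf = svm.LinearSVC(random_state=3, C=1, class_weight={1: 2, 0: 1})    # favors "non-split" precision
clf.fit(x_train, y_train.ravel())
clf0 = svm.LinearSVC(random_state=3, C=1, class_weight={1: 1, 0: 2})   # favors "split" precision
clf0.fit(x_train, y_train.ravel())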
x_min, x_max = x_train[:, 0].min() - 1, x_train[:, 0].max() + 1
y_min, y_max = x_train[:, 1].min() - 1, x_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
Z0 = clf0.predict(np.c_[xx.ravel(), yy.ravel()])
Z0 = Z0.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)    # overlaying the two decision maps leaves a
plt.contourf(xx, yy, Z0, alpha=0.3)   # middle band where the two SVMs disagree
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train.ravel(), alpha=0.7, marker='+', linewidths=10)
plt.xlabel("log10(SubVar)")
plt.ylabel("log10(Var)")
plt.show()
3. Mapping into HM
About 300 lines of code, admittedly a bit messy.
Since there is a dual SVM for every combination of QP and CU size, there are 4 * 3 * 2 = 24 sets of coefficients in total.
Note that the features must be standardized the same way, subtracting the mean and dividing by the standard deviation: X = (X_org - X_mean) / X_std.
The means and standard deviations are those of each feature on the training set.
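Exporting a trained LinearSVC to the encoder only needs clf.coef_ and clf.intercept_; each per-CU check then reduces to standardization plus a dot product. A Python sketch of what the ported C++ code computes (predict_split is my own name):

import numpy as np

def predict_split(feats, mean, std, w, b):
    # standardize with the training-set statistics, then evaluate w.x + b
    x = (np.asarray(feats, dtype=float) - mean) / std
    return float(np.dot(w, x) + b) > 0.0   # positive decision value => "split" (label 1)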
In the end, the coefficients of each dual SVM turned out to be close: essentially only the intercepts need adjusting, while the slope coefficients change only slightly. This is basically the same as my earliest hack of training a single SVM and manually shifting its boundary line up and down by adding or subtracting 1; at the time I couldn't figure out why that worked.
The final result is about 0.8% BD-rate loss with over 40% encoding time saved under the all-intra configuration. I hope that is enough to graduate.
4. Summary
I felt unsure at every step: whether the approach was feasible at all, how to extract the data, the bugs hit during feature extraction, not knowing about data normalization, whether an SVM sitting at around 70% accuracy after training was anywhere near optimal, where there was still room to optimize, what the optimization ceiling would be once wired into the HM code, how the thesis should be written, and so on.