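I recently finished the first homework of the Machine Learning Techniques course and reached a score of 400 after three attempts, although a few questions are still not entirely clear to me. I used libsvm with python3 for the assignment. As an ML enthusiast and a beginner, I am mostly writing this to record how I worked through the homework. It took quite some effort, so please bear with me if anything here is wrong.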
Question 1:
Question 2:
Question 3~4:
-1 1:1.0 2:0.0
-1 1:0.0 2:1.0
-1 1:0.0 2:-1.0
1 1:-1.0 2:0.0
1 1:0.0 2:2.0
1 1:0.0 2:-2.0
1 1:-2.0 2:0.0
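The seven rows above are in LIBSVM's sparse text format: a label followed by index:value pairs. As a minimal sketch of how such rows map to labels and feature dictionaries, here is a small parser. The helper name parse_libsvm_line is made up for illustration; it is not part of libsvm, whose own reader is svm_read_problem.

```python
def parse_libsvm_line(line):
    """Parse 'label idx:val idx:val ...' into (label, {idx: val})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        features[int(idx)] = float(val)
    return label, features

# The seven training rows listed above.
rows = [
    "-1 1:1.0 2:0.0",
    "-1 1:0.0 2:1.0",
    "-1 1:0.0 2:-1.0",
    "1 1:-1.0 2:0.0",
    "1 1:0.0 2:2.0",
    "1 1:0.0 2:-2.0",
    "1 1:-2.0 2:0.0",
]

parsed = [parse_libsvm_line(r) for r in rows]
labels = [label for label, _ in parsed]
print(labels)
print(parsed[4])
```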
Relevant code:
import os
import numpy as np
# switch to libsvm's python directory so that svmutil can be imported
os.chdir('D:/developEnvironment/libsvm-3.20/python')
from svmutil import *

y, x = svm_read_problem('E:/ML/Taiwan_ML/homework2/3/train.txt')
# polynomial kernel (1 + x.x')^2; a very large C approximates the hard-margin SVM
model = svm_train(y, x, '-t 1 -c 10000 -g 1 -r 1 -d 2')
support_vectors = model.get_SV()
support_vector_coefficients = model.get_sv_coef()
sv = np.array(support_vectors)
svc = np.array(support_vector_coefficients)
Finally, print the SVs and SV_COEF values (each sv_coef value is the product of the Lagrange multiplier and the corresponding y):
n   X1   X2   COEF
4   -1    0    0.887
5    0    2    0.150
6    0   -2    0.368
2    0    1   -0.485
3    0   -1   -0.921
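As a sanity check on these numbers, the sketch below plugs the table back into the kernel (1 + x.x')^2 that the options '-t 1 -g 1 -r 1 -d 2' select. It assumes every listed support vector sits exactly on the margin (reasonable here because all coefficients are far below C = 10000) and estimates the bias b from that. The variable names are my own, and the coefficients are the rounded values from the table, so results only agree up to rounding.

```python
import numpy as np

# Support vectors and sv_coef values copied from the table above.
sv = np.array([[-1, 0], [0, 2], [0, -2], [0, 1], [0, -1]], dtype=float)
coef = np.array([0.887, 0.150, 0.368, -0.485, -0.921])
sv_y = np.sign(coef)  # sv_coef = alpha * y with alpha > 0, so its sign is y

# All seven training points and labels, in the order of the data file.
X = np.array([[1, 0], [0, 1], [0, -1], [-1, 0], [0, 2], [0, -2], [-2, 0]],
             dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1, 1], dtype=float)

def poly_kernel(A, B):
    """Second-order polynomial kernel (1 + a.b)^2 for all row pairs."""
    return (1.0 + A @ B.T) ** 2

# Dual constraint: sum of alpha_n * y_n should be zero (up to rounding).
coef_sum = coef.sum()

# Bias estimated from the margin condition y_s * (f(x_s) + b) = 1,
# averaged over the support vectors (assumed to be free SVs).
b = np.mean(sv_y - poly_kernel(sv, sv) @ coef)

scores = poly_kernel(X, sv) @ coef + b
print(coef_sum, b)
print(np.sign(scores) == y)
```

The coefficient sum comes out close to zero, and all seven points land on the correct side of the boundary, with points 1 and 7 strictly outside the margin.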
Points 1 and 7 clearly have multipliers equal to 0, which makes Question 3 straightforward.
Question 5:
Question 6~10:
Question 11:
Question 12:
Question 13:
Question 14:
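I did not derive this rigorously, but my intuition is that the kernel values and w are inversely related: when the kernel is scaled up by a factor of p, w shrinks by a factor of p. In the dual this corresponds to the multipliers α shrinking by p, so for the objective to keep the original optimal solution, their upper bound C must shrink by p as well.
Question 15: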
import os
import numpy as np
os.chdir('D:/developEnvironment/libsvm-3.20/python')
from svmutil import *

y, x = svm_read_problem('E:/ML/Taiwan_ML/homework2/15/train0.txt')
model = svm_train(y, x, '-t 0 -c 0.01')  # linear kernel
support_vectors = model.get_SV()
support_vector_coefficients = model.get_sv_coef()
sv = np.array(support_vectors)
svc = np.array(support_vector_coefficients)

# w = sum over support vectors of (alpha_i * y_i) * x_i
w = np.zeros(2)
for i in range(len(sv)):
    # .get guards against a feature missing from the sparse dict
    w = w + svc[i][0] * np.array([sv[i].get(1, 0.0), sv[i].get(2, 0.0)])
Then output the value of w.
Question 16:
import os
os.chdir('D:/developEnvironment/libsvm-3.20/python')
from svmutil import *

for i in range(0, 5):
    # one classifier per digit: i*2 versus everything else
    outfile = 'E:/ML/Taiwan_ML/homework2/15/train' + str(i*2) + '.txt'
    y, x = svm_read_problem(outfile)
    model = svm_train(y, x, '-t 1 -d 2 -r 1 -g 1 -c 0.01')
    print(str(i*2) + ' vs not ' + str(i*2))
    # predicting on the training set itself gives the in-sample accuracy
    p_label, p_acc, p_val = svm_predict(y, x, model)
The output is:
Question 17:
import os
import numpy as np
os.chdir('D:/developEnvironment/libsvm-3.20/python')
from svmutil import *

for i in range(0, 5):
    outfile = 'E:/ML/Taiwan_ML/homework2/15/train' + str(i*2) + '.txt'
    y, x = svm_read_problem(outfile)
    model = svm_train(y, x, '-t 1 -d 2 -r 1 -g 1 -c 0.01')
    print(str(i*2) + ' vs not ' + str(i*2))
    support_vectors = model.get_SV()
    support_vector_coefficients = model.get_sv_coef()
    sv = np.array(support_vectors)
    svc = np.array(support_vector_coefficients)
    # sv_coef = alpha * y, so the sum of alpha is the sum of |sv_coef|
    res = 0
    for j in range(0, svc.shape[0]):
        if svc[j][0] > 0:
            res = res + svc[j][0]
        else:
            res = res - svc[j][0]
    print('sum of alpha = ' + str(res))
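The sign-flipping loop above can also be written as one NumPy call. The sketch below shows both versions side by side on sample input that is shaped like the return value of model.get_sv_coef() for a two-class model (a list of 1-tuples). The numbers are simply the Question 3~4 coefficients reused as sample data, not results for this question.

```python
import numpy as np

# Sample input only: the Question 3~4 coefficients, in get_sv_coef() shape.
sv_coef = [(0.887,), (0.150,), (0.368,), (-0.485,), (-0.921,)]
svc = np.array(sv_coef)  # shape (5, 1)

# Loop version: add positive coefficients, subtract negative ones.
total_loop = 0.0
for (c,) in sv_coef:
    total_loop += c if c > 0 else -c

# Vectorized version: sv_coef = alpha * y with y in {-1, +1},
# so |sv_coef| is alpha and the sum of absolute values is the sum of alpha.
total_vec = np.abs(svc).sum()

print(total_loop, total_vec)
```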
Question 18:
Question 19:
import os
import math
os.chdir('D:/developEnvironment/libsvm-3.20/python')
from svmutil import *

trainfile = 'E:/ML/Taiwan_ML/homework2/15/train0.txt'
testfile = 'E:/ML/Taiwan_ML/homework2/15/test0.txt'
y, x = svm_read_problem(trainfile)
yt, xt = svm_read_problem(testfile)
for i in range(0, 7):
    gamma = math.pow(10, i)
    # no -t option is given, so libsvm uses its default RBF kernel
    parameter = '-c 0.1 -g ' + str(gamma)
    model = svm_train(y, x, parameter)
    print('g=' + str(gamma))
    # svm_predict prints the accuracy on the test set
    p_label, p_acc, p_val = svm_predict(yt, xt, model)
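The loop above relies on libsvm's RBF kernel, exp(-gamma * ||u - v||^2). To get a feel for what changing gamma does, here is a small NumPy sketch that builds the kernel matrix for three made-up points (they are not from the homework data) and prints the largest off-diagonal entry for each gamma. The function name rbf_kernel is my own.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

# Three made-up 2-D points, for illustration only.
pts = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 1.0]])

for exponent in range(0, 5):
    gamma = 10.0 ** exponent
    K = rbf_kernel(pts, pts, gamma)
    # Subtracting the identity zeroes the diagonal, leaving pairwise similarities.
    off_diag_max = (K - np.eye(len(pts))).max()
    print(gamma, off_diag_max)
```

As gamma grows, the off-diagonal entries shrink towards zero and the kernel matrix approaches the identity, meaning every point looks similar only to itself. That is one common explanation for why very large gamma values tend to overfit.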
Question 20:
I also wrote the validation by hand; each round randomly generates the train and test (validation) sets. The code is as follows:
import os
import random
import math
os.chdir('D:/developEnvironment/libsvm-3.20/python')
from svmutil import *

trainfile = 'E:/ML/Taiwan_ML/homework2/15/train0.txt'
gamaArr = [0, 0, 0, 0, 0]  # how many times each gamma = 10^i wins
for time in range(0, 100):
    acc = 0
    n_acc = 0
    print('run ' + str(time))
    for i in range(0, 5):
        parameter = '-c 0.1 -g ' + str(math.pow(10, i))
        # the file is re-read here, so a fresh random split is drawn for every gamma
        y, x = svm_read_problem(trainfile)
        yt = list()
        xt = list()
        for j in range(0, 1000):
            # move one random example from the training set to the validation set
            index = random.randint(0, len(x) - 1)
            yt.append(y[index])
            xt.append(x[index])
            del y[index]
            del x[index]
        model = svm_train(y, x, parameter)
        p_label, p_acc, p_val = svm_predict(yt, xt, model)
        # keep the gamma with the highest validation accuracy (ties go to the smaller gamma)
        if p_acc[0] > acc:
            acc = p_acc[0]
            n_acc = i
        print(parameter)
        print(p_acc[0])
    print(acc)
    print(n_acc)
    gamaArr[n_acc] = gamaArr[n_acc] + 1
print(gamaArr)
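The random hold-out step in the loop above can be pulled into a small function. The sketch below is one possible way to do it with random.sample on indices. The name split_holdout and the toy data are made up for illustration, and no libsvm call is involved.

```python
import random

def split_holdout(y, x, n_val, rng):
    """Randomly pick n_val examples for validation; the rest stay for training."""
    val_idx = set(rng.sample(range(len(y)), n_val))
    y_train, x_train, y_val, x_val = [], [], [], []
    for i in range(len(y)):
        if i in val_idx:
            y_val.append(y[i])
            x_val.append(x[i])
        else:
            y_train.append(y[i])
            x_train.append(x[i])
    return y_train, x_train, y_val, x_val

# Toy data: ten examples with distinct labels so the split is easy to inspect.
toy_y = list(range(10))
toy_x = [{1: float(i)} for i in range(10)]

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
y_tr, x_tr, y_va, x_va = split_holdout(toy_y, toy_x, 3, rng)
print(len(y_tr), len(y_va))
```

With a helper like this, the split could also be drawn once per round, before the loop over gamma, so that every gamma in that round is judged on the same validation set. Whether that matches the intended procedure depends on the assignment wording.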
Data preparation (generates the train files used in Questions 15~20):
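import numpy as np

data = np.loadtxt('E:/ML/Taiwan_ML/homework2/features.train.txt')
x = data[:, 1:]  # feature columns; column 0 is the digit
for i in range(0, 5):
    # one-vs-rest labels: +1 for digit i*2, -1 for every other digit
    y = data[:, 0:1]
    y = y == i*2
    y = y*2 - 1
    data2 = np.hstack((y, x))
    outfile = 'E:/ML/Taiwan_ML/homework2/15/train' + str(i*2) + '.txt'
    # write libsvm format: "label 1:feature1 2:feature2"
    np.savetxt(outfile, data2, fmt="%d 1:%s 2:%s", newline='\r\n')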
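To make the relabeling and the output format easier to see without touching any files, here is a pure-Python sketch of the same conversion on three made-up rows of the form [digit, feature1, feature2]. The helper names to_libsvm_line and one_vs_rest_lines are my own and the rows are not real homework data.

```python
def to_libsvm_line(label, features):
    """Format one example as 'label 1:v1 2:v2 ...' (indices start at 1)."""
    pairs = " ".join("%d:%s" % (k + 1, v) for k, v in enumerate(features))
    return "%d %s" % (label, pairs)

def one_vs_rest_lines(rows, digit):
    """Label rows whose first column equals digit as +1, all others as -1."""
    lines = []
    for row in rows:
        label = 1 if row[0] == digit else -1
        lines.append(to_libsvm_line(label, row[1:]))
    return lines

# Made-up rows: [digit, feature1, feature2]
rows = [
    [0.0, 0.34, -4.5],
    [2.0, 0.12, -1.0],
    [8.0, 0.5, -0.25],
]

lines = one_vs_rest_lines(rows, 0)
for line in lines:
    print(line)
```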