svm_light download: http://www.cs.cornell.edu/People/tj/svm_light/
There are roughly three ways to integrate svm-light into your own project:
1. Call the two executables from inside the project via system();
2. Wrap the source code and embed it in the project;
3. Pull the required files out individually (the original post listed the files needed by svmlight's learn and by its classify in screenshots).
How do you test learn in Visual Studio?
First download the test.dat and model.dat files from the official site, then under
Properties --> Configuration Properties --> Debugging --> Command Arguments enter: [options] example_file model_file
e.g.: -t 2 test.dat model.dat
Other parameters are changed the same way; tuning the options can improve the training result.
Prediction: in classify's Debugging settings enter [options] example_file model_file output_file
e.g. (with no options): test.dat svm_model test.txt
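The same workflow also runs from an ordinary command line instead of the VS debugger. A minimal sketch, assuming svm_learn/svm_classify have been built and are on PATH (toy_train.dat and the model/prediction file names are made up for illustration):

```shell
# svm_light expects the sparse format "<label> <index>:<value> ...",
# with +1/-1 labels and ascending feature indices.
cat > toy_train.dat <<'EOF'
+1 1:0.5 3:1.2
-1 2:0.9 3:-0.4
+1 1:1.1 2:0.2
-1 1:-0.7 3:0.8
EOF

# Training and prediction, mirroring the VS "Command Arguments" above
# (commented out because the binaries may not be installed here):
#   svm_learn -t 2 toy_train.dat model.dat
#   svm_classify toy_test.dat model.dat predictions.txt
wc -l < toy_train.dat   # 4 training examples
```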
================================================================================================================================
Below, svmlight's main parameters are introduced.
options in train:
Available options are:
General options:
-? - this help (prints this help text; it is also shown when the tool is run without an input data file)
-v [0..3] - verbosity level (default 1) (controls how much diagnostic output is printed)
See also http://www.mathworks.cn/cn/help/stats/classificationsvm.resume.html
My understanding is: v=0 prints no information and collects none;
v=1 prints diagnostic information and saves iteration information;
v=2 prints diagnostic information only (the original post illustrated this with a screenshot).
The details are as follows:
Learning options:
-z {c,r,p} - select between classification (c), regression (r), and preference ranking (p) (see [Joachims, 2002c])
(default classification)
(Selects among classification, regression, and preference ranking. A classifier normally only needs classification, which happens to be the default, so in my view this parameter rarely needs changing.)
-c float - C: trade-off between training error and margin (default [avg. x*x]^-1)
(The penalty factor; see http://blog.csdn.net/qll125596718/article/details/6910921.)
It varies over a wide range and should be determined by cross-validation. In general, the larger C is, the closer the objective is to pure empirical-risk minimization. Ideally C should be on the same order of magnitude as |w|; 10 is a commonly suggested default (per other users).
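Since C is best chosen by cross-validation, a common approach is a log-spaced sweep. A sketch that only generates the svm_learn command lines (train.dat and the model-file names are placeholders):

```shell
# Write one svm_learn invocation per candidate C to a helper script;
# each run would produce its own model file for later comparison.
for C in 0.01 0.1 1 10 100; do
  echo "svm_learn -t 2 -c $C train.dat model_c${C}.dat"
done > sweep_c.sh
cat sweep_c.sh
```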
-w [0..] - epsilon width of tube for regression (default 0.1)
(Sets the value of epsilon, the insensitivity width used in regression; normally not needed for classification.)
-j float - Cost: cost-factor, by which training errors on positive examples outweigh errors on negative
examples (default 1) (see [Morik et al., 1999])
(The cost factor for class-asymmetric penalties on the slack; see http://www.blogjava.net/zhenandaci/archive/2009/03/15/259786.html.
In libsvm, the analogous class weight is commonly set from the ratio between the positive and negative sample counts.)
-b [0,1] - use biased hyperplane (i.e. x*w+b>0) instead of unbiased hyperplane (i.e. x*w>0) (default 1)
(With b=1 the decision function keeps the bias term, g(x) = w·x + b; with b=0 the bias is dropped and the
hyperplane passes through the origin, g(x) = w·x.)
-i [0,1] - remove inconsistent training examples and retrain (default 0)
(With i=1, inconsistent training examples are removed and the model is retrained on the remaining data.)
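Following the libsvm-style heuristic mentioned above, a common starting point for -j is the ratio of negative to positive training examples (this heuristic and the awk sketch below are my own suggestion, not from the svm_light docs):

```shell
# Count the +1 / -1 labels in an svm_light training file and print a
# candidate -j value (negatives divided by positives).
cat > imb_train.dat <<'EOF'
+1 1:0.5
-1 1:0.9
-1 2:0.4
-1 3:0.1
EOF
awk '{ if ($1 > 0) p++; else n++ }
     END { printf "suggested -j: %.2f\n", n / p }' imb_train.dat
# prints: suggested -j: 3.00
```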
Performance estimation options:
-x [0,1] - compute leave-one-out estimates (default 0)(see [5])
-o ]0..2] - value of rho for XiAlpha-estimator and for pruning leave-one-out computation (default 1.0)
(see [Joachims, 2002a])
-k [0..100] - search depth for extended XiAlpha-estimator (default 0)
Transduction options (see [Joachims, 1999c], [Joachims, 2002a]):
-p [0..1] - fraction of unlabeled examples to be classified into the positive class (default is the ratio of
positive and negative examples in the training data)
Kernel options: (the kernel settings are the key parameters to tune)
-t int - type of kernel function:
0: linear (default)
1: polynomial (s a*b+c)^d
2: radial basis function exp(-gamma ||a-b||^2) (RBF is the most commonly used)
3: sigmoid tanh(s a*b + c)
4: user defined kernel from kernel.h
-d int - parameter d in polynomial kernel (3 is a common default)
-g float - parameter gamma in rbf kernel (a common default is 1/k, with k the number of features, as in libsvm)
-s float - parameter s in sigmoid/poly kernel
-r float - parameter c in sigmoid/poly kernel (1 is a common default)
-u string - parameter of user defined kernel (lets you pass a parameter string to your own kernel)
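To show how -t combines with the kernel parameters above, here is a sketch that just prints example command lines (train.dat/model.dat are placeholder names and the parameter values are arbitrary):

```shell
{
  echo "svm_learn -t 0 train.dat model.dat"                 # linear
  echo "svm_learn -t 1 -d 3 -s 1 -r 1 train.dat model.dat"  # cubic polynomial
  echo "svm_learn -t 2 -g 0.5 train.dat model.dat"          # RBF, gamma = 0.5
  echo "svm_learn -t 3 -s 1 -r 0 train.dat model.dat"       # sigmoid
} > kernel_examples.txt
cat kernel_examples.txt
```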
Optimization options
(see [Joachims, 1999a], [Joachims, 2002a]):
-q [2..] - maximum size of QP-subproblems (default 10)
(In the most general terms, any minimization problem is an optimization problem (also called a program, as in "programming").
It consists of two parts, an objective function and constraints: minimize f(x) subject to g_i(x) <= 0, i = 1..m.
I am not certain this is exactly the constrained problem -q refers to; for more detail see http://www.blogjava.net/zhenandaci/archive/2009/02/14/254630.html.)
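Concretely, the problem svm_learn solves is the standard soft-margin SVM (textbook formulation, not taken from svm_light's own docs), whose objective and constraints are:

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} \;+\; C\sum_{i=1}^{n}\xi_{i}
\qquad\text{s.t.}\qquad
y_{i}\,(w\cdot x_{i} + b)\;\ge\;1-\xi_{i},\qquad \xi_{i}\ge 0,\;\; i=1,\dots,n
```

The -c flag sets C in this objective, and -q bounds the size of the QP-subproblems into which the problem is decomposed.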
-n [2..q] - number of new variables entering the working set in each iteration (default n = q).
Set n<q to prevent zig-zagging.
(See the explanation of q above.)
-m [5..] - size of cache for kernel evaluations in MB (default 40) The larger the faster...
(Sets the kernel-cache size in MB; default 40.)
-e float - eps: Allow that error for termination criterion
[y [w*x+b] - 1] = eps (default 0.001)
(Sets the tolerance allowed in the termination criterion; default 0.001.)
-h [5..] - number of iterations a variable needs to be optimal before considered for shrinking (default 100)
(How many consecutive iterations a variable must remain optimal before the shrinking heuristic removes it from the active problem.)
-f [0,1] - do final optimality check for variables removed by shrinking. Although this test is usually positive, there
is no guarantee that the optimum was found if the test is omitted. (default 1)
-y string -> if option is given, reads alphas from the file with the given name and uses them as starting point. (default 'disabled')
(Reads initial alpha values from the file.)
-# int -> terminate optimization, if no progress after this number of iterations. (default 100000)
(Stops optimization once no progress has been made for this many iterations; 100000 by default.)
Output options:
-l char - file to write predicted labels of unlabeled examples into after transductive learning
(After transductive learning, writes the predicted labels of the unlabeled examples to a file.)
-a char - write all alphas to this file after learning (in the same order as in the training set)
(After training, writes all alpha values to this file, in the same order as the training set.)
predict:
Available options are:
-h Help. (this help text)
-v [0..3] Verbosity level (default 2). (See the -v option of learn.)
-f [0,1] 0: old output format of V1.0; 1: output the value of decision function (default).
(The original post showed screenshots of both output formats; for the same test file they encode the same predictions.)
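With the default -f 1, svm_classify writes one real-valued decision value per test example, and the sign gives the predicted class. A sketch that thresholds such a file (the values in predictions.txt are made up):

```shell
cat > predictions.txt <<'EOF'
1.73
-0.42
0.05
-2.10
EOF
# sign(decision value) -> predicted label
awk '{ print ($1 >= 0 ? "+1" : "-1") }' predictions.txt
# prints: +1, -1, +1, -1 (one label per line)
```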