svmlight Usage Notes

svm_light download page: http://www.cs.cornell.edu/People/tj/svm_light/

There are roughly three ways to integrate svm-light into your own project:

1. Call the two executables from inside your project via system();

2. Wrap the source code and embed it directly in the project;

3. Pull only the required source files out of the distribution. Per the standard distribution's Makefile, the files needed by svmlight's learn are: svm_learn_main.c, svm_learn.c, svm_learn.h, svm_common.c, svm_common.h, svm_hideo.c, kernel.h

The files needed by svmlight's classify are: svm_classify.c, svm_common.c, svm_common.h, kernel.h
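A minimal build sketch for option 3, assuming the file layout of the standard svm_light source distribution (the compiler flags are illustrative, not the distribution's exact Makefile settings):

    # build the learner (hideo optimizer variant)
    gcc -O3 -o svm_learn svm_learn_main.c svm_learn.c svm_common.c svm_hideo.c -lm
    # build the classifier
    gcc -O3 -o svm_classify svm_classify.c svm_common.c -lm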


How do you test learn in Visual Studio?

First, obtain the two files test.dat and model.dat (test.dat can be downloaded from the official site; model.dat is produced as the training output),

then under Properties --> Configuration Properties --> Debugging --> Command Arguments, enter: [options] example_file model_file

e.g.:  -t 2 test.dat model.dat

Modifying the other parameters works the same way; tuning the options can improve the training result.

Prediction: in classify's Debugging settings, enter: [options] example_file model_file output_file

[options] test.dat svm_model test.txt
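Outside Visual Studio, the same two steps can be run from an ordinary console; a minimal sketch with the file names used above:

    # training: learn an RBF-kernel model (-t 2) from test.dat and write it to model.dat
    svm_learn -t 2 test.dat model.dat
    # prediction: classify test.dat with model.dat, writing one value per example to test.txt
    svm_classify test.dat model.dat test.txt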

================================================================================================================================
The main svmlight parameters are described below.

options in train:

Available options are:

General options:
         -?          - this help (prints this help; running without an input data file also prints it)
         -v [0..3]   - verbosity level (default 1) (mainly adjusts how much progress/diagnostic output is printed)

         See also http://www.mathworks.cn/cn/help/stats/classificationsvm.resume.html
         My understanding is: v=0: print no information and collect none
                              v=1: print diagnostic information and save iteration information
                              v=2: print diagnostic information only, without saving it
         (the original post illustrated the output with a screenshot here)


 Learning options:
-z {c,r,p} - select between classification (c), regression (r), and preference ranking (p) (see [Joachims, 2002c])
(default classification)
(Chooses among classification, regression, and preference ranking. A classifier generally only needs classification, which happens to be the default, so in my view this parameter rarely needs changing.)

-c float - C: trade-off between training error and margin (default [avg. x*x]^-1)
(The penalty parameter; see http://blog.csdn.net/qll125596718/article/details/6910921
C varies over a wide range and should be determined by cross-validation. Roughly speaking, the larger C is, the closer training comes to pure empirical-risk minimization. C is best kept on the same order of magnitude as |w|; 10 has been suggested as a default (by a fellow netizen).)
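For example, a simple sweep over C might look like the following (train.dat and the model file names are hypothetical):

    # try several C values and keep the model that validates best
    svm_learn -c 0.1 train.dat model_c01.dat
    svm_learn -c 1   train.dat model_c1.dat
    svm_learn -c 10  train.dat model_c10.dat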


 -w [0..] - epsilon width of tube for regression (default 0.1)
(-w changes the value of epsilon, the insensitivity parameter in regression; it is generally not needed for classification.)
-j float - Cost: cost-factor, by which training errors on positive examples outweigh errors on negative
examples (default 1) (see [Morik et al., 1999])
(The cost factor weighting errors on positive examples against errors on negatives; it relates to the slack-variable penalty, see http://www.blogjava.net/zhenandaci/archive/2009/03/15/259786.html
In libsvm, the corresponding value is taken as the ratio of the positive and negative sample counts.)
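As an illustrative sketch, for a training set with three times as many negatives as positives one might try (train.dat is a hypothetical file name):

    # count each error on a positive example three times as heavily
    svm_learn -j 3 train.dat model.dat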

  -b [0,1] - use biased hyperplane (i.e. x*w+b0) instead of unbiased hyperplane (i.e. x*w) (default 1)
(Given a linear function g(x) = w*x + b0: with -b 0 the bias term b0 is not used, so g(x) = w*x; with -b 1 (the default) the bias term is included, i.e. g(x) = w*x + b0.)

-i [0,1] - remove inconsistent training examples and retrain (default 0)
(With -i 1, inconsistent training examples are removed and the model is retrained on the remaining data.)


 Performance estimation options:
-x [0,1] - compute leave-one-out estimates (default 0) (see [5])
-o ]0..2] - value of rho for XiAlpha-estimator and for pruning leave-one-out computation (default 1.0)
(see [Joachims, 2002a])
-k [0..100] - search depth for extended XiAlpha-estimator (default 0)

Transduction options (see [Joachims, 1999c], [Joachims, 2002a]):
-p [0..1] - fraction of unlabeled examples to be classified into the positive class (default is the ratio of
positive and negative examples in the training data)


Kernel options: (these are the key parameters to tune)
-t int - type of kernel function:
0: linear (default)
1: polynomial (s a*b+c)^d
2: radial basis function exp(-gamma ||a-b||^2) (RBF is the most commonly used)
3: sigmoid tanh(s a*b + c)
4: user defined kernel from kernel.h
-d int - parameter d in polynomial kernel (3 is a common default choice)
-g float - parameter gamma in rbf kernel (1/k is a common default choice, with k the number of features)
-s float - parameter s in sigmoid/poly kernel
-r float - parameter c in sigmoid/poly kernel (1 is a common default choice)
-u string - parameter of user defined kernel
(lets the user pass a parameter string to a self-defined kernel)
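A few illustrative invocations (the parameter values are arbitrary, not recommendations):

    svm_learn -t 0 train.dat model.dat          # linear kernel (default)
    svm_learn -t 1 -d 3 train.dat model.dat     # polynomial kernel of degree 3
    svm_learn -t 2 -g 0.5 train.dat model.dat   # RBF kernel with gamma = 0.5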


 Optimization options
(see [Joachims, 1999a], [Joachims, 2002a]):

-q [2..] - maximum size of QP-subproblems (default 10)

In the most general sense, a minimization problem is an optimization problem (also called a mathematical program, hence "programming"). It consists of two parts, an objective function and constraints, and can be written as follows (I am not sure whether this is the constrained optimization being referred to here):

    minimize    f(x)
    subject to  g_i(x) <= 0,  i = 1, ..., p
                h_j(x) = 0,   j = 1, ..., q

For more detail, see http://www.blogjava.net/zhenandaci/archive/2009/02/14/254630.html

-n [2..q] - number of new variables entering the working set in each iteration (default n = q).
Set n<q to prevent zig-zagging.
(See the explanation of -q above.)

-m [5..] - size of cache for kernel evaluations in MB (default 40). The larger the faster...
(Sets the kernel cache size in MB (default 40); libsvm's corresponding option defaults to 100.)

-e float - eps: Allow that error for termination criterion
[y [w*x+b] - 1] = eps (default 0.001)
(Sets the tolerance of the termination criterion (default 0.001).)

-h [5..] - number of iterations a variable needs to be optimal before considered for shrinking (default 100)
(The number of iterations a variable must remain optimal before it is considered for shrinking, i.e. temporary removal from the optimization.)

-f [0,1] - do final optimality check for variables removed by shrinking. Although this test is usually positive, there
is no guarantee that the optimum was found if the test is omitted. (default 1)

-y string -> if option is given, reads alphas from the file with the given name and uses them as starting point. (default 'disabled')
(Reads alpha values from the named file and uses them as the starting point for optimization.)
 
-# int -> terminate optimization, if no progress after this number of iterations. (default 100000)
(Terminates the optimization if no progress is made within this many iterations; by default, 100000.)
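An illustrative combination of these optimization options (values arbitrary, file names hypothetical):

    # larger QP subproblems (-q), 200 MB kernel cache (-m),
    # tighter stopping tolerance (-e), give up after 50000 stalled iterations (-#)
    svm_learn -q 20 -m 200 -e 0.0001 -# 50000 train.dat model.dat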


Output options:
-l char - file to write predicted labels of unlabeled examples into after transductive learning
(After transductive learning, the predicted labels of the unlabeled examples are written to this file.)

-a char - write all alphas to this file after learning (in the same order as in the training set)
(After learning, all alpha values are written to this file, in the same order as the training set.)
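For example (alpha.txt and trans_labels.txt are hypothetical file names):

    # write the learned alphas, in training-set order, to alpha.txt
    svm_learn -a alpha.txt train.dat model.dat
    # in transductive mode, write predicted labels of the unlabeled examples to trans_labels.txt
    svm_learn -l trans_labels.txt train.dat model.dat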

options in predict:

Available options are:

-h         Help. (prints this help text)
-v [0..3]  Verbosity level (default 2). (see -v under learn)
-f [0,1]   0: old output format of V1.0
           1: output the value of decision function (default)
           (the original post showed screenshots of the two output formats here)


Running both formats on the same test file gives the same prediction results.
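For example, a minimal prediction run:

    # -v 2 is the default verbosity; test.txt receives one decision value per line
    svm_classify -v 2 test.dat model.dat test.txt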