SVM Theory and Experiments, Part 21: Using Custom Kernel Functions


Dr. Xu Haijiao (徐海蛟)


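In real-world scenarios the features of the data can be fairly complex, and the four built-in kernel functions may not give the best results. In that case you need a custom kernel. Plenty of experts have already designed such kernels, and we can use their work through the custom-kernel mechanism.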


How is it used? The raw training and testing data files are no longer passed as input. Instead, the kernel matrix is passed as the input.


Assume there are L training instances x1, ..., xL (that is, the training set has L rows).

Let K(x, y) be the kernel value of two instances x and y. The input formats are:

New training instance for xi:

<label> 0:i 1:K(xi,x1) ... L:K(xi,xL)


New testing instance for any x:

<label> 0:? 1:K(x,x1) ... L:K(x,xL)


That is, in the training file the first column must be the "ID" of xi. In testing, ? can be any value.


All kernel values including ZEROs must be explicitly provided. Any permutation or random subsets of the training/testing files are also valid (see examples below).


Note: the format is slightly different from the precomputed kernel package released in libsvmtools earlier.
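The row format described above can be sketched in a few lines of Python. This is only an illustration: the helper name `format_precomputed_row` and its parameters are hypothetical and not part of LIBSVM. It takes a label, the list of kernel values against the L training instances, and the value to place in column 0.

```python
def format_precomputed_row(label, kernel_values, sample_id=1):
    """Build one line of a precomputed-kernel file (hypothetical helper).

    label         -- class label (or target value) of the instance
    kernel_values -- [K(x, x1), ..., K(x, xL)], zeros included
    sample_id     -- value for column 0: the instance ID i for a training
                     row, any placeholder value for a testing row
    """
    fields = [f"0:{sample_id}"]
    # Columns 1..L hold the kernel values; zeros are written explicitly,
    # because this format does not allow them to be omitted.
    fields += [f"{j}:{value:g}" for j, value in enumerate(kernel_values, start=1)]
    return f"{label} " + " ".join(fields)


# Training row for x1 (ID 1) and a testing row with a placeholder ID.
train_line = format_precomputed_row(15, [4, 6, 1], sample_id=1)
test_line = format_precomputed_row(15, [2, 0, 1], sample_id=1)
print(train_line)
print(test_line)
```

With the LIBSVM command-line tools, a file written this way is trained with the precomputed kernel type, i.e. `svm-train -t 4`.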


Example:

Assume the original training data has three four-feature instances and the testing data has one instance:

15 1:1 2:1 3:1 4:1

45 2:3 4:3

25 3:1

-----------------------------------

15 1:1 3:1


If the linear kernel is used, we have the following new training/testing sets:

15 0:1 1:4 2:6 3:1

45 0:2 1:6 2:18 3:0

25 0:3 1:1 2:0 3:1

-------------------------------------

15 0:? 1:2 2:0 3:1


? can be any value.
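The kernel values in this example can be checked by hand or with a short script. The sketch below writes the sparse instances out as dense four-element lists and computes the linear kernel K(x, y) = x·y. The names `linear_kernel` and `kernel_matrix` are chosen here for illustration only.

```python
# Dense form of the example data (features 1..4, missing features are 0).
train = [
    [1, 1, 1, 1],  # label 15
    [0, 3, 0, 3],  # label 45
    [0, 0, 1, 0],  # label 25
]
test = [
    [1, 0, 1, 0],  # label 15
]


def linear_kernel(x, y):
    """Linear kernel: the dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def kernel_matrix(rows, basis, kernel=linear_kernel):
    """Kernel values of every instance in `rows` against every instance in `basis`."""
    return [[kernel(x, z) for z in basis] for x in rows]


K_train = kernel_matrix(train, train)  # 3 x 3, training vs. training
K_test = kernel_matrix(test, train)    # 1 x 3, testing vs. training

for row in K_train:
    print(row)
print(K_test[0])
```

Any other kernel can be swapped in by passing a different function as `kernel`. The file format stays the same, since only the resulting values are written out.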


Any subset of the above training file is also valid. For example,

25 0:3 1:1 2:0 3:1

45 0:2 1:6 2:18 3:0

implies that the kernel matrix is:

[K(2,2) K(2,3)] = [18 0]

[K(3,2) K(3,3)] = [0 1]
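To see why the subset yields this matrix, read the ID in column 0 of each row and keep only the columns that belong to the IDs present. A minimal sketch, with hypothetical helper names and assuming the lines follow the format above:

```python
def parse_precomputed_line(line):
    """Split one precomputed-kernel line into (label, id, {column: value})."""
    label, *fields = line.split()
    values = {int(k): float(v) for k, v in (f.split(":") for f in fields)}
    sample_id = int(values.pop(0))  # column 0 carries the instance ID
    return label, sample_id, values


def implied_kernel_matrix(lines):
    """Return the sorted IDs and the kernel matrix restricted to those IDs."""
    rows = {}
    for line in lines:
        _, sample_id, values = parse_precomputed_line(line)
        rows[sample_id] = values
    ids = sorted(rows)
    # Entry (i, j) is K(x_i, x_j), read from column j of the row with ID i.
    return ids, [[rows[i][j] for j in ids] for i in ids]


subset = [
    "25 0:3 1:1 2:0 3:1",
    "45 0:2 1:6 2:18 3:0",
]
ids, K = implied_kernel_matrix(subset)
print(ids)
for row in K:
    print(row)
```

The order of the lines in the file does not matter here, because each row is looked up by its ID rather than by its position.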


