A Quick Guide to Using liblinear

Original post

lingerlanlan

2020-02-26 07:02


liblinear is well suited to classification and regression problems on large-scale data with high-dimensional sparse features.

Feature file format: identical to libsvm's. Each line is one sample written as a sparse vector:

label index1:value1 index2:value2
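To make the format concrete, here is a minimal Java sketch that parses one such line into a label and a sparse index-to-value map. The class and method names (SparseLineParser, Sample, parse) are hypothetical and not part of liblinear. The check that indices start at 1 and are strictly ascending reflects my understanding of what the liblinear readers expect, so treat it as an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of liblinear: parses "label index:value ..." lines.
class SparseLineParser {

    /** One sample: its label plus the listed index -> value pairs. */
    static final class Sample {
        final double label;
        final Map<Integer, Double> features;

        Sample(double label, Map<Integer, Double> features) {
            this.label = label;
            this.features = features;
        }
    }

    static Sample parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty line");
        }
        double label = Double.parseDouble(tokens[0]);

        Map<Integer, Double> features = new LinkedHashMap<>();
        int previousIndex = 0;
        for (int i = 1; i < tokens.length; i++) {
            String[] pair = tokens[i].split(":", 2);
            if (pair.length < 2) {
                throw new IllegalArgumentException("expected index:value, got " + tokens[i]);
            }
            int index = Integer.parseInt(pair[0]);
            // Assumption: indices are 1-based and strictly ascending within a line.
            if (index <= previousIndex) {
                throw new IllegalArgumentException("indices must be ascending: " + tokens[i]);
            }
            features.put(index, Double.parseDouble(pair[1]));
            previousIndex = index;
        }
        return new Sample(label, features);
    }

    public static void main(String[] args) {
        Sample s = parse("1 3:0.5 10:1 42:2.5");
        System.out.println(s.label + " " + s.features);
    }
}
```

A map is used here only for readability; liblinear itself keeps each sample as an array of feature nodes.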

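For the offline training and testing stages I work from the command line for convenience, so no extra code is needed.

liblinear ships a train command and a predict command (in both the Java and the C versions), so we only need to adjust their parameters to use them.

Once the model has been trained and its test results meet our requirements, we need a Java program that loads the model and makes predictions so that Hive can call it. This step requires one more layer of wrapping, or some modification, on top of the Java version of liblinear.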

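Training stage:

Using the train command

1 The simplest usage: adjust nothing and rely on the default parameters

train train.txt model.txt

2 Commonly adjusted parameters

-s selects the model type. liblinear implements more than one model; they fall into broad families (such as SVM and logistic regression), each with several variants.

Note that SVM models can only output the predicted class label, whereas logistic regression (LR) models can also output class probabilities.

At the moment I mostly use -s 0 (L2-regularized logistic regression).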
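As a rough illustration of why a logistic regression model can report probabilities while an SVM cannot, the sketch below maps a linear decision value w·x to a probability with the logistic function. For a two-class model this is, as far as I understand it, how the probability is derived from the decision value; the class and method names are hypothetical and the code does not call liblinear.

```java
// Hypothetical illustration, not liblinear code: decision value -> class probabilities.
class LogisticProbability {

    /** Logistic (sigmoid) function: maps any real number into (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Two-class case: the probability of the first class is sigmoid(decisionValue),
     * and the second class gets the remainder.
     */
    static double[] binary(double decisionValue) {
        double p = sigmoid(decisionValue);
        return new double[] { p, 1.0 - p };
    }

    public static void main(String[] args) {
        double[] probs = binary(1.2);
        System.out.println(probs[0] + " " + probs[1]);
    }
}
```

An SVM decision value has no such probabilistic interpretation, which is why only the label is available for those model types.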

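-wi sets a different penalty factor for each class, where i is the class label.

Adjusting this parameter shifts the distribution of predicted classes.

For example, take a male/female classifier where the true distribution of the test set is 55:45, the label for male is 1, and the label for female is 2.

Suppose that after one round of training the predicted class distribution on that test set is 3:7. The model is biased toward female.

To bring the predicted distribution in line with the true one, the model has to be pushed toward male, so we retrain it

with a larger w1 (the male label is 1, hence w1). A reasonable first attempt is -w1 2, then check the result.

Repeat with different values until the distributions match.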
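The comparison between the predicted and the true class distribution can be automated with a few lines of Java. This is only a sketch: LabelDistribution and fractions are hypothetical names, and the labels are passed in as an in-memory list. In practice they could be read from the prediction file, assuming it holds one predicted label per line when -b is not set.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: share of each label among a list of predictions.
class LabelDistribution {

    static Map<String, Double> fractions(List<String> labels) {
        // count how often each label occurs
        Map<String, Integer> counts = new TreeMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        // convert counts to fractions of the total
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) labels.size());
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 predictions of label 1 (male) and 7 of label 2 (female), as in the example above
        List<String> predicted = Arrays.asList("1", "2", "2", "1", "2", "2", "2", "2", "1", "2");
        System.out.println(fractions(predicted));
    }
}
```

If label 1 comes out well below its true share, that is the signal to retrain with a larger -w1 and measure again.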

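Testing stage:

1 Usage with default parameters

predict test.txt model.txt predict.txt

This uses the model model.txt to predict the test set test.txt and saves the results in predict.txt.

2 Commonly used parameters

-b takes the value 0 or 1; the default is 0. When set to 1, the predicted class probabilities are output (this requires a logistic regression model).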

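Predicting with the model from a Java program:

So that Hive can call it, we need to rewrite a prediction function based on Predict.java in liblinear.

The original Predict.java takes a feature file as input and writes a prediction file.

Our rewritten prediction function takes a feature vector given as a string and returns the class probabilities as a string.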

package de.bwaldvogel.liblinear;

import static de.bwaldvogel.liblinear.Linear.atof;
import static de.bwaldvogel.liblinear.Linear.atoi;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class Predictor {

    // When true, doPredict returns per-class probabilities (logistic regression models only).
    private static boolean       flag_predict_probability = true;

    private static final Pattern COLON                    = Pattern.compile(":");

    /**
     * Predicts a single sample.
     *
     * @param model a loaded liblinear model
     * @param line  the feature vector as whitespace-separated "index:value" pairs, without a label
     * @return "label:probability;" for every class, or the predicted label if probability output is off
     */
    static public String doPredict(Model model, String line) throws IOException {

        int nr_class = model.getNrClass();
        int nr_feature = model.getNrFeature();
        // the bias term, if used, occupies one extra slot after the real features
        int n = (model.bias >= 0) ? nr_feature + 1 : nr_feature;

        if (flag_predict_probability && !model.isProbabilityModel()) {
            throw new IllegalArgumentException("probability output is only supported for logistic regression");
        }

        // parse the "index:value" tokens into a sparse feature list
        List<Feature> x = new ArrayList<Feature>();
        StringTokenizer st = new StringTokenizer(line, " \t\n");

        while (st.hasMoreTokens()) {
            String[] split = COLON.split(st.nextToken(), 2);
            if (split == null || split.length < 2) {
                throw new RuntimeException("Wrong input format at line " + line);
            }

            try {
                int idx = atoi(split[0]);
                double val = atof(split[1]);

                // feature indices larger than those in training are not used
                if (idx <= nr_feature) {
                    x.add(new FeatureNode(idx, val));
                }
            } catch (NumberFormatException e) {
                throw new RuntimeException("Wrong input format at line " + line, e);
            }
        }

        if (model.bias >= 0) {
            x.add(new FeatureNode(n, model.bias));
        }

        Feature[] nodes = x.toArray(new Feature[x.size()]);

        StringBuilder res = new StringBuilder();
        if (flag_predict_probability) {
            int[] labels = model.getLabels();
            double[] prob_estimates = new double[nr_class];
            Linear.predictProbability(model, nodes, prob_estimates);

            // prob_estimates[j] is the probability of class labels[j]
            for (int j = 0; j < nr_class; j++) {
                res.append(labels[j]).append(':').append(prob_estimates[j]).append(';');
            }
        } else {
            // no probabilities requested: return just the predicted label
            res.append(Linear.predict(model, nodes));
        }

        return res.toString();
    }


    public static void main(String[] argv) throws IOException {
        flag_predict_probability = true;

        String line = "438:1.0 4659:1.0 4661:1.0 5026:1.0 5067:1.0 5914:1.0 6020:1.0 9924:1.0 13845:1.0 17295:1.0 19792:1.0 21466:1.0 22054:1.0 22095:1.0 22425:1.0 26541:1.0";

        // the model file is loaded from the classpath; main is static, so use the class literal instead of this
        String model_path = "/model/AgePredicotr4LiblinearAdmaster.model";
        InputStream inputStream = Predictor.class.getResourceAsStream(model_path);
        if (inputStream == null) {
            throw new IOException("model not found on classpath: " + model_path);
        }

        Model model;
        try (BufferedReader br = new BufferedReader(new InputStreamReader(inputStream))) {
            model = Linear.loadModel(br);
        }

        String res = doPredict(model, line);
        System.out.println(res);
    }
}
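On the calling side, the string returned by doPredict has to be turned back into numbers. The sketch below assumes the "label:probability;" layout produced by the code above and shows one way to parse it and pick the most probable label. PredictionString, parse and bestLabel are hypothetical names; nothing here depends on liblinear.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for consumers of the "label:probability;" string.
class PredictionString {

    /** Parses e.g. "1:0.3;2:0.7;" into {1=0.3, 2=0.7}, keeping the original order. */
    static Map<Integer, Double> parse(String res) {
        Map<Integer, Double> probs = new LinkedHashMap<>();
        for (String part : res.split(";")) {
            if (part.isEmpty()) {
                continue; // tolerate empty input or stray separators
            }
            String[] kv = part.split(":", 2);
            if (kv.length < 2) {
                throw new IllegalArgumentException("expected label:probability, got " + part);
            }
            probs.put(Integer.parseInt(kv[0]), Double.parseDouble(kv[1]));
        }
        return probs;
    }

    /** Returns the label with the highest probability. */
    static int bestLabel(Map<Integer, Double> probs) {
        if (probs.isEmpty()) {
            throw new IllegalArgumentException("no probabilities given");
        }
        Map.Entry<Integer, Double> best = null;
        for (Map.Entry<Integer, Double> e : probs.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best.getKey();
    }

    public static void main(String[] args) {
        Map<Integer, Double> probs = parse("1:0.3;2:0.7;");
        System.out.println(probs + " best=" + bestLabel(probs));
    }
}
```

Returning the full probability string rather than just the top label leaves the caller free to apply its own threshold or to compare class distributions later.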

References

http://www.csie.ntu.edu.tw/~cjlin/liblinear/

http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf

https://github.com/bwaldvogel/liblinear-java (Java version of liblinear)

Author: linger

Link to this article: http://blog.csdn.net/lingerlanlan/article/details/48659803
