Building and Evaluating a Recommender for Boolean Preference Data in Mahout

Following Mahout in Action, the code for building and evaluating a recommender on boolean data is:

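    import java.io.File;
    import java.io.IOException;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.DataModelBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
    import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.model.PreferenceArray;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public static void booleanPrefEvaluator() throws IOException, TasteException
    {
        // Read the ratings file, then keep only which items each user rated (values dropped)
        DataModel model = new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base"))));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderBuilder builder = new RecommenderBuilder(){

            @Override
            public Recommender buildRecommender(DataModel trainingModel)
                    throws TasteException {
                // Build on the training split handed in by the evaluator, not on the full model.
                // PearsonCorrelationSimilarity cannot be used here: it rejects boolean data.
                // UserSimilarity similarity = new PearsonCorrelationSimilarity(trainingModel);
                UserSimilarity similarity = new LogLikelihoodSimilarity(trainingModel);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, trainingModel);
                return new GenericUserBasedRecommender(trainingModel, neighborhood, similarity);
            }
        };
        DataModelBuilder modelBuilder = new DataModelBuilder(){

            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                // toDataMap converts FastByIDMap<PreferenceArray> into the FastByIDMap<FastIDSet>
                // this constructor needs, so the training split stays boolean as well
                return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };
        // Train on about 90% of each user's preferences; evaluate over all users (1.0)
        double score = evaluator.evaluate(builder, modelBuilder, model, 0.9, 1.0);
        System.out.println("Average absolute difference: " + score);
    }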
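The comment in buildDataModel refers to the conversion done by GenericBooleanPrefDataModel.toDataMap: each user's ratings become a plain set of item IDs and the rating values are thrown away. The sketch below shows that idea with ordinary Java collections. The class BooleanDataMapSketch and its methods are hypothetical stand-ins, not Mahout's implementation, and HashMap/HashSet take the place of FastByIDMap/FastIDSet.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for GenericBooleanPrefDataModel.toDataMap: plain HashMap/HashSet
// replace Mahout's FastByIDMap<PreferenceArray> and FastByIDMap<FastIDSet>.
final class BooleanDataMapSketch {

    // userID -> (itemID -> rating) becomes userID -> {itemIDs}; the ratings are discarded.
    static Map<Long, Set<Long>> toDataMap(Map<Long, Map<Long, Double>> ratings) {
        Map<Long, Set<Long>> result = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> user : ratings.entrySet()) {
            result.put(user.getKey(), new HashSet<>(user.getValue().keySet()));
        }
        return result;
    }

    // A boolean model can only say whether a preference exists; an existing one reads as 1.0.
    // This sketch uses NaN for "no preference".
    static double preferenceValue(Map<Long, Set<Long>> data, long userID, long itemID) {
        Set<Long> items = data.get(userID);
        return items != null && items.contains(itemID) ? 1.0 : Double.NaN;
    }

    public static void main(String[] args) {
        Map<Long, Double> user1 = new HashMap<>();
        user1.put(101L, 5.0);
        user1.put(102L, 2.0);
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        ratings.put(1L, user1);

        Map<Long, Set<Long>> data = toDataMap(ratings);
        System.out.println(preferenceValue(data, 1L, 101L)); // 1.0 (the rating was 5.0)
        System.out.println(preferenceValue(data, 1L, 102L)); // 1.0 (the rating was 2.0)
        System.out.println(preferenceValue(data, 1L, 103L)); // NaN (never rated)
    }
}
```

Whatever the original rating was, the boolean view reports the same 1.0.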

With boolean data, the preference values are not actually missing: every one of them is the same placeholder value, 1.0.
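Because every value is the same, Pearson correlation has nothing to measure: each user's values have zero variance, the denominator of the formula is 0, and the result is 0/0. A textbook Pearson computation in plain Java makes this visible. The class PearsonSketch is hypothetical and is not Mahout's PearsonCorrelationSimilarity.

```java
// Hypothetical helper, not Mahout code: textbook Pearson correlation of two equal-length arrays.
final class PearsonSketch {

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // With constant input varX == varY == 0, so this is 0.0 / 0.0, i.e. NaN.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] ratingsA = {5, 3, 4};
        double[] ratingsB = {4, 2, 5};
        System.out.println(pearson(ratingsA, ratingsB)); // a well-defined correlation

        double[] booleanA = {1.0, 1.0, 1.0};
        double[] booleanB = {1.0, 1.0, 1.0};
        System.out.println(pearson(booleanA, booleanB)); // NaN
    }
}
```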

As a result, PearsonCorrelationSimilarity fails with the following error:

    Exception in thread "main" java.lang.IllegalArgumentException: DataModel doesn't have preference values
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:125)
        at org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity.<init>(PearsonCorrelationSimilarity.java:74)
        at org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity.<init>(PearsonCorrelationSimilarity.java:66)
        at test.mahout.recommendation.BooleanPreferenceRecommender$1.buildRecommender(BooleanPreferenceRecommender.java:43)
        at org.apache.mahout.cf.taste.impl.eval.AbstractDifferenceRecommenderEvaluator.evaluate(AbstractDifferenceRecommenderEvaluator.java:125)
        at test.mahout.recommendation.BooleanPreferenceRecommender.booleanPrefEvaluator(BooleanPreferenceRecommender.java:60)
        at test.mahout.recommendation.BooleanPreferenceRecommender.main(BooleanPreferenceRecommender.java:138)
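LogLikelihoodSimilarity does not have this problem because it never looks at preference values, only at how many items two users share compared with how many each has on their own. The sketch below is modelled on our reading of Mahout's LogLikelihood.logLikelihoodRatio and LogLikelihoodSimilarity. The class LogLikelihoodSketch, its method signatures, and the example counts are assumptions for illustration, not the library source.

```java
// Hypothetical sketch modelled on Mahout's log-likelihood similarity; not the library source.
final class LogLikelihoodSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a set of counts.
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    // Log-likelihood ratio (G^2) of a 2x2 contingency table.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0; // guard against rounding error
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    // User-user similarity from item counts only; no preference values are needed.
    // both: items both users have; onlyUser1 / onlyUser2: items only one of them has.
    static double userSimilarity(long both, long onlyUser1, long onlyUser2, long numItems) {
        long neither = numItems - both - onlyUser1 - onlyUser2;
        double llr = logLikelihoodRatio(both, onlyUser2, onlyUser1, neither);
        return 1.0 - 1.0 / (1.0 + llr); // maps [0, inf) into [0, 1)
    }

    public static void main(String[] args) {
        // Made-up overlap counts; 1682 is the number of items in ml-100k.
        System.out.println(userSimilarity(20, 5, 5, 1682)); // large overlap
        System.out.println(userSimilarity(1, 24, 24, 1682)); // barely any overlap
    }
}
```

Only set sizes and their intersection enter the calculation, which is exactly the information a boolean model keeps.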

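Switching to LogLikelihoodSimilarity lets the evaluation run, and it reports 0.0.

That score is meaningless, however: every actual preference is 1.0, so every estimated preference is 1.0 as well, and the difference between them is always zero.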
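Why the score is always 0.0 can be seen with a small calculation. Assuming the user-based estimate is a similarity-weighted average of the neighbors' values (roughly how GenericUserBasedRecommender estimates preferences, as we understand it), an average of values that are all 1.0 is 1.0 whatever the weights are, so its absolute difference from the actual 1.0 is zero. The class BooleanMaeSketch is hypothetical.

```java
// Hypothetical sketch, not Mahout code: a weighted-average estimate and the
// average absolute difference used to score it.
final class BooleanMaeSketch {

    // Estimate = sum(similarity * neighbor value) / sum(similarity).
    static double estimate(double[] similarities, double[] neighborPrefs) {
        double weighted = 0.0;
        double totalSimilarity = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            weighted += similarities[i] * neighborPrefs[i];
            totalSimilarity += similarities[i];
        }
        return weighted / totalSimilarity;
    }

    static double meanAbsoluteDifference(double[] actual, double[] estimated) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimated[i]);
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        double[] similarities = {0.9, 0.4, 0.7};
        double[] booleanPrefs = {1.0, 1.0, 1.0}; // every known value in boolean data is 1.0
        double est = estimate(similarities, booleanPrefs);
        System.out.println(est); // 1.0, whatever the similarities are

        double[] actual = {1.0, 1.0, 1.0};
        double[] estimated = {est, est, est};
        System.out.println(meanAbsoluteDifference(actual, estimated)); // 0.0
    }
}
```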

Precision and recall can still be used for evaluation, though. The code is:

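    import java.io.File;
    import java.io.IOException;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.DataModelBuilder;
    import org.apache.mahout.cf.taste.eval.IRStatistics;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
    import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
    import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
    import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.model.PreferenceArray;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public static void preRecallBooleanPrefEvaluator() throws TasteException, IOException
    {
        // Mahout in Action uses the GenericBooleanPrefDataModel(DataModel) constructor,
        // which is deprecated as of 0.9:
        // DataModel model_old = new GenericBooleanPrefDataModel(new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base")));
        // Non-deprecated replacement: convert with toDataMap first
        DataModel model = new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base"))));
        // To evaluate on boolean data the model itself must be boolean; loading it with the
        // constructor below (rating values kept) gives incorrect results:
        // DataModel model = new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder(){

            @Override
            public Recommender buildRecommender(DataModel model)
                    throws TasteException {
                UserSimilarity similarity = new LogLikelihoodSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
                // Boolean-aware alternative:
                // return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        DataModelBuilder modelBuilder = new DataModelBuilder(){

            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                // Keep the training split boolean too
                return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };
        // No rescorer, top 10 recommendations, per-user relevance threshold, all users evaluated
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder, model, null, 10, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println("Precision: " + stats.getPrecision());
        System.out.println("Recall: " + stats.getRecall());
    }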

The variable model_old is created with the constructor used in the book. Evaluating with it gives:

Precision: 0.24496288441145259
Recall: 0.24496288441145259

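Because that constructor is deprecated in mahout-0.9, the toDataMap-based constructor (the model line) is used instead. Note that the model must be loaded as boolean data from the start.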

The result is the same as above.
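Precision and recall come out identical. A likely explanation, based on our reading of how GenericRecommenderIRStatsEvaluator works rather than on anything stated in the book: for each user it holds out up to at (here 10) items whose value reaches the relevance threshold. With boolean data every value is 1.0, so with CHOOSE_THRESHOLD (as we understand it, mean plus one standard deviation, which works out to 1.0 here) any item qualifies and the held-out set has exactly 10 items. If the recommender also returns 10 items, precision (hits / recommended) and recall (hits / relevant) share the same denominator. The per-user calculation as a plain-Java sketch; the class PrecisionRecallSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of precision/recall at N for one user; it follows the idea behind
// GenericRecommenderIRStatsEvaluator but is not that class's code.
final class PrecisionRecallSketch {

    private static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (long item : recommended) {
            if (relevant.contains(item)) {
                count++;
            }
        }
        return count;
    }

    // Fraction of the recommended items that were relevant.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return (double) hits(recommended, relevant) / recommended.size();
    }

    // Fraction of the relevant items that were recommended.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Boolean data: all N = 4 held-out items count as relevant.
        Set<Long> relevant = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        List<Long> recommended = Arrays.asList(2L, 9L, 4L, 7L); // top-4 list with 2 hits
        System.out.println(precision(recommended, relevant)); // 0.5
        System.out.println(recall(recommended, relevant));    // 0.5
    }
}
```

If a user's recommendation list came back shorter than at, the two figures would diverge, since precision would then divide by the shorter list length.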

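The recommender builder returns new GenericUserBasedRecommender(model, neighborhood, similarity);

This recommender still ranks items by their estimated preference, but with boolean data every estimate is 1.0, so the ranking is effectively random. The recommender is therefore changed to: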

    return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
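As we understand it, GenericBooleanPrefUserBasedRecommender sidesteps the all-1.0 estimates by scoring an item with the sum of the similarities of the neighbors who have it, so items shared by more, and more similar, neighbors rank higher. The hypothetical SumOfSimilaritiesSketch below shows that scoring rule in plain Java; a weighted average over the same data would give every item 1.0 and leave the order undecided.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Mahout code: rank items by the summed similarity of the
// neighbors who have them, instead of by a weighted-average estimate.
final class SumOfSimilaritiesSketch {

    static Map<Long, Double> score(Map<Long, Double> neighborSimilarity,
                                   Map<Long, Set<Long>> neighborItems) {
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Double> neighbor : neighborSimilarity.entrySet()) {
            for (long item : neighborItems.get(neighbor.getKey())) {
                scores.merge(item, neighbor.getValue(), Double::sum);
            }
        }
        return scores;
    }

    // Item IDs sorted by descending score, truncated to n.
    static List<Long> topN(Map<Long, Double> scores, int n) {
        List<Long> items = new ArrayList<>(scores.keySet());
        items.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return items.subList(0, Math.min(n, items.size()));
    }

    public static void main(String[] args) {
        Map<Long, Double> similarity = new HashMap<>();
        similarity.put(1L, 0.9);
        similarity.put(2L, 0.5);
        similarity.put(3L, 0.2);
        Map<Long, Set<Long>> items = new HashMap<>();
        items.put(1L, new HashSet<>(Arrays.asList(10L, 11L)));
        items.put(2L, new HashSet<>(Arrays.asList(10L, 12L)));
        items.put(3L, new HashSet<>(Arrays.asList(12L)));

        // Scores: item 10 = 0.9 + 0.5, item 11 = 0.9, item 12 = 0.5 + 0.2
        System.out.println(topN(score(similarity, items), 3)); // [10, 11, 12]
    }
}
```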

Running the evaluation with this recommender gives:

Precision: 0.22926829268292725
Recall: 0.22926829268292725

The result did not improve. The point of this example is to examine how boolean data can be handled efficiently in Mahout.

