Following Mahout in Action (《mahout實戰》), the code for building and evaluating a boolean-preference data model is as follows:
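// Imports needed by this method (Mahout 0.9)
import java.io.File;
import java.io.IOException;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public static void booleanPrefEvaluator() throws IOException, TasteException
{
    DataModel model = new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base"))));
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    RecommenderBuilder builder = new RecommenderBuilder(){
        @Override
        public Recommender buildRecommender(DataModel trainingModel)
                throws TasteException {
            // Build on the training model supplied by the evaluator, not on the full outer model.
            // UserSimilarity similarity = new PearsonCorrelationSimilarity(trainingModel);
            UserSimilarity similarity = new LogLikelihoodSimilarity(trainingModel);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, trainingModel);
            return new GenericUserBasedRecommender(trainingModel, neighborhood, similarity);
        }
    };
    DataModelBuilder modelBuilder = new DataModelBuilder(){
        @Override
        public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
            // The constructor expects a FastByIDMap<FastIDSet>, so convert with toDataMap.
            return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
        }
    };
    // Train on 90% of each user's preferences, test on the rest, using all users.
    double score = evaluator.evaluate(builder, modelBuilder, model, 0.9, 1.0);
    System.out.println("Average absolute difference: " + score);
}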
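A boolean data model does not lack preference values altogether; instead, every preference is reported as the same placeholder value, 1.0.
Pearson correlation is undefined when all values are identical, and PearsonCorrelationSimilarity rejects such a model in its constructor, so using it produces the following error: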
Exception in thread "main" java.lang.IllegalArgumentException: DataModel doesn't have preference values
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:125)
at org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity.<init>(PearsonCorrelationSimilarity.java:74)
at org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity.<init>(PearsonCorrelationSimilarity.java:66)
at test.mahout.recommendation.BooleanPreferenceRecommender$1.buildRecommender(BooleanPreferenceRecommender.java:43)
at org.apache.mahout.cf.taste.impl.eval.AbstractDifferenceRecommenderEvaluator.evaluate(AbstractDifferenceRecommenderEvaluator.java:125)
at test.mahout.recommendation.BooleanPreferenceRecommender.booleanPrefEvaluator(BooleanPreferenceRecommender.java:60)
at test.mahout.recommendation.BooleanPreferenceRecommender.main(BooleanPreferenceRecommender.java:138)
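To see why Pearson correlation cannot work on this kind of data, note that it divides by the spread of each user's values; when every value is 1.0 the variance is zero and the result is undefined. The following is a minimal plain-Java sketch of that arithmetic. The class `ConstantPearsonDemo` and the method `pearson` are hypothetical names for illustration only, not Mahout code.

```java
public class ConstantPearsonDemo {

    // Plain Pearson correlation of two equal-length value vectors.
    // Yields NaN when either vector has zero variance.
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // For constant input this is 0.0 / 0.0, which is NaN in Java.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Two "boolean" users: every preference is the placeholder 1.0.
        double[] booleanUserA = {1.0, 1.0, 1.0};
        double[] booleanUserB = {1.0, 1.0, 1.0};
        System.out.println(pearson(booleanUserA, booleanUserB)); // NaN

        // Two users with real ratings that move together.
        double[] ratedUserA = {1.0, 2.0, 3.0};
        double[] ratedUserB = {2.0, 4.0, 6.0};
        System.out.println(pearson(ratedUserA, ratedUserB)); // 1.0
    }
}
```

Mahout fails fast with an exception rather than returning NaN for every pair of users, which is the error shown above.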
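After switching to LogLikelihoodSimilarity the evaluation runs, and the result is 0.0.
That score is meaningless, though: every actual and every estimated preference is 1.0, so the difference between them is always zero.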
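The 0.0 follows directly from how the metric is defined: it averages |estimated - actual| over the held-out preferences, and with boolean data both sides are always 1.0. A small plain-Java illustration of that calculation follows; the class and method names are hypothetical, and this is not the Mahout implementation.

```java
public class BooleanDifferenceDemo {

    // Mean of |estimated - actual| over paired values.
    public static double averageAbsoluteDifference(double[] estimated, double[] actual) {
        double total = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            total += Math.abs(estimated[i] - actual[i]);
        }
        return total / estimated.length;
    }

    public static void main(String[] args) {
        // Boolean data: every held-out preference is 1.0, and any weighted
        // average of neighbours' 1.0 values is 1.0 as well.
        double[] actual = {1.0, 1.0, 1.0, 1.0};
        double[] estimated = {1.0, 1.0, 1.0, 1.0};
        System.out.println(averageAbsoluteDifference(estimated, actual)); // 0.0

        // With real ratings the metric carries information.
        double[] actualRatings = {4.0, 2.0};
        double[] estimatedRatings = {3.0, 4.0};
        System.out.println(averageAbsoluteDifference(estimatedRatings, actualRatings)); // 1.5
    }
}
```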
Precision and recall, however, can still be used for evaluation. The code is as follows:
// Additional imports needed by this method (Mahout 0.9)
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;

public static void preRecallBooleanPrefEvaluator() throws TasteException, IOException
{
    // The GenericBooleanPrefDataModel constructor that takes a DataModel is deprecated in 0.9;
    // the toDataMap form on the following line can be used instead.
    DataModel model_old = new GenericBooleanPrefDataModel(new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base")));
    DataModel model = new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base"))));
    // To work with boolean preferences the model itself must be boolean from the start.
    // Do not load it with the plain constructor below, or the results will be incorrect.
    // DataModel model = new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base"));
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    RecommenderBuilder recommenderBuilder = new RecommenderBuilder(){
        @Override
        public Recommender buildRecommender(DataModel model)
                throws TasteException {
            UserSimilarity similarity = new LogLikelihoodSimilarity(model);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
            return new GenericUserBasedRecommender(model, neighborhood, similarity);
            // return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
        }
    };
    DataModelBuilder modelBuilder = new DataModelBuilder(){
        @Override
        public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingdata) {
            return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingdata));
        }
    };
    // Evaluate precision and recall at 10 recommendations, using all users.
    IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder, model_old, null, 10, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    System.out.println("Precision: " + stats.getPrecision());
    System.out.println("Recall: " + stats.getRecall());
}
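The first line, model_old, uses the constructor that appears in the book. Running with it gives:
Precision: 0.24496288441145259
Recall: 0.24496288441145259
Because that constructor is deprecated in mahout-0.9, the constructor on the second line should be used instead. Either way, the model must be loaded as a boolean model.
With the second constructor the result is the same as above.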
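For reference, precision at N is the fraction of the N recommended items that are relevant, and recall is the fraction of the relevant items that were recommended. The sketch below computes both for a single user in plain Java; the class, method names, and item IDs are hypothetical and this is not Mahout's evaluator. It also shows that the two numbers coincide whenever the recommended list and the relevant set have the same size, which may be why the two figures above are identical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionRecallDemo {

    // Number of recommended items that are also in the relevant set.
    public static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    // Fraction of recommended items that are relevant.
    public static double precision(List<Long> recommended, Set<Long> relevant) {
        return (double) hits(recommended, relevant) / recommended.size();
    }

    // Fraction of relevant items that were recommended.
    public static double recall(List<Long> recommended, Set<Long> relevant) {
        return (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Four recommendations, four relevant (held-out) items, two in common.
        List<Long> recommended = Arrays.asList(1L, 2L, 3L, 4L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(2L, 4L, 6L, 8L));
        System.out.println("Precision: " + precision(recommended, relevant)); // Precision: 0.5
        System.out.println("Recall: " + recall(recommended, relevant));       // Recall: 0.5
    }
}
```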
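The code above ends with
return new GenericUserBasedRecommender(model, neighborhood, similarity);
so this recommender still ranks items by estimated preference. Every estimated preference is 1.0, however, so the ranking is effectively random. I therefore changed the line to
return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);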
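The result is:
Precision: 0.22926829268292725
Recall: 0.22926829268292725
The result did not get any better. The point of this example is to examine how boolean data can be deployed efficiently in Mahout.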
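The difference between the two recommenders can be sketched in plain Java. This is an illustration under two assumptions: that the standard user-based recommender scores an item by the similarity-weighted average of the neighbours' preference values, and that the boolean variant scores it by the sum of the similarities of the neighbours who have the item. The class and method names are hypothetical and this is not Mahout's code.

```java
public class BooleanEstimateDemo {

    // Similarity-weighted average of the neighbours' preference values.
    public static double weightedAverage(double[] similarities, double[] prefs) {
        double weighted = 0.0;
        double totalSimilarity = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            weighted += similarities[i] * prefs[i];
            totalSimilarity += similarities[i];
        }
        return weighted / totalSimilarity;
    }

    // Sum of the similarities of the neighbours who have the item.
    public static double similaritySum(double[] similarities) {
        double total = 0.0;
        for (double s : similarities) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        // Item A is held by two neighbours, item B by three; all preferences are 1.0.
        double[] simsA = {0.5, 0.25};
        double[] simsB = {0.5, 0.25, 0.125};
        double[] prefsA = {1.0, 1.0};
        double[] prefsB = {1.0, 1.0, 1.0};

        // Weighted average cannot separate the items: both score 1.0.
        System.out.println(weightedAverage(simsA, prefsA)); // 1.0
        System.out.println(weightedAverage(simsB, prefsB)); // 1.0

        // Summed similarity ranks B above A.
        System.out.println(similaritySum(simsA)); // 0.75
        System.out.println(similaritySum(simsB)); // 0.875
    }
}
```

Under these assumptions the boolean variant at least produces a meaningful ordering, even though on this data set it did not translate into higher precision or recall.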