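Suppose a particular application needs scoring to ignore term frequency (tf) and document frequency (df, which enters the score through idf). This can be done as follows:
1. Implement your own similarity: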
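import org.apache.lucene.search.similarities.DefaultSimilarity;

/** Similarity that ignores term frequency and document frequency when scoring. */
public class MySimilarity extends DefaultSimilarity {

    /** Returns a constant, so repeated occurrences of a term do not raise the score. */
    @Override
    public float tf(float freq) {
        return 1.0f;
    }

    /** Returns a constant instead of the default <code>log(numDocs/(docFreq+1)) + 1</code>. */
    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0f;
    }
}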
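To make the effect of these two overrides concrete, here is a minimal standalone sketch that compares the per-term factor tf * idf^2 under the default formulas and under the constant ones. It does not use Lucene at all: the class `ScoreFactorSketch` and its helper methods are hypothetical names, and the "default" formulas are written on the assumption that they mirror `DefaultSimilarity` in Lucene 4.x (square root of the frequency for tf, `log(numDocs/(docFreq+1)) + 1` for idf).

```java
// Standalone sketch: does not depend on Lucene. The "default" formulas below
// are assumed to mirror DefaultSimilarity in Lucene 4.x.
class ScoreFactorSketch {

    // Assumed default: tf grows with the square root of the term frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed default: idf = log(numDocs / (docFreq + 1)) + 1.
    static float defaultIdf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // The same constants that MySimilarity returns.
    static float flatTf(float freq) {
        return 1.0f;
    }

    static float flatIdf(long docFreq, long numDocs) {
        return 1.0f;
    }

    // Per-term contribution tf * idf^2, leaving out boosts and norms.
    static float termWeight(float tf, float idf) {
        return tf * idf * idf;
    }

    public static void main(String[] args) {
        long numDocs = 1000;
        float[] freqs = {1f, 4f, 16f};
        long[] docFreqs = {9, 99, 499};
        for (int i = 0; i < freqs.length; i++) {
            float byDefault = termWeight(defaultTf(freqs[i]), defaultIdf(docFreqs[i], numDocs));
            float flat = termWeight(flatTf(freqs[i]), flatIdf(docFreqs[i], numDocs));
            System.out.println("freq=" + freqs[i] + " docFreq=" + docFreqs[i]
                    + " default=" + byDefault + " flat=" + flat);
        }
    }
}
```

Under the default formulas the factor changes with both the term's frequency in the document and its rarity in the index. With the constant versions it is the same for every term, so only the remaining factors (boosts, norms, coordination) can separate matching documents.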
2. When creating the index, set the similarity on the IndexWriterConfig:
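Analyzer analyzer = new MyAnalyzer(0);  // MyAnalyzer is defined elsewhere
MySimilarity sim = new MySimilarity();
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48, analyzer);
iwc.setOpenMode(OpenMode.CREATE);       // create a new index, replacing any existing one
iwc.setSimilarity(sim);                 // use MySimilarity while indexing
IndexWriter writer = new IndexWriter(indexDir, iwc);  // indexDir: the index Directory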
3. Set the same similarity at search time:
MySimilarity sim = new MySimilarity();
IndexSearcher searcher = new IndexSearcher(reader);  // reader: an open IndexReader
searcher.setSimilarity(sim);                         // score queries with MySimilarity
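As a rough check on what the steps above achieve, the formula below substitutes tf = 1 and idf = 1 into the practical scoring function, as I understand it from the `TFIDFSimilarity` Javadoc for Lucene 4.x. The notation is simplified: the sum runs over the query terms that occur in the document, and boost(t) stands for the query-time boost of term t.

```latex
\begin{aligned}
\mathrm{score}(q,d)
  &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
     \sum_{t \in q} \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot
     \mathrm{boost}(t)\cdot\mathrm{norm}(t,d) \\
  &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
     \sum_{t \in q} \mathrm{boost}(t)\cdot\mathrm{norm}(t,d)
     \qquad (\mathrm{tf} = 1,\ \mathrm{idf} = 1)
\end{aligned}
```

So MySimilarity removes only the tf and idf terms. The coordination factor, the query normalization, the boosts, and the index-time norm (which includes the field length norm) still influence the ranking unless they are overridden as well.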