Download IK Analyzer
Configure schema.xml
<fieldType name="text_ik" class="solr.TextField">
    <!-- finest-grained segmentation -->
    <analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    <!-- smart segmentation -->
    <analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
The intent is to use finest-grained segmentation when building the index and smart segmentation when querying, but the useSmart parameter in this configuration has no effect: segmentation is always finest-grained.
Analyzing the cause
Reading the source code, Solr apparently calls the default constructor when it creates the IKAnalyzer object, so useSmart stays false and segmentation is always finest-grained:
public final class IKAnalyzer extends Analyzer
{
    private boolean useSmart;

    ..... code omitted ......

    public IKAnalyzer()
    {
        this(false);
    }

    public IKAnalyzer(boolean useSmart)
    {
        this.useSmart = useSmart;
    }

    ..... code omitted ......
}
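This matches how Solr loads analyzer plugins: when schema.xml names an analyzer class directly, Solr instantiates it through its no-arg constructor and does not forward extra attributes such as useSmart to it. Below is a minimal sketch of that behavior; it is an illustration only, not Solr's actual plugin-loading code, and the class name PluginLoadingSketch is hypothetical.

import org.apache.lucene.analysis.Analyzer;

public class PluginLoadingSketch
{
    public static void main(String[] args) throws Exception
    {
        // Resolve the class named in schema.xml and call its no-arg
        // constructor, as Solr does; the useSmart="..." attribute on the
        // <analyzer> element is never passed to the constructor.
        Class<? extends Analyzer> clazz = Class
            .forName("org.wltea.analyzer.lucene.IKAnalyzer")
            .asSubclass(Analyzer.class);
        Analyzer analyzer = clazz.newInstance(); // runs IKAnalyzer() -> this(false)
        System.out.println("Loaded " + analyzer.getClass().getName());
    }
}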
Solution
Follow the pattern of IKAnalyzer and write two classes of our own, UseSmartIKAnalyzer and NotUseSmartIKAnalyzer, each assigning the desired initial value to useSmart in its default constructor.
Implementation
1) IKAnalyzer2012FF_u1 depends on Lucene 4.x; the Lucene and Solr versions used here are both 4.7.2
2) Write UseSmartIKAnalyzer.java
package org.wltea.analyzer.lucene;

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

public final class UseSmartIKAnalyzer extends Analyzer
{
    private boolean useSmart;

    public boolean useSmart()
    {
        return this.useSmart;
    }

    public void setUseSmart(boolean useSmart)
    {
        this.useSmart = useSmart;
    }

    public UseSmartIKAnalyzer()
    {
        // default value: true (smart segmentation)
        this.useSmart = true;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader in)
    {
        Tokenizer _IKTokenizer = new IKTokenizer(in, useSmart());
        return new TokenStreamComponents(_IKTokenizer);
    }
}
3) Write NotUseSmartIKAnalyzer.java
package org.wltea.analyzer.lucene;

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

public final class NotUseSmartIKAnalyzer extends Analyzer
{
    private boolean useSmart;

    public boolean useSmart()
    {
        return this.useSmart;
    }

    public void setUseSmart(boolean useSmart)
    {
        this.useSmart = useSmart;
    }

    public NotUseSmartIKAnalyzer()
    {
        // default value: false (finest-grained segmentation)
        this.useSmart = false;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader in)
    {
        Tokenizer _IKTokenizer = new IKTokenizer(in, useSmart());
        return new TokenStreamComponents(_IKTokenizer);
    }
}
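Before repackaging the jar, the two analyzers can be sanity-checked outside Solr. Here is a minimal sketch, assuming IKAnalyzer2012FF_u1.jar and the Lucene 4.7.2 jars are on the classpath; the class name TokenizeDemo and the sample text are illustrative only.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.NotUseSmartIKAnalyzer;
import org.wltea.analyzer.lucene.UseSmartIKAnalyzer;

public class TokenizeDemo
{
    public static void main(String[] args) throws Exception
    {
        // Index-time analyzer: should emit many fine-grained, overlapping terms.
        print(new NotUseSmartIKAnalyzer(), "中華人民共和國");
        // Query-time analyzer: should emit fewer, coarser terms.
        print(new UseSmartIKAnalyzer(), "中華人民共和國");
    }

    static void print(Analyzer analyzer, String text) throws Exception
    {
        TokenStream ts = analyzer.tokenStream("content", text);
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken())
        {
            System.out.print(term.toString() + " | ");
        }
        ts.end();
        ts.close();
        System.out.println();
    }
}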
4) Put IKAnalyzer2012FF_u1.jar, the Lucene 4.7.2 dependency jars, UseSmartIKAnalyzer.java, and NotUseSmartIKAnalyzer.java in the same directory, because compiling UseSmartIKAnalyzer.java and NotUseSmartIKAnalyzer.java needs them on the classpath
If you cannot find all of the Lucene 4.7.2 jars, download Solr 4.7.2 and extract them from solr.war
5) Compile with the javac command
E:\>javac -encoding UTF-8 -classpath E:\ik\* E:\ik\NotUseSmartIKAnalyzer.java
E:\>javac -encoding UTF-8 -classpath E:\ik\* E:\ik\UseSmartIKAnalyzer.java
6) Compilation generates the .class files in the same directory as the sources
7) Put NotUseSmartIKAnalyzer.class and UseSmartIKAnalyzer.class into the org.wltea.analyzer.lucene path inside IKAnalyzer2012FF_u1.jar; adding them with an archive tool such as WinRAR works. The figure below shows the updated jar opened in jd-gui
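Alternatively, the JDK's jar tool can do the repackaging without an archive GUI. Compiling with -d first lays the classes out under the package directory org/wltea/analyzer/lucene, so jar uf can add them at the correct path inside the jar. A sketch, assuming the same E:\ik layout as in step 5:

E:\ik>javac -encoding UTF-8 -classpath E:\ik\* -d . UseSmartIKAnalyzer.java NotUseSmartIKAnalyzer.java
E:\ik>jar uf IKAnalyzer2012FF_u1.jar org\wltea\analyzer\lucene\UseSmartIKAnalyzer.class org\wltea\analyzer\lucene\NotUseSmartIKAnalyzer.class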
8) Configure the Solr index schema.xml
<fieldType name="text_ik" class="solr.TextField">
    <!-- finest-grained segmentation for indexing -->
    <analyzer type="index" class="org.wltea.analyzer.lucene.NotUseSmartIKAnalyzer"/>
    <!-- smart segmentation for querying -->
    <analyzer type="query" class="org.wltea.analyzer.lucene.UseSmartIKAnalyzer"/>
</fieldType>
9) Check the segmentation results
You can see that the index is built with finest-grained segmentation while queries use smart segmentation
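To verify inside Solr, restart Solr (or reload the core) after replacing the jar, then use the Analysis page of the Solr admin UI to compare index-time and query-time output for the text_ik field type. For a phrase like 中華人民共和國, finest-grained mode typically emits many overlapping terms, while smart mode typically emits a single token.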