Original article: http://blog.csdn.net/hiphopmattshi/article/details/7226326
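While studying the Lucene 3.5.0 API documentation, I compared the API changes across the different Lucene releases and eventually found the change that matters here. It affects code that reads tokens through TermAttribute, such as: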
- StringReader reader = new StringReader(s);
- TokenStream ts = analyzer.tokenStream(s, reader);
- TermAttribute ta = ts.getAttribute(TermAttribute.class);
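According to the API documentation, CharTermAttribute is now the interface that replaces TermAttribute. I therefore wrote the following example to show a better way of extracting tokens from a TokenStream: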
- package com.segment;
- import java.io.StringReader;
- import org.apache.lucene.analysis.Analyzer;
- import org.apache.lucene.analysis.TokenStream;
- import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
- import org.wltea.analyzer.lucene.IKAnalyzer;
- public class Segment {
-     // Runs the analyzer over s and returns the tokens, each followed by a space.
-     public static String show(Analyzer a, String s) throws Exception {
-         // The first argument is a field name, not the text; "content" is an arbitrary placeholder.
-         TokenStream ts = a.tokenStream("content", new StringReader(s));
-         // Fetch the attribute once; its value is updated on every incrementToken() call.
-         CharTermAttribute ta = ts.addAttribute(CharTermAttribute.class);
-         StringBuilder sb = new StringBuilder();
-         try {
-             ts.reset();
-             while (ts.incrementToken()) {
-                 sb.append(ta.toString()).append(' ');
-             }
-             ts.end();
-         } finally {
-             ts.close();
-         }
-         return sb.toString();
-     }
-     // Segments s with IKAnalyzer.
-     public String segment(String s) throws Exception {
-         Analyzer a = new IKAnalyzer();
-         return show(a, s);
-     }
-     public static void main(String[] args) {
-         String name = "我是俊傑,我愛編程,我的測試用例";
-         Segment s = new Segment();
-         try {
-             System.out.println(s.segment(name));
-         } catch (Exception e) {
-             e.printStackTrace();
-         }
-     }
- }
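The attribute-based loop can look odd at first, because the term is read from an object that was obtained before the current token was produced. The sketch below is a small model of that pattern using only the JDK. `TermHolder`, `WhitespaceStream`, and `collect` are hypothetical stand-ins invented for illustration and are not Lucene classes. They only assume what the example relies on: the stream exposes one reusable holder, and each `incrementToken()` call overwrites its contents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
class TokenStreamSketch {

    /** Plays the role of CharTermAttribute: one mutable holder for the current term. */
    static final class TermHolder {
        private final StringBuilder buffer = new StringBuilder();

        void set(String term) {
            buffer.setLength(0);
            buffer.append(term);
        }

        @Override
        public String toString() {
            return buffer.toString();
        }
    }

    /** Plays the role of a TokenStream; it simply splits on whitespace. */
    static final class WhitespaceStream {
        private final String[] parts;
        private final TermHolder term = new TermHolder();
        private int next = 0;

        WhitespaceStream(String text) {
            String trimmed = text.trim();
            this.parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        /** Always returns the same holder instance. */
        TermHolder termHolder() {
            return term;
        }

        /** Advances to the next token; returns false once the input is exhausted. */
        boolean incrementToken() {
            if (next >= parts.length) {
                return false;
            }
            term.set(parts[next++]);
            return true;
        }
    }

    /** Consumer loop: fetch the holder once, then read it after every advance. */
    static List<String> collect(String text) {
        WhitespaceStream stream = new WhitespaceStream(text);
        TermHolder term = stream.termHolder();
        List<String> tokens = new ArrayList<>();
        while (stream.incrementToken()) {
            // Copy the value now; the holder is overwritten on the next call.
            tokens.add(term.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(collect("lucene token stream")); // prints [lucene, token, stream]
    }
}
```

Because the holder is reused, the model makes it clear why the term has to be copied out with `toString()` inside the loop instead of keeping a reference to the holder itself.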