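Getting Started with Lucene

Update: the code below uses Lucene 4.0!

Lucene makes it much easier to add full-text search to an application. In fact, Lucene is simple enough that I can show you how to use it in five minutes.

1. Building the index

To keep things simple, we build an in-memory index from a few strings: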

// The analyzer tokenizes text; the index itself lives in memory.
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
Directory index = new RAMDirectory();

IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);

IndexWriter w = new IndexWriter(index, config);
addDoc(w, "Lucene in Action", "193398817");
addDoc(w, "Lucene for Dummies", "55320055Z");
addDoc(w, "Managing Gigabytes", "55063554A");
addDoc(w, "The Art of Computer Science", "9900333X");
w.close();

The addDoc() method adds a document (that is, an instance of Lucene's Document class) to the index.

private static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
  Document doc = new Document();
  doc.add(new TextField("title", title, Field.Store.YES));
  doc.add(new StringField("isbn", isbn, Field.Store.YES));
  w.addDocument(doc);
}

Note that we use TextField for content that should be tokenized, and StringField for content such as an id that should not be tokenized.
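To make the difference concrete, here is a minimal sketch of a toy inverted index written with the Java standard library only. It is not how Lucene is implemented, and `ToyIndex`, `addTokenized` and `addExact` are names I made up for illustration. The tokenizing rule (lowercase, split on non-alphanumeric characters) is also an assumption, far simpler than what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy inverted index that contrasts tokenized and untokenized fields.
// Hypothetical names throughout; this is not Lucene's implementation.
class ToyIndex {
    // term -> ids of the documents containing that term
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Tokenized field: lowercase, split on non-alphanumerics, index each word.
    void addTokenized(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                post(token, docId);
            }
        }
    }

    // Untokenized field: the whole value is indexed as one exact term.
    void addExact(int docId, String value) {
        post(value, docId);
    }

    private void post(String term, int docId) {
        List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
        // Skip the id if the same document was just recorded for this term.
        if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
            ids.add(docId);
        }
    }

    List<Integer> lookup(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyIndex titles = new ToyIndex();
        titles.addTokenized(0, "Lucene in Action");
        titles.addTokenized(1, "Lucene for Dummies");

        ToyIndex isbns = new ToyIndex();
        isbns.addExact(0, "193398817");
        isbns.addExact(1, "55320055Z");

        System.out.println(titles.lookup("lucene"));   // [0, 1]
        System.out.println(isbns.lookup("55320055Z")); // [1]
        System.out.println(isbns.lookup("55320055"));  // []
    }
}
```

A single word finds both titles because each title was split into words, while the ISBN only matches when the whole value is supplied.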

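2. The query

We read the query string from the command line (the first argument, falling back to "lucene" if none is given), parse it, and build a Lucene Query object from it.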

String querystr = args.length > 0 ? args[0] : "lucene";
// "title" is the default field, used when the query names no field explicitly.
Query q = new QueryParser(Version.LUCENE_40, "title", analyzer).parse(querystr);
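The "title" argument is the default field: a term that names no field is searched in "title", while a term written as field:term is searched in the named field. The sketch below only illustrates that idea with the standard library. It is a toy and not Lucene's QueryParser grammar, and `ToyQuerySplitter` and `clauses` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of a "default field"; NOT Lucene's QueryParser grammar.
// The class and method names are hypothetical.
class ToyQuerySplitter {
    // Splits a query on whitespace and returns "field:term" clauses,
    // filling in defaultField for any term that names no field.
    static List<String> clauses(String defaultField, String query) {
        List<String> result = new ArrayList<>();
        for (String part : query.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue; // blank query
            }
            int colon = part.indexOf(':');
            if (colon > 0 && colon < part.length() - 1) {
                result.add(part); // field already given
            } else {
                result.add(defaultField + ":" + part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(clauses("title", "lucene isbn:193398817"));
        // [title:lucene, isbn:193398817]
    }
}
```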

3. Search

We create an IndexSearcher and run the Query built above against the index. The top 10 matches are gathered by a TopScoreDocCollector.

int hitsPerPage = 10;
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
// The collector keeps only the hitsPerPage best-scoring documents.
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
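If you wonder how a collector can keep only the best N hits without sorting every match, one common approach is a bounded min-heap. The following is a conceptual sketch using the standard library, not Lucene's own code. `ToyTopHits` and `Hit` are hypothetical names and the scores are invented sample values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Conceptual sketch of keeping only the best N hits; not Lucene's own code.
// ToyTopHits and Hit are hypothetical names.
class ToyTopHits {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int limit;
    // Min-heap on score: the weakest retained hit sits at the head.
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));

    ToyTopHits(int limit) {
        this.limit = limit;
    }

    void collect(int doc, float score) {
        heap.add(new Hit(doc, score));
        if (heap.size() > limit) {
            heap.poll(); // evict the current lowest score
        }
    }

    // Returns the retained document ids, best score first.
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        ToyTopHits top = new ToyTopHits(2);
        top.collect(0, 0.3f);
        top.collect(1, 0.9f);
        top.collect(2, 0.5f);
        System.out.println(top.topDocs()); // [1, 2]
    }
}
```

Memory stays proportional to the limit rather than to the number of matches, which is why asking for only 10 hits is cheap even on a large index.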

4. Display

Now that we have the search results, we need to show them to the user.

System.out.println("Found " + hits.length + " hits.");
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    // Load the stored fields of the matching document.
    Document d = searcher.doc(docId);
    System.out.println((i + 1) + ". " + d.get("isbn") + "\t" + d.get("title"));
}

Here is the complete code for this small application. Download HelloLucene.java

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

import java.io.IOException;

public class HelloLucene {
  public static void main(String[] args) throws IOException, ParseException {
    // 0. Specify the analyzer for tokenizing text.
    //    The same analyzer should be used for indexing and searching
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);

    // 1. create the index
    Directory index = new RAMDirectory();

    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);

    IndexWriter w = new IndexWriter(index, config);
    addDoc(w, "Lucene in Action", "193398817");
    addDoc(w, "Lucene for Dummies", "55320055Z");
    addDoc(w, "Managing Gigabytes", "55063554A");
    addDoc(w, "The Art of Computer Science", "9900333X");
    w.close();

    // 2. query
    String querystr = args.length > 0 ? args[0] : "lucene";

    // the "title" arg specifies the default field to use
    // when no field is explicitly specified in the query.
    Query q = new QueryParser(Version.LUCENE_40, "title", analyzer).parse(querystr);

    // 3. search
    int hitsPerPage = 10;
    IndexReader reader = DirectoryReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    // 4. display results
    System.out.println("Found " + hits.length + " hits.");
    for (int i = 0; i < hits.length; ++i) {
      int docId = hits[i].doc;
      Document d = searcher.doc(docId);
      System.out.println((i + 1) + ". " + d.get("isbn") + "\t" + d.get("title"));
    }

    // reader can only be closed when there
    // is no need to access the documents any more.
    reader.close();
  }

  private static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("title", title, Field.Store.YES));

    // use a string field for isbn because we don't want it tokenized
    doc.add(new StringField("isbn", isbn, Field.Store.YES));
    w.addDocument(doc);
  }
}

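You can run this small application directly from the command line by typing java HelloLucene, optionally followed by a query string (the default query is "lucene").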
