Getting Started with Lucene

Update: the code below uses Lucene 4.0!

Lucene makes it much easier to add full-text search to an application. Lucene itself is also quite simple, and I can show you how to use it in five minutes.

1. Building the index

To keep things simple, we build an in-memory index from a few strings:

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
Directory index = new RAMDirectory();

IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);

IndexWriter w = new IndexWriter(index, config);
addDoc(w, "Lucene in Action", "193398817");
addDoc(w, "Lucene for Dummies", "55320055Z");
addDoc(w, "Managing Gigabytes", "55063554A");
addDoc(w, "The Art of Computer Science", "9900333X");
w.close();

The addDoc() method adds a document to the index. (Translator's note: a document here is an instance of Lucene's Document class.)

private static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
  Document doc = new Document();
  doc.add(new TextField("title", title, Field.Store.YES));
  doc.add(new StringField("isbn", isbn, Field.Store.YES));
  w.addDocument(doc);
}

Note that we use TextField for content that should be tokenized, and StringField for content such as an id that should not be tokenized.
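To make the difference concrete, here is a toy model of the two indexing styles, written with plain Java collections. It is only a sketch and not how Lucene is implemented: the class ToyFieldIndex and its methods are hypothetical names, and it does no stop-word removal or scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: NOT Lucene's implementation. All names here are hypothetical.
final class ToyFieldIndex {
    // Maps "field:term" to the ids of the documents that contain that term.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Like a tokenized field: split into lowercase words and index each word.
    void addTokenized(int docId, String field, String value) {
        for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                addTerm(docId, field, token);
            }
        }
    }

    // Like an untokenized field: index the whole value as one exact term.
    void addVerbatim(int docId, String field, String value) {
        addTerm(docId, field, value);
    }

    private void addTerm(int docId, String field, String term) {
        List<Integer> ids = postings.computeIfAbsent(field + ":" + term, k -> new ArrayList<>());
        if (!ids.contains(docId)) {
            ids.add(docId);
        }
    }

    // Exact term lookup; returns an empty list when nothing matches.
    List<Integer> lookup(String field, String term) {
        return postings.getOrDefault(field + ":" + term, new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.addTokenized(0, "title", "Lucene in Action");
        index.addVerbatim(0, "isbn", "193398817");
        index.addTokenized(1, "title", "Lucene for Dummies");
        index.addVerbatim(1, "isbn", "55320055Z");

        System.out.println(index.lookup("title", "lucene"));   // prints [0, 1]
        System.out.println(index.lookup("isbn", "55320055Z")); // prints [1]
        System.out.println(index.lookup("isbn", "55320055"));  // prints []
    }
}
```

A single word finds every title that contains it, while the ISBN only matches when the whole value is given.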

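2. The query

We read the query string from the command-line arguments (falling back to "lucene" when none is given), parse it, and build a Lucene Query object from it.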

String querystr = args.length > 0 ? args[0] : "lucene";
Query q = new QueryParser(Version.LUCENE_40, "title", analyzer).parse(querystr);
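The "title" argument above is the default field: it is used for any term in the query that does not name a field explicitly. The following toy parser sketches just that one rule. It is not Lucene's QueryParser, which also handles analysis, operators, phrases and more; ToyQueryParser and its "field:term" output strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene's QueryParser. All names here are hypothetical.
final class ToyQueryParser {
    private final String defaultField;

    ToyQueryParser(String defaultField) {
        this.defaultField = defaultField;
    }

    // Splits the query on whitespace and resolves each term to "field:term".
    List<String> parse(String query) {
        List<String> clauses = new ArrayList<>();
        for (String part : query.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue; // blank query
            }
            int colon = part.indexOf(':');
            if (colon > 0 && colon < part.length() - 1) {
                clauses.add(part); // field named explicitly, e.g. isbn:9900333X
            } else {
                clauses.add(defaultField + ":" + part); // fall back to the default field
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        ToyQueryParser parser = new ToyQueryParser("title");
        // prints [title:lucene, isbn:9900333X]
        System.out.println(parser.parse("lucene isbn:9900333X"));
    }
}
```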

3. Search

We create an IndexSearcher and run the Query built above against the index. The top 10 matches are gathered by a TopScoreDocCollector.

int hitsPerPage = 10;
// DirectoryReader.open matches the full listing; IndexReader.open is deprecated in 4.0
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
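The idea of keeping only the best hitsPerPage results while scanning every match can be sketched with a bounded min-heap. This is an illustration of the general technique under my own assumptions, not the source of TopScoreDocCollector; ToyTopHits, Hit and the tie-breaking rule (lower document id wins) are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: NOT Lucene's TopScoreDocCollector. All names are hypothetical.
final class ToyTopHits {
    static final class Hit implements Comparable<Hit> {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }

        // Natural order is "weakest first": lower score, then higher doc id.
        @Override
        public int compareTo(Hit other) {
            int byScore = Float.compare(score, other.score);
            return byScore != 0 ? byScore : Integer.compare(other.doc, doc);
        }
    }

    private final int limit;
    private final PriorityQueue<Hit> heap = new PriorityQueue<>(); // weakest hit at the head
    private int totalHits;

    ToyTopHits(int limit) {
        this.limit = limit;
    }

    // Called once per matching document.
    void collect(int doc, float score) {
        totalHits++;
        heap.add(new Hit(doc, score));
        if (heap.size() > limit) {
            heap.poll(); // evict the current weakest hit
        }
    }

    int totalHits() {
        return totalHits;
    }

    // The retained hits, best first.
    List<Hit> topHits() {
        List<Hit> hits = new ArrayList<>(heap);
        Collections.sort(hits, Collections.reverseOrder());
        return hits;
    }

    public static void main(String[] args) {
        ToyTopHits collector = new ToyTopHits(2);
        collector.collect(0, 0.5f);
        collector.collect(1, 0.9f);
        collector.collect(2, 0.1f);
        collector.collect(3, 0.7f);
        System.out.println(collector.totalHits()); // prints 4
        for (Hit hit : collector.topHits()) {
            System.out.println(hit.doc); // prints 1, then 3
        }
    }
}
```

Memory stays proportional to the page size rather than to the number of matches, which is why only the top hits are returned.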

4. Display

Now that we have the search results, we need to show them to the user.

System.out.println("Found " + hits.length + " hits.");
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println((i + 1) + ". " + d.get("isbn") + "\t" + d.get("title"));
}

Here is the complete code for this small application, HelloLucene.java.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

import java.io.IOException;

public class HelloLucene {
  public static void main(String[] args) throws IOException, ParseException {
    // 0. Specify the analyzer for tokenizing text.
    //    The same analyzer should be used for indexing and searching
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);

    // 1. create the index
    Directory index = new RAMDirectory();

    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);

    IndexWriter w = new IndexWriter(index, config);
    addDoc(w, "Lucene in Action", "193398817");
    addDoc(w, "Lucene for Dummies", "55320055Z");
    addDoc(w, "Managing Gigabytes", "55063554A");
    addDoc(w, "The Art of Computer Science", "9900333X");
    w.close();

    // 2. query
    String querystr = args.length > 0 ? args[0] : "lucene";

    // the "title" arg specifies the default field to use
    // when no field is explicitly specified in the query.
    Query q = new QueryParser(Version.LUCENE_40, "title", analyzer).parse(querystr);

    // 3. search
    int hitsPerPage = 10;
    IndexReader reader = DirectoryReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    // 4. display results
    System.out.println("Found " + hits.length + " hits.");
    for (int i = 0; i < hits.length; ++i) {
      int docId = hits[i].doc;
      Document d = searcher.doc(docId);
      System.out.println((i + 1) + ". " + d.get("isbn") + "\t" + d.get("title"));
    }

    // reader can only be closed when there
    // is no need to access the documents any more.
    reader.close();
  }

  private static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("title", title, Field.Store.YES));

    // use a string field for isbn because we don't want it tokenized
    doc.add(new StringField("isbn", isbn, Field.Store.YES));
    w.addDocument(doc);
  }
}

You can run this small application directly from the command line: type java HelloLucene, optionally followed by a query string.

steve_tao_csdn
