Lucene series 1: Documents and index creation

Figure: how Lucene takes in data and builds an index ("Understanding the indexing process")

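Steps Lucene follows to build an index

1. Extracting text and creating the document: pull the text out of the source data and wrap it in a Document.

2. Analysis: analyze the extracted content by splitting it into tokens and filtering out stop words (words that are not useful as search keys).

3. Add to the index: write the analyzed result into the index.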
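The three steps above can be sketched without Lucene at all. The following is a minimal toy model, not Lucene's implementation: the class name `ToyIndexer`, the tiny stop-word list, and the split-on-non-letters tokenizer are all assumptions made for illustration. The sample sentences are the ones used for the "contents" field later in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy illustration only: this is not Lucene's API or its on-disk format.
class ToyIndexer {
    // Hypothetical stop-word list, chosen just for this example.
    static final Set<String> STOP_WORDS = Set.of("has", "of", "the", "a");

    // Step 2 (analysis): lowercase, split on non-letters, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Step 3 (add to index): map each term to the sorted ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(String[] docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String term : analyze(docs[docId])) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Step 1 (extract text): here the "documents" are already plain strings.
        String[] docs = {"Amsterdam has lots of bridges", "Venice has lots of canals"};
        System.out.println(buildIndex(docs));
        // prints {amsterdam=[0], bridges=[0], canals=[1], lots=[0, 1], venice=[1]}
    }
}
```

The term-to-documents map is the inverted index idea in miniature: a search for "lots" is a single lookup that returns both documents, with no need to scan the text again.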

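In step 3, Lucene does not write one monolithic index. It writes the index as separate segments, each a small self-contained index made up of several files. That is why a Lucene index directory contains so many small files. Segments mainly keep incremental indexing cheap, because new documents go into a new segment instead of rewriting existing ones; Lucene merges segments later so that searches do not have to visit too many of them.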
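A rough way to picture this storage scheme is the toy model below. It is only a sketch under stated assumptions: `ToySegmentedIndex`, its `maxBufferedDocs` threshold, and the "merge everything" policy are hypothetical names and simplifications, not Lucene classes or Lucene's actual merge policy.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of segment-based indexing; all names here are hypothetical.
class ToySegmentedIndex {
    private final int maxBufferedDocs;                       // flush threshold (assumed)
    private final List<String> buffer = new ArrayList<>();   // documents not yet written out
    private final List<List<String>> segments = new ArrayList<>();

    ToySegmentedIndex(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    // Adding a document never touches existing segments.
    void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    // Write the buffered documents out as one new immutable segment.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        segments.add(List.copyOf(buffer));
        buffer.clear();
    }

    // Combine all segments into a single one, so searches visit fewer segments.
    void mergeAll() {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        segments.clear();
        if (!merged.isEmpty()) {
            segments.add(List.copyOf(merged));
        }
    }

    int segmentCount() {
        return segments.size();
    }

    // A search visits every flushed segment (the unflushed buffer is not searched).
    int countMatches(String word) {
        int hits = 0;
        for (List<String> segment : segments) {
            for (String doc : segment) {
                if (doc.contains(word)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToySegmentedIndex index = new ToySegmentedIndex(2);
        for (int i = 0; i < 5; i++) {
            index.addDocument("doc " + i);
        }
        index.flush();
        System.out.println(index.segmentCount()); // prints 3
        index.mergeAll();
        System.out.println(index.segmentCount()); // prints 1
    }
}
```

Five documents with a threshold of two leave three segments behind, which mirrors the "many small files" effect; merging trades some extra write work for fewer segments to search.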


How this looks in code:

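// Imports needed by this fragment (Lucene 3.x-style API)
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

  // Fragment of a test class; getWriter() is a helper of that class and is not shown here.
  protected String[] ids = {"1", "2"};
  protected String[] unindexed = {"Netherlands", "Italy"};
  protected String[] unstored = {"Amsterdam has lots of bridges",
                                 "Venice has lots of canals"};
  protected String[] text = {"Amsterdam", "Venice"};
  private Directory directory;

  protected void setUp() throws Exception {
    directory = new RAMDirectory();    // in-memory Directory (the index lives in RAM)
    IndexWriter writer = getWriter();  // indexing is I/O, so it goes through a writer
    for (int i = 0; i < ids.length; i++) {
      Document doc = new Document();
      doc.add(new Field("id", ids[i],
                        Field.Store.YES,
                        Field.Index.NOT_ANALYZED)); // no need to tokenize the id
      doc.add(new Field("country", unindexed[i],
                        Field.Store.YES,
                        Field.Index.NO));           // stored only, not searchable
      doc.add(new Field("contents", unstored[i],
                        Field.Store.NO,
                        Field.Index.ANALYZED));     // tokenize the contents, do not store them
      doc.add(new Field("city", text[i],            // step 1: supply the data
                        Field.Store.YES,
                        Field.Index.ANALYZED));     // step 2: analyze the data
      writer.addDocument(doc);                      // step 3: add the document to the index
    }
    writer.close();  // closing the writer commits the changes, producing the index
  }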
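The four fields above use four different combinations of the store and index options. The sketch below models what each combination means for retrieval and search. It is a toy re-declaration, not Lucene code: the enums only borrow the constant names from `Field.Store` and `Field.Index`, and the whitespace split standing in for "analysis" is an assumption, since the analyzer used by `getWriter()` is not shown in the post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Toy model of the store/index options; these enums are not Lucene's classes.
class FieldOptionsDemo {
    enum Store { YES, NO }
    enum Index { NO, NOT_ANALYZED, ANALYZED }

    // Store.YES keeps the original value so it can be returned with search hits.
    static Optional<String> storedValue(String value, Store store) {
        return store == Store.YES ? Optional.of(value) : Optional.empty();
    }

    // Terms a search could match for this field.
    static List<String> indexedTerms(String value, Index index) {
        switch (index) {
            case NO:
                return List.of();                 // not searchable at all
            case NOT_ANALYZED:
                return List.of(value);            // the whole value is one term
            case ANALYZED:
                // Assumed analysis: plain whitespace tokenization.
                return Arrays.asList(value.trim().split("\\s+"));
            default:
                throw new IllegalArgumentException("unknown option: " + index);
        }
    }

    static void describe(String name, String value, Store store, Index index) {
        System.out.println(name + ": stored=" + storedValue(value, store)
                + ", terms=" + indexedTerms(value, index));
    }

    public static void main(String[] args) {
        // Same option combinations as the first document in the post.
        describe("id", "1", Store.YES, Index.NOT_ANALYZED);
        describe("country", "Netherlands", Store.YES, Index.NO);
        describe("contents", "Amsterdam has lots of bridges", Store.NO, Index.ANALYZED);
        describe("city", "Amsterdam", Store.YES, Index.ANALYZED);
    }
}
```

Read this way, "country" can be shown in results but never searched, while "contents" can be searched but its original text cannot be read back from the index.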




