This figure shows how Lucene takes in data and builds an index (Understanding the indexing process).
Steps Lucene follows to build an index:
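1. Extracting text and creating the document: extract the data and create a document.
2. Analysis: analyze the extracted content by tokenizing it and filtering out stop words (words that are not keywords).
3. Add to the index: add the analyzed result to the index.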
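In this step the index is written out in separate chunks (segments) to keep search fast, which is probably why a Lucene index directory contains so many small files.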
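The three steps above can be sketched without Lucene at all. The toy inverted index below is only an illustration of the idea, not how Lucene is implemented: the class name `ToyIndexer`, its stop-word list, and the lowercase-and-split tokenizer are all assumptions made for this example.

```java
import java.util.*;

// Toy illustration of extract -> analyze -> add to index. Not Lucene code.
class ToyIndexer {
    // Hypothetical stop-word list, chosen only for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("has", "of", "the", "a"));

    // Step 2: analysis. Lowercase, split on non-alphanumerics, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 3: add to the index. Maps each term to the ids of the documents containing it.
    static Map<String, SortedSet<Integer>> buildIndex(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : analyze(docs.get(docId))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Step 1: extraction. Here the text is already available as plain strings.
        List<String> docs = Arrays.asList(
                "Amsterdam has lots of bridges",
                "Venice has lots of canals");
        // prints {amsterdam=[0], bridges=[0], canals=[1], lots=[0, 1], venice=[1]}
        System.out.println(buildIndex(docs));
    }
}
```

The stop words "has" and "of" never reach the index, and "lots" points at both documents, which is the kind of term-to-document lookup that makes searching fast.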
In code:
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

protected String[] ids = {"1", "2"};
protected String[] unindexed = {"Netherlands", "Italy"};
protected String[] unstored = {"Amsterdam has lots of bridges",
                               "Venice has lots of canals"};
protected String[] text = {"Amsterdam", "Venice"};
private Directory directory;

protected void setUp() throws Exception {
    directory = new RAMDirectory();      // in-memory directory
    // Indexing is an I/O operation, so it naturally needs a writer.
    // getWriter() is a helper that is not shown in this excerpt.
    IndexWriter writer = getWriter();
    for (int i = 0; i < ids.length; i++) {
        Document doc = new Document();
        doc.add(new Field("id", ids[i],
                          Field.Store.YES,
                          Field.Index.NOT_ANALYZED)); // no need to tokenize the id
        doc.add(new Field("country", unindexed[i],
                          Field.Store.YES,
                          Field.Index.NO));           // stored only, not indexed
        doc.add(new Field("contents", unstored[i],
                          Field.Store.NO,
                          Field.Index.ANALYZED));     // tokenize the contents
        doc.add(new Field("city", text[i],            // extract the data
                          Field.Store.YES,
                          Field.Index.ANALYZED));     // analyze the data
        writer.addDocument(doc);                      // add the document
    }
    writer.close();                                   // write out the index
}
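The four fields above each combine a store option with an index option. A rough way to picture what those combinations mean is the toy model below. It is a sketch under assumptions, not Lucene's behavior in detail: `ToyDocument` and its methods are hypothetical names, and "analysis" is assumed here to be a plain whitespace split.

```java
import java.util.*;

// Toy model of one document's fields. Not Lucene code.
class ToyDocument {
    private final Map<String, String> stored = new HashMap<>();       // retrievable values
    private final Map<String, Set<String>> indexed = new HashMap<>(); // searchable terms

    // store:   keep the original value so it can be read back later
    // index:   make the field searchable
    // analyze: split the value into tokens before indexing (assumed: whitespace split)
    void add(String name, String value, boolean store, boolean index, boolean analyze) {
        if (store) {
            stored.put(name, value);
        }
        if (!index) {
            return;
        }
        Set<String> terms = new HashSet<>();
        if (analyze) {
            terms.addAll(Arrays.asList(value.trim().split("\\s+")));
        } else {
            terms.add(value); // the whole value is a single term
        }
        indexed.put(name, terms);
    }

    // True if the given term is searchable in the given field.
    boolean matches(String field, String term) {
        return indexed.getOrDefault(field, Collections.emptySet()).contains(term);
    }

    // Returns the stored value, or null if the field was not stored.
    String get(String field) {
        return stored.get(field);
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.add("id", "1", true, true, false);                  // like Store.YES, Index.NOT_ANALYZED
        doc.add("country", "Netherlands", true, false, false);  // like Store.YES, Index.NO
        doc.add("contents", "Amsterdam has lots of bridges", false, true, true); // like Store.NO, Index.ANALYZED
        doc.add("city", "Amsterdam", true, true, true);         // like Store.YES, Index.ANALYZED

        System.out.println(doc.matches("country", "Netherlands")); // false: stored but not searchable
        System.out.println(doc.get("country"));                    // Netherlands
        System.out.println(doc.matches("contents", "bridges"));    // true: searchable
        System.out.println(doc.get("contents"));                   // null: not stored
    }
}
```

Read this way, "country" can be shown in results but not searched on, while "contents" can be searched but its original text cannot be read back from the index.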