Chapter 2: Indexing

2.1 The Indexing Process Illustrated

2.2 Steps for Building an Index

1. Create the Directory

package com.mzsx.write;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Lazily creates one shared Directory for the whole application.
public class DirectoryConext {
    // volatile is needed for double-checked locking to be safe
    private static volatile Directory directory = null;

    private DirectoryConext() {}

    public static Directory getDirectory(String fileName) {
        if (directory == null) {
            synchronized (DirectoryConext.class) {
                if (directory == null) {
                    try {
                        directory = FSDirectory.open(new File(fileName));
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }
        return directory;
    }
}

2. Create the Writer

package com.mzsx.write;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

// Holds one shared IndexWriter. Once it exists, later calls ignore their arguments.
public class IndexWriterContext {
    // volatile is needed for double-checked locking to be safe
    private static volatile IndexWriter indexWrite = null;
    private static Directory directory = null;
    private static Analyzer analyzer = null;

    private IndexWriterContext() {}

    // Open the writer on the index directory at the given path
    public static IndexWriter getIndexWrite(String fileName, Analyzer a) {
        try {
            if (indexWrite == null) {
                directory = DirectoryConext.getDirectory(fileName);
                synchronized (IndexWriterContext.class) {
                    if (indexWrite == null) {
                        indexWrite = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, a));
                        //indexWrite.commit();
                    }
                }
            }
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

        return indexWrite;
    }

    // Open the writer on an existing Directory
    public static IndexWriter getIndexWrite(Directory dir, Analyzer a) {
        try {
            if (indexWrite == null) {
                directory = dir;
                synchronized (IndexWriterContext.class) {
                    if (indexWrite == null) {
                        indexWrite = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, a));
                    }
                }
            }
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

        return indexWrite;
    }
}

3. Create Documents and Add Them to the Index

    // Build the index from every file in the directory fName
    public void createdIndex(String fName) {
        File file = new File(fName);
        // Stop early: listFiles() returns null for anything that is not a directory
        if (!file.isDirectory()) {
            System.err.println("The path passed in is not a directory: " + fName);
            return;
        }
        try {
            // Start from an empty index
            indexWriter.deleteAll();
            for (File f : file.listFiles()) {
                // Skip subdirectories; only regular files can be read as text
                if (!f.isFile()) {
                    continue;
                }
                Document doc = getDocument(f);
                indexWriter.addDocument(doc);
            }
            indexWriter.commit();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Build a Document from one file
    protected Document getDocument(File f) throws Exception {
        //System.out.println(FileUtils.readFileToString(f));
        Document doc = new Document();
        doc.add(new Field("id", ("" + (id++)), Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        doc.add(new Field("contents", FileUtils.readFileToString(f),
                Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
        doc.add(new Field("filename", f.getName(), Field.Store.YES,
                Field.Index.ANALYZED));
        doc.add(new Field("fullpath", f.getCanonicalPath(), Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        doc.add(new NumericField("size", Field.Store.YES, true).setLongValue(f.length()));
        doc.add(new NumericField("date", Field.Store.YES, true).setLongValue(f.lastModified()));
        return doc;
    }

4. Query Basic Information About the Index

    // Print document counts for the index
    public void queryNum() {
        try {
            IndexReader indexReader = IndexReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(indexReader);
            System.out.println("searcher.maxDoc=" + searcher.maxDoc());
            System.out.println("indexReader.maxDoc=" + indexReader.maxDoc());
            System.out.println("indexReader.numDocs=" + indexReader.numDocs());
            System.out.println("indexReader.numDeletedDocs="
                    + indexReader.numDeletedDocs());
            searcher.close();
            // A searcher built from a reader does not close that reader
            indexReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

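5. Delete and Update the Index

Documents can be deleted through either IndexWriter or IndexReader. Deletion through IndexWriter appeared around version 2.9; underneath, it still uses an IndexReader to carry out the deletion.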

    // Update the index: replace the documents matching (field, name) with a new document
    public void update(String field, String name) {
        Document docu = new Document();
        docu.add(new Field("id", "2222", Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        docu.add(new Field("contents", "modified file content", Field.Store.NO,
                Field.Index.ANALYZED_NO_NORMS));
        docu.add(new Field("filename", "this is the modified file name", Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        docu.add(new Field("fullpath", "this is the modified file path", Field.Store.YES,
                Field.Index.NOT_ANALYZED));

        try {
            indexWriter.updateDocument(new Term(field, name), docu, analyzer);
            indexWriter.commit();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Delete the documents matching the given term through IndexWriter
    public void deleteByIndexWriter(String field, String value) {
        try {
            indexWriter.deleteDocuments(new Term(field, value));
            indexWriter.commit();
            //indexWriter.close();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Delete the documents matching the given term through IndexReader
    public void deleteByIndexReader(String field, String value) {
        try {
            indexReader.deleteDocuments(new Term(field, value));
            // close() is required here
            indexReader.close();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Restore deleted documents
    public void unDelete() {
        try {
            indexReader.undeleteAll();
            // close() is required here
            indexReader.close();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

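2.3 Field Options

1. Field Index Options

Use Field.Index.* to choose how a field is indexed.

Index.ANALYZED: the value is analyzed (tokenized) and indexed. Suitable for titles, body text, and similar content.

Index.NOT_ANALYZED: the value is indexed but not analyzed. Suitable for exact-match search on values such as *** numbers, names, and IDs.

Index.ANALYZED_NO_NORMS: the value is analyzed, but no norms are stored. Norms hold index-time information such as boosts.

Index.NOT_ANALYZED_NO_NORMS: the value is neither analyzed nor given norms.

Index.NO: the value is not indexed.

Note: omitting norms disables document boost, field boost, and length normalization at index time. The benefit is lower memory use, because the search phase no longer needs one byte per field per document to hold norms. Norms only disappear, however, if every document omits them for that field; as soon as one does not, the others are given default norm values anyway. The reason is lookup speed: Lucene computes a norm's offset as the document number multiplied by the size of one document's norm data, so if a document in the middle were missing, the offset could not be computed. In other words, norms are either stored for all documents or for none.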
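The fixed-width addressing described in the note can be sketched in plain Java. This is an illustration only, not Lucene's code: the class NormsOffsetSketch, its method names, the flat byte array, and the default norm value are all hypothetical, and it assumes one byte of norm data per document as the note states.

```java
final class NormsOffsetSketch {
    // Assumption taken from the note: one byte of norm data per document.
    static final int BYTES_PER_DOC = 1;

    // Offset of a document's norm inside the per-field norms array.
    static long offsetOf(int docId) {
        return (long) docId * BYTES_PER_DOC;
    }

    // Reads a norm by direct addressing; this only works if every document has a slot.
    static byte normOf(byte[] norms, int docId) {
        return norms[(int) offsetOf(docId)];
    }

    public static void main(String[] args) {
        // Three documents; the middle one omitted norms, so a default fills its slot.
        byte defaultNorm = 0x7C; // hypothetical default, chosen only for this sketch
        byte[] norms = { 0x70, defaultNorm, 0x78 };
        System.out.println(normOf(norms, 2)); // prints 120
    }
}
```

Without the filler byte for the middle document, document 2 would sit at offset 1 and the multiplication would point at the wrong slot, which is why the slot is kept even when norms are "omitted".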

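2. Field Store Options

Field.Store.*

YES: the field value is stored. The original string is kept in the index and can be retrieved later. Suitable for primary keys and titles.

NO: the field value is not stored. Usually combined with Index.ANALYZED for content such as article bodies that never needs to be retrieved.

3. Best Practices

Index	Store	Typical use
NOT_ANALYZED_NO_NORMS	YES	Identifiers (primary keys, file names), phone numbers, *** numbers, names, dates
ANALYZED	YES	Document titles and abstracts
ANALYZED	NO	Document body
NO	YES	Document type, database primary key (not indexed)
NOT_ANALYZED	NO	Hidden keywords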

2.4 Other Topics

1. Indexing Numbers and Dates

(1) Whether numbers in text are indexed depends on the analyzer:

·WhitespaceAnalyzer and StandardAnalyzer index numbers

·SimpleAnalyzer and StopAnalyzer do not index numbers
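The difference comes from how each analyzer splits text. The sketch below imitates two tokenization rules in plain Java rather than calling Lucene: splitOnWhitespace mirrors the whitespace rule, and splitOnNonLetters mirrors the letter-only, lowercasing rule that SimpleAnalyzer and StopAnalyzer are understood to build on. The class and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DigitTokenSketch {
    // Whitespace rule: every run of non-whitespace characters is a token, digits included.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Letter rule: only runs of letters become tokens (lowercased), so digits vanish.
    static List<String> splitOnNonLetters(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Lucene 35 indexes 1024 files";
        System.out.println(splitOnWhitespace(text)); // [Lucene, 35, indexes, 1024, files]
        System.out.println(splitOnNonLetters(text)); // [lucene, indexes, files]
    }
}
```

A term that never makes it into the token list cannot be searched, which is why a query for "1024" finds nothing in a field analyzed with the letter-only rule.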

(2) Since version 3.0, numeric fields are available for indexing numbers and dates

doc.add(new NumericField("size", Field.Store.YES, true).setLongValue(f.length()));
doc.add(new NumericField("date", Field.Store.YES,true).setLongValue(f.lastModified()));
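Because the date field holds File.lastModified() as a long (milliseconds since the Unix epoch), the stored value has to be converted back before it is shown to a user. Below is a minimal sketch using only the standard library; the class and method names are hypothetical, the output format and UTC time zone are arbitrary choices, and no Lucene call is involved.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class DateFieldSketch {
    // The value that would be handed to setLongValue(...): epoch milliseconds.
    static long toIndexValue(Date date) {
        return date.getTime();
    }

    // Turns a stored long back into a readable UTC timestamp.
    static String toDisplay(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long stored = toIndexValue(new Date(86400000L)); // one day after the epoch
        System.out.println(toDisplay(stored)); // 1970-01-02 00:00:00
    }
}
```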

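2. Commonly Used Directory Implementations

FSDirectory.open picks the most suitable file-based Directory implementation for the current runtime environment.

new RAMDirectory() creates a Directory in memory. It is fast, but its contents cannot be persisted.

3. Life Cycle of IndexReader and IndexWriter

Opening an IndexReader repeatedly with IndexReader.open is expensive, so a program usually opens a single IndexReader for its whole lifetime and creates IndexSearcher instances from it as needed. With a singleton reader, the following issues can arise:

(1) After a writer modifies the index, the open reader does not see the changes, so it has to be refreshed with IndexReader.openIfChanged.

If the IndexWriter is left open after it is created, its changes take effect only once commit() has been called.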
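The refresh pattern for a shared reader can be sketched without Lucene. The code below is a stand-in, not the library API: FakeIndex, FakeReader, and ReopenSketch are hypothetical types, and the local openIfChanged only imitates the contract that IndexReader.openIfChanged is understood to have in 3.5, namely returning a new reader when the index has changed and null otherwise.

```java
final class ReopenSketch {
    // Stand-in for an index: the version is bumped on every commit.
    static final class FakeIndex {
        private long version = 0;
        synchronized void commit() { version++; }
        synchronized long version() { return version; }
    }

    // Stand-in for a reader: a snapshot of the index version at open time.
    static final class FakeReader {
        final long version;
        FakeReader(long version) { this.version = version; }
    }

    private final FakeIndex index;
    private FakeReader reader;

    ReopenSketch(FakeIndex index) {
        this.index = index;
        this.reader = new FakeReader(index.version());
    }

    // Returns a new reader if the index has changed since old was opened, else null.
    static FakeReader openIfChanged(FakeIndex index, FakeReader old) {
        long current = index.version();
        return current == old.version ? null : new FakeReader(current);
    }

    // Hands out the shared reader, swapping in a fresh one when the index has moved on.
    synchronized FakeReader currentReader() {
        FakeReader newer = openIfChanged(index, reader);
        if (newer != null) {
            reader = newer; // with a real reader, the old one would be closed here
        }
        return reader;
    }
}
```

Callers ask the holder for the reader each time they build a searcher, so an unchanged index costs one version check while a committed change triggers exactly one reopen.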
