Lucene (1): Hello World

Original

ClarkKentYang

2020-07-04 08:00

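Definition: Lucene is a top-level Apache project and a toolkit for full-text search. You can use it to build a full-text search engine, but it is a library and cannot run on its own.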

Application areas:

1. Internet-wide full-text search engines

2. Site-internal full-text search engines

3. Speeding up database queries

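Creating an index (the code targets Lucene 4.10.3):

	import java.io.File;
	import java.util.ArrayList;
	import java.util.List;
	import org.apache.commons.io.FileUtils;
	import org.apache.lucene.analysis.Analyzer;
	import org.apache.lucene.analysis.standard.StandardAnalyzer;
	import org.apache.lucene.document.Document;
	import org.apache.lucene.document.Field.Store;
	import org.apache.lucene.document.LongField;
	import org.apache.lucene.document.TextField;
	import org.apache.lucene.index.IndexWriter;
	import org.apache.lucene.index.IndexWriterConfig;
	import org.apache.lucene.store.Directory;
	import org.apache.lucene.store.FSDirectory;
	import org.apache.lucene.util.Version;
	import org.junit.Test;

	@Test
	public void testIndexCreate() throws Exception{
		//Document list that holds the data of every file
		List<Document> docList = new ArrayList<>();
		
		//Directory that contains the source files (placeholder path)
		File dir = new File("path/to/files");
		File[] files = dir.listFiles();
		if (files == null) {
			throw new IllegalStateException("Not a readable directory: " + dir);
		}
		for (File file : files) {
			if (!file.isFile()) {
				continue; //skip subdirectories
			}
			//File name
			String fileName = file.getName();
			//File content
			String fileContext = FileUtils.readFileToString(file, "UTF-8");
			//File size in bytes
			long fileSize = FileUtils.sizeOf(file);
			
			//Collect the file data into a Lucene document
			Document document = new Document();
			
			/*
			 * First argument: field name
			 * Second argument: field value
			 * Third argument: whether the value is stored
			 */
			TextField nameField = new TextField("fileName",fileName, Store.YES);
			TextField contextField = new TextField("fileContext",fileContext, Store.YES);
			//Numeric field, so the size can be searched by range later
			LongField sizeField = new LongField("fileSize",fileSize, Store.YES);
			
			//Add the fields to the document
			document.add(nameField);
			document.add(contextField);
			document.add(sizeField);
			
			//Add the document to the list
			docList.add(document);
		}
		
		//Create the analyzer
		Analyzer analyzer = new StandardAnalyzer();
		//Directory where the index and documents are stored (placeholder path)
		Directory directory = FSDirectory.open(new File("path/to/index"));
		//Configuration object for the writer
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
		//Writer for the index and documents
		IndexWriter indexWriter = new IndexWriter(directory, config);
		
		//Add every document to the index
		for (Document document : docList) {
			indexWriter.addDocument(document);
		}
		//Commit and release the write lock
		indexWriter.commit();
		indexWriter.close();
	}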
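
Conceptually, indexing analyzes each document into terms and records which documents contain each term, which is what makes the later lookup fast. The following is a toy sketch of that idea in plain Java. `ToyInvertedIndex` and its `analyze` method are hypothetical names made up for illustration; they are not Lucene classes, and Lucene's real index format is far more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of an inverted index (illustration only, not Lucene code).
class ToyInvertedIndex {
    // term -> sorted ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    private final List<String> docs = new ArrayList<>();

    // Crude stand-in for an Analyzer: lowercase, split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stores the document, indexes its terms, and returns its id.
    int addDocument(String text) {
        int docId = docs.size();
        docs.add(text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Returns the ids of the documents containing the term (empty if none).
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Apache Lucene is a search library");
        index.addDocument("Tomcat is an Apache project");
        index.addDocument("Thinking in Java");
        System.out.println(index.search("apache")); // [0, 1]
        System.out.println(index.search("java"));   // [2]
    }
}
```

A search never scans the document text; it only looks the term up in the map, which is why the analyzer used at index time determines what can be found.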

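Searching the index:

	import org.apache.lucene.index.DirectoryReader;
	import org.apache.lucene.index.IndexReader;
	import org.apache.lucene.queryparser.classic.QueryParser;
	import org.apache.lucene.search.IndexSearcher;
	import org.apache.lucene.search.Query;
	import org.apache.lucene.search.ScoreDoc;
	import org.apache.lucene.search.TopDocs;

	@Test
	public void testIndexSearch() throws Exception{
		
		//Create the analyzer (use the same one as at index time)
		Analyzer analyzer = new StandardAnalyzer();
		//Directory where the index and documents are stored
		Directory directory = FSDirectory.open(new File("G:\\luceneTest"));
		//Reader; DirectoryReader.open replaces the deprecated IndexReader.open
		IndexReader indexReader = DirectoryReader.open(directory);
		//Create the index searcher
		IndexSearcher indexSearcher = new IndexSearcher(indexReader);
		//Query parser: the first argument is the default search field, the second is the analyzer
		QueryParser queryParser = new QueryParser("fileContext", analyzer);
		//Query syntax is field:keyword; an explicit field overrides the default field
		Query query = queryParser.parse("fileName:apache");
		//Search: the first argument is the query, the second is the maximum number of hits to return
		TopDocs topDocs = indexSearcher.search(query, 10);
		System.out.println("Total hits: "+topDocs.totalHits);
		//Iterate over the result set
		for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
			//Get the docId
			int docId = scoreDoc.doc;
			//Load the stored document from disk by docId
			Document document = indexReader.document(docId);
			System.out.println("fileName:"+document.get("fileName")+",fileSize:"+document.get("fileSize"));
		}
		indexReader.close();
	}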
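
The `field:keyword` syntax and the role of the default search field can be sketched in plain Java. This is a deliberately simplified illustration that handles only a single `field:term` pair; `ToyFieldQuery` is a hypothetical class, not part of Lucene, and the real `QueryParser` supports a much richer grammar.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of "field:term" resolution (illustration only, not Lucene code).
class ToyFieldQuery {
    final String field;
    final String term;

    ToyFieldQuery(String field, String term) {
        this.field = field;
        this.term = term;
    }

    // "fileName:apache" targets fileName; a bare "apache" targets the default field.
    static ToyFieldQuery parse(String defaultField, String queryText) {
        int colon = queryText.indexOf(':');
        if (colon < 0) {
            return new ToyFieldQuery(defaultField, queryText.trim());
        }
        return new ToyFieldQuery(queryText.substring(0, colon).trim(),
                                 queryText.substring(colon + 1).trim());
    }

    // True if the chosen field of the document contains the term as a whole token.
    boolean matches(Map<String, String> doc) {
        String value = doc.get(field);
        if (value == null) {
            return false;
        }
        String[] tokens = value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        return Arrays.asList(tokens).contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("fileName", "apache lucene notes");
        doc.put("fileContext", "thinking in java");

        ToyFieldQuery explicit = parse("fileContext", "fileName:apache");
        System.out.println(explicit.field + " -> " + explicit.matches(doc)); // fileName -> true

        ToyFieldQuery fallback = parse("fileContext", "apache");
        System.out.println(fallback.field + " -> " + fallback.matches(doc)); // fileContext -> false
    }
}
```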

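Deleting from the index:

	import org.apache.lucene.index.Term;
	import org.wltea.analyzer.lucene.IKAnalyzer;

	@Test
	public void testIndexDelete() throws Exception{
		//Create the analyzer: IKAnalyzer is a Chinese analyzer, StandardAnalyzer is the general-purpose one
		Analyzer analyzer = new IKAnalyzer();
		//Directory where the index and documents are stored
		Directory directory = FSDirectory.open(new File("G:\\luceneTest"));
		//Configuration object for the writer
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
		//Writer for the index and documents
		IndexWriter indexWriter = new IndexWriter(directory, config);
		
		//Delete everything
		//indexWriter.deleteAll();
		//A Term is a single indexed token: the first argument is the field name, the second is the
		//token whose documents are deleted. It is matched verbatim and is not run through the analyzer.
		indexWriter.deleteDocuments(new Term("fileName","apache"));
		indexWriter.commit();
		indexWriter.close();
	}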

Updating the index:

	@Test
	public void testIndexUpdate() throws Exception{
		//Create the analyzer: IKAnalyzer is a Chinese analyzer, StandardAnalyzer is the general-purpose one
		Analyzer analyzer = new IKAnalyzer();
		//Directory where the index and documents are stored
		Directory directory = FSDirectory.open(new File("G:\\luceneTest"));
		//Configuration object for the writer
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
		//Writer for the index and documents
		IndexWriter indexWriter = new IndexWriter(directory, config);
		
		//An update deletes every document that matches the term, then adds the new document
		Term term = new Term("fileName","web");
		Document document = new Document();
		document.add(new TextField("fileName", "xxx",Store.YES));
		document.add(new TextField("fileContext", "think in java xxx",Store.YES));
		document.add(new LongField("fileSize", 100L,Store.YES));

		indexWriter.updateDocument(term, document);
		indexWriter.commit();
		indexWriter.close();
	}
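
The delete-then-add behavior of an update can be made concrete with a small in-memory model. This is only a sketch: `ToyDocumentStore` is a hypothetical class, and it compares whole field values where Lucene would compare indexed terms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of delete and update semantics (illustration only, not Lucene code).
class ToyDocumentStore {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    // Removes every document whose field equals the value; returns how many were removed.
    int deleteDocuments(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    // Update = delete all matches, then add the replacement.
    // The replacement is added even when nothing matched.
    void updateDocument(String field, String value, Map<String, String> replacement) {
        deleteDocuments(field, value);
        addDocument(replacement);
    }

    boolean contains(String field, String value) {
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return docs.size();
    }
}
```

One consequence worth noticing: if several documents match the term, a single update replaces all of them with one new document, so the document count can shrink.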

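Query classes:

TermQuery: searches for an exact term (text fields only).

QueryParser: parses a query string of the form field:keyword and lets you set a default search field. Recommended. (Text fields only.)

NumericRangeQuery: searches within a numeric range.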

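BooleanQuery: a combined query. You can combine conditions with NOT, AND and OR, and search across several fields.

MUST corresponds to AND: the clause has to match.

SHOULD corresponds to OR: the clause may match.

MUST_NOT corresponds to NOT: the clause has to not match.

Note: a query that consists only of MUST_NOT clauses is meaningless; it matches no documents.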
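
How the three clause types combine can be sketched with plain set operations. This is a toy model over sets of document ids and it ignores scoring; `ToyBooleanQuery` and its `Occur` enum are hypothetical names chosen to mirror the description above, not Lucene's own classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of MUST / SHOULD / MUST_NOT matching (illustration only, not Lucene code).
class ToyBooleanQuery {
    enum Occur { MUST, SHOULD, MUST_NOT }

    private final List<Set<Integer>> must = new ArrayList<>();
    private final List<Set<Integer>> should = new ArrayList<>();
    private final List<Set<Integer>> mustNot = new ArrayList<>();

    // Each clause is represented by the set of document ids it matches.
    ToyBooleanQuery add(Set<Integer> matches, Occur occur) {
        switch (occur) {
            case MUST:     must.add(matches);    break;
            case SHOULD:   should.add(matches);  break;
            default:       mustNot.add(matches); break;
        }
        return this;
    }

    Set<Integer> evaluate() {
        Set<Integer> result = new TreeSet<>();
        if (!must.isEmpty()) {
            // AND: intersect all MUST clauses (SHOULD clauses then only affect ranking).
            result.addAll(must.get(0));
            for (Set<Integer> m : must) {
                result.retainAll(m);
            }
        } else {
            // OR: union of all SHOULD clauses.
            for (Set<Integer> s : should) {
                result.addAll(s);
            }
        }
        // NOT: remove everything a MUST_NOT clause matches.
        for (Set<Integer> n : mustNot) {
            result.removeAll(n);
        }
        return result;
    }
}
```

With only MUST_NOT clauses there is nothing to subtract from, so the result is empty, which is the reason behind the note above.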

MatchAllDocsQuery: returns every document.

MultiFieldQueryParser: searches several fields at once; a document is returned as long as any one of these fields contains the keyword.
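
The "any one of these fields" rule amounts to an OR across fields. A minimal sketch, assuming documents are plain maps from field name to text; `ToyMultiFieldMatcher` is a hypothetical helper, not a Lucene class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of multi-field matching (illustration only, not Lucene code).
class ToyMultiFieldMatcher {
    // True if at least one of the listed fields contains the keyword as a whole token.
    static boolean matchesAny(Map<String, String> doc, List<String> fields, String keyword) {
        String wanted = keyword.toLowerCase(Locale.ROOT);
        for (String field : fields) {
            String value = doc.get(field);
            if (value == null) {
                continue; // the document has no such field
            }
            String[] tokens = value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
            if (Arrays.asList(tokens).contains(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```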
