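Introduction to Full-Text Search (What Is Full-Text Search?)

Data Categories

The data we work with falls broadly into two kinds: structured data and unstructured data.
Structured data: data with a fixed format or bounded length, such as database records and metadata.
Unstructured data: data of variable length or with no fixed format, such as emails, Word documents, and other files on disk.

Searching Structured Data

The most common structured data is the data held in a database. Searching a database is easy to implement: you normally query it with SQL statements, and results come back quickly.
Why is database search easy?
Because the data is stored in a regular way: it is organized into rows and columns, and the data formats and lengths are fixed.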

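Ways to Search Unstructured Data

(1) Serial Scanning
Serial scanning works like this: to find the files whose content contains a given string, you read the documents one by one, each from beginning to end. If a document contains the string, it is one of the files you are looking for; you then move on to the next file, until every file has been scanned. The Windows file search can search file contents this way too, but it is quite slow.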
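To make the cost of serial scanning concrete, here is a minimal sketch in plain Java. It is not how Windows search is implemented; the class name SerialScanSketch and the in-memory list of documents are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: documents are plain strings held in memory.
class SerialScanSketch {
    /** Returns the positions of all documents whose text contains the keyword. */
    static List<Integer> scan(List<String> docs, String keyword) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            // Every document is examined on every search.
            if (docs.get(i).contains(keyword)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Lucene is a Java full-text search engine.",
                "Solr is built on Lucene.",
                "A relational database stores rows and columns.");
        System.out.println(scan(docs, "Lucene")); // prints [0, 1]
    }
}
```

Because nothing is prepared in advance, each query costs time proportional to the total amount of text, which is why this approach becomes slow as the number of files grows.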

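(2) Full-Text Search
Extract part of the information from the unstructured data and reorganize it so that it has some structure, then search that structured data; this makes searching comparatively fast. The information extracted from the unstructured data and reorganized in this way is called an index.
Take a dictionary as an example. The pinyin table and the radical lookup table are effectively the dictionary's index. The explanation of each character is unstructured; without the pinyin table and the radical table, finding a character in a vast dictionary would require a serial scan. However, some information about a character can be extracted and structured. Pronunciation, for instance, is fairly structured: it consists of an initial and a final, each drawn from a small set that can be enumerated. So the pronunciations are pulled out and sorted in a fixed order, and each one points to the page that holds the detailed explanation of the character. To search, we look up the pronunciation in the structured pinyin table and follow the page number it points to, which leads us to the unstructured data, that is, the explanation of the character.
This process of building an index first and then searching the index is called full-text search.
Building the index takes considerable time, but once built it can be used again and again. Full-text search mostly serves queries, so the time spent building the index is worthwhile.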

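Use Cases for Full-Text Search

Full-text search suits large volumes of data without a fixed structure, for example search engines such as Baidu and Google, on-site forum search, and on-site search for e-commerce websites.

Introduction to Lucene

Lucene is an open-source full-text search engine toolkit from Apache. It provides a complete query engine and indexing engine, plus part of a text analysis engine. Lucene aims to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system.

Internet companies today generally do not build their applications directly on Lucene; they typically use a search server instead. But the mainstream search servers, Elasticsearch and Solr, are both built on Lucene, so learning Lucene helps you understand how these two search servers work underneath. That makes Lucene well worth learning.

The basic workflow for implementing full-text search with Lucene is described below.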

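Creating the Index

Indexing is the process of indexing the content of the documents that users want to search; the resulting index is stored in an index directory (index).

Here the documents to be searched are text files on disk. The requirement of this example is that every file whose name or content contains the keyword must be found, so we index the file name, the file content, and the file path, and then search against that index.

Theory (Very Important)

Obtaining the Original Documents

Original documents are the content to be indexed and searched. They include web pages on the Internet, data in databases, files on disk, and so on. This introductory example uses files as the original documents.

Gathering the raw information to be searched from the Internet, databases, file systems, and so on is called information collection. Its purpose is to make the original content available for indexing.
Software that collects information on the Internet is usually called a crawler or spider, also known as a web robot. A crawler visits web pages on the Internet and stores the content it retrieves.
The introductory example needs to read the content of files on disk. Text files can be read through a file stream. For files such as pdf, doc, and xls, the content can be read with third-party parsing tools; for example, Apache POI reads doc and xls files.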

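Creating Document Objects

The original content is obtained in order to index it. Before indexing, the original content must be turned into a document (Document). A document contains fields (Field, comparable to a column in a database table), and the fields hold the content.
Here we can treat each file on disk as one Document. The Document contains several Fields: fileName (file name), fullPath (file path), and contents (file content).

Note: each Document can have multiple Fields, different Documents can have different Fields, and the same Document can contain the same Field more than once (same field name and same field value).

Each document has a unique number, the document id.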

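Analyzing Documents

Once the original content has been turned into a document (Document) containing fields (Field), the content of the fields has to be analyzed. Analysis extracts words from the original text, converts letters to lowercase, removes punctuation, removes stop words, and so on, and produces the final tokens. A token can be thought of as a single word.

For example, the document below is analyzed as follows.
Original document content: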
Lucene is a Java full-text search engine. Lucene is not a complete
application, but rather a code library and API that can easily be used
to add search capabilities to applications.

Tokens after analysis:
lucene, java, full, text, search, engine, ...
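The analysis steps described above can be sketched roughly as follows in plain Java. This only illustrates the idea and is not Lucene's StandardAnalyzer: the class AnalyzerSketch and its tiny stop-word list are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerSketch {
    // A tiny illustrative stop-word list, not the list Lucene ships with.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("is", "a", "not", "but", "and", "that", "can", "be", "to"));

    /** Splits text into lower-cased tokens and drops punctuation and stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Splitting on anything that is not a letter or digit also removes punctuation.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [lucene, java, full, text, search, engine]
        System.out.println(analyze("Lucene is a Java full-text search engine."));
    }
}
```

A real analyzer does considerably more (Unicode-aware tokenization, configurable filters), but the pipeline shape is the same: split, normalize, filter.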

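Each word is called a Term. The same word extracted from different fields counts as different terms. A term has two parts: the name of the field and the text of the word.
For example, apache in the file name and apache in the file content are different terms.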

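Creating the Index (the Inverted Index Concept)

The tokens produced by analyzing all documents are indexed. The purpose of the index is search: in the end, a search only looks at the indexed tokens and uses them to find the Documents.

Note: indexing is done on tokens, and documents are found through words. An index structured this way is called an inverted index.

The traditional approach locates a file first and then matches the search keyword against its content. That is serial scanning, and it is slow when the data volume is large. An inverted index works the other way round: it finds documents from content (words).

An inverted index, also called a reverse index, has two parts: the index and the documents. The index is the vocabulary, which is relatively small, whereas the document collection is large.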
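As a rough picture of this structure, the sketch below keeps a map from each term to the ids of the documents that contain it. It is a toy model in plain Java, not Lucene's on-disk format; the class InvertedIndexSketch and the "field:word" key layout are assumptions chosen to mirror the idea that a term consists of a field name plus a word.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class InvertedIndexSketch {
    // Key: "field:word" (a term). Value: ids of the documents containing that term.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Analyzes the text very simply and records the document id under each term. */
    void add(int docId, String field, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(field + ":" + word, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Looks the term up directly; no document text is scanned at search time. */
    SortedSet<Integer> search(String field, String word) {
        SortedSet<Integer> ids = postings.get(field + ":" + word);
        return ids == null ? Collections.<Integer>emptySortedSet() : ids;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(0, "fileName", "apache.txt");
        index.add(1, "contents", "Apache Lucene is a search library");
        System.out.println(index.search("fileName", "apache")); // prints [0]
        System.out.println(index.search("contents", "apache")); // prints [1]
    }
}
```

The two lookups return different documents for the same word because the field name is part of the key, which matches the statement that apache in the file name and apache in the content are different terms.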

Code for Creating the Index

First add the Lucene dependencies to pom.xml:

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>5.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>5.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>5.3.1</version>
</dependency>

Code

package com.cpc.lucene;

import java.io.File;
import java.io.FileReader;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

/**
 * Works together with Demo1.java to implement a Lucene hello-world example
 * @author Administrator
 *
 */
public class IndexCreate {
	private IndexWriter indexWriter;
	
	/**
	 * 1. Constructor: instantiates the IndexWriter
	 * @param indexDir directory in which the index files are stored
	 * @throws Exception
	 */
	public IndexCreate(String indexDir) throws Exception{
//		Open the directory that holds the index files
		FSDirectory dir = FSDirectory.open(Paths.get(indexDir));
//		Standard analyzer (intended for English; it does not segment Chinese well)
		Analyzer analyzer = new StandardAnalyzer();
//		Configuration object for the index writer
		IndexWriterConfig conf = new IndexWriterConfig(analyzer); 
		indexWriter = new IndexWriter(dir, conf);
	}
	
	/**
	 * 2. Closes the index writer
	 * @throws Exception
	 */
	public void closeIndexWriter()  throws Exception{
		indexWriter.close();
	}
	
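	/**
	 * 3. Indexes all files directly under the given directory
	 * @param dataDir directory containing the files to index
	 * @return number of documents in the index
	 * @throws Exception
	 */
	public int index(String dataDir) throws Exception{
		File[] files = new File(dataDir).listFiles();
//		listFiles() returns null if dataDir does not exist or is not a directory
		if (files != null) {
			for (File file : files) {
//				Skip subdirectories; only regular files can be read as text
				if (file.isFile()) {
					indexFile(file);
				}
			}
		}
		return indexWriter.numDocs();
	}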
	
	/**
	 * 4. Indexes a single file
	 * @param file
	 * @throws Exception
	 */
	private void indexFile(File file) throws Exception{
		System.out.println("Full path of the indexed file: "+file.getCanonicalPath());
		Document doc = getDocument(file);
		indexWriter.addDocument(doc);
	}
	
	/**
	 * 5. Builds the document (the key information about the file, as field name/value pairs)
	 * @param file
	 * @return
	 * @throws Exception
	 */
	private Document getDocument(File file) throws Exception{
		Document doc = new Document();
		doc.add(new TextField("contents", new FileReader(file)));
//		Field.Store.YES stores the original value in the index so it can be read back from search results
		doc.add(new TextField("fullPath", file.getCanonicalPath(),Field.Store.YES));
		doc.add(new TextField("fileName", file.getName(),Field.Store.YES));
		return doc;
	}
}

package com.cpc.lucene;

public class Demo1 {
	
	public static void main(String[] args) {
//		Location where the index files will be stored
		String indexDir = "E:\\temp\\demo1";
//		Location of the source data
		String dataDir = "E:\\temp\\demo1\\data";
		IndexCreate ic = null; 
		try {
			ic = new IndexCreate(indexDir);
			long start = System.currentTimeMillis();
			int num = ic.index(dataDir);
			long end = System.currentTimeMillis();
			System.out.println("Indexed "+num+" files under the given path in "+(end-start)+" ms");
		} catch (Exception e) {
			e.printStackTrace();
		}finally {
			try {
//				ic stays null if the constructor threw, so guard against a NullPointerException
				if (ic != null) {
					ic.closeIndexWriter();
				}
			} catch (Exception e) {
				e.printStackTrace();
			}
		}
	}
}

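Viewing the index with an analysis tool:
Here you can see the individual words that index building split out of the files; through these words the corresponding documents can be found. The principle really is that simple.

Using the Index

Reading data from the index files

1. Obtain the index reader (through DirectoryReader)

2. Obtain the index searcher (created from the reader)

3. Obtain the query object (through the query parser, which is created with an analyzer)

4. Obtain the collection of top-ranked documents that contain the keyword

5. Retrieve the content of the corresponding documents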

package com.cpc.lucene;

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

/**
 * Works together with Demo2.java to implement a Lucene hello-world example
 * @author Administrator
 *
 */
public class IndexUse {
	/**
	 * Searches the index directory for a keyword
	 * @param indexDir	directory containing the index files
	 * @param q	keyword
	 */
	public static void search(String indexDir, String q) throws Exception{
		FSDirectory indexDirectory = FSDirectory.open(Paths.get(indexDir));
//		Note: the index reader is not created with new; it is opened through the DirectoryReader utility class
		IndexReader indexReader = DirectoryReader.open(indexDirectory);
//		Create the index searcher
		IndexSearcher indexSearcher = new IndexSearcher(indexReader);
		Analyzer analyzer = new StandardAnalyzer();
		QueryParser queryParser = new QueryParser("contents", analyzer);
//		Parse the keyword into a query object
		Query query = queryParser.parse(q);
		
		long start=System.currentTimeMillis();
//		Fetch the 10 highest-scoring matching documents
		TopDocs topDocs = indexSearcher.search(query , 10);
		long end=System.currentTimeMillis();
		System.out.println("Matched "+q+" in "+(end-start)+" ms, found "+topDocs.totalHits+" records");
		
		for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
			int docID = scoreDoc.doc;
//			The searcher loads the document by its document id
			Document doc = indexSearcher.doc(docID);
			System.out.println("File found through the index: "+doc.get("fullPath"));
		}
		
		indexReader.close();
	}
}


package com.cpc.lucene;

/**
 * Test for querying the index
 * @author Administrator
 *
 */
public class Demo2 {
	public static void main(String[] args) {
		String indexDir = "E:\\temp\\demo1";
		String q = "java";
		try {
			IndexUse.search(indexDir, q);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

Test result

Adding, Deleting, and Updating Index Entries

package com.cpc.lucene;

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;
import org.junit.Before;
import org.junit.Test;

/**
 * Building the index:
 * 	adding, deleting, and updating documents
 * @author Administrator
 *
 */
public class Demo3 {
	private String ids[]={"1","2","3"};
	private String citys[]={"qingdao","nanjing","shanghai"};
	private String descs[]={
			"Qingdao is a beautiful city.",
			"Nanjing is a city of culture.",
			"Shanghai is a bustling city."
	};
	private FSDirectory dir;
	
	/**
	 * Writes the sample documents to the index before every test
	 * @throws Exception
	 */
	@Before
	public void setUp() throws Exception {
		dir  = FSDirectory.open(Paths.get("D:\\temp\\demo2\\indexDir"));
		IndexWriter indexWriter = getIndexWriter();
		for (int i = 0; i < ids.length; i++) {
			Document doc = new Document();
			doc.add(new StringField("id", ids[i], Field.Store.YES));
			doc.add(new StringField("city", citys[i], Field.Store.YES));
			doc.add(new TextField("desc", descs[i], Field.Store.NO));
			indexWriter.addDocument(doc);
		}
		indexWriter.close();
	}

	/**
	 * Creates the index writer
	 * @return
	 * @throws Exception
	 */
	private IndexWriter getIndexWriter()  throws Exception{
		Analyzer analyzer = new StandardAnalyzer();
		IndexWriterConfig conf = new IndexWriterConfig(analyzer);
		return new IndexWriter(dir, conf );
	}
	
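	/**
	 * Checks how many documents have been written to the index
	 * @throws Exception
	 */
	@Test
	public void getWriteDocNum() throws Exception {
		IndexWriter indexWriter = getIndexWriter();
		System.out.println("The index directory contains "+indexWriter.numDocs()+" documents");
//		Close the writer to release the index lock; otherwise the next test cannot open a writer
		indexWriter.close();
	}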
	
	/**
	 * The document is only marked as deleted; it has not actually been removed from the index
	 * @throws Exception
	 */
	@Test
	public void deleteDocBeforeMerge() throws Exception {
		IndexWriter indexWriter = getIndexWriter();
		System.out.println("Max doc count: "+indexWriter.maxDoc());
		indexWriter.deleteDocuments(new Term("id", "1"));
		indexWriter.commit();
		
		System.out.println("Max doc count: "+indexWriter.maxDoc());
		System.out.println("Actual doc count: "+indexWriter.numDocs());
		indexWriter.close();
	}
	
	/**
	 * The document has been removed from the index, but in this version its terms are retained
	 * @throws Exception
	 */
	@Test
	public void deleteDocAfterMerge() throws Exception {
//		https://blog.csdn.net/asdfsadfasdfsa/article/details/78820030
//		org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: only one IndexWriter may be open on an index directory at a time; it is thread-safe and should be shared.
		IndexWriter indexWriter = getIndexWriter();
		System.out.println("Max doc count: "+indexWriter.maxDoc());
		indexWriter.deleteDocuments(new Term("id", "1"));
		indexWriter.forceMergeDeletes(); // force the marked documents to be merged away
		indexWriter.commit();
		
		System.out.println("Max doc count: "+indexWriter.maxDoc());
		System.out.println("Actual doc count: "+indexWriter.numDocs());
		indexWriter.close();
	}
	
	/**
	 * Tests updating a document in the index
	 * @throws Exception
	 */
	@Test
	public void testUpdate()throws Exception{
		IndexWriter writer=getIndexWriter();
		Document doc=new Document();
		doc.add(new StringField("id", "1", Field.Store.YES));
		doc.add(new StringField("city","qingdao",Field.Store.YES));
		doc.add(new TextField("desc", "dsss is a city.", Field.Store.NO));
		writer.updateDocument(new Term("id","1"), doc);
		writer.close();
	}
}

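Adding to the index

Deleting from the index

Before merging

After merging

Note:

When the data volume is large, use the delete-before-merge approach: it only marks the documents as deleted, and the marked documents are cleaned up periodically.
When the data volume is not especially large, the documents can be removed from the index right away.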
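The difference between marking a document and physically removing it can be pictured with a small plain-Java model. DeleteMarkSketch is a hypothetical class, not Lucene's segment format; it only borrows the names maxDoc and numDocs from the IndexWriter methods used in Demo3 to show why the two counts differ until a merge happens.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class DeleteMarkSketch {
    private List<String> docs = new ArrayList<>();
    private BitSet deleted = new BitSet();

    void add(String doc) {
        docs.add(doc);
    }

    /** Marks matching documents as deleted without removing them (cheap). */
    void delete(String doc) {
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).equals(doc)) {
                deleted.set(i);
            }
        }
    }

    /** Rewrites the storage so that marked documents are physically dropped (expensive). */
    void mergeDeletes() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                live.add(docs.get(i));
            }
        }
        docs = live;
        deleted = new BitSet();
    }

    /** Counts every slot, including documents that are only marked as deleted. */
    int maxDoc() {
        return docs.size();
    }

    /** Counts live documents only. */
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }
}
```

After three documents are added and one is deleted, maxDoc() is still 3 while numDocs() is 2; only after mergeDeletes() do both report 2. Deferring the rewrite is what makes marking attractive for large indexes.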

Updating the index

A visualization tool shows the following.

Note: in version 5.3, the terms from before the update do not disappear.

Document Field Boosting (as in Baidu search ranking)

Boosting a keyword field helps a document rank higher.
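To see why a boost changes the ranking, consider the simplified model below. It assumes a score of (number of keyword occurrences in the title) multiplied by the boost, which is far simpler than Lucene's real scoring formula; BoostSketch and its Hit class are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class BoostSketch {
    static final class Hit {
        final String author;
        final double score;

        Hit(String author, double score) {
            this.author = author;
            this.score = score;
        }
    }

    /** Ranks authors by (occurrences of keyword in title) * boost, highest score first. */
    static List<String> rank(String[] authors, String[] titles, float[] boosts, String keyword) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < titles.length; i++) {
            int tf = 0;
            for (String word : titles[i].toLowerCase(Locale.ROOT).split("\\W+")) {
                if (word.equals(keyword)) {
                    tf++;
                }
            }
            if (tf > 0) {
                hits.add(new Hit(authors[i], tf * boosts[i]));
            }
        }
        // List.sort is stable, so equal scores keep their original order.
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<String> names = new ArrayList<>();
        for (Hit h : hits) {
            names.add(h.author);
        }
        return names;
    }
}
```

With the four titles from Demo4 and the keyword java, equal boosts of 1 leave the authors in their original order, whereas a boost of 2 on the last title moves its author to the front.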

package com.cpc.lucene;

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Before;
import org.junit.Test;

/**
 * Document field boosting
 * @author Administrator
 *
 */
public class Demo4 {
	private String ids[]={"1","2","3","4"};
	private String authors[]={"Jack","Marry","John","Json"};
	private String positions[]={"accounting","technician","salesperson","boss"};
	private String titles[]={"Java is a good language.","Java is a cross platform language","Java powerful","You should learn java"};
	private String contents[]={
			"If possible, use the same JRE major version at both index and search time.",
			"When upgrading to a different JRE major version, consider re-indexing. ",
			"Different JRE major versions may implement different versions of Unicode,",
			"For example: with Java 1.4, `LetterTokenizer` will split around the character U+02C6,"
	};
	
	private Directory dir;// index directory

	@Before
	public void setUp()throws Exception {
		dir = FSDirectory.open(Paths.get("D:\\temp\\demo3\\indexDir"));
		IndexWriter writer = getIndexWriter();
		for (int i = 0; i < authors.length; i++) {
			Document doc = new Document();
			doc.add(new StringField("id", ids[i], Field.Store.YES));
			doc.add(new StringField("author", authors[i], Field.Store.YES));
			doc.add(new StringField("position", positions[i], Field.Store.YES));
			
			TextField textField = new TextField("title", titles[i], Field.Store.YES);
			
//			Json paid for advertising and pushed the ranking to first place
			if("boss".equals(positions[i])) {
				textField.setBoost(2f);// set the boost; the default is 1
			}
			
			doc.add(textField);
//			TextField is tokenized, StringField is not
			doc.add(new TextField("content", contents[i], Field.Store.NO));
			writer.addDocument(doc);
		}
		writer.close();
		
	}

	private IndexWriter getIndexWriter() throws Exception{
		Analyzer analyzer = new StandardAnalyzer();
		IndexWriterConfig conf = new IndexWriterConfig(analyzer);
		return new IndexWriter(dir, conf);
	}
	
	@Test
	public void index() throws Exception{
		IndexReader reader = DirectoryReader.open(dir);
		IndexSearcher searcher = new IndexSearcher(reader);
		String fieldName = "title";
		String keyWord = "java";
		Term t = new Term(fieldName, keyWord);
		Query query = new TermQuery(t);
		TopDocs hits = searcher.search(query, 10);
		System.out.println("Keyword '"+keyWord+"' matched "+hits.totalHits+" documents");
		for (ScoreDoc scoreDoc : hits.scoreDocs) {
			Document doc = searcher.doc(scoreDoc.doc);
			System.out.println(doc.get("author"));
		}
	}
}

Result before field boosting

Result after field boosting

Index Search Features

Term Search

Code

package com.cpc.lucene;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.junit.Before;
import org.junit.Test;

/**
 * Term search
 * Query expressions (QueryParser)
 * @author Administrator
 *
 */
public class Demo5 {
	@Before
	public void setUp() {
		// Location where the index files will be stored
		String indexDir = "D:\\temp\\demo4";
		// Location of the source data
		String dataDir = "D:\\temp\\demo4\\data";
		IndexCreate ic = null;
		try {
			ic = new IndexCreate(indexDir);
			long start = System.currentTimeMillis();
			int num = ic.index(dataDir);
			long end = System.currentTimeMillis();
			System.out.println("Indexed " + num + " files under the given path in " + (end - start) + " ms");
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			try {
				// ic stays null if the constructor threw, so guard against a NullPointerException
				if (ic != null) {
					ic.closeIndexWriter();
				}
			} catch (Exception e) {
				e.printStackTrace();
			}
		}
	}
	
	/**
	 * Term search
	 */
	@Test
	public void testTermQuery() {
		String indexDir = "D:\\temp\\demo4";
		
		String fld = "contents";
		String text = "indexformattoooldexception";
//		Field name and keyword of the term
		Term t  = new Term(fld , text);
		TermQuery tq = new TermQuery(t  );
		try {
			FSDirectory indexDirectory = FSDirectory.open(Paths.get(indexDir));
//			Note: the index reader is not created with new; it is opened through the DirectoryReader utility class
			IndexReader indexReader = DirectoryReader.open(indexDirectory);
//			Create the index searcher
			IndexSearcher is = new IndexSearcher(indexReader);
			
			
			TopDocs hits = is.search(tq, 100);
//			System.out.println(hits.totalHits);
			for(ScoreDoc scoreDoc: hits.scoreDocs) {
				Document doc = is.doc(scoreDoc.doc);
				System.out.println("File "+doc.get("fullPath")+" contains the keyword");
				
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	@Test
	public void testQueryParser() {
		String indexDir = "D:\\temp\\demo4";
//		Create the query parser (which analyzer to use and which field to search by default)
		QueryParser queryParser = new QueryParser("contents", new StandardAnalyzer());
		try {
			FSDirectory indexDirectory = FSDirectory.open(Paths.get(indexDir));
//			Note: the index reader is not created with new; it is opened through the DirectoryReader utility class
			IndexReader indexReader = DirectoryReader.open(indexDirectory);
//			Create the index searcher
			IndexSearcher is = new IndexSearcher(indexReader);
			
//			Let the parser analyze the keyword
			TopDocs hits = is.search(queryParser.parse("indexformattoooldexception") , 100);
			for(ScoreDoc scoreDoc: hits.scoreDocs) {
				Document doc = is.doc(scoreDoc.doc);
				System.out.println("File "+doc.get("fullPath")+" contains the keyword");	
			}
		} catch (IOException e) {
			e.printStackTrace();
		} catch (ParseException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}

}

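testTermQuery result:
Console output

Lucene viewing tool

Query expression (QueryParser) result: the same as the term search result. Note, however, that a term search does not go through an analyzer, so the search text must match the indexed term exactly.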
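The practical consequence is easiest to see with mixed-case input. The sketch below is a plain-Java model of the two lookup styles, not Lucene code; TermVsParsedSketch is a hypothetical class, and lower-casing stands in for the analysis that is applied at index time.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TermVsParsedSketch {
    private final Set<String> indexedTerms = new HashSet<>();

    /** Index time: the text is analyzed (here simply split and lower-cased). */
    void index(String text) {
        for (String word : text.split("\\W+")) {
            if (!word.isEmpty()) {
                indexedTerms.add(word.toLowerCase(Locale.ROOT));
            }
        }
    }

    /** Like a term query: the text is compared with the indexed terms exactly as given. */
    boolean termLookup(String text) {
        return indexedTerms.contains(text);
    }

    /** Like a parsed query: the input goes through the same analysis before the lookup. */
    boolean parsedLookup(String text) {
        return indexedTerms.contains(text.toLowerCase(Locale.ROOT));
    }
}
```

After indexing a text that contains IndexFormatTooOldException, termLookup finds indexformattoooldexception but not the mixed-case spelling, while parsedLookup finds both. This is presumably why the demo passes the keyword in lower case to TermQuery.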

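Pagination

Option 1

Query everything once and keep the results in the session; when paging, slice the collection held in the session for display. The advantage is that only one query is needed. The drawback is memory usage, which becomes a real concern because many users are likely to be searching concurrently. Get the array of hits, then fetch each hit document by its index to read the content.

Option 2

Run a new query every time the user moves to the previous or next page, which costs time. In practice, though, few users click through to the next page. Get the array of hits, then fetch each hit document by its index to read the content.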
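Under either option, the step of picking one page out of a ranked hit list looks roughly like this. The sketch is plain Java with a hypothetical PagingSketch class; in a Lucene program the list would correspond to the scoreDocs array returned by the search.

```java
import java.util.ArrayList;
import java.util.List;

class PagingSketch {
    /** Returns the hits for a 1-based page number from a ranked result list. */
    static <T> List<T> page(List<T> rankedHits, int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be >= 1");
        }
        int start = (pageNo - 1) * pageSize;
        if (start >= rankedHits.size()) {
            return new ArrayList<>(); // past the last page
        }
        int end = Math.min(start + pageSize, rankedHits.size());
        // Copy the slice so the caller does not hold a view onto the full list.
        return new ArrayList<>(rankedHits.subList(start, end));
    }
}
```

With option 2, a request for page n only needs the top n * pageSize hits from the search, so the query limit can be set to that value instead of fetching everything.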

Numeric Range Query (NumericRangeQuery)

package com.cpc.lucene;

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.junit.Before;
import org.junit.Test;

public class Demo6 {
	private int ids[]={1,2,3};
	private String citys[]={"qingdao","nanjing","shanghai"};
	private String descs[]={
			"Qingdao is a beautiful city.",
			"Nanjing is a city of culture.",
			"Shanghai is a bustling city."
	};
	private FSDirectory dir;
	
	/**
	 * Writes the sample documents to the index before every test
	 * @throws Exception
	 */
	@Before
	public void setUp() throws Exception {
		dir  = FSDirectory.open(Paths.get("D:\\temp\\demo2\\indexDir"));
		IndexWriter indexWriter = getIndexWriter();
		for (int i = 0; i < ids.length; i++) {
			Document doc = new Document();
			doc.add(new IntField("id", ids[i], Field.Store.YES));
			doc.add(new StringField("city", citys[i], Field.Store.YES));
			doc.add(new TextField("desc", descs[i], Field.Store.NO));
			indexWriter.addDocument(doc);
		}
		indexWriter.close();
	}
	
	/**
	 * Creates the index writer
	 * @return
	 * @throws Exception
	 */
	private IndexWriter getIndexWriter()  throws Exception{
		Analyzer analyzer = new StandardAnalyzer();
		IndexWriterConfig conf = new IndexWriterConfig(analyzer);
		return new IndexWriter(dir, conf );
	}

	@Test
	public void testNumericRangeQuery()throws Exception{
		IndexReader reader = DirectoryReader.open(dir);
		IndexSearcher is = new IndexSearcher(reader);
//		Match documents whose id lies in [1, 2], both bounds inclusive
		NumericRangeQuery<Integer> query=NumericRangeQuery.newIntRange("id", 1, 2, true, true);
		TopDocs hits=is.search(query, 10);
		for(ScoreDoc scoreDoc:hits.scoreDocs){
			Document doc=is.doc(scoreDoc.doc);
			System.out.println(doc.get("id"));
			System.out.println(doc.get("city"));
//			Prints null: desc was indexed with Field.Store.NO, so its value is not stored
			System.out.println(doc.get("desc"));
		}
		reader.close();
	}
}

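Prefix Query (PrefixQuery)

	// This test method belongs in a class like Demo6 above and additionally needs
	// org.apache.lucene.index.Term and org.apache.lucene.search.PrefixQuery imported
	@Test
	public void testPrefixQuery()throws Exception{
		IndexReader reader = DirectoryReader.open(dir);
		IndexSearcher is = new IndexSearcher(reader);
		
		// Match documents whose city field starts with "n"
		PrefixQuery query=new PrefixQuery(new Term("city","n"));
		TopDocs hits=is.search(query, 10);
		for(ScoreDoc scoreDoc:hits.scoreDocs){
			Document doc=is.doc(scoreDoc.doc);
			System.out.println(doc.get("id"));
			System.out.println(doc.get("city"));
			System.out.println(doc.get("desc"));
		}	
		reader.close();
	}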
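One way to understand why prefix lookups can be answered efficiently is that the terms of an index are kept in sorted order, so all terms sharing a prefix sit next to each other. The sketch below models that with a TreeSet; it is an illustration in plain Java under that assumption, not a description of Lucene's term dictionary, and PrefixSketch is a hypothetical class.

```java
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixSketch {
    /** Returns all terms that start with the prefix, using the sort order of the dictionary. */
    static SortedSet<String> withPrefix(TreeSet<String> terms, String prefix) {
        // Terms starting with the prefix sort at or after the prefix itself and before
        // the prefix followed by the largest char value (an edge case for terms that
        // contain that char right after the prefix is ignored here).
        return terms.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }
}
```

For a dictionary containing nanjing, ningbo, qingdao, and shanghai, the prefix n selects the contiguous range nanjing to ningbo without visiting the other terms.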

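Combined Queries (BooleanQuery) (Key Topic)

MUST, MUST_NOT, SHOULD

	// This test method belongs in a class like Demo6 above and additionally needs Term,
	// PrefixQuery, BooleanQuery and BooleanClause imported
	@Test
	public void testBooleanQuery()throws Exception{
		IndexReader reader = DirectoryReader.open(dir);
		IndexSearcher is = new IndexSearcher(reader);
		
		NumericRangeQuery<Integer> query1=NumericRangeQuery.newIntRange("id", 1, 2, true, true);
		PrefixQuery query2=new PrefixQuery(new Term("city","s"));
		BooleanQuery.Builder booleanQuery=new BooleanQuery.Builder();
		// Both clauses are MUST, so a document has to satisfy both of them.
		// With the Demo6 sample data nothing matches: ids 1 and 2 are qingdao and nanjing,
		// and the only city starting with "s" (shanghai) has id 3.
		booleanQuery.add(query1,BooleanClause.Occur.MUST);
		booleanQuery.add(query2,BooleanClause.Occur.MUST);
		TopDocs hits=is.search(booleanQuery.build(), 10);
		for(ScoreDoc scoreDoc:hits.scoreDocs){
			Document doc=is.doc(scoreDoc.doc);
			System.out.println(doc.get("id"));
			System.out.println(doc.get("city"));
			System.out.println(doc.get("desc"));
		}	
		reader.close();
	}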
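The matching behavior of the three clause types can be summarized with set operations. The sketch below is plain Java over sets of document ids and leaves scoring aside entirely; BooleanSketch is a hypothetical class written for this explanation. It assumes the usual default: when at least one MUST clause is present, SHOULD clauses do not restrict the matches, and when there is none, at least one SHOULD clause has to match.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class BooleanSketch {
    /** Combines per-clause match sets into the final set of matching document ids. */
    static Set<Integer> combine(List<Set<Integer>> must,
                                List<Set<Integer>> should,
                                List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>();
        if (!must.isEmpty()) {
            // MUST: intersection of all required clauses.
            result.addAll(must.get(0));
            for (Set<Integer> clause : must) {
                result.retainAll(clause);
            }
        } else {
            // Only SHOULD clauses: union, at least one of them has to match.
            for (Set<Integer> clause : should) {
                result.addAll(clause);
            }
        }
        // MUST_NOT: remove every excluded document.
        for (Set<Integer> clause : mustNot) {
            result.removeAll(clause);
        }
        return result;
    }
}
```

Applied to the sample data, the id range clause matches {1, 2} and the prefix clause matches {3}: as two MUST clauses they intersect to the empty set, whereas as two SHOULD clauses they would yield {1, 2, 3}.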

Chinese Word Segmentation and Highlighting

private Integer ids[]={1,2,3};
	private String citys[]={"青島","南京","上海"};
	private String descs[]={
			"青島是個美麗的城市。",
			"南京是個有文化的城市。",
			"上海市個繁華的城市。"
	};

To make the highlighting effect easier to see, the description of Nanjing is replaced with a longer text:

南京是一個文化的城市南京，簡稱寧，是江蘇省會，地處中國東部地區，長江下游，瀕江近海。全市下轄11個區，總面積6597平方公里，2013年建成區面積752.83平方公里，常住人口818.78萬，其中城鎮人口659.1萬人。[1-4] “江南佳麗地，金陵帝王州”，南京擁有着6000多年文明史、近2600年建城史和近500年的建都史，是中國四大古都之一，有“六朝古都”、“十朝都會”之稱，是中華文明的重要發祥地，歷史上曾數次庇佑華夏之正朔，長期是中國南方的政治、經濟、文化中心，擁有厚重的文化底蘊和豐富的歷史遺存。[5-7] 南京是國家重要的科教中心，自古以來就是一座崇文重教的城市，有“天下文樞”、“東南第一學”的美譽。截至2013年，南京有高等院校75所，其中211高校8所，僅次於北京上海；國家重點實驗室25所、國家重點學科169個、兩院院士83人，均居中國第三。[8-10] 。",

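Tokenizing Chinese with the standard analyzer gives the following result

Every character is treated as a separate word, which is not the effect we want. In other words, the standard analyzer StandardAnalyzer no longer meets our needs.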
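The contrast between per-character tokens and word-level tokens can be shown with a small plain-Java sketch. The word-level method uses forward maximum matching against a tiny dictionary, a deliberately simple technique chosen for illustration; it is not the algorithm used by Lucene's Chinese analyzers. SegmentationSketch, the dictionary, and the sample text are all assumptions, and characters outside the Basic Multilingual Plane are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SegmentationSketch {
    /** One token per character, the behavior reported above for the standard analyzer. */
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    /** Forward maximum matching: take the longest dictionary word at each position, else one character. */
    static List<String> maxMatch(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int len = Math.min(maxWordLength, text.length() - pos);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (len > 1 && !dictionary.contains(text.substring(pos, pos + len))) {
                len--;
            }
            tokens.add(text.substring(pos, pos + len));
            pos += len;
        }
        return tokens;
    }
}
```

For the text 南京有文化 and a dictionary containing 南京 and 文化, unigrams returns five single-character tokens, while maxMatch with a maximum word length of 2 returns 南京, 有, 文化. A search for 南京 then matches a real word rather than two unrelated characters.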

Chinese Word Segmentation

Dependency

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-smartcn</artifactId>
    <version>5.3.1</version>
</dependency>

Replace the standard analyzer with the Chinese analyzer

Highlighting

Dependency

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>5.3.1</version>
</dependency>

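Steps for highlighting:

1. From the query object, create the query scorer

2. From the scorer, create the fragmenter that picks the matching fragments

3. Instantiate an HTML formatter

4. Use the HTML formatter and the query scorer to instantiate the highlighter class provided by Lucene

5. Set the fragmenter obtained earlier on the highlighter instance

6. Obtain a TokenStream from the analyzer

7. Pass the token stream and the original text to the highlighter to get the highlighted fragment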
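At its core, the formatting part of these steps wraps every matched term in an opening and a closing tag. The sketch below shows only that part in plain Java, using substring matching; Lucene's highlighter additionally works on the analyzer's token stream and selects the best-scoring fragment, neither of which is modeled here. HighlightSketch is a hypothetical class.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightSketch {
    /** Wraps every occurrence of any of the terms in preTag and postTag. */
    static String highlight(String text, List<String> terms, String preTag, String postTag) {
        if (terms.isEmpty()) {
            return text;
        }
        // Build an alternation of the literal terms, e.g. \Qfoo\E|\Qbar\E.
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        Matcher matcher = Pattern.compile(alternation.toString()).matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' in the tags or the match literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(preTag + matcher.group() + postTag));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For example, highlighting the terms Nanjing and history in "Nanjing has a long history" with the tags `<b>` and `</b>` yields `<b>Nanjing</b> has a long <b>history</b>`.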

Related code:

package com.cpc.lucene;

import java.io.StringReader;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.SimpleSpanFragmenter;
import org.apache.lucene.store.FSDirectory;
import org.junit.Before;
import org.junit.Test;

public class Demo7 {
	private Integer ids[] = { 1, 2, 3 };
	private String citys[] = { "青島", "南京", "上海" };
	// private String descs[]={
	// "青島是個美麗的城市。",
	// "南京是個有文化的城市。",
	// "上海市個繁華的城市。"
	// };
	private String descs[] = { "青島是個美麗的城市。",
			"南京是一個文化的城市南京，簡稱寧，是江蘇省會，地處中國東部地區，長江下游，瀕江近海。全市下轄11個區，總面積6597平方公里，2013年建成區面積752.83平方公里，常住人口818.78萬，其中城鎮人口659.1萬人。[1-4] “江南佳麗地，金陵帝王州”，南京擁有着6000多年文明史、近2600年建城史和近500年的建都史，是中國四大古都之一，有“六朝古都”、“十朝都會”之稱，是中華文明的重要發祥地，歷史上曾數次庇佑華夏之正朔，長期是中國南方的政治、經濟、文化中心，擁有厚重的文化底蘊和豐富的歷史遺存。[5-7] 南京是國家重要的科教中心，自古以來就是一座崇文重教的城市，有“天下文樞”、“東南第一學”的美譽。截至2013年，南京有高等院校75所，其中211高校8所，僅次於北京上海；國家重點實驗室25所、國家重點學科169個、兩院院士83人，均居中國第三。[8-10]",
			"上海市個繁華的城市。" };

	private FSDirectory dir;

	/**
	 * Writes the sample documents to the index before every test
	 * 
	 * @throws Exception
	 */
	@Before
	public void setUp() throws Exception {
		dir = FSDirectory.open(Paths.get("D:\\temp\\demo2\\indexDir"));
		IndexWriter indexWriter = getIndexWriter();
		for (int i = 0; i < ids.length; i++) {
			Document doc = new Document();
			doc.add(new IntField("id", ids[i], Field.Store.YES));
			doc.add(new StringField("city", citys[i], Field.Store.YES));
			doc.add(new TextField("desc", descs[i], Field.Store.YES));
			indexWriter.addDocument(doc);
		}
		indexWriter.close();
	}

	/**
	 * Creates the index writer
	 * 
	 * @return
	 * @throws Exception
	 */
	private IndexWriter getIndexWriter() throws Exception {
//		Analyzer analyzer = new StandardAnalyzer(); // the standard analyzer cannot segment Chinese text into words
		Analyzer analyzer = new SmartChineseAnalyzer();
		IndexWriterConfig conf = new IndexWriterConfig(analyzer);
		return new IndexWriter(dir, conf);
	}

	/**
	 * Inspect the generated index with Luke
	 * 
	 * @throws Exception
	 */
	@Test
	public void testIndexCreate() throws Exception {

	}

	/**
	 * Tests highlighting
	 * 
	 * @throws Exception
	 */
	@Test
	public void testHighlight() throws Exception {
		IndexReader reader = DirectoryReader.open(dir);
		IndexSearcher searcher = new IndexSearcher(reader);

		SmartChineseAnalyzer analyzer = new SmartChineseAnalyzer();
		QueryParser parser = new QueryParser("desc", analyzer);
		// Query query = parser.parse("南京文化");
		Query query = parser.parse("南京文明");
		TopDocs hits = searcher.search(query, 100);

		// Scorer built from the query
		QueryScorer queryScorer = new QueryScorer(query);
		// Fragmenter that selects the text fragments matching the scorer
		SimpleSpanFragmenter fragmenter = new SimpleSpanFragmenter(queryScorer);
		// Markup used for highlighting
		SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter("<span color='red'><b>", "</b></span>");
		// Highlighter object
		Highlighter highlighter = new Highlighter(htmlFormatter, queryScorer);
		// Set the fragmenter on the highlighter
		highlighter.setTextFragmenter(fragmenter);

		for (ScoreDoc scoreDoc : hits.scoreDocs) {
			Document doc = searcher.doc(scoreDoc.doc);
			String desc = doc.get("desc");
			if (desc != null) {
				// The TokenStream is the stream of tokens the analyzer produces from the field text of the document.
				TokenStream tokenStream = analyzer.tokenStream("desc", new StringReader(desc));
				System.out.println("Highlighted fragment: " + highlighter.getBestFragment(tokenStream, desc));
			}
			System.out.println("Full content: " + desc);
		}

	}

}

Console output:

Result for 南京文明:

Highlighted fragment: 城鎮人口659.1萬人。[1-4] “江南佳麗地，金陵帝王州”，<span color='red'><b>南京</b></span>擁有着6000多年<span color='red'><b>文明</b></span>史、近2600年建城史和近500年的建都史，是中國四大古都之一，有“六朝古都”、“十朝都會”之稱，是中華<span color='red'><b>文明</b></span>的
Full content: 南京是一個文化的城市南京，簡稱寧，是江蘇省會，地處中國東部地區，長江下游，瀕江近海。全市下轄11個區，總面積6597平方公里，2013年建成區面積752.83平方公里，常住人口818.78萬，其中城鎮人口659.1萬人。[1-4] “江南佳麗地，金陵帝王州”，南京擁有着6000多年文明史、近2600年建城史和近500年的建都史，是中國四大古都之一，有“六朝古都”、“十朝都會”之稱，是中華文明的重要發祥地，歷史上曾數次庇佑華夏之正朔，長期是中國南方的政治、經濟、文化中心，擁有厚重的文化底蘊和豐富的歷史遺存。[5-7] 南京是國家重要的科教中心，自古以來就是一座崇文重教的城市，有“天下文樞”、“東南第一學”的美譽。截至2013年，南京有高等院校75所，其中211高校8所，僅次於北京上海；國家重點實驗室25所、國家重點學科169個、兩院院士83人，均居中國第三。[8-10]

What each of Lucene's core classes does: https://blog.csdn.net/kevinelstri/article/details/52317977

Complete Example

Core code

<properties>
		<httpclient.version>4.5.2</httpclient.version>
		<jsoup.version>1.10.1</jsoup.version>
		<!-- <lucene.version>7.1.0</lucene.version> -->
		<lucene.version>5.3.1</lucene.version>
		<ehcache.version>2.10.3</ehcache.version>
		<junit.version>4.12</junit.version>
		<log4j.version>1.2.16</log4j.version>
		<mysql.version>5.1.44</mysql.version>
		<fastjson.version>1.2.47</fastjson.version>
		<struts2.version>2.5.16</struts2.version>
		<servlet.version>4.0.1</servlet.version>
		<jstl.version>1.2</jstl.version>
		<standard.version>1.1.2</standard.version>
		<tomcat-jsp-api.version>8.0.47</tomcat-jsp-api.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>${junit.version}</version>
			<scope>test</scope>
		</dependency>

		<!-- JDBC driver -->
		<dependency>
			<groupId>mysql</groupId>
			<artifactId>mysql-connector-java</artifactId>
			<version>${mysql.version}</version>
		</dependency>

		<!-- HttpClient support -->
		<dependency>
			<groupId>org.apache.httpcomponents</groupId>
			<artifactId>httpclient</artifactId>
			<version>${httpclient.version}</version>
		</dependency>

		<!-- jsoup support -->
		<dependency>
			<groupId>org.jsoup</groupId>
			<artifactId>jsoup</artifactId>
			<version>${jsoup.version}</version>
		</dependency>


		<!-- logging support -->
		<dependency>
			<groupId>log4j</groupId>
			<artifactId>log4j</artifactId>
			<version>${log4j.version}</version>
		</dependency>

		<!-- ehcache support -->
		<dependency>
			<groupId>net.sf.ehcache</groupId>
			<artifactId>ehcache</artifactId>
			<version>${ehcache.version}</version>
		</dependency>

		<dependency>
			<groupId>com.alibaba</groupId>
			<artifactId>fastjson</artifactId>
			<version>${fastjson.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.struts</groupId>
			<artifactId>struts2-core</artifactId>
			<version>${struts2.version}</version>
		</dependency>

		<dependency>
			<groupId>javax.servlet</groupId>
			<artifactId>javax.servlet-api</artifactId>
			<version>${servlet.version}</version>
			<scope>provided</scope>
		</dependency>


		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-core</artifactId>
			<version>${lucene.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-queryparser</artifactId>
			<version>${lucene.version}</version>
		</dependency>
		<!-- <dependency> <groupId>org.apache.lucene</groupId> <artifactId>lucene-analyzers-common</artifactId> 
			<version>${lucene.version}</version> </dependency> -->

		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-analyzers-smartcn</artifactId>
			<version>${lucene.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-highlighter</artifactId>
			<version>${lucene.version}</version>
		</dependency>

		<!-- jstl, standard -->
		<dependency>
			<groupId>jstl</groupId>
			<artifactId>jstl</artifactId>
			<version>${jstl.version}</version>
		</dependency>
		<dependency>
			<groupId>taglibs</groupId>
			<artifactId>standard</artifactId>
			<version>${standard.version}</version>
		</dependency>

		<!-- tomcat-jsp-api -->
		<dependency>
			<groupId>org.apache.tomcat</groupId>
			<artifactId>tomcat-jsp-api</artifactId>
			<version>${tomcat-jsp-api.version}</version>
		</dependency>
	</dependencies>

package com.cpc.blog.web;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.servlet.http.HttpServletRequest;

import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.store.Directory;
import org.apache.struts2.ServletActionContext;

import com.javaxl.blog.dao.BlogDao;
import com.javaxl.blog.util.LuceneUtil;
import com.javaxl.blog.util.PropertiesUtil;
import com.javaxl.blog.util.StringUtils;

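/**
 * Struts2 action that lists blog posts. When a search title is supplied, it
 * queries the Lucene index (IndexReader, IndexSearcher) and highlights the
 * matched terms (Highlighter); otherwise it lists the posts from the database.
 *
 * @author Administrator
 */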
public class BlogAction {
	private String title;
	private BlogDao blogDao = new BlogDao();

	public String getTitle() {
		return title;
	}

	public void setTitle(String title) {
		this.title = title;
	}

	public String list() {
		try {
			HttpServletRequest request = ServletActionContext.getRequest();
			if (StringUtils.isBlank(title)) {
				List<Map<String, Object>> blogList = this.blogDao.list(title, null);
				request.setAttribute("blogList", blogList);
			}else {
				Directory directory = LuceneUtil.getDirectory(PropertiesUtil.getValue("indexPath"));
				DirectoryReader reader = LuceneUtil.getDirectoryReader(directory);
				List<Map<String, Object>> blogList = new ArrayList<>();
				try {
					IndexSearcher searcher = LuceneUtil.getIndexSearcher(reader);
					SmartChineseAnalyzer analyzer = new SmartChineseAnalyzer();
					// Analyze the search text and match its terms against the "title" field of the index
					Query query = new QueryParser("title", analyzer).parse(title);
					Highlighter highlighter = LuceneUtil.getHighlighter(query, "title");

					// Fetch at most the 100 best-scoring hits
					TopDocs topDocs = searcher.search(query, 100);
					for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
						Map<String, Object> map = new HashMap<>();
						Document doc = searcher.doc(scoreDoc.doc);
						map.put("id", doc.get("id"));
						String rawTitle = doc.get("title");
						String highlighted = null;
						if (StringUtils.isNotBlank(rawTitle)) {
							// getBestFragment returns null when no query term occurs in the text
							highlighted = highlighter.getBestFragment(analyzer, "title", rawTitle);
						}
						// Fall back to the plain title when nothing could be highlighted
						map.put("title", highlighted != null ? highlighted : rawTitle);
						map.put("url", doc.get("url"));
						blogList.add(map);
					}
				} finally {
					// Release the index reader and the directory once the search is done
					LuceneUtil.close(reader, directory);
				}

				request.setAttribute("blogList", blogList);
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
		return "blogList";
	}
}


package com.cpc.blog.web;

import java.io.IOException;
import java.nio.file.Paths;
import java.sql.SQLException;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import com.javaxl.blog.dao.BlogDao;
import com.javaxl.blog.util.PropertiesUtil;


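/**
 * Builds the Lucene index.
 * <p>
 * Overall workflow of the example:
 * 1. Build the index with IndexWriter (this class).
 * 2. Read the index files and fetch the matching fragments.
 * 3. Highlight the matching fragments.
 *
 * @author Administrator
 */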
public class IndexStarter {
	private static BlogDao blogDao = new BlogDao();
	public static void main(String[] args) {
		IndexWriterConfig conf = new IndexWriterConfig(new SmartChineseAnalyzer());
		Directory d;
		IndexWriter indexWriter = null;
		try {
			d = FSDirectory.open(Paths.get(PropertiesUtil.getValue("indexPath")));
			indexWriter = new IndexWriter(d , conf );
			
			// Build index entries for every row in the database
			List<Map<String, Object>> list = blogDao.list(null, null);
			for (Map<String, Object> map : list) {
				Document doc = new Document();
				// StringField is indexed as a single token; String.valueOf also copes with a numeric id column
				doc.add(new StringField("id", String.valueOf(map.get("id")), Field.Store.YES));
				// TextField is analyzed, so a phrase such as "java培訓機構" is split into searchable terms
				doc.add(new TextField("title", (String) map.get("title"), Field.Store.YES));
				doc.add(new StringField("url", (String) map.get("url"), Field.Store.YES));
				indexWriter.addDocument(doc);
			}
			
		} catch (IOException e) {
			e.printStackTrace();
		} catch (InstantiationException e) {
			e.printStackTrace();
		} catch (IllegalAccessException e) {
			e.printStackTrace();
		} catch (SQLException e) {
			e.printStackTrace();
		}finally {
			try {
				if(indexWriter!= null) {
					indexWriter.close();
				}
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
}
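
To make concrete what indexing the `title` field conceptually produces, here is a minimal sketch of an inverted index in plain Java. It is only an illustration under simplifying assumptions: the class `ToyInvertedIndex` and its methods are hypothetical names, tokens are split on whitespace rather than by `SmartChineseAnalyzer`, and Lucene's real on-disk structures are far more elaborate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy inverted index: maps each term to the sorted ids of the documents containing it. */
public class ToyInvertedIndex {
	private final Map<String, TreeSet<Integer>> postings = new LinkedHashMap<>();

	/** Splits the text on whitespace, lowercases each token and records docId under it. */
	public void add(int docId, String text) {
		for (String token : text.trim().toLowerCase().split("\\s+")) {
			if (!token.isEmpty()) {
				postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
			}
		}
	}

	/** Returns the ids of the documents containing the term, in ascending order. */
	public List<Integer> search(String term) {
		TreeSet<Integer> ids = postings.get(term.toLowerCase());
		return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
	}

	public static void main(String[] args) {
		ToyInvertedIndex index = new ToyInvertedIndex();
		index.add(1, "Lucene full text search");
		index.add(2, "Solr is built on Lucene");
		index.add(3, "MySQL index tuning");
		System.out.println(index.search("lucene")); // [1, 2]
		System.out.println(index.search("index"));  // [3]
		System.out.println(index.search("redis"));  // []
	}
}
```

A lookup goes straight from the term to its document ids without scanning any document text, which is why building the index once pays off for repeated queries.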


package com.cpc.blog.util;

import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Formatter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryTermScorer;
import org.apache.lucene.search.highlight.Scorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

/**
 * Lucene utility class.
 *
 * @author Administrator
 */
public class LuceneUtil {

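	/**
	 * Opens the on-disk directory that holds the index files.
	 *
	 * @param path file-system path of the index directory
	 * @return the opened directory, or null if it could not be opened
	 */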
	public static Directory getDirectory(String path) {
		Directory directory = null;
		try {
			directory = FSDirectory.open(Paths.get(path));
		} catch (IOException e) {
			e.printStackTrace();
		}
		return directory;
	}

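	/**
	 * Creates a directory that keeps the index files in memory.
	 * Note: newer Lucene releases deprecate RAMDirectory (ByteBuffersDirectory is the replacement).
	 *
	 * @return a new in-memory directory
	 */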
	public static Directory getRAMDirectory() {
		Directory directory = new RAMDirectory();
		return directory;
	}

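	/**
	 * Opens a reader over the index stored in the given directory.
	 *
	 * @param directory directory holding the index
	 * @return the reader, or null if the index could not be opened
	 */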
	public static DirectoryReader getDirectoryReader(Directory directory) {
		DirectoryReader reader = null;
		try {
			reader = DirectoryReader.open(directory);
		} catch (IOException e) {
			e.printStackTrace();
		}
		return reader;
	}

	/**
	 * Creates a searcher on top of the given index reader.
	 *
	 * @param reader reader over the index to search
	 * @return the searcher
	 */
	public static IndexSearcher getIndexSearcher(DirectoryReader reader) {
		IndexSearcher indexSearcher = new IndexSearcher(reader);
		return indexSearcher;
	}

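	/**
	 * Creates an index writer.
	 *
	 * @param directory directory the index is written to
	 * @param analyzer  analyzer used to tokenize field text
	 * @return the writer, or null if it could not be created
	 */
	public static IndexWriter getIndexWriter(Directory directory, Analyzer analyzer) {
		IndexWriter iwriter = null;
		try {
			IndexWriterConfig config = new IndexWriterConfig(analyzer);
			// Create a new index if none exists, otherwise append to the existing one
			config.setOpenMode(OpenMode.CREATE_OR_APPEND);
			// Optional index-time sorting:
			// Sort sort=new Sort(new SortField("content", Type.STRING));
			// config.setIndexSort(sort);
			// Commit pending changes automatically when the writer is closed
			config.setCommitOnClose(true);
			// config.setMergeScheduler(new ConcurrentMergeScheduler());
			// config.setIndexDeletionPolicy(new
			// SnapshotDeletionPolicy(NoDeletionPolicy.INSTANCE));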
			iwriter = new IndexWriter(directory, config);
		} catch (IOException e) {
			e.printStackTrace();
		}
		return iwriter;
	}

	/**
	 * Closes the index writer and its directory.
	 *
	 * @param indexWriter writer to close, may be null
	 * @param directory   directory to close, may be null
	 */
	public static void close(IndexWriter indexWriter, Directory directory) {
		if (indexWriter != null) {
			try {
				indexWriter.close();
			} catch (IOException e) {
				// Log the failure and carry on closing the remaining resources
				e.printStackTrace();
			}
		}
		if (directory != null) {
			try {
				directory.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}

	/**
	 * Closes the index reader and its directory.
	 *
	 * @param reader    reader to close, may be null
	 * @param directory directory to close, may be null
	 */
	public static void close(DirectoryReader reader, Directory directory) {
		if (reader != null) {
			try {
				reader.close();
			} catch (IOException e) {
				// Log the failure and carry on closing the remaining resources
				e.printStackTrace();
			}
		}
		if (directory != null) {
			try {
				directory.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}

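	/**
	 * Builds a highlighter that wraps matched terms in a red span.
	 *
	 * @param query     query whose terms should be highlighted
	 * @param fieldName field the query terms belong to
	 * @return the configured highlighter
	 */
	public static Highlighter getHighlighter(Query query, String fieldName) {
		Formatter formatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
		Scorer fragmentScorer = new QueryTermScorer(query, fieldName);
		Highlighter highlighter = new Highlighter(formatter, fragmentScorer);
		// Limit each highlighted fragment to roughly 200 characters
		highlighter.setTextFragmenter(new SimpleFragmenter(200));
		return highlighter;
	}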
}
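
As a rough picture of what the formatter passed to `getHighlighter` does to a title, the following plain-Java sketch wraps every occurrence of the query terms in a tag pair and returns null when nothing matches, mirroring the null result that the calling code has to handle. The class `ToyHighlighter` and its `highlight` method are hypothetical; Lucene's `Highlighter` works on analyzed token streams and scored fragments, which this sketch does not attempt to reproduce.

```java
import java.util.Arrays;
import java.util.List;

/** Toy highlighter: wraps every occurrence of the given terms in a pre/post tag pair. */
public class ToyHighlighter {
	private final String preTag;
	private final String postTag;

	public ToyHighlighter(String preTag, String postTag) {
		this.preTag = preTag;
		this.postTag = postTag;
	}

	/** Returns the text with each matched term wrapped in the tags, or null when no term occurs. */
	public String highlight(String text, List<String> terms) {
		StringBuilder out = new StringBuilder();
		boolean matched = false;
		int i = 0;
		// Single left-to-right pass, so the inserted tags are never re-scanned for terms
		while (i < text.length()) {
			String hit = null;
			for (String term : terms) {
				if (!term.isEmpty() && text.startsWith(term, i)) {
					hit = term;
					break;
				}
			}
			if (hit != null) {
				out.append(preTag).append(hit).append(postTag);
				i += hit.length();
				matched = true;
			} else {
				out.append(text.charAt(i));
				i++;
			}
		}
		return matched ? out.toString() : null;
	}

	public static void main(String[] args) {
		ToyHighlighter highlighter = new ToyHighlighter("<span style='color:red'>", "</span>");
		// prints: Lucene <span style='color:red'>full</span>-text <span style='color:red'>search</span>
		System.out.println(highlighter.highlight("Lucene full-text search", Arrays.asList("full", "search")));
		// prints: null
		System.out.println(highlighter.highlight("Lucene", Arrays.asList("solr")));
	}
}
```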

Creating the Index (the Inverted Index Concept)