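Background

Lucene is an open-source library for full-text indexing and search, supported and provided by the Apache Software Foundation. It offers a simple yet powerful API for full-text indexing and searching, and it is currently the most popular free information-retrieval library for Java.

The description above comes from Wikipedia. All we need to know is that Lucene can index and search full text. "Index" here is a verb: we record documents, articles, files, and other data in an index, and once they are indexed, queries become fast.

"Index" is sometimes a verb (to index data) and sometimes a noun (the resulting structure); context tells you which. The alphabetical lookup table at the front of the Xinhua Dictionary and the table of contents at the front of a book are both, in essence, indexes.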
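To make the noun sense of "index" concrete, here is a toy inverted index in plain Java: a map from each term to the ids of the documents that contain it. This is only a sketch of the idea, not how Lucene stores its index internally. The class name InvertedIndexSketch and its methods are made up for illustration, and the simple split on non-word characters only handles ASCII text.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** "Index" as a verb: split the text into lowercase terms and record the document id under each. */
    public void index(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** "Index" as a noun: a lookup is a single map access instead of a scan over every document. */
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.index(1, "Lucene is a search library");
        idx.index(2, "Today is big day");
        System.out.println(idx.search("is"));      // [1, 2]
        System.out.println(idx.search("lucene"));  // [1]
    }
}
```

The work is done once at indexing time, which is why queries are fast afterwards.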

Integration

Adding dependencies

First create a Spring Boot project, then add the following to the pom file. The Lucene version used here is 7.2.1.

<properties>
    <lucene.version>7.2.1</lucene.version>
</properties>

<!-- Lucene core library -->
<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-core</artifactId>
 <version>${lucene.version}</version>
</dependency>
<!-- Lucene query parser library -->
<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-queryparser</artifactId>
 <version>${lucene.version}</version>
</dependency>
<!-- Lucene common analyzers library -->
<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-analyzers-common</artifactId>
 <version>${lucene.version}</version>
</dependency>

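Indexing data

Before we can search with Lucene we need to index some documents, and then look them up by keyword. Let's simulate the whole process. For convenience we use mock data here; real data would normally be loaded from a database or from files. The plan is:

Generate several entity objects;
Map each entity to a Lucene document;
Index the documents;
Query the documents by keyword.

First, create an entity class: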

import lombok.Data;

@Data
public class ArticleModel {
    private String title;
    private String author;
    private String content;
}

Next, write a utility class for indexing data:

import lombok.extern.slf4j.Slf4j;
import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Note: ReflectionUtils and JsonUtils, used further down, are not imported here and are
// not shown in this article; they appear to be the project's own helper classes.
@Slf4j // Lombok generates the `log` field used below
public class LuceneIndexUtil {

    private static String INDEX_PATH = "/opt/lucene/demo";
    private static IndexWriter writer;

    public static LuceneIndexUtil getInstance() {
        return SingletonHolder.luceneUtil;
    }

    private static class SingletonHolder {
        public final static LuceneIndexUtil luceneUtil = new LuceneIndexUtil();
    }

    private LuceneIndexUtil() {
        this.initLuceneUtil();
    }

    private void initLuceneUtil() {
        try {
            Directory dir = FSDirectory.open(Paths.get(INDEX_PATH));
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
            writer = new IndexWriter(dir, iwc);
        } catch (IOException e) {
            // Log the exception itself so the cause is not lost.
            log.error("create luceneUtil error", e);
            if (null != writer) {
                try {
                    writer.close();
                } catch (IOException ioException) {
                    log.error("close writer error", ioException);
                } finally {
                    writer = null;
                }
            }
        }
    }

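    /**
     * Index a single document.
     *
     * @param doc the document to index
     * @throws IOException on I/O error
     */
    public void addDoc(Document doc) throws IOException {
        if (null != doc) {
            writer.addDocument(doc);
            // Commit only; closing the shared writer here would break every later call.
            writer.commit();
        }
    }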

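    /**
     * Index a single entity.
     *
     * @param model the entity to index
     * @throws IOException on I/O error
     */
    public void addModelDoc(Object model) throws IOException {
        Document document = new Document();
        // Pass the instance itself (not model.getClass()) so its field values are read.
        List<Field> fields = luceneField(model);
        fields.forEach(document::add);
        writer.addDocument(document);
        // Commit only; the shared writer stays open for later calls.
        writer.commit();
    }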

    /**
     * Index a list of entities. This method does not commit; call commit() afterwards.
     *
     * @param objects the entities to index
     * @throws IOException on I/O error
     */
    public void addModelDocs(List<?> objects) throws IOException {
        if (CollectionUtils.isNotEmpty(objects)) {
            List<Document> docs = new ArrayList<>();
            objects.forEach(o -> {
                Document document = new Document();
                List<Field> fields = luceneField(o);
                fields.forEach(document::add);
                docs.add(document);
            });
            writer.addDocuments(docs);
        }
    }

    /**
     * Delete all documents.
     *
     * @throws IOException on I/O error
     */
    public void delAllDocs() throws IOException {
        writer.deleteAll();
    }

    /**
     * Index a list of documents.
     *
     * @param docs the documents to index
     * @throws IOException on I/O error
     */
    public void addDocs(List<Document> docs) throws IOException {
        if (CollectionUtils.isNotEmpty(docs)) {
            long startTime = System.currentTimeMillis();
            writer.addDocuments(docs);
            writer.commit();
            log.info("Indexed {} documents in {} ms", docs.size(), (System.currentTimeMillis() - startTime));
        } else {
            log.warn("Document list is empty");
        }
    }

    /**
     * Inspect the entity's fields via reflection and map each one to a Lucene Field
     * according to its Java type.
     *
     * @param modelObj the entity instance
     * @return the list of mapped fields
     */
    public List<Field> luceneField(Object modelObj) {
        Map<String, Object> classFields = ReflectionUtils.getClassFields(modelObj.getClass());
        Map<String, Object> classFieldsValues = ReflectionUtils.getClassFieldsValues(modelObj);

        List<Field> fields = new ArrayList<>();
        for (String key : classFields.keySet()) {
            Field field;
            String dataType = StringUtils.substringAfterLast(classFields.get(key).toString(), ".");
            switch (dataType) {
                case "Integer":
                    field = new IntPoint(key, (Integer) classFieldsValues.get(key));
                    break;
                case "Long":
                    field = new LongPoint(key, (Long) classFieldsValues.get(key));
                    break;
                case "Float":
                    field = new FloatPoint(key, (Float) classFieldsValues.get(key));
                    break;
                case "Double":
                    field = new DoublePoint(key, (Double) classFieldsValues.get(key));
                    break;
                case "String":
                    String string = (String) classFieldsValues.get(key);
                    if (StringUtils.isNotBlank(string)) {
                        if (string.length() <= 1024) {
                            // StringField indexes the whole value as a single, un-analyzed term.
                            field = new StringField(key, string, Field.Store.YES);
                        } else {
                            // TextField runs the value through the analyzer (tokenized); not stored.
                            field = new TextField(key, string, Field.Store.NO);
                        }
                    } else {
                        field = new StringField(key, StringUtils.EMPTY, Field.Store.NO);
                    }
                    break;
                default:
                    field = new TextField(key, JsonUtils.obj2Json(classFieldsValues.get(key)), Field.Store.YES);
                    break;
            }
            fields.add(field);
        }
        return fields;
    }

    /** Close the writer and release the index write lock. */
    public void close() {
        if (null != writer) {
            try {
                writer.close();
            } catch (IOException e) {
                log.error("close writer error", e);
            }
            writer = null;
        }
    }

    /** Commit pending changes. The writer stays open until close() is called. */
    public void commit() throws IOException {
        if (null != writer) {
            writer.commit();
        }
    }
}
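
The article does not show the ReflectionUtils helper that luceneField relies on. As a rough idea of what such a helper might look like, here is a minimal sketch built only on java.lang.reflect. The class name ReflectionUtilsSketch and the nested Sample entity are hypothetical; the two method names simply mirror the calls made in luceneField, and the author's real helper may behave differently (for example, this sketch ignores fields inherited from superclasses).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

public final class ReflectionUtilsSketch {

    private ReflectionUtilsSketch() {
    }

    /** Field name -> declared type (a Class object), skipping static and synthetic fields. */
    public static Map<String, Object> getClassFields(Class<?> clazz) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field f : clazz.getDeclaredFields()) {
            if (isInstanceField(f)) {
                result.put(f.getName(), f.getType());
            }
        }
        return result;
    }

    /** Field name -> current value of that field on the given instance. */
    public static Map<String, Object> getClassFieldsValues(Object obj) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (!isInstanceField(f)) {
                continue;
            }
            try {
                f.setAccessible(true); // allow reading private fields
                result.put(f.getName(), f.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read field " + f.getName(), e);
            }
        }
        return result;
    }

    private static boolean isInstanceField(Field f) {
        return !Modifier.isStatic(f.getModifiers()) && !f.isSynthetic();
    }

    /** Hypothetical entity used only for the demonstration below. */
    static class Sample {
        private String title = "Lucene";
        private Integer views = 3;
    }

    public static void main(String[] args) {
        Map<String, Object> types = getClassFields(Sample.class);
        Map<String, Object> values = getClassFieldsValues(new Sample());
        System.out.println(types.get("title"));   // class java.lang.String
        System.out.println(values.get("views"));  // 3
    }
}
```

Because the type is returned as a Class object, its toString() ends in the simple type name after the last dot, which is what the switch in luceneField extracts.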

With the utility class in place, write a demo that indexes some data:

import java.util.ArrayList;
import java.util.List;

/**
 * <br>
 * <b>Function：</b><br>
 * <b>Author：</b>@author Silence<br>
 * <b>Date：</b>2020-10-17 21:08<br>
 * <b>Desc：</b>none<br>
 */
public class Demo {
    public static void main(String[] args) {
        LuceneIndexUtil luceneUtil = LuceneIndexUtil.getInstance();
        List<ArticleModel> articles = new ArrayList<>();
        try {
            // Index the data
            ArticleModel article1 = new ArticleModel();
            article1.setTitle("Java天下第一");
            article1.setAuthor("粉絲");
            article1.setContent("這是一篇給大家介紹 Lucene 的技術文章，必定點贊評論轉發！！！");
            ArticleModel article2 = new ArticleModel();
            article2.setTitle("天下第一");
            article2.setAuthor("粉絲");
            article2.setContent("此處省略兩千字...");
            ArticleModel article3 = new ArticleModel();
            article3.setTitle("Java天下第一");
            article3.setAuthor("粉絲");
            article3.setContent("Today is big day!");
            articles.add(article1);
            articles.add(article2);
            articles.add(article3);
            luceneUtil.addModelDocs(articles);
            luceneUtil.commit();
            // Release the writer once indexing is finished.
            luceneUtil.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

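The content values above can be replaced with whatever you like; to avoid padding the word count, I won't paste longer text here.

Viewing the index

After the run finishes, we can inspect the indexed data with Luke, Lucene's visual index browser. Download and unpack it and you will find two scripts, a .bat and a .sh; run the one that matches your operating system. I am on a Mac, so I use the .sh script. Once Luke is running, open the index directory configured above.

Luke then shows an overview of the index. Select content and click "Show top terms" to list the indexed terms on the right. The result depends on the analyzer: I use the standard analyzer here, and you can pick whichever analyzer suits your own requirements.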
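
To get a feel for why the analyzer matters, the sketch below roughly imitates how a standard, Unicode-aware tokenizer treats mixed text: runs of letters and digits become lowercase words, while each Han character becomes a token of its own. It is a hand-written approximation in plain Java, not Lucene's StandardAnalyzer; the class name TokenizerSketch is made up, and real analyzers do more (they may also drop stop words, for instance).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizerSketch {

    /** Split text into lowercase word tokens and single-character Han tokens. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);                            // end any pending Latin word
                tokens.add(new String(Character.toChars(cp)));  // one token per Han character
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp);                       // keep building the current word
            } else {
                flush(word, tokens);                            // whitespace or punctuation ends a word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString().toLowerCase(Locale.ROOT));
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Java天下第一"));       // [java, 天, 下, 第, 一]
        System.out.println(tokenize("Today is big day!"));  // [today, is, big, day]
    }
}
```

With per-character tokens like these, a Chinese phrase is matched character by character rather than as a word, which is one reason dedicated Chinese analyzers are often preferred.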

Searching data

Now that the data is indexed, the next step is to search it by some criteria. Create LuceneSearchUtil.java for this:

import org.apache.commons.collections.MapUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.springframework.beans.factory.annotation.Value;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.Map;


public class LuceneSearchUtil {

    private static String INDEX_PATH = "/opt/lucene/demo";
    private static IndexSearcher searcher;

    public static LuceneSearchUtil getInstance() {
        return LuceneSearchUtil.SingletonHolder.searchUtil;
    }

    private static class SingletonHolder {
        public final static LuceneSearchUtil searchUtil = new LuceneSearchUtil();
    }

    private LuceneSearchUtil() {
        this.initSearcher();
    }

    private void initSearcher() {
        Directory directory;
        try {
            directory = FSDirectory.open(Paths.get(INDEX_PATH));
            DirectoryReader reader = DirectoryReader.open(directory);
            searcher = new IndexSearcher(reader);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Builds one MUST clause per String entry (field name -> exact value) and returns the top 10 hits.
     * Returns null when queryMap is empty.
     */
    public TopDocs searchByMap(Map<String, Object> queryMap) throws Exception {
        if (null == searcher) {
            this.initSearcher();
        }
        if (MapUtils.isNotEmpty(queryMap)) {
            BooleanQuery.Builder builder = new BooleanQuery.Builder();
            queryMap.forEach((key, value) -> {
                if (value instanceof String) {
                    // A single-term PhraseQuery matches the field's term exactly, like a TermQuery.
                    Query queryString = new PhraseQuery(key, (String) value);
//                    Query queryString = new TermQuery(new Term(key, (String) value));
                    builder.add(queryString, BooleanClause.Occur.MUST);
                }
            });
            return searcher.search(builder.build(), 10);
        }
        return null;
    }

}

Add the following search code to Demo.java:

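// Query the data (needs imports for java.util.HashMap, java.util.Map and org.apache.lucene.search.TopDocs)
   Map<String, Object> map = new HashMap<>();
   // The value must equal the indexed title exactly (no space), because short strings are indexed un-analyzed
   map.put("title", "Java天下第一");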
//   map.put("title", "天下第一");
//   map.put("content", "最");
   LuceneSearchUtil searchUtil = LuceneSearchUtil.getInstance();
   TopDocs topDocs = searchUtil.searchByMap(map);
   System.out.println(topDocs.totalHits);

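The output shows that two documents were found.

Luke confirms that two records have the title "Java天下第一", which matches the demo data: of the three articles we indexed, two carry that title. Note that searching with other strings may return nothing. In this demo, short string values are indexed as StringField, which keeps the whole value as a single un-analyzed term, so a query must equal the stored value exactly; longer text goes through the default standard analyzer, whose tokenization determines what can match. Choose an analyzer that suits your own data.

At this point we can index and search data, but this is only a basic introduction. Different field types call for different query types, and depending on the system you will need a specific analyzer; the default standard analyzer will not necessarily fit your use case. When indexing, each field's Lucene Field type also has to be chosen according to its data type. The example above is only a demo and is not suitable for production. Search engines have long been a pacesetter in the internet industry, and many advanced internet technologies grew out of search.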
