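I have been learning and using Elasticsearch for a while, and my projects rely on it heavily. Until now, though, I had only ever used parts of it here and there, so I wanted to set aside time to organize what I know and work through a complete ES implementation from start to finish. This article is the result.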

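The system runs on an LNMP stack: a simple personal blog (逐步前行STEP) built with the Laravel framework.
Full-text search is powered by Elasticsearch, word segmentation is handled by the ik-analyzer plugin, and the PHP client is the elasticsearch/elasticsearch package. These are the versions of the software, frameworks, and components used here: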

PHP 7.1.3
Laravel 5.8
MySQL 5.7
Elasticsearch 5.3
elasticsearch/elasticsearch 7.2

The rest of this article assumes the environment above is already set up.

The walkthrough has four parts:

Creating the index
Importing all existing data into ES
Syncing incremental changes to ES
Keyword search

1. Creating the index

The following blog post attributes need to be searchable:

Field	Description	Column type
id	ID	int(11)
title	Title	varchar(255)
description	Summary	varchar(255)
content	Body text	text
category_id	Category ID	int(11)
keyword_ids	Keywords	varchar(255)
read_cnt	View count	int(11)
created_at	Published at	TIMESTAMP
updated_at	Updated at	TIMESTAMP

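Of these, title, description, and content need to be tokenized for full-text search, but we also want to keep the original string so it can be searched directly. We therefore use multi-fields (fields) to map each of them to two types: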

"title": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    }
},

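For the analyzers, we want documents to be segmented as finely as possible while keeping searches precise, so documents are analyzed with one analyzer at index time and the search input with another at search time:

"title": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    },
    "analyzer": "ik_max_word",
    "search_analyzer": "ik_smart"
},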
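Because title and description share exactly this shape in the full mapping, it can help to build it once in PHP rather than repeating the JSON. The following is a minimal sketch; `textFieldMapping()` is a hypothetical helper name, and its defaults simply mirror the values above:

```php
<?php
// Hypothetical helper: mapping for a text field that is analyzed for full-text
// search and also exposes an exact-match "<field>.keyword" sub-field.
function textFieldMapping(
    string $indexAnalyzer = 'ik_max_word',
    string $searchAnalyzer = 'ik_smart',
    int $ignoreAbove = 256
): array {
    return [
        'type' => 'text',
        // Multi-field: the same value is also indexed un-analyzed as "<field>.keyword"
        'fields' => [
            'keyword' => [
                'type' => 'keyword',
                // Longer strings are not indexed into the keyword sub-field
                'ignore_above' => $ignoreAbove,
            ],
        ],
        'analyzer' => $indexAnalyzer,         // applied to documents at index time
        'search_analyzer' => $searchAnalyzer, // applied to the query text at search time
    ];
}

$properties = [
    'title' => textFieldMapping(),
    'description' => textFieldMapping(),
];

echo json_encode($properties['title'], JSON_PRETTY_PRINT), PHP_EOL;
```

The analyzed `title` field serves full-text queries, while `title.keyword` holds the original string for exact matches, sorting, or aggregations.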

For example, if the title is "重走絲綢之路" ("Retracing the Silk Road"), ik_max_word produces the following tokens:

{
    "tokens": [
        {
            "token": "重走",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "絲綢之路",
            "start_offset": 2,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "絲綢",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "之路",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 3
        }
    ]
}

ik_smart, by contrast, segments more coarsely:

{
    "tokens": [
        {
            "token": "重走",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "絲綢之路",
            "start_offset": 2,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}
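Token lists like the two above come from Elasticsearch's `_analyze` API. A minimal sketch for reproducing the comparison, assuming the IK plugin is installed; `analyzeBody()` and `tokenList()` are hypothetical helpers, and the sample response is the ik_smart output shown above:

```php
<?php
// Builds a request body for the _analyze API. Send it with GET /_analyze,
// or through the PHP client: $client->indices()->analyze(['body' => $body]).
function analyzeBody(string $analyzer, string $text): array
{
    return ['analyzer' => $analyzer, 'text' => $text];
}

// Extracts just the token strings from an _analyze response
function tokenList(array $response): array
{
    return array_column($response['tokens'] ?? [], 'token');
}

$text = '重走絲綢之路';
foreach (['ik_max_word', 'ik_smart'] as $analyzer) {
    echo json_encode(analyzeBody($analyzer, $text), JSON_UNESCAPED_UNICODE), PHP_EOL;
}

// The ik_smart response shown above, decoded into a PHP array
$smartResponse = ['tokens' => [
    ['token' => '重走', 'start_offset' => 0, 'end_offset' => 2, 'type' => 'CN_WORD', 'position' => 0],
    ['token' => '絲綢之路', 'start_offset' => 2, 'end_offset' => 6, 'type' => 'CN_WORD', 'position' => 1],
]];
echo implode(' | ', tokenList($smartResponse)), PHP_EOL; // 重走 | 絲綢之路
```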

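When the search term is "重走絲綢之路", we naturally want documents to match the term as a whole as closely as possible, rather than having every single character pull in a pile of documents. That is what matching precision means here.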
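One way to express this requirement is a `match` query with `"operator": "and"`: the query text is analyzed with the field's search_analyzer (ik_smart), and every resulting token must appear in the document. The sketch below builds such a query and adds a deliberately simplified token-set check (no scoring, no positions) to show why coarse query tokens still hit finely indexed documents. Both function names are hypothetical, and the token lists are taken from the outputs above:

```php
<?php
// Hypothetical helper: match query that requires every analyzed query token
function matchAllTokensQuery(string $field, string $text): array
{
    return ['match' => [$field => ['query' => $text, 'operator' => 'and']]];
}

// Toy model of the "and" rule (ignores scoring and positions): a document
// matches if every query token is among the document's indexed tokens.
function matchesAll(array $docTokens, array $queryTokens): bool
{
    return count(array_diff($queryTokens, $docTokens)) === 0;
}

// Token lists from the ik_max_word / ik_smart outputs above
$docTokens = ['重走', '絲綢之路', '絲綢', '之路']; // index time (ik_max_word)
$queryTokens = ['重走', '絲綢之路'];              // search time (ik_smart)

echo json_encode(matchAllTokensQuery('title', '重走絲綢之路'), JSON_UNESCAPED_UNICODE), PHP_EOL;
var_dump(matchesAll($docTokens, $queryTokens));      // bool(true)
var_dump(matchesAll(['絲綢', '之路'], $queryTokens)); // bool(false)
```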

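keyword_ids and category_id have to be converted into their actual text when imported into ES, so that users can search with text instead of being limited to IDs. In ES, these two fields are named keywords and category respectively.
Moreover, keyword searches generally only need exact matching. Take the keyword "全文檢索" ("full-text search"): if it were tokenized, it would become: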

{
    "tokens": [
        {
            "token": "全文",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "檢索",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}

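In that case, 全文 would match some documents and 檢索 would match others, which makes no sense for an attribute defined as a keyword. So keywords and category are mapped with the "keyword" type.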
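Because keyword fields store the whole value as a single term, they are best queried with an unanalyzed `term` query. A minimal sketch with hypothetical helper names and a made-up category value; since `keywords` holds an array, a `term` query matches when any element equals the value:

```php
<?php
// Hypothetical helper: exact match on a keyword-typed field.
// term queries are not analyzed, so only the exact stored value matches.
function exactKeywordQuery(string $field, string $value): array
{
    return ['term' => [$field => $value]];
}

// Combines several exact matches in filter context (no effect on scoring)
function keywordFilter(array $fieldValues): array
{
    $filters = [];
    foreach ($fieldValues as $field => $value) {
        $filters[] = exactKeywordQuery($field, $value);
    }
    return ['bool' => ['filter' => $filters]];
}

// 'Elasticsearch' as a category name is an illustrative value only
$query = keywordFilter(['keywords' => '全文檢索', 'category' => 'Elasticsearch']);
echo json_encode(['query' => $query], JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT), PHP_EOL;
```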

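Since this walkthrough is a minimal implementation, index aliases are skipped and shard settings are left at their defaults. The articles index is created with the following body: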

{
    "mappings": {
        "doc": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "keywords": {
                    "type": "keyword",
                    "ignore_above": 256
                },
                "category": {
                    "type": "keyword",
                    "ignore_above": 256
                },
                "read_cnt": {
                    "type": "long"
                },
                "title": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    },
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_smart"
                },
                "description": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    },
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_smart"
                },
                "created_at": {
                    "type": "date",
                    "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
                },
                "updated_at": {
                    "type": "date",
                    "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
                }
            }
        }
    }
}

Creating the index with the PUT /articles API returns the following on success:

{
    "acknowledged": true,
    "shards_acknowledged": true
}
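The same index can also be created from PHP. The sketch below builds the params for the client's `indices()->create()` call; `articlesIndexParams()` is a hypothetical helper, and it names the category field `category`, the name used by `getDoc()` and the search function later in this article:

```php
<?php
// Hypothetical builder for $client->indices()->create($params),
// following the articles mapping described above.
function articlesIndexParams(string $index = 'articles', string $type = 'doc'): array
{
    $text = [
        'type' => 'text',
        'fields' => ['keyword' => ['type' => 'keyword', 'ignore_above' => 256]],
        'analyzer' => 'ik_max_word',
        'search_analyzer' => 'ik_smart',
    ];
    $keyword = ['type' => 'keyword', 'ignore_above' => 256];
    $date = ['type' => 'date', 'format' => 'yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis'];

    return [
        'index' => $index,
        'body' => [
            'mappings' => [
                $type => [ // mapping types are still required on Elasticsearch 5.x
                    'properties' => [
                        'id' => ['type' => 'long'],
                        'keywords' => $keyword,
                        'category' => $keyword,
                        'read_cnt' => ['type' => 'long'],
                        'title' => $text,
                        'description' => $text,
                        'created_at' => $date,
                        'updated_at' => $date,
                    ],
                ],
            ],
        ],
    ];
}

$params = articlesIndexParams();
// With the client installed:
//   if (!$client->indices()->exists(['index' => 'articles'])) {
//       $client->indices()->create($params);
//   }
echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;
```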

2. Importing all existing data into ES

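Since we are adding full-text search to an existing blog, we first need a one-off full import into ES. Step 1 was done entirely through the ES REST API, but this step involves querying and transforming data, so it has to be done inside our project.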

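First, we need to get familiar with the elasticsearch/elasticsearch client. The features used in this walkthrough are introduced below; for more, see the Elasticsearch-PHP documentation (Chinese edition).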

First, define the ES connection settings in the config file config/elastic.php:

<?php

return array(
    'default' => [
        'hosts' => [
            [
                'host' => 'xxx.xxx.xxx.xxx',
                'port' => '9200',
                'scheme' => 'http',
            ]
        ],
        'retries' => 1,

        /*
        |--------------------------------------------------------------------------
        | Default Index Name
        |--------------------------------------------------------------------------
        |
        | This is the index name that elasticquent will use for all
        */
        'default_index' => 'default_index',
    ],
);
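Once this config exists, the client can be built from it instead of from hard-coded values. A minimal sketch, assuming Laravel's `config()` helper for the wiring shown in comments; `hostUrls()` is a hypothetical helper and `127.0.0.1` is a placeholder address:

```php
<?php
// Hypothetical helper: turns the 'default' connection from config/elastic.php
// into host URLs, handy for logging or a quick curl check. ClientBuilder::setHosts()
// accepts either such strings or the associative host arrays as they are.
function hostUrls(array $connection): array
{
    return array_map(function (array $h): string {
        return sprintf('%s://%s:%s', $h['scheme'] ?? 'http', $h['host'], $h['port'] ?? 9200);
    }, $connection['hosts'] ?? []);
}

$connection = [
    'hosts' => [['host' => '127.0.0.1', 'port' => '9200', 'scheme' => 'http']],
    'retries' => 1,
];
echo implode(', ', hostUrls($connection)), PHP_EOL; // http://127.0.0.1:9200

// Inside the Laravel app, the client could be built from the same config:
//   $cfg = config('elastic.default');
//   $client = \Elasticsearch\ClientBuilder::create()
//       ->setHosts($cfg['hosts'])
//       ->setRetries($cfg['retries'])
//       ->build();
```

Building the client once (for example, as a singleton in a service provider) would avoid calling `ClientBuilder::create()` again for every operation.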

Documents are then indexed in batches with the bulk method, for example:

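<?php
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$params = ['body' => []];

for ($i = 0; $i < 100; $i++) {
    // Action line: which index/type the next entry is written to
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
            '_type' => 'my_type',
        ]
    ];

    // Source line: the document itself
    $params['body'][] = [
        'my_field' => 'my_value'
    ];
}

$responses = ClientBuilder::create()->build()->bulk($params);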
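For the full import, the loop above would typically be fed from the database in batches rather than sent as one huge request. A minimal sketch, assuming a chunk size of 500 (an arbitrary choice) and a hypothetical `bulkBodies()` helper; it also sets `_id` so each ES document shares its article's ID:

```php
<?php
// Hypothetical helper: splits documents into bulk request bodies.
// Each document contributes two entries (action line + source),
// and each body holds at most $chunkSize documents.
function bulkBodies(array $docs, string $index, string $type, int $chunkSize = 500): array
{
    $bodies = [];
    foreach (array_chunk($docs, $chunkSize) as $chunk) {
        $body = [];
        foreach ($chunk as $doc) {
            // Using the article id as _id means re-running the import
            // overwrites existing documents instead of duplicating them
            $body[] = ['index' => ['_index' => $index, '_type' => $type, '_id' => $doc['id']]];
            $body[] = $doc;
        }
        $bodies[] = $body;
    }
    return $bodies;
}

$docs = [];
for ($i = 1; $i <= 1200; $i++) {
    $docs[] = ['id' => $i, 'title' => "Article $i"];
}

$bodies = bulkBodies($docs, 'articles', 'doc');
echo count($bodies), PHP_EOL;    // 3 requests: 500 + 500 + 200 documents
echo count($bodies[2]), PHP_EOL; // 400 entries: 200 action lines + 200 sources
// With the client: foreach ($bodies as $body) { $client->bulk(['body' => $body]); }
```

In the Laravel project, `Article::chunk(500, function ($articles) { ... })` could supply each batch, with `getDoc()` producing the documents.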

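The rows returned by the database cannot be used directly; they need some transformation first, such as converting keyword_ids into keywords. We wrap this in a method, getDoc():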

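public function getDoc()
{
    // Columns copied into the ES document as-is
    $fields = [
        'id',
        'title',
        'description',
        'read_cnt',
        'created_at',
        'updated_at'
    ];

    $data = array_only($this->getAttributes(), $fields);

    // keyword_ids is a varchar column: use it directly if the model casts it
    // to an array, otherwise split it (a comma-separated format is assumed)
    $keyword_ids = is_array($this->keyword_ids)
        ? $this->keyword_ids
        : array_filter(explode(',', (string) $this->keyword_ids));

    // Store keyword names instead of IDs so users can search by text
    $data['keywords'] = ArticleKeyword::whereIn('id', $keyword_ids)->pluck('name')->toArray();

    // Store the category name; find() alone would return the whole model
    // (assumes a name column, as for keywords)
    $data['category'] = ArticleCategory::where('id', $this->category_id)->value('name');

    return $data;
}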
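To make the transformation easy to test outside Laravel, it can also be written as a plain function. In this sketch, `toSearchDoc()` is hypothetical, the two lookup arrays stand in for the `ArticleKeyword` and `ArticleCategory` queries, the comma-separated `keyword_ids` format is an assumption, and the sample row is made up:

```php
<?php
// Framework-free version of the getDoc() transformation: keeps the indexed
// columns and replaces keyword_ids / category_id with their names.
function toSearchDoc(array $row, array $keywordNames, array $categoryNames): array
{
    $fields = ['id', 'title', 'description', 'read_cnt', 'created_at', 'updated_at'];
    $doc = array_intersect_key($row, array_flip($fields));

    // "1,3" -> [1, 3]; empty or zero ids are dropped
    $ids = array_filter(array_map('intval', explode(',', (string) ($row['keyword_ids'] ?? ''))));
    $doc['keywords'] = array_values(array_intersect_key($keywordNames, array_flip($ids)));
    $doc['category'] = $categoryNames[$row['category_id'] ?? 0] ?? null;

    return $doc;
}

// Made-up sample data
$row = [
    'id' => 7, 'title' => 'Sample post', 'description' => 'Sample summary',
    'content' => 'Sample body', 'category_id' => 2, 'keyword_ids' => '1,3',
    'read_cnt' => 10, 'created_at' => '2019-08-01 10:00:00', 'updated_at' => '2019-08-02 09:30:00',
];
$doc = toSearchDoc($row, [1 => 'full-text search', 2 => 'PHP', 3 => 'Elasticsearch'], [2 => 'Backend']);
echo json_encode($doc), PHP_EOL;
```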

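Calling getDoc() on each article returns the document data to sync.
Note that in a bulk request, each index action line and its document always come as a pair.
Using the index created in step 1, the client's bulk indexing then imports all of the queried data into ES.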

3. Syncing incremental changes to ES

New data has to be synced to ES at the same time it is written to the database. The approach used here is Eloquent model events.

Query, insert, update, and delete operations on an Eloquent model fire the corresponding model events, whether or not you listen to them. The events are:

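retrieved: fired after a model instance is retrieved
creating: fired before a record is inserted into the database
created: fired after a record is inserted into the database
updating: fired before a record is updated in the database
updated: fired after a record is updated in the database
saving: fired before saving (before an insert or an update, in both cases)
saved: fired after saving (after an insert or an update, in both cases)
deleting: fired before a record is deleted from the database
deleted: fired after a record is deleted from the database
restoring: fired before a soft-deleted record is restored
restored: fired after a soft-deleted record is restored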

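The events we need are saved and deleted. By listening to these two and syncing to ES when they fire, every create, update, or delete of an article is reflected in ES in real time.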

We override fireModelEvent to run the sync when these events fire. This uses the client's single-document index method, index, for example:

$client = ClientBuilder::create()->build();

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'body' => ['testField' => 'abc']
];

$response = $client->index($params);

The document data comes from the getDoc() method from step 2.
The implementation:

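    // Requires at the top of the model file:
    //   use Elasticsearch\ClientBuilder;
    //   use Elasticsearch\Common\Exceptions\Missing404Exception;
    protected function fireModelEvent($event, $halt = true)
    {
        // Dispatch to Eloquent's own listeners/observers first and keep the result,
        // so halting events such as "saving" and "deleting" still behave normally
        $result = parent::fireModelEvent($event, $halt);

        if ($event == 'deleted')
        {
            try {
                // index and type are required to address the document
                ClientBuilder::create()->build()->delete([
                    'index' => 'articles',
                    'type' => 'doc',
                    'id' => $this->id,
                ]);
            } catch (Missing404Exception $e) {
                // The article was never indexed, so there is nothing to remove
            }
        }

        if ($event == 'saved')
        {
            $params = [
                'index' => 'articles',
                'type' => 'doc',
                'id' => $this->id,
                'body' => $this->getDoc()
            ];

            // Indexing with the same id creates or overwrites the document
            ClientBuilder::create()->build()->index($params);
        }

        return $result;
    }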
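The decision inside `fireModelEvent()` can also be separated from Laravel and the ES client, which makes it unit-testable. A sketch with a hypothetical `syncOperation()` function that returns the client method name and its params, or `null` for events that need no sync:

```php
<?php
// Hypothetical helper: maps a model event to the ES call it should trigger.
// Returns [clientMethod, params], or null when no sync is needed.
function syncOperation(string $event, int $id, callable $getDoc, string $index = 'articles', string $type = 'doc'): ?array
{
    $target = ['index' => $index, 'type' => $type, 'id' => $id];

    switch ($event) {
        case 'saved':   // fired after both inserts and updates
            return ['index', $target + ['body' => $getDoc()]];
        case 'deleted':
            return ['delete', $target];
        default:        // all other events leave ES untouched
            return null;
    }
}

$getDoc = function () { return ['id' => 42, 'title' => 'Hello']; };

[$method, $params] = syncOperation('saved', 42, $getDoc);
echo $method, ' ', json_encode($params), PHP_EOL;
// With the client: $client->{$method}($params);
var_dump(syncOperation('retrieved', 42, $getDoc)); // NULL
```

Syncing inline keeps the example simple; if ES latency ever slows down writes, the same params could instead be handed to a queued job.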

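4. Searching the data

Steps 2 and 3 keep our articles synced to ES in real time. This step opens ES full-text search up to users: on my site, I added a search box to the article list where users type the text they want to find.

There are two requirements:
1. Run query_string queries against title, description, keywords, and category.
2. Convert the results into an Eloquent collection for easy display.

The search function: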

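    // Requires at the top of the model file:
    //   use Elasticsearch\ClientBuilder;
    //   use Illuminate\Database\Eloquent\Collection;
    //   use Illuminate\Pagination\LengthAwarePaginator;
    // $conditions and $sort are accepted but not used in this minimal version.
    public static function search($keyword, $page = 1, $per_page = 20, $conditions = [], $sort = null)
    {
        $page = max(1, intval($page));

        $from = ($page - 1) * $per_page;

        // Text fields to search
        $search_fields = ['title', 'keywords', 'category', 'description'];

        // Without a keyword, list everything (an empty query object is invalid)
        $query = ['match_all' => new \stdClass()];

        if ($keyword)
        {
            // A document matches if any of the per-field clauses matches
            $query = ['bool' => ['should' => []]];

            foreach ($search_fields as $search_field)
            {
                $query['bool']['should'][] = [
                    'query_string' => [
                        'default_field' => $search_field,
                        'query' => strtolower($keyword),
                        'default_operator' => 'AND',
                    ]
                ];
            }
        }

        $params = [
            'index' => 'articles',
            'type' => 'doc',
            'body' => [
                'from' => $from,      // pagination offset
                'size' => $per_page,  // page size
                'query' => $query
            ]
        ];

        $response = ClientBuilder::create()->build()->search($params);

        // ES 5.x/6.x return an integer total; ES 7 returns ['value' => ..., 'relation' => ...]
        $total_count = array_get($response, 'hits.total', 0);
        if (is_array($total_count))
        {
            $total_count = array_get($total_count, 'value', 0);
        }

        $collection = new Collection();

        // Hydrate a model instance from each hit's _source
        foreach (array_get($response, 'hits.hits', []) as $item)
        {
            $self = new static;

            $self->setRawAttributes($item['_source'], true);

            $collection->add($self);
        }

        return new LengthAwarePaginator($collection, $total_count, $per_page, $page);
    }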
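`query_string` parses its input as Lucene query syntax, so raw user input such as `C++` or an unbalanced quote can make the search request fail. A minimal sketch with hypothetical helper names that escapes the reserved characters listed in the Elasticsearch query_string documentation (dropping `<` and `>`, which cannot be escaped) and builds a request body with the bool/should structure plus `from` and `size`:

```php
<?php
// Hypothetical helper: escapes query_string reserved characters in user input
function escapeQueryString(string $input): string
{
    $reserved = ['\\', '+', '-', '=', '&', '|', '!', '(', ')', '{', '}',
                 '[', ']', '^', '"', '~', '*', '?', ':', '/'];
    $out = '';
    // Split into UTF-8 characters so multibyte text passes through intact
    foreach (preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $ch) {
        if ($ch === '<' || $ch === '>') {
            continue; // cannot be escaped, so they are removed
        }
        $out .= in_array($ch, $reserved, true) ? '\\' . $ch : $ch;
    }
    return $out;
}

// Hypothetical helper: search request body, one query_string clause per field
function searchBody(string $keyword, int $page, int $perPage, array $fields): array
{
    $page = max(1, $page);
    $query = ['match_all' => new stdClass()]; // no keyword: list everything
    if ($keyword !== '') {
        $should = [];
        foreach ($fields as $field) {
            $should[] = ['query_string' => [
                'default_field' => $field,
                'query' => escapeQueryString(strtolower($keyword)),
                'default_operator' => 'AND',
            ]];
        }
        $query = ['bool' => ['should' => $should]];
    }
    return ['from' => ($page - 1) * $perPage, 'size' => $perPage, 'query' => $query];
}

$body = searchBody('Elasticsearch (ES)', 2, 20, ['title', 'keywords', 'category', 'description']);
echo json_encode($body, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Alternatively, `simple_query_string` never throws on invalid syntax and discards the invalid parts, at the cost of a smaller query syntax.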

Elasticsearch in Practice: Building Full-Text Search for a Blog

一、創建索引

二、全量數據導入es

3、增量數據同步es

4、檢索數據

C#開源的兩款功能強大的錄屏神器

認知提升的方法

螞蟻面試：Springcloud核心組件的底層原理，你知道多少？

Elasticsearch 定製評分（自定義評分）

Elasticsearch實戰：給博客打造全文檢索

Laravel 發送郵件報錯的解決方案：PHP Warning: stream_socket_enable_crypto(): SSL operation failed with code 1.

搭建共享服務器、共享虛擬主機並提供服務

filebeat 解析日誌併發送到Elasticsearch

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結