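ELK Architecture

Kibana: a data visualization tool, roughly comparable to Navicat

Elasticsearch: the ES database

Beats: lightweight data collection components (shippers)

Logstash: a data extraction and processing (ETL) tool that accepts Beats output, among other sources, as input

All of them are open source.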

Elasticsearch

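Elasticsearch (ES for short) is a non-relational database built around an inverted index, which gives it strong search capabilities. It is written in Java, so before installing, check which JDK version your ES release requires and whether it is available (the 7.x distributions ship with a bundled JDK).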
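To make the inverted index idea concrete, here is a minimal sketch in plain Java. It is a toy illustration under stated assumptions, not how Elasticsearch (Lucene) actually implements it: the class name InvertedIndexSketch and its methods are hypothetical, tokenization is a naive split on non-alphanumeric characters, and a multi-term query is treated as AND.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: maps each lowercase term to the sorted set of document IDs containing it. */
class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Splits the text on non-alphanumeric characters and records docId under every term. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the IDs of documents that contain every term of the query (AND semantics). */
    List<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term: start from its posting list
            } else {
                result.retainAll(docs);         // later terms: intersect
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch is a search engine");
        index.add(2, "Logstash ships data to Elasticsearch");
        index.add(3, "Kibana visualizes data");
        System.out.println(index.search("elasticsearch"));      // [1, 2]
        System.out.println(index.search("data elasticsearch")); // [2]
    }
}
```

The point of the structure is that a lookup goes from term to documents directly, instead of scanning every document for the term.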

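Note that, according to everything I have read, when you use ELK (Elasticsearch + Logstash + Kibana) you must pay attention to version numbers. Use the same version for all three components; mixing versions causes many problems, because cross-version compatibility within ELK is poor.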

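Important:

To understand Elasticsearch, one concept deserves a closer look: the type.

Before ES 6.0, one index could contain multiple types. In ES 6.x, one index can contain only one type. From ES 7.0 on, types are deprecated and the APIs are typeless, with _doc as the default type; types are removed entirely in 8.0. For a non-relational store like ES, a workable mental model is: an index is like a database table, _id is the primary key, the mapping is the column definition, and the settings are the table options. A document is one row of data.

(This analogy may not match the original design exactly, but it is easy to accept and understand for anyone used to relational databases.)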
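The table analogy can be sketched as the JSON body of a typeless (7.x style) create-index request, built here with plain string concatenation so that it runs without any ES library. The class name IndexDefinitionSketch and the fields name, age and create_time are assumptions chosen for illustration only.

```java
/** Shows how "settings" and "mappings" line up with table options and column definitions. */
class IndexDefinitionSketch {

    /** Builds the JSON body for PUT /<index> with explicit settings and mappings. */
    static String createIndexBody(int shards, int replicas) {
        return "{"
                + "\"settings\":{"                              // roughly: table-level options
                + "\"number_of_shards\":" + shards + ","
                + "\"number_of_replicas\":" + replicas + "},"
                + "\"mappings\":{\"properties\":{"              // roughly: column definitions
                + "\"name\":{\"type\":\"keyword\"},"           // hypothetical field
                + "\"age\":{\"type\":\"integer\"},"            // hypothetical field
                + "\"create_time\":{\"type\":\"date\"}"        // hypothetical field
                + "}}}";
    }

    /** Path of a single document: the index plays the table, _id plays the primary key. */
    static String documentPath(String index, String id) {
        return "/" + index + "/_doc/" + id;
    }

    public static void main(String[] args) {
        System.out.println(createIndexBody(1, 0));
        System.out.println(documentPath("testa_big_data", "test2")); // /testa_big_data/_doc/test2
    }
}
```

Unlike a relational table, the mapping does not have to be declared up front: fields that are not mapped are detected dynamically when the first document containing them is indexed.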

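Next, how to integrate Elasticsearch into a Java project.

1. Add the Maven dependencies (jar imports)

Do not trust the official claim that adding elasticsearch-rest-high-level-client alone is enough. Declare the two artifacts below as well, and keep all three on the same version, because elasticsearch-rest-high-level-client depends on them internally. (I use the Aliyun mirror of Maven Central. The groupId and artifactId in the pom may differ between mirrors, and wrong coordinates mean the artifact cannot be resolved.)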

<!-- ES dependencies: begin -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.3.2</version>
        </dependency>
        <!-- No <classifier>sources</classifier>/<type>java-source</type> here: those would pull in the sources jar instead of the compiled classes -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
            <version>7.3.2</version>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>7.3.2</version>
        </dependency>
        <!-- ES dependencies: end -->

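2. Write a utility class. (This one has no ties to any business logic, so you can copy it and experiment with it directly.) Many of the methods were written by following the official API docs. Compound queries and the like will have to wait for another time. (Reading the official API docs takes a Zen attitude, otherwise you will want to punch someone within minutes...)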

package com.runlin.dealerhelper.util;


import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.*;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.jfree.util.Log;

import java.io.IOException;
import java.util.*;

/**
 * Utility class for Elasticsearch operations
 *
 * @author jo.li
 */
public class EsUtils {
    /**
     * ES host IP (single node)
     */
    private static final String hostName = "192.168.215.174";

    /**
     * ES port
     */
    private static final Integer esPort = 9200;

    /**
     * ES connection scheme
     */
    private static final String esScheme = "http";

    /**
     * Creates an ES client
     *
     * @return the ES client object
     */
    public static RestHighLevelClient createEsClient() {
        HttpHost host = new HttpHost(hostName, esPort, esScheme);
        RestClientBuilder restClientBuilder = RestClient.builder(host);
        return new RestHighLevelClient(restClientBuilder);
    }

    /**
     * Creates an index
     *
     * @param client    ES client
     * @param indexName index name (must be all lowercase; _ and . may be used as separators)
     * @param shards    number of primary shards
     * @param replicas  number of replicas per primary shard
     * @return true if the index was created, false otherwise
     */
    public static boolean createEsIndex(RestHighLevelClient client, String indexName, int shards, int replicas) {
        // Build the create-index request
        CreateIndexRequest createIndexRequest = new CreateIndexRequest(indexName);
        // Apply the settings
        createIndexRequest.settings(Settings.builder()
                // Number of primary shards
                .put("index.number_of_shards", shards)
                // Number of replicas per primary shard
                .put("index.number_of_replicas", replicas)
        );
        // Build the mapping source. If you think of the index as a table, this step is like
        // declaring its columns. It can be left empty, in which case ES infers the fields
        // when data is put into the index.
        Map<String, Object> mapping = new HashMap<>();
        mapping.put("properties", new HashMap<>());
        createIndexRequest.mapping(mapping);

        // Client for index-level operations
        IndicesClient indices = client.indices();
        // Execute the create-index request
        CreateIndexResponse createIndexResponse;
        boolean ret = false;
        try {
            createIndexResponse = indices.create(createIndexRequest, RequestOptions.DEFAULT);
            ret = createIndexResponse.isAcknowledged();
        } catch (IOException e) {
            Log.error("Failed to create ES index! " + e);
            closEsClient(client);
        }
        return ret;
    }

    /**
     * Adds a document to the index with the given name
     *
     * @param client    ES client
     * @param jsonMap   the data to insert
     * @param indexName index name (must be all lowercase)
     * @param docId     document ID (must be unique; if null, ES generates a random ID)
     * @return the result of the write, or null if the request failed
     */
    public static DocWriteResponse.Result putDataByIndex(RestHighLevelClient client, Map<String, Object> jsonMap, String indexName, String docId) {
        // Build a typeless index request (the constructor that takes a type is deprecated in 7.x)
        IndexRequest indexRequest = new IndexRequest(indexName).id(docId);
        indexRequest.source(jsonMap);
        // Send the HTTP request through the client
        IndexResponse indexResponse = null;
        try {
            indexResponse = client.index(indexRequest, RequestOptions.DEFAULT);
            return indexResponse.getResult();
        } catch (IOException e) {
            Log.error("Failed to add data to ES index! " + e);
            closEsClient(client);
        }
        return null;
    }

    /**
     * Deletes the index with the given name
     *
     * @param client    ES client
     * @param indexName index name
     * @return true if the deletion was acknowledged, false otherwise
     */
    public static boolean deleteIndex(RestHighLevelClient client, String indexName) {
        boolean ret = false;
        try {
            DeleteIndexRequest request = new DeleteIndexRequest(indexName);
            // Report success only if the cluster acknowledged the deletion
            ret = client.indices().delete(request, RequestOptions.DEFAULT).isAcknowledged();
        } catch (ElasticsearchException | IOException e) {
            Log.error("Failed to delete ES index! " + e);
            closEsClient(client);
        }
        return ret;
    }

    /**
     * Closes the ES client
     *
     * @param client ES client
     */
    public static void closEsClient(RestHighLevelClient client) {
        try {
            if (client != null) {
                client.close();
            }
        } catch (IOException ex) {
            Log.error(ex);
        }
    }

    /**
     * Checks whether the index with the given name exists
     *
     * @param client    ES client
     * @param indexName index name
     * @return true if it exists, false if it does not
     */
    public static boolean existsEsIndex(RestHighLevelClient client, String indexName) {
        GetIndexRequest indexRequest = new GetIndexRequest(indexName);
        try {
            return client.indices().exists(indexRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            Log.error("Error while checking whether the index exists! " + e);
            closEsClient(client);
        }
        return false;
    }


    /**
     * Test method
     */
    public static void testEs() {
        // Index name
        String indexName = "testa_big_data";
        // Number of primary shards
        int shards = 5;
        // Number of replicas per shard (on a single node these stay unassigned, so the index reports yellow)
        int replicas = 2;
        // Whether the index exists: true = exists, false = does not exist
        boolean indexExistsFlag = true;
        // Whether the index was created: true = created, false = creation failed
        boolean createIndexFlag = false;
        // Whether to add data to the new or existing index
        boolean putDataFlag = false;
        // Whether to delete the index
        boolean deleteIndexFlag = false;

        // Create the client
        RestHighLevelClient client = createEsClient();
        // Check whether the index exists
        indexExistsFlag = existsEsIndex(client, indexName);
        // Create it if it does not
        if (!indexExistsFlag) {
            // Create the index
            createIndexFlag = createEsIndex(client, indexName, shards, replicas);
            if (createIndexFlag) {
                System.out.println("Index created");
            } else {
                System.out.println("Index creation failed");
            }
        }

        // Add data to the index
        if (putDataFlag) {
            Map<String, Object> jsonMap = new HashMap<>();
            Map<String, Object> testMap = new HashMap<>();
            testMap.put("b1", "B1");
            testMap.put("b2", 2);
            testMap.put("b3", 3.21);
            testMap.put("b4", new Date());
            jsonMap.put("name", "asd");
            jsonMap.put("age", 34);
            jsonMap.put("create_time", new Date());
            jsonMap.put("id", testMap);
            DocWriteResponse.Result result = putDataByIndex(client, jsonMap, indexName, "test2");
            System.out.println(result);
        }

        // Test index deletion
        if (deleteIndexFlag) {
            if (deleteIndex(client, indexName)) {
                System.out.println("Deleted");
            } else {
                System.out.println("Deletion failed");
            }
        }

        searchEsData(client, indexName);
        // Release the connection once the test is finished
        closEsClient(client);
    }

    /**
     * Runs a match_all query against the given index and prints every hit
     *
     * @param client    ES client
     * @param indexName index name
     * @return the _source of every hit as a map (empty if the query failed)
     */
    public static List<Map<String, Object>> searchEsData(RestHighLevelClient client, String indexName) {
        // Create the search request
        SearchRequest sr = new SearchRequest(indexName);
        // Build the search source
        SearchSourceBuilder ssb = new SearchSourceBuilder();
        // Build the query condition
        QueryBuilder qb = QueryBuilders.matchAllQuery();
        // Put the query condition into the search source
        ssb.query(qb);
        // Put the search source into the search request
        sr.source(ssb);
        // Holds the query results
        List<Map<String, Object>> resultList = new ArrayList<>();
        try {
            // Run the query
            SearchResponse srRet = client.search(sr, RequestOptions.DEFAULT);
            // Iterate over all hits in the response and print them
            for (SearchHit sh : srRet.getHits()) {
                resultList.add(sh.getSourceAsMap());
                System.out.println(sh.getSourceAsString());
            }
        } catch (IOException e) {
            Log.error("Error while querying! " + e);
            closEsClient(client);
        }
        return resultList;
    }
}

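Logstash

Think of it as a powerful ETL data extraction tool that can sync MySQL data into ES in near real time. Note that the ELK versions must match, otherwise strange errors tend to appear.

Syncing a MySQL database with Logstash

1. Download Logstash from the official site: https://www.elastic.co/cn/

Since this is a local setup running on Windows, I downloaded the ZIP package.

2. Real-time sync between Logstash and MySQL

Logstash is powerful; here is a brief description of how data sync works. Database sync uses JDBC extraction: rows are pulled over JDBC, processed by rules, and then written to ES. Logstash can also read files and load file data into ES, but only the database approach is covered here.

If the download is slow, a download manager such as Xunlei can help. After downloading, unzip the package to the target folder. Then go into the bin directory and create a folder to hold the driver jar and the sync configuration files.

The extraction configuration file (the database driver jar is not in this folder, but that does not matter because the config uses an absolute path)

Explanations of the individual settings are easy to find online, so I won't repeat them here. The key one is the sync schedule.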

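A brief word on the structure of the configuration file. It consists of three main parts, input, filter and output, which do what their names suggest.

input: configures the data sources to sync and is used to fetch data (several different sources can be configured).

filter: processes the data; this is where the processing rules are defined.

output: defines where the data goes, plus some basic settings for the output.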

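input {
  jdbc {
    # Absolute path to the JDBC driver jar
    jdbc_driver_library => "D:/developSW/Elastic/logstash-7.3.2/lib/mysql-connector-java-5.6-bin.jar"
    # Driver class
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # Database URL
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/dealerhelper?useUnicode=true&characterEncoding=UTF8"
    # User name
    jdbc_user => "root"
    # Password
    jdbc_password => "123456"
    # Enable paging (recommended for large data sets)
    jdbc_paging_enabled => "true"
    # Maximum rows per page
    jdbc_page_size => "50000"
    # Sync schedule in cron syntax. The standard five fields are, from left to right:
    # minute, hour, day of month, month, day of week; all * means run every minute.
    # With a sixth field in front, that field is read as seconds, so */2 here means every 2 seconds.
    schedule => "*/2 * * * * *"
    # SQL statement that selects the data to sync. :sql_last_value limits each run to
    # rows changed since the previous run; without it every run re-reads the whole table.
    statement => "SELECT * FROM carkeeper_dictionary WHERE update_time > :sql_last_value"
    # Track the update_time column (a timestamp) and persist its last value in the file below
    use_column_value => true
    tracking_column_type => "timestamp"
    tracking_column => "update_time"
    last_run_metadata_path => "syncpoint_table"
  }
}
output {
    elasticsearch {
        # ES IP address and port
        hosts => ["127.0.0.1:9200"]
        # Index name, can be customized
        index => "testa"
        # The source table needs a unique ID column (uuid here), which becomes the document ID
        document_id => "%{uuid}"
        # Types are deprecated from 7.x on; the default type _doc is sufficient here
        document_type => "_doc"
    }
    stdout {
        # Print events as JSON lines
        codec => json_lines
    }
}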

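Once the configuration is done, you can run the extraction. In the bin directory, type cmd into the Explorer address bar and press Enter.

Then run logstash.bat -f .\config-mysql\<config file name> and wait for the sync to finish.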
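The tracking_column and last_run_metadata_path settings exist to support incremental pulls: each scheduled run is meant to fetch only the rows whose tracking column is newer than the value remembered from the previous run. The following plain-Java simulation is a minimal sketch of that idea, not Logstash's actual code. The class IncrementalSyncSketch and its Row type are hypothetical, and an in-memory list stands in for the MySQL table.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Simulates how a tracking column turns a full-table poll into an incremental one. */
class IncrementalSyncSketch {

    /** Hypothetical row: a document ID plus the value of the tracking column. */
    static final class Row {
        final String uuid;
        final Instant updateTime;

        Row(String uuid, Instant updateTime) {
            this.uuid = uuid;
            this.updateTime = updateTime;
        }
    }

    // Plays the role of :sql_last_value, which Logstash persists in last_run_metadata_path.
    private Instant lastValue = Instant.EPOCH;

    /** One scheduled run: return rows newer than lastValue, then advance lastValue. */
    List<Row> runOnce(List<Row> table) {
        List<Row> changed = new ArrayList<>();
        Instant newest = lastValue;
        for (Row row : table) {
            if (row.updateTime.isAfter(lastValue)) {   // WHERE update_time > :sql_last_value
                changed.add(row);
                if (row.updateTime.isAfter(newest)) {
                    newest = row.updateTime;
                }
            }
        }
        lastValue = newest;
        return changed;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("a", Instant.parse("2019-10-01T08:00:00Z")));
        table.add(new Row("b", Instant.parse("2019-10-01T09:00:00Z")));

        IncrementalSyncSketch sync = new IncrementalSyncSketch();
        System.out.println(sync.runOnce(table).size()); // 2 (initial full load)
        System.out.println(sync.runOnce(table).size()); // 0 (nothing changed)

        table.add(new Row("c", Instant.parse("2019-10-01T10:00:00Z")));
        System.out.println(sync.runOnce(table).size()); // 1 (only the new row)
    }
}
```

The sketch also shows why this approach cannot see hard deletes: a row that disappears from the table never appears in any later result set.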

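Other notes:

1) At present only MySQL insert and update operations are synced; delete operations are not. The usual workaround is a soft delete: add a deletion flag and turn each delete into an update that sets the flag. If the business does issue real deletes, you can also delete the document from the Java side, by ES index name and document ID, after the MySQL operation completes.

2) Sync performance: a single table with 800,000 rows took about 10 minutes. After that initial load, incremental data syncs almost immediately (each change is picked up on the next scheduled run).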
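For the delete case in note 1, one option is to call the ES document API directly after the MySQL delete succeeds. The sketch below uses only the JDK's java.net.http client (Java 11 or later) so that it runs without the ES client library. The class EsDeleteSketch and its method names are hypothetical, and the host, index name and document ID in main are placeholders taken from the examples in this post.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Deletes one document by index name and document ID through the ES REST API. */
class EsDeleteSketch {

    /** Builds DELETE <baseUrl>/<index>/_doc/<id> without sending it. */
    static HttpRequest buildDeleteRequest(String baseUrl, String indexName, String docId) {
        // URLEncoder produces form encoding, so turn "+" back into "%20" for use in a path
        String encodedId = URLEncoder.encode(docId, StandardCharsets.UTF_8).replace("+", "%20");
        URI uri = URI.create(baseUrl + "/" + indexName + "/_doc/" + encodedId);
        return HttpRequest.newBuilder(uri).DELETE().build();
    }

    /** Sends the request and returns the HTTP status (200 if deleted, 404 if the document did not exist). */
    static int deleteDocument(String baseUrl, String indexName, String docId)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildDeleteRequest(baseUrl, indexName, docId), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        HttpRequest request = buildDeleteRequest("http://127.0.0.1:9200", "testa", "test2");
        // prints: DELETE http://127.0.0.1:9200/testa/_doc/test2
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The document ID passed in must be the same value that Logstash used as document_id (the uuid column in the config), otherwise the request targets a document that does not exist.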

Appendix: input, filter and output configuration reference

JDBC connection settings

Setting	Input type	Required	Default	Description
jdbc_connection_string	string	Yes	N/A	JDBC connection string, e.g. jdbc:mysql://localhost:3306/mydb
jdbc_default_timezone	string	No	N/A	Default time zone; SQL timestamps are converted to UTC by default. Example: jdbc_default_timezone => "Asia/Shanghai"
jdbc_driver_class	string	Yes	N/A	e.g. jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_driver_library	string	No	N/A	e.g. jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"; if unset, the driver is looked up on the local Java classpath
jdbc_user	string	Yes	N/A	Database user
jdbc_password	password	No	N/A	Database password
jdbc_password_filepath	a valid filesystem path	No	N/A	Path to the file that holds the database password
jdbc_pool_timeout	number	No	5	Connection pool timeout, 5 s by default
jdbc_validate_connection	boolean	No	false	Validate the connection before use; off by default
jdbc_validation_timeout	number	No	3600	How often to validate the connection, 3600 s by default
connection_retry_attempts	number	No	1	Maximum number of attempts to connect to the database
connection_retry_attempts_wait_time	number	No	0.5	Seconds to sleep between connection attempts

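SQL settings

Setting	Input type	Required	Default	Description
sequel_opts	hash	No	{}	Optional connection options
statement	string	No	N/A	The statement to execute; use for short SQL
statement_filepath	a valid filesystem path	No	N/A	Path to a file containing the SQL statement; use for long SQL
lowercase_column_names	boolean	No	true	Column names are converted to lowercase by default
columns_charset	hash	No	{}	Character set for specific columns, e.g. columns_charset => { "column0" => "ISO-8859-1" }
parameters	hash	No	{}	Custom parameter values, e.g. "target_id" => "321"
jdbc_paging_enabled	boolean	No	false	Whether to page; off by default. When true, paging uses LIMIT and OFFSET, which can perform poorly on very large data sets.
jdbc_page_size	number	No	100000	Rows per page
jdbc_fetch_size	number	No	N/A	JDBC fetch size; if unset, the driver's own default applies. Without paging, a value that is too large can cause an OOM.
schedule	string	No	N/A	Execution schedule as a cron expression, e.g. schedule => "* * * * *" runs every minute. If unset, the statement runs only once.
sql_log_level	string, one of ["fatal", "error", "warn", "info", "debug"]	No	info	SQL log level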

Sync settings

Setting	Input type	Required	Default	Description
clean_run	boolean	No	false	If true, sql_last_value is ignored and every start re-initializes all data
record_last_run	boolean	No	true	Saves the last run state by default
last_run_metadata_path	string	No	$HOME/.logstash_jdbc_last_run	Path of the file that stores the last run state
use_column_value	boolean	No	false	If true, :sql_last_value is the value of tracking_column; if false, it is the timestamp of the last run
tracking_column	string	No	N/A	The column to track (the source of :sql_last_value); requires use_column_value set to true
tracking_column_type	string, one of ["numeric", "timestamp"]	No	numeric	Type of the tracking column, numeric by default

Output

ES

ES cluster settings and connection

Setting	Input type	Required	Default	Description
hosts	uri	No	[//127.0.0.1]	ES hosts, e.g. "127.0.0.1:9200","127.0.0.2:9200"
path	string	No	N/A	HTTP path at which the ES server lives (needed when ES sits behind a proxy)
proxy	uri	No	N/A	Proxy address
user	string	No	N/A	ES user
password	password	No	N/A	ES password
sniffing	boolean	No	false	Whether to discover ES cluster nodes dynamically. On 1.x and 2.x all nodes with http.enabled are added; on 5.x and 6.x master nodes are excluded.
sniffing_delay	number	No	5	Interval between sniffing attempts, 5 s
sniffing_path	string	No	N/A
timeout	number	No	60	Request timeout
pool_max	number	No	1000	Maximum size of the connection pool
pool_max_per_route	number	No	100	Maximum number of pooled connections per route
validate_after_inactivity	number	No	10000	Idle time after which a connection is checked for staleness before a request is executed
resurrect_delay	number	No	5
retry_initial_interval	number	No	2	Initial interval (seconds) between bulk retries; doubled on each retry up to retry_max_interval
retry_max_interval	number	No	64	Maximum interval (seconds) between bulk retries
retry_on_conflict	number	No	1	Number of retries when an update or upsert hits a conflict
http_compression	boolean	No	false	Whether to gzip-compress requests; off by default
custom_headers	hash	No	N/A	Custom HTTP headers
failure_type_logging_whitelist	array	No	[]	Whitelist of ES error types that should not be logged
healthcheck_path	string	No	N/A

Sync settings

Setting	Input type	Required	Default	Description
action	string	No	index	Bulk action. index: replace the whole document; delete: delete by document ID; create: create the document, fails if the ID already exists; update: update the document, fails if the ID does not exist
doc_as_upsert	boolean	No	false	In update mode, when true a document whose ID does not exist is created
upsert	string	No	""	Upsert content
bulk_path	string	No	N/A	Bulk request path, path + '_bulk'
index	string	No	logstash-%{+YYYY.MM.dd}	Index name
document_id	string	No	N/A	Document ID
document_type	string	No	N/A	Document type
parent	string	No	nil	For child documents, the ID of the associated parent document
pipeline	string	No	nil	Name of the ingest pipeline to execute
routing	string	No	N/A	Routing
version	string	No	N/A
version_type	string, one of ["internal", "external", "external_gt", "external_gte", "force"]	No	N/A
manage_template	boolean	No	true	By default Logstash installs its own index template (for logstash-* indices); set to false to manage templates manually
template	a valid filesystem path	No	N/A	Template path
template_name	string	No	logstash	Template name
template_overwrite	boolean	No	false
parameters	hash	No	N/A	Query parameters
script	string	No	N/A	Update script, e.g. script => "ctx._source.message = params.event.get('message')"
script_lang	string	No	painless	Script language; with ES 6.0+ it must be set to "" when using stored (indexed) scripts
script_type	string, one of ["inline", "indexed", "file"]	No	inline	Script type
script_var_name	string	No	event	Name of the script variable
scripted_upsert	boolean	No	false	If true, the script is responsible for creating documents that do not exist
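The difference between the four action values in the table is easiest to see side by side. The following plain-Java simulation is a simplified sketch: a HashMap stands in for the index, the class name OutputActionSketch is hypothetical, and update is modeled as a full replacement, whereas a real ES update merges a partial document into the existing one.

```java
import java.util.HashMap;
import java.util.Map;

/** Models index / create / update / delete against an in-memory "index" keyed by document ID. */
class OutputActionSketch {
    private final Map<String, String> documents = new HashMap<>();

    /** Applies one action and reports whether it succeeded. */
    boolean apply(String action, String documentId, String source) {
        switch (action) {
            case "index":   // create or fully replace
                documents.put(documentId, source);
                return true;
            case "create":  // fails if the ID already exists
                return documents.putIfAbsent(documentId, source) == null;
            case "update":  // fails if the ID does not exist
                return documents.replace(documentId, source) != null;
            case "delete":  // fails if the ID does not exist
                return documents.remove(documentId) != null;
            default:
                throw new IllegalArgumentException("unknown action: " + action);
        }
    }

    String get(String documentId) {
        return documents.get(documentId);
    }

    public static void main(String[] args) {
        OutputActionSketch index = new OutputActionSketch();
        System.out.println(index.apply("create", "1", "v1")); // true
        System.out.println(index.apply("create", "1", "v2")); // false, ID already exists
        System.out.println(index.apply("update", "2", "v1")); // false, ID does not exist
        System.out.println(index.apply("index", "1", "v3"));  // true, replaces v1
        System.out.println(index.get("1"));                   // v3
        System.out.println(index.apply("delete", "1", null)); // true
    }
}
```

With the default action (index) and a stable document_id, re-running the same sync is idempotent: each row overwrites its own document instead of creating a duplicate.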

SSL settings

Setting	Input type	Required	Default	Description
ssl	boolean	No	N/A	Enable HTTPS for ES; hosts must then start with https://
ssl_certificate_verification	boolean	No	true	Verify the SSL certificate
cacert	a valid filesystem path	No	N/A	Path to a .cer or .pem certificate
keystore	a valid filesystem path	No	N/A	Path to a .jks or .p12 keystore
keystore_password	password	No	N/A	Keystore password
truststore	a valid filesystem path	No	N/A	Path to the JKS truststore
truststore_password	password	No	N/A	JKS truststore password

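Kibana

A data visualization tool for the ES database; think of it as something like Navicat. For how to use it and for the KQL rules, see the official site; this document does not cover them. (Mostly because I don't understand it that well myself - -!!!)

Getting Started with Elasticsearch and ELK (version 7.3.2). If you found this useful, please like and comment instead of just reading for free! If you found problems, corrections are welcome, so that I don't mislead anyone.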
