Syncing MySQL to Elasticsearch (logstash-input-jdbc) and some query problems

On Linux:

Installing Logstash:
1. Import the public signing key

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

2. Add the yum repository

vim  /etc/yum.repos.d/logstash.repo

Write the following into the file:

[logstash-5.x]
name=Elastic repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

3. Install with yum

yum install logstash

4. Verify the installation
Go to the Logstash installation directory

cd /usr/share/logstash

Run

bin/logstash -e 'input { stdin { } } output { stdout {} }'

After a few seconds you should see

The stdin plugin is now waiting for input:

Then type
hello world

If the text you typed is echoed back, the installation works.

Installing the logstash-input-jdbc plugin:

1. Change the Ruby gem mirror
Install gem first if it is not already installed

yum install gem

Switch to a mirror in China

gem sources --add https://gems.ruby-china.org/ --remove https://rubygems.org/

Verify the change

gem sources -l

If the mirror URL above is listed, the change worked.

Change the source address in the Gemfile:

whereis logstash # shows where Logstash is installed; the default is /usr/share/logstash

cd /usr/share/logstash

vim Gemfile

Change the value of source to: "https://gems.ruby-china.org/"

vim  Gemfile.jruby-1.9.lock # find remote and change its value to: https://gems.ruby-china.org/

Then start the installation

bin/logstash-plugin  install logstash-input-jdbc

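The installation shows no progress bar, so do not assume it has hung. I thought it had and interrupted it by hand once.

2. Start syncing the MySQL data

Files needed: one .conf file and X .sql files (X >= 0; the .sql files are optional)

Download the MySQL Java driver from the MySQL website: mysql-connector-java-5.1.44-bin.jar

Below is a .conf file that imports several tables: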

input {
    stdin {
    }
    jdbc {
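      # database to connect to
      jdbc_connection_string => "jdbc:mysql://xxx.xxx.xxx.xxx:3306/dbname"
      jdbc_user => "root"
      jdbc_password => "xxxxx"
      # path to the JDBC driver jar
      jdbc_driver_library => "mysql-connector-java-5.1.44-bin.jar"
      # driver class for MySQL Connector/J 5.x; usually left as is
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      # fetch the result set in pages; usually left as is
      jdbc_paging_enabled => "true"
      # rows per page; usually left as is
      jdbc_page_size => "50000"
      # SQL file to execute
      statement_filepath => "estest1.sql"
      # statement => "write the SQL inline here instead of using a file; handy for short queries"
      # cron-style schedule; "* * * * *" runs every minute
      schedule => "* * * * *"
      # type is used in the output section to route events. If your table has its own type column and you need it, either rename it with AS in the SQL or use a different name here
      type => "a_data"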
    }
    jdbc {
      jdbc_connection_string => "jdbc:mysql://xxx.xxx.xxx.xxx:3306/dbname"
      jdbc_user => "root"
      jdbc_password => "xxxx"
      jdbc_driver_library => "mysql-connector-java-5.1.44-bin.jar"
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      jdbc_paging_enabled => "true"
      jdbc_page_size => "50000"
      statement_filepath => "estest2.sql"
      schedule => "* * * * *"
      type => "b_data"
    }
    jdbc {
      jdbc_connection_string => "jdbc:mysql://xxx.xxx.xxx.xxx:3306/dbname"
      jdbc_user => "root"
      jdbc_password => "xxxx"
      jdbc_driver_library => "mysql-connector-java-5.1.44-bin.jar"
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      jdbc_paging_enabled => "true"
      jdbc_page_size => "50000"
      statement_filepath => "estest3.sql"
      schedule => "* * * * *"
      type => "c_data"
    }
}

output {
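    # route events by the type set in the input section
    if[type] == "a_data"{
        elasticsearch {
        hosts  => "xxx.xxx.xxx.xxx:9200"
        # index name
        index => "estest"
        # document type
        document_type => "a_data"
        # document id: uses the id column from the SQL result; if the query has no id column, alias a unique column AS id
        document_id => "%{id}"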
        }
    }
    if[type] == "b_data"{
        elasticsearch {
        hosts  => "xxx.xxx.xxx.xxx:9200"
        index => "estest"
        document_type => "b_data"
        document_id => "%{id}"
        }
    }
    if[type] == "c_data"{
        elasticsearch {
        hosts  => "xxx.xxx.xxx.xxx:9200"
        index => "estest"
        document_type => "c_data"
        document_id => "%{id}"
        }
    }
    # also print each event to the console
    stdout {
        codec => json_lines
    }
}

This syncs three tables, one per jdbc block.

Write each .sql file to suit your own needs, for example:
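
The jdbc blocks and their matching output blocks differ only in the type and the .sql file name, so a mismatch between an input type and an output condition is easy to introduce by hand. As a rough sketch (the helper names jdbc_block, output_block and render_pipeline are made up for this example; the option names and placeholder values are copied from the config above), the file could be generated from a list of tables:

```python
def jdbc_block(doc_type, sql_file, conn="jdbc:mysql://xxx.xxx.xxx.xxx:3306/dbname"):
    """Render one jdbc input block (hypothetical helper, placeholder credentials)."""
    lines = [
        "    jdbc {",
        f'      jdbc_connection_string => "{conn}"',
        '      jdbc_user => "root"',
        '      jdbc_password => "xxxx"',
        '      jdbc_driver_library => "mysql-connector-java-5.1.44-bin.jar"',
        '      jdbc_driver_class => "com.mysql.jdbc.Driver"',
        '      jdbc_paging_enabled => "true"',
        '      jdbc_page_size => "50000"',
        f'      statement_filepath => "{sql_file}"',
        '      schedule => "* * * * *"',
        f'      type => "{doc_type}"',
        "    }",
    ]
    return "\n".join(lines)


def output_block(doc_type, index="estest", hosts="xxx.xxx.xxx.xxx:9200"):
    """Render the output condition that routes one type to Elasticsearch."""
    lines = [
        f'    if [type] == "{doc_type}" {{',
        "        elasticsearch {",
        f'            hosts => "{hosts}"',
        f'            index => "{index}"',
        f'            document_type => "{doc_type}"',
        '            document_id => "%{id}"',
        "        }",
        "    }",
    ]
    return "\n".join(lines)


def render_pipeline(tables):
    """Build the whole config so every input type has exactly one output."""
    inputs = "\n".join(jdbc_block(t, f) for t, f in tables)
    outputs = "\n".join(output_block(t) for t, _ in tables)
    return f"input {{\n{inputs}\n}}\n\noutput {{\n{outputs}\n}}\n"


# (type, sql file) pairs; the file names follow the estestN.sql pattern used above
TABLES = [("a_data", "estest1.sql"), ("b_data", "estest2.sql"), ("c_data", "estest3.sql")]
config = render_pipeline(TABLES)
print(config)
```

Redirecting the printed text into the .conf file gives one jdbc block and one output condition per entry in TABLES.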

SELECT * FROM xxx WHERE update_time> :sql_last_value

The update_time column drives incremental sync (a unique id column also works); without a WHERE clause every run is a full sync.
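
The idea behind the :sql_last_value filter can be illustrated without Logstash or MySQL. The sketch below is only a simulation: it uses an in-memory SQLite database, a made-up items table, and a plain variable as the marker, whereas Logstash stores and updates the marker itself. It shows the style in which the marker follows a column value.

```python
import sqlite3

# Hypothetical table standing in for the MySQL source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, update_time INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [(1, "a", 100), (2, "b", 200)])


def sync_once(conn, last_value):
    """Fetch rows changed since last_value; return them with the new marker."""
    rows = conn.execute(
        "SELECT id, name, update_time FROM items "
        "WHERE update_time > ? ORDER BY update_time",
        (last_value,),
    ).fetchall()
    if rows:
        # Remember the newest update_time seen so far.
        last_value = rows[-1][2]
    return rows, last_value


# First run: the marker starts at 0, so every row is returned (a full sync).
first, marker = sync_once(conn, 0)

# A row changes in the source table.
conn.execute("UPDATE items SET name = 'b2', update_time = 300 WHERE id = 2")

# Second run: only the row updated after the marker is returned.
second, marker = sync_once(conn, marker)
print(first, second, marker)
```

Because the document id is taken from the id column, re-sending a changed row overwrites the existing document in the index instead of creating a duplicate.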

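One problem showed up in ES queries, and I still have not found the cause: long values with many digits cannot be found by search, while one- or two-digit long values can. I had no real fix for this, so during sync I convert the MySQL numeric columns to strings with the CONVERT function: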

SELECT CONVERT(e.`xx_id`, CHAR) AS xx_id FROM xxx e WHERE update_time > :sql_last_value

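This way the values arrive in ES as strings.

Sometimes ES takes over a complex MySQL query whose SQL contains a condition such as (a or b) and (c or d or e or f) and g.

The ES query looks like this: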

{
    "query": {
        "bool": {
            # must: every clause has to match, equivalent to AND
            "must": [
                {
                    "match": {
                        "g": "1111"
                    }
                },
                {
                    "bool": {
                        # should: equivalent to OR
                        "should": [
                            {
                                "match": {
                                    "a": "1789104"
                                }
                            },
                            {
                                "match": {
                                    "b": "1789104"
                                }
                            }
                        ]
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "match": {
                                    "c": "有限公司"
                                }
                            },
                            {
                                "match": {
                                    "d": "有限公司"
                                }
                            },
                            {
                                "match": {
                                    "e": "有限公司"
                                }
                            },
                            {
                                "match": {
                                    "f": "有限公司"
                                }
                            }
                        ]
                    }
                }
            ],
            # must_not: clauses that must not match
            "must_not": [],
            "should": []
        }
    },
    # offset of the first hit to return
    "from": 0,
    # number of hits to return
    "size": 20,
    "sort": [],
    "aggs": {}
}

This query is the equivalent of the SQL condition (a or b) and (c or d or e or f) and g.

Complex matching is built by nesting bool clauses that combine must (AND) and should (OR).
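
Writing the nested JSON by hand gets error-prone as conditions grow. As a small sketch (match, any_of and all_of are helper names invented here, not part of any ES client), the same query body can be assembled in Python and serialized with the standard json module:

```python
import json


def match(field, value):
    """One match clause on a single field."""
    return {"match": {field: value}}


def any_of(*clauses):
    """OR: a bool with only should clauses needs at least one of them to match."""
    return {"bool": {"should": list(clauses)}}


def all_of(*clauses):
    """AND: every clause under must has to match."""
    return {"bool": {"must": list(clauses)}}


# (a or b) and (c or d or e or f) and g, with the values from the query above
query = {
    "query": all_of(
        match("g", "1111"),
        any_of(match("a", "1789104"), match("b", "1789104")),
        any_of(*(match(field, "有限公司") for field in "cdef")),
    ),
    "from": 0,
    "size": 20,
}

# ensure_ascii=False keeps the Chinese search term readable in the output
body = json.dumps(query, ensure_ascii=False, indent=2)
print(body)
```

The printed body can be sent as the search request; the empty must_not, should, sort and aggs entries from the hand-written version are simply left out.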

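When searching in ES you may not know whether the keyword is Chinese, English or numeric, yet still need fairly precise matching. Wildcards or regular expressions can help (I have not used regular expressions, so I cannot comment on them; wildcards work for letters, digits, or a mix of both).

Below is an (a or b) and (c or d) match in which c uses the "wildcard" keyword for wildcard patterns. One thing to watch: even if the data shown in head contains uppercase letters, an uppercase pattern finds nothing; only a lowercase pattern matches. The likely reason is that an analyzed text field is lowercased by the default analyzer at index time, while the wildcard pattern itself is not analyzed. For a better user experience, convert the letters the user types to lowercase before matching.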

{
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [
                            {
                                "match": {
                                    "a": "18396893"
                                }
                            },
                            {
                                "match": {
                                    "b": "18396893"
                                }
                            }
                        ]
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "wildcard": {
                                    "c": "*3zz*"
                                }
                            },
                            {
                                "match": {
                                    "d": "項目名稱"
                                }
                            }
                        ]
                    }
                }
            ],
            "must_not": [],
            "should": []
        }
    },
    "from": 0,
    "size": 20,
    "sort": [],
    "aggs": {}
}
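
The lowercasing step can be done where the query is built. A minimal sketch, assuming the search term comes in as raw user input (wildcard_contains is a made-up helper name); it also escapes * and ? with a backslash so that those characters in the input are treated literally rather than as wildcards:

```python
def wildcard_contains(field, user_input):
    """Build a 'contains' wildcard clause from raw user input."""
    # Lowercase so the pattern lines up with the lowercased indexed terms.
    term = user_input.strip().lower()
    # Escape the backslash first, then the wildcard metacharacters.
    for ch in ("\\", "*", "?"):
        term = term.replace(ch, "\\" + ch)
    return {"wildcard": {field: f"*{term}*"}}


clause = wildcard_contains("c", " 3ZZ ")
print(clause)
```

The returned dict takes the place of the hand-written wildcard clause in the should list above. A pattern with a leading * can be slow on a large index, so it is worth keeping such searches limited to fields that need them.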
