Syncing MySQL data to ES with Logstash 7.0.0, storing geo location data and enabling word segmentation on write

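Scenario:

1. The data comes from a Python crawler that scrapes Meituan and Anjuke. After ETL cleaning in Java it is batch-inserted into MySQL. As the data volume surges to tens or even hundreds of millions of rows, MySQL queries become slow, and some features the business needs, such as word segmentation, are not supported.

2. We evaluated Hive and found its GIS support unfriendly. Hive does not allow a column to be declared with a geometry type when creating a table, and GIS functions are not officially supported. They can, however, be added through UDF extensions; see: https://blog.csdn.net/qq_32252917/article/details/105378848

3. We evaluated ElasticSearch and settled on version 7.0.0.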

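Environment setup: install Logstash, ElasticSearch, and Kibana

References:

Installing logstash and logstash-input-jdbc

Installing the IK analyzer on ElasticSearch 7.0.0

Findings: ElasticSearch supports geo location queries and word-segmented (analyzed) queries, it offers a solution for SQL, and it integrates with Spring Boot, which is very convenient.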

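Sample of the cleaned data (tab-separated; the columns appear to follow the field order of the query in jdbc.sql further down, with the sixth column holding the point as WKT, longitude first):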

1	1070934115905789	張亮麻辣燙	30.147708	120.078346	POINT (120.078346 30.147708)	杭州市轉塘街道金街美地商業中心3-133	4816	4.4000001		8	20	20	快餐小喫	轉塘	10:00-23:59	330100	杭州市	2020-03-27 15:16:37.0	
2	1088436107667681	石小吞	30.139899	120.072293	POINT (120.072293 30.139899)	轉塘街道之江泰景大廈2號樓113室-2	4802	4.5		25	20	24	中式簡餐	首爾印象	10:00-21:00	330100	杭州市	2020-03-27 15:16:37.0	
3	969590067570034	蜀匯香麻辣香鍋	30.141992	120.071456	POINT (120.071456 30.141992)	杭州市轉塘街道霞鳴街159號、161號(之江商務中心1號樓商117、118)	4141	3.9000001		5	32	21	麻辣香鍋	首爾印象	09:40-22:30	330100	杭州市	2020-03-27 15:16:37.0	
4	879266905368275	JIMU佶慕創意生日蛋糕	30.276493	120.095462	POINT (120.095462 30.276493)	五聯西苑51號103室	3986	4.5999999		0	0	115	生日蛋糕		06:00-21:00	330100	杭州市	2020-03-27 15:16:37.0	
5	1007591938247898	暖愛蛙蝦跳	30.150440	120.078756	POINT (120.078756 30.15044)	轉塘鎮美院南街象山國際西面2號樓	3983	4.4000001		8	20	19	中式簡餐	轉塘	10:00-23:00	330100	杭州市	2020-03-27 15:16:37.0	
6	953075918354303	杭粥西糊	30.150265	120.078895	POINT (120.078895 30.150265)	浙江省杭州市西湖區轉塘街道美院南街89號2號樓2樓216室	3071	4.69999981		8	0	15	快餐小喫	轉塘	07:00-21:00	330100	杭州市	2020-03-27 15:16:37.0	
7	1026945060845891	七號の茶	30.147755	120.079269	POINT (120.079269 30.147755)	轉塘街道金街美地商業中心2號樓118商鋪	3050	4.5999999		55	20	12	奶茶果汁	轉塘	09:45-20:45	330100	杭州市	2020-03-27 15:16:37.0	
8	891503267213030	二條輕食	30.143201	120.069550	POINT (120.06955 30.143201)	轉塘街道萬美商務中心5號樓313號	2715	4.69999981		75	18	23	沙拉		10:00-20:00	330100	杭州市	2020-03-27 15:16:37.0	
9	885451658268086	韓味購炸雞啤酒屋	30.147508	120.078562	POINT (120.078562 30.147508)	轉塘金街美的商業中心3號樓206室(一點點樓上)	2375	4.5999999	品牌	55	20	30	炸雞炸串	轉塘	00:00-02:00,09:40-21:00	330100	杭州市	2020-03-27 15:16:37.0	
10	1010946307684654	鍋sir時尚火鍋外賣	30.299309	120.113058	POINT (120.113058 30.299309)	拱墅區塘萍路157號	2137	4.30000019	品牌	68	30	83	小火鍋	城西銀泰	00:00-03:00,09:30-23:59	330100	杭州市	2020-03-27 15:16:37.0	

Syncing MySQL data to ElasticSearch

1. First install ElasticSearch, Logstash, and Kibana (the head plugin is used with ElasticSearch).

2. Create the index in ElasticSearch:

curl -XPUT "http://127.0.0.1:9200/mt"

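Creating it directly with the head plugin also works.

To import geographic coordinates, the gis field must be mapped as geo_point. There are several ways to do this; two are listed here:

1) Use an index template to map the gis field as a geographic coordinate type (geo_point)

2) Map the gis field as geo_point directly from the Kibana console

I use the second approach here, sending the request with Postman rather than the Kibana console: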

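POST http://ip:9200/mt/_mapping

{
    "properties": {
        "gis": {
            "type": "geo_point"
        }
    }
}

Only this one special field needs to be mapped explicitly; the remaining fields are matched to suitable types automatically during import. Set this mapping before the first import, because the type of an existing field cannot be changed afterwards.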
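
As a rough illustration of what the gis field may contain, the sketch below builds two of the representations a geo_point field accepts: a "lat,lon" string and an object with lat and lon keys. The helper names to_geo_string and to_geo_object are made up for this example and are not part of any Elasticsearch client.

```python
def _check(lat: float, lon: float) -> None:
    """Reject coordinates outside the valid latitude/longitude ranges."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")


def to_geo_string(lat: float, lon: float) -> str:
    """Return the "lat,lon" string form (latitude first)."""
    _check(lat, lon)
    return f"{lat},{lon}"


def to_geo_object(lat: float, lon: float) -> dict:
    """Return the object form; the keys must be named lat and lon."""
    _check(lat, lon)
    return {"lat": lat, "lon": lon}


if __name__ == "__main__":
    # Coordinates taken from the first sample row above.
    print(to_geo_string(30.147708, 120.078346))
    print(to_geo_object(30.147708, 120.078346))
```

Note that the string form puts latitude first, whereas the WKT values in the sample data (POINT (120.078346 30.147708)) put longitude first, so the two must not be mixed up when building the field.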

3. Install the Logstash plugins:

logstash-input-jdbc

logstash-output-elasticsearch

Change to the logstash-7.0.0 home directory:

mkdir templete
cd templete
vi logstash.json

Contents of logstash.json: a Logstash template that makes string fields analyzed (word-segmented) with the IK analyzer plugin when they are imported into ElasticSearch

{
  "index_patterns": ["*"],
  "order": 0,
  "version": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "date_detection": true,
    "numeric_detection": true,
    "dynamic_templates": [
      {
        "string_fields": {
          "match": "*",
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "norms": false,
            "analyzer": "ik_max_word",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    ]
  }
}
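
To make the effect of the dynamic template easier to picture, here is a deliberately simplified Python model of it. It only distinguishes string values from everything else and ignores the date and numeric detection that the template also enables, so treat it as a sketch of the idea rather than a description of how Elasticsearch resolves mappings. The function names are hypothetical.

```python
# Mapping applied by the "string_fields" dynamic template in the listing above.
STRING_FIELD_MAPPING = {
    "type": "text",
    "norms": False,
    "analyzer": "ik_max_word",
    "fields": {"keyword": {"type": "keyword"}},
}


def dynamic_mapping(value):
    """Simplified model: string values get the template mapping, others do not.

    Date and numeric detection are intentionally not modelled here.
    """
    if isinstance(value, str):
        return STRING_FIELD_MAPPING
    return None  # left to Elasticsearch's default dynamic mapping


def queryable_fields(field_name, value):
    """List the field paths that can be queried for a newly seen field."""
    mapping = dynamic_mapping(value)
    if mapping is None:
        return [field_name]
    # The analyzed text field itself plus each multi-field, e.g. shop_name.keyword.
    return [field_name] + [f"{field_name}.{sub}" for sub in mapping["fields"]]


if __name__ == "__main__":
    print(queryable_fields("shop_name", "張亮麻辣燙"))
    print(queryable_fields("month_sales", 4816))
```

The point of the keyword sub-field is that the same column can serve both full-text search (through the analyzed text field) and exact matching or aggregation (through the .keyword sub-field).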
cd ../bin
mkdir config
cd config 
#====================================================
vi jdbc.conf


input {
    jdbc {
      jdbc_connection_string => "jdbc:mysql://xxx:3306/xxx?characterEncoding=UTF-8&useSSL=false&autoReconnect=true"
      jdbc_user => "xxx"
      jdbc_password => "xxx"
      jdbc_driver_library => "/data/app/mysql-connector-java-5.1.46.jar"
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      jdbc_paging_enabled => "true"
      jdbc_page_size => "50000"

      jdbc_default_timezone => "Asia/Shanghai"

      statement_filepath => "/data/app/logstash-7.0.0/bin/config/jdbc.sql"
      schedule => "* * * * *"
      type => "jdbc"
    }
}



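filter {
    # Convert the two coordinate columns returned by the SQL query to floats
    mutate {
      convert => {
        "longitude" => "float"
        "latitude" => "float"
      }
    }

    # The gis field is already built in jdbc.sql as a "latitude,longitude" string,
    # which a geo_point field accepts, so the two columns need no further merging here.
    # If you prefer to send gis as an object instead, its keys must be named
    # lat and lon, otherwise indexing fails.
}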


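# Elasticsearch 7.x allows only one type per index
output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "mt"
        document_type => "_doc"
        document_id => "%{id}"
        # To apply the IK template from templete/logstash.json, reference it here, e.g.
        #   template => "/data/app/logstash-7.0.0/templete/logstash.json"
        #   template_overwrite => true
        # (path assumed from the directories used above). A template only affects
        # indices that are created after it has been installed.
    }
}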
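
The pipeline above can be pictured as a function from one MySQL row to one indexing action. The following Python sketch mimics that under simplifying assumptions: row_to_action is a hypothetical helper, and fields Logstash adds on its own, such as @timestamp and @version, are left out.

```python
def row_to_action(row: dict, index: str = "mt") -> dict:
    """Turn one JDBC row into an index action, roughly as the pipeline does."""
    doc = dict(row)  # copy so the caller's row is not modified
    # mutate/convert: make sure both coordinates are floats.
    for key in ("longitude", "latitude"):
        doc[key] = float(doc[key])
    # The jdbc input sets type => "jdbc" on every event.
    doc["type"] = "jdbc"
    # document_id => "%{id}" reuses the MySQL primary key as the document _id.
    return {"_index": index, "_id": str(doc["id"]), "_source": doc}


if __name__ == "__main__":
    row = {
        "id": 27924,
        "shop_name": "盱眙兄弟龍蝦",
        "latitude": "31.290725",
        "longitude": "00121.155756",
        "gis": "31.290725,00121.155756",
    }
    print(row_to_action(row))
```

Because the document _id is derived from the primary key, a row that is read again on the next scheduled run overwrites its existing document instead of creating a duplicate.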

#====================================================
vi jdbc.sql


select id,poi_id, shop_name,latitude,longitude,concat_ws(',',latitude,longitude) as gis,address, month_sales, score, type_icon,ship_fee,min_price,average_price,third_category,trade_area,ship_time,city_code,city_name,crawl_time,tag from id_mt_shoplist_test
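
With jdbc_paging_enabled and jdbc_page_size set as in jdbc.conf, the statement is not run as one huge query but split into pages. The sketch below approximates that with LIMIT/OFFSET wrapping; the exact SQL that Logstash generates may differ, and paged_statements is a hypothetical helper used only for illustration.

```python
def paged_statements(statement: str, total_rows: int, page_size: int = 50000):
    """Yield one wrapped query per page (an approximation of jdbc paging)."""
    inner = statement.strip().rstrip(";")
    for offset in range(0, total_rows, page_size):
        yield f"SELECT * FROM ({inner}) AS t1 LIMIT {page_size} OFFSET {offset}"


if __name__ == "__main__":
    sql = "select id, shop_name from id_mt_shoplist_test"
    # 120000 rows with a page size of 50000 need three pages.
    for page in paged_statements(sql, total_rows=120000):
        print(page)
```

Since LIMIT/OFFSET pages are only stable when the row order is fixed, it may be worth adding an ORDER BY on the primary key to the statement when paging over a large table.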

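Start Logstash (from the logstash-7.0.0 home directory):

bin/logstash -f bin/config/jdbc.conf &

Synchronization then starts. Once it has finished, run a query (using Postman):

# Find the shops within 1 km of a point
GET http://ip:9200/mt/_search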


{
    "query": {
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "geo_distance": {
                    "distance": "1km",
                    "gis": {
                        "lat": 31.299600,
                        "lon": 121.156099
                    }
                }
            }
        }
    }
}
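
For intuition about what the geo_distance filter decides, the following Python sketch computes the great-circle distance with the haversine formula and compares it to the radius. The Earth radius constant and the function names are assumptions of this sketch; Elasticsearch's own calculation may differ in detail.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(center, point, distance_km):
    """True if point lies within distance_km of center; both are (lat, lon)."""
    return haversine_km(*center, *point) <= distance_km


if __name__ == "__main__":
    center = (31.299600, 121.156099)  # the query point used above
    print(within(center, (31.290725, 121.155756), 1.0))  # a shop close to the centre
    print(within(center, (30.147708, 120.078346), 1.0))  # a shop in Hangzhou
```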

Result (96 shops match in total; only the first 10 hits are returned by default):

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 96,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "mt",
                "_type": "_doc",
                "_id": "27924",
                "_score": 1.0,
                "_source": {
                    "type_icon": "",
                    "gis": "31.290725,00121.155756",
                    "address": "安亭鎮澤普路600號",
                    "third_category": "小龍蝦",
                    "type": "jdbc",
                    "latitude": 31.290725,
                    "trade_area": "新源路",
                    "average_price": 85.0,
                    "longitude": 121.155756,
                    "city_name": "蘇州市",
                    "poi_id": "1003790892242891",
                    "id": 27924,
                    "@version": "1",
                    "score": 4.300000190734863,
                    "ship_time": "00:00-01:00,09:00-23:59",
                    "month_sales": 155,
                    "min_price": 300.0,
                    "@timestamp": "2020-04-08T01:53:50.402Z",
                    "ship_fee": 48.0,
                    "shop_name": "盱眙兄弟龍蝦",
                    "crawl_time": "2020-04-07T10:47:00.000Z",
                    "city_code": "320500",
                    "tag": ""
                }
            },
            {
                "_index": "mt",
                "_type": "_doc",
                "_id": "28332",
                "_score": 1.0,
                "_source": {
                    "type_icon": "",
                    "gis": "31.290918,00121.156806",
                    "address": "澤浦路599號",
                    "third_category": "小龍蝦",
                    "type": "jdbc",
                    "latitude": 31.290918,
                    "trade_area": "新源路",
                    "average_price": 248.0,
                    "longitude": 121.156806,
                    "city_name": "蘇州市",
                    "poi_id": "921039757353389",
                    "id": 28332,
                    "@version": "1",
                    "score": 0.0,
                    "ship_time": "00:00-01:00,08:50-23:59",
                    "month_sales": 70,
                    "min_price": 300.0,
                    "@timestamp": "2020-04-08T01:53:50.470Z",
                    "ship_fee": 48.0,
                    "shop_name": "辣首龍蝦",
                    "crawl_time": "2020-04-07T10:47:12.000Z",
                    "city_code": "320500",
                    "tag": ""
                }
            },
            {
                "_index": "mt",
                "_type": "_doc",
                "_id": "78062",
                "_score": 1.0,
                "_source": {
                    "type_icon": "",
                    "gis": "31.300276,00121.153649",
                    "address": "安亭鎮新源路796號1層",
                    "third_category": "地方小喫",
                    "type": "jdbc",
                    "latitude": 31.300276,
                    "trade_area": "新源路",
                    "average_price": 23.0,
                    "longitude": 121.153649,
                    "city_name": "上海市",
                    "poi_id": "1003872496642121",
                    "id": 78062,
                    "@version": "1",
                    "score": 4.400000095367432,
                    "ship_time": "07:00-21:35",
                    "month_sales": 238,
                    "min_price": 20.0,
                    "@timestamp": "2020-04-08T01:54:02.687Z",
                    "ship_fee": 2.0,
                    "shop_name": "安亭老街湯糰",
                    "crawl_time": "2020-04-07T11:12:15.000Z",
                    "city_code": "310100",
                    "tag": ""
                }
            },
            {
                "_index": "mt",
                "_type": "_doc",
                "_id": "78066",
                "_score": 1.0,
                "_source": {
                    "type_icon": "品牌",
                    "gis": "31.293330,00121.163573",
                    "address": "上海市嘉定區安亭鎮墨玉路73-75號4幢1層101室",
                    "third_category": "生日蛋糕",
                    "type": "jdbc",
                    "latitude": 31.29333,
                    "trade_area": "安亭",
                    "average_price": 226.0,
                    "longitude": 121.163573,
                    "city_name": "上海市",
                    "poi_id": "1012702949403599",
                    "id": 78066,
                    "@version": "1",
                    "score": 4.300000190734863,
                    "ship_time": "08:00-18:00",
                    "month_sales": 192,
                    "min_price": 100.0,
                    "@timestamp": "2020-04-08T01:54:02.687Z",
                    "ship_fee": 0.0,
                    "shop_name": "GANSO元祖蛋糕",
                    "crawl_time": "2020-04-07T11:12:15.000Z",
                    "city_code": "310100",
                    "tag": ""
                }
            },
            {
                "_index": "mt",
                "_type": "_doc",
                "_id": "78067",
                "_score": 1.0,
                "_source": {
                    "type_icon": "",
                    "gis": "31.298283,00121.157455",
                    "address": "安亭鎮阜康路199弄213號1-1號(安亭幼兒園對面)",
                    "third_category": "奶茶果汁",
                    "type": "jdbc",
                    "latitude": 31.298283,
                    "trade_area": "安亭",
                    "average_price": 17.0,
                    "longitude": 121.157455,
                    "city_name": "上海市",
                    "poi_id": "1084059536052905",
                    "id": 78067,
                    "@version": "1",
                    "score": 5.0,
                    "ship_time": "00:00-10:00,10:00-23:59",
                    "month_sales": 176,
                    "min_price": 85.0,
                    "@timestamp": "2020-04-08T01:54:02.687Z",
                    "ship_fee": 20.0,
                    "shop_name": "MaxSee熱麥喜",
                    "crawl_time": "2020-04-07T11:12:15.000Z",
                    "city_code": "310100",
                    "tag": ""
                }
            },
            {
                "_index": "mt",
                "_type": "_doc",
                "_id": "78070",
                "_score": 1.0,
                "_source": {
                    "type_icon": "",
                    "gis": "31.291620,00121.158791",
                    "address": "新源路198號104室",
                    "third_category": "湯類",
                    "type": "jdbc",
                    "latitude": 31.29162,
                    "trade_area": "新源路",
                    "average_price": 28.0,
                    "longitude": 121.158791,
                    "city_name": "上海市",
                    "poi_id": "882432296311281",
                    "id": 78070,
                    "@version": "1",
                    "score": 4.099999904632568,
                    "ship_time": "00:00-04:00,04:00-23:58",
                    "month_sales": 161,
                    "min_price": 30.0,
                    "@timestamp": "2020-04-08T01:54:02.687Z",
                    "ship_fee": 5.0,
                    "shop_name": "馬氏古法牛肉湯",
                    "crawl_time": "2020-04-07T11:12:16.000Z",
                    "city_code": "310100",
                    "tag": ""
                }
            },
            {
                "_index": "mt",
                "_type": "_doc",
                "_id": "79067",
                "_score": 1.0,
                "_source": {
                    "type_icon": "",
                    "gis": "31.306135,00121.155105",
                    "address": "安亭鎮民豐路950號",
                    "third_category": "麪館",
                    "type": "jdbc",
                    "latitude": 31.306135,
                    "trade_area": "安亭",
                    "average_price": 27.0,
                    "longitude": 121.155105,
                    "city_name": "上海市",
                    "poi_id": "890824662441434",
                    "id": 79067,
                    "@version": "1",
                    "score": 0.0,
                    "ship_time": "00:00-01:45,09:30-23:59",
                    "month_sales": 36,
                    "min_price": 20.0,
                    "@timestamp": "2020-04-08T01:54:02.822Z",
                    "ship_fee": 2.0,
                    "shop_name": "新柴浜麪館",
                    "crawl_time": "2020-04-07T11:12:47.000Z",
                    "city_code": "310100",
                    "tag": ""
                }
            },
            {
                "_index": "mt",
                "_type": "_doc",
                "_id": "79068",
                "_score": 1.0,
                "_source": {
                    "type_icon": "",
                    "gis": "31.305605,00121.152790",
                    "address": "安亭鎮民豐路999號蘭塘菜市場內8號",
                    "third_category": "滷味熟食",
                    "type": "jdbc",
                    "latitude": 31.305605,
                    "trade_area": "新源路",
                    "average_price": 60.0,
                    "longitude": 121.15279,
                    "city_name": "上海市",
                    "poi_id": "896953580776452",
                    "id": 79068,
                    "@version": "1",
                    "score": 0.0,
                    "ship_time": "06:30-20:35",
                    "month_sales": 36,
                    "min_price": 20.0,
                    "@timestamp": "2020-04-08T01:54:02.822Z",
                    "ship_fee": 1.0,
                    "shop_name": "南京鹽水鴨夫妻肺片",
                    "crawl_time": "2020-04-07T11:12:47.000Z",
                    "city_code": "310100",
                    "tag": ""
                }
            },
            {
                "_index": "mt",
                "_type": "_doc",
                "_id": "79075",
                "_score": 1.0,
                "_source": {
                    "type_icon": "",
                    "gis": "31.293785,00121.156816",
                    "address": "安亭鎮新源路274號1層",
                    "third_category": "",
                    "type": "jdbc",
                    "latitude": 31.293785,
                    "trade_area": "安亭",
                    "average_price": 0.0,
                    "longitude": 121.156816,
                    "city_name": "上海市",
                    "poi_id": "974713963614325",
                    "id": 79075,
                    "@version": "1",
                    "score": 0.0,
                    "ship_time": "00:00-23:59",
                    "month_sales": 26,
                    "min_price": 0.0,
                    "@timestamp": "2020-04-08T01:54:02.822Z",
                    "ship_fee": 0.0,
                    "shop_name": "愛尚花藝慶典",
                    "crawl_time": "2020-04-07T11:12:47.000Z",
                    "city_code": "310100",
                    "tag": ""
                }
            },
            {
                "_index": "mt",
                "_type": "_doc",
                "_id": "78856",
                "_score": 1.0,
                "_source": {
                    "type_icon": "",
                    "gis": "31.298157,00121.155791",
                    "address": "安亭鎮阜康西路269號1層",
                    "third_category": "麪包/小蛋糕",
                    "type": "jdbc",
                    "latitude": 31.298157,
                    "trade_area": "新源路",
                    "average_price": 38.0,
                    "longitude": 121.155791,
                    "city_name": "上海市",
                    "poi_id": "931498002711234",
                    "id": 78856,
                    "@version": "1",
                    "score": 5.0,
                    "ship_time": "07:30-21:00",
                    "month_sales": 75,
                    "min_price": 15.0,
                    "@timestamp": "2020-04-08T01:54:02.798Z",
                    "ship_fee": 2.0,
                    "shop_name": "緹小貝麪包坊",
                    "crawl_time": "2020-04-07T11:12:40.000Z",
                    "city_code": "310100",
                    "tag": ""
                }
            }
        ]
    }
}
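
A response of this shape is straightforward to post-process on the client side. The sketch below uses only the Python standard library and a trimmed-down copy of the response above; summarize is a hypothetical helper, not part of any client library.

```python
import json

# A trimmed-down response with the same shape as the result above.
SAMPLE = """
{
  "hits": {
    "total": {"value": 96, "relation": "eq"},
    "hits": [
      {"_id": "27924", "_source": {"shop_name": "盱眙兄弟龍蝦", "gis": "31.290725,00121.155756"}},
      {"_id": "28332", "_source": {"shop_name": "辣首龍蝦", "gis": "31.290918,00121.156806"}}
    ]
  }
}
"""


def summarize(response: dict):
    """Return the total match count and the shop names on this page."""
    total = response["hits"]["total"]["value"]
    names = [hit["_source"]["shop_name"] for hit in response["hits"]["hits"]]
    return total, names


if __name__ == "__main__":
    total, names = summarize(json.loads(SAMPLE))
    print(f"{total} shops matched; this page lists {len(names)}: {names}")
```

The total can be larger than the number of hits on the page; the from and size parameters of the search request control which slice of the matches is returned.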

 
