Table of Contents
- Grey Mall - Advanced Distributed Part 1
- Full-Text Search - ElasticSearch
Grey Mall - Advanced Distributed Part 1
Gitee repo: https://gitee.com/lin_g_g_hui/grey_mall
Full-Text Search - ElasticSearch
Installing with Docker
1. Pull the images
docker pull elasticsearch:7.4.2
docker pull kibana:7.4.2  (Kibana is the visualization tool for Elasticsearch)
2. Create the instance
1. Create the external Elasticsearch config files
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
Note: there must be a space before 0.0.0.0 (after the colon).
If the container fails to start with:
Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes
check the logs with docker logs elasticsearch,
then grant read/write/execute to all users and groups:
chmod -R 777 /mydata/elasticsearch/
2. Run Elasticsearch
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2
Note: -e ES_JAVA_OPTS="-Xms64m -Xmx512m" caps the JVM heap.
Check available memory with:
free -m
3. Run Kibana
docker run --name kibana -e ELASTICSEARCH_URL=http://192.168.80.133:9200 -p 5601:5601 -d kibana:7.4.2
Note: replace the host with your own machine's address, e.g.
http://192.168.80.133:9200/
4. Verify by visiting host + port
http://192.168.80.133:9200/ -> success if JSON is returned
http://192.168.80.133:9200/_cat/nodes -> list the nodes
- Note: if the Kibana UI at http://192.168.80.133:5601/ reports an error connecting to Elasticsearch:
the URL used when running Kibana in step 3 must be the IP of the Elasticsearch container inside Docker. Look it up with:
docker inspect d66aba8770af | grep IPAddress
Result: the IP of container d66aba8770af is 172.17.0.4
Re-run Kibana:
docker run --name kibana -e ELASTICSEARCH_URL=http://172.17.0.4:9200 -p 5601:5601 -d kibana:7.4.2
Then adjust the Kibana config inside the container:
docker exec -it kibana /bin/bash
cd /usr/share/kibana/config/
vi kibana.yml
Change elasticsearch.hosts to your ES container's IP, and set
xpack.monitoring.ui.container.elasticsearch.enabled to false
3. Preliminary retrieval
You can use Postman: turn the GET etc. requests below into host + port URLs.
1. _cat
GET /_cat/nodes : list all nodes
GET /_cat/health : check ES health
GET /_cat/master : show the master node
GET /_cat/indices : list all indices  ==> show databases;
2. Index a document (save)
Saving a document means choosing which index and type it is stored under, and which unique id identifies it. For example, save document 1 under the external type of the customer index:
PUT customer/external/1
Document 1's body:
{
"name": "Wei-xhh"
}
Both PUT and POST work.
POST creates: if no id is given, one is auto-generated; if an id is given and already exists, that document is updated and the version number incremented.
PUT can create or update, but must always specify an id (omitting it is an error); since the id is required, PUT is usually used for updates.
Response:
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
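The PUT/POST semantics just described can be sketched as a toy in Python (purely illustrative: `index_doc` and its in-memory store are hypothetical, not any ES client API):

```python
import uuid

# In-memory stand-in for an index: maps _id -> (_version, _source).
_store = {}

def index_doc(method, source, doc_id=None):
    """Mimic ES indexing semantics: POST auto-generates a missing id,
    PUT requires one; re-indexing an existing id bumps _version."""
    if method == "PUT" and doc_id is None:
        raise ValueError("PUT requires an explicit document id")
    if doc_id is None:                       # POST without id -> new doc
        doc_id = uuid.uuid4().hex
    version = _store[doc_id][0] + 1 if doc_id in _store else 1
    _store[doc_id] = (version, source)
    return {"_id": doc_id, "_version": version,
            "result": "created" if version == 1 else "updated"}

r1 = index_doc("PUT", {"name": "Wei-xhh"}, "1")  # created, _version 1
r2 = index_doc("PUT", {"name": "Wei-xhh"}, "1")  # updated, _version 2
```

Real ES responses also carry _seq_no and _primary_term, which the next section uses for optimistic locking.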
3. Query a document
GET customer/external/1
Response:
{
"_index": "customer", // which index
"_type": "external", // which type
"_id": "1", // document id
"_version": 2, // version number
"_seq_no": 1, // concurrency-control field, incremented on every update; used for optimistic locking
"_primary_term": 1, // similar; changes when the primary shard is reassigned, e.g. after a restart
"found": true,
"_source": { // the actual content
"name": "Wei-xhh"
}
}
Updates can carry ?if_seq_no=0&if_primary_term=1
Optimistic locking under concurrency:
1. Xiao Ming updates document 1 ->
http://192.168.80.133:9200/customer/external/1?if_seq_no=0&if_primary_term=1
2. Xiao Hong updates document 1 ->
http://192.168.80.133:9200/customer/external/1?if_seq_no=0&if_primary_term=1
What happens:
Xiao Ming's update succeeds, and the document's seq_no is bumped automatically.
Xiao Hong, unaware that Xiao Ming already changed it, fails with error code 409.
Xiao Hong must then re-query document 1 to learn its current seq_no.
Having found seq_no=5, Xiao Hong resends:
http://192.168.80.133:9200/customer/external/1?if_seq_no=5&if_primary_term=1
and this time the update succeeds.
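That retry dance can be simulated with a toy document class (a sketch only; in reality the check happens server-side in ES and a stale write surfaces as HTTP 409):

```python
class Doc:
    """Toy document with ES-style optimistic concurrency control."""
    def __init__(self, source):
        self.source, self.seq_no, self.primary_term = source, 0, 1

    def update(self, source, if_seq_no, if_primary_term):
        # Reject writers holding a stale seq_no/primary_term, like ES's 409.
        if (if_seq_no, if_primary_term) != (self.seq_no, self.primary_term):
            return 409
        self.source = source
        self.seq_no += 1          # every successful write bumps seq_no
        return 200

doc = Doc({"name": "Wei-xhh"})
ming = doc.update({"name": "ming"}, if_seq_no=0, if_primary_term=1)   # 200
hong = doc.update({"name": "hong"}, if_seq_no=0, if_primary_term=1)   # 409
retry = doc.update({"name": "hong"}, if_seq_no=doc.seq_no,
                   if_primary_term=1)        # 200 after re-reading seq_no
```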
4. Update a document
POST customer/external/1/_update
This compares with the existing document; if nothing changed, nothing is done:
{
"doc":{
"name":"wei-xhh6666"
}
}
Result: when the document is unchanged, "result": "noop"
and _version and _seq_no do not change:
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 5,
"result": "noop",
"_shards": {
"total": 0,
"successful": 0,
"failed": 0
},
"_seq_no": 7,
"_primary_term": 1
}
Or:
POST customer/external/1
which does not compare with the existing document:
{
"name":"wei-xhh666"
}
Or:
PUT customer/external/1
which likewise does not compare with the existing document:
{
"name":"wei-xhh66"
}
An update can also add new properties at the same time.
5. Delete a document & index
DELETE customer/external/1
DELETE customer
6. The _bulk batch API
POST customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"wei-xhh"}
{"index":{"_id":"2"}}
{"name":"wei-xhh66"}
Syntax:
{action: {metadata}}\n
{request body}\n
{action: {metadata}}\n
{request body}\n
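That line-oriented (NDJSON) format can be generated mechanically; a sketch (the `bulk_body` helper is hypothetical; official clients ship their own bulk helpers):

```python
import json

def bulk_body(pairs):
    """Build an NDJSON _bulk body from (action, doc) pairs. `doc` is None
    for actions like delete that carry no request body. The API requires
    a trailing newline after the last line."""
    lines = []
    for action, doc in pairs:
        lines.append(json.dumps(action))
        if doc is not None:
            lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = bulk_body([
    ({"index": {"_id": "1"}}, {"name": "wei-xhh"}),
    ({"index": {"_id": "2"}}, {"name": "wei-xhh66"}),
])
del_body = bulk_body([({"delete": {"_id": "3"}}, None)])
```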
A more complex example:
POST /_bulk
{ "delete":{ "_index":"website", "_type":"blog", "_id":"123"}}
{ "create":{ "_index":"website", "_type":"blog", "_id":"123"}}
{ "title":"my first blog post"}
{ "index":{ "_index":"website", "_type":"blog"}}
{ "title":"my second blog post"}
{ "update":{ "_index":"website", "_type":"blog", "_id":"123"}}
{ "doc":{ "title":"my updated blog post"}}
7. Sample test data
https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json
The link may be unreachable;
I have a copy of the data, so message me if you cannot access it.
POST /bank/account/_bulk
4. Advanced retrieval
1. Search API
ES supports two basic ways of searching:
- sending the search parameters in the REST request URI (uri + search params)
- sending them in the REST request body (uri + request body)
(Tip: to make the container restart automatically after a reboot: docker update <container-id> --restart=always)
The first way:
GET bank/_search?q=*&sort=account_number:asc
The second way:
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"account_number": "asc"
},
{
"balance": "desc"
}
]
}
2. Query DSL (the domain-specific query language)
1. Typical structure of a query statement:
{
QUERY_NAME:{
ARGUMENT:VALUE,
ARGUMENT:VALUE,...
}
}
- When targeting a particular field, the structure is:
{
QUERY_NAME:{
FIELD_NAME:{
ARGUMENT:VALUE,
ARGUMENT:VALUE,...
}
}
}
Example:
GET bank/_search
{
"query": {"match_all": {}},
"sort": [
{
"balance": {
"order": "asc"
}
}
],
"from": 5,
"size": 3,
"_source": ["balance","age"] // return only these fields
}
Result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "749",
"_score" : null,
"_source" : {
"balance" : 1249,
"age" : 36
},
"sort" : [
1249
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "402",
"_score" : null,
"_source" : {
"balance" : 1282,
"age" : 32
},
"sort" : [
1282
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "315",
"_score" : null,
"_source" : {
"balance" : 1314,
"age" : 33
},
"sort" : [
1314
]
}
]
}
}
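The from/size window in the request above is how page-style paging is usually built; a minimal sketch (`search_body` and its zero-based page numbering are assumptions, not an ES API):

```python
def search_body(page, size, sort_field="balance", order="asc", fields=None):
    """Build a request body for page-style paging: ES skips `from`
    documents and returns the next `size` (pages numbered from 0)."""
    body = {
        "query": {"match_all": {}},
        "sort": [{sort_field: {"order": order}}],
        "from": page * size,
        "size": size,
    }
    if fields:                 # _source limits which fields come back
        body["_source"] = fields
    return body

b = search_body(page=2, size=3, fields=["balance", "age"])  # docs 6..8
```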
2. match
Exact-value query (on a non-text field):
GET bank/_search
{
"query": {
"match": {
"account_number": "20"
}
}
}
Full-text query -> the query string is analyzed and matched term by term:
GET bank/_search
{
"query": {
"match": {
"address": "Kings"
}
}
}
3. match_phrase -> phrase matching (a stricter form of match: the query is matched as one phrase rather than as independent terms)
// phrase matching, not term-by-term
GET bank/_search
{
"query": {
"match_phrase": {
"address": "mill lane"
}
}
}
Result:
{
"took" : 1058,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 9.507477,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 9.507477,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "[email protected]",
"city" : "Urie",
"state" : "IL"
}
}
]
}
}
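The contrast between match and match_phrase can be imitated with a deliberately crude in-memory analyzer (a toy stand-in for ES's real analysis chain, not how ES is implemented):

```python
def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase and split."""
    return text.lower().replace("-", " ").split()

def match(field_value, query):
    # match: the query is analyzed too, and any shared term is a hit.
    return any(t in analyze(field_value) for t in analyze(query))

def match_phrase(field_value, query):
    # match_phrase: all query terms must appear contiguously, in order.
    doc, q = analyze(field_value), analyze(query)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))

addr = "198 Mill Lane"
m   = match(addr, "mill road")          # True: "mill" alone matches
mp  = match_phrase(addr, "mill road")   # False: no contiguous "mill road"
mp2 = match_phrase(addr, "mill lane")   # True
```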
4. multi_match: match across multiple fields
GET bank/_search
{
"query": {
"multi_match": {
"query": "mill movice",
"fields": ["address","city"]
}
}
}
Result:
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 5.4032025,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 5.4032025,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "[email protected]",
"city" : "Lopezo",
"state" : "AK"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 5.4032025,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "[email protected]",
"city" : "Urie",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "345",
"_score" : 5.4032025,
"_source" : {
"account_number" : 345,
"balance" : 9812,
"firstname" : "Parker",
"lastname" : "Hines",
"age" : 38,
"gender" : "M",
"address" : "715 Mill Avenue",
"employer" : "Baluba",
"email" : "[email protected]",
"city" : "Blackgum",
"state" : "KY"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "472",
"_score" : 5.4032025,
"_source" : {
"account_number" : 472,
"balance" : 25571,
"firstname" : "Lee",
"lastname" : "Long",
"age" : 32,
"gender" : "F",
"address" : "288 Mill Street",
"employer" : "Comverges",
"email" : "[email protected]",
"city" : "Movico",
"state" : "MT"
}
}
]
}
}
5. bool compound queries
A compound query can combine any other query clauses, including other compound ones. This means compound clauses can be nested inside each other and express very complex logic.
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "F"
}
},
{
"match": {
"address": "mill"
}
}
],
"must_not": [
{
"match": {
"age": "18"
}
}
],
"should": [
{
"match": {
"lastname": "Wallace"
}
}
]
}
}
}
Result:
{
"took" : 109,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 6.1104345,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "472",
"_score" : 6.1104345,
"_source" : {
"account_number" : 472,
"balance" : 25571,
"firstname" : "Lee",
"lastname" : "Long",
"age" : 32,
"gender" : "F",
"address" : "288 Mill Street",
"employer" : "Comverges",
"email" : "[email protected]",
"city" : "Movico",
"state" : "MT"
}
}
]
}
}
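Clause lists like these are easy to assemble programmatically; a sketch (the `bool_query` helper is hypothetical):

```python
def bool_query(must=None, must_not=None, should=None, filter_=None):
    """Assemble a bool compound query, dropping empty clause lists.
    (`filter_` has a trailing underscore only to dodge the Python builtin.)"""
    clauses = {"must": must, "must_not": must_not,
               "should": should, "filter": filter_}
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}

q = bool_query(
    must=[{"match": {"gender": "F"}}, {"match": {"address": "mill"}}],
    must_not=[{"match": {"age": "18"}}],
    should=[{"match": {"lastname": "Wallace"}}],
)
```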
6. filter: result filtering
Not every query needs to produce a relevance score, especially clauses used only for filtering documents. To avoid computing scores unnecessarily, Elasticsearch automatically detects these situations and optimizes query execution.
GET bank/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"gte": 19,
"lte": 30
}
}
}
}
}
}
7. term, similar to match
For full-text retrieval, prefer match -> use on text fields.
For exact-value retrieval, prefer term -> use on non-text fields.
GET bank/_search
{
"query": {
"term": {
"age": "28"
}
}
}
Exact matching with match on the keyword sub-field:
GET bank/_search
{
"query": {
"match": {
"address.keyword": "789 Madison Street"
}
}
}
8. aggregations (running aggregations)
Aggregations provide the ability to group data and extract statistics from it. The simplest aggregations are roughly equivalent to SQL GROUP BY and the SQL aggregate functions. In Elasticsearch, a search can return hits and aggregation results at the same time, in a single response: you can run one query plus several aggregations and get all of their results back in one concise round trip, avoiding extra network hops.
- Example: search for everyone whose address contains "mill", returning their age distribution and average age. (To suppress the hit details as well, you would add "size": 0 to the request.)
GET bank/_search
{
"query": {
"match": {
"address": "mill"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 10
}
},
"ageAvg": {
"avg": {
"field": "age"
}
}
}
}
Result:
{
"took" : 4643,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 5.4032025,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 5.4032025,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "[email protected]",
"city" : "Lopezo",
"state" : "AK"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 5.4032025,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "[email protected]",
"city" : "Urie",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "345",
"_score" : 5.4032025,
"_source" : {
"account_number" : 345,
"balance" : 9812,
"firstname" : "Parker",
"lastname" : "Hines",
"age" : 38,
"gender" : "M",
"address" : "715 Mill Avenue",
"employer" : "Baluba",
"email" : "[email protected]",
"city" : "Blackgum",
"state" : "KY"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "472",
"_score" : 5.4032025,
"_source" : {
"account_number" : 472,
"balance" : 25571,
"firstname" : "Lee",
"lastname" : "Long",
"age" : 32,
"gender" : "F",
"address" : "288 Mill Street",
"employer" : "Comverges",
"email" : "[email protected]",
"city" : "Movico",
"state" : "MT"
}
}
]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 38,
"doc_count" : 2
},
{
"key" : 28,
"doc_count" : 1
},
{
"key" : 32,
"doc_count" : 1
}
]
},
"ageAvg" : {
"value" : 34.0
}
}
}
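As a sanity check, the ageAvg metric can be recomputed by hand from the ageAgg terms buckets in the response above:

```python
# Buckets exactly as returned by the ageAgg terms aggregation above.
buckets = [{"key": 38, "doc_count": 2},
           {"key": 28, "doc_count": 1},
           {"key": 32, "doc_count": 1}]

total_docs = sum(b["doc_count"] for b in buckets)               # 4 hits
weighted_sum = sum(b["key"] * b["doc_count"] for b in buckets)  # 38*2 + 28 + 32
age_avg = weighted_sum / total_docs                             # 34.0, matching ageAvg
```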
Aggregations can be nested inside one another (the same principle applies).
3. Mapping
1. Field types (and the removal of mapping types)
In 7.x the type is optional;
in 8.0 it will be removed entirely.
Documents are then stored directly under an index.
Dropping types improves the efficiency with which ES processes data.
2. Mapping
A mapping defines how a document, and the properties it contains, are stored and indexed.
For example, a mapping specifies:
- which string properties should be treated as full-text fields
- which properties contain numbers, dates, or geo-locations
- whether all properties of the document can be indexed
- the format of dates
- custom rules for dynamically added properties
View the mapping info:
GET bank/_mapping
Modify the mapping info (see below).
3. Under the new (typeless) versions
- Create a mapping
Define the property types for the my_index index:
PUT /my_index
{
"mappings": {
"properties": {
"age": {"type": "integer"},
"email":{"type": "keyword"},
"name":{"type": "text"}
}
}
}
- Add a new field mapping
PUT /my_index/_mapping
{
"properties": {
"employee-id": {
"type": "keyword",
"index": false
}
}
}
- Update a mapping
An existing mapped field cannot be updated in place; to change it you must create a new index and migrate the data.
- Data migration
First create the target index (e.g. new_twitter) with the correct mapping, then migrate the data as follows.
Example: changing the mapping under bank.
- Create the new index, specifying the mapping rules:
PUT /newbank
{
"mappings": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text"
},
"age": {
"type": "integer"
},
"balance": {
"type": "long"
},
"city": {
"type": "keyword"
},
"email": {
"type": "keyword"
},
"employer": {
"type": "keyword"
},
"firstname": {
"type": "text"
},
"gender": {
"type": "text"
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "keyword"
}
}
}
}
Result:
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "newbank"
}
- Migrate the data:
POST _reindex
{
"source": {
"index": "bank",
"type": "account"
},
"dest": {
"index": "newbank"
}
}
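Whether a change actually forces this create-and-reindex dance can be spotted by diffing the two mappings (a hypothetical helper, assuming flat single-level properties):

```python
def changed_fields(old_props, new_props):
    """Return fields whose type differs between two mappings' properties;
    such changes cannot be applied in place and require _reindex."""
    return sorted(
        f for f in old_props
        if f in new_props and old_props[f]["type"] != new_props[f]["type"]
    )

old = {"age": {"type": "text"}, "email": {"type": "keyword"}}
new = {"age": {"type": "integer"}, "email": {"type": "keyword"}}
needs_reindex = changed_fields(old, new)   # ["age"]
```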
4. Tokenization (analysis)
A tokenizer receives a stream of characters, splits it into independent tokens (usually individual words), and outputs a stream of tokens.
For example, the whitespace tokenizer splits text on whitespace: it turns the text "Quick brown fox!" into [Quick, brown, fox!].
The tokenizer is also responsible for recording the order, or position, of each term (used for phrase and word-proximity queries), and the start and end character offsets of the original word each term represents (used for highlighting matched text).
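The whitespace example above, including positions and character offsets, can be reproduced with a short sketch (a simplification of real tokenizer output):

```python
import re

def whitespace_tokenize(text):
    """Emit tokens with position and character offsets, in the shape of
    (simplified) whitespace-tokenizer output."""
    return [
        {"token": m.group(), "position": i,
         "start_offset": m.start(), "end_offset": m.end()}
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]

tokens = whitespace_tokenize("Quick brown fox!")
# -> Quick (0..5), brown (6..11), fox! (12..16)
```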
- The standard tokenizer
POST _analyze
{
"tokenizer": "standard",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
- Install your own tokenizer (ik)
http://github.com/medcl/elasticsearch-analysis-ik/
Download the release matching your ES version (copying the download link into a download manager can make it much faster).
Enter the container:
docker exec -it <container-id> /bin/bash
- Unzip the ik archive:
unzip elasticsearch-analysis-ik-7.4.2.zip
Fix the permissions:
chmod -R 777 ik/
Then restart elasticsearch.
- Using ik
ik_smart
POST _analyze
{
"tokenizer": "ik_smart",
"text": "歡迎您的到來"
}
Result:
{
"tokens" : [
{
"token" : "歡迎您",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "的",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "到來",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 2
}
]
}
ik_max_word
POST _analyze
{
"tokenizer": "ik_max_word",
"text": "歡迎您的到來"
}
Extras: install wget and unzip
yum install wget
yum install unzip
- Custom dictionary
Edit IKAnalyzer.cfg.xml under /usr/share/elasticsearch/plugins/ik/config
(you can edit the externally mounted file directly),
then restart the container. If your test then fails, see the last point of step 5 below.
5. Install nginx (to host the custom dictionary)
- Start a throwaway nginx instance, just to copy its configuration out:
docker run -p 80:80 --name nginx -d nginx:1.10
- Copy the config out of the container into the current directory:
docker container cp nginx:/etc/nginx .
(note the trailing dot, with a space before it)
- Rename the directory: mv nginx conf, then move conf under /mydata/nginx
- Stop the original container: docker stop nginx
- Remove it: docker rm <container-id>
- Create the new nginx:
docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10
- Visit the host address
Create index.html under /mydata/nginx/html
- Create the dictionary text file:
mkdir es
vi fenci.txt
Then check that it is served:
http://192.168.80.133/es/fenci.txt
- Caveat: just as Kibana earlier had to reach Elasticsearch by its Docker-assigned IP, my Docker setup required the container IP rather than the host IP in IKAnalyzer.cfg.xml.
With that change, the configuration succeeds.
6. Elasticsearch-Rest-Client
1. 9300: TCP
- spring-data-elasticsearch:transport-api.jar
- tied to the Spring Boot version
- discouraged since 7.x and slated for removal after 8
2. 9200: HTTP
- JestClient: unofficial; updated slowly
- RestTemplate: simulates HTTP requests, but many ES operations must be wrapped by hand, which is tedious
- HttpClient: same as above
- Elasticsearch-Rest-Client: the official RestClient; it wraps the ES operations, the API is clearly layered, and it is easy to pick up
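Since port 9200 speaks plain HTTP, any HTTP client can talk to ES directly. A minimal sketch with Python's standard urllib (host, index, and query are placeholders; this only builds the request object, and actually sending it requires a running cluster):

```python
import json
import urllib.request

def search_request(host, index, body):
    """Prepare (but do not send) a _search request for the 9200 HTTP API."""
    return urllib.request.Request(
        url=f"http://{host}:9200/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # POST is accepted for _search; GET-with-body is awkward in many clients
    )

req = search_request("192.168.80.133", "bank",
                     {"query": {"match": {"address": "mill"}}})
# urllib.request.urlopen(req) would execute it against a live cluster
```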