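In the end, I came back to Chengdu. After three years in Shanghai, the bustling metropolis I had once grown tired of keeps resurfacing in my mind these days, and I find myself missing it again and again.
After interviewing at several companies, and driven by my love of technology, I chose from several offers the startup with the lowest pay and the smallest headcount. One or two weeks in, the company did not feel the way the interviewer had described it: few of my colleagues turned out to be passionate about technology, and I often doubted my choice while at work. The other day I swallowed my pride and asked the HR of another company that had made me an offer whether I could have a second chance, and was politely turned down. I suppose this time I really did get it wrong, but that is the adult world. Keep going!
A recent project had a requirement along these lines: given the user's current location, return the nearby parking lots sorted by distance, nearest first. The data is stored in an es index defined as: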
PUT parking_index
{
"mappings" : {
"doc" : {
"properties" : {
"state" : {
"type" : "short"
},
"location" : {
"type": "geo_point"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"crt_time" : {
"type" : "date"
}
}
}
}
}
The sample documents are:
PUT parking_index/_doc/1
{
"state":1,
"name": "天府三街一號停車場",
"location": [ -71.34, 66.12 ],
"crt_time":"2015-01-01T12:10:30Z"
}
PUT parking_index/_doc/2
{
"state":3,
"name": "科學城二號停車場",
"location": [ -72.32, 69.20 ],
"crt_time":"2016-01-01T12:10:30Z"
}
PUT parking_index/_doc/3
{
"state":3,
"name": "天府五街三號停車場",
"location": [ -77.39, 63.12 ],
"crt_time":"2015-01-01T12:10:30Z"
}
PUT parking_index/_doc/4
{
"state":1,
"name": "世紀城四號停車場",
"location": [ -69.31, 68.123 ],
"crt_time":"2015-01-01T12:10:30Z"
}
PUT parking_index/_doc/5
{
"state":1,
"name": "天府五街五號停車場",
"location": [ -90.101, 80.67 ],
"crt_time":"2015-01-01T12:10:30Z"
}
PUT parking_index/_doc/6
{
"state":2,
"name": "孵化園六號停車場",
"location": [ -79.36, 60.12 ],
"crt_time":"2015-01-01T12:10:30Z"
}
geo distance query
The official geo distance query documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-geo-distance-query.html
The official sort documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-sort.html
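The query that returns the parking lots near the user's current location, sorted from nearest to farthest and limited to one page of results, is then: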
GET parking_index/doc/_search
{
"size": 10,
"sort": [{
"_geo_distance": {
"location": {
"lat": 60.10,
"lon": -79.36
},
"unit": "km",
"order": "asc"
}
}]
}
The result is:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : null,
"hits" : [
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "6",
"_score" : null,
"_source" : {
"state" : 2,
"name" : "孵化園六號停車場",
"location" : [
-79.36,
60.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
2.223897568915248
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "3",
"_score" : null,
"_source" : {
"state" : 3,
"name" : "天府五街三號停車場",
"location" : [
-77.39,
63.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
351.54896748966985
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "1",
"_score" : null,
"_source" : {
"state" : 1,
"name" : "天府三街一號停車場",
"location" : [
-71.34,
66.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
780.1677819779763
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "4",
"_score" : null,
"_source" : {
"state" : 1,
"name" : "世紀城四號停車場",
"location" : [
-69.31,
68.123
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
1013.9586549777497
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "2",
"_score" : null,
"_source" : {
"state" : 3,
"name" : "科學城二號停車場",
"location" : [
-72.32,
69.2
],
"crt_time" : "2016-01-01T12:10:30Z"
},
"sort" : [
1064.2891333474859
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "5",
"_score" : null,
"_source" : {
"state" : 1,
"name" : "天府五街五號停車場",
"location" : [
-90.101,
80.67
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
2312.8208940710206
]
}
]
}
}
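Each hit's sort array holds the distance, in km, from the query point to that document. As a sanity check, these values can be reproduced approximately outside es. The sketch below is a plain haversine calculation; the function names and the Earth radius constant (a mean radius of 6371.0088 km) are assumptions for illustration, and es's own arc-distance implementation may differ in the last few digits.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius, not taken from es

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Demo documents as id -> (lat, lon). The indexed arrays are [lon, lat],
# so the two numbers are swapped here.
LOTS = {
    1: (66.12, -71.34),
    2: (69.20, -72.32),
    3: (63.12, -77.39),
    4: (68.123, -69.31),
    5: (80.67, -90.101),
    6: (60.12, -79.36),
}

def nearest_first(user_lat, user_lon):
    """Return (id, distance_km) pairs sorted from nearest to farthest."""
    pairs = [(doc_id, haversine_km(user_lat, user_lon, lat, lon))
             for doc_id, (lat, lon) in LOTS.items()]
    return sorted(pairs, key=lambda pair: pair[1])

result = nearest_first(60.10, -79.36)
for doc_id, km in result:
    print(doc_id, round(km, 2))
```

The ordering matches the response above, and the nearest lot (id 6) comes out at roughly 2.22 km.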
es offers three ways to paginate. The official from/size documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-from-size.html
The official scroll documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-scroll.html
The official search after documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-search-after.html
If pagination is needed, the query can be written as:
GET parking_index/doc/_search
{
"size": 3,
"search_after":[0],
"sort": [{
"_geo_distance": {
"location": {
"lat": 60.10,
"lon": -79.36
},
"unit": "km",
"order": "asc"
}
}]
}
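search_after is chosen here because from/size becomes expensive for deep pages, while scroll works on a snapshot, so it cannot conveniently go back a page and does not reflect live data; on balance search_after is the best fit for this case. Note that search_after is not a record offset: it takes the sort values of the last hit on the previous page, and the search returns the hits that sort after those values. Here [0] means "distance greater than 0 km", which effectively starts from the first page (the first page can also simply omit search_after), and size limits the page to 3 hits. For stable paging, the official documentation also recommends adding a field with a unique value per document to the sort as a tiebreaker.
The result is: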
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : null,
"hits" : [
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "6",
"_score" : null,
"_source" : {
"state" : 2,
"name" : "孵化園六號停車場",
"location" : [
-79.36,
60.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
2.223897568915248
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "3",
"_score" : null,
"_source" : {
"state" : 3,
"name" : "天府五街三號停車場",
"location" : [
-77.39,
63.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
351.54896748966985
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "1",
"_score" : null,
"_source" : {
"state" : 1,
"name" : "天府三街一號停車場",
"location" : [
-71.34,
66.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
780.1677819779763
]
}
]
}
}
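To fetch the following page, the sort values of the last hit on the current page are passed back as search_after. The sketch below only builds the request body as a Python dict; the helper name next_page_body is hypothetical, and sending the request to es (with whatever client is in use) is left out.

```python
import copy

def next_page_body(body, last_hit):
    """Return a copy of `body` whose search_after is the last hit's sort values."""
    nxt = copy.deepcopy(body)
    nxt["search_after"] = list(last_hit["sort"])
    return nxt

# Same sort as the query above; the first page needs no search_after.
first_page = {
    "size": 3,
    "sort": [{
        "_geo_distance": {
            "location": {"lat": 60.10, "lon": -79.36},
            "unit": "km",
            "order": "asc",
        }
    }],
}

# Last hit of the page shown above (id 1, about 780.17 km away).
last_hit = {"_id": "1", "sort": [780.1677819779763]}

second_page = next_page_body(first_page, last_hit)
print(second_page["search_after"])
```

With the demo data, sending second_page would be expected to return the lots farther than 780.17 km, that is ids 4, 2 and 5.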
The official prefix query documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-prefix-query.html
To find the parking lots whose name starts with 天府, sorted by distance and paginated, the query is:
GET parking_index/doc/_search
{
"size": 10,
"search_after":[0],
"query": {
"prefix": {
"name.keyword": {
"value": "天府"
}
}
},
"sort": [{
"_geo_distance": {
"location": {
"lat": 60.10,
"lon": -79.36
},
"unit": "km",
"order": "asc"
}
}]
}
The result is:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : null,
"hits" : [
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "3",
"_score" : null,
"_source" : {
"state" : 3,
"name" : "天府五街三號停車場",
"location" : [
-77.39,
63.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
351.54896748966985
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "1",
"_score" : null,
"_source" : {
"state" : 1,
"name" : "天府三街一號停車場",
"location" : [
-71.34,
66.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
780.1677819779763
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "5",
"_score" : null,
"_source" : {
"state" : 1,
"name" : "天府五街五號停車場",
"location" : [
-90.101,
80.67
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
2312.8208940710206
]
}
]
}
}
function_score
The official function_score documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-function-score-query.html
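To find the parking lots whose name starts with 天府, sorted by distance and paginated, but with the lots whose state is 1 ranked first, the query can be: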
GET parking_index/doc/_search
{
"from": 0,
"size": 20,
"query": {
"function_score": {
"query": {
"prefix": {
"name.keyword": {
"value": "天府"
}
}
},
"functions": [
{
"filter": {
"match": {
"state": 1
}
},
"weight": 100
}
]
}
},
"sort": [
{
"_score": "desc"
},
{
"_geo_distance": {
"location": {
"lat": 60.1,
"lon": -79.36
},
"unit": "km",
"order": "asc"
}
}
]
}
The result is:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : null,
"hits" : [
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "1",
"_score" : 100.0,
"_source" : {
"state" : 1,
"name" : "天府三街一號停車場",
"location" : [
-71.34,
66.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
100.0,
780.1677819779763
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "5",
"_score" : 100.0,
"_source" : {
"state" : 1,
"name" : "天府五街五號停車場",
"location" : [
-90.101,
80.67
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
100.0,
2312.8208940710206
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"state" : 3,
"name" : "天府五街三號停車場",
"location" : [
-77.39,
63.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
1.0,
351.54896748966985
]
}
]
}
}
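The requirement above uses function_score to implement a custom ordering on specific field values. Below are my brief study notes on the function_score documentation from the es website.
What is function_score?
The function_score allows you to modify the score of documents that are retrieved by a query. This can be useful if, for example, a score function is computationally expensive and it is sufficient to compute the score on a filtered set of documents. To use function_score, the user has to define a query and one or more functions, that compute a new score for each document returned by the query.
function_score lets us modify document scores at query time and returns the new score with each hit, so we can sort by _score. It is especially useful for more complex ordering, for example when documents with a particular field value should come first (as in the example above, where state 1 is ranked first).
The function_score syntax asks for a query plus at least one function that describes how to score; the final computed score is returned with each matching document.
Functions supported by function_score
- script_score (custom scoring through a script)
- weight (a multiplier applied when a document matches a condition)
- random_score (random scoring)
- field_value_factor (scoring from the value of a field)
- decay functions: gauss, linear, exp (the closer to the origin, the higher the score)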
weight
The weight score allows you to multiply the score by the provided weight. This can sometimes be desired since boost value set on specific queries gets normalized, while for this score function it does not. The number value is of type float.
weight multiplies the document's score by the given value; the value is a float, so fractional weights are allowed.
The example above uses weight to boost the documents whose state is 1 so that they rank first.
Personally, I find weight the simplest of the function types that function_score supports, and also the most practical.
script_score
The script_score function allows you to wrap another query and customize the scoring of it optionally with a computation derived from other numeric field values in the doc using a script expression.
script_score lets us score documents with a custom script at query time; it covers the custom scoring cases that weight cannot handle or handles awkwardly.
The query from the example above is equivalent to:
GET parking_index/doc/_search
{
"from": 0,
"size": 20,
"query": {
"function_score": {
"query": {
"prefix": {
"name.keyword": {
"value": "天府"
}
}
},
"script_score": {
"script": "doc['state'].value==1?100:1"
}
}
},
"sort": [
{
"_score": "desc"
},
{
"_geo_distance": {
"location": {
"lat": 60.1,
"lon": -79.36
},
"unit": "km",
"order": "asc"
}
}
]
}
That is:
"script_score": {
"script": "doc['state'].value==1?100:1"
}
is equivalent, for this query, to the weight version:
"functions": [
{
"filter": {
"match": {
"state": 1
}
},
"weight": 100
}
]
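The equivalence can be checked by hand. The sketch below mimics both scoring rules in plain Python, assuming the default multiply boost mode and a query score of 1.0 for every hit (which is what the prefix query produced in the results above); the function names are made up for illustration and this is not how es evaluates the query internally.

```python
def weight_score(state, query_score=1.0, weight=100.0):
    """Mimic functions: [{filter: state == 1, weight: 100}] with multiply boost mode."""
    # A document that does not match the filter gets a neutral function score of 1.
    function_score = weight if state == 1 else 1.0
    return query_score * function_score

def script_score(state, query_score=1.0):
    """Mimic the script doc['state'].value == 1 ? 100 : 1, multiplied into the query score."""
    return query_score * (100.0 if state == 1 else 1.0)

for state in (1, 2, 3):
    print(state, weight_score(state), script_score(state))
```

Both rules give 100.0 for state 1 and 1.0 otherwise, matching the _score values in the response above.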
Random
The random_score generates scores that are uniformly distributed from 0 up to but not including 1. By default, it uses the internal Lucene doc ids as a source of randomness, which is very efficient but unfortunately not reproducible since documents might be renumbered by merges.
random_score scores each document with a random number from 0 (inclusive) to 1 (exclusive).
It can be used to shuffle the results, for example to recommend a different parking lot each time the user refreshes. The following query does that:
GET parking_index/doc/_search
{
"from": 0,
"size": 20,
"query": {
"function_score": {
"query": {
"wildcard": {
"name.keyword": {
"value": "*停車場"
}
}
},
"functions": [
{
"random_score": {
"seed": 666
}
}
]
}
},
"sort": [
{
"_score": "desc"
},
{
"_geo_distance": {
"location": {
"lat": 60.1,
"lon": -79.36
},
"unit": "km",
"order": "asc"
}
}
]
}
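Passing a different seed on each request changes the order of the results. When reproducible scores are wanted, the 6.8 documentation pairs seed with a field (for example _seq_no).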
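The effect of the seed can be pictured with a toy model. The sketch below is only a stand-in that uses Python's random module; es derives its random scores differently, so the actual orders will not match. The function name is hypothetical.

```python
import random

def shuffled_ids(doc_ids, seed):
    """Toy model of random_score: give each id a score in [0, 1) and sort by it, highest first."""
    rng = random.Random(seed)
    scored = [(rng.random(), doc_id) for doc_id in doc_ids]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

ids = [1, 2, 3, 4, 5, 6]
print(shuffled_ids(ids, 666))  # same seed, same order on every call
print(shuffled_ids(ids, 667))  # another seed generally gives another order
```

The point is that the seed pins down the order: repeating a request with the same seed repeats the order, while a fresh seed per refresh reshuffles it.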
Field Value factor
The field_value_factor function allows you to use a field from a document to influence the score. It's similar to using the script_score function, however, it avoids the overhead of scripting. If used on a multi-valued field, only the first value of the field is used in calculations.
field_value_factor scores a document from the value of one of its fields. It resembles script_score but is simpler and avoids the scripting overhead; anything field_value_factor can do, script_score can do as well.
For example, to recommend the parking lots with more free spaces first, and assuming for the demo data above that state holds the number of free spaces, the query is:
GET parking_index/doc/_search
{
"from": 0,
"size": 20,
"query": {
"function_score": {
"query": {
"wildcard": {
"name.keyword": {
"value": "*停車場"
}
}
},
"functions": [
{
"field_value_factor": {
"field": "state",
"factor": 2,
"modifier": "sqrt",
"missing": 1
}
}
]
}
},
"sort": [
{
"_score": "desc"
},
{
"_geo_distance": {
"location": {
"lat": 60.1,
"lon": -79.36
},
"unit": "km",
"order": "asc"
}
}
]
}
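Here missing is the value used when a document has no state field (factor and modifier are still applied to it), and modifier is the function applied to the field value. The supported modifiers are: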
- none
- log
- log1p
- log2p
- ln
- ln1p
- ln2p
- square
- sqrt
- reciprocal
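The field_value_factor above corresponds to the formula:
sqrt(2 * doc['state'].value)
which in a Painless script_score would be written as Math.sqrt(2 * doc['state'].value).
The result is: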
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : null,
"hits" : [
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "3",
"_score" : 2.4494898,
"_source" : {
"state" : 3,
"name" : "天府五街三號停車場",
"location" : [
-77.39,
63.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
2.4494898,
351.54896748966985
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "2",
"_score" : 2.4494898,
"_source" : {
"state" : 3,
"name" : "科學城二號停車場",
"location" : [
-72.32,
69.2
],
"crt_time" : "2016-01-01T12:10:30Z"
},
"sort" : [
2.4494898,
1064.2891333474859
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "6",
"_score" : 2.0,
"_source" : {
"state" : 2,
"name" : "孵化園六號停車場",
"location" : [
-79.36,
60.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
2.0,
2.223897568915248
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "1",
"_score" : 1.4142135,
"_source" : {
"state" : 1,
"name" : "天府三街一號停車場",
"location" : [
-71.34,
66.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
1.4142135,
780.1677819779763
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "4",
"_score" : 1.4142135,
"_source" : {
"state" : 1,
"name" : "世紀城四號停車場",
"location" : [
-69.31,
68.123
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
1.4142135,
1013.9586549777497
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "5",
"_score" : 1.4142135,
"_source" : {
"state" : 1,
"name" : "天府五街五號停車場",
"location" : [
-90.101,
80.67
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
1.4142135,
2312.8208940710206
]
}
]
}
}
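The scores in this response can be recomputed by hand. The sketch below is a plain-Python model of field_value_factor, assuming the query score is 1.0 and the default multiply boost mode; the function name and the error handling are illustrative, and log is taken to be the base-10 logarithm and ln the natural logarithm.

```python
import math

# Modifier name -> function applied to (factor * field value).
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(1 + x),
    "log2p": lambda x: math.log10(2 + x),
    "ln": math.log,
    "ln1p": lambda x: math.log(1 + x),
    "ln2p": lambda x: math.log(2 + x),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1.0 / x,
}

def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Return modifier(factor * value); `missing` replaces an absent field value."""
    if value is None:
        if missing is None:
            raise ValueError("field is absent and no 'missing' value was given")
        value = missing
    return MODIFIERS[modifier](factor * value)

for state in (3, 2, 1):
    print(state, field_value_factor(state, factor=2, modifier="sqrt", missing=1))
```

For state 3, 2 and 1 this gives about 2.4495, 2.0 and 1.4142, which agrees with the _score values above up to float rounding.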
Decay functions
Decay functions score a document with a function that decays depending on the distance of a numeric field value of the document from a user given origin. This is similar to a range query, but with smooth edges instead of boxes.
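Decay functions. You give an origin and a set of distances around it, and each document is scored by how far its field value lies from the origin: the farther away, the lower the score, and the closer, the higher.
The controlling parameters are:
- origin (the ideal value; think "best")
- offset (documents within this distance of the origin still get the full score; think "also fine")
- scale (the distance beyond origin + offset at which the score has dropped to decay; think "acceptable if nothing better")
- decay (the score a document gets at distance scale; defaults to 0.5)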
The supported functions are gauss (Gaussian), linear and exp (exponential); the official documentation plots the three curves for comparison.
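A price search in an online store is one case that can be implemented this way.
For the example above, suppose the parking lots within 360 km of the user are ideal, and farther ones are still acceptable with a score that shrinks with distance, down to 0.5 at 800 km beyond the 360 km offset. The query is then: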
GET parking_index/doc/_search
{
"from": 0,
"size": 20,
"query": {
"function_score": {
"query": {
"wildcard": {
"name.keyword": {
"value": "*停車場"
}
}
},
"linear":
{
"location": {
"origin":{ "lat":60.1,"lon": -79.36},
"offset": "360km",
"scale": "800km",
"decay": 0.5
}
}
}
},
"sort": [
{
"_score": "desc"
},
{
"_geo_distance": {
"location": {
"lat": 60.1,
"lon": -79.36
},
"unit": "km",
"order": "asc"
}
}
]
}
The result is:
{
"took" : 16,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : null,
"hits" : [
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"state" : 2,
"name" : "孵化園六號停車場",
"location" : [
-79.36,
60.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
1.0,
2.223897568915248
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"state" : 3,
"name" : "天府五街三號停車場",
"location" : [
-77.39,
63.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
1.0,
351.54896748966985
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "1",
"_score" : 0.7373951,
"_source" : {
"state" : 1,
"name" : "天府三街一號停車場",
"location" : [
-71.34,
66.12
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
0.7373951,
780.1677819779763
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "4",
"_score" : 0.5912758,
"_source" : {
"state" : 1,
"name" : "世紀城四號停車場",
"location" : [
-69.31,
68.123
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
0.5912758,
1013.9586549777497
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "2",
"_score" : 0.5598193,
"_source" : {
"state" : 3,
"name" : "科學城二號停車場",
"location" : [
-72.32,
69.2
],
"crt_time" : "2016-01-01T12:10:30Z"
},
"sort" : [
0.5598193,
1064.2891333474859
]
},
{
"_index" : "parking_index",
"_type" : "doc",
"_id" : "5",
"_score" : 0.0,
"_source" : {
"state" : 1,
"name" : "天府五街五號停車場",
"location" : [
-90.101,
80.67
],
"crt_time" : "2015-01-01T12:10:30Z"
},
"sort" : [
0.0,
2312.8208940710206
]
}
]
}
}
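The results show that the farther a document is from the given coordinates, the lower its score: anything within offset (360 km) gets the full score of 1.0, and beyond that the score decays linearly with distance, reaching 0.5 at offset + scale (1160 km) and 0 at 1960 km.
Questions are welcome. Let's learn and improve together!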
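The scores in this response follow from the linear decay formula in the official documentation. The sketch below recomputes them in plain Python from the distances in the sort arrays, assuming a query score of 1.0 and the default multiply boost mode; the function name is made up for illustration.

```python
def linear_decay(distance_km, offset_km=360.0, scale_km=800.0, decay=0.5):
    """Linear decay score for a document `distance_km` away from the origin."""
    # s is chosen so that the score equals `decay` at offset + scale.
    s = scale_km / (1.0 - decay)
    beyond_offset = max(0.0, distance_km - offset_km)
    return max((s - beyond_offset) / s, 0.0)

# Distances taken from the sort values in the response above (id -> km).
distances = {
    6: 2.223897568915248,
    3: 351.54896748966985,
    1: 780.1677819779763,
    4: 1013.9586549777497,
    2: 1064.2891333474859,
    5: 2312.8208940710206,
}

for doc_id, km in distances.items():
    print(doc_id, round(linear_decay(km), 7))
```

Ids 6 and 3 lie inside the 360 km offset and score 1.0, id 1 scores about 0.7374, and id 5 is far enough away for the score to reach 0.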