Elasticsearch function score query (custom sorting/scoring): from getting started to practical use

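In the end, I came back to Chengdu. After three years in Shanghai, the bustling metropolis I had once grown tired of keeps coming back to mind these days, and I find myself missing it again and again.

After interviewing at several companies, and driven by my love of technology, I chose from among several offers the startup with the lowest pay and the smallest size. After a week or two on the job, the company did not feel the way the interviewer had described it: few of my colleagues turned out to be passionate about technology, and I often found myself doubting my choice while at work. One day I swallowed my pride and asked the HR of another company that had made me an offer whether I could have a second chance, but was politely turned down. I guess I really got it wrong this time, but that is the adult world. Keep going!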

 







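In a recent project I ran into a requirement like this: given the user's current location, fetch the nearby parking lots, sorted by distance from nearest to farthest. The index stored in ES is defined as: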

PUT parking_index
{

    "mappings" : {
      "doc" : {
        "properties" : {
          "state" : {
            "type" : "short"
          },
          "location" : {
            "type": "geo_point"
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "crt_time" : {
            "type" : "date"
          }
        }
      }
    }
}

The sample data is as follows:


PUT parking_index/_doc/1
{
  "state":1,
  "name": "天府三街一号停车场",
  "location": [ -71.34, 66.12 ],
  "crt_time":"2015-01-01T12:10:30Z"
}
PUT parking_index/_doc/2
{
  "state":3,
  "name": "科学城二号停车场",
  "location": [ -72.32, 69.20 ],
  "crt_time":"2016-01-01T12:10:30Z"
}
PUT parking_index/_doc/3
{
  "state":3,
  "name": "天府五街三号停车场",
  "location": [ -77.39, 63.12 ],
  "crt_time":"2015-01-01T12:10:30Z"
}
PUT parking_index/_doc/4
{
  "state":1,
  "name": "世纪城四号停车场",
  "location": [ -69.31, 68.123 ],
  "crt_time":"2015-01-01T12:10:30Z"
}
PUT parking_index/_doc/5
{
  "state":1,
  "name": "天府五街五号停车场",
  "location": [ -90.101, 80.67 ],
  "crt_time":"2015-01-01T12:10:30Z"
}
PUT parking_index/_doc/6
{
  "state":2,
  "name": "孵化园六号停车场",
  "location": [ -79.36, 60.12 ],
  "crt_time":"2015-01-01T12:10:30Z"
}

geo distance query 

The official geo distance query documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-geo-distance-query.html

The official sort documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-sort.html

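So the query that returns the parking lots near the user's current location, sorted by distance from nearest to farthest and limited to one page of results, is: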

GET parking_index/doc/_search
{
  "size": 10,
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 60.10,
          "lon": -79.36
        },
        "unit": "km",
        "order": "asc"
      }
    }
  ]
}

The result is:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "6",
        "_score" : null,
        "_source" : {
          "state" : 2,
          "name" : "孵化园六号停车场",
          "location" : [
            -79.36,
            60.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          2.223897568915248
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "state" : 3,
          "name" : "天府五街三号停车场",
          "location" : [
            -77.39,
            63.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          351.54896748966985
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "state" : 1,
          "name" : "天府三街一号停车场",
          "location" : [
            -71.34,
            66.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          780.1677819779763
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "state" : 1,
          "name" : "世纪城四号停车场",
          "location" : [
            -69.31,
            68.123
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          1013.9586549777497
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "state" : 3,
          "name" : "科学城二号停车场",
          "location" : [
            -72.32,
            69.2
          ],
          "crt_time" : "2016-01-01T12:10:30Z"
        },
        "sort" : [
          1064.2891333474859
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "state" : 1,
          "name" : "天府五街五号停车场",
          "location" : [
            -90.101,
            80.67
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          2312.8208940710206
        ]
      }
    ]
  }
}
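The sort values in the result above are distances in kilometres from the query point. As a rough cross-check, the sketch below recomputes two of them with the haversine formula. The Earth radius constant and the function name are assumptions made for illustration; Elasticsearch's own distance computation may differ slightly in the last digits.

```python
import math

# Assumption: mean Earth radius in kilometres (not taken from the article).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The query point used in the sort clause.
USER_LAT, USER_LON = 60.10, -79.36

# The sample documents store location as [lon, lat]; here they are written as (lat, lon).
LOTS = {"6": (60.12, -79.36), "3": (63.12, -77.39)}

for doc_id, (lat, lon) in LOTS.items():
    print(doc_id, round(haversine_km(USER_LAT, USER_LON, lat, lon), 3))
```

Note the coordinate order: a geo_point written as an array is [lon, lat], while the object form used in the sort clause names lat and lon explicitly.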

If pagination is needed, it can be done as follows:

The official from/size documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-from-size.html

The official scroll documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-scroll.html

The official search_after documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-search-after.html

GET parking_index/doc/_search
{
  "size": 3,
  "search_after": [0],
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 60.10,
          "lon": -79.36
        },
        "unit": "km",
        "order": "asc"
      }
    }
  ]
}

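search_after is chosen here because from/size becomes expensive for deep pagination, while scroll works on a snapshot, so it cannot flexibly go back to a previous page and its data is not real-time. All things considered, search_after performs best and fits best. Note that the search_after value is not a record offset: it holds the sort values of the last hit on the previous page. Here [0] means "start after a distance of 0 km", so the query returns the 3 nearest parking lots (size is 3). To fetch the next page, pass the sort value of the last hit on this page as search_after.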

The result is:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "6",
        "_score" : null,
        "_source" : {
          "state" : 2,
          "name" : "孵化园六号停车场",
          "location" : [
            -79.36,
            60.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          2.223897568915248
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "state" : 3,
          "name" : "天府五街三号停车场",
          "location" : [
            -77.39,
            63.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          351.54896748966985
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "state" : 1,
          "name" : "天府三街一号停车场",
          "location" : [
            -71.34,
            66.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          780.1677819779763
        ]
      }
    ]
  }
}
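To make the search_after semantics concrete, here is a minimal sketch that pages through the six sort values from the earlier result. It assumes a single ascending sort key and a hypothetical helper named search_after_page; Elasticsearch itself compares the full tuple of sort values, and a unique tiebreaker field is usually added so that documents with equal sort values are not skipped.

```python
def search_after_page(hits, after, size):
    """Return up to `size` hits whose sort value is strictly greater than `after`.

    `hits` is a list of (doc_id, sort_value) pairs; a single ascending
    sort key is assumed.
    """
    ordered = sorted(hits, key=lambda hit: hit[1])
    return [hit for hit in ordered if hit[1] > after][:size]


# (doc id, distance in km), copied from the sorted result shown earlier.
HITS = [
    ("6", 2.223897568915248),
    ("3", 351.54896748966985),
    ("1", 780.1677819779763),
    ("4", 1013.9586549777497),
    ("2", 1064.2891333474859),
    ("5", 2312.8208940710206),
]

# First page: every distance is greater than 0, so this yields the 3 nearest lots.
page1 = search_after_page(HITS, after=0, size=3)
# Next page: continue after the sort value of the last hit on the first page.
page2 = search_after_page(HITS, after=page1[-1][1], size=3)

print([doc_id for doc_id, _ in page1])
print([doc_id for doc_id, _ in page2])
```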

The official prefix query documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-prefix-query.html

If the requirement is to find the parking lots whose names start with 天府 (Tianfu), sorted by distance and paginated, the query is:

GET parking_index/doc/_search
{
  "size": 10,
  "search_after": [0],
  "query": {
    "prefix": {
      "name.keyword": {
        "value": "天府"
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 60.10,
          "lon": -79.36
        },
        "unit": "km",
        "order": "asc"
      }
    }
  ]
}

The result is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "state" : 3,
          "name" : "天府五街三号停车场",
          "location" : [
            -77.39,
            63.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          351.54896748966985
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "state" : 1,
          "name" : "天府三街一号停车场",
          "location" : [
            -71.34,
            66.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          780.1677819779763
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "state" : 1,
          "name" : "天府五街五号停车场",
          "location" : [
            -90.101,
            80.67
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          2312.8208940710206
        ]
      }
    ]
  }
}

function_score

The official function_score documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-function-score-query.html

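If the requirement is to find the parking lots whose names start with 天府, sorted by distance and paginated, but with those whose state is 1 listed first, the query can be: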

GET parking_index/doc/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "function_score": {
      "query": {
        "prefix": {
          "name.keyword": {
            "value": "天府"
          }
        }
      },
      "functions": [
        {
          "filter": {
            "match": {
              "state": 1
            }
          },
          "weight": 100
        }
      ]
    }
  },
  "sort": [
    {
      "_score": "desc"
    },
    {
      "_geo_distance": {
        "location": {
          "lat": 60.1,
          "lon": -79.36
        },
        "unit": "km",
        "order": "asc"
      }
    }
  ]
}

The result is:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 100.0,
        "_source" : {
          "state" : 1,
          "name" : "天府三街一号停车场",
          "location" : [
            -71.34,
            66.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          100.0,
          780.1677819779763
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 100.0,
        "_source" : {
          "state" : 1,
          "name" : "天府五街五号停车场",
          "location" : [
            -90.101,
            80.67
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          100.0,
          2312.8208940710206
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "state" : 3,
          "name" : "天府五街三号停车场",
          "location" : [
            -77.39,
            63.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          1.0,
          351.54896748966985
        ]
      }
    ]
  }
}

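The requirement above uses function score to implement custom ordering for specific field values. Below are my brief study notes on the function score documentation from the ES website.

What is function_score?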

The function_score allows you to modify the score of documents that are retrieved by a query. This can be useful if, for example, a score function is computationally expensive and it is sufficient to compute the score on a filtered set of documents.

To use function_score, the user has to define a query and one or more functions, that compute a new score for each document returned by the query.

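function_score lets us modify document scores at query time and returns those scores with the hits, so we can sort by score. It is especially useful for more complex ordering requirements, for example when documents with a specific value in some field should appear at a particular position (as in the example above, where documents with state 1 should come first).

The function_score syntax expects a query plus at least one function that describes how to score. The final computed score is returned with each matching document.

Functions supported by function_score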

weight 

The weight score allows you to multiply the score by the provided weight. This can sometimes be desired since boost value set on specific queries gets normalized, while for this score function it does not. The number value is of type float.

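weight multiplies the document's score by the given value. The value is a float, so decimals are allowed.

The example above uses weight to give documents with state 1 a higher weight, which is how they end up ranked first.

Personally, I find weight both the simplest and the most practical of the function types that function_score supports.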

script_score 

The script_score function allows you to wrap another query and customize the scoring of it optionally with a computation derived from other numeric field values in the doc using a script expression

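script_score lets us score documents at query time with a custom script. Use it for custom scoring cases that weight cannot handle or handles awkwardly.

The query in the example above is equivalent to: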

GET parking_index/doc/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "function_score": {
      "query": {
        "prefix": {
          "name.keyword": {
            "value": "天府"
          }
        }
      },
      "script_score": {
        "script": "doc['state'].value==1?100:1"
      }
    }
  },
  "sort": [
    {
      "_score": "desc"
    },
    {
      "_geo_distance": {
        "location": {
          "lat": 60.1,
          "lon": -79.36
        },
        "unit": "km",
        "order": "asc"
      }
    }
  ]
}

That is:

      "script_score": {
        "script": "doc['state'].value==1?100:1"
      }

is equivalent to the following weight version:

      "functions": [
        {
          "filter": {
            "match": {
              "state": 1
            }
          },
          "weight": 100
        }
      ]
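The equivalence can be checked by hand with a small sketch. It assumes that the prefix query gives every match a constant score of 1.0 (as the earlier result suggests), that the function result is multiplied into the query score, and that a document matching no filter gets a neutral factor of 1. The names below are hypothetical stand-ins, not Elasticsearch APIs.

```python
# Assumption: the prefix query scores every matching document 1.0.
QUERY_SCORE = 1.0

# The three documents matched by the prefix query, with their distances in km.
DOCS = [
    {"id": "1", "state": 1, "distance_km": 780.1677819779763},
    {"id": "3", "state": 3, "distance_km": 351.54896748966985},
    {"id": "5", "state": 1, "distance_km": 2312.8208940710206},
]

# Hypothetical stand-in for the "functions" array: a filter predicate plus a weight.
FUNCTIONS = [{"filter": lambda doc: doc["state"] == 1, "weight": 100.0}]


def weight_score(doc):
    """filter + weight variant: multiply in the weight of every matching function."""
    factor = 1.0  # neutral factor when no filter matches
    for fn in FUNCTIONS:
        if fn["filter"](doc):
            factor *= fn["weight"]
    return QUERY_SCORE * factor


def script_score(doc):
    """script_score variant: doc['state'].value==1?100:1"""
    return QUERY_SCORE * (100.0 if doc["state"] == 1 else 1.0)


def rank(score_fn):
    """Sort by score descending, then by distance ascending, like the sort clause."""
    ordered = sorted(DOCS, key=lambda d: (-score_fn(d), d["distance_km"]))
    return [d["id"] for d in ordered]


print(rank(weight_score))
print(rank(script_score))
```

Both variants give the same scores, so the secondary distance sort decides the order among documents with equal scores.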

Random

The random_score generates scores that are uniformly distributed from 0 up to but not including 1. By default, it uses the internal Lucene doc ids as a source of randomness, which is very efficient but unfortunately not reproducible since documents might be renumbered by merges.

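random_score generates a random number from 0 (inclusive) to 1 (exclusive) as the document's score.

It can make the result order more random. For example, when recommending a parking lot, a different one can be recommended each time the user refreshes, using a query like this: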

GET parking_index/doc/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "function_score": {
      "query": {
        "wildcard": {
          "name.keyword": {
            "value": "*停车场"
          }
        }
      },
      "functions": [
        {
          "random_score": {
            "seed": 666
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_score": "desc"
    },
    {
      "_geo_distance": {
        "location": {
          "lat": 60.1,
          "lon": -79.36
        },
        "unit": "km",
        "order": "asc"
      }
    }
  ]
}

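Each time a different seed is passed in, the results come back in a different order. Sending the same seed again (666 here) should reproduce the same order.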
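The role of the seed can be illustrated with a small sketch. This is not how Elasticsearch computes random_score internally; the hashing scheme and function names are assumptions chosen only to show that a score derived from (seed, document) is stable for a fixed seed and lies in [0, 1).

```python
import hashlib


def seeded_score(seed, doc_id):
    """Derive a pseudo-random score in [0, 1) from a seed and a document id.

    Illustrative only: Elasticsearch uses its own hashing, not SHA-256.
    """
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    # Keep 53 bits so the division below is exact and always below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2 ** 53


def ranked_ids(seed, doc_ids):
    """Order document ids by their seeded score, highest first."""
    return sorted(doc_ids, key=lambda doc_id: seeded_score(seed, doc_id), reverse=True)


DOC_IDS = ["1", "2", "3", "4", "5", "6"]

# Same seed: the order is reproducible across calls.
print(ranked_ids(666, DOC_IDS))
print(ranked_ids(666, DOC_IDS))
# A different seed will usually give a different order.
print(ranked_ids(667, DOC_IDS))
```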

Field Value factor

The field_value_factor function allows you to use a field from a document to influence the score. It’s similar to using the script_score function, however, it avoids the overhead of scripting. If used on a multi-valued field, only the first value of the field is used in calculations.

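The field_value_factor function scores a document by the value of one of its fields. It is similar to script_score but simpler; anything field_value_factor can do, script_score can do as well.

For example, to recommend parking lots with more free spaces first, assume that state in the demo data above holds the number of free spaces. The query is then: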

GET parking_index/doc/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "function_score": {
      "query": {
        "wildcard": {
          "name.keyword": {
            "value": "*停车场"
          }
        }
      },
      "functions": [
        {
           "field_value_factor": {
                "field": "state",
                "factor": 2,
                "modifier": "sqrt",
                "missing": 1
            }
        }
      ]
    }
  },
  "sort": [
    {
      "_score": "desc"
    },
    {
      "_geo_distance": {
        "location": {
          "lat": 60.1,
          "lon": -79.36
        },
        "unit": "km",
        "order": "asc"
      }
    }
  ]
}

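Here missing is the value used for documents that have no state value, and modifier is the function applied to the field value after it has been multiplied by factor. The supported modifiers are: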

  • none
  • log
  • log1p
  • log2p
  • ln
  • ln1p
  • ln2p
  • square
  • sqrt
  • reciprocal

The field_value_factor clause used above corresponds to the following script_score expression (written in Painless):

Math.sqrt(2 * doc['state'].value)

The result is:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 2.4494898,
        "_source" : {
          "state" : 3,
          "name" : "天府五街三号停车场",
          "location" : [
            -77.39,
            63.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          2.4494898,
          351.54896748966985
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 2.4494898,
        "_source" : {
          "state" : 3,
          "name" : "科学城二号停车场",
          "location" : [
            -72.32,
            69.2
          ],
          "crt_time" : "2016-01-01T12:10:30Z"
        },
        "sort" : [
          2.4494898,
          1064.2891333474859
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "6",
        "_score" : 2.0,
        "_source" : {
          "state" : 2,
          "name" : "孵化园六号停车场",
          "location" : [
            -79.36,
            60.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          2.0,
          2.223897568915248
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.4142135,
        "_source" : {
          "state" : 1,
          "name" : "天府三街一号停车场",
          "location" : [
            -71.34,
            66.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          1.4142135,
          780.1677819779763
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 1.4142135,
        "_source" : {
          "state" : 1,
          "name" : "世纪城四号停车场",
          "location" : [
            -69.31,
            68.123
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          1.4142135,
          1013.9586549777497
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 1.4142135,
        "_source" : {
          "state" : 1,
          "name" : "天府五街五号停车场",
          "location" : [
            -90.101,
            80.67
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          1.4142135,
          2312.8208940710206
        ]
      }
    ]
  }
}
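The scores in the result above can be reproduced with a short sketch of field_value_factor. It assumes the score is modifier(factor * value), that log means the base-10 logarithm, that the wildcard query contributes a constant score of 1.0, and that missing is substituted before factor and modifier are applied. The Python function name and the MODIFIERS table are illustrative, not an Elasticsearch API.

```python
import math

# Modifier names from the list above, mapped to plain Python functions.
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(x + 1),
    "log2p": lambda x: math.log10(x + 2),
    "ln": math.log,
    "ln1p": lambda x: math.log(x + 1),
    "ln2p": lambda x: math.log(x + 2),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1.0 / x,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Compute modifier(factor * value), using `missing` when the field is absent."""
    if value is None:
        if missing is None:
            raise ValueError("field is absent and no missing value was given")
        value = missing
    return MODIFIERS[modifier](factor * value)


# state values 3, 2 and 1 with factor 2 and modifier sqrt, as in the query above.
for state in (3, 2, 1):
    print(state, field_value_factor(state, factor=2, modifier="sqrt"))
```

Because every document with the same state gets the same score, the secondary distance sort is what orders the hits within each score group.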

Decay functions

Decay functions score a document with a function that decays depending on the distance of a numeric field value of the document from a user given origin. This is similar to a range query, but with smooth edges instead of boxes.

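Decay functions. Roughly speaking, you give a point (the origin) and a range around it, and documents are scored by how far they are from that point: the farther away, the lower the score, and the closer, the higher.

The control parameters are:

  • origin (the origin, i.e. the desired value; think "ideal")
  • offset (documents within this distance of the origin still get the full score; think "also fine")
  • scale (the distance beyond offset at which the score has dropped to decay; think "acceptable if need be")
  • decay (the score at distance offset + scale from the origin; defaults to 0.5)

The supported functions are gauss (Gaussian), linear, and exp (exponential); their curves are illustrated by a figure in the official function_score documentation.

Price search in an online shop, for example, can be implemented this way.

In the example above, suppose we prefer parking lots within 360 km of the user, and farther ones are acceptable if need be, with the score dropping to half 800 km beyond that. The query is then: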

GET parking_index/doc/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "function_score": {
      "query": {
        "wildcard": {
          "name.keyword": {
            "value": "*停车场"
          }
        }
      },
      "linear": 
        {
           "location": {
                "origin":{ "lat":60.1,"lon": -79.36},
                "offset": "360km",
                "scale": "800km",
                "decay": 0.5
            }
        }
    }
  },
  "sort": [
    {
      "_score": "desc"
    },
    {
      "_geo_distance": {
        "location": {
          "lat": 60.1,
          "lon": -79.36
        },
        "unit": "km",
        "order": "asc"
      }
    }
  ]
}

The result is:

{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "state" : 2,
          "name" : "孵化园六号停车场",
          "location" : [
            -79.36,
            60.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          1.0,
          2.223897568915248
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "state" : 3,
          "name" : "天府五街三号停车场",
          "location" : [
            -77.39,
            63.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          1.0,
          351.54896748966985
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.7373951,
        "_source" : {
          "state" : 1,
          "name" : "天府三街一号停车场",
          "location" : [
            -71.34,
            66.12
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          0.7373951,
          780.1677819779763
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.5912758,
        "_source" : {
          "state" : 1,
          "name" : "世纪城四号停车场",
          "location" : [
            -69.31,
            68.123
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          0.5912758,
          1013.9586549777497
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.5598193,
        "_source" : {
          "state" : 3,
          "name" : "科学城二号停车场",
          "location" : [
            -72.32,
            69.2
          ],
          "crt_time" : "2016-01-01T12:10:30Z"
        },
        "sort" : [
          0.5598193,
          1064.2891333474859
        ]
      },
      {
        "_index" : "parking_index",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 0.0,
        "_source" : {
          "state" : 1,
          "name" : "天府五街五号停车场",
          "location" : [
            -90.101,
            80.67
          ],
          "crt_time" : "2015-01-01T12:10:30Z"
        },
        "sort" : [
          0.0,
          2312.8208940710206
        ]
      }
    ]
  }
}

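The result shows that the farther a parking lot is from the given coordinates, the lower its score. Documents within offset (360 km) get the full score of 1.0. Beyond that, the score decays linearly with distance, reaching decay (0.5) at offset + scale (1160 km) and 0 at offset + scale / (1 - decay) (1960 km).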
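The decay scores above can be recomputed with a minimal sketch of the three decay shapes, written from the formulas in the function_score documentation as I understand them. The function name and argument layout are assumptions for illustration; distances are plain kilometres rather than geo points.

```python
import math


def decay_score(kind, distance, offset, scale, decay=0.5):
    """Score in [0, 1] for a document `distance` away from the origin.

    `kind` is "linear", "exp" or "gauss"; offset and scale use the same
    unit as distance.
    """
    # Distances inside the offset count as zero, which gives the full score.
    d = max(0.0, distance - offset)
    if kind == "linear":
        s = scale / (1.0 - decay)
        return max(0.0, (s - d) / s)
    if kind == "exp":
        lam = math.log(decay) / scale
        return math.exp(lam * d)
    if kind == "gauss":
        sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
        return math.exp(-(d ** 2) / (2.0 * sigma_sq))
    raise ValueError(f"unknown decay function: {kind}")


# Distances in km taken from the result above, with offset 360 km and scale 800 km.
DISTANCES = {
    "6": 2.223897568915248,
    "3": 351.54896748966985,
    "1": 780.1677819779763,
    "4": 1013.9586549777497,
    "2": 1064.2891333474859,
    "5": 2312.8208940710206,
}

for doc_id, km in DISTANCES.items():
    print(doc_id, decay_score("linear", km, offset=360.0, scale=800.0))
```

For all three shapes the score equals decay at a distance of offset + scale; they differ in how quickly the score falls before and after that point.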

 

If you have any questions, feel free to ask and discuss. Let's learn and improve together!
