Index Sorting Overview
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index-modules-index-sorting.html
When creating a new index in Elasticsearch, you can configure how the segments inside each shard are sorted. By default, Lucene applies no sorting. The index.sort.* settings define which fields should be used to sort the documents within each segment. When fetching the top-N documents, a search normally has to visit every document before it can find all matches; with index sorting configured, if the search sort matches the index sort, each shard only needs to visit the first N documents. The query can then terminate early, reducing computation and improving performance.
The following example shows how to define a sort on a single field:
PUT twitter
{
  "settings": {
    "index": {
      "sort.field": "date",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type": "date"
        }
      }
    }
  }
}
An index can also be sorted by more than one field:
PUT twitter
{
  "settings": {
    "index": {
      "sort.field": ["username", "date"],
      "sort.order": ["asc", "desc"]
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "username": {
          "type": "keyword",
          "doc_values": true
        },
        "date": {
          "type": "date"
        }
      }
    }
  }
}
For example, on an index sorted by a descending timestamp field, the 10 most recent documents can be retrieved with:
GET /events/_search
{
  "size": 10,
  "sort": [
    { "timestamp": "desc" }
  ]
}
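Early termination only kicks in when Elasticsearch does not also have to compute an exact total hit count. The same documentation page recommends disabling hit counting so that each segment can stop collecting once the first N sorted documents have been found:

```
GET /events/_search
{
  "size": 10,
  "sort": [
    { "timestamp": "desc" }
  ],
  "track_total_hits": false
}
```

With track_total_hits set to false, the response no longer reports an accurate total, which is the trade-off for the early exit.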
A concrete example
Create a new index essort, pre-sorted by clickcount:
PUT essort
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "sort.field": "clickcount",
      "sort.order": "desc"
    },
    "index.write.wait_for_active_shards": 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "logtime": {
          "type": "date"
        },
        "clickcount": {
          "type": "integer"
        },
        "title": {
          "type": "text"
        },
        "docid": {
          "type": "keyword"
        },
        "desc": {
          "type": "text"
        }
      }
    }
  }
}
Insert n records:
PUT essort/doc/1003
{
  "title": "redis",
  "clickcount": 150,
  "docid": "1003",
  "desc": "電腦",
  "logtime": "2018-12-24T08:12:12Z"
}
.....
PUT essort/doc/2003
{
  "title": "redis",
  "clickcount": 950,
  "docid": "2003",
  "desc": "電腦",
  "logtime": "2019-12-24T08:12:12Z"
}
Force merge the index
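The source doesn't show the actual request. A typical force-merge call for this index, merging down to a single segment so that all documents end up in one fully sorted segment, would be:

```
POST /essort/_forcemerge?max_num_segments=1
```

Merging to one segment makes the sorted order easy to observe when reading the Lucene files directly, since there is only one doc-ID sequence to inspect.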
Reading the Lucene files
Copy the raw Lucene files of the ES index from
D:\soft\elasticsearch-6.2.4\elasticsearch-6.2.4\data\nodes\0\indices\LG5iHCmeTkqrhJ-bzp3UOA\0\index
to the F:\index directory.
Use the Lucene API to read the contents of the Lucene files:
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class SearchEsIndex {

    private String dir;

    public SearchEsIndex(String path) {
        this.dir = path;
    }

    /**
     * Obtain an IndexWriter for the index directory.
     */
    public IndexWriter getWriter() throws IOException {
        // Path of the index directory
        Directory directory = FSDirectory.open(Paths.get(dir));
        // Chinese analyzer
        Analyzer analyzer = new SmartChineseAnalyzer();
        // Holds all configuration used to create the IndexWriter
        IndexWriterConfig iwConfig = new IndexWriterConfig(analyzer);
        return new IndexWriter(directory, iwConfig);
    }

    /**
     * Obtain an IndexReader for the index directory.
     */
    public IndexReader getReader() throws Exception {
        // Path of the index directory
        Directory directory = FSDirectory.open(Paths.get(dir));
        return DirectoryReader.open(directory);
    }

    /**
     * Query by field and query string.
     */
    public void searchForField(String field, String q) throws Exception {
        IndexReader reader = getReader();
        // Create the searcher
        IndexSearcher is = new IndexSearcher(reader);
        // Chinese analyzer (the query-time analyzer must match the index-time analyzer)
        Analyzer analyzer = new SmartChineseAnalyzer();
        // Create the query parser
        QueryParser parser = new QueryParser(field, analyzer);
        // Parse the query string; "*" with default field "*" is treated as a match-all query
        Query query = parser.parse(q);
        // Query start time
        long start = System.currentTimeMillis();
        // Run the query
        TopDocs hits = is.search(query, 10);
        // Query end time
        long end = System.currentTimeMillis();
        System.out.println("Matched " + q + ", took " + (end - start) + " ms, found " + hits.totalHits + " hits");
        // Iterate over hits.scoreDocs
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = is.doc(scoreDoc.doc);
            System.out.println("docId:" + scoreDoc.doc + "," + doc);
            // Elasticsearch stores the original JSON document in the _source stored field
            BytesRef source = doc.getBinaryValue("_source");
            if (source != null) {
                System.out.println("source:" + source.utf8ToString());
            }
        }
        // Close the reader
        reader.close();
    }

    public static void main(String[] args) {
        SearchEsIndex searchEsIndex = new SearchEsIndex("F:\\index");
        try {
            // Match all documents; ties in score are broken by doc ID,
            // so results come back in the segment's (pre-sorted) order
            searchEsIndex.searchForField("*", "*");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Printing the Lucene file data
As the output shows, after pre-sorting on clickcount the documents in Lucene are stored in descending clickcount order: the document with Lucene doc ID 0 has the largest clickcount.