Elasticsearch update API vs. update_by_query: which one wins?

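It has been a while since I last jotted down notes. Today's post corrects a long-standing misconception in how we use ES, and it is one you have quite likely fallen for yourself. Many people use update_by_query as if it had the semantics of a MySQL UPDATE ... WHERE statement, and that is a serious mistake. It is worth revisiting my earlier post on update_by_query here: in theory, ES is near-real-time only for search, while a GET by id is real-time.

Practice is the sole criterion for testing truth, so let's look at the demonstration below.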

<!--1. Disable refresh-->
PUT http://xxx:9200/mytest_user/_settings
{
    "index" : {
        "refresh_interval" : -1
    }
}
<!--2. Index a document-->
PUT http://xxx:9200/mytest_user/_doc/2
{
    "product_type": "test",
    "product_code": "324049",
    "shop_code": "9N72"
}
<!--3. Update; these requests can run concurrently, yet no update is lost-->
POST http://xxx:9200/mytest_user/_doc/2/_update
{
    "doc" : {
        "name" : "new_name"
    }
}
POST http://xxx:9200/mytest_user/_doc/2/_update
{
    "doc" : {
        "name" : "alex",
        "age" : 20
    }
}
...
<!--4. update_by_query; it has no effect at all-->
POST http://xxx:9200/mytest_user/_update_by_query?conflicts=proceed
{
    "query" : {
        "term" : { "product_code": "324049" }
    },
    "script": {
        "source": "ctx._source.en_product_name='cn';ctx._source.plu_code='00';"
    }
}
<!--5. GET by id returns the latest data-->
GET http://xxx:9200/mytest_user/_doc/2

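That's right: _update_by_query goes through search, so with refresh disabled it finds nothing and does nothing. The update API, by contrast, relies on the real-time GET API (it first does a GET by document ID, then modifies the latest document and writes it back). The GET API itself has a parameter that makes it non-real-time (http://xxx:9200/mytest_user/_doc/4?realtime=false).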
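To make the difference concrete, here is a minimal in-memory sketch of the behavior seen in the demo. It is a toy model, not Elasticsearch code: the class ToyShard and all of its methods are hypothetical names invented for illustration, and version-conflict handling is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one shard. NOT Elasticsearch code; every name here is hypothetical. */
final class ToyShard {
    // Writes since the last refresh (stands in for the index writer buffer).
    private final Map<String, Map<String, String>> buffer = new HashMap<>();
    // Documents visible to search (stands in for the refreshed searcher).
    private final Map<String, Map<String, String>> searcher = new HashMap<>();

    /** Indexes a document; it is not searchable until refresh() runs. */
    void index(String id, Map<String, String> doc) {
        buffer.put(id, new HashMap<>(doc));
    }

    /** Makes all buffered writes searchable, as a refresh would. */
    void refresh() {
        searcher.putAll(buffer);
        buffer.clear();
    }

    /** GET by id: a real-time GET refreshes first when the id has unrefreshed writes. */
    Map<String, String> get(String id, boolean realtime) {
        if (realtime && buffer.containsKey(id)) {
            refresh();
        }
        return searcher.get(id);
    }

    /**
     * update_by_query-like operation: search by term, then set one field on every hit.
     * Only refreshed documents can match. Returns the number of documents updated.
     */
    int updateByQuery(String field, String value, String setField, String setValue) {
        int updated = 0;
        for (Map.Entry<String, Map<String, String>> e : searcher.entrySet()) {
            if (value.equals(e.getValue().get(field))) {
                Map<String, String> copy = new HashMap<>(e.getValue());
                copy.put(setField, setValue);
                buffer.put(e.getKey(), copy); // written, but not yet searchable
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        ToyShard shard = new ToyShard();
        shard.index("2", Map.of("product_code", "324049"));
        // Nothing has been refreshed yet, so the query matches no documents.
        System.out.println(shard.updateByQuery("product_code", "324049", "plu_code", "00"));
        // A non-real-time GET reads from the searcher and misses the document.
        System.out.println(shard.get("2", false));
        // A real-time GET refreshes first and finds it.
        System.out.println(shard.get("2", true));
    }
}
```

In this model the same updateByQuery call only starts matching after a refresh has happened, which is the effect step 4 of the demo ran into.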

realtime

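According to the official documentation, the get API is real-time by default and is not affected by the refresh rate of the index (the point at which data becomes visible to search). If a document has been updated but not yet refreshed, the get API issues a refresh call to make the document visible. This also makes other documents changed since the last refresh visible. To disable real-time GET, set the realtime parameter to false.

Neither the documentation nor the source code of the update API offers a parameter to "disable" this real-time behavior. When update calls GET, the realtime argument it passes is hard-coded to true.

Why does the GET have to be real-time?

update allows partial updates of a document's fields. Suppose two requests update different fields. The data from the first update may still sit only in the index writer buffer, invisible to the searcher. The second update would then be applied on top of the old version of the document, and the first partial update would be lost.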
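The lost-update scenario can be replayed with a small simulation. This is only a sketch under simplifying assumptions: PartialUpdateSketch is a hypothetical class, it tracks a single document, and "real-time" is modeled as simply reading the newest written version rather than performing an actual refresh.

```java
import java.util.HashMap;
import java.util.Map;

/** Simulates read-modify-write partial updates on one document. Hypothetical names throughout. */
final class PartialUpdateSketch {
    private Map<String, String> searchable = new HashMap<>(); // version the searcher sees
    private Map<String, String> latest = new HashMap<>();     // newest written version

    /** Merges a partial doc into a base version, chosen by the realtimeRead flag. */
    void update(Map<String, String> partial, boolean realtimeRead) {
        Map<String, String> base = realtimeRead ? latest : searchable;
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(partial);
        latest = merged; // written, but not searchable until refresh()
    }

    /** Publishes the newest written version to the searcher. */
    void refresh() {
        searchable = latest;
    }

    /** Runs two partial updates with no refresh in between and returns the final document. */
    static Map<String, String> run(boolean realtimeRead) {
        PartialUpdateSketch doc = new PartialUpdateSketch();
        doc.update(Map.of("product_code", "324049"), true);
        doc.refresh();
        doc.update(Map.of("name", "new_name"), realtimeRead);
        doc.update(Map.of("age", "20"), realtimeRead);
        doc.refresh();
        return doc.latest;
    }

    public static void main(String[] args) {
        System.out.println("real-time reads: " + run(true));
        System.out.println("stale reads:     " + run(false));
    }
}
```

With stale reads, the second partial update starts from the last refreshed version and overwrites the result of the first one, so the name field disappears from the final document.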

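The conclusion above can be observed with the cat segments API: http://xxx:9200/_cat/segments/mytest_user?v

If realtime is set to false, the document is fetched from the searcher, and the searcher can only see data that has been refreshed. Freshly written data lives in the index writer buffer and cannot be searched yet, so the data returned this way is only near-real-time.

In versions 5.x and above, a real-time GET can reach the data in the index writer buffer, and it does so by forcing a refresh (independent of refresh_interval), which produces a new segment file. If the same doc id is updated repeatedly in a short period, the overly frequent refreshes create many small segments, which in turn keep triggering merges and cost performance. The official position is that the performance loss in this relatively rare scenario is an acceptable price for improving performance in most use cases, and that it should be handled at the application layer.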
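One possible way to handle this at the application layer is to merge pending partial updates per document id before sending them, so that each id is updated once per batch instead of many times in a row. The sketch below is an assumption about how that could look, not something the official discussion prescribes; UpdateCoalescer and coalesce are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Merges queued partial updates by document id. Hypothetical helper, not an ES API. */
final class UpdateCoalescer {
    /**
     * Collapses a list of (id, partial doc) pairs into one partial doc per id.
     * Later entries override earlier ones for the same field; ids keep first-seen order.
     */
    static Map<String, Map<String, String>> coalesce(
            List<Map.Entry<String, Map<String, String>>> pending) {
        Map<String, Map<String, String>> byId = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> update : pending) {
            byId.computeIfAbsent(update.getKey(), id -> new LinkedHashMap<>())
                .putAll(update.getValue());
        }
        return byId;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Map<String, String>>> pending = List.of(
            Map.entry("2", Map.of("name", "new_name")),
            Map.entry("2", Map.of("name", "alex", "age", "20")),
            Map.entry("4", Map.of("name", "bob")));
        // Two update requests (ids 2 and 4) would be sent instead of three.
        System.out.println(coalesce(pending));
    }
}
```

Each merged entry can then be sent as a single update request, which reduces how often a real-time GET forces a refresh for the same id.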

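Note: when documents are updated through the update method with upsert disabled, updating a doc id that does not exist throws a document missing exception. Throwing and catching large numbers of these exceptions is expensive.

In version 2.4, real-time GET was not implemented with a refresh. Instead, it read directly from the translog. In this change, https://github.com/elastic/elasticsearch/pull/20102, the approach was switched to refresh. The rationale was that many places in ES used the translog to track the location of data, which made many operations slow, and removing the dependency on the translog improves performance.

Code verification

The implementation does indeed contain the realtime parameter and the call refresh("realtime_get");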

// From core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
public GetResult get(Get get, Function<String, Searcher> searcherFactory, LongConsumer onRefresh) throws EngineException {
        assert Objects.equals(get.uid().field(), uidField) : get.uid().field();
        try (ReleasableLock lock = readLock.acquire()) {
            ensureOpen();
            if (get.realtime()) {
                VersionValue versionValue = versionMap.getUnderLock(get.uid());
                if (versionValue != null) {
                    if (versionValue.isDelete()) {
                        return GetResult.NOT_EXISTS;
                    }
                    if (get.versionType().isVersionConflictForReads(versionValue.getVersion(), get.version())) {
                        throw new VersionConflictEngineException(shardId, get.type(), get.id(),
                            get.versionType().explainConflictForReads(versionValue.getVersion(), get.version()));
                    }
                    long time = System.nanoTime();
                    refresh("realtime_get");
                    onRefresh.accept(System.nanoTime() - time);
                }
            }

            // no version, get the version from the index, we know that we refresh on flush
            return getFromSearcher(get, searcherFactory);
        }
}

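This should make it clear: if your workload involves many updates to ES, first consider the update API, which builds on the real-time GET API (and therefore depends on _id). Otherwise, whether you use update_by_query or hand-written code with similar semantics (search first, then update), you have to accept the risk of lost updates.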
