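Splitting an Elasticsearch string into an array with Painless

I. Scenario:

The Elasticsearch field imgs is a string field. In some historical documents it holds a single comma-separated string, and those values need to be split into arrays.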

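Example:

1. Build the test data:

Create the index and index a few typical historical documents that cover the following cases:

  • a comma-separated string;
  • an array;
  • a zero-length string;
  • an empty array.
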
PUT test_cj/test/id_1
{
  "imgs": "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg,https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg,https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
}


PUT test_cj/test/id_2
{
  "imgs": [
    "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
  ]
}

PUT test_cj/test/id_3
{
  "imgs": ""
}

PUT test_cj/test/id_4
{
  "imgs": []
}

2. Verify the data.

GET test_cj/_search

The hits array of the response:

[
      {
        "_index" : "test_cj",
        "_type" : "test",
        "_id" : "id_1",
        "_score" : 1.0,
        "_source" : {
          "imgs" : "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg,https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg,https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
        }
      },
      {
        "_index" : "test_cj",
        "_type" : "test",
        "_id" : "id_2",
        "_score" : 1.0,
        "_source" : {
          "imgs" : [
            "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
            "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
            "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
            "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
            "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
          ]
        }
      },
      {
        "_index" : "test_cj",
        "_type" : "test",
        "_id" : "id_3",
        "_score" : 1.0,
        "_source" : {
          "imgs" : ""
        }
      },
      {
        "_index" : "test_cj",
        "_type" : "test",
        "_id" : "id_4",
        "_score" : 1.0,
        "_source" : {
          "imgs" : [ ]
        }
      }
    ]

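3. Run the Painless script

Update the historical data with a Painless script. A few points to note:

  • To update only the documents that match certain conditions, use the _update_by_query operation with a query; this example is simple, so no query clause is set and every document in the index is visited.
  • Conflict handling during execution: conflicts=proceed is used here, which means the operation continues when it hits a version conflict instead of aborting;
  • Painless checks an object's type with the instanceof keyword;
  • To avoid regular expressions, the script splits the string with StringTokenizer.
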
POST test_cj/_update_by_query?conflicts=proceed
{
  "script": {
    "source": """
      // Only convert values still stored as a single string; arrays are left untouched.
      if (ctx._source['imgs'] instanceof String) {
        String s = ctx._source['imgs'];
        ArrayList array = new ArrayList();
        // A zero-length string becomes an empty array.
        if (!s.isEmpty()) {
          String splitter = ",";
          StringTokenizer tokenValue = new StringTokenizer(s, splitter);
          while (tokenValue.hasMoreTokens()) {
            array.add(tokenValue.nextToken());
          }
        }
        ctx._source.imgs = array;
      }
    """
  }
}
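
The per-document logic of the script can be tried outside Elasticsearch as well. The following is a minimal Java sketch, not the Painless script itself: the class name ImgsMigration, the method migrate, and the Map standing in for ctx._source are all assumptions made for illustration. It applies the same instanceof check and StringTokenizer loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for the Painless script; `source` plays the role of ctx._source.
class ImgsMigration {

    // Replaces a comma-separated String under "imgs" with a list of its tokens.
    // Values that are not Strings (for example existing arrays) are left untouched.
    static void migrate(Map<String, Object> source) {
        Object value = source.get("imgs");
        if (value instanceof String) {
            String s = (String) value;
            List<String> array = new ArrayList<>();
            // A zero-length string yields an empty list.
            if (!s.isEmpty()) {
                StringTokenizer tokens = new StringTokenizer(s, ",");
                while (tokens.hasMoreTokens()) {
                    array.add(tokens.nextToken());
                }
            }
            source.put("imgs", array);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("imgs", "a.jpg,b.jpg,c.jpg");
        migrate(doc);
        System.out.println(doc.get("imgs")); // prints [a.jpg, b.jpg, c.jpg]
    }
}
```

Running the sketch against the four test cases (string, array, empty string, empty array) is a cheap way to check the branching before touching a real index.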

4. If a large amount of data is being updated, the operation takes a while. Check its progress in the meantime:

GET _tasks?detailed=true&actions=*byquery

5. Check the result.

GET test_cj/_search

The hits array of the response:

[
      {
        "_index" : "test_cj",
        "_type" : "test",
        "_id" : "id_1",
        "_score" : 1.0,
        "_source" : {
          "imgs" : [
            "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg",
            "https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg",
            "https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
          ]
        }
      },
      {
        "_index" : "test_cj",
        "_type" : "test",
        "_id" : "id_2",
        "_score" : 1.0,
        "_source" : {
          "imgs" : [
            "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
            "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
            "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
            "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
            "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
          ]
        }
      },
      {
        "_index" : "test_cj",
        "_type" : "test",
        "_id" : "id_3",
        "_score" : 1.0,
        "_source" : {
          "imgs" : [ ]
        }
      },
      {
        "_index" : "test_cj",
        "_type" : "test",
        "_id" : "id_4",
        "_score" : 1.0,
        "_source" : {
          "imgs" : [ ]
        }
      }
    ]
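
One behaviour of StringTokenizer is worth keeping in mind for messier historical data: it skips empty tokens and does not trim whitespace. The following Java sketch (the class and method names are hypothetical, and it runs as plain Java rather than Painless) compares it with String.split on an input containing a doubled comma and a trailing comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class comparing two ways of splitting on commas.
class TokenizerEdgeCases {

    // Same loop as in the script: empty tokens are silently dropped.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(s, ",");
        while (tokens.hasMoreTokens()) {
            out.add(tokens.nextToken());
        }
        return out;
    }

    // String.split keeps interior empty strings but drops trailing ones.
    static List<String> splitPlain(String s) {
        return Arrays.asList(s.split(","));
    }

    public static void main(String[] args) {
        String messy = "a.jpg,,b.jpg,";
        System.out.println(tokenize(messy).size());   // prints 2
        System.out.println(splitPlain(messy).size()); // prints 3
        // Whitespace after a comma is kept as part of the token.
        System.out.println(tokenize("a.jpg, b.jpg").get(1).startsWith(" ")); // prints true
    }
}
```

For a list of image URLs, dropping empty entries is usually the desired outcome; if the data may contain spaces after commas, trimming each token before adding it would be a reasonable extension.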