Deleting Data from ES

Three ways to delete data from ES

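1  Newer versions of ES do not support bulk deletion out of the box, so the first approach is to query ES for the document ids and then delete the documents one by one by id.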

import org.elasticsearch.action.search.SearchType
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.common.unit.TimeValue
import org.elasticsearch.index.query.QueryBuilders

def scrollScanDeleteByTopic(client: TransportClient, index: String, topic: String): Unit = {
  // Open a scroll over every document whose topicName matches the given topic
  var searchResponse = client.prepareSearch(index).setTypes("docs")
    .setQuery(QueryBuilders.boolQuery().must(QueryBuilders.matchQuery("topicName", topic)))
    .setSearchType(SearchType.DEFAULT)
    .addStoredField("id")
    .setSize(1000).setScroll(TimeValue.timeValueMinutes(5))
    .execute().actionGet()

  var num = searchResponse.getHits.getHits.length
  // Keep looping until a scroll page comes back empty
  while (num != 0) {
    println(" num " + num)
    val res = searchResponse.getHits.getHits
    // Delete every hit in the current page by its document id
    res.foreach { x =>
      client.prepareDelete(index, "docs", x.getId).execute().actionGet()
    }
    println(s"========================= $index deleted successfully ${res.length}==${topic}  ==========")
    // Fetch the next page of the same scroll
    searchResponse = client.prepareSearchScroll(searchResponse.getScrollId)
      .setScroll(TimeValue.timeValueMinutes(5))
      .execute().actionGet()
    num = searchResponse.getHits.getHits.length
  }
}
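
Stripped of the ES calls, the loop in method 1 is a plain "fetch a page, process it, fetch the next one until a page is empty" pattern. Here is a minimal sketch of just that control flow. It assumes a hypothetical `nextPage` function standing in for the scroll request; `ScrollLoopSketch` and `drain` are illustrative names, not part of any ES API.

```scala
object ScrollLoopSketch {
  // nextPage is a hypothetical stand-in for the scroll request:
  // each call returns the next page and an empty page once the data is exhausted.
  def drain[A](nextPage: () => Seq[A])(handle: Seq[A] => Unit): Int = {
    var total = 0
    var page = nextPage()
    // Same termination rule as the ES loop: stop on the first empty page
    while (page.nonEmpty) {
      handle(page)
      total += page.length
      page = nextPage()
    }
    total // number of items handed to `handle`
  }
}
```

With this shape, the per-page work (deleting one by one, or handing the ids to worker threads) is the only part that changes between method 1 and method 2.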

2 Deleting documents one by one turned out to be slow, so the second approach builds on the first: after each query, the deletes are handed to multiple threads.

def scrollScanDeleteByTopic(client: TransportClient, index: String, topic: String): Unit = {
  // Open a scroll over every document whose topicName matches the given topic
  var searchResponse = client.prepareSearch(index).setTypes("docs")
    .setQuery(QueryBuilders.boolQuery().must(QueryBuilders.matchQuery("topicName", topic)))
    .setSearchType(SearchType.DEFAULT)
    .addStoredField("id")
    .setSize(1000).setScroll(TimeValue.timeValueMinutes(5))
    .execute().actionGet()

  var num = searchResponse.getHits.getHits.length
  // Keep looping until a scroll page comes back empty
  while (num != 0) {
    println(" num " + num)
    // Collect the ids of the current page and delete them with 4 threads
    val res = searchResponse.getHits.getHits.map(_.getId)
    deleteDocuments(4, res)

    println(s"========================= $index deleted successfully ${res.length}==${topic}  ==========")
    // Fetch the next page of the same scroll
    searchResponse = client.prepareSearchScroll(searchResponse.getScrollId)
      .setScroll(TimeValue.timeValueMinutes(5))
      .execute().actionGet()
    num = searchResponse.getHits.getHits.length
  }
}
def deleteDocuments(num: Int, arr: Array[String]): Unit = {
  // Slice size per thread, rounded up so that every id is covered
  val step = if (arr.length % num == 0) arr.length / num else arr.length / num + 1
  val threads = for (i <- 0 until num) yield {
    val thread = new Thread(new Runnable {
      override def run(): Unit = {
        val client = Es_test.getDeleteClient()
        // Thread i handles the ids in [i * step, (i + 1) * step)
        for (id <- arr.slice(i * step, (i + 1) * step)) {
          client.prepareDelete("ods_wj_scenes_detail", "docs", id).execute().actionGet()
        }
        Es_test.close(client)
      }
    })
    thread.start()
    thread
  }
  // Wait for all workers so the caller only logs success once the deletes are done
  threads.foreach(_.join())
}
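
The slice arithmetic is the easiest part of the multithreaded version to get wrong, so it can help to check it separately from ES. Here is a minimal sketch that uses ceiling division so each id lands in exactly one slice. `ChunkSketch` and `partition` are hypothetical names used only for illustration.

```scala
object ChunkSketch {
  // Split ids into at most `workers` contiguous slices of near-equal size.
  def partition(ids: Array[String], workers: Int): List[Array[String]] = {
    require(workers > 0, "workers must be positive")
    // Ceiling division; at least 1 so grouped() accepts it for short arrays
    val step = math.max(1, (ids.length + workers - 1) / workers)
    ids.grouped(step).toList
  }

  def main(args: Array[String]): Unit = {
    val ids = (1 to 10).map(n => s"doc-$n").toArray
    // Print each slice so the boundaries can be inspected
    partition(ids, 4).foreach(slice => println(slice.mkString(", ")))
  }
}
```

Each slice can then be handed to one worker thread, which removes the need to recompute begin and end indexes inside the thread body.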


3 I found a plugin online, delete-by-query, that can do bulk deletion.

// sbt dependency
"org.elasticsearch.plugin" % "delete-by-query" % "2.4.1" % Test

// Match every document in the index
val queryBuilder = QueryBuilders.boolQuery()
queryBuilder.must(QueryBuilders.matchAllQuery())
val start = new Date().getTime
// Run the delete-by-query request and block until it completes
val response = DeleteByQueryAction.INSTANCE.newRequestBuilder(client).filter(queryBuilder).source("ods_wj_scenes_detail").get()
val deleted = response.getDeleted
val end = new Date().getTime
// Print the number of deleted documents and the elapsed time in milliseconds
println(s"=================$deleted=====${end-start}==============")
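
Results: method 1 is fairly slow. Method 2 is faster: with 4 threads, deleting 760,000 documents took 182,883 ms. Method 3 deleted 740,000 documents in 51,410 ms.
So efficiency improves from one method to the next, and method 3 is the fastest. I also found one problem along the way: raising the thread count in method 2 does not reduce the elapsed time, because the bottleneck is mainly the query. Measurements showed that loading 10,000 documents takes about 2 s. If the query could be made faster, method 2 would speed up as well.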
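
The numbers above can be turned into throughput figures and a rough estimate of how much of method 2's time goes to querying. This is a back-of-the-envelope sketch, not a benchmark. It assumes the measured rate of about 2 s per 10,000 documents holds uniformly across the whole run, and `DeleteTimingSketch` and its functions are hypothetical names.

```scala
object DeleteTimingSketch {
  // Documents deleted per second, given a count and an elapsed time in ms
  def docsPerSecond(docs: Long, millis: Long): Double = docs * 1000.0 / millis

  // Estimated seconds spent only on loading documents,
  // assuming a constant cost per 10,000 documents
  def querySeconds(docs: Long, secondsPer10k: Double): Double =
    docs / 10000.0 * secondsPer10k

  def main(args: Array[String]): Unit = {
    val method2 = docsPerSecond(760000L, 182883L)
    val method3 = docsPerSecond(740000L, 51410L)
    val query = querySeconds(760000L, 2.0)
    println(f"method 2: $method2%.0f docs/s, method 3: $method3%.0f docs/s")
    println(f"method 2 estimated query time: $query%.0f s of ${182883 / 1000.0}%.1f s total")
  }
}
```

Under that assumption, querying accounts for roughly 152 s of the 183 s total. This is consistent with the observation that adding delete threads barely changes the elapsed time.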

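Also, this plugin is currently a beta, and its compatibility with other components is not especially good. In practice I ran into a jar problem: one of the jars it depends on, log4j-slf4j-impl 2.8, is missing a key class, ReflectUtil. The fix is to use the latest version of that jar. I downloaded 2.9, and it solved the problem completely.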





