Three ways to delete data in ES (Elasticsearch)
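1 Newer versions of ES do not support bulk deletion by query out of the box, so the first approach is to query ES for the document ids and then delete the documents one by one by id.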
import org.elasticsearch.action.search.SearchType
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.common.unit.TimeValue
import org.elasticsearch.index.query.QueryBuilders

def scrollScanDeleteByTopic(client: TransportClient, index: String, topic: String): Unit = {
  // Open a scroll over every document whose topicName matches the topic
  var searchResponse = client.prepareSearch(index).setTypes("docs")
    .setQuery(QueryBuilders.boolQuery().must(QueryBuilders.matchQuery("topicName", topic)))
    .setSearchType(SearchType.DEFAULT)
    .addStoredField("id")
    .setSize(1000).setScroll(TimeValue.timeValueMinutes(5))
    .execute().actionGet()
  var num = searchResponse.getHits.getHits.length
  // Loop until every page of the scroll has been consumed
  while (num != 0) {
    println(" num " + num)
    val res = searchResponse.getHits.getHits
    // Delete the documents of this page one at a time, by id
    res.foreach { x =>
      client.prepareDelete(index, "docs", x.getId).execute().actionGet()
    }
    println(s"========================= $index deleted successfully ${res.length}==${topic} ==========")
    // Fetch the next page of the scroll
    searchResponse = client.prepareSearchScroll(searchResponse.getScrollId)
      .setScroll(TimeValue.timeValueMinutes(5))
      .execute().actionGet()
    num = searchResponse.getHits.getHits.length
  }
}
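2 Deleting one by one turned out to be rather slow, so as an improvement on the same idea, the documents found by the query are deleted with multiple threads.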
def scrollScanDeleteByTopic(client: TransportClient, index: String, topic: String): Unit = {
  // Open a scroll over every document whose topicName matches the topic
  var searchResponse = client.prepareSearch(index).setTypes("docs")
    .setQuery(QueryBuilders.boolQuery().must(QueryBuilders.matchQuery("topicName", topic)))
    .setSearchType(SearchType.DEFAULT)
    .addStoredField("id")
    .setSize(1000).setScroll(TimeValue.timeValueMinutes(5))
    .execute().actionGet()
  var num = searchResponse.getHits.getHits.length
  // Loop until every page of the scroll has been consumed
  while (num != 0) {
    println(" num " + num)
    // Collect the ids of this page and hand them to 4 worker threads
    val res = searchResponse.getHits.getHits.map(_.getId)
    deleteDocuments(4, res)
    println(s"========================= $index deleted successfully ${res.length}==${topic} ==========")
    // Fetch the next page of the scroll
    searchResponse = client.prepareSearchScroll(searchResponse.getScrollId)
      .setScroll(TimeValue.timeValueMinutes(5))
      .execute().actionGet()
    num = searchResponse.getHits.getHits.length
  }
}
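def deleteDocuments(num: Int, arr: Array[String]): Unit = {
  // Chunk size per thread, rounded up so that every id is covered
  val step = if (arr.length % num == 0) arr.length / num else arr.length / num + 1
  val threads = for (i <- 0 until num) yield {
    val t = new Thread(new Runnable {
      override def run(): Unit = {
        // Each thread works with its own client
        val client = Es_test.getDeleteClient()
        val beginNum = i * step
        val endNum = math.min((i + 1) * step, arr.length)
        // Delete only the ids that belong to this thread's chunk
        for (j <- beginNum until endNum) {
          client.prepareDelete("ods_wj_scenes_detail", "docs", arr(j)).execute().actionGet()
        }
        Es_test.close(client)
      }
    })
    t.start() // without start() the thread never runs and nothing is deleted
    t
  }
  // Wait for all workers so the caller continues only after this batch is deleted
  threads.foreach(_.join())
}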
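The chunking idea behind deleteDocuments can be tried out without an Elasticsearch cluster. The following is only a minimal sketch under assumptions: ChunkedDelete, partition, deleteAll and the deleteOne callback are hypothetical names that are not part of the original code, and deleteOne stands in for the real prepareDelete call. It splits the ids with ceiling division and runs one chunk per worker on a fixed thread pool.

```scala
import java.util.concurrent.Executors

object ChunkedDelete {
  // Split ids into at most `parts` contiguous chunks of near-equal size.
  def partition(ids: Array[String], parts: Int): List[Array[String]] = {
    require(parts > 0, "parts must be positive")
    if (ids.isEmpty) Nil
    else {
      val step = (ids.length + parts - 1) / parts // ceiling division
      ids.grouped(step).toList
    }
  }

  // Apply `deleteOne` (hypothetical stand-in for the real delete call) to
  // every id, one chunk per worker thread, and block until all are done.
  def deleteAll(ids: Array[String], parts: Int)(deleteOne: String => Unit): Unit = {
    val chunks = partition(ids, parts)
    if (chunks.nonEmpty) {
      val pool = Executors.newFixedThreadPool(chunks.size)
      try {
        // Submit every chunk first, then wait for each one to finish
        val futures = chunks.map { chunk =>
          pool.submit(new Runnable {
            override def run(): Unit = chunk.foreach(deleteOne)
          })
        }
        futures.foreach(_.get()) // get() rethrows a failure from the worker
      } finally pool.shutdown()
    }
  }
}
```

A call might look like ChunkedDelete.deleteAll(ids, 4)(id => println(id)), with the println replaced by the actual delete. Using a pool instead of raw threads makes it easier to wait for completion and to notice a failed worker.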
3 I found a plugin online, delete-by-query, which can delete documents in bulk.
"org.elasticsearch.plugin" % "delete-by-query" % "2.4.1" % Test
// Match every document in the index
val queryBuilder = QueryBuilders.boolQuery()
queryBuilder.must(QueryBuilders.matchAllQuery())
val start = new Date().getTime
// Delete everything that matches the query with a single request
val response = DeleteByQueryAction.INSTANCE.newRequestBuilder(client)
  .filter(queryBuilder)
  .source("ods_wj_scenes_detail")
  .get()
val deleted = response.getDeleted
val end = new Date().getTime
// Print the number of deleted documents and the elapsed milliseconds
println(s"=================$deleted=====${end-start}==============")
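The start/end timing used here is repeated for each method, so it can be factored into a small helper. This is only a sketch: Timing and timed are hypothetical names, not part of the original code, and it uses System.nanoTime instead of new Date().getTime because nanoTime is meant for measuring elapsed time.

```scala
object Timing {
  // Run `body` once and return its result together with the elapsed
  // wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}
```

Any of the three delete calls could then be wrapped as val (result, ms) = Timing.timed { ... } and the pair printed, which keeps the measurements comparable.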
Results: method 1 is fairly slow. Method 2 is faster: deleting with 4 threads, 760,000 documents took 182,883 ms. Method 3: 740,000 documents took 51,410 ms.
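So the efficiency improves from one method to the next, with method 3 being the fastest. I also noticed a problem along the way: gradually increasing the number of threads in method 2 does not reduce the total time. The bottleneck is mainly the query; by measurement, loading 10,000 documents takes about 2 s. If the query could be made faster, method 2 would speed up as well.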
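The numbers above can be turned into throughput figures to see why more threads do not help. The sketch below only redoes the arithmetic from the measurements in this post; Throughput, docsPerSecond and scrollSeconds are hypothetical helper names, and the 2 s per 10,000 documents is the rough figure quoted above, not a precise benchmark. Method 2 comes out at roughly 4,156 documents per second and method 3 at roughly 14,394, while scrolling through 760,000 documents at 2 s per 10,000 already costs about 152 s of the 183 s total.

```scala
object Throughput {
  // Documents processed per second, given a count and elapsed milliseconds.
  def docsPerSecond(docs: Long, elapsedMs: Long): Double =
    docs.toDouble / (elapsedMs / 1000.0)

  // Estimated seconds spent on scrolling alone, assuming a fixed cost
  // per 10,000 documents (about 2 s according to the measurement above).
  def scrollSeconds(docs: Long, secondsPer10k: Double = 2.0): Double =
    docs / 10000.0 * secondsPer10k

  def main(args: Array[String]): Unit = {
    println(f"method 2: ${docsPerSecond(760000L, 182883L)}%.0f docs/s")
    println(f"method 3: ${docsPerSecond(740000L, 51410L)}%.0f docs/s")
    println(f"scroll-only estimate for 760,000 docs: ${scrollSeconds(760000L)}%.0f s")
  }
}
```

Since the scroll estimate accounts for most of method 2's run time, the deletes themselves are a small share, which is consistent with extra threads making no difference.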
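One more thing: this plugin is still in beta, and its compatibility with other components is not great. While using it I ran into a jar problem: one of the jars it depends on, log4j-slf4j-impl 2.8, is missing a key class, ReflectUtil. The fix is to use the latest version of that jar; I downloaded 2.9, which solved the problem completely.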