DataSketches HLL Sketch module

Original post

2020-06-30 00:59

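According to the official documentation, this module provides Apache Druid aggregators for distinct counting, based on the HLL sketch from the datasketches library. At ingestion time, the aggregator creates HLL sketch objects, which are stored in Druid segments. At query time, the sketches are read and merged together. By default, the final result is an estimate of the number of distinct values presented to the sketch. You can also use post-aggregators to produce a union of sketch columns in the same row. The HLL sketch aggregator can be applied to any column of identifiers, and it returns the estimated cardinality of that column.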
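To make the "build at ingestion, merge at query time" idea concrete, here is a toy sketch of the classic HyperLogLog algorithm. It is not the DataSketches implementation that Druid uses; the class name `ToyHll`, the SHA-1 hashing, and the correction constants are assumptions chosen only for illustration. It shows why sketches can be merged: combining two sketches is a bucket-wise maximum, and the result is identical to a sketch built from all the values at once.

```python
import hashlib
import math


class ToyHll:
    """Toy classic HyperLogLog. Illustrative only, NOT the DataSketches code."""

    def __init__(self, lg_k=12):
        self.lg_k = lg_k
        self.k = 1 << lg_k              # number of buckets, K = 2^lgK
        self.registers = [0] * self.k

    def update(self, value):
        # Derive a 64-bit hash from the value (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        rest_bits = 64 - self.lg_k
        index = h >> rest_bits                    # top lgK bits pick the bucket
        rest = h & ((1 << rest_bits) - 1)         # remaining bits
        rank = rest_bits - rest.bit_length() + 1  # 1-based position of first 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        if other.lg_k != self.lg_k:
            raise ValueError("toy merge requires equal lgK")
        merged = ToyHll(self.lg_k)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.k)     # classic HLL constant, K >= 128
        raw = alpha * self.k * self.k / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.k and zeros:
            return self.k * math.log(self.k / zeros)  # small-range correction
        return raw


# Two "segments" with overlapping hypothetical user ids.
a, b, direct = ToyHll(), ToyHll(), ToyHll()
for i in range(6000):
    a.update(f"user-{i}")
for i in range(4000, 10000):
    b.update(f"user-{i}")
for i in range(10000):
    direct.update(f"user-{i}")

merged = a.merge(b)
print(merged.registers == direct.registers)  # True: merging loses nothing
print(round(merged.estimate()))              # approximate count of 10000 distinct ids
```

Because merging is lossless in this sense, overlapping values across segments are not double counted.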

To use this aggregator, the extension must be included in the configuration file:

druid.extensions.loadList=["druid-datasketches"]

Aggregator templates:

{
  "type" : "HLLSketchBuild",
  "name" : <output name>,
  "fieldName" : <metric name>,
  "lgK" : <size and accuracy parameter>,
  "tgtHllType" : <target HLL type>,
  "round": <false | true>
}

{
  "type" : "HLLSketchMerge",
  "name" : <output name>,
  "fieldName" : <metric name>,
  "lgK" : <size and accuracy parameter>,
  "tgtHllType" : <target HLL type>,
  "round": <false | true>
}
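The templates above use placeholders. As a minimal sketch of how they might be filled in, the helper below builds a concrete aggregator spec as a Python dict and prints it as JSON. The function name `hll_aggregator` and the column names `user_id` and `user_id_sketch` are hypothetical. The allowed range for `lgK` (4 to 21), the three `tgtHllType` values, and the defaults are as I understand them from the Druid documentation, so check them against your Druid version.

```python
import json

# Allowed target types as I understand them from the Druid docs (assumption).
VALID_HLL_TYPES = ("HLL_4", "HLL_6", "HLL_8")


def hll_aggregator(agg_type, name, field_name, lg_k=12,
                   tgt_hll_type="HLL_4", round_result=False):
    """Build an HLLSketchBuild or HLLSketchMerge aggregator spec (hypothetical helper)."""
    if agg_type not in ("HLLSketchBuild", "HLLSketchMerge"):
        raise ValueError(f"unknown aggregator type: {agg_type}")
    if not 4 <= lg_k <= 21:
        raise ValueError("lgK must be between 4 and 21 inclusive")
    if tgt_hll_type not in VALID_HLL_TYPES:
        raise ValueError(f"tgtHllType must be one of {VALID_HLL_TYPES}")
    return {
        "type": agg_type,
        "name": name,
        "fieldName": field_name,
        "lgK": lg_k,
        "tgtHllType": tgt_hll_type,
        "round": round_result,
    }


# Build sketches from a raw identifier column (hypothetical name "user_id").
build_agg = hll_aggregator("HLLSketchBuild", "unique_users", "user_id")
# Merge sketches that were already stored at ingestion (hypothetical column name).
merge_agg = hll_aggregator("HLLSketchMerge", "unique_users", "user_id_sketch",
                           round_result=True)

print(json.dumps(build_agg, indent=2))
print(json.dumps(merge_agg, indent=2))
```

HLLSketchBuild is the one to use when the input column holds raw values, and HLLSketchMerge when the column already holds sketches.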

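Parameter types and their meanings, as listed in the Druid documentation:

type: string, either "HLLSketchBuild" or "HLLSketchMerge" (required).
name: string, the output (result) name of the calculation (required).
fieldName: string, the name of the input field (required).
lgK: integer, log2 of K, the number of buckets in the sketch; it controls size and accuracy and must be between 4 and 21 inclusive (optional, default 12).
tgtHllType: string, the type of the target HLL sketch: "HLL_4", "HLL_6" or "HLL_8" (optional, default "HLL_4").
round: boolean, rounds the estimate to a whole number; it only affects query-time results and is ignored at ingestion (optional, default false).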

Post-aggregators:

Estimate: returns the distinct count estimate as a double.

{
  "type"  : "HLLSketchEstimate",
  "name": <output name>,
  "field"  : <post aggregator that returns an HLL Sketch>,
  "round" : <if true, round the estimate. Default is false>
}
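As a rough illustration of where this post-aggregator sits, the sketch below assembles a native timeseries query that merges a stored sketch column and then extracts the estimate. The data source `events`, the column `user_id_sketch`, the output names, and the interval are all hypothetical; the `field` value uses Druid's `fieldAccess` post-aggregator to refer to the aggregator's output.

```python
import json


def estimate_query(data_source, sketch_column, interval):
    """Build a timeseries query returning a rounded distinct-count estimate.

    All names passed in are placeholders chosen by the caller.
    """
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "all",
        "intervals": [interval],
        "aggregations": [
            {
                "type": "HLLSketchMerge",
                "name": "users_sketch",       # intermediate sketch output
                "fieldName": sketch_column,
                "lgK": 12,
                "tgtHllType": "HLL_4",
            }
        ],
        "postAggregations": [
            {
                "type": "HLLSketchEstimate",
                "name": "unique_users",
                # fieldAccess points at the aggregator defined above
                "field": {"type": "fieldAccess", "fieldName": "users_sketch"},
                "round": True,
            }
        ],
    }


query = estimate_query("events", "user_id_sketch", "2020-06-01/2020-07-01")
print(json.dumps(query, indent=2))
```

The printed JSON is the kind of body that would be sent to Druid's native query endpoint; submitting it is left out here because it depends on the cluster setup.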

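Estimate with bounds: returns the distinct count estimate and error bounds from an HLL sketch. The result is an array of three double values: the estimate, the lower bound, and the upper bound. The bounds are given at a specified number of standard deviations (optional, default 1). This value must be an integer, 1, 2 or 3, corresponding to confidence intervals of approximately 68.3%, 95.4% and 99.7%.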

{
  "type"  : "HLLSketchEstimateWithBounds",
  "name": <output name>,
  "field"  : <post aggregator that returns an HLL Sketch>,
  "numStdDev" : <number of standard deviations: 1 (default), 2 or 3>
}
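The three-element result can be awkward to read in raw form. A small sketch of how a client might build this post-aggregator and interpret its output follows; the helper names, the aggregator name `users_sketch`, and the sample result values are invented for illustration and are not real query output.

```python
# Approximate confidence level for each allowed number of standard deviations.
CONFIDENCE_PCT = {1: 68.3, 2: 95.4, 3: 99.7}


def bounds_post_aggregator(name, sketch_aggregator_name, num_std_dev=1):
    """Build an HLLSketchEstimateWithBounds spec (hypothetical helper)."""
    if num_std_dev not in CONFIDENCE_PCT:
        raise ValueError("numStdDev must be 1, 2 or 3")
    return {
        "type": "HLLSketchEstimateWithBounds",
        "name": name,
        "field": {"type": "fieldAccess", "fieldName": sketch_aggregator_name},
        "numStdDev": num_std_dev,
    }


def summarize(result, num_std_dev=1):
    """Turn [estimate, lower, upper] into a labelled summary."""
    estimate, lower, upper = result
    return {
        "estimate": estimate,
        "lower": lower,
        "upper": upper,
        "confidence_pct": CONFIDENCE_PCT[num_std_dev],
        # Half the interval width relative to the estimate.
        "relative_half_width": (upper - lower) / (2 * estimate),
    }


spec = bounds_post_aggregator("unique_users_bounds", "users_sketch", num_std_dev=2)
# Made-up result array, only to show the shape of the data.
summary = summarize([1000.0, 980.0, 1020.0], num_std_dev=2)
print(spec)
print(summary)
```

A wider `numStdDev` gives a wider interval and a higher confidence level for the same sketch.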

Union:

{
  "type"  : "HLLSketchUnion",
  "name": <output name>,
  "fields"  : <array of post aggregators that return HLL sketches>,
  "lgK": <log2 of K for the target sketch>,
  "tgtHllType" : <target HLL type>
}
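A union is usually combined with an estimate, since the union itself returns a sketch rather than a number. The sketch below nests an HLLSketchUnion inside an HLLSketchEstimate to count identifiers seen in either of two sketch aggregators. The helper names and the aggregator names `web_users_sketch` and `app_users_sketch` are hypothetical.

```python
import json


def hll_union(name, aggregator_names, lg_k=12, tgt_hll_type="HLL_4"):
    """Build an HLLSketchUnion over existing sketch aggregators (hypothetical helper)."""
    return {
        "type": "HLLSketchUnion",
        "name": name,
        "fields": [
            {"type": "fieldAccess", "fieldName": agg_name}
            for agg_name in aggregator_names
        ],
        "lgK": lg_k,
        "tgtHllType": tgt_hll_type,
    }


def hll_estimate(name, sketch_post_aggregator, round_result=False):
    """Wrap any sketch-returning post-aggregator in an HLLSketchEstimate."""
    return {
        "type": "HLLSketchEstimate",
        "name": name,
        "field": sketch_post_aggregator,
        "round": round_result,
    }


# Distinct users across two hypothetical sketch aggregators in the same row.
union = hll_union("all_users_sketch", ["web_users_sketch", "app_users_sketch"])
all_users = hll_estimate("all_users", union, round_result=True)
print(json.dumps(all_users, indent=2))
```

Users present in both inputs are counted once in the union's estimate, which is the reason to union sketches instead of adding two separate estimates.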

Sketch to string: returns a human-readable summary of the sketch, intended for debugging.

{
  "type"  : "HLLSketchToString",
  "name": <output name>,
  "field"  : <post aggregator that returns an HLL Sketch>
}
