Personal study notes on the druid.io architecture: translation of Part 1 and Part 2

Original article: https://medium.com/@leventov/the-problems-with-druid-at-large-scale-and-high-load-part-1-714d475e84c9

Part 1

Historical memory usage

Loading segments from deep storage into memory
Processing queries; local result cache

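Broker memory usage

Maintaining a global view of how segments are distributed across historicals (by subscribing to ZooKeeper events)
Processing queries; local result cache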

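A user query lands on a randomly chosen broker
The broker determines which segments and historicals the query needs and dispatches subqueries to those historicals
The broker merges the data returned by the historicals and returns the result to the user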
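
The three steps above can be sketched as a small scatter-gather routine. This is only an illustration under stated assumptions, not Druid's actual code: the names `SEGMENT_VIEW`, `SEGMENT_ROWS`, `plan_subqueries`, `run_on_historical`, and `broker_query` are all hypothetical, and the "query" is a plain row count.

```python
import random
from collections import defaultdict

# Hypothetical cluster view: segment id -> historicals that hold a replica.
SEGMENT_VIEW = {
    "seg-1": ["hist-a", "hist-b"],
    "seg-2": ["hist-b", "hist-c"],
    "seg-3": ["hist-a", "hist-c"],
}

# Hypothetical per-segment row counts, standing in for real segment data.
SEGMENT_ROWS = {"seg-1": 10, "seg-2": 20, "seg-3": 12}


def plan_subqueries(segments, view, rng=random):
    """Group the requested segments by the historical chosen to serve each."""
    plan = defaultdict(list)
    for seg in segments:
        plan[rng.choice(view[seg])].append(seg)
    return dict(plan)


def run_on_historical(historical, segments):
    """Stand-in for a historical: return a partial count for its segments."""
    return sum(SEGMENT_ROWS[s] for s in segments)


def broker_query(segments, view=SEGMENT_VIEW):
    """Scatter subqueries to historicals, then merge the partial results."""
    plan = plan_subqueries(segments, view)
    partials = [run_on_historical(h, segs) for h, segs in plan.items()]
    return sum(partials)
```

Whichever replica is picked for each segment, the merged result is the same; only the load on each historical differs.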

No fault tolerance on the query execution path

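The broker must wait for every historical to respond; if a single historical is slow or fails, the whole query is slow or fails. Yet historicals hold redundant copies (a segment is stored on several historicals), and the broker's global view tells it where each segment lives. So:
Why doesn't the broker retry the subquery on another replica when one historical fails?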
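
A minimal sketch of what such a retry could look like, assuming the broker knows the replica list for each segment. The `send` transport, `HistoricalError`, and `flaky_send` are hypothetical names invented for this example; this is not something Druid's broker does, which is exactly the complaint above.

```python
class HistoricalError(Exception):
    """Raised by the stand-in transport when a historical fails."""


def query_with_retry(segment, replicas, send):
    """Try each replica of a segment in turn; return the first success.

    `send(historical, segment)` is a hypothetical transport function.
    """
    last_error = None
    for historical in replicas:
        try:
            return send(historical, segment)
        except HistoricalError as err:
            last_error = err  # remember the failure and try the next replica
    raise HistoricalError(f"all replicas failed for {segment}") from last_error


def flaky_send(historical, segment):
    """Demo transport: hist-a is down, every other historical answers."""
    if historical == "hist-a":
        raise HistoricalError(f"{historical} is unreachable")
    return f"{segment}@{historical}"
```

With `flaky_send`, a subquery for a segment replicated on `hist-a` and `hist-b` fails on the first replica and succeeds on the second, so the user never sees the failure.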

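Routing problems between ZooKeeper, brokers, and historicals:

What can happen

A broker may dispatch a subquery to a historical that has already lost its connection to ZooKeeper
As long as ZooKeeper does not announce that a historical is in trouble, queries may keep being routed to a historical that is already down
When a historical is in trouble and one query makes it hang, every other query routed to that historical is affected (fatal). In other words, a single query can bring down the cluster (an availability problem?)
Risk of OOM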

Huge variance in performance of historical nodes (*)

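Segments are balanced across historicals so that, when a query is routed to several historicals, each of them answers about equally fast (as the query analysis above showed, a query is bounded by its slowest historical)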
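
To make the "slowest historical" point concrete, here is a tiny worked example. The latency figures are made up for illustration and `query_latency` is a hypothetical helper, not a Druid function.

```python
def query_latency(subquery_latencies_ms):
    """A fan-out query finishes only when its slowest subquery finishes."""
    return max(subquery_latencies_ms)


# Hypothetical per-historical subquery latencies, in milliseconds.
balanced = [100, 105, 98, 102]
skewed = [40, 45, 38, 250]

print(query_latency(balanced))  # 105
print(query_latency(skewed))    # 250
```

The skewed cluster does less total work (373 ms against 405 ms summed over historicals) yet the user waits more than twice as long, which is why variance between historicals matters more than their average speed.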

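Alternative: historicals would not load the segments kept in deep storage into their own memory and disk, but would load them from deep storage on every subquery. The biggest drawback of this approach: it is slow

That is: decoupling of storage and compute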

Part 2

Issues with ultra-large queries

In ad analytics, time series data sources are generally very “thick”. Reporting queries in our cluster over many months of historical data cover up to millions of segments. The amount of computation required for such queries is enough to saturate the processing capacity of the entire historical layer for up to tens of seconds. (A single large query may cover millions of segments and keep the historical nodes busy for up to tens of seconds.)

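For real-time queries, even tiers (hot/cold) do not fully isolate historicals
Tiering limits compute resources only as a whole; nothing is limited at the process or thread level

Option 1: a Spark-like way of executing queries?
Option 2: isolation at the process or thread level, with historicals communicating with each other to report query status (complex to implement)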
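
As a rough illustration of what thread-level isolation in Option 2 might mean inside one historical, the sketch below gives each query class its own fixed-size thread pool. The pool names, sizes, and the `submit` and `scan` helpers are assumptions made up for this example; the cross-historical reporting part of Option 2 is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool sizes: large reports can never take more than 2 threads.
POOLS = {
    "interactive": ThreadPoolExecutor(max_workers=6),
    "reporting": ThreadPoolExecutor(max_workers=2),
}


def submit(query_class, work, *args):
    """Run `work` on the pool reserved for its query class."""
    return POOLS[query_class].submit(work, *args)


def scan(segment_count):
    """Stand-in for scanning segments; returns the number scanned."""
    return segment_count
```

Because the pools are separate, a burst of reporting subqueries queues behind its own two threads and leaves the interactive threads free.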

Brokers need to keep the view of the whole cluster in memory

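Maintaining a global view of segment distribution

Could a broker serve only specific datasources and keep just the part of the segment distribution view that it needs?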
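
A sketch of the idea in that question, assuming the view is keyed by (datasource, segment id). Both the key layout and the `filter_view` helper are hypothetical and only meant to show how much smaller a per-datasource view could be.

```python
def filter_view(full_view, datasources):
    """Keep only the entries whose datasource this broker serves."""
    wanted = set(datasources)
    return {key: hosts for key, hosts in full_view.items() if key[0] in wanted}


# Hypothetical full cluster view: (datasource, segment id) -> historicals.
FULL_VIEW = {
    ("clicks", "seg-1"): ["hist-a", "hist-b"],
    ("clicks", "seg-2"): ["hist-b"],
    ("impressions", "seg-1"): ["hist-c"],
}

clicks_view = filter_view(FULL_VIEW, ["clicks"])
```

A broker dedicated to `clicks` would then hold two entries instead of three; on a real cluster the saving would depend on how evenly segments are spread over datasources.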

Design of a Cost Efficient Time Series Store for Big Data

Article link

Stream processing system

Partitions the data, transforms and compresses it by interval, and so on; it also serves queries

Storage

Computation tree

Download data of specific partitions and intervals from Storage and compute partial results for them
Merge the results of the first step; receive real-time data
Process the results of the second step; balance compute resources
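
The three levels above can be sketched as three small functions. This is a toy model under stated assumptions: `STORAGE`, `leaf_partial`, `merge_with_realtime`, and `root` are hypothetical names, the aggregation is a plain sum, and the resource-balancing role of the top level is not modeled.

```python
# Hypothetical Storage: (partition, interval) -> stored values.
STORAGE = {
    ("p0", "2019-01-01"): [1, 2, 3],
    ("p1", "2019-01-01"): [4, 5],
    ("p0", "2019-01-02"): [6],
}


def leaf_partial(keys, storage=STORAGE):
    """Level 1: download the given partitions/intervals and sum them."""
    return sum(sum(storage[k]) for k in keys)


def merge_with_realtime(partials, realtime_values):
    """Level 2: merge level-1 partial results and add in real-time data."""
    return sum(partials) + sum(realtime_values)


def root(merged_results):
    """Level 3: combine the level-2 results into the final answer."""
    return sum(merged_results)


left = leaf_partial([("p0", "2019-01-01"), ("p1", "2019-01-01")])
right = leaf_partial([("p0", "2019-01-02")])
merged = merge_with_realtime([left, right], realtime_values=[10])
total = root([merged])
```

Only the first level touches Storage, which is what lets the computation tree scale independently of where the data is kept.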

Principles:

Separation of Computation tree and Storage
Separation of data ingestion (in Stream processing system) and Storage
