As China's Largest ClickHouse User, What Pitfalls Has ByteDance Run Into?

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Author | Cai Fangfang (蔡芳芳)"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Interviewee | Guo Dongdong (郭東東)"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Thanks to its outstanding performance, ClickHouse is setting off a new wave of technology in the analytical database space. As the largest ClickHouse user in China, ByteDance currently runs more than 15,000 ClickHouse nodes internally, managing over 600 PB of data in total, with its largest cluster exceeding 2,400 nodes. In fact, much of ByteDance's wide-ranging business growth analysis is built on a ClickHouse-based query engine."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"So which business scenarios at ByteDance actually use ClickHouse? Why was ClickHouse chosen over other data analysis technologies? And what pitfalls have ByteDance's internal teams run into along the way? InfoQ recently put these questions to Guo Dongdong, head of data application R&D for ByteDance's data platform."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"ByteDance's Data Application Products"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: You were also responsible for building the big data platform when you worked at Qihoo 360. From your own experience, how do 360 and ByteDance differ in their priorities when building a big data platform (for example, scenarios, requirements, and tech stack)?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The two companies were at different stages of development and had different data volumes, so their platform work has things in common as well as some differences. When I was at 360, the Hadoop ecosystem was just taking off. Most of the work then was bringing Hadoop, HBase, and other big data technologies into 360 to remove the bottlenecks we had hit building on traditional databases and data analysis platforms. At that point these platforms mostly served as a foundation to better support the business."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"By the time I joined ByteDance, these open source ecosystems were already fairly mature. Our focus has been more on how to build the data platform systematically and, on top of the technical platform, to develop further data analysis capabilities. Of course, ByteDance's data volume later grew very quickly, so the underlying analytics engine and related components have also posed considerable challenges."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: How should we understand the relationship between the data application products your team is responsible for and the Volcano Engine data middle platform product that ByteDance recently opened to external customers?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"I am mainly responsible for data application products, which have an upstream-downstream dependency on Volcano Engine's data middle platform. The middle platform is mostly about organizing and processing data into a relatively standardized data system. Data applications focus on how to deliver more data capabilities to business lines on top of that system, such as various kinds of analysis, A/B testing, behavior analysis, and visualization. The two work in close coordination."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: What do the iteration cadence and process look like for data application products?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We basically follow agile development. An iteration cycle is typically two to three weeks, though it varies by product. Overall we move in small, fast steps, quickly turning customer requirements into product capabilities and putting them in users' hands. Every stage, including testing and activities, has to be managed, and the whole process is backed by a fairly complete system for requirements management and R&D governance."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: Could you take one data application product as an example and break down the overall tech stack and architecture behind it?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Let me use the A/B testing platform as an example to briefly describe our overall tech stack and architecture. The platform's architecture includes a metrics module, a traffic splitting module, and others, plus the underlying query engine. The metrics module handles data ingestion and cleansing, including building out the data system for the whole A/B testing platform. The traffic splitting module decides in real time which experiment group each user belongs to. The query engine at the bottom layer is our core; its main job is to guarantee interactive query capability, and it also contains submodules for augmented analytics and the like. Everything is deployed in containers, and the programming languages we use include Python and Go."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"ClickHouse in Practice"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: ClickHouse was actually open-sourced back in 2016, but its popularity and visibility only seemed to surge last year. Why is that?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Any open source technology needs time to go from its initial release to maturity and broad industry adoption. Also, when large companies gradually start using a technology, that helps drive wider adoption across the industry. I would say ByteDance's internal ClickHouse practice has played a fairly significant role in promoting broader use of ClickHouse in the industry. Many companies have talked with us about how we use ClickHouse, including our technical improvements and adoption roadmap."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"More fundamentally, ClickHouse does solve some major pain points in specific scenarios and businesses. In the past, data analysis was largely constrained by data volume, and it was rarely possible to analyze relatively detailed, row-level data. ClickHouse's powerful analytical capabilities address exactly that pain point. This also reflects the steadily growing demand for finer-grained data analysis."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: We understand ClickHouse is used quite widely at ByteDance. Based on the team and products you lead, which business scenarios is ClickHouse mainly used in? And what was the first business scenario to adopt it?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"ClickHouse has many use cases at ByteDance. Take the data application platform I am responsible for: much of its underlying technology relies heavily on capabilities ClickHouse provides, such as BI analysis, A/B test analysis, and behavior analysis. Ad performance analysis on the monetization side depends on ClickHouse as well."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: What technology evaluation did you do before choosing ClickHouse? Why did these business scenarios adopt ClickHouse rather than other data analysis technologies? Which ClickHouse features mattered most to you, and what problems in those scenarios do they solve?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We did a fair amount of evaluation before choosing ClickHouse. At the time we had a fairly challenging technical scenario: behavior analysis over large amounts of detailed, row-level data. We studied this for quite a while and tried a range of analysis technologies, including Presto and Kylin, before settling on ClickHouse. The main reason is that in relatively fixed Panel scenarios, ClickHouse's query performance has a clear advantage, and it achieves this without sacrificing flexibility. Kylin, by contrast, is much less flexible: even a small change requires rebuilding the data."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We also looked into Druid and others, but in practice they differ considerably from ClickHouse. Another major reason we chose ClickHouse is that its engine is relatively simple. Its execution engine is written efficiently, and features such as vectorized execution are very valuable for our scenario-driven analysis."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: Has the technical solution been iterated on since you first adopted it? What improvements and optimizations has the team made on top of the open source version of ClickHouse?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We start from the open source version of ClickHouse and keep iterating on and optimizing it, and we have done quite a lot of work. For example, the scale a single ClickHouse cluster could reach was originally limited; we have pushed it to roughly several thousand machines, which took a great deal of optimization. We have also done a lot to improve its query capabilities and performance, including support for specialized analyses such as fairly complex path conversion analysis."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We also reworked ClickHouse to be cloud native. On its own it only supports a local deployment model. We separated storage from compute, which makes it much easier to schedule compute on containers, and we did a lot of work in this area. In addition, ClickHouse does not support transactions, real-time writes, or updates, and we made substantial improvements on all of these."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Overall, we are evolving it toward a cloud native and relatively complete database, including support for more complex SQL and filling gaps in the optimizer. We are investing in all of these areas."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: What problems have you run into while using ClickHouse? Are there lessons from solving them that others could learn from?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We were fairly early ClickHouse adopters, so we hit a lot of problems and fell into quite a few pitfalls along the way. Looking at it now, though, open source ClickHouse itself is maturing and many of those issues are gradually being addressed. As for lessons worth sharing, a few points come to mind. First, ClickHouse's own operations and management tooling is fairly weak, so we built a relatively complete operations and control system in-house to keep ClickHouse stable, covering things like taking failed nodes offline and replacing them. Second, ClickHouse's ingestion of external data is not especially mature either, so we did a lot of work there, as well as on real-time capabilities."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Observations on Big Data Analytics Technology"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: Could you share your observations on big data analytics technology over the past one to three years? What are the most important changes and trends?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Big data analytics technology has advanced quite quickly over the past three years, especially with the emergence of many open source technologies such as ClickHouse. The success of cloud native data analytics companies such as Snowflake has also been a strong driver of technical progress."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"On the technology itself, we are seeing more and more cloud native capabilities, along with AI support combined with data analysis, the convergence of databases and data warehouses, the lakehouse, unified batch and stream processing, and so on. The technology keeps moving forward. Going forward, I expect data analysis capabilities to keep getting stronger. Greater diversity in data analysis technologies, architecture-wide Layer Out, and the separation of storage and compute are all major trends."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: More and more companies are experimenting with the Kappa architecture, which is built on real-time data streams. In ByteDance's big data architecture, do the Lambda and Kappa architectures currently coexist? If so, which scenarios is each used for? And if you only use Lambda, why has Kappa not been introduced yet?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Guo Dongdong: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Both architectures exist inside the company today, and each has its own use cases. In the Lambda architecture, offline and real-time processing are separate. Internally we use it mostly for scenarios with large data volumes and fairly complex strategies, such as anti-cheating. Those are hard to get accurate in real time, so offline and real-time processing need to be separated: the real-time path delivers a first version of the data, and the offline path then corrects it, so that the data is usable and more accurate."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In some scenarios, though, we use the Kappa architecture directly. In particular, the wide internal use of technologies such as data lakes means real-time analysis is now not far behind offline analysis. In those scenarios we merge real-time and offline into a single pipeline, so that the data produced in real time is the final data we need. We only run offline jobs to make corrections when there is a major change in metric definitions or some other incident. By default there is just one pipeline."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"About the interviewee:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Guo Dongdong is head of data application R&D for ByteDance's data platform, responsible for developing data application products including the A/B testing platform, the behavior analysis system, and the intelligent BI insight system, which support core internal business lines such as Douyin and Toutiao. Guo previously worked at Qihoo 360, building its big data platform, and has 10 years of experience in big data platforms and application architecture, a deep understanding of OLAP and of real-time and offline big data processing, and familiarity with mainstream big data technologies such as ClickHouse, Spark, and Presto."}]}]}