Database Kernel Miscellany (21): An Introduction to Stream Processing Systems

Original

2021-11-24 10:38

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although this is a database-kernel blog, stream processing systems have become one of the mainstream kinds of data system, they offer SQL-like interfaces, and there is now a trend toward unifying stream and batch processing (personally, I think this still needs watching: the data sources arrive differently and the applications served differ, so it seems hard for a single system to have it both ways). In this installment, let's talk about stream processing systems. The content comes from a paper Facebook published at "},{"type":"link","attrs":{"href":"https:\/\/sigmod.org\/","title":"xxx","type":null},"content":[{"type":"text","text":"SIGMOD"}]},{"type":"text","text":" in 2016, titled "},{"type":"link","attrs":{"href":"https:\/\/blog.acolyer.org\/2016\/07\/11\/realtime-data-processing-at-facebook\/","title":"xxx","type":null},"content":[{"type":"text","text":"Realtime Data Processing at Facebook (Meta)"}]},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, why do we need stream processing systems? Because some applications need low latency: real-time analytics such as performance and error metrics, or recommendation systems that want to capture certain real-time features to produce the best recommendations."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The paper first discusses five important qualities of a stream processing (or real-time data processing) system:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Ease of use: how do programmers declare the stream processing logic? Is SQL (or a SQL-like language) enough, or must the system support general-purpose processing logic, for example letting programmers implement it in C++ or Java and hand it to the system to run (as in map-reduce)? How long does the whole life cycle take, from writing through testing to release?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Performance: performance generally means latency and throughput requirements. Is the latency in milliseconds, seconds, or minutes? How much throughput is needed?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) Fault tolerance: what level of crash recovery can the system support? What service-level guarantee can it offer for data processing: at least once, at most once, or exactly once? If a task crashes, how is its in-memory state restored?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) Scalability: can data processing be sharded or resharded to raise throughput? Can the system scale up and down dynamically (elasticity)?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5) Correctness: does it provide database-like ACID guarantees? Can data be lost (this overlaps with fault tolerance above)?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Facebook's design decisions for its streaming systems rest on one premise: latency of a few seconds and throughput of hundreds of GB\/s (a few seconds of latency with hundreds of GB\/s throughput). Under that premise, the different processing stages can be connected by a persistent message bus (Scribe, similar to Kafka) that carries the data between them. Decoupling data transport from data processing lets the whole system better address the qualities listed above."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Overview of Facebook (Meta) stream processing systems"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Facebook's stream processing ecosystem offers three different systems. We go through them in turn, with reference to the data flow diagram below."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c0\/e0\/c0b1312cf38e952yy4c93260a61151e0.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data originates on mobile clients or on (web) servers and is first logged to Scribe (the persistent message bus mentioned above). The streaming systems Puma, Stylus, and Swift read data from Scribe, process it, and write it back to Scribe. In this way the three systems, combined with Scribe, can form complex data processing DAGs. Finally, the processed data is written through Scribe into three kinds of data stores: Laser, Scuba, and Hive."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Scribe"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Scribe is a highly scalable "},{"type":"link","attrs":{"href":"https:\/\/www.cnblogs.com\/svenzhang9527\/p\/7354684.html","title":"xxx","type":null},"content":[{"type":"text","text":"message bus"}]},{"type":"text","text":" built on a persistent store (a file system), similar to the open-source Kafka. Data is organized into categories (called topics in Kafka), and each category can be sharded into multiple buckets to raise throughput. The bucket is the basic unit of work for the stream processing systems. Scribe stores its data in HDFS, with retention typically up to a few days."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Puma"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Puma offers a SQL-like syntax and supports extensible UDFs (user-defined functions) written in Java. Puma's strength is a very fast development cycle (thanks to the SQL-like syntax): the whole life cycle can be completed within hours. Puma handles simple SQL-like aggregations very efficiently. The paper gives a simple example that computes the top K events over a 5-minute sliding window. The simplified Puma code is shown below; even if you have never seen Puma syntax, it should not be hard to follow."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/22\/1b\/22efc757b0bef09a61c7ef8d05f09b1b.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Another strength of Puma is that for simple filtering logic, such as selecting only the relevant records, it can deliver latency of a few seconds (the processed data can be written straight to another Scribe category). Unlike a traditional database, Puma chooses to serve long-running apps rather than ad-hoc analytics, so it can use code generation to produce optimized processing code."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Swift"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(An aside: before reading this paper I did not know this system existed, and after reading its introduction I am honestly still in a fog.) Swift offers only a very simple API: read N strings or bytes from Scribe, then repeat. If the app crashes while processing a checkpoint, it can resume from that checkpoint. Swift is typically used for very low-throughput, stateless data processing."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Stylus"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Stylus is a general-purpose stream processing system written in C++. Its API resembles those of stream processing systems such as Storm, Samza, and MillWheel, and it supports both stateless and stateful stream processing. Because it is implemented in C++, Stylus not only supports all kinds of operations (including reading from external systems to fetch information), it also performs very well."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let's also quickly cover the data stores. These systems can ingest data from Scribe but do not export back to Scribe; instead they serve data through their own APIs."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Laser"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Laser is a high-throughput, low-latency key-value store. It can ingest data through Scribe, after which the data can be accessed by other applications, including Stylus, Swift, and Puma."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Scuba"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Scuba can be seen as a high-performance in-memory database that supports only single-table access. It supports very low-latency ingestion, and data is then queried through a SQL-like language (limited to a single table) or through a UI, with queries also completing in milliseconds. Scuba is therefore widely used for performance, monitoring, and debugging metrics."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Hive data warehouse"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I will skip the Hive data warehouse; everyone knows it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With all the systems introduced, let's tie them together with a simple example. The paper gives the following one: find the hottest event topics in an event stream (by ranking event counts from high to low). The input stream carries basic information about each event, such as event timestamp, event type, dimension_id (used to look up the related dimension information), and event text. The output is the top K events for each topic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/bf\/32\/bfd61a5f3f9b40f88f9ef59737110f32.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Filterer: filters out events that do not meet the criteria and redistributes the event stream, sharded by each event's dimension_id, to the downstream Scribe category (so that downstream processing can run in parallel by dimension_id)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Joiner: the Joiner fetches the dimension information for each dimension_id and calls a classification system to obtain the event topic. Because the upstream Scribe stream is sharded by dimension_id, the Joiner can cache the relevant dimension information to reduce network bandwidth (stateful processing). The processed records are sent to the downstream Scribe as pairs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) Scorer: the Scorer computes a score by collecting each topic's event count within a sliding window. Because the score has to take both the long-term trend and the current count into account, the Scorer must keep the long-term trend as state. Its final output (sharded by topic) goes downstream."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) Ranker: finally, the Ranker computes the top K events of the current sliding window for each topic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The paper notes that all of this logic can be implemented in Stylus. The Filterer and Ranker, however, can be implemented more quickly in Puma."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Design decisions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now we come to the heart of the paper. It presents design decisions along five dimensions and discusses how they affect the five qualities of a stream processing system introduced at the start (ease of use, performance, fault tolerance, scalability, correctness)."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Language paradigm"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The language paradigm affects ease of use and performance. The paper describes three broad categories. Declarative languages, such as SQL, are the easiest to understand and pick up; the drawback is limited expressiveness. Functional languages model the whole application as a composition of functions (operators); they are not as easy to pick up as SQL but give more control. Last is procedural: the system exposes an interface directly in a language such as C++ or Java. Procedural gives the most control and largely safeguards performance as well; the drawback is a longer development cycle. Each category has its pros and cons. Inside Facebook, Puma is declarative and Stylus is procedural."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Data transport"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Complex stream processing logic is usually expressed as a DAG. How data moves from one node to the next affects the fault tolerance, performance, and scalability of the whole pipeline, and to some extent its ease of use (especially when debugging)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The paper again describes three categories:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Direct message transfer: data is passed directly, for example by RPC or an in-memory message queue. The benefit is very low latency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Broker based: an intermediate broker decouples upstream from downstream. The broker adds performance overhead but improves scalability and makes it easier to scale out."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) Persistent-storage-based broker: like Scribe or Kafka. This approach is the heaviest, but it brings all the benefits of a message bus: decoupling, scaling, subscription and fan-out, durable storage, and so on. Facebook uses this third category internally to improve fault tolerance, scalability, and ease of use."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Processing semantics"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Processing semantics determine correctness and fault tolerance. The paper again describes three categories of activity:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Updating internal state: read an event, process it (for example by querying an external system), and then update the in-memory state;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Generating output events: after processing an event, emit an output event downstream;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) Saving state to an external system such as a database: this can involve saving offsets and checkpoints for failure recovery. A stateless node can only generate output events; a stateful node may involve all three."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As for the correctness of event processing: for at-least-once semantics, a node should save its in-memory state first and then update the offset; for at-most-once, it should save the offset first and then the in-memory state; for exactly-once (strong consistency), the two must be updated atomically, for example with a transaction mechanism. Among the systems introduced, Puma chose at-least-once, while Scuba chose at-most-once. Scuba has sampling built in and answers queries on a best-effort basis for the sake of efficiency, so a small amount of data loss is acceptable."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"State-saving mechanism"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How should a stateful processing node save its state? The paper describes the following options:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Replication to other nodes;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Local database or file storage;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3) Remote database or file storage;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4) Relying on storage at the upstream node;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5) Global snapshot storage."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Among the systems introduced, Stylus offers state storage in both a local database and a remote database. Local storage saves bandwidth and makes recovery from a process crash fast. Remote storage can cope with hardware-level machine failures (a new node has to be provisioned and the state loaded into it)."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Reprocessing\/backfill mechanism"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some use cases require reprocessing old data: for example, testing newly introduced stream processing logic against a stretch of past data, or re-running data to compute a newly introduced metric. Processing old data requires one of the following mechanisms: 1) retaining stream data long enough, for instance by configuring longer retention in Scribe; 2) enabling the stream processing system to process data from the data warehouse (batch processing). In Facebook's systems, Scribe retention usually cannot be very long, typically a few days. The streaming systems therefore need to connect to the data warehouse, which is done by introducing a tailer. The backfill mechanism affects a system's ease of use, scalability, and correctness."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"To sum up, in this installment we introduced Facebook's internal stream processing ecosystem and discussed design decisions along five dimensions, along with their effect on the five key qualities of a stream processing system (the figure below shows which qualities each design dimension affects, for reference). Thanks for reading!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/94\/68\/9467bd6e8e19c1a37dab16a542ab3868.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
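
As a rough illustration of the ordering rule in the processing semantics section (save state before the offset for at-least-once, offset before state for at-most-once), here is a minimal Python sketch. It is not taken from the paper or from any Facebook system: the `Checkpoint` class, the `run` function, the batch size, and the simulated crash point are all hypothetical names and values chosen for the example, and the "processing" is just counting events.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical durable store holding the consumer offset and the saved state."""
    offset: int = 0
    count: int = 0


class Crash(Exception):
    """Simulated process failure between the two checkpoint writes."""


def run(events, ckpt, mode, batch=2, crash_at=None):
    """Count events starting at ckpt.offset, checkpointing after every batch.

    mode="at_least_once": save the state first, then the offset.
    mode="at_most_once":  save the offset first, then the state.
    If crash_at equals the end offset of a batch, the process "crashes"
    after the first of the two writes for that batch.
    """
    pos = ckpt.offset
    count = ckpt.count  # in-memory state restored from the checkpoint
    while pos < len(events):
        end = min(pos + batch, len(events))
        count += end - pos  # "process" the batch
        crash = crash_at == end
        if mode == "at_least_once":
            ckpt.count = count
            if crash:
                raise Crash
            ckpt.offset = end
        else:
            ckpt.offset = end
            if crash:
                raise Crash
            ckpt.count = count
        pos = end
    return ckpt.count


def run_with_one_crash(mode):
    """Run over six events, crash once mid-checkpoint, then restart."""
    events = list(range(6))
    ckpt = Checkpoint()
    try:
        run(events, ckpt, mode, crash_at=4)
    except Crash:
        pass  # the process dies; only the durable checkpoint survives
    return run(events, ckpt, mode)  # restart from the checkpoint


print(run_with_one_crash("at_least_once"))  # 8: one batch counted twice
print(run_with_one_crash("at_most_once"))   # 4: one batch lost
```

With six input events, the at-least-once ordering over-counts (the batch in flight at the crash is replayed on top of already-saved state), while the at-most-once ordering under-counts (the offset has moved past a batch whose state was never saved). Exactly-once would require writing both fields atomically, which this sketch deliberately does not attempt.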
