From Lambda to Lambda-less: Lessons Learned at LinkedIn

{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Lambda architecture has become a popular architectural style that promises both speed and accuracy in data processing by combining batch and stream processing. It also has drawbacks, such as added complexity and extra development and operational overhead. LinkedIn premium members can see who has viewed their profile (Who Viewed Your Profile, WVYP), and this feature ran on a Lambda architecture for some time. The backend system behind it went through several architectural iterations over the past few years: it started as a Kafka client processing a single Kafka topic and eventually evolved into a Lambda architecture with more complex processing logic. Recently, however, in pursuit of faster product iteration and lower operational overhead, we made it Lambda-less. In this post, we share lessons learned from running the Lambda architecture, the decisions we made in moving to a Lambda-less design, and the transformation work the move required."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"How the system works"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The WVYP system relies on several different input sources to show members a record of who recently viewed their profile:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Capturing view information and deduplicating it;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Computing the view source (for example, via search or a profile page view);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"View relevance (for example, a senior person viewed your profile);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Obfuscating view information based on the member's privacy settings."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below shows a simplified diagram of the system under the Lambda architecture."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/46\/c7\/46202854de89e305b7ba2582byyab9c7.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, we have a Kafka client that processes and serves member profile view activity in near real time. When one member views another member's profile, an event called ProfileViewEvent is generated and sent to a Kafka topic. The processing job consumes this ProfileViewEvent and calls about 10 other online services to fetch additional information, such as member profile data, job application information, and member network distance (first-degree or second-degree connections). The job then writes the processed message to another Kafka topic, which is consumed by Pinot (a distributed OLAP data store, "},{"type":"link","attrs":{"href":"https:\/\/pinot.apache.org","title":"","type":null},"content":[{"type":"text","text":"https:\/\/pinot.apache.org"}]},{"type":"text","text":"). Pinot appends the processed messages to its real-time table. Pinot can handle both offline and real-time data, which makes it a good fit here."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Meanwhile, a set of offline Hadoop MapReduce jobs performs the same operations in a different technology stack, using the ETLed ProfileViewEvent and the corresponding datasets produced by the services mentioned above. These jobs load the datasets daily and apply data transformations such as filtering, grouping, and joining. In addition, as shown in the figure above, the offline jobs also process NavigationEvent, which the real-time job does not handle; this event tells us how the viewer found the viewed profile. The processed datasets are inserted into Pinot's offline table."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Pinot database serves data from both the real-time and offline tables. A mid-tier service queries Pinot for the processed profile view information and slices and dices the data according to the query parameters from the front-end API (such as time range and occupation)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is how the Lambda architecture is realized:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The real-time job focuses on speed, computing quickly over incomplete information;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The offline Hadoop jobs focus on batch processing, aiming for accuracy and throughput;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Pinot data store is the serving layer, merging the batch and real-time views."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Lambda architecture brought us many benefits, thanks to the speed of real-time processing and the accuracy and reprocessability of batch processing. However, it also came with considerable operational overhead. As we kept iterating on the product and adding complexity, it was time for a change."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Our challenges"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As is well known, the Lambda architecture carries much-criticized operational overhead and violates the “Don't Repeat Yourself” (DRY) principle. More specifically, the WVYP system faced the following challenges:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Developers had to build, deploy, and maintain two pipelines that produce largely the same data;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"The two processing pipelines had to be kept in sync in terms of business logic."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These two challenges consumed a great deal of developer time."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Systems evolve for many reasons, including feature enhancements, bug fixes, compliance or security changes, and data migrations. For WVYP, every one of these changes cost twice as much, in part because of the Lambda architecture. Worse, the Lambda architecture introduced additional problems: because we implemented most features on two different technology stacks, new bugs could appear in either the batch or the real-time path. In addition, as LinkedIn's tools and technology stacks keep evolving, we have to keep up to stay current. For example, during a recent GEO location data migration, we ran into unnecessary complexity. The layering of the Lambda architecture also imposed an operational burden. For example, the real-time job would lag when processing messages, and the offline jobs would sometimes fail; we became all too familiar with both. In the end, we concluded that the overhead was not worth it, because it significantly slowed development. So we set out to rearchitect WVYP's Lambda architecture."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Lambda-less architecture"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We began simplifying the architecture by removing the entire set of offline batch jobs and building a new real-time message processor with Samza. We chose to remove the offline jobs and keep real-time processing mainly because the product requires near-real-time profile view notifications. Batch processing is better suited to other scenarios, such as computing the impact on business metrics in A\/B tests."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The new architecture is shown in the figure below."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/1b\/ca\/1b7e52f844e220216a3bd57841fffcca.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The new architecture has two main changes:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A new Samza job was created to consume both ProfileViewEvent and NavigationEvent, whereas the old consumer client consumed only ProfileViewEvent."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All the existing offline jobs were removed, and a single new job was created, which we discuss later."}]}]}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"The Samza job"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Samza, originally developed at LinkedIn, is LinkedIn's distributed stream processing service and is now an Apache project. We chose to migrate the existing real-time processor job to Samza for several reasons."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, Samza supports a variety of programming models, including the Beam programming model. Samza implements the Beam API ("},{"type":"link","attrs":{"href":"https:\/\/beam.apache.org","title":"","type":null},"content":[{"type":"text","text":"https:\/\/beam.apache.org"}]},{"type":"text","text":"): we can use it to easily build pipelines of data processing units, including filtering, transformation, joins, and more. For example, in our case, we can easily join PageViewEvent and NavigationEvent to compute the view source in near real time, which was not easy to do in the old processor. Second, deploying and maintaining Samza jobs at LinkedIn is straightforward, because they run on a YARN cluster maintained by the Samza team. The development team still has to handle scaling, performance, and similar concerns, but this helps a great deal with routine maintenance (for example, there is no need to worry about machine failures). Finally, Samza is well integrated with LinkedIn's other tools and environments."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"The new offline job"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some may ask why we still use an offline job in a Lambda-less architecture. In fact, from the standpoint of the architecture transition, it is not strictly necessary. However, as shown in the figure above, the offline job reads ETLed data in HDFS, which is produced indirectly by the Samza job via a Kafka topic. The sole purpose of the offline job is to copy all the data written to the Pinot real-time table into the offline table. There are two reasons for this: 1) because of how the data is organized, the offline table performs better (it has far fewer data segments than the real-time table, so queries are faster); 2) the processed view data is retained for 90 days, whereas the real-time table keeps only a few days of data and purges older data automatically. A key difference between the new and old offline jobs is that the new job has no overlap with the real-time job in processing logic; it does not reimplement any logic already implemented in the Samza job. Once Pinot can automatically support consolidating files from the real-time table into the offline table, we can remove this job."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Message reprocessing"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"No software is bug-free, and things will still go wrong in different ways. For WVYP, events processed with faulty logic stay in the database until they are reprocessed and fixed. In addition, unexpected problems can occur outside the system's control (for example, a data source becomes corrupted). An important role of batch processing is reprocessing. If a job fails, it can be rerun and produce the same data. If the source data is corrupted, it can reprocess the data."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is more challenging with stream processing, especially when processing depends on other stateful online services for additional data. Message processing becomes non-idempotent. WVYP depends on online services for state, and it needs to send notifications to members when messages are processed (but we do not want to send duplicate notifications). And if the chosen data store does not support random updates, as is the case with Pinot, we need a deduplication mechanism."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We realized there is no silver bullet for this problem. We decided to treat each problem differently and use different strategies to mitigate it:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If we need to make minor changes to processed messages, the best approach is to write a one-off offline job that reads the processed messages from HDFS (just as the offline job in the new architecture does), applies the necessary processing, and pushes the result to Pinot, overwriting the previous data files."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If there is a major processing error, or the Samza job fails to process a large number of events, we can rewind the current processing offset to an earlier position."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the job is only degraded for a period of time, for example when the view relevance computation fails, we skip some views. In this case, the system runs at reduced capacity during that period."}]}]}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Deduplication"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Duplicate processing occurs in several scenarios. One is the case mentioned above, where we explicitly want to reprocess data. Another is inherent to Samza, which guarantees at-least-once processing of messages. When a Samza container restarts, it may process some messages again, because the checkpoint it reads may not correspond to the last message it processed. We can address deduplication in two places:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Serving layer: when the mid-tier service reads data from the Pinot tables, it deduplicates and picks the view with the latest processing time."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Notification layer: the notification infrastructure ensures that we do not send duplicate notifications to a member within a specified time window."}]}]}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Value"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Lambda architecture has been around for many years and has received its fair share of praise and criticism. In migrating WVYP, we gained the following benefits:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Development velocity improved significantly, since most development time was cut in half, and maintenance overhead dropped by more than half (the real-time flow requires less maintenance than the batch flow)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Member experience improved. We are now less likely to introduce bugs during development. We also have better real-time computation (for example, fast computation of the view source, which was not available before) and can deliver WVYP information to members faster."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By freeing up developer time, we can now iterate faster and focus our effort elsewhere. In this post, we shared how the WVYP system was developed, operated, and rearchitected, and we hope some of our takeaways help others facing similar problems with a Lambda architecture make better decisions."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/engineering.linkedin.com\/blog\/2020\/lambda-to-lambda-less-architecture","title":"","type":null},"content":[{"type":"text","text":"https:\/\/engineering.linkedin.com\/blog\/2020\/lambda-to-lambda-less-architecture"}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}
]}