How Did I Process Half a Million Transactions in AWS Lambda Within Minutes?

*This article was originally published on the Dev Genius blog and was translated and shared by InfoQ China with the permission of the original author, Mohammed Lutfalla.*

> Data processing is an intensive task, especially for the compute units, because reads and writes consume a lot of resources. With the right tools, though, it can be done quite easily. For example, I used AWS Lambda to process half a million transactions within minutes. In this article I share how I did it and what I learned. The process is very simple and, at the same time, very complex.

A few years ago, my manager asked me to design a processing architecture that could handle a large number of records, each needing only a light operation. One example was 800,000 rows with 16 columns, where the work required for each row is not complex.

Along the way I dealt with many problems. These included working within Lambda's limited resources and handling records lost to timeouts and operating-system errors. A friend who works as a senior consultant in the AWS Middle East (Bahrain) Region helped me find the tools to turn this promising idea into reality. It was one of the best experiences I have had working with AWS resources.

Enough talk; let's look at a diagram.

## Solution diagram

![Solution diagram](https://static001.infoq.cn/resource/image/c7/47/c78fc1029a0a2254cyy5d564a540f847.jpg)

Does the diagram look scary? It really isn't, trust me.

Let me explain it step by step:

### 1. Starting the process

Because this is a serverless architecture, everything is event-driven. When an event occurs, a function acts on it, and that action triggers the next one until the flow is complete.

In our case, the starting event is an S3 Put request. The file is uploaded to an S3 bucket. Once the upload is complete, S3 sends an event payload that triggers a Lambda function. That completes the first step. What comes next?

### 2. Data cleaning

The input is a CSV file. Some of its columns and rows may contain stray whitespace or special characters that could break the code, so the data needs to be cleaned.

The records are cleaned so that they can be inserted. But there are many records, and the function might fail partway through. So how do we keep track of what has been added and what remains?

### 3. Adding the cleaned data to a queue

We add the cleaned records to a queue so that we can track which records have been added and which have not. In essence, SQS acts as the organizer. It delivers small batches of records to a Lambda function, which writes them to DynamoDB. When the function succeeds, the messages are deleted from the queue.

If a record fails, SQS retries the insert, up to three attempts in my configuration. If all three attempts fail, the record is moved to a dead-letter queue (DLQ), which is a separate SQS queue that holds failed records. This lets you debug why those records never made it into DynamoDB and then either reprocess or discard them.

### 4. DynamoDB

Because we are handling a large amount of data, we need a database that can cope with extreme loads and large numbers of records. DynamoDB solves this. One write capacity unit covers one write per second of an item up to 1 KB. Because of that, I experimented a lot with how to handle this volume of records within limited read/write throughput. In the end, DynamoDB's on-demand capacity mode solved the problem.

According to the AWS documentation, on-demand capacity is the option to choose when your workload is unpredictable. In this mode, DynamoDB scales throughput with your traffic instead of requiring you to provision capacity in advance.

With the records moved from the CSV file into DynamoDB, what next?

### 5. Streaming records to SQS

DynamoDB is an excellent event source for Lambda. When you enable a stream on a table, you can attach a Lambda function that responds to the payloads the stream delivers. The advantage is that you can act based on the type of change in each record. Here we only care about newly added records. Once the event type confirms that a record is new, we add it to another SQS queue.

The reason for this queue is consistent delivery. Each record is added once, captured from the stream, and placed on the queue so that it can be processed. Without it, you would have to scan the table to find unprocessed records and then process them.

### 6. Processing the data

This brings us to the later stages of a record's life cycle. The Process queue delivers records in batches to a Lambda function. The function processes them and then passes them on to another queue, again for consistency, as explained above.

### 7. Updating processed records

Finally, records from the Finished queue are delivered to a Lambda function, which updates each record with the processed information. If a record cannot be delivered, the DLQ collects it for further debugging and action.

## Challenges

All of this looks simple and straightforward, but it wasn't. Here are some of the problems I ran into and how I solved them.

### 1. Lambda, Lambda, Lambda

Lambda is the key player here, and the code has to run within a limited execution time. How do you make sure every record is read, cleaned, and added to the queue in time? It is hard, and what you need is speed. My approach was to write the program in Python and speed it up with the multiprocessing library.

Inside Lambda, I used multiprocessing's `Process` class to make use of every available processing unit. In some tests, this let the function clean 558,000 transactions in 1 minute 30 seconds. That is fast.

With maximum memory allocated, Lambda could handle about 500 processes. Anything beyond that triggered "OS Error 38: Too many files open". Why? I joined all the running processes, but joining did not close the ones that had finished. So I ran the processes in batches, looped over the running ones, and force-closed each process as soon as it finished. Problem solved.

### 2. Keep an eye on CloudWatch

I made a big mistake: even during the heavy processing runs, I kept writing the event variable to CloudWatch. Over many test runs, this added up to 6.6 TB of log data. Ingesting logs into CloudWatch (the PutLogEvents operation) is expensive, so use it carefully.

### 3. DynamoDB on-demand is the key

When setting up DynamoDB, I first configured it for a predicted workload, with read and write throughput set to 5. The problem was that only 1,000 of the 558,000 records made it into my table. I raised the throughput to 100, but at least 60% of the records were still missing! I then reread the documentation and noticed that on-demand capacity is the solution for unpredictable workloads. After switching, I added all 558,000 records within 5 minutes.

### 4. SQS can be tricky

SQS is a great service with many options and possibilities. However, you need to know how large each batch is and how long you expect a batch to take to complete. If processing takes longer than the x seconds you told SQS to wait, the batch can become visible again and be processed multiple times. Understand your code and your data. Test, test, and test again before configuring for heavy workloads.

These were the points I cared about most. S3 was interesting, but not as complicated as I had imagined. The main question, though, is: was it worth it?

Everything in life depends on circumstances. This approach works for you if you don't want to manage instances, or want to manage them with minimal effort. Keep in mind that debugging such a pipeline can be a bit of a hassle. The steps depend on one another, and an error in one step affects the steps that follow.

Stay safe!

**About the author:**

Mohammed Lutfalla looks to the future and considers technology the core of his soul. Football lover.

**Original article:**

https://blog.devgenius.io/how-did-i-processed-half-a-million-transactions-in-aws-lambda-within-minutes-120c69d37ce5
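The S3 Put trigger in step 1 can be sketched as a handler that pulls the bucket name and object key out of the S3 event notification. This is a minimal illustration, not the author's code. The bucket name and key in `sample_event` are made up, and the actual download and cleaning are left out.

```python
import urllib.parse


def objects_from_s3_event(event):
    """Return (bucket, key) pairs from an S3 event notification.

    Object keys arrive URL-encoded (a space becomes '+'), so they are
    decoded before being passed to an S3 GetObject call.
    """
    pairs = []
    for record in event.get("Records", []):
        # React only to object-creation events such as ObjectCreated:Put.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    # The real function would download each object and start cleaning it;
    # this sketch only reports which objects the event refers to.
    return {"objects": objects_from_s3_event(event)}


# Hypothetical event with the fields the handler reads.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "transactions-upload"},
                "object": {"key": "incoming/batch+2021-01.csv"},
            },
        }
    ]
}
print(lambda_handler(sample_event, None))
```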
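For the cleaning in step 2, a sketch built on Python's `csv` module might look like this. The article does not say which characters were removed, so the allow-list in `UNWANTED` is an assumption. Adjust it to your own data.

```python
import csv
import io
import re

# Hypothetical cleaning rule: keep letters, digits, whitespace and a few
# punctuation marks common in transaction data; drop every other character.
UNWANTED = re.compile(r"[^\w\s.,@:/-]")
WHITESPACE = re.compile(r"\s+")


def clean_value(value):
    """Remove unwanted characters, collapse whitespace, trim both ends."""
    return WHITESPACE.sub(" ", UNWANTED.sub("", value)).strip()


def clean_csv(text):
    """Yield cleaned rows as dicts keyed by the cleaned header names."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None:  # Empty file: nothing to clean.
        return
    reader.fieldnames = [clean_value(name) for name in reader.fieldnames]
    for row in reader:
        # Short rows give None values; surplus fields land under the None key.
        cleaned = {k: clean_value(v or "") for k, v in row.items() if k is not None}
        if any(cleaned.values()):  # Skip rows that are empty after cleaning, e.g. ",,".
            yield cleaned
```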
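Step 3 puts the cleaned records on an SQS queue. The `SendMessageBatch` API accepts at most 10 messages per call, so the records have to be grouped first. The sketch below makes a few assumptions. Message bodies are JSON. The byte cap `max_batch_bytes` is a value you should check against the current SQS quota. `send` is injected so that the batching logic runs without AWS.

```python
import json

MAX_ENTRIES = 10  # SendMessageBatch accepts at most 10 messages per call.


def build_batches(records, max_batch_bytes=256 * 1024):
    """Group records into lists of SendMessageBatch entries.

    max_batch_bytes is an assumed payload cap; check the current SQS quota.
    """
    batches, current, size = [], [], 0
    for index, record in enumerate(records):
        body = json.dumps(record, separators=(",", ":"))
        body_size = len(body.encode("utf-8"))
        full = len(current) == MAX_ENTRIES or size + body_size > max_batch_bytes
        if current and full:
            batches.append(current)
            current, size = [], 0
        # Ids must be unique within one call; the running index guarantees that.
        current.append({"Id": str(index), "MessageBody": body})
        size += body_size
    if current:
        batches.append(current)
    return batches


def enqueue(records, send):
    """Send all batches with `send` and return the Ids SQS reported as failed.

    In Lambda, `send` could be
    lambda entries: sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=entries)
    where `sqs` is a boto3 SQS client and QUEUE_URL is your queue's URL.
    """
    failed = []
    for entries in build_batches(records):
        response = send(entries)
        failed.extend(entry["Id"] for entry in response.get("Failed", []))
    return failed
```

A batch call can partially fail, so the returned Ids are the records you would retry or log. They should not be silently dropped.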
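On the consuming side of step 3, a Lambda function receives a batch of SQS messages and writes them to DynamoDB. The sketch below reports per-message failures so that only the failed messages return to the queue. It also builds the `RedrivePolicy` attribute that moves a message to the DLQ after three receives. Both rely on settings the article does not spell out: `ReportBatchItemFailures` enabled on the event source mapping, and a DLQ ARN, which in the test is hypothetical. `put_item` stands in for `table.put_item(Item=item)` on a boto3 DynamoDB table.

```python
import json
from decimal import Decimal


def redrive_policy(dlq_arn, max_receive_count=3):
    """Queue attributes that move a message to the DLQ after N receives.

    Apply with sqs.set_queue_attributes(QueueUrl=..., Attributes=...).
    """
    return {
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)}
        )
    }


def handle_sqs_batch(event, put_item):
    """Write each SQS message body to DynamoDB through `put_item`.

    Returns the partial batch response, so only failed messages go back to
    the queue and Lambda deletes the rest. Requires ReportBatchItemFailures
    on the event source mapping.
    """
    failures = []
    for record in event["Records"]:
        try:
            # boto3's DynamoDB resource rejects floats, so parse them as Decimal.
            item = json.loads(record["body"], parse_float=Decimal)
            put_item(item)
        except Exception:
            # The message stays on the queue; after maxReceiveCount receives,
            # SQS moves it to the dead-letter queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```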
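The throughput numbers in the DynamoDB sections become clearer with some arithmetic. This sketch assumes items of about 800 bytes, since the article gives 16 columns but no item size. It also ignores burst capacity. At 5 WCU, 558,000 single-unit writes need 111,600 seconds (31 hours). At 100 WCU, they need 5,580 seconds (93 minutes). Both are far beyond the 15-minute maximum for a single Lambda invocation. That would be consistent with most records never arriving if writes were throttled or the function timed out.

```python
import math

LAMBDA_MAX_TIMEOUT_S = 15 * 60  # A single Lambda invocation runs at most 15 minutes.


def write_units_per_item(item_bytes):
    # A standard write of an item up to 1 KB costs 1 WCU; larger items
    # cost one WCU per started KB.
    return max(1, math.ceil(item_bytes / 1024))


def seconds_to_load(item_count, item_bytes, provisioned_wcu):
    """Lower bound on load time at a steady provisioned write rate (no burst)."""
    return item_count * write_units_per_item(item_bytes) / provisioned_wcu


for wcu in (5, 100):
    seconds = seconds_to_load(558_000, 800, wcu)  # assumed ~800-byte items
    print(f"{wcu} WCU: {seconds:,.0f} s, fits in one invocation: {seconds <= LAMBDA_MAX_TIMEOUT_S}")
```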
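Step 5 forwards only newly inserted items from the DynamoDB stream to the processing queue. Here is a sketch of that filter. It assumes the stream is configured with a view type that includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`). The small converter handles only the attribute types used here; boto3's `TypeDeserializer` covers all of them.

```python
from decimal import Decimal


def from_attribute_value(value):
    """Convert one DynamoDB attribute value, e.g. {"N": "10.5"}, to Python.

    Only the scalar types this sketch needs are handled; boto3's
    TypeDeserializer covers the full set.
    """
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    raise ValueError(f"unsupported attribute type: {type_tag}")


def new_records_from_stream(event):
    """Return the new image of every INSERT record as a plain dict."""
    new_items = []
    for record in event.get("Records", []):
        # MODIFY and REMOVE events are skipped: only new rows get processed.
        if record.get("eventName") != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        new_items.append({name: from_attribute_value(v) for name, v in image.items()})
    return new_items
```

Filtering on `INSERT` also matters because the last step updates each item. Those updates arrive on the same stream as `MODIFY` events, and forwarding them would send records back through processing.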
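In step 7, the function that drains the Finished queue updates each item with its result. This sketch only builds the keyword arguments for boto3's `Table.update_item`. The partition key name `id` and the attributes `status`, `result` and `processed_at` are hypothetical.

```python
from datetime import datetime, timezone


def build_update(record_id, result, processed_at=None):
    """Keyword arguments for boto3's Table.update_item (hypothetical schema).

    The table is assumed to have a partition key named "id".
    """
    processed_at = processed_at or datetime.now(timezone.utc).isoformat()
    return {
        "Key": {"id": record_id},
        "UpdateExpression": "SET #status = :status, #result = :result, #processed_at = :processed_at",
        # update_item creates the item if the key is missing; this condition
        # makes a stray message fail instead of inserting a partial record.
        "ConditionExpression": "attribute_exists(#pk)",
        "ExpressionAttributeNames": {
            "#pk": "id",
            "#status": "status",  # STATUS is a DynamoDB reserved word.
            "#result": "result",
            "#processed_at": "processed_at",
        },
        "ExpressionAttributeValues": {
            ":status": "PROCESSED",
            ":result": result,
            ":processed_at": processed_at,
        },
    }
```

You would call it as `table.update_item(**build_update(record_id, result))`. If the condition fails, the call raises an error, so the message is retried and eventually lands in the DLQ, which matches the behavior described for that step.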
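The multiprocessing fix from the first challenge was to run processes in batches and close each one when it finishes. It can be sketched with `Process` and `Pipe`. `multiprocessing.Pool` and `Queue` are commonly reported not to work in Lambda because the environment lacks `/dev/shm`, while `Process` and `Pipe` do. The stand-in cleaning function, `chunk_size` and the `max_running` limit are placeholders. The explicit `fork` context assumes Linux, as in Lambda.

```python
import multiprocessing


def clean_chunk(rows, conn):
    """Child process: clean one chunk (placeholder logic) and send it back."""
    conn.send([[field.strip() for field in row] for row in rows])
    conn.close()


def parallel_clean(rows, chunk_size=1000, max_running=8):
    """Clean `rows` in child processes, at most `max_running` at a time."""
    ctx = multiprocessing.get_context("fork")  # Lambda runs on Linux.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    results = [None] * len(chunks)
    for start in range(0, len(chunks), max_running):
        running = []
        for index in range(start, min(start + max_running, len(chunks))):
            receiver, sender = ctx.Pipe(duplex=False)
            proc = ctx.Process(target=clean_chunk, args=(chunks[index], sender))
            proc.start()
            sender.close()  # The parent only reads; the child owns the send end.
            running.append((index, proc, receiver))
        for index, proc, receiver in running:
            # Read before join so a child never blocks on a full pipe.
            results[index] = receiver.recv()
            receiver.close()
            proc.join()
            proc.close()  # Release the process's resources now (Python 3.7+).
    return [row for chunk in results for row in chunk]
```

`Process.close()` releases the resources tied to a finished process right away instead of waiting for garbage collection. That is the kind of leak the author ran into. If a child dies before sending, `recv()` raises `EOFError` rather than hanging, because the parent closed its copy of the send end.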
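One way to avoid the CloudWatch bill described in the second challenge is to log a one-line summary per invocation and serialize the full event only when debug logging is on. `LOG_LEVEL` is a hypothetical environment variable name.

```python
import json
import logging
import os

logger = logging.getLogger()
# LOG_LEVEL is a hypothetical environment variable; the default INFO keeps
# full event payloads out of CloudWatch.
logger.setLevel(os.environ.get("LOG_LEVEL", "INFO"))


def lambda_handler(event, context):
    records = event.get("Records", [])
    # One short line per invocation instead of the whole payload.
    logger.info("received %d records", len(records))
    if logger.isEnabledFor(logging.DEBUG):
        # The (possibly huge) event is serialized only when debugging.
        logger.debug("event: %s", json.dumps(event))
    return {"processed": len(records)}
```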
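The SQS warning in the last challenge comes down to the visibility timeout. If a batch is still being processed when the timeout expires, its messages become visible again and another invocation picks them up. The check below encodes the AWS guidance to set the source queue's visibility timeout to at least six times the function timeout. It also adds a rough estimate of batch duration. `seconds_per_record` is something you would measure yourself, not a documented value.

```python
def recommended_visibility_timeout(function_timeout_s):
    # AWS guidance for SQS event sources: at least 6x the function timeout.
    return 6 * function_timeout_s


def check_sqs_settings(batch_size, seconds_per_record, function_timeout_s, visibility_timeout_s):
    """Return warnings for a queue that feeds a Lambda function."""
    warnings = []
    expected_s = batch_size * seconds_per_record
    if expected_s > function_timeout_s:
        warnings.append(
            f"a batch needs about {expected_s:.0f}s but the function times out after {function_timeout_s}s"
        )
    minimum_s = recommended_visibility_timeout(function_timeout_s)
    if visibility_timeout_s < minimum_s:
        warnings.append(
            f"visibility timeout {visibility_timeout_s}s is below {minimum_s}s; "
            "messages may be redelivered while still in progress"
        )
    return warnings
```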