通過優化S3讀取來提高效率和減少運行時間

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"概述"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文將介紹一種提升S3讀取吞吐量的新方法,我們使用這種方法提高了生產作業的效率。結果非常令人鼓舞。單獨的基準測試顯示,S3讀取吞吐量提高了12倍(從21MB\/s提高到269MB\/s)。吞吐量提高可以縮短生產作業的運行時間。這樣一來,我們的vcore-hours減少了22%,memory-hours減少了23%,典型生產作業的運行時間也有類似的下降。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然我們對結果很滿意,但我們將來還會繼續探索其他的改進方式。文末會有一個簡短的說明。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"動機"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們每天要處理保存在Amazon S3上的數以PB計的數據。如果我們看下MapReduce\/Cascading\/Scalding作業的相關指標就很容易發現:mapper速度遠低於預期。在大多數情況下,我們觀測到的mapper速度大約是5-7MB\/s。這樣的速度要比aws s3 cp這類命令的吞吐量慢幾個數量級,後者的速度達到200+MB\/s都很常見(在EC2 c5.4xlarge實例上的觀測結果)。如果我們可以提高作業讀取數據的速度,那麼作業就可以更快的完成,爲我們節省相當多的處理時間和金錢。鑑於處理成本很高,節省的時間和金錢可以迅速增加到一個可觀的數量。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"S3讀取優化"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"問題:S3A吞吐量瓶頸"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果我們看下S3AInputStream的實現,很容易就可以看出,以下幾個方面可以做些改進:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}},{"type":"strong"}],"text":"單線程讀"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":":數據是在單線程中同步讀取的,導致作業把大量時間花在通過網絡讀取數據上。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}},{"type":"strong"}],"text":"多次非必要重新打開"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":":S3輸入流是不可尋址的。每次執行尋址或是遇到讀取錯誤時,總是要重複打開“分割(split)”。分割越大,出現這種情況的可能性越高。每次重新打開都會進一步降低總體的吞吐量。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"解決方案:提高讀取吞吐量"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/3d\/2d\/3d2165f34e6d524af34134cba3d0532d.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"圖1:S3讀取器的預取+緩存組件"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了解決上述問題,我們採取了以下措施:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"我們將分割視爲是由固定大小的塊組成的。默認大小是8MB,但可配置。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"每個塊在異步讀取到內存後,調用者才能訪問。預取緩存的大小(塊的數量)是可配置的。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"調用者只能讀取已經預取到內存中的塊。這樣客戶端可以免受網絡異常的影響,而我們也可以有一個額外的重試層來增加整體彈性。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"每當遇到在當前塊之外尋址的情況時,我們會在本地文件系統中緩存預取的塊。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們進一步增強了這個實現,讓生產者-消費者交互幾乎不會出現鎖。根據一項單獨的基準測試(詳情見圖2),這項增強將讀吞吐量從20MB\/s提高到了269MB\/s。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"順序讀"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"任何按照順序處理數據的消費者(如mapper)都可以從這個方法中獲得很大的好處。雖然mapper處理的是當前檢索出來的數據,但序列中接下來的數據已經異步預取。在大多數情況下,在mapper準備好處理下一個數據塊時,數據就已經預取完成。這樣一來,mapper就把更多的時間花在了有用的工作上,等待的時間減少了,CPU利用率因此增加了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Parquet文件讀取更高效"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Parquet文件需要非順序讀取,這是由它們的磁盤格式決定的。我們最初實現的時候沒有使用本地緩存。"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#292929","name":"user"}}],"text":"每當遇到在當前塊之外尋址的情況時,我們就得拋棄預取的數據。在讀取"},{"type":"text","text":"Parquet文件時,這比通常的讀取器性能還要差。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在引入預取數據的本地緩存後,我們發現Parquet文件讀取吞吐量有明顯的提升。目前,與通常的讀取器相比,我們的實現將Parquet文件讀取吞吐量提升了5倍。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"改進生產作業"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讀取吞吐量的增加給生產作業帶來了多方面的提升。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"降低了作業運行時間"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作業的總體運行時間減少了,因爲mapper等待數據的時間減少了,可以更快地完成。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"減少mapper數量"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果mapper耗時大大減少,那麼我們就可以通過增加分割大小來減少mapper數量。Mapper數量的減少可以減少由固定mapper開銷所導致的CPU浪費。更重要的是,這樣做並不會增加作業的運行時間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"提高CPU利用率"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於mapper完成同樣的工作所花費的時間減少,所以CPU整體的利用率會提高。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"結果"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在,我們的實現(S3E)使用了一個單獨的存儲庫,提高了我們的迭代改進速度。最終,我們會將其合併到S3A,把它回饋給社區。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"單獨的基準測試"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/5d\/5d\/5d0478a6fa285586a9e1a55d7f2fa05d.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"圖2:S3A和S3E的吞吐量對比"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在每種情況下,我們都是順序讀取一個3.5GB的S3文件,並將其寫入本地的一個臨時文件。後半部分是爲了模擬mapper操作期間發生的IO重疊。基準測試是在EC2 c5.9xlarge實例上進行的。我們測量了讀取文件的總時間,並計算每種方法的有效吞吐量。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"生產運行"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們在許多大型生產作業中測試了S3E實現。這些作業每次運行時通常都要使用數以萬計的vcore。圖3是對比了啓用S3E和不啓用S3E時獲得的指標。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"度量資源節省情況"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們使用以下方法度量這項優化所帶來的資源節省情況。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/39\/8a\/39b7a25bc76bb774ec5e7d3c069daf8a.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"觀測到的結果"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/5c\/10\/5cf63283e7yya8cdbf91e7d1351c1310.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"圖3:MapReduce作業資源消耗對比"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然不同的生產作業工作負載有不同的特徵,但我們看到,在30個成本高昂的作業中,大部分的vcore都減少了6%到45%。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們的方法有一個吸引人的地方,就是在一個作業中啓用時不需要對作業的代碼做任何修改。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"未來展望"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前,我們把這個增強實現放在了一個單獨的Git存儲庫中。將來,我們可能會升級已有的S3A實現,並把它回饋給社區。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們正在把這項優化推廣到我們的多個集羣中,結果將發表在以後的博文上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"鑑於S3E輸入流的核心實現不依賴於任何Hadoop代碼,我們可以在其他任何需要大量訪問S3數據的系統中使用它。目前,我們把這項優化用在MapReduce、Cascading和Scalding作業中。不過,經過初步評估,將其應用於Spark和Spark SQL的結果也非常令人鼓舞。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當前的實現可以通過進一步優化來提高效率。同樣值得探索的是,是否可以使用過去的執行數據來優化每個作業的塊大小和預取緩存大小。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查看英文原文:"},{"type":"link","attrs":{"href":"https:\/\/medium.com\/pinterest-engineering\/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0","title":null,"type":null},"content":[{"type":"text","text":"Improving efficiency and reducing runtime using S3 read optimization"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章