Why did we build our own serverless compute platform instead of using AWS Lambda?

This article was originally published on the [Cortex website](https://www.cortex.dev/post/serverless-machine-learning-aws-lambda) and is translated and shared by InfoQ China with the permission of the original author, Caleb Kaiser.

For model deployment, AWS Lambda is an attractive option. On the surface, the benefits are clear. Lambda can:

- Free data scientists and machine learning engineers from managing infrastructure when they deploy
- Minimize cost while maximizing availability
- Provide a simple interface for defining prediction APIs

The problem, however, is that while these are all benefits of serverless architecture, a general-purpose serverless platform like Lambda comes with limitations that make it less than ideal for machine learning.

We learned this firsthand. Before starting work on Cortex, we tried running deployments through Lambda. In fact, Lambda's shortcomings were part of what motivated us to build a serverless compute platform dedicated to machine learning.

## Lambda can't deploy large models (e.g., Transformer models)

By now, you have probably read plenty about the growth of machine learning models. Suffice it to say that in many of the domains where machine learning applies, particularly natural language processing, models are rapidly getting bigger.

For example, over the past few years Hugging Face's Transformers library has become the most popular NLP library, and anecdotally, users frequently serve it from production APIs. The library provides a convenient interface to models such as:

- GPT-2: roughly 6 GB fully trained
- BlenderBot: roughly 5 GB fully trained
- RoBERTa: over 1 GB fully trained

And those are just the reasonably sized ones. Some models, such as T5, can exceed 40 GB, though I'll admit I haven't run into many teams deploying models of that size at scale.

A serverless platform suited to modern machine learning needs to be able to deploy large models, and Lambda can't. Lambda caps deployment packages at 250 MB uncompressed and limits functions to 3,008 MB of memory. If you want to run any kind of state-of-the-art language model, Lambda is not a viable option.

## Serving models requires GPU/ASIC support

As models get bigger, their resource requirements grow as well. For some of the large models discussed above, GPU inference is the only way to serve them at anything close to real-time latency.

Similarly, ASICs like Inferentia and TPUs are changing the economics of model serving in certain cases, and as they mature, they have the potential to do so on a much broader scale. Even though these are relatively young options, we have already benchmarked some models running an order of magnitude more efficiently on Inferentia.

GPU/ASIC inference used to be considered a relatively niche scenario, but it is increasingly becoming the standard in machine learning engineering. Unfortunately, Lambda does not support it.

For a large number of Cortex users, this alone rules Lambda out for deploying models to production.

## Lambda is too inefficient at serving models

A Lambda instance can serve requests in sequence, but it cannot handle concurrent requests. When serving models, this is a big problem.

Inference is a computationally expensive task that often comes with significant latency (hence the frequent need for GPUs/ASICs). To keep inference costs from ballooning, it is important to allocate compute as efficiently as possible without negatively impacting latency.

In Cortex, one way we achieve this is by offering pre- and post-prediction hooks, which can execute code asynchronously. They are typically used when IO requests (fetching user information from a database, writing logs, etc.) are attached to the inference function.

The advantage of these asynchronous hooks is that they let us free up the resources needed for inference as soon as a prediction is generated, rather than waiting until the response has been sent.

In Lambda, however, this is impossible.

As a result, if you serve models with Lambda, you will likely over-scale because of resources sitting idle on each instance.

## Machine learning needs a dedicated serverless platform

Serverless architecture is a natural fit for model deployment. The problem, as with everything in the MLOps space, is that machine learning's needs are specific enough that popular DevOps tooling (like Lambda) doesn't quite fit.

Part of our mission in building Cortex is to build a platform that delivers the ease of use we love about Lambda, while solving the particular challenges of ML infrastructure.

Original link: [https://www.cortex.dev/post/serverless-machine-learning-aws-lambda](https://www.cortex.dev/post/serverless-machine-learning-aws-lambda)
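The asynchronous post-prediction hook pattern described above can be sketched in plain Python with `asyncio`. This is a minimal illustration, not Cortex's actual API: the names `post_predict_hook` and `handle_request` are hypothetical, and a `sleep` stands in for the IO work (database write, logging service, etc.).

```python
import asyncio

async def post_predict_hook(prediction, request_meta):
    # Hypothetical post-prediction hook: IO work (logging, persisting
    # results) that should not hold the inference worker hostage.
    await asyncio.sleep(0.01)  # stand-in for a database/logging call

async def handle_request(model, payload):
    # Run inference first -- the expensive, resource-bound step.
    prediction = model(payload)
    # Schedule the IO work in the background. The inference resources
    # are free to serve the next request immediately; the hook finishes
    # on its own, after the response has already been returned.
    asyncio.create_task(post_predict_hook(prediction, {"path": "/predict"}))
    return prediction

async def main():
    # Stand-in "model": doubles its input.
    result = await handle_request(lambda x: x * 2, 21)
    # Give the background hook a moment to complete before exiting.
    await asyncio.sleep(0.05)
    return result

print(asyncio.run(main()))  # 42
```

The key design point is the `create_task` call: the caller returns the prediction without awaiting the hook, which is exactly the behavior that a one-request-per-instance model like Lambda's cannot express.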