Amazon Is Reinventing MLOps

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Author | Vishnu Prathish"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Translator | 王強"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Curator | 冬梅"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"This article was originally published on Medium and is translated and shared by InfoQ China with the original author's permission."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is well known that among the three major cloud providers, AWS has the richest portfolio of machine learning capabilities. With the public release of SageMaker Studio in early 2020, they created a fully integrated ML development environment, an industry first."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anchoring an IDE at the center of all your ML products is a smart move, provided the surrounding services correctly fill the gaps in the critical operational layers. If all goes well, Amazon has a chance to reshape how machine learning is done in industry once and for all."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Even before SageMaker Studio, AWS had a number of services aimed at MLOps. re:Invent 2020 went a step further: they released a series of products and services that fill most of the known gaps."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So how are they doing? Have they built the right tools for the right audience? It will take a few more years to answer that question, but AWS is certainly leading the race. For now, let us look at some of the key new services to see where AWS holds the advantage in this game."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"AWS's Existing MLOps Suite"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Amazon's existing offering is built entirely around SageMaker Studio, which provides the industry's first integrated development environment for ML. Here are some of the features built on it that make the platform attractive:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SageMaker Studio notebooks provide serverless Jupyter notebooks as a replacement for your local notebooks. Local mode is also supported, but I strongly recommend building your development environment around a centralized notebook setup."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SageMaker Autopilot brings AutoML to AWS, taking the heavy lifting out of the ML workflow."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SageMaker Experiments lets you save and track your training experiments. It also lets you compare one model against another, so you can manually pick the best model from the experiment results table."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SageMaker Model tuning lets you use the cloud to automate hyperparameter optimization."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Multi-model endpoints can greatly reduce inference costs."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Model Monitor helps you track metrics in production, which makes model drift easy to follow."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"What's New in 2021?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although AWS already operates a broad set of ML services, it still could not claim a unified development environment for every machine learning purpose. MLOps had significant gaps in several areas."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There was no coherent CI\/CD pipeline tying the services together. Without one, users feel as if they are working with a collection of unrelated services. The product lineup for the individual stages of the machine learning process (data preparation, training, validation, inference, monitoring) was also incomplete."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"But this is changing. With the new services released at and before re:Invent 2020, AWS has filled most of these gaps this year, while most other providers lag far behind."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here are some examples."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Data Wrangler: Zero-Code Data Preparation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AWS SageMaker Data Wrangler provides a clean, Jupyter-style IDE for machine learning data preparation. It is built directly on SageMaker Studio, so it inherits all of Studio's capabilities (such as its data visualization)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although it is technically a no-code tool, Data Wrangler can still be customized with code. You can apply more than 300 built-in automated transformations to your training data, and with a single click you can export the workflow to a SageMaker notebook and build a model in place. It also directly supports multiple data stores, including Snowflake, MongoDB, and Databricks."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data Wrangler closes a huge gap in Amazon's ML data preparation story. They claim that simplifying data preparation in this way can substantially cut the time users spend on it."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"AWS Glue DataBrew: The Same Tool, Done Differently"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/22\/22be8e85b0eea383a38aa93fbe572e8f.webp","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DataBrew is also a no-code data preparation tool, but the two tools target different audiences. Data Wrangler is aimed specifically at ML, while DataBrew focuses on general exploratory data analysis (EDA). DataBrew is also a UI-centric tool."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"EDA is usually a prerequisite for ML, so the two can well be used together. DataBrew's one-click profiling and carefully designed interface (suited to users who do not write code) make the job simpler and clearer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Both tools can be used for feature engineering, but only Data Wrangler supports exporting the feature space to AWS Feature Store, which makes it the better fit."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Another gap, filled."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"AWS Feature Store: Feature Engineering at Scale"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is an important release that addresses a key missing piece in feature engineering. Many machine learning practices suffer from a mismatch between offline (batch) and online (real-time) feature engineering. Complex feature engineering transformations, and new features built during batch processing, are hard to carry over cleanly into the inference \/ prediction pipeline."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Feature Store solves this by placing a dedicated repository for the feature space between the two. Everything you do to the raw data in SageMaker Studio during training can be exported to Feature Store, with the guarantee that it can be reproduced correctly at inference time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond solving this online-offline problem, it also supports feature discovery, sharing, and reuse. It is designed with latency in mind, which is a must at scale."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"SageMaker Pipelines: CI\/CD for Machine Learning Workflows"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For me, this service is the most important operational release of the year."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although a reliable CI\/CD process or framework is an important prerequisite for scalable ML, there were no good product options before. People either settled for a less-than-ideal MLOps process or built their own version of CI\/CD."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The problem with home-grown CI\/CD frameworks for ML is that they do not generalize, so they cannot easily be open-sourced. Such frameworks inevitably bake a lot of domain knowledge into the code, both to shorten development time and to integrate better with existing services. AWS intends to solve this with a general-purpose CI\/CD framework for ML."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SageMaker Pipelines lets you create, visualize, and manage ML workflows. It lets you create and track separate development and production environments, and those environments let you promote artifacts. It also comes with a model registry that lets you track and select the right model for deployment."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A less obvious effect of this pipeline is that it also weaves together all the other SageMaker services for ML. This gives AWS a clear advantage, because it makes truly end-to-end ML possible."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Other Relevant re:Invent Announcements"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"SageMaker Clarify:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Bias detection across the end-to-end SageMaker workflow. This is a big plus for B2C companies."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"SageMaker Debugger Improvements"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Monitoring and deep profiling of resource utilization during training, particularly for deep neural networks."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Machine Learning at the Edge"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SageMaker Edge Manager builds on SageMaker Neo and introduces model management for edge devices. It is very useful if you work in the IoT industry."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Database ML Capabilities"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although not strictly MLOps, Amazon's new database ML services do share a common theme: building a smooth, production-grade ML workflow that removes the need for operations altogether."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Amazon Redshift ML: integrates SageMaker Autopilot into Amazon Redshift."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Amazon Neptune ML: integrates graph ML."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Amazon Aurora ML: integrates ML directly into Postgres through SQL queries."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Amazon Athena ML: provides pre-trained models on Athena."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"What About the Competition?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Azure Machine Learning and Google Cloud AI Platform are the two leading MLOps offerings among the top cloud providers. Both have strong pipeline and CI\/CD capabilities. However, Google's AI pipelines are still in beta, while the AWS counterpart is already generally available. Azure Machine Learning Studio feels very similar to SageMaker but does not offer as many services."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The model used by other providers does not place an integrated IDE at the center. Azure ML Studio appears to attempt this, but its feature set is very limited."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with the other top providers, Amazon has indeed invested more resources in delivering a better operational solution for data science. Will this let them hold on to the lead with the most integrated MLOps suite? I think so. Amazon has a 3 to 5 year head start in developing cloud solutions (or more? I could not find a reference for this). Still, it is too early to predict who will win the MLOps race."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https:\/\/medium.com\/analytics-vidhya\/amazon-is-reinventing-mlops-57f36c0b2d0a"}]}]}