DataOps、MLOps 和 AIOps,你要的是哪個Ops?

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何在 DataOps、MLOps 和 AIOps 之間進行選擇?大數據團隊應該採取哪種 Ops?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"兩年前,由於我領導的運維團隊效率低下,我“贏得”了恥辱的勳章。我具有數據科學和機器學習的背景,因此,我們想當然的從工程團隊的同事那裏學來了 DevOps。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"至少我們是這麼認爲的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這在當時對我們來說是不可思議的,因爲我們的數據科學家就坐在數據工程師的旁邊。我們遵循了所有好的敏捷實踐——每天開站會,討論阻礙我們的因素,並沒有那種“隔牆扔磚”的狀態。我們密切合作,我們的科學家和工程師彼此相愛。但進展很緩慢,團隊成員很沮喪。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"過了兩年,我終於理解了 DevOps 的真正含義。它在數據團隊中是如此的相同,又是如此的不同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在討論以數據爲中心的 Ops 之前,讓我們先從軟件開始。它們有太多的相似之處和對比了,請耐心聽我說……"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"自從 2000 年末 DevOps 普及以來,軟件行業就一直癡迷於各種 Ops 術語。十年前,從軟件開發到部署的方式早已推陳出新。軟件工程師開發應用程序,然後將其交給運維工程師。應用程序在部署過程中經常會出現故障,並在團隊之間造成重大摩擦。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DevOps 實踐的目的是使部署過程更加順暢。其理念是將自動化視爲構建和部署軟件應用程序的一等公民。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這種方式徹底改變了整個行業。許多組織開始通過組建跨職能的團隊來管理整個 SDLC。團隊會搭建基礎設施(基礎設施工程師),開發應用程序 (軟件工程師),構建 CI\/CD 管道(DevOps 工程師),部署應用程序(每個工程師),然後持續監控和觀察應用程序 (站點可靠性工程師)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在一個大型的團隊中,不同的工程師都有各自的主要職能。但在較小的團隊中,一名工程師往往要扮演多種角色。理想的情況是讓許多團隊成員能夠履行多項職能,這樣就可以消除瓶頸和對關鍵人員的依賴。所以實際上……"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DevOps 與其說是一項工作職能,不如說是一種實踐或文化。在構建任何軟件時都應該採用它。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着 DevOps 的興起,各種 Ops 也應運而生了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/43\/434266da14ca64e170f4bb7ff9f6d381.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"軟件開發世界中存在的各種 Ops,作者製作"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SecOps 以安全爲核心,GitOps 致力於持續交付,NetOps 確保網絡能夠支持數據流,ITOps 專注於軟件交付之外的運維任務。但總的來說,這些 Ops 的基石都是源於 DevOps 承諾的願景:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“以最低的錯誤率,儘可能快的速度交付軟件”"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"五年前,所有人都鼓吹“數據是新石油”(Data is the new Oil)。世界各地的領導者開始投入大量資源,組建大數據團隊來挖掘這些寶貴的資產。對這些團隊來說,交付的壓力是巨大的——畢竟,我們怎麼能辜負“新石油”的承諾呢?隨着業務的快速擴張,分析團隊也經歷了同樣的悲痛。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後我們實現了所有的一切。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據科學家成爲了 21 世紀最性感的職業。我們正處於數據和分析的黃金時代。每個高管都有一個儀表盤。這個儀表盤包含了來自整個組織的數據和嵌入的預測模型。每個客戶都有一個基於其行爲的個性化推薦。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是,現在添加一個新功能需要花費數週甚至數月的時間。數據模式很混亂,沒有人知道我們使用的活躍用戶定義是來自信貸團隊的還是來自營銷團隊的。我們在將模型投入生產時變得謹慎,因爲我們不確定它會帶來什麼破壞。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,以數據爲中心的社區站在了一起,承諾不會因爲管理不善而導致效率低下。從那時起,各種以數據爲中心的 Ops 也應運而生了……"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/bb\/bbb960252bdcb01f14628fdde99b2a13.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在以數據爲中心的團隊中誕生的各種 Ops: DataOps、MLOps、AIOps,由作者製作"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"DataOps vs MLOps vs DevOps(以及 AIOps?)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"注:在本文中,分析團隊是指使用 SQL\/PowerBI 來生成業務洞察力的傳統 BI 團隊。AI 團隊是指使用大數據技術構建高級分析和機器學習模型的團隊。有時他們是同一個團隊,但我們會將他們分開,這樣能更容易地解釋相應概念。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了理解所有這些不同的 Ops,讓我們來看一下數據是如何在組織中流動的:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過客戶與軟件程序的交互產生數據。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"軟件將數據存儲在應用程序的數據庫中。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分析團隊根據這些來自不同團隊的應用程序數據庫構建 ETL。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分析團隊爲業務用戶構建報表和儀表盤,以協助他們做出以數據爲驅動的決策。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後,數據工程師將原始數據、合併了的數據集(來自分析團隊)和其他非結構化的數據集整合到某種形式的數據湖中。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後,數據科學家利用這些海量的數據集建立模型。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後,這些模型利用用戶生成的新數據進行預測。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後,軟件工程師將預測結果呈現給用戶,"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣的循環不斷進行下去……"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們知道,DevOps 的誕生是由於開發團隊和運維團隊之間產生了摩擦。因此,可以想象一下,在運維、軟件、分析和 AI4 個團隊之間進行交互會有多痛苦。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了說明這些不同的 Ops 是如何解決上述過程的,這裏有一個圖表,它繪製了每個工作職能在整個時間軸上所執行的一些任務。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/7b\/7b6a32febb63e57a097dc4e6d5373132.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個工作職能在時間軸上所執行的任務圖,由作者生成"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"理想情況下,應在項目開始時採用 X-Ops 文化,並在整個過程中實施。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總體而言之,以下就是每種 Ops 的具體含義:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"DevOps 更快地交付軟件"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一系列旨在消除開發和運維團隊之間障礙的實踐,以便更快地構建和部署軟件。它通常會被工程團隊所採用,包括 DevOps 工程師、基礎設施工程師、軟件工程師、站點可靠性工程師和數據工程師。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"DataOps 更快地交付數據"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一系列旨在提高數據分析質量並縮短分析週期的實踐。DataOps 的主要任務包括數據標記、數據測試、數據管道編排、數據版本控制和數據監控。分析和大數據團隊是 DataOps 的主要操作者,但是任何生成和使用數據的人都應該採用良好的 DataOps 實踐。這包括數據分析師、BI 分析師、數據科學家、數據工程師,有時還包括軟件工程師。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"MLOps 更快地交付機器學習模型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一系列設計、構建和管理可重現、可測試和可持續的基於 ML 的軟件實踐。對於大數據 \/ 機器學習團隊,MLOps 包含了大多數 DataOps 的任務以及其他特定於 ML 的任務,例如模型版本控制、測試、驗證和監控。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"附加:AIOps 利用 AI 的功能增強了 DevOps 工具"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有時人們錯誤地將 MLOps 稱爲 AIOps,但它們是完全不同的。以下說明來自 Gartner(高德納,美國諮詢公司):"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AIOps 平臺利用大數據、現代機器學習以及其他先進的分析技術,直接或間接地增強 IT 運維(監控、自動化和服務檯),具有前瞻性、個性化以及動態的洞察力。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,AIOps 通常是利用 AI 技術來增強服務產品的 DevOps 工具。AWS Cloud Watch 提供的報警和異常檢測是 AIOps 的一個很好的例子。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"是原則不是工作角色"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存在的一種誤解是:爲了達到這些 Ops 所承諾的效率,需要從選擇正確的技術開始。事實上,技術並不是最重要的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DataOps、MLOps 和 DevOps 的實踐必須是與語言、框架、平臺和基礎設施無關的。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個人都有不同的工作流程,該工作流程應該由相應的原則來決定,而不是你想要嘗試的技術,或者最流行的技術。技術先行的陷阱是,如果你想用錘子,那麼一切對你來說都像是釘子。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所有的 Ops 都具有相同的 7 個首要原則,但是每個原則又都有其細微的差別:"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 合規"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DevOps 通常會擔心網絡和應用程序的安全性。在 MLOps 領域,金融和醫療保健等行業通常需要模型的可解釋性。DataOps 需要確保數據產品符合 GDPR\/HIPPA 等法規。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"工具:"},{"type":"text","text":"PySyft 能夠解耦模型訓練過程中的私有數據,AirClope 能夠匿名化數據。Awesome AI Guidelines 能夠基於 AI 的原則、標準和規範進行管理。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 迭代開發"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該原則源於敏捷方法論,它專注於以可持續的方式不斷地產生業務價值。產品經過迭代設計、構建、測試和部署,以最大程度地實現快速失敗並不斷學習。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 可重現性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"軟件系統通常是確定性的:代碼每次都應該以完全相同的方式運行。因此,爲了確保可重現性,DevOps 只需跟蹤代碼即可。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然而,機器學習模型經常會因爲數據漂移而被重新訓練。爲了重現結果,MLOps 需要對模型進行版本控制,DataOps 需要對數據進行版本控制。當被審計師問到“產生這個特定的結果,需要使用哪個模型,需要使用哪些數據來訓練該模型”時,數據科學家需要能夠回答這個問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"工具:"},{"type":"text","text":"實驗跟蹤工具,比如 KubeFlow、MLFlow、SageMaker,它們都具有將元數據鏈接到實驗運行中的功能。Pachyderm 和 DVC 可用於數據版本控制。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. 測試"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"軟件測試包括單元測試、集成測試和迴歸測試。DataOps 需要進行嚴格的數據測試,包括模式變更、數據漂移、特徵工程後的數據驗證等。從 ML 的角度來看,模型的準確性、安全性、偏差 \/ 公平性、可解釋性都需要測試。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"工具:"},{"type":"text","text":"像 Shap、Lime 這樣的庫可用於可解釋性,fiddler 可用於解釋性監控,great expectation 可用於數據測試。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. 持續部署"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"機器學習模型的持續部署由三個組件構成:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一個組件是觸發事件,即觸發器是數據科學家的手動觸發器、日曆計劃事件和閾值觸發器嗎?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第二個組件是新模式的實際再培訓。生成模型的腳本、數據和超參是什麼?它們的版本以及它們之間的聯繫。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後一個組件是模型的實際部署,它必須由具有預警功能的部署管道進行編排。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"工具:"},{"type":"text","text":" 大多數的工作流管理工具都具有此功能,比如 AWS SageMaker、AzureML、DataRobot 等。開源工具有 Seldon、Kubeflow 等。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"6. 自動化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"自動化是 DevOps 的核心價值,實際上有很多專門針對自動化各個方面的工具。以下是一些機器學習項目相關的資源:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Awesome Machine Learning"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/github.com\/josephmisiti\/awesome-machine-learning","title":"","type":null},"content":[{"type":"text","text":"https:\/\/github.com\/josephmisiti\/awesome-machine-learning"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Awesome Production Machine Learning"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/github.com\/ethicalml\/awesome-production-machine-learning","title":"","type":null},"content":[{"type":"text","text":"https:\/\/github.com\/ethicalml\/awesome-production-machine-learning"}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"7. 監控"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"軟件應用程序需要監控,機器學習模型和數據管道也需要監控。對於 DataOps 來說,重要的是監控新數據的分佈,以發現是否有任何數據和 \/ 或概念的漂移。在 MLOps 領域,除了模型降級之外,如果你的模型具有公共 API,那麼監控對抗性攻擊也是至關重要的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"工具:"},{"type":"text","text":" 大多數工作流管理框架都具有某種形式的監控。其他流行的工具包括用於監控度量指標的 Prometheus,用於數據和模型監控的 Orbit by Dessa。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"結論"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"採用正確的 X-Ops 文化來加快數據和機器學習驅動的軟件產品的交付。請記住,原則勝過技術:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"培養跨學科技能:"},{"type":"text","text":" 培養 T 型個人和團隊(彌合差距,協調問責制)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"儘早實現自動化(足夠):"},{"type":"text","text":" 集中在一個技術棧上並實現自動化(減小工程過程的開銷)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"以終爲始:"},{"type":"text","text":" 在解決方案設計上提前投資,以減少從 PoC 到生產的摩擦"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/what-the-ops-are-you-talking-about-518b1b1a2694","title":"","type":null},"content":[{"type":"text","text":"https:\/\/towardsdatascience.com\/what-the-ops-are-you-talking-about-518b1b1a2694"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章