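Anti-Patterns in Continuous Integration and Delivery Pipelines

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"CI\/CD & Pipeline"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As more and more companies adopt DevOps, CI\/CD is gradually being put into practice as well."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"CI"},{"type":"text","text":" (Continuous Integration) is the practice of automatically integrating code changes into the mainline. CI addresses the problem of integration hell, letting a product iterate quickly while maintaining high quality. Its core measure is that code must pass a series of automated checks, such as compilation, unit tests, linting, and code style checks, before it is integrated into the mainline."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"CD"},{"type":"text","text":" covers both continuous delivery and continuous deployment. Continuous delivery is the process by which a team delivers high-quality software releases automatically, frequently, and predictably. It can be seen as the next stage after continuous integration, and it emphasizes that the software is always in a deliverable state no matter how the code changes. Continuous deployment places more emphasis on using automated tests to guarantee the correctness and stability of each change so that it can be deployed as soon as the tests pass; it goes one step beyond continuous delivery. The difference between the two is that continuous delivery requires human intervention: someone confirms that the release is ready for production before the deployment is actually carried out, whereas continuous deployment releases automatically."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/18\/183db14f4b5dfcd459b5fc58ab6aae1d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 1: Continuous Integration & Continuous Delivery & Continuous Deployment"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"CI\/CD 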
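Pipeline"},{"type":"text","text":" is a practice for avoiding waste in software development. It lays out the whole process from code commit through build, deployment, and testing to release, giving the team visibility and timely feedback. The recommended way to implement a Pipeline is to divide the software deployment process into stages, with tasks (steps) running inside each stage. Tasks in the same stage can run in parallel to speed up feedback, and the tasks of the next stage can start only when every task in the current stage has passed. In the figure, for example, the whole flow from git push to deploy to production is one CD Pipeline. Pipeline tools such as Jenkins, Buildkite, and Bamboo make it easier to implement CI\/CD."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/79\/796e7ea7de89024df0b10d53cbe786ae.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 2: CI\/CD Pipeline"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Anti-Patterns in CI\/CD Pipelines"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although Pipelines are widely used, we still hear developers complain about the harm that bad Pipelines do to them: blocking the development flow, reducing the efficiency of deploying changes, and lowering delivery quality. We have collected eight Pipeline anti-patterns that come up frequently on projects and ordered them by how often they occur. For each of these bad smells we describe the symptom and analyze its likely causes, impact, and solutions, in the hope of reducing the complaints and letting Pipelines improve productivity as much as possible."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 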
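Not Defined as Code"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anti-pattern: The Pipeline definition is not fully expressed as code, version-controlled, and stored in the code repository. Instead, shell scripts that define how the Pipeline runs are typed directly into the Pipeline tool. Cause: Early CI tools did not support Pipeline as code, and such setups have survived to this day without being refactored or upgraded. Impact: The Pipeline is created and managed through the CI tool's user interface, which makes it hard to maintain, so a dedicated administrator is needed. Anything that involves manual operation is prone to error, which lowers the reliability of the Pipeline. And if the Pipeline is lost for some reason, it cannot be restored quickly, which hurts delivery speed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Solution: The idea of Pipeline as code has been around for many years and was already listed as adopted in the 2016 ThoughtWorks Technology Radar. What needs emphasizing is that the configuration of the delivery Pipeline that builds, tests, and deploys our applications or infrastructure should be expressed as code. As organizations evolve toward decentralized, autonomous teams that build microservices or micro frontends, the engineering practice of managing Pipelines as code is increasingly needed to keep the way software is built and deployed consistent across the organization. Typically, the Pipeline configuration for a project should live in the project's source repository alongside the project code, and it should go through code review just like business code. This need has given rise to many tools that support Pipeline as code and can build and deploy services and applications in a standard way, such as Jenkins, Buildkite, and Bamboo. Most of these tools provide a Pipeline blueprint for running the tasks of the different stages of a delivery lifecycle, such as build, test, and deploy, without the user having to care about implementation details. The ability to define build, test, and deployment pipelines as code should be one of the criteria for choosing a CI\/CD tool."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 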
Slow to Run"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anti-pattern: A Pipeline whose execution takes more than half an hour counts as a slow Pipeline. (What counts as slow depends on the product being delivered, so the acceptable run time differs from project to project.) Many things can make a single Pipeline run take a long time, for example:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tasks that should run in parallel run one after another, and the waiting tasks stretch out the total execution time;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are too few agent nodes to run the Pipeline, or the nodes are underpowered, so builds queue for too long and run inefficiently;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The tasks are too heavy, with the same test scenario covered many times by different tests, for example the same logic being exercised again in several kinds of tests;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Caches are not used properly, for example every task downloads all dependencies again, or the Dockerfile is built without making good use of layers, so a brand-new image is built every time."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Impact: This is the anti-pattern developers complain about most. Agile development needs fast feedback from the Pipeline. Under this anti-pattern, developers working on a feature often change one line of code and then wait a long time for the CI result. And if a production incident can be fixed with a one-line change, it still takes a long cycle before that change reaches production. Solution: Different causes of slowness call for different remedies. For the problems listed above, we can:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Check whether the Pipeline is designed sensibly and run tasks in parallel wherever possible;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Understand the different kinds of tests in the codebase in depth and keep them as orthogonal as possible to avoid excessive duplication;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Review the dependencies in the code and make good use of caches, including caches for Docker images, Gradle, Yarn, and RubyGems, and check that the Dockerfile is well designed so that the layers that rarely change are concentrated at the beginning;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Check whether the build nodes have sufficient resources and whether they can scale elastically under heavy load, to reduce waiting and execution time."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Unstable Results"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/92\/920440909d88a7490d9f0b433076cc4a.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 3: 
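Results vary across repeated runs"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anti-pattern: Running the Pipeline several times against the same code gives different results. For example, on the same code baseline a Pipeline is run five times and only the last run passes. Cause: Unstable results also have many possible causes. Test cases may be poorly implemented, so that they pass on some runs and fail on others. The code may rely on an unreliable dependency source, such as one hosted overseas from which downloads frequently time out. Or the stages of the Pipeline may be poorly designed, so that some tasks conflict when they run at the same time. Impact: The Pipeline is the last line of defense before code is released, and its most basic property is idempotence: on the same code baseline, running any task of the Pipeline 10 times or 100 times gives the same result. An unstable Pipeline directly slows down code deployment. More importantly, it erodes developers' trust in the Pipeline. If the instability is not fixed promptly, the Pipeline gradually stops being maintained and developers eventually fall back to manual deployment. Solution: To build an idempotent, reliable Pipeline, analyze the cause of each source of instability and address it:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Make the tests more stable, for example by replacing unstable sources with mocks."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Use the Pipeline's retry feature, switch to a stable mirror, or build base images in advance."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Introduce a Pipeline plugin that ensures conflicting tasks do not run in parallel."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. 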
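Misusing Jobs to Process Production Data"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anti-pattern: The Pipeline's scheduled-job feature is used to run production workloads, typically periodic data backups, data migrations, and data scraping. Cause: The role of the Pipeline is not clearly understood, so important tasks are handed to it. Once a Pipeline has access to a production environment, running these data-processing tasks from it is very convenient and removes a lot of manual work."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Impact: A Pipeline is a tool for building and deploying and should not be used to execute business logic. Because the Pipeline is an internal service, its SLO\/SLI is bound to differ from that of the production environment, and a hard dependency on it will inevitably affect production's SLO. If the Pipeline goes down one day, production will not get the data it needs. In addition, the tasks become tightly coupled to the Pipeline, which is another anti-pattern discussed later. Solution: Solve such data problems with tools that belong to the production environment itself, for example by using AWS Lambda to trigger data-processing tasks on a schedule."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. Complex and Hard to Understand"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ac\/acada74bb5b54b72597a80d800dd0c83.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 4: A Pipeline definition with complex logic"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anti-pattern: The Pipeline definition contains too much logic and is hard to understand. Only by actually running the Pipeline can you find out which steps will run and which environments this version will be deployed to. Cause: The Pipeline code is not clean. Some people assume that the Pipeline is only there for the CI tool, so they write it carelessly and consider it good enough as long as it gets the job done. Impact: The complexity of the Pipeline directly raises the learning cost. Re-running a previous build also takes considerable time. Solution: Keep the Pipeline code simple and move the complexity into deployment scripts or the application code. The title of each stage should make it clear what task will be executed. If a lot of logic is repeated, a Pipeline plugin can be developed to simplify the configuration."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"6. 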
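Too Tightly Coupled"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/42\/42bf2add183a10eea30a4cdf3e4f425c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 5: (left) An overly coupled Pipeline definition; (right) the desired Pipeline definition"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anti-pattern: The Pipeline is tightly coupled to the CI tool that runs it, to the point that the same steps cannot be repeated locally. This can show up in several ways:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Pipeline definition is tightly coupled to the build tool and contains parameters and CLI commands specific to the Pipeline tool, for example BUILDKITE_BUILD_NUMBER or BUILDKITE_QUEUE used in the configuration. As a result, the way things run locally, or the results they produce, differ from what happens on the Pipeline."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A Pipeline task contains a large inline script, or invokes a command directly with a pile of arguments, so running the tests locally means digging the command out of the Pipeline configuration and pasting it into a local terminal."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There is no environment isolation: testing, compilation, deployment, and so on all depend on the runtime environment. The Pipeline can then behave inconsistently because the versions of the software and libraries it depends on differ, and such problems are usually hard to track down."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Impact: Because changes are hard to debug locally, they are much more likely to fail on the Pipeline. If the change is meant to fix a bug, the lack of environment isolation lengthens the time it takes to repair the fault. Solution: Wrap every step of the Pipeline in a script, avoid parameters specific to the Pipeline tool inside those scripts, and make sure the scripts behave the same locally as they do on the Pipeline."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"7. Zombie Pipeline"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anti-pattern: A Pipeline has fallen into disrepair: it has not run for a long time, and its last build failed. Cause: This anti-pattern usually appears on projects that are no longer under active development, so the Pipeline has not been executed for a long time. Impact: The result of a Pipeline reflects the state of a project. Because software evolves quickly, the project's dependencies may have changed dramatically, and the Pipeline will very likely fail as soon as it is run. If the project then has an incident that requires a code change, the Pipeline has to be repaired first before the fix can be shipped with confidence. Solution: For Pipelines whose projects have gone a long time without commits, we recommend running the Pipeline on a regular schedule and fixing problems as soon as they appear. A tool such as GitHub's Dependabot keeps the project's dependencies up to date and also triggers Pipeline runs, so problems surface early."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"8. Requires Manual Intervention"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anti-pattern: A project typically has a dedicated Ops person who manually triggers the deployment when the project is ready to release, or who has to pass in many parameters to get the Pipeline running. Cause: Possible causes include a cumbersome project process that requires repeated sign-offs; insufficient DevOps maturity, so that continuous deployment has not been achieved; or insufficient test coverage in CI, so that more testing is needed after CI passes before deploying. Impact: These Pipelines need someone to watch over them and click certain buttons, which directly affects the product's delivery speed and the frequency of code deployment. Solution: Make the project run in a more agile way, reduce the blocking buttons in the Pipeline definition, and automate manual tests and integrate them into the Pipeline."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Conclusion"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We hope this article helps you recognize the problems with the CI\/CD 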
Pipelines in your projects so that they can deliver greater value."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article is reprinted from ThoughtWorks Insights (ID: TW-Insights)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/cbXH3pF_dnY2clI0-6KVSg","title":"xxx","type":null},"content":[{"type":"text","text":"Anti-Patterns in Continuous Integration and Delivery Pipelines"}]}]}]}