Handling Flaky Unit Tests in Java

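{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Introduction to Flaky Tests"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unit tests are the cornerstone of a continuous integration (CI) system. They warn software engineers about bugs in newly implemented code and regressions in existing code before the new code is merged."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This improves software reliability and raises overall developer productivity, because bugs are caught early in the software development lifecycle. Building a stable and reliable testing system is therefore often a key requirement for software development organizations."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unfortunately, flaky unit tests by definition run counter to this requirement. A unit test is considered flaky if it returns different results (pass or fail) on any two executions without any underlying change to the source code."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flaky tests can arise from program-level non-determinism in the test code or the code under test (for example, thread ordering and other concurrency issues). Alternatively, they can arise from variability in the test environment (for example, the machine the test runs on or the set of tests running at the same time). The former requires fixing the code, while the latter requires identifying the sources of non-determinism and addressing them to remove the flakiness. Coding patterns and test infrastructure must minimize the chance of flaky tests."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flaky tests hurt developer productivity along several dimensions. First, when a test fails for extraneous reasons, the cause has to be investigated; if the failure does not reproduce deterministically, this can be very time-consuming. In many cases reproducing the failure locally is impractical, because it requires a specific test configuration and execution environment."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, if the root cause of the flakiness cannot be identified, the test must be retried many times during CI until a successful run is observed and the corresponding code change can be merged. Both aspects of this process waste critical development time, which motivates building infrastructure support for dealing with flaky unit tests."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A simple example illustrates the problem:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"private static int REDIS_PORT = 6380; // hardcoded port, shared by every concurrent run\n…\n@Before\npublic void setUp() throws IOException, TException {\n MockitoAnnotations.initMocks(this);\n …\n // Fails with a bind exception if another process already holds REDIS_PORT.\n server = RedisServer.newRedisServer(REDIS_PORT);\n …\n}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the setUp method, which runs before the unit tests, a RedisServer is created on port 6380, as defined by REDIS_PORT. When the corresponding unit test runs locally on a developer machine, no error occurs and the test completes successfully."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, when this code is pushed to CI and the test runs in the CI environment, it succeeds only if port 6380 is available in that environment at the moment setUp runs. If another concurrently executing unit test is already listening on the same port, the setUp method in the example fails with a “port already in use” bind exception."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In general, reproducing the cause of a flaky failure requires the developer to know where the flakiness lies (in the example above, the hardcoded port number). This is a circular problem, because flakiness can manifest in many ways, and similar immediate “causes” (for example, a Java exception or a type of test failure) can correspond to very different root causes that occur earlier in the test execution, as shown in Figure 1 below."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, reproducing the exception stack trace requires setting up the environment appropriately (for example, a test that connects to the same port must also be executing concurrently)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/53\/34\/5384cb2b56556962f7c5c7b092679d34.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 1: Root causes of flakiness and the visible symptoms at test failure"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At Uber, the pain caused by flaky tests grew worse when we merged our individual repositories into a single "},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/?s=monorepo","title":"","type":null},"content":[{"type":"text","text":"monorepo"}]},{"type":"text","text":" to take advantage of the many centralization benefits of monorepo-based development."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This centralization also brings the benefit of having centralized teams manage dependencies, test infrastructure, build systems, static analysis, and other tooling. A monorepo reduces the overall cost of managing these systems separately for each repository and ensures consistency across the organization."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, the migration to a monorepo made the impact of flaky tests on developer productivity more pronounced. Because of the more complex execution environment and the larger number of tests running simultaneously, tests that were not necessarily flaky in their standalone repositories became flaky in the new monorepo."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since many tests were not originally designed to run at monorepo scale, it is not surprising that migrating them introduced or exposed significant flakiness."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this article, we explain our approach to mitigating the impact of flaky tests. We discuss the design of the Test Analyzer Service, which we use to manage the state of unit tests and to weed out flaky tests."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We then describe our efforts to categorize the various sources of flakiness and to build program analysis tools (automated reproducers and static checkers) that help reproduce flaky failures and prevent new flaky tests from being added to the monorepo. Finally, we share what we learned along the way."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Managing Flaky Tests with Test Analyzer"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our immediate goal in tackling flaky tests was to distinguish stable tests from flaky ones in the monorepo. At a high level, we do this by periodically executing all unit tests on the monorepo's main branch and recording the history of the last k runs of each test."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since these tests are already part of the main branch, they should succeed unconditionally. If a test fails even once in its last k runs, it is classified as flaky and handled separately."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To this end, we built a general-purpose Test Analyzer tool that helps us analyze and visualize unit test reports at the scale Uber's testing requires. The core of the tool, the Test Analyzer Service (TAS), consumes and processes data about test executions to produce data that developers can visualize and analyze."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This analysis captures a large amount of test metadata, including when a test was executed, how often it is executed, and when it last succeeded. The service runs on each of Uber's language-specific monorepos and therefore stores processed information for hundreds of thousands of unit tests across them."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Each monorepo has multiple CI pipelines that periodically execute tests and feed test reports to TAS. Recent data is stored in a local database, while long-term results are stored in a data warehouse for historical analysis. Using TAS, we set up a custom pipeline whose goal is to run all unit tests on the monorepo's main branch in order to identify and separate out flaky tests."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"t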
ype":"text","text":"The architecture diagram below shows the service's workflow: CI jobs run the tests and then feed the results to TAS through a Test Handler CLI, and TAS stores them in the local database and a data warehouse."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"TAS exposes this data through an API for visualization in the Test Analyzer UI and for further analysis. The code review tool is integrated with Test Analyzer to visualize results and make test failures easier to understand."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/48\/27\/482f2e9b1a0918357600129b71154627.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 2: Architecture of the Test Analyzer Service and related systems"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To detect flaky tests, we use the following data captured by Test Analyzer:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Test case metadata:"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Test name"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Test suite name"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"in
dent":1,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Name of the target ("},{"type":"link","attrs":{"href":"https:\/\/buck.build\/concept\/build_target.html","title":"","type":null},"content":[{"type":"text","text":"Target"}]},{"type":"text","text":"), which identifies a build rule in the project"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Test result"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"Time at which the test was run"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Number of consecutive successful runs"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Stack trace of each failed test run, if any"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Current state of the test case (stable or flaky)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We use this information to classify every test on the main branch: a test with 100 consecutive successful runs is stable, and all others are flaky."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on this classification, a flaky-test disabler job periodically disables flaky tests so that they cannot affect CI results. In other words, failures of flaky tests are ignored when tests are run for new code changes. Figure 3 below illustrates this:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/3c\/1d\/3c9e7f334ae086ab8d48b14ceae49e1d.png","alt":null,"title":null,"style":[{"ke
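y":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 3: Flaky test classification via TAS"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because the results of flaky tests are ignored when code changes are merged, developers merging changes into the monorepo are shielded from flaky tests as far as possible."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Of course, this affects reliability, because the functionality covered by a flaky test goes untested for as long as the test is classified as flaky. This is a deliberate trade-off we made to keep the development engine running. The problem is mitigated to some extent once developers fix a flaky test: after it succeeds 100 consecutive times on the custom CI pipeline, it is reclassified as stable."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"While separating flaky tests from stable ones is a necessary step in handling the problem, it does not fully solve it, because:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Tests are ignored and the resulting bugs have to be tracked down later, which affects software reliability and, eventually, developer productivity."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Developers lack infrastructure support for triaging and fixing flaky tests, so a large fraction of the tests classified as flaky never get fixed, because developers have no good way to reproduce (and debug) the failures."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Reducing Flaky Tests"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We addressed the need to reduce flaky tests in a layered fashion."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Initially, we manually categorized and prioritized the key causes of flakiness and fixed the underlying infrastructure issues. This helped reduce the total number of flaky tests. But the approach does not scale, because it cannot easily handle the long tail of flaky-test symptoms and root causes. Moreover, the centralized Developer Experience team does not have the resources to triage every problematic test case, and often lacks the team-specific context of what each test is meant to verify (and therefore of the right way to fix the flakiness)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Therefore, to enable any developer to triage flaky failures, we built dynamic reproducer tools that can reproduce failures locally. In addition, to curb the growth of flaky tests in the monorepo, we built static checkers that prevent new tests with known sources of flakiness from being introduced. We discuss these strategies in detail below."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Categories of Flakiness"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A flaky test can exhibit flaky behavior on its own, or it can be flaky because of external factors such as the runtime environment or infrastructure, or the libraries and frameworks it depends on. To understand this, we analyzed stack traces to categorize failure causes. From the initial data we found that most flaky tests were caused by external factors, for example:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Highly parallel execution environment: Before the migration to the monorepo, each sub-repository ran its tests sequentially. Tests in the monorepo run in parallel, which can cause CPU\/memory contention and, in turn, flaky failures."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Embedded databases\/servers: Many tests use embedded databases (for example cassandra, mariadb, redis) with their own logic for starting, stopping, and cleaning up state. These custom implementations often have subtle bugs that leave a bad state behind if the embedded server fails to start. The remaining tests that use this embedded database then fail in the parallel execution environment."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Port conflicts:"}]}]},{"type":"listitem","attrs":{"listSty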
le":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Embedded databases\/servers often use hardcoded ports, which makes tests unreliable when they run in parallel on CI."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Spark starts a UI server by default, and it is usually not disabled during tests. The UI server uses a fixed port, so port binding fails and flakiness appears when two tests involving Spark run at the same time."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since most flaky tests stemmed from external factors, we began addressing them centrally:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"We migrated tests that used embedded databases to containerized databases, using the "},{"type":"link","attrs":{"href":"https:\/\/www.testcontainers.org\/","title":"","type":null},"content":[{"type":"text","text":"testcontainers"}]},{"type":"text","text":" library."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"This decouples the implementations and stabilizes the startup and shutdown of test databases"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Databases now run in their own containers, which resolves the unavailable-port problem because each container is assigned a random available port"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"The testcontainers library is used for MariaDB, Cassandra, Redis, Elasticsearch, and Kafka"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"For Spark tests, the SparkUI is "},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/spark\/pull\/2363\/files","title":"","type":null},"content":[{"type":"text","text":"disabled"}]},{"type":"text","text":" during tests, since none of our tests need it to be displayed; this eliminated the flakiness."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"While fixing these infrastructure causes of flakiness, we also set out to build reproducer tools to handle the flakiness that will inevitably still occur."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Reproducing Flaky Tests"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One obstacle developers face when dealing with flaky tests is that they cannot investigate the root cause, mainly because they cannot reliably reproduce the failures."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Therefore, based on our categorization of flaky tests and on what we learned from analyzing our own fixes for other flaky tests, we built dynamic reproducer tools to reproduce observed flaky test failures."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We built a system in which developers can enter the details of a test and trigger an automated analysis of it. The analysis executes the test under a variety of scenarios to help reproduce the underlying problem. Specifically, it will:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Run only the input test"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Run all tests in the input test's class"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":
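null},"content":[{"type":"text","text":"Run all tests in the test target"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Run the test in port collision detection mode"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"Repeat steps 1–3 while increasing the resource load on the system"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first three modes of execution target problems local to the test method, class, or target. For example, in a few cases, running a test method on its own helps reproduce the failure because of dependencies between specific tests (that is, the unit test is not truly independent and expects state to be set up by other tests in the same class). Applying this simple heuristic uncovers a large number of flaky tests."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our analysis also showed that many flaky tests could be attributed to port conflicts. Detecting combinations of tests that access the same port by brute force would require running the right combinations of tests at the same time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Applying that strategy to a collection of hundreds of thousands of tests is not feasible in practice. Instead, we designed an analysis that executes each test independently and can identify whether it could have a port conflict with any other potential test."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To do this, we use the "},{"type":"link","attrs":{"href":"https:\/\/docs.oracle.com\/javase\/tutorial\/essential\/environment\/security.html","title":"","type":null},"content":[{"type":"text","text":"Java Security Manager"}]},{"type":"text","text":" to identify the set of ports a test accesses. A separate process called Port Claimer is spawned to bind and listen on the identified ports (on both IPv4 and IPv6). While Port Claimer is listening, the test is re-executed, and any newly accessed ports are identified and then claimed by Port Claimer as well."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Repeating this process a few times collects the set of ports the test may use. If the test uses a constant port, one of the re-executions fails because Port Claimer is already listening on the previously identified port. Otherwise, the test simply moves to a new port. After several iterations, we know the set of constant ports the test uses."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If a test execution fails, we output a simple reproducer command that has Port Claimer claim the set of identified ports and then executes the flaky test under investigation. Developers can then use it to triage the problem locally and fix the root cause."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/57\/e2\/57831yy6524d08c86565a6583c0557e2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 4: Determining a test's sensitivity to port availability with the port collision detection toolchain"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 4 above depicts this process. Run on its own, a flaky test may well succeed. The Security Manager is used to observe the ports the test accesses, and this information is passed as input to the Port Claimer program."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When the test is then executed alongside a Port Claimer that holds the identified ports, a reproducer command is generated if the test fails. The command claims the identified ports and runs the test under those conditions, helping developers reproduce the problem locally and deterministically."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, we also run tests with additional load on the node. We do this by spawning multiple processes (similar to the "},{"type":"link","attrs":{"href":"https:\/\/linux.die.net\/man\/1\/stress","title":"","type":null},"content":[{"type":"text","text":"stress"}]},{"type":"text","text":" command) and checking whether the test still succeeds under high CPU load."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If a test has a hardcoded timing dependency (another common source of flakiness), this readily reproduces the flakiness. Using the corresponding data, we output a reproducer command that developers can use to run the test under the required stress load and triage the problem locally."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Crowdsourcing Flaky Test Fixes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The categorization of flaky tests described above helps resolve infrastructure-related failures and other kinds of failures that can be handled centrally. To scale the process of fixing flaky tests, we crowdsourced fixes from engineers across Uber, and did so at multiple levels: by driving fixes during a “Fix-it Week” event aimed at all developers who commit to the monorepo, and by engaging the teams with the highest share of flaky tests."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These targeted efforts, together with the infrastructure and tooling for reproducing flaky failures, significantly reduced the overall percentage of flaky tests in a short time. The reproducer infrastructure also lets developers easily triage and fix newer flaky tests on an ongoing basis."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Static Checks"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Eliminating existing flaky tests after the migration to the monorepo is only part of the job. To provide a CI that stays stable in the long run, we also want to reduce the rate at which new flaky tests are introduced in the first place."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We want to do this without running multiple dynamic reproducers over the full test suite for every code change. A comprehensive dynamic analysis would require us to run many tests on each code change to look for potentially conflicting test cases."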
}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It would also require running each test case many times under the different dynamic flakiness reproducers. Because this overhead would have an unacceptable impact on developer workflow at code review time, the natural solution is to use some form of lightweight static analysis (also known as linting) to look for known patterns associated with flakiness in newly added or modified tests."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At Uber, we primarily use Google's "},{"type":"link","attrs":{"href":"https:\/\/errorprone.info\/","title":"","type":null},"content":[{"type":"text","text":"Error Prone"}]},{"type":"text","text":" framework for build-time static analysis of Java code (see also: "},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/nullaway\/","title":"","type":null},"content":[{"type":"text","text":"NullAway"}]},{"type":"text","text":", "},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/piranha\/","title":"","type":null},"content":[{"type":"text","text":"Piranha"}]},{"type":"text","text":"). As part of our overall effort to reduce test flakiness, we have begun implementing simple Error Prone checkers that detect code patterns known to introduce flakiness in our CI testing environment."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a test matches one of these patterns, an error is raised at compile time, both locally and on CI, prompting the developer to fix (or suppress) the issue. We monitor how often these checks fire through analytics tracking, and track the rate at which individual checks are suppressed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the rest of this section, we focus on one specific example of a static check: our ForbidTimedWaitInTests checker."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, consider the following code, which uses Java's "},{"type":"link","attrs":{"href":"https:\/\/docs.oracle.com\/javase\/8\/docs\/api\/java\/util\/concurrent\/CountDownLatch.html","title":"","type":null},"content":[{"type":"text","text":"CountDownLatch"}]},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":" final CountDownLatch latch = new CountDownLatch(1);\n Thread t = new Thread(new CountDownRunnable(latch));\n t.start();\n // Timed wait: the assertion fails if the task takes longer than 100 ms.\n assertTrue(latch.await(100, TimeUnit.MILLISECONDS));\n …\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here, the developer creates a latch object with a count of 1. The object is then passed to a background thread t, which presumably runs some task (abstracted here as a CountDownRunnable object) that calls latch.countDown() to signal completion."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After starting this thread, the test code calls latch.await with a timeout of 100 milliseconds. If the task completes within 100 milliseconds, the method returns true, the JUnit assertion succeeds, and the rest of the test case continues."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, if the task is not done within 100 milliseconds, the test fails with an assertion failure. When the test runs in isolation, a 100-millisecond timeout is very likely always enough for the operation to complete, but under high CPU stress it is too short."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For this reason, we took the opinionated step of discouraging the bounded version of the latch.await(...) API call in test code and replacing it with the unbounded await() call. Of course, an unbounded await has problems of its own, since it can cause the process to hang."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, since we enforce this convention only on test code, we can rely on a carefully chosen global unit test timeout to catch any unit test that might otherwise run indefinitely. We believe this is preferable to trying to statically estimate a “suitable” timeout for a specific operation in a unit test."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides Java's CountDownLatch, our check also handles other APIs that introduce flakiness through their dependence on wall-clock time. As a side note, we explicitly allow test code that uses a bounded await when our checker determines that the operation is always expected to time out, because this is not a source of flakiness under stress."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"How Do These Changes Affect Developers?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"t
ype":"text","text":"Developers submit code changes, which go through CI, where any compilation or test failures are identified. If the build succeeds on CI, developers merge their changes using an in-house tool called "},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/research\/keeping-master-green-at-scale\/","title":"","type":null},"content":[{"type":"text","text":"SubmitQueue"}]},{"type":"text","text":" (SQ)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flaky tests cause failures in CI and SQ jobs that developers previously could not act on, which hurt development velocity and developers' ability to deploy and release new features."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The steps and tools described above reduced the failures developers encounter in CI\/SQ jobs, and also reduced CI resource usage by avoiding repeated reruns and shortening CI run times. With the number of flaky tests significantly reduced (by roughly 85%), we can now rerun test cases that fail during CI to determine whether they are likely flaky; if so, the build is allowed to pass anyway (without waiting for TAS to remove the flaky test). This approach eliminates the impact of flaky tests on CI and SQ, greatly improving software reliability and developer productivity."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Future Directions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond the work described above, we are exploring further opportunities to reduce flaky tests, including:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Building more general systems to detect root causes of flakiness, including concurrency bugs and interactions between test cases."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Building tooling to assign reproducible flaky test failures to engineers with the relevant ownership and domain context."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Extending our dynamic reproducers and static checkers to handle additional sources of flakiness (for example, we are improving our static checks to keep hardcoded port numbers out of tests, including those introduced by library defaults and configuration files)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Making the dynamic reproducers more efficient, potentially enabling them to run at code review time."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"Extending our toolchain to handle flaky unit tests in Uber's monorepos for other major languages, such as Go."}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Acknowledgements"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We would like to thank the other contributors to this project from the Developer Platform teams in Amsterdam and the US (in alphabetical order): Maciej Baksza, Raj Barik, Zsombor Erdody-Nagy, Edgar Fernandes, Han Liu, Yibo Liu, Thales Machado, Naveen Narayanan, Tho Nguyen, Donald Pinckney, Simon Soriano, Viral Sangani, and Anda Xu."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/handling-flaky-tests-java\/","title":"","type":null},"content":[{"type":"text","text":"https:\/\/eng.uber.com\/handling-flaky-tests-java\/"}]}]}]}
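The hardcoded-port failure from the introduction, and the way a port-claiming process turns it into a deterministic failure, can be illustrated with plain `java.net.ServerSocket`. This is a minimal sketch under stated assumptions, not Uber's Port Claimer or reproducer tooling: the class name `PortConflictSketch` and the helper `canBind` are hypothetical, and a socket opened on port 0 stands in for "another concurrently running test" that already holds the port.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class PortConflictSketch {

    // Tries to bind a fixed port, the way a test with a hardcoded port would.
    // Returns true when the bind succeeds, false when the port is already taken.
    static boolean canBind(int port) throws IOException {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (BindException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a concurrent test (or a port-claiming process): holds a port open.
        try (ServerSocket claimer = new ServerSocket(0)) {
            int claimed = claimer.getLocalPort();
            System.out.println("fixed port bindable while claimed: " + canBind(claimed));

            // Asking the OS for port 0 yields a free ephemeral port on each call,
            // so two servers opened this way do not collide with each other.
            try (ServerSocket a = new ServerSocket(0);
                 ServerSocket b = new ServerSocket(0)) {
                System.out.println("ephemeral ports differ: "
                        + (a.getLocalPort() != b.getLocalPort()));
            }
        }
    }
}
```

Binding port 0 and reading back `getLocalPort()` is one common way to avoid fixed ports in tests; the article itself describes a different remedy, running databases in containers that are each assigned a random available port.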