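Kuaishou, Southern University of Science and Technology, and UIUC Propose a New Deep Just-in-Time Defect Prediction Model | ISSTA 2021 Paper Walkthrough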

Original

2021-07-26 10:44

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Project Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Code review is a critical process in software quality assurance. It surfaces problems early, reduces project risk, and helps keep the project on track."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/fa\/fad93a8f2e0ea03ebc0dcc04fbf462d1.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A typical code review workflow consists of the steps shown in the figure below: uploading the code, checks by an early-warning system, reviewer inspection of the code, system integration testing, and so on."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/c6\/c6ae409ef6c6cf0cfa147a54364da27d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a developer submits a commit, it passes through several automated or manual review stages before it is merged into the main branch. If any stage finds a problem, the commit is returned to the developer for revision. The third step in the figure above usually requires experienced developers to review the submitted code by hand, which is both costly and slow."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Researchers have therefore proposed "},{"type":"link","attrs":{"href":"http:\/\/lingming.cs.illinois.edu\/publications\/issta2021a.pdf","title":"xxx","type":null},"content":[{"type":"text","text":"just-in-time defect prediction (JIT-DP)"}]},{"type":"text","text":", which aims to automatically predict how likely a submitted change is to contain defects before manual review begins, so that reviewer effort can be allocated sensibly and saved."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Existing JIT-DP Approaches"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/d5\/d5a28e53251c11bbbef64758429c9fce.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows the standard JIT-DP pipeline. Here “just-in-time” means that the unit of prediction is a single commit. Each historical commit is labeled as either defective (Defect) or clean (Clean), and a machine learning model is then trained on a large body of historical commits to predict whether future commits are defective."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/57\/57435ee99b270cae979ca972e7f409d5.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Researchers have proposed a wide range of models for JIT-DP. The classic one is LR-JIT by Kamei et al., which extracts 14 basic features from each commit, such as NF (number of modified files), LA (lines of code added), and EXP (developer experience). Once these features are extracted, a logistic regression classifier is used to classify the commit."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Building on LR-JIT, Yang et al. inserted a deep belief network (DBN) between the features and the classifier to derive a higher-dimensional vector representation of the commit features. This approach is referred to as DBN-JIT below."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/4b\/4b132d5ef820fa7e1b5dcf29b1f06bda.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unlike these two traditional approaches, which rely on hand-crafted features, recent deep learning approaches mostly use neural networks to extract information from commits automatically, for example CC2Vec and DeepJIT in the figure above."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DeepJIT takes a commit's message and code changes as input, then uses two convolutional neural networks to extract features from the vectorized commit message and code changes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CC2Vec, by contrast, preserves the structure of the code changes. It first splits a code change into added code and removed code, then uses a hierarchical attention network (HAN) to extract a feature vector for each. Comparison layers then contrast the two vectors and produce an overall feature vector for the code change. Finally, the CC2Vec and DeepJIT features can be combined to obtain a stronger JIT-DP model."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Research Questions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In prior studies, CC2Vec and DeepJIT outperformed LR-JIT and DBN-JIT. However, those studies used only very small datasets, which says little about how well CC2Vec and DeepJIT generalize."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To evaluate the current state of JIT-DP thoroughly, the paper focuses on the following four questions:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Why do DeepJIT and CC2Vec work?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"How do DeepJIT and CC2Vec perform on a larger dataset?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"How well do traditional defect prediction features perform for JIT-DP?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Can a simple approach outperform DeepJIT\/CC2Vec?"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Experiments and Analysis"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ1: Why do DeepJIT and CC2Vec work?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For RQ1, we studied the effectiveness of the three inputs used by DeepJIT and CC2Vec: CC2VecCode, DeepJITMsg, and DeepJITCode."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/27\/2724970812df48eecdfb7b9618bfd214.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To study DeepJITMsg, we applied Grad-CAM to visualize how much each word in a commit message contributes to the prediction. In the example above, the darker a word is, the more it contributes. We then took a weighted average of the contribution of every word that appears multiple times in the dataset and ranked the words. Words such as “task-number”, “fix”, and “bug” rank relatively high among all 10,000 words. From this we infer that DeepJIT and CC2Vec can extract the intent of a commit from its message to assist defect prediction."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/71\/71fdf3a17cf5e6af40e69c53c9345b2e.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When studying DeepJITCode, we found that the DeepJIT implementation does not use the complete code changes as described in the original paper, but an abstracted form of them. In the example in Figure 3 above, the changed source code is replaced by 7 lines of “added_code removed_code”."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To compare the two forms, we gave models trained on the complete code changes the suffix Paper and models trained on the abstracted code changes the suffix Github, and reran the experiments with both. The results in the table show that DeepJITGithub achieves a higher AUC than DeepJITPaper, and the same holds for CC2Vec."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/fc\/fc53a8332400d4f7a6c2fe459cc7a8ba.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To assess CC2VecCode, we ran an ablation study over the three input features of CC2Vec."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first three rows of Table 4 show the results of using a single feature on its own, and the last three rows show the results of removing a single feature."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Table 4 shows that DeepJITMsg and DeepJITCode contribute more to the JIT-DP results than CC2VecCode does."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ2: How do DeepJIT and CC2Vec perform on a larger dataset?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/83\/8345ee1acf4c3acd6278c3c093a7d2fe.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For RQ2, we first followed the data processing procedures of Kim et al. and McIntosh et al. (the same ones used by CC2Vec and DeepJIT) to collect a large dataset containing 310,370 real commits and 81,300 defects in total. We then evaluated four JIT-DP approaches on it in both within-project (WP) and cross-project (CP) settings."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/2e\/2e04de928d798375c921052955b2a718.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Table 5 shows that CC2Vec fails to keep its advantage over DeepJIT on most projects, such as JDT, Platform, and Gerrit. We therefore conclude that CC2Vec does not significantly outperform DeepJIT on the extended dataset. We also found that although DeepJIT and CC2Vec are better than LR-JIT and DBN-JIT on average, LR-JIT and DBN-JIT achieve higher AUC on OpenStack, so DeepJIT and CC2Vec are not guaranteed to beat the traditional approaches across the board."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/a8\/a8e413390d42c2cadaee2bce21466791.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To explore how changes in defect patterns over the course of software evolution affect JIT-DP, we further studied how JIT-DP models perform with training sets of different sizes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows how the AUC of DeepJIT and LR-JIT changes with training set size. Overall, the prediction accuracy of both approaches fluctuates as the training set grows. This indicates that simply adding more historical data to enlarge the training set does not guarantee better JIT-DP performance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ3: How well do traditional defect prediction features perform for JIT-DP?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The findings from RQ1 and RQ2 establish the following facts:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"DeepJITCode contributes the most to the prediction results."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"In CC2Vec, more input features can sometimes even degrade performance."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Deep learning approaches are not guaranteed to outperform traditional approaches on every project studied."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These facts suggest that the effectiveness of the individual features used by traditional JIT-DP approaches has not been studied sufficiently. The figure below therefore shows how each representative traditional feature performs for JIT-DP."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/27\/2717241e3e4b5aa1c0f5a4c4e93124b6.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure shows that a model using only the “LA” feature outperforms the model using all features (“ALL”) on QT, OpenStack, and Go. In the cross-project setting, the “LA” model also degrades less than models built on other features. This suggests that the “LA” feature alone is enough to build an accurate and stable JIT-DP model."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ4: Can a simple approach outperform DeepJIT\/CC2Vec?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/af\/affb550bdca02d632d22789f0fb9f5c4.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on these findings, we propose LApredict, a simple JIT-DP approach that uses only the “LA” feature with a logistic regression model, and we evaluated it on the extended dataset."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Table 8 shows that LApredict outperforms the other four approaches in most cases. Table 9 shows that, thanks to its minimal structure, LApredict's training and testing times are almost negligible. LApredict can therefore beat deep learning JIT-DP approaches in both effectiveness and efficiency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Implications and Discussion"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This study offers the following important and practical guidelines for future JIT-DP research:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Deep learning does not always help"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The study shows that although deep JIT-DP approaches can make progress in some cases, they depend heavily on the data and are of limited effectiveness across different datasets and scenarios. Deep learning techniques can also be orders of magnitude slower than traditional classifiers. We strongly recommend that researchers and developers evaluate future deep JIT-DP approaches comprehensively."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Simple features can work well too"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The study shows that the simple “LA” feature combined with a simple classifier already achieves excellent results. All future JIT-DP research should consider this simple approach as a baseline."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":3,"normalizeStart":3},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Commit messages help JIT-DP"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The results show that certain keywords in commit messages help deep JIT-DP considerably because they convey the intent behind a particular change. Developers and teams interested in JIT-DP should therefore follow strict rules when writing commit messages, which in turn supports JIT-DP."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":4,"normalizeStart":4},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Training data selection matters"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The study shows that simply adding training data does not necessarily improve the prediction accuracy of traditional or deep JIT-DP approaches. At the same time, manually choosing the most informative training set for a given benchmark or scenario is quite challenging. We therefore recommend that researchers and developers consider fully automated methods for selecting training data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":5,"normalizeStart":5},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Future research should include cross-project validation"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The experiments show that existing traditional and deep JIT-DP approaches perform worse when they are moved to cross-project validation. This should motivate future researchers to propose more robust JIT-DP approaches."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paper link:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http:\/\/lingming.cs.illinois.edu\/publications\/issta2021a.pdf","title":"","type":null},"content":[{"type":"text","text":"http:\/\/lingming.cs.illinois.edu\/publications\/issta2021a.pdf"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
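
The LApredict idea from RQ4 (logistic regression over the single LA feature, evaluated by AUC) can be illustrated with a minimal sketch. This is not the paper's implementation. The synthetic commits, the log scaling and standardization of LA, the learning rate and epoch count, and all function names (`fit_la_logistic`, `predict_proba`, `auc`, `make_synthetic_commits`) are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_la_logistic(la_values, labels, lr=0.5, epochs=300):
    """Fit P(defect) = sigmoid(w * x + b), where x is standardized log1p(LA).

    Assumption: log scaling and standardization are illustrative choices,
    not preprocessing steps taken from the paper.
    """
    logs = [math.log1p(v) for v in la_values]
    mean = sum(logs) / len(logs)
    std = math.sqrt(sum((v - mean) ** 2 for v in logs) / len(logs)) or 1.0
    xs = [(v - mean) / std for v in logs]
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):  # plain batch gradient descent on the log loss
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, mean, std


def predict_proba(model, la):
    """Predicted defect probability for a commit that adds `la` lines."""
    w, b, mean, std = model
    return sigmoid(w * (math.log1p(la) - mean) / std + b)


def auc(scores, labels):
    """AUC: chance that a random defective commit outscores a random clean one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic_commits(n, seed):
    """Invented data: defective commits add more lines on average."""
    rng = random.Random(seed)
    la_values, labels = [], []
    for _ in range(n):
        defective = 1 if rng.random() < 0.3 else 0
        la_values.append(int(math.exp(rng.gauss(4.0 if defective else 2.5, 1.0))))
        labels.append(defective)
    return la_values, labels


train_la, train_y = make_synthetic_commits(400, seed=1)
test_la, test_y = make_synthetic_commits(200, seed=2)

model = fit_la_logistic(train_la, train_y)
test_auc = auc([predict_proba(model, la) for la in test_la], test_y)
print(f"weight on LA: {model[0]:.3f}, test AUC: {test_auc:.3f}")
```

Because the model has a single input, its ranking of commits depends only on LA itself, which is why training and prediction cost almost nothing compared with a neural network. On a real project, the LA values and defect labels would come from the commit history rather than from a random generator.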
