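Kuaishou, Southern University of Science and Technology, and UIUC Propose a New Deep Just-In-Time Defect Prediction Model | ISSTA 2021 Paper Explained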

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Code review is a critical process in software quality assurance. It surfaces problems in a project early, reduces project risk, and helps keep the project on track."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/fa\/fad93a8f2e0ea03ebc0dcc04fbf462d1.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A typical code review workflow consists of the steps shown in the figure below: uploading code, checks by an early-warning system, reviewer inspection of the code, system integration testing, and so on."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/c6\/c6ae409ef6c6cf0cfa147a54364da27d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a developer submits a commit, it passes through several automated or manual review stages before being merged into the main branch. If any stage finds a problem, the commit is returned to the developer for revision. The third step in the figure above usually requires experienced developers to review the submitted code manually, which is both costly and slow."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Researchers have therefore proposed "},{"type":"link","attrs":{"href":"http:\/\/lingming.cs.illinois.edu\/publications\/issta2021a.pdf","title":"xxx","type":null},"content":[{"type":"text","text":"just-in-time defect prediction (JIT-DP)"}]},{"type":"text","text":", which aims to automatically predict how likely submitted code is to contain defects before manual review, so that reviewer effort can be allocated sensibly and saved."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Existing JIT-DP Approaches"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/d5\/d5a28e53251c11bbbef64758429c9fce.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows the standard approach to just-in-time defect prediction. Here, just-in-time means that the unit of prediction is a single commit. Each historical commit is labeled as either defective (Defect) or clean (Clean), and a machine learning model is then trained on a large number of historical commits to predict whether future commits are defective."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/57\/57435ee99b270cae979ca972e7f409d5.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Researchers have previously proposed a wide variety of models for JIT-DP. The most classic is LR-JIT, proposed by Kamei et al., which extracts 14 basic features from each commit, such as NF (number of modified files), LA (lines of code added), and EXP (developer experience). Once these features have been extracted, a logistic regression classifier is used to classify the commit."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Building on LR-JIT, Yang et al. added a deep belief network (DBN) between the features and the classifier to extract a higher-level vector representation of the commit features."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/4b\/4b132d5ef820fa7e1b5dcf29b1f06bda.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unlike these two traditional approaches, which rely on hand-crafted features, recently proposed deep learning approaches mostly use neural networks to extract information from commits automatically, for example CC2Vec and DeepJIT in the figure above."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DeepJIT takes the commit message and code changes of a commit as input, and uses two convolutional neural networks to extract features from the vectorized commit message and code changes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CC2Vec, in contrast, preserves the structural information of the code changes. It first splits a code change into added code and removed code, and uses a hierarchical attention network (HAN) to extract a feature vector for each. Comparison layers then compare the feature vectors of the added and removed code and produce an overall feature vector for the code changes. Finally, the CC2Vec and DeepJIT features can be combined to obtain a stronger JIT-DP model."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Research Questions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In prior studies, CC2Vec and DeepJIT outperformed LR-JIT and DBN-JIT. However, those studies used only small datasets, which is not enough to demonstrate how well CC2Vec and DeepJIT generalize."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To evaluate the current state of JIT-DP thoroughly, this paper focuses on the following four research questions:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Why do DeepJIT and CC2Vec work?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"How do DeepJIT and CC2Vec perform on a larger dataset?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"How do traditional defect prediction features perform for JIT-DP?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Can a simple approach outperform DeepJIT\/CC2Vec?"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Experiments and Analysis"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ1: Why do DeepJIT and CC2Vec work?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For RQ1, we studied the effectiveness of the three inputs used by DeepJIT and CC2Vec, namely CC2VecCode, DeepJITMsg, and DeepJITCode."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/27\/2724970812df48eecdfb7b9618bfd214.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To study the effectiveness of DeepJITMsg, we applied Grad-CAM to visualize how much each word in a commit message contributes to the prediction. In the example in the figure above, the darker a word's color, the larger its contribution. We then computed a weighted average of the contribution of every word that appears multiple times in the dataset and ranked the words. Words such as “task-number”, “fix”, and “bug” rank relatively high among all 10,000 words. Based on these results, we infer that DeepJIT and CC2Vec can extract the intent of a commit from its commit message to help with defect prediction."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/71\/71fdf3a17cf5e6af40e69c53c9345b2e.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When studying the effectiveness of DeepJITCode, we found that the DeepJIT implementation does not use the complete code changes as described in the original paper, but an abstracted form of them. In the example in Figure 3 above, the changed source code is replaced by 7 lines of “added_code removed_code”."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To assess the effectiveness of these two forms of code changes, we added the suffix Paper to models trained on the complete code changes and the suffix Github to models trained on the abstracted code changes, and ran replication experiments with both. The results in the table show that DeepJITGithub achieves a higher AUC score than DeepJITPaper, and the same holds for CC2Vec."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/fc\/fc53a8332400d4f7a6c2fe459cc7a8ba.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To assess the effectiveness of CC2VecCode, we ran an ablation study on the three input features of CC2Vec."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first three rows of Table 4 show the results when only a single feature is used, and the last three rows show the results when a single feature is removed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From Table 4, we find that DeepJITMsg and DeepJITCode contribute more to the JIT-DP results than CC2VecCode."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ2: How do DeepJIT and CC2Vec perform on a larger dataset?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/83\/8345ee1acf4c3acd6278c3c093a7d2fe.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For RQ2, we first followed the data processing pipeline of Kim et al. and McIntosh et al. (the same one used by CC2Vec and DeepJIT) to collect a large dataset containing a total of 310,370 real commits and 81,300 defects. We then evaluated four JIT-DP approaches on this dataset in both within-project (WP) and cross-project (CP) scenarios."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/2e\/2e04de928d798375c921052955b2a718.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the results in Table 5, we observe that CC2Vec cannot maintain its advantage over DeepJIT on most projects, such as JDT, Platform, and Gerrit. We therefore conclude that CC2Vec does not significantly outperform DeepJIT on the extended dataset. We also find that, although DeepJIT and CC2Vec are better than LR-JIT and DBN-JIT on average, LR-JIT and DBN-JIT achieve better AUC than DeepJIT and CC2Vec on OpenStack. DeepJIT and CC2Vec are therefore not guaranteed to outperform traditional approaches across the board."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/a8\/a8e413390d42c2cadaee2bce21466791.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To explore how changes in defect patterns during software evolution affect JIT-DP, we further studied how the performance of JIT-DP models varies with training sets of different sizes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows how the AUC of DeepJIT and LR-JIT changes with training set size. Overall, as the training set grows, the prediction accuracy of both approaches fluctuates. This result shows that simply adding more historical data to enlarge the training set does not guarantee better JIT-DP performance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ3: How do traditional defect prediction features perform for JIT-DP?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The findings from RQ1 and RQ2 reveal the following facts:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"DeepJITCode contributes the most to the prediction results."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"In CC2Vec, more input features can sometimes even degrade performance."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Deep-learning-based approaches are not guaranteed to outperform traditional approaches on every project studied."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These facts suggest that the effectiveness of the individual features used by traditional JIT-DP approaches has not been studied sufficiently. We therefore show, in the figure below, the performance of each representative traditional feature for JIT-DP."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/27\/2717241e3e4b5aa1c0f5a4c4e93124b6.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure shows that a model using only the “LA” feature outperforms the model using all features (“ALL”) on QT, OpenStack, and Go. Moreover, in the cross-project scenario, the performance of the “LA” model drops less than that of models using other features. This means that an accurate and stable JIT-DP model can be built from the “LA” feature alone."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ4: Can a simple approach outperform DeepJIT\/CC2Vec?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/af\/affb550bdca02d632d22789f0fb9f5c4.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on these findings, we propose LApredict, a simple JIT-DP approach that uses only the “LA” feature with a logistic regression model, and we evaluated it on the extended dataset."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The results in Table 8 show that LApredict outperforms the other four approaches in most cases. The results in Table 9 show that, thanks to its minimal structure, the training and testing time of LApredict is almost negligible. LApredict can therefore outperform deep-learning-based JIT-DP approaches in both effectiveness and efficiency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Implications and Discussion"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This study offers the following important and practical guidelines for future JIT-DP research:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Deep learning does not always help"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our study shows that, although deep JIT-DP approaches can make progress in some cases, they depend heavily on the data and have limited effectiveness across different datasets \/ scenarios. In addition, deep learning techniques can be orders of magnitude slower than traditional classifiers. We strongly recommend that researchers \/ developers evaluate future deep JIT-DP approaches comprehensively."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Simple features can also work very well"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our study shows that the simple “LA” feature combined with a simple classifier can achieve excellent results. Such a simple approach should be considered as a baseline in all future JIT-DP research."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":3,"normalizeStart":3},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Commit messages help JIT-DP"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our results show that certain keywords in commit messages help deep JIT-DP considerably, because they convey the intent of the specific code change. This suggests that developers\/teams interested in JIT-DP should maintain strict rules for writing commit messages, to facilitate JIT-DP."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":4,"normalizeStart":4},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"The choice of training data matters"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our study shows that simply adding more training data does not necessarily improve the prediction accuracy of either traditional or deep JIT-DP approaches. On the other hand, manually selecting the most informative training set to optimize prediction accuracy across different benchmarks \/ scenarios is quite challenging. We therefore recommend that researchers \/ developers consider fully automated approaches to training data selection."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":5,"normalizeStart":5},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Future research needs to consider cross-project validation"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The experimental results show that the performance of existing traditional \/ deep JIT-DP approaches drops when moving to cross-project validation, which should motivate future researchers to propose more robust JIT-DP approaches."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paper link:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http:\/\/lingming.cs.illinois.edu\/publications\/issta2021a.pdf","title":"","type":null},"content":[{"type":"text","text":"http:\/\/lingming.cs.illinois.edu\/publications\/issta2021a.pdf"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}