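Kuaishou, Southern University of Science and Technology, and UIUC Propose a New Deep Just-in-Time Defect Prediction Model | ISSTA 2021 Paper Walkthrough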

Original

2021-07-26 10:44

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Project Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Code review is a critical process in software quality assurance. It surfaces problems in a project early, reduces project risk, and helps keep the project on track."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/fa\/fad93a8f2e0ea03ebc0dcc04fbf462d1.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A typical code review workflow consists of the steps shown in the figure below: code upload, checks by an early-warning system, code inspection by reviewers, system integration testing, and so on."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/c6\/c6ae409ef6c6cf0cfa147a54364da27d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a developer submits a commit, it passes through several automated or manual review stages before being merged into the main branch. If any stage finds a problem, the commit is returned to the developer for revision. The third step in the figure above usually requires experienced developers to review the submitted code by hand, which is both costly and slow."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Researchers have therefore proposed "},{"type":"link","attrs":{"href":"http:\/\/lingming.cs.illinois.edu\/publications\/issta2021a.pdf","title":"xxx","type":null},"content":[{"type":"text","text":"just-in-time defect prediction (JIT-DP)"}]},{"type":"text","text":", which aims to automatically predict how likely a submitted change is to contain defects before manual review takes place, so that reviewer effort can be allocated sensibly and saved."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Existing JIT-DP Approaches"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/d5\/d5a28e53251c11bbbef64758429c9fce.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows the standard procedure for just-in-time defect prediction. Here, just-in-time means that defects are predicted at the granularity of a single commit. Each historical commit is labeled as either defective (Defect) or clean (Clean), and a machine learning model is then trained on a large number of historical commits to predict whether future commits are defective."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/57\/57435ee99b270cae979ca972e7f409d5.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Researchers have proposed a wide variety of models for JIT-DP. The most classic is LR-JIT by Kamei et al., which extracts 14 basic features from each commit, such as NF (number of modified files), LA (lines of code added), and EXP (developer experience). Once these features have been extracted, a logistic regression classifier is used to classify the commit."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Building on LR-JIT, Yang et al. inserted a deep belief network (DBN) between the features and the classifier to extract a higher-dimensional vector representation of the commit features."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/4b\/4b132d5ef820fa7e1b5dcf29b1f06bda.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unlike the two traditional approaches above, which rely on hand-crafted features, recently proposed deep learning approaches mostly use neural networks to extract information from commits automatically, for example CC2Vec and DeepJIT in the figure above."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DeepJIT takes the commit message and the code changes as input, and uses two convolutional neural networks to extract features from the vectorized commit message and code changes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CC2Vec, in contrast, preserves the structural information of the code changes. It first splits a code change into added code and removed code, and uses a hierarchical attention network (HAN) to extract vector features from each. Comparison layers then compare the vector features of the added and removed code and produce an overall feature vector for the code changes. Finally, the CC2Vec and DeepJIT features can be combined to obtain a stronger JIT-DP model."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Research Questions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In prior studies, CC2Vec and DeepJIT outperformed LR-JIT and DBN-JIT. However, those studies used only very small datasets, so they do not adequately demonstrate how well CC2Vec and DeepJIT generalize."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To evaluate the current state of JIT-DP thoroughly, this paper focuses on the following four questions:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Why do DeepJIT and CC2Vec work?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"How do DeepJIT and CC2Vec perform on a larger dataset?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"How well do traditional defect prediction features perform for JIT-DP?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Can a simple approach outperform DeepJIT\/CC2Vec?"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Experiments and Analysis"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ1: Why do DeepJIT and CC2Vec work?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For RQ1, we studied the effectiveness of the three inputs to DeepJIT and CC2Vec, namely CC2VecCode, DeepJITMsg, and DeepJITCode."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/27\/2724970812df48eecdfb7b9618bfd214.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To study the effectiveness of DeepJITMsg, we applied Grad-CAM to visualize how much each word in a commit message contributes to the prediction. In the example above, the darker a word's color, the larger its contribution. We then computed a weighted average of the contributions of words that occur multiple times in the dataset and ranked them. Words such as “task-number”, “fix”, and “bug” rank relatively high among all 10,000 words. From these results we infer that DeepJIT and CC2Vec can extract the intent of a commit from its message to help predict defects."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/71\/71fdf3a17cf5e6af40e69c53c9345b2e.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When studying the effectiveness of DeepJITCode, we found that the DeepJIT implementation does not use the complete code changes as described in the original paper, but an abstracted form of them. In the example in Figure 3 above, the changed source code is replaced by 7 lines of “added _ code removed _ code”."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To compare the two forms of code changes, we gave models trained on the complete code changes the suffix Paper and models trained on the abstracted code changes the suffix Github, and ran replication experiments with both. The results in the table show that DeepJITGithub achieves a higher AUC score than DeepJITPaper, and the same holds for CC2Vec."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/fc\/fc53a8332400d4f7a6c2fe459cc7a8ba.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To verify the effectiveness of CC2VecCode, we ran an ablation study on the three input features of CC2Vec."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first three rows of Table 4 show the results of using a single feature only, and the last three rows show the results of removing a single feature."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From Table 4 we find that DeepJITMsg and DeepJITCode contribute more to the JIT-DP results than CC2VecCode does."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ2: How do DeepJIT and CC2Vec perform on a larger dataset?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/83\/8345ee1acf4c3acd6278c3c093a7d2fe.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For RQ2, we first followed the data processing pipeline of Kim et al. and McIntosh et al. (the same one used by CC2Vec and DeepJIT) to collect a large dataset containing 310,370 real commits and 81,300 defects in total. We then evaluated four JIT-DP approaches on this dataset in within-project (WP) and cross-project (CP) scenarios."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/2e\/2e04de928d798375c921052955b2a718.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Table 5, we observe that CC2Vec fails to maintain its performance advantage over DeepJIT on most projects, such as JDT, Platform, and Gerrit. We therefore conclude that CC2Vec does not significantly outperform DeepJIT on the extended dataset. We also find that although DeepJIT and CC2Vec are better than LR-JIT and DBN-JIT on average, LR-JIT and DBN-JIT achieve higher AUC than DeepJIT and CC2Vec on OpenStack, so DeepJIT and CC2Vec are not guaranteed to outperform traditional approaches across the board."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/a8\/a8e413390d42c2cadaee2bce21466791.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To explore how changes in defect patterns during software evolution affect JIT-DP, we further studied how JIT-DP models perform with training sets of different sizes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows how the AUC of DeepJIT and LR-JIT changes with training set size. Overall, the prediction accuracy of both approaches fluctuates as the training set grows. This result indicates that simply adding more historical data to enlarge the training set does not guarantee better JIT-DP performance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ3: How well do traditional defect prediction features perform for JIT-DP?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The findings from RQ1 and RQ2 establish the following facts:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"DeepJITCode contributes the most to the prediction results."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"In CC2Vec, more input features can sometimes even degrade performance."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Deep learning-based approaches are not guaranteed to outperform traditional approaches on every project studied."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These facts suggest that the effectiveness of the individual features used by traditional JIT-DP approaches has not been studied sufficiently. The figure below therefore shows how each representative traditional feature performs in JIT-DP."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/27\/2717241e3e4b5aa1c0f5a4c4e93124b6.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure shows that a model using only the “LA” feature outperforms the model using all features (“ALL”) on QT, OpenStack, and Go. In the cross-project scenario, the “LA” model also loses less performance than models using other features. This means that the “LA” feature alone is enough to build an accurate and stable JIT-DP model."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RQ4: Can a simple approach outperform DeepJIT\/CC2Vec?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/af\/affb550bdca02d632d22789f0fb9f5c4.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on these findings, we propose a simple JIT-DP approach, LApredict, which uses just the “LA” feature with a logistic regression model, and we evaluated it on the extended dataset."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The results in Table 8 show that LApredict outperforms the other four approaches in most cases. The results in Table 9 show that, thanks to its minimal structure, LApredict's training and testing time is almost negligible. LApredict can therefore beat deep learning JIT-DP approaches in both effectiveness and efficiency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Implications and Discussion"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This study offers the following important and practical guidelines for future JIT-DP research:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Deep learning does not always help"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The study shows that although deep JIT-DP approaches can make progress in some cases, they depend heavily on the data and have limited effectiveness across different datasets and scenarios. Deep learning techniques can also be orders of magnitude slower than traditional classifiers. We strongly recommend that researchers and developers evaluate future deep JIT-DP approaches comprehensively."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Simple features can work very well"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The study shows that the simple “LA” feature combined with a simple classifier achieves excellent results. This simple approach should be considered as a baseline in all future JIT-DP research."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":3,"normalizeStart":3},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Commit messages help JIT-DP"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The results show that certain keywords in commit messages help deep JIT-DP considerably, because they convey the intent of the specific code change. This suggests that developers and teams interested in JIT-DP should follow strict rules when drafting commit messages, which in turn facilitates JIT-DP research."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":4,"normalizeStart":4},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Training data selection matters"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The study shows that simply adding training data does not necessarily improve the prediction accuracy of traditional or deep JIT-DP approaches. At the same time, manually selecting the most informative training set to optimize accuracy across different benchmarks and scenarios is quite challenging. We therefore recommend that researchers and developers consider fully automated methods for selecting training data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":5,"normalizeStart":5},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Future research needs to consider cross-project validation"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The experiments show that existing traditional and deep JIT-DP approaches lose performance when switched to cross-project validation. This result should motivate future researchers to propose more robust JIT-DP approaches."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paper link:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http:\/\/lingming.cs.illinois.edu\/publications\/issta2021a.pdf","title":"","type":null},"content":[{"type":"text","text":"http:\/\/lingming.cs.illinois.edu\/publications\/issta2021a.pdf"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
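
The LApredict idea from RQ4 above (a logistic regression over the single LA feature, scored by AUC) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy commit data, the log1p scaling of LA, the gradient-descent settings, and every function name below are assumptions made for illustration, and only the standard library is used.

```python
import math

# Hypothetical toy data: LA (lines of code added) per commit, label 1 = defective.
TRAIN_LA = [1, 3, 5, 8, 12, 40, 75, 120, 200, 350]
TRAIN_Y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
TEST_LA = [2, 6, 30, 90, 150, 45]
TEST_Y = [0, 0, 1, 1, 1, 0]


def scale(la):
    """Compress the heavy-tailed LA count (an assumed preprocessing step)."""
    return math.log1p(la)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(defect) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def auc(scores, labels):
    """Probability that a defective commit is scored above a clean one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Train on the toy history, then score unseen commits.
w, b = fit_logistic([scale(la) for la in TRAIN_LA], TRAIN_Y)
test_scores = [sigmoid(w * scale(la) + b) for la in TEST_LA]
test_auc = auc(test_scores, TEST_Y)
print(f"w={w:.3f} b={b:.3f} test AUC={test_auc:.3f}")
```

Because a one-feature logistic model with a positive weight is monotone in LA, its AUC depends only on how LA ranks the commits. Training changes the predicted probabilities, not the ranking, which is one way to see why such a model is cheap to fit and evaluate.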
