big code: Deep Learning On Code with an Unbounded Vocabulary [EasyChair 2018]

Original post

大黄老鼠

2020-05-22 10:16

Paper: Deep Learning On Code with an Unbounded Vocabulary

Author: Milan Cvitkovic

Affiliation: California Institute of Technology (Caltech), Amazon AI

Venue: EasyChair 2018

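Model

Convert the source code into an AST
Add various edges on top of the AST, such as data-flow and control-flow edges
(The focus of this paper) Add edges between variable nodes and subtoken nodes
Train with a GGNN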
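
The first and third steps above can be sketched roughly as follows. This is only an illustration under several assumptions: it uses Python's standard `ast` module rather than the parser used in the paper, the helper names `split_subtokens` and `build_graph` and the edge labels `AST_CHILD` and `SUBTOKEN_OF` are made up for this note, and the data-flow and control-flow edges and the GGNN itself are left out.

```python
import ast
import re


def split_subtokens(name):
    """Split an identifier on underscores and camelCase boundaries."""
    parts = []
    for chunk in name.split("_"):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts]


def build_graph(source):
    """Return (nodes, edges, vocab_ids) for a piece of Python source.

    nodes:     list of (kind, label), kind is "ast" or "vocab"
    edges:     list of (src_id, dst_id, edge_type)
    vocab_ids: dict mapping each subtoken to its shared vocab node id
    """
    nodes, edges, vocab_ids = [], [], {}

    def add_node(kind, label):
        nodes.append((kind, label))
        return len(nodes) - 1

    def visit(node):
        node_id = add_node("ast", type(node).__name__)
        if isinstance(node, ast.Name):
            # Hypothetical edge type: link the variable node to one shared
            # node per subtoken, creating that node on first use.
            for sub in split_subtokens(node.id):
                if sub not in vocab_ids:
                    vocab_ids[sub] = add_node("vocab", sub)
                edges.append((node_id, vocab_ids[sub], "SUBTOKEN_OF"))
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.expr_context):
                continue  # skip Load/Store markers to keep the graph small
            edges.append((node_id, visit(child), "AST_CHILD"))
        return node_id

    visit(ast.parse(source))
    return nodes, edges, vocab_ids


nodes, edges, vocab_ids = build_graph("max_item_count = item_count + 1")
subtoken_edges = [e for e in edges if e[2] == "SUBTOKEN_OF"]
```

Here `max_item_count` and `item_count` both connect to the same `item` and `count` vocab nodes, so a name that never appeared in training is still linked to subtokens the model may have seen before.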

Results

FILL-IN-THE-BLANK

$\begin{array}{cl|ccc} \hline & & \text{Fixed Vocab} & \text{CharCNN Only} & \text{Graph Vocab (ours)}\\ \hline \text{Unseen files from seen repos} & \text{AST} & 0.58 & 0.60 & 0.89\\ & \text{Augmented AST} & 0.80 & 0.90 & \mathbf{0.97}\\ \hline \text{Entirely unseen repos} & \text{AST} & 0.36 & 0.48 & 0.80\\ & \text{Augmented AST} & 0.59 & 0.84 & \mathbf{0.92}\\ \hline \end{array}$

Variable Naming

$\begin{array}{cl|ccc} \hline & & \text{Fixed Vocab} & \text{CharCNN Only} & \text{Graph Vocab (ours)}\\ \hline \text{Unseen files from seen repos} & \text{AST} & 0.23\,(7.22) & 0.22\,(8.67) & 0.49\,(3.87)\\ & \text{Augmented AST} & 0.19\,(7.64) & 0.20\,(7.46) & \mathbf{0.53\,(3.68)}\\ \hline \text{Entirely unseen repos} & \text{AST} & 0.05\,(8.66) & 0.06\,(8.82) & 0.38\,(4.81)\\ & \text{Augmented AST} & 0.04\,(8.34) & 0.06\,(8.16) & \mathbf{0.41\,(4.28)}\\ \hline \end{array}$

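Summary

Yet another paper that goes all-in on adding edges.

The GitHub link given in the paper leads to a page that does not exist, so the code cannot be inspected.

Adding edges between subtokens and AST nodes is a nice idea.

It would have been better if the paper had used Microsoft's java-small dataset, which would allow a proper comparison. Unfortunately, the authors crawled their own data.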

