big code: Neural Code Completion [ICLR 2017]

Paper: Neural Code Completion

Authors: Chang Liu, Xin Wang

Affiliation: University of California, Berkeley

Venue: ICLR 2017 (oddly, the OpenReview page lists the submission as rejected)

A demo

(demo figure omitted)

Model equations

Embedding

$$E_i = A N_i + B T_i$$
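
A minimal PyTorch sketch of this embedding step; the vocabulary sizes, embedding width, and function name below are my own placeholders, not values from the paper:

```python
import torch
import torch.nn as nn

# Assumed sizes: non-terminal/terminal vocabularies and embedding width (placeholders).
NUM_NONTERMINALS, NUM_TERMINALS, EMBED_DIM = 100, 50000, 300

# A and B from E_i = A N_i + B T_i, realised as embedding tables
# (an embedding lookup equals multiplying a one-hot vector by a matrix).
embed_N = nn.Embedding(NUM_NONTERMINALS, EMBED_DIM)  # plays the role of A
embed_T = nn.Embedding(NUM_TERMINALS, EMBED_DIM)     # plays the role of B

def embed_pair(n_idx: torch.Tensor, t_idx: torch.Tensor) -> torch.Tensor:
    """E_i = A N_i + B T_i for a batch of (non-terminal, terminal) indices."""
    return embed_N(n_idx) + embed_T(t_idx)
```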

LSTM

$$
\begin{aligned}
\begin{pmatrix} q \\ f \\ o \\ g \end{pmatrix} &= \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} \mathbf{P}_{J,\,2J} \begin{pmatrix} \mathbf{x}_i \\ \mathbf{h}_{i-1} \end{pmatrix} \\
\mathbf{c}_i &= f \odot \mathbf{c}_{i-1} + q \odot g \\
\mathbf{h}_i &= o \odot \tanh\left(\mathbf{c}_i\right)
\end{aligned}
$$
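
Writing the gate equations out step by step, here is a sketch of a single LSTM step in PyTorch that mirrors the formula; treating the input and hidden sizes as both equal to J is my assumption about the dimensions:

```python
import torch
import torch.nn as nn

class PaperLSTMCell(nn.Module):
    """One step of the LSTM above: (q, f, o, g) = (sigma, sigma, sigma, tanh)(P [x_i; h_{i-1}])."""
    def __init__(self, hidden_size: int):
        super().__init__()
        # P maps the concatenated [x_i; h_{i-1}] (size 2J) to the four gates (size 4J).
        self.P = nn.Linear(2 * hidden_size, 4 * hidden_size)

    def forward(self, x, h_prev, c_prev):
        gates = self.P(torch.cat([x, h_prev], dim=-1))
        q, f, o, g = gates.chunk(4, dim=-1)
        q, f, o, g = torch.sigmoid(q), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c_prev + q * g            # c_i = f * c_{i-1} + q * g (elementwise)
        h = o * torch.tanh(c)             # h_i = o * tanh(c_i)
        return h, c
```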

softmax

$$\hat{N}_{k+1} = \mathrm{softmax}\left(W_N h_k + b_N\right)$$

Model diagrams

NT2N

Using the sequence of (Non-terminal, Terminal) pairs to predict the next Non-terminal.

Use N and T to predict N.

(figure omitted)

N and T are embedded, fed through the LSTM, and then a fully-connected layer followed by a softmax gives the classification.
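
Putting the pieces together, a minimal sketch of the NT2N pipeline (embedding, LSTM, FC, softmax); all layer sizes and names are assumptions for illustration, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class NT2N(nn.Module):
    """Predict the next non-terminal from a segment of (N, T) pairs."""
    def __init__(self, num_N: int, num_T: int, embed_dim: int, hidden: int):
        super().__init__()
        self.embed_N = nn.Embedding(num_N, embed_dim)   # A
        self.embed_T = nn.Embedding(num_T, embed_dim)   # B
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_N)              # W_N, b_N

    def forward(self, n_seq, t_seq):
        # n_seq, t_seq: (batch, segment_length) index tensors
        x = self.embed_N(n_seq) + self.embed_T(t_seq)   # E_i = A N_i + B T_i
        h, _ = self.lstm(x)
        logits = self.fc(h[:, -1])                      # W_N h_k + b_N at the last step
        return torch.log_softmax(logits, dim=-1)        # distribution over next N

# Example usage (hypothetical sizes): NT2N(100, 50000, 300, 1000)(n_seq, t_seq)
```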

NTN2T

Using the (Non-terminal, Terminal) pair sequence plus the given next Non-terminal to predict the next Terminal.

Use N and T, together with the next N, to predict T.

(figure omitted)

Compared with NT2N, a linear transformation of the next N is added after the FC layer, and then a softmax gives the classification.
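
One way to realise that modification in the same sketch style: feed the embedding of the given next N through an extra linear layer and add it to the FC output before the softmax over terminals. This is my reading of the description above, not necessarily the authors' exact wiring:

```python
import torch
import torch.nn as nn

class NTN2T(nn.Module):
    """Predict the next terminal from (N, T) pairs plus the next non-terminal."""
    def __init__(self, num_N: int, num_T: int, embed_dim: int, hidden: int):
        super().__init__()
        self.embed_N = nn.Embedding(num_N, embed_dim)
        self.embed_T = nn.Embedding(num_T, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, hidden)
        self.next_N = nn.Linear(embed_dim, hidden)   # extra linear transform of the given next N
        self.out = nn.Linear(hidden, num_T)

    def forward(self, n_seq, t_seq, next_n):
        x = self.embed_N(n_seq) + self.embed_T(t_seq)
        h, _ = self.lstm(x)
        mixed = self.fc(h[:, -1]) + self.next_N(self.embed_N(next_n))
        return torch.log_softmax(self.out(mixed), dim=-1)   # distribution over terminals
```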

Other variants

The other x2y variants I'll skip here.

Training

Code I found: GitHub

In that code I couldn't find where the data preprocessing happens, and since the GitHub repo is a whole system rather than just the model, it's hard to tell exactly what each part does.

The model itself is easy to build in PyTorch, but if the data cannot be prepared properly, that effort is wasted.

I also did not fully understand the data processing used here.

(figure omitted)
My current impression is that programs are padded with EOF tokens and then split into segments matching the LSTM input size.

Quoting the paper:

We divide each program into segments consisting of s consecutive tokens. The
last segment of a program, which may not be full, is padded with ⟨EOF⟩ tokens.
We coalesce multiple epochs together. We organize all training data into b
buckets. In each epoch, we randomly shuffle all programs in the training data
to construct a queue. Whenever a bucket is empty, a program is popped from the
queue and all segments of the program are inserted into the empty bucket
sequentially. When the queue becomes empty, i.e., the current epoch finishes,
all programs are re-shuffled randomly to reconstruct the queue. Each mini-batch
is formed by b segments, i.e., one segment popped from each bucket. When the
training data has been shuffled for e = 8 times, i.e., e epochs are inserted
into the bucket, we stop adding whole programs, and start adding only the first
segment of each program: when a bucket is empty, a program is chosen randomly,
and its first segment is added to the bucket. We terminate the training process
when all buckets are empty at the same time. That is, all programs from the
first 8 epochs have been trained.
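
As a rough reconstruction of the segmentation and bucket batching described above (the ⟨EOF⟩ token id, segment length s, and bucket count b are placeholders; the multi-epoch coalescing and the first-segment refill are omitted for brevity):

```python
import random

EOF = 0          # placeholder id for the <EOF> padding token (assumption)
s, b = 50, 32    # segment length and number of buckets (placeholder values)

def segments(program, s, eof=EOF):
    """Split one program (a list of token ids) into length-s segments,
    padding the last, possibly short, segment with <EOF>."""
    out = []
    for i in range(0, len(program), s):
        seg = program[i:i + s]
        out.append(seg + [eof] * (s - len(seg)))
    return out

def one_epoch_batches(programs, s, b):
    """One epoch of the bucket scheme: shuffle programs into a queue, keep b
    buckets filled with the segments of whole programs, and pop one segment
    per non-empty bucket to form each mini-batch."""
    queue = random.sample(programs, len(programs))   # shuffled copy of the programs
    buckets = [[] for _ in range(b)]
    while queue or any(buckets):
        for bucket in buckets:
            if not bucket and queue:
                bucket.extend(segments(queue.pop(), s))
        batch = [bucket.pop(0) for bucket in buckets if bucket]
        if batch:
            yield batch
```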

Paper reviews

Papers mentioned

  • Toward Deep Learning Software Repositories
  • Code Completion with Statistical Language Models
  • Structured Generative Models of Natural Source Code
  • Probabilistic Model for Code with Decision Trees
  • Grammar as a Foreign Language

Questions raised

  • Why serialize the AST into (N, T) pairs?
    Because prior work does the same, which makes comparison easier.
  • How is the completed code guaranteed to be syntactically valid?
    It is not guaranteed, but the accuracy is high (96%), which is good enough.
