Models
Statistical language models (n-gram and RNN)
- Code Completion with Statistical Language Models [ACM SIGPLAN Notices 2014]
RNN
- Toward Deep Learning Software Repositories [MSR 2015]
Decision trees
- Probabilistic Model for Code with Decision Trees [OOPSLA 2016]
LSTM
- Neural Code Completion [ICLR 2017]
LSTM+attention+pointer
- Code Completion with Neural Attention and Pointer Networks [IJCAI 2018]
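As a rough illustration of the simplest family listed above, the n-gram statistical language model, here is a minimal sketch of next-token completion. It is not the method of any of the cited papers: the function names `train_ngram` and `complete`, the `<s>` padding token, and the toy `corpus` are all assumptions made for this example, and it works on plain token lists with no smoothing.

```python
from collections import Counter, defaultdict

PAD = "<s>"  # hypothetical start-of-sequence padding token


def train_ngram(token_sequences, n=3):
    """Count which token follows each (n-1)-token context."""
    counts = defaultdict(Counter)
    for tokens in token_sequences:
        padded = [PAD] * (n - 1) + list(tokens)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context][padded[i]] += 1
    return counts


def complete(counts, prefix, n=3, k=3):
    """Return up to k most frequent next tokens after the given prefix."""
    padded = [PAD] * (n - 1) + list(prefix)
    context = tuple(padded[len(padded) - (n - 1):])
    # An unseen context yields no suggestions (no smoothing or back-off here).
    return [tok for tok, _ in counts.get(context, Counter()).most_common(k)]


# Toy corpus of tokenized code lines (illustrative only).
corpus = [
    ["for", "i", "in", "range", "(", "n", ")"],
    ["for", "j", "in", "range", "(", "m", ")"],
    ["for", "i", "in", "items"],
]
model = train_ngram(corpus, n=3)
print(complete(model, ["for"], k=2))
print(complete(model, ["for", "j", "in"], k=1))
```

The neural models in the list replace the fixed-length context and raw counts with a learned hidden state, which is what lets them use longer context than the last n-1 tokens.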
Results on datasets
js150
Paper | Type accuracy | Value accuracy |
---|---|---|
PHOG: Probabilistic Model for Code | 81.5% | 74.1% |
Probabilistic Model for Code with Decision Trees | 83.9% | 82.9% |
Neural Code Completion | 84.8% | 76.6% |
Code Completion with Neural Attention and Pointer Networks | 88.6% | 81.0% |
Note:
Neural Code Completion reports many sets of results; the figures used here are the ones that Code Completion with Neural Attention and Pointer Networks cites for its comparison.
Other prediction tasks
- Learning Programs from Noisy Data
py150
Paper | Type accuracy | Value accuracy |
---|---|---|
Probabilistic Model for Code with Decision Trees | 76.3% | 69.2% |
Code Completion with Neural Attention and Pointer Networks | 80.6% | 70.1% |
Self-crawled GitHub data (datasets not publicly released)
- Learning Python Code Suggestion with a Sparse Pointer Network
Identifier prediction (Python)
Top-K | Accuracy |
---|---|
TOP1 | 63.21% |
TOP5 | 82.62% |
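For readers unfamiliar with the metric in the table above, the following is a minimal sketch of how Top-K accuracy is commonly computed: a prediction counts as correct if the true token appears anywhere in the model's first K ranked suggestions. The function name `top_k_accuracy` and the sample data are assumptions for illustration; the paper's exact evaluation script may differ.

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of positions whose true token is among the first k suggestions.

    ranked_predictions: list of suggestion lists, best suggestion first.
    targets: list of ground-truth tokens, one per position.
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("need exactly one ranked list per target")
    if not targets:
        return 0.0
    hits = sum(
        1
        for ranked, target in zip(ranked_predictions, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)


# Made-up suggestions for four completion positions (illustrative only).
ranked = [
    ["self", "cls", "x"],
    ["i", "j", "k"],
    ["len", "max", "min"],
    ["foo", "bar", "baz"],
]
truth = ["self", "j", "sum", "baz"]

print(f"Top-1: {top_k_accuracy(ranked, truth, 1):.2%}")
print(f"Top-3: {top_k_accuracy(ranked, truth, 3):.2%}")
```

Top-K accuracy can only stay the same or rise as K grows, which is why the TOP5 figure is higher than the TOP1 figure.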
- Toward Deep Learning Software Repositories
Token prediction (Java), without using the AST
- Code Completion with Statistical Language Models
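Summary
This survey only covers work up to 2018; within that window, Code Completion with Neural Attention and Pointer Networks is the SOTA.
Moving from traditional machine learning methods to neural network models, accuracy on py150 and js150 has improved gradually.
Traditional machine learning methods show no sign of a breakthrough for now.
On the RNN side, nearly every available technique (LSTM, attention, pointer networks) has already been tried.
How would the Transformer from NMT perform on this task?
How would GNNs perform on this task?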