Big Code: Learning Python Code Suggestion with a Sparse Pointer Network [ICLR 2017]


Paper: Learning Python Code Suggestion with a Sparse Pointer Network

Authors: Avishkar Bhoopchand et al.

Affiliation: University College London

Venue: ICLR 2017

Model


Neural Language Model

The joint probability of a token sequence $S = a_1, \ldots, a_N$ is

$$P_{\theta}(S) = P_{\theta}(a_1) \cdot \prod_{t=2}^{N} P_{\theta}(a_t \mid a_{t-1}, \ldots, a_1)$$

Given a sequence of Python code tokens, the task is to predict the next $M$ tokens:

$$\underset{a_{t+1}, \ldots, a_{t+M}}{\arg\max}\; P_{\theta}(a_1, \ldots, a_t, a_{t+1}, \ldots, a_{t+M})$$
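As a rough illustration (the released code reportedly uses beam search; the `model.next_token_distribution` interface below is purely hypothetical), a greedy decoder would simply pick the most probable token $M$ times:

```python
def suggest(model, tokens, M):
    """Greedy approximation of the argmax above: repeatedly append the most
    probable next token. `model.next_token_distribution` is a hypothetical
    interface returning P(a_{t+1} | a_1, ..., a_t) as a list of probabilities."""
    out = list(tokens)
    for _ in range(M):
        probs = model.next_token_distribution(out)
        out.append(max(range(len(probs)), key=probs.__getitem__))
    return out[len(tokens):]
```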

The conditional probabilities are estimated with an LSTM:

$$P_{\theta}(a_t = \tau \mid a_{t-1}, \ldots, a_1) = \frac{\exp(\boldsymbol{v}_{\tau}^T \boldsymbol{h}_t + b_{\tau})}{\sum_{\tau'} \exp(\boldsymbol{v}_{\tau'}^T \boldsymbol{h}_t + b_{\tau'})}$$

where $\boldsymbol{v}_{\tau}$ is the parameter vector associated with token $\tau$.
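A minimal numpy sketch of this output softmax (variable names and shapes are my own; the rows of `V` play the role of the $\boldsymbol{v}_{\tau}$):

```python
import numpy as np

def token_distribution(h_t, V, b):
    """Softmax over the vocabulary given the LSTM hidden state h_t.
    h_t: hidden state, shape (k,)
    V:   one parameter vector v_tau per vocabulary token, shape (|V|, k)
    b:   bias vector, shape (|V|,)"""
    logits = V @ h_t + b      # v_tau^T h_t + b_tau for every tau
    logits -= logits.max()    # subtract the max for numerical stability
    e = np.exp(logits)
    return e / e.sum()        # P(a_t = tau | a_{t-1}, ..., a_1)
```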

Attention

$$\begin{aligned}
\boldsymbol{M}_t &= [\boldsymbol{m}_1 \ldots \boldsymbol{m}_K] &&\in \mathbb{R}^{k \times K}\\
\boldsymbol{G}_t &= \tanh\!\left(\boldsymbol{W}^M \boldsymbol{M}_t + \boldsymbol{1}_K^T (\boldsymbol{W}^h \boldsymbol{h}_t)\right) &&\in \mathbb{R}^{k \times K}\\
\boldsymbol{\alpha}_t &= \mathrm{softmax}(\boldsymbol{w}^T \boldsymbol{G}_t) &&\in \mathbb{R}^{1 \times K}\\
\boldsymbol{c}_t &= \boldsymbol{M}_t \boldsymbol{\alpha}_t^T &&\in \mathbb{R}^{k}
\end{aligned}$$

where $\boldsymbol{M}_t$ is a memory of length $K$.

$$\begin{aligned}
\boldsymbol{n}_t &= \tanh\!\left(\boldsymbol{W}^A \begin{bmatrix}\boldsymbol{h}_t\\ \boldsymbol{c}_t\end{bmatrix}\right) &&\in \mathbb{R}^{k}\\
\boldsymbol{y}_t &= \mathrm{softmax}(\boldsymbol{W}^V \boldsymbol{n}_t + \boldsymbol{b}^V) &&\in \mathbb{R}^{|V|}
\end{aligned}$$

where $\boldsymbol{y}_t$ is the resulting probability distribution over the next token.
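Putting both blocks of equations together, a minimal numpy sketch of one attention step could look as follows (parameter names are my own; shapes follow the definitions above):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(M_t, h_t, W_M, W_h, w, W_A, W_V, b_V):
    """M_t: memory of K past representations, shape (k, K); h_t: current hidden state, shape (k,)."""
    G_t = np.tanh(W_M @ M_t + (W_h @ h_t)[:, None])  # (k, K): hidden state broadcast over all slots
    alpha_t = softmax(w @ G_t)                        # (K,) attention weights
    c_t = M_t @ alpha_t                               # (k,) context vector
    n_t = np.tanh(W_A @ np.concatenate([h_t, c_t]))   # (k,) combined representation
    y_t = softmax(W_V @ n_t + b_V)                    # (|V|,) distribution over the vocabulary
    return y_t, alpha_t, c_t
```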

Sparse Pointer Network

$$\begin{aligned}
\boldsymbol{s}_t[i] &= \begin{cases}\boldsymbol{\alpha}_t[j] & \text{if } \boldsymbol{m}_t[j] = i\\ -C & \text{otherwise}\end{cases}\\
\boldsymbol{i}_t &= \mathrm{softmax}(\boldsymbol{s}_t) \in \mathbb{R}^{|V|}
\end{aligned}$$

This yields a pseudo-sparse distribution over the global vocabulary.

where $-C$ is a large negative constant (so tokens not present in the memory receive essentially zero probability after the softmax), and $\boldsymbol{m}_t = [\mathrm{id}_1, \ldots, \mathrm{id}_K] \in \mathbb{N}^K$ holds the vocabulary ids of the identifiers currently stored in the memory.
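A small numpy sketch of this scatter step (a simplification of my own; among other things it keeps only the last weight if the same id occupies several memory slots):

```python
import numpy as np

def pointer_distribution(alpha_t, m_t, vocab_size, C=1000.0):
    """alpha_t: attention weights over the K memory slots, shape (K,)
    m_t:     vocabulary ids of the identifiers in the memory, integer array of shape (K,)"""
    s_t = np.full(vocab_size, -C)   # -C everywhere: near-zero probability after the softmax
    s_t[m_t] = alpha_t              # s_t[i] = alpha_t[j] wherever m_t[j] = i
    e = np.exp(s_t - s_t.max())
    return e / e.sum()              # i_t, the pseudo-sparse pointer distribution
```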

The language-model distribution over the vocabulary is computed directly from the hidden state:

$$\boldsymbol{y}_t = \mathrm{softmax}(\boldsymbol{W}^V \boldsymbol{h}_t + \boldsymbol{b}^V) \quad \in \mathbb{R}^{|V|}$$

The two distributions are then combined:

$$\begin{aligned}
\boldsymbol{h}_t^{\lambda} &= \begin{bmatrix}\boldsymbol{h}_t\\ \boldsymbol{x}_t\\ \boldsymbol{c}_t\end{bmatrix} &&\in \mathbb{R}^{3k}\\
\boldsymbol{\lambda}_t &= \mathrm{softmax}\!\left(\boldsymbol{W}^{\lambda} \boldsymbol{h}_t^{\lambda} + \boldsymbol{b}^{\lambda}\right) &&\in \mathbb{R}^{2}\\
\boldsymbol{y}_t^{\ast} &= [\boldsymbol{y}_t\;\, \boldsymbol{i}_t]\, \boldsymbol{\lambda}_t &&\in \mathbb{R}^{|V|}
\end{aligned}$$

where $\boldsymbol{x}_t$ is the embedding of the input token, $\boldsymbol{c}_t$ is the context vector computed by the attention, and $\boldsymbol{W}^{\lambda} \in \mathbb{R}^{2 \times 3k}$. The weights $\boldsymbol{\lambda}_t$ interpolate between the language-model distribution $\boldsymbol{y}_t$ and the pointer distribution $\boldsymbol{i}_t$.
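A numpy sketch of this mixing step (again with my own variable names):

```python
import numpy as np

def mix_distributions(h_t, x_t, c_t, y_t, i_t, W_lam, b_lam):
    """h_t, x_t, c_t: hidden state, input embedding and attention context, each shape (k,)
    y_t: language-model distribution, i_t: pointer distribution, each shape (|V|,)
    W_lam: shape (2, 3k); b_lam: shape (2,)"""
    h_lam = np.concatenate([h_t, x_t, c_t])   # (3k,)
    logits = W_lam @ h_lam + b_lam            # (2,)
    e = np.exp(logits - logits.max())
    lam = e / e.sum()                         # mixture weights lambda_t
    return lam[0] * y_t + lam[1] * i_t        # y*_t, shape (|V|,)
```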

Training

Data Preprocessing

Identifiers are replaced by an identifier category plus a running number, numeric literals are replaced by $NUM$, and out-of-vocabulary tokens by $OOV$.
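A very rough sketch of such a normalization with Python's `tokenize` module (my own simplification: it uses a single identifier category instead of distinguishing classes, functions and arguments, and it ignores $OOV$ handling):

```python
import io
import keyword
import tokenize

def normalize(code):
    """Replace numeric literals with $NUM$ and rename identifiers to identifier1, identifier2, ..."""
    ids, out = {}, []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NUMBER:
            out.append("$NUM$")
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            ids.setdefault(tok.string, f"identifier{len(ids) + 1}")
            out.append(ids[tok.string])
        elif tok.type in (tokenize.NEWLINE, tokenize.NL):
            out.append("\\n")
        elif tok.string:
            out.append(tok.string)
    return out

# normalize("x = 1\nprint(x + 2)")
# -> ['identifier1', '=', '$NUM$', '\\n', 'identifier2', '(', 'identifier1', '+', '$NUM$', ')', '\\n']
```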

Results

| Model | Train PP | Dev PP | Test PP | Acc All [%] | Acc IDs [%] | Acc Other [%] | Acc@5 All [%] | Acc@5 IDs [%] | Acc@5 Other [%] |
|---|---|---|---|---|---|---|---|---|---|
| 3-gram | 12.90 | 24.19 | 26.90 | 13.19 | – | – | 50.81 | – | – |
| 4-gram | 7.60 | 21.07 | 23.85 | 13.68 | – | – | 51.26 | – | – |
| 5-gram | 4.52 | 19.33 | 21.22 | 13.90 | – | – | 51.49 | – | – |
| 6-gram | 3.37 | 18.73 | 20.17 | 14.51 | – | – | 51.76 | – | – |
| LSTM | 9.29 | 13.08 | 14.01 | 57.91 | 2.1 | 62.8 | 76.30 | 4.5 | 82.6 |
| LSTM w/ Attn 20 | 7.30 | 11.07 | 11.74 | 61.30 | 21.4 | 64.8 | 79.32 | 29.9 | 83.7 |
| LSTM w/ Attn 50 | 7.09 | 9.83 | 10.05 | **63.21** | **30.2** | **65.3** | 81.69 | 41.3 | 84.1 |
| This paper | 6.41 | **9.40** | **9.18** | 62.97 | 27.3 | 64.9 | **82.62** | **43.6** | **84.5** |

Problems Encountered

When running the provided scraper code to crawl the data, the following error occurred:

```
Traceback (most recent call last):
  File "github-scraper/scraper.py", line 143, in <module>
    main(sys.argv[1:])
  File "github-scraper/scraper.py", line 130, in main
    repos = create_repos(dbFile)
  File "github-scraper/scraper.py", line 59, in create_repos
    repos = pickle.load(infile)
AttributeError: Can't get attribute 'UTC' on <module 'github3.utils' from
'/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/github3/utils.py'>
```
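This looks like a pickle compatibility issue: the pickled repository database references a `UTC` class that newer versions of `github3.py` no longer define in `github3.utils`. One possible workaround (assuming the pickled objects only need a stand-in `tzinfo` class) is to re-register such a class before calling `pickle.load`:

```python
import datetime
import github3.utils

class UTC(datetime.tzinfo):
    """Stand-in for the UTC class that was removed from github3.utils (my assumption)."""
    def utcoffset(self, dt): return datetime.timedelta(0)
    def tzname(self, dt): return "UTC"
    def dst(self, dt): return datetime.timedelta(0)

github3.utils.UTC = UTC   # make pickle's lookup of github3.utils.UTC succeed
```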

When running the code from GitHub, some packages ended up in the wrong locations, and I am not sure why.

Summary

The model in this paper is easy to understand: LSTM + attention + pointer; as usual, the real difficulty lies in the data processing. The authors provide neither the raw data nor the processed pkl files, which is inconvenient. Looking through the code, it also contains beam search. Overall, for big-code work, data processing seems to be the hard and tedious part.

$$\begin{aligned}
\boldsymbol{s}_t[i] &= \begin{cases}\boldsymbol{\alpha}_t[j] & \text{if } \boldsymbol{m}_t[j] = i\\ -C & \text{otherwise}\end{cases}\\
\boldsymbol{i}_t &= \mathrm{softmax}(\boldsymbol{s}_t) \in \mathbb{R}^{|V|}
\end{aligned}$$
I did not quite understand what $\boldsymbol{m}_t[j] = i$ means here. (Presumably $\boldsymbol{m}_t[j]$ is the vocabulary id stored in memory slot $j$, so the attention weight $\boldsymbol{\alpha}_t[j]$ is scattered to position $i$ of the vocabulary-sized score vector $\boldsymbol{s}_t$.)

