AI-powered coding tool "CodeT5" is here: a machine learning model that can understand and generate code in real time

[Image: https://static001.geekbang.org/infoq/95/957a3531cf8c636c227a65760a4aba7b.webp]
Image source: https://blog.einstein.ai/codet5/

AI-powered coding tools, which use machine learning algorithms to generate code from input data, are attracting growing attention. In theory, these systems can cut the time spent writing code, lower compute and operations costs, and keep errors to a minimum.

However, current code pre-training systems face many challenges. These methods rely heavily on either a BERT-style encoder-only model or a GPT-style decoder-only model. Neither is the best choice for both kinds of task: an encoder-only model is a poor fit for generation, and a decoder-only model is a poor fit for understanding. For example, CodeBERT needs an additional decoder when applied to tasks such as code summarization. Beyond that, most existing methods simply apply conventional NLP pre-training techniques to source code, treating it as a sequence of tokens like natural language (NL). This largely ignores the rich structural information in programming languages (PL), which is essential to fully understanding code semantics.

The Salesforce team has created and open-sourced CodeT5, a new identifier-aware, unified pre-trained encoder-decoder model. So far, they have demonstrated state-of-the-art results on multiple code-related downstream tasks, covering understanding and generation in several directions: PL to NL, NL to PL, and from one programming language to another.

CodeT5 is built on an architecture similar to Google's T5 (Text-to-Text Transfer Transformer) framework, but with better code understanding. T5 proposes a unified model for natural language processing tasks by recasting every task as text-to-text, where both the input and the output are text strings. This allows the same model to be applied to any task.

[Image: https://static001.geekbang.org/infoq/2f/2f32758f8b02f121b1a9ab43f8410ab9.webp]
Image source: https://blog.einstein.ai/codet5/

The CodeT5 research team had more than 8.35 million examples with which to train the model, including user-written comments from open source GitHub repositories. Training the largest and most capable version of CodeT5, which has 220 million parameters, took 12 days on a cluster of 16 Nvidia A100 GPUs with 40GB of memory each.

CodeT5 achieves state-of-the-art (SOTA) performance on 14 subtasks of the code intelligence benchmark CodeXGLUE [3], as shown in the table below.

[Image: https://static001.geekbang.org/infoq/23/23d11b5bc0121aec2d623043b1ac4e04.webp]
Image source: https://blog.einstein.ai/codet5/

As for applications, the Salesforce team plans to use CodeT5 to build an AI-powered coding assistant for Apex developers. Below is an example of a CodeT5-powered coding assistant with three code intelligence capabilities:

- Text-to-code generation: it generates code from a natural language description
- Code autocompletion: given a target function name, it completes the full code of the function
- Code summarization: it generates a natural language summary of a function

For all its strengths and capabilities, the researchers acknowledge one major drawback of Salesforce's CodeT5: it may encode stereotypes, such as those about race or gender, from the text comments in the datasets used for training.

Paper link: https://arxiv.org/pdf/2109.00859.pdf
Code link: https://github.com/salesforce/CodeT5
Source link: https://blog.einstein.ai/codet5/

About the author:
Asif Razzaq is an editor and co-founder of Marktechpost, LLC. He is an AI technology journalist and digital health business strategist with extensive experience in the medical device and biotechnology industries, and an enviable portfolio in the development of health applications, artificial intelligence, and data science. As an astute entrepreneur, Asif has successfully grown startups from their early stages into profitable businesses, establishing himself as an accomplished startup management professional.

Original article:
https://www.marktechpost.com/2021/09/08/salesforce-open-sources-codet5-a-machine-learning-model-that-understands-and-generates-code-in-real-time
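
The text-to-text framing that the article attributes to T5 and CodeT5 can be illustrated with a minimal sketch. It only shows how the three assistant capabilities listed in the article (text-to-code generation, code autocompletion, code summarization) could each be written as a pair of plain strings, so that one encoder-decoder model could in principle handle all of them. The task prefixes, the `TextToTextExample` type, and the `make_example` helper are hypothetical names chosen for illustration; they are not taken from the CodeT5 paper or repository, and no model is loaded or run.

```python
from dataclasses import dataclass

# Hypothetical task prefixes, chosen for illustration only. They are NOT
# taken from the CodeT5 paper or its repository.
TASK_PREFIXES = {
    "text_to_code": "Generate code: ",
    "code_completion": "Complete function: ",
    "code_summarization": "Summarize code: ",
}


@dataclass(frozen=True)
class TextToTextExample:
    source: str  # input string that would be fed to the encoder
    target: str  # output string the decoder would be trained to produce


def make_example(task: str, source: str, target: str) -> TextToTextExample:
    """Wrap one task instance as a plain (input string, output string) pair."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return TextToTextExample(TASK_PREFIXES[task] + source, target)


# One toy instance per capability; the code snippets are made up.
examples = [
    make_example(
        "text_to_code",
        "return the larger of two integers",
        "def max_of(a, b):\n    return a if a > b else b",
    ),
    make_example(
        "code_completion",
        "def is_even(n):",
        "def is_even(n):\n    return n % 2 == 0",
    ),
    make_example(
        "code_summarization",
        "def is_even(n):\n    return n % 2 == 0",
        "Check whether a number is even.",
    ),
]

for ex in examples:
    print(repr(ex.source), "->", repr(ex.target))
```

Because every task reduces to the same string-in, string-out shape, adding a new task in this sketch means adding one more prefix rather than a new model head, which is the design point the article makes about the unified framework.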