New from DAMO Academy's AliceMind: the first Chinese table pre-training model is released and open-sourced

On December 2, InfoQ learned that AliceMind, DAMO Academy's deep language model system, has released SDCUP, the first table pre-training model in the Chinese-language community. The model achieved state-of-the-art results on the authoritative table datasets WikiSQL and SQuALL, and both the model and its training code have been open-sourced.

Open-source repository:

https://github.com/alibaba/AliceMind

[Image: https://static001.geekbang.org/infoq/a7/a7bb43f7b5828cd4732012bb03d7d834.png]

Tables are a ubiquitous form of structured data and an important source of answers for intelligent dialogue systems and search engines. Traditional table querying, however, requires technical staff to write specialized query statements, which has held back its large-scale adoption.

The emerging technique of table question answering converts natural language into query statements, so users can interact with a table database directly through simple questions. It has broad application prospects.

However, because table content is complex and varied and involves specialist knowledge from many industries, table question answering has long been a hard problem in natural language processing. Overseas companies such as Google, Microsoft, and Amazon have explored this direction, but in Chinese-language settings it has remained unexplored.

DAMO Academy's Conversational AI team has now proposed SDCUP, the first Chinese table pre-training model. It is based on a "schema dependency" method: the model directly predicts keyword-level mappings between the natural-language question and the table's structure and content, which improves the accuracy of table question answering.

Specifically, the Schema Dependency task is modeled after semantic dependency parsing. Fully connected networks produce two semantic representations for every node, one as a parent node and one as a child node, and a biaffine network then predicts the probability that each edge exists and the probability of each relation type for that edge. The team also uses "curriculum learning", which imitates how humans learn, to reduce data noise.

[Image: https://static001.geekbang.org/infoq/ff/ffe48765f7a27362ae5b62ef33c2c8e2.png]
Figure: Example of SQL generated by SDCUP

SDCUP achieved state-of-the-art results on both WikiSQL, a large-scale English text-table dataset released by Salesforce, and SQuALL, a high-difficulty English text-table prediction dataset built with Microsoft. On TaBLUE, a Chinese table question answering dataset built by DAMO Academy, SDCUP outperforms a BERT model of the same parameter scale by about 3 percentage points.

[Image: https://static001.geekbang.org/infoq/bc/bc55b0aacd8b9f79fa60471f9436106e.png]
Figure: SDCUP achieves state-of-the-art results on the WikiSQL dataset

[Image: https://static001.geekbang.org/infoq/5d/5daf37f542f72b7c18437ff5f7b59c31.png]
Figure: SDCUP achieves state-of-the-art results on the SQuALL dataset

Li Yongbin, a senior algorithm expert at DAMO Academy, said that SDCUP is part of DAMO Academy's ongoing research and development on table-based dialogue technology, which will continue to be open-sourced. The related techniques have successively ranked first on four major international public datasets: WikiSQL, Spider, SParC, and CoSQL.

The technology has reportedly been productized and, through Alibaba Cloud's intelligent customer service, already provides table question answering and natural-language database interaction services to customers in government, finance, retail, and other industries.
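The schema dependency scoring described in the article (separate parent and child representations from fully connected layers, followed by biaffine scoring of every edge) can be illustrated with a minimal sketch. This is not SDCUP's released code: the function names, layer sizes, random weights, and the idea that nodes are question tokens plus table schema items are all assumptions made for illustration, and the sketch uses plain Python rather than a deep learning framework.

```python
import math
import random


def linear_relu(x, w, b):
    """Fully connected layer followed by ReLU. x: [d_in], w: [d_in][d_out], b: [d_out]."""
    return [max(0.0, sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j])
            for j in range(len(b))]


def biaffine(parent, child, u):
    """Biaffine score [parent; 1]^T U [child; 1] for one (parent, child) pair."""
    p = parent + [1.0]  # the appended 1 folds the linear and bias terms into U
    c = child + [1.0]
    return sum(p[i] * u[i][j] * c[j] for i in range(len(p)) for j in range(len(c)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(zs):
    """Softmax over a list of scores."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(hidden, proj, num_relations, seed=0):
    """Hypothetical, randomly initialised parameters (a real model would learn these)."""
    rng = random.Random(seed)
    return {
        "w_parent": random_matrix(rng, hidden, proj),
        "b_parent": [0.0] * proj,
        "w_child": random_matrix(rng, hidden, proj),
        "b_child": [0.0] * proj,
        "u_edge": random_matrix(rng, proj + 1, proj + 1),
        "u_rel": [random_matrix(rng, proj + 1, proj + 1) for _ in range(num_relations)],
    }


def schema_dependency_scores(nodes, params):
    """Return (edge_prob, rel_prob) for every ordered (parent i, child j) node pair.

    nodes: list of encoder output vectors, assumed here to cover question tokens
    and table schema items. edge_prob[i][j] is the probability that edge i -> j
    exists; rel_prob[i][j] is a distribution over relation types for that edge.
    """
    parents = [linear_relu(x, params["w_parent"], params["b_parent"]) for x in nodes]
    children = [linear_relu(x, params["w_child"], params["b_child"]) for x in nodes]
    n = len(nodes)
    edge_prob = [[sigmoid(biaffine(parents[i], children[j], params["u_edge"]))
                  for j in range(n)] for i in range(n)]
    rel_prob = [[softmax([biaffine(parents[i], children[j], u) for u in params["u_rel"]])
                 for j in range(n)] for i in range(n)]
    return edge_prob, rel_prob


if __name__ == "__main__":
    rng = random.Random(1)
    # Five stand-in node vectors of size 8 in place of real encoder outputs.
    nodes = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
    params = init_params(hidden=8, proj=4, num_relations=3)
    edge_prob, rel_prob = schema_dependency_scores(nodes, params)
    print(len(edge_prob), len(edge_prob[0]), len(rel_prob[0][0]))
```

Using two separate projections lets the same node score differently as the head of an edge than as its dependent, which is the usual motivation for the parent/child split in biaffine dependency parsers.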