面向認知,智源研究院聯合多家單位發佈超大規模新型預訓練模型“悟道·文匯”

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2021年1月11日,北京智源人工智能研究院(以下簡稱“智源研究院”)發佈面向認知的超大規模新型預訓練模型“文匯”,旨在探索解決當前大規模自監督預訓練模型不具有認知能力的問題。這一項目由智源研究院發起的“悟道”攻關團隊完成,團隊由智源研究院、阿里巴巴、清華大學、中國人民大學、中國科學院、搜狗、智譜.AI、循環智能等單位的科研骨幹組成。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“文匯”模型不僅使用數據驅動的方法來建構預訓練模型,還將用戶行爲、常識知識以及認知聯繫起來,主動“學習”與創造。本次發佈的“文匯”模型與1月初OpenAI剛剛發佈的DALL·E和CLIP這兩個連接文本與圖像的大規模預訓練模型類似,“文匯”模型能夠學習不同模態(文本和視覺領域爲主)之間的概念,可以實現“用圖生文”等任務,具有一定的認知能力。“文匯”模型參數規模達113億,僅次於DALL·E模型的120億參數量,是目前我國規模最大的預訓練模型,並已實現與國際領先預訓練技術的並跑。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"自從2020年5月,OpenAI發佈迄今爲止全球規模最大的預訓練模型GPT-3以來,超大規模預訓練模型就成爲人工智能領域研究的熱點。OpenAI、谷歌、Facebook等國際IT公司都在持續推動大規模預訓練模型的進一步發展。可以預測到的是,未來的GPT-4參數又會增大至少10倍,而且處理的數據將會更加多模態(文字、圖像、視覺、聲音)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然GPT-3在多項任務中表現出色,但它最大的問題是沒有常識,不具有認知能力。例如,向GPT-3提問第一個問題“長頸鹿有幾個眼睛?”GPT-3回答是兩個眼睛,再提問第二個問題“我的腳有幾個眼睛?”GPT-3回答的結果也是兩個眼睛,這就不符合人類常識。智源研究院學術副院長、清華大學計算機系唐傑教授認爲,GPT-3等超大型預訓練模型在處理複雜的認知推理任務上,例如開放對話、基於知識的問答、可控文本生成等,結果仍然與人類智能有較大差距。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲推動研發我國自主的大規模預訓練模型,解決目前國際主流模型存在的問題,2020年10月,智源研究院啓動了新型超大規模預訓練模型研發項目“悟道”。此次發佈的是“文匯”(面向認知的超大規模新型預訓練模型)的一期研發成果,用於自動生成圖片、文字以及視頻,可具有初級認知能力。智源研究院院長、北京大學信息技術學院黃鐵軍教授指出,“文匯”模型針對性地設計了多任務預訓練的方法,可以同時學習文→文、圖→文以及圖文→文等多項任務,實現對多個不同模態的概念理解。經過預訓練的“文匯”模型不需要進行微調就可以完成“用圖生文”等任務,對模型進行微調則可以靈活地接入如視覺問答、視覺推理等任務。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"面向認知的大規模預訓練模型“文匯”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“文匯”是面向認知的大規模預訓練模型,項目研究組提出了針對多模態認知生成的大規模預訓練的架構M6:MultiModality-to-MultiModality Multi-task Mega-Transformer。模型整體架構基於Transformer,其中圖像進行切塊並對塊採用ResNet-50提取特徵。這些特徵以及對應的position embedding讓圖像和文本能組合在一起送入模型。團隊針對性地設計了多任務預訓練的方法,通過靈活的mask技巧實現多任務學習。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d0\/d00c0a391faf1be0062dc20b56110319.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“文匯”模型能夠完成多種圖文生成任務,比如輸入下圖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b4\/b415c28d6e9023297fb1940167ff5cc3.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在阿里商品場景下微調的模型將給出描述:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"旗袍是一種古典的風格,它不僅能夠彰顯出女性優雅氣質,而且還具有很好的修飾身形。這款旗袍採用了經典的圓領設計,穿着舒適自在,同時又能夠展現出女性柔美的頸部線條,讓你更加的迷人。精緻的繡花工藝,使得整件衣服看起來更加的精緻,也更加的凸顯出女性的魅力。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"模型也可以同時接受文本的提示(Prompt)和圖像,例如:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d5\/d5111dafa1ee47d7870203cc07c58a9c.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prompt: 走進平定縣宋家莊村,映入眼簾的是"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"文匯(M6架構): 一座座古色古香的明清建築,這裏有着濃厚的歷史文化底蘊和獨特的民俗風情。走進村子,就像走進了一個童話故事裏的世外桃源。村子裏的房屋依山而建,錯落有致地分佈着各種各樣的古建築羣,古樸典雅的建築風格讓人耳目一新。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與GPT不同,研究人員使用Transformer-XL來替換了GPT中原來的Transformer模型,從而能夠生成超過Transformer的窗口長度(一般爲512)的文本。如下圖所示,GPT-XL架構能夠生成基於人設的文本,較好的保持了內容一致性。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2f\/2fa2db4d1dc69a93366d04df3256c168.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/80\/802c3f2fdc3385bc047af42a41a6700f.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統工程上,採用阿里統一多種並行策略的高性能、靈活、易用的分佈式框架Whale,使用模型並行+流水並行+數據並行訓練進行分佈式訓練,256卡訓練速度是8GPU的29.4倍,加速比接近線性。基於中文百科、知乎、問答三類數據則由搜狗提供。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"“文匯”模型應用即將上線"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前,“文匯”已支持基於認知推理的多種自然語言及跨模態應用任務,部分應用即將與搜狗、阿里巴巴、學堂在線、智譜.AI、循環智能等機構合作上線。目前已有四個樣例應用可用於展示模型效果。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"(一)基於上傳圖片的開放域問答"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本應用基於圖片文本的多模態認知預訓練百億模型,可以支持用戶上傳圖片後,針對圖片內容進行提問或生成圖片的一句話描述。如上傳圖片後詢問“圖片中的電腦在水杯的什麼位置?”或“生成對應商品圖片的一句話描述”。將於未來大規模應用於阿里的電商場景。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/88\/88fe2de39d8c5b4cdcf40fc091dd842a.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"(二)Talk to Data,用語言操作數據可視化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本應用基於數據可視化技術,通過將自然語言轉化爲可視化查詢語句,從而達到“上傳圖表,輸入指令,輸出可視化圖像”的功能目標。只需要一句自然語言的話,就可以實現數據的可視化自動統計與查詢。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a8\/a8bc03e25e59cba5b1e982efa77fa4c2.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"(三)基於預訓練語言模型的詩詞創作應用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本應用可以基於輸入的詩歌題目、朝代、作者,生成仿古詩詞。與傳統基於規則或監督學習的詩歌生成不同,這個應用創作的詩歌來自於自然語言的生成,且無標註數據進行fine-tune,並且可以模仿任意詩人創作任意新穎題目的詩歌。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面是基於模型自動作詩的結果:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"贈抗疫英雄"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"君王歌裏贈英雄,百萬雄師劈毒煙。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"況是人民勤努力,更加衆志與國捐。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"團結團結再團結,萬衆一心戰瘟神。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"沔水橋邊香霧起,暗送瘟神過漢陽。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"詠智利日全食"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"六出奇光動地來,西方海國見旌旗。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"水心惟有終無底,火內曾無徹上灰。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"地軸迂迴周八極,天球低陷不磨回。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"明朝滄海成桑田,吾欲吹噓弄大灰。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"詠科舉"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#40A9FF","name":"blue"}}],"text":"輸入的創作背景:此詩爲作者高中狀元,春風得意所做"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"青雲有路狀元郎,一步登天入翰林。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"黃金榜上詞臣貴,笑看人間進士郎。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"掖庭宰相垂簾喜,尚書省郎入試忙。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"榮登榜首魁星閃,不枉當年萬戶郎。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"詠科舉"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#40A9FF","name":"blue"}}],"text":"輸入的創作背景:此詩爲作者屢試不中,名落孫山所做"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"金榜無名誓不歸,只求大第是何年。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"若教九品並孔目,學取天人第一流。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"經天緯地才難用,萬卷詩書濟世遊。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"未到翰林先一日,十年共話九仙樓。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(四)可控人設的開放域問答"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本應用支持用戶上傳問題,並生成具有人物角色風格的定製化文本回答。與傳統的開放式問答不同,“文匯”模型生成的答案具有人設的語言特色,問答內容趣味橫生。目前該應用將計劃在搜狗的問答場景中使用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/50\/501ff210a3ddaed85fb838ce23be3b51.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9e\/9e078eaaf12899f7705cfc070cb914ef.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"“悟道”項目下一步研發計劃"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前,“悟道”項目研究團隊正在視覺等更廣泛的範圍內,對大規模自監督預訓練方法開展探索研究,已經啓動了四類大規模預訓練模型研製,包括“文源”(以中文爲核心的超大規模預訓練語言模型)、“文匯”(面向認知的超大規模新型預訓練模型)、“文瀾”(超大規模多模態預訓練模型)和“文溯”(超大規模蛋白質序列預訓練模型)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2020年11月14日,智源研究院已發佈了“文源”(以中文爲核心的超大規模預訓練語言模型)第一階段26億參數規模的中文語言模型。下一步,智源研究院將聯合優勢單位加快四類大規模預訓練模型的研發進度。特別是“文匯”模型,未來將着力在多語言、多模態條件下,提升完成開放對話、基於知識的問答、可控文本生成等複雜認知推理任務的能力,使其更加接近人類水平。計劃在今年6月實現“中文自然語言應用系統”“基於圖文增強和知識融入的圖文應用系統”“基於認知的複雜認知系統”等一批各具特色的超大規模預訓練模型,以期達到對國際領先AI技術的趕超,儘快實現我國在國際AI前沿技術研究的領跑。"}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章