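Alibaba DAMO Academy Releases Trillion-Parameter AI Model M6, With 10 Times as Many “Neurons” as the Human Brain and Early Cognitive and Creative Abilities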

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On June 25, "},{"type":"link","attrs":{"href":"https:\/\/damo.alibaba.com","title":"xxx","type":null},"content":[{"type":"text","text":"Alibaba DAMO Academy"}]},{"type":"text","text":" released M6, a “low-carbon” giant model and the first in the world to sharply cut the energy consumed in training a trillion-parameter model, answering the industry's pressing need for low-carbon, efficient training of large AI models. Through a series of breakthrough technical innovations, the DAMO Academy team used only 480 GPUs to train the trillion-parameter multimodal model M6, whose parameter count is 10 times the number of neurons in the human brain. Compared with the trillion-parameter efforts of overseas companies such as NVIDIA and Google, energy consumption is more than 80% lower and efficiency is nearly 11 times higher."}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. What is the M6 large model?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"M6 is "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/profile\/2621502EF5BED0\/publish","title":"xxx","type":null},"content":[{"type":"text","text":"Alibaba DAMO Academy"}]},{"type":"text","text":"'s ultra-large-scale "},{"type":"link","attrs":{"href":"https:\/\/s.geekbang.org\/search\/c=0\/k=多模態預訓練模型\/t=","title":"xxx","type":null},"content":[{"type":"text","text":"multimodal pretrained model"}]},{"type":"text","text":". Its full English name is MultiModality-to-MultiModality Multitask Mega-transformer; the six Ms give the abbreviation M6."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the name suggests, M6 focuses on multimodal, multitask capability. "},{"type":"text","marks":[{"type":"strong"}],"text":"Its goal is to build a world-leading, general-purpose large AI model."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In March this year, DAMO Academy released M6 as China's first 100-billion-parameter multimodal large model, drawing attention overseas. Former OpenAI policy director Jack 
Clark commented publicly: “"},{"type":"text","marks":[{"type":"strong"}],"text":"The scale and design of this model are astonishing. It looks like a sign of the steady growth of China's many AI research organizations"},{"type":"text","text":".”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today, Alibaba announced that M6 has been upgraded to one trillion parameters and is the first in the world to sharply cut the energy consumed in training a trillion-parameter model, better meeting the industry's need for low-carbon, efficient training of large AI models."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Through a series of breakthrough technical innovations, the DAMO Academy team used only 480 V100 32G GPUs to train the trillion-parameter multimodal model M6, whose parameter count is 10 times the number of neurons in the human brain. Compared with the trillion-parameter efforts of overseas companies such as NVIDIA and Google, energy consumption is more than 80% lower and efficiency is about 11 times higher."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"This breakthrough will greatly lower the barrier to training trillion-parameter models, making large-model research and industrial deployment far more widely accessible."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The M6 development timeline is as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"January 2021: M6 reaches 10 billion parameters, becoming China's first multimodal large model at the 10-billion scale"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"February 2021: 
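M6 reaches 100 billion parameters, becoming China's first multimodal large model at the 100-billion scale"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"May 2021: M6 reaches one trillion parameters, is the first in the world to sharply cut the energy consumed in training a trillion-parameter model, and becomes China's first multimodal large model to be deployed commercially"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. What are the highlights of the trillion-parameter M6?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Lower-carbon, more efficient large AI models"},{"type":"text","text":": M6 improves the resource utilization and training efficiency of ultra-large-scale pretrained models and builds up the capability to train large models efficiently. Compared with NVIDIA (3,072 A100 GPUs for one trillion parameters) and Google (2,048 TPUs for 1.6 trillion parameters), "},{"type":"text","marks":[{"type":"strong"}],"text":"Alibaba trained the trillion-parameter M6 efficiently with only 480 V100 32G GPUs, cutting energy consumption by more than 80% and raising efficiency nearly 11-fold."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"AI creativity upgraded again: "},{"type":"text","text":"M6 has cognitive and creative abilities beyond those of traditional AI. It is good at drawing, writing, and question answering, with broad application prospects in e-commerce, manufacturing, literature and art, and many other fields. "},{"type":"text","marks":[{"type":"strong"}],"text":"OpenAI 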
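DALL·E generates images at a resolution of 256×256; M6 raises the resolution to 1024×1024. "},{"type":"text","text":"A larger model brings greater creativity and the prospect of direct industrial application."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"First commercial use of a large AI model: M6 has become China's first multimodal large model to be deployed commercially. "},{"type":"text","text":"After a trial period, M6 will formally start work as an AI assistant designer at Rhino Smart Manufacturing (犀牛智造), Alibaba's new-manufacturing platform. By designing quickly in line with fashion trends and simulating try-on results, it is expected to greatly shorten the design cycle for new fast-fashion apparel. M6 has also been applied on platforms such as Alipay and Taobao, where it takes part in cross-modal search, copywriting, and image design."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3. What are the key technical breakthroughs behind the trillion-parameter M6?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the start of its large-model research, the Alibaba M6 team has placed particular emphasis on GreenAI: improving the resource utilization and training efficiency of ultra-large-scale pretrained models and building up the capability to train large models efficiently, so that more people can train or apply large models at lower cost."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To address the excessive resource consumption of large-model training, DAMO Academy worked with teams including Alibaba Cloud's machine learning platform PAI and the EFLOPS computing cluster to improve the MoE (Mixture-of-Experts) framework, "},{"type":"text","marks":[{"type":"strong"}],"text":"using an expert-parallel strategy to greatly expand the capacity of a single model"},{"type":"text","text":". In addition, through optimizations such as "},{"type":"text","marks":[{"type":"strong"}],"text":"accelerated linear algebra (XLA)"},{"type":"text","text":", "},{"type":"text","marks":[{"type":"strong"}],"text":"mixed-precision training"},{"type":"text","text":", and "},{"type":"text","marks":[{"type":"strong"}],"text":"half-precision communication"},{"type":"text","text":", the team greatly increased the training speed of the trillion-parameter model and reduced the required computing resources with almost no loss in model quality."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"They first examined in more detail how various MoE hyperparameters affect the convergence speed and accuracy of pretrained models, including the value of k in top-k routing, the effect of capacity on load balance, and the effect of load balance itself on model quality. Based on these observations, they proposed a method called Expert 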
Prototyping, which uses grouped MoE so that combining different MoE groups enlarges the model's expressive space without changing the parameter count."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"They observed that grouped MoE outperforms the baseline on models of different scales. Compared with the serial implementation of single-group switch routing, grouped MoE also achieves a better speedup, and the team found that its advantage grows on larger models, as shown in the figure below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/f0\/0c\/f053c4af7850885bbffdf40ca8f3830c.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the hardware side, the M6 team ultimately built the model on a Hippo mixed-deployment cluster, using 480 single-GPU machines, each with one NVIDIA V100-32GB, interconnected by a 100Gb 
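RoCEv2 RDMA network, with jobs submitted on XDL."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4. What commercial applications does M6 already have?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"AI designer and intelligent new manufacturing"},{"type":"text","text":": After its trial period, M6 will formally start work as an AI assistant designer at Rhino Smart Manufacturing (犀牛智造), Alibaba's new-manufacturing platform. By designing quickly in line with fashion trends and simulating try-on results, it is expected to greatly shorten the design cycle for new fast-fashion apparel. As it gains practical experience, M6's design ability will continue to evolve."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Drawing on Alibaba's e-commerce background, the M6 team hopes to use M6's strong text-to-image generation to integrate deeply with the e-commerce industry chain and uncover potential application value. Specifically, the team is already involved in the full pipeline from apparel design and generation to online display and style testing, and expects to use M6's high-resolution image generation to shorten apparel companies' inventory turnover time and help merchants track fashion trends more closely and respond to them faster."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Previously, OpenAI's DALL·E generated images at a resolution of 256×256; M6 raises the resolution to 1024×1024."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d5\/86\/d59c3c1a8454aeb7e270bc794407c286.png","alt":null,"title":"","style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 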
Below are examples of high-resolution clothing designs generated by M6; both the designs and the patterns were created by AI:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/be\/f8\/beac78a73613yy5a118d49d85e107ef8.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/fd\/07\/fdbb06aec4b04993f7bcd0314f56cd07.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/75\/18\/757ab59167cyyaf9c436fc023f4b7818.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"M6 can also generate types of clothing that do not exist in reality, in styles ranging from cool to sweet to quirky."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/24\/0d\/24b673yyd580bb14d79e4bdb256c8e0d.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The following flowchart shows how M6 takes part in designing new clothing:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/67\/94\/67b04957a27d7eca1071a40f9cb51794.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bord
ertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Industrial-grade copy generation: "},{"type":"text","text":"Beyond text-to-image generation, M6 also has image-to-text capability that is ready for direct industrial use and can quickly produce descriptive copy for product and other images. This capability is currently in trial use in parts of Taobao's and Alipay's businesses."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the parameter scale kept growing, the DAMO Academy team found that M6's cognitive and expressive abilities improved as well: it notices richer details in images and describes them in more precise language."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, in describing the trench coat image below, the larger M6, compared with the base version, picked up details such as “classic lapel design,” “belted waist,” and “large pockets on both sides,” producing copy that is more informative and more precisely worded."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/38\/0e\/38e4f2d5ae992e2008262fdee657530e.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Cross-modal search: "},{"type":"text","text":"M6's ability to precisely understand and match images and text is in initial trial use in Alipay and Mobile Taobao, and is expected to help improve cross-modal search results for users."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The M6 team observed that Taobao sees many long-tail search terms, mainly because many users born after 1995 and after 2000 have very specific product needs. For example, a user may want a coffee cup with a bumpy surface, that is, a Japanese-style textured coffee cup. Because merchants usually do not put such details in the product title or description, a purely text-based search has difficulty finding the matching product."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Multimodal large models make precise cross-modal search possible. M6 can already match text to images, and in the future it may learn to understand video content from text, transforming how search works."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/63\/40\/632c843d104222e5309cc00dc27yya40.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5. What are the M6 team's plans?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yang Hongxia, a senior algorithm expert at DAMO Academy, said: “Next, the M6 team will keep pushing low-carbon AI to its limit, advance further real-world deployment, and explore theoretical research on general-purpose large models.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"The M6 team's main areas of focus are:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Keep pushing GreenAI to its limit so that more scholars and companies can take part in researching and applying next-generation AI."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Keep advancing the deployment of large models so that next-generation AI reaches more fields, including social welfare."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Optimize downstream-task training for general-purpose models so that large models perform better on more tasks."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text
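":"Explore theoretical research on general-purpose large models, in the hope of revealing “How it works”."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Explore the co-design of software and hardware for large-model training to inspire the design of next-generation AI hardware."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This year, Alibaba has produced a string of results in ultra-large-scale pretrained models. Besides the multimodal giant model M6, Alibaba DAMO Academy also recently released PLUG, a leading large language model in the Chinese-language community, reflecting its deep investment in the underlying technology and applications of large AI models."}]}]}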