“Topping the CLUE Leaderboards”: How Strong Is the Industry's Largest Chinese NLP Pre-trained Model, Just Released?

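{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Huawei Cloud has just unveiled the Pangu series of large models at the Huawei Developer Conference (Cloud), billing them as the world's largest pre-trained models for Chinese language (NLP) and for vision (CV)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to Huawei Cloud, the Pangu family of large AI models is planned to comprise four models: an NLP model, a CV model, a multimodal model, and a scientific computing model. The design follows three principles: first, a very large neural network; second, a strong network architecture, with overall performance more than 10% higher than that of customized small models; and third, robust network performance, with coverage across scenarios improved by more than 10 times."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/0e\/0e170a5e851538d72f9f663ab745f6a8.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Why Do We Need Large Models?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In May 2020, OpenAI published its paper on GPT-3, a new iteration of the model with 175 billion parameters. Back in 2019, GPT-2 had already been hailed as the “most powerful NLP model” with 1.5 billion parameters, so the release of the 175-billion-parameter GPT-3 naturally set off wide discussion in both industry and academia."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Do we really need large models? What will they change for us?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over the past decade, the compute demanded by AI algorithms has grown 400,000-fold, and the move from small neural networks to large ones has become an inevitable trend. At the same time, AI is merging deeply with scientific computing and is already being applied in many fields. Large models are one way to address the customization of AI models and the fragmentation of application development: they can absorb vast amounts of knowledge, generalize better, and depend less on labeled domain data."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once large models arrive, highly customized small models may be “absorbed” by them. On the technical side, large models place heavy demands on the deep optimization and parallel capabilities of AI frameworks. They will also pull the AI industry toward rapid convergence and become its foundation, changing the rules and landscape of AI development."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today, AI development across the industry is still largely workshop-style. AI applications for different scenarios have to be custom-built, which consumes large amounts of expert time, and it is still hard to push model performance to its limit. When the scenario changes, the whole model may have to be redeveloped. If an industrialized approach is brought into AI development so that one model can serve many scenarios, AI development will advance by leaps and bounds."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Pangu NLP: the Industry's First Chinese Large Model at the 100-Billion-Parameter Scale"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To speed up the industrialization of AI development, Huawei has released a full-stack, all-scenario AI solution. In August 2019 it launched the Ascend 910 chip and the MindSpore computing framework; in March 2020, at HDC.Cloud, it announced its vision research plan and officially open-sourced MindSpore; and in September 2020 it released the upgraded one-stop AI development platform ModelArts 3.0."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now Huawei Cloud has released Pangu NLP, the industry's first Chinese large model at the 100-billion-parameter scale."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pangu NLP was developed jointly by Huawei Cloud, Recurrent AI, and Peng Cheng Laboratory, and is reportedly the world's largest pre-trained Chinese language model. During pre-training it learned from 40 TB of Chinese text, including small-sample data from specific industry verticals, which helps improve its performance in concrete scenarios. Unlike other large models, Pangu NLP targets industry verticals and is mainly meant to solve the problem of low-cost, large-scale customization in commercial settings."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the latest Chinese Language Understanding Evaluation benchmark (CLUE), Pangu NLP ranked first on three leaderboards: overall, classification, and reading comprehension. Its overall score was 83.046."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/75\/75e5dd96a09d92ccd0d332a98a5f77ff.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Pangu NLP ranks first on the CLUE overall leaderboard"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/23\/230e653bcb5acc76678220fc6e217650.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Pangu NLP ranks first on the CLUE classification tasks"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/8f\/8f5b63039e7dfa53b0d331f100610554.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Pangu NLP ranks first on the CLUE reading comprehension tasks"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How did Pangu NLP set three new records on CLUE, and how does it differ from other large models in the industry?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, Pangu NLP accumulated a large body of general knowledge during pre-training. Because it handles both generation and understanding, it can support embedding industry knowledge bases and databases and thereby connect to industry expertise. The model can serve as any module within a system and be adapted and extended quickly to different scenarios."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, Pangu NLP builds Huawei Cloud's training techniques and methods into an encoder-decoder architecture, which accounts for its strong performance and its first-place finishes on all three CLUE leaderboards. It was also evaluated on the NLPCC 2018 text summarization task, where it achieved an industry-best average ROUGE score of 0.53, 60 percent ahead of the runner-up."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Third, large models released earlier by the industry were mostly either not tuned at all or tuned with non-gradient methods (that is, without gradient descent), trading performance in some scenarios for generalization. To overcome this drawback, Pangu NLP adopts a large-model, few-shot tuning approach that combines prompt-based tuning with a series of regularization techniques such as dynamic freezing, and it surpasses the GPT series on few-shot learning tasks."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pangu NLP has done well on the leaderboards, but how does it perform in real scenarios? At the Huawei Developer Conference (Cloud), Tian Qi, chief scientist for AI at Huawei Cloud and an IEEE Fellow, put a series of follow-up questions to the model live on stage."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/fd\/fdf89e1047c3e9f85a0b8ada98e70ea9.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/00\/0062d55646725551ba10951574c0d01f.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/ac\/acb26fcc7a82930b2dc89ef788587190.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/bf\/bfdf829a2c2a18a5e7d0835a843da1a3.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Across these rounds of questions and answers, Pangu NLP conversed as fluently as a person and showed striking comprehension and generation abilities. Having been trained on 40 TB of Chinese text, it can recognize intent through few-shot learning and answer questions accurately. Even when a single sentence contains several questions, it identifies and answers each one, which demonstrates multi-intent recognition. In one question the keyword “carbon neutrality” was never mentioned, yet Pangu inferred the topic under discussion from context and offered its own views on it."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Pangu CV: a Vision Model with 3 Billion Parameters and Billion-Scale Image Knowledge"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Alongside the NLP model, Huawei Cloud also released the Pangu CV large model. It reportedly has more than 3 billion parameters, making it the largest CV model in the industry at present, and it reaches the current state of the art (SOTA) in few-shot classification accuracy on the ImageNet 1% and 10% subsets, among others."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unlike other large CV models, Pangu CV is the first to combine image discrimination and generation, so it can meet the needs of both low-level image restoration and high-level semantic understanding. It also incorporates knowledge from various industries and can be adapted quickly to a range of downstream tasks. Pangu CV has already been applied in more than 100 tasks in areas such as medical imaging and finance, where it substantially improves accuracy on business test sets and saves more than 90% of R&D cost on average."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Existing AI projects usually require custom development for each scenario, which is slow and labor-intensive. Pangu CV addresses the difficulty of generalizing and replicating AI engineering, bringing AI development into an industrialized mode in which one pipeline can be replicated across scenarios, saving both labor and compute."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Functionally, Pangu CV offers three capabilities: large-model pre-training, large-model deployment, and large-model iteration. Together they form an organic whole and a complete closed loop for AI development."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Large-model pre-training: The core problem at this stage is how to store in the model the knowledge contained in very large-scale data, especially data from different industries. The key to pre-training is to combine unlabeled and labeled images and capture the structured features hidden in them, particularly the relationships between samples. Pangu CV covers three steps (data processing, architecture design, and model optimization) and supports algorithms such as hierarchical spatial feature aggregation and supervised contrastive semantic adjustment, which can improve the efficiency of image representation by several thousand times."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Large-model deployment: The core problem at this stage is how to cover devices with widely varying compute, including cloud-side devices for high-resolution remote-sensing image analysis, edge-side devices for power-line inspection, and end-side devices for railway fault detection. A 3-billion-parameter model may not meet users' speed requirements, so Pangu CV includes purpose-built model extraction and knowledge distillation algorithms that extract an efficient sub-model according to user needs while transferring as much as possible of what the large model has learned to the sub-model."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Large-model iteration: Pangu CV is equipped with data mining and incremental learning modules. Algorithms such as one-bit supervision and bidirectional self-paced learning can reduce human intervention by more than 90%, while techniques such as class-incremental and hard-example incremental learning can cut compute consumption during incremental learning by more than 90%. Combined with graph-network-based model fusion, Pangu CV ultimately achieves closed-loop iteration, and its generalization ability strengthens gradually with use."}]}]}
]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"The Technology Behind the Large Models and a Practical Case"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pangu NLP involves parameters on the order of 100 billion, a quoted figure of 10^23 (given without a unit), and 40 TB of Chinese training text. Training it on a single card would take hundreds of years. So what technology stands behind the Pangu models?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The AI compute and data throughput for the Pangu models are reportedly provided by Peng Cheng Cloud Brain II, the largest AI training cluster in China. Beyond hardware compute, Huawei's underlying software, its training framework, and the ModelArts platform also underpin the Pangu models."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the algorithm side, Huawei Cloud's algorithm team and the NLP team at Recurrent AI worked together to overcome the difficulty of fine-tuning large models."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For low-level operator performance, the Pangu models use CANN-based techniques such as operator quantization and operator fusion, improving single-operator performance by more than 30%."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For parallelism, Huawei's MindSpore applies multi-dimensional automatic hybrid parallelism that combines pipeline parallelism, model parallelism, and data parallelism, which greatly reduces hand-written code and improves cluster scaling linearity by 20%."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For training resource scheduling, Huawei Cloud ModelArts supports exascale (E-level) compute scheduling and provides optimal network communication. With the platform's large-scale data processing capability, the 40 TB of text data for the Pangu models was processed in just 7 days."}]}]}
]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Talk is cheap, so with the supporting technology covered, let us look at how the Pangu models are applied in a real case."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"State Grid Chongqing Yongchuan Power Supply Company was among the first power companies in China to replace manual inspection with intelligent drone inspection, and it uses drone data collection in several business scenarios, including autonomous inspection of transmission lines, substations, and distribution lines. In developing AI models for drone inspection the traditional way, however, it ran into two problems: how to label defect samples efficiently, and how to cope with the sheer variety of fault types that inspection must cover."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To solve these two problems, the company worked with Huawei Cloud to apply the Pangu CV large model."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For data labeling, Pangu CV is pre-trained on massive amounts of unlabeled power-industry data and then fine-tuned on a small number of labeled samples, an efficient development pattern that yields a pre-trained model specific to the power industry. After adoption, sample screening efficiency improved about 30-fold and screening quality about 5-fold. Taking Yongchuan's daily collection of 50,000 high-resolution images as an example, this saves 170 person-days of manual labeling."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For model generality, Pangu's built-in automatic data augmentation and class-adaptive loss optimization strategy let a single model handle more than a hundred defect types. One model replaces the 20-plus small models Yongchuan used before, which greatly reduces maintenance cost, raises average precision by 18.4%, and lowers model development cost by 90%."}]}]}
]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Yongchuan case shows the strengths of the Pangu models in intelligent power inspection: they adapt quickly to different scenarios in the power industry and are replicable at scale. We can expect to see Pangu models applied in more industries in the future."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}
]}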