1.6 Trillion Parameters! Google Trains a Massive AI Language Model, Equivalent to 9 GPT-3s

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參數是機器學習算法的關鍵。它們是模型的一部分,是從歷史的訓練數據中學到的。一般而言,在語言領域中,參數的數量和複雜度之間的相關性保持得非常好。舉例來說,OpenAI 的 GPT-3,是有史以來訓練過的最大的語言模型之一,就擁有 1750 億個參數,它能夠進行原始類比、生成食譜,甚至完成基本代碼。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"谷歌的研究人員開發出了一種基準測試方法,認爲它能讓他們訓練出一個包含超過一萬億個參數的語言模型,這可能是迄今爲止對這種相關性最全面的測試方法之一。他們表示,他們的 1.6 萬億參數模型,看起來是目前規模最大的,其速度比之前谷歌開發的最大語言模型(T5-XXL)提高了 4 倍。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"正如研究人員在一篇詳細介紹他們研究成果的論文中所指出的,大規模訓練是獲得強大模型的有效途徑。在大數據集和參數數量的支持下,簡單的架構超越了複雜的算法。但是,高效的大規模訓練和密集的計算是關鍵。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"正因爲如此,研究人員纔會追求所謂的 SwitchTransformer ——一種“稀疏激活”技術,即僅使用模型的權值子集,或僅轉換模型中輸入數據的參數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Switch Transformer 建立在專家混合的基礎上,這是 90 年代初首次提出的人工智能模型範式。大致的概念是,在一個更大的模型中保留多個專家,或者說是專門處理不同任務的模型,並且有一個“門控網絡”爲任何給定數據選擇諮詢哪些專家。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Switch Transformer 的新穎之處在於,它有效地利用了爲密集矩陣乘法(廣泛用於語言模型的數學運算)設計的硬件,如 GPU 和谷歌的張量處理單元(TPU)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於研究者來說,在分佈式訓練設置中,他們的模型會在不同的設備上拆分唯一的權重,這樣權重就會隨着設備數量的增加而增加,但是仍然可以管理每臺設備的內存和計算軌跡。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中一項實驗,研究人員使用 32 個 TPU 內核對 Colossal Clean Crawled Corpus 預先訓練出幾種不同的 Switch Transformer 模型, Colossal Clean Crawled Corpus 是一組大小爲 750 GB 的文本數據集,它們來自 Reddit、維基百科和其他網絡資源。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"研究人員爲這些模型安排了任務,讓它們預測那些 15% 的單詞被掩蔽的段落中缺失的單詞,以及其他一些挑戰,例如通過檢索文本來回答一系列日益困難的問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖片: https:\/\/uploader.shimo.im\/f\/cVuRzapGE2oAZNYS.png"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"研究人員聲稱,他們的 1.6 萬億參數模型(Switch-C),擁有 2048 名專家,顯示出“完全沒有訓練不穩定性”,而更小的模型(Switch-XXL)包含 3950 億個參數和 64 名專家。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是,在一次基準測試中,Sanford Question Answering Dataset(SQuAD)的 Switch-C 的得分更低,只有 87.7;而 Switch-XXL 的得分爲 
In one benchmark, however, the Stanford Question Answering Dataset (SQuAD), Switch-C scored lower, at 87.7, compared with 89.6 for Switch-XXL. The researchers attribute this to the opaque relationship among fine-tuning quality, computational requirements, and the number of parameters.

That said, the Switch Transformer did deliver gains on a number of downstream tasks. For example, the researchers report more than a 7x pre-training speedup while using the same amount of computational resources, and they say large sparse models can be used to create smaller dense models that, once fine-tuned on tasks, retain roughly 30% of the quality gains of their larger counterparts.

In one test, where a Switch Transformer model was trained to translate between more than 100 different languages, the researchers observed improvements across all 101 languages, with 91% of them seeing speedups of more than 4x over a baseline model.

"Though this work has focused on extremely large models, we also find that models with as few as two experts improve performance while easily fitting within the memory constraints of commonly available GPUs or TPUs," the researchers write in the paper. "We cannot fully preserve the model quality, but compression rates of 10 to 100x are achievable by distilling our sparse models into dense models, while preserving approximately 30% of the quality gain of the expert model."
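The article does not spell out how that distillation works; the following is only a generic sketch, under the standard assumption that the large sparse model acts as a teacher and a much smaller dense student is trained to match its softened output distribution. The function `distillation_loss` and the toy logits are illustrative, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation loss: KL(teacher || student) on softened outputs.

    teacher_logits come from the large sparse model, student_logits from the
    small dense model being trained to imitate it.
    """
    t = softmax(teacher_logits / temperature, axis=-1)
    s = softmax(student_logits / temperature, axis=-1)
    kl = np.sum(t * (np.log(t + 1e-9) - np.log(s + 1e-9)), axis=-1)
    # The temperature^2 factor keeps gradient magnitudes comparable across
    # temperature settings, as in standard knowledge distillation.
    return float(np.mean(kl) * temperature ** 2)

# Toy usage: 4 examples over a 10-token vocabulary.
rng = np.random.default_rng(1)
teacher_logits = rng.normal(size=(4, 10))
student_logits = rng.normal(size=(4, 10))
print(distillation_loss(student_logits, teacher_logits))
```

In practice a term like this is usually mixed with the ordinary task loss on hard labels; the compression factor then comes simply from choosing a student with 10x to 100x fewer parameters than the sparse teacher.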
In future work, the researchers plan to apply the Switch Transformer to "new and across different modalities," including images and text. They believe model sparsity could bring benefits across a range of media, as well as to multimodal models.

Unfortunately, the researchers' work does not account for the real-world impact of such large language models. Language models often amplify the biases encoded in public data, and it is not uncommon for portions of the training data to come from communities with pervasive gender, racial, and religious prejudices.

OpenAI, an AI research company, notes that this can lead to placing words like "naughty" or "sucked" near female pronouns, and "Islam" near "terrorism." Other studies, including one published last April by researchers at Intel, MIT, and the Canadian AI initiative CIFAR, have found strong stereotypical bias in some of the most popular models, including Google's BERT and XLNet, OpenAI's GPT-2, and Facebook's RoBERTa.

According to the Middlebury Institute of International Studies, malicious actors could exploit such bias to sow discord by spreading misinformation, disinformation, and outright lies, radicalizing individuals into violent far-right extremist ideologies and behaviors.

"For reference: @mmitchell_ai and I found out that Google held a meeting about LLMs in September, but no one on our team was invited to it or even knew about it. So they just want AI ethics to rubber-stamp whatever they have already decided to do in their own 'playground.' https://t.co/tlT0tj1sTt" — Timnit Gebru (@timnitGebru), January 13, 2021

It is unclear whether Google's policy on publishing machine learning research played any role here. Late last year, Reuters reported that the company's researchers are now required to consult legal, policy, and public relations teams before working on topics such as face and sentiment analysis or the classification of race, gender, or political affiliation.

In early December last year, Google fired AI ethicist Timnit Gebru, reportedly in part over a research paper on large language models that discussed their risks, including the impact of their carbon footprint on marginalized communities and their tendency to perpetuate abusive language, hate speech, microaggressions, stereotypes, and other dehumanizing language aimed at specific groups of people.

About the author:

Kyle Wiggers is a technology journalist based in New York City who writes about artificial intelligence for VentureBeat.

Original article:

https://venturebeat.com/2021/01/12/google-trained-a-trillion-parameter-ai-language-model/