Is DeepMind's New Reinforcement Learning System a Step Toward General AI?

This article is part of TechTalks' series of paper reviews (https://bdtechtalks.com/tag/ai-research-papers/) covering the latest research findings in artificial intelligence.

A key challenge for the deep reinforcement learning models that have mastered Go, StarCraft 2, and other games is their inability to generalize their capabilities beyond their training domain. This limitation makes it very hard to apply these systems to the real world, where situations are much more complicated and unpredictable than the environments in which AI models are trained.

Recently, scientists at the DeepMind AI research lab announced in a blog post on their "open-ended learning" initiative that they have made initial progress in training agents that can play many different games without needing human interaction data. The new project includes a 3D environment with realistic dynamics and deep reinforcement learning agents that can learn to solve a wide variety of challenges.

According to DeepMind's AI researchers, the new system is an important step toward "creating more general agents with the flexibility to adapt to constantly changing environments."

The paper's findings show some impressive advances in applying reinforcement learning to complicated problems. But they are also a reminder of how far current systems remain from the general intelligence capabilities that the AI community has been coveting for decades.

The brittleness of deep reinforcement learning

[Image: https://static001.infoq.cn/resource/image/2c/0c/2c412fa3778388ebb5b9d30d9e76010c.png]

A key advantage of reinforcement learning (https://bdtechtalks.com/2019/05/28/what-is-reinforcement-learning/) is its ability to develop new behaviors by taking actions and receiving feedback, similar to the way humans and animals learn by interacting with their environment. Some scientists have described reinforcement learning as "the first computational theory of intelligence" (https://venturebeat.com/2021/01/02/leading-computer-scientists-debate-the-next-steps-for-ai-in-2021/).

The combination of reinforcement learning and deep neural networks (https://bdtechtalks.com/2021/01/28/deep-learning-explainer/), known as deep reinforcement learning, is at the heart of many advances in AI, including DeepMind's famous AlphaGo and AlphaStar (https://bdtechtalks.com/2019/11/04/deepmind-ai-starcraft-2-reinforcement-learning/) models. In both cases, the AI systems were able to beat human world champions at their respective games.

But reinforcement learning systems are also notoriously inflexible. For example, a reinforcement learning model that can play StarCraft 2 at expert level will not be able to play a game with similar mechanics (such as Warcraft 3) at any level of competency. Even slight changes to the original game will considerably degrade the AI model's performance.

"These agents are usually limited to the games they were trained on. While the layout, initial conditions, and opponents might vary, the goal the agents must satisfy remains the same between training and testing; deviation from this can cause catastrophic failure of the agent," DeepMind's researchers write in their paper (https://deepmind.com/research/publications/open-ended-learning-leads-to-generally-capable-agents), which provides the full details of their open-ended learning research. Humans, on the other hand, are very good at transferring knowledge across domains.

The XLand environment

[Image: https://static001.infoq.cn/resource/image/f9/fa/f9ce40d3b3577d1364ae233104380afa.png]

The goal of DeepMind's new project is to create "an artificial agent whose behavior generalizes beyond the set of games it was trained on."

To this end, the team created XLand, an engine that generates 3D environments composed of static topology and movable objects. The game engine simulates rigid-body physics and allows players to use the objects in various ways (for example, to create ramps or block paths).

XLand is a rich environment in which agents can be trained on a virtually unlimited number of tasks. One of XLand's main advantages is the ability to use programmatic rules to automatically generate vast numbers of environments and challenges for training AI agents. This addresses one of the main bottlenecks of machine learning systems: the need for large amounts of hand-crafted training data.
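This kind of rule-based task generation can be illustrated with a small, hypothetical sketch. None of the names, parameters, or the goal grammar below come from DeepMind's code; they are invented for illustration only:

```python
import random

def generate_task(seed: int) -> dict:
    """Sample a hypothetical XLand-style task: a world layout plus a goal.

    Illustrative sketch only; not DeepMind's implementation.
    """
    rng = random.Random(seed)
    world = {
        "terrain_tiles": rng.randint(20, 200),           # size of the static topology
        "objects": rng.sample(["cube", "pyramid", "slab"],
                              k=rng.randint(1, 3)),       # movable objects
        "num_players": rng.choice([1, 2, 3]),
    }
    # Goals are composed from simple predicates over objects and players.
    predicate = rng.choice(["near", "on", "hold", "see"])
    target = rng.choice(world["objects"])
    goal = f"{predicate}(player_0, {target})"
    return {"world": world, "goal": goal}

# Because tasks are sampled from rules, generating millions of them is cheap.
tasks = [generate_task(seed) for seed in range(5)]
for t in tasks:
    print(t["goal"])
```

Since tasks come from rules rather than hand-authoring, the training distribution can be widened simply by extending the rules, which is what makes the approach cheap in terms of human effort.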
According to the blog post, the researchers created "billions of tasks in XLand, across varied games, worlds, and players." The games range from simple goals such as finding objects to complex settings in which the AI agents must weigh the benefits and costs of different rewards. Some games also include cooperation or competition elements involving multiple agents.

Deep reinforcement learning

DeepMind uses deep reinforcement learning, along with a few other clever techniques, to create AI agents that can thrive in the XLand environment.

Each agent's reinforcement learning model receives a first-person view of the world, the agent's physical state (such as whether it is holding an object), and its current goal. Each agent fine-tunes the parameters of its policy neural network to maximize the reward on its current goal. The neural network architecture includes an attention mechanism that helps the agent balance the sub-goals it must optimize to accomplish its main goal.

Once an agent has mastered its current challenge, a computational task generator creates a new one for it. Each new task is generated according to the agent's training history, which helps spread the agent's skills across a broader range of challenges.

DeepMind also uses its vast computational resources (courtesy of its owner, Alphabet) to train a large population of agents in parallel and transfer learned parameters across agents to improve the general capabilities of the reinforcement learning system.

[Image: https://static001.infoq.cn/resource/image/60/59/60bc7cb24ed33c64088d9306d596ae59.png]
Caption: DeepMind uses a multi-step, population-based mechanism to train many reinforcement learning agents.

The performance of the reinforcement learning agents is evaluated on their general ability to complete a set of held-out tasks they were not trained on, including well-known challenges such as "capture the flag" and "hide and seek."

According to DeepMind, the agents were trained on roughly 700,000 unique games in about 4,000 unique worlds within XLand, going through 200 billion training steps across 3.4 million unique tasks (in the paper, the researchers write that 100 million steps correspond to approximately 30 minutes of training).

"At this time, our agents have been able to participate in every procedurally generated evaluation task except for a handful that were impossible even for a human," the AI researchers wrote. "And the results we are seeing clearly exhibit general, zero-shot behavior across the task space."

Zero-shot machine learning (https://bdtechtalks.com/2020/08/12/what-is-one-shot-learning/) models can solve problems that were absent from their training dataset. In a complicated space such as XLand, zero-shot learning may suggest that the agents have acquired fundamental knowledge of their environment rather than memorizing sequences of image frames for specific tasks and environments.
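Using the paper's stated conversion (100 million steps is roughly 30 minutes of training), the reported 200 billion steps translate into a rough wall-clock figure. This back-of-envelope calculation is ours, not a number from the paper:

```python
steps_total = 200e9           # total training steps reported by DeepMind
steps_per_half_hour = 100e6   # paper: 1e8 steps is roughly 30 minutes of training

minutes = steps_total / steps_per_half_hour * 30
hours = minutes / 60
days = hours / 24
print(f"~{hours:.0f} hours, or ~{days:.1f} days of training")
# ~1000 hours, or ~41.7 days
```

That is on the order of a thousand hours of training, which helps explain why DeepMind trains a large population of agents in parallel on Alphabet-scale infrastructure rather than one agent at a time.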
The reinforcement learning agents showed further signs of generalized learning when the researchers tried to adjust them for new tasks. According to the findings, 30 minutes of fine-tuning on new tasks (https://bdtechtalks.com/2019/06/10/what-is-transfer-learning/) was enough to produce impressive improvements in agents trained with the new method. In contrast, agents trained from scratch for the same amount of time performed close to zero on most tasks.

High-level behavior

According to DeepMind, the reinforcement learning agents exhibit "heuristic behaviors" such as tool use, teamwork, and multi-step planning. If confirmed, this could be an important milestone. Deep learning systems are often criticized for learning statistical correlations rather than causal relations (https://bdtechtalks.com/2021/03/15/machine-learning-causality/). If neural networks could develop high-level notions, such as using objects to create ramps or to cause occlusion, they could have a great impact on fields such as robotics and self-driving cars, where deep learning is currently struggling.

But these remain hypotheses, and DeepMind's researchers are careful not to jump to conclusions about their findings. "Given the nature of the environment, it is difficult to pinpoint intentionality; the behaviors we see often appear to be accidental, but we still see them occur consistently," they write in the blog post.

They do believe, however, that their reinforcement learning agents "are aware of the basics of their body and the passage of time and that they understand the high-level structure of the games they encounter." Such basic self-learned skills (https://bdtechtalks.com/2021/07/26/ai-visual-reasoning-agent-dataset/) are another highly sought-after goal of the AI community.

A theory of intelligence

[Image: https://static001.infoq.cn/resource/image/c7/35/c7bdc2c3a15e23b50647bb644f6e1b35.png]

Some of DeepMind's top scientists recently published a paper (https://bdtechtalks.com/2021/06/07/deepmind-artificial-intelligence-reward-maximization/) in which they hypothesize that a single reward, pursued through reinforcement learning, is enough to eventually reach artificial general intelligence (https://bdtechtalks.com/2020/05/13/what-is-artificial-general-intelligence-agi/), or AGI. The scientists argue that an intelligent agent with the right reward can develop all kinds of capabilities, such as perception and natural language understanding.

Although DeepMind's new approach still requires training reinforcement learning agents on multiple hand-designed rewards, it is consistent with the lab's general view of reaching AGI through reinforcement learning.

"What DeepMind shows with this paper is that a single RL agent can develop the intelligence to reach multiple goals," Chris Nicholson, CEO of Pathmind, told TechTalks. "And the skills it learns in accomplishing one thing can generalize to other goals. That is very similar to how human intelligence is applied. For example, we learn to grab and manipulate objects, and that extends to swinging a hammer or even making a bed."

Nicholson also believes other aspects of the findings hint at progress toward general intelligence. "Parents will recognize that open-ended exploration is precisely how their toddlers learn to move through the world. They take things out of cabinets and put them back in. They invent their own small goals, which may seem meaningless to adults, and master them," he said. "DeepMind is programmatically setting goals for its agents in this world, and those agents are learning to master them one by one."

Nicholson said the reinforcement learning agents are also showing signs of developing embodied intelligence (https://bdtechtalks.com/2021/04/26/reinforcement-learning-embodied-ai/) in their virtual world, much like the kind humans have. "This is yet another indication that the rich and malleable environments in which people learn to move and manipulate objects are conducive to the emergence of general intelligence, and that biological and physical analogies for intelligence can guide further work in AI," he said.
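The "reward is enough" hypothesis can be made concrete with the smallest possible example: a tabular Q-learning agent whose only feedback is a single scalar reward. This toy corridor task is purely illustrative and is far removed from DeepMind's actual architecture:

```python
import random

# Toy corridor: states 0..4, start at 0, a single scalar reward of 1.0
# is given only on reaching state 4. Reward is the agent's sole feedback.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, 1)  # step left, step right

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
rng = random.Random(0)
alpha, gamma, epsilon = 0.5, 0.9, 0.3

for _ in range(500):  # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        nxt, r, done = step(s, a)
        # Q-learning update, driven only by the scalar reward.
        target = r + gamma * max(q[(nxt, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])
        s = nxt

# The learned greedy policy walks straight toward the goal.
policy = [max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES)]
print(policy)
```

Even in this toy setting, a useful behavior (walking to the goal) emerges from reward maximization alone; the open question DeepMind's work probes is how far that principle scales when the environment is as rich as XLand.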
Sathyanaraya Raghavachary, associate professor of computer science at the University of Southern California, is skeptical of some of the claims in DeepMind's paper, especially the conclusions about body awareness, awareness of time, and high-level understanding of goals and environments.

"Even we humans are not fully aware of our bodies, let alone those VR agents," Raghavachary said in comments to TechTalks, noting that an integrated brain-body sense of self requires both an appropriate degree of proprioception and an awareness of one's location in space. "Same with the passage of time: the brain needs memories of the past and a sense of the present relative to it. What the authors might mean is that the agents track the gradual changes their actions produce in the environment (for example, as a result of moving a purple pyramid), changes of state generated by the underlying physics simulator."

Raghavachary also points out that if the agents could truly understand the high-level structure of their tasks, they would not need 200 billion steps of simulated training to reach optimal results.

"The underlying architecture lacks what it would take for these three things (body awareness, awareness of time, understanding the high-level structure of tasks) to emerge, as they themselves note in the conclusion," he said. "In sum, XLand is just 'more of the same.'"

The gap between simulation and the real world

In short, the paper argues that if you can create a sufficiently complex environment, design the right reinforcement learning architecture, and spare no expense on computation so that your models can accumulate enough experience, you can generalize across different kinds of tasks within the same environment. This is, in effect, how humans and animals developed intelligence (https://bdtechtalks.com/2021/06/17/evolution-rewards-artificial-intelligence/) through natural evolution.

In fact, DeepMind has already done something similar with AlphaZero (https://bdtechtalks.com/2019/01/02/humanizing-ai-deep-learning-alphazero/), a reinforcement learning model that mastered multiple two-player, turn-based games. The XLand experiment extends the same concept to a higher level by adding the element of zero-shot learning.

Agents trained in XLand may eventually apply their experience to real-world applications such as robotics and self-driving cars, but the authors do not claim this will amount to a breakthrough in those fields. We will probably still need to make compromises: manually imposing limits that reduce the complexity of the real environment, or adding artificial enhancements such as instilling prior knowledge into the machine learning models or equipping them with extra sensors.

DeepMind's reinforcement learning agents may reign supreme in the virtual world of XLand, but these simulated scenarios are only the tip of the iceberg compared with the complexity of the real world. The gap between simulation and reality will keep challenging AI agents for a long time to come.

Original article: https://bdtechtalks.com/2021/08/02/deepmind-xland-deep-reinforcement-learning/