PROCEDURAL GENERALIZATION BY PLANNING WITH SELF-SUPERVISED WORLD MODELS


2022-11-25 13:34

Published: 2022 (ICLR 2022)
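Key points: This paper uses MuZero to measure the generalization ability of model-based agents. It studies three factors: planning, self-supervised representation learning, and procedural data diversity. Planning refers to MCTS. Self-supervised representation learning refers to the learned model, specifically training its representations with self-supervised objectives. Procedural data diversity refers to how many MDPs the training set contains, that is, how many environments or tasks the agent is trained on. More MDPs means greater procedural data diversity, which makes generalization comparatively easier.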
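It then considers two kinds of generalization: procedural and task. Procedural generalization means the reward function stays the same while the observations change, for example the visual rendering, the map layout, or the goal position. Task generalization means the environment stays the same while the reward function (the task) changes.

The conclusion is that MuZero generally performs well on both kinds of generalization. Planning matters for generalization, and the benefit comes mainly from using the model for action selection and policy optimization (while simply learning a value-equivalent model can bring representational benefits, the best results come from also using this model for action selection and/or policy optimization). Self-supervised representation learning helps considerably with performance and data efficiency, and it also reduces how much data diversity is needed. This is because it learns a more accurate world model that captures details more precisely (the primary benefit brought by self-supervision is in learning a more accurate world model and capturing more fine-grained details such as the position of the characters and enemies) and learns more robust representations (self-supervision leads to more robust representations). On ML-45, however, the approach did not work well, and the authors conclude that different generalization problems require different methods.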
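To make the two kinds of generalization concrete, here is a minimal toy sketch in Python. Everything in it is hypothetical and is not the paper's setup or code: the grid `Level`, `generate_level`, the reward functions `reach_goal` and `reach_corner`, and the split helpers `procedural_split` and `task_split`. It only illustrates which part of the MDP changes between training and testing. Here `num_train_levels` stands in for procedural data diversity.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pos = Tuple[int, int]


@dataclass(frozen=True)
class Level:
    """Hypothetical procedurally generated grid level; the seed fixes the layout."""
    seed: int
    size: int
    goal: Pos
    walls: frozenset


def generate_level(seed: int, size: int = 5) -> Level:
    """Derive the layout (goal cell and walls) deterministically from the seed."""
    rng = random.Random(seed)
    start, corner = (0, 0), (size - 1, size - 1)
    # Keep the start cell and the bottom-right corner free of walls.
    free = [(r, c) for r in range(size) for c in range(size) if (r, c) not in (start, corner)]
    goal = rng.choice(free)
    walls = frozenset(rng.sample([p for p in free if p != goal], k=size))
    return Level(seed, size, goal, walls)


# A reward function maps (level, agent position) to a scalar reward.
RewardFn = Callable[[Level, Pos], float]
Env = Tuple[Level, RewardFn]


def reach_goal(level: Level, pos: Pos) -> float:
    """Hypothetical task A: reward for reaching the level's goal cell."""
    return 1.0 if pos == level.goal else 0.0


def reach_corner(level: Level, pos: Pos) -> float:
    """Hypothetical task B: reward for reaching the bottom-right corner instead."""
    return 1.0 if pos == (level.size - 1, level.size - 1) else 0.0


def procedural_split(num_train_levels: int, num_test_levels: int) -> Tuple[List[Env], List[Env]]:
    """Procedural generalization: same reward function, unseen layouts at test time.

    num_train_levels plays the role of procedural data diversity
    (how many distinct MDPs the agent is trained on).
    """
    train = [(generate_level(s), reach_goal) for s in range(num_train_levels)]
    test_seeds = range(num_train_levels, num_train_levels + num_test_levels)
    test = [(generate_level(s), reach_goal) for s in test_seeds]
    return train, test


def task_split(seed: int) -> Tuple[List[Env], List[Env]]:
    """Task generalization: same environment (layout), different reward function."""
    level = generate_level(seed)
    return [(level, reach_goal)], [(level, reach_corner)]


if __name__ == "__main__":
    train, test = procedural_split(num_train_levels=10, num_test_levels=5)
    print(len(train), len(test))  # 10 5
    # Test layouts come from seeds never seen during training.
    print({lv.seed for lv, _ in train} & {lv.seed for lv, _ in test})  # set()
```

In the procedural split only the seeded layout differs between training and testing, and the reward function is shared. In the task split the layout is shared and only the reward function differs. The level counts (10 and 5) are arbitrary illustration values.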
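Summary: The question is a good one, but judging generalization solely by performance, and then explaining the reasons for it, feels somewhat unconvincing. There seems to be no necessary causal link between the two, and the underlying mechanism remains unclear.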
Questions: None.
