An Investigation of Model-Free Planning

Original

2022-05-26 13:42

Published: 2019 (ICML 2019)
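Key points: This paper uses experiments to explore what should count as planning. Earlier work on planning usually either specifies an explicit planning algorithm, such as Monte Carlo rollouts or MCTS, or embeds planning-like structure inside the network, as in VIN or ATreeC. The authors argue that no such dedicated planning machinery is needed: networks with recurrent (temporal) structure, such as LSTMs, can exhibit planning behavior on their own.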
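Concretely, the authors first define what planning is. The classical definition requires some kind of look-ahead mechanism. The authors instead argue that the important thing is not the mechanism but the foresight that planning provides. In their view, a planning agent generalizes well ("First, an effective planning algorithm should be able to generalize with relative ease to different situations"), learns efficiently from small amounts of data ("Second, a planning agent should be able to learn efficiently from relatively small amounts of data"), and makes good use of extra computation time ("Third, an effective planning algorithm should be able to make good use of additional thinking time."). Put bluntly: if it performs well, it counts as planning, and whether there is any actual planning mechanism inside does not matter.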
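The authors then simply build a ConvLSTM-based network, the Deep Repeated ConvLSTM (DRC) architecture, and run experiments varying how many recurrent layers are stacked (depth) and how many times they are repeated per time step. Their conclusion is that this, too, is planning, and that it works well.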
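To make the DRC idea concrete, here is a minimal NumPy sketch under a simplified wiring, which is an assumption rather than the paper's exact design: D ConvLSTM layers are stacked, the whole stack is unrolled N times per environment step, every layer receives the observation, and the first layer also receives the top layer's previous hidden state. The class names, channel counts, grid size, and initialization are hypothetical, and further details of the paper's architecture are omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w):
    """3x3 convolution (cross-correlation, as in deep learning libraries) with zero padding.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    _, h, wd = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            patch = padded[:, i:i + h, j:j + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


class ConvLSTMCell:
    """Standard ConvLSTM cell: all four gates come from convolving [input, hidden]."""

    def __init__(self, in_ch, hid_ch, rng):
        self.w = rng.normal(0.0, 0.1, size=(4 * hid_ch, in_ch + hid_ch, 3, 3))
        self.b = np.zeros((4 * hid_ch, 1, 1))

    def __call__(self, x, state):
        h, c = state
        gates = conv2d_same(np.concatenate([x, h], axis=0), self.w) + self.b
        i, f, o, g = np.split(gates, 4, axis=0)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c


class DRC:
    """Simplified DRC(D, N): D stacked ConvLSTM layers, run for N ticks per environment step."""

    def __init__(self, in_ch, hid_ch, depth, repeats, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: each layer sees the observation plus the hidden state "below" it.
        self.cells = [ConvLSTMCell(in_ch + hid_ch, hid_ch, rng) for _ in range(depth)]
        self.hid_ch = hid_ch
        self.repeats = repeats

    def initial_state(self, height, width):
        shape = (self.hid_ch, height, width)
        return [(np.zeros(shape), np.zeros(shape)) for _ in self.cells]

    def step(self, x, state):
        for _ in range(self.repeats):      # N internal "thinking" ticks per action
            below = state[-1][0]           # first layer sees the top layer's previous output
            new_state = []
            for cell, s in zip(self.cells, state):
                h, c = cell(np.concatenate([x, below], axis=0), s)
                new_state.append((h, c))
                below = h
            state = new_state
        return state[-1][0], state


# Hypothetical sizes: 4 input channels, 8 hidden channels, a 10x10 grid, DRC(3, 3).
drc = DRC(in_ch=4, hid_ch=8, depth=3, repeats=3)
obs = np.random.default_rng(1).normal(size=(4, 10, 10))
state = drc.initial_state(10, 10)
out, state = drc.step(obs, state)
print(out.shape)  # (8, 10, 10)
```

The `repeats` loop is what gives the agent extra recurrent computation per action without any explicit search; together with `depth`, it corresponds to the two dimensions the experiments vary.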
Summary: The title is grand and well chosen, but the content does not really live up to it.
Question: This made it into ICML? I am not sure what the reviewers found convincing.
