The importance of experience replay database composition in deep reinforcement learning


2023-07-27 13:32

Published: 2015 (Deep Reinforcement Learning Workshop, NIPS 2015)
Key points: Building on DDPG, this paper studies how the composition of the experiences in the replay buffer affects performance. One important observation is that suboptimal experiences also help training, and leaving them out hurts performance considerably ("the importance of negative experiences that are not close to an optimal policy"; "training with samples that are insufficiently spread over the state-action space can cause the method to fail"; "when the neural network training data are not varied enough, the network is likely to over fit").
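The authors compare three setups: training DDPG directly, training DDPG on randomly collected samples, and training DDPG on samples collected by the best policy. Training only on samples collected by the best policy performs worst.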
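To make "spread over the state-action space" concrete, here is a minimal sketch that compares how much of a discretized state-action grid is visited by a random policy versus a fixed, well-behaved controller. Everything in it is an assumption for illustration: the 1-D toy dynamics, the proportional controller standing in for a "best policy", and the names `rollout` and `coverage`. It is not the benchmark or the metric used in the paper.

```python
import numpy as np

def rollout(policy, rng, steps=200):
    """Simulate a toy 1-D system x' = clip(x + 0.1 * a, -1, 1); return (state, action) pairs."""
    x = rng.uniform(-1.0, 1.0)
    pairs = []
    for _ in range(steps):
        a = float(np.clip(policy(x, rng), -1.0, 1.0))
        pairs.append((x, a))
        x = float(np.clip(x + 0.1 * a, -1.0, 1.0))
    return np.array(pairs)

def coverage(pairs, bins=10):
    """Fraction of cells in a bins x bins grid over [-1, 1]^2 holding at least one sample."""
    hist, _, _ = np.histogram2d(
        pairs[:, 0], pairs[:, 1], bins=bins, range=[[-1, 1], [-1, 1]]
    )
    return float((hist > 0).mean())

rng = np.random.default_rng(0)

# Hypothetical policies: uniform random actions vs. a controller that drives x to 0.
random_policy = lambda x, rng: rng.uniform(-1.0, 1.0)
expert_policy = lambda x, rng: -2.0 * x

random_data = np.concatenate([rollout(random_policy, rng) for _ in range(20)])
expert_data = np.concatenate([rollout(expert_policy, rng) for _ in range(20)])

print("random policy coverage:", coverage(random_data))
print("expert policy coverage:", coverage(expert_data))
```

In this toy setting the controller's samples all lie on a single curve in the state-action plane, so a critic trained only on them would see almost nothing about what happens off that curve.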

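The authors ran a second experiment to illustrate the importance of diversity (the accompanying figure is not reproduced here).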

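The replay buffer keeps either only the two most recent trajectories as training samples, or one trajectory from the very beginning and the most recent one. Keeping the earliest trial works better; in other words, more diverse samples are still preferable, because they keep the network from overfitting.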
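The two retention rules can be sketched as two tiny trajectory stores. This is only an illustration of the bookkeeping, under the assumption that a trial is an opaque object appended once per episode; the class names `LastTwoBuffer` and `FirstAndLastBuffer` are made up here and do not come from the paper.

```python
from collections import deque

class LastTwoBuffer:
    """Keeps only the two most recent trials."""
    def __init__(self):
        self.trials = deque(maxlen=2)  # older trials are dropped automatically

    def add(self, trial):
        self.trials.append(trial)

    def contents(self):
        return list(self.trials)

class FirstAndLastBuffer:
    """Keeps the very first trial plus the most recent one."""
    def __init__(self):
        self.first = None
        self.last = None

    def add(self, trial):
        if self.first is None:
            self.first = trial   # the first trial is never overwritten
        else:
            self.last = trial    # only the newest slot is replaced

    def contents(self):
        return [t for t in (self.first, self.last) if t is not None]

last_two = LastTwoBuffer()
first_and_last = FirstAndLastBuffer()
for trial_id in range(5):  # stand-in for five training episodes
    last_two.add(trial_id)
    first_and_last.add(trial_id)
```

Both stores hold the same amount of data; the only difference is whether the early, exploratory trial survives.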
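Summary: A simple experiment that tests two extreme settings; the conclusions are reasonable at least on simple tasks. Extending them to more complex tasks may be problematic: as earlier papers have noted, the earliest samples may already be far from the current policy, so updating on them may be of little use. Both diversity and closeness to the current policy (on-policyness) need to be taken into account.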
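One simple way to trade off the two concerns is to draw part of each minibatch from the newest transitions and the rest uniformly from the whole buffer. The sketch below is my own illustration, not something proposed in the paper; the function name `sample_mixed` and the parameters `recent_window` and `recent_fraction` are hypothetical, and the buffer is assumed to be a plain list ordered from oldest to newest.

```python
import random

def sample_mixed(buffer, batch_size, recent_window, recent_fraction, rng):
    """Sample with replacement: a share from the newest transitions, the rest from the full buffer.

    Assumes recent_window >= 1 and a non-empty buffer ordered oldest to newest.
    """
    n_recent = min(int(round(batch_size * recent_fraction)), batch_size)
    recent = buffer[-recent_window:]
    # Near-on-policy part: only the newest transitions.
    batch = [rng.choice(recent) for _ in range(n_recent)]
    # Diversity part: uniform over everything ever stored.
    batch += [rng.choice(buffer) for _ in range(batch_size - n_recent)]
    return batch

buffer = list(range(1000))  # stand-in transitions, index 999 is the newest
batch = sample_mixed(buffer, batch_size=32, recent_window=50,
                     recent_fraction=0.5, rng=random.Random(0))
```

Setting `recent_fraction` to 0 recovers uniform replay, and setting it to 1 uses only recent data, so the two extremes in the experiment above sit at the ends of this knob.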
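Lately I have come to think that the main problem caused by insufficient coverage is extrapolation error: as long as the value function is learned with an in-distribution update, the problems above should not arise.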
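As a rough tabular illustration of what an in-distribution update could look like, the sketch below bootstraps only from actions that actually appear in the data for the next state, instead of maximizing over all actions. This is an assumption-laden toy: the function name `in_distribution_target`, the visit-count table `counts`, and the choice to skip bootstrapping when the next state has no data are all mine, and DDPG itself works with continuous actions rather than a table.

```python
import numpy as np

def in_distribution_target(q, counts, reward, next_state, gamma=0.99):
    """Bootstrapped target that maximizes only over actions observed in next_state."""
    seen = counts[next_state] > 0
    if not seen.any():
        return reward  # assumption: no data at next_state, so do not bootstrap
    return reward + gamma * q[next_state][seen].max()

# Q-table with a spuriously large value for an action that was never tried.
q = np.array([[0.0, 0.0],
              [1.0, 5.0]])
counts = np.array([[1, 1],
                   [3, 0]])  # in state 1, action 1 never appears in the data

naive_target = 0.0 + 0.9 * q[1].max()  # bootstraps from the unseen action
safe_target = in_distribution_target(q, counts, reward=0.0, next_state=1, gamma=0.9)
```

The naive target propagates the unverified value of the unseen action, while the restricted target only uses values that the data can correct.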
Questions: none.
