Asynchronous Methods for Deep Reinforcement Learning: Reading Notes

Tags (space-separated): reinforcement-learning algorithms paper-notes

The paper's contribution is an asynchronous training scheme, which it applies to algorithms such as Q-learning and advantage actor-critic.

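The authors apply asynchronous training to four reinforcement learning algorithms: one-step Sarsa, one-step Q-learning, n-step Q-learning, and advantage actor-critic. Their experiments show that the asynchronous variant of advantage actor-critic performs best, which gives A3C (Asynchronous Advantage Actor-Critic).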

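The asynchronous methods were designed to address a problem with online learning: the training data it produces is non-stationary, and consecutive samples are strongly correlated.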

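The usual remedy is a replay memory (experience replay), which stabilizes training and reduces the correlation between samples. However, replay memory also restricts the algorithm to off-policy methods.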
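To make the replay memory idea concrete, here is a minimal sketch of a uniform-sampling buffer. The class name `ReplayMemory`, its method names, and the transition layout are assumptions chosen for illustration; they are not taken from the paper.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of past transitions, sampled uniformly at random.

    Hypothetical helper for illustration only.
    """

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once the buffer is full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling mixes transitions from different time steps, and
        # therefore from older policies, which is why the learner has to be
        # off-policy.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Every stored transition stays in memory until it is evicted, which is the memory cost the notes refer to.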

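On-policy vs. off-policy:
on-policy: the training data is collected by the latest policy only, not by older policies;
off-policy: the training data is collected by past policies (possibly including the latest one).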
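One way to see the difference is to compare the one-step update targets of Sarsa (on-policy) and Q-learning (off-policy), two of the algorithms the paper covers. The sketch below assumes a tabular Q-function stored as `q[state][action]`; the function names and the small table are hypothetical.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.99):
    """On-policy: bootstrap from the action the current policy actually took."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma=0.99):
    """Off-policy: bootstrap from the greedy action, whichever policy collected the data."""
    return reward + gamma * max(q[next_state])


# Hypothetical Q-table with two states and two actions.
q = [[0.0, 0.0], [1.0, 3.0]]
print(sarsa_target(q, 1.0, next_state=1, next_action=0, gamma=0.5))
print(q_learning_target(q, 1.0, next_state=1, gamma=0.5))
```

The Sarsa target depends on `next_action`, so the data has to come from the policy being learned. The Q-learning target does not, which is what lets it reuse transitions collected by older policies.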

In addition, reinforcement learning needs a large amount of experience, so using a replay memory adds to the memory and compute cost of training.

To address both problems, the strong correlation in the training data and the resources consumed by replay memory, the authors propose asynchronous methods.

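The core idea is to run multiple actor-learners (in effect, multiple agents) in parallel, each playing its own instance of the same game. Because every instance starts from a random initial state, the learners are in different parts of the game at any given moment, so the data they produce is less correlated and learning can stay on-policy.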
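The idea can be sketched with a few threads that share one set of parameters. The sketch below is a toy under loud assumptions: a hypothetical 6-state chain environment, a tabular Q-table in place of a neural network, one-step Q-learning updates applied immediately, and no target network or gradient accumulation. It is not the paper's algorithm, and CPython threads do not run Python code truly in parallel; it only shows the structure of several actor-learners updating shared parameters without a replay memory.

```python
import random
import threading

N_STATES = 6          # hypothetical chain: states 0..5, reward on reaching state 5
ACTIONS = (-1, +1)    # move left / move right
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.2

# Shared parameters: one Q-table read and written by every actor-learner.
shared_q = [[0.0, 0.0] for _ in range(N_STATES)]


def step(state, action_idx):
    """Toy environment dynamics (an assumption, not from the paper)."""
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def actor_learner(worker_id, n_episodes):
    rng = random.Random(worker_id)            # each worker has its own random stream
    for _ in range(n_episodes):
        state = rng.randrange(N_STATES - 1)   # random initial state per episode
        for _ in range(200):                  # cap the episode length
            if rng.random() < EPSILON:
                a = rng.randrange(2)          # explore
            else:
                a = max((0, 1), key=lambda i: shared_q[state][i])  # exploit
            next_state, reward, done = step(state, a)
            target = reward if done else reward + GAMMA * max(shared_q[next_state])
            # Update the shared table directly: no lock and no replay buffer.
            shared_q[state][a] += ALPHA * (target - shared_q[state][a])
            state = next_state
            if done:
                break


workers = [threading.Thread(target=actor_learner, args=(i, 300)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

for s, row in enumerate(shared_q):
    print(s, [round(v, 2) for v in row])
```

Each worker only ever keeps its current transition, so memory use does not grow with the amount of experience collected.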

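Compared with replay memory, asynchronous methods have two advantages:
(1) they work with on-policy algorithms;
(2) they remove the need for large amounts of GPU memory: training can run on a multi-core CPU, which greatly reduces training cost.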

Quotes from the paper:

We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers.

Aggregating over memory in this way reduces non-stationarity and decorrelates updates, but at the same time limits the methods to off-policy reinforcement learning algorithms.

it uses more memory and computation per real interaction; and it requires off-policy learning algorithms that can update from data generated by an older policy.

Instead of experience replay, we asynchronously execute multiple agents in parallel, on multiple instances of the environment.

Keeping the learners on a single machine removes the communication costs of sending gradients and parameters and enables us to use Hogwild! (Recht et al., 2011) style updates for training.

The authors then apply asynchronous methods to one-step Q-learning, n-step Q-learning, and advantage actor-critic.
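Both n-step Q-learning and advantage actor-critic build their update targets from n-step returns, accumulated backwards over a short rollout as R = r_i + gamma * R. The sketch below shows that computation and the resulting advantage estimate; the function names and the example numbers are hypothetical.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout segment, computed backwards.

    rewards[i] is the reward received after step i; bootstrap_value is the
    estimated value of the state after the last step (0.0 if it was terminal).
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage estimate for the actor: n-step return minus the critic's value."""
    return [g - v for g, v in zip(returns, values)]


returns = n_step_returns([1.0, 0.0, 2.0], bootstrap_value=4.0, gamma=0.5)
print(returns)                                # [2.0, 2.0, 4.0]
print(advantages(returns, [1.0, 1.0, 1.0]))   # [1.0, 1.0, 3.0]
```

In n-step Q-learning the returns serve as regression targets for Q(s, a). In advantage actor-critic the advantages weight the policy gradient while the returns train the value estimate.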
