Empowered RL Made Easy

"All Else Being Equal Be Empowered":

Inspired by examples from the animal kingdom, the social sciences and games, the authors propose empowerment, a rather universal utility function defined as the information-theoretic capacity of an agent’s actuation channel.

Organisms may be seen to maintain “essential variables”, like body temperature, sugar levels and pH levels. Homeostasis provides organisms with a local gradient telling them which actions to take or which states to seek. The mechanism itself is universal and quite simple, but the choice of variables and the methods of regulation are not: they are evolved and specific to different phyla.

The unifying theme of these and many other examples is the striving towards situations where in the long term one could do many different things if one wanted to, where one has more control or influence over the world. Predators with better sensors and actuators can hunt better. Having high status in a group of chimpanzees allows one more mating choices. Having a lot of money enables one to engage in more activities. One can choose from an array of options. However, if one doesn’t know what to do, a good rule of thumb is to choose actions leading to higher status, more power, money and control. We will now apply this idea to “embodied” agents.

Empowerment can be seen as the agent’s potential to change the world, that is, how much the agent could do in principle. This is in general different from the actual change the agent inflicts.

Briefly, empowerment is defined as the capacity of the actuation channel of the agent.

The Communication Problem:

There is a sender and a receiver. The sender transmits a signal, denoted by a random variable X, to the receiver, who receives a potentially different signal, denoted by a random variable Y. The communication channel between the sender and the receiver defines how transmitted signals correspond to received signals. In the case of discrete signals the channel can be described by a conditional probability distribution p(y|x).

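Given the channel p(y|x) and a distribution p(x) over transmitted signals, the mutual information I(X;Y) measures how much information the received signal contains about the transmitted signal.

Channel capacity is the maximum of this mutual information over all input distributions p(x), that is, the maximum amount of information the received signal can contain about the transmitted signal. Thus, mutual information is a function of both p(x) and p(y|x), whereas channel capacity is a function of the channel p(y|x) only. Another important difference is that mutual information is symmetric in X and Y and is thus acausal, whereas channel capacity requires complete control over X and is thus asymmetric and causal.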
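
To make the difference concrete, here is a minimal sketch that computes both quantities for a discrete channel. It assumes the channel is stored as a matrix whose row x holds p(y|x), and it uses the Blahut-Arimoto iteration to find the maximizing p(x). That algorithm is a standard choice for discrete channels, not something the paper prescribes, and the function names are made up for this illustration.

```python
import numpy as np

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits for input distribution p_x and channel matrix P[x, y] = p(y|x)."""
    p_x = np.asarray(p_x, dtype=float)
    P = np.asarray(p_y_given_x, dtype=float)
    p_y = p_x @ P                       # marginal distribution of the received signal
    joint = p_x[:, None] * P            # joint distribution p(x, y)
    mask = joint > 0                    # skip zero-probability pairs (0 * log 0 = 0)
    ratio = P[mask] / np.broadcast_to(p_y, P.shape)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

def channel_capacity(p_y_given_x, iters=500, tol=1e-12):
    """Capacity in bits via Blahut-Arimoto; returns (capacity, maximizing p_x)."""
    P = np.asarray(p_y_given_x, dtype=float)
    n_x = P.shape[0]
    p_x = np.full(n_x, 1.0 / n_x)       # start from the uniform input distribution
    for _ in range(iters):
        p_y = p_x @ P
        # d[x] = KL divergence between p(y|x) and p(y), in nats
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(P > 0, np.log(P / p_y), 0.0)
        d = np.sum(P * log_ratio, axis=1)
        new_p_x = p_x * np.exp(d)       # reweight inputs that are more distinguishable
        new_p_x /= new_p_x.sum()
        converged = np.max(np.abs(new_p_x - p_x)) < tol
        p_x = new_p_x
        if converged:
            break
    return mutual_information(p_x, P), p_x

# Binary symmetric channel that flips the transmitted bit 10% of the time.
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
capacity, best_p_x = channel_capacity(bsc)
skewed_mi = mutual_information([0.9, 0.1], bsc)
```

For this channel the capacity is about 0.531 bits and is reached with a uniform p(x), while the skewed input distribution yields less mutual information over the same channel. Mutual information changes with p(x); capacity does not.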

Assuming that the agent is allowed to perform any actions for n time steps, what is the maximum amount of information it can “inject” into the momentary reading of its sensor after these n time steps? The more information can be made to appear in the sensor, the more control or influence the agent has over its sensor.

We need to measure the maximum amount of information the agent could “inject” or transmit into its sensor by performing a sequence of actions of length n.
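
As a rough illustration, consider a toy setting that is an assumption of this sketch rather than something taken from the text above: a deterministic agent on a small square grid with five actions, whose sensor reads its own position. When the world is deterministic, every action sequence leads to exactly one sensor reading, so the capacity of the n-step actuation channel reduces to the logarithm of the number of distinct readings the agent can produce. The names `ACTIONS`, `step` and `empowerment` are hypothetical.

```python
import itertools
import math

# Hypothetical action set: stay, north, south, east, west.
ACTIONS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action, size):
    """Deterministic move on a size x size grid; hitting the boundary leaves the agent in place."""
    x, y = state[0] + action[0], state[1] + action[1]
    if 0 <= x < size and 0 <= y < size:
        return (x, y)
    return state

def empowerment(state, n, size):
    """n-step empowerment in bits, assuming the sensor reads the agent's final position."""
    reachable = set()
    # Enumerate every action sequence of length n and record where it ends.
    for seq in itertools.product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = step(s, a, size)
        reachable.add(s)
    # Deterministic channel: capacity = log2(number of distinct outcomes).
    return math.log2(len(reachable))

center = empowerment((2, 2), 2, 5)   # agent in the middle of a 5 x 5 grid
corner = empowerment((0, 0), 2, 5)   # agent boxed in by two boundaries
```

In the center the agent can reach 13 distinct cells in two steps, in the corner only 6, so empowerment is higher in the center. With noisy dynamics the simple count no longer applies and the capacity of the stochastic channel from action sequences to sensor readings has to be computed instead.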

In the paper titled "Empowerment-driven Exploration using Mutual Information Estimation", empowerment is used as an intrinsic reward to empower reinforcement learning.
