It's only said to be a painless introduction... you will gain from reading it all the way through, but reading alone is not enough.
Chapter 1 Introduction
Distinguishing features
- trial-and-error search
- delayed reward
The agent has to exploit what it has already experienced in order to
obtain reward, but it also has to explore in order to make better
action selections in the future.
Challenges
- trade-off problem between exploration and exploitation
- consider the whole problem of a goal-directed agent interacting with an uncertain environment
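The exploration/exploitation trade-off above can be sketched with an epsilon-greedy rule, a standard technique for balancing the two (the action names and value numbers here are made up for illustration):

```python
import random

def epsilon_greedy(values, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest estimated value (exploit).
    `values` maps action -> current value estimate."""
    if random.random() < epsilon:
        return random.choice(list(values))   # explore
    return max(values, key=values.get)       # exploit

# With epsilon=0 the agent always exploits its current estimates.
estimates = {"left": 0.2, "right": 0.5}
choice = epsilon_greedy(estimates, epsilon=0.0)  # -> "right"
```

Setting epsilon > 0 sacrifices some immediate reward in exchange for better value estimates later, which is exactly the trade-off the chapter describes.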
Elements
- policy
- reward
- value function
- model of the environment
Without rewards there could be no values, and the only purpose of
estimating values is to achieve more reward. Nevertheless, it is
values with which we are most concerned when making and evaluating
decisions. Action choices are made based on value judgments. We seek
actions that bring about states of highest value, not highest reward,
because these actions obtain the greatest amount of reward for us over
the long run.
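The distinction between reward and value can be made concrete with a hypothetical two-choice example (the action names and numbers are invented): action "a" yields the higher immediate reward, but "b" leads to a state with higher estimated value, i.e. more reward over the long run.

```python
# Hypothetical numbers: immediate reward vs. estimated long-term value.
immediate_reward = {"a": 1.0, "b": 0.0}
state_value      = {"a": 0.5, "b": 5.0}  # estimated total future reward

greedy_on_reward = max(immediate_reward, key=immediate_reward.get)  # "a"
greedy_on_value  = max(state_value, key=state_value.get)            # "b"
```

An agent that judges actions by value rather than immediate reward picks "b", accepting a smaller reward now for a larger return later.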
Tic-Tac-Toe Example
https://blog.csdn.net/JerryLife/article/details/81385766
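The tic-tac-toe player in the book learns state values with a temporal-difference update: after each greedy move, the value of the earlier state is nudged toward the value of the later state, V(s) <- V(s) + alpha * (V(s') - V(s)). A minimal sketch (the 0.5 default stands for the book's initial guess of a 50% win probability):

```python
def td_update(V, s, s_next, alpha=0.1):
    """Move V(s) a fraction alpha toward V(s_next).
    V maps board states to estimated win probabilities;
    unseen states default to 0.5."""
    v, v_next = V.get(s, 0.5), V.get(s_next, 0.5)
    V[s] = v + alpha * (v_next - v)
    return V[s]

# If a state leads to a known win (value 1.0), its estimate rises:
V = {"winning_state": 1.0}
td_update(V, "some_state", "winning_state")  # 0.5 -> 0.55
```

Repeated over many games, these backups propagate the value of terminal positions to the states that precede them.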
Reinforcement learning uses the formal framework of Markov decision
processes to define the interaction between a learning agent and its
environment in terms of states, actions, and rewards. This framework
is intended to be a simple way of representing essential features of
the artificial intelligence problem.
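The MDP interaction loop of states, actions, and rewards can be sketched as follows; the environment here is a toy random walk I made up (reach state 3 for reward, episode ends at -3 or 3), not an example from the book:

```python
import random

def step(state, action):
    """Toy environment: move left or right on the integer line.
    Returns (next_state, reward, done)."""
    next_state = state + (1 if action == "right" else -1)
    reward = 1.0 if next_state == 3 else 0.0   # reward only at the goal
    done = next_state in (-3, 3)               # episode ends at either edge
    return next_state, reward, done

# Agent-environment loop: observe state, act, receive reward and next state.
state, total_reward = 0, 0.0
for t in range(100):
    action = random.choice(["left", "right"])  # a trivial random policy
    state, reward, done = step(state, action)
    total_reward += reward
    if done:
        break
```

A learning agent would replace the random policy with one that improves from the observed rewards; the loop structure itself is what the MDP framework formalizes.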