It's only said to be a painless introduction… sticking with it pays off, but reading alone is not enough.
Chapter 1 Introduction
Distinguishing features
- trial-and-error search
- delayed reward
The agent has to exploit what it has already experienced in order to
obtain reward, but it also has to explore in order to make better
action selections in the future.
Challenges
- the trade-off between exploration and exploitation
- consider the whole problem of a goal-directed agent interacting with an uncertain environment
Elements
- policy
- reward
- value function
- model of the environment
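The four elements above can be pictured as minimal data structures. This is only an illustrative sketch; the function names and the toy environment are my own, not the book's:

```python
import random

# Policy: a mapping from state to action (here: a uniform random choice).
def policy(state, actions):
    return random.choice(actions)

# Reward: a scalar signal the environment emits at each step.
def reward(state, action):
    return 1.0 if state == "goal" else 0.0

# Value function: expected long-run reward from each state, kept as a table.
value = {}  # state -> estimated value

# Model: predicts the next state for a (state, action) pair,
# enabling planning without acting in the real environment.
def model(state, action):
    return "goal" if action == "forward" else state
```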
Without rewards there could be no values, and the only purpose of
estimating values is to achieve more reward. Nevertheless, it is
values with which we are most concerned when making and evaluating
decisions. Action choices are made based on value judgments. We seek
actions that bring about states of highest value, not highest reward,
because these actions obtain the greatest amount of reward for us over
the long run.
Tic-Tac-Toe Example
https://blog.csdn.net/JerryLife/article/details/81385766
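The book's tic-tac-toe player learns a value table by temporal-difference backups: after each greedy move, V(s) ← V(s) + α[V(s') − V(s)], with winning states valued 1. A condensed sketch of that idea; the step size, ε, the random opponent, and scoring draws as 0 are my choices, not prescribed by the text:

```python
import random
from collections import defaultdict

ALPHA, EPSILON = 0.1, 0.1
V = defaultdict(lambda: 0.5)          # state -> estimated chance 'X' wins

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def moves(b):
    return [i for i, c in enumerate(b) if c == ' ']

def play_game():
    b = [' '] * 9
    prev = None                        # last state 'X' moved into
    while True:
        # 'X' moves greedily w.r.t. V, exploring with probability EPSILON.
        opts = moves(b)
        if random.random() < EPSILON:
            m = random.choice(opts)
        else:
            m = max(opts, key=lambda i: V[tuple(b[:i] + ['X'] + b[i+1:])])
        b[m] = 'X'
        s = tuple(b)
        if prev is not None:           # TD backup: V(prev) toward V(s)
            V[prev] += ALPHA * (V[s] - V[prev])
        prev = s
        if winner(b) == 'X':
            V[s] += ALPHA * (1.0 - V[s])
            return 'X'
        if not moves(b):
            V[s] += ALPHA * (0.0 - V[s])
            return 'draw'
        b[random.choice(moves(b))] = 'O'   # random opponent
        if winner(b) == 'O':
            V[prev] += ALPHA * (0.0 - V[prev])
            return 'O'
        if not moves(b):
            V[prev] += ALPHA * (0.0 - V[prev])
            return 'draw'

for _ in range(5000):
    play_game()
```

After a few thousand self-play games the table steers 'X' toward moves whose resulting states have high estimated winning probability, which is exactly the "choose the highest-value state, not the highest-reward move" idea above.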
Reinforcement learning uses the formal framework of Markov decision
processes to define the interaction between a learning agent and its
environment in terms of states, actions, and rewards. This framework
is intended to be a simple way of representing essential features of
the artificial intelligence problem.
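That state-action-reward interaction can be sketched as a loop. The toy four-state chain environment and all names here are my own illustration, not from the book:

```python
import random

STATES, ACTIONS = range(4), ["left", "right"]

def step(state, action):
    """Environment: maps (s, a) to (s', r). Reaching state 3 pays 1."""
    nxt = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == 3 else 0.0)

state, total = 0, 0.0
for t in range(10):
    action = random.choice(ACTIONS)    # agent: here, a random policy
    state, r = step(state, action)     # environment responds with s', r
    total += r                         # return = accumulated reward
```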