Reinforcement Learning: An Introduction - Richard S. Sutton Part 0: Introduction

This is only said to be a painless introduction... stick with it to the end and you will gain something, but reading alone is not enough.

Chapter 1 Introduction

Distinguishing features

  • trial-and-error search
  • delayed reward

The agent has to exploit what it has already experienced in order to
obtain reward, but it also has to explore in order to make better
action selections in the future.

Challenges

  • trade-off problem between exploration and exploitation
  • consider the whole problem of a goal-directed agent interacting with an uncertain environment
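The exploration–exploitation trade-off above is often illustrated with an epsilon-greedy rule: with a small probability, try a random action (explore); otherwise take the action whose estimated value is highest (exploit). This is a minimal sketch of that idea, not something from this chapter of the book; the function name and the `values` list are my own.

```python
import random

def epsilon_greedy(values, epsilon=0.1):
    """With probability epsilon, pick a random action index (explore);
    otherwise pick the index of the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])
```

Setting `epsilon=0` gives pure exploitation; raising it trades short-term reward for information about the other actions.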

Elements

  • policy
  • reward
  • value function
  • model of the environment

Without rewards there could be no values, and the only purpose of
estimating values is to achieve more reward. Nevertheless, it is
values with which we are most concerned when making and evaluating
decisions. Action choices are made based on value judgments. We seek
actions that bring about states of highest value, not highest reward,
because these actions obtain the greatest amount of reward for us over
the long run.
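A tiny invented example of "highest value, not highest reward": suppose action A yields immediate reward 1 but leads to a dead-end state, while action B yields reward 0 but leads to a state estimated to be worth 5 in the long run. Judging actions by reward plus the value of the resulting state picks B. The numbers and names here are purely illustrative.

```python
# Hypothetical two-action choice (values and rewards invented for illustration).
rewards = {"A": 1.0, "B": 0.0}            # immediate reward of each action
next_state_value = {"A": 0.0, "B": 5.0}   # estimated value of the state reached

def score(action):
    # Judge an action by its reward plus the value of the state it leads to.
    return rewards[action] + next_state_value[action]

best = max(rewards, key=score)  # "B": lower immediate reward, higher long-run value
```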

Tic-Tac-Toe Example
https://blog.csdn.net/JerryLife/article/details/81385766
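The tic-tac-toe player in the book learns a table of state values and, after each greedy move, nudges the value of the earlier state toward the value of the later one: V(s) ← V(s) + α[V(s') − V(s)]. The sketch below shows only that update rule; the state representation, function names, and defaults are my own, not the book's code.

```python
from collections import defaultdict

def make_value_table(default=0.5):
    # 0.5 is the initial guess for unseen states (1 = win, 0 = loss/draw),
    # as in the book's tic-tac-toe setup.
    return defaultdict(lambda: default)

def td_update(V, s, s_next, alpha=0.1):
    # Move V(s) a fraction alpha toward V(s_next):
    #   V(s) <- V(s) + alpha * (V(s_next) - V(s))
    V[s] += alpha * (V[s_next] - V[s])
    return V[s]
```

Repeating this backup over many games propagates the value of winning positions back to the moves that lead to them.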

Reinforcement learning uses the formal framework of Markov decision
processes to define the interaction between a learning agent and its
environment in terms of states, actions, and rewards. This framework
is intended to be a simple way of representing essential features of
the artificial intelligence problem.
