Book: Reinforcement Learning State-of-the-Art
About the author: a student
state and action
transition function
reward function
Markov Decision Process
policy
The basic workflow of reinforcement learning
Optimality Criteria and Discounting
Before we can talk about algorithms for computing optimal policies, we have to define what that means. That is, we have to define what the model of optimality is.
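As a rough illustration of one common model of optimality, the sketch below computes the discounted return of a finite reward sequence. The function name `discounted_return`, the example rewards, and the choice of the discounted criterion are assumptions made for illustration; they are not taken from the book's text.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t] for a finite reward list.

    `gamma` is the discount factor, assumed to lie in [0, 1].
    """
    total = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]  # hypothetical reward sequence
    for gamma in (0.0, 0.5, 1.0):
        print(gamma, discounted_return(rewards, gamma))
```

With `gamma = 0` only the immediate reward counts, while `gamma = 1` weighs all rewards equally; values in between trade off near and distant rewards.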
Value Functions and Bellman Equations
A value function represents an estimate of how good it is for the agent to be in a certain state (or how good it is to perform a certain action in that state). The notion of "how good" is expressed in terms of an optimality criterion, i.e. in terms of the expected return.
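To make the idea concrete, here is a minimal sketch of estimating a state-value function by repeatedly applying the Bellman expectation backup, assuming the discounted criterion. The two-state MDP `P`, the policy, and the helper names `evaluate_policy` and `q_values` are all hypothetical and invented for this example.

```python
# Hypothetical 2-state, 2-action MDP, invented for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def backup(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, policy, gamma=0.9, tol=1e-10):
    """Iterate V(s) <- sum_s' T(s, pi(s), s') * (R + gamma * V(s')) until stable."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = backup(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update, reused within the same sweep
        if delta < tol:
            return V


def q_values(P, V, s, gamma=0.9):
    """State-action values Q(s, a) derived from a state-value function V."""
    return {a: backup(P, V, s, a, gamma) for a in P[s]}


if __name__ == "__main__":
    policy = {0: 1, 1: 1}  # always choose action 1
    V = evaluate_policy(P, policy)
    print(V)
    print(q_values(P, V, 0))
```

Under this policy, state 1 earns reward 2 forever, so its value approaches 2 / (1 - 0.9) = 20, and state 0 approaches 1 + 0.9 * 20 = 19. The `q_values` helper shows the second reading of the definition: how good a particular action is in a given state.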
greedy policy
Policy Improvement: Fundamental DP Algorithms
Algorithm 1