Model-Free Prediction
Monte-Carlo Reinforcement Learning
- MC methods learn directly from complete episodes of experience to estimate state values
- MC is model-free: no knowledge of MDP transitions / rewards
- MC learns from complete episodes: no bootstrapping
- MC uses the simplest possible idea: value = mean return
- Caveat: can only apply MC to episodic MDPs
- All episodes must terminate so that returns can be averaged
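The idea above (value = mean return over complete episodes) can be sketched as first-visit Monte-Carlo prediction. This is a minimal illustration, not the course's reference implementation: episodes are assumed to be given as lists of `(state, reward)` pairs, where the reward is the one received after leaving that state.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the mean return following the first visit to s.

    episodes: iterable of episodes, each a list of (state, reward) pairs.
    gamma: discount factor.
    """
    returns = defaultdict(list)  # state -> list of observed returns
    for episode in episodes:
        # Index of the first visit to each state in this episode
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            if s not in first_visit:
                first_visit[s] = t
        # Compute the return G_t at every time step, working backwards
        G = 0.0
        returns_at = [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            _, r = episode[t]
            G = r + gamma * G
            returns_at[t] = G
        # Record the return following the first visit to each state
        for s, t in first_visit.items():
            returns[s].append(returns_at[t])
    # value = mean return
    return {s: sum(g) / len(g) for s, g in returns.items()}
```

For example, with episodes `[("A", 1), ("B", 2)]` and `[("A", 3)]` at `gamma=1`, the returns following the first visit to `A` are 3 and 3, so `V(A) = 3`, while `V(B) = 2`. Note that the estimate only becomes available once each episode has terminated, which is why MC applies only to episodic MDPs.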