The Predictron: End-To-End Learning and Planning

Original

2023-04-03 13:32

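Published: 2017 (ICML 2017)
Key points: This paper proposes an architecture called the Predictron, which learns over abstract states and is trained end to end, using multiple planning depths to make the model self-consistent. The setting is a Markov reward process (MRP), not an MDP, so there are no actions, only state transitions. The full model consists of a state representation (an encoder), a model that performs the state transitions, and a value function. The core idea is that 1-step planning and k-step planning should ultimately arrive at the same value; even a \(\lambda\)-return should still predict the same quantity. When the model is trained, all of these targets are learned together. For example, one can train on the k-step prediction alone.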
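A hedged sketch of the k-step objective, written in the notation of the Predictron paper as best it can be reconstructed (the symbols are assumptions and do not appear in this post): \(r^{j}\), \(\gamma^{j}\) and \(v^{k}\) are the reward, discount and value the model produces at internal step \(j\) or \(k\); \(g\) is the true return sampled from the environment; \(g^{k}\) is the k-step prediction, which the paper calls a preturn; \(\mathbb{E}_{p}\) and \(\mathbb{E}_{m}\) denote expectation under the environment and under the model.

\[
g^{k} = r^{1} + \gamma^{1}\left(r^{2} + \gamma^{2}\left(\cdots + \gamma^{k-1}\left(r^{k} + \gamma^{k} v^{k}\right)\cdots\right)\right),
\qquad
L^{k} = \frac{1}{2}\left\lVert \mathbb{E}_{p}[g \mid s] - \mathbb{E}_{m}[g^{k} \mid s] \right\rVert^{2}
\]

With \(k = 0\) this reduces to ordinary value regression, since \(g^{0} = v^{0}\).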

Training on every prediction from step 0 through step K is another option.
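A hedged sketch of this objective, again assuming the paper's notation (\(g\) is the true return, \(g^{k}\) the k-step preturn, \(\mathbb{E}_{p}\) and \(\mathbb{E}_{m}\) expectations under the environment and the model). The normalization constant is recalled from the paper and should be checked against it.

\[
L^{0:K} = \frac{1}{2K}\sum_{k=0}^{K}\left\lVert \mathbb{E}_{p}[g \mid s] - \mathbb{E}_{m}[g^{k} \mid s] \right\rVert^{2}
\]

Every depth from 0 to K is pushed toward the same target \(g\), which is what ties the intermediate rewards, discounts and values together.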

Training on the \(\lambda\)-return is another.
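A hedged sketch of the \(\lambda\)-weighted objective, assuming the paper's notation: \(\lambda^{k}\) is a per-step mixing weight produced by the network (the superscript is an index, not a power), \(g^{k}\) is the k-step preturn, and \(g\) is the true return.

\[
g^{\lambda} = \sum_{k=0}^{K} w^{k} g^{k},
\qquad
w^{k} =
\begin{cases}
(1-\lambda^{k})\prod_{j=0}^{k-1}\lambda^{j}, & k < K \\
\prod_{j=0}^{K-1}\lambda^{j}, & k = K
\end{cases}
\]

\[
L^{\lambda} = \frac{1}{2}\left\lVert \mathbb{E}_{p}[g \mid s] - \mathbb{E}_{m}[g^{\lambda} \mid s] \right\rVert^{2}
\]

The weights \(w^{k}\) sum to 1, so \(g^{\lambda}\) is a convex combination of the k-step preturns.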

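In the end these targets all estimate the same quantity, so they can also be fit to one another, for example by regressing the individual predictions toward the \(\lambda\)-return.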
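A hedged sketch of this self-consistency objective, assuming the paper's notation (\(g^{k}\) the k-step preturn, \(g^{\lambda}\) the \(\lambda\)-weighted preturn, \(\mathbb{E}_{m}\) expectation under the model). As recalled from the paper, \(g^{\lambda}\) is treated as a fixed target, so gradients flow only through each \(g^{k}\).

\[
L = \frac{1}{2}\sum_{k=0}^{K}\left\lVert \mathbb{E}_{m}[g^{\lambda} \mid s] - \mathbb{E}_{m}[g^{k} \mid s] \right\rVert^{2}
\]

No true return \(g\) appears here, so this term can be computed on states without labels.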

And that is the whole method.
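To make the quantities above concrete, here is a minimal NumPy sketch that computes the k-step predictions, the \(\lambda\)-weighted prediction, and a consistency loss from per-step model outputs. It is an illustration under assumptions, not the paper's implementation: the function names (`preturns`, `lambda_weights`, `lambda_preturn`, `consistency_loss`) are hypothetical, the inputs are plain scalar lists standing in for what the model would emit, and the \(\lambda\) weighting follows the paper's definition as best recalled.

```python
import numpy as np


def preturns(rewards, discounts, values):
    """Return [g^0, ..., g^K], the k-step predictions (preturns).

    rewards[j] and discounts[j] hold r^{j+1} and gamma^{j+1} (length K);
    values[k] holds v^k (length K + 1). All names are illustrative.
    """
    K = len(rewards)
    out = []
    for k in range(K + 1):
        g = values[k]                      # bootstrap from the value at depth k
        for j in reversed(range(k)):       # fold rewards back toward depth 0
            g = rewards[j] + discounts[j] * g
        out.append(g)
    return np.array(out, dtype=float)


def lambda_weights(lambdas):
    """Weights w^0..w^K that mix the k-step preturns; they sum to 1."""
    w, carry = [], 1.0
    for lam in lambdas:
        w.append((1.0 - lam) * carry)      # mass that stops at this depth
        carry *= lam                       # mass that continues deeper
    w.append(carry)                        # remaining mass goes to depth K
    return np.array(w, dtype=float)


def lambda_preturn(rewards, discounts, values, lambdas):
    """Lambda-weighted mixture of the k-step preturns."""
    return float(lambda_weights(lambdas) @ preturns(rewards, discounts, values))


def consistency_loss(g_lambda, g_k):
    """0.5 * sum_k (g^lambda - g^k)^2, with g^lambda used as a fixed target."""
    return 0.5 * float(np.sum((g_lambda - np.asarray(g_k)) ** 2))


if __name__ == "__main__":
    # Made-up outputs for a depth-2 rollout (K = 2).
    r, gam, v = [1.0, 2.0], [0.5, 0.5], [3.0, 4.0, 6.0]
    g_k = preturns(r, gam, v)
    g_lam = lambda_preturn(r, gam, v, lambdas=[0.5, 0.5])
    print(g_k, g_lam, consistency_loss(g_lam, g_k))
```

In a real model these quantities would be tensors produced by the encoder, the transition model and the value head, and the losses would be minimized by gradient descent; the sketch only shows how the targets are assembled.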
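Summary: The setting is a Markov reward process, so there is no policy; the whole procedure learns only the model and the value function.
Open question: I don't quite see where the novelty lies; perhaps it is mainly that the work came relatively early.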
