AI - Reinforcement

Original

2018-08-22 12:18

MDP (Markov Decision Process)

State: S
Action: A
Transition Function

$$T(s, a, s') = P(S_{t+1} = s' \mid S_t = s, A_t = a)$$

Reward Function

R(s), R(s, a), or R(s, a, s′)
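To make the two definitions concrete, here is a minimal sketch of an MDP stored as plain Python data. The state names, action names, probabilities, and rewards are all hypothetical and chosen only for illustration; they are not part of the example that follows.

```python
# Hypothetical two-state MDP, for illustration only.
STATES = ["low", "high"]
ACTIONS = ["wait", "work"]

# T[(s, a)][s_next] = P(S_{t+1} = s_next | S_t = s, A_t = a)
T = {
    ("low", "wait"): {"low": 1.0, "high": 0.0},
    ("low", "work"): {"low": 0.4, "high": 0.6},
    ("high", "wait"): {"low": 0.5, "high": 0.5},
    ("high", "work"): {"low": 0.1, "high": 0.9},
}


def reward(s, a, s_next):
    """R(s, a, s'): the most general form. R(s) and R(s, a) simply
    ignore some of the arguments. The values here are made up."""
    return 1.0 if s_next == "high" else 0.0


def is_valid_transition(T, states, actions, tol=1e-9):
    """For every (s, a), the probabilities over s' must sum to 1."""
    return all(
        abs(sum(T[(s, a)][s2] for s2 in states) - 1.0) <= tol
        for s in states
        for a in actions
    )


def expected_reward(s, a):
    """Expected immediate reward: sum over s' of T(s, a, s') * R(s, a, s')."""
    return sum(p * reward(s, a, s2) for s2, p in T[(s, a)].items())
```

Storing T as a mapping from (s, a) to a distribution over s′ keeps the conditional nature of the transition function explicit: each entry is a full probability distribution, which is what `is_valid_transition` checks.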

If the initial state is taken as the root, the process can be drawn as an AND/OR tree.

Example: suppose the occurrence probabilities for a certain kind of agent are as follows (i: row; j: column):

$$
P_1(i, j) =
\begin{bmatrix}
0.3 & 0.2 & 0.2 & 0.1 & 0.2 \\
0.3 & 0.2 & 0.2 & 0.1 & 0.2 \\
0.3 & 0.2 & 0.2 & 0.1 & 0.2 \\
0.3 & 0.2 & 0.2 & 0.1 & 0.2 \\
0.3 & 0.2 & 0.2 & 0.1 & 0.2
\end{bmatrix}
$$

From the probabilities above and a given scenario, we can derive the transition function:

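$$
T_1(i, j) =
\begin{cases}
\sum_{x=i}^{n} P_1(i, x), & j = 0 \\
P_1(i, i - j), & 0 < j \le i \\
0, & i < j
\end{cases}
$$

Here n is the index of the last column (n = 4 in this example).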

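When j = 0, follow the formula above and add up the highlighted (purple) entries, that is, the entries of row i from column i through the last column. This gives every value in the j = 0 column: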

T1(0,0) = 0.3+0.2+0.2+0.1+0.2 = 1
T1(1,0) = 0.2+0.2+0.1+0.2 = 0.7
T1(2,0) = 0.2+0.1+0.2 = 0.5
…

$$
T_1(i, j) =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0.7 & 0.3 & 0 & 0 & 0 \\
0.5 & 0.2 & 0.3 & 0 & 0 \\
0.3 & 0.2 & 0.2 & 0.3 & 0 \\
0.2 & 0.1 & 0.2 & 0.2 & 0.3
\end{bmatrix}
$$
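The construction of T1 can be checked numerically. The sketch below rebuilds the matrix from P1 under one assumption: for 0 < j ≤ i the entry is taken to be P1(i, i − j), which is the reading that reproduces the matrix shown above. The function name `build_T1` is only an illustrative choice.

```python
# P1 from the example: every row is the same distribution.
P1 = [[0.3, 0.2, 0.2, 0.1, 0.2] for _ in range(5)]


def build_T1(P):
    """Build T1 from P using the piecewise rule:
    j == 0     -> sum of P[i][x] for x = i .. n
    0 < j <= i -> P[i][i - j]   (assumed index convention)
    i < j      -> 0
    """
    n = len(P)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j == 0:
                T[i][j] = sum(P[i][x] for x in range(i, n))
            elif j <= i:
                T[i][j] = P[i][i - j]
            # entries with i < j stay at 0
    return T


T1 = build_T1(P1)
for row in T1:
    print([round(v, 2) for v in row])
```

Each row of the resulting matrix sums to 1, as it should for a transition function: the j = 0 entry collects all the probability mass that the other entries in the row do not use.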

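If there are two such processes, P1 and P2:

In current state s1, action a1 can move the agent to next state s1′.
In current state s2, action a2 can move the agent to next state s2′.

The joint transition probability is then written as the product of the two individual ones: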

$$
\begin{aligned}
T(s, a, s') &= T((s_1, s_2), (a_1, a_2), (s_1', s_2')) \\
&= T_1(s_1 + a_1, s_1') \cdot T_2(s_2 + a_2, s_2')
\end{aligned}
$$

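Suppose the current states are s1 = 2 and s2 = 1, the actions are a1 = 1 and a2 = 0, and the next states are s1′ = 1 and s2′ = 0:

$$
\begin{aligned}
T((2, 1), (1, 0), (1, 0)) &= T_1(2 + 1, 1) \cdot T_2(1 + 0, 0) \\
&= T_1(3, 1) \cdot T_2(1, 0) \\
&= 0.06
\end{aligned}
$$

In the T1 matrix, the entry at row i = 3, column j = 1 is 0.2. Assuming T2(1,0) = 0.3, the result for the example above is:

0.2 ⋅ 0.3 = 0.06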
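The same lookup can be sketched in code. T1 is the matrix derived earlier. T2 is not given in the text, so it is represented here by a dictionary holding only the single assumed entry T2(1,0) = 0.3; the name `joint_transition` is a hypothetical helper. Note that the product 0.2 ⋅ 0.3 evaluates to 0.06.

```python
# T1 as derived in the example (rows i, columns j).
T1 = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.7, 0.3, 0.0, 0.0, 0.0],
    [0.5, 0.2, 0.3, 0.0, 0.0],
    [0.3, 0.2, 0.2, 0.3, 0.0],
    [0.2, 0.1, 0.2, 0.2, 0.3],
]

# T2 is NOT given in the text; only this one entry is assumed.
T2 = {(1, 0): 0.3}


def joint_transition(T1, T2, s, a, s_next):
    """T((s1, s2), (a1, a2), (s1', s2')) = T1(s1 + a1, s1') * T2(s2 + a2, s2')."""
    (s1, s2), (a1, a2), (n1, n2) = s, a, s_next
    return T1[s1 + a1][n1] * T2[(s2 + a2, n2)]


p = joint_transition(T1, T2, s=(2, 1), a=(1, 0), s_next=(1, 0))
print(round(p, 2))  # 0.06
```

Looking up any T2 entry other than (1, 0) raises a `KeyError`, which is deliberate here: it keeps the sketch from silently inventing probabilities the text does not provide.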
