Problem description
This problem comes from one of my assignments:
Construct a binary tree (each node has two child nodes) of depth and assign different values to each of the leaf-nodes. Implement the MCTS algorithm and apply it to the above tree to search for the optimal value.
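Let's see whether MCTS can find the largest leaf node.
Defining the classes and functions
We need to define a few classes and functions to implement this assignment.
The node class
The node class holds the node number key, the left and right children left and right, the parent p, the visit count n, and the accumulated reward t.
It also has two methods: one checks whether the node is a leaf (has no children), and the other computes the node's UCB (Upper Confidence Bound) value. To some extent, UCB simply encourages you to try the paths you have not taken yet, so that the search does not converge too early.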
import math

# Python implementation to construct a Binary Tree from
# parent array

# A node structure
class Node:
    # A utility function to create a new node
    def __init__(self, key, t, n):
        self.key = key      # node number
        self.left = None    # left child
        self.right = None   # right child
        self.t = t          # accumulated reward
        self.n = n          # visit count
        self.p = None       # parent node

    # Despite the name, this sets the parent
    def get_parent(self, p):
        self.p = p

    # True if the node has no children
    def judge_leave_nodes(self):
        return self.left is None and self.right is None

    def calculate_UCB(self):
        C = 0.2
        if self.n == 0:
            # Unvisited nodes get a large bonus so they are tried first
            if self.judge_leave_nodes():
                UCB = self.t + 1000
            else:
                UCB = 1000
        else:
            UCB = self.t / self.n + C * math.sqrt(math.log(self.p.n) / self.n)
        return UCB
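To see what calculate_UCB computes for a node that has already been visited, here is a minimal standalone sketch of the same formula. The function name ucb and its parameters are my own and not part of the class above, and it only covers the self.n > 0 branch.

```python
import math

def ucb(total_reward, visits, parent_visits, c=0.2):
    """UCB score for a visited node: mean reward plus exploration bonus."""
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus

# Same mean reward (2.0), but the second node has been visited less often,
# so its exploration bonus, and therefore its score, is larger.
often = ucb(total_reward=20, visits=10, parent_visits=12)
rarely = ucb(total_reward=4, visits=2, parent_visits=12)
print(often, rarely)
```

With equal means, the less-visited node wins, which is the "try the paths you have not taken" effect described above.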
Functions for creating nodes and the tree
Define the functions that build the tree.
def createNode(parent, T, N, i, created, root):
    # If this node is already created
    if created[i] is not None:
        return
    # Create a new node and set created[i]
    created[i] = Node(i, T[i], N[i])
    # If 'i' is root, change root pointer and return
    if parent[i] == -1:
        root[0] = created[i]  # root[0] denotes root of the tree
        return
    # If parent is not created, then create parent first
    # (pass the full T and N lists and the parent's index)
    if created[parent[i]] is None:
        createNode(parent, T, N, parent[i], created, root)
    # Find parent pointer
    p = created[parent[i]]
    created[i].get_parent(p)
    # If this is first child of parent
    if p.left is None:
        p.left = created[i]
    # If second child
    else:
        p.right = created[i]

# Creates tree from parent[0..n-1] and returns root of the
# created tree. Note: T and N are read from module-level variables.
def createTree(parent):
    n = len(parent)
    # Create an array created[] to keep track
    # of created nodes, initialize all entries as None
    created = [None for i in range(n + 1)]
    root = [None]
    for i in range(n):
        createNode(parent, T, N, i, created, root)
    return root[0]
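The parent array used by createTree stores, for each node i, the index of its parent, with -1 marking the root. As a small illustration (children_from_parents is a hypothetical helper, not part of the code above), the same kind of array can be decoded into child lists without building Node objects. The first child encountered plays the role of the left child, as in createNode.

```python
def children_from_parents(parent):
    """Map each node index to the list of its children, in array order."""
    children = {i: [] for i in range(len(parent))}
    root_index = None
    for i, p in enumerate(parent):
        if p == -1:
            root_index = i
        else:
            children[p].append(i)
    return root_index, children

root_index, children = children_from_parents([-1, 0, 0, 1, 1, 2, 2])
print(root_index)   # 0
print(children[0])  # [1, 2]
print(children[2])  # [5, 6]
```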
In-order traversal of the tree
This walks the tree in order, just to display it 🙂
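# Inorder traversal of tree
def inorder(root):
    if root is not None:
        inorder(root.left)
        # Print the parent's key (None for the root) rather than the object
        print(root.key, root.t, root.n, root.p.key if root.p is not None else None)
        inorder(root.right)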
The MCTS function
It contains the four MCTS steps: Selection, Expansion, Simulation and Backpropagation.
def MCTS(root):
    node = root
    Rollout_node = None
    Previous_Rollout_node = root
    number_of_rollout = 0
    # stop once two consecutive rollouts choose the same path
    while Rollout_node is not Previous_Rollout_node:
        # Selection: descend until we reach a leaf node
        while not node.judge_leave_nodes():
            left = node.left
            right = node.right
            UCB_left = left.calculate_UCB()
            UCB_right = right.calculate_UCB()
            if UCB_left >= UCB_right:
                node = left
            else:
                node = right
        # the node is the leaf node now, a rollout complete
        Previous_Rollout_node = Rollout_node
        Rollout_node = node
        number_of_rollout += 1
        #print ("Inorder Traversal of constructed tree")
        #print(inorder(root))
        # Backpropagation: add the leaf value to every ancestor;
        # node ends up back at the root for the next rollout
        while node.p is not None:
            node.p.t += Rollout_node.t
            node.p.n += 1
            node = node.p
    node = Rollout_node
    #print("The best path:")
    #while node.p is not None:
    #    print("%d<-" %(node.key), end='')
    #    node = node.p
    #print(root.key)
    #print("The Reward: %d" %(Rollout_node.t))
    print("The Number of Rollout: %d" % number_of_rollout)
    # Return both values so callers can unpack (reward, number_of_rollout)
    return Rollout_node.t, number_of_rollout
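The stopping condition I use here is taking the same path twice in a row, which means no other path stands a chance of being tried any more. I think that makes a lot of sense.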
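The stopping rule can be looked at on its own, separately from the tree. The sketch below is only an illustration: rollouts_until_repeat, pick_leaf and the max_rollouts safety cap are hypothetical names that do not appear in the MCTS function, and the sequence of leaves is made up.

```python
def rollouts_until_repeat(pick_leaf, max_rollouts=10_000):
    """Call pick_leaf() until it returns the same leaf twice in a row.

    Returns (leaf, number_of_rollouts). max_rollouts is a safety cap that
    the original loop does not have.
    """
    previous = None
    for count in range(1, max_rollouts + 1):
        leaf = pick_leaf()
        if leaf == previous:
            return leaf, count
        previous = leaf
    return previous, max_rollouts

# Hypothetical sequence of leaves chosen by successive rollouts.
sequence = iter([3, 5, 3, 6, 6, 4])
leaf, count = rollouts_until_repeat(lambda: next(sequence))
print(leaf, count)  # 6 5
```

Note that the rule only looks at consecutive rollouts: returning to leaf 3 after trying leaf 5 does not stop the search.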
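Testing it
Initialising a tree
We start with a simple tree. Apart from the last few entries of T (the leaf values), which we choose ourselves, all the other numbers follow a regular pattern determined by the depth.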
# Each node is shown as key: t n
#
#                0: 0 0
#              /        \
#        1: 0 0          2: 0 0
#        /    \          /     \
#  3: 2 0   4: 1 0   5: -1 0   6: 3 0
parent = [-1, 0, 0, 1, 1, 2, 2]
T = [0, 0, 0, 2, 1, -1, 3]
N = [0, 0, 0, 0, 0, 0, 0]
root = createTree(parent)
print("Inorder Traversal of constructed tree")
inorder(root)
MCTS(root)
After 3 steps, this is the path it finally settles on, and the value obtained is 2.
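Is something wrong with MCTS?
I think the example above already shows a weakness of MCTS. At the start nothing has been visited, so the search goes left to 1 and then left to 3, which is fine. The second time, the right side has not been visited, so it goes to 2, and since nothing below has been visited either, it goes left to 5. Here is the problem: the value of 2 has now been pulled down by 5. In other words, the poor result at 5 drags down 2, and as a consequence 6 (which is actually the best leaf) never gets a chance to be visited. That is the problem with MCTS.
That said, this happens because the values of 5 and 6 differ so much, and in practice I suspect it would not. In Go, for example, my guess is that if your previous move was bad, your next move is likely to be poor whatever you play? (Hard to say.)
But the point stands: in MCTS, sibling nodes influence each other to some extent, and can even drag each other down.
Still, since we do not visit everything, we have to accept this risk 😏. You can't have it all ┓( ´∀` )┏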
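To put numbers on the walkthrough above, the sketch below assumes exactly the situation it describes: one rollout has ended at leaf 3 (value 2) under node 1 and one at leaf 5 (value -1) under node 2, so the root has been visited twice. The helper ucb and C = 0.2 mirror calculate_UCB, but the visit counts are my assumption taken from the prose, not output from the code.

```python
import math

def ucb(total_reward, visits, parent_visits, c=0.2):
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Node 1 has seen leaf 3 (value 2); node 2 has seen leaf 5 (value -1).
ucb_node_1 = ucb(total_reward=2, visits=1, parent_visits=2)
ucb_node_2 = ucb(total_reward=-1, visits=1, parent_visits=2)
print(round(ucb_node_1, 3), round(ucb_node_2, 3))  # 2.167 -0.833

# Both nodes have the same visit count, so the exploration bonus is
# identical and the gap is just the difference of the means, whatever C is.
print(round(ucb_node_1 - ucb_node_2, 3))  # 3.0
```

Under these assumptions node 2 is judged on leaf 5 alone, which is how a weak sibling can keep leaf 6 from being tried.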
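Back to the assignment
Now we set the depth (depth = 12 in the code below), which gives a little over four thousand leaf nodes. First, initialise the arrays.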
import numpy as np
import matplotlib.pyplot as plt

depth = 12
# Visit counts for every node of the complete tree
N = [0] * (2**(depth + 1) - 1)
# Internal nodes start at 0; leaves get random integers in [-100, 100]
T = [0] * (2**depth - 1)
T.extend(np.random.randint(-100, 101, 2**depth).tolist())

# Histogram of the leaf values only
plt.hist(T[(2**depth - 1):], bins=100, color='red', histtype='stepfilled', alpha=0.75)
plt.title('distribution of T')
plt.show()

# Parent array: every internal node i gets two children
parent = [-1]
for i in range(2**depth - 1):
    parent.append(i)
    parent.append(i)
root = createTree(parent)
#print ("Inorder Traversal of constructed tree")
#inorder(root)
MCTS(root)
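The loop that builds parent produces the usual array layout of a complete binary tree. As a quick check on a small depth (complete_tree_parents is a hypothetical wrapper around that same loop, introduced only for this illustration):

```python
def complete_tree_parents(depth):
    """Parent array built the same way as the loop above."""
    parent = [-1]
    for i in range(2**depth - 1):   # every internal node gets two children
        parent.append(i)
        parent.append(i)
    return parent

parent_small = complete_tree_parents(3)
print(len(parent_small))  # 15 nodes: 2**(3+1) - 1

# In this layout the parent of node i is (i - 1) // 2, and the leaves are
# the last 2**depth entries.
print(all(parent_small[i] == (i - 1) // 2 for i in range(1, len(parent_small))))  # True
```

This is why the leaf values can be sliced out of T as T[(2**depth - 1):].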
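The last level has a little over four thousand values, drawn from -100 to 100, so in theory the optimal value should be 100.
We run this experiment 50 times and look at the values obtained (rewards) and how many steps it took to get them (number_of_rollout).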
import numpy as np

depth = 12
rewards = []
numbers_of_rollout = []
for trial in range(50):
    # Fresh visit counts and fresh random leaf values for every trial
    N = [0] * (2**(depth + 1) - 1)
    T = [0] * (2**depth - 1)
    T.extend(np.random.randint(-100, 101, 2**depth).tolist())
    parent = [-1]
    for i in range(2**depth - 1):
        parent.append(i)
        parent.append(i)
    root = createTree(parent)
    # MCTS must return (reward, number_of_rollout) for this unpacking to work;
    # separate names avoid overwriting the list N
    reward, n_rollout = MCTS(root)
    rewards.append(reward)
    numbers_of_rollout.append(n_rollout)
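The values we assigned to all the leaf nodes are distributed as follows: [figure: histogram of the leaf values]
So 100 should indeed be attainable, but there is also -100, and the mean is -1.07. Let's see how MCTS performs~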
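As a sanity check on the claim that 100 should be attainable, the following sketch works out the probability that at least one leaf receives the value 100. It assumes the leaves are independent uniform draws over the integers from -100 to 100, which is what np.random.randint(-100, 101, ...) produces; the variable names are my own.

```python
leaves = 2**12            # depth = 12, as in the code above
values = 201              # integers from -100 to 100 inclusive

# Probability that at least one leaf draws the value 100.
p_has_100 = 1 - ((values - 1) / values) ** leaves
print(p_has_100 > 0.999999)  # True

# Expected value of a single leaf under the uniform draw.
expected_leaf = sum(range(-100, 101)) / values
print(expected_leaf)  # 0.0
```

So a tree without a 100 is extremely unlikely, and a sample mean close to 0, such as the -1.07 reported above, is what one would expect.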
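Let's look at the results
We plot the distributions of these two quantities (rewards and numbers_of_rollout).
The average reward is 88.58, which feels pretty good, apart from the one in the fifties that is not great. My guess is that it was dragged down by a sibling, haha?
I find this number even more useful: on average 32.56 steps are enough to reach a fairly good value. That is impressive: visiting every leaf would take four-thousand-odd steps, so the saving is huge. Seen this way, 88.58 is really not bad ✌️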
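For a sense of scale, here is the arithmetic behind that comparison, assuming depth = 12 as in the code and taking the 32.56 average from the text:

```python
leaves = 2**12                 # number of leaf nodes for depth = 12
average_rollouts = 32.56       # figure reported above

fraction = average_rollouts / leaves
print(leaves)                  # 4096
print(f"{fraction:.2%}")       # 0.79%
```

In other words, the search ends after touching under one percent of the leaves on average.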
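The effect of C
In UCB, C balances exploration and exploitation. A larger C makes the rollouts lean more towards nodes that have rarely been visited, so more paths should be tried and the result should get better. Let's look at the actual effect~~
We can see that it does get a little better. The improvement is not very pronounced, but that is understandable, since the result was already this good~~
We can also see that the number of rollouts clearly goes up~
C really is a tradeoff: whether to go for a better reward or for fewer rollouts is up to you (a practical choice).
That said, the wiki gives a typical choice for C, and I don't know why C here seems to need to be so much larger.. Never mind, not a big deal.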
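One possible explanation, which is my own reading and not something the source states: the leaf values here range over [-100, 100], so the means of two siblings can differ by tens of units, and the exploration bonus only matters once C is on a similar scale. The sketch below uses two made-up siblings to show how large C has to be before the less-visited one is preferred; ucb and c_needed are hypothetical helpers.

```python
import math

def ucb(mean, visits, parent_visits, c):
    return mean + c * math.sqrt(math.log(parent_visits) / visits)

def c_needed(mean_a, visits_a, mean_b, visits_b):
    """Smallest C at which the less-visited child b ties with child a."""
    parent_visits = visits_a + visits_b
    gap_in_bonus = (math.sqrt(math.log(parent_visits) / visits_b)
                    - math.sqrt(math.log(parent_visits) / visits_a))
    return (mean_a - mean_b) / gap_in_bonus

# Hypothetical siblings: a looks better (mean 80, 10 visits),
# b has a single visit (mean 60).
c_star = c_needed(80, 10, 60, 1)
print(round(c_star, 1))  # 18.9
```

With these made-up numbers a C around 1 changes nothing, while a C of roughly 19 or more sends the search back to the less-visited sibling.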
Finally, here is the code for this last section. One small tip: I found np.log and np.sqrt nicer to use than math.*, haha.
import numpy as np

Rewards = []
Number_Of_Rolls = []

# range() for floats (helper, not used by the sweep below)
def frange(start, stop=None, step=None):
    if stop is None:
        stop = start + 0.0
        start = 0.0
    if step is None:
        step = 1.0
    while True:
        if step > 0 and start >= stop:
            break
        elif step < 0 and start <= stop:
            break
        yield float("%g" % start)  # return float number
        start = start + step

# Python implementation to construct a Binary Tree from
# parent array

# A node structure
class Node:
    # A utility function to create a new node
    def __init__(self, key, t, n):
        self.key = key
        self.left = None
        self.right = None
        self.t = t
        self.n = n
        self.p = None

    def get_parent(self, p):
        self.p = p

    def judge_leave_nodes(self):
        return self.left is None and self.right is None

    def calculate_UCB(self):
        # C is the module-level exploration constant set by the sweep below
        if self.n == 0:
            if self.judge_leave_nodes():
                UCB = self.t + 1000
            else:
                UCB = 1000
        else:
            UCB = self.t / self.n + C * np.sqrt(np.log(self.p.n) / self.n)
        return UCB

def createNode(parent, T, N, i, created, root):
    # If this node is already created
    if created[i] is not None:
        return
    # Create a new node and set created[i]
    created[i] = Node(i, T[i], N[i])
    # If 'i' is root, change root pointer and return
    if parent[i] == -1:
        root[0] = created[i]  # root[0] denotes root of the tree
        return
    # If parent is not created, then create parent first
    if created[parent[i]] is None:
        createNode(parent, T, N, parent[i], created, root)
    # Find parent pointer
    p = created[parent[i]]
    created[i].get_parent(p)
    # If this is first child of parent
    if p.left is None:
        p.left = created[i]
    # If second child
    else:
        p.right = created[i]

# Creates tree from parent[0..n-1] and returns root of the
# created tree (T and N are read from module-level variables)
def createTree(parent):
    n = len(parent)
    # Create an array created[] to keep track
    # of created nodes, initialize all entries as None
    created = [None for i in range(n + 1)]
    root = [None]
    for i in range(n):
        createNode(parent, T, N, i, created, root)
    return root[0]

# Inorder traversal of tree
def inorder(root):
    if root is not None:
        inorder(root.left)
        print(root.key, root.t, root.n, root.p)
        inorder(root.right)

def MCTS(root):
    node = root
    Rollout_node = None
    Previous_Rollout_node = root
    number_of_rollout = 0
    # stop once two consecutive rollouts choose the same path
    while Rollout_node is not Previous_Rollout_node:
        # Selection: descend until we reach a leaf node
        while not node.judge_leave_nodes():
            left = node.left
            right = node.right
            UCB_left = left.calculate_UCB()
            UCB_right = right.calculate_UCB()
            if UCB_left >= UCB_right:
                node = left
            else:
                node = right
        # the node is the leaf node now, a rollout complete
        Previous_Rollout_node = Rollout_node
        Rollout_node = node
        number_of_rollout += 1
        # Backpropagation; node ends up back at the root
        while node.p is not None:
            node.p.t += Rollout_node.t
            node.p.n += 1
            node = node.p
    #print("The Reward: %d" %(Rollout_node.t))
    #print("The Number of Rollout: %d" %(number_of_rollout))
    return (Rollout_node.t, number_of_rollout)

depth = 12
# Sweep C = 0, 1, ..., 100 and average 5 trials for each value
C = 0.0
while C <= 100:
    rewards = []
    numbers_of_rollout = []
    for trial in range(5):
        N = [0] * (2**(depth + 1) - 1)
        T = [0] * (2**depth - 1)
        T.extend(np.random.randint(-100, 101, 2**depth).tolist())
        parent = [-1]
        for i in range(2**depth - 1):
            parent.append(i)
            parent.append(i)
        root = createTree(parent)
        # Separate names avoid overwriting the list N
        reward, n_rollout = MCTS(root)
        rewards.append(reward)
        numbers_of_rollout.append(n_rollout)
    Rewards.append(np.mean(rewards))
    Number_Of_Rolls.append(np.mean(numbers_of_rollout))
    C += 1
and the plotting code:
import matplotlib.pyplot as plt

x = range(101)  # the 101 values of C: 0, 1, ..., 100
l1 = plt.plot(x, Rewards, 'r--', label='average rewards')
plt.plot(x, Rewards, 'ro-')
#plt.title('The Lasers in Three Conditions')
plt.xlabel('C')
plt.ylabel('Rewards')
plt.legend()
plt.savefig('R.png')
plt.show()

l2 = plt.plot(x, Number_Of_Rolls, 'g--', label='number of rollout')
plt.plot(x, Number_Of_Rolls, 'g+-')
#plt.title('The number of rollout')
# C is on the horizontal axis, the rollout count on the vertical axis
plt.xlabel('C')
plt.ylabel('number of rollout')
plt.legend()
plt.savefig('N.png')
plt.show()