Monte Carlo Tree Search (MCTS)

Problem description

This problem comes from one of my assignments:
Construct a binary tree (each node has two child nodes) of depth $d = 12$ and assign different values to each of the $2^d$ leaf-nodes. Implement the MCTS algorithm and apply it to the above tree to search for the optimal value.
Let's see whether MCTS can find the largest leaf value.

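Defining the classes and functions

We need a few classes and functions to implement the assignment.

Node class

Each node stores its index key, its left and right children left and right, its parent p, its visit count n, and its accumulated reward t.
It also has two methods: one checks whether the node is a leaf (no children), the other computes the node's UCB (Upper Confidence Bound) value. In a sense, UCB encourages you to try the paths you have not walked yet, which helps avoid converging too early.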
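
For reference, the score that calculate_UCB computes for a node that has already been visited can be written as follows. The symbols are my own shorthand, not the author's: $t_v$ is the accumulated reward of node $v$, $n_v$ its visit count, $n_{p(v)}$ the visit count of its parent, and $C$ the exploration constant (0.2 in the code).

```latex
\mathrm{UCB}(v) = \frac{t_v}{n_v} + C \sqrt{\frac{\ln n_{p(v)}}{n_v}}
```

The first term is the average reward seen through $v$ (exploitation); the second term shrinks as $v$ is visited more often (exploration). Nodes with $n_v = 0$ do not use this formula; the code gives them a fixed large score of 1000 instead, plus $t_v$ for leaves.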

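import math

# A node of the binary tree
class Node:
    def __init__(self, key, t, n):
        self.key = key      # node index
        self.left = None    # left child
        self.right = None   # right child
        self.t = t          # accumulated reward (for a leaf: its own value)
        self.n = n          # visit count
        self.p = None       # parent node

    def get_parent(self, p):
        # Despite the name, this stores the parent pointer
        self.p = p

    def judge_leave_nodes(self):
        # True if the node has no children, i.e. it is a leaf
        return (self.left is None) and (self.right is None)

    def calculate_UCB(self):
        C = 0.2  # exploration constant
        if self.n == 0:
            # Unvisited nodes get a large score so they are tried first;
            # unvisited leaves are additionally ranked by their value
            if self.judge_leave_nodes():
                UCB = self.t + 1000
            else:
                UCB = 1000
        else:
            # Mean reward plus exploration bonus (never called on the root,
            # whose parent p is None)
            UCB = self.t/self.n + C * math.sqrt(math.log(self.p.n)/self.n)
        return UCB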

Functions for creating the nodes and the tree

Next, define the functions that build the tree.

def createNode(parent, T,N, i, created, root): 
  
    # If this node is already created 
    if created[i] is not None: 
        return
  
    # Create a new node and set created[i] 
    created[i] = Node(i,T[i],N[i]) 
    
    # If 'i' is root, change root pointer and return 
    if parent[i] == -1: 
        root[0] = created[i] # root[0] denotes root of the tree 
        return
  
    # If parent is not created, then create parent first 
    # (the node index to create is the fourth argument)
    if created[parent[i]] is None: 
        createNode(parent, T, N, parent[i], created, root) 
  
    # Find parent pointer 
    p = created[parent[i]]
    created[i].get_parent(p)
  
    # If this is first child of parent 
    if p.left is None: 
        p.left = created[i] 
    # If second child 
    else: 
        p.right = created[i] 
  
  
# Creates tree from parent[0..n-1] and returns root of the 
# created tree. Note: T and N are read from the global scope.
def createTree(parent): 
    n = len(parent) 
      
    # Create an array created[] to keep track  
    # of created nodes, initialize all entries as None 
    created = [None for i in range(n+1)] 
      
    root = [None] 
    for i in range(n): 
        createNode(parent, T, N, i, created, root) 
  
    return root[0] 

In-order traversal of the tree

This walks the tree in order, just to display it 🙂

#Inorder traversal of tree 
def inorder(root): 
    if root is not None: 
        inorder(root.left) 
        print (root.key, root.t, root.n, root.p)
        inorder(root.right)  

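The MCTS function

It covers the four steps of MCTS: Selection, Expansion, Simulation and Backpropagation. Because the tree here is built in advance and every leaf already stores its value, selection simply walks down to a leaf and that leaf's value serves as the result of the rollout.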

def MCTS(root):
    node = root
    Rollout_node = None
    Previous_Rollout_node = root
    number_of_rollout = 0
    # Stop once two consecutive rollouts end on the same leaf
    while (Rollout_node is not Previous_Rollout_node):
        # Selection: descend until a leaf is reached
        while (node.judge_leave_nodes() is not True):
            left = node.left
            right = node.right
            UCB_left = left.calculate_UCB()
            UCB_right = right.calculate_UCB()
            # Follow the child with the larger UCB (ties go left)
            if (UCB_left >= UCB_right):
                node = left
            else:
                node = right

        # node is a leaf now, so one rollout is complete
        Previous_Rollout_node = Rollout_node
        Rollout_node = node
        number_of_rollout += 1
        #print ("Inorder Traversal of constructed tree")
        #inorder(root)

        # Backpropagation: add the leaf value to every ancestor and count the
        # visit; node ends up at the root again, ready for the next rollout
        while node.p is not None:
            node.p.t += Rollout_node.t
            node.p.n += 1
            node = node.p

    node = Rollout_node
    #print("The best path:")
    #while node.p is not None:
    #    print("%d<-" %(node.key), end='')
    #    node = node.p
    #print(root.key)
    #print("The Reward: %d" %(Rollout_node.t))
    print("The Number of Rollout: %d" %(number_of_rollout))
    # Return the value of the final leaf and the number of rollouts used
    return (Rollout_node.t, number_of_rollout)

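The stopping condition I chose is that two consecutive rollouts take the same path, which suggests that the other paths are unlikely to be tried any more. I think that makes sense.

Testing it

Initializing a tree

We start with a simple tree. Only the last $2^{depth}$ entries of T (the leaf values) are chosen by hand; everything else follows a regular pattern determined by depth.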

#           0:0 0
#          /      \
#     1:0 0        2: 0 0
#    /    \        /    \
#3:2 0   4:1 0   5:-1 0  6:3 0      
parent = [-1, 0, 0, 1, 1, 2, 2]
T = [0, 0, 0, 2, 1, -1, 3]
N = [0,0,0,0,0,0,0]
root = createTree(parent)

print ("Inorder Traversal of constructed tree")
inorder(root)

MCTS(root)

After 3 rollouts, the search ends on the path $0\rightarrow1\rightarrow3$, and the value obtained is 2.

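Is something wrong with MCTS?

I think the weakness of MCTS already shows in the example above. At the start nothing has been visited, so the search goes left to 1 and then left to 3; no problem there. In the second rollout the right side has not been visited, so it goes to 2, and since neither child has been visited it goes left to 5. Here is the problem: the value of 2 has now been dragged down by 5. Because 5 performs badly it holds back 2, and as a result 6 (actually the best leaf) never gets a chance to be visited. That is the problem with MCTS.
Admittedly this happens because 5 and 6 differ so much, and in practice I suspect that is uncommon. In Go, for example, my guess is that if your previous move was bad, most follow-up moves will not be great either (though I can't be sure).
The point stands, though: in MCTS, sibling nodes influence each other to some extent and can even drag each other down.
But since we do not visit everything, we have to accept this risk 😏, you can't have it all ┓( ´∀` )┏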
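
To make the drag-down effect concrete, here is a minimal sketch that recomputes the two UCB scores at the root for the scenario described above (first rollout ends on leaf 3, second on leaf 5). The helper `ucb` and the variable names are mine, introduced only for illustration; the formula and C = 0.2 are the ones used in calculate_UCB.

```python
import math

def ucb(total_reward, visits, parent_visits, c=0.2):
    """UCB score of an already visited node (same formula as calculate_UCB)."""
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

# State after two rollouts in the scenario from the text:
# rollout 1 ended on leaf 3 (value 2), rollout 2 ended on leaf 5 (value -1),
# so the root has been visited twice and nodes 1 and 2 once each.
root_visits = 2
ucb_node1 = ucb(total_reward=2, visits=1, parent_visits=root_visits)
ucb_node2_after_leaf5 = ucb(total_reward=-1, visits=1, parent_visits=root_visits)

# Hypothetical alternative: rollout 2 had ended on leaf 6 (value 3) instead.
ucb_node2_after_leaf6 = ucb(total_reward=3, visits=1, parent_visits=root_visits)

print(f"node 1:                 {ucb_node1:.3f}")
print(f"node 2 (after leaf 5):  {ucb_node2_after_leaf5:.3f}")
print(f"node 2 (after leaf 6):  {ucb_node2_after_leaf6:.3f}")
```

With these numbers node 1 scores about 2.17 and node 2 about -0.83, so the next rollout goes left and leaf 6 stays unvisited. Had the second rollout reached leaf 6, node 2 would score about 3.17 and the search would go right.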

Back to the problem

Now we set $depth=12$, which gives 4096 leaf nodes (8191 nodes in total), and first initialize the values.

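import numpy as np
import matplotlib.pyplot as plt

depth = 12

# Visit counts for all 2**(depth+1) - 1 nodes start at zero
N = [0]*((2**(depth+1) -1))
# Internal nodes start with reward 0; the 2**depth leaves get random
# integers in [-100, 100]
T = [0]*((2**(depth) -1))
T.extend(np.random.randint(-100,101,2**(depth)).tolist())

# Histogram of the leaf values
plt.hist(T[(2**(depth) -1):],bins = 100,color='red',histtype='stepfilled',alpha=0.75)
plt.title('distribution of T')
plt.show()

# Level-order parent array: node i is the parent of nodes 2*i+1 and 2*i+2
parent = [-1]
for i in range((2**depth) -1):
    parent.append(i)
    parent.append(i)

root = createTree(parent)
#print ("Inorder Traversal of constructed tree")
#inorder(root)
MCTS(root)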
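
The parent array built by the loop above is the usual level-order layout of a complete binary tree. As a small sketch of that layout (the function names `build_parent_array` and `build_parent_array_loop` are hypothetical, not part of the original code), the parent of node k can also be written in closed form as (k - 1) // 2:

```python
def build_parent_array(depth):
    """Parent indices of a complete binary tree stored level by level."""
    total_nodes = 2 ** (depth + 1) - 1
    # The root has no parent (-1); every other node k hangs under (k - 1) // 2.
    return [-1] + [(k - 1) // 2 for k in range(1, total_nodes)]

def build_parent_array_loop(depth):
    """The loop-based construction from the text, kept for comparison."""
    parent = [-1]
    for i in range(2 ** depth - 1):
        parent.append(i)
        parent.append(i)
    return parent

print(build_parent_array(2))        # [-1, 0, 0, 1, 1, 2, 2]
print(len(build_parent_array(12)))  # 8191
```

For depth 2 this reproduces the parent list of the small example tree, and for depth 12 both constructions give the same 8191 entries.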

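The last layer holds 4096 values drawn from -100 to 100, so in theory the optimum should be 100.
We repeat the experiment 50 times and look at the values obtained (rewards) and the number of rollouts needed to get them (number_of_rollout).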

import numpy as np
depth = 12
rewards = []
numbers_of_rollout = []

for run in range(50):
    # Fresh visit counts and leaf values for every run
    N = [0]*((2**(depth+1) -1))
    T = [0]*((2**(depth) -1))
    T.extend(np.random.randint(-100,101,2**(depth)).tolist())

    parent = [-1]
    for i in range((2**depth) -1):
        parent.append(i)
        parent.append(i)

    root = createTree(parent)
    # MCTS must return (reward, number_of_rollout) for this unpacking to work
    R, n_rollout = MCTS(root)
    rewards.append(R)
    numbers_of_rollout.append(n_rollout)

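The values we assigned to all the leaf nodes are distributed as follows:
[Figure: histogram of the leaf values]
So 100 is indeed attainable, but so is -100, and the mean is -1.07. Now let's see how MCTS performs~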

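Let's look at the results

We plot the distributions of these two quantities.
[Figure: distribution of the rewards]
The average reward is 88.58, which I think is decent, apart from that one run in the fifties. My guess is that it got dragged down by a sibling, haha?
[Figure: distribution of the number of rollouts]
I find this number more useful: on average it takes only 32.56 rollouts to get a relatively good value. That is remarkable, since visiting every leaf would take $2^{12}$ steps. It is a huge saving, so 88.58 looks pretty good ✌️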

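The role of C

In UCB, C balances exploration and exploitation. A larger C makes the rollouts lean towards nodes that have rarely been visited, so more paths get tried and the result should improve. Let's look at the actual effect~~
[Figure: average reward as a function of C]
The result does get a little better. The improvement is not dramatic, which is understandable given how good it already was~~
[Figure: average number of rollouts as a function of C]
The number of rollouts clearly goes up as well~
C really is a tradeoff: a better reward or fewer rollouts, the choice is yours (a practical choice).
However, the wiki says $\sqrt{2}$ is the usual choice, and I don't know why my C seems to need to be so much larger. Never mind, it's no big deal.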
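
As a rough illustration of the tradeoff, the sketch below applies the same UCB formula to two children with made-up statistics and checks which one is preferred for several values of C. The statistics, the helper names `ucb` and `preferred_child`, and the chosen values of C are all hypothetical and not taken from the experiment. One plausible reading, which is my assumption rather than something the post establishes, is that $\sqrt{2}$ is usually quoted for rewards scaled to [0, 1], whereas the rewards here range from -100 to 100, so C has to be correspondingly larger before the exploration term matters.

```python
import math

def ucb(total_reward, visits, parent_visits, c):
    """Mean reward plus exploration bonus, as in calculate_UCB."""
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Hypothetical statistics (not from the experiment):
# child A: 20 visits, mean reward 60; child B: 2 visits, mean reward 40.
stats = {"A": (60 * 20, 20), "B": (40 * 2, 2)}
parent_visits = 22

def preferred_child(c):
    """Name of the child with the larger UCB score for exploration constant c."""
    scores = {name: ucb(t, n, parent_visits, c) for name, (t, n) in stats.items()}
    return max(scores, key=scores.get)

for c in (0.2, math.sqrt(2), 10, 50):
    print(f"C = {c:6.2f}: pick {preferred_child(c)}")
```

With a mean-reward gap of 20 between the two children, small values of C keep choosing the well-explored child A, and only a large C lets the rarely visited child B win.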
Finally, here is the code for the last part. One small tip: I found np.log and np.sqrt nicer to use than math.*, haha

Rewards = []
Number_Of_Rolls = []

# Float version of range(); not used in the sweep below, which steps C by 1
def frange(start, stop=None, step=None):

    if stop is None:
        stop = start + 0.0
        start = 0.0

    if step is None:
        step = 1.0

    while True:
        if step > 0 and start >= stop:
            break
        elif step < 0 and start <= stop:
            break
        yield float("%g" % start) # yield a float rounded to 6 significant digits
        start = start + step

import numpy as np
# Python implementation to construct a Binary Tree from 
# parent array
C = 0.0
while C <= 100:
# A node structure 
    class Node: 
        # A utility function to create a new node 
        def __init__(self, key, t, n): 
            self.key = key 
            self.left = None
            self.right = None
            self.t = t
            self.n = n
            self.p = None
        def get_parent(self, p):
            self.p = p
        def judge_leave_nodes(self):
            if (self.left == None) and (self.right == None):
                return True
            else:
                return False
        def calculate_UCB(self):
            #C = 0.2
            if (self.n == 0):
                if self.judge_leave_nodes():
                    UCB = self.t + 1000
                else:
                    UCB = 1000
            else:
                UCB = self.t/self.n + C * np.sqrt(np.log(self.p.n)/self.n)
            return UCB



    def createNode(parent, T,N, i, created, root): 

        # If this node is already created 
        if created[i] is not None: 
            return

        # Create a new node and set created[i] 
        created[i] = Node(i,T[i],N[i]) 

        # If 'i' is root, change root pointer and return 
        if parent[i] == -1: 
            root[0] = created[i] # root[0] denotes root of the tree 
            return

        # If parent is not created, then create parent first 
        # (the node index to create is the fourth argument)
        if created[parent[i]] is None: 
            createNode(parent, T, N, parent[i], created, root) 

        # Find parent pointer 
        p = created[parent[i]]
        created[i].get_parent(p)

        # If this is first child of parent 
        if p.left is None: 
            p.left = created[i] 
        # If second child 
        else: 
            p.right = created[i] 


    # Creates tree from parent[0..n-1] and returns root of the 
    # created tree 
    def createTree(parent): 
        n = len(parent) 

        # Create an array created[] to keep track  
        # of created nodes, initialize all entries as None 
        created = [None for i in range(n+1)] 

        root = [None] 
        for i in range(n): 
            createNode(parent, T, N, i, created, root) 

        return root[0] 

    #Inorder traversal of tree 
    def inorder(root): 
        if root is not None: 
            inorder(root.left) 
            print (root.key, root.t, root.n, root.p)
            inorder(root.right)
    def MCTS(root):
        node = root
        Rollout_node = None
        Previous_Rollout_node = root
        number_of_rollout = 0
        # choose the same path
        while (Rollout_node is not Previous_Rollout_node):
            # until reach leaf node
            while (node.judge_leave_nodes() is not True):
                left = node.left
                right = node.right
                ## Selection
                UCB_left = left.calculate_UCB()
                UCB_right = right.calculate_UCB()
                if (UCB_left >= UCB_right):
                    node = left
                else:
                    node = right
                # the node is the leaf node now, a rollout complete

            # backpropagation
            Previous_Rollout_node = Rollout_node
            Rollout_node = node
            number_of_rollout += 1
            #print ("Inorder Traversal of constructed tree")
            #print(inorder(root))

            while node.p is not None:
                node.p.t += Rollout_node.t
                node.p.n += 1
                node = node.p

        node = Rollout_node
        #print("The best path:")
        #while node.p is not None:
        #    print("%d<-" %(node.key), end='')
        #    node = node.p
        #print(root.key)
        #print("The Reward: %d" %(Rollout_node.t))
        #print("The Number of Rollout: %d" %(number_of_rollout))
        return (Rollout_node.t,number_of_rollout)


    import numpy as np
    import matplotlib.pyplot as plt
    depth = 12
    rewards = []
    numbers_of_rollout = []

    # 5 independent runs for the current value of C
    for run in range(5):
        N = [0]*((2**(depth+1) -1))
        T = [0]*((2**(depth) -1))
        T.extend(np.random.randint(-100,101,2**(depth)).tolist())

        parent = [-1]
        for i in range((2**depth) -1):
            parent.append(i)
            parent.append(i)

        root = createTree(parent)
        R, n_rollout = MCTS(root)
        rewards.append(R)
        numbers_of_rollout.append(n_rollout)

    # Average reward and rollout count for this C, then move on to the next C
    Rewards.append(np.mean(rewards))
    Number_Of_Rolls.append(np.mean(numbers_of_rollout))
    C += 1

And the plotting code:

import matplotlib.pyplot as plt

x = range(len(Rewards))  # C = 0, 1, ..., 100
plt.plot(x,Rewards,'ro-',label='average rewards')
plt.xlabel('C')
plt.ylabel('Rewards')
plt.legend()
plt.savefig('R.png')
plt.show()
# Second figure: C on the x-axis, average number of rollouts on the y-axis
plt.plot(x,Number_Of_Rolls,'g+-',label='number of rollout')
plt.xlabel('C')
plt.ylabel('number of rollout')
plt.legend()
plt.savefig('N.png')
plt.show()