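1. This article covers how to write an env (environment) in the gym style, i.e. an environment class that exposes reset / step / render (to be continued; updated when I have spare time)
The classic maze treasure-hunt example (the class below draws a tkinter window and follows gym's reset / step / render interface, but it inherits from tkinter.Tk, not gym.Env)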
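import numpy as np
import time
import tkinter as tk  # GUI (window) library

UNIT = 40    # pixels per grid cell
MAZE_H = 5   # maze height in cells
MAZE_W = 3   # maze width in cells


class Maze(tk.Tk):  # new class inheriting from the parent class tkinter.Tk
    def __init__(self):
        super(Maze, self).__init__()  # run the parent-class initializer via super()
        # action indices: 0 = up, 1 = down, 2 = right, 3 = left (same order as in step())
        self.action_space = ['u', 'd', 'r', 'l']
        self.n_actions = len(self.action_space)
        self.title('maze')
        # geometry() sets the window size as 'WIDTHxHEIGHT'; str.format fills in the numbers
        self.geometry('{0}x{1}'.format(MAZE_W * UNIT, MAZE_H * UNIT))
        self._build_maze()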

    def _build_maze(self):
        self.canvas = tk.Canvas(self, bg='white',  # tk drawing widget
                                height=MAZE_H * UNIT,
                                width=MAZE_W * UNIT)
        # draw the grid lines
        for c in range(0, MAZE_W * UNIT, UNIT):
            x0, y0, x1, y1 = c, 0, c, MAZE_H * UNIT
            self.canvas.create_line(x0, y0, x1, y1)
        for r in range(0, MAZE_H * UNIT, UNIT):
            x0, y0, x1, y1 = 0, r, MAZE_W * UNIT, r
            self.canvas.create_line(x0, y0, x1, y1)

        origin = np.array([20, 20])  # center of the top-left cell
        # hells (black squares); offsets are [UNIT * column, UNIT * row]
        hell2_center = origin + np.array([UNIT * 0, UNIT * 1])
        self.hell2 = self.canvas.create_rectangle(
            hell2_center[0] - 15, hell2_center[1] - 15,
            hell2_center[0] + 15, hell2_center[1] + 15,
            fill='black')
        hell3_center = origin + np.array([UNIT * 1, UNIT * 1])
        self.hell3 = self.canvas.create_rectangle(
            hell3_center[0] - 15, hell3_center[1] - 15,
            hell3_center[0] + 15, hell3_center[1] + 15,
            fill='black')
        hell6_center = origin + np.array([UNIT * 1, UNIT * 3])
        self.hell6 = self.canvas.create_rectangle(
            hell6_center[0] - 15, hell6_center[1] - 15,
            hell6_center[0] + 15, hell6_center[1] + 15,
            fill='black')
        hell7_center = origin + np.array([UNIT * 2, UNIT * 3])
        self.hell7 = self.canvas.create_rectangle(
            hell7_center[0] - 15, hell7_center[1] - 15,
            hell7_center[0] + 15, hell7_center[1] + 15,
            fill='black')
        # treasure (yellow circle)
        oval_center = origin + np.array([UNIT * 2, UNIT * 4])
        self.oval = self.canvas.create_oval(
            oval_center[0] - 15, oval_center[1] - 15,
            oval_center[0] + 15, oval_center[1] + 15,
            fill='yellow')
        # agent (red square), starting in the top-left cell
        self.rect = self.canvas.create_rectangle(
            origin[0] - 15, origin[1] - 15,
            origin[0] + 15, origin[1] + 15,
            fill='red')
        self.canvas.pack()  # pack the canvas into the window so it is displayed

    def reset(self):
        self.update()
        time.sleep(0.1)
        # delete the agent's rectangle from the canvas, then recreate it at the start cell
        self.canvas.delete(self.rect)
        origin = np.array([20, 20])
        self.rect = self.canvas.create_rectangle(
            origin[0] - 15, origin[1] - 15,
            origin[0] + 15, origin[1] + 15,
            fill='red')  # red square marks the agent's current position
        # return observation: the square's corner coordinates [x0, y0, x1, y1]
        return self.canvas.coords(self.rect)

    def step(self, action):
        s = self.canvas.coords(self.rect)  # current state [x0, y0, x1, y1]
        base_action = np.array([0, 0])
        if action == 0:    # up
            if s[1] > UNIT:
                base_action[1] -= UNIT
        elif action == 1:  # down
            if s[1] < (MAZE_H - 1) * UNIT:
                base_action[1] += UNIT
        elif action == 2:  # right
            if s[0] < (MAZE_W - 1) * UNIT:
                base_action[0] += UNIT
        elif action == 3:  # left
            if s[0] > UNIT:
                base_action[0] -= UNIT
        # moves that would leave the grid keep base_action at [0, 0]
        self.canvas.move(self.rect, base_action[0], base_action[1])  # move agent
        s_ = self.canvas.coords(self.rect)  # next state

        # reward function
        if s_ == self.canvas.coords(self.oval):  # reached the treasure
            reward = 1
            done = True
            s_ = 'terminal'
        elif s_ in [self.canvas.coords(self.hell2), self.canvas.coords(self.hell3),
                    self.canvas.coords(self.hell6), self.canvas.coords(self.hell7)]:  # fell into a hell
            reward = -1
            done = True
            s_ = 'terminal'
        else:
            reward = 0
            done = False
        return s_, reward, done

    def render(self):
        time.sleep(0.1)
        self.update()
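

def update():
    # demo loop: run 10 episodes with a fixed action
    for t in range(10):
        s = env.reset()
        while True:
            env.render()
            a = 1  # always "down"; a learning agent would choose the action here
            s, r, done = env.step(a)
            if done:
                break


if __name__ == '__main__':
    env = Maze()
    env.after(100, update)  # schedule update() to run 100 ms after the Tk main loop starts
    env.mainloop()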
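The movement and reward rules above can be checked without opening a window. The sketch below is a hypothetical headless re-implementation (the class name GridMaze and the constants HELLS, TREASURE and MOVES are made up for illustration): it keeps the same layout and action order (0 = up, 1 = down, 2 = right, 3 = left) but uses (row, col) tuples as states instead of canvas coordinates.

```python
import numpy as np

# Hypothetical headless version of the Maze class: same layout and action
# order, but the state is a (row, col) tuple instead of canvas coordinates.
MAZE_H, MAZE_W = 5, 3
HELLS = {(1, 0), (1, 1), (3, 1), (3, 2)}   # (row, col) of hell2, hell3, hell6, hell7
TREASURE = (4, 2)                          # (row, col) of the yellow oval
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, 1), 3: (0, -1)}  # up, down, right, left


class GridMaze:
    def __init__(self):
        self.n_actions = len(MOVES)
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)  # agent starts in the top-left cell
        return self.pos

    def step(self, action):
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        # moves that would leave the grid are ignored, as in Maze.step()
        if 0 <= r < MAZE_H and 0 <= c < MAZE_W:
            self.pos = (r, c)
        if self.pos == TREASURE:
            return 'terminal', 1, True
        if self.pos in HELLS:
            return 'terminal', -1, True
        return self.pos, 0, False


env = GridMaze()
env.reset()
print(env.step(1))  # always "down", as in update(): falls into hell2 at once
```

With the fixed action a = 1 used in update(), every episode ends after one step in hell2. The only safe route to the treasure is right, right, down, down, left, left, down, down, right, right (actions 2, 2, 1, 1, 3, 3, 1, 1, 2, 2).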
The classic CartPole example
The reset function:
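def reset(self):
    # initialize the 4-dimensional state (cart position, cart velocity,
    # pole angle, pole angular velocity) with uniform random values in [-0.05, 0.05)
    self.state = self.np_random.uniform(low=-0.05, high=0.05, size=(4,))
    # no steps have been taken past the end of an episode yet
    self.steps_beyond_done = None
    return np.array(self.state)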
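self.np_random is a NumPy random number generator (created through gym's seeding utility) used to generate arrays of random numbers. Its uniform method draws samples from a uniform distribution over [low, high); other methods, such as normal, draw Gaussian samples instead.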
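A minimal sketch of why a seeded generator matters for reset. ToyEnv is a hypothetical class, not gym's CartPole, and a plain np.random.RandomState stands in for the generator gym creates; the reset body mirrors the CartPole code above.

```python
import numpy as np


class ToyEnv:
    """Hypothetical minimal env (not gym's CartPole) showing a seeded reset."""

    def seed(self, seed=None):
        # assumption: a plain NumPy RandomState stands in for gym's seeded np_random
        self.np_random = np.random.RandomState(seed)

    def reset(self):
        # 4 state values drawn uniformly from [-0.05, 0.05), as in CartPole's reset()
        self.state = self.np_random.uniform(low=-0.05, high=0.05, size=(4,))
        self.steps_beyond_done = None
        return np.array(self.state)


env = ToyEnv()
env.seed(0)
s0 = env.reset()
env.seed(0)
s1 = env.reset()
print(s0.shape, np.array_equal(s0, s1))  # (4,) True: same seed, same initial state

# a Gaussian draw from the same generator, for comparison with uniform
print(env.np_random.normal(loc=0.0, scale=0.05, size=(4,)))
```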
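2. Some supplementary notes on reading the functions of the DQN algorithm module (the choose_action below stores Q values in a Q table, i.e. it comes from the tabular Q-learning version of the treasure-hunt code)
Function: choose_action in the treasure-hunt algorithm module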
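def choose_action(self, observation):
    self.check_state_exist(observation)  # make sure this state has a row in the Q table
    # epsilon = 0.9: with 90% probability the action is chosen from the Q table
    if np.random.uniform() < self.epsilon:
        state_action = self.q_table.loc[observation, :]  # the row of Q values for this observation
        # several actions may share the maximum Q value, so pick one of them at random
        action = np.random.choice(state_action[state_action == np.max(state_action)].index)
    else:  # with 10% probability act randomly to explore other possibilities
        # choose random action
        action = np.random.choice(self.actions)
    return action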
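To try choose_action on its own, the sketch below wraps it in a hypothetical QTableAgent class. The constructor and check_state_exist are assumptions (an unseen state gets a new all-zero row), not the original module. Note that epsilon here is the probability of acting greedily, the reverse of the more common convention. The state is stored as a string label because a list such as [5.0, 5.0, 35.0, 35.0] is unhashable and cannot be a DataFrame index key.

```python
import numpy as np
import pandas as pd


class QTableAgent:
    """Hypothetical minimal agent with only what choose_action() needs."""

    def __init__(self, actions, epsilon=0.9):
        self.actions = actions    # e.g. [0, 1, 2, 3]
        self.epsilon = epsilon    # probability of acting greedily
        self.q_table = pd.DataFrame(columns=self.actions, dtype=np.float64)

    def check_state_exist(self, state):
        # assumption: an unseen state gets a new all-zero row
        if state not in self.q_table.index:
            self.q_table.loc[state] = [0.0] * len(self.actions)

    def choose_action(self, observation):
        self.check_state_exist(observation)
        if np.random.uniform() < self.epsilon:
            state_action = self.q_table.loc[observation, :]
            best = state_action[state_action == np.max(state_action)].index
            return np.random.choice(best)      # random tie-break among max-Q actions
        return np.random.choice(self.actions)  # explore


np.random.seed(0)
agent = QTableAgent(actions=[0, 1, 2, 3], epsilon=1.0)  # epsilon=1.0: always greedy
s = str([5.0, 5.0, 35.0, 35.0])                          # state as a string label
agent.q_table.loc[s] = [0.0, 0.5, 0.5, -1.0]             # hypothetical Q values
print(agent.choose_action(s))  # 1 or 2 (the tie is broken at random)
```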
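Note: the first observation here is [5, 5, 35, 35] (canvas.coords returns floats, so it prints as [5.0, 5.0, 35.0, 35.0]). The array comes from canvas.create_rectangle, which draws a rectangle from its top-left and bottom-right corner coordinates. Each grid cell is 40×40 pixels, the start cell's center is (20, 20), and the agent's square extends 15 pixels on each side of that center, so the starting state is [20 - 15, 20 - 15, 20 + 15, 20 + 15] = [5, 5, 35, 35].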
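The same arithmetic gives the observation for any cell. cell_to_coords is a hypothetical helper written for illustration; it reuses UNIT = 40, the (20, 20) origin and the 15-pixel half side from the maze code.

```python
UNIT = 40  # pixels per grid cell, as in the maze code
HALF = 15  # half side length of the agent's square


def cell_to_coords(row, col):
    """Canvas coords [x0, y0, x1, y1] of the agent's square in cell (row, col)."""
    cx, cy = 20 + col * UNIT, 20 + row * UNIT  # cell center; start cell center is (20, 20)
    return [cx - HALF, cy - HALF, cx + HALF, cy + HALF]


print(cell_to_coords(0, 0))  # start cell    -> [5, 5, 35, 35]
print(cell_to_coords(4, 2))  # treasure cell -> [85, 165, 115, 195]
```

Because the treasure's oval uses the same ±15 box around its center, reaching cell (4, 2) makes the agent's coords equal the oval's coords, which is exactly the equality test in step().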
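Reference: Pandas中loc和iloc函數用法詳解(源碼+實例) (a detailed explanation of the loc and iloc functions in Pandas, with source code and examples), https://blog.csdn.net/w_weiying/article/details/81411257
Reference: 莫煩 (Morvan), DQN代碼學習筆記 (DQN code study notes), https://blog.csdn.net/yyyxxxsss/article/details/80467058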