無人駕駛12:路徑規劃,生成路徑曲線

優達學城的第7個項目實現:https://github.com/luteresa/P7-Path-Planning.git

yak

運動規劃問題:

tg000

配置空間

配置空間:定義機器人的所有可能配置,一般在二維空間中,定義爲[x,y, theat],即二維座標加方向。

運動規劃:

就是根據初始配置(由定位模塊傳感器獲得),目的配置(由行爲規劃模塊獲得),在一定的限制條件下(車輛屬性,交通法規),結合預測模塊(提供有關障礙區域隨時間演進的信息),在配置空間中,生成一個可執行的運動序列(納入了其他車輛及行人考量)。

這些運動(可轉化爲車輛的最終指令,比如剎車,油門,轉向)將驅動機器人,從開始配置移動到目標配置,過程中會繞開障礙物,

討論規劃算法時,常需要考慮兩個重要因素,完整性和最優性

tg001

運動規劃算法有很多種,包括

組合法:將自由空間分成小塊,然後將這些原子元素鏈接起來,解決運動規劃問題;這種方法與我們直覺相符,去發現最初的近似解;但當環境規模擴大後,這種方法表現不佳;

位勢場算法:是反饋算法,每個障礙物會產生一種反重力場,這使車輛不會去靠近它們。比如算法軌跡會繞過行人和自行車;
位勢場算法的主要問題是,有時候它會把我們帶入局部極小值中。難以找到最合適解;

最優控制法:通過算法來生成控制PoD,使用動態模型,爲車輛,最初配置和最終配置建模,概算法需要我們生成一系列的輸入,比如轉向和減速,這些輸入會把我們從開始配置帶到最終配置。在此過程中,算法會優化與控制輸入,車輛配置等相關的成本函數。比如最小汽油消耗。這種算法有多種實現方式,大部分都基於數值優化法;

但是要把所有與其他車輛相關的限制都很好地納入考量,還要使算法很快地的得出結果,是相當困難的。

抽樣算法:使用一個基於碰撞檢測的模塊,該模塊探測自由空間,測試是否一個配置在其中會發生碰撞,不同於組合法和最優控制法(這兩種方法均需要分析整個環境),抽樣法不需要檢測所有自由空間,便可以發現一條路徑,搜索過的路徑保存在一個圖結構中,然後使用圖搜索算法例如DAD^\star 或者A^\star來搜索。

tg002

有兩種方法可以認定是基於抽樣的,

離散法:它依賴於構造良好的輔助集合或者輸入,比如覆蓋在我們配置空間上的一張格子圖;

概率統計法:它依賴連續配置空間的概率樣本,待探索的可能配置或者狀態集,可能是無窮多的,這讓這類算法具有一種不錯的性質,那就是它們是概率完整的,並且有時候概率的優化,意味着它們最終會找出一個解來,前提是給予足夠的計算時間。
tg005

HybridAHybrid A^\star

1.混合AA^\star的更新方程

(假設無人車爲自行人模型)

tg101

這個方程對於絕大部分具有X,Y域的配置空間是通用的。

無人車應用中,ω\omega是個受限值,因爲無人車不可能圍繞Z周無限轉動(掃地機器人可以)。

假設最大轉向角(左右均)爲35o35^o,可以用3種新的基本運動來擴展搜索樹,搜索迷宮效果

tg103

去掉多餘信息,得到一條平滑的可駕駛曲線。

tg104

1.不是最優的,也不一定是最平滑的。;

2.有時候找不出存在解(爲了實現連續,丟棄了部分網格擴展方向的完整性);
爲了解決該問題,可以給搜索空間添加第三維,即朝向,這樣不會過早關閉實際有解的配置空間。

和標準AA^\star相比,混合AA^\star犧牲了少許完整性和優化性,但是確保了可駕駛性。

tg100

而在實際應用中,混合AA^\star效率極高,而且幾乎每次都可以找到不錯的路徑。

2. HybridAHybrid A^\star的啓發函數

The paper Junior: The Stanford Entry in the Urban Challenge is a good read overall, but Section 6.3 - Free Form Navigation is especially good and goes into detail about how the Stanford team thought about heuristics for their Hybrid A* algorithm (which they tended to use in parking lots).

僞代碼實現:

def expand(state, goal):
    next_states = []
    for delta in range(-35, 40, 5): 
        # Create a trajectory with delta as the steering angle using 
        # the bicycle model:

        # ---Begin bicycle model---
        delta_rad = deg_to_rad(delta)
        omega = SPEED/LENGTH * tan(delta_rad)
        next_x = state.x + SPEED * cos(theta)
        next_y = state.y + SPEED * sin(theta)
        next_theta = normalize(state.theta + omega)
        # ---End bicycle model-----

        next_g = state.g + 1
        next_f = next_g + heuristic(next_x, next_y, goal)

        # Create a new State object with all of the "next" values.
        state = State(next_x, next_y, next_theta, next_g, next_f)
        next_states.append(state)

    return next_states

def search(grid, start, goal):
    # The opened array keeps track of the stack of States objects we are 
    # searching through.
    opened = []
    # 3D array of zeros with dimensions:
    # (NUM_THETA_CELLS, grid x size, grid y size).
    closed = [[[0 for x in range(grid[0])] for y in range(len(grid))] 
        for cell in range(NUM_THETA_CELLS)]
    # 3D array with same dimensions. Will be filled with State() objects 
    # to keep track of the path through the grid. 
    came_from = [[[0 for x in range(grid[0])] for y in range(len(grid))] 
        for cell in range(NUM_THETA_CELLS)]

    # Create new state object to start the search with.
    x = start.x
    y = start.y
    theta = start.theta
    g = 0
    f = heuristic(start.x, start.y, goal)
    state = State(x, y, theta, 0, f)
    opened.append(state)

    # The range from 0 to 2pi has been discretized into NUM_THETA_CELLS cells. 
    # Here, theta_to_stack_number returns the cell that theta belongs to. 
    # Smaller thetas (close to 0 when normalized  into the range from 0 to 
    # 2pi) have lower stack numbers, and larger thetas (close to 2pi when 
    # normalized) have larger stack numbers.
    stack_num = theta_to_stack_number(state.theta)
    closed[stack_num][index(state.x)][index(state.y)] = 1

    # Store our starting state. For other states, we will store the previous 
    # state in the path, but the starting state has no previous.
    came_from[stack_num][index(state.x)][index(state.y)] = state

    # While there are still states to explore:
    while opened:
        # Sort the states by f-value and start search using the state with the 
        # lowest f-value. This is crucial to the A* algorithm; the f-value 
        # improves search efficiency by indicating where to look first.
        opened.sort(key=lambda state:state.f)
        current = opened.pop(0)

        # Check if the x and y coordinates are in the same grid cell 
        # as the goal. (Note: The idx function returns the grid index for 
        # a given coordinate.)
        if (idx(current.x) == goal[0]) and (idx(current.y) == goal.y):
            # If so, the trajectory has reached the goal.
            return path

        # Otherwise, expand the current state to get a list of possible 
        # next states.
        next_states = expand(current, goal)
        for next_s in next_states:
            # If we have expanded outside the grid, skip this next_s.
            if next_s is not in the grid:
                continue
            # Otherwise, check that we haven't already visited this cell and
            # that there is not an obstacle in the grid there.
            stack_num = theta_to_stack_number(next_s.theta)
            if closed[stack_num][idx(next_s.x)][idx(next_s.y)] == 0 
                and grid[idx(next_s.x)][idx(next_s.y)] == 0:
                # The state can be added to the opened stack.
                opened.append(next_s)
                # The stack_number, idx(next_s.x), idx(next_s.y) tuple 
                # has now been visited, so it can be closed.
                closed[stack_num][idx(next_s.x)][idx(next_s.y)] = 1
                # The next_s came from the current state, and is recorded.
                came_from[stack_num][idx(next_s.x)][idx(next_s.y)] = current

代碼實現:

HBF::maze_path HBF::search(vector< vector<int> > &grid, vector<double> &start, 
                           vector<int> &goal) {
  // Working Implementation of breadth first search. Does NOT use a heuristic
  //   and as a result this is pretty inefficient. Try modifying this algorithm 
  //   into hybrid A* by adding heuristics appropriately.

  /**
   * TODO: Add heuristics and convert this function into hybrid A*
   */
  vector<vector<vector<int>>> closed(
    NUM_THETA_CELLS, vector<vector<int>>(grid[0].size(), vector<int>(grid.size())));
  vector<vector<vector<maze_s>>> came_from(
    NUM_THETA_CELLS, vector<vector<maze_s>>(grid[0].size(), vector<maze_s>(grid.size())));
  double theta = start[2];
  int stack = theta_to_stack_number(theta);
  int g = 0;

  maze_s state;
  state.g = g;
  state.x = start[0];
  state.y = start[1];
  state.theta = theta;
  state.f = g + heuristic(state.x, state.y, goal);

  closed[stack][idx(state.x)][idx(state.y)] = 1;
  came_from[stack][idx(state.x)][idx(state.y)] = state;
  int total_closed = 1;
  vector<maze_s> opened = {state};
  bool finished = false;
  while(!opened.empty()) {
  	sort(opened.begin(), opened.end(), compare_maze_s);
    maze_s current = opened[0]; //grab first elment
    opened.erase(opened.begin()); //pop first element

    int x = current.x;
    int y = current.y;

    if(idx(x) == goal[0] && idx(y) == goal[1]) {
      std::cout << "found path to goal in " << total_closed << " expansions" 
                << std::endl;
      maze_path path;
      path.came_from = came_from;
      path.closed = closed;
      path.final = current;

      return path;
    }

    vector<maze_s> next_state = expand(current, goal);//擴展多個可運動的方向

    for(int i = 0; i < next_state.size(); ++i) {
      int g2 = next_state[i].g;
      double x2 = next_state[i].x;
      double y2 = next_state[i].y;
      double theta2 = next_state[i].theta;

      if((x2 < 0 || x2 >= grid.size()) || (y2 < 0 || y2 >= grid[0].size())) {
        // invalid cell
        continue;
      }
      //stack2範圍是多少?
      int stack2 = theta_to_stack_number(theta2);

      if(closed[stack2][idx(x2)][idx(y2)] == 0 && grid[idx(x2)][idx(y2)] == 0) {
        opened.push_back(next_state[i]);
        closed[stack2][idx(x2)][idx(y2)] = 1;
        came_from[stack2][idx(x2)][idx(y2)] = current;
        ++total_closed;
      }
    }
  }

  std::cout << "no valid path." << std::endl;
  HBF::maze_path path;
  path.came_from = came_from;
  path.closed = closed;
  path.final = state;

  return path;
}

環境分類

此算法是非結構化環境中,路徑探尋的最佳算法之一。

tg106

非結構化場景:應用實例有停車場和迷宮等,特點是駕駛規則限制少,速度低,與高速公路以及城市環境殊爲不同,這種環境裏並不存在明顯的參考路徑(也就是哪種車輛90%時間都在上面行駛的路徑),因爲這種環境中,情況隨時在變化。

結構化場景:相比而言,高速公路和街道是相當結構化的環境,所有的駕駛都在預定義規則的限制下進行,而我們的駕駛也可按照某些規則進行,如車輛行駛方向、車道邊界、速度限制等。這些規則對車輛行駛加以限制,當我們生成運動軌跡時,也必須滿足所有這些規則。

一種在無規則下的最佳算法,如果不能有效利用這些規則,則不適合這種結構化環境。

而對於高速公路這種結構化的環境,我們可以有效利用環境規則,比如以車道線爲參考。下面學習一種適合高速公路的軌跡生成算法。

多項式軌跡法(適用高速環境)

Frenet 座標系, s, d, t

用Frenet座標,可以簡化車輛運動模型,比如高速路上的一個超車場景

tg107

該場景實際上還忽略了一個因素,時間t,因爲高速上的環境是動態的,隨時可能有車周邊經過。

tg108

考慮三維場景(因場景是隨機的,必須考慮時間因素)

tg200

將三維場景分解爲兩個獨立的二維模型, s/t, d/t

tg201

Structured Trajectory Generation Overview

我們的目標是生成連續的軌跡,軌跡是一個函數,在任一時間點上,該函數可以求得一輛車的一個配置.

比如下場景,汽車在高速路上勻速行駛,10S後需要駛出高速,即已知當前狀態(位置,速度,加速度),終點狀態,以及時間,擬合出最佳的連續曲線。

tg300

擬合的曲線,既要考慮連續性,還要考慮平滑性,否則車輛會給人不舒適感甚至出現危險。

很明顯,引起不舒適感的因素是加速度的變化率(顛簸)。

tg301

到這裏,問題轉化爲,擬合曲線,並且加速度變化率最小(最舒適),即jerk最小化

jerk minimizing軌跡

按上面分析問題簡化爲,在單個維度,s或者d,找出能夠使得Jerk(t)最小的軌跡.

Takahashi等人已經證明:任何Jerk最優化問題中的解都可以用一個5次多項式來表示:

tg302

問題轉化爲,根據邊界條件,求5次多項式,對應有6個可調參數,6個邊界條件

tg303

假設橫軸d的初始速度/加速度,最終速度/加速度均爲0,運動曲線如下:

tg304

求Jerk係數方法:

根據運動學公式,對s(t)1次求導得到速度,2次求導得到加速度,公式如下:

s(t)=α0+α1t+α2t2+α3t3+α4t4+α55 s(t) = \alpha_0 + \alpha_1t + \alpha_2t^2 + \alpha_3t^3 + \alpha_4t^4 + \alpha_5^5

s˙(t)=α1+2α2t+3α3t2+4α4t3+5α54 \dot s(t) = \alpha_1 + 2\alpha_2t + 3\alpha_3t^2 + 4\alpha_4t^3 + 5\alpha_5^4

s¨(t)=2α2+6α3t+12α4t2+20α53 \ddot s(t) = 2\alpha_2 + 6\alpha_3t + 12\alpha_4t^2 + 20\alpha_5^3

對初始邊界條件,ti=0t_i=0:
si=s(0)=α0 s_i = s(0) = \alpha_0

si˙=si˙(0)=α1 \dot{s_i} = \dot{s_i}(0) = \alpha_1

si¨=si¨(0)=2α2 \ddot{s_i} = \ddot{s_i}(0) = 2\alpha_2

這樣,就只剩下3個未知係數:

s(t)=si+si˙t+si¨2t2+α3t3+α4t4+α5t5s(t) s(t) = s_i + \dot{s_i}t + \frac{\ddot{s_i}}{2}t^2 + \alpha_3t^3 + \alpha_4t^4 + \alpha_5t^5s(t)

s˙(t)=si˙+si¨t+3α3t2+4α4t3+5α5t4 \dot{s}(t) = \dot{s_i} + \ddot{s_i}t + 3 \alpha_3t^2 + 4\alpha_4t^3 + 5\alpha_5t^4

s¨(t)=si¨+6α3t+12α4t2+20α5t3 \ddot{s}(t) = \ddot{s_i} + 6 \alpha_3t + 12\alpha_4t^2 + 20\alpha_5t^3

方程改寫爲:
s(t)=α3t3+α4t4+α55+C1 s(t) = \alpha_3t^3 + \alpha_4t^4 + \alpha_5^5 + C_1

s˙(t)=3α3t2+4α4t3+5α54+C2 \dot s(t) = 3\alpha_3t^2 + 4\alpha_4t^3 + 5\alpha_5^4 + C_2

s¨(t)=6α3t+12α4t2+20α53+C3 \ddot s(t) = 6\alpha_3t + 12\alpha_4t^2 + 20\alpha_5^3 + C_3

當t不等於0時,

s(tf)=α3t3+α4t4+α55+C1 s(t_f) = \alpha_3t^3 + \alpha_4t^4 + \alpha_5^5 + C_1

s˙(tf)=3α3t2+4α4t3+5α54+C2 \dot s(t_f) = 3\alpha_3t^2 + 4\alpha_4t^3 + 5\alpha_5^4 + C_2

s¨(tf)=6α3t+12α4t2+20α53+C3 \ddot s(t_f) = 6\alpha_3t + 12\alpha_4t^2 + 20\alpha_5^3 + C_3

寫成矩陣形式:
tg309

根據最終配置空間值和時間,就可以求出所有係數。

實現代碼:

vector<double> JMT(vector<double> &start, vector<double> &end, double T) {
  /**
   * Calculate the Jerk Minimizing Trajectory that connects the initial state
   * to the final state in time T.
   *
   * @param start - the vehicles start location given as a length three array
   *   corresponding to initial values of [s, s_dot, s_double_dot]
   * @param end - the desired end state for vehicle. Like "start" this is a
   *   length three array.
   * @param T - The duration, in seconds, over which this maneuver should occur.
   *
   * @output an array of length 6, each value corresponding to a coefficent in 
   *   the polynomial:
   *   s(t) = a_0 + a_1 * t + a_2 * t**2 + a_3 * t**3 + a_4 * t**4 + a_5 * t**5
   *
   * EXAMPLE
   *   > JMT([0, 10, 0], [10, 10, 0], 1)
   *     [0.0, 10.0, 0.0, 0.0, 0.0, 0.0]
   */
	MatrixXd A = MatrixXd(3,3);
	auto T2 = T*T;
  	auto T3 = T2*T;
	auto T4 = T2*T2;
	auto T5 = T3*T2;
	A << T3,T4,T5,
		3*T2,4*T3,5*T4,
		6*T, 12*T2,20*T3;
  	MatrixXd B = MatrixXd(3,1);
  	B << end[0] - (start[0] + start[1]*T + 0.5*start[2]*T2),
  	end[1] - (start[1] + start[2]*T),
  	end[2] - start[2];

	MatrixXd 
Ai = A.inverse();

	MatrixXd C = Ai*B;

  	vector<double> result = {start[0],start[1],0.5*start[2]};

	for (int i = 0; i < C.size(); i++) {
		result.push_back(C.data()[i]);
	}

	return result;
}

同樣,在橫軸上也可以求出其6個參數。

即,在初始狀態,目標狀態都已知的情況下,給定一個時間t,就可以可到一個連續,平滑的曲線。

tg308

如何選擇最優

按上面的方法,對每個不同的目標配置空間,或時間T,都可以生成不同的Jerk軌跡,

那麼如何選擇最優軌跡,需要考慮哪些限制條件呢?

tg400

設計代價函數:

Cd=kjJt(d(t))+ktT+kdd12 C_d = k_jJ_t(d(t)) + k_tT + k_d d1^2

kjJt(d(t))k_jJ_t(d(t)): 懲罰Jerk較大的備選軌跡;

ktTk_tT: 制動應當迅速,時間段;

kdd12k_d d1^2: 目標狀態不應偏離道路中心線太遠;

其中kj,kt,kdk_j, k_t,k_d是這三個懲罰項的比例係數,也叫權重值,它們的大小決定是代價函數更偏向哪一方便優化。

tg401

以上損失函數僅適用於高速場景,在極端低速情況下,車輛的制動能力是不完整的,不能再將d設計爲t的五次多項式,損失函數也會有所不同,

但是這種基於優先採樣軌跡的思路,以及通過優化損失函數搜索最優軌跡的方法原理是相通的。

在縱向上,優化場景大致分三類:

跟車

匯流和停車

車速保持

在高速公路等應用場景中,目標配置中並不需要考慮目標位置(即 s1s_1 ),所以初始目標配置仍然是 (s0,s0˙,s0¨)(s_0,\dot{s_0},\ddot{s_0}),目標配置變成s1˙,s1¨\dot{s_1},\ddot{s_1},損失函數爲:

Cs=kjJt(s(t))+ktT+ks˙(s1˙sc˙)2 C_s = k_jJ_t(s(t)) + k_tT + k_{\dot{s}}(\dot{s_1} - \dot{s_c})^2

其中 scs_c 是我們想要保持的縱向速度,第三個懲罰項的引入實際上是爲了讓目標配置中的縱向速度儘可能接近設定速度,該情景下的目標配置集爲:
[s1˙,s1¨,T]ij=[[sc˙+Δsi˙],0,Tj] [\dot{s_1}, \ddot{s_1}, T]_{ij} = [[\dot{s_c} + \Delta\dot{s_i}], 0, T_j]

即優化過程中的可變參數爲 Δsi˙\Delta\dot{s_i}TjT_j ,同樣,也可以通過設置 ΔT\Delta TΔs˙\Delta \dot s 來設置軌跡採樣的密度,從而獲得一個有限的縱向軌跡集合:

tg402

以上分別討論了橫向和縱向最優軌跡搜索方法,在應用中也可以將兩個方差合併爲一個:
Ctotal=klatCd+klonCs C_{total} = k_{lat}C_d + k_{lon}C_s
這樣,就可以通過最小化CtotalC_{total}得到優化軌跡集合,不僅可以得到“最優”估計參數,還可以得到“次優”,“次次優”軌跡等。

tg407

可行性檢測

tg405

我們設計的損失函數,並沒有包括障礙物規避的相關懲罰,也沒有最大速度,最大加速度和最大麴率等控制約束限制。

每考慮這些的主要原因是,使得損失函數設計非常複雜。

基於Frenet座標系的方法,是將這些因素的考量獨立出來,在完成優化軌跡以後進行。

對所有完成的軌跡做一次可行性檢測,過濾不符合約束條件,或碰到障礙物的軌跡。

檢查內容一般有:

s方向上的速度是否超過設定的最大限速

s方向的加速度是否超過設定的最大加速度

軌跡的曲率是否超過最大麴率

軌跡是否會引起碰撞(事故)

障礙物規避,又與行爲預測有關聯,也可在預測部分就規避掉,兩者都是一個複雜的課題。

Optimal Trajectory Generation For Dynamic Street Scenarios In A Frenet Frame

小結:

以上學習了好幾種軌跡生成算法,在解決無人駕駛汽車遇到的所有情況時,沒有哪種方法可以說是最佳方法;

現實中的無人車會使用若干種方法,依情況而定,在停車場,可能使用混合方法,在車流量不大的高速上,可能使用多項式軌跡生成法;

同時也許會使用別的方法來應對其他情況,比如分叉路口,車流密集道路等。

Additional Resources on Path Planning

Indoors
Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation by S. W. Gao, et. al.

Abstract: How can a delivery robot navigate reliably to a destination in a new office building, with minimal prior information? To tackle this challenge, this paper introduces a two-level hierarchical approach, which integrates model-free deep learning and model-based path planning. At the low level, a neural-network motion controller, called the intention-net, is trained end-to-end to provide robust local navigation. The intention-net maps images from a single monocular camera and “intentions” directly to robot controls. At the high level, a path planner uses a crude map, e.g., a 2-D floor plan, to compute a path from the robot’s current location to the goal. The planned path provides intentions to the intention-net. Preliminary experiments suggest that the learned motion controller is robust against perceptual uncertainty and by integrating with a path planner, it generalizes effectively to new environments and goals.

City Navigation

Learning to Navigate in Cities Without a Map by P. Mirowski, et. al.

Abstract: Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognizable landmarks and robust visual processing, that can simultaneously support continuous self-localization (“I am here”) and a representation of the goal (“I am going there”). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. […] We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometers away. […]

Intersections

A Look at Motion Planning for Autonomous Vehicles at an Intersection by S. Krishnan, et. al.

Abstract: Autonomous Vehicles are currently being tested in a variety of scenarios. As we move towards Autonomous Vehicles, how should intersections look? To answer that question, we break down an intersection management into the different conundrums and scenarios involved in the trajectory planning and current approaches to solve them. Then, a brief analysis of current works in autonomous intersection is conducted. With a critical eye, we try to delve into the discrepancies of existing solutions while presenting some critical and important factors that have been addressed. Furthermore, open issues that have to be addressed are also emphasized. We also try to answer the question of how to benchmark intersection management algorithms by providing some factors that impact autonomous navigation at intersection.

Planning in Traffic with Deep Reinforcement Learning

DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation by L. Fridman, J. Terwilliger and B. Jenik

Abstract: We present a traffic simulation named DeepTraffic where the planning systems for a subset of the vehicles are handled by a neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and researchers in order to inspire and fuel the exploration and evaluation of deep Q-learning network variants and hyperparameter configurations through large-scale, open competition. This paper investigates the crowd-sourced hyperparameter tuning of the policy network that resulted from the first iteration of the DeepTraffic competition where thousands of participants actively searched through the hyperparameter space.

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章