FollowNet: Robot Navigation by Following Natural Language Directions with DRL

Abstract

We present FollowNet, an end-to-end differentiable neural architecture for learning multi-modal navigation policies. FollowNet maps natural language instructions, together with visual and depth inputs, to locomotion primitives.
FollowNet processes the instruction with an attention mechanism conditioned on its visual and depth input, so that it focuses on the relevant parts of the command while performing the navigation task. Deep reinforcement learning (RL) with a sparse reward learns the state representation, the attention function, and the control policy simultaneously.

Introduction

The novel aspect of the FollowNet architecture is a language instruction attention mechanism that is conditioned on the agent’s sensory observations. This allows the agent to do two things.

  1. First, it keeps track of the instruction and focuses on different parts of the command as it explores the environment.
  2. Second, it associates motion primitives, sensory observations, and sections of the instruction with the reward received, which enables the agent to generalize to new instructions.

Related Work

In this work, we provide natural language instructions instead of an explicit goal, and the agent must learn to interpret the instructions to complete the task. (Most end-to-end DRL navigation methods assume an explicitly specified goal position; here the agent has to infer the implicit goal from the instruction itself.)

Methods

Problem formulation

We assume the robot to be a point-mass with a 3-DOF pose $(x, y, \theta)$, navigating in a 2D grid overlaid on a 3D indoor house environment. To train a DQN agent, we formulate the task as a POMDP (partially observable Markov decision process): a tuple $(O, A, D, R)$ with observations $o = [o_{NL}, o_{V}] \in O$, where $o_{NL} = [\omega_{1}, \omega_{2}, \cdots, \omega_{i}]$ is a natural language instruction sampled from a set of user-provided directions for reaching a goal, and $o_{V}$ is the visual input available to the agent, i.e., the image the robot sees at time step $i$. The set of actions is $A = (\text{turn } \frac{\pi}{2},\ \text{go straight},\ \text{turn } \frac{3\pi}{2})$. The system dynamics $D: O \times A \rightarrow O$ are deterministic and apply the chosen action to the robot. The reward $R: O \rightarrow \mathbb{R}$ rewards the agent for reaching a landmark (waypoint) mentioned in the instruction.
Fig. 2 provides an example task, where the robot starts at the position and orientation specified by the blue triangle, and must reach the goal location specified by the red circle.
[Fig. 2: example navigation task]
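To make the POMDP tuple concrete, here is a minimal, runnable Python sketch. The class name, action names, and grid dynamics are illustrative assumptions (they are not taken from the paper), and the camera image is stubbed out with a zero array.

```python
import math
import numpy as np

ACTIONS = ("turn_left_90", "go_straight", "turn_right_90")  # action set A


class GridNavPOMDP:
    """Hypothetical point-mass robot with pose (x, y, theta) on a 2D grid."""

    def __init__(self, instruction_tokens, landmarks):
        self.instruction = instruction_tokens      # o_NL = [w_1, ..., w_i]
        self.landmarks = set(landmarks)            # waypoints mentioned in o_NL
        self.x, self.y, self.theta = 0, 0, 0.0     # start pose

    def _observe(self):
        o_V = np.zeros((64, 64, 3))                # stand-in for the camera image
        return {"o_NL": self.instruction, "o_V": o_V}

    def step(self, action):
        """Deterministic dynamics D: O x A -> O, followed by the sparse reward R."""
        assert action in ACTIONS
        if action == "turn_left_90":
            self.theta = (self.theta + math.pi / 2) % (2 * math.pi)
        elif action == "turn_right_90":
            self.theta = (self.theta - math.pi / 2) % (2 * math.pi)
        else:                                      # move one grid cell forward
            self.x += round(math.cos(self.theta))
            self.y += round(math.sin(self.theta))
        reached = (self.x, self.y) in self.landmarks
        if reached:
            self.landmarks.discard((self.x, self.y))   # reward each landmark once
        return self._observe(), (1.0 if reached else 0.0)


# Example: a landmark two cells ahead of the start pose.
env = GridNavPOMDP(["go", "straight", "to", "the", "kitchen"], landmarks=[(2, 0)])
for a in ("go_straight", "go_straight"):
    obs, r = env.step(a)
    print(a, (env.x, env.y), r)
```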

FollowNet

We present FollowNet, a neural architecture for approximating the action-value function directly from the language and visual inputs. (Approximating the action-value function directly from the inputs indicates the method operates within the DQN framework.)

  1. To simplify the image processing task, we assume a separate preprocessing step parses the visual input $o_{V} \in \mathbb{R}^{n \times m}$ into a semantic segmentation $o_{S}$, which assigns a one-hot semantic class id to each pixel, and a depth map $o_{D}$, which assigns to each pixel a real number corresponding to its distance from the robot. (These inputs are then handled by a series of convolutional (CNN) and fully connected (FC) layers.)
  2. We use a single-layer bidirectional GRU network to encode the natural language instruction. To enable the agent to focus on different parts of the instruction depending on the context, we add a feed-forward attention layer $FF_{A}$ conditioned on $v_{C}$, the concatenated embeddings of the visual and language inputs, to obtain unnormalized scores $e_{i}$ for each token $\omega_{i}$. The $e_{i}$ are normalized with the softmax function to obtain the attention scores $\alpha_{i}$, which correspond to the relative importance of each token of the instruction at the current time step. We take the attention-weighted mean of the GRU output vectors $o_{i}$ and pass it through another feed-forward layer to obtain $v_{L} \in \mathbb{R}^{d_{L}}$, the final encoding of the natural language instruction (a sketch of this forward pass, including the attention layer, follows the list).
    [Figure: FollowNet network architecture]
  3. The Q-function is then estimated from the concatenation $[v_{S}; v_{D}; v_{L}]$ passed through a final feed-forward layer. During training, we sample actions from the Q-function with an $\epsilon$-greedy policy to collect experience, and update the Q-network to minimize the Bellman error over batches of transitions using gradient descent (a minimal sketch of this update is given after the list). After the Q-function is trained, we use the greedy policy $\pi(o): O \rightarrow A$ with respect to the learned $\hat{Q}$, $\pi(o) = \pi^{Q}(o) = \arg\max_{a \in A} \hat{Q}(o, a)$, to take the robot to the goal described in the instruction $o_{NL}$.
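The three steps above can be put together in a short PyTorch sketch of the forward pass. This is an illustrative reconstruction, not the authors' code: the layer sizes, the small CNN/FC encoders for $o_{S}$ and $o_{D}$, and the exact form of the attention score $e_{i} = FF_{A}([o_{i}; v_{C}])$ (with $v_{C}$ assumed to concatenate $v_{S}$, $v_{D}$, and the GRU's final hidden state) are assumptions consistent with the description, not details given in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FollowNetSketch(nn.Module):
    def __init__(self, vocab_size, n_classes, emb=32, hid=64, vis=64, n_actions=3):
        super().__init__()
        # 1. Visual branch: small CNN + FC encoders for segmentation and depth.
        self.seg_cnn = nn.Sequential(
            nn.Conv2d(n_classes, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 16, 5, stride=2), nn.ReLU(), nn.Flatten())
        self.depth_cnn = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 16, 5, stride=2), nn.ReLU(), nn.Flatten())
        self.seg_fc = nn.LazyLinear(vis)      # -> v_S
        self.depth_fc = nn.LazyLinear(vis)    # -> v_D
        # 2. Language branch: embedding + single-layer bidirectional GRU.
        self.embed = nn.Embedding(vocab_size, emb)
        self.gru = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid + 2 * vis + 2 * hid, 1)  # FF_A -> e_i
        self.lang_fc = nn.Linear(2 * hid, hid)                 # -> v_L
        # 3. Q head over the concatenated embeddings [v_S; v_D; v_L].
        self.q_head = nn.Linear(2 * vis + hid, n_actions)

    def forward(self, o_S, o_D, tokens):
        v_S = F.relu(self.seg_fc(self.seg_cnn(o_S)))
        v_D = F.relu(self.depth_fc(self.depth_cnn(o_D)))
        out, h_n = self.gru(self.embed(tokens))        # out: (B, T, 2*hid)
        h_last = torch.cat([h_n[0], h_n[1]], dim=-1)   # final hidden state, both directions
        v_C = torch.cat([v_S, v_D, h_last], dim=-1)    # assumed conditioning vector
        # Feed-forward attention: score each token output o_i against v_C.
        ctx = v_C.unsqueeze(1).expand(-1, out.size(1), -1)
        e = self.attn(torch.cat([out, ctx], dim=-1))   # unnormalized scores e_i
        alpha = torch.softmax(e, dim=1)                # attention weights alpha_i
        v_L = self.lang_fc((alpha * out).sum(dim=1))   # attention-weighted mean -> v_L
        return self.q_head(torch.cat([v_S, v_D, v_L], dim=-1))  # Q(o, a) for each action


# Example: one 64x64 observation pair and a 5-token instruction.
net = FollowNetSketch(vocab_size=100, n_classes=8)
q_values = net(torch.zeros(1, 8, 64, 64), torch.zeros(1, 1, 64, 64),
               torch.randint(0, 100, (1, 5)))
print(q_values.shape)  # torch.Size([1, 3])
```

Because the attention weights $\alpha_{i}$ sum to one over the tokens, the weighted sum of the GRU outputs is exactly the attention-weighted mean used to form $v_{L}$.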
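Training then follows standard DQN, as stated in step 3. The sketch below shows $\epsilon$-greedy action selection for collecting experience and one gradient-descent step on the Bellman error; the target network, replay buffer, and hyperparameters are standard DQN machinery assumed here rather than details taken from the text above.

```python
import random
import torch
import torch.nn.functional as F


def select_action(q_net, obs, epsilon, n_actions=3):
    """epsilon-greedy policy used to collect experience (obs is the tuple of network inputs)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(*obs).argmax(dim=-1).item())


def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step minimizing the Bellman error over a batch of transitions."""
    obs, actions, rewards, next_obs, done = batch
    q = q_net(*obs).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(o, a) taken
    with torch.no_grad():
        target = rewards + gamma * (1 - done) * target_net(*next_obs).max(dim=1).values
    loss = F.smooth_l1_loss(q, target)                           # Bellman error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```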