IROS 2019 Paper Notes, Part 2

Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery

Abstract

To autonomously navigate and plan interactions in real-world environments, robots require the ability to robustly perceive and map complex, unstructured surrounding scenes. Besides building an internal representation of the observed scene geometry, the key insight toward a truly functional understanding of the environment is the usage of higher-level entities during mapping, such as individual object instances.
(My understanding: beyond using the geometry of objects in the environment as sensor measurements for SLAM, the robot must also build a semantic understanding of its surroundings to reach a higher level of perception.)
This work presents an approach to incrementally build volumetric object-centric maps during online scanning with a localized RGB-D camera. First, a per-frame segmentation scheme combines an unsupervised geometric approach with instance-aware semantic predictions to detect both recognized scene elements as well as previously unseen objects.
(Step 1: each frame is segmented with an unsupervised geometry-based method combined with instance-aware semantic predictions, detecting both recognized scene elements and previously unseen objects.)
Next, a data association step tracks the predicted instances across the different frames.
(Step 2: a data association step tracks the predicted instances across different frames.)
Finally, a map integration strategy fuses information about their 3D shape, location, and, if available, semantic class into a global volume.
(Finally, a map integration strategy fuses each object's 3D shape, location, and, where available, semantic class into a global map.)
Evaluation on a publicly available dataset shows that the proposed approach for building instance-level semantic maps is competitive with state-of-the-art methods, while additionally able to discover objects of unseen categories. The system is further evaluated within a real-world robotic mapping setup, for which qualitative results highlight the online nature of the method.
Code is available at https://github.com/ethz-asl/voxblox-plusplus.

Main contributions

  1. A combined geometric-semantic segmentation scheme that extends object detection to novel, previously unseen categories.
  2. A data association strategy for tracking and matching instance predictions across multiple frames.
  3. Evaluation of the framework on a publicly available dataset and within an online robotic mapping setup.

Related work

  1. Object detection and segmentation
    Recent Mask R-CNN-style architectures can predict a per-pixel semantic annotation mask for each detected instance. The main limitation of learning-based instance segmentation methods is the need for large amounts of training data: annotating such data for every category that might be encountered in real-world scenes carries an enormous labor cost. Moreover, these algorithms can only recognize the fixed set of classes provided in the training set, and therefore cannot correctly segment and classify objects of unseen categories.
  2. Semantic object-level mapping
    Recent advances in deep learning have made it possible to integrate rich semantic information into SLAM systems. [16] fuses the semantic predictions of a CNN into a dense map built within a SLAM framework. However, conventional semantic segmentation is not instance-aware: it cannot disambiguate between individual instances belonging to the same category, so the method in [16] provides no information about the geometry and relative placement of individual objects in the scene. Purely geometry-based methods, on the other hand, tend to over-segment articulated scene elements. Without instance-level information, a joint semantic-geometric segmentation is therefore insufficient to partition the scene into distinct, independent objects. This section analyzes and summarizes the current mainstream geometry-based and learning-based approaches.
    The closely related work [20] proposes an incremental geometry-based segmentation strategy that couples geometric segments with YOLO-v2 bounding boxes in order to classify, and fuse into the same instance, segments belonging to one object.
    In this paper, the volumetric TSDF-based representation does not discard valuable free-space information, and explicitly distinguishes observed free space from unknown space in the 3D map. In contrast to all previous methods, the approach incrementally provides a dense volumetric reconstruction of the environment that contains shape and pose information for both known and unknown object elements in the scene.

Method

The proposed method consists of four main steps: (1) geometric segmentation, (2) semantic instance-aware segmentation refinement, (3) data association, and (4) map integration. First, the depth map is segmented with a convexity-based geometric method into segments whose contours closely follow the physical boundaries of real-world objects. Next, the corresponding RGB frame is processed by Mask R-CNN to detect object instances and produce pixel-wise semantic labels; each instance mask is used to semantically label the associated depth segments and to merge segments belonging to the same geometrically over-segmented, non-convex object instance. A data association strategy then matches the segments discovered in the current frame, and the instances they compose, against segments already stored in the map. Finally, the segments are integrated into a dense 3D map, with a fusion strategy that keeps track of the individual segmentation results.
(1) Geometric segmentation
Compute the normal of each depth point, then compute the angle between neighboring normals to find region boundaries; locations with large depth discontinuities are additionally marked as boundaries.
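A rough sketch of this boundary test (function name and thresholds are my own assumptions; the paper's actual convexity criterion is more involved than a plain angle check):

```python
import numpy as np

def segment_boundaries(depth, normals, angle_thresh_deg=30.0, depth_jump=0.05):
    """Mark boundary pixels from (a) large angles between horizontally
    neighboring surface normals and (b) large depth discontinuities.

    depth:   H x W array of depth values (meters)
    normals: H x W x 3 array of unit surface normals
    Returns a boolean H x W mask of boundary pixels.
    """
    # Angle between each pixel's normal and its right-hand neighbor's.
    dot = np.clip(np.sum(normals[:, :-1] * normals[:, 1:], axis=-1), -1.0, 1.0)
    angle = np.degrees(np.arccos(dot))
    # Depth discontinuity between horizontal neighbors.
    jump = np.abs(depth[:, 1:] - depth[:, :-1])
    edges = np.zeros(depth.shape, dtype=bool)
    edges[:, :-1] = (angle > angle_thresh_deg) | (jump > depth_jump)
    return edges
```

A full implementation would apply the same test to vertical neighbors and then extract connected components between the boundaries.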
(2) Semantic instance-aware segmentation refinement
To complement the unsupervised geometric segmentation of each depth frame with semantic object instance information, the corresponding RGB images are processed with the Mask R-CNN framework [1]. (Mask R-CNN supplies the semantic instance information that is combined with the unsupervised geometric segmentation of the depth frame.)
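The pairing of instance masks with geometric segments can be sketched as an overlap test (a simplified illustration; the function name and the 0.5 overlap threshold are my assumptions, not the paper's exact values):

```python
import numpy as np

def assign_segments_to_instances(seg_labels, instance_masks, overlap_thresh=0.5):
    """For each predicted instance mask, collect the geometric segments whose
    area lies mostly (> overlap_thresh) inside the mask; these segments are
    then merged into a single object instance.

    seg_labels:     H x W integer array of geometric segment IDs (0 = none)
    instance_masks: list of H x W boolean Mask R-CNN instance masks
    Returns, per instance, the list of merged segment IDs.
    """
    instances = []
    for mask in instance_masks:
        merged = []
        for sid in np.unique(seg_labels):
            if sid == 0:  # skip unsegmented / background pixels
                continue
            seg = seg_labels == sid
            overlap = np.logical_and(seg, mask).sum() / seg.sum()
            if overlap > overlap_thresh:
                merged.append(int(sid))
        instances.append(merged)
    return instances
```

This is what lets a non-convex object that the geometric stage over-segmented into several convex pieces be re-assembled under one instance label.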
(3) Data association
Because the frame-wise segmentation processes each incoming RGB-D image pair independently, it lacks any spatio-temporal information about corresponding segments and instances across the different frames. Specifically, this means that it does not provide an association between the set of predicted segments S_t of one frame and the set of segments S_{t+1} of the next. (Since each RGB-D pair is processed independently, cross-frame associations are lost; the same object may also be segmented differently in different frames.)
A data association step is proposed here to track corresponding geometric segments and predicted object instances across frames.
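One plausible form of this step is greedy matching by overlap score (a sketch under my own assumptions; the paper's actual association strategy and scores may differ):

```python
def greedy_match(overlaps, min_overlap=0.3):
    """Greedily pair current-frame segments with map segments in order of
    descending overlap score, enforcing one-to-one matches.

    overlaps: dict mapping (current_segment_id, map_segment_id) -> score
    Returns a dict current_segment_id -> map_segment_id; segments left
    unmatched would start new entries in the map.
    """
    pairs = sorted(
        ((score, cur, ref) for (cur, ref), score in overlaps.items()
         if score >= min_overlap),
        reverse=True)
    matched_cur, matched_ref, assoc = set(), set(), {}
    for score, cur, ref in pairs:
        if cur in matched_cur or ref in matched_ref:
            continue  # both sides may be matched at most once
        assoc[cur] = ref
        matched_cur.add(cur)
        matched_ref.add(ref)
    return assoc
```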
(4) Map integration
The 3D segments discovered in the current frame, including some which are enriched with class and instance information, are fused into a global volumetric map. To this end, the Voxblox [10] TSDF-based dense mapping framework is extended to additionally encode object segmentation information.
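Voxblox itself is a C++ framework; the fragment below is only a minimal Python illustration of the idea of extending a TSDF voxel with per-voxel label votes (class and field names are hypothetical):

```python
from collections import defaultdict

class LabeledVoxel:
    """TSDF voxel extended with a segment-label vote count, illustrating
    (in simplified form) how a TSDF map can also encode segmentation."""

    def __init__(self):
        self.sdf = 0.0                      # fused signed distance
        self.weight = 0.0                   # accumulated integration weight
        self.label_votes = defaultdict(int) # votes per segment/instance label

    def integrate(self, sdf_obs, weight_obs, label):
        # Standard TSDF update: weighted running average of the distance.
        total = self.weight + weight_obs
        self.sdf = (self.sdf * self.weight + sdf_obs * weight_obs) / total
        self.weight = total
        # Label fusion: count votes; the majority label wins.
        self.label_votes[label] += 1

    def label(self):
        if not self.label_votes:
            return None
        return max(self.label_votes, key=self.label_votes.get)
```

The design point is that geometry and segmentation are fused in the same volume, so the map can report both the reconstructed surface and the object each voxel belongs to.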
