Advancing in Semantic Segmentation: Looking Back at CVPR 2015 (Part 4)

Deep Hierarchical Parsing for Semantic Segmentation
Abhishek Sharma, Oncel Tuzel, David W. Jacobs

 

 

Abstract:

This paper proposes a learning-based approach to scene parsing inspired by the deep Recursive Context Propagation Network (RCPN). RCPN is a deep feed-forward neural network that utilizes the contextual information from the entire image, through bottom-up followed by top-down context propagation via random binary parse trees. This improves the feature representation of every super-pixel in the image for better classification into semantic categories. We analyze RCPN and propose two novel contributions to further improve the model. We first analyze the learning of RCPN parameters and discover the presence of bypass error paths in the computation graph of RCPN that can hinder contextual propagation. We propose to tackle this problem by including the classification loss of the internal nodes of the random parse trees in the original RCPN loss function. Secondly, we use an MRF on the parse tree nodes to model the hierarchical dependency present in the output. Both modifications provide performance boosts over the original RCPN and the new system achieves state-of-the-art performance on Stanford Background, SIFT-Flow and Daimler urban datasets.

 


Right after finishing the abstract, we run into those familiar words again: contextual information. This paper is essentially an improvement on the RCPN algorithm, in two places: one follows a ResNet-like line of thinking (I specifically checked: ResNet came out in late 2015, after CVPR 2015, so what can one say? This paper was earlier in raising this idea of side-branch paths that let signals route around intermediate units), and the second is the use of a Markov random field.

From just these two CVPR segmentation papers, one can already see that image segmentation research in 2015 concentrated on context, convolutional neural networks, and various kinds of conditional random fields. It is also undeniable that these ideas remain the soul of semantic segmentation today.

 

 

1. Introduction

Semantic segmentation refers to the problem of labeling every pixel in an image with the correct semantic category. Handling the immense variability in the appearance of semantic categories requires the use of context to achieve human-level accuracy, as shown, for example, by [24, 14, 13]. Specifically, [14, 13] found that human performance in labeling a super-pixel is worse than a computer when both have access to that super-pixel only. Effectively using context presents a significant challenge, especially when a real-time solution is required.

An elegant deep recursive neural network approach for semantic segmentation was proposed in [19], referred to as RCPN. The main idea was to facilitate the propagation of contextual information from each super-pixel to every other super-pixel through random binary parse trees. First, a semantic mapper mapped visual features of the super-pixels into a semantic space. This was followed by a recursive combination of semantic features of two adjacent image regions, using a combiner, to yield the holistic feature vector of the entire image, termed the root feature. Next, the global information contained in the root feature was disseminated to every super-pixel in the image, using a decombiner, followed by classification of each super-pixel via a categorizer. The parameters were learned by minimizing the classification loss of the super-pixels by backpropagation through structure [5]. RCPN was shown to outperform recent approaches in terms of per-pixel accuracy (PPA) and mean-class accuracy (MCA). Most interestingly, it was almost two orders of magnitude faster than competing algorithms.

RCPN's speed and state-of-the-art performance motivate us to carefully analyze it. In this paper we show that it still has some weaknesses and we show how to remedy them. In particular, the direct path from the semantic mapper to the categorizer gives rise to bypass errors that can cause RCPN to bypass the combiner and decombiner assembly. This can cause back-propagation to reduce RCPN to a simple multilayer neural network for each super-pixel. We propose modifications to RCPN that overcome this problem:
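To make the bottom-up/top-down data flow concrete, here is a minimal NumPy sketch of an RCPN-style forward pass over a toy parse tree. The single-layer tanh sub-networks, the feature dimensions, and the fixed tree ((0,1),(2,3)) are illustrative assumptions of mine; the actual model uses learned multi-layer sub-networks and randomly sampled binary parse trees over adjacent super-pixels.

```python
import numpy as np

# Sketch of an RCPN-style forward pass; all sizes and parameters are assumed for illustration.
rng = np.random.default_rng(0)
d_vis, d_sem, n_classes = 64, 32, 8          # visual dim, semantic dim, #categories (assumed)

W_map  = rng.normal(scale=0.1, size=(d_sem, d_vis))      # semantic mapper
W_comb = rng.normal(scale=0.1, size=(d_sem, 2 * d_sem))  # combiner: [child1; child2] -> parent
W_dec  = rng.normal(scale=0.1, size=(d_sem, 2 * d_sem))  # decombiner: [child; parent context] -> enhanced child
W_cat  = rng.normal(scale=0.1, size=(n_classes, d_sem))  # categorizer

def semantic_map(v):           return np.tanh(W_map @ v)
def combine(c1, c2):           return np.tanh(W_comb @ np.concatenate([c1, c2]))
def decombine(child, context): return np.tanh(W_dec @ np.concatenate([child, context]))
def categorize(x):
    logits = W_cat @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy image: 4 super-pixels, combined by a fixed binary parse tree ((0,1),(2,3)).
visual = [rng.normal(size=d_vis) for _ in range(4)]
sem = [semantic_map(v) for v in visual]       # bottom-up: map leaves into semantic space

n01  = combine(sem[0], sem[1])                # internal nodes
n23  = combine(sem[2], sem[3])
root = combine(n01, n23)                      # holistic "root feature" of the whole image

ctx01 = decombine(n01, root)                  # top-down: propagate global context downward
ctx23 = decombine(n23, root)
enhanced = [decombine(sem[0], ctx01), decombine(sem[1], ctx01),
            decombine(sem[2], ctx23), decombine(sem[3], ctx23)]

probs = [categorize(x) for x in enhanced]     # per-super-pixel class distributions
print(np.round(probs[0], 3))
```

In this sketch, something like the bypass path the paper describes is visible: each leaf's own semantic feature feeds the final decombiner call directly, so gradients from the categorizer can reach the semantic mapper without ever passing through the combiner.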

1. Pure-node RCPN - We improve the loss function by adding the classification loss of those internal nodes of the random parse trees that correspond to a single semantic category, referred to as pure-nodes. This serves three purposes.

a) It provides more labels for training, which results in better generalization.

b) It encourages stronger gradients deep in the network.

c) Lastly, it tackles the problem of bypass errors, resulting in better use of contextual information.

2. Tree MRF RCPN - Pure-node RCPN also provides us with reliable estimates of the internal node label distributions. We utilize the label distribution of the internal nodes to define a tree-style MRF on the parse tree to model the hierarchical dependency present in the output.

The resulting architectures provide promising improvements over the previous state-of-the-art on three semantic segmentation datasets: Stanford background [6], SIFT flow [11] and Daimler urban [16]. The next section describes some of the related works, followed by a brief overview of RCPN in Sec. 3. We describe our proposed methods in Sec. 4, followed by experiments in Sec. 5. Finally, we conclude in Sec. 6.
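To make the first modification concrete, the sketch below contrasts the original leaf-only loss with the pure-node loss: internal nodes whose covered super-pixels all carry the same ground-truth label contribute an extra classification term, while mixed nodes are skipped. The cross-entropy form and the `lam` weight on the internal-node term are my own assumptions for illustration, not the paper's exact weighting.

```python
import numpy as np

def cross_entropy(p, y):
    """Negative log-likelihood of the true class y under distribution p."""
    return -np.log(p[y] + 1e-12)

def pure_node_loss(leaf_probs, leaf_labels, node_probs, node_leaf_sets, lam=1.0):
    """
    leaf_probs    : class distributions predicted for the super-pixels (leaves)
    leaf_labels   : ground-truth class index per super-pixel
    node_probs    : class distributions predicted for internal parse-tree nodes
    node_leaf_sets: for each internal node, the indices of the leaves it covers
    lam           : weight on the internal-node term (assumed hyper-parameter)
    """
    # Original RCPN term: classification loss of every super-pixel.
    loss = sum(cross_entropy(p, y) for p, y in zip(leaf_probs, leaf_labels))

    # Pure-node term: only internal nodes whose covered super-pixels share one
    # label contribute; mixed nodes have no single ground truth and are skipped.
    for p, leaves in zip(node_probs, node_leaf_sets):
        labels = {leaf_labels[i] for i in leaves}
        if len(labels) == 1:                     # "pure" node
            loss += lam * cross_entropy(p, labels.pop())
    return loss

# Toy example: 4 super-pixels, tree ((0,1),(2,3)) plus a root node.
rng = np.random.default_rng(1)
def rand_dist(k=8):
    x = rng.random(k)
    return x / x.sum()

leaf_probs  = [rand_dist() for _ in range(4)]
leaf_labels = [2, 2, 5, 5]
node_probs     = [rand_dist(), rand_dist(), rand_dist()]   # nodes (0,1), (2,3), root
node_leaf_sets = [[0, 1], [2, 3], [0, 1, 2, 3]]            # root covers mixed labels -> skipped

print(f"{pure_node_loss(leaf_probs, leaf_labels, node_probs, node_leaf_sets):.3f}")
```

Because the extra terms attach to nodes deep inside the parse tree, they supply more labels, push gradient into the combiner even when the leaf losses flow mostly through the bypass path, and thereby serve purposes (a)-(c) listed above.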

 
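As for the second modification, the paper defines a tree-style MRF over the parse-tree nodes. Below is a hedged sketch of MAP inference on such a tree-structured MRF via max-product dynamic programming, with unary terms taken from the node label distributions and a simple agreement bonus `mu` between a parent and its children. The potential form, node numbering, and toy distributions are illustrative assumptions of mine; the paper defines its own potentials and inference procedure.

```python
import numpy as np

def map_labels(children, log_unary, mu=0.5):
    """
    children : {node: (left_child, right_child)} for internal nodes (leaves absent)
    log_unary: {node: length-K array of log label probabilities for that node}
    mu       : assumed bonus for a child taking the same label as its parent
    Returns  : {node: MAP label}, via upward max-product messages and downward backtracking.
    """
    K = len(next(iter(log_unary.values())))
    pairwise = mu * np.eye(K)                 # parent/child agreement potential (assumed form)
    up_msg, best_child_label = {}, {}

    def upward(node):
        # Best score of the subtree rooted at `node`, for every possible parent label.
        score = log_unary[node].copy()
        if node in children:
            for c in children[node]:
                score += upward(c)
        table = score[None, :] + pairwise     # [parent_label, node_label]
        best_child_label[node] = table.argmax(axis=1)
        up_msg[node] = table.max(axis=1)
        return up_msg[node]

    def downward(node, label, out):
        # Backtrack the labels chosen during the upward pass.
        out[node] = int(label)
        if node in children:
            for c in children[node]:
                downward(c, best_child_label[c][label], out)

    all_children = {c for kids in children.values() for c in kids}
    root = next(n for n in log_unary if n not in all_children)
    root_score = log_unary[root].copy()
    for c in children.get(root, ()):
        root_score += upward(c)
    labels = {}
    downward(root, int(root_score.argmax()), labels)
    return labels

# Toy parse tree over 4 super-pixels: leaves 0..3, internal nodes 4=(0,1), 5=(2,3), 6=root.
rng = np.random.default_rng(2)
children  = {4: (0, 1), 5: (2, 3), 6: (4, 5)}
log_unary = {n: np.log(rng.dirichlet(np.ones(8))) for n in range(7)}
print(map_labels(children, log_unary))
```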


 

The introduction first lays the paper's cards on the table: RCPN and super-pixels.

 

 

 

 
