ResNet

一、綜述

ResNet(Residual Networks,殘差網絡),論文——Deep Residual Learning for Image Recognition(下載地址:https://arxiv.org/pdf/1512.03385.pdf),是MSRA(微軟亞洲研究院)的何凱明(http://kaiminghe.com/)團隊設計的。在2015年ImageNet上大放異彩。在ImageNet的classification、detection、localization以及COCO的detection和segmentation上均斬獲了第一名的成績,而且該論文也獲得了CVPR2016的best paper,實在是實至名歸。

ResNet最根本的動機就是所謂的退化(degradation)問題,即當模型的層次加深時,錯誤率卻提高了,如下圖:


一般認爲,隨着模型的深度加深,學習能力增強,即更深的模型不應當產生比它更淺的模型更高的錯誤率。上圖表面:常規的網絡的堆疊(plain network)在網絡很深的時候,效果卻越來越差了。這裏其中的原因之一即是網絡越深,梯度消失的現象就越來越明顯。

針對這個問題,作者提出了一個Residual的結構:


通過在輸出個輸入之間引入一個shortcut connection(可以翻譯爲越層鏈接),而不是簡單的堆疊網絡,這樣可以解決網絡由於很深出現梯度消失的問題,從而可可以把網絡做的很深。ResNet其中一個網絡結構如下圖所示 :


最下邊的是VGG-19網絡,中間爲Plain Network,最上邊的是基於Plain Network構建的ResNet網絡。Plain Network 主要是受 VGG 網絡啓發,主要採用3*3濾波器,遵循兩個設計原則:

  1. 對於相同輸出特徵圖尺寸,卷積層有相同個數的濾波器
  2. 如果特徵圖尺寸縮小一半,濾波器個數加倍以保持每個層的計算複雜度。通過步長爲2的卷積來進行降採樣。一共34個權重層。

可以看到,ResNet網絡就是在傳統的順序堆疊網絡上增加了shortcut connection。ResNet的shortcut connection可以這樣理解:


殘差網絡單元其中可以分解成右圖的形式,從圖中可以看出,殘差網絡其實是由多種路徑組合的一個網絡,直白了說,殘差網絡其實是很多並行子網絡的組合,整個殘差網絡其實相當於一個多人投票系統(Ensembling)。下面來說明爲什麼可以這樣理解:

如果把殘差網絡理解成一個Ensambling系統,那麼網絡的一部分就相當於少一些投票的人,如果只是刪除一個基本的殘差單元,對最後的分類結果應該影響很小;而最後的分類錯誤率應該適合刪除的殘差單元的個數成正比的,論文裏的結論也印證了這個猜測。

ResNet殘差學習模塊現在也有很多種,甚至可以根據項目要求自己定義。下圖顯示的就是兩層和三層對應的學習模塊:


二、論文核心

ResNet的核心內容之一,即“Deeper Bottleneck Architectures”(以下簡稱DBA),論文裏的原圖是這樣的:


然後我們再看一下原作中的ResNet全貌:


上圖以50-layer爲例:在進入DBA之前,比較簡單:

①卷積層"7×7, 64, stride 2"、

②BN層、

③ReLU層、

④池化層"3×3 max pool, stride 2"

接下來有四個DBA,拿最上一個DBA爲例,其內部具體結構如下圖:



三、Tensorflow silm實現(前向)

#coding:utf-8
import collections
import tensorflow as tf
import time
import math
from datetime import datetime
slim = tf.contrib.slim

#定義了一個block類
class Block(collections.namedtuple('Bolck', ['scope', 'init_fn', 'args'])):
    'A named tuple describing a ResNet block.'

#定義了一個下采樣函數
def subsample(inputs, factor, scope = None):
    if factor == 1:
        return inputs
    else:
        return slim.max_pool2d(inputs, [1, 1], stride = factor, scope = scope)

#定義conv2d_same函數創建卷積層
def conv2d_same(inputs, num_outputs, kernel_size, stride, scope = None):
    if stride == 1:
        return slim.conv2d(inputs, num_outputs, kernel_size, \
                           stride = 1, padding = 'SAME', \
                           scope = scope)
    else:
        pad_total = kernel_size - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
        return slim.conv2d(inputs, num_outputs, kernel_size, \
                           stride = stride, padding = 'VALID', \
                           scope = scope)

#定義堆疊blocks的函數
#@這個符號用於裝飾器中,用於修飾一個函數,把被修飾的函數作爲參數傳遞給裝飾器
@slim.add_arg_scope
def stack_blocks_dense(net, blocks, outputs_collections = None):
    for block in blocks:
        with tf.variable_scope(block.scope, 'block', [net]) as sc:
            for i, unit in enumerate(block.args):
                with tf.variable_scope('unit_%d' %(i + 1), values = [net]):
                    unit_depth, unit_depth_bottleneck, unit_stride = unit 
                    net = block.unit_fn(net, depth = unit_depth,
                                        unit_depth_bottleneck = unit_depth_bottleneck,
                                        stride = unit_stride)
                    net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)

    return net

#創建ResNet通用的arg_scope
def resnet_arg_scope(is_training = True,
                     weight_decay = 0.0001,
                     batch_norm_decay = 0.997,
                     batch_norm_epsilon = 1e-5,
                     batch_norm_scale = True):

    batch_norm_params = {
        'is_training': is_training,
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'scale': batch_norm_scale,
        'updates_collections': tf.GraphKeys.UPDATE_OPS,
    }

    with slim.arg_scope(
            [slim.conv2d],
            weights_regularizer = slim.l2_regularizer(weight_decay),
            weights_initializer = slim.variance_scaling_initializer(),
            activation_fn = tf.nn.relu,
            normalizer_fn = slim.batch_norm,
            normalizer_params = batch_norm_params):
        with slim.arg_scope([slim.batch_norm], **batch_norm_params):
            with slim.arg_scope([slim.max_pool2d], padding = 'SAME') as arg_sc:
                return arg_sc

#定義bottleneck殘差學習單元
@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride, outputs_collections = None, scope = None):
    with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank = 4)
        preact = slim.batch_norm(inputs, activation_fn = tf.nn.relu, scope = 'preact')

        if depth == depth_in:
            shortcut = subsample(inputs, stride, 'shortcut')
        else:
            shortcut = slim.conv2d(preact, 
                                   depth, 
                                   [1, 1], 
                                   stride = stride, 
                                   normalizer_fn = None, 
                                   activation_fn = None, 
                                   scope = 'shortcut')
            residual = slim.conv2d(preact, 
                                   depth_bottleneck, 
                                   [1, 1], 
                                   stride = 1, 
                                   scope = 'conv1')
            residual = conv2d_same(residual, 
                                   depth_bottleneck, 
                                   3, 
                                   stride, 
                                   scope = 'conv2')
            residual = slim.conv2d(residual, 
                                   depth, 
                                   [1, 1], 
                                   stride = 1, 
                                   normalizer_fn = None, 
                                   activation_fn = None, 
                                   scope = 'conv3')
        output = shortcut + residual
        return slim.utils.collect_named_outputs(outputs_collections, sc.name, output)

#定義生成ResNet V2主函數
def resnet_v2(inputs, 
              blocks, 
              num_classes = None, 
              global_pool = True, 
              include_root_block = True, 
              reuse = None, 
              scope = None):
    with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse = reuse) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        with slim.arg_scope([slim.conv2d, bottleneck, stack_blocks_dense], outputs_collections = end_points_collection):
            net = inputs

        if include_root_block:
            with slim.arg_scope([slim.conv2d], activation_fn = None, normalizer_fn = None):
                net = conv2d_same(net, 64, 7, stride = 2, scope = 'conv1')
            net = slim.max_pool2d(net, [3, 3], stride = 2, scope = 'pool1')
        net = stack_blocks_dense(net, blocks)
        net = slim.batch_norm(net, activation_fn = tf.nn.relu, scope = 'postnorm')

        if global_pool:
            net = tf.reduce_mean(net, [1, 2], name = 'pool5', keep_dims = True)

        if num_classes is not None:
            net = slim.conv2d(net, num_classes, [1, 1], activation_fn = None, normalizer_fn = None, scope = 'logits')
            end_points = slim.utils.convert_collection_to_dict(end_points_collection)

            if num_classes is not None:
                end_points['predictions'] = slim.softmax(net, scope = 'predictions')

            return net, end_points

#定義50層的ResNet   
def resnet_v2_50(inputs, 
                 num_classes = None, 
                 global_pool = True, 
                 reuse = None, 
                 scope = 'resnet_v2_50'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 1024, 1)] * 3)]
    return resnet_v2(inputs, 
                     blocks, 
                     num_classes, 
                     global_pool, 
                     include_root_block = True, 
                     reuse = reuse, 
                     scope = scope) 

#定義101層的ResNet  
def resnet_v2_101(inputs, 
                  num_classes = None, 
                  global_pool = True, 
                  reuse = None, 
                  scope = 'resnet_v2_101'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, 
                     blocks, 
                     num_classes, 
                     global_pool, 
                     include_root_block = True, 
                     reuse = reuse, 
                     scope = scope) 

#定義152層的ResNet  
def resnet_v2_152(inputs, 
                  num_classes = None, 
                  global_pool = True, 
                  reuse = None, 
                  scope = 'resnet_v2_152'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, 
                     blocks, 
                     num_classes, 
                     global_pool, 
                     include_root_block = True, 
                     reuse = reuse, 
                     scope = scope)

#定義200層的ResNet
def resnet_v2_200(inputs, 
                  num_classes = None, 
                  global_pool = True, 
                  reuse = None, 
                  scope = 'resnet_v2_200'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, 
                     blocks, 
                     num_classes, 
                     global_pool, 
                     include_root_block = True, 
                     reuse = reuse, 
                     scope = scope)

def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0

    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %(datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration

    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %(datetime.now(), info_string, num_batches, mn, sd))

batch_size = 32
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(resnet_arg_scope(is_training = False)):
    net, end_points = resnet_v2_152(inputs, 1000)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
num_batches = 100
time_tensorflow_run(sess, net, "Forward")

四、原生TensorFlow實現

import skimage.io  # bug. need to import this before tensorflow
import skimage.transform  # bug. need to import this before tensorflow
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.training import moving_averages

from config import Config

import datetime
import numpy as np
import os
import time

MOVING_AVERAGE_DECAY = 0.9997
BN_DECAY = MOVING_AVERAGE_DECAY
BN_EPSILON = 0.001
CONV_WEIGHT_DECAY = 0.00004
CONV_WEIGHT_STDDEV = 0.1
FC_WEIGHT_DECAY = 0.00004
FC_WEIGHT_STDDEV = 0.01
RESNET_VARIABLES = 'resnet_variables'
UPDATE_OPS_COLLECTION = 'resnet_update_ops'  # must be grouped with training op
IMAGENET_MEAN_BGR = [103.062623801, 115.902882574, 123.151630838, ]

tf.app.flags.DEFINE_integer('input_size', 224, "input image size")


activation = tf.nn.relu


def inference(x, is_training,
              num_classes=1000,
              num_blocks=[3, 4, 6, 3],  # defaults to 50-layer network
              use_bias=False, # defaults to using batch norm
              bottleneck=True):
    c = Config()
    c['bottleneck'] = bottleneck
    c['is_training'] = tf.convert_to_tensor(is_training,
                                            dtype='bool',
                                            name='is_training')
    c['ksize'] = 3
    c['stride'] = 1
    c['use_bias'] = use_bias
    c['fc_units_out'] = num_classes
    c['num_blocks'] = num_blocks
    c['stack_stride'] = 2

    with tf.variable_scope('scale1'):
        c['conv_filters_out'] = 64
        c['ksize'] = 7
        c['stride'] = 2
        x = conv(x, c)
        x = bn(x, c)
        x = activation(x)

    with tf.variable_scope('scale2'):
        x = _max_pool(x, ksize=3, stride=2)
        c['num_blocks'] = num_blocks[0]
        c['stack_stride'] = 1
        c['block_filters_internal'] = 64
        x = stack(x, c)

    with tf.variable_scope('scale3'):
        c['num_blocks'] = num_blocks[1]
        c['block_filters_internal'] = 128
        assert c['stack_stride'] == 2
        x = stack(x, c)

    with tf.variable_scope('scale4'):
        c['num_blocks'] = num_blocks[2]
        c['block_filters_internal'] = 256
        x = stack(x, c)

    with tf.variable_scope('scale5'):
        c['num_blocks'] = num_blocks[3]
        c['block_filters_internal'] = 512
        x = stack(x, c)

    # post-net
    x = tf.reduce_mean(x, reduction_indices=[1, 2], name="avg_pool")

    if num_classes != None:
        with tf.variable_scope('fc'):
            x = fc(x, c)

    return x


# This is what they use for CIFAR-10 and 100.
# See Section 4.2 in http://arxiv.org/abs/1512.03385
def inference_small(x,
                    is_training,
                    num_blocks=3, # 6n+2 total weight layers will be used.
                    use_bias=False, # defaults to using batch norm
                    num_classes=10):
    c = Config()
    c['is_training'] = tf.convert_to_tensor(is_training,
                                            dtype='bool',
                                            name='is_training')
    c['use_bias'] = use_bias
    c['fc_units_out'] = num_classes
    c['num_blocks'] = num_blocks
    c['num_classes'] = num_classes
    inference_small_config(x, c)

def inference_small_config(x, c):
    c['bottleneck'] = False
    c['ksize'] = 3
    c['stride'] = 1
    with tf.variable_scope('scale1'):
        c['conv_filters_out'] = 16
        c['block_filters_internal'] = 16
        c['stack_stride'] = 1
        x = conv(x, c)
        x = bn(x, c)
        x = activation(x)
        x = stack(x, c)

    with tf.variable_scope('scale2'):
        c['block_filters_internal'] = 32
        c['stack_stride'] = 2
        x = stack(x, c)

    with tf.variable_scope('scale3'):
        c['block_filters_internal'] = 64
        c['stack_stride'] = 2
        x = stack(x, c)

    # post-net
    x = tf.reduce_mean(x, reduction_indices=[1, 2], name="avg_pool")

    if c['num_classes'] != None:
        with tf.variable_scope('fc'):
            x = fc(x, c)

    return x


def _imagenet_preprocess(rgb):
    """Changes RGB [0,1] valued image to BGR [0,255] with mean subtracted."""
    red, green, blue = tf.split(3, 3, rgb * 255.0)
    bgr = tf.concat(3, [blue, green, red])
    bgr -= IMAGENET_MEAN_BGR
    return bgr


def loss(logits, labels):
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
 
    regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)

    loss_ = tf.add_n([cross_entropy_mean] + regularization_losses)
    tf.scalar_summary('loss', loss_)

    return loss_


def stack(x, c):
    for n in range(c['num_blocks']):
        s = c['stack_stride'] if n == 0 else 1
        c['block_stride'] = s
        with tf.variable_scope('block%d' % (n + 1)):
            x = block(x, c)
    return x

# define a DBA(Deeper Bottleneck Architectures)
def block(x, c):
    filters_in = x.get_shape()[-1]

    # Note: filters_out isn't how many filters are outputed. 
    # That is the case when bottleneck=False but when bottleneck is 
    # True, filters_internal*4 filters are outputted. filters_internal is how many filters
    # the 3x3 convs output internally.
    m = 4 if c['vbottleneck'] else 1
    filters_out = m * c['block_filters_internal']

    shortcut = x  # branch 1

    c['conv_filters_out'] = c['block_filters_internal']

    if c['bottleneck']:
        with tf.variable_scope('a'):
            c['ksize'] = 1
            c['stride'] = c['block_stride']
            x = conv(x, c)
            x = bn(x, c)
            x = activation(x)

        with tf.variable_scope('b'):
            x = conv(x, c)
            x = bn(x, c)
            x = activation(x)

        with tf.variable_scope('c'):
            c['conv_filters_out'] = filters_out
            c['ksize'] = 1
            assert c['stride'] == 1
            x = conv(x, c)
            x = bn(x, c)
    else:
        with tf.variable_scope('A'):
            c['stride'] = c['block_stride']
            assert c['ksize'] == 3
            x = conv(x, c)
            x = bn(x, c)
            x = activation(x)

        with tf.variable_scope('B'):
            c['conv_filters_out'] = filters_out
            assert c['ksize'] == 3
            assert c['stride'] == 1
            x = conv(x, c)
            x = bn(x, c)

    with tf.variable_scope('shortcut'):
        if filters_out != filters_in or c['block_stride'] != 1:
            c['ksize'] = 1
            c['stride'] = c['block_stride']
            c['conv_filters_out'] = filters_out
            shortcut = conv(shortcut, c)
            shortcut = bn(shortcut, c)

    return activation(x + shortcut)


def bn(x, c):
    x_shape = x.get_shape()
    params_shape = x_shape[-1:]

    if c['use_bias']:
        bias = _get_variable('bias', params_shape,
                             initializer=tf.zeros_initializer)
        return x + bias


    axis = list(range(len(x_shape) - 1))

    beta = _get_variable('beta',
                         params_shape,
                         initializer=tf.zeros_initializer)
    gamma = _get_variable('gamma',
                          params_shape,
                          initializer=tf.ones_initializer)

    moving_mean = _get_variable('moving_mean',
                                params_shape,
                                initializer=tf.zeros_initializer,
                                trainable=False)
    moving_variance = _get_variable('moving_variance',
                                    params_shape,
                                    initializer=tf.ones_initializer,
                                    trainable=False)

    # These ops will only be preformed when training.
    mean, variance = tf.nn.moments(x, axis)
    update_moving_mean = moving_averages.assign_moving_average(moving_mean, mean, BN_DECAY)
    update_moving_variance = moving_averages.assign_moving_average(moving_variance, variance, BN_DECAY)
    tf.add_to_collection(UPDATE_OPS_COLLECTION, update_moving_mean)
    tf.add_to_collection(UPDATE_OPS_COLLECTION, update_moving_variance)

    mean, variance = control_flow_ops.cond(c['is_training'], lambda: (mean, variance),
                                           lambda: (moving_mean, moving_variance))

    x = tf.nn.batch_normalization(x, mean, variance, beta, gamma, BN_EPSILON)
    #x.set_shape(inputs.get_shape()) ??

    return x


def fc(x, c):
    num_units_in = x.get_shape()[1]
    num_units_out = c['fc_units_out']
    weights_initializer = tf.truncated_normal_initializer(
        stddev=FC_WEIGHT_STDDEV)

    weights = _get_variable('weights',
                            shape=[num_units_in, num_units_out],
                            initializer=weights_initializer,
                            weight_decay=FC_WEIGHT_STDDEV)
    biases = _get_variable('biases',
                           shape=[num_units_out],
                           initializer=tf.zeros_initializer)
    x = tf.nn.xw_plus_b(x, weights, biases)
    return x


def _get_variable(name,
                  shape,
                  initializer,
                  weight_decay=0.0,
                  dtype='float',
                  trainable=True):
    "A little wrapper around tf.get_variable to do weight decay and add to"
    "resnet collection"
    if weight_decay > 0:
        regularizer = tf.contrib.layers.l2_regularizer(weight_decay)
    else:
        regularizer = None
    collections = [tf.GraphKeys.VARIABLES, RESNET_VARIABLES]
    # tf.get_variable()
    # Gets an existing variable with these parameters or create a new one.
    return tf.get_variable(name,
                           shape=shape,
                           initializer=initializer,
                           dtype=dtype,
                           regularizer=regularizer,
                           collections=collections,
                           trainable=trainable)

def conv(x, c):
    ksize = c['ksize']
    stride = c['stride']
    filters_out = c['conv_filters_out']

    filters_in = x.get_shape()[-1] # get_shape() return a tuple,tuple[-1] is the value of last element
    shape = [ksize, ksize, filters_in, filters_out]
    initializer = tf.truncated_normal_initializer(stddev=CONV_WEIGHT_STDDEV)
    weights = _get_variable('weights',
                            shape=shape,
                            dtype='float',
                            initializer=initializer,
                            weight_decay=CONV_WEIGHT_DECAY)
    return tf.nn.conv2d(x, weights, [1, stride, stride, 1], padding='SAME')


def _max_pool(x, ksize=3, stride=2):
    return tf.nn.max_pool(x,
                          ksize=[1, ksize, ksize, 1],
                          strides=[1, stride, stride, 1],
                          padding='SAME')

完整的Python代碼GitHub:https://github.com/IreneLini94/tensorflow-resnet.git







發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章