Using ReLU and CReLU

I didn't really understand CReLU before, casually swapped the relu in my network for crelu, and then spent half a day chasing the resulting bug.
(Writing my own bugs, then debugging my own bugs: an endless loop...)

First, let's look at a piece of code:

import tensorflow as tf
import collections
slim = tf.contrib.slim

weights_initializer = tf.contrib.layers.xavier_initializer(uniform=True)
biases_initializer = tf.random_uniform_initializer(-0.01, 0.01)
activation_fn = tf.nn.relu
is_bn_training = True
reuse = False

with slim.arg_scope([slim.conv2d], padding='VALID', stride=[2, 1], weights_initializer=weights_initializer,
                                biases_initializer=biases_initializer, activation_fn=None, reuse=reuse):
    with slim.arg_scope([slim.batch_norm], decay=0.9997,
                            center=True, scale=True, epsilon=1e-5, activation_fn=activation_fn,
                            is_training=is_bn_training, reuse=reuse):

        # input shape: (batch_size, height, width, channels)
        inputs = tf.placeholder(tf.float32, (None, None, 1, 1))
        trn_net = tf.pad(inputs, [[0, 0], [32, 32], [0, 0], [0, 0]])
        # (input, number of output feature maps, kernel size, stride=[2, 1])
        trn_net = slim.conv2d(trn_net, 16, [64, 1], scope='conv1')
        trn_net = slim.batch_norm(trn_net, scope='bnorm1')
        trn_net = slim.max_pool2d(trn_net, [8, 1], scope='pool1', stride=[8, 1])

        trn_net = tf.pad(trn_net, [[0, 0], [16, 16], [0, 0], [0, 0]])
        trn_net = slim.conv2d(trn_net, 32, [32, 1], scope='conv2')
        trn_net = slim.batch_norm(trn_net, scope='bnorm2')
        for var in tf.global_variables():
            print(var)

Output printed when the activation function is relu:

<tf.Variable 'conv1/weights:0' shape=(64, 1, 1, 16) dtype=float32_ref>
<tf.Variable 'conv1/biases:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/beta:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/gamma:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/moving_mean:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/moving_variance:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'conv2/weights:0' shape=(32, 1, 16, 32) dtype=float32_ref>  # shape: (kernel_height, kernel_width, number of input feature maps, number of output feature maps)
<tf.Variable 'conv2/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/beta:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/gamma:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/moving_mean:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/moving_variance:0' shape=(32,) dtype=float32_ref>

Output printed when the activation function is changed to crelu:

<tf.Variable 'conv1/weights:0' shape=(64, 1, 1, 16) dtype=float32_ref>
<tf.Variable 'conv1/biases:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/beta:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/gamma:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/moving_mean:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/moving_variance:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'conv2/weights:0' shape=(32, 1, 32, 32) dtype=float32_ref>  # shape: (kernel_height, kernel_width, number of input feature maps, number of output feature maps)
<tf.Variable 'conv2/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/beta:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/gamma:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/moving_mean:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/moving_variance:0' shape=(32,) dtype=float32_ref>

The two sets of variables have different parameter shapes. Be especially careful about this when fine-tuning a network; otherwise restoring the checkpoint will keep throwing errors.
When using CReLU, deliberately halve the number of filters; otherwise the number of feature maps fed to the next layer doubles and the network's parameter count grows. A minimal sketch of this is shown below.
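For example, here is a small self-contained sketch (not the network above; the layer names conv_relu / conv_crelu and the input shape are made up for illustration) showing that halving num_outputs keeps the output channel width unchanged when switching to CReLU:

import tensorflow as tf
slim = tf.contrib.slim

x = tf.placeholder(tf.float32, (None, 128, 1, 1))

# with ReLU: the conv itself produces 16 feature maps
relu_net = slim.conv2d(x, 16, [64, 1], activation_fn=tf.nn.relu, scope='conv_relu')
# with CReLU: ask the conv for only 8 feature maps; crelu doubles them back to 16
crelu_net = slim.conv2d(x, 8, [64, 1], activation_fn=tf.nn.crelu, scope='conv_crelu')

print(relu_net.shape)   # (?, 128, 1, 16)
print(crelu_net.shape)  # (?, 128, 1, 16)  same width, but only half the conv weights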

Now let's look at how CReLU [1] actually works.
The paper runs AlexNet on the CIFAR dataset and inspects the distribution of the filters in each layer: the filters in the lower layers tend to appear in roughly symmetric (opposite-phase) pairs, so applying CReLU in the lower layers of the network brings a clear improvement.

The input to CReLU here is the 16 feature maps coming out of the first BN layer, and its output is the concatenation [relu(features), relu(-features)], so the number of feature maps doubles: the 16 feature maps after bnorm1 become 32, which is why the weights of the second convolution become (32, 1, 32, 32). The sketch below shows just this effect.
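A tiny sketch of this behaviour (the placeholder shape is only an assumption standing in for the bnorm1 output; tf.nn.crelu concatenates relu(x) and relu(-x) along the last axis):

import tensorflow as tf

# stand-in for the 16 feature maps produced by bnorm1 (assumed spatial size)
features = tf.placeholder(tf.float32, (None, 100, 1, 16))
out = tf.nn.crelu(features)  # concat(relu(features), relu(-features)) on the channel axis

print(out.shape)  # (?, 100, 1, 32): 16 feature maps become 32,
                  # hence conv2/weights has shape (32, 1, 32, 32)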

[1] Shang W, Sohn K, Almeida D, et al. Understanding and improving convolutional neural networks via concatenated rectified linear units[C]//International Conference on Machine Learning. 2016: 2217-2225.
