Using relu and crelu

I didn't really understand crelu before, casually swapped the relu in my network for crelu, and then spent half a day chasing bugs.
(Writing my own bugs and then debugging my own bugs: stuck in an infinite loop.)

First, take a look at this piece of code:

import tensorflow as tf
import collections
slim = tf.contrib.slim

weights_initializer = tf.contrib.layers.xavier_initializer(uniform=True)
biases_initializer = tf.random_uniform_initializer(-0.01, 0.01)
activation_fn = tf.nn.relu
is_bn_training = True
reuse = False

with slim.arg_scope([slim.conv2d], padding='VALID', stride=[2, 1], weights_initializer=weights_initializer,
                                biases_initializer=biases_initializer, activation_fn=None, reuse=reuse):
    with slim.arg_scope([slim.batch_norm], decay=0.9997,
                            center=True, scale=True, epsilon=1e-5, activation_fn=activation_fn,
                            is_training=is_bn_training, reuse=reuse):

        # inputs: (batch_size, height, width, channels)
        inputs = tf.placeholder(tf.float32, (None, None, 1, 1))
        trn_net = tf.pad(inputs, [[0, 0], [32, 32], [0, 0], [0, 0]])
        # slim.conv2d(input, number of output feature maps, kernel size); stride=[2, 1] comes from the arg_scope
        trn_net = slim.conv2d(trn_net, 16, [64, 1], scope='conv1')
        trn_net = slim.batch_norm(trn_net, scope='bnorm1')
        trn_net = slim.max_pool2d(trn_net, [8, 1], scope='pool1', stride=[8, 1])

        trn_net = tf.pad(trn_net, [[0, 0], [16, 16], [0, 0], [0, 0]])
        trn_net = slim.conv2d(trn_net, 32, [32, 1], scope='conv2')
        trn_net = slim.batch_norm(trn_net, scope='bnorm2')
        for var in tf.global_variables():
            print(var)

Variables printed when the activation function is relu:

<tf.Variable 'conv1/weights:0' shape=(64, 1, 1, 16) dtype=float32_ref>
<tf.Variable 'conv1/biases:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/beta:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/gamma:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/moving_mean:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/moving_variance:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'conv2/weights:0' shape=(32, 1, 16, 32) dtype=float32_ref>  # shape: (kernel_height, kernel_width, num_input_feature_maps, num_output_feature_maps)
<tf.Variable 'conv2/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/beta:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/gamma:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/moving_mean:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/moving_variance:0' shape=(32,) dtype=float32_ref>

Variables printed when the activation function is changed to crelu:

<tf.Variable 'conv1/weights:0' shape=(64, 1, 1, 16) dtype=float32_ref>
<tf.Variable 'conv1/biases:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/beta:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/gamma:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/moving_mean:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'bnorm1/moving_variance:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'conv2/weights:0' shape=(32, 1, 32, 32) dtype=float32_ref>  # shape: (kernel_height, kernel_width, num_input_feature_maps, num_output_feature_maps)
<tf.Variable 'conv2/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/beta:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/gamma:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/moving_mean:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'bnorm2/moving_variance:0' shape=(32,) dtype=float32_ref>

The two configurations have different parameter shapes: with relu, conv2/weights has shape (32, 1, 16, 32), i.e. 32 × 1 × 16 × 32 = 16,384 parameters, while with crelu it has shape (32, 1, 32, 32), i.e. 32,768 parameters. Be especially careful about this when fine-tuning a network from a pretrained checkpoint, otherwise restoring the weights will keep failing.
When using CReLU, deliberately halve the number of filters (see the sketch below); otherwise the number of feature maps fed to the following layer doubles and the network's parameter count grows.
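
A minimal sketch of that advice, assuming the same TF 1.x / tf.contrib.slim setup as in the code above (use_crelu and num_filters are illustrative names, not from the original code): halve the filter count of the convolution that feeds the CReLU, so that the next layer sees the same number of channels as in the ReLU version.

import tensorflow as tf

slim = tf.contrib.slim

use_crelu = True  # hypothetical switch between the two activations
activation_fn = tf.nn.crelu if use_crelu else tf.nn.relu
# halve the filters when using CReLU so the block still outputs 16 feature maps
num_filters = 8 if use_crelu else 16

inputs = tf.placeholder(tf.float32, (None, None, 1, 1))
net = slim.conv2d(inputs, num_filters, [64, 1], stride=[2, 1], padding='VALID',
                  activation_fn=None, scope='conv1')
net = slim.batch_norm(net, activation_fn=activation_fn, scope='bnorm1')
print(net.get_shape())  # last dimension is 16 whether use_crelu is True or False

Note that the BN parameters themselves shrink from shape (16,) to (8,) in the CReLU version; what stays constant is the number of feature maps handed to the next convolution.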

Now let's look at how CReLU [1] actually works.
The paper experiments with AlexNet on the CIFAR dataset and inspects the distribution of the learned filters in each layer: in the lower layers the filters tend to appear in pairs with roughly opposite phases, so applying CReLU in the lower layers of the network yields a clear improvement.

The input to CReLU here is the 16 feature maps coming out of the first BN layer, and its output is features = concat([features, -features]) passed through ReLU, so the number of feature maps is doubled. The 16 feature maps after the first BN therefore become 32, which is why the weights of the second convolution have shape (32, 1, 32, 32).
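
A minimal sketch (again assuming TF 1.x) that checks this behaviour directly: tf.nn.crelu should equal the concatenation of relu(x) and relu(-x) along the channel axis, and its output should have twice as many channels as its input.

import numpy as np
import tensorflow as tf

# 16 input feature maps, as after the first BN layer above
x = tf.placeholder(tf.float32, (None, 8, 1, 16))
crelu_out = tf.nn.crelu(x)                                      # channels double: 16 -> 32
manual_out = tf.concat([tf.nn.relu(x), tf.nn.relu(-x)], axis=-1)

with tf.Session() as sess:
    feed = {x: np.random.randn(2, 8, 1, 16).astype(np.float32)}
    a, b = sess.run([crelu_out, manual_out], feed_dict=feed)
    print(a.shape)            # (2, 8, 1, 32)
    print(np.allclose(a, b))  # True: crelu(x) == concat([relu(x), relu(-x)], axis=-1)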

[1] Shang W, Sohn K, Almeida D, et al. Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units. International Conference on Machine Learning (ICML), 2016: 2217-2225.
