Deep Learning: AlexNet Explained, Part 3

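I. Overview:

This post walks through the basic structure of AlexNet. AlexNet is the deep network that Alex Krizhevsky and his co-authors proposed in 2012, and it is named after him. It has 8 layers with trainable parameters (not counting the pooling and LRN layers): the first 5 are convolutional layers and the last 3 are fully connected layers. The final fully connected layer feeds a 1000-way softmax classifier. LRN layers follow the first and second convolutional layers, and max pooling layers follow both LRN layers and the last convolutional layer. ReLU is the activation function used throughout the convolutional and hidden fully connected layers.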

Image source: "ImageNet Classification with Deep Convolutional Neural Networks"

The authors trained the network on two GPUs, so the architecture is drawn as two parallel paths that are merged in the final layer.

AlexNet network parameters (the table lists the parameters for one GPU)

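II. Network structure (AlexNet_model.py):

1. Define the print_activations function, which shows the output tensor size of each convolutional and pooling layer. It takes a Tensor as input and prints its name and shape: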

def print_activations(t):
    print(t.op.name, ' ', t.get_shape().as_list())

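2. The first convolutional layer, conv1. TensorFlow's name_scope function keeps the components of different layers apart: with tf.name_scope('conv1') as scope: automatically names the variables created inside the scope conv1/xxxx (the same applies to the later layers).

The kernel is initialized with tf.truncated_normal, a truncated normal distribution with standard deviation 0.1. The shape [11, 11, 3, 64] means an 11x11 kernel, 3 color channels, and 64 kernels. tf.nn.conv2d convolves the input images with this kernel; [1, 4, 4, 1] sets the stride to 4 both horizontally and vertically, and the padding mode is "SAME", which pads the borders so that the whole image is scanned (with stride 4 the output size is the input size divided by 4, rounded up). tf.constant initializes the bias of the layer to 0, and trainable=True lets training update it. tf.nn.bias_add adds conv and biases, and tf.nn.relu applies the non-linearity. Finally, the print_activations function defined earlier prints the conv1 tensor, and the trainable kernel and biases of this layer are appended to the parameters list.

tf.nn.lrn then applies LRN to the conv1 output, with depth_radius 4, bias 1, alpha 0.001/9, and beta 0.75 (close to, though not identical to, the paper's k=2, n=5, alpha=1e-4, beta=0.75). The LRN layer sets up competition among neighboring neuron activities: relatively large responses become relatively larger and neurons with weaker responses are suppressed, which is meant to improve generalization. Current networks have largely dropped LRN, mainly because the gain is small while it slows down the forward and backward passes (overall speed drops by about 1/3).

tf.nn.max_pool applies max pooling to the LRN output: [1, 3, 3, 1] is a 3x3 pooling window, and [1, 2, 2, 1] is a stride of 2 in both directions. Because the window is larger than the stride, adjacent windows overlap, which can make the features richer. The padding mode is "VALID" (no padding; a window may not extend past the border). Finally, the shape of the output pool1 is printed: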

    # conv1
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]
    # lrn1 and pool1
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='lrn1')
    pool1 = tf.nn.max_pool(lrn1,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool1')
    print_activations(pool1)
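The overlapping pooling described in step 2 can be illustrated in one dimension with plain Python. This is only a sketch: the helper max_pool_1d and the sample row are made up for illustration and are not part of AlexNet_model.py.

```python
def max_pool_1d(values, ksize, stride):
    """Max pooling along one axis with no padding ('VALID' behaviour)."""
    out = []
    # A window is only used if it fits entirely inside the input.
    for start in range(0, len(values) - ksize + 1, stride):
        out.append(max(values[start:start + ksize]))
    return out


row = [1, 5, 2, 0, 3, 9, 4]  # made-up activations along one image row
pooled = max_pool_1d(row, ksize=3, stride=2)
# Windows are [1, 5, 2], [2, 0, 3] and [3, 9, 4]; neighbours share one element.
print(pooled)
```

With a window of 3 and a stride of 2, each element at an even position (other than the two ends) is seen by two windows, which is the overlap the text refers to.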

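3. The second convolutional layer. It is essentially the same as the first; the difference is the kernel shape [5, 5, 64, 192]: a 5x5 kernel, 64 input channels (the number of kernels in the first convolutional layer), and 192 kernels. The stride is 1, so the kernel visits every pixel position:

    # conv2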
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
    print_activations(conv2)

    # lrn2 and pool2
    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool2')
    print_activations(pool2)
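Both conv1 and conv2 are followed by tf.nn.lrn. As a rough sketch of what that call computes for the channel values of a single pixel, the formula from the TensorFlow documentation (each value divided by (bias + alpha * sum of squares over neighboring channels) ** beta) can be written in plain Python. The function name lrn_channels and the sample activations are made up for illustration.

```python
def lrn_channels(x, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75):
    """Local response normalization across the channels of one pixel."""
    out = []
    for d, value in enumerate(x):
        # Neighbourhood of channels around d, clipped at both ends.
        lo = max(0, d - depth_radius)
        hi = min(len(x), d + depth_radius + 1)
        sqr_sum = sum(v * v for v in x[lo:hi])
        out.append(value / (bias + alpha * sqr_sum) ** beta)
    return out


activations = [0.0, 1.0, 50.0, 2.0, 0.5]  # made-up channel values
normalized = lrn_channels(activations)
print(normalized)
```

With bias=1.0 the denominator is never smaller than 1, so no activation grows in absolute terms; a value simply shrinks more when its neighboring channels respond strongly.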

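4. The third convolutional layer. It is similar to the first two. The kernel shape is [3, 3, 192, 384]: a 3x3 kernel, 192 input channels (the number of kernels in the second layer), and 384 kernels. The stride is 1 and the padding is 'SAME', so every pixel position is visited and the spatial size is unchanged:

    # conv3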
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv3)
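Every layer here initializes its weights with tf.truncated_normal, which re-draws any value that falls more than two standard deviations from the mean. A minimal pure-Python sketch of that behavior follows; the function below only imitates the idea with rejection sampling, and its name, signature, and seed argument are assumptions for illustration, not TensorFlow's implementation.

```python
import random


def truncated_normal(n, mean=0.0, stddev=0.1, seed=0):
    """Draw n samples, rejecting any further than 2 * stddev from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples


weights = truncated_normal(1000)
print(min(weights), max(weights))
```

With stddev=0.1, as used throughout the model, every initial weight therefore lies in [-0.2, 0.2].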

5. The fourth convolutional layer. Similar to the previous ones; the kernel shape is [3, 3, 384, 256]: a 3x3 kernel, 384 input channels, and 256 output channels:

    # conv4
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv4)

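6. The fifth convolutional layer. The kernel shape is [3, 3, 256, 256]: a 3x3 kernel, 256 input channels, and 256 output channels. A max pooling layer (pool5) follows it:

    # conv5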
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)

    # pool5
    pool5 = tf.nn.max_pool(conv5,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool5')
    print_activations(pool5)
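The shapes that print_activations reports for the layers up to pool5 can be predicted from TensorFlow's padding rules: 'SAME' gives ceil(size / stride), and 'VALID' gives ceil((size - ksize + 1) / stride). The sketch below traces them in plain Python. The 224x224 input size is an assumption (the post does not state it), and the helper out_size is made up for illustration. The LRN layers are left out because they do not change the shape.

```python
import math


def out_size(size, ksize, stride, padding):
    """Spatial output size under TensorFlow's 'SAME' / 'VALID' rules."""
    if padding == "SAME":
        return math.ceil(size / stride)
    return math.ceil((size - ksize + 1) / stride)  # 'VALID'


# (name, kernel size, stride, padding, output channels), taken from the code.
LAYERS = [
    ("conv1", 11, 4, "SAME", 64),
    ("pool1", 3, 2, "VALID", 64),
    ("conv2", 5, 1, "SAME", 192),
    ("pool2", 3, 2, "VALID", 192),
    ("conv3", 3, 1, "SAME", 384),
    ("conv4", 3, 1, "SAME", 256),
    ("conv5", 3, 1, "SAME", 256),
    ("pool5", 3, 2, "VALID", 256),
]

size = 224  # assumed input height and width, not stated in the post
shapes = {}
for name, ksize, stride, padding, channels in LAYERS:
    size = out_size(size, ksize, stride, padding)
    shapes[name] = (size, size, channels)
    print(name, shapes[name])

# Length of each sample after flattening pool5 for the fully connected layers.
flat_dim = size * size * channels
print("flattened length:", flat_dim)
```

Under that assumption pool5 is 6x6x256, so the first fully connected layer receives 9216 values per image.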

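7. The first fully connected layer. tf.reshape flattens pool5, the pooled output of the fifth convolutional layer, into one vector per sample; get_shape gives the length of the flattened data. The weights are initialized with tf.truncated_normal with standard deviation 0.1, the hidden layer has 4096 nodes, and the bias is initialized to 0. tf.nn.relu applies the non-linearity, weights and bias are appended to the parameters list, and the shape of fc1 is printed:

    # fc1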
    with tf.name_scope('fc1') as scope:
        reshape = tf.reshape(pool5, [batch_size,-1])
        dim = reshape.get_shape()[-1].value
        weights = tf.Variable(tf.truncated_normal([dim,4096],dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        bias = tf.Variable(tf.constant(0.0, shape=[4096], dtype=tf.float32),name='biases')
        fc1 = tf.nn.relu(tf.matmul(reshape,weights) + bias,name=scope)
        parameters += [weights, bias]
        print_activations(fc1)

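8. The second fully connected layer. It is essentially the same as the first; the only difference is the weight shape [4096, 4096], since its input has 4096 nodes:

    # fc2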
    with tf.name_scope('fc2') as scope:
        weights = tf.Variable(tf.truncated_normal([4096,4096],dtype=tf.float32,stddev=1e-1)
                              ,name='weights')
        bias = tf.Variable(tf.constant(0.0,shape=[4096], dtype=tf.float32),name='biases')
        fc2 = tf.nn.relu(tf.matmul(fc1,weights) + bias,name=scope)
        parameters += [weights, bias]
        print_activations(fc2)

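9. The third fully connected layer. It is similar to the first two, with two differences: the weight shape is [4096, 1000], giving one output for each of the 1000 classes, and no ReLU is applied, because these outputs are the raw logits that the softmax classifier expects:

    # fc3
    with tf.name_scope('fc3') as scope:
        weights = tf.Variable(tf.truncated_normal([4096,1000], dtype=tf.float32, stddev=1e-1)
                              , name='weights')
        bias = tf.Variable(tf.constant(0.0, shape=[1000], dtype=tf.float32),name='biases')
        # No ReLU here: clipping negative logits to 0 would distort the softmax.
        logits = tf.add(tf.matmul(fc2, weights), bias, name=scope)
        parameters += [weights, bias]
        print_activations(logits)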
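To get a feel for the size of the model, the trainable parameters can be counted directly from the weight shapes used in the code. This is a rough sketch: the fc1 input length 9216 (6 * 6 * 256) assumes a 224x224 input image, which the post does not state, and the helper count_params is made up for illustration.

```python
# Weight shapes copied from the code above. The fc1 input length
# 9216 = 6 * 6 * 256 assumes a 224x224 input (not stated in the post).
WEIGHT_SHAPES = {
    "conv1": (11, 11, 3, 64),
    "conv2": (5, 5, 64, 192),
    "conv3": (3, 3, 192, 384),
    "conv4": (3, 3, 384, 256),
    "conv5": (3, 3, 256, 256),
    "fc1": (9216, 4096),
    "fc2": (4096, 4096),
    "fc3": (4096, 1000),
}


def count_params(shape):
    """Number of weights plus one bias per output channel or unit."""
    weights = 1
    for dim in shape:
        weights *= dim
    return weights + shape[-1]


counts = {name: count_params(shape) for name, shape in WEIGHT_SHAPES.items()}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}")
```

Under these assumptions the three fully connected layers hold the large majority of the parameters, with fc1 alone accounting for more than half of the total.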

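I have long wanted to reproduce AlexNet and a series of related works on the ImageNet dataset, but my hardware does not allow it. So I settle for the next best thing: learning the ideas behind AlexNet and showing what this classic network can do on the CIFAR-10 dataset.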

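Some takeaways on AlexNet:

1. The overall architecture is: convolutional layers + pooling layers + non-linear layers, followed by fully connected layers that map the input to a probability for each class;

2. LRN layers did not help much in later networks, so they gradually fell out of use.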
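The "probability for each class" in the first point comes from applying softmax to the outputs of the last fully connected layer. A minimal sketch in plain Python, with made-up scores for three classes instead of 1000:

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # made-up scores for three classes
print(probs)
```

The ordering of the scores is preserved, so the class with the largest logit also receives the largest probability.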

Implementing the AlexNet ideas:

1. Dataset: CIFAR-10 and CIFAR-100 datasets

2. Reference code: Python + TensorFlow 1.13
