How to Configure Each Layer in Caffe

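I recently got Caffe installed on my computer. A neural network is built from different kinds of layers, and each layer type has its own parameters, so I put together this short summary based on the documentation on the official Caffe site.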

1. Vision Layers

1.1 Convolution

Type: CONVOLUTION

Example:

layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1          # learning rate multiplier for the filters
  blobs_lr: 2          # learning rate multiplier for the biases
  weight_decay: 1      # weight decay multiplier for the filters
  weight_decay: 0      # weight decay multiplier for the biases
  convolution_param {
    num_output: 96     # learn 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}

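blobs_lr: learning rate multipliers. In the example above, the weights use the same learning rate the solver supplies at run time, while the biases use twice that rate.

weight_decay: weight decay multipliers. In the example above, the filters use a multiplier of 1 and the biases a multiplier of 0, so no weight decay is applied to the biases.

Key parameters of the convolution layer

Required parameters:

num_output (c_o): the number of filters

kernel_size (or kernel_h and kernel_w): the size of each filter

Optional parameters:

weight_filler [default type: 'constant' value: 0]: how the weights are initialized

bias_filler: how the biases are initialized

bias_term [default true]: whether to enable the bias term

pad (or pad_h and pad_w) [default 0]: the number of pixels to add to each side of the input

stride (or stride_h and stride_w) [default 1]: the step between filter applications

group (g) [default 1]: if g > 1, the connectivity of each filter is restricted to a subset of the input. Specifically, the input and output channels are separated into g groups, and the i-th output group channels are connected only to the i-th input group channels.

Size change through the convolution:

Input: n * c_i * h_i * w_i

Output: n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and w_o is computed the same way.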
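
To make the size formula concrete, here is a minimal Python sketch that evaluates it for the conv1 settings from the example (kernel_size 11, stride 4, no padding). The helper name conv_output_size and the 227 x 227 input size are assumptions chosen for illustration, and the sketch assumes the division rounds down when the stride does not divide evenly.

```python
def conv_output_size(in_size, kernel, stride=1, pad=0):
    """Output length along one spatial axis: (in + 2 * pad - kernel) / stride + 1."""
    if in_size + 2 * pad < kernel:
        raise ValueError("kernel is larger than the padded input")
    # Integer (floor) division is an assumption of this sketch.
    return (in_size + 2 * pad - kernel) // stride + 1


# conv1 from the example above, applied to a hypothetical 227 x 227 input.
h_o = conv_output_size(227, kernel=11, stride=4)
w_o = conv_output_size(227, kernel=11, stride=4)
print(h_o, w_o)
```

With num_output: 96, a batch of n such inputs would leave the layer as an n * 96 * h_o * w_o blob.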

1.2 Pooling

Type: POOLING

Example:

layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3 # pool over a 3x3 region
    stride: 2      # step two pixels (in the bottom blob) between pooling regions
  }
}

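Key parameters of the pooling layer

Required parameters:

kernel_size (or kernel_h and kernel_w): the size of the pooling window

Optional parameters:

pool [default MAX]: the pooling method; MAX, AVE, and STOCHASTIC are currently available

pad (or pad_h and pad_w) [default 0]: the number of pixels to add to each side of the input

stride (or stride_h and stride_w) [default 1]: the step between pooling windows

Size change through pooling:

Input: n * c * h_i * w_i

Output: n * c * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and w_o is computed the same way. Pooling leaves the number of channels unchanged.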

1.3 Local Response Normalization (LRN)

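Type: LRN

Local Response Normalization normalizes over a local input region: each activation a is divided by a normalization term (the denominator) to produce a new activation b. It comes in two forms. In one, the local region spans neighboring channels (cross-channel LRN); in the other, it is a spatial region within a single channel (within-channel LRN).

Formula: each input value is divided by (1 + (alpha / n) * sum_i x_i^2) ^ beta, where n is the size of the local region and the sum runs over the region centered at that value.

Optional parameters:

local_size [default 5]: for cross-channel LRN, the number of neighboring channels to sum over; for within-channel LRN, the side length of the spatial region to sum over

alpha [default 1]: the scaling parameter

beta [default 5]: the exponent

norm_region [default ACROSS_CHANNELS]: which form of LRN to use, ACROSS_CHANNELS or WITHIN_CHANNEL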
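
The cross-channel form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each value is divided by (1 + (alpha / n) * sum of squares over the window) ^ beta with n equal to local_size, the function name lrn_across_channels is made up here, it handles the channel vector of a single pixel, and windows are simply clipped at the first and last channel.

```python
def lrn_across_channels(x, local_size=5, alpha=1.0, beta=5.0):
    """Normalize one pixel's channel vector x with cross-channel LRN."""
    half = local_size // 2
    out = []
    for c, value in enumerate(x):
        # Neighboring channels centered at c, clipped at both ends (assumption).
        window = x[max(0, c - half):c + half + 1]
        scale = (1.0 + (alpha / local_size) * sum(v * v for v in window)) ** beta
        out.append(value / scale)
    return out


normalized = lrn_across_channels([1.0, 2.0, 3.0], local_size=3, alpha=1.0, beta=1.0)
print(normalized)
```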

2. Loss Layers

Deep learning drives learning by minimizing a loss between the output and the target.

2.1 Softmax

Type: SOFTMAX_LOSS

2.2 Sum-of-Squares / Euclidean

Type: EUCLIDEAN_LOSS

2.3 Hinge / Margin

Type: HINGE_LOSS

Example:

# L1 Norm
layers {
  name: "loss"
  type: HINGE_LOSS
  bottom: "pred"
  bottom: "label"
}

# L2 Norm
layers {
  name: "loss"
  type: HINGE_LOSS
  bottom: "pred"
  bottom: "label"
  top: "loss"
  hinge_loss_param {
    norm: L2
  }
}

Optional parameters:

norm [default L1]: the norm to use, L1 or L2

Input:

n * c * h * w: predictions

n * 1 * 1 * 1: labels

Output:

1 * 1 * 1 * 1: the computed loss
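
The post does not spell out the hinge formula, so the following Python sketch rests on an assumption: a one-vs-all hinge in which the true class is pushed above +1 and every other class below -1, averaged over the n samples, with the L2 variant squaring each margin. The function name hinge_loss and the list-of-lists input format are illustrative only.

```python
def hinge_loss(scores, labels, norm="L1"):
    """Average one-vs-all hinge loss over n samples (assumed formulation).

    scores: n rows of per-class scores; labels: n integer class indices.
    """
    if norm not in ("L1", "L2"):
        raise ValueError("norm must be 'L1' or 'L2'")
    total = 0.0
    for row, label in zip(scores, labels):
        for j, s in enumerate(row):
            # True class: penalize scores below +1. Other classes: penalize scores above -1.
            margin = max(0.0, 1.0 - s) if j == label else max(0.0, 1.0 + s)
            total += margin if norm == "L1" else margin * margin
    return total / len(scores)


scores = [[2.0, -1.0, 0.5]]
labels = [0]
print(hinge_loss(scores, labels), hinge_loss(scores, labels, norm="L2"))
```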

2.4 Sigmoid Cross-Entropy

Type: SIGMOID_CROSS_ENTROPY_LOSS

2.5 Infogain

Type: INFOGAIN_LOSS

2.6 Accuracy and Top-k

Type: ACCURACY
Computes the accuracy of the output with respect to the target. It is not actually a loss and has no backward step.
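
As a rough illustration of what accuracy and top-k mean here, the sketch below counts a sample as correct when its label is among the k highest-scoring classes. The function name top_k_accuracy, the argument k, and the sample scores are assumptions for the example, not Caffe's own interface.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose label is among the k highest-scoring classes."""
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(scores)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels), top_k_accuracy(scores, labels, k=2))
```

Because the function only ranks and counts, there is nothing to differentiate, which matches the remark that the layer has no backward step.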

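3. Activation / Neuron Layers

In general, activation layers are element-wise operators: the output has the same size as the input, and in most cases the operation is simply a nonlinear function.

3.1 ReLU / Rectified-Linear and Leaky-ReLU

Type: RELU

Example: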

layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}

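Optional parameters:

negative_slope [default 0]: the slope applied to negative inputs, instead of setting them to 0

ReLU is currently the most widely used activation function, mainly because it converges faster while achieving comparable results.

The standard ReLU function is max(x, 0). More generally, the layer outputs x when x > 0 and negative_slope * x when x <= 0. The RELU layer supports in-place computation, meaning the bottom and top blobs can be the same, which saves memory.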
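
A minimal Python sketch of the element-wise rule, assuming the negative part is multiplied by negative_slope (so the default of 0 gives the standard ReLU). The function name relu is illustrative.

```python
def relu(x, negative_slope=0.0):
    """Return x for positive inputs and negative_slope * x otherwise."""
    return x if x > 0 else negative_slope * x


inputs = (-2.0, 0.0, 3.0)
print([relu(v) for v in inputs])                      # standard ReLU
print([relu(v, negative_slope=0.1) for v in inputs])  # leaky ReLU
```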

3.2 Sigmoid

Type: SIGMOID

Example:

layers {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: SIGMOID
}

The SIGMOID layer computes sigmoid(x) for each input element x.

3.3 TanH / Hyperbolic Tangent

Type: TANH

Example:

layers {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: TANH
}

The TANH layer computes tanh(x) for each input element x.

3.4 Absolute Value

Type: ABSVAL

Example:

layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: ABSVAL
}

The ABSVAL layer computes abs(x) for each input element x.

3.5 Power

Type: POWER

Example:

layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: POWER
  power_param {
    power: 1
    scale: 1
    shift: 0
  }
}

Optional parameters:
power [default 1]
scale [default 1]
shift [default 0]

The POWER layer computes (shift + scale * x) ^ power for each input element x.
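
The same expression as a small Python sketch, with the three parameters and their defaults taken from the list above. The function name power_layer is made up for the example.

```python
def power_layer(x, power=1.0, scale=1.0, shift=0.0):
    """Compute (shift + scale * x) ^ power for one input element."""
    return (shift + scale * x) ** power


print(power_layer(5.0))                                   # defaults: identity
print(power_layer(3.0, power=2.0, scale=2.0, shift=1.0))  # (1 + 2 * 3) ^ 2
```

With the default parameters the layer passes its input through unchanged, so it only has an effect once at least one of the three values is set.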

3.6 BNLL

Type: BNLL

Example:

layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: BNLL
}

The BNLL (binomial normal log likelihood) layer computes log(1 + exp(x)) for each input element x.
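
Evaluating log(1 + exp(x)) directly overflows for large x, so a sketch in Python would typically rearrange it. The version below is one such rearrangement, offered as an assumption about how to compute the value safely rather than as a description of Caffe's code; the function name bnll is illustrative.

```python
import math


def bnll(x):
    """log(1 + exp(x)), rearranged so exp never receives a large positive argument."""
    if x > 0:
        # log(1 + exp(x)) = x + log(1 + exp(-x))
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


print(bnll(0.0), bnll(1.0), bnll(1000.0))
```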

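4. Data Layers

Data enters Caffe through data layers, which sit at the bottom of the network. Data can come from efficient databases (LevelDB or LMDB) or directly from memory. When efficiency is not critical, data can also be read from disk, either from HDF5 files or from common image formats.

4.1 Database

Type: DATA

Required parameters:

source: the name of the directory containing the database

batch_size: the number of inputs to process at one time

Optional parameters:

rand_skip: skip up to this number of inputs at the beginning, which is useful for asynchronous stochastic gradient descent (SGD)

backend [default LEVELDB]: whether to use LEVELDB or LMDB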

4.2 In-Memory

Type: MEMORY_DATA
Required parameters:
batch_size, channels, height, width: the size of the chunks to read from memory
The memory data layer reads data directly from memory, without copying it. To use it, call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) to specify a source of contiguous data (as a 4D row-major array), which is then read one batch-sized chunk at a time.

4.3 HDF5 Input

Type: HDF5_DATA
Required parameters:
source: the name of the file to read from

batch_size: the number of inputs to process at one time

4.4 HDF5 Output

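Type: HDF5_OUTPUT
Required parameters:
file_name: the name of the output file

The HDF5 output layer works differently from the other layers in this section: it writes its input blobs to disk.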

4.5 Images

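Type: IMAGE_DATA
Required parameters:
source: the name of a text file in which each line gives an image filename and a label
batch_size: the number of images in one batch
Optional parameters:
rand_skip: skip up to this number of inputs at the beginning, which is useful for asynchronous stochastic gradient descent (SGD)
shuffle [default false]

new_height, new_width: resize all images to this size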

4.6 Windows

Type: WINDOW_DATA

4.7 Dummy

Type: DUMMY_DATA

The DUMMY_DATA layer is used for development and debugging. See DummyDataParameter for its parameters.

5. Common Layers

5.1 Inner Product (fully connected layer)

Type: INNER_PRODUCT
Example:

layers {
  name: "fc8"
  type: INNER_PRODUCT
  blobs_lr: 1          # learning rate multiplier for the filters
  blobs_lr: 2          # learning rate multiplier for the biases
  weight_decay: 1      # weight decay multiplier for the filters
  weight_decay: 0      # weight decay multiplier for the biases
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "fc7"
  top: "fc8"
}

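Required parameters:

num_output (c_o): the number of filters

Optional parameters:

weight_filler [default type: 'constant' value: 0]: how the weights are initialized

bias_filler: how the biases are initialized

bias_term [default true]: whether to enable the bias term

Size change through the inner product layer:

Input: n * c_i * h_i * w_i

Output: n * c_o * 1 * 1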
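
The shape change is easier to see with a tiny worked example. The sketch below treats each sample as a flat vector of length c_i * h_i * w_i and produces c_o outputs per sample. It is plain Python for illustration: the function name inner_product and the nested-list layout are assumptions, not Caffe's internal representation.

```python
def inner_product(batch, weights, bias=None):
    """batch: n flattened samples, each of length c_i * h_i * w_i.
    weights: c_o rows, one per output, each as long as a sample.
    bias: optional list of c_o values (leave as None to mimic bias_term: false).
    """
    out = []
    for sample in batch:
        row = []
        for o, w_row in enumerate(weights):
            value = sum(w * x for w, x in zip(w_row, sample))
            if bias is not None:
                value += bias[o]
            row.append(value)
        out.append(row)
    return out


# One sample with c_i = 1, h_i = 2, w_i = 2, mapped to c_o = 2 outputs.
batch = [[1.0, 2.0, 3.0, 4.0]]
weights = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
result = inner_product(batch, weights, bias=[0.0, 1.0])
print(result)
```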

5.2 Splitting

Type: SPLIT
The SPLIT layer splits one input blob into multiple output blobs. It is used when a single blob has to be fed into more than one layer.

5.3 Flattening

Type: FLATTEN
The FLATTEN layer turns an input of size n * c * h * w into a simple vector of size n * (c*h*w) * 1 * 1.

5.4 Concatenation

Type: CONCAT

Example:

layers {
  name: "concat"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  type: CONCAT
  concat_param {
    concat_dim: 1
  }
}

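Optional parameters:

concat_dim [default 1]: 0 concatenates along num, 1 concatenates along channels

Size change through the concatenation layer:

Input: n_i * c_i * h * w for each input blob i from 1 to K

Output:

If concat_dim = 0: (n_1 + n_2 + ... + n_K) * c_1 * h * w, and all input c_i must be the same.

If concat_dim = 1: n_1 * (c_1 + c_2 + ... + c_K) * h * w, and all input n_i must be the same.

The CONCAT layer joins multiple input blobs into a single output blob.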
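
The shape rules for the two values of concat_dim can be checked with a short Python sketch. The function name concat_shape is hypothetical, and the sketch only works on (n, c, h, w) shape tuples rather than on real blobs.

```python
def concat_shape(shapes, concat_dim=1):
    """Return the (n, c, h, w) shape produced by concatenating along concat_dim."""
    if concat_dim not in (0, 1):
        raise ValueError("this sketch only handles num (0) and channels (1)")
    first = shapes[0]
    total = 0
    for shape in shapes:
        # Every axis other than concat_dim has to match across the inputs.
        for axis in range(4):
            if axis != concat_dim and shape[axis] != first[axis]:
                raise ValueError(f"shape mismatch on axis {axis}: {shape} vs {first}")
        total += shape[concat_dim]
    result = list(first)
    result[concat_dim] = total
    return tuple(result)


print(concat_shape([(8, 3, 5, 5), (8, 4, 5, 5)]))                # along channels
print(concat_shape([(2, 3, 5, 5), (6, 3, 5, 5)], concat_dim=0))  # along num
```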

5.5 Slicing

The SLICE layer is a utility layer that slices an input layer to multiple output layers along a given dimension (currently num or channel only) with given slice indices.
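
Slicing is the inverse of concatenation, which a few lines of Python can illustrate. The sketch assumes that the slice indices act as boundaries, so K indices yield K + 1 consecutive pieces; that reading, the function name slice_channels, and the use of a plain list to stand in for the channel axis are all assumptions for illustration.

```python
def slice_channels(channels, slice_points):
    """Split a list of channels at the given indices into consecutive pieces."""
    bounds = [0] + list(slice_points) + [len(channels)]
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("slice points must be strictly increasing and inside the input")
    return [channels[a:b] for a, b in zip(bounds, bounds[1:])]


pieces = slice_channels(["a", "b", "c", "d", "e", "f"], [1, 4])
print(pieces)
```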

5.6 Elementwise Operations

Type: ELTWISE

5.7 Argmax

Type: ARGMAX

5.8 Softmax

Type: SOFTMAX

5.9 Mean-Variance Normalization

Type: MVN

6. References

Caffe