In Caffe, layers fall into the following categories:
- Data Layers
- Vision Layers
- Recurrent Layers
- Common Layers
- Normalization Layers
- Activation / Neuron Layers
- Utility Layers
- Loss Layers
Let's start with the data layer. The data layer sits at the bottom of every model and is the model's entry point: it not only feeds data into the network, but also converts data from Blobs into other formats for saving and output. Common preprocessing steps (mean subtraction, scaling, cropping, mirroring, etc.) are also configured through parameters in this layer. Data can come from an efficient database (LevelDB or LMDB) or directly from memory. If efficiency is not a major concern, data can also be read from HDF5 files or image files on disk. All data layers share a set of common parameters. Let's look at an example first:
layer {
name: "cifar"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: "examples/cifar10/mean.binaryproto"
}
data_param {
source: "examples/cifar10/cifar10_train_lmdb"
batch_size: 100
backend: LMDB
}
}
The meaning of each parameter is explained below:
1) name: the name of the layer; usually a meaningful name is chosen.
2) type: the layer type. If it is Data, the data comes from LevelDB or LMDB. The type of a data layer depends on the data source (described in detail later). LevelDB or LMDB data is the most common case, so the layer type is usually set to Data.
3) top or bottom: each layer receives its input through bottom and produces its output through top. If a layer has only top and no bottom, it has outputs but no inputs, and vice versa. Multiple top or bottom entries mean the layer has multiple input or output blobs.
4) data and label: a data layer has at least one top named data. If there is a second top, it is usually label for classification problems. The tops vary with the task; for example, the data layer of siftflow-fcn16s looks like this:
layer {
name: "data"
type: "Python"
top: "data"
top: "sem"
top: "geo"
python_param {
module: "siftflow_layers"
layer: "SIFTFlowSegDataLayer"
param_str: "{\'siftflow_dir\': \'../data/sift-flow\', \'seed\': 1337, \'split\': \'trainval\'}"
}
}
5) include: the layers of a model usually differ between training and testing. include specifies whether a layer belongs to the training phase or the testing phase. If a layer has no include parameter, it is present in both the training and the testing model.
6) transform_param: data preprocessing, which can transform the data into a defined range. Options include scale, mirror, crop_size, mean_file, etc. For example:
transform_param {
scale: 0.00390625
mean_file: "examples/cifar10/mean.binaryproto"
# use a mean file for the mean subtraction
mirror: 1 # 1 enables mirroring, 0 disables it; true and false also work
# crop a 227*227 patch: random crop during training, center crop during testing
crop_size: 227
}
Referring to the caffe.proto file, the code related to transform_param is as follows:
// Message that stores parameters used to apply transformation
// to the data layer's data
message TransformationParameter {
// For data pre-processing, we can do simple scaling and subtracting the
// data mean, if provided. Note that the mean subtraction is always carried
// out before scaling.
optional float scale = 1 [default = 1];
// Specify if we want to randomly mirror data.
optional bool mirror = 2 [default = false];
// Specify if we would like to randomly crop an image.
optional uint32 crop_size = 3 [default = 0];
// mean_file and mean_value cannot be specified at the same time
optional string mean_file = 4;
// if specified can be repeated once (would subtract it from all the channels)
// or can be repeated the same number of times as channels
// (would subtract them from the corresponding channel)
repeated float mean_value = 5;
// Force the decoded image to have 3 color channels.
optional bool force_color = 6 [default = false];
// Force the decoded image to have 1 color channels.
optional bool force_gray = 7 [default = false];
}
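To make the preprocessing order concrete, here is a minimal pycaffe sketch (assuming numpy and pycaffe are available; the mean-file path comes from the example above and the input image is randomly generated) that loads a binaryproto mean file and applies mean subtraction followed by scaling, matching the order documented in TransformationParameter:
import numpy as np
import caffe

# Load the mean image stored as a BlobProto (examples/cifar10/mean.binaryproto above).
blob = caffe.proto.caffe_pb2.BlobProto()
with open('examples/cifar10/mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = caffe.io.blobproto_to_array(blob)[0]  # shape: (channels, height, width)

def transform(image, scale=0.00390625):
    # Mean subtraction is always carried out before scaling.
    return (image - mean) * scale

# Hypothetical input: one raw image in (channels, height, width) layout.
raw = np.random.randint(0, 256, size=mean.shape).astype(np.float32)
preprocessed = transform(raw)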
type: this parameter accepts many different input types, so it deserves a separate explanation. The details follow:
1) type: Data, the most commonly used, for example:
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
Required parameters:
source: the directory containing the database, e.g. examples/mnist/mnist_train_lmdb
batch_size: the number of samples processed at a time, e.g. 64
Optional parameters:
rand_skip: skip a number of inputs at the beginning; useful for asynchronous SGD.
backend: choose either LevelDB or LMDB; the default is LevelDB.
Reference source code:
message DataParameter {
enum DB {
LEVELDB = 0;
LMDB = 1;
}
// Specify the data source.
optional string source = 1;
// Specify the batch size.
optional uint32 batch_size = 4;
// The rand_skip variable is for the data layer to skip a few data points
// to avoid all asynchronous sgd clients to start at the same point. The skip
// point would be set as rand_skip * rand(0,1). Note that rand_skip should not
// be larger than the number of keys in the database.
// DEPRECATED. Each solver accesses a different subset of the database.
optional uint32 rand_skip = 7 [default = 0];
optional DB backend = 8 [default = LEVELDB];
// DEPRECATED. See TransformationParameter. For data pre-processing, we can do
// simple scaling and subtracting the data mean, if provided. Note that the
// mean subtraction is always carried out before scaling.
optional float scale = 2 [default = 1];
optional string mean_file = 3;
// DEPRECATED. See TransformationParameter. Specify if we would like to randomly
// crop an image.
optional uint32 crop_size = 5 [default = 0];
// DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror
// data.
optional bool mirror = 6 [default = false];
// Force the encoded image to have 3 color channels
optional bool force_encoded_color = 9 [default = false];
// Prefetch queue (Increase if data feeding bandwidth varies, within the
// limit of device memory for GPU training)
optional uint32 prefetch = 10 [default = 4];
}
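As a side note, the LMDB that a Data layer reads stores serialized Datum messages. A rough sketch of building such a database with pycaffe and the lmdb Python package (the database name, shapes and contents below are made-up placeholders):
import lmdb
import numpy as np
import caffe

# Hypothetical training set: 10 images of shape (channels, height, width) plus labels.
images = np.random.randint(0, 256, size=(10, 3, 32, 32)).astype(np.uint8)
labels = np.random.randint(0, 10, size=10)

env = lmdb.open('my_train_lmdb', map_size=int(1e9))
with env.begin(write=True) as txn:
    for i, (img, label) in enumerate(zip(images, labels)):
        # array_to_datum wraps the image array and its label in a Datum message.
        datum = caffe.io.array_to_datum(img, int(label))
        txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())
env.close()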
2) type: MemoryData
layer {
top: "data"
top: "label"
name: "memory_data"
type: "MemoryData"
memory_data_param{
batch_size: 2
height: 100
width: 100
channels: 1
}
transform_param {
scale: 0.0078125
mean_file: "mean.proto"
mirror: false
}
}
Required parameters:
batch_size: the number of samples processed at a time, e.g. 2
channels: number of channels
height: image height
width: image width
Reference source code:
message MemoryDataParameter {
optional uint32 batch_size = 1;
optional uint32 channels = 2;
optional uint32 height = 3;
optional uint32 width = 4;
}
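With a MemoryData layer, data is fed in directly from memory through pycaffe. A minimal sketch, assuming a prototxt like the one above saved as memory_data_net.prototxt (a hypothetical file name) whose first layer is the MemoryData layer:
import numpy as np
import caffe

net = caffe.Net('memory_data_net.prototxt', caffe.TRAIN)

# Both arrays must be float32 and 4-D: (num, channels, height, width) for data,
# (num, 1, 1, 1) for labels; num must be a multiple of batch_size.
data = np.random.rand(2, 1, 100, 100).astype(np.float32)
labels = np.zeros((2, 1, 1, 1), dtype=np.float32)

net.set_input_arrays(data, labels)
out = net.forward()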
3) type: HDF5Data
layer {
name: "data"
type: "HDF5Data"
top: "data"
top: "label"
hdf5_data_param {
source: "examples/hdf5_classification/data/train.txt"
batch_size: 10
}
}
Required parameters:
source: the name of the list file to read (a text file listing the HDF5 files)
batch_size: the number of samples processed at a time
// Message that stores parameters used by HDF5DataLayer
message HDF5DataParameter {
// Specify the data source.
optional string source = 1;
// Specify the batch size.
optional uint32 batch_size = 2;
// Specify whether to shuffle the data.
// If shuffle == true, the ordering of the HDF5 files is shuffled,
// and the ordering of data within any given HDF5 file is shuffled,
// but data between different files are not interleaved; all of a file's
// data are output (in a random order) before moving onto another file.
optional bool shuffle = 3 [default = false];
}
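The source above (train.txt in the example) is a plain text file that lists one HDF5 file path per line, and each HDF5 file must contain datasets named after the tops (data and label). A rough sketch of creating such a pair with h5py (the shapes and contents are made up):
import h5py
import numpy as np

# Hypothetical toy dataset: 100 four-dimensional samples with binary labels.
data = np.random.rand(100, 4).astype(np.float32)
label = np.random.randint(0, 2, size=(100,)).astype(np.float32)

with h5py.File('examples/hdf5_classification/data/train.h5', 'w') as f:
    f.create_dataset('data', data=data)    # name must match the top "data"
    f.create_dataset('label', data=label)  # name must match the top "label"

# train.txt simply lists the HDF5 files, one path per line.
with open('examples/hdf5_classification/data/train.txt', 'w') as f:
    f.write('examples/hdf5_classification/data/train.h5\n')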
4) type: ImageData
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
transform_param {
mirror: false
crop_size: 227
mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
}
image_data_param {
source: "examples/_temp/file_list.txt"
batch_size: 50
new_height: 256
new_width: 256
}
}
Required parameters:
source: the name of a text file; each line gives an image file name and its label (see the sketch right after this parameter list)
batch_size: the number of samples (images) processed at a time
Optional parameters:
rand_skip: skip a number of inputs at the beginning; useful for asynchronous SGD.
shuffle: randomly shuffle the order; the default is false
new_height, new_width: if set, images are resized to these dimensions
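As an illustration of the source format (the image paths and labels below are just examples), such a list file can be generated like this:
# ImageData expects one "path label" pair per line in the list file.
samples = [
    ('examples/images/cat.jpg', 0),
    ('examples/images/fish-bike.jpg', 1),
]

with open('examples/_temp/file_list.txt', 'w') as f:
    for path, label in samples:
        f.write('{} {}\n'.format(path, label))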
Reference source code:
message ImageDataParameter {
// Specify the data source.
optional string source = 1;
// Specify the batch size.
optional uint32 batch_size = 4 [default = 1];
// The rand_skip variable is for the data layer to skip a few data points
// to avoid all asynchronous sgd clients to start at the same point. The skip
// point would be set as rand_skip * rand(0,1). Note that rand_skip should not
// be larger than the number of keys in the database.
optional uint32 rand_skip = 7 [default = 0];
// Whether or not ImageLayer should shuffle the list of files at every epoch.
optional bool shuffle = 8 [default = false];
// It will also resize images if new_height or new_width are not zero.
optional uint32 new_height = 9 [default = 0];
optional uint32 new_width = 10 [default = 0];
// Specify if the images are color or gray
optional bool is_color = 11 [default = true];
// DEPRECATED. See TransformationParameter. For data pre-processing, we can do
// simple scaling and subtracting the data mean, if provided. Note that the
// mean subtraction is always carried out before scaling.
optional float scale = 2 [default = 1];
optional string mean_file = 3;
// DEPRECATED. See TransformationParameter. Specify if we would like to randomly
// crop an image.
optional uint32 crop_size = 5 [default = 0];
// DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror
// data.
optional bool mirror = 6 [default = false];
optional string root_folder = 12 [default = ""];
}
5) type: Python. This one is quite powerful: you can define the layer yourself.
For example, the well-known FCN uses it:
layer {
name: "data"
type: "Python"
top: "data"
top: "sem"
top: "geo"
python_param {
module: "siftflow_layers"
layer: "SIFTFlowSegDataLayer"
param_str: "{\'siftflow_dir\': \'../data/sift-flow\', \'seed\': 1337, \'split\': \'trainval\'}"
}
}
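The module/layer pair above refers to a Python class (here siftflow_layers.SIFTFlowSegDataLayer from the FCN repository). In general such a class inherits from caffe.Layer and implements setup, reshape, forward and backward. A bare-bones sketch of what a custom Python data layer might look like (the class name, shapes and stub contents are hypothetical; the actual data loading is omitted):
import numpy as np
import caffe

class SimpleDataLayer(caffe.Layer):
    """Hypothetical minimal Python data layer with one data top and one label top."""

    def setup(self, bottom, top):
        # param_str arrives as the raw string given in the prototxt; parse it as needed.
        params = eval(self.param_str) if self.param_str else {}
        self.batch_size = params.get('batch_size', 1)

    def reshape(self, bottom, top):
        # Declare the output shapes before every forward pass.
        top[0].reshape(self.batch_size, 3, 224, 224)
        top[1].reshape(self.batch_size)

    def forward(self, bottom, top):
        # Stub: fill the tops with real images and labels here.
        top[0].data[...] = np.zeros(top[0].data.shape, dtype=np.float32)
        top[1].data[...] = np.zeros(top[1].data.shape, dtype=np.float32)

    def backward(self, top, propagate_down, bottom):
        # A data layer has no gradients to propagate.
        pass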
For further details, see the official Caffe documentation. That's all for now!