libeblearn tutorial: energy-based learning in C++

By Pierre Sermanet and Yann LeCun (New York University)




The eblearn (energy-based learning) C++ library, libeblearn, contains machine learning algorithms which can be used for computer vision. The library has a generic and modular architecture, allowing easy prototyping and building of different algorithms (supervised or unsupervised learning) and configurations from basic modules. Those algorithms have been used for a variety of applications, including robotics with the DARPA Learning Applied to Ground Robots (LAGR) project.

Contents

·         Energy-Based Learning

·         Modular Architecture

o    Basic Modules

o    Layer Modules

o    Machine Modules

o    Trainable Machine Modules

o    GUI display

·         Supervised Learning

o    1. Build your Dataset

o    2. Build your Neural Network

o    3. Train your Network

o    4. Run your Network

o    Further Reading

·         Unsupervised Learning

·         Semi-supervised Learning

Energy-based learning

What is energy-based learning?
FIXME.

More resources on energy-based models:

§  [video-lecture] Energy-based models & Learning for Invariant Image Recognition: a 4-hour video of a tutorial on energy-based models by Yann LeCun at the Machine Learning Summer School in Chicago in 2005.

§  http://www.cs.nyu.edu/~yann/research/ebm/index.html: tutorials, talks, videos and publications on EBMs.

§  [LeCun et al 2006] A Tutorial on Energy-Based Learning, in Bakir et al. (eds) "Predicting Structured Outputs", MIT Press 2006: a 60-page tutorial on energy-based learning, with an emphasis on structured-output models. The tutorial includes an annotated bibliography of discriminative learning, with a simple view of CRF, maximum-margin Markov nets, and graph transformer networks.

Modular Architecture and Building Blocks

Eblearn was designed to be modular so that any module can be arranged in different ways with other modules to form different layers and different machines. There are 2 main types of modules (declared in EblArch.h):

§  module_1_1: a module with 1 input and 1 output.

§  module_2_1: a module with 2 inputs and 1 output.

Each module derives from one of those two base classes and implements the fprop, bprop, bbprop and forget methods:

§  fprop: the forward-propagation method, which propagates the input(s) through the module to the output. In the rest of this tutorial, a module will usually be described by what its fprop method does.

§  bprop: the backward-propagation method, which propagates the output back through the module to the input(s). This method usually updates the parameters to be learned inside the module (using, for example, derivatives) and also propagates derivatives to the input of the module so that preceding modules can be back-propagated in turn.

§  bbprop: the second-order backward-propagation method. In the case of the LeNet networks, for example, this is used to compute second-order derivatives and adapt the learning rate of each parameter accordingly, to speed up the learning process.

§  forget: initializes the weights of the module to random values, based on a forget_param_linear object.

Note that the inputs and outputs that modules accept are state_idx objects, which are temporary buffers containing the results of a module's processing. For example, an fprop call fills the x slot of the output state_idx, whereas a bprop call fills the dx slot of the input state_idx (using the dx slot of the output state_idx).
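
The relationship between the two base classes, the four methods and the x/dx slots can be sketched as follows. This is only an illustrative toy, not the library's code: toy_state, toy_module_1_1 and toy_addc are hypothetical names, the buffers are plain vectors rather than state_idx objects, only fprop and bprop are shown, and the choice to overwrite (rather than accumulate into) the input dx slot is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for state_idx: x holds fprop results,
// dx holds the derivatives filled in by bprop.
struct toy_state {
  std::vector<double> x;
  std::vector<double> dx;
  explicit toy_state(std::size_t n) : x(n, 0.0), dx(n, 0.0) {}
};

// Hypothetical stand-in for the module_1_1 base class (1 input, 1 output).
struct toy_module_1_1 {
  virtual ~toy_module_1_1() = default;
  virtual void fprop(const toy_state &in, toy_state &out) = 0;
  virtual void bprop(toy_state &in, const toy_state &out) = 0;
};

// Adds a constant to every input element, as addc_module is described to do.
struct toy_addc : toy_module_1_1 {
  double c;
  explicit toy_addc(double constant) : c(constant) {}

  // Fills the x slot of the output.
  void fprop(const toy_state &in, toy_state &out) override {
    assert(in.x.size() == out.x.size());
    for (std::size_t i = 0; i < in.x.size(); ++i) out.x[i] = in.x[i] + c;
  }

  // Fills the dx slot of the input from the dx slot of the output.
  // d(out)/d(in) = 1 here, so the derivative passes through unchanged.
  void bprop(toy_state &in, const toy_state &out) override {
    assert(in.dx.size() == out.dx.size());
    for (std::size_t i = 0; i < in.dx.size(); ++i) in.dx[i] = out.dx[i];
  }
};
```

A caller would create one toy_state per connection between modules, call fprop front to back, seed the last dx slot with the derivative of the loss, and then call bprop back to front.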

Next we describe some modules and show how they are combined with other modules. We first talk about a few basic modules, which are then used to form layers, which in turn are used to form machines. Note that all the modules described here (basic modules, layers and machines) derive from module_1_1 or module_2_1, which means that you can write and combine your own modules in your own ways and are not restricted to the arrangements shown here.

Basic module examples

constant addition, linear, convolution and subsampling modules

Those basic modules (found in EblBasic.h) are used in the LeNet architecture to perform the basic operations:

§  addc_module: adds a constant to each element of the input and puts the result in the output.

§  linear_module: computes a linear combination of the input and its internal weights and puts the result in the output.

§  convolution_module_2D: convolves a 2D input (dimensions 1 and 2; dimension 0 may have a size greater than 1) using the internal weights as kernels and puts the result in the output.

§  subsampling_module_2D: subsamples a 2D input (dimensions 1 and 2) and puts the result in the output.
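
To make the convolution and subsampling operations concrete, here is a minimal sketch on a single 2D map. It does not reproduce the library's modules: the function names are hypothetical, the convolution is written as a "valid" sliding dot product without kernel flipping, and subsampling is taken to be a plain average over non-overlapping blocks, all of which are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using grid = std::vector<std::vector<double>>;

// "Valid" sliding dot product of a kernel over a 2D input.
// The output size is (H - kh + 1) x (W - kw + 1).
grid convolve_2d(const grid &in, const grid &kernel) {
  assert(!in.empty() && !kernel.empty());
  const std::size_t kh = kernel.size(), kw = kernel[0].size();
  assert(in.size() >= kh && in[0].size() >= kw);
  const std::size_t oh = in.size() - kh + 1, ow = in[0].size() - kw + 1;
  grid out(oh, std::vector<double>(ow, 0.0));
  for (std::size_t i = 0; i < oh; ++i)
    for (std::size_t j = 0; j < ow; ++j)
      for (std::size_t a = 0; a < kh; ++a)
        for (std::size_t b = 0; b < kw; ++b)
          out[i][j] += in[i + a][j + b] * kernel[a][b];
  return out;
}

// Averages non-overlapping sh x sw blocks.
// The output size is (H / sh) x (W / sw).
grid subsample_2d(const grid &in, std::size_t sh, std::size_t sw) {
  assert(sh > 0 && sw > 0 && !in.empty());
  const std::size_t oh = in.size() / sh, ow = in[0].size() / sw;
  grid out(oh, std::vector<double>(ow, 0.0));
  for (std::size_t i = 0; i < oh; ++i)
    for (std::size_t j = 0; j < ow; ++j) {
      for (std::size_t a = 0; a < sh; ++a)
        for (std::size_t b = 0; b < sw; ++b)
          out[i][j] += in[i * sh + a][j * sw + b];
      out[i][j] /= static_cast<double>(sh * sw);
    }
  return out;
}
```

The size rules in the comments are the useful part: a k x k kernel shrinks each side by k - 1, and an s x s subsampling divides each side by s.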

non-linear modules

These modules (EblNonLinearity.h) perform non-linear operations:

§  tanh_module: applies the hyperbolic tangent function to the input and puts the result in the output.

§  stdsigmoid_module: applies the standard sigmoid function to the input and puts the result in the output.

Layer module examples

These layers (EblLayers.h) are built by stacking the basic modules described previously on top of each other to form more complicated operations:

§  nn_layer_full: a fully-connected layer which performs a linear combination of the input and the internal weights, adds a bias and applies a sigmoid (here a tanh_module). As always, the result is put in the output. This layer is built by stacking a linear_module_replicable (see Replicability), an addc_module and a tanh_module.

§  nn_layer_convolution: a convolution layer which performs a 2D convolution on the input, adds a bias and applies a sigmoid, putting the result in the output. This layer is built by stacking a convolution_module_2D_replicable, an addc_module and a tanh_module.

§  nn_layer_subsampling: a subsampling layer which subsamples the input, adds a bias and applies a sigmoid, putting the result in the output. This layer is built by stacking a subsampling_module_2D_replicable, an addc_module and a tanh_module.
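
The stacking described for the fully-connected layer (linear combination, then bias, then tanh) can be written as a single forward pass. The sketch below is a toy on plain vectors; full_layer_fprop and its argument layout are hypothetical and not taken from EblLayers.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Forward pass of a toy fully-connected layer:
// linear combination, then bias addition, then tanh.
std::vector<double> full_layer_fprop(
    const std::vector<std::vector<double>> &weights,  // one row per output
    const std::vector<double> &bias,
    const std::vector<double> &in) {
  assert(weights.size() == bias.size());
  std::vector<double> out(weights.size());
  for (std::size_t o = 0; o < weights.size(); ++o) {
    assert(weights[o].size() == in.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i)
      sum += weights[o][i] * in[i];       // linear step
    out[o] = std::tanh(sum + bias[o]);    // bias step, then squashing
  }
  return out;
}
```

The convolution and subsampling layers follow the same three-step pattern, with the linear step replaced by a convolution or a subsampling.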

Machine module examples

Just as layers are built by assembling basic modules, machines (EblMachines.h) can be built by assembling layers together, for instance the following machines:

b. Overview of the architecture of the supervised_euclidean_machine, a module_2_1 (2 inputs and 1 output) that combines many module_1_1 (1 input and 1 output) modules and one module_2_1, the euclidean_module. This architecture is useful during the training phase, where the bprop method is called, whereas during the testing phase only the lenet7 block is used and only the fprop method is called. Note that state_idx::x are temporary buffers containing the outputs of fprops in between modules and state_idx::dx contain the outputs of bprops.

§  nn_machine_cscscf: a LeNet-type machine which calls the following layers in order: convolution (c), subsampling (s), convolution (c), subsampling (s), convolution (c) and finally a fully-connected layer (f). This machine is parametrized by the size of the input, the sizes of the convolution and subsampling kernels, the size of the fully-connected layer output and the number of outputs.

§  lenet5: this machine is an nn_machine_cscscf with a particular configuration: it takes a 32x32 input, then applies 5x5 convolution kernels, 2x2 subsampling kernels, 5x5 convolutions, 2x2 subsamplings, 1x1 convolutions and full connections from a 120-dimensional input to 10 outputs. This specific network is used for handwritten digit recognition with 10 classes (see MNIST demo).

§  lenet7: similarly to lenet5, this machine is an nn_machine_cscscf with a particular configuration. It takes a 96x96 input, then applies 5x5 convolution kernels, 4x4 subsampling kernels, 6x6 convolutions, 3x3 subsamplings, 6x6 convolutions and full connections from a 100-dimensional input to 5 outputs. This network was specifically designed for the NORB demo but can serve as inspiration for similar object recognition tasks.

§  lenet7_binocular: this network is almost identical to lenet7 except that it accepts stereoscopic images as input.

a. The lenet7_binocular neural network architecture: there are 2 stereoscopic 96x96 images as input, 5x5 convolutions are performed, then 4x4 subsampling, 6x6 convolutions, 3x3 subsampling, 6x6 convolutions and full connections to the 5 outputs.
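
As a worked check of the kernel sizes quoted for lenet7, the side length of the feature maps can be traced through the c-s-c-s-c stack. The helper below is a hypothetical sketch; it assumes "valid" convolutions (each side shrinks by k - 1) and non-overlapping subsampling (each side is divided by k), which the text does not state explicitly.

```cpp
#include <cassert>
#include <vector>

// One stage of a c-s-c-s-c stack.
struct stage {
  char kind;  // 'c' for convolution, 's' for subsampling
  int k;      // side of the square kernel
};

// Returns the map side length after each stage, starting from `input`.
std::vector<int> map_sizes(int input, const std::vector<stage> &stages) {
  std::vector<int> sizes;
  int n = input;
  for (const stage &s : stages) {
    assert(s.k > 0 && n >= s.k);
    if (s.kind == 'c') {
      n = n - s.k + 1;  // valid convolution shrinks the side by k - 1
    } else {
      assert(s.kind == 's' && n % s.k == 0);
      n = n / s.k;      // subsampling divides the side by k
    }
    sizes.push_back(n);
  }
  return sizes;
}
```

Under these assumptions, a 96x96 input with kernels 5, 4, 6, 3, 6 gives maps of side 92, 23, 18, 6 and finally 1, so the last convolution leaves one value per feature map to feed the fully-connected layer.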

Trainable machine modules

The modules described in the previous sections need to be encapsulated in a module_2_1 together with a loss function module in order to be trained in a supervised way. For example, to train the nn_machine_cscscf machines, we combine them with a Euclidean cost module:

§  supervised_euclidean_machine: a module_2_1 machine containing a module_1_1 (in the NORB demo case, a lenet7 machine) and a euclidean_module. This machine thus takes an input, which is propagated through the module_1_1, and a groundtruth label as second input. The output of the module_1_1 is then compared to the groundtruth by the euclidean_module, which takes the squared distance between the output and the groundtruth. During the training phase, the weights are then modified based on the gradient of the error in the back-propagation process.
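
The cost computed by a Euclidean loss module and the gradient it sends back can be sketched as follows. The function names are hypothetical, and the sketch uses the plain squared distance; whether the library scales it (for example by 1/2) is not stated here, so treat the constant factor as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squared Euclidean distance between the network output and the target.
double euclidean_energy(const std::vector<double> &out,
                        const std::vector<double> &target) {
  assert(out.size() == target.size());
  double e = 0.0;
  for (std::size_t i = 0; i < out.size(); ++i) {
    const double d = out[i] - target[i];
    e += d * d;
  }
  return e;
}

// Gradient of that energy with respect to the output: 2 * (out - target).
// This is what a bprop call would place in the dx slot of the network output.
std::vector<double> euclidean_gradient(const std::vector<double> &out,
                                       const std::vector<double> &target) {
  assert(out.size() == target.size());
  std::vector<double> g(out.size());
  for (std::size_t i = 0; i < out.size(); ++i)
    g[i] = 2.0 * (out[i] - target[i]);
  return g;
}
```

The energy is zero only when the output matches the target exactly, and the gradient points from the target toward the output, so a descent step moves the output toward the target.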

Module Replicability

Modules usually operate on a specific number of dimensions. For example, the convolution_module_2D only accepts inputs with 3 dimensions (it applies a 2D convolution on dimensions 1 and 2, and dimension 0 is used according to a connection table). Thus if extra dimensions are present (e.g. 4D or 5D inputs), one might want to loop over the extra dimensions and call the convolution_module_2D on each 3D subset. We call this replicability because the module is replicated over the 3rd and 4th dimensions (the output also has 2 extra dimensions).

To make a module replicable, use the DECLARE_REPLICABLE_MODULE_1_1 macro (in EblArch.h). It will automatically declare your module_1_1 as replicable and loop over extra dimensions if present. For example, here is the code to declare the linear_module as replicable:

DECLARE_REPLICABLE_MODULE_1_1(linear_module_replicable,
                              linear_module,
                              (parameter &p, intg in, intg out),
                              (p, in, out));

where linear_module_replicable is the name of the new module, linear_module is the name of the base module, (parameter &p, intg in, intg out) is the prototype of the constructor's parameters and (p, in, out) are the parameters themselves.
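
The idea behind replicability, calling a base operation once per slice along an extra dimension, can be illustrated without the macro. The sketch below is not what DECLARE_REPLICABLE_MODULE_1_1 expands to; replicate and slice are hypothetical names, and only one extra leading dimension is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using slice = std::vector<double>;

// Applies `base` to every slice along one extra leading dimension
// and collects the per-slice outputs in the same order.
std::vector<slice> replicate(const std::function<slice(const slice &)> &base,
                             const std::vector<slice> &stacked_input) {
  assert(static_cast<bool>(base));
  std::vector<slice> stacked_output;
  stacked_output.reserve(stacked_input.size());
  for (const slice &s : stacked_input) stacked_output.push_back(base(s));
  return stacked_output;
}
```

The base operation stays unaware of the extra dimension, which is the point of the design: the same module code serves both the plain and the replicated case.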

c. Example display of the eblearn GUI. A call to module_1_1::display_fprop displays all the internal feature maps with their descriptions and sizes, provided that each module implements its display method. The input images are on top and the network outputs at the bottom.

GUI display

If the Qt library is present on your system, the eblearn project will automatically compile the GUI libraries corresponding to libidx and libeblearn. It thus produces the libeblearngui library, which provides display functions for the eblearn modules. For instance, it implements the display_fprop method for each module_1_1, which makes it possible to display every stage of the internal representations of the neural network by calling that method on the top-level module (see figure c).

For more details about the GUI features, refer to the GUI section in the libidx tutorial. Briefly, libidxgui provides a global object "gui" which can open new windows (gui.new_window()), draw matrices (gui.draw_matrix()), draw text (gui << at(42, 42) << "42" << endl) and more.

Supervised Learning

Eblearn provides supervised learning algorithms as well as semi-supervised and unsupervised ones. They can be used independently or combined; this section focuses on supervised algorithms only.

1. Build your dataset

What data should you provide to the network?

Creating datasets from image directories

Once you have grouped all your images in different directories for each class, call the dscompile tool to transform them into a dataset object (a matrix of images with their corresponding labels). This tool extracts the label information from the directory structure: at the source level, each directory is a class named after the directory name. All images found in its subdirectories (regardless of their names or hierarchy) are then assigned the same label and added to the dataset.

For example, with the following directory structure, dscompile will automatically build a dataset with 2 different labels, "car" and "human", containing 5 images in total:

/data$ ls -R *
car:
auto.png  car01.png

human:
human1  human2

human/human1:
img01.png

human/human2:
img01.png img02.png
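
The labeling rule illustrated by this listing (the top-level directory names the class, deeper subdirectories are ignored) can be sketched in a few lines. This is not dscompile's code; class_of and count_per_class are hypothetical helpers operating on paths relative to the dataset root.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// The label is the first path component below the dataset root.
std::string class_of(const std::string &relative_path) {
  const std::string::size_type slash = relative_path.find('/');
  assert(slash != std::string::npos && slash > 0);
  return relative_path.substr(0, slash);
}

// Counts how many image paths fall under each class.
std::map<std::string, int> count_per_class(
    const std::vector<std::string> &paths) {
  std::map<std::string, int> counts;
  for (const std::string &p : paths) ++counts[class_of(p)];
  return counts;
}
```

Applied to the five image paths in the listing, this yields 2 images for "car" and 3 for "human", matching the 2 labels and 5 images mentioned above.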

dscompile will create the following files:

1.   dset_data.mat: a ubyte matrix of size N x M x height x width, where N is the number of images and M the number of channels. M is 1 for greyscale or 3 for color images, and 2 or 6 respectively if two (stereoscopic) images are passed instead of one. The possible channel configurations are therefore G or GG, RGB or RGBRGB.

2.   dset_labels.mat: an int matrix of size Nclasses containing the label ID numbers, e.g. 0, 1,...

3.   dset_classes.mat: a ubyte matrix of size Nclasses x 128 containing the names of each class.

For more details, see the tools manuals of dscompile, dssplit, dsmerge, and dsdisplay.

d. Example display of the MNIST dataset after a call to LabeledDataSource::display(), which displays groundtruth (left), correct and incorrect answers (middle) and incorrect only (right).

Loading datasets into a LabeledDataSource object

2. Build your network

The eblearn neural networks are built by combining modules, as introduced at the beginning of this tutorial. Depending on your application, you may want to write your own modules or combine existing modules in different ways. For example, for simple applications (see Perceptron demo), you may want to have just one fully-connected layer in your machine. For more complex tasks, some architectures are already built in, such as the lenet5 and lenet7 architectures. The lenet5 and lenet7 classes derive from the nn_machine_cscscf class and only specify particular sizes for the input, kernels and outputs. As described earlier, the lenet5 architecture is capable of learning handwritten character recognition (10 categories) while lenet7 was used for object recognition (5 categories).

Your application will probably be different from the MNIST or NORB applications, but if your goal is object recognition you may want to reuse the nn_machine_cscscf class and the lenet5 or lenet7 parameters and change only a few of them (see the constructors of lenet5 and lenet7 for more details). The main changes you will have to make are:

§  number of outputs: for MNIST this is equal to 10 and for NORB it is 5. If you are building a system that recognizes 15 different categories, this value will be 15.

§  size of the input: this is 1x32x32 for MNIST, 1x96x96 for NORB and 2x96x96 for the binocular version of NORB. If you have multiple channels (e.g. RGB, YUV or binocular greyscale inputs), you may have to rework the connection table between the input and the first convolution layer.

§  connection tables: if there is no pre-configured network with the same number of input channels as yours, you have to modify the connection table of the first layer. In nn_machine_cscscf, this table is the table0 object in the constructors in EblMachines.cpp. It specifies which input channels are convolved into which feature maps. For example, in lenet7 it is a full table of size 6 because the single greyscale channel is connected to all 6 feature maps of the first convolution. In lenet7_binocular, by contrast, there are 2 input channels and the table has 12 entries: a feature map may receive only 1 channel ({0, 0}, {0, 1}, {1, 2}, {1, 3}) or combine both ({0, 4}, {1, 4}, {0, 5}, {1, 5}, {0, 6}, {1, 6}, {0, 7}, {1, 7}). {1, 2} means that channel 1 of the input is processed through the convolution layer and put in feature map 2. Similarly, {0, 7} and {1, 7} mean that both channels of the input are combined into feature map 7 of the first layer.

§  sizes of kernels: depending on the complexity of the task, you may want to increase or decrease the sizes of the kernels (convolution or subsampling kernels), again specified in the constructors of the nn_machine_cscscf machines. Those sizes determine the number of parameters to learn in your network. The more complex the task, the more parameters are necessary.
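
The binocular connection table quoted in the list above can be written down and queried as plain data. The sketch below uses a vector of {input channel, feature map} pairs rather than the library's table0 object; connection_table, binocular_table and channels_into are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Each entry is {input channel, feature map}, as in the pairs quoted above.
using connection_table = std::vector<std::pair<int, int>>;

// The binocular pairs listed in the text.
connection_table binocular_table() {
  return {{0, 0}, {0, 1}, {1, 2}, {1, 3},
          {0, 4}, {1, 4}, {0, 5}, {1, 5},
          {0, 6}, {1, 6}, {0, 7}, {1, 7}};
}

// Returns the set of input channels feeding one feature map.
std::set<int> channels_into(const connection_table &table, int feature_map) {
  assert(feature_map >= 0);
  std::set<int> channels;
  for (const auto &c : table)
    if (c.second == feature_map) channels.insert(c.first);
  return channels;
}
```

Querying feature map 2 returns only channel 1, while feature map 7 returns both channels, which matches the reading of {1, 2}, {0, 7} and {1, 7} given in the text. Adapting the table to, say, RGB input would mean listing pairs whose first element ranges over 0, 1 and 2.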

Remember that once you have chosen your network parameters and trained your network, you have to reuse the exact same parameters when running it. The trained network is saved in a single parameter file, and to be reused correctly it needs to be loaded into the exact same network it was trained with.

You now have your dataset and a network architecture, so you are ready to train the network.

3. Train your network

a. Make your network trainable

First you need to make your network trainable. At this point it is a module_1_1 object (as shown in figure b, e.g. lenet7) that can be run but not trained. To make it trainable, you need to encapsulate it in a module_2_1 with a loss function module, which computes an error distance to the target label (e.g. the plane label) so that the network knows how much it should correct its weights in order to give the right answer.

For instance, the supervised_euclidean_machine class is a module_2_1 object whose constructor takes a module_1_1 and the output targets for each output class. When presented with an input image and a label, it computes the network output via the fprop method and computes the Euclidean distance between the network output and the target output. Then, to learn from the errors (or minimize the energies), the bprop method is called and back-propagates the gradients of the errors all the way back through the network.

b. Create a trainer

The training procedure can be handled in the supervised case by the supervised_trainer class. This class takes a trainable machine (a module_2_1) and a LabeledDataSource to train and test on the dataset.

c. Compute the second derivatives (bbprop)

The second derivatives are used to set an individual learning rate for each parameter of the network, which helps speed up the learning process and also improves the quality of the learning. They are computed once before training starts, over about a hundred iterations, and are back-propagated through the bbprop methods.

To compute the second derivatives, call the compute_diaghessian method of the trainer as follows:

thetrainer.compute_diaghessian(train_ds, iterations, 0.02);

where train_ds is your training dataset and iterations is the number of iterations, typically 100.
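
How diagonal second derivatives turn into individual learning rates can be sketched with the rule described in Efficient BackProp (see Further Reading): each rate is a global rate eta divided by the sum of a small constant mu and the parameter's second-derivative estimate. The function below is a hypothetical illustration, not the trainer's code, and this tutorial does not say what the third argument (0.02) of compute_diaghessian means, so mu here is only an assumed parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-parameter learning rates eta / (mu + h_i), where h_i is the i-th
// diagonal second-derivative estimate and mu > 0 caps the largest rate
// at eta / mu when h_i is close to zero.
std::vector<double> per_parameter_rates(const std::vector<double> &diag_hessian,
                                        double eta, double mu) {
  assert(eta > 0.0 && mu > 0.0);
  std::vector<double> rates(diag_hessian.size());
  for (std::size_t i = 0; i < diag_hessian.size(); ++i) {
    assert(diag_hessian[i] >= 0.0);
    rates[i] = eta / (mu + diag_hessian[i]);
  }
  return rates;
}
```

Parameters with large curvature get small steps and parameters with small curvature get large ones, which is why the estimate only needs to be computed over a modest number of samples before training.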

d. Train and test

After computing the second derivatives, you can iteratively train and test the network. By testing on both the training and the testing sets after each training iteration, you get a sense of how the training converges. Here is an example of training for 100 iterations and displaying the training-set and testing-set results at each step:

for (int i = 0; i < 100; ++i) {
    thetrainer.train(train_ds, trainmeter, gdp, 1);
    cout << "training: " << flush;
    thetrainer.test(train_ds, trainmeter, infp);
    trainmeter.display();
    cout << " testing: " << flush;
    thetrainer.test(test_ds, testmeter, infp);
    testmeter.display();
}

Here is a typical output of what you should see when training your network:

$ ./mnist /d/taf/data/mnist
* MNIST demo: learning handwritten digits using the eblearn C++ library *
Computing second derivatives on MNIST dataset: diaghessian inf: 0.985298 sup: 49.7398
Training network on MNIST with 2000 training samples and 1000 test samples
training: [ 2000] size=2000  energy=0.19  correct=88.80%  errors=11.20% rejects=0.00%
 testing: [2000]  size=1000  energy=0.163 correct=90.50%  errors=9.50%  rejects=0.00%
training: [ 4000] size=2000  energy=0.1225  correct=93.25%  errors=6.75% rejects=0.00%
 testing: [4000]  size=1000  energy=0.121 correct=92.80%  errors=7.20%  rejects=0.00%
training: [ 6000] size=2000  energy=0.084  correct=95.45%  errors=4.55% rejects=0.00%
 testing: [6000]  size=1000  energy=0.098 correct=94.70%  errors=5.30%  rejects=0.00%
training: [ 8000] size=2000  energy=0.065  correct=96.45%  errors=3.55% rejects=0.00%
 testing: [8000]  size=1000  energy=0.095 correct=95.20%  errors=4.80%  rejects=0.00%
training: [10000] size=2000  energy=0.0545  correct=97.15%  errors=2.85% rejects=0.00%
 testing:[10000]  size=1000  energy=0.094 correct=95.80%  errors=4.20%  rejects=0.00%

4. Run your network

e. Output display of the multi-resolution detection provided by the Classifier2D class (top 2 rows; below are the internal network representations). The plane is resized to different resolutions to handle multiple scales, and the output of each class for each resolution is shown on the left. The brighter the output, the stronger the response for the corresponding class. The 3rd row of the network outputs corresponds to the plane category, which has the strongest responses in the top right of the images, yielding the plane bounding box classification on the right.

Multi-resolution detection: Classifier2D

While the Trainer class takes a module_1_1 and trains it on a dataset, the Classifier2D class takes a trained network as input (loading a 'parameter' saved in an Idx file) to detect objects in images of any size and at different resolutions. It resizes the input image to different sizes based on the resolution parameters passed to it and applies the network at each scale. Finally, each value in the network outputs that is higher than a certain threshold yields a positive detection at the corresponding position in the image and at a specific scale.

// parameter, network and classifier
// load the previously saved weights of a trained network
parameter theparam(1);
// input to the network will be 96x96 and there are 5 outputs
lenet7_binocular thenet(theparam, 96, 96, 5);
theparam.load_x(mono_net.c_str());
Classifier2D cb(thenet, sz, lbl, 0.0, 0.01, 240, 320);

// find category of image
Idx res = cb.fprop(left.idx_ptr(), 1, 1.8, 60);
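
The two steps described for multi-resolution detection, resizing to several scales and keeping outputs above a threshold, can be sketched on plain data. This is not Classifier2D's implementation: scaled_sizes and detections are hypothetical helpers, the scale factors are made up for illustration, and the meaning of the numeric arguments in the calls above is not given in this tutorial.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the {height, width} of the input image at each scale factor
// (fractional sizes are truncated toward zero).
std::vector<std::pair<int, int>> scaled_sizes(int height, int width,
                                              const std::vector<double> &scales) {
  assert(height > 0 && width > 0);
  std::vector<std::pair<int, int>> sizes;
  for (double s : scales) {
    assert(s > 0.0);
    sizes.emplace_back(static_cast<int>(height * s),
                       static_cast<int>(width * s));
  }
  return sizes;
}

// Returns the indices of the output values strictly above the threshold.
std::vector<std::size_t> detections(const std::vector<double> &outputs,
                                    double threshold) {
  std::vector<std::size_t> hits;
  for (std::size_t i = 0; i < outputs.size(); ++i)
    if (outputs[i] > threshold) hits.push_back(i);
  return hits;
}
```

In a full detector, each index kept by the thresholding step would be mapped back to a position in the original image using the scale at which it was found.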

Further Reading

Here are resources that might be helpful in understanding in more detail how supervised convolutional neural networks work:

§  http://yann.lecun.com/ex/research/index.html: Yann LeCun's research in machine learning; contains many links to published papers about convolutional neural networks and applications.

§  [LeCun et al., 1998] Gradient-Based Learning Applied to Document Recognition (Proc. IEEE 1998): a long and detailed paper on convolutional nets, graph transformer networks, and discriminative training methods for sequence labeling. It shows how to build systems that integrate segmentation, feature extraction, classification, contextual post-processing, and language modeling into one single learning machine trained end-to-end. Applications to handwriting recognition and face detection are described.

§  [LeCun et al., 1998] Efficient BackProp: all the tricks and the theory behind them to efficiently train neural networks with backpropagation, including how to compute the optimal learning rate, how to back-propagate second derivatives, and other sundries.

§  Yann LeCun's research

§  Yann LeCun's Machine Learning class at NYU

§  The NORB project: a 5-class object recognition system using supervised neural networks.

§  The LAGR project: DARPA's Learning Applied to Ground Robots, using supervised and unsupervised neural network algorithms.

Unsupervised Learning

Semi-supervised Learning

 
