libeblearn tutorial: energy-based learning in C++

By Pierre Sermanet and Yann LeCun (New York University)




The eblearn (energy-based learning) C++ library libeblearn contains machine learning algorithms which can be used for computer vision. The library has a generic and modular architecture, allowing easy prototyping and building of different algorithms (supervised or unsupervised learning) and configurations from basic modules. Those algorithms were used for a variety of applications, including robotics with DARPA's Learning Applied to Ground Robots (LAGR) project.

Contents

·         Energy-Based Learning

·         Modular Architecture

o    Basic Modules

o    Layer Modules

o    Machine Modules

o    Trainable Machine Modules

o    GUI display

·         Supervised Learning

o    1. Build your Dataset

o    2. Build your Neural Network

o    3. Train your Network

o    4. Run your Network

o    Further Reading

·         Unsupervised Learning

·         Semi-supervised Learning

Energy-based learning

What is energy-based learning?
FIXME.

More resources on energy-based models:

§  [video-lecture] Energy-based models & Learning for Invariant Image Recognition: a 4-hour video of a tutorial on energy-based models by Yann LeCun at the Machine Learning Summer School in Chicago in 2005.

§  http://www.cs.nyu.edu/~yann/research/ebm/index.html: tutorials, talks, videos and publications on EBMs.

§  [LeCun et al 2006] A Tutorial on Energy-Based Learning, in Bakir et al. (eds) "Predicting Structured Outputs", MIT Press 2006: a 60-page tutorial on energy-based learning, with an emphasis on structured-output models. The tutorial includes an annotated bibliography of discriminative learning, with a simple view of CRF, maximum-margin Markov nets, and graph transformer networks.

Modular Architecture and Building Blocks

Eblearn was designed to be modular so that any module can be arranged in different ways with other modules to form different layers and different machines. There are 2 main types of modules (declared in EblArch.h):

§  module_1_1: a module with 1 input and 1 output.

§  module_2_1: a module with 2 inputs and 1 output.

Each module derives from one of those two base classes and implements the fprop, bprop, bbprop and forget methods:

§  fprop: the forward-propagation method, which propagates the input(s) through the module to the output. In the rest of this tutorial, a module will usually be described by what its fprop method does.

§  bprop: the backward-propagation method, which propagates the output back through the module to the input(s). This method usually updates the parameters to be learned inside the module, using derivatives for example, and also propagates derivatives to the input of the module so that preceding modules can be back-propagated as well.

§  bbprop: the second-order backward-propagation method. In the case of the LeNet networks for example, it is used to compute second-order derivatives and adapt the learning rate of each parameter accordingly to speed up the learning process.

§  forget: initializes the weights of the module to random values, based on a forget_param_linear object.

Note that the inputs and outputs that modules accept are state_idx objects, temporary buffers containing the results of a module's processing. For example, an fprop call fills the x slot of the output state_idx, whereas a bprop call fills the dx slot of the input state_idx (using the dx slot of the output state_idx).
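The division of work between fprop and bprop, and between the x and dx slots, can be illustrated with a minimal sketch. It does not use the libeblearn classes: toy_state and toy_scale_module are hypothetical stand-ins invented for this example, and the module only accumulates its gradient instead of updating its weight.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a state buffer (not the libeblearn state_idx):
// x holds forward values, dx holds derivatives of the energy w.r.t. x.
struct toy_state {
  std::vector<double> x, dx;
  explicit toy_state(std::size_t n) : x(n, 0.0), dx(n, 0.0) {}
};

// Hypothetical 1-input/1-output module that multiplies its input by a gain w.
struct toy_scale_module {
  double w = 1.0;   // the single trainable parameter
  double dw = 0.0;  // accumulated derivative of the energy w.r.t. w

  // fprop: read in.x and fill out.x.
  void fprop(const toy_state& in, toy_state& out) const {
    assert(in.x.size() == out.x.size());
    for (std::size_t i = 0; i < in.x.size(); ++i) out.x[i] = w * in.x[i];
  }

  // bprop: read out.dx, accumulate dw, and fill in.dx so that the
  // preceding module can be back-propagated in turn.
  void bprop(toy_state& in, const toy_state& out) {
    assert(in.x.size() == out.dx.size());
    for (std::size_t i = 0; i < in.x.size(); ++i) {
      dw += out.dx[i] * in.x[i];
      in.dx[i] = w * out.dx[i];
    }
  }
};
```

With w = 3 and an input of (1, 2), fprop fills the output x slot with (3, 6); if the output dx slot is then set to (1, 1), bprop accumulates dw = 3 and fills the input dx slot with (3, 3).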

Next we describe some modules and show how they are combined with other modules. We first talk about a few basic modules, which are then used to form layers that are in turn used to form machines. Note that all the modules described here (basic modules, layers and machines) derive from module_1_1 or module_2_1, which means that you can write and combine your own modules in your own ways, without being restricted to the arrangements we describe.

Basic module examples

constant addition, linear, convolution and subsampling modules

Those basic modules (found in EblBasic.h) are used in the LeNet architecture to perform the basic operations:

§  addc_module: this module adds a constant to each element of the input and puts the result in the output.

§  linear_module: this module does a linear combination of the input and its internal weights and puts the result in the output.

§  convolution_module_2D: convolves 2D input (dimensions 1 and 2; dimension 0 may have a size greater than 1) using the internal weights as kernels and puts the result in the output.

§  subsampling_module_2D: subsamples 2D input (dimensions 1 and 2) and puts the result in the output.
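As an illustration of what a 2D convolution module computes in its fprop, here is a minimal sketch on a single plane. conv2d_valid is a hypothetical helper, not a libeblearn function, and it makes several assumptions: no padding, a stride of 1, and a kernel applied without flipping (a cross-correlation, as is common in neural-network code).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of libeblearn: "valid" 2D convolution of a
// row-major h x w input with a kh x kw kernel (no padding, stride 1).
// The kernel is applied without flipping, i.e. as a cross-correlation.
std::vector<double> conv2d_valid(const std::vector<double>& in,
                                 std::size_t h, std::size_t w,
                                 const std::vector<double>& k,
                                 std::size_t kh, std::size_t kw) {
  assert(in.size() == h * w && k.size() == kh * kw);
  assert(kh >= 1 && kw >= 1 && kh <= h && kw <= w);
  const std::size_t oh = h - kh + 1, ow = w - kw + 1;  // output size
  std::vector<double> out(oh * ow, 0.0);
  for (std::size_t r = 0; r < oh; ++r)
    for (std::size_t c = 0; c < ow; ++c)
      for (std::size_t i = 0; i < kh; ++i)
        for (std::size_t j = 0; j < kw; ++j)
          out[r * ow + c] += in[(r + i) * w + (c + j)] * k[i * kw + j];
  return out;
}
```

For a 3x3 input holding the values 1 to 9 and a 2x2 kernel of ones, the output is the 2x2 map (12, 16, 24, 28): each output element is the sum of one 2x2 window of the input.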

non-linear modules

These modules (EblNonLinearity.h) perform non-linear operations:

§  tanh_module: applies the hyperbolic tangent function on the input and puts the result in the output.

§  stdsigmoid_module: applies the standard sigmoid function on the input and puts the result in the output.

Layer module examples

These layers (EblLayers.h) are built by stacking the basic modules described previously on top of each other to form more complicated operations:

§  nn_layer_full: a fully-connected layer which performs a linear combination of the input and the internal weights, adds a bias and applies a sigmoid (here the hyperbolic tangent). As always, the result is put in the output. This layer is built by stacking up a linear_module_replicable (see Replicability), an addc_module and a tanh_module.

§  nn_layer_convolution: a convolution layer which performs a 2D convolution on the input, adds a bias and applies a sigmoid, putting the result in the output. This layer is built by stacking up a convolution_module_2D_replicable, an addc_module and a tanh_module.

§  nn_layer_subsampling: a subsampling layer which subsamples the input, adds a bias and applies a sigmoid, putting the result in the output. This layer is built by stacking up a subsampling_module_2D_replicable, an addc_module and a tanh_module.
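The stacking pattern shared by these layers (a weighted operation, then a bias, then a squashing function) can be sketched for the fully-connected case. toy_full_layer is a hypothetical function written for this example; it collapses the three stacked modules into one loop and is not the nn_layer_full implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical fully-connected layer: out = tanh(W * in + b).
// W is stored row-major, one row of in.size() weights per output unit.
std::vector<double> toy_full_layer(const std::vector<double>& W,
                                   const std::vector<double>& b,
                                   const std::vector<double>& in) {
  assert(W.size() == b.size() * in.size());
  std::vector<double> out(b.size());
  for (std::size_t o = 0; o < b.size(); ++o) {
    double sum = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i)
      sum += W[o * in.size() + i] * in[i];  // role of the linear module
    sum += b[o];                            // role of the bias addition
    out[o] = std::tanh(sum);                // role of the tanh module
  }
  return out;
}
```

Keeping the three steps as separate modules, as the library does, means each one only has to implement its own fprop and bprop; the sketch merges them purely for brevity.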

Machine module examples

Just as layers are built by assembling basic modules, machines (EblMachines.h) can be built by assembling layers together, for instance the following machines:

b. Overview of the architecture of the supervised_euclidean_machine, a module_2_1 (2 inputs and 1 output) that combines many module_1_1 (1 input and 1 output) modules and one module_2_1, the euclidean_module. This architecture is useful during the training phase, where the bprop method is called, whereas during the testing phase only the lenet7 block is used and only the fprop method is called. Note that state_idx::x are temporary buffers containing the outputs of fprops in between modules and state_idx::dx contain the outputs of bprops.

§  nn_machine_cscscf: a LeNet type machine which calls in order the following layers: convolution (c), subsampling (s), convolution (c), subsampling (s), convolution (c) and finally a fully-connected layer (f). This machine is parametrized by the size of the input, the sizes of the convolution and subsampling kernels, the size of the fully connected layer output and the number of outputs.

§  lenet5: this machine is a nn_machine_cscscf with a particular configuration. It takes a 32x32 input, then applies 5x5 convolution kernels, 2x2 subsampling kernels, 5x5 convolutions, 2x2 subsamplings, 1x1 convolutions and full connections from a 120-dimensional input to 10 outputs. This specific network is used for the recognition of the 10 handwritten digits (see MNIST demo).

§  lenet7: similarly to lenet5, this machine is a nn_machine_cscscf with a particular configuration. It takes a 96x96 input, then applies 5x5 convolution kernels, 4x4 subsampling kernels, 6x6 convolutions, 3x3 subsamplings, 6x6 convolutions and full connections from a 100-dimensional input to 5 outputs. This network was specifically designed for the NORB demo but can be an inspiration for similar object recognition tasks.

§  lenet7_binocular: this network is almost identical to lenet7 except that it accepts stereoscopic images as input.

a. The lenet7_binocular neural network architecture: there are 2 stereoscopic 96x96 images as input, 5x5 convolutions are performed, then 4x4 subsampling, 6x6 convolutions, 3x3 subsampling, 6x6 convolutions and full connections to the 5 outputs.
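The kernel sizes quoted for lenet7 can be checked with a little arithmetic. The sketch below assumes convolutions without padding and with a stride of 1, and subsampling windows that do not overlap; the function names are hypothetical and only serve this calculation.

```cpp
#include <cassert>

// Output side length of a convolution without padding and with stride 1.
int conv_out(int in, int kernel) {
  assert(kernel >= 1 && in >= kernel);
  return in - kernel + 1;
}

// Output side length of non-overlapping subsampling.
int subs_out(int in, int kernel) {
  assert(kernel >= 1 && in % kernel == 0);
  return in / kernel;
}

// Side length of the feature maps after the c-s-c-s-c stages.
int cscsc_out(int in, int c0, int s0, int c1, int s1, int c2) {
  int n = conv_out(in, c0);
  n = subs_out(n, s0);
  n = conv_out(n, c1);
  n = subs_out(n, s1);
  return conv_out(n, c2);
}
```

Under those assumptions, cscsc_out(96, 5, 4, 6, 3, 6) follows the chain 96 -> 92 -> 23 -> 18 -> 6 -> 1, so a 96x96 input is reduced to 1x1 feature maps before the fully-connected layer.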

Trainable machine modules

The modules described in the previous sections need to be encapsulated in a module_2_1 together with a loss-function module in order to be trained in a supervised manner. For example, to train the nn_machine_cscscf machines, we combine them with a euclidean cost module:

§  supervised_euclidean_machine: a module_2_1 machine containing a module_1_1 (in the NORB demo case a lenet7 machine) and a euclidean_module. This machine thus takes an input, which is propagated through the module_1_1, and a groundtruth label as second input. The output of the module_1_1 is then compared to the groundtruth by the euclidean_module, which takes the squared distance between the output and the groundtruth. During the training phase, the weights are then modified based on the gradient of the error in the back-propagation process.
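The comparison performed by the cost module, and the gradient that starts the back-propagation, can be sketched as follows. The two functions are hypothetical and use the plain squared distance; the actual euclidean_module may scale the energy differently (for instance by 1/2), so the constant factor here is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squared Euclidean distance between the network output and the target.
double euclidean_energy(const std::vector<double>& out,
                        const std::vector<double>& target) {
  assert(out.size() == target.size());
  double e = 0.0;
  for (std::size_t i = 0; i < out.size(); ++i) {
    const double d = out[i] - target[i];
    e += d * d;
  }
  return e;
}

// Gradient of that energy with respect to the output: 2 * (out - target).
// This is what would be written into the dx slot of the network output.
std::vector<double> euclidean_grad(const std::vector<double>& out,
                                   const std::vector<double>& target) {
  assert(out.size() == target.size());
  std::vector<double> g(out.size());
  for (std::size_t i = 0; i < out.size(); ++i) g[i] = 2.0 * (out[i] - target[i]);
  return g;
}
```

For an output of (1, 2) and a target of (0, 0), the energy is 5 and the gradient is (2, 4).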

Module Replicability

Modules usually operate on a specific number of dimensions. For example, the convolution_module_2D only accepts inputs with 3 dimensions (because it applies a 2D convolution on dimensions 1 and 2, while dimension 0 is used according to a connection table). Thus if extra dimensions are present (e.g. 4D or 5D), one might want to loop over the extra dimensions and call the convolution_module_2D on each 3D subset. We call this replicability because the module is replicated over the 3rd and 4th dimensions (the output also has 2 extra dimensions).

To make a module replicable, use the DECLARE_REPLICABLE_MODULE_1_1 macro (in EblArch.h). It will automatically declare your module_1_1 as replicable and loop over extra dimensions if present. For example, here is the code to declare the linear_module as replicable:

DECLARE_REPLICABLE_MODULE_1_1(linear_module_replicable,
                              linear_module,
                              (parameter &p, intg in, intg out),
                              (p, in, out));

where linear_module_replicable is the name of the new module, linear_module is the name of the base module, (parameter &p, intg in, intg out) is the prototype of the parameters to the constructor of the module and (p, in, out) the parameters themselves.
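The idea behind replication, independent of the macro, is simply a loop that applies a base operation written for one slice to every slice along the extra dimensions. The sketch below shows that loop for a single extra dimension; replicate and vec are hypothetical names, and the real macro works on idx tensors rather than on vectors of vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using vec = std::vector<double>;

// Apply a base operation, written for a single slice, to each slice in turn.
// The output has the same extra dimension as the input.
std::vector<vec> replicate(const std::function<vec(const vec&)>& base,
                           const std::vector<vec>& slices) {
  assert(static_cast<bool>(base));  // a base operation must be provided
  std::vector<vec> out;
  out.reserve(slices.size());
  for (const vec& s : slices) out.push_back(base(s));
  return out;
}
```

A base operation that doubles every element, replicated over the slices (1, 2) and (3), produces the slices (2, 4) and (6).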

c. Example display of the eblearn GUI. A call to module_1_1::display_fprop will display all the internal feature maps and their corresponding descriptions and sizes, provided that each module implements its display method. On top are the input images, at the bottom the network outputs.

GUI display

If the Qt library is present on your system, the eblearn project will automatically compile the GUI libraries corresponding to libidx and libeblearn. It thus produces the libeblearngui library, which provides display functions for the eblearn modules. For instance, it implements the display_fprop method for each module_1_1, which makes it possible to display every stage of the internal representations of the neural network by calling that method on the top-level module (see figure c).

For more details about the GUI features, refer to the GUI section in the libidx tutorial. Briefly, libidxgui provides a global object "gui" which can open new windows (gui.new_window()), draw matrices (gui.draw_matrix()), draw text (gui << at(42, 42) << "42" << endl) and offers some other functionalities.

Supervised Learning

Eblearn provides supervised learning algorithms as well as semi-supervised and unsupervised ones. They can be used independently or combined; this section focuses on supervised algorithms only.

1. Build your dataset

What data should you provide to the network?

Creating datasets from image directories

Once you have grouped all your images in different directories for each class, call the dscompile tool to transform them into a dataset object (a matrix of images with their corresponding labels). This tool extracts the label information from the directory structure: at the source level, each directory is a class named after the directory name. All images found in its subdirectories (regardless of their names or hierarchy) are then assigned the same label and added to the dataset.

For example, with the following directory structure, dscompile will automatically build a dataset with 2 different labels, "car" and "human", containing 5 images in total:

/data$ ls -R *
car:
auto.png  car01.png

human:
human1  human2

human/human1:
img01.png

human/human2:
img01.png img02.png

dscompile will create the following files:

1.   dset_data.mat: a ubyte matrix of size N x M x height x width, where N is the number of images and M the number of channels, which can be 1 or 3 if the input images are greyscale or color, or 2 or 6 if two (stereoscopic) images are passed instead of one. The possible channel configurations are G or GG, RGB or RGBRGB.

2.   dset_labels.mat: an int matrix of size Nclasses containing label id numbers, e.g. 0, 1, ...

3.   dset_classes.mat: a ubyte matrix of size Nclasses x 128 containing the names of each class.

For more details, see the tools manuals of dscompile, dssplit, dsmerge, and dsdisplay.
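The labelling rule used when compiling a dataset (the top-level directory gives the class, whatever the depth of the image below it) can be sketched without touching the file system. class_of and label_images are hypothetical helpers, not part of dscompile, and the numbering of classes in alphabetical order is an assumption made for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Class name = first component of a path relative to the dataset root.
std::string class_of(const std::string& relative_path) {
  const std::string::size_type slash = relative_path.find('/');
  assert(slash != std::string::npos && slash > 0);
  return relative_path.substr(0, slash);
}

// Give each class an id (0, 1, ... in alphabetical order, an assumption)
// and return one label id per image path.
std::vector<int> label_images(const std::vector<std::string>& paths,
                              std::map<std::string, int>& class_ids) {
  for (const std::string& p : paths) class_ids[class_of(p)] = 0;
  int next = 0;
  for (auto& kv : class_ids) kv.second = next++;  // std::map iterates sorted
  std::vector<int> labels;
  labels.reserve(paths.size());
  for (const std::string& p : paths) labels.push_back(class_ids.at(class_of(p)));
  return labels;
}
```

Applied to the five image paths of the directory listing above, this yields two classes, "car" (0) and "human" (1), and the labels 0, 0, 1, 1, 1.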

d. Example display of the MNIST dataset after a call to LabeledDataSource::display(), which displays groundtruth (left), correct and incorrect answers (middle) and incorrect only (right).

Loading datasets into a LabeledDataSource object

2. Build your network

The eblearn neural networks are built by combining modules together, as introduced at the beginning of this tutorial. Depending on your application, you may want to write your own modules or combine already existing modules in different ways. For simple applications (see Perceptron demo), you may want just one fully-connected layer in your machine. For more complex tasks, some architectures are already built in, such as the lenet5 and lenet7 architectures. The lenet5 and lenet7 classes derive from the nn_machine_cscscf class and only specify particular sizes for the input, kernels and outputs. As described earlier, the lenet5 architecture is capable of learning handwritten character recognition (10 categories) while lenet7 was used for object recognition (5 categories).

Your application will probably differ from the MNIST or NORB applications, but if your goal is object recognition you may want to reuse the nn_machine_cscscf class and the lenet5 or lenet7 parameters and change only a few of them (see the constructors of lenet5 and lenet7 for more details). The main changes you will have to make are:

§  number of outputs: for MNIST this is equal to 10 and for NORB it is 5. If you attempt to build a system that can recognize 15 different categories, this value will be 15.

§  size of the input: this is 1x32x32 for MNIST, 1x96x96 for NORB and 2x96x96 for the binocular version of NORB. If you have multiple channels (e.g. RGB, YUV or binocular greyscale inputs), you may have to rework the connection table between the input and the first convolution layer.

§  connection tables: if there is no pre-configured network with the same number of input channels as yours, you have to modify the connection table of the first layer. In nn_machine_cscscf, this table is the table0 object in the constructors in EblMachines.cpp. It specifies which input channels are convolved into which feature map. For example in lenet7, it is a full table of size 6 because the only greyscale channel is connected to all 6 feature maps of the first convolution. In lenet7_binocular, by contrast, there are 2 input channels and the table holds 12 connections: a feature map may receive only 1 channel ({0, 0}, {0, 1}, {1, 2}, {1, 3}) or combine both ({0, 4}, {1, 4}, {0, 5}, {1, 5}, {0, 6}, {1, 6}, {0, 7}, {1, 7}). {1, 2} means that channel 1 of the input is processed through the convolution layer and put in feature map 2. Similarly, {0, 7} and {1, 7} mean that both channels of the input are combined into feature map 7 of the first layer.

§  sizes of kernels: depending on the complexity of the task, you may want to increase or decrease the sizes of the kernels (convolution or subsampling kernels), again specified in the constructors of the nn_machine_cscscf machines. Those sizes determine the number of parameters to learn in your network. The more complex the task, the more parameters are necessary.
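To make the connection-table item above more concrete, the pairs it lists for the binocular network can be grouped by feature map. The sketch uses a plain vector of {input channel, feature map} pairs; this representation and the inputs_per_map helper are hypothetical, since the library stores the table in its own matrix type.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using connection = std::pair<int, int>;  // {input channel, feature map}

// Group the input channels that feed each feature map.
std::map<int, std::vector<int>> inputs_per_map(
    const std::vector<connection>& table) {
  std::map<int, std::vector<int>> result;
  for (const connection& c : table) {
    assert(c.first >= 0 && c.second >= 0);
    result[c.second].push_back(c.first);
  }
  return result;
}
```

Fed with the 12 pairs quoted for the binocular network, the grouping shows 8 distinct feature maps: maps 0 to 3 each receive a single channel, while maps 4 to 7 each combine channels 0 and 1.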

Remember that once you have chosen your network parameters and trained your network, you have to reuse the exact same parameters when running it. The trained network is saved in a single parameter file, and to be reused correctly it needs to be loaded into the exact same network it was trained with.

You now have your dataset and a network architecture, so you are ready to train.

3. Train your network

a. Make your network trainable

First you need to make your network trainable. Currently it is a module_1_1 object (as shown in figure b) that is not trainable but only runnable (e.g. lenet7). To make it trainable, you need to encapsulate it in a module_2_1 together with a loss-function module, which computes an error distance to the target label (e.g. the plane label) so that the network knows how much it should correct its weights in order to give the right answer.

For instance, the supervised_euclidean_machine class is a module_2_1 object that takes a module_1_1 in its constructor along with the output targets for each output class. When presented with an input image and a label, it computes the network output via the fprop method and computes the euclidean distance between the network output and the target output. Then, to learn from the errors (or minimize the energies), the bprop method is called and back-propagates the gradients of the errors all the way back through the network.

b. Create a trainer

The training procedure can be handled in the supervised case by the supervised_trainer class. This class takes a trainable machine (a module_2_1) and a LabeledDataSource to train and test on the dataset.

c. Compute the second derivatives (bbprop)

The second derivatives are used to set individual learning rates for each parameter of the network, which helps speed up the learning process and also improves the quality of the learning. The second derivatives are computed once, over say a hundred iterations, before starting the training. They are back-propagated through the bbprop methods.

To compute the second derivatives, call the compute_diaghessian method of the trainer as follows:

thetrainer.compute_diaghessian(train_ds, iterations, 0.02);

where train_ds is your training dataset and iterations is the number of iterations, typically 100.
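How second derivatives translate into individual learning rates can be sketched with the rule described in the Efficient BackProp paper listed under Further Reading: each parameter gets the rate eta / (mu + h_i), where h_i is its estimated diagonal second derivative. The learning_rates function is hypothetical, and treating the third argument of compute_diaghessian (0.02 above) as the constant mu is an assumption, not something this tutorial states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-parameter learning rates: rate_i = eta / (mu + h_i), where h_i is the
// (non-negative) diagonal second-derivative estimate of parameter i.
// mu > 0 keeps the rate bounded when h_i is close to zero.
std::vector<double> learning_rates(const std::vector<double>& h,
                                   double eta, double mu) {
  assert(eta > 0.0 && mu > 0.0);
  std::vector<double> rates(h.size());
  for (std::size_t i = 0; i < h.size(); ++i) {
    assert(h[i] >= 0.0);
    rates[i] = eta / (mu + h[i]);
  }
  return rates;
}
```

Parameters along flat directions of the energy (small h_i) receive large steps, capped at eta / mu, while parameters along strongly curved directions receive small ones.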

d. Train and test

After computing the second derivatives, you can iteratively train and test the network. By testing the results on both the training and the testing sets after each training iteration, you will get a sense of the convergence of the training. Here is an example of training for 100 iterations and displaying the training-set and testing-set results at each step:

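// train for 100 iterations, reporting results on both sets after each one
for (int i = 0; i < 100; ++i) {
  // train on the training set
  thetrainer.train(train_ds, trainmeter, gdp, 1);
  // measure the performance on the training set
  cout << "training: " << flush;
  thetrainer.test(train_ds, trainmeter, infp);
  trainmeter.display();
  // measure the performance on the testing set
  cout << " testing: " << flush;
  thetrainer.test(test_ds, testmeter, infp);
  testmeter.display();
}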

Here is a typical output of what you should see when training your network:

$ ./mnist /d/taf/data/mnist
* MNIST demo: learning handwritten digits using the eblearn C++ library *
Computing second derivatives on MNIST dataset: diaghessian inf: 0.985298 sup: 49.7398
Training network on MNIST with 2000 training samples and 1000 test samples
training: [ 2000] size=2000  energy=0.19  correct=88.80%  errors=11.20% rejects=0.00%
 testing: [ 2000] size=1000  energy=0.163 correct=90.50%  errors=9.50%  rejects=0.00%
training: [ 4000] size=2000  energy=0.1225  correct=93.25%  errors=6.75% rejects=0.00%
 testing: [ 4000] size=1000  energy=0.121 correct=92.80%  errors=7.20%  rejects=0.00%
training: [ 6000] size=2000  energy=0.084  correct=95.45%  errors=4.55% rejects=0.00%
 testing: [ 6000] size=1000  energy=0.098 correct=94.70%  errors=5.30%  rejects=0.00%
training: [ 8000] size=2000  energy=0.065  correct=96.45%  errors=3.55% rejects=0.00%
 testing: [ 8000] size=1000  energy=0.095 correct=95.20%  errors=4.80%  rejects=0.00%
training: [10000] size=2000  energy=0.0545  correct=97.15%  errors=2.85% rejects=0.00%
 testing: [10000] size=1000  energy=0.094 correct=95.80%  errors=4.20%  rejects=0.00%

4. Run your network

e. Output display of the multi-resolution detection provided by the Classifier2D class (top 2 rows; below are the internal network representations). The plane is resized to different resolutions to handle multiple scales, and the output of each class for each resolution is shown on the left. The brighter the output, the stronger the response for the corresponding class. The 3rd row of the network outputs corresponds to the plane category, which has the strongest responses in the top right of the images, yielding the plane bounding box classification on the right.

Multi-resolution detection: Classifier2D

While the Trainer class takes a module_1_1 and trains it on a dataset, the Classifier2D class takes a trained network as input (loading a 'parameter' saved in an Idx file) to detect objects in images of any size and at different resolutions. It resizes the input image to different sizes based on the resolution parameters passed to it and applies the network at each scale. Finally, each value in the outputs of the network that is higher than a certain threshold yields a positive detection at the corresponding position in the image and at that specific scale.

// parameter, network and classifier
parameter theparam(1);
// input to the network will be 96x96 and there are 5 outputs
lenet7_binocular thenet(theparam, 96, 96, 5);
// load the previously saved weights of a trained network
theparam.load_x(mono_net.c_str());
Classifier2D cb(thenet, sz, lbl, 0.0, 0.01, 240, 320);

// find category of image
Idx res = cb.fprop(left.idx_ptr(), 1, 1.8, 60);
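The last step of the detection, turning network outputs into positive detections, can be sketched for one class at one scale. The detection struct and threshold_map function are hypothetical and are not the Classifier2D implementation; mapping the reported positions back to coordinates in the original image, and repeating the process for every scale, is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One positive response in an output map (hypothetical type).
struct detection {
  std::size_t row;
  std::size_t col;
  double score;
};

// Report every position of a row-major rows x cols output map whose value
// is higher than the threshold.
std::vector<detection> threshold_map(const std::vector<double>& map,
                                     std::size_t rows, std::size_t cols,
                                     double threshold) {
  assert(map.size() == rows * cols);
  std::vector<detection> found;
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c) {
      const double v = map[r * cols + c];
      if (v > threshold) found.push_back({r, c, v});
    }
  return found;
}
```

On the 2x2 map (0.1, 0.9, 0.2, 0.95) with a threshold of 0.5, two positions are reported: (0, 1) and (1, 1).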

Further Reading

Here are resources that might be helpful in understanding in more detail how supervised convolutional neural networks work:

§  http://yann.lecun.com/ex/research/index.html: Yann LeCun's research in machine learning, contains many links to published papers about convolutional neural networks and applications.

§  [LeCun et al., 1998] Gradient-Based Learning Applied to Document Recognition (Proc. IEEE 1998): a long and detailed paper on convolutional nets, graph transformer networks, and discriminative training methods for sequence labeling. We show how to build systems that integrate segmentation, feature extraction, classification, contextual post-processing, and language modeling into one single learning machine trained end-to-end. Applications to handwriting recognition and face detection are described.

§  [LeCun et al., 1998] Efficient BackProp: all the tricks and the theory behind them to efficiently train neural networks with backpropagation, including how to compute the optimal learning rate, how to back-propagate second derivatives, and other sundries.

§  Yann LeCun's research

§  Yann LeCun's Machine Learning class at NYU

§  The NORB project: a 5-class object recognition system using supervised neural networks.

§  The LAGR project: DARPA's Learning Applied to Ground Robots using supervised and unsupervised neural network algorithms.

Unsupervised Learning

Semi-supervised Learning

 
