Lecture 11 & 12: Hopfield Nets and Boltzmann Machines (Hinton's course)

Reposted from https://www.cnblogs.com/jesse123/p/7193308.html

Note: some of the slides come from Hinton's course Neural Networks for Machine Learning, specifically the lectures on Hopfield Nets and Boltzmann Machines.

Lecture 11 — Hopfield Nets

Lecture 12 — Boltzmann machine learning

Ref: Energy-Based Models (EBM) and Restricted Boltzmann Machines (RBM)

Fancy, rather abstract models and theory.

 

Lecture 11. Hopfield Nets

Looking at the energy function, one observation stands out:

These look very much like the weights and biases of a neural network.
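For reference, the energy function in question is the standard Hopfield energy over binary states s_i with biases b_i and symmetric weights w_ij (written here in the usual textbook notation rather than copied from the slides):

```latex
E(s) = -\sum_i s_i b_i - \sum_{i<j} s_i s_j w_{ij}
```

The b_i and w_{ij} terms are exactly the biases and weights referred to above.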

 

[Only touched on briefly here.]

 

Lecture 12. Boltzmann machine learning 

From: A Beginner’s Tutorial for Restricted Boltzmann Machines

Frankly, this material is obscure and hard to digest, and with convolutional neural networks dominating, it offers little practical advantage in industry; so why spend so much effort on it?

 

Each visible node takes a low-level feature from an item in the dataset to be learned. For example, from a dataset of grayscale images, each visible node would receive one pixel-value for each pixel in one image. (MNIST images have 784 pixels, so neural nets processing them must have 784 input nodes on the visible layer.)

Now let’s follow that single pixel value, x, through the two-layer net. At node 1 of the hidden layer, x is multiplied by a weight and added to a so-called bias. The result of those two operations is fed into an activation function, which produces the node’s output, or the strength of the signal passing through it, given input x.

activation f((weight w * input x) + bias b ) = output a

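As a quick illustration of that formula, here is a minimal Python (NumPy) sketch using a sigmoid activation; the variable names mirror the formula above and the specific values are placeholders of my own:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation squashes the weighted input into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# A single pixel value x passing through one hidden node.
x = 0.8      # input (e.g. a normalized pixel value)
w = 0.35     # weight on the edge from the visible node to this hidden node
b = -0.1     # bias of the hidden node

a = sigmoid(w * x + b)   # activation f((weight w * input x) + bias b) = output a
print(a)
```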

Next, let’s look at how several inputs would combine at one hidden node. Each x is multiplied by a separate weight, the products are summed, added to a bias, and again the result is passed through an activation function to produce the node’s output.


Because inputs from all visible nodes are being passed to all hidden nodes, an RBM can be defined as a symmetrical bipartite graph. [So far, nothing particularly new compared with a fully connected layer.]

Symmetrical means that each visible node is connected with each hidden node (see below). Bipartite means it has two parts, or layers, and graph is the mathematical term for a web of nodes.

At each hidden node, each input x is multiplied by its respective weight w. That is, a single input x would have three weights here, making 12 weights altogether (4 input nodes x 3 hidden nodes). The weights between two layers will always form a matrix where the rows are equal to the input nodes, and the columns are equal to the output nodes.

Each hidden node receives the four inputs multiplied by their respective weights. The sum of those products is again added to a bias (which forces at least some activations to happen), and the result is passed through the activation algorithm producing one output for each hidden node.

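To make the 4 x 3 weight-matrix picture concrete, here is a hedged NumPy sketch of the full visible-to-hidden pass; the shapes follow the 4 input nodes and 3 hidden nodes mentioned above, while the actual numbers are random placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

v = rng.random(4)                # 4 visible units (one low-level feature each)
W = rng.normal(0, 0.1, (4, 3))   # rows = visible nodes, columns = hidden nodes -> 12 weights
b_h = np.zeros(3)                # one bias per hidden node

# Each hidden node sums its four weighted inputs, adds its bias,
# and pushes the result through the activation function.
h = sigmoid(v @ W + b_h)         # shape (3,): one output per hidden node
print(h)
```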

If these two layers were part of a deeper neural network, the outputs of hidden layer no. 1 would be passed as inputs to hidden layer no. 2, and from there through as many hidden layers as you like until they reach a final classifying layer. (For simple feed-forward movements, the RBM nodes function as an autoencoder and nothing more.)


[Nothing new up to this point.]

 

Reconstructions

But in this introduction to restricted Boltzmann machines, we’ll focus on how they learn to reconstruct data by themselves in an unsupervised fashion (unsupervised means without ground-truth labels in a test set), making several forward and backward passes between the visible layer and hidden layer no. 1 without involving a deeper network.

In the reconstruction phase, the activations of hidden layer no. 1 become the input in a backward pass. They are multiplied by the same weights, one per internode edge, just as x was weight-adjusted on the forward pass. The sum of those products is added to a visible-layer bias at each visible node, and the output of those operations is a reconstruction; i.e. an approximation of the original input. This can be represented by the following diagram:

[Figure: the reconstruction pass, from hidden-layer activations back to the visible layer]
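A minimal sketch of that backward pass, reusing the shapes from the forward-pass sketch above (the transposed weight matrix and the separate visible-layer bias vector are my additions, but they follow the description: same weights, one bias per visible node):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (4, 3))   # same weights as on the forward pass
b_v = np.zeros(4)                # visible-layer biases

h = rng.random(3)                # hidden activations from the forward pass

# Multiply the hidden activations back through the same weights (transposed),
# add the visible biases, and squash: the result r approximates the original input.
r = sigmoid(h @ W.T + b_v)
print(r)                         # reconstruction, shape (4,)
```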

Because the weights of the RBM are randomly initialized, the difference between the reconstructions and the original input is often large. You can think of reconstruction error as the difference between the values of r and the input values, and that error is then backpropagated against the RBM’s weights, again and again, in an iterative learning process until an error minimum is reached.


As you can see, on its forward pass, an RBM uses inputs to make predictions about node activations, or the probability of output a given a weighted input x: p(a|x; w).

But on its backward pass, when activations are fed in and reconstructions, or guesses about the original data, are spit out, an RBM is attempting to estimate the probability of inputs x given activations a, which are weighted with the same coefficients as those used on the forward pass. This second phase can be expressed as p(x|a; w).

Together, those two estimates will lead you to the joint probability distribution of inputs x and activations a, or p(x, a).
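The tutorial does not write these out, but for a binary RBM with visible biases a_i and hidden biases b_j (notation chosen here purely for illustration), the two conditionals and the joint distribution take the standard form:

```latex
p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i \, w_{ij}\Big), \qquad
p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j h_j \, w_{ij}\Big)
```

```latex
p(v, h) \propto e^{-E(v, h)}, \qquad
E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i \, w_{ij} \, h_j
```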

Reconstruction does something different from regression, which estimates a continuous value based on many inputs, and different from classification, which makes guesses about which discrete label to apply to a given input example.

Reconstruction is making guesses about the probability distribution of the original input; i.e. the values of many varied points at once. This is known as generative learning, which must be distinguished from the so-called discriminative learning performed by classification, which maps inputs to labels, effectively drawing lines between groups of data points.

Let’s imagine that both the input data and the reconstructions are normal curves of different shapes, which only partially overlap.

To measure the distance between its estimated probability distribution and the ground-truth distribution of the input, RBMs use Kullback-Leibler divergence. A thorough explanation of the math can be found on Wikipedia.

KL-divergence measures the non-overlapping, or diverging, areas under the two curves, and an RBM’s optimization algorithm attempts to minimize those areas so that the shared weights, when multiplied by activations of hidden layer one, produce a close approximation of the original input. On the left is the probability distribution of a set of original inputs, p, juxtaposed with the reconstructed distribution q; on the right, the integration of their differences.
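As a concrete illustration, here is a small Python sketch of the discrete KL divergence between an input distribution p and a reconstructed distribution q; the two example distributions are made up for illustration only:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); eps guards against log(0).
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

p = np.array([0.1, 0.4, 0.3, 0.2])   # "ground-truth" input distribution
q = np.array([0.2, 0.3, 0.3, 0.2])   # reconstructed distribution
print(kl_divergence(p, q))           # smaller value = better overlap
```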

[How to understand that a Boltzmann machine is a generative model]

 

3. Deep Belief Networks (DBN)

The Deep Belief Network (DBN) is a typical representative of early deep generative models. It is built from multiple layers of neurons, which are divided into visible neurons and hidden neurons: the visible units receive the input, and the hidden units extract features. The connections between the top two layers of the network are undirected and form an associative memory, while the lower layers are linked by directed top-down connections. The bottom layer represents the data vectors, with each neuron corresponding to one dimension of the data vector.

The building block of a DBN is the Restricted Boltzmann Machine (RBM). A single RBM consists of a two-layer network:

  • One layer is the visible layer, made up of visible units, and is used to feed in the training data;
  • The other is the hidden layer, made up of hidden units, which serve as feature detectors.

An RBM is both a generative model and an unsupervised model: it uses latent variables to describe the distribution of the input data, and the process involves no label information. The learning objective of a single RBM is to train the network without supervision so that the distribution p(v) over the visible units v fits, as closely as possible, the true distribution q(v) of the sample space the inputs come from. The RBM weights are updated by computing the gradient of the log-likelihood log p(v) of the visible vector [apparently the visible layer is both the local input and the local output], and this computation involves expectations under the distribution defined by the RBM model.
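The gradient referred to here has the standard form (this is the textbook expression, not quoted from the original post):

```latex
\frac{\partial \log p(v)}{\partial w_{ij}}
  = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}
```

The second expectation is taken under the distribution defined by the RBM itself and is intractable to compute exactly, which is why the Monte Carlo and Gibbs-sampling approximations described next are needed.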

For the hard problems that come up during probabilistic inference in generative models, such as computing the expectation of a function under some distribution or computing marginal probability distributions, Monte Carlo methods can be used to obtain approximate solutions.

The DBN uses the Contrastive Divergence (CD-k) algorithm, which relies on Gibbs sampling to estimate the log-likelihood gradient of the RBM.
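A hedged NumPy sketch of one CD-1 update for a binary RBM; the function and variable names are my own, and a real implementation would run this over mini-batches for many epochs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b_v, b_h, lr=0.01, rng=np.random.default_rng()):
    """One CD-1 step for a binary RBM; v0 has shape (n_visible,)."""
    # Positive phase: sample hidden units from the data.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # One Gibbs step: reconstruct the visible units, then recompute hidden probs.
    pv1 = sigmoid(h0 @ W.T + b_v)
    ph1 = sigmoid(pv1 @ W + b_h)

    # Gradient approximation: <v h>_data - <v h>_reconstruction.
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b_v += lr * (v0 - pv1)
    b_h += lr * (ph0 - ph1)
    return W, b_v, b_h
```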

 

Multiple RBMs are stacked to form a DBN: the activation probabilities of one RBM's hidden units are used as the visible-layer input of the next RBM, and the stack is pre-trained greedily, layer by layer, from the bottom up. A DBN is a generative model; by training the weights between its neurons, we can make the whole network generate the training data with maximum probability.
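A minimal sketch of that greedy layer-wise pre-training loop, reusing the hypothetical cd1_update from the sketch above and assuming each layer's hidden activation probabilities become the next RBM's training data:

```python
import numpy as np

def pretrain_dbn(data, layer_sizes, epochs=10, lr=0.01):
    """Greedy layer-wise pre-training; data has shape (n_samples, layer_sizes[0])."""
    rng = np.random.default_rng(0)
    rbms, layer_input = [], data
    for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0, 0.01, (n_vis, n_hid))
        b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
        for _ in range(epochs):
            for v0 in layer_input:
                W, b_v, b_h = cd1_update(v0, W, b_v, b_h, lr, rng)
        rbms.append((W, b_v, b_h))
        # Hidden activation probabilities feed the next layer as its "visible" data.
        layer_input = 1.0 / (1.0 + np.exp(-(layer_input @ W + b_h)))
    return rbms
```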

To generate a sample, the trained stochastic hidden-unit states are used: first, several rounds of Gibbs sampling are run in the top two layers of the network to draw a sample from their distribution; this sample is then propagated downward to obtain the state of each layer and the final sample.
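A rough sketch of that sampling procedure under the same assumptions as the previous sketches (alternating Gibbs steps in the top-level RBM, then a single stochastic top-down pass through the lower layers):

```python
import numpy as np

def sample_dbn(rbms, n_gibbs=200, rng=np.random.default_rng()):
    """Draw one sample from a DBN given the (W, b_v, b_h) tuples from pretrain_dbn."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Alternating Gibbs sampling in the top-level RBM (the undirected associative memory).
    W, b_v, b_h = rbms[-1]
    v = (rng.random(W.shape[0]) < 0.5).astype(float)
    for _ in range(n_gibbs):
        h = (rng.random(W.shape[1]) < sigmoid(v @ W + b_h)).astype(float)
        v = (rng.random(W.shape[0]) < sigmoid(h @ W.T + b_v)).astype(float)

    # Single stochastic top-down pass through the lower layers.
    sample = v
    for W, b_v, _ in reversed(rbms[:-1]):
        sample = (rng.random(W.shape[0]) < sigmoid(sample @ W.T + b_v)).astype(float)
    return sample
```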

 

 

Training

There appear to be better refinements of the training procedure, as follows:

 
