Theano Tutorial -- Restricted Boltzmann Machines (RBM) (Translation)

Feel free to fork my GitHub: https://github.com/zhaoyu611/DeepLearningTutorialForChinese

I have recently been learning Git, so this is a good chance to put that knowledge into practice. After reading the DeepLearning material I have a general grasp of the theory, but for the Theano code, writing it out myself leaves a much deeper impression. So I re-implemented the code from deeplearning.net; the comments are a translation of the original text plus my own understanding. Anyone interested is welcome to join this effort! Feel free to contact me with questions. Email: [email protected] QQ: 3062984605


Energy-Based Models (EBM)

Energy-based models associate a scalar energy with each configuration of the variables of interest. Learning corresponds to modifying the energy function so that its shape has desirable properties; for example, we would like plausible configurations to have low energy. Energy-based probabilistic models define a probability distribution through the energy function, as follows:

p(x) = \frac{e^{-E(x)}}{Z} \qquad (1)

where the normalizing factor Z is called the partition function:

Z = \sum_x e^{-E(x)}

An energy-based model can be learnt by performing (stochastic) gradient descent on the empirical negative log-likelihood of the training data. As with logistic regression, we first define the log-likelihood and then the loss function as the negative log-likelihood:

\mathcal{L}(\theta, \mathcal{D}) = \frac{1}{N} \sum_{x^{(i)} \in \mathcal{D}} \log p(x^{(i)}), \qquad \ell(\theta, \mathcal{D}) = -\mathcal{L}(\theta, \mathcal{D})

where the stochastic gradient is -\frac{\partial \log p(x^{(i)})}{\partial \theta}, with \theta denoting the parameters of the model.

EBMs with Hidden Units

In many cases of interest, we do not observe the example x fully, or we want to introduce some non-observed variables to increase the expressive power of the model. So we consider an observed part (still denoted x) and a hidden part h, and we can write:
P(x) = \sum_h P(x, h) = \sum_h \frac{e^{-E(x, h)}}{Z} \qquad (2)
To map this formulation onto one similar to Eq. (1), we introduce the notion (inspired from physics) of free energy, defined as follows:

\mathcal{F}(x) = -\log \sum_h e^{-E(x, h)} \qquad (3)
which allows us to write:
P(x) = \frac{e^{-\mathcal{F}(x)}}{Z}, \qquad \text{with } Z = \sum_x e^{-\mathcal{F}(x)}
The data negative log-likelihood gradient then has a particularly interesting form:

-\frac{\partial \log p(x)}{\partial \theta} = \frac{\partial \mathcal{F}(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial \mathcal{F}(\tilde{x})}{\partial \theta} \qquad (4)
Notice that the above gradient contains two terms, referred to as the positive phase and the negative phase. The terms positive and negative do not refer to the sign of each term in the equation, but rather to their effect on the probability density defined by the model. The first term increases the probability of the training data (by reducing the corresponding free energy), while the second term decreases the probability of samples generated by the model.
It is usually difficult to determine this gradient analytically, because it involves the computation of E_P\left[\frac{\partial \mathcal{F}(x)}{\partial \theta}\right], i.e. an expectation over all possible configurations of the input x under the distribution P defined by the model.
The first step in making this computation tractable is to estimate the expectation using a fixed number of model samples. The samples used to estimate the negative phase gradient are referred to as negative particles, denoted \mathcal{N}. The gradient can then be written as:

-\frac{\partial \log p(x)}{\partial \theta} \approx \frac{\partial \mathcal{F}(x)}{\partial \theta} - \frac{1}{|\mathcal{N}|} \sum_{\tilde{x} \in \mathcal{N}} \frac{\partial \mathcal{F}(\tilde{x})}{\partial \theta} \qquad (5)
where we would ideally like the elements \tilde{x} of \mathcal{N} to be sampled according to P (i.e. via Monte Carlo sampling). With the above formula, we almost have a practical, stochastic algorithm for learning an EBM. The only missing ingredient is how to extract these negative particles \tilde{x}.
While the statistical literature abounds with sampling methods, Markov chain Monte Carlo methods are especially well suited for models such as the Restricted Boltzmann Machine (RBM), a specific type of EBM.

Restricted Boltzmann Machines (RBM)

Boltzmann Machines (BMs) are a particular form of log-linear Markov Random Field (MRF), i.e. one for which the energy function is linear in its free parameters. To make them powerful enough to represent complicated distributions (i.e. to go from the limited parametric setting to a non-parametric one), we consider that some of the variables are never observed (they are called hidden). By having more hidden variables (also called hidden units), we can increase the modeling capacity of the Boltzmann Machine (BM). Restricted Boltzmann Machines further restrict BMs to those without visible-visible and hidden-hidden connections. A graphical depiction of an RBM is shown below:
[Figure: graphical depiction of an RBM, with a layer of hidden units fully connected to a layer of visible units and no connections within a layer]
The energy function E(v, h) of an RBM is defined as:
E(v, h) = -b'v - c'h - h'Wv \qquad (6)
where W represents the weights connecting the hidden and visible units, and b and c are the biases of the visible and hidden layers, respectively.
This translates directly into the following free energy formula:
\mathcal{F}(v) = -b'v - \sum_i \log\left(1 + e^{(c_i + W_i v)}\right)
Because of the specific structure of RBMs, visible and hidden units are conditionally independent given one another. Using this property, we can write:
p(h|v) = \prod_i p(h_i|v), \qquad p(v|h) = \prod_j p(v_j|h)

RBMs with Binary Units

In the commonly studied case of binary units (where v_j and h_i \in \{0, 1\}), we obtain from Eq. (6) and (2) a probabilistic version of the usual neuron activation function:
P(h_i = 1 | v) = \mathrm{sigm}(c_i + W_i v) \qquad (7)
P(v_j = 1 | h) = \mathrm{sigm}(b_j + W'_j h) \qquad (8)
The free energy of an RBM with binary units further simplifies to:
\mathcal{F}(v) = -b'v - \sum_i \log\left(1 + e^{(c_i + W_i v)}\right) \qquad (9)
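
For concreteness, here is a minimal NumPy sketch of Eqs. (7)-(9) (an illustration only, not part of the tutorial code; the names W, b, c and the shape convention -- W of shape (n_visible, n_hidden), b the visible bias, c the hidden bias -- are assumptions chosen to match the Theano code further below):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, c):
    """P(h = 1 | v) = sigm(c + v W)   -- Eq. (7)"""
    return sigmoid(v @ W + c)

def p_v_given_h(h, W, b):
    """P(v = 1 | h) = sigm(b + h W')  -- Eq. (8)"""
    return sigmoid(h @ W.T + b)

def free_energy(v, W, b, c):
    """F(v) = -b'v - sum_i log(1 + exp(c_i + W_i v))  -- Eq. (9)"""
    wx_b = v @ W + c                                      # pre-sigmoid hidden activations
    hidden_term = np.logaddexp(0.0, wx_b).sum(axis=-1)    # numerically stable softplus
    return -hidden_term - v @ b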

Update Equations with Binary Units

Combining Eqs. (5) and (9), we obtain the following log-likelihood gradients for an RBM with binary units:

-\frac{\partial \log p(v)}{\partial W_{ij}} = E_v\left[p(h_i|v) \cdot v_j\right] - v_j^{(i)} \cdot \mathrm{sigm}(W_i \cdot v^{(i)} + c_i)
-\frac{\partial \log p(v)}{\partial c_i} = E_v\left[p(h_i|v)\right] - \mathrm{sigm}(W_i \cdot v^{(i)} + c_i)
-\frac{\partial \log p(v)}{\partial b_j} = E_v\left[p(v_j|h)\right] - v_j^{(i)} \qquad (10)
For a more detailed derivation of these equations, we refer the reader to the derivation page linked from the original tutorial, or to section 5 of Learning Deep Architectures for AI. We will not use these formulas here; instead, we obtain the gradients with Theano's T.grad applied to Eq. (4).

Sampling in an RBM

Samples of p(x) can be obtained by running a Markov chain to convergence, using Gibbs sampling as the transition operator.
Gibbs sampling of the joint of N random variables S = (S_1, \ldots, S_N) is done through a sequence of N sampling sub-steps of the form S_i \sim p(S_i | S_{-i}), where S_{-i} contains the other N-1 random variables in S, excluding S_i.
For RBMs, S consists of the set of visible and hidden units. However, since they are conditionally independent, one can perform block Gibbs sampling. In this setting, the visible units are sampled simultaneously given fixed values of the hidden units; similarly, the hidden units are sampled simultaneously given the visible units. A step in the Markov chain is thus taken as follows:
h^{(n+1)} \sim \mathrm{sigm}(W v^{(n)} + c), \qquad v^{(n+1)} \sim \mathrm{sigm}(W' h^{(n+1)} + b)
where h^{(n)} refers to the set of all hidden units at the n-th step of the Markov chain. What this means is, for example, that h_i^{(n+1)} is randomly chosen to be 1 (versus 0) with probability \mathrm{sigm}(W_i v^{(n)} + c_i), and similarly, v_j^{(n+1)} is randomly chosen to be 1 (versus 0) with probability \mathrm{sigm}(W'_j h^{(n+1)} + b_j).
This can be illustrated graphically:
[Figure: the Markov chain v^{(0)} \rightarrow h^{(0)} \rightarrow v^{(1)} \rightarrow h^{(1)} \rightarrow \ldots obtained by alternating block Gibbs steps]
As t \rightarrow \infty, samples (v^{(t)}, h^{(t)}) are guaranteed to be accurate samples of p(v, h).
In theory, each parameter update in the learning process would require running such a chain to convergence. Needless to say, doing so would be prohibitively expensive. As such, several algorithms have been devised for RBMs in order to efficiently sample from p(v, h) during the learning process.
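
To make the block Gibbs step concrete, here is a minimal NumPy sketch (again an illustration only, not part of the tutorial code; it reuses the hypothetical p_h_given_v and p_v_given_h helpers from the sketch above):

import numpy as np

rng = np.random.default_rng(1234)

def gibbs_vhv(v, W, b, c):
    """One block Gibbs step v -> h -> v' for binary units."""
    h_mean = p_h_given_v(v, W, c)                                     # Eq. (7)
    h_sample = (rng.random(h_mean.shape) < h_mean).astype(v.dtype)
    v_mean = p_v_given_h(h_sample, W, b)                              # Eq. (8)
    v_sample = (rng.random(v_mean.shape) < v_mean).astype(v.dtype)
    return h_sample, v_mean, v_sample

The Theano implementation of the same step appears later in the RBM class as the gibbs_vhv and gibbs_hvh methods.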

Contrastive Divergence (CD-k)

Contrastive Divergence uses two tricks to speed up the sampling process (see the sketch after this list):

  • Since we eventually want p(v) \approx p_{train}(v) (the true, underlying distribution of the data), we initialize the Markov chain with a training example (i.e. from a distribution that is expected to be close to p, so that the chain is already close to having converged to its final distribution p).
  • CD does not wait for the chain to converge. Samples are obtained after only k steps of Gibbs sampling. In practice, k = 1 has been shown to work surprisingly well.
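
As a rough illustration of how these two tricks turn into an update rule, the following NumPy sketch performs one CD-k update on a mini-batch. It is not part of the tutorial code; it simply combines the hypothetical helpers sketched above with the gradients of Eq. (10), whereas the actual Theano implementation appears later in get_cost_updates:

def cd_k_update(v_data, W, b, c, lr=0.1, k=1):
    """One CD-k update on a mini-batch of binary visible vectors v_data."""
    # Positive phase: hidden probabilities driven by the training data.
    ph_data = p_h_given_v(v_data, W, c)

    # Negative phase: start the chain at the data and run k Gibbs steps.
    v_neg = v_data
    for _ in range(k):
        _, _, v_neg = gibbs_vhv(v_neg, W, b, c)
    ph_neg = p_h_given_v(v_neg, W, c)

    # Gradient estimates (cf. Eq. (10)), averaged over the mini-batch.
    n = v_data.shape[0]
    W += lr * (v_data.T @ ph_data - v_neg.T @ ph_neg) / n
    b += lr * (v_data - v_neg).mean(axis=0)
    c += lr * (ph_data - ph_neg).mean(axis=0)
    return W, b, c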

Persistent CD

Persistent CD [Tieleman08] uses another approximation for sampling from p(v, h). It relies on a single Markov chain which has a persistent state (i.e. the chain is not restarted for each observed example). For each parameter update, we extract new samples by simply running the chain for k steps. The state of the chain is then preserved for subsequent updates.
The general intuition is that if the parameter updates are small enough compared to the mixing rate of the chain, the Markov chain should be able to "catch up" to changes in the model.

Implementation

We construct an RBM class. The parameters of the network can either be initialized by the class constructor or passed in as arguments. This option is useful when an RBM is used as a building block of a deep network, in which case the weight matrix and the hidden layer bias are shared with the corresponding sigmoid layer of an MLP network.

# Imports needed by the snippets below (omitted in the original post).
import numpy

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams


class RBM(object):
    """Restricted Boltzmann Machine (RBM)  """
    def __init__(
        self,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        hbias=None,
        vbias=None,
        numpy_rng=None,
        theano_rng=None
    ):
        """
        RBM constructor. Defines the parameters of the model along with
        basic operations for inferring hidden from visible (and vice-versa),
        as well as for performing CD updates.

        :param input: None for standalone RBMs or symbolic variable if RBM is
        part of a larger graph.

        :param n_visible: number of visible units

        :param n_hidden: number of hidden units

        :param W: None for standalone RBMs or symbolic variable pointing to a
        shared weight matrix in case RBM is part of a DBN network; in a DBN,
        the weights are shared between RBMs and layers of a MLP

        :param hbias: None for standalone RBMs or symbolic variable pointing
        to a shared hidden units bias vector in case RBM is part of a
        different network

        :param vbias: None for standalone RBMs or a symbolic variable
        pointing to a shared visible units bias
        """

        self.n_visible = n_visible
        self.n_hidden = n_hidden

        if numpy_rng is None:
            # create a number generator
            numpy_rng = numpy.random.RandomState(1234)

        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        if W is None:
            # W is initialized with `initial_W` which is uniformly
            # sampled from -4*sqrt(6./(n_visible+n_hidden)) and
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype theano.config.floatX so
            # that the code is runnable on GPU
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            # theano shared variables for weights and biases
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if hbias is None:
            # create shared variable for hidden units bias
            hbias = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='hbias',
                borrow=True
            )

        if vbias is None:
            # create shared variable for visible units bias
            vbias = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                name='vbias',
                borrow=True
            )

        # initialize input layer for standalone RBM or layer0 of DBN
        self.input = input
        if not input:
            self.input = T.matrix('input')

        self.W = W
        self.hbias = hbias
        self.vbias = vbias
        self.theano_rng = theano_rng
        # **** WARNING: It is not a good idea to put things in this list
        # other than shared variables created in this function.
        self.params = [self.W, self.hbias, self.vbias]

The next step is to define functions which construct the symbolic graph associated with Eqs. (7)-(8). The code is as follows:

    def propup(self, vis):
        '''This function propagates the visible units activation upwards to
        the hidden units

        Note that we return also the pre-sigmoid activation of the
        layer. As it will turn out later, due to how Theano deals with
        optimizations, this symbolic variable will be needed to write
        down a more stable computational graph (see details in the
        reconstruction cost function)

        '''
        pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]
    def sample_h_given_v(self, v0_sample):
        ''' This function infers state of hidden units given visible units '''
        # compute the activation of the hidden units given a sample of
        # the visibles
        pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
        # get a sample of the hiddens given their activation
        # Note that theano_rng.binomial returns a symbolic sample of dtype
        # int64 by default. If we want to keep our computations in floatX
        # for the GPU we need to specify to return the dtype floatX
        h1_sample = self.theano_rng.binomial(size=h1_mean.shape,
                                             n=1, p=h1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_h1, h1_mean, h1_sample]
    def propdown(self, hid):
        '''This function propagates the hidden units activation downwards to
        the visible units

        Note that we return also the pre_sigmoid_activation of the
        layer. As it will turn out later, due to how Theano deals with
        optimizations, this symbolic variable will be needed to write
        down a more stable computational graph (see details in the
        reconstruction cost function)

        '''
        pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]
    def sample_v_given_h(self, h0_sample):
        ''' This function infers state of visible units given hidden units '''
        # compute the activation of the visible given the hidden sample
        pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
        # get a sample of the visible given their activation
        # Note that theano_rng.binomial returns a symbolic sample of dtype
        # int64 by default. If we want to keep our computations in floatX
        # for the GPU we need to specify to return the dtype floatX
        v1_sample = self.theano_rng.binomial(size=v1_mean.shape,
                                             n=1, p=v1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_v1, v1_mean, v1_sample]

We can then use these functions to define the symbolic graph for a Gibbs sampling step. We define two functions:

  • gibbs_vhv performs a step of Gibbs sampling starting from the visible units. As we shall see, this will be useful for sampling from the RBM.
  • gibbs_hvh performs a step of Gibbs sampling starting from the hidden units. This function will be useful for performing CD and PCD updates.
    The code is as follows:
    def gibbs_hvh(self, h0_sample):
        ''' This function implements one step of Gibbs sampling,
            starting from the hidden state'''
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h0_sample)
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v1_sample)
        return [pre_sigmoid_v1, v1_mean, v1_sample,
                pre_sigmoid_h1, h1_mean, h1_sample]
    def gibbs_vhv(self, v0_sample):
        ''' This function implements one step of Gibbs sampling,
            starting from the visible state'''
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v0_sample)
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h1_sample)
        return [pre_sigmoid_h1, h1_mean, h1_sample,
                pre_sigmoid_v1, v1_mean, v1_sample]

Note that these functions also return the pre-sigmoid activation. To understand why, you need to know a bit about how Theano works. Whenever you compile a Theano function, the computational graph you pass as input gets optimized for speed and stability, by replacing several parts of the subgraph with others. One such optimization expresses terms of the form log(sigmoid(x)) in terms of softplus. We need this optimization for the cross-entropy, since the sigmoid of numbers larger than 30 (or even less) saturates to 1, and numbers smaller than -30 saturate to 0, in which case Theano ends up computing log(0) and the cost becomes -inf or NaN. In the softplus form, log(sigmoid(x)) stays well behaved. However, we hit a special case here: the sigmoid is applied inside the scan op, while the log is outside of it. Therefore Theano only sees log(scan(...)) instead of log(sigmoid(...)) and cannot apply the optimization. We also cannot simply replace the sigmoid inside scan with something else, because it is only needed at the last step. The easiest and most efficient solution is to output the pre-sigmoid activation from scan as well, and to apply both the log and the sigmoid outside of scan, where Theano can catch and optimize the expression.
The RBM class also has a free_energy method, needed for computing the gradient of the parameters (see Eq. (4)). Note that it is computed from the same pre-sigmoid activation wx_b used above.

    def free_energy(self, v_sample):
        ''' Function to compute the free energy '''
        wx_b = T.dot(v_sample, self.W) + self.hbias
        vbias_term = T.dot(v_sample, self.vbias)
        hidden_term = T.sum(T.log(1 + T.exp(wx_b)), axis=1)
        return -hidden_term - vbias_term

We then add a get_cost_updates method, whose purpose is to generate the symbolic gradients for the CD-k and PCD-k updates.

    def get_cost_updates(self, lr=0.1, persistent=None, k=1):
        """This functions implements one step of CD-k or PCD-k

        :param lr: learning rate used to train the RBM

        :param persistent: None for CD. For PCD, shared variable
            containing old state of Gibbs chain. This must be a shared
            variable of size (batch size, number of hidden units).

        :param k: number of Gibbs steps to do in CD-k/PCD-k

        Returns a proxy for the cost and the updates dictionary. The
        dictionary contains the update rules for weights and biases but
        also an update of the shared variable used to store the persistent
        chain, if one is used.

        """

        # compute positive phase
        pre_sigmoid_ph, ph_mean, ph_sample = self.sample_h_given_v(self.input)

        # decide how to initialize persistent chain:
        # for CD, we use the newly generate hidden sample
        # for PCD, we initialize from the old state of the chain
        if persistent is None:
            chain_start = ph_sample
        else:
            chain_start = persistent

Notice that get_cost_updates takes as argument a variable called persistent. This allows us to use the same code to implement both CD and PCD. To use PCD, persistent should refer to a shared variable which contains the state of the Gibbs chain from the previous iteration.
If persistent is None, we initialize the Gibbs chain with the hidden sample generated during the positive phase, hence implementing CD. Once we have established the starting point of the chain, we can compute the sample at the end of the Gibbs chain, which we need for getting the gradient (see Eq. (4)). To do so, we use the scan op provided by Theano; we urge the reader to consult its documentation (the link appears in the code comments below).

        # perform actual negative phase
        # in order to implement CD-k/PCD-k we need to scan over the
        # function that implements one gibbs step k times.
        # Read Theano tutorial on scan for more information :
        # http://deeplearning.net/software/theano/library/scan.html
        # the scan will return the entire Gibbs chain
        (
            [
                pre_sigmoid_nvs,
                nv_means,
                nv_samples,
                pre_sigmoid_nhs,
                nh_means,
                nh_samples
            ],
            updates
        ) = theano.scan(
            self.gibbs_hvh,
            # the None are place holders, saying that
            # chain_start is the initial state corresponding to the
            # 6th output
            outputs_info=[None, None, None, None, None, chain_start],
            n_steps=k,
            name="gibbs_hvh"
        )

Once we have generated the chain, we take the sample at the end of it to compute the free energy of the negative phase. Note that chain_end is a symbolic Theano variable expressed in terms of the model parameters, so if we simply applied T.grad, the function would try to propagate the gradient through the Gibbs chain as well. This is not what we want (it would mess up the gradient), so we pass chain_end to the consider_constant argument of T.grad, which treats chain_end as a constant.

        # determine gradients on RBM parameters
        # note that we only need the sample at the end of the chain
        chain_end = nv_samples[-1]

        cost = T.mean(self.free_energy(self.input)) - T.mean(
            self.free_energy(chain_end))
        # We must not compute the gradient through the gibbs sampling
        gparams = T.grad(cost, self.params, consider_constant=[chain_end])

Finally, we add to the updates dictionary returned by scan (which already contains the update rules for the random states of theano_rng) the parameter updates. In the case of PCD, the dictionary must also update the shared variable containing the state of the Gibbs chain.

        # constructs the update dictionary
        for gparam, param in zip(gparams, self.params):
            # make sure that the learning rate is of the right dtype
            updates[param] = param - gparam * T.cast(
                lr,
                dtype=theano.config.floatX
            )
        if persistent:
            # Note that this works only if persistent is a shared variable
            updates[persistent] = nh_samples[-1]
            # pseudo-likelihood is a better proxy for PCD
            monitoring_cost = self.get_pseudo_likelihood_cost(updates)
        else:
            # reconstruction cross-entropy is a better proxy for CD
            monitoring_cost = self.get_reconstruction_cost(updates,
                                                           pre_sigmoid_nvs[-1])

        return monitoring_cost, updates
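
The get_reconstruction_cost method referenced above is not listed in this post. A sketch of what it might look like, applying both the log and the sigmoid outside of scan so that Theano can rewrite log(sigmoid(x)) into the stable softplus form (treat this as an assumption rather than the verbatim tutorial code), is:

    def get_reconstruction_cost(self, updates, pre_sigmoid_nv):
        """Approximation to the reconstruction error (cross-entropy between
        the input and its reconstruction), computed from the pre-sigmoid
        activation returned by scan so the log(sigmoid) optimization applies."""
        cross_entropy = T.mean(
            T.sum(
                self.input * T.log(T.nnet.sigmoid(pre_sigmoid_nv)) +
                (1 - self.input) * T.log(1 - T.nnet.sigmoid(pre_sigmoid_nv)),
                axis=1
            )
        )
        return cross_entropy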

Tracking Progress

RBMs are particularly tricky to train. Because of the partition function Z of Eq. (1), we cannot estimate the log-likelihood log(P(x)) during training, so we have no direct metric on which to choose the optimal hyperparameters. The following sections describe some indirect ways of monitoring training.

Inspecting the Negative Samples

The negative samples obtained during training can be visualized. As training progresses, the model defined by the RBM becomes closer to the true underlying distribution p_{train}(x), so the negative samples should increasingly look like samples from the training set. Obviously bad hyperparameters can be discarded in this fashion.

Visual Inspection of the Filters

The filters learnt by the model can also be visualized. This amounts to plotting the weights of each hidden unit as a gray-scale image (after reshaping them into a square matrix). The filters should pick out strong features in the data. While it is not obvious a priori what these features should look like, training on MNIST usually results in filters that act as stroke detectors, while training on natural images leads to Gabor-like filters when combined with a sparsity criterion.

Proxies for the Likelihood

Other, more tractable functions can be used as proxies for the likelihood. When training an RBM with PCD, one can use the pseudo-likelihood instead. The pseudo-likelihood (PL) is much cheaper to compute, as it assumes that all bits are independent given the others. Hence:
PL(x) = \prod_i P(x_i | x_{-i}), \qquad \log PL(x) = \sum_i \log P(x_i | x_{-i})
Here x_{-i} denotes the set of all bits of x except bit i. The log-PL is therefore the sum of the log-probabilities of each bit x_i, conditioned on the state of all the other bits. For MNIST this would involve summing over the 784 input dimensions, which remains rather expensive. For this reason, we use the following stochastic approximation to log-PL:
g = N \cdot \log P(x_i | x_{-i}), \quad \text{where } i \sim U(0, N), \qquad E_i[g] = \log PL(x)
where the expectation is taken over the uniform random choice of index i, and N is the number of visible units. To deal with binary units, we further introduce the notation \tilde{x}_i to refer to x with bit i flipped (1 -> 0, 0 -> 1). The log-PL of an RBM with binary units can then be written as:

\log PL(x) \approx N \cdot \log \frac{e^{-FE(x)}}{e^{-FE(x)} + e^{-FE(\tilde{x}_i)}} = N \cdot \log\left[\mathrm{sigm}\left(FE(\tilde{x}_i) - FE(x)\right)\right]

We return this cost along with the updates computed in the get_cost_updates function of the RBM class. Notice that we also add to the updates dictionary an increment of the index i, so that from one update to the next, i cycles over {0, 1, ..., N} and all the bits of the input are eventually considered.
Note that for CD training, the cross-entropy cost between the input and the reconstruction (the same one used for the denoising autoencoder) is a more reliable proxy than the pseudo-log-likelihood. The code for computing the pseudo-likelihood is given below:

    def get_pseudo_likelihood_cost(self, updates):
        """Stochastic approximation to the pseudo-likelihood"""

        # index of bit i in expression p(x_i | x_{\i})
        bit_i_idx = theano.shared(value=0, name='bit_i_idx')

        # binarize the input image by rounding to nearest integer
        xi = T.round(self.input)

        # calculate free energy for the given bit configuration
        fe_xi = self.free_energy(xi)

        # flip bit x_i of matrix xi and preserve all other bits x_{\i}
        # Equivalent to xi[:,bit_i_idx] = 1-xi[:, bit_i_idx], but assigns
        # the result to xi_flip, instead of working in place on xi.
        xi_flip = T.set_subtensor(xi[:, bit_i_idx], 1 - xi[:, bit_i_idx])

        # calculate free energy with bit flipped
        fe_xi_flip = self.free_energy(xi_flip)

        # equivalent to e^(-FE(x_i)) / (e^(-FE(x_i)) + e^(-FE(x_{\i})))
        cost = T.mean(self.n_visible * T.log(T.nnet.sigmoid(fe_xi_flip -
                                                            fe_xi)))

        # increment bit_i_idx % number as part of updates
        updates[bit_i_idx] = (bit_i_idx + 1) % self.n_visible

        return cost

Main Loop

We now have all the necessary ingredients to start training our network.
Before going over the training loop, the reader should familiarize themselves with the function tile_raster_images (see Plotting Samples and Filters). Since RBMs are generative models, we are interested in sampling from them and visualizing these samples. We also want to visualize the filters (weights) learnt by the RBM, to gain insight into what the RBM is actually doing. Bear in mind, however, that this does not tell the whole story, since we neglect the biases and plot the weights rescaled by a multiplicative constant (weights are converted to values between 0 and 1).
Having these utility functions, we can start training the RBM and plot/save the filters after each training epoch. We train the RBM with PCD, as it has been shown to lead to a better generative model ([Tieleman08]).

    # it is ok for a theano function to have no output
    # the purpose of train_rbm is solely to update the RBM parameters
    train_rbm = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size]
        },
        name='train_rbm'
    )

    plotting_time = 0.
    start_time = timeit.default_timer()

    # go through training epochs
    for epoch in range(training_epochs):

        # go through the training set
        mean_cost = []
        for batch_index in range(n_train_batches):
            mean_cost += [train_rbm(batch_index)]

        print('Training epoch %d, cost is ' % epoch, numpy.mean(mean_cost))

        # Plot filters after each training epoch
        plotting_start = timeit.default_timer()
        # Construct image from the weight matrix
        image = Image.fromarray(
            tile_raster_images(
                X=rbm.W.get_value(borrow=True).T,
                img_shape=(28, 28),
                tile_shape=(10, 10),
                tile_spacing=(1, 1)
            )
        )
        image.save('filters_at_epoch_%i.png' % epoch)
        plotting_stop = timeit.default_timer()
        plotting_time += (plotting_stop - plotting_start)

    end_time = timeit.default_timer()

    pretraining_time = (end_time - start_time) - plotting_time

    print ('Training took %f minutes' % (pretraining_time / 60.))

Once the RBM is trained, we can use the gibbs_vhv function to implement the Gibbs chain required for sampling. We initialize the Gibbs chain from test examples (although we could just as well pick them from the training set) in order to speed up convergence and avoid problems with random initialization. We again use Theano's scan op to perform 1000 Gibbs steps before each plot.

    #################################
    #     Sampling from the RBM     #
    #################################
    # find out the number of test samples
    number_of_test_samples = test_set_x.get_value(borrow=True).shape[0]

    # pick random test examples, with which to initialize the persistent chain
    test_idx = rng.randint(number_of_test_samples - n_chains)
    persistent_vis_chain = theano.shared(
        numpy.asarray(
            test_set_x.get_value(borrow=True)[test_idx:test_idx + n_chains],
            dtype=theano.config.floatX
        )
    )

Next we create the 20 persistent chains in parallel to get our samples. To do so, we compile a Theano function which performs one Gibbs step and updates the state of the persistent chain with the new visible sample. We apply this function iteratively for a large number of steps, plotting the samples every 1000 steps.

    plot_every = 1000
    # define one step of Gibbs sampling (mf = mean-field) define a
    # function that does `plot_every` steps before returning the
    # sample for plotting
    (
        [
            presig_hids,
            hid_mfs,
            hid_samples,
            presig_vis,
            vis_mfs,
            vis_samples
        ],
        updates
    ) = theano.scan(
        rbm.gibbs_vhv,
        outputs_info=[None, None, None, None, None, persistent_vis_chain],
        n_steps=plot_every,
        name="gibbs_vhv"
    )

    # add to updates the shared variable that takes care of our persistent
    # chain :.
    updates.update({persistent_vis_chain: vis_samples[-1]})
    # construct the function that implements our persistent chain.
    # we generate the "mean field" activations for plotting and the actual
    # samples for reinitializing the state of our persistent chain
    sample_fn = theano.function(
        [],
        [
            vis_mfs[-1],
            vis_samples[-1]
        ],
        updates=updates,
        name='sample_fn'
    )

    # create a space to store the image for plotting ( we need to leave
    # room for the tile_spacing as well)
    image_data = numpy.zeros(
        (29 * n_samples + 1, 29 * n_chains - 1),
        dtype='uint8'
    )
    for idx in range(n_samples):
        # generate `plot_every` intermediate samples that we discard,
        # because successive samples in the chain are too correlated
        vis_mf, vis_sample = sample_fn()
        print(' ... plotting sample %d' % idx)
        image_data[29 * idx:29 * idx + 28, :] = tile_raster_images(
            X=vis_mf,
            img_shape=(28, 28),
            tile_shape=(1, n_chains),
            tile_spacing=(1, 1)
        )

    # construct image
    image = Image.fromarray(image_data)
    image.save('samples.png')

Results


We ran the code with PCD-15, a learning rate of 0.1 and a batch size of 20, for 15 training epochs. Training took 122.466 minutes on an Intel Xeon E5430 @ 2.66GHz CPU, with a single-threaded GotoBLAS.
The output was the following:

... loading data
Training epoch 0, cost is  -90.6507246003
Training epoch 1, cost is  -81.235857373
Training epoch 2, cost is  -74.9120966945
Training epoch 3, cost is  -73.0213216101
Training epoch 4, cost is  -68.4098570497
Training epoch 5, cost is  -63.2693021647
Training epoch 6, cost is  -65.99578971
Training epoch 7, cost is  -68.1236650015
Training epoch 8, cost is  -68.3207365087
Training epoch 9, cost is  -64.2949797113
Training epoch 10, cost is  -61.5194867893
Training epoch 11, cost is  -61.6539369402
Training epoch 12, cost is  -63.5465278086
Training epoch 13, cost is  -63.3787093527
Training epoch 14, cost is  -62.755739271
Training took 122.466000 minutes
 ... plotting sample  0
 ... plotting sample  1
 ... plotting sample  2
 ... plotting sample  3
 ... plotting sample  4
 ... plotting sample  5
 ... plotting sample  6
 ... plotting sample  7
 ... plotting sample  8
 ... plotting sample  9

The picture below shows the filters obtained after 15 epochs of training:
[Figure: filters obtained after 15 epochs of training]
Here are the samples generated by the RBM after training. Each row represents a mini-batch of negative particles (samples drawn from independent Gibbs chains), with 1000 steps of Gibbs sampling taken between each row.
