Theano Learning Guide --- Stacked Denoising Autoencoders (SdA) (Translation)

Feel free to fork my GitHub repository: https://github.com/zhaoyu611/DeepLearningTutorialForChinese

I have been learning Git recently, so this is a good opportunity to put that knowledge into practice. Having read through the DeepLearning tutorials, I have a general grasp of the theory, but for the Theano code itself, working through it by hand leaves a much deeper impression.


Note: this section assumes the reader has already read Classifying MNIST digits using Logistic Regression and Multilayer Perceptron. It also uses the following Theano functions and concepts: T.tanh, shared variables, basic arithmetic ops, T.grad, Random numbers, floatX. If you intend to run the code on a GPU, please also read GPU.

Note: the code for this section is available for download at http://deeplearning.net/tutorial/code/SdA.py

The Stacked Denoising Autoencoder (SdA) is an extension of the stacked autoencoder, and it was first introduced by Vincent et al.

This tutorial builds on the previous denoising autoencoder tutorial. If you are not familiar with autoencoders, we recommend reading that chapter first.

Stacked Autoencoders

Denoising autoencoders can be stacked to form a deep network by feeding the output of the layer below as the input of the layer above. Unsupervised pre-training of such an architecture is done one layer at a time: each layer is trained as a denoising autoencoder, with the objective of minimizing the error in reconstructing its input (the output of the layer below). Once the first k layers are trained, we can train the (k+1)-th layer, because we can now compute its input from the layers below.

Once all layers are pre-trained, the network goes through a second stage of training called fine-tuning. Here we use supervised fine-tuning, because we want to minimize the prediction error on a supervised task. A logistic regression layer is first added on top of the network (more precisely, on the output of the last hidden layer). The whole network is then trained the same way a multilayer perceptron is trained; at this point we only consider the encoding part of each autoencoder. This stage is supervised, so the target classes are used during training. (See Multilayer Perceptron for details.)

Using the denoising autoencoder class defined in the previous section, this logic is easy to implement in Theano. A stacked denoising autoencoder can be seen as having two facades: a list of autoencoders and a multilayer perceptron (MLP). During pre-training we use the first facade: the model is treated as a list of autoencoders, and each autoencoder is trained separately. In the second stage of training we use the second facade. The two facades are linked because:

  • the encoding layers of the autoencoders share parameters with the sigmoid layers of the MLP, and
  • the outputs of the MLP's intermediate layers are the inputs of the autoencoders.
import os
import sys
import timeit

import numpy

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

# classes defined in the previous tutorials (logistic_sgd.py, mlp.py, dA.py)
from logistic_sgd import LogisticRegression, load_data
from mlp import HiddenLayer
from dA import dA


class SdA(object):
    """Stacked denoising auto-encoder class (SdA)

    A stacked denoising autoencoder model is obtained by stacking several
    dAs. The hidden layer of the dA at layer `i` becomes the input of
    the dA at layer `i+1`. The first layer dA gets as input the input of
    the SdA, and the hidden layer of the last dA represents the output.
    Note that after pretraining, the SdA is dealt with as a normal MLP,
    the dAs are only used to initialize the weights.
    """

    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        n_ins=784,
        hidden_layers_sizes=[500, 500],
        n_outs=10,
        corruption_levels=[0.1, 0.1]
    ):
        """ This class is made to support a variable number of layers.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: numpy random number generator used to draw initial
                    weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `rng`

        :type n_ins: int
        :param n_ins: dimension of the input to the sdA

        :type hidden_layers_sizes: list of ints
        :param hidden_layers_sizes: intermediate layers size, must contain
                               at least one value

        :type n_outs: int
        :param n_outs: dimension of the output of the network

        :type corruption_levels: list of float
        :param corruption_levels: amount of corruption to use for each
                                  layer
        """

        self.sigmoid_layers = []
        self.dA_layers = []
        self.params = []
        self.n_layers = len(hidden_layers_sizes)

        assert self.n_layers > 0

        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
        # allocate symbolic variables for the data
        self.x = T.matrix('x')  # the data is presented as rasterized images
        self.y = T.ivector('y')  # the labels are presented as 1D vector of
                                 # [int] labels

self.sigmoid_layers stores the sigmoid layers of the MLP facade, while self.dA_layers stores the denoising autoencoder associated with each layer of the MLP.

Next, we construct n_layers sigmoid layers and n_layers denoising autoencoders, where n_layers is the depth of the model. HiddenLayer is the class introduced in Multilayer Perceptron, with the only change that the tanh non-linearity is replaced by the logistic function s(x) = 1/(1 + e^(-x)). The sigmoid layers are linked together to form the MLP, and each denoising autoencoder is constructed so that its encoding part shares the weight matrix and the bias with its corresponding sigmoid layer.

        for i in range(self.n_layers):
            # construct the sigmoidal layer

            # the size of the input is either the number of hidden units of
            # the layer below or the input size if we are on the first layer
            if i == 0:
                input_size = n_ins
            else:
                input_size = hidden_layers_sizes[i - 1]

            # the input to this layer is either the activation of the hidden
            # layer below or the input of the SdA if you are on the first
            # layer
            if i == 0:
                layer_input = self.x
            else:
                layer_input = self.sigmoid_layers[-1].output

            sigmoid_layer = HiddenLayer(rng=numpy_rng,
                                        input=layer_input,
                                        n_in=input_size,
                                        n_out=hidden_layers_sizes[i],
                                        activation=T.nnet.sigmoid)
            # add the layer to our list of layers
            self.sigmoid_layers.append(sigmoid_layer)
            # its arguably a philosophical question...
            # but we are going to only declare that the parameters of the
            # sigmoid_layers are parameters of the StackedDAA
            # the visible biases in the dA are parameters of those
            # dA, but not the SdA
            self.params.extend(sigmoid_layer.params)

            # Construct a denoising autoencoder that shares weights with this
            # layer
            dA_layer = dA(numpy_rng=numpy_rng,
                          theano_rng=theano_rng,
                          input=layer_input,
                          n_visible=input_size,
                          n_hidden=hidden_layers_sizes[i],
                          W=sigmoid_layer.W,
                          bhid=sigmoid_layer.b)
            self.dA_layers.append(dA_layer)

All we need now is to add a logistic regression layer on top of the sigmoid layers, so that we end up with an MLP. We use the LogisticRegression class introduced in Classifying MNIST digits using Logistic Regression for this.

        # We now need to add a logistic layer on top of the MLP
        self.logLayer = LogisticRegression(
            input=self.sigmoid_layers[-1].output,
            n_in=hidden_layers_sizes[-1],
            n_out=n_outs
        )

        self.params.extend(self.logLayer.params)
        # construct a function that implements one step of finetunining

        # compute the cost for second phase of training,
        # defined as the negative log likelihood
        self.finetune_cost = self.logLayer.negative_log_likelihood(self.y)
        # compute the gradients with respect to the model parameters
        # symbolic variable that points to the number of errors made on the
        # minibatch given by self.x and self.y
        self.errors = self.logLayer.errors(self.y)

The SdA class also provides a method that generates the training functions for the denoising autoencoders in its layers. They are returned as a list, where element i is a function that performs one step of training for the dA of layer i.

    def pretraining_functions(self, train_set_x, batch_size):
        ''' Generates a list of functions, each of them implementing one
        step in training the dA corresponding to the layer with same index.
        The function will require as input the minibatch index, and to train
        a dA you just need to iterate, calling the corresponding function on
        all minibatch indexes.

        :type train_set_x: theano.tensor.TensorType
        :param train_set_x: Shared variable that contains all datapoints used
                            for training the dA

        :type batch_size: int
        :param batch_size: size of a [mini]batch

        :type learning_rate: float
        :param learning_rate: learning rate used during training for any of
                              the dA layers
        '''

        # index to a [mini]batch
        index = T.lscalar('index')  # index to a minibatch

To be able to change the corruption level or the learning rate during training, we associate Theano variables with them.

        corruption_level = T.scalar('corruption')  # % of corruption to use
        learning_rate = T.scalar('lr')  # learning rate to use
        # begining of a batch, given `index`
        batch_begin = index * batch_size
        # ending of a batch given `index`
        batch_end = batch_begin + batch_size

        pretrain_fns = []
        for dA in self.dA_layers:
            # get the cost and the updates list
            cost, updates = dA.get_cost_updates(corruption_level,
                                                learning_rate)
            # compile the theano function
            fn = theano.function(
                inputs=[
                    index,
                    theano.In(corruption_level, value=0.2),
                    theano.In(learning_rate, value=0.1)
                ],
                outputs=cost,
                updates=updates,
                givens={
                    self.x: train_set_x[batch_begin: batch_end]
                }
            )
            # append `fn` to the list of functions
            pretrain_fns.append(fn)

        return pretrain_fns

Now any function pretrain_fns[i] takes as arguments index and, optionally, corruption (the corruption level) and lr (the learning rate). Note that these parameter names are the names given to the Theano variables when they were constructed, not the names of the Python variables (learning_rate and corruption_level). Keep this in mind when working with Theano.
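For example, one pre-training step on the first layer can be run as in the following usage sketch (it assumes an sda instance and the shared variable train_set_x are built as elsewhere in this tutorial):

pretraining_fns = sda.pretraining_functions(train_set_x=train_set_x,
                                            batch_size=1)

# one SGD step on minibatch 0 of the first dA layer; the keyword names
# corruption and lr are the Theano variable names, not the Python ones
layer0_cost = pretraining_fns[0](index=0, corruption=0.1, lr=0.001)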

For fine-tuning, we collect the required functions into a single method in the same fashion. It builds three functions: train_fn, valid_score and test_score.

    def build_finetune_functions(self, datasets, batch_size, learning_rate):
        '''Generates a function `train` that implements one step of
        finetuning, a function `validate` that computes the error on
        a batch from the validation set, and a function `test` that
        computes the error on a batch from the testing set

        :type datasets: list of pairs of theano.tensor.TensorType
        :param datasets: It is a list that contains all the datasets;
                         it has to contain three pairs, `train`,
                         `valid`, `test` in this order, where each pair
                         is formed of two Theano variables, one for the
                         datapoints, the other for the labels

        :type batch_size: int
        :param batch_size: size of a minibatch

        :type learning_rate: float
        :param learning_rate: learning rate used during finetune stage
        '''

        (train_set_x, train_set_y) = datasets[0]
        (valid_set_x, valid_set_y) = datasets[1]
        (test_set_x, test_set_y) = datasets[2]

        # compute number of minibatches for training, validation and testing
        n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
        n_valid_batches //= batch_size
        n_test_batches = test_set_x.get_value(borrow=True).shape[0]
        n_test_batches //= batch_size

        index = T.lscalar('index')  # index to a [mini]batch

        # compute the gradients with respect to the model parameters
        gparams = T.grad(self.finetune_cost, self.params)

        # compute list of fine-tuning updates
        updates = [
            (param, param - gparam * learning_rate)
            for param, gparam in zip(self.params, gparams)
        ]

        train_fn = theano.function(
            inputs=[index],
            outputs=self.finetune_cost,
            updates=updates,
            givens={
                self.x: train_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: train_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            },
            name='train'
        )

        test_score_i = theano.function(
            [index],
            self.errors,
            givens={
                self.x: test_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: test_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            },
            name='test'
        )

        valid_score_i = theano.function(
            [index],
            self.errors,
            givens={
                self.x: valid_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: valid_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            },
            name='valid'
        )

        # Create a function that scans the entire validation set
        def valid_score():
            return [valid_score_i(i) for i in range(n_valid_batches)]

        # Create a function that scans the entire test set
        def test_score():
            return [test_score_i(i) for i in range(n_test_batches)]

        return train_fn, valid_score, test_score

Note: valid_score and test_score are not Theano functions but Python functions. They loop over the entire validation set and the entire test set respectively, producing a list of the losses over those sets.
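For instance, the mean error over the whole validation set can be obtained by averaging the returned list, as in this sketch (it assumes datasets and the SdA instance sda are built as above):

train_fn, valid_score, test_score = sda.build_finetune_functions(
    datasets=datasets,
    batch_size=1,
    learning_rate=0.1
)

validation_losses = valid_score()                      # one error value per validation minibatch
this_validation_loss = numpy.mean(validation_losses)
print('validation error %f %%' % (this_validation_loss * 100.))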

Putting It All Together

The code below constructs the stacked denoising autoencoder:

    numpy_rng = numpy.random.RandomState(89677)
    print('... building the model')
    # construct the stacked denoising autoencoder class
    sda = SdA(
        numpy_rng=numpy_rng,
        n_ins=28 * 28,
        hidden_layers_sizes=[1000, 1000, 1000],
        n_outs=10
    )

There are two stages of training for this network: layer-wise pre-training followed by fine-tuning.
During pre-training we loop over all the layers of the network. For each layer we use the compiled Theano function that performs one SGD step towards optimizing the weights so as to reduce the reconstruction cost of that layer. This function is applied to the training set for a fixed number of epochs given by pretraining_epochs.

    print('... getting the pretraining functions')
    pretraining_fns = sda.pretraining_functions(train_set_x=train_set_x,
                                                batch_size=batch_size)

    print('... pre-training the model')
    start_time = timeit.default_timer()
    # Pre-train layer-wise
    corruption_levels = [.1, .2, .3]
    for i in range(sda.n_layers):
        # go through pretraining epochs
        for epoch in range(pretraining_epochs):
            # go through the training set
            c = []
            for batch_index in range(n_train_batches):
                c.append(pretraining_fns[i](index=batch_index,
                         corruption=corruption_levels[i],
                         lr=pretrain_lr))
            print('Pre-training layer %i, epoch %d, cost %f' % (i, epoch, numpy.mean(c)))

    end_time = timeit.default_timer()

    print(('The pretraining code for file ' +
           os.path.split(__file__)[1] +
           ' ran for %.2fm' % ((end_time - start_time) / 60.)), file=sys.stderr)

The fine-tuning loop is very similar to that of the Multilayer Perceptron. The only difference is that it uses the functions given by build_finetune_functions.
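A simplified version of that loop, without the patience-based early stopping used in the full SdA.py script, could look like the following sketch (training_epochs, finetune_lr, batch_size, n_train_batches and datasets are assumed to be defined as in the rest of the script):

train_fn, valid_score, test_score = sda.build_finetune_functions(
    datasets=datasets,
    batch_size=batch_size,
    learning_rate=finetune_lr
)

best_validation_loss = numpy.inf
for epoch in range(training_epochs):
    for minibatch_index in range(n_train_batches):
        train_fn(minibatch_index)                     # one supervised SGD step
    this_validation_loss = numpy.mean(valid_score())  # mean error on the validation set
    print('epoch %i, validation error %f %%' % (epoch, this_validation_loss * 100.))
    if this_validation_loss < best_validation_loss:
        best_validation_loss = this_validation_loss
        test_losses = test_score()                    # check the test error at the new best
        print('  test error of best model %f %%' % (numpy.mean(test_losses) * 100.))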

Running the Code

Run the code with:

python code/SdA.py

With the default parameters, the code runs 15 pre-training epochs for each layer with a batch size of 1. The corruption level is 0.1 for the first layer, 0.2 for the second and 0.3 for the third. The pre-training learning rate is 0.001 and the fine-tuning learning rate is 0.1. Pre-training takes 585.01 minutes, an average of 13 minutes per epoch. Fine-tuning is completed after 36 epochs in 444.2 minutes, an average of 12.34 minutes per epoch. The final validation score is 1.39%, with a test score of 1.3%. These results were obtained on a machine with an Intel Xeon E5430 @ 2.66GHz CPU, with a single-threaded GotoBLAS.

Tips and Tricks

One way to reduce the running time (assuming you have enough memory available) is to compute how the network transforms your data up to layer k-1. Namely, you start by training the first-layer dA. Once it is trained, you compute the hidden-unit values for every datapoint in your dataset and store them as a new dataset, which you then use to train the dA of layer 2 in the same way; afterwards you compute the dataset for layer 3, and so on. You can see that, at this point, the dAs are trained individually and simply provide a non-linear transformation of the input from one layer to the next. Once all dAs are trained, you can start fine-tuning the model.
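A minimal sketch of that trick for the first layer, assuming dA_layer0 is a trained dA object as defined in the denoising autoencoder tutorial (get_hidden_values is its encoding method) and train_set_x is the usual shared training set:

import numpy
import theano
import theano.tensor as T

# compile a function mapping raw inputs to the first layer's hidden code
x = T.matrix('x')
encode_layer0 = theano.function(
    inputs=[x],
    outputs=dA_layer0.get_hidden_values(x)
)

# transform the whole training set once and store it as a new shared dataset,
# which can then be used to pre-train the dA of layer 2
hidden_data = encode_layer0(train_set_x.get_value(borrow=True))
new_train_set_x = theano.shared(
    numpy.asarray(hidden_data, dtype=theano.config.floatX),
    borrow=True
)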

References

[1] http://deeplearning.net/tutorial/SdA.html#sda
