Learning MXNet, Part 2: Module

Overview

This section introduces an important concept in MXNet: the Module. It covers a simple neural network, model construction, parameter training and updating, and model prediction, giving a first look at some of Module's important attributes and methods and an initial feel for how MXNet works. Note that the code here skips the checkpoint/save part of the official tutorial, and some of the explanatory text is not excerpted; if you are interested, see the official tutorial:
http://mxnet.io/tutorials/python/module.html

Main Content

import mxnet as mx
from data_iter import SyntheticData

### Construct the neural network
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1',num_hidden=64)
net = mx.sym.Activation(net, name='relu1', act_type='relu')
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=10)
net = mx.sym.SoftmaxOutput(net, name='softmax')

data = SyntheticData(10, 128)
mx.viz.plot_network(net).view()

(Note: in my local test, the plotted graph did not show softmax_label.)

### A most basic Module needs a Symbol, a context specifying which device to run on, and data_names and label_names used to locate the training data and the corresponding labels. Note that the names matter!
mod = mx.mod.Module(symbol=net, context=mx.cpu(), data_names=['data'], label_names=['softmax_label'])
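
As a quick sanity check (not part of the original tutorial), you can verify that the module picked up the expected names:

print(mod.data_names)    # ['data']
print(mod.output_names)  # ['softmax_output'], derived from the symbol named 'softmax'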

import logging
logging.basicConfig(level=logging.INFO)

batch_size=32
### A single fit() completes all the training; once it returns, the parameters are trained and you can predict. No extra bind or initialization is needed (see the API). 'sgd' and 'acc' select stochastic gradient descent as the optimizer and accuracy as the evaluation metric; num_epoch is the number of passes over the data
mod.fit(data.get_iter(batch_size), eval_data=data.get_iter(batch_size), optimizer_params={'learning_rate':0.1}, optimizer='sgd', eval_metric='acc', num_epoch=5)

### predict returns all the predictions; batch_size=32, num_batches=10, 10 output classes
y = mod.predict(data.get_iter(batch_size))
print('shape of predict: %s' % (y.shape,))

'shape of predict: (320L, 10L)'
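
By default predict concatenates the per-batch outputs along the batch axis, hence (320, 10) here: 10 batches of 32 predictions over 10 classes. As a side note (not in the original tutorial), passing merge_batches=False returns the raw per-batch outputs instead:

outs = mod.predict(data.get_iter(batch_size), merge_batches=False)
print(len(outs))         # 10: one entry per batch
print(outs[0][0].shape)  # (32, 10): class probabilities for the first batch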

# For iter_predict, the official API docs explain:
#   pred is a list of outputs from the module
#   i_batch is an integer
#   batch is the data batch from the data iterator

### preds[0] is a 2-D array: the first dimension is batch_size (the number of predictions), and the second holds the predicted values, i.e. the probability of each of the classes 0-9, so the shape is (batch_size, 10)
for preds,i_batch,batch in mod.iter_predict(data.get_iter(batch_size)):
    pred_label = preds[0].asnumpy().argmax(axis=1)
    label = batch.label[0].asnumpy().astype('int32')
    print('batch %d, accuracy %f' % (i_batch, float(sum(pred_label==label))/len(label)))

batch 0, accuracy 0.062500
batch 1, accuracy 0.156250
batch 2, accuracy 0.187500
batch 3, accuracy 0.000000
batch 4, accuracy 0.062500
batch 5, accuracy 0.000000
batch 6, accuracy 0.062500
batch 7, accuracy 0.093750
batch 8, accuracy 0.062500
batch 9, accuracy 0.062500

### mx.metric.create defines an evaluation metric; 'mse' is mean squared error, 'acc' is accuracy
metric = [mx.metric.create('mse'), mx.metric.create('acc')]
mod.score(data.get_iter(batch_size), metric)
print([metric[0].get(),metric[1].get()])

[('mse', 27.438781929016113), ('accuracy', 0.115625)]

The above covers the most basic training workflow of a Module; next comes the more involved usage.

A Module moves through the following states:

  • Initial state: memory is not yet allocated; the module is not ready for computation.
  • Binded: the shapes of the inputs, outputs, and parameters are known, memory has been allocated, and the module is ready for computation.
  • Parameter initialized: the parameters have been initialized; computing without initialized parameters may give undefined results.
  • Optimizer installed: an optimizer has been installed; once it is, the parameter values are updated according to the optimizer after each gradient computation (forward-backward).

mod = mx.mod.Module(symbol=net)

train_iter = data.get_iter(batch_size)
mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
### When binding manually like this (instead of calling fit), init_params and init_optimizer are required

### Xavier is an initialization scheme (similar in spirit to a scaled Gaussian); not covered in detail here
mod.init_params(initializer=mx.init.Xavier(magnitude=2.))

mod.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate',0.1),))
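
At this point the module has passed through all the states listed above; the state flags (boolean attributes of the module) can be checked directly:

print(mod.binded, mod.params_initialized, mod.optimizer_initialized)  # True True True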

### Create a metric to track training progress; update_metric below accumulates it batch by batch (the metric is for monitoring only; the parameter updates themselves are driven by the gradients)
metric = mx.metric.create('acc')

for batch in train_iter:
    mod.forward(batch, is_train=True)
    mod.update_metric(metric, batch.label)
    mod.backward()
    mod.update()
"""
forward是計算輸出,從前往後直到output,is_train表示是否訓練,或者說參數是否更新,不更新訓練就沒有意義了,默認是None,這時值就取self.for_training,而self.for_training在bind時默認設置爲True了
update_metric更新(累加)評價指標,比如準確率,label來確定是哪一種
backward計算梯度
update更新參數,比如w,b
"""
print(metric.get())

('accuracy', 0.59375)
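
The loop above makes a single pass over the data (one epoch). To train for several epochs, as fit does internally, reset the iterator and the metric between passes; a minimal sketch under the same setup:

for epoch in range(5):
    train_iter.reset()  # rewind the data iterator for the next pass
    metric.reset()      # start the accuracy count fresh each epoch
    for batch in train_iter:
        mod.forward(batch, is_train=True)
        mod.update_metric(metric, batch.label)
        mod.backward()
        mod.update()
    print('epoch %d, %s' % (epoch, metric.get()))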

Besides the operations above, a Module also exposes plenty of useful information. The following involves nothing complicated and is quoted directly from the official docs:

basic names:

  • data_names: list of string indicating the names of the required data.
  • output_names: list of string indicating the names of the outputs.

state information

  • binded: bool, indicating whether the memory buffers needed for
    computation has been allocated.
  • for_training: whether the module is binded for training (if binded).
  • params_initialized: bool, indicating whether the parameters of this modules has been initialized.
  • optimizer_initialized: bool, indicating whether an optimizer is defined and initialized.
  • inputs_need_grad: bool, indicating whether gradients with respect to the input data is needed. Might be useful when implementing composition of modules.

input/output information

  • data_shapes: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelization, the data arrays might not be of the same shape as viewed from the external world.
  • label_shapes: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contains a loss function at the top), or a module is not binded for training.
  • output_shapes: a list of (name, shape) for outputs of the module.

parameters (for modules with parameters)

  • get_params(): return a tuple (arg_params, aux_params). Each of those is a dictionary of name to NDArray mapping. Those NDArray always lives on CPU. The actual parameters used for computing might live on other devices (GPUs), this function will retrieve (a copy of) the latest parameters.
  • get_outputs(): get outputs of the previous forward operation.
  • get_input_grads(): get the gradients with respect to the inputs
    computed in the previous backward operation.

print((mod.data_shapes, mod.label_shapes, mod.output_shapes))
print(mod.get_params())


([DataDesc[data,(32, 128),<type 'numpy.float32'>,NCHW]], [DataDesc[softmax_label,(32,),<type 'numpy.float32'>,NCHW]], [('softmax_output', (32, 10L))])
({'fc2_bias': <NDArray 10 @cpu(0)>, 'fc2_weight': <NDArray 10x64 @cpu(0)>, 'fc1_bias': <NDArray 64 @cpu(0)>, 'fc1_weight': <NDArray 64x128 @cpu(0)>}, {})
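
get_outputs() can be verified the same way: run one forward pass and inspect the result (a sketch reusing train_iter; get_input_grads() would additionally require binding with inputs_need_grad=True):

train_iter.reset()
batch = train_iter.next()            # one batch of synthetic data
mod.forward(batch, is_train=False)   # inference-mode forward pass
print(mod.get_outputs()[0].shape)    # (32, 10): the softmax output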