MXNet Learning (8): Trainer

class mxnet.gluon.Trainer(params, optimizer, optimizer_params=None, kvstore='device', compression_params=None, update_on_kvstore=None)

Reference: http://mxnet.incubator.apache.org/versions/master/api/python/gluon/gluon.html?highlight=trainer#mxnet.gluon.Trainer

This class applies an Optimizer to a set of parameters. Trainer should be used together with autograd.

Parameters

  • params – the dictionary of parameters to optimize; typically obtained via net.collect_params().
  • optimizer – a string (e.g. 'sgd') or an instance of mxnet.optimizer.Optimizer.
  • optimizer_params – a dict of keyword arguments passed to the optimizer, e.g. learning_rate, wd (weight decay), clip_gradient, lr_scheduler. See each optimizer's constructor for more options.
  • kvstore (str or KVStore) – kvstore type for multi-gpu and distributed training. See help on mxnet.kvstore.create for more information.
  • compression_params (dict) – specifies the type of gradient compression and additional arguments depending on the type of compression being used. For example, 2bit compression requires a threshold; the arguments would then be {'type': '2bit', 'threshold': 0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
  • update_on_kvstore (bool, default None) – whether to perform parameter updates on kvstore. If None, the trainer will choose the more suitable option depending on the type of kvstore. If the update_on_kvstore argument is provided, the environment variable MXNET_UPDATE_ON_KVSTORE will be ignored.

Properties

  • learning_rate (float) – The current learning rate of the optimizer. Given an Optimizer object optimizer, its learning rate can be accessed as optimizer.learning_rate.

You can inspect the current learning rate with trainer.learning_rate, as in the sketch below.
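
For example, a minimal sketch of building a Trainer for a small network (the layer sizes and hyperparameters here are arbitrary placeholders):

    # Construct a small network and a Trainer over its parameters.
    from mxnet import gluon
    from mxnet.gluon import nn

    net = nn.Sequential()
    net.add(nn.Dense(64, activation='relu'), nn.Dense(10))
    net.initialize()

    # net.collect_params() returns the ParameterDict to optimize.
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            optimizer_params={'learning_rate': 0.1, 'wd': 1e-4})

    print(trainer.learning_rate)   # 0.1
    trainer.set_learning_rate(0.05)
    print(trainer.learning_rate)   # 0.05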

Methods

  • allreduce_grads(): for each parameter, reduce (sum) the gradients from the different contexts.
  • load_states(fname): loads trainer states (e.g. optimizer, momentum) from a file.
  • save_states(fname): saves trainer states (e.g. optimizer, momentum) to a file.
  • set_learning_rate(lr): sets a new learning rate.
  • step(batch_size, ignore_stale_grad=False): should be called after autograd.backward() and outside of autograd.record(). For ordinary parameter updates, calling step() is enough: internally it calls allreduce_grads() followed by update(). If you need to transform the gradients first, for example for gradient clipping, call allreduce_grads() and update() manually instead. (A training-loop sketch follows the argument descriptions below.)
  • update(batch_size, ignore_stale_grad=False): makes one step of parameter update.

Both step() and update() take the same two arguments:

  • batch_size – batch size of the data processed. Gradients will be normalized by 1/batch_size. Set this to 1 if you normalized the loss manually with loss = mean(loss).
  • ignore_stale_grad (bool, optional, default=False) – if True, ignores Parameters with stale gradients (gradients that have not been updated by backward after the last step) and skips their update.
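
As referenced in the list above, a minimal training-step sketch (it reuses net and trainer from the earlier sketch; the random inputs are just stand-ins for a real batch):

    from mxnet import autograd, nd
    from mxnet.gluon import loss as gloss

    batch_size = 32
    data = nd.random.normal(shape=(batch_size, 100))
    label = nd.random.randint(0, 10, shape=(batch_size,))
    loss_fn = gloss.SoftmaxCrossEntropyLoss()

    with autograd.record():            # record the forward pass
        output = net(data)
        loss = loss_fn(output, label)
    loss.backward()                    # compute gradients outside record()

    # step() normalizes gradients by 1/batch_size and internally calls
    # allreduce_grads() followed by update().
    trainer.step(batch_size)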

class mxnet.optimizer.Optimizer(rescale_grad=1.0, param_idx2name=None, wd=0.0, clip_gradient=None, learning_rate=0.01, lr_scheduler=None, sym=None, begin_num_update=0, multi_precision=False, param_dict=None)

http://mxnet.incubator.apache.org/versions/master/api/python/optimization/optimization.html#mxnet.optimizer.Optimizer

class mxnet.lr_scheduler.LRScheduler(base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear')

Base class of a learning rate scheduler.

A scheduler returns a new learning rate based on the number of updates that have been performed.

Parameters:

  • base_lr (float, optional) – The initial learning rate.
  • warmup_steps (int) – number of warmup steps used before this scheduler starts decay
  • warmup_begin_lr (float) – if using warmup, the learning rate from which it starts warming up
  • warmup_mode (string) – warmup can be done in two modes: 'linear' mode gradually increases lr with each step in equal increments; 'constant' mode keeps lr at warmup_begin_lr for warmup_steps.
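
A concrete schedule is written by subclassing LRScheduler and implementing __call__(num_update), which returns the learning rate for a given update count. A minimal sketch (the HalvingScheduler name and its halving rule are invented for illustration):

    from mxnet import lr_scheduler

    class HalvingScheduler(lr_scheduler.LRScheduler):
        """Hypothetical scheduler: halve the lr every `step` updates."""
        def __init__(self, step, base_lr=0.01):
            super(HalvingScheduler, self).__init__(base_lr=base_lr)
            self.step = step

        def __call__(self, num_update):
            return self.base_lr * 0.5 ** (num_update // self.step)

    sched = HalvingScheduler(step=1000, base_lr=0.1)
    print(sched(0), sched(1000), sched(2000))   # 0.1 0.05 0.025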

class mxnet.lr_scheduler.FactorScheduler(step, factor=1, stop_factor_lr=1e-08, base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear')

Reduce the learning rate by a factor for every n steps.

It returns a new learning rate by:

base_lr * pow(factor, floor(num_update/step))

Parameters:

  • step (int) – Changes the learning rate for every n updates.
  • factor (float, optional) – The factor to change the learning rate.
  • stop_factor_lr (float, optional) – Stop updating the learning rate if it is less than this value.
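
A quick sketch of the resulting decay (the step and factor values are arbitrary):

    from mxnet import lr_scheduler

    # lr = base_lr * factor ** floor(num_update / step), floored at stop_factor_lr
    sched = lr_scheduler.FactorScheduler(step=100, factor=0.9, base_lr=0.1)
    print(sched(1))     # 0.1
    print(sched(101))   # ~0.09   (after the first 100 updates)
    print(sched(201))   # ~0.081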

In addition, the following schedulers are available:

class mxnet.lr_scheduler.MultiFactorScheduler(
                                                step, 
                                                factor=1, 
                                                base_lr=0.01,
                                                warmup_steps=0,
                                                warmup_begin_lr=0,
                                                warmup_mode='linear'
                                            )
                                            
class mxnet.lr_scheduler.PolyScheduler(
                                        max_update,
                                        base_lr=0.01,
                                        pwr=2,
                                        final_lr=0,
                                        warmup_steps=0,
                                        warmup_begin_lr=0,
                                        warmup_mode='linear'
                                      )

class mxnet.lr_scheduler.CosineScheduler(
                                        max_update,
                                        base_lr=0.01,
                                        final_lr=0,
                                        warmup_steps=0,
                                        warmup_begin_lr=0,
                                        warmup_mode='linear'
                                        )                                      
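
Any of these schedulers can be passed to a Trainer through optimizer_params. A minimal sketch with CosineScheduler (the numbers are arbitrary; note that in this API the optimizer's learning_rate is assigned to the scheduler as its base_lr, so the two should agree):

    from mxnet import gluon, lr_scheduler

    # Warm up linearly for 500 updates, then decay along a cosine curve
    # from 0.1 down to 0.001 over the first 5000 updates.
    sched = lr_scheduler.CosineScheduler(max_update=5000, base_lr=0.1,
                                         final_lr=0.001, warmup_steps=500,
                                         warmup_begin_lr=0.0)

    # net as defined in the earlier sketch; the optimizer's learning_rate
    # becomes the scheduler's base_lr, so keep them consistent.
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            optimizer_params={'learning_rate': 0.1,
                                              'lr_scheduler': sched})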