class mxnet.gluon.Trainer(params, optimizer, optimizer_params=None, kvstore='device', compression_params=None, update_on_kvstore=None)
This class applies an Optimizer to a set of Parameters; a Trainer should be used together with autograd.
Parameters
- params – ParameterDict; the set of parameters to optimize, typically obtained via net.collect_params().
- optimizer – str or an mxnet.optimizer.Optimizer instance, e.g. the string 'sgd'.
- optimizer_params – dict of arguments passed to the optimizer constructor, e.g. learning_rate, wd (weight decay), clip_gradient, lr_scheduler; see the optimizer's constructor for the full list.
- kvstore (str or KVStore) – kvstore type for multi-gpu and distributed training. See help on mxnet.kvstore.create for more information.
- compression_params (dict) – Specifies the type of gradient compression and additional arguments depending on the type of compression being used. For example, 2bit compression requires a threshold; the arguments would then be {'type': '2bit', 'threshold': 0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
- update_on_kvstore (bool, default None) – Whether to perform parameter updates on the kvstore. If None, the trainer will choose the more suitable option depending on the type of kvstore. If the update_on_kvstore argument is provided, the environment variable MXNET_UPDATE_ON_KVSTORE will be ignored.
Properties
- learning_rate (float) – The current learning rate of the optimizer. Given an Optimizer object optimizer, its learning rate can be accessed as optimizer.learning_rate; on a Trainer, the current learning rate can be read via trainer.learning_rate.
Methods
- allreduce_grads(): For each parameter, reduce the gradients from the different contexts.
- load_states(fname):Loads trainer states (e.g. optimizer, momentum) from a file.
- save_states(fname):Saves trainer states (e.g. optimizer, momentum) to a file.
- set_learning_rate(lr): Sets a new learning rate.
- step(batch_size, ignore_stale_grad=False): Should be called after autograd.backward() and outside of autograd.record(). For ordinary parameter updates, step() is enough; it internally calls allreduce_grads() followed by update(). If the gradients need some transformation first, such as gradient clipping, call allreduce_grads() and update() manually instead (see the sketch after this list).
- update(batch_size, ignore_stale_grad=False): Makes one step of parameter update.
  - batch_size (int) – Batch size of data processed. The gradient will be normalized by 1/batch_size. Set this to 1 if the loss was normalized manually with loss = mean(loss).
  - ignore_stale_grad (bool, optional, default=False) – If true, ignores Parameters with stale gradients (gradients that have not been updated by backward since the last step) and skips their update.
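A minimal sketch, assuming a toy single-layer net and random data (not from the text above): the ordinary trainer.step() path, and the manual allreduce_grads()/update() path used when gradients need a transform such as clipping. The manual path assumes update_on_kvstore is not forced to True (the default single-device setup).

```python
from mxnet import autograd, gluon, nd

net = gluon.nn.Dense(1)
net.initialize()
loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1, 'wd': 1e-4})

data = nd.random.normal(shape=(8, 4))
label = nd.random.normal(shape=(8, 1))

# Ordinary update: step() calls allreduce_grads() and update() internally.
with autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()
trainer.step(batch_size=data.shape[0])

# Manual path: reduce gradients, clip them, then apply the update.
with autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()
trainer.allreduce_grads()
grads = [p.grad() for p in net.collect_params().values() if p.grad_req != 'null']
gluon.utils.clip_global_norm(grads, max_norm=1.0)
trainer.update(batch_size=data.shape[0])
```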
class mxnet.optimizer.Optimizer(rescale_grad=1.0, param_idx2name=None, wd=0.0, clip_gradient=None, learning_rate=0.01, lr_scheduler=None, sym=None, begin_num_update=0, multi_precision=False, param_dict=None)
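Instead of a string, Trainer also accepts an Optimizer object. A minimal sketch (all hyperparameter values here are made up) of building an mxnet.optimizer.SGD instance with an attached lr_scheduler and handing it to a Trainer:

```python
import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(1)
net.initialize()

schedule = mx.lr_scheduler.FactorScheduler(step=1000, factor=0.9, base_lr=0.05)
opt = mx.optimizer.SGD(learning_rate=0.05, momentum=0.9, wd=1e-4,
                       lr_scheduler=schedule)
trainer = gluon.Trainer(net.collect_params(), opt)

# Equivalent string form: the scheduler rides along in optimizer_params.
# trainer = gluon.Trainer(net.collect_params(), 'sgd',
#                         {'learning_rate': 0.05, 'momentum': 0.9,
#                          'wd': 1e-4, 'lr_scheduler': schedule})

print(trainer.learning_rate)   # reads the learning_rate property shown above
```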
class mxnet.lr_scheduler.LRScheduler(base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear')
Base class of a learning rate scheduler.
A scheduler returns a new learning rate based on the number of updates that have been performed.
Parameters:
- base_lr (float, optional) – The initial learning rate.
- warmup_steps (int) – number of warmup steps used before this scheduler starts decay
- warmup_begin_lr (float) – if using warmup, the learning rate from which it starts warming up
- warmup_mode (string) – warmup can be done in two modes: 'linear' mode gradually increases the lr with each step in equal increments; 'constant' mode keeps the lr at warmup_begin_lr for warmup_steps.
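A scheduler is simply called with the current number of updates and returns the new learning rate. A minimal sketch of a custom subclass (the halving rule and all numbers are assumptions, not from the text):

```python
from mxnet import lr_scheduler

class StepDownScheduler(lr_scheduler.LRScheduler):
    """Hypothetical scheduler: halve the lr every `step` updates."""
    def __init__(self, step, base_lr=0.01):
        super(StepDownScheduler, self).__init__(base_lr=base_lr)
        self.step = step

    def __call__(self, num_update):
        # num_update = number of optimizer updates performed so far
        return self.base_lr * (0.5 ** (num_update // self.step))

sched = StepDownScheduler(step=100, base_lr=0.1)
print([sched(i) for i in (0, 99, 100, 250)])  # [0.1, 0.1, 0.05, 0.025]
```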
class mxnet.lr_scheduler.FactorScheduler(step, factor=1, stop_factor_lr=1e-08, base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear')
Reduce the learning rate by a factor for every n steps.
It returns a new learning rate by:
base_lr * pow(factor, floor(num_update/step))
Parameters:
- step (int) – Changes the learning rate for every n updates.
- factor (float, optional) – The multiplicative factor by which the learning rate is scaled at each change.
- stop_factor_lr (float, optional) – Stop updating the learning rate if it is less than this value.
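A quick numeric check of this rule (the step and factor values are made up). Note that FactorScheduler keeps internal state, so it should be queried with non-decreasing num_update values:

```python
from mxnet import lr_scheduler

# lr = base_lr * factor ** floor(num_update / step)
sched = lr_scheduler.FactorScheduler(step=100, factor=0.5, base_lr=0.1)
for num_update in (50, 150, 250):
    print(num_update, sched(num_update))   # ~0.1, ~0.05, ~0.025
```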
In addition, the following schedulers are available:
class mxnet.lr_scheduler.MultiFactorScheduler(
step,
factor=1,
base_lr=0.01,
warmup_steps=0,
warmup_begin_lr=0,
warmup_mode='linear'
)
class mxnet.lr_scheduler.PolyScheduler(
max_update,
base_lr=0.01,
pwr=2,
final_lr=0,
warmup_steps=0,
warmup_begin_lr=0,
warmup_mode='linear'
)
class mxnet.lr_scheduler.CosineScheduler(
max_update,
base_lr=0.01,
final_lr=0,
warmup_steps=0,
warmup_begin_lr=0,
warmup_mode='linear'
)
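A combined sketch of the three schedulers above, with made-up max_update, step, and warmup values; each is called with a few update counts to show how the learning rate evolves:

```python
from mxnet import lr_scheduler

multi = lr_scheduler.MultiFactorScheduler(step=[300, 700, 900], factor=0.5,
                                          base_lr=0.1)
poly = lr_scheduler.PolyScheduler(max_update=1000, base_lr=0.1, pwr=2,
                                  final_lr=0.001,
                                  warmup_steps=50, warmup_begin_lr=0.0)
cosine = lr_scheduler.CosineScheduler(max_update=1000, base_lr=0.1,
                                      final_lr=0.001,
                                      warmup_steps=50, warmup_begin_lr=0.0)

# MultiFactorScheduler halves the lr at updates 300, 700 and 900;
# PolyScheduler and CosineScheduler decay smoothly from base_lr to final_lr
# by max_update, after a 50-step linear warmup.
for num_update in (25, 100, 500, 1000):
    print(num_update, multi(num_update), poly(num_update), cosine(num_update))
```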