Using Schedule and warmup_steps in PyTorch

Original

2020-02-28 03:35

1. lr_scheduler

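    # WarmupLinearSchedule comes from the pytorch_transformers 1.x package;
    # newer versions of transformers provide get_linear_schedule_with_warmup instead
    from pytorch_transformers import WarmupLinearSchedule

    lr_scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=num_train_optimization_steps)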

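Here args.warmup_steps is the number of updates over which the learning rate ramps up from 0 to its initial value; it can be thought of as a "patience" coefficient.
num_train_optimization_steps is the total number of parameter updates over the whole training run.
It is usually computed as: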

    # updates per epoch, multiplied by the number of epochs
    num_train_optimization_steps = int(total_train_examples / args.train_batch_size / args.gradient_accumulation_steps) * args.num_train_epochs
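As a quick check of the arithmetic, the formula can be wrapped in a small Python helper. This is a minimal sketch: the function name `num_optimization_steps` and its parameter names are hypothetical, and, like the formula, it truncates the result to an integer.

```python
def num_optimization_steps(total_train_examples: int,
                           train_batch_size: int,
                           gradient_accumulation_steps: int,
                           num_train_epochs: int = 1) -> int:
    """Total number of optimizer updates (hypothetical helper mirroring the formula above)."""
    # Updates per epoch: one update per `gradient_accumulation_steps` mini-batches
    updates_per_epoch = int(total_train_examples / train_batch_size / gradient_accumulation_steps)
    return updates_per_epoch * num_train_epochs


# Numbers from the gradient accumulation example: 24 samples, batch of 3, accumulate 2
print(num_optimization_steps(24, 3, 2))                      # 4
print(num_optimization_steps(24, 3, 2, num_train_epochs=3))  # 12
```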

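The schedule adjusts the learning rate. Taking the linear warmup schedule as an example, in the code below `step` is the scheduler's last_epoch counter, i.e. the number of times lr_scheduler.step() has been called.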

    def lr_lambda(self, step):
        # Returns a multiplier x; LambdaLR then sets lr = initial_lr * x
        if step < self.warmup_steps:
            # Warmup: x rises linearly from 0 to 1
            return float(step) / float(max(1, self.warmup_steps))
        # Decay: x falls linearly from 1 to 0 at step == t_total
        return max(0.0, float(self.t_total - step) / float(max(1.0, self.t_total - self.warmup_steps)))
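To see the shape of the schedule without an optimizer, the same multiplier can be written as a plain function and evaluated step by step. This is a sketch: the function name `warmup_linear` and the values warmup_steps=2, t_total=6 are arbitrary choices for illustration.

```python
def warmup_linear(step: int, warmup_steps: int, t_total: int) -> float:
    """Multiplier applied to the initial lr at a given scheduler step."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to 1
        return float(step) / float(max(1, warmup_steps))
    # Decay: ramp linearly from 1 down to 0 at step == t_total, then stay at 0
    return max(0.0, float(t_total - step) / float(max(1.0, t_total - warmup_steps)))


multipliers = [warmup_linear(s, warmup_steps=2, t_total=6) for s in range(8)]
print(multipliers)  # [0.0, 0.5, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0]
```

The multiplier peaks at exactly 1.0 when step equals warmup_steps and is clamped at 0 once step passes t_total.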

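In practice, constructing the scheduler already evaluates lr_lambda(0), so with warmup_steps > 0 the learning rate starts at 0. Each call to lr_scheduler.step() after a parameter update increments step by 1 and sets the new lr to initial_lr * alpha, where alpha = lr_lambda(step). The new lr is always computed from initial_lr, not from the previous lr. During warmup the lr therefore climbs linearly and reaches initial_lr at step = warmup_steps (after the very first update only if warmup_steps = 1); afterwards it decays linearly to 0 at step = t_total. This is why warmup_steps can be seen as a patience coefficient for the learning rate: it sets how many updates pass before the lr reaches its full value.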
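The behaviour described above can be traced with PyTorch's built-in torch.optim.lr_scheduler.LambdaLR, which the comment in the lr_lambda code says WarmupLinearSchedule hands its multiplier to. This is a minimal sketch assuming a single dummy parameter, plain SGD and made-up values (initial_lr=0.1, warmup_steps=2, t_total=6); no gradients are computed, so optimizer.step() leaves the parameter unchanged and only the learning rate matters here.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

warmup_steps, t_total, initial_lr = 2, 6, 0.1


def lr_lambda(step):
    # Same warmup-then-linear-decay multiplier as in the lr_lambda method
    if step < warmup_steps:
        return float(step) / float(max(1, warmup_steps))
    return max(0.0, float(t_total - step) / float(max(1.0, t_total - warmup_steps)))


# A single dummy parameter is enough to build an optimizer
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=initial_lr)
scheduler = LambdaLR(optimizer, lr_lambda)

lrs = [optimizer.param_groups[0]["lr"]]  # right after construction: initial_lr * lr_lambda(0)
for _ in range(t_total):
    optimizer.step()   # parameter update (param.grad is None, so nothing changes)
    scheduler.step()   # step += 1, lr = initial_lr * lr_lambda(step)
    lrs.append(optimizer.param_groups[0]["lr"])

print([round(lr, 4) for lr in lrs])  # [0.0, 0.05, 0.1, 0.075, 0.05, 0.025, 0.0]
```

The recorded lr goes 0, 0.05, 0.1 and then back down to 0, while optimizer.param_groups[0]["initial_lr"] stays at 0.1 throughout: that stored value is the base every new lr is computed from.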
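2. gradient_accumulation_steps
gradient_accumulation_steps accumulates gradients over several mini-batches to work around limited GPU memory.
Suppose the original batch_size=6, the dataset has 24 samples, and gradient_accumulation_steps=2.
The number of parameter updates is then 24/6 = 4.
Now reduce batch_size to 6/2 = 3; the number of parameter updates stays the same: 24/3/2 = 4.
During backpropagation, loss.backward() is still called on every mini-batch, so the gradients add up in each parameter's .grad; the parameters are only updated (optimizer.step(), followed by zeroing the gradients) once every gradient_accumulation_steps mini-batches. The loss is usually divided by gradient_accumulation_steps so that the accumulated gradient matches the average over the larger batch.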
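The accumulation loop can be sketched as follows, using the numbers from the example (24 samples, mini-batches of 3, gradient_accumulation_steps=2). The linear model, random data and SGD settings are placeholders; dividing the loss by accum_steps is a common convention so that the accumulated gradient equals the average over the effective batch of 6, which the last few lines check.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X, y = torch.randn(24, 4), torch.randn(24, 1)   # 24 placeholder samples
batch_size, accum_steps = 3, 2                   # effective batch = 3 * 2 = 6

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

num_updates = 0
optimizer.zero_grad()
for i, start in enumerate(range(0, len(X), batch_size)):
    xb, yb = X[start:start + batch_size], y[start:start + batch_size]
    # Scale the loss so the summed gradients equal the mean over the effective batch
    loss = F.mse_loss(model(xb), yb) / accum_steps
    loss.backward()                      # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()                 # one parameter update per accum_steps mini-batches
        optimizer.zero_grad()
        num_updates += 1

print(num_updates)  # 4, i.e. 24 / 3 / 2

# Sanity check: accumulated gradient of two batches of 3 == gradient of one batch of 6
check = torch.nn.Linear(4, 1)
check.zero_grad()
for start in (0, 3):
    (F.mse_loss(check(X[start:start + 3]), y[start:start + 3]) / accum_steps).backward()
accumulated = check.weight.grad.clone()
check.zero_grad()
F.mse_loss(check(X[:6]), y[:6]).backward()
print(torch.allclose(accumulated, check.weight.grad, atol=1e-6))  # True
```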
