Saving and Restoring Trained Models in PyTorch

Original

Mango_house

2020-07-03 22:06

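After a model has been trained, it needs to be saved to a file, either for testing and deployment or so that training can later be resumed from the saved state.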

https://pytorch.org/tutorials/beginner/saving_loading_models.html

1. Best Practices

https://github.com/pytorch/pytorch/blob/761d6799beb3afa03657a71776412a2171ee7533/docs/source/notes/serialization.rst

There are two main ways to serialize a model and load it back.

1.1 Method M1 - recommended

Save and restore only the model parameters:

import torch

# Save
torch.save(the_model.state_dict(), PATH)

# Restore
the_model = TheModelClass(*args, **kwargs)
the_model.load_state_dict(torch.load(PATH))

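With this method the file holds only the parameters, so the network structure (the model class) must be defined or imported separately in your own code before loading.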
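
The round trip for method M1 can be sketched as follows. This is a minimal illustration, not part of the original post: `TinyNet` is a hypothetical stand-in for `TheModelClass`, the file lives in a temporary directory, and `map_location="cpu"` is added only so the sketch behaves the same on machines without a GPU.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for TheModelClass."""

    def __init__(self, in_features=4, out_features=2):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        return self.fc(x)


torch.manual_seed(0)
the_model = TinyNet()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny_net.pth")

    # Save only the parameters (the state_dict), not the class itself.
    torch.save(the_model.state_dict(), path)

    # Restore: rebuild the architecture in code, then load the parameters.
    restored = TinyNet()
    restored.load_state_dict(torch.load(path, map_location="cpu"))

# Both models should now produce identical outputs for the same input.
x = torch.randn(3, 4)
same_output = torch.equal(the_model(x), restored(x))
print(same_output)
```

Note that `TinyNet()` has to be constructed with the same arguments that were used for the saved model; otherwise `load_state_dict` raises an error about mismatched keys or shapes.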

1.2 Method M2

Save the model parameters together with the network structure:

import torch

# Save
torch.save(the_model, PATH)

# Restore
the_model = torch.load(PATH)

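Data saved this way is bound to the specific classes and the exact directory structure in use at save time, so loading it can break after the code has been refactored.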

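2. Stack Overflow answer

From: Best way to save a trained model in PyTorch?

Choose the save and restore method according to the use case.

Case C1 - saving a model for your own inference

You save the model yourself, restore it yourself, and then switch it to evaluation mode.

This matters because models often contain BatchNorm and Dropout layers, which are in training mode by default.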

# Save the model
torch.save(model.state_dict(), filepath)

# Restore the model
model.load_state_dict(torch.load(filepath))
model.eval()
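
The effect of `model.eval()` can be checked with a small sketch. The two-layer model below is a hypothetical example, not taken from the original answer. It only shows that Dropout is disabled in evaluation mode, so repeated forward passes on the same input agree.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model containing a Dropout layer.
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
x = torch.ones(1, 8)

# Freshly constructed modules start in training mode.
started_in_training = model.training
print(started_in_training)

# In evaluation mode Dropout passes its input through unchanged,
# so two forward passes on the same input give the same result.
model.eval()
with torch.no_grad():
    eval_out_1 = model(x)
    eval_out_2 = model(x)
eval_is_deterministic = torch.equal(eval_out_1, eval_out_2)
print(eval_is_deterministic)

# Back in training mode Dropout zeroes random elements on each call,
# so repeated passes will usually differ.
model.train()
with torch.no_grad():
    train_out_1 = model(x)
    train_out_2 = model(x)
print(torch.equal(train_out_1, train_out_2))
```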

Case C2 - saving a model to resume training

To preserve the training state, save more than the model: the optimizer state, the epoch, the score and so on must be saved along with it.

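# Save the model
state = {
    'epoch': epoch,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    # ... (further fields elided in the original)
}
torch.save(state, filepath)

# Load the model and resume training
state = torch.load(filepath)
model.load_state_dict(state['state_dict'])
optimizer.load_state_dict(state['optimizer'])
# Training continues from here, so do not call model.eval().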
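
A self-contained version of this save-and-resume cycle might look like the sketch below. The linear model, the SGD settings and the random data are assumptions chosen for illustration. The point is that the optimizer's internal state (here, the momentum buffers) survives the round trip along with the weights.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model and optimizer; momentum gives the optimizer state to save.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One training step so that the momentum buffers exist.
x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

epoch = 0
with tempfile.TemporaryDirectory() as tmp_dir:
    filepath = os.path.join(tmp_dir, "checkpoint.pth")
    torch.save({
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer': optimizer.state_dict(),
    }, filepath)

    # Resume: read the checkpoint back from disk.
    state = torch.load(filepath, map_location="cpu")

# Build fresh objects with the same configuration, then load the saved state.
resumed_model = nn.Linear(4, 1)
resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.1, momentum=0.9)
resumed_model.load_state_dict(state['state_dict'])
resumed_optimizer.load_state_dict(state['optimizer'])

start_epoch = state['epoch'] + 1  # continue with the next epoch
resumed_model.train()             # stay in training mode, no model.eval()
print(start_epoch)
```

Creating the optimizer before calling `load_state_dict` on it is required, because the optimizer's state is loaded into an existing object bound to the new model's parameters.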

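Case C3 - saving a model to share with others

In TensorFlow you can create a .pb file that defines both the network structure and the model weights. This is very convenient, especially when using TensorFlow Serving.

Similarly, in PyTorch: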

# Save the model
torch.save(model, filepath)

# Load the model
model = torch.load(filepath)

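This approach is still not stable enough, because PyTorch itself keeps changing between versions, so it is not recommended.

3. Examples

From: How to save and restore a model and inspect its parameters in PyTorch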

import os

import torch

state = {
    'epoch': epoch,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'best_score': best_score,
    # ... (further fields elided in the original)
}

torch.save(state, '/path/to/checkpoint.pth')

# Resume from a checkpoint if requested and the file exists
if resume:
    if os.path.isfile(resume_file):
        print("=> loading checkpoint '{}'".format(resume_file))
        checkpoint = torch.load(resume_file)
        start_epoch = checkpoint['epoch']
        best_score = checkpoint['best_score']
        model.load_state_dict(checkpoint['state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer'])

Another, relatively complete example:

import os

import torch

# Saving
torch.save({
    'epoch': epoch + 1,
    'arch': args.arch,
    'state_dict': model.state_dict(),
    'best_prec1': best_prec1,
}, 'checkpoint.tar')

# Loading
if args.resume:
    if os.path.isfile(args.resume):
        print("=> loading checkpoint '{}'".format(args.resume))
        checkpoint = torch.load(args.resume)
        args.start_epoch = checkpoint['epoch']
        best_prec1 = checkpoint['best_prec1']
        model.load_state_dict(checkpoint['state_dict'])
        # Report the file that was actually loaded (args.resume)
        print("=> loaded checkpoint '{}' (epoch {})"
              .format(args.resume, checkpoint['epoch']))
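
One possible use of the 'arch' entry stored in such a checkpoint is to pick the right constructor before loading the weights. The sketch below assumes a hypothetical `ARCHS` registry and a hypothetical helper `build_from_checkpoint`; neither comes from the original example. An in-memory buffer stands in for 'checkpoint.tar', and the 'epoch' and 'best_prec1' values are placeholders.

```python
import io

import torch
import torch.nn as nn

# Hypothetical registry mapping an architecture name to a constructor.
ARCHS = {
    "small": lambda: nn.Linear(4, 2),
    "large": lambda: nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2)),
}


def build_from_checkpoint(checkpoint):
    """Rebuild the architecture named in the checkpoint, then load its weights."""
    model = ARCHS[checkpoint['arch']]()
    model.load_state_dict(checkpoint['state_dict'])
    return model


torch.manual_seed(0)
source_model = ARCHS["large"]()

# Save to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
torch.save({
    'epoch': 3,            # placeholder value
    'arch': "large",
    'state_dict': source_model.state_dict(),
    'best_prec1': 0.0,     # placeholder value
}, buffer)

# Load the checkpoint back and rebuild the model from it.
buffer.seek(0)
checkpoint = torch.load(buffer, map_location="cpu")
rebuilt = build_from_checkpoint(checkpoint)
print(rebuilt)
```

Storing the architecture name keeps the checkpoint a plain dictionary of parameters and metadata, while still letting the loading code decide which class to instantiate.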

Inspecting the parameters of the network layers:

import torch.nn as nn
from collections import OrderedDict

# Network structure
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 32, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(32, 64, 5)),
    ('relu2', nn.ReLU())
]))
print(model)

# Inspect the network parameters
params = model.state_dict()
for k, v in params.items():
    print(k)  # parameter name

print(params['conv1.weight'])  # weights of the conv1 layer
print(params['conv1.bias'])  # bias of the conv1 layer
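
Printing whole tensors gets unwieldy quickly, so a compact alternative is to list only each entry's name and shape and to count the parameters. The sketch below rebuilds the same network. The `shapes` and `total_params` names are introduced here for illustration.

```python
from collections import OrderedDict

import torch.nn as nn

# Same network structure as in the example above.
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 32, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(32, 64, 5)),
    ('relu2', nn.ReLU())
]))

# Map each state_dict entry to its shape; ReLU layers have no parameters.
shapes = {name: tuple(tensor.shape) for name, tensor in model.state_dict().items()}
for name, shape in shapes.items():
    print(name, shape)

# Total number of parameters across all layers.
total_params = sum(p.numel() for p in model.parameters())
print(total_params)
```

A `Conv2d` weight has shape (out_channels, in_channels, kernel_height, kernel_width), so conv1 contributes 32*1*5*5 + 32 = 832 parameters and conv2 contributes 64*32*5*5 + 64 = 51264.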

References:
https://www.aiuai.cn/aifarm657.html
https://pytorch.org/tutorials/beginner/saving_loading_models.html
https://byjiang.com/2017/06/05/How_To_Save_And_Restore_Model/
https://www.pytorchtutorial.com/pytorch-note5-save-and-restore-models/
