Inference on PyTorch-Trained Models Using ONNX and Caffe2

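PyTorch spread quickly after Facebook released it in October 2016, largely because it is friendly to developers. Its clean Python interface makes it well suited to research and rapid prototyping, and debugging code or trying out network architectures in PyTorch is straightforward. For production, however, Google's TensorFlow was ahead: TensorFlow Serving made deploying machine learning models easy. That changed in May 2018, when PyTorch was integrated with Caffe2 and gained a complete production pipeline. This is the pipeline Facebook uses: models are trained in PyTorch and deployed with Caffe2. Note: Caffe2 should not be confused with Caffe. They are two entirely different frameworks. Caffe was popular five years ago, but it now seems to have fallen out of favor.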

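1. Facebook's Deep Learning Workflow

Developers and engineers are most productive with an easy-to-use framework in an easy-to-use language, and PyTorch provides that. In production, however, Facebook operates at enormous scale, and what matters there is computational efficiency and performance rather than developer convenience. Facebook's answer to the performance problem is Caffe2, a fast and efficient framework written in C++ for production use. Facebook released Caffe2 in April 2017. It is general-purpose, and Caffe2 models can be deployed on many platforms, including mobile. Facebook applications built on Caffe2 have been deployed to more than one billion iOS and Android phones. Facebook maintains interoperability between PyTorch and Caffe2. With the release of PyTorch 1.1 in May 2019, which added TensorBoard support, visualization and debugging also became much easier.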

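2. Open Neural Network Exchange (ONNX)

Open Neural Network Exchange (ONNX) is an open format that lets users move deep learning models between frameworks. The format was originally proposed by Facebook and Microsoft and is now a widely accepted industry standard. A common approach to deploying a PyTorch model is to convert it to ONNX and then run the exported ONNX model with Caffe2. In our previous post we described how to train an image classifier in PyTorch and run inference with it; the model was saved as a .pt or .pth file. In this post we explain how to convert a trained PyTorch model to ONNX and run inference in Caffe2. We also run inference with both the PyTorch model and the ONNX model and compare the results. We used PyTorch 1.4.0+cpu, onnx 1.6.0, and Python 3.6 for this work.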

3. PyTorch to ONNX

Let's look at how to export a PyTorch .pt model to ONNX.

# Export an ONNX model from a PyTorch .pt model

import torch
import torch.onnx

# Load the input PyTorch model and map its tensors to the CPU
device = torch.device('cpu')
model = torch.load('caltech_10_model_8.pt', map_location=device)

# Generate a dummy input that is consistent with the network's architecture
dummy_input = torch.randn(1, 3, 224, 224)

# Export to an ONNX model using the PyTorch model and the dummy input
torch.onnx.export(model, dummy_input, "animals_caltech.onnx", export_params=True, keep_initializers_as_inputs=True)

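Because we will run inference with Caffe2 on the CPU, we set the device to 'cpu' and map the PyTorch model to the CPU. Next we create a dummy input that matches the input the network expects. Finally, we call the export function with the PyTorch model, the dummy input, and the target ONNX file. The argument keep_initializers_as_inputs=True is essential here: without it, Caffe2 fails with the following error when loading the exported model.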

>>> rep = backend.prepare(model)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/caffe2/python/onnx/backend.py", line 713, in prepare
    init_net, predict_net = cls._onnx_model_to_caffe2_net(model, device, opset_version, False)
  File "/usr/local/lib/python3.6/dist-packages/caffe2/python/onnx/backend.py", line 876, in _onnx_model_to_caffe2_net
    onnx_model = onnx.utils.polish_model(onnx_model)
  File "/usr/local/lib/python3.6/dist-packages/onnx/utils.py", line 21, in polish_model
    model = onnx.optimizer.optimize(model)
  File "/usr/local/lib/python3.6/dist-packages/onnx/optimizer.py", line 55, in optimize
    optimized_model_str = C.optimize(model_str, passes)
IndexError: _Map_base::at

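Note that the pre-built Caffe2 binaries currently available for Mac, Ubuntu, and CentOS do not include CUDA support. Any other platform, or any setup that needs CUDA, requires compiling from source. In this example we tested CPU inference on Ubuntu 18.04. If you want to run the ONNX model with CUDA, you need to build Caffe2 from source.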

4. Inference in Caffe2 Using ONNX

Next, we can deploy our ONNX model on a variety of devices and run inference in Caffe2.

# Inference in Caffe2 using the ONNX model

import caffe2.python.onnx.backend as backend
import onnx

# torch is needed below for torch.exp and torch.from_numpy
import torch
from torchvision import transforms
from PIL import Image
import time
import numpy as np

# First load the onnx model
model = onnx.load("animals_caltech.onnx")

# Prepare the backend
rep = backend.prepare(model, device="CPU")

# Transform the image
transform = transforms.Compose([
        transforms.Resize(size=224),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])
          
# Load the image
test_image_name = "caltech_10/test/zebra/250_0091.jpg"
test_image = Image.open(test_image_name)

# Apply the transformations to the input image and convert it into a tensor
test_image_tensor = transform(test_image)

# Make the input image ready to be input as a batch of size 1
test_image_tensor = test_image_tensor.view(1, 3, 224, 224)

time_start = time.time()
# Convert the tensor to numpy array
np_image = test_image_tensor.numpy()

# Pass the numpy array to run through the ONNX model
outputs = rep.run(np_image.astype(np.float32))
time_end = time.time()
print("Local CPU Inference time using ONNX model : ", time_end - time_start)

# Dictionary with class name and index
idx_to_class = {0: 'bear   ', 1: 'chimp  ', 2: 'giraffe', 3: 'gorilla', 4: 'llama  ', 5: 'ostrich', 6: 'porcupine', 7: 'skunk  ', 8: 'triceratops', 9: 'zebra  '}

ps = torch.exp(torch.from_numpy(outputs[0]))
topk, topclass = ps.topk(10, dim=1)
for i in range(10):
    print("Prediction", '{:2d}'.format(i+1), ":", '{:11}'.format(idx_to_class[topclass.cpu().numpy()[0][i]]), ", Class Id : ", topclass[0][i].numpy(), " Score: ", topk.cpu().detach().numpy()[0][i])

The output is as follows:

Local CPU Inference time using ONNX model :  0.25966334342956543
Prediction  1 : zebra       , Class Id :  9  Score:  0.9993488
Prediction  2 : triceratops , Class Id :  8  Score:  0.0002103801
Prediction  3 : giraffe     , Class Id :  2  Score:  0.0001857687
Prediction  4 : ostrich     , Class Id :  5  Score:  9.373466e-05
Prediction  5 : skunk       , Class Id :  7  Score:  4.9951355e-05
Prediction  6 : llama       , Class Id :  4  Score:  4.8880072e-05
Prediction  7 : chimp       , Class Id :  1  Score:  2.3821109e-05
Prediction  8 : porcupine   , Class Id :  6  Score:  1.8212046e-05
Prediction  9 : bear        , Class Id :  0  Score:  1.0552433e-05
Prediction 10 : gorilla     , Class Id :  3  Score:  9.918323e-06

>>> import caffe2.python.onnx.backend as backend
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/caffe2/python/onnx/backend.py", line 28, in <module>
    from caffe2.python import core, workspace, rnn_cell, gru_cell
  File "/usr/local/lib/python3.6/dist-packages/caffe2/python/core.py", line 9, in <module>
    from past.builtins import basestring
ModuleNotFoundError: No module named 'past'

If you see the error above, install the future package, which provides the missing past module:

pip3 install future
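
The import error above can be caught before loading the backend. The following is a minimal sketch, not part of the original workflow: the helper name missing_modules and the list of required modules are assumptions chosen for illustration. It only checks whether each top-level module can be found, without importing it.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list of modules this post relies on; "past" is provided by the future package.
required = ["past", "onnx", "caffe2"]
missing = missing_modules(required)

if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

If "past" appears in the list, installing the future package as shown above should resolve it.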

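We load the ONNX model and pass it to Caffe2 together with the device information. In this example the device is the CPU, because the ONNX model was exported for the CPU. We then read the input test image and resize it, preserving the aspect ratio, so that the shorter of its width and height is 224. The central 224×224 region is cropped out and converted to a tensor, a step that also scales pixel values to the range 0 to 1. The image is then normalized with the ImageNet mean and standard deviation, computed as input[channel] = (input[channel] - mean[channel]) / std[channel]. Next we reshape the image tensor into a batch of one image, because the network expects a batch as input. The tensor is converted to a float32 NumPy array and run through the loaded model in Caffe2.

The model outputs log-probabilities. We exponentiate them to obtain the actual scores, sort the scores, and take the highest-scoring class as the prediction for the input test image. We print the scores of all 10 classes in descending order so that the scores computed by the PyTorch model (as in our previous post) can be compared directly with those computed by the ONNX model in Caffe2.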
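
The preprocessing and score decoding described above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the code used in this post: the function names to_normalized_batch and rank_classes are hypothetical, the input is assumed to be already resized and cropped to 224×224, and the image and log-probabilities are made-up stand-ins.

```python
import numpy as np

# ImageNet statistics used in the transforms above (one value per RGB channel).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_normalized_batch(image_hwc_uint8):
    """Turn an H x W x 3 uint8 image into a 1 x 3 x H x W float32 batch."""
    # Scale pixel values from 0-255 to 0-1.
    scaled = image_hwc_uint8.astype(np.float32) / 255.0
    # Move channels first (HWC -> CHW).
    chw = scaled.transpose(2, 0, 1)
    # input[channel] = (input[channel] - mean[channel]) / std[channel]
    normalized = (chw - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]
    # Add the leading batch dimension the network expects.
    return normalized[np.newaxis, ...].astype(np.float32)


def rank_classes(log_probs):
    """Return (class_ids, scores) sorted by descending score for one image."""
    # Exponentiate log-probabilities to recover the scores.
    scores = np.exp(log_probs[0])
    order = np.argsort(scores)[::-1]
    return order, scores[order]


# Hypothetical stand-in for an already resized and cropped 224 x 224 image.
fake_image = np.full((224, 224, 3), 128, dtype=np.uint8)
batch = to_normalized_batch(fake_image)
print(batch.shape, batch.dtype)

# Hypothetical log-probabilities for a 3-class model.
fake_log_probs = np.log(np.array([[0.2, 0.7, 0.1]], dtype=np.float32))
class_ids, scores = rank_classes(fake_log_probs)
print(class_ids, scores)
```

The resulting batch has the same layout as the array passed to rep.run, and rank_classes mirrors what torch.exp followed by topk does in the listing.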

5. Inference in PyTorch Using the .pt Model

Inference with the original model in PyTorch looks like this:

import torch
from torchvision import transforms
from PIL import Image
import time
import numpy as np

# Loading the input PyTorch model and mapping the tensors to CPU
device = torch.device('cpu')
model = torch.load('caltech_10_model_8.pt', map_location=device)

# Transform the image
transform = transforms.Compose([
        transforms.Resize(size=224),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])
          
# Load and show the image
test_image_name = "caltech_10/test/zebra/250_0091.jpg"
test_image = Image.open(test_image_name)
# display() exists only in Jupyter/IPython; in a plain script use test_image.show()
display(test_image)

# Apply the transformations to the input image and convert it into a tensor
test_image_tensor = transform(test_image)

# Make the input image ready to be input as a batch of size 1
test_image_tensor = test_image_tensor.view(1, 3, 224, 224)

# Dictionary with class name and index
idx_to_class = {0: 'bear   ', 1: 'chimp  ', 2: 'giraffe', 3: 'gorilla', 4: 'llama  ', 5: 'ostrich', 6: 'porcupine', 7: 'skunk  ', 8: 'triceratops', 9: 'zebra  '}

time_start = time.time()

with torch.no_grad():
    model.eval()
    # Model outputs log probabilities
    out = model(test_image_tensor)

time_end = time.time()

print("Local CPU Inference time  using Pytorch model : ", time_end - time_start)


ps = torch.exp(out)
topk, topclass = ps.topk(10, dim=1)
for i in range(10):
    print("Prediction", '{:2d}'.format(i+1), ":", '{:11}'.format(idx_to_class[topclass.cpu().numpy()[0][i]]), ", Class Id : ", topclass[0][i].numpy(), " Score: ", topk.cpu().detach().numpy()[0][i])

The output is as follows:

Local CPU Inference time  using Pytorch model :  0.37128472328186035

Prediction  1 : zebra       , Class Id :  9  Score:  0.9993488
Prediction  2 : triceratops , Class Id :  8  Score:  0.0002103801
Prediction  3 : giraffe     , Class Id :  2  Score:  0.00018576835
Prediction  4 : ostrich     , Class Id :  5  Score:  9.373448e-05
Prediction  5 : skunk       , Class Id :  7  Score:  4.9951497e-05
Prediction  6 : llama       , Class Id :  4  Score:  4.8880025e-05
Prediction  7 : chimp       , Class Id :  1  Score:  2.3821109e-05
Prediction  8 : porcupine   , Class Id :  6  Score:  1.8212062e-05
Prediction  9 : bear        , Class Id :  0  Score:  1.0552433e-05
Prediction 10 : gorilla     , Class Id :  3  Score:  9.918323e-06
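
The two listings (the Caffe2 output in section 4 and the PyTorch output above) can also be compared numerically rather than by eye. The sketch below copies the printed scores into two dictionaries keyed by class id; the helper name ranking is an assumption made for illustration.

```python
# Scores copied from the Caffe2/ONNX listing, keyed by class id.
onnx_scores = {
    9: 0.9993488, 8: 0.0002103801, 2: 0.0001857687, 5: 9.373466e-05,
    7: 4.9951355e-05, 4: 4.8880072e-05, 1: 2.3821109e-05,
    6: 1.8212046e-05, 0: 1.0552433e-05, 3: 9.918323e-06,
}

# Scores copied from the PyTorch listing, keyed by class id.
pytorch_scores = {
    9: 0.9993488, 8: 0.0002103801, 2: 0.00018576835, 5: 9.373448e-05,
    7: 4.9951497e-05, 4: 4.8880025e-05, 1: 2.3821109e-05,
    6: 1.8212062e-05, 0: 1.0552433e-05, 3: 9.918323e-06,
}


def ranking(scores):
    """Return class ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


same_order = ranking(onnx_scores) == ranking(pytorch_scores)
max_abs_diff = max(abs(onnx_scores[c] - pytorch_scores[c]) for c in onnx_scores)

print("Same ranking:", same_order)
print("Largest absolute score difference:", max_abs_diff)
```

For the values printed in this post, the ranking is identical and the largest difference is below 1e-9, which is consistent with ordinary float32 rounding between two runtimes.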

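6. Comparing the Results

The two models produce almost identical scores, but in this test the ONNX model running in Caffe2 was noticeably faster (about 0.26 s versus 0.37 s for a single inference on the CPU). ONNX models are widely deployed on the Caffe2 runtime in mobile and large-scale applications at Facebook and other companies.

Reference: https://www.learnopencv.com/pytorch-model-inference-using-onnx-and-caffe2/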
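
The timings in this post come from a single call each, so they include one-off costs and run-to-run noise. A more stable comparison might warm up first and take the median of several runs. The sketch below is one way to do that, under assumptions: time_inference is a hypothetical helper, and dummy_workload stands in for a real call such as rep.run(np_image) or model(test_image_tensor).

```python
import statistics
import time


def time_inference(run_once, warmup=3, repeats=20):
    """Return the median wall-clock time in seconds of run_once()."""
    # Warm-up calls are not timed; they absorb one-off setup costs.
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical stand-in workload; replace with the real inference call.
def dummy_workload():
    return sum(i * i for i in range(10_000))


median_seconds = time_inference(dummy_workload)
print("Median time per call:", median_seconds)

# Ratio of the single-run timings reported in this post (PyTorch / Caffe2).
speedup = 0.37128472328186035 / 0.25966334342956543
print("Single-run speedup reported in this post:", round(speedup, 2))
```

The single-run figures correspond to a speedup of roughly 1.4x; repeated measurements on your own hardware may differ.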
