Running models in production with the PyTorch C++ API: an image classification example

Background

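Production environments mostly use Java or C++. This article shows how to load a PyTorch model in C++ and run inference in production. The focus is therefore on loading the model and running predictions from C++, not on model design or training.
See also the official tutorial: https://pytorch.org/tutorials/advanced/cpp_export.html#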

Introduction to TorchScript

TorchScript is an intermediate representation of a PyTorch model that can be run in a high-performance environment such as C++.

Creating a basic model in PyTorch

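A module in PyTorch consists of:
(1) A constructor, which prepares the module for invocation
(2) Parameters and submodules, which are initialized by the constructor and can be used by the module during invocation
(3) A forward function, which is the code that runs when the module is invoked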

A simple example:

import torch

class MyCell(torch.nn.Module):
    def __init__(self):
        super(MyCell, self).__init__()

    def forward(self, x, h):
        new_h = torch.tanh(x + h)
        return new_h, new_h

my_cell = MyCell()
x = torch.rand(3, 4)
h = torch.rand(3, 4)
print(my_cell(x, h))

Output:

(tensor([[0.6454, 0.7223, 0.8207, 0.1638],
        [0.6929, 0.7719, 0.9481, 0.6845],
        [0.7689, 0.8348, 0.8925, 0.3200]]), tensor([[0.6454, 0.7223, 0.8207, 0.1638],
        [0.6929, 0.7719, 0.9481, 0.6845],
        [0.7689, 0.8348, 0.8925, 0.3200]]))

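In the example above we created a class MyCell derived from torch.nn.Module and defined its constructor, which here only calls super().
We also defined a forward function, which takes two inputs and returns two outputs. What forward actually computes is not very important here; it is a kind of fake RNN cell, that is, a function that in a real application would be applied in a loop.
A note on super(): it is used to call a method of the parent class (superclass), and it exists mainly to handle multiple inheritance. Calling a parent method directly by class name works fine with single inheritance, but with multiple inheritance it runs into problems such as lookup order and repeated calls.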
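The point about super() and multiple inheritance can be illustrated with a small, PyTorch-free sketch. The class names (Base, Left, Right, Child) are hypothetical and exist only for this illustration; the idea is that with cooperative super() calls each constructor in a diamond hierarchy runs exactly once, in the order given by the method resolution order (MRO).

```python
class Base:
    def __init__(self):
        # Runs last in the chain and creates the shared log.
        self.calls = ["Base"]


class Left(Base):
    def __init__(self):
        super().__init__()  # next class in Child's MRO, not necessarily Base
        self.calls.append("Left")


class Right(Base):
    def __init__(self):
        super().__init__()
        self.calls.append("Right")


class Child(Left, Right):
    def __init__(self):
        super().__init__()
        self.calls.append("Child")


child = Child()
mro_names = [cls.__name__ for cls in Child.__mro__]
print(mro_names)    # lookup order used by super()
print(child.calls)  # each constructor appears exactly once
```

Had Left and Right called Base.__init__(self) by name instead, Base's constructor would have run twice when building a Child, which is the repeated-call problem mentioned above.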

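We now extend MyCell by adding a self.linear attribute (a callable submodule, not a plain function) and invoking it in forward. torch.nn.Linear is a standard module from PyTorch, so this gives us a nested composition of modules.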

class MyCell(torch.nn.Module):
    def __init__(self):
        super(MyCell, self).__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.linear(x) + h)
        return new_h, new_h

my_cell = MyCell()
x = torch.rand(3, 4)
h = torch.rand(3, 4)
print(my_cell)
print(my_cell(x, h))

Output:

MyCell(
  (linear): Linear(in_features=4, out_features=4, bias=True)
)
(tensor([[ 0.6286, -0.1987,  0.2962,  0.6099],
        [ 0.8631, -0.2569,  0.1799,  0.6778],
        [ 0.8491,  0.5000,  0.3010,  0.1332]], grad_fn=<TanhBackward>), tensor([[ 0.6286, -0.1987,  0.2962,  0.6099],
        [ 0.8631, -0.2569,  0.1799,  0.6778],
        [ 0.8491,  0.5000,  0.3010,  0.1332]], grad_fn=<TanhBackward>))

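Printing a module shows its submodule hierarchy. For my_cell above, the printout shows the linear submodule and its parameters.
Composing modules this way makes it easy to build models out of reusable components.
The output also contains grad_fn. This is information from PyTorch's automatic differentiation system, called autograd. In short, it lets us compute derivatives through potentially complex programs, which gives a lot of flexibility when authoring models.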
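To make the grad_fn remark concrete, here is a minimal sketch of autograd at work. It assumes a standalone torch.nn.Linear layer rather than the MyCell class above; the variable names are illustrative only.

```python
import torch

linear = torch.nn.Linear(4, 4)
x = torch.rand(3, 4)

# The output depends on parameters that require gradients, so autograd
# attaches a grad_fn node describing how the result was computed.
out = torch.tanh(linear(x))
print(out.grad_fn)

# Reduce to a scalar and backpropagate; gradients land on the parameters.
loss = out.sum()
loss.backward()
print(linear.weight.grad.shape)
print(x.grad)  # x was not created with requires_grad=True
```

The weight gradient has the same shape as the weight itself, while x.grad stays None because the input tensor was not marked as requiring gradients.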

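The next example further illustrates this flexibility. We add a MyDecisionGate module that uses control flow, that is, constructs such as loops and if statements.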

class MyDecisionGate(torch.nn.Module):
  def forward(self, x):
    if x.sum() > 0:
      return x
    else:
      return -x

class MyCell(torch.nn.Module):
    def __init__(self):
        super(MyCell, self).__init__()
        self.dg = MyDecisionGate()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.dg(self.linear(x)) + h)
        return new_h, new_h

my_cell = MyCell()
x = torch.rand(3, 4)
h = torch.rand(3, 4)
print(my_cell)
print(my_cell(x, h))

Output:

MyCell(
  (dg): MyDecisionGate()
  (linear): Linear(in_features=4, out_features=4, bias=True)
)
(tensor([[ 0.6055,  0.5525,  0.8768,  0.6291],
        [ 0.6550,  0.7678,  0.7121, -0.0692],
        [ 0.1305,  0.2356,  0.7683,  0.4723]], grad_fn=<TanhBackward>), tensor([[ 0.6055,  0.5525,  0.8768,  0.6291],
        [ 0.6550,  0.7678,  0.7121, -0.0692],
        [ 0.1305,  0.2356,  0.7683,  0.4723]], grad_fn=<TanhBackward>))

TorchScript

Taking the examples we just ran, let's see how to apply TorchScript.

Tracing

In short, TorchScript provides tools to capture the definition of a model, even given the flexible and dynamic nature of PyTorch. One of the core concepts is tracing.

class MyCell(torch.nn.Module):
    def __init__(self):
        super(MyCell, self).__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.linear(x) + h)
        return new_h, new_h

my_cell = MyCell()
x, h = torch.rand(3, 4), torch.rand(3, 4)
traced_cell = torch.jit.trace(my_cell, (x, h))
print(traced_cell)
traced_cell(x, h)

Output:

TracedModule[MyCell](
  (linear): TracedModule[Linear]()
)

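As before, we instantiate MyCell, but this time we call torch.jit.trace, passing in the Module together with example inputs for the network. What exactly did this do? It invoked the Module, recorded the operations that occurred while it ran, and created an instance of torch.jit.ScriptModule (of which TracedModule is an instance). TorchScript records its definitions in an intermediate representation (IR), commonly called a graph in deep learning. We can inspect the graph through the .graph property: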

print(traced_cell.graph)

Output:

graph(%self : ClassType<MyCell>,
      %input : Float(3, 4),
      %h : Float(3, 4)):
  %1 : ClassType<Linear> = prim::GetAttr[name="linear"](%self)
  %weight : Tensor = prim::GetAttr[name="weight"](%1)
  %bias : Tensor = prim::GetAttr[name="bias"](%1)
  %6 : Float(4!, 4!) = aten::t(%weight), scope: MyCell/Linear[linear] # /home/data1/software/Anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:1369:0
  %7 : int = prim::Constant[value=1](), scope: MyCell/Linear[linear] # /home/data1/software/Anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:1369:0
  %8 : int = prim::Constant[value=1](), scope: MyCell/Linear[linear] # /home/data1/software/Anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:1369:0
  %9 : Float(3, 4) = aten::addmm(%bias, %input, %6, %7, %8), scope: MyCell/Linear[linear] # /home/data1/software/Anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:1369:0
  %10 : int = prim::Constant[value=1](), scope: MyCell # test_pytorch.py:9:0
  %11 : Float(3, 4) = aten::add(%9, %h, %10), scope: MyCell # test_pytorch.py:9:0
  %12 : Float(3, 4) = aten::tanh(%11), scope: MyCell # test_pytorch.py:9:0
  %13 : (Float(3, 4), Float(3, 4)) = prim::TupleConstruct(%12, %12)
  return (%13)

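However, this is a very low-level representation, and most of the information in the graph is not useful to end users. Instead, we can use the .code property to get a Python-syntax interpretation of the code: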

print(traced_cell.code)

Output:

def forward(self,
    input: Tensor,
    h: Tensor) -> Tuple[Tensor, Tensor]:
  _0 = self.linear
  weight = _0.weight
  bias = _0.bias
  _1 = torch.addmm(bias, input, torch.t(weight), beta=1, alpha=1)
  _2 = torch.tanh(torch.add(_1, h, alpha=1))
  return (_2, _2)

So why did we do all this? There are several reasons:

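  1. TorchScript code can be invoked in its own interpreter, which is basically a restricted Python interpreter. This interpreter does not acquire the Global Interpreter Lock, so many requests can be processed on the same instance simultaneously.
  2. This format lets us save the whole model to disk and load it into another environment, such as a server written in a language other than Python.
  3. TorchScript gives us a representation on which compiler optimizations can be applied to make execution more efficient.
  4. TorchScript lets us interface with many backend/device runtimes that require a broader view of the program than individual operators.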

We can see that calling traced_cell produces the same results as calling the Python module directly:

print(my_cell(x, h))
print(traced_cell(x, h))

Output:

(tensor([[0.6964, 0.5208, 0.7205, 0.6677],
        [0.6465, 0.3342, 0.7431, 0.5376],
        [0.5603, 0.1212, 0.9433, 0.8053]], grad_fn=<TanhBackward>), tensor([[0.6964, 0.5208, 0.7205, 0.6677],
        [0.6465, 0.3342, 0.7431, 0.5376],
        [0.5603, 0.1212, 0.9433, 0.8053]], grad_fn=<TanhBackward>))
(tensor([[0.6964, 0.5208, 0.7205, 0.6677],
        [0.6465, 0.3342, 0.7431, 0.5376],
        [0.5603, 0.1212, 0.9433, 0.8053]],
       grad_fn=<DifferentiableGraphBackward>), tensor([[0.6964, 0.5208, 0.7205, 0.6677],
        [0.6465, 0.3342, 0.7431, 0.5376],
        [0.5603, 0.1212, 0.9433, 0.8053]],
       grad_fn=<DifferentiableGraphBackward>))

Using Scripting to Convert Modules

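There is a reason we traced the second version of the module (the one with only the linear layer) rather than the version with the control-flow submodule. The following example shows why.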

class MyDecisionGate(torch.nn.Module):
  def forward(self, x):
    if x.sum() > 0:
      return x
    else:
      return -x

class MyCell(torch.nn.Module):
    def __init__(self, dg):
        super(MyCell, self).__init__()
        self.dg = dg
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.dg(self.linear(x)) + h)
        return new_h, new_h

my_cell = MyCell(MyDecisionGate())
x, h = torch.rand(3, 4), torch.rand(3, 4)
traced_cell = torch.jit.trace(my_cell, (x, h))
print(traced_cell.code)

Output:

test_pytorch.py:4: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if x.sum() > 0:
def forward(self,
    input: Tensor,
    h: Tensor) -> Tuple[Tensor, Tensor]:
  _0 = self.linear
  weight = _0.weight
  bias = _0.bias
  x = torch.addmm(bias, input, torch.t(weight), beta=1, alpha=1)
  _1 = torch.tanh(torch.add(torch.neg(x), h, alpha=1))
  return (_1, _1)

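Looking at the .code output, the if-else branch is nowhere to be found! Why? Tracing does exactly what we said it would: it runs the code, records the operations that occur, and constructs a ScriptModule that does exactly that. Unfortunately, things like control flow are erased in the process.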
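The practical consequence can be checked directly. The following is a minimal sketch, assuming a standalone Gate module equivalent to MyDecisionGate (the name Gate and the 2x2 inputs are made up for this illustration). It traces the gate with an input that takes the else branch and then feeds it an input that should take the other branch. A TracerWarning like the one above is expected when it runs.

```python
import torch


class Gate(torch.nn.Module):
    def forward(self, x):
        if x.sum() > 0:
            return x
        else:
            return -x


gate = Gate()
negative = -torch.ones(2, 2)
positive = torch.ones(2, 2)

# Trace with an input whose sum is negative, so only the "-x" branch is recorded.
traced_gate = torch.jit.trace(gate, negative)

# The trace agrees with eager mode on the input it was traced with...
matches_on_negative = torch.equal(traced_gate(negative), gate(negative))
# ...but still negates an input that eager mode would return unchanged.
matches_on_positive = torch.equal(traced_gate(positive), gate(positive))
print(matches_on_negative, matches_on_positive)
```

The traced module is only faithful for inputs that follow the same branch as the example input, which is exactly what the warning is trying to say.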
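How can we faithfully represent this module in TorchScript? PyTorch provides a script compiler, which analyzes the Python source code directly to transform it into TorchScript. Let's convert MyDecisionGate using the script compiler: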

scripted_gate = torch.jit.script(MyDecisionGate())  # note: the gate is scripted, not traced

my_cell = MyCell(scripted_gate)
traced_cell = torch.jit.script(my_cell)  # note: scripted as well, despite the variable name
print(traced_cell.code)

Output:

def forward(self,
    x: Tensor,
    h: Tensor) -> Tuple[Tensor, Tensor]:
  _0 = self.linear
  _1 = _0.weight
  _2 = _0.bias
  if torch.eq(torch.dim(x), 2):
    _3 = torch.__isnot__(_2, None)
  else:
    _3 = False
  if _3:
    bias = ops.prim.unchecked_unwrap_optional(_2)
    ret = torch.addmm(bias, x, torch.t(_1), beta=1, alpha=1)
  else:
    output = torch.matmul(x, torch.t(_1))
    if torch.__isnot__(_2, None):
      bias0 = ops.prim.unchecked_unwrap_optional(_2)
      output0 = torch.add_(output, bias0, alpha=1)
    else:
      output0 = output
    ret = output0
  _4 = torch.gt(torch.sum(ret, dtype=None), 0)
  if bool(_4):
    _5 = ret
  else:
    _5 = torch.neg(ret)
  new_h = torch.tanh(torch.add(_5, h, alpha=1))
  return (new_h, new_h)

We have now faithfully captured the behavior of the program in TorchScript. Let's try running it:

# New inputs
x, h = torch.rand(3, 4), torch.rand(3, 4)
print(traced_cell(x, h))

Output:

(tensor([[ 0.3430, -0.3471,  0.7990,  0.8313],
        [-0.4042, -0.3058,  0.7758,  0.8332],
        [-0.3002, -0.3926,  0.8468,  0.7715]],
       grad_fn=<DifferentiableGraphBackward>), tensor([[ 0.3430, -0.3471,  0.7990,  0.8313],
        [-0.4042, -0.3058,  0.7758,  0.8332],
        [-0.3002, -0.3926,  0.8468,  0.7715]],
       grad_fn=<DifferentiableGraphBackward>))
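As a sanity check on the claim that scripting preserves control flow, the sketch below compares a scripted gate with its eager counterpart on inputs that exercise both branches. Gate is a hypothetical stand-in for MyDecisionGate and the inputs are chosen for illustration; torch.jit.script needs access to the class source, so this is meant to be run from a file.

```python
import torch


class Gate(torch.nn.Module):
    def forward(self, x):
        if x.sum() > 0:
            return x
        else:
            return -x


eager_gate = Gate()
scripted_gate = torch.jit.script(eager_gate)

positive = torch.ones(2, 2)
negative = -torch.ones(2, 2)

# Both branches survive compilation, so the results agree for either sign.
same_on_positive = torch.equal(scripted_gate(positive), eager_gate(positive))
same_on_negative = torch.equal(scripted_gate(negative), eager_gate(negative))
print(same_on_positive, same_on_negative)
```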

Note: the experiments in this article were run with PyTorch 1.2.0+cu92.

Mixing Scripting and Tracing

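In some situations tracing is preferable to scripting, for example when a module has many conditional branches that we do not want to appear in TorchScript. In such cases scripting can be composed with tracing: torch.jit.script inlines the code of a traced module, and tracing inlines the code of a scripted module.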

Building on the example above:

class MyDecisionGate(torch.nn.Module):
  def forward(self, x):
    if x.sum() > 0:
      return x
    else:
      return -x

class MyCell(torch.nn.Module):
    def __init__(self, dg):
        super(MyCell, self).__init__()
        self.dg = dg
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.dg(self.linear(x)) + h)
        return new_h, new_h

scripted_gate = torch.jit.script(MyDecisionGate())
x, h = torch.rand(3, 4), torch.rand(3, 4)

class MyRNNLoop(torch.nn.Module):
    def __init__(self):
        super(MyRNNLoop, self).__init__()
        self.cell = torch.jit.trace(MyCell(scripted_gate), (x, h))  # note: tracing a cell that contains a scripted gate

    def forward(self, xs):
        h, y = torch.zeros(3, 4), torch.zeros(3, 4)
        for i in range(xs.size(0)):
            y, h = self.cell(xs[i], h)
        return y, h

rnn_loop = torch.jit.script(MyRNNLoop())
print(rnn_loop.code)

Output:

def forward(self,
    xs: Tensor) -> Tuple[Tensor, Tensor]:
  h = torch.zeros([3, 4], dtype=None, layout=None, device=None, pin_memory=None)
  y = torch.zeros([3, 4], dtype=None, layout=None, device=None, pin_memory=None)
  y0, h0 = y, h
  for i in range(torch.size(xs, 0)):
    _0 = self.cell
    _1 = torch.select(xs, 0, i)
    _2 = _0.linear
    weight = _2.weight
    bias = _2.bias
    _3 = torch.addmm(bias, _1, torch.t(weight), beta=1, alpha=1)
    _4 = torch.gt(torch.sum(_3, dtype=None), 0)
    if bool(_4):
      _5 = _3
    else:
      _5 = torch.neg(_3)
    _6 = torch.tanh(torch.add(_5, h0, alpha=1))
    y0, h0 = _6, _6
  return (y0, h0)

Next we wrap this in one more layer, a WrapRNN class:

class MyDecisionGate(torch.nn.Module):
  def forward(self, x):
    if x.sum() > 0:
      return x
    else:
      return -x

class MyCell(torch.nn.Module):
    def __init__(self, dg):
        super(MyCell, self).__init__()
        self.dg = dg
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.dg(self.linear(x)) + h)
        return new_h, new_h

scripted_gate = torch.jit.script(MyDecisionGate())
x, h = torch.rand(3, 4), torch.rand(3, 4)

class MyRNNLoop(torch.nn.Module):
    def __init__(self):
        super(MyRNNLoop, self).__init__()
        self.cell = torch.jit.trace(MyCell(scripted_gate), (x, h))  # note: tracing a cell that contains a scripted gate

    def forward(self, xs):
        h, y = torch.zeros(3, 4), torch.zeros(3, 4)
        for i in range(xs.size(0)):
            y, h = self.cell(xs[i], h)
        return y, h

class WrapRNN(torch.nn.Module):
  def __init__(self):
    super(WrapRNN, self).__init__()
    self.loop = torch.jit.script(MyRNNLoop())

  def forward(self, xs):
    y, h = self.loop(xs)
    return torch.relu(y)

traced = torch.jit.trace(WrapRNN(), (torch.rand(10, 3, 4)))
print(traced.code)

Output:

def forward(self,
    argument_1: Tensor) -> Tensor:
  _0 = self.loop
  h = torch.zeros([3, 4], dtype=None, layout=None, device=None, pin_memory=None)
  h0 = h
  for i in range(torch.size(argument_1, 0)):
    _1 = _0.cell
    _2 = torch.select(argument_1, 0, i)
    _3 = _1.linear
    weight = _3.weight
    bias = _3.bias
    _4 = torch.addmm(bias, _2, torch.t(weight), beta=1, alpha=1)
    _5 = torch.gt(torch.sum(_4, dtype=None), 0)
    if bool(_5):
      _6 = _4
    else:
      _6 = torch.neg(_4)
    h0 = torch.tanh(torch.add(_6, h0, alpha=1))
  return torch.relu(h0)

Saving and loading TorchScript models

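PyTorch provides APIs to save TorchScript modules to disk and load them back in an archive format. The format includes code, parameters, attributes, and debug information, so the archive is a freestanding representation of the model that can be loaded in an entirely separate process.
Let's save and load the wrapped RNN model from the example above: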

traced.save('wrapped_rnn.zip')

loaded = torch.jit.load('wrapped_rnn.zip')

print(loaded)
print(loaded.code)

Output:

ScriptModule(
  (loop): ScriptModule(
    (cell): ScriptModule(
      (dg): ScriptModule()
      (linear): ScriptModule()
    )
  )
)
def forward(self,
    argument_1: Tensor) -> Tensor:
  _0 = self.loop
  h = torch.zeros([3, 4], dtype=None, layout=None, device=None, pin_memory=None)
  h0 = h
  for i in range(torch.size(argument_1, 0)):
    _1 = _0.cell
    _2 = torch.select(argument_1, 0, i)
    _3 = _1.linear
    weight = _3.weight
    bias = _3.bias
    _4 = torch.addmm(bias, _2, torch.t(weight), beta=1, alpha=1)
    _5 = torch.gt(torch.sum(_4, dtype=None), 0)
    if bool(_5):
      _6 = _4
    else:
      _6 = torch.neg(_4)
    h0 = torch.tanh(torch.add(_6, h0, alpha=1))
  return torch.relu(h0)

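As the output shows, serialization preserves the module hierarchy and the code. The model can also be loaded into C++ for execution without Python. The rest of this article shows how to load the model and run inference in C++.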
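Before handing an archive to another process, it can be worth checking on the Python side that a saved module reloads and computes the same values. The sketch below is one hedged way to do that. It assumes a tiny torch.nn.Linear model as a stand-in (not the RNN above) and uses an in-memory buffer instead of a file on disk.

```python
import io

import torch

model = torch.nn.Linear(4, 2)
model.eval()
example = torch.rand(1, 4)
traced = torch.jit.trace(model, example)

# Serialize into an in-memory buffer, then load it back.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
loaded = torch.jit.load(buffer)

# The reloaded module should reproduce the original outputs.
with torch.no_grad():
    outputs_match = torch.allclose(traced(example), loaded(example))
print(outputs_match)
```

The same check works with a file path in place of the buffer, which is the form the C++ loader expects.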

Loading a TorchScript model in C++

Step 1: Converting the PyTorch model to Torch Script

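A PyTorch model's journey from Python to C++ goes through Torch Script. Torch Script is a representation of a PyTorch model that can be understood, compiled, and serialized by the Torch Script compiler. If you wrote your PyTorch model with the ordinary "eager" API, you must first convert it to Torch Script.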

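The previous sections covered two ways to convert a PyTorch model to Torch Script. The first is tracing, which evaluates the model once on example inputs and records the flow of those inputs through the model. It suits models that make limited use of control flow. The second is to add explicit annotations to the model so that the Torch Script compiler can parse and compile the model code directly. See the Torch Script reference for details.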

Converting via tracing

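To convert a PyTorch model to Torch Script via tracing, pass an instance of the model together with an example input to torch.jit.trace. This produces a torch.jit.ScriptModule object with the trace of the model evaluation embedded in its forward method.
For example: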

import torch
import torchvision

# An instance of your model.
model = torchvision.models.resnet18()

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)

The traced ScriptModule can now be evaluated just like a regular PyTorch module.

output = traced_script_module(torch.ones(1, 3, 224, 224))
print(output[0, :5])

Output:

tensor([0.7741, 0.0539, 0.6656, 0.7301, 0.2207], grad_fn=<SliceBackward>)
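One detail worth noting: the snippet above traces the model without calling model.eval(), whereas the image classification script later in this article does call it before tracing. A trace records the behavior of the mode the module was in at trace time, so for inference it is usually safer to switch to eval mode first. The sketch below illustrates this with a small, made-up Sequential model containing Dropout; it is a stand-in for ResNet18, not a replacement for the code above.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.Dropout(p=0.5),
    torch.nn.Linear(8, 2),
)
model.eval()  # inference mode: Dropout becomes a no-op before we trace

example = torch.rand(1, 8)
traced = torch.jit.trace(model, example)

with torch.no_grad():
    first = traced(example)
    second = traced(example)
    eager = model(example)

# Repeated calls agree with each other and with the eager eval-mode model.
is_deterministic = torch.equal(first, second)
matches_eager = torch.allclose(first, eager)
print(is_deterministic, matches_eager)
```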

Converting via annotation

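In some cases, for example when the model uses particular forms of control flow, it may be better to write the model directly in Torch Script and annotate it accordingly. Take the following PyTorch model as an example: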

import torch

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
          output = self.weight.mv(input)
        else:
          output = self.weight + input
        return output

Because the forward method of this module uses control flow that depends on the input, it is not suitable for tracing. Instead, we can convert it to a ScriptModule by compiling the module with torch.jit.script:

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
          output = self.weight.mv(input)
        else:
          output = self.weight + input
        return output

my_module = MyModule(10,20)
sm = torch.jit.script(my_module)

In addition, methods of an nn.Module that should not be compiled (TorchScript does not yet support some Python features) can be excluded from compilation with @torch.jit.ignore.

Step 2: Serializing the Script Module to a file

Once you have a ScriptModule (from either tracing or annotation), you can serialize it to a file for later use in another environment such as C++:

traced_script_module.save("traced_resnet_model.pt")

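To serialize the scripted version of my_module as well, call save on the ScriptModule returned by torch.jit.script, that is, sm.save("my_module_model.pt"). The plain nn.Module my_module itself has no save method.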

Step 3: Loading the Torch Script module in C++

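Loading a serialized PyTorch model in C++ requires the PyTorch C++ API, that is, the LibTorch library. The LibTorch distribution contains shared libraries, header files, and CMake build configuration files.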

A minimal C++ application

The contents of example-app.cpp:

#include <torch/script.h> // One-stop header.

#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }


  torch::jit::script::Module module;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    module = torch::jit::load(argv[1]);
  }
  catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::cout << "ok\n";
}

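The <torch/script.h> header pulls in everything from LibTorch needed to run the example. The program takes the path to a serialized ScriptModule as its only argument and loads it with torch::jit::load(), which returns a torch::jit::script::Module object.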

Dependencies and building

The corresponding CMakeLists.txt:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(custom_ops)

find_package(Torch REQUIRED)

add_executable(example-app example-app.cpp)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 11)

Download LibTorch from the official PyTorch website and unpack it.

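In the unpacked directory, lib contains the shared libraries to link against; include contains the header files used by the program; share contains the CMake configuration that makes the find_package(Torch) command above work.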

Finally, build the application. Assume the following directory layout:

example-app/
  CMakeLists.txt
  example-app.cpp

Run the following commands from within the example-app/ folder to build the application:

mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=/home/data1/devtools/libtorch/ ..
make

Here CMAKE_PREFIX_PATH (passed with -D) is set to the location where LibTorch was unpacked.
After building, run the application like this:

./example-app <path_to_model>/traced_resnet_model.pt

Step 4: Executing the Script Module in C++

At this point we can load the serialized ResNet18 in C++. What remains is to run the model for inference:

// Create a vector of inputs.
std::vector<torch::jit::IValue> inputs;
inputs.push_back(torch::ones({1, 3, 224, 224}));

// Execute the model and turn its output into a tensor.
at::Tensor output = module.forward(inputs).toTensor();
std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';

The first two statements build the model input. We then call the forward method of script::Module; the return value is an IValue, which we convert to a tensor with toTensor().

Note: to run the model on the GPU, move it there with module.to(at::kCUDA);. Make sure the model's inputs are also in CUDA memory, for example with tensor.to(at::kCUDA), which returns a new tensor located in CUDA memory.

Image classification example

Environment setup

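You need cmake, OpenCV, and PyTorch 1.2 installed beforehand. Installing OpenCV may surface environment issues such as a GCC version that is too old (this article uses GCC 5.2); these are not covered here.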

Loading the model in C++

We use the resnet18 model for image classification as the example.

Step 1: Converting the PyTorch model to Torch Script

Run the following script:

import torch
import torchvision
from torchvision import transforms
from PIL import Image
from time import time
import numpy as np

# An instance of your model.
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("model.pt")

# evaluate time
batch = torch.rand(64, 3, 224, 224)
start = time()
output = traced_script_module(batch)
stop = time()
print(str(stop-start) + "s")

# read image
image = Image.open('dog.png').convert('RGB')
default_transform = transforms.Compose([
        transforms.Resize([224, 224]),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
      ])
image = default_transform(image)

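# forward (no gradients are needed for inference)
with torch.no_grad():
    output = traced_script_module(image.unsqueeze(0))
print(output[0, :10])

# print top-5 predicted labels
# Read one label per line; this avoids np.loadtxt with a newline delimiter,
# which newer NumPy versions reject.
with open('synset_words.txt') as f:
    labels = [line.strip() for line in f]

data_out = output[0].numpy()
sorted_idxs = np.argsort(-data_out)

# Ranks are printed starting from 1, matching the C++ program below.
for i, idx in enumerate(sorted_idxs[:5]):
    print('top-%d label: %s, score: %f' % (i + 1, labels[idx], data_out[idx]))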

This produces model.pt.

Step 2: Calling Torch Script from C++

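(1) Download and unpack LibTorch first; its path must be passed to cmake when configuring the build.
(2) Use cmake to build the application code, that is, the code that uses Torch Script: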

mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=/home/data1/devtools/libtorch ..
make

(3) Run:

./example-app ../model.pt ../dog.png ../synset_words.txt

Output:

top-1 label:n02108422 bull mastiff
its score:17.9795
top-2 label:n02093428 American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier
its score:13.3846
top-3 label:n02109047 Great Dane
its score:12.8465
top-4 label:n02093256 Staffordshire bullterrier, Staffordshire bull terrier
its score:12.1885
top-5 label:n02110958 pug, pug-dog
its score:11.9975

As the output shows, the top prediction is n02108422 bull mastiff.
[Image: the input image, dog.png]
[Image: a bull mastiff found by web search, for comparison]
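The scores printed above are raw network outputs (logits), which is why they are well above 1. If probabilities are wanted, a softmax over all classes can be applied before taking the top entries. The sketch below shows the idea on a made-up six-class logit vector with hypothetical label names; it does not use the real ResNet18 output or synset_words.txt.

```python
import torch

# Hypothetical logits for one image over six classes.
logits = torch.tensor([[2.0, 0.5, 4.0, 1.0, 3.0, -1.0]])
labels = ["class_a", "class_b", "class_c", "class_d", "class_e", "class_f"]

# Softmax turns logits into probabilities that sum to 1 over all classes.
probs = torch.softmax(logits, dim=1)

# topk returns the k largest values and their indices, sorted descending.
top_probs, top_idxs = torch.topk(probs, k=3, dim=1)

for rank, (p, idx) in enumerate(
    zip(top_probs[0].tolist(), top_idxs[0].tolist()), start=1
):
    print("top-%d label: %s, probability: %.4f" % (rank, labels[idx], p))
```

Softmax does not change the ranking, so the top labels are the same as when sorting the raw logits.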

Full C++ code:

#include <torch/script.h>
#include <torch/torch.h>
//#include <torch/serialize/Tensor.h>
#include <ATen/Tensor.h>
#include <opencv2/opencv.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/imgproc/types_c.h>

#include <cstdio>    // printf
#include <fstream>   // std::ifstream
#include <iostream>
#include <memory>
#include <string>
#include <tuple>     // std::tuple, std::get
#include <vector>

/* main */
int main(int argc, const char* argv[]) {
  if (argc < 4) {
    std::cerr << "usage: example-app <path-to-exported-script-module> "
      << "<path-to-image>  <path-to-category-text>\n";
    return -1;
  }

  // Deserialize the ScriptModule from a file using torch::jit::load().
  //std::shared_ptr<torch::jit::script::Module> module = torch::jit::load(argv[1]);
  torch::jit::script::Module module = torch::jit::load(argv[1]);  
  std::cout << "load model ok\n";

  // Create a vector of inputs.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::rand({64, 3, 224, 224}));

  // evaluate time
  double t = (double)cv::getTickCount();
  module.forward(inputs).toTensor();
  t = (double)cv::getTickCount() - t;
  printf("execution time = %gs\n", t / cv::getTickFrequency());
  inputs.pop_back();

  // load image with opencv and transform
  cv::Mat image;
  image = cv::imread(argv[2], 1);
  cv::cvtColor(image, image, CV_BGR2RGB);
  cv::Mat img_float;
  image.convertTo(img_float, CV_32F, 1.0/255);
  cv::resize(img_float, img_float, cv::Size(224, 224));
  //std::cout << img_float.at<cv::Vec3f>(56,34)[1] << std::endl;
  //auto img_tensor = torch::CPU(torch::kFloat32).tensorFromBlob(img_float.data, {1, 224, 224, 3});
  auto img_tensor = torch::from_blob(img_float.data, {1, 224, 224, 3});//.to(torch::CPU);
  img_tensor = img_tensor.permute({0,3,1,2});
  img_tensor[0][0] = img_tensor[0][0].sub_(0.485).div_(0.229);
  img_tensor[0][1] = img_tensor[0][1].sub_(0.456).div_(0.224);
  img_tensor[0][2] = img_tensor[0][2].sub_(0.406).div_(0.225);
  inputs.push_back(img_tensor);
  
  // Execute the model and turn its output into a tensor.
  torch::Tensor out_tensor = module.forward(inputs).toTensor();
  std::cout << out_tensor.slice(/*dim=*/1, /*start=*/0, /*end=*/10) << '\n';

  // Load labels
  std::string label_file = argv[3];
  std::ifstream rf(label_file.c_str());
  if (!rf) {
    std::cerr << "Unable to open labels file " << label_file << "\n";
    return -1;
  }
  std::string line;
  std::vector<std::string> labels;
  while (std::getline(rf, line))
    labels.push_back(line);
  std::cout << "Found all " << labels.size() << " labels"<<std::endl;
  // print predicted top-5 labels
  std::tuple<torch::Tensor,torch::Tensor> result = out_tensor.sort(-1, true);
  torch::Tensor top_scores = std::get<0>(result)[0];
  torch::Tensor top_idxs = std::get<1>(result)[0].toType(torch::kInt32);
  
  auto top_scores_a = top_scores.accessor<float,1>();
  auto top_idxs_a = top_idxs.accessor<int,1>();

  for (int i = 0; i < 5; ++i)
  {
    int idx = top_idxs_a[i];
    std::cout<<"top-" << i+1 << " label:"<<labels[idx]<<std::endl;
    //printf("top-%s")
    std::cout<<"its score:"<<top_scores_a[i]<<std::endl;
  }
//  cv::imshow("image", image);
//  cv::waitKey(0);
  return 0;
}
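The preprocessing in the C++ listing mirrors the Python transforms used in Step 1: scale pixel values to [0, 1], move from height-width-channel (HWC) to channel-height-width (CHW) layout, then normalize each channel as (x - mean) / std. The following sketch reproduces that arithmetic on a random stand-in image; the tensor names are illustrative and no real image file is read.

```python
import torch

# Stand-in for a decoded 224x224 RGB image in HWC layout, values in [0, 255].
hwc_uint8 = torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8)

# Per-channel statistics, the same values used in the listings above.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# HWC -> CHW, then scale to [0, 1].
chw = hwc_uint8.permute(2, 0, 1).float() / 255.0

# Broadcast the (3, 1, 1) statistics over height and width.
normalized = (chw - mean) / std

# Add the batch dimension expected by the model: (1, 3, 224, 224).
batch = normalized.unsqueeze(0)
print(batch.shape)
```

Keeping the Python and C++ preprocessing identical matters because the model was trained on inputs normalized this way; a mismatch in channel order or scaling would silently degrade predictions.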

References

https://pytorch.org/blog/model-serving-in-pyorch/
https://medium.com/datadriveninvestor/deploy-your-pytorch-model-to-production-f69460192217
https://github.com/iamhankai/cpp-pytorch
https://pytorch.org/tutorials/advanced/cpp_export.html#step-1-converting-your-pytorch-model-to-torch-script
