TensorRT in Practice (2): Building a Simple VGG16 Network with the TRT Python API

2020-01-01 Initial version
2020-01-10 Changed the VGG structure to match torchvision.models.vgg; updated the code


1. Loading the Weights and Building the Network

Refer to the official TRT python_samples. Note that this sample targets TRT 6.0; TRT has since moved to 7.0, but the Release Notes show no API changes between 6.0 and 7.0 in this area, so there is nothing to worry about. Also, since you must download the entire TRT package to see the Python API documentation bundled with this Python sample, the link I give here is to my own repository. NVIDIA does provide CPP API documentation on GitHub, see Building a Simple MNIST Network Layer by Layer, but this article builds the network with the Python API, so the CPP API will not be discussed further.

1.1 Analyzing the Source Code

python_samples/network_api_pytorch_mnist contains README.md, model.py, sample.py, and requirements.txt. Clearly, the two files we need to study are model.py and sample.py: model.py builds the MNIST network with PyTorch, while sample.py builds it with the TRT API. The former covers both training and testing; the latter only runs inference, which is why it omits the F.log_softmax operation. Below are the core pieces I extracted; anyone familiar with both frameworks will understand them at a glance:

model.py

import torch.nn as nn
import torch.nn.functional as F

# Network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=5)
        self.conv2 = nn.Conv2d(20, 50, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(800, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.max_pool2d(self.conv1(x), kernel_size=2, stride=2)
        x = F.max_pool2d(self.conv2(x), kernel_size=2, stride=2)
        x = x.view(-1, 800)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

sample.py

def populate_network(network, weights):
    # Configure the network layers based on the weights provided.
    # Mark the network input
    input_tensor = network.add_input(name=ModelData.INPUT_NAME, dtype=ModelData.DTYPE, shape=ModelData.INPUT_SHAPE)

    # Corresponds to PyTorch's self.conv1
    conv1_w = weights['conv1.weight'].numpy()
    conv1_b = weights['conv1.bias'].numpy()
    conv1 = network.add_convolution(input=input_tensor, num_output_maps=20, kernel_shape=(5, 5), kernel=conv1_w, bias=conv1_b)
    conv1.stride = (1, 1)

    # Corresponds to PyTorch's F.max_pool2d
    pool1 = network.add_pooling(input=conv1.get_output(0), type=trt.PoolingType.MAX, window_size=(2, 2))
    pool1.stride = (2, 2)

    # Corresponds to PyTorch's self.conv2
    conv2_w = weights['conv2.weight'].numpy()
    conv2_b = weights['conv2.bias'].numpy()
    conv2 = network.add_convolution(pool1.get_output(0), 50, (5, 5), conv2_w, conv2_b)
    conv2.stride = (1, 1)

    # Corresponds to PyTorch's F.max_pool2d
    pool2 = network.add_pooling(conv2.get_output(0), trt.PoolingType.MAX, (2, 2))
    pool2.stride = (2, 2)

    # Corresponds to PyTorch's self.fc1
    fc1_w = weights['fc1.weight'].numpy()
    fc1_b = weights['fc1.bias'].numpy()
    fc1 = network.add_fully_connected(input=pool2.get_output(0), num_outputs=500, kernel=fc1_w, bias=fc1_b)

    # Corresponds to PyTorch's F.relu
    relu1 = network.add_activation(input=fc1.get_output(0), type=trt.ActivationType.RELU)

    # Corresponds to PyTorch's self.fc2
    fc2_w = weights['fc2.weight'].numpy()
    fc2_b = weights['fc2.bias'].numpy()
    fc2 = network.add_fully_connected(relu1.get_output(0), ModelData.OUTPUT_SIZE, fc2_w, fc2_b)

    # Name this layer's output
    fc2.get_output(0).name = ModelData.OUTPUT_NAME
    # Mark the network output
    network.mark_output(tensor=fc2.get_output(0))

In populate_network in sample.py, network is what gets returned (populated) and weights is the input; they correspond to Net in model.py and to Net.state_dict(). Note that weights is loaded onto the CPU.

Comparing the two side by side: in PyTorch, the input x passes through the conv1 convolution, F.max_pool2d pooling, the conv2 convolution, F.max_pool2d pooling again, view flattening, the fc1 fully connected layer, relu activation, the fc2 fully connected layer, and finally F.log_softmax to produce the output probability distribution. In TRT, the whole chain must behave identically, except that TRT does no training, so log_softmax can be dropped. The correspondence boils down to the following table: simple, but verbose:

PyTorch Operators -> TRT API Operators

self.conv1 = nn.Conv2d(1, 20, kernel_size=5)
    -> conv1_w = weights['conv1.weight'].numpy()
       conv1_b = weights['conv1.bias'].numpy()
       conv1 = network.add_convolution(input=input_tensor, num_output_maps=20, kernel_shape=(5, 5), kernel=conv1_w, bias=conv1_b)
       conv1.stride = (1, 1)

F.max_pool2d(self.conv1(x), kernel_size=2, stride=2)
    -> pool1 = network.add_pooling(input=conv1.get_output(0), type=trt.PoolingType.MAX, window_size=(2, 2))
       pool1.stride = (2, 2)

self.conv2 = nn.Conv2d(20, 50, kernel_size=5)
    -> conv2_w = weights['conv2.weight'].numpy()
       conv2_b = weights['conv2.bias'].numpy()
       conv2 = network.add_convolution(pool1.get_output(0), 50, (5, 5), conv2_w, conv2_b)
       conv2.stride = (1, 1)

F.max_pool2d(self.conv2(x), kernel_size=2, stride=2)
    -> pool2 = network.add_pooling(conv2.get_output(0), trt.PoolingType.MAX, (2, 2))
       pool2.stride = (2, 2)

self.fc1 = nn.Linear(800, 500)
    -> fc1_w = weights['fc1.weight'].numpy()
       fc1_b = weights['fc1.bias'].numpy()
       fc1 = network.add_fully_connected(input=pool2.get_output(0), num_outputs=500, kernel=fc1_w, bias=fc1_b)

F.relu(self.fc1(x))
    -> relu1 = network.add_activation(input=fc1.get_output(0), type=trt.ActivationType.RELU)

self.fc2 = nn.Linear(500, 10)
    -> fc2_w = weights['fc2.weight'].numpy()
       fc2_b = weights['fc2.bias'].numpy()
       fc2 = network.add_fully_connected(relu1.get_output(0), ModelData.OUTPUT_SIZE, fc2_w, fc2_b)

F.log_softmax(x, dim=1)
    -> THERE IS NO NEED…

Note

For the TRT API, network is a tensorrt.INetworkDefinition object; its network.add_xxx member functions append layers to it. Each layer is a tensorrt.ILayer: the different layer types derive from this base class and have their own behavior. These layers are all TRT classes, and a layer's .get_output(0) returns only a tensorrt.ITensor; you cannot print its contents while building the network. This is the distinction between TRT's build time and run time, covered in the dynamic-shapes chapter of the TensorRT Developer Guide. If you need to debug individual layers, you can only print the shape of the tensor via xxx_layer.get_output(0).shape, or inspect outputs at run time after the engine is built; stepping through the internals is impossible.
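For instance, the only build-time "debugging" available is shape metadata. A sketch against the MNIST network above (with the sample's implicit-batch network, the batch dimension is not shown):

print(conv1.get_output(0).shape)  # e.g. (20, 24, 24): 5x5 conv, no padding, on a 1x28x28 input
print(pool1.get_output(0).shape)  # e.g. (20, 12, 12): after the 2x2, stride-2 max pool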

In short, the behavior of the TRT API can be summarized as:

input_tensor = network.add_input(...)
      |
a_layer = network.add_xxx(input=input_tensor)
      |  a_layer.get_output(0)
b_layer = network.add_xxx(input=a_layer.get_output(0))
      |  b_layer.get_output(0)
network.mark_output(tensor=b_layer.get_output(0))

add_input, the add_xxx layer calls, get_output(0), mark_output, plus a clear picture of the network structure: if you have these five things down, congratulations, you have crossed the TensorRT entry threshold.

1.2 Building the Network

So how does this apply to VGG? First you must know the network's structure. Taking VGG16 as an example, it looks like the following figure:

[Figure: VGG16 network architecture]

In more detail, the network parameters are listed in the table below; the structure used here is VGG16 as provided by PyTorch's torchvision.models.vgg:

VGG16 Config
Conv-3x3-64-strd1-pad1
Relu
Conv-3x3-64-strd1-pad1
Relu
Maxpool-2x2-strd2-pad0
Conv-3x3-128-strd1-pad1
Relu
Conv-3x3-128-strd1-pad1
Relu
Maxpool-2x2-strd2-pad0
Conv-3x3-256-strd1-pad1
Relu
Conv-3x3-256-strd1-pad1
Relu
Conv-3x3-256-strd1-pad1
Relu
Maxpool-2x2-strd2-pad0
Conv-3x3-512-strd1-pad1
Relu
Conv-3x3-512-strd1-pad1
Relu
Conv-3x3-512-strd1-pad1
Relu
Maxpool-2x2-strd2-pad0
Conv-3x3-512-strd1-pad1
Relu
Conv-3x3-512-strd1-pad1
Relu
Conv-3x3-512-strd1-pad1
Relu
Maxpool-2x2-strd2-pad0
Avgpool-1x1-strd1-pad0
FC-4096
Relu
FC-4096
Relu
FC-1000

In the torchvision.models.vgg source, VGG16 has an avgpool block between the features block and the classifier block containing PyTorch's built-in AdaptiveAvgPool2d layer. Here I replace it with a plain AvgPool; the parameter choice follows section 5 of the previous article, PyTorch2ONNX2TensorRT 踩坑日誌 (replacing AdaptivePooling with AvgPooling). The source also puts Dropout after the fully connected layers; that only guards against overfitting during training and is not needed at inference, so I drop it entirely. In total we call add_convolution 13 times, add_fully_connected 3 times, add_activation 15 times, and add_pooling 6 times; the python_samples pattern handles all of it.
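As a quick check on that substitution, here is the stride/kernel arithmetic from the rule in that article, assuming the 224x224 input used below:

# AdaptiveAvgPool2d -> fixed AvgPool (sketch): stride = in // out, kernel = in - (out - 1) * stride
in_size, out_size = 7, 7                      # 224 / 2**5 = 7 after the five 2x2, stride-2 max pools
stride = in_size // out_size                  # = 1
kernel = in_size - (out_size - 1) * stride    # = 1
# hence the 1x1 average pool in the full code below: an identity mapping at this resolution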

"""
讀入VGG16的權重,這裏我使用的VGG16的信息如下:
    Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 2117-2125.
    [VGG16 pre-trained weight](https://drive.google.com/open?id=1jOBAqe4fPFMCgRnYt794lYgSAlx4hwCj)

"""
weights = torch.load('./vgg16_20M.pth', map_location='cpu')
for k, v in weights.items():
    print("Layer: {}".format(k))

Running the code above prints the name of every layer; then simply substitute each name into weights['xxx'].numpy() one by one.
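With the torchvision-style VGG16 weights used here, the printout looks like this (truncated):

Layer: features.0.weight
Layer: features.0.bias
Layer: features.2.weight
Layer: features.2.bias
...
Layer: classifier.6.weight
Layer: classifier.6.bias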

1.3 Complete Code

The code is long, but once the pattern is familiar, writing it is pure repetition. DTYPE in ModelData must explicitly declare the data type the model runs in: for FP32 inference set ModelData.DTYPE = trt.float32; for FP16 inference set ModelData.DTYPE = trt.float16 and additionally force builder.fp16_mode = True on the builder.

The full code is at github -> i_just_want_a_simple_demo/trt_api_pytorch/vgg16_sample/

import tensorrt as trt


class ModelData(object):
    INPUT_NAME  = "in_frame"
    # P, C, H, W
    INPUT_SHAPE = (1, 3, 224, 224)
    OUTPUT_NAME = "out_frame"
    DTYPE       = trt.float32


def populate_network(network, weights):
    # Configure the network layers based on the weights provided.
    input_tensor      = network.add_input(name=ModelData.INPUT_NAME, dtype=ModelData.DTYPE, shape=ModelData.INPUT_SHAPE)

    # VGG16 features
    # VGG16_block_1
    vgg16_f0_w        = weights['features.0.weight'].numpy()
    vgg16_f0_b        = weights['features.0.bias'].numpy()
    vgg16_f0          = network.add_convolution(input=input_tensor, num_output_maps=64, kernel_shape=(3, 3), kernel=vgg16_f0_w, bias=vgg16_f0_b)
    vgg16_f0.padding  = (1, 1)
    vgg16_f0.name     = 'vgg16_conv_1_1'
    vgg16_f1          = network.add_activation(input=vgg16_f0.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f1.name     = 'vgg16_relu_1_1'
    vgg16_f2_w        = weights['features.2.weight'].numpy()
    vgg16_f2_b        = weights['features.2.bias'].numpy()
    vgg16_f2          = network.add_convolution(input=vgg16_f1.get_output(0), num_output_maps=64, kernel_shape=(3, 3), kernel=vgg16_f2_w, bias=vgg16_f2_b)
    vgg16_f2.padding  = (1, 1)
    vgg16_f2.name     = 'vgg16_conv_1_2'
    vgg16_f3          = network.add_activation(input=vgg16_f2.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f3.name     = 'vgg16_relu_1_2'
    vgg16_f4          = network.add_pooling(input=vgg16_f3.get_output(0), type=trt.PoolingType.MAX, window_size=(2, 2))
    vgg16_f4.stride   = (2, 2)
    vgg16_f4.name     = 'vgg16_max_pool_1'

    # VGG16_block_2
    vgg16_f5_w        = weights['features.5.weight'].numpy()
    vgg16_f5_b        = weights['features.5.bias'].numpy()
    vgg16_f5          = network.add_convolution(input=vgg16_f4.get_output(0), num_output_maps=128, kernel_shape=(3, 3), kernel=vgg16_f5_w, bias=vgg16_f5_b)
    vgg16_f5.padding  = (1, 1)
    vgg16_f5.name     = "vgg16_conv_2_1"
    vgg16_f6          = network.add_activation(input=vgg16_f5.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f6.name     = 'vgg16_relu_2_1'
    vgg16_f7_w        = weights['features.7.weight'].numpy()
    vgg16_f7_b        = weights['features.7.bias'].numpy()
    vgg16_f7          = network.add_convolution(input=vgg16_f6.get_output(0), num_output_maps=128, kernel_shape=(3, 3), kernel=vgg16_f7_w, bias=vgg16_f7_b)
    vgg16_f7.padding  = (1, 1)
    vgg16_f7.name     = "vgg16_conv_2_2"
    vgg16_f8          = network.add_activation(input=vgg16_f7.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f8.name     = 'vgg16_relu_2_2'
    vgg16_f9          = network.add_pooling(input=vgg16_f8.get_output(0), type=trt.PoolingType.MAX, window_size=(2, 2))
    vgg16_f9.stride   = (2, 2)
    vgg16_f9.name     = 'vgg16_max_pool_2'

    # VGG16_block_3
    vgg16_f10_w       = weights['features.10.weight'].numpy()
    vgg16_f10_b       = weights['features.10.bias'].numpy()
    vgg16_f10         = network.add_convolution(input=vgg16_f9.get_output(0), num_output_maps=256, kernel_shape=(3, 3), kernel=vgg16_f10_w, bias=vgg16_f10_b)
    vgg16_f10.padding = (1, 1)
    vgg16_f10.name    = "vgg16_conv_3_1"
    vgg16_f11         = network.add_activation(input=vgg16_f10.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f11.name    = 'vgg16_relu_3_1'
    vgg16_f12_w       = weights['features.12.weight'].numpy()
    vgg16_f12_b       = weights['features.12.bias'].numpy()
    vgg16_f12         = network.add_convolution(input=vgg16_f11.get_output(0), num_output_maps=256, kernel_shape=(3, 3), kernel=vgg16_f12_w, bias=vgg16_f12_b)
    vgg16_f12.padding = (1, 1)
    vgg16_f12.name    = "vgg16_conv_3_2"
    vgg16_f13         = network.add_activation(input=vgg16_f12.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f13.name    = 'vgg16_relu_3_2'
    vgg16_f14_w       = weights['features.14.weight'].numpy()
    vgg16_f14_b       = weights['features.14.bias'].numpy()
    vgg16_f14         = network.add_convolution(input=vgg16_f13.get_output(0), num_output_maps=256, kernel_shape=(3, 3), kernel=vgg16_f14_w, bias=vgg16_f14_b)
    vgg16_f14.padding = (1, 1)
    vgg16_f14.name    = "vgg16_conv_3_3"
    vgg16_f15         = network.add_activation(input=vgg16_f14.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f15.name    = 'vgg16_relu_3_3'
    vgg16_f16         = network.add_pooling(input=vgg16_f15.get_output(0), type=trt.PoolingType.MAX, window_size=(2, 2))
    vgg16_f16.stride  = (2, 2)
    vgg16_f16.name    = 'vgg16_max_pool_3'

    # VGG16_block_4
    vgg16_f17_w       = weights['features.17.weight'].numpy()
    vgg16_f17_b       = weights['features.17.bias'].numpy()
    vgg16_f17         = network.add_convolution(input=vgg16_f16.get_output(0), num_output_maps=512, kernel_shape=(3, 3), kernel=vgg16_f17_w, bias=vgg16_f17_b)
    vgg16_f17.padding = (1, 1)
    vgg16_f17.name    = "vgg16_conv_4_1"
    vgg16_f18         = network.add_activation(input=vgg16_f17.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f18.name    = 'vgg16_relu_4_1'
    vgg16_f19_w       = weights['features.19.weight'].numpy()
    vgg16_f19_b       = weights['features.19.bias'].numpy()
    vgg16_f19         = network.add_convolution(input=vgg16_f18.get_output(0), num_output_maps=512, kernel_shape=(3, 3), kernel=vgg16_f19_w, bias=vgg16_f19_b)
    vgg16_f19.padding = (1, 1)
    vgg16_f19.name    = "vgg16_conv_4_2"
    vgg16_f20         = network.add_activation(input=vgg16_f19.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f20.name    = 'vgg16_relu_4_2'
    vgg16_f21_w       = weights['features.21.weight'].numpy()
    vgg16_f21_b       = weights['features.21.bias'].numpy()
    vgg16_f21         = network.add_convolution(input=vgg16_f20.get_output(0), num_output_maps=512, kernel_shape=(3, 3), kernel=vgg16_f21_w, bias=vgg16_f21_b)
    vgg16_f21.padding = (1, 1)
    vgg16_f21.name    = "vgg16_conv_4_3"
    vgg16_f22         = network.add_activation(input=vgg16_f21.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f22.name    = 'vgg16_relu_4_3'
    vgg16_f23         = network.add_pooling(input=vgg16_f22.get_output(0), type=trt.PoolingType.MAX, window_size=(2, 2))
    vgg16_f23.stride  = (2, 2)
    vgg16_f23.name    = 'vgg16_max_pool_4'

    # VGG16_block_5
    vgg16_f24_w       = weights['features.24.weight'].numpy()
    vgg16_f24_b       = weights['features.24.bias'].numpy()
    vgg16_f24         = network.add_convolution(input=vgg16_f23.get_output(0), num_output_maps=512, kernel_shape=(3, 3), kernel=vgg16_f24_w, bias=vgg16_f24_b)
    vgg16_f24.padding = (1, 1)
    vgg16_f24.name    = "vgg16_conv_5_1"
    vgg16_f25         = network.add_activation(input=vgg16_f24.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f25.name    = "vgg16_relu_5_1"
    vgg16_f26_w       = weights['features.26.weight'].numpy()
    vgg16_f26_b       = weights['features.26.bias'].numpy()
    vgg16_f26         = network.add_convolution(input=vgg16_f25.get_output(0), num_output_maps=512, kernel_shape=(3, 3), kernel=vgg16_f26_w, bias=vgg16_f26_b)
    vgg16_f26.padding = (1, 1)
    vgg16_f26.name    = "vgg16_conv_5_2"
    vgg16_f27         = network.add_activation(input=vgg16_f26.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f27.name    = "vgg16_relu_5_2"
    vgg16_f28_w       = weights['features.28.weight'].numpy()
    vgg16_f28_b       = weights['features.28.bias'].numpy()
    vgg16_f28         = network.add_convolution(input=vgg16_f27.get_output(0), num_output_maps=512, kernel_shape=(3, 3), kernel=vgg16_f28_w, bias=vgg16_f28_b)
    vgg16_f28.padding = (1, 1)
    vgg16_f28.name    = "vgg16_conv_5_3"
    vgg16_f29         = network.add_activation(input=vgg16_f28.get_output(0), type=trt.ActivationType.RELU)
    vgg16_f29.name    = "vgg16_relu_5_3"
    vgg16_f30         = network.add_pooling(input=vgg16_f29.get_output(0), type=trt.PoolingType.MAX, window_size=(2, 2))
    vgg16_f30.stride  = (2, 2)
    vgg16_f30.name    = 'vgg16_max_pool_5'

    # VGG16 nn.AdaptiveAvgPool2d((7, 7)): with a 224x224 input the feature map is already
    # 7x7 here, so a 1x1 average pool (an identity mapping at this resolution) replaces it
    vgg16_a0          = network.add_pooling(input=vgg16_f30.get_output(0), type=trt.PoolingType.AVERAGE, window_size=(1, 1))
    vgg16_a0.name     = 'vgg16_avg_pool_0'

    # VGG16 torch.flatten(x, 1)
    # There is no need for torch.flatten(x, 1), because tensorrt.IFullyConnectedLayer first
    # reshapes the input tensor from shape {P, C, H, W} into {P, C*H*W}.

    # VGG16 classifier
    # VGG16_fc_1
    vgg16_c0_w        = weights['classifier.0.weight'].numpy()
    vgg16_c0_b        = weights['classifier.0.bias'].numpy()
    vgg16_c0          = network.add_fully_connected(input=vgg16_a0.get_output(0), num_outputs=4096, kernel=vgg16_c0_w, bias=vgg16_c0_b)
    vgg16_c0.name     = "vgg16_fc_1"
    vgg16_c1          = network.add_activation(input=vgg16_c0.get_output(0), type=trt.ActivationType.RELU)
    vgg16_c1.name     = "vgg16_relu_fc_1"
    # there is no need for Dropout during inference

    # VGG16_fc_2
    vgg16_c3_w        = weights['classifier.3.weight'].numpy()
    vgg16_c3_b        = weights['classifier.3.bias'].numpy()
    vgg16_c3          = network.add_fully_connected(input=vgg16_c1.get_output(0), num_outputs=4096, kernel=vgg16_c3_w, bias=vgg16_c3_b)
    vgg16_c3.name     = "vgg16_fc_2"
    vgg16_c4          = network.add_activation(input=vgg16_c3.get_output(0), type=trt.ActivationType.RELU)
    vgg16_c4.name     = "vgg16_relu_fc_2"
    # there is no need for Dropout during inference

    # VGG16_fc_3
    vgg16_c6_w        = weights['classifier.6.weight'].numpy()
    vgg16_c6_b        = weights['classifier.6.bias'].numpy()
    vgg16_c6          = network.add_fully_connected(input=vgg16_c4.get_output(0), num_outputs=1000, kernel=vgg16_c6_w, bias=vgg16_c6_b)
    vgg16_c6.name     = "vgg16_fc_3"
    # Output
    vgg16_c6.get_output(0).name = ModelData.OUTPUT_NAME
    network.mark_output(tensor=vgg16_c6.get_output(0))
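For context, here is a minimal build sketch around populate_network; the logger, the explicit-batch flag, and the workspace size are my own illustrative choices, not part of the original sample:

import torch

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(weights):
    # Explicit batch, matching the 4-D INPUT_SHAPE above (see also section 6.1)
    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(flag) as network:
        builder.max_workspace_size = 1 << 30                  # 1 GiB of builder workspace
        builder.fp16_mode = (ModelData.DTYPE == trt.float16)  # force FP16 kernels if requested
        populate_network(network, weights)
        return builder.build_cuda_engine(network)

weights = torch.load('./vgg16_20M.pth', map_location='cpu')
engine  = build_engine(weights)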

2. Mixed Precision

(To be continued)


3. Quantization

(To be continued)


4. Performance Analysis

(To be continued)


5. Tips

5.1 How to Handle Dynamic Inputs and Outputs

Suppose the network first applies a resize that interpolates the input to a fixed size, then encodes/decodes to produce a result, and finally the output should be resized back to the input's size, i.e.:

input              -> resized_input              -> inference -> output               -> resized_output
{1, 3, in_w, in_h}    {1, 3, in_w_new, in_h_new}               {1, 3, out_w, out_h}      {1, 3, in_w, in_h}

With dynamic inputs, TRT introduces the run-time notion of a shape tensor, as opposed to an execution tensor whose input and output sizes are already pinned down when the network is built; chapter 7, Working With Dynamic Shapes, explains this in detail. A shape tensor is a one-dimensional tensor recording the size of an input tensor; the corresponding layer is IShapeLayer (see the docs). So we can fetch the input's shape as below, and then use an IResizeLayer to make the output match the input size.

input_tensor  = network.add_input("input", trt.float32, (1, 3, -1, -1))  # input order is BCWH; W and H are set to -1, i.e. dynamic, only known at runtime
input_shape   = network.add_shape(input=input_tensor)
print(input_shape.get_output(0).shape)  # prints (4,): a one-dimensional tensor whose contents are the shape of input_tensor

output_tensor = network.add_resize(input=last_layer.get_output(0))  # take the output of the final layer, last_layer
output_tensor.resize_mode   = trt.ResizeMode.LINEAR
output_tensor.align_corners = True
output_tensor.set_input(1, input_shape.get_output(0))  # resize back to the recorded input shape
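One caveat: building an engine with dynamic shapes also requires an optimization profile. A minimal sketch, where the min/opt/max shapes are illustrative assumptions:

profile = builder.create_optimization_profile()
profile.set_shape("input", min=(1, 3, 64, 64), opt=(1, 3, 512, 512), max=(1, 3, 1024, 1024))

config = builder.create_builder_config()
config.add_optimization_profile(profile)
engine = builder.build_engine(network, config)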

5.2 How to Build a BN Layer

There is no Batch Normalization layer in the TRT API; you have to assemble it yourself. By its formula, BN is a per-channel scale-and-shift transform, and the TRT API's IScaleLayer provides exactly that, so a BN layer can be built from an IScaleLayer. See the earlier article TensorRT實戰(一) 如何搭建Batch Normalization層 (How to Build a Batch Normalization Layer) for details.
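A minimal sketch of the idea (the bn1.* key names are hypothetical): IScaleLayer computes (x * scale + shift) ** power per channel, so folding the BN statistics into scale and shift gives:

import numpy as np

eps   = 1e-5
gamma = weights['bn1.weight'].numpy()        # hypothetical state_dict keys
beta  = weights['bn1.bias'].numpy()
mean  = weights['bn1.running_mean'].numpy()
var   = weights['bn1.running_var'].numpy()

# y = gamma * (x - mean) / sqrt(var + eps) + beta, folded into scale/shift form
scale = gamma / np.sqrt(var + eps)
shift = beta - mean * scale
power = np.ones_like(scale)

bn = network.add_scale(input=prev_layer.get_output(0), mode=trt.ScaleMode.CHANNEL,
                       shift=shift, scale=scale, power=power)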

5.3 How to Set a Pooling Layer's ceil mode

At first glance there is no ceil_mode field on tensorrt.IPoolingLayer in the TRT API. Recall what ceil_mode=True actually does: "when True, will use ceil instead of floor to compute the output shape". In other words, when the pooling window does not evenly divide the input tensor's size, an extra ring of padding is added along the tensor's bottom and right edges (drawn as -NaN in the figure below); the figure makes the effect of ceil_mode clear.

[Figure: illustration of ceil_mode padding on the pooling input]

The TRT API counterpart is tensorrt.PaddingMode.EXPLICIT_ROUND_UP; the official documentation, torch2trt, and TensorRT#84 all corroborate this.
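So a PyTorch MaxPool2d(2, stride=2, ceil_mode=True) should translate to something like the following sketch:

pool = network.add_pooling(input=prev_layer.get_output(0), type=trt.PoolingType.MAX, window_size=(2, 2))
pool.stride = (2, 2)
pool.padding_mode = trt.PaddingMode.EXPLICIT_ROUND_UP  # use ceil instead of floor when computing the output shape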


6. Common Errors

6.1 mEngine.getHasImplicitBatchDim()

6.1.1 Error Message

[TensorRT] ERROR: Parameter check failed at: engine.cpp::executeV2::701, condition: !mEngine.getHasImplicitBatchDim()

The inference call I use is execute_v2, hence the executeV2 in the message. The root cause is that the engine was built with an implicit batch dimension: builder.create_network was never told whether the batch should be implicit or explicit, while execute_v2 requires an explicit-batch engine.

6.1.2 Solution

Checking the TRT documentation, the prototype is create_network(self: tensorrt.tensorrt.Builder, flags: int = 0) → tensorrt.tensorrt.INetworkDefinition, so the code below is all it takes to request an explicit batch.

flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
builder.create_network(flag)

6.2 mEngine.bindingIsInput(bindingIndex)

6.2.1 Error Message

[TensorRT] ERROR: Parameter check failed at: engine.cpp::setBindingDimensions::893, condition: mEngine.bindingIsInput(bindingIndex)

The network uses dynamic inputs, but the binding shape was never set on the context before inference.

6.2.2 Solution

Checking the documentation, the prototype is set_binding_shape(self: tensorrt.tensorrt.IExecutionContext, binding: int, shape: tensorrt.tensorrt.Dims) → bool, so setting the shape as below resolves the error.

context.set_binding_shape(binding_index, (Batch, Channel, Width, Height))
# binding_index: the index of the dynamic input binding, i.e. which input of the network this is
# shape:         the input's shape; here I use BCWH order