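TensorFlow: a brief introduction to custom ops

Reposted from: https://blog.csdn.net/u012436149/article/details/73737299

Custom ops in TensorFlow

This article is only a translation of the basic parts of https://www.tensorflow.org/extend/adding_an_op; for the advanced parts, see the official site.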

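Cases where you may need to define a new C++ operation:

The op you want cannot be expressed as a composition of existing operations.
Expressing it as a composition of existing operations is very inefficient.
You want to fuse several operations by hand.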

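To implement a custom op, you need to do the following:

Register the new op in a C++ file: the op registration defines the op's functional interface and is independent of the op's implementation. For example, the registration defines the op's name and its inputs and outputs. It also defines the shape function that is used for tensor shape inference.
Implement the op in C++: the implementation of an op is called a kernel, and it is one concrete implementation of the op. There can be different kernels for different input/output types or architectures (CPUs, GPUs).
Create a Python wrapper (optional): the wrapper is the public API used to create the op from Python. The op registration generates a default wrapper, which you can use directly or extend with your own.
Write a function that computes the op's gradient (optional).
Test the op: for convenience this is usually done in Python, but you can also test the op in C++. If you define gradients, you can verify them with the Python gradient checker. See relu_op_test.py for an example that tests the forward pass and the gradients of ReLU-like ops.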
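The idea behind a gradient checker can be sketched in plain Python: compare a hand-written (analytic) gradient against a finite-difference estimate. The sketch below is only an illustration of that idea for a ReLU-like function; it does not use TensorFlow's gradient checker, and the names relu, relu_grad and numeric_grad are hypothetical helpers made up for this example.

```python
def relu(x):
    """Forward pass: elementwise max(v, 0) over a list of floats."""
    return [max(v, 0.0) for v in x]


def relu_grad(x, upstream):
    """Analytic gradient: pass the upstream gradient through where x > 0."""
    return [g if v > 0.0 else 0.0 for v, g in zip(x, upstream)]


def numeric_grad(f, x, upstream, eps=1e-6):
    """Central-difference estimate of d(sum(upstream * f(x))) / dx."""
    grads = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += eps
        minus[i] -= eps
        loss_plus = sum(u * y for u, y in zip(upstream, f(plus)))
        loss_minus = sum(u * y for u, y in zip(upstream, f(minus)))
        grads.append((loss_plus - loss_minus) / (2 * eps))
    return grads


# Test points are kept away from 0, where ReLU is not differentiable.
x = [-2.0, -0.5, 0.5, 3.0]
upstream = [1.0, 2.0, 3.0, 4.0]
analytic = relu_grad(x, upstream)
numeric = numeric_grad(relu, x, upstream)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
```

If max_error is small, the analytic gradient agrees with the numerical estimate at the tested points; a large value usually points to a bug in the gradient function.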

Define the op’s interface

You define the interface of an op by registering it with the TensorFlow system.
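When registering an op, you specify:

the name of the op
the op's inputs (names and types) and outputs (names and types)
docstrings
any attrs the op might require

To see how this works, let's look at a simple example:

Define an op that takes a tensor of int32s and outputs a copy of the input in which every element except the first is set to zero.
To create the interface for this op, create a file named zero_out.cc, then call the REGISTER_OP macro to define the op's interface: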

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"

using namespace tensorflow;

REGISTER_OP("ZeroOut")
      .Input("to_zero: int32")
      .Output("zeroed: int32")
      .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        c->set_output(0, c->input(0));
        return Status::OK();
      });

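The ZeroOut op takes one tensor of int32s as input and outputs a tensor of int32s. The op also uses a shape function to ensure that the output has the same shape as the input. For example, if the input tensor has shape [10, 20], the shape function specifies that the output shape is also [10, 20].
Note: the op name must be in CamelCase and must be unique among all registered ops.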

Implement the kernel for the op

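After you define the op's interface, you can provide one or more implementations of the op.
To create one of these kernels:

Create a class that extends OpKernel.
Override the Compute method of OpKernel. Compute receives one argument, context, of type OpKernelContext*, from which you can access useful things such as the input and output tensors.

Add the kernel code to the zero_out.cc file created earlier: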

#include "tensorflow/core/framework/op_kernel.h"
using namespace tensorflow;

class ZeroOutOp : public OpKernel {
 public:
  explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}

  void Compute(OpKernelContext* context) override {
    // Grab the input tensor.
    const Tensor& input_tensor = context->input(0);
    auto input = input_tensor.flat<int32>();

    // Create the output tensor; context->allocate_output allocates its memory.
    Tensor* output_tensor = nullptr;
    OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
                                                     &output_tensor));
    auto output_flat = output_tensor->flat<int32>();

    // Do the computation: set all but the first element of the output to 0.
    const int N = input.size();
    for (int i = 1; i < N; i++) {
      output_flat(i) = 0;
    }

    // Preserve the first input value if possible.
    if (N > 0) output_flat(0) = input(0);
  }
};
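To make the intended behaviour easy to check by hand, the kernel's logic can be mirrored in a few lines of plain Python. This is only a reference sketch operating on a flat list (the C++ kernel works on the flattened tensor in the same way); zero_out_reference is a hypothetical name and is not part of TensorFlow.

```python
def zero_out_reference(values):
    """Return a copy of a flat list of ints with all but the first element set to 0."""
    output = [0] * len(values)
    # Preserve the first input value if there is one.
    if values:
        output[0] = values[0]
    return output


result = zero_out_reference([5, 4, 3, 2, 1])
```

Such a reference is handy as the expected value when testing the compiled op.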

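After implementing the kernel, you register it with the TensorFlow system. In the registration you specify the constraints under which the kernel will run. For example, you might have one kernel written for CPUs and a separate one for GPUs. To do this for the ZeroOut op, add the following to zero_out.cc: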

REGISTER_KERNEL_BUILDER(Name("ZeroOut").Device(DEVICE_CPU), ZeroOutOp);

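Note: instances of your OpKernel may be accessed concurrently, so make sure the Compute method is thread-safe. Guard any access to class members with a mutex. Or better yet, don't share state via class members; consider using a ResourceMgr to keep track of state.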

Multi-threaded CPU kernels

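Multi-threading is mainly handled by work sharding: the Shard function in tensorflow/core/util/work_sharder.h splits the work into ranges that are processed on a thread pool.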
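The sharding idea itself can be illustrated in plain Python: split the index range into contiguous blocks and let each worker handle one block. This is a conceptual sketch only, applied to the zero-out computation; it does not use or reproduce TensorFlow's C++ sharding API, and shard_ranges and zero_out_sharded are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_ranges(total, num_shards):
    """Split [0, total) into at most num_shards contiguous (start, limit) ranges."""
    if total <= 0:
        return []
    num_shards = max(1, min(num_shards, total))
    block = (total + num_shards - 1) // num_shards  # ceiling division
    return [(start, min(start + block, total)) for start in range(0, total, block)]


def zero_out_sharded(values, num_shards=4):
    """Zero all but the first element, processing one shard per task."""
    output = list(values)

    def work(start, limit):
        # Each shard writes only to its own index range, so no lock is needed.
        for i in range(max(start, 1), limit):
            output[i] = 0

    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        futures = [pool.submit(work, s, l)
                   for s, l in shard_ranges(len(values), num_shards)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return output


sharded_result = zero_out_sharded([5, 4, 3, 2, 1])
```

Because the shards do not overlap, the workers never touch the same element, which is what makes this pattern safe without extra synchronisation.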

GPU kernels

See the official site.

Build the op library

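Compile the op using your system compiler
You can compile zero_out.cc with a C++ compiler available on your system, such as g++ or clang. The binary PIP package installs the header files and the library needed to compile the op. The TensorFlow Python library provides the get_include function to get the header directory. Here is the output of this function on an Ubuntu machine: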

$ python
>>> import tensorflow as tf
>>> tf.sysconfig.get_include()
'/usr/local/lib/python2.7/site-packages/tensorflow/include'

Assuming you have g++ installed, the following sequence of commands builds your op into a dynamic library.

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
g++ -std=c++11 -shared zero_out.cc -o zero_out.so -fPIC -I $TF_INC -O2

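If your g++ version is 5 or later, add the flag -D_GLIBCXX_USE_CXX11_ABI=0. gcc has used a new C++ ABI since version 5, while the binary pip packages referred to here were built with the older ABI.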
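When the build is driven from a script, the command line above can be assembled programmatically. The sketch below only builds the argument list and does not invoke the compiler; build_command is a hypothetical helper, and the include path used in the example call is a placeholder that would normally come from tf.sysconfig.get_include().

```python
def build_command(source, output, tf_include, old_abi=False):
    """Assemble the g++ argument list used to build the op library."""
    cmd = ["g++", "-std=c++11", "-shared", source, "-o", output,
           "-fPIC", "-I", tf_include, "-O2"]
    if old_abi:
        # Needed with g++ >= 5 when the installed TensorFlow uses the older ABI.
        cmd.append("-D_GLIBCXX_USE_CXX11_ABI=0")
    return cmd


# "/path/to/tensorflow/include" is a placeholder, not a real location.
command = build_command("zero_out.cc", "zero_out.so",
                        "/path/to/tensorflow/include", old_abi=True)
```

The resulting list can be passed to subprocess.check_call to run the actual build.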

Use the op in Python

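The TensorFlow Python API provides the tf.load_op_library function to load the dynamic library and register the op with the TensorFlow framework. load_op_library returns a Python module that contains the Python wrappers for the op and the kernel. Thus, once you have built the op, you can run it from Python as follows: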

import tensorflow as tf
zero_out_module = tf.load_op_library('./zero_out.so')
with tf.Session(''):
  zero_out_module.zero_out([[1, 2], [3, 4]]).eval()

# Prints
array([[1, 0], [0, 0]], dtype=int32)

Keep in mind: the generated function is given a snake_case name. If the op is named ZeroOut in the C++ file, the Python function is called zero_out.
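The naming rule can be approximated with a small helper. This is a simplified sketch that handles plain CamelCase names such as ZeroOut; it is an assumption that it matches TensorFlow's own conversion for every name (for example names with runs of capital letters), and op_name_to_python_name is a hypothetical function.

```python
import re


def op_name_to_python_name(op_name):
    """Convert a CamelCase op name to snake_case, e.g. ZeroOut -> zero_out."""
    # Put an underscore between a lowercase letter or digit and a following capital.
    with_underscores = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", op_name)
    return with_underscores.lower()


python_name = op_name_to_python_name("ZeroOut")
```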

Verify that the op works

A good way to verify that your custom op works correctly is to write a test for it. Create a file zero_out_op_test.py with the following contents:

import tensorflow as tf

class ZeroOutTest(tf.test.TestCase):
  def testZeroOut(self):
    zero_out_module = tf.load_op_library('./zero_out.so')
    with self.test_session():
      result = zero_out_module.zero_out([5, 4, 3, 2, 1])
      self.assertAllEqual(result.eval(), [5, 0, 0, 0, 0])

if __name__ == "__main__":
  tf.test.main()

Then run the test, for example with python zero_out_op_test.py.

Code

// zero_out.cc
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/framework/op_kernel.h"
using namespace tensorflow;

REGISTER_OP("ZeroOut")
    .Input("to_zero: int32")
    .Output("zeroed: int32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
      c->set_output(0, c->input(0));
      return Status::OK();
    });

class ZeroOutOp : public OpKernel {
 public:
  explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}

  void Compute(OpKernelContext* context) override {
    // Grab the input tensor from the context.
    const Tensor& input_tensor = context->input(0);
    auto input = input_tensor.flat<int32>();

    // Create output_tensor and allocate its memory with context->allocate_output().
    Tensor* output_tensor = nullptr;
    OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
                                                     &output_tensor));
    auto output_flat = output_tensor->flat<int32>();

    // Set all but the first element of the output tensor to 0.
    const int N = input.size();
    for (int i = 1; i < N; i++) {
      output_flat(i) = 0;
    }

    // Preserve the first input value if possible.
    if (N > 0) output_flat(0) = input(0);
  }
};
REGISTER_KERNEL_BUILDER(Name("ZeroOut").Device(DEVICE_CPU), ZeroOutOp);

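# Commands to build the dynamic library
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
g++ -std=c++11 -shared zero_out.cc -o zero_out.so -fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -I $TF_INC -O2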

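Summary

Writing a custom op for TensorFlow comes down to:

Write a diy_op.cc file.
Compile it into a dynamic library with g++.
Load the library in Python with tf.load_op_library.
The op is then ready to use.

Another option is to build with bazel.

References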

https://www.tensorflow.org/extend/adding_an_op
