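ONNX Runtime Source Code Reading: An Overview of the Model Inference Process

Introduction

ONNX Runtime is an engine for running inference on ONNX (Open Neural Network Exchange) models. In 2017 Microsoft, together with Facebook and others, introduced ONNX as a format standard for deep learning and machine learning models, and along the way provided an engine dedicated to ONNX model inference: onnxruntime. At the moment ONNX Runtime can only run on the host side, though the official site says that work on mobile support is under way.
Partly because my work calls for it and partly out of interest, I decided to read through the onnxruntime source code. Consider this a set of study notes.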

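Installation

The ONNX Runtime GitHub repository is at https://github.com/microsoft/onnxruntime . To build and install from source, follow the instructions on GitHub; for convenience, I simply installed from PyPI. Running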

pip install onnxruntime

completes the installation. Note that only Python 3 is supported.

Getting Started

Files Involved

onnxruntime\onnxruntime\python\session.py
onnxruntime\onnxruntime\core\framework\utils.cc
onnxruntime\onnxruntime\python\onnxruntime_pybind_state.cc
onnxruntime\onnxruntime\core\session\inference_session.cc
onnxruntime\onnxruntime\core\session\inference_session.h

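Code Entry Point

Reading code starts with finding an entry point. From the onnxruntime examples we know that using onnxruntime from Python is simple; the main code is just three lines: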

import onnxruntime
sess = onnxruntime.InferenceSession('YouModelPath.onnx')
output = sess.run([output_nodes], {input_nodes: x})

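The first line imports the onnxruntime module; the second creates an InferenceSession instance and hands it a model path; the third calls the run method to perform inference. The InferenceSession class in the onnxruntime module is therefore our way in.

Creating the Instance

The ONNX Runtime code base is very well organized, so it is easy to find session.py, the file that contains InferenceSession. The file is very simple and defines only the InferenceSession class. Here is its __init__ function, together with the _load_model helper it calls: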

    def __init__(self, path_or_bytes, sess_options=None, providers=[]):
        """
        :param path_or_bytes: filename or serialized model in a byte string
        :param sess_options: session options
        :param providers: providers to use for session. If empty, will use
            all available providers.
        """
        self._path_or_bytes = path_or_bytes
        self._sess_options = sess_options
        self._load_model(providers)
        self._enable_fallback = True

    def _load_model(self, providers=[]):
        if isinstance(self._path_or_bytes, str): 
            self._sess = C.InferenceSession(
                self._sess_options if self._sess_options else C.get_default_session_options(), 
                self._path_or_bytes, True)
        elif isinstance(self._path_or_bytes, bytes):
            self._sess = C.InferenceSession(
                self._sess_options if self._sess_options else C.get_default_session_options(), 
                self._path_or_bytes, False)
        # elif isinstance(self._path_or_bytes, tuple):
            # to remove, hidden trick
        #   self._sess.load_model_no_init(self._path_or_bytes[0], providers)
        else:
            raise TypeError("Unable to load from type '{0}'".format(type(self._path_or_bytes)))
        # Pay attention to the next statement; we will come back to it in detail later
        self._sess.load_model(providers)

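It turns out that InferenceSession here is only a shell: all the work is delegated to C.InferenceSession. From the import statement from onnxruntime.capi import _pybind_state as C we can tell that C is a Python extension module implemented in C++, whose source is in onnxruntime\onnxruntime\python\onnxruntime_pybind_state.cc. onnxruntime_pybind_state.cc is the interface that exposes the C++ code to Python. It is like a door: once the code passes through it, we leave Python and enter the world of C++.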
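To make the "shell" idea concrete, here is a minimal sketch of the same delegation pattern. FakeBackendSession is a hypothetical stand-in for C.InferenceSession (it is not the real binding and parses nothing), and SessionWrapper is likewise a name made up for illustration; only the str/bytes dispatch and the trailing load_model call mirror the code above.

```python
class FakeBackendSession:
    """Hypothetical stand-in for C.InferenceSession; not the real binding."""

    def __init__(self, options, arg, is_arg_file_name):
        self.options = options
        self.arg = arg
        # The boolean flag tells the backend how to interpret `arg`.
        self.source = "file" if is_arg_file_name else "bytes"
        self.loaded = False
        self.providers = []

    def load_model(self, providers):
        self.providers = list(providers)
        self.loaded = True


class SessionWrapper:
    """Thin Python shell that forwards all real work to the backend object."""

    def __init__(self, path_or_bytes, sess_options=None, providers=None):
        if isinstance(path_or_bytes, str):
            is_file = True       # a path on disk
        elif isinstance(path_or_bytes, bytes):
            is_file = False      # a serialized model held in memory
        else:
            raise TypeError(
                "Unable to load from type '{0}'".format(type(path_or_bytes)))
        self._sess = FakeBackendSession(sess_options or {}, path_or_bytes, is_file)
        self._sess.load_model(providers or [])


by_path = SessionWrapper("model.onnx")
by_bytes = SessionWrapper(b"\x08\x03")
print(by_path._sess.source, by_bytes._sess.source)
```

The wrapper holds no inference logic of its own, which is why the interesting code has to be looked for on the other side of the binding.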
The door is right here, so where is the key?
Let's keep our eyes fixed on this statement in the Python code:

 self._sess = C.InferenceSession(
                self._sess_options if self._sess_options else C.get_default_session_options(), 
                self._path_or_bytes, True)

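This statement is our one great hope. It tells us that onnxruntime_pybind_state.cc must define a class named InferenceSession. After a bit of furious searching, we land on the place where InferenceSession is defined: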

py::class_<InferenceSession>(m, "InferenceSession", R"pbdoc(This is the main class used to run a model.)pbdoc")
      // In Python3, a Python bytes object will be passed to C++ functions that accept std::string or char*
      // without any conversion. So this init method can be used for model file path (string)
      // and model content (bytes)
      .def(py::init([](const SessionOptions& so, const std::string& arg, bool is_arg_file_name) {
        // Given arg is the file path. Invoke the corresponding ctor().
        if (is_arg_file_name) {
          return onnxruntime::make_unique<InferenceSession>(so, arg, SessionObjectInitializer::Get());
        }

        // Given arg is the model content as bytes. Invoke the corresponding ctor().
        std::istringstream buffer(arg);
        return onnxruntime::make_unique<InferenceSession>(so, buffer, SessionObjectInitializer::Get());
      }))

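Welcome to C++. .def(py::init([](const SessionOptions& so, const std::string& arg, bool is_arg_file_name) plays a role similar to __init__ in Python: depending on the kind of model argument passed in (the model's path or the model's byte content), it calls the matching constructor of the C++ class InferenceSession to build an instance, then returns a pointer to that instance to Python. Since our example passes the model path as a string, the constructor we need to find is the one with this signature: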

InferenceSession(const SessionOptions& session_options,
                   const std::string& model_uri,
                   logging::LoggingManager* logging_manager = nullptr);

This is the constructor we are after.
Something odd shows up here, though:

  if (is_arg_file_name) {
          return onnxruntime::make_unique<InferenceSession>(so, arg, SessionObjectInitializer::Get());
        }

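Looking at SessionObjectInitializer::Get(), the third argument is an instance of the class SessionObjectInitializer, yet the matching InferenceSession constructor expects a pointer to logging::LoggingManager. The types do not line up, so what now? C++ is not Python; it is strongly typed and does not compromise. The author uses a small trick here: SessionObjectInitializer defines two type conversion functions, so the compiler converts it to whatever type is required. In this case the compiler converts the SessionObjectInitializer into a logging::LoggingManager*.
Now consider the constructor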

InferenceSession(const SessionOptions& session_options,
                   const std::string& model_uri,
                   logging::LoggingManager* logging_manager = nullptr);

and its implementation:

InferenceSession::InferenceSession(const SessionOptions& session_options,
                                   const std::string& model_uri,
                                   logging::LoggingManager* logging_manager)
    : insert_cast_transformer_("CastFloat16Transformer") {
  model_location_ = ToWideString(model_uri);
  model_proto_ = onnxruntime::make_unique<ONNX_NAMESPACE::ModelProto>();
  auto status = Model::Load(model_location_, *model_proto_);
  ORT_ENFORCE(status.IsOK(), "Given model could not be parsed while creating inference session. Error message: ",
              status.ErrorMessage());

  // Finalize session options and initialize assets of this session instance
  ConstructorCommon(session_options, logging_manager);
}

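It does three main things:

Stores the model path in the member variable model_location_;
Loads the model file's contents into the member variable model_proto_;
Calls ConstructorCommon to do the rest of the work.
ConstructorCommon performs some environment checks, prepares log output, and so on. Most importantly, it creates a SessionState instance, session_state_. This member variable bundles what is needed to run the model: the thread pool, the model structure, the providers, and so on. As for what a provider is, it is simply the hardware the model runs on, for example CPU or GPU. At this point much of the information in session_state_ is still incomplete: the model structure has not been stored yet, and the providers are only a shell with no hardware information inside. An initialization phase is still needed. With that, the InferenceSession instance has been created.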
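The "constructed but not yet usable" state described above can be sketched as follows. This is only an illustration of two-phase construction: SessionState and TwoPhaseSession are hypothetical names, the load_bytes parameter is an assumption that replaces real file reading, and the "parsing" is a placeholder tuple rather than anything ONNX Runtime does.

```python
class SessionState:
    """Hypothetical container for part of what the text says session_state_ bundles."""

    def __init__(self):
        self.graph = None     # model structure, filled in during initialization
        self.providers = []   # execution providers, registered later


class TwoPhaseSession:
    def __init__(self, model_uri, load_bytes):
        self.model_location = model_uri            # step 1: remember where the model lives
        self.model_proto = load_bytes(model_uri)   # step 2: keep the loaded model contents
        self.state = SessionState()                # step 3: state exists but is still empty
        self.initialized = False

    def initialize(self, providers):
        # Placeholder for real parsing: record only the size of the content.
        self.state.graph = ("parsed", len(self.model_proto))
        self.state.providers = list(providers)
        self.initialized = True

    def run(self):
        if not self.initialized:
            raise RuntimeError("session is not initialized")
        return self.state.graph


# load_bytes is injected so the sketch needs no file on disk.
sess = TwoPhaseSession("model.onnx", load_bytes=lambda uri: b"\x08\x03")
print(sess.initialized, sess.state.graph)
```

Splitting construction from initialization this way means that a half-built session can exist, which is why a later call is needed before anything can run.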

Initialization

Back to where it all began, the Python code: its last statement, self._sess.load_model(providers), is implemented as follows:

.def(
          "load_model", [](InferenceSession* sess, std::vector<std::string>& provider_types) {
            OrtPybindThrowIfError(sess->Load());
            InitializeSession(sess, provider_types);
          },
          R"pbdoc(Load a model saved in ONNX format.)pbdoc")

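load_model mainly does the following:

Parses the model's binary content;
Chooses how the model will run, in parallel or sequentially;
Selects the model's providers: if the user did not specify any, it registers every kind of hardware supported in the current environment, such as GPU and CPU, and makes sure the CPU is always available;
Determines the order in which the model's nodes run.
I won't go into detail here; it is enough to know that it parses the binary data into a graph according to the ONNX standard and stores that graph in session_state_. The details are for another time. After this step session_state_ is complete, fully geared up and ready for battle at any moment.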
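Two of the steps listed above, provider selection with a CPU fallback and node ordering, can be sketched in a few lines. This is a sketch under stated assumptions, not ONNX Runtime's implementation: resolve_providers and execution_order are hypothetical helpers, the provider strings are used only as illustrative labels, and Kahn's algorithm is swapped in as a standard way to order a directed acyclic graph, since the code read so far does not show which algorithm ONNX Runtime actually uses.

```python
from collections import deque


def resolve_providers(requested, available):
    """Use the requested providers, or everything available if none were
    requested, and always keep the CPU provider as a fallback."""
    chosen = [p for p in (requested or available) if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def execution_order(nodes, edges):
    """Kahn's algorithm: repeatedly emit a node whose inputs are all ready."""
    indegree = {n: 0 for n in nodes}
    consumers = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        consumers[src].append(dst)
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in consumers[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


available = ["CUDAExecutionProvider", "CPUExecutionProvider"]
providers = resolve_providers(["CUDAExecutionProvider"], available)
order = execution_order(["pool", "conv", "relu"],
                        [("conv", "relu"), ("relu", "pool")])
print(providers)
print(order)
```

The order is computed once, before any inference runs, so each later run can simply walk the precomputed list.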

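Running

After initialization everything is in place. Let's go straight to the Run method of InferenceSession in C++, since we already know that operations in Python end up calling C++ code to do the actual work. InferenceSession has many overloads of Run, but they all eventually route to the one with this signature: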

Status InferenceSession::Run(const RunOptions& run_options, const std::vector<std::string>& feed_names,
                             const std::vector<OrtValue>& feeds, const std::vector<std::string>& output_names,
                             std::vector<OrtValue>* p_fetches)

Here, after performing some checks on the input data, Run passes the data, the model information, the provider information and so on to utils::ExecuteGraph:

utils::ExecuteGraph(*session_state_, feeds_fetches_manager, feeds, *p_fetches,
                            session_options_.execution_mode,
                            run_options.terminate, run_logger)

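utils::ExecuteGraph in turn hands the work to utils::ExecuteGraphImpl, which follows the node execution order determined during initialization, finds the kernel that corresponds to each node's type, and calls its Compute() method to do the computation.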
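The loop described above (walk the nodes in their precomputed order, look up a kernel by node type, compute) can be sketched like this. Everything here is a simplified assumption for illustration: execute_graph is a hypothetical function, kernels are plain Python callables rather than objects with a Compute() method, and values are integers instead of tensors.

```python
def execute_graph(order, nodes, kernels, feeds, output_names):
    """Run nodes sequentially in the given order and return the fetched outputs."""
    values = dict(feeds)  # every named value produced so far, starting with the inputs
    for name in order:
        op_type, input_names, output_name = nodes[name]
        kernel = kernels[op_type]  # the kernel is chosen by the node's type
        values[output_name] = kernel(*(values[i] for i in input_names))
    return [values[n] for n in output_names]


# Kernel registry keyed by op type (stand-ins for real compute kernels).
kernels = {
    "Add": lambda a, b: a + b,
    "Mul": lambda a, b: a * b,
}

# node name -> (op type, input value names, output value name)
nodes = {
    "n0": ("Add", ["x", "y"], "s"),
    "n1": ("Mul", ["s", "x"], "out"),
}

result = execute_graph(["n0", "n1"], nodes, kernels, {"x": 2, "y": 3}, ["out"])
print(result)  # [10], since (2 + 3) * 2 = 10
```

Because the order was fixed during initialization, the executor itself never inspects the graph's edges; it only needs each node's inputs to be present by the time that node runs.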

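Summary

The overall flow: the C++ interface is exposed to Python with pybind11, and Python adds a thin wrapper before offering it to users. Several key points above deserve deeper study:

How the execution order of the model's nodes is determined;
How a provider is chosen for each node;
The model parsing process;
The detailed inference process;
How the model achieves efficient inference.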
Finally, a picture is worth a thousand words: (figure omitted)

This article was first published on my WeChat public account, TensorBoy. If you found it useful, feel free to share it and follow TensorBoy for more original content.
