4. ONNX Runtime: CMake Build Guidelines and ABI_Dev_Notes

Source: translated from the ONNX Runtime team's English notes.
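There are usually several ways to accomplish the same thing, which is why this guide exists. It is not about which way is right or wrong; it is about keeping the project's code moving in the same direction.
There are usually many ways to build a piece of software; what follows are the build conventions recommended by the ONNX Runtime team.
First, the CMake version: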
cmake_minimum_required(VERSION 3.13)

Scope the impact to the minimum

If you want to change a setting, try to keep the impact of the change local.

  • Use target_include_directories instead of include_directories
  • Use target_compile_definitions instead of add_definitions
  • Use target_compile_options instead of add_compile_options
  • Don't use variables that change global flags, such as CMAKE_CXX_FLAGS

For example, to add a macro definition to a VC project, use target_compile_definitions rather than add_definitions.
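As a minimal sketch of the target-scoped style listed above: the target name my_lib, the source file, the include path, and the macro MY_LIB_ENABLE_FEATURE are all hypothetical, chosen only for illustration.

```cmake
cmake_minimum_required(VERSION 3.13)
project(scope_example CXX)

# my_lib.cc is a hypothetical source file.
add_library(my_lib STATIC my_lib.cc)

# Each setting applies to my_lib only, not to every target in the directory.
target_include_directories(my_lib PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/include)
target_compile_definitions(my_lib PRIVATE MY_LIB_ENABLE_FEATURE=1)

if(MSVC)
  target_compile_options(my_lib PRIVATE /W4)
else()
  target_compile_options(my_lib PRIVATE -Wall -Wextra)
endif()
```

With PRIVATE, none of these settings leak into targets that later link against my_lib; PUBLIC or INTERFACE would propagate them on purpose.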

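Static library order matters

First, you should know that order matters when you link static libraries into an executable (or shared library) target.
Say A and B are static libraries and C is an executable:

  • A depends on B.
  • C depends on A and B.

Then we should write: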

target_link_libraries(C PRIVATE A B)

rather than:

target_link_libraries(C PRIVATE B A)  # Wrong!

On Windows, the order of static libraries matters if a symbol is defined in more than one library.
On Linux, it matters when one static library references another, because traditional linkers such as GNU ld scan archives from left to right and pull in only the objects that resolve symbols still undefined at that point.
So, in general, always list libraries in the right order, according to their dependency relationships.

Don’t call target_link_libraries on static libraries

You could do it, but please don’t.

As noted above, library order matters. If you explicitly list all the libraries on one line and some of them are in the wrong position, the mistake is easy to locate and fix.

However, if any static library was itself set up with target_link_libraries:

  • First, keep in mind that a static library has no link step.
  • Second, once you hit an ordering problem, it is much harder to fix, because many of the dependencies are implicit and their positions are out of our control.
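The recommendation above might look like the following sketch. It reuses the A, B, and C names from the earlier example; the source file names are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.13)
project(link_order_example CXX)

# a.cc, b.cc and main.cc are hypothetical source files.
add_library(A STATIC a.cc)   # A uses symbols defined in B ...
add_library(B STATIC b.cc)
# ... but we deliberately do NOT write target_link_libraries(A PUBLIC B).

add_executable(C main.cc)

# The complete list lives in one place, in dependency order: A before B.
target_link_libraries(C PRIVATE A B)
```

Keeping the whole list on the final target means a misplaced library can be corrected by editing a single line.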

Every Linux program (and shared library) should link to libpthread and libatomic

In the Linux world, there are two sets of pthread symbols: a fake set in the standard C library and a real set in libpthread.so. If the real one is not loaded when the process starts up, the process should not use multiple threads, because the core part is missing.

So we append “Threads::Threads” to the library list of every shared library (.so, .dll) and executable target. This is easy to forget, and if it is missed, the behavior is undefined.

A related point: if std::atomic is in use, also add the atomic library, because some uses of std::atomic require linking to libatomic. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html

NOTE: In rare cases, even if you tell the linker to link the program with pthread, it may not comply. It may ignore your ordering and cause issues. See https://github.com/protocolbuffers/protobuf/issues/5923.
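A minimal sketch of this rule, assuming hypothetical targets my_shared and my_app and hypothetical source files. The libatomic line additionally assumes that the code uses std::atomic in a way that needs libatomic and that the library is installed on the build machine.

```cmake
cmake_minimum_required(VERSION 3.13)
project(threads_example CXX)

# FindThreads defines the imported target Threads::Threads.
find_package(Threads REQUIRED)

# my_shared.cc and main.cc are hypothetical source files.
add_library(my_shared SHARED my_shared.cc)
add_executable(my_app main.cc)

# Every shared library and every executable gets Threads::Threads.
target_link_libraries(my_shared PRIVATE Threads::Threads)
target_link_libraries(my_app PRIVATE my_shared Threads::Threads)

# Assumption: libatomic is needed and available (see the lead-in).
if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
  target_link_libraries(my_shared PRIVATE atomic)
  target_link_libraries(my_app PRIVATE atomic)
endif()
```

No thread-related flag is added to the compile options anywhere; the imported target is used only at link time.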

Don’t use the “-pthread” flag directly

Because:

  1. It doesn’t work with nvcc (the CUDA compiler).
  2. It is not portable.

Don’t bother adding this flag to your compile-time flags. On Linux, it is useless. On some very old Unix-like systems it may help, but we currently support only Ubuntu 16.04.

Use “Threads::Threads” for linking. Use nothing for compiling.

CUDA projects should use the new CMake CUDA approach

There are two ways of enabling CUDA in CMake.

  1. (new): enable_language(CUDA)
  2. (old): find_package(CUDA)

Use the first one, because the second one is deprecated. Don’t use “find_package(CUDA)”. That also means not using variables such as:

  • CUDA_NVCC_FLAGS
  • CUDA_INCLUDE_DIRS
  • CUDA_LIBRARIES

So be careful when you copy code from another project into ours: if it relies on the old approach, it may not work as-is.
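A minimal sketch of the first approach, assuming a hypothetical target my_kernels built from a hypothetical file my_kernels.cu. The -lineinfo flag is only an example of passing an nvcc option per target rather than through CUDA_NVCC_FLAGS.

```cmake
cmake_minimum_required(VERSION 3.13)
project(cuda_example CXX)

# Probe for a CUDA compiler first so configuration does not fail without one.
include(CheckLanguage)
check_language(CUDA)

if(CMAKE_CUDA_COMPILER)
  enable_language(CUDA)

  # .cu files are now compiled like any other source; no cuda_add_library().
  add_library(my_kernels STATIC my_kernels.cu)

  # Per-target, per-language flag instead of the old global CUDA_NVCC_FLAGS.
  target_compile_options(my_kernels PRIVATE
    $<$<COMPILE_LANGUAGE:CUDA>:-lineinfo>)
else()
  message(STATUS "No CUDA compiler found; skipping CUDA targets")
endif()
```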

ABI_Dev_Notes

Global Variables

On Windows, global variables may be constructed or destructed inside “DllMain”, and there are significant limits on what you can safely do in a DLL entry point. See ‘DLL General Best Practices’. For example, you can’t put an ONNX Runtime InferenceSession into a global variable.

Thread-local variables

Thread-local variables must be function-local, so that on Windows they are initialized on first use. Otherwise, they may not work.
Also, if a variable has a non-trivial destructor, it must be destroyed before onnxruntime.dll is unloaded. That means only onnxruntime-internal threads may access these variables; that is, the thread must be created by onnxruntime and destroyed by onnxruntime.

No undefined symbols

On Windows, you can’t build a DLL with undefined symbols; every symbol must be resolved at link time. On Linux, you can.
To keep things simple, we require every symbol to be resolved at link time, and the same rule applies on all platforms. This also makes it easier for us to control symbol visibility.
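One way such a rule could be enforced on Linux is sketched below. This is an assumption for illustration, not necessarily how ONNX Runtime's own build does it; the target and file names are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.13)
project(no_undefined_example CXX)

# my_api.cc is a hypothetical source file.
add_library(my_shared SHARED my_api.cc)

if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
  # GNU ld and compatible linkers otherwise accept unresolved symbols
  # when producing a shared library; this turns them into link errors.
  target_link_options(my_shared PRIVATE "LINKER:--no-undefined")
endif()
```

Both target_link_options and the LINKER: prefix are available from CMake 3.13, the minimum version stated at the top of this guide.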

Default visibility and how to export a symbol

On Linux, from the linker's point of view, every symbol is global by default. That is easy to use, but it also makes conflicts and core dumps much more likely. We have encountered too many such problems in the ONNX python binding. Indeed, with a good design, each shared library needs to export only one function. The ONNX Runtime python binding is a good example. See the pybind11 FAQ for more info.

To control visibility, we use linker version scripts on Linux and def files on Windows. They work similarly:

  1. Only C functions can be exported.
  2. All the function names must be explicitly listed in a text file.
  3. Don’t export any C++ class/struct, or global variable.

Also, on Linux and Mac operating systems, all the code must be compiled with “-fPIC”.
On Windows, we don’t use dllexport, but we still need dllimport.

Therefore, our DLLEXPORT macro looks like this:

#ifdef _WIN32
// Define ORT_DLL_IMPORT if your program is dynamically linked to Ort.
#ifdef ORT_DLL_IMPORT
#define ORT_EXPORT __declspec(dllimport)
#else
#define ORT_EXPORT
#endif
#else
#define ORT_EXPORT
#endif
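On the build side, wiring up a version script or a def file from CMake might look like the sketch below. The target name, the file names my_shared.lds and my_shared.def, and the exported function MyApiCreate are all hypothetical; this is an illustration, not ONNX Runtime's actual build code.

```cmake
cmake_minimum_required(VERSION 3.13)
project(export_example CXX)

# my_api.cc is a hypothetical source file.
add_library(my_shared SHARED my_api.cc)

# Equivalent of compiling with -fPIC on Linux and macOS.
set_target_properties(my_shared PROPERTIES POSITION_INDEPENDENT_CODE ON)

if(WIN32)
  # A .def file listed as a source is handed to the linker as the export list.
  target_sources(my_shared PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/my_shared.def)
elseif(CMAKE_SYSTEM_NAME STREQUAL "Linux")
  # A hypothetical my_shared.lds exporting a single C function could contain:
  #   VERS_1.0 { global: MyApiCreate; local: *; };
  target_link_options(my_shared PRIVATE
    "LINKER:--version-script=${CMAKE_CURRENT_SOURCE_DIR}/my_shared.lds")
endif()
```

In both cases the exported names live in a plain text file, which matches the rule that every exported function must be listed explicitly.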

RTLD_LOCAL vs RTLD_GLOBAL

RTLD_LOCAL and RTLD_GLOBAL are two flags of the dlopen(3) function on POSIX systems. On Linux, the default is RTLD_LOCAL (POSIX itself leaves the default unspecified). Basically, Windows has nothing corresponding to RTLD_GLOBAL.

There is one case where you need RTLD_GLOBAL on POSIX systems:

  1. There is a shared library that is dynamically loaded by some application (such as python or dotnet).
  2. The shared library is statically linked to ONNX Runtime.
  3. The shared library needs to dynamically load a custom op.

In that case, the shared library should be loaded with RTLD_GLOBAL, not RTLD_LOCAL. Otherwise, the custom op library cannot find the ONNX Runtime symbols.
