【46】The Big Pitfall: inline Functions

Original link
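This post looks at one of the most misleading keywords in C++: inline.
It is only a small keyword, but if you don't really understand it, it is easy to get burned.
This post leaves inline variables aside and focuses on inline functions.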

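1. What is inline
You may be vaguely familiar with it without being able to explain it. In Chinese it is translated as "內聯" (literally "inlined"), and it most often shows up in the phrase "inline function". That name is exactly what has misled so many beginners. So what is inline? We won't answer just yet. We will be consulting cppreference a lot, so open it in another tab and keep it to one side, then let's get started.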

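2. First impressions of inline
Search the web for what inline does in C++ and what do you find? That inline functions get expanded automatically at the call site, that they reduce function-call overhead... plus the friendly warnings: don't use inline on large functions, it causes code bloat, only short, frequently called functions benefit noticeably. N years ago I believed all of that too, so I wanted to put inline on every short function I saw, for fear of losing performance. And so, in my mind, inline became tied to optimization.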

3. First experience with inline
Let's start with a simple example. Create a source file main.cpp:

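#include <iostream>
int max(int a, int b)
{
    return a < b ? b : a;
}
int main()
{
    int res = max(1, 2);
    std::cout << res << "\n";
}
We compile and run it, and it works. That function makes your fingers itch, right? Fine, add inline to it, compile and run again, and it still works. No problems so far; we'll look at what it does for optimization later. But we want good engineering habits. This is a toy example; real code may be large, with many functions like max, so we want to organize them. We create a header a.h and move max into it, and since we also want to separate interface from implementation, we create a source file a.cpp to hold the definition. The files now look like this: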

// a.h
int max(int a, int b);
// a.cpp
#include "a.h"
int max(int a, int b)
{
    return a < b ? b : a;
}
// main.cpp
#include <iostream>
#include "a.h"
int main()
{
    int res = max(1, 2);
    std::cout << res << "\n";
}
We run it again and all is well. Now let's add inline to max. As you know, Ⅰ. inline must go with the function's definition, so we add it in a.cpp:

inline int max(int a, int b){…}
We compile and run again and... WTF!! An error!! All we did was add inline?!! After calming down, let's read the error message. With VS you will see something like this:

1>main.obj : error LNK2019: unresolved external symbol "int __cdecl max(int,int)" (?max@@YAHHH@Z) referenced in function _main
1>C:\Users\Alinshans\documents\visual studio 2017\Projects\test\Debug\test.exe : fatal error LNK1120: 1 unresolved externals
Looks familiar, doesn't it? If you compile with g++, you will get something like this:

main.cpp:(.text+0x13): undefined reference to `max(int, int)'
If you know a little about how C/C++ programs are built, we can go further and run the compile and link steps separately:

$ g++ -c main.cpp a.cpp
$ g++ main.o a.o -o a.out
You will find that the first command (compilation) succeeds, while the second (linking) reports the error:

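main.o:main.cpp:(.text+0x18): undefined reference to `max(int, int)'
collect2.exe: error: ld returned 1 exit status
A brief aside: most build environments perform inlining at compile time (to replace a call, the compiler needs to see the function body, which explains Ⅰ), some can do it at link time, and a few can do it at run time. We only consider the common case: in most C++ programs, inlining is a compile-time activity.
Back to the point: why do we get this link error? Look at the second and third items on the cppreference page we opened: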

The definition of an inline function or variable (since C++17) must be present in the translation unit where it is accessed (not necessarily before the point of access).
An inline function or variable (since C++17) with external linkage (e.g. not declared static) has the following additional properties:
It must be declared inline in every translation unit.
It has the same address in every translation unit.

External linkage is mentioned here (follow the link if you want the details). If that is too long to read, all you need to know is that our max function has external linkage, so it must:

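be visible in every translation unit that refers to it
be declared inline in every translation unit
In plain words, the definition of an inline function has to appear in every translation unit (.cpp/.cc/.cxx, etc.) that uses it. That means main.cpp has to be written like this: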

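// main.cpp
#include <iostream>
#include "a.h"
inline int max(int a, int b)
{
    return a < b ? b : a;
}
int main()
{
    int res = max(1, 2);
    std::cout << res << "\n";
}
Now it compiles without problems. But this means copying every inline function defined in a source file into every source file that uses it, and nobody wants to do that. So, in general: Ⅱ. don't declare a function defined in a source file as inline, unless you have a special reason.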

4. A second experience with inline
We have seen what adding or omitting inline does for a function in a source file. Now let's try the same for a function in a header. We add a new function to a.h:

// a.h
int max(int a, int b);
int min(int a, int b)
{
    return a < b ? a : b;
}
a.cpp stays the same:

// a.cpp
#include "a.h"
inline int max(int a, int b)
{
    return a < b ? b : a;
}
And main.cpp calls the new min function:

// main.cpp
#include <iostream>
#include "a.h"
inline int max(int a, int b)
{
    return a < b ? b : a;
}
int main()
{
    int res = max(1, 2);
    int res2 = min(1, 2);
    std::cout << res << " " << res2 << "\n";
}
Compile and run, and... WTF!! Another error!? What is it this time!? Looking closely, VS says:

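1>main.obj : error LNK2005: "int __cdecl min(int,int)" (?min@@YAHHH@Z) already defined in a.obj
1>C:\Users\Alinshans\documents\visual studio 2017\Projects\test\Debug\test.exe : fatal error LNK1169: one or more multiply defined symbols found
Do errors like this often leave you scratching your head? (Note: if your g++ build happens to succeed and print the right result, don't count yourself lucky; the program is still broken.) Why does it complain about a redefinition? A little thought explains it: min is defined in a.h, and a.h is included by both a.cpp and main.cpp. #include is effectively copy-and-paste, so compiling a.cpp and main.cpp produces two identical symbols, which is exactly the error we see.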

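5. When should you use inline
Recall the conclusion we reached when adding inline in a source file. What comes to mind? The definition of an inline function must be visible in every translation unit that uses it, right? So let's try declaring min as inline: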

// a.h
int max(int a, int b);
inline int min(int a, int b)
{
    return a < b ? a : b;
}
Compile and run again: no error, and the result is correct! So why does the error go away when we use inline?

Back to the same page, and look at the first item:

There may be more than one definition of an inline function or variable (since C++17) in the program as long as each definition appears in a different translation unit and (for non-static inline functions and variables (since C++17)) all definitions are identical. For example, an inline function or an inline variable (since C++17) may be defined in a header file that is #include’d in multiple source files.

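In plain words, an inline function has this property: it may be defined in multiple translation units, provided all the definitions are identical.
So a function defined in a header (function templates aside) must be declared inline. Doing so tells the toolchain: this is an inline function; the definition shows up many times, but they are all the same, so please don't report an error!
Hence: Ⅲ. when a function definition appears in a header, use inline.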
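As a minimal sketch of rule Ⅲ, here is what such a header might look like. The file name minmax.h, the include guard MINMAX_H and the names min_of/max_of are hypothetical (picked only to avoid any clash with std::min/std::max); they are not from the example above.

```cpp
// minmax.h (hypothetical header name, for illustration only)
#ifndef MINMAX_H
#define MINMAX_H

// inline permits this definition to appear in every translation unit
// that includes the header; the linker keeps a single copy.
inline int min_of(int a, int b)
{
    return a < b ? a : b;
}

inline int max_of(int a, int b)
{
    return a < b ? b : a;
}

#endif // MINMAX_H
```

The include guard only protects against including the header twice in the same source file. It is the inline keyword that makes it safe to include the header from several source files.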

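6. inline with class member functions and templates
We have covered four cases so far. This next part also tends to trouble beginners: a class may have many short functions such as getters/setters, and people agonize over whether to add inline. Again we go case by case. Depending on where a member function is defined, there are three possibilities (assume the class is declared in a.h and defined in a.cpp):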

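Defined in the header, inside the class
Defined in the header, outside the class
Defined in the source file
Let's take them one at a time. First, functions defined inside the class: do they need inline? Look at the very top of the page we have open: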

A function defined entirely inside a class/struct/union definition, whether it’s a member function or a non-member friend function, is implicitly an inline function.

A function defined inside the class is implicitly inline, so you don't need to add inline yourself. The LLVM Coding Standards say the same:

Don’t use inline when defining a function in a class definition

So a function defined inside the class doesn't need inline, though adding it isn't an error either.

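Next, functions defined in the source file. If you read the earlier sections carefully, you already know: don't put the definition of an inline function in a source file. So if a member function is defined in a source file, it shouldn't be marked inline either.
That leaves one case. Frankly, I can't think of a good reason to define a member function neither inside the class nor in a source file, but in the header outside the class (member functions of class templates excepted). If the member function is short, define it inside the class; otherwise define it in the source file. So I don't recommend this style. If you really must define it in the header outside the class, it has to be declared inline, or you will get a redefinition error again.
So, to keep it simple: Ⅳ. don't add inline to class member functions.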
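The three cases can be sketched with a small class. Counter, value() and increment() are hypothetical names used only for illustration, and the whole thing is assumed to live in a header.

```cpp
// counter.h (hypothetical header, for illustration only)
class Counter
{
public:
    // Defined inside the class body: implicitly inline, no keyword needed.
    int value() const { return value_; }

    // Declared here, defined below: outside the class but still in the header.
    void increment();

private:
    int value_ = 0;
};

// Out-of-class definition in a header: inline is required here, otherwise
// every source file that includes this header emits its own definition
// and the linker reports a duplicate symbol.
inline void Counter::increment()
{
    ++value_;
}
```

If increment() were moved into a source file instead, the inline keyword would be dropped, which is the layout rule Ⅳ recommends.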

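Now templates: function templates, member functions of class templates, and member function templates. A bit dizzying, perhaps, but anything template-related gets inline-like treatment, meaning its definition may appear in multiple translation units. (One exception: a full explicit specialization is an ordinary function again and still needs inline when it is defined in a header.) So if you turn the min function in a.h into a template: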

template <typename T>
T min(T a, T b)
{
    return a < b ? a : b;
}
it works with or without inline. So: Ⅴ. template-related functions don't need to be declared inline; they already have inline semantics.
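One corner of rule Ⅴ is worth a small sketch: the template itself needs no inline, but a full explicit specialization is an ordinary function and does need it in a header. The name min_of and the const char* specialization below are hypothetical, chosen just to show the difference.

```cpp
#include <cstring>

// Function template: may be defined in a header without inline.
template <typename T>
T min_of(T a, T b)
{
    return a < b ? a : b;
}

// Full explicit specialization: no longer a template, so when it lives in a
// header it needs inline to avoid multiple-definition link errors.
// It compares the strings' contents instead of the pointer values.
template <>
inline const char* min_of<const char*>(const char* a, const char* b)
{
    return std::strcmp(a, b) < 0 ? a : b;
}
```

Calling min_of(1, 2) or min_of(2.5, 1.5) uses the primary template, while min_of<const char*>("pear", "apple") picks the specialization.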

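7. inline and optimization
After all that about how to use inline, none of it seems related to the optimization you had in mind! Time to demolish that impression. Let's test with the original code: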

// main.cpp
#include <cstdio>
inline int max(int a, int b)
{
    return a < b ? b : a;
}
int main()
{
    int res = max(1, 2);
    std::printf("%d\n", res);
}
max is now declared inline. We can look at the generated assembly to see whether max is actually called. With g++, run these three commands:

$ g++ -E main.cpp -o main.i
$ g++ -S main.i -o main.s
$ g++ -O2 -S main.i -o main2.s
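main.s and main2.s then hold the assembly generated without optimization and with O2, respectively.
In VS it is very easy to view the disassembly: set a breakpoint anywhere, choose Debug -> Start Debugging, and once the debugger stops, choose Debug -> Windows -> Disassembly. The VS disassembly is fairly clean and readable, so we use it for the examples.
First, the disassembly in Debug mode (main part):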

int res = max(1, 2);
002218AE push 2
002218B0 push 1
002218B2 call max (02212BCh)
002218B7 add esp,8
002218BA mov dword ptr [res],eax
std::printf("%d\n", res);
002218BD mov eax,dword ptr [res]
002218C0 push eax
002218C1 push offset string "%d\n" (0227B30h)
002218C6 call _printf (022132Fh)
002218CB add esp,8
As we can see, max is called. Now switch to Release mode and look at the disassembly:

int res = max(1, 2);
std::printf("%d\n", res);
00131040 push 2
00131042 push offset string "%d\n" (01320F8h)
int res = max(1, 2);
std::printf("%d\n", res);
00131047 call printf (0131010h)
0013104C add esp,8
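Yes, the call to max is gone. But do you think that is thanks to the inline you added? Remove inline and repeat the steps, and you will find the result is exactly the same.
You are not convinced. You say: this function is too simple, even I could see how to optimize it. If the function were more complex, with loops or recursion, the compiler would not optimize it automatically!
All right, let's change main.cpp to this: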

#include <cstdio>
int test(int i)
{
    int x = 0;
    for (int j = 0; j < i; ++j)
    {
        x += j;
    }
    return x;
}
int main()
{
    std::printf("%d\n", test(100));
}
Disassembly in Debug:

std::printf("%d\n", test(100));
010118AE push 64h
010118B0 call test (0101136Bh)
010118B5 add esp,4
010118B8 push eax
010118B9 push offset string "%d\n" (01017B30h)
010118BE call _printf (0101132Fh)
010118C3 add esp,8
Disassembly in Release:

std::printf("%d\n", test(100));
00F31042 xor ecx,ecx
00F31044 xor eax,eax
std::printf("%d\n", test(100));
00F31046 xor edx,edx
00F31048 xor esi,esi
00F3104A xor edi,edi
00F3104C nop dword ptr [eax]
00F31050 inc edi
00F31051 add esi,2
00F31054 add edx,3
00F31057 add ecx,eax
00F31059 add edi,eax
00F3105B add esi,eax
00F3105D add edx,eax
00F3105F add eax,4
00F31062 cmp eax,64h
00F31065 jl main+10h (0F31050h)
00F31067 lea eax,[edx+esi]
00F3106A add eax,edi
00F3106C add ecx,eax
00F3106E push ecx
00F3106F push offset string "%d\n" (0F320F8h)
00F31074 call printf (0F31010h)
00F31079 add esp,8
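Don't be put off by the length: the body of test has been expanded directly into main (with the loop unrolled), so there is no call to test any more. The assembly generated by g++ makes this even clearer:
Without optimization: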

main:
.LFB1022:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $100, %edi
call _Z4testi
movl %eax, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
With O2:

main:
.LFB1022:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $4950, %esi
movl $.LC1, %edi
xorl %eax, %eax
call printf
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
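See? This time I did not mark test as inline, yet with optimization turned on the compiler still optimized the call away by itself.
You may still say "but, but..." Whatever else you want to claim, go and verify it yourself. I'll do one last experiment.
Change main.cpp to this: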
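The 4950 that appears in the O2 listing is simply 0 + 1 + ... + 99. As a small sketch (assuming C++14 or later; sum_below is a hypothetical name, not part of the experiment above), the same loop can be written as a constexpr function so the value is checked at compile time. A constexpr function is also implicitly inline, so it could sit in a header as is.

```cpp
// Same loop as test(), but constexpr so it can be evaluated at compile time.
constexpr int sum_below(int i)
{
    int x = 0;
    for (int j = 0; j < i; ++j)
    {
        x += j;
    }
    return x;
}

// Checked by the compiler: 0 + 1 + ... + 99 == 4950.
static_assert(sum_below(100) == 4950, "sum of 0..99 should be 4950");
```

This doesn't change what the optimizer does with the original test function; it only confirms where the constant in the assembly comes from.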

// main.cpp
#include <cmath>
#include <cstdio>
inline int test(int i)
{
    int prime[100];
    int k = 0;
    for (int n = 2; n <= i; ++n)
    {
        bool is_prime = true;
        for (int j = 2; j <= static_cast<int>(std::sqrt(n)); ++j)
        {
            if (n % j == 0)
            {
                is_prime = false;
                break;
            }
        }
        if (is_prime)
        {
            prime[k] = n;
            ++k;
        }
    }
    int sum = 0;
    for (int n = 0; n < k; ++n)
    {
        sum += prime[n];
    }
    return sum;
}
int main()
{
    std::printf("%d\n", test(100));
}
Hmm... it is a bit long, and note that I did declare test as inline! Disassembly in Debug:

std::printf("%d\n", test(100));
00131ABE push 64h
00131AC0 call test (013102Dh)
00131AC5 add esp,4
00131AC8 push eax
00131AC9 push offset string "%d\n" (0137B30h)
00131ACE call _printf (0131339h)
00131AD3 add esp,8
Disassembly in Release:

std::printf("%d\n", test(100));
00FC1170 call test (0FC1040h)
00FC1175 push eax
00FC1176 push offset string "%d\n" (0FC20F8h)
00FC117B call printf (0FC1010h)
00FC1180 add esp,8
I declared it inline, yet with or without optimization the call to test is not removed.

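8. The real meaning of inline
Now stop and think: what is inline? Does it mean "inlined"? What is it for? Is it a request for inlining?
In practice you find that sometimes omitting inline gives an error, and sometimes adding inline gives an error. You hope inline will make the program faster, but that seems to have nothing to do with whether you write it or not.
The meaning of inline seems to have drifted far away from "optimization" and "inlining".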

Take a moment to think it over!


Let's keep reading the cppreference page we opened, and note this passage:

The original intent of the inline keyword was to serve as an indicator to the optimizer that inline substitution of a function is preferred over function call, that is, instead of executing the function call CPU instruction to transfer control to the function body, a copy of the function body is executed without generating the call. This avoids overhead created by the function call (copying the arguments and retrieving the result) but it may result in a larger executable as the code for the function has to be repeated multiple times.
Since this meaning of the keyword inline is non-binding, compilers are free to use inline substitution for any function that’s not marked inline, and are free to generate function calls to any function marked inline. Those optimization choices do not change the rules regarding multiple definitions and shared statics listed above.

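Don't worry if that is hard to follow. What it says is: long ago, inline served as an optimization hint to the compiler. The hint is non-binding, so the compiler is free to decide whether to expand a function or not. Today's compilers don't need the hint at all: if a function is worth inlining the compiler will do so on its own, and if not, it will decline even when you declare it inline.
Also have a look at this answer on SO: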

It is said that inline hints to the compiler that you think the function should be inlined. That may have been true in 1998, but a decade later the compiler needs no such hints. Not to mention humans are usually wrong when it comes to optimizing code, so most compilers flat out ignore the ‘hint’.

static - the variable/function name cannot be used in other compilation units. Linker needs to make sure it doesn’t accidentally use a statically defined variable/function from another compilation unit.
extern - use this variable/function name in this compilation unit but don’t complain if it isn’t defined. The linker will sort it out and make sure all the code that tried to use some extern symbol has its address.
inline - this function will be defined in multiple compilation units, don’t worry about it. The linker needs to make sure all compilation units use a single instance of the variable/function.
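By now you should more or less get it: today's compilers don't need inline as a reminder. Don't underestimate the people who write compilers. Unless you believe you are better at it than they are, trying to help the compiler optimize is, for most people, usually a mistake.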
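If measurements ever show that a particular call really must (or must not) be expanded, the standard keyword won't do it; the usual route is a compiler extension. The sketch below wraps the spellings I know of for MSVC and for GCC/Clang in two hypothetical macros, FORCE_INLINE and NO_INLINE. The function names are made up too, and even these extensions are strong requests rather than absolute guarantees.

```cpp
// Compiler-specific spellings; these are extensions, not standard C++.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#  define NO_INLINE    __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#  define NO_INLINE    __attribute__((noinline))
#else
#  define FORCE_INLINE inline   // fallback: plain inline, no forcing
#  define NO_INLINE             // fallback: nothing
#endif

// Ask the compiler to expand this at every call site.
FORCE_INLINE int max_of(int a, int b)
{
    return a < b ? b : a;
}

// Ask the compiler to keep this as a real call.
NO_INLINE int sum_below(int i)
{
    int x = 0;
    for (int j = 0; j < i; ++j)
    {
        x += j;
    }
    return x;
}
```

Looking at the generated assembly, as in the previous section, is the way to see whether the request was honored on your compiler.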

The page we opened also has this sentence in a box:

Because the meaning of the keyword inline for functions came to mean “multiple definitions are permitted” rather than “inlining is preferred”, that meaning was extended to variables.

In other words: Ⅵ. inline means "multiple definitions are permitted" rather than "inlining is preferred".

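Has it sunk in? The keyword inline, along with its Chinese translation, is a trap. Its real meaning is not "inline this function" but "Relax! However many definitions you see, we are all the same!" So when the toolchain sees an inline function, it allows it to appear in multiple translation units, because it knows the definitions are all identical and share one address.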
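The "one function, one address" point can be sketched as follows. next_id, IdFn and id_function are hypothetical names; the sketch assumes the code sits in a header that several source files include.

```cpp
// An inline function is still a real function: it has one address, and its
// local statics exist once per program, however many source files include it.
inline int next_id()
{
    static int counter = 0; // one shared counter, not one per source file
    return ++counter;
}

// Taking the address is fine; the compiler emits an out-of-line copy when
// one is needed, and the linker merges the copies into one.
using IdFn = int (*)();

inline IdFn id_function()
{
    return &next_id;
}
```

Every source file that calls next_id() draws from the same sequence of ids, which wouldn't be the case if the function were declared static in the header instead.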

I hope everyone who reads this post comes away with a solid understanding of inline in C++.

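9. Summary
inline must go with the function's definition
Don't declare a function defined in a source file as inline
Use inline when a function is defined in a header that may be included by multiple source files
Class member functions don't need to be declared inline
Template-related functions don't need to be declared inline; they already have inline semantics
inline means "multiple definitions are permitted" rather than "inlining is preferred"
inline has nothing whatsoever to do with "optimization"
※ Note: this summary is aimed at readers who are unfamiliar with inline. If you understand everything above and know what you are doing and what will happen when you use inline, then use it however you like!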