Translated from "Five Popular Myths about C++" by Bjarne Stroustrup (part 4)



Myth 4: "For efficiency, you must write low-level code"


Many people seem to believe that efficient code must be low level. Some even seem to believe that low-level code is inherently efficient (“If it’s that ugly, it must be fast! Someone must have spent a lot of time and ingenuity to write that!”). You can, of course, write efficient code using low-level facilities only, and some code has to be low-level to deal directly with machine resources. However, do measure to see if your efforts were worthwhile; modern C++ compilers are very effective and modern machine architectures are very tricky. If needed, such low-level code is typically best hidden behind an interface designed to allow more convenient use. Often, hiding the low-level code behind a higher-level interface also enables better optimizations (e.g., by insulating the low-level code from “insane” uses). Where efficiency matters, first try to achieve it by expressing the desired solution at a high level; don’t dash for bits and pointers.
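
To illustrate hiding low-level code behind a more convenient interface, here is a minimal sketch. The names `sort_decreasing` and `compare_decreasing` are hypothetical and not from the article; the point is only that callers of the wrapper never handle pointers, element sizes, or casts.

```cpp
#include <cstdlib>   // std::qsort
#include <vector>

namespace {
// Low-level detail: three-way compare for qsort that orders larger values first.
int compare_decreasing(const void* p, const void* q)
{
    const double x = *static_cast<const double*>(p);
    const double y = *static_cast<const double*>(q);
    if (x > y) return -1;   // x belongs before y
    if (x < y) return 1;    // x belongs after y
    return 0;
}
} // namespace

// Hypothetical higher-level interface: the element count and size are
// derived from the vector, so the caller cannot get them wrong.
void sort_decreasing(std::vector<double>& v)
{
    if (v.empty()) return;  // nothing to sort; avoids passing a possibly null pointer
    std::qsort(v.data(), v.size(), sizeof(double), compare_decreasing);
}
```

A caller writes `sort_decreasing(v);` and nothing else. The wrapper could later switch to a different algorithm without any change at the call sites.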


5.1 C’s qsort()


Consider a simple example. If you want to sort a set of floating-point numbers in decreasing order, you could write a piece of code to do so. However, unless you have extreme requirements (e.g., have more numbers than would fit in memory), doing so would be most naïve. For decades, we have had library sort algorithms with acceptable performance characteristics. My least favorite is the ISO standard C library qsort():

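#include <stdlib.h>  // qsort

int greater(const void* p, const void* q)  // three-way compare for decreasing order
{
  double x = *(const double*)p;  // get the double value stored at the address p
  double y = *(const double*)q;
  if (x>y) return -1;  // negative result: x is placed before y, so larger values come first
  if (x<y) return 1;
  return 0;
}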

void do_my_sort(double* p, unsigned int n)
{
  qsort(p,n,sizeof(*p),greater);
}

int main()
{
  double a[500000];
  // ... fill a ...
  do_my_sort(a,sizeof(a)/sizeof(*a));  // pass pointer and number of elements
  // ...
}


If you are not a C programmer or if you have not used qsort recently, this may require some explanation; qsort takes four arguments:
A pointer to a sequence of bytes
The number of elements
The size of an element stored in those bytes
A function comparing two elements passed as pointers to their first bytes


Note that this interface throws away information. We are not really sorting bytes. We are sorting doubles, but qsort doesn’t know that, so we have to supply information about how to compare doubles and the number of bytes used to hold a double. Of course, the compiler already knows such information perfectly well. However, qsort’s low-level interface prevents the compiler from taking advantage of type information. Having to state simple information explicitly is also an opportunity for errors. Did I swap qsort()’s two integer arguments? If I did, the compiler wouldn’t notice. Did my greater() follow the conventions for a C three-way compare?
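
The point about swapped arguments can be made concrete with a small sketch. The helper names `compare_decreasing` and `sorts_correctly` are hypothetical, and the example assumes `sizeof(double)` is 8, as on most current platforms. Both integer arguments have type `size_t`, so the compiler accepts them in either order.

```cpp
#include <algorithm>   // std::is_sorted
#include <cstdlib>     // std::qsort
#include <functional>  // std::greater
#include <vector>

// Three-way compare for decreasing order (larger values first).
int compare_decreasing(const void* p, const void* q)
{
    const double x = *static_cast<const double*>(p);
    const double y = *static_cast<const double*>(q);
    if (x > y) return -1;
    if (x < y) return 1;
    return 0;
}

// Sorts 0, 1, ..., 15 and reports whether the result is in decreasing order.
bool sorts_correctly(bool swap_arguments)
{
    std::vector<double> v(16);
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = static_cast<double>(i);

    if (swap_arguments) {
        // Mistake: element count and element size exchanged. This compiles cleanly.
        // Assuming sizeof(double) == 8, qsort now sees 8 "elements" of 16 bytes
        // each and compares only the first double of every 16-byte chunk.
        std::qsort(v.data(), sizeof(double), v.size(), compare_decreasing);
    } else {
        std::qsort(v.data(), v.size(), sizeof(double), compare_decreasing);
    }
    return std::is_sorted(v.begin(), v.end(), std::greater<double>());
}
```

Under the stated assumption, the swapped call stays within the array but moves the doubles in pairs, so the result begins 14, 15, 12, 13 and is not in decreasing order. With other sizes the same mistake could read or write outside the array.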


If you look at an industrial-strength implementation of qsort (please do), you will notice that it works hard to compensate for the lack of information. For example, swapping elements expressed as a number of bytes takes work to do as efficiently as a swap of a pair of doubles. The expensive indirect calls to the comparison function can only be eliminated if the compiler does constant propagation for pointers to functions.
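
The difference between the two kinds of swap can be sketched as follows. This is a deliberately naive illustration, not how any particular library implements qsort; `swap_bytes` and `swap_doubles` are hypothetical names.

```cpp
#include <cstddef>  // std::size_t
#include <utility>  // std::swap

// What a type-erased sort has to work with: two addresses and an element
// size that is only known at run time, so the exchange is done byte by byte.
void swap_bytes(void* a, void* b, std::size_t size)
{
    unsigned char* pa = static_cast<unsigned char*>(a);
    unsigned char* pb = static_cast<unsigned char*>(b);
    for (std::size_t i = 0; i < size; ++i) {
        const unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}

// What a typed sort can do: the element type is known at compile time,
// so the compiler is free to exchange the two values as whole doubles.
void swap_doubles(double& a, double& b)
{
    std::swap(a, b);
}
```

Both functions produce the same result for a pair of doubles. The first needs a loop whose trip count is a run-time value, which is the kind of overhead a serious qsort implementation has to work around.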


5.2 C++’s sort()


Compare qsort() to its C++ equivalent, sort():

void do_my_sort(vector<double>& v)
{
  sort(v,[](double x, double y) { return x>y; });  // sort v in decreasing order
}

int main()
{
  vector<double> vd;
  // ... fill vd ...
  do_my_sort(vd);
  // ...
}


Less explanation is needed here. A vector knows its size, so we don’t have to explicitly pass the number of elements. We never “lose” the type of elements, so we don’t have to deal with element sizes. By default, sort() sorts in increasing order, so I have to specify the comparison criterion, just as I did for qsort(). Here, I passed it as a lambda expression comparing two doubles using >. As it happens, that lambda is trivially inlined by all C++ compilers I know of, so the comparison really becomes just a greater-than machine operation; there is no (inefficient) indirect function call.


I used a container version of sort() to avoid being explicit about the iterators. That is, to avoid having to write:

std::sort(v.begin(),v.end(),[](double x, double y) { return x>y; });
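
The article does not show the container version of sort() itself. A minimal sketch of how such a helper might look is given below; the namespace `sketch` is a hypothetical name chosen here, and the helper simply forwards to the iterator-based `std::sort`.

```cpp
#include <algorithm>  // std::sort
#include <vector>

namespace sketch {
// Hypothetical container-style sort: takes the whole container plus a
// comparison and forwards to the iterator-based std::sort.
template<typename Container, typename Compare>
void sort(Container& c, Compare cmp)
{
    std::sort(c.begin(), c.end(), cmp);
}
} // namespace sketch

void do_my_sort(std::vector<double>& v)
{
    // Sort v in decreasing order; the lambda's type is visible to the
    // compiler, which makes the comparison easy to inline.
    sketch::sort(v, [](double x, double y) { return x > y; });
}
```

Because the helper is a template, the element type and the comparison are still known at compile time, so nothing is lost relative to calling `std::sort` directly.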


I could go further and use a C++14 comparison object:

sort(v,greater<>()); // sort v in decreasing order


Which version is faster? You can compile the qsort version as C or C++ without any performance difference, so this is really a comparison of programming styles, rather than of languages. The library implementations seem always to use the same algorithm for sort and qsort, so it is a comparison of programming styles, rather than of different algorithms. Different compilers and library implementations give different results, of course, but for each implementation we have a reasonable reflection of the effects of different levels of abstraction.


I recently ran the examples and found the sort() version 2.5 times faster than the qsort() version. Your mileage will vary from compiler to compiler and from machine to machine, but I have never seen qsort beat sort. I have seen sort run 10 times faster than qsort. How come? The C++ standard-library sort is clearly at a higher level than qsort as well as more general and flexible. It is type safe and parameterized over the storage type, element type, and sorting criteria. There isn’t a pointer, cast, size, or a byte in sight. The STL, the part of the C++ standard library that includes sort, tries very hard not to throw away information. This makes for excellent inlining and good optimizations.
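
For readers who want to measure this on their own machine, here is a rough timing sketch. It is not the author's benchmark: the names `Timings`, `time_both`, and `compare_decreasing`, the fixed seed, and the use of uniformly distributed random input are all assumptions made for illustration. Build with optimization enabled, and expect the ratio to differ between compilers and machines.

```cpp
#include <algorithm>   // std::sort
#include <chrono>      // std::chrono::steady_clock
#include <cstdlib>     // std::qsort
#include <functional>  // std::greater
#include <random>      // std::mt19937
#include <vector>

// Three-way compare for decreasing order (larger values first).
int compare_decreasing(const void* p, const void* q)
{
    const double x = *static_cast<const double*>(p);
    const double y = *static_cast<const double*>(q);
    if (x > y) return -1;
    if (x < y) return 1;
    return 0;
}

struct Timings {
    double qsort_ms;   // wall-clock time of the qsort call, in milliseconds
    double sort_ms;    // wall-clock time of the std::sort call, in milliseconds
    bool same_result;  // true if both calls produced the same ordering
};

// Sorts two copies of the same n random doubles (n > 0), one with each function.
Timings time_both(std::size_t n)
{
    std::mt19937 gen(42);  // fixed seed so that runs are repeatable
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<double> a(n);
    for (double& x : a) x = dist(gen);
    std::vector<double> b = a;

    using Clock = std::chrono::steady_clock;
    const auto t0 = Clock::now();
    std::qsort(a.data(), a.size(), sizeof(double), compare_decreasing);
    const auto t1 = Clock::now();
    std::sort(b.begin(), b.end(), std::greater<double>());
    const auto t2 = Clock::now();

    const std::chrono::duration<double, std::milli> q = t1 - t0;
    const std::chrono::duration<double, std::milli> s = t2 - t1;
    return {q.count(), s.count(), a == b};
}
```

Calling `time_both(500000)` matches the array size used in the qsort example. A single run is noisy, so repeating the measurement several times and comparing the typical values is advisable.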


Generality and high-level code can beat low-level code. It doesn’t always, of course, but the sort/qsort comparison is not an isolated example. Always start out with a higher-level, precise, and type-safe version of the solution. Optimize (only) if needed.
