Translated from "Five Popular Myths about C++" by Bjarne Stroustrup (3)

Myth 3: "For reliable software, you need Garbage Collection"

Garbage collection does a good, but not perfect, job of reclaiming unused memory. It is not a panacea. Memory can be retained indirectly, and many resources are not plain memory. Consider:
// take input from file iname and produce output on file oname
class Filter { 
public:
  Filter(const string& iname, const string& oname); // constructor
  ~Filter();                                        // destructor
  // ...
private:
  ifstream is;
  ofstream os;
  // ...
};

This Filter’s constructor opens two files. That done, the Filter performs some task on input from its input file, producing output on its output file. The task could be hardwired into Filter, supplied as a lambda, or supplied by a derived class overriding a virtual function. Those details are not important in a discussion of resource management. We can create Filters like this:
void user()
{
  Filter flt {"books","authors"};
  Filter* p = new Filter{"novels","favorites"};
  // use flt and *p
  delete p;
}

From a resource management point of view, the problem here is how to guarantee that the files are closed and that the resources associated with the two streams are properly reclaimed for potential re-use.

The conventional solution in languages and systems relying on garbage collection is to eliminate the delete (which is easily forgotten, leading to leaks) and the destructor (garbage-collected languages rarely have destructors, and “finalizers” are best avoided because they can be logically tricky and often damage performance). A garbage collector can reclaim all memory, but we need user actions (code) to close the files and to release any non-memory resources (such as locks) associated with the streams. Thus memory is automatically (and in this case perfectly) reclaimed, but the management of other resources is manual and therefore open to errors and leaks.

The common and recommended C++ approach is to rely on destructors to ensure that resources are reclaimed. Typically, such resources are acquired in a constructor, leading to the awkward name “Resource Acquisition Is Initialization” (RAII) for this simple and general technique. In user(), the destructor for flt implicitly calls the destructors for the streams is and os. These destructors in turn close the files and release the resources associated with the streams. The delete would do the same for *p.
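
The implicit chain of destructor calls can be made visible with a small sketch. It does not use real file streams: Resource and TwoResources are hypothetical stand-ins (not from the article) that record each acquire and release in a log, and TwoResources, like Filter, declares no destructor of its own.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream: records acquire/release in a log.
class Resource {
public:
    Resource(std::string name, std::vector<std::string>& log)
        : name_{std::move(name)}, log_(log)
    {
        log_.push_back("acquire " + name_);
    }
    ~Resource() { log_.push_back("release " + name_); }

    Resource(const Resource&) = delete;            // one owner per resource
    Resource& operator=(const Resource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Like Filter: no destructor is written. The compiler-generated one
// destroys the members in reverse order of declaration.
class TwoResources {
public:
    explicit TwoResources(std::vector<std::string>& log)
        : in_{"in", log}, out_{"out", log} {}
private:
    Resource in_;
    Resource out_;
};

// Returns the sequence of events for one scoped TwoResources object.
std::vector<std::string> trace_scope()
{
    std::vector<std::string> log;
    {
        TwoResources r{log};
        log.push_back("use");
    }   // r leaves scope here: out_ is released, then in_
    assert(log.size() == 5);
    return log;
}
```

Calling trace_scope() yields the two acquisitions, the use, and then the two releases in reverse order, with no release code written by the user of TwoResources.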

Experienced users of modern C++ will have noticed that user() is rather clumsy and unnecessarily error-prone. This would be better:
void user2()
{
  Filter flt {"books","authors"};
  unique_ptr<Filter> p {new Filter{"novels","favorites"}};
  // use flt and *p
}

Now *p will be implicitly released whenever user2() is exited, whether by a normal return or by an exception. The programmer cannot forget to do so. The unique_ptr is a standard-library class designed to ensure resource release without run-time or space overhead compared to the use of built-in “naked” pointers.
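
A minimal sketch of the "cannot forget" property, under the assumption of a hypothetical Tracked type that counts its live instances. The function work() leaves early by throwing, and the object owned by the unique_ptr is still released during stack unwinding.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical type that counts how many instances are alive.
struct Tracked {
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Owns a Tracked through unique_ptr, then may leave early by throwing.
void work(bool fail)
{
    std::unique_ptr<Tracked> p{new Tracked};
    assert(Tracked::live == 1);
    if (fail) throw std::runtime_error{"early exit"};
    // ... use *p ...
}   // p's destructor deletes the Tracked on every exit path

// Returns the number of live objects after a normal and a failing call.
int live_after_calls()
{
    work(false);
    try {
        work(true);
    } catch (const std::runtime_error&) {
        // the object was already released during stack unwinding
    }
    return Tracked::live;
}
```

With a naked pointer and a delete at the end of work(), the throwing path would skip the delete and leak the object.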

However, three things are still unsatisfying: the new is still visible, the solution is a bit verbose (the type Filter is repeated), and separating the construction of the ordinary pointer (using new) from that of the smart pointer (here, unique_ptr) inhibits some significant optimizations. We can improve this by using the C++14 helper function make_unique, which constructs an object of a specified type and returns a unique_ptr to it:
void user3()
{
  Filter flt {"books","authors"};
  auto p = make_unique<Filter>("novels","favorites");
  // use flt and *p
}

Unless we really needed the second Filter to have pointer semantics (which is unlikely), this would be better still:
void user4()
{
  Filter flt {"books","authors"};
  Filter flt2 {"novels","favorites"};
  // use flt and flt2
}

This last version is shorter, simpler, clearer, and faster than the original.

But what does Filter’s destructor do? It releases the resources owned by a Filter; that is, it closes the files (by invoking their destructors). In fact, that is done implicitly, so unless something else is needed for Filter, we could eliminate the explicit mention of the Filter destructor and let the compiler handle it all. So, what I would have written was just:
class Filter { // take input from file iname and produce output on file oname
public:
  Filter(const string& iname, const string& oname);
  // ...
private:
  ifstream is;
  ofstream os;
  // ...
};

void user4()
{
  Filter flt {"books","authors"};
  Filter flt2 {"novels","favorites"};
  // use flt and flt2
}


This happens to be simpler than what you would write in most garbage collected languages (e.g., Java or C#) and it is not open to leaks caused by forgetful programmers. It is also faster than the obvious alternatives (no spurious use of the free/dynamic store and no need to run a garbage collector). Typically, RAII also decreases the resource retention time relative to manual approaches.

This is my ideal for resource management. It handles not just memory, but general (non-memory) resources, such as file handles, thread handles, and locks. But is it really general? How about objects that need to be passed around from function to function? What about objects that don’t have an obvious single owner?
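
The same pattern applied to a lock can be sketched as follows. std::lock_guard is the standard RAII wrapper for anything with lock() and unlock(); CountingLock and guarded_parse are hypothetical names introduced only for this illustration, and a counting lock is used instead of std::mutex so that the release is easy to observe.

```cpp
#include <cassert>
#include <mutex>      // std::lock_guard
#include <stdexcept>
#include <string>

// Hypothetical lock that only counts. It provides the lock()/unlock()
// interface that std::lock_guard needs.
struct CountingLock {
    int held = 0;       // 1 while locked, 0 otherwise
    int releases = 0;   // number of unlock() calls so far
    void lock()   { ++held; }
    void unlock() { --held; ++releases; }
};

// Parses text while holding the lock. std::stoi throws
// std::invalid_argument for non-numeric input.
int guarded_parse(CountingLock& lk, const std::string& text)
{
    std::lock_guard<CountingLock> guard{lk};
    assert(lk.held == 1);
    return std::stoi(text);
}   // guard's destructor calls lk.unlock(), even if stoi threw
```

Whether guarded_parse() returns normally or throws, the lock is released exactly once per call, and no unlock() appears in user code.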


4.1 Transferring Ownership: move


Let us first consider the problem of moving objects around from scope to scope. The critical question is how to get a lot of information out of a scope without serious overhead from copying or error-prone pointer use. The traditional approach is to use a pointer:

X* make_X()
{
  X* p = new X;
  // ... fill *p ...
  return p;
}

void user()
{
  X* q = make_X();
  // ... use *q ...
  delete q;
}


Now who is responsible for deleting the object? In this simple case, obviously the caller of make_X() is, but in general the answer is not obvious. What if make_X() keeps a cache of objects to minimize allocation overhead? What if user() passed the pointer to some other_user()? The potential for confusion is large, and leaks are not uncommon in this style of program.


I could use a shared_ptr or a unique_ptr to be explicit about the ownership of the created object. For example:

unique_ptr<X> make_X();
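
A minimal sketch of how such a signature documents ownership, assuming a trivial X with a single value member. The body of make_X(), the function other_user(), and ownership_demo() are illustrative, not the article's code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct X {
    int value = 0;
};

// The return type says it: the caller becomes the sole owner.
std::unique_ptr<X> make_X()
{
    auto p = std::make_unique<X>();
    p->value = 7;       // ... fill *p ...
    return p;           // ownership moves out to the caller
}

// Taking a unique_ptr by value means "I take over ownership".
int other_user(std::unique_ptr<X> p)
{
    return p->value;
}   // *p is deleted here

bool ownership_demo()
{
    std::unique_ptr<X> q = make_X();
    assert(q != nullptr);
    int v = other_user(std::move(q));   // explicit hand-over
    return v == 7 && q == nullptr;      // q no longer owns anything
}
```

The question "who deletes the object?" is answered by the types: at each point exactly one unique_ptr owns the X, and handing it on requires a visible std::move.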


But why use a pointer (smart or not) at all? Often, I don’t want a pointer, and often a pointer would distract from the conventional use of an object. For example, a Matrix addition function creates a new object (the sum) from two arguments, but returning a pointer would lead to seriously odd code:

unique_ptr<Matrix> operator+(const Matrix& a, const Matrix& b);
Matrix res = *(a+b);

That * is needed to get the sum, rather than a pointer to it. What I really want in many cases is an object, rather than a pointer to an object. Most often, I can easily get that. In particular, small objects are cheap to copy and I wouldn’t dream of using a pointer:
double sqrt(double); // a square root function
double s2 = sqrt(2); // get the square root of 2

On the other hand, objects holding lots of data are typically handles to most of that data. Consider istream, string, vector, list, and thread. They are all just a few words of data ensuring proper access to potentially large amounts of data. Consider again the Matrix addition. What we want is:
Matrix operator+(const Matrix& a, const Matrix& b); // return the sum of a and b
Matrix r = x+y;

We can easily get that:
Matrix operator+(const Matrix& a, const Matrix& b)
{
  Matrix res;
  // ... fill res with element sums ...
  return res;
}

By default, this copies the elements of res into r, but since res is just about to be destroyed and the memory holding its elements is to be freed, there is no need to copy: we can “steal” the elements. Anybody could have done that since the first days of C++, and many did, but it was tricky to implement and the technique was not widely understood. C++11 directly supports “stealing the representation” from a handle in the form of move operations that transfer ownership. Consider a simple 2-D Matrix of doubles:
class Matrix {
  double* elem; // pointer to elements
  int nrow;     // number of rows
  int ncol;     // number of columns
public:
  Matrix(int nr, int nc)                  // constructor: allocate elements
    :elem{new double[nr*nc]}, nrow{nr}, ncol{nc}
  {
    for(int i=0; i<nr*nc; ++i) elem[i]=0; // initialize elements
  }

  Matrix(const Matrix&);                  // copy constructor
  Matrix& operator=(const Matrix&);       // copy assignment

  Matrix(Matrix&&);                       // move constructor
  Matrix& operator=(Matrix&&);            // move assignment

  ~Matrix() { delete[] elem; }            // destructor: free the elements

// …
};


A copy operation is recognized by its lvalue reference (&) argument. Similarly, a move operation is recognized by its rvalue reference (&&) argument. A move operation is supposed to “steal” the representation and leave an “empty object” behind. For Matrix, that means something like this:

Matrix::Matrix(Matrix&& a)                   // move constructor
  :elem{a.elem}, nrow{a.nrow}, ncol{a.ncol}  // "steal" the representation (in declaration order)
{
  a.elem = nullptr;                          // leave "nothing" behind
}
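
For readers who want to try this, here is a self-contained sketch of such a Matrix. It fills in details the article leaves open, so treat them as assumptions: the accessors data(), rows(), and cols() are added for observation, the moved-from object's dimensions are reset to 0, and copy assignment is simply left out.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

class Matrix {
    double* elem;   // pointer to elements
    int nrow;
    int ncol;
public:
    Matrix(int nr, int nc)                       // allocate zero-filled elements
        : elem{new double[nr * nc]{}}, nrow{nr}, ncol{nc} {}

    Matrix(const Matrix& a)                      // copy: duplicate the elements
        : elem{new double[a.nrow * a.ncol]}, nrow{a.nrow}, ncol{a.ncol}
    {
        std::copy(a.elem, a.elem + nrow * ncol, elem);
    }

    Matrix(Matrix&& a) noexcept                  // move: steal the elements
        : elem{a.elem}, nrow{a.nrow}, ncol{a.ncol}
    {
        a.elem = nullptr;
        a.nrow = a.ncol = 0;
    }

    Matrix& operator=(Matrix&& a) noexcept       // move assignment
    {
        if (this != &a) {
            delete[] elem;                       // release what we owned
            elem = a.elem; nrow = a.nrow; ncol = a.ncol;
            a.elem = nullptr; a.nrow = a.ncol = 0;
        }
        return *this;
    }

    ~Matrix() { delete[] elem; }                 // deleting nullptr is a no-op

    const double* data() const { return elem; }
    int rows() const { return nrow; }
    int cols() const { return ncol; }
};

bool move_steals_elements()
{
    Matrix a{2, 3};
    const double* before = a.data();
    Matrix b{std::move(a)};                      // move constructor
    assert(b.cols() == 3);
    return b.data() == before && a.data() == nullptr && b.rows() == 2;
}
```

After the move, b holds the very same element buffer that a allocated, and the destructor of a has nothing left to free.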


That’s it! When the compiler sees the return res; it realizes that res is soon to be destroyed. That is, res will not be used after the return. Therefore it applies the move constructor, rather than the copy constructor, to transfer the return value. In particular, for

Matrix r = a+b;


the res inside operator+() becomes empty (giving the destructor a trivial task) and res’s elements are now owned by r. We have managed to get the elements of the result (potentially megabytes of memory) out of the function (operator+()) and into the caller’s variable. We have done that at a minimal cost (probably four word assignments).


Expert C++ users have pointed out that there are cases where a good compiler can eliminate the copy on return completely (in this case saving the four word moves and the destructor call). However, that is implementation dependent, and I don’t like the performance of my basic programming techniques to depend on the degree of cleverness of individual compilers. Furthermore, a compiler that can eliminate the copy can as easily eliminate the move. What we have here is a simple, reliable, and general way of eliminating the complexity and cost of moving a lot of information from one scope to another.
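
One way to check this on a given compiler is to count copies. In the sketch below, Big is a hypothetical handle type (not from the article) whose copy and move constructors increment counters. Whether the return is elided or performed as a move depends on the compiler and language version, but under the rules described above it should never be an element-by-element copy.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical handle type that counts how often it is copied or moved.
struct Big {
    static int copies;
    static int moves;
    std::vector<double> data;

    explicit Big(std::size_t n) : data(n) {}
    Big(const Big& other) : data(other.data) { ++copies; }
    Big(Big&& other) noexcept : data(std::move(other.data)) { ++moves; }
};
int Big::copies = 0;
int Big::moves = 0;

Big make_big()
{
    Big res(1000000);   // roughly 8 MB, assuming 8-byte doubles
    // ... fill res.data ...
    return res;         // elided or moved, not copied
}

// Returns the number of copy-constructor calls made by one call.
int copies_on_return()
{
    Big::copies = 0;
    Big r = make_big();
    assert(r.data.size() == 1000000);
    return Big::copies;
}
```

Big::moves may end up as 0 (elision) or a small positive number (move), which is exactly the implementation dependence the paragraph above refers to.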


Often, we don’t even need to define all those copy and move operations. If a class is composed out of members that behave as desired, we can simply rely on the operations generated by default. Consider:

class Matrix {
    vector<double> elem; // elements
    int nrow;            // number of rows
    int ncol;            // number of columns
public:
    Matrix(int nr, int nc)    // constructor: allocate elements
      :elem(nr*nc), nrow{nr}, ncol{nc}
    { }

    // ...
};


This version of Matrix behaves like the version above except that it copes slightly better with errors and has a slightly larger representation (a vector is usually three words).
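
A short sketch of relying on the generated operations. The accessors data(), rows(), and cols() and the function default_move_steals_elements() are additions for illustration. The check that the buffer address is unchanged reflects how std::vector's move constructor behaves on mainstream implementations.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class Matrix {
    std::vector<double> elem;   // the vector owns the elements
    int nrow;
    int ncol;
public:
    Matrix(int nr, int nc)
        : elem(nr * nc), nrow{nr}, ncol{nc} {}   // elements start at 0.0

    // No destructor, copy, or move operations are declared:
    // the compiler generates all of them from the members.

    const double* data() const { return elem.data(); }
    int rows() const { return nrow; }
    int cols() const { return ncol; }
};

bool default_move_steals_elements()
{
    Matrix a{2, 3};
    const double* before = a.data();
    Matrix b = std::move(a);         // generated move constructor
    assert(b.rows() == 2 && b.cols() == 3);
    return b.data() == before;       // same buffer, nothing copied
}
```

One design caveat of this sketch: the generated move leaves nrow and ncol of the moved-from object unchanged while its vector is empty, so a moved-from Matrix should only be destroyed or assigned to.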


What about objects that are not handles? If they are small, like an int or a complex<double>, don’t worry. Otherwise, make them handles or return them using “smart” pointers, such as unique_ptr and shared_ptr. Don’t mess with “naked” new and delete operations.


Unfortunately, a Matrix like the one I used in the example is not part of the ISO C++ standard library, but several are available (open source and commercial). For example, search the Web for “Origin Matrix Sutton” and see Chapter 29 of my The C++ Programming Language (Fourth Edition) [11] for a discussion of the design of such a matrix.


4.2 Shared Ownership: shared_ptr


In discussions about garbage collection it is often observed that not every object has a unique owner. That means that we have to be able to ensure that an object is destroyed/freed when the last reference to it disappears. In the model here, we have to have a mechanism to ensure that an object is destroyed when its last owner is destroyed. That is, we need a form of shared ownership. Say we have a synchronized queue, a sync_queue, used to communicate between tasks. A producer and a consumer are each given a pointer to the sync_queue:

void startup()
{
  sync_queue* p  = new sync_queue{200};  // trouble ahead!
  thread t1 {task1,iqueue,p};  // task1 reads from *iqueue and writes to *p
  thread t2 {task2,p,oqueue};  // task2 reads from *p and writes to *oqueue
  t1.detach();
  t2.detach();
}


I assume that task1, task2, iqueue, and oqueue have been suitably defined elsewhere and apologize for letting the threads outlive the scope in which they were created (using detach()). Also, you may imagine pipelines with many more tasks and sync_queues. However, here I am only interested in one question: “Who deletes the sync_queue created in startup()?” As written, there is only one good answer: “Whoever is the last to use the sync_queue.” This is a classic motivating case for garbage collection. The original form of garbage collection was counted pointers: maintain a use count for the object and when the count is about to go to zero delete the object. Many languages today rely on a variant of this idea and C++11 supports it in the form of shared_ptr. The example becomes:

void startup()
{
  auto p = make_shared<sync_queue>(200);  // make a sync_queue and return a shared_ptr to it
  thread t1 {task1,iqueue,p};  // task1 reads from *iqueue and writes to *p
  thread t2 {task2,p,oqueue};  // task2 reads from *p and writes to *oqueue
  t1.detach();
  t2.detach();
}


Now the destructors for task1 and task2 can destroy their shared_ptrs (and will do so implicitly in most good designs) and the last task to do so will destroy the sync_queue.
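
The "last owner destroys the object" rule can be observed without threads. The following sketch deliberately leaves out std::thread to stay deterministic: Queue is a hypothetical stand-in for sync_queue, Task is a plain object holding a shared_ptr member, and a weak_ptr watches the queue without owning it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for sync_queue; a real one would hold a mutex etc.
struct Queue {
    explicit Queue(int cap) : capacity{cap} {}
    int capacity;
};

// Each task keeps the queue alive through its own shared_ptr member.
struct Task {
    std::shared_ptr<Queue> q;
};

bool last_owner_destroys_queue()
{
    std::weak_ptr<Queue> watcher;            // observes without owning
    {
        auto p = std::make_shared<Queue>(200);
        watcher = p;
        auto task1 = std::make_unique<Task>(Task{p});
        auto task2 = std::make_unique<Task>(Task{p});
        assert(p.use_count() == 3);          // p, task1, task2

        p.reset();                           // startup() is done with it
        task1.reset();                       // first task finishes
        if (watcher.expired()) return false; // queue must still be alive
        task2.reset();                       // last owner goes away
    }
    return watcher.expired();                // queue has been destroyed
}
```

The order in which the owners disappear does not matter: the queue is destroyed exactly when the use count drops to zero.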


This is simple and reasonably efficient. It does not imply a complicated run-time system with a garbage collector. Importantly, it does not just reclaim the memory associated with the sync_queue. It reclaims the synchronization object (mutex, lock, or whatever) embedded in the sync_queue to manage the synchronization of the two threads running the two tasks. What we have here is again not just memory management; it is general resource management. That “hidden” synchronization object is handled exactly as the file handles and stream buffers were handled in the earlier example.


We could try to eliminate the use of shared_ptr by introducing a unique owner in some scope that encloses the tasks, but doing so is not always simple, so C++11 provides both unique_ptr (for unique ownership) and shared_ptr (for shared ownership).


4.3 Type Safety


Here, I have only addressed garbage collection in connection with resource management. It also has a role to play in type safety. As long as we have an explicit delete operation, it can be misused. For example:
X* p = new X;
X* q = p;
delete p;
// ...
// the memory that held *p may have been re-used
q->do_something(); 


Don’t do that. Naked deletes are dangerous, and unnecessary in general/user code. Leave deletes inside resource management classes, such as string, ostream, thread, unique_ptr, and shared_ptr. There, deletes are carefully matched with news and are harmless.
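
As a hedged illustration of the alternative, the aliasing example can be written with shared_ptr, so that giving up one owner never leaves the other one dangling. The struct X with its do_something() member and both function names are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct X {
    int do_something() const { return 42; }
};

// Two owners: dropping one does not invalidate the other.
int use_after_release_of_one_owner()
{
    auto p = std::make_shared<X>();
    std::shared_ptr<X> q = p;    // q shares ownership with p
    p.reset();                   // replaces "delete p"; the X stays alive
    assert(q.use_count() == 1);
    return q->do_something();    // well defined: q still owns the object
}

// A non-owning observer can check whether the object is gone.
bool observer_sees_destruction()
{
    auto p = std::make_shared<X>();
    std::weak_ptr<X> w = p;      // does not keep the X alive
    p.reset();                   // last owner gone: the X is destroyed
    return w.lock() == nullptr;  // lock() yields an empty shared_ptr
}
```

If q only needs to observe the object, a weak_ptr makes the "is it still there?" question explicit instead of leaving it to chance.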


4.4 Summary: Resource Management Ideals


For resource management, I consider garbage collection a last choice, rather than “the solution” or an ideal:


1. Use appropriate abstractions that recursively and implicitly handle their own resources. Prefer such objects to be scoped variables.
2. When you need pointer/reference semantics, use “smart pointers” such as unique_ptr and shared_ptr to represent ownership.
3. If everything else fails (e.g., because your code is part of a program using a mess of pointers without a language-supported strategy for resource management and error handling), try to handle non-memory resources “by hand” and plug in a conservative garbage collector to handle the almost inevitable memory leaks.


Is this strategy perfect? No, but it is general and simple. Traditional garbage-collection based strategies are not perfect either, and they don’t directly address non-memory resources.
