Windows via C/C++: Learning While Translating (4) Working with Characters and Strings, Part 3

Unicode and ANSI Functions in the C Run-Time Library


Like the Windows functions, the C run-time library offers one set of functions to manipulate ANSI characters and strings and another set of functions to manipulate Unicode characters and strings. However, unlike the Windows ANSI functions, the C run-time ANSI functions do the work themselves; they do not translate the strings to Unicode and then call the Unicode versions of the functions internally. And, of course, the Unicode versions do the work themselves too; they do not internally call the ANSI versions.


For example, the C run-time function strlen returns the length of an ANSI string, and its equivalent, wcslen, returns the length of a Unicode string.


Both of these functions are prototyped in String.h. To write source code that can be compiled for either ANSI or Unicode, you must also include TChar.h, which defines the following macro:


#ifdef _UNICODE
#define _tcslen     wcslen
#else
#define _tcslen     strlen
#endif

 

Now, in your code, you should call _tcslen. If _UNICODE is defined, it expands to wcslen; otherwise, it expands to strlen. By default, when you create a new C++ project in Visual Studio, _UNICODE is defined (just like UNICODE is). The C run-time library always prefixes identifiers that are not part of the C++ standard with underscores, while the Windows team does not do this. So, in your applications, you'll want to make sure that either both UNICODE and _UNICODE are defined or neither is defined. Appendix A, "The Build Environment," describes the CmnHdr.h header file used by all the code samples in this book to avoid this kind of problem.
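
The mechanism is plain preprocessor substitution. The following minimal, portable sketch mimics what TChar.h does; it assumes hypothetical names (MY_UNICODE, my_tchar, MY_T, my_tcslen) as stand-ins for _UNICODE, TCHAR, _T, and _tcslen so that it compiles with any C compiler. The real header defines many more mappings than these.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the TChar.h mappings: define MY_UNICODE for
   the wide-character build, leave it undefined for the ANSI build. */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(s)    L##s       /* becomes an L"..." literal */
#define my_tcslen  wcslen     /* expands to wcslen */
#else
typedef char my_tchar;
#define MY_T(s)    s          /* plain "..." literal */
#define my_tcslen  strlen     /* expands to strlen */
#endif

/* The same source line compiles in both builds. */
size_t greeting_length(void)
{
    const my_tchar *msg = MY_T("Hello");
    return my_tcslen(msg);    /* character count, excluding the terminating 0 */
}
```

Building with -DMY_UNICODE switches the character type, the literal prefix, and the length function together. That is also why UNICODE and _UNICODE should be defined together: each one drives a different half of the mapping (Windows headers versus C run-time headers).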


 

Secure String Functions in the C Run-Time Library


 

Any function that modifies a string exposes a potential danger: if the destination string buffer is not large enough to contain the resulting string, memory corruption occurs. Here is an example:


// The following puts 4 characters in a
// 3-character buffer, resulting in memory corruption
WCHAR szBuffer[3] = L"";
wcscpy(szBuffer, L"abc"); // The terminating 0 is a character too!

 

The problem with the strcpy and wcscpy functions (and most other string manipulation functions) is that they do not accept an argument specifying the maximum size of the buffer, and therefore, the function doesn't know that it is corrupting memory. Because the function doesn't know that it is corrupting memory, it can't report an error back to your code, and therefore, you have no way of knowing that memory was corrupted. And, of course, it would be best if the function just failed without corrupting any memory at all.
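The missing ingredient is the destination size. Here is a minimal sketch of a copy function that receives it, checks before writing anything, and reports failure instead of overflowing; bounded_wcscpy is a hypothetical name used only for illustration, not a C run-time function.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical size-aware copy: returns 0 on success, -1 if the arguments
   are NULL or if dest (dest_count wide characters) cannot hold src plus
   its terminating 0. Nothing is ever written past dest[dest_count - 1]. */
int bounded_wcscpy(wchar_t *dest, size_t dest_count, const wchar_t *src)
{
    if (dest == NULL || src == NULL)
        return -1;
    size_t needed = wcslen(src) + 1;   /* +1: the terminating 0 is a character too */
    if (dest_count < needed)
        return -1;                     /* refuse instead of corrupting memory */
    wmemcpy(dest, src, needed);        /* copies the terminator as well */
    return 0;
}
```

With this signature, the earlier example (L"abc" into a 3-character buffer) fails cleanly because 4 characters are needed, while a 4-character buffer succeeds.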


This kind of misbehavior has been heavily exploited by malware in the past. Microsoft now provides a set of new functions that replace the unsafe string manipulation functions of the C run-time library (such as wcscpy, which was shown earlier) that many of us have grown to know and love over the years. To write safe code, you should no longer use any of the familiar C run-time functions that modify a string. (Functions such as strlen, wcslen, and _tcslen are OK, however, because they do not attempt to modify the string passed to them, even though they assume that the string is 0 terminated, which might not be the case.) Instead, you should take advantage of the new secure string functions defined by Microsoft's StrSafe.h file.


 Note  Internally, Microsoft has retrofitted its ATL and MFC class libraries to use the new safe string functions. Therefore, if you use these libraries, rebuilding your application with the new versions is all you have to do to make your application more secure.


Because this book is not dedicated to C/C++ programming, for a detailed usage of this library, you should take a look at the following sources of information:


The MSDN Magazine article "Repel Attacks on Your Code with the Visual Studio 2005 Safe C and C++ Libraries" by Martyn Lovell, located at http://msdn.microsoft.com/msdnmag/issues/05/05/SafeCandC/default.aspx

The Martyn Lovell video presentation on Channel9, located at http://channel9.msdn.com/Showpost.aspx?postid=186406

The secure strings topic on MSDN Online, located at http://msdn2.microsoft.com/en-us/library/ms647466.aspx

The list of all C run-time secured replacement functions on MSDN Online, which you can find at http://msdn2.microsoft.com/en-us/library/wd3wzwts(VS.80).aspx

However, a couple of details are worth discussing in this chapter. I'll start by describing the patterns employed by the new functions. Next, I'll mention the pitfalls you might encounter when migrating from legacy functions to their corresponding secure versions, such as using _tcscpy_s instead of _tcscpy. Then I'll show you in which cases it might be more interesting to call the new StringC* functions instead.


Introducing the New Secure String Functions

When you include StrSafe.h, String.h is also included, and the existing string manipulation functions of the C run-time library, such as those behind the _tcscpy macro, are flagged as obsolete with warnings during compilation. Note that StrSafe.h must be included after all other include files in your source code. I recommend that you use these compilation warnings to explicitly replace every occurrence of a deprecated function with its safer substitute, thinking each time about a possible buffer overflow and, if it is not possible to recover, about how to terminate the application gracefully.

Each existing function, like _tcscpy or _tcscat, has a corresponding new function with the same name followed by the _s (for secure) suffix. All these new functions share common characteristics that require explanation. Let's start by examining their prototypes in the following code snippet, which shows side-by-side definitions for two common string functions:

PTSTR _tcscpy (PTSTR strDestination, PCTSTR strSource);
errno_t _tcscpy_s(PTSTR strDestination, size_t numberOfCharacters,
   PCTSTR strSource);

PTSTR _tcscat (PTSTR strDestination, PCTSTR strSource);
errno_t _tcscat_s(PTSTR strDestination, size_t numberOfCharacters,
   PCTSTR strSource);

 

When a writable buffer is passed as a parameter, its size must also be provided. This value must be expressed as a number of characters (not bytes), which is easily computed by applying the _countof macro (defined in stdlib.h) to your buffer.
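Because _countof yields elements rather than bytes, it matters most for wide-character buffers. The portable sketch below assumes a hypothetical macro, MY_COUNTOF, that mirrors the C definition of _countof; in C++, the real MSVC macro additionally uses a template so that passing a pointer instead of an array fails to compile.

```c
#include <stddef.h>
#include <wchar.h>

/* Same idea as the C definition of _countof: the number of elements in an
   array. Only valid for true arrays, never for pointers. */
#define MY_COUNTOF(a) (sizeof(a) / sizeof((a)[0]))

static wchar_t g_szBuffer[10];

/* Size in bytes: 10 * sizeof(wchar_t), which is 20 on Windows. */
size_t buffer_bytes(void) { return sizeof(g_szBuffer); }

/* Size in characters: 10, the value an _s function expects. */
size_t buffer_chars(void) { return MY_COUNTOF(g_szBuffer); }
```

Passing sizeof(szBuffer) where a character count is expected would overstate the capacity by a factor of sizeof(wchar_t), which defeats the very check the _s functions perform.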

All of the secure (_s) functions validate their arguments as the first thing they do. They check that pointers are not NULL, that integers are within a valid range, that enumeration values are valid, and that buffers are large enough to hold the resulting data. If any of these checks fails, the function sets the thread-local C run-time variable errno and returns an errno_t value indicating the failure. By default, however, the function does not actually return: in a debug build, it first displays a user-unfriendly assertion dialog box similar to the one shown in Figure 2-1, and then your application is terminated. Release builds terminate directly.
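The order of these checks can be sketched portably. In the following minimal sketch, sketch_wcscpy_s and my_errno_t are hypothetical names, and, unlike the real C run-time functions, this sketch simply returns the error code instead of invoking the invalid-parameter handler and terminating the application.

```c
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

typedef int my_errno_t;   /* stand-in for errno_t, which is an int */

/* Hypothetical sketch of the argument validation an _s copy performs
   before it copies anything. Returns 0 on success. */
my_errno_t sketch_wcscpy_s(wchar_t *dest, size_t count, const wchar_t *src)
{
    if (dest == NULL || count == 0) {   /* unusable destination */
        errno = EINVAL;
        return EINVAL;
    }
    if (src == NULL) {                  /* bad source: leave an empty string */
        dest[0] = L'\0';
        errno = EINVAL;
        return EINVAL;
    }
    size_t needed = wcslen(src) + 1;    /* terminator included */
    if (needed > count) {               /* buffer too small */
        dest[0] = L'\0';
        errno = ERANGE;
        return ERANGE;
    }
    wmemcpy(dest, src, needed);
    return 0;
}
```

Setting errno and returning the same code lets callers use either style of error checking.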

The C run time actually allows you to provide a function of your own, which it will call when it detects an invalid parameter. Then, in this function, you can log the failure, attach a debugger, or do whatever you like. To enable this, you must first define a function that matches the following prototype:


// The C run time passes wide-character strings here, even in ANSI builds.
void InvalidParameterHandler(PCWSTR expression, PCWSTR function,
   PCWSTR file, unsigned int line, uintptr_t /*pReserved*/);

 

The expression parameter describes the failed expectation in the C run-time implementation code, such as (L"Buffer is too small" && 0). As you can see, this is not very user friendly and should not be shown to the end user. The same applies to the next three parameters: function, file, and line describe, respectively, the function name, the source code file, and the source code line number where the error occurred.


 Note  All these arguments are NULL unless you use the debug version of the C run-time library (that is, unless _DEBUG is defined). So this handler is valuable for logging errors only when testing debug builds. In a release build, you could replace the assertion dialog box with a more user-friendly message explaining that an unexpected error occurred and that the application must shut down, perhaps combined with specific logging behavior or an application restart. If the application's memory state is corrupted, its execution should stop. However, it is recommended that you wait for the errno_t check to decide whether or not the error is recoverable.


The next step is to register this handler by calling _set_invalid_parameter_handler. However, this step is not enough because the assertion dialog box will still appear. You need to call _CrtSetReportMode(_CRT_ASSERT, 0); at the beginning of your application, disabling all assertion dialog boxes that could be triggered by the C run time.
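Registering the handler follows a common C callback pattern: store a function pointer and hand back the previous one so that it can be restored later (_set_invalid_parameter_handler also returns the previously installed handler). The portable sketch below reproduces only that pattern; my_handler_t, my_set_handler, report_invalid_parameter, and counting_handler are hypothetical names, and with the real C run time you would pass your function to _set_invalid_parameter_handler and call _CrtSetReportMode(_CRT_ASSERT, 0) at startup, as described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Same shape as the invalid-parameter handler prototype shown earlier. */
typedef void (*my_handler_t)(const wchar_t *expression, const wchar_t *function,
                             const wchar_t *file, unsigned int line,
                             uintptr_t reserved);

static my_handler_t g_handler = NULL;

/* Installs a new handler and returns the previous one. */
my_handler_t my_set_handler(my_handler_t new_handler)
{
    my_handler_t old = g_handler;
    g_handler = new_handler;
    return old;
}

/* What a validating function would call when it detects a bad argument. */
void report_invalid_parameter(const wchar_t *expr, const wchar_t *func,
                              const wchar_t *file, unsigned int line)
{
    if (g_handler != NULL)
        g_handler(expr, func, file, line, 0);
}

/* Example handler that only counts calls; a real one might log and shut down. */
static unsigned int g_calls = 0;
void counting_handler(const wchar_t *expression, const wchar_t *function,
                      const wchar_t *file, unsigned int line, uintptr_t reserved)
{
    (void)expression; (void)function; (void)file; (void)line; (void)reserved;
    ++g_calls;
}
unsigned int handler_calls(void) { return g_calls; }
```

Keeping the returned previous handler makes it easy to install a handler temporarily, for example around a test, and restore the original afterward.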


Now, when you call one of the secure replacements for the legacy functions defined in String.h, you can check the returned errno_t value to understand what happened. Only a return value of 0 (the same numeric value as S_OK) means that the call succeeded. The other possible return values are defined in errno.h; for example, EINVAL indicates an invalid argument such as a NULL pointer.
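On the caller's side, checking the result usually means comparing it with 0 and branching on the standard codes. The minimal sketch below uses a hypothetical helper, describe_copy_result, that turns an errno_t into a log-friendly description; STRUNCATE, mentioned later in this chapter, is a Microsoft-specific value and is left out so that the sketch stays portable.

```c
#include <errno.h>

/* Hypothetical helper mapping the errno_t returned by an _s function
   to a description suitable for a log file. Only 0 means success. */
const char *describe_copy_result(int err)
{
    switch (err) {
    case 0:      return "success";
    case EINVAL: return "invalid argument (for example a NULL pointer)";
    case ERANGE: return "destination buffer too small";
    default:     return "unexpected error";
    }
}
```

A typical call site would be: errno_t result = _tcscpy_s(...); if (result != 0) log(describe_copy_result(result)); where log stands for whatever logging facility your application uses.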


Let's take an example of a string that is copied into a buffer that is one character too small:


TCHAR szBefore[5] = {
   TEXT('B'), TEXT('B'), TEXT('B'), TEXT('B'), TEXT('\0')
};

// Room for 9 characters plus the terminating 0
TCHAR szBuffer[10] = {
   TEXT('-'), TEXT('-'), TEXT('-'), TEXT('-'), TEXT('-'),
   TEXT('-'), TEXT('-'), TEXT('-'), TEXT('-'), TEXT('\0')
};

TCHAR szAfter[5] = {
   TEXT('A'), TEXT('A'), TEXT('A'), TEXT('A'), TEXT('\0')
};

errno_t result = _tcscpy_s(szBuffer, _countof(szBuffer),
   TEXT("0123456789"));

 

Just before the call to _tcscpy_s, each variable has the content shown in Figure 2-2.


Figure 2-2: Variable state before the _tcscpy_s call

Because the "0123456789" string to be copied into szBuffer has exactly the same 10-character size as the buffer, there is not enough room to copy the terminating '\0' character. You might expect the value of result to be STRUNCATE, with only the last character '9' left uncopied, but this is not the case. ERANGE is returned, and the state of each variable is shown in Figure 2-3.


Figure 2-3: Variable state after the _tcscpy_s call

There is one side effect that you don't see unless you take a look at the memory behind szBuffer, as shown in Figure 2-4.


Figure 2-4: Content of szBuffer memory after a failed call

The first character of szBuffer has been set to '\0', and all other bytes now contain the value 0xfd. So the resulting string has been truncated to an empty string, and the remaining bytes of the buffer have been set to a filler value (0xfd).
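To make this side effect concrete, here is a portable emulation of what the failed call leaves behind, written for an ANSI (char) buffer. emulate_failed_copy and FILL_BYTE are hypothetical names; the 0xfd pattern is simply the filler observed in Figure 2-4 for a debug build, so treat it as a diagnostic aid rather than behavior to rely on.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define FILL_BYTE 0xfd   /* filler value observed in Figure 2-4 */

/* Hypothetical emulation of a failed _tcscpy_s on a char buffer: when src
   plus its terminator does not fit, nothing of src is kept, the string
   becomes empty, and the rest of the buffer is overwritten with FILL_BYTE. */
int emulate_failed_copy(char *dest, size_t count, const char *src)
{
    if (dest == NULL || src == NULL || count == 0)
        return EINVAL;
    size_t needed = strlen(src) + 1;
    if (needed <= count) {
        memcpy(dest, src, needed);            /* fits: normal copy */
        return 0;
    }
    dest[0] = '\0';                           /* truncated to an empty string */
    memset(dest + 1, FILL_BYTE, count - 1);   /* remaining bytes get the filler */
    return ERANGE;                            /* not STRUNCATE: nothing is kept */
}
```

Calling it with a 10-byte buffer and "0123456789" reproduces the pattern of Figure 2-4, while "012345678" (9 characters) fits and is copied normally.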


 Note  If you wonder why, in Figure 2-4, the memory around the variables is filled with the value 0xcc, the answer lies in the compiler's implementation of the run-time checks (/RTCs, /RTCu, or /RTC1), which automatically detect buffer overruns at run time. If you compile your code without these /RTCx flags, the memory view shows all the sz* variables side by side. But remember that your builds should always be compiled with these run-time checks so that any remaining buffer overruns are detected early in the development cycle.
 
