OpenFOAM code reading: wchar in the base code

Original

2020-07-02 00:24

The path src/OpenFOAM/primitives/chars contains another folder, wchar. Let's take a look at what is inside.

The header file wchar.H reads as follows:

#include <cwchar>
#include <string>

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

namespace Foam
{

class Istream;
class Ostream;

// * * * * * * * * * * * * * * * IOstream Operators  * * * * * * * * * * * * //

//- Output wide character (Unicode) as UTF-8
Ostream& operator<<(Ostream&, const wchar_t);

//- Output wide character (Unicode) string as UTF-8
Ostream& operator<<(Ostream&, const wchar_t*);

//- Output wide character (Unicode) string as UTF-8
Ostream& operator<<(Ostream&, const std::wstring&);


// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

} // End namespace Foam

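The main content is similar to the char folder covered in the previous post. Istream and Ostream are both forward-declared, but this header only overloads the output operator << for Ostream; it declares no >> for wide characters. The comments state the purpose, "Output wide character (Unicode) as UTF-8", meaning each wide character is transcoded to UTF-8 as it is written.

Now let's look at the implementation of these three operator overloads, in the file wcharIO.C: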

#include "error.H"

#include "wchar.H"
#include "IOstreams.H"

// * * * * * * * * * * * * * * * IOstream Operators  * * * * * * * * * * * * //

Foam::Ostream& Foam::operator<<(Ostream& os, const wchar_t wc)
{
    if (!(wc & ~0x0000007F))
    {
        // 0x00000000 - 0x0000007F: (1-byte output)
        // 0xxxxxxx
        os.write(char(wc));
    }
    else if (!(wc & ~0x000007FF))
    {
        // 0x00000080 - 0x000007FF: (2-byte output)
        // 110bbbaa 10aaaaaa
        os.write(char(0xC0 | ((wc >> 6) & 0x1F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x0000FFFF))
    {
        // 0x00000800 - 0x0000FFFF: (3-byte output)
        // 1110bbbb 10bbbbaa 10aaaaaa
        os.write(char(0xE0 | ((wc >> 12) & 0x0F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x001FFFFF))
    {
        // 0x00010000 - 0x001FFFFF: (4-byte output)
        // 11110ccc 10ccbbbb 10bbbbaa 10aaaaaa
        os.write(char(0xF0 | ((wc >> 18) & 0x07)));
        os.write(char(0x80 | ((wc >> 12) & 0x3F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x03FFFFFF))
    {
        // 0x00200000 - 0x03FFFFFF: (5-byte output)
        // 111110dd 10cccccc 10ccbbbb 10bbbbaa 10aaaaaa
        os.write(char(0xF8 | ((wc >> 24) & 0x03)));
        os.write(char(0x80 | ((wc >> 18) & 0x3F)));
        os.write(char(0x80 | ((wc >> 12) & 0x3F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x7FFFFFFF))
    {
        // 0x04000000 - 0x7FFFFFFF: (6-byte output)
        // 1111110d 10dddddd 10cccccc 10ccbbbb 10bbbbaa 10aaaaaa
        os.write(char(0xFC | ((wc >> 30) & 0x01)));
        os.write(char(0x80 | ((wc >> 24) & 0x3F)));
        os.write(char(0x80 | ((wc >> 18) & 0x3F)));
        os.write(char(0x80 | ((wc >> 12) & 0x3F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else
    {
        // according to man page utf8(7)
        // the Unicode standard specifies no characters above 0x0010FFFF,
        // so Unicode characters can only be up to four bytes long in UTF-8.

        // report anything unknown/invalid as replacement character U+FFFD
        os.write(char(0xEF));
        os.write(char(0xBF));
        os.write(char(0xBD));
    }

    os.check("Ostream& operator<<(Ostream&, const wchar_t)");
    return os;
}


Foam::Ostream& Foam::operator<<(Ostream& os, const wchar_t* wstr)
{
    if (wstr)
    {
        for (const wchar_t* iter = wstr; *iter; ++iter)
        {
            os  << *iter;
        }
    }

    return os;
}


Foam::Ostream& Foam::operator<<(Ostream& os, const std::wstring& wstr)
{
    for
    (
        std::wstring::const_iterator iter = wstr.begin();
        iter != wstr.end();
        ++iter
    )
    {
        os  << *iter;
    }

    return os;
}

Wow, that is a bit long. First comes the header error.H, located in src/OpenFOAM/db/error, but we will not look at it for now; IOstreams.H is skipped as well.

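The variable wc has type wchar_t; see for example https://baike.baidu.com/item/wchar_t/8562830?fr=aladdin. It is the wide-character counterpart of char and can represent many more characters, at the cost of using more bytes per character. Its size is implementation-defined: typically 2 bytes on Windows and 4 bytes on Linux, so in practice at most 4 bytes on common platforms.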
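To see what the current platform actually uses, a minimal sketch like the one below can help. The helper names wcharBytes and wcharHoldsAnyCodePoint are hypothetical and are not part of OpenFOAM.

```cpp
#include <climits>
#include <cstddef>

// Size of wchar_t in bytes on the current platform (implementation-defined).
// Hypothetical helper, not an OpenFOAM function.
constexpr std::size_t wcharBytes()
{
    return sizeof(wchar_t);
}

// True when one wchar_t has enough bits for any Unicode code point.
// The largest code point, 0x10FFFF, needs 21 bits.
constexpr bool wcharHoldsAnyCodePoint()
{
    return sizeof(wchar_t) * CHAR_BIT >= 21;
}
```

On a platform with a 2-byte wchar_t the second helper returns false, which means the 4-byte branch of the operator in wcharIO.C can never be reached from a single wchar_t there.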

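A char is 1 byte, so before converting we first need to check whether the value can be written as a single byte. The check uses bit operations. 0x0000007F is a hexadecimal number written out with exactly four bytes; in binary it is: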

0b00000000,00000000,00000000,01111111

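Now look at the condition !(wc & ~0x0000007F). The constant is first inverted bitwise with ~ and then combined with wc using the bitwise AND &, so wc is ANDed with the following mask: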

0b11111111,11111111,11111111,10000000

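In other words, if any bit above the lowest 7 bits of wc is set, the bitwise AND is nonzero, and the logical negation ! turns the condition into false. So this test checks whether wc fits in 7 bits (the ASCII range 0x00 to 0x7F), which is exactly the range that UTF-8 encodes as a single byte.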
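The mask test can be tried in isolation. The following is a minimal sketch, assuming a 32-bit unsigned value in place of wchar_t so that the behavior is the same on every platform; fitsInMask and utf8Length are hypothetical names, not OpenFOAM functions, and only the 1 to 4 byte ranges are covered.

```cpp
#include <cstdint>

// True when no bit outside 'mask' is set in 'value'.
// Same pattern as !(wc & ~0x0000007F) in wcharIO.C.
inline bool fitsInMask(std::uint32_t value, std::uint32_t mask)
{
    return !(value & ~mask);
}

// Number of UTF-8 bytes chosen by the first four branches,
// or 0 when the value needs more than 21 bits.
inline int utf8Length(std::uint32_t cp)
{
    if (fitsInMask(cp, 0x0000007Fu)) return 1;  // up to 7 bits
    if (fitsInMask(cp, 0x000007FFu)) return 2;  // up to 11 bits
    if (fitsInMask(cp, 0x0000FFFFu)) return 3;  // up to 16 bits
    if (fitsInMask(cp, 0x001FFFFFu)) return 4;  // up to 21 bits
    return 0;
}
```

Because each mask has the form 2^n - 1, the test is equivalent to an unsigned comparison value <= mask; the bitwise form simply makes the "no high bits set" reading explicit.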

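The later tests work the same way with wider masks; values that do not fit are split into several char values that are written one after another. (Can that really work? It can: this is how UTF-8 multi-byte encoding is defined. The lead byte announces the sequence length, and each continuation byte has the form 10xxxxxx and carries 6 bits of the value.)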
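The byte splitting is easier to follow when the result is collected in a string instead of being written to a stream. Below is a minimal sketch under a few assumptions: it takes a 32-bit unsigned code point rather than wchar_t, it stops at 4 bytes and maps anything above 0x10FFFF to the replacement character U+FFFD (wcharIO.C itself still emits 5 and 6 byte forms), and it does not treat surrogate values specially. The name toUtf8 is hypothetical.

```cpp
#include <cstdint>
#include <string>

// Encode one code point as a UTF-8 byte sequence (1 to 4 bytes).
// Hypothetical stand-alone helper, not an OpenFOAM function.
inline std::string toUtf8(std::uint32_t cp)
{
    std::string out;

    if (!(cp & ~0x0000007Fu))
    {
        // 0xxxxxxx
        out += static_cast<char>(cp);
    }
    else if (!(cp & ~0x000007FFu))
    {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | ((cp >> 6) & 0x1F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (!(cp & ~0x0000FFFFu))
    {
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | ((cp >> 12) & 0x0F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp <= 0x0010FFFFu)
    {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | ((cp >> 18) & 0x07));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        // Outside the Unicode range: replacement character U+FFFD
        out += "\xEF\xBF\xBD";
    }

    return out;
}
```

For example, the code point 0x4E2D has the bits 0100 111000 101101, which the 3-byte branch distributes as E4 B8 AD.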

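Next come the other two operator overloads in the file. This is function overloading rather than polymorphism in the virtual-function sense: the argument is either a pointer to the first element of a null-terminated wchar_t array or a std::wstring. For the latter, see this post: https://blog.csdn.net/qq_28388835/article/details/81172675; it is essentially a string whose character type is wchar_t.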

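What is interesting is that the wchar_t array version also loops with a variable named iter. This pattern appears all the time in the STL, but here iter is simply a const wchar_t* pointer: it starts at wstr, is advanced with ++iter, and the loop stops at the terminating null character, where *iter is zero. OpenFOAM does not have to implement an array iterator for this, because a raw pointer already supports the operations an iterator needs. "Essential C++" builds up a similar idea with pointers before it introduces the STL, and it is a neat technique.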
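That a raw pointer can play the role of an iterator is easy to check outside OpenFOAM. The sketch below uses two hypothetical helpers, countWide and countLetter; the first repeats the loop shape from wcharIO.C, and the second passes raw pointers straight to the standard algorithm std::count.

```cpp
#include <algorithm>
#include <cstddef>

// Count the characters of a null-terminated wide string with a pointer
// loop, the same shape as the wchar_t* overload in wcharIO.C.
inline std::size_t countWide(const wchar_t* wstr)
{
    std::size_t n = 0;

    if (wstr)
    {
        for (const wchar_t* iter = wstr; *iter; ++iter)
        {
            ++n;
        }
    }

    return n;
}

// Raw pointers are valid iterators for standard algorithms:
// count occurrences of c in the range [first, last).
inline std::ptrdiff_t countLetter
(
    const wchar_t* first,
    const wchar_t* last,
    wchar_t c
)
{
    return std::count(first, last, c);
}
```

The null check at the start of countWide matches the if (wstr) guard in the OpenFOAM code, so passing a null pointer simply yields zero.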
