OpenFOAM code reading: wchar in the primitives

The directory src/OpenFOAM/primitives/chars contains a second folder, wchar. Let's see what is inside it.

The header wchar.H reads as follows:

#include <cwchar>
#include <string>

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

namespace Foam
{

class Istream;
class Ostream;

// * * * * * * * * * * * * * * * IOstream Operators  * * * * * * * * * * * * //

//- Output wide character (Unicode) as UTF-8
Ostream& operator<<(Ostream&, const wchar_t);

//- Output wide character (Unicode) string as UTF-8
Ostream& operator<<(Ostream&, const wchar_t*);

//- Output wide character (Unicode) string as UTF-8
Ostream& operator<<(Ostream&, const std::wstring&);


// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

} // End namespace Foam

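The body is similar to the char folder covered in the previous post, with one difference: Istream and Ostream are both forward-declared, but only the output operator << is overloaded here, and no >> overload is declared. The comments say "Output wide character (Unicode) as UTF-8", so each wide character is converted to UTF-8 as it is written out.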

Now let's look at how these three operator overloads are implemented, in the file wcharIO.C:

#include "error.H"

#include "wchar.H"
#include "IOstreams.H"

// * * * * * * * * * * * * * * * IOstream Operators  * * * * * * * * * * * * //

Foam::Ostream& Foam::operator<<(Ostream& os, const wchar_t wc)
{
    if (!(wc & ~0x0000007F))
    {
        // 0x00000000 - 0x0000007F: (1-byte output)
        // 0xxxxxxx
        os.write(char(wc));
    }
    else if (!(wc & ~0x000007FF))
    {
        // 0x00000080 - 0x000007FF: (2-byte output)
        // 110bbbaa 10aaaaaa
        os.write(char(0xC0 | ((wc >> 6) & 0x1F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x0000FFFF))
    {
        // 0x00000800 - 0x0000FFFF: (3-byte output)
        // 1110bbbb 10bbbbaa 10aaaaaa
        os.write(char(0xE0 | ((wc >> 12) & 0x0F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x001FFFFF))
    {
        // 0x00010000 - 0x001FFFFF: (4-byte output)
        // 11110ccc 10ccbbbb 10bbbbaa 10aaaaaa
        os.write(char(0xF0 | ((wc >> 18) & 0x07)));
        os.write(char(0x80 | ((wc >> 12) & 0x3F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x03FFFFFF))
    {
        // 0x00200000 - 0x03FFFFFF: (5-byte output)
        // 111110dd 10cccccc 10ccbbbb 10bbbbaa 10aaaaaa
        os.write(char(0xF8 | ((wc >> 24) & 0x03)));
        os.write(char(0x80 | ((wc >> 18) & 0x3F)));
        os.write(char(0x80 | ((wc >> 12) & 0x3F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x7FFFFFFF))
    {
        // 0x04000000 - 0x7FFFFFFF: (6-byte output)
        // 1111110d 10dddddd 10cccccc 10ccbbbb 10bbbbaa 10aaaaaa
        os.write(char(0xFC | ((wc >> 30) & 0x01)));
        os.write(char(0x80 | ((wc >> 24) & 0x3F)));
        os.write(char(0x80 | ((wc >> 18) & 0x3F)));
        os.write(char(0x80 | ((wc >> 12) & 0x3F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else
    {
        // according to man page utf8(7)
        // the Unicode standard specifies no characters above 0x0010FFFF,
        // so Unicode characters can only be up to four bytes long in UTF-8.

        // report anything unknown/invalid as replacement character U+FFFD
        os.write(char(0xEF));
        os.write(char(0xBF));
        os.write(char(0xBD));
    }

    os.check("Ostream& operator<<(Ostream&, const wchar_t)");
    return os;
}


Foam::Ostream& Foam::operator<<(Ostream& os, const wchar_t* wstr)
{
    if (wstr)
    {
        for (const wchar_t* iter = wstr; *iter; ++iter)
        {
            os  << *iter;
        }
    }

    return os;
}


Foam::Ostream& Foam::operator<<(Ostream& os, const std::wstring& wstr)
{
    for
    (
        std::wstring::const_iterator iter = wstr.begin();
        iter != wstr.end();
        ++iter
    )
    {
        os  << *iter;
    }

    return os;
}

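Wow, that is fairly long. It starts with the header error.H, which lives in src/OpenFOAM/db/error, but we will not look at it for now, and we will skip IOstreams.H as well.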

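The variable wc has the type wchar_t, which you can look up, for example at https://baike.baidu.com/item/wchar_t/8562830?fr=aladdin. It is a wide-character type: an extension of char that can represent many more characters, at the cost of using more bytes per character. Its size is implementation-defined and differs between platforms; in practice it is usually 2 bytes (for example on Windows) or 4 bytes (for example on Linux).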
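
To see what your own platform uses, a minimal sketch like the following can help. The helper name wcharBits is made up for this post and is not part of OpenFOAM; the only thing the language guarantees here is that the result is at least CHAR_BIT.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper (not OpenFOAM API): number of bits in wchar_t on the
// platform this file is compiled for.
constexpr std::size_t wcharBits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// sizeof of any object type is at least 1, so this always holds. The actual
// value is implementation-defined (commonly 16 or 32).
static_assert(wcharBits() >= CHAR_BIT, "wchar_t is at least one byte wide");
```

Printing wcharBits() on different systems is a quick way to confirm the size difference mentioned above.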

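A char is 1 byte, so before converting we first have to check whether the value fits into a single byte (more precisely, into the 7 bits of ASCII). The check uses bit operations. 0x0000007F is a hexadecimal number written out with exactly four bytes; in binary it is: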

0b00000000,00000000,00000000,01111111

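Now look at the condition !(wc & ~0x0000007F). The mask is first inverted with ~ and then combined with wc using the bitwise AND operator &, so wc is ANDed with: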

0b11111111,11111111,11111111,10000000

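In other words, if any bit of wc above the lowest 7 is set, the bitwise AND is nonzero, and after negation with ! the condition evaluates to false. The condition is therefore true only when wc fits into 7 bits and can be written as a single byte.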
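
The mask test can be tried in isolation. This is only a sketch: fitsInOneByte is a hypothetical name, and std::uint32_t stands in for wchar_t so that the behaviour does not depend on the platform's wchar_t size.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the first branch of operator<<:
// true when no bit above the lowest 7 is set, i.e. the value is 7-bit ASCII.
constexpr bool fitsInOneByte(std::uint32_t wc)
{
    return !(wc & ~std::uint32_t(0x0000007F));
}

// Checked at compile time.
static_assert(fitsInOneByte(0x41), "'A' (0x41) needs only 7 bits");
static_assert(fitsInOneByte(0x7F), "0x7F is the largest 1-byte value");
static_assert(!fitsInOneByte(0x80), "0x80 sets bit 7, so it does not fit");
static_assert(!fitsInOneByte(0x4E2D), "U+4E2D needs more than one byte");
```

The later branches follow the same pattern with the wider masks 0x000007FF, 0x0000FFFF and so on.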

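The following conditions work the same way with wider masks. A value that does not fit is split across several chars on output: a lead byte whose high bits mark the length of the sequence, followed by continuation bytes of the form 10xxxxxx that carry 6 bits each. (That really works? It does: this is exactly how UTF-8 multi-byte encoding is defined.)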
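
To make the splitting concrete, here is a standalone sketch that collects the bytes in a std::string instead of writing them to an Ostream. It is not the OpenFOAM implementation: the name toUtf8 is made up, it takes a char32_t so the result does not depend on the size of wchar_t, and it assumes only the 1- to 4-byte forms are wanted (the listing above also keeps the older 5- and 6-byte forms). Surrogate code points are not treated specially.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-alone helper (not OpenFOAM API): encode one code point
// as UTF-8 with the same shift-and-mask steps as the branches above.
// Assumption: anything above U+10FFFF becomes the replacement char U+FFFD.
std::string toUtf8(char32_t cp)
{
    std::string out;

    if (cp <= 0x7F)
    {
        // 0xxxxxxx
        out += char(cp);
    }
    else if (cp <= 0x7FF)
    {
        // 110xxxxx 10xxxxxx
        out += char(0xC0 | (cp >> 6));
        out += char(0x80 | (cp & 0x3F));
    }
    else if (cp <= 0xFFFF)
    {
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += char(0xE0 | (cp >> 12));
        out += char(0x80 | ((cp >> 6) & 0x3F));
        out += char(0x80 | (cp & 0x3F));
    }
    else if (cp <= 0x10FFFF)
    {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += char(0xF0 | (cp >> 18));
        out += char(0x80 | ((cp >> 12) & 0x3F));
        out += char(0x80 | ((cp >> 6) & 0x3F));
        out += char(0x80 | (cp & 0x3F));
    }
    else
    {
        // U+FFFD, same three bytes as in the final else branch above
        out = "\xEF\xBF\xBD";
    }

    // Postcondition: every branch produces between 1 and 4 bytes
    assert(!out.empty() && out.size() <= 4);
    return out;
}
```

For example, toUtf8(0x4E2D) (the character 中) yields the three bytes E4 B8 AD: the top 4 bits go into the lead byte, and the remaining 12 bits are split into two continuation bytes of 6 bits each.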

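Next come the other two operator overloads in the file. They are further overloads of the same operator (a form of compile-time polymorphism): one takes a pointer to the first element of a null-terminated wchar_t array, the other a std::wstring. For the latter, see this post: https://blog.csdn.net/qq_28388835/article/details/81172675. It is essentially a string whose character type is wchar_t.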

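What I find interesting is that the loop over the wchar_t array is written like an iterator loop, the style that shows up all over the STL. No special array iterator from OpenFOAM is involved, though: iter is an ordinary const wchar_t* that is incremented until it reaches the terminating null character. A raw pointer already supports *, ++ and comparison, which is the interface that STL iterators generalize; "Essential C++" builds up this same idea before it introduces the STL.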
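
The two loop styles can be compared side by side in a small sketch. The function name countWide is invented for illustration and has nothing to do with OpenFOAM; it only counts characters instead of writing them out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: walk a null-terminated wide string with a raw
// pointer, in the same way as the const wchar_t* overload above.
std::size_t countWide(const wchar_t* wstr)
{
    std::size_t n = 0;
    if (wstr)
    {
        for (const wchar_t* iter = wstr; *iter; ++iter)
        {
            ++n;
        }
    }
    return n;
}

// Same loop shape with a real STL iterator, as in the std::wstring overload.
std::size_t countWide(const std::wstring& wstr)
{
    std::size_t n = 0;
    for
    (
        std::wstring::const_iterator iter = wstr.begin();
        iter != wstr.end();
        ++iter
    )
    {
        ++n;
    }
    return n;
}

// Small self-check: both overloads visit the same three characters.
inline void demoCountWide()
{
    assert(countWide(L"abc") == 3);
    assert(countWide(std::wstring(L"abc")) == 3);
}
```

The only real difference is the stop condition: the pointer version stops at the null terminator, while the std::wstring version compares against end().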
