What are the differences between UTF-8, UTF-16, and UTF-32?

2020-05-06 21:57

Answer 1

UTF-8 has an advantage when ASCII characters make up the majority of a block of text, because UTF-8 encodes each of them in a single byte (8 bits), just like ASCII. A further advantage is that a UTF-8 file containing only ASCII characters is byte-for-byte identical to the same file encoded as ASCII.
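
Both claims are easy to check with Python's built-in `str.encode`. The sample string below is arbitrary; any ASCII-only text behaves the same way.

```python
# Sketch: ASCII-only text encodes to identical bytes in ASCII and UTF-8.
text = "Hello, world!"  # arbitrary ASCII-only sample

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Same byte sequence, and exactly one byte per character.
print(ascii_bytes == utf8_bytes)     # True
print(len(utf8_bytes) == len(text))  # True
```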

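UTF-16 is better where ASCII is not predominant, since it uses 2 bytes for most characters. The saving is real only for code points from U+0800 to U+FFFF (which include most Chinese, Japanese, and Korean characters): UTF-8 needs 3 bytes for these, while UTF-16 needs just 2. For U+0080 to U+07FF (e.g. Greek, Cyrillic, Hebrew, Arabic) both encodings use 2 bytes, and above U+FFFF both use 4.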
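
The size trade-off can be measured directly. This sketch compares encoded lengths for a few sample strings, chosen here purely for illustration, one from each code-point range. It uses Python's `utf-16-le` and `utf-32-le` codecs so that no byte-order mark is added to the counts; the helper name `sizes` is hypothetical.

```python
# Sketch: compare encoded sizes of sample strings in UTF-8, UTF-16 and UTF-32.
samples = {
    "English": "hello",   # U+0000..U+007F
    "Greek": "αβγ",       # U+0080..U+07FF
    "Chinese": "中文字",   # U+0800..U+FFFF
    "Emoji": "😀",         # U+10000..U+10FFFF
}

def sizes(s):
    """Return (utf8, utf16, utf32) byte counts, without byte-order marks."""
    return (len(s.encode("utf-8")),
            len(s.encode("utf-16-le")),
            len(s.encode("utf-32-le")))

for name, s in samples.items():
    u8, u16, u32 = sizes(s)
    print(f"{name:8} chars={len(s)} utf8={u8} utf16={u16} utf32={u32}")
```

For the Greek sample UTF-8 and UTF-16 both need 6 bytes. For the Chinese sample UTF-8 needs 9 bytes and UTF-16 only 6. The single emoji takes 4 bytes in all three encodings.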

UTF-32 encodes every possible code point in a fixed 4 bytes. This makes it pretty bloated. I can't think of any advantage to using it.
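
One property sometimes cited in UTF-32's favor is that every code point sits at a fixed offset, so the n-th code point can be read without scanning the bytes before it. A minimal sketch, assuming little-endian UTF-32 without a byte-order mark; `code_point_at` is a hypothetical helper, not a standard API.

```python
import struct

def code_point_at(utf32le: bytes, index: int) -> str:
    """Return the index-th code point of UTF-32-LE data in constant time.

    Hypothetical helper: every code point occupies exactly 4 bytes,
    so its offset is simply index * 4.
    """
    (cp,) = struct.unpack_from("<I", utf32le, index * 4)
    return chr(cp)

data = "a中😀b".encode("utf-32-le")
print(len(data))               # 16: four code points x 4 bytes
print(code_point_at(data, 2))  # 😀
```

In UTF-8 or UTF-16 the same lookup requires walking the preceding bytes, because code points have different lengths.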

Answer 2

UTF-8: Variable-width encoding, backwards compatible with ASCII. ASCII characters (U+0000 to U+007F) take 1 byte, code points U+0080 to U+07FF take 2 bytes, code points U+0800 to U+FFFF take 3 bytes, code points U+10000 to U+10FFFF take 4 bytes. Good for English text, not so good for Asian text.
UTF-16: Variable-width encoding. Code points U+0000 to U+FFFF take 2 bytes (the range U+D800 to U+DFFF is reserved for surrogates), code points U+10000 to U+10FFFF take 4 bytes (a surrogate pair). Bad for English text, good for Asian text.
UTF-32: Fixed-width encoding. All code points take four bytes. An enormous memory hog, but the fixed width makes it fast to index by code point. Rarely used.
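
The byte counts listed above follow directly from the bit layout of each encoding. The sketch below is a minimal, unoptimized illustration; the helper names `utf8_encode` and `utf16_units` are hypothetical, and the results are cross-checked against Python's built-in codecs. It also rejects U+D800 to U+DFFF, which are reserved for UTF-16 surrogates and cannot be encoded on their own.

```python
def _check(cp):
    """Reject values that are not Unicode scalar values."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")

def utf8_encode(cp):
    """Encode one code point to UTF-8 (1 to 4 bytes depending on its range)."""
    _check(cp)
    if cp <= 0x7F:    # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:   # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def utf16_units(cp):
    """Return 16-bit code units: one for U+0000..U+FFFF, a surrogate pair above."""
    _check(cp)
    if cp <= 0xFFFF:
        return [cp]
    v = cp - 0x10000  # 20-bit value split into two 10-bit halves
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

# Cross-check against Python's built-in codecs.
for ch in ("A", "é", "中", "😀"):
    cp = ord(ch)
    units = utf16_units(cp)
    assert utf8_encode(cp) == ch.encode("utf-8")
    assert b"".join(u.to_bytes(2, "big") for u in units) == ch.encode("utf-16-be")
    print(f"U+{cp:04X}: utf8={len(utf8_encode(cp))} bytes, utf16={2 * len(units)} bytes")
```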

References

Article: https://stackoverflow.com/questions/496321/utf-8-utf-16-and-utf-32
How Unicode works: https://www.youtube.com/watch?v=MijmeoH9LT4
