Go character types (byte and rune), and printing Chinese characters with for-range

Each element of a string is called a "character". You get characters when you iterate over a string or access one of its elements.

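Go has two kinds of characters:

One is the uint8 type, also called byte, which represents a single ASCII character.
The other is the rune type, which represents a single Unicode code point (one character decoded from UTF-8). You need rune when handling Chinese, Japanese, or other multi-byte characters. rune is equivalent to int32.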
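Because byte and rune are aliases rather than new types, values move between them and uint8/int32 with no conversion. The minimal sketch below illustrates this; the helper name describe is hypothetical, chosen only for this example. Note that var b byte = '中' would not compile, because 20013 does not fit in a byte.

```go
package main

import "fmt"

// describe (a hypothetical helper) reports the type names of a byte and a rune.
// Since they are aliases, %T prints the underlying names uint8 and int32.
func describe() (string, string) {
	var b byte = 'A'
	var r rune = '中'
	return fmt.Sprintf("%T", b), fmt.Sprintf("%T", r)
}

func main() {
	var b byte = 'A'
	var u uint8 = b // no conversion: byte and uint8 are the same type
	var r rune = '中'
	var i int32 = r // likewise for rune and int32
	fmt.Println(u, i) // 65 20013

	bt, rt := describe()
	fmt.Println(bt, rt) // uint8 int32
}
```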

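byte is an alias for uint8. It works perfectly well for characters in traditional ASCII encoding, each of which occupies exactly 1 byte, for example var ch byte = 'A'. Character literals are enclosed in single quotes.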

In the ASCII table, A has the value 65, which is 41 in hexadecimal, so the following declarations are equivalent:

var ch byte = 65 or var ch byte = '\x41' // (\x is always followed by exactly 2 hexadecimal digits)

Another possible form is \ followed by exactly 3 octal digits, for example '\377'.
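To make the equivalence concrete, here is a small sketch that writes the letter A in all four ways described above and checks that they produce the same byte. The function name sameA is made up for this illustration. Octal 101 is 1×64 + 1 = 65, and octal 377 is 255, the largest value a byte can hold.

```go
package main

import "fmt"

// sameA (hypothetical name) returns the letter A written four ways.
func sameA() [4]byte {
	return [4]byte{
		'A',    // character literal
		65,     // decimal value
		'\x41', // \x followed by exactly two hex digits
		'\101', // \ followed by exactly three octal digits (octal 101 == 65)
	}
}

func main() {
	fmt.Println(sameA()) // [65 65 65 65]

	var top byte = '\377' // octal 377 == 255
	fmt.Println(top)      // 255
}
```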

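Go also supports Unicode (UTF-8), so characters are also called Unicode code points, or runes, and are represented in memory as int32 values (the rune type). In documentation a code point is usually written as U+hhhh, where each h is a hexadecimal digit.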

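To write a Unicode character as an escape, put the prefix \u or \U before its hexadecimal digits. Because a Unicode code point can need more than one byte, it is stored in a rune (int32) or a wider integer type such as int. Use \u followed by exactly 4 hexadecimal digits for code points up to U+FFFF (2 bytes), and \U followed by exactly 8 hexadecimal digits for larger code points (4 bytes).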
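As a quick check of these escape rules, the sketch below compares \u and \U escapes with the characters typed directly. All operands are constants, so the comparison is settled at compile time. The helper escapesMatch is a hypothetical name used only here.

```go
package main

import "fmt"

// escapesMatch (hypothetical helper) reports whether \u and \U escapes
// denote the same code points as the characters written directly.
func escapesMatch() bool {
	return '\u0041' == 'A' && // \u + exactly 4 hex digits
		'\u03B2' == 'β' &&
		'\U00004E2D' == '中' && // \U + exactly 8 hex digits (leading zeros allowed)
		'\U0001F600' == 0x1F600 && // above U+FFFF, 4 digits are too few, so \U is needed
		"\u4E2D\u570B\u8A71" == "中國話" // escapes inside strings are stored as UTF-8
}

func main() {
	fmt.Println(escapesMatch())       // true
	fmt.Printf("%U\n", '\U0001F600') // U+1F600
}
```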

package main

import "fmt"

func main() {
	var ch int = '\u0041'
	var ch2 int = '\u03B2'
	var ch3 int = '\U00101234'
	fmt.Printf("%d - %d - %d\n", ch, ch2, ch3) // integer value
	fmt.Printf("%c - %c - %c\n", ch, ch2, ch3) // character
	fmt.Printf("%X - %X - %X\n", ch, ch2, ch3) // code point in hexadecimal (not the UTF-8 bytes)
	fmt.Printf("%U - %U - %U\n", ch, ch2, ch3) // Unicode format U+hhhh
}

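Output (U+101234 lies in a Unicode private-use area, so how the third %c result looks depends on the font; most terminals show a placeholder glyph):

65 - 946 - 1053236
A - β - (private-use character U+101234)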
41 - 3B2 - 101234
U+0041 - U+03B2 - U+101234

The format specifier %c prints a character. When applied to a character, %v or %d prints the integer that represents it, and %U prints a string of the form U+hhhh.
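One point that is easy to confuse: %X applied to a rune prints the code point in hexadecimal, while % X applied to []byte(string(r)) prints the actual UTF-8 bytes. The sketch below puts both side by side. describeRune is a hypothetical helper, and the expected strings follow from β being U+03B2 (UTF-8 CE B2) and 中 being U+4E2D (UTF-8 E4 B8 AD).

```go
package main

import "fmt"

// describeRune (hypothetical helper) formats r with several verbs.
// %X shows the code point in hex; "% X" on []byte shows the UTF-8 bytes.
func describeRune(r rune) string {
	return fmt.Sprintf("%c %d %U %X | % X", r, r, r, r, []byte(string(r)))
}

func main() {
	fmt.Println(describeRune('β')) // β 946 U+03B2 3B2 | CE B2
	fmt.Println(describeRune('中')) // 中 20013 U+4E2D 4E2D | E4 B8 AD
}
```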

The unicode package provides several functions for testing characters. Each of them returns a boolean, as shown below (ch is a rune):

Is it a letter: unicode.IsLetter(ch)
Is it a digit: unicode.IsDigit(ch)
Is it white space: unicode.IsSpace(ch)
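A small sketch of these three predicates in use: it walks a mixed string rune by rune and counts each category. The function name classify is illustrative only. Chinese characters count as letters because Unicode classifies them as letters (category Lo).

```go
package main

import (
	"fmt"
	"unicode"
)

// classify (hypothetical helper) counts letters, digits and white space in s.
// Ranging over the string yields runes, so multi-byte characters count once.
func classify(s string) (letters, digits, spaces int) {
	for _, ch := range s {
		switch {
		case unicode.IsLetter(ch):
			letters++
		case unicode.IsDigit(ch):
			digits++
		case unicode.IsSpace(ch):
			spaces++
		}
	}
	return
}

func main() {
	fmt.Println(classify("Go 1.22 中國話")) // 5 3 2 ('.' is none of the three)
}
```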

What is the difference between UTF-8 and Unicode?

Unicode, like ASCII, is a character set.

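A character set assigns a unique ID to every character. Every character we use has a unique ID in the Unicode character set; for example, the lowercase letter a is 97 in both Unicode and ASCII, and the Chinese character "你" is 20320 (U+4F60) in Unicode. In the character sets of different countries the same character may have different IDs, but its Unicode ID never changes.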
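In Go, converting a string to []rune exposes exactly these Unicode IDs (code points). The sketch below, using a hypothetical helper codePoints, shows that a has ID 97 and 你 has ID 20320.

```go
package main

import "fmt"

// codePoints (hypothetical helper) returns the Unicode ID of each character in s.
// Converting to []rune decodes the UTF-8 bytes into code points.
func codePoints(s string) []rune {
	return []rune(s)
}

func main() {
	fmt.Println(codePoints("a你"))       // [97 20320]
	fmt.Printf("%U\n", codePoints("你")) // [U+4F60]
}
```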

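UTF-8 is an encoding scheme: it specifies how the ID of a Unicode character is turned into bytes. UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character. The rules are: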

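0xxxxxxx (a single byte) represents code points 0 to 127, which keeps UTF-8 compatible with the ASCII character set.
Code points from 128 (0x80) to 0x10FFFF represent all other characters and take 2 to 4 bytes: 110xxxxx 10xxxxxx up to 0x7FF, 1110xxxx 10xxxxxx 10xxxxxx up to 0xFFFF, and 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx up to 0x10FFFF.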

By these rules, unaccented Latin letters take 1 byte each, while most Chinese characters take 3 bytes each.
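The difference between byte length and character count is easy to observe: len reports bytes, while utf8.RuneCountInString reports runes. In the sketch below (sizes is a hypothetical helper), é is written as \u00E9 so that it is certainly the single precomposed code point, which needs 2 bytes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes (hypothetical helper) returns the byte length and rune count of s.
func sizes(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"Go", "\u00E9", "中國話"} {
		b, r := sizes(s)
		fmt.Printf("%s: %d bytes, %d runes\n", s, b, r)
	}
}
```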

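In the broader sense, Unicode refers to a standard that defines both a character set and its encoding rules, that is, the Unicode character set together with encodings such as UTF-8 and UTF-16.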

The for-range structure

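This is an iteration construct specific to Go, and you will find it useful in many situations. It can iterate over any collection (including arrays and maps; see chapters 7 and 8). Syntactically it resembles the foreach statement of other languages, but you still get the index of each iteration. The general form is: for ix, val := range coll { }.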
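A short sketch of the common shapes of the loop over a slice and a map. The helper names sumSlice and sumMap are illustrative. The blank identifier _ discards the index when only values are needed; map iteration order is unspecified, which does not matter here because addition is order-independent.

```go
package main

import "fmt"

// sumSlice (hypothetical helper) ignores the index with the blank identifier.
func sumSlice(nums []int) int {
	total := 0
	for _, v := range nums {
		total += v
	}
	return total
}

// sumMap (hypothetical helper) ranges over a map; the order is unspecified.
func sumMap(m map[string]int) int {
	total := 0
	for _, v := range m {
		total += v
	}
	return total
}

func main() {
	nums := []int{10, 20, 30}
	for i, v := range nums { // index and element value
		fmt.Println(i, v)
	}
	fmt.Println(sumSlice(nums))                         // 60
	fmt.Println(sumMap(map[string]int{"a": 1, "b": 2})) // 3
}
```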

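Note that val is always a copy of the collection's value at that index, so it is effectively read-only: any modification to it does not affect the original value in the collection (translator's note: if val is a pointer, the pointer is copied, so you can still modify the original value it points to). A string is a collection of Unicode characters (runes), so you can also use for-range to iterate over a string: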
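The copy semantics, and the pointer exception from the translator's note, can be seen in this sketch. The three function names are hypothetical. Writing to v changes only the copy; writing through the index or through a copied pointer reaches the original data.

```go
package main

import "fmt"

// doubleCopies (hypothetical) doubles only the loop variable, a copy.
func doubleCopies(nums []int) {
	for _, v := range nums {
		v *= 2
		_ = v // the write above is lost: v is a copy of nums[i]
	}
}

// doubleInPlace (hypothetical) writes through the index, modifying the slice.
func doubleInPlace(nums []int) {
	for i := range nums {
		nums[i] *= 2
	}
}

// incrementAll (hypothetical): p is a copy of the pointer, but still
// points at the original int, so the increment is visible outside.
func incrementAll(ptrs []*int) {
	for _, p := range ptrs {
		*p++
	}
}

func main() {
	a := []int{1, 2, 3}
	doubleCopies(a)
	fmt.Println(a) // [1 2 3]
	doubleInPlace(a)
	fmt.Println(a) // [2 4 6]

	x, y := 1, 2
	incrementAll([]*int{&x, &y})
	fmt.Println(x, y) // 2 3
}
```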

for pos, char := range str {
...
}

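Each iteration of the for-range loop yields one rune together with the byte index at which it starts. The loop decodes the string according to the UTF-8 rules, so it recognizes multi-byte Unicode characters automatically.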
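The sketch below makes the "byte index" part explicit, and contrasts it with plain indexing, which returns a single byte. runeStarts is a hypothetical helper. For bytes that are not valid UTF-8, the Go specification says range yields U+FFFD (the replacement character) and advances one byte.

```go
package main

import "fmt"

// runeStarts (hypothetical helper) returns the byte offset at which each rune begins.
func runeStarts(s string) []int {
	var starts []int
	for pos := range s {
		starts = append(starts, pos)
	}
	return starts
}

func main() {
	s := "a中b"
	fmt.Println(len(s), runeStarts(s)) // 5 [0 1 4]
	fmt.Println(s[1])                  // 228 (0xE4): indexing yields a byte, not a character

	for _, r := range "\xff" { // a lone 0xFF byte is not valid UTF-8
		fmt.Printf("%U\n", r) // U+FFFD
	}
}
```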

Example 5.9 range_string.go:

package main

import "fmt"

func main() {
	str := "Go is a beautiful language!"
	fmt.Printf("The length of str is: %d\n", len(str))
	for pos, char := range str {
		fmt.Printf("Character on position %d is: %c \n", pos, char)
	}
	fmt.Println()
	str2 := "Chinese: 中國話"
	fmt.Printf("The length of str2 is: %d\n", len(str2))
	for pos, char := range str2 {
		fmt.Printf("character %c starts at byte position %d\n", char, pos)
	}
	fmt.Println()
	fmt.Println("index int(rune) rune    char bytes")
	for index, r := range str2 { // r avoids shadowing the predeclared type rune
		fmt.Printf("%-2d      %d      %U '%c' % X\n", index, r, r, r, []byte(string(r)))
	}
}

Output:

The length of str is: 27
Character on position 0 is: G 
Character on position 1 is: o 
Character on position 2 is:   
Character on position 3 is: i 
Character on position 4 is: s 
Character on position 5 is:   
Character on position 6 is: a 
Character on position 7 is:   
Character on position 8 is: b 
Character on position 9 is: e 
Character on position 10 is: a 
Character on position 11 is: u 
Character on position 12 is: t 
Character on position 13 is: i 
Character on position 14 is: f 
Character on position 15 is: u 
Character on position 16 is: l 
Character on position 17 is:   
Character on position 18 is: l 
Character on position 19 is: a 
Character on position 20 is: n 
Character on position 21 is: g 
Character on position 22 is: u 
Character on position 23 is: a 
Character on position 24 is: g 
Character on position 25 is: e 
Character on position 26 is: ! 

The length of str2 is: 18
character C starts at byte position 0
character h starts at byte position 1
character i starts at byte position 2
character n starts at byte position 3
character e starts at byte position 4
character s starts at byte position 5
character e starts at byte position 6
character : starts at byte position 7
character   starts at byte position 8
character 中 starts at byte position 9
character 國 starts at byte position 12
character 話 starts at byte position 15

index int(rune) rune    char bytes
0       67      U+0043 'C' 43
1       104      U+0068 'h' 68
2       105      U+0069 'i' 69
3       110      U+006E 'n' 6E
4       101      U+0065 'e' 65
5       115      U+0073 's' 73
6       101      U+0065 'e' 65
7       58      U+003A ':' 3A
8       32      U+0020 ' ' 20
9       20013      U+4E2D '中' E4 B8 AD
12      22283      U+570B '國' E5 9C 8B
15      35441      U+8A71 '話' E8 A9 B1

Compare this output with Listing 5.7 (for_string.go).

We can see that common English characters take 1 byte each, while Chinese characters take 3 bytes each.
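Listing 5.7 itself is not reproduced here, so as a stand-in this sketch assumes an index-based loop over the string and contrasts it with for-range. byteView and runeView are hypothetical helpers: the indexed loop sees 9 separate bytes for 中國話, while for-range sees 3 characters.

```go
package main

import "fmt"

// byteView (hypothetical) collects s[i] for every index, as an indexed loop sees the string.
func byteView(s string) []byte {
	var out []byte
	for i := 0; i < len(s); i++ {
		out = append(out, s[i])
	}
	return out
}

// runeView (hypothetical) collects the characters produced by for-range.
func runeView(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	s := "中國話"
	fmt.Printf("% X\n", byteView(s)) // E4 B8 AD E5 9C 8B E8 A9 B1
	fmt.Printf("%q\n", runeView(s))  // ['中' '國' '話']
}
```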
