#1, 字符集相關的參數名和概念
MySQL的字符集設置比較自由。可以設置很多種組合,相關的變量和參數有:
(root@localhost)[sample3]> show global variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec)
其中global variables表示全局變量。也就是默認情況下,新建立的數據庫如果不顯式的指定字符集相關參數,將使用這些參數。也叫做全局字符集變量
其中variables沒帶global,表示當前session生效的的參數。所謂當前session,表示如果更改過相關的參數,離開這個session以後,就會恢復默認的參數。也叫做連接時字符集變量
各個變量的概念:
1,character_set_client:客戶端字符集,即數據在client端時字符集狀態。
2,character_set_connection:連接時轉換字符集,即客戶端和服務端連接時,字符集裝換成的字符集
3,character_set_server:服務端處理時候使用的字符集
4,character_set_database:數據庫層面存儲默認使用的字符集
5,character_set_results:數據返回時所用的字符集
指定字符集參數有多種方式,
1,在編譯時指定,主要是:
-DDEFAULT_CHARSET=utf8 \
-DDEFAULT_COLLATION=utf8-general_ci \
其中CHARSET是指字符集,COLLATION是指相關的校對規則(也稱排序規則)
2,參數文件,也就是my.cnf中設定
character_set_server=utf8
collation_server=utf8_general_ci
參數人間的設定將會覆蓋編譯時設定的字符集和校對規則。
3,啓動MySQL服務的時候指定:
--character_set_server: 指定全局粒度的默認字符集
--collation_server:指定全局粒度的默認校對規則
啓動時指定的參數將覆蓋參數文件以及編譯時指定的字符集和校對規則。
這些參數,如果從大方向分的話,可以分爲兩類:
1,連接時使用的字符集,即爲show variable like '%character%' 顯示的那些字符集
2,存儲時使用的字符集,分爲4個級別
1)SERVER,全局級別
2)DATABASE,數據庫級別
3)TABLE,表級別
4)column,列級別
2# MySQL查詢的基本過程和亂碼的形成以及如何避免亂碼
1)查詢的基本過程
MySQL查詢基本過程如下:
1, 程序將字符轉換成二進制格式
2,MySQL 客戶端發出查詢(client端字符集)====>到達server端連接器(connection字符集)====>內部轉換並查詢(列字符集、表字符集、數據庫字符集、server端字符集,轉換優先級逐級遞減)=====>查詢結果返回給result(result字符集)
這裏先要回答一個latin字符集爲何能存放漢字的問題(漢字每個字符佔用2個字節長度,latin不支持雙字節長度)。實際上是因爲OS/APP層已經將漢字轉換爲單字符串的形式。一般來說,爲了正常顯示和處理漢字,OS層面和程序層面也一定設定了字符集,這個字符集就會將漢字先一步處理成二進制。比如OS和程序層面設置的是UTF8,那麼實際上輸入一個漢字,我們在mysql client端實際上是獲得3個單字節,而非雙字節,這樣latin字符集就可以處理了。
這些字符集的轉換,實際上是字符長度的轉換。比如如果CLIENT端時LATIN1,connection是UTF8,那麼就會發生3個字節長度轉換成6個字節長度。每個單字符後面都會被填0,這樣變長了自然是沒事的,頂多顯示的時候再轉回來,把後面填的零都切掉。但是反過來,如果是client段是utf8,connection是latin1,那麼3個字節長度的utf8就會被轉成1個字節長度的latin1,後兩位丟失,以後是怎麼都變不回來的。這種丟失非零位的過程是不可逆的。所以我們必須得保證設定字符集的時候(列字符集、表字符集、數據庫字符集、server段字符集)>= connection字符集 >= client字符集 來避免字符編碼丟失問題。 一種可能發生的狀況: client(latin1)==>connection(utf8)==>服務端內部的存儲時字符集(latin1),並不會發生字符編碼丟失,因爲connection轉換到存儲時字符集時,只是切掉了client轉connection時後面填的零,相當於轉回來了。 3,實驗: 1)client字符集爲utf8,connection 爲latin1,存儲字符集爲latin1 (root@localhost)[sample3]> show create table test2; +-------+---------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------------------------------------------------------+ | test2 | CREATE TABLE `test2` ( `id` int(11) DEFAULT NULL, `name` char(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+---------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> insert into test2 values (1,'中國'); Query OK, 1 row affected, 1 warning (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+------+ | id | name | +------+------+ | 1 | ?? | +------+------+ 1 row in set (0.00 sec) 這種情況,已經丟失了字符,轉換result是沒有用的。 (root@localhost)[sample3]> set character_set_results=latin1; Query OK, 0 rows affected (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+------+ | id | name | +------+------+ | 1 | ?? | +------+------+ 1 row in set (0.00 sec) 2)client字符集爲utf8,connection爲utf8,存儲字符集爲latin1 (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> (root@localhost)[sample3]> insert into test2 values (1,'中國'); ERROR 1366 (HY000): Incorrect string value: '\xE4\xB8\xAD\xE5\x9B\xBD' for column 'name' at row 1 (root@localhost)[sample3]> 這裏直接報錯,5.6以後加強了數據庫數據的安全性,因爲會丟失數據,所以不允許插入。 3)client字符集爲latin1,connection爲utf8,存儲字符集爲latin1 (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+---------------+ | id | name | +------+---------------+ | 1 | ä¸å›½ | +------+---------------+ 1 row in set (0.00 sec) 亂碼了,這裏爲何會亂碼呢?按理說轉換過程中只會切掉填充的零纔對。實際上是result的問題。這個result是utf8,而client存的時候就是3位,到connection轉到6位,到存儲時轉回3位,這時候到result又轉到6位,自然是亂碼的。只要result轉回latin1,就可以了。 (root@localhost)[sample3]> set character_set_results=latin1; Query OK, 0 rows affected (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+--------+ | id | name | +------+--------+ | 1 | 中國 | +------+--------+ 1 row in set (0.00 sec) 4)不丟失字符,但是字符集不同的亂碼. (root@localhost)[sample2]> show create table test; +-------+-------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-------------------------------------------------------------------------------------------+ | test | CREATE TABLE `test` ( `name` char(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | +-------+-------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) (root@localhost)[sample2]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | gbk | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample2]> insert into test values('中國'); Query OK, 1 row affected (0.01 sec) (root@localhost)[sample2]> (root@localhost)[sample2]> select * from test; +------+ | name | +------+ | | +------+ 1 row in set (0.00 sec) (root@localhost)[sample2]> set character_set_results=utf8; Query OK, 0 rows affected (0.00 sec) (root@localhost)[sample2]> select * from test; +--------+ | name | +--------+ | 中國 | +--------+ 1 row in set (0.00 sec) (root@localhost)[sample2]>