MySQL Character Set Garbling

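Garbled query results are one of the more common problems people run into with MySQL. What actually causes them? This article walks through a series of demonstrations to show where garbling comes from and how to fix it, for your reference.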

I. Comparing character encodings

	SELECT hex(convert('love' USING latin1)) latin_value,
	       hex(convert('love' USING gb2312)) gb2312_value,
	       hex(convert('love' USING gbk))    gbk_value,
	       hex(convert('love' USING utf8))   utf8_value;
	+-------------+--------------+-----------+------------+
	| latin_value | gb2312_value | gbk_value | utf8_value |
	+-------------+--------------+-----------+------------+
	| 6C6F7665    | 6C6F7665     | 6C6F7665  | 6C6F7665   |
	+-------------+--------------+-----------+------------+
	
	SELECT hex(convert('爱' USING latin1)) latin_value,
	       hex(convert('爱' USING gb2312)) gb2312_value,
	       hex(convert('爱' USING gbk))    gbk_value,
	       hex(convert('爱' USING utf8))   utf8_value;
	+-------------+--------------+-----------+------------+
	| latin_value | gb2312_value | gbk_value | utf8_value |
	+-------------+--------------+-----------+------------+
	| 3F          | B0AE         | B0AE      | E788B1     |
	+-------------+--------------+-----------+------------+  
	
	SELECT convert(0x3F USING latin1)   latin_value,
	       convert(0xB0AE USING gb2312) gb2312_value,
	       convert(0xB0AE USING gbk)    gbk_value,
	       convert(0xE788B1 USING utf8) utf8_value;
   
	+-------------+--------------+-----------+------------+
	| latin_value | gb2312_value | gbk_value | utf8_value |
	+-------------+--------------+-----------+------------+
	| ?           | 爱           | 爱        | 爱         |
	+-------------+--------------+-----------+------------+
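
The same comparison can be reproduced outside MySQL. The following is a minimal Python sketch, not part of the original test; the helper name `hex_in_each_charset` is made up for illustration, and Python's `latin-1` codec (ISO-8859-1) is only an approximation of MySQL's latin1, which is closer to cp1252. Unmappable characters are replaced with `?`, which is what CONVERT does in the queries above.

```python
# Encode the same text in each character set and show the bytes as hex,
# mirroring HEX(CONVERT(... USING ...)) in the queries above.
# Keys are MySQL character set names, values are the Python codecs assumed
# to stand in for them.
CODECS = {"latin1": "latin-1", "gb2312": "gb2312", "gbk": "gbk", "utf8": "utf-8"}


def hex_in_each_charset(text):
    """Return {mysql_charset_name: uppercase hex}, with '?' for unmappable characters."""
    return {
        name: text.encode(codec, errors="replace").hex().upper()
        for name, codec in CODECS.items()
    }


print(hex_in_each_charset("love"))
print(hex_in_each_charset("爱"))
```

ASCII text has the same bytes in all four character sets, while the Chinese character has a different byte sequence in each family and no representation at all in latin1.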

II. Garbling tests

1. Environment setup

	# grep -Ev "^#|^$" /etc/my.cnf   # show the current my.cnf settings
	
	mysql> show variables like 'version';
	+---------------+------------+
	| Variable_name | Value      |
	+---------------+------------+
	| version       | 5.7.23-log |
	+---------------+------------+

	mysql> show variables like '%character%';
	
	+--------------------------+----------------------------+
	| Variable_name            | Value                      |
	+--------------------------+----------------------------+
	| character_set_client     | gbk                        |
	| character_set_connection | gbk                        |
	| character_set_database   | latin1                     |
	| character_set_filesystem | binary                     |
	| character_set_results    | gbk                        |
	| character_set_server     | latin1                     |
	| character_set_system     | utf8                       |
	| character_sets_dir       | /usr/share/mysql/charsets/ |
	+--------------------------+----------------------------+

	DROP TABLE IF EXISTS sakila.colum_charset;
	
	CREATE TABLE sakila.colum_charset
	(
	   id int not null auto_increment primary key,
	   c1 varchar(20),
	   c2 char(20) CHAR SET gbk,
	   c3 varchar(20) CHARSET gb2312,
	   c4 char(20) CHARACTER SET utf8,
	   c5 varchar(20) CHARSET utf8mb4
	);
	
	mysql> show create table sakila.colum_charset\G
	*************************** 1. row ***************************
	       Table: colum_charset
	Create Table: CREATE TABLE `colum_charset` (
	  `id` int(11) NOT NULL AUTO_INCREMENT,
	  `c1` varchar(20) DEFAULT NULL,
	  `c2` char(20) CHARACTER SET gbk DEFAULT NULL,
	  `c3` varchar(20) CHARACTER SET gb2312 DEFAULT NULL,
	  `c4` char(20) CHARACTER SET utf8 DEFAULT NULL,
	  `c5` varchar(20) CHARACTER SET utf8mb4 DEFAULT NULL,
	  PRIMARY KEY (`id`)
	) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1

2. Inserting data with the default session character set (gbk)

	-- character_set_client     = gbk
	-- character_set_connection = gbk
	-- character_set_results    = gbk

	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'爱','爱','爱','爱','爱');
	        
	ERROR 1366 (HY000): Incorrect string value: '\xB0\xAE' for column 'c1' at row 1

	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'love','爱','爱','爱','爱');
	Query OK, 1 row affected (0.00 sec)
	
	mysql> select * from sakila.colum_charset;
	+----+------+------+------+------+------+
	| id | c1   | c2   | c3   | c4   | c5   |
	+----+------+------+------+------+------+
	|  1 | love | 爱   | 爱   | 爱   | 爱   |
	+----+------+------+------+------+------+

3. Inserting data with all three variables set to utf8

	mysql> set names 'utf8';
	Query OK, 0 rows affected (0.00 sec)

	mysql> show variables like '%character%';
	+--------------------------+----------------------------+
	| Variable_name            | Value                      |
	+--------------------------+----------------------------+
	| character_set_client     | utf8                       |
	| character_set_connection | utf8                       |
	| character_set_database   | latin1                     |
	| character_set_filesystem | binary                     |
	| character_set_results    | utf8                       |
	| character_set_server     | latin1                     |
	| character_set_system     | utf8                       |
	| character_sets_dir       | /usr/share/mysql/charsets/ |
	+--------------------------+----------------------------+
	
	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'heart','心','心','心','心');        
	Query OK, 1 row affected (0.00 sec)
	
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  | 爱   | 爱   | 爱   | 爱   |
	|  2 | heart | 心   | 心   | 心   | 心   |
	+----+-------+------+------+------+------+

	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'heart','屌','屌','屌','屌');  -- column c3 uses gb2312
	        
	ERROR 1366 (HY000): Incorrect string value: '\xE5\xB1\x8C' for column 'c3' at row 1
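
Both ERROR 1366 results so far come from a column whose character set has no code for the character being stored. The sketch below is a rough Python model of that check, assuming Python's codecs approximate the MySQL character sets of columns c1 to c5; `COLUMN_CODECS` and `rejecting_columns` are illustrative names, not MySQL APIs.

```python
# Python codec standing in for each column's character set (an approximation).
COLUMN_CODECS = {
    "c1": "latin-1",  # no explicit charset, so the table default latin1 applies
    "c2": "gbk",
    "c3": "gb2312",
    "c4": "utf-8",    # MySQL utf8 (3-byte); enough for the characters used here
    "c5": "utf-8",    # MySQL utf8mb4
}


def rejecting_columns(row):
    """Return the columns whose character set cannot encode the value given for them."""
    rejected = []
    for column, value in row.items():
        try:
            value.encode(COLUMN_CODECS[column])  # strict mode: raises on unmappable characters
        except UnicodeEncodeError:
            rejected.append(column)
    return rejected


# First failing INSERT of step 2: only the latin1 column c1 rejects the value.
print(rejecting_columns({"c1": "爱", "c2": "爱", "c3": "爱", "c4": "爱", "c5": "爱"}))
# Failing INSERT of step 3: the character exists in gbk but not in gb2312.
print(rejecting_columns({"c1": "heart", "c2": "屌", "c3": "屌", "c4": "屌", "c5": "屌"}))
```

The strict encode plays the role of the server rejecting the row; whether MySQL raises an error or only a warning depends on the SQL mode in effect.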

4. Inserting data with only character_set_connection set to latin1

	mysql> set character_set_connection=latin1;

	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'heart','情','情','情','情');
	Query OK, 1 row affected, 4 warnings (0.00 sec)
	
	mysql> show warnings \G
	*************************** 1. row ***************************
	  Level: Warning
	   Code: 1300
	Message: Invalid utf8 character string: '\xE6\x83\x85'
	
	-- garbled data appears
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  | 爱   | 爱   | 爱   | 爱   |
	|  2 | heart | 心   | 心   | 心   | 心   |
	|  3 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+        

5. Inserting data with only character_set_connection set to gb2312

	mysql> set character_set_connection=gb2312;

	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'heart','屌','屌','屌','屌');
	
	Query OK, 1 row affected, 4 warnings (0.00 sec)
    
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  | 爱   | 爱   | 爱   | 爱   |
	|  2 | heart | 心   | 心   | 心   | 心   |
	|  3 | heart | ?    | ?    | ?    | ?    |
	|  4 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+
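
Steps 4 and 5 lose the character before it ever reaches a column: the statement is first converted to character_set_connection, and anything that character set cannot represent becomes `?`. A minimal Python sketch of that single conversion, assuming Python codecs stand in for the MySQL character sets (`through_connection` is a made-up helper name):

```python
def through_connection(text, connection_codec):
    """Convert text to the connection character set, substituting '?' where it cannot."""
    converted = text.encode(connection_codec, errors="replace")
    return converted.decode(connection_codec)


# Step 4: the connection is latin1, which contains no CJK characters at all.
print(through_connection("情", "latin-1"))
# Step 5: the connection is gb2312, which lacks this particular character.
print(through_connection("屌", "gb2312"))
# A character that gb2312 does contain passes through the same step intact.
print(through_connection("心", "gb2312"))
```

Once the value has become `?`, the column's character set no longer matters, which is why rows 3 and 4 show `?` in every column, including the utf8 and utf8mb4 ones.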

6a. Reading data back with only character_set_results set to latin1

	mysql> set character_set_results=latin1;
	Query OK, 0 rows affected (0.00 sec)
	
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  | ?    | ?    | ?    | ?    |
	|  2 | heart | ?    | ?    | ?    | ?    |
	|  3 | heart | ?    | ?    | ?    | ?    |
	|  4 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+
	4 rows in set (0.00 sec)

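6b. Reading data back with only character_set_results set to gb2312

Rows 1 and 2 now come back as gb2312 bytes (and, in the next test, gbk bytes). The terminal in this session is evidently UTF-8, since the utf8 results in test 7 display correctly, so it cannot decode those bytes and the cells appear blank.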

	mysql> set character_set_results=gb2312;
	Query OK, 0 rows affected (0.00 sec)
	
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  |    |    |    |    |
	|  2 | heart |    |    |    |    |
	|  3 | heart | ?    | ?    | ?    | ?    |
	|  4 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+
	4 rows in set (0.00 sec)

6c. Reading data back with only character_set_results set to gbk

	mysql> set character_set_results=gbk;
	Query OK, 0 rows affected (0.00 sec)

	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  |    |    |    |    |
	|  2 | heart |    |    |    |    |
	|  3 | heart | ?    | ?    | ?    | ?    |
	|  4 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+
	4 rows in set (0.00 sec)

7. Reading data back with only character_set_results set to utf8

	mysql> set character_set_results=utf8;
	Query OK, 0 rows affected (0.00 sec)
	
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  | 爱   | 爱   | 爱   | 爱   |
	|  2 | heart | 心   | 心   | 心   | 心   |
	|  3 | heart | ?    | ?    | ?    | ?    |
	|  4 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+
	4 rows in set (0.00 sec)             
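
The read tests can be modelled as two steps: the server encodes the stored text in character_set_results, and the terminal decodes whatever bytes arrive in its own encoding. The sketch below assumes a UTF-8 terminal, which the session above appears to use but which is not stated in the original; `as_seen_in_terminal` is an illustrative name.

```python
def as_seen_in_terminal(stored_text, results_codec, terminal_codec="utf-8"):
    """Encode stored text for character_set_results, then decode it as the terminal would."""
    sent = stored_text.encode(results_codec, errors="replace")  # server side: '?' if unmappable
    return sent.decode(terminal_codec, errors="replace")        # client side: U+FFFD if undecodable


for codec in ("latin-1", "gb2312", "gbk", "utf-8"):
    print(codec, repr(as_seen_in_terminal("爱", codec)))
```

With latin1 the server itself substitutes `?`. With gb2312 or gbk the server sends valid bytes that the terminal cannot decode, which Python shows as U+FFFD replacement characters and the terminal above shows as blanks. Only utf8 matches the terminal and displays correctly.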

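8. The local environment variable affects the client character set

No client character set is configured in my.cnf here. If one is configured, the value from the configuration file is used instead.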

	[root@centos7 ~]# export LANG=en_US.UTF-8
	[root@centos7 ~]# mysql -e "show variables like 'character%'"
	+--------------------------+----------------------------+
	| Variable_name            | Value                      |
	+--------------------------+----------------------------+
	| character_set_client     | utf8                       |
	| character_set_connection | utf8                       |
	| character_set_database   | utf8                       |
	| character_set_filesystem | binary                     |
	| character_set_results    | utf8                       |
	| character_set_server     | utf8                       |
	| character_set_system     | utf8                       |
	| character_sets_dir       | /usr/share/mysql/charsets/ |
	+--------------------------+----------------------------+
	
	[root@centos7 ~]# export LANG=zh_CN.GBK
	[root@centos7 ~]# mysql -e "show variables like 'character%'"
	+--------------------------+----------------------------+
	| Variable_name            | Value                      |
	+--------------------------+----------------------------+
	| character_set_client     | gbk                        |
	| character_set_connection | gbk                        |
	| character_set_database   | utf8                       |
	| character_set_filesystem | binary                     |
	| character_set_results    | gbk                        |
	| character_set_server     | utf8                       |
	| character_set_system     | utf8                       |
	| character_sets_dir       | /usr/share/mysql/charsets/ |
	+--------------------------+----------------------------+
	
	[root@centos7 ~]# export LANG=zh_CN.GB2312
	[root@centos7 ~]# mysql -e "show variables like 'character%'"
	+--------------------------+----------------------------+
	| Variable_name            | Value                      |
	+--------------------------+----------------------------+
	| character_set_client     | gb2312                     |
	| character_set_connection | gb2312                     |
	| character_set_database   | utf8                       |
	| character_set_filesystem | binary                     |
	| character_set_results    | gb2312                     |
	| character_set_server     | utf8                       |
	| character_set_system     | utf8                       |
	| character_sets_dir       | /usr/share/mysql/charsets/ |
	+--------------------------+----------------------------+
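
The pattern in these three runs is that the codeset after the dot in LANG ends up as the client character set. The following Python sketch only illustrates that observed mapping; it is not how the mysql client is implemented. The mapping table, the function name `charset_from_lang`, and the `latin1` fallback are all assumptions made for the example.

```python
# Hypothetical mapping from locale codesets to MySQL character set names,
# covering only the three values shown in the output above.
CODESET_TO_MYSQL = {"utf-8": "utf8", "gbk": "gbk", "gb2312": "gb2312"}


def charset_from_lang(lang, fallback="latin1"):
    """Guess the client character set from a LANG value such as 'zh_CN.GBK'."""
    _, dot, codeset = lang.partition(".")
    if not dot:
        return fallback  # LANG carries no codeset, e.g. 'C'
    codeset = codeset.split("@", 1)[0].lower()  # drop any '@modifier' suffix
    return CODESET_TO_MYSQL.get(codeset, fallback)


for lang in ("en_US.UTF-8", "zh_CN.GBK", "zh_CN.GB2312"):
    print(lang, "->", charset_from_lang(lang))
```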

==========================================================
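Conclusions:
character_set_client: which encoding is the data sent by the client in?
character_set_connection: which encoding should the server's converter translate the incoming statement into?
character_set_results: which encoding should query results be returned in?
If all three are the same character set N, this can be abbreviated to set names 'N';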

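Garbling arises for the following reasons:
a. A conversion at one of these stages, on insert or on read, loses data.
b. If a character cannot be converted losslessly between two character sets, it comes out garbled (typically as '?').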

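Solutions:
1. Make sure the character_set_connection character set is greater than or equal to the client character set, that is, it can represent every character the client sends; otherwise data is lost.
For example, taking latin1 < gb2312 < gbk < utf8 as a rough ordering by repertoire,
if you set character_set_client = gb2312,
then the connection character set must be at least gb2312, otherwise data is lost.
2. Make sure character_set_results is greater than or equal to the character set the data is stored in; otherwise data is lost.
For example, if the stored data is utf8 and character_set_results is gbk, any character that gbk cannot represent comes back as '?'. The result encoding must also match what the client terminal actually decodes, as tests 6b and 6c show.

3. Use one character encoding for all variables, such as utf8 or utf8mb4 (utf8mb4 is the safer choice, since MySQL's utf8 stores at most three bytes per character).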
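
The rules above amount to checking that every stage a value passes through can represent it. A minimal Python sketch of such a check follows, assuming Python codecs approximate the MySQL character sets; `first_lossy_stage` is a hypothetical helper, not a MySQL feature.

```python
def first_lossy_stage(text, client, connection, column, results):
    """Return the name of the first stage that cannot represent text, or None if all can."""
    stages = [
        ("client", client),
        ("connection", connection),
        ("column", column),
        ("results", results),
    ]
    for stage, codec in stages:
        try:
            text.encode(codec)  # strict: raises if any character is unmappable
        except UnicodeEncodeError:
            return stage
    return None


# Everything utf8: nothing is lost.
print(first_lossy_stage("情", "utf-8", "utf-8", "utf-8", "utf-8"))
# Connection narrower than the client, as in step 4.
print(first_lossy_stage("情", "utf-8", "latin-1", "utf-8", "utf-8"))
# Column narrower than the data, as with c3 in step 3.
print(first_lossy_stage("屌", "gbk", "gbk", "gb2312", "gbk"))
```

Using a single character set for all four stages makes the check trivially pass for any text that character set can hold, which is the reasoning behind solution 3.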
