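Garbled query results are a common problem with MySQL. What actually causes query results to come back garbled? This article works through a series of demonstrations to show where garbled characters come from and how to fix character-set problems, for your reference.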
I. Character encoding comparison
SELECT hex(convert('love' USING latin1)) latin_value,
hex(convert('love' USING gb2312)) gb2312_value,
hex(convert('love' USING gbk)) gbk_value,
hex(convert('love' USING utf8)) utf8_value;
+-------------+--------------+-----------+------------+
| latin_value | gb2312_value | gbk_value | utf8_value |
+-------------+--------------+-----------+------------+
| 6C6F7665 | 6C6F7665 | 6C6F7665 | 6C6F7665 |
+-------------+--------------+-----------+------------+
SELECT hex(convert('爱' USING latin1)) latin_value,
hex(convert('爱' USING gb2312)) gb2312_value,
hex(convert('爱' USING gbk)) gbk_value,
hex(convert('爱' USING utf8)) utf8_value;
+-------------+--------------+-----------+------------+
| latin_value | gb2312_value | gbk_value | utf8_value |
+-------------+--------------+-----------+------------+
| 3F | B0AE | B0AE | E788B1 |
+-------------+--------------+-----------+------------+
SELECT convert(0x3F USING latin1) latin_value,
convert(0xB0AE USING gb2312) gb2312_value,
convert(0xB0AE USING gbk) gbk_value,
convert(0xE788B1 USING utf8) utf8_value;
+-------------+--------------+-----------+------------+
| latin_value | gb2312_value | gbk_value | utf8_value |
+-------------+--------------+-----------+------------+
| ? | 爱 | 爱 | 爱 |
+-------------+--------------+-----------+------------+
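The same comparison can be reproduced outside MySQL. The following is a minimal Python sketch, assuming Python's `latin-1`, `gb2312`, `gbk` and `utf-8` codecs stand in for MySQL's `latin1`, `gb2312`, `gbk` and `utf8` (they agree for these characters, although MySQL's `latin1` is really cp1252 and its `utf8` allows at most 3 bytes per character). `errors="replace"` imitates CONVERT() substituting `?` for a character the target charset cannot represent; `hex_in_charsets` and `decode_hex` are hypothetical helper names.

```python
# Python codec names used here as stand-ins for MySQL character sets.
CODECS = {"latin1": "latin-1", "gb2312": "gb2312", "gbk": "gbk", "utf8": "utf-8"}


def hex_in_charsets(text: str) -> dict:
    """Like hex(convert(text USING cs)) for each charset; unencodable -> '?'."""
    return {
        name: text.encode(codec, errors="replace").hex().upper()
        for name, codec in CODECS.items()
    }


def decode_hex(hex_value: str, charset: str) -> str:
    """Like convert(0x... USING cs): read raw bytes as text in that charset."""
    return bytes.fromhex(hex_value).decode(CODECS[charset])


if __name__ == "__main__":
    print(hex_in_charsets("love"))  # 6C6F7665 in every charset
    print(hex_in_charsets("爱"))    # latin1 3F, gb2312/gbk B0AE, utf8 E788B1
    print(decode_hex("B0AE", "gbk"), decode_hex("E788B1", "utf8"))
```

ASCII text such as `love` has one byte sequence in all four charsets, so it never garbles; a Chinese character has a different byte sequence in each charset (and none in latin1), so it must be converted whenever two sides of a step use different charsets.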
II. Garbling tests
1. Environment setup
# grep -Ev "^#|^$" /etc/my.cnf    # show the current my.cnf settings without comments or blank lines
mysql> show variables like 'version';
+---------------+------------+
| Variable_name | Value |
+---------------+------------+
| version | 5.7.23-log |
+---------------+------------+
mysql> show variables like '%character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
DROP TABLE IF EXISTS sakila.colum_charset;
CREATE TABLE sakila.colum_charset
(
id int not null auto_increment primary key,
c1 varchar(20),
c2 char(20) CHAR SET gbk,
c3 varchar(20) CHARSET gb2312,
c4 char(20) CHARACTER SET utf8,
c5 varchar(20) CHARSET utf8mb4
);
mysql> show create table sakila.colum_charset\G
*************************** 1. row ***************************
Table: colum_charset
Create Table: CREATE TABLE `colum_charset` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`c1` varchar(20) DEFAULT NULL,
`c2` char(20) CHARACTER SET gbk DEFAULT NULL,
`c3` varchar(20) CHARACTER SET gb2312 DEFAULT NULL,
`c4` char(20) CHARACTER SET utf8 DEFAULT NULL,
`c5` varchar(20) CHARACTER SET utf8mb4 DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1
2. Inserting data with the default character sets (gbk)
-- character_set_client gbk
-- character_set_connection gbk
-- character_set_results gbk
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'爱','爱','爱','爱','爱');
ERROR 1366 (HY000): Incorrect string value: '\xB0\xAE' for column 'c1' at row 1
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'love','爱','爱','爱','爱');
Query OK, 1 row affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+------+------+------+------+------+
| 1 | love | 爱 | 爱 | 爱 | 爱 |
+----+------+------+------+------+------+
3. Inserting data with all three variables set to utf8
mysql> set names 'utf8';
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like '%character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'heart','心','心','心','心');
Query OK, 1 row affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | 爱 | 爱 | 爱 | 爱 |
| 2 | heart | 心 | 心 | 心 | 心 |
+----+-------+------+------+------+------+
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'heart','屌','屌','屌','屌'); -- column c3 uses gb2312
ERROR 1366 (HY000): Incorrect string value: '\xE5\xB1\x8C' for column 'c3' at row 1
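Both errors so far (c1 rejecting `爱`, c3 rejecting `屌`) happen at the last step of an INSERT, when the value is converted into the column's own character set. The Python sketch below models only that step. It assumes Python codecs stand in for the column charsets of `sakila.colum_charset`, and that the server runs in a strict SQL mode (MySQL 5.7's default includes STRICT_TRANS_TABLES), which is what turns a failed conversion into ERROR 1366 instead of a warning; `store` and `IncorrectStringValue` are hypothetical names.

```python
# Column -> Python codec, mirroring the colum_charset table definition.
COLUMN_CODECS = {
    "c1": "latin-1",  # no explicit charset: table default latin1
    "c2": "gbk",
    "c3": "gb2312",
    "c4": "utf-8",    # MySQL utf8 (max 3 bytes); equivalent for these characters
    "c5": "utf-8",    # MySQL utf8mb4
}


class IncorrectStringValue(Exception):
    """Stands in for MySQL ERROR 1366 (HY000) in this sketch."""

    def __init__(self, column: str, value: str):
        super().__init__(f"Incorrect string value {value!r} for column '{column}'")
        self.column = column


def store(row: dict) -> dict:
    """Encode each value in its column's charset, failing like strict mode."""
    stored = {}
    for column, value in row.items():
        try:
            stored[column] = value.encode(COLUMN_CODECS[column])
        except UnicodeEncodeError:
            raise IncorrectStringValue(column, value) from None
    return stored


if __name__ == "__main__":
    print(store({"c1": "love", "c2": "爱", "c3": "爱", "c4": "爱", "c5": "爱"}))
    try:
        store({"c1": "heart", "c2": "屌", "c3": "屌", "c4": "屌", "c5": "屌"})
    except IncorrectStringValue as err:
        print(err)  # reports column c3 (gb2312)
```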
4. Inserting data with only character_set_connection set to latin1
mysql> set character_set_connection=latin1;
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'heart','情','情','情','情');
Query OK, 1 row affected, 4 warnings (0.00 sec)
mysql> show warnings \G
*************************** 1. row ***************************
Level: Warning
Code: 1300
Message: Invalid utf8 character string: '\xE6\x83\x85'
-- The data is now garbled
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | 爱 | 爱 | 爱 | 爱 |
| 2 | heart | 心 | 心 | 心 | 心 |
| 3 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
5. Inserting data with only character_set_connection set to gb2312
mysql> set character_set_connection=gb2312;
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'heart','屌','屌','屌','屌');
Query OK, 1 row affected, 4 warnings (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | 爱 | 爱 | 爱 | 爱 |
| 2 | heart | 心 | 心 | 心 | 心 |
| 3 | heart | ? | ? | ? | ? |
| 4 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
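In sections 4 and 5 the damage happens earlier, before any column is involved: the server converts the statement from character_set_client to character_set_connection, and a character the connection charset cannot hold becomes `?` with a warning. Because `?` is valid in every column charset, the INSERT then succeeds and the original character is gone for good. A minimal Python sketch of that conversion, assuming Python codecs stand in for MySQL's and that `?` is the substitution character; `to_connection` is a hypothetical name.

```python
CODECS = {"latin1": "latin-1", "gb2312": "gb2312", "gbk": "gbk", "utf8": "utf-8"}


def to_connection(client_bytes: bytes, client: str, connection: str) -> str:
    """Decode what the client sent, re-encode it in the connection charset
    (unrepresentable characters become '?'), and return the resulting text."""
    text = client_bytes.decode(CODECS[client])
    converted = text.encode(CODECS[connection], errors="replace")
    return converted.decode(CODECS[connection])


if __name__ == "__main__":
    # The client here is utf8, as after set names 'utf8'.
    print(to_connection("情".encode("utf-8"), "utf8", "latin1"))  # ? (section 4)
    print(to_connection("屌".encode("utf-8"), "utf8", "gb2312"))  # ? (section 5)
    print(to_connection("心".encode("utf-8"), "utf8", "gb2312"))  # 心 survives
```

Whether a character survives depends only on whether the connection charset contains it: `心` would pass through a gb2312 connection intact, while `屌` cannot.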
6. Setting only character_set_results to latin1
-- Test the returned data
mysql> set character_set_results=latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | ? | ? | ? | ? |
| 2 | heart | ? | ? | ? | ? |
| 3 | heart | ? | ? | ? | ? |
| 4 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
4 rows in set (0.00 sec)
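Reading works the same way in reverse: each stored value is converted from its column's charset to character_set_results before it is sent. latin1 has no code for `爱` or `心`, so the server sends a literal `?` even though the table still holds the correct data (rows 3 and 4, by contrast, were already stored as `?`). A minimal Python sketch, assuming Python codecs stand in for MySQL's and `?` is the substitution character; `to_results` is a hypothetical name.

```python
CODECS = {"latin1": "latin-1", "gb2312": "gb2312", "gbk": "gbk", "utf8": "utf-8"}


def to_results(stored_value: str, results: str) -> bytes:
    """Bytes sent for a stored value under the given character_set_results;
    characters the results charset lacks are replaced with '?'."""
    return stored_value.encode(CODECS[results], errors="replace")


if __name__ == "__main__":
    for value in ("love", "爱", "心"):
        print(value, to_results(value, "latin1"), to_results(value, "utf8"))
```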
7. Setting only character_set_results to gb2312
-- Test the returned data
mysql> set character_set_results=gb2312;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | | | | |
| 2 | heart | | | | |
| 3 | heart | ? | ? | ? | ? |
| 4 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
4 rows in set (0.00 sec)
8. Setting only character_set_results to gbk
-- Test the returned data
mysql> set character_set_results=gbk;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | | | | |
| 2 | heart | | | | |
| 3 | heart | ? | ? | ? | ? |
| 4 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
4 rows in set (0.00 sec)
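With gb2312 or gbk the server loses nothing: `爱` and `心` exist in both charsets, so it sends valid GB2312/GBK bytes (`爱` is B0AE in both, as in the encoding comparison at the start). The blank cells most likely come from the terminal, which is still decoding output as UTF-8 (the session was switched with set names 'utf8' in section 3) and cannot make sense of those bytes. A minimal Python sketch of that mismatch, assuming a UTF-8 terminal; Python's `errors="replace"` shows each undecodable byte as U+FFFD, whereas a real terminal may print nothing or other symbols. `render_on_utf8_terminal` is a hypothetical name.

```python
def render_on_utf8_terminal(received: bytes) -> str:
    """What a terminal that assumes UTF-8 would make of the received bytes."""
    return received.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sent = "爱".encode("gbk")              # what the server sends with results=gbk
    print(sent.hex().upper())              # B0AE: perfectly valid GBK
    print(render_on_utf8_terminal(sent))   # two U+FFFD replacement characters
    print(sent.decode("gbk"))              # a GBK terminal would show 爱
```

This is also why switching character_set_results back to utf8 brings the original characters back.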
9. Setting only character_set_results to utf8
-- Test the returned data
mysql> set character_set_results=utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | 爱 | 爱 | 爱 | 爱 |
| 2 | heart | 心 | 心 | 心 | 心 |
| 3 | heart | ? | ? | ? | ? |
| 4 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
4 rows in set (0.00 sec)
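10. Local environment variables affect the client character set
-- my.cnf does not configure a client character set here; if one were configured, the client would use the character set from the configuration file instead.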
[root@centos7 ~]# export LANG=en_US.UTF-8
[root@centos7 ~]# mysql -e "show variables like 'character%'"
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
[root@centos7 ~]# export LANG=zh_CN.GBK
[root@centos7 ~]# mysql -e "show variables like 'character%'"
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
[root@centos7 ~]# export LANG=zh_CN.GB2312
[root@centos7 ~]# mysql -e "show variables like 'character%'"
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | gb2312 |
| character_set_connection | gb2312 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gb2312 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
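The mapping observed above (a UTF-8 locale gives utf8, GBK gives gbk, GB2312 gives gb2312) can be illustrated with a small lookup. This is a hypothetical sketch for illustration only: `client_charset_from_lang`, its table, and the latin1 fallback are assumptions, not the mysql client's actual code, which uses its own table of OS charset names. Setting the client's `default-character-set` option in my.cnf avoids depending on the locale at all.

```python
# Hypothetical table: locale codeset -> MySQL character set name.
CODESET_TO_MYSQL = {"UTF-8": "utf8", "UTF8": "utf8", "GBK": "gbk", "GB2312": "gb2312"}


def client_charset_from_lang(lang: str, fallback: str = "latin1") -> str:
    """Pick a client charset from a LANG value such as 'zh_CN.GBK'.

    LANG has the form language_TERRITORY.CODESET[@modifier]; only the
    codeset part matters here. Unknown codesets use the (assumed) fallback.
    """
    codeset = lang.partition(".")[2].partition("@")[0].upper()
    return CODESET_TO_MYSQL.get(codeset, fallback)


if __name__ == "__main__":
    for lang in ("en_US.UTF-8", "zh_CN.GBK", "zh_CN.GB2312", "C"):
        print(lang, "->", client_charset_from_lang(lang))
```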
==========================================================
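Conclusions:
character_set_client: the encoding of the data the client sends.
character_set_connection: the encoding the server's character set converter converts incoming statements into.
character_set_results: the encoding in which query results are returned.
If all three are the same character set N, they can be set in one statement with set names 'N';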
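Garbled characters are caused by the following:
a. A conversion at one of the encoding steps during an insert or a read loses data.
b. If two character sets cannot be converted into each other losslessly, the characters that do not survive the conversion will inevitably be garbled.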
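Cause b can be tested directly: converting a string into another character set is lossless only if every character survives the round trip. A minimal Python sketch, assuming Python codecs stand in for MySQL's; `is_lossless` is a hypothetical helper.

```python
CODECS = {"latin1": "latin-1", "gb2312": "gb2312", "gbk": "gbk", "utf8": "utf-8"}


def is_lossless(text: str, target: str) -> bool:
    """True if text can be converted to the target charset and back unchanged."""
    codec = CODECS[target]
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(is_lossless("love", "latin1"))  # ASCII survives in every charset here
    print(is_lossless("情", "latin1"))    # latin1 has no Chinese characters
    print(is_lossless("屌", "gb2312"))    # in GBK, but not in GB2312
    print(is_lossless("屌", "gbk"))
```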
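Solutions:
1. Always make sure the character_set_connection character set is at least as large as the character_set_client character set; otherwise data is lost.
For example, ranked roughly by character repertoire: latin1 < gb2312 < gbk < utf8.
If you run set character_set_client = gb2312,
the connection character set must be at least gb2312, or data will be lost.
2. Always make sure character_set_results is at least as large as the character set the data is stored in; otherwise data is lost.
For example, if the stored characters are utf8 but character_set_results is gbk, characters that GBK cannot represent are returned as '?'.
3. Use one uniform character set for all of these variables, such as utf8 or utf8mb4.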
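The three rules can be combined into a quick check of a session's settings, for example using values read with SHOW VARIABLES LIKE 'character_set_%'. A hypothetical sketch: `check_session` is not an existing tool, and the ranking follows the article's rough ordering latin1 < gb2312 < gbk < utf8, with utf8mb4 placed above utf8 as an added assumption.

```python
# Rough repertoire ranking from the article; utf8mb4 added above utf8.
RANK = {"latin1": 0, "gb2312": 1, "gbk": 2, "utf8": 3, "utf8mb4": 4}


def check_session(variables: dict, stored_charset: str) -> list:
    """Return the solution rules that the given settings violate."""
    client = variables["character_set_client"]
    connection = variables["character_set_connection"]
    results = variables["character_set_results"]
    problems = []
    if RANK[connection] < RANK[client]:           # rule 1
        problems.append("connection smaller than client")
    if RANK[results] < RANK[stored_charset]:      # rule 2
        problems.append("results smaller than stored charset")
    if len({client, connection, results, stored_charset}) > 1:  # rule 3
        problems.append("charsets not unified")
    return problems


if __name__ == "__main__":
    print(check_session({"character_set_client": "gb2312",
                         "character_set_connection": "latin1",
                         "character_set_results": "gbk"}, "utf8"))
```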