When inserting emoji into MySQL, you may get an error like this:
Incorrect string value: '\xF0\x9F\x98\x81' for column 'XXXXXX' at row X
This is a character-encoding problem.
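For example, when Python's MySQLdb connects to MySQL, its default charset is latin1, so you have to specify charset='utf8' yourself. Even if the server is configured with init-connect='SET NAMES utf8', MySQLdb overrides it with latin1.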
In Hibernate, the connection's character set can be changed like this:
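import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import org.hibernate.jdbc.ReturningWork;

// Switch the session character set of the current JDBC connection to utf8mb4
session.doReturningWork(new ReturningWork<Object>() {
    @Override
    public Object execute(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // SET NAMES returns no result set, so use execute() rather than executeQuery()
            stmt.execute("SET NAMES utf8mb4");
        }
        return null;
    }
});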
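Emoji and utf8mb4
MySQL's utf8 character set does not support emoji; you need to switch to utf8mb4. Here is how the two relate. MySQL's utf8 (an alias for utf8mb3) stores at most 3 bytes per character, so it only covers the BMP (Basic Multilingual Plane) of Unicode, roughly U+0000 to U+FFFF (see http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters for the exact ranges). Starting with MySQL 5.5 (5.5.3, to be precise), the utf8mb4 character set supports 4-byte UTF-8: a character can take up to 4 bytes, so many more characters, including emoji, can be stored.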
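To see why a 3-byte utf8 column rejects the emoji in the error message above, this short Python check (standard library only) encodes the character and tests whether it lies outside the BMP. The helper names `utf8_bytes` and `needs_utf8mb4` are hypothetical, chosen for this sketch.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the BMP (above U+FFFF).

    Such characters take 4 bytes in UTF-8, which MySQL's 3-byte
    utf8 (utf8mb3) character set cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


grin = "\U0001F601"  # the emoji whose bytes appear in the error message
print(hex(ord(grin)))         # 0x1f601
print(utf8_bytes(grin))       # b'\xf0\x9f\x98\x81'
print(needs_utf8mb4(grin))    # True
print(needs_utf8mb4("中文"))  # False: these CJK characters are in the BMP (3 bytes each)
```

The four bytes printed are exactly the '\xF0\x9F\x98\x81' that MySQL reports as an incorrect string value.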
utf8mb4 is a superset of utf8: it is backward compatible with utf8 and can represent more characters.
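Client side
The JDBC connection string does not accept utf8mb4 as a character encoding. The workaround is this: if the server is configured with character_set_server=utf8mb4, the client automatically treats the utf-8 it sends as utf8mb4. The corresponding Connector/J change note: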
- Connector/J did not support utf8mb4 for servers 5.5.2 and newer. Connector/J now auto-detects servers configured with character_set_server=utf8mb4 or treats the Java encoding utf-8 passed using characterEncoding=... as utf8mb4 in the SET NAMES= calls it makes when establishing the connection. (Bug #54175)
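Other clients, such as PHP or Python, have to be checked individually for utf8mb4 support. If the character set cannot be specified in the connection string, you can execute "SET NAMES utf8mb4" after obtaining the connection.
Because utf8mb4 is a superset of utf8, switching the client character set to utf8mb4 should, in theory, not cause any problems reading existing utf8-encoded data.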
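A minimal sketch of the "SET NAMES utf8mb4 after connecting" workaround for a Python DB-API 2.0 client. `force_utf8mb4`, `FakeConnection` and `FakeCursor` are hypothetical names; the fakes only exist so the sketch runs without a server, and with a real driver you would pass the object returned by its `connect()` call. Newer MySQLdb (mysqlclient) releases may also accept charset='utf8mb4' directly in `connect()`, depending on the version and the client library, which is preferable when it works.

```python
class FakeCursor:
    """Stand-in for a DB-API cursor that records the SQL it is given."""

    def __init__(self, log):
        self.log = log

    def execute(self, sql):
        self.log.append(sql)

    def close(self):
        pass


class FakeConnection:
    """Stand-in for a DB-API connection (hypothetical, demo only)."""

    def __init__(self):
        self.executed = []

    def cursor(self):
        return FakeCursor(self.executed)


def force_utf8mb4(conn, collation="utf8mb4_unicode_ci"):
    """Switch the session character set to utf8mb4 right after connecting.

    Because this runs after connect() has returned, it executes after any
    SET NAMES the driver issued on its own, so its setting is the one that sticks.
    """
    cur = conn.cursor()
    try:
        cur.execute("SET NAMES utf8mb4 COLLATE " + collation)
    finally:
        cur.close()
    return conn


conn = force_utf8mb4(FakeConnection())
print(conn.executed)  # ['SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci']
```

Running the statement last matters for the same reason discussed in the init-connect section: whichever SET NAMES is executed last wins.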
Server side:
1 Change the character set of the database, tables, and columns
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)
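The 191 in the column example comes from index-length arithmetic. A minimal sketch, assuming the older 767-byte InnoDB index key prefix limit (the limit that applies without innodb_large_prefix, for example with the COMPACT row format): MySQL reserves the character set's maximum bytes per character for every character, so a fully indexed VARCHAR(n) needs n × (max bytes per char) bytes. The names below are illustrative.

```python
INNODB_PREFIX_LIMIT = 767  # bytes; assumed older InnoDB index key prefix limit

# Maximum number of bytes MySQL reserves per character in each character set
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_indexed_varchar(charset: str, limit: int = INNODB_PREFIX_LIMIT) -> int:
    """Longest VARCHAR(n) that fits entirely in an index under the byte limit."""
    return limit // MAX_BYTES_PER_CHAR[charset]


for cs in ("utf8", "utf8mb4"):
    print(cs, max_indexed_varchar(cs))
# utf8 255
# utf8mb4 191
```

This is why converting an indexed VARCHAR(255) column to utf8mb4 can fail with "Specified key was too long; max key length is 767 bytes" under that limit, while VARCHAR(191) still fits.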
2 Edit my.ini (my.cnf on Linux)
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
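Before restarting, the [mysqld] settings can be sanity-checked with Python's standard `configparser`. This is a sketch under assumptions: the expected values are the ones from the snippet above, `check_mysqld_section` is a hypothetical helper, `allow_no_value=True` is there because real my.cnf files often contain bare flags such as skip-external-locking, and files that use !include/!includedir directives would need those lines handled separately.

```python
import configparser

# Expected [mysqld] values, taken from the configuration above
EXPECTED_MYSQLD = {
    "character-set-server": "utf8mb4",
    "collation-server": "utf8mb4_unicode_ci",
    "init-connect": "SET NAMES utf8mb4",
}


def check_mysqld_section(cnf_text: str) -> dict:
    """Return {option: actual_value} for expected [mysqld] options that differ."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    # MySQL treats '-' and '_' in option names as equivalent; normalize to '-'
    # and drop surrounding quotes from values such as 'SET NAMES utf8mb4'
    actual = {
        key.replace("_", "-"): (value or "").strip().strip("'\"")
        for key, value in parser.items("mysqld")
    }
    return {
        opt: actual.get(opt)
        for opt, want in EXPECTED_MYSQLD.items()
        if actual.get(opt) != want
    }


sample = """
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
"""
print(check_mysqld_section(sample))  # {}
```

An empty dict means every expected option is present with the expected value; missing options show up with the value None.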
3 Restart MySQL and check the character sets
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)
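The check in step 3 can also be automated. A minimal sketch, assuming the rows arrive as (Variable_name, Value) tuples, which is what a DB-API `cursor.fetchall()` would return for that query; `charset_problems` is a hypothetical helper. character_set_filesystem (binary) and character_set_system (utf8) are excluded on purpose, because they keep those values even on a correctly configured server, as the output above shows.

```python
# Variables that legitimately keep other values (see the output above)
EXEMPT = {"character_set_filesystem", "character_set_system"}


def charset_problems(rows):
    """Return {name: value} for SHOW VARIABLES rows that are not utf8mb4*."""
    problems = {}
    for name, value in rows:
        if name in EXEMPT:
            continue
        # Values may be NULL (None) for some variables, so guard before startswith
        if not (value or "").startswith("utf8mb4"):
            problems[name] = value
    return problems


rows = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_database", "utf8mb4"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8mb4"),
    ("character_set_server", "utf8mb4"),
    ("character_set_system", "utf8"),
    ("collation_connection", "utf8mb4_unicode_ci"),
    ("collation_database", "utf8mb4_unicode_ci"),
    ("collation_server", "utf8mb4_unicode_ci"),
]
print(charset_problems(rows))  # {}
```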
4 If you connect to MySQL from Java, upgrade mysql-connector-java.jar to at least 5.1.22.
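If you need to check the Connector/J version in a build script, compare versions numerically rather than as strings: as strings, "5.1.9" sorts after "5.1.22". A small sketch; `parse_version` and `connector_j_ok` are hypothetical helpers, and the 5.1.22 minimum is the one given in step 4.

```python
def parse_version(text: str) -> tuple:
    """'5.1.22' -> (5, 1, 22); a suffix such as '-bin' is ignored."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


MIN_CONNECTOR_J = parse_version("5.1.22")


def connector_j_ok(version: str) -> bool:
    """True if the Connector/J version meets the minimum from step 4."""
    return parse_version(version) >= MIN_CONNECTOR_J


for v in ("5.1.9", "5.1.13", "5.1.22", "8.0.33"):
    print(v, connector_j_ok(v))
# 5.1.9 False
# 5.1.13 False
# 5.1.22 True
# 8.0.33 True
```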
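Why the init-connect option has no effect
init-connect='SET NAMES utf8'
SET NAMES x is equivalent to:
SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;
These three variables should be configured on the server side, and since we set character_set_server=utf8, all three default to utf8 anyway, so I think this setting has no effect.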
Note that the content of init_connect is not executed for users that have the SUPER privilege. This is done so that an erroneous value for init_connect does not prevent all clients from connecting. For example, the value might contain a statement that has a syntax error, thus causing client connections to fail. Not executing init_connect for users that have the SUPER privilege enables them to open a connection and fix the init_connect value.
Testing with Python
conn = MySQLdb.connect(host='127.0.0.1', user='admin2', passwd='', db='test', charset='gb2312')
mysql> show grants for 'admin2'@'127.0.0.1';
+---------------------------------------------------------------+
| Grants for admin2@127.0.0.1                                   |
+---------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'admin2'@'127.0.0.1'                    |
| GRANT SELECT, INSERT ON `test`.`test` TO 'admin2'@'127.0.0.1' |
+---------------------------------------------------------------+
2 rows in set (0.00 sec)
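However, the test script's output was still garbled, and character_set_client and character_set_results held the values set by the charset parameter of MySQLdb.connect, as if init-connect had not run. My reasoning was that if init-connect='SET NAMES utf8' ran the SET after the connection to the server was established, as the documentation says, the session-level character_set_client and character_set_results should have been utf8.
But when I changed init-connect to "insert into test.test values('hello')", the result showed that hello had been inserted, so init-connect was in fact being executed.
After enabling MySQL's general query log, I found the following for Python's MySQLdb (the set autocommit=0 statement is simply MySQLdb's default behavior):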
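conn = MySQLdb.connect(host='127.0.0.1', user='admin2', passwd='', db='test', charset='gb2312')
The server first runs the init-connect statement SET NAMES utf8, and then MySQLdb translates charset='gb2312' into SET NAMES gb2312, which overrides it. So when using clients written in different languages, it is best to always specify the character set explicitly, or to investigate each client's default behavior thoroughly.
101118 0:27:52 1 Connect admin2@127.0.0.1 on test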
1 Query SET NAMES utf8
1 Query SET NAMES gb2312
1 Query set autocommit=0
For comparison, connecting without the charset argument:
conn = MySQLdb.connect(host='127.0.0.1', user='admin2', passwd='', db='test')
101118 0:27:52 1 Connect admin2@127.0.0.1 on test
1 Query SET NAMES utf8
1 Query set autocommit=0
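The ordering in these two logs can be modeled in a few lines. This is a simplified simulation, not MySQL's actual code, and `simulate_session` is a hypothetical name. It assumes: the session starts with the charset negotiated at handshake; SET NAMES x sets character_set_client, character_set_connection and character_set_results; init_connect runs first and is skipped for users with the SUPER privilege (per the documentation quoted above); then the driver sends its own SET NAMES when a charset is passed.

```python
SESSION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def set_names(session: dict, charset: str) -> None:
    """SET NAMES x updates the three session variables at once."""
    for var in SESSION_VARS:
        session[var] = charset


def simulate_session(handshake_charset, init_connect_charset=None,
                     client_charset=None, has_super=False):
    """Return the session character set variables after connecting."""
    session = dict.fromkeys(SESSION_VARS, handshake_charset)
    # 1. The server runs init_connect first, except for SUPER users
    if init_connect_charset and not has_super:
        set_names(session, init_connect_charset)
    # 2. The driver then issues its own SET NAMES for charset=...
    if client_charset:
        set_names(session, client_charset)
    return session


print(simulate_session("latin1", "utf8", "gb2312")["character_set_client"])  # gb2312
print(simulate_session("latin1", "utf8")["character_set_client"])            # utf8
print(simulate_session("latin1", "utf8", has_super=True)["character_set_client"])  # latin1
```

The first call mirrors the charset='gb2312' log (the later SET NAMES wins), the second mirrors the log without a charset, and the third shows why init_connect appears to do nothing when testing with a SUPER account.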
References:
http://afei2.sinaapp.com/?p=202
http://afei2.sinaapp.com/?p=518