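I. Introduction
To test how different SQL statements perform on large data volumes, I needed to bulk-insert several hundred thousand rows into a database. This is a common need, so I expected ready-made code to exist online. After some searching I found the code below, which inserts 100,000 rows in about 2 seconds.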
package com.wave.checkin.wavecheckin.utils;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Date;

public class MysqlBatchUtil {
    private String sql = "INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)";
    private String connectStr = "jdbc:mysql://localhost:3306/test";
    private String username = "root";
    private String password = "";

    private void doStore() throws ClassNotFoundException, SQLException {
        // Connector/J 8.x driver class (com.mysql.jdbc.Driver is the deprecated old name);
        // with JDBC 4+ drivers this explicit load is optional
        Class.forName("com.mysql.cj.jdbc.Driver");
        // These parameters enable the fast batch insert; without them an ordinary batch insert is run
        String url = connectStr + "?useUnicode=true&characterEncoding=utf8&useServerPrepStmts=true&rewriteBatchedStatements=true&serverTimezone=GMT";
        // try-with-resources closes the statement and connection even if an exception is thrown
        try (Connection conn = DriverManager.getConnection(url, username, password);
             PreparedStatement psts = conn.prepareStatement(sql)) {
            // Commit manually so all rows go into one transaction
            conn.setAutoCommit(false);
            int count = 0;
            Date begin = new Date();
            long time = System.currentTimeMillis();
            // i < 100000 gives exactly 100,000 rows (the original <= inserted 100,001)
            for (int i = 0; i < 100000; i++) {
                psts.setString(1, i + "var");
                psts.setInt(2, i);
                psts.setDate(3, new java.sql.Date(time));
                psts.setString(4, "1");
                // Add this parameter set to the batch
                psts.addBatch();
                count++;
            }
            // Execute the batch
            psts.executeBatch();
            // Commit
            conn.commit();
            Date end = new Date();
            System.out.println("count=" + count);
            System.out.println("elapsed (ms)=" + (end.getTime() - begin.getTime()));
        }
    }

    public static void main(String[] args) {
        try {
            new MysqlBatchUtil().doStore();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
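The code works, but I still wanted to understand why.
II. Questions
To explore the logic behind the code, I asked myself several questions.
1. What is useServerPrepStmts in the URL?
Searching the MySQL documentation for useServerPrepStmts turns up the following passage: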
Two variants of prepared statements are implemented by Connector/J, the client-side and the server-side prepared statements. Client-side prepared statements are used by default because early MySQL versions did not support the prepared statement feature or had problems with its implementation. Server-side prepared statements and binary-encoded result sets are used when the server supports them. To enable usage of server-side prepared statements, set useServerPrepStmts=true.
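In short, Connector/J (the MySQL JDBC driver) implements two kinds of prepared statements: client-side and server-side. Client-side prepared statements are the default because early MySQL versions did not support prepared statements or had a buggy implementation. To use server-side prepared statements, set useServerPrepStmts=true.
So what exactly is a server-side prepared statement? Searching further: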
MySQL 8.0 provides support for server-side prepared statements. This support takes advantage of the efficient client/server binary protocol. Using prepared statements with placeholders for parameter values has the following benefits:
1. Less overhead for parsing the statement each time it is executed. Typically, database applications process large volumes of almost-identical statements, with only changes to literal or variable values in clauses such as WHERE for queries and deletes, SET for updates, and VALUES for inserts.
2. Protection against SQL injection attacks. The parameter values can contain unescaped SQL quote and delimiter characters.
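In short, MySQL 8.0 supports server-side prepared statements, which use placeholders in place of parameter values. This has two benefits:
- Less parsing on each execution. For example, select * from user where state = ? is a prepared statement: it is parsed only once, and whatever value is later bound to ?, it does not need to be parsed again ("compile once, run many times"). An ordinary statement, by contrast, has to be parsed again whenever its SQL text differs in any way, for instance by a different literal value.
- Protection against SQL injection. Because parameter values are sent separately from the SQL text, they can contain unescaped SQL quotes and delimiters without changing the statement itself.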
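A toy model can make both benefits concrete. The sketch below is not how MySQL actually parses or caches statements: it simply assumes one parse per distinct SQL text, and reuses the `user` table and `state` column from the example above. Concatenating literal values produces a new SQL text for every value, while the placeholder version keeps a single text; concatenation also lets a crafted value rewrite the WHERE clause.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Toy model (assumption): every distinct SQL text costs one parse.
final class ParseCountDemo {
    static final String PREPARED = "select * from user where state = ?";

    // Unsafe: the value becomes part of the SQL text itself
    static String concatenated(String state) {
        return "select * from user where state = '" + state + "'";
    }

    // Number of parses under the toy assumption
    static int parsesNeeded(List<String> sqlTexts) {
        return new HashSet<>(sqlTexts).size();
    }

    public static void main(String[] args) {
        List<String> literalSql = new ArrayList<>();
        List<String> preparedSql = new ArrayList<>();
        for (String state : List.of("0", "1", "2")) {
            literalSql.add(concatenated(state));
            preparedSql.add(PREPARED); // the value would be bound separately to ?
        }
        System.out.println("literal SQL parses:  " + parsesNeeded(literalSql));
        System.out.println("prepared SQL parses: " + parsesNeeded(preparedSql));
        // A crafted value turns the filter into "always true"
        System.out.println(concatenated("x' or '1'='1"));
    }
}
```

Running it prints 3 parses for the literal version, 1 for the placeholder version, and then `select * from user where state = 'x' or '1'='1'`, which matches every row.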
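One more note: MySQL Server versions before 4.1 do not support prepared statements, and since version 5.0.5 Connector/J does not enable server-side prepared statements by default.
2. What is rewriteBatchedStatements in the URL?
Searching the MySQL documentation for rewriteBatchedStatements turns up this passage: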
Should the driver use multiqueries (irregardless of the setting of “allowMultiQueries”) as well as rewriting of prepared statements for INSERT into multi-value inserts when executeBatch() is called? Notice that this has the potential for SQL injection if using plain java.sql.Statements and your code doesn’t sanitize input correctly. Notice that for prepared statements, server-side prepared statements can not currently take advantage of this rewrite option, and that if you don’t specify stream lengths when using PreparedStatement.set*Stream(), the driver won’t be able to determine the optimum number of parameters per batch and you might receive an error from the driver that the resultant packet is too large. Statement.getGeneratedKeys() for these rewritten statements only works when the entire batch includes INSERT statements. Please be aware using rewriteBatchedStatements=true with INSERT … ON DUPLICATE KEY UPDATE that for rewritten statement server returns only one value as sum of all affected (or found) rows in batch and it isn’t possible to map it correctly to initial statements; in this case driver returns 0 as a result of each batch statement if total count was 0, and the Statement.SUCCESS_NO_INFO as a result of each batch statement if total count was > 0.
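In short: when executeBatch() is called, the driver may use multi-queries (whatever allowMultiQueries is set to) and rewrite prepared INSERT statements into multi-value INSERTs. The caveats: with plain java.sql.Statement objects and unsanitized input this can allow SQL injection; according to this passage, server-side prepared statements cannot use the rewrite; if stream lengths are not given to PreparedStatement.set*Stream(), the driver cannot work out how many parameters fit in one batch and may report that the resulting packet is too large; getGeneratedKeys() works only when the whole batch consists of INSERTs; and with INSERT ... ON DUPLICATE KEY UPDATE, the per-statement update counts are lost (each statement reports 0 or Statement.SUCCESS_NO_INFO).
So the rewrite is done by the driver, on the client side. When the parameter is true, executeBatch() rewrites the batched prepared INSERT into a single multi-value INSERT and sends that to the server, instead of sending the statements one at a time.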
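The rewrite itself is easy to picture. The following is a simplified sketch, not Connector/J's implementation: it assumes the statement ends with exactly one `VALUES (...)` group and repeats that group once per batched row. For two rows it yields the same text as the second Prepare in the fourth test case below.

```java
// Simplified sketch (assumption): the INSERT ends with a single "VALUES (...)" group.
final class BatchRewriteSketch {
    static String rewriteToMultiValue(String singleRowInsert, int rows) {
        int idx = singleRowInsert.lastIndexOf("VALUES");
        if (idx < 0 || rows < 1) {
            throw new IllegalArgumentException("expected INSERT ... VALUES (...) and rows >= 1");
        }
        int afterKeyword = idx + "VALUES".length();
        String head = singleRowInsert.substring(0, afterKeyword);      // "INSERT INTO ... VALUES"
        String group = singleRowInsert.substring(afterKeyword).trim(); // "(?,?,?,?)"
        StringBuilder sb = new StringBuilder(head).append(' ').append(group);
        for (int i = 1; i < rows; i++) {
            sb.append(',').append(group); // one placeholder group per extra row
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)";
        System.out.println(rewriteToMultiValue(sql, 2));
    }
}
```

The real driver also has to decide how many rows fit into one statement (the documentation above mentions the optimum number of parameters per batch); this sketch ignores that.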
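3. How do these two parameters affect statement execution?
We now know that rewriteBatchedStatements makes the driver rewrite batched prepared statements on the client, and useServerPrepStmts turns on server-side prepared statements. Time to verify. I used MySQL 5.7 (which supports prepared statements) with Connector/J 8.0.18 (server-side prepared statements disabled by default).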
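Only the URL suffix changes between the four cases below. A small helper like this one (hypothetical, not part of the original code) builds each variant from the same base URL used in `MysqlBatchUtil`; the same settings could also be passed as a `java.util.Properties` object to `DriverManager.getConnection(url, props)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the URL for each of the four test cases.
final class JdbcUrls {
    static String build(String base, boolean rewriteBatched, boolean serverPrep) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("useUnicode", "true");
        params.put("characterEncoding", "utf8");
        params.put("serverTimezone", "GMT");
        if (rewriteBatched) {
            params.put("rewriteBatchedStatements", "true");
        }
        if (serverPrep) {
            params.put("useServerPrepStmts", "true");
        }
        StringJoiner query = new StringJoiner("&", "?", "");
        params.forEach((key, value) -> query.add(key + "=" + value));
        return base + query;
    }

    public static void main(String[] args) {
        String base = "jdbc:mysql://localhost:3306/test";
        System.out.println(build(base, false, false)); // case 1
        System.out.println(build(base, false, true));  // case 2
        System.out.println(build(base, true, false));  // case 3
        System.out.println(build(base, true, true));   // case 4
    }
}
```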
Case 1: neither rewriteBatchedStatements nor useServerPrepStmts is set. The loop inserts two rows, and the MySQL general query log shows how the statements were executed.
?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT
2019-11-01T01:43:51.832520Z 79 Query SHOW WARNINGS
2019-11-01T01:43:51.836262Z 79 Query SET NAMES utf8mb4
2019-11-01T01:43:51.836420Z 79 Query SET character_set_results = NULL
2019-11-01T01:43:51.836653Z 79 Query SET autocommit=1
2019-11-01T01:43:51.841252Z 79 Query SET autocommit=0
2019-11-01T01:43:51.865821Z 79 Query SELECT @@session.transaction_read_only
2019-11-01T01:43:51.866303Z 79 Query INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1')
2019-11-01T01:43:51.891904Z 79 Query INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('1var',1,'2019-11-01','1')
2019-11-01T01:43:51.892136Z 79 Query commit
2019-11-01T01:43:51.977174Z 79 Query rollback
2019-11-01T01:43:51.981484Z 79 Quit
The server did not prepare anything; the two INSERT statements were executed with Query commands.
Case 2: set useServerPrepStmts=true.
?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT&useServerPrepStmts=true
2019-11-01T02:23:46.145078Z 89 Query SHOW WARNINGS
2019-11-01T02:23:46.148454Z 89 Query SET NAMES utf8mb4
2019-11-01T02:23:46.148588Z 89 Query SET character_set_results = NULL
2019-11-01T02:23:46.148809Z 89 Query SET autocommit=1
2019-11-01T02:23:46.153224Z 89 Query SET autocommit=0
2019-11-01T02:23:46.166311Z 89 Prepare INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)
2019-11-01T02:23:46.171129Z 89 Query SELECT @@session.transaction_read_only
2019-11-01T02:23:46.171392Z 89 Query SELECT @@session.transaction_read_only
2019-11-01T02:23:46.175887Z 89 Execute INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1')
2019-11-01T02:23:46.204478Z 89 Execute INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('1var',1,'2019-11-01','1')
2019-11-01T02:23:46.204878Z 89 Query commit
2019-11-01T02:23:46.487619Z 89 Query rollback
2019-11-01T02:23:46.492423Z 89 Quit
The server prepared the statement (Prepare) and executed the two INSERTs with Execute commands.
Case 3: set rewriteBatchedStatements=true.
?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT&rewriteBatchedStatements=true
2019-11-01T02:40:19.644778Z 90 Query SHOW WARNINGS
2019-11-01T02:40:19.650422Z 90 Query SET NAMES utf8mb4
2019-11-01T02:40:19.650650Z 90 Query SET character_set_results = NULL
2019-11-01T02:40:19.650932Z 90 Query SET autocommit=1
2019-11-01T02:40:19.655743Z 90 Query SET autocommit=0
2019-11-01T02:40:19.680848Z 90 Query SELECT @@session.transaction_read_only
2019-11-01T02:40:19.681998Z 90 Query INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1'),('1var',1,'2019-11-01','1')
2019-11-01T02:40:19.700368Z 90 Query commit
2019-11-01T02:40:19.725935Z 90 Query rollback
2019-11-01T02:40:19.730085Z 90 Quit
The server did not prepare anything and executed a single multi-value INSERT with one Query command.
Case 4: set both rewriteBatchedStatements=true and useServerPrepStmts=true.
?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT&rewriteBatchedStatements=true&useServerPrepStmts=true
2019-11-01T02:42:25.799594Z 93 Query SHOW WARNINGS
2019-11-01T02:42:25.803008Z 93 Query SET NAMES utf8mb4
2019-11-01T02:42:25.803138Z 93 Query SET character_set_results = NULL
2019-11-01T02:42:25.803354Z 93 Query SET autocommit=1
2019-11-01T02:42:25.807759Z 93 Query SET autocommit=0
2019-11-01T02:42:25.822828Z 93 Prepare INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)
2019-11-01T02:42:25.826277Z 93 Query SELECT @@session.transaction_read_only
2019-11-01T02:42:25.827500Z 93 Prepare INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?),(?,?,?,?)
2019-11-01T02:42:25.832597Z 93 Execute INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1'),('1var',1,'2019-11-01','1')
2019-11-01T02:42:25.851448Z 93 Close stmt
2019-11-01T02:42:25.851605Z 93 Query commit
2019-11-01T02:42:25.880361Z 93 Query rollback
2019-11-01T02:42:25.884408Z 93 Quit
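The server ran Prepare twice (once for the original single-row INSERT and once for the rewritten two-row INSERT), executed a single multi-value INSERT with Execute, and then closed the statement (Close stmt). So in this setup the rewrite also applied to server-side prepared statements, despite the documentation wording quoted earlier.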
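Comparing the four logs mostly comes down to counting the command column (Query, Prepare, Execute, Close). The helper below is a hypothetical sketch that assumes every line follows the layout shown above (timestamp, thread id, command, argument) and reads the log text from standard input; the header lines at the top of a real general log file would need to be removed first. The general log itself can be switched on with `SET GLOBAL general_log = 'ON'`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts general-log commands, assuming
// "<timestamp> <thread id> <command> <argument>" on each line.
final class GeneralLogSummary {
    static Map<String, Integer> countCommands(String logText) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by command name
        for (String line : logText.split("\\R")) {
            String[] parts = line.trim().split("\\s+", 4);
            if (parts.length < 3) {
                continue; // blank or malformed line
            }
            counts.merge(parts[2], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String log = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(countCommands(log));
    }
}
```

Fed the six lines from Prepare to commit in case 4, it reports two Prepare, one Execute, one Close and two Query commands.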
4. How much faster do these two parameters make it?
For each of the four cases above I inserted 100,000 rows, repeated the run five times and recorded the elapsed time in milliseconds:
Case 1, neither parameter set: 10934, 8849, 8195, 11846, 9388
Case 2, useServerPrepStmts=true: 9837, 7045, 7553, 7195, 7931
Case 3, rewriteBatchedStatements=true: 2799, 1481, 1456, 1180, 1543
Case 4, both rewriteBatchedStatements=true and useServerPrepStmts=true: 4211, 1974, 1867, 2322, 2381
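Clearly, setting only rewriteBatchedStatements=true brings a large speed-up (about 1.7 s per run on average, versus about 9.8 s with neither parameter), while setting only useServerPrepStmts=true helps only a little (about 7.9 s). Interestingly, setting both (about 2.6 s) is somewhat slower than setting only rewriteBatchedStatements=true.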
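The averages quoted above can be recomputed directly from the five runs per case. This small sketch also reports the mean without the first run, since the first run of every case was noticeably slower (see the open question at the end).

```java
// Averages of the measured run times (ms) from the four cases above.
final class RunStats {
    // Mean of runs[skip..]; skip = 1 ignores the first run
    static double mean(long[] runs, int skip) {
        long sum = 0;
        for (int i = skip; i < runs.length; i++) {
            sum += runs[i];
        }
        return (double) sum / (runs.length - skip);
    }

    public static void main(String[] args) {
        String[] cases = {"neither", "useServerPrepStmts", "rewriteBatchedStatements", "both"};
        long[][] runs = {
            {10934, 8849, 8195, 11846, 9388},
            {9837, 7045, 7553, 7195, 7931},
            {2799, 1481, 1456, 1180, 1543},
            {4211, 1974, 1867, 2322, 2381},
        };
        for (int c = 0; c < cases.length; c++) {
            System.out.printf("%-25s mean %.1f ms, without first run %.1f ms%n",
                    cases[c], mean(runs[c], 0), mean(runs[c], 1));
        }
    }
}
```

The means come out at 9842.4, 7912.2, 1691.8 and 2551.0 ms (9569.5, 7431.0, 1415.0 and 2136.0 ms without the first run), so the ordering of the four cases does not depend on the slow first run.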
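Together with the general logs, this leads to the following conclusions:
1) When bulk-inserting, rewriteBatchedStatements=true lets the driver rewrite the SQL, merging many INSERT statements into one before handing it to the server, which greatly reduces execution time.
2) Setting only useServerPrepStmts=true improves performance slightly because of the Prepare-Execute mode: the INSERT is parsed once and each Execute only sends parameter values, whereas every Query carries and parses a complete SQL statement.
3) Setting both is slightly slower than setting only rewriteBatchedStatements=true because Prepare itself has a cost. As the case 4 log shows, the driver prepares both the original and the rewritten statement and then closes the latter; when the rewritten statement is executed only once, that overhead is relatively large.
5. Why is nothing executed if psts.addBatch(); is commented out?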
for (int i = 0; i < 100000; i++) {
    psts.setString(1, i + "var");
    psts.setInt(2, i);
    psts.setDate(3, new java.sql.Date(time));
    psts.setString(4, "1");
    // Add this parameter set to the batch
    psts.addBatch();
    count++;
}
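The driver source shows why. addBatch() in the ClientPreparedStatement class (Connector/J 8.0.18, decompiled):
public void addBatch() throws SQLException {
    try {
        synchronized(this.checkClosed().getConnectionMutex()) {
            QueryBindings<?> queryBindings = ((PreparedQuery)this.query).getQueryBindings();
            // Throws if any ? placeholder has not been bound
            queryBindings.checkAllParametersSet();
            // Store a copy of the current bindings in the batch
            this.query.addBatch(queryBindings.clone());
        }
    } catch (CJException var6) {
        throw SQLExceptionsMapping.translateException(var6, this.getExceptionInterceptor());
    }
}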
AbstractQuery class:
public void addBatch(Object batch) {
    if (this.batchedArgs == null) {
        this.batchedArgs = new ArrayList();
    }
    this.batchedArgs.add(batch); // batchedArgs is declared as: protected List<Object> batchedArgs;
}
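In short, psts.addBatch() only stores a copy of the current parameter values in the batch argument list (batchedArgs). When psts.executeBatch() is called, the driver builds the statements from that list and sends them to the server. If addBatch() is commented out, the list stays empty, executeBatch() has nothing to send, and no row is inserted.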
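The bookkeeping can be reproduced with a toy model. This is not Connector/J code and its class and method names are hypothetical; it only mirrors the two steps above: addBatch() checks that every parameter is set and saves a copy of the bindings, and executeBatch() works from the saved list, so without addBatch() there is nothing to execute.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the bookkeeping shown above; NOT Connector/J code.
// The class and method names are hypothetical.
final class ToyBatchStatement {
    private final Object[] bindings;    // current parameter values (role of QueryBindings)
    private List<Object[]> batchedArgs; // saved parameter sets (role of batchedArgs)

    ToyBatchStatement(int parameterCount) {
        this.bindings = new Object[parameterCount];
    }

    // 1-based index, like PreparedStatement.setXxx; in this toy, null means "not set"
    void set(int index, Object value) {
        bindings[index - 1] = value;
    }

    void addBatch() {
        for (Object value : bindings) {
            if (value == null) {
                throw new IllegalStateException("not all parameters set"); // like checkAllParametersSet()
            }
        }
        if (batchedArgs == null) {
            batchedArgs = new ArrayList<>();
        }
        batchedArgs.add(bindings.clone()); // copy, so later set() calls cannot change saved rows
    }

    // Returns the parameter sets that would be sent and clears the batch.
    // If addBatch() was never called, the result is empty: nothing is executed.
    List<Object[]> executeBatch() {
        List<Object[]> rows = (batchedArgs == null) ? new ArrayList<>() : batchedArgs;
        batchedArgs = null;
        return rows;
    }

    public static void main(String[] args) {
        ToyBatchStatement withBatch = new ToyBatchStatement(2);
        ToyBatchStatement withoutBatch = new ToyBatchStatement(2);
        for (int i = 0; i < 3; i++) {
            withBatch.set(1, i + "var");
            withBatch.set(2, i);
            withBatch.addBatch();
            withoutBatch.set(1, i + "var");
            withoutBatch.set(2, i);
            // addBatch() "commented out" for withoutBatch
        }
        System.out.println("rows with addBatch:    " + withBatch.executeBatch().size());
        System.out.println("rows without addBatch: " + withoutBatch.executeBatch().size());
    }
}
```

The clone() matters: without it every saved entry would point to the same array and all rows would end up with the last values set.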
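III. Summary
Inserting 100,000 rows takes only about 2 seconds mainly because the URL sets rewriteBatchedStatements=true.
IV. Open Question
During the performance tests, the first run of every case was always slower than the following runs, even though the MySQL general logs were identical. That is odd; if I find the time, I will investigate it in a separate post.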