Why Inserting 100,000 Rows Takes Only 2 Seconds

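I. Preface

    To compare the performance of different SQL statements on large data sets, I needed to bulk-insert several hundred thousand rows into a database. This is a very common requirement, so I expected ready-made code to be available online. After some searching I found the code below, which inserts 100,000 rows in about 2 seconds.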

package com.wave.checkin.wavecheckin.utils;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class MysqlBatchUtil {
    private String sql = "INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)";
    private String connectStr = "jdbc:mysql://localhost:3306/test";
    private String username = "root";
    private String password = "";

    private void doStore() throws ClassNotFoundException, SQLException {
        // Driver class name for Connector/J 8.x (com.mysql.jdbc.Driver is the deprecated older name)
        Class.forName("com.mysql.cj.jdbc.Driver");
        // useServerPrepStmts and rewriteBatchedStatements enable the fast batch insert;
        // remove them to run an ordinary batch insert
        String url = connectStr
                + "?useUnicode=true&characterEncoding=utf8&useServerPrepStmts=true&rewriteBatchedStatements=true&serverTimezone=GMT";
        // try-with-resources closes the statement and the connection even if an insert fails
        try (Connection conn = DriverManager.getConnection(url, username, password)) {
            // Commit manually instead of after every statement
            conn.setAutoCommit(false);
            try (PreparedStatement psts = conn.prepareStatement(sql)) {
                int count = 0;
                // Used both as the start time and as the value of create_date
                long time = System.currentTimeMillis();

                // i < 100000 inserts exactly 100,000 rows (the original i <= 100000 inserted 100,001)
                for (int i = 0; i < 100000; i++) {
                    psts.setString(1, i + "var");
                    psts.setInt(2, i);
                    psts.setDate(3, new java.sql.Date(time));
                    psts.setString(4, "1");
                    // Add the current parameter set to the batch
                    psts.addBatch();
                    count++;
                }
                // Execute the whole batch
                psts.executeBatch();
                // Commit the transaction
                conn.commit();
                System.out.println("count=" + count);
                System.out.println("elapsed time (ms)=" + (System.currentTimeMillis() - time));
            }
        }
    }

    public static void main(String[] args) {
        try {
            new MysqlBatchUtil().doStore();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

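    The code works, but the principle behind it still needs to be understood.

II. Questions

    To explore what happens behind the code, I asked myself a few questions:

1. What is useServerPrepStmts at the end of the URL?

    Searching the official MySQL documentation for useServerPrepStmts turns up the following passage: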

Two variants of prepared statements are implemented by Connector/J, the client-side and the server-side prepared statements. Client-side prepared statements are used by default because early MySQL versions did not support the prepared statement feature or had problems with its implementation. Server-side prepared statements and binary-encoded result sets are used when the server supports them. To enable usage of server-side prepared statements, set useServerPrepStmts=true.

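    In short, Connector/J (MySQL's JDBC driver) implements two kinds of prepared statements: client-side and server-side. Client-side prepared statements are the default, because early MySQL versions either did not support prepared statements or had problems with their implementation. To use server-side prepared statements, set useServerPrepStmts=true.
    So what is a server-side prepared statement? Let's keep digging.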

MySQL 8.0 provides support for server-side prepared statements. This support takes advantage of the efficient client/server binary protocol. Using prepared statements with placeholders for parameter values has the following benefits:
    1. Less overhead for parsing the statement each time it is executed. Typically, database applications process large volumes of almost-identical statements, with only changes to literal or variable values in clauses such as WHERE for queries and deletes, SET for updates, and VALUES for inserts.
    2. Protection against SQL injection attacks. The parameter values can contain unescaped SQL quote and delimiter characters.

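    In short, MySQL 8.0 supports server-side prepared statements. Such statements use placeholders for parameter values, which has two benefits:

  1. Less parsing overhead each time the statement is executed. For example, select * from user where state = ? is a prepared statement: it is parsed only once, and after that no further parsing is needed regardless of the value bound to ?, giving a "prepare once, execute many times" effect. An ordinary statement, by contrast, is sent as complete SQL text and has to be parsed on each execution.
  2. Protection against SQL injection attacks. Parameter values may contain unescaped SQL quote and delimiter characters.

    One more note: MySQL Server versions before 4.1 do not support prepared statements, and Connector/J versions from 5.0.5 onward do not enable server-side prepared statements by default.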

2. What is rewriteBatchedStatements at the end of the URL?

    Searching the official MySQL documentation for rewriteBatchedStatements turns up the following passage:

Should the driver use multiqueries (irregardless of the setting of “allowMultiQueries”) as well as rewriting of prepared statements for INSERT into multi-value inserts when executeBatch() is called? Notice that this has the potential for SQL injection if using plain java.sql.Statements and your code doesn’t sanitize input correctly. Notice that for prepared statements, server-side prepared statements can not currently take advantage of this rewrite option, and that if you don’t specify stream lengths when using PreparedStatement.set*Stream(), the driver won’t be able to determine the optimum number of parameters per batch and you might receive an error from the driver that the resultant packet is too large. Statement.getGeneratedKeys() for these rewritten statements only works when the entire batch includes INSERT statements. Please be aware using rewriteBatchedStatements=true with INSERT … ON DUPLICATE KEY UPDATE that for rewritten statement server returns only one value as sum of all affected (or found) rows in batch and it isn’t possible to map it correctly to initial statements; in this case driver returns 0 as a result of each batch statement if total count was 0, and the Statement.SUCCESS_NO_INFO as a result of each batch statement if total count was > 0.

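    In other words: when executeBatch() is called, should the driver use multi-queries (regardless of the allowMultiQueries setting) and rewrite prepared INSERT statements into multi-value inserts? Note that with plain java.sql.Statement objects this opens the door to SQL injection if your code does not sanitize its input. Note also that server-side prepared statements currently cannot take advantage of this rewrite option, and that if no stream length is specified when calling PreparedStatement.set*Stream(), the driver cannot determine the optimal number of parameters per batch, so you may receive an error from the driver saying that the resulting packet is too large. Statement.getGeneratedKeys() works for these rewritten statements only when the entire batch consists of INSERT statements. Finally, when rewriteBatchedStatements=true is used with INSERT ... ON DUPLICATE KEY UPDATE, the server returns a single value for the rewritten statement, the sum of all affected (or found) rows in the batch, which cannot be mapped back to the individual statements; in that case the driver returns 0 for every statement in the batch if the total is 0, and Statement.SUCCESS_NO_INFO for every statement if the total is greater than 0.
    From this passage we learn that rewriteBatchedStatements is not a server feature: the rewrite is done by the driver, on the client side. When the parameter is set to true and executeBatch() is called, the batched prepared statement is rewritten into a multi-value INSERT before being sent to the server, instead of being sent as one SQL statement per row.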
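    To make the idea of a "multi-value rewrite" concrete, here is a minimal sketch of the string transformation involved. It is not Connector/J's actual implementation: the class name BatchRewriteSketch and the method rewrite are hypothetical, and the sketch assumes a simple statement of the form INSERT ... VALUES (...) with an upper-case VALUES keyword.

```java
class BatchRewriteSketch {
    /**
     * Turns "INSERT ... VALUES (?,?)" into "INSERT ... VALUES (?,?),(?,?),..."
     * by repeating the value tuple once per batched row.
     * Assumption: the statement contains a single upper-case VALUES keyword
     * followed by exactly one parenthesised tuple.
     */
    static String rewrite(String insertSql, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int idx = insertSql.lastIndexOf("VALUES");
        if (idx < 0) {
            throw new IllegalArgumentException("not an INSERT ... VALUES statement");
        }
        String head = insertSql.substring(0, idx + "VALUES".length());
        String tuple = insertSql.substring(idx + "VALUES".length()).trim();

        StringBuilder sb = new StringBuilder(head).append(' ');
        for (int i = 0; i < batchSize; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(tuple);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)";
        // Two batched rows become one statement with two value tuples
        System.out.println(rewrite(sql, 2));
    }
}
```

    The real driver also has to bind the parameters of every row and keep each rewritten statement within the server's packet size limit, which this sketch leaves out.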

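3. How do these two parameters affect statement execution?

    We now know that rewriteBatchedStatements makes the driver rewrite batched statements on the client side, and that useServerPrepStmts enables server-side prepared statements. Time to verify this. I used MySQL 5.7 (which supports prepared statements) with Connector/J 8.0.18 (server-side prepared statements disabled by default).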
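    The four cases examined in this section differ only in the query string appended to the JDBC URL. As a small convenience sketch (the class UrlVariants and its method are hypothetical helpers, not part of the original test code), the four query strings can be generated like this:

```java
class UrlVariants {
    // Parameters shared by every test case in this section
    static final String BASE = "?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT";

    /** Builds the query string for one combination of the two parameters under test. */
    static String query(boolean rewriteBatched, boolean serverPrepStmts) {
        StringBuilder sb = new StringBuilder(BASE);
        if (rewriteBatched) {
            sb.append("&rewriteBatchedStatements=true");
        }
        if (serverPrepStmts) {
            sb.append("&useServerPrepStmts=true");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String jdbcUrl = "jdbc:mysql://localhost:3306/test";
        boolean[] flags = {false, true};
        for (boolean rewrite : flags) {
            for (boolean serverPrep : flags) {
                System.out.println(jdbcUrl + query(rewrite, serverPrep));
            }
        }
    }
}
```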
    Case 1: neither rewriteBatchedStatements nor useServerPrepStmts is set. Insert two rows in a loop and inspect the executed statements in the MySQL general query log.

?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT
2019-11-01T01:43:51.832520Z	   79 Query	SHOW WARNINGS
2019-11-01T01:43:51.836262Z	   79 Query	SET NAMES utf8mb4
2019-11-01T01:43:51.836420Z	   79 Query	SET character_set_results = NULL
2019-11-01T01:43:51.836653Z	   79 Query	SET autocommit=1
2019-11-01T01:43:51.841252Z	   79 Query	SET autocommit=0
2019-11-01T01:43:51.865821Z	   79 Query	SELECT @@session.transaction_read_only
2019-11-01T01:43:51.866303Z	   79 Query	INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1')
2019-11-01T01:43:51.891904Z	   79 Query	INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('1var',1,'2019-11-01','1')
2019-11-01T01:43:51.892136Z	   79 Query	commit
2019-11-01T01:43:51.977174Z	   79 Query	rollback
2019-11-01T01:43:51.981484Z	   79 Quit	

    As the log shows, the server did no preparing; the two INSERT statements were executed with the Query command.
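    The log contains complete SQL text with literal values because a client-side prepared statement substitutes the parameters into the statement before sending it. The following is a simplified sketch of that substitution, not the driver's real code: the class ClientSidePrepareSketch is hypothetical, the escaping rules are reduced to backslashes and single quotes, and a ? inside a string literal or comment is not handled.

```java
class ClientSidePrepareSketch {
    /** Renders one parameter as an SQL literal (simplified escaping, for illustration only). */
    static String quote(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof Number) {
            return value.toString();
        }
        String escaped = value.toString().replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }

    /** Replaces each ? placeholder, in order, with the literal form of the matching parameter. */
    static String interpolate(String sql, Object... params) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '?') {
                if (next >= params.length) {
                    throw new IllegalArgumentException("no value for parameter " + (next + 1));
                }
                out.append(quote(params[next++]));
            } else {
                out.append(c);
            }
        }
        if (next != params.length) {
            throw new IllegalArgumentException("more values than placeholders");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)";
        // Produces the same text as the first INSERT in the Case 1 log
        System.out.println(interpolate(sql, "0var", 0, "2019-11-01", "1"));
    }
}
```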
    Case 2: set useServerPrepStmts=true.

?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT&useServerPrepStmts=true
2019-11-01T02:23:46.145078Z	   89 Query	SHOW WARNINGS
2019-11-01T02:23:46.148454Z	   89 Query	SET NAMES utf8mb4
2019-11-01T02:23:46.148588Z	   89 Query	SET character_set_results = NULL
2019-11-01T02:23:46.148809Z	   89 Query	SET autocommit=1
2019-11-01T02:23:46.153224Z	   89 Query	SET autocommit=0
2019-11-01T02:23:46.166311Z	   89 Prepare	INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)
2019-11-01T02:23:46.171129Z	   89 Query	SELECT @@session.transaction_read_only
2019-11-01T02:23:46.171392Z	   89 Query	SELECT @@session.transaction_read_only
2019-11-01T02:23:46.175887Z	   89 Execute	INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1')
2019-11-01T02:23:46.204478Z	   89 Execute	INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('1var',1,'2019-11-01','1')
2019-11-01T02:23:46.204878Z	   89 Query	commit
2019-11-01T02:23:46.487619Z	   89 Query	rollback
2019-11-01T02:23:46.492423Z	   89 Quit	

    As the log shows, the server prepared the statement (Prepare) and then ran the two INSERT statements with the Execute command.
    Case 3: set rewriteBatchedStatements=true.

?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT&rewriteBatchedStatements=true
2019-11-01T02:40:19.644778Z	   90 Query	SHOW WARNINGS
2019-11-01T02:40:19.650422Z	   90 Query	SET NAMES utf8mb4
2019-11-01T02:40:19.650650Z	   90 Query	SET character_set_results = NULL
2019-11-01T02:40:19.650932Z	   90 Query	SET autocommit=1
2019-11-01T02:40:19.655743Z	   90 Query	SET autocommit=0
2019-11-01T02:40:19.680848Z	   90 Query	SELECT @@session.transaction_read_only
2019-11-01T02:40:19.681998Z	   90 Query	INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1'),('1var',1,'2019-11-01','1')
2019-11-01T02:40:19.700368Z	   90 Query	commit
2019-11-01T02:40:19.725935Z	   90 Query	rollback
2019-11-01T02:40:19.730085Z	   90 Quit	

    As the log shows, the server did no preparing; a single multi-value INSERT statement was executed with the Query command.
    Case 4: set both rewriteBatchedStatements=true and useServerPrepStmts=true.

?useUnicode=true&characterEncoding=utf8&serverTimezone=GMT&rewriteBatchedStatements=true&useServerPrepStmts=true
2019-11-01T02:42:25.799594Z	   93 Query	SHOW WARNINGS
2019-11-01T02:42:25.803008Z	   93 Query	SET NAMES utf8mb4
2019-11-01T02:42:25.803138Z	   93 Query	SET character_set_results = NULL
2019-11-01T02:42:25.803354Z	   93 Query	SET autocommit=1
2019-11-01T02:42:25.807759Z	   93 Query	SET autocommit=0
2019-11-01T02:42:25.822828Z	   93 Prepare	INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?)
2019-11-01T02:42:25.826277Z	   93 Query	SELECT @@session.transaction_read_only
2019-11-01T02:42:25.827500Z	   93 Prepare	INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES (?,?,?,?),(?,?,?,?)
2019-11-01T02:42:25.832597Z	   93 Execute	INSERT INTO query_data (`var_data`, `int_data`, `create_date`, `char_data`) VALUES ('0var',0,'2019-11-01','1'),('1var',1,'2019-11-01','1')
2019-11-01T02:42:25.851448Z	   93 Close stmt	
2019-11-01T02:42:25.851605Z	   93 Query	commit
2019-11-01T02:42:25.880361Z	   93 Query	rollback
2019-11-01T02:42:25.884408Z	   93 Quit	

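    As the log shows, the server prepared twice (Prepare), once for the single-row statement and once for the rewritten multi-value statement, and then ran one multi-value INSERT with the Execute command.

4. How much performance do these two parameters buy?

    I tested each of the four cases above, inserting 100,000 rows per run, running each case 5 times and recording the elapsed time in milliseconds. The results: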

Case 1: neither rewriteBatchedStatements nor useServerPrepStmts is set.
10934	8849	8195	11846	9388
Case 2: useServerPrepStmts=true.
9837	7045	7553	7195	7931
Case 3: rewriteBatchedStatements=true.
2799	1481	1456	1180	1543
Case 4: both rewriteBatchedStatements=true and useServerPrepStmts=true.
4211	1974	1867	2322	2381

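    Clearly, setting only rewriteBatchedStatements=true brings a substantial speed-up, while setting only useServerPrepStmts=true brings just a small one. Interestingly, setting both rewriteBatchedStatements=true and useServerPrepStmts=true is slightly slower than setting rewriteBatchedStatements=true alone.
    Combining these numbers with the general query log leads to the following conclusions:
    1) When bulk-inserting a large amount of data, rewriteBatchedStatements=true makes the driver rewrite the SQL, merging many statements into one before handing it to the server, which greatly reduces execution time.
    2) Setting only useServerPrepStmts=true improves performance slightly, because the Prepare-Execute mode is faster than issuing a separate Query for every row.
    3) Setting both parameters is slightly slower than setting rewriteBatchedStatements=true alone because Prepare itself has a cost, and when only a single statement needs to be executed that cost is relatively large.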
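    The comparison above can be put into numbers by averaging the five runs of each case. The sketch below does only that arithmetic; the class TimingSummary and its constant names are hypothetical, and the values are copied from the measurements listed earlier.

```java
import java.util.Arrays;

class TimingSummary {
    // Elapsed times in milliseconds for the five runs of each case
    static final long[] NEITHER = {10934, 8849, 8195, 11846, 9388};
    static final long[] SERVER_PREP_ONLY = {9837, 7045, 7553, 7195, 7931};
    static final long[] REWRITE_ONLY = {2799, 1481, 1456, 1180, 1543};
    static final long[] BOTH = {4211, 1974, 1867, 2322, 2381};

    /** Arithmetic mean of the runs; NaN for an empty array. */
    static double mean(long[] millis) {
        return Arrays.stream(millis).average().orElse(Double.NaN);
    }

    /** How many times faster the candidate is than the baseline, based on mean times. */
    static double speedup(long[] baseline, long[] candidate) {
        return mean(baseline) / mean(candidate);
    }

    public static void main(String[] args) {
        System.out.printf("neither:          mean %.1f ms%n", mean(NEITHER));
        System.out.printf("server prep only: mean %.1f ms, speedup %.2f%n",
                mean(SERVER_PREP_ONLY), speedup(NEITHER, SERVER_PREP_ONLY));
        System.out.printf("rewrite only:     mean %.1f ms, speedup %.2f%n",
                mean(REWRITE_ONLY), speedup(NEITHER, REWRITE_ONLY));
        System.out.printf("both:             mean %.1f ms, speedup %.2f%n",
                mean(BOTH), speedup(NEITHER, BOTH));
    }
}
```

    The means work out to roughly 9842 ms, 7912 ms, 1692 ms and 2551 ms, so rewriteBatchedStatements=true alone is about 5.8 times faster than the baseline, while useServerPrepStmts=true alone is about 1.2 times faster. With only five runs per case these figures are indicative rather than precise.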

5. If psts.addBatch() is commented out, the SQL is never executed. Why?

    for (int i = 0; i < 100000; i++) {
        psts.setString(1, i + "var");
        psts.setInt(2, i);
        psts.setDate(3, new java.sql.Date(time));
        psts.setString(4, "1");
        // Add the current parameter set to the batch
        psts.addBatch();
        count++;
    }

    public void addBatch() throws SQLException {
        try {
            synchronized(this.checkClosed().getConnectionMutex()) {
                QueryBindings<?> queryBindings = ((PreparedQuery)this.query).getQueryBindings();
                queryBindings.checkAllParametersSet();
                this.query.addBatch(queryBindings.clone()); // store a copy of the current parameter bindings
            }
        } catch (CJException var6) {
            throw SQLExceptionsMapping.translateException(var6, this.getExceptionInterceptor());
        }
    }

    AbstractQuery class:
    public void addBatch(Object batch) {
        if (this.batchedArgs == null) {
            this.batchedArgs = new ArrayList();
        }

        this.batchedArgs.add(batch); // batchedArgs is declared as: protected List<Object> batchedArgs;
    }

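    In short, psts.addBatch() stores the parameters of each INSERT in the batch parameter list, and psts.executeBatch() then sends that list to the server. If addBatch() is never called, the list stays empty, so executeBatch() has nothing to send and no SQL is executed.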
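    The behaviour described above can be reproduced with a toy model that needs no database. MiniBatchStatement below is a hypothetical class written for illustration; it imitates only the bookkeeping (a current parameter array plus a list of cloned snapshots), and its executeBatch() returns the parameter sets that would be sent rather than the int[] of update counts a real JDBC driver returns.

```java
import java.util.ArrayList;
import java.util.List;

class MiniBatchStatement {
    private final Object[] current;                         // parameters currently being bound
    private final List<Object[]> batchedArgs = new ArrayList<>();

    MiniBatchStatement(int parameterCount) {
        this.current = new Object[parameterCount];
    }

    /** Binds a value; the index is 1-based, as in JDBC. */
    void setObject(int parameterIndex, Object value) {
        current[parameterIndex - 1] = value;
    }

    /** Stores a copy of the current bindings, so later setObject calls do not overwrite it. */
    void addBatch() {
        batchedArgs.add(current.clone());
    }

    /** Returns the parameter sets that would be sent to the server and empties the batch. */
    List<Object[]> executeBatch() {
        List<Object[]> sent = new ArrayList<>(batchedArgs);
        batchedArgs.clear();
        return sent;
    }

    public static void main(String[] args) {
        MiniBatchStatement withoutAddBatch = new MiniBatchStatement(2);
        MiniBatchStatement withAddBatch = new MiniBatchStatement(2);
        for (int i = 0; i < 3; i++) {
            withoutAddBatch.setObject(1, i + "var");
            withoutAddBatch.setObject(2, i);
            // addBatch() deliberately omitted: the bindings are simply overwritten next round

            withAddBatch.setObject(1, i + "var");
            withAddBatch.setObject(2, i);
            withAddBatch.addBatch();
        }
        System.out.println(withoutAddBatch.executeBatch().size()); // prints 0
        System.out.println(withAddBatch.executeBatch().size());    // prints 3
    }
}
```

    The clone() call mirrors queryBindings.clone() in the driver source shown above: without a copy, every entry in the list would point at the same array and end up holding the values of the last row.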

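III. Summary

    Inserting 100,000 rows takes only about 2 seconds mainly because the URL sets rewriteBatchedStatements=true.

IV. Open question

    During the performance tests, the first run of every case was slower than the later runs, even though the MySQL general query log was exactly the same. That is odd. When I have time I will write another post to look into it.

V. References

Official MySQL documentation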
