Vertica JDBC API: VerticaCopyStream

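The VerticaCopyStream class (details are available in the JDBC documentation) lets you stream data from a client system into a Vertica database. It lets you use the SQL COPY statement directly, without first copying the data to a host in the database cluster. Using the COPY command to load data from a host requires superuser privileges to access the host's file system. A COPY statement that loads data from a stream does not require superuser privileges, so your client can connect as any user account that has INSERT privileges on the target table.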

Use case: bulk-copying data from data files or an InputStream into Vertica. The COPY statement can specify the AUTO, DIRECT, or TRICKLE load method. See COPY Parameters for details.
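When the data is already in memory rather than in a file, it first has to be turned into an InputStream in the format the COPY statement expects. The following is a minimal sketch, assuming a '|' delimiter and newline-terminated rows (matching the COPY statement used in the example further down). The class and method names are hypothetical, and field values are assumed not to contain '|' or newlines, so no escaping is done.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Vertica JDBC driver.
class RowStreamSketch {

    // Joins each row's fields with '|' and ends every row with '\n'.
    // Assumption: values contain neither '|' nor line breaks.
    static String toDelimitedText(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(String.join("|", row)).append('\n');
        }
        return sb.toString();
    }

    // Wraps the delimited text in an InputStream (UTF-8 encoded).
    static InputStream toStream(List<String[]> rows) {
        byte[] bytes = toDelimitedText(rows).getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Smith", "Ann", "ann@example.com", "555-0100"},
                new String[] {"Lee", "Bo", "bo@example.com", "555-0101"});
        System.out.print(toDelimitedText(rows));
        // The stream returned by toStream(rows) is what you would hand to
        // VerticaCopyStream.addStream() instead of a FileInputStream.
    }
}
```

Building the whole batch in memory keeps the sketch short; for large batches a file or a piped stream would avoid holding everything in the heap.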

To copy streams into the database:

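1. Disable the database connection's AutoCommit connection parameter.
2. Instantiate a VerticaCopyStream object, passing it at least the database connection object and a string containing the COPY statement used to load the data. This statement must copy data from STDIN into a table. You can use any parameters that are appropriate for your data load.
3. Call VerticaCopyStreamObject.start() to start the COPY statement and begin streaming the data in any streams you have already added to the VerticaCopyStreamObject.
4. Call VerticaCopyStreamObject.addStream() to add additional streams to the list of streams to send to the database. You can then call VerticaCopyStreamObject.execute() to stream them to the server.
5. Optionally, call VerticaCopyStreamObject.getRejects() to get the list of rows rejected by the last .execute() call. The list of rejects is reset by each call to .execute() or .finish().
6. When you have finished adding streams, call VerticaCopyStreamObject.finish() to send any remaining streams to the database and close the COPY statement.
7. Call Connection.commit() to commit the loaded data.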

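The getRejects() method returns a List containing the row numbers of the rows that were rejected by the preceding .execute() call. Each call to .execute() clears the list of rejected rows, so you need to call .getRejects() after every call to .execute(). Because .start() and .finish() also call .execute() to send any pending streams to the server, you should call .getRejects() after those methods as well.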
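Because the reject list is cleared on every call, the caller has to accumulate it if it wants a running total or wants to stop a load once too many rows have been rejected. A minimal sketch of that bookkeeping follows. The class, its methods, and the threshold are hypothetical and not part of the Vertica JDBC driver; it only assumes that each getRejects() call hands back a List<Long> of row numbers, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that accumulates the lists returned by getRejects().
class RejectTrackerSketch {
    private final int maxRejects;              // assumed abort threshold
    private final List<String> log = new ArrayList<>();
    private int total = 0;

    RejectTrackerSketch(int maxRejects) {
        this.maxRejects = maxRejects;
    }

    // Records the row numbers reported after one start()/execute()/finish()
    // call. Returns true while the running total is within the threshold.
    boolean record(String batchLabel, List<Long> rejectedRows) {
        for (Long row : rejectedRows) {
            log.add(batchLabel + ": row " + row);
        }
        total += rejectedRows.size();
        return total <= maxRejects;
    }

    int total() {
        return total;
    }

    List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        RejectTrackerSketch tracker = new RejectTrackerSketch(3);
        // In a real load these lists would come from stream.getRejects().
        boolean keepGoing = tracker.record("customers-1.txt", Arrays.asList(2L, 7L));
        keepGoing = keepGoing && tracker.record("customers-2.txt", Arrays.asList(1L, 4L));
        System.out.println("Total rejects: " + tracker.total());
        System.out.println("Continue loading: " + keepGoing);
        tracker.log().forEach(System.out::println);
    }
}
```

In a load loop, record() would be called right after each execute(), and a false return would be the signal to stop adding streams.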

import java.io.File;
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;
import com.vertica.jdbc.VerticaConnection;
import com.vertica.jdbc.VerticaCopyStream;
 
public class CopyMultipleStreamsExample {
    public static void main(String[] args) {
        // Note: If running on Java 5, you need to call Class.forName
        // to manually load the JDBC driver.
        // Set up the properties of the connection
        Properties myProp = new Properties();
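        // This account drops and creates the demo table below, so it needs
        // the privileges for that. The COPY FROM STDIN load itself only
        // requires INSERT on the target table.
        myProp.put("user", "ExampleUser");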
        myProp.put("password", "password123");
        // When performing bulk loads, you should always disable the
        // connection's AutoCommit property to ensure the loads happen as
        // efficiently as possible by reusing the same COPY command and
        // transaction.
        myProp.put("AutoCommit", "false");
        Connection conn;
        try {
            conn = DriverManager.getConnection(
                          "jdbc:vertica://VerticaHost:5433/ExampleDB", myProp);
            Statement stmt = conn.createStatement();
            
            // Create a table to receive the data
            stmt.execute("DROP TABLE IF EXISTS customers");
            stmt.execute("CREATE TABLE customers (Last_Name char(50), "
                            + "First_Name char(50),Email char(50), "
                            + "Phone_Number char(15))");
            
            // Prepare the query to insert from a stream. This query must use
            // the COPY statement to load data from STDIN. Unlike copying from
            // a file on the host, you do not need superuser privileges to
            // copy a stream. All your user account needs is INSERT privileges
            // on the target table.
            String copyQuery = "COPY customers FROM STDIN "
                            + "DELIMITER '|' DIRECT ENFORCELENGTH";
            
            // Create an instance of the stream class. Pass in the
            // connection and the query string.
            VerticaCopyStream stream = new VerticaCopyStream(
                            (VerticaConnection) conn, copyQuery);
            
            // Keep running count of the number of rejects
            int totalRejects = 0;
            
            // start() starts the stream process, and opens the COPY command.
            stream.start();
            
            // If you added streams to VerticaCopyStream before calling start(),
            // You should check for rejects here (see below). The start() method
            // calls execute() to send any pre-queued streams to the server
            // once the COPY statement has been created.
            
            // Simple for loop to load 5 text files named customers-1.txt to
            // customers-5.txt
            for (int loadNum = 1; loadNum <= 5; loadNum++) {
                // Prepare the input file stream. Read from a local file.
                String filename = "C:\\Data\\customers-" + loadNum + ".txt";
                System.out.println("\n\nLoading file: " + filename);
                File inputFile = new File(filename);
                FileInputStream inputStream = new FileInputStream(inputFile);
                
                // Add stream to the VerticaCopyStream
                stream.addStream(inputStream);
                
                // call execute() to load the newly added stream. You could
                // add many streams and call execute once to load them all.
                // Which method you choose depends mainly on whether you want
                // the ability to check the number of rejections as the load
                // progresses so you can stop if the number of rejects gets too
                // high. Also, high numbers of InputStreams could create a
                // resource issue on your client system.
                stream.execute();
                
                // Show any rejects from this execution of the stream load
                // getRejects() returns a List containing the
                // row numbers of rejected rows.
                List<Long> rejects = stream.getRejects();
                
                // The size of the list gives you the number of rejected rows.
                int numRejects = rejects.size();
                totalRejects += numRejects;
                System.out.println("Number of rows rejected in load #"
                                + loadNum + ": " + numRejects);
                
                // List all of the rows that were rejected.
                Iterator<Long> rejit = rejects.iterator();
                long linecount = 0;
                while (rejit.hasNext()) {
                    System.out.print("Rejected row #" + ++linecount);
                    System.out.println(" is row " + rejit.next());
                }
            }
            // Finish closes the COPY command. It returns the number of
            // rows inserted.
            long results = stream.finish();
            System.out.println("Finish returned " + results);
            
            // If you added any streams that hadn't been executed(),
            // you should also check for rejects here, since finish()
            // calls execute() to send any remaining streams to the server.
            
            // You can also get the number of rows inserted using
            // getRowCount().
            System.out.println("Number of rows accepted: "
                            + stream.getRowCount());
            System.out.println("Total number of rows rejected: " + totalRejects);
            
            // Commit the loaded data
            conn.commit();
            
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

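The example above shows a simple load process that targets a single node in the Vertica cluster. Loading multiple streams into multiple database nodes in parallel is more efficient. Doing so can greatly improve performance, because it spreads the processing of the load across the whole cluster.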
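One way to prepare such a parallel load is to decide up front which input file goes to which node. The sketch below only does that planning step, distributing files round-robin across a list of hosts. The host names and the class are hypothetical. It assumes that each host's batch would then be loaded on its own thread, over its own connection (for example jdbc:vertica://<host>:5433/ExampleDB) with its own VerticaCopyStream, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical planning helper, not part of the Vertica JDBC driver.
class NodeAssignmentSketch {

    // Distributes files across hosts in round-robin order, keeping
    // the hosts in the order they were given.
    static Map<String, List<String>> assignRoundRobin(List<String> hosts,
                                                      List<String> files) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String host : hosts) {
            plan.put(host, new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            String host = hosts.get(i % hosts.size());
            plan.get(host).add(files.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("node01", "node02", "node03");
        List<String> files = new ArrayList<>();
        for (int n = 1; n <= 5; n++) {
            files.add("customers-" + n + ".txt");
        }
        // Each map entry is the batch one worker thread would load
        // through a connection opened against that host.
        assignRoundRobin(hosts, files).forEach(
                (host, batch) -> System.out.println(host + " <- " + batch));
    }
}
```

Round-robin is the simplest choice; if the files differ a lot in size, balancing by byte count would spread the work more evenly.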

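In a test with single-threaded database access, reading 5,000 rows from Oracle in one batch took 4 s, and loading them into a single node of the Vertica database took another 4 s.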

This article is for learning purposes only. If you have questions, leave a comment or send a private message.

Reference: Using VerticaCopyStream
