Filtering data during load with lightdb COPY FROM ... WHERE and multi-row VALUES

COPY can be used to import and export data quickly. Its main uses are as follows:

  • The COPY command moves data between PostgreSQL tables and standard file system files.
  • COPY TO copies the contents of a table to a file.
  • COPY TO can also copy the results of a SELECT query; if a column list is specified, COPY TO copies only the data in the specified columns to the file.
  • The COPY command instructs the PostgreSQL server to read from or write to the file directly. Therefore, the file must be accessible to the PostgreSQL user.
  • COPY FROM copies the data from the file to the table.
  • When using the COPY FROM command, each field in the file is inserted sequentially to the specified column. Table columns not specified in the COPY FROM column list get their default values.
  • SELECT privilege must be granted on the table read by COPY TO, and INSERT privilege on the table into which values are inserted with COPY FROM.
  • COPY TO can only be used with tables, not views. However, to copy the contents of a view, we can feed the COPY command an SQL query.
  • Files named in the COPY command are read or written directly by the server, not by the client application. Therefore, they must be located on, or accessible from, the database server machine, not the client.
  • We shouldn’t confuse COPY with \copy in psql. \copy invokes COPY FROM STDIN or COPY TO STDOUT and then reads or writes a file accessible to the psql client. Therefore, file accessibility and access rights depend on the client rather than the server when \copy is used (see the sketch after this list).
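
The last two points can be illustrated with a short sketch (my_view and the output path are only placeholders): feeding COPY a query exports a view's contents, and the \copy form writes the file on the client machine instead of the server.

-- server-side: the file is written by the server process on the database server
copy (select * from my_view) to '/home/zjh/my_view.csv' with (FORMAT csv, HEADER true);
-- client-side equivalent in ltsql/psql: the file is written on the client machine
\copy (select * from my_view) to '/home/zjh/my_view.csv' with (FORMAT csv, HEADER true);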

In addition to importing data directly, COPY FROM can also include a WHERE clause to filter rows, as shown below:

zjh@postgres=# create table big_table(id int, v varchar(100));
CREATE TABLE

zjh@postgres=# insert into big_table select i, i || '-' || i from generate_series(1,100000) i;
INSERT 0 100000
zjh@postgres=# exit
[zjh@hs-10-20-30-193 ~]$ ltsql -p14000 postgres
ltsql (13.3-22.2)
Type "help" for help.

zjh@postgres=# \copy ( select * from big_table) to '/home/zjh/big_table.csv' delimiter '|' csv header;
COPY 100000
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
zjh@postgres=# \timing on
Timing is on.
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 44.725 ms
zjh@postgres=# 
zjh@postgres=# \timing on
Timing is on.
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 45.757 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true);
COPY 100000
Time: 41.660 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true);
COPY 100000
Time: 41.146 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 42.736 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 42.753 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 42.475 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v like '99%';
COPY 1111
Time: 28.995 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v not like '99%';
COPY 98889
Time: 43.442 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v not like '99%';
COPY 98889
Time: 40.352 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where lower(v) not like '99%';
COPY 98889
Time: 60.274 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where lower(v) not like '99%';   -- the choice of function has a large impact on performance
COPY 98889
Time: 61.503 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where substr(v,1,10) not like '99%';  -- the choice of function has a large impact on performance
COPY 98889
Time: 53.131 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where substr(v,1,10) not like '99%';
COPY 98889
Time: 49.655 ms
zjh@postgres=# 
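
The WHERE filter is not specific to \copy: the server-side COPY command accepts it as well (this LightDB release is based on PostgreSQL 13, and COPY ... WHERE has been available since PostgreSQL 12). A sketch, assuming the exported CSV is readable by the server process:

copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;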

pg_bulkload does not have this feature. Its FILTER function returns a NULL record when an input record does not meet the condition, instead of discarding the record, so it does not provide the same semantics as COPY ... WHERE.
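
For reference, a rough sketch of what a pg_bulkload FILTER function looks like for big_table (the function name and body are illustrative; see the pg_bulkload documentation linked below). The function is applied to every input record and must return a row of the target type; it can transform values or return NULL, but it cannot drop the record from the load:

-- returning NULL here produces a NULL record in the load; it does not skip the row
CREATE FUNCTION big_table_filter(i integer, s text) RETURNS big_table AS $$
  SELECT CASE WHEN i % 100 <> 99 THEN ROW(i, s)::big_table END;
$$ LANGUAGE sql;

-- referenced from the pg_bulkload control file with: FILTER = big_table_filter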

Compared with Oracle SQL*Loader, both COPY and pg_bulkload have significant shortcomings. Common examples include the following (a SQL*Loader control-file sketch follows the list):

  • field delimitation based on absolute positions (fixed-width fields)
  • pg_bulkload does not support filtering out records
  • neither pg_bulkload nor COPY supports applying functions to fields
  • REPLACE (load-and-replace) is not supported
  • COPY does not support features such as parallel/direct-path loading
  • specifying default values for particular columns is not supported
  • buffer size tuning, and so on
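
For comparison, a rough sketch of an Oracle SQL*Loader control file using several of these features (REPLACE mode, record filtering with WHEN, and a SQL function applied to a field); the exact syntax is in the Oracle documentation linked below:

LOAD DATA
INFILE '/home/zjh/big_table.csv'
REPLACE
INTO TABLE big_table
WHEN (1:2) <> '99'
FIELDS TERMINATED BY '|'
(id, v "lower(:v)")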

LightDB 22.4 will fully support the features above, provided through ltldr.

In the current release, this kind of filtering can also be achieved with a multi-row INSERT ... VALUES, as shown below:

insert into big_table select * from (VALUES (1, '1-1'), (2, '2-2'), (99, '99-99')) AS v (id, v) where id % 100 != 99;

Note that PG does not support more than 1000 bind variable values, which greatly limits the usefulness of multi-row INSERT ... VALUES; it can even be less effective than JDBC batch optimization.

In that case, consider using file_fdw together with NFS instead:
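
A minimal sketch, assuming the CSV exported above is visible to the database server through an NFS mount (the server and foreign table names are illustrative):

CREATE EXTENSION file_fdw;
CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE big_table_csv (id int, v varchar(100))
  SERVER csv_server
  OPTIONS (filename '/home/zjh/big_table.csv', format 'csv', delimiter '|', header 'true');
-- filter while loading, the same effect as COPY ... WHERE
insert into big_table select id, v from big_table_csv where id % 100 != 99;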

 

http://www.light-pg.com/docs/lightdb/13.3-22.2/file-fdw.html

http://www.light-pg.com/docs/lightdb/13.3-22.2/sql-values.html

https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

http://www.light-pg.com/docs/lightdb/13.3-22.2/sql-copy.html

https://docs.oracle.com/en/database/oracle/oracle-database/19/sutil/oracle-sql-loader-field-list-contents.html#GUID-83FF6EDC-C7F8-4F29-8994-59153BE31924 # applying SQL functions to fields

https://www.depesz.com/2019/02/03/waiting-for-postgresql-12-allow-copy-from-to-filter-data-using-where-conditions/

https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/moving-data-with-postgresql-copy-and-copy-commands/ba-p/1561266

 
