An attempt at updating data in ClickHouse

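1 Requirement

Suppose we want to copy a value from table B into table A for the rows that satisfy some condition. In MySQL, the statement might look like this: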

UPDATE A,B set A.field1=B.field1 where filter_expr;

2 A dataset

The data comes from the TPC-DS benchmark. For more on using the TPC tools, see TPC.md, a document I wrote on GitHub.

wget http://www.tpc.org/tpc_documents_current_versions/temporary_download_files/42d6f585-7c65-469c-b8de-9bfe47b63d81-tpc-ds-tool.zip
mv 42d6f585-7c65-469c-b8de-9bfe47b63d81-tpc-ds-tool.zip TPC-2.11.0.zip
unzip TPC-2.11.0.zip
cd v2.11.0rc2/
cd tools/
# Compile
make

# Generate a dataset at scale factor 10 (about 10 GB)
./dsdgen -DELIMITER ',' -scale 10 -parallel 2 -TERMINATE N -dir /opt/tmp/data

# Inspect inventory_1_2.dat
[root@cdh3 tools]# head -n 3 /opt/tmp/data/inventory_1_2.dat
2450815,1,1,211
2450815,2,1,235
2450815,4,1,859

# File size
[root@cdh3 tools]# du -hd1 /opt/tmp/data/inventory_1_2.dat
1.3G    /opt/tmp/data/inventory_1_2.dat

# Number of rows
[root@cdh3 tools]# wc -l /opt/tmp/data/inventory_1_2.dat
66555000 /opt/tmp/data/inventory_1_2.dat

3 Tables

3.1 Log in with the client

clickhouse-client -h 127.0.0.1 --port 19000 -u default --password KavrqeN1 --multiline

3.2 Create the tables

Create the ClickHouse tables based on the CREATE TABLE statements in the v2.11.0rc2/tools/tpcds.sql script.

-- Create table A
CREATE TABLE inventory(
inv_date_sk               UInt64 ,
inv_item_sk               UInt64 ,
inv_warehouse_sk          UInt64 ,
inv_quantity_on_hand      UInt64 
)ENGINE = MergeTree ORDER BY (inv_date_sk, inv_item_sk, inv_warehouse_sk);

-- Create table B
CREATE TABLE inventory2(
inv_date_sk               UInt64 ,
inv_item_sk               UInt64 ,
inv_warehouse_sk          UInt64 ,
inv_quantity_on_hand      UInt64 
)ENGINE = MergeTree ORDER BY (inv_date_sk, inv_item_sk, inv_warehouse_sk);

3.3 Import data

# Import the data into ClickHouse on node cdh2
clickhouse-client -h cdh2 --port 19000 -u default --password KavrqeN1 --query "INSERT INTO inventory FORMAT CSV" < /opt/tmp/data/inventory_1_2.dat

3.4 SQL

-- 1 Insert part of the data into inventory2, with the quantity replaced by rand()
cdh2 :) INSERT INTO inventory2 SELECT inv_date_sk, inv_item_sk, inv_warehouse_sk, rand() FROM  inventory WHERE inv_warehouse_sk in (1,2,3,4,5);
INSERT INTO inventory2 SELECT
    inv_date_sk,
    inv_item_sk,
    inv_warehouse_sk,
    rand()
FROM inventory
WHERE inv_warehouse_sk IN (1, 2, 3, 4, 5)
Ok.
0 rows in set. Elapsed: 9.417 sec. Processed 66.56 million rows, 1.07 GB (7.07 million rows/s., 113.30 MB/s.)

-- 2 Total row counts
--  2.1 inventory
cdh2 :) SELECT COUNT(1) FROM inventory;
┌─COUNT(1)─┐
│ 66555000 │
└──────────┘
1 rows in set. Elapsed: 0.052 sec. Processed 66.56 million rows, 532.44 MB (1.08 billion rows/s., 8.67 GB/s.)
--  2.2 inventory2
cdh2 :) SELECT COUNT(1) FROM inventory2;
┌─COUNT(1)─┐
│ 33405000 │
└──────────┘
1 rows in set. Elapsed: 0.025 sec. Processed 33.41 million rows, 267.24 MB (1.32 billion rows/s., 10.56 GB/s.)


-- 3 Column statistics. There are 10 warehouses and 68000 distinct items in total
cdh2 :) SELECT COUNT(DISTINCT inv_date_sk),COUNT(DISTINCT inv_item_sk),COUNT(DISTINCT inv_warehouse_sk) FROM inventory;
┌─uniqExact(inv_date_sk)─┬─uniqExact(inv_item_sk)─┬─uniqExact(inv_warehouse_sk)─┐
│                    131 │                  68000 │                          10 │
└────────────────────────┴────────────────────────┴─────────────────────────────┘
1 rows in set. Elapsed: 0.287 sec. Processed 66.56 million rows, 1.60 GB (195.57 million rows/s., 4.69 GB/s.)


-- 4 Row count per warehouse (inv_warehouse_sk). Warehouses (4,3,2,5,1) hold 33405000 rows in total and (6,7,9,8,10) hold 33150000, which adds up to the 66555000 imported rows
cdh2 :) SELECT inv_warehouse_sk,COUNT(inv_warehouse_sk) FROM inventory GROUP BY inv_warehouse_sk;
┌─inv_warehouse_sk─┬─COUNT(inv_warehouse_sk)─┐
│                4 │                 6681000 │
│                3 │                 6681000 │
│                2 │                 6681000 │
│                5 │                 6681000 │
│                1 │                 6681000 │
│                6 │                 6630000 │
│                7 │                 6630000 │
│                9 │                 6630000 │
│                8 │                 6630000 │
│               10 │                 6630000 │
└──────────────────┴─────────────────────────┘
10 rows in set. Elapsed: 0.076 sec. Processed 66.56 million rows, 532.44 MB (673.36 million rows/s., 5.39 GB/s.)
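
The totals quoted in step 4 can be checked with a few lines of Python. This is only an arithmetic sketch: the counts are copied from the query result above, and the variable and function names are illustrative.

```python
# Row counts per warehouse, copied from the step 4 query result.
rows_per_warehouse = {
    4: 6_681_000, 3: 6_681_000, 2: 6_681_000, 5: 6_681_000, 1: 6_681_000,
    6: 6_630_000, 7: 6_630_000, 9: 6_630_000, 8: 6_630_000, 10: 6_630_000,
}


def total_rows(warehouses):
    """Sum the row counts of the given warehouse ids."""
    return sum(rows_per_warehouse[w] for w in warehouses)


first_group = total_rows([1, 2, 3, 4, 5])    # rows that were copied into inventory2
second_group = total_rows([6, 7, 8, 9, 10])  # rows that only exist in inventory
print(first_group, second_group, first_group + second_group)
# prints: 33405000 33150000 66555000
```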

-- 5 Stock quantity per warehouse
cdh2 :) SELECT inv_warehouse_sk,SUM(inv_quantity_on_hand) FROM inventory GROUP BY inv_warehouse_sk;
┌─inv_warehouse_sk─┬─SUM(inv_quantity_on_hand)─┐
│                4 │                3172760518 │
│                3 │                3173305680 │
│                2 │                3173041915 │
│                5 │                3173462792 │
│                1 │                3172739142 │
│                6 │                3148272312 │
│                7 │                3148312176 │
│                9 │                3150290280 │
│                8 │                3149344378 │
│               10 │                3148388511 │
└──────────────────┴───────────────────────────┘
10 rows in set. Elapsed: 0.173 sec. Processed 66.56 million rows, 1.06 GB (384.14 million rows/s., 6.15 GB/s.)


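-- 6 Set the stock of warehouses (4,3,2,5,1), 33405000 rows in total, to 0.
--  ALTER TABLE ... UPDATE is a mutation: the statement returns immediately and the data is rewritten in the background.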
cdh2 :) ALTER TABLE inventory UPDATE inv_quantity_on_hand = 0 where inv_warehouse_sk in (4,3,2,5,1);
ALTER TABLE inventory
    UPDATE inv_quantity_on_hand = 0 WHERE inv_warehouse_sk IN (4, 3, 2, 5, 1)
Ok.
0 rows in set. Elapsed: 0.004 sec.

-- 7 Check the current stock per warehouse.
--  7.1 inventory. The stock of warehouses (4,3,2,5,1) is now 0.
cdh2 :) SELECT inv_warehouse_sk,SUM(inv_quantity_on_hand) FROM inventory GROUP BY inv_warehouse_sk;
┌─inv_warehouse_sk─┬─SUM(inv_quantity_on_hand)─┐
│                4 │                         0 │
│                3 │                         0 │
│                2 │                         0 │
│                5 │                         0 │
│                1 │                         0 │
│                6 │                3148272312 │
│                7 │                3148312176 │
│                9 │                3150290280 │
│                8 │                3149344378 │
│               10 │                3148388511 │
└──────────────────┴───────────────────────────┘
10 rows in set. Elapsed: 0.130 sec. Processed 66.56 million rows, 1.06 GB (513.45 million rows/s., 8.22 GB/s.)
--  7.2 inventory2
cdh2 :) SELECT inv_warehouse_sk,SUM(inv_quantity_on_hand) FROM inventory2 GROUP BY inv_warehouse_sk;
┌─inv_warehouse_sk─┬─SUM(inv_quantity_on_hand)─┐
│                4 │         14347686397994975 │
│                3 │         14343877924786742 │
│                2 │         14345396281859373 │
│                5 │         14345781573562921 │
│                1 │         14348098422679985 │
└──────────────────┴───────────────────────────┘
5 rows in set. Elapsed: 0.113 sec. Processed 33.41 million rows, 534.48 MB (295.11 million rows/s., 4.72 GB/s.)
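
The sums in inventory2 are around 1.43e16 because rand() in ClickHouse returns a UInt32. As a rough sanity check, the sketch below compares the observed sums with the expected sum of 6681000 such values. It assumes the values are independent and uniform over 0 to 2^32 - 1; the names are illustrative and not part of the original session.

```python
UINT32_MAX = 2**32 - 1
ROWS_PER_WAREHOUSE = 6_681_000  # from step 4, warehouses 1 to 5

# Sums observed in inventory2 (step 7.2), keyed by warehouse id.
observed = {
    4: 14_347_686_397_994_975,
    3: 14_343_877_924_786_742,
    2: 14_345_396_281_859_373,
    5: 14_345_781_573_562_921,
    1: 14_348_098_422_679_985,
}


def expected_sum(rows):
    """Expected sum of `rows` uniform UInt32 values (the mean is UINT32_MAX / 2)."""
    return rows * UINT32_MAX / 2


expected = expected_sum(ROWS_PER_WAREHOUSE)
for warehouse, total in sorted(observed.items()):
    deviation = (total - expected) / expected
    print(f"warehouse {warehouse}: relative deviation {deviation:+.2e}")
```

Under that assumption every observed sum should sit well within 0.1% of the expected value, which is what one would hope for from a uniform generator over this many rows.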


-- 8 Copy inventory2 into inventory (admittedly a heavy operation)
--  MySQL supports: update inventory A,inventory2 B set A.inv_quantity_on_hand=B.inv_quantity_on_hand where A.id=B.id;
--  ClickHouse's UPDATE cannot take the new value from a second table, so an INSERT is used here instead.
--  In MySQL an INSERT fails when the primary key already exists. A MergeTree table does not enforce unique keys, so the INSERT below appends rows.
cdh2 :) INSERT INTO inventory SELECT inv_date_sk, inv_item_sk, inv_warehouse_sk,inv_quantity_on_hand FROM  inventory2
:-]  WHERE inventory2.inv_warehouse_sk in (1,2,3,4,5);
Ok.
0 rows in set. Elapsed: 8.993 sec. Processed 33.41 million rows, 1.07 GB (3.71 million rows/s., 118.87 MB/s.)
--  Query again. The per-warehouse sums in inventory now equal those in inventory2.
--  The query processed 99.96 million rows (66.56 + 33.41 million), so the old rows are still there; the sums match only because step 6 set them to 0.
cdh2 :) SELECT inv_warehouse_sk,SUM(inv_quantity_on_hand) FROM inventory GROUP BY inv_warehouse_sk;
┌─inv_warehouse_sk─┬─SUM(inv_quantity_on_hand)─┐
│                4 │         14347686397994975 │
│                3 │         14343877924786742 │
│                2 │         14345396281859373 │
│                5 │         14345781573562921 │
│                1 │         14348098422679985 │
│                6 │                3148272312 │
│                7 │                3148312176 │
│                9 │                3150290280 │
│                8 │                3149344378 │
│               10 │                3148388511 │
└──────────────────┴───────────────────────────┘
10 rows in set. Elapsed: 0.242 sec. Processed 99.96 million rows, 1.60 GB (412.26 million rows/s., 6.60 GB/s.)
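
The last query processed 99.96 million rows rather than 66.56 million, which suggests that the INSERT appended the inventory2 rows instead of replacing existing ones. The toy model below reproduces that effect. It is plain Python, not ClickHouse code: the table is just a list of tuples, the rows are made up, and all names are illustrative.

```python
from collections import defaultdict


def sum_by_warehouse(rows):
    """Return {warehouse: total quantity} for rows of (date, item, warehouse, qty)."""
    totals = defaultdict(int)
    for _date, _item, warehouse, qty in rows:
        totals[warehouse] += qty
    return dict(totals)


# Table A after step 6: warehouse 1 was zeroed, warehouse 6 kept its value.
inventory = [(2450815, 1, 1, 0), (2450815, 2, 1, 0), (2450815, 1, 6, 859)]
# Table B holds new values for the same keys in warehouse 1.
inventory2 = [(2450815, 1, 1, 211), (2450815, 2, 1, 235)]

rows_before = len(inventory)
inventory.extend(inventory2)  # models an append-only INSERT ... SELECT

print(sum_by_warehouse(inventory))         # {1: 446, 6: 859}
print(rows_before, "->", len(inventory))   # 3 -> 5
```

In this model the sum for warehouse 1 equals the sum in table B only because the old rows contribute 0. Had they kept their original quantities, the sum would be old plus new, and any per-row query would see two rows for each key.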


-- 9 View version
--  9.1 Create a view. Note that table aliases (AS) could not be used in the JOIN of this statement
cdh2 :) CREATE VIEW inventory_view AS SELECT
:-] inventory.inv_date_sk,inventory.inv_item_sk,inventory.inv_warehouse_sk,inventory.inv_quantity_on_hand a,inventory2.inv_quantity_on_hand b
:-] FROM inventory LEFT join inventory2
:-] ON inventory.inv_date_sk = inventory2.inv_date_sk AND
:-]  inventory.inv_item_sk = inventory2.inv_item_sk AND
:-]  inventory.inv_warehouse_sk = inventory2.inv_warehouse_sk
:-] WHERE inventory.inv_warehouse_sk in (1,2,3,4,5)
:-] ;
CREATE VIEW inventory_view AS
SELECT
    inventory.inv_date_sk,
    inventory.inv_item_sk,
    inventory.inv_warehouse_sk,
    inventory.inv_quantity_on_hand AS a,
    inventory2.inv_quantity_on_hand AS b
FROM inventory
LEFT JOIN inventory2 ON (inventory.inv_date_sk = inventory2.inv_date_sk) AND (inventory.inv_item_sk = inventory2.inv_item_sk) AND (inventory.inv_warehouse_sk = inventory2.inv_warehouse_sk)
WHERE inventory.inv_warehouse_sk IN (1, 2, 3, 4, 5)
Ok.
0 rows in set. Elapsed: 0.006 sec.

--  9.2 List tables and views
cdh2 :) SHOW TABLES;
SHOW TABLES
┌─name───────────┐
│ inventory      │
│ inventory2     │
│ inventory_view │
│ ontime_local   │
└────────────────┘
4 rows in set. Elapsed: 0.002 sec.

--  9.3 Query the view
cdh2 :) SELECT * FROM inventory_view LIMIT 10;
SELECT *
FROM inventory_view
LIMIT 10
┌─inv_date_sk─┬─inv_item_sk─┬─inv_warehouse_sk─┬─a─┬──────────b─┐
│     2451221 │           1 │                1 │ 0 │ 3736098505 │
│     2451221 │           1 │                2 │ 0 │ 2885779993 │
│     2451221 │           1 │                3 │ 0 │  479103458 │
│     2451221 │           1 │                4 │ 0 │ 1752919932 │
│     2451221 │           1 │                5 │ 0 │ 3798676092 │
│     2451221 │           2 │                1 │ 0 │ 1596118095 │
│     2451221 │           2 │                2 │ 0 │ 3044174515 │
│     2451221 │           2 │                3 │ 0 │ 1792720993 │
│     2451221 │           2 │                4 │ 0 │  888870309 │
│     2451221 │           2 │                5 │ 0 │ 1904802464 │
└─────────────┴─────────────┴──────────────────┴───┴────────────┘
10 rows in set. Elapsed: 14.042 sec. Processed 34.02 million rows, 1.08 GB (2.42 million rows/s., 76.96 MB/s.)

--  9.4 UPDATE through the view. VIEW does not support mutations
cdh2 :) ALTER TABLE inventory_view UPDATE a=b
:-] WHERE inv_warehouse_sk in (1,2,3,4,5);
ALTER TABLE inventory_view
    UPDATE a = b WHERE inv_warehouse_sk IN (1, 2, 3, 4, 5)
Received exception from server (version 19.16.3):
Code: 48. DB::Exception: Received from 127.0.0.1:19000. DB::Exception: Mutations are not supported by storage View.
0 rows in set. Elapsed: 0.004 sec.

4 Differences

MySQL supports the following syntax for updating data:

-- The value of a column in table B can be copied into a column of table A
mysql> UPDATE inventory A,inventory2 B SET A.inv_quantity_on_hand=B.inv_quantity_on_hand
    -> where A.inv_warehouse_sk in (1,2,3,4,5) AND
    -> A.inv_date_sk = B.inv_date_sk AND
    -> A.inv_item_sk = B.inv_item_sk AND
    -> A.inv_warehouse_sk = B.inv_warehouse_sk ;
Query OK, 6 rows affected (0.00 sec)
Rows matched: 6  Changed: 6  Warnings: 0
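
Conceptually, this multi-table UPDATE matches rows of A and B on the three key columns and copies B's quantity into A. A minimal Python sketch of that semantics follows. It is an illustration with made-up rows and a hypothetical function name, not a description of how MySQL executes the statement.

```python
def update_from(table_a, table_b, warehouses):
    """Copy qty from table_b into matching rows of table_a, in place.

    Rows are dicts with the keys date, item, warehouse and qty. Returns the
    number of rows in table_a whose key was found in table_b.
    """
    lookup = {(r["date"], r["item"], r["warehouse"]): r["qty"] for r in table_b}
    matched = 0
    for row in table_a:
        key = (row["date"], row["item"], row["warehouse"])
        if row["warehouse"] in warehouses and key in lookup:
            row["qty"] = lookup[key]
            matched += 1
    return matched


a = [
    {"date": 2450815, "item": 1, "warehouse": 1, "qty": 0},
    {"date": 2450815, "item": 1, "warehouse": 6, "qty": 859},
]
b = [{"date": 2450815, "item": 1, "warehouse": 1, "qty": 211}]

matched = update_from(a, b, warehouses={1, 2, 3, 4, 5})
print(matched, a[0]["qty"], a[1]["qty"])  # 1 211 859
```

The number of rows in A stays the same here: existing rows are modified in place rather than new rows being added.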

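In MySQL, an INSERT of a row whose primary key already exists fails. ClickHouse accepts such an INSERT, but a MergeTree table does not enforce key uniqueness, so the new row is stored alongside the existing one instead of overwriting it.

ClickHouse's UPDATE syntax is shown below. TABLE can only be followed by a single table name. The statement can update one column or several (in one row or many, depending on the filter), but not columns that belong to the primary key.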

ALTER TABLE [db.]table UPDATE column1 = expr1 [, ...] WHERE filter_expr

The syntax for creating a view (VIEW) in ClickHouse is:

CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]table_name [TO[db.]name] [ENGINE = engine] [POPULATE] AS SELECT ...

In MySQL, data can be updated through a view as follows:

-- Create the view
mysql> CREATE VIEW inventory_view AS SELECT A.inv_date_sk,A.inv_item_sk,A.inv_warehouse_sk,A.inv_quantity_on_hand a,B.inv_quantity_on_hand b
    -> FROM inventory A LEFT join inventory2 B
    -> ON A.inv_date_sk = B.inv_date_sk AND
    -> A.inv_item_sk = B.inv_item_sk AND
    -> A.inv_warehouse_sk = B.inv_warehouse_sk
    -> -- WHERE A.inv_warehouse_sk in (1,2,3,4,5)
    -> ;
Query OK, 0 rows affected (0.01 sec)

-- Update the data
mysql> UPDATE inventory_view SET a=b WHERE inv_warehouse_sk in (1,2,3,4,5);

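5 Summary

ClickHouse supports UPDATE, INSERT, and VIEW, but the syntax differs considerably from that of traditional relational databases. For this requirement we can use neither UPDATE nor VIEW. INSERT can bring table B's data into table A, but it appends rows rather than replacing them by primary key, and its performance still falls short of the UPSERT offered by NoSQL databases. ClickHouse is therefore very fast for single-table queries, and single-table updates are quick as well, but it is a poor fit for multi-table joins or multi-table updates when speed matters.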
