With the rise of big data and the proliferation of data systems, data loading has become increasingly important; some roles even call for nothing but ETL skills. That said, data loading today is no longer just plain ETL: there is also ELT, and even computation that requires no data movement at all.
This article focuses on traditional ETL, introducing and comparing several data loading approaches.
Conventional data loading methods
At its core, data loading is simply inserting data into a target. What we usually care about is performance: in plain terms, how long the insert takes to complete, and how much space it consumes.
This article and the two that follow will introduce and demonstrate several common insert methods, including:
- INSERT SELECT with WITH (TABLOCK)
- Nonclustered columnstore indexes
- In-Memory combined with a clustered columnstore index
INSERT SELECT WITH (TABLOCK)
This approach enables parallel insert. The target table can have a clustered columnstore index, but it can also be a heap. Here we focus on the columnstore index because, as the previous article showed, columnstore indexes bring real benefits; and since I am currently on Azure SQL DB, which under the hood already carries SQL Server 2019 functionality, I will leave heaps aside for now.
This parallel capability was introduced in SQL Server 2016; SQL Server 2014 does not have it. In 2014, no matter how many cores are available, INSERT INTO a clustered columnstore index runs on a single core: it fills delta-stores sequentially, moving on to the next delta-store only once the first available one reaches the maximum row count (1,048,576 rows).
During an INSERT SELECT, no matter how efficient the SELECT part is, the INSERT remains the slowest part. This design, however, ensures the last delta-store is not completely filled, so more data can still be loaded into it later.
Starting with SQL Server 2016 (compatibility level 130 or above), adding WITH (TABLOCK) enables parallel insert. Depending on available CPU and memory resources, each CPU core gets its own delta-store, making the load considerably faster. In theory, if the disks are fast and large enough, performance scales with the number of cores compared with 2014. But as noted above, where 2014 leaves one trailing delta-store to tidy up, 2016 leaves N of them. That is the drawback, though it can be mitigated by certain measures, such as reorganizing the index.
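As a quick sketch of that mitigation (using the index and table names from the demo in this article), an index reorganize can compress those trailing delta-stores:

```sql
-- Force open/closed delta-store row groups to be compressed into
-- columnstore format, merging the small trailing row groups
ALTER INDEX PK_FactOnlineSales_CCI
ON dbo.FactOnlineSales_CCI
REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```

Unlike a rebuild, REORGANIZE is an online operation, so it is usually the cheaper way to clean up after a parallel load.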
The version differences above also depend on compatibility level: even on a SQL Server 2016 installation, a database with a compatibility level below 130 essentially behaves like a pre-2016 version.
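You can check a database's current compatibility level through the sys.databases catalog view (a minimal sketch):

```sql
-- compatibility_level must be 130 or above for parallel INSERT with TABLOCK
SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'ContosoRetailDW';
```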
Let's demonstrate with the ContosoRetailDW database. First drop the primary key and foreign key constraints on FactOnlineSales, then create a clustered columnstore index:
use ContosoRetailDW;
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCurrency]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCustomer]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimDate]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimProduct]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimPromotion]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimStore]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey];
create clustered index PK_FactOnlineSales on dbo.FactOnlineSales( OnlineSalesKey ) with ( maxdop = 1);
create clustered columnstore Index PK_FactOnlineSales on dbo.FactOnlineSales with( drop_existing = on, maxdop = 1 );
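To confirm the conversion succeeded, you can inspect the index metadata (a sketch using the sys.indexes catalog view):

```sql
-- type_desc should read CLUSTERED COLUMNSTORE after the conversion
SELECT i.name, i.type_desc
FROM sys.indexes i
WHERE i.object_id = OBJECT_ID('dbo.FactOnlineSales')
  AND i.name = 'PK_FactOnlineSales';
```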
Next, create a test table:
CREATE TABLE [dbo].[FactOnlineSales_CCI](
[OnlineSalesKey] [int] NOT NULL,
[StoreKey] [int] NOT NULL,
[ProductKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
INDEX PK_FactOnlineSales_CCI CLUSTERED COLUMNSTORE
);
Now load 10 million rows from FactOnlineSales into the test table above, this time without specifying WITH (TABLOCK), so parallel insert is not forced. I also enabled the actual execution plan and the SET STATISTICS IO and TIME options to capture execution details:
set statistics time, io on;
insert into [dbo].[FactOnlineSales_CCI] (OnlineSalesKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey)
select distinct top 10000000 OnlineSalesKey, store.StoreKey, sales.ProductKey, PromotionKey, CurrencyKey, CustomerKey
FROM [dbo].[FactOnlineSales] sales inner join dbo.DimProduct prod on sales.ProductKey = prod.ProductKey
inner join dbo.DimStore store on sales.StoreKey = store.StoreKey
where prod.ProductSubcategoryKey >= 10 and store.StoreManager >= 30
option (recompile);
SQL Server parse and compile time:
CPU time = 6020 ms, elapsed time = 6312 ms.
Table 'FactOnlineSales'. Scan count 4, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 12023, lob physical reads 40, lob page server reads 0, lob read-ahead reads 30079, lob page server read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 13, segment skipped 0.
Table 'DimStore'. Scan count 5, logical reads 67, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'DimProduct'. Scan count 5, logical reads 370, physical reads 1, page server reads 0, read-ahead reads 126, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'FactOnlineSales_CCI'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
In my environment the insert took roughly 71 seconds, and the execution plan shows it ran serially; the operator properties confirm that only one thread did the work. We can also estimate the number of row groups used: 10,000,000 / 1,048,576 ≈ 9.54, so about 10 row groups, the last one holding the leftover rows that don't fill a full group. The following query verifies this:
select *
from sys.column_store_row_groups
where object_schema_name(object_id) + '.' + object_name(object_id) = 'dbo.FactOnlineSales_CCI'
order by row_group_id asc;
Next, clear the test table and run the load again with WITH (TABLOCK). Note that WITH (TABLOCK) goes on the target table, not the source. Also, first raise the compatibility level to make sure the SQL Server 2016 features are available; in my environment this database was at the 2008 compatibility level:
USE [master]
GO
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 130
GO
set statistics time, io on;
insert into [dbo].[FactOnlineSales_CCI] with(TABLOCK) (OnlineSalesKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey)
select distinct top 10000000 OnlineSalesKey, store.StoreKey, sales.ProductKey, PromotionKey, CurrencyKey, CustomerKey
FROM [dbo].[FactOnlineSales] sales inner join dbo.DimProduct prod on sales.ProductKey = prod.ProductKey
inner join dbo.DimStore store on sales.StoreKey = store.StoreKey
where prod.ProductSubcategoryKey >= 10 and store.StoreManager >= 30
option (recompile);
This time the elapsed time dropped from 71 seconds to 40 seconds.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 2 ms, elapsed time = 2 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 132 ms, elapsed time = 132 ms.
Table 'FactOnlineSales'. Scan count 4, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 11237, lob physical reads 17, lob page server reads 0, lob read-ahead reads 26733, lob page server read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 13, segment skipped 0.
Table 'DimProduct'. Scan count 5, logical reads 370, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'DimStore'. Scan count 5, logical reads 67, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'FactOnlineSales_CCI'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
(10000000 rows affected)
(1 row affected)
SQL Server Execution Times:
CPU time = 101093 ms, elapsed time = 39771 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Now look at the actual execution plan. A reminder: parallelism can only be observed in the actual execution plan; the estimated plan does not show it. The figure below shows 4 threads were used, since my environment has only 4 cores, and Parallel is "True".
Check the row group information again: this time there are 12 row groups instead of the earlier 10. The last four entries are small row groups, spread one per thread, left for the later "trimming" step. As mentioned earlier, parallel insert leaves more trailing row groups to deal with; these four are the example here, and with more cores you would see correspondingly more of them. They do carry a cost, but if overall performance improves significantly, the overhead is acceptable.
select *
from sys.column_store_row_groups
where object_schema_name(object_id) + '.' + object_name(object_id) = 'dbo.FactOnlineSales_CCI'
order by row_group_id asc;
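To see why those trailing row groups ended up smaller than the 1,048,576-row maximum, you can also query the physical-stats DMV introduced in SQL Server 2016 (a sketch):

```sql
-- trim_reason_desc explains why a compressed row group holds fewer
-- than the maximum 1,048,576 rows (e.g. BULKLOAD, REORG, MEMORY_LIMITATION)
SELECT row_group_id, state_desc, total_rows, trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.FactOnlineSales_CCI')
ORDER BY row_group_id;
```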
The maximum degree of parallelism for the insert is the number of available cores minus one.
Summary
This article demonstrated that when loading a relatively large volume of data with an INSERT SELECT command, on SQL Server 2016 (that is, compatibility level 130 or above) you can enable parallel insert by adding WITH (TABLOCK).
Parallel insert can significantly improve performance, at the cost of leaving more row groups to trim; that can be handled with an index rebuild, which will be demonstrated later.
The next article will continue by demonstrating how nonclustered columnstore indexes improve data loading.