SQL Server Data-Loading Performance Comparison (1): Parallel Insert

  With the rise of big data and data-intensive systems, loading data has become increasingly important; some roles even call for nothing but ETL skills. That said, data loading today is no longer just ETL: there is also ELT, and even computation that requires no data movement at all.
  This article focuses on traditional ETL, introducing and comparing several data-loading approaches.

Conventional data-loading approaches

  Data loading essentially means inserting data into a particular destination. What we usually care about is performance, in other words how long the insert takes to complete, plus the space it consumes.
  This article and the two that follow will introduce and demonstrate several common insert techniques:

  1. INSERT SELECT with WITH (TABLOCK)
  2. Nonclustered columnstore indexes
  3. In-Memory OLTP combined with a clustered columnstore index

INSERT SELECT WITH (TABLOCK)

  This technique enables parallel inserts. The target table can be a clustered columnstore index, but it can also be a heap. Here we focus on the columnstore case: as the previous article showed, columnstore indexes bring real benefits, and since I am currently on Azure SQL DB, which under the hood already offers SQL Server 2019-level functionality, I will skip heaps for now.
  Parallel insert was introduced in SQL Server 2016; SQL Server 2014 does not have it. In 2014, no matter how many cores are available, INSERT INTO a clustered columnstore index runs on a single core: rows are appended sequentially to a delta-store until it reaches the maximum rowgroup size of 1,048,576 rows, and only then does the next delta-store start.
  In an INSERT SELECT, however efficient the SELECT part is, the INSERT will be the slowest part. This design does, however, ensure that the last delta-store is left only partially filled, ready to receive more data in subsequent loads.
  Starting with SQL Server 2016 (compatibility level 130 or higher), adding WITH (TABLOCK) enables parallel insert. Subject to available CPU and memory, each CPU core gets its own delta-store, which makes the load considerably faster. In theory, if the disks are fast and large enough, performance scales with the number of cores compared to 2014. But as noted above, where 2014 leaves one trailing delta-store to tidy up, 2016 leaves N of them. That is the downside, though it can be mitigated by, for example, reorganizing the index.

The version difference above also depends on the compatibility level: even on SQL Server 2016, a compatibility level below 130 effectively behaves like a pre-2016 version.
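The current compatibility level is easy to check before relying on the feature (a minimal sketch; the database name matches the demo below):

```sql
-- 130 or higher is required for parallel INSERT ... WITH (TABLOCK)
SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'ContosoRetailDW';
```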

  Let's demonstrate with the ContosoRetailDW database. First drop the primary key and foreign key constraints on FactOnlineSales, then create a clustered columnstore index:

use ContosoRetailDW;
 
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCurrency]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCustomer]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimDate]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimProduct]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimPromotion]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimStore] 
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey];
 
create clustered index PK_FactOnlineSales on dbo.FactOnlineSales( OnlineSalesKey ) with ( maxdop = 1);
 
create clustered columnstore Index PK_FactOnlineSales on dbo.FactOnlineSales with( drop_existing = on, maxdop = 1 );
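If your copy of ContosoRetailDW uses different constraint names, they can be listed first rather than hard-coded (a hedged sketch using the system catalog views):

```sql
-- List the foreign keys and primary key on the fact table before dropping them
SELECT name, type_desc
FROM sys.objects
WHERE parent_object_id = OBJECT_ID('dbo.FactOnlineSales')
  AND type IN ('F', 'PK');
```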

  Next, create a test table:

CREATE TABLE [dbo].[FactOnlineSales_CCI](
     [OnlineSalesKey] [int] NOT NULL,
     [StoreKey] [int] NOT NULL,
     [ProductKey] [int] NOT NULL,
     [PromotionKey] [int] NOT NULL,
     [CurrencyKey] [int] NOT NULL,
     [CustomerKey] [int] NOT NULL,
     INDEX PK_FactOnlineSales_CCI CLUSTERED COLUMNSTORE 
);

  Now load 10 million rows from FactOnlineSales into the test table, this time without specifying WITH (TABLOCK), so parallel insert is not used. I also enabled the actual execution plan plus SET STATISTICS IO and TIME to capture some execution details:

set statistics time, io on;
 
insert into [dbo].[FactOnlineSales_CCI]  (OnlineSalesKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey) 
 
select distinct top 10000000 OnlineSalesKey,  store.StoreKey, sales.ProductKey, PromotionKey, CurrencyKey, CustomerKey 
  FROM [dbo].[FactOnlineSales] sales inner join dbo.DimProduct prod on sales.ProductKey = prod.ProductKey
        inner join dbo.DimStore store on sales.StoreKey = store.StoreKey
  where prod.ProductSubcategoryKey >= 10 and store.StoreManager >= 30
  option (recompile);


SQL Server parse and compile time: 
   CPU time = 6020 ms, elapsed time = 6312 ms.
Table 'FactOnlineSales'. Scan count 4, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 12023, lob physical reads 40, lob page server reads 0, lob read-ahead reads 30079, lob page server read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 13, segment skipped 0.
Table 'DimStore'. Scan count 5, logical reads 67, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'DimProduct'. Scan count 5, logical reads 370, physical reads 1, page server reads 0, read-ahead reads 126, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'FactOnlineSales_CCI'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

  In my environment the insert took about 71 seconds, and the execution plan shows it ran serially: the operator properties show a single thread doing all the work. We can also work out that roughly 10 row groups were used (10,000,000 / 1,048,576 ≈ 9.5, i.e. 9 full row groups plus one trailing, partially filled row group). This can be verified with the following statement:

select *
	from sys.column_store_row_groups
	where object_schema_name(object_id) + '.' + object_name(object_id) = 'dbo.FactOnlineSales_CCI'
	order by row_group_id asc;
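As a sanity check on that count, the expected split can be computed directly (a minimal sketch; the numbers assume the 10,000,000-row load above):

```sql
-- 9 full row groups of 1,048,576 rows plus one trailing partial group = 10
SELECT 10000000 / 1048576 AS full_row_groups,        -- 9
       10000000 % 1048576 AS rows_in_trailing_group; -- 562816
```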


  Next, clear out the test table and run the load again with WITH (TABLOCK). Note that WITH (TABLOCK) goes on the target table, not the source. Also, first raise the compatibility level to make sure the SQL Server 2016 features are available; in my environment this database was still at the 2008 compatibility level:

USE [master]
GO
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 130
GO
set statistics time, io on;
 
insert into [dbo].[FactOnlineSales_CCI] with(TABLOCK) (OnlineSalesKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey) 
 
select distinct top 10000000 OnlineSalesKey,  store.StoreKey, sales.ProductKey, PromotionKey, CurrencyKey, CustomerKey 
  FROM [dbo].[FactOnlineSales] sales inner join dbo.DimProduct prod on sales.ProductKey = prod.ProductKey
        inner join dbo.DimStore store on sales.StoreKey = store.StoreKey
  where prod.ProductSubcategoryKey >= 10 and store.StoreManager >= 30
  option (recompile);

  This time the elapsed time dropped from 71 seconds to about 40 seconds.


SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 2 ms, elapsed time = 2 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 132 ms, elapsed time = 132 ms.
Table 'FactOnlineSales'. Scan count 4, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 11237, lob physical reads 17, lob page server reads 0, lob read-ahead reads 26733, lob page server read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 13, segment skipped 0.
Table 'DimProduct'. Scan count 5, logical reads 370, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'DimStore'. Scan count 5, logical reads 67, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'FactOnlineSales_CCI'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(10000000 rows affected)

(1 row affected)

 SQL Server Execution Times:
   CPU time = 101093 ms,  elapsed time = 39771 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

  Now look at the actual execution plan. A reminder: parallelism details are only visible in the actual execution plan, not the estimated one. The plan below shows 4 threads were used, since my environment has only 4 cores, and the Parallel property is "True".

  Checking the row group information again, there are now 12 row groups rather than the 10 we saw before. The last four are the partially filled trailing row groups left behind by the individual parallel streams, awaiting later "trimming". As mentioned earlier, parallel insert leaves more trailing row groups to deal with; the four here are an example, and with more cores you get correspondingly more of them. They do cost something, but if the overall load is clearly faster, that overhead is acceptable.

select *
	from sys.column_store_row_groups
	where object_schema_name(object_id) + '.' + object_name(object_id) = 'dbo.FactOnlineSales_CCI'
	order by row_group_id asc;
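The 12-row-group figure follows the same arithmetic as before, applied per stream (a sketch assuming 4 streams of roughly 2.5 million rows each):

```sql
-- Each of the 4 streams: 2 full row groups + 1 trailing partial group
SELECT 2500000 / 1048576 AS full_groups_per_stream,    -- 2
       2500000 % 1048576 AS trailing_rows_per_stream;  -- 402848
-- 4 streams x 3 row groups = 12 row groups in total, 4 of them partial
```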

  The maximum degree of parallelism for the insert is the number of available cores minus one.

Summary

  This article demonstrated that when loading a fairly large amount of data with INSERT SELECT, on SQL Server 2016 or later (compatibility level 130 or above) the insert can be parallelized by adding WITH (TABLOCK).
  Parallel insert can improve performance significantly, at the cost of more trailing row groups to trim; that cleanup can be done with an index rebuild, which a later article will demonstrate.
  The next article will continue the series by demonstrating the improvements nonclustered columnstore indexes bring to data loading.
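For reference, trailing row groups like the four above can typically be compressed with an index reorganize (a sketch; COMPRESS_ALL_ROW_GROUPS forces open and trailing delta-store row groups into compressed columnstore row groups):

```sql
-- Compress the partially filled trailing row groups left by the parallel load
ALTER INDEX PK_FactOnlineSales_CCI
    ON dbo.FactOnlineSales_CCI
    REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```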
