SQL Server Data-Loading Performance Comparison (1): Parallel Insert

  With the rise of big data and data-intensive systems, loading data has become increasingly important; some roles even call for nothing but ETL skills. That said, data loading today is no longer just ETL: there is also ELT, and even computation that requires no data movement at all.
  This article focuses on traditional ETL, introducing and comparing several data-loading approaches.

Conventional data-loading approaches

  Data loading essentially means inserting data into a particular destination. What we usually care about is performance, in other words how long the insert takes to complete, plus the space it consumes.
  This article and the two that follow will introduce and demonstrate several common insert techniques:

  1. INSERT SELECT with WITH (TABLOCK)
  2. Nonclustered columnstore indexes
  3. In-Memory OLTP combined with a clustered columnstore index

INSERT SELECT WITH (TABLOCK)

  This technique enables parallel inserts. The target table can be a clustered columnstore index, but it can also be a heap. Here we focus on the columnstore case: as the previous article showed, columnstore indexes bring real benefits, and since I am currently on Azure SQL DB, which under the hood already offers SQL Server 2019-level functionality, I will skip heaps for now.
  Parallel insert was introduced in SQL Server 2016; SQL Server 2014 does not have it. In 2014, no matter how many cores are available, INSERT INTO a clustered columnstore index runs on a single core: rows are appended sequentially to a delta-store until it reaches the maximum rowgroup size of 1,048,576 rows, and only then does the next delta-store start.
  In an INSERT SELECT, however efficient the SELECT part is, the INSERT will be the slowest part. This design does, however, ensure that the last delta-store is left only partially filled, ready to receive more data in subsequent loads.
  Starting with SQL Server 2016 (compatibility level 130 or higher), adding WITH (TABLOCK) enables parallel insert. Subject to available CPU and memory, each CPU core gets its own delta-store, which makes the load considerably faster. In theory, if the disks are fast and large enough, performance scales with the number of cores compared to 2014. But as noted above, where 2014 leaves one trailing delta-store to tidy up, 2016 leaves N of them. That is the downside, though it can be mitigated by, for example, reorganizing the index.

The version difference above also depends on the compatibility level: even on SQL Server 2016, a compatibility level below 130 effectively behaves like a pre-2016 version.
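The current compatibility level is easy to check before relying on the feature (a minimal sketch; the database name matches the demo below):

```sql
-- 130 or higher is required for parallel INSERT ... WITH (TABLOCK)
SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'ContosoRetailDW';
```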

  Let's demonstrate with the ContosoRetailDW database. First drop the primary key and foreign key constraints on FactOnlineSales, then create a clustered columnstore index:

use ContosoRetailDW;
 
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCurrency]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCustomer]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimDate]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimProduct]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimPromotion]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimStore] 
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey];
 
create clustered index PK_FactOnlineSales on dbo.FactOnlineSales( OnlineSalesKey ) with ( maxdop = 1);
 
create clustered columnstore Index PK_FactOnlineSales on dbo.FactOnlineSales with( drop_existing = on, maxdop = 1 );
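If your copy of ContosoRetailDW uses different constraint names, they can be listed first rather than hard-coded (a hedged sketch using the system catalog views):

```sql
-- List the foreign keys and primary key on the fact table before dropping them
SELECT name, type_desc
FROM sys.objects
WHERE parent_object_id = OBJECT_ID('dbo.FactOnlineSales')
  AND type IN ('F', 'PK');
```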

  Next, create a test table:

CREATE TABLE [dbo].[FactOnlineSales_CCI](
     [OnlineSalesKey] [int] NOT NULL,
     [StoreKey] [int] NOT NULL,
     [ProductKey] [int] NOT NULL,
     [PromotionKey] [int] NOT NULL,
     [CurrencyKey] [int] NOT NULL,
     [CustomerKey] [int] NOT NULL,
     INDEX PK_FactOnlineSales_CCI CLUSTERED COLUMNSTORE 
);

  Now load 10 million rows from FactOnlineSales into the test table, this time without specifying WITH (TABLOCK), so parallel insert is not used. I also enabled the actual execution plan plus SET STATISTICS IO and TIME to capture some execution details:

set statistics time, io on;
 
insert into [dbo].[FactOnlineSales_CCI]  (OnlineSalesKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey) 
 
select distinct top 10000000 OnlineSalesKey,  store.StoreKey, sales.ProductKey, PromotionKey, CurrencyKey, CustomerKey 
  FROM [dbo].[FactOnlineSales] sales inner join dbo.DimProduct prod on sales.ProductKey = prod.ProductKey
        inner join dbo.DimStore store on sales.StoreKey = store.StoreKey
  where prod.ProductSubcategoryKey >= 10 and store.StoreManager >= 30
  option (recompile);


SQL Server parse and compile time: 
   CPU time = 6020 ms, elapsed time = 6312 ms.
Table 'FactOnlineSales'. Scan count 4, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 12023, lob physical reads 40, lob page server reads 0, lob read-ahead reads 30079, lob page server read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 13, segment skipped 0.
Table 'DimStore'. Scan count 5, logical reads 67, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'DimProduct'. Scan count 5, logical reads 370, physical reads 1, page server reads 0, read-ahead reads 126, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'FactOnlineSales_CCI'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

  In my environment the insert took about 71 seconds, and the execution plan shows it ran serially: the operator properties show a single thread doing all the work. We can also work out that roughly 10 row groups were used (10,000,000 / 1,048,576 ≈ 9.5, i.e. 9 full row groups plus one trailing, partially filled row group). This can be verified with the following statement:

select *
	from sys.column_store_row_groups
	where object_schema_name(object_id) + '.' + object_name(object_id) = 'dbo.FactOnlineSales_CCI'
	order by row_group_id asc;
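As a sanity check on that count, the expected split can be computed directly (a minimal sketch; the numbers assume the 10,000,000-row load above):

```sql
-- 9 full row groups of 1,048,576 rows plus one trailing partial group = 10
SELECT 10000000 / 1048576 AS full_row_groups,        -- 9
       10000000 % 1048576 AS rows_in_trailing_group; -- 562816
```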


  Next, clear out the test table and run the load again with WITH (TABLOCK). Note that WITH (TABLOCK) goes on the target table, not the source. Also, first raise the compatibility level to make sure the SQL Server 2016 features are available; in my environment this database was still at the 2008 compatibility level:

USE [master]
GO
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 130
GO
set statistics time, io on;
 
insert into [dbo].[FactOnlineSales_CCI] with(TABLOCK) (OnlineSalesKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey) 
 
select distinct top 10000000 OnlineSalesKey,  store.StoreKey, sales.ProductKey, PromotionKey, CurrencyKey, CustomerKey 
  FROM [dbo].[FactOnlineSales] sales inner join dbo.DimProduct prod on sales.ProductKey = prod.ProductKey
        inner join dbo.DimStore store on sales.StoreKey = store.StoreKey
  where prod.ProductSubcategoryKey >= 10 and store.StoreManager >= 30
  option (recompile);

  This time the elapsed time dropped from 71 seconds to about 40 seconds.


SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 2 ms, elapsed time = 2 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 132 ms, elapsed time = 132 ms.
Table 'FactOnlineSales'. Scan count 4, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 11237, lob physical reads 17, lob page server reads 0, lob read-ahead reads 26733, lob page server read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 13, segment skipped 0.
Table 'DimProduct'. Scan count 5, logical reads 370, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'DimStore'. Scan count 5, logical reads 67, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'FactOnlineSales_CCI'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(10000000 rows affected)

(1 row affected)

 SQL Server Execution Times:
   CPU time = 101093 ms,  elapsed time = 39771 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

  Now look at the actual execution plan. A reminder: parallelism details are only visible in the actual execution plan, not the estimated one. The plan below shows 4 threads were used, since my environment has only 4 cores, and the Parallel property is "True".

  Checking the row group information again, there are now 12 row groups rather than the 10 we saw before. The last four are the partially filled trailing row groups left behind by the individual parallel streams, awaiting later "trimming". As mentioned earlier, parallel insert leaves more trailing row groups to deal with; the four here are an example, and with more cores you get correspondingly more of them. They do cost something, but if the overall load is clearly faster, that overhead is acceptable.

select *
	from sys.column_store_row_groups
	where object_schema_name(object_id) + '.' + object_name(object_id) = 'dbo.FactOnlineSales_CCI'
	order by row_group_id asc;
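The 12-row-group figure follows the same arithmetic as before, applied per stream (a sketch assuming 4 streams of roughly 2.5 million rows each):

```sql
-- Each of the 4 streams: 2 full row groups + 1 trailing partial group
SELECT 2500000 / 1048576 AS full_groups_per_stream,    -- 2
       2500000 % 1048576 AS trailing_rows_per_stream;  -- 402848
-- 4 streams x 3 row groups = 12 row groups in total, 4 of them partial
```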

  The maximum degree of parallelism for the insert is the number of available cores minus one.

Summary

  This article demonstrated that when loading a fairly large amount of data with INSERT SELECT, on SQL Server 2016 or later (compatibility level 130 or above) the insert can be parallelized by adding WITH (TABLOCK).
  Parallel insert can improve performance significantly, at the cost of more trailing row groups to trim; that cleanup can be done with an index rebuild, which a later article will demonstrate.
  The next article will continue the series by demonstrating the improvements nonclustered columnstore indexes bring to data loading.
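For reference, trailing row groups like the four above can typically be compressed with an index reorganize (a sketch; COMPRESS_ALL_ROW_GROUPS forces open and trailing delta-store row groups into compressed columnstore row groups):

```sql
-- Compress the partially filled trailing row groups left by the parallel load
ALTER INDEX PK_FactOnlineSales_CCI
    ON dbo.FactOnlineSales_CCI
    REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```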
