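This post continues from SQL Server Columnstore Index Performance Summary (9): Memory Required to Rebuild and Reorganize a Clustered Columnstore Index. We know that for best performance a row group should ideally hold the maximum of 1,048,576 rows, or at least no fewer than roughly 100,000 rows. If row groups cannot reach a good size, it becomes hard to benefit from columnstore when reading large amounts of data.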

Preface

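The two most important concepts in a columnstore index are row groups and segments: a row group is a horizontal slice of rows, and a segment holds the values of a single column within one row group. Whether a segment stores 1 row or 1 million rows, it is read page by page or extent by extent, so segments with very few rows waste I/O.
If a filter is applied to an unsorted column, many extra segments may have to be read, because segment elimination works best when the data is sorted; unsorted values can be spread across many segments. The more pages are read, the worse the final performance. For small tables (millions or tens of millions of rows) it is of course hard to read only one segment, and impossible to read only one page or extent, but at that scale the impact is minor. For tables with billions of rows, however, unnecessary segments become a performance killer.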
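
To see how widely the values of a filtered column are spread across segments, the segment metadata can be inspected once the columnstore index from the setup below exists. The following is a minimal sketch, not part of the original walkthrough: it assumes ProductKey is the column of interest and that the segment column_id matches the table's column_id, which holds for a clustered columnstore index on a table with no dropped columns. Note that min_data_id and max_data_id can be encoded values rather than raw column values, depending on the segment encoding.

```sql
-- Sketch: list the value range recorded for each segment of one column.
-- Assumption: ProductKey is the column used in the filter or join.
SELECT OBJECT_NAME(p.object_id) AS TableName,
       seg.segment_id,
       seg.row_count,
       seg.min_data_id,   -- lowest (possibly encoded) value in the segment
       seg.max_data_id    -- highest (possibly encoded) value in the segment
FROM sys.column_store_segments AS seg
    INNER JOIN sys.partitions AS p
        ON seg.hobt_id = p.hobt_id
    INNER JOIN sys.columns AS c
        ON c.object_id = p.object_id
       AND c.column_id = seg.column_id
WHERE p.object_id = OBJECT_ID(N'dbo.FactOnlineSales')
  AND c.name = N'ProductKey'
ORDER BY seg.segment_id;
```

If most segments report nearly the same minimum and maximum, a filter on that column is unlikely to skip many of them.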

Environment Setup

We continue to use ContosoRetailDW for the demonstration and set the compatibility level to 150, that is, SQL Server 2019 behavior:


USE [master]
GO
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 150
GO

-- Create the clustered columnstore index:
create clustered columnstore Index CCI  on dbo.FactOnlineSales;


select * into dbo.FactOnlineSales_SmallGroups_Test from dbo.FactOnlineSales;

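Pay attention to the next trick: I lower SQL Server's Max Server Memory to, say, 300 MB in order to force row groups to be created with only a small number of rows. Do this only in your own lab environment; 300 MB of memory would make any enterprise system slow or even unresponsive.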

EXEC sys.sp_configure N'show advanced options', N'1'  RECONFIGURE WITH OVERRIDE
GO
EXEC sys.sp_configure N'max server memory (MB)', N'300'
GO
RECONFIGURE WITH OVERRIDE
GO
EXEC sys.sp_configure N'show advanced options', N'0'  RECONFIGURE WITH OVERRIDE
GO

Next, create a clustered columnstore index on the test table. Because of the memory limit this takes a while, roughly 3 minutes:

create clustered columnstore index CCI on dbo.FactOnlineSales_SmallGroups_test;

Then compare the space used:

exec sp_spaceused '[dbo].[FactOnlineSales]';
exec sp_spaceused '[dbo].[FactOnlineSales_SmallGroups_test]';

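The two differ, but not dramatically: the source table takes 163 MB and the test table 189 MB. Once the number of row groups becomes very large, however, this gap becomes significant. Let's look more closely at the row groups of the two tables: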

SELECT object_name(i.object_id) as TableName, count(*) as RowGroupsCount
	FROM sys.indexes AS i
	INNER JOIN sys.column_store_row_groups AS rg with(nolock)
		ON i.object_id = rg.object_id
	AND i.index_id = rg.index_id 
	WHERE object_name(i.object_id) in ( 'FactOnlineSales','FactOnlineSales_SmallGroups_test')
	group by object_name(i.object_id)
	ORDER BY object_name(i.object_id);
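
Beyond the raw count, it can help to see how many rows each row group holds and why it was closed early. The sketch below is an addition to the original walkthrough and assumes SQL Server 2016 or later, where sys.dm_db_column_store_row_group_physical_stats is available. With the memory cap in place one would expect the test table to report a trim reason such as MEMORY_LIMITATION, but the actual values depend on the environment.

```sql
-- Sketch: row group size distribution and trim reason per table.
SELECT OBJECT_NAME(rg.object_id) AS TableName,
       rg.trim_reason_desc,
       COUNT(*)           AS RowGroupsCount,
       MIN(rg.total_rows) AS MinRows,
       AVG(rg.total_rows) AS AvgRows,
       MAX(rg.total_rows) AS MaxRows
FROM sys.dm_db_column_store_row_group_physical_stats AS rg
WHERE rg.object_id IN (OBJECT_ID(N'dbo.FactOnlineSales'),
                       OBJECT_ID(N'dbo.FactOnlineSales_SmallGroups_test'))
  AND rg.state_desc = N'COMPRESSED'   -- ignore open or closed delta stores
GROUP BY rg.object_id, rg.trim_reason_desc
ORDER BY TableName, rg.trim_reason_desc;
```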

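The count query shows a large difference in the number of row groups: the test table has 79 row groups while the source table has only 15, roughly five times as many. Next, look at how the queries behave (with the actual execution plan enabled):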

dbcc freeproccache;
dbcc dropcleanbuffers;
set statistics io on
set statistics time on

select prod.ProductName, sum(sales.SalesAmount)
	from dbo.FactOnlineSales sales
		inner join dbo.DimProduct prod
			on sales.ProductKey = prod.ProductKey
		inner join dbo.DimCurrency cur
			on sales.CurrencyKey = cur.CurrencyKey
		inner join dbo.DimPromotion prom
			on sales.PromotionKey = prom.PromotionKey
	where cur.CurrencyName = 'USD' and prom.EndDate >= '2004-01-01' 
	group by prod.ProductName;
-- Clear the caches so the first query does not influence the second
dbcc freeproccache;
dbcc dropcleanbuffers;

select prod.ProductName, sum(sales.SalesAmount)
	from dbo.FactOnlineSales_SmallGroups_test sales
		inner join dbo.DimProduct prod
			on sales.ProductKey = prod.ProductKey
		inner join dbo.DimCurrency cur
			on sales.CurrencyKey = cur.CurrencyKey
		inner join dbo.DimPromotion prom
			on sales.PromotionKey = prom.PromotionKey
	where cur.CurrencyName = 'USD' and prom.EndDate >= '2004-01-01' 
	group by prod.ProductName;

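The execution plans show no obvious difference; each query accounts for 50% of the batch cost.
According to the statistics output below, the query on the source table used less CPU time than the query on the test table (756 ms versus 931 ms) but took longer to finish (elapsed time 618 ms versus 322 ms). STATISTICS IO shows a moderate difference in reads as well: adding lob logical reads and lob read-ahead reads gives 19,640 for the source table and 22,808 for the test table. Adding CPU time and elapsed time comes to roughly 1,370 ms for the source table with 15 row groups and about 1,250 ms for the test table with 79 row groups, so overall there is no dramatic difference.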



(2516 rows affected)
Table 'FactOnlineSales'. Scan count 4, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 6420, lob physical reads 33, lob page server reads 0, lob read-ahead reads 13220, lob page server read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 15, segment skipped 0.
Table 'DimProduct'. Scan count 5, logical reads 370, physical reads 1, page server reads 0, read-ahead reads 123, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'DimPromotion'. Scan count 5, logical reads 4, physical reads 1, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'DimCurrency'. Scan count 5, logical reads 4, physical reads 1, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 756 ms,  elapsed time = 618 ms.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


(2516 rows affected)
Table 'FactOnlineSales_SmallGroups_Test'. Scan count 4, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 8609, lob physical reads 158, lob page server reads 0, lob read-ahead reads 14199, lob page server read-ahead reads 0.
Table 'FactOnlineSales_SmallGroups_Test'. Segment reads 75, segment skipped 0.
Table 'DimProduct'. Scan count 5, logical reads 370, physical reads 1, page server reads 0, read-ahead reads 126, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'DimCurrency'. Scan count 5, logical reads 4, physical reads 1, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'DimPromotion'. Scan count 5, logical reads 4, physical reads 1, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 931 ms,  elapsed time = 322 ms.

Next, use the following query for further analysis:

SELECT i.name, object_name(p.object_id) tablename, p.index_id, i.type_desc 
   	,sum(p.rows)/count(seg.segment_id) as 'rows'
	,sum(seg.on_disk_size) as 'size in Bytes'
	,cast( sum(seg.on_disk_size) / 1024. / 1024. / 1024 as decimal(8,3)) as 'size in GB'
	,count(distinct seg.segment_id) as 'Segments'
	,count(distinct p.partition_id) as 'Partitions'
	FROM sys.column_store_segments AS seg 
		INNER JOIN sys.partitions AS p 
			ON seg.hobt_id = p.hobt_id 
		INNER JOIN sys.indexes AS i 
			ON p.object_id = i.object_id
			AND p.index_id = i.index_id
	WHERE i.type in (5, 6)
	GROUP BY i.name, p.object_id, p.index_id, i.type_desc;

The results show that the number of segments does not have much effect on the overall size, thanks to efficient columnar compression.

Next, the dictionaries:

select 
	OBJECT_NAME(t.object_id) as 'Table Name',
	sum(dict.on_disk_size)/1024./1024 as DictionarySizeMB
	from sys.column_store_dictionaries dict
	inner join sys.partitions as p 
		ON dict.partition_id = p.partition_id
	inner join sys.tables t
		ON t.object_id = p.object_id
	inner join sys.indexes i
		ON i.object_id = t.object_id
		AND i.index_id = p.index_id
	where i.type in (5,6) -- clustered and nonclustered columnstore
	group by t.object_id;

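At the dictionary level, the test table's dictionaries take up more space. Checking the number and type of dictionaries for each column gives the following picture: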

select t.name as 'Table Name'
	,dict.column_id
	,col.name
	,tp.name
	,case dict.dictionary_id
		when 0 then 'Global Dictionary'
		else 'Local Dictionary'
	end as 'Dictionary Type'
	,count(dict.type) as 'Count'
	,sum(dict.on_disk_size) as 'Size in Bytes'
	,cast(sum(dict.on_disk_size) / 1024.0 / 1024 as Decimal(16,3)) as 'Size in MBytes'
	from sys.column_store_dictionaries dict
	inner join sys.partitions as p 
		ON dict.partition_id = p.partition_id
	inner join sys.tables t
		ON t.object_id = p.object_id
	inner join sys.all_columns col
		on col.column_id = dict.column_id and col.object_id = t.object_id
	inner join sys.types tp 
		ON col.system_type_id = tp.system_type_id AND col.user_type_id = tp.user_type_id   
	where t.[is_ms_shipped] = 0 
		and col.name in ('SalesAmount','ProductKey','CurrencyKey','PromotionKey')
	group by t.name,
			 case dict.dictionary_id
				when 0 then 'Global Dictionary'
				else 'Local Dictionary'
			 end, 
			 col.name,
			 tp.name,
			 dict.column_id
	order by dict.column_id, t.name;

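Comparing the sizes, the gap between the two tables is actually quite large, especially for the local dictionaries, where the difference is close to tenfold.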

Summary

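The results above show that small and large row groups each have the advantage on certain metrics, so we cannot generalize. As always, analyze each case on its own terms.
To remove this kind of difference in row group counts, simply rebuild the clustered columnstore index. Evidently Microsoft does want you to use large row groups: a rebuild is a routine maintenance operation, and once it succeeds with enough memory available, the row groups return to a comparable level.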
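
As a minimal sketch of that rebuild (not shown in the original post), the statements below rebuild the index on the test table and count its row groups again. It assumes Max Server Memory has already been raised above the 300 MB used for the experiment; otherwise the rebuild may well produce small row groups again.

```sql
-- Sketch: rebuild the clustered columnstore index on the test table.
-- Assumption: the memory cap from the experiment has already been lifted.
ALTER INDEX CCI ON dbo.FactOnlineSales_SmallGroups_test REBUILD;

-- Count the row groups again to compare with the earlier result.
SELECT COUNT(*) AS RowGroupsCount
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID(N'dbo.FactOnlineSales_SmallGroups_test');
```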

Finally, remember to set Max Server Memory back to its previous value.
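
A sketch of that reset follows, mirroring the earlier sp_configure batch. The value 2147483647 is the SQL Server default for this option and is used here only as an assumption; substitute whatever your instance was configured with before the test.

```sql
-- Sketch: restore 'max server memory (MB)' after the experiment.
-- Assumption: the instance previously used the default of 2147483647 MB.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
GO
EXEC sys.sp_configure N'max server memory (MB)', 2147483647;
RECONFIGURE WITH OVERRIDE;
GO
EXEC sys.sp_configure N'show advanced options', 0;
RECONFIGURE WITH OVERRIDE;
GO
```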
Next post: SQL Server Columnstore Index Performance Summary (11): Columnstore Maintenance

