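GP Index Tuning Test: Sorting

Introduction

This post runs index tuning tests on PostgreSQL and on a GP (Greenplum) cluster, focusing on how an index affects queries that sort their result.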

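Test environment

Database information:

PostgreSQL version: 9.4
GP version: 4.3 (based on PostgreSQL 8.2)

Test table:

Table name: test
Total rows: about 680,000 (685,684 according to the execution plans)
Total size: 170 MB

Test statements:

View the execution plan: explain analyze select * from test order by test_id
Run the query: select * from test order by test_id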

PostgreSQL tests

1. Without an index

Execution plan:

Sort (cost=230780.25..232494.46 rows=685684 width=205) (actual time=3200.642..4079.336 rows=685684 loops=1)
  Sort Key: test_id
  Sort Method: external merge Disk: 156136kB
  -> Seq Scan on test (cost=0.00..28379.84 rows=685684 width=205) (actual time=0.005..128.166 rows=685684 loops=1)
Planning time: 0.116 ms
Execution time: 4152.203 ms

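From this plan, the Sort node returns its last row at 4079 ms (total execution time 4152 ms), so the query takes about 4 s inside the database. The sort does not fit in memory: it uses an external merge with about 156 MB on disk.

Running the query from my local machine takes 2:20 min, i.e. 140 s.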
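In EXPLAIN ANALYZE output, the `actual time` field of a node has the form `startup..total` in milliseconds: the time until the node returns its first row, and the time until it returns its last. The following is a minimal Python sketch, not part of the original test, that extracts both values from the Sort line of this plan. The helper name `parse_actual_time` is hypothetical.

```python
import re

# Top plan node copied from the no-index PostgreSQL plan in this post.
PLAN_LINE = (
    "Sort (cost=230780.25..232494.46 rows=685684 width=205) "
    "(actual time=3200.642..4079.336 rows=685684 loops=1)"
)

# Matches "actual time=<startup>..<total>" with both values in milliseconds.
ACTUAL_TIME = re.compile(r"actual time=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)")


def parse_actual_time(line):
    """Return (startup_ms, total_ms) for a plan node line, or None if absent."""
    match = ACTUAL_TIME.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


startup_ms, total_ms = parse_actual_time(PLAN_LINE)
print(f"first row after {startup_ms} ms, last row after {total_ms} ms")
```

The large startup value (about 3.2 s) is typical of a Sort node, which must consume all of its input before it can return the first row.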

2. With an index

Create the index:

CREATE INDEX test_index
  ON test
  USING btree
  (test_id);

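A b-tree index stores its keys in ascending order by default, and the query orders by test_id, so the planner can read the rows in index order and skip the sort step entirely, as shown below.

Execution plan: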

Index Scan using test8 on test (cost=0.42..99475.90 rows=685684 width=205) (actual time=0.045..558.244 rows=685684 loops=1)
Planning time: 0.449 ms
Execution time: 606.375 ms

From this plan, the Index Scan node returns its last row at 558.244 ms, i.e. about 0.56 s.

Running the query from my local machine (192.168.80.188) takes 2:17 min, i.e. 137 s.

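3. Comparison with and without an index

With the index, the execution time drops from about 4 s to 0.56 s, i.e. to roughly one seventh of the original.

Note that the time reported in the execution plan does not include data transfer (here, from the database host to my machine), so the query time you actually observe is much longer. For example, after the index is created the plan reports 0.56 s, yet the query takes 137 s from my machine, which means roughly 136 s is spent transferring the data.

By the same reasoning, running the query directly on the PostgreSQL host should be much faster. Indeed, running the same SQL on the database host takes only 10 s.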
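The arithmetic behind this comparison can be written out as a short Python sketch. All timings are the figures reported in this post. The throughput estimate rests on an assumption that is not measured here: that the bytes sent to the client are roughly the 170 MB table size.

```python
# Figures reported in the PostgreSQL tests in this post.
SORT_NO_INDEX_MS = 4079.336  # top-node actual time without an index
INDEX_SCAN_MS = 558.244      # top-node actual time with the index
CLIENT_WALL_S = 137.0        # 2:17 min measured on the remote client, with the index
TABLE_SIZE_MB = 170.0        # ASSUMPTION: approximates the amount of data transferred


def speedup(before_ms, after_ms):
    """How many times faster the second measurement is than the first."""
    return before_ms / after_ms


def time_outside_executor(wall_s, executor_ms):
    """Seconds of client wall time not accounted for by the plan's timing."""
    return wall_s - executor_ms / 1000.0


ratio = speedup(SORT_NO_INDEX_MS, INDEX_SCAN_MS)
outside_s = time_outside_executor(CLIENT_WALL_S, INDEX_SCAN_MS)
throughput_mb_s = TABLE_SIZE_MB / outside_s

print(f"speedup from the index: {ratio:.1f}x")
print(f"time outside the executor: {outside_s:.1f} s")
print(f"implied transfer rate: {throughput_mb_s:.2f} MB/s")
```

Under that assumption the implied rate is only a little over 1 MB/s, which is consistent with the network, not the database, being the bottleneck for the remote client.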

GP tests

1. Without an index

Execution plan:

Gather Motion 4:1 (slice1; segments: 4) (cost=337376.64..339090.53 rows=685556 width=211)
  Merge Key: test_id
  Rows out: 685684 rows at destination with 1303 ms to first row, 2969 ms to end, start offset by 3.100 ms.
  -> Sort (cost=337376.64..339090.53 rows=171389 width=211)
       Sort Key: test_id
       Rows out: Avg 171421.0 rows x 4 workers. Max 171423 rows (seg2) with 1004 ms to first row, 1102 ms to end, start offset by 4268012 ms.
       Executor memory: 79865K bytes avg, 79865K bytes max (seg0).
       Work_mem used: 79865K bytes avg, 79865K bytes max (seg0). Workfile: (0 spilling, 0 reused)
       -> Seq Scan on test (cost=0.00..12289.56 rows=171389 width=211)
            Rows out: Avg 171421.0 rows x 4 workers. Max 171423 rows (seg2) with 0.437 ms to first row, 69 ms to end, start offset by 4268013 ms.
Slice statistics:
  (slice0) Executor memory: 335K bytes.
  (slice1) Executor memory: 80055K bytes avg x 4 workers, 80055K bytes max (seg0). Work_mem: 79865K bytes max.
Statement statistics:
  Memory used: 128000K bytes
Optimizer status: legacy query optimizer
Total runtime: 3088.360 ms

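The plan shows a total runtime of about 3 s, not much different from the roughly 4 s PostgreSQL needs without an index.

Running the query from my local machine takes 2:19 min.

2. With an index

Create the index: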
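Each `Rows out` line in the GP plan summarizes the work done on the segments: the average row count per worker, the number of workers, and the segment that handled the most rows. A minimal Python sketch for reading those numbers and checking how evenly the table is spread across segments follows. The helper name `segment_stats` and the skew measure (max rows divided by average rows) are illustrative choices, not something the original test computed.

```python
import re

# "Rows out" line of the Sort node, copied from the no-index GP plan in this post.
ROWS_OUT = (
    "Rows out: Avg 171421.0 rows x 4 workers. Max 171423 rows (seg2) "
    "with 1004 ms to first row, 1102 ms to end, start offset by 4268012 ms."
)

PATTERN = re.compile(
    r"Avg (\d+(?:\.\d+)?) rows x (\d+) workers\. Max (\d+) rows \((seg\d+)\)"
)


def segment_stats(line):
    """Return total rows, the busiest segment, and max/avg skew, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    avg_rows = float(match.group(1))
    workers = int(match.group(2))
    max_rows = int(match.group(3))
    return {
        "total_rows": round(avg_rows * workers),
        "max_segment": match.group(4),
        "skew": max_rows / avg_rows,  # 1.0 means perfectly even distribution
    }


stats = segment_stats(ROWS_OUT)
print(stats)
```

A skew this close to 1.0 means the four segments each sort about a quarter of the rows. In this plan each segment finishes its sort in about 1.1 s, and the Gather Motion delivers its last row to the master at about 2.97 s.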

CREATE INDEX test_index2
  ON test
  USING btree
  (test_id);

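Note that this version of GP does not support "ordered indexes", so the index created above cannot be used by queries that sort. The query plan confirms this: the index is not used and a sequential scan is chosen instead, as shown below.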

Gather Motion 4:1 (slice1; segments: 4) (cost=337376.64..339090.53 rows=685556 width=211)
  Merge Key: test_id
  Rows out: 685684 rows at destination with 1161 ms to first row, 2784 ms to end, start offset by 1.896 ms.
  -> Sort (cost=337376.64..339090.53 rows=171389 width=211)
       Sort Key: test_id
       Rows out: Avg 171421.0 rows x 4 workers. Max 171423 rows (seg2) with 1149 ms to first row, 1254 ms to end, start offset by 4268008 ms.
       Executor memory: 79865K bytes avg, 79865K bytes max (seg0).
       Work_mem used: 79865K bytes avg, 79865K bytes max (seg0). Workfile: (0 spilling, 0 reused)
       -> Seq Scan on test (cost=0.00..12289.56 rows=171389 width=211)
            Rows out: Avg 171421.0 rows x 4 workers. Max 171423 rows (seg2) with 0.224 ms to first row, 86 ms to end, start offset by 4268009 ms.
Slice statistics:
  (slice0) Executor memory: 335K bytes.
  (slice1) Executor memory: 80055K bytes avg x 4 workers, 80055K bytes max (seg0). Work_mem: 79865K bytes max.
Statement statistics:
  Memory used: 128000K bytes
Optimizer status: legacy query optimizer
Total runtime: 2896.060 ms

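The query test is not repeated here, because the plan and its timing are the same as in the no-index case.

Conclusions

On PostgreSQL, the index cuts the execution time of the ORDER BY query substantially, from about 4 s to 0.56 s. On GP 4.3 the planner does not use the index, and the runtime stays at about 3 s.
Make sure there is enough network bandwidth; otherwise the vast majority of the query time is spent on data transfer.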
