PostgreSQL IoT,車聯網 - 實時軌跡、行程實踐 2 - (含index only scan類聚簇表效果)

標籤

PostgreSQL , 實時軌跡 , IoT , 車聯網 , GIS , 離散IO , 順序IO , IO放大 , 聚合 , PGLZ壓縮解壓縮 , IN OUT函數消耗


背景

車聯網場景,一個行程由若干個點組成,業務需要按行程查詢,按。

但是問題來了:

一個行程包含多個點,那麼一次查詢意味着需要查詢並返回多個點(多條記錄)。

由於有多個設備(例如汽車),大概率可能同時活躍,同時上傳點,因此以堆表存儲的話,一個行程中的多條記錄,實際上在數據庫的表裏面,存儲在多個BLOCK裏面。

如上分析,性能問題:IO放大。

如何優化?

1、行程記錄,按行聚集存儲。

類似cluster操作。

2、行程記錄,合併到單條,聚集存儲。

類似將一個行程多條記錄聚合。

例子

下面分別測試幾種優化方法帶來的性能優化效果。

1、cluster

2、array 聚合帶壓縮

3、array 聚合不帶壓縮

4、jsonb 聚合帶壓縮

5、jsonb 聚合不帶壓縮

6、text 聚合帶壓縮

7、text 聚合不帶壓縮

1 原始狀態

create unlogged table t_sensor(id serial8 primary key, sid int8, att text, crt_time timestamp);  
create index idx_t_sensor_1 on t_sensor (sid, crt_time);  
vi test.sql  
  
\set sid random(1,10000)  
insert into t_sensor(sid,att,crt_time) values (:sid, md5(random()::text), now());  
  
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 2000000  

查看一個行程,需要訪問11227個數據塊。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_sensor where sid=1 order by crt_time;  
                                                                 QUERY PLAN                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------  
 Index Scan using idx_t_sensor_1 on public.t_sensor  (cost=0.49..253.36 rows=11121 width=57) (actual time=0.020..47.591 rows=11189 loops=1)  
   Output: id, sid, att, crt_time  
   Index Cond: (t_sensor.sid = 1)  
   Buffers: shared hit=3406 read=7821  
 Planning Time: 0.092 ms  
 Execution Time: 48.303 ms  
(6 rows)  
  
postgres=# select 119*11227*8/1024.0;  
      ?column?        
--------------------  
 10437.601562500000  
(1 row)  

壓測性能

vi test.sql  
  
\set sid random(1,10000)  
select * from t_sensor where sid=:sid order by crt_time ;  
  
  
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 56  
number of threads: 56  
duration: 120 s  
number of transactions actually processed: 14307  
latency average = 470.175 ms  
latency stddev = 43.500 ms  
tps = 119.037842 (including connections establishing)  
tps = 119.098221 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.003  \set sid random(1,10000)  
       470.178  select * from t_sensor where sid=:sid order by crt_time ;  

2 cluster

按行程,時間索引聚集。一次查詢返回一條聚合後的記錄。

postgres=# cluster t_sensor USING idx_t_sensor_1 ;  

查詢一個行程,掃描174個數據塊。IO 驟降。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_sensor where sid=1 order by crt_time;  
                                                                QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------  
 Index Scan using idx_t_sensor_1 on public.t_sensor  (cost=0.49..253.39 rows=11149 width=57) (actual time=0.018..2.212 rows=11189 loops=1)  
   Output: id, sid, att, crt_time  
   Index Cond: (t_sensor.sid = 1)  
   Buffers: shared hit=174  
 Planning Time: 0.094 ms  
 Execution Time: 2.816 ms  
(6 rows)  

壓測性能,見末尾。

3 array 帶壓縮

create unlogged table t_sensor_agg(sid int8 primary key, agg t_sensor[]);  
insert into t_sensor_agg select sid,array_agg(t_sensor order by crt_time) from t_sensor group by sid;  

瓶頸:array類型的INPUT OUTPUT接口,以及壓縮解壓縮接口

Samples: 3M of event 'cpu-clock', Event count (approx.): 559749813667                   
Overhead  Shared Object                           Symbol                                
  24.59%  postgres                                [.] array_out                         
  20.70%  postgres                                [.] record_out                        
   6.74%  postgres                                [.] pglz_decompress                   
   3.78%  libc-2.17.so                            [.] __memcpy_ssse3_back               
   2.72%  postgres                                [.] pg_ltostr_zeropad                 
   2.34%  [kernel]                                [k] run_timer_softirq                 
   2.23%  postgres                                [.] pg_lltoa                          
   2.13%  postgres                                [.] 0x000000000035c350                
   1.76%  postgres                                [.] heap_deform_tuple                 
   1.49%  libc-2.17.so                            [.] __strlen_sse2_pminub              
   1.36%  postgres                                [.] appendStringInfoChar              
   1.36%  [kernel]                                [k] copy_user_enhanced_fast_string    
   1.29%  postgres                                [.] 0x000000000035c36c                
   1.28%  postgres                                [.] 0x000000000035c362                
   1.17%  postgres                                [.] FunctionCall1Coll                 
   0.92%  postgres                                [.] hash_search_with_hash_value       
   0.86%  [kernel]                                [k] _raw_spin_unlock_irqrestore       
   0.84%  postgres                                [.] j2date                            
   0.82%  postgres                                [.] 0x000000000035c357                
   0.76%  postgres                                [.] palloc                            
   0.76%  postgres                                [.] lookup_type_cache                 
   0.67%  postgres                                [.] 0x000000000035c360                
   0.66%  postgres                                [.] timestamp2tm                      
   0.64%  [kernel]                                [k] rcu_process_callbacks             
   0.64%  [kernel]                                [k] __do_softirq                     
vi test.sql  
  
\set sid random(1,10000)  
select * from t_sensor_agg4 where sid=:sid ;  

4 array 不帶壓縮

create unlogged table t_sensor_agg1(sid int8 primary key, agg t_sensor[]);  
alter table t_sensor_agg1 alter column agg set storage external;  
insert into t_sensor_agg1 select sid,array_agg(t_sensor order by crt_time) from t_sensor group by sid;  

瓶頸:array類型的INPUT OUTPUT接口

Samples: 1M of event 'cpu-clock', Event count (approx.): 310127790569                  
Overhead  Shared Object                           Symbol                               
  26.06%  postgres                                [.] array_out                        
  21.44%  postgres                                [.] record_out                       
   4.20%  libc-2.17.so                            [.] __memcpy_ssse3_back              
   2.86%  [kernel]                                [k] run_timer_softirq                
   2.75%  postgres                                [.] pg_ltostr_zeropad                
   2.65%  postgres                                [.] heap_deform_tuple                
   2.28%  postgres                                [.] pg_lltoa                         
   2.14%  postgres                                [.] 0x000000000035c350               
   1.87%  [kernel]                                [k] copy_user_enhanced_fast_string   
   1.52%  libc-2.17.so                            [.] __strlen_sse2_pminub             
   1.47%  postgres                                [.] appendStringInfoChar             
   1.32%  postgres                                [.] 0x000000000035c36c               
   1.30%  postgres                                [.] 0x000000000035c362               
   1.20%  postgres                                [.] FunctionCall1Coll                
   1.11%  postgres                                [.] hash_search_with_hash_value      
   0.87%  postgres                                [.] j2date                           
   0.85%  postgres                                [.] 0x000000000035c357               
   0.81%  [kernel]                                [k] _raw_spin_unlock_irqrestore      
   0.76%  postgres                                [.] lookup_type_cache                
   0.75%  postgres                                [.] 0x000000000046d33b               
   0.74%  postgres                                [.] palloc                           
   0.72%  [kernel]                                [k] rcu_process_callbacks            
   0.68%  postgres                                [.] timestamp2tm                     
   0.68%  postgres                                [.] pfree                         

5 jsonb 帶壓縮

create unlogged table t_sensor_agg2(sid int8 primary key, agg jsonb);  
insert into t_sensor_agg2 select sid,jsonb_agg(t_sensor order by crt_time) from t_sensor group by sid;  

6 jsonb 不帶壓縮

create unlogged table t_sensor_agg3(sid int8 primary key, agg jsonb);  
alter table t_sensor_agg3 alter column agg set storage external;  
insert into t_sensor_agg3 select sid,jsonb_agg(t_sensor order by crt_time) from t_sensor group by sid;  

7 text 帶壓縮

create unlogged table t_sensor_agg4(sid int8 primary key, agg text);  
insert into t_sensor_agg4 select sid,string_agg(t_sensor::text, '|' order by crt_time) from t_sensor group by sid;  

8 text 不帶壓縮

create unlogged table t_sensor_agg5(sid int8 primary key, agg text);  
alter table t_sensor_agg5 alter column agg set storage external;  
insert into t_sensor_agg5 select sid,string_agg(t_sensor::text, '|' order by crt_time) from t_sensor group by sid;  

9 index only scan 類似聚集表效果

所有內容作爲INDEX的KEY,類似聚集表的效果(相鄰內容在同一個INDEX PAGE裏面)。查詢時走INDEX ONLY SCAN掃描方法,掃描的BLOCK最少。

注意:btree 索引內容不能超過1/3 PAGE (因爲BTREE是雙向鏈表,一個PAGE至少要有一條有效記錄,所以有這個限制。)。

寫入數據

create table t_sensor (id serial8 primary key, sid int8, att text, crt_time timestamp);  
  
create index idx_t_sensor_1 on t_sensor (sid, crt_time, att, id);  
  
  
vi test.sql  
\set sid random(1,10000)    
insert into t_sensor(sid,att,crt_time) values (:sid, md5(random()::text), now());    
  
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 2000000  
  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 50  
number of threads: 50  
number of transactions per client: 2000000  
number of transactions actually processed: 100000000/100000000  
latency average = 0.193 ms  
latency stddev = 0.461 ms  
tps = 257995.418591 (including connections establishing)  
tps = 258024.212148 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.001  \set sid random(1,10000)    
         0.192  insert into t_sensor(sid,att,crt_time) values (:sid, md5(random()::text), now());  

生成VM文件(autovacuum觸發時會自動生成,但是爲了立馬看效果,手工執行一下。)

vacuum analyze t_sensor;  

INDEX ONLY SCAN, IO減少效果如下:

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_sensor where sid=2 order by crt_time;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Index Only Scan using idx_t_sensor_1 on public.t_sensor  (cost=0.60..70.41 rows=9960 width=57) (actual time=0.019..2.109 rows=9978 loops=1)  
   Output: id, sid, att, crt_time  
   Index Cond: (t_sensor.sid = 2)  
   Heap Fetches: 0  
   Buffers: shared hit=235  
 Planning Time: 0.090 ms  
 Execution Time: 2.652 ms  
(7 rows)  

查詢性能:

vi test.sql  
  
\set sid random(1,10000)  
select * from t_sensor where sid=:sid order by crt_time;  
  
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 28 -j 28 -T 120  
  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 28  
number of threads: 28  
duration: 120 s  
number of transactions actually processed: 283638  
latency average = 11.844 ms  
latency stddev = 1.931 ms  
tps = 2363.410561 (including connections establishing)  
tps = 2363.913145 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set sid random(1,10000)    
        11.842  select * from t_sensor where sid=:sid order by crt_time;  

小結

目標數據分散在多個BLOCK中,引起IO放大的問題,通過聚集存儲,或者通過聚合存儲,可以很好的解決這個問題。

聚合後,另一個瓶頸則是聚合後的類型(array,jsonb,text)的IN OUT接口。

/ 表存儲 行程查詢 qps
原始(IO 放大) 8880 MB 119
順序(無IO 放大) 8880 MB 2057
index only scan(類似聚集表)(無IO 放大) 8880 MB 2363
聚合array(壓縮) 4523 MB 2362
聚合array(不壓縮) 8714 MB 2515
聚合json(壓縮) 5052 MB 3102
聚合json(不壓縮) 13 GB 3184
聚合text(壓縮) 4969 MB 6057
聚合text(不壓縮) 7692 MB 5997

從上面的測試,可以看到IN OUT函數接口的開銷,text<jsonb<array(composite array)。

實際的優化例子,可參考末尾的幾篇文章。例如:

1、按時間分區,舊的分區使用cluster,按行程整理數據,使用AB表切換,解決IO放大的問題。

2、異步聚合,將點數據準實時按行程ID,聚合到聚合後的行程表。

3、使用INDEX ONLY SCAN, 達到聚集表效果。對業務無任何侵入性。(例如按天分區,加全量(業務需要查詢的字段)索引。),相當於兩倍存儲空間(一份在堆表,一份在索引中)。

參考

《PostgreSQL IoT,車聯網 - 實時軌跡、行程實踐》

《PostgreSQL AB表切換最佳實踐 - 提高切換成功率,杜絕雪崩 - 珍藏級》

《PostgreSQL 時序最佳實踐 - 證券交易系統數據庫設計 - 阿里雲RDS PostgreSQL最佳實踐》

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章