PostgreSQL 11 1000億 tpcb、1000W tpcc 性能測試 - on 阿里雲ECS + ESSD (含quorum based 0丟失多副本配置與性能測試)

標籤

PostgreSQL , pgbench , tpcb , tpcc , tpch


背景

https://help.aliyun.com/knowledge_detail/64950.html

阿里雲ESSD提供了單盤32TB容量,100萬IOPS,4GB/s讀寫吞吐的能力,單臺ECS可以掛載16塊ESSD盤,組成512 TB的大容量存儲。非常適合數據庫這類IO密集應用。

PostgreSQL 作爲一款優秀的企業級開源數據庫產品,阿里雲ESSD的加入,可以帶給用戶什麼樣的體驗呢?

《PostgreSQL 11 100億 tpcb 性能 on ECS》

《PostgreSQL 11 tpcc 測試(103萬tpmC on ECS) - use sysbench-tpcc by Percona-Lab》

《(TPC-H測試 SF=10,SF=200) PostgreSQL 11 vs 10 vs Deepgreen》

ESSD雲盤部署

parted -s /dev/vdb mklabel gpt          
parted -s /dev/vdb mkpart primary 1MiB 100%         
mkfs.ext4 /dev/vdb1 -m 0 -O extent,uninit_bg -E lazy_itable_init=1 -T largefile -L data01          
mkdir /data01            
          
vi /etc/fstab            
            
LABEL=data01 /data01     ext4        defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback    0 0          
          
          
mount -a      

fsync RT測試

1 ECS本地SSD

使用PostgreSQL提供的pg_test_fsync進行測試

5 seconds per test    
O_DIRECT supported on this platform for open_datasync and open_sync.    
    
Compare file sync methods using one 8kB write:    
(in wal_sync_method preference order, except fdatasync is Linux's default)    
        open_datasync                     46355.824 ops/sec      22 usecs/op    
        fdatasync                         39213.835 ops/sec      26 usecs/op    
        fsync                             35912.478 ops/sec      28 usecs/op    
        fsync_writethrough                              n/a    
        open_sync                         42426.737 ops/sec      24 usecs/op    
    
Compare file sync methods using two 8kB writes:    
(in wal_sync_method preference order, except fdatasync is Linux's default)    
        open_datasync                     17109.945 ops/sec      58 usecs/op    
        fdatasync                         26316.089 ops/sec      38 usecs/op    
        fsync                             24202.679 ops/sec      41 usecs/op    
        fsync_writethrough                              n/a    
        open_sync                         15760.721 ops/sec      63 usecs/op    
    
Compare open_sync with different write sizes:    
(This is designed to compare the cost of writing 16kB in different write    
open_sync sizes.)    
         1 * 16kB open_sync write         29108.820 ops/sec      34 usecs/op    
         2 *  8kB open_sync writes        15674.805 ops/sec      64 usecs/op    
         4 *  4kB open_sync writes         9942.061 ops/sec     101 usecs/op    
         8 *  2kB open_sync writes         5637.484 ops/sec     177 usecs/op    
        16 *  1kB open_sync writes         3076.057 ops/sec     325 usecs/op    
    
Test if fsync on non-write file descriptor is honored:    
(If the times are similar, fsync() can sync data written on a different    
descriptor.)    
        write, fsync, close               32581.863 ops/sec      31 usecs/op    
        write, close, fsync               32512.798 ops/sec      31 usecs/op    
    
Non-sync'ed 8kB writes:    
        write                            350232.219 ops/sec       3 usecs/op    

2 ESSD

5 seconds per test    
O_DIRECT supported on this platform for open_datasync and open_sync.    
    
Compare file sync methods using one 8kB write:    
(in wal_sync_method preference order, except fdatasync is Linux's default)    
        open_datasync                      8395.592 ops/sec     119 usecs/op    
        fdatasync                          7722.692 ops/sec     129 usecs/op    
        fsync                              5619.389 ops/sec     178 usecs/op    
        fsync_writethrough                              n/a    
        open_sync                          5685.669 ops/sec     176 usecs/op    
    
Compare file sync methods using two 8kB writes:    
(in wal_sync_method preference order, except fdatasync is Linux's default)    
        open_datasync                      3858.783 ops/sec     259 usecs/op    
        fdatasync                          5396.356 ops/sec     185 usecs/op    
        fsync                              4214.546 ops/sec     237 usecs/op    
        fsync_writethrough                              n/a    
        open_sync                          3025.366 ops/sec     331 usecs/op    
    
Compare open_sync with different write sizes:    
(This is designed to compare the cost of writing 16kB in different write    
open_sync sizes.)    
         1 * 16kB open_sync write          4506.749 ops/sec     222 usecs/op    
         2 *  8kB open_sync writes         3099.963 ops/sec     323 usecs/op    
         4 *  4kB open_sync writes         1763.684 ops/sec     567 usecs/op    
         8 *  2kB open_sync writes          429.923 ops/sec    2326 usecs/op    
        16 *  1kB open_sync writes          198.005 ops/sec    5050 usecs/op    
    
Test if fsync on non-write file descriptor is honored:    
(If the times are similar, fsync() can sync data written on a different    
descriptor.)    
        write, fsync, close                5393.927 ops/sec     185 usecs/op    
        write, close, fsync                5470.240 ops/sec     183 usecs/op    
    
Non-sync'ed 8kB writes:    
        write                            385505.858 ops/sec       3 usecs/op    

fio 專業IO測試

測試項

vi test    
    
[global]        
thread        
numjobs=64        
ramp_time=6        
size=10g        
exitall        
time_based        
runtime=180        
group_reporting        
randrepeat=0        
norandommap        
bs=8k        
rwmixwrite=35        
        
[rw-rand-libaio-mysql-ext4]        
stonewall        
direct=1        
iodepth=16        
iodepth_batch=8        
iodepth_low=8        
iodepth_batch_complete=8        
rw=randrw        
ioengine=libaio        
filename=/data01/ext4        
        
[rw-seq-libaio-mysql-ext4]        
stonewall        
direct=1        
iodepth=16        
iodepth_batch=8        
iodepth_low=8        
iodepth_batch_complete=8        
rw=rw        
ioengine=libaio        
filename=/data01/ext4        
    
[rw-rand-sync-pgsql-ext4]        
stonewall        
direct=0        
rw=randrw        
ioengine=sync        
filename=/data01/ext4        
        
[rw-seq-sync-pgsql-ext4]        
stonewall        
direct=0        
rw=rw        
ioengine=sync        
filename=/data01/ext4    
fio test --output ./cfq-raw.log    

1 ECS本地SSD

rw-rand-libaio-mysql-ext4: (g=0): rw=randrw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=16    
...    
rw-seq-libaio-mysql-ext4: (g=1): rw=rw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=16    
...    
rw-rand-sync-pgsql-ext4: (g=2): rw=randrw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=sync, iodepth=1    
...    
rw-seq-sync-pgsql-ext4: (g=3): rw=rw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=sync, iodepth=1    
...    
fio-3.1    
Starting 256 threads    
    
rw-rand-libaio-mysql-ext4: (groupid=0, jobs=64): err= 0: pid=27005: Tue Sep 18 15:18:42 2018    
   read: IOPS=146k, BW=1140MiB/s (1196MB/s)(200GiB/180004msec)    
    slat (usec): min=22, max=114828, avg=2266.17, stdev=2520.66    
    clat (nsec): min=1759, max=115061k, avg=2316672.58, stdev=2575502.75    
     lat (usec): min=70, max=164337, avg=4582.93, stdev=3639.95    
    clat percentiles (usec):    
     |  1.00th=[  603],  5.00th=[ 1352], 10.00th=[ 1549], 20.00th=[ 1713],    
     | 30.00th=[ 1811], 40.00th=[ 1876], 50.00th=[ 1942], 60.00th=[ 2008],    
     | 70.00th=[ 2057], 80.00th=[ 2114], 90.00th=[ 2245], 95.00th=[ 2376],    
     | 99.00th=[16581], 99.50th=[19792], 99.90th=[27919], 99.95th=[32900],    
     | 99.99th=[49021]    
   bw (  KiB/s): min= 6000, max=42120, per=1.54%, avg=17968.83, stdev=2125.54, samples=22976    
   iops        : min=  750, max= 5265, avg=2245.75, stdev=265.70, samples=22976    
  write: IOPS=78.6k, BW=614MiB/s (644MB/s)(108GiB/180004msec)    
    slat (usec): min=24, max=114826, avg=2266.44, stdev=2515.63    
    clat (nsec): min=1056, max=114853k, avg=2225812.28, stdev=2478372.79    
     lat (usec): min=43, max=164125, avg=4492.34, stdev=3574.60    
    clat percentiles (usec):    
     |  1.00th=[   19],  5.00th=[ 1123], 10.00th=[ 1500], 20.00th=[ 1696],    
     | 30.00th=[ 1795], 40.00th=[ 1876], 50.00th=[ 1942], 60.00th=[ 1991],    
     | 70.00th=[ 2057], 80.00th=[ 2114], 90.00th=[ 2212], 95.00th=[ 2311],    
     | 99.00th=[16057], 99.50th=[19268], 99.90th=[27132], 99.95th=[31327],    
     | 99.99th=[47449]    
   bw (  KiB/s): min= 3235, max=22265, per=1.54%, avg=9680.55, stdev=1180.58, samples=22976    
   iops        : min=  404, max= 2783, avg=1209.72, stdev=147.58, samples=22976    
  lat (usec)   : 2=0.01%, 4=0.06%, 10=0.04%, 20=0.33%, 50=0.37%    
  lat (usec)   : 100=0.16%, 250=0.21%, 500=0.37%, 750=0.59%, 1000=0.84%    
  lat (msec)   : 2=56.76%, 4=36.91%, 10=0.82%, 20=2.08%, 50=0.44%    
  lat (msec)   : 100=0.01%, 250=0.01%    
  cpu          : usr=0.50%, sys=82.17%, ctx=8285760, majf=0, minf=0    
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.1%, 16=103.4%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=26268555,14152416,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=16    
rw-seq-libaio-mysql-ext4: (groupid=1, jobs=64): err= 0: pid=27075: Tue Sep 18 15:18:42 2018    
   read: IOPS=147k, BW=1150MiB/s (1206MB/s)(202GiB/180004msec)    
    slat (usec): min=22, max=97402, avg=2245.77, stdev=2561.30    
    clat (nsec): min=1597, max=97414k, avg=2299719.84, stdev=2622137.64    
     lat (usec): min=37, max=121819, avg=4545.57, stdev=3688.31    
    clat percentiles (usec):    
     |  1.00th=[  529],  5.00th=[ 1287], 10.00th=[ 1516], 20.00th=[ 1680],    
     | 30.00th=[ 1778], 40.00th=[ 1844], 50.00th=[ 1909], 60.00th=[ 1975],    
     | 70.00th=[ 2040], 80.00th=[ 2114], 90.00th=[ 2212], 95.00th=[ 2376],    
     | 99.00th=[16712], 99.50th=[19792], 99.90th=[28181], 99.95th=[32375],    
     | 99.99th=[47973]    
   bw (  KiB/s): min= 6822, max=34080, per=1.53%, avg=18044.54, stdev=2237.76, samples=22979    
   iops        : min=  852, max= 4260, avg=2255.24, stdev=279.74, samples=22979    
  write: IOPS=79.3k, BW=620MiB/s (650MB/s)(109GiB/180004msec)    
    slat (usec): min=24, max=97396, avg=2247.37, stdev=2563.13    
    clat (nsec): min=965, max=97414k, avg=2198816.49, stdev=2518690.40    
     lat (usec): min=40, max=121817, avg=4446.27, stdev=3625.49    
    clat percentiles (usec):    
     |  1.00th=[   17],  5.00th=[  996], 10.00th=[ 1434], 20.00th=[ 1647],    
     | 30.00th=[ 1762], 40.00th=[ 1844], 50.00th=[ 1909], 60.00th=[ 1958],    
     | 70.00th=[ 2024], 80.00th=[ 2089], 90.00th=[ 2180], 95.00th=[ 2278],    
     | 99.00th=[16188], 99.50th=[19268], 99.90th=[27132], 99.95th=[31327],    
     | 99.99th=[46400]    
   bw (  KiB/s): min= 3073, max=18400, per=1.53%, avg=9720.67, stdev=1243.91, samples=22979    
   iops        : min=  384, max= 2300, avg=1214.75, stdev=155.52, samples=22979    
  lat (nsec)   : 1000=0.01%    
  lat (usec)   : 2=0.01%, 4=0.09%, 10=0.06%, 20=0.50%, 50=0.28%    
  lat (usec)   : 100=0.13%, 250=0.20%, 500=0.54%, 750=0.71%, 1000=0.98%    
  lat (msec)   : 2=61.58%, 4=31.54%, 10=0.78%, 20=2.16%, 50=0.46%    
  lat (msec)   : 100=0.01%    
  cpu          : usr=0.50%, sys=81.14%, ctx=8462673, majf=0, minf=0    
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.1%, 16=103.3%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=26505435,14277757,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=16    
rw-rand-sync-pgsql-ext4: (groupid=2, jobs=64): err= 0: pid=27150: Tue Sep 18 15:18:42 2018    
   read: IOPS=228k, BW=1785MiB/s (1872MB/s)(314GiB/180002msec)    
    clat (nsec): min=945, max=14282k, avg=5002.95, stdev=48977.41    
     lat (nsec): min=1247, max=14282k, avg=5374.56, stdev=49037.52    
    clat percentiles (usec):    
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    4],    
     | 30.00th=[    4], 40.00th=[    4], 50.00th=[    4], 60.00th=[    4],    
     | 70.00th=[    4], 80.00th=[    4], 90.00th=[    5], 95.00th=[    5],    
     | 99.00th=[   12], 99.50th=[   90], 99.90th=[  212], 99.95th=[  586],    
     | 99.99th=[ 2180]    
   bw (  KiB/s): min= 4824, max=77689, per=1.55%, avg=28302.49, stdev=3542.33, samples=22979    
   iops        : min=  603, max= 9711, avg=3537.39, stdev=442.79, samples=22979    
  write: IOPS=123k, BW=961MiB/s (1008MB/s)(169GiB/180002msec)    
    clat (usec): min=2, max=216155, avg=504.28, stdev=1314.99    
     lat (usec): min=2, max=216155, avg=504.72, stdev=1315.00    
    clat percentiles (usec):    
     |  1.00th=[    8],  5.00th=[   77], 10.00th=[  285], 20.00th=[  392],    
     | 30.00th=[  408], 40.00th=[  420], 50.00th=[  469], 60.00th=[  519],    
     | 70.00th=[  545], 80.00th=[  562], 90.00th=[  586], 95.00th=[  603],    
     | 99.00th=[  685], 99.50th=[ 2089], 99.90th=[15270], 99.95th=[23462],    
     | 99.99th=[46924]    
   bw (  KiB/s): min= 2661, max=42215, per=1.55%, avg=15240.76, stdev=1812.18, samples=22979    
   iops        : min=  332, max= 5276, avg=1904.70, stdev=226.51, samples=22979    
  lat (nsec)   : 1000=0.01%    
  lat (usec)   : 2=0.01%, 4=56.34%, 10=8.49%, 20=1.04%, 50=0.35%    
  lat (usec)   : 100=0.43%, 250=1.42%, 500=16.05%, 750=15.58%, 1000=0.04%    
  lat (msec)   : 2=0.07%, 4=0.05%, 10=0.07%, 20=0.04%, 50=0.02%    
  lat (msec)   : 100=0.01%, 250=0.01%    
  cpu          : usr=0.93%, sys=89.47%, ctx=5631403, majf=0, minf=0    
  IO depths    : 1=102.3%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=41127965,22146325,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=1    
rw-seq-sync-pgsql-ext4: (groupid=3, jobs=64): err= 0: pid=27221: Tue Sep 18 15:18:42 2018    
   read: IOPS=357k, BW=2786MiB/s (2922MB/s)(490GiB/180001msec)    
    clat (nsec): min=1006, max=20336k, avg=2745.46, stdev=22291.35    
     lat (nsec): min=1307, max=20336k, avg=3110.59, stdev=22301.63    
    clat percentiles (nsec):    
     |  1.00th=[ 1592],  5.00th=[ 1800], 10.00th=[ 1944], 20.00th=[ 2160],    
     | 30.00th=[ 2352], 40.00th=[ 2480], 50.00th=[ 2608], 60.00th=[ 2736],    
     | 70.00th=[ 2864], 80.00th=[ 3024], 90.00th=[ 3216], 95.00th=[ 3440],    
     | 99.00th=[ 4016], 99.50th=[ 8896], 99.90th=[15552], 99.95th=[17280],    
     | 99.99th=[21376]    
   bw (  KiB/s): min=21099, max=151871, per=1.56%, avg=44588.93, stdev=5219.41, samples=22983    
   iops        : min= 2637, max=18983, avg=5573.29, stdev=652.43, samples=22983    
  write: IOPS=192k, BW=1500MiB/s (1573MB/s)(264GiB/180001msec)    
    clat (usec): min=2, max=97210, avg=322.71, stdev=828.04    
     lat (usec): min=2, max=97210, avg=323.13, stdev=828.04    
    clat percentiles (usec):    
     |  1.00th=[    6],  5.00th=[   97], 10.00th=[  182], 20.00th=[  269],    
     | 30.00th=[  297], 40.00th=[  306], 50.00th=[  310], 60.00th=[  314],    
     | 70.00th=[  322], 80.00th=[  326], 90.00th=[  334], 95.00th=[  343],    
     | 99.00th=[  400], 99.50th=[  930], 99.90th=[12911], 99.95th=[19792],    
     | 99.99th=[32113]    
   bw (  KiB/s): min=11433, max=81619, per=1.56%, avg=24008.70, stdev=2707.40, samples=22983    
   iops        : min= 1429, max=10202, avg=3000.77, stdev=338.43, samples=22983    
  lat (usec)   : 2=7.89%, 4=56.48%, 10=1.25%, 20=0.37%, 50=0.25%    
  lat (usec)   : 100=0.57%, 250=4.14%, 500=28.80%, 750=0.06%, 1000=0.03%    
  lat (msec)   : 2=0.05%, 4=0.03%, 10=0.04%, 20=0.03%, 50=0.02%    
  lat (msec)   : 100=0.01%    
  cpu          : usr=1.34%, sys=90.37%, ctx=8645391, majf=0, minf=0    
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=64196338,34566565,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=1    
    
Run status group 0 (all jobs):    
   READ: bw=1140MiB/s (1196MB/s), 1140MiB/s-1140MiB/s (1196MB/s-1196MB/s), io=200GiB (215GB), run=180004-180004msec    
  WRITE: bw=614MiB/s (644MB/s), 614MiB/s-614MiB/s (644MB/s-644MB/s), io=108GiB (116GB), run=180004-180004msec    
    
Run status group 1 (all jobs):    
   READ: bw=1150MiB/s (1206MB/s), 1150MiB/s-1150MiB/s (1206MB/s-1206MB/s), io=202GiB (217GB), run=180004-180004msec    
  WRITE: bw=620MiB/s (650MB/s), 620MiB/s-620MiB/s (650MB/s-650MB/s), io=109GiB (117GB), run=180004-180004msec    
    
Run status group 2 (all jobs):    
   READ: bw=1785MiB/s (1872MB/s), 1785MiB/s-1785MiB/s (1872MB/s-1872MB/s), io=314GiB (337GB), run=180002-180002msec    
  WRITE: bw=961MiB/s (1008MB/s), 961MiB/s-961MiB/s (1008MB/s-1008MB/s), io=169GiB (181GB), run=180002-180002msec    
    
Run status group 3 (all jobs):    
   READ: bw=2786MiB/s (2922MB/s), 2786MiB/s-2786MiB/s (2922MB/s-2922MB/s), io=490GiB (526GB), run=180001-180001msec    
  WRITE: bw=1500MiB/s (1573MB/s), 1500MiB/s-1500MiB/s (1573MB/s-1573MB/s), io=264GiB (283GB), run=180001-180001msec    
    
Disk stats (read/write):    
    dm-0: ios=56484741/52000097, merge=0/0, ticks=55721482/45621907, in_queue=102865767, util=67.65%, aggrios=6930165/6249265, aggrmerge=130427/250746, aggrticks=6648860/4061887, aggrin_queue=10712820, aggrutil=66.53%    
  vdb: ios=6928564/6249448, merge=130186/250641, ticks=6629128/3928950, in_queue=10559055, util=65.93%    
  vdc: ios=6930486/6248991, merge=130413/250950, ticks=6643790/3962244, in_queue=10605847, util=66.02%    
  vdd: ios=6928089/6250855, merge=130732/250764, ticks=6472207/4009640, in_queue=10493342, util=66.18%    
  vde: ios=6929909/6250351, merge=130382/250303, ticks=6661137/4040922, in_queue=10701353, util=66.20%    
  vdf: ios=6932429/6245563, merge=130328/251129, ticks=6681549/4088606, in_queue=10769793, util=66.29%    
  vdg: ios=6930521/6249106, merge=130467/250956, ticks=6697543/4113859, in_queue=10811547, util=66.35%    
  vdh: ios=6930993/6249837, merge=130681/250844, ticks=6698661/4151500, in_queue=10851206, util=66.41%    
  vdi: ios=6930333/6249975, merge=130228/250384, ticks=6706869/4199379, in_queue=10910423, util=66.53%    

2 ESSD

rw-rand-libaio-mysql-ext4: (g=0): rw=randrw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=16    
...    
rw-seq-libaio-mysql-ext4: (g=1): rw=rw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=16    
...    
rw-rand-sync-pgsql-ext4: (g=2): rw=randrw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=sync, iodepth=1    
...    
rw-seq-sync-pgsql-ext4: (g=3): rw=rw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=sync, iodepth=1    
...    
fio-3.1    
Starting 256 threads    
    
rw-rand-libaio-mysql-ext4: (groupid=0, jobs=64): err= 0: pid=21221: Tue Sep 18 15:19:03 2018    
   read: IOPS=112k, BW=879MiB/s (922MB/s)(154GiB/180008msec)    
    slat (usec): min=16, max=119403, avg=1880.80, stdev=2119.90    
    clat (nsec): min=940, max=512641k, avg=4098774.34, stdev=3181162.93    
     lat (usec): min=158, max=514427, avg=5979.64, stdev=3696.51    
    clat percentiles (usec):    
     |  1.00th=[    3],  5.00th=[  486], 10.00th=[  922], 20.00th=[ 1729],    
     | 30.00th=[ 2474], 40.00th=[ 2933], 50.00th=[ 3425], 60.00th=[ 4080],    
     | 70.00th=[ 5014], 80.00th=[ 6128], 90.00th=[ 7832], 95.00th=[ 9503],    
     | 99.00th=[14091], 99.50th=[16450], 99.90th=[23200], 99.95th=[29230],    
     | 99.99th=[61604]    
   bw (  KiB/s): min= 8615, max=31056, per=1.57%, avg=14172.78, stdev=990.53, samples=23040    
   iops        : min= 1076, max= 3882, avg=1771.17, stdev=123.81, samples=23040    
  write: IOPS=60.6k, BW=473MiB/s (496MB/s)(83.2GiB/180008msec)    
    slat (usec): min=17, max=119400, avg=1881.51, stdev=2122.86    
    clat (nsec): min=1019, max=510353k, avg=3892737.90, stdev=3154201.30    
     lat (usec): min=118, max=512897, avg=5774.32, stdev=3688.63    
    clat percentiles (usec):    
     |  1.00th=[    3],  5.00th=[  388], 10.00th=[  799], 20.00th=[ 1500],    
     | 30.00th=[ 2278], 40.00th=[ 2802], 50.00th=[ 3261], 60.00th=[ 3851],    
     | 70.00th=[ 4752], 80.00th=[ 5866], 90.00th=[ 7504], 95.00th=[ 9241],    
     | 99.00th=[13829], 99.50th=[16188], 99.90th=[22676], 99.95th=[28181],    
     | 99.99th=[58459]    
   bw (  KiB/s): min= 4137, max=17712, per=1.57%, avg=7634.48, stdev=590.18, samples=23040    
   iops        : min=  517, max= 2214, avg=953.86, stdev=73.79, samples=23040    
  lat (nsec)   : 1000=0.01%    
  lat (usec)   : 2=0.71%, 4=1.40%, 10=0.17%, 20=0.04%, 50=0.11%    
  lat (usec)   : 100=0.20%, 250=0.80%, 500=2.06%, 750=2.89%, 1000=3.31%    
  lat (msec)   : 2=12.56%, 4=35.84%, 10=36.00%, 20=3.73%, 50=0.17%    
  lat (msec)   : 100=0.01%, 250=0.01%, 750=0.01%    
  cpu          : usr=0.31%, sys=49.41%, ctx=17838909, majf=0, minf=0    
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.1%, 16=103.3%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=20248219,10907423,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=16    
rw-seq-libaio-mysql-ext4: (groupid=1, jobs=64): err= 0: pid=21285: Tue Sep 18 15:19:03 2018    
   read: IOPS=133k, BW=1037MiB/s (1087MB/s)(182GiB/180005msec)    
    slat (usec): min=10, max=40443, avg=861.52, stdev=1493.93    
    clat (nsec): min=953, max=508067k, avg=4239522.11, stdev=2801559.16    
     lat (usec): min=188, max=510200, avg=5101.11, stdev=3015.85    
    clat percentiles (usec):    
     |  1.00th=[  420],  5.00th=[  914], 10.00th=[ 1319], 20.00th=[ 2024],    
     | 30.00th=[ 2671], 40.00th=[ 3261], 50.00th=[ 3851], 60.00th=[ 4490],    
     | 70.00th=[ 5211], 80.00th=[ 6194], 90.00th=[ 7635], 95.00th=[ 8979],    
     | 99.00th=[11731], 99.50th=[12911], 99.90th=[15795], 99.95th=[17171],    
     | 99.99th=[21890]    
   bw (  KiB/s): min=  602, max=22672, per=1.53%, avg=16220.02, stdev=2446.89, samples=23020    
   iops        : min=   75, max= 2834, avg=2027.07, stdev=305.86, samples=23020    
  write: IOPS=71.4k, BW=558MiB/s (585MB/s)(98.1GiB/180005msec)    
    slat (usec): min=12, max=40442, avg=871.28, stdev=1501.73    
    clat (nsec): min=913, max=509492k, avg=3964521.95, stdev=2769947.35    
     lat (usec): min=129, max=511537, avg=4835.88, stdev=2992.88    
    clat percentiles (usec):    
     |  1.00th=[  223],  5.00th=[  791], 10.00th=[ 1172], 20.00th=[ 1827],    
     | 30.00th=[ 2409], 40.00th=[ 2999], 50.00th=[ 3589], 60.00th=[ 4228],    
     | 70.00th=[ 4883], 80.00th=[ 5866], 90.00th=[ 7242], 95.00th=[ 8455],    
     | 99.00th=[11207], 99.50th=[12387], 99.90th=[15008], 99.95th=[16319],    
     | 99.99th=[20317]    
   bw (  KiB/s): min=  309, max=12471, per=1.53%, avg=8733.75, stdev=1342.51, samples=23020    
   iops        : min=   38, max= 1558, avg=1091.29, stdev=167.81, samples=23020    
  lat (nsec)   : 1000=0.01%    
  lat (usec)   : 2=0.31%, 4=0.36%, 10=0.02%, 20=0.01%, 50=0.01%    
  lat (usec)   : 100=0.01%, 250=0.08%, 500=0.82%, 750=2.09%, 1000=2.82%    
  lat (msec)   : 2=14.32%, 4=33.12%, 10=43.55%, 20=2.47%, 50=0.01%    
  lat (msec)   : 100=0.01%, 250=0.01%, 750=0.01%    
  cpu          : usr=0.41%, sys=9.90%, ctx=21948046, majf=0, minf=0    
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.1%, 16=102.1%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=0.0%, 8=100.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=23881029,12859729,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=16    
rw-rand-sync-pgsql-ext4: (groupid=2, jobs=64): err= 0: pid=21355: Tue Sep 18 15:19:03 2018    
   read: IOPS=304k, BW=2374MiB/s (2489MB/s)(417GiB/180001msec)    
    clat (nsec): min=659, max=20344k, avg=4937.11, stdev=53635.32    
     lat (nsec): min=830, max=20344k, avg=5147.67, stdev=53639.04    
    clat percentiles (nsec):    
     |  1.00th=[   1928],  5.00th=[   2096], 10.00th=[   2192],    
     | 20.00th=[   2352], 30.00th=[   2448], 40.00th=[   2544],    
     | 50.00th=[   2640], 60.00th=[   2736], 70.00th=[   2864],    
     | 80.00th=[   2992], 90.00th=[   3248], 95.00th=[   3504],    
     | 99.00th=[   8640], 99.50th=[  16512], 99.90th=[ 585728],    
     | 99.95th=[1122304], 99.99th=[2375680]    
   bw (  KiB/s): min= 8661, max=312564, per=1.57%, avg=38145.72, stdev=5837.53, samples=23040    
   iops        : min= 1082, max=39070, avg=4767.84, stdev=729.68, samples=23040    
  write: IOPS=164k, BW=1278MiB/s (1340MB/s)(225GiB/180001msec)    
    clat (usec): min=2, max=517110, avg=377.98, stdev=1501.97    
     lat (usec): min=2, max=517110, avg=378.25, stdev=1501.97    
    clat percentiles (usec):    
     |  1.00th=[    6],  5.00th=[   99], 10.00th=[  265], 20.00th=[  330],    
     | 30.00th=[  343], 40.00th=[  351], 50.00th=[  355], 60.00th=[  363],    
     | 70.00th=[  371], 80.00th=[  379], 90.00th=[  396], 95.00th=[  416],    
     | 99.00th=[  545], 99.50th=[ 1045], 99.90th=[11469], 99.95th=[16581],    
     | 99.99th=[28967]    
   bw (  KiB/s): min= 4709, max=168193, per=1.57%, avg=20539.14, stdev=3068.53, samples=23040    
   iops        : min=  588, max=21024, avg=2567.01, stdev=383.55, samples=23040    
  lat (nsec)   : 750=0.01%, 1000=0.01%    
  lat (usec)   : 2=1.33%, 4=62.24%, 10=1.91%, 20=0.50%, 50=0.18%    
  lat (usec)   : 100=0.32%, 250=1.63%, 500=31.34%, 750=0.27%, 1000=0.06%    
  lat (msec)   : 2=0.07%, 4=0.04%, 10=0.06%, 20=0.03%, 50=0.01%    
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%    
  cpu          : usr=0.53%, sys=89.73%, ctx=7630423, majf=0, minf=0    
  IO depths    : 1=102.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=54689028,29447045,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=1    
rw-seq-sync-pgsql-ext4: (groupid=3, jobs=64): err= 0: pid=21420: Tue Sep 18 15:19:03 2018    
   read: IOPS=400k, BW=3128MiB/s (3280MB/s)(550GiB/180002msec)    
    clat (nsec): min=655, max=77256k, avg=1908.86, stdev=16100.37    
     lat (nsec): min=818, max=77256k, avg=2119.98, stdev=16102.12    
    clat percentiles (nsec):    
     |  1.00th=[  988],  5.00th=[ 1144], 10.00th=[ 1256], 20.00th=[ 1416],    
     | 30.00th=[ 1576], 40.00th=[ 1704], 50.00th=[ 1816], 60.00th=[ 1928],    
     | 70.00th=[ 2040], 80.00th=[ 2192], 90.00th=[ 2384], 95.00th=[ 2576],    
     | 99.00th=[ 3184], 99.50th=[ 5472], 99.90th=[13504], 99.95th=[15424],    
     | 99.99th=[18304]    
   bw (  KiB/s): min=27335, max=205560, per=1.57%, avg=50224.99, stdev=4880.87, samples=23040    
   iops        : min= 3416, max=25695, avg=6277.74, stdev=610.11, samples=23040    
  write: IOPS=216k, BW=1684MiB/s (1766MB/s)(296GiB/180002msec)    
    clat (nsec): min=1980, max=344807k, avg=289811.56, stdev=750064.44    
     lat (usec): min=2, max=344807, avg=290.08, stdev=750.06    
    clat percentiles (usec):    
     |  1.00th=[    5],  5.00th=[  106], 10.00th=[  194], 20.00th=[  258],    
     | 30.00th=[  273], 40.00th=[  277], 50.00th=[  285], 60.00th=[  289],    
     | 70.00th=[  293], 80.00th=[  297], 90.00th=[  306], 95.00th=[  310],    
     | 99.00th=[  351], 99.50th=[  799], 99.90th=[ 8356], 99.95th=[12387],    
     | 99.99th=[24249]    
   bw (  KiB/s): min=13581, max=112633, per=1.57%, avg=27046.57, stdev=2507.74, samples=23040    
   iops        : min= 1697, max=14079, avg=3380.43, stdev=313.47, samples=23040    
  lat (nsec)   : 750=0.01%, 1000=0.74%    
  lat (usec)   : 2=42.27%, 4=21.80%, 10=1.17%, 20=0.26%, 50=0.13%    
  lat (usec)   : 100=0.33%, 250=4.51%, 500=28.59%, 750=0.03%, 1000=0.02%    
  lat (msec)   : 2=0.05%, 4=0.04%, 10=0.05%, 20=0.02%, 50=0.01%    
  lat (msec)   : 100=0.01%, 500=0.01%    
  cpu          : usr=0.63%, sys=92.37%, ctx=9426487, majf=0, minf=0    
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%    
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%    
     issued rwt: total=72069547,38810440,0, short=0,0,0, dropped=0,0,0    
     latency   : target=0, window=0, percentile=100.00%, depth=1    
    
Run status group 0 (all jobs):    
   READ: bw=879MiB/s (922MB/s), 879MiB/s-879MiB/s (922MB/s-922MB/s), io=154GiB (166GB), run=180008-180008msec    
  WRITE: bw=473MiB/s (496MB/s), 473MiB/s-473MiB/s (496MB/s-496MB/s), io=83.2GiB (89.4GB), run=180008-180008msec    
    
Run status group 1 (all jobs):    
   READ: bw=1037MiB/s (1087MB/s), 1037MiB/s-1037MiB/s (1087MB/s-1087MB/s), io=182GiB (196GB), run=180005-180005msec    
  WRITE: bw=558MiB/s (585MB/s), 558MiB/s-558MiB/s (585MB/s-585MB/s), io=98.1GiB (105GB), run=180005-180005msec    
    
Run status group 2 (all jobs):    
   READ: bw=2374MiB/s (2489MB/s), 2374MiB/s-2374MiB/s (2489MB/s-2489MB/s), io=417GiB (448GB), run=180001-180001msec    
  WRITE: bw=1278MiB/s (1340MB/s), 1278MiB/s-1278MiB/s (1340MB/s-1340MB/s), io=225GiB (241GB), run=180001-180001msec    
    
Run status group 3 (all jobs):    
   READ: bw=3128MiB/s (3280MB/s), 3128MiB/s-3128MiB/s (3280MB/s-3280MB/s), io=550GiB (590GB), run=180002-180002msec    
  WRITE: bw=1684MiB/s (1766MB/s), 1684MiB/s-1684MiB/s (1766MB/s-1766MB/s), io=296GiB (318GB), run=180002-180002msec    
    
Disk stats (read/write):    
  vdb: ios=32558186/26432830, merge=13683969/3967228, ticks=103112470/71513817, in_queue=174719639, util=60.48%    

PostgreSQL 11 測試

1、參數

listen_addresses = '0.0.0.0'      
port = 1921      
max_connections = 2000      
superuser_reserved_connections = 3      
unix_socket_directories = '., /var/run/postgresql, /tmp'      
tcp_keepalives_idle = 60      
tcp_keepalives_interval = 10      
tcp_keepalives_count = 10      
shared_buffers = 64GB      
max_prepared_transactions = 2000      
work_mem = 8MB      
maintenance_work_mem = 2GB      
dynamic_shared_memory_type = posix      
vacuum_cost_delay = 0      
bgwriter_delay = 10ms      
bgwriter_lru_maxpages = 1000      
bgwriter_lru_multiplier = 10.0      
effective_io_concurrency = 0      
max_worker_processes = 128      
max_parallel_maintenance_workers = 64      
max_parallel_workers_per_gather = 0      
parallel_leader_participation = on      
max_parallel_workers = 64      
wal_level = minimal        
synchronous_commit = off      
wal_writer_delay = 10ms      
checkpoint_timeout = 35min      
max_wal_size = 128GB      
min_wal_size = 32GB      
checkpoint_completion_target = 0.1      
max_wal_senders = 0      
effective_cache_size = 400GB      
log_destination = 'csvlog'      
logging_collector = on      
log_directory = 'log'      
log_filename = 'postgresql-%a.log'      
log_truncate_on_rotation = on      
log_rotation_age = 1d      
log_rotation_size = 0      
log_checkpoints = on       
log_connections = on      
log_disconnections = on      
log_error_verbosity = verbose       
log_line_prefix = '%m [%p] '      
log_timezone = 'PRC'      
log_autovacuum_min_duration = 0      
autovacuum_max_workers = 16      
autovacuum_freeze_max_age = 1200000000      
autovacuum_multixact_freeze_max_age = 1400000000      
autovacuum_vacuum_cost_delay = 0ms      
vacuum_freeze_table_age = 1150000000      
vacuum_multixact_freeze_table_age = 1150000000      
datestyle = 'iso, mdy'      
timezone = 'PRC'      
lc_messages = 'C'      
lc_monetary = 'C'      
lc_numeric = 'C'      
lc_time = 'C'      
default_text_search_config = 'pg_catalog.english'      
jit = off      
cpu_tuple_cost=0.00018884145574257426        
cpu_index_tuple_cost = 0.00433497085216479990        
cpu_operator_cost = 0.00216748542608239995        
seq_page_cost=0.014329        
random_page_cost = 0.016     
parallel_tuple_cost = 0      
parallel_setup_cost = 0     
min_parallel_table_scan_size = 0    
min_parallel_index_scan_size = 0    

1000W tpcc 測試

16072 * 60 = 96.4萬 tpmC    

詳細結果

......    
[ 2993s ] thds: 64 tps: 15107.81 qps: 431171.53 (r/w/o: 196624.50/204331.41/30215.62) lat (ms,95%): 10.65 err/s 51.00 reconn/s: 0.00    
[ 2994s ] thds: 64 tps: 15454.20 qps: 434439.71 (r/w/o: 198195.61/205335.70/30908.41) lat (ms,95%): 10.46 err/s 77.00 reconn/s: 0.00    
[ 2995s ] thds: 64 tps: 15480.57 qps: 438798.81 (r/w/o: 200298.44/207538.23/30962.14) lat (ms,95%): 10.46 err/s 72.00 reconn/s: 0.00    
[ 2996s ] thds: 64 tps: 15341.97 qps: 434496.22 (r/w/o: 198027.64/205784.63/30683.94) lat (ms,95%): 10.65 err/s 73.00 reconn/s: 0.00    
[ 2997s ] thds: 64 tps: 15208.54 qps: 433973.96 (r/w/o: 197975.05/205581.82/30417.08) lat (ms,95%): 10.65 err/s 75.01 reconn/s: 0.00    
[ 2998s ] thds: 64 tps: 15300.14 qps: 431763.76 (r/w/o: 196862.95/204300.53/30600.28) lat (ms,95%): 10.65 err/s 84.00 reconn/s: 0.00    
[ 2999s ] thds: 64 tps: 15108.49 qps: 426253.35 (r/w/o: 194171.57/201864.80/30216.98) lat (ms,95%): 10.65 err/s 56.99 reconn/s: 0.00    
[ 3000s ] thds: 64 tps: 15046.89 qps: 428187.32 (r/w/o: 195463.56/202631.98/30091.78) lat (ms,95%): 10.84 err/s 70.00 reconn/s: 0.00    
SQL statistics:    
    queries performed:    
        read:                            625427855    
        write:                           649118720    
        other:                           96478628    
        total:                           1371025203    
    transactions:                        48223282 (16072.47 per sec.)    
    queries:                             1371025203 (456952.75 per sec.)    
    ignored errors:                      210005 (69.99 per sec.)    
    reconnects:                          0      (0.00 per sec.)    
    
General statistics:    
    total time:                          3000.3636s    
    total number of events:              48223282    
    
Latency (ms):    
         min:                                    0.28    
         avg:                                    3.98    
         max:                                  912.95    
         95th percentile:                        9.91    
         sum:                            191859179.61    
    
Threads fairness:    
    events (avg/stddev):           753488.7812/3072.19    
    execution time (avg/stddev):   2997.7997/0.05    

1000億 tpcb 測試

1、生成1000億數據

nohup pgbench -i -s 1000000 -I dtg >./pgbench_ins.log 2>&1 &    
    
    
99998900000 of 100000000000 tuples (99%) done (elapsed 93180.83 s, remaining 1.03 s)    
99999000000 of 100000000000 tuples (99%) done (elapsed 93181.05 s, remaining 0.93 s)    
99999100000 of 100000000000 tuples (99%) done (elapsed 93181.13 s, remaining 0.84 s)    
99999200000 of 100000000000 tuples (99%) done (elapsed 93181.21 s, remaining 0.75 s)    
99999300000 of 100000000000 tuples (99%) done (elapsed 93181.30 s, remaining 0.65 s)    
99999400000 of 100000000000 tuples (99%) done (elapsed 93182.01 s, remaining 0.56 s)    
99999500000 of 100000000000 tuples (99%) done (elapsed 93182.09 s, remaining 0.47 s)    
99999600000 of 100000000000 tuples (99%) done (elapsed 93182.17 s, remaining 0.37 s)    
99999700000 of 100000000000 tuples (99%) done (elapsed 93182.25 s, remaining 0.28 s)    
99999800000 of 100000000000 tuples (99%) done (elapsed 93182.33 s, remaining 0.19 s)    
99999900000 of 100000000000 tuples (99%) done (elapsed 93182.42 s, remaining 0.09 s)    
100000000000 of 100000000000 tuples (100%) done (elapsed 93182.50 s, remaining 0.00 s)    
done.    

生成1000億數據耗時: 93182 秒。 (約25小時 52分鐘。)

2、給1000億的單表創建索引(64 parallel)

postgres=# analyze pgbench_accounts ;    
ANALYZE    
postgres=# alter table pgbench_accounts set (parallel_workers =64);    
ALTER TABLE    
nohup pgbench -i -s 1000000 -I p > ./pk.log 2>&1 &    

1000億單表創建索引耗時: 10小時 50分鐘。

1000億單表、索引容量大小

1000億單表:12.5 TB。

1000億單表索引: 2 TB。

postgres=# \di+ pgbench*  
                                      List of relations  
 Schema |         Name          | Type  |  Owner   |      Table       |  Size   | Description   
--------+-----------------------+-------+----------+------------------+---------+-------------  
 public | pgbench_accounts_pkey | index | postgres | pgbench_accounts | 2092 GB |   
 public | pgbench_branches_pkey | index | postgres | pgbench_branches | 21 MB   |   
 public | pgbench_tellers_pkey  | index | postgres | pgbench_tellers  | 214 MB  |   
(3 rows)  
  
postgres=# \dt+ pgbench*  
                          List of relations  
 Schema |       Name       | Type  |  Owner   |  Size   | Description   
--------+------------------+-------+----------+---------+-------------  
 public | pgbench_accounts | table | postgres | 12 TB   |   
 public | pgbench_branches | table | postgres | 35 MB   |   
 public | pgbench_history  | table | postgres | 0 bytes |   
 public | pgbench_tellers  | table | postgres | 422 MB  |   
(4 rows)  

索引深度

1、1000億行,INT8類型索引,深度爲4(不包括ROOT PAGE)。

postgres=# select * from bt_metap('pgbench_accounts_pkey');  
 magic  | version |   root   | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples   
--------+---------+----------+-------+----------+-----------+-------------+-------------------------  
 340322 |       3 | 23149704 |     4 | 23149704 |         4 |           0 |                      -1  
(1 row)  

2、索引查詢,索引需要搜索5個BLOCK。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from pgbench_accounts where aid=10000000;  
                                                                   QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------  
 Index Scan using pgbench_accounts_pkey on public.pgbench_accounts  (cost=0.62..0.66 rows=1 width=101) (actual time=0.020..0.021 rows=1 loops=1)  
   Output: aid, bid, abalance, filler  
   Index Cond: (pgbench_accounts.aid = 10000000)  
   Buffers: shared hit=6  -- 5個index block, 1個heap block  
 Planning Time: 0.049 ms  
 Execution Time: 0.033 ms  
(6 rows)  

《深入淺出PostgreSQL B-Tree索引結構》

tpcb 1000億 性能測試

使用高斯分佈,生成測試數據。

《生成泊松、高斯、指數、隨機分佈數據 - PostgreSQL 9.5 new feature - pgbench improve, gaussian (standard normal) & exponential distribution》

只讀

vi test.sql  
  
\set aid random_gaussian(1, :range, 10.0)  
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

讀寫

vi rw.sql  
  
\set aid random_gaussian(1, :range, 10.0)  
\set bid random(1, 1 * :scale)  
\set tid random(1, 10 * :scale)  
\set delta random(-5000, 5000)  
BEGIN;  
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
END;  

tpcb 1000億 只讀測試

1、活躍數據10億

QPS: 998818

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 360 -D range=1000000000  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 359606534  
latency average = 0.064 ms  
latency stddev = 0.046 ms  
tps = 998777.462686 (including connections establishing)  
tps = 998818.121681 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set aid random_gaussian(1, :range, 10.0)  
         0.062  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

2、活躍數據100億

QPS: 597877

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 360 -D range=10000000000  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 215257932  
latency average = 0.107 ms  
latency stddev = 0.526 ms  
tps = 597861.125133 (including connections establishing)  
tps = 597877.469245 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set aid random_gaussian(1, :range, 10.0)  
         0.105  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

前期IO

Total DISK READ :       2.32 G/s | Total DISK WRITE :       0.00 B/s  
Actual DISK READ:       2.32 G/s | Actual DISK WRITE:       0.00 B/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
46798 be/4 postgres   31.72 M/s    0.00 B/s  0.00 % 72.60 % postgres: postgres postgres [local] BIND    
46774 be/4 postgres   37.42 M/s    0.00 B/s  0.00 % 71.91 % postgres: postgres postgres [local] SELECT  
46792 be/4 postgres   35.54 M/s    0.00 B/s  0.00 % 71.89 % postgres: postgres postgres [local] idle    
46708 be/4 postgres   35.08 M/s    0.00 B/s  0.00 % 71.59 % postgres: postgres postgres [local] SELECT  
46730 be/4 postgres   46.84 M/s    0.00 B/s  0.00 % 70.99 % postgres: postgres postgres [local] SELECT  
46704 be/4 postgres   34.51 M/s    0.00 B/s  0.00 % 70.84 % postgres: postgres postgres [local] SELECT  
46716 be/4 postgres   46.05 M/s    0.00 B/s  0.00 % 70.84 % postgres: postgres postgres [local] SELECT  
46788 be/4 postgres   33.83 M/s    0.00 B/s  0.00 % 70.84 % postgres: postgres postgres [local] SELECT  
46807 be/4 postgres   33.78 M/s    0.00 B/s  0.00 % 70.41 % postgres: postgres postgres [local] SELECT  
46815 be/4 postgres   35.21 M/s    0.00 B/s  0.00 % 70.33 % postgres: postgres postgres [local] SELECT  
46812 be/4 postgres   45.95 M/s    0.00 B/s  0.00 % 70.18 % postgres: postgres postgres [local] SELECT  
46752 be/4 postgres   34.21 M/s    0.00 B/s  0.00 % 70.09 % postgres: postgres postgres [local] SELECT  

加熱後IO

... ...    
  
Total DISK READ :     527.32 M/s | Total DISK WRITE :       0.00 B/s  
Actual DISK READ:     527.24 M/s | Actual DISK WRITE:      30.77 K/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
47108 be/4 postgres   11.77 M/s    0.00 B/s  0.00 %  4.71 % postgres: postgres postgres [local] SELECT  
47025 be/4 postgres   10.55 M/s    0.00 B/s  0.00 %  4.39 % postgres: postgres postgres [local] SELECT  
47115 be/4 postgres    9.28 M/s    0.00 B/s  0.00 %  4.30 % postgres: postgres postgres [local] SELECT  
47061 be/4 postgres   13.27 M/s    0.00 B/s  0.00 %  4.23 % postgres: postgres postgres [local] SELECT  
47082 be/4 postgres   10.49 M/s    0.00 B/s  0.00 %  4.21 % postgres: postgres postgres [local] SELECT  
47111 be/4 postgres    6.54 M/s    0.00 B/s  0.00 %  4.18 % postgres: postgres postgres [local] idle    
47071 be/4 postgres    6.46 M/s    0.00 B/s  0.00 %  4.15 % postgres: postgres postgres [local] idle    
47018 be/4 postgres    9.13 M/s    0.00 B/s  0.00 %  4.11 % postgres: postgres postgres [local] idle    
47087 be/4 postgres    5.77 M/s    0.00 B/s  0.00 %  4.09 % postgres: postgres postgres [local] idle    
47105 be/4 postgres    8.89 M/s    0.00 B/s  0.00 %  4.08 % postgres: postgres postgres [local] BINDCT  
47069 be/4 postgres    8.46 M/s    0.00 B/s  0.00 %  4.05 % postgres: postgres postgres [local] SELECT  
47106 be/4 postgres    8.89 M/s    0.00 B/s  0.00 %  3.91 % postgres: postgres postgres [local] idle    
47053 be/4 postgres    6.48 M/s    0.00 B/s  0.00 %  3.91 % postgres: postgres postgres [local] SELECT  
47028 be/4 postgres    9.71 M/s    0.00 B/s  0.00 %  3.83 % postgres: postgres postgres [local] idle    
47112 be/4 postgres    6.72 M/s    0.00 B/s  0.00 %  3.82 % postgres: postgres postgres [local] SELECT  
47039 be/4 postgres    7.63 M/s    0.00 B/s  0.00 %  3.81 % postgres: postgres postgres [local] BIND    

3、活躍數據500億

QPS: 66678

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 360 -D range=50000000000  
  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 3600 s  
number of transactions actually processed: 240046184  
latency average = 0.960 ms  
latency stddev = 1.660 ms  
tps = 66678.433880 (including connections establishing)  
tps = 66678.672147 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.001  \set aid random_gaussian(1, :range, 10.0)  
         0.958  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

IO

Total DISK READ :       2.45 G/s | Total DISK WRITE :       0.00 B/s  
Actual DISK READ:       2.45 G/s | Actual DISK WRITE:       0.00 B/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
47230 be/4 postgres   39.87 M/s    0.00 B/s  0.00 % 87.93 % postgres: postgres postgres [local] SELECT  
47218 be/4 postgres   32.12 M/s    0.00 B/s  0.00 % 87.85 % postgres: postgres postgres [local] SELECT  
47196 be/4 postgres   38.54 M/s    0.00 B/s  0.00 % 87.77 % postgres: postgres postgres [local] SELECT  
47250 be/4 postgres   32.52 M/s    0.00 B/s  0.00 % 87.73 % postgres: postgres postgres [local] SELECT  
47210 be/4 postgres   35.25 M/s    0.00 B/s  0.00 % 87.64 % postgres: postgres postgres [local] SELECT  
47173 be/4 postgres   35.29 M/s    0.00 B/s  0.00 % 87.63 % postgres: postgres postgres [local] SELECT  
47220 be/4 postgres   36.14 M/s    0.00 B/s  0.00 % 87.63 % postgres: postgres postgres [local] SELECT  
47243 be/4 postgres   44.79 M/s    0.00 B/s  0.00 % 87.61 % postgres: postgres postgres [local] SELECT  
47149 be/4 postgres   48.33 M/s    0.00 B/s  0.00 % 87.55 % postgres: postgres postgres [local] SELECT  
47245 be/4 postgres   44.83 M/s    0.00 B/s  0.00 % 87.54 % postgres: postgres postgres [local] SELECT  
47254 be/4 postgres   29.74 M/s    0.00 B/s  0.00 % 87.53 % postgres: postgres postgres [local] SELECT  
47253 be/4 postgres   41.24 M/s    0.00 B/s  0.00 % 87.50 % postgres: postgres postgres [local] SELECT  
47162 be/4 postgres   30.31 M/s    0.00 B/s  0.00 % 87.50 % postgres: postgres postgres [local] SELECT  
47229 be/4 postgres   29.40 M/s    0.00 B/s  0.00 % 87.50 % postgres: postgres postgres [local] SELECT  
47234 be/4 postgres   37.08 M/s    0.00 B/s  0.00 % 87.50 % postgres: postgres postgres [local] SELECT  
47242 be/4 postgres   40.28 M/s    0.00 B/s  0.00 % 87.46 % postgres: postgres postgres [local] SELECT  
47186 be/4 postgres   36.05 M/s    0.00 B/s  0.00 % 87.44 % postgres: postgres postgres [local] SELECT  
47165 be/4 postgres   33.66 M/s    0.00 B/s  0.00 % 87.43 % postgres: postgres postgres [local] SELECT  

4、活躍數據1000億

QPS: 67295

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 360 -D range=100000000000  
  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 3600 s  
number of transactions actually processed: 242265704  
latency average = 0.951 ms  
latency stddev = 2.313 ms  
tps = 67295.523254 (including connections establishing)  
tps = 67295.778158 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set aid random_gaussian(1, :range, 10.0)  
         0.949  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  

IO

Total DISK READ :       2.24 G/s | Total DISK WRITE :       0.00 B/s  
Actual DISK READ:       2.24 G/s | Actual DISK WRITE:      54.79 K/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
47932 be/4 postgres   33.54 M/s    0.00 B/s  0.00 % 88.36 % postgres: postgres postgres [local] SELECT  
48010 be/4 postgres   33.41 M/s    0.00 B/s  0.00 % 88.27 % postgres: postgres postgres [local] SELECT  
48021 be/4 postgres   34.10 M/s    0.00 B/s  0.00 % 88.21 % postgres: postgres postgres [local] SELECT  
48049 be/4 postgres   32.14 M/s    0.00 B/s  0.00 % 88.20 % postgres: postgres postgres [local] SELECT  
48048 be/4 postgres   33.34 M/s    0.00 B/s  0.00 % 88.18 % postgres: postgres postgres [local] SELECT  
47988 be/4 postgres   31.79 M/s    0.00 B/s  0.00 % 88.11 % postgres: postgres postgres [local] SELECT  
48007 be/4 postgres   26.25 M/s    0.00 B/s  0.00 % 88.07 % postgres: postgres postgres [local] SELECT  
48013 be/4 postgres   35.37 M/s    0.00 B/s  0.00 % 88.07 % postgres: postgres postgres [local] SELECT  
47949 be/4 postgres   36.25 M/s    0.00 B/s  0.00 % 88.04 % postgres: postgres postgres [local] SELECT  
47979 be/4 postgres   44.90 M/s    0.00 B/s  0.00 % 88.02 % postgres: postgres postgres [local] SELECT  
48047 be/4 postgres   39.64 M/s    0.00 B/s  0.00 % 87.97 % postgres: postgres postgres [local] SELECT  
48038 be/4 postgres   39.24 M/s    0.00 B/s  0.00 % 87.93 % postgres: postgres postgres [local] SELECT  
48034 be/4 postgres   38.02 M/s    0.00 B/s  0.00 % 87.89 % postgres: postgres postgres [local] SELECT  
48019 be/4 postgres   35.99 M/s    0.00 B/s  0.00 % 87.88 % postgres: postgres postgres [local] SELECT  
48046 be/4 postgres   32.00 M/s    0.00 B/s  0.00 % 87.88 % postgres: postgres postgres [local] SELECT  

tpcb 1000億 讀寫測試

1、活躍數據10億

TPS: 95119

QPS: 475595

pgbench -M prepared -n -r -P 1 -f ./rw.sql -c 64 -j 64 -T 360 -D scale=1000000 -D range=1000000000   
  
transaction type: ./rw.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 34244287  
latency average = 0.673 ms  
latency stddev = 0.394 ms  
tps = 95116.186279 (including connections establishing)  
tps = 95119.886927 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.003  \set aid random_gaussian(1, :range, 10.0)  
         0.001  \set bid random(1, 1 * :scale)  
         0.001  \set tid random(1, 10 * :scale)  
         0.001  \set delta random(-5000, 5000)  
         0.046  BEGIN;  
         0.133  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.077  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         0.104  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         0.088  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         0.074  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         0.146  END;  

2、活躍數據100億

TPS: 85278

QPS: 426390

pgbench -M prepared -n -r -P 1 -f ./rw.sql -c 64 -j 64 -T 360 -D scale=1000000 -D range=10000000000  
  
transaction type: ./rw.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 30702466  
latency average = 0.750 ms  
latency stddev = 1.518 ms  
tps = 85275.759706 (including connections establishing)  
tps = 85278.402619 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.003  \set aid random_gaussian(1, :range, 10.0)  
         0.001  \set bid random(1, 1 * :scale)  
         0.001  \set tid random(1, 10 * :scale)  
         0.001  \set delta random(-5000, 5000)  
         0.047  BEGIN;  
         0.193  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.082  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         0.108  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         0.093  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         0.078  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         0.144  END;  

IO

Total DISK READ :     124.77 M/s | Total DISK WRITE :     846.78 M/s  
Actual DISK READ:     124.01 M/s | Actual DISK WRITE:     820.10 M/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
31430 be/4 postgres   27.03 K/s   44.03 M/s  0.00 % 22.13 % postgres: walwriter  
49767 be/4 postgres 1629.40 K/s    8.68 M/s  0.00 %  3.03 % postgres: postgres postgres [local] UPDATE  
49771 be/4 postgres    2.73 M/s    8.63 M/s  0.00 %  2.34 % postgres: postgres postgres [local] UPDATE               
49742 be/4 postgres    2.77 M/s    8.26 M/s  0.00 %  2.31 % postgres: postgres postgres [local] UPDATE               
49787 be/4 postgres 1343.68 K/s    9.27 M/s  0.00 %  2.29 % postgres: postgres postgres [local] UPDATE               
49785 be/4 postgres    3.05 M/s    8.64 M/s  0.00 %  2.27 % postgres: postgres postgres [local] UPDATE               
49776 be/4 postgres 1783.85 K/s   11.64 M/s  0.00 %  2.05 % postgres: postgres postgres [local] UPDATE               
49774 be/4 postgres    4.22 M/s    8.13 M/s  0.00 %  2.05 % postgres: postgres postgres [local] UPDATE  
49775 be/4 postgres  671.84 K/s    8.79 M/s  0.00 %  2.04 % postgres: postgres postgres [local] UPDATE               
49786 be/4 postgres 1220.12 K/s    8.15 M/s  0.00 %  2.04 % postgres: postgres postgres [local] UPDATE               
49772 be/4 postgres 1003.90 K/s    8.77 M/s  0.00 %  2.02 % postgres: postgres postgres [local] UPDATE               
49697 be/4 postgres    2.56 M/s    8.69 M/s  0.00 %  2.01 % postgres: postgres postgres [local] UPDATE  
49803 be/4 postgres  733.62 K/s    8.22 M/s  0.00 %  2.00 % postgres: postgres postgres [local] UPDATE  
49806 be/4 postgres    2.84 M/s   10.23 M/s  0.00 %  1.99 % postgres: postgres postgres [local] UPDATE               
49804 be/4 postgres 1783.85 K/s    8.60 M/s  0.00 %  1.98 % postgres: postgres postgres [local] UPDATE               
49766 be/4 postgres  478.78 K/s   14.62 M/s  0.00 %  1.97 % postgres: postgres postgres [local] UPDATE               
49770 be/4 postgres    2.29 M/s    8.82 M/s  0.00 %  1.96 % postgres: postgres postgres [local] UPDATE  
49715 be/4 postgres    3.20 M/s    8.37 M/s  0.00 %  1.96 % postgres: postgres postgres [local] UPDATE               
49810 be/4 postgres    3.35 M/s    9.68 M/s  0.00 %  1.93 % postgres: postgres postgres [local] UPDATE  
49780 be/4 postgres    2.56 M/s    8.22 M/s  0.00 %  1.92 % postgres: postgres postgres [local] UPDATE               
49784 be/4 postgres    3.39 M/s    8.35 M/s  0.00 %  1.92 % postgres: postgres postgres [local] UPDATE               
49734 be/4 postgres    2.31 M/s    8.66 M/s  0.00 %  1.91 % postgres: postgres postgres [local] UPDATE               
49800 be/4 postgres 2023.24 K/s    8.55 M/s  0.00 %  1.90 % postgres: postgres postgres [local] UPDATE  
49794 be/4 postgres 1629.40 K/s    8.88 M/s  0.00 %  1.85 % postgres: postgres postgres [local] UPDATE               
49765 be/4 postgres    3.24 M/s    8.67 M/s  0.00 %  1.80 % postgres: postgres postgres [local] UPDATE               
49724 be/4 postgres 1173.79 K/s   10.08 M/s  0.00 %  1.79 % postgres: postgres postgres [local] UPDATE               
49728 be/4 postgres 1413.18 K/s    8.63 M/s  0.00 %  1.78 % postgres: postgres postgres [local] UPDATE  
49781 be/4 postgres 1436.34 K/s    7.93 M/s  0.00 %  1.76 % postgres: postgres postgres [local] UPDATE               
49790 be/4 postgres 1096.56 K/s    8.60 M/s  0.00 %  1.76 % postgres: postgres postgres [local] UPDATE               

3、活躍數據500億

TPS: 38301

QPS: 191505

pgbench -M prepared -n -r -P 1 -f ./rw.sql -c 64 -j 64 -T 360 -D scale=1000000 -D range=50000000000   
  
transaction type: ./rw.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 13790704  
latency average = 1.671 ms  
latency stddev = 2.620 ms  
tps = 38299.935890 (including connections establishing)  
tps = 38301.102322 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set aid random_gaussian(1, :range, 10.0)  
         0.001  \set bid random(1, 1 * :scale)  
         0.001  \set tid random(1, 10 * :scale)  
         0.001  \set delta random(-5000, 5000)  
         0.031  BEGIN;  
         1.274  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.068  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         0.103  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         0.076  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         0.058  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         0.056  END;  

IO

Total DISK READ :    1508.55 M/s | Total DISK WRITE :     618.92 M/s  
Actual DISK READ:    1507.72 M/s | Actual DISK WRITE:     450.33 M/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
49510 be/4 postgres   21.45 M/s    5.40 M/s  0.00 % 61.29 % postgres: postgres postgres [local] UPDATE               
49507 be/4 postgres   26.85 M/s    5.61 M/s  0.00 % 60.79 % postgres: postgres postgres [local] UPDATE               
49456 be/4 postgres   28.59 M/s    5.67 M/s  0.00 % 60.42 % postgres: postgres postgres [local] UPDATE               
49436 be/4 postgres   23.76 M/s    5.23 M/s  0.00 % 60.31 % postgres: postgres postgres [local] UPDATE               
49516 be/4 postgres   21.82 M/s    5.40 M/s  0.00 % 59.84 % postgres: postgres postgres [local] UPDATE               
49414 be/4 postgres   20.77 M/s    5.44 M/s  0.00 % 59.84 % postgres: postgres postgres [local] UPDATE               
49503 be/4 postgres   20.13 M/s    6.15 M/s  0.00 % 59.81 % postgres: postgres postgres [local] UPDATE               
49410 be/4 postgres   29.23 M/s    5.52 M/s  0.00 % 59.73 % postgres: postgres postgres [local] UPDATE               
49427 be/4 postgres   18.61 M/s    5.18 M/s  0.00 % 59.71 % postgres: postgres postgres [local] idle in transaction  
49501 be/4 postgres   17.22 M/s    5.60 M/s  0.00 % 59.70 % postgres: postgres postgres [local] UPDATE               
49493 be/4 postgres   24.60 M/s    7.22 M/s  0.00 % 59.66 % postgres: postgres postgres [local] UPDATE               
49512 be/4 postgres   23.08 M/s    5.53 M/s  0.00 % 59.65 % postgres: postgres postgres [local] UPDATE               
49509 be/4 postgres   24.04 M/s    5.64 M/s  0.00 % 59.55 % postgres: postgres postgres [local] UPDATE               
49490 be/4 postgres   17.89 M/s    5.62 M/s  0.00 % 59.55 % postgres: postgres postgres [local] UPDATE             

4、活躍數據1000億

TPS: 35189

QPS: 175945

pgbench -M prepared -n -r -P 1 -f ./rw.sql -c 64 -j 64 -T 360 -D scale=1000000 -D range=100000000000   
  
  
transaction type: ./rw.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 360 s  
number of transactions actually processed: 12670591  
latency average = 1.818 ms  
latency stddev = 3.928 ms  
tps = 35188.224787 (including connections establishing)  
tps = 35189.625697 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.003  \set aid random_gaussian(1, :range, 10.0)  
         0.001  \set bid random(1, 1 * :scale)  
         0.001  \set tid random(1, 10 * :scale)  
         0.001  \set delta random(-5000, 5000)  
         0.032  BEGIN;  
         1.392  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.072  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         0.112  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         0.086  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         0.061  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         0.059  END;  

IO

Total DISK READ :    1824.52 M/s | Total DISK WRITE :     241.53 M/s  
Actual DISK READ:    1821.44 M/s | Actual DISK WRITE:     237.31 M/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
48918 be/4 postgres   27.19 M/s    0.00 B/s  0.00 % 67.37 % postgres: postgres postgres [local] UPDATE  
48893 be/4 postgres   32.32 M/s    0.00 B/s  0.00 % 67.29 % postgres: postgres postgres [local] UPDATE               
48914 be/4 postgres   23.47 M/s    0.00 B/s  0.00 % 67.29 % postgres: postgres postgres [local] UPDATE  
48889 be/4 postgres   33.45 M/s    0.00 B/s  0.00 % 67.23 % postgres: postgres postgres [local] UPDATE               
48904 be/4 postgres   34.80 M/s    0.00 B/s  0.00 % 67.21 % postgres: postgres postgres [local] UPDATE  
48861 be/4 postgres   30.88 M/s    7.81 K/s  0.00 % 67.06 % postgres: postgres postgres [local] UPDATE  
48910 be/4 postgres   27.16 M/s  328.11 K/s  0.00 % 67.02 % postgres: postgres postgres [local] UPDATE               
48821 be/4 postgres   27.54 M/s    7.81 K/s  0.00 % 67.01 % postgres: postgres postgres [local] UPDATE  
48825 be/4 postgres   35.88 M/s    0.00 B/s  0.00 % 66.89 % postgres: postgres postgres [local] UPDATE               
48930 be/4 postgres   31.68 M/s    7.81 K/s  0.00 % 66.82 % postgres: postgres postgres [local] UPDATE  
48867 be/4 postgres   26.99 M/s    7.81 K/s  0.00 % 66.81 % postgres: postgres postgres [local] idle in transaction  
48929 be/4 postgres   25.61 M/s    7.81 K/s  0.00 % 66.77 % postgres: postgres postgres [local] UPDATE  
48894 be/4 postgres   24.08 M/s    0.00 B/s  0.00 % 66.67 % postgres: postgres postgres [local] UPDATE               
48921 be/4 postgres   32.90 M/s  640.60 K/s  0.00 % 66.66 % postgres: postgres postgres [local] UPDATE  
48925 be/4 postgres   27.30 M/s    0.00 B/s  0.00 % 66.63 % postgres: postgres postgres [local] UPDATE  
48829 be/4 postgres   24.85 M/s    0.00 B/s  0.00 % 66.63 % postgres: postgres postgres [local] idle                 
48901 be/4 postgres   29.57 M/s    0.00 B/s  0.00 % 66.62 % postgres: postgres postgres [local] UPDATE  
48927 be/4 postgres   24.33 M/s    7.81 K/s  0.00 % 66.59 % postgres: postgres postgres [local] UPDATE  
48933 be/4 postgres   27.85 M/s    7.81 K/s  0.00 % 66.57 % postgres: postgres postgres [local] BINDTE  
48890 be/4 postgres   27.79 M/s    0.00 B/s  0.00 % 66.56 % postgres: postgres postgres [local] UPDATE  
48931 be/4 postgres   30.29 M/s    0.00 B/s  0.00 % 66.55 % postgres: postgres postgres [local] UPDATE  

其他測試

同步多副本環境

《PostgreSQL 一主多從(多副本,強同步)簡明手冊 - 配置、壓測、監控、切換、防腦裂、修復、0丟失 - 珍藏級》

其中一個備庫使用zfs存儲,開啓lz4壓縮,提供閃回,備份能力。

創建備庫,單個備庫的創建速度約500MB/s,15TB的庫,需要9個半小時創建完成。

[root@pg11 ~]# dstat
You did not select any stats, using -cdngy by default.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  4   7  85   4   0   0| 289M  164M|   0     0 |   0     0 |  41k   80k
  0   1  99   0   0   0| 508M    0 |1218k 1007M|   0     0 |  40k 1559 
  0   1  99   0   0   0| 500M    0 |1233k 1004M|   0     0 |  41k 1673 
  0   1  99   0   0   0| 492M    0 |1206k  994M|   0     0 |  40k 1576 
  0   1  99   0   0   0| 508M    0 |1245k 1015M|   0     0 |  41k 1601 
  0   1  99   0   0   0| 516M    0 |1257k 1021M|   0     0 |  42k 1576 
  0   2  98   0   0   0| 520M    0 |1300k 1044M|   0     0 |  44k 1891 

[root@pg11 ~]# top -c -u postgres

top - 15:09:33 up 2 days,  4:48,  2 users,  load average: 0.41, 3.98, 22.70
Tasks: 516 total,   2 running, 514 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.8 sy,  0.0 ni, 99.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 52819500+total, 29681988 free,  3079916 used, 49543308+buff/cache
KiB Swap:        0 total,        0 free,        0 used. 52057548+avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                  
50168 postgres  20   0 66.436g  14264  12248 R  35.1  0.0   0:37.57 postgres: walsender postgres 172.17.20.28(48412) sending backup "pg_basebackup base backup"                                                                              
50176 postgres  20   0 66.436g  20416  18400 S  24.2  0.0   0:18.35 postgres: walsender postgres 172.17.20.29(65032) sending backup "pg_basebackup base backup"                                                                              
50154 postgres  20   0 66.433g 1.979g 1.978g S   0.0  0.4   0:01.41 /usr/pgsql-11/bin/postgres                                                                                                                                               
50155 postgres  20   0  245148   2036    596 S   0.0  0.0   0:00.00 postgres: logger                                                                                                                                                         
50157 postgres  20   0 66.434g 531212 529720 S   0.0  0.1   0:00.43 postgres: checkpointer                                                                                                                                                   
50158 postgres  20   0 66.434g 530592 529120 S   0.0  0.1   0:00.54 postgres: background writer                                                                                                                                              
50159 postgres  20   0 66.433g 526884 525420 S   0.0  0.1   0:00.28 postgres: walwriter                                                                                                                                                      
50160 postgres  20   0 66.436g   3224   1392 S   0.0  0.0   0:00.00 postgres: autovacuum launcher                                                                                                                                            
50161 postgres  20   0  247404   2252    692 S   0.0  0.0   0:00.00 postgres: stats collector                                                                                                                                                
50162 postgres  20   0 66.436g   2844   1096 S   0.0  0.0   0:00.00 postgres: logical replication launcher                                                                                                                                   
50169 postgres  20   0 66.436g   4124   1944 S   0.0  0.0   0:00.01 postgres: walsender postgres 172.17.20.28(48414) streaming 101/44000140                                                                                                  
50177 postgres  20   0 66.436g   3796   1872 S   0.0  0.0   0:00.00 postgres: walsender postgres 172.17.20.29(65034) streaming 101/44000140                                                                                                  

配置步驟

1、配置pg_hba.conf

host replication all xxx.xxx.xxx.xxx/32 md5

2、創建replication角色用戶

create role repxxx login replication encrypted password 'xxx';

3、pg_basebackup拉取數據,注意開啓SLOT,否則對於很大的數據庫,可能數據備份結束後,WAL已經在主庫被清除了。(有WAL歸檔的情況下,可以不開啓SLOT。開啓SLOT後,未被備庫拉取的WAL不會被清除。)

export PGPASSWORD=xxx

nohup pg_basebackup -F p -D $PGDATA -h 172.17.20.30 -p 1921 -U repxxx -X stream -C -S standby1 >/dev/null 2>&1 &
  
  
nohup pg_basebackup -F p -D $PGDATA -h 172.17.20.30 -p 1921 -U repxxx -X stream -C -S standby2 >/dev/null 2>&1 &

4、配置recovery.conf

recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=172.17.20.30 port=1921 user=repxxx password=xxx'          # e.g. 'host=localhost port=5432'
primary_slot_name = 'standby1'


recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=172.17.20.30 port=1921 user=repxxx password=xxx'          # e.g. 'host=localhost port=5432'
primary_slot_name = 'standby2'

5、啓動備庫

6、主庫配置多副本參數

synchronous_standby_names = 'ANY 1 (*)'
synchronous_commit = remote_write 


postgres=# show synchronous_commit ;
 synchronous_commit 
--------------------
 remote_write
(1 row)

postgres=# show synchronous_standby_names ;
 synchronous_standby_names 
---------------------------
 ANY 1 (*)
(1 row)

7、tpcb 1000億(活躍10億) rw 測試, QPS 5.34萬。

transaction type: ./rw.sql
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
duration: 360 s
number of transactions actually processed: 19239053
latency average = 1.198 ms
latency stddev = 1.208 ms
tps = 53432.922774 (including connections establishing)
tps = 53435.051257 (excluding connections establishing)
statement latencies in milliseconds:
         0.003  \set aid random_gaussian(1, :range, 10.0)
         0.001  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.001  \set delta random(-5000, 5000)
         0.043  BEGIN;
         0.122  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.068  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.089  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.080  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.064  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         0.726  END;

8、tpcc 1000W , tps : 11000 , tpmC : 66W.

SQL statistics:
    queries performed:
        read:                            513698640
        write:                           533163795
        other:                           79242254
        total:                           1126104689
    transactions:                        39605095 (11000.27 per sec.)
    queries:                             1126104689 (312774.25 per sec.)
    ignored errors:                      172646 (47.95 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          3600.3729s
    total number of events:              39605095

Latency (ms):
         min:                                    0.29
         avg:                                    5.81
         max:                                 1388.75
         95th percentile:                       11.87
         sum:                            230276426.25

Threads fairness:
    events (avg/stddev):           618829.6094/1712.80
    execution time (avg/stddev):   3598.0692/0.06

flashback 閃回

《PostgreSQL 最佳實踐 - 塊級增量備份(ZFS篇)驗證 - recovery test script for zfs snapshot clone + postgresql stream replication + archive》

主備切換

《PostgreSQL 一主多從(多副本,強同步)簡明手冊 - 配置、壓測、監控、切換、防腦裂、修復、0丟失 - 珍藏級》

修復主庫, pg_rewind

《PostgreSQL primary-standby failback tools : pg_rewind》

《PostgreSQL 9.5 new feature - pg_rewind fast sync Split Brain Primary & Standby》

《PostgreSQL 9.5 add pg_rewind for Fast align for PostgreSQL unaligned primary & standby》

小結

1、8K fsync IO RT

1 ecs本地ssd

22 us    

2 essd

119 us    

2、8K directIO

1 ecs本地ssd

離散讀

IOPS=146k, BW=1140MiB/s (1196MB/s)(200GiB/180004msec)    

離散寫

IOPS=78.6k, BW=614MiB/s (644MB/s)(108GiB/180004msec)    

順序讀

IOPS=147k, BW=1150MiB/s (1206MB/s)(202GiB/180004msec)    

順序寫

IOPS=79.3k, BW=620MiB/s (650MB/s)(109GiB/180004m    

2 essd

離散讀

IOPS=112k, BW=879MiB/s (922MB/s)(154GiB/180008msec)    

離散寫

IOPS=60.6k, BW=473MiB/s (496MB/s)(83.2GiB/180008msec)    

順序讀

IOPS=133k, BW=1037MiB/s (1087MB/s)(182GiB/180005msec)    

順序寫

IOPS=71.4k, BW=558MiB/s (585MB/s)(98.1GiB/180005msec)    

3、tpcc 1000W (ESSD)

96.4萬 tpmC

4、tpcb 1000億 只讀 (ESSD)

1、活躍數據10億

QPS: 998818

2、活躍數據100億

QPS: 597877

3、活躍數據500億

QPS: 66678

4、活躍數據1000億

QPS: 67295

5、tpcb 1000億 讀寫 (ESSD)

1、活躍數據10億

TPS: 95119

QPS: 475595

2、活躍數據100億

TPS: 85278

QPS: 426390

3、活躍數據500億

TPS: 38301

QPS: 191505

4、活躍數據1000億

TPS: 35189

QPS: 175945

性能小結

環境:阿里雲 ECS + 32T ESSD

表SIZE: 12.5 TB 寫入耗時 25h52min

索引SIZE: 2 TB 創建耗時 10h50min

索引深度: 5級

單表數據量 TEST CASE QPS TPS
10 * 100W tpcc 1000W - 96.4萬 tpmC
10 * 100W tpcc 1000W(同步多副本) - 66萬 tpmC
1000億 tpcb 活躍數據10億 只讀 998818 998818
1000億 tpcb 活躍數據100億 只讀 597877 597877
1000億 tpcb 活躍數據500億 只讀 66678 66678
1000億 tpcb 活躍數據1000億 只讀 67295 67295
1000億 tpcb 活躍數據10億 讀寫 475595 95119
1000億 tpcb 活躍數據10億 讀寫(同步多副本) 267160 53432
1000億 tpcb 活躍數據100億 讀寫 426390 85278
1000億 tpcb 活躍數據500億 讀寫 191505 38301
1000億 tpcb 活躍數據1000億 讀寫 175945 35189

阿里雲ESSD的引入,結合PostgreSQL企業級開源數據庫(良好的性能、可管理海量數據、功能對齊Oracle,不僅ESSD層面提供多副本,同時數據庫層面也支持通過quorum based replication多副本提供金融級的可靠性,使用ZFS秒級快照,閃回等企業特性),給企業大容量關係數據庫上雲提供了便利。

對於PG企業用戶,可以選擇阿里雲PG企業版PPAS,擁有以上所有特性的同時,提供ORACLE兼容性。

參考

《fio測試IO性能》

ECS 本地SSD雲盤(8*1.8TB條帶)測試:

《PostgreSQL 100億 tpcb 性能 on ECS》

《PostgreSQL 11 tpcc 測試(103萬tpmC on ECS) - use sysbench-tpcc by Percona-Lab》

《(TPC-H測試 SF=10,SF=200) PostgreSQL 11 vs 10 vs Deepgreen》

《PostgreSQL 10 on ECS 實施 流複製備庫鏡像+自動快照備份+自動備份驗證+自動清理備份與歸檔 - 珍藏級》

https://help.aliyun.com/knowledge_detail/64950.html

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章