PostgreSQL Flash-Sale (Seckill) Optimization

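The Postgres2015 national user conference will be held November 20-21 at the Liting Huayuan Hotel (麗亭華苑酒店) in Beijing. The speaker lineup is strong: China's top PostgreSQL database experts will all attend, joined by invited database experts from Europe, Russia, Japan, the United States and elsewhere:

SUZUKI Koichi (鈴木市一), initiator of the Postgres-XC project
Mason Sharp, initiator of the Postgres-XL project
Tatsuo Ishii (石井達夫), author of pgpool
Kaigai Kohei (海外浩平), author of PG-Strom
Yao Yandong (姚延棟), R&D director of Greenplum
Zhou Zhengzhong (周正中, known as 德哥), one of the founders of the PostgreSQL China User Group
Wang Yang (汪洋), manager of the database technology department at Ping An Technology
……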

 

2015 PG Elephant Conference registration: http://postgres2015.eventdove.com/
PostgreSQL China community: http://postgres.cn/
PostgreSQL professional group 1: 3336901 (full)
PostgreSQL professional group 2: 100910388
PostgreSQL professional group 3: 150657323

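The typical bottleneck in a flash-sale scenario is a flood of update requests against the same record. Only one or a few of these requests succeed; the rest fail or find nothing left to update.

For example, take a 1-yuan iPhone flash sale where only one iPhone is offered, and treat it as a single record. Once the sale starts, whoever grabs it first (obtains the lock to update this record) wins.

For example, use a flag column to indicate whether the record has already been updated. Alternatively, use a counter that records how many times it has been updated (that is, how many iPhones have been sold):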

update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5; -- assume 5 units are on sale
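The `upd_cnt+1<=5` guard caps the number of winners even under heavy concurrency. The toy Python model below shows why. It is not PostgreSQL code, and `StockRow` and `run_sale` are hypothetical names.

- A `threading.Lock` stands in for the row lock that UPDATE takes.
- The guard is checked while that lock is held, so at most `limit` buyers can succeed.
- The lock is acquired with blocking semantics, which is exactly the waiting problem discussed next.

```python
import threading


class StockRow:
    """Toy stand-in for one row of tbl: a row lock plus the upd_cnt counter (hypothetical)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit          # units on sale, e.g. 5
        self.upd_cnt = 0            # units already sold
        self._row_lock = threading.Lock()

    def try_buy(self) -> bool:
        # Models: UPDATE tbl SET upd_cnt=upd_cnt+1 WHERE id=pk AND upd_cnt+1<=limit
        with self._row_lock:        # blocks while another "session" holds the row lock
            if self.upd_cnt + 1 <= self.limit:
                self.upd_cnt += 1
                return True         # 1 row updated: this buyer wins
            return False            # 0 rows updated: sold out


def run_sale(buyers: int, limit: int) -> int:
    """Start `buyers` threads against one row and return how many succeeded."""
    row = StockRow(limit)
    wins: list[bool] = []
    wins_lock = threading.Lock()

    def buyer() -> None:
        ok = row.try_buy()
        with wins_lock:
            wins.append(ok)

    threads = [threading.Thread(target=buyer) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(wins)


if __name__ == "__main__":
    print(run_sale(buyers=100, limit=5))  # prints 5
```

However many buyers arrive, the count of winners never exceeds `limit`. The cost is that every buyer queues on the same lock.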

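The drawback of this approach:

The user holding the lock may succeed or fail while processing the record, or may take a long time (for example, if the database responds slowly). Until that transaction ends, all other sessions can only wait.

Waiting makes no sense here: for users who did not get the lock, the time spent waiting is simply wasted.

So the usual optimization is to lock the row first with FOR UPDATE NOWAIT, which avoids waiting: if the lock cannot be obtained immediately, the request does not wait for it.

For example: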

begin;

select 1 from tbl where id=pk for update nowait; -- if the lock cannot be acquired immediately, an error is raised and this transaction is rolled back

update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5;

end;

This approach reduces user waiting time, because a request that cannot get the lock immediately returns at once.
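Below is a small Python model of the difference between a plain blocking update and a NOWAIT attempt. It assumes a single `threading.Lock` plays the role of the row lock, and a "slow transaction" thread holds it for a while. `acquire(blocking=False)` behaves like NOWAIT: it fails at once instead of queueing. All function names are hypothetical.

```python
import threading
import time


def hold_lock(lock: threading.Lock, seconds: float, ready: threading.Event) -> None:
    """A slow 'transaction' that keeps the lock for `seconds`."""
    with lock:
        ready.set()
        time.sleep(seconds)


def nowait_attempt(lock: threading.Lock) -> tuple[bool, float]:
    """Like FOR UPDATE NOWAIT: give up at once if someone else holds the lock."""
    start = time.monotonic()
    got = lock.acquire(blocking=False)
    if got:
        lock.release()  # toy model: real code would UPDATE and then commit
    return got, time.monotonic() - start


def blocking_attempt(lock: threading.Lock) -> tuple[bool, float]:
    """Like a plain UPDATE: wait until the holder finishes."""
    start = time.monotonic()
    lock.acquire()
    lock.release()
    return True, time.monotonic() - start


def demo(hold_seconds: float = 0.3):
    lock = threading.Lock()
    ready = threading.Event()
    holder = threading.Thread(target=hold_lock, args=(lock, hold_seconds, ready))
    holder.start()
    ready.wait()                       # the other session now holds the lock
    nowait = nowait_attempt(lock)      # returns immediately with got=False
    blocking = blocking_attempt(lock)  # waits roughly hold_seconds
    holder.join()
    return nowait, blocking


if __name__ == "__main__":
    (got, t_nowait), (_, t_block) = demo()
    print(f"NOWAIT: got={got}, waited {t_nowait:.3f}s; blocking waited {t_block:.3f}s")
```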

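But this approach also has a drawback. If several units of a product can be sold, storing all of them in one record lowers the concurrency of the flash sale, because we are relying on a row lock.

There are many ways to solve this, and all of them ultimately aim to raise concurrency. For example:

1. Segmented flash sale: split the product's stock into several segments (for example, several records) so that requests can be processed in parallel.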
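Segmentation can be sketched with another toy Python model. The assumptions are:

- The stock is split across several hypothetical `Segment` objects, each with its own lock (think: one row per segment).
- Each buyer starts at a random segment and skips busy ones NOWAIT-style.

This is an illustration of the idea, not a schema from the original.

```python
import random
import threading


class Segment:
    """One hypothetical stock segment (think: one row holding part of the stock)."""

    def __init__(self, units: int) -> None:
        self.remaining = units
        self.lock = threading.Lock()


def make_segments(total: int, parts: int) -> list[Segment]:
    # Spread `total` units as evenly as possible over `parts` segments.
    base, extra = divmod(total, parts)
    return [Segment(base + (1 if i < extra else 0)) for i in range(parts)]


def try_buy(segments: list[Segment], rng: random.Random) -> bool:
    # Start at a random segment so concurrent buyers spread out, and skip any
    # segment that is busy instead of waiting for it (NOWAIT-style).
    start = rng.randrange(len(segments))
    for k in range(len(segments)):
        seg = segments[(start + k) % len(segments)]
        if seg.lock.acquire(blocking=False):
            try:
                if seg.remaining > 0:
                    seg.remaining -= 1
                    return True
            finally:
                seg.lock.release()
    # Either sold out, or every segment with stock was busy at this instant.
    return False


def run_sale(total: int, parts: int, buyers: int) -> tuple[int, int]:
    segments = make_segments(total, parts)
    wins: list[bool] = []
    wins_lock = threading.Lock()

    def buyer(seed: int) -> None:
        ok = try_buy(segments, random.Random(seed))
        with wins_lock:
            wins.append(ok)

    threads = [threading.Thread(target=buyer, args=(i,)) for i in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(wins), sum(s.remaining for s in segments)
```

With `parts` segments, up to `parts` buyers can hold a lock at the same time instead of one. In this model, a buyer can be refused while stock remains if every stocked segment happens to be busy at that instant, so a client might retry.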

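Overall, the optimization strategy is to cut lock waiting time, avoid serialization, and parallelize as much as possible.

Does the optimization end here? Clearly not. Any database can do everything above; if we stopped now, where would PostgreSQL's distinctive features come in?

PostgreSQL also provides another lock type, the advisory lock. It is lighter-weight than a row lock and comes in session-level and transaction-level variants. Note, however, that the lock IDs share one global namespace, so different uses can interfere with one another. Every ID used in the flash sale, or anywhere else advisory locks are used, must be unique within the database.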

Example:

update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5 and pg_try_advisory_xact_lock(:id);
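The Python class below is a single-threaded toy model of transaction-level advisory locks. It shows two behaviours:

- `pg_try_advisory_xact_lock` never waits, and the lock lasts until the transaction ends.
- Lock keys share one namespace per database, which is why the IDs must be unique.

`AdvisoryLockTable` and the transaction labels are hypothetical. The model ignores how PostgreSQL actually stores these locks.

```python
class AdvisoryLockTable:
    """Toy, single-threaded model of transaction-level advisory locks in one database.

    Hypothetical sketch: keys share one namespace per database, a lock is owned by
    a transaction, and it is released only when that transaction ends.
    """

    def __init__(self) -> None:
        self._owner: dict[int, str] = {}  # lock key -> owning transaction

    def try_xact_lock(self, xid: str, key: int) -> bool:
        # Mimics pg_try_advisory_xact_lock(key): never waits, returns True/False.
        owner = self._owner.setdefault(key, xid)
        return owner == xid  # re-requesting a lock you already hold succeeds

    def end_transaction(self, xid: str) -> None:
        # COMMIT or ROLLBACK releases every xact-level lock the transaction holds.
        for key in [k for k, owner in self._owner.items() if owner == xid]:
            del self._owner[key]


if __name__ == "__main__":
    locks = AdvisoryLockTable()
    # A hypothetical unrelated job that also happens to use key 1:
    locks.try_xact_lock("report_job", 1)
    # The flash sale for item id=1 is now refused even though nobody is buying it.
    print(locks.try_xact_lock("buyer_tx", 1))  # False
```

The demo at the bottom is the interference the warning above describes. One convention for keeping keys apart, not something the original prescribes, is PostgreSQL's two-key form `pg_try_advisory_xact_lock(int, int)`, with the first key used as a per-purpose namespace.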

Finally, we need to compare the performance of FOR UPDATE NOWAIT and advisory locks.

The following tests were run on a local virtual machine.

Create a flash-sale table:

postgres=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 info   | text    |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (id)

The table holds only one record, which is updated over and over:

postgres=# select * from t1;
 id |             info
----+-------------------------------
  1 | 2015-09-14 09:47:04.703904+08
(1 row)

Benchmark of the FOR UPDATE NOWAIT approach:

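CREATE OR REPLACE FUNCTION public.f1(i_id integer)
RETURNS void
LANGUAGE plpgsql
AS $function$
declare
begin
  -- Lock the row; NOWAIT raises an error at once if another session holds it.
  perform 1 from t1 where id=i_id for update nowait;
  update t1 set info=now()::text where id=i_id;
exception when others then
  -- Lock not obtained (or any other error): give up quietly.
  return;
end;
$function$;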

postgres@digoal-> cat test1.sql
\setrandom id 1 1
select f1(:id);

Benchmark of the advisory lock approach:

postgres@digoal-> cat test.sql
\setrandom id 1 1
update t1 set info=now()::text where id=:id and pg_try_advisory_xact_lock(:id);

Reset the statistics counters before benchmarking:

postgres=# select pg_stat_reset();
 pg_stat_reset
---------------

(1 row)

postgres=# select * from pg_stat_all_tables where relname='t1';
-[ RECORD 1 ]-------+-------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 0
idx_tup_fetch       | 0
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

Benchmark results:

postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 20 -j 20 -T 60
......
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 20
number of threads: 20
duration: 60 s
number of transactions actually processed: 792029
latency average: 1.505 ms
latency stddev: 4.275 ms
tps = 13196.542846 (including connections establishing)
tps = 13257.270709 (excluding connections establishing)
statement latencies in milliseconds:
        0.002625        \setrandom id 1 1
        1.502420        select f1(:id);

postgres=# select * from pg_stat_all_tables where relname='t1';
-[ RECORD 1 ]-------+-------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 896963  // mostly wasted work
idx_tup_fetch       | 896963  // mostly wasted work
n_tup_ins           | 0
n_tup_upd           | 41775
n_tup_del           | 0
n_tup_hot_upd       | 41400
n_live_tup          | 0
n_dead_tup          | 928
n_mod_since_analyze | 41774
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 20 -j 20 -T 60
......
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 20
number of threads: 20
duration: 60 s
number of transactions actually processed: 1392372
latency average: 0.851 ms
latency stddev: 2.475 ms
tps = 23194.831054 (including connections establishing)
tps = 23400.411501 (excluding connections establishing)
statement latencies in milliseconds:
        0.002594        \setrandom id 1 1
        0.848536        update t1 set info=now()::text where id=:id and pg_try_advisory_xact_lock(:id);

postgres=# select * from pg_stat_all_tables where relname='t1';
-[ RECORD 1 ]-------+--------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 1368933  // mostly wasted work
idx_tup_fetch       | 1368933  // mostly wasted work
n_tup_ins           | 0
n_tup_upd           | 54957
n_tup_del           | 0
n_tup_hot_upd       | 54489
n_live_tup          | 0
n_dead_tup          | 1048
n_mod_since_analyze | 54957
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

Note that whichever method is used, most index scans are wasted work: the row is fetched but not updated.
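The size of the waste can be read off the two pg_stat_all_tables outputs above. Below is a quick calculation that uses the ratio n_tup_upd / idx_scan as a rough measure of useful scans; the counter values are copied from those outputs. The measure is approximate: f1 performs two index scans (the SELECT ... FOR UPDATE and the UPDATE) when it succeeds.

```python
# Counters copied from the two pg_stat_all_tables outputs above.
runs = {
    "f1: FOR UPDATE NOWAIT": {"idx_scan": 896963, "n_tup_upd": 41775},
    "inline pg_try_advisory_xact_lock": {"idx_scan": 1368933, "n_tup_upd": 54957},
}


def useful_fraction(stats: dict[str, int]) -> float:
    """Share of index scans that actually ended in an update."""
    return stats["n_tup_upd"] / stats["idx_scan"]


for name, stats in runs.items():
    print(f"{name}: {useful_fraction(stats):.1%} of index scans updated the row")
# f1: FOR UPDATE NOWAIT: 4.7% of index scans updated the row
# inline pg_try_advisory_xact_lock: 4.0% of index scans updated the row
```

In both runs, more than 95% of index scans find the row but never update it.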

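To eliminate these wasted scans, the following function can be used. (Of course, an even better approach would be one that is transparent to users.)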

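CREATE OR REPLACE FUNCTION public.f(i_id integer)
RETURNS void
LANGUAGE plpgsql
AS $function$
declare
  a_lock boolean := false;
begin
  -- Try the advisory lock first; it returns false immediately instead of waiting.
  select pg_try_advisory_xact_lock(i_id) into a_lock;
  -- Only the session that got the lock touches the row, so losers never scan t1.
  if a_lock then
    update t1 set info=now()::text where id=i_id;
  end if;
exception when others then
  return;
end;
$function$;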
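The function removes the waste by changing the order of operations. The inline UPDATE locates the row through the index before its WHERE clause evaluates `pg_try_advisory_xact_lock`. Function f tries the lock first and touches t1 only on success.

The Python counting model below makes that concrete. Its workload, in which the lock is free on one attempt in twenty, is an arbitrary assumption and not derived from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    idx_scan: int = 0   # index scans on t1
    n_tup_upd: int = 0  # rows actually updated


def scan_then_lock(lock_free: bool, c: Counters) -> None:
    # UPDATE t1 ... WHERE id=:id AND pg_try_advisory_xact_lock(:id)
    c.idx_scan += 1       # the row is found through the index first...
    if lock_free:         # ...and only then is the lock condition evaluated
        c.n_tup_upd += 1


def lock_then_scan(lock_free: bool, c: Counters) -> None:
    # function f: pg_try_advisory_xact_lock(i_id) first, UPDATE only on success
    if lock_free:
        c.idx_scan += 1
        c.n_tup_upd += 1


def simulate(strategy, attempts: int, free_every: int) -> Counters:
    """Hypothetical workload: the lock is free on every `free_every`-th attempt."""
    c = Counters()
    for i in range(attempts):
        strategy(i % free_every == 0, c)
    return c


if __name__ == "__main__":
    print(simulate(scan_then_lock, attempts=100, free_every=20))
    # Counters(idx_scan=100, n_tup_upd=5)
    print(simulate(lock_then_scan, attempts=100, free_every=20))
    # Counters(idx_scan=5, n_tup_upd=5)
```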

Benchmark results with function f (the pgbench script calls select f(:id);):
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 20
number of threads: 20
duration: 60 s
number of transactions actually processed: 1217195
latency average: 0.973 ms
latency stddev: 3.563 ms
tps = 20283.314001 (including connections establishing)
tps = 20490.143363 (excluding connections establishing)
statement latencies in milliseconds:
        0.002703        \setrandom id 1 1
        0.970209        select f(:id);

postgres=# select * from pg_stat_all_tables where relname='t1';
-[ RECORD 1 ]-------+-------
relid               | 184731
schemaname          | public
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 75927
idx_tup_fetch       | 75927
n_tup_ins           | 0
n_tup_upd           | 75927
n_tup_del           | 0
n_tup_hot_upd       | 75902
n_live_tup          | 0
n_dead_tup          | 962
n_mod_since_analyze | 75927
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

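Compared with FOR UPDATE NOWAIT, throughput is higher (20490 versus 13257 TPS). The number of rows actually updated is also higher: n_tup_upd is 75927, versus 41775 for f1 and 54957 for the inline advisory-lock UPDATE. idx_scan now equals n_tup_upd, so no scan is wasted. This approach therefore does more than reduce waiting latency; it also increases real processing capacity.

Finally, for reference, here are results from a physical machine, with 128 concurrent connections updating the same record:

Throughput without any optimization: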

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 128
number of threads: 128
duration: 100 s
number of transactions actually processed: 285673
latency average: 44.806 ms
latency stddev: 45.751 ms
tps = 2855.547375 (including connections establishing)
tps = 2855.856976 (excluding connections establishing)
statement latencies in milliseconds:
        0.002509        \setrandom id 1 1
        44.803299       update t1 set info=now()::text where id=:id;

Throughput with FOR UPDATE NOWAIT:

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 128
number of threads: 128
duration: 100 s
number of transactions actually processed: 6663253
latency average: 1.919 ms
latency stddev: 2.804 ms
tps = 66623.169445 (including connections establishing)
tps = 66630.307999 (excluding connections establishing)
statement latencies in milliseconds:
        0.001934        \setrandom id 1 1
        1.917297        select f1(:id);

Throughput with advisory locks:

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 128
number of threads: 128
duration: 100 s
number of transactions actually processed: 19154754
latency average: 0.667 ms
latency stddev: 1.054 ms
tps = 191520.550924 (including connections establishing)
tps = 191546.208051 (excluding connections establishing)
statement latencies in milliseconds:
        0.002085        \setrandom id 1 1
        0.664420        select f(:id);

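Measured in TPS, advisory locks deliver about 67 times the throughput of the unoptimized run (an increase of about 66x). They deliver about 2.9 times the throughput of FOR UPDATE NOWAIT (an increase of about 1.9x).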
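These ratios follow directly from the TPS figures above (excluding connection establishment). Note the distinction between "N times the throughput" (new / old) and "an increase of N times" (new / old - 1). The quick check below uses only numbers copied from the three 128-client runs.

```python
# TPS (excluding connection establishment) from the 128-client physical-machine runs.
tps = {
    "no optimization": 2855.856976,
    "FOR UPDATE NOWAIT (f1)": 66630.307999,
    "advisory lock (f)": 191546.208051,
}


def ratio(fast: str, slow: str) -> float:
    """How many times the throughput of `slow` the `fast` run achieved."""
    return tps[fast] / tps[slow]


vs_none = ratio("advisory lock (f)", "no optimization")
vs_nowait = ratio("advisory lock (f)", "FOR UPDATE NOWAIT (f1)")
print(f"{vs_none:.1f}x the throughput, i.e. an increase of {vs_none - 1:.1f}x")      # 67.1x, 66.1x
print(f"{vs_nowait:.2f}x the throughput, i.e. an increase of {vs_nowait - 1:.2f}x")  # 2.87x, 1.87x
```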

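This optimization tells users right away whether they have won the item, instead of making them wait for other users' updates to finish before finding out. As a result, it greatly reduces response time (RT) and raises throughput.

[References]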

1. http://www.postgresql.org/docs/9.5/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
