PostgreSQL Operators and the Optimizer Explained

PostgreSQL supports user-defined operators. Under the hood, an operator is simply a call to a function.

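The syntax is as follows:

CREATE OPERATOR name (
    PROCEDURE = function_name
    [, LEFTARG = left_type ] [, RIGHTARG = right_type ]
    [, COMMUTATOR = com_op ] [, NEGATOR = neg_op ]
    [, RESTRICT = res_proc ] [, JOIN = join_proc ]
    [, HASHES ] [, MERGES ]
)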

For example, let's create an operator that returns the average of two values.

First, create the function:

postgres=# create function f_avg(numeric,numeric) returns numeric as $$
postgres$# select ($1+$2)/2;
postgres$# $$ language sql strict;
CREATE FUNCTION

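Verify the function. Because it is declared STRICT, a NULL argument makes it return NULL, which psql shows as an empty row: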

postgres=# select f_avg(1,null);
 f_avg
-------

(1 row)

postgres=# select f_avg(1,2);
       f_avg
--------------------
 1.5000000000000000
(1 row)

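Create the operator, specifying the left and right argument types and the function to call. COMMUTATOR is an optimizer-related option that I will cover in detail later: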

postgres=# create operator ## (procedure=f_avg, leftarg=numeric, rightarg=numeric, commutator='##');
CREATE OPERATOR
postgres=# select 1 ## 2;
      ?column?
--------------------
 1.5000000000000000
(1 row)
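To see why ## can be its own commutator, here is a small Python model of f_avg. It is an illustration only, not how PostgreSQL evaluates SQL functions. It assumes Python's `Decimal` as a stand-in for `numeric` and `None` as a stand-in for NULL, and it mimics STRICT by returning `None` whenever an input is `None`. The printed value looks different from psql (1.5 vs 1.5000000000000000) because PostgreSQL picks the scale of a numeric division result by its own rules.

```python
from decimal import Decimal
from itertools import product
from typing import Optional


def f_avg(a: Optional[Decimal], b: Optional[Decimal]) -> Optional[Decimal]:
    """Model of the SQL function f_avg; None stands in for NULL."""
    if a is None or b is None:
        # STRICT: PostgreSQL returns NULL without running the function body.
        return None
    return (a + b) / 2


def is_own_commutator(fn, samples) -> bool:
    """True if fn(x, y) == fn(y, x) for every pair of sample inputs."""
    return all(fn(x, y) == fn(y, x) for x, y in product(samples, repeat=2))


samples = [Decimal(1), Decimal(2), Decimal("3.5"), None]
print(f_avg(Decimal(1), Decimal(2)))        # 1.5
print(f_avg(Decimal(1), None))              # None
print(is_own_commutator(f_avg, samples))    # True
```

A check over a handful of samples is not a proof, but it shows the property that `commutator='##'` asserts to the optimizer: swapping the operands never changes the result.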

Note that the CREATE OPERATOR syntax has six optimizer-related clauses:

[, COMMUTATOR = com_op ] [, NEGATOR = neg_op ]
[, RESTRICT = res_proc ] [, JOIN = join_proc ]
[, HASHES ] [, MERGES ]

They are described below.

In what follows, x denotes the operand on the left of the operator and y the operand on the right.

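1. COMMUTATOR states that x op1 y is equivalent to y op2 x. In other words, swapping the operands and using op2 instead of op1 returns the same result. For example, 2>1 and 1<2 give the same result, so > is the commutator of < and vice versa. Likewise, 1+2 and 2+1 are equivalent, so + is its own commutator. You only need to specify the commutator when creating one operator of the pair. When you create the other one you can omit it, and PostgreSQL fills in the reverse link automatically. For example, if COMMUTATOR = < is given when creating >, there is no need to declare > as the commutator when creating <.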

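Also note the type rule. The left operand type of an operator must be the right operand type of its commutator, and vice versa, so that both x op1 y and y op2 x are well-typed. The two operand types only have to be identical when an operator is its own commutator, as with ## above (numeric on both sides).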
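The automatic reverse link and the type rule can be pictured with a toy catalog. This is a hypothetical Python model: the `Operator` class and `create_operator` function are invented for illustration, and the model only loosely mirrors how PostgreSQL records a not-yet-defined commutator as a "shell" operator and links both directions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (operator name, left type, right type)


@dataclass
class Operator:
    name: str
    left: str
    right: str
    commutator: Optional[Key] = None


catalog: Dict[Key, Operator] = {}


def create_operator(name: str, left: str, right: str,
                    commutator: Optional[str] = None) -> Operator:
    """Hypothetical CREATE OPERATOR: register the operator and link its commutator both ways."""
    key = (name, left, right)
    op = catalog.setdefault(key, Operator(name, left, right))
    if commutator is not None:
        # "x op y" == "y com x", so com takes the operand types in swapped order.
        com_key = (commutator, right, left)
        # If com does not exist yet, add a placeholder (PostgreSQL calls this a shell operator).
        com = catalog.setdefault(com_key, Operator(commutator, right, left))
        op.commutator = com_key
        com.commutator = key  # the reverse link is filled in automatically
    return op


# int4 > int8 has int8 < int4 as its commutator: the operand types differ and are swapped.
create_operator(">", "int4", "int8", commutator="<")
print(catalog[("<", "int8", "int4")].commutator)   # ('>', 'int4', 'int8')

# ## on (numeric, numeric) is its own commutator, which forces equal operand types.
print(create_operator("##", "numeric", "numeric", commutator="##").commutator)
# ('##', 'numeric', 'numeric')
```

Creating `<` on (int8, int4) afterwards without a COMMUTATOR clause keeps the link that was recorded earlier. This matches the behavior described above, where only one side of the pair needs to declare it.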

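How does the optimizer use the commutator? Take index scans: to use an index, the indexed column must appear on the left side of the operator. Consider the condition 1 > tbl.c. If > has no commutator, the planner cannot rewrite it as tbl.c < 1, so the index on tbl.c cannot be used.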
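The rewrite described above can be sketched in a few lines of Python. This is a simplified model, not planner code. The clause representation, the `"col:"` prefix for column references, and the commutator table are all hypothetical. The real planner also checks that the commuted operator belongs to the index's operator family, which this sketch ignores.

```python
from typing import Dict, Optional, Tuple

# (left operand, operator, right operand); column references are written "col:<name>".
Clause = Tuple[str, str, str]

# Hypothetical stand-in for pg_operator.oprcom for the int4 comparison operators.
INT4_COMMUTATORS: Dict[str, str] = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "=": "="}


def is_column(operand: str) -> bool:
    return operand.startswith("col:")


def indexable_form(clause: Clause, commutators: Dict[str, str]) -> Optional[Clause]:
    """Return the clause as "column op constant", or None if it cannot be put in that form."""
    left, op, right = clause
    if is_column(left) and not is_column(right):
        return clause                  # already "column op constant"
    if is_column(right) and not is_column(left):
        com = commutators.get(op)
        if com is None:
            return None                # no commutator: the clause stays a plain filter
        return (right, com, left)      # "constant op column" -> "column com constant"
    return None                        # constant vs constant, or column vs column


print(indexable_form(("10", ">", "col:id"), INT4_COMMUTATORS))  # ('col:id', '<', '10')
print(indexable_form(("10", ">", "col:id"), {}))                # None
```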

Let's experiment with the int4 > and < operators.

In PostgreSQL, > and < are a commutator pair:

postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;
    oprcom
--------------
 pg_catalog.<
(1 row)

postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;
    oprcom
--------------
 pg_catalog.>
(1 row)

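Note down their oprcom values, which are the OIDs of their commutators, so that the relationship can be restored later. In the output below, oprcom is 97 for > and 521 for <: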

postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;
 oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprcode |   oprrest   |     oprjoin
---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+---------+-------------+-----------------
 >       |           11 |       10 | b       | f           | f          |      23 |       23 |        16 |     97 |       523 | int4gt  | scalargtsel | scalargtjoinsel
(1 row)

postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;
 oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprcode |   oprrest   |     oprjoin
---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+---------+-------------+-----------------
 <       |           11 |       10 | b       | f           | f          |      23 |       23 |        16 |    521 |       525 | int4lt  | scalarltsel | scalarltjoinsel
(1 row)

Next, break their commutator relationship by updating pg_operator and setting oprcom to 0:

postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc;
UPDATE 1
postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc;
UPDATE 1

Create a test table, insert test data, and create an index:

postgres=# create table tbl(id int);
CREATE TABLE
postgres=# insert into tbl select generate_series(1,100000);
INSERT 0 100000
postgres=# create index idx_tbl_id on tbl(id);
CREATE INDEX

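With the column on the left side of the condition, the index is used. With the column on the right side, it is not, because the optimizer no longer knows that > and < are commutators: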

postgres=# explain select * from tbl where id<10;
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.45 rows=9 width=4)
   Index Cond: (id < 10)
(2 rows)

postgres=# explain select * from tbl where 10>id;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..1361.00 rows=33333 width=4)
   Filter: (10 > id)
(2 rows)

After restoring the commutator relationship between the two operators, the optimizer automatically rewrites 10>id as id<10 and uses the index again:

postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;
UPDATE 1
postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;
UPDATE 1
postgres=# explain select * from tbl where 10>id;
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.45 rows=9 width=4)
   Index Cond: (id < 10)
(2 rows)

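2. NEGATOR states that x op1 y is equivalent to NOT (x op2 y), with the operands kept in the same order. For unary operators the same idea applies: x op1 is equivalent to NOT (x op2) for a postfix operator, and op1 x is equivalent to NOT (op2 x) for a prefix operator. Negators therefore apply to both unary and binary operators.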

Example:

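Since = and <> are a negator pair, NOT (x = y) can be simplified to x <> y, and likewise NOT (x <> y) to x = y. That is why not(10<>id) gets the same index plan as 10=id: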

postgres=# explain select * from tbl where 10=id;
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.31 rows=1 width=4)
   Index Cond: (id = 10)
(2 rows)

postgres=# explain select * from tbl where not(10<>id);
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.31 rows=1 width=4)
   Index Cond: (id = 10)
(2 rows)

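In addition, a negator must take the same left and right operand types as the operator itself, and negators only apply to operators that return boolean.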
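The negator rule and the NOT simplification it enables can be checked with a short Python model. The `OPS` table is a hypothetical stand-in for oprcode/oprnegate on integer comparisons, and `simplify_not` only illustrates the kind of rewrite the planner performs when it finds a negator. Note that operands keep their order, which is what distinguishes a negator from a commutator.

```python
import operator
from itertools import product
from typing import Callable, Dict, Tuple

# Hypothetical model of oprcode/oprnegate: name -> (implementation, negator name).
OPS: Dict[str, Tuple[Callable[[int, int], bool], str]] = {
    "=": (operator.eq, "<>"),
    "<>": (operator.ne, "="),
    "<": (operator.lt, ">="),
    ">=": (operator.ge, "<"),
    ">": (operator.gt, "<="),
    "<=": (operator.le, ">"),
}


def is_negator(a: str, b: str, values) -> bool:
    """Check (x a y) == NOT (x b y) for all sample pairs; operands keep their order."""
    fa, fb = OPS[a][0], OPS[b][0]
    return all(fa(x, y) == (not fb(x, y)) for x, y in product(values, repeat=2))


def simplify_not(op: str, x, y):
    """Rewrite NOT (x op y) as (x negator y), using the negator recorded for op."""
    return (x, OPS[op][1], y)


values = range(-2, 3)
print(all(is_negator(name, neg, values) for name, (_, neg) in OPS.items()))  # True
print(is_negator("<", ">", values))        # False: > is the commutator of <, not its negator
print(simplify_not("<>", "10", "id"))      # ('10', '=', 'id')
```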

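3. RESTRICT names the function that estimates the selectivity of a restriction clause. It only makes sense for binary operators that return boolean. For a condition such as WHERE col > 100, how is the selectivity estimated? Through the operator's RESTRICT function. Multiplying the selectivity by the table's row count (pg_class.reltuples) gives the estimated number of rows matching the condition.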
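The multiplication is easy to check against the earlier experiment. Without a commutator, the Seq Scan plan for 10>id estimated rows=33333 on a 100000-row table. That number is consistent with scalargtsel finding the column on the right, being unable to flip the clause, and falling back to PostgreSQL's default inequality selectivity of 1/3 (DEFAULT_INEQ_SEL in selfuncs.h). The Python sketch below assumes a static reltuples value. The real planner scales the row count to the table's current size, and it rounds and clamps the result much like `estimate_rows` does.

```python
# Value of DEFAULT_INEQ_SEL in PostgreSQL's selfuncs.h: the fallback selectivity
# for an inequality clause when no usable statistics apply.
DEFAULT_INEQ_SEL = 0.3333333333333333


def estimate_rows(selectivity: float, reltuples: float) -> float:
    """Estimated rows = selectivity * table rows, rounded and clamped to at least 1."""
    rows = selectivity * reltuples
    return 1.0 if rows <= 1.0 else float(round(rows))


# 10 > id with no commutator: the estimator cannot flip the clause, so it uses 1/3.
print(estimate_rows(DEFAULT_INEQ_SEL, 100000))  # 33333.0
```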

The selectivity functions are in src/backend/utils/adt/, including:

-rw-r--r--. 1 1107 1107  33191 Jun 10 03:29 array_selfuncs.c
-rw-r--r--. 1 1107 1107   2316 Jun 10 03:29 geo_selfuncs.c
-rw-r--r--. 1 1107 1107    720 Jun 10 03:29 network_selfuncs.c
-rw-r--r--. 1 1107 1107  33895 Jun 10 03:29 rangetypes_selfuncs.c
-rw-r--r--. 1 1107 1107 218809 Jun 10 03:29 selfuncs.c

Selectivity functions also rely on the database's statistics to compute their estimates. The common selectivity estimation functions are:

postgres=# select distinct oprrest from pg_operator order by 1;
   oprrest
--------------
 -              (no estimator)
 eqsel          (equal)
 neqsel         (not equal)
 scalarltsel    (less than / less than or equal)
 scalargtsel    (greater than / greater than or equal)
 areasel
 positionsel
 contsel
 iclikesel
 icnlikesel
 regexeqsel
 likesel
 icregexeqsel
 regexnesel
 nlikesel
 icregexnesel
 rangesel
 networksel
 tsmatchsel
 arraycontsel
(20 rows)

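If you define your own data type, you can also write your own selectivity functions. Alternatively, you can reuse the standard ones above, though you may need to implement a conversion for your type.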
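As a rough idea of what a restriction estimator for `<` computes, here is a toy Python version based on an equi-depth histogram like the one exposed in pg_stats.histogram_bounds. It is loosely modeled on the linear interpolation in PostgreSQL's ineq_histogram_selectivity, but it is a simplified sketch. It ignores most-common values, NULLs, and the other corrections the real code applies, and the histogram in the usage line is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def histogram_lt_selectivity(bounds: Sequence[float], value: float) -> float:
    """Estimate the fraction of rows with col < value from equi-depth histogram bounds.

    Each of the len(bounds) - 1 buckets is assumed to hold the same share of rows,
    with values spread linearly inside a bucket. MCVs and NULLs are ignored.
    """
    nbuckets = len(bounds) - 1
    if nbuckets < 1:
        raise ValueError("need at least two histogram bounds")
    if value <= bounds[0]:
        return 0.0
    if value >= bounds[-1]:
        return 1.0
    i = bisect_right(bounds, value) - 1    # bucket with bounds[i] <= value < bounds[i + 1]
    lo, hi = bounds[i], bounds[i + 1]
    return (i + (value - lo) / (hi - lo)) / nbuckets


# Hypothetical histogram for ids 0..100000 split into 100 equal buckets.
bounds = list(range(0, 100001, 1000))
print(histogram_lt_selectivity(bounds, 50000))   # 0.5
```

Multiplying the returned fraction by the table's row count gives the row estimate, as described for RESTRICT above.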

The explanation in the source code:

src/backend/utils/adt/selfuncs.c

/*----------
 * Operator selectivity estimation functions are called to estimate the
 * selectivity of WHERE clauses whose top-level operator is their operator.
 * We divide the problem into two cases:
 *     Restriction clause estimation: the clause involves vars of just
 *         one relation.  [Note: the selectivity, as a fraction, of a WHERE condition.]
 *     Join clause estimation: the clause involves vars of multiple rels.
 * Join selectivity estimation is far more difficult and usually less accurate
 * than restriction estimation.
 *
 * When dealing with the inner scan of a nestloop join, we consider the
 * join's joinclauses as restriction clauses for the inner relation, and
 * treat vars of the outer relation as parameters (a/k/a constants of unknown
 * values).  So, restriction estimators need to be able to accept an argument
 * telling which relation is to be treated as the variable.
 * [Note: in a nested loop join, the inner table's columns are the variables,
 * while the outer table's columns act like constants whose values are unknown.]
 *
 * The call convention for a restriction estimator (oprrest function) is
 *
 *     Selectivity oprrest (PlannerInfo *root,
 *                          Oid operator,
 *                          List *args,
 *                          int varRelid);
 *
 * [Note: the estimator takes four parameters.]
 * root: general information about the query (rtable and RelOptInfo lists
 * are particularly important for the estimator).  [Note: the PlannerInfo.]
 * operator: OID of the specific operator in question.
 * args: argument list from the operator clause.
 * varRelid: if not zero, the relid (rtable index) of the relation to
 * be treated as the variable relation.  May be zero if the args list
 * is known to contain vars of only one relation.
 * [Note: i.e. which relation the clause's variables are taken from.]
 *
 * This is represented at the SQL level (in pg_proc) as
 *
 *     float8 oprrest (internal, oid, internal, int4);
 *
 * [Note: this is the pg_proc signature of the function named by oprrest.]
 * The result is a selectivity, that is, a fraction (0 to 1) of the rows
 * of the relation that are expected to produce a TRUE result for the
 * given operator.  [Note: multiplying it by pg_class.reltuples gives a row count.]
 *
 * The call convention for a join estimator (oprjoin function) is similar
 * except that varRelid is not needed, and instead join information is
 * supplied:
 * [Note: compared with oprrest, there is no varRelid, and join information is added.]
 *
 *     Selectivity oprjoin (PlannerInfo *root,
 *                          Oid operator,
 *                          List *args,
 *                          JoinType jointype,
 *                          SpecialJoinInfo *sjinfo);
 *
 *     float8 oprjoin (internal, oid, internal, int2, internal);
 *
 * (Before Postgres 8.4, join estimators had only the first four of these
 * parameters.  That signature is still allowed, but deprecated.)  The
 * relationship between jointype and sjinfo is explained in the comments for
 * clause_selectivity() --- the short version is that jointype is usually
 * best ignored in favor of examining sjinfo.
 *
 * Join selectivity for regular inner and outer joins is defined as the
 * fraction (0 to 1) of the cross product of the relations that is expected
 * to produce a TRUE result for the given operator.  For both semi and anti
 * joins, however, the selectivity is defined as the fraction of the left-hand
 * side relation's rows that are expected to have a match (ie, at least one
 * row with a TRUE result) in the right-hand side.
 *
 * For both oprrest and oprjoin functions, the operator's input collation OID
 * (if any) is passed using the standard fmgr mechanism, so that the estimator
 * function can fetch it with PG_GET_COLLATION().  Note, however, that all
 * statistics in pg_statistic are currently built using the database's default
 * collation.  Thus, in most cases where we are looking at statistics, we
 * should ignore the actual operator collation and use DEFAULT_COLLATION_OID.
 * We expect that the error induced by doing this is usually not large enough
 * to justify complicating matters.
 *----------
 */
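The two definitions of join selectivity in that comment can be made concrete by computing them exactly on small in-memory data. Estimators approximate these same fractions from statistics instead of scanning the data. The helper names and sample rows below are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence


def inner_join_selectivity(left: Sequence, right: Sequence, op: Callable) -> float:
    """Fraction of the cross product for which op(l, r) is true (inner/outer join definition)."""
    pairs = list(product(left, right))
    return sum(1 for l, r in pairs if op(l, r)) / len(pairs)


def semi_join_selectivity(left: Sequence, right: Sequence, op: Callable) -> float:
    """Fraction of left-hand rows with at least one match on the right (semi/anti join definition)."""
    return sum(1 for l in left if any(op(l, r) for r in right)) / len(left)


left = [1, 2, 3, 4]
right = [2, 2, 3, 9, 9]
eq = lambda a, b: a == b
print(inner_join_selectivity(left, right, eq))  # 0.15 (3 matching pairs out of 20)
print(semi_join_selectivity(left, right, eq))   # 0.5  (2 of the 4 left rows match)
```

The duplicate 2 on the right counts twice in the inner-join fraction but only once in the semi-join fraction. That difference is the reason the two cases are defined separately.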

4. JOIN names the join selectivity estimation function (one of the joinsel functions). It is stored in pg_operator.oprjoin:

postgres=# select distinct oprjoin from pg_operator order by 1;
     oprjoin
------------------
 -
 eqjoinsel
 neqjoinsel
 scalarltjoinsel
 scalargtjoinsel
 areajoinsel
 positionjoinsel
 contjoinsel
 iclikejoinsel
 icnlikejoinsel
 regexeqjoinsel
 likejoinsel
 icregexeqjoinsel
 regexnejoinsel
 nlikejoinsel
 icregexnejoinsel
 networkjoinsel
 tsmatchjoinsel
 arraycontjoinsel
(19 rows)
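To give a feel for what eqjoinsel computes, here is a simplified Python version of the formula it falls back to, as I understand selfuncs.c, when most-common-value lists are unavailable. The non-null fraction of each side is divided by the larger number of distinct values. Treat it as a sketch under that assumption. The real function has a more elaborate MCV-matching path, and the function name and table sizes below are hypothetical.

```python
def eqjoinsel_fallback(nullfrac1: float, ndistinct1: float,
                       nullfrac2: float, ndistinct2: float) -> float:
    """Simplified equi-join selectivity when no MCV statistics are available.

    Each non-null row is assumed to match one value out of the larger
    distinct-value set.
    """
    return (1.0 - nullfrac1) * (1.0 - nullfrac2) / max(ndistinct1, ndistinct2)


# Two hypothetical 100000-row tables joined on a unique, non-null id column.
sel = eqjoinsel_fallback(0.0, 100000, 0.0, 100000)
print(sel * 100000 * 100000)   # estimated join rows, about 100000
```

Because join selectivity is a fraction of the cross product, the estimate is multiplied by the product of both table sizes rather than by a single reltuples.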

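5. HASHES

6. MERGES

HASHES and MERGES indicate whether the operator may be used for hash joins and merge joins. They only make sense for binary operators that return boolean. In practice the operator must also represent equality and belong to a hash operator family (for HASHES) or a B-tree operator family (for MERGES).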
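Why equality? A hash join finds matches by probing a single hash bucket, and a merge join walks two sorted inputs in step. Both strategies only work when "matching" means "equal". The following minimal Python sketch shows both strategies on in-memory lists. The helper names and rows are hypothetical, and the code is not PostgreSQL's executor.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, str]  # (join key, payload)


def hash_join(outer: Sequence[Row], inner: Sequence[Row]) -> List[Tuple[str, str]]:
    """Equi-join: build a hash table on the inner side, probe it with each outer key."""
    table: Dict[Hashable, List[str]] = defaultdict(list)
    for key, payload in inner:
        table[key].append(payload)
    # Only rows with an equal key land in the probed bucket; a "<" join could match any bucket.
    return [(o, i) for key, o in outer for i in table.get(key, [])]


def merge_join(outer: Sequence[Row], inner: Sequence[Row]) -> List[Tuple[str, str]]:
    """Equi-join: sort both sides by key and advance through them in step."""
    outer, inner = sorted(outer), sorted(inner)
    i = j = 0
    result: List[Tuple[str, str]] = []
    while i < len(outer) and j < len(inner):
        ko, ki = outer[i][0], inner[j][0]
        if ko < ki:
            i += 1
        elif ko > ki:
            j += 1
        else:
            # Find the run of inner rows sharing this key, then pair it with each equal outer row.
            j_end = j
            while j_end < len(inner) and inner[j_end][0] == ko:
                j_end += 1
            while i < len(outer) and outer[i][0] == ko:
                result.extend((outer[i][1], inner[k][1]) for k in range(j, j_end))
                i += 1
            j = j_end
    return result


outer = [(1, "a"), (2, "b"), (2, "c"), (5, "d")]
inner = [(2, "x"), (2, "y"), (3, "z"), (5, "w")]
print(hash_join(outer, inner))   # [('b', 'x'), ('b', 'y'), ('c', 'x'), ('c', 'y'), ('d', 'w')]
print(merge_join(outer, inner) == hash_join(outer, inner))  # True
```

The merge join also relies on an ordering that agrees with the equality operator. That requirement is why MERGES ties the operator to a B-tree operator family.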

The documentation of the pg_operator catalog describes the same columns:

Name	Type	References	Description
oid	oid		Row identifier (hidden attribute; must be explicitly selected)
oprname	name		Name of the operator
oprnamespace	oid	pg_namespace.oid	The OID of the namespace that contains this operator
oprowner	oid	pg_authid.oid	Owner of the operator
oprkind	char		b = infix ("both"), l = prefix ("left"), r = postfix ("right"); i.e., whether the operator sits between, before, or after its operands
oprcanmerge	bool		This operator supports merge joins
oprcanhash	bool		This operator supports hash joins
oprleft	oid	pg_type.oid	Type of the left operand
oprright	oid	pg_type.oid	Type of the right operand
oprresult	oid	pg_type.oid	Type of the result
oprcom	oid	pg_operator.oid	Commutator of this operator, if any
oprnegate	oid	pg_operator.oid	Negator of this operator, if any
oprcode	regproc	pg_proc.oid	Function that implements this operator
oprrest	regproc	pg_proc.oid	Restriction selectivity estimation function for this operator
oprjoin	regproc	pg_proc.oid	Join selectivity estimation function for this operator
