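PostgreSQL supports user-defined operators. Under the hood, an operator is simply a call to a function.
The basic form is CREATE OPERATOR name (PROCEDURE = function_name, LEFTARG = left_type, RIGHTARG = right_type), optionally followed by the optimizer-related clauses covered later in this post.
For example, let's create an operator that returns the average of two values.
First, create the function:
postgres=# create function f_avg(numeric,numeric) returns numeric as $$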
postgres$# select ($1+$2)/2;
postgres$# $$ language sql strict;
CREATE FUNCTION
postgres=# select f_avg(1,null);
f_avg
-------
(1 row)
postgres=# select f_avg(1,2);
f_avg
--------------------
1.5000000000000000
(1 row)
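Next, create the operator, specifying the left and right argument types and the name of the function to call. commutator is an optimizer-related option that I will cover in detail below:
postgres=# create operator ## (procedure=f_avg, leftarg=numeric, rightarg=numeric, commutator='##');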
CREATE OPERATOR
postgres=# select 1 ## 2;
?column?
--------------------
1.5000000000000000
(1 row)
Note that the CREATE OPERATOR syntax includes six optimizer-related keywords:
[, COMMUTATOR = com_op ] [, NEGATOR = neg_op ]
[, RESTRICT = res_proc ] [, JOIN = join_proc ]
[, HASHES ] [, MERGES ]
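They work as follows.
Assume x is the argument on the left of the operator and y is the argument on the right.
1. commutator: declares that x op1 y is equivalent to y op2 x, i.e. swapping the operands yields the same result. For example, 2>1 and 1<2 give the same result, so > is the commutator of <, and vice versa. Likewise, 1+2 and 2+1 are equivalent, so + is its own commutator. The commutator only needs to be specified when creating one of the two operators. It can be omitted when creating the other one, and PostgreSQL establishes the relationship automatically. For example, if > was created with < as its commutator, then < can be created without naming > as its commutator.
Also note that the operand types must mirror each other: the left operand type of an operator must be the same as the right operand type of its commutator, and vice versa. Only then can x op1 y be equivalent to y op2 x. An operator that is its own commutator (such as + or the ## example above) therefore needs the same type on both sides.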
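The mirrored-operand rule is easy to check in the catalog. Here is a minimal sketch, assuming a stock installation where the built-in cross-type comparison <(integer,bigint) is present. Casting to regoperator prints each operator together with its operand types:

```sql
-- Show a cross-type operator next to its commutator.
-- regoperator renders the operand types, so the mirrored
-- left/right types are visible directly in the output.
SELECT o.oid::regoperator    AS operator,
       o.oprcom::regoperator AS commutator
FROM pg_operator o
WHERE o.oprname  = '<'
  AND o.oprleft  = 'int4'::regtype
  AND o.oprright = 'int8'::regtype;
```

The commutator column should show an operator whose left type is bigint and whose right type is integer.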
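How does the optimizer use the commutator? Take index scans: the indexed column must appear on the left of the operator for the index to be usable. For a condition such as 1 > tbl.c, the index cannot be used unless > has a commutator.
As an experiment, take the int4 > and < operators.
> and < are a commutator pair in PostgreSQL:
postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;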
oprcom
--------------
pg_catalog.<
(1 row)
postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;
oprcom
--------------
pg_catalog.>
(1 row)
Note down the OIDs stored in their oprcom columns:
postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;
oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprcode | oprrest | oprjoin
---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+---------+-------------+-----------------
> | 11 | 10 | b | f | f | 23 | 23 | 16 | 97 | 523 | int4gt | scalargtsel | scalargtjoinsel
(1 row)
postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;
oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprcode | oprrest | oprjoin
---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+---------+-------------+-----------------
< | 11 | 10 | b | f | f | 23 | 23 | 16 | 521 | 525 | int4lt | scalarltsel | scalarltjoinsel
(1 row)
Next, I break their commutator relationship by updating pg_operator directly. Setting oprcom to 0 is enough.
postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc;
UPDATE 1
postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc;
UPDATE 1
Create a test table, insert some test data, and create an index:
postgres=# create table tbl(id int);
CREATE TABLE
postgres=# insert into tbl select generate_series(1,100000);
INSERT 0 100000
postgres=# create index idx_tbl_id on tbl(id);
CREATE INDEX
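With the column on the left of the condition the index is used, but with the column on the right it is not, because the optimizer can no longer tell that > and < are commutators:
postgres=# explain select * from tbl where id<10;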
QUERY PLAN
---------------------------------------------------------------------------
Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.45 rows=9 width=4)
Index Cond: (id < 10)
(2 rows)
postgres=# explain select * from tbl where 10>id;
QUERY PLAN
----------------------------------------------------------
Seq Scan on tbl (cost=0.00..1361.00 rows=33333 width=4)
Filter: (10 > id)
(2 rows)
After re-establishing the commutator relationship between the two operators, the optimizer automatically rewrites 10>id as id<10 and uses the index again:
postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;
UPDATE 1
postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;
UPDATE 1
postgres=# explain select * from tbl where 10>id;
QUERY PLAN
---------------------------------------------------------------------------
Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.45 rows=9 width=4)
Index Cond: (id < 10)
(2 rows)
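2. negator: declares that x op1 y is equivalent to NOT (x op2 y). For unary operators, op1 x is equivalent to NOT (op2 x), or x op1 is equivalent to NOT (x op2). Unlike a commutator, the operands are not swapped, so negators work for both unary and binary operators.
Example:
If = and <> are a negator pair, NOT (x = y) can be simplified to x <> y, and NOT (x <> y) to x = y.
postgres=# explain select * from tbl where 10=id;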
QUERY PLAN
---------------------------------------------------------------------------
Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.31 rows=1 width=4)
Index Cond: (id = 10)
(2 rows)
postgres=# explain select * from tbl where not(10<>id);
QUERY PLAN
---------------------------------------------------------------------------
Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.31 rows=1 width=4)
Index Cond: (id = 10)
(2 rows)
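A negator must take the same operand types as the operator it negates (left type matching left type, right type matching right type), and the clause only applies to operators that return boolean.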
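To make the negator rules concrete, here is a minimal sketch that defines a pair of mutually negating operators. The function names f_same_parity and f_diff_parity and the operator names === and !== are made up for this illustration:

```sql
-- Hypothetical helper functions: do two integers have the same parity?
CREATE FUNCTION f_same_parity(int, int) RETURNS boolean AS $$
  SELECT ($1 - $2) % 2 = 0;
$$ LANGUAGE sql IMMUTABLE STRICT;

CREATE FUNCTION f_diff_parity(int, int) RETURNS boolean AS $$
  SELECT ($1 - $2) % 2 <> 0;
$$ LANGUAGE sql IMMUTABLE STRICT;

-- Both operators take (int, int) and return boolean, so each can
-- name the other as its negator. The first statement refers to an
-- operator that does not exist yet; PostgreSQL records a placeholder
-- ("shell") entry that the second statement completes.
CREATE OPERATOR === (
    PROCEDURE = f_same_parity,
    LEFTARG   = int,
    RIGHTARG  = int,
    NEGATOR   = !==
);

CREATE OPERATOR !== (
    PROCEDURE = f_diff_parity,
    LEFTARG   = int,
    RIGHTARG  = int,
    NEGATOR   = ===
);

-- 1 and 3 are both odd: "same" is true, "different" is false.
SELECT 1 === 3 AS same, 1 !== 3 AS different;
```

Declaring a negator is a promise to the optimizer. PostgreSQL does not verify that the two functions really return opposite results, so that is up to whoever defines them.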
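3. restrict: names the function that estimates restriction selectivity. It only applies to binary operators that return boolean. Take a condition such as where col>100: how is its selectivity estimated? By the function named in the operator's restrict clause. Multiplying the selectivity by pg_class.reltuples gives the estimated number of rows that match the condition.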
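To see which estimator a given operator uses, and the row count the estimate is scaled by, the catalogs can be queried directly. This is a small sketch that assumes the tbl table created earlier still exists:

```sql
-- Which restriction estimator does the int4 ">" operator use?
SELECT oprrest
FROM pg_operator
WHERE oid = '>(int4,int4)'::regoperator;

-- Refresh the statistics, then read the row count the planner works from.
ANALYZE tbl;
SELECT reltuples FROM pg_class WHERE relname = 'tbl';

-- The rows= figure in this plan is roughly selectivity * reltuples.
EXPLAIN SELECT * FROM tbl WHERE id > 100;
```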
The code for the selectivity functions lives in src/backend/utils/adt/
and includes
-rw-r--r--. 1 1107 1107 33191 Jun 10 03:29 array_selfuncs.c
-rw-r--r--. 1 1107 1107 2316 Jun 10 03:29 geo_selfuncs.c
-rw-r--r--. 1 1107 1107 720 Jun 10 03:29 network_selfuncs.c
-rw-r--r--. 1 1107 1107 33895 Jun 10 03:29 rangetypes_selfuncs.c
-rw-r--r--. 1 1107 1107 218809 Jun 10 03:29 selfuncs.c
Selectivity functions also rely on the database statistics to compute their estimates. The selectivity functions in common use are:
postgres=# select distinct oprrest from pg_operator order by 1;
oprrest
--------------
-
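eqsel (equal)
neqsel (not equal)
scalarltsel (less than; in this version also used for less than or equal)
scalargtsel (greater than; in this version also used for greater than or equal)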
areasel
positionsel
contsel
iclikesel
icnlikesel
regexeqsel
likesel
icregexeqsel
regexnesel
nlikesel
icregexnesel
rangesel
networksel
tsmatchsel
arraycontsel
(20 rows)
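Of course, users who define their own data types can also write their own selectivity functions, or reuse the standard ones listed above, although that may require implementing a type conversion.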
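Reusing a standard estimator is just a matter of naming it when the operator is created. Below is a minimal sketch with a made-up case-insensitive equality operator (f_ci_eq and =~= are hypothetical names). Whether eqsel and eqjoinsel are a good fit depends on how closely the operator behaves like ordinary equality, so treat the choice as an assumption:

```sql
-- Hypothetical function: case-insensitive text equality.
CREATE FUNCTION f_ci_eq(text, text) RETURNS boolean AS $$
  SELECT lower($1) = lower($2);
$$ LANGUAGE sql IMMUTABLE STRICT;

-- Reuse the standard equality estimators instead of writing new ones.
CREATE OPERATOR =~= (
    PROCEDURE  = f_ci_eq,
    LEFTARG    = text,
    RIGHTARG   = text,
    COMMUTATOR = =~=,
    RESTRICT   = eqsel,
    JOIN       = eqjoinsel
);

-- Confirm what was recorded in the catalog.
SELECT oprname, oprrest, oprjoin
FROM pg_operator
WHERE oprname = '=~=';
```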
The source code describes them as follows:
src/backend/utils/adt/selfuncs.c
/*----------
* Operator selectivity estimation functions are called to estimate the
* selectivity of WHERE clauses whose top-level operator is their operator.
* We divide the problem into two cases:
* Restriction clause estimation: the clause involves vars of just
* one relation. [Note: this is the fraction of rows that satisfy the WHERE condition.]
* Join clause estimation: the clause involves vars of multiple rels.
* Join selectivity estimation is far more difficult and usually less accurate
* than restriction estimation.
*
* When dealing with the inner scan of a nestloop join, we consider the
* join's joinclauses as restriction clauses for the inner relation, and
* treat vars of the outer relation as parameters (a/k/a constants of unknown
* values). So, restriction estimators need to be able to accept an argument
* telling which relation is to be treated as the variable.
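* [Note: with a nestloop join, the inner relation's column is treated as the variable, and the outer relation's column is treated as a parameter of unknown value.]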
*
* The call convention for a restriction estimator (oprrest function) is
*
* Selectivity oprrest (PlannerInfo *root,
* Oid operator,
* List *args,
* int varRelid);
* [Note: a restriction estimator takes four parameters:]
* root: general information about the query (rtable and RelOptInfo lists
* are particularly important for the estimator).
* operator: OID of the specific operator in question.
* args: argument list from the operator clause.
* varRelid: if not zero, the relid (rtable index) of the relation to
* be treated as the variable relation. May be zero if the args list
* is known to contain vars of only one relation.
*
* This is represented at the SQL level (in pg_proc) as
*
* float8 oprrest (internal, oid, internal, int4); [Note: this is the pg_proc signature of the function named by oprrest.]
*
* The result is a selectivity, that is, a fraction (0 to 1) of the rows
* of the relation that are expected to produce a TRUE result for the
* given operator. [Note: the result is a fraction; multiplying it by pg_class.reltuples gives the estimated row count.]
*
* The call convention for a join estimator (oprjoin function) is similar
* except that varRelid is not needed, and instead join information is
* supplied:
* [Note: the join estimator's parameters differ slightly from the restriction estimator's: varRelid is dropped and join information is added.]
* Selectivity oprjoin (PlannerInfo *root,
* Oid operator,
* List *args,
* JoinType jointype,
* SpecialJoinInfo *sjinfo);
*
* float8 oprjoin (internal, oid, internal, int2, internal);
*
* (Before Postgres 8.4, join estimators had only the first four of these
* parameters. That signature is still allowed, but deprecated.) The
* relationship between jointype and sjinfo is explained in the comments for
* clause_selectivity() --- the short version is that jointype is usually
* best ignored in favor of examining sjinfo.
*
* Join selectivity for regular inner and outer joins is defined as the
* fraction (0 to 1) of the cross product of the relations that is expected
* to produce a TRUE result for the given operator. For both semi and anti
* joins, however, the selectivity is defined as the fraction of the left-hand
* side relation's rows that are expected to have a match (ie, at least one
* row with a TRUE result) in the right-hand side.
*
* For both oprrest and oprjoin functions, the operator's input collation OID
* (if any) is passed using the standard fmgr mechanism, so that the estimator
* function can fetch it with PG_GET_COLLATION(). Note, however, that all
* statistics in pg_statistic are currently built using the database's default
* collation. Thus, in most cases where we are looking at statistics, we
* should ignore the actual operator collation and use DEFAULT_COLLATION_OID.
* We expect that the error induced by doing this is usually not large enough
* to justify complicating matters.
*----------
4. join: names the join selectivity estimation function (the joinsel function).
It corresponds to pg_operator.oprjoin
postgres=# select distinct oprjoin from pg_operator order by 1;
oprjoin
------------------
-
eqjoinsel
neqjoinsel
scalarltjoinsel
scalargtjoinsel
areajoinsel
positionjoinsel
contjoinsel
iclikejoinsel
icnlikejoinsel
regexeqjoinsel
likejoinsel
icregexeqjoinsel
regexnejoinsel
nlikejoinsel
icregexnejoinsel
networkjoinsel
tsmatchjoinsel
arraycontjoinsel
(19 rows)
5. hashes
6. merges
hashes and merges declare whether the operator may be used for hash joins and merge joins. Only binary operators that return boolean qualify.
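The two flags are stored as oprcanhash and oprcanmerge, so they can be compared across operators. A quick sketch against the built-in int4 operators:

```sql
-- Compare the join-method flags of int4 "=" and int4 "<".
SELECT oid::regoperator AS operator, oprcanhash, oprcanmerge
FROM pg_operator
WHERE oprleft  = 'int4'::regtype
  AND oprright = 'int4'::regtype
  AND oprname IN ('=', '<')
ORDER BY oprname;
```

The equality operator is flagged for both join methods, while < is flagged for neither, which agrees with the oprcanmerge and oprcanhash values shown for < earlier.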
The documentation for the pg_operator catalog describes the corresponding columns:
Name | Type | References | Description |
---|---|---|---|
oid | oid | | Row identifier (hidden attribute; must be explicitly selected) |
oprname | name | | Name of the operator |
oprnamespace | oid | pg_namespace.oid | The OID of the namespace that contains this operator |
oprowner | oid | pg_authid.oid | Owner of the operator |
oprkind | char | | b = infix ("both"), l = prefix ("left"), r = postfix ("right"); i.e. where the operator sits relative to its operands |
oprcanmerge | bool | | This operator supports merge joins |
oprcanhash | bool | | This operator supports hash joins |
oprleft | oid | pg_type.oid | Type of the left operand |
oprright | oid | pg_type.oid | Type of the right operand |
oprresult | oid | pg_type.oid | Type of the result |
oprcom | oid | pg_operator.oid | Commutator of this operator, if any |
oprnegate | oid | pg_operator.oid | Negator of this operator, if any |
oprcode | regproc | pg_proc.oid | Function that implements this operator |
oprrest | regproc | pg_proc.oid | Restriction selectivity estimation function for this operator |
oprjoin | regproc | pg_proc.oid | Join selectivity estimation function for this operator |