PostgreSQL Operators and the Optimizer in Detail

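PostgreSQL supports user-defined operators. An operator is essentially another way of calling an underlying function.

Operators are created with the CREATE OPERATOR command; its optimizer-related clauses are listed later in this article.

As an example, we will create an operator that returns the average of two values.

First, create the function: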

postgres=# create function f_avg(numeric,numeric) returns numeric as $$

postgres$#   select ($1+$2)/2;

postgres$# $$ language sql strict;

CREATE FUNCTION

Verify the function. Because it is declared STRICT, it returns NULL whenever an argument is NULL:

postgres=# select f_avg(1,null);

 f_avg 

-------

      

(1 row)

postgres=# select f_avg(1,2);

       f_avg        

--------------------

 1.5000000000000000

(1 row)

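Next, create the operator, specifying the left and right argument types and the name of the function to call. commutator is an optimizer-related option that is covered in detail below:

postgres=# create operator ## (procedure=f_avg, leftarg=numeric, rightarg=numeric, commutator='##');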

CREATE OPERATOR

postgres=# select 1 ## 2;

      ?column?      

--------------------

 1.5000000000000000

(1 row)
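
Since an operator is only another spelling of a function call, the two forms can be compared side by side. The following is a minimal sketch, assuming the ## operator and the f_avg function created above live in the public schema (the schema name is an assumption; adjust it to wherever the objects were created):

```sql
-- Call the same logic three ways: through the operator, through the
-- function, and through the schema-qualified OPERATOR() syntax.
SELECT 1 ## 2                  AS via_operator,
       f_avg(1, 2)             AS via_function,
       1 OPERATOR(public.##) 2 AS via_qualified_operator;
```

All three columns should hold the same value, because each expression ends up calling f_avg.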

Note that the CREATE OPERATOR syntax has six optimizer-related clauses:

    [, COMMUTATOR = com_op ] [, NEGATOR = neg_op ]

    [, RESTRICT = res_proc ] [, JOIN = join_proc ]

    [, HASHES ] [, MERGES ]

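They are described below.

In what follows, x denotes the operand on the left of the operator and y the operand on the right.

1. commutator: declares that x op1 y is equivalent to y op2 x, that is, swapping the operands gives the same result. For example, 2>1 and 1<2 give the same result, so > is the commutator of <, and vice versa. Likewise, 1+2 and 2+1 are equivalent, so + is its own commutator. The commutator only needs to be specified when creating one of the two operators; PostgreSQL establishes the reverse link automatically. For example, if > was created with < as its commutator, then < can be created without naming > as its commutator.

Note also that the operand types of a commutator pair must mirror each other: the left operand type of op1 must be the right operand type of op2, and vice versa. Only then can x op1 y be equivalent to y op2 x. An operator that is its own commutator therefore needs the same type on both sides.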
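
The type rule for commutators can be checked against a built-in cross-type pair. This is a small sketch that looks up the commutator of the built-in < operator taking (int4, int8); the regoperator cast is used only to print operators together with their argument types:

```sql
-- Show a cross-type operator next to its commutator.
-- The commutator is expected to take the same two types in swapped order.
SELECT oid::regoperator    AS op,
       oprcom::regoperator AS commutator
FROM pg_operator
WHERE oprname = '<'
  AND oprleft = 'int4'::regtype
  AND oprright = 'int8'::regtype;
```

If the two argument types appear in opposite order in the two columns, the pair satisfies the rule that x op1 y can be rewritten as y op2 x.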

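How does the optimizer use the commutator? Take index scans: the indexed column must be on the left side of the operator for the index to be used. For a condition such as 1 > tbl.c, the index cannot be used unless > has a commutator.

As an experiment, take the > and < operators for int4.

> and < are a commutator pair in PostgreSQL:

postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;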

    oprcom    

--------------

 pg_catalog.<

(1 row)

postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;

    oprcom    

--------------

 pg_catalog.>

(1 row)

Note the OIDs stored in their oprcom columns:

postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;

 oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprcode |   oprrest   |     oprjoin     

---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+---------+-------------+-----------------

 >       |           11 |       10 | b       | f           | f          |      23 |       23 |        16 |     97 |       523 | int4gt  | scalargtsel | scalargtjoinsel

(1 row)

postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;

 oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprcode |   oprrest   |     oprjoin     

---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+---------+-------------+-----------------

 <       |           11 |       10 | b       | f           | f          |      23 |       23 |        16 |    521 |       525 | int4lt  | scalarltsel | scalarltjoinsel

(1 row)
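
The two OIDs can also be looked up directly instead of being read out of the wide catalog rows. A minimal sketch using the regoperator type, which resolves an operator name plus argument types to its OID:

```sql
-- Resolve the int4 comparison operators to their pg_operator OIDs.
-- On the installation shown above these are the values stored in oprcom.
SELECT '<(int4,int4)'::regoperator::oid AS lt_oid,
       '>(int4,int4)'::regoperator::oid AS gt_oid;
```

Writing the operator this way avoids hard-coding OIDs in later statements.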

Next, break their commutator relationship by updating pg_operator directly and setting oprcom to 0. Do this only on a test instance.

postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc;

UPDATE 1

postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc;

UPDATE 1

Create a test table, insert test data, and create an index:

postgres=# create table tbl(id int);

CREATE TABLE

postgres=# insert into tbl select generate_series(1,100000);

INSERT 0 100000

postgres=# create index idx_tbl_id on tbl(id);

CREATE INDEX

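With the column on the left side of the condition the index is used, but with the column on the right side it is not, because the optimizer no longer knows that > and < are commutators:

postgres=# explain select * from tbl where id<10;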

                                QUERY PLAN                                 

---------------------------------------------------------------------------

 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.45 rows=9 width=4)

   Index Cond: (id < 10)

(2 rows)

postgres=# explain select * from tbl where 10>id;

                        QUERY PLAN                        

----------------------------------------------------------

 Seq Scan on tbl  (cost=0.00..1361.00 rows=33333 width=4)

   Filter: (10 > id)

(2 rows)

After the commutator relationship between the two operators is restored, the optimizer automatically rewrites 10>id as id<10 and uses the index again:

postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;

UPDATE 1

postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;

UPDATE 1

postgres=# explain select * from tbl where 10>id;

                                QUERY PLAN                                 

---------------------------------------------------------------------------

 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.45 rows=9 width=4)

   Index Cond: (id < 10)

(2 rows)
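
The same experiment can be made less risky by running it inside a transaction that is rolled back, so the catalog is never left modified. This is only a sketch, assuming a superuser session on a scratch database and the tbl table created above; editing system catalogs by hand is not something to do on a production system:

```sql
BEGIN;

-- Temporarily remove the commutator links of the int4 < and > operators.
UPDATE pg_operator
SET oprcom = 0
WHERE oid IN ('<(int4,int4)'::regoperator::oid,
              '>(int4,int4)'::regoperator::oid);

-- With the link removed, the planner cannot flip 10 > id into id < 10.
EXPLAIN SELECT * FROM tbl WHERE 10 > id;

-- Undo the catalog change.
ROLLBACK;

-- The commutator is back, so the index condition can be used again.
EXPLAIN SELECT * FROM tbl WHERE 10 > id;
```

Using regoperator lookups instead of literal OIDs keeps the statements valid even if the OIDs differ from the ones printed earlier.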

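2. negator: declares that x op1 y is equivalent to NOT (x op2 y) for all inputs. For unary operators, x op1 is equivalent to NOT (x op2), and op1 x is equivalent to NOT (op2 x). A negator can therefore be declared for both unary and binary operators.

Example:

Because = and <> are a negator pair, NOT (x <> y) can be simplified to x = y, and NOT (x = y) to x <> y. The second query below is rewritten in exactly this way:

postgres=# explain select * from tbl where 10=id;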

                                QUERY PLAN                                 

---------------------------------------------------------------------------

 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.31 rows=1 width=4)

   Index Cond: (id = 10)

(2 rows)

postgres=# explain select * from tbl where not(10<>id);

                                QUERY PLAN                                 

---------------------------------------------------------------------------

 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.31 rows=1 width=4)

   Index Cond: (id = 10)

(2 rows)
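
The negator link that makes this rewrite possible is stored in pg_operator.oprnegate. A short sketch that prints it for the int4 equality and inequality operators:

```sql
-- Show each operator together with its declared negator.
SELECT oid::regoperator       AS op,
       oprnegate::regoperator AS negator
FROM pg_operator
WHERE oid IN ('=(int4,int4)'::regoperator::oid,
              '<>(int4,int4)'::regoperator::oid);
```

Each row should name the other operator as its negator, which is what allows NOT (10 <> id) to be planned as id = 10.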

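As with commutators, there is a type constraint: the negator must take the same operand types as the operator being defined. Negators are also only meaningful for operators that return boolean.

3. restrict: names the function that estimates restriction selectivity. It applies only to binary operators that return boolean. For a condition such as where col>100, how is the selectivity estimated? By the function named in the operator's restrict clause. Multiplying the selectivity by pg_class.reltuples gives the estimated number of rows that satisfy the condition.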
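
The relationship rows = selectivity x reltuples can be observed on the tbl table from the earlier experiment. In the plans shown above, rows=9 out of 100000 rows corresponds to a selectivity of about 0.00009, while rows=33333 in the sequential scan corresponds to 0.33333, which is consistent with the planner falling back to a default guess when it cannot use the column statistics. The statements below are a sketch for inspecting the inputs of that calculation:

```sql
-- Refresh the statistics the selectivity function reads.
ANALYZE tbl;

-- Estimated total row count used as the multiplier.
SELECT reltuples FROM pg_class WHERE oid = 'tbl'::regclass;

-- Histogram that scalarltsel consults for a condition such as id < 10.
SELECT histogram_bounds
FROM pg_stats
WHERE tablename = 'tbl' AND attname = 'id';

-- The rows= figure in this plan is selectivity multiplied by reltuples.
EXPLAIN SELECT * FROM tbl WHERE id < 10;
```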

The code for the selectivity functions is in src/backend/utils/adt/

It includes:

-rw-r--r--. 1 1107 1107  33191 Jun 10 03:29 array_selfuncs.c

-rw-r--r--. 1 1107 1107   2316 Jun 10 03:29 geo_selfuncs.c

-rw-r--r--. 1 1107 1107    720 Jun 10 03:29 network_selfuncs.c

-rw-r--r--. 1 1107 1107  33895 Jun 10 03:29 rangetypes_selfuncs.c

-rw-r--r--. 1 1107 1107 218809 Jun 10 03:29 selfuncs.c

Selectivity functions also depend on the database statistics to compute the selectivity. The commonly used selectivity functions are:

postgres=# select distinct oprrest from pg_operator order by 1;

   oprrest    

--------------

 -

 eqsel  (equal)

 neqsel  (not equal)

 scalarltsel  (less than, less than or equal)

 scalargtsel  (greater than, greater than or equal)

 areasel

 positionsel

 contsel

 iclikesel

 icnlikesel

 regexeqsel

 likesel

 icregexeqsel

 regexnesel

 nlikesel

 icregexnesel

 rangesel

 networksel

 tsmatchsel

 arraycontsel

(20 rows)

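Users who define their own data types can also write their own selectivity functions, or reuse the standard ones listed above, although this may require implementing a type conversion.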
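
Reusing a standard estimator for a user-defined operator could look like the sketch below. The function f_near and the operator ~== are hypothetical names invented for this illustration; they define an "approximately equal" comparison on numeric values and borrow eqsel and eqjoinsel on the assumption that such a comparison matches only a small fraction of rows, much like real equality:

```sql
-- Hypothetical helper: true when two values differ by less than 0.5.
CREATE FUNCTION f_near(numeric, numeric) RETURNS boolean AS $$
  SELECT abs($1 - $2) < 0.5;
$$ LANGUAGE sql IMMUTABLE STRICT;

-- Hypothetical operator that reuses the built-in equality estimators.
CREATE OPERATOR ~== (
    PROCEDURE  = f_near,
    LEFTARG    = numeric,
    RIGHTARG   = numeric,
    COMMUTATOR = ~==,
    RESTRICT   = eqsel,
    JOIN       = eqjoinsel
);

-- Confirm which estimators were recorded for the new operator.
SELECT oprname, oprrest, oprjoin
FROM pg_operator
WHERE oprname = '~==';
```

Whether eqsel is a good fit depends on how selective the operator really is; for a looser comparison a different estimator or a custom function would be more appropriate.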

The description in the source code:

src/backend/utils/adt/selfuncs.c

/*----------

 * Operator selectivity estimation functions are called to estimate the

 * selectivity of WHERE clauses whose top-level operator is their operator.

 * We divide the problem into two cases:

 *              Restriction clause estimation: the clause involves vars of just

 *                      one relation.  (Note: this is the selectivity, as a fraction, of a WHERE condition.)

 *              Join clause estimation: the clause involves vars of multiple rels.

 * Join selectivity estimation is far more difficult and usually less accurate

 * than restriction estimation.

 *

 * When dealing with the inner scan of a nestloop join, we consider the

 * join's joinclauses as restriction clauses for the inner relation, and

 * treat vars of the outer relation as parameters (a/k/a constants of unknown

 * values).  So, restriction estimators need to be able to accept an argument

 * telling which relation is to be treated as the variable.

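 * (Note: this is why a restriction estimator takes varRelid, which says which relation's columns are the variable side; the other relation's columns are treated as parameters.)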

 *

 * The call convention for a restriction estimator (oprrest function) is

 *

 *              Selectivity oprrest (PlannerInfo *root,

 *                                                       Oid operator,

 *                                                       List *args,

 *                                                       int varRelid);

 * The estimator takes four arguments:

 * root: general information about the query (rtable and RelOptInfo lists

 * are particularly important for the estimator).

 * operator: OID of the specific operator in question.

 * args: argument list from the operator clause.

 * varRelid: if not zero, the relid (rtable index) of the relation to

 * be treated as the variable relation.  May be zero if the args list

 * is known to contain vars of only one relation.

 *

 * This is represented at the SQL level (in pg_proc) as

 *

 *              float8 oprrest (internal, oid, internal, int4);

 *

 * The result is a selectivity, that is, a fraction (0 to 1) of the rows

 * of the relation that are expected to produce a TRUE result for the

 * given operator.  (Note: multiplying this fraction by pg_class.reltuples gives the estimated row count.)

 *

 * The call convention for a join estimator (oprjoin function) is similar

 * except that varRelid is not needed, and instead join information is

 * supplied:

 * (Note: compared with the restriction estimator, varRelid is dropped and the jointype and sjinfo arguments are added.)

 *              Selectivity oprjoin (PlannerInfo *root,

 *                                                       Oid operator,

 *                                                       List *args,

 *                                                       JoinType jointype,

 *                                                       SpecialJoinInfo *sjinfo);

 *

 *              float8 oprjoin (internal, oid, internal, int2, internal);

 *

 * (Before Postgres 8.4, join estimators had only the first four of these

 * parameters.  That signature is still allowed, but deprecated.)  The

 * relationship between jointype and sjinfo is explained in the comments for

 * clause_selectivity() --- the short version is that jointype is usually

 * best ignored in favor of examining sjinfo.

 *

 * Join selectivity for regular inner and outer joins is defined as the

 * fraction (0 to 1) of the cross product of the relations that is expected

 * to produce a TRUE result for the given operator.  For both semi and anti

 * joins, however, the selectivity is defined as the fraction of the left-hand

 * side relation's rows that are expected to have a match (ie, at least one

 * row with a TRUE result) in the right-hand side.

 *

 * For both oprrest and oprjoin functions, the operator's input collation OID

 * (if any) is passed using the standard fmgr mechanism, so that the estimator

 * function can fetch it with PG_GET_COLLATION().  Note, however, that all

 * statistics in pg_statistic are currently built using the database's default

 * collation.  Thus, in most cases where we are looking at statistics, we

 * should ignore the actual operator collation and use DEFAULT_COLLATION_OID.

 * We expect that the error induced by doing this is usually not large enough

 * to justify complicating matters.

 *----------
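
The SQL-level signatures quoted in this comment can be compared with what pg_proc actually records. A small sketch that prints the argument and result types of one restriction estimator and one join estimator:

```sql
-- List the declared argument types and result type of two estimators.
SELECT proname,
       pg_get_function_arguments(oid) AS arguments,
       prorettype::regtype            AS result_type
FROM pg_proc
WHERE proname IN ('eqsel', 'eqjoinsel')
ORDER BY proname;
```

The restriction estimator should list four arguments and the join estimator five, matching the oprrest and oprjoin conventions described in the comment.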

4. join: names the join selectivity estimation function (the joinsel family).

It corresponds to pg_operator.oprjoin:

postgres=# select distinct oprjoin from pg_operator order by 1;

     oprjoin      

------------------

 -

 eqjoinsel

 neqjoinsel

 scalarltjoinsel

 scalargtjoinsel

 areajoinsel

 positionjoinsel

 contjoinsel

 iclikejoinsel

 icnlikejoinsel

 regexeqjoinsel

 likejoinsel

 icregexeqjoinsel

 regexnejoinsel

 nlikejoinsel

 icregexnejoinsel

 networkjoinsel

 tsmatchjoinsel

 arraycontjoinsel

(19 rows)
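
For an ordinary inner join, the estimated row count is the join selectivity multiplied by the sizes of both inputs. The sketch below, which assumes the tbl table from the earlier experiment, lets the two estimators be compared: the first query goes through eqjoinsel and the second through scalarltjoinsel.

```sql
-- Equality join: the estimate comes from eqjoinsel and the column statistics.
EXPLAIN SELECT * FROM tbl a JOIN tbl b ON a.id = b.id;

-- Inequality join: the estimate comes from scalarltjoinsel.
EXPLAIN SELECT * FROM tbl a JOIN tbl b ON a.id < b.id;
```

Dividing the rows= figure of the join node by the product of the two input row counts gives the selectivity that the estimator returned.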

5. hashes

6. merges

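hashes and merges declare whether the operator can be used for hash joins and merge joins. Only binary operators that return boolean qualify, and in practice the operator must represent equality for some data type or pair of data types.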
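
These two flags are stored in pg_operator as oprcanhash and oprcanmerge. A brief sketch that compares them for a few built-in int4 operators:

```sql
-- Compare the join-method flags of equality and non-equality operators.
SELECT oid::regoperator AS op,
       oprcanhash,
       oprcanmerge
FROM pg_operator
WHERE oprleft = 'int4'::regtype
  AND oprright = 'int4'::regtype
  AND oprname IN ('=', '<>', '<')
ORDER BY oprname;
```

Only the equality operator is expected to have the flags set, in line with the requirement that a hash-joinable or merge-joinable operator behaves like equality.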

The documentation of the pg_operator catalog describes the corresponding columns:

 Name         | Type    | References       | Description
--------------+---------+------------------+------------------------------------------------------------------
 oid          | oid     |                  | Row identifier (hidden attribute; must be explicitly selected)
 oprname      | name    |                  | Name of the operator
 oprnamespace | oid     | pg_namespace.oid | The OID of the namespace that contains this operator
 oprowner     | oid     | pg_authid.oid    | Owner of the operator
 oprkind      | char    |                  | Position of the operator: b = infix ("between"), l = prefix ("left"), r = postfix ("right")
 oprcanmerge  | bool    |                  | This operator supports merge joins
 oprcanhash   | bool    |                  | This operator supports hash joins
 oprleft      | oid     | pg_type.oid      | Type of the left operand
 oprright     | oid     | pg_type.oid      | Type of the right operand
 oprresult    | oid     | pg_type.oid      | Type of the result
 oprcom       | oid     | pg_operator.oid  | Commutator of this operator, if any
 oprnegate    | oid     | pg_operator.oid  | Negator of this operator, if any
 oprcode      | regproc | pg_proc.oid      | Function that implements this operator
 oprrest      | regproc | pg_proc.oid      | Restriction selectivity estimation function for this operator
 oprjoin      | regproc | pg_proc.oid      | Join selectivity estimation function for this operator
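
Most of these columns hold OIDs, which are hard to read in a plain select *. The following sketch prints the optimizer-related columns of one operator in readable form, using the reg* casts and a join to pg_namespace; the int4 < operator is chosen only as an example:

```sql
-- Readable view of one pg_operator row.
SELECT o.oid::regoperator       AS op,
       n.nspname                AS schema_name,
       o.oprkind,
       o.oprresult::regtype     AS result_type,
       o.oprcode,
       o.oprrest,
       o.oprjoin,
       o.oprcanhash,
       o.oprcanmerge,
       o.oprcom::regoperator    AS commutator,
       o.oprnegate::regoperator AS negator
FROM pg_operator o
JOIN pg_namespace n ON n.oid = o.oprnamespace
WHERE o.oid = '<(int4,int4)'::regoperator::oid;
```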