PG_PROC
PG_OPERATOR
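pg_opclass defines the operators that an index can use, generally for a single data type. pg_opfamily groups the operators of mutually compatible data types; see https://www.postgresql.org/docs/9.1/catalog-pg-opclass.html for how the two relate. PostgreSQL 8.3 introduced operator families (pg_opfamily); the release notes give the reason: Create "operator families" to improve planning of queries involving cross-data-type comparisons (Tom)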
https://www.postgresql.org/docs/current/btree-behavior.html
https://www.postgresql.org/docs/current/indexes-opclass.html
PG_LANGUAGE
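For operator expressions, PostgreSQL actually converts every operator into a call to its underlying function.
By execution time, that is, in ExecMakeTableFunctionResult/ExecMakeFunctionResultSet, the function information (fcinfo/flinfo) and the function pointer have already been determined.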
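The operator-to-function mapping can be pictured with a small standalone C sketch. It is only an illustration under stated assumptions: OperatorEntry, operator_table and lookup_operator are hypothetical names, and the table is a toy stand-in for the pg_operator/pg_proc catalogs. The point it shows is that the operator is resolved to a function pointer once, and every later evaluation just calls through that pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*BinaryFn)(int, int);

/* Toy versions of the functions behind the int4 operators. */
static int int4pl(int a, int b)  { return a + b; }
static int int4mi(int a, int b)  { return a - b; }
static int int4mul(int a, int b) { return a * b; }

/* Hypothetical miniature of the operator -> function mapping. */
typedef struct OperatorEntry {
    const char *oprname;   /* operator symbol as written in SQL */
    const char *procname;  /* name of the function implementing it */
    BinaryFn    fn_addr;   /* resolved function pointer */
} OperatorEntry;

static const OperatorEntry operator_table[] = {
    {"+", "int4pl",  int4pl},
    {"-", "int4mi",  int4mi},
    {"*", "int4mul", int4mul},
};

/* Resolve an operator once; returns NULL if the operator is unknown. */
const OperatorEntry *lookup_operator(const char *oprname)
{
    size_t n = sizeof operator_table / sizeof operator_table[0];

    for (size_t i = 0; i < n; i++)
        if (strcmp(operator_table[i].oprname, oprname) == 0)
            return &operator_table[i];
    return NULL;
}
```

A caller would do the lookup once, keep the returned entry, and then invoke entry->fn_addr(a, b) for each row without repeating the lookup.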
Expression implementation
https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL
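plpgsql implementation
Initialization
Compilation
By default, compilation happens on the first call, in plpgsql_compile() (pl_comp.c). The only exception is a stored procedure written in the SQL language; functions have no such exception and are always compiled on first execution.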
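In theory, when the parser turns syntax into data structures, statements and expressions can be represented as a binary tree that is traversed depth-first (each node stores its node type, its operator and value, and possibly its operands, i.e. left/right). Binary trees are usually processed recursively, and recursion is comparatively slow, so the tree can be flattened into an array (the key question is how to represent it; PG has an implementation). For a SQL statement this step is the same whether it deals with hard-coded literals, bind variables, columns, functions, expressions, aggregate functions or analytic functions, because everything in the targetlist has already been turned into arrays and expressions. That includes case when xxx=ssss then; case xxx when sss then; between and; interval ''. Both statements and expressions can be evaluated with a binary tree.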
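The flattening idea can be sketched in standalone C. This is only an illustration under stated assumptions: Node, Step, eval_tree, flatten and eval_steps are hypothetical names, not PostgreSQL's own node or step structures, and only integer +, - and * are covered. The tree is walked recursively once to build the array; after that, each evaluation is a plain loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node and step types, not PostgreSQL's own. */
typedef enum { NODE_CONST, NODE_ADD, NODE_SUB, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;                /* used only by NODE_CONST */
    const struct Node *left;   /* operands; NULL for constants */
    const struct Node *right;
} Node;

typedef struct Step {
    NodeKind op;
    long value;
} Step;

#define MAX_STEPS 64

/* Recursive evaluation: one C call per node, repeated for every row. */
long eval_tree(const Node *n)
{
    if (n->kind == NODE_CONST)
        return n->value;
    long l = eval_tree(n->left);
    long r = eval_tree(n->right);
    switch (n->kind) {
    case NODE_ADD: return l + r;
    case NODE_SUB: return l - r;
    default:       return l * r;   /* NODE_MUL */
    }
}

/*
 * One-time flattening: a post-order walk appends steps so that operands
 * always precede their operator. Returns the new number of steps.
 */
size_t flatten(const Node *n, Step *steps, size_t nsteps)
{
    if (n->kind != NODE_CONST) {
        nsteps = flatten(n->left, steps, nsteps);
        nsteps = flatten(n->right, steps, nsteps);
    }
    assert(nsteps < MAX_STEPS);
    steps[nsteps].op = n->kind;
    steps[nsteps].value = n->value;
    return nsteps + 1;
}

/* Per-row evaluation: a loop over the array with a small value stack. */
long eval_steps(const Step *steps, size_t nsteps)
{
    long stack[MAX_STEPS];
    size_t top = 0;

    for (size_t i = 0; i < nsteps; i++) {
        if (steps[i].op == NODE_CONST) {
            stack[top++] = steps[i].value;
            continue;
        }
        assert(top >= 2);
        long r = stack[--top];
        long l = stack[--top];
        switch (steps[i].op) {
        case NODE_ADD: stack[top++] = l + r; break;
        case NODE_SUB: stack[top++] = l - r; break;
        default:       stack[top++] = l * r; break;   /* NODE_MUL */
        }
    }
    assert(top == 1);
    return stack[0];
}
```

For the tree of (1 + 2) * (7 - 3), flatten produces the seven steps 1, 2, ADD, 7, 3, SUB, MUL, and both evaluators return the same value.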
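An LALR(1) parser (Look-Ahead LR, bison's default) normally looks ahead a single token, while a GLR parser (also supported by bison, see chapter 9 of flex & bison) pursues the alternative parses in parallel and can therefore look ahead an unlimited number of tokens. Because of this lookahead, operators can be assigned a precedence, and the input can then be converted into a depth-first tree. Reverse Polish notation solves the parenthesis problem, expressing precedence without any parentheses; it is not suited to human reading but suits machine representation well.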
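How operator precedence removes the need for parentheses can be sketched with the classic shunting-yard conversion from infix to reverse Polish notation. This is a standalone illustration, not what bison generates; infix_to_rpn and precedence are hypothetical names, and operands are assumed to be single letters or digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Binding strength of an operator; higher binds tighter, 0 = not an operator. */
static int precedence(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

/*
 * Convert an infix expression made of single-character operands,
 * + - * / and parentheses into postfix (RPN) form.
 * out must have room for strlen(infix) + 1 bytes.
 * Returns 0 on success, -1 on malformed or overlong input.
 */
int infix_to_rpn(const char *infix, char *out)
{
    char ops[128];          /* operator stack */
    size_t top = 0, n = 0;

    assert(infix != NULL && out != NULL);
    if (strlen(infix) >= sizeof ops)
        return -1;          /* guarantees the operator stack cannot overflow */

    for (const char *p = infix; *p != '\0'; p++) {
        char c = *p;

        if (isspace((unsigned char) c))
            continue;
        if (isalnum((unsigned char) c)) {
            out[n++] = c;                       /* operands go straight out */
        } else if (c == '(') {
            ops[top++] = c;
        } else if (c == ')') {
            while (top > 0 && ops[top - 1] != '(')
                out[n++] = ops[--top];
            if (top == 0)
                return -1;                      /* no matching '(' */
            top--;                              /* discard the '(' */
        } else if (precedence(c) > 0) {
            /* left-associative: pop operators of equal or higher precedence */
            while (top > 0 && precedence(ops[top - 1]) >= precedence(c))
                out[n++] = ops[--top];
            ops[top++] = c;
        } else {
            return -1;                          /* unknown character */
        }
    }
    while (top > 0) {
        if (ops[top - 1] == '(')
            return -1;                          /* unclosed '(' */
        out[n++] = ops[--top];
    }
    out[n] = '\0';
    return 0;
}
```

The postfix output contains no parentheses at all; the order of the operators alone encodes which operation is applied first.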
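Invocation
During the analyze (semantic analysis) phase, the function is resolved and the fixed parts of fcinfo/flinfo, such as the function name and function pointer, are set, as the following call stack shows: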
> FuncnameGetCandidates
  lt_func_get_detail
  ParseFuncOrColumn
  transformFuncCall
  transformExprRecurse
  transformExpr
  transformRangeFunction
  transformFromClauseItem
  transformFromClause
  transformSelectStmt
  transformStmt
  transformTopLevelStmt
  parse_analyze
  pg_analyze_and_rewrite
  exec_simple_query
  PostgresMain
  BackendRun
  BackendStartup
  ServerLoop
  PostmasterMain
  main
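The function address is stored in the fn_addr field. It is set in the lookup_C_func function, as follows:
So how does the function address get loaded into the hash table the first time?
Completion
Exception cleanup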
/*-------------------------------------------------------------------------
 *  Support struct to ease writing Set Returning Functions (SRFs)
 *-------------------------------------------------------------------------
 *
 * This struct holds function context for Set Returning Functions.
 * Use fn_extra to hold a pointer to it across calls
 */
typedef struct FuncCallContext
{
    /*
     * Number of times we've been called before
     *
     * call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and
     * incremented for you every time SRF_RETURN_NEXT() is called.
     */
    uint64      call_cntr;

    /*
     * OPTIONAL maximum number of calls
     *
     * max_calls is here for convenience only and setting it is optional. If
     * not set, you must provide alternative means to know when the function
     * is done.
     */
    uint64      max_calls;

    /*
     * OPTIONAL pointer to miscellaneous user-provided context information
     *
     * user_fctx is for use as a pointer to your own struct to retain
     * arbitrary context information between calls of your function.
     */
    void       *user_fctx;

    /*
     * OPTIONAL pointer to struct containing attribute type input metadata
     *
     * attinmeta is for use when returning tuples (i.e. composite data types)
     * and is not used when returning base data types. It is only needed if
     * you intend to use BuildTupleFromCStrings() to create the return tuple.
     */
    AttInMetadata *attinmeta;

    /*
     * memory context used for structures that must live for multiple calls
     *
     * multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used
     * by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory
     * context for any memory that is to be reused across multiple calls of
     * the SRF.
     */
    MemoryContext multi_call_memory_ctx;

    /*
     * OPTIONAL pointer to struct containing tuple description
     *
     * tuple_desc is for use when returning tuples (i.e. composite data types)
     * and is only needed if you are going to build the tuples with
     * heap_form_tuple() rather than with BuildTupleFromCStrings(). Note that
     * the TupleDesc pointer stored here should usually have been run through
     * BlessTupleDesc() first.
     */
    TupleDesc   tuple_desc;

} FuncCallContext;
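For functions that do not span calls (context-free, usually scalar functions), an example is as follows:
FmgrInfo: function information
TupleDesc: record (row) definition
HeapTuple: record (row)
A record tuple is built from C strings with extern HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values);. The actual work is done in heap_form_tuple, as follows: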
/*
 * heap_form_tuple
 *      construct a tuple from the given values[] and isnull[] arrays,
 *      which are of the length indicated by tupleDescriptor->natts
 *
 * The result is allocated in the current memory context.
 */
HeapTuple
heap_form_tuple(TupleDesc tupleDescriptor,
                Datum *values,
                bool *isnull)

/*
 * heap_fill_tuple
 *      Load data portion of a tuple from values/isnull arrays
 *
 * We also fill the null bitmap (if any) and set the infomask bits
 * that reflect the tuple's data contents.
 *
 * NOTE: it is now REQUIRED that the caller have pre-zeroed the data area.
 */
void
heap_fill_tuple(TupleDesc tupleDesc,
                Datum *values, bool *isnull,
                char *data, Size data_size,
                uint16 *infomask, bits8 *bit)
Implementation of generate_series and returning set types
CREATE OR REPLACE FUNCTION fibonacci_seq(num integer)
RETURNS SETOF integer AS $$
DECLARE
    a int := 0;
    b int := 1;
BEGIN
    IF (num <= 0) THEN
        RETURN;
    END IF;
    RETURN NEXT a;
    LOOP
        EXIT WHEN num <= 1;
        RETURN NEXT b;
        num = num - 1;
        SELECT b, a + b INTO a, b;
    END LOOP;
END;
$$ language plpgsql;

zjh@postgres-# (SELECT fibonacci_seq(3));
 fibonacci_seq
---------------
             0
             1
             1
(3 rows)
-- This pattern looks a little odd, but by design PL/pgSQL cannot support a syntax like RETURN a,b,c.
CREATE FUNCTION permutations(INOUT a int, INOUT b int, INOUT c int)
RETURNS SETOF RECORD AS $$
BEGIN
    RETURN NEXT;
    SELECT b,c INTO c,b; RETURN NEXT;
    SELECT a,b INTO b,a; RETURN NEXT;
    SELECT b,c INTO c,b; RETURN NEXT;
    SELECT a,b INTO b,a; RETURN NEXT;
    SELECT b,c INTO c,b; RETURN NEXT;
END;
$$ LANGUAGE plpgsql;

zjh@postgres=# SELECT * FROM permutations(1, 2, 3);
 a | b | c
---+---+---
 1 | 2 | 3
 1 | 3 | 2
 3 | 1 | 2
 3 | 2 | 1
 2 | 3 | 1
 2 | 1 | 3
(6 rows)

zjh@postgres=# CREATE OR REPLACE FUNCTION permutations2(a int, b int, c int)
RETURNS SETOF abc AS $$
BEGIN
    RETURN NEXT a,b,c;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
zjh@postgres=# select * from permutations2(1,2,3);
ERROR:  query "SELECT a,b,c" returned 3 columns
CONTEXT:  PL/pgSQL function permutations2(integer,integer,integer) line 3 at RETURN NEXT
generate_series is implemented in C, but its structure is similar to the plpgsql implementations above.
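The shape that a C set-returning function shares with the RETURN NEXT loops above can be modelled in standalone C. This is a sketch under stated assumptions, not the generate_series source: SeqContext is a hypothetical stand-in for FuncCallContext, the caller's pointer plays the role of fn_extra, and fibonacci_next is an invented name. Each call returns at most one row, and state lives in a context that is created on the first call and released when the set is exhausted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for FuncCallContext: state kept across calls. */
typedef struct SeqContext {
    uint64_t call_cntr;   /* rows returned so far */
    uint64_t max_calls;   /* total rows to return */
    void    *user_fctx;   /* private state of this particular function */
} SeqContext;

typedef struct FibState {
    int a;
    int b;
} FibState;

/*
 * Return the next Fibonacci number, one row per call.
 * *ctx must be NULL on the first call; it is freed and reset to NULL
 * once the set is exhausted. Returns true and stores a row in *result,
 * or false when there are no more rows.
 */
bool fibonacci_next(SeqContext **ctx, int num, int *result)
{
    if (*ctx == NULL) {
        /* first call: set up cross-call state, like SRF_FIRSTCALL_INIT() */
        SeqContext *fresh = malloc(sizeof *fresh);
        FibState   *init = malloc(sizeof *init);

        assert(fresh != NULL && init != NULL);
        init->a = 0;
        init->b = 1;
        fresh->call_cntr = 0;
        fresh->max_calls = num > 0 ? (uint64_t) num : 0;
        fresh->user_fctx = init;
        *ctx = fresh;
    }

    SeqContext *c = *ctx;
    FibState   *st = c->user_fctx;

    if (c->call_cntr < c->max_calls) {
        /* one more row: like SRF_RETURN_NEXT() */
        int next = st->a + st->b;

        *result = st->a;
        st->a = st->b;
        st->b = next;
        c->call_cntr++;
        return true;
    }

    /* no more rows: release state, like SRF_RETURN_DONE() */
    free(st);
    free(c);
    *ctx = NULL;
    return false;
}
```

A caller starts with a NULL context and keeps calling until the function returns false, which corresponds to the executor invoking an SRF repeatedly until it reports that it is done.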
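Debugging PL/pgSQL code
At present there are a few main plpgsql debugger plugins: https://github.com/OmniDB/plpgsql_debugger and plugin_debugger (written by EDB). The mainstream PG IDEs, including DBeaver, pgAdmin 4 and Navicat, all support them. The LightDB 22.4 official release ships with plugin_debugger built in, and its redistributed DBeaver build supports debugging plpgsql and pgorasql out of the box.
In addition, as in Oracle, PG can print the call stack; see https://www.cybertec-postgresql.com/en/debugging-pl-pgsql-get-stacked-diagnostics/.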
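Implementation of PL/pgSQL
For PL/pgSQL stored procedure examples, see https://blog.csdn.net/kmblack1/article/details/92786900.
plpgsql supports transactions (in stored procedures, not in functions), evaluates expressions and statements through the expression engine, and executes SQL statements through SPI. To understand the PL/pgSQL implementation you therefore need to be familiar with transaction snapshots, expressions and SPI first; otherwise there will be many blind spots.
Compilation, validation, execution, language.
SQL execution engine, PL/pgSQL engine (no separate expression parser).
In PostgreSQL, executing PL/pgSQL procedures and functions is somewhat like JavaScript and Python: the first time a session loads one, it is parsed and semantically analyzed and an in-memory structure is built, and later executions run the cached in-memory instruction structure directly. This is rather troublesome: to be sure nothing is wrong you have to call it once first, and for a very complex stored procedure that means a lot of preparation. (Note: SQL procedures are compiled when they are created.) Because the plXXXsql procedural parser ultimately compiles the code into instructions that run in an interpreter resembling the expression engine, performance varies considerably with the implementation and is usually lower than that of functions written in C. For details see https://www.postgresql.org/docs/13/plpgsql-implementation.html and https://www.percona.com/live/19/sites/default/files/slides/Introduction%20to%20PL_pgSQL%20Development%20-%20FileId%20-%20187790.pdf.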
zjh@postgres=# CREATE OR REPLACE FUNCTION ambiguous(parameter varchar) RETURNS
zjh@postgres-# integer AS $$
zjh@postgres$# DECLARE retval integer;
zjh@postgres$# BEGIN
zjh@postgres$# INSERT INTO parameter (parameter) VALUES (parameter) RETURNING id
zjh@postgres$# INTO retval;
zjh@postgres$# RETURN retval;
zjh@postgres$# END
zjh@postgres$# $$
zjh@postgres-# language plpgsql;
CREATE FUNCTION
zjh@postgres=# SELECT ambiguous ('parameter');
ERROR:  relation "parameter" does not exist
LINE 1: INSERT INTO parameter (parameter) VALUES (parameter) RETURNI...
                    ^
QUERY:  INSERT INTO parameter (parameter) VALUES (parameter) RETURNING id
CONTEXT:  PL/pgSQL function ambiguous(character varying) line 4 at SQL statement
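Core design architecture of expressions
Compared with functions, the expression engine is not as straightforward to implement. Its core design is driven by runtime performance: an expression is usually evaluated once per row, and deeply recursive function calls are worse than plain iteration in both resource consumption and speed. PG therefore designs expressions as follows: at parse time, recurse downward and build nested function lists; at expression initialization time, again recurse from the outside in and convert the depth-first binary tree into an array, specifically in
Cursors and cross-transaction cursors: see https://www.cybertec-postgresql.com/en/declare-cursor-in-postgresql-or-how-to-reduce-memory-consumption/ and https://www.postgresql.org/docs/10/sql-declare.html. The macro definitions corresponding to the cursor options are: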
#define CURSOR_OPT_BINARY 0x0001 /* BINARY */
#define CURSOR_OPT_SCROLL 0x0002 /* SCROLL explicitly given */
#define CURSOR_OPT_NO_SCROLL 0x0004 /* NO SCROLL explicitly given */
#define CURSOR_OPT_INSENSITIVE 0x0008 /* INSENSITIVE */
#define CURSOR_OPT_HOLD 0x0010 /* WITH HOLD */
/* these planner-control flags do not correspond to any SQL grammar: */
#define CURSOR_OPT_FAST_PLAN 0x0020 /* prefer fast-start plan */
#define CURSOR_OPT_GENERIC_PLAN 0x0040 /* force use of generic plan */
#define CURSOR_OPT_CUSTOM_PLAN 0x0080 /* force use of custom plan */
#define CURSOR_OPT_PARALLEL_OK 0x0100 /* parallel mode OK */
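Since each option is a distinct bit, a set of cursor options is just an int built with bitwise OR and tested with bitwise AND. The sketch below repeats the SQL-level macros from the listing so that it compiles on its own; cursor_options_valid and cursor_survives_commit are hypothetical helper names introduced for illustration, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the listing above so the sketch is self-contained. */
#define CURSOR_OPT_BINARY       0x0001  /* BINARY */
#define CURSOR_OPT_SCROLL       0x0002  /* SCROLL explicitly given */
#define CURSOR_OPT_NO_SCROLL    0x0004  /* NO SCROLL explicitly given */
#define CURSOR_OPT_INSENSITIVE  0x0008  /* INSENSITIVE */
#define CURSOR_OPT_HOLD         0x0010  /* WITH HOLD */

/* Hypothetical helper: SCROLL and NO SCROLL must not both be given. */
bool cursor_options_valid(int options)
{
    return (options & CURSOR_OPT_SCROLL) == 0 ||
           (options & CURSOR_OPT_NO_SCROLL) == 0;
}

/* Hypothetical helper: a WITH HOLD cursor outlives the creating transaction. */
bool cursor_survives_commit(int options)
{
    return (options & CURSOR_OPT_HOLD) != 0;
}
```

For example, DECLARE c SCROLL CURSOR WITH HOLD FOR ... would correspond to CURSOR_OPT_SCROLL | CURSOR_OPT_HOLD, which is 0x0012.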
https://wiki.postgresql.org/wiki/Debugging_the_PostgreSQL_grammar_(Bison)