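Overview
Custom protocols are useful in some special scenarios, for example when data must be imported and exported directly between your own system and Greenplum. The steps for developing a custom protocol are outlined below.
Main steps
1. Implement several C interfaces (import, export, validate_urls) following the API format predefined by Greenplum. These interfaces are compiled into a shared library (.so) as exported functions and finally registered in GP. Registration requires user-defined GP functions, as in step 2.
2. Create the GP functions that reference the exported functions in the shared library.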
CREATE FUNCTION myread() RETURNS integer
as '$libdir/gpextprotocol.so', 'demoprot_import'
LANGUAGE C STABLE;
CREATE FUNCTION mywrite() RETURNS integer
as '$libdir/gpextprotocol.so', 'demoprot_export'
LANGUAGE C STABLE;
-- Optional function
CREATE OR REPLACE FUNCTION myvalidate() RETURNS void
AS '$libdir/gpextprotocol.so', 'demoprot_validate_urls'
LANGUAGE C STABLE;
3. Define the custom protocol myprot based on the database functions defined above. validatorfunc is optional.
CREATE TRUSTED PROTOCOL myprot(
writefunc='mywrite',
readfunc='myread',
validatorfunc='myvalidate');
4. Grant other users permission to use the protocol if needed.
GRANT ALL ON PROTOCOL myprot TO otheruser;
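5. Create readable or writable external tables that use the protocol.
Note: external tables that use a custom protocol handle malformed data the same way as external tables that use the standard protocols.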
CREATE WRITABLE EXTERNAL TABLE ext_sales(LIKE sales)
LOCATION ('myprot://<meta>/<meta>/…')
FORMAT 'TEXT';
CREATE READABLE EXTERNAL TABLE ext_sales(LIKE sales)
LOCATION('myprot://<meta>/<meta>/…')
FORMAT 'TEXT';
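Core code of a custom protocol
Export function
The export function sends data from Greenplum to the external system.
Based on the SQL statement, the Greenplum external table framework formats the queried data as specified (for example TEXT or CSV) and places it in a buffer. The user code calls an API to get the address of that buffer, reads the data from it, and sends it to the external system according to its own business logic.
The processing flow is roughly:
initialize the external connection -> send data to the external system -> close the external connection.
The GP external table framework calls the export function multiple times (depending on the data volume). Using the GP APIs, the function must determine for itself whether the connection has been initialized (after successful initialization, the connection information is saved in the user-defined context) and whether all data has been sent.
Initializing the connection
When the export function is called, it first calls EXTPROTOCOL_GET_USER_CTX to get the user-defined context (which can hold the description of the external connection). If the result is NULL, it performs the connection initialization and calls EXTPROTOCOL_SET_USER_CTX to save that information.
Sending data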
Datum
myprot_export(PG_FUNCTION_ARGS)
{
    ……
    /* ========================================
     * DO THE EXPORT
     * ======================================== */
    /* When a statement such as
     *   insert into ext_test_table(xx,xx) values(xxxxxxxx)
     * or
     *   insert into ext_test_table select * from internal_test_table
     * is executed, GP automatically formats the data as CSV, TEXT, etc. and places it in buf.
     * We only need to send it to the external system. Depending on the business logic,
     * the data format may need to be converted first. */
    data = EXTPROTOCOL_GET_DATABUF(fcinfo);
    datlen = EXTPROTOCOL_GET_DATALEN(fcinfo);
    if (datlen > 0)
    {
        /* Write (send) the data to the external system. This demo writes it directly to the specified local file. */
        wrote = fwrite(data, 1, datlen, myData->file);
        if (ferror(myData->file))
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("myprot_export: could not write to file \"%s\": %m",
                            myData->filename)));
    }
    PG_RETURN_INT32((int)wrote);
}
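When the external system is reached through a socket or pipe instead of a local file, a single write call may transmit only part of the buffer. The following is a minimal sketch of a helper that keeps writing until the whole buffer has been sent. write_all is a hypothetical name, not part of the Greenplum API, and the sketch assumes a POSIX file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the Greenplum API): write exactly len
 * bytes to fd, retrying after short writes and EINTR.
 * Returns len on success, or -1 with errno set by write(2) on failure.
 */
ssize_t
write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len)
    {
        ssize_t n = write(fd, buf + done, len - done);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;      /* real error: the caller would report it */
        }
        done += (size_t) n;
    }
    return (ssize_t) done;
}
```

In an export function, the buffer and length obtained from EXTPROTOCOL_GET_DATABUF and EXTPROTOCOL_GET_DATALEN would be passed to such a helper, and a return value of -1 would be turned into an ereport(ERROR, ...).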
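Closing the connection
Use EXTPROTOCOL_IS_LAST_CALL to determine whether any data remains. If there is none, the export is complete (in the case of static data) and the connection to the external system must be closed.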
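The initialize, send, close sequence of the export function can be sketched in plain C without the Greenplum headers. Everything named here is a hypothetical stand-in: FakeCall plays the role of fcinfo and of the values the EXTPROTOCOL_ macros would return, Conn is the user context, and the g_sink memory buffer plays the external system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the state Greenplum passes through fcinfo. */
typedef struct FakeCall
{
    void       *user_ctx;   /* what EXTPROTOCOL_GET/SET_USER_CTX would access */
    int         last_call;  /* what EXTPROTOCOL_IS_LAST_CALL would report */
    const char *data;       /* what EXTPROTOCOL_GET_DATABUF would return */
    int         datalen;    /* what EXTPROTOCOL_GET_DATALEN would return */
} FakeCall;

/* Hypothetical connection state kept in the user context. */
typedef struct Conn
{
    int is_open;
} Conn;

char   g_sink[256];         /* plays the role of the external system */
size_t g_used;
int    g_opens, g_closes;   /* counters so the sketch can be inspected */

int
sketch_export(FakeCall *call)
{
    Conn *conn;

    assert(call != NULL);
    conn = (Conn *) call->user_ctx;
    if (call->last_call)
    {
        /* No more data: release the connection if one was opened. */
        if (conn != NULL)
        {
            g_closes++;
            free(conn);
            call->user_ctx = NULL;
        }
        return 0;
    }
    if (conn == NULL)
    {
        /* First call: open the connection and remember it for later calls. */
        conn = malloc(sizeof(Conn));
        if (conn == NULL)
            return -1;
        conn->is_open = 1;
        g_opens++;
        call->user_ctx = conn;
    }
    if (call->datalen <= 0)
        return 0;
    /* Every other call: forward the already formatted rows unchanged. */
    if ((size_t) call->datalen > sizeof(g_sink) - g_used)
        return -1;          /* sink full: a real protocol would raise an error */
    memcpy(g_sink + g_used, call->data, (size_t) call->datalen);
    g_used += (size_t) call->datalen;
    return call->datalen;
}
```

Calling sketch_export several times with data and once with last_call set opens the connection once and closes it once, which is the behavior the real export function has to reproduce with the user context.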
Import function
Receiving data
Datum myprot_import(PG_FUNCTION_ARGS)
{
    ……
    /* ==========================================
     * DO THE IMPORT
     * ========================================== */
    data = EXTPROTOCOL_GET_DATABUF(fcinfo);
    datlen = EXTPROTOCOL_GET_DATALEN(fcinfo);
    /* read some bytes (with fread in this example, but normally
       in some other method over the network) */
    if (datlen > 0)
    {
        /* The GP API allocates the buffer for us. We fetch data from the external system
         * according to our own business logic, arrange it in the format agreed on when the
         * external table was defined (such as CSV or TEXT), put it in the buffer, and
         * return the number of bytes actually read. */
        nread = fread(data, 1, datlen, myData->file);
        if (ferror(myData->file))
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("myprot_import: could not read from file \"%s\": %m",
                            myData->filename)));
    }
    /* Normally, when the import function returns 0, the Greenplum framework considers the
     * import complete, and subsequent calls to EXTPROTOCOL_IS_LAST_CALL return true. */
    PG_RETURN_INT32((int)nread);
}
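Closing the connection
Importing data into Greenplum raises one question: how to tell when the import is complete. The data in the external source may be static (for example, already stored in a text file) or dynamic (for example, coming from a message queue or generated by a script), and which one applies is not known in advance. Developers therefore have to handle this according to their own business logic.
Normally, when the import function returns 0, the Greenplum framework considers the import complete, and subsequent calls to EXTPROTOCOL_IS_LAST_CALL return TRUE. EXTPROTOCOL_SET_LAST_CALL can also be called to force it to TRUE.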
EXTPROTOCOL_SET_LAST_CALL(fcinfo);
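The "return 0 when there is nothing left" convention can be illustrated with a small plain-C sketch. Source, sketch_import and drain are hypothetical names: Source is an in-memory stand-in for a file or socket, sketch_import plays the import function, and drain plays the framework that keeps calling it. For a dynamic source such as a message queue, deciding when to return 0 is the part that depends on the business logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory data source standing in for a file or socket. */
typedef struct Source
{
    const char *bytes;
    size_t      len;
    size_t      pos;
} Source;

/*
 * Copy at most buflen bytes into buf and return how many were copied.
 * A return value of 0 means the source is exhausted, which is the signal
 * an import function passes back to the framework.
 */
int
sketch_import(Source *src, char *buf, int buflen)
{
    size_t left, n;

    assert(src != NULL && buf != NULL && buflen >= 0);
    left = src->len - src->pos;
    n = left < (size_t) buflen ? left : (size_t) buflen;
    if (n > 0)
        memcpy(buf, src->bytes + src->pos, n);
    src->pos += n;
    return (int) n;
}

/*
 * Stand-in for the framework: call the reader until it returns 0 and
 * collect the bytes in out. Returns the number of calls made, including
 * the final call that returned 0.
 */
int
drain(Source *src, char *out, size_t outlen, int chunk)
{
    char   buf[64];
    size_t total = 0;
    int    calls = 0;
    int    n;

    assert(chunk > 0 && chunk <= (int) sizeof(buf));
    while ((n = sketch_import(src, buf, chunk)) > 0)
    {
        assert(total + (size_t) n <= outlen);
        memcpy(out + total, buf, (size_t) n);
        total += (size_t) n;
        calls++;
    }
    return calls + 1;
}
```

With a 10-byte source and a 4-byte buffer, the reader returns 4, 4, 2 and then 0, so the data arrives in three chunks and the fourth call ends the import.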
(Optional) validator function
For example, it can check the number of URLs in LOCATION.
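The kind of checks a validator performs can be sketched as a plain C function over an array of URL strings. check_urls and the URLS_ result codes are hypothetical names for this sketch, and the two rules (a forbidden substring, and at most two URLs for readable tables) are taken from the demo listing at the end of this article. A real validator would obtain the URLs through the EXTPROTOCOL_VALIDATOR_ macros and report failures with ereport.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum { URLS_OK = 0, URLS_FORBIDDEN_PATH = 1, URLS_TOO_MANY = 2 };

/*
 * Apply two example rules to the LOCATION URLs of an external table:
 * no URL may contain "secured_directory", and a readable table may use
 * at most two URLs.
 */
int
check_urls(const char *const *urls, int nurls, int is_readable)
{
    int i;

    assert(urls != NULL && nurls >= 0);
    for (i = 0; i < nurls; i++)
    {
        if (strstr(urls[i], "secured_directory") != NULL)
            return URLS_FORBIDDEN_PATH;
    }
    if (is_readable && nurls > 2)
        return URLS_TOO_MANY;
    return URLS_OK;
}
```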
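Custom external protocol API
GP provides a set of APIs for controlling data exchange in a custom protocol:
/* ---- Read/write function API ------*/
CALLED_AS_EXTPROTOCOL(fcinfo) /* Checks whether the current function was called by the external table manager */
EXTPROTOCOL_GET_URL(fcinfo) /* Gets the external table data URL; call this API during initialization */
EXTPROTOCOL_GET_DATABUF(fcinfo) /* Gets the address of the buffer used to exchange data with GP:
* When exporting from GP, GP fills this buffer with data and we send it to our own system;
* When importing into GP, we fetch the data from the external system ourselves and fill this buffer.
* The data in the buffer must use the format specified when the external table was defined, such as CSV or TEXT. */
EXTPROTOCOL_GET_DATALEN(fcinfo) /* Length of the data above (on import, the number of bytes the buffer can take) */
EXTPROTOCOL_GET_SCANQUALS(fcinfo)
EXTPROTOCOL_GET_USER_CTX(fcinfo) /* Gets the user-defined context */
EXTPROTOCOL_IS_LAST_CALL(fcinfo) /* Checks whether this is the last call (import/export finished):
* When exporting from GP, the source data is usually static, so GP knows when it ends and sets the flag itself. Once this API reports that the export is complete, we perform cleanup and close the connection;
* When importing from an external system into GP, the return value of the custom read interface must be set correctly. When the read interface returns 0, GP considers the import complete, and subsequent calls to EXTPROTOCOL_IS_LAST_CALL return true. */
EXTPROTOCOL_SET_LAST_CALL(fcinfo) /* Lets the protocol decide for itself when the transfer ends */
EXTPROTOCOL_SET_USER_CTX(fcinfo, p) /* Sets the user-defined context; counterpart of GET_USER_CTX. This field typically holds global context such as information about the connection to the external system.
* In the demo, a NULL context is treated as the first call: the connection to the external system is initialized, the connection context is stored with set, and later calls retrieve it with get. */
/* ------ Validator function API ------*/
CALLED_AS_EXTPROTOCOL_VALIDATOR(fcinfo) /* Usually called in validate_urls; checks whether that function was called by the external table manager */
EXTPROTOCOL_VALIDATOR_GET_URL_LIST(fcinfo) /* Gets the list of LOCATION URLs of the external table */
EXTPROTOCOL_VALIDATOR_GET_NUM_URLS(fcinfo) /* Gets the number of URLs */
EXTPROTOCOL_VALIDATOR_GET_NTH_URL(fcinfo, n) /* Gets the nth URL (the demo counts from 1) */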
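Build commands
The contrib/extprotocol directory of the GP source tree provides a Makefile, but real custom protocol development usually needs its own header files and libraries, so the Makefile has to be modified. The build commands for the demo (run while logged in as the gpadmin system account):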
cc -fpic -c gpextprotocol.c -I$GPHOME/include/postgresql/server
cc -shared -o gpextprotocol.so gpextprotocol.o
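For example, the author needed to integrate MPP with kafka, which requires adding the librdkafka headers and library to the build. The commands are as follows (the kafka library is linked statically):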
cc -fpic -c gpextprotocol.c -I$GPHOME/include/postgresql/server -I/usr/local/include/librdkafka/
cc -shared -o gpextprotocol.so gpextprotocol.o librdkafka.a -lsasl2
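PS:
The official GP documentation contains a detailed introduction and a demo. This article is based on the official documentation and on practical development experience.
The demo provided in the GP 5.2 source code follows.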
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "access/extprotocol.h"
#include "catalog/pg_proc.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/memutils.h"
#include "catalog/pg_exttable.h"
typedef struct DemoUri
{
    char *protocol;
    char *path;
} DemoUri;
static DemoUri *ParseDemoUri(const char *uri_str);
static void FreeDemoUri(DemoUri* uri);
/* Do the module magic dance */
PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(demoprot_export);
PG_FUNCTION_INFO_V1(demoprot_import);
PG_FUNCTION_INFO_V1(demoprot_validate_urls);
Datum demoprot_export(PG_FUNCTION_ARGS);
Datum demoprot_import(PG_FUNCTION_ARGS);
Datum demoprot_validate_urls(PG_FUNCTION_ARGS);
typedef struct {
    char *url;
    char *filename;
    FILE *file;
} extprotocol_t;
static void check_ext_options(const FunctionCallInfo fcinfo)
{
    ListCell *cell;
    Relation rel = EXTPROTOCOL_GET_RELATION(fcinfo);
    ExtTableEntry *exttbl = GetExtTableEntry(rel->rd_id);
    List *options = exttbl->options;
    foreach(cell, options) {
        DefElem *def = (DefElem *) lfirst(cell);
        char *key = def->defname;
        if (key && strcasestr(key, "database") && !strcasestr(key, "greenplum")) {
            ereport(ERROR, (0, errmsg("This is greenplum.")));
        }
    }
}
/*
 * Import data into GPDB.
 */
Datum
demoprot_import(PG_FUNCTION_ARGS)
{
    extprotocol_t *myData;
    char *data;
    int datlen;
    size_t nread = 0;
    /* Must be called via the external table format manager */
    if (!CALLED_AS_EXTPROTOCOL(fcinfo))
        elog(ERROR, "extprotocol_import: not called by external protocol manager");
    /* Get our internal description of the protocol */
    myData = (extprotocol_t *) EXTPROTOCOL_GET_USER_CTX(fcinfo);
    if (EXTPROTOCOL_IS_LAST_CALL(fcinfo))
    {
        /* we're done receiving data. close our connection */
        if (myData && myData->file)
            if (fclose(myData->file))
                ereport(ERROR,
                        (errcode_for_file_access(),
                         errmsg("could not close file \"%s\": %m",
                                myData->filename)));
        PG_RETURN_INT32(0);
    }
    if (myData == NULL)
    {
        /* first call. do any desired init */
        const char *p_name = "demoprot";
        DemoUri *parsed_url;
        char *url = EXTPROTOCOL_GET_URL(fcinfo);
        myData = palloc(sizeof(extprotocol_t));
        myData->url = pstrdup(url);
        parsed_url = ParseDemoUri(myData->url);
        myData->filename = pstrdup(parsed_url->path);
        if (strcasecmp(parsed_url->protocol, p_name) != 0)
            elog(ERROR, "internal error: demoprot called with a different protocol (%s)",
                 parsed_url->protocol);
        /* An example of checking options */
        check_ext_options(fcinfo);
        FreeDemoUri(parsed_url);
        /* open the source file (or connect to remote server in other cases) */
        myData->file = fopen(myData->filename, "r");
        if (myData->file == NULL)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("demoprot_import: could not open file \"%s\" for reading: %m",
                            myData->filename)));
        EXTPROTOCOL_SET_USER_CTX(fcinfo, myData);
    }
    /* =======================================================================
     * DO THE IMPORT
     * ======================================================================= */
    data = EXTPROTOCOL_GET_DATABUF(fcinfo);
    datlen = EXTPROTOCOL_GET_DATALEN(fcinfo);
    if (datlen > 0)
    {
        nread = fread(data, 1, datlen, myData->file);
        if (ferror(myData->file))
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("demoprot_import: could not read from file \"%s\": %m",
                            myData->filename)));
    }
    PG_RETURN_INT32((int)nread);
}
/*
 * Export data out of GPDB.
 */
Datum
demoprot_export(PG_FUNCTION_ARGS)
{
    extprotocol_t *myData;
    char *data;
    int datlen;
    size_t wrote = 0;
    /* Must be called via the external table format manager */
    if (!CALLED_AS_EXTPROTOCOL(fcinfo))
        elog(ERROR, "extprotocol_export: not called by external protocol manager");
    /* Get our internal description of the protocol */
    myData = (extprotocol_t *) EXTPROTOCOL_GET_USER_CTX(fcinfo);
    if (EXTPROTOCOL_IS_LAST_CALL(fcinfo))
    {
        /* we're done sending data. close our connection */
        if (myData && myData->file)
            if (fclose(myData->file))
                ereport(ERROR,
                        (errcode_for_file_access(),
                         errmsg("could not close file \"%s\": %m",
                                myData->filename)));
        PG_RETURN_INT32(0);
    }
    if (myData == NULL)
    {
        /* first call. do any desired init */
        const char *p_name = "demoprot";
        DemoUri *parsed_url;
        char *url = EXTPROTOCOL_GET_URL(fcinfo);
        myData = palloc(sizeof(extprotocol_t));
        myData->url = pstrdup(url);
        parsed_url = ParseDemoUri(myData->url);
        myData->filename = pstrdup(parsed_url->path);
        if (strcasecmp(parsed_url->protocol, p_name) != 0)
            elog(ERROR, "internal error: demoprot called with a different protocol (%s)",
                 parsed_url->protocol);
        FreeDemoUri(parsed_url);
        /* open the destination file (or connect to remote server in other cases) */
        myData->file = fopen(myData->filename, "a");
        if (myData->file == NULL)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("demoprot_export: could not open file \"%s\" for writing: %m",
                            myData->filename)));
        EXTPROTOCOL_SET_USER_CTX(fcinfo, myData);
    }
    /* =======================================================================
     * DO THE EXPORT
     * ======================================================================= */
    data = EXTPROTOCOL_GET_DATABUF(fcinfo);
    datlen = EXTPROTOCOL_GET_DATALEN(fcinfo);
    if (datlen > 0)
    {
        wrote = fwrite(data, 1, datlen, myData->file);
        if (ferror(myData->file))
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("demoprot_export: could not write to file \"%s\": %m",
                            myData->filename)));
    }
    PG_RETURN_INT32((int)wrote);
}
Datum
demoprot_validate_urls(PG_FUNCTION_ARGS)
{
    int nurls;
    int i;
    ValidatorDirection direction;
    /* Must be called via the external table format manager */
    if (!CALLED_AS_EXTPROTOCOL_VALIDATOR(fcinfo))
        elog(ERROR, "demoprot_validate_urls: not called by external protocol manager");
    nurls = EXTPROTOCOL_VALIDATOR_GET_NUM_URLS(fcinfo);
    direction = EXTPROTOCOL_VALIDATOR_GET_DIRECTION(fcinfo);
    /*
     * Dumb example 1: search each url for a substring
     * we don't want to be used in a url. in this example
     * it's 'secured_directory'.
     */
    for (i = 1 ; i <= nurls ; i++)
    {
        char *url = EXTPROTOCOL_VALIDATOR_GET_NTH_URL(fcinfo, i);
        if (strstr(url, "secured_directory") != 0)
        {
            ereport(ERROR,
                    (errcode(ERRCODE_PROTOCOL_VIOLATION),
                     errmsg("using 'secured_directory' in a url isn't allowed ")));
        }
    }
    /*
     * Dumb example 2: set a limit on the number of urls
     * used. In this example we limit readable external
     * tables that use our protocol to 2 urls max.
     */
    if (direction == EXT_VALIDATE_READ && nurls > 2)
    {
        ereport(ERROR,
                (errcode(ERRCODE_PROTOCOL_VIOLATION),
                 errmsg("more than 2 urls aren't allowed in this protocol ")));
    }
    PG_RETURN_VOID();
}
/* --- utility functions --- */
static
DemoUri *ParseDemoUri(const char *uri_str)
{
    DemoUri *uri = (DemoUri *) palloc0(sizeof(DemoUri));
    int protocol_len;
    uri->path = NULL;
    uri->protocol = NULL;
    /*
     * parse protocol
     */
    char *post_protocol = strstr(uri_str, "://");
    if (!post_protocol)
    {
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("invalid demo prot URI \'%s\'", uri_str)));
    }
    protocol_len = post_protocol - uri_str;
    uri->protocol = (char *) palloc0 (protocol_len + 1);
    strncpy(uri->protocol, uri_str, protocol_len);
    /* make sure there is more to the uri string than "<protocol>://" */
    if (strlen(uri_str) <= protocol_len + strlen("://"))
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("invalid demo prot URI \'%s\' : missing path", uri_str)));
    /*
     * parse path
     */
    uri->path = pstrdup(uri_str + protocol_len + strlen("://"));
    return uri;
}
static
void FreeDemoUri(DemoUri *uri)
{
    if (uri->path)
        pfree(uri->path);
    if (uri->protocol)
        pfree(uri->protocol);
    pfree(uri);
}
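To experiment with the URI splitting outside the server, the same idea can be written against the standard C library. This is only a sketch under stated assumptions: SketchUri, sketch_parse_uri and sketch_free_uri are hypothetical names, malloc and free replace palloc and pfree, and errors are returned as -1 instead of being raised with ereport.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchUri
{
    char *protocol;
    char *path;
} SketchUri;

/*
 * Split "<protocol>://<path>" into its two parts.
 * Returns 0 on success, or -1 if the separator or the path is missing
 * or memory cannot be allocated. On failure both fields are NULL.
 */
int
sketch_parse_uri(const char *uri_str, SketchUri *out)
{
    static const char sep[] = "://";
    const char *p;
    size_t      protocol_len;

    assert(uri_str != NULL && out != NULL);
    out->protocol = NULL;
    out->path = NULL;
    p = strstr(uri_str, sep);
    if (p == NULL)
        return -1;              /* no "://" in the string */
    protocol_len = (size_t) (p - uri_str);
    p += strlen(sep);
    if (*p == '\0')
        return -1;              /* nothing after "://" */
    out->protocol = malloc(protocol_len + 1);
    out->path = malloc(strlen(p) + 1);
    if (out->protocol == NULL || out->path == NULL)
    {
        free(out->protocol);
        free(out->path);
        out->protocol = out->path = NULL;
        return -1;
    }
    memcpy(out->protocol, uri_str, protocol_len);
    out->protocol[protocol_len] = '\0';
    strcpy(out->path, p);
    return 0;
}

/* Release both parts; safe to call after a failed parse. */
void
sketch_free_uri(SketchUri *uri)
{
    free(uri->protocol);
    free(uri->path);
    uri->protocol = uri->path = NULL;
}
```

For the input demoprot:///tmp/sales.txt this yields the protocol demoprot and the path /tmp/sales.txt, the latter being what the demo passes to fopen.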