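005 - SQL Statement Execution Flow Analysis, Part 2: Logical Query Optimization

SQL statement execution flow

The whole flow is carried out in the exec_simple_query function. The code skeleton is as follows: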

/*
 * exec_simple_query
 *
 * Execute a "simple Query" protocol message.
 */
static void
exec_simple_query(const char *query_string)
{
	...
	//Obtain the raw parse tree
	/*
	 * Do basic parsing of the query or queries (this should be safe even if
	 * we are in aborted transaction state!)
	 */
	parsetree_list = pg_parse_query(query_string);

	...
	//Loop over the SQL statements
	/*
	 * Run through the raw parsetree(s) and process each one.
	 */
	foreach(parsetree_item, parsetree_list)
	{
		...
		
		//Analyze and rewrite the raw parse tree to produce the query tree(s)
		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
												NULL, 0, NULL);
		//Optimize the query trees and generate the execution plans
		plantree_list = pg_plan_queries(querytree_list,
										CURSOR_OPT_PARALLEL_OK, NULL);

		...
		
		//Execute the statement
		/*
		 * Run the portal to completion, and then drop it (and the receiver).
		 */
		(void) PortalRun(portal,
						 FETCH_ALL,
						 true,	/* always top level */
						 true,
						 receiver,
						 receiver,
						 completionTag);

		...
	}
	...
}

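Logical query optimization

The query tree produced by analysis and rewriting is not yet optimal. When SELECT subqueries are nested deeply, the base tables at the bottom end up far from the root of the tree, which lengthens query time. In addition, each node of the query tree carries its information independently, so redundant work is possible; this is why logical optimization is needed. Logical query optimization takes relational algebra as its theoretical basis and the query tree as its carrier: it traverses the query tree and rewrites it while preserving the semantics of every syntactic unit and the final result, aiming for a query tree without redundancy.

In the code, this is implemented under the pg_plan_queries function.

Logical optimization has a counterpart, physical optimization, and the two use completely different strategies:

Logical optimization: rewrites such as removing redundancy, pulling up, and pushing down

Physical optimization: a cost-based strategy, introduced later

Basic steps of the optimization and the related code:

  • Utility statements (DDL and other utility commands, commandType == CMD_UTILITY) are not planned
  • Non-utility statements, including DML, are handled by pg_plan_query, which calls planner
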
List *
pg_plan_queries(List *querytrees, int cursorOptions, ParamListInfo boundParams)
{
	List	   *stmt_list = NIL;
	ListCell   *query_list;

	foreach(query_list, querytrees)
	{
		Query	   *query = lfirst_node(Query, query_list);
		PlannedStmt *stmt;

		if (query->commandType == CMD_UTILITY)
		{
			/* Utility commands require no planning. */
			stmt = makeNode(PlannedStmt);
			stmt->commandType = CMD_UTILITY;
			stmt->canSetTag = query->canSetTag;
			stmt->utilityStmt = query->utilityStmt;
			stmt->stmt_location = query->stmt_location;
			stmt->stmt_len = query->stmt_len;
		}
		else
		{
			stmt = pg_plan_query(query, cursorOptions, boundParams);
		}

		stmt_list = lappend(stmt_list, stmt);
	}

	return stmt_list;
}


PlannedStmt *
pg_plan_query(Query *querytree, int cursorOptions, ParamListInfo boundParams)
{
	PlannedStmt *plan;

	...
	/* call the optimizer */
	plan = planner(querytree, cursorOptions, boundParams);

	...

	return plan;
}

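Handling non-utility statements

Non-utility statements are handled in the planner function: if planner_hook is set, the hook function is called; otherwise standard_planner is called by default.

standard_planner processes the query tree recursively (through subquery_planner) and stores the results that are global to the statement in PlannerGlobal. It then calls create_plan, which builds the plan tree from the PlannerInfo and the chosen best path. Finally, the basic information held in PlannerGlobal and PlannerInfo is copied into a PlannedStmt, which is returned.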

PlannedStmt *
planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	PlannedStmt *result;

	if (planner_hook)
		result = (*planner_hook) (parse, cursorOptions, boundParams);
	else
		result = standard_planner(parse, cursorOptions, boundParams);
	return result;
}

PlannedStmt *
standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	...
	
	/* primary planning entry point (may recurse for subqueries) */
	root = subquery_planner(glob, parse, NULL,
							false, tuple_fraction);

	/* Select best Path and turn it into a Plan */
	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);

	top_plan = create_plan(root, best_path);

	...
	
	/* build the PlannedStmt result */
	result = makeNode(PlannedStmt);

	result->commandType = parse->commandType;
	...
	result->stmt_len = parse->stmt_len;

	result->jitFlags = PGJIT_NONE;
	...

	return result;
}

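Plan tree generation and optimization

This happens in subquery_planner, which handles the parts of a query by type. The optimization part includes sublink pull-up, function handling, subquery pull-up and similar operations. Because generation and optimization are done type by type and at the same time, they are described together here.

Processing CTEs (common table expressions)

SS_process_ctes: processes the CTE clauses (WITH clauses) of the query. A CTE is a temporary result set that lets the result of a subquery be used as an independent result set. The function walks the cteList and plans each CTE query by calling subquery_planner again; the resulting plans are stored in the root->glob->subplans list.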

void
SS_process_ctes(PlannerInfo *root)
{
	ListCell   *lc;

	Assert(root->cte_plan_ids == NIL);

	foreach(lc, root->parse->cteList)
	{
		...
		
		/*
		 * Generate Paths for the CTE query.  Always plan for full retrieval
		 * --- we don't have enough info to predict otherwise.
		 */
		subroot = subquery_planner(root->glob, subquery,
								   root,
								   cte->cterecursive, 0.0);

		...

		plan = create_plan(subroot, best_path);

		...
		
		/*
		 * Add the subplan and its PlannerInfo to the global lists.
		 */
		root->glob->subplans = lappend(root->glob->subplans, plan);
		root->glob->subroots = lappend(root->glob->subroots, subroot);
		
		...
	}
}

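Sublink pull-up

pull_up_sublinks: converts ANY (sub-SELECT) and EXISTS sublinks into joins, so that the sublink is merged into the parent query and both are optimized together.

An ANY sublink is converted to a semi-join; the conversion applies only to sublinks in WHERE or JOIN/ON clauses.

An EXISTS or NOT EXISTS sublink is converted to a semi-join or an anti-semi-join, respectively.

Basic flow

Because the WHERE-related nodes are stored in the jointree, sublink pull-up passes root->parse->jointree to pull_up_sublinks_jointree_recurse. That function checks the type of each jointree node and handles each type separately:

  • RangeTblRef: returned as is, no optimization
  • FromExpr: a FromExpr has two fields, the base table information (fromlist) and the WHERE condition (quals); it is processed as follows:
    • fromlist: call pull_up_sublinks_jointree_recurse recursively on each item
    • quals: call pull_up_sublinks_qual_recurse to pull up sublinks found in the WHERE condition
  • JoinExpr: a JoinExpr has the left and right inputs (larg and rarg) and the ON condition (quals); it is processed as follows:
    • call pull_up_sublinks_jointree_recurse on the left and right children
    • pull up sublinks found in the ON condition (quals) according to the join type

The related code is as follows: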

static Node *
pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
								  Relids *relids)
{
	if (jtnode == NULL)
	{
		*relids = NULL;
	}
	else if (IsA(jtnode, RangeTblRef))
	{
		int			varno = ((RangeTblRef *) jtnode)->rtindex;

		*relids = bms_make_singleton(varno);
		/* jtnode is returned unmodified */
	}
	else if (IsA(jtnode, FromExpr))
	{
		FromExpr   *f = (FromExpr *) jtnode;
		List	   *newfromlist = NIL;
		Relids		frelids = NULL;
		FromExpr   *newf;
		Node	   *jtlink;
		ListCell   *l;

		/* First, recurse to process children and collect their relids */
		foreach(l, f->fromlist)
		{
			Node	   *newchild;
			Relids		childrelids;

			newchild = pull_up_sublinks_jointree_recurse(root,
														 lfirst(l),
														 &childrelids);
			newfromlist = lappend(newfromlist, newchild);
			frelids = bms_join(frelids, childrelids);
		}
		/* Build the replacement FromExpr; no quals yet */
		newf = makeFromExpr(newfromlist, NULL);
		/* Set up a link representing the rebuilt jointree */
		jtlink = (Node *) newf;
		/* Now process qual --- all children are available for use */
		newf->quals = pull_up_sublinks_qual_recurse(root, f->quals,
													&jtlink, frelids,
													NULL, NULL);

		/*
		 * Note that the result will be either newf, or a stack of JoinExprs
		 * with newf at the base.  We rely on subsequent optimization steps to
		 * flatten this and rearrange the joins as needed.
		 *
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.
		 */
		*relids = frelids;
		jtnode = jtlink;
	}
	else if (IsA(jtnode, JoinExpr))
	{
		JoinExpr   *j;
		Relids		leftrelids;
		Relids		rightrelids;
		Node	   *jtlink;

		/*
		 * Make a modifiable copy of join node, but don't bother copying its
		 * subnodes (yet).
		 */
		j = (JoinExpr *) palloc(sizeof(JoinExpr));
		memcpy(j, jtnode, sizeof(JoinExpr));
		jtlink = (Node *) j;

		/* Recurse to process children and collect their relids */
		j->larg = pull_up_sublinks_jointree_recurse(root, j->larg,
													&leftrelids);
		j->rarg = pull_up_sublinks_jointree_recurse(root, j->rarg,
													&rightrelids);

		/*
		 * Now process qual, showing appropriate child relids as available,
		 * and attach any pulled-up jointree items at the right place. In the
		 * inner-join case we put new JoinExprs above the existing one (much
		 * as for a FromExpr-style join).  In outer-join cases the new
		 * JoinExprs must go into the nullable side of the outer join. The
		 * point of the available_rels machinations is to ensure that we only
		 * pull up quals for which that's okay.
		 *
		 * We don't expect to see any pre-existing JOIN_SEMI or JOIN_ANTI
		 * nodes here.
		 */
		switch (j->jointype)
		{
			case JOIN_INNER:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &jtlink,
														 bms_union(leftrelids,
																   rightrelids),
														 NULL, NULL);
				break;
			case JOIN_LEFT:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &j->rarg,
														 rightrelids,
														 NULL, NULL);
				break;
			case JOIN_FULL:
				/* can't do anything with full-join quals */
				break;
			case JOIN_RIGHT:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &j->larg,
														 leftrelids,
														 NULL, NULL);
				break;
			default:
				elog(ERROR, "unrecognized join type: %d",
					 (int) j->jointype);
				break;
		}

		/*
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.  But we *do* need to include the join's own rtindex
		 * because we haven't yet collapsed join alias variables, so upper
		 * levels would mistakenly think they couldn't use references to this
		 * join.
		 */
		*relids = bms_join(leftrelids, rightrelids);
		if (j->rtindex)
			*relids = bms_add_member(*relids, j->rtindex);
		jtnode = jtlink;
	}
	else
		elog(ERROR, "unrecognized node type: %d",
			 (int) nodeTag(jtnode));
	return jtnode;
}

From the above, the basic flow is clear: the base tables of a sublink are pulled up into the top-level list of base tables.

		foreach(l, f->fromlist)
		{
			Node	   *newchild;
			Relids		childrelids;

			//The base table list is collected here
			newchild = pull_up_sublinks_jointree_recurse(root,
														 lfirst(l),
														 &childrelids);
			newfromlist = lappend(newfromlist, newchild);
			frelids = bms_join(frelids, childrelids);
		}
		//Create a new node here and assign the base table list to it
		/* Build the replacement FromExpr; no quals yet */
		newf = makeFromExpr(newfromlist, NULL);
		/* Set up a link representing the rebuilt jointree */
		jtlink = (Node *) newf;
		
		//Assign quals here
		/* Now process qual --- all children are available for use */
		newf->quals = pull_up_sublinks_qual_recurse(root, f->quals,
													&jtlink, frelids,
													NULL, NULL);

		/*
		 * Note that the result will be either newf, or a stack of JoinExprs
		 * with newf at the base.  We rely on subsequent optimization steps to
		 * flatten this and rearrange the joins as needed.
		 *
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.
		 */
		*relids = frelids;
		
		//Return the new node here
		jtnode = jtlink;

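But how is the pull-up of the WHERE condition or ON condition actually completed? For that, look at pull_up_sublinks_qual_recurse. Three basic kinds of node in a quals expression need handling:

  • SubLink nodes:

    • ANY_SUBLINK: convert_ANY_sublink_to_join is called to try to convert the ANY sublink into a join. On success it returns a new JoinExpr of type JOIN_SEMI. The left child is filled in with the upper jointree node passed in by the caller, while the right child was already created inside convert_ANY_sublink_to_join as a RangeTblRef. With both children in place, the pull-up of the qual is complete.
    • EXISTS_SUBLINK: convert_EXISTS_sublink_to_join is called to try to convert the EXISTS sublink into a join. On success it returns a new JoinExpr whose join type is not fixed in advance (a semi-join for EXISTS, an anti-semi-join for NOT EXISTS, as noted above). The left child is filled in with the upper jointree node passed in by the caller, while the right child is built inside convert_EXISTS_sublink_to_join from the sublink's subquery. With both children in place, the pull-up of the qual is complete.
  • NOT clauses: when the argument is a sublink, it is handled like EXISTS_SUBLINK

  • AND and OR clauses: walk the arguments of the BoolExpr node and call pull_up_sublinks_qual_recurse recursively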

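The conversion in detail

The actual conversion is subject to these restrictions:

  • The sublink's subquery must not reference Vars of the parent query (the author describes this as forming a cycle)
  • The comparison expression must contain Vars of the parent query
  • The comparison expression must not contain any volatile functions

Var: a target column of a base table as seen in query analysis and optimization, or an output column of a subquery plan

How convert_ANY_sublink_to_join works:

  • contain_vars_of_level: the parent-reference check; checks whether the subquery references Vars of the parent query level.
  • pull_varnos: checks the comparison expression by collecting the relations its Vars refer to
  • contain_volatile_functions: looks for volatile functions
  • addRangeTableEntryForSubquery: creates a RangeTblEntry named ANY_subquery, which is appended to the parent query's range table (rtable list)
  • generate_subquery_vars: builds Vars, based on the new rtable entry, that represent the outputs of the sublink's subquery
  • convert_testexpr: handled by calling an XXX_mutator function
  • Build the JoinExpr node; its larg is filled in by the caller
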
JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
							Relids available_rels)
{
	JoinExpr   *result;
	Query	   *parse = root->parse;
	Query	   *subselect = (Query *) sublink->subselect;
	Relids		upper_varnos;
	int			rtindex;
	RangeTblEntry *rte;
	RangeTblRef *rtr;
	List	   *subquery_vars;
	Node	   *quals;
	ParseState *pstate;

	Assert(sublink->subLinkType == ANY_SUBLINK);

	/*
	 * The sub-select must not refer to any Vars of the parent query. (Vars of
	 * higher levels should be okay, though.)
	 */
	if (contain_vars_of_level((Node *) subselect, 1))
		return NULL;

	/*
	 * The test expression must contain some Vars of the parent query, else
	 * it's not gonna be a join.  (Note that it won't have Vars referring to
	 * the subquery, rather Params.)
	 */
	upper_varnos = pull_varnos(sublink->testexpr);
	if (bms_is_empty(upper_varnos))
		return NULL;

	/*
	 * However, it can't refer to anything outside available_rels.
	 */
	if (!bms_is_subset(upper_varnos, available_rels))
		return NULL;

	/*
	 * The combining operators and left-hand expressions mustn't be volatile.
	 */
	if (contain_volatile_functions(sublink->testexpr))
		return NULL;

	/* Create a dummy ParseState for addRangeTableEntryForSubquery */
	pstate = make_parsestate(NULL);

	/*
	 * Okay, pull up the sub-select into upper range table.
	 *
	 * We rely here on the assumption that the outer query has no references
	 * to the inner (necessarily true, other than the Vars that we build
	 * below). Therefore this is a lot easier than what pull_up_subqueries has
	 * to go through.
	 */
	rte = addRangeTableEntryForSubquery(pstate,
										subselect,
										makeAlias("ANY_subquery", NIL),
										false,
										false);
	parse->rtable = lappend(parse->rtable, rte);
	rtindex = list_length(parse->rtable);

	/*
	 * Form a RangeTblRef for the pulled-up sub-select.
	 */
	rtr = makeNode(RangeTblRef);
	rtr->rtindex = rtindex;

	/*
	 * Build a list of Vars representing the subselect outputs.
	 */
	subquery_vars = generate_subquery_vars(root,
										   subselect->targetList,
										   rtindex);

	/*
	 * Build the new join's qual expression, replacing Params with these Vars.
	 */
	quals = convert_testexpr(root, sublink->testexpr, subquery_vars);

	/*
	 * And finally, build the JoinExpr node.
	 */
	result = makeNode(JoinExpr);
	result->jointype = JOIN_SEMI;
	result->isNatural = false;
	result->larg = NULL;		/* caller must fill this in */
	result->rarg = (Node *) rtr;
	result->usingClause = NIL;
	result->quals = quals;
	result->alias = NULL;
	result->rtindex = 0;		/* we don't need an RTE for it */

	return result;
}
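
Two of the checks in the function above, bms_is_empty and bms_is_subset, are plain set tests on range-table indexes. The following is a minimal sketch of that logic only, assuming a 32-bit mask stands in for PostgreSQL's Bitmapset. The names relids_t, relids_make_singleton, relids_is_empty, relids_is_subset and testexpr_is_pullable are hypothetical and are not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit i is set when rtindex i is a member. */
typedef uint32_t relids_t;

/* Build a set holding a single range-table index (0..31 in this sketch). */
static relids_t
relids_make_singleton(int rtindex)
{
	assert(rtindex >= 0 && rtindex < 32);
	return (relids_t) 1u << rtindex;
}

static bool
relids_is_empty(relids_t a)
{
	return a == 0;
}

/* True when every member of a is also a member of b. */
static bool
relids_is_subset(relids_t a, relids_t b)
{
	return (a & ~b) == 0;
}

/*
 * Mirror of the two relids checks: the test expression must reference at
 * least one parent relation, and nothing outside available_rels.
 */
static bool
testexpr_is_pullable(relids_t upper_varnos, relids_t available_rels)
{
	if (relids_is_empty(upper_varnos))
		return false;			/* no parent Vars: not a join */
	if (!relids_is_subset(upper_varnos, available_rels))
		return false;			/* refers to a relation that is not available */
	return true;
}
```

With relations 1 and 2 available, a test expression that only references relation 1 passes, while one that references relation 3, or no relation at all, is rejected.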
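
Why the pull-up works

What is a semi-join: table AA looks for matching rows in another table BB; the rows of AA that satisfy the condition are returned, and the rows of BB are never returned.

What is an IN clause: AA IN BB returns the rows of AA that satisfy the condition on BB.

It follows that:

  • The two work on the same principle, so an IN clause can be converted into a semi-join.
  • Because rows from the right side are never output, the processing above puts the actual query, whose rows are output, in the left child of the JoinExpr.
  • So the "pull-up" mentioned above is just the process of analyzing the query inside the sublink and converting it into node information in a JoinExpr. This reduces the query work and saves time.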
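
The equivalence between IN and a semi-join can be seen on plain arrays. This is a minimal sketch under the assumption that each table is a single integer column; semi_join_int is a hypothetical helper for illustration, not a PostgreSQL function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Semi-join of aa with bb on equality, i.e. "aa IN (bb)": each aa row is
 * written to out at most once if at least one bb row matches it.
 * Returns the number of rows written; out must have room for n_aa values.
 */
static size_t
semi_join_int(const int *aa, size_t n_aa,
			  const int *bb, size_t n_bb,
			  int *out)
{
	size_t		n_out = 0;

	assert(out != NULL);
	for (size_t i = 0; i < n_aa; i++)
	{
		for (size_t j = 0; j < n_bb; j++)
		{
			if (aa[i] == bb[j])
			{
				out[n_out++] = aa[i];
				break;			/* first match is enough; bb rows are never output */
			}
		}
	}
	return n_out;
}
```

For aa = {1, 2, 3, 4} and bb = {2, 2, 4, 5} the result holds 2 and 4 once each, even though 2 appears twice in bb. An ordinary inner join would have returned the row for 2 twice.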

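Subquery optimization

In the source this is pull_up_subqueries; the source comment reads: Check to see if any subqueries in the jointree can be merged into this query

It is called subquery pull-up. In practice it analyzes the jointree left by the pull_up_sublinks step and tries to optimize it further. Sublink pull-up does not add the base tables of the subquery to the parent query's range table (rtable list), so this step checks whether the subquery can be merged into the parent query.

Concretely: check whether the jointree still contains aliased result sets; if so, replace each one with the node type of the corresponding query (RangeTblRef, FromExpr, JoinExpr).

This is implemented in pull_up_subqueries_recurse, which walks the jointree. The jointree contains three node types:

  • RangeTblRef: pull up where possible
    • RTE_SUBQUERY that passes is_simple_subquery: pulled up by pull_up_simple_subquery. Because the nesting level changes after pull-up, index numbers, level numbers and variable references have to be adjusted; see the code for those adjustments, which are not covered here
    • RTE_SUBQUERY that is a simple UNION ALL (is_simple_union_all): flattened into an append relation by pull_up_simple_union_all
    • RTE_VALUES: a simple VALUES entry is pulled up by pull_up_simple_values
    • anything else: left untouched
  • FromExpr: walk the fromlist and call pull_up_subqueries_recurse recursively
  • JoinExpr: call pull_up_subqueries_recurse on the left and right children, with arguments chosen according to the join type
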
static Node *
pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
						   JoinExpr *lowest_outer_join,
						   JoinExpr *lowest_nulling_outer_join,
						   AppendRelInfo *containing_appendrel)
{
	Assert(jtnode != NULL);
	if (IsA(jtnode, RangeTblRef))
	{
		int			varno = ((RangeTblRef *) jtnode)->rtindex;
		RangeTblEntry *rte = rt_fetch(varno, root->parse->rtable);

		/*
		 * Is this a subquery RTE, and if so, is the subquery simple enough to
		 * pull up?
		 *
		 * If we are looking at an append-relation member, we can't pull it up
		 * unless is_safe_append_member says so.
		 */
		if (rte->rtekind == RTE_SUBQUERY &&
			is_simple_subquery(rte->subquery, rte, lowest_outer_join) &&
			(containing_appendrel == NULL ||
			 is_safe_append_member(rte->subquery)))
			return pull_up_simple_subquery(root, jtnode, rte,
										   lowest_outer_join,
										   lowest_nulling_outer_join,
										   containing_appendrel);

		/*
		 * Alternatively, is it a simple UNION ALL subquery?  If so, flatten
		 * into an "append relation".
		 *
		 * It's safe to do this regardless of whether this query is itself an
		 * appendrel member.  (If you're thinking we should try to flatten the
		 * two levels of appendrel together, you're right; but we handle that
		 * in set_append_rel_pathlist, not here.)
		 */
		if (rte->rtekind == RTE_SUBQUERY &&
			is_simple_union_all(rte->subquery))
			return pull_up_simple_union_all(root, jtnode, rte);

		/*
		 * Or perhaps it's a simple VALUES RTE?
		 *
		 * We don't allow VALUES pullup below an outer join nor into an
		 * appendrel (such cases are impossible anyway at the moment).
		 */
		if (rte->rtekind == RTE_VALUES &&
			lowest_outer_join == NULL &&
			containing_appendrel == NULL &&
			is_simple_values(root, rte))
			return pull_up_simple_values(root, jtnode, rte);

		/* Otherwise, do nothing at this node. */
	}
	else if (IsA(jtnode, FromExpr))
	{
		FromExpr   *f = (FromExpr *) jtnode;
		ListCell   *l;

		Assert(containing_appendrel == NULL);
		/* Recursively transform all the child nodes */
		foreach(l, f->fromlist)
		{
			lfirst(l) = pull_up_subqueries_recurse(root, lfirst(l),
												   lowest_outer_join,
												   lowest_nulling_outer_join,
												   NULL);
		}
	}
	else if (IsA(jtnode, JoinExpr))
	{
		JoinExpr   *j = (JoinExpr *) jtnode;

		Assert(containing_appendrel == NULL);
		/* Recurse, being careful to tell myself when inside outer join */
		switch (j->jointype)
		{
			case JOIN_INNER:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 lowest_outer_join,
													 lowest_nulling_outer_join,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 lowest_outer_join,
													 lowest_nulling_outer_join,
													 NULL);
				break;
			case JOIN_LEFT:
			case JOIN_SEMI:
			case JOIN_ANTI:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 lowest_nulling_outer_join,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 j,
													 NULL);
				break;
			case JOIN_FULL:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 j,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 j,
													 NULL);
				break;
			case JOIN_RIGHT:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 j,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 lowest_nulling_outer_join,
													 NULL);
				break;
			default:
				elog(ERROR, "unrecognized join type: %d",
					 (int) j->jointype);
				break;
		}
	}
	else
		elog(ERROR, "unrecognized node type: %d",
			 (int) nodeTag(jtnode));
	return jtnode;
}

Handling UNION ALL

/*
	 * If this is a simple UNION ALL query, flatten it into an appendrel. We
	 * do this now because it requires applying pull_up_subqueries to the leaf
	 * queries of the UNION ALL, which weren't touched above because they
	 * weren't referenced by the jointree (they will be after we do this).
	 */
	 if (parse->setOperations)
		flatten_simple_union_all(root);

RowMark handling

/*
	 * Preprocess RowMark information.  We need to do this after subquery
	 * pullup, so that all base relations are present.
	 */
	preprocess_rowmarks(root);

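Expression preprocessing

The target list, withCheckOptions, RETURNING expressions, the HAVING clause, WINDOW clauses and LIMIT/OFFSET clauses

are all processed by calling preprocess_expression.

Processing steps

  • flatten_join_alias_vars: flattens join alias variables into base-relation variables
  • eval_const_expressions: simplifies constant expressions
  • canonicalize_qual: canonicalizes the condition expressions in quals
  • SS_process_sublinks: converts sublinks into subplans
  • SS_replace_correlation_vars: replaces upper-level Vars with Param nodes
  • make_ands_implicit: converts quals or havingQual to implicit-AND format
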
static Node *
preprocess_expression(PlannerInfo *root, Node *expr, int kind)
{
	/*
	 * Fall out quickly if expression is empty.  This occurs often enough to
	 * be worth checking.  Note that null->null is the correct conversion for
	 * implicit-AND result format, too.
	 */
	if (expr == NULL)
		return NULL;

	/*
	 * If the query has any join RTEs, replace join alias variables with
	 * base-relation variables.  We must do this first, since any expressions
	 * we may extract from the joinaliasvars lists have not been preprocessed.
	 * For example, if we did this after sublink processing, sublinks expanded
	 * out from join aliases would not get processed.  But we can skip this in
	 * non-lateral RTE functions, VALUES lists, and TABLESAMPLE clauses, since
	 * they can't contain any Vars of the current query level.
	 */
	if (root->hasJoinRTEs &&
		!(kind == EXPRKIND_RTFUNC ||
		  kind == EXPRKIND_VALUES ||
		  kind == EXPRKIND_TABLESAMPLE ||
		  kind == EXPRKIND_TABLEFUNC))
		expr = flatten_join_alias_vars(root->parse, expr);

	/*
	 * Simplify constant expressions.
	 *
	 * Note: an essential effect of this is to convert named-argument function
	 * calls to positional notation and insert the current actual values of
	 * any default arguments for functions.  To ensure that happens, we *must*
	 * process all expressions here.  Previous PG versions sometimes skipped
	 * const-simplification if it didn't seem worth the trouble, but we can't
	 * do that anymore.
	 *
	 * Note: this also flattens nested AND and OR expressions into N-argument
	 * form.  All processing of a qual expression after this point must be
	 * careful to maintain AND/OR flatness --- that is, do not generate a tree
	 * with AND directly under AND, nor OR directly under OR.
	 */
	expr = eval_const_expressions(root, expr);

	/*
	 * If it's a qual or havingQual, canonicalize it.
	 */
	if (kind == EXPRKIND_QUAL)
	{
		expr = (Node *) canonicalize_qual((Expr *) expr, false);

#ifdef OPTIMIZER_DEBUG
		printf("After canonicalize_qual()\n");
		pprint(expr);
#endif
	}

	/* Expand SubLinks to SubPlans */
	if (root->parse->hasSubLinks)
		expr = SS_process_sublinks(root, expr, (kind == EXPRKIND_QUAL));

	/*
	 * XXX do not insert anything here unless you have grokked the comments in
	 * SS_replace_correlation_vars ...
	 */

	/* Replace uplevel vars with Param nodes (this IS possible in VALUES) */
	if (root->query_level > 1)
		expr = SS_replace_correlation_vars(root, expr);

	/*
	 * If it's a qual or havingQual, convert it to implicit-AND format. (We
	 * don't want to do this before eval_const_expressions, since the latter
	 * would be unable to simplify a top-level AND correctly. Also,
	 * SS_process_sublinks expects explicit-AND format.)
	 */
	if (kind == EXPRKIND_QUAL)
		expr = (Node *) make_ands_implicit((Expr *) expr);

	return expr;
}
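
Handling by node type

Several of the conversion steps above end up calling an XXX_XXX_mutator function, which dispatches on node type to perform the conversion.

The focus here is process_sublinks_mutator, which is called from SS_process_sublinks. Because the nodes inside an expression can be of any type, the function dispatches on type, and when none of its special cases applies it calls expression_tree_mutator.

  • SubLink: call process_sublinks_mutator again on sublink->testexpr to obtain the processed testexpr, then call make_subplan, which takes the testexpr and creates a subplan node.
  • AND and OR: walk the arguments of the boolean expression, call process_sublinks_mutator on each, store the results in a new list, and finally create and return an expression node that holds the list with the corresponding AND or OR type.

The related code is as follows: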

static Node *
process_sublinks_mutator(Node *node, process_sublinks_context *context)
{
	process_sublinks_context locContext;

	locContext.root = context->root;

	if (node == NULL)
		return NULL;
	if (IsA(node, SubLink))
	{
		...
		/*
		 * Now build the SubPlan node and make the expr to return.
		 */
		return make_subplan(context->root,
							(Query *) sublink->subselect,
							sublink->subLinkType,
							sublink->subLinkId,
							testexpr,
							context->isTopQual);
	}

	/*
	 * Don't recurse into the arguments of an outer PHV or aggregate here. Any
	 * SubLinks in the arguments have to be dealt with at the outer query
	 * level; they'll be handled when build_subplan collects the PHV or Aggref
	 * into the arguments to be passed down to the current subplan.
	 */
	if (IsA(node, PlaceHolderVar))
	{
		if (((PlaceHolderVar *) node)->phlevelsup > 0)
			return node;
	}
	else if (IsA(node, Aggref))
	{
		if (((Aggref *) node)->agglevelsup > 0)
			return node;
	}

	/*
	 * We should never see a SubPlan expression in the input (since this is
	 * the very routine that creates 'em to begin with).  We shouldn't find
	 * ourselves invoked directly on a Query, either.
	 */
	Assert(!IsA(node, SubPlan));
	Assert(!IsA(node, AlternativeSubPlan));
	Assert(!IsA(node, Query));

	/*
	 * Because make_subplan() could return an AND or OR clause, we have to
	 * take steps to preserve AND/OR flatness of a qual.  We assume the input
	 * has been AND/OR flattened and so we need no recursion here.
	 *
	 * (Due to the coding here, we will not get called on the List subnodes of
	 * an AND; and the input is *not* yet in implicit-AND format.  So no check
	 * is needed for a bare List.)
	 *
	 * Anywhere within the top-level AND/OR clause structure, we can tell
	 * make_subplan() that NULL and FALSE are interchangeable.  So isTopQual
	 * propagates down in both cases.  (Note that this is unlike the meaning
	 * of "top level qual" used in most other places in Postgres.)
	 */
	if (is_andclause(node))
	{
		...
		return (Node *) make_andclause(newargs);
	}

	if (is_orclause(node))
	{
		...
		return (Node *) make_orclause(newargs);
	}

	/*
	 * If we recurse down through anything other than an AND or OR node, we
	 * are definitely not at top qual level anymore.
	 */
	locContext.isTopQual = false;

	return expression_tree_mutator(node,
								   process_sublinks_mutator,
								   (void *) &locContext);
}
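
Creating the subquery plan

make_subplan proceeds as follows:

  • Set tuple_fraction: a value between 0 and 1 is the fraction of rows expected to be fetched. It is chosen according to the sublink type (EXISTS_SUBLINK, ALL_SUBLINK and so on), since EXISTS, ALL and ANY behave differently.
  • Call subquery_planner to optimize the sublink's query tree, in the same way as a complete query tree.
  • Call create_plan and build_subplan to create the plan.

Once the subplan has been created, it is returned.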

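Preprocessing expressions in conditions

preprocess_qual_conditions walks the jointree, locates the quals according to the node type, and calls preprocess_expression on them:

  • RangeTblRef: nothing to do
  • FromExpr: walk the fromlist, calling preprocess_qual_conditions recursively, then call preprocess_expression on the quals of the node
  • JoinExpr: call preprocess_qual_conditions on the left and right children, then call preprocess_expression on the quals of the node

Eliminating outer joins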

reduce_outer_joins

/*
 * reduce_outer_joins
 *		Attempt to reduce outer joins to plain inner joins.
 *
 * The idea here is that given a query like
 *		SELECT ... FROM a LEFT JOIN b ON (...) WHERE b.y = 42;
 * we can reduce the LEFT JOIN to a plain JOIN if the "=" operator in WHERE
 * is strict.  The strict operator will always return NULL, causing the outer
 * WHERE to fail, on any row where the LEFT JOIN filled in NULLs for b's
 * columns.  Therefore, there's no need for the join to produce null-extended
 * rows in the first place --- which makes it a plain join not an outer join.
 * (This scenario may not be very likely in a query written out by hand, but
 * it's reasonably likely when pushing quals down into complex views.)
 *
 * More generally, an outer join can be reduced in strength if there is a
 * strict qual above it in the qual tree that constrains a Var from the
 * nullable side of the join to be non-null.  (For FULL joins this applies
 * to each side separately.)
 *
 * Another transformation we apply here is to recognize cases like
 *		SELECT ... FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.y IS NULL;
 * If the join clause is strict for b.y, then only null-extended rows could
 * pass the upper WHERE, and we can conclude that what the query is really
 * specifying is an anti-semijoin.  We change the join type from JOIN_LEFT
 * to JOIN_ANTI.  The IS NULL clause then becomes redundant, and must be
 * removed to prevent bogus selectivity calculations, but we leave it to
 * distribute_qual_to_rels to get rid of such clauses.
 *
 * Also, we get rid of JOIN_RIGHT cases by flipping them around to become
 * JOIN_LEFT.  This saves some code here and in some later planner routines,
 * but the main reason to do it is to not need to invent a JOIN_REVERSE_ANTI
 * join type.
 *
 * To ease recognition of strict qual clauses, we require this routine to be
 * run after expression preprocessing (i.e., qual canonicalization and JOIN
 * alias-var expansion).
 */
void
reduce_outer_joins(PlannerInfo *root)
{
	reduce_outer_joins_state *state;

	/*
	 * To avoid doing strictness checks on more quals than necessary, we want
	 * to stop descending the jointree as soon as there are no outer joins
	 * below our current point.  This consideration forces a two-pass process.
	 * The first pass gathers information about which base rels appear below
	 * each side of each join clause, and about whether there are outer
	 * join(s) below each side of each join clause. The second pass examines
	 * qual clauses and changes join types as it descends the tree.
	 */
	state = reduce_outer_joins_pass1((Node *) root->parse->jointree);

	/* planner.c shouldn't have called me if no outer joins */
	if (state == NULL || !state->contains_outer)
		elog(ERROR, "so where are the outer joins?");

	reduce_outer_joins_pass2((Node *) root->parse->jointree,
							 state, root, NULL, NIL, NIL);
}
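
The first case in the comment above (a strict qual on the nullable side turns a LEFT JOIN into a plain join) can be checked on toy data. This is a minimal sketch, assuming single-column integer tables; nullable_int, strict_eq_is_true and count_rows are hypothetical helpers, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A nullable integer column value (hypothetical helper type). */
typedef struct
{
	bool		is_null;
	int			value;
} nullable_int;

/* Strict "=": a NULL input never makes the WHERE clause true. */
static bool
strict_eq_is_true(nullable_int v, int wanted)
{
	return !v.is_null && v.value == wanted;
}

/*
 * Count the rows of
 *   SELECT ... FROM a [LEFT] JOIN b ON a.x = b.y WHERE b.y = wanted
 * When left_join is true, unmatched a rows are null-extended before the
 * WHERE clause is applied.
 */
static size_t
count_rows(const int *ax, size_t na, const int *by, size_t nb,
		   bool left_join, int wanted)
{
	size_t		count = 0;

	for (size_t i = 0; i < na; i++)
	{
		bool		matched = false;

		for (size_t j = 0; j < nb; j++)
		{
			if (ax[i] != by[j])
				continue;
			matched = true;
			nullable_int y = {false, by[j]};	/* a real b row */
			if (strict_eq_is_true(y, wanted))
				count++;
		}
		if (!matched && left_join)
		{
			nullable_int y = {true, 0};	/* null-extended row: b.y is NULL */
			if (strict_eq_is_true(y, wanted))
				count++;
		}
	}
	return count;
}
```

Calling count_rows with left_join set to true and to false gives the same count for any input, because the null-extended rows can never pass the strict WHERE test. That is the property that lets the planner treat such a LEFT JOIN as an inner join.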

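Generating the query plan

grouping_planner first handles the LIMIT, ORDER BY and GROUP BY clauses, then uses the setOperations field (the set-operation tree) to decide whether the statement is a UNION/INTERSECT/EXCEPT statement or an ordinary one:

  • Handle the LIMIT clause
  • Handle UNION/INTERSECT/EXCEPT by calling plan_set_operations, which dispatches internally:
    • recursive UNION, generate_recursion_path: the left and right sides are processed and then combined
    • non-recursive case, recurse_set_operations: processed through the basic flow
  • Handle ordinary statements through the basic flow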

static void
grouping_planner(PlannerInfo *root, bool inheritance_update,
				 double tuple_fraction)
{
	Query	   *parse = root->parse;
	int64		offset_est = 0;
	int64		count_est = 0;
	double		limit_tuples = -1.0;
	bool		have_postponed_srfs = false;
	PathTarget *final_target;
	List	   *final_targets;
	List	   *final_targets_contain_srfs;
	bool		final_target_parallel_safe;
	RelOptInfo *current_rel;
	RelOptInfo *final_rel;
	FinalPathExtraData extra;
	ListCell   *lc;

	...
	
	//Handle the LIMIT clause
	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
	if (parse->limitCount || parse->limitOffset)
	{
		tuple_fraction = preprocess_limit(root, tuple_fraction,
										  &offset_est, &count_est);

		/*
		 * If we have a known LIMIT, and don't have an unknown OFFSET, we can
		 * estimate the effects of using a bounded sort.
		 */
		if (count_est > 0 && offset_est >= 0)
			limit_tuples = (double) count_est + (double) offset_est;
	}

	/* Make tuple_fraction accessible to lower-level routines */
	root->tuple_fraction = tuple_fraction;

	if (parse->setOperations)
	{
		...
		
		//Handle UNION/INTERSECT/EXCEPT statements
		/*
		 * Construct Paths for set operations.  The results will not need any
		 * work except perhaps a top-level sort and/or LIMIT.  Note that any
		 * special work for recursive unions is the responsibility of
		 * plan_set_operations.
		 */
		current_rel = plan_set_operations(root);

		...
	}
	else
	{
		//Handle ordinary statements
		...
	}
	...
}
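
The LIMIT branch above reduces to one small rule for the bounded-sort estimate. Here it is restated as a standalone function, a sketch that only mirrors the condition shown in the excerpt. The function name estimate_limit_tuples is hypothetical, and int64_t stands in for PostgreSQL's int64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the number of tuples a bounded sort would have to keep, or -1.0
 * when it cannot be estimated. This follows the excerpt: the bound is only
 * known when the LIMIT count is a known positive value and the OFFSET is
 * not unknown (negative).
 */
static double
estimate_limit_tuples(int64_t count_est, int64_t offset_est)
{
	double		limit_tuples = -1.0;	/* same initial value as in grouping_planner */

	if (count_est > 0 && offset_est >= 0)
		limit_tuples = (double) count_est + (double) offset_est;
	return limit_tuples;
}
```

For example, a count estimate of 10 with an offset estimate of 5 gives a bound of 15 tuples, while a negative offset estimate leaves the bound at -1.0.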
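
Flow for ordinary statements

Apart from the special statements, all other statements go through the ordinary flow:

  • preprocess_groupclause handles the GROUP BY clause: it reorders the GROUP BY items to follow the ORDER BY order, so that later steps can use an index to complete ORDER BY and GROUP BY quickly.
  • preprocess_targetlist handles the target list: the author has not yet worked out how this step operates
  • get_agg_clause_costs: collects the costs of the aggregate functions used
  • select_active_windows: selects the window definitions that are actually used by window functions
  • query_planner: creates the access paths for the query; this part is important enough to be covered separately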
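
The GROUP BY reordering described in the first bullet can be sketched on column numbers. This is an illustrative simplification, not the PostgreSQL implementation: columns are plain integers, and contains_col and reorder_group_by are hypothetical helpers. It takes ORDER BY columns in order for as long as they are also GROUP BY columns, then appends the remaining GROUP BY columns in their original order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when col appears among the first n entries of items. */
static bool
contains_col(const int *items, size_t n, int col)
{
	for (size_t i = 0; i < n; i++)
	{
		if (items[i] == col)
			return true;
	}
	return false;
}

/*
 * Write a reordered copy of group_by into out (room for n_group entries).
 * Returns how many leading columns of out follow the ORDER BY order.
 */
static size_t
reorder_group_by(const int *group_by, size_t n_group,
				 const int *order_by, size_t n_order,
				 int *out)
{
	size_t		n_out = 0;
	size_t		n_prefix;

	assert(out != NULL);

	/* Take ORDER BY columns until one is not a GROUP BY column. */
	for (size_t i = 0; i < n_order; i++)
	{
		if (!contains_col(group_by, n_group, order_by[i]))
			break;
		if (!contains_col(out, n_out, order_by[i]))
			out[n_out++] = order_by[i];
	}
	n_prefix = n_out;

	/* Append the GROUP BY columns that were not matched above. */
	for (size_t i = 0; i < n_group; i++)
	{
		if (!contains_col(out, n_out, group_by[i]))
			out[n_out++] = group_by[i];
	}
	return n_prefix;
}
```

With GROUP BY columns {3, 1, 2} and ORDER BY columns {1, 2, 9}, the reordered list is {1, 2, 3} and the first two columns match the ORDER BY prefix, so one sort order can serve both clauses for those columns.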
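
Creating the query access paths

Note: before reading the code, get familiar with how a query engine works; otherwise the reasons behind the code will not be clear.

query_planner distinguishes ordinary statements from statements whose fromlist has length 1 ("SELECT expression" and "INSERT ... VALUES()"); the latter are handled directly by a function call and the result is returned. The focus here is the flow for ordinary statements. An ordinary query has three elements, the data source, the output and the query condition, which are filled in one after another below:

  • setup_simple_rel_arrays collects base table information: it sets up root->append_rel_array from the root->parse->rtable list.

  • add_base_rels_to_query builds the RelOptInfo array (base table information; this sets the data source): it creates RelOptInfo entries according to the jointree node types and stores them in root->simple_rel_array. Put simply, it creates the array of base tables and fills in their data sources.

    • build_simple_rel: sets up the RelOptInfo fields, including the output list (target list) and the query conditions (quals)
  • build_base_rel_tlists sets the target columns (this sets the output): it walks the target list (the argument here is root->processed_tlist), finds every Var node, and adds it to the reltargetlist of the RelOptInfo of the base table that the Var belongs to (a Var that is already present is not added again). Put simply, it fills in the outputs of the base tables by binding column names to data sources. For example, in select a1, b1 from aa, bb where aa.a1 = bb.b1; the column a1 is bound to aa.

    • pull_var_clause: calls pull_var_clause_walker to find the Vars that are used
    • add_vars_to_targetlist: adds the Vars to the base tables recorded in root->simple_rel_array
  • deconstruct_jointree sets the constraints (this sets the query condition): it calls deconstruct_recurse, which handles each node type:

    • RangeTblRef: returned as is, no processing
    • FromExpr: find the Relids of all base tables in the FromExpr, then bind the constraints that concern each base table to its RelOptInfo. The binding is done by distribute_qual_to_rels, which involves a fairly complex flow and is covered later.
    • JoinExpr: search according to the join type; find the Relids of all base tables in the left and right children of the JoinExpr, then bind the constraints that concern each base table to its RelOptInfo. Finally, call make_outerjoininfo to handle join ordering
  • reconsider_outer_join_clauses: handles outer join clauses

  • generate_base_implied_equalities creates constraints: it processes the constraints found in the clauses for optimization and binds the resulting conditions to the base table information (RelOptInfo).

  • (*qp_callback) (root, qp_extra), that is, the standard_qp_callback callback: handles the pathkeys used for sorting

  • create_lateral_join_info builds the lateral join information: this concerns subqueries and is skipped here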
