005-sql语句执行流程解析2-查询逻辑优化

sql执行语句流程解析

整个处理流程在exec_simple_query函数中完成,代码架构如下:

/*
 * exec_simple_query
 *
 * Execute a "simple Query" protocol message.
 */
static void
exec_simple_query(const char *query_string)
{
	...
	//原始语法树获取
	/*
	 * Do basic parsing of the query or queries (this should be safe even if
	 * we are in aborted transaction state!)
	 */
	parsetree_list = pg_parse_query(query_string);

	...
	//循环处理sql语句
	/*
	 * Run through the raw parsetree(s) and process each one.
	 */
	foreach(parsetree_item, parsetree_list)
	{
		...
		
		//对原始语法树进行分析和重写,生成查询语法树
		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
												NULL, 0, NULL);
		//对查询语法树进行优化,生成执行计划
		plantree_list = pg_plan_queries(querytree_list,
										CURSOR_OPT_PARALLEL_OK, NULL);

		...
		
		//执行语句
		/*
		 * Run the portal to completion, and then drop it (and the receiver).
		 */
		(void) PortalRun(portal,
						 FETCH_ALL,
						 true,	/* always top level */
						 true,
						 receiver,
						 receiver,
						 completionTag);

		...
	}
	...
}

查询逻辑优化

分析重写后的查询树不是最优化的查询树,当碰到select子查询层次很深时,最低层的基表就和树根距离较远,这样就会增加查询时间。另外查询树中的各个节点信息是独立的,就有可能造成冗余查询,所以也需要做逻辑优化。所以,查询逻辑优化就是以数据库理论中的关系代数为理论基础,以查询树为载体,通过遍历查询树并在保证查询树中的语法单元的语义和最终结果不变的情况下对其进行优化;最终得到一个没有冗余的查询树。

------代码中在pg_plan_queries函数中实现

与逻辑优化对应的是物理优化,两者策略完全不同:

逻辑优化:类似于冗余和上提,下放优化策略

物理优化:基于代价的优化策略,在下文中介绍

语法优化处理基本步骤及相关代码:

  • 工具类语法(DDL,DML)不做处理
  • 非工具类语法,使用pg_plan_query函数进行处理,pg_plan_query调用planner进行处理
List *
pg_plan_queries(List *querytrees, int cursorOptions, ParamListInfo boundParams)
{
	List	   *stmt_list = NIL;
	ListCell   *query_list;

	foreach(query_list, querytrees)
	{
		Query	   *query = lfirst_node(Query, query_list);
		PlannedStmt *stmt;

		if (query->commandType == CMD_UTILITY)
		{
			/* Utility commands require no planning. */
			stmt = makeNode(PlannedStmt);
			stmt->commandType = CMD_UTILITY;
			stmt->canSetTag = query->canSetTag;
			stmt->utilityStmt = query->utilityStmt;
			stmt->stmt_location = query->stmt_location;
			stmt->stmt_len = query->stmt_len;
		}
		else
		{
			stmt = pg_plan_query(query, cursorOptions, boundParams);
		}

		stmt_list = lappend(stmt_list, stmt);
	}

	return stmt_list;
}


PlannedStmt *
pg_plan_query(Query *querytree, int cursorOptions, ParamListInfo boundParams)
{
	PlannedStmt *plan;

	...
	/* call the optimizer */
	plan = planner(querytree, cursorOptions, boundParams);

	...

	return plan;
}

非工具类语法处理

在planner函数中对非工具类语法进行处理,如果设置planner_hook则调用钩子函数,默认调用standard_planner函数处理。

standard_planner函数中递归处理查询树,查询时将结果存储在PlannerGlobal全局结果中。再调用create_plan函数根据PlannerInfo类型创建执行计划树。最后将PlannerGlobal和PlannerInfo中存储的基本信息转存到PlannedStmt中并返回。

PlannedStmt *
planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	PlannedStmt *result;

	if (planner_hook)
		result = (*planner_hook) (parse, cursorOptions, boundParams);
	else
		result = standard_planner(parse, cursorOptions, boundParams);
	return result;
}

PlannedStmt *
standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	...
	
	/* primary planning entry point (may recurse for subqueries) */
	root = subquery_planner(glob, parse, NULL,
							false, tuple_fraction);

	/* Select best Path and turn it into a Plan */
	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);

	top_plan = create_plan(root, best_path);

	...
	
	/* build the PlannedStmt result */
	result = makeNode(PlannedStmt);

	result->commandType = parse->commandType;
	...
	result->stmt_len = parse->stmt_len;

	result->jitFlags = PGJIT_NONE;
	...

	return result;
}

计划树生成并优化

在subquery_planner函数中生成,根据类型对查询语句进行分类处理。计划优化部分涉及子链接上提、函数处理、子查询上提等操作。因为计划树的生成和优化是按照类型分类处理并同时执行,所以这里放在一起介绍。

处理CTE(通用表表达式)表达式

SS_process_ctes:处理查询语句中的CTE子句(with语句),CTE是一个临时的结果集;可以将子查询的结果作为一个独立的结果集使用。所以在函数处理时遍历ctelish列表,将其中的各个子结果集通过再调用subquery_planner函数进行处理,处理的结果存储在root->glob->subplans链表中。

void
SS_process_ctes(PlannerInfo *root)
{
	ListCell   *lc;

	Assert(root->cte_plan_ids == NIL);

	foreach(lc, root->parse->cteList)
	{
		...
		
		/*
		 * Generate Paths for the CTE query.  Always plan for full retrieval
		 * --- we don't have enough info to predict otherwise.
		 */
		subroot = subquery_planner(root->glob, subquery,
								   root,
								   cte->cterecursive, 0.0);

		...

		plan = create_plan(subroot, best_path);

		...
		
		/*
		 * Add the subplan and its PlannerInfo to the global lists.
		 */
		root->glob->subplans = lappend(root->glob->subplans, plan);
		root->glob->subroots = lappend(root->glob->subroots, subroot);
		
		...
	}
}

子链接上提

pull_up_sublinks:将命令中的 ANY(sub-SELECT) 和 EXISTS 转换为 JOIN 。这样能够将子链接和父查询进行合并,统一进行优化处理。

ANY语句转换为Semi-join语句,转换只适用于WHERE语句或者JOIN/ON语句。

EXISTS或者NOT EXISTS语句转换为Semi-join或者Anti-Semi-join。

基本流程介绍

子链接上提时,因为WHERE相关的节点信息存储在jointree中,所以会输入root->parse->jointree到pull_up_sublinks_jointree_recurse函数进行上提操作。pull_up_sublinks_jointree_recurse函数中检查jointree中存储的类型,并按照类型分类进行处理:

  • RangeTblRef:直接返回,不做优化
  • FromExpr:fromlist中包含两个域:基表信息(fromlist)和where条件表达式(quals);处理流程如下:
    • fromlist 列表:递归调用pull_up_sublinks_jointree_recurse函数
    • quals 表达式:调用pull_up_sublinks_qual_recurse函数,where条件表达式(quals)上提
  • JoinExpr:joinexpr中包含两个域:左右基表信息(larg和rarg)和on约束条件(quals);处理流程如下:
    • 调用pull_up_sublinks_jointree_recurse函数处理左右节点
    • 根据join类型处理对应的where条件表达式(quals)上提

相关代码如下:

static Node *
pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
								  Relids *relids)
{
	if (jtnode == NULL)
	{
		*relids = NULL;
	}
	else if (IsA(jtnode, RangeTblRef))
	{
		int			varno = ((RangeTblRef *) jtnode)->rtindex;

		*relids = bms_make_singleton(varno);
		/* jtnode is returned unmodified */
	}
	else if (IsA(jtnode, FromExpr))
	{
		FromExpr   *f = (FromExpr *) jtnode;
		List	   *newfromlist = NIL;
		Relids		frelids = NULL;
		FromExpr   *newf;
		Node	   *jtlink;
		ListCell   *l;

		/* First, recurse to process children and collect their relids */
		foreach(l, f->fromlist)
		{
			Node	   *newchild;
			Relids		childrelids;

			newchild = pull_up_sublinks_jointree_recurse(root,
														 lfirst(l),
														 &childrelids);
			newfromlist = lappend(newfromlist, newchild);
			frelids = bms_join(frelids, childrelids);
		}
		/* Build the replacement FromExpr; no quals yet */
		newf = makeFromExpr(newfromlist, NULL);
		/* Set up a link representing the rebuilt jointree */
		jtlink = (Node *) newf;
		/* Now process qual --- all children are available for use */
		newf->quals = pull_up_sublinks_qual_recurse(root, f->quals,
													&jtlink, frelids,
													NULL, NULL);

		/*
		 * Note that the result will be either newf, or a stack of JoinExprs
		 * with newf at the base.  We rely on subsequent optimization steps to
		 * flatten this and rearrange the joins as needed.
		 *
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.
		 */
		*relids = frelids;
		jtnode = jtlink;
	}
	else if (IsA(jtnode, JoinExpr))
	{
		JoinExpr   *j;
		Relids		leftrelids;
		Relids		rightrelids;
		Node	   *jtlink;

		/*
		 * Make a modifiable copy of join node, but don't bother copying its
		 * subnodes (yet).
		 */
		j = (JoinExpr *) palloc(sizeof(JoinExpr));
		memcpy(j, jtnode, sizeof(JoinExpr));
		jtlink = (Node *) j;

		/* Recurse to process children and collect their relids */
		j->larg = pull_up_sublinks_jointree_recurse(root, j->larg,
													&leftrelids);
		j->rarg = pull_up_sublinks_jointree_recurse(root, j->rarg,
													&rightrelids);

		/*
		 * Now process qual, showing appropriate child relids as available,
		 * and attach any pulled-up jointree items at the right place. In the
		 * inner-join case we put new JoinExprs above the existing one (much
		 * as for a FromExpr-style join).  In outer-join cases the new
		 * JoinExprs must go into the nullable side of the outer join. The
		 * point of the available_rels machinations is to ensure that we only
		 * pull up quals for which that's okay.
		 *
		 * We don't expect to see any pre-existing JOIN_SEMI or JOIN_ANTI
		 * nodes here.
		 */
		switch (j->jointype)
		{
			case JOIN_INNER:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &jtlink,
														 bms_union(leftrelids,
																   rightrelids),
														 NULL, NULL);
				break;
			case JOIN_LEFT:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &j->rarg,
														 rightrelids,
														 NULL, NULL);
				break;
			case JOIN_FULL:
				/* can't do anything with full-join quals */
				break;
			case JOIN_RIGHT:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &j->larg,
														 leftrelids,
														 NULL, NULL);
				break;
			default:
				elog(ERROR, "unrecognized join type: %d",
					 (int) j->jointype);
				break;
		}

		/*
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.  But we *do* need to include the join's own rtindex
		 * because we haven't yet collapsed join alias variables, so upper
		 * levels would mistakenly think they couldn't use references to this
		 * join.
		 */
		*relids = bms_join(leftrelids, rightrelids);
		if (j->rtindex)
			*relids = bms_add_member(*relids, j->rtindex);
		jtnode = jtlink;
	}
	else
		elog(ERROR, "unrecognized node type: %d",
			 (int) nodeTag(jtnode));
	return jtnode;
}

由上述介绍可以知道基本的处理流程,子链接的基表都会被上提到顶层的基表链表中。

		foreach(l, f->fromlist)
		{
			Node	   *newchild;
			Relids		childrelids;

			//在这里获取基表链表信息
			newchild = pull_up_sublinks_jointree_recurse(root,
														 lfirst(l),
														 &childrelids);
			newfromlist = lappend(newfromlist, newchild);
			frelids = bms_join(frelids, childrelids);
		}
		//这里创建一个新节点,并把新节点中基表链表赋值
		/* Build the replacement FromExpr; no quals yet */
		newf = makeFromExpr(newfromlist, NULL);
		/* Set up a link representing the rebuilt jointree */
		jtlink = (Node *) newf;
		
		//这里给qulas赋值
		/* Now process qual --- all children are available for use */
		newf->quals = pull_up_sublinks_qual_recurse(root, f->quals,
													&jtlink, frelids,
													NULL, NULL);

		/*
		 * Note that the result will be either newf, or a stack of JoinExprs
		 * with newf at the base.  We rely on subsequent optimization steps to
		 * flatten this and rearrange the joins as needed.
		 *
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.
		 */
		*relids = frelids;
		
		//这里返回新节点
		jtnode = jtlink;

但是最终是怎么完成where条件表达式或者on约束条件上提的呢?下面再分析一下pull_up_sublinks_qual_recurse函数。在quals表达式中,有三种基本类型需要处理

  • sublink子链接语句:

    • 子链接类型为ANY_SUBLINK:调用convert_ANY_sublink_to_join函数,尝试将ANY类型转换为join类型。转换成功后创建一个新的JOIN_SEMI类型的JoinExpr 节点,节点的左子节点传入的上层JoinExpr节点被填充,右子节点在convert_ANY_sublink_to_join函数中被创建为RangeTblRef类型。最后传入左右子节点信息完成quals上提。
    • 子链接类型为EXISTS_SUBLINK:调用convert_EXISTS_sublink_to_join函数,尝试将EXISTS类型转换为join类型。转换成功后创建一个新的未知类型的JoinExpr 节点,节点的左子节点传入的上层JoinExpr节点被填充,右子节点在convert_EXISTS_sublink_to_join函数中根据传入节点的子查询节点被创建。最后传入左右子节点信息完成quals上提。
  • NOT语句:为子链接时参考EXISTS_SUBLINK处理

  • AND、OR语句:遍历BoolExpr 节点,递归调用pull_up_sublinks_qual_recurse处理

具体转换流程介绍

实际转换流程执行,有以下限制:

  • 子链接的子查询不能使用父节点的var类型变量:形成环路
  • 比较表达式中必须包含父查询的var类型变量
  • 比较表达式中不能包含任何的虚函数(Volatile function)

var类型变量:指查询分析和查询优化中的基表目标列;或者表示子查询计划的输出结果

convert_ANY_sublink_to_join函数介绍:

  • contain_vars_of_level:父节点环路检查,检查子查询中的基表是否是父节点的基表。
  • pull_varnos:比较表达式检查
  • contain_volatile_functions:虚函数查询
  • addRangeTableEntryForSubquery:创建名字为ANY_subquery的RangeTblEntry对象,添加到父查询的基表(rtable链表)中
  • generate_subquery_vars:根据rtable链表创建var变量用来存储子链接查询结果
  • convert_testexpr:调用XXX_mutator函数处理
  • 构建JoinExpr节点,节点的larg由调用者填充
JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
							Relids available_rels)
{
	JoinExpr   *result;
	Query	   *parse = root->parse;
	Query	   *subselect = (Query *) sublink->subselect;
	Relids		upper_varnos;
	int			rtindex;
	RangeTblEntry *rte;
	RangeTblRef *rtr;
	List	   *subquery_vars;
	Node	   *quals;
	ParseState *pstate;

	Assert(sublink->subLinkType == ANY_SUBLINK);

	/*
	 * The sub-select must not refer to any Vars of the parent query. (Vars of
	 * higher levels should be okay, though.)
	 */
	if (contain_vars_of_level((Node *) subselect, 1))
		return NULL;

	/*
	 * The test expression must contain some Vars of the parent query, else
	 * it's not gonna be a join.  (Note that it won't have Vars referring to
	 * the subquery, rather Params.)
	 */
	upper_varnos = pull_varnos(sublink->testexpr);
	if (bms_is_empty(upper_varnos))
		return NULL;

	/*
	 * However, it can't refer to anything outside available_rels.
	 */
	if (!bms_is_subset(upper_varnos, available_rels))
		return NULL;

	/*
	 * The combining operators and left-hand expressions mustn't be volatile.
	 */
	if (contain_volatile_functions(sublink->testexpr))
		return NULL;

	/* Create a dummy ParseState for addRangeTableEntryForSubquery */
	pstate = make_parsestate(NULL);

	/*
	 * Okay, pull up the sub-select into upper range table.
	 *
	 * We rely here on the assumption that the outer query has no references
	 * to the inner (necessarily true, other than the Vars that we build
	 * below). Therefore this is a lot easier than what pull_up_subqueries has
	 * to go through.
	 */
	rte = addRangeTableEntryForSubquery(pstate,
										subselect,
										makeAlias("ANY_subquery", NIL),
										false,
										false);
	parse->rtable = lappend(parse->rtable, rte);
	rtindex = list_length(parse->rtable);

	/*
	 * Form a RangeTblRef for the pulled-up sub-select.
	 */
	rtr = makeNode(RangeTblRef);
	rtr->rtindex = rtindex;

	/*
	 * Build a list of Vars representing the subselect outputs.
	 */
	subquery_vars = generate_subquery_vars(root,
										   subselect->targetList,
										   rtindex);

	/*
	 * Build the new join's qual expression, replacing Params with these Vars.
	 */
	quals = convert_testexpr(root, sublink->testexpr, subquery_vars);

	/*
	 * And finally, build the JoinExpr node.
	 */
	result = makeNode(JoinExpr);
	result->jointype = JOIN_SEMI;
	result->isNatural = false;
	result->larg = NULL;		/* caller must fill this in */
	result->rarg = (Node *) rtr;
	result->usingClause = NIL;
	result->quals = quals;
	result->alias = NULL;
	result->rtindex = 0;		/* we don't need an RTE for it */

	return result;
}
上提原理分析

什么是半连接(SEMI-JOIN):一张表AA在另外一张表BB中找到匹配的记录,返回第一张表AA中满足条件的记录,且BB表记录不被返回。

什么是IN语句:AA IN BB 返回AA中满足BB条件的记录。

由此而知:

  • 基本原理一致,所以可以将IN语句转换为SEMI-JOIN 半连接语句。
  • 因为右边的记录不会显示,所以上述处理中将实际查询语句放在JoinExpr的左子节点,便于显示。
  • 所以上述提到的‘上提’操作,只是将子链接中的查询语句进行解析并转换为JoinExpr中节点信息的过程。由此来减少查询动作节约时间。

子查询优化

原代码为pull_up_subqueries;原代码注释为Check to see if any subqueries in the jointree can be merged into this query

名称为子查询上提,实际是对pull_up_sublinks子链接上提操作后jointree树进行分析,尝试是否能否再进行优化。因为子链接上提操作未将子查询中的基表添加到父查询的基表(rtable链表)中。所以这里需要检查子查询是否能合并到父查询中

具体操作为:检查jointree树中是否还存在别名的结果集,如果存在则替换为对应的查询语句的类型(RangeTblRef、FromExpr、JoinExpr)。

最终在pull_up_subqueries_recurse函数中实现上述流程;pull_up_subqueries_recurse函数介绍:函数对jointree进行解析,jointree中包含三种类型:

  • RangeTblRef:上提
    • RTE_SUBQUERY且不为简单表达式:由于上提后层级会发生变化,所以对索引编号、层级编号、变量参数等需要进行调整。---调整相关的见代码,这里不做介绍
    • RTE_SUBQUERY且为简单表达式:简单查询树,直接上提子查询树
    • RTE_VALUES:上提为RTE值
  • FromExpr:遍历fromlist,递归调用pull_up_subqueries_recurse
  • JoinExpr:调用pull_up_subqueries_recurse函数处理左右节点,根据join类型修改相应的参数
static Node *
pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
						   JoinExpr *lowest_outer_join,
						   JoinExpr *lowest_nulling_outer_join,
						   AppendRelInfo *containing_appendrel)
{
	Assert(jtnode != NULL);
	if (IsA(jtnode, RangeTblRef))
	{
		int			varno = ((RangeTblRef *) jtnode)->rtindex;
		RangeTblEntry *rte = rt_fetch(varno, root->parse->rtable);

		/*
		 * Is this a subquery RTE, and if so, is the subquery simple enough to
		 * pull up?
		 *
		 * If we are looking at an append-relation member, we can't pull it up
		 * unless is_safe_append_member says so.
		 */
		if (rte->rtekind == RTE_SUBQUERY &&
			is_simple_subquery(rte->subquery, rte, lowest_outer_join) &&
			(containing_appendrel == NULL ||
			 is_safe_append_member(rte->subquery)))
			return pull_up_simple_subquery(root, jtnode, rte,
										   lowest_outer_join,
										   lowest_nulling_outer_join,
										   containing_appendrel);

		/*
		 * Alternatively, is it a simple UNION ALL subquery?  If so, flatten
		 * into an "append relation".
		 *
		 * It's safe to do this regardless of whether this query is itself an
		 * appendrel member.  (If you're thinking we should try to flatten the
		 * two levels of appendrel together, you're right; but we handle that
		 * in set_append_rel_pathlist, not here.)
		 */
		if (rte->rtekind == RTE_SUBQUERY &&
			is_simple_union_all(rte->subquery))
			return pull_up_simple_union_all(root, jtnode, rte);

		/*
		 * Or perhaps it's a simple VALUES RTE?
		 *
		 * We don't allow VALUES pullup below an outer join nor into an
		 * appendrel (such cases are impossible anyway at the moment).
		 */
		if (rte->rtekind == RTE_VALUES &&
			lowest_outer_join == NULL &&
			containing_appendrel == NULL &&
			is_simple_values(root, rte))
			return pull_up_simple_values(root, jtnode, rte);

		/* Otherwise, do nothing at this node. */
	}
	else if (IsA(jtnode, FromExpr))
	{
		FromExpr   *f = (FromExpr *) jtnode;
		ListCell   *l;

		Assert(containing_appendrel == NULL);
		/* Recursively transform all the child nodes */
		foreach(l, f->fromlist)
		{
			lfirst(l) = pull_up_subqueries_recurse(root, lfirst(l),
												   lowest_outer_join,
												   lowest_nulling_outer_join,
												   NULL);
		}
	}
	else if (IsA(jtnode, JoinExpr))
	{
		JoinExpr   *j = (JoinExpr *) jtnode;

		Assert(containing_appendrel == NULL);
		/* Recurse, being careful to tell myself when inside outer join */
		switch (j->jointype)
		{
			case JOIN_INNER:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 lowest_outer_join,
													 lowest_nulling_outer_join,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 lowest_outer_join,
													 lowest_nulling_outer_join,
													 NULL);
				break;
			case JOIN_LEFT:
			case JOIN_SEMI:
			case JOIN_ANTI:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 lowest_nulling_outer_join,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 j,
													 NULL);
				break;
			case JOIN_FULL:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 j,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 j,
													 NULL);
				break;
			case JOIN_RIGHT:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 j,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 lowest_nulling_outer_join,
													 NULL);
				break;
			default:
				elog(ERROR, "unrecognized join type: %d",
					 (int) j->jointype);
				break;
		}
	}
	else
		elog(ERROR, "unrecognized node type: %d",
			 (int) nodeTag(jtnode));
	return jtnode;
}

UNION ALL语句处理

/*
	 * If this is a simple UNION ALL query, flatten it into an appendrel. We
	 * do this now because it requires applying pull_up_subqueries to the leaf
	 * queries of the UNION ALL, which weren't touched above because they
	 * weren't referenced by the jointree (they will be after we do this).
	 */
	 if (parse->setOperations)
		flatten_simple_union_all(root);

RowMark处理

/*
	 * Preprocess RowMark information.  We need to do this after subquery
	 * pullup, so that all base relations are present.
	 */
	preprocess_rowmarks(root);

表达式优化处理

目标列处理、withCheckOptions处理、RETURN表达式处理、HAVING语句处理、WINDOWS语句处理、LIMIT OFF语句处理

都调用preprocess_expression函数进行处理。

处理流程

  • flatten_join_alias_vars:拉平链接中的变量别名
  • eval_const_expressions:常量表达式预处理
  • canonicalize_qual:对quals中的条件表达式进行正则化处理
  • SS_process_sublinks:子链接转换为子查询计划
  • SS_replace_correlation_vars:处理Param节点中的变量
  • make_ands_implicit:将quals或havingqual转换为隐式AND格式
static Node *
preprocess_expression(PlannerInfo *root, Node *expr, int kind)
{
	/*
	 * Fall out quickly if expression is empty.  This occurs often enough to
	 * be worth checking.  Note that null->null is the correct conversion for
	 * implicit-AND result format, too.
	 */
	if (expr == NULL)
		return NULL;

	/*
	 * If the query has any join RTEs, replace join alias variables with
	 * base-relation variables.  We must do this first, since any expressions
	 * we may extract from the joinaliasvars lists have not been preprocessed.
	 * For example, if we did this after sublink processing, sublinks expanded
	 * out from join aliases would not get processed.  But we can skip this in
	 * non-lateral RTE functions, VALUES lists, and TABLESAMPLE clauses, since
	 * they can't contain any Vars of the current query level.
	 */
	if (root->hasJoinRTEs &&
		!(kind == EXPRKIND_RTFUNC ||
		  kind == EXPRKIND_VALUES ||
		  kind == EXPRKIND_TABLESAMPLE ||
		  kind == EXPRKIND_TABLEFUNC))
		expr = flatten_join_alias_vars(root->parse, expr);

	/*
	 * Simplify constant expressions.
	 *
	 * Note: an essential effect of this is to convert named-argument function
	 * calls to positional notation and insert the current actual values of
	 * any default arguments for functions.  To ensure that happens, we *must*
	 * process all expressions here.  Previous PG versions sometimes skipped
	 * const-simplification if it didn't seem worth the trouble, but we can't
	 * do that anymore.
	 *
	 * Note: this also flattens nested AND and OR expressions into N-argument
	 * form.  All processing of a qual expression after this point must be
	 * careful to maintain AND/OR flatness --- that is, do not generate a tree
	 * with AND directly under AND, nor OR directly under OR.
	 */
	expr = eval_const_expressions(root, expr);

	/*
	 * If it's a qual or havingQual, canonicalize it.
	 */
	if (kind == EXPRKIND_QUAL)
	{
		expr = (Node *) canonicalize_qual((Expr *) expr, false);

#ifdef OPTIMIZER_DEBUG
		printf("After canonicalize_qual()\n");
		pprint(expr);
#endif
	}

	/* Expand SubLinks to SubPlans */
	if (root->parse->hasSubLinks)
		expr = SS_process_sublinks(root, expr, (kind == EXPRKIND_QUAL));

	/*
	 * XXX do not insert anything here unless you have grokked the comments in
	 * SS_replace_correlation_vars ...
	 */

	/* Replace uplevel vars with Param nodes (this IS possible in VALUES) */
	if (root->query_level > 1)
		expr = SS_replace_correlation_vars(root, expr);

	/*
	 * If it's a qual or havingQual, convert it to implicit-AND format. (We
	 * don't want to do this before eval_const_expressions, since the latter
	 * would be unable to simplify a top-level AND correctly. Also,
	 * SS_process_sublinks expects explicit-AND format.)
	 */
	if (kind == EXPRKIND_QUAL)
		expr = (Node *) make_ands_implicit((Expr *) expr);

	return expr;
}
各个类型分类处理

上述流程的多个类型转换环节最终都会调用 XXX_XXX_mutator函数。XXX_XXX_mutator函数根据各个类型来实现分类转换。

这里主要介绍一下SS_process_sublinks流程中调用的process_sublinks_mutator函数,因为子链接中的节点为不确定类型,所以再函数调用时也会根据类型实行分类处理,当类型都不满足时,调用expression_tree_mutator函数进行处理。

  • SubLink子链接类型:对sublink->testexpr再调用process_sublinks_mutator函数进行解析获取testexpr节点,再调用make_subplan函数解析testexpr节点并创建一个子计划节点。
  • AND和OR类型:遍历bool中的节点再调用process_sublinks_mutator函数进行解析,将解析结果存储再新的list中,最后创建expr节点存储list和对应的AND、OR类型信息并返回。

相关代码如下:

static Node *
process_sublinks_mutator(Node *node, process_sublinks_context *context)
{
	process_sublinks_context locContext;

	locContext.root = context->root;

	if (node == NULL)
		return NULL;
	if (IsA(node, SubLink))
	{
		。。。
		/*
		 * Now build the SubPlan node and make the expr to return.
		 */
		return make_subplan(context->root,
							(Query *) sublink->subselect,
							sublink->subLinkType,
							sublink->subLinkId,
							testexpr,
							context->isTopQual);
	}

	/*
	 * Don't recurse into the arguments of an outer PHV or aggregate here. Any
	 * SubLinks in the arguments have to be dealt with at the outer query
	 * level; they'll be handled when build_subplan collects the PHV or Aggref
	 * into the arguments to be passed down to the current subplan.
	 */
	if (IsA(node, PlaceHolderVar))
	{
		if (((PlaceHolderVar *) node)->phlevelsup > 0)
			return node;
	}
	else if (IsA(node, Aggref))
	{
		if (((Aggref *) node)->agglevelsup > 0)
			return node;
	}

	/*
	 * We should never see a SubPlan expression in the input (since this is
	 * the very routine that creates 'em to begin with).  We shouldn't find
	 * ourselves invoked directly on a Query, either.
	 */
	Assert(!IsA(node, SubPlan));
	Assert(!IsA(node, AlternativeSubPlan));
	Assert(!IsA(node, Query));

	/*
	 * Because make_subplan() could return an AND or OR clause, we have to
	 * take steps to preserve AND/OR flatness of a qual.  We assume the input
	 * has been AND/OR flattened and so we need no recursion here.
	 *
	 * (Due to the coding here, we will not get called on the List subnodes of
	 * an AND; and the input is *not* yet in implicit-AND format.  So no check
	 * is needed for a bare List.)
	 *
	 * Anywhere within the top-level AND/OR clause structure, we can tell
	 * make_subplan() that NULL and FALSE are interchangeable.  So isTopQual
	 * propagates down in both cases.  (Note that this is unlike the meaning
	 * of "top level qual" used in most other places in Postgres.)
	 */
	if (is_andclause(node))
	{
		...
		return (Node *) make_andclause(newargs);
	}

	if (is_orclause(node))
	{
		...
		return (Node *) make_orclause(newargs);
	}

	/*
	 * If we recurse down through anything other than an AND or OR node, we
	 * are definitely not at top qual level anymore.
	 */
	locContext.isTopQual = false;

	return expression_tree_mutator(node,
								   process_sublinks_mutator,
								   (void *) &locContext);
}
创建子查询计划

make_subplan函数流程如下:

  • tuple_fraction值设置:0-1表示记录查询的比例个数,比例根据EXISTS_SUBLINK、ALL_SUBLINK来制定。(因为ANY、EXISTS、ANY功能不一致)。
  • 调用subquery_planner函数执行子链接的查询树优化,和完整的查询树优化处理一致。
  • create_plan、build_subplan创建计划。

完成子计划创建后,返回。

条件语句中的表达式优化处理

调用preprocess_qual_conditions函数遍历jointree节点,依据节点基础类型查找qual节点,并调用preprocess_expression函数对qual节点进行处理,:

  • RangeTblRef:什么都不做
  • FromExpr:遍历,递归调用preprocess_qual_conditions函数,再调用preprocess_expression处理各个节点中的qual节点
  • JoinExpr:对左右子节点调用preprocess_qual_conditions函数,再调用preprocess_expression处理节点中的qual节点

消除外连接

reduce_outer_joins

/*
 * reduce_outer_joins
 *		Attempt to reduce outer joins to plain inner joins.
 *
 * The idea here is that given a query like
 *		SELECT ... FROM a LEFT JOIN b ON (...) WHERE b.y = 42;
 * we can reduce the LEFT JOIN to a plain JOIN if the "=" operator in WHERE
 * is strict.  The strict operator will always return NULL, causing the outer
 * WHERE to fail, on any row where the LEFT JOIN filled in NULLs for b's
 * columns.  Therefore, there's no need for the join to produce null-extended
 * rows in the first place --- which makes it a plain join not an outer join.
 * (This scenario may not be very likely in a query written out by hand, but
 * it's reasonably likely when pushing quals down into complex views.)
 *
 * More generally, an outer join can be reduced in strength if there is a
 * strict qual above it in the qual tree that constrains a Var from the
 * nullable side of the join to be non-null.  (For FULL joins this applies
 * to each side separately.)
 *
 * Another transformation we apply here is to recognize cases like
 *		SELECT ... FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.y IS NULL;
 * If the join clause is strict for b.y, then only null-extended rows could
 * pass the upper WHERE, and we can conclude that what the query is really
 * specifying is an anti-semijoin.  We change the join type from JOIN_LEFT
 * to JOIN_ANTI.  The IS NULL clause then becomes redundant, and must be
 * removed to prevent bogus selectivity calculations, but we leave it to
 * distribute_qual_to_rels to get rid of such clauses.
 *
 * Also, we get rid of JOIN_RIGHT cases by flipping them around to become
 * JOIN_LEFT.  This saves some code here and in some later planner routines,
 * but the main reason to do it is to not need to invent a JOIN_REVERSE_ANTI
 * join type.
 *
 * To ease recognition of strict qual clauses, we require this routine to be
 * run after expression preprocessing (i.e., qual canonicalization and JOIN
 * alias-var expansion).
 */
void
reduce_outer_joins(PlannerInfo *root)
{
	reduce_outer_joins_state *state;

	/*
	 * To avoid doing strictness checks on more quals than necessary, we want
	 * to stop descending the jointree as soon as there are no outer joins
	 * below our current point.  This consideration forces a two-pass process.
	 * The first pass gathers information about which base rels appear below
	 * each side of each join clause, and about whether there are outer
	 * join(s) below each side of each join clause. The second pass examines
	 * qual clauses and changes join types as it descends the tree.
	 */
	state = reduce_outer_joins_pass1((Node *) root->parse->jointree);

	/* planner.c shouldn't have called me if no outer joins */
	if (state == NULL || !state->contains_outer)
		elog(ERROR, "so where are the outer joins?");

	reduce_outer_joins_pass2((Node *) root->parse->jointree,
							 state, root, NULL, NIL, NIL);
}

生成查询计划

grouping_planner函数 ,首先处理LIMIT、ORDER BY、GROUP BY语句,然后根据setOperations(集合操作语句)值判断是否为UNION/INTERSECT/EXCEPT语句还是普通语句:

  • 处理LIMIT语句
  • 处理UNION/INTERSECT/EXCEPT语句,调用plan_set_operations函数处理,内部进行分类处理
    • union递归处理 generate_recursion_path:对左右子句进行处理后再合并
    • 非递归处理recurse_set_operations:按照基本流程就行处理
  • 处理普通语句,按照基本流程就行处理

static void grouping_planner(PlannerInfo *root, bool inheritance_update, double tuple_fraction) { Query *parse = root->parse; int64 offset_est = 0; int64 count_est = 0; double limit_tuples = -1.0; bool have_postponed_srfs = false; PathTarget *final_target; List *final_targets; List *final_targets_contain_srfs; bool final_target_parallel_safe; RelOptInfo *current_rel; RelOptInfo *final_rel; FinalPathExtraData extra; ListCell *lc;

static void
grouping_planner(PlannerInfo *root, bool inheritance_update,
				 double tuple_fraction)
{
	...
	
	//处理LIMIT语句
	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
	if (parse->limitCount || parse->limitOffset)
	{
		tuple_fraction = preprocess_limit(root, tuple_fraction,
										  &offset_est, &count_est);

		/*
		 * If we have a known LIMIT, and don't have an unknown OFFSET, we can
		 * estimate the effects of using a bounded sort.
		 */
		if (count_est > 0 && offset_est >= 0)
			limit_tuples = (double) count_est + (double) offset_est;
	}

	/* Make tuple_fraction accessible to lower-level routines */
	root->tuple_fraction = tuple_fraction;

	if (parse->setOperations)
	{
		...
		
		//处理UNION/INTERSECT/EXCEPT语句
		/*
		 * Construct Paths for set operations.  The results will not need any
		 * work except perhaps a top-level sort and/or LIMIT.  Note that any
		 * special work for recursive unions is the responsibility of
		 * plan_set_operations.
		 */
		current_rel = plan_set_operations(root);

		...
	}
	else
	{
		//普通语句处理
		...
	}
	...
}
普通语句处理流程

除开特殊语句,其他的语句都执行普通执行流程:

  • preprocess_groupclause处理分组语句:将GROUP BY 后的元素重新排列顺序,调整的顺序按照ORDER BY调整。便于后续利用索引快速完成ORDER BY和GROUP BY操作。
  • preprocess_targetlist处理目标列语句:没看懂怎么处理的
  • get_agg_clause_costs收集聚集函数使用的成本:
  • select_active_windows执行windows函数:
  • query_planner创建查询访问路径:因为该部分比较重要,所以单独讲一下
创建查询访问路径

注意:阅读代码前需要对查询引擎原理进行了解,不然不知道为什么这么做。

query_planner函数处理时分为普通语句和fromlist链表长度为1的语句("SELECT expression" and "INSERT ... VALUES()")--该类型调用函数直接处理并返回结果。重点讲一下普通语句的处理流程(普通查询会有三个要素:数据源、输出结果、查询条件,下面依次进行填充):

  • setup_simple_rel_arrays收集基表信息:从root->parse->rtable表设置root->append_rel_array表。

  • add_base_rels_to_query构建RelOptInfo数组(基表信息)(设置数据源):根据jointree类型创建RelOptInfo数组,将RelOptInfo数据存放再root->simple_rel_array中。简单来说就是创建基表的数组,再填充基表中的数据源。

    • build_simple_rel设置RelOptInfo参数填充输出链表targetlist和查询条件quals
  • build_base_rel_tlists设置目标列(设置输出结果):设置查询语句的输出结果,遍历target list(这里入参为root->processed_tlist),查找出所有的Var类型节点并添加到Var所属基表的RelOptInfo的reltargetlist中(已存在则不做重复添加)。简单来说就是填充基表的输出结果;将列名和数据源绑定:(select a1 , b1 from aa, bb where aa.a1 = bb.b1;------将a1和aa的关系绑定起来)

    • pull_var_clause:调用pull_var_clause_walker函数查询所用的Var变量
    • add_vars_to_targetlist:将Var变量添加到root->simple_rel_array中
  • deconstruct_jointree设置约束条件(设置查询条件):调用deconstruct_recurse函数按照类型进行处理:

    • RangeTblRef:直接返回,不处理
    • FromExpr:查找FromExpr中所有的基表Relids信息,然后和基表相关的约束条件绑定到基表RelOptInfo中。调用distribute_qual_to_rels函数执行绑定操作,绑定函数中涉及较复杂流程,后续介绍。
    • JoinExpr:按照join的类型分类进行查找,查找JoinExpr中左右子节点中所有的基表Relids信息,然后和基表相关的约束条件绑定到基表RelOptInfo中。最后调用make_outerjoininfo函数做连接顺序处理
  • reconsider_outer_join_clauses处理外连接:

  • generate_base_implied_equalities创建约束条件 :将clause中所有的约束添加进行优化,并将clause中约束条件与基表信息RelOptInfo进行绑定。

  • (*qp_callback) (root, qp_extra)-----standard_qp_callback回调:处理排序的pathkey

  • create_lateral_join_info构建lateraljoin信息:子查询中的处理,略

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章