How does Flink SQL turn a SQL statement or a Table API program into the final DataStream or DataSet job at execution time? In this post we walk through the intermediate execution and translation steps from the source code's perspective.
DEMO
The following Flink unit test method simulates a streaming query:
@Test
public void testSelect() throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
    StreamITCase.clear();

    // build a source stream and expose it as a table with columns a, b, c
    DataStream<Tuple3<Integer, Long, String>> ds = JavaStreamTestData.getSmall3TupleDataSet(env);
    Table in = tableEnv.fromDataStream(ds, "a,b,c");
    tableEnv.registerTable("MyTable", in);

    // run a SQL query against the registered table
    String sqlQuery = "SELECT * FROM MyTable";
    Table result = tableEnv.sqlQuery(sqlQuery);

    // translate the Table back into a DataStream and execute the job
    DataStream<Row> resultSet = tableEnv.toAppendStream(result, Row.class);
    resultSet.addSink(new StreamITCase.StringSink<Row>());
    env.execute();

    List<String> expected = new ArrayList<>();
    expected.add("1,1,Hi");
    expected.add("2,2,Hello");
    expected.add("3,2,Hello world");
    StreamITCase.compareWithList(expected);
}
Registering a table
tableEnv.registerTable("MyTable", in);
==> TableEnvironment.registerTable ==> registerTableInternal
(registering a DataStream directly follows the parallel route StreamTableEnvironment.registerDataStream ==> registerDataStreamInternal ==> registerTableInternal)
protected def registerTableInternal(name: String, table: AbstractTable): Unit = {
  if (isRegistered(name)) {
    throw new TableException(s"Table \'$name\' already exists. " +
      s"Please, choose a different name.")
  } else {
    rootSchema.add(name, table)
  }
}
The table definition is added to the root schema, and registration succeeds.
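rootSchema here is Calcite's root SchemaPlus. As a minimal sketch of what this registration amounts to in plain Calcite (the table name and column types just mirror the demo; this is not Flink's code):

import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.tools.Frameworks;

public class RegisterTableSketch {
    public static void main(String[] args) {
        SchemaPlus rootSchema = Frameworks.createRootSchema(true);
        // Registering a table means publishing its row type under a name,
        // just like Flink's registerTableInternal does via rootSchema.add.
        rootSchema.add("MyTable", new AbstractTable() {
            @Override
            public RelDataType getRowType(RelDataTypeFactory typeFactory) {
                return typeFactory.builder()
                        .add("a", typeFactory.createJavaType(Integer.class))
                        .add("b", typeFactory.createJavaType(Long.class))
                        .add("c", typeFactory.createJavaType(String.class))
                        .build();
            }
        });
    }
}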
Table creation process
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
Table result = tableEnv.sqlQuery(sqlQuery);
==>
def sqlQuery(query: String): Table = {
  val planner = new FlinkPlannerImpl(getFrameworkConfig, getPlanner, getTypeFactory)
  // parse the sql query into a SqlNode AST; SqlNode is an abstract class whose
  // subclasses include SqlSelect, SqlDelete, SqlJoin, SqlAlter, etc.
  val parsed = planner.parse(query)
  if (null != parsed && parsed.getKind.belongsTo(SqlKind.QUERY)) {
    // validate the SqlNode AST
    val validated = planner.validate(parsed)
    // transform to a relational tree: AST -> logical plan
    val relational = planner.rel(validated)
    // relational.rel is the logical plan, a LogicalProject in this example
    new Table(this, LogicalRelNode(relational.rel))
  } else {
    throw new TableException(
      "Unsupported SQL query! sqlQuery() only accepts SQL queries of type " +
      "SELECT, UNION, INTERSECT, EXCEPT, VALUES, and ORDER_BY.")
  }
}
Constructing the Table object thus boils down to parsing the SQL into a SqlNode tree, validating it, and converting it into a logical plan. These calls follow the standard Calcite flow; see https://matt33.com/2019/03/07/apache-calcite-process-flow/ for a detailed Calcite walkthrough.
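The same parse -> validate -> rel pipeline can be reproduced with plain Calcite. A minimal sketch, assuming the MyTable schema registered as in the previous sketch (this uses Calcite's own Planner, not Flink's FlinkPlannerImpl):

import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

public class SqlToRelSketch {
    public static void main(String[] args) throws Exception {
        SchemaPlus rootSchema = Frameworks.createRootSchema(true);
        // assume "MyTable" was added to rootSchema as in the previous sketch
        FrameworkConfig config = Frameworks.newConfigBuilder()
                .defaultSchema(rootSchema)
                .build();
        Planner planner = Frameworks.getPlanner(config);

        SqlNode parsed = planner.parse("SELECT * FROM \"MyTable\"");  // SQL -> SqlNode AST
        SqlNode validated = planner.validate(parsed);                 // check names/types against the schema
        RelNode logicalPlan = planner.rel(validated).rel;             // SqlNode -> RelNode logical plan
        System.out.println(RelOptUtil.toString(logicalPlan));
    }
}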
Converting a Table to a DataStream
DataStream<Row> resultSet = tableEnv.toAppendStream(result, Row.class);
resultSet.addSink(new StreamITCase.StringSink<Row>());
env.execute();
==>
def toAppendStream[T](
    table: Table,
    clazz: Class[T],
    queryConfig: StreamQueryConfig): DataStream[T] = {
  val typeInfo = TypeExtractor.createTypeInfo(clazz)
  TableEnvironment.validateType(typeInfo)
  translate[T](table, queryConfig, updatesAsRetraction = false, withChangeFlag = false)(typeInfo)
}
==>
protected def translate[A](
    table: Table,
    queryConfig: StreamQueryConfig,
    updatesAsRetraction: Boolean,
    withChangeFlag: Boolean)(implicit tpe: TypeInformation[A]): DataStream[A] = {
  val relNode = table.getRelNode                              // get the logical plan
  val dataStreamPlan = optimize(relNode, updatesAsRetraction) // optimize it into the physical plan
  val rowType = getResultType(relNode, dataStreamPlan)
  translate(dataStreamPlan, rowType, queryConfig, withChangeFlag)
}
==> translateToCRow ==> DataStreamScan.translateToPlan ==> convertToInternalRow ==> generateConversionProcessFunction, which generates the concrete operator.
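generateConversionProcessFunction emits generated code, but conceptually the resulting operator is just a per-record field conversion. A hand-written sketch of that idea (an illustration only, not the actual generated code):

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

// Illustrative stand-in for the generated conversion operator: it turns each
// internal row into the requested external type, field by field.
public class ConversionSketch extends ProcessFunction<Row, Row> {
    @Override
    public void processElement(Row in, Context ctx, Collector<Row> out) throws Exception {
        Row converted = new Row(in.getArity());
        for (int i = 0; i < in.getArity(); i++) {
            // the real generated code applies per-field type conversions here
            converted.setField(i, in.getField(i));
        }
        out.collect(converted);
    }
}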
The DataSet path goes through the same kind of translation, so in the end a SQL query runs as an ordinary DataStream or DataSet job.
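For reference, the batch counterpart of the demo looks like this in the same pre-1.9 Table API (a sketch; MyTable would have to be registered on the batch environment first):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
Table result = tableEnv.sqlQuery("SELECT * FROM MyTable");
// translate the Table into a DataSet instead of a DataStream
DataSet<Row> resultSet = tableEnv.toDataSet(result, Row.class);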