Syntax
EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
- AUTHORIZATION is supported as of Hive 0.14.0 via HIVE-5961.
- VECTORIZATION is supported as of Hive 2.3.0 via HIVE-11394.
- LOCKS is supported as of Hive 3.2.0 via HIVE-17683.
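As a minimal sketch of the syntax, each optional keyword goes directly between EXPLAIN and the query (the table name `t` below is a placeholder):

```sql
-- Default output: stage dependencies and stage plans
EXPLAIN SELECT COUNT(1) FROM t;

-- Adds the abstract syntax tree plus extra detail (HDFS paths, table properties)
EXPLAIN EXTENDED SELECT COUNT(1) FROM t;

-- Lists the tables and partitions the query depends on
EXPLAIN DEPENDENCY SELECT COUNT(1) FROM t;

-- Shows the privileges needed for the query's inputs and outputs
EXPLAIN AUTHORIZATION SELECT COUNT(1) FROM t;
```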
An execution plan has (at most) three parts:
- the abstract syntax tree of the statement
- the dependencies between the different stages of the plan
- a detailed description of each stage
Basic Information
Prefixing a query with the EXPLAIN keyword displays the query's basic information, which includes:
- the dependency graph of the jobs, i.e. STAGE DEPENDENCIES;
- the details of each job, i.e. STAGE PLANS.
For example, the SQL:
EXPLAIN
SELECT default_pay_class_name,
default_sub_pay_class_name,
COUNT(1)
FROM mart_fspinno.rpt_maiton_cashier_pay_d
WHERE partition_date='2020-01-18'
AND default_pay_class_name='xx'
GROUP BY 1,2
The execution plan is as follows:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce // current execution engine: MapReduce
Map Operator Tree: // operations in the Map phase
TableScan // table scan operator
alias: rpt_maiton_cashier_pay_d // scans the table rpt_maiton_cashier_pay_d
Statistics: Num rows: 4973 Data size: 994632 Basic stats: COMPLETE Column stats: NONE // estimated statistics
Filter Operator // filter operator
predicate: (default_pay_class_name = 'xx') (type: boolean)
Statistics: Num rows: 2486 Data size: 497215 Basic stats: COMPLETE Column stats: NONE
Select Operator // column selection: projects over the previous operator's output
expressions: default_pay_class_name (type: string), default_sub_pay_class_name (type: string) // columns to project
outputColumnNames: _col0, _col1
Statistics: Num rows: 2486 Data size: 497215 Basic stats: COMPLETE Column stats: NONE
Group By Operator // group-by aggregation over the previous output
aggregations: count(1)
keys: _col0 (type: string), _col1 (type: string) // grouping columns
mode: hash
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 2486 Data size: 497215 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string), _col1 (type: string)
sort order: ++ // sort direction per key: + means ascending, - means descending
Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
Statistics: Num rows: 2486 Data size: 497215 Basic stats: COMPLETE Column stats: NONE
value expressions: _col2 (type: bigint)
Reduce Operator Tree: // operations in the Reduce phase
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: string), KEY._col1 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1243 Data size: 248607 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false // whether the output is compressed
Statistics: Num rows: 1243 Data size: 248607 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe // SerDe (serialization/deserialization) class
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Extended Information
Use the EXPLAIN EXTENDED keywords. Compared with the basic output, this additionally prints the ABSTRACT SYNTAX TREE, and the stage plans contain more detail, such as the HDFS read path of each table and each Hive table's configuration properties.
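For reference, the extended plan below was produced by prefixing the same query as above with EXPLAIN EXTENDED:

```sql
EXPLAIN EXTENDED
SELECT default_pay_class_name,
       default_sub_pay_class_name,
       COUNT(1)
FROM mart_fspinno.rpt_maiton_cashier_pay_d
WHERE partition_date='2020-01-18'
  AND default_pay_class_name='xx'
GROUP BY 1,2
```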
ABSTRACT SYNTAX TREE:
TOK_QUERY
TOK_FROM
TOK_TABREF
TOK_TABNAME
mart_fspinno
rpt_maiton_cashier_pay_d
TOK_INSERT
TOK_DESTINATION
TOK_DIR
TOK_TMP_FILE
TOK_SELECT
TOK_SELEXPR
TOK_TABLE_OR_COL
default_pay_class_name
TOK_SELEXPR
TOK_TABLE_OR_COL
default_sub_pay_class_name
TOK_SELEXPR
TOK_FUNCTION
COUNT
1
TOK_WHERE
AND
=
TOK_TABLE_OR_COL
partition_date
'2020-01-18'
=
TOK_TABLE_OR_COL
default_pay_class_name
'xx'
TOK_GROUPBY
TOK_TABLE_OR_COL
default_pay_class_name
TOK_TABLE_OR_COL
default_sub_pay_class_name
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
...
Path -> Alias:
viewfs://hadoop-meituan/nn15/warehouse/mart_fspinno.db/rpt_maiton_cashier_pay_d/partition_date=2020-01-18 [$hdt$_0:rpt_maiton_cashier_pay_d]
Path -> Partition:
viewfs://hadoop-meituan/nn15/warehouse/mart_fspinno.db/rpt_maiton_cashier_pay_d/partition_date=2020-01-18
Partition
base file name: partition_date=2020-01-18
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
partition values:
partition_date 2020-01-18
properties:
COLUMN_STATS_ACCURATE false
bucket_count -1
columns cashier_primary_type_name,...
columns.comments '收銀臺一級分類名稱:標準收銀臺、極速支付','缺省一級分類方式',...
columns.types string:string:string:string:string:...
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location viewfs://hadoop-meituan/nn15/warehouse/mart_fspinno.db/rpt_maiton_cashier_pay_d/partition_date=2020-01-18
name mart_fspinno.rpt_maiton_cashier_pay_d
numFiles 1
numRows -1
partition_columns partition_date
partition_columns.types string
rawDataSize -1
serialization.ddl struct rpt_maiton_cashier_pay_d { string cashier_primary_type_name, ...}
serialization.format 1
serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 994632
transient_lastDdlTime 1579379591
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
bucket_count -1
columns cashier_primary_type_name,default_pay_class_name,...
columns.comments '收銀臺一級分類名稱:標準收銀臺、極速支付','缺省一級分類方式',...
columns.types string:string:string:string:string:string:string:...
comment 買單收銀臺支付數據報表
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
last_modified_by hadoop-hmart-fspinno
last_modified_time 1579094504
location viewfs://hadoop-meituan/nn15/warehouse/mart_fspinno.db/rpt_maiton_cashier_pay_d
name mart_fspinno.rpt_maiton_cashier_pay_d
partition_columns partition_date
partition_columns.types string
serialization.ddl struct rpt_maiton_cashier_pay_d { string cashier_primary_type_name, string default_pay_class_name, ...}
serialization.format 1
serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
spark.sql.sources.schema.numPartCols 1
spark.sql.sources.schema.numParts 2
spark.sql.sources.schema.part.0 {"type":"struct","fields":[{"name":"cashier_primary_type_name","type":"string",...}]}
spark.sql.sources.schema.partCol.0 partition_date
transient_lastDdlTime 1579094504
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: mart_fspinno.rpt_maiton_cashier_pay_d
name: mart_fspinno.rpt_maiton_cashier_pay_d
Truncated Path -> Alias:
viewfs://hadoop-meituan/nn15/warehouse/mart_fspinno.db/rpt_maiton_cashier_pay_d/partition_date=2020-01-18 [$hdt$_0:rpt_maiton_cashier_pay_d]
Needs Tagging: false
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: string), KEY._col1 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1243 Data size: 248607 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
GlobalTableId: 0
directory: hdfs://rz-nn15/tmp/hive-scratch/.hive-staging_hive_2020-01-22_17-43-54_147_5296011570429738017-14462/-ext-10002
NumFilesPerFileSink: 1
Statistics: Num rows: 1243 Data size: 248607 Basic stats: COMPLETE Column stats: NONE
Stats Publishing Key Prefix: hdfs://rz-nn15/tmp/hive-scratch/.hive-staging_hive_2020-01-22_17-43-54_147_5296011570429738017-14462/-ext-10002/
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
columns _col0,_col1,_col2
columns.types string:string:bigint
escape.delim \
hive.serialization.extend.additional.nesting.levels true
serialization.format 1
serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
TotalFiles: 1
GatherStats: false
MultiFileSpray: false
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Dependency Information