Hive Execution Plans (incomplete)

Contents

Syntax

Basic Information

Extended Information

Dependency Information

Authorization Information

References


Syntax

EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query

AUTHORIZATION is supported from Hive 0.14.0 via HIVE-5961, VECTORIZATION from Hive 2.3.0 via HIVE-11394, and LOCKS from Hive 3.2.0 via HIVE-17683.

An execution plan consists of (at most) three parts:

  • The abstract syntax tree (AST) of the statement
  • The dependencies between the different stages of the plan
  • A detailed description of each stage
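As a quick illustration (a sketch, reusing the table from the example below), each option is simply placed between EXPLAIN and the query:

```sql
-- Basic plan: STAGE DEPENDENCIES plus STAGE PLANS
EXPLAIN
SELECT COUNT(1) FROM mart_fspinno.rpt_maiton_cashier_pay_d;

-- Additionally prints HDFS paths and table properties in the stage plans
EXPLAIN EXTENDED
SELECT COUNT(1) FROM mart_fspinno.rpt_maiton_cashier_pay_d;

-- Lists only the tables and partitions the query reads
EXPLAIN DEPENDENCY
SELECT COUNT(1) FROM mart_fspinno.rpt_maiton_cashier_pay_d;
```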

Basic Information

Prefix a SQL query with the EXPLAIN keyword to display its basic plan information, which includes:

  1. The dependency graph of the jobs, i.e. STAGE DEPENDENCIES;
  2. The detailed information of each job, i.e. STAGE PLANS.

For example:

EXPLAIN
SELECT default_pay_class_name,
       default_sub_pay_class_name,
       COUNT(1)
  FROM mart_fspinno.rpt_maiton_cashier_pay_d
 WHERE partition_date = '2020-01-18'
   AND default_pay_class_name = 'xx'
 GROUP BY 1, 2

The resulting execution plan:

STAGE DEPENDENCIES: 
  Stage-1 is a root stage 
  Stage-0 depends on stages: Stage-1 
 
STAGE PLANS: 
  Stage: Stage-1 
    Map Reduce // execution engine for this stage: MapReduce
      Map Operator Tree: // map-phase operators
          TableScan // table scan
            alias: rpt_maiton_cashier_pay_d // the table being scanned, rpt_maiton_cashier_pay_d
            Statistics: Num rows: 4973 Data size: 994632 Basic stats: COMPLETE Column stats: NONE // estimated statistics
            Filter Operator // filter
              predicate: (default_pay_class_name = 'xx') (type: boolean) 
              Statistics: Num rows: 2486 Data size: 497215 Basic stats: COMPLETE Column stats: NONE 
              Select Operator // column selection: a projection over the previous operator's output
                expressions: default_pay_class_name (type: string), default_sub_pay_class_name (type: string) // projected columns
                outputColumnNames: _col0, _col1 
                Statistics: Num rows: 2486 Data size: 497215 Basic stats: COMPLETE Column stats: NONE 
                Group By Operator // group and aggregate the previous output
                  aggregations: count(1) 
                  keys: _col0 (type: string), _col1 (type: string) // grouping keys
                  mode: hash 
                  outputColumnNames: _col0, _col1, _col2 
                  Statistics: Num rows: 2486 Data size: 497215 Basic stats: COMPLETE Column stats: NONE 
                  Reduce Output Operator 
                    key expressions: _col0 (type: string), _col1 (type: string) 
                    sort order: ++  // sort direction per key: + ascending, - descending
                    Map-reduce partition columns: _col0 (type: string), _col1 (type: string) 
                    Statistics: Num rows: 2486 Data size: 497215 Basic stats: COMPLETE Column stats: NONE 
                    value expressions: _col2 (type: bigint) 
      Reduce Operator Tree: // reduce-phase operators
        Group By Operator 
          aggregations: count(VALUE._col0) 
          keys: KEY._col0 (type: string), KEY._col1 (type: string) 
          mode: mergepartial 
          outputColumnNames: _col0, _col1, _col2 
          Statistics: Num rows: 1243 Data size: 248607 Basic stats: COMPLETE Column stats: NONE 
          File Output Operator 
            compressed: false // whether the output is compressed
            Statistics: Num rows: 1243 Data size: 248607 Basic stats: COMPLETE Column stats: NONE 
            table: 
                input format: org.apache.hadoop.mapred.TextInputFormat 
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe // SerDe (serialization/deserialization) class
 
  Stage: Stage-0 
    Fetch Operator 
      limit: -1 
      Processor Tree: 
        ListSink 
 

Extended Information

Use the EXPLAIN EXTENDED keywords. Compared with the basic output, this additionally prints the AST, and the stage plans contain more detail, such as each table's HDFS read paths and each Hive table's configuration properties.

ABSTRACT SYNTAX TREE: 
   
TOK_QUERY 
   TOK_FROM 
      TOK_TABREF 
         TOK_TABNAME 
            mart_fspinno 
            rpt_maiton_cashier_pay_d 
   TOK_INSERT 
      TOK_DESTINATION 
         TOK_DIR 
            TOK_TMP_FILE 
      TOK_SELECT 
         TOK_SELEXPR 
            TOK_TABLE_OR_COL 
               default_pay_class_name 
         TOK_SELEXPR 
            TOK_TABLE_OR_COL 
               default_sub_pay_class_name 
         TOK_SELEXPR 
            TOK_FUNCTION 
               COUNT 
               1 
      TOK_WHERE 
         AND 
            = 
               TOK_TABLE_OR_COL 
                  partition_date 
               '2020-01-18' 
            = 
               TOK_TABLE_OR_COL 
                  default_pay_class_name 
               'xx' 
      TOK_GROUPBY 
         TOK_TABLE_OR_COL 
            default_pay_class_name 
         TOK_TABLE_OR_COL 
            default_sub_pay_class_name 
 
 
STAGE DEPENDENCIES: 
  Stage-1 is a root stage 
  Stage-0 depends on stages: Stage-1 
 
STAGE PLANS: 
  Stage: Stage-1 
    Map Reduce 
      Map Operator Tree: 
	...
      Path -> Alias: 
        viewfs://hadoop-meituan/nn15/warehouse/mart_fspinno.db/rpt_maiton_cashier_pay_d/partition_date=2020-01-18 [$hdt$_0:rpt_maiton_cashier_pay_d] 
      Path -> Partition: 
        viewfs://hadoop-meituan/nn15/warehouse/mart_fspinno.db/rpt_maiton_cashier_pay_d/partition_date=2020-01-18  
          Partition 
            base file name: partition_date=2020-01-18 
            input format: org.apache.hadoop.mapred.TextInputFormat 
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
            partition values: 
              partition_date 2020-01-18 
            properties: 
              COLUMN_STATS_ACCURATE false 
              bucket_count -1 
              columns cashier_primary_type_name,...
              columns.comments '收銀臺一級分類名稱:標準收銀臺、極速支付','缺省一級分類方式',...
              columns.types string:string:string:string:string:...
              file.inputformat org.apache.hadoop.mapred.TextInputFormat 
              file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
              location viewfs://hadoop-meituan/nn15/warehouse/mart_fspinno.db/rpt_maiton_cashier_pay_d/partition_date=2020-01-18 
              name mart_fspinno.rpt_maiton_cashier_pay_d 
              numFiles 1 
              numRows -1 
              partition_columns partition_date 
              partition_columns.types string 
              rawDataSize -1 
              serialization.ddl struct rpt_maiton_cashier_pay_d { string cashier_primary_type_name, ...} 
              serialization.format 1 
              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
              totalSize 994632 
              transient_lastDdlTime 1579379591 
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
           
              input format: org.apache.hadoop.mapred.TextInputFormat 
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
              properties: 
                bucket_count -1 
                columns cashier_primary_type_name,default_pay_class_name,...
                columns.comments '收銀臺一級分類名稱:標準收銀臺、極速支付','缺省一級分類方式',...
                columns.types string:string:string:string:string:string:string:...
                comment 買單收銀臺支付數據報表 
                file.inputformat org.apache.hadoop.mapred.TextInputFormat 
                file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
                last_modified_by hadoop-hmart-fspinno 
                last_modified_time 1579094504 
                location viewfs://hadoop-meituan/nn15/warehouse/mart_fspinno.db/rpt_maiton_cashier_pay_d 
                name mart_fspinno.rpt_maiton_cashier_pay_d 
                partition_columns partition_date 
                partition_columns.types string 
                serialization.ddl struct rpt_maiton_cashier_pay_d { string cashier_primary_type_name, string default_pay_class_name, ...} 
                serialization.format 1 
                serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
                spark.sql.sources.schema.numPartCols 1 
                spark.sql.sources.schema.numParts 2 
                spark.sql.sources.schema.part.0 {"type":"struct","fields":[{"name":"cashier_primary_type_name","type":"string",...}]} 
                spark.sql.sources.schema.partCol.0 partition_date 
                transient_lastDdlTime 1579094504 
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
              name: mart_fspinno.rpt_maiton_cashier_pay_d 
            name: mart_fspinno.rpt_maiton_cashier_pay_d 
      Truncated Path -> Alias: 
        viewfs://hadoop-meituan/nn15/warehouse/mart_fspinno.db/rpt_maiton_cashier_pay_d/partition_date=2020-01-18 [$hdt$_0:rpt_maiton_cashier_pay_d] 
      Needs Tagging: false 
      Reduce Operator Tree: 
        Group By Operator 
          aggregations: count(VALUE._col0) 
          keys: KEY._col0 (type: string), KEY._col1 (type: string) 
          mode: mergepartial 
          outputColumnNames: _col0, _col1, _col2 
          Statistics: Num rows: 1243 Data size: 248607 Basic stats: COMPLETE Column stats: NONE 
          File Output Operator 
            compressed: false 
            GlobalTableId: 0 
            directory: hdfs://rz-nn15/tmp/hive-scratch/.hive-staging_hive_2020-01-22_17-43-54_147_5296011570429738017-14462/-ext-10002 
            NumFilesPerFileSink: 1 
            Statistics: Num rows: 1243 Data size: 248607 Basic stats: COMPLETE Column stats: NONE 
            Stats Publishing Key Prefix: hdfs://rz-nn15/tmp/hive-scratch/.hive-staging_hive_2020-01-22_17-43-54_147_5296011570429738017-14462/-ext-10002/ 
            table: 
                input format: org.apache.hadoop.mapred.TextInputFormat 
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
                properties: 
                  columns _col0,_col1,_col2 
                  columns.types string:string:bigint 
                  escape.delim \ 
                  hive.serialization.extend.additional.nesting.levels true 
                  serialization.format 1 
                  serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
            TotalFiles: 1 
            GatherStats: false 
            MultiFileSpray: false 
 
  Stage: Stage-0 
    Fetch Operator 
      limit: -1 
      Processor Tree: 
        ListSink 

Dependency Information
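This section is left unfinished in the original. As a hedged sketch: EXPLAIN DEPENDENCY prints, as a JSON document, the input tables and partitions the query touches, which is useful for lineage checks. Applied to the example table used earlier (the output values below are illustrative):

```sql
EXPLAIN DEPENDENCY
SELECT default_pay_class_name, COUNT(1)
  FROM mart_fspinno.rpt_maiton_cashier_pay_d
 WHERE partition_date = '2020-01-18'
 GROUP BY default_pay_class_name;

-- Illustrative output shape (a single JSON document):
-- {"input_tables":[{"tablename":"mart_fspinno@rpt_maiton_cashier_pay_d","tabletype":"MANAGED_TABLE"}],
--  "input_partitions":[{"partitionName":"mart_fspinno@rpt_maiton_cashier_pay_d@partition_date=2020-01-18"}]}
```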

 

Authorization Information
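This section is also unfinished in the original. As a hedged sketch: EXPLAIN AUTHORIZATION lists the entities the query reads (INPUTS) and writes (OUTPUTS), the current user, and the operation type, which helps diagnose permission errors. The output below is illustrative, not captured from a real run:

```sql
EXPLAIN AUTHORIZATION
SELECT default_pay_class_name, COUNT(1)
  FROM mart_fspinno.rpt_maiton_cashier_pay_d
 GROUP BY default_pay_class_name;

-- Illustrative output shape:
-- INPUTS:
--   mart_fspinno@rpt_maiton_cashier_pay_d
-- OUTPUTS:
--   hdfs://.../-mr-10000  (the query's temporary output directory)
-- CURRENT_USER:
--   hadoop
-- OPERATION:
--   QUERY
```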

References
