Hive Experiment 5: Viewing HQL Execution Plans and Explaining Their Key Steps

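1. How to view an execution plan

Syntax: explain [extended] <HiveQL statement>; (the optional extended keyword prints a more detailed plan)

/* Example: */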
explain select count(distinct mobilename) from testtab_small;	

2. Basic elements of an execution plan

  1. The main steps (stages) and their dependencies, read from top to bottom
  2. The key information for each main step, including:
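
| Key information | Keyword(s) | Description |
|---|---|---|
| Map or reduce operations | Map Operator Tree, Reduce Operator Tree | The map and reduce phases |
| Table scan | TableScan | The table being queried |
| Table statistics | Statistics | Row count and data size |
| Select operator | Select Operator | The columns being retrieved |
| Grouping/aggregation operator | Group By Operator | Needed for aggregations such as count() |
| Sorting | sort order | Whether the shuffle sorts on the key: + ascending, - descending, empty means no sorting |
| Local task | Local Work, Local Tables, Map Local Operator Tree | Appears with map-side joins, when a small table takes part in the join and hive.auto.convert.join=true |
| Join operator | Join Operator | The join itself |
| Join condition | condition map | The join type, e.g. Left Outer Join0 to 1 (0 and 1 are the positions of the joined inputs, here a and b) |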
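
Because the STAGE DEPENDENCIES block is plain text, the top-to-bottom stage order can also be recovered programmatically. The following is a minimal Python sketch, not part of Hive: the function names are hypothetical, and it only understands the two line forms that appear in this article ("is a root stage" and "depends on stages:"). It uses the standard-library graphlib module (Python 3.9+) to list the stages so that each one comes after the stages it depends on.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical helpers: they only handle the two line forms used in this article.
ROOT_RE = re.compile(r"(Stage-\d+) is a root stage")
DEPENDS_RE = re.compile(r"(Stage-\d+) depends on stages: (.+)")


def parse_stage_dependencies(plan_text):
    """Map each stage to the set of stages it depends on."""
    deps = {}
    in_block = False
    for line in plan_text.splitlines():
        text = line.strip()
        if text == "STAGE DEPENDENCIES:":
            in_block = True
        elif text == "STAGE PLANS:":
            break  # the dependency block ends where the stage plans start
        elif in_block:
            root = ROOT_RE.match(text)
            dep = DEPENDS_RE.match(text)
            if root:
                deps.setdefault(root.group(1), set())
            elif dep:
                parents = set(re.findall(r"Stage-\d+", dep.group(2)))
                deps.setdefault(dep.group(1), set()).update(parents)
    return deps


def execution_order(deps):
    """Order the stages so that every stage follows the stages it depends on."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    plan = """STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
"""
    print(execution_order(parse_stage_dependencies(plan)))  # ['Stage-4', 'Stage-3', 'Stage-0']
```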

Example: explain select count(distinct mobilename) from testtab_small;

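This is a complete execution plan. The later examples show mainly the parts that differ, not the whole plan; for example, the Stage-0 section is omitted.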

/* Overall steps; read from top to bottom */
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          /* Table being scanned */
          TableScan
            alias: testtab_small
            /* Table statistics: 13 rows, data size 629 (bytes) */
            Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              /* The retrieved column is mobilename */
              expressions: mobilename (type: string)
              outputColumnNames: mobilename
              Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                /* Aggregation (map side, mode: hash) */
                aggregations: count(DISTINCT mobilename)
                keys: mobilename (type: string)
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  /* +: sort ascending on the key */
                  sort order: +
                  Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col0:0._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
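
The plan splits count(distinct) into two Group By Operators: a map-side one (mode: hash, keys: mobilename) that deduplicates within each mapper, and a reduce-side one (mode: mergepartial) that counts the distinct keys after the sorted shuffle. The Python sketch below simulates that data flow under simplifying assumptions: each inner list stands for one mapper's input split, NULL is modeled as None, and there is a single reducer (a global aggregate with no GROUP BY keys normally ends in one reducer). The function names are hypothetical; this is not Hive's implementation.

```python
from itertools import groupby


def map_group_by_hash(split):
    """Map-side Group By Operator (mode: hash, keys: mobilename).

    Each mapper keeps every distinct mobilename once. COUNT(DISTINCT ...)
    ignores NULLs (modeled as None), so they are dropped here for simplicity.
    """
    return {name for name in split if name is not None}


def shuffle_sorted(map_outputs):
    """Reduce Output Operator: key _col0 only, sort order '+' (ascending)."""
    return sorted(key for output in map_outputs for key in output)


def reduce_count_distinct(sorted_keys):
    """Reduce-side Group By Operator (mode: mergepartial).

    Equal keys from different mappers are now adjacent, so the number of
    distinct values equals the number of runs of equal keys.
    """
    return sum(1 for _key, _run in groupby(sorted_keys))


def count_distinct(splits):
    return reduce_count_distinct(shuffle_sorted(map_group_by_hash(s) for s in splits))


if __name__ == "__main__":
    print(count_distinct([["m1", "m2", "m1"], ["m2", "m3", None]]))  # 3
```

Note that the map side can only deduplicate within one split; "m2" appears in both splits and is only collapsed on the reduce side.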

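3. explain select count(mobilename) from testtab_small;
  1. Without distinct, the map-side Group By Operator has no keys: each mapper emits a single partial count (Num rows: 1, Data size: 8).
  2. sort order is empty, so nothing is sorted; the reducer merges the partial counts with count(VALUE._col0) in mergepartial mode.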

  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: testtab_small
            Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: mobilename (type: string)
              outputColumnNames: mobilename
              Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: count(mobilename)
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order:
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
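
Here each mapper sends only one row, its partial count (consistent with Num rows: 1 and Data size: 8, i.e. one bigint), with no key and no sorting, and the reducer merges the partials. A minimal Python simulation, assuming each inner list is one mapper's split and NULL is modeled as None; the function names are hypothetical.

```python
def map_partial_count(split):
    """Map-side Group By Operator with no keys (mode: hash).

    Each mapper emits one row: its partial count. COUNT(col) skips NULLs,
    which are modeled as None here.
    """
    return sum(1 for value in split if value is not None)


def reduce_merge_counts(partial_counts):
    """Reduce-side Group By Operator (mode: mergepartial): add up the partials."""
    return sum(partial_counts)


def count_column(splits):
    # sort order is empty: one row per mapper needs no sorting before merging.
    return reduce_merge_counts(map_partial_count(s) for s in splits)


if __name__ == "__main__":
    print(count_column([["m1", "m2", "m1"], ["m2", "m3", None]]))  # 5
```

Compared with count(distinct), far less data crosses the shuffle: one number per mapper instead of one key per distinct value per mapper.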

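4. Global sort

explain select * from testtab_small order by mobilename;
  1. There is no aggregation operator.
  2. Sorting already appears in the Reduce Output Operator under the Map Operator Tree (sort order: +); the reducer only has to emit the rows in key order.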
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: testtab_small
            Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: mobilename (type: string), testrecordid (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col1 (type: string)
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
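
To get a total order, the map side emits (key = mobilename, value = testrecordid) pairs and, in the MapReduce engine used here, ORDER BY sends them all to a single reducer that receives them sorted by key. The Python sketch below mirrors that flow; it assumes rows are dicts with the two columns of testtab_small, each inner list is one mapper's split, and the function names are hypothetical.

```python
def map_emit(split):
    """Map-side Reduce Output Operator: key = mobilename (_col0),
    value = testrecordid (_col1)."""
    return [(row["mobilename"], row["testrecordid"]) for row in split]


def single_reducer_sort(map_outputs):
    """All pairs go to one reducer, sorted ascending by key (sort order: +).

    Python's sort is stable, so equal keys keep their arrival order here;
    that is a property of this sketch, not a guarantee made by Hive.
    """
    pairs = [pair for output in map_outputs for pair in output]
    return sorted(pairs, key=lambda pair: pair[0])


def reduce_select(sorted_pairs):
    """Reduce-side Select Operator: KEY.reducesinkkey0 -> _col0 (mobilename),
    VALUE._col0 -> _col1 (testrecordid)."""
    return [(key, value) for key, value in sorted_pairs]


def order_by_mobilename(splits):
    return reduce_select(single_reducer_sort(map_emit(s) for s in splits))


if __name__ == "__main__":
    splits = [
        [{"mobilename": "m3", "testrecordid": "r1"}, {"mobilename": "m1", "testrecordid": "r2"}],
        [{"mobilename": "m2", "testrecordid": "r3"}],
    ]
    print(order_by_mobilename(splits))  # [('m1', 'r2'), ('m2', 'r3'), ('m3', 'r1')]
```

The single reducer is what makes the result globally ordered, and also why ORDER BY on a large table can be slow.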

5. Joining a small table: map-side join

hive.auto.convert.join=true (the default)

hive> explain select a.testrecordid,a.mobilename,b.mobilename from testtab_small a left join testtab_small2 b on a.testrecordid=b.testrecordid;	
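  1. The total number of stages grows to 3.
  2. There is an extra local task, Map Reduce Local Work (Stage-4): it reads the small table b locally and builds a hash table from it (HashTable Sink Operator), which is shipped to the mappers through Hadoop's distributed cache.
  3. The join is done by a Map Join Operator in the map phase, so Stage-3 has no Reduce Operator Tree.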
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        $hdt$_1:b
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        $hdt$_1:b
          TableScan
            alias: b
            Statistics: Num rows: 13 Data size: 655 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: mobilename (type: string), testrecordid (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 13 Data size: 655 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col1 (type: string)
                  1 _col1 (type: string)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: a
            Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: mobilename (type: string), testrecordid (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Left Outer Join0 to 1
                keys:
                  0 _col1 (type: string)
                  1 _col1 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 14 Data size: 691 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col1 (type: string), _col0 (type: string), _col2 (type: string)
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 14 Data size: 691 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 14 Data size: 691 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work
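
The two stages of this plan can be simulated in a few lines: Stage-4 builds a hash table from the small table b once, and every Stage-3 mapper probes that same table for its rows of a, with no reduce phase. This is a minimal Python sketch, not Hive's implementation; it assumes rows are dicts with testrecordid and mobilename, models NULL as None, and uses hypothetical function names.

```python
def build_hash_table(small_rows):
    """Stage-4, Map Reduce Local Work: scan table b locally and build a hash
    table on the join key testrecordid (HashTable Sink Operator).
    NULL keys (None) never match in an equi-join, so they are skipped."""
    table = {}
    for row in small_rows:
        key = row["testrecordid"]
        if key is not None:
            table.setdefault(key, []).append(row["mobilename"])
    return table


def map_join_left_outer(big_split, hash_table):
    """Stage-3 mapper, Map Join Operator (Left Outer Join0 to 1).

    Every row of a probes the hash table; rows without a match are kept
    with None (NULL) for b.mobilename.
    """
    out = []
    for row in big_split:
        matches = hash_table.get(row["testrecordid"], [None])
        for b_mobilename in matches:
            # Final Select Operator order: a.testrecordid, a.mobilename, b.mobilename
            out.append((row["testrecordid"], row["mobilename"], b_mobilename))
    return out


if __name__ == "__main__":
    b = [{"testrecordid": "r1", "mobilename": "x"}, {"testrecordid": "r2", "mobilename": "y"}]
    a_splits = [[{"testrecordid": "r1", "mobilename": "m1"}],
                [{"testrecordid": "r9", "mobilename": "m2"}]]
    ht = build_hash_table(b)  # built once, then reused by every mapper
    for split in a_splits:
        print(map_join_left_outer(split, ht))
```

The design only pays off when b is small enough for its hash table to fit in each mapper's memory, which is why Hive applies the conversion to small tables.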

6. Reduce-side join

hive.auto.convert.join=false
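Same statement as in section 5; the reduce-side join execution plan:

  1. The Map Operator Tree contains 2 sibling TableScan operators, one for each of the mapper's data sources (a and b).
  2. Both inputs emit Reduce Output Operators keyed and partitioned on the join key _col1 (testrecordid), so matching rows reach the same reducer.
  3. The Join Operator under the Reduce Operator Tree performs the join on the reduce side.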
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: a
            Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: mobilename (type: string), testrecordid (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col1 (type: string)
                sort order: +
                Map-reduce partition columns: _col1 (type: string)
                Statistics: Num rows: 13 Data size: 629 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: string)
          TableScan
            alias: b
            Statistics: Num rows: 13 Data size: 655 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: mobilename (type: string), testrecordid (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 13 Data size: 655 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col1 (type: string)
                sort order: +
                Map-reduce partition columns: _col1 (type: string)
                Statistics: Num rows: 13 Data size: 655 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
          keys:
            0 _col1 (type: string)
            1 _col1 (type: string)
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 14 Data size: 691 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col1 (type: string), _col0 (type: string), _col2 (type: string)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 14 Data size: 691 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 14 Data size: 691 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
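
The reduce-side join can be sketched the same way: both tables are read in the map phase, each row is tagged with its input position (0 for a, 1 for b, matching "Join0 to 1"), rows are routed to reducers by the join key, and each reducer pairs up the rows that share a key. This is a minimal Python simulation, not Hive's implementation; it assumes rows are dicts with testrecordid and mobilename, models NULL as None, and uses hypothetical function names. Hive sorts by key before the reducer, while this sketch groups with a dict instead.

```python
from collections import defaultdict


def tag_rows(rows, tag):
    """Map-side Reduce Output Operator for one input: key and partition
    column = testrecordid (_col1), value = mobilename (_col0). The tag
    (0 for a, 1 for b) records which side of the join the row came from."""
    return [(row["testrecordid"], tag, row["mobilename"]) for row in rows]


def partition(records, num_reducers):
    """Send each record to a reducer chosen by hashing its join key,
    so rows with equal keys from a and b meet in the same reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for record in records:
        buckets[hash(record[0]) % num_reducers].append(record)
    return buckets


def reduce_left_outer_join(bucket):
    """Reduce-side Join Operator (Left Outer Join0 to 1)."""
    groups = defaultdict(lambda: ([], []))  # key -> (values from a, values from b)
    for key, tag, value in bucket:
        groups[key][tag].append(value)
    out = []
    for key, (left, right) in groups.items():
        # NULL keys never match; unmatched a rows get None for b.mobilename.
        matches = right if key is not None and right else [None]
        for a_mobilename in left:
            for b_mobilename in matches:
                out.append((key, a_mobilename, b_mobilename))
    return out


def reduce_side_join(a_rows, b_rows, num_reducers=2):
    records = tag_rows(a_rows, 0) + tag_rows(b_rows, 1)
    return [row for bucket in partition(records, num_reducers)
            for row in reduce_left_outer_join(bucket)]
```

Compared with the map-side join in section 5, both tables now go through the shuffle, which costs more I/O but places no memory limit on the size of b.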
