Joins on Hive Tables

Common Joins on Hive Tables

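Compared with hand-writing joins directly in MapReduce, Hive takes the tedious processing work off your hands, and table joins are a good example. This article covers the four main kinds of join in Hive: inner joins, outer joins, semi joins, and map joins.

The examples below use the data in the sales and things tables to show what each kind of join does.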

                             

(Figure 1: the sales table)    (Figure 2: the things table)

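Inner join

The inner join is the simplest kind of join: it returns only the rows that have a match in both tables. Tables are joined with the JOIN keyword, and the join predicate is given in the ON clause. Equality conditions go in the ON clause, and several conditions can be combined with AND. Whether OR and non-equality conditions are accepted in the ON clause depends on the Hive version.

For example: select sales.*,things.* from sales join things on (sales.id=things.id);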

(Figure 3: query result)

Adding an AND condition: select sales.*,things.* from sales join things on (sales.id=things.id and sales.id>2);

(Figure 4: query result)
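To make the matching rule concrete, here is a minimal Python sketch of what the two inner-join queries above compute. It only models the semantics and is not how Hive executes the query. The rows are reconstructed from the query outputs shown later in this article, so their order is an assumption, and `inner_join` and its `extra` parameter are hypothetical names used only for illustration.

```python
# Rows reconstructed from the query outputs in this article; row order is assumed.
sales = [("joe", "2"), ("hank", "3"), ("wangwu", "4"), ("lisi", "0"), ("daic", "2")]
things = [("shuit", "2"), ("milk", "3"), ("water", "4"), ("air", "1")]

def inner_join(left, right, extra=lambda l, r: True):
    """Pair rows whose id (index 1) is equal and that pass the extra ON condition."""
    return [l + r for l in left for r in right if l[1] == r[1] and extra(l, r)]

# Plain equality join: only ids present in both tables survive.
all_matches = inner_join(sales, things)

# Counterpart of adding "and sales.id>2" to the ON clause (ids compared as numbers).
filtered = inner_join(sales, things, extra=lambda l, r: int(l[1]) > 2)

for row in all_matches:
    print("\t".join(row))
```

The row for lisi (id 0) and the row for air (id 1) have no partner, so neither appears in the result.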

A single join normally runs as one MapReduce job. You can use EXPLAIN to see how many jobs a query runs.

For example: explain extended select sales.*,things.* from sales join things on (sales.id=things.id);
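Why one join fits in one MapReduce job can be seen from a rough model of a reduce-side join: mappers tag each row with its table and emit it under the join key, the shuffle groups rows by key, and reducers pair up the rows of each group. The Python sketch below imitates those three steps in a single process. It is a simplified illustration under that assumption, not Hive's actual operator code; the function names are hypothetical, and the rows are reconstructed from this article's outputs.

```python
from collections import defaultdict

# Rows reconstructed from the query outputs in this article; row order is assumed.
sales = [("joe", "2"), ("hank", "3"), ("wangwu", "4"), ("lisi", "0"), ("daic", "2")]
things = [("shuit", "2"), ("milk", "3"), ("water", "4"), ("air", "1")]

def map_phase(table_tag, rows):
    """Emit (join key, (table tag, row)) pairs, as a mapper would."""
    for row in rows:
        yield row[1], (table_tag, row)

def shuffle(pairs):
    """Group the emitted values by join key, as the shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """For each key, pair every sales row with every things row."""
    out = []
    for key in sorted(groups):
        left = [row for tag, row in groups[key] if tag == "sales"]
        right = [row for tag, row in groups[key] if tag == "things"]
        out.extend(l + r for l in left for r in right)
    return out

pairs = list(map_phase("sales", sales)) + list(map_phase("things", things))
joined = reduce_phase(shuffle(pairs))
for row in joined:
    print("\t".join(row))
```

Keys that receive rows from only one table (0 and 1 here) produce nothing in the reducer, which is exactly the inner-join behaviour.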

Outer joins

An outer join also returns the rows that have no match in the other table. There are three kinds: left outer join, right outer join, and full outer join.

left outer join

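A left outer join returns every row of the left table. Where the joined (right) table has no matching row, its columns are shown as NULL.

For example: select sales.*,things.* from sales left outer join things on (sales.id=things.id);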

joe	2	shuit	2
hank	3	milk	3
wangwu	4	water	4
lisi	0	NULL	NULL
daic	2	shuit	2

right outer join

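A right outer join is the mirror image of a left outer join: the roles of the two tables are swapped, so every row of the right table is returned and the left table's columns are NULL where there is no match.

For example: select sales.*,things.* from sales right outer join things on (sales.id=things.id);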

joe	2	shuit	2
daic	2	shuit	2
wangwu	4	water	4
NULL	NULL	air	1
hank	3	milk	3

full outer join

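As the name suggests, a full outer join outputs every row of both tables, with NULL filling the columns of whichever side has no match.

For example: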

select sales.*,things.* from sales full outer join things on (sales.id=things.id);

lisi	0	NULL	NULL
wangwu	4	water	4
NULL	NULL	air	1
joe	2	shuit	2
daic	2	shuit	2
hank	3	milk	3
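The three outer joins differ only in which side's unmatched rows are kept. The Python sketch below expresses that with two flags. It models the semantics only, `outer_join`, `keep_left`, and `keep_right` are hypothetical names, and the rows are reconstructed from the outputs above (Python's None stands in for NULL).

```python
# Rows reconstructed from the query outputs in this article; row order is assumed.
sales = [("joe", "2"), ("hank", "3"), ("wangwu", "4"), ("lisi", "0"), ("daic", "2")]
things = [("shuit", "2"), ("milk", "3"), ("water", "4"), ("air", "1")]

def outer_join(left, right, keep_left=False, keep_right=False):
    """Equality join on id (index 1) that can keep unmatched rows from either side."""
    out = []
    matched_right = set()
    for l in left:
        hit = False
        for i, r in enumerate(right):
            if l[1] == r[1]:
                out.append(l + r)
                matched_right.add(i)
                hit = True
        if not hit and keep_left:
            out.append(l + (None, None))   # pad the right-hand columns with NULL
    if keep_right:
        for i, r in enumerate(right):
            if i not in matched_right:
                out.append((None, None) + r)   # pad the left-hand columns with NULL
    return out

left_rows = outer_join(sales, things, keep_left=True)
right_rows = outer_join(sales, things, keep_right=True)
full_rows = outer_join(sales, things, keep_left=True, keep_right=True)
```

Hive does not guarantee row order without an ORDER BY, so the order of rows in this sketch may differ from the outputs listed above even though the set of rows is the same.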

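Semi join

A left semi join returns the rows of the left table that have a match in the right table, but it does not output any columns from the right table:

For example: select * from sales left semi join things on (sales.id=things.id);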

joe	2
hank	3
wangwu	4
daic	2
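A semi join behaves like an existence test: a left row is kept if at least one right row shares its key, and it is output once regardless of how many matches there are. The Python sketch below illustrates this. `left_semi_join` is a hypothetical helper, the rows are reconstructed from this article's outputs, and the extra `juice` row is invented purely to show the difference from an inner join.

```python
# Rows reconstructed from the query outputs in this article; row order is assumed.
sales = [("joe", "2"), ("hank", "3"), ("wangwu", "4"), ("lisi", "0"), ("daic", "2")]
things = [("shuit", "2"), ("milk", "3"), ("water", "4"), ("air", "1")]

def left_semi_join(left, right):
    """Keep left rows whose id (index 1) occurs in the right table; emit left columns only."""
    right_keys = {r[1] for r in right}
    return [l for l in left if l[1] in right_keys]

semi_rows = left_semi_join(sales, things)

# Hypothetical extra row giving id 2 a second match on the right-hand side.
things_dup = things + [("juice", "2")]
semi_rows_dup = left_semi_join(sales, things_dup)

for row in semi_rows:
    print("\t".join(row))
```

With the duplicated key an inner join would return joe and daic twice each, while the semi join still returns each of them once.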

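Map join

When one table is small enough to fit in memory, such as the sales table here, it can be loaded into memory and the join performed on the map side. To request a map join explicitly, use a hint written in comment syntax.

For example: select /*+ mapjoin(sales) */ sales.*,things.* from sales join things on (sales.id=things.id);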

joe	2	shuit	2
hank	3	milk	3
wangwu	4	water	4
daic	2	shuit	2
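The idea behind a map join is an in-memory hash join: build a hash table from the small table, then stream the other table past it and probe by key, so no shuffle or reduce step is needed. The Python sketch below shows that build-and-probe pattern. It is a simplified model rather than Hive's implementation, the function names are hypothetical, and the rows are reconstructed from this article's outputs.

```python
from collections import defaultdict

# Rows reconstructed from the query outputs in this article; row order is assumed.
sales = [("joe", "2"), ("hank", "3"), ("wangwu", "4"), ("lisi", "0"), ("daic", "2")]
things = [("shuit", "2"), ("milk", "3"), ("water", "4"), ("air", "1")]

def build_hash_table(small_rows):
    """Build phase: index the small table by join key (index 1)."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[1]].append(row)
    return table

def map_join(small_rows, big_rows):
    """Probe phase: stream the big table and look up each key in the hash table."""
    table = build_hash_table(small_rows)
    out = []
    for big in big_rows:
        for small in table.get(big[1], []):
            out.append(small + big)   # small-table columns first, as in sales.*, things.*
    return out

joined = map_join(sales, things)
for row in joined:
    print("\t".join(row))
```

The result contains the same rows as the ordinary inner join. Only the execution strategy changes, which is why the output order can differ.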

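Finally, look at the execution plan. In the output below, which comes from Hive running on the Spark engine, Stage-2 builds the in-memory hash table (Spark HashTable Sink Operator) and Stage-1 performs the join with a Map Join Operator, with no reduce phase. Note that the hash table is built from things rather than sales, so the hint does not appear to have decided which table is held in memory.

For example: explain select /*+ mapjoin(sales) */ sales.*,things.* from sales join things on (sales.id=things.id);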

STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Spark
      DagName: hadoop_20190126120909_7f4e37ab-c15f-465e-89d7-14f2b8283d6a:32
      Vertices:
        Map 2 
            Map Operator Tree:
                TableScan
                  alias: things
                  Statistics: Num rows: 1 Data size: 29 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: id is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 29 Basic stats: COMPLETE Column stats: NONE
                    Spark HashTable Sink Operator
                      keys:
                        0 id (type: string)
                        1 id (type: string)
            Local Work:
              Map Reduce Local Work

  Stage: Stage-1
    Spark
      DagName: hadoop_20190126120909_7f4e37ab-c15f-465e-89d7-14f2b8283d6a:31
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: sales
                  Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: id is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 id (type: string)
                        1 id (type: string)
                      outputColumnNames: _col0, _col1, _col5, _col6
                      input vertices:
                        1 Map 2
                      Statistics: Num rows: 1 Data size: 39 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string), _col6 (type: string)
                        outputColumnNames: _col0, _col1, _col2, _col3
                        Statistics: Num rows: 1 Data size: 39 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1 Data size: 39 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Local Work:
              Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

 
