Hive0.14在left outer join多級連接中，執行計劃生成BUG記錄

前言：

這幾天遇到一個很詭異的問題，一個三級left outer join的句子，在hive0.9和0.14上的執行結果會不一樣。

而且在0.14上通過轉換右表連接的順序可以達到正確輸出的目的，但是其中是爲什麼卻不得而知，情況非常

詭異，猜測是編譯器編譯執行計劃有問題！（所謂轉換右表連接順序是指把A left outer join B left outer join C

改成A left outer join C left outer join B,出問題的在B子句中有個聚合出的結果在最終結果中表現不對。）

下面詳細介紹下問題：

原語句：

select 
A.state_date,

           A.customer,

           A.channel_2,

           A.id,

           A.pid,

           A.type,

           A.pv,

           A.uv,

           A.visits,

           if(C.stay_visits
is null,0,C.stay_visits)
as stay_visits,

           A.stay_time,

           if(B.bounce
is null,0,B.bounce)
as bounce

 from

     (select
a.state_date,

            a.customer,

            b.url
as channel_2,

            b.id,

            b.pid,

            b.type,

            count(1)
as pv,

            count(distinct
a.gid) uv,

            count(distinct
a.session_id) as visits,

            sum(a.stay_time)
as stay_time

       from      

               (
select state_date,

                           customer,

                           gid,

                           session_id,

                           ep,

                           stay_time

                    from
bdi_fact.mid_pageview_dt0

                    where
l_date ='$v_date'

                  )a

                  join

                  (select
l_date as state_date ,

                          url,

                          id,

                          pid,

                          type,

                          cid

                   from
bdi_fact.frequency_channel

                   where
l_date ='$v_date'

                   and
type ='2'

                   and
dr='0'

                  )b

                   on 
a.customer=b.cid 

                   where
a.ep  rlike b.url

                   group
by a.state_date, a.customer, b.url,b.id,b.pid,b.type

       )A

       

    left
outer join

       (  
select

                   c.state_date
,

                   c.customer
,

                   d.url
as channel_2,

                   d.id,

                   sum(pagedepth)
as bounce

            from

                  (
select

                              t1.state_date
,

                              t1.customer
,

                              t1.session_id,

                              t1.ep,

                              t2.pagedepth

                    from          

                         (
select

                                     state_date
,

                                     customer
,

                                     session_id,

                                     exit_url
as ep

                          from
ods.mid_session_enter_exit_dt0

                          where
l_date ='$v_date'

                          )t1

                         join

                          (
select

                                    state_date
,

                                    customer
,

                                    session_id,

                                    pagedepth

                            from
ods.mid_session_action_dt0

                            where
l_date ='$v_date'

                            and 
pagedepth='1'

                          )t2

                         on
t1.customer=t2.customer

                         and
t1.session_id=t2.session_id

                   )c

                   join

                   (select
*

                   from
bdi_fact.frequency_channel

                   where
l_date ='$v_date'

                   and
type ='2'

                   and
dr='0'

                   )d

                   on
c.customer=d.cid

                   where
c.ep  rlike d.url

                   group
by  c.state_date,c.customer,d.url,d.id

             )B

             on

         A.customer=B.customer

             and
A.channel_2=B.channel_2

             and
A.id=B.id

      left
outer join

     (

             select
e.state_date,

            e.customer,

            f.url
as channel_2,

            f.id,

            f.pid,

            f.type,

            count(distinct
e.session_id) as stay_visits

       from      

               (
select state_date,

                           customer,

                           gid,

                           session_id,

                           ep,

                           stay_time

                    from
bdi_fact.mid_pageview_dt0

                    where
l_date ='$v_date'

                  )e

                  join

                  (select
l_date as state_date,

                          url,

                          id,

                          pid,

                          type,

                          cid

                   from
bdi_fact.frequency_channel

                   where
l_date ='$v_date'

                   and
type ='2'

                   and
dr='0'

                  )f

                   on 
e.customer=f.cid 

                   where
e.ep  rlike f.url

                   and
e.stay_time is not null

                   and
e.stay_time <>'0'

                   group
by e.state_date, e.customer, f.url,f.id,f.pid,f.type

           )C

    on

        A.customer=C.customer

        and  
A.channel_2=C.channel_2

        and  
A.id=C.id

        and  
A.pid=C.pid

        and  
A.type=C.type

 where
A.customer='Cdianyingwang' and
A.channel_2='http://www.1905.com/film/filmnews/jk/' and
A.id='127';"