Hive: row fan-out when joining two tables

First, run EXPLAIN on the SQL:

Explain
Plan optimized by CBO

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Map 1 vectorized
      File Output Operator [FS_28]
        Select Operator [SEL_27] (rows=38059719202 width=412)
          Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
          Map Join Operator [MAPJOIN_26] (rows=38059719202 width=412)
            Conds:SEL_25._col1=RS_23._col0(Left Outer),Output:["_col0","_col1","_col2","_col3","_col4","_col6","_col7","_col8","_col9"]
          <-Map 2 [BROADCAST_EDGE] vectorized
            BROADCAST [RS_23]
              PartitionCols:_col0
              Select Operator [SEL_22] (rows=34570155 width=202)
                Output:["_col0","_col1","_col2","_col3","_col4"]
                Filter Operator [FIL_21] (rows=34570155 width=202)
                  predicate:dim_project_sk is not null
                  TableScan [TS_3] (rows=35449323 width=202)
                    bgy_data_platform@dws_f_mk_
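
The plan estimates about 38 billion rows out of the Left Outer map join, while the broadcast side holds only about 35 million rows. The plan alone does not prove the cause, but one common source of this kind of fan-out is a join key that is not unique on the right-hand table. The sketch below reproduces the effect on toy data with Python's built-in sqlite3 module instead of Hive. The table names `fact` and `dim` and all of the data are hypothetical, and treating `dim_project_sk` as the join key is an assumption based on the filter predicate in the plan.

```python
import sqlite3

# Toy reproduction of join fan-out. Table names `fact` / `dim` and the rows
# are hypothetical; `dim_project_sk` is assumed to be the join key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fact (id INTEGER, dim_project_sk INTEGER)")
cur.execute("CREATE TABLE dim (dim_project_sk INTEGER, attr TEXT)")

# Left side: 4 rows, one of them with a NULL key.
cur.executemany(
    "INSERT INTO fact VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, None)],
)
# Right side: key 10 appears three times, key 20 once.
cur.executemany(
    "INSERT INTO dim VALUES (?, ?)",
    [(10, "a"), (10, "b"), (10, "c"), (20, "d")],
)

left_rows = cur.execute("SELECT COUNT(*) FROM fact").fetchone()[0]

# Same join shape as the plan: LEFT OUTER JOIN on the key column.
joined_rows = cur.execute(
    """
    SELECT COUNT(*)
    FROM fact f
    LEFT OUTER JOIN dim d ON f.dim_project_sk = d.dim_project_sk
    """
).fetchone()[0]

# Diagnostic: right-side keys that occur more than once.
dup_keys = cur.execute(
    """
    SELECT dim_project_sk, COUNT(*) AS cnt
    FROM dim
    WHERE dim_project_sk IS NOT NULL
    GROUP BY dim_project_sk
    HAVING COUNT(*) > 1
    ORDER BY cnt DESC
    """
).fetchall()

print("left rows:", left_rows)
print("joined rows:", joined_rows)
print("duplicate right-side keys:", dup_keys)
```

The `GROUP BY ... HAVING COUNT(*) > 1` query is plain SQL, so the same check should carry over to Hive against the real right-hand table. If it returns rows, deduplicating or pre-aggregating that table on the join key before the join is the usual remedy.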
