Hive version: 2.1.1, Spark version: 1.6.0
Over the past few days I noticed that an insert overwrite ... partition statement was running very slowly. It turned out to be running on the Hive on Spark engine, which is normally much faster than MapReduce, yet this time it felt several times slower than MapReduce and had been running for over an hour without finishing.
I took the SQL and ran it manually with hive -f file.sql, and saw that the Spark stage progress stayed at 0 with almost no change, as shown in List-1.
List-1
[xx@xxxx xx]# hive -f sql.sql
...
Query ID = root_20200807155008_80726145-e8f2-4f4e-8222-94083907a70c
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = d5e51d11-0254-49e3-93c7-f1380a89b3d5
Running with YARN Application = application_1593752968338_0506
Kill Command = /usr/local/hadoop/bin/yarn application -kill application_1593752968338_0506
Query Hive on Spark job[0] stages:
0
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2020-08-07 15:50:47,501 Stage-0_0: 0(+2)/3
2020-08-07 15:50:50,530 Stage-0_0: 0(+2)/3
2020-08-07 15:50:53,555 Stage-0_0: 0(+2)/3
2020-08-07 15:50:56,582 Stage-0_0: 0(+2)/3
2020-08-07 15:50:57,590 Stage-0_0: 0(+3)/3
2020-08-07 15:51:00,620 Stage-0_0: 0(+3)/3
2020-08-07 15:51:03,641 Stage-0_0: 0(+3)/3
2020-08-07 15:51:06,662 Stage-0_0: 0(+3)/3
2020-08-07 15:51:09,680 Stage-0_0: 0(+3)/3
2020-08-07 15:51:12,700 Stage-0_0: 0(+3)/3
...
It had been running for over an hour and was still stuck in that state. Sensing something was wrong, I searched for the issue right away; others had hit the same problem, but no good solution was posted.
As a temporary workaround I switched this job to the MapReduce execution engine with set hive.execution.engine=mr, instead of Spark, which resolved the hang.
After that, Hive failed with a new error: the maximum number of dynamic partitions per node was exceeded, as shown in List-2.
List-2
...
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:499)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to 100 partitions per node, number of dynamic partitions on this node: 101
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:933)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:704)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:149)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
... 9 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
...
Then I raised hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode, as shown in List-3:
List-3
set hive.execution.engine=mr;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=100000;
set hive.exec.max.dynamic.partitions=100000;
...
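For context, the failing statement followed the usual dynamic-partition insert pattern, where Hive derives the partition values from the trailing column(s) of the SELECT, so one job can easily create more than the default 100 partitions per node. A minimal sketch under the settings in List-3 (target_table, source_table, and the column names are hypothetical, not from the original job):

```sql
-- Hypothetical dynamic-partition insert; dt is the partition column.
-- Its values come from the last SELECT column, so the number of
-- partitions created per node is data-driven and can exceed the
-- default limit of 100 unless the List-3 settings are applied first.
INSERT OVERWRITE TABLE target_table PARTITION (dt)
SELECT col1, col2, dt
FROM source_table;
```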
I googled this hang and found a matching issue in Spark's JIRA; it was reported as a bug and fixed in a later release.
That resolved things for now, but MapReduce is still slow. The longer-term options are to upgrade the Hive/Spark versions or to patch the Spark source ourselves; for the moment, falling back to MapReduce is the temporary fix.