Problem description
When a Hive SQL query containing a join is executed, it fails with: ERROR | main | Hive Runtime Error: Map local work exhausted memory
Analysis
1. The exception log is as follows:
2019-06-24 13:39:41,706 | ERROR | main | Hive Runtime Error: Map local work exhausted memory | org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:400)
org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2019-06-24 13:39:41 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 926540440 percentage: 0.914
at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:99)
at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.process(HashTableSinkOperator.java:253)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:122)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:132)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:455)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:426)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:392)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:830)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:225)
at org.apache.hadoop.util.RunJar.main(RunJar.java:140)
The log shows that the local task exhausted its memory. The MapJoinMemoryExhaustionHandler aborts the local task once hash-table memory usage crosses a configured fraction of the heap (hive.mapjoin.localtask.max.memory.usage, 0.90 by default); here usage had reached 0.914.
2. hive.auto.convert.join is enabled, and the small table is below hive.mapjoin.smalltable.filesize (default 25 MB), so Hive converted the join to a map join. However, the small table is stored as compressed ORC: the 25 MB threshold is checked against the on-disk file size, but after decompression the data may grow to around 250 MB, and once materialized as an in-memory hash table it can exceed 1 GB.
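The settings involved in this conversion are plain session properties. As a reference, these are the stock Hive defaults relevant to the failure (values shown are the upstream defaults, not tuned values):

```sql
-- Stock defaults relevant to this failure:
SET hive.auto.convert.join=true;                   -- auto-convert qualifying joins to map joins
SET hive.mapjoin.smalltable.filesize=25000000;     -- threshold checked against the ON-DISK size (~25 MB),
                                                   -- not the decompressed or in-memory size
SET hive.mapjoin.localtask.max.memory.usage=0.90;  -- local task aborts above 90% heap usage
```

Because the threshold is compared with the compressed file size, a heavily compressed ORC table can pass the check yet still blow past the local task's heap when loaded.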
The JVM max heap size was 1013645312 bytes (about 1 GB):
2019-06-24 13:39:35,741 | INFO | main | JVM Max Heap Size: 1013645312 | org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.<init>(MapJoinMemoryExhaustionHandler.java:61)
2019-06-24 13:39:35,775 | INFO | main | Key count from statistics is -1; setting map size to 100000 | org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.calculateTableSize(HashMapWrapper.java:95)
2019-06-24 13:39:35,776 | INFO | main | Initialization Done 2 HASHTABLESINK done is reset. | org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:373)
Cause
With hive.auto.convert.join enabled, Hive broadcasts the small table as an in-memory hash table. Because the original table is compressed ORC, its in-memory size can balloon past the local task's heap size, causing the SQL to fail.
Solutions
Option 1
Increase the local task's memory: set hive.mapred.local.mem=XX (1 GB by default in this environment); raise it to 4 GB, for example.
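A minimal sketch of option 1 as a session setting. The value 4096 assumes the parameter takes megabytes; verify the unit and default against your Hive version's documentation:

```sql
-- Option 1: give the map-join local task a larger heap.
-- The value is assumed to be in MB (4096 -> ~4 GB).
SET hive.mapred.local.mem=4096;
```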
Option 2
Disable auto map join conversion entirely by setting hive.auto.convert.join to false.
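Option 2 can be applied per session before running the failing query; the join then falls back to a regular shuffle (common) join, at the cost of an extra shuffle stage:

```sql
-- Option 2: disable auto conversion; joins run as shuffle joins.
SET hive.auto.convert.join=false;

-- hypothetical query shape from the problem description:
-- SELECT ... FROM big_table b JOIN small_orc_table s ON b.k = s.k;
```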