Hive MapJoin Optimization

Original

2018-08-29 05:52

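1. Hive local MR

When a SQL statement run in Hive touches only a small amount of data, running it as a local MR job is much faster than running it as a distributed job. Local MR is sensitive to memory use, however: the queried data must not be too large, or the local machine's memory will not be able to hold it.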

The query processor launches this task in a child JVM, which has the same heap size as the Mapper's. Since the Local Task may run out of memory, the query processor measures the memory usage of the local task very carefully. Once the memory usage of the Local Task exceeds a threshold, the Local Task aborts itself and tells the user that the table is too large to hold in memory. The user can change this threshold with set hive.mapjoin.localtask.max.memory.usage = 0.999

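In other words, the query processor runs this task in a child JVM whose heap size matches the Mapper's. Because local MR can exhaust memory, the query processor tracks the local task's memory usage closely, and once usage exceeds the configured threshold the task kills itself. The threshold is set through hive.mapjoin.localtask.max.memory.usage; a value of 0.9 is rather conservative.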
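As a minimal sketch of the threshold setting described above, the fragment below adjusts the limit for the current session. The value 0.95 is only an illustrative assumption, not a recommendation from the source; the right value depends on the heap size of the child JVM and the size of the small table.

```sql
-- Fraction of the child JVM heap the local task may use before it aborts.
-- 0.95 is an assumed example value; tune it for your own cluster.
set hive.mapjoin.localtask.max.memory.usage=0.95;
```

Raising the threshold only delays the abort; if the small table really does not fit in the heap, the local task will still fail.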

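-- Enable local MR
set hive.exec.mode.local.auto=true;

-- Maximum input size, in bytes, for local MR; local MR is used only when the input is smaller than this value
set hive.exec.mode.local.auto.inputbytes.max=50000000;

-- Maximum number of input tasks for local MR; local MR is used only when the count is below this value
-- (newer Hive releases appear to use hive.exec.mode.local.auto.input.files.max for this limit instead)
set hive.exec.mode.local.auto.tasks.max=10;

Local MR is used only when all three of these conditions hold at the same time.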
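The conditions above can be sketched as a short session. The table dim_city and its columns are hypothetical names introduced for illustration; the table is assumed to be small enough (a few megabytes in a handful of files) to stay under both limits.

```sql
-- Hypothetical small table: dim_city(city_id, city_name, province)
set hive.exec.mode.local.auto=true;
set hive.exec.mode.local.auto.inputbytes.max=50000000;
set hive.exec.mode.local.auto.tasks.max=10;

-- With the settings above, Hive may run this aggregation locally
-- instead of submitting it to the cluster.
SELECT province, count(*) AS city_cnt
FROM dim_city
GROUP BY province;
```

If the same query is run against a table larger than the byte limit, Hive should fall back to a normal distributed job without any change to the SQL.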

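2. Using MapJoin

A map join loads the small table into memory. With the following parameter set, Hive looks at the SQL and chooses automatically between a common join and a map join: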

set hive.auto.convert.join = true;

hive.mapjoin.smalltable.filesize is the size threshold for the small table; its default is 25 MB (25000000 bytes).

Reference: https://cwiki.apache.org/confluence/display/Hive/MapJoinOptimization
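To see which join Hive picks, one option is to inspect the query plan. The sketch below assumes two hypothetical tables, fact_orders (large) and dim_city (small, below the size threshold); neither name comes from the original article.

```sql
-- Hypothetical tables: fact_orders(order_id, city_id), dim_city(city_id, city_name)
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=25000000;

-- If the join is converted, the plan should show a Map Join Operator
-- rather than a reduce-side Join Operator.
EXPLAIN
SELECT o.order_id, c.city_name
FROM fact_orders o
JOIN dim_city c ON o.city_id = c.city_id;

-- Older style: request the map join explicitly with a hint.
-- Depending on the Hive version and settings, the hint may be ignored.
SELECT /*+ MAPJOIN(c) */ o.order_id, c.city_name
FROM fact_orders o
JOIN dim_city c ON o.city_id = c.city_id;
```

Automatic conversion is usually the simpler choice, since it follows the table sizes at query time rather than relying on a hint that can go stale as tables grow.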
