An article from the Meituan-Dianping tech blog describes how Hive compiles SQL into MapReduce jobs:
1. Antlr defines the SQL grammar and performs lexical and syntax analysis, converting the SQL statement into an abstract syntax tree (AST Tree).
2. Traverse the AST Tree and abstract out QueryBlocks, the basic units of a query.
3. Traverse the QueryBlocks and translate them into an operator tree (OperatorTree).
4. The logical optimizer transforms the OperatorTree, merging unnecessary ReduceSinkOperators to reduce the amount of shuffled data.
5. Traverse the OperatorTree and translate it into MapReduce tasks.
6. The physical optimizer transforms the MapReduce tasks and generates the final execution plan.
Reference:
https://tech.meituan.com/2014/02/12/hive-sql-to-mapreduce.html
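The staged pipeline above can be illustrated with a toy sketch. This is not Hive's actual code: the data structures and function names below are simplified stand-ins covering stages 1–3 (parse, QueryBlock extraction, operator-tree translation) for a trivial query.

```python
# Toy sketch of Hive's compile pipeline: SQL -> "AST" -> QueryBlock -> OperatorTree.
# Illustrative stand-in only; Hive's real compiler uses Antlr and far richer structures.

def parse(sql):
    # Stage 1: lexical/syntax analysis (Antlr in real Hive).
    # A flat token list stands in for the AST here.
    return sql.replace(",", " ").split()

def to_query_block(tokens):
    # Stage 2: walk the "AST" and collect the basic query unit (QueryBlock).
    qb = {"select": [], "from": None, "limit": None}
    i = 0
    while i < len(tokens):
        t = tokens[i].upper()
        if t == "SELECT":
            i += 1
            while tokens[i].upper() != "FROM":
                qb["select"].append(tokens[i])
                i += 1
        elif t == "FROM":
            qb["from"] = tokens[i + 1]
            i += 2
        elif t == "LIMIT":
            qb["limit"] = int(tokens[i + 1])
            i += 2
        else:
            i += 1
    return qb

def to_operator_tree(qb):
    # Stage 3: translate the QueryBlock into a chain of operators.
    ops = ["TableScanOperator", "SelectOperator"]
    if qb["limit"] is not None:
        ops.append("LimitOperator")
    ops.append("FileSinkOperator")
    return ops

qb = to_query_block(parse("select * from xx.xx limit 1"))
print(qb["from"], to_operator_tree(qb))
```

Note that the operator chain for this query contains no ReduceSinkOperator at all, which is exactly why such a query does not need a shuffle, and (as discussed next) need not become an MR job in the first place.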
However, not every SQL statement needs to be converted into an MR job. For example:
```sql
select * from xx.xx limit 1
```
For this query, Hive can simply read the files directly and stream the rows to the console.
The hive-default.xml configuration file has two relevant parameters: hive.fetch.task.conversion and hive.fetch.task.conversion.threshold.

When hive.fetch.task.conversion is set to more, simple queries such as full-row selects (select *), single-column selects, and LIMIT queries skip MapReduce entirely.

hive.fetch.task.conversion.threshold sets the maximum input size for which the fetch task applies; the default is 1073741824 bytes = 1 GB.
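Both parameters can also be changed per session with Hive's standard `set` command, instead of editing the configuration file:

```sql
-- Enable fetch-task conversion for simple SELECT/FILTER/LIMIT queries
set hive.fetch.task.conversion=more;
-- Only fetch directly when total input is under the threshold (value in bytes; 1073741824 = 1 GB)
set hive.fetch.task.conversion.threshold=1073741824;

-- This now reads the table files directly instead of launching an MR job
select * from xx.xx limit 1;
```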
```xml
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Expects one of [none, minimal, more].
    Some select queries can be converted to single FETCH task minimizing latency.
    Currently the query should be single sourced not having any subquery and
    should not have any aggregations or distincts (which incurs RS), lateral
    views and joins.
    0. none : disable hive.fetch.task.conversion
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more : SELECT, FILTER, LIMIT only (support TABLESAMPLE and virtual columns)
  </description>
</property>
<property>
  <name>hive.fetch.task.conversion.threshold</name>
  <value>1073741824</value>
  <description>
    Input threshold for applying hive.fetch.task.conversion. If target table is
    native, input length is calculated by summation of file lengths. If it's not
    native, storage handler for the table can optionally implement
    org.apache.hadoop.hive.ql.metadata.InputEstimator interface.
  </description>
</property>
```
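The rules in the description above can be condensed into a toy decision function. This is a sketch of the documented behavior only; `fetch_eligible` and its signature are invented for illustration and do not correspond to Hive's actual optimizer code.

```python
# Toy model of the fetch-task conversion rules from hive-default.xml.
# Illustrative only -- not Hive's real SimpleFetchOptimizer logic.

def fetch_eligible(mode, query, input_bytes, threshold=1073741824):
    # Queries with aggregations, joins, or subqueries always need MR.
    complex_ops = ("group by", "distinct", "join", "lateral view")
    if mode == "none" or any(op in query.lower() for op in complex_ops):
        return False
    # Input larger than the threshold (default 1 GB) also forces MR.
    if input_bytes > threshold:
        return False
    if mode == "minimal":
        # minimal: SELECT STAR, partition-column filters, LIMIT only.
        return query.lower().startswith("select *")
    return mode == "more"  # more: any simple SELECT/FILTER/LIMIT query
```

For instance, under `more` a plain `select col from t limit 10` over a small table is fetched directly, while the same query over inputs above the threshold, or any aggregating query, still launches MapReduce.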
Reference:
Hive快速入門系列(14) | Hive性能調優 [一]Fetch抓取與本地模式