Hive Learning Notes 2: Hive on Tez

Environment

Component  Version
hadoop     2.6.5
hive       2.3.6
tez        0.8.5

Tez has Hadoop version requirements: Tez 0.8 and later requires Hadoop 2.6 or later, and Tez 0.9 and later requires Hadoop 2.7 or later.
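A minimal bash sketch of that version rule, useful for checking a planned combination before installing anything. It assumes GNU coreutils (`sort -V`) and only encodes the two ranges stated above; the function names are hypothetical helpers, not part of Tez or Hadoop.

```shell
# Sketch: check a Tez/Hadoop pair against the rule above
# (Tez 0.8+ needs Hadoop 2.6+, Tez 0.9+ needs Hadoop 2.7+).
# Hypothetical helper names; relies on GNU `sort -V` for version ordering.

# Succeeds if version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Prints the minimum Hadoop version for Tez version $1.
min_hadoop_for_tez() {
  if version_ge "$1" "0.9"; then
    echo "2.7"
  elif version_ge "$1" "0.8"; then
    echo "2.6"
  else
    echo "unknown"   # older Tez releases are outside the rule above
  fi
}

# Succeeds if Hadoop version $2 satisfies the requirement for Tez version $1.
tez_compatible() {
  local need
  need=$(min_hadoop_for_tez "$1")
  [ "$need" != "unknown" ] && version_ge "$2" "$need"
}
```

For the versions in this post, `tez_compatible 0.8.5 2.6.5` succeeds, while `tez_compatible 0.9.2 2.6.5` fails because Tez 0.9 needs Hadoop 2.7.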

Download, install, and configure Tez

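  1. Download the Tez release matching your Hadoop version, e.g. apache-tez-0.8.5-bin.tar.gz, from the Tsinghua University mirror, extract it into /usr/local/src, and create a symlink to it (here /usr/local/src/tez). The Tez website describes obtaining Tez by building it from source, but compiling is slow, so this post uses the prebuilt binary package apache-tez-0.8.5-bin.tar.gz instead.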
  2. Create a directory on HDFS and upload the Tez tarball to it. The tez.tar.gz file is in the ${TEZ_HOME}/share directory.
hdfs dfs -mkdir -p /apps/tez
hdfs dfs -put /usr/local/src/tez/share/tez.tar.gz /apps/tez
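The upload step above can be wrapped so that it fails early when the local tarball is missing and can be re-run safely. This is a sketch, assuming Tez lives in /usr/local/src/tez (as in this post) unless a path or TEZ_HOME is given; `upload_tez_tarball` is a hypothetical name.

```shell
# Sketch: upload ${TEZ_HOME}/share/tez.tar.gz to /apps/tez on HDFS.
# Hypothetical helper; defaults to /usr/local/src/tez as used in this post.

upload_tez_tarball() {
  local tez_home="${1:-${TEZ_HOME:-/usr/local/src/tez}}"
  local tarball="${tez_home}/share/tez.tar.gz"
  local hdfs_dir="/apps/tez"

  # Fail early instead of letting `hdfs dfs -put` report a confusing error
  if [ ! -f "$tarball" ]; then
    echo "tarball not found: $tarball" >&2
    return 1
  fi

  hdfs dfs -mkdir -p "$hdfs_dir" || return 1
  # -f overwrites an existing copy, so the step can be repeated
  hdfs dfs -put -f "$tarball" "$hdfs_dir/" || return 1
  # Show the result so the path can be compared with tez.lib.uris
  hdfs dfs -ls "$hdfs_dir"
}
```

Usage: `upload_tez_tarball /usr/local/src/tez`.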


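  3. Write a tez-site.xml file in the ${HADOOP_HOME}/etc/hadoop directory with the content below.
    Set the tez.lib.uris property to the HDFS path of the tez.tar.gz you just uploaded. When you are done, copy tez-site.xml to the ${HADOOP_HOME}/etc/hadoop directory on every node.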
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/apps/tez/tez.tar.gz</value>
    </property>
</configuration>
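Writing and distributing the file above can be scripted as below. This is a minimal sketch, assuming passwordless SSH between nodes and the same ${HADOOP_HOME} path on every node; the node names (e.g. slave1, slave2) and function names are hypothetical.

```shell
# Sketch: write tez-site.xml and copy it to other nodes.
# Hypothetical helpers; assumes passwordless SSH and identical paths on all nodes.

write_tez_site() {
  local conf_dir="${1:-${HADOOP_HOME}/etc/hadoop}"
  # Quoted 'EOF' keeps ${fs.defaultFS} literal: Hadoop expands it, not the shell
  cat > "${conf_dir}/tez-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/apps/tez/tez.tar.gz</value>
    </property>
</configuration>
EOF
}

# Copies tez-site.xml from conf_dir ($1) to the same directory on each node ($2...).
distribute_tez_site() {
  local conf_dir="$1"; shift
  local node
  for node in "$@"; do
    scp "${conf_dir}/tez-site.xml" "${node}:${conf_dir}/" || return 1
  done
}
```

Usage: `write_tez_site "$HADOOP_HOME/etc/hadoop" && distribute_tez_site "$HADOOP_HOME/etc/hadoop" slave1 slave2`. Quoting the heredoc delimiter matters here: unquoted, bash would try to expand `${fs.defaultFS}` and abort with a "bad substitution" error.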

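  4. On every node, set the Hadoop classpath environment variables so that they include the Tez libraries (for example in /etc/profile, which this post sources again later):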
export TEZ_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export TEZ_HOME=/usr/local/src/tez
export TEZ_JARS=${TEZ_HOME}/*:${TEZ_HOME}/lib/*
export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
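To make these variables survive new logins on every node, they can be appended to a profile file once. A sketch, assuming /etc/profile is the target (the later `source /etc/profile` step relies on that); the marker comment and function name are hypothetical. Unlike the exports above, the values are quoted so the shell never globs the `*` (the JVM expands classpath wildcards itself), and no trailing `:` is added when HADOOP_CLASSPATH starts out empty.

```shell
# Sketch: append the Tez environment variables to a profile file exactly once.
# Hypothetical helper and marker; defaults to /etc/profile.

append_tez_env() {
  local profile="${1:-/etc/profile}"
  local marker="# >>> tez env >>>"
  # Skip if a previous run already added the block
  grep -qF "$marker" "$profile" 2>/dev/null && return 0
  # Quoted 'EOF': variables are written literally and expanded when the file is sourced
  cat >> "$profile" <<'EOF'
# >>> tez env >>>
export TEZ_CONF_DIR="${HADOOP_HOME}/etc/hadoop"
export TEZ_HOME=/usr/local/src/tez
export TEZ_JARS="${TEZ_HOME}/*:${TEZ_HOME}/lib/*"
# Append the old HADOOP_CLASSPATH only if it is non-empty (avoids a trailing ':')
export HADOOP_CLASSPATH="${TEZ_CONF_DIR}:${TEZ_JARS}${HADOOP_CLASSPATH:+:${HADOOP_CLASSPATH}}"
EOF
}
```

Run it as root on each node (`append_tez_env /etc/profile`), then `source /etc/profile` in open shells.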
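  5. Configure the Tez UI
    The Tez UI depends on the YARN Timeline Server. Before Hadoop 2.4, the only history service for monitoring jobs was the MapReduce Job History Server, which lets users query information about completed MR jobs. As more computing frameworks such as Spark and Tez came to run on YARN, they also needed job history and monitoring tools, so the Hadoop developers built a more general history service: the YARN Timeline Server.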

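Add the following to yarn-site.xml to configure the YARN Timeline Server. For more detailed configuration options, see TimelineServer in the references.
Pay particular attention to yarn.timeline-service.hostname: it must be set to the host that actually runs the Timeline Server. I start it on master, so the value here is master. The official default is 0.0.0.0, which makes the Tez DAG master fail with an error saying it cannot find the Timeline Server; setting the real hostname fixes this.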

<!--configurations for timelineserver-->
    <property>
        <name>yarn.timeline-service.hostname</name>
        <value>master</value>
    </property>
    <property>
        <description>Address for the Timeline server to start the RPC server.</description>
        <name>yarn.timeline-service.address</name>
        <value>${yarn.timeline-service.hostname}:10200</value>
    </property>
    
    <property>
        <description>The http address of the Timeline service web application.</description>
        <name>yarn.timeline-service.webapp.address</name>
        <value>${yarn.timeline-service.hostname}:8188</value>
    </property>
    
    <property>
        <description>The https address of the Timeline service web application.</description>
        <name>yarn.timeline-service.webapp.https.address</name>
        <value>${yarn.timeline-service.hostname}:8190</value>
    </property>
    
    <property>
        <description>Handler thread count to serve the client RPC requests.</description>
        <name>yarn.timeline-service.handler-thread-count</name>
        <value>10</value>
    </property>
    
    <property>
        <description>The max number of applications could be fetched by using REST API
         or application history protocol and shown in timeline server web ui. Defaults
         to `10000`.</description>
        <name>yarn.timeline-service.generic-application-history.max-applications</name>
        <value>10000</value>
    </property>
    
    <property>
        <description>Enables cross-origin support (CORS) for web services where
        cross-origin web response headers are needed. For example, javascript making
        a web services request to the timeline server.</description>
        <name>yarn.timeline-service.http-cross-origin.enabled</name>
        <value>true</value>
    </property>
    
    <property>
        <description>Comma separated list of origins that are allowed for web
        services needing cross-origin (CORS) support. Wildcards (*) and patterns
        allowed</description>
        <name>yarn.timeline-service.http-cross-origin.allowed-origins</name>
        <value>*</value>
    </property>
    
    <property>
        <description>Comma separated list of methods that are allowed for web
        services needing cross-origin (CORS) support.</description>
        <name>yarn.timeline-service.http-cross-origin.allowed-methods</name>
        <value>GET,POST,HEAD</value>
    </property>
    
    <property>
        <description>Comma separated list of headers that are allowed for web
        services needing cross-origin (CORS) support.</description>
        <name>yarn.timeline-service.http-cross-origin.allowed-headers</name>
        <value>X-Requested-With,Content-Type,Accept,Origin</value>
    </property>
    
    <property>
        <description>The number of seconds a pre-flighted request can be cached
        for web services needing cross-origin (CORS) support.</description>
        <name>yarn.timeline-service.http-cross-origin.max-age</name>
        <value>1800</value>
    </property>
    
    <property>
        <description>Indicate to ResourceManager as well as clients whether
        history-service is enabled or not. If enabled, ResourceManager starts
        recording historical data that Timeline service can consume. Similarly,
        clients can redirect to the history service when applications
        finish if this is enabled.</description>
        <name>yarn.timeline-service.generic-application-history.enabled</name>
        <value>true</value>
    </property>
    
    <property>
        <description>Store class name for history store, defaulting to file system
        store</description>
        <name>yarn.timeline-service.generic-application-history.store-class</name>
        <value>org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore</value>
    </property>

    <property>
        <description>Indicate to clients whether Timeline service is enabled or not.
        If enabled, the TimelineClient library used by end-users will post entities
        and events to the Timeline server.</description>
        <name>yarn.timeline-service.enabled</name>
        <value>true</value>
    </property>
    
    <property>
        <description>Store class name for timeline store.</description>
        <name>yarn.timeline-service.store-class</name>
        <value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
    </property>
    
    <property>
        <description>Enable age off of timeline store data.</description>
        <name>yarn.timeline-service.ttl-enable</name>
        <value>true</value>
    </property>

    <property>
        <description>Publish YARN information to Timeline Server</description>
        <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
        <value>true</value>
    </property>
    
    <property>
        <description>Time to live for timeline store data in milliseconds.</description>
        <name>yarn.timeline-service.ttl-ms</name>
        <value>604800000</value>
    </property>
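Once the Timeline Server is running, a reachability check from another node catches the 0.0.0.0 hostname problem described above. A minimal sketch, assuming curl is installed, the Timeline Server runs on master, and the web port is 8188 from yarn.timeline-service.webapp.address; /ws/v1/timeline/ is the Timeline Server's v1 REST root, and the function names are hypothetical.

```shell
# Sketch: probe the Timeline Server REST endpoint.
# Hypothetical helpers; host/port default to master:8188 as configured above.

timeline_url() {
  local host="${1:-master}" port="${2:-8188}"
  echo "http://${host}:${port}/ws/v1/timeline/"
}

probe_timeline() {
  local url
  url=$(timeline_url "$@")
  # -f: treat HTTP errors as failures; -s: no progress output
  if curl -fs "$url" >/dev/null; then
    echo "timeline server reachable at $url"
  else
    echo "timeline server NOT reachable at $url" >&2
    return 1
  fi
}
```

Usage: `probe_timeline master` from a worker node. If it only succeeds on master itself, the hostname or bind address is probably still wrong.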

Add the following to tez-site.xml to configure the Tez UI:

<property>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
    <name>tez.tez-ui.history-url.base</name>
    <value>http://master:8080/tez-ui/</value>
</property>
  6. Install Tomcat on one node, e.g. master, copy ${TEZ_HOME}/tez-ui-0.8.5.war to ${TOMCAT_HOME}/webapps/, and rename it to tez-ui.war. This is what the tez.tez-ui.history-url.base value in tez-site.xml above points to.
  7. If Tomcat is not installed on the node that runs the YARN Timeline Server, edit tez-ui/scripts/configs.js and set timelineBaseUrl and RMWebUrl to the correct addresses.
  8. Edit hive-site.xml and set the execution engine to tez:
<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
    <description/>
</property>
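The Tomcat deployment step above (copying the war into webapps as tez-ui.war) can be scripted roughly as follows. This is a sketch under stated assumptions: the war matches `tez-ui-*.war` under ${TEZ_HOME} (tez-ui-0.8.5.war for this version), and Tomcat runs with its usual auto-deploy behaviour, which unpacks the war into webapps/tez-ui/ (where scripts/configs.js is edited). The function name is hypothetical.

```shell
# Sketch: deploy the Tez UI war so it is served at http://<host>:8080/tez-ui/.
# Hypothetical helper; arguments default to TEZ_HOME and TOMCAT_HOME.

deploy_tez_ui() {
  local tez_home="${1:-$TEZ_HOME}" tomcat_home="${2:-$TOMCAT_HOME}"
  # Glob for the bundled war; if nothing matches, the pattern stays literal
  local wars=("${tez_home}"/tez-ui-*.war)
  if [ ! -f "${wars[0]}" ]; then
    echo "no tez-ui-*.war under ${tez_home}" >&2
    return 1
  fi
  # The file name without .war becomes the URL context path (/tez-ui)
  cp "${wars[0]}" "${tomcat_home}/webapps/tez-ui.war"
}
```

Usage: `deploy_tez_ui /usr/local/src/tez "$TOMCAT_HOME"`. Naming the file tez-ui.war is what makes the path match tez.tez-ui.history-url.base.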
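  9. After editing the files, start the HDFS and YARN clusters, the Timeline Server, and Tomcat (in Hadoop 2.x, `yarn-daemon.sh start historyserver` is a deprecated alias that starts the same Timeline Server):
start-dfs.sh
start-yarn.sh
yarn-daemon.sh start timelineserver
${TOMCAT_HOME}/bin/startup.sh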
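After startup, `jps` shows which Java daemons are actually running. A small sketch that reports the missing ones, assuming NameNode, ResourceManager, the Timeline Server (listed by jps as ApplicationHistoryServer) and Tomcat (listed as Bootstrap) all run on this node, as on master in this post; adjust the hypothetical EXPECTED_DAEMONS list to your layout.

```shell
# Sketch: report expected daemons that are absent from `jps` output.
# Assumes all four run on this node; EXPECTED_DAEMONS is a hypothetical list.

EXPECTED_DAEMONS="NameNode ResourceManager ApplicationHistoryServer Bootstrap"

# Reads jps output ("<pid> <MainClass>") on stdin, prints missing names one per line.
missing_daemons() {
  local running name
  running=$(awk '{print $2}')
  for name in $EXPECTED_DAEMONS; do
    printf '%s\n' "$running" | grep -qx "$name" || echo "$name"
  done
}
```

Usage: `jps | missing_daemons`; empty output means everything expected is up.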

Testing Hive on Tez

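After running an HQL statement in Hive, the output should show the query running on Tez, and from the YARN web UI you should be able to click through to the Tez UI.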
By default, the applications' history data is stored under yarn.timeline-service.leveldb-timeline-store.path, whose default value is ${hadoop.tmp.dir}/yarn/timeline.

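To go back to Hive on MR, run unset to clear the Tez environment variables and HADOOP_CLASSPATH in the current session, change the execution engine in hive-site.xml back to mr, then restart hiveserver2 and reconnect with beeline.
To use Hive on Tez again, run source /etc/profile to reload the Tez environment variables and HADOOP_CLASSPATH, change the execution engine in hive-site.xml back to tez, then restart hiveserver2 and reconnect with beeline.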

unset HADOOP_CLASSPATH
unset TEZ_CONF_DIR
unset TEZ_HOME
unset TEZ_JARS

beeline -u jdbc:hive2://master:10000 -n root --hiveconf hive.execution.engine=mr
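The switch described above can be wrapped in a few bash helpers. This is a sketch that assumes the file is *sourced* into the current shell (a script run as a child process cannot unset variables in its parent) and that the Tez variables live in /etc/profile; the function names are hypothetical. Editing hive-site.xml and restarting hiveserver2 remain manual steps.

```shell
# Sketch: switch the current shell between Hive on MR and Hive on Tez.
# Source this file; hypothetical helper names.

# Remove the Tez-related variables from this shell session
tez_env_off() {
  unset HADOOP_CLASSPATH TEZ_CONF_DIR TEZ_HOME TEZ_JARS
}

# Reload the Tez variables, assuming they were added to /etc/profile
tez_env_on() {
  . /etc/profile
}

# Connect with the MR engine for this session (host and user as in this post)
beeline_mr() {
  beeline -u jdbc:hive2://master:10000 -n root --hiveconf hive.execution.engine=mr
}
```

Note that `unset HADOOP_CLASSPATH` also drops any non-Tez entries you may have had; re-add them if your setup depends on them.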

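If you switch straight to the mr engine without the steps above, you may get errors such as NoSuchFieldError, a typical sign of incompatible jar versions on the classpath.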

References

TimelineServer
Tez-install
tez-ui
