Environment
Component | Version |
---|---|
hadoop | 2.6.5 |
hive | 2.3.6 |
tez | 0.8.5 |
Tez places requirements on the Hadoop version: Tez 0.8 and later need Hadoop 2.6 or later, and Tez 0.9 and later need Hadoop 2.7 or later.
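The version rule above can be checked mechanically before installing anything. The following is a minimal sketch that encodes only the two thresholds stated in this post; the function names are made up for illustration and the comparison looks at major.minor only.

```shell
# min_hadoop_for_tez TEZ_VERSION: print the minimum Hadoop version for a Tez
# release, based solely on the two thresholds quoted in this post.
min_hadoop_for_tez() {
  tez_minor=${1#*.}            # "0.8.5" -> "8.5"
  tez_minor=${tez_minor%%.*}   # "8.5"   -> "8"
  if [ "${1%%.*}" -ne 0 ]; then echo unknown
  elif [ "$tez_minor" -ge 9 ]; then echo 2.7
  elif [ "$tez_minor" -eq 8 ]; then echo 2.6
  else echo unknown            # releases not covered by this post
  fi
}

# version_ge A B: succeed when A >= B, comparing major and minor numbers only.
version_ge() {
  a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
  b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
  [ "$a_major" -gt "$b_major" ] ||
    { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }
}

# check_tez_hadoop TEZ_VERSION HADOOP_VERSION: succeed when the pair is allowed.
check_tez_hadoop() {
  required=$(min_hadoop_for_tez "$1")
  [ "$required" != unknown ] && version_ge "$2" "$required"
}

check_tez_hadoop 0.8.5 2.6.5 && echo "tez 0.8.5 on hadoop 2.6.5: ok"
check_tez_hadoop 0.9.2 2.6.5 || echo "tez 0.9.2 on hadoop 2.6.5: hadoop too old"
```

The combination used in this post (Tez 0.8.5 on Hadoop 2.6.5) passes; a 0.9.x release on the same cluster would not.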
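Downloading, installing, and configuring Tez
- Download the Tez release that matches your Hadoop version from the Tsinghua mirror, for example apache-tez-0.8.5-bin.tar.gz, extract it under /usr/local/src, and create a symbolic link to it, as shown in the figure below. The Tez website describes building Tez from source; because that build is slow, this guide uses the prebuilt package apache-tez-0.8.5-bin.tar.gz directly.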
- Create a directory on HDFS and copy the Tez tarball into it. The tez.tar.gz file is located in ${TEZ_HOME}/share.
hdfs dfs -mkdir -p /apps/tez
hdfs dfs -put tez/share/tez.tar.gz /apps/tez
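The first two steps can be collected into one script. This is a sketch under assumptions rather than the exact commands used here: it assumes the tarball sits in the current directory and unpacks to a directory named apache-tez-0.8.5-bin, and DRY_RUN and run are helpers invented for this example. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Sketch of the extract, symlink, and upload steps with a dry-run switch.
DRY_RUN=${DRY_RUN:-1}                    # 1 = print commands, 0 = execute them
TEZ_TARBALL=apache-tez-0.8.5-bin.tar.gz  # assumed to be in the current directory
INSTALL_DIR=/usr/local/src

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_tez() {
  # Unpack and point the stable path /usr/local/src/tez at this release.
  run tar -xzf "$TEZ_TARBALL" -C "$INSTALL_DIR"
  run ln -sfn "$INSTALL_DIR/${TEZ_TARBALL%.tar.gz}" "$INSTALL_DIR/tez"
  # Upload the runtime tarball to HDFS and list the directory to confirm.
  run hdfs dfs -mkdir -p /apps/tez
  run hdfs dfs -put "$INSTALL_DIR/tez/share/tez.tar.gz" /apps/tez
  run hdfs dfs -ls /apps/tez
}

install_tez
```

Going through the symlink means a later upgrade only needs the link repointed, while TEZ_HOME can keep the same value.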
- Write a tez-site.xml file with the content below and place it in ${HADOOP_HOME}/etc/hadoop. Set the tez.lib.uris property to the HDFS path of the tez.tar.gz uploaded in the previous step. Then copy tez-site.xml to ${HADOOP_HOME}/etc/hadoop on every node.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/apps/tez/tez.tar.gz</value>
</property>
</configuration>
- On every node, set the Hadoop classpath environment variables so that they include the Tez libraries.
export TEZ_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export TEZ_HOME=/usr/local/src/tez
export TEZ_JARS=${TEZ_HOME}/*:${TEZ_HOME}/lib/*
export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
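If these exports live in a profile that gets sourced more than once, HADOOP_CLASSPATH picks up the Tez entries again each time. The sketch below is one possible idempotent variant of the same exports; prepend_once is a helper made up for this example, and the HADOOP_HOME fallback path is a placeholder used only when the variable is not already set.

```shell
# Idempotent variant of the exports above.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/src/hadoop}   # placeholder fallback
export TEZ_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export TEZ_HOME=/usr/local/src/tez
export TEZ_JARS="${TEZ_HOME}/*:${TEZ_HOME}/lib/*"

# prepend_once ENTRY LIST: print LIST with ENTRY in front, unless LIST
# already contains ENTRY as a colon-delimited run.
prepend_once() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;
    *)        printf '%s\n' "$1${2:+:$2}" ;;
  esac
}

# Same final order as the original export: conf dir, Tez jars, previous value.
HADOOP_CLASSPATH=$(prepend_once "$TEZ_JARS" "${HADOOP_CLASSPATH:-}")
HADOOP_CLASSPATH=$(prepend_once "$TEZ_CONF_DIR" "$HADOOP_CLASSPATH")
export HADOOP_CLASSPATH

echo "$HADOOP_CLASSPATH"
```

On a node with Hadoop installed, running hadoop classpath afterwards should list the Tez directories.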
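- Configure the Tez UI
The Tez UI depends on the YARN Timeline Server. Before Hadoop 2.4, job monitoring existed only for MapReduce, through the Job History Server, which lets users look up information about completed jobs. As more compute frameworks such as Spark and Tez were integrated on top of YARN, they needed job monitoring as well, so the Hadoop developers built a more general-purpose history service: the YARN Timeline Server.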
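Add the following to yarn-site.xml to configure the YARN Timeline Server; see the TimelineServer documentation for more detailed settings. Pay particular attention to yarn.timeline-service.hostname: it must be set to the address of the node that runs the TimelineServer. I start it on the machine named master, so the value here is master. The default given in the official documentation is 0.0.0.0, and with that value the DAG Master reports that it cannot find the TimelineServer. Changing it to the real hostname fixes the problem.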
<!--configurations for timelineserver-->
<property>
<name>yarn.timeline-service.hostname</name>
<value>master</value>
</property>
<property>
<description>Address for the Timeline server to start the RPC server.</description>
<name>yarn.timeline-service.address</name>
<value>${yarn.timeline-service.hostname}:10200</value>
</property>
<property>
<description>The http address of the Timeline service web application.</description>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<description>The https address of the Timeline service web application.</description>
<name>yarn.timeline-service.webapp.https.address</name>
<value>${yarn.timeline-service.hostname}:8190</value>
</property>
<property>
<description>Handler thread count to serve the client RPC requests.</description>
<name>yarn.timeline-service.handler-thread-count</name>
<value>10</value>
</property>
<property>
<description>The max number of applications could be fetched by using REST API
or application history protocol and shown in timeline server web ui. Defaults
to `10000`.</description>
<name>yarn.timeline-service.generic-application-history.max-applications</name>
<value>10000</value>
</property>
<property>
<description>Enables cross-origin support (CORS) for web services where
cross-origin web response headers are needed. For example, javascript making
a web services request to the timeline server.</description>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
<property>
<description>Comma separated list of origins that are allowed for web
services needing cross-origin (CORS) support. Wildcards (*) and patterns
allowed</description>
<name>yarn.timeline-service.http-cross-origin.allowed-origins</name>
<value>*</value>
</property>
<property>
<description>Comma separated list of methods that are allowed for web
services needing cross-origin (CORS) support.</description>
<name>yarn.timeline-service.http-cross-origin.allowed-methods</name>
<value>GET,POST,HEAD</value>
</property>
<property>
<description>Comma separated list of headers that are allowed for web
services needing cross-origin (CORS) support.</description>
<name>yarn.timeline-service.http-cross-origin.allowed-headers</name>
<value>X-Requested-With,Content-Type,Accept,Origin</value>
</property>
<property>
<description>The number of seconds a pre-flighted request can be cached
for web services needing cross-origin (CORS) support.</description>
<name>yarn.timeline-service.http-cross-origin.max-age</name>
<value>1800</value>
</property>
<property>
<description>Indicate to ResourceManager as well as clients whether
history-service is enabled or not. If enabled, ResourceManager starts
recording historical data that Timeline service can consume. Similarly,
clients can redirect to the history service when applications
finish if this is enabled.</description>
<name>yarn.timeline-service.generic-application-history.enabled</name>
<value>true</value>
</property>
<property>
<description>Store class name for history store, defaulting to file system
store</description>
<name>yarn.timeline-service.generic-application-history.store-class</name>
<value>org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore</value>
</property>
<property>
<description>Indicate to clients whether Timeline service is enabled or not.
If enabled, the TimelineClient library used by end-users will post entities
and events to the Timeline server.</description>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<description>Store class name for timeline store.</description>
<name>yarn.timeline-service.store-class</name>
<value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
</property>
<property>
<description>Enable age off of timeline store data.</description>
<name>yarn.timeline-service.ttl-enable</name>
<value>true</value>
</property>
<property>
<description>Publish YARN information to Timeline Server</description>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
<property>
<description>Time to live for timeline store data in milliseconds.</description>
<name>yarn.timeline-service.ttl-ms</name>
<value>604800000</value>
</property>
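The yarn.timeline-service.ttl-ms value above is given in milliseconds, which is hard to read at a glance: 604800000 ms is exactly 7 days. A small helper (the name days_to_ms is made up for this example) shows the arithmetic and makes it easy to work out a different retention period.

```shell
# days_to_ms DAYS: convert a retention period in days to milliseconds.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

days_to_ms 7   # prints 604800000, the ttl-ms value used above
```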
Add the following to tez-site.xml to configure the Tez UI:
<property>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<name>tez.tez-ui.history-url.base</name>
<value>http://master:8080/tez-ui/</value>
</property>
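- Install Tomcat on one node, for example master, copy ${TEZ_HOME}/tez-ui-0.8.5.war to ${TOMCAT_HOME}/webapps/, and rename it to tez-ui.war, as shown in the figure below. Tomcat serves the application under the /tez-ui path, which matches the tez.tez-ui.history-url.base value set in tez-site.xml above.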
- If Tomcat is not installed on the node that runs the YARN Timeline Server, edit tez-ui/scripts/configs.js as shown below and set timelineBaseUrl and RMWebUrl to the correct addresses.
- Edit hive-site.xml and set the execution engine to tez, as shown below.
<property>
<name>hive.execution.engine</name>
<value>tez</value>
<description/>
</property>
- After editing these files, start the HDFS cluster, the YARN cluster, the Timeline Server, and Tomcat.
start-dfs.sh
start-yarn.sh
yarn-daemon.sh start timelineserver  # "historyserver" also starts it, but is a deprecated alias
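After starting the services it helps to confirm that the daemons are actually up. The sketch below reads jps output and prints every expected daemon that is missing. It rests on assumptions: the list describes a single host that runs every role (trim it per node on a real cluster), and the Timeline Server is expected to appear in jps as ApplicationHistoryServer. The function name is made up for this example.

```shell
# Daemons expected on a host that runs every role; adjust per node.
EXPECTED_DAEMONS="NameNode DataNode ResourceManager NodeManager ApplicationHistoryServer"

# missing_daemons: read `jps` output ("<pid> <name>" per line) on stdin and
# print each expected daemon whose name does not appear in the second column.
missing_daemons() {
  jps_output=$(cat)
  for daemon in $EXPECTED_DAEMONS; do
    printf '%s\n' "$jps_output" |
      awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
      echo "$daemon"
  done
}

# Only query the local JVMs when jps is available on this machine.
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
fi
```

No output means every listed daemon was found. Matching on the exact process name keeps SecondaryNameNode from being counted as NameNode.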
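Testing Hive on Tez
After running an HQL statement in Hive, the output looks like the figure below, and the application entry in the YARN UI links through to the Tez UI.
By default, application history is stored under the path given by yarn.timeline-service.leveldb-timeline-store.path, whose default value is ${hadoop.tmp.dir}/yarn/timeline.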
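To fall back to Hive on MR, use unset to remove the Tez environment variables and HADOOP_CLASSPATH from the current session, change the execution engine in hive-site.xml, then restart the hiveserver2 service and reconnect with beeline.
To switch to Hive on Tez again, run source /etc/profile to reload the Tez environment variables and HADOOP_CLASSPATH, change the execution engine in hive-site.xml, then restart the hiveserver2 service and reconnect with beeline.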
unset HADOOP_CLASSPATH
unset TEZ_CONF_DIR
unset TEZ_HOME
unset TEZ_JARS
beeline -u jdbc:hive2://master:10000 -n root --hiveconf hive.execution.engine=mr
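Switching to the MR engine without following these steps may produce errors such as NoSuchFieldError, which point to incompatible library versions on the classpath.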
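The environment half of switching engines can be wrapped in two shell functions. This is only a sketch: the function names and the TEZ_PROFILE override are invented for this example, and editing hive-site.xml and restarting hiveserver2 still have to be done separately, as described above.

```shell
# tez_env_off: remove the Tez-related variables from the current shell,
# as required before falling back to Hive on MR.
tez_env_off() {
  unset HADOOP_CLASSPATH TEZ_CONF_DIR TEZ_HOME TEZ_JARS
}

# tez_env_on: reload the variables from the profile that defines them.
# Defaults to /etc/profile; set TEZ_PROFILE to use a different file.
tez_env_on() {
  . "${TEZ_PROFILE:-/etc/profile}"
}
```

The functions must be defined in the shell that later launches hiveserver2, because unset and source only affect the current process.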