Switching Hive 2.3.6 to the Tez Engine
Prerequisites
1. Hadoop (mine is 2.7.1)
2. Hive (mine is 2.3.6)
Preparing the Tez environment

- Download the Tez release tarball and extract it (download link).
- Go into the Tez install directory and create tez-site.xml:

[root@hadoop001 conf]# vi tez-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- The Tez tarball on HDFS; for basic use this one property is enough -->
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez-0.9.0/tez.tar.gz</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
  </property>
</configuration>
- Upload the Tez tarball to HDFS:

  hdfs dfs -mkdir /tez-0.9.0
  hdfs dfs -put /usr/local/tez-0.9.0/share/tez.tar.gz /tez-0.9.0
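To confirm the upload matches what tez.lib.uris points at, a quick check might look like this (a sketch, using the paths from the steps above):

```shell
# List the target directory; expect a line ending in /tez-0.9.0/tez.tar.gz
hdfs dfs -ls /tez-0.9.0
```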
- The Hadoop jars under Tez's lib directory do not match the installed Hadoop version, so swap them for matching ones:

  # Remove the jars with the wrong version:
  [root@hadoop01 tez-0.9.0]# rm -rf ./lib/hadoop-mapreduce-client-core-2.7.0.jar ./lib/hadoop-mapreduce-client-common-2.7.0.jar
  # Copy the matching jars from the Hadoop directory:
  [root@hadoop01 tez-0.9.0]# cp /usr/local/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.1.jar /usr/local/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.1.jar /usr/local/tez-0.9.0/lib/
Installing Tez for Hive

This approach is minimally invasive: once configured, only Hive can switch engines freely, while other MapReduce programs running on Hadoop still go through the original MR framework on YARN.
- Add the following to hive-env.sh:

  # Set TEZ_HOME according to your actual install path
  export TEZ_HOME=/opt/moudle/tez
  export TEZ_JARS=""
  for jar in `ls $TEZ_HOME | grep jar`; do
      export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
  done
  for jar in `ls $TEZ_HOME/lib`; do
      export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
  done
  # Find the matching lzo jar under your own Hadoop install
  export HIVE_AUX_JARS_PATH=/opt/moudle/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar$TEZ_JARS
- If adding all of that seems too tedious, you can instead copy the jars under tez/ and tez/lib/ directly into $HIVE_HOME/lib.
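A sketch of that shortcut, assuming the install path from the earlier steps and that $HIVE_HOME is set:

```shell
# Copy Tez's own jars and its lib/ dependencies straight into Hive's lib
cp /usr/local/tez-0.9.0/*.jar     $HIVE_HOME/lib/
cp /usr/local/tez-0.9.0/lib/*.jar $HIVE_HOME/lib/
```

The trade-off is that upgrading Tez later means tracking down and replacing these copies by hand, whereas the hive-env.sh approach only references them in place.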
Start Hive and test

hive (default)> set hive.execution.engine=tez;
hive (default)> select
              >   id,
              >   count(id)
              > from zzy.l1
              > group by id;
03:38:10.961 [59fd501b-ea06-4ce3-83e3-24b71a218eb8 main] ERROR org.apache.hadoop.hdfs.KeyProviderCache - Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
Query ID = root_20190919033809_a5d0a892-78d7-4c21-ae26-234fc6429664
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1568834471830_0001)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......      container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 6.99 s
----------------------------------------------------------------------------------------------

Output like this means the switch succeeded.
This works fine as long as Hive runs on the same node as the Hadoop master. If the two run on different machines (say, because both are resource-hungry and you want to keep things stable), you may instead see:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Checking the logs shows:

File does not exist: hdfs:/user/zzy    # zzy is the user Hive runs as

Once the problem is known the fix is simple: HDFS has no home directory for the zzy user, so create one. After that, running on the Tez engine produces no more errors.
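The fix just described might look like this (zzy being the user Hive runs as here; run it as an HDFS superuser, and note the chown is an assumption so the Hive user owns its own directory):

```shell
# Create the missing HDFS home directory for the user Hive runs as
hdfs dfs -mkdir -p /user/zzy
hdfs dfs -chown zzy /user/zzy
```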
Installing Tez on Hadoop

This approach is highly invasive and affects the existing Hadoop cluster: every MapReduce job running on YARN will be forced through the Tez engine, so everything Hive runs naturally goes through Tez as well.
- Copy tez-site.xml into the Hadoop configuration directory ($HADOOP_HOME/etc/hadoop).
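As a one-liner, assuming tez-site.xml still sits in the Tez conf directory where it was created earlier (that source path is an assumption about the layout):

```shell
cp /usr/local/tez-0.9.0/conf/tez-site.xml $HADOOP_HOME/etc/hadoop/
```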
- Add the following to hadoop-env.sh:

  export TEZ_HOME=/opt/moudle/tez   # your Tez install directory
  for jar in `ls $TEZ_HOME | grep jar`; do
      export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/$jar
  done
  for jar in `ls $TEZ_HOME/lib`; do
      export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/lib/$jar
  done
- Add to mapred-site.xml:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn-tez</value>
  </property>
- Sync the configuration across the cluster.
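One way to push the changed files out (a sketch: hadoop002 and hadoop003 are hypothetical worker hostnames, and passwordless SSH is assumed):

```shell
# Distribute the three files changed above to every other node
for host in hadoop002 hadoop003; do
  scp $HADOOP_HOME/etc/hadoop/tez-site.xml \
      $HADOOP_HOME/etc/hadoop/hadoop-env.sh \
      $HADOOP_HOME/etc/hadoop/mapred-site.xml \
      $host:$HADOOP_HOME/etc/hadoop/
done
```

Remember to restart YARN afterwards so the new mapreduce.framework.name and HADOOP_CLASSPATH take effect.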
- Upload any file to HDFS.
- Go into the Tez install directory and run the example job:

  hadoop jar ./tez-examples-0.9.0.jar orderedwordcount /LICENSE /out    # /LICENSE is the file I uploaded
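The upload step can use any text file; one handy choice (an assumption, not required) is the LICENSE.txt that ships in the Hadoop distribution root:

```shell
hdfs dfs -put $HADOOP_HOME/LICENSE.txt /LICENSE
```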
- Now check the YARN UI on port 8088: if the job shows up as a TEZ application, it worked.
Testing whether Hive now runs Tez jobs directly

- Run a query directly:

hive (default)> select
              >   id,
              >   count(id)
              > from zzy.l1
              > group by id;
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2019-09-19 04:10:56,905 Stage-1 map = 0%, reduce = 0%
2019-09-19 04:11:01,094 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1568837085850_0002
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
A 8
B 7
Time taken: 17.341 seconds, Fetched: 2 row(s)
Looks like it still went through MR? But the 8088 UI shows it really did run on TEZ; it just doesn't print that fancy progress bar.

Adding set hive.execution.engine=tez; brings the progress bar back.
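Rather than typing that in every session, the engine can also be set permanently; hive.execution.engine is a standard Hive property, so a fragment like this in hive-site.xml should do it:

```xml
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
```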