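Tez is a general-purpose DAG computing framework that evolved from the MapReduce framework. It can serve as the underlying data-processing engine for systems such as MapReduce, Pig, and Hive. It is built directly on YARN, the resource-management platform in Hadoop 2.0, and is developed by core Hadoop 2.0 contributors, which makes it a strong newcomer among computing frameworks.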
The build needs a few libraries and tools: gcc, make, gcc-c++, and openssl. Two components installed during the build, phantomjs-2.1.1-linux-x86_64 and nodejs, take some extra time.
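The prerequisites above can be checked before starting the build. The following is a minimal sketch, not part of the Tez tooling: the helper name missing_tools is made up for illustration, g++ is assumed to be the command provided by the gcc-c++ package, and mvn is assumed to be on PATH because the build is driven by Maven.

```shell
#!/bin/sh
# Print the names of any commands from the argument list that are not on PATH.
# (missing_tools is a hypothetical helper, not something shipped with Tez.)
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Strip the leading space left by the accumulation above.
  echo "${missing# }"
}

# gcc, g++, make and openssl come from the text; mvn is an assumption.
result=$(missing_tools gcc g++ make openssl mvn)
if [ -n "$result" ]; then
  echo "Missing build tools: $result"
else
  echo "All build tools found"
fi
```

Anything reported as missing can be installed with the distribution's package manager before running the Maven build.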
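Download the Tez source from the official site, unpack it, and build it.
Make sure the build targets your own Hadoop version: either change hadoop.version in the pom or pass it to mvn on the command line: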
mvn package -Dhadoop.version=2.7.2 -DskipTests -Dmaven.javadoc.skip=true
1. Take tez-0.8.4-minimal.tar.gz from tez-dist/target/ and unpack it locally into /opt/single/tez.
Create a conf directory under $TEZ_HOME and create tez-site.xml in it:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
</configuration>
2. Set the Linux environment variables:
export TEZ_HOME=/opt/single/tez
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS=$TEZ_HOME
3. Add the following to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_CONF_DIR:$TEZ_JARS/*:$TEZ_JARS/lib/*
In mapred-site.xml, set:
<property>
<name>mapreduce.framework.name</name>
<value>yarn-tez</value>
</property>
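Before going further, the local layout from steps 1 to 3 can be sanity-checked. This is a rough sketch under assumptions: check_tez_layout is a hypothetical helper, and it only tests that conf/tez-site.xml exists and that at least one jar sits at the top level of the Tez directory, which is what the $TEZ_JARS/* classpath entry relies on.

```shell
#!/bin/sh
# Return 0 if the directory looks like an unpacked Tez install with a conf dir.
# (check_tez_layout is a hypothetical helper written for this post.)
check_tez_layout() {
  dir=$1
  [ -f "$dir/conf/tez-site.xml" ] || { echo "missing $dir/conf/tez-site.xml"; return 1; }
  # The classpath entry $TEZ_JARS/* only helps if jars sit at the top level.
  ls "$dir"/*.jar >/dev/null 2>&1 || { echo "no jars in $dir"; return 1; }
  echo "layout ok: $dir"
}

# Falls back to the path used in the text when TEZ_HOME is not exported.
check_tez_layout "${TEZ_HOME:-/opt/single/tez}" || echo "fix the item above before starting Hadoop"
```

It does not validate the contents of tez-site.xml or the Hadoop classpath itself; running hadoop classpath after editing hadoop-env.sh shows what Hadoop actually picks up.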
4. Start Hadoop and upload the built tez-0.8.4-minimal.tar.gz to hdfs://hadoop:9000/apps/tez-0.8.4/, the location that tez.lib.uris points to.
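The upload can be done with the standard hadoop fs commands. The sketch below is one possible way to script it, with several assumptions: it is run from the root of the Tez source tree, the TARBALL default is the build output path mentioned in step 1, and the run and upload_tez helpers are made up for illustration. DRY_RUN defaults to 1, so the commands are only printed until you set DRY_RUN=0.

```shell
#!/bin/sh
# Target directory from tez.lib.uris (the hdfs://hadoop:9000 prefix is the default FS).
HDFS_DIR=/apps/tez-0.8.4

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
# The printed form is for reading only; arguments are not re-quoted.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the target directory, upload the tarball, and list the result.
upload_tez() {
  run hadoop fs -mkdir -p "$HDFS_DIR"
  run hadoop fs -put -f "$1" "$HDFS_DIR/"
  run hadoop fs -ls "$HDFS_DIR"
}

# TARBALL is an assumed variable; the default is relative to the Tez source tree.
upload_tez "${TARBALL:-tez-dist/target/tez-0.8.4-minimal.tar.gz}"
```

The -f flag overwrites an existing copy, which is convenient after rebuilding Tez with a different Hadoop version.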
5. Tez UI settings:
Add to yarn-site.xml:
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.hostname</name>
<value>hadoop</value>
</property>
<property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<name>yarn.timeline-service.webapp.https.address</name>
<value>${yarn.timeline-service.hostname}:2191</value>
</property>
Add to tez-site.xml:
<property>
<description>Enable Tez to use the Timeline Server for History Logging</description>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<!-- port of your own Tomcat installation -->
<name>tez.tez-ui.history-url.base</name>
<value>http://hadoop:8008/tez-ui/</value>
</property>
<property>
<name>tez.runtime.convert.user-payload.to.history-text</name>
<value>true</value>
</property>
<property>
<name>tez.task.generate.counters.per.io</name>
<value>true</value>
</property>
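6. Tomcat configuration:
Installing Tomcat is not covered here; instructions are widely available online.
Unpack tez-ui-0.8.4.war and tez-ui2-0.8.4.war into their own directories under Tomcat's webapps/: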
mkdir -pv /opt/modules/tomcat-7.0.69/webapps/tez-ui /opt/modules/tomcat-7.0.69/webapps/tez-ui2
cp /opt/single/tez/tez-ui-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui
cp /opt/single/tez/tez-ui2-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui2
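# jar extracts into the current directory, so run it from inside each target directory
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui && jar xvf tez-ui-0.8.4.war
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui2 && jar xvf tez-ui2-0.8.4.war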
Edit webapps/tez-ui/scripts/config.js:
timelineBaseUrl: 'http://hadoop:8188',
RMWebUrl: 'http://hadoop:8088',
Set the Tomcat port to 8008 in /opt/modules/tomcat-7.0.69/conf/server.xml:
<Connector port="8008" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" />
7. Testing:
Start HDFS, YARN, the Timeline Server, and Tomcat:
start-dfs.sh
start-yarn.sh
yarn-daemon.sh start timelineserver
startup.sh
Then run the ordered word count test job:
hadoop jar /opt/single/tez/tez-tests-0.8.4.jar testorderedwordcount /data/data1 /output2
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim26, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:28 INFO client.TezClientUtils: Permissions on staging directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999 are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
16/08/27 00:33:28 INFO examples.TestOrderedWordCount: Creating Tez Session
16/08/27 00:33:28 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.8.4, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-08-25T08:17:01Z ]
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:28 INFO client.TezClient: Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.TezClient: Session mode. Starting session.
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
16/08/27 00:33:29 INFO client.TezClient: Tez system stage directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999/.tez/application_1472222203999_0005 doesn't exist and is created
16/08/27 00:33:29 INFO acls.ATSHistoryACLPolicyManager: Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1472222203999_0005
16/08/27 00:33:29 INFO impl.YarnClientImpl: Submitted application application_1472222203999_0005
16/08/27 00:33:29 INFO client.TezClient: The url to track the Tez Session: http://hadoop:8088/proxy/application_1472222203999_0005/
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Running OrderedWordCount DAG, dagIndex=1, inputPath=/data/data1, outputPath=/output2
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Checking DAG specific ACLS
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Waiting for TezSession to get into ready state
16/08/27 00:33:32 INFO examples.TestOrderedWordCount: Submitting DAG to Tez Session, dagIndex=1
16/08/27 00:33:32 INFO client.TezClient: Submitting dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1, callerContext={ context=Tez, callerType=TestOrderedWordCount, callerId=application_1472222203999_0005_1 }
16/08/27 00:33:33 INFO client.TezClient: Submitted dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1
16/08/27 00:33:33 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:33 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:33 INFO examples.TestOrderedWordCount: Submitted DAG to Tez Session, dagIndex=1
(several hundred lines omitted)
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: DAG 1 completed. FinalState=SUCCEEDED
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: Shutting down session
16/08/27 00:33:37 INFO client.TezClient: Shutting down Tez Session, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005
This checks that Tez can run. You can then watch the Tez job in the YARN UI:
http://hadoop:8088/cluster
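If a page does not load, it helps to probe each web endpoint from this setup in turn. The following is a small sketch, assuming curl is installed and that the host name hadoop and the ports 8088, 8188, and 8008 match the configuration shown earlier; endpoints and probe are hypothetical helper names.

```shell
#!/bin/sh
# Endpoints taken from the configuration in this post.
endpoints() {
  echo "http://hadoop:8088/cluster"
  echo "http://hadoop:8188/ws/v1/timeline/"
  echo "http://hadoop:8008/tez-ui/"
}

# Print "up" or "down" for one URL. curl -f treats HTTP errors (>= 400) as
# failures, and --max-time keeps a dead host from blocking the loop.
probe() {
  if curl -sf -o /dev/null --max-time 3 "$1"; then
    echo "up   $1"
  else
    echo "down $1"
  fi
}

# YARN ResourceManager UI, Timeline Server REST root, Tez UI on Tomcat.
for url in $(endpoints); do
  probe "$url"
done
```

A "down" result for port 8188 usually means the Timeline Server is not running, and one for port 8008 points at Tomcat or the unpacked tez-ui directory.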
Once that works, you can test Hive.
Optional configuration: add the following to hive-site.xml:
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
Or add this to ~/.hiverc:
set hive.execution.engine=tez;
Or start hive and run the set command above at the prompt.
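The engine can also be chosen for a single non-interactive run through the hive command's --hiveconf and -e options, which is handy for comparing mr and tez on the same query. This is only a sketch: run and hive_with_engine are hypothetical helpers, the query reuses the test1 table from the session in this post, and DRY_RUN defaults to 1 so the command is printed rather than executed.

```shell
#!/bin/sh
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
# The printed form is for reading only; arguments are not re-quoted.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run one query with the given execution engine (for example mr or tez).
hive_with_engine() {
  engine=$1
  query=$2
  run hive --hiveconf "hive.execution.engine=$engine" -e "$query"
}

# Table and column names come from the example session in this post.
hive_with_engine tez "select data1,data2 from test1 order by data1;"
```

With DRY_RUN=0, running the same call once with mr and once with tez gives a rough feel for the difference in elapsed time.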
Then run a Hive query, for example:
hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)> select data1,data2 from test1 order by data1;
Query ID = hadoop_20160827004201_cb9e3165-4fd9-4b91-a68e-0ca4155be511
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1472222203999_0006)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 SUCCEEDED 0 0 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 3.66 s
--------------------------------------------------------------------------------
OK
data1 data2
Time taken: 6.346 seconds
hive (default)>
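Output like the above means the configuration works, and you can view detailed DAG information in the UI.
Click the ApplicationMaster link to open the Tez UI.
Select the corresponding Dag Name link to see the details.
You can also view them at hadoop:8008/tez-ui2/.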