Installing and Configuring Tez for Hive

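To run jobs that depend on one another (such as the MapReduce jobs generated by Pig and Hive) more efficiently, and to reduce disk and network I/O, Hortonworks developed the DAG computation framework Tez.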

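Tez is a general-purpose DAG computation framework that evolved from the MapReduce framework. It can serve as the underlying data processing engine for systems such as MapReduce, Pig, and Hive. It runs natively on YARN, the resource management platform of Hadoop 2.0, and was built by core Hadoop 2.0 developers, so it is well positioned to become a rising star among computation frameworks.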

The build needs a few libraries and tools: gcc, make, gcc-c++, and openssl. Two further dependencies, phantomjs-2.1.1-linux-x86_64 and nodejs, take a while to install.

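Download the Tez source from the official site, extract it, and build it.

Remember to change the hadoop version in the pom, or pass your own Hadoop version on the mvn command line: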

mvn package -Dhadoop.version=2.7.2 -DskipTests -Dmaven.javadoc.skip=true
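To avoid typing the wrong version, the value for -Dhadoop.version can be read from the installed Hadoop. The following is a minimal sketch, assuming the first line of `hadoop version` has the form "Hadoop 2.7.2"; the helper name `parse_hadoop_version` is made up for this example.

```shell
# Hypothetical helper: read `hadoop version` output on stdin and print the
# version number from its first line (expected form: "Hadoop 2.7.2").
parse_hadoop_version() {
  awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# Example (run from the Tez source root on a machine with hadoop installed):
#   ver=$(hadoop version | parse_hadoop_version)
#   mvn package -Dhadoop.version="$ver" -DskipTests -Dmaven.javadoc.skip=true
```

If the helper prints nothing, the output format differs from the assumption and the version should be set by hand.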
1. Take tez-0.8.4-minimal.tar.gz from tez-dist/target/ and extract it locally to /opt/single/tez.

Create a conf directory under $TEZ_HOME and create tez-site.xml in it:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>tez.lib.uris</name>
		<value>hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz</value>
	</property>
	<property>
		<name>tez.use.cluster.hadoop-libs</name>
		<value>true</value>
	</property>
</configuration>
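The file can also be generated from a script so that the HDFS URI is defined in one place. This is only a sketch: the function name `write_tez_site` and its arguments are assumptions made for illustration, and it writes exactly the two properties shown above.

```shell
# Hypothetical helper: create <tez_home>/conf/tez-site.xml containing the two
# properties shown above. $1 = Tez home directory, $2 = value for tez.lib.uris.
write_tez_site() {
  tez_home=$1
  lib_uri=$2
  mkdir -p "$tez_home/conf"
  cat > "$tez_home/conf/tez-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>tez.lib.uris</name>
		<value>$lib_uri</value>
	</property>
	<property>
		<name>tez.use.cluster.hadoop-libs</name>
		<value>true</value>
	</property>
</configuration>
EOF
}

# Example, with the paths used in this post:
#   write_tez_site /opt/single/tez hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz
```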
2. Set the Linux environment variables:
export TEZ_HOME=/opt/single/tez
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS=$TEZ_HOME
3. Add the following to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_CONF_DIR:$TEZ_JARS/*:$TEZ_JARS/lib/*
In mapred-site.xml, set:
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn-tez</value>
	</property>
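After editing hadoop-env.sh, it is worth confirming that the Tez directories really ended up on the Hadoop classpath. A small sketch, assuming `hadoop classpath` prints a colon-separated list; `classpath_has_tez` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if any entry of a colon-separated classpath
# (read from stdin) contains the given Tez directory path.
classpath_has_tez() {
  tr ':' '\n' | grep -qF -- "$1"
}

# Example, with the Tez location used in this post:
#   hadoop classpath | classpath_has_tez /opt/single/tez && echo "Tez is on the classpath"
```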
4. Start Hadoop and upload the built tez-0.8.4-minimal.tar.gz to the hdfs://hadoop:9000/apps/tez-0.8.4/ directory, which is the location that tez.lib.uris points to.
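The upload can be done with the standard `hdfs dfs` commands. The sketch below wraps them in two hypothetical helpers (`run` and `upload_tez`, both invented for this example) so the commands can be previewed with DRY_RUN=1 before they touch the cluster.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: create the HDFS directory and upload the tarball,
# overwriting an older copy if one exists.
# $1 = local tarball, $2 = target HDFS directory.
upload_tez() {
  tarball=$1
  hdfs_dir=$2
  run hdfs dfs -mkdir -p "$hdfs_dir" &&
  run hdfs dfs -put -f "$tarball" "$hdfs_dir/"
}

# Example, with the paths used in this post:
#   DRY_RUN=1 upload_tez tez-dist/target/tez-0.8.4-minimal.tar.gz /apps/tez-0.8.4
#   upload_tez tez-dist/target/tez-0.8.4-minimal.tar.gz /apps/tez-0.8.4
```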

5. The Tez UI is configured as follows.

Add to yarn-site.xml:

	<property>
		<name>yarn.timeline-service.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.timeline-service.hostname</name>
		<value>hadoop</value>
	</property>
	<property>
		<name>yarn.timeline-service.http-cross-origin.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.timeline-service.webapp.address</name>
		<value>${yarn.timeline-service.hostname}:8188</value>
	</property>
	<property>
		<name>yarn.timeline-service.webapp.https.address</name>
		<value>${yarn.timeline-service.hostname}:2191</value>
	</property>
Add to tez-site.xml:

	<property>
		<description>Enable Tez to use the Timeline Server for History Logging</description>
		<name>tez.history.logging.service.class</name>
		<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
	</property>

	<property>
		<!-- Use the port of your own Tomcat installation -->
		<name>tez.tez-ui.history-url.base</name>
		<value>http://hadoop:8008/tez-ui/</value>
	</property>

	<property>
		<name>tez.runtime.convert.user-payload.to.history-text</name>
		<value>true</value>
	</property>

	<property>
		<name>tez.task.generate.counters.per.io</name>
		<value>true</value>
	</property>
6. Tomcat configuration:

Installing Tomcat is not covered here; there are plenty of guides online.

Extract tez-ui-0.8.4.war and tez-ui2-0.8.4.war into Tomcat's webapps/ directory:

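mkdir -pv /opt/modules/tomcat-7.0.69/webapps/tez-ui  /opt/modules/tomcat-7.0.69/webapps/tez-ui2
cp /opt/single/tez/tez-ui-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui
cp /opt/single/tez/tez-ui2-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui2
# jar extracts into the current directory, so change into each target first
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui && jar xvf tez-ui-0.8.4.war
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui2 && jar xvf tez-ui2-0.8.4.war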
Edit webapps/tez-ui/scripts/config.js:
timelineBaseUrl: 'http://hadoop:8188',
RMWebUrl: 'http://hadoop:8088',
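These two values can also be set from the command line. The sketch below assumes GNU sed and assumes each key sits on its own line in config.js, possibly commented out with a leading "//"; `set_ui_config` is a hypothetical helper, and the value must not contain the characters "|" or "&".

```shell
# Hypothetical helper: set one string-valued key in config.js, removing a
# leading "//" comment marker if present.
# $1 = path to config.js, $2 = key, $3 = new value.
set_ui_config() {
  file=$1
  key=$2
  value=$3
  sed -i -E "s|^([[:space:]]*)(//[[:space:]]*)?$key:.*|\\1$key: '$value',|" "$file"
}

# Example, with the paths and hosts used in this post:
#   cfg=/opt/modules/tomcat-7.0.69/webapps/tez-ui/scripts/config.js
#   set_ui_config "$cfg" timelineBaseUrl http://hadoop:8188
#   set_ui_config "$cfg" RMWebUrl http://hadoop:8088
```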
Set the Tomcat port to 8008 in /opt/modules/tomcat-7.0.69/conf/server.xml:
     <Connector port="8008" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
7. Test:

Start the services and run the ordered word count example job:

start-dfs.sh
start-yarn.sh
yarn-daemon.sh start timelineserver
startup.sh
hadoop jar /opt/single/tez/tez-tests-0.8.4.jar testorderedwordcount /data/data1 /output2
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim26, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:28 INFO client.TezClientUtils: Permissions on staging directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999 are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
16/08/27 00:33:28 INFO examples.TestOrderedWordCount: Creating Tez Session
16/08/27 00:33:28 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.8.4, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-08-25T08:17:01Z ]
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:28 INFO client.TezClient: Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.TezClient: Session mode. Starting session.
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
16/08/27 00:33:29 INFO client.TezClient: Tez system stage directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999/.tez/application_1472222203999_0005 doesn't exist and is created
16/08/27 00:33:29 INFO acls.ATSHistoryACLPolicyManager: Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1472222203999_0005
16/08/27 00:33:29 INFO impl.YarnClientImpl: Submitted application application_1472222203999_0005
16/08/27 00:33:29 INFO client.TezClient: The url to track the Tez Session: http://hadoop:8088/proxy/application_1472222203999_0005/
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Running OrderedWordCount DAG, dagIndex=1, inputPath=/data/data1, outputPath=/output2
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Checking DAG specific ACLS
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Waiting for TezSession to get into ready state
16/08/27 00:33:32 INFO examples.TestOrderedWordCount: Submitting DAG to Tez Session, dagIndex=1
16/08/27 00:33:32 INFO client.TezClient: Submitting dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1, callerContext={ context=Tez, callerType=TestOrderedWordCount, callerId=application_1472222203999_0005_1 }
16/08/27 00:33:33 INFO client.TezClient: Submitted dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1
16/08/27 00:33:33 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:33 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:33 INFO examples.TestOrderedWordCount: Submitted DAG to Tez Session, dagIndex=1
(several hundred lines omitted)
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: DAG 1 completed. FinalState=SUCCEEDED
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: Shutting down session
16/08/27 00:33:37 INFO client.TezClient: Shutting down Tez Session, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005
The job above checks whether Tez can run. You can then watch how the Tez application is doing in the YARN UI:

http://hadoop:8088/cluster
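A quick way to confirm that all three web endpoints from the steps above are reachable is to probe them with curl. This is a sketch under assumptions: curl is installed, the host name and ports are the ones used in this post, and `timeline_url` and `url_ok` are hypothetical helper names.

```shell
# Hypothetical helper: print the Timeline Server REST URL for a host
# (port defaults to 8188, the value set in yarn-site.xml above).
timeline_url() {
  printf 'http://%s:%s/ws/v1/timeline/\n' "$1" "${2:-8188}"
}

# Hypothetical helper: succeed if the URL answers without an HTTP error.
# Requires curl; gives up after 5 seconds.
url_ok() {
  curl -sf -o /dev/null --max-time 5 "$1"
}

# Example: probe the ResourceManager, the Timeline Server and the Tez UI.
#   for u in http://hadoop:8088/cluster "$(timeline_url hadoop)" http://hadoop:8008/tez-ui/; do
#     if url_ok "$u"; then echo "OK   $u"; else echo "FAIL $u"; fi
#   done
```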

Once that works, you can test Hive.

Optionally, make Tez the default engine by adding the following to hive-site.xml:

	<property>
		<name>hive.execution.engine</name>
		<value>tez</value>
	</property>
Or add this line to ~/.hiverc:

set hive.execution.engine=tez;

Or start hive and run the set command above directly at the prompt.
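For the ~/.hiverc option, the line should be added only once even if the setup script is run repeatedly. A minimal sketch; `enable_tez_in_hiverc` is a hypothetical helper name, and the file path defaults to ~/.hiverc.

```shell
# Hypothetical helper: append the engine setting to a hiverc file unless the
# exact line is already there. $1 = file path (defaults to ~/.hiverc).
enable_tez_in_hiverc() {
  rc=${1:-$HOME/.hiverc}
  line='set hive.execution.engine=tez;'
  touch "$rc"
  grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}

# Example:
#   enable_tez_in_hiverc
```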

Then run a Hive query, for example:

hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)> select data1,data2 from test1 order by data1;
Query ID = hadoop_20160827004201_cb9e3165-4fd9-4b91-a68e-0ca4155be511
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1472222203999_0006)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1              SUCCEEDED      0          0        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 3.66 s     
--------------------------------------------------------------------------------
OK
data1	data2
Time taken: 6.346 seconds
hive (default)> 
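Output like the above means the configuration works. The detailed DAG information can also be viewed in the UI.

Clicking the ApplicationMaster link opens the Tez UI.

Selecting the corresponding Dag Name link shows the details of that DAG.

The same information is also available at hadoop:8008/tez-ui2/.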
