Scheduling Pig and Hive jobs in Tez mode on Oozie
1. Background reading
http://dongxicheng.org/mapreduce-nextgen/apache-tez-optimizations/
http://dongxicheng.org/mapreduce-nextgen/apache-tez-newest-progress/
2. Pig with Tez
2.1 Local submission (the cluster must support Tez mode)
pig -x tez t.pig
2.2 Oozie scheduling
(1) Configure the workflow (pay attention to the Tez-related properties in the configuration block)
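For reference, a minimal t.pig that the command above could run; the input path, schema, and threshold are hypothetical placeholders:

```pig
-- Hypothetical input: tab-separated (id, price) records
raw = LOAD '/user/demo/input/houses.txt' USING PigStorage('\t')
      AS (id:chararray, price:long);
-- Keep only high-priced records and count them
expensive = FILTER raw BY price > 1000000;
grouped = GROUP expensive ALL;
cnt = FOREACH grouped GENERATE COUNT(expensive);
STORE cnt INTO '/user/demo/output/house_count';
```

With `-x tez`, Pig compiles this plan into a single Tez DAG instead of chained MapReduce jobs.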
<workflow-app name="PKL_REPORT_WF" xmlns="uri:oozie:workflow:0.4">
    <start to="report1"/>
    <action name="report1" retry-max="${retry_max}" retry-interval="${retry_interval}">
        <pig>
            <job-tracker>${job_tracker}</job-tracker>
            <name-node>${name_node}</name-node>
            <job-xml>${oozie_app_path}/workflow/job.xml</job-xml>
            <configuration>
                <property>
                    <name>exectype</name>
                    <value>tez</value>
                </property>
                <property>
                    <name>tez.lib.uris</name>
                    <value>${name_node}/user/tez/tez-0.7.0_base_hadoop2.7.1.tar.gz</value>
                </property>
                <property>
                    <name>tez.use.cluster.hadoop-libs</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>${queue_name}</value>
                </property>
            </configuration>
            <script>script/report_monitor.pig</script>
            <param>input1=${input1}</param>
            <param>input12=${input12}</param>
            <param>house_type=1</param>
            <file>lib/udf-1.0.0.jar</file>
            <file>conf/hive-site.xml</file>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
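The `${...}` variables referenced in the workflow are normally supplied through the job's job.properties file. A sketch, where every host name, port, and path is a placeholder to be replaced with your cluster's values:

```properties
# All host names, ports, and paths below are placeholders
name_node=hdfs://namenode-host:8020
job_tracker=resourcemanager-host:8032
queue_name=default
retry_max=3
retry_interval=1
oozie_app_path=${name_node}/user/oozie/apps/pkl_report
oozie.wf.application.path=${oozie_app_path}/workflow
input1=${name_node}/user/demo/input1
input12=${name_node}/user/demo/input12
```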
(2) Copy the jars the job needs into the lib directory under the workflow directory (the following are required):
commons-collections4-4.0.jar tez-common-0.7.0.jar tez-runtime-library-0.7.0.jar
tez-api-0.7.0.jar tez-mapreduce-0.7.0.jar
An alternative to uploading these jars with every workflow is to publish all of Tez's dependencies to the cluster's Oozie share lib.
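Assuming the workflow is deployed under an application path like the one below, the copy can be done with standard HDFS shell commands (paths are placeholders; the commands require a live cluster):

```shell
# Placeholder application path; adjust to your deployment
APP=/user/oozie/apps/pkl_report/workflow
hdfs dfs -mkdir -p ${APP}/lib
hdfs dfs -put commons-collections4-4.0.jar \
             tez-api-0.7.0.jar tez-common-0.7.0.jar \
             tez-mapreduce-0.7.0.jar tez-runtime-library-0.7.0.jar \
             ${APP}/lib/
```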
3. Hive with Tez
3.1 Local submission (the cluster must support Tez mode)
Add the following line to the Hive script to switch the execution engine to Tez:
set hive.execution.engine=tez;
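In context, the top of the Hive script might look like this; the table and query are hypothetical:

```sql
-- Switch this script's execution engine to Tez
set hive.execution.engine=tez;

-- Hypothetical query; replace with your own
SELECT house_type, COUNT(*) AS cnt
FROM report_monitor
GROUP BY house_type;
```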
3.2 Oozie scheduling
(1) Configure the workflow (pay attention to the Tez-related properties; note that unlike the Pig workflow above, this one uses a Hive action)
<workflow-app name="PKL_REPORT_WF" xmlns="uri:oozie:workflow:0.4">
    <start to="report1"/>
    <action name="report1" retry-max="${retry_max}" retry-interval="${retry_interval}">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${job_tracker}</job-tracker>
            <name-node>${name_node}</name-node>
            <job-xml>${oozie_app_path}/workflow/job.xml</job-xml>
            <configuration>
                <property>
                    <name>hive.execution.engine</name>
                    <value>tez</value>
                </property>
                <property>
                    <name>tez.lib.uris</name>
                    <value>${name_node}/user/tez/tez-0.7.0_base_hadoop2.7.1.tar.gz</value>
                </property>
                <property>
                    <name>tez.use.cluster.hadoop-libs</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>${queue_name}</value>
                </property>
            </configuration>
            <script>script/report_monitor.sql</script>
            <param>input1=${input1}</param>
            <param>input12=${input12}</param>
            <param>house_type=1</param>
            <file>lib/udf-1.0.0.jar</file>
            <file>conf/hive-site.xml</file>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
(2) Copy the jars the job needs into the lib directory under the workflow directory (the following are required):
commons-collections4-4.0.jar tez-common-0.7.0.jar tez-runtime-library-0.7.0.jar
tez-api-0.7.0.jar tez-mapreduce-0.7.0.jar
An alternative to uploading these jars with every workflow is to publish all of Tez's dependencies to the cluster's Oozie share lib.
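The share-lib approach can be sketched as follows. The share-lib path varies by Oozie version and configuration (check the actual path with `oozie admin -shareliblist`), so treat every path and host below as a placeholder:

```shell
# Placeholder share-lib path; adjust to your Oozie installation
hdfs dfs -put tez-*.jar commons-collections4-4.0.jar \
             /user/oozie/share/lib/lib_20160101000000/pig/
# Tell the running Oozie server to pick up the new jars
oozie admin -oozie http://oozie-host:11000/oozie -sharelibupdate
```

Once the jars are in the share lib, individual workflows no longer need their own copies under lib/.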