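Background
Using the official examples, this post records how to schedule jobs with Oozie. All steps start from the root directory of the extracted Oozie distribution.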
Scheduling a simple task
1. Extract oozie-examples.tar.gz in the Oozie root directory
# tar -zxvf oozie-examples.tar.gz
2. Create a new directory in the Oozie root and copy the apps/shell directory from the extracted examples into it
# mkdir oozie-apps
# cp -r examples/apps/shell/ oozie-apps/
3. Change into oozie-apps/shell and write the script to run, p1.sh
# cd oozie-apps/shell/
# vim p1.sh
p1.sh writes the current date to a file
/sbin/date > /home/szc/p1.log
4. Edit job.properties
# vim job.properties
Replace the IP addresses in nameNode and jobTracker with your own, change examplesRoot and oozie.wf.application.path, and define a variable EXEC holding the name of the script to run, here p1.sh
nameNode=hdfs://192.168.57.141:8020
jobTracker=192.168.57.141:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
EXEC=p1.sh
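To see what the ${...} placeholders in job.properties turn into, here is a minimal Python sketch that does plain string substitution on the file above. It is only an approximation: Oozie resolves these values itself, and the value of user.name (root here) is an assumption standing in for the submitting user. parse_properties and resolve are hypothetical helper names for illustration.

```python
import re

# job.properties from step 4, copied verbatim.
JOB_PROPERTIES = """\
nameNode=hdfs://192.168.57.141:8020
jobTracker=192.168.57.141:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
EXEC=p1.sh
"""

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text):
    """Parse simple key=value lines; blank lines and '#' comment lines are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(value, props, max_rounds=10):
    """Replace ${name} with props[name] until nothing changes; unknown names stay."""
    for _ in range(max_rounds):
        new = PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if new == value:
            break
        value = new
    return value


props = parse_properties(JOB_PROPERTIES)
props["user.name"] = "root"  # assumption: the job is submitted as root
app_path = resolve(props["oozie.wf.application.path"], props)
print(app_path)  # hdfs://192.168.57.141:8020/user/root/oozie-apps/shell
```

The resolved path is the HDFS directory the application must be uploaded to in step 6.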
5. Edit workflow.xml
# vim workflow.xml
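The <start> element names shell-node as the first node of the workflow, so that is the node to modify: change <exec> to ${EXEC}, which refers to the value of the EXEC variable; add a <file> element whose content is the script's HDFS path, followed by # and the script file name; point <ok> at end; and delete the original check-output and fail-output nodes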
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<argument>my_output=Hello Oozie</argument>
<file>/user/root/oozie-apps/shell/${EXEC}#${EXEC}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
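In the <file> element above, the part after # is the name under which the file is linked into the action's working directory, which is why it matches what <exec> runs. The sketch below, using Python's standard xml.etree.ElementTree on a trimmed copy of the action, checks that some <file> provides the name <exec> runs; check_shell_files is a hypothetical helper, and it only understands the explicit path#name form.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the workflow above; the <configuration> block is omitted.
WORKFLOW = """\
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
  <start to="shell-node"/>
  <action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <exec>${EXEC}</exec>
      <file>/user/root/oozie-apps/shell/${EXEC}#${EXEC}</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>
"""

SHELL_NS = "{uri:oozie:shell-action:0.2}"


def check_shell_files(xml_text):
    """Return {exec name: HDFS path} for each shell action, raising if no
    <file> element links a file under the name that <exec> runs."""
    root = ET.fromstring(xml_text)
    result = {}
    for shell in root.iter(SHELL_NS + "shell"):
        exec_name = shell.findtext(SHELL_NS + "exec")
        links = {}
        for file_elem in shell.findall(SHELL_NS + "file"):
            # Only the explicit path#name form is handled here.
            path, _, link = file_elem.text.strip().partition("#")
            links[link] = path
        if exec_name not in links:
            raise ValueError(f"no <file> provides {exec_name!r}")
        result[exec_name] = links[exec_name]
    return result


print(check_shell_files(WORKFLOW))
```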
6. Upload oozie-apps to /user/${user.name} on HDFS; the user name here is root
# cd ../..
# /home/szc/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/ /user/root
After the upload, the directory and its three files can be seen in the NameNode web UI
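7. Run the job, giving the Oozie server URL and the job configuration file. The -config argument is a local path relative to the Oozie root, which here mirrors the layout under the user's HDFS home directory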
# bin/oozie job -oozie http://192.168.57.141:11000/oozie -config oozie-apps/shell/job.properties -run
job: 0000003-200428221417435-oozie-root-W
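If submission succeeds, the command prints the job id, and the job shows up in the Oozie web UI with status RUNNING
After a few refreshes, the finished job appears under Done Jobs on the same page
Once the job has succeeded, p1.log can be found in /home/szc on the node that executed the task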
$ cat /home/szc/p1.log
Thu Apr 30 08:50:57 CST 2020
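The job id printed by -run ends in a letter giving the job type: W for a workflow, while coordinator ids end in C and bundle ids in B. Besides the web UI, a job can be inspected with bin/oozie job -oozie http://192.168.57.141:11000/oozie -info <job id>, or through Oozie's web-services API at <oozie url>/v1/job/<job id>?show=info. The sketch below decodes the suffix and builds that URL; the helper names are hypothetical, and fetch_status assumes the JSON reply carries a top-level status field (it is not called here, since it needs a running server).

```python
import json
from urllib.request import urlopen

# Last component of an Oozie job id -> job type.
JOB_TYPES = {"W": "workflow", "C": "coordinator", "B": "bundle"}


def job_type(job_id):
    """Decode the type letter at the end of an id such as ...-oozie-root-W."""
    suffix = job_id.rsplit("-", 1)[-1]
    if suffix not in JOB_TYPES:
        raise ValueError(f"unrecognised job id: {job_id!r}")
    return JOB_TYPES[suffix]


def job_info_url(oozie_url, job_id):
    """Web-services URL returning the job's information as JSON."""
    return f"{oozie_url.rstrip('/')}/v1/job/{job_id}?show=info"


def fetch_status(oozie_url, job_id, timeout=10):
    """Ask the Oozie server for the job status.
    Assumption: the JSON reply has a top-level "status" field."""
    with urlopen(job_info_url(oozie_url, job_id), timeout=timeout) as resp:
        return json.load(resp)["status"]


job_id = "0000003-200428221417435-oozie-root-W"
print(job_type(job_id))  # workflow
print(job_info_url("http://192.168.57.141:11000/oozie", job_id))
```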
Scheduling multiple tasks
1. In the oozie-apps/shell directory under the Oozie root, create p2.sh with the following content
/sbin/date > /home/szc/p2.sh
2. Edit job.properties: rename EXEC to EXEC1 and add EXEC2
nameNode=hdfs://192.168.57.141:8020
jobTracker=192.168.57.141:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
EXEC1=p1.sh
EXEC2=p2.sh
3. Edit workflow.xml: add a shell-node2 node and update the EXEC references accordingly
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="shell-node1"/>
<action name="shell-node1">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC1}</exec>
<argument>my_output=Hello Oozie</argument>
<file>/user/root/oozie-apps/shell/${EXEC1}#${EXEC1}</file>
<capture-output/>
</shell>
<ok to="shell-node2"/>
<error to="fail"/>
</action>
<action name="shell-node2">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC2}</exec>
<argument>my_output=Hello Oozie</argument>
<file>/user/root/oozie-apps/shell/${EXEC2}#${EXEC2}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
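With two actions, shell-node1's <ok> now leads to shell-node2 instead of end. The sketch below follows <start> and each action's <ok> transition through a skeleton of this workflow (configuration stripped), which is the chain the Job DAG view shows for a run where nothing fails. success_path is a hypothetical helper; it ignores <error> transitions and does not handle decision or fork nodes.

```python
import xml.etree.ElementTree as ET

WF_NS = "{uri:oozie:workflow:0.4}"

# Skeleton of the two-action workflow above: only the control flow is kept.
WORKFLOW = """\
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
  <start to="shell-node1"/>
  <action name="shell-node1"><ok to="shell-node2"/><error to="fail"/></action>
  <action name="shell-node2"><ok to="end"/><error to="fail"/></action>
  <kill name="fail"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>
"""


def success_path(xml_text):
    """Follow <start> and each action's <ok> transition until <end>,
    returning the node names visited in order."""
    root = ET.fromstring(xml_text)
    ok_next = {a.get("name"): a.find(WF_NS + "ok").get("to")
               for a in root.findall(WF_NS + "action")}
    end_name = root.find(WF_NS + "end").get("name")
    path = ["start"]
    node = root.find(WF_NS + "start").get("to")
    while node != end_name:
        if node in path:
            raise ValueError(f"cycle at {node!r}")
        path.append(node)
        node = ok_next[node]
    path.append(end_name)
    return path


print(" -> ".join(success_path(WORKFLOW)))
# start -> shell-node1 -> shell-node2 -> end
```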
4. Delete the old shell directory on HDFS and upload the new one
# /home/szc/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -rm -r /user/root/oozie-apps/shell
20/04/30 09:10:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/04/30 09:10:23 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/root/oozie-apps/shell
# /home/szc/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/shell /user/root/oozie-apps
20/04/30 09:10:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
5. Run the Oozie job
# bin/oozie job -oozie http://192.168.57.141:11000/oozie -config oozie-apps/shell/job.properties -run
When it finishes, the job's directed acyclic graph can be viewed under Job DAG for that job in the web UI
Finally, p1.log and p2.sh appear in /home/szc on the node that ran the tasks
$ cat /home/szc/p1.log
Thu Apr 30 09:11:14 CST 2020
$ cat /home/szc/p2.sh
Thu Apr 30 09:11:22 CST 2020
Scheduling a timed task
Change to the Oozie root directory
1. Edit conf/oozie-site.xml and append a time zone setting at the end
# vim conf/oozie-site.xml
Add the following property:
<property>
<name>oozie.processing.timezone</name>
<value>GMT+0800</value>
</property>
2. Edit oozie-server/webapps/oozie/oozie-console.js to change the console's default time zone
# vim oozie-server/webapps/oozie/oozie-console.js
Change getTimeZone as follows:
function getTimeZone() {
Ext.state.Manager.setProvider(new Ext.state.CookieProvider());
return Ext.state.Manager.get("TimezoneId","GMT+0800");
}
3. Restart the Oozie service and clear the browser cache
# bin/oozied.sh stop
# bin/oozied.sh start
After clearing the browser cache, restart the browser and open the Oozie web UI again; under Done Jobs, all times are now shown in UTC+8 (GMT+0800)
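Both settings above use the id GMT+0800, a fixed offset of 8 hours ahead of UTC. As an illustration of that shift, the sketch below turns such an id into a Python fixed-offset timezone and converts a UTC instant; parse_gmt_offset is a hypothetical helper, and the sample instant is simply p1.log's 08:50:57 CST timestamp written in UTC.

```python
from datetime import datetime, timedelta, timezone


def parse_gmt_offset(tz_id):
    """Turn an id like 'GMT+0800' or 'GMT-0530' into a fixed-offset timezone."""
    if len(tz_id) != 8 or not tz_id.startswith("GMT") or tz_id[3] not in "+-":
        raise ValueError(f"expected GMT+HHMM or GMT-HHMM, got {tz_id!r}")
    sign = 1 if tz_id[3] == "+" else -1
    offset = timedelta(hours=int(tz_id[4:6]), minutes=int(tz_id[6:8]))
    return timezone(sign * offset)


gmt8 = parse_gmt_offset("GMT+0800")
# The instant p1.log recorded as 08:50:57 CST, written in UTC.
utc_instant = datetime(2020, 4, 30, 0, 50, 57, tzinfo=timezone.utc)
print(utc_instant.strftime("%H:%M:%S %Z"))                   # 00:50:57 UTC
print(utc_instant.astimezone(gmt8).strftime("%H:%M:%S %z"))  # 08:50:57 +0800
```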
4. Copy apps/cron from the examples directory into oozie-apps and change into it
# cp -r examples/apps/cron oozie-apps/
# cd oozie-apps/cron/
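5. Edit job.properties, workflow.xml and coordinator.xml
In job.properties, the main changes are the NameNode IP, the HDFS directory, and the start and end times. Note that the start and end times must be set in the future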
nameNode=hdfs://192.168.57.141:8020
jobTracker=192.168.57.141:8032
queueName=default
examplesRoot=oozie-apps
oozie.coord.application.path=${nameNode}/user/${user.name}/${examplesRoot}/cron
# start time (in a .properties file, '#' starts a comment only at the beginning of a line)
start=2020-04-30T11:02+0800
# end time
end=2020-04-30T11:30+0800
workflowAppUri=${nameNode}/user/${user.name}/${examplesRoot}/cron
EXEC3=p3.sh
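The start and end values use the form yyyy-MM-ddTHH:mm followed by an explicit +HHMM offset. A minimal sketch that parses them with Python's %z directive and enforces the two rules from the text (start before end, start in the future) might look like this; check_window is a hypothetical helper, and the submit time in the usage lines is made up for the example.

```python
from datetime import datetime, timezone

# Format of start/end in job.properties, e.g. 2020-04-30T11:02+0800
OOZIE_FMT = "%Y-%m-%dT%H:%M%z"


def check_window(start, end, now=None):
    """Parse start/end and require start < end and start later than now."""
    s = datetime.strptime(start, OOZIE_FMT)
    e = datetime.strptime(end, OOZIE_FMT)
    now = now or datetime.now(timezone.utc)
    if s >= e:
        raise ValueError(f"start {start} is not before end {end}")
    if s <= now:
        raise ValueError(f"start {start} is not in the future")
    return s, e


# Pretend the job is submitted at 10:00 UTC+8 (02:00 UTC) on 2020-04-30.
submit_time = datetime(2020, 4, 30, 2, 0, tzinfo=timezone.utc)
s, e = check_window("2020-04-30T11:02+0800", "2020-04-30T11:30+0800", now=submit_time)
print(e - s)  # 0:28:00
```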
p3.sh simply appends the current time to a file
date >> /home/szc/p3.log
In workflow.xml, the main change is the script to run
<workflow-app xmlns="uri:oozie:workflow:0.5" name="one-op-wf">
<start to="action1"/>
<action name="action1">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC3}</exec>
<argument>my_output=Hello Oozie</argument>
<file>/user/root/oozie-apps/cron/${EXEC3}#${EXEC3}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="end"/>
</action>
<end name="end"/>
</workflow-app>
In coordinator.xml, the main change is setting the run interval (frequency) to 5 minutes
<coordinator-app name="cron-coord" frequency="${coord:minutes(5)}" start="${start}" end="${end}" timezone="GMT+0800"
xmlns="uri:oozie:coordinator:0.2">
<action>
<workflow>
<app-path>${workflowAppUri}</app-path>
<configuration>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
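With frequency ${coord:minutes(5)}, the coordinator creates one workflow run per nominal time, starting at start and stepping 5 minutes until end. The sketch below lists those times for the window in job.properties, assuming end is exclusive (with these values 11:30 is not on the 5-minute grid, so the list is the same either way); nominal_times is a hypothetical helper. The first two entries correspond to the 11:02 and 11:07 runs reported in step 8.

```python
from datetime import datetime, timedelta

OOZIE_FMT = "%Y-%m-%dT%H:%M%z"  # e.g. 2020-04-30T11:02+0800


def nominal_times(start, end, minutes):
    """Times start, start+step, ... strictly before end (end assumed exclusive)."""
    if minutes <= 0:
        raise ValueError("frequency must be positive")
    current = datetime.strptime(start, OOZIE_FMT)
    stop = datetime.strptime(end, OOZIE_FMT)
    step = timedelta(minutes=minutes)
    times = []
    while current < stop:
        times.append(current)
        current += step
    return times


runs = nominal_times("2020-04-30T11:02+0800", "2020-04-30T11:30+0800", 5)
print([t.strftime("%H:%M") for t in runs])
# ['11:02', '11:07', '11:12', '11:17', '11:22', '11:27']
```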
6. Return to the Oozie root and upload the cron directory to HDFS
# /home/szc/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/cron/ /user/root/oozie-apps/
7. Run the job
# bin/oozie job -oozie http://192.168.57.141:11000/oozie -config oozie-apps/cron/job.properties -run
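8. Once the start time is reached, the scheduled job appears under the Coordinator Jobs tab of the Oozie web UI
Clicking it shows that the runs at 11:02 and 11:07 have already completed
The file contains the corresponding output as well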
# cat /home/szc/p3.log
Thu Apr 30 11:02:07 CST 2020
Thu Apr 30 11:07:08 CST 2020
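Scheduling a MapReduce task
Scheduling an MR job is similar to scheduling a simple task, except that workflow.xml must specify the mapper class and related properties
1. Change to the Hadoop root directory and create wordcount.txt as test input
2. Upload wordcount.txt to HDFS and run the MR job under test from its jar with yarn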
# bin/hadoop fs -put wordcount.txt /user/root
# bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount /user/root/wordcount.txt /user/root/out
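3. When it completes, find the word count application in the cluster's web UI and click History
On the page that opens, click Configuration on the right to view the job's configuration
Use the search box on the right to look up properties including map.class, reduce.class, combine.class, output.key, output.value, mapper.new-api and reducer.new-api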
4. Change to the Oozie root and copy examples/apps/map-reduce into oozie-apps
# cp -r examples/apps/map-reduce/ oozie-apps/
5. Go into oozie-apps/map-reduce and edit job.properties
nameNode=hdfs://192.168.57.141:8020
jobTracker=192.168.57.141:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
outputDir=map-reduce
6. Edit workflow.xml to set map.class, reduce.class, combine.class, output.key, output.value, mapper.new-api, reducer.new-api and related properties
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/root/output"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.map.class</name>
<value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.combine.class</name>
<value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
</property>
<property>
<name>mapreduce.job.reduce.class</name>
<value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
</property>
<property>
<name>mapreduce.job.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.job.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/user/root/wordcount.txt</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/user/root/output</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
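The class properties above use the new mapreduce.* API names, which is why the workflow also sets mapred.mapper.new-api and mapred.reducer.new-api to true. A hypothetical consistency check, sketched in Python over a trimmed copy of the <configuration> above, flags a class property whose matching new-api flag is not "true"; read_properties and check_new_api are illustrative names only.

```python
import xml.etree.ElementTree as ET

WF_NS = "{uri:oozie:workflow:0.2}"

# Trimmed copy of the map-reduce action's <configuration> from workflow.xml above.
WORKFLOW = """\
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
  <action name="mr-node"><map-reduce><configuration>
    <property><name>mapred.mapper.new-api</name><value>true</value></property>
    <property><name>mapreduce.job.map.class</name>
      <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value></property>
    <property><name>mapred.reducer.new-api</name><value>true</value></property>
    <property><name>mapreduce.job.reduce.class</name>
      <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value></property>
  </configuration></map-reduce></action>
</workflow-app>
"""

# (class property, flag that should be "true" when that class property is set)
NEW_API_PAIRS = [
    ("mapreduce.job.map.class", "mapred.mapper.new-api"),
    ("mapreduce.job.reduce.class", "mapred.reducer.new-api"),
]


def read_properties(xml_text):
    """Collect every <property> name/value pair in the workflow."""
    root = ET.fromstring(xml_text)
    return {p.findtext(WF_NS + "name").strip(): p.findtext(WF_NS + "value").strip()
            for p in root.iter(WF_NS + "property")}


def check_new_api(props):
    """List class properties whose matching new-api flag is not 'true'."""
    return [f"{cls} is set but {flag} is not 'true'"
            for cls, flag in NEW_API_PAIRS
            if cls in props and props.get(flag) != "true"]


props = read_properties(WORKFLOW)
print(check_new_api(props))  # []
```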
7. Delete the existing jar in lib and replace it with the jar under test
# rm -rf lib/*
# cp /home/szc/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar lib/
8. Upload the map-reduce directory to /user/root/oozie-apps on HDFS
# cd ..
# /home/szc/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put map-reduce/ /user/root/oozie-apps/
9. Return to the Oozie root and run the job
# cd ..
# bin/oozie job -oozie http://192.168.57.141:11000/oozie -config oozie-apps/map-reduce/job.properties -run
When it finishes, the results can be found on HDFS in the output directory we specified (/user/root/output)
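One way to check those results is to compute the expected output locally. WordCount's TokenizerMapper splits lines with java.util.StringTokenizer, and the default text output writes word<TAB>count lines sorted by key, so for a single reducer a small Python sketch can reproduce the expected contents of the reducer's output file (typically part-r-00000). The sample input is made up, since the contents of wordcount.txt are not given; expected_wordcount is a hypothetical helper.

```python
import re
from collections import Counter

# java.util.StringTokenizer's default delimiters: space, tab, newline, CR, form feed.
DELIMITERS = re.compile(r"[ \t\n\r\f]+")


def expected_wordcount(text):
    """Local stand-in for WordCount with one reducer: word<TAB>count lines sorted by word."""
    counts = Counter(w for w in DELIMITERS.split(text) if w)
    return "".join(f"{word}\t{n}\n" for word, n in sorted(counts.items()))


sample = "hello oozie\nhello hadoop\n"  # made-up input, not the real wordcount.txt
print(expected_wordcount(sample), end="")
```

The result can be compared with the output of bin/hadoop fs -cat on the part file under /user/root/output.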
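Conclusion
To sum up Oozie, the Hadoop component for scheduling workflows and time-triggered jobs:
Functional modules:
1. workflow: executes flow nodes in order; supports fork and join
2. coordinator: triggers workflows on a schedule and controls when the job starts and ends
Common workflow nodes:
1. Control flow nodes: control the start and end of the workflow and its execution path
2. Action nodes: perform the actual work, such as copying files or running scripts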
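The two node categories map directly onto element names in workflow.xml: start, end, kill, decision, fork and join are control flow nodes, while each <action> is an action node. The sketch below sorts the top-level elements of a skeleton of the two-action shell workflow into those categories; classify is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Control flow node types in an Oozie workflow; action nodes are <action> elements.
CONTROL_FLOW = {"start", "end", "kill", "decision", "fork", "join"}

WORKFLOW = """\
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
  <start to="shell-node1"/>
  <action name="shell-node1"><ok to="shell-node2"/><error to="fail"/></action>
  <action name="shell-node2"><ok to="end"/><error to="fail"/></action>
  <kill name="fail"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>
"""


def classify(xml_text):
    """Split the top-level workflow nodes into (control flow, action) name lists."""
    control, actions = [], []
    for child in ET.fromstring(xml_text):
        tag = child.tag.split("}")[-1]  # strip the '{namespace}' prefix
        name = child.get("name", tag)   # <start> has no name attribute
        if tag in CONTROL_FLOW:
            control.append(name)
        elif tag == "action":
            actions.append(name)
    return control, actions


print(classify(WORKFLOW))
# (['start', 'fail', 'end'], ['shell-node1', 'shell-node2'])
```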