When using Oozie, you can run two actions in parallel with a fork node, and you can invoke one workflow from another with a sub-workflow action.
This post shows how to use fork and sub-workflow together in the same Oozie workflow.
The project files are laid out as follows:
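The original figure is not available; reconstructed from the paths used throughout the post, the layout under oozie-apps/subwf_fork/ looks roughly like this (fork2's workflow.xml is not listed in the original and is presumed analogous to fork1's):

```
subwf_fork/
├── job.properties
├── fork/
│   ├── p1.sh
│   └── workflow.xml
├── fork1/
│   ├── p1.sh
│   └── workflow.xml
└── fork2/
    ├── p1.sh
    └── workflow.xml
```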
fork:
p1.sh
#!/bin/bash
date > date.log
/home/hadoop/cdh/hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -put -f date.log /user/hadoop/
workflow.xml
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<!-- <argument>my_output=Hello Oozie</argument> -->
<file>/user/hadoop/oozie-apps/subwf_fork/fork/${EXEC}#${EXEC}</file>
<capture-output/>
</shell>
<ok to="fork"/>
<error to="fail"/>
</action>
<fork name="fork">
<path start="fork1"/>
<path start="fork2"/>
</fork>
<action name="fork1">
<sub-workflow>
<app-path>/user/hadoop/oozie-apps/subwf_fork/fork1/workflow.xml</app-path>
<propagate-configuration/>
</sub-workflow>
<ok to="joining"/>
<error to="fail"/>
</action>
<action name="fork2">
<sub-workflow>
<app-path>/user/hadoop/oozie-apps/subwf_fork/fork2/workflow.xml</app-path>
<propagate-configuration/>
</sub-workflow>
<ok to="joining"/>
<error to="fail"/>
</action>
<join name="joining" to="end"/>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
- fork: defines paths that run in parallel
- path: the action each parallel path starts from
- sub-workflow: invokes another workflow application
- propagate-configuration: propagates the parent workflow's configuration down to the sub-workflow
- join (here named "joining"): waits for every fork path to finish before the flow continues

For details, see the official documentation: https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.6_Sub-workflow_Action
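Besides propagate-configuration, the sub-workflow action also accepts an optional configuration block whose properties override the propagated ones. A minimal sketch (the overridden property here is just an illustration, not part of the original post):

```xml
<sub-workflow>
    <app-path>/user/hadoop/oozie-apps/subwf_fork/fork1/workflow.xml</app-path>
    <propagate-configuration/>
    <!-- properties defined here take precedence over the propagated parent configuration -->
    <configuration>
        <property>
            <name>queueName</name>
            <value>default</value>
        </property>
    </configuration>
</sub-workflow>
```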
fork1:
p1.sh
#!/bin/bash
/home/hadoop/cdh/hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -cat /user/hadoop/date.log
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<!-- <argument>my_output=Hello Oozie</argument> -->
<file>/user/hadoop/oozie-apps/subwf_fork/fork1/${EXEC}#${EXEC}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
fork2:
p1.sh
#!/bin/bash
echo 'Successfully!!' | /home/hadoop/cdh/hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -put -f - /user/hadoop/oozie.log
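The original post does not list a workflow.xml for fork2, but one is required at the app-path the parent workflow references. Presumably it mirrors fork1's, differing only in the file path; a sketch under that assumption:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${EXEC}</exec>
            <!-- assumed path: fork2's copy of p1.sh on HDFS -->
            <file>/user/hadoop/oozie-apps/subwf_fork/fork2/${EXEC}#${EXEC}</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```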
job.properties
# HDFS address
nameNode=hdfs://hadoop01:8020
# ResourceManager address
jobTracker=hadoop03:8032
# queue name
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/subwf_fork/fork/workflow.xml
EXEC=p1.sh
Upload the files to the Linux host, sync them to HDFS, and submit the job:
hdfs dfs -put /home/hadoop/cdh/oozie-4.0.0-cdh5.3.6/oozie-apps/subwf_fork /user/hadoop/oozie-apps/
oozie job -oozie http://hadoop03:11000/oozie -config /home/hadoop/cdh/oozie-4.0.0-cdh5.3.6/oozie-apps/subwf_fork/job.properties -run
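If you prefer the command line over the web UI, the run command above prints a job ID that you can poll with oozie job -info (the ID below is a placeholder; this requires the Oozie server to be running):

```
oozie job -oozie http://hadoop03:11000/oozie -info 0000001-200101000000000-oozie-hado-W
```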
Open the web UI at http://hadoop03:11000/oozie/ and you can see the parent workflow from the fork directory running.
Click the refresh button and you can see the two sub-workflows start running as well.
Click Done Jobs to confirm that the job completed successfully.
You can verify the results:
hdfs dfs -cat /user/hadoop/date.log
hdfs dfs -cat /user/hadoop/oozie.log