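Oozie overview: what Oozie can do
Oozie format: how to write an Oozie job
Oozie execution: how to run Oozie
Oozie overview:
Oozie is a scheduler built on Hadoop. Scheduling flows are written in XML, and it can schedule MapReduce, Pig, Hive, shell, jar jobs, and so on.
Its main features are:
Workflow: runs flow nodes in order; supports fork (branch into several nodes) and join (merge several nodes back into one)
Coordinator: triggers a workflow on a schedule
Bundle Job: groups multiple coordinators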
Oozie format:
An Oozie job needs two files: job.properties and workflow.xml (or coordinator.xml / bundle.xml)
I. job.properties defines the environment variables
Property | Example value | Description |
nameNode | hdfs://xxx5:8020 | HDFS address |
jobTracker | xxx5:8034 | JobTracker address |
queueName | default | queue used by Oozie jobs |
examplesRoot | examples | global root directory |
oozie.use.system.libpath | true | whether to load the Oozie system share lib |
oozie.libpath | share/lib/user | user lib directory |
oozie.wf.application.path | ${nameNode}/user/${user.name}/... | HDFS path of the Oozie workflow |
Note: the application path property depends on the job type:
workflow: oozie.wf.application.path
coordinator: oozie.coord.application.path
bundle: oozie.bundle.application.path
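As a minimal sketch of the rule above, the following Python helper renders a job.properties file and picks the application path key by job type. The function name `render_job_properties` and the application path passed in the example are hypothetical; the default values are the placeholder values from the table.

```python
# Maps each Oozie job type to the property that must point at its XML definition.
APP_PATH_KEYS = {
    "workflow": "oozie.wf.application.path",
    "coordinator": "oozie.coord.application.path",
    "bundle": "oozie.bundle.application.path",
}


def render_job_properties(job_type, app_path, extra=None):
    """Return job.properties text for the given job type (hypothetical helper)."""
    if job_type not in APP_PATH_KEYS:
        raise ValueError(f"unknown job type: {job_type}")
    # Placeholder values taken from the table; replace them with your cluster's.
    props = {
        "nameNode": "hdfs://xxx5:8020",
        "jobTracker": "xxx5:8034",
        "queueName": "default",
        "examplesRoot": "examples",
        "oozie.use.system.libpath": "true",
    }
    props.update(extra or {})
    # Only one application path key is written, matching the job type.
    props[APP_PATH_KEYS[job_type]] = app_path
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # "my-app" is an assumed directory name, used only for illustration.
    print(render_job_properties(
        "coordinator",
        "${nameNode}/user/${user.name}/${examplesRoot}/my-app",
    ))
```

Keeping the three keys in one mapping makes it harder to submit a coordinator with a workflow path key by mistake.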
II. XML
1.workflow:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="wf-example1">
<start to="pig-node"/>
<action name="pig-node">
<pig>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="hdfs://xxx5/user/hadoop/appresult" />
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
<value>false</value>
</property>
</configuration>
<script>test.pig</script>
<param>filepath=${filepath}</param>
</pig>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>
Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
</message>
</kill>
<end name="end"/>
</workflow-app>
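A workflow only runs if the XML is well-formed and every transition points at a node that exists. The sketch below checks both locally before the file is uploaded to HDFS. It is a simplified check, not Oozie's own schema validation: the function name `dangling_transitions` is hypothetical, and it only looks at `to` attributes (start, ok, error and similar transitions).

```python
import xml.etree.ElementTree as ET


def dangling_transitions(workflow_xml):
    """Return sorted transition targets that name no node in the workflow.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed,
    for example when a <property> element is never closed.
    """
    root = ET.fromstring(workflow_xml)
    # Node names are declared on the direct children of <workflow-app>.
    names = {child.get("name") for child in root if child.get("name")}
    # Transitions (<start>, <ok>, <error>, ...) carry a "to" attribute.
    targets = {el.get("to") for el in root.iter() if el.get("to")}
    return sorted(targets - names)


SAMPLE = """
<workflow-app xmlns="uri:oozie:workflow:0.2" name="wf-example1">
  <start to="pig-node"/>
  <action name="pig-node">
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <end name="end"/>
</workflow-app>
"""

if __name__ == "__main__":
    # The sample omits the <kill name="fail"> node on purpose.
    print(dangling_transitions(SAMPLE))
```

An empty list means every transition resolves; any name in the list is a node that still has to be defined.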
2.coordinator
<coordinator-app name="cron-coord" frequency="${coord:hours(6)}" start="${start}" end="${end}"
timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
<action>
<workflow>
<app-path>${nameNode}/user/${coord:user()}/${examplesRoot}/wpath</app-path>
<configuration>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
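Note: the coordinator is set to UTC, which is 8 hours behind Beijing time, so subtract 8 hours from the intended Beijing execution time when setting the start and end values.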
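The 8-hour adjustment can be sketched in Python as follows. The helper name `beijing_to_oozie_utc` and the input format are assumptions for illustration; the output follows the `yyyy-MM-ddTHH:mmZ` form that coordinator start and end values use. Beijing time has no daylight saving, so a fixed +08:00 offset is enough.

```python
from datetime import datetime, timedelta, timezone

# Beijing time (China Standard Time) is a fixed UTC+8 offset.
BEIJING = timezone(timedelta(hours=8))


def beijing_to_oozie_utc(local_str):
    """Convert 'YYYY-MM-DD HH:MM' Beijing time to a UTC string like 2024-03-01T01:30Z."""
    local = datetime.strptime(local_str, "%Y-%m-%d %H:%M").replace(tzinfo=BEIJING)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


if __name__ == "__main__":
    # 09:30 in Beijing is 01:30 UTC on the same day.
    print(beijing_to_oozie_utc("2024-03-01 09:30"))  # 2024-03-01T01:30Z
```

Times before 08:00 Beijing time fall on the previous UTC day, so the date part can change as well as the hour.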
3.bundle
<bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>
<controls>
<kick-off-time>${kickOffTime}</kick-off-time>
</controls>
<coordinator name='coordJobFromBundle1' >
<app-path>${appPath}</app-path>
<configuration>
<property>
<name>startTime1</name>
<value>${START_TIME}</value>
</property>
<property>
<name>endTime1</name>
<value>${END_TIME}</value>
</property>
</configuration>
</coordinator>
<coordinator name='coordJobFromBundle2' >
<app-path>${appPath2}</app-path>
<configuration>
<property>
<name>startTime2</name>
<value>${START_TIME2}</value>
</property>
<property>
<name>endTime2</name>
<value>${END_TIME2}</value>
</property>
</configuration>
</coordinator>
</bundle-app>
oozie hive
<action name="hive-app">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<script>hivescript.q</script>
<param>yyyymmdd=${yyyymmdd}</param>
<param>yesterday=${yesterday}</param>
<param>lastmonth=${lastmonth}</param>
</hive>
<ok to="result-stat-join"/>
<error to="fail"/>
</action>
Running Oozie
Start a job:
oozie job -oozie http://xxx5:11000/oozie -config job.properties -run
Stop a job:
oozie job -oozie http://localhost:8080/oozie -kill 14-20090525161321-oozie-joe
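Note: stopping a job sometimes fails with a permission error. In that case, add the following proxy-user settings to oozie-site.xml (the hadoop.proxyuser.* entries normally belong in Hadoop's core-site.xml):
hadoop.proxyuser.oozie.groups *
hadoop.proxyuser.oozie.hosts *
oozie.service.ProxyUserService.proxyuser.hadoop.hosts *
oozie.service.ProxyUserService.proxyuser.hadoop.groups *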
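The start and stop commands can also be wrapped in a small script. The sketch below only builds the same argument lists as the commands shown in this section and, optionally, runs the submit command; the function names are hypothetical, and it assumes the oozie client is installed and on PATH.

```python
import subprocess


def oozie_run_cmd(oozie_url, config="job.properties"):
    """Build the argument list for submitting and starting a job."""
    return ["oozie", "job", "-oozie", oozie_url, "-config", config, "-run"]


def oozie_kill_cmd(oozie_url, job_id):
    """Build the argument list for killing a job by its id."""
    return ["oozie", "job", "-oozie", oozie_url, "-kill", job_id]


def submit(oozie_url, config="job.properties"):
    """Run the submit command and return whatever the client prints.

    Raises subprocess.CalledProcessError if the client exits with an error.
    """
    result = subprocess.run(
        oozie_run_cmd(oozie_url, config),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without a cluster.
    print(" ".join(oozie_run_cmd("http://xxx5:11000/oozie")))
    print(" ".join(oozie_kill_cmd("http://xxx5:11000/oozie", "14-20090525161321-oozie-joe")))
```

Passing the arguments as a list avoids shell quoting problems when the URL or the job id comes from another program.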