Oozie Study Notes

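Oozie triggers a job remotely; when that job finishes, control returns to Oozie, which then runs the next one.

Oozie executes a workflow as a DAG: a node runs only after the node before it has finished.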

 

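Workflow definitions can be parameterized with variables such as ${inputDir}. Values for these parameters must be supplied when the job is submitted.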
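The idea of "parameters must be supplied at submission time" can be illustrated with a small Python sketch. It only mimics the behaviour with `string.Template`, which happens to use the same `${name}` syntax; Oozie itself resolves parameters through its EL engine. The fragment text and the `resolve` helper are hypothetical.

```python
from string import Template

# Hypothetical workflow fragment; ${inputDir} and ${outputDir} are its parameters.
WORKFLOW_FRAGMENT = "<value>${inputDir}</value><value>${outputDir}</value>"


def resolve(fragment: str, params: dict) -> str:
    """Substitute ${name} placeholders; raises KeyError if a parameter is missing."""
    return Template(fragment).substitute(params)


# All parameters supplied: substitution succeeds.
resolved = resolve(WORKFLOW_FRAGMENT, {"inputDir": "/data/in", "outputDir": "/data/out"})
```

Calling `resolve` without `outputDir` raises `KeyError`, which is the analogue of a submission being rejected because a parameter was left undefined.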

Oozie workflows contain control flow nodes and action nodes.

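In other words, a workflow is made up of control flow nodes and action nodes.

Control flow nodes define where the workflow starts, ends, and fails.

Action nodes define how tasks are triggered and executed.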

 

 

Workflow Diagram:

hPDL Workflow Definition:

<workflow-app name='wordcount-wf' xmlns="uri:oozie:workflow:0.1">
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.myorg.WordCount.Map</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.myorg.WordCount.Reduce</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to='end'/>
        <error to='kill'/>
    </action>
    <kill name='kill'>
        <message>Something went wrong: ${wf:errorCode('wordcount')}</message>
    </kill>
    <end name='end'/>
</workflow-app>

 

Flow control within a workflow is implemented using decision, fork and join nodes. Cycles in workflows are not supported.


Possible states for a workflow job are: PREP, RUNNING, SUSPENDED, SUCCEEDED, KILLED and FAILED.


Oozie can make HTTP callback notifications on action start/end/failure events and on workflow end/failure events, so an external system can be notified of these events over HTTP.
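On the receiving side, a callback endpoint could look roughly like the Python sketch below. It assumes the notification URL configured for the job passes the job id and status as query parameters, for example `http://myhost:8080/notify?jobId=...&status=...`. The host, port, path and parameter names are assumptions for illustration, not something these notes specify.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def parse_notification(path: str) -> dict:
    """Extract single-valued query parameters from a callback request path."""
    query = parse_qs(urlparse(path).query)
    return {key: values[0] for key, values in query.items()}


class NotificationHandler(BaseHTTPRequestHandler):
    """Handles GET callbacks and prints the event that was reported."""

    def do_GET(self):
        event = parse_notification(self.path)
        print("notification:", event)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the (hypothetical) callback listener; blocks until interrupted."""
    HTTPServer(("", port), NotificationHandler).serve_forever()
```

Calling `serve()` starts the listener. `parse_notification` can be tried on its own with a path such as `/notify?jobId=0000001-wf&status=SUCCEEDED`.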


Workflow Definition


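A workflow definition is a DAG made of control flow nodes (start, end, decision, fork, join, kill) and action nodes (map-reduce, pig, etc.).

The control flow nodes (start, end, decision, fork, join, kill) are what implement the DAG's control flow.

A workflow must not contain cycles; otherwise deployment fails.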
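The "no cycles" rule is the standard DAG property, and a check for it can be sketched in Python with a depth-first search. This is only an illustration of the idea, not Oozie's actual validation code; the `transitions` dictionary (node name to list of target node names) is a hypothetical representation of a workflow.

```python
def has_cycle(transitions: dict) -> bool:
    """Return True if the transition graph contains a cycle (DFS with three colours)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {}

    def visit(node) -> bool:
        color[node] = GRAY
        for target in transitions.get(node, []):
            state = color.get(target, WHITE)
            if state == GRAY:  # back edge: we returned to a node on the current path
                return True
            if state == WHITE and visit(target):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(transitions))


# The wordcount workflow from these notes, as a transition graph.
wordcount = {"start": ["wordcount"], "wordcount": ["end", "kill"], "kill": [], "end": []}
# A hypothetical workflow in which job "b" transitions back to job "a".
looping = {"start": ["a"], "a": ["b"], "b": ["a"]}
```

Here `has_cycle(wordcount)` is false and `has_cycle(looping)` is true, so only the first graph would be an acceptable workflow.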

Workflow Nodes


  • Control flow nodes: nodes that control the start and end of the workflow and the workflow job execution path.

  • Action nodes: nodes that trigger the execution of a computation/processing task.



Node names and transitions must conform to the pattern [a-zA-Z][\-_a-zA-Z0-9]* and be at most 20 characters long.
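The naming rule can be checked with a short Python sketch. It assumes the character class in the pattern is meant to end in `0-9`, and it treats the 20-character limit as inclusive; `is_valid_node_name` is a hypothetical helper, not part of Oozie.

```python
import re

# Pattern from the rule above: a letter, then letters, digits, '-' or '_'.
NODE_NAME = re.compile(r"[a-zA-Z][\-_a-zA-Z0-9]*")
MAX_LENGTH = 20


def is_valid_node_name(name: str) -> bool:
    """True if the whole name matches the pattern and is at most 20 characters."""
    return len(name) <= MAX_LENGTH and NODE_NAME.fullmatch(name) is not None
```

For example, `wordcount` and `first-parallel_job2` pass, while `2ndjob` (starts with a digit) and any 21-character name fail.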


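Start Control Node

The start node is the entry point of a workflow job: when the job starts, it automatically transitions to the node named in start.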


Syntax:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
  ...
  <start to="[NODE-NAME]"/>
  ...
</workflow-app>

The to attribute is the name of the first workflow node to execute.

Example:

<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <start to="firstHadoopJob"/>
    ...
</workflow-app>


End Control Node

The end node marks the end of a workflow job; it indicates that the workflow job has completed successfully.

When a workflow job reaches the end it finishes successfully (SUCCEEDED).

If one or more actions started by the workflow job are executing when the end node is reached, the actions will be killed. In this scenario the workflow job is still considered as successfully run.

A workflow definition must have one end node.


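In short: every workflow must have an end node, and reaching it means the workflow job ran successfully. As soon as end is reached, any actions the workflow started that are still running are killed, and the job is still treated as having finished successfully.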

Syntax:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <end name="[NODE-NAME]"/>
    ...
</workflow-app>

The name attribute is the name that transitions use to reach the end node and finish the workflow job.

Example:

<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <end name="end"/>
</workflow-app>


Kill Control Node

The kill node allows a workflow job to kill itself.

When a workflow job reaches the kill it finishes in error (KILLED).

If one or more actions started by the workflow job are executing when the kill node is reached, the actions will be killed.

A workflow definition may have zero or more kill nodes.

In short: a kill node terminates the job in an error state, a workflow may have zero or more kill nodes, and the message given in a kill node is written to the log.

Syntax:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <kill name="[NODE-NAME]">
        <message>[MESSAGE-TO-LOG]</message>
    </kill>
    ...
</workflow-app>

The name attribute in the kill node is the name of the Kill action node.

The content of the message element will be logged as the kill reason for the workflow job.

A kill node does not have transition elements because it ends the workflow job, as KILLED.

Example:

<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <kill name="killBecauseNoInput">
        <message>Input unavailable</message>
    </kill>
    ...
</workflow-app>


Decision Control Node

A decision node enables a workflow to make a selection on the execution path to follow.

The behavior of a decision node can be seen as a switch-case statement.

A decision node consists of a list of predicate-transition pairs plus a default transition. Predicates are evaluated in order of appearance until one of them evaluates to true and the corresponding transition is taken. If none of the predicates evaluates to true the default transition is taken.

Predicates are JSP Expression Language (EL) expressions (refer to section 4.2 of this document) that resolve into a boolean value, true or false. For example:

    ${fs:fileSize('/usr/foo/myinputdir') gt 10 * GB}

In short: a decision node works like a switch-case statement in C. It is made up of zero or more case elements plus a default. The cases are checked in order until one evaluates to true, and the workflow jumps to that case's to node; otherwise the default is taken. The predicates are written as JSP EL expressions.
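The first-match-wins behaviour can be sketched in Python. This is an analogy only: the predicates here are plain Python callables rather than EL expressions, `decide` is a hypothetical helper, and the 50 MB output size is a made-up input. The node names are borrowed from the decision example in these notes.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def decide(cases, default):
    """Return the target of the first case whose predicate is true, else the default."""
    for predicate, target in cases:
        if predicate():
            return target
    return default


# Hypothetical size of the second job's output directory.
output_size = 50 * MB

next_node = decide(
    [
        (lambda: output_size > 10 * GB, "reconsolidatejob"),
        (lambda: output_size < 100 * MB, "rexpandjob"),
    ],
    default="end",
)
```

With a 50 MB output the first predicate is false and the second is true, so `next_node` is `rexpandjob`. With no matching case, `decide` falls through to the default.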

Syntax:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <decision name="[NODE-NAME]">
        <switch>
            <case to="[NODE_NAME]">[PREDICATE]</case>
            ...
            <case to="[NODE_NAME]">[PREDICATE]</case>
            <default to="[NODE_NAME]"/>
        </switch>
    </decision>
    ...
</workflow-app>

The name attribute in the decision node is the name of the decision node.

Each case element contains a predicate and a transition name. The predicate ELs are evaluated in order until one returns true and the corresponding transition is taken.

The default element indicates the transition to take if none of the predicates evaluates to true.

All decision nodes must have a default element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.

Example:

<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <decision name="mydecision">
        <switch>
            <case to="reconsolidatejob">
              ${fs:fileSize(secondjobOutputDir) gt 10 * GB}
            </case>
            <case to="rexpandjob">
              ${fs:fileSize(secondjobOutputDir) lt 100 * MB}
            </case>
            <case to="recomputejob">
              ${ hadoop:counters('secondjob')[RECORDS][REDUCE_OUT] lt 1000000 }
            </case>
            <default to="end"/>
        </switch>
    </decision>
    ...
</workflow-app>



Fork and Join Control Nodes

A fork node splits one path of execution into multiple concurrent paths of execution.

A join node waits until every concurrent execution path of a previous fork node arrives at it.

The fork and join nodes must be used in pairs. The join node assumes concurrent execution paths are children of the same fork node.


That is, fork and join always come as a pair: fork fans the current path out into several paths, and join waits until every one of those paths has arrived.
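As a loose analogy, the fork/join pairing behaves like submitting several tasks to a thread pool and waiting for all of them before continuing. The Python sketch below shows that pattern with `concurrent.futures`; it is not how Oozie runs actions, and `run_fork_join` and the lambda "jobs" are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_fork_join(paths, next_node):
    """Run every path concurrently (fork), wait for all (join), then call next_node."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        futures = [pool.submit(path) for path in paths]  # fork: start all paths
        wait(futures)                                    # join: block until all finish
    results = [future.result() for future in futures]
    return next_node(results)


# Two hypothetical parallel jobs followed by a node that collects their results.
outcome = run_fork_join(
    [lambda: "firstparalleljob done", lambda: "secondparalleljob done"],
    next_node=lambda results: results,
)
```

The node after the join only runs once both paths have produced a result, which mirrors the rule that a join waits for every path of its fork.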


Syntax:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="[FORK-NODE-NAME]">
        <path start="[NODE-NAME]" />
        ...
        <path start="[NODE-NAME]" />
    </fork>
    ...
    <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]" />
    ...
</workflow-app>

The name attribute in the fork node is the name of the workflow fork node. The start attribute in each path element of the fork node indicates the name of a workflow node that will be part of the concurrent execution paths.

The name attribute in the join node is the name of the workflow join node. The to attribute in the join node indicates the name of the workflow node that will be executed after all concurrent execution paths of the corresponding fork arrive at the join node.

Example:

<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="forking">
        <path start="firstparalleljob"/>
        <path start="secondparalleljob"/>
    </fork>
    <action name="firstparalleljob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job1.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <action name="secondparalleljob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job2.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <join name="joining" to="nextaction"/>
    ...
</workflow-app>

By default, Oozie validates that any forking in a workflow is valid and won't lead to incorrect behavior or instability. However, if Oozie is preventing a workflow from being submitted and you are certain that it should work, you can disable fork/join validation so that Oozie will accept the workflow. To disable this validation for a specific workflow, set oozie.wf.validate.ForkJoin to false in the job.properties file. To disable it for all workflows, set oozie.validate.ForkJoin to false in the oozie-site.xml file. The two properties are combined with AND: validation is disabled if either or both are set to false, and enabled only if both are set to true (or not specified).
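The way the two settings combine can be written out as a small truth function. The Python sketch below only restates the rule described above; `forkjoin_validation_enabled` is a hypothetical helper, with `None` standing for "not specified".

```python
def forkjoin_validation_enabled(job_setting=None, site_setting=None) -> bool:
    """AND of the per-job and site-wide settings; an unspecified setting counts as true."""
    job = True if job_setting is None else job_setting
    site = True if site_setting is None else site_setting
    return job and site
```

So validation stays on when neither property is set, and setting either one to false is enough to turn it off.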

