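Background

Ambari is a powerful management platform for big data clusters. In practice, the components we run are not limited to the ones shipped with the official distribution. So how do we integrate other components into Ambari?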

Stacks & Services

A stack is a collection of services. Ambari can define multiple stacks in different versions. For example, HDP 3.1 is a stack that bundles specific versions of services such as Hadoop and Spark.

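Figure: relationship between stacks and services

Stack-related configuration in Ambari is located at:

Source tree: ambari-server/src/main/resources/stacks
After installation: /var/lib/ambari-server/resources/stacks

If several stacks need the same service configuration, place it in common-services. Anything stored in the common-services directory can be used directly or inherited by any stack version.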

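The common-services directory

The common-services directory is located at ambari-server/src/main/resources/common-services in the source tree. A service that has to be shared across multiple stacks should be defined there. Typically common-services holds the configuration that is common to each service, for example the configuration directory described later, which defines the configuration items a component exposes in Ambari.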

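Service directory structure

The directory structure of a service is shown in the figure below:

Figure: service directory structure

Taking HDFS as the example in the figure, the parts of a service are:

Service ID: the service name, usually in upper case.
configuration: holds the service's configuration files in XML format. These files describe how the service's configuration items are presented on Ambari's configuration page, that is, which items the graphical configuration page of the service contains.
package: contains several subdirectories that hold the service's control scripts (start, stop, custom actions and so on) and the templates for the component's configuration files.
alerts.json: the service's alert definitions.
kerberos.json: the configuration for running the service with Kerberos.
metainfo.xml: the most important file of a service. It defines the service name, version, description, control script names and more.
metrics.json: the service's metrics definitions.
widgets.json: the configuration of the service's monitoring widgets in the UI.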
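The layout above can be sketched as a small scaffolding script. This is only an illustration using the Python standard library: SAMPLESRV is a hypothetical service ID, scaffold_service is a helper invented for this example, and the generated files are empty placeholders that would still need valid XML or JSON content before Ambari could load them.

```python
import os
import tempfile

# Directory and file names follow the list above.
SERVICE_DIRS = ["configuration", "package/scripts", "package/files", "package/templates"]
SERVICE_FILES = ["metainfo.xml", "alerts.json", "kerberos.json", "metrics.json", "widgets.json"]


def scaffold_service(root, service_id):
    """Create an empty service skeleton under root and return its path."""
    service_dir = os.path.join(root, service_id)
    for sub in SERVICE_DIRS:
        os.makedirs(os.path.join(service_dir, *sub.split("/")))
    for name in SERVICE_FILES:
        # Empty placeholder; real definitions go here later.
        open(os.path.join(service_dir, name), "w").close()
    return service_dir


if __name__ == "__main__":
    path = scaffold_service(tempfile.mkdtemp(), "SAMPLESRV")
    for dirpath, _dirs, files in sorted(os.walk(path)):
        print(dirpath, sorted(files))
```

Running it against a temporary directory is a quick way to see the tree before copying the skeleton into a stack.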

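metainfo.xml in detail

Services are not the only ones with a metainfo.xml; stacks have one too. For a stack, metainfo.xml is mainly used to declare the inheritance relationship between stacks.

Basic entries of a service metainfo.xml: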

<services>
    <service>
        <name>HDFS</name>
        <displayName>HDFS</displayName>
        <comment>Hadoop distributed file system.</comment>
        <version>2.1.0.2.0</version>
    </service>
</services>

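The contents of displayName, comment and version are shown in the first step of the service installation wizard, in the list where you tick the services to install.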
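To check what a metainfo.xml declares before handing it to Ambari, the basic entries can be read with the standard library. This is a minimal sketch, not Ambari's own parser; list_services is a helper name made up for this example and the XML is the snippet from above.

```python
import xml.etree.ElementTree as ET

METAINFO = """
<services>
    <service>
        <name>HDFS</name>
        <displayName>HDFS</displayName>
        <comment>Hadoop distributed file system.</comment>
        <version>2.1.0.2.0</version>
    </service>
</services>
"""


def list_services(xml_text):
    """Return one dict per <service> element with its basic entries."""
    root = ET.fromstring(xml_text)
    services = []
    for service in root.iter("service"):
        services.append({
            "name": service.findtext("name"),
            "displayName": service.findtext("displayName"),
            "comment": service.findtext("comment"),
            "version": service.findtext("version"),
        })
    return services


if __name__ == "__main__":
    for entry in list_services(METAINFO):
        print(entry)
```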

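component configuration

The component section defines, for each component of the service, how it is deployed, which control script it uses and so on. For example, the components of the HDFS service include the NameNode, DataNode, Secondary NameNode and the HDFS client, and each of them is configured in its own component entry.

The NameNode component of HDFS: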

<component>
    <name>NAMENODE</name>
    <displayName>NameNode</displayName>
    <category>MASTER</category>
    <cardinality>1-2</cardinality>
    <versionAdvertised>true</versionAdvertised>
    <reassignAllowed>true</reassignAllowed>
    <commandScript>
        <script>scripts/namenode.py</script>
        <scriptType>PYTHON</scriptType>
        <timeout>1800</timeout>
    </commandScript>
    <logs>
        <log>
            <logId>hdfs_namenode</logId>
            <primary>true</primary>
        </log>
        <log>
            <logId>hdfs_audit</logId>
        </log>
    </logs>
    <customCommands>
        <customCommand>
            <name>DECOMMISSION</name>
            <commandScript>
                <script>scripts/namenode.py</script>
                <scriptType>PYTHON</scriptType>
                <timeout>600</timeout>
            </commandScript>
        </customCommand>
        <customCommand>
            <name>REBALANCEHDFS</name>
            <background>true</background>
            <commandScript>
                <script>scripts/namenode.py</script>
                <scriptType>PYTHON</scriptType>
            </commandScript>
        </customCommand>
    </customCommands>
</component>

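The entries mean the following:

name: the component name.
displayName: the name shown in the UI.
category: the component type, one of MASTER, SLAVE or CLIENT. MASTER and SLAVE components have a state (started or stopped); CLIENT components are stateless.
cardinality: how many instances of the component can be installed. Supported formats include 1 (exactly one instance), 1-2 (one to two instances) and 1+ (one or more instances).
commandScript: the control script of the component.
logs: hooks the component's logs into the Log Search service.

The entries inside commandScript mean:

script: the relative path of the component's control script.
scriptType: the script type; Python scripts are what we normally use.
timeout: the timeout for script execution, in seconds.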
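The three cardinality formats listed above can be sketched as a small parser. This is an illustration only and not how Ambari validates layouts internally; parse_cardinality and is_allowed are names invented here, and any format other than the three mentioned above is rejected.

```python
import re


def parse_cardinality(spec):
    """Return (minimum, maximum); maximum is None when there is no upper bound."""
    spec = spec.strip()
    exact = re.fullmatch(r"(\d+)", spec)
    if exact:
        count = int(exact.group(1))
        return count, count
    bounded = re.fullmatch(r"(\d+)-(\d+)", spec)
    if bounded:
        low, high = int(bounded.group(1)), int(bounded.group(2))
        if low > high:
            raise ValueError("lower bound exceeds upper bound: %r" % spec)
        return low, high
    open_ended = re.fullmatch(r"(\d+)\+", spec)
    if open_ended:
        return int(open_ended.group(1)), None
    raise ValueError("unsupported cardinality: %r" % spec)


def is_allowed(spec, instance_count):
    """Check whether instance_count satisfies the cardinality spec."""
    low, high = parse_cardinality(spec)
    return instance_count >= low and (high is None or instance_count <= high)


if __name__ == "__main__":
    for spec in ("1", "1-2", "1+"):
        print(spec, parse_cardinality(spec))
```

With the NameNode definition above (1-2), is_allowed("1-2", 3) is False, which matches the intent that a third NameNode cannot be added.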

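customCommands configuration

This section defines custom commands for a component, that is, commands other than the built-in ones such as start and stop.
The REBALANCEHDFS command of HDFS serves as the example.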

<customCommand>
    <name>REBALANCEHDFS</name>
    <background>true</background>
    <commandScript>
        <script>scripts/namenode.py</script>
        <scriptType>PYTHON</scriptType>
    </commandScript>
</customCommand>

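This entry adds a new item to the menu at the top right of the service management page. Its fields have the same meaning as in commandScript. Setting background to true runs the command in the background.
You may wonder which function in namenode.py Ambari calls when the menu item of this custom command is clicked.
Ambari calls the Python method whose name equals the customCommand name converted to lower case, as shown below.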

def rebalancehdfs(self, env):
  ...
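The naming rule can be modelled in a few lines. This is a simplified stand-in and not the dispatch code of resource_management.Script: the NameNode class below is a plain Python class, and dispatch is a helper invented for this example.

```python
class NameNode(object):
    """Plain stand-in for a control script class; it does not inherit from Script."""

    def start(self, env):
        return "started"

    def rebalancehdfs(self, env):
        return "rebalance requested"


def dispatch(component, command_name, env=None):
    """Call the method named like the command, converted to lower case."""
    method = getattr(component, command_name.lower(), None)
    if not callable(method):
        raise AttributeError("no handler for command %s" % command_name)
    return method(env)


if __name__ == "__main__":
    print(dispatch(NameNode(), "REBALANCEHDFS"))
```

The practical consequence is that a customCommand named MY_ACTION needs a method called my_action in the script referenced by its commandScript.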

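osSpecifics configuration

The installation package of the same service usually has a different name on each platform. This section maps package names to operating systems.
Example osSpecifics configuration for ZooKeeper: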

<osSpecifics>
    <osSpecific>
        <osFamily>amazon2015,redhat6,redhat7,suse11,suse12</osFamily>
        <packages>
            <package>
                <name>zookeeper_${stack_version}</name>
            </package>
            <package>
                <name>zookeeper_${stack_version}-server</name>
            </package>
        </packages>
    </osSpecific>
    <osSpecific>
        <osFamily>debian7,ubuntu12,ubuntu14,ubuntu16</osFamily>
        <packages>
            <package>
                <name>zookeeper-${stack_version}</name>
            </package>
            <package>
                <name>zookeeper-${stack_version}-server</name>
            </package>
        </packages>
    </osSpecific>
</osSpecifics>

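Note: name is the full name of the component's package without the version number part. The package must be discoverable on the system with apt search or yum search. If the package cannot be found, or if there is no osFamily entry for the current system, the service installation does not report an error but the package is not installed. Keep this in mind.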
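Because a missing osFamily fails silently, it can help to check the mapping yourself. The sketch below parses the ZooKeeper snippet with the standard library and looks up the packages for one OS family; packages_for is a helper invented for this example, it only does exact matching on the listed family names, and centos8 is just an arbitrary family that is absent from the snippet.

```python
import xml.etree.ElementTree as ET

OS_SPECIFICS = """
<osSpecifics>
    <osSpecific>
        <osFamily>amazon2015,redhat6,redhat7,suse11,suse12</osFamily>
        <packages>
            <package><name>zookeeper_${stack_version}</name></package>
            <package><name>zookeeper_${stack_version}-server</name></package>
        </packages>
    </osSpecific>
    <osSpecific>
        <osFamily>debian7,ubuntu12,ubuntu14,ubuntu16</osFamily>
        <packages>
            <package><name>zookeeper-${stack_version}</name></package>
            <package><name>zookeeper-${stack_version}-server</name></package>
        </packages>
    </osSpecific>
</osSpecifics>
"""


def packages_for(xml_text, os_family):
    """Return the package names declared for os_family, or [] if none match."""
    root = ET.fromstring(xml_text)
    for os_specific in root.iter("osSpecific"):
        families = [f.strip() for f in os_specific.findtext("osFamily", "").split(",")]
        if os_family in families:
            return [p.findtext("name") for p in os_specific.iter("package")]
    return []


if __name__ == "__main__":
    print(packages_for(OS_SPECIFICS, "ubuntu16"))
    print(packages_for(OS_SPECIFICS, "centos8"))
```

An empty list for your target system is the situation the note above warns about: nothing would be installed and no error would be shown.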

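Service inheritance

Take the HDP stack as an example: the HDP versions form an inheritance chain, and the component configuration of a newer HDP version inherits from the older one. The chain can be traced all the way back to HDP 2.0.6.
This explains why the configuration in common-services can be shared. It takes effect because in the base HDP stack (2.0.6) every service extends the corresponding definition in common-services.
Take the AMBARI_INFRA service as an example.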

<services>
  <service>
    <name>AMBARI_INFRA</name>
    <extends>common-services/AMBARI_INFRA/0.1.0</extends>
  </service>
</services>

The AMBARI_INFRA service in HDP inherits its configuration from AMBARI_INFRA/0.1.0 in common-services. Other services work the same way; the source code is worth a look if you are interested.
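The effect of such a chain can be pictured with a toy model: walk the extends links down to the base definition, then apply the settings from the base upwards so that newer definitions override older ones. This is only a conceptual sketch and not Ambari's merge logic; the keys, the SAMPLESRV service and its values are all hypothetical.

```python
# Hypothetical definitions; "extends" points at the parent definition.
DEFINITIONS = {
    "common-services/SAMPLESRV/1.0.0": {"extends": None, "version": "1.0.0", "comment": "base"},
    "HDP/2.0.6/SAMPLESRV": {"extends": "common-services/SAMPLESRV/1.0.0"},
    "HDP/3.1/SAMPLESRV": {"extends": "HDP/2.0.6/SAMPLESRV", "version": "1.1.0"},
}


def resolve(key, definitions):
    """Merge a definition with everything it inherits; children override parents."""
    chain = []
    seen = set()
    while key is not None:
        if key in seen:
            raise ValueError("inheritance cycle at %s" % key)
        seen.add(key)
        chain.append(definitions[key])
        key = definitions[key].get("extends")
    merged = {}
    for definition in reversed(chain):  # base first, most specific last
        for name, value in definition.items():
            if name != "extends":
                merged[name] = value
    return merged


if __name__ == "__main__":
    print(resolve("HDP/3.1/SAMPLESRV", DEFINITIONS))
```

In this model the HDP 3.1 definition only states what differs (the version) and picks up the rest from common-services.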

Disabling a service

Adding the deleted tag hides the service from the list in the Add Service wizard.

<services>
    <service>
        <name>FALCON</name>
        <version>0.10.0</version>
        <deleted>true</deleted>
    </service>
</services>

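configuration-dependencies

Lists the configuration types a component depends on. When one of those configuration types is updated, Ambari marks the component as requiring a restart.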
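The restart rule can be sketched as follows. The XML layout (config-type elements inside configuration-dependencies) reflects my understanding of metainfo.xml and should be checked against the official documentation; SAMPLE_MASTER, the config type names and needs_restart are made up for this example, and the function is not Ambari's implementation.

```python
import xml.etree.ElementTree as ET

COMPONENT = """
<component>
    <name>SAMPLE_MASTER</name>
    <configuration-dependencies>
        <config-type>sample</config-type>
        <config-type>sample-env</config-type>
    </configuration-dependencies>
</component>
"""


def needs_restart(component_xml, changed_config_types):
    """True if any changed config type is listed as a dependency of the component."""
    root = ET.fromstring(component_xml)
    dependencies = set(e.text.strip() for e in root.iter("config-type") if e.text)
    return bool(dependencies & set(changed_config_types))


if __name__ == "__main__":
    print(needs_restart(COMPONENT, ["sample"]))
    print(needs_restart(COMPONENT, ["core-site"]))
```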

Other entries

For a more detailed description, see the official documentation: https://cwiki.apache.org/confluence/display/AMBARI/Writing+metainfo.xml

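configuration files

The configuration directory contains one or more XML files. Each XML file represents one configuration group, named after the file (without the .xml extension).
Each file declares the name, value type and description of the service's configuration items.
A few HDFS configuration items serve as the example.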

<property>
    <!-- name of the configuration item -->
    <name>dfs.https.port</name>
    <!-- default value -->
    <value>50470</value>
    <!-- description, shown as a tooltip when the mouse hovers over the text box -->
    <description>
        This property is used by HftpFileSystem.
    </description>
    <on-ambari-upgrade add="true"/>
</property>
<property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>1024</value>
    <description>Specifies the maximum number of threads to use for transferring data in and out of the datanode.
    </description>
    <display-name>DataNode max data transfer threads</display-name>
    <!-- the value must be an int, with a minimum of 0 and a maximum of 48000 -->
    <value-attributes>
        <type>int</type>
        <minimum>0</minimum>
        <maximum>48000</maximum>
    </value-attributes>
    <on-ambari-upgrade add="true"/>
</property>

For more configuration options, see the official documentation: https://cwiki.apache.org/confluence/display/AMBARI/Configuration+support+in+Ambari
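What value-attributes expresses can be illustrated with a small validator. It is only a sketch of the constraint for the int type shown above, not the validation Ambari's UI performs; validate is a name invented here and other types are simply accepted.

```python
import xml.etree.ElementTree as ET

PROPERTY = """
<property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>1024</value>
    <value-attributes>
        <type>int</type>
        <minimum>0</minimum>
        <maximum>48000</maximum>
    </value-attributes>
</property>
"""


def validate(property_xml, raw_value):
    """Check raw_value against the int range in value-attributes, if there is one."""
    attrs = ET.fromstring(property_xml).find("value-attributes")
    if attrs is None or attrs.findtext("type") != "int":
        return True  # nothing this sketch knows how to check
    try:
        value = int(raw_value)
    except ValueError:
        return False
    minimum = attrs.findtext("minimum")
    maximum = attrs.findtext("maximum")
    if minimum is not None and value < int(minimum):
        return False
    if maximum is not None and value > int(maximum):
        return False
    return True


if __name__ == "__main__":
    for candidate in ("1024", "48001", "abc"):
        print(candidate, validate(PROPERTY, candidate))
```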

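Reading configuration values in Python scripts

Suppose the control script needs to read the value of the instance_name item that the user entered on the configuration page.

The item is declared in configuration/sample.xml: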

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>instance_name</name>
        <value>instance1</value>
        <description>Instance name for samplesrv</description>
    </property>
</configuration>

It is read in the Python script as follows:
params.py

from resource_management.libraries.script import Script

config = Script.get_config()

# config is exposed as a dictionary; the path is configurations/<file name>/<property name>
instance_name = config['configurations']['sample']['instance_name']
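The nesting can be tried out without an Ambari agent by using a plain dictionary shaped like the path described above. This is a sketch under that assumption: the dictionary stands in for what Script.get_config() returns, and get_config_value is a hypothetical helper that adds a default for missing keys.

```python
def get_config_value(config, config_type, name, default=None):
    """Look up configurations/<config_type>/<name>, returning default if absent."""
    return config.get("configurations", {}).get(config_type, {}).get(name, default)


# Stand-in for the structure returned by Script.get_config().
config = {
    "configurations": {
        "sample": {"instance_name": "instance1"},
    }
}

if __name__ == "__main__":
    print(get_config_value(config, "sample", "instance_name"))
    print(get_config_value(config, "sample", "missing_item", default="n/a"))
```

Direct indexing as in params.py raises an error when an item is missing, so a lookup with a default like this one can be handy for optional settings.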

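Writing component control scripts

The control scripts of a component live in package/scripts. A script must inherit from the resource_management.Script class.

package/scripts: control scripts
package/files: files used by the control scripts
package/templates: template files used to generate configuration files, such as templates for core-site.xml and hdfs-site.xml.

A minimal control script: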

from resource_management import Script


class Master(Script):
  def install(self, env):
    # Called when the component is installed
    print('Install the Sample Srv Master')

  def stop(self, env):
    # Called when the component is stopped
    print('Stop the Sample Srv Master')

  def start(self, env):
    # Called when the component is started
    print('Start the Sample Srv Master')

  def status(self, env):
    # Called to check whether the component is running
    print('Status of the Sample Srv Master')

  def configure(self, env):
    # Called when the component's configuration is updated
    print('Configure the Sample Srv Master')


if __name__ == "__main__":
  Master().execute()

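Ambari provides the following libraries for writing control scripts:

resource_management
ambari_commons
ambari_simplejson

These libraries cover the common operations, so no additional Python packages need to be brought in.

To write different scripts for different operating systems, add the matching @OsFamilyImpl() decorator to each class that inherits from resource_management.Script.

Below are some commonly used control script snippets.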

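Checking whether the PID file exists (whether the process is running)

from resource_management import *

# Raises ComponentIsNotRunning if the PID file does not exist or the process is not running
check_process_status(pid_file_full_path)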
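To make clear what such a check involves, here is a standalone approximation using only the standard library. It is not the source of check_process_status: the ComponentIsNotRunning class below is a local stand-in for the exception of the same name, check_pid_file is a name invented here, and the signal-0 probe assumes a POSIX system.

```python
import errno
import os
import tempfile


class ComponentIsNotRunning(Exception):
    """Local stand-in for the exception raised by the Ambari helper."""


def check_pid_file(pid_file):
    """Return the PID if the file exists and names a live process, else raise."""
    if not pid_file or not os.path.isfile(pid_file):
        raise ComponentIsNotRunning("PID file not found: %s" % pid_file)
    with open(pid_file) as handle:
        text = handle.read().strip()
    try:
        pid = int(text)
    except ValueError:
        raise ComponentIsNotRunning("PID file is not a number: %r" % text)
    try:
        os.kill(pid, 0)  # signal 0 only tests whether the process exists
    except OSError as error:
        if error.errno == errno.EPERM:
            return pid  # the process exists but belongs to another user
        raise ComponentIsNotRunning("no process with PID %d" % pid)
    return pid


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".pid")
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    print(check_pid_file(path))
    os.remove(path)
```

A status method would typically just call the check and let the exception propagate, which is how the component ends up reported as stopped.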

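Template: filling a configuration file template

Fills the component's configuration template with the values the user entered on the service's configuration page, producing the final configuration file.

# The params module reads the values entered on the configuration page in advance
# Every template variable in config-template.xml.j2 must be defined in params, otherwise rendering fails; the whole template has to be fillable.
import params
env.set_params(params)
# config-template.xml.j2 is the name of a j2 file in the package/templates folder
file_content = Template('config-template.xml.j2')

The templates are rendered with Jinja2.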
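The rule that every template variable must be defined can be demonstrated without Ambari. Note the substitution: Jinja2 is not part of the standard library, so this sketch uses string.Template as a stand-in, with $name placeholders instead of Jinja2's {{ name }}. The template text, the parameter names and render are all made up for this example.

```python
from string import Template as StringTemplate

# Stand-in template; a real .j2 file would use {{ instance_name }} syntax.
TEMPLATE_TEXT = "<instance>$instance_name</instance>\n<port>$port</port>\n"


def render(template_text, params):
    """Fill the template; substitute() raises KeyError for an undefined variable."""
    return StringTemplate(template_text).substitute(params)


if __name__ == "__main__":
    print(render(TEMPLATE_TEXT, {"instance_name": "instance1", "port": 8080}))
    try:
        render(TEMPLATE_TEXT, {"instance_name": "instance1"})
    except KeyError as missing:
        print("undefined template variable:", missing)
```

The failure mode is the useful part: leaving port out of the parameters stops the rendering instead of writing a half-filled file.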

InlineTemplate

Same as Template, except that the template text comes from the value of a variable instead of from a template file.

file_content = InlineTemplate(self.getConfig()['configurations']['gateway-log4j']['content'])

File

Writes content to a file.

File(path,
     content=file_content,
     owner=owner_user,
     group=sample_group)

User

User management.

# Create a user
User(user_name, action="create", groups=group_name)

Execute

Runs a command or script.

Execute('ls -al', user='user1')

Package: installing the specified packages

Package(params.all_lzo_packages,
        retry_on_repo_unavailability=params.agent_stack_retry_on_unavailability,
        retry_count=params.agent_stack_retry_count)

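Afterword

This post covered the basic configuration needed to integrate big data components into Ambari. A later post will show how to integrate an Elasticsearch service into Ambari.

Official Ambari references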

https://cwiki.apache.org/confluence/display/AMBARI/Defining+a+Custom+Stack+and+Services

https://cwiki.apache.org/confluence/display/AMBARI/How-To+Define+Stacks+and+Services#How-ToDefineStacksandServices-metainfo.xml
