A New Chapter in Efficient Scheduling: A Detailed Guide to Building a Production-Grade DolphinScheduler 3.2.0 Cluster

Reposted from tuoluzhe8521

Overview: By simplifying complex task dependencies, DolphinScheduler gives data engineers powerful workflow management and scheduling capabilities. Version 3.2.0 brings a series of new features and improvements that significantly raise its stability and usability in production environments.

To help readers understand and apply this release, we have prepared this complete guide to building a highly available DolphinScheduler 3.2.0 production cluster. It walks through the key steps of a production deployment, including environment preparation, database configuration, user permission setup, passwordless SSH configuration, ZooKeeper startup, and starting and stopping the services.

1. Environment Preparation

1.1 Cluster Planning

(The original post shows the cluster plan as an image. Per the install_env.sh configuration in section 2.5: hadoop1 and hadoop2 run the master servers, hadoop2 runs the API server, hadoop3 runs the alert server, and hadoop3 through hadoop5 run the workers.)

This installation was performed on CentOS 7.9.

1.2 Component Download Links

DolphinScheduler 3.2.0 download page: https://dolphinscheduler.apache.org/zh-cn/download/3.2.0

Official cluster installation guide: https://dolphinscheduler.apache.org/zh-cn/docs/3.2.0/guide/installation/cluster

1.3 Prerequisites

  1. JDK: download JDK (1.8+), install it, configure the JAVA_HOME environment variable, and append its bin directory to PATH. If a JDK already exists in your environment, skip this step.
  2. Binary package: download the DolphinScheduler binary package from the download page above.
  3. Database: PostgreSQL (8.2.15+) or MySQL (5.7+), either one; for MySQL, JDBC Driver 8.0.16 or later is required.
  4. Registry: ZooKeeper (3.8.0+).
  5. Process tree analysis:
  • macOS: install pstree
  • Fedora/RedHat/CentOS/Ubuntu/Debian: install psmisc

[hadoop@hadoop1 ~]$ sudo yum install -y psmisc

Note: DolphinScheduler itself does not depend on Hadoop, Hive, or Spark, but if your tasks need them, the corresponding environments must be available.
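Before moving on, a quick sanity check that the prerequisites are in place (a minimal sketch; adjust to your environment):

java -version          # should report 1.8 or later
echo "$JAVA_HOME"      # must point to the JDK installation
command -v pstree      # provided by psmisc on CentOS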

2. DolphinScheduler Cluster Installation

2.1 Extract the Installation Package

  1. Upload the DolphinScheduler installation package to the /data/software directory on the hadoop1 node.
  2. Extract the package into the current directory.

Note: the extraction directory is not the final installation directory.

[hadoop@hadoop1 software]$ tar -zxvf apache-dolphinscheduler-3.2.0-bin.tar.gz

2.2 Configure the Database

DolphinScheduler stores its metadata in a relational database, so the corresponding database and user must be created.

mysql -uroot -p
-- create the database
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
-- create the user; replace {user} and {password} with your desired username and password
mysql> CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%';
mysql> CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost';
mysql> FLUSH PRIVILEGES;

Note: if the following error appears, the new user's password is too simple: ERROR 1819 (HY000): Your password does not satisfy the current policy requirements. Either choose a stronger password or run the following commands to lower MySQL's password policy level.

mysql> set global validate_password_policy=0;
mysql> set global validate_password_length=4;
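On MySQL 8, the validate_password plugin was replaced by a component and the variable names use dots; if the commands above report an unknown system variable, the equivalents are:

mysql> set global validate_password.policy=0;
mysql> set global validate_password.length=4;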

Grant the user the required privileges (here the user is named dolphinscheduler):

mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';
mysql> flush privileges;

If you use MySQL, you must manually download the mysql-connector-java driver (8.0.31 here) and copy it into the libs directory of every DolphinScheduler module: api-server/libs, alert-server/libs, master-server/libs, worker-server/libs, and tools/libs. Note: if you only want to use MySQL as a data source in the Datasource Center, any MySQL JDBC driver version will do; but to use MySQL as DolphinScheduler's metadata database, only version 8.0.16 and above is supported.

echo /data/software/dolphinscheduler-3.2.0/master-server/libs/ \
     /data/software/dolphinscheduler-3.2.0/alert-server/libs/ \
     /data/software/dolphinscheduler-3.2.0/api-server/libs/ \
     /data/software/dolphinscheduler-3.2.0/worker-server/libs/ \
     /data/software/dolphinscheduler-3.2.0/tools/libs/ \
  | xargs -n 1 cp -v /data/software/mysql-8.0.31/mysql-connector-j-8.0.31.jar
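To confirm the driver landed in every module, a quick check using the paths from the command above:

ls -l /data/software/dolphinscheduler-3.2.0/{master-server,alert-server,api-server,worker-server,tools}/libs/mysql-connector-j-8.0.31.jar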

2.3 Prepare the DolphinScheduler Startup Environment

  • Configure the deployment user and its permissions

If you already have an account from an existing Hadoop cluster, it is recommended to use it directly; no further configuration is needed.

Create a deployment user, and be sure to configure passwordless sudo for it. Using a hadoop user as an example:

# creating a user requires root
useradd hadoop

# set a password
echo "hadoop" | passwd --stdin hadoop

# configure passwordless sudo
sed -i '$ahadoop  ALL=(ALL)  NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers

# change ownership so the deployment user can operate on the extracted apache-dolphinscheduler-*-bin directory
chown -R hadoop:hadoop apache-dolphinscheduler-*-bin
chmod -R 755 apache-dolphinscheduler-*-bin

Note: 1. The task execution service runs jobs as different Linux users via sudo -u {linux-user} to implement multi-tenancy, so the deployment user must have sudo privileges, and passwordless ones at that. Beginners who do not yet understand this can safely ignore it for now. 2. If /etc/sudoers contains a "Defaults requiretty" line, comment it out as well.
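A quick way to confirm that passwordless sudo took effect (the -n flag makes sudo fail instead of prompting for a password):

su - hadoop -c 'sudo -n true && echo "passwordless sudo OK"'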

  • Configure passwordless SSH between machines

Because the installation pushes resources to the other machines, passwordless SSH must work between all nodes. The steps to configure it:

su hadoop

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Note: after configuring, run ssh localhost to check; if you can log in without entering a password, the setup succeeded.
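The commands above only cover the local machine. For the cluster, the deployment user's public key must also reach every node listed later in install_env.sh; a sketch using the hostnames from this guide:

# run as the hadoop user; ssh-copy-id appends the public key to the
# remote user's authorized_keys (you will be prompted once per host)
for host in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@${host}
done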

2.4 Start ZooKeeper (skip if your Hadoop cluster already has one)

Go to the ZooKeeper installation directory, copy the zoo_sample.cfg configuration file to conf/zoo.cfg, and change the dataDir value in conf/zoo.cfg to dataDir=./tmp/zookeeper.

# start ZooKeeper
./bin/zkServer.sh start
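Before moving on, you can confirm the ensemble is healthy on each ZooKeeper node:

# prints this node's mode (leader or follower) once the quorum is up
./bin/zkServer.sh status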

2.5 Modify the install_env.sh File

The install_env.sh file describes which machines DolphinScheduler will be installed on and which services each machine runs. You can find it at bin/env/install_env.sh. Variables can be overridden in the form export <ENV_NAME>=<value>. The configuration used here:

ips=${ips:-"hadoop1,hadoop2,hadoop3,hadoop4,hadoop5"}
# modify it if you use different ssh port
sshPort=${sshPort:-"xxx"}

# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
masters=${masters:-"hadoop1,hadoop2"}

# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
workers=${workers:-"hadoop3:default,hadoop4:default,hadoop5:default"}

# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
alertServer=${alertServer:-"hadoop3"}

# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
apiServers=${apiServers:-"hadoop2"}

# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
installPath=${installPath:-"/data/module/dolphinscheduler-3.2.0"}

# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
deployUser=${deployUser:-"hadoop"}

# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
# It will delete ${zkRoot} in the zookeeper when you run install.sh, so please keep it same as registry.zookeeper.namespace in yml files.
# Similarly, if you want to modify the value, please modify registry.zookeeper.namespace in yml files as well.
zkRoot=${zkRoot:-"/dolphinscheduler"}

2.6 Modify the dolphinscheduler_env.sh File

The file ./bin/env/dolphinscheduler_env.sh holds the following configuration: DolphinScheduler's database connection (see [Initialize the Database], section 2.7, for details) and the external dependency paths or library files for certain task types; JAVA_HOME and SPARK_HOME are both defined here.

If you do not use certain task types, you can ignore their external dependencies, but you must adapt JAVA_HOME and the registry and database settings to your environment.

export JAVA_HOME=${JAVA_HOME:-/data/module/jdk1.8.0_212}
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8"
export SPRING_DATASOURCE_USERNAME=xxx
export SPRING_DATASOURCE_PASSWORD=xxx

# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-xxxx:2181,xxx:2181,xxx:2181}


export HADOOP_HOME=${HADOOP_HOME:-/data/module/hadoop-3.3.4}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/data/module/hadoop-3.3.4/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/data/module/spark-3.3.1}
#export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
#export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/data/module/hive-3.1.3}
export FLINK_HOME=${FLINK_HOME:-/data/module/flink-1.16.2}
export DATAX_HOME=${DATAX_HOME:-/data/module/datax}
#export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
#export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
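Every path referenced here must exist on the nodes that run the corresponding task types; a quick hedged check after sourcing the file:

source ./bin/env/dolphinscheduler_env.sh
for d in "$JAVA_HOME" "$HADOOP_HOME" "$SPARK_HOME1" "$HIVE_HOME" "$FLINK_HOME" "$DATAX_HOME"; do
  [ -d "$d" ] && echo "OK       $d" || echo "MISSING  $d"
done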

2.7 Initialize the Database

With the steps above complete, you have created a new database for DolphinScheduler and configured the connection. You can now initialize the schema with a quick shell script:

bash tools/bin/upgrade-schema.sh
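To verify that initialization succeeded, check that the schema tables (all prefixed t_ds_) were created:

mysql> use dolphinscheduler;
mysql> show tables;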

2.8 Modify the application.yaml Files

Five files need the same sections changed, but their other contents differ, so each must be edited individually:

  • master-server/conf/application.yaml
  • api-server/conf/application.yaml
  • worker-server/conf/application.yaml
  • alert-server/conf/application.yaml
  • tools/conf/application.yaml

datasource:
  driver-class-name: com.mysql.cj.jdbc.Driver
  url: jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
  username: xxx
  password: xxx

registry:
  type: zookeeper
  zookeeper:
    namespace: dolphinscheduler
    connect-string: xxxx
    retry-policy:
      base-sleep-time: 60ms
      max-sleep: 300ms
      max-retries: 5
    session-timeout: 30s
    connection-timeout: 9s
    block-until-connected: 600ms
    digest: ~

spring:
  config:
    activate:
      on-profile: mysql
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
    username: xxxx
    password: xxxx
  quartz:
    properties:
      org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.StdJDBCDelegate
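Since the same edit must land in five places, a quick way to confirm no file was missed:

# each module should print the updated JDBC URL
for mod in master-server api-server worker-server alert-server tools; do
  echo "== ${mod} =="
  grep -n "url: jdbc" ${mod}/conf/application.yaml
done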

2.9 Modify the common.properties Files

Again, five files need the same sections changed while their other contents differ, so each must be edited individually:

  1. master-server/conf/common.properties
  2. api-server/conf/common.properties
  3. worker-server/conf/common.properties
  4. alert-server/conf/common.properties
  5. tools/conf/common.properties

data.basedir.path=<custom local storage path>
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path=<custom HDFS storage path>
resource.hdfs.root.user=<user name; must match the deployment user configured earlier>
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
resource.hdfs.fs.defaultFS=hdfs://xxx:8020
# ResourceManager HA IP addresses
yarn.resourcemanager.ha.rm.ids=xxxx,xxx
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://xxx:19888/jobhistory/logs/%s

Note: this deployment uses HDFS as the distributed storage; for other storage backends, follow the official documentation.
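The comment above requires the upload base path to already exist on HDFS with read/write permission for the deployment user; a sketch creating it with the example values from this guide:

# create the resource upload root and hand it to the deployment user
hdfs dfs -mkdir -p /dolphinscheduler
hdfs dfs -chown -R hadoop:hadoop /dolphinscheduler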

2.10 Distribute the HDFS Dependencies

Copy core-site.xml and hdfs-site.xml into each module's conf directory, as required above when using HDFS with NameNode HA:

echo /data/software/dolphinscheduler-3.2.0/master-server/conf/ \
     /data/software/dolphinscheduler-3.2.0/alert-server/conf/ \
     /data/software/dolphinscheduler-3.2.0/api-server/conf/ \
     /data/software/dolphinscheduler-3.2.0/worker-server/conf/ \
  | xargs -n 1 cp -v /data/module/hadoop-3.3.4/etc/hadoop/core-site.xml /data/module/hadoop-3.3.4/etc/hadoop/hdfs-site.xml

2.11 Start DolphinScheduler

Run the following command as the deployment user created above to complete the deployment. After deployment, the runtime logs are stored under each service's logs directory.

bash ./bin/install.sh

Note: on the first deployment you may see five messages like sh: bin/dolphinscheduler-daemon.sh: No such file or directory. These are harmless and can be ignored.
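Once install.sh finishes, every service should show up as a JVM process on its assigned node; a quick check using the process names jps reports for DolphinScheduler:

jps | grep -E 'MasterServer|WorkerServer|ApiApplicationServer|AlertServer'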

2.12 Log In to DolphinScheduler

Open http://localhost:12345/dolphinscheduler/ui in a browser to reach the system UI; when accessing from another machine, replace localhost with the API server's host (hadoop2 in this layout). The default username and password are admin/dolphinscheduler123.

3. Starting and Stopping Services

# stop all cluster services with one command
bash ./bin/stop-all.sh

# start all cluster services with one command
bash ./bin/start-all.sh

# start/stop Master
bash ./bin/dolphinscheduler-daemon.sh stop master-server
bash ./bin/dolphinscheduler-daemon.sh start master-server

# start/stop Worker
bash ./bin/dolphinscheduler-daemon.sh start worker-server
bash ./bin/dolphinscheduler-daemon.sh stop worker-server

# start/stop Api
bash ./bin/dolphinscheduler-daemon.sh start api-server
bash ./bin/dolphinscheduler-daemon.sh stop api-server

# start/stop Alert
bash ./bin/dolphinscheduler-daemon.sh start alert-server
bash ./bin/dolphinscheduler-daemon.sh stop alert-server

Original post: https://blog.csdn.net/Brother_ning/article/details/135149045

This article is published with the support of WhaleOps (白鯨開源科技).
