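Reposted from tuoluzhe8521
Introduction: By simplifying complex task dependencies, DolphinScheduler gives data engineers powerful workflow management and scheduling capabilities. Version 3.2.0 brings a series of new features and improvements that significantly increase its stability and availability in production environments.
To help readers understand and apply this release, this guide walks through building a highly available DolphinScheduler 3.2.0 cluster for production. It covers environment preparation, database configuration, user permission setup, passwordless SSH login, starting ZooKeeper, and starting and stopping the services.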
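1. Environment preparation
1.1 Cluster planning
This installation runs on CentOS 7.9.
1.2 Component download links
DolphinScheduler 3.2.0 download page: https://dolphinscheduler.apache.org/zh-cn/download/3.2.0
Official installation guide: https://dolphinscheduler.apache.org/zh-cn/docs/3.2.0/guide/installation/cluster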
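1.3 Prerequisites
- JDK: download a JDK (1.8+), install it, set the JAVA_HOME environment variable, and append its bin directory to the PATH environment variable. Skip this step if your environment already has one.
- Binary package: download the DolphinScheduler binary package from the download page
- Database: PostgreSQL (8.2.15+) or MySQL (5.7+), either one will do; MySQL requires JDBC Driver 8.0.16 or later
- Registry center: ZooKeeper (3.8.0+)
- Process tree analysis
  - macOS: install pstree
  - Fedora/Red Hat/CentOS/Ubuntu/Debian: install psmisc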
[hadoop@hadoop1 ~]$ sudo yum install -y psmisc
Note: DolphinScheduler itself does not depend on Hadoop, Hive, or Spark, but if the tasks you run depend on them, the corresponding environment must be available.
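The prerequisites above can be checked on each node before installing. The following is a minimal sketch, not part of the official tooling: the helper names have_cmd and missing_cmds are made up for this illustration, and the command list (java, pstree) is an assumption drawn from the prerequisite list.

```shell
#!/bin/sh
# Pre-flight check: report which required commands are missing on this node.
# have_cmd and missing_cmds are hypothetical helpers for this sketch.

# Returns 0 if the command is on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints the names of the missing commands, one per line.
missing_cmds() {
    for cmd in "$@"; do
        have_cmd "$cmd" || echo "$cmd"
    done
}

# Assumed command list: java (JDK) and pstree (psmisc).
missing=$(missing_cmds java pstree)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found"
fi
```

Running the script on every host in the cluster gives a quick view of which nodes still need a JDK or psmisc.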
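2. DolphinScheduler cluster installation
2.1 Extract the installation package
- Upload the DolphinScheduler installation package to the /data/software directory on node hadoop1
- Extract the package into the current directory
Note: the extraction directory is not the final installation directory
[hadoop@hadoop1 software]$ tar -zxvf apache-dolphinscheduler-3.2.0-bin.tar.gz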
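2.2 Configure the database
DolphinScheduler stores its metadata in a relational database, so you need to create the corresponding database and user.
mysql -uroot -p
-- Create the database
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
-- Create the user
-- Replace {user} and {password} with the user name and password you want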
mysql> CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%';
mysql> CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost';
mysql> FLUSH PRIVILEGES;
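Note: if the following error appears, the new user's password is too simple: ERROR 1819 (HY000): Your password does not satisfy the current policy requirements. Either choose a more complex password or run the commands below to lower MySQL's password strength requirements.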
mysql> set global validate_password_policy=0;
mysql> set global validate_password_length=4;
Grant the user the required privileges (this example uses the user name dolphinscheduler):
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';
mysql> flush privileges;
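If you use MySQL, you must manually download the mysql-connector-java driver (8.0.31) and copy it into the libs directory of every DolphinScheduler module: api-server/libs, alert-server/libs, master-server/libs, worker-server/libs, and tools/libs. Note: if you only want to use MySQL in the data source center, any version of the MySQL JDBC driver works; if you use MySQL as DolphinScheduler's metadata database, only version 8.0.16 and later are supported.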
echo /data/software/dolphinscheduler-3.2.0/master-server/libs/ /data/software/dolphinscheduler-3.2.0/alert-server/libs/ /data/software/dolphinscheduler-3.2.0/api-server/libs/ /data/software/dolphinscheduler-3.2.0/worker-server/libs/ /data/software/dolphinscheduler-3.2.0/tools/libs/ | xargs -n 1 cp -v /data/software/mysql-8.0.31/mysql-connector-j-8.0.31.jar
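The same copy can be written as a loop that also verifies the result, which makes a missing module directory easier to spot. This is a sketch only: copy_driver is a hypothetical helper name, and the module list simply mirrors the five libs directories named above.

```shell
#!/bin/sh
# Copies one JDBC driver jar into <base>/<module>/libs for each module and
# returns non-zero as soon as a copy fails or the jar is missing afterwards.
# copy_driver is a hypothetical helper for this sketch.
copy_driver() {
    jar=$1
    base=$2
    for module in master-server alert-server api-server worker-server tools; do
        cp "$jar" "$base/$module/libs/" || return 1
        [ -f "$base/$module/libs/$(basename "$jar")" ] || return 1
    done
}
```

With the paths used in this guide, the call would be copy_driver /data/software/mysql-8.0.31/mysql-connector-j-8.0.31.jar /data/software/dolphinscheduler-3.2.0.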
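2.2 Prepare the DolphinScheduler startup environment
- Configure the deployment user and passwordless sudo
If you already have an account for the Hadoop cluster, use it directly; no further setup is needed.
Otherwise, create a deployment user and make sure passwordless sudo is configured. The following example creates the user hadoop:
# Creating a user requires logging in as root
useradd hadoop
# Set the password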
echo "hadoop" | passwd --stdin hadoop
# Configure passwordless sudo
sed -i '$ahadoop ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
sed -i 's/^Defaults\s\+requiretty/#&/' /etc/sudoers
# Change ownership and permissions so the deployment user can operate on the extracted apache-dolphinscheduler-*-bin directory
chown -R hadoop:hadoop apache-dolphinscheduler-*-bin
chmod -R 755 apache-dolphinscheduler-*-bin
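Note: 1. The task execution service runs jobs for multiple tenants by switching Linux users with sudo -u {linux-user}, so the deployment user needs sudo privileges, and they must be passwordless. Beginners who do not follow this point can ignore it for now. 2. If /etc/sudoers contains a "Defaults requiretty" line, comment it out as well.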
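- Configure passwordless SSH login between machines
Installation sends resources to the other machines, so every machine must be able to log in to the others over SSH without a password. The steps are as follows: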
su hadoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Note: after configuration, run ssh localhost to check the result; if you can log in over SSH without entering a password, the setup succeeded.
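The commands above only authorize the key on the local machine. To reach the other nodes, the public key also has to be installed on each of them, for which ssh-copy-id is a common choice. The sketch below only prints the commands so they can be reviewed first; key_copy_commands is a hypothetical helper, and the user and host names are the ones used in this guide.

```shell
#!/bin/sh
# Prints one ssh-copy-id command per host in a comma-separated list.
# Nothing is executed here; review the output, then pipe it to sh to run it.
# key_copy_commands is a hypothetical helper for this sketch.
key_copy_commands() {
    user=$1
    hosts=$2
    echo "$hosts" | tr ',' '\n' | while read -r host; do
        if [ -n "$host" ]; then
            echo "ssh-copy-id -i ~/.ssh/id_rsa.pub $user@$host"
        fi
    done
}

key_copy_commands hadoop "hadoop1,hadoop2,hadoop3,hadoop4,hadoop5"
```

Each printed command asks for the target user's password once; after that, ssh to that host should no longer prompt.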
2.3 Start ZooKeeper (skip if your Hadoop cluster already provides one)
Go to the ZooKeeper installation directory, copy conf/zoo_sample.cfg to conf/zoo.cfg, and change the dataDir value in conf/zoo.cfg to dataDir=./tmp/zookeeper
# Start ZooKeeper
./bin/zkServer.sh start
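The copy-and-edit step for zoo.cfg can be scripted. A minimal sketch, assuming the sample file contains a dataDir= line as the stock zoo_sample.cfg does; make_zoo_cfg is a hypothetical helper name.

```shell
#!/bin/sh
# Creates conf/zoo.cfg from conf/zoo_sample.cfg and rewrites the dataDir line.
# Arguments: ZooKeeper installation directory, desired dataDir value.
# make_zoo_cfg is a hypothetical helper for this sketch.
make_zoo_cfg() {
    zk_home=$1
    data_dir=$2
    # '|' is the sed delimiter so that '/' in the path needs no escaping
    sed "s|^dataDir=.*|dataDir=$data_dir|" \
        "$zk_home/conf/zoo_sample.cfg" > "$zk_home/conf/zoo.cfg"
}
```

From inside the ZooKeeper installation directory, make_zoo_cfg . ./tmp/zookeeper produces the configuration described above.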
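2.4 Edit the install_env.sh file
The install_env.sh file describes which machines DolphinScheduler will be installed on and which services each machine runs. You can find it at bin/env/install_env.sh. Change a variable either by editing the file or by running export <ENV_NAME>=<value> beforehand. The configuration used here is as follows.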
ips=${ips:-"hadoop1,hadoop2,hadoop3,hadoop4,hadoop5"}
# modify it if you use different ssh port
sshPort=${sshPort:-"xxx"}
# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
masters=${masters:-"hadoop1,hadoop2"}
# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
workers=${workers:-"hadoop3:default,hadoop4:default,hadoop5:default"}
# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
alertServer=${alertServer:-"hadoop3"}
# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
apiServers=${apiServers:-"hadoop2"}
# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
installPath=${installPath:-"/data/module/dolphinscheduler-3.2.0"}
# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
deployUser=${deployUser:-"hadoop"}
# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
# It will delete ${zkRoot} in the zookeeper when you run install.sh, so please keep it same as registry.zookeeper.namespace in yml files.
# Similarly, if you want to modify the value, please modify registry.zookeeper.namespace in yml files as well.
zkRoot=${zkRoot:-"/dolphinscheduler"}
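The comments in install_env.sh require masters, workers, alertServer, and apiServers to be subsets of ips. A small check along these lines can catch a typo before install.sh runs. It is an illustrative sketch: hosts_not_in_ips is a hypothetical helper, and the values are the ones configured above.

```shell
#!/bin/sh
# Prints every host from "$2" (comma-separated, optional ":workerGroup"
# suffix) that does not appear in the comma-separated list "$1".
# hosts_not_in_ips is a hypothetical helper for this sketch.
hosts_not_in_ips() {
    all=$1
    list=$2
    echo "$list" | tr ',' '\n' | while read -r entry; do
        host=${entry%%:*}          # strip an optional ":workerGroup" suffix
        case ",$all," in
            *",$host,"*) ;;        # host is listed in ips
            *) echo "$host" ;;     # host is missing from ips
        esac
    done
}

ips="hadoop1,hadoop2,hadoop3,hadoop4,hadoop5"
hosts_not_in_ips "$ips" "hadoop1,hadoop2"                                  # masters
hosts_not_in_ips "$ips" "hadoop3:default,hadoop4:default,hadoop5:default"  # workers
hosts_not_in_ips "$ips" "hadoop3"                                          # alertServer
hosts_not_in_ips "$ips" "hadoop2"                                          # apiServers
```

For a consistent configuration the script prints nothing; any host name it does print is missing from ips.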
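2.5 Edit the dolphinscheduler_env.sh file
The ./bin/env/dolphinscheduler_env.sh file holds two kinds of settings: the DolphinScheduler database configuration (see "Initialize the database" for details), and the paths or libraries that some task types depend on externally, such as JAVA_HOME and SPARK_HOME.
If you do not use certain task types, you can ignore their external dependencies, but you must change JAVA_HOME, the registry center, and the database settings to match your environment.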
export JAVA_HOME=${JAVA_HOME:-/data/module/jdk1.8.0_212}
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8"
export SPRING_DATASOURCE_USERNAME=xxx
export SPRING_DATASOURCE_PASSWORD=xxx
# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-xxxx:2181,xxx:2181,xxx:2181}
export HADOOP_HOME=${HADOOP_HOME:-/data/module/hadoop-3.3.4}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/data/module/hadoop-3.3.4/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/data/module/spark-3.3.1}
#export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
#export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/data/module/hive-3.1.3}
export FLINK_HOME=${FLINK_HOME:-/data/module/flink-1.16.2}
export DATAX_HOME=${DATAX_HOME:-/data/module/datax}
#export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
#export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
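REGISTRY_ZOOKEEPER_CONNECT_STRING expects a comma-separated list of host:port pairs. When the ZooKeeper hosts are already available as a plain host list, the string can be derived rather than typed by hand. A sketch under that assumption; zk_connect_string is a hypothetical helper, and 2181 is the client port used in the example above.

```shell
#!/bin/sh
# Builds "host1:port,host2:port,..." from a comma-separated host list.
# The port defaults to 2181. zk_connect_string is a hypothetical helper.
zk_connect_string() {
    hosts=$1
    port=${2:-2181}
    # append ":port" before every comma and at the end of the line
    echo "$hosts" | sed "s/,/:$port,/g; s/\$/:$port/"
}
```

For example, zk_connect_string "zk1,zk2,zk3" (hypothetical host names) yields zk1:2181,zk2:2181,zk3:2181, which can be exported as REGISTRY_ZOOKEEPER_CONNECT_STRING before running the install script.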
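2.6 Initialize the database
After the steps above, you have created a new database for DolphinScheduler and configured DolphinScheduler to use it. You can now initialize the database with a quick shell script: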
bash tools/bin/upgrade-schema.sh
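2.7 Edit the application.yaml files
There are five files. The parts to change are identical in each, but the rest of their contents differ, so edit each file separately: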
- master-server/conf/application.yaml
- api-server/conf/application.yaml
- worker-server/conf/application.yaml
- alert-server/conf/application.yaml
- tools/conf/application.yaml
spring:
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
    username: xxx
    password: xxx
registry:
  type: zookeeper
  zookeeper:
    namespace: dolphinscheduler
    connect-string: xxxx
    retry-policy:
      base-sleep-time: 60ms
      max-sleep: 300ms
      max-retries: 5
    session-timeout: 30s
    connection-timeout: 9s
    block-until-connected: 600ms
    digest: ~
---
spring:
  config:
    activate:
      on-profile: mysql
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
    username: xxxx
    password: xxxx
  quartz:
    properties:
      org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.StdJDBCDelegate
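Because the same change has to be made in five separate files, it is easy to miss one. The sketch below lists any application.yaml that does not yet contain a given string, for example the JDBC URL prefix you configured. yaml_files_missing is a hypothetical helper, and the module list mirrors the five files named above.

```shell
#!/bin/sh
# Prints the application.yaml files under "$1" that do NOT contain the
# fixed string "$2". Missing files are reported as well.
# yaml_files_missing is a hypothetical helper for this sketch.
yaml_files_missing() {
    base=$1
    needle=$2
    for module in master-server api-server worker-server alert-server tools; do
        file="$base/$module/conf/application.yaml"
        grep -qF -- "$needle" "$file" 2>/dev/null || echo "$file"
    done
}
```

Called as yaml_files_missing <extracted-directory> 'jdbc:mysql://' from the directory holding the extracted package, an empty result means all five files carry the MySQL URL.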
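2.8 Edit the common.properties files
There are five files. The parts to change are identical in each, but the rest of their contents differ, so edit each file separately: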
- master-server/conf/common.properties
- api-server/conf/common.properties
- worker-server/conf/common.properties
- alert-server/conf/common.properties
- tools/conf/common.properties
data.basedir.path=<your local file storage path>
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path=<your HDFS storage path>
resource.hdfs.root.user=<user name, must match the user configured earlier in this guide>
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
resource.hdfs.fs.defaultFS=hdfs://xxx:8020
# ResourceManager HA IP addresses
yarn.resourcemanager.ha.rm.ids=xxxx,xxx
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://xxx:19888/jobhistory/logs/%s
Note: this deployment uses HDFS as DolphinScheduler's distributed storage; for other storage options, configure them as described in the official documentation.
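Since the same keys must be set in all five common.properties files, a small helper can apply each setting everywhere. The following is a sketch, not official tooling: set_prop and set_prop_all are hypothetical names, and the approach (drop the old line, append the new one) moves the key to the end of the file rather than editing it in place.

```shell
#!/bin/sh
# Sets key=value in a properties file: drops any existing line that starts
# with "key=" and appends the new assignment at the end of the file.
# set_prop and set_prop_all are hypothetical helpers for this sketch.
set_prop() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # index(...) == 1 means the line starts with the literal text "key="
    awk -v k="$key=" 'index($0, k) != 1' "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"       # overwrite the content, keep file permissions
    rm -f "$tmp"
}

# Applies the same setting to the common.properties of all five modules.
set_prop_all() {
    base=$1
    for module in master-server api-server worker-server alert-server tools; do
        set_prop "$base/$module/conf/common.properties" "$2" "$3" || return 1
    done
}
```

For instance, set_prop_all <extracted-directory> resource.storage.type HDFS writes that setting into each module's file. Commented-out lines are left untouched because they do not start with the key.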
2.9 Distribute the HDFS dependency files
echo /data/software/dolphinscheduler-3.2.0/master-server/conf/ /data/software/dolphinscheduler-3.2.0/alert-server/conf/ /data/software/dolphinscheduler-3.2.0/api-server/conf/ /data/software/dolphinscheduler-3.2.0/worker-server/conf/ | xargs -n 1 cp -v /data/module/hadoop-3.3.4/etc/hadoop/core-site.xml /data/module/hadoop-3.3.4/etc/hadoop/hdfs-site.xml
2.10 Start DolphinScheduler
Run the following command as the deployment user created above to complete the deployment. Runtime logs are stored in the logs folder.
bash ./bin/install.sh
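Note: on a first deployment, the message sh: bin/dolphinscheduler-daemon.sh: No such file or directory may appear five times. It is not significant and can be ignored.
2.11 Log in to DolphinScheduler
Open http://localhost:12345/dolphinscheduler/ui in a browser to reach the system UI (when browsing from another machine, replace localhost with the API server host, hadoop2 in this setup). The default user name and password are admin/dolphinscheduler123.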
3. Starting and stopping services
# Stop all services in the cluster
bash ./bin/stop-all.sh
# Start all services in the cluster
bash ./bin/start-all.sh
# Stop and start Master
bash ./bin/dolphinscheduler-daemon.sh stop master-server
bash ./bin/dolphinscheduler-daemon.sh start master-server
# Start and stop Worker
bash ./bin/dolphinscheduler-daemon.sh start worker-server
bash ./bin/dolphinscheduler-daemon.sh stop worker-server
# Start and stop Api
bash ./bin/dolphinscheduler-daemon.sh start api-server
bash ./bin/dolphinscheduler-daemon.sh stop api-server
# Start and stop Alert
bash ./bin/dolphinscheduler-daemon.sh start alert-server
bash ./bin/dolphinscheduler-daemon.sh stop alert-server
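Restarting a single service is a stop followed by a start, so the two daemon calls can be wrapped in one function. This is an illustrative sketch: restart_service and the DS_DAEMON variable are hypothetical names introduced here, and only the four service names listed above are accepted.

```shell
#!/bin/sh
# Restarts one DolphinScheduler service on the local node by calling the
# daemon script with "stop" and then "start".
# DS_DAEMON is a hypothetical override for the script path; restart_service
# is a hypothetical helper for this sketch.
DS_DAEMON=${DS_DAEMON:-./bin/dolphinscheduler-daemon.sh}

restart_service() {
    service=$1
    case "$service" in
        master-server|worker-server|api-server|alert-server) ;;
        *) echo "unknown service: $service" >&2; return 1 ;;
    esac
    bash "$DS_DAEMON" stop "$service"
    bash "$DS_DAEMON" start "$service"
}
```

Run from the installation directory, restart_service api-server restarts only the API server and leaves the other services alone.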
Original article: https://blog.csdn.net/Brother_ning/article/details/135149045
Publication of this article is supported by 白鯨開源科技.