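OceanBase environments usually start by installing OCP, which is then used to deploy, monitor, and operate the database clusters. When an OCP machine goes out of warranty or has similar problems, you need a smooth procedure for replacing the OCP node.
Author: 張瑞遠 (Zhang Ruiyuan)
DBA at a company in Shanghai. Previously worked on data warehouse design, development, and optimization for banks and securities firms; now mainly works on carrier-grade IT systems and databases. More than three years of OceanBase experience. Certifications include OceanBase OBCP, Oracle OCP 11g, Oracle OCM 11g, and MySQL OCP 5.7.
Source: original contribution
- Produced by the ActionTech (愛可生) open-source community. Original content may not be used without authorization; to republish, contact the editor and credit the source.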
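Preface
OceanBase Cloud Platform (OCP) is an enterprise-grade database management platform built around OceanBase.
In our production environments we generally build the OCP platform first and then rely on it to create, manage, and monitor the production clusters, so installing OCP is usually the first step when a system goes live. Later, data-center planning and similar changes may require relocating or replacing the OCP machines.
There are two ways to replace an OCP node: with the OAT platform (OceanBase Admin Toolkit), or with the ANTMAN tool. The previous article covered the OAT approach; this one covers ANTMAN.
PS: In my environment OCP load balancing is done by F5, so the new machine has to be configured in F5 first. Other load-balancing setups work the same way.
Background
If you have worked with OB in production, you may know that early versions required three separate Docker images when installing OCP: the OCP software, metadb, and obproxy. Later OCP versions bundle DB + Proxy into a single Docker image, and OAT can only take over that integrated DB + Proxy metadb. When metadb and OBProxy are deployed separately, you still need ANTMAN to do the replacement.
Software versions
Software | Version |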
---|---|
OCP | ocp-all-in-one:3.3.3-20220906114643 |
metadb+proxy | OB2277_OBP320_x86_20220429 |
Proxy | 4.1.1_20230519_x86 |
antman | t-oceanbase-antman-1.4.3-20220807073355.alios7.x86_64 |
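Procedure
3.1 Environment check and preparation
Prepare the replacement machine (disk partitioning, creating the admin user, installing Docker, and so on), then run the precheck: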
cd /root/t-oceanbase-antman/clonescripts/
sh precheck.sh -m ocp
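Log in to the meta database and check whether any tenant has its primary zone on the node being replaced. If so, switch the leaders away in advance. Here the node being replaced is 10.10.100.9 in META_OB_ZONE_1.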
MySQL [oceanbase]> select * from __all_Server;
+----------------------------+----------------------------+--------------+----------+----+----------------+------------+-----------------+--------+-----------------------+--------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+-------------------+
| gmt_create | gmt_modified | svr_ip | svr_port | id | zone | inner_port | with_rootserver | status | block_migrate_in_time | build_version | stop_time | start_service_time | first_sessid | with_partition | last_offline_time |
+----------------------------+----------------------------+--------------+----------+----+----------------+------------+-----------------+--------+-----------------------+--------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+-------------------+
| 2022-03-17 22:59:19.979627 | 2023-03-20 10:27:01.147283 | 10.10.100.87 | 2882 | 6 | META_OB_ZONE_2 | 2881 | 0 | active | 0 | 2.2.76_20210406232249-a1e144bdc179fbf473cea37f199e8a76c736b8d4(Apr 6 2021 23:55:12) | 0 | 1679279220991796 | 0 | 1 | 1679278517144838 |
| 2022-03-17 23:54:49.277939 | 2023-03-20 09:29:22.079578 | 10.10.100.9 | 2882 | 7 | META_OB_ZONE_1 | 2881 | 1 | active | 0 | 2.2.76_20210406232249-a1e144bdc179fbf473cea37f199e8a76c736b8d4(Apr 6 2021 23:55:12) | 0 | 1679275725595691 | 0 | 1 | 0 |
| 2021-12-21 22:44:16.476503 | 2023-03-20 09:29:22.080425 | 122.44.11.2 | 2882 | 5 | META_OB_ZONE_3 | 2881 | 0 | active | 0 | 2.2.76_20210406232249-a1e144bdc179fbf473cea37f199e8a76c736b8d4(Apr 6 2021 23:55:12) | 0 | 1640097866698859 | 0 | 1 | 0 |
+----------------------------+----------------------------+--------------+----------+----+----------------+------------+-----------------+--------+-----------------------+--------------------------------------------------------------------------------------+-----------+--------------------+--------------+----------------+-------------------+
MySQL [oceanbase]> select tenant_name,primary_zone from __all_tenant;
+----------------+----------------------------------------------+
| tenant_name | primary_zone |
+----------------+----------------------------------------------+
| sys | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
| ocp_meta | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
| ocp_monitor | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
| oms_tt_tenant | META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2 |
| oms_cc7_tenant | META_OB_ZONE_3;META_OB_ZONE_2;META_OB_ZONE_1 |
| oms_ff9_tenant | META_OB_ZONE_1;META_OB_ZONE_2,META_OB_ZONE_3 |
| oms_cc9_tenant | META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2 |
| oms_dd_tenant | META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2 |
| obdw_meta | META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2 |
+----------------+----------------------------------------------+
MySQL [oceanbase]> alter tenant sys primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
Query OK, 0 rows affected (0.04 sec)
MySQL [oceanbase]> alter tenant ocp_meta primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
Query OK, 0 rows affected (1.27 sec)
MySQL [oceanbase]> alter tenant ocp_monitor primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
Query OK, 0 rows affected (0.02 sec)
MySQL [oceanbase]> alter tenant oms_tt_tenant primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
Query OK, 0 rows affected (0.03 sec)
MySQL [oceanbase]> alter tenant oms_ff9_tenant primary_zone='META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1';
Query OK, 0 rows affected (0.03 sec)
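The check above can be scripted. The following is a minimal sketch of a hypothetical helper, not part of OCP or ANTMAN. It assumes the primary_zone format shown in the output, where `;` separates priority levels and `,` separates zones of equal priority. It lists the tenants whose top-priority level contains the zone being replaced and prints ALTER TENANT statements using the same target value the author chose.

```python
# Hypothetical helper, not part of OCP or ANTMAN: find tenants whose top-priority
# primary zone is the zone being replaced and print ALTER TENANT statements.
# Assumed primary_zone format (as in __all_tenant above): ';' separates priority
# levels, ',' separates zones that share the same priority level.
from __future__ import annotations

TENANT_ROWS = [  # (tenant_name, primary_zone) copied from the query output above
    ("sys", "META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2"),
    ("ocp_meta", "META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2"),
    ("ocp_monitor", "META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2"),
    ("oms_tt_tenant", "META_OB_ZONE_1;META_OB_ZONE_3;META_OB_ZONE_2"),
    ("oms_cc7_tenant", "META_OB_ZONE_3;META_OB_ZONE_2;META_OB_ZONE_1"),
    ("oms_ff9_tenant", "META_OB_ZONE_1;META_OB_ZONE_2,META_OB_ZONE_3"),
    ("oms_cc9_tenant", "META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2"),
    ("oms_dd_tenant", "META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2"),
    ("obdw_meta", "META_OB_ZONE_3;META_OB_ZONE_1,META_OB_ZONE_2"),
]


def first_priority_zones(primary_zone: str) -> set[str]:
    """Zones in the highest-priority level of a primary_zone value."""
    first_level = primary_zone.split(";")[0]
    return {z.strip() for z in first_level.split(",") if z.strip()}


def tenants_to_switch(rows: list[tuple[str, str]], zone: str) -> list[str]:
    """Tenants whose leaders currently prefer `zone`."""
    return [name for name, pz in rows if zone in first_priority_zones(pz)]


def alter_statements(tenants: list[str], new_primary_zone: str) -> list[str]:
    """One ALTER TENANT statement per tenant, all pointing at new_primary_zone."""
    return [f"alter tenant {t} primary_zone='{new_primary_zone}';" for t in tenants]


if __name__ == "__main__":
    # Same target the author used: ZONE_2 first, then ZONE_3 and ZONE_1 at equal priority.
    targets = tenants_to_switch(TENANT_ROWS, "META_OB_ZONE_1")
    for stmt in alter_statements(targets, "META_OB_ZONE_2;META_OB_ZONE_3,META_OB_ZONE_1"):
        print(stmt)
```

On the data above, it selects the same five tenants the author altered: sys, ocp_meta, ocp_monitor, oms_tt_tenant, and oms_ff9_tenant. Reading only the first priority level is a deliberate simplification. Tenants such as oms_cc9_tenant still list META_OB_ZONE_1 at a lower level, but they were left unchanged in the transcript as well.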
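Because the migration is done with ANTMAN, you need to edit obcluster.conf on the machine that runs it, or simply copy the file from the original OCP machine and check it. The image packages also have to be uploaded to /root/t-oceanbase-antman on that machine.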
55obffocp:~/t-oceanbase-antman # cat obcluster.conf
ZONE1_RS_IP=10.10.100.9
ZONE2_RS_IP=10.10.100.87
ZONE3_RS_IP=122.44.11.2
###### 自動配置,無需修改 / AUTO-CONFIGURATION ######
OBSERVER01_HOSTNAME=OCP_META_SERVER_1
OBSERVER02_HOSTNAME=OCP_META_SERVER_2
OBSERVER03_HOSTNAME=OCP_META_SERVER_3
ZONE1_NAME=META_OB_ZONE_1 -- the arguments in later commands must match the values in this file
ZONE2_NAME=META_OB_ZONE_2
ZONE3_NAME=META_OB_ZONE_3
##there must be more than half zone within same region
ZONE1_REGION=OCP_META_REGION
ZONE2_REGION=OCP_META_REGION
ZONE3_REGION=OCP_META_REGION
MYSQL_PORT=2881
RPC_PORT=2882
OCP_VERSION=3.3.3
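Since the `-z` option used later refers to the ZONE1..ZONE3 entries in this file, it can help to print the mapping before running ANTMAN. This is a minimal sketch of a hypothetical reader, not an ANTMAN tool. It assumes plain `KEY=VALUE` lines, skips `#` comments, and keeps only the first token of each value so that trailing notes are ignored.

```python
# Hypothetical obcluster.conf reader (not part of ANTMAN): shows which zone name
# and RootService IP each -z index (1..3) refers to.
from __future__ import annotations

SAMPLE_CONF = """\
ZONE1_RS_IP=10.10.100.9
ZONE2_RS_IP=10.10.100.87
ZONE3_RS_IP=122.44.11.2
ZONE1_NAME=META_OB_ZONE_1
ZONE2_NAME=META_OB_ZONE_2
ZONE3_NAME=META_OB_ZONE_3
MYSQL_PORT=2881
RPC_PORT=2882
OCP_VERSION=3.3.3
"""


def parse_conf(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        tokens = value.split()  # keep the first token only; drops trailing notes
        conf[key.strip()] = tokens[0] if tokens else ""
    return conf


def zone_map(conf: dict[str, str]) -> dict[int, tuple[str, str]]:
    """Map zone index -> (zone name, RS IP) for every fully defined zone."""
    zones: dict[int, tuple[str, str]] = {}
    for i in (1, 2, 3):
        name, ip = conf.get(f"ZONE{i}_NAME"), conf.get(f"ZONE{i}_RS_IP")
        if name and ip:
            zones[i] = (name, ip)
    return zones


if __name__ == "__main__":
    for idx, (name, ip) in zone_map(parse_conf(SAMPLE_CONF)).items():
        print(f"-z {idx}: {name} (RS {ip})")
```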
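Check that the default cluster passwords stored on the machine that runs the ANTMAN scripts are correct: cd ~/t-oceanbase-antman/tools and run getpass.sh. If they are wrong, fix them with setpass.sh. This matters because the OBProxy Docker migration verifies them afterwards, and the OCP Docker migration verifies them beforehand.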
55obffocp:~/t-oceanbase-antman/tools # bash setpass.sh -s 0Aa255yK^F
password file sys in /root/.key already exist!
**********************
Password of root@sys is CqVgg9}Aut
Password of root@ocp_meta is r6kS^EINTU
Password of root@ocp_monitor is pkJv1a{7J7
Password of root@odc is j{fjdd3X9f
Password of root@oms is {oOIsE9fdQ
55obffocp:~ # mv .key .key_bak
55obffocp:~ # cd /root/t-oceanbase-antman/tools/
55obffocp:~/t-oceanbase-antman/tools # bash setpass.sh -s 0Aa255yK^F
**********************
Password of root@sys is 0Aa255yK^F
Password of root@ocp_meta is
Password of root@ocp_monitor is
Password of root@odc is
Password of root@oms is
55obffocp:~/t-oceanbase-antman/tools # bash setpass.sh -c rSf@jO%6EO
**********************
Password of root@sys is 0Aa255yK^F
Password of root@ocp_meta is rSf@jO%6EO
Password of root@ocp_monitor is
Password of root@odc is
Password of root@oms is
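In the setpass.sh output each account gets one line, and an empty value means no password is stored. A hypothetical check (not part of ANTMAN) can parse that output and flag missing entries before manage.sh is run. The article does not say exactly which accounts ANTMAN validates, so the default required list below (sys and ocp_meta, the two the author set) is an assumption.

```python
import re

# Matches lines such as "Password of root@sys is 0Aa255yK^F" (the value may be empty).
LINE_RE = re.compile(r"^Password of root@(\S+) is ?(.*)$")


def parse_setpass_output(text: str) -> dict:
    """Map tenant name -> stored password ('' when none is stored)."""
    passwords = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            passwords[match.group(1)] = match.group(2).strip()
    return passwords


def missing_passwords(passwords: dict, required=("sys", "ocp_meta")) -> list:
    """Required accounts whose password is absent or empty.

    The default `required` tuple is an assumption, not taken from ANTMAN.
    """
    return [tenant for tenant in required if not passwords.get(tenant)]
```

Run on the output after `setpass.sh -s` alone, this would report `['ocp_meta']`. After the second `setpass.sh -c` call it reports nothing missing.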
3.2 Add the new machine
Run ANTMAN's manage script to add the new machine.
PS: manage fails with an error in this version; see the troubleshooting notes at the end.
55obffocp:~/t-oceanbase-antman # ./manage.sh -i ob,ocp,obproxy -l 133.55.22.19 -z 1 -R Jnydzycscc@123 -A OceanBase#123
[2023-06-16 16:31:45.375633] INFO [check conf file /root/t-oceanbase-antman/obcluster.conf format ...]
[2023-06-16 16:31:45.381844] INFO [conf file is upper case format.]
[2023-06-16 16:31:45.391446] INFO [SSH_AUTH=password SSH_USER=root SSH_PORT=22 SSH_PASSWORD= SSH_KEY_FILE=/root/.ssh/id_rsa]
LB_MODE=f5
INSTALL_COMPONENTS componets: ob obproxy ocp
CLEAR_COMPONENTS:
IP_LIST: 133.55.22.19
ZONE_LIST: 1
ROOT_PASSWORD_LIST: Jnydzycscc@123
ADMIN_PASSWORD_LIST: OceanBase#123
[2023-06-16 16:31:45.746503] INFO [INSTALL_COMPONENT: ob START ######################################]
[2023-06-16 16:31:45.751057] INFO [deploy_ob: check whether OBSERVER port 2881,2882 are in use or not on 133.55.22.19]
[2023-06-16 16:31:45.806500] INFO [deploy_ob: OBSERVER port 2881,2882 are idle on 133.55.22.19]
[2023-06-16 16:31:45.810773] INFO [deploy_ob: installing ob cluster, logfile: /root/t-oceanbase-antman/logs/deploy_ob.log]
cp: '/root/t-oceanbase-antman/OB2276_x86_20210409.tar.gz' and '/root/t-oceanbase-antman/OB2276_x86_20210409.tar.gz' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/install_OB_docker.sh' and '/root/t-oceanbase-antman/install_OB_docker.sh' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
skip copy same file
cp: '/root/.key' and '/root/.key' are the same file
skip copy same file
nohup: ignoring input
[2023-06-16 16:31:45.841348] INFO [installing OB docker and starting OB server on 133.55.22.19, pid: 144513, log: /root/t-oceanbase-antman/logs/install_OB_docker.log and /home/admin/logs/ob-server/ inside docker]
[2023-06-16 16:31:45.925592] INFO [load docker image: docker load -i /root/t-oceanbase-antman/OB2276_x86_20210409.tar.gz]
[2023-06-16 16:31:45.930723] INFO [install_OB_docker.sh is still running on 133.55.22.19]
[2023-06-16 16:31:56.021465] INFO [install_OB_docker.sh is still running on 133.55.22.19]
Loaded image: reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409
[2023-06-16 16:32:06.111458] INFO [install_OB_docker.sh is still running on 133.55.22.19]
[2023-06-16 16:32:06.359285] INFO [start container: docker run -d -it --cap-add SYS_RESOURCE --name META_OB_ZONE_1 --net=host -e OBCLUSTER_NAME=obcluster -e DEV_NAME=bond0 -e ROOTSERVICE_LIST="10.10.100.9:2882:2881;10.10.100.87:2882:2881;122.44.11.2:2882:2881" -e DATAFILE_DISK_PERCENTAGE=90 -e CLUSTER_ID=1632654636 -e ZONE_NAME=META_OB_ZONE_1 -e OBPROXY_PORT=2883 -e MYSQL_PORT=2881 -e RPC_PORT=2882 -e OCP_VIP=134.80.173.57 -e OCP_VPORT=80 -e app.password_root='Jnydzycscc@123' -e app.password_admin='OceanBase#123' -e OBPROXY_OPTSTR="" -e OPTSTR="cpu_count=64,system_memory=50G,memory_limit=254G,__min_full_resource_pool_memory=1073741824,_ob_enable_prepared_statement=false,memory_limit_percentage=90" --cpu-period 100000 --cpu-quota 6400000 --cpuset-cpus 0-63 --memory 256G -v /home/admin/oceanbase:/home/admin/oceanbase -v /data/log1:/data/log1 -v /data/1:/data/1 --restart on-failure:5 reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409]
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
4f1c15e8194cc1ae2fffcc124ea9c982b3fda87ce1a6d0038db88435c737af89
[2023-06-16 16:32:16.209761] INFO [install_OB_docker.sh finished and reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409 started on 133.55.22.19]
[2023-06-16 16:32:16.214771] INFO [waiting on observer ready on 133.55.22.19]
[2023-06-16 16:35:16.244133] INFO [waiting on observer ready on 133.55.22.19 for 3 Minitues]
[2023-06-16 16:36:16.264776] INFO [waiting on observer ready on 133.55.22.19 for 4 Minitues]
[2023-06-16 16:37:16.285808] INFO [waiting on observer ready on 133.55.22.19 for 5 Minitues]
[2023-06-16 16:37:16.579057] INFO [observer on 133.55.22.19 is ready]
[2023-06-16 16:37:16.584583] INFO [deploy_ob: installation of ob cluster done]
[2023-06-16 16:37:16.588617] INFO [INSTALL_COMPONENT: ob DONE ######################################]
[2023-06-16 16:37:16.593604] INFO [INSTALL_COMPONENT: obproxy START ######################################]
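The log is too long to paste in full. The part above shows that once the metadb Docker service has been added, ANTMAN moves on to adding the OBProxy Docker service.
Notes on the command
- 133.55.22.19 is the actual physical IP of the server that will replace the OCP node.
- The -z 1 option adds 133.55.22.19 to the 1st ZONE of the OCP environment, i.e. the same ZONE as the 10.10.100.9 machine found earlier. The ZONE concept here applies only to the meta_ob docker on each OCP server; the obproxy docker and the ocp docker have no notion of ZONE. To see which ZONE the meta_ob docker on each OCP server belongs to, refer to the three variables ZONE1_RS_IP, ZONE2_RS_IP, and ZONE3_RS_IP in obcluster.conf.
- -R and -A take the root user password and the admin user password of server 133.55.22.19, respectively.
- -i means install; replacing it with -c means clear.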
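Because these passwords often contain shell metacharacters, it can be safer to build the manage.sh command line programmatically. This is a minimal sketch of a hypothetical wrapper that uses only the options described above. The three-zone limit reflects this obcluster.conf and is an assumption, not an ANTMAN rule. Note that the author single-quoted `'Dt!n(Rg4Av!t'` in step 3.6 for the same reason: unquoted, bash would try history expansion on `!` and choke on `(`.

```python
import shlex

VALID_COMPONENTS = {"ob", "ocp", "obproxy"}


def manage_cmd(action: str, components: list, ip: str, zone: int,
               root_pw: str, admin_pw: str) -> str:
    """Build a shell-safe manage.sh command line (hypothetical wrapper).

    action: "install" maps to -i, "clear" maps to -c, as described above.
    """
    flag = {"install": "-i", "clear": "-c"}[action]
    unknown = set(components) - VALID_COMPONENTS
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    if zone not in (1, 2, 3):  # assumption: three zones, as in this obcluster.conf
        raise ValueError("zone must be 1, 2 or 3 (ZONE1..ZONE3 in obcluster.conf)")
    args = ["./manage.sh", flag, ",".join(components), "-l", ip,
            "-z", str(zone), "-R", root_pw, "-A", admin_pw]
    # shlex.join single-quotes any argument containing characters such as # ! ( )
    return shlex.join(args)
```

For the install step, `manage_cmd("install", ["ob", "ocp", "obproxy"], "133.55.22.19", 1, "Jnydzycscc@123", "OceanBase#123")` returns the author's command, except that `OceanBase#123` comes back single-quoted.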
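If everything went well, you can now log in to the OCP web console at the new node's IP on port 8080, and connect to the meta database through port 2883 on this machine.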
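Before logging in, a plain TCP check can confirm that both ports are listening. This is only a minimal sketch: it tests whether a connection is accepted, not whether OCP or OBProxy is actually healthy behind the port.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `port_open("133.55.22.19", 8080)` checks the OCP web port and `port_open("133.55.22.19", 2883)` checks the OBProxy port.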
3.3 Add the new server to the metadb cluster
Log in to the sys tenant of the OCP metadb and bring the new meta_ob Docker online.
MySQL [oceanbase]> alter system add server '133.55.22.19:2882' zone 'META_OB_ZONE_1';
Query OK, 0 rows affected (0.02 sec)
MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+--------+--------------------+
| svr_ip | zone | with_rootserver | status | start_service_time |
+--------------+----------------+-----------------+--------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 | 1 | active | 1679279220991796 |
| 10.10.100.9 | META_OB_ZONE_1 | 0 | active | 1679275725595691 |
| 122.44.11.2 | META_OB_ZONE_3 | 0 | active | 1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 | 0 | active | 0 |
+--------------+----------------+-----------------+--------+--------------------+
4 rows in set (0.00 sec)
MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+--------+--------------------+
| svr_ip | zone | with_rootserver | status | start_service_time |
+--------------+----------------+-----------------+--------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 | 1 | active | 1679279220991796 |
| 10.10.100.9 | META_OB_ZONE_1 | 0 | active | 1679275725595691 |
| 122.44.11.2 | META_OB_ZONE_3 | 0 | active | 1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 | 0 | active | 1686908200404755 |
+--------------+----------------+-----------------+--------+--------------------+
4 rows in set (0.01 sec)
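The two queries differ only in start_service_time for 133.55.22.19: first 0, then a large number once the server starts serving. The values appear to be microseconds since the Unix epoch; 1686908200404755 falls on 2023-06-16, the day of the operation. The sketch below converts them, assuming the servers use UTC+8 (China Standard Time).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
CST = timezone(timedelta(hours=8))  # assumption: servers run on China Standard Time


def service_started_at(start_service_time_us: int):
    """Convert __all_server.start_service_time (microseconds since epoch) to a datetime.

    Returns None for 0, which the new server shows before it starts serving.
    """
    if start_service_time_us == 0:
        return None
    # timedelta arithmetic keeps the microseconds exact (no float rounding)
    return (EPOCH + timedelta(microseconds=start_service_time_us)).astimezone(CST)
```

`service_started_at(1686908200404755)` gives 2023-06-16 17:36:40.404755 +08:00.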
3.4 Take the replaced server offline
Log in to the sys tenant of the OCP metadb and take the replaced meta_ob Docker offline.
MySQL [oceanbase]> alter system delete server '10.10.100.9:2882' zone 'META_OB_ZONE_1';
Query OK, 0 rows affected (0.19 sec)
MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+----------+--------------------+
| svr_ip | zone | with_rootserver | status | start_service_time |
+--------------+----------------+-----------------+----------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 | 1 | active | 1679279220991796 |
| 10.10.100.9 | META_OB_ZONE_1 | 0 | deleting | 1679275725595691 |
| 122.44.11.2 | META_OB_ZONE_3 | 0 | active | 1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 | 0 | active | 1686908200404755 |
+--------------+----------------+-----------------+----------+--------------------+
4 rows in set (0.01 sec)
MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+--------+--------------------+
| svr_ip | zone | with_rootserver | status | start_service_time |
+--------------+----------------+-----------------+--------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 | 1 | active | 1679279220991796 |
| 122.44.11.2 | META_OB_ZONE_3 | 0 | active | 1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 | 0 | active | 1686908200404755 |
+--------------+----------------+-----------------+--------+--------------------+
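Deletion is asynchronous: the old server stays in `deleting` until its UNITs have moved, and then its row disappears. The sketch below polls for that. The `fetch_rows` callable is a hypothetical hook that you would implement to run `select svr_ip, status from __all_server` through your own MySQL client. If the wait times out, look at where the UNITs are stuck. In this case the cause was a stuck UNIT migration, covered in the troubleshooting notes at the end.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

Row = Tuple[str, str]  # (svr_ip, status) as returned by __all_server


def server_status(rows: Iterable[Row], ip: str) -> Optional[str]:
    """Status of `ip`, or None once its row is gone (delete completed)."""
    for svr_ip, status in rows:
        if svr_ip == ip:
            return status
    return None


def wait_until_removed(fetch_rows: Callable[[], Iterable[Row]], ip: str,
                       interval: float = 60.0, max_checks: int = 60,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until `ip` disappears from __all_server; False if it never does."""
    for attempt in range(max_checks):
        status = server_status(fetch_rows(), ip)
        if status is None:
            return True
        print(f"check {attempt + 1}: {ip} is still {status}")
        if attempt + 1 < max_checks:
            sleep(interval)
    return False
```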
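3.5 Update the server information
Log in to the ocp_meta tenant and update the OCP server information by hand. After the steps above, the OCP web console still shows stale entries for the old machine, and these need to be replaced with the new machine's details.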
55obffocp:~/t-oceanbase-antman # mysql -h10.10.100.87 -P2883 -uroot@ocp_meta#obcluster -p'rSf@jO%6EO' -Docp -c
MySQL [ocp]> select * from compute_host where inner_ip_address='10.10.100.9'\G
*************************** 1. row ***************************
id: 1
name: ocp1a
description: NULL
operating_system: 4.12.14-120-default
architecture: x86_64
inner_ip_address: 10.10.100.9
ssh_port: 2022
kind: DEDICATED_PHYSICAL_MACHINE
publish_ports: NULL
status: ONLINE
vpc_id: 1
idc_id: 1
host_type_id: 1
serial_number: NULL
alias: NULL
create_time: 2021-09-26 21:04:11
update_time: 2023-03-20 11:01:58
1 row in set (0.00 sec)
MySQL [ocp]> update compute_host set inner_ip_address='133.55.22.19', name='55obffocp' where inner_ip_address='10.10.100.9';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MySQL [ocp]> select * from compute_host where id =1;
+----+-----------+-------------+---------------------+--------------+------------------+----------+----------------------------+---------------+--------+--------+--------+--------------+---------------+-------+---------------------+---------------------+
| id | name | description | operating_system | architecture | inner_ip_address | ssh_port | kind | publish_ports | status | vpc_id | idc_id | host_type_id | serial_number | alias | create_time | update_time |
+----+-----------+-------------+---------------------+--------------+------------------+----------+----------------------------+---------------+--------+--------+--------+--------------+---------------+-------+---------------------+---------------------+
| 1 | 55obffocp | NULL | 4.12.14-120-default | x86_64 | 133.55.22.19 | 2022 | DEDICATED_PHYSICAL_MACHINE | NULL | ONLINE | 1 | 1 | 1 | NULL | NULL | 2021-09-26 21:04:11 | 2023-06-16 17:47:19 |
+----+-----------+-------------+---------------------+--------------+------------------+----------+----------------------------+---------------+--------+--------+--------+--------------+---------------+-------+---------------------+---------------------+
1 row in set (0.00 sec)
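Editing compute_host by hand is risky, so one cautious pattern is to run the UPDATE inside a transaction and commit only if exactly one row changed. The sketch below demonstrates the pattern with Python's built-in sqlite3 and a cut-down, hypothetical compute_host table that has only the columns touched here. Against the real ocp_meta tenant you would use a MySQL driver instead, so the driver, schema, and column subset are all assumptions.

```python
import sqlite3


def replace_host(conn: sqlite3.Connection, old_ip: str, new_ip: str, new_name: str) -> None:
    """Repoint the compute_host row for old_ip; commit only if exactly one row changed."""
    cur = conn.execute(
        "UPDATE compute_host SET inner_ip_address = ?, name = ? WHERE inner_ip_address = ?",
        (new_ip, new_name, old_ip),
    )
    if cur.rowcount != 1:
        conn.rollback()  # undo anything the UPDATE touched
        raise RuntimeError(f"expected 1 row for {old_ip}, matched {cur.rowcount}; rolled back")
    conn.commit()


def demo_db() -> sqlite3.Connection:
    """In-memory stand-in with the row shown above (hypothetical minimal schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE compute_host (id INTEGER PRIMARY KEY, name TEXT, inner_ip_address TEXT)")
    conn.execute("INSERT INTO compute_host VALUES (1, 'ocp1a', '10.10.100.9')")
    conn.commit()
    return conn


if __name__ == "__main__":
    db = demo_db()
    replace_host(db, "10.10.100.9", "133.55.22.19", "55obffocp")
    print(db.execute("SELECT id, name, inner_ip_address FROM compute_host").fetchall())
```

Running the script prints `[(1, '55obffocp', '133.55.22.19')]`. This matches the row the author verified with `select * from compute_host where id =1`.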
3.6 Clean up leftover services on the replaced machine
ocp1a:~/t-oceanbase-antman # ./manage.sh -c ob,ocp,obproxy -l 10.10.100.9 -z 1 -R 'Dt!n(Rg4Av!t' -A OceanBase#123
grep: /etc/system-release: No such file or directory
[2023-06-16 22:45:44.101400] INFO [check conf file /root/t-oceanbase-antman/obcluster.conf format ...]
[2023-06-16 22:45:44.106779] INFO [conf file is upper case format.]
[2023-06-16 22:45:44.114437] INFO [SSH_AUTH=password SSH_USER=root SSH_PORT=22 SSH_PASSWORD= SSH_KEY_FILE=/root/.ssh/id_rsa]
LB_MODE=f5
INSTALL_COMPONENTS componets:
CLEAR_COMPONENTS: ob obproxy ocp
IP_LIST: 10.10.100.9
ZONE_LIST: 1
ROOT_PASSWORD_LIST: Dt!n(Rg4Av!t
ADMIN_PASSWORD_LIST: OceanBase#123
[2023-06-16 22:45:44.474268] INFO [CLEAR_COMPONENT: ob START ######################################]
cp: '/root/t-oceanbase-antman/uninstall.sh' and '/root/t-oceanbase-antman/uninstall.sh' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
skip copy same file
grep: /etc/system-release: No such file or directory
[2023-06-16 22:45:44.504069] INFO [remove OB server and docker on host: 10.10.100.9]
[2023-06-16 22:45:44.548697] INFO [docker rm -f 62ab623cb4ed]
62ab623cb4ed
[2023-06-16 22:46:01.260706] INFO [remove OB server and docker on host: 10.10.100.9 done!]
[2023-06-16 22:46:01.370808] INFO [uninstall.sh ob finished and reg.docker.alibaba-inc.com/antman/ob-docker:OB2276_x86_20210409 removed on 10.10.100.9]
[2023-06-16 22:46:01.375914] INFO [OB docker on 10.10.100.9 is removed]
[2023-06-16 22:46:01.380667] INFO [CLEAR_COMPONENT: ob DONE ######################################]
[2023-06-16 22:46:01.385398] INFO [CLEAR_COMPONENT: obproxy START ######################################]
cp: '/root/t-oceanbase-antman/uninstall.sh' and '/root/t-oceanbase-antman/uninstall.sh' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
skip copy same file
grep: /etc/system-release: No such file or directory
[2023-06-16 22:46:01.416495] INFO [remove obproxy docker on host:10.10.100.9]
[2023-06-16 22:46:01.514215] INFO [docker rm -f 01bdcadf2e11]
01bdcadf2e11
[2023-06-16 22:46:01.765459] INFO [remove obproxy docker on host:10.10.100.9 done!]
[2023-06-16 22:46:01.858848] INFO [uninstall.sh obproxy finished and reg.docker.alibaba-inc.com/antman/obproxy:OBP186_20210315 removed on 10.10.100.9]
[2023-06-16 22:46:01.863806] INFO [obproxy docker on 10.10.100.9 is removed]
[2023-06-16 22:46:01.868778] INFO [CLEAR_COMPONENT: obproxy DONE ######################################]
[2023-06-16 22:46:01.873368] INFO [CLEAR_COMPONENT: ocp START ######################################]
cp: '/root/t-oceanbase-antman/uninstall.sh' and '/root/t-oceanbase-antman/uninstall.sh' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/obcluster.conf' and '/root/t-oceanbase-antman/obcluster.conf' are the same file
skip copy same file
cp: '/root/t-oceanbase-antman/common/utils.sh' and '/root/t-oceanbase-antman/common/utils.sh' are the same file
skip copy same file
grep: /etc/system-release: No such file or directory
[2023-06-16 22:46:01.906934] INFO [remove ocp docker on host:10.10.100.9]
[2023-06-16 22:46:01.944811] INFO [docker rm -f 8b044744a92e]
8b044744a92e
[2023-06-16 22:46:26.162467] INFO [remove ocp docker on host:10.10.100.9 done]
[2023-06-16 22:46:26.253927] INFO [uninstall.sh ocp finished and reg.docker.alibaba-inc.com/oceanbase/ocp-all-in-one:3.3.3-20220906114643 removed on 10.10.100.9]
[2023-06-16 22:46:26.258281] INFO [ocp docker on 10.10.100.9 is removed]
[2023-06-16 22:46:26.263047] INFO [CLEAR_COMPONENT: ocp DONE ######################################]
ocp1a:~/t-oceanbase-antman # docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
The services have been removed.
Troubleshooting notes
The manage script fails with an error:
55obffocp:~/t-oceanbase-antman # ./manage.sh -i ob,ocp,obproxy -l 133.55.22.19 -z 1 -R Jnydzycscc@123 -A OceanBase#123
[2023-06-16 16:31:03.305062] INFO [check conf file /root/t-oceanbase-antman/obcluster.conf format ...]
[2023-06-16 16:31:03.309079] INFO [conf file is upper case format.]
[2023-06-16 16:31:03.315290] INFO [SSH_AUTH=password SSH_USER=root SSH_PORT=22 SSH_PASSWORD= SSH_KEY_FILE=/root/.ssh/id_rsa]
LB_MODE=f5
INSTALL_COMPONENTS componets: ob obproxy ocp
CLEAR_COMPONENTS:
IP_LIST: 133.55.22.19
ZONE_LIST: 1
ROOT_PASSWORD_LIST: Jnydzycscc@123
ADMIN_PASSWORD_LIST: OceanBase#123
/root/t-oceanbase-antman/common/utils.sh: line 484: -e: command not found
[2023-06-16 16:31:03.636862] ERROR [: ssh authorization to 133.55.22.19 failed, Please check SSH affinity environment varialbes.]
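This problem has to be fixed by modifying the script code.
Long after running alter system delete server, the replaced server had still not been removed and stayed in deleting status. Investigation showed that the memory parameters of the OCP meta database had been tuned earlier, and the newly added server had a smaller memory_limit, so UNIT migration got stuck.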
MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+----------+--------------------+
| svr_ip | zone | with_rootserver | status | start_service_time |
+--------------+----------------+-----------------+----------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 | 1 | active | 1679279220991796 |
| 10.10.100.9 | META_OB_ZONE_1 | 0 | deleting | 1679275725595691 |
| 122.44.11.2 | META_OB_ZONE_3 | 0 | active | 1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 | 0 | active | 1686908200404755 |
+--------------+----------------+-----------------+----------+--------------------+
4 rows in set (0.01 sec)
MySQL [oceanbase]> select count(*),svr_ip from gv$unit group by svr_ip;
+----------+--------------+
| count(*) | svr_ip |
+----------+--------------+
| 27 | 133.55.22.19 |
| 33 | 10.10.100.87 |
| 33 | 122.44.11.2 |
| 6 | 10.10.100.9 |
+----------+--------------+
4 rows in set (0.01 sec)
MySQL [oceanbase]> select * from gv$unit where svr_ip='10.10.100.9';
+---------+----------------+---------------------------------------------+------------------+----------------------------------------+----------------+-----------+-------------+-------------+----------+---------------------+-----------------------+---------+---------+-------------+-------------+----------+----------+---------------+-----------------+
| unit_id | unit_config_id | unit_config_name | resource_pool_id | resource_pool_name | zone | tenant_id | tenant_name | svr_ip | svr_port | migrate_from_svr_ip | migrate_from_svr_port | max_cpu | min_cpu | max_memory | min_memory | max_iops | min_iops | max_disk_size | max_session_num |
+---------+----------------+---------------------------------------------+------------------+----------------------------------------+----------------+-----------+-------------+-------------+----------+---------------------+-----------------------+---------+---------+-------------+-------------+----------+----------+---------------+-----------------+
| 1106 | 1090 | config_oms_tt_tenant_META_OB_ZONE_1_S2_gpa | 1080 | pool_oms_tt_tenant_META_OB_ZONE_1_gpa | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 3 | 3 | 12884901888 | 12884901888 | 2500 | 2500 | 536870912000 | 750 |
| 1139 | 1094 | oms_unit | 1129 | oms_ff9_tenant_resource_pool | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 2 | 2 | 5368709120 | 4294967296 | 128 | 128 | 5368709120 | 10000 |
| 1122 | 1097 | config_oms_c55_tenant_META_OB_ZONE_1_S1_ifu | 1088 | pool_oms_c55_tenant_META_OB_ZONE_1_ifu | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 1.5 | 1.5 | 6442450944 | 6442450944 | 1250 | 1250 | 536870912000 | 375 |
| 1126 | 1100 | config_oms_ff6_tenant_META_OB_ZONE_1_S1_uzz | 1092 | pool_oms_ff6_tenant_META_OB_ZONE_1_uzz | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 1.5 | 1.5 | 6442450944 | 6442450944 | 1250 | 1250 | 536870912000 | 375 |
| 1127 | 1101 | config_oms_ff7_tenant_META_OB_ZONE_1_S1_gkj | 1093 | pool_oms_ff7_tenant_META_OB_ZONE_1_gkj | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 1.5 | 1.5 | 6442450944 | 6442450944 | 1250 | 1250 | 536870912000 | 375 |
| 1135 | 1108 | config_oms_cc8_tenant_META_OB_ZONE_1_S1_wwo | 1101 | pool_oms_cc8_tenant_META_OB_ZONE_1_wwo | META_OB_ZONE_1 | NULL | NULL | 10.10.100.9 | 2882 | | 0 | 1.5 | 1.5 | 6442450944 | 6442450944 | 1250 | 1250 | 536870912000 | 375 |
+---------+----------------+---------------------------------------------+------------------+----------------------------------------+----------------+-----------+-------------+-------------+----------+---------------------+-----------------------+---------+---------+-------------+-------------+----------+----------+---------------+-----------------+
6 rows in set (0.02 sec)
MySQL [oceanbase]> alter system migrate unit=1106 destination='133.55.22.19:2882';
ERROR 4624 (HY000): machine resource is not enough to hold a new unit -- manual migration fails: not enough resources
MySQL [oceanbase]> select zone,svr_ip, cpu_total, cpu_assigned,cpu_assigned_percent cpu_ass_pct, round(mem_total/1024/1024/1024) mem_total_gb,
-> round(mem_assigned/1024/1024/1024) mem_ass_gb, mem_assigned_percent mem_ass_pct, unit_num, migrating_unit_num, leader_count, round(`load`,2) `load`
-> from __all_virtual_server_stat
    -> order by zone, svr_ip; -- the resource check shows memory is insufficient
+----------------+--------------+-----------+--------------+-------------+--------------+------------+-------------+----------+--------------------+--------------+------+
| zone | svr_ip | cpu_total | cpu_assigned | cpu_ass_pct | mem_total_gb | mem_ass_gb | mem_ass_pct | unit_num | migrating_unit_num | leader_count | load |
+----------------+--------------+-----------+--------------+-------------+--------------+------------+-------------+----------+--------------------+--------------+------+
| META_OB_ZONE_1 | 10.10.100.9 | 62 | 11 | 17 | 250 | 40 | 16 | 6 | 0 | 0 | 0.17 |
| META_OB_ZONE_1 | 133.55.22.19 | 62 | 48 | 77 | 204 | 196 | 96 | 27 | 0 | 0 | 0.87 |
| META_OB_ZONE_2 | 10.10.100.87 | 62 | 59 | 95 | 250 | 236 | 94 | 33 | 0 | 2935 | 0.95 |
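| META_OB_ZONE_3 | 122.44.11.2 | 62 | 59 | 95 | 250 | 236 | 94 | 33 | 0 | 1051 | 0.95 |
+----------------+--------------+-----------+--------------+-------------+--------------+------------+-------------+----------+--------------------+--------------+------+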
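The numbers above explain the stuck migration, and the arithmetic can be checked directly. On 10.10.100.9, the assigned 11 CPUs and 40 GB equal the sums of the six units' min_cpu and min_memory, so this sketch assumes placement is judged on those min_* values. The new server had only 62 - 48 = 14 CPUs and 204 - 196 = 8 GB free. The CPU fits, but 40 GiB of memory does not; even unit 1106 alone (12 GiB) does not fit, which matches ERROR 4624. The figures are also consistent with mem_total = memory_limit - system_memory (254G - 50G = 204, 300G - 50G = 250), using system_memory=50G from the docker run options. That relationship is an inference from this output, not documented behavior. The GB values from the query are rounded.

```python
GIB = 1024 ** 3

# (min_cpu, min_memory in bytes) of the six units left on 10.10.100.9, from gv$unit above
STRANDED_UNITS = {
    1106: (3.0, 12884901888),
    1139: (2.0, 4294967296),
    1122: (1.5, 6442450944),
    1126: (1.5, 6442450944),
    1127: (1.5, 6442450944),
    1135: (1.5, 6442450944),
}


def demand(units):
    """Total CPU and memory (GiB) the units reserve, using their min_* values."""
    cpu = sum(c for c, _ in units.values())
    mem_gib = sum(m for _, m in units.values()) / GIB
    return cpu, mem_gib


def fits(units, cpu_total, cpu_assigned, mem_total_gb, mem_assigned_gb):
    """Do the units fit in the free capacity reported by __all_virtual_server_stat?"""
    cpu, mem = demand(units)
    return cpu <= cpu_total - cpu_assigned and mem <= mem_total_gb - mem_assigned_gb


def mem_total_gb(memory_limit_gb, system_memory_gb=50):
    # Assumption inferred from this output: 254 - 50 = 204 and 300 - 50 = 250.
    return memory_limit_gb - system_memory_gb


if __name__ == "__main__":
    print("demand (cpu, GiB):", demand(STRANDED_UNITS))
    print("fits at memory_limit=254G:", fits(STRANDED_UNITS, 62, 48, mem_total_gb(254), 196))
    print("fits at memory_limit=300G:", fits(STRANDED_UNITS, 62, 48, mem_total_gb(300), 196))
```

This prints a demand of (11.0, 40.0), then False at 254G and True at 300G. Once the units move, the new server would carry 48 + 11 = 59 CPUs and 196 + 40 = 236 GB, the same load as the servers in the other two zones.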
MySQL [oceanbase]> show parameters like '%memory_limit%';
| META_OB_ZONE_3 | observer | 122.44.11.2 | 2882 | memory_limit | NULL | 300G | the size of the memory reserved for internal use(for testing purpose). Range: [0M,) | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| META_OB_ZONE_1 | observer | 133.55.22.19 | 2882 | memory_limit | NULL | 254G | the size of the memory reserved for internal use(for testing purpose). Range: [0M,) | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| META_OB_ZONE_2 | observer | 10.10.100.87 | 2882 | memory_limit | NULL | 300G | the size of the memory reserved for internal use(for testing purpose). Range: [0M,) | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
| META_OB_ZONE_1 | observer | 10.10.100.9 | 2882 | memory_limit | NULL | 300G | the size of the memory reserved for internal use(for testing purpose). Range: [0M,) | OBSERVER | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |
MySQL [oceanbase]> alter system set memory_limit ='300G' ;
Query OK, 0 rows affected (0.05 sec)
MySQL [oceanbase]> select count(*),svr_ip from gv$unit group by svr_ip;
+----------+--------------+
| count(*) | svr_ip |
+----------+--------------+
| 33 | 133.55.22.19 |
| 33 | 10.10.100.87 |
| 33 | 122.44.11.2 |
+----------+--------------+
3 rows in set (0.00 sec)
MySQL [oceanbase]> select svr_ip, zone, with_rootserver, status, start_service_time from __all_server;
+--------------+----------------+-----------------+--------+--------------------+
| svr_ip | zone | with_rootserver | status | start_service_time |
+--------------+----------------+-----------------+--------+--------------------+
| 10.10.100.87 | META_OB_ZONE_2 | 1 | active | 1679279220991796 |
| 122.44.11.2 | META_OB_ZONE_3 | 0 | active | 1640097866698859 |
| 133.55.22.19 | META_OB_ZONE_1 | 0 | active | 1686908200404755 |
+--------------+----------------+-----------------+--------+--------------------+
3 rows in set (0.00 sec)
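Summary
That completes the replacement of an OCP machine with ANTMAN. Together with the previous article on replacing OCP nodes with OAT, it may not look difficult, but I went through the whole process several times, with plenty of setbacks (and tears) along the way. I wrote these two articles so that others can avoid the same pitfalls.
If you read the previous article, you will know that when OCP is replaced with OAT, the new machine is added to metadb as a newly created ZONE and the replaced machine is then taken offline; that route also involves creating a new resource pool, modifying Locality, increasing the replica count, and so on. With ANTMAN the steps differ: the new machine joins the same ZONE as the machine being replaced, the UNITs are migrated within that ZONE, and then the replaced machine is taken offline.
For now, replacing with ANTMAN has comparatively less impact on OCP metadata, while OAT involves fewer command-line operations. Early deployments where OBProxy runs in its own Docker container must use ANTMAN; for later versions, choose whichever fits your situation.
Keep walking toward where you are headed, and don't ask how far it is.
About SQLE
SQLE from the ActionTech (愛可生) open-source community is a SQL audit tool for database users and administrators. It supports auditing across multiple scenarios and standardized release workflows, natively supports MySQL auditing, and can be extended to other database types.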