BlueKing Community Edition: bkdata Service Fails to Start During Deployment

  • Problem: during a full deployment of BlueKing Community Edition, the bkdata installation step fails with "databus.service.consul start failed."

[root@paas-1 install]# ./bkcec start bkdata

[192.168.50.117]20181212-091416 72   starting bkdata(ALL) on host: 192.168.50.115

"-":23: bad minute

errors in crontab file, can't install.

[192.168.50.117]20181212-091427 79   going to init snapshot data. this may take a while.

E

======================================================================

ERROR: init_snapshot_config (databus.tests.DatabusHealthTestCase)

------------------------------------------------------------------------------------------------------------------------------

  • Troubleshooting:

  1. dig databus.service.consul resolves normally.

  2. ./bkcec start bkdata databus reports "ERROR: init_snapshot_config (databus.tests.DatabusHealthTestCase)".

  3. Run ./bkcec stop bkdata, then ./bkcec install bkdata 1 (clears the previous environment and reinstalls over it).

  4. Run ./bkcec initdata bkdata (initializes bkdata).

  5. Run ./bkcec start bkdata; it reports "databus.service.consul start failed." again.

Note: cat .bk_install.step shows the installation progress.
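The reinstall in steps 3 to 5 can be replayed as one shell function that stops at the first failing step, which makes it obvious which stage (install, initdata or start) breaks. This is a minimal sketch: the function name reinstall_bkdata and the BKCEC override variable are hypothetical, and it assumes it is run from the install directory that contains ./bkcec. It only chains the bkcec subcommands quoted above; it does not restart cmdb, which is what actually fixed the problem here.

```shell
# Hypothetical helper: replay troubleshooting steps 3-5 in order and stop at
# the first failure. BKCEC is an assumed override; by default the bkcec
# script in the current (install) directory is used.
reinstall_bkdata() (
    bkcec=${BKCEC:-./bkcec}
    # "install bkdata 1" overwrites the previous environment, as in step 3.
    for step in "stop bkdata" "install bkdata 1" "initdata bkdata" "start bkdata"; do
        echo ">>> $bkcec $step"
        # $step is left unquoted on purpose so it splits into subcommand + args.
        "$bkcec" $step || { echo "step failed: $step" >&2; exit 1; }
    done
)
```

Run it as root from /data/install, for example reinstall_bkdata 2>&1 | tee /tmp/reinstall_bkdata.log, so the output can be reviewed afterwards.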

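  • Resolution: restarting cmdb fixed the problem.

  • Root cause:

  1. bkdata could not fetch the basic business information from cmdb, which caused the error.

  2. There is also a bug in a script.

(On the bkdata host, run vim /data/bkce/bkdata/dataapi/databus/tests.py and comment out the code that references the "update_bizid" field. This is to be fixed in the next release.)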

tests.py after the references are commented out:

[Figure: tests.py with the update_bizid references commented out]
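Instead of editing tests.py by hand in vim, the change can be scripted with sed while keeping a backup copy. This is a minimal sketch assuming every update_bizid reference sits on a single line; the function name comment_out_update_bizid is hypothetical. If a reference is a def line, spans several lines, or is the only statement in a block, commenting it this way leaves invalid Python, so diff the result against the .bak copy (or run python -m py_compile on it) before rerunning initdata.

```shell
# Hypothetical helper: prefix "# " to every line that mentions update_bizid
# and is not already a comment. The original file is kept as <file>.bak.
comment_out_update_bizid() (
    file=${1:-/data/bkce/bkdata/dataapi/databus/tests.py}
    cp -p "$file" "$file.bak" || exit 1
    # Insert "# " after the indentation so the line keeps its position;
    # lines whose first non-blank character is already "#" do not match.
    sed '/update_bizid/ s/^\([[:space:]]*\)\([^#[:space:]]\)/\1# \2/' "$file.bak" > "$file"
)
```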


Other useful commands:

  • Locate the controller host:

[root@paas-1 install]# cat /data/install/.controller_ip

192.168.50.117

  • View logs:

[root@paas-1 install]# cd /data/bkce/logs/

[root@paas-1 logs]# ll

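  • Load the ssh helpers ($ marks a variable; cat /data/install/utils.fc to see the definitions)

[root@paas-1 install]# source utils.fc

[root@paas-1 install]# ssh $BKDATA_IP

#After logging in over ssh, run ifconfig to confirm the host's IP. utils.fc is a script file; loading it mainly lets you log in to a host by its service name instead of by IP.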

[root@rbtnode1 install]# ssh $FTA_IP
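To see which service names can be used this way, the variables defined by utils.fc can be listed from a subshell, so the current shell is left untouched. This sketch assumes utils.fc sets plain shell variables named like BKDATA_IP and FTA_IP (the two used above); the function name list_service_ips is hypothetical, and any other *_IP names it prints depend on your utils.fc.

```shell
# Hypothetical helper: source utils.fc in a subshell and print every
# variable whose name ends in _IP (e.g. BKDATA_IP, FTA_IP).
list_service_ips() (
    . "${1:-/data/install/utils.fc}" >/dev/null 2>&1
    # `set` prints NAME=value for all shell variables; keep the *_IP ones.
    set | grep '^[A-Z0-9_]*_IP='
)
```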

  • List files, oldest first (-s shows sizes, -r reverses, -t sorts by modification time):

[root@rbtnode1 bkdata]# ls -lsrt

  • Show log output:

[root@rbtnode1 bkdata]# tail -f kernel.log
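tail -f follows a single file. To see at a glance which recently written logs under /data/bkce/logs contain errors (for example right after ./bkcec start bkdata fails), a small helper can scan them. This is a sketch: the name recent_log_errors is hypothetical, the ERROR|Exception pattern is an assumption about the log format, and find -mmin is supported by GNU and BSD find but is not strictly POSIX.

```shell
# Hypothetical helper: for *.log files under a directory modified in the
# last N minutes (default 30), print the last 5 ERROR/Exception lines.
recent_log_errors() (
    dir=${1:-/data/bkce/logs}
    mins=${2:-30}
    find "$dir" -type f -name '*.log' -mmin "-$mins" | while IFS= read -r f; do
        hits=$(grep -E 'ERROR|Exception' "$f" | tail -n 5)
        if [ -n "$hits" ]; then
            printf '== %s\n%s\n' "$f" "$hits"
        fi
    done
)
```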

  • Check resource usage:

[root@rbtnode1 bkdata]# top

top - 09:35:26 up 18:53,  1 user,  load average: 17.46, 12.08, 10.62

Tasks: 361 total,   1 running, 359 sleeping,   1 stopped,   0 zombie

%Cpu(s): 64.0 us, 14.2 sy,  0.0 ni, 20.8 id,  0.2 wa,  0.0 hi,  0.9 si,  0.0 st

KiB Mem : 16267340 total,  2454172 free, 12075284 used,  1737884 buff/cache

KiB Swap:  6160380 total,  6160380 free,        0 used.  3669524 avail Mem
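A load average of 17.46 only means something relative to the number of CPU cores. A quick check divides the 1-minute load by the online core count. This is a sketch: the function name load_report is hypothetical, it reads /proc/loadavg (Linux only) when no arguments are given, and the threshold of 1.0 per core is a common rule of thumb rather than a BlueKing limit.

```shell
# Hypothetical helper: 1-minute load average divided by online CPU cores.
# Both arguments are optional; defaults come from /proc/loadavg and getconf.
load_report() (
    load=${1:-$(cut -d' ' -f1 /proc/loadavg)}
    cores=${2:-$(getconf _NPROCESSORS_ONLN)}
    awk -v l="$load" -v c="$cores" 'BEGIN {
        r = l / c
        # More than one runnable task per core on average suggests CPU saturation.
        printf "load=%s cores=%s per_core=%.2f %s\n", l, c, r, (r > 1 ? "OVERLOADED" : "ok")
    }'
)
```

Run load_report with no arguments on the bkdata host. With the load above and a hypothetical 8 cores, load_report 17.46 8 prints load=17.46 cores=8 per_core=2.18 OVERLOADED.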

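  • Check the scheduled (cron) jobs: confirm that the services are being kept running, or check the configuration files to see whether a cluster leader has been elected

If the crontab contains garbled entries, clear the scheduled jobs and reinstall the crontab with [root@rbtnode1 install]# ./bkcec install cron 1. The service entries are also written to the crontab automatically when the services start.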

[root@rbtnode1 ~]# crontab -l

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch consul >/dev/null 2>&1

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch nginx >/dev/null 2>&1

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch zk >/dev/null 2>&1

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch rabbitmq >/dev/null 2>&1

* * * * * /usr/local/gse/agent/bin/gsectl watch

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch paas_agent >/dev/null 2>&1

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch es >/dev/null 2>&1

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch kafka >/dev/null 2>&1

*/10 * * * * /data/bkce/bkdata/dataapi/bin/update_cc_cache.sh
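The earlier '"-":23: bad minute' message means crontab rejected line 23 of the input it read from standard input because the first (minute) field was invalid, which is also what garbled entries produce. A rough pre-check before reinstalling the crontab is to validate the minute field of each job line. This sketch checks only that field (0-59, *, ranges, comma lists and /N steps) and is not a full cron parser; check_cron_minutes is a hypothetical name.

```shell
# Hypothetical helper: flag crontab job lines whose minute field is invalid.
# Reads the given file, or standard input when no file is given.
check_cron_minutes() {
    awk '
        NF == 0 { next }                                   # blank lines
        /^[ \t]*#/ { next }                                # comments
        /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }    # NAME=value lines
        /^[ \t]*@/ { next }                                # @reboot, @daily, ...
        {
            ok = 1
            n = split($1, parts, ",")                      # minute field may be a list
            for (i = 1; i <= n; i++) {
                p = parts[i]
                sub(/\/[0-9]+$/, "", p)                    # drop a trailing /step
                if (p == "*") continue
                if (p ~ /^[0-9]+$/ && p + 0 <= 59) continue
                if (p ~ /^[0-9]+-[0-9]+$/) {
                    split(p, r, "-")
                    if (r[1] + 0 <= r[2] + 0 && r[2] + 0 <= 59) continue
                }
                ok = 0
            }
            if (!ok) { printf "line %d: bad minute: %s\n", NR, $0; bad = 1 }
        }
        END { exit bad }
    ' "${1:--}"
}
```

Typical use: crontab -l | check_cron_minutes. A non-zero exit status means at least one line was flagged.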

  • Check processes:

[root@rbtnode1 ~]# ps -ef |grep bkdata

[root@rbtnode1 ~]# ps -ef |grep gse_agent


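Note: the script-bug workaround mainly addresses the following problem, hit while initializing bkdata during a BlueKing deployment:

  • Solution: on the bkdata host, run vim /data/bkce/bkdata/dataapi/databus/tests.py and comment out the code that references "update_bizid".

  • Cause: if that code is left in, it consumes a large amount of host resources, and the resulting performance bottleneck prevents the BlueKing services from starting.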

[root@paas-1 install]# ./bkcec initdata bkdata


                       initdata for bkdata()                        

[192.168.50.117]20181212-101752 153   exec initdata_bkdata on 192.168.50.115

[192.168.50.115]20181212-101755 103   start to make migration for bkdata ...

[192.168.50.115]20181212-101755 111   on-migrate ... /data/bkce/bkdata/dataapi/on_migrate

[192.168.50.115]20181212-101757 9   init dataserver zk config

[192.168.50.115]20181212-101757 12   create topic

[192.168.50.115]20181212-101758 15   run trt migration

System check identified some issues:


WARNINGS:

trt.TrtResultTableField.field_index: (fields.W122) 'max_length' is ignored when used with IntegerField

       HINT: Remove 'max_length' from field

Operations to perform:

 Apply all migrations: trt

Running migrations:

 No migrations to apply.

 Your models have changes that are not yet reflected in a migration, and so won't be applied.

 Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.

[192.168.50.115]20181212-101801 18   insert reserved dataid

E=================set reserved dataid========================================

======================================================================

ERROR: update_reserved_dataid (databus.tests.DatabusHealthTestCase)

------------------------------------------------------------------------------------------------------------------------------

Traceback (most recent call last):

 File "/data/bkce/bkdata/dataapi/databus/tests.py", line 46, in update_reserved_dataid

   blueking_bizid = utils.get_blueking_bizid()

 File "/data/bkce/bkdata/dataapi/databus/init/utils.py", line 19, in get_blueking_bizid

   raise Exception('Failed to get application id of BlueKing. The response is error %s' % json.dumps(ret))

Exception: Failed to get application id of BlueKing. The response is error {"message": "Component request third-party system [CC] interface [get_app_list] error: Status Code: 404, Error Message: Third-party system does not find this interface, please try again later or contact component developer to handle this", "code": 1306201, "data": null, "result": false, "request_id": "47ef124353824f7a898900c0defc93e1"}


----------------------------------------------------------------------

Ran 1 test in 0.759s


FAILED (errors=1)

[192.168.50.115]20181212-101804 21   running 'update_reserved_dataid' for databus health test failed.

[192.168.50.115]20181212-101804 130   migrate failed for bkdata(dataapi)

[192.168.50.117]20181212-101803 453   create database bksuite_common

[192.168.50.117]20181212-101803 455   add version info to db
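When initdata output is long, the failing stage is easier to spot by filtering the bkcec progress lines. This sketch assumes the "[IP]YYYYMMDD-HHMMSS <n>   <message>" layout seen above; show_failed_steps is a hypothetical name. On this run it points at update_reserved_dataid, whose traceback shows CC (cmdb) answering get_app_list with a 404, consistent with the cmdb root cause.

```shell
# Hypothetical helper: print host, timestamp and message for every bkcec
# progress line that mentions a failure. Reads a file or standard input.
show_failed_steps() {
    awk '
        /^\[[0-9.]+\][0-9]+-[0-9]+ / && tolower($0) ~ /fail/ {
            host = substr($1, 2, index($1, "]") - 2)   # text inside [ ]
            ts = substr($1, index($1, "]") + 1)        # YYYYMMDD-HHMMSS
            msg = $0
            sub(/^[^ ]+ +[0-9]+ +/, "", msg)           # strip prefix and line number
            printf "%s %s %s\n", host, ts, msg
        }
    ' "${1:--}"
}
```

For example: ./bkcec initdata bkdata 2>&1 | tee /tmp/initdata_bkdata.log, then show_failed_steps /tmp/initdata_bkdata.log.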



Environment:

[root@paas-1 install]# cd /data/src/
You have new mail in /var/spool/mail/root
[root@paas-1 src]# grep . VERSION  */VERSION */*/VERSION
VERSION:4.1.16
cmdb/VERSION:0.0.42
fta/VERSION:4.1.12
gse/VERSION:3.2.12
job/VERSION:4.3.3
license/VERSION:3.1.4
open_paas/VERSION:3.0.83
paas_agent/VERSION:3.0.8
bkdata/dataapi/VERSION:1.2.105
bkdata/databus/VERSION:1.2.23
bkdata/monitor/VERSION:0.2.6
[root@paas-1 src]#

Note: this article summarizes what I have recently learned about BlueKing and is shared for reference only.























