BlueKing Community Edition: bkdata service fails to start during deployment

  • Problem: during a full deployment of BlueKing Community Edition, the bkdata installation step reports "databus.service.consul start failed."

[root@paas-1 install]# ./bkcec start bkdata

[192.168.50.117]20181212-091416 72   starting bkdata(ALL) on host: 192.168.50.115

"-":23: bad minute

errors in crontab file, can't install.

[192.168.50.117]20181212-091427 79   going to init snapshot data. this may take a while.

E

======================================================================

ERROR: init_snapshot_config (databus.tests.DatabusHealthTestCase)

------------------------------------------------------------------------------------------------------------------------------

  • Troubleshooting steps:

  1. Ran dig databus.service.consul; the name resolves normally.

  2. Ran ./bkcec start bkdata databus; it reports "ERROR: init_snapshot_config (databus.tests.DatabusHealthTestCase)".

  3. Ran ./bkcec stop bkdata, then ./bkcec install bkdata 1 (removes the previous environment and reinstalls over it).

  4. Ran ./bkcec initdata bkdata (initializes bkdata).

  5. Ran ./bkcec start bkdata; it again reports "databus.service.consul start failed."

Note: cat  .bk_install.step shows the installation progress...
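
Steps 3 to 5 above can be replayed as one sequence. The following is only a minimal sketch: the function name redeploy_bkdata and the DRY_RUN variable are hypothetical, and it assumes bkcec exits with a non-zero status on failure, which this article does not confirm. It is meant to be run from the install directory on the controller host.

```shell
# Sketch: replay the stop / install / initdata / start sequence.
# DRY_RUN is a hypothetical switch; it defaults to 1 so that the
# commands are only printed unless DRY_RUN=0 is set explicitly.
redeploy_bkdata() {
    for step in "stop bkdata" "install bkdata 1" "initdata bkdata" "start bkdata"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "./bkcec $step"
        else
            # $step is left unquoted on purpose so it splits into arguments.
            # Assumption: bkcec returns non-zero when a step fails.
            ./bkcec $step || { echo "failed at: ./bkcec $step" >&2; return 1; }
        fi
    done
}
```

Calling redeploy_bkdata prints the four commands for review; DRY_RUN=0 redeploy_bkdata would actually run them and stop at the first step that fails.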

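  • Resolution: restarted cmdb.

  • Root cause analysis:

  1. bkdata could not retrieve the basic business information from cmdb, which caused the error.

  2. There is also a script bug.

(On the bkdata machine, run vim /data/bkce/bkdata/dataapi/databus/tests.py and comment out the code that references the "update_bizid" field. This issue is to be fixed in the next release.)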

Screenshot of tests.py after the code is commented out:

[Image: 圖片.png]
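
Before editing tests.py by hand, it may help to back the file up and list every line that mentions update_bizid. This is a hedged sketch: the helper name list_refs and the .bak suffix are assumptions, and it deliberately does not comment anything out automatically, since the exact lines to change depend on the file's contents.

```shell
# Sketch: back up a file, then list the lines that mention a pattern
# so they can be reviewed before being commented out by hand.
list_refs() {
    file=$1
    pattern=$2
    cp -p "$file" "$file.bak" || return 1   # keep a copy to roll back to
    grep -n -- "$pattern" "$file"           # prints "line-number:content"
}
```

On the bkdata machine this could be used as list_refs /data/bkce/bkdata/dataapi/databus/tests.py update_bizid; restoring the .bak copy undoes a bad edit.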


Other helpful commands:

  • Identify the controller host:

[root@paas-1 install]# cat /data/install/.controller_ip

    192.168.50.117

  • View the logs:

[root@paas-1 install]# cd /data/bkce/logs/

[root@paas-1 logs]# ll

  • Load the ssh helper ($ denotes a variable; see cat /data/install/utils.fc)

[root@paas-1 install]# source utils.fc

[root@paas-1 install]# ssh $BKDATA_IP

# After logging in over ssh, you can run ifconfig to check the host's IP. utils.fc is a script file; sourcing it lets you log in to a host by its service name instead of by IP address.

[root@rbtnode1 install]# ssh $FTA_IP

  • List files in detail:

[root@rbtnode1 bkdata]# ls -lsrt

  • Follow the log output:

[root@rbtnode1 bkdata]# tail -f kernel.log

  • Check resource usage:

[root@rbtnode1 bkdata]# top

top - 09:35:26 up 18:53,  1 user,  load average: 17.46, 12.08, 10.62

Tasks: 361 total,   1 running, 359 sleeping,   1 stopped,   0 zombie

%Cpu(s): 64.0 us, 14.2 sy,  0.0 ni, 20.8 id,  0.2 wa,  0.0 hi,  0.9 si,  0.0 st

KiB Mem : 16267340 total,  2454172 free, 12075284 used,  1737884 buff/cache

KiB Swap:  6160380 total,  6160380 free,        0 used.  3669524 avail Mem

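  • View the scheduled tasks (crontab): confirm that the services are running normally, or check the configuration files to see whether a cluster leader has been elected

If the crontab contains garbled entries, clear the scheduled tasks and then run [root@rbtnode1 install]# ./bkcec install cron 1 to reinstall the crontab. The entries are also written to the crontab automatically when the services start.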

[root@rbtnode1 ~]# crontab -l

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch consul >/dev/null 2>&1

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch nginx >/dev/null 2>&1

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch zk >/dev/null 2>&1

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch rabbitmq >/dev/null 2>&1

* * * * * /usr/local/gse/agent/bin/gsectl watch

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch paas_agent >/dev/null 2>&1

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch es >/dev/null 2>&1

* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch kafka >/dev/null 2>&1

*/10 * * * * /data/bkce/bkdata/dataapi/bin/update_cc_cache.sh

  • View process information:

[root@rbtnode1 ~]# ps -ef |grep bkdata

[root@rbtnode1 ~]# ps -ef |grep gse_agent
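
For the crontab check in the list above, garbled entries such as the one behind the "bad minute" error can be hard to spot by eye. The sketch below is a rough heuristic only: check_cron_minutes is a hypothetical helper, it checks just the shape of the first (minute) field, and it does not verify that values fall within 0-59.

```shell
# Sketch: flag crontab lines whose first field does not look like a
# cron minute expression (*, N, N-M, optional /step, comma lists).
# Line numbers count every input line, blank ones included.
check_cron_minutes() {
    awk -v re='^([*]|[0-9]+(-[0-9]+)?)(/[0-9]+)?(,([*]|[0-9]+(-[0-9]+)?)(/[0-9]+)?)*$' '
        NF == 0 { next }                           # skip blank lines
        /^[ \t]*#/ { next }                        # skip comments
        $1 ~ /^@/ { next }                         # skip @reboot-style entries
        $1 ~ /^[A-Za-z_][A-Za-z0-9_]*=/ { next }   # skip VAR=value lines
        $1 !~ re { printf "line %d: suspicious minute field: %s\n", NR, $1 }
    '
}
```

It reads the crontab from standard input, for example crontab -l | check_cron_minutes, and prints nothing when every entry passes.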


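Note: the script-bug workaround mainly addresses the following problem, which occurs when initializing bkdata during a BlueKing deployment:

  • Fix: on the bkdata machine, run vim /data/bkce/bkdata/dataapi/databus/tests.py and comment out the code that references the "update_bizid" field.

  • Root cause: if it is not commented out, the referenced code consumes a large amount of host resources, and the resulting performance bottleneck prevents the BlueKing services from starting.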

[root@paas-1 install]# ./bkcec initdata bkdata


                       initdata for bkdata()                        

[192.168.50.117]20181212-101752 153   exec initdata_bkdata on 192.168.50.115

[192.168.50.115]20181212-101755 103   start to make migration for bkdata ...

[192.168.50.115]20181212-101755 111   on-migrate ... /data/bkce/bkdata/dataapi/on_migrate

[192.168.50.115]20181212-101757 9   init dataserver zk config

[192.168.50.115]20181212-101757 12   create topic

[192.168.50.115]20181212-101758 15   run trt migration

System check identified some issues:


WARNINGS:

trt.TrtResultTableField.field_index: (fields.W122) 'max_length' is ignored when used with IntegerField

       HINT: Remove 'max_length' from field

Operations to perform:

 Apply all migrations: trt

Running migrations:

 No migrations to apply.

 Your models have changes that are not yet reflected in a migration, and so won't be applied.

 Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.

[192.168.50.115]20181212-101801 18   insert reserved dataid

E=================set reserved dataid========================================

======================================================================

ERROR: update_reserved_dataid (databus.tests.DatabusHealthTestCase)

------------------------------------------------------------------------------------------------------------------------------

Traceback (most recent call last):

 File "/data/bkce/bkdata/dataapi/databus/tests.py", line 46, in update_reserved_dataid

   blueking_bizid = utils.get_blueking_bizid()

 File "/data/bkce/bkdata/dataapi/databus/init/utils.py", line 19, in get_blueking_bizid

   raise Exception('Failed to get application id of BlueKing. The response is error %s' % json.dumps(ret))

Exception: Failed to get application id of BlueKing. The response is error {"message": "Component request third-party system [CC] interface [get_app_list] error: Status Code: 404, Error Message: Third-party system does not find this interface, please try again later or contact component developer to handle this", "code": 1306201, "data": null, "result": false, "request_id": "47ef124353824f7a898900c0defc93e1"}


----------------------------------------------------------------------

Ran 1 test in 0.759s


FAILED (errors=1)

[192.168.50.115]20181212-101804 21   running 'update_reserved_dataid' for databus health test failed.

[192.168.50.115]20181212-101804 130   migrate failed for bkdata(dataapi)

[192.168.50.117]20181212-101803 453   create database bksuite_common

[192.168.50.117]20181212-101803 455   add version info to db
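
Because the output keeps scrolling after the failure, it is easy to miss that initdata did not succeed. As a minimal sketch, the saved output can be scanned for the two failure markers that appear in the log above. The helper name initdata_failed and the log path used below are assumptions, and other failure messages may exist that this pattern does not cover.

```shell
# Sketch: return 0 (true) if a saved initdata log contains one of the
# failure markers seen in the output above.
initdata_failed() {
    grep -E -q 'migrate failed for|^FAILED \(errors=[0-9]+\)' "$1"
}

# Possible use (the log path is hypothetical):
#   ./bkcec initdata bkdata 2>&1 | tee /tmp/initdata.log
#   initdata_failed /tmp/initdata.log && echo "initdata failed; fix it before ./bkcec start bkdata"
```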



Environment:

[root@paas-1 install]# cd /data/src/
You have new mail in /var/spool/mail/root
[root@paas-1 src]# grep . VERSION  */VERSION */*/VERSION
VERSION:4.1.16
cmdb/VERSION:0.0.42
fta/VERSION:4.1.12
gse/VERSION:3.2.12
job/VERSION:4.3.3
license/VERSION:3.1.4
open_paas/VERSION:3.0.83
paas_agent/VERSION:3.0.8
bkdata/dataapi/VERSION:1.2.105
bkdata/databus/VERSION:1.2.23
bkdata/monitor/VERSION:0.2.6
[root@paas-1 src]#

Note: this article is a personal summary of what I recently learned about BlueKing and is provided for reference only.























