Ceph OSD (Hammer) fails to start automatically at boot

Analysis and resolution

1. The first suspect was the partition type code: the disk partitions of several manually added OSDs still had their original type codes.

  • Type codes for the two kinds of partitions in Ceph:
type      type code
journal   45b0969e-9b03-4f30-b4c6-b4b80ceff106
osd       4fbd7e29-9d25-41b8-afd0-062c0ceff05d
  • Change the partition type code and the partition name:
root@host1:~# sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 -c 1:"ceph journal" /dev/nvme0n1
The operation has completed successfully.

root@host1:~# sgdisk -t 3:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -c 3:"ceph data" /dev/nvme0n1
The operation has completed successfully.
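
To double-check a partition after changing it, the type code can be mapped back to the role it marks. The following is a minimal sketch: the function name ceph_partition_role is hypothetical, and the two GUIDs are the ones from the table above.

```shell
#!/bin/sh
# Map a GPT partition type GUID to the Ceph role it marks.
# ceph_partition_role is a hypothetical helper name, not part of Ceph.
ceph_partition_role() {
    # Normalize to lowercase so that uppercase GUIDs also match.
    guid=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case "$guid" in
        45b0969e-9b03-4f30-b4c6-b4b80ceff106) echo "journal" ;;
        4fbd7e29-9d25-41b8-afd0-062c0ceff05d) echo "osd" ;;
        *) echo "other" ;;
    esac
}
```

Assuming that sgdisk -i prints a "Partition GUID code:" line with the GUID in the fourth field, it could be used as: ceph_partition_role "$(sgdisk -i 1 /dev/nvme0n1 | awk '/Partition GUID code/ {print $4}')"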

After this first fix the OSDs still did not start at boot, so the analysis continued:

2. An attempt to start all OSDs manually with ceph-disk activate-all failed with the following errors:

Error EINVAL: entity osd.27 exists but cap mon does not match
ERROR:ceph-disk:Failed to activate
ceph-disk: Command '['/usr/bin/ceph', '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', '/var/lib/ceph/bootstrap-osd/ceph.keyring', 'auth', 'add', 'osd.27', '-i', '/var/lib/ceph/tmp/mnt.pHvUBm/keyring', 'osd', 'allow *', 'mon', 'allow profile osd']' returned non-zero exit status 22
Error EINVAL: entity osd.30 exists but cap mon does not match
ERROR:ceph-disk:Failed to activate
ceph-disk: Command '['/usr/bin/ceph', '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', '/var/lib/ceph/bootstrap-osd/ceph.keyring', 'auth', 'add', 'osd.30', '-i', '/var/lib/ceph/tmp/mnt.KJtXc9/keyring', 'osd', 'allow *', 'mon', 'allow profile osd']' returned non-zero exit status 22
Error EINVAL: entity osd.24 exists but cap mon does not match
ERROR:ceph-disk:Failed to activate
ceph-disk: Command '['/usr/bin/ceph', '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', '/var/lib/ceph/bootstrap-osd/ceph.keyring', 'auth', 'add', 'osd.24', '-i', '/var/lib/ceph/tmp/mnt.HraMX9/keyring', 'osd', 'allow *', 'mon', 'allow profile osd']' returned non-zero exit status 22
Error EINVAL: entity osd.29 exists but cap mon does not match
ERROR:ceph-disk:Failed to activate
ceph-disk: Command '['/usr/bin/ceph', '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', '/var/lib/ceph/bootstrap-osd/ceph.keyring', 'auth', 'add', 'osd.29', '-i', '/var/lib/ceph/tmp/mnt.tQTBrQ/keyring', 'osd', 'allow *', 'mon', 'allow profile osd']' returned non-zero exit status 22
Error EINVAL: entity osd.28 exists but cap mon does not match
ERROR:ceph-disk:Failed to activate
ceph-disk: Command '['/usr/bin/ceph', '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', '/var/lib/ceph/bootstrap-osd/ceph.keyring', 'auth', 'add', 'osd.28', '-i', '/var/lib/ceph/tmp/mnt.vVEkBg/keyring', 'osd', 'allow *', 'mon', 'allow profile osd']' returned non-zero exit status 22
ceph-disk: Error: One or more partitions failed to activate

Judging from the log, this looks like a caps problem.

Note: the same messages are also logged at system boot, in ceph-osd-all-starter.log under /var/log/upstart.
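
To see which OSDs are affected without reading the whole log, the IDs can be pulled out of the error lines. This is a rough sketch that assumes the message format shown above; caps_mismatch_osds is a hypothetical helper name.

```shell
#!/bin/sh
# Print the IDs of OSDs whose activation failed with the caps-mismatch error.
# Reads log text on stdin; caps_mismatch_osds is a hypothetical helper name.
caps_mismatch_osds() {
    # Keep only the "entity osd.N exists but cap mon does not match" lines,
    # reduce each one to N, then sort numerically and drop duplicates.
    sed -n 's/^Error EINVAL: entity osd\.\([0-9][0-9]*\) exists but cap mon does not match.*/\1/p' |
        sort -n -u
}
```

For example: caps_mismatch_osds < /var/log/upstart/ceph-osd-all-starter.log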

  • Check the ceph auth entries:
root@node-16:~# ceph auth list
installed auth entries:

osd.0
    key: AQBiCpZXqoEFJxAAJSJNB6ssR6Llfem6yYapiQ==
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.1
    key: AQBkCpZXLu8wBRAAB/93w9IREudzSZqFCe8BPw==
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.24
    key: AQBORxBYNsfGORAAon01sB3Bc3smHw4aH37hqg==
    caps: [mon] allow rwx
    caps: [osd] allow *
osd.25
    key: AQD6RxBYx37NIxAA1ruW4XnGuHgqlGdzxVlXPA==
    caps: [mon] allow rwx
    caps: [osd] allow *

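Comparing the OSDs that start normally with the ones that do not:

The mon caps of the OSDs that start normally are "allow profile osd".

The mon caps of the failing OSDs are "allow rwx".

This is very likely the cause. During activation, ceph-disk runs ceph auth add for the OSD with mon 'allow profile osd' (see the failing command in the log above); because the entity already exists with different mon caps, the monitor rejects the request with EINVAL.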
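
On a cluster with many OSDs, the comparison can be automated by filtering the ceph auth list output. This is a sketch that assumes the output layout shown above (an unindented entity name followed by indented key and caps lines); osds_with_wrong_mon_caps is a hypothetical helper name.

```shell
#!/bin/sh
# List OSD entities whose mon caps differ from "allow profile osd".
# Reads "ceph auth list" output on stdin; the function name is hypothetical.
osds_with_wrong_mon_caps() {
    awk '
        # An unindented, non-empty line starts a new entity, e.g. "osd.24".
        /^[^ \t]/ { entity = $1; next }
        # For OSD entities only, inspect the "caps: [mon] ..." line.
        entity ~ /^osd\.[0-9]+$/ && $1 == "caps:" && $2 == "[mon]" {
            caps = $0
            sub(/^[ \t]*caps:[ \t]*\[mon\][ \t]*/, "", caps)
            sub(/[ \t]+$/, "", caps)
            if (caps != "allow profile osd") print entity ": " caps
        }
    '
}
```

For example: ceph auth list | osds_with_wrong_mon_caps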

  • Change the caps of the affected OSDs:
root@node-16:/var/lib/ceph# ceph auth caps osd.24 mon 'allow profile osd' osd 'allow *'
updated caps for osd.24
root@node-16:/var/lib/ceph# ceph auth caps osd.25 mon 'allow profile osd' osd 'allow *'
updated caps for osd.25
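
When more than a couple of OSDs need the same change, the commands can be generated instead of typed one by one. The sketch below only prints the commands so they can be reviewed first; print_caps_fix_commands is a hypothetical helper name, and the caps strings are the ones used above.

```shell
#!/bin/sh
# Print (do not run) one "ceph auth caps" command per OSD ID given as an argument.
# print_caps_fix_commands is a hypothetical helper name.
print_caps_fix_commands() {
    for id in "$@"; do
        printf "ceph auth caps osd.%s mon 'allow profile osd' osd 'allow *'\n" "$id"
    done
}
```

For example, print_caps_fix_commands 24 25 prints the two commands shown above; after checking them, the output can be piped to sh to apply the change.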