Some Ceph problems and their solutions

1. Problem:

# ceph health
HEALTH_WARN application not enabled on 1 pool(s)

Solution:

# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'kube'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.
# ceph osd pool application enable kube rbd
enabled application 'rbd' on pool 'kube'
# ceph health
HEALTH_OK
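
When several pools are flagged at once, the pool names can be pulled out of the `ceph health detail` output instead of being copied by hand. The following is a minimal sketch; `pools_without_app` is a hypothetical helper name, and the usage in the comment assumes every flagged pool holds RBD images, which may not be true for your cluster.

```shell
# Read `ceph health detail` output on stdin and print each pool that has
# no application enabled, one name per line.
# Hypothetical helper; it only parses text and changes nothing.
pools_without_app() {
    sed -n "s/.*application not enabled on pool '\([^']*\)'.*/\1/p"
}

# Possible use on a live cluster (assumes all flagged pools are RBD pools):
#   ceph health detail | pools_without_app | while read -r pool; do
#       ceph osd pool application enable "$pool" rbd
#   done
```

Pools used by CephFS or RGW would need 'cephfs' or 'rgw' instead of 'rbd', so review the list before enabling anything.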

2. Problem:

# ceph -s
  cluster:
    id:     e781a2e4-097d-4867-858d-bdbd3a264435
    health: HEALTH_WARN
            clock skew detected on mon.ceph02, mon.ceph03

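Solution:

#### Confirm that the NTP service is running properly
# systemctl status ntpd
#### Raise the clock drift threshold in the Ceph configuration
# vim /etc/ceph/ceph.conf
### Add under the [global] section:
mon clock drift allowed = 2
mon clock drift warn backoff = 30
#### Push the configuration file to the mon nodes that need it
# cd /etc/ceph/
# ceph-deploy --overwrite-conf config push ceph{01..03}
#### Restart the mon service and verify
# systemctl restart ceph-mon.target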
# ceph -s
  cluster:
    id:     e781a2e4-097d-4867-858d-bdbd3a264435
    health: HEALTH_OK
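
The manual edit of ceph.conf above can also be scripted. This is only a sketch: `add_to_global` is a hypothetical helper that prints a copy of the file with one option line placed directly below the [global] header, and it assumes the header sits on a line of its own. It does not check whether the option is already present.

```shell
# Print the given config file with one option line inserted directly
# below the [global] header. Writes to stdout; the file is not modified.
# Hypothetical helper, shown for illustration only.
add_to_global() {
    option=$1
    conf=$2
    awk -v opt="$option" '
        { print }                                  # copy every line through
        /^\[global\][[:space:]]*$/ { print opt }   # then add the option after the header
    ' "$conf"
}

# Possible use (review the result before replacing the real file):
#   add_to_global "mon clock drift allowed = 2" /etc/ceph/ceph.conf > /tmp/ceph.conf.new
```

Writing to a separate file keeps the original intact until the result has been checked and pushed with ceph-deploy.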

3. Problem:

# rbd map abc/zhijian --id admin
rbd: sysfs write failed
RBD image feature set mismatch. Try disabling features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address

Solution:

The mapping fails because the kernel does not support some of the features enabled on the RBD image.

# rbd feature disable abc/zhijian exclusive-lock, object-map, fast-diff, deep-flatten
# rbd info abc/zhijian
rbd image 'zhijian':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.1011074b0dc51
format: 2
features: layering
flags: 
create_timestamp: Sun May  6 13:35:21 2018
# rbd map abc/zhijian --id admin
/dev/rbd0
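
The list of features to disable can be derived from the `features:` line of `rbd info` rather than typed out. The sketch below assumes, as in the case above, that the kernel client supports only `layering`; `features_to_disable` is a hypothetical helper name, and newer kernels may support more features than this assumes.

```shell
# Read `rbd info` output on stdin and print every enabled feature except
# layering, space-separated. Assumes the kernel supports layering only.
# Hypothetical helper; it only parses text.
features_to_disable() {
    sed -n 's/^[[:space:]]*features:[[:space:]]*//p' |
        tr ',' '\n' |                                    # one feature per line
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |     # trim surrounding blanks
        grep -v -x -e layering -e '' |                   # drop layering and empty lines
        paste -s -d ' ' -                                # join back into one line
}

# Possible use:
#   rbd feature disable abc/zhijian $(rbd info abc/zhijian | features_to_disable)
```

The command substitution is left unquoted on purpose so that each feature becomes a separate argument.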

4. Problem:

# ceph osd pool delete cephfs_data
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool cephfs_data.  If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.
# ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool

Solution:

# tail -n 2 /etc/ceph/ceph.conf 
[mon]
mon allow pool delete = true

Push the configuration file to the mon nodes that need it:

# cd /etc/ceph/
# ceph-deploy --overwrite-conf config  push ceph{01..03}

Restart the mon service and verify:

# systemctl restart ceph-mon.target
# ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
pool 'cephfs_data' removed
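
Before retrying the delete, it can help to confirm that the option really is in the pushed file. The following sketch uses a hypothetical helper, `pool_delete_allowed`, that accepts the option name written with spaces or underscores. It inspects the file only and says nothing about what the running monitors have loaded.

```shell
# Succeed if the given ceph.conf sets "mon allow pool delete = true".
# The option name may be written with spaces or underscores.
# Hypothetical helper; it checks the file, not the running monitors.
pool_delete_allowed() {
    grep -Eq '^[[:space:]]*mon[ _]allow[ _]pool[ _]delete[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"
}

# Possible use:
#   pool_delete_allowed /etc/ceph/ceph.conf || echo "option missing; pool deletion will be refused"
```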

5. Problem:

# ceph osd pool rm cephfs_data cephfs_data --yes-i-really-really-mean-it
Error EBUSY: pool 'cephfs_data' is in use by CephFS

Solution:

# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
# ceph fs rm cephfs --yes-i-really-mean-it
Error EINVAL: all MDS daemons must be inactive before removing filesystem
# systemctl stop ceph-mds.target
# ceph fs rm cephfs
Error EPERM: this is a DESTRUCTIVE operation and will make data in your filesystem permanently inaccessible.  Add --yes-i-really-mean-it if you are sure you wish to continue.
# ceph fs rm cephfs --yes-i-really-mean-it
# ceph fs ls
No filesystems enabled
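
On a cluster with more than one filesystem, the `ceph fs ls` output shows which filesystem holds the pool. A minimal sketch, assuming the line format shown above (data pools separated by spaces inside the brackets); `fs_using_pool` is a hypothetical helper name.

```shell
# Read `ceph fs ls` output on stdin and print the name of every filesystem
# that uses the given pool as its metadata pool or as one of its data pools.
# Hypothetical helper; it assumes the line format shown in this section.
fs_using_pool() {
    awk -v pool="$1" '
        /^name: / {
            name = $0; sub(/^name: /, "", name); sub(/,.*/, "", name)
            meta = $0; sub(/.*metadata pool: /, "", meta); sub(/,.*/, "", meta)
            data = $0; sub(/.*data pools: \[/, "", data); sub(/\].*/, "", data)
            found = (meta == pool)
            n = split(data, pools, " ")
            for (i = 1; i <= n; i++) if (pools[i] == pool) found = 1
            if (found) print name
        }'
}

# Possible use:
#   ceph fs ls | fs_using_pool cephfs_data
```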

6. Problem:

A pod created with a static PV stays in the ContainerCreating state:

# kubectl get pod ceph-pod1
NAME        READY     STATUS              RESTARTS   AGE
ceph-pod1   0/1       ContainerCreating   0          10s
......
# kubectl describe pod ceph-pod1
Warning  FailedMount             41s (x8 over 1m)  kubelet, node01            MountVolume.WaitForAttach failed for volume "ceph-pv" : fail to check rbd image status with: (executable file not found in $PATH), rbd output: ()
Warning  FailedMount             0s                kubelet, node01            Unable to mount volumes for pod "ceph-pod1_default(14e3a07d-93a8-11e8-95f6-000c29b1ec26)": timeout expired waiting for volumes to attach or mount for pod "default"/"ceph-pod1". list of unmounted volumes=[ceph-vol1]. list of unattached volumes=[ceph-vol1 default-token-v9flt]

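Solution: Install the latest ceph-common on the Kubernetes node. The Ceph cluster runs Mimic, the latest release at the time, while the version in the base repository is too old, which is what causes the failure.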
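
The kubelet message "executable file not found in $PATH" refers to the `rbd` binary, so a quick check on each node is whether that binary can be found at all. This is a sketch only; `check_rbd_client` is a hypothetical helper, and it takes the binary name as an optional argument so it can be tried with any command.

```shell
# Report whether a client binary (rbd by default) is on PATH.
# Hypothetical helper; returns non-zero when the binary is missing.
check_rbd_client() {
    bin=${1:-rbd}
    if command -v "$bin" >/dev/null 2>&1; then
        echo "$bin found at $(command -v "$bin")"
    else
        echo "$bin not found in PATH; install ceph-common on this node" >&2
        return 1
    fi
}

# Possible use on each Kubernetes node:
#   check_rbd_client
```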

7. Problem:

With dynamic PV provisioning, the PVC stays in the Pending state:

# kubectl get pvc -n ceph
NAME       STATUS   VOLUME  CAPACITY  ACCESS MODES  STORAGECLASS  AGE
ceph-pvc   Pending                                     ceph-rbd    2m
# kubectl describe pvc -n ceph
......
Warning  ProvisioningFailed  27s   persistentvolume-controller  Failed to provision volume with StorageClass "ceph-rbd": failed to create rbd image: exit status 1, command output: 2018-07-31 11:10:33.395991 7faa3558b7c0 -1 did not load config file, using default settings.
rbd: extraneous parameter --image-feature

Solution:

The persistentvolume-controller runs on the master node as part of kube-controller-manager, so the master node also needs the ceph-common package installed.
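
Since an outdated client is the suspected cause here, comparing the client's major version with the cluster release is a reasonable sanity check. The sketch below assumes that `ceph --version` prints a line beginning with "ceph version X.Y.Z" and that Mimic corresponds to major version 13; both helper names are hypothetical.

```shell
# Read `ceph --version` output on stdin and print the major version number.
# Assumes the line begins with "ceph version X.Y.Z".
ceph_major_version() {
    sed -n 's/^ceph version \([0-9][0-9]*\)\..*/\1/p'
}

# Succeed when the version read from stdin is at least the given major
# release (13 is assumed to be Mimic).
client_at_least() {
    major=$(ceph_major_version)
    [ -n "$major" ] && [ "$major" -ge "$1" ]
}

# Possible use on the master node:
#   ceph --version | client_at_least 13 || echo "ceph-common is older than the cluster"
```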
