Common Ceph Operations Commands
📅 2017-04-07
🔖 ceph
1. Cluster management
1.1 Starting and stopping components
List the Ceph systemd service units on the current node:
systemctl list-units 'ceph*' --type=service
UNIT                           LOAD   ACTIVE SUB     DESCRIPTION
ceph-mgr@node1.service         loaded active running Ceph cluster manager daemon
ceph-mon@node1.service         loaded active running Ceph cluster monitor daemon
ceph-osd@0.service             loaded active running Ceph object storage daemon osd.0
ceph-radosgw@rgw.node1.service loaded active running Ceph rados gateway
Check whether a given service is enabled at boot:
systemctl is-enabled [email protected]
enabled
Once the relevant systemd units have been identified, systemd can be used on each node of the Ceph cluster to start or stop the individual Ceph components.
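As a rough sketch of the start/stop step just described, the helper below wraps systemctl for a single Ceph unit. The function name ceph_unit and the SYSTEMCTL override variable are illustrative assumptions, not part of Ceph; the unit names in the usage note are the ones from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one systemctl action against one Ceph unit.
# SYSTEMCTL can be overridden (for example SYSTEMCTL=echo) for a dry run.
ceph_unit() {
    local action="$1" unit="$2"
    case "$action" in
        start|stop|restart|status) ;;
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac
    ${SYSTEMCTL:-systemctl} "$action" "$unit"
}
```

For example, `ceph_unit restart ceph-osd@0.service` would restart OSD 0 on the current node, while `SYSTEMCTL=echo ceph_unit stop ceph-mon@node1.service` only prints the arguments that would be passed to systemctl.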
1.2 Checking cluster status
View a summary of the cluster status:
ceph health
HEALTH_OK

ceph -s
    cluster 83d9e421-46bf-4d64-af15-af0e2c381b88
     health HEALTH_WARN
            clock skew detected on mon.node2
            Monitor clock skew detected
     monmap e2: 3 mons at {node1=192.168.61.41:6789/0,node2=192.168.61.42:6789/0,node3=192.168.61.43:6789/0}
            election epoch 114, quorum 0,1,2 node1,node2,node3
        mgr active: node2 standbys: node1, node3
     osdmap e138: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v9909: 152 pgs, 12 pools, 2686 bytes data, 214 objects
            133 MB used, 284 GB / 284 GB avail
                 152 active+clean
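The health summary is convenient to use in scripts. The following is a minimal sketch, assuming the first word of the `ceph health` output is the status keyword as in the output above; the function name health_exit_code and the exit-code mapping are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the first word of `ceph health` output (read from
# stdin) to an exit code: 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = anything else.
health_exit_code() {
    local status=""
    read -r status _ || true   # first word of the first line; tolerate empty input
    case "$status" in
        HEALTH_OK)   return 0 ;;
        HEALTH_WARN) return 1 ;;
        *)           return 2 ;;
    esac
}
```

A possible use is `ceph health | health_exit_code || echo "cluster needs attention"`, for instance in a cron job or a monitoring check.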
1.3 Checking cluster space usage
ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    284G      284G         133M          0.05
POOLS:
    NAME                          ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd                           0         0         0        97185M           0
    .rgw.root                     1      1681         0        97185M           4
    default.rgw.control           2         0         0        97185M           8
    default.rgw.data.root         3       603         0        97185M           2
    default.rgw.gc                4         0         0        97185M          32
    default.rgw.lc                5         0         0        97185M          32
    default.rgw.log               6         0         0        97185M         128
    default.rgw.users.uid         7       350         0        97185M           2
    default.rgw.users.email       8         8         0        97185M           1
    default.rgw.users.keys        9         8         0        97185M           1
    default.rgw.buckets.index     10        0         0        97185M           1
    default.rgw.buckets.data      11       36         0        97185M           3
View the storage usage of each OSD:
ceph osd df
ID WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS
 0 1.80170  1.00000 1844G 28903M 1816G 1.53 0.98 152
 1 1.80170  1.00000 1844G 29313M 1816G 1.55 1.00 152
 2 1.80170  1.00000 1844G 29873M 1815G 1.58 1.02 152
              TOTAL 5534G 88089M 5448G 1.55
MIN/MAX VAR: 0.98/1.02  STDDEV: 0.02
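The per-OSD table can be filtered to spot OSDs that are filling up. This is only a sketch under the assumption that the columns are laid out as in the output above, with %USE as the 7th field; other Ceph releases print additional columns, so the field index would need adjusting. The function name osds_over is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ceph osd df` output on stdin and print the OSDs
# whose %USE exceeds the given threshold.
# Assumes %USE is the 7th whitespace-separated field, as in the output above.
osds_over() {
    local threshold="$1"
    # Only rows whose first field is a numeric OSD id are considered, which
    # skips the header, the TOTAL row and the MIN/MAX line.
    awk -v t="$threshold" '$1 ~ /^[0-9]+$/ && $7 + 0 > t + 0 { print "osd." $1 }'
}
```

For example, `ceph osd df | osds_over 80` would list the OSDs that are more than 80% full.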
2. Pool management
2.1 Deleting a pool
ceph osd pool delete poolname poolname --yes-i-really-really-mean-it
Deleting a pool is a dangerous operation, so the command requires the pool name to be given twice, together with the --yes-i-really-really-mean-it flag.
2.2 Setting a pool quota
ceph osd pool set-quota cephfs_data max_bytes $((100 * 1024 * 1024 * 1024))

ceph osd pool get-quota cephfs_data
quotas for pool 'cephfs_data':
  max objects: N/A
  max bytes  : 100GiB
To remove a quota, set the corresponding value to 0.
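The byte arithmetic and the reset-to-zero convention can be sketched as follows. The helper name gib_to_bytes is a hypothetical convenience, and the ceph invocations are left as comments because they act on a live cluster; cephfs_data is the pool name used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a size in GiB to bytes for use with max_bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Set a 100 GiB quota (same value as the $((...)) expression above):
#   ceph osd pool set-quota cephfs_data max_bytes "$(gib_to_bytes 100)"
# Remove the byte quota, and likewise the object quota, by setting 0:
#   ceph osd pool set-quota cephfs_data max_bytes 0
#   ceph osd pool set-quota cephfs_data max_objects 0
```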
3. MON commands
View the status of the MON nodes:
ceph quorum_status

{
    "election_epoch": 114,
    "quorum": [
        0,
        1,
        2
    ],
    "quorum_names": [
        "node1",
        "node2",
        "node3"
    ],
    "quorum_leader_name": "node1",
    "monmap": {
        "epoch": 2,
        "fsid": "83d9e421-46bf-4d64-af15-af0e2c381b88",
        "modified": "2017-04-06 19:52:00.882973",
        "created": "2017-04-06 19:51:47.569731",
        "features": {
            "persistent": [
                "kraken"
            ],
            "optional": []
        },
        "mons": [
            {
                "rank": 0,
                "name": "node1",
                "addr": "192.168.61.41:6789/0",
                "public_addr": "192.168.61.41:6789/0"
            },
            {
                "rank": 1,
                "name": "node2",
                "addr": "192.168.61.42:6789/0",
                "public_addr": "192.168.61.42:6789/0"
            },
            {
                "rank": 2,
                "name": "node3",
                "addr": "192.168.61.43:6789/0",
                "public_addr": "192.168.61.43:6789/0"
            }
        ]
    }
}
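MONs use the Paxos algorithm to reach consensus and elect a leader. In the output above, election_epoch is the election epoch counter, which increases as election rounds take place; quorum and quorum_names give the ranks and names of the monitors taking part in the quorum; quorum_leader_name is the name of the current leader; and rank is each MON's rank, where a lower rank makes a monitor more likely to win the election.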
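To pick a single field such as the current leader out of this JSON in a script, something like the following sketch could be used. It relies on sed only and assumes the key and its value sit on the same line, as in the output above; a JSON-aware tool would be more robust where one is available. The function name quorum_leader is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ceph quorum_status` JSON on stdin and print the
# value of quorum_leader_name. Plain text matching, not a real JSON parser.
quorum_leader() {
    sed -n 's/.*"quorum_leader_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For example, `ceph quorum_status | quorum_leader` would print the leader's name for the cluster being queried.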
4. OSD commands
4.1 Checking OSD status
ceph osd stat
     osdmap e138: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
Use ceph osd dump to view detailed information about the OSDs.
4.2 Viewing the OSD layout
ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.27809 root default
-2 0.09270     host node1
 0 0.09270         osd.0       up  1.00000          1.00000
-3 0.09270     host node2
 1 0.09270         osd.1       up  1.00000          1.00000
-4 0.09270     host node3
 2 0.09270         osd.2       up  1.00000          1.00000
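The tree output can also be filtered for OSDs that are not up. The sketch below assumes the column layout shown above, where the OSD name is the 3rd field and its up/down state the 4th; releases that print extra columns would need different field numbers. The function name osds_not_up is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ceph osd tree` output on stdin and print every
# OSD whose state is not "up", together with that state.
# Assumes the layout above: ID, WEIGHT, name, UP/DOWN, ...
osds_not_up() {
    awk '$3 ~ /^osd\.[0-9]+$/ && $4 != "up" { print $3, $4 }'
}
```

Running `ceph osd tree | osds_not_up` prints nothing when every OSD is up, as in the output above.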
5. RBD image commands
5.1 Deleting an RBD image
The command to delete an RBD image is rbd rm {pool-name}/{image-name}:
rbd rm kube/kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175
The deletion fails with an error:
......
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.
List the clients that are still using the image:
rados -p kube listwatchers kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175.rbd
watcher=192.168.1.4:0/3418766042 client.284106 cookie=75
On the host where that client runs, remove the kernel mapping:
rbd showmapped | grep kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175
3 kube kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175 - /dev/rbd3

rbd unmap /dev/rbd3
Check that no watchers remain, then delete the image again:
rados -p kube listwatchers kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175.rbd
rbd rm kube/kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175
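When several clients hold watches, it helps to extract just the host addresses from the listwatchers output so each host can be visited in turn. This is a minimal sketch, assuming the line format shown above (watcher=<ip>:<port>/<nonce> ...) with IPv4 addresses; the function name watcher_ips is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `rados listwatchers` output on stdin and print the
# client IP of each watcher line.
# Assumes IPv4 addresses in the form watcher=<ip>:<port>/<nonce> ...
watcher_ips() {
    sed -n 's/^watcher=\([^:]*\):.*/\1/p'
}
```

For example, `rados -p kube listwatchers <object-name> | watcher_ips` prints one address per watcher, where <object-name> stands for the object queried above; rbd showmapped and rbd unmap are then run on each of those hosts as described earlier.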