Common Ceph Operations Commands
2017-04-07
1. Cluster Management #
1.1 Starting and Stopping Components #
List the Ceph systemd units on the current node:
systemctl list-units 'ceph*' --type=service
UNIT                             LOAD   ACTIVE SUB     DESCRIPTION
ceph-mgr@node1.service           loaded active running Ceph cluster manager daemon
ceph-mon@node1.service           loaded active running Ceph cluster monitor daemon
ceph-osd@0.service               loaded active running Ceph object storage daemon osd.0
ceph-radosgw@rgw.node1.service   loaded active running Ceph rados gateway
Check whether a given service is enabled at boot:
systemctl is-enabled ceph-osd@0.service
enabled
Once the relevant systemd units have been identified, systemd can be used to start or stop each Ceph component on any node of the cluster.
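For example (a minimal sketch; the instance names node1 and osd.0 are taken from the unit listing above, and ceph.target is the umbrella unit shipped with systemd-based Ceph packages):

# stop and start the monitor daemon on this node
systemctl stop ceph-mon@node1
systemctl start ceph-mon@node1

# restart a single OSD
systemctl restart ceph-osd@0

# stop every Ceph daemon on this node at once
systemctl stop ceph.target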
1.2 Checking Cluster Status #
Check the cluster's overall health:
ceph health
HEALTH_OK


ceph -s
    cluster 83d9e421-46bf-4d64-af15-af0e2c381b88
     health HEALTH_WARN
            clock skew detected on mon.node2
            Monitor clock skew detected
     monmap e2: 3 mons at {node1=192.168.61.41:6789/0,node2=192.168.61.42:6789/0,node3=192.168.61.43:6789/0}
            election epoch 114, quorum 0,1,2 node1,node2,node3
        mgr active: node2 standbys: node1, node3
     osdmap e138: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v9909: 152 pgs, 12 pools, 2686 bytes data, 214 objects
            133 MB used, 284 GB / 284 GB avail
                 152 active+clean
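When the cluster is not HEALTH_OK, as with the clock skew warning above, ceph health detail (not shown in the original output) lists each warning together with the daemons it affects:

# expand each health warning with the affected daemons
ceph health detail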
1.3 Checking Cluster Space Usage #
ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    284G     284G      133M         0.05
POOLS:
    NAME                          ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd                           0         0         0        97185M           0
    .rgw.root                     1      1681         0        97185M           4
    default.rgw.control           2         0         0        97185M           8
    default.rgw.data.root         3       603         0        97185M           2
    default.rgw.gc                4         0         0        97185M          32
    default.rgw.lc                5         0         0        97185M          32
    default.rgw.log               6         0         0        97185M         128
    default.rgw.users.uid         7       350         0        97185M           2
    default.rgw.users.email       8         8         0        97185M           1
    default.rgw.users.keys        9         8         0        97185M           1
    default.rgw.buckets.index     10        0         0        97185M           1
    default.rgw.buckets.data      11       36         0        97185M           3
Check the storage usage of each OSD:
ceph osd df
ID WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS
 0 1.80170  1.00000 1844G 28903M 1816G 1.53 0.98 152
 1 1.80170  1.00000 1844G 29313M 1816G 1.55 1.00 152
 2 1.80170  1.00000 1844G 29873M 1815G 1.58 1.02 152
              TOTAL 5534G 88089M 5448G 1.55
MIN/MAX VAR: 0.98/1.02  STDDEV: 0.02
2. Pool Management #
2.1 Deleting a Pool #
ceph osd pool delete poolname poolname --yes-i-really-really-mean-it
Deleting a pool is dangerous, so the command requires the pool name to be typed twice and the --yes-i-really-really-mean-it flag to be supplied.
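As a concrete sketch (the pool name testpool is made up for illustration):

# create a throw-away pool with 32 placement groups
ceph osd pool create testpool 32
# delete it: the pool name appears twice, plus the confirmation flag
ceph osd pool delete testpool testpool --yes-i-really-really-mean-it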
2.2 Setting Pool Quotas #
ceph osd pool set-quota cephfs_data max_bytes $((100 * 1024 * 1024 * 1024))

ceph osd pool get-quota cephfs_data
quotas for pool 'cephfs_data':
  max objects: N/A
  max bytes  : 100GiB
To remove a quota, simply set the corresponding value back to 0.
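For example, lifting the quotas on the same cephfs_data pool used above (a sketch):

# a value of 0 means no limit
ceph osd pool set-quota cephfs_data max_bytes 0
ceph osd pool set-quota cephfs_data max_objects 0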
3. MON Commands #
Check the status of the MON quorum:
ceph quorum_status

{
  "election_epoch": 114,
  "quorum": [
    0,
    1,
    2
  ],
  "quorum_names": [
    "node1",
    "node2",
    "node3"
  ],
  "quorum_leader_name": "node1",
  "monmap": {
    "epoch": 2,
    "fsid": "83d9e421-46bf-4d64-af15-af0e2c381b88",
    "modified": "2017-04-06 19:52:00.882973",
    "created": "2017-04-06 19:51:47.569731",
    "features": {
      "persistent": [
        "kraken"
      ],
      "optional": []
    },
    "mons": [
      {
        "rank": 0,
        "name": "node1",
        "addr": "192.168.61.41:6789/0",
        "public_addr": "192.168.61.41:6789/0"
      },
      {
        "rank": 1,
        "name": "node2",
        "addr": "192.168.61.42:6789/0",
        "public_addr": "192.168.61.42:6789/0"
      },
      {
        "rank": 2,
        "name": "node3",
        "addr": "192.168.61.43:6789/0",
        "public_addr": "192.168.61.43:6789/0"
      }
    ]
  }
}
The MONs elect a leader using the Paxos algorithm. In the output above, election_epoch is the number of election rounds held so far, quorum and quorum_names give the ranks and names of the monitors currently in the quorum, quorum_leader_name is the name of the current leader, and rank is each MON's rank; the lower the rank, the more likely that monitor is to win the election.
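Two related commands, not covered above, give a more compact view of the monitors:

# one-line summary of the monitors and the current quorum
ceph mon stat
# dump the monmap: epoch, fsid and each monitor's address
ceph mon dump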
4. OSD Commands #
4.1 Checking OSD Status #
ceph osd stat
     osdmap e138: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
Detailed information about the OSDs can be viewed with ceph osd dump.
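To narrow that down to a single OSD, the following can help (osd.0 is assumed here as an example):

# locate osd.0 in the CRUSH hierarchy
ceph osd find 0
# show the host, devices and versions reported by osd.0
ceph osd metadata 0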
4.2 Viewing the OSD Distribution #
ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.27809 root default
-2 0.09270     host node1
 0 0.09270         osd.0       up  1.00000          1.00000
-3 0.09270     host node2
 1 0.09270         osd.1       up  1.00000          1.00000
-4 0.09270     host node3
 2 0.09270         osd.2       up  1.00000          1.00000
5. RBD Image Commands #
5.1 Deleting an RBD Image #
The command to delete an RBD image is rbd rm {pool-name}/{image-name}:
rbd rm kube/kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175
The deletion fails with an error:
......
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.
List the clients currently watching the image:
rados -p kube listwatchers kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175.rbd
watcher=192.168.1.4:0/3418766042 client.284106 cookie=75
Go to the client host and unmap the kernel RBD device:
rbd showmapped | grep ubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175
3 kube kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175 - /dev/rbd3

rbd unmap /dev/rbd3
Check again that no watchers remain, then delete the image:
rados -p kube listwatchers kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175.rbd
rbd rm kube/kubernetes-dynamic-pvc-d6f0802e-9dd5-11e7-a66f-1866da8c6175
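To confirm the image is really gone, list the remaining images in the pool (a quick check, not part of the original steps):

# list the images left in the kube pool
rbd ls kube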