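Since Kubernetes 1.8, resource usage metrics (for example container CPU and memory) have been available through the Metrics API. While working on an upgrade to Kubernetes 1.11, I found that Kubernetes 1.11 has deprecated the Heapster-based monitoring stack. So it is time to get familiar with the Kubernetes Metrics API and Metrics Server.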

Installing Metrics Server

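The Metrics API is served under /apis/metrics.k8s.io/ and extends the core Kubernetes API. Before deploying metrics-server to a cluster, confirm that the cluster has the Aggregation Layer configured. For details, see Configure the Aggregation Layer in the Kubernetes documentation.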

Here we deploy metrics-server with Helm. The chart's values file, metrics-server.yaml, looks like this:

args:
- --logtostderr
- --kubelet-insecure-tls

Install it into the kube-system namespace:

helm install stable/metrics-server \
  -n metrics-server \
  --namespace kube-system \
  -f metrics-server.yaml

Once the deployment finishes, query the node metrics with the following command:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/metrics.k8s.io/v1beta1/nodes"},"items":[]}

The items list is empty, so no metrics were returned. The metrics-server container log shows the following error:

E1003 05:46:13.757009       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node1: unable to fetch metrics from Kubelet node1 (node1): Get https://node1:10250/stats/summary/: dial tcp: lookup node1 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:node2: unable to fetch metrics from Kubelet node2 (node2): Get https://node2:10250/stats/summary/: dial tcp: lookup node2 on 10.96.0.10:53: read udp 10.244.1.6:45288->10.96.0.10:53: i/o timeout]

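The log shows that metrics-server uses the node hostname when it fetches data from the kubelet on port 10250. node1 and node2 form a standalone Kubernetes demo environment: the node names were only added to /etc/hosts on the two machines, and there is no internal DNS server, so metrics-server cannot resolve node1 or node2. One fix is to edit the coredns ConfigMap directly and add the hosts plugin to the Corefile, listing the hostname of every node in the cluster. All Pods in the cluster can then resolve the node names through CoreDNS.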

kubectl edit configmap coredns -n kube-system

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        hosts {
           192.168.61.11 node1
           192.168.61.12 node2
           fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
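
For clusters with more than a couple of nodes, the hosts block can be generated instead of typed by hand. The following is only a minimal sketch and not part of the original setup: render_hosts_block is a hypothetical helper that reads "IP hostname" pairs from standard input and prints a hosts block in the same shape as the one in the Corefile above. The indentation and the way the pairs are obtained are assumptions, and the output still has to be pasted into the ConfigMap manually.

```shell
# Hypothetical helper (not from the original post): reads "IP hostname"
# pairs from stdin, one per line, and prints a CoreDNS hosts block that
# ends with fallthrough so other names still reach the remaining plugins.
render_hosts_block() {
  printf 'hosts {\n'
  while read -r ip name; do
    # Skip blank or incomplete lines.
    [ -n "$ip" ] && [ -n "$name" ] || continue
    printf '   %s %s\n' "$ip" "$name"
  done
  printf '   fallthrough\n'
  printf '}\n'
}

# Possible usage against a live cluster (assumes kubectl access and that
# every node reports an InternalIP address):
# kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{" "}{.metadata.name}{"\n"}{end}' | render_hosts_block
```

Feeding it the two pairs used in this post, "192.168.61.11 node1" and "192.168.61.12 node2", produces the same hosts block as the one shown in the Corefile.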

After changing the configuration, restart coredns and metrics-server in the cluster, and confirm that metrics-server no longer logs errors.
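
One way to do the restart is to delete the Pods and let their Deployments recreate them. The sketch below wraps that in a hypothetical function, restart_dns_and_metrics. The label selectors are assumptions: k8s-app=kube-dns is the label kubeadm normally puts on CoreDNS Pods, and app=metrics-server is what the stable/metrics-server chart is assumed to set. Check them with kubectl get pods -n kube-system --show-labels before relying on this.

```shell
# Hypothetical helper (not from the original post): restarts CoreDNS and
# metrics-server by deleting their Pods in kube-system. The Deployments
# that own the Pods recreate them with the updated configuration.
restart_dns_and_metrics() {
  # Assumed label for CoreDNS Pods in a kubeadm cluster.
  kubectl -n kube-system delete pod -l k8s-app=kube-dns
  # Assumed label for Pods created by the stable/metrics-server chart.
  kubectl -n kube-system delete pod -l app=metrics-server
}
```

After calling restart_dns_and_metrics, kubectl logs on the new metrics-server Pod should show whether the name resolution errors are gone.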

Then query the node metrics again:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"

Metrics API

Metrics Server collects metrics from the kubelet API on every node in the Kubernetes cluster. The Metrics API exposes these resource metrics and is served under /apis/metrics.k8s.io/. The kubectl top command reads from the Metrics API, for example:

kubectl top node
NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node1   140m         14%    1285Mi          73%
node2   37m          3%     458Mi           26%

kubectl top pod --all-namespaces
NAMESPACE       NAME                                             CPU(cores)   MEMORY(bytes)
ingress-nginx   nginx-ingress-controller-77fc55d6dd-hmlmt        3m           90Mi
ingress-nginx   nginx-ingress-controller-77fc55d6dd-htms6        2m           84Mi
ingress-nginx   nginx-ingress-default-backend-684f76869d-pxlmz   1m           1Mi
kube-system     coredns-576cbf47c7-mlfcd                         2m           13Mi
kube-system     coredns-576cbf47c7-xgqdd                         2m           10Mi
kube-system     etcd-node1                                       13m          86Mi
kube-system     kube-apiserver-node1                             23m          514Mi
kube-system     kube-controller-manager-node1                    26m          54Mi
kube-system     kube-flannel-ds-amd64-8rcq4                      1m           18Mi
kube-system     kube-flannel-ds-amd64-mhx9t                      2m           14Mi
kube-system     kube-proxy-nljs8                                 2m           30Mi
kube-system     kube-proxy-pjdsj                                 2m           19Mi
kube-system     kube-scheduler-node1                             10m          18Mi
kube-system     kubernetes-dashboard-5746dd4544-gtj65            1m           28Mi
kube-system     metrics-server-8854b78d9-nx9tx                   1m           12Mi
kube-system     tiller-deploy-6f6fd74b68-mc2cw                   1m           27Mi
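
Because kubectl top prints plain columns, its output is easy to post-process. As a rough sketch, the hypothetical function sum_memory_by_namespace below totals the MEMORY(bytes) column per namespace. It assumes the four-column layout shown above and that every value uses the Mi suffix; other units are not handled.

```shell
# Hypothetical helper (not from the original post): reads the output of
# "kubectl top pod --all-namespaces" from stdin and prints the total
# memory per namespace, sorted by namespace name.
sum_memory_by_namespace() {
  awk 'NR > 1 {                    # skip the header row
         mem = $4
         sub(/Mi$/, "", mem)       # assumes every value ends in Mi
         total[$1] += mem
       }
       END {
         for (ns in total) printf "%s %dMi\n", ns, total[ns]
       }' | sort
}

# Possible usage (assumes kubectl access to a cluster running metrics-server):
# kubectl top pod --all-namespaces | sum_memory_by_namespace
```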

References