kubeadm is the official Kubernetes tool for quickly installing a Kubernetes cluster. It is updated in step with every Kubernetes release, and with each release the kubeadm team adjusts some of its cluster-configuration practices, so experimenting with kubeadm is a good way to learn the latest upstream best practices for cluster configuration.

In the recently released Kubernetes 1.13, kubeadm's core features have reached GA, although high availability is not yet included. This shows that kubeadm is getting ever closer to being usable in production.

Area                        Maturity Level
Command line UX             GA
Implementation              GA
Config file API             beta
CoreDNS                     GA
kubeadm alpha subcommands   alpha
High availability           alpha
DynamicKubeletConfig        alpha
Self-hosting                alpha

Our production Kubernetes clusters are highly available clusters deployed from binaries with ansible. The point of trying out kubeadm in Kubernetes 1.13 here is to follow the official best practices for cluster initialization and configuration, and to further refine our ansible deployment scripts.

1. Preparation

1.1 System configuration

Before installing, make the following preparations. The two CentOS 7.4 hosts are:

cat /etc/hosts
192.168.61.11 node1
192.168.61.12 node2

If the firewall is enabled on the hosts, the ports required by the various Kubernetes components must be opened; see the "Check required ports" section of Installing kubeadm. For simplicity, the firewall is disabled on each node here:

systemctl stop firewalld
systemctl disable firewalld
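
If you would rather keep firewalld enabled, the ports from the "Check required ports" table can be opened instead of disabling the firewall. The firewall-cmd commands below are only a sketch based on that table; double-check the port list against the documentation for your Kubernetes version:

# on the master node
firewall-cmd --permanent --add-port=6443/tcp --add-port=2379-2380/tcp --add-port=10250-10252/tcp
# on the worker nodes
firewall-cmd --permanent --add-port=10250/tcp --add-port=30000-32767/tcp
firewall-cmd --reload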

Disable SELinux:

setenforce 0

vi /etc/selinux/config
SELINUX=disabled

Create the file /etc/sysctl.d/k8s.conf and add the following content:

net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

Run the following commands to apply the changes:

modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf

1.2 Prerequisites for enabling IPVS in kube-proxy

Since IPVS has been merged into the mainline kernel, enabling IPVS for kube-proxy requires that the following kernel modules be loaded first:

ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4

Run the following script on all Kubernetes nodes (node1 and node2):

cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4

The script above creates /etc/sysconfig/modules/ipvs.modules, which ensures that the required modules are loaded automatically after a node reboot. Use lsmod | grep -e ip_vs -e nf_conntrack_ipv4 to check that the required kernel modules have been loaded correctly.

Next, make sure the ipset package is installed on each node (yum install ipset). To make it easier to inspect the IPVS proxy rules, it is also worth installing the management tool ipvsadm (yum install ipvsadm); see the command below.
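
Both packages can be installed in one go (this simply combines the two yum commands mentioned above):

yum install -y ipset ipvsadm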

If these prerequisites are not met, kube-proxy will fall back to iptables mode even if IPVS mode is enabled in its configuration.

1.3 Installing Docker

Kubernetes has used the CRI (Container Runtime Interface) since version 1.6. The default container runtime is still Docker, via the dockershim CRI implementation built into the kubelet.

Install the Docker yum repository:

yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo

Check the available Docker versions:

yum list docker-ce.x86_64  --showduplicates |sort -r
docker-ce.x86_64            3:18.09.0-3.el7                     docker-ce-stable
docker-ce.x86_64            18.06.1.ce-3.el7                    docker-ce-stable
docker-ce.x86_64            18.06.0.ce-3.el7                    docker-ce-stable
docker-ce.x86_64            18.03.1.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            18.03.0.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            17.12.1.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            17.12.0.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            17.09.1.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            17.09.0.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            17.06.2.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            17.06.1.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            17.06.0.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            17.03.3.ce-1.el7                    docker-ce-stable
docker-ce.x86_64            17.03.2.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            17.03.1.ce-1.el7.centos             docker-ce-stable
docker-ce.x86_64            17.03.0.ce-1.el7.centos             docker-ce-stable

Kubernetes 1.12 has been validated against Docker versions 1.11.1, 1.12.1, 1.13.1, 17.03, 17.06, 17.09 and 18.06; note that the minimum Docker version supported by Kubernetes 1.12 is 1.11.1. Kubernetes 1.13 does not change the Docker version requirements. Here we install Docker 18.06.1 on each node.

yum makecache fast

yum install -y --setopt=obsoletes=0 \
  docker-ce-18.06.1.ce-3.el7

systemctl start docker
systemctl enable docker

Confirm that the default policy of the FORWARD chain in the iptables filter table is ACCEPT:

iptables -nvL
Chain INPUT (policy ACCEPT 263 packets, 19209 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0
    0     0 DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0

Starting with version 1.13, Docker changed its default firewall rules and set the FORWARD chain of the iptables filter table to DROP, which breaks Pod-to-Pod communication across nodes in a Kubernetes cluster. After installing Docker 18.06 here, however, the default policy turns out to be ACCEPT again; it is unclear in which version this was changed back. With the 17.06 release we run in production, the policy still has to be adjusted manually, as shown below.
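
For the Docker versions that still default the FORWARD chain to DROP, the policy can be switched back to ACCEPT after Docker starts. The commands below are only one possible sketch; the systemd drop-in path and file name are our own choice, not something Docker or Kubernetes mandates:

iptables -P FORWARD ACCEPT

# make the change survive Docker restarts via a systemd drop-in
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/10-forward-accept.conf <<EOF
[Service]
ExecStartPost=/usr/sbin/iptables -P FORWARD ACCEPT
EOF
systemctl daemon-reload
systemctl restart docker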

2. Deploying Kubernetes with kubeadm

2.1 Installing kubeadm and kubelet

Install kubeadm and kubelet on each node:

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
        https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

Test whether the repository URL https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64 is reachable; if it is not, you will need a proxy or mirror to get around the network restrictions.

curl https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64

yum makecache fast
yum install -y kubelet kubeadm kubectl

...
Installed:
  kubeadm.x86_64 0:1.13.0-0          kubectl.x86_64 0:1.13.0-0          kubelet.x86_64 0:1.13.0-0

Dependency Installed:
  cri-tools.x86_64 0:1.12.0-0        kubernetes-cni.x86_64 0:0.6.0-0    socat.x86_64 0:1.7.3.2-2.el7
  • The install output shows that three dependencies were also installed: cri-tools, kubernetes-cni and socat:
    • upstream bumped the CNI dependency to 0.6.0 back in Kubernetes 1.9, and it is still 0.6.0 here
    • socat is a dependency of the kubelet
    • cri-tools is the command-line tool for the CRI (Container Runtime Interface)

Running kubelet --help shows that most of the kubelet's original command-line flags are now DEPRECATED, for example:

......
--address 0.0.0.0   The IP address for the Kubelet to serve on (set to 0.0.0.0 for all IPv4 interfaces and `::` for all IPv6 interfaces) (default 0.0.0.0) (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)
......

Instead, upstream recommends using --config to specify a configuration file and setting in that file what these flags used to configure; see "Set Kubelet parameters via a config file" for details. Kubernetes does this to support Dynamic Kubelet Configuration; see "Reconfigure a Node's Kubelet in a Live Cluster".

The kubelet configuration file must be in JSON or YAML format; see the documentation linked above for details.
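
For illustration, a minimal kubelet configuration file in YAML might look like the sketch below; only failSwapOn, which is discussed next, matters for this walkthrough, and the field names follow the kubelet.config.k8s.io/v1beta1 KubeletConfiguration API:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# allow the kubelet to start even though swap is still enabled
failSwapOn: false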

Starting with Kubernetes 1.8, the system swap is required to be turned off; with the default configuration the kubelet will not start if swap is enabled.

Swap can be turned off as follows:

swapoff -a

Edit /etc/fstab and comment out the swap entry so it is not mounted automatically, then confirm with free -m that swap is off. Also adjust the swappiness parameter by adding the following line to /etc/sysctl.d/k8s.conf:

vm.swappiness=0

Run sysctl -p /etc/sysctl.d/k8s.conf to apply the change.
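
Commenting out the swap entry in /etc/fstab can also be done non-interactively. The sed pattern below is only a sketch and assumes the swap line actually contains the word "swap":

# comment out every uncommented fstab line that mentions swap, then verify
sed -ri 's/^([^#].*\bswap\b.*)$/#\1/' /etc/fstab
swapoff -a
free -m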

Because the two hosts used for this test also run other services, turning swap off could affect those services, so instead we change the kubelet configuration to remove this restriction. In earlier Kubernetes versions we removed it with the kubelet startup flag --fail-swap-on=false. As discussed above, Kubernetes no longer recommends startup flags in favour of the configuration file, so here we try the configuration-file form first.

Looking at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, we see the following:

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

This shows that the kubelet deployed by kubeadm uses the configuration file --config=/var/lib/kubelet/config.yaml. In practice, however, neither /var/lib/kubelet nor this config.yaml has been created yet. Presumably the file is generated automatically when kubeadm initializes the cluster, and if we do not turn off swap, the first cluster initialization is bound to fail.

So we fall back to the kubelet startup flag --fail-swap-on=false to lift the requirement that swap be turned off. Edit /etc/sysconfig/kubelet and add:

KUBELET_EXTRA_ARGS=--fail-swap-on=false

2.2 Initializing the cluster with kubeadm init

Enable the kubelet service to start on boot on each node:

systemctl enable kubelet.service

Next, use kubeadm to initialize the cluster. node1 is chosen as the master node; run the following command on node1:

kubeadm init \
  --kubernetes-version=v1.13.0 \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.61.11

Because we choose flannel as the Pod network add-on, the command above specifies --pod-network-cidr=10.244.0.0/16.

Running it produced the following error:

[init] using Kubernetes version: v1.13.0
[preflight] running pre-flight checks
[preflight] Some fatal errors occurred:
        [ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

One of the errors is "running with swap on is not supported. Please disable swap". Because we have decided to configure failSwapOn: false (i.e. keep running with swap on), add the --ignore-preflight-errors=Swap flag to ignore this error and run the command again.

kubeadm init \
   --kubernetes-version=v1.13.0 \
   --pod-network-cidr=10.244.0.0/16 \
   --apiserver-advertise-address=192.168.61.11 \
   --ignore-preflight-errors=Swap

[init] Using Kubernetes version: v1.13.0
[preflight] Running pre-flight checks
        [WARNING Swap]: running with swap on is not supported. Please disable swap
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [node1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.61.11]
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [node1 localhost] and IPs [192.168.61.11 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [node1 localhost] and IPs [192.168.61.11 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 19.506551 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "node1" as an annotation
[mark-control-plane] Marking the node node1 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node node1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 702gz5.49zhotgsiyqimwqw
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.61.11:6443 --token 702gz5.49zhotgsiyqimwqw --discovery-token-ca-cert-hash sha256:2bc50229343849e8021d2aa19d9d314539b40ec7a311b5bb6ca1d3cd10957c2f

The above is the full output of the initialization. From it you can more or less see the key steps required to install a Kubernetes cluster by hand.

The key points are:

  • [kubelet-start] generates the kubelet configuration file "/var/lib/kubelet/config.yaml"
  • [certs] generates the various certificates
  • [kubeconfig] generates the kubeconfig files
  • [bootstrap-token] generates the bootstrap token; record it, since it is needed later when adding nodes to the cluster with kubeadm join
  • the following commands set up kubectl access to the cluster for a regular user:
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

  • finally, the command for joining nodes to the cluster is given: kubeadm join 192.168.61.11:6443 --token 702gz5.49zhotgsiyqimwqw --discovery-token-ca-cert-hash sha256:2bc50229343849e8021d2aa19d9d314539b40ec7a311b5bb6ca1d3cd10957c2f

Check the cluster status:

kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health": "true"}

Confirm that each component is in the Healthy state.

If the cluster initialization runs into problems, the following commands can be used to clean up:

kubeadm reset
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/

2.3 Installing the Pod network

Next, install the flannel network add-on:

mkdir -p ~/k8s/
cd ~/k8s
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f  kube-flannel.yml

clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.extensions/kube-flannel-ds-amd64 created
daemonset.extensions/kube-flannel-ds-arm64 created
daemonset.extensions/kube-flannel-ds-arm created
daemonset.extensions/kube-flannel-ds-ppc64le created
daemonset.extensions/kube-flannel-ds-s390x created

Note that the flannel image referenced in kube-flannel.yml is 0.10.0, i.e. quay.io/coreos/flannel:v0.10.0-amd64.

If a node has multiple network interfaces, then per flannel issue 39701 you currently need to use the --iface argument in kube-flannel.yml to specify the name of the host's internal interface, otherwise DNS resolution may fail. Download kube-flannel.yml locally and add --iface=<iface-name> to the flanneld startup arguments:

......
containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.10.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=eth1
......

Use kubectl get pod --all-namespaces -o wide to make sure all Pods are in the Running state.

kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                            READY   STATUS    RESTARTS   AGE     IP              NODE    NOMINATED NODE
kube-system   coredns-576cbf47c7-njt7l        1/1     Running   0          12m    10.244.0.3      node1   <none>
kube-system   coredns-576cbf47c7-vg2gd        1/1     Running   0          12m    10.244.0.2      node1   <none>
kube-system   etcd-node1                      1/1     Running   0          12m    192.168.61.11   node1   <none>
kube-system   kube-apiserver-node1            1/1     Running   0          12m    192.168.61.11   node1   <none>
kube-system   kube-controller-manager-node1   1/1     Running   0          12m    192.168.61.11   node1   <none>
kube-system   kube-flannel-ds-amd64-bxtqh     1/1     Running   0          2m     192.168.61.11   node1   <none>
kube-system   kube-proxy-fb542                1/1     Running   0          12m    192.168.61.11   node1   <none>
kube-system   kube-scheduler-node1            1/1     Running   0          12m    192.168.61.11   node1   <none>

2.4 Letting the master node run workloads

In a cluster initialized with kubeadm, Pods are not scheduled onto the master node for security reasons; in other words, the master node does not take part in the workload. This is because the current master node node1 carries the node-role.kubernetes.io/master:NoSchedule taint:

kubectl describe node node1 | grep Taint
Taints:             node-role.kubernetes.io/master:NoSchedule

Since this is a test environment, remove the taint so that node1 can run workloads:

kubectl taint nodes node1 node-role.kubernetes.io/master-
node "node1" untainted

2.5 Testing DNS

kubectl run curl --image=radial/busyboxplus:curl -it
kubectl run --generator=deployment/apps.v1beta1 is DEPRECATED and will be removed in a future version. Use kubectl create instead.
If you don't see a command prompt, try pressing enter.
[ root@curl-5cc7b478b6-r997p:/ ]$

Inside the container, run nslookup kubernetes.default to confirm that resolution works:

nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

2.6 Adding a node to the Kubernetes cluster

Next we add the host node2 to the Kubernetes cluster. Since the must-disable-swap restriction was removed from the kubelet startup arguments on node2 as well, the --ignore-preflight-errors=Swap flag is also needed here. On node2, run:

kubeadm join 192.168.61.11:6443 --token 702gz5.49zhotgsiyqimwqw --discovery-token-ca-cert-hash sha256:2bc50229343849e8021d2aa19d9d314539b40ec7a311b5bb6ca1d3cd10957c2f \
 --ignore-preflight-errors=Swap

[preflight] Running pre-flight checks
        [WARNING Swap]: running with swap on is not supported. Please disable swap
[discovery] Trying to connect to API Server "192.168.61.11:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.61.11:6443"
[discovery] Requesting info from "https://192.168.61.11:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.61.11:6443"
[discovery] Successfully established connection with API Server "192.168.61.11:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "node2" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

node2 joined the cluster smoothly. Now run the following command on the master node to view the nodes in the cluster:

kubectl get nodes
NAME    STATUS   ROLES    AGE    VERSION
node1   Ready    master   16m    v1.13.0
node2   Ready    <none>   4m5s   v1.13.0

How to remove a node from the cluster

To remove node2 from the cluster, run the following commands.

On the master node:

kubectl drain node2 --delete-local-data --force --ignore-daemonsets
kubectl delete node node2

On node2:

kubeadm reset
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/

On node1:

kubectl delete node node2

2.7 Enabling IPVS in kube-proxy

Edit config.conf in the kube-system/kube-proxy ConfigMap and set mode: "ipvs":

kubectl edit cm kube-proxy -n kube-system
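
After the edit, the relevant part of config.conf should look roughly like the following (a sketch of a KubeProxyConfiguration; all other fields are omitted and left at their defaults):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
ipvs:
  # leave the scheduler empty to get the default (rr)
  scheduler: ""
mode: "ipvs"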

Then restart the kube-proxy pods on each node:

kubectl get pod -n kube-system | grep kube-proxy | awk '{system("kubectl delete pod "$1" -n kube-system")}'

kubectl get pod -n kube-system | grep kube-proxy
kube-proxy-pf55q                1/1     Running   0          9s
kube-proxy-qjnnc                1/1     Running   0          14s

kubectl logs kube-proxy-pf55q -n kube-system
I1208 06:12:23.516444       1 server_others.go:189] Using ipvs Proxier.
W1208 06:12:23.516738       1 proxier.go:365] IPVS scheduler not specified, use rr by default
I1208 06:12:23.516840       1 server_others.go:216] Tearing down inactive rules.
I1208 06:12:23.575222       1 server.go:464] Version: v1.13.0
I1208 06:12:23.585142       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1208 06:12:23.586203       1 config.go:202] Starting service config controller
I1208 06:12:23.586243       1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1208 06:12:23.586269       1 config.go:102] Starting endpoints config controller
I1208 06:12:23.586275       1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1208 06:12:23.686959       1 controller_utils.go:1034] Caches are synced for endpoints config controller
I1208 06:12:23.687056       1 controller_utils.go:1034] Caches are synced for service config controller

The log prints "Using ipvs Proxier", which shows that IPVS mode is enabled.
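
The ipvsadm tool installed earlier can also be used to list the virtual servers and real servers that kube-proxy has programmed (the output will vary with the services in your cluster):

ipvsadm -Ln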

3. Deploying common Kubernetes components

More and more companies and teams are adopting Helm, the package manager for Kubernetes, so we will also use Helm to install the commonly used Kubernetes components.

3.1 Installing Helm

Helm consists of the helm command-line client and the server-side tiller, and installation is straightforward. Download the helm command-line tool to /usr/local/bin on the master node node1; version 2.12.0 is used here:

wget https://storage.googleapis.com/kubernetes-helm/helm-v2.12.0-linux-amd64.tar.gz
tar -zxvf helm-v2.12.0-linux-amd64.tar.gz
cd linux-amd64/
cp helm /usr/local/bin/

To install the server-side tiller, kubectl and a kubeconfig file must also be set up on this machine, so that kubectl can reach the apiserver and work normally. On node1, kubectl has already been configured.

Because the Kubernetes API server has RBAC access control enabled, a service account named tiller must be created for tiller and bound to a suitable role; see Role-based Access Control in the Helm documentation for details. For simplicity, the built-in cluster-admin ClusterRole is bound to it directly. Create rbac-config.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

kubectl create -f rbac-config.yaml
serviceaccount/tiller created
clusterrolebinding.rbac.authorization.k8s.io/tiller created

Next, deploy tiller with helm:

helm init --service-account tiller --skip-refresh
Creating /root/.helm
Creating /root/.helm/repository
Creating /root/.helm/repository/cache
Creating /root/.helm/repository/local
Creating /root/.helm/plugins
Creating /root/.helm/starters
Creating /root/.helm/cache/archive
Creating /root/.helm/repository/repositories.yaml
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /root/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.

Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
Happy Helming!

By default tiller is deployed into the kube-system namespace of the Kubernetes cluster:

kubectl get pod -n kube-system -l app=helm
NAME                            READY   STATUS    RESTARTS   AGE
tiller-deploy-c4fd4cd68-dwkhv   1/1     Running   0          83s

helm version
Client: &version.Version{SemVer:"v2.12.0", GitCommit:"d325d2a9c179b33af1a024cdb5a4472b6288016a", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.0", GitCommit:"d325d2a9c179b33af1a024cdb5a4472b6288016a", GitTreeState:"clean"}

Note that the network must be able to reach gcr.io and kubernetes-charts.storage.googleapis.com. If they are unreachable, you can use a tiller image from a private registry with helm init --service-account tiller --tiller-image <your-docker-registry>/tiller:v2.11.0 --skip-refresh.

3.2 Deploying Nginx Ingress with Helm

To make it easy to expose services in the cluster and access them from outside, we next use Helm to deploy Nginx Ingress onto Kubernetes. The Nginx Ingress Controller is deployed on the Kubernetes edge nodes; for high availability of Kubernetes edge nodes, see my earlier write-up "Bare metal环境下Kubernetes Ingress边缘节点的高可用(基于IPVS)".

We use both node1 (192.168.61.11) and node2 (192.168.61.12) as edge nodes and label them accordingly:

kubectl label node node1 node-role.kubernetes.io/edge=
node/node1 labeled

kubectl label node node2 node-role.kubernetes.io/edge=
node/node2 labeled

kubectl get node
NAME    STATUS   ROLES         AGE   VERSION
node1   Ready    edge,master   24m   v1.13.0
node2   Ready    edge          11m   v1.13.0

The values file ingress-nginx.yaml for the stable/nginx-ingress chart:

controller:
  replicaCount: 2
  service:
    externalIPs:
      - 192.168.61.10
  nodeSelector:
    node-role.kubernetes.io/edge: ''
  affinity:
    podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - nginx-ingress
            - key: component
              operator: In
              values:
              - controller
          topologyKey: kubernetes.io/hostname
  tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule

defaultBackend:
  nodeSelector:
    node-role.kubernetes.io/edge: ''
  tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule

The nginx ingress controller has replicaCount set to 2, so it will be scheduled onto the two edge nodes node1 and node2. The address 192.168.61.10 given in externalIPs is the VIP, which will be bound to the kube-ipvs0 interface created by kube-proxy.

helm repo update

helm install stable/nginx-ingress \
-n nginx-ingress \
--namespace ingress-nginx  \
-f ingress-nginx.yaml

kubectl get pod -n ingress-nginx -o wide
NAME                                             READY   STATUS    RESTARTS   AGE    IP           NODE    NOMINATED NODE   READINESS GATES
nginx-ingress-controller-85f8597fc6-g2kcx        1/1     Running   0          5m2s   10.244.1.3   node2   <none>           <none>
nginx-ingress-controller-85f8597fc6-g7pp5        1/1     Running   0          5m2s   10.244.0.5   node1   <none>           <none>
nginx-ingress-default-backend-6dc6c46dcc-7plm8   1/1     Running   0          5m2s   10.244.1.4   node2   <none>           <none>
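
As mentioned above, the externalIP 192.168.61.10 should now be bound to the kube-ipvs0 dummy interface that kube-proxy creates in ipvs mode; a quick sanity check on either edge node (optional):

ip addr show kube-ipvs0 | grep 192.168.61.10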

If visiting http://192.168.61.10 returns the default backend, the deployment is complete.
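
A quick check from the command line; the "default backend - 404" body is what the stock defaultbackend image is expected to return, so treat the exact text as an assumption:

curl http://192.168.61.10
default backend - 404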

The actual test, however, showed it could not be accessed. Suspecting kube-proxy, we checked its logs and found it printing the following over and over:

I1208 07:59:28.902970       1 graceful_termination.go:160] Trying to delete rs: 10.104.110.193:80/TCP/10.244.1.5:80
I1208 07:59:28.903037       1 graceful_termination.go:170] Deleting rs: 10.104.110.193:80/TCP/10.244.1.5:80
I1208 07:59:28.903072       1 graceful_termination.go:160] Trying to delete rs: 10.104.110.193:80/TCP/10.244.0.6:80
I1208 07:59:28.903105       1 graceful_termination.go:170] Deleting rs: 10.104.110.193:80/TCP/10.244.0.6:80
I1208 07:59:28.903713       1 graceful_termination.go:160] Trying to delete rs: 192.168.61.10:80/TCP/10.244.1.5:80
I1208 07:59:28.903764       1 graceful_termination.go:170] Deleting rs: 192.168.61.10:80/TCP/10.244.1.5:80
I1208 07:59:28.903798       1 graceful_termination.go:160] Trying to delete rs: 192.168.61.10:80/TCP/10.244.0.6:80
I1208 07:59:28.903824       1 graceful_termination.go:170] Deleting rs: 192.168.61.10:80/TCP/10.244.0.6:80
I1208 07:59:28.904654       1 graceful_termination.go:160] Trying to delete rs: 10.0.2.15:31698/TCP/10.244.0.6:80
I1208 07:59:28.904837       1 graceful_termination.go:170] Deleting rs: 10.0.2.15:31698/TCP/10.244.0.6:80

This issue turned up on the Kubernetes GitHub: https://github.com/kubernetes/kubernetes/issues/71071. Roughly, the recently merged "IPVS proxier mode now support connection based graceful termination." introduced a bug, so Kubernetes 1.11.5, 1.12.1~1.12.3 and 1.13.0 are all affected: kube-proxy is unusable in ipvs mode. At the same time, 1.11.5, 1.12.3 and 1.13.0 are the releases that fix the privilege-escalation vulnerability (CVE-2018-1002105) disclosed on December 4, so if you are upgrading Kubernetes because of that vulnerability, be careful: check whether ipvs is enabled so that the upgrade does not break cluster networking. Since our production clusters run 1.11 with ipvs enabled, we only upgraded the production master nodes to 1.11.5 for now, while kube-proxy stays on 1.11.4.

https://github.com/kubernetes/kubernetes/issues/71071 mentions that PRs to fix this are in progress, so for now we can only keep an eye on the patch releases after 1.11.5, 1.12.3 and 1.13.0.

3.3 Upgrading the cluster from 1.13.0 to 1.13.1 with kubeadm

Update 2018/12/18: Kubernetes has recently released v1.11.6, v1.12.4 and v1.13.1. Here kubeadm is used to upgrade my local test cluster from 1.13.0 to 1.13.1 as a preliminary check on https://github.com/kubernetes/kubernetes/issues/71071.

Run kubeadm upgrade plan to list the available versions and confirm that an upgrade to 1.13.1 is already possible with kubeadm.

kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.13.0
[upgrade/versions] kubeadm version: v1.13.0
I1217 10:58:38.493151    9935 version.go:94] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable.txt": Get https://storage.googleapis.com/kubernetes-release/release/stable.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I1217 10:58:38.493200    9935 version.go:95] falling back to the local client version: v1.13.0
[upgrade/versions] Latest stable version: v1.13.0
[upgrade/versions] Latest version in the v1.13 series: v1.13.1

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     2 x v1.13.0   v1.13.1

Upgrade to the latest version in the v1.13 series:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.13.0   v1.13.1
Controller Manager   v1.13.0   v1.13.1
Scheduler            v1.13.0   v1.13.1
Kube Proxy           v1.13.0   v1.13.1
CoreDNS              1.2.6     1.2.6
Etcd                 3.2.24    3.2.24

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.13.1

Note: Before you can perform this upgrade, you have to update kubeadm to v1.13.1.

First upgrade the local kubeadm and kubelet to 1.13.1:

yum install kubeadm-1.13.1-0
yum install kubelet-1.13.1-0

After upgrading the kubelet, remember to edit /etc/sysconfig/kubelet again and add:

KUBELET_EXTRA_ARGS=--fail-swap-on=false

Run kubeadm upgrade apply v1.13.1 to perform the upgrade, then restart the kubelet on each node.
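
A sketch of the corresponding commands (run the apply step on the master, the kubelet restart on every node):

kubeadm upgrade apply v1.13.1

systemctl daemon-reload
systemctl restart kubelet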

kubectl get node
NAME    STATUS   ROLES         AGE      VERSION
node1   Ready    edge,master   10d23h   v1.13.1
node2   Ready    edge          10d23h   v1.13.1

Visiting http://192.168.61.10 returns the default backend, confirming that ingress-nginx still works after the upgrade.

Checking the kube-proxy logs, the problem from https://github.com/kubernetes/kubernetes/issues/71071 is still present.

3.4 Upgrading the cluster from 1.13.1 to 1.13.2 with kubeadm

Update 2019/01/11: Kubernetes has recently released v1.13.2. Here kubeadm is used to upgrade my local test cluster from 1.13.1 to 1.13.2 to verify https://github.com/kubernetes/kubernetes/issues/71071. The problem that "kube-proxy in ipvs mode stops updating the ipvs rules after a few hours of operation, so new Pods cannot be reached" is now fixed.

https://github.com/kubernetes/kubernetes/issues/71071 states that versions 1.11.7, 1.12.5 and 1.13.2 resolve this problem.

3.5 Deploying the dashboard with Helm

kubernetes-dashboard.yaml:

image:
  repository: k8s.gcr.io/kubernetes-dashboard-amd64
  tag: v1.10.1
ingress:
  enabled: true
  hosts:
    - k8s.frognew.com
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  tls:
    - secretName: frognew-com-tls-secret
      hosts:
      - k8s.frognew.com
rbac:
  clusterAdminRole: true

helm install stable/kubernetes-dashboard \
-n kubernetes-dashboard \
--namespace kube-system  \
-f kubernetes-dashboard.yaml

kubectl -n kube-system get secret | grep kubernetes-dashboard-token
kubernetes-dashboard-token-pkm2s                 kubernetes.io/service-account-token   3      3m7s

kubectl describe -n kube-system secret/kubernetes-dashboard-token-pkm2s
Name:         kubernetes-dashboard-token-pkm2s
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: kubernetes-dashboard
              kubernetes.io/service-account.uid: 2f0781dd-156a-11e9-b0f0-080027bb7c43

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1025 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi1wa20ycyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjJmMDc4MWRkLTE1NmEtMTFlOS1iMGYwLTA4MDAyN2JiN2M0MyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.24ad6ZgZMxdydpwlmYAiMxZ9VSIN7dDR7Q6-RLW0qC81ajXoQKHAyrEGpIonfld3gqbE0xO8nisskpmlkQra72-9X6sBPoByqIKyTsO83BQlME2sfOJemWD0HqzwSCjvSQa0x-bUlq9HgH2vEXzpFuSS6Svi7RbfzLXlEuggNoC4MfA4E2hF1OX_ml8iAKx-49y1BQQe5FGWyCyBSi1TD_-ZpVs44H5gIvsGK2kcvi0JT4oHXtWjjQBKLIWL7xxyRCSE4HmUZT2StIHnOwlX7IEIB0oBX4mPg2_xNGnqwcu-8OERU9IoqAAE2cZa0v3b5O2LMcJPrcxrVOukvRIumA

Use the token above to log in at the dashboard login screen.

(screenshot: the Kubernetes dashboard)

3.6 Deploying metrics-server with Helm

As can be seen on Heapster's GitHub page, https://github.com/kubernetes/heapster, Heapster is DEPRECATED; see the deprecation timeline there. Starting with Kubernetes 1.12, Heapster is being removed from the various Kubernetes setup scripts.

Kubernetes now recommends metrics-server (https://github.com/kubernetes-incubator/metrics-server) instead. Here we deploy metrics-server with Helm as well.

metrics-server.yaml:

args:
- --logtostderr
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP

helm install stable/metrics-server \
-n metrics-server \
--namespace kube-system \
-f metrics-server.yaml

The following commands retrieve basic metrics for the cluster nodes and Pods:

kubectl top node
NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node1   650m         32%    1276Mi          73%
node2   73m          3%     527Mi           30%

kubectl top pod --all-namespaces
NAMESPACE       NAME                                             CPU(cores)   MEMORY(bytes)
ingress-nginx   nginx-ingress-controller-6f5687c58d-jdxzk        3m           142Mi
ingress-nginx   nginx-ingress-controller-6f5687c58d-lxj5q        5m           146Mi
ingress-nginx   nginx-ingress-default-backend-6dc6c46dcc-lf882   1m           4Mi
kube-system     coredns-86c58d9df4-k5jkh                         2m           15Mi
kube-system     coredns-86c58d9df4-rw6tt                         3m           23Mi
kube-system     etcd-node1                                       20m          86Mi
kube-system     kube-apiserver-node1                             33m          468Mi
kube-system     kube-controller-manager-node1                    29m          89Mi
kube-system     kube-flannel-ds-amd64-8nr5j                      2m           13Mi
kube-system     kube-flannel-ds-amd64-bmncz                      2m           21Mi
kube-system     kube-proxy-d5gxv                                 2m           18Mi
kube-system     kube-proxy-zm29n                                 2m           16Mi
kube-system     kube-scheduler-node1                             8m           28Mi
kube-system     kubernetes-dashboard-788c98d699-qd2cx            2m           16Mi
kube-system     metrics-server-68785fbcb4-k4g9v                  3m           12Mi
kube-system     tiller-deploy-c4fd4cd68-dwkhv                    1m           24Mi

Unfortunately, the Kubernetes Dashboard does not yet support metrics-server, so once metrics-server replaces Heapster, the dashboard can no longer show graphs of Pod memory and CPU usage. (In practice this is not very important: we monitor the Pods in our Kubernetes clusters with Prometheus and Grafana, so viewing Pod memory and CPU in the dashboard matters little.) There is plenty of discussion about this on the Dashboard GitHub, e.g. https://github.com/kubernetes/dashboard/issues/3217 and https://github.com/kubernetes/dashboard/issues/3270, and the Dashboard plans to support metrics-server at some point in the future. Since metrics-server and the metrics pipeline are clearly the direction Kubernetes monitoring is heading, we have switched to metrics-server in all of our environments without hesitation.

4. Summary

The Docker images involved in this installation:

# kubernetes
k8s.gcr.io/kube-apiserver:v1.13.2
k8s.gcr.io/kube-controller-manager:v1.13.2
k8s.gcr.io/kube-proxy:v1.13.2
k8s.gcr.io/kube-scheduler:v1.13.2
k8s.gcr.io/kube-proxy:v1.13.1
k8s.gcr.io/etcd:3.2.24
k8s.gcr.io/pause:3.1

# network and dns
quay.io/coreos/flannel:v0.10.1-amd64
k8s.gcr.io/coredns:1.2.6

# helm and tiller
gcr.io/kubernetes-helm/tiller:v2.12.0

# nginx ingress
quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0
k8s.gcr.io/defaultbackend:1.4

# dashboard and metrics-server
k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
gcr.io/google_containers/metrics-server-amd64:v0.3.1
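
As the kubeadm init output noted, the core images can also be listed and pre-pulled ahead of time, which helps on hosts with slow or restricted network access; the exact image list depends on the kubeadm version:

kubeadm config images list
kubeadm config images pull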

References