Deploying a Highly Available etcd 3.2 Cluster with Ansible

2017-06-05 etcd Ansible

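I previously wrote up a manual etcd 3.1 cluster setup in "etcd 3.1 High-Availability Cluster Setup". We now need to initialize a new environment and plan to automate the whole deployment with Ansible, starting with the etcd 3.2 cluster.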

The hosts that will run etcd are:

node1 192.168.61.11
node2 192.168.61.12
node3 192.168.61.13

1. Configuration management project layout

├── inventories
│   ├── staging
│   │   ├── group_vars
│   │   │   ├── all.yml
│   │   │   └── etcd-nodes.yml
│   │   ├── host_vars
│   │   │   ├── node1.yml
│   │   │   ├── node2.yml
│   │   │   └── node3.yml
│   │   └── hosts
│   └── production
├── roles
│   ├── common
│   │   ├── defaults
│   │   │   └── main.yml
│   │   └── tasks
│   │       └── main.yml
│   └── etcd3
│       ├── defaults
│       │   └── main.yml
│       ├── files
│       │   └── make-ca-cert.sh
│       ├── meta
│       │   └── main.yml
│       ├── tasks
│       │   ├── create_etcd_user.yml
│       │   ├── etcd-restart.yml
│       │   ├── etcd-start.yml
│       │   ├── etcd-stop.yml
│       │   ├── gen-etcd-certs.yml
│       │   ├── gen-etcd-systemd.yml
│       │   ├── install_etcd_bin.yml
│       │   └── main.yml
│       └── templates
│           ├── etcd.conf.j2
│           └── etcd.service.j2
└── deploy-etcd3.yml
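
The tree lists deploy-etcd3.yml, but the post does not show its contents. A minimal sketch of what such a top-level playbook could look like, assuming the etcd-nodes inventory group that the tasks later in this post refer to. The play layout and the use of become are assumptions, not the author's actual file:

```yaml
---
# deploy-etcd3.yml (hypothetical sketch, not the original file)
- hosts: etcd-nodes      # inventory group referenced by the role's tasks
  become: yes            # assumption: tasks need root (users, /usr/bin, systemd)
  roles:
    - common
    - etcd3
```

If roles/etcd3/meta/main.yml already declares common as a dependency, listing it here is redundant. The playbook would be run with something like `ansible-playbook -i inventories/staging/hosts deploy-etcd3.yml`.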

roles/etcd3/defaults/main.yml:


---

etcd_version: 3.2.0

etcd_download_url_base: "https://github.com/coreos/etcd/releases/download/v{{ etcd_version }}"
etcd_release: "etcd-v{{ etcd_version }}-linux-amd64"
etcd_download_url: "{{ etcd_download_url_base }}/{{ etcd_release }}.tar.gz"

etcd_bin_path: /usr/bin
etcd_data_dir: /var/lib/etcd

etcd_conf_dir: /etc/etcd
etcd_certs_dir: "{{ etcd_conf_dir }}/ssl"
etcd_cert_group: root
etcd_ca_file: "{{ etcd_certs_dir }}/ca.crt"
etcd_cert_file: "{{ etcd_certs_dir }}/server.crt"
etcd_key_file: "{{ etcd_certs_dir }}/server.key"
etcd_peer_ca_file: "{{ etcd_certs_dir }}/ca.crt"
etcd_peer_cert_file: "{{ etcd_certs_dir }}/peer.crt"
etcd_peer_key_file: "{{ etcd_certs_dir }}/peer.key"
etcd_client_cert_file: "{{ etcd_certs_dir }}/client.crt"
etcd_client_key_file: "{{ etcd_certs_dir }}/client.key"

etcd_client_cert_auth: true
etcd_peer_client_cert_auth: true

etcd_client_port: 2379
etcd_peer_port: 2380


etcd_initial_cluster_state: new
etcd_initial_cluster_token: etcd-k8s-cluster


etcd_initial_advertise_peer_urls: "https://{{ etcd_machine_address }}:{{ etcd_peer_port }}"
etcd_listen_peer_urls: "https://{{ etcd_machine_address }}:{{ etcd_peer_port }}"
etcd_advertise_client_urls: "https://{{ etcd_machine_address }}:{{ etcd_client_port }}"
etcd_listen_client_urls: "https://{{ etcd_machine_address }}:{{ etcd_client_port }},https://127.0.0.1:{{ etcd_client_port }}"
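
These defaults depend on etcd_machine_address, which has to come from the inventory. A sketch of how the per-host and per-group variables might be filled in, using the node addresses listed at the top of the post. The etcd_initial_cluster variable name is hypothetical (the post does not show it); its value follows the same loop pattern the certificate task uses for NODE_IPS:

```yaml
---
# inventories/staging/host_vars/node1.yml (node2 and node3 are analogous)
etcd_machine_address: 192.168.61.11

# inventories/staging/group_vars/etcd-nodes.yml
# Hypothetical variable: builds "node1=https://192.168.61.11:2380,node2=..."
# for etcd's --initial-cluster setting from the hosts in the etcd-nodes group.
etcd_initial_cluster: "{% for host in groups['etcd-nodes'] %}{{ host }}=https://{{ hostvars[host]['etcd_machine_address'] }}:{{ etcd_peer_port }}{% if not loop.last %},{% endif %}{% endfor %}"
```

The two files are shown in one listing for brevity; in the project they would be separate files under inventories/staging.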

2. Create the etcd user and data directory

Create the etcd user, the etcd group, and the data directory.


- name: create system etcd group
  group:
    name: etcd
    state: present

- name: create system etcd user
  user:
    name: etcd
    comment: "etcd user"
    shell: /sbin/nologin
    state: present
    system: yes
    home: "{{ etcd_data_dir }}"
    # primary group; "groups" would only add etcd as a supplementary group
    group: etcd

- name: ensure etcd_data_dir exists
  file:
    path: "{{ etcd_data_dir }}"
    recurse: yes
    state: directory
    owner: etcd
    group: etcd

3. Download and extract etcd

Download and extract the etcd release tarball on the first etcd server, then copy the etcd and etcdctl binaries to /usr/bin on every node.


---

- name: set github s3 host on the first etcd server
  lineinfile: 
    dest: /etc/hosts 
    regexp: '.*github-production-release-asset-2e65be\.s3\.amazonaws\.com$' 
    line: "219.76.4.4 github-production-release-asset-2e65be.s3.amazonaws.com" 
    state: present
  delegate_to: "{{ groups['etcd-nodes'][0] }}"
  run_once: true
  
- name: check whether etcd release tar extracted on the first etcd server 
  stat: 
    path: "{{ ansible_temp_dir }}/{{ etcd_release }}"
  register: etcd_release_tar_check
  delegate_to: "{{ groups['etcd-nodes'][0] }}"
  run_once: true
  

- name: download etcd release tar file on the first etcd server
  get_url:
    url: "{{ etcd_download_url }}"
    dest: "{{ ansible_temp_dir }}"
    validate_certs: no
    timeout: 20
  register: download_etcd
  delegate_to: "{{ groups['etcd-nodes'][0] }}"
  run_once: true
  when: not etcd_release_tar_check.stat.exists

- name: extract etcd tar file
  unarchive:
    src: "{{ download_etcd.dest }}"
    dest: "{{ ansible_temp_dir }}"
    remote_src: yes
  run_once: true
  delegate_to: "{{ groups['etcd-nodes'][0] }}"
  when: not etcd_release_tar_check.stat.exists
  
- name: fetch etcd bins from the first etcd server
  fetch:
    src: "{{ ansible_temp_dir }}/{{ etcd_release }}/{{ item }}"
    dest: "tmp/etcd3/{{ item }}"
    flat: yes
  register: fetch_etcd
  run_once: true
  delegate_to: "{{ groups['etcd-nodes'][0] }}"
  with_items:
    - etcd
    - etcdctl

- name: copy etcd binary
  copy:
    src: "tmp/etcd3/{{ item }}"
    dest: "{{ etcd_bin_path }}"
    owner: etcd
    group: etcd
    mode: "0750"
  with_items:
    - etcd
    - etcdctl
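
After the copy it can be useful to confirm that every node ends up with the expected binary. A minimal sketch, assuming `etcd --version` prints a line of the form `etcd Version: 3.2.0` and that the play runs with enough privileges to execute the binary. These two tasks are an addition for illustration and are not part of the original role:

```yaml
- name: check installed etcd version
  command: "{{ etcd_bin_path }}/etcd --version"
  register: etcd_version_check
  changed_when: false   # read-only check, never reports a change

- name: assert that the installed version matches etcd_version
  assert:
    that:
      - "('etcd Version: ' ~ etcd_version) in etcd_version_check.stdout"
```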

4. Generate and distribute the etcd TLS certificates


---

- name: ensure etcd certs directory
  file:
    path: "{{ etcd_certs_dir }}"
    state: directory
    owner: etcd
    group: etcd
    mode: "0750"
    recurse: yes
    
- name: copy make-ca-cert.sh
  copy:
    src: make-ca-cert.sh
    dest: "{{ etcd_certs_dir }}"
    owner: root
    group: root
    mode: "0500"
  run_once: true
  delegate_to: "{{ groups['etcd-nodes'][0] }}"
  
  
- name: gen certs on the first etcd server
  command:
    "{{ etcd_certs_dir }}/make-ca-cert.sh"
  args:
    creates: "{{ etcd_certs_dir }}/server.crt"
  run_once: true
  delegate_to: "{{ groups['etcd-nodes'][0] }}"
  environment:
    NODE_IPS: "{% for host in groups['etcd-nodes'] %}{{ hostvars[host]['etcd_machine_address'] }}{% if not loop.last %},{% endif %}{% endfor %}"
    NODE_DNS: "{{ groups['etcd-nodes']|join(',') }}"
    CERT_DIR: "{{ etcd_certs_dir }}"
    CERT_GROUP: "{{ etcd_cert_group }}"
    
- name: slurp etcd certs
  slurp:
    src: "{{ item }}"
  register: pki_certs
  run_once: true
  delegate_to: "{{ groups['etcd-nodes'][0] }}"
  with_items:
    - "{{ etcd_ca_file }}"
    - "{{ etcd_cert_file }}"
    - "{{ etcd_key_file }}"
    - "{{ etcd_peer_ca_file }}"
    - "{{ etcd_peer_cert_file }}"
    - "{{ etcd_peer_key_file }}"
    - "{{ etcd_client_cert_file }}"
    - "{{ etcd_client_key_file }}"
    
- name: copy etcd certs to other etcd servers
  copy:
    dest: "{{ item.item }}"
    content: "{{ item.content | b64decode }}"
    owner: etcd
    group: "{{ etcd_cert_group }}"
    mode: "0400"
  with_items: "{{ pki_certs.results }}"
  when: inventory_hostname != groups['etcd-nodes'][0]
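
Once the certificates are distributed, each node can check that its copies chain back to the CA. A sketch, assuming the openssl command-line tool is installed on the etcd hosts. The task is illustrative and not part of the original role; the stdout check is there because older OpenSSL releases may not signal a failed verification through the exit code:

```yaml
- name: verify that etcd certs are signed by the CA
  command: "openssl verify -CAfile {{ etcd_ca_file }} {{ item }}"
  register: cert_verify
  changed_when: false
  # openssl prints "<file>: OK" when the chain verifies
  failed_when: "': OK' not in cert_verify.stdout"
  with_items:
    - "{{ etcd_cert_file }}"
    - "{{ etcd_peer_cert_file }}"
    - "{{ etcd_client_cert_file }}"
```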


5. systemd unit and configuration


---

- name: create etcd systemd unit file
  template: 
    src: etcd.service.j2
    dest: /etc/systemd/system/etcd.service
    
- name: create etcd env conf
  template: 
    src: etcd.conf.j2
    dest: "{{ etcd_conf_dir }}/etcd.conf"
    owner: etcd
    group: etcd
    mode: "0540"

6. Start etcd


---

- name: start etcd
  systemd:
    name: etcd
    daemon_reload: yes
    state: started
    enabled: yes

- name: restart etcd
  systemd:
    name: etcd
    state: restarted
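
The post does not show roles/etcd3/tasks/main.yml, which ties the task files above together. A sketch of one plausible ordering, using the file names from the directory tree and assuming Ansible 2.4 or later for import_tasks (older releases used include instead). The ordering itself is an assumption:

```yaml
---
# roles/etcd3/tasks/main.yml (hypothetical ordering, not the original file)
- import_tasks: create_etcd_user.yml
- import_tasks: install_etcd_bin.yml
- import_tasks: gen-etcd-certs.yml
- import_tasks: gen-etcd-systemd.yml
- import_tasks: etcd-start.yml
```

The user comes first because the later tasks set etcd as the owner of the binaries, certificates, and configuration file.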


7. Check the cluster status

To check whether the cluster is healthy, run the following on any node:

etcdctl \
  --ca-file=/etc/etcd/ssl/ca.crt \
  --cert-file=/etc/etcd/ssl/client.crt \
  --key-file=/etc/etcd/ssl/client.key \
  --endpoints=https://node1:2379,https://node2:2379,https://node3:2379 \
  cluster-health

member 1e3da2bf674fd07 is healthy: got healthy result from https://192.168.61.11:2379
member 88548a72a2e9a749 is healthy: got healthy result from https://192.168.61.13:2379
member c3bda13bf78ed2ab is healthy: got healthy result from https://192.168.61.12:2379
cluster is healthy

List the cluster members:

etcdctl \
  --ca-file=/etc/etcd/ssl/ca.crt \
  --cert-file=/etc/etcd/ssl/client.crt \
  --key-file=/etc/etcd/ssl/client.key \
  --endpoints=https://node1:2379,https://node2:2379,https://node3:2379 \
  member list

1e3da2bf674fd07: name=node1 peerURLs=https://192.168.61.11:2380 clientURLs=https://192.168.61.11:2379 isLeader=false
88548a72a2e9a749: name=node3 peerURLs=https://192.168.61.13:2380 clientURLs=https://192.168.61.13:2379 isLeader=false
c3bda13bf78ed2ab: name=node2 peerURLs=https://192.168.61.12:2380 clientURLs=https://192.168.61.12:2379 isLeader=true
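
The same health check can be run from the playbook instead of by hand. A sketch that reuses the role's variables and the etcdctl flags shown above, and retries a few times because the cluster may need a moment to elect a leader after startup. The task, the retry counts, and the choice to query only the local node's endpoint are assumptions:

```yaml
- name: wait for the etcd cluster to report healthy
  command: >
    {{ etcd_bin_path }}/etcdctl
    --ca-file={{ etcd_ca_file }}
    --cert-file={{ etcd_client_cert_file }}
    --key-file={{ etcd_client_key_file }}
    --endpoints=https://{{ etcd_machine_address }}:{{ etcd_client_port }}
    cluster-health
  register: etcd_health
  changed_when: false
  # retry until etcdctl prints the summary line seen in the output above
  until: "'cluster is healthy' in etcd_health.stdout"
  retries: 5
  delay: 3
  run_once: true
```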

Appendix: source code

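  • Update, 2018/01/08
    • I wrote this post as a record right after we first used Ansible to initialize our Kubernetes cluster. It is based on kubernetes/contrib/ansible/roles/etcd/ from the Kubernetes GitHub repositories. The official Ansible role covers a much wider range of cases, whereas all of our production hosts run CentOS 7, so I used the official role as a reference and wrote a trimmed-down version that deploys etcd on CentOS 7 with Ansible.
    • Many readers have asked where the source code is. Today I extracted it from our Ansible project and published it on GitHub: https://github.com/erichll/ansible-etcd3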
Title: Deploying a Highly Available etcd 3.2 Cluster with Ansible
Permalink: https://blog.frognew.com/2017/06/using-ansible-deploy-etcd-cluster.html
Please credit the source when reposting.
