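Scenario: when a node in an etcd cluster is forced offline temporarily by a host failure, the usual way to keep the cluster robust is to remove the failed member first and then bring in a new node to take its place. Once the broken machine has been repaired, N days later, and is back online, it needs to be added back to the etcd cluster.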
Test environment

In the experiments from the previous two articles, the original etcd cluster consisted of:

192.168.149.60
192.168.149.61
192.168.149.62
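
192.168.149.60 was migrated to 192.168.149.63, and 192.168.149.61 was replaced by 192.168.149.64.

So this test environment already contains two machines that once served as etcd nodes. This article shows how to add such former nodes back to the cluster.
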
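Overall procedure

1. Confirm that the etcd service on the node to be added is stopped
2. Delete the member directory under the etcd data directory
3. Perform the runtime reconfiguration, exactly as when adding a brand-new node
4. Update the configuration file and start the etcd service
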
Step 1: Confirm the state

Make sure the etcd service on the node to be added is down.
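
One way to make this check explicit is a small guard function. This is only a sketch: the article does not say how the service is managed, so systemd, the unit name `etcd`, and the helper name `etcd_is_stopped` are all assumptions.

```shell
# Hypothetical helper: succeed only when the given systemd unit is NOT active.
# Assumes etcd is managed by systemd; the default unit name "etcd" is a guess.
etcd_is_stopped() {
    local unit="${1:-etcd}"
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is still running; stop it before continuing" >&2
        return 1
    fi
    return 0
}

# Example usage on the node being re-added:
#   etcd_is_stopped etcd && echo "safe to continue"
```

Running the guard before the next steps avoids deleting data out from under a live etcd process.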

Step 2: Delete the data directory

```shell
# Look up your data directory in the configuration file
> rm -fr "$ETCD_DATA_DIR/member"
```
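
Because an empty or mistyped `ETCD_DATA_DIR` turns this `rm` into something far more destructive, a guarded version may be safer. The sketch below assumes an env-style configuration file containing a line such as `ETCD_DATA_DIR="/var/lib/etcd"`. The helper name `remove_member_dir` and any file path passed to it are hypothetical.

```shell
# Hypothetical helper: read ETCD_DATA_DIR from an env-style config file
# and remove only its member/ subdirectory.
remove_member_dir() {
    local conf="$1" data_dir
    # Take the last ETCD_DATA_DIR=... line and strip optional quotes.
    data_dir=$(sed -n 's/^ETCD_DATA_DIR=//p' "$conf" | tail -n 1 | tr -d "\"'")
    if [ -z "$data_dir" ]; then
        echo "ETCD_DATA_DIR not found in $conf" >&2
        return 1
    fi
    if [ ! -d "$data_dir/member" ]; then
        echo "nothing to remove: $data_dir/member does not exist" >&2
        return 0
    fi
    # ${data_dir:?} aborts instead of expanding to an empty string.
    rm -rf -- "${data_dir:?}/member"
}

# Example usage (the config path is an assumption):
#   remove_member_dir /etc/etcd/etcd.conf
```

Only the `member` subdirectory is removed. Anything else under the data directory is left in place.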

Step 3: Add the member

```shell
> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member add lv-etcd-research-alpha-0 --peer-urls="http://192.168.149.60:2380"
Member 2145c204a51dbbc7 added to cluster 2c25150e88501a13

ETCD_NAME="lv-etcd-research-alpha-0"
ETCD_INITIAL_CLUSTER="lv-etcd-research-alpha-1=http://192.168.149.63:2380,lv-etcd-research-alpha-0=http://192.168.149.60:2380,lv-etcd-research-alpha-3=http://192.168.149.62:2380,lv-etcd-research-alpha-4=http://192.168.149.64:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
```
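
Before running `member add`, it can be worth confirming that the old entry for this peer URL really is gone from the cluster. The sketch below parses the default comma-separated output of `etcdctl member list`. The helper name `peer_url_registered` is hypothetical.

```shell
# Hypothetical helper: reads `etcdctl member list` output on stdin and succeeds
# if any member already advertises the given peer URL (4th field).
peer_url_registered() {
    awk -F', ' -v url="$1" '
        {
            # A member may list several peer URLs separated by commas.
            n = split($4, urls, ",")
            for (i = 1; i <= n; i++) if (urls[i] == url) found = 1
        }
        END { exit !found }
    '
}

# Example usage:
#   ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member list \
#       | peer_url_registered "http://192.168.149.60:2380" \
#       && echo "old member entry still present, remove it first"
```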
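
Step 4: Update the configuration file and start the service

Using the output of the member add command from the previous step, edit the configuration file. Because this host used to be a member of the etcd cluster, only the key parameters shown in that output need to change.

After editing the configuration file, start the service.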
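
The edit can also be scripted. The sketch below assumes the `member add` output was saved to a file and that the configuration file is env-style (`KEY="value"` lines), matching the format of that output. The helper names `set_conf_value` and `apply_member_add_output` and all file paths are hypothetical.

```shell
# Hypothetical helper: set KEY="VALUE" in an env-style config file,
# replacing an existing definition or appending a new one.
set_conf_value() {
    local conf="$1" key="$2" value="$3" tmp
    tmp=$(mktemp) || return 1
    # Drop any existing definition of the key, then append the new one.
    grep -v "^${key}=" "$conf" > "$tmp" || true
    printf '%s="%s"\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Copy every ETCD_*=... line from saved `member add` output into the config.
apply_member_add_output() {
    local output_file="$1" conf="$2" line key value
    while IFS= read -r line; do
        case "$line" in
            ETCD_*=*) ;;      # keep only the variable lines
            *) continue ;;
        esac
        key=${line%%=*}
        value=${line#*=}
        value=${value#\"}     # strip surrounding double quotes
        value=${value%\"}
        set_conf_value "$conf" "$key" "$value" || return 1
    done < "$output_file"
}

# Example usage (both paths are assumptions):
#   apply_member_add_output /tmp/member-add.out /etc/etcd/etcd.conf
```

Settings that do not appear in the `member add` output, such as the data directory, are left as they were.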

Cluster status:

```shell
> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member list
1161d5b4260241e3, started, lv-etcd-research-alpha-1, http://192.168.149.63:2380, http://192.168.149.63:2379
2145c204a51dbbc7, started, lv-etcd-research-alpha-0, http://192.168.149.60:2380, http://192.168.149.60:2379
4252aec339d438d9, started, lv-etcd-research-alpha-3, http://192.168.149.62:2380, http://192.168.149.62:2379
ea04db3353b9fd4e, started, lv-etcd-research-alpha-4, http://192.168.149.64:2380, http://192.168.149.64:2379

# Note: include the new node's address http://192.168.149.60:2379 in --endpoints
> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379,http://192.168.149.60:2379,http://192.168.149.63:2379,http://192.168.149.64:2379 endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| http://192.168.149.62:2379 | 4252aec339d438d9 |  3.2.28 |   21 MB |      true |         8 |     148457 |
| http://192.168.149.60:2379 | 2145c204a51dbbc7 |  3.2.28 |   21 MB |     false |         8 |     148457 |
| http://192.168.149.63:2379 | 1161d5b4260241e3 |  3.2.28 |   21 MB |     false |         8 |     148457 |
| http://192.168.149.64:2379 | ea04db3353b9fd4e |  3.2.28 |   21 MB |     false |         8 |     148457 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
```
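
Instead of reading the member list by eye, the "every member is started" check can be automated. This sketch parses the default comma-separated `member list` output, where the second field is the member status. The helper name `all_members_started` is hypothetical.

```shell
# Hypothetical helper: reads `etcdctl member list` output on stdin; succeeds
# only if there is at least one member and every one is "started".
all_members_started() {
    awk -F', ' '
        NF >= 2 { total++; if ($2 != "started") bad++ }
        END { exit !(total > 0 && bad == 0) }
    '
}

# Example usage:
#   ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member list \
#       | all_members_started && echo "all members started"
```

A member that has been added with `member add` but whose etcd process has not yet joined is reported with the status `unstarted`, so the check fails until Step 4 is complete.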

The second node is handled the same way.

```shell
> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member add lv-etcd-research-alpha-2 --peer-urls="http://192.168.149.61:2380"
Member e26482910894af8d added to cluster 2c25150e88501a13

ETCD_NAME="lv-etcd-research-alpha-2"
ETCD_INITIAL_CLUSTER="lv-etcd-research-alpha-1=http://192.168.149.63:2380,lv-etcd-research-alpha-0=http://192.168.149.60:2380,lv-etcd-research-alpha-3=http://192.168.149.62:2380,lv-etcd-research-alpha-2=http://192.168.149.61:2380,lv-etcd-research-alpha-4=http://192.168.149.64:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
```

```shell
> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member list
1161d5b4260241e3, started, lv-etcd-research-alpha-1, http://192.168.149.63:2380, http://192.168.149.63:2379
2145c204a51dbbc7, started, lv-etcd-research-alpha-0, http://192.168.149.60:2380, http://192.168.149.60:2379
4252aec339d438d9, started, lv-etcd-research-alpha-3, http://192.168.149.62:2380, http://192.168.149.62:2379
e26482910894af8d, started, lv-etcd-research-alpha-2, http://192.168.149.61:2380, http://192.168.149.61:2379
ea04db3353b9fd4e, started, lv-etcd-research-alpha-4, http://192.168.149.64:2380, http://192.168.149.64:2379
```

```shell
> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379,http://192.168.149.60:2379,http://192.168.149.61:2379,http://192.168.149.63:2379,http://192.168.149.64:2379 endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| http://192.168.149.62:2379 | 4252aec339d438d9 |  3.2.28 |   21 MB |      true |         8 |     148909 |
| http://192.168.149.60:2379 | 2145c204a51dbbc7 |  3.2.28 |   21 MB |     false |         8 |     148909 |
| http://192.168.149.61:2379 | e26482910894af8d |  3.2.28 |   21 MB |     false |         8 |     148909 |
| http://192.168.149.63:2379 | 1161d5b4260241e3 |  3.2.28 |   21 MB |     false |         8 |     148909 |
| http://192.168.149.64:2379 | ea04db3353b9fd4e |  3.2.28 |   21 MB |     false |         8 |     148909 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
```
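
In the table above every endpoint reports the same RAFT INDEX, which shows that the re-added nodes have caught up. A rough automated version of that comparison is sketched below. The helper name `raft_index_in_sync` is hypothetical, it only recognises endpoints written with an `http` or `https` scheme, and on a cluster that is taking writes the indexes may differ briefly, so a single failure is not conclusive.

```shell
# Hypothetical helper: reads `etcdctl endpoint status -w table` output on stdin;
# succeeds when every endpoint row shows the same value in the RAFT INDEX column.
raft_index_in_sync() {
    awk -F'|' '
        /RAFT INDEX/ {
            # Locate the column by its header so extra columns do not matter.
            for (i = 1; i <= NF; i++) {
                h = $i; gsub(/^ +| +$/, "", h)
                if (h == "RAFT INDEX") col = i
            }
            next
        }
        col && /^[|] *http/ {
            v = $col; gsub(/ /, "", v)
            if (!(v in seen)) { seen[v] = 1; distinct++ }
            rows++
        }
        END { exit !(rows > 0 && distinct == 1) }
    '
}

# Example usage:
#   ETCDCTL_API=3 etcdctl --endpoints <comma-separated endpoints> \
#       endpoint status -w table | raft_index_in_sync && echo "in sync"
```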
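
Summary

The key to re-adding a previously removed node is deleting its data directory. That directory still holds the old member ID, the cluster ID, and related state, and a node that starts up carrying this information fails validation and cannot be added to the cluster.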