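etcd: Re-adding a Previously Removed Node

Scenario: when a node in an etcd cluster is forced offline temporarily by a host failure, the usual way to keep the cluster robust is to remove the failed member first and then add a new node to take its place. Once the failed machine comes back online after N days of painstaking repair, it needs to be added back to the etcd cluster.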

Test Environment

In the experiments from the previous two articles, the original etcd cluster consisted of:

  • 192.168.149.60
  • 192.168.149.61
  • 192.168.149.62

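192.168.149.60 was migrated to 192.168.149.63

192.168.149.61 was replaced by 192.168.149.64

So this environment already contains two machines that previously served as etcd nodes. This article shows how to add such former nodes back to the cluster.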

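Overall Steps

  • Confirm that the etcd service on the node to be added is stopped
  • Delete the member directory under the etcd data directory
  • Run the runtime reconfiguration, exactly as when adding a brand-new node
  • Update the configuration file and start the etcd service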

Step 1: Confirm the State

Make sure the etcd service on the node to be added is down.
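
One way to check this is sketched below. It assumes etcd runs as a systemd unit named etcd (the same unit this article later starts with systemctl start etcd); the function name etcd_is_stopped is only illustrative.

```shell
# Succeeds (exit 0) only when the etcd unit is not active.
# Assumption: etcd is managed by systemd under the unit name "etcd".
etcd_is_stopped() {
    if systemctl is-active --quiet etcd; then
        echo "etcd is still running; stop it first: systemctl stop etcd" >&2
        return 1
    fi
    echo "etcd is stopped"
}
```

Run etcd_is_stopped on the node that is about to be re-added, and continue only if it succeeds.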

Step 2: Delete the Data Directory

# Look up your data directory in the configuration file
> rm -fr $ETCD_DATA_DIR/member
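
If ETCD_DATA_DIR happens to be unset in the current shell, the command above expands to rm -fr /member, so a more cautious variant may be worth considering. The sketch below is not the article's method: it moves the member directory aside instead of deleting it and refuses to run with an empty path. The function name and the member.bak.<timestamp> naming are illustrative assumptions.

```shell
# Move the stale member directory aside instead of deleting it outright.
# $1 is the etcd data directory (the ETCD_DATA_DIR value from the config file).
set_aside_member_dir() {
    data_dir=$1
    if [ -z "$data_dir" ]; then
        echo "usage: set_aside_member_dir DATA_DIR" >&2
        return 2
    fi
    if [ ! -d "$data_dir/member" ]; then
        echo "nothing to do: $data_dir/member does not exist"
        return 0
    fi
    # Hypothetical backup name; move it elsewhere if you prefer a clean data dir.
    backup="$data_dir/member.bak.$(date +%Y%m%d%H%M%S)"
    mv "$data_dir/member" "$backup" && echo "moved to $backup"
}
```

Once the node has rejoined and caught up, the backup directory can be removed.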

Step 3: Add the Node

> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member add lv-etcd-research-alpha-0 --peer-urls="http://192.168.149.60:2380"
Member 2145c204a51dbbc7 added to cluster 2c25150e88501a13

ETCD_NAME="lv-etcd-research-alpha-0"
ETCD_INITIAL_CLUSTER="lv-etcd-research-alpha-1=http://192.168.149.63:2380,lv-etcd-research-alpha-0=http://192.168.149.60:2380,lv-etcd-research-alpha-3=http://192.168.149.62:2380,lv-etcd-research-alpha-4=http://192.168.149.64:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
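
The ETCD_* lines in this output are what the next step needs, so it can help to save them as they are printed. A minimal sketch, assuming the lines arrive on standard output as shown above; extract_etcd_env and the file name new-member.env are illustrative, not part of etcdctl.

```shell
# Keep only the ETCD_* assignments from `etcdctl member add` output on stdin.
extract_etcd_env() {
    grep '^ETCD_'
}

# Usage (same command as above; new-member.env is an illustrative file name):
#   ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 \
#     member add lv-etcd-research-alpha-0 --peer-urls="http://192.168.149.60:2380" \
#     | extract_etcd_env > new-member.env
```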

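Step 4: Update the Configuration File and Start the Service

Update the configuration file according to the output of the member add command in the previous step. (Because this host used to be a member of the etcd cluster, only the key parameters shown in that output need to change.)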
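
The edit can be done by hand, or with a small helper like the sketch below. It assumes the configuration file is an environment-style file of KEY="VALUE" lines, matching the format of the member add output; the helper name set_conf_value and the path /etc/etcd/etcd.conf in the usage comment are assumptions, so substitute your own.

```shell
# Replace the line KEY=... in FILE with KEY="VALUE", or append it if absent.
set_conf_value() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=\"" v "\""; found = 1; next }
        { print }
        END { if (!found) print k "=\"" v "\"" }
    ' "$file" > "$tmp"; then
        # Write back through the original file so its permissions are kept.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Usage (the config path is an assumption):
#   set_conf_value /etc/etcd/etcd.conf ETCD_INITIAL_CLUSTER_STATE existing
```

Matching on the key followed by "=" keeps ETCD_INITIAL_CLUSTER from accidentally rewriting the ETCD_INITIAL_CLUSTER_STATE line.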

After updating the configuration file, start the service:

> systemctl start etcd

Cluster status:

> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member list
1161d5b4260241e3, started, lv-etcd-research-alpha-1, http://192.168.149.63:2380, http://192.168.149.63:2379
2145c204a51dbbc7, started, lv-etcd-research-alpha-0, http://192.168.149.60:2380, http://192.168.149.60:2379
4252aec339d438d9, started, lv-etcd-research-alpha-3, http://192.168.149.62:2380, http://192.168.149.62:2379
ea04db3353b9fd4e, started, lv-etcd-research-alpha-4, http://192.168.149.64:2380, http://192.168.149.64:2379

# Note: add the new node's address, http://192.168.149.60:2379, to the --endpoints argument
> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379,http://192.168.149.60:2379,http://192.168.149.63:2379,http://192.168.149.64:2379 endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| http://192.168.149.62:2379 | 4252aec339d438d9 | 3.2.28 | 21 MB | true | 8 | 148457 |
| http://192.168.149.60:2379 | 2145c204a51dbbc7 | 3.2.28 | 21 MB | false | 8 | 148457 |
| http://192.168.149.63:2379 | 1161d5b4260241e3 | 3.2.28 | 21 MB | false | 8 | 148457 |
| http://192.168.149.64:2379 | ea04db3353b9fd4e | 3.2.28 | 21 MB | false | 8 | 148457 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
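
To turn the visual check of the member list into something scriptable, the output can be piped through a small filter. This is a sketch that assumes the comma-separated layout shown above (ID, status, name, peer URLs, client URLs); all_members_started is an illustrative name.

```shell
# Read `etcdctl member list` output on stdin.
# Fails if the input is empty or any member's status is not "started".
all_members_started() {
    awk -F', ' '
        NF >= 2 {
            total++
            if ($2 != "started") { print "not started: " $0; bad++ }
        }
        END { if (total == 0 || bad > 0) exit 1 }
    '
}

# Usage:
#   ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member list \
#     | all_members_started && echo "all members started"
```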

Repeat the same procedure for the second node:

> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member add lv-etcd-research-alpha-2 --peer-urls="http://192.168.149.61:2380"
Member e26482910894af8d added to cluster 2c25150e88501a13

ETCD_NAME="lv-etcd-research-alpha-2"
ETCD_INITIAL_CLUSTER="lv-etcd-research-alpha-1=http://192.168.149.63:2380,lv-etcd-research-alpha-0=http://192.168.149.60:2380,lv-etcd-research-alpha-3=http://192.168.149.62:2380,lv-etcd-research-alpha-2=http://192.168.149.61:2380,lv-etcd-research-alpha-4=http://192.168.149.64:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379 member list
1161d5b4260241e3, started, lv-etcd-research-alpha-1, http://192.168.149.63:2380, http://192.168.149.63:2379
2145c204a51dbbc7, started, lv-etcd-research-alpha-0, http://192.168.149.60:2380, http://192.168.149.60:2379
4252aec339d438d9, started, lv-etcd-research-alpha-3, http://192.168.149.62:2380, http://192.168.149.62:2379
e26482910894af8d, started, lv-etcd-research-alpha-2, http://192.168.149.61:2380, http://192.168.149.61:2379
ea04db3353b9fd4e, started, lv-etcd-research-alpha-4, http://192.168.149.64:2380, http://192.168.149.64:2379
> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.62:2379,http://192.168.149.60:2379,http://192.168.149.61:2379,http://192.168.149.63:2379,http://192.168.149.64:2379 endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| http://192.168.149.62:2379 | 4252aec339d438d9 | 3.2.28 | 21 MB | true | 8 | 148909 |
| http://192.168.149.60:2379 | 2145c204a51dbbc7 | 3.2.28 | 21 MB | false | 8 | 148909 |
| http://192.168.149.61:2379 | e26482910894af8d | 3.2.28 | 21 MB | false | 8 | 148909 |
| http://192.168.149.63:2379 | 1161d5b4260241e3 | 3.2.28 | 21 MB | false | 8 | 148909 |
| http://192.168.149.64:2379 | ea04db3353b9fd4e | 3.2.28 | 21 MB | false | 8 | 148909 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+

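Summary

The key to re-adding a previously removed node is deleting the stale data directory (the member directory from Step 2). It still holds the old member ID, the cluster ID, and related information, and a node that starts with this stale state fails validation and cannot join the cluster.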