Simulate a cluster network failure
Description: Simulate a network failure to test how the cluster behaves in a split-brain scenario.
Run node: Can be run on any node. In this test case, it is run on node B.
Run steps:
- Drop all traffic coming from and going to node A with the following command:
iptables -A INPUT -s <<Primary IP address of Node A>> -j DROP; iptables -A OUTPUT -d <<Primary IP address of Node A>> -j DROP
[root@sechana ~]# pcs status
Cluster name: rhelhanaha
Stack: corosync
Current DC: prihana (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
Last updated: Fri Jan 22 14:45:24 2021
Last change: Fri Jan 22 14:45:11 2021 by hacluster via crmd on sechana
2 nodes configured
6 resources configured
Online: [ prihana sechana ]
Full list of resources:
 clusterfence (stonith:fence_aws): Started prihana
 Clone Set: SAPHanaTopology_DRL_00-clone [SAPHanaTopology_DRL_00]
     Started: [ prihana sechana ]
 Master/Slave Set: SAPHana_DRL_00-master [SAPHana_DRL_00]
     Masters: [ prihana ]
     Slaves: [ sechana ]
 hana-oip (ocf::heartbeat:aws-vpc-move-ip): Started prihana
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@sechana ~]# iptables -A INPUT -s xxx.xxx.xxx.xxx -j DROP; iptables -A OUTPUT -d xxx.xxx.xxx.xxx -j DROP
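The run step can be wrapped in a small function so that the IP address of node A is supplied once and the two rules can be previewed before they are applied. This is a minimal sketch, not part of the original procedure: the function names `run` and `simulate_network_failure` and the `DRY_RUN` variable are hypothetical. The iptables arguments are exactly the ones shown in the command above.

```shell
# Hypothetical helper, run as root on node B to cut node A off the network.
# Set DRY_RUN=1 to print the iptables commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

simulate_network_failure() {
  # $1: primary IP address of node A (required)
  node_a_ip="${1:?usage: simulate_network_failure <primary IP address of node A>}"
  # Drop every packet that arrives from node A ...
  run iptables -A INPUT -s "$node_a_ip" -j DROP
  # ... and every packet that node B sends to node A.
  run iptables -A OUTPUT -d "$node_a_ip" -j DROP
}
```

For example, `DRY_RUN=1 simulate_network_failure <primary IP address of node A>` prints the two commands. Running it without `DRY_RUN` applies them, just like the one-line command in the run step.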
Expected result:
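- The cluster detects the network failure and fences node 1. It promotes the secondary SAP HANA database (on node 2) to primary, so that node 2 takes over as the primary database without a split-brain situation occurring.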
[root@sechana ~]# pcs status
Cluster name: rhelhanaha
Stack: corosync
Current DC: sechana (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
Last updated: Fri Jan 22 15:11:43 2021
Last change: Fri Jan 22 15:10:48 2021 by root via crm_attribute on sechana
2 nodes configured
6 resources configured
Online: [ sechana ]
OFFLINE: [ prihana ]
Full list of resources:
 clusterfence (stonith:fence_aws): Started sechana
 Clone Set: SAPHanaTopology_DRL_00-clone [SAPHanaTopology_DRL_00]
     Started: [ sechana ]
     Stopped: [ prihana ]
 Master/Slave Set: SAPHana_DRL_00-master [SAPHana_DRL_00]
     Masters: [ sechana ]
     Stopped: [ prihana ]
 hana-oip (ocf::heartbeat:aws-vpc-move-ip): Started sechana
Failed Actions:
* clusterfence_monitor_60000 on sechana 'unknown error' (1): call=-1, status=Timed Out, exitreason='',
    last-rc-change='Fri Jan 22 14:59:14 2021', queued=0ms, exec=0ms
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@sechana ~]#
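To check the expected result without reading the whole status by eye, the two lines that matter can be matched in the `pcs status` output. The `Masters:` line must name the surviving node, and the `OFFLINE:` line must name the fenced node. This is a minimal sketch that assumes a two-node cluster and the Pacemaker 1.1 output format shown above. The function name `check_takeover` is hypothetical.

```shell
# Hypothetical check: reads `pcs status` text on stdin and confirms that
# $1 (the surviving node) is the SAP HANA master and $2 (the fenced node) is OFFLINE.
# Assumes each bracketed list holds a single node name (two-node cluster).
check_takeover() {
  survivor="$1"
  fenced="$2"
  status="$(cat)"
  if ! printf '%s\n' "$status" | grep -Eq "Masters: \[ *$survivor *]"; then
    echo "FAIL: $survivor is not the master" >&2
    return 1
  fi
  if ! printf '%s\n' "$status" | grep -Eq "OFFLINE: \[ *$fenced *]"; then
    echo "FAIL: $fenced is not reported as OFFLINE" >&2
    return 1
  fi
  echo "OK: $survivor was promoted and $fenced is offline"
}
```

On node B, `pcs status | check_takeover sechana prihana` should report success once the takeover has completed.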
Recovery procedure:
- Clean up the cluster "failed actions".
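One way to perform the cleanup is with `pcs resource cleanup`, which clears the recorded operation failures of a resource. Run without a resource ID, it clears them for all resources. The sketch below targets the `clusterfence` resource that appears in the failed action above, then confirms that the failed-actions section is gone. The function names are hypothetical. The header match assumes that Pacemaker 1.1 prints "Failed Actions:" and that newer releases print "Failed Resource Actions:". The cluster may also need a moment to refresh its status after the cleanup.

```shell
# Returns 0 if the `pcs status` text on stdin still contains a failed-actions section.
# Accepts both "Failed Actions:" (Pacemaker 1.1) and "Failed Resource Actions:".
has_failed_actions() {
  grep -Eq '^ *Failed( Resource)? Actions:'
}

# Hypothetical wrapper, run as root on the surviving node (sechana in this test).
cleanup_failed_actions() {
  # Clear the recorded failure of the fence resource's monitor operation.
  pcs resource cleanup clusterfence
  # Verify that the status no longer lists any failed actions.
  if pcs status | has_failed_actions; then
    echo "failed actions are still listed" >&2
    return 1
  fi
  echo "no failed actions remain"
}
```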