Simulate cluster network failure - SAP HANA on AWS

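Simulate cluster network failure

Description: Simulate a network failure to test how the cluster behaves in a split-brain scenario.

Run node: Can be run on any node. In this test case, it is run on node B.

Run steps

  • Drop all traffic coming from and going to node A with the following command: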

    iptables -A INPUT -s <<Primary IP address of Node A>> -j DROP; iptables -A OUTPUT -d <<Primary IP address of Node A>> -j DROP
    [root@sechana ~]# pcs status
    Cluster name: rhelhanaha
    Stack: corosync
    Current DC: prihana (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
    Last updated: Fri Jan 22 14:45:24 2021
    Last change: Fri Jan 22 14:45:11 2021 by hacluster via crmd on sechana

    2 nodes configured
    6 resources configured

    Online: [ prihana sechana ]

    Full list of resources:

     clusterfence (stonith:fence_aws): Started prihana
     Clone Set: SAPHanaTopology_DRL_00-clone [SAPHanaTopology_DRL_00]
         Started: [ prihana sechana ]
     Master/Slave Set: SAPHana_DRL_00-master [SAPHana_DRL_00]
         Masters: [ prihana ]
         Slaves: [ sechana ]
     hana-oip (ocf::heartbeat:aws-vpc-move-ip): Started prihana

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@sechana ~]# iptables -A INPUT -s xxx.xxx.xxx.xxx -j DROP; iptables -A OUTPUT -d xxx.xxx.xxx.xxx -j DROP
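The two DROP rules stay in the kernel on node B until they are deleted or the node reboots. The source does not show how to remove them, so the following is only a minimal sketch: `peer_rules` is a hypothetical helper, and `10.0.0.10` is a placeholder for the primary IP address of node A. It prints the commands instead of running them, so they can be reviewed first. The same rule specification with `-D` instead of `-A` deletes a rule that was appended earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
# Prints the two iptables commands for a peer IP address.
#   $1: iptables action, -A to append the rules or -D to delete them
#   $2: primary IP address of the peer node (placeholder in the examples)
peer_rules() {
    local action="$1" peer_ip="$2"
    printf 'iptables %s INPUT -s %s -j DROP\n' "$action" "$peer_ip"
    printf 'iptables %s OUTPUT -d %s -j DROP\n' "$action" "$peer_ip"
}
```

One way to use it, as root on node B: `peer_rules -A 10.0.0.10 | sh` to start the test, and `peer_rules -D 10.0.0.10 | sh` to remove the same rules once the test is finished.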

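Expected result

  • The cluster detects the network failure and fences node 1 (node A, prihana in the sample output). The cluster then promotes the secondary SAP HANA database (on node 2, sechana) to primary, so that it takes over without a split-brain situation.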

    [root@sechana ~]# pcs status
    Cluster name: rhelhanaha
    Stack: corosync
    Current DC: sechana (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
    Last updated: Fri Jan 22 15:11:43 2021
    Last change: Fri Jan 22 15:10:48 2021 by root via crm_attribute on sechana

    2 nodes configured
    6 resources configured

    Online: [ sechana ]
    OFFLINE: [ prihana ]

    Full list of resources:

     clusterfence (stonith:fence_aws): Started sechana
     Clone Set: SAPHanaTopology_DRL_00-clone [SAPHanaTopology_DRL_00]
         Started: [ sechana ]
         Stopped: [ prihana ]
     Master/Slave Set: SAPHana_DRL_00-master [SAPHana_DRL_00]
         Masters: [ sechana ]
         Stopped: [ prihana ]
     hana-oip (ocf::heartbeat:aws-vpc-move-ip): Started sechana

    Failed Actions:
    * clusterfence_monitor_60000 on sechana 'unknown error' (1): call=-1, status=Timed Out, exitreason='', last-rc-change='Fri Jan 22 14:59:14 2021', queued=0ms, exec=0ms

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@sechana ~]#
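The expected result can also be checked from a script rather than by eye. This is a rough sketch, not part of the original procedure: `takeover_ok` is a hypothetical function, and it assumes the `pcs status` text format shown in the sample output (the `Masters: [ ... ]` and `OFFLINE: [ ... ]` lines). Other pcs or Pacemaker versions may word these lines differently.

```shell
#!/usr/bin/env bash
# Hypothetical check (assumes the pcs status format shown in the sample output).
# Reads `pcs status` text on stdin and succeeds only if:
#   $1 is listed as the sole master, and
#   $2 is listed as the sole OFFLINE node.
takeover_ok() {
    local new_primary="$1" fenced="$2" status
    status="$(cat)"
    printf '%s\n' "$status" | grep -Eq "Masters: \[ *${new_primary} *\]" &&
    printf '%s\n' "$status" | grep -Eq "OFFLINE: \[ *${fenced} *\]"
}
```

On a cluster node this could be called as `pcs status | takeover_ok sechana prihana && echo "takeover complete"`, using the node names from the sample output.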

Recovery procedure

  • Clean up the cluster "failed actions".
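The source does not give the cleanup command. One common way to clear a failed action with pcs is `pcs resource cleanup`, shown here as a small sketch. `cleanup_failed_actions` is a hypothetical wrapper, and the default resource name `clusterfence` is taken from the Failed Actions entry in the sample output; substitute the resource reported in your own cluster.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the original procedure).
# Clears the failure history of one resource so that the entry under
# "Failed Actions" no longer appears in `pcs status`.
#   $1: resource name (defaults to clusterfence, from the sample output)
cleanup_failed_actions() {
    local resource="${1:-clusterfence}"
    pcs resource cleanup "$resource"
}
```

After running `cleanup_failed_actions` as root, `pcs status` can be used to confirm that the Failed Actions section is gone.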