Simulate cluster network failure - SAP HANA on AWS

Simulate cluster network failure

Description: Simulate a network failure to test the cluster behavior in a split-brain scenario.

Run node: Can be run on any node. In this test case, it is run on node B.

Run steps

  • Drop all traffic coming from and going to node A with the following commands:

    iptables -A INPUT -s <<Primary IP address of Node A>> -j DROP;
    iptables -A OUTPUT -d <<Primary IP address of Node A>> -j DROP

    [root@sechana ~] pcs status
    Cluster name: rhelhanaha
    Stack: corosync
    Current DC: prihana (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
    Last updated: Fri Jan 22 14:45:24 2021
    Last change: Fri Jan 22 14:45:11 2021 by hacluster via crmd on sechana
    2 nodes configured
    6 resources configured
    Online: [ prihana sechana ]
    Full list of resources:
     clusterfence   (stonith:fence_aws):    Started prihana
     Clone Set: SAPHanaTopology_DRL_00-clone [SAPHanaTopology_DRL_00]
         Started: [ prihana sechana ]
     Master/Slave Set: SAPHana_DRL_00-master [SAPHana_DRL_00]
         Masters: [ prihana ]
         Slaves: [ sechana ]
     hana-oip       (ocf::heartbeat:aws-vpc-move-ip):       Started prihana
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@sechana ~] iptables -A INPUT -s xxx.xxx.xxx.xxx -j DROP;
    iptables -A OUTPUT -d xxx.xxx.xxx.xxx -j DROP
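
The two DROP rules used in this step can be generated by a small helper, so that the same pair is appended to start the test and deleted again afterwards. This is a minimal sketch and not part of the original procedure: the function name `peer_rules` and the address 192.0.2.10 are placeholders, and removing the rules with `iptables -D` once the test is finished is an assumption about how the simulated failure is ended.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the iptables commands for one peer.
#   $1 = action flag: -A appends the rules, -D deletes the same rules
#   $2 = primary IP address of node A
peer_rules() {
    action="$1"
    peer_ip="$2"
    # Drop everything arriving from the peer.
    printf 'iptables %s INPUT -s %s -j DROP\n' "$action" "$peer_ip"
    # Drop everything sent to the peer.
    printf 'iptables %s OUTPUT -d %s -j DROP\n' "$action" "$peer_ip"
}

# Example, run as root on node B (192.0.2.10 is a placeholder address):
#   peer_rules -A 192.0.2.10        # review the commands first
#   peer_rules -A 192.0.2.10 | sh   # apply them and start the test
#   peer_rules -D 192.0.2.10 | sh   # delete the same rules after the test
```

Printing the commands instead of running them directly keeps the helper free of side effects until its output is piped to a shell.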

Expected result

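  • The cluster detects the network failure and fences node A (prihana). The cluster then promotes the secondary SAP HANA database on node B (sechana) to primary, so that it takes over without a split-brain situation.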

    [root@sechana ~] pcs status
    Cluster name: rhelhanaha
    Stack: corosync
    Current DC: sechana (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
    Last updated: Fri Jan 22 15:11:43 2021
    Last change: Fri Jan 22 15:10:48 2021 by root via crm_attribute on sechana
    2 nodes configured
    6 resources configured
    Online: [ sechana ]
    OFFLINE: [ prihana ]
    Full list of resources:
     clusterfence   (stonith:fence_aws):    Started sechana
     Clone Set: SAPHanaTopology_DRL_00-clone [SAPHanaTopology_DRL_00]
         Started: [ sechana ]
         Stopped: [ prihana ]
     Master/Slave Set: SAPHana_DRL_00-master [SAPHana_DRL_00]
         Masters: [ sechana ]
         Stopped: [ prihana ]
     hana-oip       (ocf::heartbeat:aws-vpc-move-ip):       Started sechana
    Failed Actions:
    * clusterfence_monitor_60000 on sechana 'unknown error' (1): call=-1,
    status=Timed Out, exitreason='',
        last-rc-change='Fri Jan 22 14:59:14 2021', queued=0ms, exec=0ms
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@sechana ~]
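
The takeover shown in this output can also be checked from a script by reading the `Masters:` line of `pcs status`. The following is a rough sketch that assumes the output format shown above for this Pacemaker version; the function names `current_masters` and `assert_master` are hypothetical.

```shell
#!/bin/sh
# Reads `pcs status` output on stdin and prints the node name(s)
# listed between the brackets of the "Masters:" line.
current_masters() {
    grep 'Masters:' | sed -e 's/.*\[//' -e 's/\].*//' -e 's/^ *//' -e 's/ *$//'
}

# Succeeds only if the node named in $1 is the single promoted node.
assert_master() {
    expected="$1"
    actual="$(current_masters)"
    [ "$actual" = "$expected" ]
}

# Example, on a cluster node:
#   pcs status | assert_master sechana && echo "secondary has taken over"
```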

Recovery procedure

  • Clean up the cluster "failed actions".
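
One possible way to clear the failed action is `pcs resource cleanup` with the resource ID taken from the "Failed Actions" section of the status output (`clusterfence` in this test). This is a sketch under that assumption; the wrapper name `cleanup_failed` and the `PCS` override variable are hypothetical and exist only so that the command can be dry-run.

```shell
#!/bin/sh
# Clears the failure history of one resource.
#   $1 = resource ID as reported under "Failed Actions"
# PCS is a hypothetical override; set PCS=echo to print the command
# instead of running it.
cleanup_failed() {
    resource_id="$1"
    ${PCS:-pcs} resource cleanup "$resource_id"
}

# Example, run as root on a cluster node:
#   cleanup_failed clusterfence
#   pcs status    # the "Failed Actions" section should no longer be listed
```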