Simulating a cluster network failure - SAP NetWeaver on Amazon

Description – simulate a network failure to test how the cluster behaves in a split-brain scenario.

Run node – can be run on any node (this test case uses the secondary node).

Run steps

hahost02:~ # iptables -A INPUT -s x.x.x.x -j DROP; iptables -A OUTPUT -d x.x.x.x -j DROP

Note: x.x.x.x is the IP address of hahost01.
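The run step can be wrapped in a small helper so the peer address is passed once rather than typed twice. The following is a minimal sketch, not part of the original procedure: the function names `block_peer_cmds` and `unblock_peer_cmds` are hypothetical, and `192.0.2.10` is a placeholder documentation address standing in for the IP of hahost01. The functions only print the commands (a dry run) so they can be reviewed before being executed as root. The original procedure does not describe removing the rules. The second function is an assumption that uses the standard `iptables -D` form to delete the same rule specifications.

```shell
#!/bin/sh
# Print the iptables commands that drop all traffic to and from a peer node.
# Nothing is executed; pipe the output to `sh` as root to apply the rules.
block_peer_cmds() {
    peer_ip="$1"
    if [ -z "$peer_ip" ]; then
        echo "usage: block_peer_cmds <peer-ip>" >&2
        return 1
    fi
    printf 'iptables -A INPUT -s %s -j DROP\n' "$peer_ip"
    printf 'iptables -A OUTPUT -d %s -j DROP\n' "$peer_ip"
}

# Print the matching delete commands (assumption: the same rule specs
# were appended earlier with -A and should be removed after the test).
unblock_peer_cmds() {
    peer_ip="$1"
    if [ -z "$peer_ip" ]; then
        echo "usage: unblock_peer_cmds <peer-ip>" >&2
        return 1
    fi
    printf 'iptables -D INPUT -s %s -j DROP\n' "$peer_ip"
    printf 'iptables -D OUTPUT -d %s -j DROP\n' "$peer_ip"
}

# Dry run with a placeholder address.
block_peer_cmds 192.0.2.10
```

To apply the rules on the run node, the output could be piped to a root shell, for example `block_peer_cmds x.x.x.x | sh`.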

Expected result – the cluster detects the network failure and fences one of the nodes to avoid a split-brain situation.

hahost02:/etc/corosync # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: hahost02(version 2.0.xxxxxxxxx) - partition with quorum
  * Last updated:
  * Last change: by hacluster via crm_attribute on hahost02
  * 2 nodes configured
  * 7 resource instances configured

Node List:
  * Online: [ hahost02 ]
  * OFFLINE: [ hahost01 ]

Full List of Resources:
  * res_AWS_STONITH (stonith:external/ec2): Started hahost02
  * Resource Group: grp_HA1_ASCS00:
    * rsc_IP_HA1_ASCS00 (ocf::suse:aws-vpc-move-ip): Started hahost02
    * rsc_FS_HA1_ASCS00 (ocf::heartbeat:Filesystem): Started hahost02
    * rsc_SAP_HA1_ASCS00 (ocf::heartbeat:SAPInstance): Started hahost02
  * Resource Group: grp_HA1_ERS10:
    * rsc_IP_HA1_ERS10 (ocf::suse:aws-vpc-move-ip): Started hahost02
    * rsc_FS_HA1_ERS10 (ocf::heartbeat:Filesystem): Started hahost02
    * rsc_SAP_HA1_ERS10 (ocf::heartbeat:SAPInstance): Started hahost02
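To confirm the expected result without reading the full status by eye, the fenced node can be pulled from the `OFFLINE` line of the output. This is a minimal sketch under the assumption that `crm status` prints the node list in the `* OFFLINE: [ ... ]` form shown in this test case. The function name `offline_nodes` is hypothetical, and the embedded sample is a shortened copy of that output.

```shell
#!/bin/sh
# Read `crm status` text on stdin and print the node names that appear
# inside the brackets of the OFFLINE line. Prints nothing if no such line.
offline_nodes() {
    sed -n 's/.*OFFLINE: \[ \([^]]*\) \].*/\1/p'
}

# Example with a shortened sample of the status output.
offline_nodes <<'EOF'
Node List:
  * Online: [ hahost02 ]
  * OFFLINE: [ hahost01 ]
EOF
# prints: hahost01
```

On a live cluster node the same function could be used as `crm status | offline_nodes`.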

Recovery procedure

  • Log in to the Amazon Web Services Management Console and start the fenced node.

  • When the instance is up again, the cluster moves the ERS to the previously fenced node.
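The recovery procedure above uses the console. As an alternative, the fenced instance could be started from the AWS CLI. The following is a sketch only: the function name `start_node_cmds` is hypothetical, the instance ID `i-0123456789abcdef0` is a placeholder, and it assumes the AWS CLI is installed with credentials that are allowed to start the instance. The function prints the commands rather than running them.

```shell
#!/bin/sh
# Print the AWS CLI commands that start an EC2 instance and then block
# until it reaches the running state. Nothing is executed here.
start_node_cmds() {
    instance_id="$1"
    if [ -z "$instance_id" ]; then
        echo "usage: start_node_cmds <instance-id>" >&2
        return 1
    fi
    printf 'aws ec2 start-instances --instance-ids %s\n' "$instance_id"
    printf 'aws ec2 wait instance-running --instance-ids %s\n' "$instance_id"
}

# Dry run with a placeholder instance ID.
start_node_cmds i-0123456789abcdef0
```

After the instance is running, `crm status` can be used to check that the node is back online and that the ERS has moved.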