本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。
警报和监控
本节涵盖以下主题。
使用 Amazon CloudWatch 应用程序见解
为了监控和查看集群状态和操作,Application Insights 包括用于监控入队复制状态的指标、集群指标以及 SAP 和高可用性检查。其他指标,例如 EFS 和 CPU 监控,也可以帮助进行根本原因分析。
有关更多信息,请参阅亚马逊 CloudWatch 应用程序见解入门和 SAP HANA 高可用性 EC2。
使用集群警报代理
在集群配置中,您可以调用外部程序(警报代理)来处理警报。这是推送通知。它通过环境变量传递有关事件的信息。
然后,可以将代理配置为发送电子邮件、记录到文件、更新监控系统等。例如,以下脚本可用于访问 Amazon SNS。
#!/bin/sh # alert_sns.sh # modified from /usr/share/pacemaker/alerts/alert_smtp.sh.sample ############################################################################## # SETUP # * Create an SNS Topic and subscribe email or chatbot # * Note down the ARN for the SNS topic # * Give the IAM Role attached to both Instances permission to publish to the SNS Topic # * Ensure the aws cli is installed # * Copy this file to /usr/share/pacemaker/alerts/alert_sns.sh or other location on BOTH nodes # * Ensure the permissions allow for hacluster and root to execute the script # * Run the following as root (modify file location if necessary and replace SNS ARN): # # SLES: # crm configure alert aws_sns_alert /usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S" to <{ arn:aws:sns:region:account-id:myPacemakerAlerts }> # # RHEL: # pcs alert create id=aws_sns_alert path=/usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S" # pcs alert recipient add aws_sns_alert value=arn:aws:sns:region:account-id:myPacemakerAlerts ############################################################################## # Additional information to send with the alerts node_name=`uname -n` sns_body=`env | grep CRM_alert_` # Required for SNS TOKEN=$(/usr/bin/curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") # Get metadata REGION=$(/usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}') sns_subscription_arn=${CRM_alert_recipient} # Format depending on alert type case ${CRM_alert_kind} in node) sns_subject="${CRM_alert_timestamp} ${cluster_name}: Node '${CRM_alert_node}' is now '${CRM_alert_desc}'" ;; fencing) sns_subject="${CRM_alert_timestamp} ${cluster_name}: Fencing ${CRM_alert_desc}" ;; resource) if [ ${CRM_alert_interval} = "0" ]; then CRM_alert_interval="" else CRM_alert_interval=" (${CRM_alert_interval})" fi if [ ${CRM_alert_target_rc} = "0" ]; then CRM_alert_target_rc="" else CRM_alert_target_rc=" (target: ${CRM_alert_target_rc})" fi case ${CRM_alert_desc} in Cancelled) ;; *) sns_subject="${CRM_alert_timestamp}: Resource operation '${CRM_alert_task}${CRM_alert_interval}' for '${CRM_alert_rsc}' on '${CRM_alert_node}': ${CRM_alert_desc}${CRM_alert_target_rc}" ;; esac ;; attribute) sns_subject="${CRM_alert_timestamp}: The '${CRM_alert_attribute_name}' attribute of the '${CRM_alert_node}' node was updated in '${CRM_alert_attribute_value}'" ;; *) sns_subject="${CRM_alert_timestamp}: Unhandled $CRM_alert_kind alert" ;; esac # Use this information to send the email aws sns publish --topic-arn "${sns_subscription_arn}" --subject "${sns_subject}" --message "${sns_body}" --region ${REGION}