Alerting and monitoring - SAP HANA on AWS

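Alerting and monitoring

This section covers two topics: using Amazon CloudWatch Application Insights, and using the cluster alert agents.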

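Using Amazon CloudWatch Application Insights

To monitor and visualize cluster state and operations, Application Insights includes metrics for monitoring the enqueue replication state, cluster metrics, and SAP and high availability checks. Additional metrics, such as EFS and CPU monitoring, can also help with root cause analysis.

For more information, see Get started with Amazon CloudWatch Application Insights and SAP HANA High Availability on Amazon EC2.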

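Using the cluster alert agents

Within the cluster configuration, you can call an external program (an alert agent) to handle alerts. This is a push notification: the cluster starts the agent when an event occurs and passes information about the event to it through environment variables.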
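To make the environment-variable mechanism concrete, the following is a minimal sketch of an agent that only appends a line to a log file. The script name `alert_log.sh`, the function `format_alert_line`, and the fallback path `/tmp/pacemaker_alerts.log` are hypothetical names chosen for this illustration. It assumes the configured recipient is the log file path and reads it from `CRM_alert_recipient`.

```shell
#!/bin/sh
# alert_log.sh (hypothetical name): append one line per cluster event to a file.

# Build a one-line summary from the CRM_alert_* variables that the cluster
# exports to alert agents. Variables that are unset fall back to "-".
format_alert_line() {
    printf '%s kind=%s node=%s desc=%s\n' \
        "${CRM_alert_timestamp:--}" \
        "${CRM_alert_kind:--}" \
        "${CRM_alert_node:--}" \
        "${CRM_alert_desc:--}"
}

# Assumption for this sketch: the alert recipient is the log file path.
# The /tmp default is only used when the script is run by hand.
log_file="${CRM_alert_recipient:-/tmp/pacemaker_alerts.log}"
format_alert_line >> "${log_file}"
```

Running the script by hand with a few `CRM_alert_*` variables exported is a quick way to check the output format before registering the agent in the cluster.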

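The agents can then be configured to send emails, log to a file, update a monitoring system, and so on. For example, the following script publishes cluster alerts to an Amazon SNS topic.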

#!/bin/sh
# alert_sns.sh
# modified from /usr/share/pacemaker/alerts/alert_smtp.sh.sample
##############################################################################
# SETUP
# * Create an SNS Topic and subscribe email or chatbot
# * Note down the ARN for the SNS topic
# * Give the IAM Role attached to both Instances permission to publish to the SNS Topic
# * Ensure the aws cli is installed
# * Copy this file to /usr/share/pacemaker/alerts/alert_sns.sh or other location on BOTH nodes
# * Ensure the permissions allow for hacluster and root to execute the script
# * Run the following as root (modify file location if necessary and replace SNS ARN):
#
# SLES:
# crm configure alert aws_sns_alert /usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S" to <{ arn:aws:sns:region:account-id:myPacemakerAlerts }>
#
# RHEL:
# pcs alert create id=aws_sns_alert path=/usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S"
# pcs alert recipient add aws_sns_alert value=arn:aws:sns:region:account-id:myPacemakerAlerts
##############################################################################

# Additional information to send with the alerts
node_name=`uname -n`
# cluster_name is used in the subjects below but was not set in the original
# script; this lookup is assumed from alert_smtp.sh.sample
cluster_name=`crm_attribute --query -n cluster-name -q`
sns_body=`env | grep CRM_alert_`

# Required for SNS: request an IMDSv2 session token
TOKEN=$(/usr/bin/curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Get metadata: read the Region from the instance identity document
REGION=$(/usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}')

# The alert recipient configured in the cluster is the SNS topic ARN
sns_subscription_arn=${CRM_alert_recipient}

# Format depending on alert type
case "${CRM_alert_kind}" in
    node)
        sns_subject="${CRM_alert_timestamp} ${cluster_name}: Node '${CRM_alert_node}' is now '${CRM_alert_desc}'"
        ;;
    fencing)
        sns_subject="${CRM_alert_timestamp} ${cluster_name}: Fencing ${CRM_alert_desc}"
        ;;
    resource)
        # Variables are quoted so the tests do not fail when a value is empty
        if [ "${CRM_alert_interval}" = "0" ]; then
            CRM_alert_interval=""
        else
            CRM_alert_interval=" (${CRM_alert_interval})"
        fi
        if [ "${CRM_alert_target_rc}" = "0" ]; then
            CRM_alert_target_rc=""
        else
            CRM_alert_target_rc=" (target: ${CRM_alert_target_rc})"
        fi
        case "${CRM_alert_desc}" in
            Cancelled)
                ;;
            *)
                sns_subject="${CRM_alert_timestamp}: Resource operation '${CRM_alert_task}${CRM_alert_interval}' for '${CRM_alert_rsc}' on '${CRM_alert_node}': ${CRM_alert_desc}${CRM_alert_target_rc}"
                ;;
        esac
        ;;
    attribute)
        sns_subject="${CRM_alert_timestamp}: The '${CRM_alert_attribute_name}' attribute of the '${CRM_alert_node}' node was updated in '${CRM_alert_attribute_value}'"
        ;;
    *)
        sns_subject="${CRM_alert_timestamp}: Unhandled $CRM_alert_kind alert"
        ;;
esac

# Publish the alert to the SNS topic. Nothing is sent when no subject was
# built (for example, for cancelled resource operations).
if [ -n "${sns_subject}" ]; then
    aws sns publish --topic-arn "${sns_subscription_arn}" --subject "${sns_subject}" --message "${sns_body}" --region "${REGION}"
fi
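One detail worth checking before relying on the script: the Amazon SNS Publish API requires the subject to be shorter than 100 characters and free of line breaks, and the resource-operation subject built above can exceed that with long resource or node names. The following is a minimal sketch of a helper that could be applied before publishing. The function name `trim_subject` and the sample resource and node names are hypothetical and not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: make a string safe to pass as an SNS subject.
# Line breaks are replaced with spaces and at most 99 characters are kept.
trim_subject() {
    printf '%s' "$1" | tr '\n\r' '  ' | cut -c1-99
}

# Sample subject in the format used for resource alerts; the resource and
# node names are made up for illustration.
long_subject="2024-01-01_00:00:00: Resource operation 'monitor (60000)' for 'rsc_SAPHana_HDB_HDB00' on 'hanaprimary01': not running (target: 0)"
sns_subject=$(trim_subject "${long_subject}")

# Print the resulting length to confirm it is below the limit
echo "${#sns_subject}"
```

In the agent, the publish call could then pass `--subject "$(trim_subject "${sns_subject}")"`. The full, untruncated event details remain available in the message body, which carries all `CRM_alert_*` variables.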