Configure existing CloudWatch alarms to create OpsItems (programmatically) - Amazon Systems Manager

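You can configure Amazon CloudWatch alarms to create OpsItems programmatically by using the Amazon Command Line Interface (Amazon CLI), Amazon CloudFormation templates, or Java code snippets.

Before you begin

If you programmatically edit an existing alarm, or create an alarm that creates OpsItems, you must specify an Amazon Resource Name (ARN). This ARN identifies Systems Manager OpsCenter as the target for OpsItems created from the alarm. You can customize the ARN so that OpsItems created from the alarm include specific information, such as severity or category. Each ARN includes the information described in the following table.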

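Parameter    Details

Region (required)

The Amazon Web Services Region where the alarm exists. For example: us-west-2. For information about the Amazon Web Services Regions where you can use OpsCenter, see Amazon Systems Manager endpoints and quotas.

account_ID (required)

The same Amazon Web Services account ID that was used to create the alarm. For example: 123456789012. The account ID must be followed by a colon (:) and the parameter opsitem, as shown in the following examples.

severity (required)

A user-defined severity level for OpsItems created from the alarm. Valid values: 1, 2, 3, 4

Category (optional)

A category for OpsItems created from the alarm. Valid values: Availability, Cost, Performance, Recovery, Security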

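Use the following syntax to create the ARN. This ARN doesn't include the optional Category parameter.

arn:aws:ssm:Region:account_ID:opsitem:severity

The following is an example.

arn:aws:ssm:us-west-2:123456789012:opsitem:3

To create an ARN that uses the optional Category parameter, use the following syntax.

arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name

The following is an example.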

arn:aws:ssm:us-west-2:123456789012:opsitem:3#CATEGORY=Security
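
The two syntaxes above can also be assembled in code. The following is a minimal sketch under stated assumptions, not part of any Amazon SDK: the class name OpsItemArn and its build methods are hypothetical helpers, the 12-digit check reflects the account ID format used in the examples, and the arn:aws prefix is hardcoded exactly as it appears in the syntax above.

```java
import java.util.Set;

/** Hypothetical helper that builds the OpsCenter ARN described in the table above. */
final class OpsItemArn {
    // Valid category values, taken from the parameter table.
    private static final Set<String> CATEGORIES =
            Set.of("Availability", "Cost", "Performance", "Recovery", "Security");

    private OpsItemArn() { }

    /** Builds an ARN without the optional Category parameter. */
    static String build(String region, String accountId, int severity) {
        return build(region, accountId, severity, null);
    }

    /** Builds an ARN; pass null as the category to omit it. */
    static String build(String region, String accountId, int severity, String category) {
        if (region == null || region.isEmpty()) {
            throw new IllegalArgumentException("Region is required");
        }
        // Assumption: account IDs are 12 digits, as in the example 123456789012.
        if (accountId == null || !accountId.matches("\\d{12}")) {
            throw new IllegalArgumentException("account_ID must be 12 digits");
        }
        if (severity < 1 || severity > 4) {
            throw new IllegalArgumentException("severity must be 1, 2, 3, or 4");
        }
        String arn = "arn:aws:ssm:" + region + ":" + accountId + ":opsitem:" + severity;
        if (category == null) {
            return arn;
        }
        if (!CATEGORIES.contains(category)) {
            throw new IllegalArgumentException("Unknown category: " + category);
        }
        return arn + "#CATEGORY=" + category;
    }

    public static void main(String[] args) {
        System.out.println(build("us-west-2", "123456789012", 3));
        System.out.println(build("us-west-2", "123456789012", 3, "Security"));
    }
}
```

Validating the severity and category up front means that a typo fails immediately in your own code instead of producing an alarm action that is rejected or silently does nothing later.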

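Configure CloudWatch alarms to create OpsItems (Amazon CLI)

This command requires that you specify an ARN for the alarm-actions parameter. For information about how to create the ARN, see Before you begin.

To configure a CloudWatch alarm to create OpsItems (Amazon CLI)
  1. Install and configure the Amazon Command Line Interface (Amazon CLI), if you haven't already.

    For information, see Installing or updating the latest version of the Amazon CLI.

  2. Run the following command to collect information about the alarm that you want to configure.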

    aws cloudwatch describe-alarms --alarm-names "alarm name"
  3. Run the following command to update the alarm. Replace each example resource placeholder with your own information. Because put-metric-alarm overwrites the alarm's previous configuration, specify all of the alarm's existing settings from the previous step, not only the alarm action.

    aws cloudwatch put-metric-alarm --alarm-name name \
        --alarm-description "description" \
        --metric-name name --namespace namespace \
        --statistic statistic --period value --threshold value \
        --comparison-operator value \
        --dimensions "dimensions" --evaluation-periods value \
        --alarm-actions arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name \
        --unit unit

    The following are examples.

    Linux & macOS
    aws cloudwatch put-metric-alarm --alarm-name cpu-mon \
        --alarm-description "Alarm when CPU exceeds 70 percent" \
        --metric-name CPUUtilization --namespace AWS/EC2 \
        --statistic Average --period 300 --threshold 70 \
        --comparison-operator GreaterThanThreshold \
        --dimensions "Name=InstanceId,Value=i-12345678" --evaluation-periods 2 \
        --alarm-actions arn:aws:ssm:us-east-1:123456789012:opsitem:3#CATEGORY=Security \
        --unit Percent
    Windows
    aws cloudwatch put-metric-alarm --alarm-name cpu-mon ^
        --alarm-description "Alarm when CPU exceeds 70 percent" ^
        --metric-name CPUUtilization --namespace AWS/EC2 ^
        --statistic Average --period 300 --threshold 70 ^
        --comparison-operator GreaterThanThreshold ^
        --dimensions "Name=InstanceId,Value=i-12345678" --evaluation-periods 2 ^
        --alarm-actions arn:aws:ssm:us-east-1:123456789012:opsitem:3#CATEGORY=Security ^
        --unit Percent
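
A typo in the alarm-actions value is easy to make, so it can help to check the ARN before you run the command. The following sketch checks a string against the ARN syntax from Before you begin. The class OpsItemArnCheck and its methods are hypothetical, and the pattern encodes only the rules stated in this topic (arn:aws:ssm prefix, 12-digit account ID, severity 1 through 4, and the five listed categories). The Region segment is matched loosely and is not checked against a list of real Regions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical checker for the OpsCenter ARN syntax described in this topic. */
final class OpsItemArnCheck {
    // Groups: 1 = Region, 2 = account ID, 3 = severity, 4 = optional category.
    private static final Pattern ARN = Pattern.compile(
            "arn:aws:ssm:([a-z0-9-]+):(\\d{12}):opsitem:([1-4])"
            + "(?:#CATEGORY=(Availability|Cost|Performance|Recovery|Security))?");

    private OpsItemArnCheck() { }

    /** Returns true when the whole string matches the documented syntax. */
    static boolean isValid(String arn) {
        return arn != null && ARN.matcher(arn).matches();
    }

    /** Returns a readable summary of the ARN parts, or throws if the ARN is invalid. */
    static String describe(String arn) {
        Matcher m = ARN.matcher(arn == null ? "" : arn);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid OpsItem ARN: " + arn);
        }
        String category = m.group(4) == null ? "(none)" : m.group(4);
        return "region=" + m.group(1) + ", account=" + m.group(2)
                + ", severity=" + m.group(3) + ", category=" + category;
    }

    public static void main(String[] args) {
        String arn = "arn:aws:ssm:us-east-1:123456789012:opsitem:3#CATEGORY=Security";
        System.out.println(isValid(arn));
        System.out.println(describe(arn));
    }
}
```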

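Configure CloudWatch alarms to create or update OpsItems (CloudFormation)

This section includes Amazon CloudFormation templates that you can use to configure CloudWatch alarms to automatically create or update OpsItems. Each template requires that you specify an ARN for the AlarmActions parameter. For information about how to create the ARN, see Before you begin.

Metric alarm - Use the following CloudFormation template to create or update a CloudWatch metric alarm. The alarm specified in this template monitors Amazon Elastic Compute Cloud (Amazon EC2) instance status checks. If the alarm enters the ALARM state, it creates an OpsItem in OpsCenter.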

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Parameters": {
    "RecoveryInstance": {
      "Description": "The EC2 instance ID to associate this alarm with.",
      "Type": "AWS::EC2::Instance::Id"
    }
  },
  "Resources": {
    "RecoveryTestAlarm": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmDescription": "Create an OpsItem when the instance status check fails for 15 consecutive minutes.",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Statistic": "Minimum",
        "Period": "60",
        "EvaluationPeriods": "15",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "0",
        "AlarmActions": [
          {"Fn::Join": ["", ["arn:", {"Ref": "AWS::Partition"}, ":ssm:", {"Ref": "AWS::Region"}, ":", {"Ref": "AWS::AccountId"}, ":opsitem:3"]]}
        ],
        "Dimensions": [{"Name": "InstanceId", "Value": {"Ref": "RecoveryInstance"}}]
      }
    }
  }
}
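
To make the timing in this template concrete: a Period of 60 seconds and 15 EvaluationPeriods give a 900-second (15-minute) window, and the alarm condition is that the Minimum of StatusCheckFailed_System is greater than 0 in each of those periods. The following sketch models only that arithmetic. The class and method names are hypothetical, and the model is deliberately simplified: it ignores missing-data handling and the other details of how CloudWatch evaluates alarms.

```java
/** Simplified, hypothetical model of the metric alarm condition in the template above. */
final class StatusCheckAlarmSketch {
    static final int PERIOD_SECONDS = 60;      // "Period" in the template
    static final int EVALUATION_PERIODS = 15;  // "EvaluationPeriods" in the template
    static final double THRESHOLD = 0.0;       // "Threshold" in the template

    private StatusCheckAlarmSketch() { }

    /** Length of the evaluation window in seconds. */
    static int windowSeconds() {
        return PERIOD_SECONDS * EVALUATION_PERIODS;
    }

    /**
     * Returns true when each of the most recent EVALUATION_PERIODS datapoints
     * is greater than the threshold (GreaterThanThreshold).
     * The array holds one value per period, oldest first.
     */
    static boolean inAlarm(double[] datapoints) {
        if (datapoints.length < EVALUATION_PERIODS) {
            return false; // not enough data in this simplified model
        }
        for (int i = datapoints.length - EVALUATION_PERIODS; i < datapoints.length; i++) {
            if (!(datapoints[i] > THRESHOLD)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] failing = new double[EVALUATION_PERIODS];
        java.util.Arrays.fill(failing, 1.0);
        System.out.println(windowSeconds() + " seconds, in alarm: " + inAlarm(failing));
    }
}
```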

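Composite alarm - Use the following CloudFormation template to create or update a composite alarm. A composite alarm consists of multiple metric alarms. If the alarm enters the ALARM state, it creates an OpsItem in OpsCenter.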

{
  "Resources": {
    "HighResourceUsage": {
      "Type": "AWS::CloudWatch::CompositeAlarm",
      "Properties": {
        "AlarmName": "HighResourceUsage",
        "AlarmRule": "(ALARM(HighCPUUsage) OR ALARM(HighMemoryUsage)) AND NOT ALARM(DeploymentInProgress)",
        "AlarmActions": ["arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name"],
        "AlarmDescription": "Indicates that the system resource usage is high while no known deployment is in progress"
      },
      "DependsOn": ["DeploymentInProgress", "HighCPUUsage", "HighMemoryUsage"]
    },
    "DeploymentInProgress": {
      "Type": "AWS::CloudWatch::CompositeAlarm",
      "Properties": {
        "AlarmName": "DeploymentInProgress",
        "AlarmRule": "FALSE",
        "AlarmDescription": "Manually updated to TRUE/FALSE to disable other alarms"
      }
    },
    "HighCPUUsage": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmDescription": "CPU usage is high",
        "AlarmName": "HighCPUUsage",
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 1,
        "MetricName": "CPUUsage",
        "Namespace": "CustomNamespace",
        "Period": 60,
        "Statistic": "Average",
        "Threshold": 70,
        "TreatMissingData": "notBreaching"
      }
    },
    "HighMemoryUsage": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmDescription": "Memory usage is high",
        "AlarmName": "HighMemoryUsage",
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 1,
        "MetricName": "MemoryUsage",
        "Namespace": "CustomNamespace",
        "Period": 60,
        "Statistic": "Average",
        "Threshold": 65,
        "TreatMissingData": "breaching"
      }
    }
  }
}
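
The AlarmRule in this template is ordinary Boolean logic over the states of the three child alarms. As an illustration only, the following sketch expresses the same rule in Java, where each argument is true when the named alarm is in the ALARM state. The class name HighResourceUsageRule is hypothetical, and the sketch does not call CloudWatch.

```java
/** Hypothetical model of the HighResourceUsage alarm rule from the template above. */
final class HighResourceUsageRule {
    private HighResourceUsageRule() { }

    /**
     * Mirrors: (ALARM(HighCPUUsage) OR ALARM(HighMemoryUsage))
     *          AND NOT ALARM(DeploymentInProgress)
     */
    static boolean inAlarm(boolean highCpu, boolean highMemory, boolean deploymentInProgress) {
        return (highCpu || highMemory) && !deploymentInProgress;
    }

    public static void main(String[] args) {
        // High CPU with no deployment in progress triggers the composite alarm.
        System.out.println(inAlarm(true, false, false));
        // The same load during a deployment is suppressed.
        System.out.println(inAlarm(true, false, true));
    }
}
```

Because DeploymentInProgress has the rule FALSE until someone changes it, the composite alarm behaves like a plain OR of the two metric alarms by default, and switching that rule to TRUE suppresses OpsItem creation during a deployment.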

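Configure CloudWatch alarms to create or update OpsItems (Java)

This section includes Java code snippets that you can use to configure CloudWatch alarms to automatically create or update OpsItems. Each snippet requires that you specify an ARN for the validSsmActionStr variable. For information about how to create the ARN, see Before you begin.

A specific alarm - Use the following Java code snippet to create or update a CloudWatch alarm. The alarm specified in this snippet monitors the CPU utilization of an Amazon EC2 instance. If the alarm enters the ALARM state, it creates an OpsItem in OpsCenter.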

import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.ComparisonOperator;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.PutMetricAlarmRequest;
import com.amazonaws.services.cloudwatch.model.PutMetricAlarmResult;
import com.amazonaws.services.cloudwatch.model.StandardUnit;
import com.amazonaws.services.cloudwatch.model.Statistic;

private void putMetricAlarmWithSsmAction(String alarmName, String instanceId) {
    final AmazonCloudWatch cw = AmazonCloudWatchClientBuilder.defaultClient();

    // Scope the alarm to a single EC2 instance.
    Dimension dimension = new Dimension()
        .withName("InstanceId")
        .withValue(instanceId);

    // Replace the placeholders with your own values.
    String validSsmActionStr = "arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name";

    PutMetricAlarmRequest request = new PutMetricAlarmRequest()
        .withAlarmName(alarmName)
        .withComparisonOperator(ComparisonOperator.GreaterThanThreshold)
        .withEvaluationPeriods(1)
        .withMetricName("CPUUtilization")
        .withNamespace("AWS/EC2")
        .withPeriod(60)
        .withStatistic(Statistic.Average)
        .withThreshold(70.0)
        // Actions must be enabled for the alarm to create OpsItems.
        .withActionsEnabled(true)
        .withAlarmDescription("Alarm when server CPU utilization exceeds 70%")
        // CPUUtilization is reported as a percentage.
        .withUnit(StandardUnit.Percent)
        .withDimensions(dimension)
        .withAlarmActions(validSsmActionStr);

    PutMetricAlarmResult response = cw.putMetricAlarm(request);
}

Update all alarms - Use the following Java code snippet to update all CloudWatch alarms in your Amazon Web Services account so that they create OpsItems when an alarm enters the ALARM state.

import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.DescribeAlarmsRequest;
import com.amazonaws.services.cloudwatch.model.DescribeAlarmsResult;
import com.amazonaws.services.cloudwatch.model.MetricAlarm;
import java.util.Collections;

private void listMetricAlarmsAndAddSsmAction() {
    final AmazonCloudWatch cw = AmazonCloudWatchClientBuilder.defaultClient();
    DescribeAlarmsRequest request = new DescribeAlarmsRequest();

    // Replace the placeholders with your own values.
    String validSsmActionStr = "arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name";

    boolean done = false;
    while (!done) {
        DescribeAlarmsResult response = cw.describeAlarms(request);

        for (MetricAlarm alarm : response.getMetricAlarms()) {
            // Assumes that no alarm actions have been added to the metric alarm.
            // This call changes only the local MetricAlarm object. To save the
            // change, resubmit the alarm's configuration with putMetricAlarm.
            alarm.setAlarmActions(Collections.singletonList(validSsmActionStr));
        }

        // Continue with the next page of results, if there is one.
        request.setNextToken(response.getNextToken());
        if (response.getNextToken() == null) {
            done = true;
        }
    }
}
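
The snippet above assumes that the alarms have no alarm actions, so any existing actions (for example, an Amazon SNS topic) would be replaced. If you want to keep them, one option is to merge the OpsCenter ARN into the existing list before you set it. The following is a minimal sketch using only the Java standard library. The class AlarmActionMerger is hypothetical, the SNS topic ARN in main is a made-up example, and the limit of five actions is an assumption based on the PutMetricAlarm API reference that you should verify for your account.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that adds the OpsCenter ARN without dropping existing actions. */
final class AlarmActionMerger {
    // Assumption: CloudWatch accepts at most five actions per alarm state.
    static final int MAX_ACTIONS = 5;

    private AlarmActionMerger() { }

    /** Returns the existing actions plus the OpsItem ARN, in order and without duplicates. */
    static List<String> withOpsItemAction(List<String> existing, String opsItemArn) {
        Set<String> merged = new LinkedHashSet<>(existing);
        merged.add(opsItemArn);
        if (merged.size() > MAX_ACTIONS) {
            throw new IllegalStateException("Too many alarm actions: " + merged.size());
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("arn:aws:sns:us-east-1:123456789012:my-topic");
        System.out.println(withOpsItemAction(existing, "arn:aws:ssm:us-east-1:123456789012:opsitem:3"));
    }
}
```

With a helper like this, the loop body could pass withOpsItemAction(alarm.getAlarmActions(), validSsmActionStr) to setAlarmActions instead of a single-element list.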