Monitor your policies using Amazon CloudWatch
You can monitor your Amazon Data Lifecycle Manager lifecycle policies using CloudWatch, which collects raw data and processes it into readable, near real-time metrics. You can use these metrics to see exactly how many Amazon EBS snapshots and EBS-backed AMIs are created, deleted, and copied by your policies over time. You can also set alarms that watch for certain thresholds, and send notifications or take actions when those thresholds are met.
Metrics are kept for a period of 15 months, so that you can access historical information and gain a better understanding of how your lifecycle policies perform over an extended period.
For more information about Amazon CloudWatch, see the Amazon CloudWatch User Guide.
Supported metrics
The AWS/EBS namespace includes metrics for Amazon Data Lifecycle Manager lifecycle policies. The supported metrics differ by policy type.
All metrics can be measured on the DLMPolicyId dimension. The most useful statistics are sum and average, and the unit of measure is count.
View CloudWatch metrics for your policies
You can use the Amazon Web Services Management Console or the command line tools to list the metrics that Amazon Data Lifecycle Manager sends to Amazon CloudWatch.
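As a minimal sketch of the command line route, the AWS CLI list-metrics command can show which metrics CloudWatch holds for a single policy. The helper name list_policy_metrics and the policy ID are hypothetical placeholders; the AWS/EBS namespace and the DLMPolicyId dimension are taken from the alarm commands later in this topic.

```shell
# Hypothetical helper: list the CloudWatch metrics recorded for one policy.
# $1 is the policy ID (a placeholder such as policy-0123456789abcdef0).
list_policy_metrics() {
  aws cloudwatch list-metrics \
    --namespace AWS/EBS \
    --dimensions "Name=DLMPolicyId,Value=$1"
}
```

Calling list_policy_metrics with your own policy ID should return only the metrics that have been published for that policy; a policy that has not yet run might return an empty list.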
Graph metrics for your policies
After you create a policy, you can open the Amazon EC2 console and view the monitoring graphs for the policy on the Monitoring tab. Each graph is based on one of the available Amazon EC2 metrics.
The following graph metrics are available:
- Resources targeted (based on ResourcesTargeted)
- Snapshot creation started (based on SnapshotsCreateStarted)
- Snapshot creation completed (based on SnapshotsCreateCompleted)
- Snapshot creation failed (based on SnapshotsCreateFailed)
- Snapshot sharing completed (based on SnapshotsSharedCompleted)
- Snapshot deletion completed (based on SnapshotsDeleteCompleted)
- Snapshot deletion failed (based on SnapshotsDeleteFailed)
- Snapshot cross-Region copy started (based on SnapshotsCopiedRegionStarted)
- Snapshot cross-Region copy completed (based on SnapshotsCopiedRegionCompleted)
- Snapshot cross-Region copy failed (based on SnapshotsCopiedRegionFailed)
- Snapshot cross-Region copy deletion completed (based on SnapshotsCopiedRegionDeleteCompleted)
- Snapshot cross-Region copy deletion failed (based on SnapshotsCopiedRegionDeleteFailed)
- Snapshot cross-account copy started (based on SnapshotsCopiedAccountStarted)
- Snapshot cross-account copy completed (based on SnapshotsCopiedAccountCompleted)
- Snapshot cross-account copy failed (based on SnapshotsCopiedAccountFailed)
- Snapshot cross-account copy deletion completed (based on SnapshotsCopiedAccountDeleteCompleted)
- Snapshot cross-account copy deletion failed (based on SnapshotsCopiedAccountDeleteFailed)
- AMI creation started (based on ImagesCreateStarted)
- AMI creation completed (based on ImagesCreateCompleted)
- AMI creation failed (based on ImagesCreateFailed)
- AMI deregistration completed (based on ImagesDeregisterCompleted)
- AMI deregistration failed (based on ImagesDeregisterFailed)
- AMI cross-Region copy started (based on ImagesCopiedRegionStarted)
- AMI cross-Region copy completed (based on ImagesCopiedRegionCompleted)
- AMI cross-Region copy failed (based on ImagesCopiedRegionFailed)
- AMI cross-Region copy deregistration completed (based on ImagesCopiedRegionDeregisterCompleted)
- AMI cross-Region copy deregister failed (based on ImagesCopiedRegionDeregisteredFailed)
- AMI enable deprecation completed (based on EnableImageDeprecationCompleted)
- AMI enable deprecation failed (based on EnableImageDeprecationFailed)
- AMI cross-Region copy enable deprecation completed (based on EnableCopiedImageDeprecationCompleted)
- AMI cross-Region copy enable deprecation failed (based on EnableCopiedImageDeprecationFailed)
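The data behind any of these graphs can also be retrieved from the command line. The following is a rough sketch using the AWS CLI get-metric-statistics command; the helper name policy_metric_sum, the argument order, and the hourly period are assumptions made for illustration, and the AWS/EBS namespace is the one used by the alarm commands in this topic.

```shell
# Hypothetical helper: hourly Sum of one metric for one policy.
# $1 = policy ID, $2 = metric name (for example, SnapshotsCreateFailed),
# $3 = start time, $4 = end time (ISO 8601, for example 2024-01-01T00:00:00Z).
policy_metric_sum() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EBS \
    --metric-name "$2" \
    --dimensions "Name=DLMPolicyId,Value=$1" \
    --start-time "$3" \
    --end-time "$4" \
    --period 3600 \
    --statistics Sum
}
```

Sum is used here because the metrics are counts; Average might be a better fit if you want a typical per-run value instead.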
Create a CloudWatch alarm for a policy
You can create a CloudWatch alarm that monitors CloudWatch metrics for your policies. CloudWatch will automatically send you a notification when the metric reaches a threshold that you specify. You can create a CloudWatch alarm using the CloudWatch console.
For more information about creating alarms using the CloudWatch console, see the following topic in the Amazon CloudWatch User Guide.
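If you prefer the command line to the console, an existing alarm can be inspected and exercised with the AWS CLI. The sketch below assumes an alarm already exists; the helper names check_alarm_state and trigger_test_alarm are hypothetical, and the alarm name is whatever you chose when you created the alarm.

```shell
# Hypothetical helper: print the current state of an alarm
# (OK, ALARM, or INSUFFICIENT_DATA). $1 = alarm name.
check_alarm_state() {
  aws cloudwatch describe-alarms \
    --alarm-names "$1" \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}

# Hypothetical helper: temporarily force an alarm into the ALARM state
# so that you can confirm its actions run. $1 = alarm name.
trigger_test_alarm() {
  aws cloudwatch set-alarm-state \
    --alarm-name "$1" \
    --state-value ALARM \
    --state-reason "Manual test of alarm notification"
}
```

A state set with set-alarm-state is temporary: the alarm returns to its real state the next time CloudWatch evaluates it.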
Example use cases
The following are example use cases.
Example 1: ResourcesTargeted metric
You can use the ResourcesTargeted metric to monitor the total number of resources that are targeted by a specific policy each time it runs. This enables you to trigger an alarm when the number of targeted resources is below or above an expected threshold.
For example, if you expect your daily policy to create backups of no more than 50 volumes, you can create an alarm that sends an email notification when the sum for ResourcesTargeted is greater than 50 over a 1-hour period. In this way, you can detect when snapshots have been unexpectedly created from volumes that were incorrectly tagged.
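An email notification of this kind is typically delivered through an Amazon SNS topic whose ARN is passed to the alarm as its action. As a hedged sketch, assuming you do not already have a topic, the helpers below create one and subscribe an address to it. The helper names, the topic name, and the email address are hypothetical placeholders.

```shell
# Hypothetical helper: create an SNS topic and print its ARN. $1 = topic name.
create_alarm_topic() {
  aws sns create-topic \
    --name "$1" \
    --query TopicArn \
    --output text
}

# Hypothetical helper: subscribe an endpoint to the topic.
# $1 = topic ARN, $2 = protocol (for example, email or sms), $3 = endpoint.
subscribe_to_topic() {
  aws sns subscribe \
    --topic-arn "$1" \
    --protocol "$2" \
    --notification-endpoint "$3"
}
```

For example, you might run create_alarm_topic dlm-alarms, then pass the returned ARN to subscribe_to_topic with the email protocol and your address. Email subscriptions must be confirmed by the recipient before notifications are delivered.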
You can use the following command to create this alarm:
aws cloudwatch put-metric-alarm \
    --alarm-name resource-targeted-monitor \
    --alarm-description "Alarm when policy targets more than 50 resources" \
    --metric-name ResourcesTargeted \
    --namespace AWS/EBS \
    --statistic Sum \
    --period 3600 \
    --threshold 50 \
    --comparison-operator GreaterThanThreshold \
    --dimensions "Name=DLMPolicyId,Value=policy_id" \
    --evaluation-periods 1 \
    --alarm-actions sns_topic_arn
Example 2: SnapshotsDeleteFailed metric
You can use the SnapshotsDeleteFailed metric to monitor for failures to delete snapshots as per the policy's snapshot retention rule.
For example, if you've created a policy that should automatically delete snapshots every twelve hours, you can create an alarm that notifies your engineering team when the sum of SnapshotsDeleteFailed is greater than 0 over a 1-hour period. This could help you to investigate improper snapshot retention and to ensure that your storage costs are not increased by unnecessary snapshots.
You can use the following command to create this alarm:
aws cloudwatch put-metric-alarm \
    --alarm-name snapshot-deletion-failed-monitor \
    --alarm-description "Alarm when snapshot deletions fail" \
    --metric-name SnapshotsDeleteFailed \
    --namespace AWS/EBS \
    --statistic Sum \
    --period 3600 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --dimensions "Name=DLMPolicyId,Value=policy_id" \
    --evaluation-periods 1 \
    --alarm-actions sns_topic_arn
Example 3: SnapshotsCopiedRegionFailed metric
Use the SnapshotsCopiedRegionFailed metric to identify when your policies fail to copy snapshots to other Regions.
For example, if your policy copies snapshots across Regions daily, you can create an alarm that sends an SMS to your engineering team when the sum of SnapshotsCopiedRegionFailed is greater than 0 over a 1-hour period. This can be useful for verifying whether subsequent snapshots in the lineage were successfully copied by the policy.
You can use the following command to create this alarm:
aws cloudwatch put-metric-alarm \
    --alarm-name snapshot-copy-region-failed-monitor \
    --alarm-description "Alarm when snapshot copy fails" \
    --metric-name SnapshotsCopiedRegionFailed \
    --namespace AWS/EBS \
    --statistic Sum \
    --period 3600 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --dimensions "Name=DLMPolicyId,Value=policy_id" \
    --evaluation-periods 1 \
    --alarm-actions sns_topic_arn
Managing policies that report failed actions
For more information about what to do when one of your policies reports an unexpected non-zero value for a failed action metric, see "What should I do if Amazon Data Lifecycle Manager reports failed actions in CloudWatch metrics?"