Setting up the Route 53 health check for EventBridge global endpoints
When using global endpoints you have to have a Route 53 health check to monitor the status of your Regions. The following template defines a Amazon CloudWatch alarm and uses it to define a Route 53 health check.
Topics
Amazon CloudFormation template for defining a Route 53 health check
Use the following template to define your Route 53 health check.
Description: |- Global endpoints health check that will fail when the average Amazon EventBridge latency is above 30 seconds for a duration of 5 minutes. Note, missing data will cause the health check to fail, so if you only send events intermittently, consider changing the heath check to use a longer evaluation period or instead treat missing data as 'missing' instead of 'breaching'. Metadata: AWS::CloudFormation::Interface: ParameterGroups: - Label: default: "Global endpoint health check alarm configuration" Parameters: - HealthCheckName - HighLatencyAlarmPeriod - MinimumEvaluationPeriod - MinimumThreshold - TreatMissingDataAs ParameterLabels: HealthCheckName: default: Health check name HighLatencyAlarmPeriod: default: High latency alarm period MinimumEvaluationPeriod: default: Minimum evaluation period MinimumThreshold: default: Minimum threshold TreatMissingDataAs: default: Treat missing data as Parameters: HealthCheckName: Description: Name of the health check Type: String Default: LatencyFailuresHealthCheck HighLatencyAlarmPeriod: Description: The period, in seconds, over which the statistic is applied. Valid values are 10, 30, 60, and any multiple of 60. MinValue: 10 Type: Number Default: 60 MinimumEvaluationPeriod: Description: The number of periods over which data is compared to the specified threshold. You must have at least one evaluation period. MinValue: 1 Type: Number Default: 5 MinimumThreshold: Description: The value to compare with the specified statistic. Type: Number Default: 30000 TreatMissingDataAs: Description: Sets how this alarm is to handle missing data points. Type: String AllowedValues: - breaching - notBreaching - ignore - missing Default: breaching Mappings: "InsufficientDataMap": "missing": "HCConfig": "LastKnownStatus" "breaching": "HCConfig": "Unhealthy" Resources: HighLatencyAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmDescription: High Latency in Amazon EventBridge MetricName: IngestionToInvocationStartLatency Namespace: AWS/Events Statistic: Average Period: !Ref HighLatencyAlarmPeriod EvaluationPeriods: !Ref MinimumEvaluationPeriod Threshold: !Ref MinimumThreshold ComparisonOperator: GreaterThanThreshold TreatMissingData: !Ref TreatMissingDataAs LatencyHealthCheck: Type: AWS::Route53::HealthCheck Properties: HealthCheckTags: - Key: Name Value: !Ref HealthCheckName HealthCheckConfig: Type: CLOUDWATCH_METRIC AlarmIdentifier: Name: Ref: HighLatencyAlarm Region: !Ref AWS::Region InsufficientDataHealthStatus: !FindInMap [InsufficientDataMap, !Ref TreatMissingDataAs, HCConfig] Outputs: HealthCheckId: Description: The identifier that Amazon Route 53 assigned to the health check when you created it. Value: !GetAtt LatencyHealthCheck.HealthCheckId
Event IDs can change across API calls so correlating events across Regions requires you to have an immutable, unique identifier. Consumers should also be designed with idempotency in mind. That way, if you're replicating events, or replaying them from archives, there are no side effects from the events being processed in both Regions.
CloudWatch alarm template properties
Note
For all editable
fields, consider your throughput per second. If you only send events intermittently, consider
changing the heath check to use a longer evaluation period or instead treat missing data as missing
instead of breaching
.
The following properties are used in th CloudWatch alarm section of the template:
Metric | Description |
---|---|
|
The description of the alarm. Default: |
|
The name of the metric associated with the alarm.
This is required for an alarm based on a metric. For an alarm based on a math expression, you use Default: IngestionToInvocationStartLatency |
|
The namespace of the metric associated with the alarm.
This is required for an alarm based on a metric. For an alarm based on a math expression, you can't specify Default: |
|
The statistic for the metric associated with the alarm, other than percentile. Default: Average |
|
The period, in seconds, over which the statistic is applied. This is required for an alarm based on a metric. Valid values are 10, 30, 60, and any multiple of 60. Default: |
|
The number of periods over which data is compared to the specified threshold.
If you are setting an alarm that requires that a number of consecutive data points be breaching to trigger the alarm, this value specifies that number.
If you are setting an "M out of N" alarm, this value is the N, and Default: |
|
The value to compare with the specified statistic. Default: |
|
The arithmetic operation to use when comparing the specified statistic and threshold. The specified statistic value is used as the first operand. Default: |
|
Sets how this alarm is to handle missing data points. Valid values: Default: |
Route 53 health check template properties
Note
For all editable
fields, consider your throughput per second. If you only send events intermittently, consider
changing the heath check to use a longer evaluation period or instead treat missing data as missing
instead of breaching
.
The following properties are used in th Route 53 health check section of the template:
Metric | Description |
---|---|
|
The name of the health check. Default: |
|
When CloudWatch has insufficient data about the metric to determine the alarm state, the status that you want Amazon Route 53 to assign to the health check Valid values:
Default: Unhealthy NoteThis field is updated based on the input to the |