Troubleshoot issues in Amazon EC2 Auto Scaling

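Amazon EC2 Auto Scaling provides specific, descriptive error messages to help you troubleshoot issues. You can find these error messages in the descriptions of scaling activities.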

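Retrieve an error message from scaling activities

To retrieve an error message from the description of a scaling activity, use the describe-scaling-activities command. The record of scaling activities goes back 6 weeks. Scaling activities are ordered by start time, with the most recent listed first.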

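Note

In the Amazon EC2 Auto Scaling console, scaling activities also appear in the activity history on the Activity tab for the Auto Scaling group.

To view the scaling activities for a specific Auto Scaling group, use the following command.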

aws autoscaling describe-scaling-activities --auto-scaling-group-name my-asg

In the following example response, StatusCode contains the current status of the activity and StatusMessage contains the error message.

{
    "Activities": [
        {
            "ActivityId": "3b05dbf6-037c-b92f-133f-38275269dc0f",
            "AutoScalingGroupName": "my-asg",
            "Description": "Launching a new EC2 instance: i-003a5b3ffe1e9358e. Status Reason: Instance failed to complete user's Lifecycle Action: Lifecycle Action with token e85eb647-4fe0-4909-b341-a6c42d8aba1f was abandoned: Lifecycle Action Completed with ABANDON Result",
            "Cause": "At 2021-01-11T00:35:52Z a user request created an AutoScalingGroup changing the desired capacity from 0 to 1. At 2021-01-11T00:35:53Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.",
            "StartTime": "2021-01-11T00:35:55.542Z",
            "EndTime": "2021-01-11T01:06:31Z",
            "StatusCode": "Cancelled",
            "StatusMessage": "Instance failed to complete user's Lifecycle Action: Lifecycle Action with token e85eb647-4fe0-4909-b341-a6c42d8aba1f was abandoned: Lifecycle Action Completed with ABANDON Result",
            "Progress": 100,
            "Details": "{\"Subnet ID\":\"subnet-5ea0c127\",\"Availability Zone\":\"us-west-2b\"...}",
            "AutoScalingGroupARN": "arn:aws:autoscaling:us-west-2:123456789012:autoScalingGroup:283179a2-f3ce-423d-93f6-66bb518232f7:autoScalingGroupName/my-asg"
        },
        ...
    ]
}

For a description of the fields in the output, see Activity in the Amazon EC2 Auto Scaling API Reference.
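When a group has many activities, it can help to filter the response down to the entries that carry an error. The following Python sketch shows one way to do that with the StatusCode and StatusMessage fields. The function name failed_activities, the choice of "Failed" and "Cancelled" as the status codes of interest, and the embedded sample (trimmed from the example response earlier in this section) are assumptions made for illustration.

```python
import json

# Trimmed copy of the example response from this section (illustrative only).
SAMPLE_RESPONSE = json.loads("""
{
  "Activities": [
    {
      "ActivityId": "3b05dbf6-037c-b92f-133f-38275269dc0f",
      "AutoScalingGroupName": "my-asg",
      "StartTime": "2021-01-11T00:35:55.542Z",
      "StatusCode": "Cancelled",
      "StatusMessage": "Instance failed to complete user's Lifecycle Action: Lifecycle Action with token e85eb647-4fe0-4909-b341-a6c42d8aba1f was abandoned: Lifecycle Action Completed with ABANDON Result",
      "Progress": 100
    }
  ]
}
""")


def failed_activities(response, problem_codes=("Failed", "Cancelled")):
    """Return (start_time, status_code, status_message) for problem activities.

    `response` is the parsed JSON output of describe-scaling-activities.
    `problem_codes` is an assumption for this sketch; adjust it as needed.
    """
    results = []
    for activity in response.get("Activities", []):
        if activity.get("StatusCode") in problem_codes:
            results.append((
                activity.get("StartTime"),
                activity["StatusCode"],
                activity.get("StatusMessage", ""),
            ))
    return results


if __name__ == "__main__":
    for start, code, message in failed_activities(SAMPLE_RESPONSE):
        print(f"{start} [{code}] {message}")
```

To use this with real data, you could redirect the output of the describe-scaling-activities command to a file and load it with json.load in place of SAMPLE_RESPONSE.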

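To view scaling activities for a deleted group

To view scaling activities after the Auto Scaling group has been deleted, add the --include-deleted-groups option to the describe-scaling-activities command as follows.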

aws autoscaling describe-scaling-activities --auto-scaling-group-name my-asg --include-deleted-groups

The following is an example response that contains a scaling activity for a deleted group.

{
    "Activities": [
        {
            "ActivityId": "e1f5de0e-f93e-1417-34ac-092a76fba220",
            "AutoScalingGroupName": "my-asg",
            "Description": "Launching a new EC2 instance. Status Reason: Your Spot request price of 0.001 is lower than the minimum required Spot request fulfillment price of 0.0031. Launching EC2 instance failed.",
            "Cause": "At 2021-01-13T20:47:24Z a user request update of AutoScalingGroup constraints to min: 1, max: 5, desired: 3 changing the desired capacity from 0 to 3. At 2021-01-13T20:47:27Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 3.",
            "StartTime": "2021-01-13T20:47:30.094Z",
            "EndTime": "2021-01-13T20:47:30Z",
            "StatusCode": "Failed",
            "StatusMessage": "Your Spot request price of 0.001 is lower than the minimum required Spot request fulfillment price of 0.0031. Launching EC2 instance failed.",
            "Progress": 100,
            "Details": "{\"Subnet ID\":\"subnet-5ea0c127\",\"Availability Zone\":\"us-west-2b\"...}",
            "AutoScalingGroupState": "Deleted",
            "AutoScalingGroupARN": "arn:aws:autoscaling:us-west-2:123456789012:autoScalingGroup:283179a2-f3ce-423d-93f6-66bb518232f7:autoScalingGroupName/my-asg"
        },
        ...
    ]
}
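The same lookup can be scripted. The following Python sketch assumes an SDK client object that exposes describe_scaling_activities with the AutoScalingGroupName, IncludeDeletedGroups, and NextToken parameters, as the boto3 Auto Scaling client does. The function name deleted_group_activities is hypothetical. The client is passed in as an argument so that the function itself needs no credentials or network access.

```python
def deleted_group_activities(client, group_name):
    """Collect scaling activities that belong to a deleted Auto Scaling group.

    `client` is assumed to behave like boto3.client("autoscaling").
    Only activities whose AutoScalingGroupState is "Deleted" are kept.
    """
    activities = []
    kwargs = {
        "AutoScalingGroupName": group_name,
        "IncludeDeletedGroups": True,
    }
    while True:
        page = client.describe_scaling_activities(**kwargs)
        for activity in page.get("Activities", []):
            if activity.get("AutoScalingGroupState") == "Deleted":
                activities.append(activity)
        # Follow the pagination token until the service stops returning one.
        token = page.get("NextToken")
        if not token:
            return activities
        kwargs["NextToken"] = token


# Example usage (requires boto3 and configured credentials):
#   import boto3
#   client = boto3.client("autoscaling")
#   for activity in deleted_group_activities(client, "my-asg"):
#       print(activity["StatusCode"], activity.get("StatusMessage", ""))
```

Filtering on AutoScalingGroupState is a design choice for this sketch: it separates the history of a deleted group from that of a newer group that reuses the same name.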

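Turn off scaling activities

If you need to investigate an issue without interference from scaling policies or scheduled actions, the following options are available:

  • Suspend the AlarmNotification and ScheduledActions processes to prevent all dynamic scaling policies and scheduled actions from changing the group's desired capacity. For more information, see Suspend and resume Amazon EC2 Auto Scaling processes.

  • Disable individual dynamic scaling policies so that they don't change the group's desired capacity in response to changes in load. For more information, see Disable a scaling policy for an Auto Scaling group.

  • Update individual target tracking scaling policies to scale out only (add capacity) by disabling the scale-in portion of the policy. This approach prevents the group's desired capacity from shrinking, but still allows it to grow when load increases. For more information, see Target tracking scaling policies for Amazon EC2 Auto Scaling.

  • Update your predictive scaling policy to forecast only mode. In forecast only mode, predictive scaling continues to generate forecasts but doesn't automatically increase capacity. For more information, see Create a predictive scaling policy for an Auto Scaling group.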
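The first option can be wrapped in a small helper so that the processes are resumed even if the investigation code raises an error. The following Python sketch assumes an SDK client that exposes suspend_processes and resume_processes with the AutoScalingGroupName and ScalingProcesses parameters, as the boto3 Auto Scaling client does. The helper name scaling_paused is hypothetical.

```python
from contextlib import contextmanager

# The two processes that drive dynamic scaling policies and scheduled actions.
SCALING_PROCESSES = ("AlarmNotification", "ScheduledActions")


@contextmanager
def scaling_paused(client, group_name, processes=SCALING_PROCESSES):
    """Suspend the given processes for the duration of a `with` block.

    `client` is assumed to behave like boto3.client("autoscaling").
    The processes are resumed on exit, even if the block raises.
    """
    client.suspend_processes(
        AutoScalingGroupName=group_name,
        ScalingProcesses=list(processes),
    )
    try:
        yield
    finally:
        client.resume_processes(
            AutoScalingGroupName=group_name,
            ScalingProcesses=list(processes),
        )


# Example usage (requires boto3 and configured credentials):
#   import boto3
#   client = boto3.client("autoscaling")
#   with scaling_paused(client, "my-asg"):
#       ...  # investigate while desired capacity stays put
```

Resuming automatically suits short, scripted checks. For a longer manual investigation, calling suspend_processes and resume_processes directly at the start and end may be a better fit.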

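Additional troubleshooting resources

The following pages provide additional information for troubleshooting issues with Amazon EC2 Auto Scaling.

The following Amazon resources might also be helpful: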

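Troubleshooting often requires iterative query and discovery by an expert or a community of helpers. If you continue to experience issues after trying the suggestions in this section, contact Amazon Web Services Support (in the Amazon Web Services Management Console, choose Support, Support Center) or ask a question on Amazon re:Post using the Amazon EC2 Auto Scaling tag.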