Control the instances Amazon ECS terminates
Important
You must turn on Auto Scaling instance scale-in protection on the Auto Scaling group to use the managed termination protection feature of cluster auto scaling.
Managed termination protection allows cluster auto scaling to control which instances are terminated.
When you use managed termination protection, Amazon ECS only terminates EC2 instances that don't have any running Amazon ECS tasks. Tasks that are run by a service that uses the DAEMON scheduling strategy are ignored, and an instance can be terminated by cluster auto scaling even when the instance is running these tasks. This is because all of the instances in the cluster are running these tasks.
Amazon ECS first turns on the instance scale-in protection option for the EC2 instances in the Auto Scaling group. Then, Amazon ECS places the tasks on the instances. When all non-daemon tasks are stopped on an instance, Amazon ECS initiates the scale-in process and turns off scale-in protection for the EC2 instance. The Auto Scaling group can then terminate the instance.
Auto Scaling instance scale-in protection controls which EC2 instances can be terminated by Auto Scaling. Instances with scale-in protection turned on can't be terminated during the scale-in process. For more information about Auto Scaling instance scale-in protection, see Using instance scale-in protection in the Amazon EC2 Auto Scaling User Guide.
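The following is a minimal sketch of this configuration with the AWS SDK for Python (Boto3). The Auto Scaling group name, capacity provider name, and ARN are placeholders, and the targetCapacity shown is the default.

```python
import boto3

autoscaling = boto3.client("autoscaling")
ecs = boto3.client("ecs")

# Turn on instance scale-in protection for instances launched by the Auto Scaling group.
# "my-asg" is a placeholder name.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="my-asg",
    NewInstancesProtectedFromScaleIn=True,
)

# Create a capacity provider that uses managed scaling and managed termination protection.
# The Auto Scaling group ARN is a placeholder.
ecs.create_capacity_provider(
    name="my-capacity-provider",
    autoScalingGroupProvider={
        "autoScalingGroupArn": "arn:aws:autoscaling:us-east-1:111122223333:autoScalingGroup:uuid:autoScalingGroupName/my-asg",
        "managedScaling": {"status": "ENABLED", "targetCapacity": 100},
        "managedTerminationProtection": "ENABLED",
    },
)
```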
You can set the targetCapacity percentage so that you have spare capacity. This helps future tasks launch more quickly because the Auto Scaling group does not have to launch more instances. Amazon ECS uses the target capacity value to manage the CloudWatch metric that it creates for the capacity provider. When the metric is at the target capacity value, the Auto Scaling group is treated as being at a steady state, so no scaling action is required. The values can be from 0-100%. For example, to configure Amazon ECS to keep 10% free capacity on top of that used by Amazon ECS tasks, set the target capacity value to 90% (a configuration sketch follows the list below). Consider the following when setting the targetCapacity value on a capacity provider.
-
A targetCapacity value of less than 100% represents the amount of free capacity (Amazon EC2 instances) that needs to be present in the cluster. Free capacity means that there are no running tasks.
-
Placement constraints such as Availability Zones, without an additional binpack strategy, force Amazon ECS to eventually run one task for each instance, which might not be the desired behavior.
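As an example of the 90% setting described above, the following Boto3 sketch updates an existing capacity provider; the capacity provider name is a placeholder.

```python
import boto3

ecs = boto3.client("ecs")

# Keep roughly 10% spare capacity by targeting 90% utilization of registered capacity.
# "my-capacity-provider" is a placeholder name.
ecs.update_capacity_provider(
    name="my-capacity-provider",
    autoScalingGroupProvider={
        "managedScaling": {"status": "ENABLED", "targetCapacity": 90},
    },
)
```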
You must turn on Auto Scaling instance scale-in protection on the Auto Scaling group to use managed termination protection. If you don't turn on scale-in protection, then turning on managed termination protection can lead to undesirable behavior. For example, you may have instances stuck in draining state. For more information, see Using instance scale-in protection in the Amazon EC2 Auto Scaling User Guide.
When you use termination protection with a capacity provider, don't perform any manual actions, like detaching the instance, on the Auto Scaling group associated with the capacity provider. Manual actions can break the scale-in operation of the capacity provider. If you detach an instance from the Auto Scaling group, you need to also deregister the detached instance from the Amazon ECS cluster.
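If you do detach an instance anyway, a minimal Boto3 sketch of the matching Amazon ECS cleanup might look like the following; the cluster name and container instance ARN are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

# After detaching the EC2 instance from the Auto Scaling group, also deregister the
# corresponding container instance so the cluster stops tracking it.
# The cluster name and container instance ARN are placeholders.
ecs.deregister_container_instance(
    cluster="my-cluster",
    containerInstance="arn:aws:ecs:us-east-1:111122223333:container-instance/my-cluster/0123456789abcdef",
    force=True,  # deregister even if the instance still reports running tasks
)
```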
Managed scale-out behavior
When you have Auto Scaling group capacity providers that use managed scaling, Amazon ECS estimates the optimal number of instances to add to your cluster and uses the value to determine how many instances to request.
Amazon ECS selects a capacity provider for each task by following the capacity provider strategy from the service, standalone task, or the cluster default. Amazon ECS follows the rest of these steps for a single capacity provider.
Tasks without a capacity provider strategy are ignored by capacity providers. A pending task that doesn't have a capacity provider strategy won't cause any capacity provider to scale out. Tasks or services can't set a capacity provider strategy if that task or service sets a launch type.
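For example, a standalone task can name a capacity provider strategy instead of a launch type. A minimal Boto3 sketch, with placeholder cluster, task definition, and capacity provider names:

```python
import boto3

ecs = boto3.client("ecs")

# Run a standalone task against a capacity provider. Because capacityProviderStrategy
# is set, launchType must not be set on the same request.
ecs.run_task(
    cluster="my-cluster",
    taskDefinition="my-task-def:1",
    count=1,
    capacityProviderStrategy=[
        {"capacityProvider": "my-capacity-provider", "weight": 1, "base": 0},
    ],
)
```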
The following describes the scale-out behavior in more detail.
-
Group all of the provisioning tasks for this capacity provider so that each group has exactly the same resource requirements.
-
When you use multiple instance types in an Auto Scaling group, the instance types in the Auto Scaling group are sorted by their parameters. These parameters include vCPU, memory, elastic network interfaces (ENIs), ports, and GPUs. The smallest and the largest instance types for each parameter are selected. For more information about how to choose the instance type, see Amazon EC2 container instances for Amazon ECS.
Important
If a group of tasks has resource requirements that are greater than the smallest instance type in the Auto Scaling group, then that group of tasks can't run with this capacity provider. The capacity provider doesn't scale the Auto Scaling group. The tasks remain in the PROVISIONING state. To prevent tasks from staying in the PROVISIONING state, we recommend that you create separate Auto Scaling groups and capacity providers for different minimum resource requirements. When you run tasks or create services, only add capacity providers to the capacity provider strategy that can run the task on the smallest instance type in the Auto Scaling group. For other parameters, you can use placement constraints.
-
For each group of tasks, Amazon ECS calculates the number of instances that are required to run the unplaced tasks. This calculation uses a binpack strategy. This strategy accounts for the vCPU, memory, elastic network interfaces (ENI), ports, and GPU requirements of the tasks. It also accounts for the resource availability of the Amazon EC2 instances. The values for the largest instance types are treated as the maximum calculated instance count. The values for the smallest instance type are used as protection. If the smallest instance type can't run at least one instance of the task, the calculation considers the task as not compatible. As a result, the task is excluded from the scale-out calculation. When all the tasks aren't compatible with the smallest instance type, cluster auto scaling stops and the CapacityProviderReservation value remains at the targetCapacity value. A simplified sketch of this estimate appears after this list.
-
Amazon ECS publishes the CapacityProviderReservation metric to CloudWatch so that the Auto Scaling group scales out. The number of instances requested is one of the following:
-
The minimumScalingStepSize, if the maximum calculated instance count is less than the minimum scaling step size.
-
The lower value of either the maximumScalingStepSize or the maximum calculated instance count, otherwise.
-
CloudWatch alarms use the CapacityProviderReservation metric for capacity providers. When the CapacityProviderReservation metric is greater than the targetCapacity value, alarms also increase the DesiredCapacity of the Auto Scaling group. The targetCapacity value is a capacity provider setting that's sent to the CloudWatch alarm during the cluster auto scaling activation phase. The default targetCapacity is 100%.
-
The Auto Scaling group launches additional EC2 instances. To prevent over-provisioning, Auto Scaling makes sure that recently launched EC2 instance capacity is stabilized before it launches new instances. Auto Scaling checks if all existing instances have passed the instanceWarmupPeriod (now minus the instance launch time). The scale-out is blocked for instances that are within the instanceWarmupPeriod. The default number of seconds for a newly launched instance to warm up is 300.
For more information, see Deep dive on Amazon ECS cluster auto scaling.
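The following is a simplified, illustrative sketch of the kind of estimate described in the steps above, not the exact internal algorithm. It considers only CPU and memory, fits tasks onto the smallest instance type, and applies the minimum and maximum scaling step sizes.

```python
import math


def estimate_scale_out(
    task_count: int,
    task_cpu: int,
    task_memory: int,
    smallest_cpu: int,
    smallest_memory: int,
    minimum_scaling_step_size: int,
    maximum_scaling_step_size: int,
) -> int:
    """Illustrative estimate of how many instances a scale-out might request."""
    # Protection: if the smallest instance type can't run even one of these tasks,
    # the group of tasks is excluded from the scale-out calculation.
    if task_cpu > smallest_cpu or task_memory > smallest_memory:
        return 0

    # Binpack-style estimate: how many tasks fit on one instance of the smallest
    # type, and how many such instances the provisioning tasks would need.
    tasks_per_instance = min(smallest_cpu // task_cpu, smallest_memory // task_memory)
    calculated = math.ceil(task_count / tasks_per_instance)

    # Launch the minimumScalingStepSize if the calculated count is smaller;
    # otherwise launch the lower of maximumScalingStepSize and the calculated count.
    if calculated < minimum_scaling_step_size:
        return minimum_scaling_step_size
    return min(maximum_scaling_step_size, calculated)


# Example: 10 tasks that each need 512 CPU units and 1024 MiB, with a smallest
# instance type of 2048 CPU units and 4096 MiB, and step sizes of 1 and 10.
print(estimate_scale_out(10, 512, 1024, 2048, 4096, 1, 10))  # prints 3
```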
Scale-out considerations
Consider the following for the scale-out process:
-
Although there are multiple placement constraints, we recommend that you only use the distinctInstance task placement constraint. This prevents the scale-out process from stopping because you're using a placement constraint that's not compatible with the sampled instances. A sketch that uses this constraint appears after this list.
-
Managed scaling works best if your Auto Scaling group uses the same or similar instance types.
-
When a scale-out process is required and there are no currently running container instances, Amazon ECS always scales out to two instances initially, and then performs additional scale-out or scale-in processes. Any additional scale-out waits for the instance warmup period. For scale-in processes, Amazon ECS always waits 15 minutes after a scale-out process before starting a scale-in process.
-
The second scale-out step needs to wait until the instanceWarmupPeriod expires, which might affect the overall scale limit. If you need to reduce this time, make sure that instanceWarmupPeriod is large enough for the EC2 instance to launch and start the Amazon ECS agent (which prevents over-provisioning).
-
Cluster auto scaling supports Launch Configurations, Launch Templates, and multiple instance types in the capacity provider Auto Scaling group. You can also use attribute-based instance type selection without multiple instance types.
-
When using an Auto Scaling group with On-Demand Instances and multiple instance types or Spot Instances, place the larger instance types higher in the priority list and don't specify a weight. Specifying a weight isn't supported at this time. For more information, see Auto Scaling groups with multiple instance types in the Amazon EC2 Auto Scaling User Guide.
-
Amazon ECS then launches either the minimumScalingStepSize, if the maximum calculated instance count is less than the minimum scaling step size, or the lower of either the maximumScalingStepSize or the maximum calculated instance count value.
-
If an Amazon ECS service or run-task launches a task and the capacity provider container instances don't have enough resources to start the task, then the task enters the PROVISIONING state. Amazon ECS limits the number of tasks in this state for each cluster and prevents any tasks from exceeding this limit. For more information, see Amazon ECS service quotas.
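As an example of the distinctInstance recommendation above, the following Boto3 sketch runs a standalone task with that placement constraint; the cluster, task definition, and capacity provider names are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

# Place each task on a different container instance by using the distinctInstance
# placement constraint together with a capacity provider strategy.
ecs.run_task(
    cluster="my-cluster",
    taskDefinition="my-task-def:1",
    count=3,
    placementConstraints=[{"type": "distinctInstance"}],
    capacityProviderStrategy=[{"capacityProvider": "my-capacity-provider", "weight": 1}],
)
```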
Managed scale-in behavior
Amazon ECS monitors container instances for each capacity provider within a cluster. When a container instance isn't running any tasks, the container instance is considered empty and Amazon ECS starts the scale-in process.
CloudWatch scale-in alarms require 15 data points (15 minutes) before the scale-in process for the Auto Scaling group starts. After the scale-in process starts, and for as long as Amazon ECS needs to reduce the number of registered container instances, the Auto Scaling group lowers the DesiredCapacity value each minute by more than one instance, but by less than 50%.
When Amazon ECS requests a scale-out (when CapacityProviderReservation is greater than 100) while a scale-in process is in progress, the scale-in process is stopped and, if required, started again from the beginning.
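To observe these transitions, you can read the CapacityProviderReservation metric directly. A minimal Boto3 sketch, assuming the metric is published in the AWS/ECS/ManagedScaling namespace with ClusterName and CapacityProviderName dimensions (the cluster and capacity provider names are placeholders):

```python
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")

# Fetch the last hour of CapacityProviderReservation data points (1-minute average).
# The namespace, dimensions, and names below are assumptions/placeholders.
now = datetime.datetime.utcnow()
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/ECS/ManagedScaling",
    MetricName="CapacityProviderReservation",
    Dimensions=[
        {"Name": "ClusterName", "Value": "my-cluster"},
        {"Name": "CapacityProviderName", "Value": "my-capacity-provider"},
    ],
    StartTime=now - datetime.timedelta(hours=1),
    EndTime=now,
    Period=60,
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```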
The following describes the scale-in behavior in more detail:
-
Amazon ECS calculates the number of container instances that are empty. A container instance is considered empty even when daemon tasks are running.
-
Amazon ECS sets the CapacityProviderReservation value to a number between 0-100, calculated with the following formula, which represents the ratio of how big the Auto Scaling group needs to be relative to how big it actually is, expressed as a percentage. Then, Amazon ECS publishes the metric to CloudWatch. For more information about how the metric is calculated, see Deep Dive on Amazon ECS Cluster Auto Scaling. A simplified sketch follows this list.
CapacityProviderReservation = (number of instances needed) / (number of running instances) x 100
-
The CapacityProviderReservation metric generates a CloudWatch alarm. This alarm updates the DesiredCapacity value for the Auto Scaling group. Then, one of the following actions occurs:
-
If you don't use capacity provider managed termination, the Auto Scaling group selects EC2 instances using the Auto Scaling group termination policy and terminates the instances until the number of EC2 instances reaches the DesiredCapacity. The container instances are then deregistered from the cluster.
-
If all the container instances use managed termination protection, Amazon ECS removes the scale-in protection on the container instances that are empty. The Auto Scaling group will then be able to terminate the EC2 instances. The container instances are then deregistered from the cluster.
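The following is a simplified sketch of the scale-in bookkeeping described above: it computes the CapacityProviderReservation value from the formula in the second step and then, for managed termination protection, removes scale-in protection from instances that are considered empty. The Auto Scaling group name and instance IDs are placeholders, and the selection of empty instances is assumed to have been done elsewhere.

```python
import boto3

autoscaling = boto3.client("autoscaling")


def capacity_provider_reservation(instances_needed: int, instances_running: int) -> float:
    """CapacityProviderReservation = (instances needed) / (instances running) x 100."""
    return instances_needed / instances_running * 100


# Example: 8 running instances, but only 6 are needed -> reservation of 75.0,
# which is below a targetCapacity of 100 and therefore leads to scale-in.
print(capacity_provider_reservation(instances_needed=6, instances_running=8))  # 75.0

# With managed termination protection, scale-in protection is removed only from
# empty instances so that the Auto Scaling group can terminate them.
# "my-asg" and the instance IDs are placeholders.
empty_instance_ids = ["i-0123456789abcdef0", "i-0fedcba9876543210"]
autoscaling.set_instance_protection(
    AutoScalingGroupName="my-asg",
    InstanceIds=empty_instance_ids,
    ProtectedFromScaleIn=False,
)
```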