Scheduling your containers on Amazon ECS

Amazon Elastic Container Service (Amazon ECS) is a shared state, optimistic concurrency system that provides flexible scheduling capabilities for your containerized workloads. The Amazon ECS schedulers use the same cluster state information as the Amazon ECS API to make appropriate placement decisions.

Amazon ECS provides a service scheduler for long-running tasks and applications. It also provides the ability to run standalone tasks or scheduled tasks for batch jobs or single run tasks. You can specify the task placement strategies and constraints for running tasks that best meet your needs. For example, you can specify whether tasks run across multiple Availability Zones or within a single Availability Zone. And, optionally, you can integrate tasks with your own custom or third-party schedulers.

Service scheduler

The service scheduler is suitable for long running stateless services and applications. The service scheduler ensures that the scheduling strategy that you specify is followed and reschedules tasks when a task fails. For example, if the underlying infrastructure fails, the service scheduler can reschedule tasks.

The service scheduler also replaces tasks that are determined to be unhealthy after a container health check or a load balancer target group health check fails. This replacement depends on the maximumPercent and desiredCount service definition parameters. If a task is marked unhealthy, the service scheduler first starts a replacement task. If the replacement task has a health status of HEALTHY, the service scheduler stops the unhealthy task. If the replacement task has a health status of UNHEALTHY, the scheduler stops either the unhealthy replacement task or the existing unhealthy task to bring the total task count back to desiredCount. If the maximumPercent parameter prevents the scheduler from starting a replacement task first, the scheduler stops unhealthy tasks one at a time, at random, to free up capacity, and then starts a replacement task. This start and stop process continues until all unhealthy tasks are replaced with healthy tasks. After all unhealthy tasks have been replaced and only healthy tasks are running, if the total task count exceeds desiredCount, healthy tasks are stopped at random until the total task count equals desiredCount. For more information about maximumPercent and desiredCount, see Service definition parameters.

Note

This behavior does not apply to tasks run and maintained by services that use the rolling update deployment type. During a rolling update, the service scheduler first stops unhealthy tasks and then starts replacement tasks.
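
For illustration, the following is a minimal sketch using the Python SDK (Boto3); the cluster and service names are placeholders. It adjusts desiredCount and raises maximumPercent so that the scheduler has headroom to start replacement tasks before stopping the unhealthy tasks they replace.

import boto3

ecs = boto3.client("ecs")

# Placeholder cluster and service names; values are examples only.
ecs.update_service(
    cluster="my-cluster",
    service="my-service",
    desiredCount=4,
    deploymentConfiguration={
        # With headroom above 100%, the scheduler can start replacements
        # before it stops the unhealthy tasks.
        "maximumPercent": 200,
        "minimumHealthyPercent": 100,
    },
)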

There are two service scheduler strategies available:

  • REPLICA—The replica scheduling strategy places and maintains the desired number of tasks across your cluster. By default, the service scheduler spreads tasks across Availability Zones. You can use task placement strategies and constraints to customize task placement decisions. For more information, see Replica.

  • DAEMON—The daemon scheduling strategy deploys exactly one task on each active container instance that meets all of the task placement constraints that you specify in your cluster. When using this strategy, there is no need to specify a desired number of tasks, a task placement strategy, or use Service Auto Scaling policies. For more information, see Daemon.

    Note

    Fargate tasks do not support the DAEMON scheduling strategy.

The service scheduler optionally also makes sure that tasks are registered against an Elastic Load Balancing load balancer. You can update your services that are maintained by the service scheduler. This might include deploying a new task definition or changing the number of desired tasks that are running. By default, the service scheduler spreads tasks across multiple Availability Zones. However, you can use task placement strategies and constraints to customize task placement decisions. For more information, see Amazon ECS services.
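
As a sketch of how these options fit together, the following Boto3 call (all names and ARNs are placeholders) creates a service with the REPLICA strategy, spreads tasks across Availability Zones, and registers them with a load balancer target group.

import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="my-cluster",
    serviceName="web-service",
    taskDefinition="web-app:1",
    schedulingStrategy="REPLICA",  # use "DAEMON" to run one task per container instance
    desiredCount=3,                # not used with the DAEMON strategy
    placementStrategy=[
        # Spread tasks evenly across Availability Zones (the default behavior).
        {"type": "spread", "field": "attribute:ecs.availability-zone"},
    ],
    loadBalancers=[
        {
            "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/web/0123456789abcdef",
            "containerName": "web",
            "containerPort": 80,
        }
    ],
)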

Standalone tasks

The RunTask action is suitable for processes such as batch jobs that perform work and then stop. For example, you can have a process call RunTask when work comes into a queue. The task pulls work from the queue, performs the work, and then exits. Using RunTask, you can allow the default task placement strategy to distribute tasks randomly across your cluster. This minimizes the chances that a single instance gets a disproportionate number of tasks. Alternatively, you can use RunTask to customize how the scheduler places tasks using task placement strategies and constraints. For more information, see Creating a standalone task and RunTask in the Amazon Elastic Container Service API Reference.
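
As an illustrative sketch, the following Boto3 call runs a standalone task with a placement constraint and strategy; the cluster, task definition, and constraint expression are placeholders.

import boto3

ecs = boto3.client("ecs")

response = ecs.run_task(
    cluster="my-cluster",
    taskDefinition="batch-job:1",
    count=1,
    launchType="EC2",
    # Optional: constrain and influence placement instead of relying on the
    # default placement strategy.
    placementConstraints=[
        {"type": "memberOf", "expression": "attribute:ecs.instance-type =~ t3.*"},
    ],
    placementStrategy=[
        {"type": "binpack", "field": "memory"},
    ],
)
print(response["tasks"][0]["taskArn"])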

Scheduled tasks

Running tasks on a cron-like schedule

If you have tasks to run at set intervals in your cluster, you can use EventBridge Scheduler to create a schedule. You can run tasks for a backup operation or a log scan. The EventBridge Scheduler schedule that you create can run one or more tasks in your cluster at specified times. Your scheduled event can be set to a specific interval (run every N minutes, hours, or days). Otherwise, for more complicated scheduling, you can use a cron expression. For more information, see Amazon ECS scheduled tasks.
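
For example, the following Boto3 sketch creates an EventBridge Scheduler schedule that runs a task every night at 02:00 UTC. The cluster, task definition, subnet, and IAM role ARNs are placeholders, and the role is assumed to allow EventBridge Scheduler to call ecs:RunTask.

import boto3

scheduler = boto3.client("scheduler")

scheduler.create_schedule(
    Name="nightly-log-scan",
    ScheduleExpression="cron(0 2 * * ? *)",  # every day at 02:00 UTC
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        "Arn": "arn:aws:ecs:us-east-1:111122223333:cluster/my-cluster",
        "RoleArn": "arn:aws:iam::111122223333:role/scheduler-ecs-run-task",
        "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:us-east-1:111122223333:task-definition/log-scan:1",
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0123456789abcdef0"],
                    "AssignPublicIp": "DISABLED",
                }
            },
        },
    },
)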

Custom schedulers

With Amazon ECS, you can create your own schedulers or use third-party schedulers. For more information, see How to create a custom scheduler for Amazon ECS. Custom schedulers use the StartTask API operation to place tasks on specific container instances within your cluster.

Note

Custom schedulers are only compatible with tasks hosted on EC2 instances. If you use Amazon ECS on Fargate, the StartTask API doesn't work.
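
The following Boto3 sketch shows the shape of such a call; the container instance is whichever instance your custom scheduler has selected, and all identifiers here are placeholders.

import boto3

ecs = boto3.client("ecs")

# A custom scheduler picks the container instance itself, then places the
# task on it with StartTask.
ecs.start_task(
    cluster="my-cluster",
    taskDefinition="worker:3",
    containerInstances=[
        "arn:aws:ecs:us-east-1:111122223333:container-instance/my-cluster/0123456789abcdef",
    ],
)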

Task lifecycle

When a task is started, either manually or as part of a service, it can pass through several states before it finishes on its own or is stopped manually. Some tasks are meant to run as batch jobs that naturally progress from PENDING to RUNNING to STOPPED. Other tasks, which can be part of a service, are meant to continue running indefinitely, or to be scaled up and down as needed.

When task status changes are requested, such as stopping a task or updating the desired count of a service to scale it up or down, the Amazon ECS container agent tracks these changes as the last known status (lastStatus) of the task and the desired status (desiredStatus) of the task. Both the last known status and desired status of a task can be seen either in the console or by describing the task with the API or Amazon CLI.
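
For example, the following Boto3 sketch (the task ARN is a placeholder) prints both status fields for a task.

import boto3

ecs = boto3.client("ecs")

response = ecs.describe_tasks(
    cluster="my-cluster",
    tasks=["arn:aws:ecs:us-east-1:111122223333:task/my-cluster/0123456789abcdef"],
)
for task in response["tasks"]:
    print(task["lastStatus"], task["desiredStatus"])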

The following flow chart shows the task lifecycle.


Diagram of the task lifecycle states: PROVISIONING, PENDING, ACTIVATING, RUNNING, DEACTIVATING, and STOPPING.

Lifecycle states

The following are descriptions of each of the task lifecycle states.

PROVISIONING

Amazon ECS has to perform additional steps before the task is launched. For example, for tasks that use the awsvpc network mode, the elastic network interface needs to be provisioned.

PENDING

This is a transition state where Amazon ECS is waiting on the container agent to take further action. A task stays in the pending state until there are available resources for the task.

ACTIVATING

This is a transition state where Amazon ECS has to perform additional steps after the task is launched but before the task can transition to the RUNNING state. For example, for tasks that have service discovery configured, the service discovery resources must be created. For tasks that are part of a service that's configured to use multiple Elastic Load Balancing target groups, the target group registration occurs during this state.

RUNNING

The task is successfully running.

DEACTIVATING

This is a transition state where Amazon ECS has to perform additional steps before the task is stopped. For example, for tasks that are part of a service that's configured to use multiple Elastic Load Balancing target groups, the target group deregistration occurs during this state.

STOPPING

This is a transition state where Amazon ECS is waiting on the container agent to take further action.

For Linux containers, the container agent sends a SIGTERM signal to notify the application that it needs to finish and shut down, and then sends a SIGKILL signal after waiting for the StopTimeout duration set in the task definition.
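
As an illustration, the following Boto3 sketch registers a task definition whose container gives the application up to 120 seconds to shut down after SIGTERM; the family name and image URI are placeholders.

import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="graceful-worker",
    containerDefinitions=[
        {
            "name": "worker",
            "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/worker:latest",
            "memory": 512,
            # Seconds the agent waits after SIGTERM before sending SIGKILL.
            "stopTimeout": 120,
        }
    ],
)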

DEPROVISIONING

Amazon ECS has to perform additional steps after the task has stopped but before the task transitions to the STOPPED state. For example, for tasks that use the awsvpc network mode, the elastic network interface needs to be detached and deleted.

STOPPED

The task has been successfully stopped.

DELETED

This is a transition state when a task stops. This state is not displayed in the console, but is displayed in describe-tasks.