Amazon ECS task definitions for Amazon Neuron machine learning workloads
You can register Amazon EC2 Trn1, Inf1, and Inf2 instances to your clusters for machine learning workloads.
Amazon EC2 Trn1 instances are powered by Amazon Trainium chips.
Amazon EC2 Inf1 and Inf2 instances are powered by Amazon Inferentia chips.
Machine learning models are deployed to containers using Amazon Neuron.
Considerations
Before you begin deploying Neuron on Amazon ECS, consider the following:
- Your clusters can contain a mix of Trn1, Inf1, Inf2, and other instances.
- You need a Linux application in a container that uses a machine learning framework that supports Amazon Neuron.

  Important
  Applications that use other frameworks might not have improved performance on Trn1, Inf1, and Inf2 instances.
- Only one inference or inference-training task can run on each Amazon Trainium or Amazon Inferentia chip. For Inf1, each chip has 4 NeuronCores. For Trn1 and Inf2, each chip has 2 NeuronCores. You can run as many tasks as there are chips for each of your Trn1, Inf1, and Inf2 instances.
- When creating a service or running a standalone task, you can use instance type attributes when you configure task placement constraints. This ensures that the task is launched on a container instance of the type that you specify. Doing so can help you optimize overall resource utilization and ensure that tasks for inference workloads are on your Trn1, Inf1, and Inf2 instances. For more information, see How Amazon ECS places tasks on container instances.

  In the following example, a task is run on an inf1.xlarge instance in your default cluster.

  aws ecs run-task \
      --cluster default \
      --task-definition ecs-inference-task-def \
      --placement-constraints type=memberOf,expression="attribute:ecs.instance-type == inf1.xlarge"
- Neuron resource requirements can't be defined in a task definition. Instead, you configure a container to use specific Amazon Trainium or Amazon Inferentia chips available on the host container instance. Do this by using the linuxParameters parameter and specifying the device details. For more information, see Task definition requirements.
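As a rough sketch of how the placement constraint from the run-task example could be reused for other instance types, the following shell helpers build the --placement-constraints argument from an instance type. The function names placement_constraint and run_neuron_task are hypothetical and are not part of the Amazon CLI. The second function is only defined here, not called, because it needs credentials and an existing cluster.

```shell
# Build a memberOf placement constraint for one instance type.
# (Hypothetical helper name; the expression syntax is the one used in the run-task example.)
placement_constraint() {
  printf 'type=memberOf,expression=attribute:ecs.instance-type == %s' "$1"
}

# Run a standalone task pinned to an instance type.
# Arguments: cluster name, task definition, instance type.
# Requires configured credentials; defined here but not invoked.
run_neuron_task() {
  aws ecs run-task \
    --cluster "$1" \
    --task-definition "$2" \
    --placement-constraints "$(placement_constraint "$3")"
}
```

For instance, calling run_neuron_task default ecs-inference-task-def inf1.xlarge would issue the same request as the example command.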
Use the Amazon ECS-optimized Amazon Linux 2023 (Neuron) AMI
Amazon ECS provides an Amazon ECS optimized AMI that's based on Amazon Linux 2023 for Amazon Trainium and Amazon Inferentia workloads. It comes with the Amazon Neuron drivers and runtime for Docker. This AMI makes running machine learning inference workloads easier on Amazon ECS.
We recommend using the Amazon ECS-optimized Amazon Linux 2023 (Neuron) AMI when launching your Amazon EC2 Trn1, Inf1, and Inf2 instances.
You can retrieve the current Amazon ECS-optimized Amazon Linux 2023 (Neuron) AMI using the Amazon CLI with the following command.
aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2023/neuron/recommended
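The response to this command is a Parameters list, and the AMI metadata sits in the Value field of its first entry. The sketch below, which assumes configured credentials and a Region where the AMI is supported, wraps the same command so that only that value is printed. The variable NEURON_AMI_PARAM and the function neuron_ami_metadata are illustrative names, not part of the Amazon CLI.

```shell
# Parameter name taken from the command above.
NEURON_AMI_PARAM=/aws/service/ecs/optimized-ami/amazon-linux-2023/neuron/recommended

# Print only the parameter value for the given Region.
# Argument: Region code, for example us-east-1.
# Requires configured credentials; defined here but not invoked.
neuron_ami_metadata() {
  aws ssm get-parameters \
    --names "$NEURON_AMI_PARAM" \
    --region "$1" \
    --query 'Parameters[0].Value' \
    --output text
}
```

Calling neuron_ami_metadata us-east-1 would then print the metadata for US East (N. Virginia).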
The Amazon ECS-optimized Amazon Linux 2023 (Neuron) AMI is supported in the following Regions:
- US East (N. Virginia)
- US East (Ohio)
- US West (N. California)
- US West (Oregon)
- Asia Pacific (Mumbai)
- Asia Pacific (Osaka)
- Asia Pacific (Seoul)
- Asia Pacific (Tokyo)
- Asia Pacific (Singapore)
- Asia Pacific (Sydney)
- Canada (Central)
- Europe (Frankfurt)
- Europe (Ireland)
- Europe (London)
- Europe (Paris)
- Europe (Stockholm)
- South America (São Paulo)
Use the Amazon ECS optimized Amazon Linux 2 (Neuron) AMI
Amazon ECS provides an Amazon ECS optimized AMI that's based on Amazon Linux 2 for Amazon Trainium and Amazon Inferentia workloads. It comes with the Amazon Neuron drivers and runtime for Docker. This AMI makes running machine learning inference workloads easier on Amazon ECS.
We recommend using the Amazon ECS optimized Amazon Linux 2 (Neuron) AMI when launching your Amazon EC2 Trn1, Inf1, and Inf2 instances.
You can retrieve the current Amazon ECS optimized Amazon Linux 2 (Neuron) AMI using the Amazon CLI with the following command.
aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2/inf/recommended
The Amazon ECS optimized Amazon Linux 2 (Neuron) AMI is supported in the following Regions:
- US East (N. Virginia)
- US East (Ohio)
- US West (N. California)
- US West (Oregon)
- Asia Pacific (Mumbai)
- Asia Pacific (Osaka)
- Asia Pacific (Seoul)
- Asia Pacific (Tokyo)
- Asia Pacific (Singapore)
- Asia Pacific (Sydney)
- Canada (Central)
- Europe (Frankfurt)
- Europe (Ireland)
- Europe (London)
- Europe (Paris)
- Europe (Stockholm)
- South America (São Paulo)
Task definition requirements
To deploy Neuron on Amazon ECS, your task definition must contain the container definition for a pre-built container serving the inference model for TensorFlow. It's provided by Amazon Deep Learning Containers. This container contains the Amazon Neuron runtime and the TensorFlow Serving application. At startup, this container fetches your model from Amazon S3, launches Neuron TensorFlow Serving with the saved model, and waits for prediction requests. In the following example, the container image has TensorFlow 1.15 and Ubuntu 18.04. A complete list of pre-built Deep Learning Containers optimized for Neuron is maintained on GitHub. For more information, see Using Amazon Neuron TensorFlow Serving.
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-neuron:1.15.4-neuron-py37-ubuntu18.04
Alternatively, you can build your own Neuron sidecar container image. For more information, see Tutorial: Neuron TensorFlow Serving.
The task definition must be specific to a single instance type. You must configure a container to use specific Amazon Trainium or Amazon Inferentia devices that are available on the host container instance. You can do so using the linuxParameters parameter. The following table details the chips that are specific to each instance type.
Instance Type | vCPUs | RAM (GiB) | Amazon ML accelerator chips | Device Paths |
---|---|---|---|---|
trn1.2xlarge | 8 | 32 | 1 | /dev/neuron0 |
trn1.32xlarge | 128 | 512 | 16 | /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3, /dev/neuron4, /dev/neuron5, /dev/neuron6, /dev/neuron7, /dev/neuron8, /dev/neuron9, /dev/neuron10, /dev/neuron11, /dev/neuron12, /dev/neuron13, /dev/neuron14, /dev/neuron15 |
inf1.xlarge | 4 | 8 | 1 | /dev/neuron0 |
inf1.2xlarge | 8 | 16 | 1 | /dev/neuron0 |
inf1.6xlarge | 24 | 48 | 4 | /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3 |
inf1.24xlarge | 96 | 192 | 16 | /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3, /dev/neuron4, /dev/neuron5, /dev/neuron6, /dev/neuron7, /dev/neuron8, /dev/neuron9, /dev/neuron10, /dev/neuron11, /dev/neuron12, /dev/neuron13, /dev/neuron14, /dev/neuron15 |
inf2.xlarge | 8 | 16 | 1 | /dev/neuron0 |
inf2.8xlarge | 32 | 64 | 1 | /dev/neuron0 |
inf2.24xlarge | 96 | 384 | 6 | /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3, /dev/neuron4, /dev/neuron5 |
inf2.48xlarge | 192 | 768 | 12 | /dev/neuron0, /dev/neuron1, /dev/neuron2, /dev/neuron3, /dev/neuron4, /dev/neuron5, /dev/neuron6, /dev/neuron7, /dev/neuron8, /dev/neuron9, /dev/neuron10, /dev/neuron11 |
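Because the device paths in the table follow a regular /dev/neuronN pattern, the devices list for the linuxParameters parameter can be generated from the chip count instead of being typed by hand. The following is a minimal sketch under two assumptions: the helper name neuron_devices_json is hypothetical, and the read and write permissions are a choice made for illustration that you might need to adjust for your container.

```shell
# Print a linuxParameters-style JSON object that maps the first N Neuron
# devices from the host into the container at the same path.
# Argument: number of chips for the instance type (see the table above).
# The "read"/"write" permissions are an assumption for this sketch.
neuron_devices_json() {
  count=$1
  i=0
  printf '{"devices": ['
  while [ "$i" -lt "$count" ]; do
    # Separate entries with a comma, but not before the first one.
    if [ "$i" -gt 0 ]; then
      printf ', '
    fi
    printf '{"containerPath": "/dev/neuron%d", "hostPath": "/dev/neuron%d", "permissions": ["read", "write"]}' "$i" "$i"
    i=$((i + 1))
  done
  printf ']}\n'
}
```

For example, neuron_devices_json 4 prints entries for /dev/neuron0 through /dev/neuron3, which matches the inf1.6xlarge row. The output could then be pasted into the linuxParameters section of a container definition.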