How EI Works

Amazon Elastic Inference (EI) accelerators are network-attached devices that work together with the SageMaker instances in your endpoint to accelerate your inference calls. Elastic Inference speeds up inference by letting you attach fractional GPU resources to any SageMaker instance type: you choose the client instance that runs your application and attach an Elastic Inference accelerator that provides the right amount of GPU acceleration for your inference needs. This helps lower your costs when a dedicated GPU instance would be underutilized for inference. We recommend trying Elastic Inference with your model on different CPU instance types and accelerator sizes to find the most cost-effective combination.
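To make this concrete, the following is a minimal sketch of attaching an accelerator when deploying a model with the SageMaker Python SDK. The image URI, model artifact path, and IAM role are placeholder assumptions, not values from this page:

import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()

# Placeholder values -- substitute your own image, artifact, and role.
model = Model(
    image_uri="<your-inference-image-uri>",
    model_data="s3://your-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/YourSageMakerRole",
    sagemaker_session=session,
)

# A CPU instance runs the application; the EI accelerator supplies
# fractional GPU capacity for inference.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.large",
    accelerator_type="ml.eia2.medium",
)

The accelerator_type parameter accepts any of the EI accelerator types listed in the following table.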

The following EI accelerator types are available. You can configure your endpoints or notebook instances with any EI accelerator type; a sketch showing how an accelerator type is specified for a notebook instance follows the table.

The table lists each accelerator type's throughput in teraflops (TFLOPS) for single-precision floating-point (F32) and half-precision floating-point (F16) operations, along with its memory in GB.

Accelerator Type    F32 Throughput (TFLOPS)    F16 Throughput (TFLOPS)    Memory (GB)
ml.eia2.medium      1                          8                          2
ml.eia2.large       2                          16                         4
ml.eia2.xlarge      4                          32                         8
ml.eia1.medium      1                          8                          1
ml.eia1.large       2                          16                         2
ml.eia1.xlarge      4                          32                         4
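
For notebook instances, the accelerator type is specified at creation time. Here is a minimal boto3 sketch; the notebook name, instance type, and IAM role are placeholder assumptions:

import boto3

sagemaker_client = boto3.client("sagemaker")

# Placeholder name and role -- substitute your own values.
sagemaker_client.create_notebook_instance(
    NotebookInstanceName="my-ei-notebook",
    InstanceType="ml.t3.medium",  # the CPU instance that hosts the notebook
    RoleArn="arn:aws:iam::123456789012:role/YourSageMakerRole",
    AcceleratorTypes=["ml.eia2.medium"],  # any type from the table above
)

The same choice applies to endpoints: when you create an endpoint configuration, you set AcceleratorType on the production variant, or pass accelerator_type to deploy as shown earlier.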