Supported algorithms, frameworks, and instances for multi-model endpoints - Amazon SageMaker AI

Supported algorithms, frameworks, and instances for multi-model endpoints

For information about the algorithms, frameworks, and instance types that you can use with multi-model endpoints, see the following sections.

Supported algorithms, frameworks, and instances for multi-model endpoints using CPU-backed instances

The inference containers for a number of SageMaker AI built-in algorithms and frameworks support multi-model endpoints.

To use any other framework or algorithm, use the SageMaker AI inference toolkit to build a container that supports multi-model endpoints. For information, see Build Your Own Container for SageMaker AI Multi-Model Endpoints.
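Whether you use a prebuilt inference container or build your own with the SageMaker AI inference toolkit, the deployment step can be sketched with the AWS SDK for Python (Boto3). A multi-model endpoint model is created like a single-model one, except the container's `Mode` is set to `MultiModel` and `ModelDataUrl` points to an S3 prefix that holds many model artifacts. The image URI, bucket, and role ARN below are placeholders, not real resources:

```python
# Placeholder values -- substitute your own container image, S3 prefix, and IAM role.
CONTAINER_IMAGE = "<account>.dkr.ecr.<region>.amazonaws.com/<your-mme-container>:latest"
MODEL_DATA_PREFIX = "s3://<your-bucket>/models/"  # prefix holding many model artifacts
ROLE_ARN = "arn:aws:iam::<account>:role/<your-sagemaker-role>"


def build_multi_model_request(model_name: str) -> dict:
    """Build a CreateModel request for a multi-model endpoint.

    The key difference from a single-model endpoint is Mode="MultiModel" and
    a ModelDataUrl that is an S3 *prefix* rather than a single model.tar.gz.
    """
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": CONTAINER_IMAGE,
            "ModelDataUrl": MODEL_DATA_PREFIX,
            "Mode": "MultiModel",
        },
        "ExecutionRoleArn": ROLE_ARN,
    }


# Usage (requires AWS credentials and real resource names):
#   import boto3
#   boto3.client("sagemaker").create_model(**build_multi_model_request("my-mme"))
```

From here, the model feeds into `create_endpoint_config` and `create_endpoint` exactly as for a single-model endpoint.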

Multi-model endpoints support all of the CPU instance types.

Supported algorithms, frameworks, and instances for multi-model endpoints using GPU-backed instances

Hosting multiple GPU-backed models on multi-model endpoints is supported through the SageMaker AI Triton Inference Server, which supports all major inference frameworks, such as NVIDIA® TensorRT™, PyTorch, MXNet, Python, ONNX, XGBoost, scikit-learn, RandomForest, OpenVINO, custom C++, and more.

To use any other framework or algorithm, you can use the Triton backend for Python or C++ to write your model logic and serve any custom model. After the server is ready, you can deploy hundreds of deep learning models behind one endpoint.
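Once the endpoint is running, each invocation names the artifact to run via the `TargetModel` parameter of `InvokeEndpoint`; SageMaker AI loads and caches models on demand, which is how many models share one endpoint. The endpoint name and artifact name below are hypothetical:

```python
import json


def build_invocation_request(endpoint_name: str, target_model: str, payload: dict) -> dict:
    """Build an InvokeEndpoint request for a multi-model endpoint.

    TargetModel selects which artifact (relative to the endpoint's S3 prefix)
    to load and run for this request.
    """
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,  # e.g. "model-0042.tar.gz"
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


# Usage (requires AWS credentials and a deployed multi-model endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(
#       **build_invocation_request("my-mme-endpoint", "model-0042.tar.gz", {"inputs": [1, 2, 3]}))
#   print(response["Body"].read())
```

Any of the artifacts under the endpoint's S3 prefix can be targeted per request, so adding a model is just an S3 upload, with no endpoint update.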

Multi-model endpoints support the following GPU instance types:

| Instance family | Instance type | vCPUs | GiB of memory per vCPU | GPUs | GPU memory (GiB) |
|-----------------|------------------|-------|------------------------|------|------------------|
| p2              | ml.p2.xlarge     | 4     | 15.25                  | 1    | 12               |
| p3              | ml.p3.2xlarge    | 8     | 7.62                   | 1    | 16               |
| g5              | ml.g5.xlarge     | 4     | 4                      | 1    | 24               |
| g5              | ml.g5.2xlarge    | 8     | 4                      | 1    | 24               |
| g5              | ml.g5.4xlarge    | 16    | 4                      | 1    | 24               |
| g5              | ml.g5.8xlarge    | 32    | 4                      | 1    | 24               |
| g5              | ml.g5.16xlarge   | 64    | 4                      | 1    | 24               |
| g4dn            | ml.g4dn.xlarge   | 4     | 4                      | 1    | 16               |
| g4dn            | ml.g4dn.2xlarge  | 8     | 4                      | 1    | 16               |
| g4dn            | ml.g4dn.4xlarge  | 16    | 4                      | 1    | 16               |
| g4dn            | ml.g4dn.8xlarge  | 32    | 4                      | 1    | 16               |
| g4dn            | ml.g4dn.16xlarge | 64    | 4                      | 1    | 16               |
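The "GiB of memory per vCPU" column is simply the instance's total memory divided by its vCPU count. A quick sanity check, assuming the published total-memory figures for these EC2 sizes (the totals below are an assumption, not taken from the table itself):

```python
# Sanity-check the "GiB of memory per vCPU" column: total instance memory
# divided by vCPU count. Total-memory figures are the published EC2 sizes
# (an assumption -- the table above lists only the per-vCPU result).
TOTALS = {
    "ml.p2.xlarge": (61, 4),      # (total GiB, vCPUs) -> 15.25 GiB per vCPU
    "ml.p3.2xlarge": (61, 8),     # -> 7.62 GiB per vCPU
    "ml.g5.2xlarge": (32, 8),     # -> 4 GiB per vCPU
    "ml.g4dn.4xlarge": (64, 16),  # -> 4 GiB per vCPU
}

for instance, (total_gib, vcpus) in TOTALS.items():
    print(f"{instance}: {total_gib / vcpus:.2f} GiB per vCPU")
```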