Supported algorithms, frameworks, and instances for multi-model endpoints - Amazon SageMaker AI

Supported algorithms, frameworks, and instances for multi-model endpoints

For information about the algorithms, frameworks, and instance types that you can use with multi-model endpoints, see the following sections.

Supported algorithms, frameworks, and instances for multi-model endpoints using CPU-backed instances

The inference containers for a number of SageMaker AI built-in algorithms and frameworks support multi-model endpoints.

To use any other framework or algorithm, use the SageMaker AI inference toolkit to build a container that supports multi-model endpoints. For information, see Build Your Own Container for SageMaker AI Multi-Model Endpoints.
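Whether you use a prebuilt inference container or build your own with the SageMaker AI inference toolkit, the deployment step can be sketched with the AWS SDK for Python (Boto3). A multi-model endpoint model is created like a single-model one, except the container's `Mode` is set to `MultiModel` and `ModelDataUrl` points to an S3 prefix that holds many model artifacts. The image URI, bucket, and role ARN below are placeholders, not real resources:

```python
# Placeholder values -- substitute your own container image, S3 prefix, and IAM role.
CONTAINER_IMAGE = "<account>.dkr.ecr.<region>.amazonaws.com/<your-mme-container>:latest"
MODEL_DATA_PREFIX = "s3://<your-bucket>/models/"  # prefix holding many model artifacts
ROLE_ARN = "arn:aws:iam::<account>:role/<your-sagemaker-role>"


def build_multi_model_request(model_name: str) -> dict:
    """Build a CreateModel request for a multi-model endpoint.

    The key difference from a single-model endpoint is Mode="MultiModel" and
    a ModelDataUrl that is an S3 *prefix* rather than a single model.tar.gz.
    """
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": CONTAINER_IMAGE,
            "ModelDataUrl": MODEL_DATA_PREFIX,
            "Mode": "MultiModel",
        },
        "ExecutionRoleArn": ROLE_ARN,
    }


# Usage (requires AWS credentials and real resource names):
#   import boto3
#   boto3.client("sagemaker").create_model(**build_multi_model_request("my-mme"))
```

From here, the model feeds into `create_endpoint_config` and `create_endpoint` exactly as for a single-model endpoint.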

Multi-model endpoints support all of the CPU instance types.

Supported algorithms, frameworks, and instances for multi-model endpoints using GPU-backed instances

Hosting multiple GPU-backed models on multi-model endpoints is supported through the SageMaker AI Triton Inference Server, which supports all major inference frameworks, such as NVIDIA® TensorRT™, PyTorch, MXNet, Python, ONNX, XGBoost, scikit-learn, RandomForest, OpenVINO, custom C++, and more.

To use any other framework or algorithm, you can use the Triton backend for Python or C++ to write your model logic and serve any custom model. After the server is ready, you can deploy hundreds of deep learning models behind one endpoint.
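Once the endpoint is running, each invocation names the artifact to run via the `TargetModel` parameter of `InvokeEndpoint`; SageMaker AI loads and caches models on demand, which is how many models share one endpoint. The endpoint name and artifact name below are hypothetical:

```python
import json


def build_invocation_request(endpoint_name: str, target_model: str, payload: dict) -> dict:
    """Build an InvokeEndpoint request for a multi-model endpoint.

    TargetModel selects which artifact (relative to the endpoint's S3 prefix)
    to load and run for this request.
    """
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,  # e.g. "model-0042.tar.gz"
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


# Usage (requires AWS credentials and a deployed multi-model endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(
#       **build_invocation_request("my-mme-endpoint", "model-0042.tar.gz", {"inputs": [1, 2, 3]}))
#   print(response["Body"].read())
```

Any of the artifacts under the endpoint's S3 prefix can be targeted per request, so adding a model is just an S3 upload, with no endpoint update.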

Multi-model endpoints support the following GPU instance types:

| Instance family | Instance type | vCPUs | GiB of memory per vCPU | GPUs | GPU memory (GiB) |
|-----------------|------------------|-------|------------------------|------|------------------|
| p2              | ml.p2.xlarge     | 4     | 15.25                  | 1    | 12               |
| p3              | ml.p3.2xlarge    | 8     | 7.62                   | 1    | 16               |
| g5              | ml.g5.xlarge     | 4     | 4                      | 1    | 24               |
| g5              | ml.g5.2xlarge    | 8     | 4                      | 1    | 24               |
| g5              | ml.g5.4xlarge    | 16    | 4                      | 1    | 24               |
| g5              | ml.g5.8xlarge    | 32    | 4                      | 1    | 24               |
| g5              | ml.g5.16xlarge   | 64    | 4                      | 1    | 24               |
| g4dn            | ml.g4dn.xlarge   | 4     | 4                      | 1    | 16               |
| g4dn            | ml.g4dn.2xlarge  | 8     | 4                      | 1    | 16               |
| g4dn            | ml.g4dn.4xlarge  | 16    | 4                      | 1    | 16               |
| g4dn            | ml.g4dn.8xlarge  | 32    | 4                      | 1    | 16               |
| g4dn            | ml.g4dn.16xlarge | 64    | 4                      | 1    | 16               |
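The "GiB of memory per vCPU" column is simply the instance's total memory divided by its vCPU count. A quick sanity check, assuming the published total-memory figures for these EC2 sizes (the totals below are an assumption, not taken from the table itself):

```python
# Sanity-check the "GiB of memory per vCPU" column: total instance memory
# divided by vCPU count. Total-memory figures are the published EC2 sizes
# (an assumption -- the table above lists only the per-vCPU result).
TOTALS = {
    "ml.p2.xlarge": (61, 4),      # (total GiB, vCPUs) -> 15.25 GiB per vCPU
    "ml.p3.2xlarge": (61, 8),     # -> 7.62 GiB per vCPU
    "ml.g5.2xlarge": (32, 8),     # -> 4 GiB per vCPU
    "ml.g4dn.4xlarge": (64, 16),  # -> 4 GiB per vCPU
}

for instance, (total_gib, vcpus) in TOTALS.items():
    print(f"{instance}: {total_gib / vcpus:.2f} GiB per vCPU")
```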