Advanced endpoint options for inference with Amazon SageMaker AI

With real-time inference, you can further optimize for performance and cost with the following advanced inference options:

Multi-model endpoints – Use this option if you have multiple models that use the same framework and can share a container. This option helps you optimize costs by improving endpoint utilization and reducing deployment overhead.
Multi-container endpoints – Use this option if you have multiple models that use different frameworks and require their own containers. You get many of the benefits of Multi-Model Endpoints and can deploy a variety of frameworks and models.
Serial Inference Pipelines – Use this option if you want to host models with pre-processing and post-processing logic behind an endpoint. Inference pipelines are fully managed by SageMaker AI and provide lower latency because all of the containers are hosted on the same Amazon EC2 instances.

Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Inference options

Next steps