ScalingPolicyMetric - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

ScalingPolicyMetric

The metric for a scaling policy.

Contents

InvocationsPerInstance

The number of invocations sent to a model, normalized by InstanceCount in each ProductionVariant. 1/numberOfInstances is sent as the value on each request, where numberOfInstances is the number of active instances for the ProductionVariant behind the endpoint at the time of the request.

Type: Integer

Required: No

ModelLatency

The interval of time taken by a model to respond as viewed from SageMaker. This interval includes the local communication times taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container.

Type: Integer

Required: No

See Also

For more information about using this API in one of the language-specific Amazon SDKs, see the following: