ScalingPolicyMetric
The metric for a scaling policy.
Contents
- InvocationsPerInstance
The number of invocations sent to a model, normalized by InstanceCount in each ProductionVariant. 1/numberOfInstances is sent as the value on each request, where numberOfInstances is the number of active instances for the ProductionVariant behind the endpoint at the time of the request.
Type: Integer
Required: No
- ModelLatency
The interval of time taken by a model to respond, as viewed from SageMaker. This interval includes the local communication time taken to send the request to, and fetch the response from, the model's container, plus the time taken to complete the inference in the container.
Type: Integer
Required: No
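The normalization described for InvocationsPerInstance can be illustrated with a short sketch. It assumes the per-request values of 1/numberOfInstances are summed over an observation period. The function name and the traffic pattern are hypothetical and not part of the API.

```python
from fractions import Fraction


def invocations_per_instance(active_instances_per_request):
    """Sum the 1/numberOfInstances value recorded for each request.

    Each element is the number of active instances behind the
    ProductionVariant at the time that request arrived (hypothetical input).
    Summing over a period is an assumption made for this illustration.
    """
    return sum(Fraction(1, n) for n in active_instances_per_request)


# Hypothetical traffic: 6 requests arrive while 2 instances are active,
# then 6 more arrive after the variant has scaled out to 3 instances.
requests = [2] * 6 + [3] * 6

total = invocations_per_instance(requests)
print(total)  # 6 * 1/2 + 6 * 1/3 = 5
```

Exact fractions are used here only to keep the arithmetic free of floating-point rounding. The point of the sketch is that each request contributes less to the metric as more instances share the load.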
See Also
For more information about using this API in one of the language-specific Amazon SDKs, see the following: