AsyncInferenceClientConfig
Configures the behavior of the client used by SageMaker to interact with the model
container during asynchronous inference.
Contents
MaxConcurrentInvocationsPerInstance
The maximum number of concurrent requests sent by the SageMaker client to the model
container. If no value is provided, SageMaker chooses an optimal value.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 1000.
Required: No
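As an illustration of the constraints above, the following is a minimal sketch of building this structure as a plain Python dictionary before passing it to an SDK call. The helper name `build_client_config` and the two constants are hypothetical and not part of any Amazon SDK. The only details taken from this page are the field name, the integer type, the valid range of 1 to 1000, and the fact that the field is optional.

```python
from typing import Optional

# Valid range as documented for MaxConcurrentInvocationsPerInstance.
MIN_CONCURRENT = 1
MAX_CONCURRENT = 1000


def build_client_config(max_concurrent: Optional[int] = None) -> dict:
    """Build an AsyncInferenceClientConfig-shaped dict (hypothetical helper).

    When max_concurrent is None the field is omitted, which leaves the
    choice of value to SageMaker.
    """
    if max_concurrent is None:
        return {}
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_concurrent, bool) or not isinstance(max_concurrent, int):
        raise TypeError("MaxConcurrentInvocationsPerInstance must be an integer")
    if not MIN_CONCURRENT <= max_concurrent <= MAX_CONCURRENT:
        raise ValueError(
            f"MaxConcurrentInvocationsPerInstance must be between "
            f"{MIN_CONCURRENT} and {MAX_CONCURRENT}, got {max_concurrent}"
        )
    return {"MaxConcurrentInvocationsPerInstance": max_concurrent}
```

Validating locally like this only surfaces range errors before a request is sent. The service still performs its own validation.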
See Also
For more information about using this API in one of the language-specific Amazon SDKs, see the following: