GetScalingConfigurationRecommendation - Amazon SageMaker

GetScalingConfigurationRecommendation

Starts an Amazon SageMaker Inference Recommender autoscaling recommendation job. Returns recommendations for autoscaling policies that you can apply to your SageMaker endpoint.

Request Syntax

{
   "EndpointName": "string",
   "InferenceRecommendationsJobName": "string",
   "RecommendationId": "string",
   "ScalingPolicyObjective": {
      "MaxInvocationsPerMinute": number,
      "MinInvocationsPerMinute": number
   },
   "TargetCpuUtilizationPerCore": number
}
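As a minimal sketch, the request body can be built as a Python dictionary that mirrors the syntax above and then serialized to JSON. The helper `build_request` and the placeholder names are hypothetical. Because only `InferenceRecommendationsJobName` is required, the sketch leaves out any optional field passed as `None` rather than sending it as `null`.

```python
import json
from typing import Any, Dict, Optional


def build_request(
    job_name: str,
    endpoint_name: Optional[str] = None,
    recommendation_id: Optional[str] = None,
    scaling_policy_objective: Optional[Dict[str, int]] = None,
    target_cpu_utilization_per_core: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a GetScalingConfigurationRecommendation request body."""
    optional_fields = {
        "EndpointName": endpoint_name,
        "RecommendationId": recommendation_id,
        "ScalingPolicyObjective": scaling_policy_objective,
        "TargetCpuUtilizationPerCore": target_cpu_utilization_per_core,
    }
    # The only required field is always present.
    body: Dict[str, Any] = {"InferenceRecommendationsJobName": job_name}
    # Copy over only the optional fields that were actually supplied.
    body.update({key: value for key, value in optional_fields.items() if value is not None})
    return body


# Placeholder names below are hypothetical.
request = build_request(
    "my-recommender-job",
    endpoint_name="my-endpoint",
    scaling_policy_objective={"MinInvocationsPerMinute": 100, "MaxInvocationsPerMinute": 1000},
)
print(json.dumps(request, indent=2))
```

With the AWS SDK for Python, the same dictionary could be passed as keyword arguments to the SageMaker client's `get_scaling_configuration_recommendation` method. This assumes boto3's usual snake_case naming for API operations.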

Request Parameters

For information about the parameters that are common to all actions, see Common Parameters.

The request accepts the following data in JSON format.

EndpointName

The name of an endpoint benchmarked during a previously completed inference recommendation job. This name should come from one of the recommendations returned by the job specified in the InferenceRecommendationsJobName field.

Specify either this field or the RecommendationId field.

Type: String

Length Constraints: Maximum length of 63.

Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}

Required: No

InferenceRecommendationsJobName

The name of a previously completed Inference Recommender job.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 64.

Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,63}

Required: Yes
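You can check the length constraints and patterns for `EndpointName` and `InferenceRecommendationsJobName` on the client before sending a request. This sketch uses Python's `re.fullmatch`, on the assumption that each documented pattern must match the whole name. The helper functions are hypothetical.

```python
import re

# Patterns copied from the EndpointName and InferenceRecommendationsJobName descriptions.
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,63}")


def is_valid_name(name: str, pattern: re.Pattern, min_len: int, max_len: int) -> bool:
    """Return True if name satisfies both the length constraints and the pattern."""
    # The pattern alone allows runs of hyphens, so the length limit is checked separately.
    if not (min_len <= len(name) <= max_len):
        return False
    # fullmatch anchors the pattern at both ends (assumption: the whole name must match).
    return pattern.fullmatch(name) is not None


def is_valid_endpoint_name(name: str) -> bool:
    # Only a maximum length of 63 is documented; the pattern needs at least 1 character.
    return is_valid_name(name, ENDPOINT_NAME_PATTERN, 1, 63)


def is_valid_job_name(name: str) -> bool:
    # Documented length: minimum 1, maximum 64.
    return is_valid_name(name, JOB_NAME_PATTERN, 1, 64)
```

Under these patterns a name must start and end with a letter or digit. Hyphens may appear only between alphanumeric characters, so names such as `-bad`, `bad-`, and `a_b` are rejected.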

RecommendationId

The recommendation ID of a previously completed inference recommendation. This ID should come from one of the recommendations returned by the job specified in the InferenceRecommendationsJobName field.

Specify either this field or the EndpointName field.

Type: String

Required: No
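Both the `EndpointName` and `RecommendationId` descriptions tell you to specify one field or the other. The sketch below interprets that as "exactly one of the two" and raises `ValueError` when both or neither are supplied. That interpretation is an assumption, since this page does not spell out what happens when both are sent. `check_target` is a hypothetical helper.

```python
from typing import Optional


def check_target(endpoint_name: Optional[str], recommendation_id: Optional[str]) -> str:
    """Return the name of the identifying field that is set.

    Assumption: supplying both fields, or neither, is treated as an error.
    """
    has_endpoint = bool(endpoint_name)
    has_recommendation = bool(recommendation_id)
    if has_endpoint and has_recommendation:
        raise ValueError("Specify EndpointName or RecommendationId, not both.")
    if not has_endpoint and not has_recommendation:
        raise ValueError("Specify either EndpointName or RecommendationId.")
    return "EndpointName" if has_endpoint else "RecommendationId"
```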

ScalingPolicyObjective

An object where you specify the anticipated traffic pattern for an endpoint.

Type: ScalingPolicyObjective object

Required: No
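The sketch below builds a `ScalingPolicyObjective` object that describes anticipated traffic in invocations per minute. This page does not document constraints on its two fields. The checks here (integer values, non-negative, minimum not above maximum) are illustrative assumptions, not service rules. `make_objective` is hypothetical.

```python
from typing import Dict, Optional


def make_objective(min_per_minute: Optional[int] = None,
                   max_per_minute: Optional[int] = None) -> Dict[str, int]:
    """Build a ScalingPolicyObjective dict, omitting fields that are None.

    The validation below is an illustrative assumption, not documented behavior.
    """
    objective: Dict[str, int] = {}
    for key, value in (("MinInvocationsPerMinute", min_per_minute),
                       ("MaxInvocationsPerMinute", max_per_minute)):
        if value is None:
            continue
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{key} must be a non-negative integer, got {value!r}")
        objective[key] = value
    if (min_per_minute is not None and max_per_minute is not None
            and min_per_minute > max_per_minute):
        raise ValueError("MinInvocationsPerMinute should not exceed MaxInvocationsPerMinute")
    return objective
```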

TargetCpuUtilizationPerCore

The target CPU utilization per core, as a percentage, that you want an instance to reach before autoscaling. The default value is 50%.

Type: Integer

Valid Range: Minimum value of 1. Maximum value of 100.

Required: No
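This sketch resolves the effective `TargetCpuUtilizationPerCore` on the client. It applies the documented default of 50 when the field is omitted and enforces the documented range of 1 to 100 inclusive. The helper name is hypothetical. In a real request you can simply omit the field to get the default.

```python
from typing import Optional

DEFAULT_TARGET_CPU_UTILIZATION = 50  # documented default, in percent


def resolve_target_cpu(value: Optional[int] = None) -> int:
    """Return the TargetCpuUtilizationPerCore value a request would use."""
    if value is None:
        # Omitted field: the documented default applies.
        return DEFAULT_TARGET_CPU_UTILIZATION
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("TargetCpuUtilizationPerCore must be an integer")
    if not 1 <= value <= 100:
        raise ValueError(f"TargetCpuUtilizationPerCore must be between 1 and 100, got {value}")
    return value
```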

Response Syntax

{
   "DynamicScalingConfiguration": {
      "MaxCapacity": number,
      "MinCapacity": number,
      "ScaleInCooldown": number,
      "ScaleOutCooldown": number,
      "ScalingPolicies": [ { ... } ]
   },
   "EndpointName": "string",
   "InferenceRecommendationsJobName": "string",
   "Metric": {
      "InvocationsPerInstance": number,
      "ModelLatency": number
   },
   "RecommendationId": "string",
   "ScalingPolicyObjective": {
      "MaxInvocationsPerMinute": number,
      "MinInvocationsPerMinute": number
   },
   "TargetCpuUtilizationPerCore": number
}
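As a minimal sketch, the fields of most interest can be pulled from a raw JSON response body that follows the syntax above. The page does not say which response fields are always present, so the sketch treats each one as possibly missing and returns `None` in that case. `summarize_recommendation` is hypothetical. An SDK such as boto3 would already return a parsed dictionary, so you would skip the `json.loads` step there.

```python
import json
from typing import Any, Dict


def summarize_recommendation(raw: str) -> Dict[str, Any]:
    """Extract the recommended capacities and cooldowns from a JSON response body."""
    response = json.loads(raw)
    # Treat the nested configuration as optional too.
    config = response.get("DynamicScalingConfiguration", {})
    return {
        "endpoint": response.get("EndpointName"),
        "recommendation_id": response.get("RecommendationId"),
        "min_capacity": config.get("MinCapacity"),
        "max_capacity": config.get("MaxCapacity"),
        "scale_in_cooldown": config.get("ScaleInCooldown"),
        "scale_out_cooldown": config.get("ScaleOutCooldown"),
        "policy_count": len(config.get("ScalingPolicies", [])),
    }
```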

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

DynamicScalingConfiguration

An object with the recommended values for you to specify when creating an autoscaling policy.

Type: DynamicScalingConfiguration object
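One way to use the recommended `MinCapacity` and `MaxCapacity` is to register the endpoint variant as a scalable target with Application Auto Scaling. The sketch below builds the arguments for that call. The `ServiceNamespace`, `ResourceId` format, and `ScalableDimension` values follow Application Auto Scaling's conventions for SageMaker variants. The helper name and the default variant name `AllTraffic` are assumptions; use your endpoint's actual production variant name.

```python
from typing import Any, Dict


def scalable_target_args(dynamic_config: Dict[str, Any], endpoint_name: str,
                         variant_name: str = "AllTraffic") -> Dict[str, Any]:
    """Map recommended capacities onto RegisterScalableTarget arguments.

    variant_name defaults to "AllTraffic" only as an assumption.
    """
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": dynamic_config["MinCapacity"],
        "MaxCapacity": dynamic_config["MaxCapacity"],
    }
```

With boto3, the resulting dictionary could be passed as `register_scalable_target(**args)` on an `application-autoscaling` client. The cooldowns and `ScalingPolicies` would belong to a separate `put_scaling_policy` call, which this sketch does not cover.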

EndpointName

The name of an endpoint benchmarked during a previously completed Inference Recommender job.

Type: String

Length Constraints: Maximum length of 63.

Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}

InferenceRecommendationsJobName

The name of a previously completed Inference Recommender job.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 64.

Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,63}

Metric

An object containing the metrics that were benchmarked during the previously completed Inference Recommender job.

Type: ScalingPolicyMetric object

RecommendationId

The recommendation ID of a previously completed inference recommendation.

Type: String

ScalingPolicyObjective

An object representing the anticipated traffic pattern for an endpoint that you specified in the request.

Type: ScalingPolicyObjective object

TargetCpuUtilizationPerCore

The target CPU utilization per core, as a percentage, that you specified in the request. This is the utilization you want an instance to reach before autoscaling. The default value is 50%.

Type: Integer

Valid Range: Minimum value of 1. Maximum value of 100.

Errors

For information about the errors that are common to all actions, see Common Errors.

ResourceNotFound

The resource being accessed was not found.

HTTP Status Code: 400
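When you call this action through an SDK, you can detect a `ResourceNotFound` error by its error code. One possible cause is a job name, endpoint name, or recommendation ID that does not refer to an existing completed Inference Recommender job. The sketch below assumes the error-response shape that botocore exposes as `ClientError.response`, that is `{"Error": {"Code": ..., "Message": ...}}`. `is_resource_not_found` is hypothetical.

```python
from typing import Any, Dict


def is_resource_not_found(error_response: Dict[str, Any]) -> bool:
    """Return True if an error response carries the ResourceNotFound code."""
    # Missing keys are treated as "not this error" rather than raising KeyError.
    return error_response.get("Error", {}).get("Code") == "ResourceNotFound"
```

In boto3 code, this would typically sit inside `except botocore.exceptions.ClientError as err:` and be called as `is_resource_not_found(err.response)`.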

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: