
Troubleshoot Inference Recommender errors

This section describes common Inference Recommender errors, the error messages they generate, and guidance on how to prevent and resolve them.

How to troubleshoot

You can attempt to resolve your error by going through the following steps:

  • Check if you've covered all the prerequisites to use Inference Recommender. See the Inference Recommender Prerequisites.

  • Check that you can deploy your model from the Model Registry to an endpoint and that it can process your payloads without errors, as shown in the sketch after this list. See Deploy a Model from the Registry.

  • When you kick off an Inference Recommender job, you should see endpoints being created in the console and you can review the CloudWatch logs.
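The following is a minimal boto3 sketch of the second check: deploying a registered model package version to a test endpoint. All ARNs and resource names below are placeholders for illustration.

import boto3

sagemaker_client = boto3.client('sagemaker', region_name='<region>')

# Placeholder ARNs; substitute your own model package version and execution role.
model_package_arn = 'arn:aws-cn:sagemaker:<region>:<account-id>:model-package/<package-name>/1'
execution_role_arn = 'arn:aws-cn:iam::<account-id>:role/<execution-role>'

# Create a deployable model from the registered model package version.
sagemaker_client.create_model(
    ModelName='ir-smoke-test-model',
    ExecutionRoleArn=execution_role_arn,
    Containers=[{'ModelPackageName': model_package_arn}]
)

# Create an endpoint configuration and deploy the model to a test endpoint.
sagemaker_client.create_endpoint_config(
    EndpointConfigName='ir-smoke-test-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'ir-smoke-test-model',
        'InstanceType': 'ml.m5.xlarge',
        'InitialInstanceCount': 1
    }]
)
sagemaker_client.create_endpoint(
    EndpointName='ir-smoke-test-endpoint',
    EndpointConfigName='ir-smoke-test-config'
)

After the endpoint is InService, you can send a sample payload with the sagemaker-runtime client's invoke_endpoint call to confirm that the container processes it without errors.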

Common errors

Review the following table for common Inference Recommender errors and their solutions.

Error: Specify Domain in the Model Package version 1.

Solution: Domain is a mandatory parameter for the job. Make sure you provide the ML domain, or OTHER if unknown.

Error: Provided role ARN cannot be assumed and an AWSSecurityTokenServiceException error occurred.

Solution: Make sure the execution role provided has the necessary permissions specified in the prerequisites.

Error: Specify Framework in the Model Package version 1.

Solution: Framework is a mandatory parameter for the job. Make sure you provide the ML framework, or OTHER if unknown.

Error: Users at the end of prev phase is 0 while initial users of current phase is 1.

Solution: Users here refers to the virtual users or threads used to send requests. Each phase starts with A users and ends with B users such that B > A. Between sequential phases x_1 and x_2, we require that 0 <= abs(x_2.A - x_1.B) <= 3; that is, the initial user count of a phase can differ from the final user count of the previous phase by at most 3. For an example of a valid traffic pattern, see the sketch after this table.

Error: Total Traffic duration (across) should not be more than Job duration.

Solution: The total duration of all of your phases cannot exceed the job duration.

Error: Burstable instance type ml.t2.medium is not allowed.

Solution: Inference Recommender doesn't support load testing on the t2 instance family because burstable instances don't provide consistent performance.

Error: ResourceLimitExceeded when calling CreateEndpoint operation

Solution: You have exceeded a SageMaker resource limit. For example, Inference Recommender might be unable to provision endpoints for benchmarking if the account has reached the endpoint quota. For more information about SageMaker limits and quotas, see Amazon SageMaker endpoints and quotas.

Error: ModelError when calling InvokeEndpoint operation

Solution: A model error can happen for the following reasons:

  • The invocation timed out while waiting for a response from the model container.

  • The model couldn't process the input payload.

Error: PayloadError when calling InvokeEndpoint operation

Solution: A payload error can happen for the following reasons:

  • The payload source isn't in the Amazon S3 bucket.

  • The payload is in a non-file object format.

  • The payload is an invalid file type. For example, a model expects an image payload but is passed a text file.

  • The payload is empty.
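To illustrate the phase constraints above, the following is a minimal sketch of a valid custom TrafficPattern for CreateInferenceRecommendationsJob. The phase values are hypothetical; SpawnRate is the number of new users spawned per minute, so a phase that starts with 1 user and spawns 1 user per minute for 120 seconds ends with roughly 3 users.

# A hypothetical two-phase traffic pattern that satisfies the constraint
# 0 <= abs(x_2.A - x_1.B) <= 3 between sequential phases.
traffic_pattern = {
    'TrafficType': 'PHASES',
    'Phases': [
        # Phase 1: start with 1 user, spawn 1 user per minute for 120 seconds,
        # ending with roughly 3 users.
        {'InitialNumberOfUsers': 1, 'SpawnRate': 1, 'DurationInSeconds': 120},
        # Phase 2: start with 3 users, within 3 of where phase 1 ended.
        {'InitialNumberOfUsers': 3, 'SpawnRate': 1, 'DurationInSeconds': 120}
    ]
}
# Pass traffic_pattern as InputConfig['TrafficPattern'] when calling
# sagemaker_client.create_inference_recommendations_job(...).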

Check CloudWatch

When you kick off an Inference Recommender job, you should see endpoints being created in the console. Select one of the endpoints and view the CloudWatch logs to monitor for any 4xx/5xx errors. If you have a successful Inference Recommender job, you will be able to see the endpoint names as part of the results. Even if your Inference Recommender job is unsuccessful, you can still check the CloudWatch logs for the deleted endpoints by following the steps below:

  1. Open the Amazon CloudWatch console at https://console.amazonaws.cn/cloudwatch/.

  2. Select the Region in which you created the Inference Recommender job from the Region dropdown list in the top right.

  3. In the navigation pane of CloudWatch, choose Logs, and then select Log groups.

  4. Search for the log group called /aws/sagemaker/Endpoints/sm-epc-*. Select the log group based on your most recent Inference Recommender job.
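You can also locate these log groups programmatically. The following is a minimal boto3 sketch using the CloudWatch Logs API; the prefix matches the sm-epc-* naming shown above.

import boto3

logs_client = boto3.client('logs', region_name='<region>')

# List the log groups that Inference Recommender endpoints write to.
paginator = logs_client.get_paginator('describe_log_groups')
for page in paginator.paginate(logGroupNamePrefix='/aws/sagemaker/Endpoints/sm-epc-'):
    for log_group in page['logGroups']:
        print(log_group['logGroupName'])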

You can also troubleshoot your job by checking the Inference Recommender CloudWatch logs. The Inference Recommender logs, which are published in the /aws/sagemaker/InferenceRecommendationsJobs CloudWatch log group, give a high-level view of the job's progress in the <jobName>/execution log stream. You can find detailed information on each of the endpoint configurations being tested in the <jobName>/Endpoint/<endpointName> log stream.

Overview of the Inference Recommender log streams

  • <jobName>/execution contains overall job information such as endpoint configurations scheduled for benchmarking, compilation job skip reason, and validation failure reason.

  • <jobName>/Endpoint/<endpointName> contains information such as resource creation progress, test configuration, load test stop reason, and resource cleanup status.

  • <jobName>/CompilationJob/<compilationJobName> contains information on compilation jobs created by Inference Recommender, such as the compilation job configuration and compilation job status.
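For example, the following sketch reads the high-level execution log stream for a job; the job name is a placeholder.

import boto3

logs_client = boto3.client('logs', region_name='<region>')
job_name = '<job-name>'

# Fetch events from the <jobName>/execution log stream of the
# Inference Recommender log group.
response = logs_client.filter_log_events(
    logGroupName='/aws/sagemaker/InferenceRecommendationsJobs',
    logStreamNamePrefix=f'{job_name}/execution'
)
for event in response['events']:
    print(event['message'])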

Create an alarm for Inference Recommender error messages

Inference Recommender outputs log statements for errors that might be helpful while troubleshooting. With a CloudWatch log group and a metric filter, you can look for terms and patterns in this log data as the data is sent to CloudWatch. Then, you can create a CloudWatch alarm based on the log group-metric filter. For more information, see Create a CloudWatch alarm based on a log group-metric filter.
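A minimal boto3 sketch of that setup follows; the filter pattern, metric name, and alarm threshold are illustrative assumptions, not values prescribed by Inference Recommender.

import boto3

logs_client = boto3.client('logs', region_name='<region>')
cloudwatch_client = boto3.client('cloudwatch', region_name='<region>')

# Hypothetical metric filter that counts log events containing "ERROR".
logs_client.put_metric_filter(
    logGroupName='/aws/sagemaker/InferenceRecommendationsJobs',
    filterName='InferenceRecommenderErrors',
    filterPattern='ERROR',
    metricTransformations=[{
        'metricName': 'InferenceRecommenderErrorCount',
        'metricNamespace': 'Custom/InferenceRecommender',
        'metricValue': '1',
        'defaultValue': 0
    }]
)

# Alarm when any matching error message appears within a 5-minute period.
cloudwatch_client.put_metric_alarm(
    AlarmName='InferenceRecommenderErrorAlarm',
    Namespace='Custom/InferenceRecommender',
    MetricName='InferenceRecommenderErrorCount',
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    TreatMissingData='notBreaching'
)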

Check benchmarks

When you kick off an Inference Recommender job, Inference Recommender creates several benchmarks to evaluate the performance of your model on different instance types. You can use the ListInferenceRecommendationsJobSteps API to view the details for all the benchmarks. If you have a failed benchmark, you can see the failure reasons as part of the results.

To use the ListInferenceRecommendationsJobSteps API, provide the following values:

  • For JobName, provide the name of the Inference Recommender job.

  • For StepType, use BENCHMARK to return details about the job's benchmarks.

  • For Status, use FAILED to return details about only the failed benchmarks. For a list of the other status types, see the Status field in the ListInferenceRecommendationsJobSteps API.

# Create a low-level SageMaker service client.
import boto3

aws_region = '<region>'
sagemaker_client = boto3.client('sagemaker', region_name=aws_region)

# Provide the job name for the SageMaker Inference Recommender job.
job_name = '<job-name>'

# Filter for benchmarks.
step_type = 'BENCHMARK'

# Filter for benchmarks that have a FAILED status.
status = 'FAILED'

response = sagemaker_client.list_inference_recommendations_job_steps(
    JobName=job_name,
    StepType=step_type,
    Status=status
)

You can print the response object to view the results. The preceding code example stores the response in a variable called response:

print(response)
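To pull out just the failure reasons, you can iterate over the returned steps. This sketch assumes each failed benchmark step carries its details, including a FailureReason, in the InferenceBenchmark field of the response:

# Print the status and failure reason for each failed benchmark step.
for step in response['Steps']:
    benchmark = step.get('InferenceBenchmark', {})
    print(step['StepType'], step['Status'], benchmark.get('FailureReason'))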