
Get an inference recommendation

Inference recommendation jobs run a set of load tests on recommended instance types or on a serverless endpoint, and report performance metrics from those tests. The load tests use the sample data you provided during model version registration.

Note

Before you create an Inference Recommender recommendation job, make sure you have satisfied the Prerequisites.

The following demonstrates how to use Amazon SageMaker Inference Recommender to create an inference recommendation based on your model type, using the Amazon SDK for Python (Boto3), the Amazon CLI, Amazon SageMaker Studio Classic, or the SageMaker console.

Create an inference recommendation

Create an inference recommendation programmatically using the Amazon SDK for Python (Boto3) or the Amazon CLI, or interactively using Studio Classic or the SageMaker console. Specify a job name for your inference recommendation, an Amazon IAM role ARN, an input configuration, and either the model package ARN from when you registered your model with the model registry, or your model name and the ContainerConfig dictionary from when you created your model in the Prerequisites section.

Amazon SDK for Python (Boto3)

Use the CreateInferenceRecommendationsJob API to start an inference recommendation job. Set the JobType field to 'Default' for inference recommendation jobs. In addition, provide the following:

  • The Amazon Resource Name (ARN) of an IAM role that enables Inference Recommender to perform tasks on your behalf. Define this for the RoleArn field.

  • A model package ARN or model name. Inference Recommender accepts either a model package ARN or a model name as input, but not both. Specify one of the following:

    • The ARN of the versioned model package you created when you registered your model with SageMaker model registry. Define this for ModelPackageVersionArn in the InputConfig field.

    • The name of the model you created. Define this for ModelName in the InputConfig field. Also, provide the ContainerConfig dictionary, which includes the required fields that need to be provided with the model name. Define this for ContainerConfig in the InputConfig field. In the ContainerConfig, you can also optionally specify the SupportedEndpointType field as either RealTime or Serverless. If you specify this field, Inference Recommender returns recommendations for only that endpoint type. If you don't specify this field, Inference Recommender returns recommendations for both endpoint types.

  • A name for your Inference Recommender recommendation job for the JobName field. The Inference Recommender job name must be unique within the Amazon Region and within your Amazon account.

Import the Amazon SDK for Python (Boto3) package and create a SageMaker client object using the client class. If you followed the steps in the Prerequisites section, specify only one of the following:

  • Option 1: If you would like to create an inference recommendations job with a model package ARN, then store the model package ARN in a variable named model_package_arn.

  • Option 2: If you would like to create an inference recommendations job with a model name and ContainerConfig, store the model name in a variable named model_name and the ContainerConfig dictionary in a variable named container_config. An illustrative sketch of this dictionary follows.
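For reference, a ContainerConfig dictionary for a PyTorch image-classification model might look like the following sketch. The field values are illustrative only (they mirror the Amazon CLI example later on this page); replace them with values that describe your own model:

# An illustrative ContainerConfig dictionary. Every value below is an
# example; replace each one with the details of your own model and payload.
container_config = {
    'Domain': 'COMPUTER_VISION',
    'Framework': 'PYTORCH',
    'FrameworkVersion': '1.7.1',
    'NearestModelName': 'resnet18',
    'PayloadConfig': {
        'SamplePayloadUrl': 's3://<bucket>/<payload_s3_key>',
        'SupportedContentTypes': ['image/jpeg']
    },
    'DataInputConfig': '[[1,3,256,256]]',
    'Task': 'IMAGE_CLASSIFICATION'
}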

# Create a low-level SageMaker service client.
import boto3

aws_region = '<INSERT>'
sagemaker_client = boto3.client('sagemaker', region_name=aws_region)

# Provide only one of model package ARN or model name, not both.
# Provide your model package ARN that was created when you registered your
# model with Model Registry
model_package_arn = '<INSERT>'

## Uncomment if you would like to create an inference recommendations job with a
## model name instead of a model package ARN, and comment out model_package_arn above

## Provide your model name
# model_name = '<INSERT>'

## Provide your container config
# container_config = '<INSERT>'

# Provide a unique job name for SageMaker Inference Recommender job
job_name = '<INSERT>'

# Inference Recommender job type. Set to Default to get an initial recommendation
job_type = 'Default'

# Provide an IAM Role that gives SageMaker Inference Recommender permission to
# access AWS services
role_arn = 'arn:aws:iam::<account>:role/*'

sagemaker_client.create_inference_recommendations_job(
    JobName = job_name,
    JobType = job_type,
    RoleArn = role_arn,
    # Provide only one of model package ARN or model name, not both.
    # If you would like to create an inference recommendations job with a model name,
    # uncomment ModelName and ContainerConfig, and comment out ModelPackageVersionArn.
    InputConfig = {
        'ModelPackageVersionArn': model_package_arn
        # 'ModelName': model_name,
        # 'ContainerConfig': container_config
    }
)

See the Amazon SageMaker API Reference Guide for a full list of optional and required arguments you can pass to CreateInferenceRecommendationsJob.

Amazon CLI

Use the create-inference-recommendations-job API to start an inference recommendation job. Set the job-type field to 'Default' for inference recommendation jobs. In addition, provide the following:

  • The Amazon Resource Name (ARN) of an IAM role that enables Amazon SageMaker Inference Recommender to perform tasks on your behalf. Define this for the role-arn field.

  • A model package ARN or model name. Inference Recommender accepts either a model package ARN or a model name as input, but not both. Specify one of the following:

    • The ARN of the versioned model package you created when you registered your model with Model Registry. Define this for ModelPackageVersionArn in the input-config field.

    • The name of the model you created. Define this for ModelName in the input-config field. Also, provide the ContainerConfig dictionary, which includes the required fields that need to be provided with the model name. Define this for ContainerConfig in the input-config field. In the ContainerConfig, you can also optionally specify the SupportedEndpointType field as either RealTime or Serverless. If you specify this field, Inference Recommender returns recommendations for only that endpoint type. If you don't specify this field, Inference Recommender returns recommendations for both endpoint types.

  • A name for your Inference Recommender recommendation job for the job-name field. The Inference Recommender job name must be unique within the Amazon Region and within your Amazon account.

To create an inference recommendation job with a model package ARN, use the following example:

aws sagemaker create-inference-recommendations-job \
    --region <region> \
    --job-name <job_name> \
    --job-type Default \
    --role-arn arn:aws:iam::<account>:role/* \
    --input-config "{
        \"ModelPackageVersionArn\": \"arn:aws:sagemaker:<region>:<account>:model-package/<resource-id>\"
    }"

To create an inference recommendation job with a model name and ContainerConfig, use the following example. The example uses the SupportedEndpointType field to specify that only real-time inference recommendations are returned:

aws sagemaker create-inference-recommendations-job \
    --region <region> \
    --job-name <job_name> \
    --job-type Default \
    --role-arn arn:aws:iam::<account>:role/* \
    --input-config "{
        \"ModelName\": \"model-name\",
        \"ContainerConfig\": {
            \"Domain\": \"COMPUTER_VISION\",
            \"Framework\": \"PYTORCH\",
            \"FrameworkVersion\": \"1.7.1\",
            \"NearestModelName\": \"resnet18\",
            \"PayloadConfig\": {
                \"SamplePayloadUrl\": \"s3://{bucket}/{payload_s3_key}\",
                \"SupportedContentTypes\": [\"image/jpeg\"]
            },
            \"SupportedEndpointType\": \"RealTime\",
            \"DataInputConfig\": \"[[1,3,256,256]]\",
            \"Task\": \"IMAGE_CLASSIFICATION\"
        }
    }"
Amazon SageMaker Studio Classic

Create an inference recommendation job in Studio Classic.

  1. In your Studio Classic application, choose the Home icon.

  2. In the left sidebar of Studio Classic, choose Models.

  3. Choose Model Registry from the dropdown list to display models you have registered with the model registry.

    The left panel displays a list of model groups. The list includes all the model groups registered with the model registry in your account, including models registered outside of Studio Classic.

  4. Select the name of your model group. When you select your model group, the right pane of Studio Classic displays column headings such as Versions and Settings.

    If you have one or more model packages within your model group, you see a list of those model packages within the Versions column.

  5. Choose the Inference recommender column.

  6. Choose an IAM role that grants Inference Recommender permission to access Amazon services. You can create a role and attach the AmazonSageMakerFullAccess IAM managed policy to accomplish this. Or you can let Studio Classic create a role for you.

  7. Choose Get recommendations.

    The inference recommendation job can take up to 45 minutes.

    Warning

    Do not close this tab. If you close this tab, you cancel the instance recommendation job.

SageMaker console

Create an instance recommendation job through the SageMaker console by doing the following:

  1. Go to the SageMaker console at https://console.amazonaws.cn/sagemaker/.

  2. In the left navigation pane, choose Inference, and then choose Inference recommender.

  3. On the Inference recommender jobs page, choose Create job.

  4. For Step 1: Model configuration, do the following:

    1. For Job type, choose Default recommender job.

    2. If you’re using a model registered in the SageMaker model registry, then turn on the Choose a model from the model registry toggle and do the following:

      1. From the Model group dropdown list, choose the model group in SageMaker model registry where your model is located.

      2. From the Model version dropdown list, choose the desired version of your model.

    3. If you’re using a model that you’ve created in SageMaker, then turn off the Choose a model from the model registry toggle and do the following:

      1. For the Model name field, enter the name of your SageMaker model.

    4. From the IAM role dropdown list, you can select an existing Amazon IAM role that has the necessary permissions to create an instance recommendation job. Alternatively, if you don’t have an existing role, you can choose Create a new role to open the role creation pop-up, and SageMaker adds the necessary permissions to the new role that you create.

    5. For S3 bucket for benchmarking payload, enter the Amazon S3 path to your sample payload archive, which should contain sample payload files that Inference Recommender uses to benchmark your model on different instance types.

    6. For Payload content type, enter the MIME types of your sample payload data.

    7. (Optional) If you turned off the Choose a model from the model registry toggle and specified a SageMaker model, then for Container configuration, do the following:

      1. For the Domain dropdown list, select the machine learning domain of the model, such as computer vision, natural language processing, or machine learning.

      2. For the Framework dropdown list, select the framework of your container, such as TensorFlow or XGBoost.

      3. For Framework version, enter the framework version of your container image.

      4. For the Nearest model name dropdown list, select the pre-trained model that most closely matches your own.

      5. For the Task dropdown list, select the machine learning task that the model accomplishes, such as image classification or regression.

    8. (Optional) For Model compilation using SageMaker Neo, you can configure the recommendation job for a model that you’ve compiled using SageMaker Neo. For Data input configuration, enter the correct input data shape for your model in a format similar to {'input':[1,1024,1024,3]}.

    9. Choose Next.

  5. For Step 2: Instances and environment parameters, do the following:

    1. (Optional) For Select instances for benchmarking, you can select up to 8 instance types that you want to benchmark. If you don’t select any instances, Inference Recommender considers all instance types.

    2. Choose Next.

  6. For Step 3: Job parameters, do the following:

    1. (Optional) For the Job name field, enter a name for your instance recommendation job. When you create the job, SageMaker appends a timestamp to the end of this name.

    2. (Optional) For the Job description field, enter a description for the job.

    3. (Optional) For the Encryption key dropdown list, choose an Amazon KMS key by name or enter its ARN to encrypt your data.

    4. (Optional) For Max test duration (s), enter the maximum number of seconds you want each test to run for.

    5. (Optional) For Max invocations per minute, enter the maximum number of requests per minute the endpoint can reach before stopping the recommendation job. After reaching this limit, SageMaker ends the job.

      6. (Optional) For P99 Model latency threshold (ms), enter the threshold for the 99th percentile of model latency, in milliseconds.

    7. Choose Next.

  7. For Step 4: Review job, review your configurations and then choose Submit.

Get your inference recommendation job results

Collect the results of your inference recommendation job programmatically with Amazon SDK for Python (Boto3), the Amazon CLI, Studio Classic, or the SageMaker console.

Amazon SDK for Python (Boto3)

Once an inference recommendation job is complete, you can use DescribeInferenceRecommendationsJob to get the job details and recommendations. Provide the job name that you used when you created the inference recommendation job.

job_name = '<INSERT>'

response = sagemaker_client.describe_inference_recommendations_job(
    JobName=job_name
)

Print the job status from the response object. The previous code sample stored the response in a variable named response.

print(response['Status'])
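A recommendation job can take up to 45 minutes to finish. If you would rather block until the job reaches a terminal state before reading the results, a minimal polling sketch (reusing the sagemaker_client and job_name variables from above) might look like the following:

import time

# Poll the job until it reaches a terminal state.
while True:
    response = sagemaker_client.describe_inference_recommendations_job(
        JobName=job_name
    )
    status = response['Status']
    if status in ('COMPLETED', 'FAILED', 'STOPPED'):
        break
    time.sleep(60)  # wait a minute between checks

print(status)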

The describe_inference_recommendations_job call returns a JSON response similar to the following example. Note that this example shows the recommended instance types for real-time inference (for an example showing serverless inference recommendations, see the example after this one).

{
    'JobName': 'job-name',
    'JobDescription': 'job-description',
    'JobType': 'Default',
    'JobArn': 'arn:aws:sagemaker:region:account-id:inference-recommendations-job/resource-id',
    'Status': 'COMPLETED',
    'CreationTime': datetime.datetime(2021, 10, 26, 20, 4, 57, 627000, tzinfo=tzlocal()),
    'LastModifiedTime': datetime.datetime(2021, 10, 26, 20, 25, 1, 997000, tzinfo=tzlocal()),
    'InputConfig': {
        'ModelPackageVersionArn': 'arn:aws:sagemaker:region:account-id:model-package/resource-id',
        'JobDurationInSeconds': 0
    },
    'InferenceRecommendations': [{
        'Metrics': {
            'CostPerHour': 0.20399999618530273,
            'CostPerInference': 5.246913588052848e-06,
            'MaximumInvocations': 648,
            'ModelLatency': 263596
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.c5.xlarge',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }, {
        'Metrics': {
            'CostPerHour': 0.11500000208616257,
            'CostPerInference': 2.92620870823157e-06,
            'MaximumInvocations': 655,
            'ModelLatency': 826019
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.c5d.large',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }, {
        'Metrics': {
            'CostPerHour': 0.11500000208616257,
            'CostPerInference': 3.3625731248321244e-06,
            'MaximumInvocations': 570,
            'ModelLatency': 1085446
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.m5.large',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }],
    'ResponseMetadata': {
        'RequestId': 'request-id',
        'HTTPStatusCode': 200,
        'HTTPHeaders': {
            'x-amzn-requestid': 'x-amzn-requestid',
            'content-type': 'content-type',
            'content-length': '1685',
            'date': 'Tue, 26 Oct 2021 20:31:10 GMT'
        },
        'RetryAttempts': 0
    }
}

The first few lines provide information about the inference recommendation job itself. This includes the job name, role ARN, and creation and last modified times.

The InferenceRecommendations dictionary contains a list of Inference Recommender inference recommendations.

The EndpointConfiguration nested dictionary contains the instance type (InstanceType) recommendation along with the endpoint and variant name (a deployed Amazon machine learning model) that was used during the recommendation job. You can use the endpoint and variant name for monitoring in Amazon CloudWatch. See Monitor Amazon SageMaker with Amazon CloudWatch for more information.

The Metrics nested dictionary contains information about the estimated cost per hour (CostPerHour) for your real-time endpoint in US dollars, the estimated cost per inference (CostPerInference) in US dollars for your real-time endpoint, the expected maximum number of InvokeEndpoint requests per minute sent to the endpoint (MaxInvocations), and the model latency (ModelLatency), which is the interval of time (in microseconds) that your model took to respond to SageMaker. Model latency includes the local communication time taken to send the request and to fetch the response from the model's container, plus the time taken to complete the inference in the container.
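Because InferenceRecommendations is an ordinary list, you can also select a recommendation programmatically rather than by eye. For example, a minimal sketch that picks the entry with the lowest estimated cost per inference from the response variable used above (sorting by ModelLatency or MaximumInvocations works the same way):

# Pick the recommendation with the lowest estimated cost per inference.
recommendations = response['InferenceRecommendations']
cheapest = min(recommendations, key=lambda r: r['Metrics']['CostPerInference'])

endpoint_config = cheapest['EndpointConfiguration']
print(endpoint_config['InstanceType'], endpoint_config['InitialInstanceCount'])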

The following example shows the InferenceRecommendations part of the response for an inference recommendations job configured to return serverless inference recommendations:

"InferenceRecommendations": [ { "EndpointConfiguration": { "EndpointName": "value", "InitialInstanceCount": value, "InstanceType": "value", "VariantName": "value", "ServerlessConfig": { "MaxConcurrency": value, "MemorySizeInMb": value } }, "InvocationEndTime": value, "InvocationStartTime": value, "Metrics": { "CostPerHour": value, "CostPerInference": value, "CpuUtilization": value, "MaxInvocations": value, "MemoryUtilization": value, "ModelLatency": value, "ModelSetupTime": value }, "ModelConfiguration": { "Compiled": "False", "EnvironmentParameters": [], "InferenceSpecificationName": "value" }, "RecommendationId": "value" } ]

You can interpret the recommendations for serverless inference similarly to the results for real-time inference, with the exception of the ServerlessConfig, which tells you the metrics returned for a serverless endpoint with the given MemorySizeInMB and when MaxConcurrency = 1. To increase the throughput possible on the endpoint, increase the value of MaxConcurrency linearly. For example, if the inference recommendation shows MaxInvocations as 1000, then increasing MaxConcurrency to 2 would support 2000 MaxInvocations. Note that this is true only up to a certain point, which can vary based on your model and code. Serverless recommendations also measure the ModelSetupTime metric, which measures (in microseconds) the time it takes to launch compute resources on a serverless endpoint. For more information about setting up serverless endpoints, see the Serverless Inference documentation.
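If your job returned serverless recommendations, you can read the suggested endpoint settings from the same structure. A minimal sketch, again assuming the response variable from above holds a completed job with serverless results:

# Read the recommended serverless settings from the first recommendation.
recommendation = response['InferenceRecommendations'][0]
serverless_config = recommendation['EndpointConfiguration']['ServerlessConfig']
print(serverless_config['MemorySizeInMb'], serverless_config['MaxConcurrency'])
print(recommendation['Metrics']['ModelSetupTime'])  # endpoint setup time, in microseconds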

Amazon CLI

Once an inference recommendation job is complete, you can use describe-inference-recommendations-job to get the job details and recommended instance types. Provide the job name that you used when you created the inference recommendation job.

aws sagemaker describe-inference-recommendations-job \
    --job-name <job-name> \
    --region <aws-region>

The JSON response should resemble the following example. Note that this example shows the recommended instance types for real-time inference (for an example showing serverless inference recommendations, see the example after this one).

{
    'JobName': 'job-name',
    'JobDescription': 'job-description',
    'JobType': 'Default',
    'JobArn': 'arn:aws:sagemaker:region:account-id:inference-recommendations-job/resource-id',
    'Status': 'COMPLETED',
    'CreationTime': datetime.datetime(2021, 10, 26, 20, 4, 57, 627000, tzinfo=tzlocal()),
    'LastModifiedTime': datetime.datetime(2021, 10, 26, 20, 25, 1, 997000, tzinfo=tzlocal()),
    'InputConfig': {
        'ModelPackageVersionArn': 'arn:aws:sagemaker:region:account-id:model-package/resource-id',
        'JobDurationInSeconds': 0
    },
    'InferenceRecommendations': [{
        'Metrics': {
            'CostPerHour': 0.20399999618530273,
            'CostPerInference': 5.246913588052848e-06,
            'MaximumInvocations': 648,
            'ModelLatency': 263596
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.c5.xlarge',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }, {
        'Metrics': {
            'CostPerHour': 0.11500000208616257,
            'CostPerInference': 2.92620870823157e-06,
            'MaximumInvocations': 655,
            'ModelLatency': 826019
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.c5d.large',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }, {
        'Metrics': {
            'CostPerHour': 0.11500000208616257,
            'CostPerInference': 3.3625731248321244e-06,
            'MaximumInvocations': 570,
            'ModelLatency': 1085446
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.m5.large',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }],
    'ResponseMetadata': {
        'RequestId': 'request-id',
        'HTTPStatusCode': 200,
        'HTTPHeaders': {
            'x-amzn-requestid': 'x-amzn-requestid',
            'content-type': 'content-type',
            'content-length': '1685',
            'date': 'Tue, 26 Oct 2021 20:31:10 GMT'
        },
        'RetryAttempts': 0
    }
}

The first few lines provide information about the inference recommendation job itself. This includes the job name, role ARN, and creation and last modified times.

The InferenceRecommendations dictionary contains a list of Inference Recommender inference recommendations.

The EndpointConfiguration nested dictionary contains the instance type (InstanceType) recommendation along with the endpoint and variant name (a deployed Amazon machine learning model) used during the recommendation job. You can use the endpoint and variant name for monitoring in Amazon CloudWatch. See Monitor Amazon SageMaker with Amazon CloudWatch for more information.

The Metrics nested dictionary contains information about the estimated cost per hour (CostPerHour) for your real-time endpoint in US dollars, the estimated cost per inference (CostPerInference) in US dollars for your real-time endpoint, the expected maximum number of InvokeEndpoint requests per minute sent to the endpoint (MaxInvocations), and the model latency (ModelLatency), which is the interval of time (in microseconds) that your model took to respond to SageMaker. Model latency includes the local communication time taken to send the request and to fetch the response from the model's container, plus the time taken to complete the inference in the container.

The following example shows the InferenceRecommendations part of the response for an inference recommendations job configured to return serverless inference recommendations:

"InferenceRecommendations": [ { "EndpointConfiguration": { "EndpointName": "value", "InitialInstanceCount": value, "InstanceType": "value", "VariantName": "value", "ServerlessConfig": { "MaxConcurrency": value, "MemorySizeInMb": value } }, "InvocationEndTime": value, "InvocationStartTime": value, "Metrics": { "CostPerHour": value, "CostPerInference": value, "CpuUtilization": value, "MaxInvocations": value, "MemoryUtilization": value, "ModelLatency": value, "ModelSetupTime": value }, "ModelConfiguration": { "Compiled": "False", "EnvironmentParameters": [], "InferenceSpecificationName": "value" }, "RecommendationId": "value" } ]

You can interpret the recommendations for serverless inference similarly to the results for real-time inference, with the exception of the ServerlessConfig, which tells you the metrics returned for a serverless endpoint with the given MemorySizeInMB and when MaxConcurrency = 1. To increase the throughput possible on the endpoint, increase the value of MaxConcurrency linearly. For example, if the inference recommendation shows MaxInvocations as 1000, then increasing MaxConcurrency to 2 would support 2000 MaxInvocations. Note that this is true only up to a certain point, which can vary based on your model and code. Serverless recommendations also measure the ModelSetupTime metric, which measures (in microseconds) the time it takes to launch compute resources on a serverless endpoint. For more information about setting up serverless endpoints, see the Serverless Inference documentation.

Amazon SageMaker Studio Classic

The inference recommendations appear in a new Inference recommendations tab within Studio Classic. It can take up to 45 minutes for the results to show up. This tab contains Results and Details column headings.

The Details column provides information about the inference recommendation job, such as the name of the inference recommendation, when the job was created (Creation time), and more. It also provides Settings information, such as the maximum number of invocations that occurred per minute and information about the Amazon Resource Names used.

The Results column provides a Deployment goals and SageMaker recommendations window in which you can adjust the order in which the results are displayed, based on what matters most for your deployment. There are three dropdown menus that you can use to set the level of importance of Cost, Latency, and Throughput for your use case. For each goal (cost, latency, and throughput), you can set the level of importance: Lowest Importance, Low Importance, Moderate importance, High importance, or Highest importance.

Based on your selections of importance for each goal, Inference Recommender displays its top recommendation in the SageMaker recommendation field on the right of the panel, along with the estimated cost per hour and per inference request. It also provides information about the expected model latency, maximum number of invocations, and the number of instances. For serverless recommendations, you can see the ideal values for the maximum concurrency and endpoint memory size.

In addition to the top recommendation displayed, you can also see the same information displayed for all instances that Inference Recommender tested in the All runs section.

SageMaker console

You can view your instance recommendation jobs in the SageMaker console by doing the following:

  1. Go to the SageMaker console at https://console.amazonaws.cn/sagemaker/.

  2. In the left navigation pane, choose Inference, and then choose Inference recommender.

  3. On the Inference recommender jobs page, choose the name of your inference recommendation job.

On the details page for your job, you can view the Inference recommendations, which are the instance types SageMaker recommends for your model, as shown in the following screenshot.

Screenshot of the inference recommendations list on the job details page in the SageMaker console.

In this section, you can compare the instance types by various factors such as Model latency, Cost per hour, Cost per inference, and Invocations per minute.

On this page, you can also view the configurations you specified for your job. In the Monitor section, you can view the Amazon CloudWatch metrics that were logged for each instance type. To learn more about interpreting these metrics, see Interpret results.

For more information about interpreting the results of your recommendation job, see Interpret recommendation results.

Stop your inference recommendation

You might want to stop a running job if you started it by mistake or no longer need it. Stop your Inference Recommender inference recommendation jobs programmatically with the StopInferenceRecommendationsJob API or the Amazon CLI, or interactively with Studio Classic or the SageMaker console.

Amazon SDK for Python (Boto3)

Specify the name of the inference recommendation job for the JobName field:

sagemaker_client.stop_inference_recommendations_job(
    JobName='<INSERT>'
)
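Stopping is asynchronous: the job passes through a Stopping status before it reaches a terminal state. If you want to wait for the transition to finish, a minimal sketch (reusing the sagemaker_client from earlier and your job name):

import time

# Wait until the job leaves the STOPPING state.
job_name = '<INSERT>'
while True:
    status = sagemaker_client.describe_inference_recommendations_job(
        JobName=job_name
    )['Status']
    if status != 'STOPPING':
        break
    time.sleep(10)

print(status)  # STOPPED if the job was stopped successfully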
Amazon CLI

Specify the job name of the inference recommendation job for the job-name flag:

aws sagemaker stop-inference-recommendations-job --job-name <job-name>
Amazon SageMaker Studio Classic

To stop your Inference Recommender inference recommendation job in Studio Classic, close the tab in which you initiated the recommendation.

SageMaker console

To stop your instance recommendation job through the SageMaker console, do the following:

  1. Go to the SageMaker console at https://console.amazonaws.cn/sagemaker/.

  2. In the left navigation pane, choose Inference, and then choose Inference recommender.

  3. On the Inference recommender jobs page, select your instance recommendation job.

  4. Choose Stop job.

  5. In the dialog box that pops up, choose Confirm.

After stopping your job, the job’s Status should change to Stopping.