Manage your endpoints

After deploying your model to an endpoint, you might want to view and manage the endpoint. With SageMaker, you can view the status and details of your endpoint, check metrics and logs to monitor your endpoint’s performance, update the models deployed to your endpoint, and more.

This page describes how to interactively view and change your endpoints using the Amazon SageMaker console or SageMaker Studio.

Manage endpoints in SageMaker Studio

In Amazon SageMaker Studio, you can view and manage your SageMaker Hosting endpoints. To learn more about Studio, see Amazon SageMaker Studio.

To find the list of your endpoints in SageMaker Studio, do the following:

  1. Open the Studio application.

  2. In the left navigation pane, choose Deployments.

  3. From the dropdown menu, choose Endpoints.

The Endpoints page opens, which lists all of your SageMaker Hosting endpoints. From this page, you can see the endpoints and their Status. You can also create a new endpoint, edit an existing endpoint, or delete an endpoint.
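
If you prefer to retrieve the same list programmatically, the following is a minimal sketch that assumes the `list_endpoints` operation of the Amazon SDK for Python (Boto3) SageMaker client. The helper name `iter_endpoint_statuses` is hypothetical, and the client is passed in by the caller (for example, one created with `boto3.client("sagemaker")`) so that credentials and Region stay under your control.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_endpoint_statuses(sm_client: Any) -> Iterator[Tuple[str, str]]:
    """Yield (EndpointName, EndpointStatus) for every endpoint the client can list."""
    kwargs: Dict[str, Any] = {"MaxResults": 100}
    while True:
        page = sm_client.list_endpoints(**kwargs)
        for endpoint in page.get("Endpoints", []):
            yield endpoint["EndpointName"], endpoint["EndpointStatus"]
        # Results are paginated; keep requesting pages until no token is returned.
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
```

Each yielded pair corresponds to one row of the Endpoints page, with the status reported by the service (for example, `InService` or `Creating`).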

To see the details for a specific endpoint, choose an endpoint from the list. On the endpoint’s details page, you get an overview like the following screenshot.

Screenshot of an endpoint's main page showing a summary of the endpoint details in Studio.

Each endpoint details page contains the following tabs of information:

Variants (or Models)

The Variants tab (also called the Models tab if your endpoint has multiple models deployed) shows you the list of model variants or models currently deployed to your endpoint. The following screenshot shows you what the overview and Models sections look like for an endpoint with multiple models deployed.

Screenshot of an endpoint's main page showing multiple models deployed.

You can add or edit the settings for each variant or model. You can also select a variant and enable a default auto-scaling policy, which you can edit later in the Auto-scaling tab.

Settings

On the Settings tab, you can view the endpoint’s associated Amazon IAM role, the Amazon KMS key used for encryption (if applicable), the name of your VPC, and the network isolation settings.
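
The values shown on the Settings tab can also be gathered through the API. The following sketch assumes the Boto3 SageMaker operations `describe_endpoint`, `describe_endpoint_config`, and `describe_model`. The helper name `summarize_endpoint_settings` and the shape of the returned dictionary are hypothetical, and the sketch assumes that each production variant references a model by name.

```python
from typing import Any, Dict, List


def summarize_endpoint_settings(sm_client: Any, endpoint_name: str) -> Dict[str, Any]:
    """Collect role, KMS key, VPC, and network isolation details for an endpoint."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config = sm_client.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"]
    )

    variants: List[Dict[str, Any]] = []
    for variant in config.get("ProductionVariants", []):
        model_name = variant.get("ModelName")
        if not model_name:
            # Assumption: variants without a ModelName are skipped in this sketch.
            continue
        model = sm_client.describe_model(ModelName=model_name)
        variants.append(
            {
                "VariantName": variant["VariantName"],
                "ModelName": model_name,
                "ExecutionRoleArn": model.get("ExecutionRoleArn"),
                "VpcConfig": model.get("VpcConfig"),  # None when no VPC is configured
                "EnableNetworkIsolation": model.get("EnableNetworkIsolation", False),
            }
        )

    return {
        "EndpointName": endpoint["EndpointName"],
        "EndpointStatus": endpoint["EndpointStatus"],
        "KmsKeyId": config.get("KmsKeyId"),  # None when no KMS key is specified
        "Variants": variants,
    }
```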

Test inference

On the Test inference tab, you can send a test inference request to a deployed model. This is useful if you’d like to verify that your endpoint responds to requests as expected.

To test inference, do the following:

  1. On the model's Test inference tab, choose one of the following options:

    1. Select Enter the request body if you’d like to test the endpoint and receive a response through the Studio interface.

    2. Select Copy example code (Python) if you’d like to copy an Amazon SDK for Python (Boto3) example that you can use to invoke your endpoint from a local environment and receive a response programmatically.

  2. For Model, select the model that you want to test on the endpoint.

  3. If you chose the Studio interface testing method, then you can also choose your desired Content type for the response from the dropdown.

After configuring your request, choose either Send request (to receive a response through the Studio interface) or Copy (to copy the Python example).

If you receive a response through the Studio interface, it looks like the following screenshot.

Screenshot of a successful inference test request on an endpoint in Studio.
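
The Python example that Studio generates is not reproduced here. As a rough illustration of what such a call can look like, the following sketch assumes the `invoke_endpoint` operation of the Boto3 SageMaker Runtime client (for example, one created with `boto3.client("sagemaker-runtime")`). The helper name `invoke_json` is hypothetical, and the sketch assumes that your model accepts and returns JSON; adjust the content type and parsing for other formats.

```python
import json
from typing import Any, Dict, Optional


def invoke_json(
    runtime_client: Any,
    endpoint_name: str,
    payload: Any,
    target_variant: Optional[str] = None,
) -> Any:
    """Send a JSON payload to an endpoint and return the decoded JSON response."""
    request: Dict[str, Any] = {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",  # format of the request body
        "Accept": "application/json",       # format requested for the response
        "Body": json.dumps(payload).encode("utf-8"),
    }
    if target_variant:
        # Optionally route the request to one specific production variant.
        request["TargetVariant"] = target_variant
    response = runtime_client.invoke_endpoint(**request)
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())
```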

Auto-scaling

On the Auto-scaling tab, you can view any auto-scaling policies configured for the models hosted on your endpoint. The following screenshot shows you the Auto-scaling tab.

Screenshot of the Auto-scaling tab, showing one active policy.

You can choose Edit auto-scaling to change any of the policies and turn on or turn off the default auto-scaling policy.

To learn more about auto-scaling for real-time endpoints, see Automatically Scale Amazon SageMaker Models. If you’re not sure how to configure an auto-scaling policy for your endpoint, you can use an Inference Recommender autoscaling recommendations job to get recommendations for an auto-scaling policy.
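
Auto-scaling policies for endpoint variants can also be managed outside the Studio interface. The following sketch assumes the Boto3 Application Auto Scaling client (`boto3.client("application-autoscaling")`) and its `register_scalable_target` and `put_scaling_policy` operations. The helper names, the default policy name, and the 300-second cooldowns are illustrative assumptions, not recommended values.

```python
from typing import Any, Dict, Optional, Tuple


def build_target_tracking_requests(
    endpoint_name: str,
    variant_name: str,
    min_instances: int,
    max_instances: int,
    invocations_per_instance: float,
    policy_name: Optional[str] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build the scalable-target and scaling-policy requests for one variant."""
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")

    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    policy = dict(
        common,
        PolicyName=policy_name or f"{variant_name}-invocations-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # seconds; illustrative value
            "ScaleOutCooldown": 300,  # seconds; illustrative value
        },
    )
    return target, policy


def apply_scaling(autoscaling_client: Any, target: Dict[str, Any], policy: Dict[str, Any]) -> None:
    """Register the variant as a scalable target, then attach the policy."""
    autoscaling_client.register_scalable_target(**target)
    autoscaling_client.put_scaling_policy(**policy)
```

Separating request construction from the API calls lets you inspect or log the requests before anything is changed on the endpoint.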

Manage endpoints in the SageMaker console

To view your endpoints in the SageMaker console, do the following:

  1. Go to the SageMaker console at https://console.amazonaws.cn/sagemaker/.

  2. In the left navigation pane, choose Inference.

  3. From the dropdown list, choose Endpoints.

  4. On the Endpoints page, choose your endpoint.

The endpoint details page opens, showing you a summary of your endpoint and the metrics that have been collected for it.

The following sections describe the tabs on the endpoint details page.

Monitoring

After creating a SageMaker Hosting endpoint, you can monitor your endpoint using Amazon CloudWatch, which collects raw data and processes it into readable, near real-time metrics. Using these metrics, you can access historical information and gain a better perspective on how your endpoint is performing. For more information, see the Amazon CloudWatch User Guide.

From the Monitoring tab on the endpoint details page, you can view CloudWatch metrics data that has been collected from your endpoint.

The Monitoring tab includes the following sections:

  • Operational metrics: View metrics that track the utilization of your endpoint’s resources, such as CPU Utilization and Memory Utilization.

  • Invocation metrics: View metrics that track the number, health, and status of InvokeEndpoint requests coming to your endpoint, such as Invocation Model Errors and Model Latency.

  • Health metrics: View metrics that track your endpoint’s overall health, such as Invocation Failures and Notification Failures.

For detailed descriptions of each metric, see Monitor SageMaker with CloudWatch.

The following screenshot shows the Operational metrics section for a serverless endpoint.

Screenshot of metrics graphs in the operational metrics section of the endpoint details page.

You can adjust the Period and Statistic that you want to track for the metrics in a given section, as well as the length of time for which you want to view metrics data. You can also add and remove metric widgets from the view for each section by choosing Add widget. In the Add widget dialog box, you can select and deselect the metrics that you want to see.
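
The Period, Statistic, and time range settings map closely to parameters of the CloudWatch API. The following sketch assumes the `get_metric_statistics` operation of the Boto3 CloudWatch client (`boto3.client("cloudwatch")`) and the `EndpointName` and `VariantName` dimensions that SageMaker publishes for endpoint metrics. The helper name `fetch_endpoint_metric` and its default values are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, List, Optional, Tuple


def fetch_endpoint_metric(
    cw_client: Any,
    endpoint_name: str,
    variant_name: str,
    metric_name: str = "ModelLatency",
    statistic: str = "Average",
    period_seconds: int = 300,
    hours: int = 3,
    namespace: str = "AWS/SageMaker",
    now: Optional[datetime] = None,
) -> List[Tuple[Any, float]]:
    """Return (timestamp, value) pairs for one endpoint metric, oldest first."""
    end = now or datetime.now(timezone.utc)
    response = cw_client.get_metric_statistics(
        # Invocation metrics are published in "AWS/SageMaker"; resource
        # utilization metrics such as CPUUtilization are published in
        # "/aws/sagemaker/Endpoints", so pass that namespace for those.
        Namespace=namespace,
        MetricName=metric_name,
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period_seconds,
        Statistics=[statistic],
    )
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    points = sorted(response.get("Datapoints", []), key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p[statistic]) for p in points]
```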

The metrics that are available depend on your endpoint type. For example, serverless endpoints have some metrics that aren’t available for real-time endpoints. For more specific metrics information, see the monitoring documentation for your endpoint type.

Settings

You can choose the Settings tab to view additional information about your endpoint, such as the data capture settings, the endpoint configuration, and tags.

Alarms

From the Alarms tab on your endpoint details page, you can view and create simple static threshold metric alarms, where you specify a threshold value for a metric. If the metric breaches the threshold value, the alarm goes into the ALARM state. For more information about CloudWatch alarms, see Using Amazon CloudWatch alarms.

In the Endpoint summary section, you can view the Alarms field, which tells you how many alarms are currently active on your endpoint.

To view which alarms are in the ALARM state, choose the Alarms tab. The Alarms tab shows you a full list of your endpoint alarms, along with details about their status and conditions. The following screenshot shows a list of alarms in this section that have been configured for an endpoint.

Screenshot of the alarms tab on the endpoint details page which shows a list of CloudWatch alarms.

An alarm’s status can be In alarm, OK, or Insufficient data. The Insufficient data status means that not enough metrics data has been collected to evaluate the alarm.
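
To check alarm states outside the console, you could query CloudWatch directly. The following sketch assumes the `describe_alarms_for_metric` operation of the Boto3 CloudWatch client, which reports each alarm's state as `ALARM`, `OK`, or `INSUFFICIENT_DATA`. The helper name `count_alarm_states` is hypothetical, and the sketch covers only alarms defined on a single metric with exactly the `EndpointName` and `VariantName` dimensions.

```python
from collections import Counter
from typing import Any, Dict


def count_alarm_states(
    cw_client: Any,
    endpoint_name: str,
    variant_name: str,
    metric_name: str,
    namespace: str = "AWS/SageMaker",
) -> Dict[str, int]:
    """Count the alarms on one endpoint metric, grouped by alarm state."""
    response = cw_client.describe_alarms_for_metric(
        MetricName=metric_name,
        Namespace=namespace,
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
    )
    states = Counter(alarm["StateValue"] for alarm in response.get("MetricAlarms", []))
    return dict(states)
```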

To create a new alarm for your endpoint, do the following:

  1. In the Alarms tab, choose Create alarm.

  2. The Create alarm page opens. For Alarm name, enter a name for the alarm.

  3. (Optional) Enter a description for the alarm.

  4. For Metric, choose the CloudWatch metric that you want the alarm to track.

  5. For Variant name, choose the endpoint model variant that you want to monitor.

  6. For Statistic, choose one of the available statistics for the metric you selected.

  7. For Period, choose the time period to use for calculating each statistical value. For example, if you choose the Average statistic and a 5-minute period, each data point monitored by the alarm is the average of the metric’s data points over a 5-minute interval.

  8. For Evaluation periods, enter the number of data points that you want the alarm to consider when evaluating whether to enter the ALARM state.

  9. For Condition, choose the conditional that you want to use for your alarm threshold.

  10. For Threshold value, enter the desired value for your threshold.

  11. (Optional) For Notification, you can choose Add notification to create or specify an Amazon SNS topic that receives a notification when your alarm state changes.

  12. Choose Create alarm.

After creating your alarm, you can return to the Alarms tab to view its status at any time. From this section, you can also select the alarm and either Edit or Delete it.
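
The fields on the Create alarm page correspond roughly to parameters of the CloudWatch `put_metric_alarm` operation. The following sketch shows one possible mapping, assuming a Boto3 CloudWatch client. The helper name `build_alarm_request` and the symbolic condition strings are hypothetical, and the SNS topic ARN, if any, is supplied by the caller.

```python
from typing import Any, Dict, Optional

# Maps a symbolic condition to the CloudWatch ComparisonOperator value.
_CONDITIONS = {
    ">": "GreaterThanThreshold",
    ">=": "GreaterThanOrEqualToThreshold",
    "<": "LessThanThreshold",
    "<=": "LessThanOrEqualToThreshold",
}


def build_alarm_request(
    alarm_name: str,
    endpoint_name: str,
    variant_name: str,
    metric_name: str,
    statistic: str,
    period_seconds: int,
    evaluation_periods: int,
    condition: str,
    threshold: float,
    description: Optional[str] = None,
    sns_topic_arn: Optional[str] = None,
    namespace: str = "AWS/SageMaker",
) -> Dict[str, Any]:
    """Translate the Create alarm form fields into put_metric_alarm arguments."""
    if condition not in _CONDITIONS:
        raise ValueError(f"condition must be one of {sorted(_CONDITIONS)}")
    request: Dict[str, Any] = {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": statistic,
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "ComparisonOperator": _CONDITIONS[condition],
        "Threshold": float(threshold),
    }
    if description:
        request["AlarmDescription"] = description
    if sns_topic_arn:
        # AlarmActions run when the alarm transitions into the ALARM state.
        request["AlarmActions"] = [sns_topic_arn]
    return request
```

You could then pass the result to the client with `cw_client.put_metric_alarm(**request)`. For example, an alarm on the Average of `ModelLatency` with a 300-second period and two evaluation periods enters the ALARM state when two consecutive 5-minute averages breach the threshold.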