Deploy your models to an endpoint - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Deploy your models to an endpoint

In Amazon SageMaker Canvas, you can deploy your models to an endpoint to make predictions. SageMaker provides the ML infrastructure for you to host your model on an endpoint with the compute instances that you choose. Then, you can invoke the endpoint (send a prediction request) and get a real-time prediction from your model. With this functionality, you can use your model in production to respond to incoming requests, and you can integrate your model with existing applications and workflows.

To get started, you should have a model that you'd like to deploy. You can deploy a custom model version that you've built, or you can deploy an Amazon SageMaker JumpStart foundation model. For more information about building a model in Canvas, see Build a custom model. For more information about JumpStart foundation models in Canvas, see Use generative AI with foundation models.

Review the following Permissions management section, and then begin creating new deployments in the Deploy a model section.

Permissions management

By default, you have permissions to deploy models to SageMaker Hosting endpoints. SageMaker grants these permissions for all new and existing Canvas user profiles through the AmazonSageMakerCanvasFullAccess policy, which is attached to the Amazon IAM execution role for the SageMaker domain that hosts your Canvas application.

If your Canvas administrator is setting up a new domain or user profile by following the instructions in Prerequisites for setting up Amazon SageMaker Canvas, SageMaker turns on model deployment permissions through the Enable direct deployment of Canvas models option, which is enabled by default.

The Canvas administrator can manage model deployment permissions at the user profile level as well. For example, if the administrator doesn't want to grant model deployment permissions to all user profiles when setting up a domain, they can grant permissions to specific users after creating the domain.

The following procedure shows how to modify the model deployment permissions for a specific user profile:

  1. Open the SageMaker console at https://console.amazonaws.cn/sagemaker/.

  2. On the left navigation pane, choose Admin configurations.

  3. Under Admin configurations, choose Domains.

  4. From the list of domains, select the user profile’s domain.

  5. On the domain details page, choose the User profile whose permissions you want to edit.

  6. On the User Details page, choose Edit.

  7. In the left navigation pane, choose Canvas settings.

  8. In the ML Ops permissions configuration section, turn on the Enable direct deployment of Canvas models toggle to enable deployment permissions.

  9. Choose Submit to save the changes to your domain settings.

The user profile should now have model deployment permissions.
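The console steps above can also be expressed through the SageMaker UpdateUserProfile API. The following is a minimal sketch of such a request; the domain ID and user profile name are placeholders for your own values, and the boto3 call itself is shown commented out:

```python
# Sketch: the Enable direct deployment toggle expressed as an
# UpdateUserProfile request. DomainId and UserProfileName are placeholders.
params = {
    "DomainId": "d-example123",
    "UserProfileName": "example-user",
    "UserSettings": {
        "CanvasAppSettings": {
            "DirectDeploySettings": {"Status": "ENABLED"}
        }
    },
}

# With boto3 (not executed here):
#   import boto3
#   sagemaker = boto3.client("sagemaker")
#   sagemaker.update_user_profile(**params)
```

Setting Status to DISABLED instead turns the permission off for that user profile.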

Deploy a model

To deploy your model, create a new deployment in Canvas and specify the model version that you want to deploy, along with the ML infrastructure (such as the type and number of compute instances) to use for hosting the model.

Canvas suggests a default instance type and count based on your model type. To learn more about the various SageMaker instance types, see the Amazon SageMaker pricing page. You are charged based on SageMaker instance pricing while your endpoint is active.
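As a rough illustration of how that pricing accrues, the following sketch estimates the cost of keeping an endpoint active for a week. The hourly rate is a hypothetical placeholder, not an actual SageMaker price; check the Amazon SageMaker pricing page for real instance rates.

```python
# Rough illustration of endpoint cost while active. The hourly rate is a
# hypothetical placeholder -- look up the real rate for your instance type
# on the Amazon SageMaker pricing page.
hourly_rate_usd = 0.115   # hypothetical per-instance rate
instance_count = 2
hours_active = 24 * 7     # endpoint left running for one week

estimated_cost = hourly_rate_usd * instance_count * hours_active
print(f"Estimated weekly cost: ${estimated_cost:.2f}")  # → Estimated weekly cost: $38.64
```

Because billing continues as long as the endpoint is active, shutting down endpoints you no longer need (or using a specified deployment length for JumpStart models, as described below) keeps costs down.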

When deploying JumpStart foundation models, you also have the option to specify the length of the deployment time. You can deploy the model to an endpoint indefinitely (meaning the endpoint is active until you shut it down). Or, if you only need the endpoint for a short period of time and would like to reduce costs, you can deploy the model to an endpoint for a specified amount of time, after which SageMaker shuts down the endpoint for you.

Note

If you deploy a model for a specified amount of time, stay logged in to the Canvas application for as long as the endpoint is active. If you log out of or delete the application, then Canvas is unable to shut down the endpoint at the specified time.

After your model is deployed to a SageMaker Hosting real-time inference endpoint, you can begin making predictions by invoking the endpoint.

There are several different ways for you to deploy a model from the Canvas application. You can access the model deployment option through any of the following methods:

  • On the My models page of the Canvas application, choose the model that you want to deploy. Then, from the model’s Versions page, choose the More options icon next to a model version and select Deploy.

  • When on the details page for a model version, on the Analyze tab, choose the Deploy option.

  • When on the details page for a model version, on the Predict tab, choose the More options icon at the top of the page and select Deploy.

  • On the ML Ops page of the Canvas application, choose the Deployments tab and then choose Create deployment.

  • For JumpStart foundation models, go to the Ready-to-use models page of the Canvas application. Choose Generate, extract and summarize content. Then, find the JumpStart foundation model that you want to deploy. Choose the model, and on the model's chat page, choose the Deploy button.

All of these methods open the Deploy model side panel, where you specify the deployment configuration for your model. To deploy the model from this panel, do the following:

  1. (Optional) If you’re creating a deployment from the ML Ops page, you’ll have the option to Select model and version. Use the dropdown menus to select the model and model version that you want to deploy.

  2. Enter a name in the Deployment name field.

  3. (For JumpStart foundation models only) Choose a Deployment length. Select Indefinite to leave the endpoint active until you shut it down, or select Specify length and then enter the period of time for which you want the endpoint to remain active.

  4. For Instance type, SageMaker detects a default instance type and number that is suitable for your model. However, you can change the instance type that you would like to use for hosting your model.

    Note

    If you run out of the instance quota for the chosen instance type on your Amazon account, you can request a quota increase. For more information about the default quotas and how to request an increase, see Amazon SageMaker endpoints and quotas in the Amazon General Reference guide.

  5. For Instance count, you can set the number of active instances that are used for your endpoint. SageMaker detects a default number that is suitable for your model, but you can change this number.

  6. When you’re ready to deploy your model, choose Deploy.

Your model should now be deployed to an endpoint. For information about how to view your deployment details or perform various actions, see the following sections.

View your deployments

You might want to check the status or details of a model deployment in Canvas. For example, if your deployment failed, you might want to check the details to troubleshoot.

You can view your Canvas model deployments from the Canvas application or from the Amazon SageMaker console.

To view deployment details from Canvas, choose one of the following procedures:

To view your deployment details from the ML Ops page, do the following:

  1. Open the SageMaker Canvas application.

  2. In the left navigation pane, choose ML Ops.

  3. Choose the Deployments tab.

  4. Choose your deployment by name from the list.

To view your deployment details from a model version’s page, do the following:

  1. In the SageMaker Canvas application, go to your model version’s details page.

  2. Choose the Deploy tab.

  3. In the Deployments section, which lists all of the deployment configurations associated with that model version, find your deployment.

  4. Choose the More options icon, and then select View details to open the details page.

The details page for your deployment opens, and you can view information such as the time of the most recent prediction, the endpoint’s status and configuration, and the model version that is currently deployed to the endpoint.

You can also view your currently active Canvas workspace instances and active endpoints from the SageMaker dashboard in the SageMaker console. Your Canvas endpoints are listed alongside any other SageMaker Hosting endpoints that you’ve created, and you can filter them by searching for endpoints with the Canvas tag.
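The tag-based filtering described above can also be sketched programmatically. In the following function, the tag key check is an assumption (inspect the tags on one of your own Canvas endpoints for the exact key), pagination is omitted for brevity, and the client is passed in so the example runs here with a stub in place of boto3.client("sagemaker"):

```python
# Sketch: list SageMaker endpoints and keep those carrying a Canvas tag.
# The tag-key check below is an assumption -- verify the exact tag on one
# of your own Canvas endpoints. Pagination is omitted for brevity.
def list_canvas_endpoints(sm_client):
    canvas_endpoints = []
    for endpoint in sm_client.list_endpoints()["Endpoints"]:
        tags = sm_client.list_tags(ResourceArn=endpoint["EndpointArn"])["Tags"]
        if any("canvas" in tag["Key"].lower() for tag in tags):  # assumed tag naming
            canvas_endpoints.append(endpoint["EndpointName"])
    return canvas_endpoints

# Stub standing in for boto3.client("sagemaker"), for illustration only:
class StubSageMaker:
    def list_endpoints(self):
        return {"Endpoints": [
            {"EndpointName": "canvas-model-1", "EndpointArn": "arn:canvas-model-1"},
            {"EndpointName": "other-model", "EndpointArn": "arn:other-model"},
        ]}

    def list_tags(self, ResourceArn):
        if ResourceArn == "arn:canvas-model-1":
            return {"Tags": [{"Key": "sagemaker:canvas-app", "Value": "true"}]}
        return {"Tags": []}

print(list_canvas_endpoints(StubSageMaker()))  # → ['canvas-model-1']
```

In practice you would pass a real boto3 SageMaker client instead of the stub.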

The following screenshot shows the SageMaker dashboard. In the Canvas section, you can see that one workspace instance is in service and four endpoints are active.

Screenshot of the SageMaker dashboard showing the active Canvas workspace instances and endpoints.

Update a deployment configuration

You can also update your deployment configuration. For example, you can deploy an updated model version to the endpoint, or you can update the instance type or number of instances behind the endpoint based on your capacity needs.

There are several different ways for you to update your deployment from the Canvas application. You can use any of the following methods:

  • On the ML Ops page of the Canvas application, you can choose the Deployments tab and select the deployment that you want to update. Then, choose Update configuration.

  • When on the details page for a model version, on the Deploy tab, you can view the deployments for that version. Next to the deployment, choose the More options icon and then choose Update configuration.

Both of the preceding methods open the Update configuration side panel, where you can make changes to your deployment configuration. To update the configuration, do the following:

  1. For the Select version dropdown menu, you can select a different model version to deploy to the endpoint.

    Note

    When updating a deployment configuration, you can only choose a different model version to deploy. To deploy a different model, create a new deployment.

  2. For Instance type, you can select a different instance type for hosting your model.

  3. For Instance count, you can change the number of active instances that are used for your endpoint.

  4. Choose Save.

Your deployment configuration should now be updated.

Test your deployment

You can test your deployment by invoking the endpoint, or making single prediction requests, through the Canvas application. You can use this functionality to confirm that your endpoint responds to requests before invoking your endpoint programmatically in a production environment.

Test a custom model deployment

You can test a custom model deployment by accessing it through the ML Ops page and making a single invocation, which returns a prediction along with the probability that the prediction is correct.

Note

Execution length is an estimate of the time taken to invoke and get a response from the endpoint in Canvas. For detailed latency metrics, see SageMaker Endpoint Invocation Metrics.

To test your endpoint through the Canvas application, do the following:

  1. Open the SageMaker Canvas application.

  2. In the left navigation pane, choose ML Ops.

  3. Choose the Deployments tab.

  4. From the list of deployments, choose the one with the endpoint that you want to invoke.

  5. On the deployment’s details page, choose the Test deployment tab.

  6. On the deployment testing page, you can modify the Value fields to specify a new data point. For time series forecasting models, you specify the Item ID for which you want to make a forecast.

  7. After modifying the values, choose Update to get the prediction result.

The prediction loads, along with the Invocation result fields, which indicate whether the invocation was successful and how long the request took to process.

The following screenshot shows a prediction performed in the Canvas application on the Test deployment tab.

The Canvas application showing a test prediction for a deployed model.

For all model types except numeric prediction and time series forecasting, the prediction returns the following fields:

  • predicted_label – the predicted output

  • probability – the probability that the predicted label is correct

  • labels – the list of all the possible labels

  • probabilities – the probabilities corresponding to each label (the order of this list matches the order of the labels)

For numeric prediction models, the prediction only contains the score field, which is the predicted output of the model, such as the predicted price of a house.

For time series forecasting models, the prediction is a graph showing the forecasts by quantile. You can choose Schema view to see the forecasted numeric values for each quantile.
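As an illustration of these fields, the following sketch parses a hypothetical categorical-model response body; the exact envelope your endpoint returns may differ from this hand-written example:

```python
import json

# Hypothetical response body for a categorical model, matching the fields
# listed above. The exact envelope your endpoint returns may differ.
raw_body = (
    b'{"predicted_label": "yes", "probability": 0.87,'
    b' "labels": ["no", "yes"], "probabilities": [0.13, 0.87]}'
)

prediction = json.loads(raw_body)

# probabilities is ordered to match labels, so pair them to find the top label:
best_label, best_prob = max(
    zip(prediction["labels"], prediction["probabilities"]),
    key=lambda pair: pair[1],
)
print(best_label, best_prob)  # → yes 0.87
```

For a numeric prediction model, you would instead read the single score field from the parsed body.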

You can continue making single predictions through the deployment testing page, or you can see the following Invoke your endpoint section to learn how to invoke your endpoint programmatically from your applications.

Test a JumpStart foundation model deployment

You can chat with a deployed JumpStart foundation model through the Canvas application to test its functionality before invoking it through code.

To chat with a deployed JumpStart foundation model, do the following:

  1. Open the SageMaker Canvas application.

  2. In the left navigation pane, choose ML Ops.

  3. Choose the Deployments tab.

  4. From the list of deployments, find the one that you want to invoke and choose its More options icon.

  5. From the context menu, choose Test deployment.

  6. A new Generate, extract and summarize content chat opens with the JumpStart foundation model, and you can begin typing prompts. Note that prompts from this chat are sent as requests to your SageMaker Hosting endpoint.

Invoke your endpoint

After testing your deployment, you can use your endpoint in production with your applications by invoking the endpoint programmatically, the same way that you invoke any other SageMaker real-time endpoint. Invoking an endpoint programmatically returns a response object that contains the same fields described in the preceding Test your deployment section.

For more detailed information about how to programmatically invoke endpoints, see Invoke models for real-time inference.
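The response object's Body field is a streaming object that you read once to get the raw bytes. A minimal sketch, with the stream simulated by io.BytesIO so it runs without a live endpoint:

```python
import io
import json

# invoke_endpoint returns a dict whose "Body" value is a streaming object;
# read it once to get the raw bytes. io.BytesIO stands in for the stream
# here so the sketch runs without a live endpoint.
response = {"Body": io.BytesIO(b'{"score": 231000.5}')}

payload = json.loads(response["Body"].read().decode("utf-8"))
print(payload["score"])  # → 231000.5
```

Note that the stream can only be read once; store the result if you need the payload more than once.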

The following Python examples show you how to invoke your endpoint based on the model type.

The following example shows you how to invoke a JumpStart foundation model that you've deployed to an endpoint.

import boto3
import pandas as pd

client = boto3.client("runtime.sagemaker")

body = pd.DataFrame(
    [['feature_column1', 'feature_column2'],
     ['feature_column1', 'feature_column2']]
).to_csv(header=False, index=False).encode("utf-8")

response = client.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="text/csv",
    Body=body,
    Accept="application/json"
)

The following example shows you how to invoke numeric or categorical prediction models.

import boto3
import pandas as pd

client = boto3.client("runtime.sagemaker")

body = pd.DataFrame(
    [['feature_column1', 'feature_column2'],
     ['feature_column1', 'feature_column2']]
).to_csv(header=False, index=False).encode("utf-8")

response = client.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="text/csv",
    Body=body,
    Accept="application/json"
)

The following example shows you how to invoke time series forecasting models. For a complete example of how to test invoke a time series forecasting model, see Time-Series Forecasting with Amazon SageMaker Autopilot.

import boto3
import pandas as pd

csv_path = './real-time-payload.csv'
data = pd.read_csv(csv_path)

client = boto3.client("runtime.sagemaker")
body = data.to_csv(index=False).encode("utf-8")

response = client.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="text/csv",
    Body=body,
    Accept="application/json"
)

The following example shows you how to invoke image prediction models.

import boto3

client = boto3.client("runtime.sagemaker")

with open("example_image.jpg", "rb") as file:
    body = file.read()

response = client.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="application/x-image",
    Body=body,
    Accept="application/json"
)

The following example shows you how to invoke text prediction models.

import boto3
import pandas as pd

client = boto3.client("runtime.sagemaker")

body = pd.DataFrame(
    [["Example text 1"], ["Example text 2"]]
).to_csv(header=False, index=False).encode("utf-8")

response = client.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="text/csv",
    Body=body,
    Accept="application/json"
)

Delete a model deployment

You can delete your model deployment from the Canvas application. This action also deletes the endpoint from the SageMaker console and shuts down any endpoint-related resources.

Note

Optionally, you can delete your endpoint through the SageMaker console or using the SageMaker DeleteEndpoint API. For more information, see Delete Endpoints and Resources. However, when you delete the endpoint through the SageMaker console or APIs instead of the Canvas application, the list of deployments in Canvas isn’t automatically updated. You must also delete the deployment from the Canvas application to remove it from the list.
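The DeleteEndpoint call mentioned in the note can be sketched as follows. The client is injected so the example runs here with a stub; in practice you would pass boto3.client("sagemaker") and one of your real endpoint names:

```python
# Sketch of the DeleteEndpoint call. The client is injected so the example
# runs with a stub; pass boto3.client("sagemaker") in practice.
def delete_canvas_endpoint(sm_client, endpoint_name):
    # Note: deleting the endpoint outside Canvas does not remove the
    # deployment entry from the Canvas application's deployments list.
    sm_client.delete_endpoint(EndpointName=endpoint_name)

# Stub standing in for the real SageMaker client, for illustration only:
class StubSageMaker:
    def __init__(self):
        self.deleted = []

    def delete_endpoint(self, EndpointName):
        self.deleted.append(EndpointName)

stub = StubSageMaker()
delete_canvas_endpoint(stub, "my-canvas-endpoint")
print(stub.deleted)  # → ['my-canvas-endpoint']
```

After deleting the endpoint this way, remember to also delete the deployment from the Canvas application so the deployments list stays accurate.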

To delete a deployment in Canvas, do the following:

  1. Open the SageMaker Canvas application.

  2. In the left navigation pane, choose ML Ops.

  3. Choose the Deployments tab.

  4. From the list of deployments, choose the one that you want to delete.

  5. At the top of the deployment details page, choose the More options icon.

  6. Choose Delete deployment.

  7. In the Delete deployment dialog box, choose Delete.

Your deployment and SageMaker Hosting endpoint should now be deleted from both Canvas and the SageMaker console.