SageMaker Autopilot - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

SageMaker Autopilot

Important

As of November 30, 2023, Autopilot's UI is migrating to Amazon SageMaker Canvas as part of the updated Amazon SageMaker Studio experience. SageMaker Canvas provides data scientists with no-code capabilities for tasks such as data preparation, feature engineering, algorithm selection, training and tuning, inference, continuous model monitoring, and more. SageMaker Canvas supports a variety of use cases, including computer vision, demand forecasting, intelligent search, and generative AI.

Users of Amazon SageMaker Studio Classic, the previous experience of Studio, can continue using the Autopilot UI in Studio Classic. Users with coding experience can continue using all API references in any supported SDK for technical implementation.

If you have been using Autopilot in Studio Classic until now and want to migrate to SageMaker Canvas, you might have to grant additional permissions to your user profile or IAM role so that you can create and use the SageMaker Canvas application. For more information, see Migrate from Autopilot in Studio Classic to SageMaker Canvas.

All UI-related instructions in this guide pertain to Autopilot's standalone features before migrating to Amazon SageMaker Canvas. Users following these instructions should use Studio Classic.

Amazon SageMaker Autopilot is a feature set that simplifies and accelerates various stages of the machine learning workflow by automating the process of building and deploying machine learning models (AutoML).

Autopilot performs the following key tasks that you can use on autopilot or with various degrees of human guidance:

  • Data analysis and preprocessing: Autopilot identifies your specific problem type, handles missing values, normalizes your data, selects features, and overall prepares the data for model training.

  • Model selection: Autopilot explores a variety of algorithms and uses a cross-validation resampling technique to generate metrics that evaluate the predictive quality of the algorithms based on predefined objective metrics.

  • Hyperparameter optimization: Autopilot automates the search for optimal hyperparameter configurations.

  • Model training and evaluation: Autopilot automates the process of training and evaluating various model candidates. It splits the data into training and validation sets, trains the selected model candidates using the training data, and evaluates their performance on the unseen data of the validation set. Lastly, it ranks the optimized model candidates based on their performance and identifies the best performing model.

  • Model deployment: Once Autopilot has identified the best performing model, it provides the option to deploy the model automatically by generating the model artifacts and the endpoint exposing an API. External applications can send data to the endpoint and receive the corresponding predictions or inferences.

Autopilot supports building machine learning models on large datasets up to hundreds of GBs.

The following diagram outlines the tasks of this AutoML process managed by Autopilot.


      Overview of Amazon SageMaker Autopilot AutoML process.

Depending on your comfort level with the machine learning process and coding experience, you can use Autopilot in different ways:

  • Using the Studio Classic UI, users can choose between a no-code experience or have some level of human input.

    Note

    Only experiments created from tabular data for problem types such as regression or classification are available via the Studio Classic UI.

  • Using the AutoML API, users with coding experience can use available SDKs to create AutoML jobs. This approach provides greater flexibility and customization options and is available for all problem types.

Autopilot currently supports the following problem types:

Note

For regression or classification problems involving tabular data, users can choose between two options: using the Studio Classic user interface or the API Reference.

Tasks such as text and image classification, time-series forecasting, and fine-tuning of large language models are exclusively available through the version 2 of the AutoML REST API. If your language of choice is Python, you can refer to Amazon SDK for Python (Boto3) or the AutoMLV2 object of the Amazon SageMaker Python SDK directly.

Users who prefer the convenience of a user interface can use Amazon SageMaker Canvas to access pre-trained models and generative AI foundation models, or create custom models tailored for specific text, image classification, forecasting needs, or generative AI.

Additionally, Autopilot helps users understand how models make predictions by automatically generating reports that show the importance of each individual feature. This provides transparency and insights into the factors influencing the predictions, which can be used by risk and compliance teams and external regulators. Autopilot also provides a model performance report, which encompasses a summary of evaluation metrics, a confusion matrix, various visualizations such as receiver operating characteristic curves and precision-recall curves, and more. The specific content of each report vary depending on the problem type of the Autopilot experiment.

The explainability and performance reports for the best model candidate in an Autopilot experiment are available for text, image, and tabular data classification problem types.

For tabular data use cases such as regression or classification, Autopilot offers additional visibility into how the data was wrangled and how the model candidates were selected, trained, and tuned by generating notebooks that contain the code used to explore the data and find the best performing model. These notebooks provide an interactive and exploratory environment to help you learn about the impact of various inputs or the trade-offs made in the experiments. You can experiment further with the higher performing model candidate by making your own modifications to the data exploration and candidate definition notebooks provided by Autopilot.

With Amazon SageMaker, you pay only for what you use. You pay for the underlying compute and storage resources within SageMaker or other Amazon services, based on your usage. For more information about the cost of using SageMaker, see Amazon SageMaker Pricing.