

# Amazon SageMaker notebook instances
<a name="nbi"></a>

An Amazon SageMaker notebook instance is a machine learning (ML) compute instance running the Jupyter Notebook application. One of the best ways for ML practitioners to use Amazon SageMaker AI is to train and deploy ML models using SageMaker notebook instances. A SageMaker notebook instance creates this environment by launching a Jupyter server on Amazon Elastic Compute Cloud (Amazon EC2) and providing preconfigured kernels with the following packages: the Amazon SageMaker Python SDK, the Amazon SDK for Python (Boto3), the Amazon Command Line Interface (Amazon CLI), Conda, pandas, deep learning framework libraries, and other libraries for data science and machine learning.

Use Jupyter notebooks in your notebook instance to:
+ prepare and process data
+ write code to train models
+ deploy models to SageMaker hosting
+ test or validate your models

For information about pricing for Amazon SageMaker notebook instances, see [Amazon SageMaker Pricing](https://www.amazonaws.cn/sagemaker/pricing/).

## Maintenance
<a name="nbi-maintenance"></a>

SageMaker AI updates the underlying software for Amazon SageMaker notebook instances at least once every 90 days. Some maintenance updates, such as operating system upgrades, might require your application to be taken offline for a short period of time. You can't perform any operations while the underlying software is being updated. We recommend that you restart your notebook instances at least once every 30 days so that they automatically consume patches.

If the notebook instance isn't updated and is running insecure software, SageMaker AI might periodically update the instance as part of regular maintenance. During these updates, data outside of the folder `/home/ec2-user/SageMaker` is not persisted.

For more information, contact [Amazon Web Services Support](http://www.amazonaws.cn/support-plans/).

## Machine Learning with the SageMaker Python SDK
<a name="gs-ml-with-sagemaker-pysdk"></a>

To train, validate, deploy, and evaluate an ML model in a SageMaker notebook instance, use the SageMaker Python SDK. The SageMaker Python SDK abstracts the Amazon SDK for Python (Boto3) and SageMaker API operations. It enables you to integrate with and orchestrate other Amazon services, such as Amazon Simple Storage Service (Amazon S3) for saving data and model artifacts, Amazon Elastic Container Registry (Amazon ECR) for importing and serving the ML models, and Amazon Elastic Compute Cloud (Amazon EC2) for training and inference.

You can also take advantage of SageMaker AI features that help you deal with every stage of a complete ML cycle: data labeling, data preprocessing, model training, model deployment, evaluation of prediction performance, and monitoring the quality of the model in production.

If you're a first-time SageMaker AI user, we recommend that you use the SageMaker Python SDK, following the end-to-end ML tutorial. To find the open source documentation, see the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

**Topics**
+ [Maintenance](#nbi-maintenance)
+ [Machine Learning with the SageMaker Python SDK](#gs-ml-with-sagemaker-pysdk)
+ [Tutorial for building models with Notebook Instances](gs-console.md)
+ [AL2023 notebook instances](nbi-al2023.md)
+ [Amazon Linux 2 notebook instances](nbi-al2.md)
+ [JupyterLab versioning](nbi-jl.md)
+ [Create an Amazon SageMaker notebook instance](howitworks-create-ws.md)
+ [Access Notebook Instances](howitworks-access-ws.md)
+ [Update a Notebook Instance](nbi-update.md)
+ [Customization of a SageMaker notebook instance using an LCC script](notebook-lifecycle-config.md)
+ [Set the Notebook Kernel](howitworks-set-kernel.md)
+ [Git repositories with SageMaker AI Notebook Instances](nbi-git-repo.md)
+ [Notebook Instance Metadata](nbi-metadata.md)
+ [Monitor Jupyter Logs in Amazon CloudWatch Logs](jupyter-logs.md)

# Tutorial for building models with Notebook Instances
<a name="gs-console"></a>

This Get Started tutorial walks you through creating a SageMaker notebook instance, opening a Jupyter notebook with a kernel preconfigured with a Conda environment for machine learning, and starting a SageMaker AI session to run an end-to-end ML cycle. You'll learn how to save a dataset to a default Amazon S3 bucket automatically paired with the SageMaker AI session, submit a training job for an ML model to Amazon EC2, and deploy the trained model for prediction by hosting or batch inferencing through Amazon EC2.

This tutorial explicitly shows a complete ML flow of training the XGBoost model from the SageMaker AI built-in model pool. You use the [US Adult Census dataset](https://archive.ics.uci.edu/ml/datasets/adult), and you evaluate the performance of the trained SageMaker AI XGBoost model on predicting individuals' income.
+ [SageMaker AI XGBoost](https://docs.amazonaws.cn/sagemaker/latest/dg/xgboost.html) – The [XGBoost](https://xgboost.readthedocs.io/en/latest/) model is adapted to the SageMaker AI environment and preconfigured as Docker containers. SageMaker AI provides a suite of [built-in algorithms](https://docs.amazonaws.cn/sagemaker/latest/dg/algos.html) that are prepared for using SageMaker AI features. To learn more about what ML algorithms are adapted to SageMaker AI, see [Choose an Algorithm](https://docs.amazonaws.cn/sagemaker/latest/dg/algorithms-choose.html) and [Use Amazon SageMaker Built-in Algorithms](https://docs.amazonaws.cn/sagemaker/latest/dg/algos.html). For the SageMaker AI built-in algorithm API operations, see [First-Party Algorithms](https://sagemaker.readthedocs.io/en/stable/algorithms/index.html) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).
+ [Adult Census dataset](https://archive.ics.uci.edu/ml/datasets/adult) – The dataset from the [1994 Census bureau database](http://www.census.gov/en.html) by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). The SageMaker AI XGBoost model is trained using this dataset to predict whether an individual makes over \$50,000 a year.

**Topics**
+ [Create an Amazon SageMaker Notebook Instance for the tutorial](gs-setup-working-env.md)
+ [Create a Jupyter notebook in the SageMaker notebook instance](ex1-prepare.md)
+ [Prepare a dataset](ex1-preprocess-data.md)
+ [Train a Model](ex1-train-model.md)
+ [Deploy the model to Amazon EC2](ex1-model-deployment.md)
+ [Evaluate the model](ex1-test-model.md)
+ [Clean up Amazon SageMaker notebook instance resources](ex1-cleanup.md)

# Create an Amazon SageMaker Notebook Instance for the tutorial
<a name="gs-setup-working-env"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[Amazon managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

An Amazon SageMaker notebook instance is a fully managed machine learning (ML) Amazon Elastic Compute Cloud (Amazon EC2) compute instance. An Amazon SageMaker notebook instance runs the Jupyter Notebook application. Use the notebook instance to create and manage Jupyter notebooks for preprocessing data, training ML models, and deploying ML models.

**To create a SageMaker notebook instance**  
![\[Animated screenshot that shows how to create a SageMaker notebook instance.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/get-started-ni/gs-ni-create-instance.gif)

1. Open the Amazon SageMaker AI console at [https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/).

1. Choose **Notebook instances**, and then choose **Create notebook instance**.

1. On the **Create notebook instance** page, provide the following information (if a field is not mentioned, leave the default values):

   1. For **Notebook instance name**, type a name for your notebook instance.

   1. For **Notebook Instance type**, choose `ml.t2.medium`. This is the least expensive instance type that notebook instances support, and it is sufficient for this exercise. If an `ml.t2.medium` instance type isn't available in your current Amazon Region, choose `ml.t3.medium`.

   1. For **Platform Identifier**, choose a platform type to create the notebook instance on. This platform type defines the operating system and the JupyterLab version that your notebook instance is created with. The latest and recommended version is `notebook-al2023-v1`, for an Amazon Linux 2023 notebook instance. For information about platform identifier types, see [AL2023 notebook instances](nbi-al2023.md) and [Amazon Linux 2 notebook instances](nbi-al2.md). For information about JupyterLab versions, see [JupyterLab versioning](nbi-jl.md).

   1. For **IAM role**, choose **Create a new role**, and then choose **Create role**. This IAM role automatically gets permissions to access any S3 bucket that has `sagemaker` in the name. It gets these permissions through the `AmazonSageMakerFullAccess` policy, which SageMaker AI attaches to the role. 
**Note**  
If you want to grant the IAM role permission to access S3 buckets without `sagemaker` in the name, attach the `AmazonS3FullAccess` policy. You can also limit the role's permissions to specific S3 buckets. For more information and examples of adding bucket policies to the IAM role, see [Bucket Policy Examples](https://docs.amazonaws.cn/AmazonS3/latest/userguide/example-bucket-policies.html).

   1. Choose **Create notebook instance**. 

      In a few minutes, SageMaker AI launches a notebook instance and attaches a 5 GB Amazon EBS storage volume to it. The notebook instance has a preconfigured Jupyter notebook server, the SageMaker AI and Amazon SDK libraries, and a set of Anaconda libraries.

      For more information about creating a SageMaker notebook instance, see [Create a Notebook Instance](https://docs.amazonaws.cn/sagemaker/latest/dg/howitworks-create-ws.html). 
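If you prefer to script this step instead of using the console, the console settings above correspond to parameters of the `CreateNotebookInstance` API operation. The following is a minimal sketch that only assembles the request; the notebook name and role ARN are placeholder values, and the commented-out Boto3 call requires valid credentials.

```python
# Hedged sketch: build the parameters for the CreateNotebookInstance API call.
# The keys below (NotebookInstanceName, InstanceType, RoleArn, VolumeSizeInGB,
# PlatformIdentifier) follow the SageMaker API; the values are placeholders.
def build_create_notebook_instance_request(
    name,
    role_arn,
    instance_type="ml.t2.medium",
    volume_size_gb=5,
    platform_identifier="notebook-al2023-v1",
):
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        "VolumeSizeInGB": volume_size_gb,
        "PlatformIdentifier": platform_identifier,
    }

request = build_create_notebook_instance_request(
    "my-tutorial-notebook",
    "arn:aws-cn:iam::123456789012:role/service-role/ExampleSageMakerRole",
)
# To actually submit the request (requires credentials):
# boto3.client("sagemaker").create_notebook_instance(**request)
print(request["InstanceType"])  # ml.t2.medium
```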

## (Optional) Change SageMaker Notebook Instance Settings
<a name="gs-change-ni-settings"></a>

To change the ML compute instance type or the size of the Amazon EBS storage of a SageMaker AI notebook instance, edit the notebook instance settings.

**To change and update the SageMaker Notebook instance type and the EBS volume**

1. On the **Notebook instances** page in the SageMaker AI console, choose your notebook instance.

1. Choose **Actions**, choose **Stop**, and then wait until the notebook instance fully stops.

1. After the notebook instance status changes to **Stopped**, choose **Actions**, and then choose **Update settings**.

   1. For **Notebook instance type**, choose a different ML instance type.

   1. For **Volume size in GB**, type a different integer to specify a new EBS volume size.
**Note**  
EBS storage volumes are encrypted, so SageMaker AI can't determine the amount of available free space on the volume. Because of this, you can increase the volume size when you update a notebook instance, but you can't decrease the volume size. If you want to decrease the size of the ML storage volume in use, create a new notebook instance with the desired size. 

1. At the bottom of the page, choose **Update notebook instance**. 

1. When the update is complete, **Start** the notebook instance with the new settings.

For more information about updating SageMaker notebook instance settings, see [Update a Notebook Instance](https://docs.amazonaws.cn/sagemaker/latest/dg/nbi-update.html). 
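The console flow above can also be sketched in code. The following minimal example (hypothetical names and sizes) assembles the parameters for the `UpdateNotebookInstance` API operation, which requires the instance to be stopped first, and encodes the documented constraint that the EBS volume size can be increased but never decreased.

```python
# Hedged sketch: guard and build an UpdateNotebookInstance request.
def validate_volume_update(current_gb, new_gb):
    # EBS volumes on notebook instances can only grow; to shrink,
    # create a new notebook instance with the desired size.
    if new_gb < current_gb:
        raise ValueError(
            "EBS volume size can only be increased; "
            "create a new notebook instance to use a smaller volume."
        )
    return new_gb

def build_update_request(name, instance_type, current_gb, new_gb):
    # Keys follow the SageMaker UpdateNotebookInstance API; the instance
    # must be in the Stopped state before this request is submitted.
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "VolumeSizeInGB": validate_volume_update(current_gb, new_gb),
    }

print(build_update_request("my-tutorial-notebook", "ml.t3.medium", 5, 10))
```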

## (Optional) Advanced Settings for SageMaker Notebook Instances
<a name="gs-ni-advanced-settings"></a>

The following tutorial video shows how to set up and use SageMaker notebook instances through the SageMaker AI console. It includes advanced options, such as SageMaker AI lifecycle configuration and importing GitHub repositories. (Length: 26:04)

[![AWS Videos](http://img.youtube.com/vi/X5CLunIzj3U/0.jpg)](https://www.youtube.com/watch?v=X5CLunIzj3U)


For complete documentation about SageMaker notebook instances, see [Use Amazon SageMaker Notebook Instances](https://docs.amazonaws.cn/sagemaker/latest/dg/nbi.html).

# Create a Jupyter notebook in the SageMaker notebook instance
<a name="ex1-prepare"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[Amazon managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

To start scripting for training and deploying your model, create a Jupyter notebook in the SageMaker notebook instance. Using the Jupyter notebook, you can run machine learning (ML) experiments for training and inference while using SageMaker AI features and the Amazon infrastructure.

**To create a Jupyter notebook**

1. Open the notebook instance as follows:

   1. Sign in to the SageMaker AI console at [https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/).

   1. On the **Notebook instances** page, open your notebook instance by choosing either:
      + **Open JupyterLab** for the JupyterLab interface
      + **Open Jupyter** for the classic Jupyter view
**Note**  
If the notebook instance status shows **Pending** in the **Status** column, your notebook instance is still being created. The status will change to **InService** when the notebook instance is ready to use. 

1. Create a notebook as follows: 
   + If you opened the notebook in the JupyterLab view, on the **File** menu, choose **New**, and then choose **Notebook**. For **Select Kernel**, choose **conda_python3**. This preinstalled environment includes the default Anaconda installation and Python 3.
   + If you opened the notebook in the classic Jupyter view, on the **Files** tab, choose **New**, and then choose **conda_python3**. This preinstalled environment includes the default Anaconda installation and Python 3.

1. Save the notebook as follows:
   + In the JupyterLab view, choose **File**, choose **Save Notebook As...**, and then rename the notebook.
   + In the Jupyter classic view, choose **File**, choose **Save as...**, and then rename the notebook.
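If you script access to notebook instances, the **Pending** to **InService** wait described above can be automated by polling. The following sketch accepts any status callable so it runs without Amazon access; in a real script, `get_status` would wrap the `DescribeNotebookInstance` API operation through Boto3.

```python
# Hedged sketch: poll a status callable until the instance is InService.
import time

def wait_for_in_service(get_status, poll_seconds=0, max_polls=60):
    for _ in range(max_polls):
        status = get_status()
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("Notebook instance failed to start")
        time.sleep(poll_seconds)
    raise TimeoutError("Notebook instance did not reach InService")

# Simulate the console status sequence Pending -> Pending -> InService:
statuses = iter(["Pending", "Pending", "InService"])
print(wait_for_in_service(lambda: next(statuses)))  # InService
```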

# Prepare a dataset
<a name="ex1-preprocess-data"></a>

In this step, you load the [Adult Census dataset](https://archive.ics.uci.edu/ml/datasets/adult) to your notebook instance using the SHAP (SHapley Additive exPlanations) library, review the dataset, transform it, and upload it to Amazon S3. SHAP is a game-theoretic approach to explaining the output of any machine learning model. For more information about SHAP, see [Welcome to the SHAP documentation](https://shap.readthedocs.io/en/latest/).

To run the following example, paste the sample code into a cell in your notebook instance.

## Load Adult Census Dataset Using SHAP
<a name="ex1-preprocess-data-pull-data"></a>

Using the SHAP library, import the Adult Census dataset as shown following:

```
import shap
X, y = shap.datasets.adult()
X_display, y_display = shap.datasets.adult(display=True)
feature_names = list(X.columns)
feature_names
```

**Note**  
If the current Jupyter kernel does not have the SHAP library, install it by running the following `conda` command:  

```
%conda install -c conda-forge shap
```
If you're using JupyterLab, you must manually refresh the kernel after the installation and updates have completed. Run the following IPython script to shut down the kernel (the kernel will restart automatically):  

```
import IPython
IPython.Application.instance().kernel.do_shutdown(True)
```

The `feature_names` list object should return the following list of features: 

```
['Age',
 'Workclass',
 'Education-Num',
 'Marital Status',
 'Occupation',
 'Relationship',
 'Race',
 'Sex',
 'Capital Gain',
 'Capital Loss',
 'Hours per week',
 'Country']
```

**Tip**  
If you're starting with unlabeled data, you can use Amazon SageMaker Ground Truth to create a data labeling workflow in minutes. To learn more, see [Label Data](https://docs.amazonaws.cn/sagemaker/latest/dg/data-label.html). 

## Overview of the Dataset
<a name="ex1-preprocess-data-inspect"></a>

Run the following script to display the statistical overview of the dataset and histograms of the numeric features.

```
display(X.describe())
hist = X.hist(bins=30, sharey=True, figsize=(20, 10))
```

![\[Overview of the Adult Census dataset.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/get-started-ni/gs-ni-prepare-data-1.png)


**Tip**  
If you want to use a dataset that needs to be cleaned and transformed, you can simplify and streamline data preprocessing and feature engineering using Amazon SageMaker Data Wrangler. To learn more, see [Prepare ML Data with Amazon SageMaker Data Wrangler](https://docs.amazonaws.cn/sagemaker/latest/dg/data-wrangler.html).

## Split the Dataset into Train, Validation, and Test Datasets
<a name="ex1-preprocess-data-transform"></a>

Using Sklearn, split the dataset into a training set and a test set. The training set is used to train the model, while the test set is used to evaluate the performance of the final trained model. The dataset is randomly shuffled with a fixed random seed and split: 80 percent of the dataset goes to the training set and 20 percent to the test set.

```
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train_display = X_display.loc[X_train.index]
```

Split the training set to separate out a validation set. The validation set is used to evaluate the performance of the trained model while tuning the model's hyperparameters. 75 percent of the training set becomes the final training set, and the rest is the validation set.

```
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)
X_train_display = X_display.loc[X_train.index]
X_val_display = X_display.loc[X_val.index]
```
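Together, the two chained splits leave 60 percent of the data for training, 20 percent for validation, and 20 percent for testing, because 0.25 of the remaining 80 percent is 20 percent of the whole. The following self-contained check (run on a dummy index, not the Census data) confirms the proportions:

```python
# Verify the 60/20/20 split on 100 dummy rows.
from sklearn.model_selection import train_test_split

rows = list(range(100))
# First split: 20 percent held out for the test set.
train_rows, test_rows = train_test_split(rows, test_size=0.2, random_state=1)
# Second split: 25 percent of the remaining 80 percent becomes validation.
train_rows, val_rows = train_test_split(train_rows, test_size=0.25, random_state=1)
print(len(train_rows), len(val_rows), len(test_rows))  # 60 20 20
```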

Using the pandas package, explicitly align each dataset by concatenating the numeric features with the true labels.

```
import pandas as pd
train = pd.concat([pd.Series(y_train, index=X_train.index,
                             name='Income>50K', dtype=int), X_train], axis=1)
validation = pd.concat([pd.Series(y_val, index=X_val.index,
                            name='Income>50K', dtype=int), X_val], axis=1)
test = pd.concat([pd.Series(y_test, index=X_test.index,
                            name='Income>50K', dtype=int), X_test], axis=1)
```
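Listing the label Series first in `pd.concat` places the true label in the first column of each dataset, which is the layout the XGBoost training input expects later in this tutorial. The following toy example (hypothetical values, not the Census data) illustrates the alignment:

```python
# Toy check: the label Series listed first becomes the first column,
# aligned to the feature rows by the shared index.
import pandas as pd

X_toy = pd.DataFrame({"Age": [25, 40], "Hours per week": [35, 50]}, index=[10, 11])
y_toy = pd.Series([0, 1], index=[10, 11], name="Income>50K", dtype=int)
toy = pd.concat([y_toy, X_toy], axis=1)
print(list(toy.columns))  # ['Income>50K', 'Age', 'Hours per week']
```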

Check if the dataset is split and structured as expected:

```
train
```

![\[The example training dataset.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/get-started-ni/gs-ni-prepare-data-2-train.png)


```
validation
```

![\[The example validation dataset.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/get-started-ni/gs-ni-prepare-data-2-validation.png)


```
test
```

![\[The example test dataset.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/get-started-ni/gs-ni-prepare-data-2-test.png)


## Convert the Train and Validation Datasets to CSV Files
<a name="ex1-preprocess-data-transform-2"></a>

Convert the `train` and `validation` dataframe objects to CSV files to match the input file format for the XGBoost algorithm.

```
# Use 'csv' format to store the data
# The first column is expected to be the output column
train.to_csv('train.csv', index=False, header=False)
validation.to_csv('validation.csv', index=False, header=False)
```
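The following self-contained check (a toy frame, not the tutorial datasets) confirms the resulting CSV layout: no header row, no index column, and the label in the first field of each line.

```python
# Toy check of the CSV layout produced by to_csv(index=False, header=False).
import io
import pandas as pd

toy = pd.DataFrame({"Income>50K": [1, 0], "Age": [40, 25], "Hours": [50, 35]})
buf = io.StringIO()
toy.to_csv(buf, index=False, header=False)
print(buf.getvalue().splitlines())  # ['1,40,50', '0,25,35']
```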

## Upload the Datasets to Amazon S3
<a name="ex1-preprocess-data-transform-4"></a>

Using the SageMaker Python SDK and Boto3, upload the training and validation datasets to the default Amazon S3 bucket. The datasets in the S3 bucket will be used by a compute-optimized SageMaker AI instance on Amazon EC2 for training.

The following code sets up the default S3 bucket URI for your current SageMaker AI session, creates a new `demo-sagemaker-xgboost-adult-income-prediction` folder, and uploads the training and validation datasets to the `data` subfolder.

```
import sagemaker, boto3, os
bucket = sagemaker.Session().default_bucket()
prefix = "demo-sagemaker-xgboost-adult-income-prediction"

boto3.Session().resource('s3').Bucket(bucket).Object(
    os.path.join(prefix, 'data/train.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(
    os.path.join(prefix, 'data/validation.csv')).upload_file('validation.csv')
```

Run the following Amazon CLI command to check whether the CSV files were successfully uploaded to the S3 bucket.

```
! aws s3 ls {bucket}/{prefix}/data --recursive
```

This should return the following output:

![\[Output of the CLI command to check the datasets in the S3 bucket.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/get-started-ni/gs-ni-prepare-data-3.png)


# Train a Model
<a name="ex1-train-model"></a>

In this step, you choose a training algorithm and run a training job for the model. The [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) provides framework estimators and generic estimators to train your model while orchestrating the machine learning (ML) lifecycle, accessing the SageMaker AI features for training and the Amazon infrastructure, such as Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic Compute Cloud (Amazon EC2), and Amazon Simple Storage Service (Amazon S3). For more information about SageMaker AI built-in framework estimators, see [Frameworks](https://sagemaker.readthedocs.io/en/stable/frameworks/index.html) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) documentation. For more information about built-in algorithms, see [Built-in algorithms and pretrained models in Amazon SageMaker](algos.md).

**Topics**
+ [Choose the Training Algorithm](#ex1-train-model-select-algorithm)
+ [Create and Run a Training Job](#ex1-train-model-sdk)

## Choose the Training Algorithm
<a name="ex1-train-model-select-algorithm"></a>

To choose the right algorithm for your dataset, you typically need to evaluate different models to find the one most suitable for your data. For simplicity, this tutorial uses the [XGBoost algorithm with Amazon SageMaker AI](xgboost.md) built-in algorithm throughout, without pre-evaluating models.

**Tip**  
If you want SageMaker AI to find an appropriate model for your tabular dataset, use Amazon SageMaker Autopilot, which automates a machine learning solution. For more information, see [SageMaker Autopilot](autopilot-automate-model-development.md).

## Create and Run a Training Job
<a name="ex1-train-model-sdk"></a>

After you have decided which model to use, start constructing a SageMaker AI estimator for training. This tutorial uses the XGBoost built-in algorithm with the SageMaker AI generic estimator.

**To run a model training job**

1. Import the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) and start by retrieving the basic information from your current SageMaker AI session.

   ```
   import sagemaker
   
   region = sagemaker.Session().boto_region_name
   print("Amazon Region: {}".format(region))
   
   role = sagemaker.get_execution_role()
   print("RoleArn: {}".format(role))
   ```

   This returns the following information:
   + `region` – The current Amazon Region where the SageMaker AI notebook instance is running.
   + `role` – The IAM role used by the notebook instance.
**Note**  
Check the SageMaker Python SDK version by running `sagemaker.__version__`. This tutorial is based on `sagemaker>=2.20`. If the SDK is outdated, install the latest version by running the following command:   

   ```
   ! pip install -qU sagemaker
   ```
If you run this installation in your existing SageMaker Studio or notebook instance, you need to manually refresh the kernel to finish applying the version update.

1. Create an XGBoost estimator using the `sagemaker.estimator.Estimator` class. In the following example code, the XGBoost estimator is named `xgb_model`.

   ```
   from sagemaker.debugger import Rule, ProfilerRule, rule_configs
   from sagemaker.session import TrainingInput
   
   s3_output_location='s3://{}/{}/{}'.format(bucket, prefix, 'xgboost_model')
   
   container=sagemaker.image_uris.retrieve("xgboost", region, "1.2-1")
   print(container)
   
   xgb_model=sagemaker.estimator.Estimator(
       image_uri=container,
       role=role,
       instance_count=1,
       instance_type='ml.m4.xlarge',
       volume_size=5,
       output_path=s3_output_location,
       sagemaker_session=sagemaker.Session(),
       rules=[
           Rule.sagemaker(rule_configs.create_xgboost_report()),
           ProfilerRule.sagemaker(rule_configs.ProfilerReport())
       ]
   )
   ```

   To construct the SageMaker AI estimator, specify the following parameters:
   + `image_uri` – Specify the training container image URI. In this example, the SageMaker AI XGBoost training container URI is specified using `sagemaker.image_uris.retrieve`.
   + `role` – The Amazon Identity and Access Management (IAM) role that SageMaker AI uses to perform tasks on your behalf (for example, reading training results, calling model artifacts from Amazon S3, and writing training results to Amazon S3).
   + `instance_count` and `instance_type` – The type and number of Amazon EC2 ML compute instances to use for model training. For this training exercise, you use a single `ml.m4.xlarge` instance, which has 4 vCPUs, 16 GB of memory, Amazon Elastic Block Store (Amazon EBS) storage, and high network performance. For more information about EC2 compute instance types, see [Amazon EC2 Instance Types](https://www.amazonaws.cn/ec2/instance-types/). For more information about billing, see [Amazon SageMaker pricing](https://www.amazonaws.cn/sagemaker/pricing/).
   + `volume_size` – The size, in GB, of the EBS storage volume to attach to the training instance. This must be large enough to store training data if you use `File` mode (`File` mode is on by default). If you don't specify this parameter, its value defaults to 30.
   + `output_path` – The path to the S3 bucket where SageMaker AI stores the model artifact and training results.
   + `sagemaker_session` – The session object that manages interactions with SageMaker API operations and other Amazon services that the training job uses.
   + `rules` – Specify a list of SageMaker Debugger built-in rules. In this example, the `create_xgboost_report()` rule creates an XGBoost report that provides insights into the training progress and results, and the `ProfilerReport()` rule creates a report regarding the EC2 compute resource utilization. For more information, see [SageMaker Debugger interactive report for XGBoost](debugger-report-xgboost.md).
**Tip**  
If you want to run distributed training of large sized deep learning models, such as convolutional neural networks (CNN) and natural language processing (NLP) models, use SageMaker AI Distributed for data parallelism or model parallelism. For more information, see [Distributed training in Amazon SageMaker AI](distributed-training.md).

1. Set the hyperparameters for the XGBoost algorithm by calling the `set_hyperparameters` method of the estimator. For a complete list of XGBoost hyperparameters, see [XGBoost hyperparameters](xgboost_hyperparameters.md).

   ```
   xgb_model.set_hyperparameters(
       max_depth = 5,
       eta = 0.2,
       gamma = 4,
       min_child_weight = 6,
       subsample = 0.7,
       objective = "binary:logistic",
       num_round = 1000
   )
   ```
**Tip**  
You can also tune the hyperparameters using the SageMaker AI hyperparameter optimization feature. For more information, see [Automatic model tuning with SageMaker AI](automatic-model-tuning.md). 

1. Use the `TrainingInput` class to configure a data input flow for training. The following example code shows how to configure `TrainingInput` objects to use the training and validation datasets you uploaded to Amazon S3 in the [Split the Dataset into Train, Validation, and Test Datasets](ex1-preprocess-data.md#ex1-preprocess-data-transform) section.

   ```
   from sagemaker.session import TrainingInput
   
   train_input = TrainingInput(
       "s3://{}/{}/{}".format(bucket, prefix, "data/train.csv"), content_type="csv"
   )
   validation_input = TrainingInput(
       "s3://{}/{}/{}".format(bucket, prefix, "data/validation.csv"), content_type="csv"
   )
   ```

1. To start model training, call the estimator's `fit` method with the training and validation datasets. By setting `wait=True`, the `fit` method displays progress logs and waits until training is complete.

   ```
   xgb_model.fit({"train": train_input, "validation": validation_input}, wait=True)
   ```

   For more information about model training, see [Train a Model with Amazon SageMaker](how-it-works-training.md). This tutorial's training job might take up to 10 minutes.

   After the training job has completed, you can download an XGBoost training report and a profiling report generated by SageMaker Debugger. The XGBoost training report offers insights into the training progress and results, such as the loss function with respect to iteration, feature importance, confusion matrix, accuracy curves, and other statistical results of training. For example, you can find the following loss curve from the XGBoost training report, which clearly indicates an overfitting problem.  
![\[The chart in the XGBoost training report.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/get-started-ni/gs-ni-train-loss-curve-validation-overfitting.png)

   Run the following code to specify the S3 bucket URI where the Debugger training reports are generated and check if the reports exist.

   ```
   rule_output_path = xgb_model.output_path + "/" + xgb_model.latest_training_job.job_name + "/rule-output"
   ! aws s3 ls {rule_output_path} --recursive
   ```

   Download the Debugger XGBoost training and profiling reports to the current workspace:

   ```
   ! aws s3 cp {rule_output_path} ./ --recursive
   ```

   Run the following IPython script to get the file link of the XGBoost training report:

   ```
   from IPython.display import FileLink, FileLinks
   display("Click link below to view the XGBoost Training report", FileLink("CreateXgboostReport/xgboost_report.html"))
   ```

   The following IPython script returns the file link of the Debugger profiling report that shows summaries and details of the EC2 instance resource utilization, system bottleneck detection results, and Python operation profiling results:

   ```
   profiler_report_name = [rule["RuleConfigurationName"] 
                           for rule in xgb_model.latest_training_job.rule_job_summary() 
                           if "Profiler" in rule["RuleConfigurationName"]][0]
   profiler_report_name
   display("Click link below to view the profiler report", FileLink(profiler_report_name+"/profiler-output/profiler-report.html"))
   ```
**Tip**  
If the HTML reports do not render plots in the JupyterLab view, you must choose **Trust HTML** at the top of the reports.  
To identify training issues, such as overfitting, vanishing gradients, and other problems that prevent your model from converging, use SageMaker Debugger and take automated actions while prototyping and training your ML models. For more information, see [Amazon SageMaker Debugger](train-debugger.md). To find a complete analysis of model parameters, see the [Explainability with Amazon SageMaker Debugger](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-debugger/xgboost_census_explanations/xgboost-census-debugger-rules.html#Explainability-with-Amazon-SageMaker-Debugger) example notebook. 

You now have a trained XGBoost model. SageMaker AI stores the model artifact in your S3 bucket. To find the location of the model artifact, run the following code to print the `model_data` attribute of the `xgb_model` estimator:

```
xgb_model.model_data
```

**Tip**  
To measure biases that can occur during each stage of the ML lifecycle (data collection, model training and tuning, and monitoring of ML models deployed for prediction), use SageMaker Clarify. For more information, see [Model Explainability](clarify-model-explainability.md). For an end-to-end example, see the [Fairness and Explainability with SageMaker Clarify](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-clarify/fairness_and_explainability/fairness_and_explainability.html) example notebook.

# Deploy the model to Amazon EC2
<a name="ex1-model-deployment"></a>

To get predictions, deploy your model to Amazon EC2 using Amazon SageMaker AI.

**Topics**
+ [Deploy the Model to SageMaker AI Hosting Services](#ex1-deploy-model)
+ [(Optional) Use SageMaker AI Predictor to Reuse the Hosted Endpoint](#ex1-deploy-model-sdk-use-endpoint)
+ [(Optional) Make Prediction with Batch Transform](#ex1-batch-transform)

## Deploy the Model to SageMaker AI Hosting Services
<a name="ex1-deploy-model"></a>

To host a model through Amazon EC2 using Amazon SageMaker AI, deploy the model that you trained in [Create and Run a Training Job](ex1-train-model.md#ex1-train-model-sdk) by calling the `deploy` method of the `xgb_model` estimator. When you call the `deploy` method, you must specify the number and type of EC2 ML instances that you want to use for hosting an endpoint.

```
import sagemaker
from sagemaker.serializers import CSVSerializer
xgb_predictor=xgb_model.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',
    serializer=CSVSerializer()
)
```
+ `initial_instance_count` (int) – The number of instances on which to deploy the model.
+ `instance_type` (str) – The type of instance on which to run your deployed model.
+ `serializer` – A serializer that converts input data of various formats (a NumPy array, list, file, or buffer) to a CSV-formatted string. This example uses `CSVSerializer` because the XGBoost algorithm accepts input in CSV format.
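To build intuition for what the serializer contributes, the following sketch is a rough stand-in for CSV serialization — it is not the actual `CSVSerializer` implementation from the SageMaker Python SDK, just an illustration of turning feature rows into the comma-separated payload the XGBoost container parses:

```python
# Rough illustration of CSV serialization: turn a row (or rows) of features
# into comma-separated text. This is a simplified stand-in, not the actual
# sagemaker.serializers.CSVSerializer implementation.
def to_csv_payload(data):
    # Accept a single row (flat list) or multiple rows (list of lists).
    rows = data if data and isinstance(data[0], (list, tuple)) else [data]
    return "\n".join(",".join(str(value) for value in row) for row in rows)

print(to_csv_payload([25, 2, 226802]))    # one CSV line
print(to_csv_payload([[1, 2], [3, 4]]))   # two CSV lines
```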

The `deploy` method creates a deployable model, configures the SageMaker AI hosting services endpoint, and launches the endpoint to host the model. For more information, see the [SageMaker AI generic Estimator's deploy class method](https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.Estimator.deploy) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable). To retrieve the name of the endpoint that's generated by the `deploy` method, run the following code:

```
xgb_predictor.endpoint_name
```

This should return the endpoint name of the `xgb_predictor`. The format of the endpoint name is `"sagemaker-xgboost-YYYY-MM-DD-HH-MM-SS-SSS"`. The endpoint stays active on the ML instance, and you can make real-time predictions at any time until you shut it down. Copy the endpoint name and save it so that you can reuse it to make real-time predictions elsewhere in SageMaker Studio or in SageMaker AI notebook instances.
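The saved endpoint name can also be invoked outside of the SageMaker Python SDK through the low-level `sagemaker-runtime` client in boto3. The sketch below builds the request arguments as a plain dict; the endpoint name and feature values are placeholders, and the actual `invoke_endpoint` call is commented out because it requires live AWS credentials and a running endpoint:

```python
# Sketch: invoking the hosted endpoint by name with the low-level boto3
# sagemaker-runtime client. Endpoint name and feature values are placeholders.
def build_invoke_args(endpoint_name, features):
    # XGBoost expects CSV input, so serialize the feature row by hand.
    return {
        "EndpointName": endpoint_name,
        "ContentType": "text/csv",
        "Body": ",".join(str(f) for f in features),
    }

args = build_invoke_args(
    "sagemaker-xgboost-YYYY-MM-DD-HH-MM-SS-SSS", [25, 2, 226802, 7, 4]
)

# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(**args)
# print(response["Body"].read().decode("utf-8"))  # the predicted probability
```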

**Tip**  
To learn more about compiling and optimizing your model for deployment to Amazon EC2 instances or edge devices, see [Compile and Deploy Models with Neo](https://docs.amazonaws.cn/sagemaker/latest/dg/neo.html).

## (Optional) Use SageMaker AI Predictor to Reuse the Hosted Endpoint
<a name="ex1-deploy-model-sdk-use-endpoint"></a>

After you deploy the model to an endpoint, you can set up a new SageMaker AI predictor that attaches to the endpoint and use it to make real-time predictions from any other notebook. The following example code shows how to use the SageMaker AI Predictor class to set up a new predictor object that uses the same endpoint. Reuse the endpoint name that you used for the `xgb_predictor`.

```
import sagemaker
xgb_predictor_reuse=sagemaker.predictor.Predictor(
    endpoint_name="sagemaker-xgboost-YYYY-MM-DD-HH-MM-SS-SSS",
    sagemaker_session=sagemaker.Session(),
    serializer=sagemaker.serializers.CSVSerializer()
)
```

The `xgb_predictor_reuse` Predictor behaves exactly the same as the original `xgb_predictor`. For more information, see the [SageMaker AI Predictor](https://sagemaker.readthedocs.io/en/stable/predictors.html#sagemaker.predictor.RealTimePredictor) class in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

## (Optional) Make Prediction with Batch Transform
<a name="ex1-batch-transform"></a>

Instead of hosting an endpoint in production, you can run a one-time batch inference job to make predictions on a test dataset using the SageMaker AI batch transform. After your model training has completed, you can extend the estimator to a `transformer` object, which is based on the [SageMaker AI Transformer](https://sagemaker.readthedocs.io/en/stable/api/inference/transformer.html) class. The batch transformer reads in input data from a specified S3 bucket and makes predictions.

**To run a batch transform job**

1. Run the following code to convert the feature columns of the test dataset to a CSV file and upload it to the S3 bucket:

   ```
   X_test.to_csv('test.csv', index=False, header=False)
   
   boto3.Session().resource('s3').Bucket(bucket).Object(
       os.path.join(prefix, 'test/test.csv')).upload_file('test.csv')
   ```

1. Specify S3 bucket URIs of input and output for the batch transform job as shown following:

   ```
   # The location of the test dataset
   batch_input = 's3://{}/{}/test'.format(bucket, prefix)
   
   # The location to store the results of the batch transform job
   batch_output = 's3://{}/{}/batch-prediction'.format(bucket, prefix)
   ```

1. Create a transformer object, specifying the minimal set of parameters: `instance_count` and `instance_type` to run the batch transform job, and `output_path` to save the prediction data, as shown following: 

   ```
   transformer = xgb_model.transformer(
       instance_count=1, 
       instance_type='ml.m4.xlarge', 
       output_path=batch_output
   )
   ```

1. Initiate the batch transform job by executing the `transform()` method of the `transformer` object as shown following:

   ```
   transformer.transform(
       data=batch_input, 
       data_type='S3Prefix',
       content_type='text/csv', 
       split_type='Line'
   )
   transformer.wait()
   ```

1. When the batch transform job is complete, SageMaker AI creates the `test.csv.out` prediction data saved in the `batch_output` path, which should be in the following format: `s3://sagemaker-<region>-111122223333/demo-sagemaker-xgboost-adult-income-prediction/batch-prediction`. Run the following Amazon CLI command to download the output data of the batch transform job:

   ```
   ! aws s3 cp {batch_output} ./ --recursive
   ```

   This should create the `test.csv.out` file in the current working directory. It contains the float values predicted based on the logistic regression of the XGBoost training job.
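The downloaded file has one float prediction per line. As a quick sanity check before the formal evaluation in the next section, you can load those values and binarize them with a cutoff. The helper below is a hypothetical sketch; the file name matches the batch transform output shown above, and the 0.5 cutoff is an arbitrary starting point:

```python
# Hypothetical helper: read the one-prediction-per-line batch transform output
# and binarize each float with a cutoff. The 0.5 default is an assumption.
def load_binary_predictions(path, cutoff=0.5):
    with open(path) as f:
        return [1 if float(line) > cutoff else 0 for line in f if line.strip()]

# labels = load_binary_predictions("test.csv.out")
# print(sum(labels), "positive predictions out of", len(labels))
```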

# Evaluate the model
<a name="ex1-test-model"></a>

Now that you have trained and deployed a model using Amazon SageMaker AI, evaluate the model to ensure that it generates accurate predictions on new data. For model evaluation, use the test dataset that you created in [Prepare a dataset](ex1-preprocess-data.md).

## Evaluate the Model Deployed to SageMaker AI Hosting Services
<a name="ex1-test-model-endpoint"></a>

To evaluate the model and use it in production, invoke the endpoint with the test dataset and check whether the inferences you get return the target accuracy that you want to achieve.

**To evaluate the model**

1. Set up the following function to predict each line of the test set. In the following example code, the `rows` argument specifies the number of lines to predict at a time. You can change its value to perform a batch inference that fully utilizes the instance's hardware resources.

   ```
   import numpy as np
   def predict(data, rows=1000):
       split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
       predictions = ''
       for array in split_array:
           predictions = ','.join([predictions, xgb_predictor.predict(array).decode('utf-8')])
       return np.fromstring(predictions[1:], sep=',')
   ```

1. Run the following code to make predictions for the test dataset and plot a histogram. Use only the feature columns of the test dataset, excluding the 0th column, which holds the actual values.

   ```
   import matplotlib.pyplot as plt
   
   predictions=predict(test.to_numpy()[:,1:])
   plt.hist(predictions)
   plt.show()
   ```  
![\[A histogram of predicted values.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/get-started-ni/gs-ni-eval-predicted-values-histogram.png)

1. The predicted values are float type. To determine `True` or `False` based on the float values, you need to set a cutoff value. As shown in the following example code, use the Scikit-learn library to return the output confusion matrix and classification report with a cutoff of 0.5.

   ```
   import sklearn.metrics
   
   cutoff=0.5
   print(sklearn.metrics.confusion_matrix(test.iloc[:, 0], np.where(predictions > cutoff, 1, 0)))
   print(sklearn.metrics.classification_report(test.iloc[:, 0], np.where(predictions > cutoff, 1, 0)))
   ```

   This should return the following confusion matrix:  
![\[An example of confusion matrix and statistics after getting the inference of the deployed model.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/get-started-ni/gs-ni-evaluate-confusion-matrix.png)

1. To find the best cutoff with the given test set, compute the log loss function of the logistic regression. The log loss function is defined as the negative log-likelihood of a logistic model that returns prediction probabilities for its ground truth labels. The following example code numerically and iteratively calculates the log loss values (`-(y*log(p) + (1-y)*log(1-p))`), where `y` is the true label and `p` is the probability estimate of the corresponding test sample. It returns a log loss versus cutoff graph.

   ```
   import sklearn.metrics
   import matplotlib.pyplot as plt
   
   cutoffs = np.arange(0.01, 1, 0.01)
   log_loss = []
   for c in cutoffs:
       log_loss.append(
           sklearn.metrics.log_loss(test.iloc[:, 0], np.where(predictions > c, 1, 0))
       )
   
   plt.figure(figsize=(15,10))
   plt.plot(cutoffs, log_loss)
   plt.xlabel("Cutoff")
   plt.ylabel("Log loss")
   plt.show()
   ```

   This should return the following log loss curve.  
![\[Example following log loss curve.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/get-started-ni/gs-ni-evaluate-logloss-vs-cutoff.png)

1. Find the minimum point of the error curve using the NumPy `argmin` and `min` functions:

   ```
   print(
       'Log loss is minimized at a cutoff of ', cutoffs[np.argmin(log_loss)], 
       ', and the log loss value at the minimum is ', np.min(log_loss)
   )
   ```

   This should return: `Log loss is minimized at a cutoff of 0.53, and the log loss value at the minimum is 4.348539186773897`.

   Instead of computing and minimizing the log loss function, you can estimate a cost function as an alternative. For example, if you want to train a model to perform binary classification for a business problem such as customer churn prediction, you can assign weights to the elements of the confusion matrix and calculate the cost function accordingly.
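As a sketch of that cost-function idea, the following hypothetical example assigns business costs to false positives and false negatives and sweeps cutoffs to find the cheapest one. The weights and the toy data at the bottom are made up for illustration; with your real results, `y_true` would be `test.iloc[:, 0]` and `scores` would be `predictions`:

```python
import numpy as np

# Hypothetical cost weights: a missed churner (false negative) costs more
# than a wasted retention offer (false positive). Values are illustrative.
COST_FP, COST_FN = 1.0, 5.0

def cheapest_cutoff(y_true, scores, cutoffs=np.arange(0.01, 1.0, 0.01)):
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    costs = []
    for c in cutoffs:
        y_pred = (scores > c).astype(int)
        fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
        fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
        costs.append(COST_FP * fp + COST_FN * fn)
    return cutoffs[int(np.argmin(costs))], min(costs)

# Toy data for illustration only.
cutoff, cost = cheapest_cutoff([0, 0, 1, 1], [0.2, 0.6, 0.4, 0.9])
print("cheapest cutoff:", cutoff, "cost:", cost)
```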

You have now trained, deployed, and evaluated your first model in SageMaker AI.

**Tip**  
To monitor model quality, data quality, and bias drift, use Amazon SageMaker Model Monitor and SageMaker AI Clarify. To learn more, see [Amazon SageMaker Model Monitor](https://docs.amazonaws.cn/sagemaker/latest/dg/model-monitor.html), [Monitor Data Quality](https://docs.amazonaws.cn/sagemaker/latest/dg/model-monitor-data-quality.html), [Monitor Model Quality](https://docs.amazonaws.cn/sagemaker/latest/dg/model-monitor-model-quality.html), [Monitor Bias Drift](https://docs.amazonaws.cn/sagemaker/latest/dg/clarify-model-monitor-bias-drift.html), and [Monitor Feature Attribution Drift](https://docs.amazonaws.cn/sagemaker/latest/dg/clarify-model-monitor-feature-attribution-drift.html).

**Tip**  
To get human review of low confidence ML predictions or a random sample of predictions, use Amazon Augmented AI human review workflows. For more information, see [Using Amazon Augmented AI for Human Review](https://docs.amazonaws.cn/sagemaker/latest/dg/a2i-use-augmented-ai-a2i-human-review-loops.html).

# Clean up Amazon SageMaker notebook instance resources
<a name="ex1-cleanup"></a>

To avoid incurring unnecessary charges, use the Amazon Web Services Management Console to delete the endpoints and resources that you created while running the exercises. 

**Note**  
Training jobs and logs cannot be deleted and are retained indefinitely.

**Note**  
If you plan to explore other exercises in this guide, you might want to keep some of these resources, such as your notebook instance, S3 bucket, and IAM role.

 

1. Open the Amazon SageMaker AI console at [https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/) and delete the following resources:
   + The endpoint. Deleting the endpoint also deletes the ML compute instance or instances that support it.

     1. Under **Inference**, choose **Endpoints**.

     1. Choose the endpoint that you created in the example, choose **Actions**, and then choose **Delete**.
   + The endpoint configuration.

     1. Under **Inference**, choose **Endpoint configurations**.

     1. Choose the endpoint configuration that you created in the example, choose **Actions**, and then choose **Delete**.
   + The model.

     1. Under **Inference**, choose **Models**.

     1. Choose the model that you created in the example, choose **Actions**, and then choose **Delete**.
   + The notebook instance. Before deleting the notebook instance, stop it.

     1. Under **Notebook**, choose **Notebook instances**.

     1. Choose the notebook instance that you created in the example, choose **Actions**, and then choose **Stop**. The notebook instance takes several minutes to stop. When the **Status** changes to **Stopped**, move on to the next step.

     1. Choose **Actions**, and then choose **Delete**.

1. Open the Amazon S3 console at [https://console.amazonaws.cn/s3/](https://console.amazonaws.cn/s3/), and then delete the bucket that you created for storing model artifacts and the training dataset. 

1. Open the Amazon CloudWatch console at [https://console.amazonaws.cn/cloudwatch/](https://console.amazonaws.cn/cloudwatch/), and then delete all of the log groups that have names starting with `/aws/sagemaker/`.

# AL2023 notebook instances
<a name="nbi-al2023"></a>

Amazon SageMaker notebook instances currently support AL2023 operating systems. AL2023 is now the latest and recommended operating system for notebook instances. You can select the operating system that your notebook instance is based on when you create the notebook instance.

SageMaker AI supports notebook instances based on the following AL2023 operating systems.
+ **notebook-al2023-v1**: These notebook instances support JupyterLab version 4. For information about JupyterLab versions, see [JupyterLab versioning](nbi-jl.md).

**Topics**
+ [Supported instance types](#nbi-al2023-instances)
+ [Available kernels](#nbi-al2023-kernel)

## Supported instance types
<a name="nbi-al2023-instances"></a>

AL2023 supports the instance types listed under **Notebook Instances** in [SageMaker AI Pricing](https://www.amazonaws.cn/sagemaker/pricing/), with the exception that AL2023 does not support the `ml.p2`, `ml.p3`, and `ml.g3` instance types.

## Available kernels
<a name="nbi-al2023-kernel"></a>

The following table gives information about the available kernels for SageMaker notebook instances. All of these images are supported on notebook instances based on the `notebook-al2023-v1` operating system.


| Kernel name | Description | 
| --- | --- | 
| R | A kernel used to perform data analysis and visualization using R code from a Jupyter notebook. | 
| Sparkmagic (PySpark) | A kernel used to do data science with remote Spark clusters from Jupyter notebooks using the Python programming language. This kernel comes with Python 3.10. | 
| Sparkmagic (Spark) | A kernel used to do data science with remote Spark clusters from Jupyter notebooks using the Scala programming language. This kernel comes with Python 3.10. | 
| Sparkmagic (SparkR) | A kernel used to do data science with remote Spark clusters from Jupyter notebooks using the R programming language. This kernel comes with Python 3.10. | 
| `conda_python3` | A conda environment that comes pre-installed with popular packages for data science and machine learning. This kernel comes with Python 3.10. | 
| `conda_pytorch` | A conda environment that comes pre-installed with PyTorch version 2.7.0, as well as popular data science and machine learning packages. This kernel comes with Python 3.10. | 

# Amazon Linux 2 notebook instances
<a name="nbi-al2"></a>

**Important**  
JupyterLab 1 and JupyterLab 3 are no longer supported as of June 30, 2025. You can no longer create new or restart stopped notebook instances using these versions. Existing in-service instances may continue to function but will not receive security updates or bug fixes. Migrate to JupyterLab 4 notebook instances for continued support. For more information, see [JupyterLab version maintenance](nbi-jl.md#nbi-jl-version-maintenance).

**Note**  
AL2023 is the latest and recommended operating system available for notebook instances. To learn more, see [AL2023 notebook instances](nbi-al2023.md).

Amazon SageMaker notebook instances currently support Amazon Linux 2 (AL2) operating systems. You can select the operating system that your notebook instance is based on when you create the notebook instance.

SageMaker AI supports notebook instances based on the following Amazon Linux 2 operating systems.
+ **notebook-al2-v1** (deprecated): These notebook instances supported JupyterLab version 1. As of June 30, 2025, you can no longer create new instances with this platform identifier. For information about JupyterLab versions, see [JupyterLab versioning](nbi-jl.md).
+ **notebook-al2-v2** (deprecated): These notebook instances supported JupyterLab version 3. As of June 30, 2025, you can no longer create new instances with this platform identifier. For information about JupyterLab versions, see [JupyterLab versioning](nbi-jl.md).
+ **notebook-al2-v3**: These notebook instances support JupyterLab version 4. For information about JupyterLab versions, see [JupyterLab versioning](nbi-jl.md).

Notebook instances created before 08/18/2021 automatically run on Amazon Linux (AL1). Notebook instances based on AL1 entered a maintenance phase as of 12/01/2022 and are no longer available for new notebook instance creation as of 02/01/2023. To replace AL1, you now have the option to create Amazon SageMaker notebook instances with AL2. For more information, see [AL1 Maintenance Phase Plan](#nbi-al2-deprecation).

**Topics**
+ [Supported instance types](#nbi-al2-instances)
+ [Available Kernels](#nbi-al2-kernel)
+ [AL1 Maintenance Phase Plan](#nbi-al2-deprecation)

## Supported instance types
<a name="nbi-al2-instances"></a>

Amazon Linux 2 supports instance types listed under **Notebook Instances** in [Amazon SageMaker Pricing](https://www.amazonaws.cn/sagemaker/pricing/) with the exception that Amazon Linux 2 does not support `ml.p2` instances.

## Available Kernels
<a name="nbi-al2-kernel"></a>

The following table gives information about the available kernels for SageMaker notebook instances. All of these images are supported on notebook instances based on the `notebook-al2-v1`, `notebook-al2-v2`, and `notebook-al2-v3` operating systems.

SageMaker notebook instance kernels


| Kernel name | Description | 
| --- | --- | 
| R | A kernel used to perform data analysis and visualization using R code from a Jupyter notebook. | 
| Sparkmagic (PySpark) | A kernel used to do data science with remote Spark clusters from Jupyter notebooks using the Python programming language. This kernel comes with Python 3.10. | 
| Sparkmagic (Spark) | A kernel used to do data science with remote Spark clusters from Jupyter notebooks using the Scala programming language. This kernel comes with Python 3.10. | 
| Sparkmagic (SparkR) | A kernel used to do data science with remote Spark clusters from Jupyter notebooks using the R programming language. This kernel comes with Python 3.10. | 
| `conda_python3` | A conda environment that comes pre-installed with popular packages for data science and machine learning. This kernel comes with Python 3.10. | 
| `conda_pytorch_p310` | A conda environment that comes pre-installed with PyTorch version 2.2.0, as well as popular data science and machine learning packages. This kernel comes with Python 3.10. | 
| `conda_tensorflow2_p310` | A conda environment that comes pre-installed with TensorFlow version 2.16.0, as well as popular data science and machine learning packages. This kernel comes with Python 3.10. | 

## AL1 Maintenance Phase Plan
<a name="nbi-al2-deprecation"></a>

The following table is a timeline for when AL1 entered its extended maintenance phase. The AL1 maintenance phase also coincides with the deprecation of Python 2 and Chainer. Notebooks based on AL2 do not have managed Python 2 and Chainer kernels.


|  Date  |  Description  | 
| --- | --- | 
|  08/18/2021  |  Notebook instances based on AL2 are launched. Newly launched notebook instances still default to AL1. AL1 is supported with security patches and updates, but no new features. You can choose between the two operating systems when launching a new notebook instance.  | 
|  10/31/2022  |  The default platform identifier for SageMaker notebook instances changes from Amazon Linux (al1-v1) to Amazon Linux 2 (al2-v2). You can choose between the two operating systems when launching a new notebook instance.  | 
|  12/01/2022  |  AL1 is no longer supported with non-critical security patches and updates. AL1 still receives fixes for [critical](https://nvd.nist.gov/vuln-metrics/cvss#) security-related issues. You can still launch instances on AL1, but assume the risks associated with using an unsupported operating system.  | 
|  02/01/2023  |  AL1 is no longer an available option for new notebook instance creation. After this date, customers can create notebook instances with the AL2 platform identifiers. Existing notebooks with an `INSERVICE` status should be migrated to the latest platform since continuous availability of AL1 notebook instances cannot be guaranteed.  | 
|  03/31/2024  |  AL1 reaches its end of life on notebook instances on March 31, 2024. After this date, AL1 will no longer receive any security updates, bug fixes, or be available for new notebook instance creation.  [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/nbi-al2.html)  | 

### Migrating to Amazon Linux 2
<a name="nbi-al2-upgrade"></a>

Your existing AL1 notebook instance is not automatically migrated to Amazon Linux 2. To upgrade your AL1 notebook instance to Amazon Linux 2, you must create a new notebook instance, replicate your code and environment, and delete your old notebook instance. For more information, see the [Amazon Linux 2 migration blog](https://amazonaws-china.com/blogs/machine-learning/migrate-your-work-to-amazon-sagemaker-notebook-instance-with-amazon-linux-2/).

# JupyterLab versioning
<a name="nbi-jl"></a>

**Important**  
JupyterLab 1 and JupyterLab 3 are no longer supported as of June 30, 2025. You can no longer create new or restart stopped notebook instances using these versions. Existing in-service instances may continue to function but will not receive security updates or bug fixes. Migrate to JupyterLab 4 notebook instances for continued support. For more information, see [JupyterLab version maintenance](#nbi-jl-version-maintenance).

The Amazon SageMaker notebook instance interface is based on JupyterLab, a web-based interactive development environment for notebooks, code, and data. Notebook instances support JupyterLab 1, JupyterLab 3, or JupyterLab 4. A single notebook instance can run at most one instance of JupyterLab. You can have multiple notebook instances with different JupyterLab versions. 

You can configure your notebook to run your preferred JupyterLab version by selecting the appropriate platform identifier. Use either the Amazon CLI or the SageMaker AI console when creating your notebook instance. For more information about platform identifiers, see [AL2023 notebook instances](nbi-al2023.md) and [Amazon Linux 2 notebook instances](nbi-al2.md). If you don’t explicitly configure a platform identifier, your notebook instance defaults to running JupyterLab 1. 

**Topics**
+ [JupyterLab version maintenance](#nbi-jl-version-maintenance)
+ [JupyterLab 4](#nbi-jl-4)
+ [JupyterLab 3](#nbi-jl-3)
+ [Create a notebook with your JupyterLab version](nbi-jl-create.md)
+ [View the JupyterLab version of a notebook from the console](nbi-jl-view.md)

## JupyterLab version maintenance
<a name="nbi-jl-version-maintenance"></a>

JupyterLab 1 and JupyterLab 3 platforms reached end of standard support on June 30, 2025. As of this date:
+ You can no longer create new or restart stopped JupyterLab 1 and JupyterLab 3 notebook instances.
+ Existing in-service JupyterLab 1 and JupyterLab 3 notebook instances may continue to function, but no longer receive SageMaker AI security updates or critical bug fixes.
+ You are responsible for managing the security of these deprecated instances.
+ If issues arise with existing JupyterLab 1 or JupyterLab 3 notebook instances, SageMaker AI cannot guarantee their continued availability. You must migrate your workload to a JupyterLab 4 notebook instance.

Migrate your work to JupyterLab 4 notebook instances (the latest version's platform identifier is [notebook-al2023-v1](nbi-al2023.md)) to ensure that you have a secure and supported environment. This allows you to use the latest versions of Jupyter Notebook, JupyterLab, and other ML libraries. For instructions, see [Migrate your work to a SageMaker AI notebook instance with Amazon Linux 2](https://amazonaws-china.com/blogs/machine-learning/migrate-your-work-to-amazon-sagemaker-notebook-instance-with-amazon-linux-2/).

## JupyterLab 4
<a name="nbi-jl-4"></a>

JupyterLab 4 support is available on the AL2023 and Amazon Linux 2 operating system platforms. JupyterLab 4 includes the following features that are not available in JupyterLab 3:
+ Optimized rendering for a faster experience
+ Opt-in settings for faster tab switching and better performance with long notebooks. For more information, see the blog post [ JupyterLab 4.0 is Here](https://blog.jupyter.org/jupyterlab-4-0-is-here-388d05e03442).
+ Upgraded text editor
+ New extension manager that installs extensions from PyPI
+ Added improvements to the UI, including document search and accessibility improvements

You can run JupyterLab 4 by specifying [notebook-al2023-v1](nbi-al2023.md) (the latest and recommended version) or [notebook-al2-v3](nbi-al2.md) as the platform identifier when creating your notebook instance.

**Note**  
If you attempt to migrate to a JupyterLab 4 Notebook Instance from another JupyterLab version, the package version changes between JupyterLab 3 and JupyterLab 4 might break any existing lifecycle configurations or Jupyter/JupyterLab extensions.

**Package version changes**

JupyterLab 4 has the following package version changes from JupyterLab 3:
+ JupyterLab has been upgraded from 3.x to 4.x.
+ Jupyter notebook has been upgraded from 6.x to 7.x.
+ jupyterlab-git has been updated to version 0.50.0.

## JupyterLab 3
<a name="nbi-jl-3"></a>

**Important**  
JupyterLab 1 and JupyterLab 3 are no longer supported as of June 30, 2025. You can no longer create new or restart stopped notebook instances using these versions. Existing in-service instances may continue to function but will not receive security updates or bug fixes. Migrate to JupyterLab 4 notebook instances for continued support. For more information, see [JupyterLab version maintenance](#nbi-jl-version-maintenance).

JupyterLab 3 support is available only on the Amazon Linux 2 operating system platform. JupyterLab 3 includes the following features that are not available in JupyterLab 1. For more information about these features, see [JupyterLab 3.0 is released!](https://blog.jupyter.org/jupyterlab-3-0-is-out-4f58385e25bb). 
+  Visual debugger when using the following kernels: 
  + `conda_pytorch_p38`
  + `conda_tensorflow2_p38`
  + `conda_amazonei_pytorch_latest_p37`
+ File browser filter
+ Table of Contents (TOC)
+ Multi-language support
+ Simple mode
+ Single interface mode
+ Live editing SVG files with updated rendering
+ User interface for notebook cell tags

### Important changes to JupyterLab 3
<a name="nbi-jl-3-changes"></a>

 For information about important changes when using JupyterLab 3, see the following JupyterLab change logs: 
+  [v2.0.0](https://github.com/jupyterlab/jupyterlab/releases) 
+  [v3.0.0](https://jupyterlab.readthedocs.io/en/stable/getting_started/changelog.html#for-developers) 

 **Package version changes** 

 JupyterLab 3 has the following package version changes from JupyterLab 1: 
+  JupyterLab has been upgraded from 1.x to 3.x.
+  Jupyter notebook has been upgraded from 5.x to 6.x.
+  jupyterlab-git has been updated to version 0.37.1.
+  nbserverproxy 0.x (0.3.2) has been replaced with jupyter-server-proxy 3.x (3.2.1).

# Create a notebook with your JupyterLab version
<a name="nbi-jl-create"></a>

**Important**  
JupyterLab 1 and JupyterLab 3 are no longer supported as of June 30, 2025. You can no longer create new or restart stopped notebook instances using these versions. Existing in-service instances may continue to function but will not receive security updates or bug fixes. Migrate to JupyterLab 4 notebook instances for continued support. For more information, see [JupyterLab version maintenance](nbi-jl.md#nbi-jl-version-maintenance).

 You can select the JupyterLab version when creating your notebook instance from the console following the steps in [Create an Amazon SageMaker notebook instance](howitworks-create-ws.md). 

 You can also select the JupyterLab version by passing the `platform-identifier` parameter when creating your notebook instance using the Amazon CLI as follows: 

```
aws sagemaker create-notebook-instance \
    --notebook-instance-name <NEW_NOTEBOOK_NAME> \
    --instance-type <INSTANCE_TYPE> \
    --role-arn <YOUR_ROLE_ARN> \
    --platform-identifier notebook-al2-v3
```

# View the JupyterLab version of a notebook from the console
<a name="nbi-jl-view"></a>

**Important**  
JupyterLab 1 and JupyterLab 3 are no longer supported as of June 30, 2025. You can no longer create new or restart stopped notebook instances using these versions. Existing in-service instances may continue to function but will not receive security updates or bug fixes. Migrate to JupyterLab 4 notebook instances for continued support. For more information, see [JupyterLab version maintenance](nbi-jl.md#nbi-jl-version-maintenance).

 You can view the JupyterLab version of a notebook using the following procedure: 

1. Open the Amazon SageMaker AI console at [https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/).

1. From the left navigation, select **Notebook**.

1.  From the dropdown menu, select **Notebook instances** to navigate to the **Notebook instances** page. 

1.  From the list of notebook instances, select your notebook instance name. 

1.  On the **Notebook instance settings** page, view the **Platform Identifier** to see the JupyterLab version of the notebook. 
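
If you prefer the API, the `DescribeNotebookInstance` response includes the same `PlatformIdentifier` field. A minimal boto3 sketch, with a placeholder instance name:

```python
def platform_of(response):
    """Pull the platform identifier out of a DescribeNotebookInstance response."""
    return response["PlatformIdentifier"]

def get_platform_identifier(name):
    """Describe the instance. Requires boto3 and valid credentials at run time."""
    import boto3
    resp = boto3.client("sagemaker").describe_notebook_instance(NotebookInstanceName=name)
    return platform_of(resp)

# Shape of the relevant response field (value illustrative):
sample = {"PlatformIdentifier": "notebook-al2023-v1"}
print(platform_of(sample))
```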

# Create an Amazon SageMaker notebook instance
<a name="howitworks-create-ws"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[Amazon managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

An Amazon SageMaker notebook instance is an ML compute instance running the Jupyter Notebook application. SageMaker AI manages creating the instance and related resources. Use Jupyter notebooks in your notebook instance to:
+ prepare and process data
+ write code to train models
+ deploy models to SageMaker AI hosting
+ test or validate your models

To create a notebook instance, use either the SageMaker AI console or the [`CreateNotebookInstance`](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateNotebookInstance.html) API.

The notebook instance type you choose depends on how you use your notebook instance. Ensure that your notebook instance is not bound by memory, CPU, or I/O. To load a dataset into memory on the notebook instance for exploration or preprocessing, choose an instance type with enough RAM for your dataset. This typically requires an instance with at least 16 GB of memory (.xlarge or larger). If you plan to use the notebook for compute-intensive preprocessing, we recommend that you choose a compute-optimized instance such as a c4 or c5.

A best practice when using a SageMaker notebook is to use the notebook instance to orchestrate other Amazon services. For example, you can use the notebook instance to manage large dataset processing. To do this, make calls to Amazon Glue for ETL (extract, transform, and load) services or Amazon EMR for mapping and data reduction using Hadoop. You can use Amazon services as temporary forms of computation or storage for your data.

You can store and retrieve your training and test data using an Amazon Simple Storage Service bucket. You can then use SageMaker AI to train and build your model. As a result, the instance type of your notebook would have no bearing on the speed of your model training and testing.
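
As a sketch of that pattern, the following uploads a local dataset to S3 and returns the URI a training job would read from. The bucket and key names are hypothetical, and the upload itself requires credentials:

```python
def s3_uri(bucket, key):
    """Build the S3 URI that a training job reads from."""
    return f"s3://{bucket}/{key}"

def stage_training_data(local_path, bucket, key):
    """Upload a local dataset to S3. Requires boto3 and valid credentials."""
    import boto3
    boto3.client("s3").upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)

# Hypothetical bucket and key names.
uri = s3_uri("my-training-bucket", "data/train.csv")
print(uri)
```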

After receiving the request, SageMaker AI does the following:
+ **Creates a network interface**—If you choose the optional VPC configuration, SageMaker AI creates the network interface in your VPC. It uses the subnet ID that you provide in the request to determine which Availability Zone to create the network interface in. SageMaker AI associates the security group that you provide in the request with the network interface. For more information, see [Connect a Notebook Instance in a VPC to External Resources](appendix-notebook-and-internet-access.md). 
+ **Launches an ML compute instance**—SageMaker AI launches an ML compute instance in a SageMaker AI VPC. SageMaker AI performs the configuration tasks that allow it to manage your notebook instance. If you specified your VPC, SageMaker AI enables traffic between your VPC and the notebook instance.
+ **Installs Anaconda packages and libraries for common deep learning platforms**—SageMaker AI installs all of the Anaconda packages that are included in the installer. For more information, see [Anaconda package list](https://docs.anaconda.com/free/anaconda/pkg-docs/). SageMaker AI also installs the TensorFlow and Apache MXNet deep learning libraries. 
+ **Attaches an ML storage volume**—SageMaker AI attaches an ML storage volume to the ML compute instance. You can use the volume as a working area to clean up the training dataset or to temporarily store validation, test, or other data. Choose any size between 5 GB and 16384 GB, in 1 GB increments, for the volume. The default is 5 GB. ML storage volumes are encrypted, so SageMaker AI can't determine the amount of available free space on the volume. Because of this, you can increase the volume size when you update a notebook instance, but you can't decrease the volume size. If you want to decrease the size of the ML storage volume in use, create a new notebook instance with the desired size.

  Only files and data saved within the `/home/ec2-user/SageMaker` folder persist between notebook instance sessions. Files and data that are saved outside this directory are overwritten when the notebook instance stops and restarts. Each notebook instance's `/tmp` directory provides a minimum of 10 GB of storage in an instance store. An instance store is temporary, block-level storage that isn't persistent. When the instance is stopped or restarted, SageMaker AI deletes the directory's contents and any operating system customizations. This temporary storage is part of the root volume of the notebook instance.

  If the notebook instance isn't updated and is running unsecure software, SageMaker AI might periodically update the instance as part of regular maintenance. During these updates, data outside of the folder `/home/ec2-user/SageMaker` is not persisted. For more information about maintenance and security patches, see [Maintenance](nbi.md#nbi-maintenance).

  If the instance type used by the notebook instance has NVMe support, you can use the NVMe instance store volumes available for that instance type. For instances with NVMe store volumes, all instance store volumes are automatically attached to the instance at launch. For more information about instance types and their associated NVMe store volumes, see the [Amazon Elastic Compute Cloud Instance Type Details](https://www.amazonaws.cn/ec2/instance-types/).

  To make the attached NVMe store volume available for your notebook instance, complete the steps in [Make instance store volumes available on your instance ](https://docs.amazonaws.cn/AWSEC2/latest/UserGuide/add-instance-store-volumes.html#making-instance-stores-available-on-your-instances). Complete the steps with root access or by using a lifecycle configuration script.
**Note**  
NVMe instance store volumes are not persistent storage. This storage is short-lived with the instance and must be reconfigured every time an instance with this storage is launched.

**To create a SageMaker AI notebook instance:**

1. Open the SageMaker AI console at [https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/). 

1. Choose **Notebook instances**, then choose **Create notebook instance**.

1. On the **Create notebook instance** page, provide the following information: 

   1. For **Notebook instance name**, type a name for your notebook instance.

   1. For **Notebook instance type**, choose an instance size suitable for your use case. For a list of supported instance types and quotas, see [Amazon SageMaker AI Service Quotas](https://docs.amazonaws.cn/general/latest/gr/sagemaker.html#limits_sagemaker).

   1. For **Platform Identifier**, choose a platform type to create the notebook instance on. This platform type dictates the Operating System and the JupyterLab version that your notebook instance is created with. The latest and recommended version is `notebook-al2023-v1`, for an Amazon Linux 2023 notebook instance. As of June 30, 2025, only JupyterLab 4 is supported for new instances. For information about platform identifier types, see [AL2023 notebook instances](nbi-al2023.md) and [Amazon Linux 2 notebook instances](nbi-al2.md). For information about JupyterLab versions, see [JupyterLab versioning](nbi-jl.md).
**Important**  
JupyterLab 1 and JupyterLab 3 are no longer supported as of June 30, 2025. You can no longer create new or restart stopped notebook instances using these versions. Existing in-service instances may continue to function but will not receive security updates or bug fixes. Migrate to JupyterLab 4 notebook instances for continued support. For more information, see [JupyterLab version maintenance](nbi-jl.md#nbi-jl-version-maintenance).

   1. (Optional) **Additional configuration** lets advanced users create a shell script that can run when you create or start the instance. This script, called a lifecycle configuration script, can be used to set the environment for the notebook or to perform other functions. For information, see [Customization of a SageMaker notebook instance using an LCC script](notebook-lifecycle-config.md).

   1. (Optional) **Additional configuration** also lets you specify the size, in GB, of the ML storage volume that is attached to the notebook instance. You can choose a size between 5 GB and 16,384 GB, in 1 GB increments. You can use the volume to clean up the training dataset or to temporarily store validation or other data.

   1. (Optional) For **Minimum IMDS Version**, select a version from the dropdown list. If this value is set to v1, both versions can be used with the notebook instance. If v2 is selected, then only IMDSv2 can be used with the notebook instance. For information about IMDSv2, see [Use IMDSv2](https://docs.amazonaws.cn/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html).
**Note**  
Starting October 31, 2022, the default minimum IMDS version for SageMaker notebook instances changed from IMDSv1 to IMDSv2.   
Starting February 1, 2023, IMDSv1 is no longer available for new notebook instance creation. After this date, you can only create notebook instances with a minimum IMDS version of 2.

   1. For **IAM role**, choose either an existing IAM role in your account with the necessary permissions to access SageMaker AI resources or **Create a new role**. If you choose **Create a new role**, SageMaker AI creates an IAM role named `AmazonSageMaker-ExecutionRole-YYYYMMDDTHHmmSS`. The Amazon managed policy `AmazonSageMakerFullAccess` is attached to the role. The role provides permissions that allow the notebook instance to call SageMaker AI and Amazon S3.

   1. For **Root access**, to give root access for all notebook instance users, choose **Enable**. To remove root access for users, choose **Disable**. If you give root access, all notebook instance users have administrator privileges and can access and edit all files on the instance. 

   1. (Optional) **Encryption key** lets you encrypt data on the ML storage volume attached to the notebook instance using an Amazon Key Management Service (Amazon KMS) key. If you plan to store sensitive information on the ML storage volume, consider encrypting the information. 

   1. (Optional) **Network** lets you put your notebook instance inside a Virtual Private Cloud (VPC). A VPC provides additional security and limits access to resources in the VPC from sources outside the VPC. For more information on VPCs, see [Amazon VPC User Guide](https://docs.amazonaws.cn/vpc/latest/userguide/).

      **To add your notebook instance to a VPC:**

      1. Choose the **VPC** and a **SubnetId**.

      1. For **Security Group**, choose your VPC's default security group. 

      1. If you need your notebook instance to have internet access, enable direct internet access. For **Direct internet access**, choose **Enable**. Internet access can make your notebook instance less secure. For more information, see [Connect a Notebook Instance in a VPC to External Resources](appendix-notebook-and-internet-access.md). 

   1. (Optional) To associate Git repositories with the notebook instance, choose a default repository and up to three additional repositories. For more information, see [Git repositories with SageMaker AI Notebook Instances](nbi-git-repo.md).

   1. Choose **Create notebook instance**. 

      In a few minutes, Amazon SageMaker AI launches an ML compute instance—in this case, a notebook instance—and attaches an ML storage volume to it. The notebook instance has a preconfigured Jupyter notebook server and a set of Anaconda libraries. For more information, see the [  `CreateNotebookInstance`](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateNotebookInstance.html) API. 

1. When the status of the notebook instance is `InService`, in the console, the notebook instance is ready to use. Choose **Open Jupyter** next to the notebook name to open the classic Jupyter dashboard.
**Note**  
To augment the security of your Amazon SageMaker notebook instance, all regional `notebook.region.sagemaker.aws` domains are registered in the internet [Public Suffix List (PSL)](https://publicsuffix.org/). For further security, we recommend that you use cookies with a `__Host-` prefix to set sensitive cookies for the domains of your SageMaker notebook instances. This helps to defend your domain against cross-site request forgery (CSRF) attempts. For more information, see the [Set-Cookie](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#cookie_prefixes) page in the [mozilla.org](https://www.mozilla.org/en-GB/?v=1) developer documentation website.

    You can choose **Open JupyterLab** to open the JupyterLab dashboard. The dashboard provides access to your notebook instance.

   For more information about Jupyter notebooks, see [The Jupyter notebook](https://jupyter-notebook.readthedocs.io/en/stable/).
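
The console fields above map onto the `CreateNotebookInstance` request parameters. The following boto3 sketch mirrors the procedure; every name, ARN, and ID is a placeholder:

```python
# Every name, ARN, and ID below is a placeholder for illustration.
params = {
    "NotebookInstanceName": "my-notebook",
    "InstanceType": "ml.t3.xlarge",
    "RoleArn": "arn:aws-cn:iam::111122223333:role/AmazonSageMaker-ExecutionRole",
    "PlatformIdentifier": "notebook-al2023-v1",        # JupyterLab 4 on AL2023
    "VolumeSizeInGB": 20,                              # 5-16,384 GB, in 1 GB increments
    "RootAccess": "Disabled",
    "LifecycleConfigName": "my-lifecycle-config",
    "InstanceMetadataServiceConfiguration": {
        "MinimumInstanceMetadataServiceVersion": "2",  # require IMDSv2
    },
    # Optional VPC placement and volume encryption:
    "SubnetId": "subnet-0abc123example",
    "SecurityGroupIds": ["sg-0abc123example"],
    "KmsKeyId": "arn:aws-cn:kms:cn-north-1:111122223333:key/example-key-id",
    "DirectInternetAccess": "Disabled",
}

def create_notebook(params):
    """Send the request. Requires boto3 and valid credentials at run time."""
    import boto3
    return boto3.client("sagemaker").create_notebook_instance(**params)
```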

# Access Notebook Instances
<a name="howitworks-access-ws"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[Amazon managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

To access your Amazon SageMaker notebook instances, choose one of the following options: 
+ Use the console.

  Choose **Notebook instances**. The console displays a list of notebook instances in your account. To open a notebook instance with a standard Jupyter interface, choose **Open Jupyter** for that instance. To open a notebook instance with a JupyterLab interface, choose **Open JupyterLab** for that instance.  
![\[Example Notebook instances section in the console.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/ws-notebook-10.png)

  The console uses your sign-in credentials to send a [  `CreatePresignedNotebookInstanceUrl`](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreatePresignedNotebookInstanceUrl.html) API request to SageMaker AI. SageMaker AI returns the URL for your notebook instance, and the console opens the URL in another browser tab and displays the Jupyter notebook dashboard. 
**Note**  
The URL that you get from a call to [  `CreatePresignedNotebookInstanceUrl`](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreatePresignedNotebookInstanceUrl.html) is valid only for 5 minutes. If you try to use the URL after the 5-minute limit expires, you are directed to the Amazon Web Services Management Console sign-in page.
+ Use the API.

  To get the URL for the notebook instance, call the [ `CreatePresignedNotebookInstanceUrl`](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreatePresignedNotebookInstanceUrl.html) API and use the URL that the API returns to open the notebook instance.

Use the Jupyter notebook dashboard to create and manage notebooks and to write code. For more information about Jupyter notebooks, see [http://jupyter.org/documentation.html](http://jupyter.org/documentation.html).
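
For the API option, a minimal boto3 sketch follows. The instance name is a placeholder; `SessionExpirationDurationInSeconds` controls how long the session lasts once opened, while the URL itself still expires after 5 minutes:

```python
def session_seconds(value=1800):
    """Clamp the session duration to the API's allowed range (1,800-43,200 seconds)."""
    return max(1800, min(43200, value))

def get_notebook_url(name):
    """Return a presigned URL for the instance; the URL must be opened within 5 minutes.
    Requires boto3 and valid credentials at run time."""
    import boto3
    resp = boto3.client("sagemaker").create_presigned_notebook_instance_url(
        NotebookInstanceName=name,
        SessionExpirationDurationInSeconds=session_seconds(),
    )
    return resp["AuthorizedUrl"]
```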

# Update a Notebook Instance
<a name="nbi-update"></a>

After you create a notebook instance, you can update it using the SageMaker AI console and [https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_UpdateNotebookInstance.html](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_UpdateNotebookInstance.html) API operation.

You can update the tags of a notebook instance that is `InService`. To update any other attribute of a notebook instance, its status must be `Stopped`.
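
Scripted updates follow the same rule. The following boto3 sketch stops the instance, waits for the `Stopped` status, and then grows the ML storage volume; the instance name is a placeholder:

```python
def valid_resize(current_gb, new_gb):
    """The ML storage volume can only grow, within the 5-16,384 GB range."""
    return 5 <= new_gb <= 16384 and new_gb > current_gb

def resize_volume(name, new_size_gb):
    """Stop the instance, wait for it to stop, then increase the volume size.
    Requires boto3 and valid credentials at run time."""
    import boto3
    sm = boto3.client("sagemaker")
    sm.stop_notebook_instance(NotebookInstanceName=name)
    sm.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)
    sm.update_notebook_instance(NotebookInstanceName=name, VolumeSizeInGB=new_size_gb)
```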

**To update a notebook instance in the SageMaker AI console:**

1. Open the SageMaker AI console at [https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/). 

1. Choose **Notebook instances**.

1. Choose the notebook instance that you want to update by selecting the notebook instance **Name** from the list.

1. If your notebook **Status** is not `Stopped`, select the **Stop** button to stop the notebook instance. 

   When you do this, the notebook instance status changes to `Stopping`. Wait until the status changes to `Stopped` to complete the following steps. 

1. Select the **Edit** button to open the **Edit notebook instance** page. For information about the notebook properties you can update, see [Create an Amazon SageMaker notebook instance](howitworks-create-ws.md).

1. Update your notebook instance and select the **Update notebook instance** button at the bottom of the page when you are done to return to the notebook instances page. Your notebook instance status changes to **Updating**. 

   When the notebook instance update is complete, the status changes to `Stopped`.

# Customization of a SageMaker notebook instance using an LCC script
<a name="notebook-lifecycle-config"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[Amazon managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

A *lifecycle configuration* (LCC) provides shell scripts that run only when you create the notebook instance or whenever you start one. When you create a notebook instance, you can create a new LCC or attach an LCC that you already have. Lifecycle configuration scripts are useful for the following use cases:
+ Installing packages or sample notebooks on a notebook instance
+ Configuring networking and security for a notebook instance
+ Using a shell script to customize a notebook instance

You can also use a lifecycle configuration script to access Amazon services from your notebook. For example, you can create a script that lets you use your notebook to control other Amazon resources, such as an Amazon EMR instance.

We maintain a public repository of notebook lifecycle configuration scripts that address common use cases for customizing notebook instances at [https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples](https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples).
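
You can also create a lifecycle configuration programmatically. The `CreateNotebookInstanceLifecycleConfig` API expects the script content base64-encoded; the configuration name and script below are illustrative:

```python
import base64

# An illustrative on-start script; replace with your own customization.
on_start = """#!/bin/bash
set -e
echo "notebook started" >> /home/ec2-user/SageMaker/.start-log
"""

# Script content must be base64-encoded in the API request.
encoded = base64.b64encode(on_start.encode("utf-8")).decode("utf-8")

def create_lcc(name):
    """Create the lifecycle configuration. Requires boto3 and valid credentials."""
    import boto3
    return boto3.client("sagemaker").create_notebook_instance_lifecycle_config(
        NotebookInstanceLifecycleConfigName=name,
        OnStart=[{"Content": encoded}],
    )
```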

**Note**  
Each script has a limit of 16384 characters.  
The value of the `$PATH` environment variable that is available to both scripts is `/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin`. The working directory, which is the value of the `$PWD` environment variable, is `/`.  
View CloudWatch Logs for notebook instance lifecycle configurations in log group `/aws/sagemaker/NotebookInstances` in log stream `[notebook-instance-name]/[LifecycleConfigHook]`.  
Scripts cannot run for longer than 5 minutes. If a script runs for longer than 5 minutes, it fails and the notebook instance is not created or started. To help decrease the run time of scripts, try the following:  
+ Cut down on unnecessary steps. For example, limit the conda environments in which you install large packages.
+ Run tasks in parallel processes.
+ Use the `nohup` command in your script.

You can see a list of notebook instance lifecycle configurations you previously created by choosing **Lifecycle configuration** in the SageMaker AI console. You can attach a notebook instance LCC when you create a new notebook instance. For more information about creating a notebook instance, see [Create an Amazon SageMaker notebook instance](howitworks-create-ws.md).

# Create a lifecycle configuration script
<a name="notebook-lifecycle-config-create"></a>

The following procedure shows how to create a lifecycle configuration script for use with an Amazon SageMaker notebook instance. For more information about creating a notebook instance, see [Create an Amazon SageMaker notebook instance](howitworks-create-ws.md).

**To create a lifecycle configuration**

1. Open the SageMaker AI console at [https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/). 

1. On the left navigation pane, choose **Admin configurations**.

1. Under **Admin configurations**, choose **Lifecycle configurations**. 

1. From the **Lifecycle configurations** page, choose the **Notebook Instance** tab.

1. Choose **Create configuration**.

1. For **Name**, type a name using alphanumeric characters and "-", but no spaces. The name can have a maximum of 63 characters.

1. (Optional) To create a script that runs when you create the notebook and every time you start it, choose **Start notebook**.

1. In the **Start notebook** editor, type the script.

1. (Optional) To create a script that runs only once, when you create the notebook, choose **Create notebook**.

1. In the **Create notebook** editor, type the script.

1. Choose **Create configuration**.

## Lifecycle Configuration Best Practices
<a name="nbi-lifecycle-config-bp"></a>

The following are best practices for using lifecycle configurations:

**Important**  
We do not recommend storing sensitive information in your lifecycle configuration script.

**Important**  
Lifecycle configuration scripts run with root access and the notebook instance's IAM execution role privileges, regardless of the root access setting for notebook users. Principals with permissions to create or modify lifecycle configurations and update notebook instances can execute code with the execution role's credentials. See [Control root access to a SageMaker notebook instance](nbi-root-access.md) for more information.
+ Lifecycle configurations run as the `root` user. If your script makes any changes within the `/home/ec2-user/SageMaker` directory (for example, installing a package with `pip`), use the command `sudo -u ec2-user` to run as the `ec2-user` user. This is the same user that Amazon SageMaker AI runs as.
+ SageMaker AI notebook instances use `conda` environments to implement different kernels for Jupyter notebooks. If you want to install packages that are available to one or more notebook kernels, enclose the commands to install the packages with `conda` environment commands that activate the conda environment that contains the kernel where you want to install the packages.

  For example, if you want to install a package only for the `python3` environment, use the following code:

  ```
  #!/bin/bash
  sudo -u ec2-user -i <<EOF
  
  # This will affect only the Jupyter kernel called "conda_python3".
  source activate python3
  
  # Replace myPackage with the name of the package you want to install.
  pip install myPackage
  # You can also perform "conda install" here as well.
  
  source deactivate
  
  EOF
  ```

  If you want to install a package in all conda environments in the notebook instance, use the following code:

  ```
  #!/bin/bash
  sudo -u ec2-user -i <<EOF
  
  # Note that "base" is a special environment name; include it here as well.
  for env in base /home/ec2-user/anaconda3/envs/*; do
      source /home/ec2-user/anaconda3/bin/activate $(basename "$env")
  
      # Installing packages in the Jupyter system environment can affect stability of your SageMaker
      # Notebook Instance.  You can remove this check if you'd like to install Jupyter extensions, etc.
      if [ "$(basename "$env")" = 'JupyterSystemEnv' ]; then
        continue
      fi
  
      # Replace myPackage with the name of the package you want to install.
      pip install --upgrade --quiet myPackage
      # You can also perform "conda install" here as well.
  
      source /home/ec2-user/anaconda3/bin/deactivate
  done
  
  EOF
  ```
+ You must store all conda environments in the default environments folder (`/home/ec2-user/anaconda3/envs`).

**Important**  
When you create or change a script, we recommend that you use a text editor that provides Unix-style line breaks, such as the text editor available in the console when you create a notebook. Copying text from a non-Linux operating system might introduce incompatible line breaks and result in an unexpected error.

# External library and kernel installation
<a name="nbi-add-external"></a>

**Important**  
Currently, all packages in notebook instance environments are licensed for use with Amazon SageMaker AI and do not require additional commercial licenses. However, this might be subject to change in the future, and we recommend reviewing the licensing terms regularly for any updates.

Amazon SageMaker notebook instances come with multiple environments already installed. These environments contain Jupyter kernels and Python packages including: scikit, Pandas, NumPy, TensorFlow, and MXNet. These environments, along with all files in the `sample-notebooks` folder, are refreshed when you stop and start a notebook instance. You can also install your own environments that contain your choice of packages and kernels.

The different Jupyter kernels in Amazon SageMaker notebook instances are separate conda environments. For information about conda environments, see [Managing environments](https://conda.io/docs/user-guide/tasks/manage-environments.html) in the *Conda* documentation.

Install custom environments and kernels on the notebook instance's Amazon EBS volume. This ensures that they persist when you stop and restart the notebook instance, and that any external libraries you install are not updated by SageMaker AI. To do that, use a lifecycle configuration that includes both a script that runs when you create the notebook instance (`on-create)` and a script that runs each time you restart the notebook instance (`on-start`). For more information about using notebook instance lifecycle configurations, see [Customization of a SageMaker notebook instance using an LCC script](notebook-lifecycle-config.md). There is a GitHub repository that contains sample lifecycle configuration scripts at [SageMaker AI Notebook Instance Lifecycle Config Samples](https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples).

The examples at [https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/blob/master/scripts/persistent-conda-ebs/on-create.sh](https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/blob/master/scripts/persistent-conda-ebs/on-create.sh) and [https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/blob/master/scripts/persistent-conda-ebs/on-start.sh](https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/blob/master/scripts/persistent-conda-ebs/on-start.sh) show the best practice for installing environments and kernels on a notebook instance. The `on-create` script installs the `ipykernel` library to create custom environments as Jupyter kernels, then uses `pip install` and `conda install` to install libraries. You can adapt the script to create custom environments and install libraries that you want. SageMaker AI does not update these libraries when you stop and restart the notebook instance, so you can ensure that your custom environment has specific versions of libraries that you want. The `on-start` script installs any custom environments that you create as Jupyter kernels, so that they appear in the dropdown list in the Jupyter **New** menu.

## Package installation tools
<a name="nbi-add-external-tools"></a>

SageMaker notebooks support the following package installation tools:
+ conda install
+ pip install

You can install packages using the following methods:
+ Lifecycle configuration scripts.

  For example scripts, see [SageMaker AI Notebook Instance Lifecycle Config Samples](https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples). For more information on lifecycle configuration, see [Customize a Notebook Instance Using a Lifecycle Configuration Script](https://docs.amazonaws.cn/sagemaker/latest/dg/notebook-lifecycle-config.html).
+ Notebooks – The following commands are supported.
  + `%conda install`
  + `%pip install`
+ The Jupyter terminal – You can install packages using pip and conda directly.

From within a notebook, you can use the system command syntax (lines starting with `!`) to install packages, for example, `!pip install` and `!conda install`. More recently, new magic commands have been added to IPython: `%pip` and `%conda`. These commands are the recommended way to install packages from a notebook because they correctly take into account the active environment or interpreter being used. For more information, see [Add %pip and %conda magic functions](https://github.com/ipython/ipython/pull/11524).

### Conda
<a name="nbi-add-external-tools-conda"></a>

Conda is an open source package management system and environment management system, which can install packages and their dependencies. SageMaker AI supports using Conda with either of the two main channels: the default channel and the conda-forge channel. For more information, see [Conda channels](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html). The conda-forge channel is a community channel where contributors can upload packages.

**Note**  
Due to how Conda resolves the dependency graph, installing packages from conda-forge can take significantly longer (in the worst cases, upwards of 10 minutes).

The Deep Learning AMI comes with many conda environments and many packages preinstalled. Due to the number of preinstalled packages, finding a set of packages that are guaranteed to be compatible is difficult. You may see the warning, "The environment is inconsistent, please check the package plan carefully". Despite this warning, SageMaker AI ensures that all the environments it provides are correct. SageMaker AI cannot guarantee that any user-installed packages will function correctly.

**Note**  
Users of SageMaker AI, Amazon Deep Learning AMIs and Amazon EMR can access the commercial Anaconda repository without taking a commercial license through February 1, 2024 when using Anaconda in those services. For any usage of the commercial Anaconda repository after February 1, 2024, customers are responsible for determining their own Anaconda license requirements.

Conda has two methods for activating environments: conda activate/deactivate, and source activate/deactivate. For more information, see [Should I use 'conda activate' or 'source activate' in Linux](https://stackoverflow.com/questions/49600611/python-anaconda-should-i-use-conda-activate-or-source-activate-in-linux).

SageMaker AI supports moving Conda environments onto the Amazon EBS volume, which is persisted when the instance is stopped. The environments aren't persisted when the environments are installed to the root volume, which is the default behavior. For an example lifecycle script, see [persistent-conda-ebs](https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/tree/master/scripts/persistent-conda-ebs).

**Supported conda operations (see note at the bottom of this topic)**
+ conda install of a package in a single environment
+ conda install of a package in all environments
+ conda install of an R package in the R environment
+ Installing a package from the main conda repository
+ Installing a package from conda-forge
+ Changing the Conda install location to use EBS
+ Supporting both conda activate and source activate

### Pip
<a name="nbi-add-external-tools-pip"></a>

Pip is the de facto tool for installing and managing Python packages. Pip searches for packages on the Python Package Index (PyPI) by default. Unlike Conda, pip doesn't have built-in environment support and is not as thorough as Conda when it comes to packages with native or system library dependencies. Pip can be used to install packages in Conda environments.

You can use alternative package repositories with pip instead of the PyPI. For an example lifecycle script, see [on-start.sh](https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/blob/master/scripts/add-pypi-repository/on-start.sh).

**Supported pip operations (see note at the bottom of this topic)**
+ Using pip to install a package without an active conda environment (install packages system wide)
+ Using pip to install a package in a conda environment
+ Using pip to install a package in all conda environments
+ Changing the pip install location to use EBS
+ Using an alternative repository to install packages with pip

### Unsupported
<a name="nbi-add-external-tools-misc"></a>

SageMaker AI aims to support as many package installation operations as possible. However, performing the following operations on packages installed by SageMaker AI or DLAMI might make your notebook instance unstable:
+ Uninstalling
+ Downgrading
+ Upgrading

We do not provide support for installing packages with `yum install` or installing R packages from CRAN.

Due to potential issues with network conditions or configurations, or the availability of Conda or PyPI, we cannot guarantee that packages will install in a fixed or deterministic amount of time.

**Note**  
We cannot guarantee that a package installation will be successful. Attempting to install a package in an environment with incompatible dependencies can result in a failure. In such a case, contact the library maintainer to see if it is possible to update the package dependencies. Alternatively, you can attempt to modify the environment in a way that allows the installation. However, this modification will likely mean removing or updating existing packages, which means we can no longer guarantee the stability of the environment.

# Notebook Instance Software Updates
<a name="nbi-software-updates"></a>

Amazon SageMaker AI periodically tests and releases software that is installed on notebook instances. This includes:
+ Kernel updates
+ Security patches
+ Amazon SDK updates
+ [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) updates
+ Open source software updates

To ensure that you have the most recent software updates, stop and restart your notebook instance, either in the SageMaker AI console or by calling [`StopNotebookInstance`](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_StopNotebookInstance.html).
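If you work from the Amazon SDK for Python (Boto3) rather than the console, the stop-and-restart cycle can be sketched as follows. This is a minimal sketch, not a documented procedure; the instance name is a placeholder.

```python
# Sketch: cycle a notebook instance with Boto3 so that it picks up the
# latest software updates. NOTEBOOK_NAME is a placeholder.
NOTEBOOK_NAME = "MyNotebookInstance"


def restart_actions(status):
    """Map a NotebookInstanceStatus value to the API calls needed to
    bring the instance back to InService with fresh software."""
    if status == "InService":
        return ["stop_notebook_instance", "start_notebook_instance"]
    if status == "Stopped":
        return ["start_notebook_instance"]
    return []  # Pending, Stopping, and so on: wait and retry later


def restart(name=NOTEBOOK_NAME):
    import boto3  # preinstalled on notebook instances

    sm = boto3.client("sagemaker")
    status = sm.describe_notebook_instance(NotebookInstanceName=name)[
        "NotebookInstanceStatus"
    ]
    for action in restart_actions(status):
        getattr(sm, action)(NotebookInstanceName=name)
        waiter = (
            "notebook_instance_stopped"
            if action == "stop_notebook_instance"
            else "notebook_instance_in_service"
        )
        sm.get_waiter(waiter).wait(NotebookInstanceName=name)
```

The waiters block until each transition completes, so the call returns only once the instance is back in service.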

You can also manually update software installed on your notebook instance while it is running by using update commands in a terminal or in a notebook.

**Note**  
Updating kernels and some packages might depend on whether root access is enabled for the notebook instance. For more information, see [Control root access to a SageMaker notebook instance](nbi-root-access.md).

You can check the [Personal Health Dashboard](http://www.amazonaws.cn/premiumsupport/technology/personal-health-dashboard/) or the security bulletin at [Security Bulletins](https://www.amazonaws.cn/security/security-bulletins/) for updates.

# Control an Amazon EMR Spark Instance Using a Notebook
<a name="nbi-lifecycle-config-emr"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[Amazon managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

You can use a notebook instance created with a custom lifecycle configuration script to access Amazon services from your notebook. For example, you can create a script that lets you use your notebook with Sparkmagic to control other Amazon resources, such as an Amazon EMR instance. You can then use the Amazon EMR instance to process your data instead of running the data analysis on your notebook. This allows you to create a smaller notebook instance because you won't use the instance to process data. This is helpful when you have large datasets that would require a large notebook instance to process the data.

The process requires three procedures using the Amazon SageMaker AI console:
+ Create the Amazon EMR Spark instance
+ Create the Jupyter Notebook
+ Test the notebook-to-Amazon EMR connection

**To create an Amazon EMR Spark instance that can be controlled from a notebook using Sparkmagic**

1. Open the Amazon EMR console at [https://console.amazonaws.cn/elasticmapreduce/](https://console.amazonaws.cn/elasticmapreduce/).

1. In the navigation pane, choose **Create cluster**.

1. On the **Create Cluster - Quick Options** page, under **Software configuration**, choose **Spark: Spark 2.4.4 on Hadoop 2.8.5 YARN with Ganglia 3.7.2 and Zeppelin 0.8.2**.

1. Set additional parameters on the page and then choose **Create cluster**.

1. On the **Cluster** page, choose the cluster name that you created. Note the **Master Public DNS**, the **EMR master's security group**, and the VPC name and subnet ID where the EMR cluster was created. You will use these values when you create a notebook.

**To create a notebook that uses Sparkmagic to control an Amazon EMR Spark instance**

1. Open the Amazon SageMaker AI console at [https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/).

1. In the navigation pane, under **Notebook instances**, choose **Create notebook**.

1. Enter the notebook instance name and choose the instance type.

1. Choose **Additional configuration**, then, under **Lifecycle configuration**, choose **Create a new lifecycle configuration**.

1. Add the following code to the lifecycle configuration script:

   ```
   # OVERVIEW
   # This script connects an Amazon EMR cluster to an Amazon SageMaker notebook instance that uses Sparkmagic.
   #
   # Note that this script will fail if the Amazon EMR cluster's master node IP address is not reachable.
   #   1. Ensure that the EMR master node IP is resolvable from the notebook instance.
   #      One way to accomplish this is to have the notebook instance and the Amazon EMR cluster in the same subnet.
   #   2. Ensure the EMR master node security group provides inbound access from the notebook instance security group.
   #       Type        - Protocol - Port - Source
   #       Custom TCP  - TCP      - 8998 - $NOTEBOOK_SECURITY_GROUP
   #   3. Ensure the notebook instance has internet connectivity to fetch the SparkMagic example config.
   #
   # https://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/
   
   # PARAMETERS
   EMR_MASTER_IP=your.emr.master.ip
   
   
   cd /home/ec2-user/.sparkmagic
   
   echo "Fetching Sparkmagic example config from GitHub..."
   wget https://raw.githubusercontent.com/jupyter-incubator/sparkmagic/master/sparkmagic/example_config.json
   
   echo "Replacing EMR master node IP in Sparkmagic config..."
   sed -i -- "s/localhost/$EMR_MASTER_IP/g" example_config.json
   mv example_config.json config.json
   
   echo "Sending a sample request to Livy.."
   curl "$EMR_MASTER_IP:8998/sessions"
   ```

1. In the `PARAMETERS` section of the script, replace `your.emr.master.ip` with the Master Public DNS name for the Amazon EMR instance.

1. Choose **Create configuration**.

1. On the **Create notebook** page, choose **Network - optional**.

1. Choose the VPC and subnet where the Amazon EMR instance is located.

1. Choose the security group used by the Amazon EMR master node.

1. Choose **Create notebook instance**.

While the notebook instance is being created, the status is **Pending**. After the instance has been created and the lifecycle configuration script has successfully run, the status is **InService**.

**Note**  
If the notebook instance can't connect to the Amazon EMR instance, SageMaker AI can't create the notebook instance. The connection can fail if the Amazon EMR instance and notebook are not in the same VPC and subnet, if the Amazon EMR master security group is not used by the notebook, or if the Master Public DNS name in the script is incorrect. 
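The `wget` and `sed` steps in the lifecycle script above amount to rewriting every `localhost` in the Sparkmagic example config to the EMR master address. The same rewrite can be sketched in Python; the config fragment below is a simplified assumption, not the full Sparkmagic `example_config.json`.

```python
# Sketch: the Sparkmagic config rewrite that the lifecycle script does
# with sed. The sample fragment is an assumption for illustration.
import json


def point_config_at_emr(example_config_text, emr_master_ip):
    """Replace every 'localhost' in the Sparkmagic example config with
    the EMR master address and return the result as a JSON string."""
    rewritten = example_config_text.replace("localhost", emr_master_ip)
    json.loads(rewritten)  # fail fast if the result is not valid JSON
    return rewritten


sample = '{"kernel_python_credentials": {"url": "http://localhost:8998"}}'
print(point_config_at_emr(sample, "10.0.0.5"))
# {"kernel_python_credentials": {"url": "http://10.0.0.5:8998"}}
```

Port 8998 is the Livy endpoint the script probes with `curl`, which is why the EMR master security group must allow inbound TCP 8998 from the notebook instance.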

**To test the connection between the Amazon EMR instance and the notebook**

1.  When the status of the notebook is **InService**, choose **Open Jupyter** to open the notebook.

1. Choose **New**, then choose **Sparkmagic (PySpark)**.

1. In the code cell, enter **%%info** and then run the cell.

   The output should be similar to the following:

   ```
   Current session configs: {'driverMemory': '1000M', 'executorCores': 2, 'kind': 'pyspark'}
   No active sessions.
   ```

# Set the Notebook Kernel
<a name="howitworks-set-kernel"></a>

Amazon SageMaker AI provides several kernels for Jupyter that provide support for Python 2 and 3, Apache MXNet, TensorFlow, and PySpark. To set a kernel for a new notebook in the Jupyter notebook dashboard, choose **New**, and then choose the kernel from the list. For more information about the available kernels, see [Available Kernels](nbi-al2.md#nbi-al2-kernel).

![\[Location of the New drop-down list in the Jupyter notebook dashboard.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/nbi-set-kernel.png)


You can also create a custom kernel that you can use in your notebook instance. For information, see [External library and kernel installation](nbi-add-external.md).

# Git repositories with SageMaker AI Notebook Instances
<a name="nbi-git-repo"></a>

Associate Git repositories with your notebook instance to save your notebooks in a source control environment that persists even if you stop or delete your notebook instance. You can associate one default repository and up to three additional repositories with a notebook instance. The repositories can be hosted in Amazon CodeCommit, GitHub, or on any other Git server. Associating Git repositories with your notebook instance can be useful for:
+ Persistence - Notebooks in a notebook instance are stored on durable Amazon EBS volumes, but they do not persist beyond the life of your notebook instance. Storing notebooks in a Git repository enables you to store and use notebooks even if you stop or delete your notebook instance.
+ Collaboration - Peers on a team often work on machine learning projects together. Storing your notebooks in Git repositories allows peers working in different notebook instances to share notebooks and collaborate on them in a source-control environment.
+ Learning - Many Jupyter notebooks that demonstrate machine learning techniques are available in publicly hosted Git repositories, such as on GitHub. You can associate your notebook instance with a repository to easily load Jupyter notebooks contained in that repository.

There are two ways to associate a Git repository with a notebook instance:
+ Add a Git repository as a resource in your Amazon SageMaker AI account. Then, to access the repository, you can specify an Amazon Secrets Manager secret that contains credentials. That way, you can access repositories that require authentication.
+ Associate a public Git repository that is not a resource in your account. If you do this, you cannot specify credentials to access the repository.

**Topics**
+ [Add a Git repository to your Amazon SageMaker AI account](nbi-git-resource.md)
+ [Create a Notebook Instance with an Associated Git Repository](nbi-git-create.md)
+ [Associate a CodeCommit Repository in a Different Amazon Account with a Notebook Instance](nbi-git-cross.md)
+ [Use Git Repositories in a Notebook Instance](git-nbi-use.md)

# Add a Git repository to your Amazon SageMaker AI account
<a name="nbi-git-resource"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[Amazon managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

To manage your GitHub repositories, easily associate them with your notebook instances, and associate credentials for repositories that require authentication, add the repositories as resources in your Amazon SageMaker AI account. You can view a list of repositories that are stored in your account and details about each repository in the SageMaker AI console and by using the API.

You can add Git repositories to your SageMaker AI account in the SageMaker AI console or by using the Amazon CLI.

**Note**  
You can use the SageMaker AI API [`CreateCodeRepository`](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateCodeRepository.html) to add Git repositories to your SageMaker AI account, but step-by-step instructions are not provided here.
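As a brief sketch of that API via the Amazon SDK for Python (Boto3): the repository name, URL, branch, and secret ARN below are all placeholders.

```python
# Sketch: add a Git repository to your SageMaker AI account with the
# CreateCodeRepository API. All values shown are placeholders.
def git_config(url, branch=None, secret_arn=None):
    """Build the GitConfig structure expected by CreateCodeRepository.
    Branch and SecretArn are optional."""
    config = {"RepositoryUrl": url}
    if branch:
        config["Branch"] = branch
    if secret_arn:
        config["SecretArn"] = secret_arn
    return config


def add_repository(name, url, branch=None, secret_arn=None):
    import boto3  # preinstalled on notebook instances

    sm = boto3.client("sagemaker")
    return sm.create_code_repository(
        CodeRepositoryName=name,
        GitConfig=git_config(url, branch, secret_arn),
    )
```

Omit `secret_arn` for public repositories that require no authentication.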

## Add a Git repository to your SageMaker AI account (Console)
<a name="nbi-git-resource-console"></a>

**To add a Git repository as a resource in your SageMaker AI account**

1. Open the SageMaker AI console at [https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/).

1. Under **Notebook**, choose **Git repositories**, then choose **Add repository**.

1. To add a CodeCommit repository, choose **Amazon CodeCommit**. To add a GitHub or other Git-based repository, choose **GitHub/Other Git-based repo**.

**To add an existing CodeCommit repository**

1. Choose **Use existing repository**.

1. For **Repository**, choose a repository from the list.

1. Enter a name to use for the repository in SageMaker AI. The name must be 1 to 63 characters. Valid characters are a-z, A-Z, 0-9, and - (hyphen).

1. Choose **Add repository**.

**To create a new CodeCommit repository**

1. Choose **Create new repository**.

1. Enter a name for the repository that you can use in both CodeCommit and SageMaker AI. The name must be 1 to 63 characters. Valid characters are a-z, A-Z, 0-9, and - (hyphen).

1. Choose **Create repository**.

**To add a Git repository hosted somewhere other than CodeCommit**

1. Choose **GitHub/Other Git-based repo**.

1. Enter a name to use for the repository in SageMaker AI. The name must be 1 to 63 characters. Valid characters are a-z, A-Z, 0-9, and - (hyphen).

1. Enter the URL for the repository. Do not provide a username in the URL. Add the sign-in credentials in Amazon Secrets Manager as described in the next step.

1. For **Git credentials**, choose the credentials to use to authenticate to the repository. This is necessary only if the Git repository is private.
**Note**  
If you have two-factor authentication enabled for your Git repository, enter a personal access token generated by your Git service provider in the `password` field.

   1. To use an existing Amazon Secrets Manager secret, choose **Use existing secret**, and then choose a secret from the list. For information about creating and storing a secret, see [Creating a Basic Secret](https://docs.amazonaws.cn/secretsmanager/latest/userguide/manage_create-basic-secret.html) in the *Amazon Secrets Manager User Guide*. The name of the secret you use must contain the string `sagemaker`.
**Note**  
The secret must have a staging label of `AWSCURRENT` and must be in the following format:  
`{"username": UserName, "password": Password}`  
For GitHub repositories, we recommend using a personal access token in the `password` field. For information, see [https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/).

   1. To create a new Amazon Secrets Manager secret, choose **Create secret**, enter a name for the secret, and then enter the sign-in credentials to use to authenticate to the repository. The name for the secret must contain the string `sagemaker`.
**Note**  
The IAM role you use to create the secret must have the `secretsmanager:GetSecretValue` permission in its IAM policy.  
The secret must have a staging label of `AWSCURRENT` and must be in the following format:  
`{"username": UserName, "password": Password}`  
For GitHub repositories, we recommend using a personal access token.

   1. To not use any credentials, choose **No secret**.

1. Choose **Add repository**.

# Add a Git repository to your Amazon SageMaker AI account (CLI)
<a name="nbi-git-resource-cli"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[Amazon managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

Use the `create-code-repository` Amazon CLI command to add a Git repository to Amazon SageMaker AI to give users access to external resources. Specify a name for the repository as the value of the `code-repository-name` argument. The name must be 1 to 63 characters. Valid characters are a-z, A-Z, 0-9, and - (hyphen). Also specify the following:
+ The default branch
+ The URL of the Git repository
**Note**  
Do not provide a username in the URL. Add the sign-in credentials in Amazon Secrets Manager as described in the next step.
+ The Amazon Resource Name (ARN) of an Amazon Secrets Manager secret that contains the credentials to use to authenticate the repository as the value of the `git-config` argument

For information about creating and storing a secret, see [Creating a Basic Secret](https://docs.amazonaws.cn/secretsmanager/latest/userguide/manage_create-basic-secret.html) in the *Amazon Secrets Manager User Guide*. The following command creates a new repository named `MyRepository` in your Amazon SageMaker AI account that points to a Git repository hosted at `https://github.com/myprofile/my-repo`.

For Linux, OS X, or Unix:

```
aws sagemaker create-code-repository \
                    --code-repository-name "MyRepository" \
                    --git-config Branch=branch,RepositoryUrl=https://github.com/myprofile/my-repo,SecretArn=arn:aws:secretsmanager:us-east-2:012345678901:secret:my-secret-ABc0DE
```

For Windows:

```
aws sagemaker create-code-repository ^
                    --code-repository-name "MyRepository" ^
                    --git-config "{\"Branch\":\"master\", \"RepositoryUrl\" :
                    \"https://github.com/myprofile/my-repo\", \"SecretArn\" : \"arn:aws:secretsmanager:us-east-2:012345678901:secret:my-secret-ABc0DE\"}"
```

**Note**  
The secret must have a staging label of `AWSCURRENT` and must be in the following format:  
`{"username": UserName, "password": Password}`  
For GitHub repositories, we recommend using a personal access token.
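A minimal Boto3 sketch of creating a secret in that format follows; the credentials and secret name are placeholders. `create_secret` attaches the `AWSCURRENT` staging label to the new version automatically.

```python
# Sketch: store Git credentials in Amazon Secrets Manager in the format
# SageMaker AI expects. All values shown are placeholders.
import json


def build_git_secret(username, token):
    """Serialize Git credentials in the required secret format. For
    GitHub, pass a personal access token as the password."""
    return json.dumps({"username": username, "password": token})


def validate_git_secret(secret_string):
    """Check that a secret string has exactly the expected keys."""
    data = json.loads(secret_string)
    return set(data) == {"username", "password"}


def store_git_secret(name, username, token):
    # The secret name must contain the string "sagemaker".
    if "sagemaker" not in name:
        raise ValueError("secret name must contain the string 'sagemaker'")
    import boto3  # preinstalled on notebook instances

    sm = boto3.client("secretsmanager")
    return sm.create_secret(Name=name, SecretString=build_git_secret(username, token))
```

Pass the returned secret's ARN as `SecretArn` in the `git-config` of `create-code-repository`.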

# Create a Notebook Instance with an Associated Git Repository
<a name="nbi-git-create"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[Amazon managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

You can associate Git repositories with a notebook instance when you create the notebook instance by using the Amazon Web Services Management Console, or the Amazon CLI. If you want to use a CodeCommit repository that is in a different Amazon account than the notebook instance, set up cross-account access for the repository. For information, see [Associate a CodeCommit Repository in a Different Amazon Account with a Notebook Instance](nbi-git-cross.md).

**Topics**
+ [Create a Notebook Instance with an Associated Git Repository (Console)](#nbi-git-create-console)
+ [Create a Notebook Instance with an Associated Git Repository (CLI)](nbi-git-create-cli.md)

## Create a Notebook Instance with an Associated Git Repository (Console)
<a name="nbi-git-create-console"></a>

**To create a notebook instance and associate Git repositories in the Amazon SageMaker AI console**

1. Follow the instructions at [Create an Amazon SageMaker Notebook Instance for the tutorial](gs-setup-working-env.md).

1. For **Git repositories**, choose Git repositories to associate with the notebook instance.

   1. For **Default repository**, choose a repository that you want to use as your default repository. SageMaker AI clones this repository as a subdirectory in the Jupyter startup directory at `/home/ec2-user/SageMaker`. When you open your notebook instance, it opens in this repository. To choose a repository that is stored as a resource in your account, choose its name from the list. To add a new repository as a resource in your account, choose **Add a repository to SageMaker AI (opens the Add repository flow in a new window)** and then follow the instructions at [Create a Notebook Instance with an Associated Git Repository (Console)](#nbi-git-create-console). To clone a public repository that is not stored in your account, choose **Clone a public Git repository to this notebook instance only**, and then specify the URL for that repository.

   1. For **Additional repository 1**, choose a repository that you want to add as an additional directory. SageMaker AI clones this repository as a subdirectory in the Jupyter startup directory at `/home/ec2-user/SageMaker`. To choose a repository that is stored as a resource in your account, choose its name from the list. To add a new repository as a resource in your account, choose **Add a repository to SageMaker AI (opens the Add repository flow in a new window)** and then follow the instructions at [Create a Notebook Instance with an Associated Git Repository (Console)](#nbi-git-create-console). To clone a repository that is not stored in your account, choose **Clone a public Git repository to this notebook instance only**, and then specify the URL for that repository.

      Repeat this step up to three times to add up to three additional repositories to your notebook instance.

# Create a Notebook Instance with an Associated Git Repository (CLI)
<a name="nbi-git-create-cli"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[Amazon managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

To create a notebook instance and associate Git repositories by using the Amazon CLI, use the `create-notebook-instance` command as follows:
+ Specify the repository that you want to use as your default repository as the value of the `default-code-repository` argument. Amazon SageMaker AI clones this repository as a subdirectory in the Jupyter startup directory at `/home/ec2-user/SageMaker`. When you open your notebook instance, it opens in this repository. To use a repository that is stored as a resource in your SageMaker AI account, specify the name of the repository as the value of the `default-code-repository` argument. To use a repository that is not stored in your account, specify the URL of the repository as the value of the `default-code-repository` argument.
+ Specify up to three additional repositories as the value of the `additional-code-repositories` argument. SageMaker AI clones these repositories as subdirectories in the Jupyter startup directory at `/home/ec2-user/SageMaker` and excludes them from the default repository by adding them to the `.git/info/exclude` file of the default repository. To use repositories that are stored as resources in your SageMaker AI account, specify the names of the repositories as the value of the `additional-code-repositories` argument. To use repositories that are not stored in your account, specify the URLs of the repositories as the value of the `additional-code-repositories` argument.

For example, the following command creates a notebook instance that uses a repository named `MyGitRepo`, stored as a resource in your SageMaker AI account, as its default repository, and an additional repository that is hosted on GitHub:

```
aws sagemaker create-notebook-instance \
                    --notebook-instance-name "MyNotebookInstance" \
                    --instance-type "ml.t2.medium" \
                    --role-arn "arn:aws:iam::012345678901:role/service-role/AmazonSageMaker-ExecutionRole-20181129T121390" \
                    --default-code-repository "MyGitRepo" \
                    --additional-code-repositories "https://github.com/myprofile/my-other-repo"
```

**Note**  
If you use an Amazon CodeCommit repository that does not contain "SageMaker" in its name, add the `codecommit:GitPull` and `codecommit:GitPush` permissions to the role that you pass as the `role-arn` argument to the `create-notebook-instance` command. For information about how to add permissions to a role, see [Adding and Removing IAM Policies](https://docs.amazonaws.cn/IAM/latest/UserGuide/access_policies_manage-attach-detach.html) in the *Amazon Identity and Access Management User Guide*. 
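The same call can be sketched with the Amazon SDK for Python (Boto3); all names, ARNs, and repository values below are placeholders.

```python
# Sketch: the Boto3 equivalent of the create-notebook-instance CLI call.
# All values shown are placeholders.
def notebook_params(name, role_arn, default_repo, additional_repos=()):
    """Build the CreateNotebookInstance request with Git repositories.
    Repository values may be names of repositories registered in your
    account, or plain URLs for public repositories."""
    params = {
        "NotebookInstanceName": name,
        "InstanceType": "ml.t2.medium",
        "RoleArn": role_arn,
        "DefaultCodeRepository": default_repo,
    }
    if additional_repos:
        # Up to three additional repositories are supported.
        params["AdditionalCodeRepositories"] = list(additional_repos)
    return params


def create_notebook(**kwargs):
    import boto3  # preinstalled on notebook instances

    return boto3.client("sagemaker").create_notebook_instance(
        **notebook_params(**kwargs)
    )
```

The note above about `codecommit:GitPull` and `codecommit:GitPush` permissions applies equally to the role passed as `RoleArn` here.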

# Associate a CodeCommit Repository in a Different Amazon Account with a Notebook Instance
<a name="nbi-git-cross"></a>

To associate a CodeCommit repository in a different Amazon account with your notebook instance, set up cross-account access for the CodeCommit repository.

**To set up cross-account access for a CodeCommit repository and associate it with a notebook instance:**

1. In the Amazon account that contains the CodeCommit repository, create an IAM policy that allows access to the repository from users in the account that contains your notebook instance. For information, see [Step 1: Create a Policy for Repository Access in AccountA](https://docs.amazonaws.cn/codecommit/latest/userguide/cross-account-administrator-a.html#cross-account-create-policy-a) in the *CodeCommit User Guide*.

1. In the Amazon account that contains the CodeCommit repository, create an IAM role, and attach the policy that you created in the previous step to that role. For information, see [Step 2: Create a Role for Repository Access in AccountA](https://docs.amazonaws.cn/codecommit/latest/userguide/cross-account-administrator-a.html#cross-account-create-role-a) in the *CodeCommit User Guide*.

1. Create a profile in the notebook instance that uses the role that you created in the previous step:

   1. Open the notebook instance.

   1. Open a terminal in the notebook instance.

   1. Create a new profile by typing the following in the terminal:

      ```
      vi /home/ec2-user/.aws/config
      ```

   1. Edit the file with the following profile information:

      ```
      [profile CrossAccountAccessProfile]
      region = us-west-2
      role_arn = arn:aws:iam::CodeCommitAccount:role/CrossAccountRepositoryContributorRole
      credential_source=Ec2InstanceMetadata
      output = json
      ```

      Where *CodeCommitAccount* is the account that contains the CodeCommit repository, *CrossAccountAccessProfile* is the name of the new profile, and *CrossAccountRepositoryContributorRole* is the name of the role you created in the previous step.

1. On the notebook instance, configure git to use the profile you created in the previous step:

   1. Open the notebook instance.

   1. Open a terminal in the notebook instance.

   1. Edit the Git configuration file by typing the following in the terminal:

      ```
      vi /home/ec2-user/.gitconfig
      ```

   1. Edit the file with the following profile information:

      ```
      [credential]
              helper = !aws codecommit credential-helper --profile CrossAccountAccessProfile $@
              UseHttpPath = true
      ```

      Where *CrossAccountAccessProfile* is the name of the profile that you created in the previous step.
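
Both configuration files above can also be written non-interactively from the terminal, which is convenient in, for example, a lifecycle configuration script. The following sketch uses placeholder paths under `/tmp` and an illustrative account ID (`111122223333`); on a notebook instance you would write to `/home/ec2-user/.aws/config` and `/home/ec2-user/.gitconfig`, substituting your own account ID, profile name, and role name.

```shell
# Sketch: write the profile and Git credential-helper configuration
# non-interactively. The paths and account ID below are placeholders.
AWS_DIR=/tmp/demo-home/.aws            # on the instance: /home/ec2-user/.aws
GIT_CONFIG=/tmp/demo-home/.gitconfig   # on the instance: /home/ec2-user/.gitconfig
mkdir -p "$AWS_DIR"

cat > "$AWS_DIR/config" <<'EOF'
[profile CrossAccountAccessProfile]
region = us-west-2
role_arn = arn:aws:iam::111122223333:role/CrossAccountRepositoryContributorRole
credential_source = Ec2InstanceMetadata
output = json
EOF

cat > "$GIT_CONFIG" <<'EOF'
[credential]
        helper = !aws codecommit credential-helper --profile CrossAccountAccessProfile $@
        UseHttpPath = true
EOF

grep 'role_arn' "$AWS_DIR/config"
grep 'helper' "$GIT_CONFIG"
```

Because the heredocs are quoted (`'EOF'`), the `$@` in the credential helper line is written literally rather than expanded by the shell.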

# Use Git Repositories in a Notebook Instance
<a name="git-nbi-use"></a>

When you open a notebook instance that has Git repositories associated with it, it opens in the default repository, which is installed in your notebook instance directly under `/home/ec2-user/SageMaker`. You can open and create notebooks, and you can manually run Git commands in a notebook cell. For example:

```
!git pull origin master
```

To open any of the additional repositories, navigate up one folder. The additional repositories are also installed as directories under `/home/ec2-user/SageMaker`.

If you open the notebook instance with a JupyterLab interface, the jupyter-git extension is installed and available to use. For information about the jupyter-git extension for JupyterLab, see [https://github.com/jupyterlab/jupyterlab-git](https://github.com/jupyterlab/jupyterlab-git).

When you open a notebook instance in JupyterLab, you see the Git repositories associated with it in the left menu:

![\[Example file browser in JupyterLab.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/git-notebook.png)


You can use the jupyter-git extension to manage Git visually, instead of using the command line:

![\[Example of the jupyter-git extension in JupyterLab.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/jupyterlab-git.png)


# Notebook Instance Metadata
<a name="nbi-metadata"></a>

When you create a notebook instance, Amazon SageMaker AI creates a JSON file on the instance at the location `/opt/ml/metadata/resource-metadata.json` that contains the `ResourceName` and `ResourceArn` of the notebook instance. You can access this metadata from anywhere within the notebook instance, including in lifecycle configurations. For information about notebook instance lifecycle configurations, see [Customization of a SageMaker notebook instance using an LCC script](notebook-lifecycle-config.md).

**Note**  
The `resource-metadata.json` file can be modified with root access.

The `resource-metadata.json` file has the following structure:

```
{
    "ResourceArn": "NotebookInstanceArn",
    "ResourceName": "NotebookInstanceName"
}
```
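
A quick way to inspect the file is with `jq`, which is preinstalled on notebook instances. The sketch below recreates a file with the same structure under `/tmp` using placeholder values; on a real instance you would point `jq` at `/opt/ml/metadata/resource-metadata.json` instead.

```shell
# Placeholder metadata file mirroring the structure shown above.
cat > /tmp/resource-metadata.json <<'EOF'
{
    "ResourceArn": "arn:aws-cn:sagemaker:cn-north-1:111122223333:notebook-instance/my-notebook",
    "ResourceName": "my-notebook"
}
EOF

jq --raw-output '.ResourceName' /tmp/resource-metadata.json   # my-notebook
jq --raw-output '.ResourceArn' /tmp/resource-metadata.json
```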

You can use this metadata from within the notebook instance to get other information about the notebook instance. For example, the following commands get the tags associated with the notebook instance:

```
NOTEBOOK_ARN=$(jq --raw-output '.ResourceArn' /opt/ml/metadata/resource-metadata.json)
aws sagemaker list-tags --resource-arn "$NOTEBOOK_ARN"
```

The output looks like the following:

```
{
    "Tags": [
        {
            "Key": "test",
            "Value": "true"
        }
    ]
}
```

# Monitor Jupyter Logs in Amazon CloudWatch Logs
<a name="jupyter-logs"></a>

Jupyter logs include important information, such as events, metrics, and health data, that provides actionable insights when you run Amazon SageMaker notebooks. By importing Jupyter logs into CloudWatch Logs, you can use CloudWatch Logs to detect anomalous behavior, set alarms, and discover insights that keep your SageMaker AI notebooks running smoothly. You can access the logs even when the Amazon EC2 instance that hosts the notebook is unresponsive, and use them to troubleshoot the unresponsive notebook. Sensitive information, such as Amazon account IDs, secret keys, and authentication tokens in presigned URLs, is removed from the logs so that you can share them without leaking private information.

**To view Jupyter logs for a notebook instance:**

1. Sign in to the Amazon Web Services Management Console and open the SageMaker AI console at [https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/). 

1. Choose **Notebook instances**.

1. In the list of notebook instances, choose the **Name** of the notebook instance whose Jupyter logs you want to view.

   This opens the details page for that notebook instance.

1. Under **Monitor** on the notebook instance details page, choose **View logs**.

1. In the CloudWatch console, choose the log stream for your notebook instance. Its name is in the form `NotebookInstanceName/jupyter.log`.

For more information about monitoring CloudWatch logs for SageMaker AI, see [CloudWatch Logs for Amazon SageMaker AI](logging-cloudwatch.md).
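
Because the log stream name follows the pattern `NotebookInstanceName/jupyter.log`, you can also build it from the instance metadata described earlier and pass it to the Amazon CLI from a terminal or notebook cell. The sketch below uses a placeholder metadata file under `/tmp`; on a real instance you would read `/opt/ml/metadata/resource-metadata.json`, and the CLI call (shown commented out, assuming the standard `/aws/sagemaker/NotebookInstances` log group) requires credentials with CloudWatch Logs permissions.

```shell
# Build the Jupyter log stream name for this instance from its metadata.
# The /tmp file below is a placeholder for /opt/ml/metadata/resource-metadata.json.
cat > /tmp/nbi-metadata.json <<'EOF'
{"ResourceArn": "arn:aws-cn:sagemaker:cn-north-1:111122223333:notebook-instance/my-notebook",
 "ResourceName": "my-notebook"}
EOF

NOTEBOOK_NAME=$(jq --raw-output '.ResourceName' /tmp/nbi-metadata.json)
LOG_STREAM="${NOTEBOOK_NAME}/jupyter.log"
echo "$LOG_STREAM"   # my-notebook/jupyter.log

# With credentials, you could then fetch the most recent events:
# aws logs get-log-events \
#     --log-group-name /aws/sagemaker/NotebookInstances \
#     --log-stream-name "$LOG_STREAM"
```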