JupyterLab user guide - Amazon SageMaker

JupyterLab user guide

This guide shows JupyterLab users how to run analytics and machine learning workflows within SageMaker Studio. You can get fast storage and scale your compute up or down, depending on your needs.

JupyterLab supports both private and shared spaces. Private spaces are scoped to a single user in a domain. Shared spaces let other users in your domain collaborate with you in real time. For information about Studio spaces, see Amazon SageMaker Studio spaces.

To get started using JupyterLab, create a space and launch your JupyterLab application. The space running your JupyterLab application is a JupyterLab space. The JupyterLab space uses a single Amazon EC2 instance for your compute and a single Amazon EBS volume for your storage. Everything in your space, such as your code, Git profile, and environment variables, is stored on the same Amazon EBS volume. The volume has 3000 IOPS and a throughput of 125 megabytes per second (MBps). This fast storage lets you open and run multiple Jupyter notebooks on the same instance and switch kernels in a notebook quickly.
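
To put the 125 MBps figure in perspective, the following sketch estimates the minimum time needed to read a file sequentially from the volume at that peak throughput. It assumes decimal megabytes and ignores caching and the IOPS limit, so real reads generally take longer. The function name is illustrative.

```shell
#!/bin/sh
# Peak throughput of the space's Amazon EBS volume, in megabytes per second.
EBS_THROUGHPUT_MBPS=125

# Print the minimum whole number of seconds needed to read SIZE_MB megabytes
# sequentially at peak throughput, rounded up. Real reads are usually slower.
min_read_seconds() {
  size_mb=$1
  echo $(( (size_mb + EBS_THROUGHPUT_MBPS - 1) / EBS_THROUGHPUT_MBPS ))
}

# Example: a 10 GB (10,000 MB) dataset needs at least 80 seconds.
min_read_seconds 10000
```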

Your administrator has configured the default Amazon EBS storage settings for your space. The default storage size is 5 GB, but you can increase it. Ask your administrator for guidelines on how much storage you can use.
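
Before you request more storage, you can check how much of the volume you're using from the JupyterLab terminal. The following sketch assumes that your home directory (/home/sagemaker-user in the examples later in this guide) is on the space's Amazon EBS volume. The function names are hypothetical.

```shell
#!/bin/sh
# Show size, used, and available space for the file system that holds DIR
# (defaults to your home directory, assumed to be on the EBS volume).
storage_report() {
  dir=${1:-$HOME}
  df -h "$dir"
}

# Print only the percentage of that file system in use, without the % sign.
# -P forces the POSIX output format, so column 5 is always the capacity.
storage_used_pct() {
  dir=${1:-$HOME}
  df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

storage_report
echo "Used: $(storage_used_pct)%"

# Ten largest items in your home directory, including hidden ones such as
# .conda, to find what is filling the volume.
du -sh "$HOME"/* "$HOME"/.[!.]* 2>/dev/null | sort -rh | head -n 10
```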

You can switch the Amazon EC2 instance type that you’re using to run JupyterLab, scaling your compute up or down depending on your needs. Fast launch instances start up much faster than other instance types.

Your administrator might provide you with a lifecycle configuration that customizes your environment. You can specify the lifecycle configuration when you create the space.

If your administrator gives you access to an Amazon EFS file system, you can configure your JupyterLab space to access it.

By default, the JupyterLab application uses the SageMaker distribution image, which includes support for many machine learning, analytics, and deep learning packages. If you need a custom image, your administrator can provide access to one.

The Amazon EBS volume persists independently of the life of an instance, so you won’t lose your data when you change instances. Use the conda and pip package managers to create reproducible custom environments that persist even when you switch instance types.
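
For example, you might capture an environment's package list so that you can recreate it later, on another instance type or in another space. The following sketch uses conda env export and pip freeze. The function name snapshot_env and the output file names are illustrative, not part of SageMaker.

```shell
#!/bin/sh
# Save a reproducible description of the conda environment ENV_NAME into OUT_DIR:
#   environment.yml  - packages you explicitly requested with conda (--from-history)
#   requirements.txt - packages visible to pip in that environment, with versions
snapshot_env() {
  env_name=$1
  out_dir=${2:-.}
  mkdir -p "$out_dir"
  conda env export --name "$env_name" --from-history > "$out_dir/environment.yml"
  conda run --name "$env_name" python -m pip freeze > "$out_dir/requirements.txt"
}

# Example (requires conda and an existing environment named test-env):
#   snapshot_env test-env ~/env-snapshots/test-env
```

You can later recreate the environment from the saved file with conda env create -f environment.yml.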

To get started using JupyterLab, create a space or choose the space that your administrator created for you and open JupyterLab.

Use the following procedure to create a space and open JupyterLab.

To create a space and open JupyterLab
  1. Open Studio. For information about opening Studio, see Launch Amazon SageMaker Studio.

  2. Choose JupyterLab.

  3. Choose Create JupyterLab space.

  4. For Name, specify the name of the space.

  5. (Optional) Select Share with my domain to create a shared space.

  6. Choose Create space.

  7. (Optional) For Instance, specify the Amazon EC2 instance type that runs the space.

  8. (Optional) For Image, specify an image that your administrator provided to customize your environment.

  9. (Optional) For Space Settings, specify the following:

    • Storage (GB) – Up to 100 GB or the amount that your administrator specifies.

    • Lifecycle Configuration – A lifecycle configuration that your administrator specifies.

    • Attach custom EFS filesystem – An Amazon EFS file system to which your administrator provides access.

  10. Choose Run space.

  11. Choose Open JupyterLab.

Configure space

After you create a JupyterLab space, you can configure it to do the following:

  • Change the instance type.

  • Change the storage volume.

  • (Admin setup required) Use a custom image.

  • (Admin setup required) Use a lifecycle configuration.

  • (Admin setup required) Attach a custom Amazon EFS file system.

Important

You must stop the JupyterLab space each time before you change its configuration. Use the following procedure to configure the space.

To configure a space
  1. Within Studio, navigate to the JupyterLab application page.

  2. Choose the name of the space.

  3. (Optional) For Image, specify an image that your administrator provided to customize your environment.

  4. (Optional) For Space Settings, specify the following:

    • Storage (GB) – Up to 100 GB or the amount that your administrator configured for the space.

    • Lifecycle Configuration – A lifecycle configuration that your administrator provides.

    • Attach custom EFS filesystem – An Amazon EFS file system to which your administrator provides access.

  5. Choose Run space.

When you open the JupyterLab application, your space has the updated configuration.

After you open JupyterLab, you can configure your environment using the terminal. To open the terminal, navigate to the Launcher and choose Terminal.

The following are examples of different ways that you can configure an environment in JupyterLab.

Note

Within Studio, you can use lifecycle configurations to customize your environment, but we recommend using a package manager instead. Lifecycle configurations are more error-prone: it’s easier to add or remove dependencies with a package manager than it is to debug a lifecycle configuration script. Lifecycle configurations can also increase JupyterLab startup time.

For information about lifecycle configurations, see Using lifecycle configurations with JupyterLab.

Customize your environment using a package manager

Use pip or conda to customize your environment. We recommend using package managers instead of lifecycle configuration scripts.

Create and activate your custom environment

This section provides examples of different ways that you can configure an environment in JupyterLab.

A basic conda environment has the minimum number of packages that are required for your workflows in SageMaker. Use the following template to create a basic conda environment:

# initialize conda for shell interaction, then close and reopen the terminal
conda init
# create a new fresh environment
conda create --name test-env
# check if your new environment is created successfully
conda info --envs
# activate the new environment
conda activate test-env
# install packages in your new conda environment
conda install pip boto3 pandas ipykernel
# list all packages installed in your new environment
conda list
# parse env name information from your new environment
export CURRENT_ENV_NAME=$(conda info | grep "active environment" | cut -d : -f 2 | tr -d ' ')
# register your new environment as Jupyter Kernel for execution
python3 -m ipykernel install --user --name "$CURRENT_ENV_NAME" --display-name "user-env:($CURRENT_ENV_NAME)"
# to exit your new environment
conda deactivate

After you register the environment as a kernel, it appears in the top right corner of your notebook. For example, the test-env environment from the preceding template is displayed there.

To change your environment, choose it and select an option from the dropdown menu. Environments that you've previously created appear in the list, marked with a checkmark and their names.

Choose Select to select a kernel for the environment.
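
Each template in this guide sets CURRENT_ENV_NAME by parsing the active environment line of the conda info output. The following sketch wraps both parsing variants used in this guide in functions and runs them on sample lines, so you can see what each returns. The function names are hypothetical, and the sample lines assume that conda info prints the line in the form "active environment : NAME". In an activated shell, conda also sets the CONDA_DEFAULT_ENV variable, which you can use instead.

```shell
#!/bin/sh
# Variant for named environments: keep the text after the first colon and
# delete all spaces.
env_name_from_info() {
  grep "active environment" | cut -d : -f 2 | tr -d ' '
}

# Variant for environments created with --prefix: keep the path after " : "
# and print only its last path component.
env_name_from_prefix_info() {
  grep "active environment" | awk -F' : ' '{print $2}' | awk -F'/' '{print $NF}'
}

# Sample lines in the assumed conda info format.
echo "     active environment : test-env" | env_name_from_info
echo "     active environment : /home/sagemaker-user/my-project/py39-test" | env_name_from_prefix_info

# Shortcut in a shell where an environment is active:
#   export CURRENT_ENV_NAME=$(basename "$CONDA_DEFAULT_ENV")
```

The prefix variant exists because a full path isn't a valid kernel name. Taking only the last path component gives ipykernel a short name such as py39-test.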

Clean up a conda environment

Cleaning up conda environments that you’re not using can help free up disk space and improve performance. Use the following template to clean up a conda environment:

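# list your environments to select an environment to clean
conda info --envs # or conda info -e
# make sure test-env isn't the active environment (run conda deactivate if it is);
# conda can't remove the environment that is currently active
# once you've selected your environment to purge
conda remove --name test-env --all
# run conda environment list to ensure the target environment is purged
conda info --envs # or conda info -e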
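
Removing an environment doesn't remove the Jupyter kernel that you registered for it with ipykernel install, so that kernel can still appear in the kernel list. The following sketch removes both and then clears conda's package cache to reclaim more disk space. It assumes that the kernel was registered under the environment's name, as in the templates in this guide. The function name is hypothetical.

```shell
#!/bin/sh
# Remove conda environment ENV_NAME, its Jupyter kernel, and unused cached packages.
# Run this from a shell where ENV_NAME is not the active environment.
remove_env_and_kernel() {
  env_name=$1
  # Delete the environment and all of its packages without prompting.
  conda remove --name "$env_name" --all --yes
  # Delete the kernel spec registered under the same name, without prompting.
  jupyter kernelspec uninstall -f "$env_name"
  # Remove cached package tarballs and unused packages from the package cache.
  conda clean --all --yes
}

# Example: check which kernels are registered, then clean up test-env.
#   jupyter kernelspec list
#   remove_env_and_kernel test-env
```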

Create a conda environment with a specific Python version

Use the following template to create a conda environment with a specific Python version:

# create a conda environment with a specific python version
conda create --name py38-test-env python=3.8.10
# activate and test your new python version
conda activate py38-test-env && python3 --version
# install ipykernel to facilitate env registration
conda install ipykernel
# parse env name information from your new environment
export CURRENT_ENV_NAME=$(conda info | grep "active environment" | cut -d : -f 2 | tr -d ' ')
# register your new environment as Jupyter Kernel for execution
python3 -m ipykernel install --user --name "$CURRENT_ENV_NAME" --display-name "user-env:($CURRENT_ENV_NAME)"
# deactivate your py38 test environment
conda deactivate

Create a conda environment with a specific set of packages

Use the following template to create a conda environment with a specific version of Python and set of packages:

# prefill your conda environment with a set of packages
conda create --name py38-test-env python=3.8.10 pandas matplotlib=3.7 scipy ipykernel
# activate your conda environment and ensure these packages exist
conda activate py38-test-env
# check if these packages exist
conda list | grep -E 'pandas|matplotlib|scipy'
# parse env name information from your new environment
export CURRENT_ENV_NAME=$(conda info | grep "active environment" | cut -d : -f 2 | tr -d ' ')
# register your new environment as Jupyter Kernel for execution
python3 -m ipykernel install --user --name "$CURRENT_ENV_NAME" --display-name "user-env:($CURRENT_ENV_NAME)"
# deactivate your conda environment
conda deactivate

Clone conda from an existing environment

Clone your conda environment to preserve its working state. You can experiment in the cloned environment without having to worry about introducing breaking changes in your original environment.

Use the following commands to clone an environment.

# create a fresh env from a base environment (replace 'base' with another env)
conda create --name py310-base-ext --clone base
# activate your conda environment and ensure these packages exist
conda activate py310-base-ext
# install ipykernel to register your env
conda install ipykernel
# parse env name information from your new environment
export CURRENT_ENV_NAME=$(conda info | grep "active environment" | cut -d : -f 2 | tr -d ' ')
# register your new environment as Jupyter Kernel for execution
python3 -m ipykernel install --user --name "$CURRENT_ENV_NAME" --display-name "user-env:($CURRENT_ENV_NAME)"
# deactivate your conda environment
conda deactivate

Clone conda from a reference YAML file

Create a conda environment from a reference YAML file. The following is an example of a YAML file that you can use.

# anatomy of a reference environment.yml
name: py311-new-env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - pandas
  - scipy
  - matplotlib
  - pip
  - ipykernel
  - pip:
    - git+https://github.com/huggingface/transformers

Under pip, we recommend specifying only the dependencies that aren't available with conda.

Use the following commands to create a conda environment from a YAML file.

# create your conda environment
conda env create -f environment.yml
# activate your env
conda activate py311-new-env
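
The reference file above already lists ipykernel, so after you create the environment you can register it as a kernel, as in the earlier templates. If you later edit the YAML file, conda env update applies the changes to the existing environment, and --prune also removes packages that the file no longer lists. The following sketch assumes that environment.yml is in the current directory. The function names are hypothetical.

```shell
#!/bin/sh
# Register the environment defined in environment.yml as a Jupyter kernel,
# running ipykernel from inside that environment with conda run.
register_yaml_env() {
  env_name=${1:-py311-new-env}
  conda run --name "$env_name" python -m ipykernel install --user \
    --name "$env_name" --display-name "user-env:($env_name)"
}

# Apply edits from the YAML file to the existing environment, removing
# packages that the file no longer lists.
update_yaml_env() {
  yaml_file=${1:-environment.yml}
  conda env update --file "$yaml_file" --prune
}

# Example:
#   conda env create -f environment.yml
#   register_yaml_env py311-new-env
#   update_yaml_env environment.yml   # after you edit the file
```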

Share environments between instance types

You can share conda environments by saving them to an Amazon EFS directory outside of your Amazon EBS volume. Another user can access the environment in the directory where you saved it.

Important

There are limitations with sharing your environments. For example, we don't recommend using an environment built for a GPU Amazon EC2 instance on a CPU instance.

Use the following commands as a template to create a custom environment in a specific target directory. Create the environment with conda create --prefix at a path within your Amazon EFS directory. You can then start a new instance and run conda activate with the same path to use the environment from Amazon EFS.

# if you know your environment path for your conda environment; replace the path
# with a directory on your Amazon EFS file system, and include ipykernel for registration
conda create --prefix /home/sagemaker-user/my-project/py39-test python=3.9 ipykernel
# activate the env with the full path from --prefix
conda activate /home/sagemaker-user/my-project/py39-test
# parse env name information from your new environment
export CURRENT_ENV_NAME=$(conda info | grep "active environment" | awk -F' : ' '{print $2}' | awk -F'/' '{print $NF}')
# register your new environment as Jupyter Kernel for execution
python3 -m ipykernel install --user --name "$CURRENT_ENV_NAME" --display-name "user-env-prefix:($CURRENT_ENV_NAME)"
# deactivate your conda environment
conda deactivate
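
An environment built for a GPU instance might not work on a CPU instance, so a script that activates a shared environment could check for a GPU first. The following sketch makes two assumptions: that GPU instances expose the NVIDIA nvidia-smi utility, and that you keep separate shared environments in hypothetical gpu and cpu subdirectories of your Amazon EFS directory. Adjust both assumptions to your setup.

```shell
#!/bin/sh
# Print "gpu" if an NVIDIA GPU is usable on this instance, otherwise "cpu".
detect_accelerator() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo gpu
  else
    echo cpu
  fi
}

# Build the path of the shared environment that matches this instance.
# SHARED_DIR and the gpu/cpu layout are hypothetical; point SHARED_DIR at
# your Amazon EFS directory.
shared_env_path() {
  shared_dir=$1
  env_name=$2
  echo "$shared_dir/$(detect_accelerator)/$env_name"
}

# Example in a bash script (conda must be initialized before conda activate):
#   eval "$(conda shell.bash hook)"
#   conda activate "$(shared_env_path /path/to/efs/envs py39-test)"
```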

Use Amazon Q to expedite your machine learning workflows

Amazon Q Developer is your AI-powered companion for machine learning development. With Amazon Q Developer, you can:

  • Receive step-by-step guidance on using SageMaker features independently or in combination with other Amazon services.

  • Get sample code to get started on your ML tasks such as data preparation, training, inference, and MLOps.

  • Receive troubleshooting assistance to debug and resolve errors encountered while running code in JupyterLab.

Amazon Q Developer integrates into your JupyterLab environment. To use Amazon Q Developer, choose the Q icon in the left-hand navigation of your JupyterLab environment.

If you don't see the Q icon, your administrator needs to set it up for you. For more information about setting up Amazon Q Developer, see Set up Amazon Q Developer for your users.

Amazon Q automatically provides suggestions to help you write your code. You can also ask for suggestions through the chat interface.

After you get a suggestion, you can either replace the code in the cell or you can add it to a new cell.