Amazon Deep Learning AMI GPU PyTorch 2.6 (Ubuntu 22.04) - Amazon Deep Learning AMIs
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Amazon Deep Learning AMI GPU PyTorch 2.6 (Ubuntu 22.04)

For help getting started, see Getting started with DLAMI.

AMI name format

  • Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.6.${PATCH-VERSION} (Ubuntu 22.04) ${YYYY-MM-DD}

Supported EC2 instances

  • Please refer to Important changes to DLAMI.

  • Deep Learning AMI with OSS Nvidia Driver supports G4dn, G5, G6, Gr6, P4, P4de, P5, P5e, and P5en instances.

The AMI includes the following:

  • Supported Amazon Service: Amazon EC2

  • Operating System: Ubuntu 22.04

  • Compute Architecture: x86

  • Python: /opt/pytorch/bin/python

  • NVIDIA Driver:

    • OSS Nvidia driver: 570.86.15

  • NVIDIA CUDA 12.6 stack:

    • CUDA, NCCL, and cuDNN installation path: /usr/local/cuda-12.6/

    • Default CUDA: 12.6

      • /usr/local/cuda points to /usr/local/cuda-12.6/

      • The following environment variables are updated:

        • LD_LIBRARY_PATH includes /usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/targets/x86_64-linux/lib

        • PATH includes /usr/local/cuda/bin/:/usr/local/cuda/include/

    • System NCCL version (compiled, located at /usr/local/cuda/): 2.24.3

    • NCCL version compiled into PyTorch in the PyTorch virtual environment: 2.21.5

  • NCCL Tests Location: 

    • all_reduce, all_gather and reduce_scatter: /usr/local/cuda-xx.x/efa/test-cuda-xx.x/

    • LD_LIBRARY_PATH is already updated with the paths needed to run the NCCL tests.

    • Common paths already added to LD_LIBRARY_PATH:

      • /opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/lib:/usr/lib

    • CUDA paths added to LD_LIBRARY_PATH:

      • /usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/targets/x86_64-linux/lib

  • EFA Installer: 1.38.0

  • Nvidia GDRCopy: 2.4.1

  • Nvidia Transformer Engine: v1.11.0

  • Amazon OFI NCCL: 1.13.2-aws

    • Installation path: /opt/aws-ofi-nccl/. The path /opt/aws-ofi-nccl/lib is added to LD_LIBRARY_PATH.

    • Note: The PyTorch package also ships with a dynamically linked Amazon OFI NCCL plugin, the aws-ofi-nccl-dlc conda package, and PyTorch uses that plugin instead of the system Amazon OFI NCCL.

  • Amazon CLI v2 as aws2 and Amazon CLI v1 as aws

  • EBS volume type: gp3

  • Python version: 3.11

  • Query AMI-ID with SSM Parameter (example Region is us-east-1):

    • OSS Nvidia Driver:

      aws ssm get-parameter --region us-east-1 \
          --name /aws/service/deeplearning/ami/x86_64/oss-nvidia-driver-gpu-pytorch-2.6-ubuntu-22.04/latest/ami-id \
          --query "Parameter.Value" \
          --output text

  • Query AMI-ID with AWSCLI (example Region is us-east-1):

    • OSS Nvidia Driver:

      aws ec2 describe-images --region us-east-1 \
          --owners amazon \
          --filters 'Name=name,Values=Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.6.? (Ubuntu 22.04) ????????' 'Name=state,Values=available' \
          --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' \
          --output text
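The two AMI-ID queries above can be wrapped in a single helper that prefers the SSM parameter and falls back to describe-images. This is a minimal sketch: the function name latest_dlami_id and the fallback order are illustrative assumptions, not part of the AMI; the parameter name and the name filter are the ones documented above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the latest OSS Nvidia Driver PyTorch 2.6 DLAMI ID
# for the Region in $1 (default us-east-1). Returns 1 if nothing is found.
latest_dlami_id() {
    local region="${1:-us-east-1}" ami
    local param=/aws/service/deeplearning/ami/x86_64/oss-nvidia-driver-gpu-pytorch-2.6-ubuntu-22.04/latest/ami-id

    # Preferred: the public SSM parameter.
    ami=$(aws ssm get-parameter --region "$region" \
        --name "$param" \
        --query "Parameter.Value" \
        --output text 2>/dev/null)

    # Fallback: newest available image matching the AMI name format.
    if [ -z "$ami" ] || [ "$ami" = "None" ]; then
        ami=$(aws ec2 describe-images --region "$region" \
            --owners amazon \
            --filters 'Name=name,Values=Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.6.? (Ubuntu 22.04) ????????' \
                      'Name=state,Values=available' \
            --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' \
            --output text)
    fi

    [ -n "$ami" ] && [ "$ami" != "None" ] || return 1
    echo "$ami"
}

# Usage: AMI=$(latest_dlami_id us-west-2)
```

Keeping the lookup in a function lets a launch script fail early (non-zero return) instead of passing an empty --image-id to run-instances.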
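To confirm on a running instance that the environment matches the paths listed above, a small check like the following may help. It is a sketch under stated assumptions: the helper names has_path_entry and check_dlami_env are hypothetical, and the check only looks for a subset of the exact LD_LIBRARY_PATH entries documented on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the colon-separated list in $1
# contains the exact entry $2 (a single trailing slash is ignored).
has_path_entry() {
    local list="$1" want="${2%/}" entry
    local IFS=':'
    for entry in $list; do
        [ "${entry%/}" = "$want" ] && return 0
    done
    return 1
}

# Hypothetical helper: report documented LD_LIBRARY_PATH entries as present
# or missing, and show where /usr/local/cuda resolves. Returns 1 if any is missing.
check_dlami_env() {
    local missing=0 dir
    for dir in /usr/local/cuda/lib64 /opt/amazon/efa/lib \
               /opt/amazon/openmpi/lib /opt/aws-ofi-nccl/lib; do
        if has_path_entry "${LD_LIBRARY_PATH:-}" "$dir"; then
            echo "ok      $dir"
        else
            echo "missing $dir"
            missing=1
        fi
    done
    # /usr/local/cuda is expected to point to /usr/local/cuda-12.6/.
    if [ -e /usr/local/cuda ]; then
        echo "cuda -> $(readlink -f /usr/local/cuda)"
    else
        echo "cuda -> not found"
    fi
    return "$missing"
}

# Usage on the instance: check_dlami_env
```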

Notices

PyTorch Deprecation of Anaconda Channel

Starting with PyTorch 2.6, PyTorch has deprecated support for Conda (see the official announcement). As a result, PyTorch 2.6 and later use Python virtual environments. To activate the PyTorch venv, run source /opt/pytorch/bin/activate.
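A minimal sketch of activating the environment and confirming which interpreter is in use follows. The function name activate_pytorch_venv is hypothetical; /opt/pytorch is the venv location given above. The function must run in the current shell (not a subshell) for the activation to persist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: activate the PyTorch venv (default /opt/pytorch)
# and report which python is now first on PATH.
activate_pytorch_venv() {
    local venv="${1:-/opt/pytorch}"
    if [ ! -f "$venv/bin/activate" ]; then
        echo "no virtual environment at $venv" >&2
        return 1
    fi
    # shellcheck disable=SC1091
    source "$venv/bin/activate"
    echo "python: $(command -v python)"
}

# Usage on the instance:
#   activate_pytorch_venv
#   python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.nccl.version())"
```

The python one-liner in the usage comment prints the PyTorch version, whether CUDA is available, and the NCCL version PyTorch was built with, which can be compared with the versions listed on this page.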

P5/P5e Instances:

  • DeviceIndex is unique to each NetworkCard and must be a non-negative integer less than the limit of ENIs per NetworkCard. On P5, the number of ENIs per NetworkCard is 2, so the only valid values for DeviceIndex are 0 and 1. The following example EC2 P5 instance launch command uses the Amazon CLI with NetworkCardIndex values 0 through 31, DeviceIndex 0 for the first interface, and DeviceIndex 1 for the remaining 31 interfaces.

aws ec2 run-instances --region $REGION \
    --instance-type $INSTANCETYPE \
    --image-id $AMI --key-name $KEYNAME \
    --iam-instance-profile "Name=dlami-builder" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$TAG}]" \
    --network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    ...
    "NetworkCardIndex=31,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa"

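Writing out 32 network-interface entries by hand is error-prone. The sketch below generates them with a loop instead, applying the rule stated above: DeviceIndex 0 on NetworkCardIndex 0 and DeviceIndex 1 on cards 1 through 31. The function name p5_network_interfaces is hypothetical; $SG, $SUBNET, and the other variables are the same placeholders used in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one EFA interface spec per network card,
# one per line. DeviceIndex is 0 on card 0 and 1 on every other card.
# Arguments: security group, subnet, number of cards (default 32 for P5).
p5_network_interfaces() {
    local sg="$1" subnet="$2" cards="${3:-32}" i dev
    for ((i = 0; i < cards; i++)); do
        dev=$(( i == 0 ? 0 : 1 ))
        echo "NetworkCardIndex=$i,DeviceIndex=$dev,Groups=$sg,SubnetId=$subnet,InterfaceType=efa"
    done
}

# Usage: collect the specs into an array and pass them to run-instances
# (add the remaining flags from the command above as needed).
#   mapfile -t NICS < <(p5_network_interfaces "$SG" "$SUBNET")
#   aws ec2 run-instances --region "$REGION" \
#       --instance-type "$INSTANCETYPE" \
#       --image-id "$AMI" --key-name "$KEYNAME" \
#       --network-interfaces "${NICS[@]}"
```

Passing the array as "${NICS[@]}" keeps each spec as a separate argument, which is how the command above lists them.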
Kernel
  • The kernel version is pinned using the following commands:

    echo linux-aws hold | sudo dpkg --set-selections
    echo linux-headers-aws hold | sudo dpkg --set-selections
    echo linux-image-aws hold | sudo dpkg --set-selections

  • We recommend that users avoid updating their kernel version (unless required for a security patch) to ensure compatibility with the installed drivers and package versions. Users who still wish to update can run the following commands to unpin their kernel versions:

    echo linux-aws install | sudo dpkg --set-selections
    echo linux-headers-aws install | sudo dpkg --set-selections
    echo linux-image-aws install | sudo dpkg --set-selections
    sudo apt-get upgrade -y

  • Each new DLAMI version uses the latest available compatible kernel.
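The pin and unpin commands above differ only in the selection state. A small wrapper makes that explicit and adds a way to check the result. This is a sketch: the function name kernel_selections and the KERNEL_PKGS array are hypothetical, and it relies only on dpkg --set-selections and dpkg --get-selections.

```shell
#!/usr/bin/env bash
# Kernel meta-packages pinned on this DLAMI.
KERNEL_PKGS=(linux-aws linux-headers-aws linux-image-aws)

# Hypothetical helper: print "<package> <state>" lines in the format
# that dpkg --set-selections reads. State must be "hold" or "install".
kernel_selections() {
    local state="$1" pkg
    case "$state" in
        hold|install) ;;
        *) echo "state must be hold or install" >&2; return 1 ;;
    esac
    for pkg in "${KERNEL_PKGS[@]}"; do
        echo "$pkg $state"
    done
}

# Usage on the instance:
#   kernel_selections hold    | sudo dpkg --set-selections   # pin
#   kernel_selections install | sudo dpkg --set-selections   # unpin
#   dpkg --get-selections "${KERNEL_PKGS[@]}"                # verify
```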

Release Date: 2025-02-21

AMI name: Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.6.0 (Ubuntu 22.04) 20250220

Added

  • Initial release of the Deep Learning AMI GPU PyTorch 2.6 (Ubuntu 22.04) series, including a Python virtual environment named pytorch (activate with source /opt/pytorch/bin/activate), complemented with NVIDIA Driver R570, CUDA=12.6, cuDNN=9.7, PyTorch NCCL=2.21.5, and EFA=1.38.0.

    • Starting with PyTorch 2.6, PyTorch has deprecated support for Conda (see the official announcement). As a result, PyTorch 2.6 and later use Python virtual environments. To activate the PyTorch venv, run source /opt/pytorch/bin/activate.