Amazon Deep Learning OSS AMI GPU PyTorch 2.7 (Ubuntu 22.04) - Amazon Deep Learning AMIs
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Amazon Deep Learning OSS AMI GPU PyTorch 2.7 (Ubuntu 22.04)

For help getting started, see Getting started with DLAMI.

AMI name format

  • Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.7 (Ubuntu 22.04) ${YYYY-MM-DD}

Supported EC2 instances

The AMI includes the following:

  • Supported Amazon Service: Amazon EC2

  • Operating System: Ubuntu 22.04

  • Compute Architecture: x86

  • Linux Kernel: 6.8

  • NVIDIA Driver: 570.133.20

  • NVIDIA CUDA 12.8 stack:

    • CUDA, NCCL, and cuDNN installation directories: /usr/local/cuda-12.8/

    • NCCL Tests Location:

      • all_reduce, all_gather, and reduce_scatter:

        /usr/local/cuda-12.8/efa/test-cuda-12.8/
      • To run NCCL tests, LD_LIBRARY_PATH is already updated with the needed paths.

        • Common paths are already added to LD_LIBRARY_PATH:

          /opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/amazon/ofi-nccl/lib:/usr/local/lib:/usr/lib
        • LD_LIBRARY_PATH is updated with CUDA version paths:

          /usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/targets/x86_64-linux/lib
    • Compiled NCCL Version:

      • For the CUDA 12.8 directory, the compiled NCCL version is 2.26.2+CUDA12.8

    • Default CUDA: 12.8

      • PATH /usr/local/cuda points to CUDA 12.8

      • The following environment variables are updated:

        • LD_LIBRARY_PATH to have /usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda/targets/x86_64-linux/lib

        • PATH to have /usr/local/cuda/bin/:/usr/local/cuda/include/

  • EFA Installer: 1.40.0

  • Nvidia GDRCopy: 2.5

  • Nvidia Transformer Engine: 1.11.0

  • Amazon OFI NCCL: 1.14.2-aws

    • Installation path: /opt/amazon/ofi-nccl/. Path /opt/amazon/ofi-nccl/lib is added to LD_LIBRARY_PATH

  • Amazon CLI v2 at /usr/local/bin/aws

  • EBS volume type: gp3

  • Nvidia container toolkit: 1.17.7

    • Version command: nvidia-container-cli -V

  • Docker: 28.2.2

  • Python: /usr/bin/python3.12

  • Query AMI-ID with SSM Parameter (example region is us-east-1):

    aws ssm get-parameter --region us-east-1 \
        --name /aws/service/deeplearning/ami/x86_64/oss-nvidia-driver-gpu-pytorch-2.7-ubuntu-22.04/latest/ami-id \
        --query "Parameter.Value" \
        --output text
  • Query AMI-ID with AWSCLI (example region is us-east-1):

    aws ec2 describe-images --region us-east-1 \
        --owners amazon \
        --filters 'Name=name,Values=Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.7 (Ubuntu 22.04) ????????' \
            'Name=state,Values=available' \
        --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' \
        --output text
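The two queries above can be combined into a small launch helper. A minimal sketch, assuming the Amazon CLI v2 is installed and configured with credentials; the region, instance type, and key-pair name below are placeholder assumptions:

```shell
# Sketch: resolve the latest AMI ID for this DLAMI series via the SSM
# public parameter, then pass it to run-instances.
REGION="us-east-1"
PARAM="/aws/service/deeplearning/ami/x86_64/oss-nvidia-driver-gpu-pytorch-2.7-ubuntu-22.04/latest/ami-id"

# Only call the CLI if it is available on this machine.
if command -v aws >/dev/null 2>&1; then
    AMI_ID="$(aws ssm get-parameter --region "$REGION" --name "$PARAM" \
        --query "Parameter.Value" --output text)"
    echo "Latest AMI: $AMI_ID"
    # Placeholder launch call -- adjust instance type and key name:
    # aws ec2 run-instances --region "$REGION" --image-id "$AMI_ID" \
    #     --instance-type p5.48xlarge --key-name my-key-pair
fi
```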

Notices

Flash Attention

  • Flash attention does not yet have an official release for PyTorch 2.7. For this reason, it is temporarily removed from this AMI. Once an official release is made for PyTorch 2.7, we will include it in this AMI.

  • Without flash attention, Transformer Engine defaults to using cuDNN fused attention. There are currently known issues with fused attention on Blackwell GPUs, such as those in P6-B200 instances.

    • "With compute capability sm10.0 (Blackwell-architecture) GPUs, the FP8 datatype with scaled dot-product attention contains a deadlock that causes the kernel to hang under some circumstances, such as when the problem size is large or the GPU is running multiple kernels simultaneously. A fix is planned for a future release." [cuDNN 9.10.0 release notes]

    • Users who want to run P6-B200 instances with FP8 data and scaled dot-product attention should consider installing flash attention manually.
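For the manual install mentioned above, a minimal sketch follows. It assumes the upstream PyPI package name flash-attn and that a release compatible with PyTorch 2.7 is available by the time you run it; building from source requires the CUDA toolkit's nvcc, which this AMI provides:

```shell
# Sketch: install flash attention into the preinstalled PyTorch venv.
# "flash-attn" is the upstream PyPI name; pin a version known to work
# with your PyTorch build before running this on production hosts.
VENV_ACTIVATE="/opt/pytorch/bin/activate"
PKG="flash-attn"

# Only proceed on a machine that actually has the DLAMI venv.
if [ -f "$VENV_ACTIVATE" ]; then
    . "$VENV_ACTIVATE"
    # --no-build-isolation lets the build see the venv's torch install.
    pip install "$PKG" --no-build-isolation
fi
```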

P6-B200 Instances

  • P6-B200 instances require CUDA 12.8 or later and NVIDIA driver version 570 or newer.

  • P6-B200 instances contain 8 network interface cards and can be launched using the following Amazon CLI command:

aws ec2 run-instances --region $REGION \
    --instance-type $INSTANCETYPE \
    --image-id $AMI --key-name $KEYNAME \
    --iam-instance-profile "Name=dlami-builder" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$TAG}]" \
    --network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    ... \
    "NetworkCardIndex=7,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa"

P5/P5e Instances

  • DeviceIndex is unique within each NetworkCard and must be a non-negative integer less than the limit of ENIs per NetworkCard. On P5, the limit is 2 ENIs per NetworkCard, so the only valid values for DeviceIndex are 0 and 1. Below is an example EC2 P5 launch command using the Amazon CLI, showing NetworkCardIndex values 0-31 with DeviceIndex 0 for the first interface and 1 for the remaining 31 interfaces.

aws ec2 run-instances --region $REGION \
    --instance-type $INSTANCETYPE \
    --image-id $AMI --key-name $KEYNAME \
    --iam-instance-profile "Name=dlami-builder" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$TAG}]" \
    --network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    ... \
    "NetworkCardIndex=31,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa"

Kernel

  • Kernel version is pinned using the following commands:

    echo linux-aws hold | sudo dpkg --set-selections
    echo linux-headers-aws hold | sudo dpkg --set-selections
    echo linux-image-aws hold | sudo dpkg --set-selections
  • We recommend that users avoid updating their kernel version (except for security patches) to ensure compatibility with the installed drivers and package versions. Users who still wish to update can run the following commands to unpin their kernel packages:

    echo linux-aws install | sudo dpkg --set-selections
    echo linux-headers-aws install | sudo dpkg --set-selections
    echo linux-image-aws install | sudo dpkg --set-selections
    sudo apt-get upgrade -y
  • For each new version of DLAMI, the latest available compatible kernel is used.
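To confirm the pin state before or after an upgrade, the held packages can be listed; a minimal sketch:

```shell
# Sketch: list packages currently marked "hold" so you can verify the
# kernel pin. On a non-DLAMI machine the list may simply be empty.
held="$(dpkg --get-selections 2>/dev/null | awk '$2 == "hold" {print $1}')"
echo "Held packages: ${held:-none}"
```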

PyTorch Deprecation of Anaconda Channel

Starting with PyTorch 2.6, PyTorch has deprecated support for Conda (see the official announcement). As a result, PyTorch 2.6 and above use Python virtual environments. To activate the PyTorch venv, use source /opt/pytorch/bin/activate
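A quick way to confirm the venv works is to activate it and print the framework versions; a minimal sketch using the venv path documented above:

```shell
# Sketch: activate the preinstalled PyTorch venv and print the torch
# and CUDA versions. Intended to be run on the AMI itself; the guard
# makes it a no-op elsewhere.
VENV="/opt/pytorch/bin/activate"
if [ -f "$VENV" ]; then
    . "$VENV"
    python -c "import torch; print(torch.__version__, torch.version.cuda)"
fi
```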

Release Date: 2025-06-03

AMI name: Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.7 (Ubuntu 22.04) 20250602

Added

  • Initial release of Deep Learning AMI GPU PyTorch 2.7 (Ubuntu 22.04) series. Includes a Python virtual environment pytorch (source /opt/pytorch/bin/activate) complemented by NVIDIA Driver R570, CUDA=12.8, cuDNN=9.10, PyTorch NCCL=2.26.5, and EFA=1.40.0.

Known Issues

  • "With compute capability sm10.0 (Blackwell-architecture) GPUs, the FP8 datatype with scaled dot-product attention contains a deadlock that causes the kernel to hang under some circumstances, such as when the problem size is large or the GPU is running multiple kernels simultaneously. A fix is planned for a future release." [cuDNN 9.10.0 release notes]

    • Users who want to run P6-B200 instances with FP8 data and scaled dot-product attention should consider installing flash attention manually.