Amazon Deep Learning OSS AMI GPU PyTorch 2.7 (Amazon Linux 2023) - Amazon Deep Learning AMIs

Amazon Deep Learning OSS AMI GPU PyTorch 2.7 (Amazon Linux 2023)

For help getting started, see Getting started with DLAMI.

AMI name format

  • Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.7 (Amazon Linux 2023) ${YYYYMMDD}

Supported EC2 instances

The AMI includes the following:

  • Supported Amazon Service: Amazon EC2

  • Operating System: Amazon Linux 2023

  • Compute Architecture: x86

  • Linux Kernel: 6.1

  • NVIDIA Driver: 570.133.20

  • NVIDIA CUDA 12.8 stack:

    • CUDA, NCCL, and cuDNN installation directories: /usr/local/cuda-12.8/

    • NCCL Tests Location:

      • all_reduce, all_gather, and reduce_scatter:

        /usr/local/cuda-12.8/efa/test-cuda-12.8/
      • LD_LIBRARY_PATH already includes the paths needed to run the NCCL tests.

        • Common paths added to LD_LIBRARY_PATH:

          /opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/amazon/ofi-nccl/lib:/usr/local/lib:/usr/lib
        • CUDA version paths added to LD_LIBRARY_PATH:

          /usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/targets/x86_64-linux/lib
    • Compiled NCCL Version:

      • For the CUDA 12.8 directory, the compiled NCCL version is 2.26.2+CUDA12.8

    • Default CUDA: 12.8

      • The path /usr/local/cuda points to CUDA 12.8

      • The following environment variables are updated:

        • LD_LIBRARY_PATH includes /usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda/targets/x86_64-linux/lib

        • PATH includes /usr/local/cuda/bin/:/usr/local/cuda/include/

  • EFA Installer: 1.40.0

  • Nvidia GDRCopy: 2.5

  • Amazon OFI NCCL: 1.14.2-aws

    • Installation path: /opt/amazon/ofi-nccl/. Path /opt/amazon/ofi-nccl/lib is added to LD_LIBRARY_PATH

  • Amazon CLI v2 at /usr/local/bin/aws

  • EBS volume type: gp3

  • Nvidia container toolkit: 1.17.7

    • Version command: nvidia-container-cli -V

  • Docker: 25.0.8

  • Python: /usr/bin/python3.12

  • Query AMI-ID with SSM Parameter (example region is us-east-1):

    aws ssm get-parameter --region us-east-1 \
        --name /aws/service/deeplearning/ami/x86_64/oss-nvidia-driver-gpu-pytorch-2.7-amazon-linux-2023/latest/ami-id \
        --query "Parameter.Value" \
        --output text
  • Query AMI-ID with AWSCLI (example region is us-east-1):

    aws ec2 describe-images --region us-east-1 --owners amazon --filters 'Name=name,Values=Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.7 (Amazon Linux 2023) ????????' 'Name=state,Values=available' --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text
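
The LD_LIBRARY_PATH entries listed above can be checked on a running instance. The following is a minimal sketch rather than anything shipped with the AMI: path_list_contains is a hypothetical helper name, and the directories in expected_dirs are a subset copied from the entries above.

```shell
# Hypothetical helper: succeeds if directory $2 is an entry of the
# colon-separated list $1 (for example, the value of LD_LIBRARY_PATH).
path_list_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# A subset of the directories this page says are already on LD_LIBRARY_PATH.
expected_dirs="/opt/amazon/efa/lib /opt/amazon/openmpi/lib /opt/amazon/ofi-nccl/lib /usr/local/cuda/lib64"

# Report each expected directory as present or missing.
for dir in $expected_dirs; do
    if path_list_contains "${LD_LIBRARY_PATH:-}" "$dir"; then
        echo "present: $dir"
    else
        echo "missing: $dir"
    fi
done
```

The match is exact, so an entry written with a trailing slash would be reported as missing.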

Notices

P6-B200 Instances

  • P6-B200 instances require CUDA version 12.8 or higher and NVIDIA driver 570 or newer.

  • P6-B200 instances contain 8 network interface cards and can be launched using the following Amazon CLI command:

aws ec2 run-instances --region $REGION \
    --instance-type $INSTANCETYPE \
    --image-id $AMI --key-name $KEYNAME \
    --iam-instance-profile "Name=dlami-builder" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$TAG}]" \
    --network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    .... .... ....
    "NetworkCardIndex=7,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa"

P5/P5e Instances

  • DeviceIndex is unique to each NetworkCard and must be a non-negative integer less than the limit of ENIs per NetworkCard. On P5, the number of ENIs per NetworkCard is 2, so the only valid values for DeviceIndex are 0 and 1. The following example launches an EC2 P5 instance with the Amazon CLI, using NetworkCardIndex values 0-31, with DeviceIndex 0 for the first interface and DeviceIndex 1 for the remaining 31 interfaces.

aws ec2 run-instances --region $REGION \
    --instance-type $INSTANCETYPE \
    --image-id $AMI --key-name $KEYNAME \
    --iam-instance-profile "Name=dlami-builder" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$TAG}]" \
    --network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    .... .... ....
    "NetworkCardIndex=31,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa"
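
Writing out one interface specification per network card by hand is error-prone, so the list can also be generated. The sketch below follows the pattern of the launch command above (DeviceIndex 0 for card 0, DeviceIndex 1 for every other card). The function name efa_interface_specs and the placeholder IDs are hypothetical and not part of the AMI or the Amazon CLI.

```shell
# Hypothetical helper: print one --network-interfaces value per network card.
#   $1 = number of network cards (32 for the P5 example above)
#   $2 = security group ID
#   $3 = subnet ID
# Card 0 gets DeviceIndex=0; every other card gets DeviceIndex=1.
efa_interface_specs() {
    card=0
    while [ "$card" -lt "$1" ]; do
        if [ "$card" -eq 0 ]; then device=0; else device=1; fi
        printf 'NetworkCardIndex=%d,DeviceIndex=%d,Groups=%s,SubnetId=%s,InterfaceType=efa\n' \
            "$card" "$device" "$2" "$3"
        card=$((card + 1))
    done
}

# Example with placeholder IDs: print the specifications for three cards.
efa_interface_specs 3 sg-EXAMPLE subnet-EXAMPLE
```

Because the generated values contain no spaces, an unquoted command substitution such as --network-interfaces $(efa_interface_specs 32 "$SG" "$SUBNET") passes each line as a separate argument, which mirrors the hand-written form.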

Kernel

  • The kernel version is pinned using the following command:

    sudo dnf versionlock kernel*
  • We recommend that you do not update the kernel version (except to apply a security patch), so that it stays compatible with the installed drivers and package versions. If you still want to update, run the following commands to unpin the kernel version:

    sudo dnf versionlock delete kernel*
    sudo dnf update -y
  • Each new version of the DLAMI uses the latest available compatible kernel.
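
Before updating packages, it can help to confirm whether the kernel pin is still in place. The following sketch filters the output of dnf versionlock list. The helper name has_kernel_lock is hypothetical, and the sketch assumes that each lock entry appears on its own line and begins with the package name.

```shell
# Hypothetical helper: read lock entries on stdin and succeed if any
# entry starts with "kernel". Assumes one package entry per line.
has_kernel_lock() {
    grep -q '^kernel'
}

# On the instance, feed it the current lock list. The guard keeps the
# sketch harmless on systems where dnf is not installed.
if command -v dnf >/dev/null 2>&1; then
    if dnf versionlock list 2>/dev/null | has_kernel_lock; then
        echo "kernel packages are version-locked"
    else
        echo "no kernel version lock found"
    fi
fi
```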

PyTorch Deprecation of Anaconda Channel

Starting with PyTorch 2.6, PyTorch has deprecated support for Conda (see official announcement). As a result, PyTorch 2.6 and later versions use Python virtual environments. To activate the PyTorch venv, run source /opt/pytorch/bin/activate
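
As a quick check after activation, the environment can report which PyTorch and CUDA versions it sees. This is a minimal sketch: the wrapper activate_pytorch_venv is a hypothetical helper, and only the /opt/pytorch path comes from this page.

```shell
# Hypothetical helper: activate a venv rooted at $1 (default /opt/pytorch,
# the path given above). Returns 1 if no activate script is found.
activate_pytorch_venv() {
    venv_dir="${1:-/opt/pytorch}"
    if [ ! -f "$venv_dir/bin/activate" ]; then
        echo "no venv found at $venv_dir" >&2
        return 1
    fi
    . "$venv_dir/bin/activate"
}

# On the DLAMI: activate, then print the PyTorch version, the CUDA version
# PyTorch was built against, and whether a GPU is visible.
if activate_pytorch_venv; then
    python -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())'
fi
```

Run this in the current shell (for example with source) rather than as a child process, otherwise the activation does not persist after the script exits.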

Release Date: 2025-05-22

AMI name: Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.7 (Amazon Linux 2023) 20250520

Added

  • Initial release of the Deep Learning AMI GPU PyTorch 2.7 (Amazon Linux 2023) series. It includes a Python virtual environment named pytorch (source /opt/pytorch/bin/activate), complemented by NVIDIA Driver R570, CUDA=12.8, cuDNN=9.10, PyTorch NCCL=2.26.2, and EFA=1.40.0.