
Best practices to minimize interruptions during GPU driver upgrades

SageMaker Model Deployment periodically upgrades the GPU drivers on the ML instances for the Real-time, Batch, and Asynchronous Inference options so that customers get access to improvements from the driver providers. The table below lists the GPU driver version supported for each Inference option. Because different driver versions can change how your model interacts with the GPUs, this page also describes strategies to help you understand how your application works with different driver versions.

Current versions and supported instance families

Amazon SageMaker Inference supports the following drivers and instance families:

Service     GPU      Driver version    Instance types
Real-time   NVIDIA   470.57.02         ml.p2.*, ml.p3.*, ml.p4d.*, ml.p4de.*, ml.g4dn.*, ml.g5.*
Real-time   NVIDIA   535.54.03         ml.p5.*, ml.g6.*
Batch       NVIDIA   470.57.02         ml.p2.*, ml.p3.*, ml.p4d.*, ml.p4de.*, ml.g4dn.*, ml.g5.*
Batch       NVIDIA   535.54.03         ml.p5.*, ml.g6.*

Troubleshoot your model container with GPU capabilities

If you encounter an issue when running your GPU workload, see the following guidance:

Run the nvidia-smi (NVIDIA System Management Interface) command from within the Docker container. If nvidia-smi detects a GPU detection error or an NVIDIA initialization error, it returns the following error message:

Failed to initialize NVML: Driver/library version mismatch
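
As a quick local check, you can run nvidia-smi inside the container before deploying. This is a minimal sketch, assuming the image is available locally under a hypothetical tag my-model-image and that the host has the NVIDIA Container Toolkit installed:

# Confirm that the container can see the GPUs and the host driver
docker run --rm --gpus all my-model-image nvidia-smi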

Based on your use case, follow these best practices to resolve the failure or error:

Refer to the NVIDIA System Management Interface page on the NVIDIA website for more information.

If your GPU instance uses NVIDIA driver versions that are not compatible with the CUDA version in the Docker container, then deploying an endpoint will fail with the following error message:

Failure reason CannotStartContainerError. Please ensure the model container for variant <variant_name> starts correctly when invoked with 'docker run <image> serve'
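
As the error message suggests, you can reproduce the startup failure outside of SageMaker by starting the container the same way SageMaker invokes it. This is a minimal sketch, again assuming a hypothetical local tag my-model-image and a GPU host with the NVIDIA Container Toolkit installed:

# Start the container with the serve argument and watch the logs for CUDA/driver mismatch errors
docker run --rm --gpus all my-model-image serve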

Based on your use case, follow the best practices in the following section to resolve the failure or error.

Best practices for working with mismatched driver versions

The following describes how to handle a mismatch between the NVIDIA driver version on the instance and the CUDA version in your model container:

If the driver version on the instance is newer than the CUDA version in the container, no action is required. NVIDIA provides backwards compatibility.

If the driver version on the instance is older than the CUDA version in the container:

If it is a minor version difference, no action is required. NVIDIA provides minor version forward compatibility.

If it is a major version difference, the CUDA Compatibility Package needs to be installed. Refer to CUDA Compatibility Package in the NVIDIA documentation.
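
As an illustration only, the CUDA Compatibility Package can be installed when you build the model container image. The package name below (cuda-compat-12-4) is an assumed example and must match the CUDA major.minor version of your image; this also assumes a Debian or Ubuntu base image with the NVIDIA CUDA apt repository already configured:

# Install the forward-compatibility libraries for the container's CUDA version (example: CUDA 12.4)
apt-get update && apt-get install -y cuda-compat-12-4
# The libraries land in the versioned CUDA directory's compat/ folder,
# referenced below as /usr/local/cuda/compat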

Important

The CUDA Compatibility Package is not backwards compatible, so it must be disabled if the driver version on the instance is greater than the CUDA Compatibility Package version.

Ensure that no NVIDIA driver packages are bundled in the image, because they could conflict with the NVIDIA driver version on the host.
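
One way to check for bundled driver packages, assuming a Debian or Ubuntu based image and the hypothetical tag my-model-image; no output from the grep means no driver packages were found:

# List any NVIDIA driver packages baked into the image
docker run --rm my-model-image sh -c "dpkg -l | grep -Ei 'nvidia-driver|libnvidia' || true"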

To verify whether the NVIDIA driver version on the platform supports the CUDA Compatibility Package version installed in the model container, see the CUDA documentation. If the platform NVIDIA driver version does not support the CUDA Compatibility Package version, you can disable or remove the CUDA Compatibility Package from the model container image. If the CUDA compatibility libraries are supported by the latest NVIDIA driver version, we suggest that you enable the CUDA Compatibility Package conditionally, based on the detected NVIDIA driver version, for future compatibility. To do so, add the code snippet below to the container startup shell script (at the ENTRYPOINT script).

The script demonstrates how to dynamically switch the use of the CUDA Compatibility Package based on the NVIDIA driver version detected on the host where your model container is deployed. When SageMaker releases a newer NVIDIA driver version, the installed CUDA Compatibility Package is turned off automatically if the CUDA application is supported natively on the new driver.

#!/bin/bash
# Despite its name, this helper returns success (0) only when the first version
# argument is strictly greater than the second (versions compared with sort -V).
verlte() {
    [ "$1" = "$2" ] && return 1 || [ "$2" = "`echo -e "$1\n$2" | sort -V | head -n1`" ]
}

if [ -f /usr/local/cuda/compat/libcuda.so.1 ]; then
    # Print the CUDA version bundled in the container
    cat /usr/local/cuda/version.txt
    # Driver version that the bundled CUDA compatibility libraries target
    CUDA_COMPAT_MAX_DRIVER_VERSION=$(readlink /usr/local/cuda/compat/libcuda.so.1 | cut -d'.' -f 3-)
    echo "CUDA compat package requires Nvidia driver ⩽${CUDA_COMPAT_MAX_DRIVER_VERSION}"
    # Driver version installed on the host
    NVIDIA_DRIVER_VERSION=$(sed -n 's/^NVRM.*Kernel Module *\([0-9.]*\).*$/\1/p' /proc/driver/nvidia/version 2>/dev/null || true)
    echo "Current installed Nvidia driver version is ${NVIDIA_DRIVER_VERSION}"
    # Enable the compat libs only if they target a newer driver than the one installed
    if verlte "$CUDA_COMPAT_MAX_DRIVER_VERSION" "$NVIDIA_DRIVER_VERSION"; then
        echo "Setup CUDA compatibility libs path to LD_LIBRARY_PATH"
        export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH
        echo $LD_LIBRARY_PATH
    else
        echo "Skip CUDA compat libs setup as newer Nvidia driver is installed"
    fi
else
    echo "Skip CUDA compat libs setup as package not found"
fi
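
For context, the following is a sketch of how the snippet above might be wired into the container's ENTRYPOINT so that the exported LD_LIBRARY_PATH is inherited by the model server process. The file name cuda_compat_setup.sh and the serve command are placeholders for illustration, not names defined by SageMaker:

#!/bin/bash
# entrypoint.sh (hypothetical): run the CUDA compat check in the current shell, then start serving
source /usr/local/bin/cuda_compat_setup.sh   # the snippet above, saved into the image
exec serve                                   # placeholder for your model server start command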