

Run GPU-accelerated containers (Linux on EC2)

The Amazon EKS optimized accelerated Amazon Linux AMIs are built on top of the standard Amazon EKS optimized Amazon Linux AMIs. For details on these AMIs, see Amazon EKS optimized accelerated Amazon Linux AMIs. The following text describes how to enable Amazon Neuron based and GPU based workloads.

To enable Amazon Neuron (ML accelerator) based workloads

For details on training and inference workloads using Neuron in Amazon EKS, see the Amazon Neuron documentation.

To enable GPU based workloads

The following procedure describes how to run a workload on a GPU based instance with the Amazon EKS optimized accelerated AMIs.

  1. After your GPU nodes join your cluster, apply the NVIDIA device plugin for Kubernetes as a DaemonSet. Replace vX.X.X with your desired NVIDIA/k8s-device-plugin version before running the following command. If you prefer to manage the plugin with Helm, see the sketch after this procedure.

    kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/vX.X.X/deployments/static/nvidia-device-plugin.yml
  2. You can verify that your nodes have allocatable GPUs with the following command.

    kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
  3. Create a file named nvidia-smi.yaml with the following contents. Replace tag with your desired tag for nvidia/cuda. This manifest launches an NVIDIA CUDA container that runs nvidia-smi on a node. A sketch of a longer-running Deployment that requests GPUs in the same way follows this procedure.

    apiVersion: v1
    kind: Pod
    metadata:
      name: nvidia-smi
    spec:
      restartPolicy: OnFailure
      containers:
      - name: nvidia-smi
        image: nvidia/cuda:tag
        args:
        - "nvidia-smi"
        resources:
          limits:
            nvidia.com/gpu: 1
  4. Apply the manifest with the following command.

    kubectl apply -f nvidia-smi.yaml
  5. After the Pod has finished running, view its logs with the following command.

    kubectl logs nvidia-smi

    An example output is as follows.

    Mon Aug 6 20:23:31 20XX
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI XXX.XX                 Driver Version: XXX.XX                     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:1C.0 Off |                    0 |
    | N/A   46C    P0    47W / 300W |      0MiB / 16160MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
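
As an alternative to applying the static manifest in step 1, the NVIDIA k8s-device-plugin project also publishes a Helm chart. The following is a minimal sketch, assuming the upstream chart repository name (nvdp), the repository URL shown below, and a placeholder chart version; check the NVIDIA/k8s-device-plugin documentation for the current values before running these commands.

    helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
    helm repo update
    helm upgrade -i nvdp nvdp/nvidia-device-plugin \
      --namespace nvidia-device-plugin \
      --create-namespace \
      --version X.X.X

After the chart installs, the same kubectl get nodes command from step 2 confirms that the nodes report allocatable nvidia.com/gpu resources.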
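
After nvidia-smi confirms that the GPUs are reachable, your own workloads request GPUs the same way the test Pod does. The following Deployment is a minimal sketch, not part of the official procedure; the gpu-workload name, the instance type in the nodeSelector, the image tag, and the sleep command are placeholder assumptions that you replace with your own image and scheduling constraints.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: gpu-workload
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: gpu-workload
      template:
        metadata:
          labels:
            app: gpu-workload
        spec:
          nodeSelector:
            # Assumption: adjust to the GPU instance type used by your node group.
            node.kubernetes.io/instance-type: p3.2xlarge
          containers:
          - name: app
            # Placeholder image and command; replace with your workload.
            image: nvidia/cuda:tag
            command: ["sleep", "infinity"]
            resources:
              limits:
                # GPUs are requested as limits; the device plugin advertises this resource.
                nvidia.com/gpu: 1

Apply the manifest with kubectl apply -f and verify placement with kubectl get pods -o wide, which shows the node that the Pod was scheduled to.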