Monitoring - Amazon Deep Learning AMIs

Monitoring

Your DLAMI comes preinstalled with several GPU monitoring tools. This guide also covers tools that you can download and install yourself.

  • Monitor GPUs with CloudWatch - a preinstalled utility that reports GPU usage statistics to Amazon CloudWatch.

  • nvidia-smi CLI - a utility to monitor overall GPU compute and memory utilization. This is preinstalled on your Amazon Deep Learning AMIs (DLAMI).

  • NVML C library - a C-based API for directly accessing GPU monitoring and management functions. The nvidia-smi CLI uses it under the hood, and it is preinstalled on your DLAMI. It also has Python and Perl bindings to facilitate development in those languages. The gpumon.py utility preinstalled on your DLAMI uses the pynvml package from nvidia-ml-py.

  • NVIDIA DCGM (Data Center GPU Manager) - a cluster management tool. Visit NVIDIA's developer page to learn how to install and configure this tool.
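As a rough illustration of scripting the nvidia-smi CLI, the Python sketch below runs its `--query-gpu` mode with `--format=csv,noheader,nounits` and parses one row per GPU. The function names and dictionary keys are hypothetical choices for this example. It assumes the memory fields are in MiB, which is the unit nvidia-smi uses for `memory.used` and `memory.total`.

```python
import csv
import io
import subprocess

# Fields passed to `nvidia-smi --query-gpu`; see `nvidia-smi --help-query-gpu`.
QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def _to_int(value):
    # nvidia-smi prints e.g. "[N/A]" or "[Not Supported]" for unavailable fields.
    return int(value) if value.isdigit() else None


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        index, name, util, used, total = (field.strip() for field in row)
        gpus.append({
            "index": int(index),
            "name": name,
            "utilization_pct": _to_int(util),    # percent
            "memory_used_mib": _to_int(used),    # MiB
            "memory_total_mib": _to_int(total),  # MiB
        })
    return gpus


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU stats."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

On a GPU instance, `query_gpus()` returns one dictionary per GPU. If you add fields to `QUERY_FIELDS`, update the unpacking in `parse_gpu_csv` to match, because it expects exactly five columns.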
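The same statistics are available programmatically through NVML's Python bindings. The following is a minimal sketch, assuming the `pynvml` module from the nvidia-ml-py package is installed; `sample_gpus` and its result keys are hypothetical names. NVML must be initialized before any query and shut down afterwards, which the `try`/`finally` guarantees. The optional `nvml` parameter exists only so that a stand-in object can be passed in for testing without a GPU.

```python
def sample_gpus(nvml=None):
    """Return one dict of utilization and memory stats per GPU via NVML."""
    if nvml is None:
        import pynvml as nvml  # provided by the nvidia-ml-py package

    nvml.nvmlInit()
    try:
        stats = []
        for i in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(i)
            name = nvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older bindings return bytes
                name = name.decode()
            util = nvml.nvmlDeviceGetUtilizationRates(handle)
            mem = nvml.nvmlDeviceGetMemoryInfo(handle)  # values in bytes
            stats.append({
                "index": i,
                "name": name,
                # Percent of the sample period during which a kernel ran.
                "gpu_util_pct": util.gpu,
                # Percent of time memory was read or written, not occupancy.
                "mem_busy_pct": util.memory,
                "mem_used_bytes": mem.used,
                "mem_total_bytes": mem.total,
            })
        return stats
    finally:
        nvml.nvmlShutdown()
```

Calling `sample_gpus()` in a loop with a short sleep gives a lightweight in-process monitor, without spawning an nvidia-smi process for each sample.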
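To send similar statistics to Amazon CloudWatch yourself, you can publish them as custom metrics with boto3's `put_metric_data`. The sketch below is not the preinstalled utility's code. The namespace `Custom/GPU`, the metric names, the `InstanceId` and `GPUNumber` dimensions, and the input shape (a list of dicts with `index`, `gpu_util_pct`, `mem_used_bytes`, and `mem_total_bytes`) are all assumptions made for illustration. Publishing requires boto3 and credentials that allow `cloudwatch:PutMetricData`.

```python
from datetime import datetime, timezone


def build_metric_data(stats, instance_id):
    """Convert per-GPU stats into entries for CloudWatch's MetricData list."""
    now = datetime.now(timezone.utc)
    entries = []
    for gpu in stats:
        # Hypothetical dimensions: one time series per instance and GPU.
        dims = [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "GPUNumber", "Value": str(gpu["index"])},
        ]
        total = gpu["mem_total_bytes"]
        mem_pct = 100.0 * gpu["mem_used_bytes"] / total if total else 0.0
        entries.append({"MetricName": "GPUUtilization", "Dimensions": dims,
                        "Timestamp": now, "Unit": "Percent",
                        "Value": float(gpu["gpu_util_pct"])})
        entries.append({"MetricName": "GPUMemoryUsed", "Dimensions": dims,
                        "Timestamp": now, "Unit": "Percent", "Value": mem_pct})
    return entries


def publish(entries, namespace="Custom/GPU", region_name=None, batch_size=20):
    """Send entries to CloudWatch in small batches."""
    import boto3  # imported here so build_metric_data works without boto3

    client = boto3.client("cloudwatch", region_name=region_name)
    # Small batches keep each request well under the MetricData size limit.
    for start in range(0, len(entries), batch_size):
        client.put_metric_data(Namespace=namespace,
                               MetricData=entries[start:start + batch_size])
```

Calling `publish(build_metric_data(stats, instance_id))` on a schedule, for example once a minute, builds time series that you can graph or alarm on in the CloudWatch console.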

Tip

Check out NVIDIA's developer blog for the latest information on using the CUDA tools installed on your DLAMI.