Amazon SageMaker Profiler

Amazon SageMaker Profiler is currently in preview release and available at no cost in supported Amazon Web Services Regions. The generally available version of Amazon SageMaker Profiler (if any) may include features and pricing that are different than those offered in preview.

Amazon SageMaker Profiler is a capability of Amazon SageMaker AI that provides a detailed view into the Amazon compute resources provisioned during training deep learning models on SageMaker AI. It focuses on profiling the CPU and GPU usage, kernel runs on GPUs, kernel launches on CPUs, sync operations, memory operations across CPUs and GPUs, latencies between kernel launches and corresponding runs, and data transfer between CPUs and GPUs. SageMaker Profiler also offers a user interface (UI) that visualizes the profile, a statistical summary of profiled events, and the timeline of a training job for tracking and understanding the time relationship of the events between GPUs and CPUs.

Note

SageMaker Profiler supports PyTorch and TensorFlow and is available in Amazon Deep Learning Containers for SageMaker AI. To learn more, see Supported framework images, Amazon Web Services Regions, and instance types.

For data scientists

Training deep learning models on a large compute cluster often has computational optimization problems, such as bottlenecks, kernel launch latencies, memory limit, and low resource utilization.

To identify such computational performance issues, you need to profile deeper into the compute resources to understand which kernels introduce latencies and which operations cause bottlenecks. Data scientists can take the benefit from using the SageMaker Profiler UI for visualizing the detailed profile of training jobs. The UI provides a dashboard furnished with summary charts and a timeline interface to track every event on the compute resources. Data scientists can also add custom annotations to track certain parts of the training job using the SageMaker Profiler Python modules.

For administrators

Through the Profiler landing page in the SageMaker AI console or SageMaker AI domain, you can manage the Profiler application users if you are an administrator of an Amazon account or SageMaker AI domain. Each domain user can access their own Profiler application given the granted permissions. As a SageMaker AI domain administrator and domain user, you can create and delete the Profiler application given the permission level you have.

Topics

Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Profile and optimize computational performance

Supported framework images, Amazon Web Services Regions, and instance types