Use Amazon SageMaker Profiler to profile activities on Amazon compute resources - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Use Amazon SageMaker Profiler to profile activities on Amazon compute resources

Amazon SageMaker Profiler is currently in preview release and available at no cost in supported Amazon Web Services Regions. The generally available version of Amazon SageMaker Profiler (if any) may include features and pricing that are different than those offered in preview.

Amazon SageMaker Profiler is a capability of Amazon SageMaker that provides a detailed view into the Amazon compute resources provisioned while training deep learning models on SageMaker. It focuses on profiling the CPU and GPU usage, kernel runs on GPUs, kernel launches on CPUs, sync operations, memory operations across CPUs and GPUs, latencies between kernel launches and corresponding runs, and data transfer between CPUs and GPUs. SageMaker Profiler also offers a user interface (UI) that visualizes the profile, a statistical summary of profiled events, and the timeline of a training job for tracking and understanding the time relationship of the events between GPUs and CPUs.

Note

SageMaker Profiler supports PyTorch and TensorFlow and is available in Amazon Deep Learning Containers for SageMaker. To learn more, see Supported framework images, Amazon Web Services Regions, and instance types.

For data scientists

Training deep learning models on a large compute cluster often runs into computational optimization problems, such as bottlenecks, kernel launch latencies, memory limits, and low resource utilization.

To identify such computational performance issues, you need to profile deeper into the compute resources to understand which kernels introduce latencies and which operations cause bottlenecks. Data scientists can benefit from using the SageMaker Profiler UI to visualize the detailed profile of training jobs. The UI provides a dashboard furnished with summary charts and a timeline interface to track every event on the compute resources. Data scientists can also add custom annotations to track certain parts of the training job using the SageMaker Profiler Python modules.

For administrators

Through the Profiler landing page in the SageMaker console or SageMaker domain, you can manage the Profiler application users if you are an administrator of an Amazon account or SageMaker domain. Each domain user can access their own Profiler application given the granted permissions. As a SageMaker domain administrator or domain user, you can create and delete the Profiler application according to your permission level.

Supported framework images, Amazon Web Services Regions, and instance types

This feature supports the following machine learning frameworks and Amazon Web Services Regions.

Note

To use this feature, make sure that you have at least version 2.180.0 of the SageMaker Python SDK installed.

SageMaker framework images pre-installed with SageMaker Profiler

SageMaker Profiler is pre-installed in the following Amazon Deep Learning Containers for SageMaker.

PyTorch images

PyTorch version: Amazon DLC image URI

  • 2.2.0: 763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.2.0-gpu-py310-cu121-ubuntu20.04-sagemaker

  • 2.1.0: 763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.1.0-gpu-py310-cu121-ubuntu20.04-sagemaker

  • 2.0.1 (CUDA 11.8): 763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.0.1-gpu-py310-cu118-ubuntu20.04-sagemaker

  • 2.0.1 (CUDA 12.1): 763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.0.1-gpu-py310-cu121-ubuntu20.04-sagemaker

  • 1.13.1: 763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker

TensorFlow images

TensorFlow version: Amazon DLC image URI

  • 2.13.0: 763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.13.0-gpu-py310-cu118-ubuntu20.04-sagemaker

  • 2.12.0: 763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.12.0-gpu-py310-cu118-ubuntu20.04-sagemaker

  • 2.11.0: 763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.11.0-gpu-py39-cu112-ubuntu20.04-sagemaker

Important

Distribution and maintenance of the preceding framework containers are governed by the Framework Support Policy managed by the Amazon Deep Learning Containers service. If you are using prior framework versions that are no longer supported, we highly recommend that you upgrade to a currently supported framework version.

Note

If you want to use SageMaker Profiler for other framework images or your own Docker images, you can install SageMaker Profiler using the SageMaker Profiler Python package binary files provided in the following section.

SageMaker Profiler Python package binary files

If you want to configure your own Docker container, use SageMaker Profiler in other pre-built containers for PyTorch and TensorFlow, or install the SageMaker Profiler Python package locally, use one of the following binary files. Choose the one that matches the Python and CUDA versions in your environment.

PyTorch

TensorFlow

For more information about how to install SageMaker Profiler using the binary files, see (Optional) Install the SageMaker Profiler Python package.

Supported Amazon Web Services Regions

SageMaker Profiler is available in the following Amazon Web Services Regions.

  • US East (N. Virginia) (us-east-1)

  • US East (Ohio) (us-east-2)

  • US West (Oregon) (us-west-2)

  • Europe (Frankfurt) (eu-central-1)

  • Europe (Ireland) (eu-west-1)

Supported instance types

SageMaker Profiler supports profiling of training jobs on the following instance types.

CPU and GPU profiling

  • ml.g4dn.12xlarge

  • ml.g5.24xlarge

  • ml.g5.48xlarge

  • ml.p3dn.24xlarge

  • ml.p4de.24xlarge

  • ml.p4d.24xlarge

  • ml.p5.48xlarge

GPU profiling only

  • ml.g5.2xlarge

  • ml.g5.4xlarge

  • ml.g5.8xlarge

  • ml.g5.16xlarge

Prerequisites

The following list shows the prerequisites to start using SageMaker Profiler.

  • A SageMaker domain set up with Amazon VPC in your Amazon account.

    For instructions on setting up a domain, see Onboard to Amazon SageMaker domain using quick setup. You also need to add domain user profiles for individual users to access the Profiler UI application. For more information, see Add and remove SageMaker domain user profiles.

  • The following list is the minimum set of permissions for using the Profiler UI application.

    • sagemaker:CreateApp

    • sagemaker:DeleteApp

    • sagemaker:DescribeTrainingJob

    • sagemaker:Search

    • s3:GetObject

    • s3:ListBucket
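As a starting point, the permissions in the preceding list can be expressed as an IAM identity-based policy. The following is a minimal sketch that builds the policy document in Python and serializes it to JSON. The helper name build_profiler_ui_policy is hypothetical, and the "Resource": "*" scope is a placeholder assumption; in practice you would narrow it to your domain, training jobs, and S3 bucket.

```python
import json

# Minimum actions listed in the prerequisites for the Profiler UI application.
PROFILER_UI_ACTIONS = [
    "sagemaker:CreateApp",
    "sagemaker:DeleteApp",
    "sagemaker:DescribeTrainingJob",
    "sagemaker:Search",
    "s3:GetObject",
    "s3:ListBucket",
]


def build_profiler_ui_policy(resources=("*",)):
    """Return an IAM policy document (as a dict) that allows the listed actions.

    Hypothetical helper: "*" is a placeholder resource scope. Replace it with
    specific ARNs for production use.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(PROFILER_UI_ACTIONS),
                "Resource": list(resources),
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_profiler_ui_policy(), indent=2))
```

When you scope the resources down, note that s3:ListBucket applies to bucket ARNs and s3:GetObject applies to object ARNs. A single "*" covers both, but a scoped policy usually needs both forms.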

Prepare and run a training job with SageMaker Profiler

Setting up and running a training job with SageMaker Profiler consists of two steps: adapting the training script and configuring the SageMaker training job launcher.

Step 1: Adapt your training script using the SageMaker Profiler Python modules

To start capturing kernel runs on GPUs while the training job is running, modify your training script using the SageMaker Profiler Python modules. Import the library and add the start_profiling() and stop_profiling() methods to define the beginning and the end of profiling. You can also use optional custom annotations to add markers in the training script to visualize hardware activities during particular operations in each step.

Note that the annotators capture operations on GPUs. To profile operations on CPUs, you don't need to add any annotations. CPU profiling is activated when you specify the profiling configuration, which you'll practice in Step 2: Create a SageMaker framework estimator and activate SageMaker Profiler.

Note

Profiling an entire training job is not the most efficient use of resources. We recommend profiling at most 300 steps of a training job.
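One way to keep profiling to a bounded window, as the preceding note recommends, is to call start_profiling() and stop_profiling() only at chosen step boundaries. The following sketch is a hypothetical helper and not part of smprof. ProfilingWindow takes the start and stop callables (for example, SMProf.start_profiling and SMProf.stop_profiling) and the range of steps to profile, so you can test the logic with plain functions.

```python
class ProfilingWindow:
    """Hypothetical helper: profile only steps in [start_step, start_step + num_steps)."""

    def __init__(self, start_fn, stop_fn, start_step=0, num_steps=300):
        if num_steps <= 0:
            raise ValueError("num_steps must be positive")
        self.start_fn = start_fn
        self.stop_fn = stop_fn
        self.start_step = start_step
        self.end_step = start_step + num_steps  # exclusive
        self.active = False

    def on_step_begin(self, step):
        # Start once, when the first step of the window begins.
        if not self.active and step == self.start_step:
            self.start_fn()
            self.active = True

    def on_step_end(self, step):
        # Stop after the last step of the window finishes.
        if self.active and step == self.end_step - 1:
            self.stop_fn()
            self.active = False

    def close(self):
        # Stop if training ended before the window closed.
        if self.active:
            self.stop_fn()
            self.active = False
```

The loop index i in the Step 1 examples restarts at 0 in every epoch. For that reason, pass a global step counter that keeps increasing across epochs. Call close() after the training loop so that profiling still stops if the job finishes inside the window.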

Important

The release on December 14, 2023 introduces a breaking change: the SageMaker Profiler Python package name changed from smppy to smprof. This change is effective in the SageMaker Framework Containers for TensorFlow v2.12 and later.

If you use one of the previous versions of the SageMaker Framework Containers, such as TensorFlow v2.11.0, the SageMaker Profiler Python package is still available as smppy. If you are uncertain which package name you should use, replace the import statement of the SageMaker Profiler package with the following code snippet.

    try:
        import smprof
    except ImportError:
        # Backward compatibility for the TF 2.11 and PT 1.13.1 images
        import smppy as smprof

Approach 1. Use the context manager smprof.annotate to annotate full functions

You can wrap full functions with the smprof.annotate() context manager. This wrapper is recommended if you want to profile by functions instead of code lines. The following example script shows how to implement the context manager to wrap the training loop and full functions in each iteration.

    import time

    import smprof

    # args, world_size, sampler, trainloader, net, criterion, and optimizer
    # are defined elsewhere in your training script.
    SMProf = smprof.SMProfiler.instance()
    config = smprof.Config()
    config.profiler = {
        "EnableCuda": "1",
    }
    SMProf.configure(config)
    SMProf.start_profiling()

    for epoch in range(args.epochs):
        if world_size > 1:
            sampler.set_epoch(epoch)
        tstart = time.perf_counter()
        for i, data in enumerate(trainloader, 0):
            with smprof.annotate("step_" + str(i)):
                inputs, labels = data
                inputs = inputs.to("cuda", non_blocking=True)
                labels = labels.to("cuda", non_blocking=True)

                optimizer.zero_grad()

                with smprof.annotate("Forward"):
                    outputs = net(inputs)
                with smprof.annotate("Loss"):
                    loss = criterion(outputs, labels)
                with smprof.annotate("Backward"):
                    loss.backward()
                with smprof.annotate("Optimizer"):
                    optimizer.step()

    SMProf.stop_profiling()
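To dry-run an annotated training loop on a machine where smprof is not installed, such as a laptop without GPUs, you could fall back to a stand-in with the same call shape as smprof.annotate. The following is a hypothetical shim and not part of SageMaker Profiler. LocalAnnotations records each annotation's name, nesting depth, and wall-clock duration with time.perf_counter, so you can check that the annotation structure is what you expect before you launch a job.

```python
import contextlib
import time


class LocalAnnotations:
    """Hypothetical stand-in that mimics the shape of smprof.annotate() for local runs."""

    def __init__(self):
        self.records = []  # (name, depth, duration_seconds), in completion order
        self._depth = 0

    @contextlib.contextmanager
    def annotate(self, name):
        start = time.perf_counter()
        self._depth += 1
        try:
            yield
        finally:
            # Depth after leaving the block is this annotation's nesting level (0 = outermost).
            self._depth -= 1
            self.records.append((name, self._depth, time.perf_counter() - start))


try:
    import smprof  # the real SageMaker Profiler package, if available

    annotate = smprof.annotate
except ImportError:
    _local = LocalAnnotations()
    annotate = _local.annotate
```

In the training loop, you would then write with annotate("Forward"): instead of with smprof.annotate("Forward"):. The real profiler is used whenever smprof can be imported, and the shim is used otherwise.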

Approach 2. Use smprof.annotation_begin() and smprof.annotation_end() to annotate specific code lines in functions

You can also define annotations to profile specific code lines. You can set the exact start and end points of profiling at the level of individual code lines rather than whole functions. For example, in the following script, step_annotator begins at the start of each iteration and ends at the end of the iteration. Meanwhile, more detailed annotators for each operation begin and end around the target operations within each iteration.

    import time

    import smprof

    # args, world_size, sampler, trainloader, net, criterion, and optimizer
    # are defined elsewhere in your training script.
    SMProf = smprof.SMProfiler.instance()
    config = smprof.Config()
    config.profiler = {
        "EnableCuda": "1",
    }
    SMProf.configure(config)
    SMProf.start_profiling()

    for epoch in range(args.epochs):
        if world_size > 1:
            sampler.set_epoch(epoch)
        tstart = time.perf_counter()
        for i, data in enumerate(trainloader, 0):
            step_annotator = smprof.annotation_begin("step_" + str(i))

            inputs, labels = data
            inputs = inputs.to("cuda", non_blocking=True)
            labels = labels.to("cuda", non_blocking=True)

            optimizer.zero_grad()

            forward_annotator = smprof.annotation_begin("Forward")
            outputs = net(inputs)
            smprof.annotation_end(forward_annotator)

            loss_annotator = smprof.annotation_begin("Loss")
            loss = criterion(outputs, labels)
            smprof.annotation_end(loss_annotator)

            backward_annotator = smprof.annotation_begin("Backward")
            loss.backward()
            smprof.annotation_end(backward_annotator)

            optimizer_annotator = smprof.annotation_begin("Optimizer")
            optimizer.step()
            smprof.annotation_end(optimizer_annotator)

            smprof.annotation_end(step_annotator)

    SMProf.stop_profiling()
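With explicit begin and end calls, an exception between the two calls leaves an annotation open. A common Python pattern is to close the annotation in a finally clause. The following sketch demonstrates that pattern against a hypothetical local stand-in, AnnotationTracker, which imitates the begin-returns-a-token and end-takes-the-token shape of the script above. It is not the smprof implementation, and it does not claim how smprof itself handles unclosed annotations.

```python
import itertools


class AnnotationTracker:
    """Hypothetical local stand-in for annotation_begin()/annotation_end().

    annotation_begin() hands out a token; annotation_end() checks that the token
    is still open, which helps catch annotations that are never closed.
    """

    def __init__(self):
        self._ids = itertools.count()
        self.open = {}  # token -> annotation name
        self.closed = []  # annotation names, in the order they were closed

    def annotation_begin(self, name):
        token = next(self._ids)
        self.open[token] = name
        return token

    def annotation_end(self, token):
        if token not in self.open:
            raise ValueError(f"annotation {token!r} is not open")
        self.closed.append(self.open.pop(token))


def run_forward(tracker, fail=False):
    # Close the annotation even if the wrapped code raises.
    token = tracker.annotation_begin("Forward")
    try:
        if fail:
            raise RuntimeError("simulated failure in the forward pass")
    finally:
        tracker.annotation_end(token)
```

After a run, an empty tracker.open confirms that every begin had a matching end.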

After adding the annotations and the profiler setup code, save the script so that you can submit it with a SageMaker training job launcher in Step 2. The sample launcher assumes that the training script is named train_with_profiler_demo.py.

Step 2: Create a SageMaker framework estimator and activate SageMaker Profiler

The following procedure shows how to prepare a SageMaker framework estimator for training using the SageMaker Python SDK.

  1. Set up a profiler_config object using the ProfilerConfig and Profiler modules as follows.

    from sagemaker import ProfilerConfig, Profiler

    profiler_config = ProfilerConfig(
        profile_params=Profiler(cpu_profiling_duration=3600)
    )

    The following is the description of the Profiler module and its argument.

    • Profiler: The module for activating SageMaker Profiler with the training job.

      • cpu_profiling_duration (int): Specify the time duration in seconds for profiling on CPUs. Default is 3600 seconds.

  2. Create a SageMaker framework estimator with the profiler_config object created in the previous step. The following code shows an example of creating a PyTorch estimator. If you want to create a TensorFlow estimator, use sagemaker.tensorflow.TensorFlow instead, and specify one of the TensorFlow versions supported by SageMaker Profiler. For more information about supported frameworks and instance types, see Supported framework images, Amazon Web Services Regions, and instance types.

    import sagemaker
    from sagemaker.pytorch import PyTorch

    # source_dir, output_path, and hyperparameters are defined earlier in your launcher script.
    estimator = PyTorch(
        framework_version="2.0.1",  # a PyTorch version listed in the supported images
        py_version="py310",
        role=sagemaker.get_execution_role(),
        entry_point="train_with_profiler_demo.py",  # your training job entry point
        source_dir=source_dir,  # source directory for your training script
        output_path=output_path,
        base_job_name="sagemaker-profiler-demo",
        hyperparameters=hyperparameters,  # if any
        instance_count=1,  # Recommended to test with < 8
        instance_type="ml.p4d.24xlarge",
        profiler_config=profiler_config,
    )
  3. Start the training job by running the fit method. With wait=False, the call returns without streaming the training job logs, and the job keeps running in the background.

    estimator.fit(wait=False)
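Choosing a value for cpu_profiling_duration is a small arithmetic exercise. The following sketch assumes that the CPU profiling window is counted from job initialization, as the Dashboard description of CPU utilization suggests. Under that assumption, the window has to cover the startup time plus the steps you want to profile. The helper name, the safety factor, and the rounding to whole minutes are illustrative assumptions, not SageMaker defaults.

```python
import math


def estimate_cpu_profiling_duration(startup_seconds, steps, seconds_per_step,
                                    safety_factor=1.2, granularity=60):
    """Hypothetical helper: seconds of CPU profiling needed to cover `steps` steps.

    Assumes the CPU profiling window starts at job initialization, so startup
    time (container start, data loading) is included in the estimate.
    """
    if startup_seconds < 0 or steps < 0 or seconds_per_step < 0:
        raise ValueError("inputs must be non-negative")
    raw = (startup_seconds + steps * seconds_per_step) * safety_factor
    # Round up to a whole number of `granularity`-second blocks.
    return int(math.ceil(raw / granularity) * granularity)
```

For example, with 100 seconds of startup, 300 steps at 1 second each, and safety_factor=1.0, the estimate is 400 seconds, which rounds up to 420. You could then pass the result as Profiler(cpu_profiling_duration=...).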

While the training job is running or after it has completed, you can go to Open the SageMaker Profiler UI application to start exploring and visualizing the saved profiles.

If you want to directly access the profile data saved in the Amazon S3 bucket, use the following script to retrieve the S3 URI.

    import os

    # An ad hoc function that returns the S3 URI
    # where the profile output data is saved
    def get_detailed_profiler_output_uri(estimator):
        config_name = None
        for processing in estimator.profiler_rule_configs:
            params = processing.get("RuleParameters", dict())
            rule = params.get("rule_to_invoke", "")
            if rule == "DetailedProfilerProcessing":
                config_name = processing.get("RuleConfigurationName")
                break
        if config_name is None:
            raise ValueError("DetailedProfilerProcessing rule configuration not found")
        return os.path.join(
            estimator.output_path,
            estimator.latest_training_job.name,
            "rule-output",
            config_name,
        )

    print(
        "Profiler output S3 URI: ",
        get_detailed_profiler_output_uri(estimator),
    )
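The function above returns an s3:// URI. S3 client calls such as boto3's list_objects_v2 take the bucket name and the key prefix as separate Bucket and Prefix arguments. The following minimal sketch splits the URI using only the standard library. The helper name split_s3_uri is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key/prefix' into ('bucket', 'key/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # treat the key as a folder-style prefix
    return parsed.netloc, prefix
```

One design note: os.path.join uses backslashes on Windows. If your launcher runs on Windows, posixpath.join is a safer way to build S3 keys.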

(Optional) Install the SageMaker Profiler Python package

To use SageMaker Profiler on PyTorch or TensorFlow framework images not listed in SageMaker framework images pre-installed with SageMaker Profiler, or on your own custom Docker container for training, you can install SageMaker Profiler by using one of the SageMaker Profiler Python package binary files.

Option 1: Install the SageMaker Profiler package while launching a training job

If you want to use SageMaker Profiler for training jobs using PyTorch or TensorFlow images not listed in SageMaker framework images pre-installed with SageMaker Profiler, create a requirements.txt file and place it in the directory that you specify for the source_dir parameter of the SageMaker framework estimator in Step 2. For more information about setting up a requirements.txt file in general, see Using third-party libraries in the SageMaker Python SDK documentation. In the requirements.txt file, add the URL of one of the SageMaker Profiler Python package binary files.

    # requirements.txt
    https://smppy.s3.amazonaws.com/tensorflow/cu112/smprof-0.3.332-cp39-cp39-linux_x86_64.whl
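Choosing the right binary file means matching two versions. The Python version is encoded in the wheel filename's Python tag, such as cp39 for CPython 3.9, following the wheel naming convention name-version-pythontag-abitag-platform.whl. In the example URL, the CUDA version (cu112) appears as a directory in the path. The following sketch uses hypothetical helpers and assumes that URL layout. It reads both tags and checks the Python tag against the running interpreter.

```python
import sys
from urllib.parse import urlparse


def wheel_python_tag(url):
    """Return the Python tag (for example 'cp39') from a wheel URL or filename."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    # name-version[-build]-pythontag-abitag-platform.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return parts[-3]


def wheel_cuda_tag(url):
    """Return a CUDA directory tag such as 'cu112' from the URL path, or None."""
    for segment in urlparse(url).path.split("/"):
        if segment.startswith("cu") and segment[2:].isdigit():
            return segment
    return None


def matches_running_python(url):
    """True if the wheel's Python tag matches this interpreter (cp39 for Python 3.9)."""
    expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return wheel_python_tag(url) == expected
```

Before installing, you would also confirm that the CUDA tag matches the CUDA version of your image or container.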

Option 2: Install the SageMaker Profiler package in your custom Docker containers

If you use a custom Docker container for training, add one of the SageMaker Profiler Python package binary files to your Dockerfile.

    # Install the smprof package version compatible with your CUDA version
    RUN pip install https://smppy.s3.amazonaws.com/tensorflow/cu112/smprof-0.3.332-cp39-cp39-linux_x86_64.whl

For guidance on running a custom Docker container for training on SageMaker in general, see Adapting your own training container.

Open the SageMaker Profiler UI application

You can access the SageMaker Profiler UI application through the following options.

Option 1: Launch the SageMaker Profiler UI from the domain details page

If you have access to the SageMaker console, you can take this option.

Navigate to the domain details page

The following procedure shows how to navigate to the domain details page.

  1. Open the Amazon SageMaker console at https://console.aws.amazon.com/sagemaker/.

  2. On the left navigation pane, choose domains.

  3. From the list of domains, select the domain in which you want to launch the SageMaker Profiler application.

Launch the SageMaker Profiler UI application

The following procedure shows how to launch the SageMaker Profiler application that is scoped to a user profile.

  1. On the domain details page, choose the User profiles tab.

  2. Identify the user profile for which you want to launch the SageMaker Profiler UI application.

  3. Choose Launch for the selected user profile, and choose Profiler.

Option 2: Launch the SageMaker Profiler UI application from the SageMaker Profiler landing page in the SageMaker console

The following procedure describes how to launch the SageMaker Profiler UI application from the SageMaker Profiler landing page in the SageMaker console. If you have access to the SageMaker console, you can take this option.

  1. Open the Amazon SageMaker console at https://console.aws.amazon.com/sagemaker/.

  2. On the left navigation pane, choose Profiler.

  3. Under Get started, select the domain in which you want to launch the SageMaker Profiler UI application. If your user profile belongs to only one domain, you do not see the option for selecting a domain.

  4. Select the user profile for which you want to launch the SageMaker Profiler UI application. If there is no user profile in the domain, choose Create user profile. For more information about creating a new user profile, see Add and Remove User Profiles.

  5. Choose Open Profiler.

Option 3: Use the application launcher function in the SageMaker Python SDK

If you are a SageMaker domain user and have access only to SageMaker Studio, you can access the SageMaker Profiler UI application through SageMaker Studio Classic by running the sagemaker.interactive_apps.detail_profiler_app.DetailProfilerApp function.

Note that SageMaker Studio Classic was the Studio UI experience before re:Invent 2023 and was migrated as an application into the newly designed Studio UI at re:Invent 2023. The SageMaker Profiler UI application is available at the SageMaker domain level and thus requires your domain ID and user profile name. Currently, the DetailProfilerApp function works only within the SageMaker Studio Classic application, where it picks up the domain and user profile information from SageMaker Studio Classic.

For domains, domain users, and Studio environments created before re:Invent 2023, Studio Classic is the default experience unless you have updated it by following the instructions at Migrating from Amazon SageMaker Studio Classic. If this is your case, no further action is needed, and you can launch the SageMaker Profiler UI application directly by running the DetailProfilerApp function.

If you created a new domain and Studio after re:Invent 2023, launch the Studio Classic application within the Studio UI and then run the DetailProfilerApp function to launch the SageMaker Profiler UI application.

Note that the DetailProfilerApp function doesn't work in other SageMaker machine learning IDEs, such as the SageMaker Studio JupyterLab application, the SageMaker Studio Code Editor application, and SageMaker Notebook instances. If you run the DetailProfilerApp function in those IDEs, it returns a URL to the Profiler landing page in the SageMaker console instead of a direct link to open the Profiler UI application.

Explore the profile output data visualized in the SageMaker Profiler UI

This section walks through the SageMaker Profiler UI and provides tips for how to use and gain insights from it.

Load profile

When you open the SageMaker Profiler UI, the Load profile page opens up. To load and generate the Dashboard and Timeline, go through the following procedure.

To load the profile of a training job
  1. From the List of training jobs section, use the check box to choose the training job for which you want to load the profile.

  2. Choose Load. The job name should appear in the Loaded profile section at the top.

  3. Choose the radio button on the left of the Job name to generate the Dashboard and Timeline. Note that when you choose the radio button, the UI automatically opens the Dashboard. Note also that if you generate the visualizations while the job status and loading status still appear to be in progress, the SageMaker Profiler UI generates Dashboard plots and a Timeline up to the most recent profile data collected from the ongoing training job or the partially loaded profile data.

Tip

You can load and visualize one profile at a time. To load another profile, you must first unload the previously loaded profile. To unload a profile, use the trash bin icon on the right end of the profile in the Loaded profile section.


[Screenshot: the Load profile page in the SageMaker Profiler UI]

Dashboard

After you finish loading and selecting the training job, the UI opens the Dashboard page furnished with the following panels by default.

  • GPU active time – This pie chart shows the percentage of GPU active time versus GPU idle time. You can check if your GPUs are more active than idle throughout the entire training job. GPU active time is based on the profile data points with a utilization rate greater than 0%, whereas GPU idle time is the profiled data points with 0% utilization.

  • GPU utilization over time – This timeline graph shows the average GPU utilization rate over time per node, aggregating all of the nodes in a single chart. You can check if the GPUs have an unbalanced workload, under-utilization issues, bottlenecks, or idle issues during certain time intervals. To track the utilization rate at the individual GPU level and related kernel runs, use the Timeline interface. Note that the GPU activity collection starts from where you added the profiler starter function SMProf.start_profiling() in your training script, and stops at SMProf.stop_profiling().

  • CPU active time – This pie chart shows the percentage of CPU active time versus CPU idle time. You can check if your CPUs are more active than idle throughout the entire training job. CPU active time is based on the profiled data points with a utilization rate greater than 0%, whereas CPU idle time is the profiled data points with 0% utilization.

  • CPU utilization over time – This timeline graph shows the average CPU utilization rate over time per node, aggregating all of the nodes in a single chart. You can check if the CPUs are bottlenecked or underutilized during certain time intervals. To track the utilization rate of the CPUs aligned with the individual GPU utilization and kernel runs, use the Timeline interface. Note that the utilization metrics start from the job initialization.

  • Time spent by all GPU kernels – This pie chart shows all GPU kernels run throughout the training job. By default, it shows the top 15 GPU kernels as individual sectors and all other kernels in one sector. Hover over the sectors to see more detailed information. The value shows the total run time of the GPU kernels in seconds, and the percentage is based on the entire time of the profile.

  • Time spent by top 15 GPU kernels – This pie chart shows the top 15 GPU kernels run throughout the training job as individual sectors. Hover over the sectors to see more detailed information. The value shows the total run time of the GPU kernels in seconds, and the percentage is based on the entire time of the profile.

  • Launch counts of all GPU kernels – This pie chart shows the launch count of every GPU kernel launched throughout the training job. It shows the top 15 GPU kernels as individual sectors and all other kernels in one sector. Hover over the sectors to see more detailed information. The value shows the total count of the launched GPU kernels, and the percentage is based on the entire count of all kernels.

  • Launch counts of top 15 GPU kernels – This pie chart shows the launch counts of the top 15 GPU kernels launched throughout the training job. Hover over the sectors to see more detailed information. The value shows the total count of the launched GPU kernels, and the percentage is based on the entire count of all kernels.

  • Step time distribution – This histogram shows the distribution of step durations on GPUs. This plot is generated only after you add the step annotator in your training script.

  • Kernel precision distribution – This pie chart shows the percentage of time spent on running kernels in different data types such as FP32, FP16, INT32, and INT8.

  • GPU activity distribution – This pie chart shows the percentage of time spent on GPU activities, such as running kernels, memory (memcpy and memset), and synchronization (sync).

  • GPU memory operations distribution – This pie chart shows the percentage of time spent on GPU memory operations. It visualizes the memcpy activities and helps identify whether your training job is spending excessive time on certain memory operations.

  • Create a new histogram – Create a new diagram of a custom metric you annotated manually during Step 1: Adapt your training script using the SageMaker Profiler Python modules. When adding a custom annotation to a new histogram, select or type the name of the annotation you added in the training script. For example, in the demo training scripts in Step 1, step, Forward, Loss, Backward, and Optimizer are the custom annotations. While creating a new histogram, these annotation names should appear in the drop-down menu for metric selection. If you choose Backward, the UI adds to the Dashboard a histogram of the time spent on backward passes throughout the profiled time. This type of histogram is useful for checking whether there are outliers that take abnormally long and cause bottleneck problems.
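The active and idle split in the GPU active time and CPU active time panels comes down to counting samples. Samples with a utilization rate greater than 0% count as active, and samples at 0% count as idle. The following minimal sketch applies that definition to a list of utilization samples in percent. It assumes the samples are evenly spaced in time, and it illustrates the definition rather than the UI's code. The helper name is hypothetical.

```python
def active_idle_percentages(utilization_samples):
    """Split utilization samples (0-100) into active (>0%) and idle (==0%) percentages.

    Assumes evenly spaced samples, so each sample carries equal weight in time.
    """
    samples = list(utilization_samples)
    if not samples:
        raise ValueError("no utilization samples")
    if any(u < 0 or u > 100 for u in samples):
        raise ValueError("utilization must be between 0 and 100")
    active = sum(1 for u in samples if u > 0)
    active_pct = 100.0 * active / len(samples)
    return active_pct, 100.0 - active_pct
```

Under this definition, a GPU at 1% utilization counts as fully active. A high active-time percentage therefore does not by itself mean high utilization, and you should read it together with the utilization-over-time charts.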

The following screenshots show the GPU and CPU active time ratio and the average GPU and CPU utilization rate with respect to time per compute node.


[Screenshot: Dashboard panels for GPU and CPU active time and utilization over time in the SageMaker Profiler UI]

The following screenshot shows an example of pie charts for comparing how many times the GPU kernels are launched and measuring the time spent on running them. In the Time spent by all GPU kernels and Launch counts of all GPU kernels panels, you can also enter an integer in the input field for k to adjust the number of legend entries shown in the plots. For example, if you specify 10, the plots show the top 10 most run and most launched kernels, respectively.


[Screenshot: Dashboard pie charts of GPU kernel run time and launch counts in the SageMaker Profiler UI]
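You can approximate the top-k kernel panels from raw kernel records in three steps: sum per kernel, keep the k largest, and group the rest into one sector. The following sketch uses a hypothetical (kernel_name, value) record format. Pass durations in seconds for the time panels, or 1 per launch for the launch-count panels. The Dashboard description says the time percentages are based on the entire profile time. For that reason, the sketch accepts an optional total denominator and otherwise normalizes by the summed values.

```python
from collections import Counter


def top_k_with_other(kernel_values, k=15, total=None):
    """Sum values per kernel name, keep the k largest, and group the rest as 'Other'.

    kernel_values: iterable of (kernel_name, value) pairs, where value is a
    duration in seconds (time panels) or 1 per launch (launch-count panels).
    total: optional denominator for the percentages; defaults to the sum of all values.
    Returns a list of (name, summed_value, percent) tuples, largest first.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    totals = Counter()
    for name, value in kernel_values:
        totals[name] += value
    denominator = sum(totals.values()) if total is None else total
    if denominator <= 0:
        return []
    top = totals.most_common(k)
    rows = [(name, value, 100.0 * value / denominator) for name, value in top]
    if len(totals) > k:
        other = sum(totals.values()) - sum(value for _, value in top)
        rows.append(("Other", other, 100.0 * other / denominator))
    return rows
```

Lowering k moves more kernels into the Other sector, which mirrors the effect of the k input field described above.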

The following screenshot shows an example of a step time duration histogram and pie charts for the kernel precision distribution, GPU activity distribution, and GPU memory operations distribution.


[Screenshot: Dashboard step time histogram and kernel precision, GPU activity, and GPU memory operations pie charts in the SageMaker Profiler UI]
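The step time distribution and custom histograms are mainly used to spot steps that take abnormally long. The following sketch summarizes per-step durations and flags slow steps. The threshold (a step longer than 1.5 times the median step time) is an illustrative assumption and not the rule the Profiler UI uses. The input is a hypothetical mapping from step annotation names to durations in seconds.

```python
import statistics


def step_time_summary(step_durations, outlier_factor=1.5):
    """Summarize per-step durations (seconds) and flag unusually slow steps.

    step_durations: mapping (or iterable of pairs) of step name -> seconds.
    A step is flagged when it takes longer than outlier_factor * median.
    """
    durations = dict(step_durations)
    if not durations:
        raise ValueError("no step durations")
    median = statistics.median(durations.values())
    outliers = sorted(
        (name for name, d in durations.items() if d > outlier_factor * median),
        key=lambda name: durations[name],
        reverse=True,  # slowest first
    )
    return {"median": median, "max": max(durations.values()), "outliers": outliers}
```

A median-based threshold resists distortion from the outliers themselves. Compared with a mean, a single very slow step (for example, one that includes data loading) shifts it much less.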

Timeline interface

To gain a detailed view into the compute resources at the level of operations and kernels scheduled on the CPUs and run on the GPUs, use the Timeline interface.

You can zoom in and out and pan left or right in the timeline interface using your mouse, the [w, a, s, d] keys, or the four arrow keys on the keyboard.

Tip

For more tips on the keyboard shortcuts to interact with the Timeline interface, choose Keyboard shortcuts in the left pane.

The timeline tracks are organized in a tree structure, giving you information from the host level to the device level. For example, if you run N instances with eight GPUs in each, the timeline structure of each instance would be as follows.

  • algo-inode – This is what SageMaker tags to assign jobs to provisioned instances. The digit inode is randomly assigned. For example, if you use 4 instances, this section expands from algo-1 to algo-4.

    • CPU – In this section, you can check the average CPU utilization rate and performance counters.

    • GPUs – In this section, you can check the average GPU utilization rate, individual GPU utilization rate, and kernels.

      • SUM Utilization – The average GPU utilization rates per instance.

      • HOST-0 PID-123 – A unique name assigned to each process track. The acronym PID is the process ID, and the number appended to it is the process ID number that's recorded during data capture from the process. This section shows the following information from the process.

        • GPU-inum_gpu utilization – The utilization rate of the inum_gpu-th GPU over time.

        • GPU-inum_gpu device – The kernel runs on the inum_gpu-th GPU device.

          • stream icuda_stream – CUDA streams showing kernel runs on the GPU device. To learn more about CUDA streams, see the slides in PDF at CUDA C/C++ Streams and Concurrency provided by NVIDIA.

        • GPU-inum_gpu host – The kernel launches on the inum_gpu-th GPU host.

The following screenshots show the Timeline of the profile of a training job run on ml.p4d.24xlarge instances, each equipped with 8 NVIDIA A100 Tensor Core GPUs.

The following is a zoomed-out view of the profile, showing a dozen steps, including an intermittent data loader between step_232 and step_233 that fetches the next data batch.


[Screenshot: the Timeline page in the SageMaker Profiler UI, visualizing the profile of a sample training job]

For each CPU, you can track the CPU utilization and performance counters, such as "clk_unhalted_ref.tsc" and "itlb_misses.miss_causes_a_walk", which are indicative of instructions run on the CPU.

For each GPU, you can see a host timeline and a device timeline. Kernel launches are on the host timeline and kernel runs are on the device timeline. If you added annotations (such as Forward, Backward, and Optimizer) to your training script, you can also see them in the GPU host timeline.

In the timeline view, you can also track kernel launch-and-run pairs. This helps you understand how a kernel launch scheduled on a host (CPU) is run on the corresponding GPU device.

Tip

Press the f key to zoom into the selected kernel.

The following screenshot is a zoomed-in view into step_233 and step_234 from the previous screenshot. The timeline interval selected in the following screenshot is the AllReduce operation, an essential communication and synchronization step in distributed training, run on the GPU-0 device. In the screenshot, note that the kernel launch in the GPU-0 host connects to the kernel run in the GPU-0 device stream 1, indicated with the arrow in cyan color.


[Screenshot: the Timeline page zoomed in to step_233 and step_234, with the AllReduce operation selected]

Two information tabs also appear in the bottom pane of the UI when you select a timeline interval, as shown in the previous screenshot. The Current Selection tab shows the details of the selected kernel and the connected kernel launch from the host. The connection direction is always from host (CPU) to device (GPU) because each GPU kernel is always called from a CPU. The Connections tab shows the chosen kernel launch and run pair. You can select either of them to move it to the center of the Timeline view.

The following screenshot zooms in further into the AllReduce operation launch and run pair.


[Screenshot: the Timeline page zoomed in to the AllReduce operation launch and run pair]
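The launch-and-run pairing shown in the Timeline is also how you can quantify the latency between a kernel launch and its run, which is one of the quantities SageMaker Profiler focuses on. That latency is the time between the launch on the host and the start of the matching run on the device. The following sketch assumes a hypothetical event format in which each host launch and device run carries a shared correlation ID. NVIDIA's CUPTI tracing uses correlation IDs for this kind of pairing. This format is not the SageMaker Profiler output format.

```python
from dataclasses import dataclass


@dataclass
class Event:
    correlation_id: int
    name: str
    timestamp_us: float  # launch time (host) or run start time (device), in microseconds


def launch_latencies(host_launches, device_runs):
    """Pair host launches with device runs by correlation ID.

    Returns (name, latency_us) for every launch that has a matching run, in launch order.
    """
    runs_by_id = {run.correlation_id: run for run in device_runs}
    latencies = []
    for launch in sorted(host_launches, key=lambda e: e.timestamp_us):
        run = runs_by_id.get(launch.correlation_id)
        if run is not None:
            # The connection always goes from host (CPU) to device (GPU).
            latencies.append((launch.name, run.timestamp_us - launch.timestamp_us))
    return latencies
```

Launches without a matching run are skipped. This can happen, for example, when profiling stops before the device finishes the queued work.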

Information

In Information, you can access information about the loaded training job, such as the instance type, Amazon Resource Names (ARNs) of compute resources provisioned for the job, node names, and hyperparameters.

Settings

The SageMaker Profiler UI application instance is configured to shut down after 2 hours of idle time by default. In Settings, use the following settings to adjust the auto shutdown timer.

  • Enable app auto shutdown – Choose and set to Enabled to let the application automatically shut down after the specified number of hours of idle time. To turn off the auto-shutdown functionality, choose Disabled.

  • Auto shutdown threshold in hours – If you choose Enabled for Enable app auto shutdown, you can set the threshold time in hours for the application to shut down automatically. This is set to 2 by default.

Frequently asked questions about using SageMaker Profiler

Use the following frequently asked questions to find answers about using SageMaker Profiler.

Q. I’m getting an error message, ModuleNotFoundError: No module named 'smppy'

Since December 2023, the name of the SageMaker Profiler Python package has changed from smppy to smprof to resolve a duplicate package name issue; smppy is already used by an open source package.

Therefore, if you have been using smppy since before December 2023 and are experiencing this ModuleNotFoundError issue, the cause might be the outdated package name in your training script while you have the latest smprof package installed or use one of the latest SageMaker framework images pre-installed with SageMaker Profiler. In this case, make sure that you replace all mentions of smppy with smprof throughout your training script.

While updating the SageMaker Profiler Python package name in your training scripts, to avoid confusion around which version of the package name you should use, consider using a conditional import statement as shown in the following code snippet.

    try:
        import smprof
    except ImportError:
        # Backward compatibility for the TF 2.11 and PT 1.13.1 images
        import smppy as smprof

Also note that if you have been using smppy while upgrading to the latest PyTorch or TensorFlow versions, make sure that you install the latest smprof package by following instructions at (Optional) Install the SageMaker Profiler Python package.

Q. I’m getting an error message, ModuleNotFoundError: No module named 'smprof'

First, make sure that you use one of the officially supported SageMaker Framework Containers. If you don’t use one of those, you can install the smprof package by following instructions at (Optional) Install the SageMaker Profiler Python package.

Q. I’m not able to import ProfilerConfig

If you are unable to import ProfilerConfig in your job launcher script using the SageMaker Python SDK, your local environment or the Jupyter kernel might have a significantly outdated version of the SageMaker Python SDK. Make sure that you upgrade the SDK to the latest version.

$ pip install --upgrade sagemaker
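Before upgrading, you can check which SDK version the current environment or Jupyter kernel actually sees. The following sketch reads the installed sagemaker distribution version with importlib.metadata. It then compares the version against the 2.180.0 minimum stated in Supported framework images, Amazon Web Services Regions, and instance types. The helper names are hypothetical. The simple numeric parse ignores pre-release suffixes, so it treats 2.180.0rc1 as 2.180.0.

```python
import re
from importlib import metadata

MIN_SAGEMAKER_SDK = (2, 180, 0)


def parse_release(version):
    """Parse the leading major.minor.patch numbers, e.g. '2.199.0' -> (2, 199, 0).

    Missing parts count as 0, and any suffix such as 'rc1' or '.dev0' is ignored.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def sdk_supports_profiler(version=None):
    """True if the given (or installed) SageMaker Python SDK version is >= 2.180.0."""
    if version is None:
        try:
            version = metadata.version("sagemaker")
        except metadata.PackageNotFoundError:
            return False
    return parse_release(version) >= MIN_SAGEMAKER_SDK
```

If the check returns False inside a Jupyter notebook after you upgrade, restart the kernel. A running kernel keeps the previously imported SDK version.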

Q. I’m getting an error message, aborted: core dumped when importing smprof into my training script

In an earlier version of smprof, this issue occurs with PyTorch 2.0+ and PyTorch Lightning. To resolve this issue, install the latest smprof package by following instructions at (Optional) Install the SageMaker Profiler Python package.

Q. I cannot find the SageMaker Profiler UI from SageMaker Studio. How can I find it?

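If you have access to the SageMaker console, use Option 1: Launch the SageMaker Profiler UI from the domain details page, or Option 2: Launch the SageMaker Profiler UI application from the SageMaker Profiler landing page in the SageMaker console.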

If you are a domain user and don't have access to the SageMaker console, you can access the application through SageMaker Studio Classic. If this is your case, use Option 3: Use the application launcher function in the SageMaker Python SDK.

Considerations

Consider the following when using SageMaker Profiler.