Use Amazon SageMaker Profiler to profile activities on Amazon compute resources
Amazon SageMaker Profiler is currently in preview release and available at no cost in supported Amazon Web Services Regions. The generally available version of Amazon SageMaker Profiler (if any) may include features and pricing that are different from those offered in preview.
Amazon SageMaker Profiler is a capability of Amazon SageMaker that provides a detailed view into the Amazon compute resources provisioned during training deep learning models on SageMaker. It focuses on profiling the CPU and GPU usage, kernel runs on GPUs, kernel launches on CPUs, sync operations, memory operations across CPUs and GPUs, latencies between kernel launches and corresponding runs, and data transfer between CPUs and GPUs. SageMaker Profiler also offers a user interface (UI) that visualizes the profile, a statistical summary of profiled events, and the timeline of a training job for tracking and understanding the time relationship of the events between GPUs and CPUs.
Note
SageMaker Profiler supports PyTorch and TensorFlow and is available in Amazon Deep Learning Containers for SageMaker.
For data scientists
Training deep learning models on a large compute cluster often runs into computational performance problems, such as bottlenecks, kernel launch latencies, memory limits, and low resource utilization.
To identify such computational performance issues, you need to profile deeper into the compute resources to understand which kernels introduce latencies and which operations cause bottlenecks. Data scientists can benefit from using the SageMaker Profiler UI to visualize the detailed profile of training jobs. The UI provides a dashboard furnished with summary charts and a timeline interface to track every event on the compute resources. Data scientists can also add custom annotations to track certain parts of the training job using the SageMaker Profiler Python modules.
For administrators
If you are an administrator of an Amazon account or SageMaker domain, you can manage the Profiler application users through the Profiler landing page in the SageMaker console or the SageMaker domain. Each domain user can access their own Profiler application, given the granted permissions. As a SageMaker domain administrator or domain user, you can create and delete the Profiler application according to the permission level you have.
Supported framework images, Amazon Web Services Regions, and instance types
This feature supports the following machine learning frameworks and Amazon Web Services Regions.
Note
To use this feature, make sure that you have at least version 2.180.0 of the SageMaker Python SDK installed.
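You can verify the installed SageMaker Python SDK version before launching a job. The `meets_minimum` helper below is a minimal sketch of the comparison and is our own illustration, not part of the SDK; in practice you would pass `sagemaker.__version__` to it.

```python
import re

def meets_minimum(version, minimum="2.180.0"):
    """Return True if a dotted version string is at least the minimum version."""
    parse = lambda v: tuple(int(p) for p in re.findall(r"\d+", v)[:3])
    return parse(version) >= parse(minimum)

# In practice, pass sagemaker.__version__ as the first argument.
print(meets_minimum("2.196.0"))
print(meets_minimum("2.93.0"))
```

Note that a plain string comparison would get this wrong (`"2.93" > "2.180"` lexicographically), which is why the helper compares numeric tuples.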
SageMaker framework images pre-installed with SageMaker Profiler
SageMaker Profiler is pre-installed in the following Amazon Deep Learning Containers for SageMaker.
PyTorch images
| PyTorch versions | Amazon DLC image URI |
| --- | --- |
| 2.2.0 | |
| 2.1.0 | |
| 2.0.1 | |
| 1.13.1 | |
TensorFlow images
| TensorFlow versions | Amazon DLC image URI |
| --- | --- |
| 2.13.0 | |
| 2.12.0 | |
| 2.11.0 | |
Important
Distribution and maintenance of the framework containers in the preceding tables are governed by the Framework Support Policy managed by the Amazon Deep Learning Containers service. We highly recommend that you upgrade to the currently supported framework versions.
Note
If you want to use SageMaker Profiler for other framework images or your own Docker images, you can install SageMaker Profiler using the SageMaker Profiler Python package binary files provided in the following section.
SageMaker Profiler Python package binary files
If you want to configure your own Docker container, use SageMaker Profiler in other pre-built containers for PyTorch and TensorFlow, or install the SageMaker Profiler Python package locally, use one of the following binary files, depending on the Python and CUDA versions in your environment.
PyTorch

- Python 3.8, CUDA 11.3: https://smppy.s3.amazonaws.com/pytorch/cu113/smprof-0.3.334-cp38-cp38-linux_x86_64.whl
- Python 3.9, CUDA 11.7: https://smppy.s3.amazonaws.com/pytorch/cu117/smprof-0.3.334-cp39-cp39-linux_x86_64.whl
- Python 3.10, CUDA 11.8: https://smppy.s3.amazonaws.com/pytorch/cu118/smprof-0.3.334-cp310-cp310-linux_x86_64.whl
- Python 3.10, CUDA 12.1: https://smppy.s3.amazonaws.com/pytorch/cu121/smprof-0.3.334-cp310-cp310-linux_x86_64.whl
TensorFlow

- Python 3.9, CUDA 11.2: https://smppy.s3.amazonaws.com/tensorflow/cu112/smprof-0.3.334-cp39-cp39-linux_x86_64.whl
- Python 3.10, CUDA 11.8: https://smppy.s3.amazonaws.com/tensorflow/cu118/smprof-0.3.334-cp310-cp310-linux_x86_64.whl
For more information about how to install SageMaker Profiler using the binary files, see (Optional) Install the SageMaker Profiler Python package.
Supported Amazon Web Services Regions
SageMaker Profiler is available in the following Amazon Web Services Regions.
- US East (N. Virginia) (us-east-1)
- US East (Ohio) (us-east-2)
- US West (Oregon) (us-west-2)
- Europe (Frankfurt) (eu-central-1)
- Europe (Ireland) (eu-west-1)
Supported instance types
SageMaker Profiler supports profiling of training jobs on the following instance types.
CPU and GPU profiling
- ml.g4dn.12xlarge
- ml.g5.24xlarge
- ml.g5.48xlarge
- ml.p3dn.24xlarge
- ml.p4de.24xlarge
- ml.p4d.24xlarge
- ml.p5.48xlarge
GPU profiling only
- ml.g5.2xlarge
- ml.g5.4xlarge
- ml.g5.8xlarge
- ml.g5.16xlarge
Prerequisites
The following list shows the prerequisites to start using SageMaker Profiler.

- A SageMaker domain set up with Amazon VPC in your Amazon account.

  For instructions on setting up a domain, see Onboard to Amazon SageMaker domain using quick setup. You also need to add domain user profiles for individual users to access the Profiler UI application. For more information, see Add and remove SageMaker domain user profiles.

- The following list is the minimum set of permissions for using the Profiler UI application.

  - sagemaker:CreateApp
  - sagemaker:DeleteApp
  - sagemaker:DescribeTrainingJob
  - sagemaker:Search
  - s3:GetObject
  - s3:ListBucket
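The permissions above can be expressed as an IAM policy document. The following is a minimal sketch built as a Python dict that serializes to policy JSON; the bucket name is a placeholder for the S3 bucket that stores your profile output, and scoping the `Resource` fields more narrowly is up to your account's conventions.

```python
import json

# Placeholder: replace with the S3 bucket where your profile output is saved.
PROFILE_BUCKET = "amzn-s3-demo-bucket"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateApp",
                "sagemaker:DeleteApp",
                "sagemaker:DescribeTrainingJob",
                "sagemaker:Search",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{PROFILE_BUCKET}",
                f"arn:aws:s3:::{PROFILE_BUCKET}/*",
            ],
        },
    ],
}

print(json.dumps(policy, indent=2))
```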
Prepare and run a training job with SageMaker Profiler
Setting up and running a training job with SageMaker Profiler consists of two steps: adapting the training script and configuring the SageMaker training job launcher.
Step 1: Adapt your training script using the SageMaker Profiler Python modules
To start capturing kernel runs on GPUs while the training job is running, modify your training script using the SageMaker Profiler Python modules. Import the library and add the start_profiling() and stop_profiling() methods to define the beginning and the end of profiling. You can also use optional custom annotations to add markers in the training script to visualize hardware activities during particular operations in each step.
Note that the annotators extract operations from GPUs. For profiling operations on CPUs, you don't need to add any additional annotations. CPU profiling is also activated when you specify the profiling configuration, which you'll practice in Step 2: Create a SageMaker framework estimator and activate SageMaker Profiler.
Note
Profiling an entire training job is not the most efficient use of resources. We recommend profiling at most 300 steps of a training job.
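One way to honor this guidance is to gate the start_profiling() and stop_profiling() calls on a step counter. The window bounds and the `should_profile` helper below are our own illustration, not part of the smprof API; the start step of 100 is an arbitrary warm-up assumption.

```python
PROFILE_START_STEP = 100   # assumed warm-up steps to skip before profiling
PROFILE_STOP_STEP = 400    # yields at most 300 profiled steps

def should_profile(step):
    """Return True only for steps inside the profiling window."""
    return PROFILE_START_STEP <= step < PROFILE_STOP_STEP

# In the training loop, you would call SMProf.start_profiling() when
# should_profile(step) first becomes True and SMProf.stop_profiling()
# once it becomes False again.
print(sum(should_profile(s) for s in range(1000)))  # 300 profiled steps
```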
Important
The release on December 14, 2023 involves a breaking change. The SageMaker Profiler Python package name is changed from smppy to smprof. This change is effective in the SageMaker Framework Containers for the latest framework versions. If you use one of the previous versions of the SageMaker Framework Containers, such as TensorFlow 2.11.0 or PyTorch 1.13.1, the package name remains smppy. If you are uncertain about which version or package name you should use, replace the import statement of the SageMaker Profiler package with the following code snippet.

```python
try:
    import smprof
except ImportError:
    # backward-compatibility for TF 2.11 and PT 1.13.1 images
    import smppy as smprof
```
Approach 1. Use the context manager smprof.annotate to annotate full functions

You can wrap full functions with the smprof.annotate() context manager. This wrapper is recommended if you want to profile by functions instead of code lines. The following example script shows how to implement the context manager to wrap the training loop and full functions in each iteration.
```python
import smprof

SMProf = smprof.SMProfiler.instance()
config = smprof.Config()
config.profiler = {
    "EnableCuda": "1",
}
SMProf.configure(config)
SMProf.start_profiling()

for epoch in range(args.epochs):
    if world_size > 1:
        sampler.set_epoch(epoch)
    tstart = time.perf_counter()
    for i, data in enumerate(trainloader, 0):
        with smprof.annotate("step_" + str(i)):
            inputs, labels = data
            inputs = inputs.to("cuda", non_blocking=True)
            labels = labels.to("cuda", non_blocking=True)
            optimizer.zero_grad()
            with smprof.annotate("Forward"):
                outputs = net(inputs)
            with smprof.annotate("Loss"):
                loss = criterion(outputs, labels)
            with smprof.annotate("Backward"):
                loss.backward()
            with smprof.annotate("Optimizer"):
                optimizer.step()

SMProf.stop_profiling()
```
Approach 2. Use smprof.annotation_begin() and smprof.annotation_end() to annotate specific code lines in functions

You can also define annotations to profile specific code lines. You can set the exact starting point and end point of profiling at the level of individual code lines, rather than by functions. For example, in the following script, the step_annotator is defined at the beginning of each iteration and ends at the end of the iteration. Meanwhile, other detailed annotators for each operation are defined and wrap around the target operations throughout each iteration.
```python
import smprof

SMProf = smprof.SMProfiler.instance()
config = smprof.Config()
config.profiler = {
    "EnableCuda": "1",
}
SMProf.configure(config)
SMProf.start_profiling()

for epoch in range(args.epochs):
    if world_size > 1:
        sampler.set_epoch(epoch)
    tstart = time.perf_counter()
    for i, data in enumerate(trainloader, 0):
        step_annotator = smprof.annotation_begin("step_" + str(i))
        inputs, labels = data
        inputs = inputs.to("cuda", non_blocking=True)
        labels = labels.to("cuda", non_blocking=True)
        optimizer.zero_grad()

        forward_annotator = smprof.annotation_begin("Forward")
        outputs = net(inputs)
        smprof.annotation_end(forward_annotator)

        loss_annotator = smprof.annotation_begin("Loss")
        loss = criterion(outputs, labels)
        smprof.annotation_end(loss_annotator)

        backward_annotator = smprof.annotation_begin("Backward")
        loss.backward()
        smprof.annotation_end(backward_annotator)

        optimizer_annotator = smprof.annotation_begin("Optimizer")
        optimizer.step()
        smprof.annotation_end(optimizer_annotator)

        smprof.annotation_end(step_annotator)

SMProf.stop_profiling()
```
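If you prefer Approach 1's with-block ergonomics but want to build on the begin/end API, the two styles can be bridged with a small context manager. The wrapper below is our own sketch, not part of smprof; with smprof, you would pass smprof.annotation_begin and smprof.annotation_end as the begin and end arguments. A side benefit is that the end call runs even if the annotated code raises an exception.

```python
from contextlib import contextmanager

@contextmanager
def annotation(begin, end, name):
    """Wrap a begin/end-style annotation pair so it can be used as a `with` block."""
    handle = begin(name)
    try:
        yield handle
    finally:
        # Runs even if the annotated block raises, so annotations stay balanced.
        end(handle)

# Demonstration with stand-in begin/end functions that record their calls.
calls = []
with annotation(lambda n: calls.append(("begin", n)) or n,
                lambda h: calls.append(("end", h)),
                "Forward"):
    pass
print(calls)
```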
After annotating and setting up the profiler initiation modules, save the script to submit using a SageMaker training job launcher in the following Step 2. The sample launcher assumes that the training script is named train_with_profiler_demo.py.
Step 2: Create a SageMaker framework estimator and activate SageMaker Profiler
The following procedure shows how to prepare a SageMaker framework estimator for training using the SageMaker Python SDK.
1. Set up a profiler_config object using the ProfilerConfig and Profiler modules as follows.

   ```python
   from sagemaker import ProfilerConfig, Profiler

   profiler_config = ProfilerConfig(
       profile_params = Profiler(cpu_profiling_duration=3600)
   )
   ```

   The following is the description of the Profiler module and its argument.

   - Profiler: The module for activating SageMaker Profiler with the training job.
   - cpu_profiling_duration (int): Specify the time duration in seconds for profiling on CPUs. Default is 3600 seconds.
2. Create a SageMaker framework estimator with the profiler_config object created in the previous step. The following code shows an example of creating a PyTorch estimator. If you want to create a TensorFlow estimator, import sagemaker.tensorflow.TensorFlow instead, and specify one of the TensorFlow versions supported by SageMaker Profiler. For more information about supported frameworks and instance types, see SageMaker framework images pre-installed with SageMaker Profiler.

   ```python
   import sagemaker
   from sagemaker.pytorch import PyTorch

   estimator = PyTorch(
       framework_version="2.0.0",
       role=sagemaker.get_execution_role(),
       entry_point="train_with_profiler_demo.py",  # your training job entry point
       source_dir=source_dir,  # source directory for your training script
       output_path=output_path,
       base_job_name="sagemaker-profiler-demo",
       hyperparameters=hyperparameters,  # if any
       instance_count=1,  # Recommended to test with < 8
       instance_type="ml.p4d.24xlarge",
       profiler_config=profiler_config,
   )
   ```
3. Start the training job by running the fit method. With wait=False, you can silence the training job logs and let it run in the background.

   ```python
   estimator.fit(wait=False)
   ```
While running the training job or after the job has completed, you can go to the next topic at Open the SageMaker Profiler UI application and start exploring and visualizing the saved profiles.
If you want to directly access the profile data saved in the Amazon S3 bucket, use the following script to retrieve the S3 URI.
```python
import os

# This is an ad-hoc function to get the S3 URI
# to where the profile output data is saved
def get_detailed_profiler_output_uri(estimator):
    config_name = None
    for processing in estimator.profiler_rule_configs:
        params = processing.get("RuleParameters", dict())
        rule = params.get("rule_to_invoke", "")
        if rule == "DetailedProfilerProcessing":
            config_name = processing.get("RuleConfigurationName")
            break
    return os.path.join(
        estimator.output_path,
        estimator.latest_training_job.name,
        "rule-output",
        config_name,
    )

print("Profiler output S3 bucket:", get_detailed_profiler_output_uri(estimator))
```
(Optional) Install the SageMaker Profiler Python package
To use SageMaker Profiler on PyTorch or TensorFlow framework images not listed in SageMaker framework images pre-installed with SageMaker Profiler, or on your own custom Docker container for training, you can install SageMaker Profiler by using one of the SageMaker Profiler Python package binary files.
Option 1: Install the SageMaker Profiler package while launching a training job
If you want to use SageMaker Profiler for training jobs using PyTorch or TensorFlow images not listed in SageMaker framework images pre-installed with SageMaker Profiler, create a requirements.txt file and locate it under the path you specify to the source_dir parameter of the SageMaker framework estimator in Step 2. For more information about setting up a requirements.txt file in general, see Using third-party libraries. In the requirements.txt file, add one of the S3 bucket paths for the SageMaker Profiler Python package binary files.

```
# requirements.txt
https://smppy.s3.amazonaws.com/tensorflow/cu112/smprof-0.3.332-cp39-cp39-linux_x86_64.whl
```
Option 2: Install the SageMaker Profiler package in your custom Docker containers
If you use a custom Docker container for training, add one of the SageMaker Profiler Python package binary files to your Dockerfile.
```dockerfile
# Install the smprof package version compatible with your CUDA version
RUN pip install https://smppy.s3.amazonaws.com/tensorflow/cu112/smprof-0.3.332-cp39-cp39-linux_x86_64.whl
```
For guidance on running a custom Docker container for training on SageMaker in general, see Adapting your own training container.
Open the SageMaker Profiler UI application
You can access the SageMaker Profiler UI application through the following options.
Option 1: Launch the SageMaker Profiler UI from the domain details page
If you have access to the SageMaker console, you can take this option.
Navigate to the domain details page
The following procedure shows how to navigate to the domain details page.
1. Open the Amazon SageMaker console at https://console.aws.amazon.com/sagemaker/.

2. On the left navigation pane, choose Domains.

3. From the list of domains, select the domain in which you want to launch the SageMaker Profiler application.
Launch the SageMaker Profiler UI application
The following procedure shows how to launch the SageMaker Profiler application that is scoped to a user profile.
1. On the domain details page, choose the User profiles tab.

2. Identify the user profile for which you want to launch the SageMaker Profiler UI application.

3. Choose Launch for the selected user profile, and choose Profiler.
Option 2: Launch the SageMaker Profiler UI application from the SageMaker Profiler landing page in the SageMaker console
The following procedure describes how to launch the SageMaker Profiler UI application from the SageMaker Profiler landing page in the SageMaker console. If you have access to the SageMaker console, you can take this option.
1. Open the Amazon SageMaker console at https://console.aws.amazon.com/sagemaker/.

2. On the left navigation pane, choose Profiler.

3. Under Get started, select the domain in which you want to launch the Studio Classic application. If your user profile only belongs to one domain, you do not see the option for selecting a domain.

4. Select the user profile for which you want to launch the SageMaker Profiler UI application. If there is no user profile in the domain, choose Create user profile. For more information about creating a new user profile, see Add and Remove User Profiles.

5. Choose Open Profiler.
Option 3: Use the application launcher function in the SageMaker Python SDK
If you are a SageMaker domain user and have access only to SageMaker Studio, you can access the SageMaker Profiler UI application through SageMaker Studio Classic by running the sagemaker.interactive_apps.detail_profiler_app.DetailProfilerApp function.
Note that SageMaker Studio Classic is the previous Studio UI experience before re:Invent 2023, and was migrated as an application into the newly designed Studio UI at re:Invent 2023. The SageMaker Profiler UI application is available at the SageMaker domain level, and thus requires your domain ID and user profile name. Currently, the DetailProfilerApp function only works within the SageMaker Studio Classic application; the function properly takes in the domain and user profile information from SageMaker Studio Classic.
For domains, domain users, and Studio created before re:Invent 2023, Studio Classic is the default experience unless you have updated it following the instructions at Migrating from Amazon SageMaker Studio Classic. If this is your case, there's no further action needed, and you can directly launch the SageMaker Profiler UI application by running the DetailProfilerApp function.
If you created a new domain and Studio after re:Invent 2023, launch the Studio Classic application within the Studio UI and then run the DetailProfilerApp function to launch the SageMaker Profiler UI application.
Note that the DetailProfilerApp function doesn't work in other SageMaker machine learning IDEs, such as the SageMaker Studio JupyterLab application, the SageMaker Studio Code Editor application, and SageMaker Notebook instances. If you run the DetailProfilerApp function in those IDEs, it returns a URL to the Profiler landing page in the SageMaker console, instead of a direct link to open the Profiler UI application.
Explore the profile output data visualized in the SageMaker Profiler UI
This section walks through the SageMaker Profiler UI and provides tips for how to use and gain insights from it.
Load profile
When you open the SageMaker Profiler UI, the Load profile page opens up. To load and generate the Dashboard and Timeline, go through the following procedure.
To load the profile of a training job
1. From the List of training jobs section, use the check box to choose the training job for which you want to load the profile.

2. Choose Load. The job name should appear in the Loaded profile section at the top.

3. Choose the radio button on the left of the Job name to generate the Dashboard and Timeline. Note that when you choose the radio button, the UI automatically opens the Dashboard. Note also that if you generate the visualizations while the job status and loading status still appear to be in progress, the SageMaker Profiler UI generates Dashboard plots and a Timeline up to the most recent profile data collected from the ongoing training job or the partially loaded profile data.
Tip
You can load and visualize one profile at a time. To load another profile, you must first unload the previously loaded profile. To unload a profile, use the trash bin icon on the right end of the profile in the Loaded profile section.
Dashboard
After you finish loading and selecting the training job, the UI opens the Dashboard page furnished with the following panels by default.
- GPU active time – This pie chart shows the percentage of GPU active time versus GPU idle time. You can check if your GPUs are more active than idle throughout the entire training job. GPU active time is based on the profiled data points with a utilization rate greater than 0%, whereas GPU idle time is the profiled data points with 0% utilization.

- GPU utilization over time – This timeline graph shows the average GPU utilization rate over time per node, aggregating all of the nodes in a single chart. You can check if the GPUs have an unbalanced workload, under-utilization issues, bottlenecks, or idle issues during certain time intervals. To track the utilization rate at the individual GPU level and related kernel runs, use the Timeline interface. Note that the GPU activity collection starts from where you added the profiler starter function SMProf.start_profiling() in your training script, and stops at SMProf.stop_profiling().

- CPU active time – This pie chart shows the percentage of CPU active time versus CPU idle time. You can check if your CPUs are more active than idle throughout the entire training job. CPU active time is based on the profiled data points with a utilization rate greater than 0%, whereas CPU idle time is the profiled data points with 0% utilization.

- CPU utilization over time – This timeline graph shows the average CPU utilization rate over time per node, aggregating all of the nodes in a single chart. You can check if the CPUs are bottlenecked or underutilized during certain time intervals. To track the utilization rate of the CPUs aligned with the individual GPU utilization and kernel runs, use the Timeline interface. Note that the utilization metrics start from the job initialization.

- Time spent by all GPU kernels – This pie chart shows all GPU kernels operated throughout the training job. It shows the top 15 GPU kernels by default as individual sectors and all other kernels in one sector. Hover over the sectors to see more detailed information. The value shows the total time of the GPU kernels operated in seconds, and the percentage is based on the entire time of the profile.

- Time spent by top 15 GPU kernels – This pie chart shows all GPU kernels operated throughout the training job. It shows the top 15 GPU kernels as individual sectors. Hover over the sectors to see more detailed information. The value shows the total time of the GPU kernels operated in seconds, and the percentage is based on the entire time of the profile.

- Launch counts of all GPU kernels – This pie chart shows the number of counts for every GPU kernel launched throughout the training job. It shows the top 15 GPU kernels as individual sectors and all other kernels in one sector. Hover over the sectors to see more detailed information. The value shows the total count of the launched GPU kernels, and the percentage is based on the entire count of all kernels.

- Launch counts of top 15 GPU kernels – This pie chart shows the number of counts of every GPU kernel launched throughout the training job. It shows the top 15 GPU kernels. Hover over the sectors to see more detailed information. The value shows the total count of the launched GPU kernels, and the percentage is based on the entire count of all kernels.

- Step time distribution – This histogram shows the distribution of step durations on GPUs. This plot is generated only after you add the step annotator in your training script.

- Kernel precision distribution – This pie chart shows the percentage of time spent on running kernels in different data types such as FP32, FP16, INT32, and INT8.

- GPU activity distribution – This pie chart shows the percentage of time spent on GPU activities, such as running kernels, memory (memcpy and memset), and synchronization (sync).

- GPU memory operations distribution – This pie chart shows the percentage of time spent on GPU memory operations. This visualizes the memcopy activities and helps identify if your training job is spending excessive time on certain memory operations.

- Create a new histogram – Create a new diagram of a custom metric you annotated manually during Step 1: Adapt your training script using the SageMaker Profiler Python modules. When adding a custom annotation to a new histogram, select or type the name of the annotation you added in the training script. For example, in the demo training script in Step 1, step, Forward, Backward, Optimizer, and Loss are the custom annotations. While creating a new histogram, these annotation names should appear in the drop-down menu for metric selection. If you choose Backward, the UI adds the histogram of the time spent on backward passes throughout the profiled time to the Dashboard. This type of histogram is useful for checking if there are outliers taking an abnormally long time and causing bottlenecks.
The following screenshots show the GPU and CPU active time ratio and the average GPU and CPU utilization rate with respect to time per compute node.
The following screenshot shows an example of pie charts for comparing how many times the GPU kernels are launched and measuring the time spent on running them. In the Time spent by all GPU kernels and Launch counts of all GPU kernels panels, you can also specify an integer in the input field for k to adjust the number of items shown in the plots' legends. For example, if you specify 10, the plots show the top 10 most-run and most-launched kernels, respectively.
The following screenshot shows an example of the step time duration histogram, and pie charts for the kernel precision distribution, GPU activity distribution, and GPU memory operation distribution.
Timeline interface
To gain a detailed view into the compute resources at the level of operations and kernels scheduled on the CPUs and run on the GPUs, use the Timeline interface.
You can zoom in and out and pan left or right in the Timeline interface using your mouse, the [w, a, s, d] keys, or the four arrow keys on the keyboard.
Tip
For more tips on the keyboard shortcuts to interact with the Timeline interface, choose Keyboard shortcuts in the left pane.
The timeline tracks are organized in a tree structure, giving you information from the host level to the device level. For example, if you run N instances with eight GPUs in each, the timeline structure of each instance would be as follows.
- algo-inode – This is what SageMaker tags to assign jobs to provisioned instances. The digit inode is randomly assigned. For example, if you use 4 instances, this section expands from algo-1 to algo-4.

  - CPU – In this section, you can check the average CPU utilization rate and performance counters.

  - GPUs – In this section, you can check the average GPU utilization rate, individual GPU utilization rate, and kernels.

  - SUM Utilization – The average GPU utilization rates per instance.

  - HOST-0 PID-123 – A unique name assigned to each process track. The acronym PID is the process ID, and the number appended to it is the process ID number that's recorded during data capture from the process. This section shows the following information from the process.

    - GPU-inum_gpu utilization – The utilization rate of the inum_gpu-th GPU over time.

    - GPU-inum_gpu device – The kernel runs on the inum_gpu-th GPU device.

      - stream icuda_stream – CUDA streams showing kernel runs on the GPU device. To learn more about CUDA streams, see the slides in PDF at CUDA C/C++ Streams and Concurrency provided by NVIDIA.

    - GPU-inum_gpu host – The kernel launches on the inum_gpu-th GPU host.
-
The following several screenshots show the Timeline of the profile of a training job run on ml.p4d.24xlarge instances, each equipped with 8 NVIDIA A100 Tensor Core GPUs.

The following is a zoomed-out view of the profile, showing a dozen steps including an intermittent data loader between step_232 and step_233 for fetching the next data batch.
For each CPU, you can track the CPU utilization and performance counters, such as "clk_unhalted_ref.tsc" and "itlb_misses.miss_causes_a_walk", which are indicative of instructions run on the CPU.
For each GPU, you can see a host timeline and a device timeline. Kernel launches are on the host timeline and kernel runs are on the device timeline. You can also see annotations (such as forward, backward, and optimize) in the GPU host timeline, if you have added them in your training script.
In the timeline view, you can also track kernel launch-and-run pairs. This helps you understand how a kernel launch scheduled on a host (CPU) is run on the corresponding GPU device.
Tip
Press the f key to zoom in on the selected kernel.
The following screenshot is a zoomed-in view into step_233 and step_234 from the previous screenshot. The timeline interval selected in the following screenshot is the AllReduce operation, an essential communication and synchronization step in distributed training, run on the GPU-0 device. In the screenshot, note that the kernel launch in the GPU-0 host connects to the kernel run in the GPU-0 device stream 1, indicated with the arrow in cyan color.
Also, two information tabs appear in the bottom pane of the UI when you select a timeline interval, as shown in the previous screenshot. The Current Selection tab shows the details of the selected kernel and the connected kernel launch from the host. The connection direction is always from host (CPU) to device (GPU), since each GPU kernel is always called from a CPU. The Connections tab shows the chosen kernel launch-and-run pair. You can select either of them to move it to the center of the Timeline view.
The following screenshot zooms in further on the AllReduce operation launch-and-run pair.
Information
In Information, you can access information about the loaded training job, such as the instance type, Amazon Resource Names (ARNs) of compute resources provisioned for the job, node names, and hyperparameters.
Settings
The SageMaker Profiler UI application instance is configured to shut down after 2 hours of idle time by default. In Settings, use the following settings to adjust the auto shutdown timer.
-
Enable app auto shutdown – Choose and set to Enabled to let the application automatically shut down after the specified number of hours of idle time. To turn off the auto-shutdown functionality, choose Disabled.
-
Auto shutdown threshold in hours – If you choose Enabled for Enable app auto shutdown, you can set the threshold time in hours for the application to shut down automatically. This is set to 2 by default.
Frequently asked questions about using SageMaker Profiler
Use the following frequently asked questions to find answers about using SageMaker Profiler.
Q. I’m getting an error message, ModuleNotFoundError: No module named 'smppy'
Since December 2023, the name of the SageMaker Profiler Python package has changed from smppy to smprof to resolve a duplicate package name issue; smppy is already used by an open source package.

Therefore, if you have been using smppy since before December 2023 and are experiencing this ModuleNotFoundError issue, it might be due to the outdated package name in your training script while having the latest smprof package installed or using one of the latest SageMaker framework images pre-installed with SageMaker Profiler. In this case, make sure that you replace all mentions of smppy with smprof throughout your training script.
While updating the SageMaker Profiler Python package name in your training scripts, to avoid confusion around which version of the package name you should use, consider using a conditional import statement as shown in the following code snippet.
```python
try:
    import smprof
except ImportError:
    # backward-compatibility for TF 2.11 and PT 1.13.1 images
    import smppy as smprof
```
Also note that if you have been using smppy while upgrading to the latest PyTorch or TensorFlow versions, make sure that you install the latest smprof package by following the instructions at (Optional) Install the SageMaker Profiler Python package.
Q. I’m getting an error message, ModuleNotFoundError: No module named 'smprof'
First, make sure that you are using one of the officially supported SageMaker Framework Containers. If you are not, you can install the smprof package by following the instructions at (Optional) Install the SageMaker Profiler Python package.
Q. I’m not able to import ProfilerConfig
If you are unable to import ProfilerConfig in your job launcher script using the SageMaker Python SDK, your local environment or the Jupyter kernel might have a significantly outdated version of the SageMaker Python SDK. Make sure that you upgrade the SDK to the latest version.
```shell
$ pip install --upgrade sagemaker
```
Q. I’m getting an error message, aborted: core dumped, when importing smprof into my training script
In an earlier version of smprof, this issue occurs with PyTorch 2.0+ and PyTorch Lightning. To resolve this issue, install the latest smprof package by following the instructions at (Optional) Install the SageMaker Profiler Python package.
Q. I cannot find the SageMaker Profiler UI from SageMaker Studio. How can I find it?
If you have access to the SageMaker console, choose one of the following options.
If you are a domain user and don't have access to the SageMaker console, you can access the application through SageMaker Studio Classic. If this is your case, choose the following option.
Considerations
Consider the following when using SageMaker Profiler.
- SageMaker Profiler is not compatible with SageMaker managed warm pools.