Supported Frameworks and Algorithms
The following table shows SageMaker machine learning frameworks and algorithms supported by Debugger.
SageMaker-supported frameworks and algorithms | Monitoring system bottlenecks | Profiling deep learning framework operations | Debugging output tensors |
---|---|---|---|
TensorFlow | | Amazon TensorFlow deep learning containers | Amazon TensorFlow deep learning containers |
PyTorch | | Amazon PyTorch deep learning containers | Amazon PyTorch deep learning containers |
MXNet | | - | Amazon MXNet deep learning containers |
XGBoost | 1.0-1, 1.2-1, 1.3-1 | - | 1.0-1, 1.2-1, 1.3-1 |
| | SageMaker built-in algorithms using image URIs; custom training containers (with the Amazon deep learning container images, public Docker images, or your own Docker images) | - | Custom training containers (available for TensorFlow, PyTorch, MXNet, and XGBoost with manual hook registration) |
- Monitoring system bottlenecks – Monitor the system utilization rate for resources such as CPU, GPU, memory, network, and data I/O. This feature is framework- and model-agnostic and is available for any training job in SageMaker.
- Profiling deep learning framework operations – Profile the deep learning operations of the TensorFlow and PyTorch frameworks, such as step durations, data loaders, forward and backward operations, Python profiling metrics, and framework-specific metrics.
Warning
SageMaker Debugger deprecates the framework profiling feature starting from TensorFlow 2.11 and PyTorch 2.0. You can still use the feature with the following earlier versions of the frameworks and SDK:
- SageMaker Python SDK <= v2.130.0
- PyTorch >= v1.6.0, < v2.0
- TensorFlow >= v2.3.1, < v2.11
See also Amazon SageMaker Debugger Release Notes: March 16, 2023.
- Debugging output tensors – Track and debug model parameters, such as weights, gradients, biases, and scalar values of your training job. Supported frameworks are Apache MXNet, TensorFlow, PyTorch, and XGBoost.
Important
For the TensorFlow framework with Keras, SageMaker Debugger deprecates the zero code change support for debugging models built using the tf.keras modules of TensorFlow 2.6 and later. This is due to breaking changes announced in the TensorFlow 2.6.0 release note. For instructions on how to update your training script, see Adapt Your TensorFlow Training Script.
Important
Starting with PyTorch v1.12.0, SageMaker Debugger deprecates the zero code change support for debugging models. This is due to breaking changes that cause SageMaker Debugger to interfere with the torch.jit functionality. For instructions on how to update your training script, see Adapt Your PyTorch Training Script.
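The version limits stated in the notes above can be captured in a small helper. The following is a minimal sketch in plain Python; it is not part of the SageMaker Python SDK or the sagemaker-debugger library. The function names are hypothetical, and the sketch assumes that versions are plain dotted numbers such as 2.10.1 (no local suffixes such as +cu117).

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.10.1', 'v1.12.0', or '1.12' into a tuple padded to three parts."""
    parts = [int(part) for part in text.strip().lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))  # '1.12' is treated as 1.12.0
    return tuple(parts)


# Ranges quoted in the framework profiling deprecation warning above.
PROFILING_RANGES = {
    "pytorch": ((1, 6, 0), (2, 0, 0)),      # >= v1.6.0, < v2.0
    "tensorflow": ((2, 3, 1), (2, 11, 0)),  # >= v2.3.1, < v2.11
}
MAX_SDK_FOR_PROFILING = (2, 130, 0)         # SageMaker Python SDK <= v2.130.0


def framework_profiling_available(framework: str, framework_version: str,
                                  sdk_version: str) -> bool:
    """Return True if the version combination falls inside the quoted ranges."""
    bounds = PROFILING_RANGES.get(framework.lower())
    if bounds is None:
        return False  # framework profiling is described only for TensorFlow and PyTorch
    low, high = bounds
    fw_ok = low <= parse_version(framework_version) < high
    sdk_ok = parse_version(sdk_version) <= MAX_SDK_FOR_PROFILING
    return fw_ok and sdk_ok


def needs_manual_hook_registration(framework: str, framework_version: str,
                                   uses_keras: bool = False) -> bool:
    """Return True where the notes above say zero code change support is deprecated.

    Only the two cases named in the notes are encoded; other frameworks return False.
    """
    version = parse_version(framework_version)
    name = framework.lower()
    if name == "pytorch":
        return version >= (1, 12, 0)
    if name == "tensorflow":
        return uses_keras and version >= (2, 6, 0)
    return False


print(framework_profiling_available("tensorflow", "2.10.1", "2.130.0"))
print(needs_manual_hook_registration("pytorch", "1.12.0"))
```

A check like this could run before a training job is submitted, so that an unsupported combination is caught early rather than discovered from missing profiling or tensor output.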
If the framework or algorithm that you want to train and debug is not listed in the table, go to the Amazon Discussion Forum.
Amazon Web Services Regions
Amazon SageMaker Debugger is available in all Regions where Amazon SageMaker is in service except the following Region:
- Asia Pacific (Jakarta): ap-southeast-3
To find out whether Amazon SageMaker is in service in your Amazon Web Services Region, see Amazon Regional Services.
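The availability rule above amounts to a simple filter. The sketch below is illustrative only: the function name is hypothetical, and the list of Regions where SageMaker is in service is assumed to be supplied by the caller (for example, from boto3's Session.get_available_regions("sagemaker"), which is not called here).

```python
# Regions where SageMaker is in service but Debugger is not, per the text above.
DEBUGGER_UNAVAILABLE_REGIONS = {"ap-southeast-3"}  # Asia Pacific (Jakarta)


def debugger_regions(sagemaker_regions):
    """Filter a caller-supplied list of SageMaker Regions down to Debugger Regions."""
    return sorted(
        region for region in sagemaker_regions
        if region not in DEBUGGER_UNAVAILABLE_REGIONS
    )


# The input list here is a made-up sample, not an authoritative Region list.
sample = ["us-east-1", "ap-southeast-3", "eu-west-1"]
print(debugger_regions(sample))
```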
Use Debugger with Custom Training Containers
Bring your training containers to SageMaker and gain insights into your training jobs using Debugger. Maximize your work efficiency by optimizing your model on Amazon EC2 instances using the monitoring and debugging features.
For more information about how to build your training container with the sagemaker-debugger client library, push it to the Amazon Elastic Container Registry (Amazon ECR), and monitor and debug your training jobs, see Use Debugger with Custom Training Containers.
Debugger Open-Source GitHub Repositories
Debugger APIs are provided through the SageMaker Python SDK and are designed to construct Debugger hook and rule configurations for the SageMaker CreateTrainingJob and DescribeTrainingJob API operations. The sagemaker-debugger client library provides tools to register hooks and to access the training data through its trial feature, all through its API operations. It supports the machine learning frameworks TensorFlow, PyTorch, MXNet, and XGBoost on Python 3.6 and later.
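To make the relationship with CreateTrainingJob more concrete, the following sketch builds the Debugger-related portion of a request as a plain Python dictionary. It is a hedged illustration, not the SDK's own implementation: the helper name, the S3 bucket, and the chosen collections and intervals are assumptions, while the field names (DebugHookConfig, CollectionConfigurations, ProfilerConfig, and so on) follow the CreateTrainingJob request syntax. In practice, the SageMaker Python SDK constructs these structures for you.

```python
import json


def build_debugger_fields(s3_output_path, collections, monitoring_interval_ms=500):
    """Build the Debugger-related fields of a CreateTrainingJob request.

    `collections` maps a tensor collection name to its parameters; the API
    expects parameter values as strings, so they are converted here.
    """
    collection_configs = [
        {
            "CollectionName": name,
            "CollectionParameters": {key: str(value) for key, value in params.items()},
        }
        for name, params in collections.items()
    ]
    return {
        # Output tensor debugging: where to save tensors and which collections to save.
        "DebugHookConfig": {
            "S3OutputPath": s3_output_path,
            "CollectionConfigurations": collection_configs,
        },
        # System monitoring: how often to sample resource utilization.
        "ProfilerConfig": {
            "S3OutputPath": s3_output_path,
            "ProfilingIntervalInMilliseconds": monitoring_interval_ms,
        },
    }


fields = build_debugger_fields(
    "s3://amzn-s3-demo-bucket/debugger-output",  # hypothetical bucket
    {"gradients": {"save_interval": 100}, "losses": {"save_interval": 10}},
)
print(json.dumps(fields, indent=2))
```

These fields would be merged with the rest of a training job request (algorithm specification, role, input and output configuration, and so on), which this sketch leaves out.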
For direct resources about the Debugger and sagemaker-debugger API operations, see the following links:
If you use the SDK for Java to conduct SageMaker training jobs and want to configure Debugger APIs, see the following references: