Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅
中国的 Amazon Web Services 服务入门
(PDF)。
本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。
使用 SageMaker Python SDK Debugger 启动训练作业
要使用 SageMaker Debugger 配置 SageMaker AI 估算器,请使用 Amazon SageMaker Python SDK 并指定 Debugger 特定的参数。要充分利用调试功能,需要配置三个参数:debugger_hook_config、tensorboard_output_config 和 rules。
使用 Debugger 特定的参数构造 SageMaker AI 估算器
此部分中的代码示例显示如何使用 Debugger 特定的参数构造 SageMaker AI 估算器。
以下代码示例是用于构造 SageMaker AI 框架估算器的模板,不能直接执行。您需要继续完成下一个部分中的内容,配置 Debugger 特定的参数。
- PyTorch
-
# An example of constructing a SageMaker AI PyTorch estimator
import boto3
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs
session=boto3.session.Session()
region=session.region_name
debugger_hook_config=DebuggerHookConfig(...)
rules=[
Rule.sagemaker(rule_configs.built_in_rule())
]
estimator=PyTorch(
entry_point="directory/to/your_training_script.py",
role=sagemaker.get_execution_role(),
base_job_name="debugger-demo",
instance_count=1,
instance_type="ml.p3.2xlarge",
framework_version="1.12.0",
py_version="py37",
# Debugger-specific parameters
debugger_hook_config=debugger_hook_config,
rules=rules
)
estimator.fit(wait=False)
- TensorFlow
-
# An example of constructing a SageMaker AI TensorFlow estimator
import boto3
import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs
session=boto3.session.Session()
region=session.region_name
debugger_hook_config=DebuggerHookConfig(...)
rules=[
Rule.sagemaker(rule_configs.built_in_rule()),
ProfilerRule.sagemaker(rule_configs.BuiltInRule())
]
estimator=TensorFlow(
entry_point="directory/to/your_training_script.py",
role=sagemaker.get_execution_role(),
base_job_name="debugger-demo",
instance_count=1,
instance_type="ml.p3.2xlarge",
framework_version="2.9.0",
py_version="py39",
# Debugger-specific parameters
debugger_hook_config=debugger_hook_config,
rules=rules
)
estimator.fit(wait=False)
- MXNet
-
# An example of constructing a SageMaker AI MXNet estimator
import sagemaker
from sagemaker.mxnet import MXNet
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs
debugger_hook_config=DebuggerHookConfig(...)
rules=[
Rule.sagemaker(rule_configs.built_in_rule())
]
estimator=MXNet(
entry_point="directory/to/your_training_script.py",
role=sagemaker.get_execution_role(),
base_job_name="debugger-demo",
instance_count=1,
instance_type="ml.p3.2xlarge",
framework_version="1.7.0",
py_version="py37",
# Debugger-specific parameters
debugger_hook_config=debugger_hook_config,
rules=rules
)
estimator.fit(wait=False)
- XGBoost
-
# An example of constructing a SageMaker AI XGBoost estimator
import sagemaker
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs
debugger_hook_config=DebuggerHookConfig(...)
rules=[
Rule.sagemaker(rule_configs.built_in_rule())
]
estimator=XGBoost(
entry_point="directory/to/your_training_script.py",
role=sagemaker.get_execution_role(),
base_job_name="debugger-demo",
instance_count=1,
instance_type="ml.p3.2xlarge",
framework_version="1.5-1",
# Debugger-specific parameters
debugger_hook_config=debugger_hook_config,
rules=rules
)
estimator.fit(wait=False)
- Generic estimator
-
# An example of constructing a SageMaker AI generic estimator using the XGBoost algorithm base image
import boto3
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker import image_uris
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs
debugger_hook_config=DebuggerHookConfig(...)
rules=[
Rule.sagemaker(rule_configs.built_in_rule())
]
region=boto3.Session().region_name
xgboost_container=sagemaker.image_uris.retrieve("xgboost", region, "1.5-1")
estimator=Estimator(
role=sagemaker.get_execution_role()
image_uri=xgboost_container,
base_job_name="debugger-demo",
instance_count=1,
instance_type="ml.m5.2xlarge",
# Debugger-specific parameters
debugger_hook_config=debugger_hook_config,
rules=rules
)
estimator.fit(wait=False)
配置以下参数以激活 SageMaker Debugger:
SageMaker Debugger 将输出张量安全地保存在 S3 存储桶的子文件夹中。例如,账户中默认 S3 存储桶 URI 的格式为 s3://amzn-s3-demo-bucket-sagemaker-<region>-<12digit_account_id>/<base-job-name>/<debugger-subfolders>/。SageMaker Debugger 创建两个子文件夹:debug-output 和 rule-output。如果您添加 tensorboard_output_config 参数,则还会找到 tensorboard-output 文件夹。
请参阅以下主题,查找更多详细说明如何配置 Debugger 特定参数的示例。