Launch training jobs with Debugger using the SageMaker Python SDK
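To configure a SageMaker estimator with SageMaker Debugger, use the Amazon SageMaker Python SDK and specify the Debugger-specific parameters. To make full use of the debugging functionality, configure three parameters: debugger_hook_config, tensorboard_output_config, and rules.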
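Construct a SageMaker estimator with Debugger-specific parameters
The code examples in this section show how to construct a SageMaker estimator with the Debugger-specific parameters.
The following code examples are templates for constructing SageMaker framework estimators and cannot be run as written: the elided DebuggerHookConfig(...) arguments and the built-in rule names are placeholders. Continue to the next sections to configure the Debugger-specific parameters.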
- PyTorch
    # An example of constructing a SageMaker PyTorch estimator
    import boto3
    import sagemaker
    from sagemaker.pytorch import PyTorch
    from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

    session = boto3.session.Session()
    region = session.region_name

    debugger_hook_config = DebuggerHookConfig(...)
    rules = [
        Rule.sagemaker(rule_configs.built_in_rule())
    ]

    estimator = PyTorch(
        entry_point="directory/to/your_training_script.py",
        role=sagemaker.get_execution_role(),
        base_job_name="debugger-demo",
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        framework_version="1.12.0",
        py_version="py37",

        # Debugger-specific parameters
        debugger_hook_config=debugger_hook_config,
        rules=rules
    )

    estimator.fit(wait=False)
- TensorFlow
    # An example of constructing a SageMaker TensorFlow estimator
    import boto3
    import sagemaker
    from sagemaker.tensorflow import TensorFlow
    # ProfilerRule is imported because the rules list below uses it
    from sagemaker.debugger import (
        CollectionConfig, DebuggerHookConfig, ProfilerRule, Rule, rule_configs
    )

    session = boto3.session.Session()
    region = session.region_name

    debugger_hook_config = DebuggerHookConfig(...)
    rules = [
        Rule.sagemaker(rule_configs.built_in_rule()),
        ProfilerRule.sagemaker(rule_configs.BuiltInRule())
    ]

    estimator = TensorFlow(
        entry_point="directory/to/your_training_script.py",
        role=sagemaker.get_execution_role(),
        base_job_name="debugger-demo",
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        framework_version="2.9.0",
        py_version="py39",

        # Debugger-specific parameters
        debugger_hook_config=debugger_hook_config,
        rules=rules
    )

    estimator.fit(wait=False)
- MXNet
    # An example of constructing a SageMaker MXNet estimator
    import sagemaker
    from sagemaker.mxnet import MXNet
    from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

    debugger_hook_config = DebuggerHookConfig(...)
    rules = [
        Rule.sagemaker(rule_configs.built_in_rule())
    ]

    estimator = MXNet(
        entry_point="directory/to/your_training_script.py",
        role=sagemaker.get_execution_role(),
        base_job_name="debugger-demo",
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        framework_version="1.7.0",
        py_version="py37",

        # Debugger-specific parameters
        debugger_hook_config=debugger_hook_config,
        rules=rules
    )

    estimator.fit(wait=False)
- XGBoost
    # An example of constructing a SageMaker XGBoost estimator
    import sagemaker
    from sagemaker.xgboost.estimator import XGBoost
    from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

    debugger_hook_config = DebuggerHookConfig(...)
    rules = [
        Rule.sagemaker(rule_configs.built_in_rule())
    ]

    estimator = XGBoost(
        entry_point="directory/to/your_training_script.py",
        role=sagemaker.get_execution_role(),
        base_job_name="debugger-demo",
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        framework_version="1.5-1",

        # Debugger-specific parameters
        debugger_hook_config=debugger_hook_config,
        rules=rules
    )

    estimator.fit(wait=False)
- Generic estimator
    # An example of constructing a SageMaker generic estimator using the XGBoost algorithm base image
    import boto3
    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker import image_uris
    from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

    debugger_hook_config = DebuggerHookConfig(...)
    rules = [
        Rule.sagemaker(rule_configs.built_in_rule())
    ]

    region = boto3.Session().region_name
    xgboost_container = image_uris.retrieve("xgboost", region, "1.5-1")

    estimator = Estimator(
        role=sagemaker.get_execution_role(),
        image_uri=xgboost_container,
        base_job_name="debugger-demo",
        instance_count=1,
        instance_type="ml.m5.2xlarge",

        # Debugger-specific parameters
        debugger_hook_config=debugger_hook_config,
        rules=rules
    )

    estimator.fit(wait=False)
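To activate SageMaker Debugger, configure the Debugger-specific parameters shown in the templates above: debugger_hook_config and rules.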
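SageMaker Debugger securely saves output tensors in subfolders of your S3 bucket. For example, the default S3 bucket URI in your account has the format s3://sagemaker-<region>-<12digit_account_id>/<base-job-name>/<debugger-subfolders>/. SageMaker Debugger creates two subfolders: debug-output and rule-output. If you add the tensorboard_output_config parameter, you will also find a tensorboard-output folder.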
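To make the folder layout concrete, here is a small sketch that assembles the expected output locations from the URI format described above. The helper name debugger_output_uris and the sample region, account ID, and job name are hypothetical; the function only formats strings and does not call AWS or check that the folders exist.

```python
def debugger_output_uris(region, account_id, base_job_name, with_tensorboard=False):
    """Return the expected Debugger output URIs under the default bucket.

    Follows the documented pattern
    s3://sagemaker-<region>-<12digit_account_id>/<base-job-name>/<debugger-subfolders>/
    """
    root = f"s3://sagemaker-{region}-{account_id}/{base_job_name}"
    subfolders = ["debug-output", "rule-output"]
    if with_tensorboard:
        # Present only when tensorboard_output_config is passed to the estimator.
        subfolders.append("tensorboard-output")
    return {name: f"{root}/{name}/" for name in subfolders}


# Sample values are placeholders, not a real account.
uris = debugger_output_uris(
    "us-east-1", "111122223333", "debugger-demo", with_tensorboard=True
)
for name, uri in uris.items():
    print(f"{name}: {uri}")
```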
See the following topics for more detailed examples of how to configure the Debugger-specific parameters.