SageMaker HyperPod cluster management - Amazon SageMaker

SageMaker HyperPod cluster management

The following topics discuss logging and managing SageMaker HyperPod clusters.

Logging SageMaker HyperPod events

All events and logs from SageMaker HyperPod are saved to Amazon CloudWatch under the log group name /aws/sagemaker/Clusters/[ClusterName]/[ClusterID]. Every call to the CreateCluster API creates a new log group. The following list contains all of the available log streams collected in each log group.

Log group name: /aws/sagemaker/Clusters/[ClusterName]/[ClusterID]
Log stream name: LifecycleConfig/[instance-group-name]/[instance-id]
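As a minimal sketch of how the naming pattern above might be used, the following shell functions assemble the log group name and list the LifecycleConfig streams inside it. The function names hyperpod_log_group and list_lifecycle_streams and the sample cluster name and ID are hypothetical. The sketch assumes the Amazon CLI is installed and configured with permission to read CloudWatch Logs.

```shell
# Build the HyperPod log group name from a cluster name ($1) and cluster ID ($2).
# Both values are placeholders that you supply.
hyperpod_log_group() {
  printf '/aws/sagemaker/Clusters/%s/%s\n' "$1" "$2"
}

# List the LifecycleConfig log streams in that log group.
# Defined but not invoked here, because calling it requires credentials.
list_lifecycle_streams() {
  aws logs describe-log-streams \
    --log-group-name "$(hyperpod_log_group "$1" "$2")" \
    --log-stream-name-prefix "LifecycleConfig/"
}

# Example with hypothetical values; prints the assembled log group name.
hyperpod_log_group my-cluster abc123example
```

To list the streams for a real cluster, call list_lifecycle_streams with your own cluster name and cluster ID.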

Logging SageMaker HyperPod at instance level

You can access the LifecycleScript logs that are published to CloudWatch during cluster instance configuration. Each instance in the cluster generates its own log stream, named in the format LifecycleConfig/[instance-group-name]/[instance-id].

All logs written to /var/log/provision/provisioning.log are uploaded to the preceding CloudWatch stream. The sample LifecycleScripts at 1.architectures/5.sagemaker_hyperpods/LifecycleScripts/base-config redirect their stdout and stderr to this location. If you use your own custom scripts, write their logs to /var/log/provision/provisioning.log so that they are available in CloudWatch.
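One way a custom script might follow this convention is sketched below. The helper name run_logged is hypothetical, and the sample scripts referenced above may do this differently. The helper appends both stdout and stderr of a command to a log file that you pass in.

```shell
# run_logged LOG_FILE COMMAND [ARGS...]
# Runs COMMAND and appends its stdout and stderr to LOG_FILE,
# creating the parent directory first if it does not exist.
# run_logged is a hypothetical helper, not part of SageMaker HyperPod.
run_logged() {
  log_file="$1"
  shift
  mkdir -p "$(dirname "$log_file")"
  "$@" >>"$log_file" 2>&1
}

# Intended use inside a lifecycle script on a cluster instance
# (my_setup.sh is a placeholder for your own provisioning step):
# run_logged /var/log/provision/provisioning.log ./my_setup.sh
```

Taking the log path as an argument keeps the helper easy to try outside a cluster instance. In a real lifecycle script, you would pass /var/log/provision/provisioning.log.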

Tagging resources

The Amazon tagging system helps you manage, identify, organize, search for, and filter resources. SageMaker HyperPod supports tagging, so you can manage your clusters as Amazon resources. You can add or edit tags when you create a cluster or edit an existing one. To learn more about tagging in general, see Tagging your Amazon resources.

Using the SageMaker HyperPod console UI

When you create a new cluster or edit an existing cluster, you can add, remove, or edit tags.

Using the SageMaker HyperPod APIs

When you write a CreateCluster or UpdateCluster API request file in JSON format, edit the Tags section.
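For illustration, the Tags section of such a request file might look like the fragment printed by the sketch below. The function name print_tags_fragment, the cluster name, and the tag keys and values are hypothetical. A real CreateCluster request also needs its other required fields (such as InstanceGroups), which are omitted here.

```shell
# Print a partial CreateCluster request that shows only the Tags section.
# Each tag is an object with a Key and a Value.
# All names and values below are placeholders.
print_tags_fragment() {
  cat <<'EOF'
{
  "ClusterName": "my-hyperpod-cluster",
  "Tags": [
    { "Key": "project", "Value": "llm-training" },
    { "Key": "owner", "Value": "ml-platform" }
  ]
}
EOF
}

print_tags_fragment
```

After you complete the request file, you can typically pass it to the CLI with the generic --cli-input-json file://path option.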

Using the Amazon CLI tagging commands for SageMaker

To tag a cluster

Use aws sagemaker add-tags as follows.

aws sagemaker add-tags --resource-arn cluster_ARN --tags Key=string,Value=string

To untag a cluster

Use aws sagemaker delete-tags as follows.

aws sagemaker delete-tags --resource-arn cluster_ARN --tag-keys "tag_key"

To list tags for a resource

Use aws sagemaker list-tags as follows.

aws sagemaker list-tags --resource-arn cluster_ARN
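The three commands above can be combined into a short round trip, sketched here under stated assumptions. The function name tag_roundtrip and the tag keys and values are hypothetical. The function takes the cluster ARN as its only argument and is defined but not invoked, because running it modifies a real resource.

```shell
# tag_roundtrip CLUSTER_ARN
# Adds two tags, lists the tags, then removes one tag by key.
# Tag keys and values are placeholders.
tag_roundtrip() {
  # Add two tags in one call using the CLI shorthand syntax.
  aws sagemaker add-tags --resource-arn "$1" \
    --tags Key=project,Value=llm-training Key=owner,Value=ml-platform
  # Confirm which tags are now attached to the cluster.
  aws sagemaker list-tags --resource-arn "$1"
  # Remove the "owner" tag by its key.
  aws sagemaker delete-tags --resource-arn "$1" --tag-keys owner
}

# To run against a real cluster, replace cluster_ARN with your cluster's ARN:
# tag_roundtrip cluster_ARN
```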