
DescribeCluster

Retrieves information about a SageMaker HyperPod cluster.

Request Syntax


{
   "ClusterName": "string"
}

Request Parameters

For information about the parameters that are common to all actions, see Common Parameters.

The request accepts the following data in JSON format.

ClusterName

The string name or the Amazon Resource Name (ARN) of the SageMaker HyperPod cluster.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 256.

Pattern: (arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:cluster/[a-z0-9]{12})|([a-zA-Z0-9](-*[a-zA-Z0-9]){0,62})

Required: Yes
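
As an illustration of the constraints above, the following minimal sketch checks a candidate ClusterName value on the client side before a request is sent. The regular expression and the maximum length are copied from this section; the function and constant names, and the sample values in the usage note, are hypothetical and not part of the API. Note that although the documented minimum length is 0, the pattern itself requires at least one character, so an empty string does not match.

```python
import re

# Pattern copied verbatim from the ClusterName request parameter:
# either a full cluster ARN or a short cluster name.
CLUSTER_IDENTIFIER_PATTERN = re.compile(
    r"(arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:cluster/[a-z0-9]{12})"
    r"|([a-zA-Z0-9](-*[a-zA-Z0-9]){0,62})"
)

# Documented maximum length for ClusterName.
MAX_IDENTIFIER_LENGTH = 256


def is_valid_cluster_identifier(value: str) -> bool:
    """Return True if value satisfies the documented length and pattern.

    Hypothetical helper for illustration; the service performs its own
    validation regardless of any client-side check.
    """
    if len(value) > MAX_IDENTIFIER_LENGTH:
        return False
    # fullmatch ensures the whole string matches, not just a prefix.
    return CLUSTER_IDENTIFIER_PATTERN.fullmatch(value) is not None
```

For example, a name such as `my-cluster` passes this check, while `-bad`, `bad-`, and the empty string do not.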

Response Syntax


{
   "ClusterArn": "string",
   "ClusterName": "string",
   "ClusterStatus": "string",
   "CreationTime": number,
   "FailureMessage": "string",
   "InstanceGroups": [ 
      { 
         "CurrentCount": number,
         "ExecutionRole": "string",
         "InstanceGroupName": "string",
         "InstanceStorageConfigs": [ 
            { ... }
         ],
         "InstanceType": "string",
         "LifeCycleConfig": { 
            "OnCreate": "string",
            "SourceS3Uri": "string"
         },
         "OnStartDeepHealthChecks": [ "string" ],
         "OverrideVpcConfig": { 
            "SecurityGroupIds": [ "string" ],
            "Subnets": [ "string" ]
         },
         "ScheduledUpdateConfig": { 
            "DeploymentConfig": { 
               "AutoRollbackConfiguration": [ 
                  { 
                     "AlarmName": "string"
                  }
               ],
               "RollingUpdatePolicy": { 
                  "MaximumBatchSize": { 
                     "Type": "string",
                     "Value": number
                  },
                  "RollbackMaximumBatchSize": { 
                     "Type": "string",
                     "Value": number
                  }
               },
               "WaitIntervalInSeconds": number
            },
            "ScheduleExpression": "string"
         },
         "Status": "string",
         "TargetCount": number,
         "ThreadsPerCore": number,
         "TrainingPlanArn": "string",
         "TrainingPlanStatus": "string"
      }
   ],
   "NodeRecovery": "string",
   "Orchestrator": { 
      "Eks": { 
         "ClusterArn": "string"
      }
   },
   "RestrictedInstanceGroups": [ 
      { 
         "CurrentCount": number,
         "EnvironmentConfig": { 
            "FSxLustreConfig": { 
               "PerUnitStorageThroughput": number,
               "SizeInGiB": number
            },
            "S3OutputPath": "string"
         },
         "ExecutionRole": "string",
         "InstanceGroupName": "string",
         "InstanceStorageConfigs": [ 
            { ... }
         ],
         "InstanceType": "string",
         "OnStartDeepHealthChecks": [ "string" ],
         "OverrideVpcConfig": { 
            "SecurityGroupIds": [ "string" ],
            "Subnets": [ "string" ]
         },
         "ScheduledUpdateConfig": { 
            "DeploymentConfig": { 
               "AutoRollbackConfiguration": [ 
                  { 
                     "AlarmName": "string"
                  }
               ],
               "RollingUpdatePolicy": { 
                  "MaximumBatchSize": { 
                     "Type": "string",
                     "Value": number
                  },
                  "RollbackMaximumBatchSize": { 
                     "Type": "string",
                     "Value": number
                  }
               },
               "WaitIntervalInSeconds": number
            },
            "ScheduleExpression": "string"
         },
         "Status": "string",
         "TargetCount": number,
         "ThreadsPerCore": number,
         "TrainingPlanArn": "string",
         "TrainingPlanStatus": "string"
      }
   ],
   "VpcConfig": { 
      "SecurityGroupIds": [ "string" ],
      "Subnets": [ "string" ]
   }
}
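
To show how the response shape above might be consumed, the following sketch walks both InstanceGroups and RestrictedInstanceGroups in an already-parsed response and reports whether each group has reached its target size. The field names come from the response syntax; the function name, the summary keys, and the sample data are hypothetical, and treating a missing count as 0 is an assumption of this sketch rather than documented behavior.

```python
from typing import Any


def summarize_instance_groups(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Build one summary row per instance group in a DescribeCluster response."""
    summaries = []
    # Both arrays share the count and name fields used here.
    for key in ("InstanceGroups", "RestrictedInstanceGroups"):
        for group in response.get(key, []):
            # Assumption: a missing count is treated as 0.
            current = group.get("CurrentCount", 0)
            target = group.get("TargetCount", 0)
            summaries.append(
                {
                    "name": group.get("InstanceGroupName"),
                    "restricted": key == "RestrictedInstanceGroups",
                    "current": current,
                    "target": target,
                    "at_target": current == target,
                }
            )
    return summaries


# Hypothetical, hand-written sample; not output captured from the service.
sample_response = {
    "ClusterName": "example-cluster",
    "InstanceGroups": [
        {"InstanceGroupName": "workers", "CurrentCount": 3, "TargetCount": 4}
    ],
    "RestrictedInstanceGroups": [
        {"InstanceGroupName": "restricted-workers", "CurrentCount": 2, "TargetCount": 2}
    ],
}

for row in summarize_instance_groups(sample_response):
    print(row)
```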

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

ClusterArn

The Amazon Resource Name (ARN) of the SageMaker HyperPod cluster.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 256.

Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:cluster/[a-z0-9]{12}

ClusterName

The name of the SageMaker HyperPod cluster.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 63.

Pattern: [a-zA-Z0-9](-*[a-zA-Z0-9])*

ClusterStatus

The status of the SageMaker HyperPod cluster.

Type: String

CreationTime

The time when the SageMaker HyperPod cluster was created.

Type: Timestamp

FailureMessage

The failure message of the SageMaker HyperPod cluster.

Type: String

InstanceGroups

The instance groups of the SageMaker HyperPod cluster.

Type: Array of ClusterInstanceGroupDetails objects

NodeRecovery

The node recovery mode configured for the SageMaker HyperPod cluster.

Type: String

Valid Values: Automatic | None

Orchestrator

The type of orchestrator used for the SageMaker HyperPod cluster.

Type: ClusterOrchestrator object

RestrictedInstanceGroups

The specialized instance groups in the SageMaker HyperPod cluster that are used for training models such as Amazon Nova.

Type: Array of ClusterRestrictedInstanceGroupDetails objects

VpcConfig

Specifies an Amazon Virtual Private Cloud (VPC) that your SageMaker jobs, hosted models, and compute resources have access to. You can control access to and from your resources by configuring a VPC. For more information, see Give SageMaker Access to Resources in your Amazon VPC.

Type: VpcConfig object
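
The response syntax shows CreationTime as a JSON number. The following sketch converts such a value to a timezone-aware datetime, assuming the number is a Unix epoch timestamp in seconds; that interpretation and the function name are assumptions of this example. An SDK may already perform this conversion, in which case the helper is unnecessary.

```python
from datetime import datetime, timezone


def creation_time_to_utc(creation_time: float) -> datetime:
    """Convert a raw CreationTime number to a UTC datetime.

    Assumption: the value is seconds since the Unix epoch.
    """
    return datetime.fromtimestamp(creation_time, tz=timezone.utc)


# Hypothetical raw value, chosen only for illustration.
print(creation_time_to_utc(1700000000).isoformat())
```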

Errors

For information about the errors that are common to all actions, see Common Errors.

ResourceNotFound

The resource being accessed was not found.

HTTP Status Code: 400
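
Because ResourceNotFound is returned with HTTP status code 400 rather than 404, a caller typically needs to check the error code and not the status code alone. The following sketch does that, assuming the error is available as a dictionary shaped like `{"Error": {"Code": ..., "Message": ...}}`, which is the shape botocore exposes on `ClientError.response`. The helper name and the sample dictionaries are hypothetical.

```python
from typing import Any, Mapping


def is_resource_not_found(error_response: Mapping[str, Any]) -> bool:
    """Return True if the error dictionary carries the ResourceNotFound code.

    Assumption: error_response has the {"Error": {"Code": ...}} shape.
    """
    error = error_response.get("Error", {})
    return error.get("Code") == "ResourceNotFound"


# Hypothetical error payloads for illustration only.
missing = {"Error": {"Code": "ResourceNotFound", "Message": "example message"}}
other = {"Error": {"Code": "ValidationException", "Message": "example message"}}

print(is_resource_not_found(missing))
print(is_resource_not_found(other))
```

With boto3, one way to use this would be to wrap `describe_cluster(ClusterName=...)` in a `try` block, catch `botocore.exceptions.ClientError`, and pass the exception's `response` attribute to the helper.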