Creating SageMaker HyperPod clusters using Amazon CloudFormation templates - Amazon SageMaker AI

Creating SageMaker HyperPod clusters using Amazon CloudFormation templates

You can create SageMaker HyperPod clusters using the CloudFormation templates for HyperPod. You must install the Amazon CLI before you proceed.

Configure resources in the console and deploy using CloudFormation

You can configure resources using the Amazon Web Services Management Console and deploy using the CloudFormation templates.

Follow these steps.

  1. Follow the tutorial in Getting started with SageMaker HyperPod using the SageMaker AI console. At the end of the tutorial, instead of choosing Submit, choose Download CloudFormation template parameters. The tutorial contains important configuration information that you need to create your cluster successfully.

    Important

    If you choose Submit, you will not be able to deploy a cluster with the same name until you delete the cluster.

    After you choose Download CloudFormation template parameters, the window titled Using the configuration file to create the cluster using the Amazon CLI appears on the right side of the page.

  2. In the Using the configuration file to create the cluster using the Amazon CLI window, choose Download configuration parameters file. The file is downloaded to your machine. You can edit the configuration JSON file to suit your needs, or leave it as is if no changes are required.

  3. In the terminal, navigate to the directory that contains the downloaded parameter file (params.json). The command in the next step references it as file://params.json.

  4. Run the create-stack Amazon CLI command to deploy the CloudFormation stack that will provision the configured resources and create the HyperPod cluster.

    aws cloudformation create-stack \
        --stack-name my-stack \
        --template-url https://aws-sagemaker-hyperpod-cluster-setup.amazonaws.com/templates-slurm/main-stack-slurm-based-template.yaml \
        --parameters file://params.json \
        --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM

  5. To view the status of resource provisioning, open the CloudFormation console.

    After the cluster creation completes, view the new cluster under Clusters in the main pane of the SageMaker HyperPod console. The cluster's status is shown in the Status column.

  6. After the cluster status changes to InService, you can log in to the cluster nodes. To access the cluster nodes and start running ML workloads, see Jobs on SageMaker HyperPod clusters.
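
Steps 5 and 6 can also be checked from the terminal instead of the console. The following is a minimal sketch, not an official procedure. It assumes the Amazon CLI is installed and configured with credentials, and it only defines helper functions. The function names and the default polling interval are illustrative choices that do not come from the tutorial, and the cluster name is whatever you set in your parameter file.

```shell
# Sketch: helper functions for checking provisioning progress from the terminal.
# Assumption: the aws command is installed and configured for the target Region.

# Print the status of a CloudFormation stack, for example CREATE_IN_PROGRESS.
stack_status() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query 'Stacks[0].StackStatus' \
    --output text
}

# Print the status of a SageMaker HyperPod cluster, for example Creating.
cluster_status() {
  aws sagemaker describe-cluster \
    --cluster-name "$1" \
    --query 'ClusterStatus' \
    --output text
}

# Poll the cluster until it reports InService.
# Arguments: cluster name, max attempts (default 60), delay in seconds (default 30).
# Returns non-zero if the cluster reports Failed or the attempts run out.
wait_for_in_service() {
  _cluster_name="$1"
  _max_attempts="${2:-60}"
  _delay_seconds="${3:-30}"
  _attempt=1
  while [ "$_attempt" -le "$_max_attempts" ]; do
    _status="$(cluster_status "$_cluster_name")" || return 1
    echo "attempt ${_attempt}: ${_status}"
    case "$_status" in
      InService) return 0 ;;
      Failed) return 1 ;;
    esac
    sleep "$_delay_seconds"
    _attempt=$((_attempt + 1))
  done
  echo "timed out waiting for ${_cluster_name}" >&2
  return 1
}
```

For example, after sourcing these functions you could run stack_status my-stack to see the stack status, then wait_for_in_service followed by your cluster name to block until the cluster is ready for login.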

Configure and deploy resources using CloudFormation

You can configure and deploy resources using the CloudFormation templates for SageMaker HyperPod.

Follow these steps.

  1. Download a CloudFormation template for SageMaker HyperPod from the sagemaker-hyperpod-cluster-setup GitHub repository.

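  2. Run the create-stack Amazon CLI command to deploy the CloudFormation stack that will provision the configured resources and create the HyperPod cluster. Replace URL_of_the_file_that_contains_the_template_body with the URL of your template, and run the command from the directory that contains your params.json parameter file.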

    aws cloudformation create-stack \
        --stack-name my-stack \
        --template-url URL_of_the_file_that_contains_the_template_body \
        --parameters file://params.json \
        --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM

  3. To view the status of resource provisioning, open the CloudFormation console.

    After the cluster creation completes, view the new cluster under Clusters in the main pane of the SageMaker HyperPod console. The cluster's status is shown in the Status column.

  4. After the cluster status changes to InService, you can log in to the cluster nodes.
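
The command in step 2 takes a template URL, and the --template-url option of create-stack expects that URL to point to a template stored in Amazon S3. If you want to deploy the copy you downloaded in step 1 directly from disk, one possible approach is the --template-body option with a file:// path. The following is a sketch under assumptions, not part of the original procedure. The function name and the file paths are hypothetical. A template body passed this way is limited to 51,200 bytes, so a larger template still needs to be uploaded to S3 and referenced by URL.

```shell
# Sketch: validate a locally downloaded template, then create the stack from it.
# Assumption: the aws command is installed and configured for the target Region.
# Arguments: stack name, path to the template file, path to the parameter file.
deploy_local_template() {
  _stack_name="$1"
  _template_file="$2"
  _params_file="$3"

  # Fail early if either input file is missing.
  [ -f "$_template_file" ] || { echo "template not found: $_template_file" >&2; return 1; }
  [ -f "$_params_file" ] || { echo "parameters not found: $_params_file" >&2; return 1; }

  # Ask CloudFormation to check the template before creating anything.
  aws cloudformation validate-template \
    --template-body "file://$_template_file" >/dev/null || return 1

  # Create the stack with the same capabilities used in step 2.
  aws cloudformation create-stack \
    --stack-name "$_stack_name" \
    --template-body "file://$_template_file" \
    --parameters "file://$_params_file" \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
}
```

A call might look like deploy_local_template my-stack ./template.yaml ./params.json, where both paths are placeholders for your own files.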