Creating SageMaker HyperPod clusters using Amazon CloudFormation templates - Amazon SageMaker AI

Creating SageMaker HyperPod clusters using Amazon CloudFormation templates

You can create SageMaker HyperPod clusters using the CloudFormation templates for HyperPod. Before you proceed, install the Amazon CLI.
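Before you run the commands on this page, you can confirm that the Amazon CLI is installed and has working credentials. The following is a minimal sketch. check_cli is a hypothetical helper name, and it relies only on the standard aws --version and aws sts get-caller-identity commands.

```shell
#!/usr/bin/env bash
# Sketch: confirm the Amazon CLI is installed and configured before deploying.
# check_cli is a hypothetical helper name, not part of the Amazon CLI.

check_cli() {
  # Stop early if no aws executable (or function) is on the PATH.
  if ! command -v aws >/dev/null 2>&1; then
    echo "Amazon CLI not found; install it before continuing." >&2
    return 1
  fi
  # Print the installed CLI version string.
  aws --version
  # Fails (non-zero exit) if no usable credentials are configured for the current profile.
  aws sts get-caller-identity --query 'Arn' --output text
}
```

Run check_cli in the shell where you plan to run create-stack. A non-zero exit status means that the CLI is missing or cannot authenticate.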

Configure resources in the console and deploy using CloudFormation

You can configure resources in the Amazon Web Services Management Console and then deploy them using the CloudFormation templates.

Follow these steps.

  1. Follow the instructions in Getting started with SageMaker HyperPod using the SageMaker AI console to configure the Amazon resources that you need to create your cluster.

  2. At the bottom of the Create cluster page, choose Download CloudFormation template parameters. This opens the Using the configuration file to create the cluster using the Amazon CLI window on the right side of the page.

  3. In the Using the configuration file to create the cluster using the Amazon CLI window, choose Download configuration parameters file. The file is downloaded to your machine. You can edit the configuration JSON file as needed, or leave it unchanged if no changes are required.

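  4. Run the create-stack Amazon CLI command to deploy the CloudFormation stack, which provisions the configured resources and creates the HyperPod cluster. Replace my-stack with a name for your stack and params.json with the path to the parameters file that you downloaded. The --capabilities flags acknowledge that the template can create IAM resources, including IAM resources with custom names.

    aws cloudformation create-stack \
      --stack-name my-stack \
      --template-url https://aws-sagemaker-hyperpod-cluster-setup.amazonaws.com/templates-slurm/main-stack-slurm-based-template.yaml \
      --parameters file://params.json \
      --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
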
  5. To view the provisioning status of the resources, open the CloudFormation console.

    After the cluster is created, it appears under Clusters in the main pane of the SageMaker HyperPod console. Its status is displayed in the Status column.

  6. After the cluster status changes to InService, you can log in to the cluster nodes. To access the cluster nodes and start running ML workloads, see Jobs on SageMaker HyperPod clusters.
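If you prefer to follow progress from a terminal rather than the consoles, steps 5 and 6 can be approximated with the Amazon CLI. This is a sketch under assumptions: the helper function names are hypothetical, and you pass in the stack name that you used with create-stack and the cluster name set in your parameters file. It uses the aws cloudformation wait stack-create-complete waiter, the ClusterStatus field returned by aws sagemaker describe-cluster, and aws sagemaker list-cluster-nodes.

```shell
#!/usr/bin/env bash
# Sketch: follow stack and cluster creation from the terminal.
# Assumption: helper names are hypothetical; pass your own stack and cluster names.

wait_for_stack() {
  # Polls until the stack reaches CREATE_COMPLETE. Exits non-zero if creation
  # fails, the stack rolls back, or the waiter gives up polling.
  aws cloudformation wait stack-create-complete --stack-name "$1"
}

cluster_status() {
  # Prints the ClusterStatus field from DescribeCluster (for example Creating or InService).
  aws sagemaker describe-cluster --cluster-name "$1" \
    --query 'ClusterStatus' --output text
}

wait_for_cluster() {
  # $1: cluster name, $2: seconds between checks (default 60).
  local name="$1" interval="${2:-60}" status
  while true; do
    status="$(cluster_status "$name")" || return 1
    echo "Cluster ${name}: ${status}"
    case "$status" in
      InService) return 0 ;;
      Failed|RollingBack) return 1 ;;
    esac
    sleep "$interval"
  done
}

list_nodes() {
  # Prints each node's instance group name and instance ID.
  aws sagemaker list-cluster-nodes --cluster-name "$1" \
    --query 'ClusterNodeSummaries[].[InstanceGroupName,InstanceId]' --output text
}
```

For example, run wait_for_stack my-stack && wait_for_cluster my-hyperpod-cluster && list_nodes my-hyperpod-cluster, where my-hyperpod-cluster is a placeholder. The CloudFormation waiter stops after a fixed number of checks, so for a long deployment you may need to rerun it or check the stack in the CloudFormation console.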

Configure resources and deploy using CloudFormation

You can also configure resources and deploy them directly using the CloudFormation templates for SageMaker HyperPod.

Follow these steps.

  1. Download a CloudFormation template for SageMaker HyperPod from the sagemaker-hyperpod-cluster-setup GitHub repository.

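  2. Run the create-stack Amazon CLI command to deploy the CloudFormation stack, which provisions the configured resources and creates the HyperPod cluster. Replace URL_of_the_file_that_contains_the_template_body with the URL of the template file, and params.json with a parameters file that sets values for the template's parameters. The --template-url option expects a template stored in an Amazon S3 bucket, so upload the template that you downloaded to a bucket first.

    aws cloudformation create-stack \
      --stack-name my-stack \
      --template-url URL_of_the_file_that_contains_the_template_body \
      --parameters file://params.json \
      --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
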
  3. To view the provisioning status of the resources, open the CloudFormation console.

    After the cluster is created, it appears under Clusters in the main pane of the SageMaker HyperPod console. Its status is displayed in the Status column.

  4. After the cluster status changes to InService, you can log in to the cluster nodes. To access the cluster nodes and start running ML workloads, see Jobs on SageMaker HyperPod clusters.
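In this workflow you supply the parameters file yourself. The following sketch writes a parameters file in the JSON list format that the Amazon CLI accepts for --parameters file://..., then runs create-stack against a template URL. The parameter keys and values (ExampleClusterName, ExampleNodeCount) are hypothetical placeholders, not the real parameter names of the HyperPod templates. Use the parameters declared in the template that you downloaded.

```shell
#!/usr/bin/env bash
# Sketch: write a parameters file and create the HyperPod stack from a template URL.
# Assumption: ExampleClusterName and ExampleNodeCount are hypothetical parameter keys;
# replace them with the parameters declared in the template you downloaded.

write_params() {
  # The Amazon CLI reads --parameters file://... as a JSON list of
  # ParameterKey/ParameterValue objects.
  cat > "$1" <<'EOF'
[
  { "ParameterKey": "ExampleClusterName", "ParameterValue": "my-hyperpod-cluster" },
  { "ParameterKey": "ExampleNodeCount", "ParameterValue": "2" }
]
EOF
}

deploy_stack() {
  # $1: stack name, $2: HTTPS URL of the template in Amazon S3, $3: parameters file.
  # The --capabilities flags acknowledge that the template can create IAM resources.
  aws cloudformation create-stack \
    --stack-name "$1" \
    --template-url "$2" \
    --parameters "file://$3" \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
}
```

For example, after uploading the template with aws s3 cp main-stack.yaml s3://amzn-s3-demo-bucket/main-stack.yaml, run write_params params.json and then deploy_stack my-stack https://amzn-s3-demo-bucket.s3.amazonaws.com/main-stack.yaml params.json. The bucket and file names are placeholders, and the object URL format depends on your Region's Amazon S3 endpoint, so copy the object URL shown in the Amazon S3 console if in doubt.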