Creating SageMaker HyperPod clusters using Amazon CloudFormation templates
You can create SageMaker HyperPod clusters using the CloudFormation templates for HyperPod. You must install the Amazon CLI to proceed.
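Before you start, it can help to confirm that the CLI is installed and can authenticate. The following is a minimal sketch, assuming a bash shell and that the Amazon CLI is available as the `aws` command. The function names `check_aws_cli` and `check_aws_credentials` are hypothetical helpers introduced for illustration.

```shell
# Sketch only: helper names are hypothetical, not part of the Amazon CLI.

# Print the CLI version, or fail if the aws command is not on PATH.
check_aws_cli() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws command not found on PATH" >&2
    return 1
  fi
  aws --version
}

# Print the ARN of the identity that the configured credentials resolve to.
# A failure here usually means credentials or the region are not configured.
check_aws_credentials() {
  aws sts get-caller-identity --query 'Arn' --output text
}

# Example (commented out so that sourcing this file makes no AWS calls):
# check_aws_cli && check_aws_credentials
```

If either function fails, resolve the installation or credential issue before running the create-stack command.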
Configure resources in the console and deploy using CloudFormation
You can configure resources using the Amazon Web Services Management Console and deploy using the CloudFormation templates.
Follow these steps.
- Follow the tutorial in Getting started with SageMaker HyperPod using the SageMaker AI console. The tutorial contains important configuration information you need to create your cluster successfully. At the end of the tutorial, instead of choosing Submit, choose Download CloudFormation template parameters.
  Important
  If you choose Submit, you cannot deploy a cluster with the same name until you delete that cluster.
  After you choose Download CloudFormation template parameters, the "Using the configuration file to create the cluster using the Amazon CLI" window appears on the right side of the page.
- In the "Using the configuration file to create the cluster using the Amazon CLI" window, choose Download configuration parameters file. The file is downloaded to your machine. You can edit the configuration JSON file as needed, or leave it as-is if no changes are required.
- In the terminal, navigate to the directory that contains the parameter file. The command in the next step references it as file://params.json.
- Run the create-stack Amazon CLI command to deploy the CloudFormation stack, which provisions the configured resources and creates the HyperPod cluster. Replace my-stack with your stack name.
  aws cloudformation create-stack --stack-name my-stack \
    --template-url https://aws-sagemaker-hyperpod-cluster-setup.amazonaws.com/templates-slurm/main-stack-slurm-based-template.yaml \
    --parameters file://params.json \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
- To view the status of resource provisioning, navigate to the CloudFormation console. After cluster creation completes, view the new cluster under Clusters in the main pane of the SageMaker HyperPod console. The cluster status is displayed in the Status column.
- After the cluster status changes to InService, you can start logging in to the cluster nodes. To access the cluster nodes and start running ML workloads, see Jobs on SageMaker HyperPod clusters.
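The status checks in the last two steps can also be done from the terminal instead of the console. The following is a minimal sketch, assuming a bash shell. The function names and the cluster name `my-hyperpod-cluster` are hypothetical placeholders; use the stack name you passed to create-stack and the cluster name from your parameter file.

```shell
# Sketch only: function names and example values are placeholders.

# Print the CloudFormation stack status (for example, CREATE_IN_PROGRESS).
stack_status() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query 'Stacks[0].StackStatus' \
    --output text
}

# Print the HyperPod cluster status (for example, Creating or InService).
cluster_status() {
  aws sagemaker describe-cluster \
    --cluster-name "$1" \
    --query 'ClusterStatus' \
    --output text
}

# Block until stack creation finishes, then print both statuses.
wait_and_report() {
  local stack_name="$1" cluster_name="$2"
  if ! aws cloudformation wait stack-create-complete --stack-name "$stack_name"; then
    echo "Stack $stack_name did not reach CREATE_COMPLETE" >&2
    return 1
  fi
  echo "stack: $(stack_status "$stack_name")"
  echo "cluster: $(cluster_status "$cluster_name")"
}

# Example (commented out so that sourcing this file makes no AWS calls):
# wait_and_report my-stack my-hyperpod-cluster
```

The waiter stops polling after a fixed number of checks, so for a long-running deployment you may need to run it again or fall back to `stack_status`.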
Configure and deploy resources using CloudFormation
You can configure and deploy resources using the CloudFormation templates for SageMaker HyperPod.
Follow these steps.
- Download a CloudFormation template for SageMaker HyperPod from the sagemaker-hyperpod-cluster-setup GitHub repository.
- Run the create-stack Amazon CLI command to deploy the CloudFormation stack, which provisions the configured resources and creates the HyperPod cluster. Replace my-stack with your stack name.
  aws cloudformation create-stack --stack-name my-stack \
    --template-url URL_of_the_file_that_contains_the_template_body \
    --parameters file://params.json \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
- To view the status of resource provisioning, navigate to the CloudFormation console. After cluster creation completes, view the new cluster under Clusters in the main pane of the SageMaker HyperPod console. The cluster status is displayed in the Status column.
- After the cluster status changes to InService, you can start logging in to the cluster nodes.
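When you work from a downloaded template, it can be useful to validate it before deployment and to list failed resources if the stack does not complete. The following is a minimal sketch, assuming a bash shell. The function names and the local path `./main-stack.yaml` are hypothetical placeholders, not names taken from the repository.

```shell
# Sketch only: function names and the example path are placeholders.

# Ask CloudFormation to check the syntax of a local template file.
# Argument: path to the downloaded template.
validate_template() {
  aws cloudformation validate-template --template-body "file://$1"
}

# List the resources that failed to create, with the reported reason.
# Argument: name of the stack passed to create-stack.
failed_events() {
  aws cloudformation describe-stack-events \
    --stack-name "$1" \
    --query "StackEvents[?ResourceStatus=='CREATE_FAILED'].[LogicalResourceId,ResourceStatusReason]" \
    --output table
}

# Example (commented out so that sourcing this file makes no AWS calls):
# validate_template ./main-stack.yaml
# failed_events my-stack
```

CloudFormation limits the size of a template passed with `--template-body`, so a large template may need to be uploaded to Amazon S3 and validated with `--template-url` instead.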