

# Setting up multiple controller nodes for a SageMaker HyperPod Slurm cluster

This topic explains how to configure multiple controller (head) nodes in a SageMaker HyperPod Slurm cluster using lifecycle scripts. Before you start, review the prerequisites listed in [Prerequisites for using SageMaker HyperPod](sagemaker-hyperpod-prerequisites.md) and familiarize yourself with the lifecycle scripts in [Customizing SageMaker HyperPod clusters using lifecycle scripts](sagemaker-hyperpod-lifecycle-best-practices-slurm.md). The instructions in this topic use Amazon CLI commands in an Amazon Linux environment. Note that the environment variables used in these commands are available only in the current session unless explicitly preserved.

**Topics**
+ [Provisioning resources using Amazon CloudFormation stacks](sagemaker-hyperpod-multihead-slurm-cfn.md)
+ [Creating and attaching an IAM policy](sagemaker-hyperpod-multihead-slurm-iam.md)
+ [Preparing and uploading lifecycle scripts](sagemaker-hyperpod-multihead-slurm-scripts.md)
+ [Creating a SageMaker HyperPod cluster](sagemaker-hyperpod-multihead-slurm-create.md)
+ [Considering important notes](sagemaker-hyperpod-multihead-slurm-notes.md)
+ [Reviewing environment variables reference](sagemaker-hyperpod-multihead-slurm-variables-reference.md)

# Provisioning resources using Amazon CloudFormation stacks

To set up multiple controller nodes in a HyperPod Slurm cluster, provision Amazon resources through two Amazon CloudFormation stacks: [Provision basic resources](#sagemaker-hyperpod-multihead-slurm-cfn-basic) and [Provision additional resources to support multiple controller nodes](#sagemaker-hyperpod-multihead-slurm-cfn-multihead).

## Provision basic resources


Follow these steps to provision basic resources for your Amazon SageMaker HyperPod Slurm cluster.

1. Download the [sagemaker-hyperpod.yaml](https://github.com/aws-samples/awsome-distributed-training/blob/main/1.architectures/5.sagemaker-hyperpod/sagemaker-hyperpod.yaml) template file to your machine. This YAML file is an Amazon CloudFormation template that defines the following resources for your Slurm cluster.
   + An execution IAM role for the compute node instance group
   + An Amazon S3 bucket to store the lifecycle scripts
   + Public and private subnets (private subnets have internet access through NAT gateways)
   + Internet Gateway/NAT gateways
   + Two Amazon EC2 security groups
   + An Amazon FSx volume to store configuration files

1. Run the following CLI command to create an Amazon CloudFormation stack named `sagemaker-hyperpod`. Define the Availability Zone (AZ) IDs for your cluster in `PrimarySubnetAZ` and `BackupSubnetAZ`. For example, *use1-az4* is an AZ ID for an Availability Zone in the `us-east-1` Region. For more information, see [Availability Zone IDs](https://docs.amazonaws.cn//ram/latest/userguide/working-with-az-ids.html) and [Setting up SageMaker HyperPod clusters across multiple AZs](sagemaker-hyperpod-prerequisites.md#sagemaker-hyperpod-prerequisites-multiple-availability-zones).

   ```
   aws cloudformation deploy \
   --template-file /path_to_template/sagemaker-hyperpod.yaml \
   --stack-name sagemaker-hyperpod \
   --parameter-overrides PrimarySubnetAZ=use1-az4 BackupSubnetAZ=use1-az1 \
   --capabilities CAPABILITY_IAM
   ```

   For more information, see [deploy](https://docs.amazonaws.cn//cli/latest/reference/cloudformation/deploy/) from the Amazon Command Line Interface Reference. The stack creation can take a few minutes to complete. When it's complete, you will see the following in your command line interface.

   ```
   Waiting for changeset to be created..
   Waiting for stack create/update to complete
   Successfully created/updated stack - sagemaker-hyperpod
   ```

1. (Optional) Verify the stack in the [Amazon CloudFormation console](https://console.aws.amazon.com/cloudformation/home).
   + From the left navigation, choose **Stacks**.
   + On the **Stacks** page, find and choose **sagemaker-hyperpod**.
   + Choose the **Resources** and **Outputs** tabs to review the created resources and stack outputs.

1. Create environment variables from the stack (`sagemaker-hyperpod`) outputs. You will use values of these variables to [Provision additional resources to support multiple controller nodes](#sagemaker-hyperpod-multihead-slurm-cfn-multihead).

   ```
   source .env
   PRIMARY_SUBNET=$(aws --region $REGION cloudformation describe-stacks --stack-name $SAGEMAKER_STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`PrimaryPrivateSubnet`].OutputValue' --output text)
   BACKUP_SUBNET=$(aws --region $REGION cloudformation describe-stacks --stack-name $SAGEMAKER_STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`BackupPrivateSubnet`].OutputValue' --output text)
   EMAIL=$(bash -c 'read -p "INPUT YOUR SNSSubEmailAddress HERE: " && echo $REPLY')
   DB_USER_NAME=$(bash -c 'read -p "INPUT YOUR DB_USER_NAME HERE: " && echo $REPLY')
   SECURITY_GROUP=$(aws --region $REGION cloudformation describe-stacks --stack-name $SAGEMAKER_STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`SecurityGroup`].OutputValue' --output text)
   ROOT_BUCKET_NAME=$(aws --region $REGION cloudformation describe-stacks --stack-name $SAGEMAKER_STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`AmazonS3BucketName`].OutputValue' --output text)
   SLURM_FSX_DNS_NAME=$(aws --region $REGION cloudformation describe-stacks --stack-name $SAGEMAKER_STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`FSxLustreFilesystemDNSname`].OutputValue' --output text)
   SLURM_FSX_MOUNT_NAME=$(aws --region $REGION cloudformation describe-stacks --stack-name $SAGEMAKER_STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`FSxLustreFilesystemMountname`].OutputValue' --output text)
   COMPUTE_NODE_ROLE=$(aws --region $REGION cloudformation describe-stacks --stack-name $SAGEMAKER_STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`AmazonSagemakerClusterExecutionRoleArn`].OutputValue' --output text)
   ```

   When you see prompts asking for your email address and database user name, enter values like the following.

   ```
   INPUT YOUR SNSSubEmailAddress HERE: Email_address_to_receive_SNS_notifications
   INPUT YOUR DB_USER_NAME HERE: Database_user_name_you_define
   ```

   To verify a variable value, use the `echo` command.

   ```
   echo $REGION
   us-east-1
   ```
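
The environment variables above exist only in the current shell session. As a convenience, you can append them to a file and restore them later with `source`. The following `save_env` helper and the `.env` file name are illustrative assumptions, not part of the official setup:

```shell
# Hypothetical helper: append NAME=value lines for the given variable
# names to a file, so a later session can restore them with `source`.
# The values used in this tutorial (Region names, subnet IDs, ARNs)
# contain no spaces, so no quoting is needed.
save_env() {
  local file="$1"; shift
  local name
  for name in "$@"; do
    printf '%s=%s\n' "$name" "${!name}" >> "$file"
  done
}

# Example usage with variables from the steps above:
# save_env .env REGION PRIMARY_SUBNET BACKUP_SUBNET SECURITY_GROUP ROOT_BUCKET_NAME
```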

## Provision additional resources to support multiple controller nodes


Follow these steps to provision additional resources for your Amazon SageMaker HyperPod Slurm cluster with multiple controller nodes.

1. Download the [sagemaker-hyperpod-slurm-multi-headnode.yaml](https://github.com/aws-samples/awsome-distributed-training/blob/main/1.architectures/5.sagemaker-hyperpod/sagemaker-hyperpod-slurm-multi-headnode.yaml) template file to your machine. This second YAML file is an Amazon CloudFormation template that defines the additional resources needed to support multiple controller nodes in your Slurm cluster.
   + An execution IAM role for the controller node instance group
   + An Amazon RDS for MariaDB instance
   + An Amazon SNS topic and subscription
   + Amazon Secrets Manager credentials for Amazon RDS for MariaDB

1. Run the following CLI command to create an Amazon CloudFormation stack named `sagemaker-hyperpod-mh`. This second stack uses the Amazon CloudFormation template to create additional Amazon resources to support the multiple controller nodes architecture.

   ```
   aws cloudformation deploy \
   --template-file /path_to_template/sagemaker-hyperpod-slurm-multi-headnode.yaml \
   --stack-name sagemaker-hyperpod-mh \
   --parameter-overrides \
   SlurmDBSecurityGroupId=$SECURITY_GROUP \
   SlurmDBSubnetGroupId1=$PRIMARY_SUBNET \
   SlurmDBSubnetGroupId2=$BACKUP_SUBNET \
   SNSSubEmailAddress=$EMAIL \
   SlurmDBUsername=$DB_USER_NAME \
   --capabilities CAPABILITY_NAMED_IAM
   ```

   For more information, see [deploy](https://docs.amazonaws.cn//cli/latest/reference/cloudformation/deploy/) from the Amazon Command Line Interface Reference. The stack creation can take a few minutes to complete. When it's complete, you will see the following in your command line interface.

   ```
   Waiting for changeset to be created..
   Waiting for stack create/update to complete
   Successfully created/updated stack - sagemaker-hyperpod-mh
   ```

1. (Optional) Verify the stack in the [Amazon CloudFormation console](https://console.aws.amazon.com/cloudformation/home).
   + From the left navigation, choose **Stacks**.
   + On the **Stacks** page, find and choose **sagemaker-hyperpod-mh**.
   + Choose the **Resources** and **Outputs** tabs to review the created resources and stack outputs.

1. Create environment variables from the stack (`sagemaker-hyperpod-mh`) outputs. You will use values of these variables to update the configuration file (`provisioning_parameters.json`) in [Preparing and uploading lifecycle scripts](sagemaker-hyperpod-multihead-slurm-scripts.md).

   ```
   source .env
   SLURM_DB_ENDPOINT_ADDRESS=$(aws --region $REGION cloudformation describe-stacks --stack-name $MULTI_HEAD_SLURM_STACK --query 'Stacks[0].Outputs[?OutputKey==`SlurmDBEndpointAddress`].OutputValue' --output text)
   SLURM_DB_SECRET_ARN=$(aws --region $REGION cloudformation describe-stacks --stack-name $MULTI_HEAD_SLURM_STACK --query 'Stacks[0].Outputs[?OutputKey==`SlurmDBSecretArn`].OutputValue' --output text)
   SLURM_EXECUTION_ROLE_ARN=$(aws --region $REGION cloudformation describe-stacks --stack-name $MULTI_HEAD_SLURM_STACK --query 'Stacks[0].Outputs[?OutputKey==`SlurmExecutionRoleArn`].OutputValue' --output text)
   SLURM_SNS_FAILOVER_TOPIC_ARN=$(aws --region $REGION cloudformation describe-stacks --stack-name $MULTI_HEAD_SLURM_STACK --query 'Stacks[0].Outputs[?OutputKey==`SlurmFailOverSNSTopicArn`].OutputValue' --output text)
   ```
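
Before moving on, it can help to confirm that each lookup actually returned a value; an empty result usually means the stack name or Region is wrong. The following `require_vars` helper is a sketch, not part of the official tutorial:

```shell
# Sketch of a fail-fast check: report any listed variable that is
# unset or empty, and return nonzero if one is found.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    if [ -z "${!name}" ]; then
      echo "Missing or empty: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage with the variables defined above:
# require_vars SLURM_DB_ENDPOINT_ADDRESS SLURM_DB_SECRET_ARN \
#              SLURM_EXECUTION_ROLE_ARN SLURM_SNS_FAILOVER_TOPIC_ARN
```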

# Creating and attaching an IAM policy
Creating and attaching an IAM policy

This section explains how to create an IAM policy and attach it to the execution role you created in [Provision additional resources to support multiple controller nodes](sagemaker-hyperpod-multihead-slurm-cfn.md#sagemaker-hyperpod-multihead-slurm-cfn-multihead).

1. Download the [IAM policy example](https://github.com/aws-samples/awsome-distributed-training/blob/main/1.architectures/5.sagemaker-hyperpod/1.AmazonSageMakerClustersExecutionRolePolicy.json) to your machine from the GitHub repository.

1. Create an IAM policy with the downloaded example, using the [create-policy](https://docs.amazonaws.cn//cli/latest/reference/iam/create-policy.html) CLI command.

   ```
   aws --region us-east-1 iam create-policy \
       --policy-name AmazonSagemakerExecutionPolicy \
       --policy-document file://1.AmazonSageMakerClustersExecutionRolePolicy.json
   ```

   Example output of the command.

   ```
   {
       "Policy": {
           "PolicyName": "AmazonSagemakerExecutionPolicy",
           "PolicyId": "ANPAXISIWY5UYZM7WJR4W",
           "Arn": "arn:aws:iam::111122223333:policy/AmazonSagemakerExecutionPolicy",
           "Path": "/",
           "DefaultVersionId": "v1",
           "AttachmentCount": 0,
           "PermissionsBoundaryUsageCount": 0,
           "IsAttachable": true,
           "CreateDate": "2025-01-22T20:01:21+00:00",
           "UpdateDate": "2025-01-22T20:01:21+00:00"
       }
   }
   ```

1. Attach the policy `AmazonSagemakerExecutionPolicy` to the Slurm execution role you created in [Provision additional resources to support multiple controller nodes](sagemaker-hyperpod-multihead-slurm-cfn.md#sagemaker-hyperpod-multihead-slurm-cfn-multihead), using the [attach-role-policy](https://docs.amazonaws.cn//cli/latest/reference/iam/attach-role-policy.html) CLI command.

   ```
   aws --region us-east-1 iam attach-role-policy \
       --role-name AmazonSagemakerExecutionRole \
       --policy-arn arn:aws:iam::111122223333:policy/AmazonSagemakerExecutionPolicy
   ```

   This command doesn't produce any output.

   (Optional) If you use environment variables, here are example commands.
   + To get the policy ARN and role name

     ```
      POLICY=$(aws --region $REGION iam list-policies --query 'Policies[?PolicyName==`AmazonSagemakerExecutionPolicy`].Arn' --output text)
      ROLENAME=$(aws --region $REGION iam list-roles --query "Roles[?Arn=='${SLURM_EXECUTION_ROLE_ARN}'].RoleName" --output text)
     ```
   + To attach the policy

     ```
      aws --region $REGION iam attach-role-policy \
          --role-name $ROLENAME --policy-arn $POLICY
     ```

For more information, see [IAM role for SageMaker HyperPod](sagemaker-hyperpod-prerequisites-iam.md#sagemaker-hyperpod-prerequisites-iam-role-for-hyperpod).

# Preparing and uploading lifecycle scripts

After creating all the required resources, you'll need to set up [lifecycle scripts](https://github.com/aws-samples/awsome-distributed-training/tree/main/1.architectures/5.sagemaker-hyperpod/LifecycleScripts) for your SageMaker HyperPod cluster. These scripts provide a [base configuration](https://github.com/aws-samples/awsome-distributed-training/tree/main/1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config) you can use to create a basic HyperPod Slurm cluster.

## Prepare the lifecycle scripts


Follow these steps to get the lifecycle scripts.

1. Download the [lifecycle scripts](https://github.com/aws-samples/awsome-distributed-training/tree/main/1.architectures/5.sagemaker-hyperpod/LifecycleScripts) from the GitHub repository to your machine.

1. Upload the [lifecycle scripts](https://github.com/aws-samples/awsome-distributed-training/tree/main/1.architectures/5.sagemaker-hyperpod/LifecycleScripts) to the Amazon S3 bucket you created in [Provision basic resources](sagemaker-hyperpod-multihead-slurm-cfn.md#sagemaker-hyperpod-multihead-slurm-cfn-basic), using the [cp](https://docs.amazonaws.cn//cli/latest/reference/s3/cp.html) CLI command.

   ```
   aws s3 cp --recursive LifecycleScripts/base-config s3://${ROOT_BUCKET_NAME}/LifeCycleScripts/base-config
   ```
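
Because the cluster's lifecycle configuration references `on_create.sh` as its entry point, it can be worth confirming the file exists locally before uploading. This check is a sketch; the `check_entrypoint` name is an assumption, and the directory argument should point at wherever you downloaded the base-config scripts:

```shell
# Sketch: verify that on_create.sh (the entry point named in the
# cluster's lifecycle configuration) exists in the given directory.
check_entrypoint() {
  local dir="$1"
  if [ -f "$dir/on_create.sh" ]; then
    echo "Found $dir/on_create.sh"
  else
    echo "on_create.sh not found in $dir" >&2
    return 1
  fi
}

# Example usage (adjust the path to where you downloaded the scripts):
# check_entrypoint LifeCycleScripts/base-config
```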

## Create configuration file


Follow these steps to create the configuration file and upload it to the same Amazon S3 bucket where you store the lifecycle scripts.

1. Create a configuration file named `provisioning_parameters.json` with the following configuration. Note that `slurm_sns_arn` is optional. If it's not provided, HyperPod doesn't set up Amazon SNS notifications.

   ```
   cat <<EOF > /tmp/provisioning_parameters.json
   {
     "version": "1.0.0",
     "workload_manager": "slurm",
     "controller_group": "$CONTOLLER_IG_NAME",
     "login_group": "my-login-group",
     "worker_groups": [
       {
         "instance_group_name": "$COMPUTE_IG_NAME",
         "partition_name": "dev"
       }
     ],
     "fsx_dns_name": "$SLURM_FSX_DNS_NAME",
     "fsx_mountname": "$SLURM_FSX_MOUNT_NAME",
     "slurm_configurations": {
       "slurm_database_secret_arn": "$SLURM_DB_SECRET_ARN",
       "slurm_database_endpoint": "$SLURM_DB_ENDPOINT_ADDRESS",
       "slurm_shared_directory": "/fsx",
       "slurm_database_user": "$DB_USER_NAME",
       "slurm_sns_arn": "$SLURM_SNS_FAILOVER_TOPIC_ARN"
     }
   }
   EOF
   ```

1. Upload the `provisioning_parameters.json` file to the same Amazon S3 bucket where you store the lifecycle scripts.

   ```
   aws s3 cp /tmp/provisioning_parameters.json s3://${ROOT_BUCKET_NAME}/LifeCycleScripts/base-config/provisioning_parameters.json
   ```
**Note**  
If you are using API-driven configuration, the `provisioning_parameters.json` file is not required. With API-driven configuration, you define Slurm node types, partitions, and FSx mounting directly in the CreateCluster API payload. For details, see [Getting started with SageMaker HyperPod using the Amazon CLI](smcluster-getting-started-slurm-cli.md).
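
If any of the environment variables were unset when the heredoc above ran, the rendered file will contain empty string values. A quick grep-based sanity check (a sketch, not an official validation step) can catch this before upload:

```shell
# Sketch: flag empty string values in the rendered configuration file,
# which indicate an environment variable was unset during templating.
check_provisioning_params() {
  local file="$1"
  if grep -q ': ""' "$file"; then
    echo "WARNING: $file contains empty values; check your environment variables" >&2
    return 1
  fi
  echo "No empty values found in $file"
}

# Example usage:
# check_provisioning_params /tmp/provisioning_parameters.json
```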

## Verify files in Amazon S3 bucket


After you upload all the lifecycle scripts and the `provisioning_parameters.json` file, your Amazon S3 bucket should look like the following.

![Image showing all the lifecycle scripts uploaded to the Amazon S3 bucket in the Amazon Simple Storage Service console.](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/hyperpod/hyperpod-lifecycle-scripts-s3.png)


For more information, see [Start with base lifecycle scripts provided by HyperPod](https://docs.amazonaws.cn//sagemaker/latest/dg/sagemaker-hyperpod-lifecycle-best-practices-slurm-slurm-base-config.html).

# Creating a SageMaker HyperPod cluster

After setting up all the required resources and uploading the scripts to the Amazon S3 bucket, you can create a cluster.

1. To create a cluster, run the [create-cluster](https://docs.amazonaws.cn//cli/latest/reference/sagemaker/create-cluster.html) Amazon CLI command. The creation process can take up to 15 minutes to complete.

   ```
   aws --region $REGION sagemaker create-cluster \
       --cluster-name $HP_CLUSTER_NAME \
       --vpc-config '{
           "SecurityGroupIds":["'$SECURITY_GROUP'"],
           "Subnets":["'$PRIMARY_SUBNET'", "'$BACKUP_SUBNET'"]
       }' \
       --instance-groups '[{                  
       "InstanceGroupName": "'$CONTOLLER_IG_NAME'",
       "InstanceType": "ml.t3.medium",
       "InstanceCount": 2,
       "LifeCycleConfig": {
            "SourceS3Uri": "s3://'$ROOT_BUCKET_NAME'/LifeCycleScripts/base-config",
           "OnCreate": "on_create.sh"
       },
       "ExecutionRole": "'$SLURM_EXECUTION_ROLE_ARN'",
       "ThreadsPerCore": 1
   },
   {
       "InstanceGroupName": "'$COMPUTE_IG_NAME'",          
       "InstanceType": "ml.c5.xlarge",
       "InstanceCount": 2,
       "LifeCycleConfig": {
            "SourceS3Uri": "s3://'$ROOT_BUCKET_NAME'/LifeCycleScripts/base-config",
           "OnCreate": "on_create.sh"
       },
       "ExecutionRole": "'$COMPUTE_NODE_ROLE'",
       "ThreadsPerCore": 1
   }]'
   ```

   After successful execution, the command returns the cluster ARN like the following.

   ```
   {
       "ClusterArn": "arn:aws:sagemaker:us-east-1:111122223333:cluster/cluster_id"
   }
   ```

1. (Optional) To check the status of your cluster, use the SageMaker AI console ([https://console.amazonaws.cn/sagemaker/](https://console.amazonaws.cn/sagemaker/)). From the left navigation, choose **HyperPod Clusters**, then choose **Cluster Management**. Choose the cluster name to open the cluster details page. If your cluster is created successfully, its status is **InService**.  
![Image showing a HyperPod Slurm cluster with multiple controller nodes in the Amazon SageMaker AI console.](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/hyperpod/hyperpod-lifecycle-multihead-cluster.png)
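
Rather than refreshing the console, you can poll the cluster status from the CLI. The `wait_for_cluster` helper below is a sketch (the function names are assumptions); it relies on the `describe-cluster` command's `ClusterStatus` field and the `$REGION` and `$HP_CLUSTER_NAME` variables from the earlier steps:

```shell
# describe_cluster_status wraps the AWS CLI call so the polling logic
# can be tested or reused independently.
describe_cluster_status() {
  aws --region "$REGION" sagemaker describe-cluster \
      --cluster-name "$HP_CLUSTER_NAME" \
      --query 'ClusterStatus' --output text
}

# Poll until the cluster is InService, or stop on Failed.
wait_for_cluster() {
  local status
  while true; do
    status=$(describe_cluster_status)
    case "$status" in
      InService) echo "Cluster is InService"; return 0 ;;
      Failed)    echo "Cluster creation failed" >&2; return 1 ;;
      *)         echo "Current status: $status; waiting..."; sleep 60 ;;
    esac
  done
}

# Example usage:
# wait_for_cluster
```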

# Considering important notes

This section provides several important notes that you might find helpful.

1. To migrate to a multi-controller Slurm cluster, complete these steps.

   1. Follow the instructions in [Provisioning resources using Amazon CloudFormation stacks](sagemaker-hyperpod-multihead-slurm-cfn.md) to provision all the required resources.

   1. Follow the instructions in [Preparing and uploading lifecycle scripts](sagemaker-hyperpod-multihead-slurm-scripts.md) to upload the updated lifecycle scripts. When updating the `provisioning_parameters.json` file, move your existing controller group to the `worker_groups` section, and add a new controller group name in the `controller_group` section.

   1. Run the [update-cluster](https://docs.amazonaws.cn/cli/latest/reference/sagemaker/update-cluster.html) CLI command to create the new controller group while keeping the original controller group and compute instance groups.

1. To scale down the number of controller nodes, use the [update-cluster](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/update-cluster.html) CLI command. For each controller instance group, the minimum number of controller nodes is 1; you cannot scale down to 0.
**Important**  
For clusters created before Jan 24, 2025, you must first update your cluster software using the [UpdateClusterSoftware](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_UpdateClusterSoftware.html) API before running the [update-cluster](https://docs.amazonaws.cn/cli/latest/reference/sagemaker/update-cluster.html) CLI command.

   The following is an example CLI command to scale down the number of controller nodes.

   ```
   aws sagemaker update-cluster \
       --cluster-name my_cluster \
       --instance-groups '[{                  
       "InstanceGroupName": "controller_ig_name",
       "InstanceType": "ml.t3.medium",
       "InstanceCount": 3,
       "LifeCycleConfig": {
           "SourceS3Uri": "s3://amzn-s3-demo-bucket1",
           "OnCreate": "on_create.sh"
       },
       "ExecutionRole": "slurm_execution_role_arn",
       "ThreadsPerCore": 1
   },
   {
       "InstanceGroupName": "compute-ig_name",       
       "InstanceType": "ml.c5.xlarge",
       "InstanceCount": 2,
       "LifeCycleConfig": {
           "SourceS3Uri": "s3://amzn-s3-demo-bucket1",
           "OnCreate": "on_create.sh"
       },
       "ExecutionRole": "compute_node_role_arn",
       "ThreadsPerCore": 1
   }]'
   ```

1. To batch delete controller nodes, use the [batch-delete-cluster-nodes](https://docs.amazonaws.cn/cli/latest/reference/sagemaker/batch-delete-cluster-nodes.html) CLI command. For each controller instance group, you must keep at least one controller node; the API operation fails if you attempt to delete all the controller nodes in a group.
**Important**  
For clusters created before Jan 24, 2025, you must first update your cluster software using the [UpdateClusterSoftware](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_UpdateClusterSoftware.html) API before running the [batch-delete-cluster-nodes](https://docs.amazonaws.cn/cli/latest/reference/sagemaker/batch-delete-cluster-nodes.html) CLI command.

   The following is an example CLI command to batch delete the controller nodes.

   ```
   aws sagemaker batch-delete-cluster-nodes --cluster-name my_cluster --node-ids instance_ids_to_delete
   ```

1. To troubleshoot cluster creation issues, check the failure message on the cluster details page in the SageMaker AI console. You can also use CloudWatch Logs: from the CloudWatch console, choose **Log groups**, then search for `clusters` to see the log groups related to your cluster creation.  
![Image showing Amazon SageMaker HyperPod cluster log groups in the CloudWatch console.](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/hyperpod/hyperpod-lifecycle-multihead-logs.png)

# Reviewing environment variables reference

The following environment variables are defined and used in the tutorial of [Setting up multiple controller nodes for a SageMaker HyperPod Slurm cluster](sagemaker-hyperpod-multihead-slurm-setup.md). These environment variables are available only in the current session unless explicitly preserved, and they are referenced using the `$variable_name` syntax. Variables populated from CloudFormation stack outputs represent Amazon-created resources, while the remaining variables hold user-defined values.


**Environment variables reference**  

| Variable | Description | 
| --- | --- | 
| `$BACKUP_SUBNET` | The private subnet in the backup Availability Zone, from the `sagemaker-hyperpod` stack output `BackupPrivateSubnet`. | 
| `$COMPUTE_IG_NAME` | The name you define for the compute (worker) node instance group. | 
| `$COMPUTE_NODE_ROLE` | The execution role ARN for the compute node instance group, from the `sagemaker-hyperpod` stack output `AmazonSagemakerClusterExecutionRoleArn`. | 
| `$CONTOLLER_IG_NAME` | The name you define for the controller node instance group. | 
| `$DB_USER_NAME` | The database user name you define for the Amazon RDS for MariaDB instance. | 
| `$EMAIL` | The email address you define to receive Amazon SNS notifications. | 
| `$PRIMARY_SUBNET` | The private subnet in the primary Availability Zone, from the `sagemaker-hyperpod` stack output `PrimaryPrivateSubnet`. | 
| `$POLICY` | The ARN of the `AmazonSagemakerExecutionPolicy` IAM policy. | 
| `$REGION` | The Region of your cluster, for example `us-east-1`. | 
| `$ROOT_BUCKET_NAME` | The name of the Amazon S3 bucket that stores the lifecycle scripts, from the `sagemaker-hyperpod` stack output `AmazonS3BucketName`. | 
| `$SECURITY_GROUP` | The security group ID, from the `sagemaker-hyperpod` stack output `SecurityGroup`. | 
| `$SLURM_DB_ENDPOINT_ADDRESS` | The endpoint address of the Amazon RDS for MariaDB instance, from the `sagemaker-hyperpod-mh` stack output `SlurmDBEndpointAddress`. | 
| `$SLURM_DB_SECRET_ARN` | The ARN of the Secrets Manager secret that stores the database credentials, from the `sagemaker-hyperpod-mh` stack output `SlurmDBSecretArn`. | 
| `$SLURM_EXECUTION_ROLE_ARN` | The execution role ARN for the controller node instance group, from the `sagemaker-hyperpod-mh` stack output `SlurmExecutionRoleArn`. | 
| `$SLURM_FSX_DNS_NAME` | The DNS name of the Amazon FSx file system, from the `sagemaker-hyperpod` stack output `FSxLustreFilesystemDNSname`. | 
| `$SLURM_FSX_MOUNT_NAME` | The mount name of the Amazon FSx file system, from the `sagemaker-hyperpod` stack output `FSxLustreFilesystemMountname`. | 
| `$SLURM_SNS_FAILOVER_TOPIC_ARN` | The ARN of the Amazon SNS topic used for failover notifications, from the `sagemaker-hyperpod-mh` stack output `SlurmFailOverSNSTopicArn`. | 