Cluster management with custom AMIs
After the custom AMI is built, you can use it for creating or updating an Amazon SageMaker HyperPod cluster. You can also scale up or add instance groups that use the new AMI.
Permissions required for cluster operations
Add the following permissions to the cluster admin user who operates and configures SageMaker HyperPod clusters. The following policy example includes the minimum set of permissions for cluster administrators to run the SageMaker HyperPod core APIs and manage SageMaker HyperPod clusters with custom AMI.
Note that AMI and AMI EBS snapshot sharing permissions are included through
ModifyImageAttribute
and ModifySnapshotAttribute
API permissions as part of the following policy. For scoping down the sharing
permissions, you can take the following steps:
-
Add tags to control the AMI sharing permissions to AMI and AMI snapshot. For example, you can tag the AMI with
AllowSharing
astrue
. -
Add the context key in the policy to only allow AMI sharing for AMIs tagged with certain tags.
The following policy is a scoped down policy to ensure only AMIs
tagged with AllowSharing
as true
are
allowed.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "iam:PassRole", "Resource": "
arn:aws:iam::111122223333:role/your-execution-role-name
" }, { "Effect": "Allow", "Action": [ "sagemaker:CreateCluster", "sagemaker:DeleteCluster", "sagemaker:DescribeCluster", "sagemaker:DescribeCluterNode", "sagemaker:ListClusterNodes", "sagemaker:ListClusters", "sagemaker:UpdateCluster", "sagemaker:UpdateClusterSoftware", "sagemaker:BatchDeleteClusterNodes", "eks:DescribeCluster", "eks:CreateAccessEntry", "eks:DescribeAccessEntry", "eks:DeleteAccessEntry", "eks:AssociateAccessPolicy", "iam:CreateServiceLinkedRole", "ec2:DescribeImages", "ec2:DescribeSnapshots" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ec2:ModifyImageAttribute", "ec2:ModifySnapshotAttribute" ], "Resource": "*", "Condition": { "StringEquals": { "ec2:ResourceTag/AllowSharing": "true" } } } ] }
Create a cluster
You can specify your custom AMI in the ImageId
field for the
CreateCluster
operation.
The following example shows how to create a cluster with a custom AMI:
aws sagemaker create-cluster \ --cluster-name
<exampleClusterName>
\ --orchestrator 'Eks={ClusterArn='<eks_cluster_arn>
'}' \ --node-provisioning-mode Continuous \ --instance-groups '{ "InstanceGroupName": "<exampleGroupName>
", "InstanceType": "ml.c5.2xlarge", "InstanceCount": 2, "LifeCycleConfig": { "SourceS3Uri": "<s3://amzn-s3-demo-bucket>
", "OnCreate": "on_create_noop.sh" }, "ImageId": "<your_custom_ami>
", "ExecutionRole": "<arn:aws:iam::444455556666:role/Admin>
", "ThreadsPerCore": 1, "InstanceStorageConfigs": [ { "EbsVolumeConfig": { "VolumeSizeInGB": 200 } } ] }' --vpc-config '{ "SecurityGroupIds": ["<security_group>
"], "Subnets": ["<subnet>
"] }'
Update the cluster software
If you want to update an existing instance group on your cluster with your
custom AMI, you can use the UpdateClusterSoftware
operation and specify
your custom AMI in the ImageId
field. Note that unless you specify
the name of a specific instance group in your request, then the new image is applied to all
of the instance groups in your cluster.
The following example shows how to update a cluster's platform software with a custom AMI:
aws sagemaker update-cluster-software \ --cluster-name
<exampleClusterName>
\ --instance-groups<instanceGroupToUpdate>
\ --image-id<customAmiId>
Scale up an instance group
The following example shows how to scale up an instance group for a cluster using a custom AMI:
aws sagemaker update-cluster \ --cluster-name
<exampleClusterName>
--instance-groups '[{ "InstanceGroupName": "<exampleGroupName>
", "InstanceType": "ml.c5.2xlarge", "InstanceCount": 2, "LifeCycleConfig": { "SourceS3Uri": "<s3://amzn-s3-demo-bucket>
", "OnCreate": "on_create_noop.sh" }, "ExecutionRole": "<arn:aws:iam::444455556666:role/Admin>
", "ThreadsPerCore": 1, "ImageId": "<your_custom_ami>
" }]'
Add an instance group
The following example shows how to add an instance group to a cluster using a custom AMI:
aws sagemaker update-cluster \ --cluster-name "
<exampleClusterName>
" \ --instance-groups '{ "InstanceGroupName": "<exampleGroupName>
", "InstanceType": "ml.c5.2xlarge", "InstanceCount": 2, "LifeCycleConfig": { "SourceS3Uri": "<s3://amzn-s3-demo-bucket>
", "OnCreate": "on_create_noop.sh" }, "ExecutionRole": "<arn:aws:iam::444455556666:role/Admin>
", "ThreadsPerCore": 1, "ImageId": "<your_custom_ami>
" }' '{ "InstanceGroupName": "<exampleGroupName2>
", "InstanceType": "ml.c5.2xlarge", "InstanceCount": 1, "LifeCycleConfig": { "SourceS3Uri": "<s3://amzn-s3-demo-bucket>
", "OnCreate": "on_create_noop.sh" }, "ExecutionRole": "<arn:aws:iam::444455556666:role/Admin>
", "ThreadsPerCore": 1, "ImageId": "<your_custom_ami>
" }'