设置和可视化 HyperPod 任务管理仪表板需要以下权限。本节扩展了中列出的权限集群管理员的 IAM 用户。
要管理任务监管,请使用示例策略:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"sagemaker:ListClusters",
"sagemaker:DescribeCluster",
"sagemaker:ListComputeQuotas",
"sagemaker:CreateComputeQuota",
"sagemaker:UpdateComputeQuota",
"sagemaker:DescribeComputeQuota",
"sagemaker:DeleteComputeQuota",
"sagemaker:ListClusterSchedulerConfigs",
"sagemaker:DescribeClusterSchedulerConfig",
"sagemaker:CreateClusterSchedulerConfig",
"sagemaker:UpdateClusterSchedulerConfig",
"sagemaker:DeleteClusterSchedulerConfig",
"eks:ListAddons",
"eks:CreateAddon",
"eks:DescribeAddon",
"eks:DescribeCluster",
"eks:DescribeAccessEntry",
"eks:ListAssociatedAccessPolicies",
"eks:AssociateAccessPolicy",
"eks:DisassociateAccessPolicy"
],
"Resource": "*"
}
]
}
要授予管理 Amazon O CloudWatch bservability Amazon EKS 和通过 SageMaker AI 控制台查看 HyperPod 集群控制面板的权限,请使用以下示例策略:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"eks:ListAddons",
"eks:CreateAddon",
"eks:UpdateAddon",
"eks:DescribeAddon",
"eks:DescribeAddonVersions",
"sagemaker:DescribeCluster",
"sagemaker:DescribeClusterNode",
"sagemaker:ListClusterNodes",
"sagemaker:ListClusters",
"sagemaker:ListComputeQuotas",
"sagemaker:DescribeComputeQuota",
"sagemaker:ListClusterSchedulerConfigs",
"sagemaker:DescribeClusterSchedulerConfig",
"eks:DescribeCluster",
"cloudwatch:GetMetricData",
"eks:AccessKubernetesApi"
],
"Resource": "*"
}
]
}
导航到控制台中的 “ SageMaker HyperPod 控制面板” 选项卡以安装 Amazon O CloudWatch bservability EKS。要确保控制面板中包含与任务治理相关的指标,请启用 Kueue 指标复选框。在达到免费套餐限制后,启用 Kueue CloudWatch 指标会启用指标费用。有关更多信息,请参阅 Amazon CloudWatch 定价中的指标。
使用以下 EKS Amazon CLI 命令安装插件:
aws eks create-addon --cluster-name cluster-name
--addon-name amazon-cloudwatch-observability
--configuration-values "configuration json
"
以下是配置值的 JSON 示例:
{
"agent": {
"config": {
"logs": {
"metrics_collected": {
"kubernetes": {
"kueue_container_insights": true,
"enhanced_container_insights": true
},
"application_signals": { }
}
},
"traces": {
"traces_collected": {
"application_signals": { }
}
}
},
},
}