
Examples

The following example configurations demonstrate Amazon ParallelCluster configurations that use the Slurm, Torque, and Amazon Batch schedulers.

Note

Starting with version 2.11.5, Amazon ParallelCluster doesn't support the use of SGE or Torque schedulers.

Slurm Workload Manager (`slurm`)

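The following example launches a cluster with the slurm scheduler. The example configuration launches one cluster with two job queues. The first queue, spot, initially has two t3.micro Spot Instances available. It can scale up to a maximum of 10 instances, and scales down to a minimum of 1 instance when no jobs have run for 10 minutes (adjustable using the scaledown_idletime setting). The second queue, ondemand, starts with no instances and can scale up to a maximum of 5 t3.micro On-Demand Instances.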


[global]
update_check = true
sanity_check = true
cluster_template = slurm

[aws]
aws_region_name = <your AWS Region>

[vpc public]
master_subnet_id = <your subnet>
vpc_id = <your VPC>

[cluster slurm]
key_name = <your EC2 keypair name>
base_os = alinux2                   # optional, defaults to alinux2
scheduler = slurm
master_instance_type = t3.micro     # optional, defaults to t3.micro
vpc_settings = public
queue_settings = spot,ondemand

[queue spot]
compute_resource_settings = spot_i1
compute_type = spot                 # optional, defaults to ondemand

[compute_resource spot_i1]
instance_type = t3.micro
min_count = 1                       # optional, defaults to 0
initial_count = 2                   # optional, defaults to 0

[queue ondemand]
compute_resource_settings = ondemand_i1

[compute_resource ondemand_i1]
instance_type = t3.micro
max_count = 5                       # optional, defaults to 10
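
The relationship between the `[cluster]`, `[queue]`, and `[compute_resource]` sections can be easier to see when the file is resolved programmatically. The following is a minimal sketch, not part of Amazon ParallelCluster itself, that uses Python's standard `configparser` module to follow `queue_settings` and `compute_resource_settings` and report the effective limits of each queue. The function name `summarize_queues` and the `DEFAULTS` table are hypothetical helpers; the default values are taken only from the comments in the example configuration above.

```python
import configparser

# Trimmed copy of the Slurm example above (only the sections the sketch reads).
CONFIG_TEXT = """
[cluster slurm]
scheduler = slurm
queue_settings = spot,ondemand

[queue spot]
compute_resource_settings = spot_i1
compute_type = spot                 # optional, defaults to ondemand

[compute_resource spot_i1]
instance_type = t3.micro
min_count = 1                       # optional, defaults to 0
initial_count = 2                   # optional, defaults to 0

[queue ondemand]
compute_resource_settings = ondemand_i1

[compute_resource ondemand_i1]
instance_type = t3.micro
max_count = 5                       # optional, defaults to 10
"""

# Assumption: defaults as stated in the comments of the example configuration.
DEFAULTS = {"min_count": 0, "initial_count": 0, "max_count": 10}


def summarize_queues(text, cluster="slurm"):
    """Return {queue: {"compute_type": ..., "resources": {...}}} for one cluster."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # strip the trailing "# optional, ..." notes
        interpolation=None,
    )
    parser.read_string(text)

    summary = {}
    for queue in parser[f"cluster {cluster}"]["queue_settings"].split(","):
        queue = queue.strip()
        queue_section = parser[f"queue {queue}"]
        resources = {}
        for name in queue_section["compute_resource_settings"].split(","):
            name = name.strip()
            resource_section = parser[f"compute_resource {name}"]
            # Fall back to the documented default when a key is omitted.
            limits = {
                key: resource_section.getint(key, fallback=default)
                for key, default in DEFAULTS.items()
            }
            limits["instance_type"] = resource_section["instance_type"]
            resources[name] = limits
        summary[queue] = {
            "compute_type": queue_section.get("compute_type", fallback="ondemand"),
            "resources": resources,
        }
    return summary


if __name__ == "__main__":
    for queue, info in summarize_queues(CONFIG_TEXT).items():
        print(queue, info)
```

Resolving the file this way makes the implicit values visible: the spot queue omits `max_count`, so it picks up the default of 10, and the ondemand queue omits `min_count` and `initial_count`, so it starts empty.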

Son of Grid Engine (`sge`) 和 Torque Resource Manager (`torque`)

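Note

This example only applies to Amazon ParallelCluster versions up to and including version 2.11.4. Starting with version 2.11.5, Amazon ParallelCluster doesn't support the use of SGE or Torque schedulers.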

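The following example launches a cluster with the torque or sge scheduler. To use SGE, change scheduler = torque to scheduler = sge. The example configuration allows a maximum of 5 concurrent nodes, and scales down to 2 nodes when no jobs have run for 10 minutes.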


[global]
update_check = true
sanity_check = true
cluster_template = torque

[aws]
aws_region_name = <your AWS Region>

[vpc public]
master_subnet_id = <your subnet>
vpc_id = <your VPC>

[cluster torque]
key_name = <your EC2 keypair name>
base_os = alinux2                   # optional, defaults to alinux2
scheduler = torque                  # optional, defaults to sge
master_instance_type = t3.micro     # optional, defaults to t3.micro
vpc_settings = public
initial_queue_size = 2              # optional, defaults to 0
maintain_initial_size = true        # optional, defaults to false
max_queue_size = 5                  # optional, defaults to 10
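
The scale-down floor in the Torque example above comes from combining `initial_queue_size` with `maintain_initial_size`. As a rough illustration, the following sketch reads those keys with Python's standard `configparser` module and derives the node range. The function name `scaling_bounds` is hypothetical, and the fallback values are taken only from the comments in the example configuration. It is a sketch of the rule described in the text, not of how Amazon ParallelCluster implements it.

```python
import configparser

# Trimmed copy of the [cluster torque] section above.
CONFIG_TEXT = """
[cluster torque]
scheduler = torque                  # optional, defaults to sge
initial_queue_size = 2              # optional, defaults to 0
maintain_initial_size = true        # optional, defaults to false
max_queue_size = 5                  # optional, defaults to 10
"""


def scaling_bounds(text, cluster):
    """Return (minimum_nodes, maximum_nodes) for a [cluster <name>] section."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # strip the trailing "# optional, ..." notes
        interpolation=None,
    )
    parser.read_string(text)
    section = parser[f"cluster {cluster}"]

    initial = section.getint("initial_queue_size", fallback=0)
    keep_initial = section.getboolean("maintain_initial_size", fallback=False)
    maximum = section.getint("max_queue_size", fallback=10)

    # With maintain_initial_size = true the fleet never shrinks below the
    # initial size; otherwise it can scale down to zero.
    minimum = initial if keep_initial else 0
    return minimum, maximum


if __name__ == "__main__":
    print(scaling_bounds(CONFIG_TEXT, "torque"))
```

For the example configuration this yields a range of 2 to 5 nodes, which matches the description above. Setting `maintain_initial_size = false` in the same text would lower the floor to 0.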

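Note

Starting with version 2.11.5, Amazon ParallelCluster doesn't support the use of SGE or Torque schedulers. If you use versions that include these schedulers, you can continue using them, but they aren't eligible for future updates or troubleshooting support from the Amazon service and Support teams.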

Amazon Batch (`awsbatch`)

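The following example launches a cluster with the awsbatch scheduler. It's set to select the better instance type based on your job resource needs.

The example configuration allows a maximum of 40 concurrent vCPUs, and scales down to zero when no jobs have run for 10 minutes (adjustable using the scaledown_idletime setting).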


[global]
update_check = true
sanity_check = true
cluster_template = awsbatch

[aws]
aws_region_name = <your AWS Region>

[vpc public]
master_subnet_id = <your subnet>
vpc_id = <your VPC>

[cluster awsbatch]
scheduler = awsbatch
compute_instance_type = optimal # optional, defaults to optimal
min_vcpus = 0                   # optional, defaults to 0
desired_vcpus = 0               # optional, defaults to 4
max_vcpus = 40                  # optional, defaults to 20
base_os = alinux2               # optional, defaults to alinux2, controls the base_os of
                                # the head node and the docker image for the compute fleet
key_name = <your EC2 keypair name>
vpc_settings = public
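
Before launching a cluster, it can help to confirm that the three vCPU settings are consistent with each other. The following sketch uses Python's standard `configparser` module to read `min_vcpus`, `desired_vcpus`, and `max_vcpus` and check that they are ordered. The function name `vcpu_limits` is hypothetical, the fallback values come only from the comments in the example configuration, and the ordering check `min <= desired <= max` is an assumption made for illustration, not a rule quoted from the Amazon ParallelCluster documentation.

```python
import configparser

# Trimmed copy of the [cluster awsbatch] section above.
CONFIG_TEXT = """
[cluster awsbatch]
scheduler = awsbatch
compute_instance_type = optimal # optional, defaults to optimal
min_vcpus = 0                   # optional, defaults to 0
desired_vcpus = 0               # optional, defaults to 4
max_vcpus = 40                  # optional, defaults to 20
"""


def vcpu_limits(text, cluster):
    """Return (min_vcpus, desired_vcpus, max_vcpus) or raise ValueError."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # strip the trailing "# optional, ..." notes
        interpolation=None,
    )
    parser.read_string(text)
    section = parser[f"cluster {cluster}"]

    minimum = section.getint("min_vcpus", fallback=0)
    desired = section.getint("desired_vcpus", fallback=4)
    maximum = section.getint("max_vcpus", fallback=20)

    # Assumed sanity rule: the desired capacity lies between the two bounds.
    if not minimum <= desired <= maximum:
        raise ValueError(
            f"expected min_vcpus <= desired_vcpus <= max_vcpus, "
            f"got {minimum}, {desired}, {maximum}"
        )
    return minimum, desired, maximum


if __name__ == "__main__":
    print(vcpu_limits(CONFIG_TEXT, "awsbatch"))
```

For the example configuration the check passes with a range of 0 to 40 vCPUs and a desired capacity of 0, so the compute fleet starts empty and grows only when jobs are submitted.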
