使用 Amazon ParallelCluster 和 awsbatch 调度器运行 MPI 作业 - Amazon ParallelCluster
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

使用 Amazon ParallelCluster 和 awsbatch 调度器运行 MPI 作业

本教程将指导您完成使用 awsbatch 作为计划程序运行 MPI 作业的过程。

先决条件

创建集群

首先,我们为使用 awsbatch 作为计划程序的集群创建一个配置。确保将 vpc 部分和 key_name 字段中的缺失数据与配置时创建的资源一起插入。

[global] sanity_check = true [aws] aws_region_name = us-east-1 [cluster awsbatch] base_os = alinux # Replace with the name of the key you intend to use. key_name = key-####### vpc_settings = my-vpc scheduler = awsbatch compute_instance_type = optimal min_vcpus = 2 desired_vcpus = 2 max_vcpus = 24 [vpc my-vpc] # Replace with the id of the vpc you intend to use. vpc_id = vpc-####### # Replace with id of the subnet for the Head node. master_subnet_id = subnet-####### # Replace with id of the subnet for the Compute nodes. # A NAT Gateway is required for MNP. compute_subnet_id = subnet-#######

现在,您可以开始创建集群了。我们将创建的集群称为 awsbatch-tutorial

$ pcluster create -c /path/to/the/created/config/aws_batch.config -t awsbatch awsbatch-tutorial

创建集群时,您会看到类似以下内容的输出:

Beginning cluster creation for cluster: awsbatch-tutorial Creating stack named: parallelcluster-awsbatch Status: parallelcluster-awsbatch - CREATE_COMPLETE MasterPublicIP: 54.160.xxx.xxx ClusterUser: ec2-user MasterPrivateIP: 10.0.0.15

登录到头节点

Amazon ParallelCluster Batch CLI 命令在安装了 Amazon ParallelCluster 的客户端计算机上可用。但是,我们将通过 SSH 进入头节点并从那里提交作业。这使我们能够利用在头实例和所有运行 Amazon Batch 作业的 Docker 实例之间共享的 NFS 卷。

使用您的 SSH pem 文件登录到头节点。

$ pcluster ssh awsbatch-tutorial -i /path/to/keyfile.pem

登录后,请运行命令 awsbqueuesawsbhosts,以显示配置的 Amazon Batch 队列和正在运行的 Amazon ECS 实例。

[ec2-user@ip-10-0-0-111 ~]$ awsbqueues jobQueueName status --------------------------------- -------- parallelcluster-awsbatch-tutorial VALID [ec2-user@ip-10-0-0-111 ~]$ awsbhosts ec2InstanceId instanceType privateIpAddress publicIpAddress runningJobs ------------------- -------------- ------------------ ----------------- ------------- i-0d6a0c8c560cd5bed m4.large 10.0.0.235 34.239.174.236 0

如您在输出中看到的,我们有一个正在运行的主机。这是由于我们在配置中为 min_vcpus 选择的值导致的。如果要显示有关 Amazon Batch 队列和主机的其他详细信息,请将 -d 标志添加到命令。

使用 Amazon Batch 运行首个作业

在迁移到 MPI 之前,我们先创建一个虚拟作业,此作业休眠一小段时间,然后输出其自己的主机名,同时问候作为参数传递的名称。

使用以下内容创建名为“hellojob.sh”的文件。

#!/bin/bash sleep 30 echo "Hello $1 from $HOSTNAME" echo "Hello $1 from $HOSTNAME" > "/shared/secret_message_for_${1}_by_${AWS_BATCH_JOB_ID}"

接下来,使用 awsbsub 提交作业并验证其是否运行。

$ awsbsub -jn hello -cf hellojob.sh Luca Job 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 (hello) has been submitted.

查看您的队列并检查该作业的状态。

$ awsbstat jobId jobName status startedAt stoppedAt exitCode ------------------------------------ ----------- -------- ------------------- ----------- ---------- 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 hello RUNNING 2018-11-12 09:41:29 - -

输出提供了改作业的详细信息。

$ awsbstat 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 jobId : 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 jobName : hello createdAt : 2018-11-12 09:41:21 startedAt : 2018-11-12 09:41:29 stoppedAt : - status : RUNNING statusReason : - jobDefinition : parallelcluster-myBatch:1 jobQueue : parallelcluster-myBatch command : /bin/bash -c 'aws s3 --region us-east-1 cp s3://parallelcluster-mybatch-lui1ftboklhpns95/batch/job-hellojob_sh-1542015680924.sh /tmp/batch/job-hellojob_sh-1542015680924.sh; bash /tmp/batch/job-hellojob_sh-1542015680924.sh Luca' exitCode : - reason : - vcpus : 1 memory[MB] : 128 nodes : 1 logStream : parallelcluster-myBatch/default/c75dac4a-5aca-4238-a4dd-078037453554 log : https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/batch/job;stream=parallelcluster-myBatch/default/c75dac4a-5aca-4238-a4dd-078037453554 -------------------------

请注意,作业当前处于 RUNNING 状态。请等候 30 秒,以便作业完成,然后再次运行 awsbstat

$ awsbstat jobId jobName status startedAt stoppedAt exitCode ------------------------------------ ----------- -------- ------------------- ----------- ----------

现在,您可以看到作业处于 SUCCEEDED 状态。

$ awsbstat -s SUCCEEDED jobId jobName status startedAt stoppedAt exitCode ------------------------------------ ----------- --------- ------------------- ------------------- ---------- 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 hello SUCCEEDED 2018-11-12 09:41:29 2018-11-12 09:42:00 0

由于队列中现在没有作业,因此,我们可以通过 awsbout 命令检查输出。

$ awsbout 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 2018-11-12 09:41:29: Starting Job 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 download: s3://parallelcluster-mybatch-lui1ftboklhpns95/batch/job-hellojob_sh-1542015680924.sh to tmp/batch/job-hellojob_sh-1542015680924.sh 2018-11-12 09:42:00: Hello Luca from ip-172-31-4-234

我们可以看到作业在实例“ip-172-31-4-234”上成功运行。

如果您查看 /shared 目录,您将找到您的私有消息。

要浏览本教程中未涵盖的所有可用功能,请参阅 Amazon ParallelCluster Batch CLI 文档。在您准备好继续本教程后,我们继续并查看如何提交 MPI 作业。

在多节点并行环境中运行 MPI 作业

当仍然登录到头节点时,在 /shared 目录中创建一个名为 mpi_hello_world.c 的文件。将以下 MPI 程序添加到该文件中:

// Copyright 2011 www.mpitutorial.com // // An intro MPI hello world program that uses MPI_Init, MPI_Comm_size, // MPI_Comm_rank, MPI_Finalize, and MPI_Get_processor_name. // #include <mpi.h> #include <stdio.h> #include <stddef.h> int main(int argc, char** argv) { // Initialize the MPI environment. The two arguments to MPI Init are not // currently used by MPI implementations, but are there in case future // implementations might need the arguments. MPI_Init(NULL, NULL); // Get the number of processes int world_size; MPI_Comm_size(MPI_COMM_WORLD, &world_size); // Get the rank of the process int world_rank; MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); // Get the name of the processor char processor_name[MPI_MAX_PROCESSOR_NAME]; int name_len; MPI_Get_processor_name(processor_name, &name_len); // Print off a hello world message printf("Hello world from processor %s, rank %d out of %d processors\n", processor_name, world_rank, world_size); // Finalize the MPI environment. No more MPI calls can be made after this MPI_Finalize(); }

现在,将以下代码保存为 submit_mpi.sh

#!/bin/bash echo "ip container: $(/sbin/ip -o -4 addr list eth0 | awk '{print $4}' | cut -d/ -f1)" echo "ip host: $(curl -s "http://169.254.169.254/latest/meta-data/local-ipv4")" # get shared dir IFS=',' _shared_dirs=(${PCLUSTER_SHARED_DIRS}) _shared_dir=${_shared_dirs[0]} _job_dir="${_shared_dir}/${AWS_BATCH_JOB_ID%#*}-${AWS_BATCH_JOB_ATTEMPT}" _exit_code_file="${_job_dir}/batch-exit-code" if [[ "${AWS_BATCH_JOB_NODE_INDEX}" -eq "${AWS_BATCH_JOB_MAIN_NODE_INDEX}" ]]; then echo "Hello I'm the main node $HOSTNAME! I run the mpi job!" mkdir -p "${_job_dir}" echo "Compiling..." /usr/lib64/openmpi/bin/mpicc -o "${_job_dir}/mpi_hello_world" "${_shared_dir}/mpi_hello_world.c" echo "Running..." /usr/lib64/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --allow-run-as-root --machinefile "${HOME}/hostfile" "${_job_dir}/mpi_hello_world" # Write exit status code echo "0" > "${_exit_code_file}" # Waiting for compute nodes to terminate sleep 30 else echo "Hello I'm the compute node $HOSTNAME! I let the main node orchestrate the mpi processing!" # Since mpi orchestration happens on the main node, we need to make sure the containers representing the compute # nodes are not terminated. A simple trick is to wait for a file containing the status code to be created. # All compute nodes are terminated by Amazon Batch if the main node exits abruptly. while [ ! -f "${_exit_code_file}" ]; do sleep 2 done exit $(cat "${_exit_code_file}") fi

我们现在已准备就绪,可以提交第一个 MPI 作业并使其在 3 个节点上并发运行:

$ awsbsub -n 3 -cf submit_mpi.sh

现在,我们监控作业状态并等待其进入 RUNNING 状态:

$ watch awsbstat -d

在作业进入 RUNNING 状态后,我们可以查看其输出。要显示主节点的输出,请将 #0 附加到作业 ID。要显示计算节点的输出,请使用 #1#2

[ec2-user@ip-10-0-0-111 ~]$ awsbout -s 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#0 2018-11-27 15:50:10: Job id: 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#0 2018-11-27 15:50:10: Initializing the environment... 2018-11-27 15:50:10: Starting ssh agents... 2018-11-27 15:50:11: Agent pid 7 2018-11-27 15:50:11: Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa) 2018-11-27 15:50:11: Mounting shared file system... 2018-11-27 15:50:11: Generating hostfile... 2018-11-27 15:50:11: Detected 1/3 compute nodes. Waiting for all compute nodes to start. 2018-11-27 15:50:26: Detected 1/3 compute nodes. Waiting for all compute nodes to start. 2018-11-27 15:50:41: Detected 1/3 compute nodes. Waiting for all compute nodes to start. 2018-11-27 15:50:56: Detected 3/3 compute nodes. Waiting for all compute nodes to start. 2018-11-27 15:51:11: Starting the job... download: s3://parallelcluster-awsbatch-tutorial-iwyl4458saiwgwvg/batch/job-submit_mpi_sh-1543333713772.sh to tmp/batch/job-submit_mpi_sh-1543333713772.sh 2018-11-27 15:51:12: ip container: 10.0.0.180 2018-11-27 15:51:12: ip host: 10.0.0.245 2018-11-27 15:51:12: Compiling... 2018-11-27 15:51:12: Running... 2018-11-27 15:51:12: Hello I'm the main node! I run the mpi job! 2018-11-27 15:51:12: Warning: Permanently added '10.0.0.199' (RSA) to the list of known hosts. 2018-11-27 15:51:12: Warning: Permanently added '10.0.0.147' (RSA) to the list of known hosts. 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-180.ec2.internal, rank 1 out of 6 processors 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-199.ec2.internal, rank 5 out of 6 processors 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-180.ec2.internal, rank 0 out of 6 processors 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-199.ec2.internal, rank 4 out of 6 processors 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-147.ec2.internal, rank 2 out of 6 processors 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-147.ec2.internal, rank 3 out of 6 processors [ec2-user@ip-10-0-0-111 ~]$ awsbout -s 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#1 2018-11-27 15:50:52: Job id: 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#1 2018-11-27 15:50:52: Initializing the environment... 2018-11-27 15:50:52: Starting ssh agents... 2018-11-27 15:50:52: Agent pid 7 2018-11-27 15:50:52: Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa) 2018-11-27 15:50:52: Mounting shared file system... 2018-11-27 15:50:52: Generating hostfile... 2018-11-27 15:50:52: Starting the job... download: s3://parallelcluster-awsbatch-tutorial-iwyl4458saiwgwvg/batch/job-submit_mpi_sh-1543333713772.sh to tmp/batch/job-submit_mpi_sh-1543333713772.sh 2018-11-27 15:50:53: ip container: 10.0.0.199 2018-11-27 15:50:53: ip host: 10.0.0.227 2018-11-27 15:50:53: Compiling... 2018-11-27 15:50:53: Running... 2018-11-27 15:50:53: Hello I'm a compute node! I let the main node orchestrate the mpi execution!

我们现在可以确认作业已成功完成:

[ec2-user@ip-10-0-0-111 ~]$ awsbstat -s ALL jobId jobName status startedAt stoppedAt exitCode ------------------------------------ ------------- --------- ------------------- ------------------- ---------- 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d submit_mpi_sh SUCCEEDED 2018-11-27 15:50:10 2018-11-27 15:51:26 -

注意:如果要在作业结束之前终止作业,您可以使用 awsbkill 命令。