使用运行 MPI 作业Amazon ParallelCluster和awsbatch计划程序 - Amazon ParallelCluster
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

使用运行 MPI 作业Amazon ParallelCluster和awsbatch计划程序

本教程将指导您完成使用 awsbatch 作为计划程序运行 MPI 作业的过程。

先决条件

创建集群

首先,我们为使用 awsbatch 作为计划程序的集群创建一个配置。确保将 vpc 部分和 key_name 字段中的缺失数据与配置时创建的资源一起插入。

[global] sanity_check = true [aws] aws_region_name = us-east-1 [cluster awsbatch] base_os = alinux # Replace with the name of the key you intend to use. key_name = key-####### vpc_settings = my-vpc scheduler = awsbatch compute_instance_type = optimal min_vcpus = 2 desired_vcpus = 2 max_vcpus = 24 [vpc my-vpc] # Replace with the id of the vpc you intend to use. vpc_id = vpc-####### # Replace with id of the subnet for the Head node. master_subnet_id = subnet-####### # Replace with id of the subnet for the Compute nodes. # A NAT Gateway is required for MNP. compute_subnet_id = subnet-#######

现在,您可以开始创建集群了。我们将创建的集群称为 awsbatch-tutorial

$ pcluster create -c /path/to/the/created/config/aws_batch.config -t awsbatch awsbatch-tutorial

创建集群时,您会看到类似以下内容的输出:

Beginning cluster creation for cluster: awsbatch-tutorial Creating stack named: parallelcluster-awsbatch Status: parallelcluster-awsbatch - CREATE_COMPLETE MasterPublicIP: 54.160.xxx.xxx ClusterUser: ec2-user MasterPrivateIP: 10.0.0.15

登录你的 head 节点

Amazon ParallelCluster Batch CLI 命令在安装了 Amazon ParallelCluster 的客户端计算机上可用。但是,我们将通过 SSH 进入头节点并从那里提交作业。这使我们能够利用在头实例和所有运行的 Docker 实例之间共享的 NFS 卷Amazon Batch个作业。

使用您的 SSH pem 文件登录到您的头节点。

$ pcluster ssh awsbatch-tutorial -i /path/to/keyfile.pem

登录后,请运行命令awsbqueuesawsbhosts显示已配置的Amazon Batch队列和正在运行的 Amazon ECS 实例。

[ec2-user@ip-10-0-0-111 ~]$ awsbqueues jobQueueName status --------------------------------- -------- parallelcluster-awsbatch-tutorial VALID [ec2-user@ip-10-0-0-111 ~]$ awsbhosts ec2InstanceId instanceType privateIpAddress publicIpAddress runningJobs ------------------- -------------- ------------------ ----------------- ------------- i-0d6a0c8c560cd5bed m4.large 10.0.0.235 34.239.174.236 0

如您在输出中看到的,我们有一个正在运行的主机。这是由于我们在配置中为 min_vcpus 选择的值导致的。如果要显示有关 Amazon Batch 队列和主机的其他详细信息,请将 -d 标志添加到命令。

使用 Amazon Batch 运行首个作业

在迁移到 MPI 之前,我们先创建一个虚拟作业,此作业休眠一小段时间,然后输出其自己的主机名,同时问候作为参数传递的名称。

使用以下内容创建名为“hellojob.sh”的文件。

#!/bin/bash sleep 30 echo "Hello $1 from $HOSTNAME" echo "Hello $1 from $HOSTNAME" > "/shared/secret_message_for_${1}_by_${AWS_BATCH_JOB_ID}"

接下来,使用 awsbsub 提交作业并验证其是否运行。

$ awsbsub -jn hello -cf hellojob.sh Luca Job 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 (hello) has been submitted.

查看您的队列并检查该作业的状态。

$ awsbstat jobId jobName status startedAt stoppedAt exitCode ------------------------------------ ----------- -------- ------------------- ----------- ---------- 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 hello RUNNING 2018-11-12 09:41:29 - -

输出提供了改作业的详细信息。

$ awsbstat 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 jobId : 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 jobName : hello createdAt : 2018-11-12 09:41:21 startedAt : 2018-11-12 09:41:29 stoppedAt : - status : RUNNING statusReason : - jobDefinition : parallelcluster-myBatch:1 jobQueue : parallelcluster-myBatch command : /bin/bash -c 'aws s3 --region us-east-1 cp s3://parallelcluster-mybatch-lui1ftboklhpns95/batch/job-hellojob_sh-1542015680924.sh /tmp/batch/job-hellojob_sh-1542015680924.sh; bash /tmp/batch/job-hellojob_sh-1542015680924.sh Luca' exitCode : - reason : - vcpus : 1 memory[MB] : 128 nodes : 1 logStream : parallelcluster-myBatch/default/c75dac4a-5aca-4238-a4dd-078037453554 log : https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/batch/job;stream=parallelcluster-myBatch/default/c75dac4a-5aca-4238-a4dd-078037453554 -------------------------

请注意,作业当前处于 RUNNING 状态。请等候 30 秒,以便作业完成,然后再次运行 awsbstat

$ awsbstat jobId jobName status startedAt stoppedAt exitCode ------------------------------------ ----------- -------- ------------------- ----------- ----------

现在,您可以看到作业处于 SUCCEEDED 状态。

$ awsbstat -s SUCCEEDED jobId jobName status startedAt stoppedAt exitCode ------------------------------------ ----------- --------- ------------------- ------------------- ---------- 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 hello SUCCEEDED 2018-11-12 09:41:29 2018-11-12 09:42:00 0

由于队列中现在没有作业,因此,我们可以通过 awsbout 命令检查输出。

$ awsbout 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 2018-11-12 09:41:29: Starting Job 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 download: s3://parallelcluster-mybatch-lui1ftboklhpns95/batch/job-hellojob_sh-1542015680924.sh to tmp/batch/job-hellojob_sh-1542015680924.sh 2018-11-12 09:42:00: Hello Luca from ip-172-31-4-234

我们可以看到作业在实例“ip-172-31-4-234”上成功运行。

如果您查看 /shared 目录,您将找到您的私有消息。

要浏览本教程中未涵盖的所有可用功能,请参阅 Amazon ParallelCluster Batch CLI 文档。在您准备好继续本教程后,我们继续并查看如何提交 MPI 作业。

在多节点并行环境中运行 MPI 作业

当仍然登录到头节点时,在/shared名为的目录mpi_hello_world.c. 将以下 MPI 程序添加到该文件中:

// Copyright 2011 www.mpitutorial.com // // An intro MPI hello world program that uses MPI_Init, MPI_Comm_size, // MPI_Comm_rank, MPI_Finalize, and MPI_Get_processor_name. // #include <mpi.h> #include <stdio.h> #include <stddef.h> int main(int argc, char** argv) { // Initialize the MPI environment. The two arguments to MPI Init are not // currently used by MPI implementations, but are there in case future // implementations might need the arguments. MPI_Init(NULL, NULL); // Get the number of processes int world_size; MPI_Comm_size(MPI_COMM_WORLD, &world_size); // Get the rank of the process int world_rank; MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); // Get the name of the processor char processor_name[MPI_MAX_PROCESSOR_NAME]; int name_len; MPI_Get_processor_name(processor_name, &name_len); // Print off a hello world message printf("Hello world from processor %s, rank %d out of %d processors\n", processor_name, world_rank, world_size); // Finalize the MPI environment. No more MPI calls can be made after this MPI_Finalize(); }

现在,将以下代码保存为 submit_mpi.sh

#!/bin/bash echo "ip container: $(/sbin/ip -o -4 addr list eth0 | awk '{print $4}' | cut -d/ -f1)" echo "ip host: $(curl -s "http://169.254.169.254/latest/meta-data/local-ipv4")" # get shared dir IFS=',' _shared_dirs=(${PCLUSTER_SHARED_DIRS}) _shared_dir=${_shared_dirs[0]} _job_dir="${_shared_dir}/${AWS_BATCH_JOB_ID%#*}-${AWS_BATCH_JOB_ATTEMPT}" _exit_code_file="${_job_dir}/batch-exit-code" if [[ "${AWS_BATCH_JOB_NODE_INDEX}" -eq "${AWS_BATCH_JOB_MAIN_NODE_INDEX}" ]]; then echo "Hello I'm the main node $HOSTNAME! I run the mpi job!" mkdir -p "${_job_dir}" echo "Compiling..." /usr/lib64/openmpi/bin/mpicc -o "${_job_dir}/mpi_hello_world" "${_shared_dir}/mpi_hello_world.c" echo "Running..." /usr/lib64/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --allow-run-as-root --machinefile "${HOME}/hostfile" "${_job_dir}/mpi_hello_world" # Write exit status code echo "0" > "${_exit_code_file}" # Waiting for compute nodes to terminate sleep 30 else echo "Hello I'm the compute node $HOSTNAME! I let the main node orchestrate the mpi processing!" # Since mpi orchestration happens on the main node, we need to make sure the containers representing the compute # nodes are not terminated. A simple trick is to wait for a file containing the status code to be created. # All compute nodes are terminated by Amazon Batch if the main node exits abruptly. while [ ! -f "${_exit_code_file}" ]; do sleep 2 done exit $(cat "${_exit_code_file}") fi

我们现在已准备就绪,可以提交第一个 MPI 作业并使其在 3 个节点上并发运行:

$ awsbsub -n 3 -cf submit_mpi.sh

现在,我们监控作业状态并等待其进入 RUNNING 状态:

$ watch awsbstat -d

在作业进入 RUNNING 状态后,我们可以查看其输出。要显示主节点的输出,请将 #0 附加到作业 ID。要显示计算节点的输出,请使用 #1#2

[ec2-user@ip-10-0-0-111 ~]$ awsbout -s 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#0 2018-11-27 15:50:10: Job id: 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#0 2018-11-27 15:50:10: Initializing the environment... 2018-11-27 15:50:10: Starting ssh agents... 2018-11-27 15:50:11: Agent pid 7 2018-11-27 15:50:11: Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa) 2018-11-27 15:50:11: Mounting shared file system... 2018-11-27 15:50:11: Generating hostfile... 2018-11-27 15:50:11: Detected 1/3 compute nodes. Waiting for all compute nodes to start. 2018-11-27 15:50:26: Detected 1/3 compute nodes. Waiting for all compute nodes to start. 2018-11-27 15:50:41: Detected 1/3 compute nodes. Waiting for all compute nodes to start. 2018-11-27 15:50:56: Detected 3/3 compute nodes. Waiting for all compute nodes to start. 2018-11-27 15:51:11: Starting the job... download: s3://parallelcluster-awsbatch-tutorial-iwyl4458saiwgwvg/batch/job-submit_mpi_sh-1543333713772.sh to tmp/batch/job-submit_mpi_sh-1543333713772.sh 2018-11-27 15:51:12: ip container: 10.0.0.180 2018-11-27 15:51:12: ip host: 10.0.0.245 2018-11-27 15:51:12: Compiling... 2018-11-27 15:51:12: Running... 2018-11-27 15:51:12: Hello I'm the main node! I run the mpi job! 2018-11-27 15:51:12: Warning: Permanently added '10.0.0.199' (RSA) to the list of known hosts. 2018-11-27 15:51:12: Warning: Permanently added '10.0.0.147' (RSA) to the list of known hosts. 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-180.ec2.internal, rank 1 out of 6 processors 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-199.ec2.internal, rank 5 out of 6 processors 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-180.ec2.internal, rank 0 out of 6 processors 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-199.ec2.internal, rank 4 out of 6 processors 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-147.ec2.internal, rank 2 out of 6 processors 2018-11-27 15:51:13: Hello world from processor ip-10-0-0-147.ec2.internal, rank 3 out of 6 processors [ec2-user@ip-10-0-0-111 ~]$ awsbout -s 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#1 2018-11-27 15:50:52: Job id: 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#1 2018-11-27 15:50:52: Initializing the environment... 2018-11-27 15:50:52: Starting ssh agents... 2018-11-27 15:50:52: Agent pid 7 2018-11-27 15:50:52: Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa) 2018-11-27 15:50:52: Mounting shared file system... 2018-11-27 15:50:52: Generating hostfile... 2018-11-27 15:50:52: Starting the job... download: s3://parallelcluster-awsbatch-tutorial-iwyl4458saiwgwvg/batch/job-submit_mpi_sh-1543333713772.sh to tmp/batch/job-submit_mpi_sh-1543333713772.sh 2018-11-27 15:50:53: ip container: 10.0.0.199 2018-11-27 15:50:53: ip host: 10.0.0.227 2018-11-27 15:50:53: Compiling... 2018-11-27 15:50:53: Running... 2018-11-27 15:50:53: Hello I'm a compute node! I let the main node orchestrate the mpi execution!

我们现在可以确认作业已成功完成:

[ec2-user@ip-10-0-0-111 ~]$ awsbstat -s ALL jobId jobName status startedAt stoppedAt exitCode ------------------------------------ ------------- --------- ------------------- ------------------- ---------- 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d submit_mpi_sh SUCCEEDED 2018-11-27 15:50:10 2018-11-27 15:51:26 -

注意:如果要在作业结束之前终止作业,您可以使用 awsbkill 命令。