创建集群登录到头节点使用运行你的第一份作业 Amazon Batch 在多节点并行环境中运行 MPI 作业

使用 Amazon ParallelCluster 和`awsbatch`调度器运行 MPI 作业

本教程将指导您完成使用 awsbatch 作为计划程序运行 MPI 作业的过程。

先决条件

Amazon ParallelCluster 已安装。
Amazon CLI 已安装并配置。
你有一个 EC2 key pair。
您拥有具有运行 pcluster CLI 所需的权限的 IAM 角色。

创建集群

首先，我们为使用 awsbatch 作为计划程序的集群创建一个配置。确保将 vpc 部分和 key_name 字段中的缺失数据与配置时创建的资源一起插入。


[global]
sanity_check = true

[aws]
aws_region_name = us-east-1

[cluster awsbatch]
base_os = alinux
# Replace with the name of the key you intend to use.
key_name = key-#######
vpc_settings = my-vpc
scheduler = awsbatch
compute_instance_type = optimal
min_vcpus = 2
desired_vcpus = 2
max_vcpus = 24

[vpc my-vpc]
# Replace with the id of the vpc you intend to use.
vpc_id = vpc-#######
# Replace with id of the subnet for the Head node.
master_subnet_id = subnet-#######
# Replace with id of the subnet for the Compute nodes.
# A NAT Gateway is required for MNP.
compute_subnet_id = subnet-#######

现在，您可以开始创建集群了。让我们称呼我们的集群awsbatch-tutorial。


$ pcluster create -c /path/to/the/created/config/aws_batch.config -t awsbatch awsbatch-tutorial

创建集群时，您会看到类似以下内容的输出：


Beginning cluster creation for cluster: awsbatch-tutorial
Creating stack named: parallelcluster-awsbatch
Status: parallelcluster-awsbatch - CREATE_COMPLETE
MasterPublicIP: 54.160.xxx.xxx
ClusterUser: ec2-user
MasterPrivateIP: 10.0.0.15

登录到头节点

Amazon ParallelCluster Batch CLI 命令均可在安装的客户端计算机上使用。 Amazon ParallelCluster 但是，我们将通过 SSH 进入头节点并从那里提交作业。这使我们能够利用头部和运行 Amazon Batch 作业的所有 Docker 实例之间共享的 NFS 卷。

使用您的 SSH pem 文件登录到头节点。


$ pcluster ssh awsbatch-tutorial -i /path/to/keyfile.pem

登录后，运行命令awsbqueues和awsbhosts以显示配置的 Amazon Batch 队列和正在运行的 Amazon ECS 实例。


[ec2-user@ip-10-0-0-111 ~]$ awsbqueues
jobQueueName                       status
---------------------------------  --------
parallelcluster-awsbatch-tutorial  VALID

[ec2-user@ip-10-0-0-111 ~]$ awsbhosts
ec2InstanceId        instanceType    privateIpAddress    publicIpAddress      runningJobs
-------------------  --------------  ------------------  -----------------  -------------
i-0d6a0c8c560cd5bed  m4.large        10.0.0.235          34.239.174.236                 0

如您在输出中看到的，我们有一个正在运行的主机。这是由于我们在配置中为 min_vcpus 选择的值导致的。如果要显示有关 Amazon Batch 队列和主机的更多详细信息，请在命令中添加-d标志。

使用运行你的第一份作业 Amazon Batch

在迁移到 MPI 之前，我们先创建一个虚拟作业，此作业休眠一小段时间，然后输出其自己的主机名，同时问候作为参数传递的名称。

使用以下内容创建名为“hellojob.sh”的文件。


#!/bin/bash

sleep 30
echo "Hello $1 from $HOSTNAME"
echo "Hello $1 from $HOSTNAME" > "/shared/secret_message_for_${1}_by_${AWS_BATCH_JOB_ID}"

接下来，使用 awsbsub 提交作业并验证其是否运行。


$ awsbsub -jn hello -cf hellojob.sh Luca
Job 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 (hello) has been submitted.

查看您的队列并检查该作业的状态。


$ awsbstat
jobId                                 jobName      status    startedAt            stoppedAt    exitCode
------------------------------------  -----------  --------  -------------------  -----------  ----------
6efe6c7c-4943-4c1a-baf5-edbfeccab5d2  hello        RUNNING   2018-11-12 09:41:29  -            -

输出提供了改作业的详细信息。


$ awsbstat 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2
jobId                    : 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2
jobName                  : hello
createdAt                : 2018-11-12 09:41:21
startedAt                : 2018-11-12 09:41:29
stoppedAt                : -
status                   : RUNNING
statusReason             : -
jobDefinition            : parallelcluster-exampleBatch:1
jobQueue                 : parallelcluster-exampleBatch
command                  : /bin/bash -c 'aws s3 --region us-east-1 cp s3://amzn-s3-demo-bucket/batch/job-hellojob_sh-1542015680924.sh /tmp/batch/job-hellojob_sh-1542015680924.sh; bash /tmp/batch/job-hellojob_sh-1542015680924.sh Luca'
exitCode                 : -
reason                   : -
vcpus                    : 1
memory[MB]               : 128
nodes                    : 1
logStream                : parallelcluster-exampleBatch/default/c75dac4a-5aca-4238-a4dd-078037453554
log                      : https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/batch/job;stream=parallelcluster-exampleBatch/default/c75dac4a-5aca-4238-a4dd-078037453554
-------------------------

请注意，作业当前处于 RUNNING 状态。请等候 30 秒，以便作业完成，然后再次运行 awsbstat。


$ awsbstat
jobId                                 jobName      status    startedAt            stoppedAt    exitCode
------------------------------------  -----------  --------  -------------------  -----------  ----------

现在，您可以看到作业处于 SUCCEEDED 状态。


$ awsbstat -s SUCCEEDED
jobId                                 jobName      status     startedAt            stoppedAt              exitCode
------------------------------------  -----------  ---------  -------------------  -------------------  ----------
6efe6c7c-4943-4c1a-baf5-edbfeccab5d2  hello        SUCCEEDED  2018-11-12 09:41:29  2018-11-12 09:42:00           0

由于队列中现在没有作业，因此，我们可以通过 awsbout 命令检查输出。


$ awsbout 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2
2018-11-12 09:41:29: Starting Job 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2
download: s3://amzn-s3-demo-bucket/batch/job-hellojob_sh-1542015680924.sh to tmp/batch/job-hellojob_sh-1542015680924.sh
2018-11-12 09:42:00: Hello Luca from ip-172-31-4-234

我们可以看到作业在实例“ip-172-31-4-234”上成功运行。

如果您查看 /shared 目录，您将找到您的私有消息。

要浏览本教程中未涵盖的所有可用功能，请参阅 Amazon ParallelCluster Batch CLI 文档。在您准备好继续本教程后，我们继续并查看如何提交 MPI 作业。

在多节点并行环境中运行 MPI 作业

当仍然登录到头节点时，在 /shared 目录中创建一个名为 mpi_hello_world.c 的文件。将以下 MPI 程序添加到该文件中：


// Copyright 2011 www.mpitutorial.com
//
// An intro MPI hello world program that uses MPI_Init, MPI_Comm_size,
// MPI_Comm_rank, MPI_Finalize, and MPI_Get_processor_name.
//
#include <mpi.h>
#include <stdio.h>
#include <stddef.h>

int main(int argc, char** argv) {
  // Initialize the MPI environment. The two arguments to MPI Init are not
  // currently used by MPI implementations, but are there in case future
  // implementations might need the arguments.
  MPI_Init(NULL, NULL);

  // Get the number of processes
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Get the rank of the process
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Get the name of the processor
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  // Print off a hello world message
  printf("Hello world from processor %s, rank %d out of %d processors\n",
         processor_name, world_rank, world_size);

  // Finalize the MPI environment. No more MPI calls can be made after this
  MPI_Finalize();
}

现在，将以下代码保存为 submit_mpi.sh：


#!/bin/bash
echo "ip container: $(/sbin/ip -o -4 addr list eth0 | awk '{print $4}' | cut -d/ -f1)"
echo "ip host: $(curl -s "http://169.254.169.254/latest/meta-data/local-ipv4")"

# get shared dir
IFS=',' _shared_dirs=(${PCLUSTER_SHARED_DIRS})
_shared_dir=${_shared_dirs[0]}
_job_dir="${_shared_dir}/${AWS_BATCH_JOB_ID%#*}-${AWS_BATCH_JOB_ATTEMPT}"
_exit_code_file="${_job_dir}/batch-exit-code"

if [[ "${AWS_BATCH_JOB_NODE_INDEX}" -eq  "${AWS_BATCH_JOB_MAIN_NODE_INDEX}" ]]; then
    echo "Hello I'm the main node $HOSTNAME! I run the mpi job!"

    mkdir -p "${_job_dir}"

    echo "Compiling..."
    /usr/lib64/openmpi/bin/mpicc -o "${_job_dir}/mpi_hello_world" "${_shared_dir}/mpi_hello_world.c"

    echo "Running..."
    /usr/lib64/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --allow-run-as-root --machinefile "${HOME}/hostfile" "${_job_dir}/mpi_hello_world"

    # Write exit status code
    echo "0" > "${_exit_code_file}"
    # Waiting for compute nodes to terminate
    sleep 30
else
    echo "Hello I'm the compute node $HOSTNAME! I let the main node orchestrate the mpi processing!"
    # Since mpi orchestration happens on the main node, we need to make sure the containers representing the compute
    # nodes are not terminated. A simple trick is to wait for a file containing the status code to be created.
    # All compute nodes are terminated by Amazon Batch if the main node exits abruptly.
    while [ ! -f "${_exit_code_file}" ]; do
        sleep 2
    done
    exit $(cat "${_exit_code_file}")
fi

我们现在已准备就绪，可以提交第一个 MPI 作业并使其在 3 个节点上并发运行：


$ awsbsub -n 3 -cf submit_mpi.sh

现在，我们监控作业状态并等待其进入 RUNNING 状态：


$ watch awsbstat -d

在作业进入 RUNNING 状态后，我们可以查看其输出。要显示主节点的输出，请将 #0 附加到作业 ID。要显示计算节点的输出，请使用 #1 和 #2：


[ec2-user@ip-10-0-0-111 ~]$ awsbout -s 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#0
2018-11-27 15:50:10: Job id: 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#0
2018-11-27 15:50:10: Initializing the environment...
2018-11-27 15:50:10: Starting ssh agents...
2018-11-27 15:50:11: Agent pid 7
2018-11-27 15:50:11: Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
2018-11-27 15:50:11: Mounting shared file system...
2018-11-27 15:50:11: Generating hostfile...
2018-11-27 15:50:11: Detected 1/3 compute nodes. Waiting for all compute nodes to start.
2018-11-27 15:50:26: Detected 1/3 compute nodes. Waiting for all compute nodes to start.
2018-11-27 15:50:41: Detected 1/3 compute nodes. Waiting for all compute nodes to start.
2018-11-27 15:50:56: Detected 3/3 compute nodes. Waiting for all compute nodes to start.
2018-11-27 15:51:11: Starting the job...
download: s3://amzn-s3-demo-bucket/batch/job-submit_mpi_sh-1543333713772.sh to tmp/batch/job-submit_mpi_sh-1543333713772.sh
2018-11-27 15:51:12: ip container: 10.0.0.180
2018-11-27 15:51:12: ip host: 10.0.0.245
2018-11-27 15:51:12: Compiling...
2018-11-27 15:51:12: Running...
2018-11-27 15:51:12: Hello I'm the main node! I run the mpi job!
2018-11-27 15:51:12: Warning: Permanently added '10.0.0.199' (RSA) to the list of known hosts.
2018-11-27 15:51:12: Warning: Permanently added '10.0.0.147' (RSA) to the list of known hosts.
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-180.ec2.internal, rank 1 out of 6 processors
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-199.ec2.internal, rank 5 out of 6 processors
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-180.ec2.internal, rank 0 out of 6 processors
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-199.ec2.internal, rank 4 out of 6 processors
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-147.ec2.internal, rank 2 out of 6 processors
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-147.ec2.internal, rank 3 out of 6 processors

[ec2-user@ip-10-0-0-111 ~]$ awsbout -s 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#1
2018-11-27 15:50:52: Job id: 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#1
2018-11-27 15:50:52: Initializing the environment...
2018-11-27 15:50:52: Starting ssh agents...
2018-11-27 15:50:52: Agent pid 7
2018-11-27 15:50:52: Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
2018-11-27 15:50:52: Mounting shared file system...
2018-11-27 15:50:52: Generating hostfile...
2018-11-27 15:50:52: Starting the job...
download: s3://amzn-s3-demo-bucket/batch/job-submit_mpi_sh-1543333713772.sh to tmp/batch/job-submit_mpi_sh-1543333713772.sh
2018-11-27 15:50:53: ip container: 10.0.0.199
2018-11-27 15:50:53: ip host: 10.0.0.227
2018-11-27 15:50:53: Compiling...
2018-11-27 15:50:53: Running...
2018-11-27 15:50:53: Hello I'm a compute node! I let the main node orchestrate the mpi execution!

我们现在可以确认作业已成功完成：


[ec2-user@ip-10-0-0-111 ~]$ awsbstat -s ALL
jobId                                 jobName        status     startedAt            stoppedAt            exitCode
------------------------------------  -------------  ---------  -------------------  -------------------  ----------
5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d  submit_mpi_sh  SUCCEEDED  2018-11-27 15:50:10  2018-11-27 15:51:26  -

注意：如果要在作业结束之前终止作业，您可以使用 awsbkill 命令。

Javascript 在您的浏览器中被禁用或不可用。

要使用 Amazon Web Services 文档，必须启用 Javascript。请参阅浏览器的帮助页面以了解相关说明。

构建自定义 Amazon ParallelCluster AMI

使用自定义 KMS 密钥对磁盘加密

使用 Amazon ParallelCluster 和awsbatch调度器运行 MPI 作业