

本文属于机器翻译版本。若本译文内容与英语原文存在差异，则一律以英文原文为准。

# 使用 Amazon ParallelCluster 和`awsbatch`调度器运行 MPI 作业
<a name="tutorials_03_batch_mpi"></a>

本教程将指导您完成使用 `awsbatch` 作为计划程序运行 MPI 作业的过程。

**先决条件**
+ Amazon ParallelCluster [已安装](install.md)。
+  Amazon CLI [已安装并配置。](https://docs.amazonaws.cn/cli/latest/userguide/getting-started-install.html)
+ 您拥有 [EC2 密钥对](https://docs.amazonaws.cn/AWSEC2/latest/UserGuide/ec2-key-pairs.html)。
+ 您拥有具有运行 [`pcluster`](pcluster.md) CLI 所需的[权限](iam.md#example-parallelcluser-policies)的 IAM 角色。

## 创建 集群
<a name="creating-the-cluster"></a>

首先，我们为使用 `awsbatch` 作为计划程序的集群创建一个配置。确保将 `vpc` 部分和 `key_name` 字段中的缺失数据与配置时创建的资源一起插入。

```
[global]
sanity_check = true

[aws]
aws_region_name = us-east-1

[cluster awsbatch]
base_os = alinux
# Replace with the name of the key you intend to use.
key_name = key-#######
vpc_settings = my-vpc
scheduler = awsbatch
compute_instance_type = optimal
min_vcpus = 2
desired_vcpus = 2
max_vcpus = 24

[vpc my-vpc]
# Replace with the id of the vpc you intend to use.
vpc_id = vpc-#######
# Replace with id of the subnet for the Head node.
master_subnet_id = subnet-#######
# Replace with id of the subnet for the Compute nodes.
# A NAT Gateway is required for MNP.
compute_subnet_id = subnet-#######
```

现在，您可以开始创建集群了。让我们称呼我们的集群*awsbatch-tutorial*。

```
$ pcluster create -c /path/to/the/created/config/aws_batch.config -t awsbatch awsbatch-tutorial
```

创建集群时，您会看到类似以下内容的输出：

```
Beginning cluster creation for cluster: awsbatch-tutorial
Creating stack named: parallelcluster-awsbatch
Status: parallelcluster-awsbatch - CREATE_COMPLETE
MasterPublicIP: 54.160.xxx.xxx
ClusterUser: ec2-user
MasterPrivateIP: 10.0.0.15
```

## 登录到头节点
<a name="logging-into-your-head-node"></a>

[Amazon ParallelCluster Batch CLI](awsbatchcli.md) 命令均可在安装的客户端计算机上使用。 Amazon ParallelCluster 但是，我们将通过 SSH 进入头节点并从那里提交作业。这使我们能够利用头部和运行 Amazon Batch 作业的所有 Docker 实例之间共享的 NFS 卷。

使用您的 SSH pem 文件登录到头节点。

```
$ pcluster ssh awsbatch-tutorial -i /path/to/keyfile.pem
```

登录后，运行命令`awsbqueues`和`awsbhosts`以显示配置的 Amazon Batch 队列和正在运行的 Amazon ECS 实例。

```
[ec2-user@ip-10-0-0-111 ~]$ awsbqueues
jobQueueName                       status
---------------------------------  --------
parallelcluster-awsbatch-tutorial  VALID

[ec2-user@ip-10-0-0-111 ~]$ awsbhosts
ec2InstanceId        instanceType    privateIpAddress    publicIpAddress      runningJobs
-------------------  --------------  ------------------  -----------------  -------------
i-0d6a0c8c560cd5bed  m4.large        10.0.0.235          34.239.174.236                 0
```

如您在输出中看到的，我们有一个正在运行的主机。这是由于我们在配置中为 [`min_vcpus`](cluster-definition.md#min-vcpus) 选择的值导致的。如果要显示有关 Amazon Batch 队列和主机的更多详细信息，请在命令中添加`-d`标志。

## 使用运行你的第一份作业 Amazon Batch
<a name="running-your-first-job-using-aws-batch"></a>

在迁移到 MPI 之前，我们先创建一个虚拟作业，此作业休眠一小段时间，然后输出其自己的主机名，同时问候作为参数传递的名称。

使用以下内容创建名为“hellojob.sh”的文件。

```
#!/bin/bash

sleep 30
echo "Hello $1 from $HOSTNAME"
echo "Hello $1 from $HOSTNAME" > "/shared/secret_message_for_${1}_by_${AWS_BATCH_JOB_ID}"
```

接下来，使用 `awsbsub` 提交作业并验证其是否运行。

```
$ awsbsub -jn hello -cf hellojob.sh Luca
Job 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2 (hello) has been submitted.
```

查看您的队列并检查该作业的状态。

```
$ awsbstat
jobId                                 jobName      status    startedAt            stoppedAt    exitCode
------------------------------------  -----------  --------  -------------------  -----------  ----------
6efe6c7c-4943-4c1a-baf5-edbfeccab5d2  hello        RUNNING   2018-11-12 09:41:29  -            -
```

输出提供了改作业的详细信息。

```
$ awsbstat 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2
jobId                    : 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2
jobName                  : hello
createdAt                : 2018-11-12 09:41:21
startedAt                : 2018-11-12 09:41:29
stoppedAt                : -
status                   : RUNNING
statusReason             : -
jobDefinition            : parallelcluster-exampleBatch:1
jobQueue                 : parallelcluster-exampleBatch
command                  : /bin/bash -c 'aws s3 --region us-east-1 cp s3://amzn-s3-demo-bucket/batch/job-hellojob_sh-1542015680924.sh /tmp/batch/job-hellojob_sh-1542015680924.sh; bash /tmp/batch/job-hellojob_sh-1542015680924.sh Luca'
exitCode                 : -
reason                   : -
vcpus                    : 1
memory[MB]               : 128
nodes                    : 1
logStream                : parallelcluster-exampleBatch/default/c75dac4a-5aca-4238-a4dd-078037453554
log                      : https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/batch/job;stream=parallelcluster-exampleBatch/default/c75dac4a-5aca-4238-a4dd-078037453554
-------------------------
```

请注意，作业当前处于 `RUNNING` 状态。请等候 30 秒，以便作业完成，然后再次运行 `awsbstat`。

```
$ awsbstat
jobId                                 jobName      status    startedAt            stoppedAt    exitCode
------------------------------------  -----------  --------  -------------------  -----------  ----------
```

现在，您可以看到作业处于 `SUCCEEDED` 状态。

```
$ awsbstat -s SUCCEEDED
jobId                                 jobName      status     startedAt            stoppedAt              exitCode
------------------------------------  -----------  ---------  -------------------  -------------------  ----------
6efe6c7c-4943-4c1a-baf5-edbfeccab5d2  hello        SUCCEEDED  2018-11-12 09:41:29  2018-11-12 09:42:00           0
```

由于队列中现在没有作业，因此，我们可以通过 `awsbout` 命令检查输出。

```
$ awsbout 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2
2018-11-12 09:41:29: Starting Job 6efe6c7c-4943-4c1a-baf5-edbfeccab5d2
download: s3://amzn-s3-demo-bucket/batch/job-hellojob_sh-1542015680924.sh to tmp/batch/job-hellojob_sh-1542015680924.sh
2018-11-12 09:42:00: Hello Luca from ip-172-31-4-234
```

我们可以看到作业在实例“ip-172-31-4-234”上成功运行。

如果您查看 `/shared` 目录，您将找到您的私有消息。

要浏览本教程中未涵盖的所有可用功能，请参阅 [Amazon ParallelCluster Batch CLI 文档](awsbatchcli.md)。在您准备好继续本教程后，我们继续并查看如何提交 MPI 作业。

## 在多节点并行环境中运行 MPI 作业
<a name="running-an-mpi-job-in-a-multi-node-parallel-environment"></a>

当仍然登录到头节点时，在 `/shared` 目录中创建一个名为 `mpi_hello_world.c` 的文件。将以下 MPI 程序添加到该文件中：

```
// Copyright 2011 www.mpitutorial.com
//
// An intro MPI hello world program that uses MPI_Init, MPI_Comm_size,
// MPI_Comm_rank, MPI_Finalize, and MPI_Get_processor_name.
//
#include <mpi.h>
#include <stdio.h>
#include <stddef.h>

int main(int argc, char** argv) {
  // Initialize the MPI environment. The two arguments to MPI Init are not
  // currently used by MPI implementations, but are there in case future
  // implementations might need the arguments.
  MPI_Init(NULL, NULL);

  // Get the number of processes
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Get the rank of the process
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Get the name of the processor
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  // Print off a hello world message
  printf("Hello world from processor %s, rank %d out of %d processors\n",
         processor_name, world_rank, world_size);

  // Finalize the MPI environment. No more MPI calls can be made after this
  MPI_Finalize();
}
```

现在，将以下代码保存为 `submit_mpi.sh`：

```
#!/bin/bash
echo "ip container: $(/sbin/ip -o -4 addr list eth0 | awk '{print $4}' | cut -d/ -f1)"
echo "ip host: $(curl -s "http://169.254.169.254/latest/meta-data/local-ipv4")"

# get shared dir
IFS=',' _shared_dirs=(${PCLUSTER_SHARED_DIRS})
_shared_dir=${_shared_dirs[0]}
_job_dir="${_shared_dir}/${AWS_BATCH_JOB_ID%#*}-${AWS_BATCH_JOB_ATTEMPT}"
_exit_code_file="${_job_dir}/batch-exit-code"

if [[ "${AWS_BATCH_JOB_NODE_INDEX}" -eq  "${AWS_BATCH_JOB_MAIN_NODE_INDEX}" ]]; then
    echo "Hello I'm the main node $HOSTNAME! I run the mpi job!"

    mkdir -p "${_job_dir}"

    echo "Compiling..."
    /usr/lib64/openmpi/bin/mpicc -o "${_job_dir}/mpi_hello_world" "${_shared_dir}/mpi_hello_world.c"

    echo "Running..."
    /usr/lib64/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --allow-run-as-root --machinefile "${HOME}/hostfile" "${_job_dir}/mpi_hello_world"

    # Write exit status code
    echo "0" > "${_exit_code_file}"
    # Waiting for compute nodes to terminate
    sleep 30
else
    echo "Hello I'm the compute node $HOSTNAME! I let the main node orchestrate the mpi processing!"
    # Since mpi orchestration happens on the main node, we need to make sure the containers representing the compute
    # nodes are not terminated. A simple trick is to wait for a file containing the status code to be created.
    # All compute nodes are terminated by Amazon Batch if the main node exits abruptly.
    while [ ! -f "${_exit_code_file}" ]; do
        sleep 2
    done
    exit $(cat "${_exit_code_file}")
fi
```

我们现在已准备就绪，可以提交第一个 MPI 作业并使其在 3 个节点上并发运行：

```
$ awsbsub -n 3 -cf submit_mpi.sh
```

现在，我们监控作业状态并等待其进入 `RUNNING` 状态：

```
$ watch awsbstat -d
```

在作业进入 `RUNNING` 状态后，我们可以查看其输出。要显示主节点的输出，请将 `#0` 附加到作业 ID。要显示计算节点的输出，请使用 `#1` 和 `#2`：

```
[ec2-user@ip-10-0-0-111 ~]$ awsbout -s 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#0
2018-11-27 15:50:10: Job id: 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#0
2018-11-27 15:50:10: Initializing the environment...
2018-11-27 15:50:10: Starting ssh agents...
2018-11-27 15:50:11: Agent pid 7
2018-11-27 15:50:11: Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
2018-11-27 15:50:11: Mounting shared file system...
2018-11-27 15:50:11: Generating hostfile...
2018-11-27 15:50:11: Detected 1/3 compute nodes. Waiting for all compute nodes to start.
2018-11-27 15:50:26: Detected 1/3 compute nodes. Waiting for all compute nodes to start.
2018-11-27 15:50:41: Detected 1/3 compute nodes. Waiting for all compute nodes to start.
2018-11-27 15:50:56: Detected 3/3 compute nodes. Waiting for all compute nodes to start.
2018-11-27 15:51:11: Starting the job...
download: s3://amzn-s3-demo-bucket/batch/job-submit_mpi_sh-1543333713772.sh to tmp/batch/job-submit_mpi_sh-1543333713772.sh
2018-11-27 15:51:12: ip container: 10.0.0.180
2018-11-27 15:51:12: ip host: 10.0.0.245
2018-11-27 15:51:12: Compiling...
2018-11-27 15:51:12: Running...
2018-11-27 15:51:12: Hello I'm the main node! I run the mpi job!
2018-11-27 15:51:12: Warning: Permanently added '10.0.0.199' (RSA) to the list of known hosts.
2018-11-27 15:51:12: Warning: Permanently added '10.0.0.147' (RSA) to the list of known hosts.
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-180.ec2.internal, rank 1 out of 6 processors
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-199.ec2.internal, rank 5 out of 6 processors
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-180.ec2.internal, rank 0 out of 6 processors
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-199.ec2.internal, rank 4 out of 6 processors
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-147.ec2.internal, rank 2 out of 6 processors
2018-11-27 15:51:13: Hello world from processor ip-10-0-0-147.ec2.internal, rank 3 out of 6 processors

[ec2-user@ip-10-0-0-111 ~]$ awsbout -s 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#1
2018-11-27 15:50:52: Job id: 5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d#1
2018-11-27 15:50:52: Initializing the environment...
2018-11-27 15:50:52: Starting ssh agents...
2018-11-27 15:50:52: Agent pid 7
2018-11-27 15:50:52: Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
2018-11-27 15:50:52: Mounting shared file system...
2018-11-27 15:50:52: Generating hostfile...
2018-11-27 15:50:52: Starting the job...
download: s3://amzn-s3-demo-bucket/batch/job-submit_mpi_sh-1543333713772.sh to tmp/batch/job-submit_mpi_sh-1543333713772.sh
2018-11-27 15:50:53: ip container: 10.0.0.199
2018-11-27 15:50:53: ip host: 10.0.0.227
2018-11-27 15:50:53: Compiling...
2018-11-27 15:50:53: Running...
2018-11-27 15:50:53: Hello I'm a compute node! I let the main node orchestrate the mpi execution!
```

我们现在可以确认作业已成功完成：

```
[ec2-user@ip-10-0-0-111 ~]$ awsbstat -s ALL
jobId                                 jobName        status     startedAt            stoppedAt            exitCode
------------------------------------  -------------  ---------  -------------------  -------------------  ----------
5b4d50f8-1060-4ebf-ba2d-1ae868bbd92d  submit_mpi_sh  SUCCEEDED  2018-11-27 15:50:10  2018-11-27 15:51:26  -
```

注意：如果要在作业结束之前终止作业，您可以使用 `awsbkill` 命令。