Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Running your first job on Amazon ParallelCluster

This tutorial walks you through running your first Hello World job on Amazon ParallelCluster.

When using the Amazon ParallelCluster command line interface (CLI) or API, you only pay for the Amazon resources that are created when you create or update Amazon ParallelCluster images and clusters. For more information, see Amazon services used by Amazon ParallelCluster.

The Amazon ParallelCluster UI is built on a serverless architecture, and in most cases you can use it within the Amazon Free Tier. For more information, see Amazon ParallelCluster UI costs.

Prerequisites

Verifying your installation

First, verify that Amazon ParallelCluster, including its Node.js dependency, is correctly installed and configured.

$ node --version
v16.8.0
$ pcluster version
{
  "version": "3.7.0"
}

The pcluster version command returns the installed version of Amazon ParallelCluster.
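If either command is not found, a quick PATH check can show which one is missing. The following is a minimal sketch; the function name check_prereqs is hypothetical, and it only tests whether each command is on your PATH. It does not check versions.

```shell
#!/bin/bash
# Hypothetical helper: report whether each named command is on PATH.
# Returns 0 if all commands are found, 1 if any is missing.
check_prereqs() {
  local cmd missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd ($(command -v "$cmd"))"
    else
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (on the machine where you installed ParallelCluster):
#   check_prereqs node pcluster && node --version && pcluster version
```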

Creating your first cluster

Now it's time to create your first cluster. Because the workload for this tutorial isn't performance intensive, you can use the default instance type of t2.micro. (For production workloads, choose an instance type that best fits your needs.) Call your cluster hello-world.

$ pcluster create-cluster \
    --cluster-name hello-world \
    --cluster-configuration hello-world.yaml
Note

Most pcluster commands need to know which Amazon Web Services Region to use. If the Region isn't set in either the AWS_DEFAULT_REGION environment variable or the region setting in the [default] section of the ~/.aws/config file, you must provide the --region parameter on the pcluster command line.
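To check in advance whether you will need --region, the following sketch looks in the two places the note mentions, in that order. It is a simplified, hypothetical helper (resolve_region is not part of ParallelCluster): the real AWS configuration lookup also considers other settings, such as named profiles, which this sketch ignores.

```shell
#!/bin/bash
# Hypothetical helper: print the Region from AWS_DEFAULT_REGION, or else from
# the "region" key in the [default] section of ~/.aws/config.
# Simplified: ignores profiles and every other configuration source.
resolve_region() {
  if [ -n "${AWS_DEFAULT_REGION:-}" ]; then
    printf '%s\n' "$AWS_DEFAULT_REGION"
    return 0
  fi
  local cfg="${HOME}/.aws/config" region=""
  if [ -f "$cfg" ]; then
    # Track whether we are inside [default]; print the first region= value there.
    region="$(awk -F '=' '
      /^[ \t]*\[/ { in_default = ($0 ~ /^[ \t]*\[default\][ \t]*$/); next }
      in_default && $1 ~ /^[ \t]*region[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }
    ' "$cfg")"
  fi
  if [ -n "$region" ]; then
    printf '%s\n' "$region"
  else
    echo "No Region found; pass --region to pcluster." >&2
    return 1
  fi
}

# Usage: resolve_region || echo "Add --region <region> to your pcluster commands."
```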

If the output reports a configuration problem, run the following command to create the configuration file interactively:

$ pcluster configure --config hello-world.yaml
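For reference, a minimal ParallelCluster 3 configuration along the lines of what this tutorial uses might look like the following sketch, written with a shell heredoc. Every value here is a hypothetical placeholder: replace the Region, subnet ID, and key pair name with your own. The compute resource name queue1t2micro is chosen only so that the node names match the sinfo output shown later. The file that pcluster configure produces for you may differ, and running this snippet overwrites any existing hello-world.yaml in the current directory.

```shell
#!/bin/bash
# Hypothetical minimal configuration; all IDs and names are placeholders.
# Quoting 'EOF' stops the shell from expanding anything inside the YAML.
cat > hello-world.yaml <<'EOF'
Region: cn-north-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-xxxxxxxx
  Ssh:
    KeyName: my-key-pair
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: queue1t2micro
          InstanceType: t2.micro
          MinCount: 0
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-xxxxxxxx
EOF
```

MinCount: 0 with MaxCount: 10 describes a queue that starts with no running compute instances and can scale up to ten.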

If the pcluster create-cluster command succeeds, you see output similar to the following:

{
  "cluster": {
    "clusterName": "hello-world",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws-cn:cloudformation:xxx:stack/xxx",
    "region": "...",
    "version": "...",
    "clusterStatus": "CREATE_IN_PROGRESS"
  }
}

Monitor the creation of the cluster with:

$ pcluster describe-cluster --cluster-name hello-world

The clusterStatus field reports "CREATE_IN_PROGRESS" while the cluster is being created and changes to "CREATE_COMPLETE" when the cluster is created successfully. The output also includes the publicIpAddress and privateIpAddress of the head node.
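Instead of rerunning describe-cluster by hand, you could poll it in a loop. The following is a minimal sketch with hypothetical function names; it assumes pcluster is on your PATH with the Region configured, and it extracts clusterStatus with grep and sed so that it needs no JSON parser.

```shell
#!/bin/bash
# Hypothetical helpers built on `pcluster describe-cluster`.

# Print the value of the clusterStatus field from the JSON output.
cluster_status() {
  pcluster describe-cluster --cluster-name "$1" \
    | grep -o '"clusterStatus": *"[A-Z_]*"' \
    | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Poll every <interval> seconds (default 60). Return 0 on CREATE_COMPLETE,
# keep waiting on CREATE_IN_PROGRESS, and return 1 on any other status
# (for example, CREATE_FAILED).
wait_for_cluster() {
  local name="$1" interval="${2:-60}" status
  while true; do
    status="$(cluster_status "$name")"
    echo "clusterStatus: ${status:-unknown}"
    case "$status" in
      CREATE_COMPLETE) return 0 ;;
      CREATE_IN_PROGRESS) sleep "$interval" ;;
      *) return 1 ;;
    esac
  done
}

# Usage: wait_for_cluster hello-world 60 && echo "Cluster is ready."
```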

Logging into your head node

Use your OpenSSH private key (.pem) file to log in to your head node.

$ pcluster ssh --cluster-name hello-world -i /path/to/keyfile.pem

After you log in, run the command sinfo to verify that your compute nodes are set up and configured.

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
queue1*      up   infinite     10  idle~ queue1-dy-queue1t2micro-[1-10]

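The output shows one queue (Slurm partition), queue1, with up to ten compute nodes. The ~ suffix in the idle~ state means that the nodes are powered down: they are dynamic nodes, and an EC2 instance is launched for one only when a job needs it.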

Running your first job using Slurm

Next, create a job that sleeps for 30 seconds and then prints its own hostname. Create a file called hellojob.sh with the following contents.

#!/bin/bash
sleep 30
echo "Hello World from $(hostname)"
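Slurm also reads job options from #SBATCH comment lines at the top of a batch script. The following sketch writes a variant of hellojob.sh that states a few options explicitly; the directives shown (--job-name, --nodes, --output) are standard sbatch options, and slurm-%j.out is the same file name sbatch uses by default, with %j replaced by the job ID. The script behaves the same as the original when submitted.

```shell
#!/bin/bash
# Write hellojob.sh with explicit sbatch directives.
# To bash, #SBATCH lines are ordinary comments; sbatch parses them as options.
# Quoting 'EOF' keeps $(hostname) unexpanded until the job actually runs.
cat > hellojob.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hellojob
#SBATCH --nodes=1
#SBATCH --output=slurm-%j.out
sleep 30
echo "Hello World from $(hostname)"
EOF
```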

Submit the job with sbatch, and verify that it runs.

$ sbatch hellojob.sh
Submitted batch job 2

Now view the queue with squeue and check the status of the job. Because none of the compute nodes is running yet, provisioning of a new Amazon EC2 instance starts in the background. You can monitor the status of the cluster instances with the sinfo command.

$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    2    queue1 hellojob ec2-user CF       3:30      1 queue1-dy-queue1t2micro-1

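The output shows that the job was submitted to queue1 and allocated node queue1-dy-queue1t2micro-1. The ST (state) value CF means CONFIGURING: the job is waiting for that node to finish booting. Once the node is ready, the job runs for about 30 seconds. Wait for it to finish, and then run squeue again.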

$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

Now that there are no jobs in the queue, we can check for output in our current directory.

$ ls -l
total 8
-rw-rw-r-- 1 ec2-user ec2-user 57 Sep  1 14:25 hellojob.sh
-rw-rw-r-- 1 ec2-user ec2-user 43 Sep  1 14:30 slurm-2.out

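The directory now contains slurm-2.out. By default, Slurm writes a batch job's output to a file named slurm-<jobid>.out in the directory you submitted from. Display it to see the output from the job: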

$ cat slurm-2.out
Hello World from queue1-dy-queue1t2micro-1

The output shows that the job ran successfully on compute node queue1-dy-queue1t2micro-1.
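The submit, wait, and read steps can be combined into one helper. The following is a minimal sketch with a hypothetical function name, meant to run on the head node. It uses sbatch --parsable to capture the job ID and polls squeue until the job is no longer pending, configuring, running, or completing. It assumes the script does not override the default output file name.

```shell
#!/bin/bash
# Hypothetical helper: submit a batch script, wait for it to leave the queue,
# then print its output file.
submit_and_wait() {
  local script="$1" interval="${2:-10}" jobid
  # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
  jobid="$(sbatch --parsable "$script")" || return 1
  jobid="${jobid%%;*}"
  echo "Submitted batch job ${jobid}" >&2
  # Poll while the job is pending, configuring, running, or completing.
  # An error (for example, the job already purged from squeue) counts as done.
  while [ -n "$(squeue -h -j "$jobid" -t PD,CF,R,CG 2>/dev/null)" ]; do
    sleep "$interval"
  done
  # Without --output, sbatch writes to slurm-<jobid>.out in the submit directory.
  cat "slurm-${jobid}.out"
}

# Usage: submit_and_wait hellojob.sh
```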

In the cluster you just created, only the home directory is shared among all nodes of the cluster.
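The slurm-2.out file already reflects this: the job ran on a compute node, yet its output appeared in your home directory on the head node. The following sketch makes the same point explicitly by writing a file from a compute node with srun and reading it back on the head node. The function and file names are hypothetical; run it on the head node.

```shell
#!/bin/bash
# Hypothetical check: a file written to $HOME on a compute node should be
# readable on the head node, because the home directory is shared.
demo_shared_home() {
  local marker="${HOME}/shared-home-check.txt"
  rm -f "$marker"
  # srun runs the quoted command on one compute node; $HOME and $(hostname)
  # are expanded there, not on the head node.
  srun --nodes=1 bash -c 'echo "written by $(hostname)" > "$HOME/shared-home-check.txt"' || return 1
  # Reading the file back on the head node shows that /home is shared.
  cat "$marker"
}

# Usage: demo_shared_home
```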

To learn more about creating and using clusters, see Best practices.

If your application requires shared software, libraries, or data, consider the following options: