Amazon EC2 console output logs

When Amazon ParallelCluster detects that a static compute node instance terminates unexpectedly, it attempts to retrieve the Amazon EC2 console output from the terminated node instance after a period of time elapses. This way, if the compute node was unable to communicate with Amazon CloudWatch, useful troubleshooting information on why the node terminated might still be retrieved from the console output. This console output is recorded in the /var/log/parallelcluster/compute_console_output log on the head node. For more information about the Amazon EC2 console output, see Instance console output in the Amazon EC2 User Guide for Linux Instances.

By default, Amazon ParallelCluster retrieves the console output from only a sample of the terminated nodes. This prevents the cluster head node from being overwhelmed by console output requests when a large number of nodes terminate. Amazon ParallelCluster also waits 5 minutes by default between detecting a termination and retrieving the console output, which gives Amazon EC2 time to collect the final console output from the nodes.

You can edit the sample size and wait time parameter values in the /etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf file on the head node.

This feature was added in Amazon ParallelCluster version 3.5.0.

Amazon EC2 console output parameters

You can edit the values of the following Amazon EC2 console output parameters in the /etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf file on the head node.

`compute_console_logging_enabled`

To disable console output log collection, set compute_console_logging_enabled to false. The default is true.

You can update this parameter at any time, without stopping the compute fleet.

`compute_console_logging_max_sample_size`

compute_console_logging_max_sample_size sets the maximum number of compute nodes from which Amazon ParallelCluster collects console outputs each time it detects an unexpected termination. If this value is less than 1, Amazon ParallelCluster retrieves the console output from all terminated nodes. The default value is 1.

You can update this parameter at any time, without stopping the compute fleet.
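
The sample-size rule described above can be illustrated with a short Python sketch. This is not the Amazon ParallelCluster implementation: the function name `select_nodes_for_console_output` is hypothetical, and taking the first nodes in the list is an assumption made only to keep the example deterministic. The text above says only that a sample subset is used.

```python
def select_nodes_for_console_output(terminated_nodes, max_sample_size=1):
    """Return the nodes whose console output would be collected.

    Hypothetical helper for illustration only. The choice of the *first*
    nodes is an assumption; the documented behavior is only that at most
    max_sample_size nodes are sampled per detection.
    """
    nodes = list(terminated_nodes)
    # A value less than 1 means "collect from all terminated nodes".
    if max_sample_size < 1:
        return nodes
    # Otherwise, cap the number of nodes at the configured maximum.
    return nodes[:max_sample_size]
```

With the default value of 1, a detection that finds three unexpectedly terminated nodes yields console output for one of them. Setting the value to 0 yields output for all three.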

`compute_console_wait_time`

compute_console_wait_time sets the time, in seconds, that Amazon ParallelCluster waits between detecting a node failure and collecting the console output from that node. You can increase the wait time if you determine that Amazon EC2 needs more time to collect the final output from the terminated node. The default value is 300 seconds (5 minutes).

You can update this parameter at any time, without stopping the compute fleet.
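
To check which values are currently in effect, you can read the three parameters with a small script. The following Python sketch assumes that the configuration file uses INI syntax and keeps these parameters in a `[clustermgtd]` section. That layout is an assumption that is not stated on this page, so confirm it against the file on your head node. The function name `read_console_settings` is hypothetical. The fallbacks are the documented defaults.

```python
import configparser


def read_console_settings(conf_text, section="clustermgtd"):
    """Parse console output settings from the text of the config file.

    Assumes INI syntax and a [clustermgtd] section (an assumption, not
    confirmed by this page). Missing options fall back to the documented
    defaults: enabled=true, max sample size=1, wait time=300 seconds.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return {
        "enabled": parser.getboolean(
            section, "compute_console_logging_enabled", fallback=True
        ),
        "max_sample_size": parser.getint(
            section, "compute_console_logging_max_sample_size", fallback=1
        ),
        "wait_time": parser.getint(
            section, "compute_console_wait_time", fallback=300
        ),
    }


if __name__ == "__main__":
    # Inline sample text; on a head node you would read the real file instead.
    sample = "[clustermgtd]\ncompute_console_wait_time = 600\n"
    print(read_console_settings(sample))
```

On a head node, you would pass in the contents of /etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf in place of the inline sample.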

