Release notes and document history - Amazon ParallelCluster

Release notes and document history

The following table describes the major updates and new features for the Amazon ParallelCluster User Guide. We also update the documentation frequently to address the feedback that you send us.

Change | Description | Date

Amazon ParallelCluster UI version 2024.07.1 released

We're excited to announce the release of Amazon ParallelCluster UI version 2024.07.1.

Changes:

  • Add support for Amazon ParallelCluster 3.10.1.

Bug fixes:

  • Fixed a bug that was breaking the rendering of job accounting info.

  • Fixed a bug in the feature flagging mechanism that disabled all ParallelCluster 3.2.0+ features on ParallelCluster 3.10.0+.

Security:

See the full changelog.

July 24, 2024

Amazon ParallelCluster version 3.10.1 released

We're excited to announce the release of Amazon ParallelCluster 3.10.1.

Bug fix:

  • Fix image build failure in China Regions.

July 8, 2024

Amazon ParallelCluster UI version 2024.07.0 released

We're excited to announce the release of Amazon ParallelCluster UI version 2024.07.0.

Features:

  • Added support for Amazon ParallelCluster version 3.10.0.

July 2, 2024

Amazon ParallelCluster version 3.10.0 released

We're excited to announce the release of Amazon ParallelCluster 3.10.0.

To upgrade, type sudo pip install --upgrade aws-parallelcluster.

Enhancements:

  • Add new configuration section Scheduling/SlurmSettings/ExternalSlurmdbd to connect the cluster to an external Slurmdbd.

  • Allow build-image to be run in an isolated network.

  • Add support for Amazon Linux 2023.

  • Add support for price-capacity-optimized as an AllocationStrategy.

  • Add validator to prevent the use of Placement Groups with Capacity Blocks.
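The following Python sketch shows the kind of pre-flight check that the two queue-level items above describe: it accepts price-capacity-optimized as an AllocationStrategy and flags a queue that enables a placement group while using Capacity Blocks. It is not ParallelCluster's own validator. The helper validate_queue, the dictionary shape, the CapacityType value CAPACITY_BLOCK used as the Capacity Block marker, and the other accepted strategy values (lowest-price, capacity-optimized) are assumptions for illustration; check the cluster configuration reference for the authoritative schema.

```python
from typing import Any

# Assumed accepted values; price-capacity-optimized is the one added in 3.10.0.
ALLOWED_ALLOCATION_STRATEGIES = {"lowest-price", "capacity-optimized", "price-capacity-optimized"}


def validate_queue(queue: dict[str, Any]) -> list[str]:
    """Return problems found in one SlurmQueues entry (hypothetical helper)."""
    problems: list[str] = []
    name = queue.get("Name", "<unnamed>")

    strategy = queue.get("AllocationStrategy")
    if strategy is not None and strategy not in ALLOWED_ALLOCATION_STRATEGIES:
        problems.append(f"{name}: unsupported AllocationStrategy {strategy!r}")

    # Same intent as the 3.10.0 validator: no placement group together with Capacity Blocks.
    uses_capacity_block = queue.get("CapacityType") == "CAPACITY_BLOCK"
    placement_enabled = queue.get("Networking", {}).get("PlacementGroup", {}).get("Enabled", False)
    if uses_capacity_block and placement_enabled:
        problems.append(f"{name}: placement groups cannot be used with Capacity Blocks")
    return problems


print(validate_queue({"Name": "cpu", "AllocationStrategy": "price-capacity-optimized"}))  # -> []
print(validate_queue({
    "Name": "gpu",
    "CapacityType": "CAPACITY_BLOCK",
    "Networking": {"PlacementGroup": {"Enabled": True}},
}))  # -> ['gpu: placement groups cannot be used with Capacity Blocks']
```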

Changes:

  • CentOS 7 is no longer supported.

  • Upgrade Cinc Client to version 18.4.12 (from 18.2.7).

  • Upgrade munge to version 0.5.16 (from 0.5.15).

  • Upgrade Pmix to 5.0.2 (from 4.2.9).

  • Upgrade third-party cookbook dependencies:

    • apt-7.5.22 (from apt-7.5.14)

    • openssh-2.11.12 (from openssh-2.11.3)

  • Remove third-party cookbook: selinux-6.1.12.

  • Upgrade EFA installer to 1.32.0.

    • Efa-driver: efa-2.8.0-1

    • Efa-config: efa-config-1.16-1

    • Efa-profile: efa-profile-1.7-1

    • Libfabric-aws: libfabric-aws-1.21.0-1

    • Rdma-core: rdma-core-50.0-1

    • Open MPI: openmpi40-aws-4.1.6-3 and openmpi50-aws-5.0.2-12

  • Upgrade NVIDIA driver to version 535.183.01 (from 535.154.05).

  • Upgrade Python to 3.9.19 (from 3.9.17).

  • Upgrade Intel MPI Library to 2021.12.1.8 (from 2021.9.0.43482).

Bug fixes:

  • Fix Data Repository Associations configuration to make AutoExportPolicy and AutoImportPolicy optional.

  • Fix cluster deletion so that compute fleet cleanup completes when instances are in either the shutting-down or terminated state. This avoids cluster deletion failures for instance types with longer termination cycles.

  • Allow the CloudWatch dashboard to be enabled while alarms are disabled in the Monitoring section of the cluster configuration.

  • Allow ParallelCluster Custom Resource to suppress validators using PclusterCluster/SuppressValidators.

  • Remove /etc/profile.d/pcluster.sh so that it isn't executed at every user login and cfn_bootstrap_virtualenv isn't added to the PATH environment variable.

  • Fix ParallelCluster API spec by replacing field failureReason with failures in DescribeCluster response.

  • Fix ParallelCluster API spec by adding the missing CloudFormation stack statuses: IMPORT_*, REVIEW_IN_PROGRESS, and UPDATE_FAILED.

  • Fix an issue that prevented cluster updates from including EFS file systems with encryption in transit.

  • Fix an issue that prevented slurmctld and slurmdbd services from restarting on head node reboot when EFS is used for shared internal data.

  • On Ubuntu systems, remove the default logrotate configuration for cloud-init log files, which clashed with the configuration provided by ParallelCluster.

  • Fix image build failure with RHEL 8.10 or newer.

June 27, 2024

Terraform Provider for Amazon ParallelCluster 1.0.0 released

We're excited to announce the release of Terraform Provider for Amazon ParallelCluster 1.0.0.

Features:

June 26, 2024

Terraform Module for Amazon ParallelCluster 1.0.0 released

We're excited to announce the release of Terraform Module for Amazon ParallelCluster 1.0.0.

Features:

June 26, 2024

Amazon ParallelCluster version 3.9.3 released

We're excited to announce the release of Amazon ParallelCluster 3.9.3.

To upgrade, type sudo pip install --upgrade aws-parallelcluster.

Features:

  • Added support for FSx for Lustre as a shared storage type in us-iso-east-1.

Bug fixes:

  • Remove cloud_dns from the SlurmctldParameters in the Slurm config to avoid Slurm fanout issues.

    This isn't required, since we set the IP addresses on instance launch.

June 19, 2024

Amazon ParallelCluster version 3.9.2 released

We're excited to announce the release of Amazon ParallelCluster 3.9.2.

Features:

  • Upgrade Slurm to 23.11.7 (from 23.11.4).

  • For more details, see the CHANGELOG 3.9.2 on GitHub.

May 28, 2024

Amazon ParallelCluster UI version 2024.05.0 released

Amazon ParallelCluster UI version 2024.05.0 released.

Bug Fixes:

  • Fixed a bug in the frontend blocking the UI when the user opens the Job Status panel.

  • Full Changelog

May 14, 2024

Amazon ParallelCluster UI version 2024.04.0 released

Amazon ParallelCluster UI version 2024.04.0 released.

Features:

  • Added support for Amazon ParallelCluster version 3.9.1

  • Full Changelog

April 17, 2024

Amazon ParallelCluster version 3.9.1 released

We're excited to announce the release of Amazon ParallelCluster 3.9.1.

To upgrade, enter the following: sudo pip install --upgrade aws-parallelcluster

Bug fixes

  • Remove recursive deletion of shared storage mountdir when unmounting filesystems as part of update-cluster operation.

April 11, 2024

Amazon ParallelCluster UI version 2024.03.0 released

Amazon ParallelCluster UI version 2024.03.0 released.

Features:

  • Added support for Amazon ParallelCluster version 3.9.0

  • Added support for Ubuntu 22.04 and Red Hat Enterprise Linux 9

  • Deprecated Ubuntu 18.04

Bug fixes:

  • Fixed an issue that caused some clusters not to appear when many clusters exist.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster-ui package on GitHub.

March 12, 2024

Amazon ParallelCluster version 3.9.0 released

We're excited to announce the release of Amazon ParallelCluster 3.9.0.

To upgrade, enter the following: sudo pip install --upgrade aws-parallelcluster

Enhancements:

  • Add the configuration parameter DeploymentSettings/DefaultUserHome to allow users to move the default user's home directory to /local/home instead of /home (default).

  • Allow updating the MinCount, MaxCount, Queue, and ComputeResource configuration parameters without stopping the compute fleet. To update them, set Scheduling/SlurmSettings/QueueUpdateStrategy to TERMINATE. Amazon ParallelCluster then terminates only the nodes removed during a resize of the cluster capacity performed through a cluster update.

  • Allow updating external shared storage of type Efs, FsxLustre, FsxOntap, FsxOpenZfs, and FileCache without replacing the compute and login fleets.

  • Add support for RHEL9.

  • Add support for Rocky Linux 9 as a CustomAmi created through the build-image process. No official public Amazon ParallelCluster Rocky Linux 9 AMI is available at this time.

  • Remove CommunicationParameters from the Custom Slurm Settings deny list.

  • Add the DeploymentSettings/DisableSudoAccessForDefaultUser parameter to disable sudo access for the default user on supported operating systems.

  • For FSx for Lustre file systems created by ParallelCluster, change the Lustre server version to 2.15.

  • Add the option to choose between open source and closed source NVIDIA drivers when building an AMI, through the ['cluster']['nvidia']['kernel_open'] cookbook node attribute.

  • Add a clustermgtd configuration option, ec2_instance_missing_max_count, that sets a configurable number of retries to allow for eventual consistency between Amazon EC2 RunInstances and DescribeInstances results.
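A minimal Python sketch of the idea behind the TERMINATE queue update strategy described above: compare the node list before and after a resize, and treat only the nodes that disappear as termination candidates. The node naming pattern used here, <queue>-st-<compute-resource>-<n> for the MinCount static nodes and <queue>-dy-<compute-resource>-<n> for the MaxCount minus MinCount dynamic nodes, is an assumption, as are the function names; this is not how ParallelCluster implements the update.

```python
def slurm_node_names(queue: str, compute_resource: str, min_count: int, max_count: int) -> list[str]:
    """Node names under the assumed <queue>-st-<cr>-<n> / <queue>-dy-<cr>-<n> pattern."""
    if not 0 <= min_count <= max_count:
        raise ValueError("expected 0 <= MinCount <= MaxCount")
    static = [f"{queue}-st-{compute_resource}-{i}" for i in range(1, min_count + 1)]
    dynamic = [f"{queue}-dy-{compute_resource}-{i}" for i in range(1, max_count - min_count + 1)]
    return static + dynamic


def nodes_removed_by_resize(queue: str, compute_resource: str,
                            old: tuple[int, int], new: tuple[int, int]) -> list[str]:
    """Nodes present before the update but absent after it (termination candidates)."""
    before = slurm_node_names(queue, compute_resource, *old)
    after = set(slurm_node_names(queue, compute_resource, *new))
    return [node for node in before if node not in after]


# Shrink from MinCount=2/MaxCount=4 to MinCount=1/MaxCount=2 (hypothetical names).
print(nodes_removed_by_resize("q1", "c5xlarge", (2, 4), (1, 2)))
# -> ['q1-st-c5xlarge-2', 'q1-dy-c5xlarge-2']
```

Growing a compute resource yields an empty list in this model, which matches the note that only nodes removed by the resize are terminated.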

Changes

  • Upgrade Slurm to 23.11.4 (from 23.02.7).

  • Upgrade NVIDIA driver to version 535.154.05.

  • Add support for Python 3.11, 3.12 in pcluster CLI and aws-parallelcluster-batch-cli.

  • Build network interfaces using network card index from NetworkCardIndex list of Amazon EC2 DescribeInstances response, instead of looping over MaximumNetworkCards range.

  • Fail cluster creation when using P3, G3, P2, and G2 instance types, because their GPU architecture is not compatible with the open source NVIDIA drivers (OpenRM) introduced in the 3.8.0 release.

  • Upgrade third-party cookbook dependencies: nfs-5.1.2 (from nfs-5.0.0)

  • Upgrade EFA installer to 1.30.0.

    • Efa-driver: efa-2.6.0-1

    • Efa-config: efa-config-1.15-1

    • Efa-profile: efa-profile-1.6-1

    • Libfabric-aws: libfabric-aws-1.19.0

    • Rdma-core: rdma-core-46.0-1

    • Open MPI: openmpi40-aws-4.1.6-2 and openmpi50-aws-5.0.0-11

  • Upgrade NICE DCV to version 2023.1-16388.

    • server: 2023.1.16388-1

    • xdcv: 2023.1.565-1

    • gl: 2023.1.1047-1

    • web_viewer: 2023.1.16388-1

Bug fixes

  • Fix an issue that caused jobs to fail when submitted by an Active Directory user from login nodes. The issue was caused by an incomplete configuration of the integration with the external Active Directory on the head node.

  • Refactor IAM policies defined in the CloudFormation template parallelcluster-policies.yaml to prevent ParallelCluster API deployment failures caused by policies exceeding IAM limits.

  • Fix an issue that caused login nodes to fail to bootstrap when the head node takes longer than expected to write keys.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

March 5, 2024

Amazon ParallelCluster UI version 2024.02.0 released

Amazon ParallelCluster UI version 2024.02.0 released.

Changes:

  • Updated the Lambda runtime environment to Python v3.9

For details of the changes, see the CHANGELOG files for the aws-parallelcluster-ui package on GitHub.

February 8, 2024

Amazon ParallelCluster UI version 2023.12.0 released

Amazon ParallelCluster UI version 2023.12.0 released.

Features:

  • Added support for PCUI deployment with private networking.

  • Added the option to apply a permissions boundary to every IAM role created by the PCUI and PCAPI infrastructure.

  • Added the option to apply a prefix to every IAM role and policy created by the PCUI and PCAPI infrastructure.

  • Added support for ParallelCluster version 3.8.0, without feature parity in the wizard.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster-ui package on GitHub.

December 21, 2023

Amazon ParallelCluster version 3.8.0 released

Amazon ParallelCluster version 3.8.0 released.

Enhancements:

  • Add support for Amazon EC2 Capacity Blocks for ML.

  • Add support for Rocky Linux 8 as a CustomAmi created through the build-image process. No official public Amazon ParallelCluster Rocky Linux 8 AMI is available at this time.

  • Add Scheduling/ScalingStrategy parameter to control the cluster scaling strategy to use when launching Amazon EC2 instances for Slurm compute nodes. Possible values are all-or-nothing, greedy-all-or-nothing, best-effort, with all-or-nothing being the default.

  • Add HeadNode/SharedStorageType parameter to use EFS storage instead of NFS exports from the head node root volume for intra-cluster shared file system resources: ParallelCluster, Intel, Slurm, and /home data. This enhancement reduces the load on the head node networking.

  • Allow for mounting /home as an EFS or FSx external shared storage via the SharedStorage section of the config file.

  • Add new parameter SlurmSettings/MungeKeySecretArn to allow using an external, user-defined MUNGE key stored in Amazon Secrets Manager.

  • Add Monitoring/Alarms/Enabled parameter to toggle Amazon CloudWatch Alarms for the cluster.

  • Add head node alarms to monitor Amazon EC2 health checks, CPU utilization and the overall status of the head node, and add them to the CloudWatch Dashboard created with the cluster.

  • Add support for Data Repository Associations when using PERSISTENT_2 as DeploymentType for a managed FSx for Lustre.

  • Add Scheduling/SlurmSettings/Database/DatabaseName parameter to allow users to specify a custom name for the database on the database server to be used for Slurm accounting.

  • Make InstanceType an optional configuration parameter when configuring CapacityReservationTarget/CapacityReservationId in the compute resource.

  • Add the option to specify a prefix for IAM roles and policies created by the Amazon ParallelCluster API.

  • Add the option to specify a permissions boundary to be applied to IAM roles and policies created by the Amazon ParallelCluster API.
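As a simplified model of the new Scheduling/ScalingStrategy parameter, the Python sketch below contrasts the default all-or-nothing behavior with best-effort for a single scale-up request: under all-or-nothing, a partially satisfied request keeps no instances, while best-effort keeps whatever was launched. The helpers strategy_from_config and nodes_to_keep are hypothetical, and greedy-all-or-nothing is deliberately left unmodeled because these notes do not describe how it batches launches.

```python
from enum import Enum


class ScalingStrategy(str, Enum):
    ALL_OR_NOTHING = "all-or-nothing"  # default according to the 3.8.0 notes
    GREEDY_ALL_OR_NOTHING = "greedy-all-or-nothing"
    BEST_EFFORT = "best-effort"


def strategy_from_config(scheduling: dict) -> ScalingStrategy:
    """Read Scheduling/ScalingStrategy, falling back to the documented default."""
    return ScalingStrategy(scheduling.get("ScalingStrategy", ScalingStrategy.ALL_OR_NOTHING.value))


def nodes_to_keep(requested: int, launched: int, strategy: ScalingStrategy) -> int:
    """Simplified model: how many launched instances a scale-up request keeps."""
    if requested < 0 or launched < 0:
        raise ValueError("counts must be non-negative")
    launched = min(launched, requested)
    if strategy is ScalingStrategy.BEST_EFFORT:
        return launched  # keep whatever capacity was obtained
    if strategy is ScalingStrategy.ALL_OR_NOTHING:
        return launched if launched == requested else 0  # partial launches are released
    raise NotImplementedError(f"{strategy.value} is not modeled in this sketch")


print(nodes_to_keep(8, 5, strategy_from_config({})))  # -> 0
print(nodes_to_keep(8, 5, ScalingStrategy.BEST_EFFORT))  # -> 5
```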

Changes

  • Upgrade Slurm to 23.02.7 (from 23.02.6).

  • Upgrade NVIDIA driver to version 535.129.03.

  • Upgrade CUDA Toolkit to version 12.2.2.

  • Use Open Source NVIDIA GPU drivers (OpenRM) as NVIDIA kernel module for Linux instead of NVIDIA closed source module.

  • Remove support for the all_or_nothing_batch configuration parameter in the Slurm resume program, in favor of the new Scheduling/ScalingStrategy cluster configuration parameter.

  • Changed cluster alarms naming convention to '[cluster-name]-[component-name]-[metric]'.

  • Change default EBS volume types in ADC regions from gp2 to gp3, for both the root and additional volumes.

  • The optional permissions boundary for the Amazon ParallelCluster API is now applied to every IAM role created by the API infrastructure.

  • Upgrade EFA installer to 1.29.1.

    • Efa-driver: efa-2.6.0-1

    • Efa-config: efa-config-1.15-1

    • Efa-profile: efa-profile-1.5-1

    • Libfabric-aws: libfabric-aws-1.19.0-1

    • Rdma-core: rdma-core-46.0-1

    • Open MPI: openmpi40-aws-4.1.6-1

  • Upgrade GDRCopy to version 2.4 in all supported OSes, except for CentOS 7, where version 2.3.1 is used.

  • Upgrade aws-cfn-bootstrap to version 2.0-28.

  • Add support for Python 3.10 in aws-parallelcluster-batch-cli.
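The new alarm naming convention can be sketched in a few lines of Python. The helper names and the example component and metric names (HeadNode, Cpu) are hypothetical; only the '[cluster-name]-[component-name]-[metric]' pattern comes from the note above.

```python
def alarm_name(cluster_name: str, component_name: str, metric: str) -> str:
    """Compose an alarm name as [cluster-name]-[component-name]-[metric]."""
    if not (cluster_name and component_name and metric):
        raise ValueError("cluster, component, and metric names are all required")
    return f"{cluster_name}-{component_name}-{metric}"


def split_alarm_name(name: str) -> tuple[str, str, str]:
    """Inverse of alarm_name; splits from the right so hyphenated cluster names survive."""
    cluster_name, component_name, metric = name.rsplit("-", 2)
    return cluster_name, component_name, metric


name = alarm_name("my-cluster", "HeadNode", "Cpu")
print(name)  # -> my-cluster-HeadNode-Cpu
print(split_alarm_name(name))  # -> ('my-cluster', 'HeadNode', 'Cpu')
```

Splitting from the right matters because cluster names can contain hyphens; the sketch assumes the component and metric parts do not.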

Bug fixes

  • Fix inconsistent scaling configuration after cluster update rollback when modifying the list of instance types declared in the Compute Resources.

  • Fix SSH key generation for users when switching users without root privileges in clusters integrated with an external LDAP server through the cluster configuration file.

  • Fix disabling Slurm power save mode when setting ScaledownIdletime = -1.

  • Fix hard-coded path to Slurm installation dir in update_slurm_database_password.sh script for Slurm Accounting.

December 19, 2023

Amazon ParallelCluster version 3.7.2 released

Amazon ParallelCluster version 3.7.2 released.

Changes:

  • Upgrade Slurm to 23.02.6.

October 25, 2023

Amazon ParallelCluster UI version 2023.10.0 released

Amazon ParallelCluster UI version 2023.10.0 released.

Features:

  • Added support for ParallelCluster 3.7.2 with feature parity in the wizard limited to FSx File Cache and memory-based scheduling compatibility with multiple instance types.

Bug fixes:

  • Fixed issue causing UI errors when PCUI does not have permissions to interact with Cost Explorer.

Improvements

  • Improved security by reducing the access token TTL from 10 minutes to 5 minutes.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster-ui package on GitHub.

October 20, 2023

Amazon ParallelCluster version 3.7.1 released

Amazon ParallelCluster version 3.7.1 released.

Changes:

  • Upgrade Slurm to 23.02.5 (from 23.02.4).

    • Upgrade Pmix to 4.2.6 (from 3.2.3).

    • Upgrade libjwt to 1.15.3 (from 1.12.0).

  • Upgrade EFA installer to 1.26.1, fixing an RDMA writedata issue on P5 instances.

    • Efa-driver: efa-2.5.0-1.

    • Efa-config: efa-config-1.15-1.

    • Efa-profile: efa-profile-1.5-1.

    • Libfabric-aws: libfabric-aws-1.18.2-1.

    • Rdma-core: rdma-core-46.0-1.

    • Open MPI: openmpi40-aws-4.1.5-4.

September 22, 2023

Amazon ParallelCluster version 3.7.0 released

Amazon ParallelCluster version 3.7.0 released.

Enhancements:

  • Support configuration of static and dynamic node priorities in compute resources by using an Amazon ParallelCluster configuration YAML file.

  • Add support for Ubuntu 22.04. RSA keys are not supported by default.

  • Add the queue configuration setting JobExclusiveAllocation to allocate nodes in a partition exclusively to a single job at any given time.

  • Allow overriding the aws-parallelcluster-node package at cluster creation and cluster update time. For the head node, the override applies at cluster update time. This is useful for development purposes only.

  • Avoid starting the NFS server on compute nodes.

  • Add support for login nodes.

  • Allow memory-based scheduling when multiple instance types are specified for a Slurm Compute Resource.

  • Add support for mounting an existing Amazon File Cache as shared storage.

Changes:

  • Assign Slurm dynamic nodes a priority (weight) of 1000 by default. By doing this, Slurm can prioritize idle static nodes over idle dynamic nodes.

  • Make aws-parallelcluster-node daemons only handle Amazon ParallelCluster managed Slurm partitions.

  • Increase EFS-utils watchdog poll interval to 10 seconds. This change applies when EncryptionInTransit is set to true, which is the only condition that causes the watchdog to run.

  • Upgrade the EFA installer to 1.25.1.

    • Efa-driver: efa-2.5.0-1 (from efa-2.1.1g)

    • Efa-config: efa-config-1.15-1 (from efa-config-1.13-1)

    • Efa-profile: efa-profile-1.5-1 (no change)

    • Libfabric-aws: libfabric-aws-1.18.1-0 (from libfabric-aws-1.17.1-1)

    • Rdma-core: rdma-core-46.0-1 (from rdma-core-43.0-1)

    • Open MPI: openmpi40-aws-4.1.5-4 (from openmpi40-aws-4.1.5-1)

  • Upgrade Slurm to version 23.02.4.

  • Change the default value of Imds/ImdsSupport from v1.0 to v2.0.

  • Deprecate Ubuntu 18.

  • Update the default root volume size to 40 GB to account for limits on CentOS 7.

  • Restrict permission on file /tmp/wait_condition_handle.txt within the head node so that only root can read it.

  • Create a Slurm partition-nodelist mapping JSON file that the node package daemons use to recognize ParallelCluster-managed Slurm partitions and node lists.

  • Upgrade NVIDIA driver to version 535.54.03.

  • Upgrade CUDA library to version 12.2.0.

  • Upgrade NVIDIA Fabric manager to nvidia-fabricmanager-535.

  • Upgrade ARM PL to version 23.04.1 for Ubuntu 22.04 only.

  • Upgrade NICE DCV to version 2023.0-15487.

    • Server: 2023.0.15487-1

    • xdcv: 2023.0.551-1

    • gl: 2023.0.1039-1

    • web_viewer: 2023.0.15487-1
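A rough Python sketch of why a default dynamic-node weight of 1000 makes Slurm prefer idle static nodes: Slurm allocates lower-weight nodes first, so sorting idle candidates by weight puts static nodes ahead of dynamic ones. The static-node weight of 1, the Node class, pick_nodes, and the node names are assumptions for illustration; Slurm's real node selection weighs many more factors.

```python
from dataclasses import dataclass

DYNAMIC_NODE_WEIGHT = 1000  # default from the 3.7.0 notes
STATIC_NODE_WEIGHT = 1      # assumption for this sketch


@dataclass
class Node:
    name: str
    weight: int
    idle: bool


def pick_nodes(nodes: list[Node], count: int) -> list[str]:
    """Choose idle nodes, lowest weight first (Slurm allocates lower-weight nodes first)."""
    idle = [node for node in nodes if node.idle]
    # sorted() is stable, so equal-weight nodes keep their input order.
    return [node.name for node in sorted(idle, key=lambda node: node.weight)[:count]]


nodes = [
    Node("q1-dy-cr-1", DYNAMIC_NODE_WEIGHT, idle=True),
    Node("q1-st-cr-1", STATIC_NODE_WEIGHT, idle=True),
    Node("q1-st-cr-2", STATIC_NODE_WEIGHT, idle=False),
]
print(pick_nodes(nodes, 1))  # -> ['q1-st-cr-1']
print(pick_nodes(nodes, 2))  # -> ['q1-st-cr-1', 'q1-dy-cr-1']
```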

Bug fixes:

  • Add validation to the ScaledownIdletime value, to prevent setting a value lower than -1.

  • Fix cluster create failure with Ubuntu Deep Learning AMI on GPU instances with DCV enabled.

  • Fix issue causing dangling IAM policies to be created when creating ParallelCluster CloudFormation custom resource provider with CustomLambdaRole.

  • Fix an issue that caused misalignment of compute node DNS names on instances with multiple network interfaces when SlurmSettings/Dns/UseEc2Hostnames is set to True.
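The ScaledownIdletime rules in these notes (values below -1 are rejected, and -1 disables Slurm power saving, as the 3.8.0 bug fix for ScaledownIdletime = -1 implies) can be expressed as a small Python check. The function name power_saving_enabled is hypothetical, and treating the value as an integer number of minutes is an assumption.

```python
def power_saving_enabled(scaledown_idletime: int) -> bool:
    """Validate ScaledownIdletime and report whether Slurm power saving stays on.

    Values below -1 are rejected (3.7.0 validation); -1 disables power saving.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(scaledown_idletime, bool) or not isinstance(scaledown_idletime, int):
        raise TypeError("ScaledownIdletime must be an integer")
    if scaledown_idletime < -1:
        raise ValueError(f"ScaledownIdletime must be -1 or greater, got {scaledown_idletime}")
    return scaledown_idletime != -1


print(power_saving_enabled(10))  # -> True
print(power_saving_enabled(-1))  # -> False
```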

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

August 30, 2023

Documentation only release

Amazon ParallelCluster version 3 specific user guide published.

Documentation only release:

  • Amazon ParallelCluster version 3 has its own separate user guide.

July 17, 2023

Amazon ParallelCluster version 3.6.1 released

Amazon ParallelCluster version 3.6.1 released.

Changes:

  • Avoid duplication of nodes seen by clustermgtd if compute nodes are added to multiple Slurm partitions.

Bug fixes:

  • Remove hard coding of root volume device name (/dev/sda1 and /dev/xvda) and retrieve it from the AMI(s) used during create-cluster.

  • Fix cluster create failure when using CloudFormation custom resource with ElasticIp set to True.

  • Fix cluster create and update failures when using an Amazon CloudFormation custom resource with large configuration files.

  • Fix an issue that prevented ptrace protection from being disabled on Ubuntu and that didn't permit Cross Memory Attach (CMA) in libfabric.

  • Fix fast insufficient capacity fail-over logic when using multiple instance types and no instances are returned.
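Instead of hard-coding /dev/sda1 or /dev/xvda, the root device name can be read from the AMI metadata. A minimal Python sketch, assuming a response shaped like the output of boto3's ec2.describe_images(ImageIds=[...]) call (a dict with an Images list whose entries carry ImageId and RootDeviceName). The helper root_device_name is hypothetical, and the sample response below is made up so the snippet runs without AWS credentials.

```python
from typing import Any


def root_device_name(describe_images_response: dict[str, Any], image_id: str) -> str:
    """Return RootDeviceName for image_id from a DescribeImages-style response."""
    for image in describe_images_response.get("Images", []):
        if image.get("ImageId") == image_id:
            return image["RootDeviceName"]
    raise KeyError(f"{image_id} not found in DescribeImages response")


# Made-up response; with boto3 it would come from ec2.describe_images(ImageIds=[image_id]).
sample = {"Images": [{"ImageId": "ami-0123456789abcdef0", "RootDeviceName": "/dev/xvda"}]}
print(root_device_name(sample, "ami-0123456789abcdef0"))  # -> /dev/xvda
```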

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

July 5, 2023

Amazon ParallelCluster UI version 2023.06.0 released

Amazon ParallelCluster UI version 2023.06.0 released.

Changes:

  • Upgraded the default Amazon ParallelCluster API version to 3.6.0.

Bug fixes:

  • Fixed broken deployment for Amazon GovCloud (US-West) Region.

  • Split panel now correctly loads cluster details after creation has started.

Notes:

  • The Cost Monitoring feature is not available in Amazon GovCloud (US) Regions.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster-ui package on GitHub.

June 7, 2023

Amazon ParallelCluster version 3.6.0 released

Amazon ParallelCluster version 3.6.0 released.

Documentation:

Enhancements:

  • Add support for RHEL8.

  • Add an Amazon CloudFormation custom resource for creating and managing clusters with CloudFormation.

  • Add support for customizing the cluster Slurm configuration in the Amazon ParallelCluster configuration YAML file.

  • Build Slurm with support for LUA.

  • Increase the limit on the maximum number of queues per cluster from 10 to 50. Each queue can have up to 50 compute resources. Each cluster can have up to 50 compute resources.

  • Add support for specifying a sequence of multiple custom action scripts for an event configured in OnNodeStart, OnNodeConfigured, and OnNodeUpdated parameters.

  • Add new configuration section HealthChecks / Gpu, for applying GPU health checks on a compute node before a job is run.

  • Add support for Tags in the SlurmQueues and SlurmQueues / ComputeResources configuration.

  • Add support for DetailedMonitoring in the Monitoring configuration.

  • Add mem_used_percent and disk_used_percent metrics for head node memory and root volume disk utilization tracking in the Amazon ParallelCluster CloudWatch dashboard, and set up alarms for monitoring these metrics.

  • Add log rotation support for Amazon ParallelCluster managed logs.

  • Track common compute node errors and dynamic node longest idle time in the CloudWatch Dashboard.

  • Enforce the DCV Authenticator Server to use at least TLS-1.2 protocol when creating the SSL Socket.

  • Install the NVIDIA Data Center GPU Manager (DCGM) package on all supported operating systems except aarch64 centos7 and alinux2.

  • Load the kernel module nvidia-uvm by default to provide Unified Virtual Memory (UVM) functionality to the CUDA driver.

  • Install the NVIDIA Persistence Daemon as a system service.
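To make the new quotas concrete, this Python sketch counts queues and compute resources in a SlurmQueues list and reports which of the three limits in these notes (50 queues per cluster, 50 compute resources per queue, 50 compute resources per cluster) are exceeded. The function name and error messages are hypothetical; ParallelCluster performs its own validation.

```python
from typing import Any

MAX_QUEUES = 50
MAX_COMPUTE_RESOURCES_PER_QUEUE = 50
MAX_COMPUTE_RESOURCES_PER_CLUSTER = 50


def check_queue_limits(slurm_queues: list[dict[str, Any]]) -> list[str]:
    """Report which of the 3.6.0 queue and compute resource limits are exceeded."""
    errors: list[str] = []
    if len(slurm_queues) > MAX_QUEUES:
        errors.append(f"{len(slurm_queues)} queues exceed the limit of {MAX_QUEUES}")
    total = 0
    for queue in slurm_queues:
        count = len(queue.get("ComputeResources", []))
        total += count
        if count > MAX_COMPUTE_RESOURCES_PER_QUEUE:
            errors.append(f"queue {queue.get('Name')} has {count} compute resources "
                          f"(limit {MAX_COMPUTE_RESOURCES_PER_QUEUE})")
    if total > MAX_COMPUTE_RESOURCES_PER_CLUSTER:
        errors.append(f"cluster has {total} compute resources (limit {MAX_COMPUTE_RESOURCES_PER_CLUSTER})")
    return errors


queues = [
    {"Name": "q1", "ComputeResources": [{"Name": f"cr{i}"} for i in range(30)]},
    {"Name": "q2", "ComputeResources": [{"Name": f"cr{i}"} for i in range(30)]},
]
print(check_queue_limits(queues))  # -> ['cluster has 60 compute resources (limit 50)']
```

Two queues of 30 compute resources each stay within the per-queue limit, but together they exceed the 50 compute resources allowed per cluster, so the cluster-wide total is often the binding constraint.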

Changes:

  • Upgrade Slurm to version 23.02.2 (from version 22.05.8).

  • Upgrade munge to version 0.5.15 (from version 0.5.14).

  • Set the Slurm TreeWidth to 30.

  • Set the Slurm prolog and epilog configurations to target directory /opt/slurm/etc/scripts/prolog.d/ and /opt/slurm/etc/scripts/epilog.d/ respectively.

  • Set the Slurm BatchStartTimeout to 3 minutes, the maximum time allowed for running Prolog scripts during compute node registration.

  • Increase the default RetentionInDays of CloudWatch logs from 14 to 180 days.

  • Upgrade the EFA installer to 1.22.1.

    • Dkms: 2.8.3-2

    • Efa-driver: efa-2.1.1g (no change)

    • Efa-config: efa-config-1.13-1 (no change)

    • Efa-profile: efa-profile-1.5-1 (no change)

    • Libfabric-aws: libfabric-aws-1.17.1-1 (from libfabric-aws-1.17.0-1)

    • Rdma-core: rdma-core-43.0-1 (no change)

    • Open MPI: openmpi40-aws-4.1.5-1 (no change)

  • Upgrade the Lustre client version to 2.12 on Amazon Linux 2. Lustre client 2.12 was already installed on Ubuntu 20.04, Ubuntu 18.04, and CentOS 7.7 and later.

  • Upgrade the Lustre client version to 2.10.8 on CentOS 7.6.

  • Upgrade the NVIDIA driver to version 470.182.03 (from version 470.141.03).

  • Upgrade the NVIDIA Fabric Manager to version 470.182.03 (from version 470.141.03).

  • Upgrade the NVIDIA CUDA Toolkit to version 11.8.0 (from version 11.7.1).

  • Upgrade the NVIDIA CUDA sample to version 11.8.0.

  • Upgrade the Intel MPI Library to Version 2021 Update 9 (from Version 2021 Update 6). For more information, see Intel® MPI Library 2021 Update 9.

  • Upgrade NICE DCV to version 2023.0-15022 (from version 2022.2-14521).

    • server: 2023.0.15022-1 (from version 2022.2-14521-1).

    • xdcv: 2023.0.547-1 (from version 2022.2.519-1).

    • gl: 2023.0.1027-1 (from version 2022.2.1012-1).

    • web_viewer: 2023.0.15022-1 (from version 2022.2.14521-1).

  • Upgrade aws-cfn-bootstrap to version 2.0-24.

  • Upgrade image used by the CodeBuild environment when building container images for Amazon Batch clusters:

    • aws/codebuild/amazonlinux2-x86_64-standard:4.0 (from aws/codebuild/amazonlinux2-x86_64-standard:3.0).

    • aws/codebuild/amazonlinux2-aarch64-standard:2.0 (from aws/codebuild/amazonlinux2-aarch64-standard:1.0).

Bug fixes:

  • Fix Amazon EFS and Amazon FSx network security group validators to avoid reporting false errors.

  • Fix missing tagging of resources created by Image Builder during the build-image operation.

  • Fix update policy for MaxCount to always perform numerical comparisons on the MaxCount property.

  • Fix IP alignment on compute node instances with multiple network cards.

  • Fix replacement of StoragePass in the slurm_parallelcluster_slurmdbd.conf when a queue parameter update is performed and the Slurm accounting configurations are not updated.

  • Fix issue that causes dangling security groups to be created when creating a cluster with an existing EFS file system.

  • Fix issue causing the cfn-hup daemon to fail when it gets restarted.

  • Consider dynamic nodes with INVALID_REG flag as bootstrap failures for Slurm protected mode. Static nodes failing Slurm registration are already treated as bootstrap failures after the node_replacement_timeout.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

May 22, 2023

Amazon ParallelCluster UI version 2023.05.0 released

Amazon ParallelCluster UI version 2023.05.0 released.

Enhancements:

  • Starting with Amazon ParallelCluster version 3.6.0, add support for RHEL 8.

  • Add cluster cost monitoring.

  • Starting with Amazon ParallelCluster version 3.6.0, increase queue and compute resource quotas.

Changes:

  • Improved the cluster creation wizard user interface.

  • Increased the speed of Amazon ParallelCluster UI deployment.

  • Improved the interface for adding a new user.

  • Queues are in the head node subnet by default.

Bug fixes:

  • Switch to the correct region after cluster creation completes.

  • Fix the loading indicator display in the "Edit cluster" feature.

  • Fix cluster creation when the EBS SnapshotId property is removed.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster-ui package on GitHub.

May 16, 2023

Amazon ParallelCluster UI version 2023.04.0 released

Amazon ParallelCluster UI version 2023.04.0 released.

Enhancements:

  • Cluster create wizard re-design.

  • Cluster logs page re-design.

  • Add custom name setting for shared storage.

  • Add multiple storage selection when adding storage to a cluster.

  • Add DeletionPolicy support for Amazon EFS and FSx for Lustre.

  • Add ImdsSupport setting in cluster configuration.

  • Add support for C7 instance types.

  • Added tutorial Reverting to a previous Amazon Systems Manager document version.

Changes:

  • Cluster configuration YAML files can be up to 1 MB in size.

  • User isn't logged out due to an authorization with Boto3 IAM temporary credentials.

  • Disabled multi-threading options when an HPC instance is selected.

  • Removed the disable rollback option from the cluster creation page.

  • User is prevented from using the Amazon ParallelCluster UI until the required information is provided.

  • Up to 10 queues can be added.

  • The SSM-SessionManagerRunShell document is not overwritten during Amazon ParallelCluster UI installation.

Bug fixes:

  • Fix broken reset password link.

  • Fix broken delete stack caused by EcrPrivateRepository not being empty

  • Fixed initialization issue of the Generate SSH Keys check-box in Multiple user management properties section.

  • Fixed a crash caused by a job with undefined properties.

  • Fixed SCRATCH FSx settings.

  • Fixed the Start and Stop instances buttons remaining enabled after being clicked once.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster-ui package on GitHub.

April 17, 2023

Amazon ParallelCluster version 3.5.1 released

Amazon ParallelCluster version 3.5.1 released.

Enhancements:

Changes:

  • Upgrade EFA installer to 1.22.0.

    • Efa-driver: efa-2.1.1g (from efa-2.1.1-1)

    • Efa-config: efa-config-1.13-1 (from efa-config-1.12-1)

    • Efa-profile: efa-profile-1.5-1 (no change)

    • Libfabric-aws: libfabric-aws-1.17.0-1 (from libfabric-aws-1.16.1amzn3.0-1)

    • Rdma-core: rdma-core-43.0-1 (no change)

    • Open MPI: openmpi40-aws-4.1.5-1 (from openmpi40-aws-4.1.4-3)

  • Upgrade NICE DCV to version 2022.2-14521.

    • server: 2022.2.14521-1

    • xdcv: 2022.2.519-1

    • gl: 2022.2.1012-1

    • web_viewer: 2022.2.14521-1

Bug fixes:

  • Fix potential node launch failures caused by pattern matching between MountDir and /etc/exports when removing shared Amazon EBS volumes as part of a cluster update.

  • Fix to prevent compute_console_output log file truncation at every clustermgtd iteration.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

March 29, 2023

Amazon ParallelCluster version 3.5.0 released

Amazon ParallelCluster version 3.5.0 released.

Enhancements:

  • Access and manage clusters with the Amazon ParallelCluster UI.

  • Add versioned Amazon ParallelCluster policies in a CloudFormation template that you can reference in your workloads.

  • Add an Amazon ParallelCluster Python library that you can use with your own code.

  • Add logging of compute node console output to Amazon CloudWatch on compute node bootstrap failure.

  • Add a failures field, containing the failure code and reason, to the describe-cluster output when cluster creation fails.

  • Add validators to prevent malicious string injection while calling the subprocess module.

  • Fail cluster creation if cluster status changes to PROTECTED while provisioning static nodes.

Changes:

  • Upgrade to Slurm version 22.05.8 (from version 22.05.7)

  • Upgrade EFA installer to 1.21.0.

    • Efa-driver: efa-2.1.1-1 (from efa-2.1)

    • Efa-config: efa-config-1.12-1 (from efa-config-1.11-1)

    • Efa-profile: efa-profile-1.5-1 (no change)

    • Libfabric-aws: libfabric-aws-1.16.1amzn3.0-1 (from libfabric-aws-1.16.1)

    • Rdma-core: rdma-core-43.0-1 (from rdma-core-43.0-2)

    • Open MPI: openmpi40-aws-4.1.4-3 (no change)

  • Make Slurm controller logs more verbose and enable additional logging for the Slurm power save plugin.

Bug fixes:

  • Fix cluster database creation by verifying that the cluster name is not longer than 40 characters when Slurm accounting is enabled.

  • Fix an issue in clustermgtd that caused compute nodes, rebooted through Slurm, to be replaced if the Amazon EC2 instance status checks fail.

  • Fix an issue that prevented compute nodes, with capacity reservations shared by other accounts, from launching because of an incorrect IAM policy on the head node.
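The first 3.5.0 bug fix adds a length check on the cluster name when Slurm accounting is enabled. A minimal sketch of such a check follows. It assumes only the 40-character limit stated in the note. The function name `validate_cluster_name_for_accounting` and its list-of-errors return value are hypothetical and are not the validator that ships in aws-parallelcluster.

```python
from typing import List

# Limit taken from the 3.5.0 release note; it applies only when Slurm accounting is enabled.
MAX_NAME_LENGTH_WITH_ACCOUNTING = 40


def validate_cluster_name_for_accounting(cluster_name: str, accounting_enabled: bool) -> List[str]:
    """Hypothetical validator: return error messages, or an empty list if the name is acceptable."""
    errors: List[str] = []
    if accounting_enabled and len(cluster_name) > MAX_NAME_LENGTH_WITH_ACCOUNTING:
        errors.append(
            f"Cluster name '{cluster_name}' has {len(cluster_name)} characters; "
            f"at most {MAX_NAME_LENGTH_WITH_ACCOUNTING} are allowed with Slurm accounting."
        )
    return errors


if __name__ == "__main__":
    # A 41-character name is rejected only when accounting is enabled.
    long_name = "c" * 41
    print(validate_cluster_name_for_accounting(long_name, accounting_enabled=True))
    print(validate_cluster_name_for_accounting(long_name, accounting_enabled=False))
```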

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, aws-parallelcluster-node, and aws-parallelcluster-ui packages on GitHub.

February 20, 2023

Amazon ParallelCluster version 3.4.1 released

Amazon ParallelCluster version 3.4.1 released.

Bug fixes:

  • Fix a Slurm scheduler issue that could cause updates to be incorrectly applied to its internal registry of compute nodes. As a result of this issue, EC2 instances could become unavailable or could be backed by an incorrect instance type.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

January 13, 2023

Amazon ParallelCluster version 3.4.0 released

Amazon ParallelCluster version 3.4.0 released.

Enhancements:

  • Add support for launching nodes across multiple Availability Zones to increase capacity availability.

  • Add support for specifying multiple subnets for each queue to increase capacity availability.

  • Add a new configuration parameter, Iam / ResourcePrefix, to specify a prefix for the path and name of IAM resources created by Amazon ParallelCluster.

  • Add a new configuration section, DeploymentSettings / LambdaFunctionsVpcConfig, to specify the VPC configuration used by Amazon ParallelCluster Lambda functions.

  • Add the ability to specify a custom script to run on the head node during a cluster update. When using Slurm as the scheduler, specify the script with HeadNode / CustomActions / OnNodeUpdated.

Changes:

  • Remove creation of Amazon EFS mount targets for existing file systems.

  • Mount EFS file systems using amazon-efs-utils. EFS file systems can be mounted using in-transit encryption and an IAM authorized user.

  • Install stunnel 5.67 on CentOS 7 and Ubuntu to support EFS in-transit encryption.

  • Upgrade EFA installer to 1.20.0 (from 1.18.0).

    • Efa-driver: efa-2.1 (from efa-1.16.0-1)

    • Efa-config: efa-config-1.11-1 (no change)

    • Efa-profile: efa-profile-1.5-1 (no change)

    • Libfabric-aws: libfabric-aws-1.16.1 (from libfabric-aws-1.16.0~amzn4.0-1)

    • Rdma-core: rdma-core-43.0-2 (from rdma-core-41.0-2)

    • Open MPI: openmpi40-aws-4.1.4-3 (from openmpi40-aws-4.1.4-2)

  • Upgrade Slurm to version 22.05.7 (from 22.05.5).

  • Upgrade Python to 3.9.16 and 3.7.16 (from 3.9.15 and 3.7.13).

  • With Slurm 22.05.7, dynamic nodes in IDLE+CLOUD+COMPLETING+POWER_DOWN+NOT_RESPONDING status aren't considered unhealthy.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

December 22, 2022

Amazon ParallelCluster version 3.3.1 released

Amazon ParallelCluster version 3.3.1 released.

Changes:

  • Official Amazon ParallelCluster product AMIs now remain available after Amazon EC2 deprecates them at two years.

  • Increase the memory size of the Amazon ParallelCluster API Lambda function to 2048 MB to reduce cold-start penalties and avoid timeouts.

Bug fixes:

  • Prevent replacement of managed FSx for Lustre file systems and loss of data on cluster updates that include changes to the compute fleet subnet ID.

  • SharedStorage DeletionPolicy applies to cluster update actions.

For details of the changes, see the CHANGELOG file for the aws-parallelcluster package on GitHub.

December 2, 2022

Amazon ParallelCluster documentation only hpc6id note

Amazon ParallelCluster documentation-only update

  • Amazon ParallelCluster doesn't support the hpc6id instance type for the HeadNode / InstanceType setting.

December 2, 2022

Amazon ParallelCluster version 3.1.5 released

Amazon ParallelCluster version 3.1.5 released.

Enhancements:

  • Fix a Slurm issue that prevented the termination of idle nodes.

  • Upgrade EFA installer to 1.18.0

    • Efa-driver: efa-1.16.0-1

    • Efa-config: efa-config-1.11-1 (from efa-config-1.9-1)

    • Efa-profile: efa-profile-1.5-1 (no change)

    • Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1 (from libfabric-1.13.2).

    • Rdma-core: rdma-core-41.0-2 (from rdma-core-37.0)

    • Open MPI: openmpi40-aws-4.1.4-2 (from openmpi40-aws-4.1.1-2)

Changes:

  • Add lambda:ListTags and lambda:UntagResource to the ParallelClusterUserRole used by the Amazon ParallelCluster API stack for a cluster update.

  • Upgrade Intel MPI Library to Version 2021 Update 6 (from Version 2021 Update 4). For more information, see Intel® MPI Library 2021 Update 6.

  • Upgrade NVIDIA driver to version 470.141.03 (from 470.103.01).

  • Upgrade NVIDIA Fabric Manager to version 470.141.03 (from 470.103.01).

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

November 16, 2022

Amazon ParallelCluster version 3.3.0 released

Amazon ParallelCluster version 3.3.0 released.

Enhancements:

  • Add support for multiple instance allocation configuration for a compute resource when using Slurm as a scheduler. For more information, see Multiple instance type allocation with Slurm.

  • Add support for adding and removing SharedStorage with a cluster update, using an updated configuration. For more information, see Shared storage.

  • Add new configuration parameter DeletionPolicy for Efs and FsxLustre shared storage settings to support storage retention.

  • Add support for Slurm accounting with new configuration parameter Scheduling / SlurmSettings / Database. For more information, see Slurm accounting with Amazon ParallelCluster.

  • Add support for On-Demand Capacity Reservations (ODCR) and capacity reservation resource groups. For more information, see Launch instances with On-Demand Capacity Reservations (ODCR).

  • Add a new configuration parameter, Imds / ImdsSupport, to specify the IMDS version to support in the cluster or build image infrastructure. The parameter is available in both the cluster and the build image configurations.

  • Add support for Networking / PlacementGroup in the SlurmQueues / ComputeResources section.

  • Add support for instances with multiple network interfaces that are limited to only one ENI per device.

  • Improve validation of networking for external Amazon EFS file systems by checking the CIDR block in the attached security group.

  • Add validator to check if configured instance types support placement groups.

  • Configure NFS threads to be min(256, max(8, num_cores * 4)) to ensure better stability and performance.

  • Move NFS installation to build time to reduce configuration time.

  • Enable server-side encryption for the EcrImageBuilder SNS topic that's created when deploying Amazon ParallelCluster API and is used to notify on docker image build events.
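The NFS thread formula in the 3.3.0 enhancements clamps four threads per core to the range 8 to 256. The sketch below evaluates it for a few core counts. The note doesn't say how ParallelCluster counts cores (physical cores or vCPUs). The sketch's use of `os.cpu_count()`, which reports logical CPUs, is an assumption for illustration only.

```python
import os


def nfs_thread_count(num_cores: int) -> int:
    """Evaluate min(256, max(8, num_cores * 4)) as stated in the 3.3.0 release note."""
    return min(256, max(8, num_cores * 4))


if __name__ == "__main__":
    for cores in (1, 2, 16, 64, 96):
        print(cores, nfs_thread_count(cores))
    # os.cpu_count() counts logical CPUs and can return None, so fall back to 1.
    print("this host:", nfs_thread_count(os.cpu_count() or 1))
```

For example, a 16-core host gets 64 threads, and any host with 64 or more cores reaches the 256-thread ceiling.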

Changes:

  • Change the behavior of SlurmQueues / Networking / PlacementGroup / Enabled. It now creates a unique managed placement group for each compute resource instead of a single managed placement group for all compute resources.

  • Add support for SlurmQueues / Networking / PlacementGroup / Name as the preferred naming method.

  • Move head node tags from Launch Template to instance definition to avoid head node replacement on tags updates.

  • Disable multithreading through a script executed by cloud-init instead of through CpuOptions set in the launch template.

  • Upgrade Python to version 3.9 and NodeJS to version 16 in the API infrastructure, API Docker container, and cluster Lambda resources.

  • Remove support for Python 3.6 in aws-parallelcluster-batch-cli.

  • Upgrade Slurm to version 22.05.5 (from 21.08.8-2).

  • Upgrade NVIDIA driver to version 470.141.03 (from 470.129.06).

  • Upgrade NVIDIA Fabric Manager to version 470.141.03 (from 470.129.06).

  • Upgrade NVIDIA CUDA Toolkit to version 11.7.1 (from 11.4.4).

  • Upgrade Python used in Amazon ParallelCluster virtualenvs from 3.7.13 to 3.9.15.

  • Upgrade EFA installer to version 1.18.0.

    • Efa-driver: efa-1.16.0-1 (no change)

    • Efa-config: efa-config-1.11-1 (from efa-config-1.10-1)

    • Efa-profile: efa-profile-1.5-1 (no change)

    • Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1 (from libfabric-aws-1.16.0~amzn2.0-1).

    • Rdma-core: rdma-core-41.0-2 (from rdma-core-37.0)

    • Open MPI: openmpi40-aws-4.1.4-2 (from openmpi40-aws-4.1.1-2)

  • Upgrade NICE DCV to version 2022.1-13300 (from 2022.0-12760).

  • Enable suppression of the SingleSubnetValidator for Queues.

  • Don't replace DRAIN nodes that are in the COMPLETING state, because Epilog might still be running.

Bug fixes:

  • Fix validation of filters parameter in the Amazon ParallelCluster ListClusterLogStreams command to fail when incorrect filters are passed.

  • Fix validation of the SharedStorage / EfsSettings parameter so that it fails when FileSystemId is specified along with other SharedStorage / EfsSettings parameters. Previously, FileSystemId wasn't included in this check.

  • Fix cluster update when changing the order of SharedStorage together with other changes in the configuration.

  • Fix UpdateParallelClusterLambdaRole in the Amazon ParallelCluster API to upload logs to CloudWatch.

  • Fix Cinc not using the local CA certificates bundle when installing packages before any cookbooks are executed.

  • Fix a hang when upgrading Ubuntu with pcluster build-image when Build:UpdateOsPackages:Enabled:true is set.

  • Fix parsing of YAML cluster configuration by failing on duplicate keys.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

November 2, 2022

Amazon ParallelCluster documentation only API reference added.

Amazon ParallelCluster documentation-only update

October 27, 2022

Amazon ParallelCluster version 3.2.1 released

Amazon ParallelCluster version 3.2.1 released.

Enhancements:

  • Improve the logic to associate the host routing tables to the different network cards to better support Amazon EC2 instances with several NICs.

Changes:

  • Upgrade NVIDIA driver to version 470.141.03.

  • Upgrade NVIDIA Fabric Manager to version 470.141.03.

  • Disable cron job tasks man-db and mlocate, which may have a negative impact on node performance.

  • Upgrade Intel MPI Library to 2021.6.0.602.

  • Upgrade Python from 3.7.10 to 3.7.13 in response to this security risk.

Bug fixes:

  • Avoid failing on DescribeCluster when cluster configuration is not available.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

October 3, 2022

Amazon ParallelCluster version 3.2.0 released

Amazon ParallelCluster version 3.2.0 released.

Enhancements:

Changes:

  • Upgrade EFA installer to version 1.17.2.

    • EFA driver: efa-1.16.0-1

    • EFA configuration: efa-config-1.10-1

    • EFA profile: efa-profile-1.5-1

    • Libfabric: libfabric-aws-1.16.0~amzn2.0-1

    • RDMA core: rdma-core-41.0-2

    • Open MPI: openmpi40-aws-4.1.4-2

  • Upgrade NICE DCV to version 2022.0-12760.

  • Upgrade NVIDIA driver to version 470.129.06.

  • Upgrade NVIDIA Fabric Manager to version 470.129.06.

  • Change default EBS volume types from gp2 to gp3 in both the root and additional volumes.

  • Changes to FSx for Lustre file systems created by Amazon ParallelCluster:

    • Change the default deployment type to Scratch_2.

    • Change the Lustre server version to 2.12.

  • Doesn't require PlacementGroup / Enabled to be set to true when passing an existing PlacementGroup / Id.

  • Doesn't allow setting PlacementGroup / Id when PlacementGroup / Enabled is explicitly set to false.

  • Add parallelcluster:cluster-name tag to all resources created by Amazon ParallelCluster.

  • Add lambda:ListTags and lambda:UntagResource to ParallelClusterUserRole used by the Amazon ParallelCluster API stack for cluster update.

  • Restrict IPv6 access to IMDS to root and cluster admin users only, when configuration parameter HeadNode / Imds / Secured is enabled.

  • With a custom AMI, use the AMI root volume size instead of the ParallelCluster default of 35 GiB. The value can be changed in cluster configuration file.

  • Automatic disabling of the compute fleet when the configuration parameter Scheduling / SlurmQueues / ComputeResources / SpotPrice is lower than the minimum required Spot request fulfillment price.

  • Show requested_value and current_value values in the change set when adding or removing a section during an update.

  • Disable aws-ubuntu-eni-helper service, available in Deep Learning AMIs, to avoid conflicts with configure_nw_interface.sh when configuring instances with multiple network cards.

  • Remove support for Python 3.6.

  • Set MTU to 9001 for all the network interfaces when configuring instances with multiple network cards.

  • Remove the trailing dot when configuring the compute node FQDN.

  • Manage static nodes in POWERING_DOWN.

  • Don't replace dynamic nodes in POWER_DOWN, because jobs might still be running.

  • Restart clustermgtd and slurmctld daemons at cluster update time only when Scheduling parameters are updated in the cluster configuration.

  • Update slurmctld and slurmd systemd service files.

  • Set Slurm configuration AuthInfo=cred_expire=70 to reduce the time requeued jobs must wait before starting again when nodes are not available.

  • Upgrade third-party cookbook dependencies:

    • apt-7.4.2 (from apt-7.4.0)

    • line-4.5.2 (from line-4.0.1)

    • openssh-2.10.3 (from openssh-2.9.1)

    • pyenv-3.5.1 (from pyenv-3.4.2)

    • selinux-6.0.4 (from selinux-3.1.1)

    • yum-7.4.0 (from yum-6.1.1)

    • yum-epel-4.5.0 (from yum-epel-4.1.2)

Bug fixes:

  • Fix the default behavior to skip the Amazon ParallelCluster validation and test steps when building a custom AMI.

  • Fix file handle leak in computemgtd.

  • Fix race condition that was sporadically causing launched instances to be immediately terminated because they were not yet available in the EC2 DescribeInstances response.

  • Fix support for the DisableSimultaneousMultithreading parameter on instance types with Arm processors.

  • Fix Amazon ParallelCluster API stack update failure when upgrading from a previous version. Add resource pattern used for the ListImagePipelineImages Action in the EcrImageDeletionLambdaRole.

  • Fix the Amazon ParallelCluster API by adding missing permissions needed to import from or export to Amazon S3 when creating an FSx for Lustre file system.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

July 27, 2022

Amazon ParallelCluster documentation-only updates this year to date

Amazon ParallelCluster documentation-only updates.

July 6, 2022

Amazon ParallelCluster version 3.1.4 released

Amazon ParallelCluster version 3.1.4 released.

Enhancements:

Changes:

  • Upgrade Slurm to version 21.08.8-2.

  • Build Slurm with JWT support.

  • Doesn't require PlacementGroup / Enabled to be set to true when passing an existing PlacementGroup / Id.

  • Add lambda:TagResource to ParallelClusterUserRole used by ParallelCluster API stack for cluster creation and image creation.

Bug fixes:

  • Fix the ability to export a cluster's logs when using the export-cluster-logs command with the --filters option.

  • Fix Amazon Batch Docker entry point to use /home shared directory to coordinate Multi-node-Parallel job execution.

  • Reset the node address when setting an unhealthy Slurm static node to down. This avoids treating a static node that failed because of insufficient capacity as a bootstrap failure node.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

May 16, 2022

Amazon ParallelCluster version 3.1.3 released

Amazon ParallelCluster version 3.1.3 released.

Enhancements:

  • Create SSH keys alongside the creation of the HOME directory, for example, during SSH login, when switching to another user, and when executing a command as another user.

  • Add support for both FQDN and LDAP Distinguished Names in the configuration parameter DirectoryService / DomainName. The new validator checks both syntaxes.

  • Add a new update_directory_service_password.sh script, deployed on the head node, that supports manually updating the Active Directory password in the SSSD configuration. The password is retrieved from Amazon Secrets Manager, as specified in the cluster configuration.

  • Add support to deploy API infrastructure in environments without a default VPC.

Changes:

  • Disable deeper C-States in x86_64 official AMIs and AMIs created through build-image command, to guarantee high performance and low latency.

  • OS package updates and security fixes.

  • Change Amazon Linux 2 base images to use AMIs with Kernel 5.10.

Bug fixes:

  • Fix the build-image stack ending in DELETE_FAILED after a successful image build because of new EC2 Image Builder policies.

  • Fix the conversion of the configuration parameter DirectoryService / DomainAddr to the ldap_uri SSSD property when it contains multiple domain addresses.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster and aws-parallelcluster-cookbook packages on GitHub.

April 20, 2022

Amazon ParallelCluster version 3.1.2 released

Amazon ParallelCluster version 3.1.2 released.

Changes:

  • Upgrade Slurm to version 21.08.6 (from 21.08.5).

Bug fixes:

  • Fix the update of /etc/hosts file on compute nodes when a cluster is deployed in subnets without internet access.

  • Fix compute nodes bootstrap to wait for ephemeral drives initialization before joining the cluster.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster package on GitHub.

March 2, 2022

Amazon ParallelCluster version 3.1.1 released

Amazon ParallelCluster version 3.1.1 released.

  • Add support for multiple user cluster environments by integrating with Active Directory (AD) domains managed through Amazon Directory Service.

  • Add support for UseEc2Hostnames in the cluster configuration file. When set to true, use Amazon EC2 default hostnames (e.g. ip-1-2-3-4) for compute nodes.

  • Add support for cluster creation in subnets with no internet access.

  • Add support for multiple compute instance types per queue.

  • Add support for GPU scheduling with Slurm on ARM instances with NVIDIA cards.

  • Add abbreviated flags for cluster-name (-n), region (-r), image-id (-i) and cluster-configuration / image-configuration (-c) to the Amazon ParallelCluster CLI.

  • Add support for NEW_CHANGED_DELETED option for FSx for Lustre AutoImportPolicy parameter.

  • Add parallelcluster:compute-resource-name tag to EC2 LaunchTemplates resources used by compute nodes.

  • Improve the security groups created within the cluster to allow inbound connections from custom security groups when SecurityGroups parameters are specified for the head node, some queues, or both.

  • Install NVIDIA drivers and CUDA library for ARM.
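With UseEc2Hostnames set to true, compute nodes keep Amazon EC2 default hostnames such as ip-1-2-3-4, which encode the node's private IPv4 address. The sketch below derives that host label from an address. The helper name `ec2_style_hostname` is hypothetical. It covers only IP-based names like the one in the example above, and it omits the DNS domain suffix that the full EC2 hostname carries.

```python
import ipaddress


def ec2_style_hostname(private_ip: str) -> str:
    """Return the 'ip-a-b-c-d' host label for a private IPv4 address (hypothetical helper)."""
    # Parsing first rejects malformed input such as "300.1.2.3" with a ValueError.
    addr = ipaddress.IPv4Address(private_ip)
    return "ip-" + str(addr).replace(".", "-")


if __name__ == "__main__":
    print(ec2_style_hostname("1.2.3.4"))
    print(ec2_style_hostname("10.0.1.25"))
```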

Changes:

  • Upgrade Slurm to version 21.08.5 (from 20.11.8).

  • Upgrade Slurm plugin to version 21.08 (from 20.11).

  • Upgrade NICE DCV to version 2021.3-11591 (from 2021.1-10851).

  • Upgrade NVIDIA driver to version 470.103.01 (from 470.57.02).

  • Upgrade NVIDIA Fabric manager to version 470.103.01 (from 470.57.02).

  • Upgrade CUDA to version 11.4.4 (from 11.4.0).

  • Upgrade Intel MPI to Version 2021 Update 4 (from Version 2019 Update 8). For more information, see Intel® MPI Library 2021 Update 4.

  • Upgrade PMIx to version 3.2.3 (from 3.1.5).

  • Remove dumping of failed compute nodes to /home/logs/compute. Compute nodes log files are available in CloudWatch and in Amazon EC2 console logs.

  • Allow suppression of the SlurmQueues and ComputeResources length validators.

  • Disable package update at instance launch time on Amazon Linux 2.

  • Disable Amazon EC2 ImageBuilder enhanced image metadata when building Amazon ParallelCluster custom images.

  • Explicitly set cloud-init datasource to be EC2. This saves boot time for Ubuntu and CentOS platforms.

  • Use compute resource name rather than instance type in compute fleet launch template name.

  • Redirect stderr and stdout to CLI log file to prevent unwanted text in the pcluster CLI output.

  • Move the configure/install recipes to separate cookbooks that are called from the main one. Existing entrypoints are maintained and backwards compatible.

  • Download dependencies of the Intel HPC platform at AMI build time to avoid contacting the internet at cluster creation time.

  • Don't strip the - character from compute resource names when configuring Slurm nodes.

  • Do not configure GPUs in Slurm when NVIDIA driver is not installed.

  • Fix ecs:ListContainerInstances permission in BatchUserRole.

  • Fix exporting of cluster logs when no prefix is specified. Previously, logs were exported to a None prefix.

  • Fix rollback not being performed in case of cluster update failure.

  • Fix RootVolume schema for the HeadNode by raising an error if an unsupported KmsKeyId is specified.

  • Fix missing Amazon FSx metrics in the CloudWatch dashboard.

  • Fix EfaSecurityGroupValidator. Previously, it could produce false failures when custom security groups were provided and EFA was enabled.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub.

February 10, 2022

Amazon ParallelCluster version 3.0.3 released

Amazon ParallelCluster version 3.0.3 released.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster and aws-parallelcluster-cookbook packages on GitHub.

January 17, 2022

Amazon ParallelCluster version 3.0.2 released

Amazon ParallelCluster version 3.0.2 released.

Upgrade Elastic Fabric Adapter installer to 1.14.1

  • EFA config: efa-config-1.9-1 (from efa-config-1.9)

  • EFA profile: efa-profile-1.5-1 (from efa-profile-1.5)

  • EFA Kernel module: efa-1.14.2 (from efa-1.13.0)

  • RDMA core: rdma-core-37.0 (from rdma-core-35)

  • Libfabric: libfabric-1.13.2 (from libfabric-1.13.0)

  • Open MPI: openmpi40-aws-4.1.1-2 (no change)

GPUDirect RDMA is always enabled if supported by the instance type. The GdrSupport configuration option has no effect.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook and aws-parallelcluster-node packages on GitHub.

November 5, 2021

Amazon ParallelCluster version 3.0.1 released

Amazon ParallelCluster version 3.0.1 released.

Cluster configuration migration tool

  • Customers can now migrate their cluster configurations from the Amazon ParallelCluster version 2 format to the YAML-based Amazon ParallelCluster version 3 format. For more information, see pcluster3-config-converter.

Head node can be stopped

  • After stopping the compute fleet, the head node can be stopped and later restarted using the Amazon EC2 console or the stop-instances Amazon CLI command.

Default Amazon Web Services Region read from ~/.aws/config file

  • For the pcluster command, if the Amazon Web Services Region is not specified in the configuration file, in the environment, or on the command line, the default Amazon Web Services Region specified in the region setting in the [default] section of the ~/.aws/config file is used.
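The fallback described above can be sketched with Python's standard configparser module. The function names `read_default_region` and `resolve_region` are hypothetical. The order in which the sketch checks the command line, the environment, and the configuration file is an assumption; the note only establishes that the region setting in the [default] section of ~/.aws/config is used when none of those sources specifies a Region.

```python
import configparser
import os
from typing import Optional


def read_default_region(aws_config_path: str = "~/.aws/config") -> Optional[str]:
    """Return `region` from the [default] section of an AWS config file, or None."""
    parser = configparser.ConfigParser(interpolation=None)
    # read() skips files that don't exist and returns the list of files it parsed.
    if not parser.read(os.path.expanduser(aws_config_path)):
        return None
    return parser.get("default", "region", fallback=None)


def resolve_region(
    cli_region: Optional[str] = None,
    env_region: Optional[str] = None,
    config_file_region: Optional[str] = None,
    aws_config_path: str = "~/.aws/config",
) -> Optional[str]:
    """Hypothetical resolver: the first explicit value wins, else fall back to ~/.aws/config."""
    # Assumed precedence: command line, then environment, then cluster configuration file.
    for candidate in (cli_region, env_region, config_file_region):
        if candidate:
            return candidate
    return read_default_region(aws_config_path)


if __name__ == "__main__":
    print(resolve_region())
```

Named profiles appear as [profile name] sections in ~/.aws/config. The sketch ignores them, because the note refers only to the [default] section.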

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook and aws-parallelcluster-node packages on GitHub.

October 27, 2021

Amazon ParallelCluster version 3.0.0 released

Amazon ParallelCluster version 3.0.0 released.

Support for cluster management via Amazon API Gateway

  • Customers can now manage and deploy clusters through HTTP endpoints with Amazon API Gateway. This opens up new possibilities for scripted or event-driven workflows.

    The Amazon ParallelCluster command line interface (CLI) has also been redesigned for compatibility with this API and includes a new JSON output option. This new functionality makes it possible for customers to implement similar building block capabilities using the CLI as well.

Improved custom AMI creation

  • Customers now have access to a more robust process for creating and managing custom AMIs using EC2 Image Builder. Custom AMIs can now be managed through a separate Amazon ParallelCluster configuration file, and can be created using the pcluster build-image command in the Amazon ParallelCluster command line interface.

For details of the changes, see the CHANGELOG files for the aws-parallelcluster, aws-parallelcluster-cookbook and aws-parallelcluster-node packages on GitHub.

September 10, 2021