SageMaker data parallelism library release notes

See the following release notes to track the latest updates for the SageMaker distributed data parallelism (SMDDP) library.

The SageMaker distributed data parallelism library v2.2.0

Date: March 4, 2024

New features

  • Added support for PyTorch v2.2.0 with CUDA v12.1.

Integration into Docker containers distributed by the SageMaker model parallelism (SMP) library

This version of the SMDDP library is included in the following Docker image of the SageMaker model parallelism (SMP) library v2.2.0.

658645717510.dkr.ecr.<region>.amazonaws.com/smdistributed-modelparallel:2.2.0-gpu-py310-cu121

For Regions where the SMP Docker images are available, see Amazon Web Services Regions.
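
The image URI above carries a `<region>` placeholder that must be replaced with a concrete Region code before the image can be pulled. The following is a minimal sketch of that substitution. The helper name `smp_image_uri` and the example Region `us-west-2` are illustrative assumptions, not part of the library; confirm that the Region you use is one where the SMP images are available.

```python
# Hypothetical helper (not part of SMDDP or SMP): fills the <region>
# placeholder in the image URI listed in this release note.
SMP_IMAGE_TEMPLATE = (
    "658645717510.dkr.ecr.{region}.amazonaws.com/"
    "smdistributed-modelparallel:2.2.0-gpu-py310-cu121"
)


def smp_image_uri(region: str) -> str:
    """Return the SMP v2.2.0 image URI for the given AWS Region code."""
    if not region or "<" in region or ">" in region:
        raise ValueError("pass a concrete Region code, not the <region> placeholder")
    return SMP_IMAGE_TEMPLATE.format(region=region)


# "us-west-2" is only an example Region chosen for illustration.
print(smp_image_uri("us-west-2"))
```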

Binary file of this release

You can download or install the library using the following URL.

https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.2.0/cu121/2024-03-04/smdistributed_dataparallel-2.2.0-cp310-cp310-linux_x86_64.whl
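
One way to install the wheel from the URL above is to pass it directly to `pip`. The sketch below only builds the command for the interpreter that runs it and does not execute anything. The function name `pip_install_command` is a hypothetical helper, and it assumes an environment that matches the wheel (Python 3.10, Linux x86_64, PyTorch v2.2.0 with CUDA v12.1).

```python
import sys

# Wheel URL copied from this release note.
SMDDP_WHEEL_URL = (
    "https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.2.0/cu121/"
    "2024-03-04/smdistributed_dataparallel-2.2.0-cp310-cp310-linux_x86_64.whl"
)


def pip_install_command(wheel_url: str) -> list[str]:
    """Build (but do not run) a pip install command for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", wheel_url]


if __name__ == "__main__":
    # To actually install, pass the list to subprocess.run(cmd, check=True).
    print(" ".join(pip_install_command(SMDDP_WHEEL_URL)))
```

Invoking `pip` through `sys.executable -m pip` keeps the installation tied to the same Python environment that will later import the library.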

The SageMaker distributed data parallelism library v2.1.0

Date: March 1, 2024

New features

  • Added support for PyTorch v2.1.0 with CUDA v12.1.

Bug fixes

Integration into SageMaker Framework Containers

This version of the SMDDP library passed benchmark testing and is integrated into the following SageMaker Framework Container.

  • PyTorch v2.1.0

    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.1.0-gpu-py310-cu121-ubuntu20.04-sagemaker
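
As a rough illustration of how this container might be used, the sketch below assembles keyword arguments that could be passed to the SageMaker Python SDK `PyTorch` estimator. The function name, the default instance count, and the `ml.p4d.24xlarge` instance type are assumptions made for the example. A real job also needs at least an entry point script and an IAM role, which are omitted here.

```python
def smddp_estimator_kwargs(region: str, instance_count: int = 2) -> dict:
    """Assemble example estimator arguments for the PyTorch v2.1.0 container.

    Hypothetical helper: it only builds a dictionary and does not call AWS.
    """
    image_uri = (
        f"763104351884.dkr.ecr.{region}.amazonaws.com/"
        "pytorch-training:2.1.0-gpu-py310-cu121-ubuntu20.04-sagemaker"
    )
    return {
        "image_uri": image_uri,
        # Assumed instance type for illustration; check which types SMDDP supports.
        "instance_type": "ml.p4d.24xlarge",
        "instance_count": instance_count,
        # Turns on the SMDDP library for the training job.
        "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
    }


kwargs = smddp_estimator_kwargs("us-west-2")  # example Region only
print(kwargs["image_uri"])
```

With the SageMaker Python SDK installed, these arguments would typically be combined with `entry_point` and `role` and passed as `sagemaker.pytorch.PyTorch(**kwargs, ...)`.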

Integration into Docker containers distributed by the SageMaker model parallelism (SMP) library

This version of the SMDDP library is included in the following Docker image of the SageMaker model parallelism (SMP) library v2.1.0.

658645717510.dkr.ecr.<region>.amazonaws.com/smdistributed-modelparallel:2.1.2-gpu-py310-cu121

For Regions where the SMP Docker images are available, see Amazon Web Services Regions.

Binary file of this release

You can download or install the library using the following URL.

https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.1.0/cu121/2024-02-04/smdistributed_dataparallel-2.1.0-cp310-cp310-linux_x86_64.whl
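
The wheel file name encodes the Python, ABI, and platform tags that the binary was built for (`cp310-cp310-linux_x86_64`). Before installing outside the prebuilt containers, it can help to check those tags against the target environment. The following sketch parses the tags from the URL using the standard wheel naming layout; the function names are hypothetical and the compatibility check is deliberately simplistic (it compares only the Python tag).

```python
import posixpath
import sys
from urllib.parse import urlparse

# Wheel URL copied from this release note.
WHEEL_URL = (
    "https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.1.0/cu121/"
    "2024-02-04/smdistributed_dataparallel-2.1.0-cp310-cp310-linux_x86_64.whl"
)


def wheel_tags(url: str) -> dict:
    """Split a wheel file name into name, version, and compatibility tags."""
    filename = posixpath.basename(urlparse(url).path)
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def matches_current_python(tags: dict) -> bool:
    """Return True if the wheel's Python tag matches the running CPython version."""
    expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return tags["python"] == expected


tags = wheel_tags(WHEEL_URL)
print(tags, matches_current_python(tags))
```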

The SageMaker distributed data parallelism library v2.0.1

Date: December 7, 2023

New features

Known issues

  • There is a CPU memory leak: CPU memory usage increases gradually while training with SMDDP AllReduce in DDP mode.

Integration into SageMaker Framework Containers

This version of the SMDDP library passed benchmark testing and is integrated into the following SageMaker Framework Container.

  • PyTorch v2.0.1

    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.0.1-gpu-py310-cu118-ubuntu20.04-sagemaker

Binary file of this release

You can download or install the library using the following URL.

https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.0.1/cu118/2023-12-07/smdistributed_dataparallel-2.0.2-cp310-cp310-linux_x86_64.whl

Other changes

  • Starting from this release, documentation for the SMDDP library is fully available in this Amazon SageMaker Developer Guide. Because the complete developer guide for SMDDP v2 is now housed here, the additional reference for SMDDP v1.x in the SageMaker Python SDK documentation is no longer supported. If you still need the SMDDP v1.x documentation, see the snapshot preserved in the SageMaker Python SDK v2.212.0 documentation.