
Supported Frameworks and Amazon Web Services Regions

Before using the SageMaker model parallelism library, check the supported frameworks and instance types, and confirm that your Amazon account has sufficient service quotas for those instance types in your Amazon Web Services Region.

Note

To check the latest updates and release notes of the library, see the SageMaker Model Parallel Release Notes in the SageMaker Python SDK documentation.

Supported Frameworks

The SageMaker model parallelism library supports the following deep learning frameworks. It is available preinstalled in Amazon Deep Learning Containers (DLCs) and, for some PyTorch versions, as a downloadable binary file.

PyTorch versions supported by SageMaker AI and the SageMaker model parallelism library

PyTorch version	SageMaker model parallelism library version	`smdistributed-modelparallel` integrated DLC image URI	URL of the binary file**
v2.0.0	`smdistributed-modelparallel==v1.15.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker`	https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-2.0.0/build-artifacts/2023-04-14-20-14/smdistributed_modelparallel-1.15.0-cp310-cp310-linux_x86_64.whl
v1.13.1	`smdistributed-modelparallel==v1.15.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker`	https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.13.1/build-artifacts/2023-04-17-15-49/smdistributed_modelparallel-1.15.0-cp39-cp39-linux_x86_64.whl
v1.12.1	`smdistributed-modelparallel==v1.13.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker`	https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.1/build-artifacts/2022-12-08-21-34/smdistributed_modelparallel-1.13.0-cp38-cp38-linux_x86_64.whl
v1.12.0	`smdistributed-modelparallel==v1.11.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker`	https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.0/build-artifacts/2022-08-12-16-58/smdistributed_modelparallel-1.11.0-cp38-cp38-linux_x86_64.whl
v1.11.0	`smdistributed-modelparallel==v1.10.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker`	https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-07-11-19-23/smdistributed_modelparallel-1.10.0-cp38-cp38-linux_x86_64.whl
v1.10.2	`smdistributed-modelparallel==v1.7.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker`	-
v1.10.0	`smdistributed-modelparallel==v1.5.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.10.0-gpu-py38-cu113-ubuntu20.04-sagemaker`	-
v1.9.1	`smdistributed-modelparallel==v1.4.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.9.1-gpu-py38-cu111-ubuntu20.04`	-
v1.8.1*	`smdistributed-modelparallel==v1.6.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.8.1-gpu-py36-cu111-ubuntu18.04`	-

Note

The SageMaker model parallelism library v1.6.0 and later provides extended features for PyTorch. For more information, see Core Features of the SageMaker Model Parallelism Library.

** The URLs of the binary files are for installing the SageMaker model parallelism library in custom containers. For more information, see Create Your Own Docker Container with the SageMaker Distributed Model Parallel Library.
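
When you build a custom container, the wheel you install must match both the container's PyTorch version and its Python interpreter. The wheel filename encodes the interpreter as a CPython tag (for example, `cp310` for Python 3.10), which lines up with the `py310` part of the corresponding DLC tag. The following is a minimal sketch, assuming you install the wheel with pip. The `WHEELS` mapping and the `wheel_python_tag` and `pip_install_command` helpers are hypothetical names used for illustration; the URLs themselves are copied verbatim from the table above.

```python
import re
import sys

# Binary wheel URLs copied from the PyTorch table above.
# PyTorch versions whose table row has no wheel URL are omitted.
WHEELS = {
    "2.0.0": "https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-2.0.0/build-artifacts/2023-04-14-20-14/smdistributed_modelparallel-1.15.0-cp310-cp310-linux_x86_64.whl",
    "1.13.1": "https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.13.1/build-artifacts/2023-04-17-15-49/smdistributed_modelparallel-1.15.0-cp39-cp39-linux_x86_64.whl",
    "1.12.1": "https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.1/build-artifacts/2022-12-08-21-34/smdistributed_modelparallel-1.13.0-cp38-cp38-linux_x86_64.whl",
    "1.12.0": "https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.0/build-artifacts/2022-08-12-16-58/smdistributed_modelparallel-1.11.0-cp38-cp38-linux_x86_64.whl",
    "1.11.0": "https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-07-11-19-23/smdistributed_modelparallel-1.10.0-cp38-cp38-linux_x86_64.whl",
}


def wheel_python_tag(url):
    """Return the CPython tag (for example 'cp310') embedded in a wheel filename."""
    match = re.search(r"-(cp\d+)-cp\d+-linux_x86_64\.whl$", url)
    if match is None:
        raise ValueError(f"unexpected wheel filename: {url}")
    return match.group(1)


def pip_install_command(torch_version, python_version=None):
    """Build a pip command for the wheel that matches `torch_version`.

    `python_version` is a (major, minor) tuple; it defaults to the running interpreter.
    """
    if torch_version not in WHEELS:
        raise KeyError(f"no binary wheel is listed for PyTorch {torch_version}")
    url = WHEELS[torch_version]
    major, minor = python_version or sys.version_info[:2]
    expected_tag = f"cp{major}{minor}"
    actual_tag = wheel_python_tag(url)
    # Refuse to emit a command whose wheel targets a different interpreter.
    if actual_tag != expected_tag:
        raise ValueError(
            f"the wheel for PyTorch {torch_version} targets {actual_tag}, "
            f"but the interpreter is {expected_tag}"
        )
    return f"pip install {url}"


if __name__ == "__main__":
    print(pip_install_command("2.0.0", (3, 10)))
```

The interpreter check fails early, at image build time, instead of letting pip reject the wheel later with a less specific platform error.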

TensorFlow versions supported by SageMaker AI and the SageMaker model parallelism library

TensorFlow version	SageMaker model parallelism library version	`smdistributed-modelparallel` integrated DLC image URI
v2.6.0	`smdistributed-modelparallel==v1.4.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.6.0-gpu-py38-cu112-ubuntu20.04`
v2.5.1	`smdistributed-modelparallel==v1.4.0`	`763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.5.1-gpu-py37-cu112-ubuntu18.04`
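
The DLC image URIs in both tables follow one pattern: the account `763104351884`, the ECR registry host for your Region, a `<framework>-training` repository, and a version-specific tag. The following sketch fills in the `<region>` placeholder for a listed framework version. The `IMAGE_TAGS` mapping and the `dlc_image_uri` helper are hypothetical names, and the tags are copied from the tables. The tables show the same account for every Region, but some Regions may host DLCs under a different account or domain, so confirm the exact URI in Available Deep Learning Containers Images before relying on this pattern.

```python
ACCOUNT_ID = "763104351884"  # registry account shown in the tables above

# Image tags copied from the PyTorch and TensorFlow tables above.
IMAGE_TAGS = {
    "pytorch": {
        "2.0.0": "2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker",
        "1.13.1": "1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker",
        "1.12.1": "1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker",
        "1.12.0": "1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker",
        "1.11.0": "1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker",
        "1.10.2": "1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker",
        "1.10.0": "1.10.0-gpu-py38-cu113-ubuntu20.04-sagemaker",
        "1.9.1": "1.9.1-gpu-py38-cu111-ubuntu20.04",
        "1.8.1": "1.8.1-gpu-py36-cu111-ubuntu18.04",
    },
    "tensorflow": {
        "2.6.0": "2.6.0-gpu-py38-cu112-ubuntu20.04",
        "2.5.1": "2.5.1-gpu-py37-cu112-ubuntu18.04",
    },
}


def dlc_image_uri(framework, version, region):
    """Fill the <region> placeholder of a listed smdistributed-modelparallel DLC URI."""
    framework = framework.lower()
    tags = IMAGE_TAGS.get(framework, {})
    if version not in tags:
        raise KeyError(f"{framework} {version} is not in the supported-version tables")
    # Every repository in the tables follows the '<framework>-training' naming.
    return f"{ACCOUNT_ID}.dkr.ecr.{region}.amazonaws.com/{framework}-training:{tags[version]}"


if __name__ == "__main__":
    print(dlc_image_uri("pytorch", "2.0.0", "us-west-2"))
```

Raising on an unlisted version, rather than formatting a tag from the version string, keeps the helper from producing URIs for images that do not have the library integrated.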

Hugging Face Transformers versions supported by SageMaker AI and the SageMaker model parallelism library

The Amazon Deep Learning Containers for Hugging Face use the SageMaker Training Containers for PyTorch and TensorFlow as their base images. To look up the Hugging Face Transformers library versions and paired PyTorch and TensorFlow versions, see the latest Hugging Face Containers and the Prior Hugging Face Container Versions.

Amazon Web Services Regions

The SageMaker model parallelism library is available in all of the Amazon Web Services Regions where the Amazon Deep Learning Containers for SageMaker are in service. For more information, see Available Deep Learning Containers Images.

Supported Instance Types

The SageMaker model parallelism library requires one of the following ML instance types.

Instance type
`ml.g4dn.12xlarge`
`ml.p3.16xlarge`
`ml.p3dn.24xlarge`
`ml.p4d.24xlarge`
`ml.p4de.24xlarge`
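
To fail fast before submitting a training job, a launcher script can check the requested instance type against this list. The following is a minimal sketch; `SUPPORTED_INSTANCE_TYPES` and `check_instance_type` are hypothetical names, not part of the library or the SageMaker Python SDK, and the set must be kept in sync with the list above by hand.

```python
# Instance types copied from the list above.
SUPPORTED_INSTANCE_TYPES = frozenset({
    "ml.g4dn.12xlarge",
    "ml.p3.16xlarge",
    "ml.p3dn.24xlarge",
    "ml.p4d.24xlarge",
    "ml.p4de.24xlarge",
})


def check_instance_type(instance_type):
    """Return `instance_type` unchanged, or raise ValueError if it is not supported."""
    if instance_type not in SUPPORTED_INSTANCE_TYPES:
        supported = ", ".join(sorted(SUPPORTED_INSTANCE_TYPES))
        raise ValueError(f"{instance_type!r} is not supported; choose one of: {supported}")
    return instance_type


if __name__ == "__main__":
    print(check_instance_type("ml.p4d.24xlarge"))
```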

For instance type specifications, see the Accelerated Computing section of the Amazon EC2 Instance Types page. For pricing information, see Amazon SageMaker AI Pricing.

If you encounter an error message similar to the following, follow the instructions in Request a service quota increase for SageMaker AI resources.


ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling
    the CreateTrainingJob operation: The account-level service limit 'ml.p3dn.24xlarge
    for training job usage' is 0 Instances, with current utilization of 0 Instances
    and a request delta of 1 Instances.
    Please contact AWS support to request an increase for this limit.
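
If you launch jobs programmatically, you may want to detect this quota error and report which instance type needs a higher limit. With boto3, a failed call raises `botocore.exceptions.ClientError`, whose `response["Error"]["Code"]` is `ResourceLimitExceeded` in this case and whose `response["Error"]["Message"]` carries the text shown above. The following sketch parses that text. It assumes the message keeps the layout of the example above, and `parse_resource_limit_error` and `EXAMPLE_MESSAGE` are hypothetical names used for illustration.

```python
import re

# Assumed message layout, based on the example error shown above.
_LIMIT_PATTERN = re.compile(
    r"service limit '(?P<instance_type>ml\.[\w.]+) for training job usage' "
    r"is (?P<limit>\d+) Instances, with current utilization of "
    r"(?P<utilization>\d+) Instances and a request delta of "
    r"(?P<delta>\d+) Instances"
)

EXAMPLE_MESSAGE = (
    "An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob "
    "operation: The account-level service limit 'ml.p3dn.24xlarge\n"
    "    for training job usage' is 0 Instances, with current utilization of 0 Instances\n"
    "    and a request delta of 1 Instances."
)


def parse_resource_limit_error(message):
    """Extract the instance type and instance counts from a quota error message.

    Returns None if the message does not match the assumed layout.
    """
    # Collapse line breaks and repeated spaces so wrapped messages still match.
    text = " ".join(message.split())
    match = _LIMIT_PATTERN.search(text)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("limit", "utilization", "delta"):
        info[key] = int(info[key])
    # How many instances the request exceeds the current limit by.
    info["shortfall"] = info["utilization"] + info["delta"] - info["limit"]
    return info


if __name__ == "__main__":
    print(parse_resource_limit_error(EXAMPLE_MESSAGE))
```

Returning None for unrecognized text, instead of raising, lets the caller fall back to printing the raw message if AWS changes the wording.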
