Prepare to create an EKS cluster for Machine Learning
There are several ways to enhance your machine learning experience on EKS. The pages in this section will help you:
- Understand your choices for using ML on EKS, and
- Prepare your EKS and ML environment.
In particular, this will help you:
- Choose AMIs: Amazon offers multiple customized AMIs for running ML workloads on EKS. See Run GPU-accelerated containers (Linux on EC2) and Run GPU-accelerated containers (Windows on EC2 G-Series). A hedged AMI-lookup sketch follows this list.
- Customize AMIs: You can further modify Amazon custom AMIs to add other software and drivers needed for your particular use cases. See Create self-managed nodes with Capacity Blocks for ML. A hedged custom-AMI sketch follows this list.
- Reserve GPUs: Because GPUs are in high demand, you can reserve the GPUs you need in advance to ensure they are available when you need them. See Create a managed node group with Capacity Blocks for ML and Create self-managed nodes with Capacity Blocks for ML. A hedged Capacity Block lookup sketch follows this list.
- Add EFA: Add the Elastic Fabric Adapter to improve network performance for inter-node cluster communications. See Add Elastic Fabric Adapter to EKS clusters for ML training. A hedged instance-type check follows this list.
- Use AWS Inferentia workloads: Create an EKS cluster with Amazon EC2 Inf1 instances. See Use Amazon Inferentia instances with your EKS cluster for Machine Learning. A hedged node group sketch follows this list.
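For the Choose AMIs item above, the sketch below shows one way to look up the current EKS-optimized accelerated (GPU) AMI ID with the AWS SDK for Python (boto3). It is a minimal sketch, not a procedure from this guide: the Region, the Kubernetes version, and the SSM parameter path for the Amazon Linux 2 accelerated AMI are assumptions that you would adjust to your own cluster and AMI family.

```python
"""Look up the EKS-optimized accelerated (GPU) AMI ID from a public SSM parameter.

A minimal sketch, assuming boto3 credentials are configured and that the
parameter path below (the Amazon Linux 2 accelerated AMI) is available for
your Kubernetes version and Region.
"""
import boto3

KUBERNETES_VERSION = "1.31"  # example value; use your cluster's version

ssm = boto3.client("ssm", region_name="us-west-2")  # placeholder Region

# Public parameter published by EKS for the Amazon Linux 2 accelerated AMI.
param_name = (
    f"/aws/service/eks/optimized-ami/{KUBERNETES_VERSION}"
    "/amazon-linux-2-gpu/recommended/image_id"
)

response = ssm.get_parameter(Name=param_name)
print("EKS-optimized GPU AMI ID:", response["Parameter"]["Value"])
```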
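For the Customize AMIs item, the following sketch shows how you might bake your own AMI after installing extra software and drivers on an instance launched from an EKS-optimized GPU AMI. The instance ID, Region, and image name are placeholders, and the customized instance is assumed to already exist.

```python
"""Bake a custom AMI from an instance you have already customized.

A minimal sketch, assuming you launched an instance from an EKS-optimized
GPU AMI, installed your additional software and drivers on it, and recorded
its instance ID (the i-0123... value below is a placeholder).
"""
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # placeholder Region

# Create an AMI from the customized instance; EC2 reboots the instance by
# default so the image snapshot is consistent.
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Name="eks-gpu-custom-example",     # placeholder image name
    Description="EKS GPU AMI plus site-specific drivers and tooling",
)
print("Custom AMI ID:", image["ImageId"])
```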
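For the Reserve GPUs item, the sketch below queries EC2 Capacity Blocks for ML offerings, which is how GPU capacity is reserved in advance. It is a hedged example: the instance type, count, duration, and Region are placeholders, and the exact request parameters and response fields are assumptions based on the boto3 EC2 client; check the current API reference before relying on them.

```python
"""Find EC2 Capacity Block offerings so GPU capacity can be reserved ahead of time.

A minimal sketch, assuming the Capacity Blocks for ML APIs are available in
your Region; the instance type, count, and duration are placeholders.
"""
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder Region

offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5.48xlarge",   # placeholder GPU instance type
    InstanceCount=4,              # number of instances to reserve
    CapacityDurationHours=24,     # how long you need the capacity
)

for offering in offerings["CapacityBlockOfferings"]:
    print(
        offering["CapacityBlockOfferingId"],
        offering["StartDate"],
        offering["EndDate"],
        offering.get("UpfrontFee"),
    )

# To reserve one of the offerings, you would then call
# ec2.purchase_capacity_block(CapacityBlockOfferingId=..., InstancePlatform="Linux/UNIX").
```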
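For the Add EFA item, this sketch lists accelerated instance types that report Elastic Fabric Adapter support, which can help you choose instance types for ML training node groups. The Region is a placeholder.

```python
"""List GPU instance types that support Elastic Fabric Adapter (EFA).

A minimal sketch, assuming you want to confirm EFA support before choosing
instance types for an ML training node group.
"""
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # placeholder Region

paginator = ec2.get_paginator("describe_instance_types")
pages = paginator.paginate(
    Filters=[{"Name": "network-info.efa-supported", "Values": ["true"]}]
)

for page in pages:
    for itype in page["InstanceTypes"]:
        # Only show GPU-bearing types, since those are typical for training.
        if "GpuInfo" in itype:
            print(itype["InstanceType"], "supports EFA")
```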
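For the AWS Inferentia item, the sketch below adds an Inf1 managed node group to an existing EKS cluster. The cluster name, subnet ID, and node IAM role ARN are placeholders, and the cluster, subnets, and role are assumed to already exist.

```python
"""Add an Inferentia (Inf1) managed node group to an existing EKS cluster.

A minimal sketch, assuming the cluster, subnets, and node IAM role below
already exist; all names, IDs, and ARNs are placeholders.
"""
import boto3

eks = boto3.client("eks", region_name="us-west-2")  # placeholder Region

response = eks.create_nodegroup(
    clusterName="ml-cluster",                               # placeholder
    nodegroupName="inferentia-nodes",
    subnets=["subnet-0123456789abcdef0"],                   # placeholder
    nodeRole="arn:aws:iam::111122223333:role/eksNodeRole",  # placeholder
    instanceTypes=["inf1.2xlarge"],
    scalingConfig={"minSize": 1, "maxSize": 2, "desiredSize": 1},
)

# You may also need an accelerated AMI type and the Neuron device plugin so
# that Pods can request Inferentia devices; see the Inferentia topic below.
print("Node group status:", response["nodegroup"]["status"])
```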
Topics
- Run GPU-accelerated containers (Linux on EC2)
- Run GPU-accelerated containers (Windows on EC2 G-Series)
- Create a managed node group with Capacity Blocks for ML
- Create self-managed nodes with Capacity Blocks for ML
- Prevent Pods from being scheduled on specific nodes
- Add Elastic Fabric Adapter to EKS clusters for ML training
- Use Amazon Inferentia instances with your EKS cluster for Machine Learning