

# Migrate inference workload from x86 to Amazon Graviton
<a name="realtime-endpoints-graviton"></a>

 [Amazon Graviton](https://aws.amazon.com/ec2/graviton/) is a series of ARM-based processors designed by Amazon. They are more energy efficient than x86-based processors and offer a compelling price-performance ratio. Amazon SageMaker AI offers Graviton-based instances so that you can take advantage of these advanced processors for your inference needs. 

 You can migrate your existing inference workloads from x86-based instances to Graviton-based instances by using either ARM-compatible container images or multi-architecture container images. This guide assumes that you are using either [Amazon Deep Learning container images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) or your own ARM-compatible container images. For more information on building your own images, check [Building your image](https://github.com/aws/deep-learning-containers#building-your-image). 

 At a high level, migrating an inference workload from x86-based instances to Graviton-based instances is a four-step process: 

1. Push container images to Amazon Elastic Container Registry (Amazon ECR), an Amazon managed container registry.

1. Create a SageMaker AI Model.

1. Create an endpoint configuration.

1. Create an endpoint.

 The following sections of this guide provide more details regarding the above steps. Replace the {{user placeholder text}} in the code examples with your own information. 

**Topics**
+ [Push container images to Amazon ECR](#realtime-endpoints-graviton-ecr)
+ [Create a SageMaker AI Model](#realtime-endpoints-graviton-model)
+ [Create an endpoint configuration](#realtime-endpoints-graviton-epc)
+ [Create an endpoint](#realtime-endpoints-graviton-ep)

## Push container images to Amazon ECR
<a name="realtime-endpoints-graviton-ecr"></a>

 You can push your container images to Amazon ECR with the Amazon CLI. When using an ARM-compatible image, verify that it supports the ARM architecture by inspecting the image after you have pulled or built it locally: 

```
docker inspect {{deep-learning-container-uri}}
```

 If the response contains `"Architecture": "arm64"`, the image supports the ARM architecture. You can push it to Amazon ECR with the `docker push` command. For more information, check [Pushing a Docker image](https://docs.amazonaws.cn/AmazonECR/latest/userguide/docker-push-ecr-image.html). 
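
 The architecture check above can also be scripted. The following is a minimal sketch, assuming you have captured the JSON that `docker inspect` prints (an array with one object per inspected image). The helper name `image_architectures` and the trimmed sample output are hypothetical and used for illustration only.

```python
import json
from typing import List


def image_architectures(inspect_output: str) -> List[str]:
    """Return the Architecture field of each object in `docker inspect` JSON output."""
    return [entry.get("Architecture", "") for entry in json.loads(inspect_output)]


# Hypothetical, heavily trimmed `docker inspect` output (placeholder values).
sample_output = '[{"Id": "sha256:placeholder", "Architecture": "arm64", "Os": "linux"}]'

architectures = image_architectures(sample_output)
# True only if every inspected image reports the arm64 architecture.
print(all(arch == "arm64" for arch in architectures))
```

 To run the check against a real image, you could pass in the standard output of `docker inspect`, for example as captured with `subprocess.run(["docker", "inspect", image_uri], capture_output=True, text=True, check=True).stdout`, where `image_uri` is your own image URI.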

 Multi-architecture container images are fundamentally a set of container images supporting different architectures or operating systems, that you can refer to by a common manifest name. If you are using multi-architecture container images, then in addition to pushing the images to Amazon ECR, you will also have to push a manifest list to Amazon ECR. A manifest list allows for the nested inclusion of other image manifests, where each included image is specified by architecture, operating system and other platform attributes. The following example creates a manifest list, and pushes it to Amazon ECR. 

1. Create a manifest list.

   ```
   docker manifest create {{aws-account-id}}.dkr.ecr.{{aws-region}}.amazonaws.com/{{my-repository}} \
     {{aws-account-id}}.dkr.ecr.{{aws-region}}.amazonaws.com/{{my-repository:amd64}} \
     {{aws-account-id}}.dkr.ecr.{{aws-region}}.amazonaws.com/{{my-repository:arm64}}
   ```

1. Annotate the manifest list so that it correctly identifies which image is for which architecture.

   ```
   docker manifest annotate --arch arm64 {{aws-account-id}}.dkr.ecr.{{aws-region}}.amazonaws.com/{{my-repository}} \
     {{aws-account-id}}.dkr.ecr.{{aws-region}}.amazonaws.com/{{my-repository:arm64}}
   ```

1. Push the manifest.

   ```
   docker manifest push {{aws-account-id}}.dkr.ecr.{{aws-region}}.amazonaws.com/{{my-repository}}
   ```

 For more information on creating and pushing manifest lists to Amazon ECR, check [Introducing multi-architecture container images for Amazon ECR](https://www.amazonaws.cn/blogs/containers/introducing-multi-architecture-container-images-for-amazon-ecr/), and [Pushing a multi-architecture image](https://docs.amazonaws.cn/AmazonECR/latest/userguide/docker-push-multi-architecture-image.html). 
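
 To make the role of the manifest list more concrete, the following simplified sketch shows the general shape of a manifest list and how a client might pick the entry that matches its platform. The digests and sizes are placeholders, the helper `select_manifest` is hypothetical, and real container runtimes apply additional matching rules (for example, CPU variant), so treat this as an illustration rather than the actual resolution logic.

```python
from typing import Optional

# Hypothetical manifest list; digests and sizes are placeholders, not real values.
manifest_list = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 1234,
            "digest": "sha256:amd64-placeholder",
            "platform": {"architecture": "amd64", "os": "linux"},
        },
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 1234,
            "digest": "sha256:arm64-placeholder",
            "platform": {"architecture": "arm64", "os": "linux"},
        },
    ],
}


def select_manifest(manifests: dict, architecture: str, os_name: str = "linux") -> Optional[dict]:
    """Return the first entry whose platform matches, or None if there is none."""
    for manifest in manifests["manifests"]:
        platform = manifest.get("platform", {})
        if platform.get("architecture") == architecture and platform.get("os") == os_name:
            return manifest
    return None


# A Graviton-based host would resolve the common name to the arm64 entry.
selected = select_manifest(manifest_list, "arm64")
print(selected["digest"])
```

 Because both architectures sit behind one manifest name, the same image URI can be used for x86-based and Graviton-based instances.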

## Create a SageMaker AI Model
<a name="realtime-endpoints-graviton-model"></a>

 Create a SageMaker AI Model by calling the [CreateModel](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateModel.html) API. 

```
import boto3
from sagemaker import get_execution_role


aws_region = "{{aws-region}}"
sagemaker_client = boto3.client("sagemaker", region_name=aws_region)

role = get_execution_role()

sagemaker_client.create_model(
    ModelName = "{{model-name}}",
    PrimaryContainer = {
        "Image": "{{deep-learning-container-uri}}",
        "ModelDataUrl": "{{model-s3-location}}",
        "Environment": {
            "SAGEMAKER_PROGRAM": "{{inference.py}}",
            "SAGEMAKER_SUBMIT_DIRECTORY": "{{inference-script-s3-location}}",
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_REGION": aws_region,
        }
    },
    ExecutionRoleArn = role
)
```

## Create an endpoint configuration
<a name="realtime-endpoints-graviton-epc"></a>

 Create an endpoint configuration by calling the [CreateEndpointConfig](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateEndpointConfig.html) API. For a list of Graviton-based instances, check [Compute optimized instances](https://docs.amazonaws.cn/AWSEC2/latest/UserGuide/compute-optimized-instances.html). 

```
sagemaker_client.create_endpoint_config(
    EndpointConfigName = "{{endpoint-config-name}}",
    ProductionVariants = [
        {
            "VariantName": "{{variant-name}}",
            "ModelName": "{{model-name}}",
            "InitialInstanceCount": {{1}},
            "InstanceType": "{{ml.c7g.xlarge}}", # Graviton-based instance
        }
    ]
)
```

## Create an endpoint
<a name="realtime-endpoints-graviton-ep"></a>

 Create an endpoint by calling the [CreateEndpoint](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateEndpoint.html) API. 

```
sagemaker_client.create_endpoint(
    EndpointName = "{{endpoint-name}}",
    EndpointConfigName = "{{endpoint-config-name}}"
)
```
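
 Endpoint creation is asynchronous, so the endpoint is not ready to serve requests as soon as the call returns. The following is a minimal polling sketch, assuming a client object that exposes `describe_endpoint` the way the boto3 SageMaker client does. The helper name `wait_for_endpoint` and its default polling values are hypothetical choices, not recommended settings.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, max_attempts=120):
    """Poll DescribeEndpoint until the endpoint is InService or has failed."""
    for _ in range(max_attempts):
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return description
        if status == "Failed":
            reason = description.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        # Any other status (for example, Creating) means keep waiting.
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} was not InService after {max_attempts} checks")
```

 With the client created earlier in this guide, a call might look like `wait_for_endpoint(sagemaker_client, "{{endpoint-name}}")`. The boto3 SageMaker client also provides a built-in waiter, `sagemaker_client.get_waiter("endpoint_in_service")`, which you can use instead of a hand-written loop.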