Customizing Docker images for Flink and FluentD - Amazon EMR

Take the following steps to customize the Apache Flink or Fluentd Docker images for Amazon EMR on EKS. The steps cover getting a base image, customizing it, publishing it, and submitting a workload.

Before you customize your Docker image, make sure that you have completed the following prerequisites:

The base image contains the Amazon EMR runtime and the connectors that you need to access other Amazon Web Services services. If you're using Amazon EMR on EKS with Flink, release 6.14.0 or higher, you can get the base images from the Amazon ECR Public Gallery. Browse the gallery to find the image link and pull the image to your local workspace. For example, for the Amazon EMR 6.14.0 release, the following docker pull command returns the latest standard base image. Replace 6.14.0 in emr-6.14.0-flink:latest with the release version you want.

docker pull public.ecr.aws/emr-on-eks/flink/emr-6.14.0-flink:latest
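
To make the release easy to swap, the image name can be built from a variable. This is a minimal sketch that assumes the image path follows the same pattern as the example above; the variable names EMR_RELEASE and BASE_IMAGE are illustrative, and the pull itself is left commented out so the snippet runs without Docker.

```shell
# Illustrative variable names; they are not defined by Amazon EMR.
EMR_RELEASE="6.14.0"
BASE_IMAGE="public.ecr.aws/emr-on-eks/flink/emr-${EMR_RELEASE}-flink:latest"

echo "Base image: ${BASE_IMAGE}"

# Uncomment to pull the image to your local workspace:
# docker pull "${BASE_IMAGE}"
```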

The following are links to the Flink gallery image and Fluentd gallery image:

The following steps describe how to customize the base image you pulled from Amazon ECR.

  1. Create a new Dockerfile in your local workspace.

  2. Edit the Dockerfile and add the following content. This example is based on the Amazon EMR 7.2.0 Flink image, public.ecr.aws/emr-on-eks/flink/emr-7.2.0-flink:latest. Change the FROM line to match the release that you pulled.

    FROM public.ecr.aws/emr-on-eks/flink/emr-7.2.0-flink:latest
    USER root
    ### Add customization commands here ####
    USER hadoop:hadoop

    Use the following configuration if you're using Fluentd.

    FROM public.ecr.aws/emr-on-eks/fluentd/emr-7.2.0:latest
    USER root
    ### Add customization commands here ####
    USER hadoop:hadoop
  3. Add commands in the Dockerfile to customize the base image. The following command demonstrates how to install Python libraries.

    FROM public.ecr.aws/emr-on-eks/flink/emr-7.2.0-flink:latest
    USER root
    # For Python 3
    RUN pip3 install --upgrade boto3 pandas numpy
    USER hadoop:hadoop
  4. In the directory where you created the Dockerfile, run the following command to build the Docker image. The value that you supply after the -t flag is your custom name for the image.

    docker build -t <YOUR_ACCOUNT_ID>.dkr.ecr.<YOUR_ECR_REGION>.amazonaws.com/<ECR_REPO>:<ECR_TAG> .
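
Because the same image name is reused when you build, push, and deploy, it can help to assemble it once. The following is a sketch only: the account ID, Region, repository, and tag values are hypothetical placeholders, and the variable names are illustrative.

```shell
# Hypothetical placeholder values; replace them with your own.
ACCOUNT_ID="111122223333"
ECR_REGION="us-west-2"
ECR_REPO="emr_custom_repo"
ECR_TAG="flink-custom"

# Private Amazon ECR image names have the form
# <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>.
IMAGE_URI="${ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com/${ECR_REPO}:${ECR_TAG}"
echo "${IMAGE_URI}"

# Build from the directory that contains the Dockerfile.
# The trailing dot is the build context.
# docker build -t "${IMAGE_URI}" .
```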

You can now publish the new Docker image to your Amazon ECR registry.

  1. Run the following command to create an Amazon ECR repository to store your Docker image. Provide a name for your repository, such as emr_custom_repo. For more information, see Create a repository in the Amazon Elastic Container Registry User Guide.

    aws ecr create-repository \
        --repository-name emr_custom_repo \
        --image-scanning-configuration scanOnPush=true \
        --region <AWS_REGION>
  2. Run the following command to authenticate to your default registry. For more information, see Authenticate to your default registry in the Amazon Elastic Container Registry User Guide.

    aws ecr get-login-password --region <YOUR_ECR_REGION> | docker login --username AWS --password-stdin <YOUR_ACCOUNT_ID>.dkr.ecr.<YOUR_ECR_REGION>.amazonaws.com
  3. Push the image. For more information, see Push an image to Amazon ECR in the Amazon Elastic Container Registry User Guide.

    docker push <YOUR_ACCOUNT_ID>.dkr.ecr.<YOUR_ECR_REGION>.amazonaws.com/<ECR_REPO>:<ECR_TAG>
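
The three publishing steps can also be combined into a single helper. The following POSIX shell sketch assumes that the AWS CLI and Docker are installed and that your credentials are already configured. The function name publish_image and its argument order are hypothetical, and the check with describe-repositories is an addition that skips repository creation when the repository already exists.

```shell
# publish_image is a hypothetical helper name.
# Usage: publish_image <account_id> <region> <repo> <tag>
# The body runs in a subshell so that its variables and "set -eu"
# do not leak into the calling shell.
publish_image() (
  set -eu
  account_id="$1"
  region="$2"
  repo="$3"
  tag="$4"
  registry="${account_id}.dkr.ecr.${region}.amazonaws.com"

  # Create the repository only if it does not exist yet.
  if ! aws ecr describe-repositories \
        --repository-names "${repo}" \
        --region "${region}" >/dev/null 2>&1; then
    aws ecr create-repository \
      --repository-name "${repo}" \
      --image-scanning-configuration scanOnPush=true \
      --region "${region}"
  fi

  # Amazon ECR logins use the literal user name AWS.
  aws ecr get-login-password --region "${region}" \
    | docker login --username AWS --password-stdin "${registry}"

  # Push the image that was built and tagged earlier.
  docker push "${registry}/${repo}:${tag}"
)
```

For example, `publish_image 111122223333 us-west-2 emr_custom_repo flink-custom` would push an image that was previously tagged with the matching name. The values shown are placeholders.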

To use a custom image for your Flink job, set the spec.image field of your FlinkDeployment spec to your own image.

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  flinkVersion: v1_18
  image: <YOUR_ACCOUNT_ID>.dkr.ecr.<YOUR_ECR_REGION>.amazonaws.com/<ECR_REPO>:<ECR_TAG>
  imagePullPolicy: Always
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "1"
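
One way to avoid editing the image name by hand is to generate the manifest from a shell variable. The sketch below writes only the fields shown in the example above; a real FlinkDeployment needs the rest of your spec as well. The image name and the file name flink-deployment.yaml are hypothetical, and the kubectl command is commented out because it requires access to your cluster.

```shell
# Hypothetical image name and file name; replace them with your own.
IMAGE_URI="111122223333.dkr.ecr.us-west-2.amazonaws.com/emr_custom_repo:flink-custom"
MANIFEST="flink-deployment.yaml"

# Write the spec fragment with the custom image substituted in.
cat > "${MANIFEST}" <<EOF
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  flinkVersion: v1_18
  image: ${IMAGE_URI}
  imagePullPolicy: Always
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "1"
EOF

# Confirm the image line before you apply the manifest.
grep 'image:' "${MANIFEST}"

# Apply it once the remaining fields of your spec are in place:
# kubectl apply -f "${MANIFEST}"
```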

To use a custom image for your Fluentd job, enter your own image in the monitoringConfiguration.image line of your deployment spec.

monitoringConfiguration:
  image: <YOUR_ACCOUNT_ID>.dkr.ecr.<YOUR_ECR_REGION>.amazonaws.com/<ECR_REPO>:<ECR_TAG>
  cloudWatchMonitoringConfiguration:
    logGroupName: flink-log-group
    logStreamNamePrefix: custom-fluentd