Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Amazon Managed Service for Apache Flink was previously known as Amazon Kinesis Data Analytics for Apache Flink.

Getting started with Amazon Managed Service for Apache Flink for Python

This section introduces you to the fundamental concepts of a Managed Service for Apache Flink using Python and the Table API. It describes the available options for creating and testing your applications. It also provides instructions for installing the necessary tools to complete the tutorials in this guide and to create your first application.

Components of a Managed Service for Apache Flink application

Note

Amazon Managed Service for Apache Flink supports all Apache Flink APIs. The structure of the application differs slightly depending on the API you choose. One popular approach when developing an Apache Flink application in Python is to define the application flow using SQL embedded in Python code. This is the approach that we follow in the following Getting Started tutorial.

To process data, your Managed Service for Apache Flink application uses a Python script to define the data flow that processes input and produces output using the Apache Flink runtime.
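As a sketch of this pattern, the data flow can be expressed as SQL statements embedded in Python. The table, stream, and column names below are illustrative assumptions; in a real PyFlink application, each statement would be passed to a `TableEnvironment` with `execute_sql`, which the tutorial covers later.

```python
# Hypothetical sketch: the application flow expressed as SQL strings
# embedded in Python. In a real PyFlink application, each statement would
# be run with table_env.execute_sql(...); all names here are examples.

def define_application_flow(input_stream: str, output_stream: str) -> list[str]:
    """Return the SQL statements that make up the data flow."""
    source_ddl = f"""
        CREATE TABLE source_table (
            ticker VARCHAR(6),
            price DOUBLE,
            event_time TIMESTAMP(3)
        ) WITH (
            'connector' = 'kinesis',
            'stream' = '{input_stream}'
        )"""
    sink_ddl = f"""
        CREATE TABLE sink_table (
            ticker VARCHAR(6),
            max_price DOUBLE
        ) WITH (
            'connector' = 'kinesis',
            'stream' = '{output_stream}'
        )"""
    transform = """
        INSERT INTO sink_table
        SELECT ticker, MAX(price) AS max_price
        FROM source_table
        GROUP BY ticker"""
    return [source_ddl, sink_ddl, transform]

statements = define_application_flow("ExampleInputStream", "ExampleOutputStream")
print(len(statements))  # 3
```

The source and sink tables each name a connector, and the transformation is an ordinary SQL query that inserts its result into the sink table.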

A typical Managed Service for Apache Flink application has the following components:

  • Runtime properties: You can use runtime properties to configure your application without recompiling your application code.

  • Sources: The application consumes data from one or more sources. A source uses a connector to read data from an external system such as a Kinesis data stream, or an Amazon MSK topic. You can also use special connectors to generate data from within the application. When you use SQL, the application defines sources as source tables.

  • Transformations: The application processes data by using one or more transformations that can filter, enrich, or aggregate data. When you use SQL, the application defines transformations as SQL queries.

  • Sinks: The application sends data to external sources through sinks. A sink uses a connector to send data to an external system such as a Kinesis data stream, an Amazon MSK topic, an Amazon S3 bucket, or a relational database. You can also use a special connector to print the output for development purposes. When you use SQL, the application defines sinks as sink tables into which you insert results. For more information, see Writing data using sinks in Managed Service for Apache Flink.
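To illustrate the first component, runtime properties reach a Python application as a JSON document of property groups. A minimal sketch of reading one group with the standard library follows; the file path in the comment, the group ID, and the key names are assumptions for illustration.

```python
import json

# Hypothetical sketch: runtime properties arrive as a JSON array of
# property groups. The group ID and key names below are examples.
def get_property_group(props: list[dict], group_id: str) -> dict:
    """Return the PropertyMap for one property group, or an empty dict."""
    for group in props:
        if group.get("PropertyGroupId") == group_id:
            return group.get("PropertyMap", {})
    return {}

# In the managed runtime you would load the properties file, for example:
#   with open("/etc/flink/application_properties.json") as f:
#       props = json.load(f)
# Here we parse an inline example document instead.
props = json.loads("""
[
  {
    "PropertyGroupId": "InputStream0",
    "PropertyMap": {"stream.name": "ExampleInputStream", "aws.region": "us-east-1"}
  }
]
""")

input_config = get_property_group(props, "InputStream0")
print(input_config["stream.name"])  # ExampleInputStream
```

Because the configuration lives outside the code, you can point the same application at a different stream or Region without recompiling.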

Your Python application might also require external dependencies, such as additional Python libraries or any Flink connector your application uses. When you package your application, you must include every dependency that your application requires. This tutorial demonstrates how to include connector dependencies and how to package the application for deployment on Amazon Managed Service for Apache Flink.
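As a sketch of what the deployable artifact contains, the following builds a ZIP with the standard library's `zipfile` module. The file names and the connector JAR are placeholders; the tutorial uses Maven to produce the real package.

```python
import pathlib
import tempfile
import zipfile

# Hypothetical sketch: the deployable artifact is a ZIP containing the
# Python entry point plus every dependency, including connector JARs.
# All file names below are placeholders.
workdir = pathlib.Path(tempfile.mkdtemp())
(workdir / "main.py").write_text("# application entry point\n")
libdir = workdir / "lib"
libdir.mkdir()
(libdir / "example-connector.jar").write_bytes(b"")  # placeholder JAR

package = workdir / "myapp.zip"
with zipfile.ZipFile(package, "w") as zf:
    zf.write(workdir / "main.py", "main.py")
    zf.write(libdir / "example-connector.jar", "lib/example-connector.jar")

print(sorted(zipfile.ZipFile(package).namelist()))
# ['lib/example-connector.jar', 'main.py']
```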

Prerequisites

To complete this tutorial, you must have the following:

  • Python 3.11, preferably using a standalone environment like VirtualEnv (venv), Conda, or Miniconda.

  • Git client - install the Git client if you have not done so already.

  • Java Development Kit (JDK) version 11 - install JDK 11 and set the JAVA_HOME environment variable to point to your installation location. If you don't have JDK 11, you can use Amazon Corretto 11 or any standard JDK of your choice.

    • To verify that you have the JDK correctly installed, run the following command. The output will be different if you are using a JDK other than Amazon Corretto 11. Make sure that the version is 11.x.

      $ java --version
      openjdk 11.0.23 2024-04-16 LTS
      OpenJDK Runtime Environment Corretto-11.0.23.9.1 (build 11.0.23+9-LTS)
      OpenJDK 64-Bit Server VM Corretto-11.0.23.9.1 (build 11.0.23+9-LTS, mixed mode)
  • Apache Maven - install Apache Maven if you have not done so already. For more information, see Installing Apache Maven.

    • To test your Apache Maven installation, use the following command:

      $ mvn -version
Note

Although your application is written in Python, Apache Flink runs in the Java Virtual Machine (JVM). Flink distributes most of its dependencies, such as the Kinesis connector, as JAR files. To manage these dependencies and to package the application in a ZIP file, use Apache Maven. This tutorial explains how to do so.
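As an illustration, a Maven dependency declaration for a connector JAR might look like the following. The group ID, artifact ID, and version are examples only; check the connector documentation for the coordinates that match your Flink runtime version.

```xml
<!-- Hypothetical example: pulls a Kinesis SQL connector JAR so that Maven
     can bundle it into the application ZIP; the version is illustrative. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-sql-connector-kinesis</artifactId>
    <version>4.3.0-1.19</version>
</dependency>
```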

Warning

We recommend that you use Python 3.11 for local development. This is the same Python version used by Amazon Managed Service for Apache Flink with the Flink runtime 1.19.

Installing the Python Flink library 1.19 on Python 3.12 might fail.

If another Python version is installed by default on your machine, we recommend that you create a standalone environment, such as a VirtualEnv, that uses Python 3.11.

IDE for local development

We recommend that you use a development environment such as PyCharm or Visual Studio Code to develop and compile your application.

Next, complete the first two steps of Getting started with Amazon Managed Service for Apache Flink (DataStream API).

To get started, see Create an Application.