
Amazon Glue Streaming options

Designates a connection to a Kafka cluster or an Amazon Managed Streaming for Apache Kafka cluster.

You can read from and write to Kafka data streams using information stored in a Data Catalog table, or by providing information to access the data stream directly. You can read information from Kafka into a Spark DataFrame, then convert it to an Amazon Glue DynamicFrame. You can write DynamicFrames to Kafka in JSON format. If you access the data stream directly, use these options to provide the information about how to access the data stream.

If you use getCatalogSource or create_data_frame_from_catalog to consume records from a Kafka streaming source, or getCatalogSink or write_dynamic_frame_from_catalog to write records to Kafka, the job has the Data Catalog database and table name information and can use it to obtain some basic parameters for reading from the Kafka streaming source. If you use getSource, getSink, getSourceWithFormat, getSinkWithFormat, createDataFrameFromOptions, create_data_frame_from_options, or write_dynamic_frame_from_options, you must specify these basic parameters using the connection options described here.
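The difference between the two read paths can be sketched in Python as follows. This is a minimal illustration, not a complete streaming job: the helper names, broker address, and topic are hypothetical, and the option keys shown (bootstrap.servers, topicName, startingOffsets, classification) are assumed to match the Kafka connection options for your Glue version. The glue_context argument is expected to be a GlueContext created inside a Glue job.

```python
def kafka_read_options(bootstrap_servers, topic):
    """Build the basic parameters needed when no Data Catalog table supplies them.

    The key names are assumptions; check them against the Kafka
    connection option list before relying on them.
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        "topicName": topic,
        "startingOffsets": "earliest",
        "classification": "json",
    }


def read_kafka_direct(glue_context, options):
    """Direct access: every basic parameter comes from connection_options."""
    return glue_context.create_data_frame_from_options(
        connection_type="kafka",
        connection_options=options,
    )


def read_kafka_from_catalog(glue_context, database, table_name, extra_options=None):
    """Catalog access: the table supplies the basics; extras are optional."""
    return glue_context.create_data_frame_from_catalog(
        database=database,
        table_name=table_name,
        additional_options=extra_options or {},
    )
```

Inside a job, a call such as read_kafka_direct(glue_context, kafka_read_options("broker-1:9092", "orders")) would return a streaming Spark DataFrame, whereas read_kafka_from_catalog needs only the database and table names.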

You can specify the connection options for Kafka using the following arguments for the specified methods in the GlueContext class.

  • Scala

    • connectionOptions: Use with getSource, createDataFrameFromOptions, getSink

    • additionalOptions: Use with getCatalogSource, getCatalogSink

    • options: Use with getSourceWithFormat, getSinkWithFormat

  • Python

    • connection_options: Use with create_data_frame_from_options, write_dynamic_frame_from_options

    • additional_options: Use with create_data_frame_from_catalog, write_dynamic_frame_from_catalog

    • options: Use with getSource, getSink
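On the write side, the Python argument names listed above map onto calls roughly as in the sketch below. The helper names are hypothetical, and the contents of the options dictionaries are deliberately left to the caller because the required keys depend on the Kafka connection options you use. Both functions assume glue_context is a GlueContext and dynamic_frame is a DynamicFrame available inside a Glue job.

```python
def write_kafka_direct(glue_context, dynamic_frame, options):
    """Direct access: Kafka parameters are passed as connection_options."""
    return glue_context.write_dynamic_frame_from_options(
        frame=dynamic_frame,
        connection_type="kafka",
        connection_options=options,
    )


def write_kafka_via_catalog(glue_context, dynamic_frame, database, table_name,
                            extra_options=None):
    """Catalog access: Kafka parameters beyond the table go in additional_options."""
    return glue_context.write_dynamic_frame_from_catalog(
        frame=dynamic_frame,
        database=database,
        table_name=table_name,
        additional_options=extra_options or {},
    )
```

Keeping the options dictionary as a parameter means the same helper can be reused across topics or clusters without changing the call that names the argument.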

For notes and restrictions about streaming ETL jobs, consult Streaming ETL notes and restrictions.