Prerequisites to use Apache Iceberg Tables as a destination - Amazon Data Firehose
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Firehose supports database as a source in all Amazon Web Services Regions except China Regions, Amazon GovCloud (US) Regions, and Asia Pacific (Malaysia). This feature is in preview and is subject to change. Do not use it for your production workloads.

Prerequisites to use Apache Iceberg Tables as a destination

Before you begin, complete the following prerequisites.

  • Create an Amazon S3 bucket – You must create an Amazon S3 bucket so that you can specify the metadata file path during table creation. For more information, see Create an S3 bucket.

  • Create an IAM role with required permissions – Firehose needs an IAM role with specific permissions to access Amazon Glue tables and write data to Amazon S3. The same role is used to grant Amazon Glue access to Amazon S3 buckets. You need this IAM role when you create an Iceberg table and a Firehose stream. For more information, see Grant Firehose access to an Apache Iceberg Tables destination.

  • Create Apache Iceberg Tables – If you configure unique keys in the Firehose stream for updates and deletes, Firehose validates that the table and the unique keys exist as part of stream creation. For this scenario, you must create the tables before you create the Firehose stream. You can use Amazon Glue to create Apache Iceberg Tables. For more information, see Creating Apache Iceberg tables. If you do not configure unique keys in the Firehose stream, you don't need to create Iceberg tables before you create the Firehose stream.

    Note

    Firehose supports the following table format version, data storage format, and row-level operation for Apache Iceberg tables.

    • Table format version – Firehose supports only the V2 table format. Do not create tables in the V1 format; otherwise, you get an error and the data is delivered to the S3 error bucket instead.

    • Data storage format – Firehose writes data to Apache Iceberg Tables in Parquet format.

    • Row-level operation – Firehose supports the Merge-on-Read (MOR) mode of writing data to Apache Iceberg Tables.
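
As a sketch of the table-creation step above, the following builds a Glue `CreateTable` request for a V2 Iceberg table using boto3. The database name, table name, columns, and S3 location are hypothetical examples, and you would send the request with `boto3.client("glue").create_table(**request)`; check the Glue API reference for the full set of options before using this in your account.

```python
# Sketch: build a Glue CreateTable request for an Apache Iceberg table.
# Database, table, columns, and bucket below are hypothetical placeholders;
# send the result with boto3.client("glue").create_table(**request).

def build_iceberg_table_request(database: str, table: str, s3_location: str) -> dict:
    """Return a CreateTable request for a V2 Iceberg table.

    V2 is the only Iceberg table format version that Firehose supports
    as a destination; V1 tables cause delivery errors.
    """
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor": {
                # Table metadata and data files live under this S3 path.
                "Location": s3_location,
                "Columns": [
                    {"Name": "event_id", "Type": "string"},
                    {"Name": "event_time", "Type": "timestamp"},
                ],
            },
        },
        # Ask Glue to create Iceberg metadata; "2" selects the V2 format.
        "OpenTableFormatInput": {
            "IcebergInput": {"MetadataOperation": "CREATE", "Version": "2"}
        },
    }

request = build_iceberg_table_request(
    "firehose_db", "events", "s3://amzn-s3-demo-bucket/iceberg/events/"
)
```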
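
The stream-creation step can be sketched the same way. The snippet below assembles an `IcebergDestinationConfiguration` for `boto3.client("firehose").create_delivery_stream`; all ARNs, names, and the `event_id` unique key are hypothetical placeholders, and because `UniqueKeys` is set, the target table must already exist when the stream is created. Note that the same IAM role appears in both `RoleARN` and `S3Configuration`, reflecting the single-role prerequisite above.

```python
# Sketch: build the Iceberg destination configuration for a Firehose stream.
# All ARNs and names are hypothetical placeholders; pass the result as
# IcebergDestinationConfiguration to create_delivery_stream, for example:
#   boto3.client("firehose").create_delivery_stream(
#       DeliveryStreamName="iceberg-stream",
#       DeliveryStreamType="DirectPut",
#       IcebergDestinationConfiguration=config,
#   )

def build_iceberg_destination(role_arn: str, catalog_arn: str, bucket_arn: str) -> dict:
    return {
        # One IAM role grants both Glue table access and S3 write access.
        "RoleARN": role_arn,
        "CatalogConfiguration": {"CatalogARN": catalog_arn},
        # Backing S3 bucket; undeliverable records land in its error prefix.
        "S3Configuration": {"RoleARN": role_arn, "BucketARN": bucket_arn},
        # Setting UniqueKeys enables updates/deletes, so the table must
        # exist before the Firehose stream is created.
        "DestinationTableConfigurationList": [
            {
                "DestinationDatabaseName": "firehose_db",
                "DestinationTableName": "events",
                "UniqueKeys": ["event_id"],
            }
        ],
    }

config = build_iceberg_destination(
    "arn:aws:iam::111122223333:role/FirehoseIcebergRole",
    "arn:aws:glue:us-east-1:111122223333:catalog",
    "arn:aws:s3:::amzn-s3-demo-bucket",
)
```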