
Deliver data to Apache Iceberg Tables with Amazon Data Firehose

Apache Iceberg is a high-performance, open-source table format for big data analytics. It brings the reliability and simplicity of SQL tables to Amazon S3 data lakes, and makes it possible for open-source analytics engines such as Spark, Flink, Trino, Hive, and Impala to work concurrently with the same data. For more information about Apache Iceberg, see https://iceberg.apache.org/.

You can use Firehose to deliver streaming data directly to Apache Iceberg Tables in Amazon S3. With this feature, you can route records from a single stream into different Apache Iceberg Tables and automatically apply insert, update, and delete operations to them. Firehose guarantees exactly-once delivery to Iceberg Tables. This feature requires the Amazon Glue Data Catalog.
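
The following is a minimal sketch of creating a Firehose stream with an Iceberg destination using the Python SDK (boto3). The stream name, database, table, key column, ARNs, and the exact shape of IcebergDestinationConfiguration are illustrative assumptions; verify field names against the current Firehose API reference for your SDK version.

```python
"""Sketch: create a Firehose stream that delivers to an Apache Iceberg table
registered in the Amazon Glue Data Catalog. All names and ARNs are hypothetical."""
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

response = firehose.create_delivery_stream(
    DeliveryStreamName="orders-to-iceberg",      # hypothetical stream name
    DeliveryStreamType="DirectPut",
    IcebergDestinationConfiguration={
        # IAM role granting Firehose access to Glue, S3, and the destination table
        "RoleARN": "arn:aws:iam::111122223333:role/FirehoseIcebergRole",
        # Glue Data Catalog that holds the Iceberg table metadata (required)
        "CatalogConfiguration": {
            "CatalogARN": "arn:aws:glue:us-east-1:111122223333:catalog"
        },
        # Destination database/table, plus the key used to match records
        # for update and delete operations
        "DestinationTableConfigurationList": [
            {
                "DestinationDatabaseName": "sales_db",   # hypothetical database
                "DestinationTableName": "orders",        # hypothetical table
                "UniqueKeys": ["order_id"],
            }
        ],
        # S3 location Firehose uses for records it cannot deliver
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::111122223333:role/FirehoseIcebergRole",
            "BucketARN": "arn:aws:s3:::amzn-s3-demo-error-bucket",
        },
    },
)
print(response["DeliveryStreamARN"])
```
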

Firehose can also directly deliver streaming data to Amazon S3 Tables. Amazon S3 Tables provide storage that is optimized for large-scale analytics workloads, with features that continuously improve query performance and reduce storage costs for tabular data. With built-in support for Apache Iceberg, you can query tabular data in Amazon S3 with popular query engines including Amazon Athena, Amazon Redshift, and Apache Spark. For more information on Amazon S3 Tables, see Amazon S3 Tables. Firehose integration with Amazon S3 Tables is in preview in all Regions where Amazon S3 Tables is available. Do not use it for your production workloads.
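
Whether the stream targets a Glue-registered Iceberg table or an S3 table bucket that has been integrated with the Glue Data Catalog, producers send data to the stream in the same way. The sketch below sends one JSON record with PutRecord; the record fields and schema are illustrative assumptions that would need to match the destination table's columns.

```python
"""Sketch: send a JSON record to the (hypothetical) stream created above.
Firehose maps the record's fields to the destination table's columns."""
import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

record = {
    "order_id": "1001",     # matches the UniqueKeys column in the hypothetical schema
    "customer": "anon-42",
    "amount": 19.99,
}

firehose.put_record(
    DeliveryStreamName="orders-to-iceberg",
    Record={"Data": json.dumps(record).encode("utf-8")},
)
```
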