Deliver data to Apache Iceberg Tables with Amazon Data Firehose
Apache Iceberg is a high-performance open-source table format for performing big
data analytics. Apache Iceberg brings the reliability and simplicity of SQL tables to Amazon S3
data lakes, and makes it possible for open-source analytics engines like Spark, Flink,
Trino, Hive, and Impala to concurrently work with the same data. For more information about
Apache Iceberg, see https://iceberg.apache.org/.
You can use Firehose to deliver streaming data directly to Apache Iceberg Tables in Amazon S3. With this feature, you can route records from a single stream into different Apache Iceberg Tables, and automatically apply insert, update, and delete operations to records in those tables. Firehose guarantees exactly-once delivery to Apache Iceberg Tables. This feature requires the AWS Glue Data Catalog.
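As a rough illustration of the setup described above, the following Python sketch assembles the request parameters for a Firehose stream with an Apache Iceberg destination. The parameter names follow the Firehose CreateDeliveryStream API (IcebergDestinationConfiguration) as we understand it. The helper function build_iceberg_stream_request, the stream name, the ARNs, and the database and table names are hypothetical placeholders, not values from this guide. The sketch also assumes that unique keys are what Firehose uses to match rows for update and delete operations. Verify the field names against the current API reference before relying on them.

```python
from typing import Any, Dict, List


def build_iceberg_stream_request(
    stream_name: str,
    role_arn: str,
    catalog_arn: str,
    error_bucket_arn: str,
    tables: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build keyword arguments for a CreateDeliveryStream call (hypothetical helper)."""
    table_configs = []
    for table in tables:
        config: Dict[str, Any] = {
            "DestinationDatabaseName": table["database"],
            "DestinationTableName": table["table"],
        }
        # Assumption: unique keys are only set for tables that receive
        # update or delete operations, so they are optional here.
        if table.get("unique_keys"):
            config["UniqueKeys"] = list(table["unique_keys"])
        table_configs.append(config)

    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        "IcebergDestinationConfiguration": {
            "RoleARN": role_arn,
            # The Glue Data Catalog that holds the Iceberg table definitions.
            "CatalogConfiguration": {"CatalogARN": catalog_arn},
            # One entry per Iceberg table that the stream may write to.
            "DestinationTableConfigurationList": table_configs,
            # S3 location used for records that cannot be delivered.
            "S3Configuration": {
                "RoleARN": role_arn,
                "BucketARN": error_bucket_arn,
            },
        },
    }


# Placeholder values for illustration only.
request = build_iceberg_stream_request(
    stream_name="example-iceberg-stream",
    role_arn="arn:aws:iam::111122223333:role/example-firehose-role",
    catalog_arn="arn:aws:glue:us-east-1:111122223333:catalog",
    error_bucket_arn="arn:aws:s3:::amzn-s3-demo-bucket",
    tables=[
        {"database": "sales", "table": "orders", "unique_keys": ["order_id"]},
        {"database": "sales", "table": "clickstream"},
    ],
)
```

With the AWS SDK for Python installed and credentials configured, the resulting dictionary could be passed as boto3.client("firehose").create_delivery_stream(**request). Building the request separately from the API call keeps the configuration easy to inspect before any resources are created.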
Firehose can also directly deliver streaming data to Amazon S3 Tables. Amazon S3 Tables provide storage
that is optimized for large-scale analytics workloads, with features that continuously
improve query performance and reduce storage costs for tabular data. With built-in support
for Apache Iceberg, you can query tabular data in Amazon S3 with popular query engines including
Amazon Athena, Amazon Redshift, and Apache Spark. For more information about Amazon S3 Tables, see the Amazon S3 Tables documentation.