Referencing Iceberg tables in Amazon Redshift - Amazon Redshift
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Amazon Redshift will no longer support the creation of new Python UDFs starting November 1, 2025. If you would like to use Python UDFs, create the UDFs prior to that date. Existing Python UDFs will continue to function as normal. For more information, see the blog post .

Referencing Iceberg tables in Amazon Redshift

Amazon Redshift provides multiple ways to reference Apache Iceberg tables stored in your data lake. You can use external schemas to create references to Data Catalog databases containing Iceberg tables, or use three-part notation for direct access to auto-mounted catalogs.

Using external schemas to reference Iceberg tables

External schemas provide a way to reference tables in your Data Catalog from within Amazon Redshift. When you create an external schema, you establish a connection between your Amazon Redshift database and a specific Data Catalog database that contains your Iceberg tables.

To create an external schema for Iceberg tables:

CREATE EXTERNAL SCHEMA schema_name FROM DATA CATALOG DATABASE 'glue_database_name' IAM_ROLE 'arn:aws:iam::account-id:role/role-name';

After creating the external schema, you can query Iceberg tables using two-part notation:

SELECT * FROM schema_name.iceberg_table_name;

You can also join Iceberg tables with local Amazon Redshift tables:

SELECT r.customer_id, i.order_date, r.customer_name FROM local_customers r JOIN schema_name.iceberg_orders i ON r.customer_id = i.customer_id;

Using three-part notation with auto-mounted catalogs

Three-part notation allows you to directly reference tables in auto-mounted catalogs without creating external schemas. This method is particularly useful when working with Amazon S3 table buckets federated with Amazon Lake Formation. For information about setting up automatic mounting of the Data Catalog, see Simplify external object access in Amazon Redshift using automatic mounting of the Amazon Glue Data Catalog.

The syntax for three-part notation is:

"catalog_name".database_name.table_name

For example, to query an Iceberg table in an auto-mounted Amazon S3 table catalog:

SELECT * FROM "my_table_bucket@s3tablescatalog".my_database.my_iceberg_table;

For more information about integrating Amazon S3 table buckets with Amazon Redshift, see Integrating S3 Tables with Amazon Redshift in the Amazon S3 User Guide.

You can also use the USE statement to set a default catalog and database:

USE "my_table_bucket@s3tablescatalog".my_database; SELECT * FROM my_iceberg_table;

To set a search path for schema resolution:

USE "my_table_bucket@s3tablescatalog"; SET search_path TO my_database; SELECT * FROM my_iceberg_table;

Best practices for referencing Iceberg tables

Consider the following best practices when referencing Iceberg tables in Amazon Redshift:

  • Use descriptive schema names – When creating external schemas, use names that clearly indicate the source and purpose of the data, such as sales_data_lake or customer_analytics.

  • Leverage table statistics – Ensure that column statistics are generated for your Iceberg tables using Amazon Glue to optimize query performance. Amazon Redshift uses these statistics for query planning and optimization.

  • Consider data freshness – Iceberg tables may be updated by other services while you're querying them. Amazon Redshift provides transactional consistency, ensuring you see a consistent snapshot of the data during your query execution.

  • Use appropriate IAM permissions – Ensure that your Amazon Redshift cluster or workgroup has the necessary IAM permissions to access the Amazon S3 locations where your Iceberg tables are stored, as well as the Data Catalog metadata.

  • Monitor query performance – Use Amazon Redshift query monitoring features to track the performance of queries against Iceberg tables and optimize as needed.

Common referencing patterns

The following examples demonstrate common patterns for referencing Iceberg tables:

Aggregating data across multiple Iceberg tables:

SELECT region, SUM(sales_amount) as total_sales, COUNT(*) as transaction_count FROM data_lake.sales_transactions WHERE transaction_date >= '2024-01-01' GROUP BY region ORDER BY total_sales DESC;

Joining Iceberg tables with local Amazon Redshift tables:

SELECT c.customer_name, c.customer_tier, SUM(o.order_amount) as total_orders FROM customers c JOIN data_lake.order_history o ON c.customer_id = o.customer_id WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days' GROUP BY c.customer_name, c.customer_tier;

Using three-part notation with complex queries:

WITH recent_orders AS ( SELECT customer_id, order_date, order_amount FROM "analytics_bucket@s3tablescatalog".ecommerce.orders WHERE order_date >= CURRENT_DATE - INTERVAL '7 days' ) SELECT customer_id, COUNT(*) as order_count, AVG(order_amount) as avg_order_value FROM recent_orders GROUP BY customer_id HAVING COUNT(*) > 1;