Accessing Amazon S3 tables using the Amazon Glue Iceberg REST endpoint - Amazon Simple Storage Service
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Accessing Amazon S3 tables using the Amazon Glue Iceberg REST endpoint

Once your S3 table buckets are integrated with the Amazon Glue Data Catalog you can use the Amazon Glue Iceberg REST endpoint to connect to your S3 tables from Apache Iceberg-compatible clients, such as PyIceberg or Spark. The Amazon Glue Iceberg REST endpoint implements the Iceberg REST Catalog Open API specification which provides a standardized interface for interacting with Iceberg tables. To access S3 tables using the endpoint you need to configure permissions through a combination of IAM policies and Amazon Lake Formation grants. The following sections explain how to set up access, including creating the necessary IAM role, defining the required policies, and establishing Lake Formation permissions for both database and table-level access.

For an end to end walkthrough using PyIceberg, see Access data in Amazon S3 Tables using PyIceberg through the Amazon Glue Iceberg REST endpoint.

Create an IAM role for your client

To access tables through Amazon Glue endpoints, you need to create an IAM role with permissions to Amazon Glue and Lake Formation actions. This procedure explains how to create this role and configure its permissions.

  1. Open the IAM console at https://console.amazonaws.cn/iam/.

  2. In the left navigation pane, choose Policies.

  3. Choose Create a policy, and choose JSON in policy editor.

  4. Add the following inline policy that grants permissions to access Amazon Glue and Lake Formation actions:

    { "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "glue:GetCatalog", "glue:GetDatabase", "glue:GetDatabases", "glue:GetTable", "glue:GetTables", "glue:CreateTable", "glue:UpdateTable" ], "Resource": [ "arn:aws:glue:<region>:<account-id>:catalog", "arn:aws:glue:<region>:<account-id>:catalog/s3tablescatalog", "arn:aws:glue:<region>:<account-id>:catalog/s3tablescatalog/<s3_table_bucket_name>", "arn:aws:glue:<region>:<account-id>:table/s3tablescatalog/<s3_table_bucket_name>/<namespace>/*", "arn:aws:glue:<region>:<account-id>:database/s3tablescatalog/<s3_table_bucket_name>/<namespace>" ] }, { "Effect": "Allow", "Action": [ "lakeformation:GetDataAccess" ], "Resource": "*" } ] }
  5. After you create the policy, create an IAM role and choose Custom trust policy as the Trusted entity type.

  6. Enter the following for the Custom trust policy.

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::<accountid>:role/<Admin_role>" }, "Action": "sts:AssumeRole", "Condition": {} } ] }

Define access in Lake Formation

Lake Formation provides fine-grained access control for your data lake tables. When you integrated your S3 bucket with the Amazon Glue Data Catalog, your tables were automatically registered as resources in Lake Formation. To access these tables, you must grant specific Lake Formation permissions to your IAM identity, in addition to its IAM policy permissions.

The following steps explain how to apply Lake Formation access controls to allow your Iceberg client to connect to your tables. You must sign in as a data lake administrator to apply these permissions.

Allow external engines to access table data

In Lake Formation, you must enable full table access for external engines to access data. This allows third-party applications to get temporary credentials from Lake Formation when using an IAM role that has full permissions on the requested table.

Open the Lake Formation console at https://console.amazonaws.cn/lakeformation/.

  1. Open the Lake Formation console at https://console.amazonaws.cn/lakeformation/, and sign in as a data lake administrator.

  2. In the navigation pane under Administration, choose Application integration settings.

  3. Select Allow external engines to access data in Amazon S3 locations with full table access. Then choose Save.

Grant Lake Formation permissions on your table resources

Next, grant Lake Formation permissions to the IAM role you created for your Iceberg-compatible client. These permissions will allow the role to create and manage tables in your namespace. You need to provide both database and table-level permissions:

To grant database permissions

  1. Open the Amazon Lake Formation console at https://console.amazonaws.cn/lakeformation/, and sign in as a data lake administrator.

  2. In the navigation pane, choose Data permissions and then choose Grant.

  3. On the Grant Permissions page, under Principals, choose IAM users and roles and select the IAM role you created for Amazon Glue Iceberg REST endpoint access.

  4. Under LF-Tags or catalog resources, choose Named Data Catalog resources.

  5. For Catalogs, choose the Amazon Glue data catalog that was created for your table bucket. For example, <accoundID>:s3tablescatalog/<table-bucket-name>.

  6. For Databases, choose mynamespace.

  7. For Table permissions, choose Create table and Describe.

  8. Choose Grant.

To grant table permissions

  1. Open the Amazon Lake Formation console at https://console.aws.amazon.com/lakeformation/, and sign in as a data lake administrator.

  2. In the navigation pane, choose Data permissions and then choose Grant.

  3. On the Grant Permissions page, under Principals, choose IAM users and roles and select the IAM role you created for Amazon Glue Iceberg REST endpoint access.

  4. Under LF-Tags or catalog resources, choose Named Data Catalog resources.

  5. For Catalogs, choose the Amazon Glue data catalog that was created for your table bucket. For example, <accoundID>:s3tablescatalog/<table-bucket-name>.

  6. For Databases, choose the S3 table bucket namespace that you created.

  7. For Tables, choose ALL_TABLES.

  8. For Table permissions, choose Super.

  9. Choose Grant.

Set up your environment to use the endpoint

After you have setup the IAM role with the permissions required for table access you can use it to run Iceberg clients from your local machine by configuring the Amazon CLI with your role, using the following command:

aws sts assume-role --role-arn "arn:aws:iam::<accountid>:role/<glue-irc-role>" --role-session-name <glue-irc-role>

To access tables through the Amazon Glue REST endpoint, you need to initialize a catalog in your Iceberg-compatible client. This initialization requires specifying custom properties, including sigv4 properties, the endpoint URI and the warehouse location. Specify these properties as follows:

  • Sigv4 properties - Sigv4 must be enabled, the signing name is glue

  • Warehouse location - This is your table bucket, specified in this format: <accountid>:s3tablescatalog/<table-bucket-name>

  • Endpoint URI - Refer to the Amazon Glue service endpoints reference guide for the region-specific endpoint

The following example shows how to initialize a pyIceberg catalog.

rest_catalog = load_catalog( s3tablescatalog, **{ "type": "rest", "warehouse": "<accountid>:s3tablescatalog/<table-bucket-name>", "uri": "https://glue.<region>.amazonaws.com/iceberg", "rest.sigv4-enabled": "true", "rest.signing-name": "glue", "rest.signing-region": region } )

For additional information about the Amazon Glue Iceberg REST endpoint implementation, see Connecting to the Data Catalog using Amazon Glue Iceberg REST endpoint in the Amazon Glue User Guide.