Register S3 table bucket catalogs in Athena

Amazon S3 table buckets are a bucket type in Amazon S3 that is purpose-built to store tabular data in Apache Iceberg tables. Table buckets automate table management tasks such as compaction, snapshot management, and garbage collection to continuously optimize query performance and minimize cost. Whether you're just starting out, or have thousands of tables in your Iceberg environment, table buckets simplify data lakes at any scale. For more information, see Table buckets.

Considerations and limitations

  • DDL operations such as CREATE TABLE, CREATE TABLE AS SELECT, and CREATE VIEW are not supported.

  • Athena supports read and write operations such as SELECT, INSERT, UPDATE, DELETE, and MERGE.

  • Because Athena does not support CREATE TABLE on S3 table buckets, you must perform the initial setup with another engine, such as Spark on EMR, or with the S3 Tables API.

  • Query result reuse is not supported.

  • If you encounter the error Invalid choice: 's3tables' when you use the CLI, make sure to upgrade to the latest Amazon CLI version.

Setting up before you query S3 table bucket from Athena

Complete these prerequisite steps before you query an S3 table bucket from Athena:
  1. Create an S3 table bucket. For more information, see Creating a table bucket in the Amazon Simple Storage Service User Guide.

  2. Create a table namespace. For more information, see Create a namespace in the Amazon Simple Storage Service User Guide.

  3. Create an S3 table by following the steps in Creating an Amazon S3 table.

    If you use the CLI to create the table, you must separately upload the table's Iceberg metadata file to the warehouse location and then run update-table-metadata-location to point the table at that file, which specifies the table schema. If you use a query engine to create the table, you do not need this additional step, and you can optionally populate the table with an INSERT query as shown in the Create a table and upload data tutorial. The steps to create the table using the CLI are as follows.

    1. Get the warehouse location and version token as shown in the following example command.

      aws s3tables get-table-metadata-location \
        --region <region e.g. us-east-1> \
        --table-bucket-arn arn:aws:s3tables:<region e.g. us-east-1>:<account ID>:bucket/amzn-s3-demo-bucket \
        --namespace <S3 table namespace e.g. test_namespace> \
        --name <S3 table name e.g. test_table>
    2. Create a temporary Iceberg table with Athena by using the following example query. Athena will store the metadata file in the warehouse location.

      CREATE TABLE default.temp_table (
        id bigint,
        data string,
        category string
      )
      PARTITIONED BY (category, bucket(16, id))
      LOCATION '<warehouse location e.g. s3://<uid>--table-s3>'
      TBLPROPERTIES ('table_type' = 'ICEBERG')
    3. Get the metadata file location as shown in the following example command. You can find it in Table.Parameters.metadata_location from the result.

      aws glue get-table \
        --catalog-id <account ID> \
        --database-name default \
        --name temp_table
    4. Update the metadata location of the S3 table with the following example command.

      aws s3tables update-table-metadata-location \
        --region <region e.g. us-east-1> \
        --table-bucket-arn arn:aws:s3tables:<region e.g. us-east-1>:<account ID>:bucket/amzn-s3-demo-bucket \
        --namespace <S3 table namespace e.g. test_namespace> \
        --name <S3 table name e.g. test_table> \
        --metadata-location <Table.Parameters.metadata_location value from previous step result> \
        --version-token <version token from step 1>
    5. Use the following query to drop the temporary Iceberg table that you created in Athena.

      DROP TABLE temp_table
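
    The CLI parts of steps 1, 3, and 4 can be chained in a script so that the version token and metadata location do not have to be copied by hand. The following is a minimal sketch rather than an official procedure. It assumes that the get-table-metadata-location response exposes warehouseLocation and versionToken fields, and the function names and the REGION, BUCKET_ARN, NAMESPACE, TABLE, and ACCOUNT_ID variables are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and variables are illustrative, not part of any AWS tool.

# Print one field of the S3 table's metadata-location response (step 1).
# $1 is a field name, assumed to be warehouseLocation or versionToken.
s3table_field() {
  aws s3tables get-table-metadata-location \
    --region "$REGION" \
    --table-bucket-arn "$BUCKET_ARN" \
    --namespace "$NAMESPACE" \
    --name "$TABLE" \
    --query "$1" --output text
}

# Print the metadata file location of the temporary Athena table (step 3).
temp_table_metadata_location() {
  aws glue get-table \
    --catalog-id "$ACCOUNT_ID" \
    --database-name default \
    --name temp_table \
    --query 'Table.Parameters.metadata_location' --output text
}

# Point the S3 table at the metadata file (step 4).
# $1 = metadata location, $2 = version token obtained in step 1.
update_s3table_metadata() {
  aws s3tables update-table-metadata-location \
    --region "$REGION" \
    --table-bucket-arn "$BUCKET_ARN" \
    --namespace "$NAMESPACE" \
    --name "$TABLE" \
    --metadata-location "$1" \
    --version-token "$2"
}

# Example flow (not run here):
#   token=$(s3table_field versionToken)
#   s3table_field warehouseLocation   # use this value in the CREATE TABLE query
#   update_s3table_metadata "$(temp_table_metadata_location)" "$token"
```

    The version token is captured before the temporary table is created because step 4 expects the token returned in step 1.
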
  4. Verify that your table buckets are integrated with Amazon Glue Data Catalog and Amazon Lake Formation by following Prerequisites for integration and Integrating table buckets with Amazon analytics services in the Amazon Simple Storage Service User Guide.

    Note

    If you enabled the integration while creating an S3 table bucket from the S3 console in Step 1, then you can skip this step.

  5. For the user or role that will submit queries from Athena, grant Lake Formation permissions on the S3 table, using either the Lake Formation console or the CLI.

    Console
    1. Open the Amazon Lake Formation console at https://console.amazonaws.cn/lakeformation/ and sign in as a data lake administrator. For more information on how to create a data lake administrator, see Create a data lake administrator.

    2. In the navigation pane, choose Data permissions and then choose Grant.

    3. On the Grant Permissions page, under Principals, choose the principal that will submit queries from Athena.

    4. Under LF-Tags or catalog resources, choose Named Data Catalog resources.

    5. For Catalogs, choose the Glue data catalog that was created from the integration of your table bucket. For example, <accountID>:s3tablescatalog/amzn-s3-demo-bucket.

    6. For Databases, choose the S3 table namespace that you created. Athena uses the S3 table namespace as the database.

    7. For Tables, choose the S3 table that you created in the S3 table bucket.

    8. For Table permissions, choose Super.

    9. Choose Grant.

    CLI
    1. Make sure that you run the Amazon CLI command as a data lake administrator. For more information, see Create a data lake administrator.

    2. Run the following command to grant Lake Formation permissions on the S3 table so that the user or role can submit queries from Athena.

      aws lakeformation grant-permissions \
        --region <region e.g. us-east-1> \
        --cli-input-json \
        '{
          "Principal": {
            "DataLakePrincipalIdentifier": "<user or role ARN e.g. arn:aws:iam::<Account ID>:role/ExampleRole>"
          },
          "Resource": {
            "Table": {
              "CatalogId": "<Account ID>:s3tablescatalog/amzn-s3-demo-bucket",
              "DatabaseName": "<S3 table namespace e.g. test_namespace>",
              "Name": "<S3 table name e.g. test_table>"
            }
          },
          "Permissions": [ "ALL" ]
        }'
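
    When you grant permissions on several tables, it can be easier to generate the request body than to edit the inline JSON by hand. The following is a minimal sketch under that assumption. The lf_grant_json function and the grant.json file name are hypothetical, and the example account ID and role are placeholders. The request body has the same shape as the inline JSON in the command above.

```shell
# Print the grant-permissions request body for one S3 table.
# $1 = principal ARN, $2 = account ID, $3 = table bucket name,
# $4 = S3 table namespace, $5 = S3 table name.
# All values are caller-supplied placeholders.
lf_grant_json() {
  cat <<EOF
{
  "Principal": { "DataLakePrincipalIdentifier": "$1" },
  "Resource": {
    "Table": {
      "CatalogId": "$2:s3tablescatalog/$3",
      "DatabaseName": "$4",
      "Name": "$5"
    }
  },
  "Permissions": [ "ALL" ]
}
EOF
}

# Example (not run here):
#   lf_grant_json "arn:aws:iam::111122223333:role/ExampleRole" 111122223333 \
#     amzn-s3-demo-bucket test_namespace test_table > grant.json
#   aws lakeformation grant-permissions \
#     --region us-east-1 \
#     --cli-input-json file://grant.json
```

    Writing the body to a file and passing it with file:// avoids shell quoting problems in the inline form.
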
  6. Submit a query from Athena as the user or role that you granted permissions to. In this example, s3tablescatalog is the parent Glue data catalog created from the integration, and s3tablescatalog/amzn-s3-demo-bucket is the child Glue data catalog created for each S3 table bucket. You can query in either of two ways.

    • Specify the child Glue data catalog (s3tablescatalog/amzn-s3-demo-bucket) directly as the catalog. You can do this either in the console or with the Amazon CLI.

      Console
      • Open the Athena console at https://console.aws.amazon.com/athena/.

      • In the query editor, enter a query like SELECT * FROM "s3tablescatalog/amzn-s3-demo-bucket"."test_namespace"."test_table" LIMIT 10.

      CLI

      Run the following command.

      aws athena start-query-execution \
        --query-string 'SELECT * FROM "s3tablescatalog/amzn-s3-demo-bucket"."test_namespace"."test_table" LIMIT 10' \
        --work-group "primary"
    • Create an Athena data catalog from the child Glue data catalog in the Athena console, and then specify it as the catalog in the query. For more information, see Register S3 table bucket catalogs with the Athena console.
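
The start-query-execution command in step 6 returns as soon as the query is accepted, so a script usually needs to wait for the query to finish before it reads results. The following is a minimal sketch of that pattern. The function names and the POLL_SECONDS variable are hypothetical, and the sketch assumes that the workgroup already has a query result location configured.

```shell
# Build the three-part table name. The child catalog name contains "/",
# so each part is wrapped in double quotes, as in the example query.
# $1 = catalog, $2 = S3 table namespace, $3 = S3 table name.
qualified_table() {
  printf '"%s"."%s"."%s"\n' "$1" "$2" "$3"
}

# Submit a query and print its QueryExecutionId.
# $1 = SQL text, $2 = workgroup (defaults to primary).
submit_query() {
  aws athena start-query-execution \
    --query-string "$1" \
    --work-group "${2:-primary}" \
    --query 'QueryExecutionId' --output text
}

# Print the current state of a query, for example QUEUED, RUNNING,
# SUCCEEDED, FAILED, or CANCELLED.
query_state() {
  aws athena get-query-execution \
    --query-execution-id "$1" \
    --query 'QueryExecution.Status.State' --output text
}

# Poll until the query leaves QUEUED or RUNNING, then print the final state.
wait_for_query() {
  while :; do
    state=$(query_state "$1")
    case "$state" in
      QUEUED|RUNNING) sleep "${POLL_SECONDS:-2}" ;;
      *) printf '%s\n' "$state"; return 0 ;;
    esac
  done
}

# Example (not run here):
#   table=$(qualified_table s3tablescatalog/amzn-s3-demo-bucket test_namespace test_table)
#   id=$(submit_query "SELECT * FROM $table LIMIT 10")
#   [ "$(wait_for_query "$id")" = "SUCCEEDED" ] && \
#     aws athena get-query-results --query-execution-id "$id"
```
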

Register S3 table bucket catalogs with the Athena console

To register S3 table bucket catalogs with the Athena console, perform the following steps.

  1. Open the Athena console at https://console.aws.amazon.com/athena/.

  2. In the navigation pane, choose Data sources and catalogs.

  3. On the Data sources and catalogs page, choose Create data source.

  4. For Choose a data source, choose Amazon S3 - Amazon Glue Data Catalog.

  5. In the Amazon Glue Data Catalog section, for Data source account, choose Amazon Glue Data Catalog in this account.

  6. For Create a table or register a catalog, choose Register a new Amazon Glue Catalog.

  7. In the Data source details section, for Data source name, enter the name that you want to use to refer to the data source in your SQL queries, or use the generated default name.

  8. For Catalog, choose Browse to search a list of Amazon Glue catalogs in the same account. If you do not see any existing catalogs, create one in the Amazon Glue console.

  9. In the Browse Amazon Glue catalogs dialog box, select the catalog that you want to use, and then choose Choose.

  10. (Optional) For Tags, enter any key/value pairs that you want to associate with the data source.

  11. Choose Next.

  12. On the Review and create page, verify that the information that you entered is correct, and then choose Create data source.