Integrating Amazon S3 Tables with Amazon analytics services - Amazon Simple Storage Service
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Integrating Amazon S3 Tables with Amazon analytics services

This topic covers the prerequisites and procedures needed to integrate your Amazon S3 table buckets with Amazon analytics services. For an overview of how the integration works, see S3 Tables integration overview.

Note

This integration uses the Amazon Glue and Amazon Lake Formation services and might incur Amazon Glue request and storage costs. For more information, see Amazon Glue Pricing.

Additional pricing applies for running queries on your S3 tables. For more information, see the pricing information for the query engine that you're using.

Prerequisites for integration

The following prerequisites are required to integrate table buckets with Amazon analytics services:

Important

When creating tables, make sure that you use all lowercase letters in your table names and table definitions. For example, make sure that your column names are all lowercase. If your table name or table definition contains capital letters, the table isn't supported by Amazon Lake Formation or the Amazon Glue Data Catalog. In this case, your table won't be visible to Amazon analytics services such as Amazon Athena, even if your table buckets are integrated with Amazon analytics services.

If your table definition contains capital letters, you receive the following error message when running a SELECT query in Athena: "GENERIC_INTERNAL_ERROR: Get table request failed: com.amazonaws.services.glue.model.ValidationException: Unsupported Federation Resource - Invalid table or column names."
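Because uppercase letters make a table invisible to these services, it can help to check names before you create a table. The following bash sketch defines a hypothetical helper, `all_lowercase`, that fails if any name passed to it is empty or contains an uppercase letter. It checks only the lowercase rule described above, not any other naming constraints that Amazon Glue or Lake Formation might apply.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of any Amazon tool): returns 0 only
# if every argument is non-empty and contains no uppercase letters.
all_lowercase() {
  local name
  for name in "$@"; do
    # Compare the name with a lowercased copy of itself.
    if [ -z "$name" ] || [ "$name" != "$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')" ]; then
      printf 'not all lowercase: %s\n' "$name" >&2
      return 1
    fi
  done
  return 0
}

# Example: check a table name and its column names together.
all_lowercase daily_sales order_id order_date && echo "ok"
all_lowercase daily_sales OrderDate || echo "rename before creating the table"
```

Passing the table name and all column names in one call mirrors the rule above, which applies to the whole table definition rather than only to the table name.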

Integrating table buckets with Amazon analytics services

You need to perform this integration only once per Amazon Region.

Important

The Amazon analytics services integration now uses the WithPrivilegedAccess option in the RegisterResource Lake Formation API operation to register S3 table buckets. The integration also now creates the s3tablescatalog catalog in the Amazon Glue Data Catalog by using the AllowFullTableExternalDataAccess option in the CreateCatalog Amazon Glue API operation.

If you set up the integration with the preview release, you can continue to use your current integration. However, the updated integration process provides performance improvements, so we recommend migrating. To migrate to the updated integration, see Migrating to the updated integration process.

To integrate table buckets using the Amazon S3 console

  1. Open the Amazon S3 console at https://console.amazonaws.cn/s3/.

  2. In the left navigation pane, choose Table buckets.

  3. Choose Create table bucket.

    The Create table bucket page opens.

  4. Enter a Table bucket name and make sure that the Enable integration checkbox is selected.

  5. Choose Create table bucket. Amazon S3 attempts to automatically integrate your table buckets in that Region.

The first time that you integrate table buckets in any Region, Amazon S3 creates a new IAM service role on your behalf. This role allows Lake Formation to access all table buckets in your account and federate access to your tables in the Amazon Glue Data Catalog.

To integrate table buckets using the Amazon CLI

The following steps show how to use the Amazon CLI to integrate table buckets. To use these steps, replace the user input placeholders with your own information.

  1. Create a table bucket.

    aws s3tables create-table-bucket \
        --region us-east-1 \
        --name amzn-s3-demo-table-bucket
  2. Create an IAM service role that allows Lake Formation to access your table resources.

    1. Create a file called Role-Trust-Policy.json that contains the following trust policy:

      Create the IAM service role by using the following command:

      aws iam create-role \
          --role-name S3TablesRoleForLakeFormation \
          --assume-role-policy-document file://Role-Trust-Policy.json
    2. Create a file called LF-GluePolicy.json that contains the following policy:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "LakeFormationPermissionsForS3ListTableBucket",
            "Effect": "Allow",
            "Action": [
              "s3tables:ListTableBuckets"
            ],
            "Resource": [
              "*"
            ]
          },
          {
            "Sid": "LakeFormationDataAccessPermissionsForS3TableBucket",
            "Effect": "Allow",
            "Action": [
              "s3tables:CreateTableBucket",
              "s3tables:GetTableBucket",
              "s3tables:CreateNamespace",
              "s3tables:GetNamespace",
              "s3tables:ListNamespaces",
              "s3tables:DeleteNamespace",
              "s3tables:DeleteTableBucket",
              "s3tables:CreateTable",
              "s3tables:DeleteTable",
              "s3tables:GetTable",
              "s3tables:ListTables",
              "s3tables:RenameTable",
              "s3tables:UpdateTableMetadataLocation",
              "s3tables:GetTableMetadataLocation",
              "s3tables:GetTableData",
              "s3tables:PutTableData"
            ],
            "Resource": [
              "arn:aws-cn:s3tables:us-east-1:111122223333:bucket/*"
            ]
          }
        ]
      }

      Attach the policy to the role by using the following command:

      aws iam put-role-policy \
          --role-name S3TablesRoleForLakeFormation \
          --policy-name LakeFormationDataAccessPermissionsForS3TableBucket \
          --policy-document file://LF-GluePolicy.json
  3. Create a file called input.json that contains the following:

    {
      "ResourceArn": "arn:aws-cn:s3tables:us-east-1:111122223333:bucket/*",
      "WithFederation": true,
      "RoleArn": "arn:aws-cn:iam::111122223333:role/S3TablesRoleForLakeFormation"
    }

    Register table buckets with Lake Formation by using the following command:

    aws lakeformation register-resource \
        --region us-east-1 \
        --with-privileged-access \
        --cli-input-json file://input.json
  4. Create a file called catalog.json that contains the following catalog:

    {
      "Name": "s3tablescatalog",
      "CatalogInput": {
        "FederatedCatalog": {
          "Identifier": "arn:aws-cn:s3tables:us-east-1:111122223333:bucket/*",
          "ConnectionName": "aws:s3tables"
        },
        "CreateDatabaseDefaultPermissions": [],
        "CreateTableDefaultPermissions": [],
        "AllowFullTableExternalDataAccess": "True"
      }
    }

    Create the s3tablescatalog catalog by using the following command. Creating this catalog populates the Amazon Glue Data Catalog with objects corresponding to table buckets, namespaces, and tables.

    aws glue create-catalog \
        --region us-east-1 \
        --cli-input-json file://catalog.json
  5. Verify that the s3tablescatalog catalog was added to the Amazon Glue Data Catalog by using the following command:

    aws glue get-catalog --region us-east-1 --catalog-id s3tablescatalog
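If you script these steps, you can generate input.json and catalog.json from a few variables instead of editing ARNs by hand. The following bash sketch is one way to do that, assuming the placeholder Region, account ID, `aws-cn` partition, and role name used in this topic; the variable names (REGION, ACCOUNT_ID, PARTITION) are hypothetical. It writes the same JSON shown in steps 3 and 4 and, if python3 is available, checks that both files parse.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values copied from this topic; replace them with your own.
REGION="us-east-1"
ACCOUNT_ID="111122223333"
PARTITION="aws-cn"

# Wildcard ARN that covers every table bucket in the account and Region.
TABLE_BUCKETS_ARN="arn:${PARTITION}:s3tables:${REGION}:${ACCOUNT_ID}:bucket/*"
ROLE_ARN="arn:${PARTITION}:iam::${ACCOUNT_ID}:role/S3TablesRoleForLakeFormation"

# Step 3: input for "aws lakeformation register-resource".
cat > input.json <<EOF
{
  "ResourceArn": "${TABLE_BUCKETS_ARN}",
  "WithFederation": true,
  "RoleArn": "${ROLE_ARN}"
}
EOF

# Step 4: input for "aws glue create-catalog".
cat > catalog.json <<EOF
{
  "Name": "s3tablescatalog",
  "CatalogInput": {
    "FederatedCatalog": {
      "Identifier": "${TABLE_BUCKETS_ARN}",
      "ConnectionName": "aws:s3tables"
    },
    "CreateDatabaseDefaultPermissions": [],
    "CreateTableDefaultPermissions": [],
    "AllowFullTableExternalDataAccess": "True"
  }
}
EOF

# Optional sanity check: fail early if either file is not valid JSON.
if command -v python3 > /dev/null 2>&1; then
  python3 -m json.tool input.json > /dev/null
  python3 -m json.tool catalog.json > /dev/null
fi
echo "wrote input.json and catalog.json for ${REGION}"
```

Run the sketch in the directory where you run the register-resource and create-catalog commands, because both commands read these files through file:// paths.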

Migrating to the updated integration process

The Amazon analytics services integration process has been updated. If you've set up the integration with the preview release, you can continue to use your current integration. However, the updated integration process provides performance improvements, so we recommend migrating by using the following steps. For more information about the migration or integration process, see Creating an Amazon S3 Tables catalog in the Amazon Glue Data Catalog in the Amazon Lake Formation Developer Guide.

  1. Open the Amazon Lake Formation console at https://console.amazonaws.cn/lakeformation/, and sign in as a data lake administrator. For more information about how to create a data lake administrator, see Create a data lake administrator in the Amazon Lake Formation Developer Guide.

  2. Delete your s3tablescatalog catalog by doing the following:

    • In the left navigation pane, choose Catalogs.

    • Select the option button next to the s3tablescatalog catalog in the Catalogs list. On the Actions menu, choose Delete.

  3. Deregister the data location for the s3tablescatalog catalog by doing the following:

    • In the left navigation pane, go to the Administration section, and choose Data lake locations.

    • Select the option button next to the s3tablescatalog data lake location, for example, s3://tables:region:account-id:bucket/*.

    • On the Actions menu, choose Remove.

    • In the confirmation dialog box that appears, choose Remove.

  4. Now that you've deleted your s3tablescatalog catalog and data lake location, you can follow the steps to integrate your table buckets with Amazon analytics services by using the updated integration process.
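If you prefer to script the migration, the console steps above roughly correspond to the Amazon Glue delete-catalog and Lake Formation deregister-resource CLI commands. The following bash sketch is an outline under stated assumptions, not an official procedure: it assumes that the data lake location to remove is the wildcard table bucket ARN that the console displays as s3://tables:region:account-id:bucket/*, and it uses placeholder Region and account ID values. By default it only prints the commands; set DRY_RUN=0 and use data lake administrator credentials to run them. Confirm the registered ARN first, for example with aws lakeformation list-resources.

```shell
#!/usr/bin/env bash
set -euo pipefail

REGION="us-east-1"          # placeholder
ACCOUNT_ID="111122223333"   # placeholder
# Assumption: the preview integration registered this wildcard ARN, which the
# console lists as s3://tables:region:account-id:bucket/*.
LOCATION_ARN="arn:aws-cn:s3tables:${REGION}:${ACCOUNT_ID}:bucket/*"

# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Step 2: delete the s3tablescatalog catalog.
run aws glue delete-catalog --region "$REGION" --catalog-id s3tablescatalog

# Step 3: deregister the data lake location that backed the catalog.
run aws lakeformation deregister-resource --region "$REGION" --resource-arn "$LOCATION_ARN"
```

The commands run in the same order as the console steps: the catalog is deleted before its data lake location is deregistered.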

Note

If you want to work with SSE-KMS encrypted tables in integrated Amazon analytics services, the role you use needs to have permission to use your Amazon KMS key for encryption operations. For more information, see Granting IAM principals permissions to work with encrypted tables in integrated Amazon analytics services.

After you integrate, your IAM principal is granted Lake Formation permissions to access your tables. If you want to allow other IAM principals to access your tables, you must grant those principals Lake Formation permissions on the tables. For more information, see Managing access to a table or database with Lake Formation.
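The following bash sketch shows one way to grant another principal read access to a single table with the Lake Formation grant-permissions command. It assumes that tables in an integrated table bucket are addressed through a catalog ID of the form account-id:s3tablescatalog/table-bucket-name, with the namespace used as the database name. The role, namespace, and table names are hypothetical placeholders. The sketch prints the command instead of running it; confirm the resource format in the Lake Formation documentation referenced above before you run it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# All values below are hypothetical placeholders.
ACCOUNT_ID="111122223333"
TABLE_BUCKET="amzn-s3-demo-table-bucket"
NAMESPACE="example_namespace"
TABLE="example_table"
PRINCIPAL_ARN="arn:aws-cn:iam::${ACCOUNT_ID}:role/ExampleAnalystRole"

# Assumption: the table lives in a child catalog of s3tablescatalog whose ID
# is "<account-id>:s3tablescatalog/<table-bucket-name>".
cat > grant-resource.json <<EOF
{
  "Table": {
    "CatalogId": "${ACCOUNT_ID}:s3tablescatalog/${TABLE_BUCKET}",
    "DatabaseName": "${NAMESPACE}",
    "Name": "${TABLE}"
  }
}
EOF

# Print the command instead of running it; remove "echo" to execute it.
echo aws lakeformation grant-permissions \
  --principal "DataLakePrincipalIdentifier=${PRINCIPAL_ARN}" \
  --resource file://grant-resource.json \
  --permissions SELECT DESCRIBE
```

Granting only SELECT and DESCRIBE keeps the example to read access; add other Lake Formation permissions only if the principal needs them.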