Tutorial: Getting started with S3 Tables
In this tutorial, you create a table bucket and integrate table buckets in your Region with Amazon analytics services. Next, you use the Amazon CLI to create your first namespace and table in your table bucket. Then, you use Amazon Lake Formation to grant permissions on your table, so you can begin querying your table with Athena.
Tip
If you're migrating tabular data from general purpose buckets to table buckets, the Amazon Solutions Library has a guided solution to assist you. This solution automates moving Apache Iceberg and Apache Hive tables that are registered in Amazon Glue Data Catalog and stored in general purpose buckets to table buckets by using Amazon Step Functions and Amazon EMR with Apache Spark. For more information, see Guidance for Migrating Tabular Data from Amazon S3 to S3 Tables.
Step 1: Create a table bucket and integrate it with Amazon analytics services
In this step, you use the Amazon S3 console to create your first table bucket. For other ways to create a table bucket, see Creating a table bucket.
Note
By default, the Amazon S3 console automatically integrates your table buckets with Amazon SageMaker Lakehouse, which allows Amazon analytics services to automatically discover and access your S3 Tables data. If you create your first table bucket programmatically by using the Amazon Command Line Interface (Amazon CLI), Amazon SDKs, or REST API, you must manually complete the Amazon analytics services integration. For more information, see Using Amazon S3 Tables with Amazon analytics services.
Sign in to the Amazon Web Services Management Console and open the Amazon S3 console at https://console.amazonaws.cn/s3/.
In the navigation bar at the top of the page, choose the name of the currently displayed Amazon Web Services Region. Next, choose the Region in which you want to create the table bucket.
In the left navigation pane, choose Table buckets.
Choose Create table bucket.
Under General configuration, enter a name for your table bucket.
The table bucket name must:
Be unique within your Amazon Web Services account in the current Region.
Be between 3 and 63 characters long.
Consist only of lowercase letters, numbers, and hyphens (-).
Begin and end with a letter or number.
After you create the table bucket, you can't change its name. The Amazon Web Services account that creates the table bucket owns it. For more information about naming table buckets, see Table bucket naming rules.
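The naming rules above can be checked locally before you create the bucket. The following is a minimal sketch, not an official validator: the function name `is_valid_table_bucket_name` is hypothetical, and it covers only the length and character rules listed in this step. It can't check uniqueness within your account and Region, and it doesn't cover any additional restrictions described in Table bucket naming rules.

```python
import re

# Rules from this step: 3-63 characters; lowercase letters, numbers, and
# hyphens only; must begin and end with a letter or number.
_TABLE_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_table_bucket_name(name: str) -> bool:
    """Return True if name satisfies the length and character rules above.

    Hypothetical helper: uniqueness and any further restrictions are
    not checked here.
    """
    return _TABLE_BUCKET_NAME.fullmatch(name) is not None
```

For example, `is_valid_table_bucket_name("amzn-s3-demo-table-bucket")` passes, while a name that contains capital letters or starts with a hyphen is rejected.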
In the Integration with Amazon analytics services section, make sure that the Enable integration checkbox is selected.
If Enable integration is selected when you create your first table bucket by using the console, Amazon S3 attempts to integrate your table bucket with Amazon analytics services. This integration allows you to use Amazon analytics services to access all tables in the current Region. For more information, see Using Amazon S3 Tables with Amazon analytics services.
Choose Create bucket.
Step 2: Create a table namespace and a table
For this step, you create a namespace in your table bucket, and then create a new table under that namespace. You can create a table namespace and a table by using either the console or the Amazon CLI.
Important
When creating tables, make sure that you use all lowercase letters in your table names and table definitions. For example, make sure that your column names are all lowercase. If your table name or table definition contains capital letters, the table isn't supported by Amazon Lake Formation or the Amazon Glue Data Catalog. In this case, your table won't be visible to Amazon analytics services such as Amazon Athena, even if your table buckets are integrated with Amazon analytics services.
If your table definition contains capital letters, you receive the following error message when running a SELECT query in Athena: "GENERIC_INTERNAL_ERROR: Get table request failed: com.amazonaws.services.glue.model.ValidationException: Unsupported Federation Resource - Invalid table or column names."
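To catch this problem before you create a table, you can scan a table definition for capital letters. The following is a minimal sketch that assumes the definition is a Python dictionary shaped like the example JSON table definition used with the Amazon CLI later in this tutorial. The function name `find_uppercase_names` is hypothetical, and the check covers only the table name and the column names.

```python
def find_uppercase_names(definition: dict) -> list[str]:
    """Return the table and column names that contain capital letters.

    Assumes the dictionary layout of the example table definition in this
    tutorial: a top-level "name" and columns under
    metadata -> iceberg -> schema -> fields.
    """
    names = [definition.get("name", "")]
    fields = (
        definition.get("metadata", {})
        .get("iceberg", {})
        .get("schema", {})
        .get("fields", [])
    )
    names.extend(field.get("name", "") for field in fields)
    # A name differs from its lowercase form only if it has capital letters.
    return [name for name in names if name != name.lower()]
```

An empty result means that no capital letters were found in the names that were checked.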
The following procedure uses the Amazon S3 console to create a namespace and a table with Amazon Athena.
To create a table namespace and a table
Sign in to the Amazon Web Services Management Console and open the Amazon S3 console at https://console.amazonaws.cn/s3/.
In the left navigation pane, choose Table buckets.
On the Table buckets page, choose the table bucket that you want to create a table in.
On the table bucket details page, choose Create table with Athena.
In the Create table with Athena dialog box, choose Create a namespace, and then enter a name in the Namespace name field. Namespace names must be 1 to 255 characters and unique within the table bucket. Valid characters are a–z, 0–9, and underscores (_). Underscores aren't allowed at the start of namespace names.
Choose Create namespace.
Choose Create table with Athena.
The Amazon Athena console opens and the Athena query editor appears. The query editor is populated with a sample query that you can use to create a table. Modify the query to specify the table name and columns that you want your table to have.
When you're finished modifying the query, choose Run to create your table.
If your table creation was successful, the name of your new table appears in the list of tables in Athena. When you navigate back to the Amazon S3 console, your new table appears in the Tables list on the details page for your table bucket after you refresh the list.
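The namespace naming rules from the procedure above can also be checked locally. This is a minimal sketch under the rules stated there (1 to 255 characters; a–z, 0–9, and underscores; no leading underscore). The function name `is_valid_namespace_name` is hypothetical, and uniqueness within the table bucket can't be verified this way.

```python
import re

# First character: a-z or 0-9 (no leading underscore).
# Remaining 0-254 characters: a-z, 0-9, or underscore.
_NAMESPACE_NAME = re.compile(r"[a-z0-9][a-z0-9_]{0,254}")


def is_valid_namespace_name(name: str) -> bool:
    """Return True if name satisfies the namespace rules stated above."""
    return _NAMESPACE_NAME.fullmatch(name) is not None
```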
To use the following Amazon CLI example commands to create a namespace in your table bucket, and then create a new table with a schema under that namespace, replace the user input placeholder values with your own.
Prerequisites
Attach the AmazonS3TablesFullAccess policy to your IAM identity.
Install Amazon CLI version 2.23.10 or higher. For more information, see Installing or updating the latest version of the Amazon CLI in the Amazon Command Line Interface User Guide.
Create a new namespace in your table bucket by running the following command:
    aws s3tables create-namespace \
        --table-bucket-arn arn:aws-cn:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
        --namespace my_namespace
Confirm that your namespace was created successfully by running the following command:
    aws s3tables list-namespaces \
        --table-bucket-arn arn:aws-cn:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket
Create a new table with a table schema by running the following command:
    aws s3tables create-table --cli-input-json file://mytabledefinition.json
For the mytabledefinition.json file, use the following example table definition:
    {
        "tableBucketARN": "arn:aws-cn:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket",
        "namespace": "my_namespace",
        "name": "my_table",
        "format": "ICEBERG",
        "metadata": {
            "iceberg": {
                "schema": {
                    "fields": [
                        {"name": "id", "type": "int", "required": true},
                        {"name": "name", "type": "string"},
                        {"name": "value", "type": "int"}
                    ]
                }
            }
        }
    }
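If you prefer to generate the definition file rather than write it by hand, the following sketch builds the same structure as the example definition above and serializes it to JSON. The helper name `build_table_definition` is hypothetical. The ARN, namespace, table name, and columns are the placeholder values from this tutorial, so replace them with your own.

```python
import json


def build_table_definition(table_bucket_arn, namespace, name, fields):
    """Return a dictionary with the same layout as the example definition.

    Hypothetical helper; the key names are copied from the example above.
    """
    return {
        "tableBucketARN": table_bucket_arn,
        "namespace": namespace,
        "name": name,
        "format": "ICEBERG",
        "metadata": {"iceberg": {"schema": {"fields": fields}}},
    }


# Placeholder values from this tutorial; replace them with your own.
definition = build_table_definition(
    "arn:aws-cn:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket",
    "my_namespace",
    "my_table",
    [
        {"name": "id", "type": "int", "required": True},
        {"name": "name", "type": "string"},
        {"name": "value", "type": "int"},
    ],
)

# JSON text suitable for saving as mytabledefinition.json.
definition_json = json.dumps(definition, indent=2)
```

Saving `definition_json` to a file named mytabledefinition.json gives you the input that the create-table command above reads with --cli-input-json.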
(Optional) Step 3: Grant Lake Formation permissions on your table
For this step, you grant Lake Formation permissions on your new table to other IAM principals. These permissions allow principals other than you to access table bucket resources by using Athena and other Amazon analytics services. For more information, see Granting permission on a table or database. If you're the only user who will access your tables, you can skip this step.
Open the Amazon Lake Formation console at https://console.amazonaws.cn/lakeformation/, and sign in as a data lake administrator. For more information about how to create a data lake administrator, see Create a data lake administrator.
In the navigation pane, choose Data permissions and then choose Grant.
On the Grant Permissions page, under Principals, choose IAM users and roles and choose the IAM user or role that you want to allow to run queries on your table.
Under LF-Tags or catalog resources, choose Named Data Catalog resources.
Do one of the following, depending on whether you want to grant access to all of the tables in your account or only to the resources within the table bucket that you created:
For Catalogs, choose the account-level catalog that you created when you integrated your table bucket. For example, 111122223333:s3tablescatalog.
For Catalogs, choose the subcatalog for your table bucket. For example, 111122223333:s3tablescatalog/amzn-s3-demo-table-bucket.
(Optional) If you chose the subcatalog for your table bucket, do one or both of the following:
For Databases, choose the table bucket namespace that you created.
For Tables, choose the table that you created in your table bucket, or choose All tables.
Depending on whether you chose a catalog or subcatalog and depending on whether you then chose a database or a table, you can set permissions at the catalog, database, or table level. For more information about Lake Formation permissions, see Managing Lake Formation permissions in the Amazon Lake Formation Developer Guide.
Do one of the following:
For Catalog permissions, choose Super to grant the other principal all permissions on your catalog, or choose more fine-grained permissions, such as Describe.
For Database permissions, you can't choose Super to grant the other principal all permissions on your database. Instead, choose more fine-grained permissions, such as Describe.
For Table permissions, choose Super to grant the other principal all permissions on your table, or choose more fine-grained permissions, such as Select or Describe.
Note
When you grant Lake Formation permissions on a Data Catalog resource to an external account or directly to an IAM principal in another account, Lake Formation uses the Amazon Resource Access Manager (Amazon RAM) service to share the resource. If the grantee account is in the same organization as the grantor account, the shared resource is available immediately to the grantee. If the grantee account is not in the same organization, Amazon RAM sends an invitation to the grantee account to accept or reject the resource grant. Then, to make the shared resource available, the data lake administrator in the grantee account must use the Amazon RAM console or Amazon CLI to accept the invitation. For more information about cross-account data sharing, see Cross-account data sharing in Lake Formation in the Amazon Lake Formation Developer Guide.
Choose Grant.
Step 4: Query data with SQL in Athena
You can query your table with SQL in Athena. Athena supports Data Definition Language (DDL), Data Manipulation Language (DML), and Data Query Language (DQL) queries for S3 Tables.
You can access the Athena query editor either from the Amazon S3 console or through the Amazon Athena console.
The following procedure uses the Amazon S3 console to access the Athena query editor so that you can query a table with Amazon Athena.
To query a table
Sign in to the Amazon Web Services Management Console and open the Amazon S3 console at https://console.amazonaws.cn/s3/.
In the left navigation pane, choose Table buckets.
On the Table buckets page, choose the table bucket that contains the table that you want to query.
On the table bucket details page, choose the option button next to the name of the table that you want to query.
Choose Query table with Athena.
The Amazon Athena console opens and the Athena query editor appears with a sample SELECT query loaded for you. Modify this query as needed for your use case.
To run the query, choose Run.
To query a table from the Amazon Athena console
Open the Athena console at https://console.amazonaws.cn/athena/.
Query your table. The following is a sample query that you can modify. Make sure to replace the user input placeholders with your own information.
    SELECT * FROM "s3tablescatalog/amzn-s3-demo-table-bucket"."my_namespace"."my_table" LIMIT 10
To run the query, choose Run.
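If you build queries like the sample above from variables, the three-part name can be assembled programmatically. The following is a minimal sketch: the function name `build_select_query` is hypothetical, and it assumes the "s3tablescatalog/<table bucket>"."<namespace>"."<table>" pattern shown in the sample query. Doubling any embedded double quotation marks follows the usual SQL convention for quoted identifiers and is an assumption here, not something this tutorial specifies.

```python
def build_select_query(table_bucket, namespace, table, limit=10):
    """Return a SELECT statement in the same form as the sample query."""

    def quote(identifier):
        # Wrap in double quotes, doubling any embedded double quotes
        # (assumed standard SQL quoting for identifiers).
        return '"' + identifier.replace('"', '""') + '"'

    catalog = f"s3tablescatalog/{table_bucket}"
    return (
        f"SELECT * FROM {quote(catalog)}.{quote(namespace)}.{quote(table)} "
        f"LIMIT {int(limit)}"
    )
```

Calling `build_select_query("amzn-s3-demo-table-bucket", "my_namespace", "my_table")` yields the same text as the sample query, which you can paste into the Athena query editor.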