Synchronize Delta Lake metadata
Athena synchronizes table metadata, including schema, partition columns, and table properties, to Amazon Glue if you use Athena to create your Delta Lake table. Over time, this metadata can fall out of sync with the underlying table metadata in the transaction log. To keep your table up to date, choose one of the following options:
- Use the Amazon Glue crawler for Delta Lake tables. For more information, see Introducing native Delta Lake table support with Amazon Glue crawlers in the Amazon Big Data Blog and Scheduling an Amazon Glue crawler in the Amazon Glue Developer Guide.
- Drop and recreate the table in Athena.
- Use the SDK, CLI, or Amazon Glue console to manually update the schema in Amazon Glue.
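As a rough illustration of the SDK option, the following Python sketch compares the columns registered in Amazon Glue with a column list taken from the transaction log and, if they differ, writes the new columns back with `update_table`. The helper names (`diff_columns`, `build_table_input`, `sync_glue_schema`) and the `log_columns` argument are hypothetical; the sketch assumes the caller has already read the schema from the transaction log and converted it to Glue type names, and it copies only a subset of the writable `TableInput` fields.

```python
from copy import deepcopy

# Assumption: a subset of the fields that Glue accepts in TableInput.
# get_table also returns read-only fields (for example DatabaseName,
# CreateTime, UpdateTime) that must not be sent back to update_table.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
}


def diff_columns(glue_columns, log_columns):
    """Return (missing_in_glue, extra_in_glue, type_mismatches) as sorted name lists."""
    glue = {c["Name"].lower(): c["Type"].lower() for c in glue_columns}
    log = {c["Name"].lower(): c["Type"].lower() for c in log_columns}
    missing = sorted(set(log) - set(glue))
    extra = sorted(set(glue) - set(log))
    mismatched = sorted(n for n in set(glue) & set(log) if glue[n] != log[n])
    return missing, extra, mismatched


def build_table_input(table, new_columns):
    """Copy the writable fields of a get_table result and swap in new columns."""
    table_input = {k: deepcopy(v) for k, v in table.items() if k in TABLE_INPUT_KEYS}
    table_input["StorageDescriptor"]["Columns"] = deepcopy(new_columns)
    return table_input


def sync_glue_schema(database, table_name, log_columns):
    """Update the Glue table only when its columns differ from log_columns."""
    import boto3  # imported lazily so the helpers above work without the SDK

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    if any(diff_columns(table["StorageDescriptor"]["Columns"], log_columns)):
        glue.update_table(
            DatabaseName=database,
            TableInput=build_table_input(table, log_columns),
        )
        return True
    return False
```

The comparison ignores column order and partition keys, so a workflow that depends on either would need to extend it.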
Note that the following features require the schema in Amazon Glue to always match the schema in the transaction log:
- Lake Formation
- Views
- Row and column filters
If your workflow does not require any of this functionality, and you prefer not to maintain this compatibility, you can use CREATE TABLE DDL in Athena and then add the Amazon S3 path as a SerDe parameter in Amazon Glue.
You can use the following procedure to create a Delta Lake table with the Athena and Amazon Glue consoles.
To create a Delta Lake table using the Athena and Amazon Glue consoles
- Open the Athena console at https://console.amazonaws.cn/athena/.
- In the Athena query editor, use the following DDL to create your Delta Lake table. Note that when using this method, the value for TBLPROPERTIES must be 'spark.sql.sources.provider' = 'delta' and not 'table_type' = 'delta'.

  Note that this same schema (with a single column named col of type array<string>) is inserted when you use Apache Spark (Athena for Apache Spark) or most other engines to create your table.

      CREATE EXTERNAL TABLE [db_name.]table_name(col array<string>)
      LOCATION 's3://amzn-s3-demo-bucket/your-folder/'
      TBLPROPERTIES ('spark.sql.sources.provider' = 'delta')

- Open the Amazon Glue console at https://console.amazonaws.cn/glue/.
- In the navigation pane, choose Data Catalog, Tables.
- In the list of tables, choose the link for your table.
- On the page for the table, choose Actions, Edit table.
- In the Serde parameters section, add the key path with the value s3://amzn-s3-demo-bucket/your-folder/.
- Choose Save.
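If you create many such tables, the DDL from the procedure above can be generated rather than typed by hand. The following Python sketch builds the same statement from a table name and an Amazon S3 location, and optionally submits it through the Athena SDK. The function names and the `output_location` argument are hypothetical, and the sketch assumes the database and table names are simple identifiers that need no quoting.

```python
def delta_placeholder_ddl(table_name, location, db_name=None):
    """Build the CREATE EXTERNAL TABLE statement with the placeholder schema."""
    if not location.startswith("s3://"):
        raise ValueError("location must be an s3:// URI")
    if not location.endswith("/"):
        location += "/"  # the procedure uses a trailing slash on the folder
    qualified = f"{db_name}.{table_name}" if db_name else table_name
    return (
        f"CREATE EXTERNAL TABLE {qualified}(col array<string>) "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ('spark.sql.sources.provider' = 'delta')"
    )


def run_in_athena(ddl, output_location):
    """Submit the DDL to Athena and return the query execution ID."""
    import boto3  # imported lazily so the DDL builder works without the SDK

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=ddl,
        # Assumption: query results are written to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Running the DDL covers only the Athena part of the procedure; the path SerDe parameter still has to be added in Amazon Glue afterward.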
To create a Delta Lake table using the Amazon CLI, enter a command like the following.
    aws glue create-table --database-name dbname \
        --table-input '{
            "Name": "tablename",
            "StorageDescriptor": {
                "Columns": [ { "Name": "col", "Type": "array<string>" } ],
                "Location": "s3://amzn-s3-demo-bucket/<prefix>/",
                "SerdeInfo": {
                    "Parameters": {
                        "serialization.format": "1",
                        "path": "s3://amzn-s3-demo-bucket/<prefix>/"
                    }
                }
            },
            "PartitionKeys": [],
            "TableType": "EXTERNAL_TABLE",
            "Parameters": {
                "EXTERNAL": "TRUE",
                "spark.sql.sources.provider": "delta"
            }
        }'
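The same table definition can be sketched with the SDK for Python. The example below builds the `TableInput` structure used in the CLI command above, keeping `Location` and the `path` SerDe parameter identical, and passes it to the Glue `create_table` call. The function names are hypothetical, and the database and table names are the same placeholders as in the CLI command.

```python
def delta_table_input(table_name, s3_path):
    """Return a Glue TableInput for a Delta Lake table with the placeholder schema."""
    return {
        "Name": table_name,
        "StorageDescriptor": {
            # Placeholder schema: a single column named col of type array<string>.
            "Columns": [{"Name": "col", "Type": "array<string>"}],
            "Location": s3_path,
            "SerdeInfo": {
                "Parameters": {
                    "serialization.format": "1",
                    # The Amazon S3 path is repeated as a SerDe parameter.
                    "path": s3_path,
                }
            },
        },
        "PartitionKeys": [],
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "EXTERNAL": "TRUE",
            "spark.sql.sources.provider": "delta",
        },
    }


def create_delta_table(database, table_name, s3_path):
    """Create the table in the Glue Data Catalog."""
    import boto3  # imported lazily so delta_table_input works without the SDK

    glue = boto3.client("glue")
    glue.create_table(
        DatabaseName=database,
        TableInput=delta_table_input(table_name, s3_path),
    )
```

Building the structure in one function keeps the two copies of the Amazon S3 path from drifting apart, which is easy to do when editing the JSON by hand.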