Enabling orphan file deletion - Amazon Glue

Enabling orphan file deletion

You can use the Amazon Glue console, the Amazon CLI, or the Amazon API to enable orphan file deletion for your Apache Iceberg tables in the Data Catalog. For new tables, you can choose Apache Iceberg as the table format and enable the orphan file deletion optimizer when you create the table. Orphan file deletion is disabled by default for new tables.

Console
To enable orphan file deletion
  1. Open the Amazon Glue console at https://console.amazonaws.cn/glue/ and sign in as a data lake administrator, the table creator, or a user who has been granted the glue:UpdateTable and lakeformation:GetDataAccess permissions on the table.

  2. In the navigation pane, under Data Catalog, choose Tables.

  3. On the Tables page, choose the Iceberg table for which you want to enable orphan file deletion.

    Choose the Table optimization tab in the lower section of the page, and then choose Enable orphan file deletion.

    You can also choose Enable under Optimization from the Actions menu.

  4. On the Enable optimization page, choose Orphan file deletion under Optimization options.

  5. Under Orphan file deletion configuration, enter the number of days to retain the files before deletion.

  6. Next, choose an IAM role with the required permissions to delete orphan files.

  7. Choose Enable optimization.

Amazon CLI

To enable orphan file deletion for an Iceberg table in Amazon Glue, create a table optimizer of type orphan_file_deletion and set the enabled field to true. The following Amazon CLI command creates one:

aws glue create-table-optimizer \
  --catalog-id 123456789012 \
  --database-name iceberg_db \
  --table-name iceberg_table \
  --table-optimizer-configuration '{"roleArn":"arn:aws:iam::123456789012:role/optimizer_role","enabled":true,"orphanFileDeletionConfiguration":{"icebergConfiguration":{"orphanFileRetentionPeriodInDays":3,"location":"S3 location"}}}' \
  --type orphan_file_deletion \
  --region Amazon Web Services Region

This command creates an orphan file deletion optimizer for the specified Iceberg table. The key parameters are:

  • roleArn – The ARN of the IAM role with permissions to access the S3 bucket and Glue resources.

  • enabled – Set to true to enable the optimizer.

  • orphanFileRetentionPeriodInDays – The number of days to retain orphan files before deleting them (minimum 1 day).

  • type – Set to orphan_file_deletion to create an orphan file deletion optimizer.
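Inline JSON inside shell quotes is easy to get wrong, so one option is to write the configuration to a file and pass it with the CLI's file:// prefix. The following is a minimal sketch, not a definitive procedure: the helper function names and the config.json file name are hypothetical, and the account ID, role, database, table, and S3 location are the placeholders used in the command above.

```shell
# Sketch: keep the optimizer configuration in a file instead of inline JSON.
# Function names and file names here are illustrative assumptions.

write_optimizer_config() {
  # $1 = IAM role ARN, $2 = retention in days, $3 = S3 location, $4 = output file
  cat > "$4" <<EOF
{
  "roleArn": "$1",
  "enabled": true,
  "orphanFileDeletionConfiguration": {
    "icebergConfiguration": {
      "orphanFileRetentionPeriodInDays": $2,
      "location": "$3"
    }
  }
}
EOF
}

create_orphan_file_deletion_optimizer() {
  # $1 = catalog ID, $2 = database name, $3 = table name, $4 = config file
  aws glue create-table-optimizer \
    --catalog-id "$1" \
    --database-name "$2" \
    --table-name "$3" \
    --table-optimizer-configuration "file://$4" \
    --type orphan_file_deletion
}

# Example (uncomment and replace the placeholders to run against your account):
# write_optimizer_config "arn:aws:iam::123456789012:role/optimizer_role" 3 "S3 location" config.json
# create_orphan_file_deletion_optimizer 123456789012 iceberg_db iceberg_table config.json
```

Keeping the configuration in a file also makes it easier to review the retention period before the optimizer is created.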

After you create the table optimizer, it runs orphan file deletion periodically (once per day if left enabled). You can check the runs using the list-table-optimizer-runs command. Each orphan file deletion run identifies and deletes files that are not tracked in the Iceberg metadata for the table.
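As a sketch of how the runs might be checked from the command line, the wrappers below call get-table-optimizer and list-table-optimizer-runs for the same table. The function names are hypothetical, and the identifiers in the example are the placeholder values from the create command.

```shell
# Sketch: inspect an orphan file deletion optimizer and its run history.
# Function names are illustrative assumptions, not part of the CLI.

get_orphan_file_deletion_optimizer() {
  # $1 = catalog ID, $2 = database name, $3 = table name
  aws glue get-table-optimizer \
    --catalog-id "$1" \
    --database-name "$2" \
    --table-name "$3" \
    --type orphan_file_deletion
}

list_orphan_file_deletion_runs() {
  # $1 = catalog ID, $2 = database name, $3 = table name
  aws glue list-table-optimizer-runs \
    --catalog-id "$1" \
    --database-name "$2" \
    --table-name "$3" \
    --type orphan_file_deletion
}

# Example (uncomment to run against your account):
# get_orphan_file_deletion_optimizer 123456789012 iceberg_db iceberg_table
# list_orphan_file_deletion_runs 123456789012 iceberg_db iceberg_table
```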

API

Call the CreateTableOptimizer operation to create the orphan file deletion optimizer for a specific table.
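To illustrate the shape of a CreateTableOptimizer request, the sketch below writes the request body to a file and submits it through the CLI's --cli-input-json option. It assumes the top-level request members are CatalogId, DatabaseName, TableName, Type, and TableOptimizerConfiguration, mirroring the CLI options shown earlier, and it leaves out the location field on the assumption that it is optional. The function names and the request.json file name are hypothetical.

```shell
# Sketch: a CreateTableOptimizer request body, submitted via --cli-input-json.
# Member names are assumed to mirror the CLI options; verify against the API reference.

write_create_request() {
  # $1 = catalog ID, $2 = database name, $3 = table name,
  # $4 = IAM role ARN, $5 = retention in days, $6 = output file
  cat > "$6" <<EOF
{
  "CatalogId": "$1",
  "DatabaseName": "$2",
  "TableName": "$3",
  "Type": "orphan_file_deletion",
  "TableOptimizerConfiguration": {
    "roleArn": "$4",
    "enabled": true,
    "orphanFileDeletionConfiguration": {
      "icebergConfiguration": {
        "orphanFileRetentionPeriodInDays": $5
      }
    }
  }
}
EOF
}

call_create_table_optimizer() {
  # $1 = request file
  aws glue create-table-optimizer --cli-input-json "file://$1"
}

# Example (uncomment to run against your account):
# write_create_request 123456789012 iceberg_db iceberg_table \
#   "arn:aws:iam::123456789012:role/optimizer_role" 3 request.json
# call_create_table_optimizer request.json
```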